Now that seal_snapshot doesn't need to lookup
the snapshot_manager in pending_snapshots to
get to the file_sets, erasing the snapshot_manager
object can be done in table::snapshot which
also inserted it there.
This will make it easier to get rid of it in a later patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
and pass it to seal_snapshot, so that the latter won't
need to lookup and access the snapshot_manager object.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Write schema.cql and the files manifest in finalize_snapshot.
Currently call it from table::snapshot, but it will
be called in a later patch by snapshot_on_all_shards.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To simplify processing of the per-shard file names
for generating the manifest.
We only need to print them to the manifest at the
end of the process, so there's no point in copying
them around in the process, just move the
foreign unique unordered_set.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently copying the sstable file names are created
and destroyed on each shard and are copied by the
"coordinator" shards using submit_to, while the
coroutine holds the source on its stack frame.
To prepare for the next patches that refactor this
code so that the coordinator shard will submit_to
each shard to perform `take_snapshot` and return
the set of sstrings in the future result, we need
to wrap the result in a foreign_ptr so it gets
freed on the shard that created it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
There could be thousands of sstables so we better
cosider yielding in the tight loop that copies
the sstable names into the unordered_set we return.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Don't catch exception but rather just return
them in the return future, as the exception
is handled by the caller.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Do the actual snapshot-taking code in a per-shard
take_snapshot function, to be called from
snapshot_on_all_shards in a following patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Pass an optional truncated_at time_point to
truncate_table_on_all_shards instead of the over-complicated
timestamp_func that returns the same time_point on all shards
anyhow, and was only used for coordination across shards.
Since now we synchronize the internal execution phase in
truncate_table_on_all_shards, there is no longer need
for this timestamp_func.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
With that the database layer does no longer
need to invoke the private table::snapshot function,
so it can be defriended from class table.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Start moving the per-shard state establishment logic
to truncate_table_on_all_shards, so that we would evetually
do only the truncate logic per-se in the per-shard truncate function.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now that the per-shard truncate function is called
only from truncate_table_on_all_shards, we can reject the schema_tables
keyspace in the upper layer. There's no need to check that on each shard.
While at it, reuse `is_system_keyspace`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Called from the respective database entry points.
Will be called also from the database drop / truncate path
and will be used for central coordination of per-shard
table::snapshot so we don't have to depend on the snapshot_manager
mechanism that is fragile and currently causes abort if we fail
to allocate it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
truncate the table on all shards then stop it on shards
in the upper layer rather than in the per-shard drop_column_family()
function, so we can further refactor truncate later, flushing
and taking snapshot on all shards, before truncating.
With that, rename drop_column_family to detach_columng_family
as now it only deregisters the column family from containers
that refer to it (even via its uuid) and then its caller
is reponsible to take it from there.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
cf->schema()->id() is the same one returned
by find_uuid(ks_name, cf_name);
As a follow up, we should define a concrete
table_id type and rename schema::id() to schema::table_id()
to return it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Print once on "coordinator" shard.
And promote to info level as it's important to log
when we're dropping a table (and if we're going to take a snapshot).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
timestamp_func
Since in the drop_table case we want to discard ALL
sstables in the table, not only those with `max_data_age()`
up until drop started.
Fixes#11232
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Before we change drop_table_on_all_shards to always
pass db_clock::time_point::max() in the next patch,
let it pass a unique snapshot name, otherwise
the snapshot name will always be based on the constant, max
time_point.
Refs #11232
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This assert was tweaked several times:
Introduced in 83323e155e,
then fixed in b2b1a1f7e1 to account
for no rp from discard_sstables, then in
9620755c7f to account for
cases we do not flush the table, then again in
71c5dc82df to make that more accurate.
But, the assert wasn't correct in the first place
in the sense that we first get `low_mark` which
represents the highest replay_position at the time truncate
was called, but then we call discard_sstables with a time_point
of `truncated_at` that we get from the caller via the timestamp_func,
and that one could be in the past, before truncate was called -
hence discard_sstables with that timestamp may very well
return a replay_position from older sstables, prior to flush
that can be smaller than the low_mark.
Fix this assert to account for that case.
The real fix to this issue is to have a truncate_tombstone
that will carry an authoritative api::timstamp (#11230)
Fixes#11231
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Following up on 1c26d49fba,
apply mutations on the correct db shard in all test cases
before we define and use database::truncate_table_on_all_shards.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Range tombstones are kept in memory (cache/memtable) in
range_tombstone_list. It keeps them deoverlapped, so applying a range
tombstone which covers many range tombstones will erase existing range
tombstones from the list. This operation needs to be exception-safe,
so range_tombstone_list maintains an undo log. This undo log will
receive a record for each range tombstone which is removed. For
exception safety reasons, before pushing an undo log entry, we reserve
space in the log by calling std::vector::reserve(size() + 1). This is
O(N) where N is the number of undo log entries. Therefore, the whole
application is O(N^2).
This can cause reactor stalls and availability issues when replicas
apply such deletions.
This patch avoids the problem by reserving exponentially increasing
amount of space. Also, to avoid large allocations, switches the
container to chunked_vector.
Fixes#11211Closes#11215
These are the first commits out of #10815.
It starts by moving pytest logic out of the common `test/conftest.py`
and into `test/topology/conftest.py`, including removing the async
support as it's not used anywhere else.
There's a fix of a bug of leaving tables in `RandomTables.tables` after
dropping all of them.
Keyspace creation is moved out of `conftest.py` into `RandomTables` as
it makes more sense and this way topology tests avoid all the
workarounds for old version (topology needs ScyllaDB 5+ for Raft,
anyway).
And a minor fix.
Closes#11210
* github.com:scylladb/scylladb:
test.py: fix type hint for seed in ScyllaServer
test.py: create/drop keyspace in tables helper
test.py: RandomTables clear list when dropping all tables
test.py: move topology conftest logic to its own
test.py: async topology tests auto run with pytest_asyncio
Since all topology test will use the helper, create the keyspace in the
helper.
Avoid the need of dropping all tables per test and just drop the
keyspace.
While there, use blocking CQL execution so it can be used in the
constructor and avoids possible issues with scheduling on cleanup. Also,
creation and drop should happen only once per cluster and no test should
be running changes (either not started or finished).
All topology tests are for Scylla with Raft. So don't use the Cassandra
this_dc workaround as it's unnecessary for Scylla.
Remove return type of random_tables fixture to match other fixtures
everywhere else.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Clear the list of active tables when dropping them.
While there do the list element exchange atomically across active and
removed tables lists.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Move asyncio, Raft checks, and RandomTables to topology test suite's own
conftest file.
While there, use non-async version of pre-checks to avoid unnecessary
complexity (we want async tests, not async setup, for now).
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Async tests and fixtures in the topology directory are expected to run
with pytest_asyncio (not other async frameworks). Force this with auto
mode.
CI has an older pytest_asyncio version lacking pytest_asyncio.fixture.
Auto mode helps avoiding the need of it and tests and fixtures can just
be marked with regular @pytest.mark.async.
This way tests can run in both older and newer versions of the packages.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
"
There are several helpers in this .cc file that need to get datacenter
for endpoints. For it they use global snitch, because there's no other
place out there to get that data from.
The whole dc/rack info is now moving to topology, so this set patches
the consistency_level.cc to get the topology. This is done two ways.
First, the helpers that have keyspace at hand may get the topology via
ks's effective_replication_map.
Two difficult cases are db::is_local() and db.count_local_endpoints()
because both have just inet_address at hand. Those are patched to be
methods of topology itself and all their callers already mess with
token metadata and can get topology from it.
"
* 'br-consistency-level-over-topology' of https://github.com/xemul/scylla:
consistency_level: Remove is_local() and count_local_endpoints()
storage_proxy: Use topology::local_endpoints_count()
storage_proxy: Use proxy's topology for DC checks
storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver
storage_proxy: Use topology local_dc_filter in its methods
storage_proxy: Mark some digest_read_resolver methods private
forwarding_service: Use topology local_dc_filter
storage_service: Use topology local_dc_filter
consistency_level: Use topology local_dc_filter
consitency-level: Call count_local_endpoints from topology
consistency_level: Get datacenter from topology
replication_strategy: Remove hold snitch reference
effective_replication_map: Get datacenter from topology
topology: Add local-dc detection shugar
No code uses them now -- switched to use topology -- so thse two can be
dropped together with their calls for global snitch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
A continuation of the previous patches -- now all the code that needs
this helper have proxy pointer at hand
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Several proxy helper classes need to filter endpoints by datacenter.
Since now the have shared_ptr<proxy> on-board, they can get topology
via proxy's token metadata
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It will be needed to get token metadata from proxy. The resolver in
question is created and maintained by abstract_read_executor which
already has shared_ptr<proxy>, so it just gives its copy
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The proxy has token metadata pointer, so it can use its topology
reference to filter endpoints by datacenter
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>