Commit Graph

32498 Commits

Author SHA1 Message Date
Benny Halevy
84dfd2cabb table: snapshot: move pending_snapshots.erase from seal_snapshot
Now that seal_snapshot doesn't need to look up
the snapshot_manager in pending_snapshots to
get to the file_sets, erasing the snapshot_manager
object can be done in table::snapshot, which
also inserted it there.

This will make it easier to get rid of it in a later patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
39276cacc3 table: finalize_snapshot: take the file sets as a param
and pass it to seal_snapshot, so that the latter won't
need to look up and access the snapshot_manager object.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
4dd56bbd6d table: make seal_snapshot a static member
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
7cb0a3f6f4 table: finalize_snapshot: reindent
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
12716866a9 table: refactor finalize_snapshot out of snapshot
Write schema.cql and the files manifest in finalize_snapshot.
Currently it is called from table::snapshot, but a later
patch will also call it from snapshot_on_all_shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
240f83546d table: snapshot: keep per-shard file sets in snapshot_manager
To simplify processing of the per-shard file names
for generating the manifest.

We only need to print them to the manifest at the
end of the process, so there's no point in copying
them around along the way; just move the
foreign unique unordered_set instead.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
5100c1ba68 table: take_snapshot: return foreign unique ptr
Currently the sets of sstable file names are created
and destroyed on each shard and are copied by the
"coordinator" shard using submit_to, while the
coroutine holds the source on its stack frame.

To prepare for the next patches that refactor this
code so that the coordinator shard will submit_to
each shard to perform `take_snapshot` and return
the set of sstrings in the future result, we need
to wrap the result in a foreign_ptr so it gets
freed on the shard that created it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
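The ownership pattern described above, building the file-name set on one shard and handing it to the coordinator by move rather than by copy, can be sketched in plain C++. This is a minimal sketch: the function names are illustrative, and std::unique_ptr stands in for Seastar's foreign_ptr, which additionally guarantees destruction on the shard that created the object.

```cpp
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>

using file_set = std::unordered_set<std::string>;

// Hypothetical per-shard producer: builds the set of snapshot file names
// on its own "shard" and hands ownership back as a unique_ptr, so the
// coordinator moves the set instead of copying thousands of strings.
std::unique_ptr<file_set> take_snapshot_on_shard(int shard) {
    auto files = std::make_unique<file_set>();
    files->insert("me-" + std::to_string(shard) + "-big-Data.db");
    files->insert("me-" + std::to_string(shard) + "-big-Index.db");
    return files;
}

// Coordinator: collects the per-shard sets by move, never copying the names.
std::vector<std::unique_ptr<file_set>> collect(int nr_shards) {
    std::vector<std::unique_ptr<file_set>> per_shard;
    for (int s = 0; s < nr_shards; ++s) {
        per_shard.push_back(take_snapshot_on_shard(s));
    }
    return per_shard;
}
```

In the real code the wrapper also pins deallocation to the originating shard, which a bare unique_ptr does not do.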
Benny Halevy
b54626ad0e table: take_snapshot: maybe yield in per-sstable loop
There could be thousands of sstables, so we had better
consider yielding in the tight loop that copies
the sstable names into the unordered_set we return.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
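The yielding pattern can be sketched without Seastar by injecting the yield point as a callback. This is a hedged illustration, not the actual take_snapshot code; in the real coroutine the callback's place is taken by co_await coroutine::maybe_yield().

```cpp
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Copy names into the result set, giving the scheduler a chance to run
// other tasks on every iteration instead of stalling the reactor for the
// whole (possibly thousands-long) loop.
std::unordered_set<std::string>
copy_names_with_yield(const std::vector<std::string>& names,
                      const std::function<void()>& yield) {
    std::unordered_set<std::string> out;
    for (const auto& n : names) {
        out.insert(n);
        yield();  // in Seastar: co_await seastar::coroutine::maybe_yield();
    }
    return out;
}
```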
Benny Halevy
24a1a4069e table: take_snapshot: simplify tables construction code
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
75e38ebccc table: take_snapshot: reindent
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
67c1d00f44 table: take_snapshot: simplify error handling
Don't catch exceptions; rather, just return
them in the returned future, as they
are handled by the caller.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
ff6508aa53 table: refactor take_snapshot out of snapshot
Do the actual snapshot-taking code in a per-shard
take_snapshot function, to be called from
snapshot_on_all_shards in a following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
37b7a9cce2 utils: get rid of joinpoint
Now that it is no longer used.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
56f336d1aa database: get rid of timestamp_func
Pass an optional truncated_at time_point to
truncate_table_on_all_shards instead of the over-complicated
timestamp_func that returned the same time_point on all shards
anyhow, and was only used for coordination across shards.

Since we now synchronize the internal execution phase in
truncate_table_on_all_shards, there is no longer a need
for this timestamp_func.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
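A rough sketch of the idea, with hypothetical names: the coordinator resolves the optional truncated_at once and passes the same value to every shard, instead of each shard invoking a timestamp_func to obtain it.

```cpp
#include <chrono>
#include <optional>
#include <vector>

using time_point = std::chrono::system_clock::time_point;

// The coordinator computes (or receives) the truncation timestamp once and
// fans the identical value out to every shard, so no cross-shard
// coordination of the timestamp is needed. Returns what each shard saw.
std::vector<time_point>
truncate_on_all_shards(int nr_shards, std::optional<time_point> truncated_at) {
    time_point ts = truncated_at.value_or(std::chrono::system_clock::now());
    std::vector<time_point> per_shard;
    for (int s = 0; s < nr_shards; ++s) {
        per_shard.push_back(ts);  // each shard sees the same timestamp
    }
    return per_shard;
}
```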
Benny Halevy
b640c4fd17 database: truncate: snapshot table in all-shards layer
With that, the database layer no longer
needs to invoke the private table::snapshot function,
so it can be defriended from class table.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
af0c71aa12 database: truncate: flush table and views in all-shards layer
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
6e07e6b7ac database: truncate: stop and disable compaction in all-shards layer
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
e78dad1dfb database: truncate: move call to set_low_replay_position_mark to all-shards layer
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
a8bd3d97b6 database: truncate: enter per-shard table async_gate in all-shards layer
Start moving the per-shard state establishment logic
to truncate_table_on_all_shards, so that we would eventually
do only the truncate logic per se in the per-shard truncate function.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
ff028316f2 database: truncate: move check for schema_tables keyspace to all-shards layer.
Now that the per-shard truncate function is called
only from truncate_table_on_all_shards, we can reject the schema_tables
keyspace in the upper layer.  There's no need to check that on each shard.

While at it, reuse `is_system_keyspace`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
fbe1fa1370 database: snapshot_table_on_all_shards: reindent
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
4d4ca40c38 table: add snapshot_on_all_shards
Called from the respective database entry points.

It will also be called from the database drop / truncate path
and will be used for central coordination of per-shard
table::snapshot, so we don't have to depend on the snapshot_manager
mechanism, which is fragile and currently causes an abort if we fail
to allocate it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
be56a73e78 database: add snapshot_table_on_all_shards
We need to snapshot a single table in several paths.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
d96b56fee2 database: rename {flush,snapshot}_on_all and make static
Follow the convention of drop_table_on_all_shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
a1eed1a6e9 database: drop_table_on_all_shards: truncate and stop table in upper layer
Truncate the table on all shards, then stop it on all shards
in the upper layer rather than in the per-shard drop_column_family()
function, so we can further refactor truncate later to flush
and take a snapshot on all shards before truncating.

With that, rename drop_column_family to detach_column_family,
as now it only deregisters the column family from containers
that refer to it (even via its uuid), and its caller
is then responsible for taking it from there.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
92cb7d448b database: drop_table_on_all_shards: get all table shards before drop_column_family on each
So the upper layer can flush, snapshot, and truncate
the table on all shards, step by step.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
0aaaefbb5c database: drop_column_family: define table& cf
To reduce the churn in the following patch
that will pass the table& as a parameter.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
bb1e5ffb8c database: drop_column_family: reuse uuid for evict_all_for_table
cf->schema()->id() is the same uuid returned
by find_uuid(ks_name, cf_name).

As a follow-up, we should define a concrete
table_id type and rename schema::id() to schema::table_id()
to return it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
e800e1e720 database: drop_column_family: move log message up a layer
Print once on "coordinator" shard.

And promote to info level as it's important to log
when we're dropping a table (and if we're going to take a snapshot).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
ca78a63873 database: truncate: get rid of the unused ks param
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
46e2a7c83b database: add truncate_table_on_all_shards
As a first step to decouple truncate from flush
and snapshot.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
5e8c05f1a8 database: drop_table_on_all_shards: do not accept a truncated_at timestamp_func

Since in the drop_table case we want to discard ALL
sstables in the table, not only those with `max_data_age()`
up until when the drop started.

Fixes #11232

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:52:51 +03:00
Benny Halevy
574909c78f database: truncate: get optional snapshot_name from caller
Before we change drop_table_on_all_shards to always
pass db_clock::time_point::max() in the next patch,
let it pass a unique snapshot name; otherwise
the snapshot name would always be based on the constant, max
time_point.

Refs #11232

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:03:19 +03:00
Benny Halevy
474b2fdf37 database: truncate: fix assert about replay_position low_mark
This assert was tweaked several times:
Introduced in 83323e155e,
then fixed in b2b1a1f7e1 to account
for no rp from discard_sstables, then in
9620755c7f to account for
cases we do not flush the table, then again in
71c5dc82df to make that more accurate.

But the assert wasn't correct in the first place,
in the sense that we first get `low_mark`, which
represents the highest replay_position at the time truncate
was called, but then we call discard_sstables with a time_point
of `truncated_at` that we get from the caller via the timestamp_func,
and that one could be in the past, before truncate was called.
Hence discard_sstables with that timestamp may very well
return a replay_position from older sstables, prior to the flush,
that can be smaller than the low_mark.

Fix this assert to account for that case.

The real fix for this issue is to have a truncate_tombstone
that will carry an authoritative api::timestamp (#11230).

Fixes #11231

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 09:18:06 +03:00
Benny Halevy
9f5e13800d database_test: apply_mutation on the correct db shard
Following up on 1c26d49fba,
apply mutations on the correct db shard in all test cases
before we define and use database::truncate_table_on_all_shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 09:18:06 +03:00
Tomasz Grabiec
7f80602b01 db: range_tombstone_list: Avoid quadratic behavior when applying
Range tombstones are kept in memory (cache/memtable) in
range_tombstone_list. It keeps them deoverlapped, so applying a range
tombstone which covers many range tombstones will erase existing range
tombstones from the list. This operation needs to be exception-safe,
so range_tombstone_list maintains an undo log. This undo log will
receive a record for each range tombstone which is removed. For
exception safety reasons, before pushing an undo log entry, we reserve
space in the log by calling std::vector::reserve(size() + 1). This is
O(N) where N is the number of undo log entries. Therefore, the whole
application is O(N^2).

This can cause reactor stalls and availability issues when replicas
apply such deletions.

This patch avoids the problem by reserving exponentially
increasing amounts of space. Also, to avoid large allocations,
it switches the container to chunked_vector.

Fixes #11211

Closes #11215
2022-08-05 20:34:07 +03:00
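The complexity difference the commit describes can be demonstrated with a small stand-alone sketch (illustrative only, not the actual range_tombstone_list code): count how many reallocations each reservation policy triggers over N pushes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// reserve(size() + 1) before each push forces a reallocation (and an O(N)
// copy) on every iteration, O(N^2) total; doubling the capacity when full
// keeps the total copy work amortized O(N). Returns the number of
// reallocations the chosen policy triggers for n pushes.
std::size_t count_reallocs(std::size_t n, bool exponential) {
    std::vector<int> v;
    std::size_t reallocs = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (v.size() == v.capacity()) {
            std::size_t new_cap = exponential
                ? std::max<std::size_t>(8, v.capacity() * 2)
                : v.size() + 1;
            v.reserve(new_cap);
            ++reallocs;
        }
        v.push_back(static_cast<int>(i));
    }
    return reallocs;
}
```

With geometric growth the reallocation count is logarithmic in N, which is why the fix removes the stall; chunked_vector additionally caps each allocation's size.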
Kamil Braun
d84a93d683 Merge 'Raft test topology part 1' from Alecco
These are the first commits out of #10815.

It starts by moving pytest logic out of the common `test/conftest.py`
and into `test/topology/conftest.py`, including removing the async
support as it's not used anywhere else.

There's a fix for a bug that left tables in `RandomTables.tables` after
dropping all of them.

Keyspace creation is moved out of `conftest.py` into `RandomTables`, as
it makes more sense there, and this way topology tests avoid all the
workarounds for old versions (topology needs ScyllaDB 5+ for Raft,
anyway).

And a minor fix.

Closes #11210

* github.com:scylladb/scylladb:
  test.py: fix type hint for seed in ScyllaServer
  test.py: create/drop keyspace in tables helper
  test.py: RandomTables clear list when dropping all tables
  test.py: move topology conftest logic to its own
  test.py: async topology tests auto run with pytest_asyncio
2022-08-05 17:56:16 +02:00
Warren Krewenki
4178ccd27f gossiper: Correct typo in log message
Closes #11212
2022-08-05 18:21:36 +03:00
Alejo Sanchez
ec70e26f12 test.py: fix type hint for seed in ScyllaServer
Param seed can be None (e.g. first server) so fix type hint accordingly.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-05 13:05:26 +02:00
Alejo Sanchez
1d7789e5a9 test.py: create/drop keyspace in tables helper
Since all topology tests will use the helper, create the keyspace in the
helper.

Avoid the need to drop all tables per test; just drop the
keyspace.

While there, use blocking CQL execution so it can be used in the
constructor and avoids possible issues with scheduling on cleanup. Also,
creation and drop should happen only once per cluster, while no test is
running changes (i.e., either not yet started or already finished).

All topology tests are for Scylla with Raft, so don't use the Cassandra
this_dc workaround, as it's unnecessary for Scylla.

Remove return type of random_tables fixture to match other fixtures
everywhere else.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-05 13:05:26 +02:00
Alejo Sanchez
9a019628f5 test.py: RandomTables clear list when dropping all tables
Clear the list of active tables when dropping them.

While there do the list element exchange atomically across active and
removed tables lists.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-05 13:05:26 +02:00
Alejo Sanchez
f6aa0d7bd7 test.py: move topology conftest logic to its own
Move asyncio, Raft checks, and RandomTables to topology test suite's own
conftest file.

While there, use the non-async version of the pre-checks to avoid
unnecessary complexity (we want async tests, not async setup, for now).

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-05 13:05:26 +02:00
Alejo Sanchez
f665779cdb test.py: async topology tests auto run with pytest_asyncio
Async tests and fixtures in the topology directory are expected to run
with pytest_asyncio (not other async frameworks). Force this with auto
mode.

CI has an older pytest_asyncio version lacking pytest_asyncio.fixture.
Auto mode helps avoid the need for it, and tests and fixtures can just
be marked with the regular @pytest.mark.asyncio.

This way tests can run with both older and newer versions of the packages.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-05 13:05:26 +02:00
Botond Dénes
fbbe2529c1 Merge "Remove global snitch usage from consistency_level.cc" from Pavel Emelyanov
"
There are several helpers in this .cc file that need to get datacenter
for endpoints. For it they use global snitch, because there's no other
place out there to get that data from.

The whole dc/rack info is now moving to topology, so this patch set
changes consistency_level.cc to get the topology. This is done in two
ways. First, the helpers that have a keyspace at hand may get the
topology via the ks's effective_replication_map.

Two difficult cases are db::is_local() and db::count_local_endpoints(),
because both have just an inet_address at hand. Those are patched to be
methods of topology itself; all their callers already deal with
token metadata and can get the topology from it.
"

* 'br-consistency-level-over-topology' of https://github.com/xemul/scylla:
  consistency_level: Remove is_local() and count_local_endpoints()
  storage_proxy: Use topology::local_endpoints_count()
  storage_proxy: Use proxy's topology for DC checks
  storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver
  storage_proxy: Use topology local_dc_filter in its methods
  storage_proxy: Mark some digest_read_resolver methods private
  forwarding_service: Use topology local_dc_filter
  storage_service: Use topology local_dc_filter
  consistency_level: Use topology local_dc_filter
  consitency-level: Call count_local_endpoints from topology
  consistency_level: Get datacenter from topology
  replication_strategy: Remove hold snitch reference
  effective_replication_map: Get datacenter from topology
  topology: Add local-dc detection shugar
2022-08-05 13:31:55 +03:00
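The local_dc_filter idea running through this series can be sketched as follows. All names here are simplified stand-ins for the actual locator::topology API, which the commits above are the authority on.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// A toy topology: it knows each endpoint's datacenter and exposes a
// predicate that callers use instead of consulting a global snitch.
struct topology {
    std::unordered_map<std::string, std::string> dc_of;  // endpoint -> dc
    std::string local_dc;

    // Predicate selecting endpoints in the local datacenter.
    std::function<bool(const std::string&)> local_dc_filter() const {
        return [this](const std::string& ep) {
            auto it = dc_of.find(ep);
            return it != dc_of.end() && it->second == local_dc;
        };
    }

    // Counterpart of the count_local_endpoints() helper moved into topology.
    std::size_t local_endpoints_count(const std::vector<std::string>& eps) const {
        return std::count_if(eps.begin(), eps.end(), local_dc_filter());
    }
};
```

The design point of the series is visible even in the toy: once the predicate lives on topology, any code holding a token-metadata (and hence topology) reference can filter by DC without reaching for a global.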
Anna Stuchlik
4bc7833a0b doc: update the link to CQL3 type mapping on GitHub
Closes #11224
2022-08-05 13:21:29 +03:00
Pavel Emelyanov
c3718b7a6e consistency_level: Remove is_local() and count_local_endpoints()
No code uses them now -- switched to use topology -- so these two can be
dropped, together with their calls to the global snitch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:48 +03:00
Pavel Emelyanov
9c662ee0e5 storage_proxy: Use topology::local_endpoints_count()
A continuation of the previous patches -- now all the code that needs
this helper has a proxy pointer at hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:48 +03:00
Pavel Emelyanov
9a50d318b6 storage_proxy: Use proxy's topology for DC checks
Several proxy helper classes need to filter endpoints by datacenter.
Since they now have a shared_ptr<proxy> on board, they can get the
topology via the proxy's token metadata.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:48 +03:00
Pavel Emelyanov
183a2d5a83 storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver
It will be needed to get token metadata from the proxy. The resolver in
question is created and maintained by abstract_read_executor, which
already has a shared_ptr<proxy>, so it just hands over a copy.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:48 +03:00
Pavel Emelyanov
e1ea801b67 storage_proxy: Use topology local_dc_filter in its methods
The proxy has a token metadata pointer, so it can use its topology
reference to filter endpoints by datacenter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:47 +03:00