Commit Graph

328 Commits

Author SHA1 Message Date
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Avi Kivity
fdc1449392 treewide: rename flat_mutation_reader_v2 to mutation_reader
flat_mutation_reader_v2 was introduced in a pair of commits in 2021:

  e3309322c3 "Clone flat_mutation_reader related classes into v2 variants"
  08b5773c12 "Adapt flat_mutation_reader_v2 to the new version of the API"

as a replacement for flat_mutation_reader, using range_tombstone_change
instead of range_tombstone to represent represent range tombstones. See
those commits for more information.

The transition was incremental; the last use of the original
flat_mutation_reader was removed in 2022 in commit

  026f8cc1e7 "db: Use mutation_partition_v2 in mvcc"

In turn, flat_mutation_reader was introduced in 2017 in commit

  748205ca75 "Introduce flat_mutation_reader"

To transition from a mutation_reader that nested rows within
a partition in a separate stream, to a flat reader that streamed
partitions and rows in the same stream.

Here, we reclaim the original name and rename the awkward
flat_mutation_reader_v2 to mutation_reader.

Note that mutation_fragment_v2 remains since we still use the original
for compatibilty, sometimes.

Some notes about the transition:

 - files were also renamed. In one case (flat_mutation_reader_test.cc), the
   rename target already existed, so we rename to
    mutation_reader_another_test.cc.

 - a namespace 'mutation_reader' with two definitions existed (in
   mutation_reader_fwd.hh). Its contents was folded into the mutation_reader
   class. As a result, a few #includes had to be adjusted.

Closes scylladb/scylladb#19356
2024-06-21 07:12:06 +03:00
Pavel Emelyanov
8906126a2c stream_manager: Remove system_distributed_keyspace and view_update_generator
Now all the code is happy with view_builder and can be shortened

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:41:56 +03:00
Pavel Emelyanov
ae2dcdc7c2 streaming: Remove system_distributed_keyspace and view_update_generator
Now all the code is happy with view_builder and can be shortened

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:41:55 +03:00
Pavel Emelyanov
57517d5987 streaming: Proparage view_builder& down to make_streaming_consumer()
Continuation of the previous patch. Repair itself doesn't need it, but
streaming consumer does.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:41:46 +03:00
Tomasz Grabiec
fdcaaea91a tablets, streaming: Implement tablet streaming for intra-node migration 2024-05-16 00:28:46 +02:00
Pavel Emelyanov
1f44a374b8 error_injection: Overload inject() instead of inject_with_handler()
The inject_with_handler() method accepts a coroutine that can be called
wiht injection_handler. With such function as an argument, there's no
need in distinctive inject_with_handler() name for a method, it can be
overload of all the existing inject()-s

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 19:30:19 +03:00
Asias He
83a28342ea service: Drop unused table param from session_topology_guard
The table param is not used. Dropping it so it can be used in places
where the table object is not available.

Closes scylladb/scylladb#17628
2024-03-07 09:34:40 +02:00
Aleksandra Martyniuk
2ea5d9b623 test: test drop table on receiver side during streaming 2024-02-15 12:06:47 +01:00
Asias He
e7e1f4b01a streaming: Fix rpc::source and rpc::optional parameter order
The new rpc::optional parameter must come after any existing parameters,
including the rpc::source parameters, otherwise it will break
compatibility.

The regression was introduced in:

```
commit fd3c089ccc
Author: Tomasz Grabiec <tgrabiec@scylladb.com>
Date:   Thu Oct 26 00:35:19 2023 +0200

    service: range_streamer: Propagate topology_guard to receivers
```

We need to backport this patch ASAP before we release anything that
contains commit fd3c089ccc.

Refs: #16941
Fixes: #17175

Closes scylladb/scylladb#17176
2024-02-06 13:15:28 +01:00
Avi Kivity
7cb1c10fed treewide: replace seastar::future::get0() with seastar::future::get()
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.
2024-02-02 22:12:57 +08:00
Kefu Chai
f86a5ae87a streaming: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16947
2024-01-23 19:38:30 +02:00
Avi Kivity
9c0f05efa1 Merge 'Track tablet streaming under global sessions to prevent side-effects of failed streaming' from Tomasz Grabiec
Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later.

This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted.

The tracking and fencing is implemented by using global "sessions", created for streaming of a single tablet. Session is globally identified by UUID. The identifier is assigned by the topology change coordinator, and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained.

The barrier is blocked only if there is some session with work which was left behind by unsuccessful streaming. In which case it should not be blocked for long, because streaming process checks often if the guard was left behind and stops if it was.

This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas.

Closes scylladb/scylladb#15847

* github.com:scylladb/scylladb:
  test: tablets: Add test for failed streaming being fenced away
  error_injection: Introduce poll_for_message()
  error_injection: Make is_enabled() public
  api: Add API to kill connection to a particular host
  range_streamer: Do not block topology change barriers around streaming
  range_streamer, tablets: Do not keep token metadata around streaming
  tablets: Fail gracefully when migrating tablet has no pending replica
  storage_service, api: Add API to disable tablet balancing
  storage_service, api: Add API to migrate a tablet
  storage_service, raft topology: Run streaming under session topology guard
  storage_service, tablets: Use session to guard tablet streaming
  tablets: Add per-tablet session id field to tablet metadata
  service: range_streamer: Propagate topology_guard to receivers
  streaming: Always close the rpc::sink
  storage_service: Introduce concept of a topology_guard
  storage_service: Introduce session concept
  tablets: Fix topology_metadata_guard holding on to the old erm
  docs: Document the topology_guard mechanism
2023-12-07 16:29:02 +02:00
Tomasz Grabiec
7d0f4c10a2 test: tablets: Add test for failed streaming being fenced away 2023-12-06 18:37:01 +01:00
Tomasz Grabiec
9dac0febce range_streamer: Do not block topology change barriers around streaming
Streaming was keeping effective_replication_map_ptr around the whole
process, which blocks topology change barriers.

This will inhibit progress of tablet load balancer or concurrent
migrations, resulting in worse performance.

Fix by switching to the most recent erm on sharder
calls. multishard_writer calls shard_of() for each new partition.

A better way would be to switch immediately when topology version
changes, but this is left for later.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
fd3c089ccc service: range_streamer: Propagate topology_guard to receivers 2023-12-06 18:36:16 +01:00
Tomasz Grabiec
063095ea50 streaming: Always close the rpc::sink
rpc::sink::~sink aborts if not closed. There is a try/catch clause
which ensures that close() is called, but there was code after sink is
created which is not covered by it. Move sink construction past that
code.
2023-12-06 18:35:41 +01:00
Benny Halevy
6f7de427f0 streaming: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 16:12:11 +02:00
Kamil Braun
819f542ee6 migration_manager: take abort_source& in get_schema_for_read/write
No callsite needed the `nullptr` case, so we can convert pointer to
reference.
2023-11-23 17:23:47 +01:00
Asias He
2b2302d373 streaming: Ignore dropped table on both sides
It is possible the sender and receiver of streaming nodes have different
views on if a table is dropped or not.

For example:
- n1, n2 and n3 in the cluster

- n4 started to join the cluster and stream data from n1, n2, n3

- a table was dropped

- n4 failed to write data from n2 to sstable because a table was dropped

- n4 ended the streaming

- n2 checked if the table was present and would ignore the error if the table was dropped

- however n2 found the table was still present and was not dropped

- n2 marked the streaming as failed

This will fail the streaming when a table is dropped. We want streaming to
ignore such dropped tables.

In this patch, a status code is sent back to the sender to notify the
table is dropped so the sender could ignore the dropped table.

Fixes #15370

Closes scylladb/scylladb#15912
2023-11-03 13:38:48 +02:00
Pavel Emelyanov
b974d8ca1b stream_session: Do not print banign exceptions with error level
Handler of STREAM_MUTATION_FRAGMENTS verb creates and starts reader. The
resulting future is then checked for being exceptional and an error
message is printed in logs.

However, if reader fails because of socket being closed by peer, the
error looks excessive. In that case the exception is just regular
handling of the socket/stream closure and can be demoted down to debug
level.

fixes: #15891

Similar cherry-picking of log level exists in e.g. storage proxy, see
for example 56bd9b5d (service: storage_proxy: do not report abort
    requests in handle_write )

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15892
2023-10-31 14:21:22 +02:00
Botond Dénes
4a02865ea1 Merge 'Prevent invalidation of iterators over database::_column_families' from Aleksandra Martyniuk
Maps related to column families in database are extracted
to a column_families_data class. Access to them is possible only
through methods. All methods which may preempt hold rwlock
in relevant mode, so that the iterators can't become invalid.

Fixes: #13290

Closes #13349

* github.com:scylladb/scylladb:
  replica: make tables_metadata's attributes private
  replica: add methods to get a filtered copy of tables map
  replica: add methods to check if given table exists
  replica: add methods to get table or table id
  replica: api: return table_id instead of const table_id&
  replica: iterate safely over tables related maps
  replica: pass tables_metadata to phased_barrier_top_10_counts
  replica: add methods to safely add and remove table
  replica: wrap column families related maps into tables_metadata
  replica: futurize database::add_column_family and database::remove
2023-07-31 15:31:59 +03:00
Tomasz Grabiec
f88220aeee stream_transfer_task, multishard_writer: Work with table sharder
So that we can use it on tablet-based tables.
2023-07-25 21:08:51 +02:00
Aleksandra Martyniuk
cdbfa0b2f5 replica: iterate safely over tables related maps
Loops over _column_families and _ks_cf_to_uuid which may preempt
are protected by reader mode of rwlock so that iterators won't
get invalid.
2023-07-25 17:13:04 +02:00
Aleksandra Martyniuk
52afd9d42d replica: wrap column families related maps into tables_metadata
As a preparation for ensuring access safety for column families
related maps, add tables_metadata, access to members of which
would be protected by rwlock.
2023-07-25 16:13:00 +02:00
Pavel Emelyanov
66e43912d6 code: Switch to seastar API level 7
In that level no io_priority_class-es exist. Instead, all the IO happens
in the context of current sched-group. File API no longer accepts prio
class argument (and makes io_intent arg mandatory to impls).

So the change consists of
- removing all usage of io_priority_class
- patching file_impl's inheritants to updated API
- priority manager goes away altogether
- IO bandwidth update is performed on respective sched group
- tune-up scylla-gdb.py io_queues command

The first change is huge and was made semi-autimatically by:
- grep io_priority_class | default_priority_class
- remove all calls, found methods' args and class' fields

Patching file_impl-s is smaller, but also mechanical:
- replace io_priority_class& argument with io_intent* one
- pass intent to lower file (if applicatble)

Dropping the priority manager is:
- git-rm .cc and .hh
- sed out all the #include-s
- fix configure.py and cmakefile

The scylla-gdb.py update is a bit hairry -- it needs to use task queues
list for IO classes names and shares, but to detect it should it checks
for the "commitlog" group is present.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13963
2023-06-06 13:29:16 +03:00
Avi Kivity
26c8470f65 treewide: use #include <seastar/...> for seastar headers
We treat Seastar as an external library, so fix the few places
that didn't do so to use angle brackets.

Closes #14037
2023-06-06 08:36:09 +03:00
Gleb Natapov
7caf1d26fb migration manager: Make schema pull abortable.
Now which schema pull may issues raft read barrier it may stuck if
majority is not available. Make the operation abortable and abort it
during queries if timeout is reached.
2023-05-11 16:31:23 +03:00
Botond Dénes
156e5d346d reader_permit: keep trace_state pointer on permit
And propagate it down to where it is created. This will be used to add
trace points for semaphore related events, but this will come in the
next patches.
2023-03-22 04:58:01 -04:00
Benny Halevy
1392c7e1cf streaming: stream_session: maybe_yield
To prevent reactor stalls when freeing many/long
token range vectors.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 17:32:44 +02:00
Benny Halevy
c4836ab9e9 streaming: stream_session: prepare: move token ranges to add_transfer_ranges
Reduce copies on the path to calling add_transfer_ranges.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 17:04:47 +02:00
Pavel Emelyanov
f51762c72a headers: Refine view_update_generator.hh and around
The initial intent was to reduce the fanout of shared_sstable.hh through
v.u.g.hh -> cql_test_env.hh chain, but it also resulted in some shots
around v.u.g.hh -> database.hh inclusion.

By and large:
- v.u.g.hh doesn't need database.hh
- cql_test_env.hh doesn't need v.u.g.hh (and thus -- the
  shared_sstable.hh) but needs database.hh instead
- few other .cc files need v.u.g.hh directly as they pulled it via
  cql_test_env.hh before
- add forward declarations in few other places

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12952
2023-02-22 09:32:30 +02:00
Kefu Chai
0cb842797a treewide: do not define/capture unused variables
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-15 22:57:18 +02:00
Avi Kivity
69a385fd9d Introduce schema/ module
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.

Closes #12858
2023-02-15 11:01:50 +02:00
Avi Kivity
c5e4bf51bd Introduce mutation/ module
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.

mutation_reader remains in the readers/ module.

mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.

This is a step forward towards librarization or modularization of the
source base.

Closes #12788
2023-02-14 11:19:03 +02:00
Asias He
b9e5e340aa streaming: Enable offstrategy for all classic streaming based node ops
This patch enables offstrategy compaction for all classic streaming
based node ops. We can use this method because tables are streamed one
after another. As long as there is still streamed data for a given
table, we update the automatic trigger timer. When all the streaming has
finished, the trigger timer will timeout and fire the offstrategy
compaction for the given table.

I checked with this patch, rebuild is 3X faster. There was no compaction
in the middle of the streaming. The streamed sstables are compacted
together after streaming is done.

Time Before:
INFO  2022-11-25 10:06:08,213 [shard 0] range_streamer - Rebuild
succeeded, took 67 seconds, nr_ranges_remaining=0

Time After:
INFO  2022-11-25 09:42:50,943 [shard 0] range_streamer - Rebuild
succeeded, took 23 seconds, nr_ranges_remaining=0

Compaciton Before:
88 sstables were written -> 88 sstables were added into main set

Compaction After:
88 sstables written ->  after offstretegy 2 sstables were added into main seet

Closes #11848
2022-12-28 11:12:02 +02:00
Benny Halevy
314e45d957 streaming: define plan_id as a strong tagged_uuid type
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-22 19:45:30 +03:00
Benny Halevy
2b017ce285 schema, everywhere: define and use table_schema_version as a strong type
Define table_schema_version as a distinct tagged_uuid class,
So it can be differentiated from other uuid-class types,
in particular table_id.

Added reversed(table_schema_version) for convenience
and uniformity since the same logic is currently open coded
in several places.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:45 +03:00
Benny Halevy
257d74bb34 schema, everywhere: define and use table_id as a strong type
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.

Fixes #11207

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:41 +03:00
Asias He
953af38281 streaming: Allow drop table during streaming
Currently, if a table is dropped during streaming, the streaming would
fail with no_such_column_family error.

Since the table is dropped anyway, it makes more sense to ignore the
streaming result of the dropped table, whether it is successful or
failed.

This allows users to drop tables during node operations, e.g., bootstrap
or decommission a node.

This is especially useful for the cloud users where it is hard to
coordinate between a node operation by admin and user cql change.

This patch also fixes a possible user after free issue by not passing
the table reference object around.

Fixes #10395

Closes #10396
2022-04-24 17:43:20 +03:00
Botond Dénes
b029bd3db7 tree: remove mutation_reader.hh include
In most files it was unused. We should move these to the patch which
moved out the last interesting reader from mutation_reader.hh (and added
the corresponding new header include) but its probably not worth the
effort.
Some other files still relied on mutation_reader.hh to provide reader
concurrency semaphore and some other misc reader related definitions.
2022-03-30 15:42:51 +03:00
Botond Dénes
fcf15fda94 readers: generating_reader: use noncopyable_function<>
std::function<> requires the functor it wraps to be copyable, which is
an unnecessarily strict requirement. To relax this, we use
noncopyable_function<> instead. Since the former seems to lack some
disambiguation magic of the latter, we add `_v1` and `_v2` postfixes to
manually disambiguate.
2022-03-17 06:53:44 +02:00
Botond Dénes
35bbd54946 readers: merge generating.hh into generating_v2.hh
Both variants return a v2 reader and we are going to keep both for a
time to come.
2022-03-17 06:52:28 +02:00
Botond Dénes
7844ff9912 readers/generating.hh: return v2 reader from make_generating_reader()
For now, the (v1) reader is just upgraded to v2 behind the scenes.
2022-03-17 06:51:20 +02:00
Mikołaj Sielużycki
1d84a254c0 flat_mutation_reader: Split readers by file and remove unnecessary includes.
The flat_mutation_reader files were conflated and contained multiple
readers, which were not strictly necessary. Splitting optimizes both
iterative compilation times, as touching rarely used readers doesn't
recompile large chunks of codebase. Total compilation times are also
improved, as the size of flat_mutation_reader.hh and
flat_mutation_reader_v2.hh have been reduced and those files are
included by many file in the codebase.

With changes

real	29m14.051s
user	168m39.071s
sys	5m13.443s

Without changes

real	30m36.203s
user	175m43.354s
sys	5m26.376s

Closes #10194
2022-03-14 13:20:25 +02:00
Botond Dénes
ad1b157452 streaming/consumer: convert to v2
At least on the API level, internally there are still conversions, but
these are going to be sorted out in the next patches too.
2022-03-02 09:55:09 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Raphael S. Carvalho
426450dc04 treewide: remove useless include of database.hh
Wrote a script based on cpp-include to find places that needlessly
included database.hh, which is expensive to process during
build time.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220104204359.168895-1-raphaelsc@scylladb.com>
2022-01-05 10:15:19 +02:00
Pavel Emelyanov
a3b4d4d3cf stream_session: Use manager reference from result-future
When the stream_session initializes it's being equipped with
the shared-pointer on the stream_result_future very early. In
all the places where stream_session needs the manager this
pointer is alive and session get get manager from it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:17:37 +03:00