Commit Graph

292 Commits

Author SHA1 Message Date
Benny Halevy
314e45d957 streaming: define plan_id as a strong tagged_uuid type
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-22 19:45:30 +03:00
Benny Halevy
2b017ce285 schema, everywhere: define and use table_schema_version as a strong type
Define table_schema_version as a distinct tagged_uuid class,
So it can be differentiated from other uuid-class types,
in particular table_id.

Added reversed(table_schema_version) for convenience
and uniformity since the same logic is currently open coded
in several places.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:45 +03:00
Benny Halevy
257d74bb34 schema, everywhere: define and use table_id as a strong type
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.

Fixes #11207

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:41 +03:00
Asias He
953af38281 streaming: Allow drop table during streaming
Currently, if a table is dropped during streaming, the streaming would
fail with no_such_column_family error.

Since the table is dropped anyway, it makes more sense to ignore the
streaming result of the dropped table, whether it is successful or
failed.

This allows users to drop tables during node operations, e.g., bootstrap
or decommission a node.

This is especially useful for the cloud users where it is hard to
coordinate between a node operation by admin and user cql change.

This patch also fixes a possible user after free issue by not passing
the table reference object around.

Fixes #10395

Closes #10396
2022-04-24 17:43:20 +03:00
Botond Dénes
b029bd3db7 tree: remove mutation_reader.hh include
In most files it was unused. We should move these to the patch which
moved out the last interesting reader from mutation_reader.hh (and added
the corresponding new header include) but its probably not worth the
effort.
Some other files still relied on mutation_reader.hh to provide reader
concurrency semaphore and some other misc reader related definitions.
2022-03-30 15:42:51 +03:00
Botond Dénes
fcf15fda94 readers: generating_reader: use noncopyable_function<>
std::function<> requires the functor it wraps to be copyable, which is
an unnecessarily strict requirement. To relax this, we use
noncopyable_function<> instead. Since the former seems to lack some
disambiguation magic of the latter, we add `_v1` and `_v2` postfixes to
manually disambiguate.
2022-03-17 06:53:44 +02:00
Botond Dénes
35bbd54946 readers: merge generating.hh into generating_v2.hh
Both variants return a v2 reader and we are going to keep both for a
time to come.
2022-03-17 06:52:28 +02:00
Botond Dénes
7844ff9912 readers/generating.hh: return v2 reader from make_generating_reader()
For now, the (v1) reader is just upgraded to v2 behind the scenes.
2022-03-17 06:51:20 +02:00
Mikołaj Sielużycki
1d84a254c0 flat_mutation_reader: Split readers by file and remove unnecessary includes.
The flat_mutation_reader files were conflated and contained multiple
readers, which were not strictly necessary. Splitting optimizes both
iterative compilation times, as touching rarely used readers doesn't
recompile large chunks of codebase. Total compilation times are also
improved, as the size of flat_mutation_reader.hh and
flat_mutation_reader_v2.hh have been reduced and those files are
included by many file in the codebase.

With changes

real	29m14.051s
user	168m39.071s
sys	5m13.443s

Without changes

real	30m36.203s
user	175m43.354s
sys	5m26.376s

Closes #10194
2022-03-14 13:20:25 +02:00
Botond Dénes
ad1b157452 streaming/consumer: convert to v2
At least on the API level, internally there are still conversions, but
these are going to be sorted out in the next patches too.
2022-03-02 09:55:09 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Raphael S. Carvalho
426450dc04 treewide: remove useless include of database.hh
Wrote a script based on cpp-include to find places that needlessly
included database.hh, which is expensive to process during
build time.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220104204359.168895-1-raphaelsc@scylladb.com>
2022-01-05 10:15:19 +02:00
Pavel Emelyanov
a3b4d4d3cf stream_session: Use manager reference from result-future
When the stream_session initializes it's being equipped with
the shared-pointer on the stream_result_future very early. In
all the places where stream_session needs the manager this
pointer is alive and session get get manager from it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:17:37 +03:00
Pavel Emelyanov
56f5327450 stream_session: Capture container() in message handler
The stream_mutation_fragments handler need to access the manager. Since
the handler is registered by the manager itself, it can capture the
local manager reference and use container() where appropriate.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:17:37 +03:00
Pavel Emelyanov
db33607eb2 stream_session: Keep stream_manager reference
The manager is needed to get messaging service and database from.
Actually, the database can be pushed though arguments in all the
places, so effectively session only needs the messaging. However,
the stream-task's need the manager badly and there's no other
place to get it from other than the session.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:17:37 +03:00
Pavel Emelyanov
f2ae080c63 stream_session: Remove unused default contructor
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:17:37 +03:00
Pavel Emelyanov
5b748a72de stream_result_future: Keep stream_manager reference
The stream_result_future needs manager to register on it and to
unregister from it. Also the result-future is referenced from
stream_session that also needs the manager (see next patches).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:17:37 +03:00
Pavel Emelyanov
8ab96a8362 streaming: Move get_session into stream_manager
This makes the code a bit shorter and helps removing one more call
for global stream manager.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:15:59 +03:00
Pavel Emelyanov
228b4520a6 streaming: Use container.invoke_on in rpc handlers
This will help to reduce the usage of global manager instance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:15:59 +03:00
Pavel Emelyanov
73e10c7aed streaming: Move start/stop onto common rails
In case of streaming this mostly means dropping the global
init/uninit calls and replacing them with sharded<stream_manager>
instance. It's still global, but it's being fixed atm.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:15:58 +03:00
Pavel Emelyanov
ba298bd5c6 streaming: Remove global dependency pointers
Now they are not needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:15:58 +03:00
Pavel Emelyanov
6d7eb76fad streaming: Use get_stream_manager to get dependencies
Currently streaming uses global pointers to save and get a
dependency. Now all the dependencies live on the manager,
this patch changes all the places in streaming/ to get the
needed dependencies from it, not from global pointer (next
patch will remove those globals).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:15:58 +03:00
Pavel Emelyanov
e448774588 streaming: Move rpc verbs reg/unreg into manager
As a part of streaming start/stop unification.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:15:58 +03:00
Pavel Emelyanov
165971fb7f streaming: Initialize stream manager with proper deps
The stream manager is going to become central point of control
for the streaming subsys. This patch makes its dependencies
explicit and prepares the gound for further patching.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:15:58 +03:00
Avi Kivity
1b492396c1 stream_session.cc: trim unneeded includes
stream_session.cc doesn't need storage_proxy, or sstables, or the
system keyspace. Remove them.

Closes #9230
2021-08-23 10:57:04 +03:00
Botond Dénes
5293bd21cf streaming/stream_session: use database::obtain_reader_permit() 2021-07-14 16:48:43 +03:00
Asias He
5c9816615f streaming: Enable off-strategy compaction for bootstrap and replace
The off-strategy compaction is now enabled for repair based node
operations. It is not bound to repair based node operations though. It
makes sense to enable it for streaming based node operations too.

Fixes #8820

Closes #8821
2021-06-08 12:13:20 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Pavel Emelyanov
0944d69475 repair, streaming: Generalize consumer lambdas
Both streaming and repair call the distributed sstables writing with
equal lambdas each being ~30 lines of code. The only difference between
them is repair might request offstrategy compaction for new sstable.

Generalization of these two pieces save lines of codes and speeds the
release/repair/row_level.o compilation by half a minute (out of twelve).

tests: unit(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210531133113.23003-1-xemul@scylladb.com>
2021-06-06 09:21:23 +03:00
Pavel Emelyanov
6b31c47a75 migration_manager: Make get_schema_for_... methods
These two helpers are now namespace-scoped methods, but both
need the migration manager instance inside. All their callers
are now patched to have the migration manager at hands, so
the helpers can be turned into methods.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Pavel Emelyanov
e0ca3ccc1c streaming: Keep migration_manager ptr in rpc lambdas
Same as previous patch, but for streaming.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Pavel Emelyanov
423d0baa65 streaming: Get migration_manager shared_ptr in messaging
Same as in previous patch, but for streaming code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Benny Halevy
830128cd95 streaming: stream_session: do not log err.c_str verbatim
It is dangerous to print a formatted string as is, like
sslog.warn(err.c_str()) since it might hold curly braces ('{}')
and those require respective runtime args.

Instead, it should be logged as e.g. sslog.warn("{}", err.c_str()).

This will prevent issues like #8436.

Refs #8436

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210408173048.124417-2-bhalevy@scylladb.com>
2021-04-09 08:36:49 +03:00
Benny Halevy
76cd315c42 streaming: stream_session: do not escape curly braces in format strings
Those turn into '{}' in the formatted strings and trigger
a logger error in the following sstlog.warn(err.c_str())
call.

Fixes #8436

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210408173048.124417-1-bhalevy@scylladb.com>
2021-04-09 08:36:49 +03:00
Avi Kivity
82c76832df treewide: don't include "db/system_distributed_keyspace.hh" from headers
This just causes unneeded and slower recompliations. Instead replace
with forward declarations, or includes of smaller headers that were
incidentally brought in by the one removed. The .cc files that really
need it gain the include, but they are few.

Ref #1.

Closes #8403
2021-04-04 14:00:26 +03:00
Benny Halevy
d01e7e7b58 stream_session: prepare: fix missing string format argument
As seen in
mv_populating_from_existing_data_during_node_decommission_test dtest:
```
ERROR 2021-02-11 06:01:32,804 [shard 0] stream_session - failed to log message: fmt::v7::format_error (argument not found)
```

Fixes #8067

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210211100158.543952-1-bhalevy@scylladb.com>
2021-02-11 12:05:32 +02:00
Benny Halevy
22f6023ac3 sstables: sstable_writer_config: add origin member
Add a string describing where the sstables originated
from (e.g. memtable, repair, streaming, compaction, etc.)

If configure_writer is called with a nullptr, the origin
will be equal to an empty string.

Introduce test_env_sstables_manager that provides an overload
of configure_writer with no parmeters that calls the base-class'
configure_writer with "test" origin.  This was to reduce the
code churn in this patch and to keep the tests simple.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-02-01 16:45:52 +02:00
Avi Kivity
5d45662804 database, streaming: remove remnants of memtable-base streaming
Commit e5be3352cf ("database, streaming, messaging: drop
streaming memtables") removed streaming memtables; this removes
the mechanisms to synchronize them: _streaming_flush_gate and
_streaming_flush_phaser. The memory manager for streaming is removed,
and its 10% reserve is evenly distributed between memtables and
general use (e.g. cache).

Note that _streaming_flush_phaser and _streaming_flush_date are
no longer used to syncrhonize anything - the gate is only used
to protect the phaser, and the phaser isn't used for anything.

Closes #7454
2020-11-16 14:32:19 +01:00
Botond Dénes
ff623e70b3 reader_concurrency_semaphore: name permits
Require a schema and an operation name to be given to each permit when
created. The schema is of the table the read is executed against, and
the operation name, which is some name identifying the operation the
permit is part of. Ideally this should be different for each site the
permit is created at, to be able to discern not only different kind of
reads, but different code paths the read took.

As not all read can be associated with one schema, the schema is allowed
to be null.

The name will be used for debugging purposes, both for coredump
debugging and runtime logging of permit-related diagnostics.
2020-10-13 12:32:13 +03:00
Botond Dénes
6ca0464af5 mutation_fragment: add schema and permit
We want to start tracking the memory consumption of mutation fragments.
For this we need schema and permit during construction, and on each
modification, so the memory consumption can be recalculated and pass to
the permit.

In this patch we just add the new parameters and go through the insane
churn of updating all call sites. They will be used in the next patch.
2020-09-28 11:27:23 +03:00
Botond Dénes
3fab83b3a1 flat_mutation_reader: impl: add reader_permit parameter
Not used yet, this patch does all the churn of propagating a permit
to each impl.

In the next patch we will use it to track to track the memory
consumption of `_buffer`.
2020-09-28 10:53:48 +03:00
Pavel Emelyanov
812eed27fe code: Force formatting of pointer in .debug and .trace
... and tests. Printin a pointer in logs is considered to be a bad practice,
so the proposal is to keep this explicit (with fmt::ptr) and allow it for
.debug and .trace cases.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 20:44:11 +03:00
Pavel Emelyanov
78f2193956 streaming: Do not reveal raw pointer in info message
Showing raw pointer values in logs is not considered to be good
practice. However, for debugging/tracing this might be helpful.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 20:44:11 +03:00
Pavel Emelyanov
24eaf827c0 migration_manager: Add messaging service as argument to get_schema_definition
There are 4 places that call this helper:

- storage proxy. Callers are rpc verb handlers and already have the proxy
  at hands from which they can get the messaging service instance
- repair. There's local-global messaging instance at hands, and the caller
  is in verb handler too
- streaming. The caller is verb handler, which is unregistered on stop, so
  the messaging service instance can be captured
- migration manager itself. The caller already uses "this", so the messaging
  service instance can be get from it

The better approach would be to make get_schema_definition be the method of
migration_manager, but the manager is stopped for real on shutdown, thus
referencing it from the callers might not be safe and needs revisiting. At
the same time the messaging service is always alive, so using its reference
is safe.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-19 20:50:53 +03:00
Pavel Emelyanov
a6888e3ce3 streaming: Keep reference on messaging
Streaming uses messaging, init it with itw own reference.

Nowadays the whole streaming subsystem uses global static references on the
needed services.  This is not nice, but still better than just using code-wide
globals, so treat the messaging service here the same way.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-19 20:50:52 +03:00
Pavel Emelyanov
163d615dc3 streaming: Use local ms() on ::start
This is just a cleanup to avoid explicit global call.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-19 20:50:52 +03:00
Botond Dénes
fe127a2155 sstables: clamp estimated_partitions to [1, +inf) in writers
In some cases estimated number of partitions can be 0, which is albeit a
legit estimation result, breaks many low-level sstable writer code, so
some of these have assertions to ensure estimated partitions is > 0.
To avoid hitting this assert all users of the sstable writers do the
clamping, to ensure estimated partitions is at least 1. However leaving
this to the callers is error prone as #6913 has shown it. As this
clamping is standard practice, it is better to do it in the writers
themselves, avoiding this problem altogether. This is exactly what this
patch does. It also adds two unit tests, one that reproduces the crash
in #6913, and another one that ensures all sstable writers are fine with
estimated partitions being 0 now. Call sites previously doing the
clamping are changed to not do it, it is unnecessary now as the writer
does it itself.

Fixes #6913

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200724120227.267184-1-bdenes@scylladb.com>
2020-07-27 09:19:37 +02:00
Pavel Emelyanov
5060063cd6 messaging: Add missing per-service unregistering methods
5 services register handlers in messaging, but not all of them
have clear unregistration methods.

Summary:

migration_manager: everything is in place, no changes
gossiper: ditto
proxy: some verbs unregistration is missing
repair: no unregistration at all
streaming: ditto

This patch adds the needed unregistration methods.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-22 16:34:00 +03:00
Pavel Emelyanov
08e36ca77c streaming: Do not use db->invoke_on_all in vain
The db instance is not needed to initialize messages, so use plain smp::invoke_on_all

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-22 16:31:57 +03:00