320 Commits

Author SHA1 Message Date
Avi Kivity
2a76065e3d table, memtable: share log-structured allocator statistics across all memtables in a table
The log-structured allocator collects allocation statistics (which it
uses to manage memory reserves) in some objects kept in
memtable_table_shared_data. Right now, this object is local to memtable_list,
which itself is local to a tablet replica. Move it to table scope so
different tablets in the shard share the statistics. This helps a
newly-migrated tablet adjust more quickly.
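
As a rough illustration of why table-scope sharing helps, here is a minimal sketch. All names (`shared_alloc_stats`, `tablet_memtables`, `table_stats_owner`) are hypothetical and the "statistic" is reduced to a single learned reserve size; the real objects live in memtable_table_shared_data and are more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical stand-in for the statistics the allocator learns over time.
struct shared_alloc_stats {
    std::size_t learned_reserve = 0;
};

// Models the memtables of one tablet replica; it only references the stats.
class tablet_memtables {
    std::shared_ptr<shared_alloc_stats> _stats;
public:
    explicit tablet_memtables(std::shared_ptr<shared_alloc_stats> stats)
        : _stats(std::move(stats)) {
        assert(_stats);
    }
    // Record that an operation needed `bytes` of reserve.
    void observe(std::size_t bytes) {
        if (bytes > _stats->learned_reserve) {
            _stats->learned_reserve = bytes;
        }
    }
    std::size_t reserve_hint() const { return _stats->learned_reserve; }
};

// Models the table: it owns the stats, so every tablet on the shard shares them.
class table_stats_owner {
    std::shared_ptr<shared_alloc_stats> _stats = std::make_shared<shared_alloc_stats>();
public:
    tablet_memtables add_tablet() const { return tablet_memtables(_stats); }
};
```

In this model a tablet added after another one has already observed a 4096-byte reserve starts with `reserve_hint() == 4096` instead of relearning it.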
2023-12-26 21:24:51 +02:00
Avi Kivity
02111d6754 memtable: consolidate _read_section, _allocating_section in a struct
Those two members are passed from memtable_list to memtable. Since we
wish to pass them from table, it becomes awkward to pass them as two
separate variables as their contents are specific to memtable internals.

Wrap them in a name that indicates their role (being table-wide shared
data for memtables) and pass them as a unit.
2023-12-26 21:11:48 +02:00
Raphael S. Carvalho
5e55954f27 replica: Make the storage snapshot survive concurrent compactions
Consider this:
1) file streaming takes storage snapshot = list of sstables
2) concurrent compaction unlinks some of those sstables from the file system
3) file streaming tries to send unlinked sstables, but files other
than data and index cannot be read as only data and index have file
descriptors opened

To fix it, the snapshot now returns a set of files, one per sstable
component, for each sstable.
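
The fix relies on a file staying readable through an open descriptor after its name is unlinked. The sketch below shows that behavior with plain POSIX calls rather than the Seastar file API the real code uses; `read_after_unlink` and the temporary path are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Writes `content` to a temporary file, opens it for reading, unlinks it (as a
// concurrent compaction might), and then reads it back via the open descriptor.
std::string read_after_unlink(const std::string& content) {
    char path[] = "/tmp/sst_component_XXXXXX";
    int wfd = mkstemp(path);
    assert(wfd >= 0);
    ssize_t written = write(wfd, content.data(), content.size());
    assert(written == static_cast<ssize_t>(content.size()));
    close(wfd);

    int rfd = open(path, O_RDONLY);   // the "snapshot" holds this descriptor
    assert(rfd >= 0);
    unlink(path);                     // the name is gone from the file system

    std::string out(content.size(), '\0');
    ssize_t n = read(rfd, out.data(), out.size());   // the data is still readable
    close(rfd);
    out.resize(n > 0 ? static_cast<std::size_t>(n) : 0);
    return out;
}
```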

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#16476
2023-12-21 12:50:28 +02:00
Raphael S. Carvalho
546b31846a replica: Introduce storage group splitting
This introduces the ability to split a storage group.
The main compaction group is split into left and right groups.

set_split() is used to set the storage group to splitting mode, which
will create left and right compaction groups. Incoming writes will
now be placed into memtable of either left or right groups.

split() is used to complete the splitting of a group. It only
returns when all preexisting data is split. That means main
compaction group will be empty and all the data will be stored
in either left or right group.
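
A toy model of the two calls, assuming a single split point and using integer tokens in place of memtables and sstables. `storage_group_model` is a hypothetical name; the real split() is asynchronous and rewrites sstables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy storage group: tokens stand in for data, vectors for compaction groups.
class storage_group_model {
    std::vector<int64_t> _main, _left, _right;
    int64_t _split_point = 0;
    bool _splitting = false;

    void route(int64_t token) {
        (token < _split_point ? _left : _right).push_back(token);
    }
public:
    // Enter splitting mode: new writes go to left or right from now on.
    void set_split(int64_t split_point) {
        _split_point = split_point;
        _splitting = true;
    }
    void write(int64_t token) {
        if (_splitting) {
            route(token);
        } else {
            _main.push_back(token);
        }
    }
    // Complete the split: move all preexisting data out of the main group.
    void split() {
        assert(_splitting);
        for (auto t : _main) {
            route(t);
        }
        _main.clear();
    }
    std::size_t main_size() const { return _main.size(); }
    std::size_t left_size() const { return _left.size(); }
    std::size_t right_size() const { return _right.size(); }
};
```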

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 12:02:01 -03:00
Raphael S. Carvalho
213b2f1382 replica: Rename compaction_group_manager to storage_group_manager
That's to reflect the fact that the manager now works with
storage groups instead.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
15de1cdcbc replica: Introduce concept of storage group
A storage group is the storage of a tablet. This new concept is helpful
for tablet splitting, where the storage of a tablet is split
into multiple compaction groups, each of which can be compacted
independently.

The reason for not going with arena concept is that it added
complexity, and it felt much more elegant to keep compaction
group unchanged which at the end of the day abstracts the concept
of a set of sstables that can be compacted and operated
independently.

When splitting, the storage group for a tablet may therefore own
multiple compaction groups, left, right, and main, where main
keeps the data that needs splitting. When splitting completes,
only left and right compaction groups will be populated.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Benny Halevy
cddcf3ad0c table: add table_holder and hold method
A smart pointer that guards the table object
while it's being used by async functions.
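
The idea can be sketched as an RAII guard that keeps a use count on the table. This is only an illustration: `guarded_table` and its members are hypothetical, and the actual table_holder may well be built on a different mechanism.

```cpp
#include <cassert>
#include <utility>

class guarded_table {
    int _in_use = 0;   // number of live holders
public:
    // Move-only guard: the table counts as in use while a holder is alive.
    class holder {
        guarded_table* _t;
    public:
        explicit holder(guarded_table& t) : _t(&t) { ++_t->_in_use; }
        holder(holder&& o) noexcept : _t(std::exchange(o._t, nullptr)) {}
        holder(const holder&) = delete;
        holder& operator=(const holder&) = delete;
        holder& operator=(holder&&) = delete;
        ~holder() {
            if (_t) {
                assert(_t->_in_use > 0);
                --_t->_in_use;
            }
        }
        guarded_table& operator*() const { return *_t; }
        guarded_table* operator->() const { return _t; }
    };

    holder hold() { return holder(*this); }
    // In this model the table may only go away once no holder is left.
    bool can_be_destroyed() const { return _in_use == 0; }
};
```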

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-12 08:43:49 +02:00
Patryk Jędrzejczak
c8ee7d4499 db: make schema commitlog feature mandatory
Using consistent cluster management without schema commitlog
ends with a bad configuration error thrown during bootstrap. Soon, we
will make consistent cluster management mandatory. This forces us
to also make schema commitlog mandatory, which we do in this patch.

A booting node decides to use schema commitlog if at least one of
the two statements below is true:
- the node has `force_schema_commitlog=true` config,
- the node knows that the cluster supports the `SCHEMA_COMMITLOG`
  cluster feature.

The `SCHEMA_COMMITLOG` cluster feature has been added in version
5.1. This patch is supposed to be a part of version 6.0. We don't
support a direct upgrade from 5.1 to 6.0 because it skips two
versions - 5.2 and 5.4. So, in a supported upgrade we can assume
that the version which we upgrade from has schema commitlog. This
means that we don't need to check the `SCHEMA_COMMITLOG` feature
during an upgrade.

The reasoning above also applies to Scylla Enterprise. Version
2024.2 will be based on 6.0. Probably, we will only support
an upgrade to 2024.2 from 2024.1, which is based on 5.4. But even
if we support an upgrade from 2023.x, this patch won't break
anything because 2023.1 is based on 5.2, which has schema
commitlog. Upgrades from 2022.x definitely won't be supported.

When we populate a new cluster, we can use the
`force_schema_commitlog=true` config to use schema commitlog
unconditionally. Then, the cluster feature check is irrelevant.
This check could fail because we initiate schema commitlog before
we learn about the features. The `force_schema_commitlog=true`
config is especially useful when we want to use consistent cluster
management. Failing feature checks would lead to crashes during
initial bootstraps. Moreover, there is no point in creating a new
cluster with `consistent_cluster_management=true` and
`force_schema_commitlog=false`. It would just cause some initial
bootstraps to fail, and after successful restarts, the result would
be the same as if we used `force_schema_commitlog=true` from the
start.

In conclusion, we can unconditionally use schema commitlog without
any checks in 6.0 because we can always safely upgrade a cluster
and start a new cluster.

Apart from making schema commitlog mandatory, this patch adds two
changes that are its consequences:
- making the unneeded `force_schema_commitlog` config unused,
- deprecating the `SCHEMA_COMMITLOG` feature, which is always
  assumed to be true.

Closes scylladb/scylladb#16254
2023-12-04 21:02:16 +02:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Benny Halevy
66ba983fe0 compaction_manager: flush_all_tables before major compaction
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6ec6

However, tombstone purging may be inhibited by data
in commitlog segments based on `gc_time_min` in the
`tombstone_gc_state` (See f42eb4d1ce).

Flushing all tables in the database releases
all references to commitlog segments and thereby
maximizes the potential for tombstone purging,
which is typically the reason for running major compaction.

However, flushing all tables too frequently might
result in tiny sstables.  Since when flushing all
keyspaces using `nodetool flush` the `force_keyspace_compaction`
api is invoked for each keyspace successively, we need a mechanism
to prevent too frequent flushes by major compaction.

Hence a `compaction_flush_all_tables_before_major_seconds` interval
configuration option is added (defaults to 24 hours).

In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.
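
The interval check described above can be sketched as follows. `flush_all_policy` is a hypothetical helper, not the compaction_manager code; it only models the decision between flushing all tables and falling back to per-table flushes.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

using clock_type = std::chrono::steady_clock;

// Decides whether a major compaction should first flush all tables.
// `interval` models compaction_flush_all_tables_before_major_seconds.
class flush_all_policy {
    std::chrono::seconds _interval;
    std::optional<clock_type::time_point> _last_flush_all;
public:
    explicit flush_all_policy(std::chrono::seconds interval) : _interval(interval) {
        assert(interval.count() >= 0);
    }

    // Returns true (and records `now`) if all tables should be flushed;
    // false means: flush only the tables about to be major-compacted.
    bool should_flush_all(clock_type::time_point now) {
        if (_last_flush_all && now - *_last_flush_all < _interval) {
            return false;
        }
        _last_flush_all = now;
        return true;
    }
};
```

With a 24-hour interval, a second major compaction one hour after the first would skip the global flush in this model.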

Fixes scylladb/scylladb#15777

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
be763bea34 database: add flush_all_tables
Flushes all tables after forcing force_new_active_segment
of the commitlog to make sure all commitlog segments can
get recycled.

Otherwise, due to "false sharing", rarely-written tables
might inhibit recycling of the commitlog segments they reference.

After f42eb4d1ce,
that won't allow compaction to purge some tombstones based on
the min_gc_time.

To be used in the next patch by major compaction.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
1fd85bd37b api: compaction: add flush_memtables option
When flushing is done externally, e.g. by running
`nodetool flush` prior to `nodetool compact`,
flush_memtables=false can be passed to skip flushing
of tables right before they are major-compacted.

This is useful to prevent creation of small sstables
due to excessive memtable flushing.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Kefu Chai
15bfa09454 treewide: do not mark return value const if this has no effect
this change is a cleanup.

marking a value returned by value as `const` has no effect here, so these
`const` specifiers are useless. let's drop them.

and, if we compile the tree with `-Wignored-qualifiers`, the compiler
would warn like:

```
/home/kefu/dev/scylladb/schema/schema.hh:245:5: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
  245 |     const index_metadata_kind kind() const;
      |     ^~~~~
```
so this change also silences the above warnings.
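
a small standalone check of the claim, using hypothetical names (`index_kind`, `kind_with_const`, `kind_plain`). for a non-class return type such as an enum, the call expression has the same type with or without the qualifier. the pragma only keeps the sketch compiling when the warning is promoted to an error.

```cpp
#include <cassert>
#include <type_traits>

enum class index_kind { none, keys };

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wignored-qualifiers"
// The qualifier below is what the warning complains about.
const index_kind kind_with_const() { return index_kind::keys; }
#pragma GCC diagnostic pop

index_kind kind_plain() { return index_kind::keys; }

// For a non-class prvalue the top-level const is dropped from the
// expression type, so both calls have exactly the same type.
static_assert(std::is_same_v<decltype(kind_with_const()), decltype(kind_plain())>,
              "const on a by-value enum return type changes nothing for callers");
```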

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-17 17:46:19 +08:00
Pavel Emelyanov
68cf26587c database: Add get_sstables_manager(bool_class is_system) method
There's one place that does this selection, soon there will appear
another, so it's worth having a convenience helper getter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-08 20:23:16 +03:00
Patryk Jędrzejczak
fbcd667030 replica: keyspace::create_replication_strategy: remove a redundant parameter
The options parameter is redundant. We always use
`_metadata->strategy_options()` and
`keyspace::create_replication_strategy` already assumes that
`_metadata` is set by using its other fields.

Closes scylladb/scylladb#15776
2023-10-20 10:20:49 +03:00
Avi Kivity
7d5e22b43b replica: memtable: don't forget memtable memory allocation statistics
A memtable object contains two logalloc::allocating_section members
that track memory allocation requirements during reads and writes.
Because these are local to the memtable, each time we seal a memtable
and create a new one, these statistics are forgotten. As a result
we may have to re-learn the typical size of reads and writes, incurring
a small performance penalty.

The solution is to move the allocating_section object to the memtable_list
container. The workload is the same across all memtables of the same
table, so we don't lose discrimination here.

The performance penalty may increase later if we start logging changes to
memory reserve thresholds together with a backtrace, so this reduces the
odds of incurring such a penalty.

Closes scylladb/scylladb#15737
2023-10-18 17:43:33 +02:00
Tomasz Grabiec
0aef0f900b Merge 'truncation records refactorings' from Petr Gusev
This PR contains several refactorings related to truncation records handling in the `system_keyspace`, `commitlog_replayer` and `table` classes:
* drop map_reduce from `commitlog_replayer`, it's sufficient to load truncation records from the null shard;
* add a check that `table::_truncated_at` is properly initialized before it's accessed;
* move its initialization after `init_non_system_keyspaces`

Closes scylladb/scylladb#15583

* github.com:scylladb/scylladb:
  system_keyspace: drop truncation_record
  system_keyspace: remove get_truncated_at method
  table: get_truncation_time: check _truncated_at is initialized
  database: add_column_family: initialize truncation_time for new tables
  database: add_column_family: rename readonly parameter to is_new
  system_keyspace: move load_truncation_times into distributed_loader::populate_keyspace
  commitlog_replayer: refactor commitlog_replayer::impl::init
  system_keyspace: drop redundant typedef
  system_keyspace: drop redundant save_truncation_record overload
  table: rename cache_truncation_record -> set_truncation_time
  system_keyspace: get_truncated_position -> get_truncated_positions
2023-10-17 10:55:30 +02:00
Petr Gusev
80fa5810a7 table: get_truncation_time: check _truncated_at is initialized
2023-10-05 15:19:59 +04:00
Petr Gusev
32a19fd61b database: add_column_family: rename readonly parameter to is_new
We want to make table::_truncated_at optional, so that in
get_truncation_time we can assert that it is initialized.
For existing tables this initialisation will happen in
load_truncation_times function, and for new tables we
want to initialize it in add_column_family like we do
with mark_ready_for_writes.

Now add_column_family function has parameter 'readonly', which is
set by the callers to false if we are creating a fresh new table
and not loading it from sstables. In this commit we rename this
parameter to is_new and invert the passed values.
This will allow us in the next commit to initialize _truncated_at field
for new tables.
2023-10-05 15:19:59 +04:00
Raphael S. Carvalho
893ee68251 replica: Clean up storage of tablet on migration
When a tablet is migrated into a new home, we need to clean its
storage (i.e. the compaction group) in the old home.
This includes its presence in row cache, which can be shared by
multiple tablets living in the same shard.

For exception safety, the following is done first in a "prepare
phase" during cache invalidation.

1) take a compaction guard, to stop and disable compaction
2) flush memtable(s).
3) build a list of all sstables, which represents all the
storage of the tablet.

Once the cache is invalidated successfully, we clear
the sstable sets of the group in the "execution phase",
to prevent any background op from incorrectly picking them
and also to allow for their deletion.

All the sstables of a tablet are deleted atomically, in order
to guarantee that a failure midway won't cause data resurrection
if the tablet happens to be migrated back into the old home.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-10-04 12:16:19 -03:00
Petr Gusev
da1e6751e9 table: rename cache_truncation_record -> set_truncation_time
This is a refactoring commit without observable
changes in behaviour.

There is a truncation_record struct, but in this method we
only care about time, so rename it (and other related methods)
appropriately to avoid confusion.
2023-10-03 17:11:35 +04:00
Avi Kivity
a3d73bfba7 Merge 'Add support for decommission with tablets' from Tomasz Grabiec
Load balancer will recognize decommissioning nodes and will
move tablet replicas away from such nodes with highest priority.

Topology changes now have an extra step called "tablet draining" which
calls the load balancer. The step will execute tablet migration track
as long as there are nodes which require draining. It will not do regular
load balancing.

If load balancer is unable to find new tablet replicas, because RF
cannot be met or availability is at risk due to insufficient node
distribution in racks, it will throw an exception. Currently, topology
change will retry in a loop. We should make this error cause topology
change to be aborted. There is no infrastructure for
aborts yet, so this is not implemented.

Closes #15197

* github.com:scylladb/scylladb:
  tablets, raft topology: Add support for decommission with tablets
  tablet_allocator: Compute load sketch lazily
  tablet_allocator: Set node id correctly
  tablet_allocator: Make migration_plan a class
  tablets: Implement cleanup step
  storage_service, tablets: Prevent stale RPCs from running beyond their stage
  locator: Introduce tablet_metadata_guard
  locator, replica: Add a way to wait for table's effective_replication_map change
  storage_service, tablets: Extract do_tablet_operation() from stream_tablet()
  raft topology: Add break in the final case clause
  raft topology: Fix SIGSEGV when trace-level logging is enabled
  raft topology: Set node state in topology
  raft topology: Always set host id in topology
2023-09-14 17:16:23 +03:00
Tomasz Grabiec
d5539e080d tablets: Implement cleanup step
This change adds a stub for tablet cleanup on the replica side and wires
it into the tablet migration process.

The handling on replica side is incomplete because it doesn't remove
the actual data yet. It only flushes the memtables, so that all data
is in sstables and none requires a memtable flush.

This patch is necessary to make decommission work. Otherwise, a
memtable flush would happen when the decommissioned node is put in the
drained state (as in nodetool drain) and it would fail on missing host
id mapping (node is no longer in topology), which is examined by the
tablet sharder when producing sstable sharding metadata, leading to
an abort due to the failed memtable flush.
2023-09-14 12:45:10 +02:00
Petr Gusev
ce0ee32d5a database.cc: make _uses_schema_commitlog optional
This field on the null shard is properly initialized
in maybe_init_schema_commitlog function, until then
we can't make decisions based on its value. This problem
can happen e.g. if add_column_family function is called
with readonly=false before maybe_init_schema_commitlog.
It will call commitlog_for to pass the commitlog to
mark_ready_for_writes and commitlog_for reads _uses_schema_commitlog.

In this commit we add protection against this case - we
trigger internal_error if _uses_schema_commitlog is read
before it is initialized.
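
A minimal sketch of this kind of guard, assuming a thrown `std::logic_error` as a stand-in for the internal error. `database_model` and the `enabled` parameter are hypothetical; the real maybe_init_schema_commitlog derives the value itself.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

class database_model {
    // Disengaged until maybe_init_schema_commitlog() has run.
    std::optional<bool> _uses_schema_commitlog;
public:
    void maybe_init_schema_commitlog(bool enabled) {
        _uses_schema_commitlog = enabled;
    }
    bool uses_schema_commitlog() const {
        if (!_uses_schema_commitlog) {
            // Stand-in for triggering an internal error.
            throw std::logic_error("_uses_schema_commitlog read before initialization");
        }
        return *_uses_schema_commitlog;
    }
};
```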

maybe_init_schema_commitlog() was added to cql_test_env
to make boost tests work with the new invariant.
2023-09-13 23:17:20 +04:00
Petr Gusev
beb29f094b system_keyspace: drop load phases
We want to switch the system.scylla_local table to the
schema commitlog, but load phases get in the way: the schema
commitlog is initialized after phase1,
so a table which uses it should be moved to phase2,
but system.scylla_local contains features, and we need
them before schema commitlog initialization for the
SCHEMA_COMMITLOG feature.

In this commit we are taking a different approach to
loading system tables. First, we load them all in
one pass in 'readonly' mode. In this mode, the table
cannot be written to and has not yet been assigned
a commit log. To achieve this we've added _readonly bool field
to the table class, it's initialized to true in table's
constructor. In addition, we changed the table constructor
to always assign nullptr to commitlog, and we trigger
an internal error if table.commitlog() property is accessed
while the table is in readonly mode. Then, after
triggering on_system_tables_loaded notifications on
feature_service and sstable_format_selector, we call
system_keyspace::mark_writable and eventually
table::mark_ready_for_writes which selects the
proper commitlog and marks the table as writable.

In sstable_compaction_test we drop several
mark_ready_for_writes calls since they are redundant,
the table has already been made writable in
env.make_table_for_tests call.

The table::commitlog function either returns the current
commitlog or causes an error if the table is readonly. This
didn't work for virtual tables, since they never called
mark_ready_for_writes. In this commit we add this
call to initialize_virtual_tables.
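
The readonly life cycle described above can be modeled roughly like this. `readonly_table_model` and `commitlog_model` are hypothetical, and the exception stands in for the internal error.

```cpp
#include <cassert>
#include <stdexcept>

struct commitlog_model {
    const char* name;
};

// A table starts readonly with no commitlog; mark_ready_for_writes()
// assigns the commitlog and makes the table writable.
class readonly_table_model {
    bool _readonly = true;
    commitlog_model* _commitlog = nullptr;
public:
    bool readonly() const { return _readonly; }

    void mark_ready_for_writes(commitlog_model* cl) {
        _commitlog = cl;
        _readonly = false;
    }
    commitlog_model* commitlog() const {
        if (_readonly) {
            throw std::logic_error("commitlog() accessed on a readonly table");
        }
        return _commitlog;
    }
};
```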
2023-09-13 23:17:20 +04:00
Petr Gusev
47ffc66c7f database.hh: add_column_family: add readonly parameter
Previously, creating a table or view in
schema_tables.cc/merge_tables_and_views was a two-step process:
first adding a column family (add_column_family function) and
then marking it as ready for writes (mark_table_as_writable).
There is a yield between these stages, this means
someone could see a table or view for which the
mark_table_as_writable method had not yet been called,
and start writing to it.

This problem was demonstrated by materialised view dtests.
A view is created on all nodes. On some nodes it will be created
earlier than on others and the view rebuild process will start
writing data to that view on other nodes, where mark_table_as_writable
has not yet been called.

In this patch we solve this problem by adding a readonly parameter
to the add_column_family method. When loading tables from disk,
this flag is set to true and the mark_table_as_writable
is called only after all sstables have been loaded.
When creating a new table, this flag is set to false,
mark_table_as_writable is called from inside add_column_family
and the new table becomes visible already as writable.
2023-09-13 23:17:20 +04:00
Tomasz Grabiec
c27d212f4b api, storage_service: Recalculate table digests on relocal_schema api call
Currently, the API call recalculates only per-node schema version. To
work around issues like #4485 we want to recalculate per-table
digests. One way to do that is to restart the node, but that's slow
and has impact on availability.

Use like this:

  curl -X POST http://127.0.0.1:10000/storage_service/relocal_schema

Fixes #15380

Closes #15381
2023-09-13 18:27:57 +03:00
Pavel Emelyanov
c2f2e0fd7a table: Remove find_partition_slow() helper
It's no longer used

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 15:38:41 +03:00
Raphael S. Carvalho
a22f74df00 table: Introduce storage snapshot for upcoming tablet streaming
New file streaming for tablets will require integration with compaction
groups. So this patch introduces a way for streaming to take a storage
snapshot of a given tablet using its token range. Memtable is flushed
first, so all data of a tablet can be streamed through its sstables.
The interface is compaction group / tablet agnostic, but user can
easily pick data from a single tablet by using the range in tablet
metadata for a given tablet.

E.g.:

	auto erm = table.get_effective_replication_map();
	auto& tm = erm->get_token_metadata();
	auto tablet_map = tm.tablets().get_tablet_map(table.schema()->id());

	for (auto tid : tablet_map.tablet_ids()) {
		auto tr = tablet_map.get_token_range(tid);

		auto ssts = co_await table.take_storage_snapshot(tr);

		...
	}

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #15128
2023-08-25 13:06:02 +02:00
Tomasz Grabiec
bd8bb5d4b1 Merge 'Wire tablet into compaction group' from Raphael "Raph" Carvalho
Compaction group is the data plane for tablets, so this integration
allows each tablet to have its own storage (memtable + sstables).
A crucial step for dynamic tablets, where each tablet can be worked
on independently.

There are still some inefficiencies to be worked on, but as it is,
it already unlocks further development.

```
INFO  2023-07-27 22:43:38,331 [shard 0] init - loading tablet metadata
INFO  2023-07-27 22:43:38,333 [shard 0] init - loading non-system sstables
INFO  2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 0 present for ks.cf
INFO  2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 2 present for ks.cf
INFO  2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 4 present for ks.cf
INFO  2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 6 present for ks.cf
INFO  2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 1 present for ks.cf
INFO  2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 3 present for ks.cf
INFO  2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 5 present for ks.cf
INFO  2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 7 present for ks.cf
```

Closes #14863

* github.com:scylladb/scylladb:
  Kill scylla option to configure number of compaction groups
  replica: Wire tablet into compaction group
  token_metadata: Add this_host_id to topology config
  replica: Switch to chunked_vector for storing compaction groups
  replica: Generate group_id for compaction_group on demand
2023-08-18 15:17:17 +02:00
Raphael S. Carvalho
cc60598368 replica: Wire tablet into compaction group
Compaction group is the data plane for tablets, so this integration
allows each tablet to have its own storage (memtable + sstables).
A crucial step for dynamic tablets, where each tablet can be worked
on independently.

There are still some inefficiencies to be worked on, but as it is,
it already unlocks further development.

INFO  2023-07-27 22:43:38,331 [shard 0] init - loading tablet metadata
INFO  2023-07-27 22:43:38,333 [shard 0] init - loading non-system sstables
INFO  2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 0 present for ks.cf
INFO  2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 2 present for ks.cf
INFO  2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 4 present for ks.cf
INFO  2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 6 present for ks.cf
INFO  2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 1 present for ks.cf
INFO  2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 3 present for ks.cf
INFO  2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 5 present for ks.cf
INFO  2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 7 present for ks.cf

There's a need for compaction_group_manager, as table will still support
"tabletless" mode, and we don't want to sprinkle ifs here and there,
to support both modes. It's not really a manager (it's not even supposed
to store a state), but I couldn't find a better name.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-08-16 18:23:53 -03:00
Raphael S. Carvalho
d3f71ae4ee replica: Switch to chunked_vector for storing compaction groups
We aim for a large number of tablets, therefore let's switch
to chunked_vector to avoid large contiguous allocs.
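
For readers unfamiliar with the idea, a minimal chunked container might look like the sketch below. `simple_chunked_vector` is hypothetical and much simpler than the chunked_vector used in the tree; it only shows why growth does not need one large contiguous block.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Elements live in fixed-size chunks, so growing the container never needs one
// large contiguous allocation; only the small table of chunk pointers grows.
template <typename T, std::size_t ChunkSize = 128>
class simple_chunked_vector {
    std::vector<std::unique_ptr<std::vector<T>>> _chunks;
    std::size_t _size = 0;
public:
    void push_back(T value) {
        if (_size % ChunkSize == 0) {
            // The current chunk is full (or there is none yet): start a new one.
            _chunks.push_back(std::make_unique<std::vector<T>>());
            _chunks.back()->reserve(ChunkSize);
        }
        _chunks.back()->push_back(std::move(value));
        ++_size;
    }
    T& operator[](std::size_t i) {
        assert(i < _size);
        return (*_chunks[i / ChunkSize])[i % ChunkSize];
    }
    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }
};
```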

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-08-15 09:04:05 -03:00
Pavel Emelyanov
fdfec474ae table: Make sstables with required state
By default it's created with normal state, but there are some places
that need to put it into staging. Do it with new state enum

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 15:28:54 +03:00
Pavel Emelyanov
6628dc47c5 table: Open-code sstables making streaming helpers
There are two of those that call each other to end up calling plain
make_sstable() one. It's simpler to patch both if they just call the
latter directly.

While at it -- drop the unused default argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:56:02 +03:00
Pavel Emelyanov
fa93ac9bfd database: Add wasm::manager& dependency
The dependency is needed by db::schema_tables to get wasm manager for
its needs. This patch prepares the ground. Now the wasm::manager is
shared between replica::database and cql3::query_processor

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-04 19:47:50 +03:00
Botond Dénes
4d538e1363 Merge 'Task manager tasks covering compaction group compaction' from Aleksandra Martyniuk
All compaction task executors, except for the regular compaction one,
become task manager compaction tasks.

Creating and starting of major_compaction_task_executor is modified
to be consistent with other compaction task executors.

Closes #14505

* github.com:scylladb/scylladb:
  test: extend test_compaction_task.py to cover compaction group tasks
  compaction: turn custom_task_executor into compaction_task_impl
  compaction: turn sstables_task_executor into sstables_compaction_task_impl
  compaction: change sstables compaction tasks type
  compaction: move table_upgrade_sstables_compaction_task_impl
  compaction: pass task_info through sstables compaction
  compaction: turn offstrategy_compaction_task_executor into offstrategy_compaction_task_impl
  compaction: turn cleanup_compaction_task_executor into cleanup_compaction_task_impl
  compaction: use optional task info in major compaction
  compaction: use perform_compaction in compaction_manager::perform_major_compaction
2023-08-04 10:11:00 +03:00
Amnon Heiman
d10a3dd19a config: add enable_node_table_metrics flag
By default, per-table-per-shard metrics reporting is turned off, and the
aggregated version of the metrics (per-table-per-node) will be turned
on.

There could be a situation where a user with an excessive number of
tables would suffer from performance issues, both from the network and
the metrics collection server.

This patch adds a config option, enable_node_table_metrics, which allows
users to turn off per-table metrics reporting altogether.

For example, when running Scylla with the command line argument
'--enable-node-aggregated-table_metrics 0' per-table metrics will not be reported.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2023-08-02 10:20:18 +03:00
Botond Dénes
4a02865ea1 Merge 'Prevent invalidation of iterators over database::_column_families' from Aleksandra Martyniuk
Maps related to column families in database are extracted
to a column_families_data class. Access to them is possible only
through methods. All methods which may preempt hold rwlock
in relevant mode, so that the iterators can't become invalid.
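
A thread-based analogy of this encapsulation, using `std::shared_mutex` in place of the cooperative rwlock the real code holds across preemption points. `tables_registry` and its integer ids are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

class tables_registry {
    mutable std::shared_mutex _lock;
    std::map<int, std::string> _tables;   // table id -> table name
public:
    void add_table(int id, std::string name) {
        std::unique_lock guard(_lock);     // writers are exclusive
        _tables.emplace(id, std::move(name));
    }
    void remove_table(int id) {
        std::unique_lock guard(_lock);
        _tables.erase(id);
    }
    bool contains(int id) const {
        std::shared_lock guard(_lock);
        return _tables.count(id) != 0;
    }
    // Iterate under a shared lock so no writer can invalidate the iterators.
    // The callback must not call add_table() or remove_table().
    template <typename Func>
    void for_each_table(Func f) const {
        std::shared_lock guard(_lock);
        for (const auto& [id, name] : _tables) {
            f(id, name);
        }
    }
};
```

Because the maps are private, callers cannot keep an iterator across a point where another party might add or remove a table.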

Fixes: #13290

Closes #13349

* github.com:scylladb/scylladb:
  replica: make tables_metadata's attributes private
  replica: add methods to get a filtered copy of tables map
  replica: add methods to check if given table exists
  replica: add methods to get table or table id
  replica: api: return table_id instead of const table_id&
  replica: iterate safely over tables related maps
  replica: pass tables_metadata to phased_barrier_top_10_counts
  replica: add methods to safely add and remove table
  replica: wrap column families related maps into tables_metadata
  replica: futurize database::add_column_family and database::remove
2023-07-31 15:31:59 +03:00
Aleksandra Martyniuk
4e439ac957 compaction: turn offstrategy_compaction_task_executor into offstrategy_compaction_task_impl
offstrategy_compaction_task_executor inherits both from compaction_task_executor
and offstrategy_compaction_task_impl.
2023-07-28 10:51:55 +02:00
Aleksandra Martyniuk
92f2987217 compaction: turn cleanup_compaction_task_executor into cleanup_compaction_task_impl
cleanup_compaction_task_executor inherits both from compaction_task_executor
and cleanup_compaction_task_impl.

Add a new version of compaction_manager::perform_task_on_all_files
which accepts only the tasks that are derived from compaction_task_impl.
After all task executors' conversions are done, the new version replaces
the original one.
2023-07-28 10:48:58 +02:00
Aleksandra Martyniuk
8317e4dd7f compaction: use optional task info in major compaction
To make it consistent with the upcoming methods, methods triggering
major compaction get std::optional<tasks::task_info> as an argument.

Thanks to that we can distinguish between a task that has no parent
and the task which won't be registered in task manager.
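
One way to read this distinction, sketched with hypothetical types: a disengaged optional means the task is not registered at all, while an engaged value with an empty parent id means a registered task without a parent. `task_info_model` is a simplification, not tasks::task_info, and the mapping is an assumption.

```cpp
#include <cassert>
#include <optional>

// Hypothetical simplification of tasks::task_info: 0 means "no parent".
struct task_info_model {
    unsigned parent_id = 0;
};

enum class registration { not_registered, top_level, child };

registration classify(const std::optional<task_info_model>& info) {
    if (!info) {
        return registration::not_registered;   // no task info passed at all
    }
    return info->parent_id == 0 ? registration::top_level : registration::child;
}
```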
2023-07-28 09:25:21 +02:00
Botond Dénes
b599f15b26 replica: make_[multishard_]streaming_reader(): make compaction_time mandatory
Now that all users have opted in unconditionally, there is no point in
keeping this optional. Make it mandatory so that nobody
opts out by mistake.
The global override via enable_compacting_data_for_streaming_and_repair
config item still remains, allowing compaction to be force turned-off.
2023-07-27 04:57:52 -04:00
Botond Dénes
2f8d77e97b replica/table: add optional compacting to make_multishard_streaming_reader()
Do to make_multishard_streaming_reader() what the previous commit did
to make_streaming_reader(). In fact, the new compaction_time parameter
is simply forwarded to the make_streaming_reader() on the shard readers.

Call sites are updated, but none opt in just yet.
2023-07-27 03:22:11 -04:00
Botond Dénes
42b0dd5558 replica/table: add optional compacting to make_streaming_reader()
Opt-in is possible by passing an engaged `compaction_time`
(gc_clock::time_point) to the method. When this new parameter is
disengaged, no compaction happens.
Note that there is a global override, via the
enable_compacting_data_for_streaming_and_repair config item, which can
force-disable this compaction.
Compaction done on the output of the streaming reader does *not*
garbage-collect tombstones!
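
The opt-in rule and the global override combine as in this sketch. `should_compact_stream` is a hypothetical helper and `gc_time_point` stands in for gc_clock::time_point.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Stand-in for gc_clock::time_point.
using gc_time_point = std::chrono::system_clock::time_point;

// Compaction happens only if the caller opted in with an engaged
// compaction_time and the global config item has not disabled it.
bool should_compact_stream(std::optional<gc_time_point> compaction_time,
                           bool enable_compacting_data_for_streaming_and_repair) {
    return enable_compacting_data_for_streaming_and_repair && compaction_time.has_value();
}
```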

All call-sites are adjusted (the new parameter is not defaulted), but
none opt in yet. This will be done in a separate commit per user.
2023-07-27 03:22:11 -04:00
Botond Dénes
ad2ddffb22 Merge 'Remove qctx from system_keyspace::save_truncation_record()' from Pavel Emelyanov
The method is called by db::truncate_table_on_all_shards(), its call-chain, in turn, starts from

- proxy::remote::handle_truncate()
- schema_tables::merge_schema()
- legacy_schema_migrator
- tests

All of the above are easy to get system_keyspace reference from. This, in turn, allows making the method non-static and use query_processor reference from system_keyspace object instead of the global qctx.

Closes #14778

* github.com:scylladb/scylladb:
  system_keyspace: Make save_truncation_record() non-static
  code: Pass sharded<db::system_keyspace>& to database::truncate()
  db: Add sharded<system_keyspace>& to legacy_schema_migrator
2023-07-26 08:48:49 +03:00
Aleksandra Martyniuk
6e6ba7309e replica: make tables_metadata's attributes private
Make _column_families and _ks_cf_to_uuid private to prevent unsafe
access. The maps can be accessed only through methods which use locks
if preemption is possible.
2023-07-25 17:13:24 +02:00
Aleksandra Martyniuk
c5cad803b3 replica: add methods to get a filtered copy of tables map
2023-07-25 17:13:24 +02:00
Aleksandra Martyniuk
ff26b2ba3f replica: add methods to check if given table exists
2023-07-25 17:13:24 +02:00
Aleksandra Martyniuk
6796721c3d replica: add methods to get table or table id
2023-07-25 17:13:24 +02:00
Aleksandra Martyniuk
e072a2341d replica: api: return table_id instead of const table_id&
Return table_id instead of const table_id& from database::find_uuid
as copying table_id does not cause much overhead and it simplifies
method signatures.
2023-07-25 17:13:24 +02:00