Commit Graph

38318 Commits

Pavel Emelyanov
d7f5d6dba8 system_keyspace: Use system_keyspace's container() to flush
In force_blocking_flush() there's an invoke-on-all invocation of
replica::database::flush() and a FIXME to get the replica database from
somewhere else rather than via query-processor -> data_dictionary.

Now that force_blocking_flush() is non-static, the invoke-on-all can
happen via system_keyspace's container, and the database can be obtained
directly from the local system_keyspace instance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-08 11:09:32 +03:00
Pavel Emelyanov
7a342ed5c0 system_keyspace: Make force_blocking_flush() non-static
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-08 11:09:20 +03:00
Pavel Emelyanov
6b8fe5ac43 system_keyspace: Coroutinize update_tokens()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-08 11:09:15 +03:00
Pavel Emelyanov
1700d79b60 system_keyspace: Coroutinize save_truncation_record()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-08 11:09:09 +03:00
Avi Kivity
4f7e83a4d0 cql3: select_statement: reject DISTINCT with GROUP BY on clustering keys
While in SQL DISTINCT applies to the result set, in CQL it applies
to the table being selected and cannot be combined with GROUP BY on
clustering keys. So reject the combination, like Cassandra does.

While this is not an important issue to fix, it blocks un-xfailing
other issues, so I'm clearing it ahead of fixing those issues.

An issue is unmarked as xfail, and other xfails lose this issue
as a blocker.

Fixes #12479

Closes #14970
2023-08-07 15:35:59 +03:00
Patryk Jędrzejczak
1772433ae2 raft_group0: log gaining and losing leadership on the INFO level
Knowing that a server gained or lost leadership in group 0 is
sometimes useful for the purpose of debugging, so we log
information about these events on the INFO level.

Gaining and losing leadership are relatively rare events, so
this change shouldn't flood the logs.

Closes #14877
2023-08-07 12:13:24 +02:00
Kamil Braun
9edc98f8e9 Merge 'raft: make a removed/decommissioning node a non-voter early' from Patryk Jędrzejczak
For `removenode`, we make a removed node a non-voter early. There is no
downside to it because the node is already dead. Moreover, it improves
availability in some situations.

For `decommission`, if we decommission a node when the number of nodes
is even, we make it a non-voter early to improve availability. All
majorities containing this node will remain majorities when we make this
node a non-voter and remove it from the set because the required size of
a majority decreases.

We don't change `decommission` when the number of nodes is odd since
this may reduce availability.

Fixes #13959

Closes #14911

* github.com:scylladb/scylladb:
  raft: make a decommissioning node a non-voter early
  raft: topology_coordinator: implement step_down_as_nonvoter
  raft: make a removed node a non-voter early
2023-08-07 10:14:33 +02:00
Botond Dénes
fa4aec90e9 Merge 'test: tasks: Fix task_manager/wait_task test ' from Aleksandra Martyniuk
Rewrite the test that checks whether task_manager/wait_task works properly.
The old version didn't work. Delete the functions used in the old version.

Closes #14959

* github.com:scylladb/scylladb:
  test: rewrite wait_task test
  test: move ThreadWrapper to rest_util.py
2023-08-07 09:04:29 +03:00
Benny Halevy
6f037549ac sstables: delete_with_pending_deletion_log: batch sync_directory
When deleting multiple sstables with the same prefix
the deletion atomicity is ensured by the pending_delete_log file,
so if scylla crashes in the middle, deletions will be replayed on
restart.

Therefore, we don't have to ensure atomicity of each individual
`unlink`.  We just need to sync the directory once, before
removing the pending_delete_log file.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #14967
2023-08-06 18:52:13 +03:00
Avi Kivity
6c1e44e237 Merge 'Make replica::database and cql3::query_processor share wasm manager' from Pavel Emelyanov
This makes it possible to remove remaining users of the global qctx.

The thing is that db::schema_tables code needs to get wasm's engine, alien runner and instance cache to build the wasm context for a merged function, or to drop it from the cache in the opposite case. To get the wasm stuff, this code uses the global qctx -> query_processor -> wasm chain. However, the function (un)merging code already has the database reference at hand, and it's natural to get the wasm stuff from it, not from the q.p., which is not available there.

So this PR packs the wasm engine, runner and cache into a sharded<wasm::manager> instance, makes the manager be referenced by both the q.p. and the database, and removes the qctx from the schema tables code

Closes #14933

* github.com:scylladb/scylladb:
  schema_tables: Stop using qctx
  database: Add wasm::manager& dependency
  main, cql_test_env, wasm: Start wasm::manager earlier
  wasm: Shuffle context::context()
  wasm: Add manager::remove()
  wasm: Add manager::precompile()
  wasm: Move stop() out of query_processor
  wasm: Make wasm sharded<manager>
  query_processor: Wrap wasm stuff in a struct
2023-08-06 17:00:28 +03:00
Avi Kivity
412629a9a1 Merge 'Export tablet load-balancer metrics' from Tomasz Grabiec
The metrics are registered on-demand when the load-balancer is invoked, so that only the leader exports the metrics. When the leader changes, the old leader stops exporting.

The metrics are divided into two levels: per-dc and per-node. In prometheus, they will have appropriate labels for dc and host_id values.

Closes #14962

* github.com:scylladb/scylladb:
  tablet_allocator: unregister metrics when leadership is lost
  tablets: load_balancer: Export metrics
  service, raft: Move balance_tablets() to tablet_allocator
  tablet_allocator: Start even if tablets feature is not enabled
  main, storage_service: Pass tablet allocator to storage_service
2023-08-06 16:58:27 +03:00
Tomasz Grabiec
f26e65d4d4 tablets: Fix crash on table drop
Before the patch, tablet metadata update was processed on local schema merge
before table changes.

When a table is dropped, this means that for a while the table will
exist without a corresponding tablet map. This can cause a memtable
flush for this table to fail, resulting in an intentional abort().
That's because sstable writing attempts to access the tablet map to
generate sharding metadata.

If auto_snapshot is enabled, this is much more likely to happen,
because we flush memtables on table drop.

To fix the problem, process tablet metadata after dropping tables, but
before creating tables.

Fixes #14943

Closes #14954
2023-08-06 16:45:43 +03:00
Pavel Emelyanov
3c6686e181 bptree: Replace assert with static_assert
The assertion checks a constexpr value anyway, so it can run at compile time

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #14951
2023-08-06 16:36:12 +03:00
Tomasz Grabiec
f827cfd5b6 tablet_allocator: unregister metrics when leadership is lost
So that graphs are not polluted with stale metrics from past leaders.
2023-08-05 21:48:08 +02:00
Tomasz Grabiec
d653cbae53 tablets: load_balancer: Export metrics 2023-08-05 21:48:08 +02:00
Tomasz Grabiec
67c7aadded service, raft: Move balance_tablets() to tablet_allocator
The implementation will access metrics registered from tablet_allocator.
2023-08-05 21:48:08 +02:00
Tomasz Grabiec
cb0d763a22 tablet_allocator: Start even if tablets feature is not enabled
topology coordinator will call it. Rather than spreading ifs there,
it's simpler to start it and disable functionality in the tablet
allocator.
2023-08-05 21:48:08 +02:00
Tomasz Grabiec
5bfc8b0445 main, storage_service: Pass tablet allocator to storage_service
Tablet balancing will be done through tablet_allocator later.
2023-08-05 03:10:26 +02:00
Pavel Emelyanov
fd50ba839c schema_tables: Stop using qctx
There are two places in there that use qctx to get the query_processor
and, in turn, the wasm::manager from it. Fortunately, both places have
the database reference at hand and can get the wasm::manager from it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-04 19:47:50 +03:00
Pavel Emelyanov
fa93ac9bfd database: Add wasm::manager& dependency
The dependency is needed by db::schema_tables to get wasm manager for
its needs. This patch prepares the ground. Now the wasm::manager is
shared between replica::database and cql3::query_processor

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-04 19:47:50 +03:00
Pavel Emelyanov
f4e7ffa0fc main, cql_test_env, wasm: Start wasm::manager earlier
It will be needed by replica::database and should be available that
early. It doesn't depend on anything and can be moved in the starting
order safely

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-04 19:47:50 +03:00
Pavel Emelyanov
595c5abbf9 wasm: Shuffle context::context()
Add a constructor that builds context out of const manager reference.
The existing one needs to get engine and instance cache and does it via
query_processor. This change allows removing those exports and, finally,
dropping the wasm::manager -> cql3::query_processor friendship

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-04 19:47:50 +03:00
Pavel Emelyanov
56404ee053 wasm: Add manager::remove()
This is one of the users of query_processor's export of wasm::manager's
instance cache. Remove it in advance

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-04 19:47:50 +03:00
Pavel Emelyanov
93cb73fddb wasm: Add manager::precompile()
This avoids making query_processor export the alien runner from the
wasm::manager

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-04 19:47:50 +03:00
Pavel Emelyanov
d58a2d65b5 wasm: Move stop() out of query_processor
When the q.p. stops it also "stops" the wasm manager. Move this call
into main. The cql test env doesn't need this change: it stops the whole
sharded service, which stops the instances on its own

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-04 19:47:50 +03:00
Pavel Emelyanov
243f2217dd wasm: Make wasm sharded<manager>
The wasm::manager is just cql3::wasm_context renamed. It now sits in
lang/wasm* and is started as a sharded service in main (and cql test
env). This move also needs some headers shuffling, but it's not severe

This change is required to make it possible for the wasm::manager to be
shared (by reference) between q.p. and replica::database further

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-04 19:47:50 +03:00
Pavel Emelyanov
dde285e7e9 query_processor: Wrap wasm stuff in a struct
There are three wasm-only fields on q.p. -- engine, cache and runner.
This patch groups them in a single wasm_context structure to make it
easier to manipulate them in the next patches

The 'friend' declaration is temporary and will go away soon

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-04 19:47:50 +03:00
Kamil Braun
421a5ad55c Merge 'feature_service: don't load whole topology state to check features' from Piotr Dulikowski
Currently, feature service uses `system_keyspace::load_topology_state`
to load information about features from the `system.topology` table.
This function implicitly assumes that it is called after schema
commitlog replay and will correspond to the state of the topology state
machine after some command is applied.

However, feature check happens before the commitlog replay. If some
group 0 command consists of multiple mutations that are not applied
atomically, the `load_topology_state` function may fail to construct a
`service::topology` object based on the table state. Moreover, this
function not only checks `system.topology` but also
`system.cdc_generations_v3` - in the case of the issue, the entry that
was loaded from this table didn't contain the `num_ranges`
parameter.

In order to fix this, the feature check code now uses
`load_topology_features_state` which only loads enabled and supported
features from `system.topology`. Only this information is really
necessary for the feature check, and it doesn't have any invariants to
check.

Fixes: #14944

Closes #14955

* github.com:scylladb/scylladb:
  feature_service: don't load whole topology state to check features
  system_keyspace: separate loading topology_features from topology
  topology_state_machine: extract features-related fields to a struct
  untyped_result_set: add missing_column_exception
2023-08-04 15:09:12 +02:00
Kamil Braun
fed775e13b Merge 'group0_state_machine: await transfer_snapshot' from Benny Halevy
Hold a (newly added) group0_state_machine gate
that is closed and waited on in group0_state_machine::abort(),
to prevent use-after-free when destroying the group0_state_machine
while transfer_snapshot runs.

Fixes #14907

Also, use an abort_source in group0_state_machine
to abort an ongoing transfer_snapshot operation
on group0_state_machine::abort()

Closes #14952

* github.com:scylladb/scylladb:
  raft: group0_state_machine: transfer_snapshot: make abortable
  raft: group0_state_machine: transfer_snapshot: hold gate
2023-08-04 14:21:57 +02:00
Botond Dénes
68d2397d01 Merge 'repair: delete unused fields' from Aleksandra Martyniuk
Delete unused shard_repair_task_impl members and an incorrectly used method argument.

Closes #14956

* github.com:scylladb/scylladb:
  repair: delete task_manager_module::get_progress argument
  repair: delete unused shard_repair_task_impl fields
2023-08-04 15:08:31 +03:00
Aleksandra Martyniuk
629f893355 test: rewrite wait_task test
Rewrite the test that checks whether task_manager/wait_task works properly.
The old version didn't work. Delete the functions used in the old version.
2023-08-04 13:34:58 +02:00
Aleksandra Martyniuk
9d2e55fd37 test: move ThreadWrapper to rest_util.py
Move ThreadWrapper to rest_util.py so it can be reused in different tests.
2023-08-04 13:29:03 +02:00
Piotr Dulikowski
b7d9348229 feature_service: don't load whole topology state to check features
Currently, feature service uses `system_keyspace::load_topology_state`
to load information about features from the `system.topology` table.
This function implicitly assumes that it is called after schema
commitlog replay and will correspond to the state of the topology state
machine after some command is applied.

However, feature check happens before the commitlog replay. If some
group 0 command consists of multiple mutations that are not applied
atomically, the `load_topology_state` function may fail to construct a
`service::topology` object based on the table state. Moreover, this
function not only checks `system.topology` but also
`system.cdc_generations_v3` - in the case of the issue, the entry that
was loaded from this table didn't contain the `num_ranges`
parameter.

In order to fix this, the feature check code now uses
`load_topology_features_state` which only loads enabled and supported
features from `system.topology`. Only this information is really
necessary for the feature check, and it doesn't have any invariants to
check.

Fixes: #14944
2023-08-04 12:32:05 +02:00
Piotr Dulikowski
8f491457ae system_keyspace: separate loading topology_features from topology
Now, it is possible to load topology_features separately from the
topology struct. It will be used in the code that checks enabled
features on startup.
2023-08-04 12:32:04 +02:00
Piotr Dulikowski
f1704eeee6 topology_state_machine: extract features-related fields to a struct
`enabled_features` and `supported_features` are now moved to a new
`topology::features` struct. This will allow to move load this
information independently from the `topology` struct, which will be
needed for feature checking during start.
2023-08-04 12:21:51 +02:00
Aleksandra Martyniuk
66df686980 repair: delete task_manager_module::get_progress argument
Taking a reason argument in task_manager_module::get_progress is misleading,
as the method works properly only for streaming::stream_reason::repair
(repair::shard_repair_task_impl::nr_ranges_finished isn't updated for
any other reason).
2023-08-04 11:09:37 +02:00
Aleksandra Martyniuk
93ebbdcf1d repair: delete unused shard_repair_task_impl fields 2023-08-04 10:52:24 +02:00
Botond Dénes
00a62866ac Merge 'Make database::add_column_family exception safe.' from Aleksandra Martyniuk
If some state update in database::add_column_family throws,
info about a column family would be inconsistent.

Undo already performed operations in database::add_column_family
when one throws.

Fixes: #14666.

Closes #14672

* github.com:scylladb/scylladb:
  replica: undo the changes if something fails
  replica: start table earlier in database::add_column_family
2023-08-04 10:58:17 +03:00
Botond Dénes
4d538e1363 Merge 'Task manager tasks covering compaction group compaction' from Aleksandra Martyniuk
All compaction task executors, except for regular compaction one,
become task manager compaction tasks.

Creating and starting of major_compaction_task_executor is modified
to be consistent with other compaction task executors.

Closes #14505

* github.com:scylladb/scylladb:
  test: extend test_compaction_task.py to cover compaction group tasks
  compaction: turn custom_task_executor into compaction_task_impl
  compaction: turn sstables_task_executor into sstables_compaction_task_impl
  compaction: change sstables compaction tasks type
  compaction: move table_upgrade_sstables_compaction_task_impl
  compaction: pass task_info through sstables compaction
  compaction: turn offstrategy_compaction_task_executor into offstrategy_compaction_task_impl
  compaction: turn cleanup_compaction_task_executor into cleanup_compaction_task_impl
  compaction: use optional task info in major compaction
  compaction: use perform_compaction in compaction_manager::perform_major_compaction
2023-08-04 10:11:00 +03:00
Michał Jadwiszczak
b92d47362f schema::describe: print 'synchronous_updates' only if it was specified
While describing materialized view, print `synchronous_updates` option
only if the tag is present in schema's extensions map. Previously if the
key wasn't present, the default (false) value was printed.

Fixes: #14924

Closes #14928
2023-08-04 09:52:37 +03:00
Kefu Chai
d8d91379e7 test: remove unnecessary check in compaction_manager_basic_test
we wait for the same condition a couple of lines earlier, so there is
no need to check it again using `BOOST_CHECK_EQUAL()`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14921
2023-08-04 09:26:22 +03:00
Piotr Dulikowski
fad1e82bf7 untyped_result_set: add missing_column_exception
Currently, when one tries to access a column that an untyped_result_set
does not contain, a `std::bad_variant_access` exception is thrown. This
exception's message provides very little context and it can be difficult
to even figure out where this message is coming from.

In order to improve the situation, a new exception `missing_column` is
introduced which includes the missing column's name in its error
message. The exception derives from `std::bad_variant_access` for
compatibility with existing code that may want to catch it.
2023-08-04 07:37:12 +02:00
Kefu Chai
374bed8c3d tools: do not create bpo::value unless transfer it to an option_description
`boost::program_options::value()` creates a new typed_value<T> object
without holding it in a shared_ptr. boost::program_options expects the
developer to construct a `bpo::option_description` from it right away,
and `boost::program_options::option_description` takes ownership
of the `typed_value<T>*` raw pointer and manages its life cycle with
a shared_ptr. But before it is passed to a `bpo::option_description`,
the pointer created by `boost::program_options::value()` is still
a raw pointer.

Before this change, we initialized the positional options as global
variables using `boost::program_options::value()`, but unfortunately
we didn't always initialize a `bpo::option_description` from them --
we only did so on demand, when the corresponding subcommand was
called.

So, if the corresponding subcommand is not called, the created
`typed_value<T>` objects are leaked; hence LeakSanitizer warns us.

After this change, we create the option vector as a static local
variable in a function, so it is created on demand as well. As an
alternative, we could initialize the options vector as a local variable
where it is used, but to be more consistent with how `global_option`
is specified, and to colocate them in a single place, let's keep the
existing code layout.

Fixes #14929
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14939
2023-08-04 08:03:11 +03:00
Aleksandra Martyniuk
1e9b2972ea replica: undo the changes if something fails
If a step of adding a table fails, previous steps are undone.
2023-08-03 17:37:31 +02:00
Benny Halevy
46c9e3032d storage_service: get_all_ranges: reserve enough space in ranges
Commit bc5f6cf45d
added a reserve call to the `ranges` vector before
inserting all the returned token ranges into it.
However, that reservation is too small as we need
to express size+1 ranges for size tokens with
<unbound, token[0]> and <token[size-1], unbound>
ranges at the front and back, respectively.

Fixes #14849

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #14938
2023-08-03 17:13:03 +03:00
Benny Halevy
357d57c82d raft: group0_state_machine: transfer_snapshot: make abortable
Use an abort_source in group0_state_machine
to abort an ongoing transfer_snapshot operation
on group0_state_machine::abort()

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-03 16:32:08 +03:00
Benny Halevy
a23b58231e raft: group0_state_machine: transfer_snapshot: hold gate
Hold a (newly added) group0_state_machine gate
that is closed and waited on in group0_state_machine::abort(),
to prevent use-after-free when destroying the group0_state_machine
while transfer_snapshot runs.

Fixes #14907

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-03 15:45:34 +03:00
Botond Dénes
946c6487ee Merge 'repair: Add ranges_parallelism option' from Asias He
This patch adds the ranges_parallelism option to repair restful API.

Users can use this option to optionally limit the number of ranges repaired in parallel per repair job to a number smaller than the core-calculated default max_repair_ranges_in_parallel.

Scylla manager can also use this option to provide more ranges (>N) in a single repair job while repairing only ranges_parallelism = N ranges in parallel, instead of providing N ranges per repair job.

To make it safer, unlike PR #4848, this patch does not allow the user to exceed max_repair_ranges_in_parallel.

Fixes #4847

Closes #14886

* github.com:scylladb/scylladb:
  repair: Add ranges_parallelism option
  repair: Change to use coroutine in do_repair_ranges
2023-08-03 11:34:05 +03:00
Kefu Chai
d4ee84ee1e s3/test: nuke tempdir but keep $tempdir/log
Before this change, if the object_store test fails, the tempdir
is preserved, and if our CI test pipeline is used to perform
the test, the test job scans for the artifacts; when the
test in question fails, it takes over an hour to scan the tempdir.

To alleviate the pain, let's keep just the scylla logging file,
whether the test fails or succeeds, so that jenkins can scan the
artifacts faster if the test fails.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14880
2023-08-03 11:07:59 +03:00
Avi Kivity
cb3b808e3f Merge 'replica/table.cc: Add per-node-per-table metrics' from Amnon Heiman
Per-table metrics are very valuable for users, but they come with a high load on both the reporting and the collecting metrics systems.

This patch adds a small subset of per-table metrics that will be reported on the node level.

The list of metrics is:
system_column_family_memtable_switch - Number of times flush has
  resulted in the memtable being switched out
system_column_family_memtable_partition_writes - Number of write
  operations performed on partitions in memtables
system_column_family_memtable_partition_hits - Number of times a write
  operation was issued on an existing partition in memtables
system_column_family_memtable_row_writes - Number of row writes
  performed in memtables
system_column_family_memtable_row_hits - Number of rows overwritten by write operations in memtables
system_column_family_total_disk_space - Total disk space used
system_column_family_live_sstable - Live sstable count
system_column_family_read_latency_count - Number of reads
system_column_family_write_latency_count - Number of writes

The names of the read/write metrics are based on the histogram convention, so when latency histograms are added, the names will not change.

The metrics are labeled with a specific label, __per_table="node", so it will be possible to manipulate them easily.

The metrics will be available when enable_metrics_reporting (the per-table full metrics flag) is off

Fixes #2198

Closes #13293

* github.com:scylladb/scylladb:
  replica/table.cc: Add node-per-table metrics
  config: add enable_node_table_metrics flag
2023-08-02 22:17:47 +03:00