38084 Commits

Author SHA1 Message Date
Avi Kivity
ff1f461a42 Merge 'Introduce tablet load balancer' from Tomasz Grabiec
After this series, tablet replication can handle the scenario of bootstrapping new nodes. The ownership is distributed indirectly by means of a load balancer which moves tablets around in the background. See docs/dev/topology-over-raft.md for details.

The implementation is by no means meant to be perfect, especially in terms of performance, and will be improved incrementally.

The load balancer will also be kicked by schema changes, so that allocation/deallocation done during table creation/drop will be rebalanced.

Tablet data is streamed using the existing `range_streamer`, which is the infrastructure for "the old streaming". This will later be replaced by sstable transfer once integration of tablets with compaction groups is finished. Cleanup is not wired up yet either; it is also blocked by compaction group integration.

Closes #14601

* github.com:scylladb/scylladb:
  tests: test_tablets: Add test for bootstraping a node
  storage_service: topology_coordinator: Implement tablet migration state machine
  tablets: Introduce tablet_mutation_builder
  service: tablet_allocator: Introduce tablet load balancer
  tablets: Introduce tablet_map::for_each_tablet()
  topology: Introduce get_node()
  token_metadata: Add non-const getter of tablet_metadata
  storage_service: Notify topology state machine after applying schema change
  storage_service: Implement stream_tablet RPC
  tablets: Introduce global_tablet_id
  stream_transfer_task, multishard_writer: Work with table sharder
  tablets: Turn tablet_id into a struct
  db: Do not create per-keyspace erm for tablet-based tables
  tablets: effective_replication_map: Take transition stage into account when computing replicas
  tablets: Store "stage" in transition info
  doc: Document tablet migration state machine and load balancer
  locator: erm: Make get_endpoints_for_reading() always return read replicas
  storage_service: topology_coordinator: Sleep on failure between retries
  storage_service: topology_coordinator: Simplify coordinator loop
  main: Require experimental raft to enable tablets
2023-07-26 12:30:29 +03:00
Botond Dénes
ad2ddffb22 Merge 'Remove qctx from system_keyspace::save_truncation_record()' from Pavel Emelyanov
The method is called by db::truncate_table_on_all_shards(), its call-chain, in turn, starts from

- proxy::remote::handle_truncate()
- schema_tables::merge_schema()
- legacy_schema_migrator
- tests

All of the above are easy to get a system_keyspace reference from. This, in turn, allows making the method non-static and using the query_processor reference from the system_keyspace object instead of the global qctx.

Closes #14778

* github.com:scylladb/scylladb:
  system_keyspace: Make save_truncation_record() non-static
  code: Pass sharded<db::system_keyspace>& to database::truncate()
  db: Add sharded<system_keyspace>& to legacy_schema_migrator
2023-07-26 08:48:49 +03:00
Benny Halevy
90b2e6515c gossiper: mark_alive: enter background_msg gate
The function dispatches a background operation that must be
waited on in stop().

Fixes scylladb/scylladb#14791

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #14797
2023-07-26 00:51:22 +02:00
Tomasz Grabiec
ae8ffe23fc tests: test_tablets: Add test for bootstraping a node 2023-07-25 21:08:51 +02:00
Tomasz Grabiec
f0b9dcee04 storage_service: topology_coordinator: Implement tablet migration state machine
See the documentation in topology-over-raft.md for description of the mechanism.
2023-07-25 21:08:51 +02:00
Tomasz Grabiec
5c681a1d63 tablets: Introduce tablet_mutation_builder 2023-07-25 21:08:51 +02:00
Tomasz Grabiec
6f4a35f9ae service: tablet_allocator: Introduce tablet load balancer
Will be invoked by the topology coordinator later to decide
which tablets to migrate.
2023-07-25 21:08:51 +02:00
Tomasz Grabiec
d59b8d316c tablets: Introduce tablet_map::for_each_tablet() 2023-07-25 21:08:51 +02:00
Tomasz Grabiec
0e3eac29d0 topology: Introduce get_node() 2023-07-25 21:08:51 +02:00
Tomasz Grabiec
f2fdf37415 token_metadata: Add non-const getter of tablet_metadata
Needed for tests.
2023-07-25 21:08:51 +02:00
Tomasz Grabiec
1885f94474 storage_service: Notify topology state machine after applying schema change
Table construction may allocate tablets which may need rebalancing.
Notify topology change coordinator to invoke the load balancer.
2023-07-25 21:08:51 +02:00
Tomasz Grabiec
6d545b2f9e storage_service: Implement stream_tablet RPC
Performs streaming of data for a single tablet between two tablet
replicas. The node which gets the RPC is the receiving replica.
2023-07-25 21:08:51 +02:00
Tomasz Grabiec
e3a8bb7ec9 tablets: Introduce global_tablet_id
Identifies a tablet in the scope of the whole cluster. Not to be
confused with tablet replicas, which all share the same global_tablet_id.

Will be needed by load balancer and tablet migration algorithm to
identify tablets globally.
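
A minimal sketch of the idea, not the actual definitions: the field names,
the use of a string as a table identifier and the tablet_replica layout
below are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed shapes for illustration; the real types live in the ScyllaDB tree.
struct tablet_id {
    std::size_t value;  // index of the tablet within its table
};

struct global_tablet_id {
    std::string table;  // stand-in for a table identifier
    tablet_id tablet;   // which tablet of that table
};

inline bool operator==(const global_tablet_id& a, const global_tablet_id& b) {
    return a.table == b.table && a.tablet.value == b.tablet.value;
}

// A replica is one copy of a tablet, placed on some host and shard.
struct tablet_replica {
    global_tablet_id id;
    std::string host;
    unsigned shard;
};

// Two replicas belong to the same tablet when their global ids match.
inline bool same_tablet(const tablet_replica& a, const tablet_replica& b) {
    return a.id == b.id;
}

// Usage check: replicas on different hosts share one global_tablet_id.
inline void check_replicas_share_id() {
    const global_tablet_id id{"ks.cf", tablet_id{7}};
    const tablet_replica r1{id, "host-a", 0};
    const tablet_replica r2{id, "host-b", 3};
    assert(same_tablet(r1, r2));
    const tablet_replica other{global_tablet_id{"ks.cf", tablet_id{8}}, "host-a", 0};
    assert(!same_tablet(r1, other));
}
```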
2023-07-25 21:08:51 +02:00
Tomasz Grabiec
f88220aeee stream_transfer_task, multishard_writer: Work with table sharder
So that we can use it on tablet-based tables.
2023-07-25 21:08:51 +02:00
Tomasz Grabiec
8cf92d4c86 tablets: Turn tablet_id into a struct
The IDL compiler cannot deal with enum classes like this.
2023-07-25 21:08:51 +02:00
Tomasz Grabiec
c2b18ae483 db: Do not create per-keyspace erm for tablet-based tables
This erm is not updated when replicating token metadata in
storage_service::replicate_to_all_cores(), so it will pin the token
metadata version and prevent the token metadata barrier from finishing.

It is not necessary to have per-keyspace erm for tablet-based tables,
so just don't create it.
2023-07-25 21:08:51 +02:00
Tomasz Grabiec
91dee5c872 tablets: effective_replication_map: Take transition stage into account when computing replicas 2023-07-25 21:08:51 +02:00
Tomasz Grabiec
dc2ec3f81c tablets: Store "stage" in transition info
It's needed to implement tablet migration. It stores the current step
of tablet migration state machine. The state machine will be advanced
by the topology change coordinator.

See the "Tablet migration" section of topology-over-raft.md
2023-07-25 21:08:02 +02:00
Tomasz Grabiec
05519bd5e5 doc: Document tablet migration state machine and load balancer 2023-07-25 21:08:02 +02:00
Tomasz Grabiec
7851694eaa locator: erm: Make get_endpoints_for_reading() always return read replicas
Just a simplification.

Drop the test case from token_metadata which creates pending endpoints
without normal tokens. It fails after this change with exception:
"sorted_tokens is empty in first_token_index!" thrown from
token_metadata::first_token_index(), which is used when calculating
normal endpoints. This test case is not valid: the first node inserts
its tokens as normal without going through the bootstrap procedure.
2023-07-25 21:08:01 +02:00
Tomasz Grabiec
b642e69eb3 storage_service: topology_coordinator: Sleep on failure between retries
Avoid failing in a tight loop. Can happen if some node is down, for example.
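
The pattern can be sketched roughly as below. This is a generic,
blocking illustration with a hypothetical retry_with_sleep() helper and a
bounded attempt count; the coordinator itself runs an asynchronous loop,
so none of these names come from the patch.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <functional>
#include <thread>

// Hypothetical helper: runs `step` until it succeeds or `max_attempts`
// is exhausted, pausing `delay` after each failure so that a persistent
// error (e.g. a node being down) does not turn into a busy loop.
// Returns the number of attempts made.
inline int retry_with_sleep(const std::function<void()>& step,
                            int max_attempts,
                            std::chrono::milliseconds delay) {
    assert(max_attempts > 0);
    for (int attempt = 1; ; ++attempt) {
        try {
            step();
            return attempt;
        } catch (const std::exception&) {
            if (attempt >= max_attempts) {
                throw;  // give up: propagate the last error
            }
            std::this_thread::sleep_for(delay);  // back off before retrying
        }
    }
}
```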
2023-07-25 21:08:01 +02:00
Tomasz Grabiec
f0e9dbf911 storage_service: topology_coordinator: Simplify coordinator loop
This refactoring removes a boolean and branching which makes it easier
to reason about the flow, and easier to extend it with more steps.
2023-07-25 21:08:01 +02:00
Tomasz Grabiec
b294932cf1 main: Require experimental raft to enable tablets
Tablets depend on the topology changes on raft feature.

Drop "tablets" from suite.yaml of the topology/ suite, which doesn't
use tablets anymore.
2023-07-25 21:08:01 +02:00
Pavel Emelyanov
c46c57d535 messaging_service: Clear list of clients on shutdown
When messaging_service shuts down it first sets _shutting_down to true
and proceeds with stopping clients and servers. Stopping clients, in
turn, is calling client.stop() on each.

Setting _shutting_down is used in two places.

First, a client being stopped may be in the middle of some operation,
which may result in a call to remove_error_rpc_client(). To avoid
calling .stop() a second time, that function does nothing if the
shutdown flag is set (see 357c91a076).

Second, get_rpc_client() asserts that this flag is not set, so once
shutdown started it can make sure that it will call .stop() on _all_
clients and no new ones would appear in parallel.

However, after shutdown() is complete the _clients vector of maps
remains intact even though all clients in it are stopped. This is not
very debugging-friendly; the clients should be removed on shutdown.
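
A simplified model of the intended behaviour, assuming a toy client type
and synchronous stop(). The class and member names mirror the description
above, but the code is an illustration, not the messaging_service
implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy client: only records whether stop() was called.
struct client {
    bool stopped = false;
    void stop() { stopped = true; }
};

class messaging {
    bool _shutting_down = false;
    std::vector<std::map<std::string, std::shared_ptr<client>>> _clients;
public:
    explicit messaging(std::size_t indexes) : _clients(indexes) {}

    // No new clients may appear once shutdown has started.
    std::shared_ptr<client> get_rpc_client(std::size_t idx, const std::string& addr) {
        assert(!_shutting_down);
        auto& slot = _clients.at(idx)[addr];
        if (!slot) {
            slot = std::make_shared<client>();
        }
        return slot;
    }

    // Stop every client, then drop it from the maps so that no stopped
    // clients linger after shutdown.
    void shutdown() {
        _shutting_down = true;
        for (auto& per_index : _clients) {
            for (auto& [addr, c] : per_index) {
                c->stop();
            }
            per_index.clear();
        }
    }

    std::size_t client_count() const {
        std::size_t n = 0;
        for (const auto& per_index : _clients) {
            n += per_index.size();
        }
        return n;
    }
};
```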

fixes: #14624

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #14632
2023-07-25 13:08:20 +03:00
Botond Dénes
ed025890e5 scripts/coverage.py: --run: swallow KeyboardInterrupt
It is quite common to stop a tested scylla process with ^C, which will
raise KeyboardInterrupt from subprocess.run(). Catch and swallow this
exception, allowing the post-processing to continue.
The interrupted process has to handle the interrupt correctly too --
flush the coverage data even on premature exit -- but this is for
another patch.

Closes #14815
2023-07-25 12:29:22 +03:00
Kefu Chai
2943d3c1b0 tools/scylla-sstable: s/foo.find(bar) != foo.end()/foo.count(bar) != 0/
just for better readability.
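
For a unique-key container the two spellings give the same answer. A
small sketch with a hypothetical has_option() helper (the option table is
made up and is not taken from tools/scylla-sstable):

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical option table; not taken from tools/scylla-sstable.
using options = std::map<std::string, std::string>;

// Membership test in the new spelling. For a unique-key container
// count() returns 0 or 1, so it is interchangeable with the old
// find() != end() spelling, which the assert double-checks.
inline bool has_option(const options& opts, const std::string& name) {
    const bool present = opts.count(name) != 0;
    assert(present == (opts.find(name) != opts.end()));
    return present;
}
```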

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14816
2023-07-25 11:38:44 +03:00
Raphael S. Carvalho
0ac43ea877 Fix stack-use-after-return in mutation source excluding staging
The new test detected a stack-use-after-return when using table's
as_mutation_source_excluding_staging() for range reads.

This doesn't really affect view updates that generate single
key reads only. So the problem was only stressed in the recently
added test. Otherwise, we'd have seen it when running dtests
(in debug mode) that stress the view update path from staging.

The problem happens because the closure was fed into
a noncopyable_function that was taken by reference. For range
reads, we defer before subsequent usage of the predicate.
For single key reads, we only defer after we have finished using
the predicate.

The fix is to use the sstable_predicate type, so there is no
need to construct a temporary object on the stack.

Fixes #14812.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14813
2023-07-25 10:38:20 +03:00
Botond Dénes
3eec990e4e Merge 'test: use different table names in simple_backlog_controller_test ' from Kefu Chai
in this series, we use different table names in simple_backlog_controller_test. this test exercises sstables compaction strategies, and it creates and keeps multiple tables in a single test session. but we are going to add metrics on a per-table basis, and will use the table's ks and cf as the counter's labels. as the metrics subsystem does not allow multiple counters to share the same labels, the test will fail when the metrics are added.

to address this problem, in this change

1. a new ctor is added for `simple_schema`, so we can create `simple_schema` with different names
2. use the new ctor in simple_backlog_controller_test

Fixes #14767

Closes #14783

* github.com:scylladb/scylladb:
  test: use different table names in simple_backlog_controller_test
  test/lib/simple_schema: add ctor for customizing ks.cf
  test/lib/simple_schema: do not hardwire ks.cf
2023-07-25 10:26:33 +03:00
Anna Stuchlik
f6732865b9 doc: doc: move unified installer from web to docs
This commit adds the information on how to install ScyllaDB
without root privileges (with "unified installer", but we've
decided to drop that name - see the page title).

The content taken from the website
https://www.scylladb.com/download/?platform=tar&version=scylla-5.2#open-source
is divided into two sections: "Download and Install" and
"Configure and Run ScyllaDB".
In addition, the "Next Steps" section is also copied from
the website, and adjusted to be in sync with other installation
pages in the docs.

Refs https://github.com/scylladb/scylla-docs/issues/4091

Closes #14781
2023-07-25 10:23:02 +03:00
Benny Halevy
a07440173f storage_service: node_ops_ctl: send_to_all: fix "Node is down for" log message args order
The node and op_desc args are reversed.

Fixes #14807

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #14808
2023-07-24 21:13:06 +03:00
Petr Gusev
5fb8da4181 hints: add fencing
In this commit we just pass a fencing_token
through hint_mutation RPC verb.

The hints manager uses either
storage_proxy::send_hint_to_all_replicas or
storage_proxy::send_hint_to_endpoint to send a hint.
Both methods capture the current erm and use the
corresponding fencing token from it in the
mutation or hint_mutation RPC verb. If these
verbs are fenced out, the server stale_topology_exception
is translated to a mutation_write_failure_exception
on the client with an appropriate error message.
The hint manager will attempt to resend the failed
hint from the commitlog segment after a delay.
However, if delivery is unsuccessful, the hint will
be discarded after gc_grace_seconds.

Closes #14580
2023-07-24 18:12:48 +02:00
Tomasz Grabiec
5b30931406 Merge 'raft topology: restore gossiper eps' from Gusev Petr
We don't load gossiper endpoint states in `storage_service::join_cluster` if `_raft_topology_change_enabled`, but gossiper is still needed even in case of `_raft_topology_change_enabled` mode, since it still contains part of the cluster state. To work correctly, the gossiper needs to know the current endpoints. We cannot rely on seeds alone, since it is not guaranteed that seeds will be up to date and reachable at the time of restart.

The problem was demonstrated by the test `test_joining_old_node_fails`, it fails occasionally with `experimental_features: [consistent-topology-changes]` on the line where it waits for `TEST_ONLY_FEATURE` to become enabled on all nodes. This doesn't happen since `SUPPORTED_FEATURES` gossiper state is not disseminated, and feature_service still relies on gossiper to disseminate information around the cluster.

The series also contains a fix for a problem in `gossiper::do_send_ack2_msg`, see commit message for details.

Fixes #14675

Closes #14775

* github.com:scylladb/scylladb:
  storage_service: restore gossiper endpoints on topology_state_load fix
  gossiper: do_send_ack2_msg fix
2023-07-24 13:55:50 +02:00
Botond Dénes
a8feb7428d Merge 'semaphore mismatch: don't throw an error if both semaphores belong to user' from Michał Jadwiszczak
If semaphore mismatch occurs, check whether both semaphores belong
to user. If so, log a warning, log a `querier_cache_scheduling_group_mismatches` stat and drop cached reader instead of throwing an error.

Until now, semaphore mismatch was only checked in multi-partition queries. The PR pushes the check to `querier_cache` and performs it in all `lookup_*_querier` methods.

The mismatch can happen if user's scheduling group changed during
a query. We don't want to throw an error then, but drop and reset
cached reader.

This patch doesn't solve the problem of mismatched semaphores caused by changes in service levels/scheduling groups; it only mitigates it.

Refers: https://github.com/scylladb/scylla-enterprise/issues/3182
Refers: https://github.com/scylladb/scylla-enterprise/issues/3050
Closes: #14770

Closes #14736

* github.com:scylladb/scylladb:
  querier_cache: add stats of scheduling group mismatches
  querier_cache: check semaphore mismatch during querier lookup
  querier_cache: add reference to `replica::database::is_user_semaphore()`
  replica:database: add method to determine if semaphore is user one
2023-07-24 14:13:09 +03:00
Petr Gusev
75694aa080 storage_service: restore gossiper endpoints on topology_state_load fix
We don't load gossiper endpoint states in
storage_service::join_cluster if
_raft_topology_change_enabled, but gossiper
is still needed even in case of
_raft_topology_change_enabled mode, since it
still contains part of the cluster state.
To work correctly, the gossiper needs to know
the current endpoints. We cannot rely on seeds alone,
since it is not guaranteed that seeds will be
up to date and reachable at the time of restart.

The specific scenario of the problem: cluster with
three nodes, the second has the first in seeds,
the third has the first and second. We restart all
the nodes simultaneously, the third node uses its
seeds as _endpoints_to_talk_with in the first gossiper round
and sends SYN to the first and second. The first node
hasn't started its gossiper yet, so handle_syn_msg
returns immediately after if (!this->is_enabled());
The third node receives ack from the second node and
no communication from the first node, so it fills
its _live_endpoints collection with the second node
and will never communicate with the first node again.

The problem was demonstrated by the test
test_joining_old_node_fails, it fails occasionally with
experimental_features: [consistent-topology-changes]
on the line where it waits for TEST_ONLY_FEATURE
to become enabled on all nodes. This doesn't happen
since SUPPORTED_FEATURES gossiper state is not
disseminated because of the problem described above.

The first commit is needed since add_saved_endpoint
adds the endpoint with some default app states with locally
incrementing versions and without that fix gossiper
refuses to fill the real app states for this endpoint later.

Fixes: #14675
2023-07-24 12:36:39 +04:00
Kamil Braun
e6099c4685 Merge 'config: set schema_commitlog_segment_size_in_mb to 128 ' from Patryk Jędrzejczak
Fixes #14668

In #14668, we have decided to introduce a new `scylla.yaml` variable for the schema commitlog segment size and set it to 128MB. The reason is that segment size puts a limit on the mutation size that can be written at once, and some schema mutation writes are much larger than average, as shown in #13864. This `schema_commitlog_segment_size_in_mb` variable is now added to `scylla.yaml` and `db/config`.

Additionally, we do not derive the commitlog sync period for schema commitlog anymore because schema commitlog runs in batch mode, so it doesn't need this parameter. It has also been discussed in #14668.

Closes #14704

* github.com:scylladb/scylladb:
  replica: do not derive the commitlog sync period for schema commitlog
  config: set schema_commitlog_segment_size_in_mb to 128
  config: add schema_commitlog_segment_size_in_mb variable
2023-07-24 10:23:34 +02:00
Petr Gusev
87cd7e8741 gossiper: do_send_ack2_msg fix
This commit is a first part of the fix for #14675.
The issue is about the test test_joining_old_node_fails
failing occasionally with
experimental_features: [consistent-topology-changes].
The next commit contains a fix for it, here we
solve the pre-existing gossiper problem
which we stumble upon after the fix.

Local generation for addr may have been
increased since the current node sent
an initial SYN. Comparing versions across different
generations in get_state_for_version_bigger_than
could result in losing some app states with
smaller versions.

More specifically, consider a cluster with nodes
.1, .2, .3, .3 has .1 and .2 as seeds, .2 has .1
as a seed. Suppose .2 receives a SYN from .3 before
its gossiper starts, and it has a
version 0.24 for .1 in endpoint_states.

The digest from .3 contains 0.25 as a version for .1,
so examine_gossiper produces .1->0.24 as a digest
and this digest is sent to .3 as part of the ack.
Before processing this ack, .3 processed an ack from
.1 (scylla sends SYN to many nodes) and updates
its endpoint_states according to it, so now it
has .1->100500.32 for .1. Then
we get to do_send_ack2_msg and call
get_state_for_version_bigger_than(.1, 24).
This returns properties which have version > 24,
ignoring a lot of them with smaller versions
which have been received from .1. Also,
get_state_for_version_bigger_than updates
generation (it copies get_heart_beat_state from
.3), so when we apply the ack in handle_ack2_msg
at .2 we update the generation and now the
skipped app states will only be updated on .2
if somebody changes them and increments their version.
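
The hazard in the scenario above can be illustrated with a toy comparison.
The state_stamp type, both function names and the app state version 10 are
assumptions for illustration; the sketch shows a generation-aware ordering
in general and is not necessarily what this patch does.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical, simplified stamp attached to a piece of gossip state.
struct state_stamp {
    std::int64_t generation;  // changes when the node restarts
    std::int32_t version;     // grows within a single generation
};

// Version-only comparison: misses states from a newer generation whose
// version number happens to be smaller.
inline bool newer_by_version_only(const state_stamp& state, const state_stamp& peer_knows) {
    return state.version > peer_knows.version;
}

// Generation-aware comparison: order by generation first, then version.
inline bool newer_by_generation_then_version(const state_stamp& state, const state_stamp& peer_knows) {
    return std::tie(state.generation, state.version) >
           std::tie(peer_knows.generation, peer_knows.version);
}

// Replays the numbers from the scenario: the peer knows 0.24, while the
// app state was set at version 10 of generation 100500 (10 is assumed).
inline void check_scenario() {
    const state_stamp peer_knows{0, 24};
    const state_stamp app_state{100500, 10};
    assert(!newer_by_version_only(app_state, peer_knows));
    assert(newer_by_generation_then_version(app_state, peer_knows));
}
```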

Cassandra behaviour is the same in this case
(see https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/gms/GossipDigestAckVerbHandler.java#L86). This is probably less
of a problem for them since most of the time they
send only one SYN in one gossiper round
(save for unreachable nodes), so there is less
room for conflicts.
2023-07-24 11:52:56 +04:00
Kefu Chai
3ad844a4bb build: cmake: set scylla version strings as CACHED strings
before this change, add_version_library() is a single function
which accomplishes two tasks:

1. build scylla-version target using
2. add an object library

but this has two problems:

1. we should run `SCYLLA-VERSION-GEN` at configure time, instead
   of at build time. otherwise the targets which read from the
   SCYLLA-{VERSION, RELEASE, PRODUCT}-FILE cannot access them,
   unless they are able to read them in their build rules. but
   they always use `file(STRINGS ..)` to read them, and these
   `file()` commands are executed at configure time. so, this
   is a dead end.
2. we repeat the `file(STRINGS ..)` in multiple places. this is
   not ideal if we want to minimize the repetition.

so, to address this problem, in this change:

1. use `execute_process()` instead of `add_custom_command()`
   for generating these *-FILE files. so they are always ready
   at build time. this partially reverts bb7d99ad37.
2. extract `generate_scylla_version()` out of `add_version_library()`.
   so we can call the former much earlier than the latter.
   this would allow us to reference the variables defined by
   the `generate_scylla_version()` much earlier.
3. define cached strings in the extracted function, so that
   they can be consumed by other places.
4. reference the cached variables in `build_submodule.cmake`.

also, take this opportunity to fix the version string
used in build_submodule.cmake: we should have used
`scylla_version_tilde`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14769
2023-07-24 08:57:19 +03:00
Michał Jadwiszczak
246728cbbb querier_cache: add stats of scheduling group mismatches
Add stats to count dropped queriers because of scheduling group
mismatch.
2023-07-21 19:05:55 +02:00
Michał Jadwiszczak
a5fc53aa11 querier_cache: check semaphore mismatch during querier lookup
Previously, semaphore mismatch was checked only in multi-partition
queries and, if it happened, an internal error was thrown.

This commit pushes the check down to `querier_cache`, so each
`lookup_*_querier` method will check for the mismatch.

What's more, if semaphore mismatch occurs, check whether both semaphores belong
to user. If so, log a warning and drop cached reader instead of
throwing an error.

The mismatch can happen if user's scheduling group changed during
a query. We don't want to throw an error then, but drop and reset
cached reader.
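
The decision described above can be sketched as a small pure function.
The semaphore struct, the lookup_action enum and on_querier_lookup() are
hypothetical names for illustration; logging and the actual reader
handling are left out.

```cpp
#include <cassert>
#include <string>

// Toy semaphore: identity is the object itself, is_user marks user load.
struct semaphore {
    std::string name;
    bool is_user;
};

enum class lookup_action { reuse, drop_cached_reader, internal_error };

// What to do with a cached querier created under `cached` when it is
// looked up under `current`.
inline lookup_action on_querier_lookup(const semaphore& cached, const semaphore& current) {
    if (&cached == &current) {
        return lookup_action::reuse;  // no mismatch
    }
    if (cached.is_user && current.is_user) {
        // e.g. the user's scheduling group changed during the query
        return lookup_action::drop_cached_reader;
    }
    return lookup_action::internal_error;
}

// Usage check covering the three outcomes.
inline void check_lookup_actions() {
    const semaphore user_a{"user_a", true};
    const semaphore user_b{"user_b", true};
    const semaphore system_sem{"system", false};
    assert(on_querier_lookup(user_a, user_a) == lookup_action::reuse);
    assert(on_querier_lookup(user_a, user_b) == lookup_action::drop_cached_reader);
    assert(on_querier_lookup(user_a, system_sem) == lookup_action::internal_error);
}
```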
2023-07-21 19:05:50 +02:00
Michał Jadwiszczak
e5c965b280 querier_cache: add reference to replica::database::is_user_semaphore() 2023-07-21 18:58:57 +02:00
Jan Ciolek
decbc841b7 cql3/prepare_expr: fix partially preparing function arguments
Before choosing a function, we prepare the arguments that can be
prepared without a receiver. Preparing an argument makes
its type known, which allows choosing the best overload
among many possible functions.

The function that prepared the argument passes the unprepared
argument by mistake. Let's fix it so that it actually uses
the prepared argument.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #14786
2023-07-21 18:59:56 +03:00
Jan Ciolek
cbc97b41d4 cql.g: make the parser reject INSERT JSON without a JSON value
We allow inserting column values using a JSON value, eg:
```cql
INSERT INTO mytable JSON '{ "\"myKey\"": 0, "value": 0}';
```

When no JSON value is specified, the query should be rejected.

Scylla used to crash in such cases. A recent change fixed the crash
(https://github.com/scylladb/scylladb/pull/14706), it now fails
on unwrapping an uninitialized value, but really it should
be rejected at the parsing stage, so let's fix the grammar so that
it doesn't allow JSON queries without JSON values.

A unit test is added to prevent regressions.

Refs: https://github.com/scylladb/scylladb/pull/14707
Fixes: https://github.com/scylladb/scylladb/issues/14709

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #14785
2023-07-21 18:52:47 +03:00
Kefu Chai
d78c6d5f50 test: use different table names in simple_backlog_controller_test
in `simple_backlog_controller_test`, we need to have multiple tables
at the same time. but the default constructor of `simple_schema` always
creates schema with the table name of "ks.cf". we are going to have
per-table metrics, and the new metric group will use the table name
as its counter labels, so we need to either disable these per-table
metrics or use a different table name for each table.

as in real world, we don't have multiple tables at the same time. it
would be better to stop reusing the same table name in a single test
session. so, in this change, we use a random cf_name for each of
the created tables.

Fixes #14767
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-21 19:08:29 +08:00
Kefu Chai
1f596e4669 test/lib/simple_schema: add ctor for customizing ks.cf
some low level tests, like the ones exercising sstables, create
multiple tables. and we are going to add per-table metrics and
the new metrics use the ks.cf as part of their unique id. so,
once the per-table metrics are enabled, the sstable tests would fail,
as the metrics subsystem does not allow registering multiple
metric groups with the same name.

so, in this change, we add a new constructor for `simple_schema`,
so that we can customize the schema's ks and cf when creating
the `simple_schema`. in the next commit, we will use this new
constructor in a sstable test which creates multiple tables.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-21 19:07:45 +08:00
Kefu Chai
306439d3aa test/lib/simple_schema: do not hardwire ks.cf
instead, query the name of ks and cf from the schema. this change
prepares us for a simple_schema whose ks and cf can be customized
by its constructor.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-21 19:07:45 +08:00
Mikołaj Grzebieluch
37ceef23a6 test: raft: skip test_old_ip_notification_repro in debug mode
Closes #14777
2023-07-21 12:41:03 +02:00
Pavel Emelyanov
db1c6e2255 system_keyspace: Make save_truncation_record() non-static
... and stop using qctx

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-21 13:12:50 +03:00
Pavel Emelyanov
eaeffcdb81 code: Pass sharded<db::system_keyspace>& to database::truncate()
The argument goes via the db::(drop|truncate)_table_on_all_shards()
pair of calls that start from

- storage_proxy::remote: has its sys.ks reference already
- schema_tables::merge_schema: has sys.ks argument already
- legacy_schema_migrator: the reference was added by previous patch
- tests: run in cql_test_env with sys.ks on board

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-21 13:11:59 +03:00
Pavel Emelyanov
1ef34a5ada db: Add sharded<system_keyspace>& to legacy_schema_migrator
One of the class' methods calls db::drop_table_on_all_shards() that will
need sys.ks in the next patch.

The reference in question is provided from the only caller -- main.cc

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-21 12:38:46 +03:00
Kefu Chai
a87b0d68cd s3/test: remove the tempdir if test succeeds
in 46616712, we tried to keep the tmpdir only if the test failed,
and keep up to 1 of them using the recently introduced
option of `tmp_path_retention_count`. but it turns out this option
is not supported by the pytest used by our jenkins nodes, where we
have pytest 6.2.5. this is the one shipped along with fedora 36.

so, in this change, the tempdir is removed if the test completes
without failures. the tempdir contains a huge number of files,
and jenkins is quite slow scanning them, so after nuking the tempdir,
jenkins will be much faster when scanning for the artifacts.

Fixes #14690
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14772
2023-07-21 12:21:51 +03:00