scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-05 06:23:03 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	e3a8bb7ec9	tablets: Introduce global_tablet_id Identifies tablet in the scope of the whole cluster. Not to be confused with tablet replicas, which all share global_tablet_id. Will be needed by load balancer and tablet migration algorithm to identify tablets globally.	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	f88220aeee	stream_transfer_task, multishard_writer: Work with table sharder So that we can use it on tablet-based tables.	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	8cf92d4c86	tablets: Turn tablet_id into a struct The IDL compiler cannot deal with enum classes like this.	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	c2b18ae483	db: Do not create per-keyspace erm for tablet-based tables This erm is not updated when replicating token metadata in storage_service::replicate_to_all_cores() so will pin token metadata version and prevent token metadata barrier from finishing. It is not necessary to have per-keyspace erm for tablet-based tables, so just don't create it.	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	91dee5c872	tablets: effective_replication_map: Take transition stage into account when computing replicas	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	dc2ec3f81c	tablets: Store "stage" in transition info It's needed to implement tablet migration. It stores the current step of tablet migration state machine. The state machine will be advanced by the topology change coordinator. See the "Tablet migration" section of topology-over-raft.md	2023-07-25 21:08:02 +02:00
Tomasz Grabiec	05519bd5e5	doc: Document tablet migration state machine and load balancer	2023-07-25 21:08:02 +02:00
Tomasz Grabiec	7851694eaa	locator: erm: Make get_endpoints_for_reading() always return read replicas Just a simplification. Drop the test case from token_metadata which creates pending endpoints without normal tokens. It fails after this change with exception: "sorted_tokens is empty in first_token_index!" thrown from token_metadata::first_token_index(), which is used when calculating normal endpoints. This test case is not valid, first node inserts its tokens as normal without going through bootstrap procedure.	2023-07-25 21:08:01 +02:00
Tomasz Grabiec	b642e69eb3	storage_service: topology_coordinator: Sleep on failure between retries Avoid failing in a tight loop. Can happen if some node is down, for example.	2023-07-25 21:08:01 +02:00
Tomasz Grabiec	f0e9dbf911	storage_service: topology_coordinator: Simplify coordinator loop This refactoring removes a boolean and branching which makes it easier to reason about the flow, and easier to extend it with more steps.	2023-07-25 21:08:01 +02:00
Tomasz Grabiec	b294932cf1	main: Require experimental raft to enable tablets Tablets depend on the topology changes on raft feature. Drop "tablets" from suite.yaml of the topology/ suite, which doesn't use tablets anymore.	2023-07-25 21:08:01 +02:00
Pavel Emelyanov	c46c57d535	messaging_service: Clear list of clients on shutdown When messaging_service shuts down it first sets _shutting_down to true and proceeds with stopping clients and servers. Stopping clients, in turn, is calling client.stop() on each. Setting _shutting_down is used in two places. First, when a client is stopped it may happen that it's in the middle of some operation, which may result in call to remove_error_rpc_client() and not to call .stop() for the second time it just does nothing if the shutdown flag is set (see `357c91a076`). Second, get_rpc_client() asserts that this flag is not set, so once shutdown started it can make sure that it will call .stop() on _all_ clients and no new ones would appear in parallel. However, after shutdown() is complete the _clients vector of maps remains intact even though all clients from it are stopped. This is not very debugging-friendly, the clients are better be removed on shutdown. fixes: #14624 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #14632	2023-07-25 13:08:20 +03:00
Botond Dénes	ed025890e5	scripts/coverage.py: --run: swallow KeyboardInterrupt It is quite common to stop a tested scylla process with ^C, which will raise KeyboardInterrupt from subprocess.run(). Catch and swallow this exception, allowing the post-processing to continue. The interrupted process has to handle the interrupt correctly too -- flush the coverage data even on premature exit -- but this is for another patch. Closes #14815	2023-07-25 12:29:22 +03:00
Kefu Chai	2943d3c1b0	tools/scylla-sstable: s/foo.find(bar) != foo.end()/foo.count(bar) != 0/ just for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14816	2023-07-25 11:38:44 +03:00
Raphael S. Carvalho	0ac43ea877	Fix stack-use-after-return in mutation source excluding staging The new test detected a stack-use-after-return when using table's as_mutation_source_excluding_staging() for range reads. This doesn't really affect view updates that generate single key reads only. So the problem was only stressed in the recently added test. Otherwise, we'd have seen it when running dtests (in debug mode) that stress the view update path from staging. The problem happens because the closure was fed into a noncopyable_function that was taken by reference. For range reads, we defer before subsequent usage of the predicate. For single key reads, we only defer after finished using the predicate. Fix is about using sstable_predicate type, so there won't be a need to construct a temporary object on stack. Fixes #14812. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14813	2023-07-25 10:38:20 +03:00
Botond Dénes	3eec990e4e	Merge 'test: use different table names in simple_backlog_controller_test ' from Kefu Chai in this series, we use different table names in simple_backlog_controller_test. this test is a test exercising sstables compaction strategies. and it creates and keeps multiple tables in a single test session. but we are going to add metrics on per-table basis, and will use the table's ks and cf as the counter's labels. as the metrics subsystem does not allow multiple counters to share the same label. the test will fail when the metrics are being added. to address this problem, in this change 1. a new ctor is added for `simple_schema`, so we can create `simple_schema` with different names 2. use the new ctor in simple_backlog_controller_test Fixes #14767 Closes #14783 * github.com:scylladb/scylladb: test: use different table names in simple_backlog_controller_test test/lib/simple_schema: add ctor for customizing ks.cf test/lib/simple_schema: do not hardwire ks.cf	2023-07-25 10:26:33 +03:00
Anna Stuchlik	f6732865b9	doc: doc: move unified installer from web to docs This commit adds the information on how to install ScyllaDB without root privileges (with "unified installer", but we've decided to drop that name - see the page title). The content taken from the website https://www.scylladb.com/download/?platform=tar&version=scylla-5.2#open-source is divided into two sections: "Download and Install" and "Configure and Run ScyllaDB". In addition, the "Next Steps" section is also copied from the website, and adjusted to be in sync with other installation pages in the docs. Refs https://github.com/scylladb/scylla-docs/issues/4091 Closes #14781	2023-07-25 10:23:02 +03:00
Benny Halevy	a07440173f	storage_service: node_ops_ctl: send_to_all: fix "Node is down for" log message args order The node and op_desc args are reversed. Fixes #14807 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #14808	2023-07-24 21:13:06 +03:00
Petr Gusev	5fb8da4181	hints: add fencing In this commit we just pass a fencing_token through hint_mutation RPC verb. The hints manager uses either storage_proxy::send_hint_to_all_replicas or storage_proxy::send_hint_to_endpoint to send a hint. Both methods capture the current erm and use the corresponding fencing token from it in the mutation or hint_mutation RPC verb. If these verbs are fenced out, the server stale_topology_exception is translated to a mutation_write_failure_exception on the client with an appropriate error message. The hint manager will attempt to resend the failed hint from the commitlog segment after a delay. However, if delivery is unsuccessful, the hint will be discarded after gc_grace_seconds. Closes #14580	2023-07-24 18:12:48 +02:00
Tomasz Grabiec	5b30931406	Merge 'raft topology: restore gossiper eps' from Gusev Petr We don't load gossiper endpoint states in `storage_service::join_cluster` if `_raft_topology_change_enabled`, but gossiper is still needed even in case of `_raft_topology_change_enabled` mode, since it still contains part of the cluster state. To work correctly, the gossiper needs to know the current endpoints. We cannot rely on seeds alone, since it is not guaranteed that seeds will be up to date and reachable at the time of restart. The problem was demonstrated by the test `test_joining_old_node_fails`, it fails occasionally with `experimental_features: [consistent-topology-changes]` on the line where it waits for `TEST_ONLY_FEATURE` to become enabled on all nodes. This doesn't happen since `SUPPORTED_FEATURES` gossiper state is not disseminated, and feature_service still relies on gossiper to disseminate information around the cluster. The series also contains a fix for a problem in `gossiper::do_send_ack2_msg`, see commit message for details. Fixes #14675 Closes #14775 * github.com:scylladb/scylladb: storage_service: restore gossiper endpoints on topology_state_load fix gossiper: do_send_ack2_msg fix	2023-07-24 13:55:50 +02:00
Botond Dénes	a8feb7428d	Merge 'semaphore mismatch: don't throw an error if both semaphores belong to user' from Michał Jadwiszczak If semaphore mismatch occurs, check whether both semaphores belong to user. If so, log a warning, log a `querier_cache_scheduling_group_mismatches` stat and drop cached reader instead of throwing an error. Until now, semaphore mismatch was only checked in multi-partition queries. The PR pushes the check to `querier_cache` and performs it on all `lookup_*_querier` methods. The mismatch can happen if user's scheduling group changed during a query. We don't want to throw an error then, but drop and reset cached reader. This patch doesn't solve a problem with mismatched semaphores because of changes in service levels/scheduling groups but only mitigates it. Refers: https://github.com/scylladb/scylla-enterprise/issues/3182 Refers: https://github.com/scylladb/scylla-enterprise/issues/3050 Closes: #14770 Closes #14736 * github.com:scylladb/scylladb: querier_cache: add stats of scheduling group mismatches querier_cache: check semaphore mismatch during querier lookup querier_cache: add reference to `replica::database::is_user_semaphore()` replica:database: add method to determine if semaphore is user one	2023-07-24 14:13:09 +03:00
Petr Gusev	75694aa080	storage_service: restore gossiper endpoints on topology_state_load fix We don't load gossiper endpoint states in storage_service::join_cluster if _raft_topology_change_enabled, but gossiper is still needed even in case of _raft_topology_change_enabled mode, since it still contains part of the cluster state. To work correctly, the gossiper needs to know the current endpoints. We cannot rely on seeds alone, since it is not guaranteed that seeds will be up to date and reachable at the time of restart. The specific scenario of the problem: cluster with three nodes, the second has the first in seeds, the third has the first and second. We restart all the nodes simultaneously, the third node uses its seeds as _endpoints_to_talk_with in the first gossiper round and sends SYN to the first and second. The first node hasn't started its gossiper yet, so handle_syn_msg returns immediately after if (!this->is_enabled()); The third node receives ack from the second node and no communication from the first node, so it fills its _live_endpoints collection with the second node and will never communicate with the first node again. The problem was demonstrated by the test test_joining_old_node_fails, it fails occasionally with experimental_features: [consistent-topology-changes] on the line where it waits for TEST_ONLY_FEATURE to become enabled on all nodes. This doesn't happen since SUPPORTED_FEATURES gossiper state is not disseminated because of the problem described above. The first commit is needed since add_saved_endpoint adds the endpoint with some default app states with locally incrementing versions and without that fix gossiper refuses to fill the real app states for this endpoint later. Fixes: #14675	2023-07-24 12:36:39 +04:00
Kamil Braun	e6099c4685	Merge 'config: set schema_commitlog_segment_size_in_mb to 128 ' from Patryk Jędrzejczak Fixes #14668 In #14668, we have decided to introduce a new `scylla.yaml` variable for the schema commitlog segment size and set it to 128MB. The reason is that segment size puts a limit on the mutation size that can be written at once, and some schema mutation writes are much larger than average, as shown in #13864. This `schema_commitlog_segment_size_in_mb` variable is now added to `scylla.yaml` and `db/config`. Additionally, we do not derive the commitlog sync period for schema commitlog anymore because schema commitlog runs in batch mode, so it doesn't need this parameter. It has also been discussed in #14668. Closes #14704 * github.com:scylladb/scylladb: replica: do not derive the commitlog sync period for schema commitlog config: set schema_commitlog_segment_size_in_mb to 128 config: add schema_commitlog_segment_size_in_mb variable	2023-07-24 10:23:34 +02:00
Petr Gusev	87cd7e8741	gossiper: do_send_ack2_msg fix This commit is a first part of the fix for #14675. The issue is about the test test_joining_old_node_fails failing occasionally with experimental_features: [consistent-topology-changes]. The next commit contains a fix for it, here we solve the pre-existing gossiper problem which we stumble upon after the fix. Local generation for addr may have been increased since the current node sent an initial SYN. Comparing versions across different generations in get_state_for_version_bigger_than could result in losing some app states with smaller versions. More specifically, consider a cluster with nodes .1, .2, .3; .3 has .1 and .2 as seeds, .2 has .1 as a seed. Suppose .2 receives a SYN from .3 before its gossiper starts, and it has a version 0.24 for .1 in endpoint_states. The digest from .3 contains 0.25 as a version for .1, so examine_gossiper produces .1->0.24 as a digest and this digest is sent to .3 as part of the ack. Before processing this ack, .3 processed an ack from .1 (scylla sends SYN to many nodes) and updates its endpoint_states according to it, so now it has .1->100500.32 for .1. Then we get to do_send_ack2_msg and call get_state_for_version_bigger_than(.1, 24). This returns properties which have version > 24, ignoring a lot of them with smaller versions which have been received from .1. Also, get_state_for_version_bigger_than updates generation (it copies get_heart_beat_state from .3), so when we apply the ack in handle_ack2_msg at .2 we update the generation and now the skipped app states will only be updated on .2 if somebody changes them and increments their version. Cassandra behaviour is the same in this case (see https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/gms/GossipDigestAckVerbHandler.java#L86). This is probably less of a problem for them since most of the time they send only one SYN in one gossiper round (save for unreachable nodes), so there is less room for conflicts.	2023-07-24 11:52:56 +04:00
Kefu Chai	3ad844a4bb	build: cmake: set scylla version strings as CACHED strings before this change, add_version_library() is a single function which accomplishes two tasks: 1. build scylla-version target using 2. add an object library but this has two problems: 1. we should run `SCYLLA-VERSION-GEN` at configure time, instead of at build time. otherwise the targets which read from the SCYLLA-{VERSION, RELEASE, PRODUCT}-FILE cannot access them, unless they are able to read them in their build rules. but they always use `file(STRINGS ..)` to read them, and these `file()` commands are executed at configure time. so, this is a dead end. 2. we repeat the `file(STRINGS ..)` in multiple places. this is not ideal if we want to minimize the repetition. so, to address this problem, in this change: 1. use `execute_process()` instead of `add_custom_command()` for generating these *-FILE files. so they are always ready at build time. this partially reverts `bb7d99ad37`. 2. extract `generate_scylla_version()` out of `add_version_library()`. so we can call the former much earlier than the latter. this would allow us to reference the variables defined by the `generate_scylla_version()` much earlier. 3. define cached strings in the extracted function, so that they can be consumed by other places. 4. reference the cached variables in `build_submodule.cmake`. also, take this opportunity to fix the version string used in build_submodule.cmake: we should have used `scylla_version_tilde`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14769	2023-07-24 08:57:19 +03:00
Michał Jadwiszczak	246728cbbb	querier_cache: add stats of scheduling group mismatches Add stats to count dropped queriers because of scheduling group mismatch.	2023-07-21 19:05:55 +02:00
Michał Jadwiszczak	a5fc53aa11	querier_cache: check semaphore mismatch during querier lookup Previously semaphore mismatch was checked only in multi-partition queries and if happened, an internal error was thrown. This commit pushed the check down to `querier_cache`, so each `lookup_*_querier` method will check for the mismatch. What's more, if semaphore mismatch occurs, check whether both semaphores belong to user. If so, log a warning and drop cached reader instead of throwing an error. The mismatch can happen if user's scheduling group changed during a query. We don't want to throw an error then, but drop and reset cached reader.	2023-07-21 19:05:50 +02:00
Michał Jadwiszczak	e5c965b280	querier_cache: add reference to `replica::database::is_user_semaphore()`	2023-07-21 18:58:57 +02:00
Jan Ciolek	decbc841b7	cql3/prepare_expr: fix partially preparing function arguments Before choosing a function, we prepare the arguments that can be prepared without a receiver. Preparing an argument makes its type known, which allows to choose the best overload among many possible functions. The function that prepared the argument passes the unprepared argument by mistake. Let's fix it so that it actually uses the prepared argument. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com> Closes #14786	2023-07-21 18:59:56 +03:00
Jan Ciolek	cbc97b41d4	cql.g: make the parser reject INSERT JSON without a JSON value We allow inserting column values using a JSON value, eg: ```cql INSERT INTO mytable JSON '{ "\"myKey\"": 0, "value": 0}'; ``` When no JSON value is specified, the query should be rejected. Scylla used to crash in such cases. A recent change fixed the crash (https://github.com/scylladb/scylladb/pull/14706), it now fails on unwrapping an uninitialized value, but really it should be rejected at the parsing stage, so let's fix the grammar so that it doesn't allow JSON queries without JSON values. A unit test is added to prevent regressions. Refs: https://github.com/scylladb/scylladb/pull/14707 Fixes: https://github.com/scylladb/scylladb/issues/14709 Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com> Closes #14785	2023-07-21 18:52:47 +03:00
Kefu Chai	d78c6d5f50	test: use different table names in simple_backlog_controller_test in `simple_backlog_controller_test`, we need to have multiple tables at the same time. but the default constructor of `simple_schema` always creates schema with the table name of "ks.cf". we are going to have a per-table metrics. and the new metric group will use the table name as its counter labels, so we need to either disable this per-table metrics or use a different table name for each table. as in real world, we don't have multiple tables at the same time. it would be better to stop reusing the same table name in a single test session. so, in this change, we use a random cf_name for each of the created table. Fixes #14767 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-21 19:08:29 +08:00
Kefu Chai	1f596e4669	test/lib/simple_schema: add ctor for customizing ks.cf some low level tests, like the ones exercising sstables, create multiple tables. and we are going to add per-table metrics and the new metrics use the ks.cf as part of their unique id. so, once the per-table metrics are enabled, the sstable tests would fail. as the metrics subsystem does not allow registering multiple metric groups with the same name. so, in this change, we add a new constructor for `simple_schema`, so that we can customize the schema's ks and cf when creating the `simple_schema`. in the next commit, we will use this new constructor in a sstable test which creates multiple tables. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-21 19:07:45 +08:00
Kefu Chai	306439d3aa	test/lib/simple_schema: do not hardwire ks.cf instead, query the name of ks and cf from the schema. this change prepares us for a simple_schema whose ks and cf can be customized by its constructor. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-21 19:07:45 +08:00
Mikołaj Grzebieluch	37ceef23a6	test: raft: skip test_old_ip_notification_repro in debug mode Closes #14777	2023-07-21 12:41:03 +02:00
Kefu Chai	a87b0d68cd	s3/test: remove the tempdir if test succeeds in `46616712`, we tried to keep the tmpdir only if the test failed, and keep up to 1 of them using the recently introduced option of `tmp_path_retention_count`. but it turns out this option is not supported by the pytest used by our jenkins nodes, where we have pytest 6.2.5. this is the one shipped along with fedora 36. so, in this change, the tempdir is removed if the test completes without failures. as the tempdir contains huge number of files, and jenkins is quite slow scanning them. after nuking the tempdir, jenkins will be much faster when scanning for the artifacts. Fixes #14690 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14772	2023-07-21 12:21:51 +03:00
Nadav Har'El	5860820934	Merge 'mutation/mutation_compactor: validate the input stream' from Botond Dénes The mutation compactor has a validator which it uses to validate the stream of mutation fragments that passes through it. This validator is supposed to validate the stream as it enters the compactor, as opposed to its compacted form (output). This was true for most fragment kinds except range tombstones, as purged range tombstones were not visible to the validator for the most part. This mistake was introduced by https://github.com/scylladb/scylladb/commit `e2c9cdb576`, which itself was a flawed attempt at fixing an error seen because purged tombstones were not terminated by the compactor. This patch corrects this mistake by fixing the above problem properly: on page-cut, if the validator has an active tombstone, a closing tombstone is generated for it, to avoid the false-positive error. With this, range tombstones can be validated again as they come in. The existing unit test checking the validation in the compactor is greatly expanded to check all (I hope) different validation scenarios. Closes #13817 * github.com:scylladb/scylladb: test/mutation_test: test_compactor_validator_sanity_test mutation/mutation_compactor: fix indentation mutation/mutation_compactor: validate the input stream mutation: mutation_fragment_stream_validating_filter: add accessor to underlying validator readers: reader-from-fragment: don't modify stream when created without range	2023-07-21 00:26:46 +03:00
Avi Kivity	e00811caac	cql3: grammar: reject intValue with no contents The grammar mistakenly allows nothing to be parsed as an intValue (itself accepted in LIMIT and similar clauses). Easily fixed by removing the empty alternative. A unit test is added. Fixes #14705. Closes #14707	2023-07-21 00:24:51 +03:00
Pavel Emelyanov	98609e2115	Merge 's3/test: close using deferred_close() or deferred()' from Kefu Chai let's use RAII to tear down the client and the input file, so we can always perform the cleanups even if the test throws. Closes #14765 * github.com:scylladb/scylladb: s3/test: use seastar::deferred() to perform cleanup s3/test: close using deferred_close()	2023-07-20 20:05:34 +03:00
Botond Dénes	bf6186ed7e	Update tools/java submodule * tools/java 9f63a96f...585b30fd (1): > cassandra-stress: add support for using RackAwareRoundRobinPolicy	2023-07-20 18:13:32 +03:00
Botond Dénes	819b45d107	Merge 'Remove dead replacing_nodes_pending_ranges_updater manipulations' from Pavel Emelyanov The set in question is read-and-delete-only and thus always empty. Originally it was removed by commit `c9993f020d` (storage_service: get rid of handle_state_replacing), but some dangling ends were left. Consequentially, the on_alive() callback can get rid of few dead if-else branches Closes #14762 * github.com:scylladb/scylladb: storage_service: Relax on_alive() storage_service: Remove _replacing_nodes_pending_ranges_updater	2023-07-20 16:55:44 +03:00
Pavel Emelyanov	9df750fd4c	storage_service: Remove dead get_rpc_address() Unused. Locator calls gossiper directly Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #14761	2023-07-20 16:54:24 +03:00
Botond Dénes	53da97416a	Merge 'Remove qctx from system.paxos table access methods' from Pavel Emelyanov The "fix" is straightforward -- callers of system_keyspace::paxos methods need to get system keyspace from somewhere. This time the only caller is storage_proxy::remote that can have system keyspace via direct dependency reference. Closes #14758 * github.com:scylladb/scylladb: db/system_keyspace: Move and use qctx::execute_cql_with_timeout() db/system_keyspace: Make paxos methods non-static service/paxos: Add db::system_keyspace& argument to some methods test: Optionally initialize proxy remote for cql_test_env proxy/remote: Keep sharded<db::system_keyspace>& dependency	2023-07-20 16:53:25 +03:00
Botond Dénes	e62325babc	Merge 'Compaction reshard task' from Aleksandra Martyniuk Task manager tasks covering reshard compaction. Reattempt on https://github.com/scylladb/scylladb/pull/14044. Bugfix for https://github.com/scylladb/scylladb/issues/14618 is squashed with 95191f4. Regression test added. Closes #14739 * github.com:scylladb/scylladb: test: add test for resharding with non-empty owned_ranges_ptr test: extend test_compaction_task.py to test resharding compaction compaction: add shard_reshard_sstables_compaction_task_impl compaction: invoke resharding on sharded database compaction: move run_resharding_jobs into reshard_sstables_compaction_task_impl::run() compaction: add reshard_sstables_compaction_task_impl compaction: create resharding_compaction_task_impl	2023-07-20 16:43:22 +03:00
Botond Dénes	a35f4f6985	test/mutation_test: test_compactor_validator_sanity_test Greatly expand this test to check that the compactor validates the input stream properly. The test is renamed (the _sanity_test suffix is removed) to reflect the expanded scope.	2023-07-20 08:48:50 -04:00
Botond Dénes	18ed94e60b	mutation/mutation_compactor: fix indentation Left broken by the previous patch.	2023-07-20 08:48:50 -04:00
Botond Dénes	3d5b70e0d7	mutation/mutation_compactor: validate the input stream The mutation compactor has a validator which it uses to validate the stream of mutation fragments that passes through it. This validator is supposed to validate the stream as it enters the compactor, as opposed to its compacted form (output). This was true for most fragment kinds except range tombstones, as purged range tombstones were not visible to the validator for the most part. This mistake was introduced by `e2c9cdb576`, which itself was a flawed attempt at fixing an error seen because purged tombstones were not terminated by the compactor. This patch corrects this mistake by fixing the above problem properly: on page-cut, if the validator has an active tombstone, a closing tombstone is generated for it, to avoid the false-positive error. With this, range tombstones can be validated again as they come in.	2023-07-20 08:48:50 -04:00
Botond Dénes	dbb2a6f03a	mutation: mutation_fragment_stream_validating_filter: add accessor to underlying validator	2023-07-20 08:48:50 -04:00
Botond Dénes	93dd16fccc	readers: reader-from-fragment: don't modify stream when created without range The fragment reader currently unconditionally forwards its buffer to the passed-in partition range. Even if this range is `query::full_partition_range`, this will involve dropping any fragments up to the first partition start. This causes problems for test users who intentionally create invalid fragment streams, that don't start with a partition-start. Refactor the reader to not do any modifications on the stream, when neither slice, nor partition-range was passed by the user.	2023-07-20 08:48:50 -04:00
Kefu Chai	fdf61d2f7c	compaction_manager: prevent gc-only sstables from being compacted before this change, there are chances that the temporary sstables created for collecting the GC-able data created by a certain compaction can be picked up by another compaction job. this wastes the CPU cycles, adds write amplification, and causes inefficiency. in general, these GC-only SSTables are created with the same run id as those non-GC SSTables, but when a new sstable exhausts input sstable(s), we proactively replace the old main set with a new one so that we can free up the space as soon as possible. so the GC-only SSTables are added to the new main set along with the non-GC SSTables, but since the former have a good chance of overlapping the latter, these GC-only SSTables are assigned different run ids. but we fail to register them to the `compaction_manager` when replacing the main sstable set. that's why future compactions pick them up when performing compaction, when the compaction which created them is not yet completed. so, in this change, * to prevent sstables in the transient stage from being picked up by regular compactions, a new interface class is introduced so that the sstable is always added to registration before it is added to sstable set, and removed from registration after it is removed from sstable set. the struct helps to consolidate the registration related logic in a single place, and helps to make it more obvious that the timespan of an sstable in the registration should cover that in the sstable set. * use a different run_id for the gc sstable run, as it can overlap with the output sstable run. the run_id for the gc sstable run is created only when the gc sstable writer is created. because the gc sstables are not always created for all compactions. please note, all (indirect) callers of `compaction_task_executor::compact_sstables()` pass a non-empty `std::function` to this function, so there is no need to check for empty before calling it. so in this change, the check is dropped. Fixes #14560 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14725	2023-07-20 15:47:48 +03:00
Asias He	865891cf02	doc: Repair system_auth with nodetool repair -pr option Since repair is performed on all nodes, each node can just repair the primary ranges instead of all owned ranges. This avoids repair ranges more than once. Closes #14766	2023-07-20 15:12:20 +03:00
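
For readers unfamiliar with the pattern described in ed025890e5 (scripts/coverage.py: swallow KeyboardInterrupt), the following is a minimal sketch of the idea and not the actual patch. The function name `run_then_postprocess` and its `postprocess` and `runner` parameters are hypothetical, introduced only for illustration; the only thing taken from the commit message is that `subprocess.run()` raises `KeyboardInterrupt` on ^C and that post-processing should still continue.

```python
import subprocess


def run_then_postprocess(cmd, postprocess, runner=subprocess.run):
    """Run cmd, then call postprocess() even if the run was stopped with ^C.

    `runner` defaults to subprocess.run; it is a parameter only so the
    behaviour can be exercised without sending a real signal (an assumption
    of this sketch, not something the commit describes).
    """
    interrupted = False
    try:
        runner(cmd)
    except KeyboardInterrupt:
        # The user stopped the tested process with ^C. Swallow the
        # exception so the post-processing step below still runs.
        interrupted = True
    postprocess()
    return interrupted
```

As the commit message notes, the interrupted process must itself flush its coverage data on premature exit for the post-processing to see anything useful; this sketch does not address that part.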

38069 Commits