scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 22:25:48 +00:00

Author	SHA1	Message	Date
Avi Kivity	03313d359e	Merge ' db: commitlog_replayer: ignore mutations affected by (tablet) cleanups ' from Michał Chojnowski To avoid data resurrection, mutations deleted by cleanup operations should be skipped during commitlog replay. This series implements the above for tablet cleanups, by using a new system table which holds records of cleanup operations. Fixes #16752 Closes scylladb/scylladb#16888 * github.com:scylladb/scylladb: test: test_tablets: add a test for cleanup after migration test: pylib: add ScyllaCluster.wipe_sstables test: boost: add commitlog_cleanup_test db: commitlog_replayer: ignore mutations affected by (tablet) cleanups replica: table: garbage-collect irrelevant system.commitlog_cleanups records db: commitlog: add min_position() replica: table: populate system.commitlog_cleanups on tablet cleanup db: system_keyspace: add system.commitlog_cleanups replica: table: refresh compound sstable set after tablet cleanup	2024-01-25 20:51:03 +02:00
Patryk Wrobel	a858daf038	service/client_state.cc: remove redundant copying db::schema_tables::all_table_names() returns std::vector<sstring>. Usage of range-for loop without reference results in copying each of the elements of the traversed container. Such copying is redundant. This change introduces usage of const reference to avoid copying. Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#16983	2024-01-25 20:35:05 +02:00
Kamil Braun	543ad0987a	Merge 'raft topology: send barrier_and_drain to a decommissioning node' from Patryk Jędrzejczak We didn't send the `barrier_and_drain` command to a decommissioning node that could still be coordinating requests. It could happen that a decommissioning node sent a request with an old topology version after normal nodes received the new fence version. Then, the request would fail on replicas with the stale topology exception. This PR fixes this problem by modifying `exec_global_command`. From now on, it sends `barrier_and_drain` to a decommissioning node. We also stop filtering stale topology exceptions in `test_topology_ops`. We added this filter after detecting the bug fixed by this PR. Fixes scylladb/scylladb#15804 Fixes scylladb/scylladb#16579 Fixes scylladb/scylladb#16642 Closes scylladb/scylladb#16797 * github.com:scylladb/scylladb: test: test_topology_ops: remove failed mutations filter raft topology: send barrier_and_drain to a decommissioning node raft topology: ensure at most one transitioning node	2024-01-25 16:09:02 +01:00
Kefu Chai	ee28cf2285	test.py: s/defalt/default/ this typo was identified by codespell Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16980	2024-01-25 16:54:07 +02:00
Botond Dénes	6d5ee6d48a	Merge 'test/nodetool: run nodetool tests using "unshare"' from Kefu Chai before this change, we use a random address when launching rest_api_mock server, but there are chances that the randomly picked address conflicts with an already-used address on the host. and the subprocess fails right away with the returncode of 1 upon this failure, but we just continue on and check the readiness of the already-dead server. actually, we've seen test failures caused by the EADDRINUSE failure, and when we checked the readiness of the rest_api_mock by sending HTTP request and reading the response, what we had is not a JSON encoded response but a webpage, which was likely the one returned by a minio server. in this change, we * specify the "launcher" option of nodetool test suite to "unshare", so that all its tests are launched in separated namespaces. * do not use a random address for the mock server, as the network namespaces are separated. Fixes #16542 Closes scylladb/scylladb#16773 * github.com:scylladb/scylladb: test/nodetool: run nodetool tests using "unshare" test.py: add "launcher" option support	2024-01-25 16:53:49 +02:00
Botond Dénes	c67698ea06	compaction/compaction_manager: perform_cleanup(): hold the compaction gate While the cleanup is ongoing. Otherwise, a concurrent table drop might trigger a use-after-free, as we have seen in dtests recently. Fixes: #16770 Closes scylladb/scylladb#16874	2024-01-25 14:52:50 +01:00
Pavel Emelyanov	bf3cae4992	Merge 'tests: utils: error injection: print time duration instead of count' from Kefu Chai before this change, we always cast the wait duration to milliseconds, even if it could be using a higher resolution. actually `std::chrono::steady_clock` is using `nanosecond` for its duration, so if we inject a deadline using `steady_clock`, we could be awakened earlier due to the narrowing of the duration type caused by the duration_cast. in this change, we just use the duration as it is. this should allow the caller to use the resolution provided by Seastar without losing the precision. the tests are updated to print the time duration instead of count to provide information with a higher resolution. Fixes #15902 Closes scylladb/scylladb#16264 * github.com:scylladb/scylladb: tests: utils: error injection: print time duration instead of count error_injection: do not cast to milliseconds when injecting timeout	2024-01-25 16:13:27 +03:00
Avi Kivity	69d597075a	Merge 'tablets: Add support for removenode and replace handling' from Tomasz Grabiec New tablet replicas are allocated and rebuilt synchronously with node operations. They are safely rebuilt from all existing replicas. The list of ignored nodes passed to node operations is respected. Tablet scheduler is responsible for scheduling tablet rebuilding transition which changes the replicas set. The infrastructure for handling decommission in tablet scheduler is reused for this. Scheduling is done incrementally, respecting per-shard load limits. Rebuilding transitions are recognized by load calculation to affect all tablet replicas. New kind of tablet transition is introduced called "rebuild" which adds new tablet replica and rebuilds it from existing replicas. Other than that, the transition goes through the same stages as regular migration to ensure safe synchronization with request coordinators. In this PR we simply stream from all tablet replicas. Later we should switch to calling repair to avoid sending excessive amounts of data. Fixes https://github.com/scylladb/scylladb/issues/16690. Closes scylladb/scylladb#16894 * github.com:scylladb/scylladb: tests: tablets: Add tests for removenode and replace tablets: Add support for removenode and replace handling topology_coordinator: tablets: Do not fail in a tight loop topology_coordinator: tablets: Avoid warnings about ignored failured future storage_service, topology: Track excluded state in locator::topology raft topology: Introduce param-less topology::get_excluded_nodes() raft topology: Move get_excluded_nodes() to topology tablets: load_balancer: Generalize load tracking tablets: Introduce get_migration_streaming_info() which works on migration request tablets: Move migration_to_transition_info() to tablets.hh tablets: Extract get_new_replicas() which works on migraiton request tablets: Move tablet_migration_info to tablets.hh tablets: Store transition kind per tablet	2024-01-25 14:49:43 +02:00
Patryk Jędrzejczak	b348014745	test: test_topology_ops: remove failed mutations filter We added this filter after detecting a bug in the Raft-based topology. We weren't sending `barrier_and_drain` commands to a decommissioning node that could still be coordinating requests. It could cause stale topology exceptions on replicas if the decommissioning node sent a request with an old topology version after normal nodes received the new fence version. This bug has been fixed in the previous commit, so we remove the filter.	2024-01-25 13:42:48 +01:00
Patryk Jędrzejczak	9aebd6dd96	raft topology: send barrier_and_drain to a decommissioning node Before this patch, we didn't send the `barrier_and_drain` command to a decommissioning node that could still be coordinating requests. It could happen that a decommissioning node sent a request with an old topology version after normal nodes received the new fence version. Then, the request would fail on replicas with the stale topology exception. We fix this problem by modifying `exec_global_command`. From now on, it sends `barrier_and_drain` to a decommissioning node, which can also be in the `left_token_ring` state.	2024-01-25 13:42:48 +01:00
Patryk Jędrzejczak	378cbd0b70	raft topology: ensure at most one transitioning node We add a sanity check to ensure at most one transitioning node at a time. If there is more, something must have gone wrong. In the future, we might implement concurrent topology operations. Then, we will remove this sanity check. We also extend the comment describing `transition_nodes` so that it better explains why we use a map and how it should be handled.	2024-01-25 13:42:46 +01:00
Alexander Turetskiy	c1ae5425f7	DROP TYPE IF EXISTS should work on non-existent keyspace DROP TYPE IF EXISTS should pass and do nothing on non-existent keyspace fixes #9082 Closes scylladb/scylladb#16504	2024-01-25 14:28:43 +02:00
Kefu Chai	b1431f08f7	test/nodetool: run nodetool tests using "unshare" before this change, we use a random address when launching rest_api_mock server, but there are chances that the randomly picked address conflicts with an already-used address on the host. and the subprocess fails right away with the returncode of 1 upon this failure, but we just continue on and check the readiness of the already-dead server. actually, we've seen test failures caused by the EADDRINUSE failure, and when we checked the readiness of the rest_api_mock by sending HTTP request and reading the response, what we had is not a JSON encoded response but a webpage, which was likely the one returned by a minio server. in this change, we * specify the "launcher" option of nodetool test suite to "unshare", so that all its tests are launched in separated namespaces. * use a random fixed address for the mock server, as the network namespaces are not shared anymore * add an option in `nodetool/conftest.py`, so that it can optionally setup the lo network interface when it is launched in a separated new network namespace. Fixes #16542 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-25 20:28:36 +08:00
Kefu Chai	35b3c51f40	test.py: add "launcher" option support before this change, all "tool" test suites use "pytest" to launch their tests. but some of the tests might need a dedicated namespace so they do not interfere with each other. fortunately, "unshare(1)" allows us to run a program in new namespaces. in this change, we add a "launcher" option to "tool" test suites, so that these tests can run with the specified "launcher" instead of using "pytest". if "launcher" is not specified, its default value of "pytest" is used. Refs #16542 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-25 20:28:01 +08:00
Kurashkin Nikita	d90eeb5c4f	cql3:statement_restrictions.cc: multi-column relation null check Before this patch, we received the internal server error "Attempted to create key component from empty optional" when null was used in multi-column relations. This patch adds a null check for each element of each tuple in the expression and generates an invalid request error if it finds such an element. Modified cassandra test and added a new one that checks the occurrence of null values in tuples. Added a test that checks whether the wrong number of items is entered in tuples. Fixes #13217 Closes scylladb/scylladb#16415	2024-01-25 14:17:43 +02:00
Botond Dénes	5df4ad2e48	test/cql-pytest: test_tools.py: fix flaky schema load failure test The test TestScyllaSsstableSchemaLoading.test_fail_schema_autodetect was observed to be flaky. Sometimes failing on local setups, but not in CI. As it turns out, this is because, when run via test.py, the test's working directory is the root directory of scylla.git. In this case, scylla-sstable will find and read conf/scylla.yaml. After having done so, it will try to look in the default data directory (/var/lib/scylla/data) for the schema tables. If the local machine happens to have a scylla data-dir setup at the above mentioned location, it will read the schema tables and will succeed in finding the tested table (which is a system table, so it is always present). This will fail the test, as the test expects the opposite -- the table not being found. The solution is to change the test's working directory to the random temporary work dir, so that the local environment doesn't interfere with it. Fixes: #16828 Closes scylladb/scylladb#16837	2024-01-25 15:14:16 +03:00
Botond Dénes	b341aa8f6d	Merge 'api/api.hh: improve usage of standard containers' from Patryk Wróbel This PR contains improvements related to usage of std::vector and looping over containers in the range-for loop. It is advised to use `std::vector::reserve()` to avoid unneeded memory allocations when the total size is known beforehand. When looping over a container that stores non-trivial types usage of const reference is advised to avoid redundant copies. Closes scylladb/scylladb#16978 * github.com:scylladb/scylladb: api/api.hh: use const reference when looping over container api/api.hh: use std::vector::reserve() when the total size is known	2024-01-25 13:22:48 +02:00
Kamil Braun	994a2ea5c3	Merge 'Call left/joined notifiers when topology coordinator is enabled' from Gleb The gossiper topology change code calls left/joined notifiers when a node leaves or joins the cluster. This code is not executed in topology coordinator mode, so the coordinator needs to call those notifiers by itself. The series adds the calls. Fixes scylladb/scylladb#15841 * 'gleb/raft-topo-notifications-v1' of github.com:scylladb/scylla-dev: storage service: topology coordinator: call notify_joined() when a node joins a cluster storage service: topology coordinator: call notify_left() when a node leaves a cluster storage_service: drop redundant check from notify_joined()	2024-01-25 12:12:53 +01:00
Kefu Chai	1d33a68dd7	tests: utils: error injection: print time duration instead of count instead of casting / comparing the count of duration unit, let's just compare the durations, so that boost.test is able to print the duration in a more informative and user friendly way (line wrapped) test/boost/error_injection_test.cc(167): fatal error: in "test_inject_future_disabled": critical check wait_time > sleep_msec has failed [23839ns <= 10ms] Refs #15902 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-25 19:10:24 +08:00
Kefu Chai	8a5689e7a7	error_injection: do not cast to milliseconds when injecting timeout before this change, we always cast the wait duration to milliseconds, even if it could be using a higher resolution. actually `std::chrono::steady_clock` is using `nanosecond` for its duration, so if we inject a deadline using `steady_clock`, we could be awakened earlier due to the narrowing of the duration type caused by the duration_cast. in this change, we just use the duration as it is. this should allow the caller to use the resolution provided by Seastar without losing the precision. Fixes #15902 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-25 19:10:24 +08:00
Gleb Natapov	adf70aae15	storage service: topology coordinator: call notify_joined() when a node joins a cluster When the topology coordinator is used for topology changes the gossiper based code that calls notify_joined() is not called. The coordinator needs to call it itself. But it needs to call it only once when node becomes normal. For that the patch changes state loading code to remember the old set of nodes in normal state to check if a node that is normal after new state is loaded was not in the normal state before.	2024-01-25 12:28:08 +02:00
Botond Dénes	c9f247f3e8	Merge 'sstables: writer: don't block topology changes while writing sstables' from Avi Kivity The sstable writer held the effective_replication_map_ptr while writing sstables, which is both a layering violation and slows down tablet load balancing. It was needed in order to ensure the sharder was stable. But it turns out that sharding metadata is unnecessary for tablets, so just skip the whole thing when writing an sstable for tablets. Closes scylladb/scylladb#16953 * github.com:scylladb/scylladb: sstables: writer: don't require effective_replication_map for sharding metadata schema: provide method to get sharder, iff it is static	2024-01-25 12:12:01 +02:00
Botond Dénes	8e82df6fb6	Merge 'coverage libraries: bug fixes' from Eliran Sinvani This mini-series contains two bug fixes that were found as part of testing coverage reporting in CI: ref: https://github.com/scylladb/scylladb/pull/16895 1. The html-fixup which is triggered when using `test/pylib/coverage_utils.py lcov-tools genhtml...` rendered incorrect links for multiple links in the same line. 2. For files that contained `,` in their name the output was simply wrong and resulted in lcov not being able to find such files for the purpose of filtering or generating reports. The aforementioned draft PR served as a testing bed for finding and fixing those bugs. Closes scylladb/scylladb#16977 * github.com:scylladb/scylladb: lcov_utils.py: support sourcefiles that contains commas in their name coreage_utils.py: make regular expression lazy in html-fixup	2024-01-25 11:46:15 +02:00
Kefu Chai	0fbfc96619	db: add formatter for schema_tables::table_kind before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for db::schema_tables::table_kind, and its operator<<() is still used by the homebrew generic formatter for std::map<>, so it is preserved. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16972	2024-01-25 11:33:13 +03:00
Kefu Chai	ffb5ad494f	api: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16973	2024-01-25 11:28:02 +03:00
Patryk Wrobel	cdfe0c1c35	api/api.hh: use const reference when looping over container When a reference is not used in the range-for loop, each element of a container is copied. Such copying is not a problem for scalar types. However, in the case of non-trivial types it may cause unneeded overhead. This change replaces copying with const references to avoid copying of types like seastar::sstring etc. Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-25 09:20:35 +01:00
Patryk Wrobel	1ca71f2532	api/api.hh: use std::vector::reserve() when the total size is known When growing via push_back(), std::vector may need to reallocate its internal block of memory due to not enough space. It is advised to allocate the required space before appending elements if the size is known beforehand. This change introduces usage of std::vector::reserve() in api.hh to ensure that push_back() does not cause reallocations. Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-25 08:50:19 +01:00
Eliran Sinvani	d27283918f	lcov_utils.py: support sourcefiles that contains commas in their name As part of the parsing, every line of an lcov file was modeled as INFO_TYPE:field[,field]... However, specifically for info type "SF", which represents the source file, there can only be one field. This caused file names containing ',' to be cut at the first ',' and, as a result, not handled correctly by lcov_utils.py, especially when rewriting a file. This patch adds special handling for the "SF" INFO_TYPE. ref : `man geninfo` Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2024-01-25 09:30:52 +02:00
Eliran Sinvani	11eb9f5bb2	coreage_utils.py: make regular expression lazy in html-fixup The html-fixup procedure was created because of a bug in genhtml (`man genhtml` for details about what genhtml is). The bug is that genhtml doesn't account for file names that contain illegal url characters (ref: https://stackoverflow.com/a/1547940/2669716). html-fixup converts those characters to the %<octet> notation (e.g. the space character becomes %20). However, the regular expression used to detect links was eager, which didn't account for multiple links in the same line. This was discovered while browsing one of the reports and noticing that the links that are meant to alternate between code view and function view of a source became scrambled and unusable after html-fixup. This change makes the regex that is used to detect links lazy so it can handle multiple links in the same line in an html file correctly. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2024-01-25 09:30:42 +02:00
Nadav Har'El	69a68e35dd	Merge 'scylla-sstable: add support for loading schema of views and indexes' from Botond Dénes Loading schemas of views and indexes was not supported, with either `--schema-file`, or when loading schema from schema sstables. This PR addresses both: * When loading schema from CQL (file), `CREATE MATERIALIZED VIEW` and `CREATE INDEX` statements are now also processed correctly. * When loading schema from schema tables, `system_schema.views` is also processed, when the table has no corresponding entry in `system_schema.tables`. Tests are also added. Fixes: #16492 Closes scylladb/scylladb#16517 * github.com:scylladb/scylladb: test/cql-pytest: test_tools.py: add schema-loading tests for MV/SI test/cql-pytest: test_tools.py: extract some fixture logic to functions test/cql-pytest: test_tools.py: extract common schema-loading facilities into base-class tools/schema_loader: load_schema_from_schema_tables(): add support for MV/SI schemas tools/schema_loader: load_one_schema_from_file(): add support for view/index schemas test/boost/schema_loader_test: add test for mvs and indexes tools/schema_loader: load_schemas(): implement parsing views/indexes from CQL replica/database: extract existing_index_names and get_available_index_name tools/schema_loader: make real_db.tables the only source of truth on existing tables tools/schema_loader: table(): store const keyspace& tools/schema_loader: make database,keyspace,table non-movable cql3/statements/create_index_statement: build_index_schema(): include index metadata in returned value cql3/statements/create_index_statement: make build_index_schema() public cql3/statements/create_index_statement: relax some method's dependence on qp cql3/statements/create_view_statement: make prepare_view() public	2024-01-24 23:36:54 +02:00
Nadav Har'El	df6c9828ef	Merge 'Add protobuf and Native histogram support' from Amnon Heiman Native histograms (also known as sparse histograms) are an experimental Prometheus feature. They use protobuf as the reporting layer. Native histograms hold the benefits of high resolution at a lower resource cost. This series allows sending histograms in a native histogram format over protobuf. By default, protobuf support is disabled. To use protobuf with native histograms, the command line flag prometheus_allow_protobuf should be set to true, and the Prometheus server should send the accept header with protobuf. Fixes #12931 Closes scylladb/scylladb#16737 * github.com:scylladb/scylladb: main.cc: Add prometheus_allow_protobuf command line histogram_metrics_helper: support native histogram config: Add prometheus_allow_protobuf flag	2024-01-24 21:24:50 +02:00
Michał Chojnowski	f0eadc734e	test: test_tablets: add a test for cleanup after migration Reproduces the problems fixed by earlier commits in the series.	2024-01-24 19:36:29 +01:00
Botond Dénes	7bb3ed7f23	docs/operating-scylla: scylla-sstable.rst: fix checksum list Add empty line before list of different checksums in validate-checksums's description. Otherwise the list is not rendered. Closes scylladb/scylladb#16401	2024-01-24 16:34:13 +01:00
Kefu Chai	a9851cf834	test.py: replace "$foo is False" with "not $foo" for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16960	2024-01-24 15:21:53 +02:00
Kefu Chai	add74ec8ee	mutation_writer: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16958	2024-01-24 15:20:02 +02:00
Kefu Chai	c978d1b3f8	config: s/re-use/reuse/ this misspelling is identified by codespell. per m-w, reuse is a word per-se, and we don't need the hyphen for addressing the ambiguity in the use cases, like, recover and re-cover. see also https://www.merriam-webster.com/dictionary/reuse Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16962	2024-01-24 15:19:03 +02:00
Kefu Chai	8c39aba820	tools/scylla-sstable: use canonical path for sst_path we deduce the paths to other SSTable components from the one specified on the command line, for instance, if /a/b/c/me-really-big-Data.db is fed to `scylla sstable`, the tool would try to read /a/b/c/me-really-big-TOC.txt for the list of other components. this works fine if the full path is specified in the command line. but if a relative path is specified, like, "me-really-big-Data.db", this does not work anymore. before this change, the tool would be reading "/me-really-big-TOC.txt", which does not exist under most circumstances. while $PWD/me-really-big-TOC.txt should exist if the SSTable is sane. after this change, we always convert the specified path to its canonical representation, no matter whether it is relative or absolute. this enables us to get the correct parent path when trying to read, for instance, the TOC component. Fixes #16955 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16964	2024-01-24 13:28:40 +02:00
Michał Chojnowski	b88a0eb9ab	test: pylib: add ScyllaCluster.wipe_sstables Add a method which wipes sstables files for a particular table on a particular stopped node.	2024-01-24 11:52:49 +01:00
Michał Chojnowski	94cdfcaa94	test: boost: add commitlog_cleanup_test Adds a test for the commitlog cleanup functionality added earlier in the series.	2024-01-24 10:37:39 +01:00
Michał Chojnowski	a246bb39ef	db: commitlog_replayer: ignore mutations affected by (tablet) cleanups To avoid data resurrection, mutations deleted by cleanup operations have to be skipped during commitlog replay. This patch implements this, based on the metadata recorded on cleanup operations into system.commitlog_cleanups.	2024-01-24 10:37:39 +01:00
Michał Chojnowski	f458a1bf3e	replica: table: garbage-collect irrelevant system.commitlog_cleanups records Currently, rows in system.commitlog_cleanups are only dropped on node restart, so the table can accumulate an unbounded number of records. This probably isn't a problem in practice, because tablet cleanups aren't that frequent, but this patch adds a countermeasure anyway. This patch makes the choice to delete the unneeded records right when new records are added. This isn't ideal -- it would be more natural if the unneeded records were deleted as soon as they become unneeded -- but it does the job with a minimal amount of code.	2024-01-24 10:37:38 +01:00
Michał Chojnowski	05ff32ebf9	db: commitlog: add min_position() Add a helper function which returns the minimum replay position across all existing or future commitlog segments. Only positions greater or equal to it can be replayed on the next reboot. We will use this helper in a future patch to garbage collect some cleanup metadata which refers to replay positions.	2024-01-24 10:37:38 +01:00
Michał Chojnowski	a10650959c	replica: table: populate system.commitlog_cleanups on tablet cleanup To avoid data resurrection after cleanup, we have to filter out the cleaned mutations during commitlog replay. In this patch, we get tablet cleanup to record the affected set of mutations to system.commitlog_cleanups. In a later patch, we will use these records for filtering during commitlog replay.	2024-01-24 10:37:38 +01:00
Michał Chojnowski	7c5a8894be	db: system_keyspace: add system.commitlog_cleanups Add a system table which will hold records of cleanup operations, for the purpose of filtering commitlog replays to avoid data resurrections.	2024-01-24 10:37:38 +01:00
Michał Chojnowski	8bfd078c54	replica: table: refresh compound sstable set after tablet cleanup If the compound set isn't refreshed, readers will keep seeing the dataset as it was before the cleanup, which is a bug.	2024-01-24 10:37:38 +01:00
Kefu Chai	207fe93b90	utils: add formatter for rjson::value before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for rjson::value, and drop its operator<<(). Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16956	2024-01-24 10:30:52 +02:00
Gleb Natapov	b97ff54a41	storage service: topology coordinator: call notify_left() when a node leaves a cluster When the topology coordinator is used for topology changes the gossiper based code that calls notify_left() is not called. The coordinator needs to call it itself.	2024-01-24 10:21:01 +02:00
Gleb Natapov	5459a8b9a5	storage_service: drop redundant check from notify_joined() notify_joined() is called from handle_state_normal only, so there is no point checking that the state is normal inside the function as well.	2024-01-24 10:17:12 +02:00
Avi Kivity	8ee75ae8f4	sstables: writer: don't require effective_replication_map for sharding metadata Currently, we pass an effective_replication_map_ptr to sstable_writer, so that we can get a stable dht::sharder for writing the sharding metadata. This is needed because with tablets, the sharder can change dynamically. However, this is both bad and unnecessary: - bad: holding on to an effective_replication_map_ptr is a barrier for topology operations, preventing tablet migrations (etc) while an sstable is being written - unnecessary: tablets don't require sharding metadata at all, since two tablets cannot overlap (unlike two sstables from different shards in the same node). So the first/last key is sufficient to determine the shard/tablet ownership. Given that, just pass the sharder for vnode sstables, and don't generate sharding metadata for tablet sstables.	2024-01-23 22:23:08 +02:00
Avi Kivity	b88f422a53	schema: provide method to get sharder, iff it is static The current get_sharder() method only allows getting a static sharder (since a dynamic sharder needs additional protection). However, it chooses to abort if someone attempts to get a dynamic sharder. In one case, it's useful to get a sharder only if it's static, so provide a method to do that. This is for providing sstable sharding metadata, which isn't useful with tablets.	2024-01-23 22:20:59 +02:00
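
Commit 11eb9f5bb2 in the list above describes an eager (greedy) link regex that mishandled lines containing more than one link. The following is a minimal sketch of that failure mode and of the lazy fix, not the actual code from coverage_utils.py: the two patterns, the `fixup_links` helper and the sample HTML line are all hypothetical and chosen only for illustration.

```python
import re
from urllib.parse import quote

# Hypothetical patterns; the real ones in coverage_utils.py are not shown in the log.
GREEDY_LINK = re.compile(r'href="(.*)"')   # ".*" runs to the LAST quote on the line
LAZY_LINK = re.compile(r'href="(.*?)"')    # ".*?" stops at the FIRST closing quote


def fixup_links(line: str, pattern: re.Pattern = LAZY_LINK) -> str:
    """Percent-encode the target of every link matched by `pattern`."""
    return pattern.sub(lambda m: 'href="' + quote(m.group(1)) + '"', line)


# Illustrative line with two links, e.g. a code view and a function view.
line = (
    '<a href="my file.cc.gcov.html">code</a> '
    '<a href="my file.cc.func.html">functions</a>'
)

greedy_targets = GREEDY_LINK.findall(line)  # one match swallowing both links
lazy_targets = LAZY_LINK.findall(line)      # one match per link
fixed = fixup_links(line)

print(greedy_targets)
print(lazy_targets)
print(fixed)
```

With the greedy pattern the single captured group spans from the first link target to the end of the second one, so encoding it would also mangle the markup in between. The lazy pattern yields each target separately.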

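
Commit d27283918f in the list above notes that lcov lines were modeled as `INFO_TYPE:field[,field]...`, which truncates "SF" (source file) entries whose path contains a comma. A rough sketch of the special-casing it describes might look like this. The function name `parse_info_line` and the sample lines are assumptions made for the example and do not come from lcov_utils.py.

```python
from typing import List, Tuple


def parse_info_line(line: str) -> Tuple[str, List[str]]:
    """Split one lcov tracefile line into (info_type, fields).

    Hypothetical helper: "SF" carries exactly one field (the source path),
    so its value is never split on commas. Other record types are split.
    """
    info_type, _, rest = line.strip().partition(":")
    if info_type == "SF":
        # The whole remainder is the file name, commas included.
        return info_type, [rest]
    fields = rest.split(",") if rest else []
    return info_type, fields


# A source file whose name contains a comma survives intact ...
sf = parse_info_line("SF:/src/a,b.cc")
# ... while ordinary records are still split into fields.
da = parse_info_line("DA:12,3")
end = parse_info_line("end_of_record")

print(sf, da, end)
```

The sketch only covers parsing. Rewriting a file, which the commit message also mentions, would need the inverse step to treat "SF" the same way.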

40863 Commits