scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-01 05:35:48 +00:00

Author	SHA1	Message	Date
Kefu Chai	aea6cd0b2d	test/tablets: do not compare signed and unsigned this change should silence following warning: ``` test/boost/tablets_test.cc:1600:27: error: comparison of integers of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare] 19:47:04 for (int i = 0; i < smp::count * 20; i++) { 19:47:04 ~ ^ ~~~~~~~~~~~~~~~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-02 20:49:21 +08:00
Pavel Emelyanov	66b859a29f	gms: Remove unused operator<< for feature object Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#17109	2024-02-01 19:00:46 +02:00
Kefu Chai	aad8035bed	replica/database: use structured-bind when appropriate for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17104	2024-02-01 16:31:29 +02:00
Botond Dénes	dc8e13baed	Merge 'Move some tablets tests from topology_custom to cql-pytest' from Pavel Emelyanov The latter suite is now tablets-aware and tablets cases from the former one can happily work with single shared scylla instance Closes scylladb/scylladb#17101 * github.com:scylladb/scylladb: test/topology_custom: Remove test_tablets.py test/topology: Move test_tablet_change_initial_tablets test/topology: Move test_tablet_explicit_disabling test/topology: Move test_tablet_default_initialization test/topology: Move test_tablet_change_replication_strategy test/topology: Move test_tablet_change_replication_vnode_to_tablets cql-pytest: Add skip_without_tablets fixture	2024-02-01 16:28:43 +02:00
Kamil Braun	c911bf1a33	test_raft_snapshot_request: fix flakiness (again) At the end of the test, we wait until a restarted node receives a snapshot from the leader, and then verify that the log has been truncated. To check the snapshot, the test used the `system.raft_snapshots` table, while the log is stored in `system.raft`. Unfortunately, the two tables are not updated atomically when Raft persists a snapshot (scylladb/scylladb#9603). We first update `system.raft_snapshots`, then `system.raft` (see `raft_sys_table_storage::store_snapshot_descriptor`). So after the wait finishes, there's no guarantee the log has been truncated yet -- there's a race between the test's last check and Scylla doing that last delete. But we can check the snapshot using `system.raft` instead of `system.raft_snapshots`, as `system.raft` has the latest ID. And since `1640f83fdc`, storing that ID and truncating the log in `system.raft` happens atomically. Closes scylladb/scylladb#17106	2024-02-01 16:06:12 +02:00
Kefu Chai	946d281d39	exceptions: s/#warn/#warning/ `#warning` is a preprocessor macro in C/C++, while `#warn` is not. the reason we haven't run into the build failure caused by this is likely that we are only building on amd64/aarch64 with libstdc++ at the time of writing. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17074	2024-02-01 14:50:17 +02:00
Botond Dénes	1a0300dba6	Merge 'compaction_manager: flush tables before cleanup' from Kefu Chai according to the document "nodetool cleanup" > Triggers removal of data that the node no longer owns currently, scylla performs cleanup by rewriting the sstables. but commitlog segments may still contain the mutations to the tables which are dropped during sstable rewriting. when scylla server restarts, the dirty mutations are replayed to the memtable. if any of these dirty mutations changes the tables cleaned up. the stale data are reapplied. this would lead to data resurrection. so, in this change we following the same model of major compaction where we 1. forcing new active segment, 2. flushing tables being cleaned up 3. perform cleanup using compaction Fixes #4734 Closes scylladb/scylladb#16757 * github.com:scylladb/scylladb: storage_service: fall back to local cleanup in cleanup_all compaction: format flush_mode without the helper compaction_manager: flush all tables before cleanup replica: table: pass do_flush to table::perform_cleanup_compaction() api, compaction: promote flush_mode	2024-02-01 13:47:45 +02:00
libo-sober	a341b870bc	Remove unnecessary calculations in integrity_checked_file_impl::write_dma. Use calculated `rbuf_end` in `std::mismatch` to reduce unnecessary calculations. Closes scylladb/scylladb#16979	2024-02-01 13:42:59 +02:00
Botond Dénes	8debb6b98f	Merge 'storage_service: Run stream_ranges cmd in streaming group' from Asias He Otherwise it will inherit the rpc verb's scheduling group which is gossip. As a result, it causes the streaming runs in the wrong scheduling group. Fixes #17090 Closes scylladb/scylladb#17097 * github.com:scylladb/scylladb: streaming: Verify stream consumer runs inside streaming group storage_service: Run stream_ranges cmd in streaming group	2024-02-01 13:18:26 +02:00
Patryk Wrobel	25324bbe50	cql_test_env.cc: remove dead code This change removes empty anonymous namespace that is a dead code. Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#17099	2024-02-01 13:17:48 +02:00
Pavel Emelyanov	64cb3a6496	test/topology_custom: Remove test_tablets.py It's now empty, all test cases had been moved to cql-pytest Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-02-01 13:59:51 +03:00
Pavel Emelyanov	3fbe93e45d	test/topology: Move test_tablet_change_initial_tablets Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-02-01 13:59:51 +03:00
Pavel Emelyanov	480227fcad	test/topology: Move test_tablet_explicit_disabling Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-02-01 13:59:51 +03:00
Pavel Emelyanov	45b0490100	test/topology: Move test_tablet_default_initialization Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-02-01 13:59:51 +03:00
Pavel Emelyanov	3258c56ca3	test/topology: Move test_tablet_change_replication_strategy Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-02-01 13:59:51 +03:00
Pavel Emelyanov	6f50cc2783	test/topology: Move test_tablet_change_replication_vnode_to_tablets Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-02-01 13:59:51 +03:00
Botond Dénes	b9af2efcb1	Merge 'directories: prevent inode cache fragmentation by orderly verifying data directory contents' from Lakshmi Narayanan Sreethar During startup, the contents of the data directory are verified to ensure that they have the right owner and permissions. Verifying all the contents, which includes files that will be read and closed immediately, and files that will be held open for longer durations, together, can lead to memory fragementation in the dentry/inode cache. Mitigate this by updating the verification in a such way that these two set of files will be verified separately ensuring their separation in the dentry/inode cache. Fixes https://github.com/scylladb/scylladb/issues/14506 Closes scylladb/scylladb#16952 * github.com:scylladb/scylladb: directories: prevent inode cache fragmentation by orderly verifying data directory contents directories: skip verifying data directory contents during startup directories: co-routinize create_and_verify	2024-02-01 12:30:07 +02:00
Kefu Chai	4ec104e086	api: storage_service: correct a typo s/a any keyspace/a given keyspace/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17098	2024-02-01 10:55:58 +02:00
Botond Dénes	2a4b991772	Merge 'Fix mintimeuuid() call that could crash Scylla' from Nadav Har'El This PR fixes the bug of certain calls to the `mintimeuuid()` CQL function which large negative timestamps could crash Scylla. It turns out we already had protections in place against very positive timestamps, but very negative timestamps could still cause bugs. The actual fix in this series is just a few lines, but the bigger effort was improving the test coverage in this area. I added tests for the "date" type (the original reproducer for this bug used totimestamp() which takes a date parameter), and also reproducers for this bug directly, without totimestamp() function, and one with that function. Finally this PR also replaces the assert() which made this molehill-of-a-bug into a mountain, by a throw. Fixes #17035 Closes scylladb/scylladb#17073 * github.com:scylladb/scylladb: utils: replace assert() by on_internal_error() utils: add on_internal_error with common logger utils: add a timeuuid minimum, like we had maximum test/cql-pytest: tests for "date" type	2024-02-01 10:48:48 +02:00
Patryk Wrobel	6e5a85c387	replica/table: add tablet count metric This change introduces a new metric called tablet_count that is recalculated during construction of table object and on each call to table::update_effective_replication_map(). To get the count of tablet per current shard, tablet map is traversed and for each tablet_id tablet_map::get_shard() is called. Its return value is compared with this_shard_id(). The new metric is maintained and exposed only for tables that uses tablets. Refs: scylladb#16131 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#17056	2024-02-01 10:46:53 +02:00
Asias He	2888c3086c	utils: Add uuid_xor_to_uint32 helper Convert the uuid to a uint32_t using xor. It is useful to get a uint32_t number from the uuid. Refs: #16927 Closes scylladb/scylladb#17049	2024-02-01 10:27:55 +02:00
Botond Dénes	f5917b215f	Merge 'replica, tablet_allocator: do not compare unsigned with signed' from Kefu Chai this series addresses couple `-Wsign-compare` warnings surfaced in the tree. Closes scylladb/scylladb#17091 * github.com:scylladb/scylladb: tablet_allocator: do not compare signed and unsigned replica: table: do not compare signed with unsigned	2024-02-01 10:26:04 +02:00
Kefu Chai	7a8e8c2ced	db: add formatter for db::write_type before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `db::write_type`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17093	2024-02-01 10:22:45 +02:00
Kefu Chai	005d231f96	db: add formatter for gms::application_state before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `gms::application_state`, but its operator<< is preserved, as it is still used by the generic homebrew formatter for `std::unordered_map<>`. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17096	2024-02-01 10:02:25 +02:00
Pavel Emelyanov	ab7ce3d1fa	cql-pytest: Add skip_without_tablets fixture It's opposite to skip_with_tablets one and thus also depends on scylla_only one Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-02-01 10:58:13 +03:00
Lakshmi Narayanan Sreethar	dbe758d309	directories: prevent inode cache fragmentation by orderly verifying data directory contents During startup, the contents of the data directory are verified to ensure that they have the right owner and permissions. Verifying all the contents, which includes files that will be read and closed immediately, and files that will be held open for longer durations, together, can lead to memory fragementation in the dentry/inode cache. Prevent this by updating the verification in a such way that these two set of files will be verified separately ensuring their separation in the dentry/inode cache. Fixes #14506 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-02-01 12:20:23 +05:30
Lakshmi Narayanan Sreethar	74a4085426	directories: skip verifying data directory contents during startup This is in preparation for a subsequent patch that will verify the contents of the data directory in a specific order. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-02-01 11:54:59 +05:30
Lakshmi Narayanan Sreethar	2e3d2498f4	directories: co-routinize create_and_verify Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-02-01 11:41:10 +05:30
Kefu Chai	5e0b3671d3	storage_service: fall back to local cleanup in cleanup_all before this change, if no keyspaces are specified, scylla-nodetool just enumerate all non-local keyspaces, and call "/storage_service/keyspace_cleanup" on them one after another. this is not quite efficient, as each this RESTful API call force a new active commitlog segment, and flushes all tables. so, if the target node of this command has N non-local keyspaces, it would repeat the steps above for N times. this is not necessary. and after a topology change, we would like to run a global "nodetool cleanup" without specifying the keyspace, so this is a typical use case which we do care about. to address this performance issue, in this change, we improve an existing RESTful API call "/storage_service/cleanup_all", so if the topology coordinator is not enabled, we fall back to a local cleanup to cleanup all non-local keyspaces. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-01 11:25:53 +08:00
Kefu Chai	4f90a875f6	compaction: format flush_mode without the helper since flush_mode is moved out of major_compaction_task_impl, let's drop the helper hosted in that class as well, and implement the formatter witout it. please note, the `__builtin_unreachable()` is dropped. it should not change the behavior of the formatter. we don't put it in the `default` branch in hope that `-Wswitch` can warn us in the case when another enum of `flush_mode` is added, but we fail to handle it somehow. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-01 11:25:53 +08:00
Kefu Chai	b39cc01bb3	compaction_manager: flush all tables before cleanup according to the document "nodetool cleanup" > Triggers removal of data that the node no longer owns currently, scylla performs cleanup by rewriting the sstables. but commitlog segments may still contain the mutations to the tables which are dropped during sstable rewriting. when scylla server restarts, the dirty mutations are replayed to the memtable. if any of these dirty mutations changes the tables cleaned up. the stale data are reapplied. this would lead to data resurrection. so, in this change we following the same model of major compaction: 1. force new active segment, 2. flush all tables 3. perform cleanup using compaction, which rewrites the sstables of specified tables because we already `flush()` all tables in `cleanup_keyspace_compaction_task_impl::run()`, there is no need to call `flush()` again, in `table::perform_cleanup_compaction()`, so the `flush()` call is dropped in this function, and the tests using this function are updated to call `flush()` manually to preserve the existing behavior. there are two callers of `cleanup_keyspace_compaction_task_impl`, * one is `storage_service::sstable_cleanup_fiber()`, which listens for the events fired by topology_state_machine, which is in turn driven by, for instance, "/storage_service/cleanup_all" API. which cleanup all keyspaces in one after another. * another is "/storage_service/keyspace_cleanup", which cleans up the specified keyspace. in the first use case, we can force a new active segment for a single time, so another parameter to the ctor of `cleanup_keyspace_compaction_task_impl` is introduced to specify if the `db.flush_all_tables()` call should be skiped. please note, there are two possible optimizations, 1. force new active segment only if the mutations in it touches the tables being cleaned up 2. after forcing new active segment, only flush the (mem)tables mutated by the non-active segments but let's leave them for following-up changes. this change is a minimal fix for data resurrection issue. Fixes #16757 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-01 11:25:53 +08:00
Kefu Chai	34d80690fa	replica: table: pass do_flush to table::perform_cleanup_compaction() this parameter defaults to do_flush::yes, so the existing behavior is preserved. and this change prepares for a change which flushes all tables before performing cleanup on the tables per-demand. please note, we cannot pass compaction::flush_mode to this function, as it is used by compaction/task_manager_module.hh, if we want to share it by both database.hh and compaction/task_manager_module.hh, we would have to find it a new home. so `table::do_flush` boolean tag is reused instead. Refs #16757 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-01 11:25:53 +08:00
Kefu Chai	9afec2e3e7	api, compaction: promote flush_mode so that this enum type can be shared by other task(s) as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-01 11:25:53 +08:00
Kefu Chai	110d2e52be	tablet_allocator: do not compare signed and unsigned `available_shards` could be negative when `resize_plan` is empty, and the loop to build `resize_plan` stops at the next iteration after `available_shards` is assigned with a negative number. so, instead of making it an `unsigned`, let's just compare it using `std::cmp_less()`. this change should silence following warning: ``` /home/kefu/.local/bin/clang++ -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -g -O0 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wignored-qualifiers -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -U_FORTIFY_SOURCE -Werror=unused-result "-Wno-error=#warnings" -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT service/CMakeFiles/service.dir/Debug/tablet_allocator.cc.o -MF service/CMakeFiles/service.dir/Debug/tablet_allocator.cc.o.d -o service/CMakeFiles/service.dir/Debug/tablet_allocator.cc.o -c /home/kefu/dev/scylladb/service/tablet_allocator.cc /home/kefu/dev/scylladb/service/tablet_allocator.cc:529:60: error: comparison of integers of different signs: 'long' and 'const size_t' (aka 'const unsigned long') [-Werror,-Wsign-compare] 529 \| if (resize_plan.size() > 0 && available_shards < size_desc.shard_count) { \| ~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-01 11:01:19 +08:00
Kefu Chai	493a608417	replica: table: do not compare signed with unsigned this change helps to silence follow warning: ``` /home/kefu/dev/scylladb/replica/table.cc:1952:26: error: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare] 1952 \| for (auto id = 0; id < _storage_groups.size(); id++) { \| ~~ ^ ~~~~~~~~~~~~~~~~~~~~~~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-01 11:01:19 +08:00
Asias He	e1fc91bea9	streaming: Verify stream consumer runs inside streaming group This will catch schedule group leaks by accident. Refs: 17090	2024-02-01 10:37:24 +08:00
Asias He	f103f75ed8	storage_service: Run stream_ranges cmd in streaming group Otherwise it will inherit the rpc verb's scheduling group which is gossip. As a result, it causes the streaming runs in the wrong scheduling group. Fixes #17090	2024-02-01 10:20:02 +08:00
Kamil Braun	b2c02d8268	Merge 'schema: column_mapping::{static,regular}_column_at(): use on_internal_error()' from Botond Dénes Instead of std::out_of_range(). Accessing a non-existing column is a serious bug and the backtrace coming with `on_internal_error()` can be very valuable when debugging it. As can be the coredump that is possible to trigger with `--abort-on-internal-error`. This change follows another similar change to `schema::column_at()`. This should help us get to the bottom of the mysterious repair failures caused by invalid column access, seen in https://github.com/scylladb/scylladb/issues/16821. Refs: https://github.com/scylladb/scylladb/issues/16821 Closes scylladb/scylladb#17080 * github.com:scylladb/scylladb: schema: column_mapping::{static,regular}_column_at(): use on_internal_error() schema: column_mapping: move column accessors out-of-line	2024-01-31 16:29:15 +01:00
Nadav Har'El	458fd0c2f7	utils: replace assert() by on_internal_error() In issue #17035 we had a situation where a certain input timestamp could result in the create_time() utility function getting called on a timestamp that cannot be represented as timeuuid, and this resulted in an assertion failure, and a crash. I guess we used an assertion because we believed that callers try to avoid calling this function on excessively large timestamps, but evidentally, they didn't tried hard enough and we got a crash. The code in UUID_gen.hh changed a lot over the years and has become very convoluted and it is almost impossible to understand all the code paths that could lead to this assertion failures. So it's better to replace this assertion by a on_internal_error, which by default is just an exception - and also logs the backtrace of the failure. Issue #17035 would have been much less serious if we had an exception instead of an assert. Refs #17035 Refs #7871, Refs #13970 (removes an assert) Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-01-31 16:45:28 +02:00
Nadav Har'El	259811b6ec	utils: add on_internal_error with common logger Seastar's on_internal_error() is a useful replacement for assert() but it's inconvenient that it requires each caller to supply a logger - which is often inconvenient, especially when the caller is a header file. So in this patch we introduce a utils::on_internal_error() function which is the same as seastar::on_internal_error() (the former calls the latter), except it uses a single logger instead of asking the caller to pass a logger. Refs #7871 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-01-31 16:45:09 +02:00
Patryk Wrobel	c6de20a608	replica/mutation_dump.cc: move stringstream content instead of copying it C++20 introduced a new overload of std::stringstream::str() that is selected when the mentioned member function is called on r-value. The new overload returns a string, that is move-constructed from the underlying string instead of being copy-constructed. This change applies std::move() on stringstream objects before calling str() member function to avoid copying of the underlying buffer. Moreover, it introduces usage of std::stringstream::view() when checking if the stream contains some characters. It skips another copy of the underlying string, because std::string_view is returned. Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#17084	2024-01-31 14:58:20 +02:00
Pavel Emelyanov	7c5c89ba8d	Revert "Merge 'Use utils::directories instead of db::config to get dirs' from Patryk Wróbel" This reverts commit `370fbd346c`, reversing changes made to `0912d2a2c6`. This makes scylla-manager mis-interpret the data_file_directories somehow, issue #17078	2024-01-31 15:08:14 +03:00
Avi Kivity	c8397f0287	Merge 'Implement tablet splitting' from Raphael "Raph" Carvalho The motivation for tablet resizing is that we want to keep the average tablet size reasonable, such that load rebalancing can remain efficient. Too large tablet makes migration inefficient, therefore slowing down the balancer. If the avg size grows beyond the upper bound (split threshold), then balancer decides to split. Split spans all tablets of a table, due to power-of-two constraint. Likewise, if the avg size decreases below the lower bound (merge threshold), then merge takes place in order to grow the avg size. Merge is not implemented yet, although this series lays foundation for it to be impĺemented later on. A resize decision can be revoked if the avg size changes and the decision is no longer needed. For example, let's say table is being split and avg size drops below the target size (which is 50% of split threshold and 100% of merge one). That means after split, the avg size would drop below the merge threshold, causing a merge after split, which is wasteful, so it's better to just cancel the split. Tablet metadata gains 2 new fields for managing this: resize_type: resize decision type, can be either of "merge", "split", or "none". resize_seq_number: a sequence number that works as the global identifier of the decision (monotonically increasing, increased by 1 on every new decision emitted by the coordinator). A new RPC was implemented to pull stats from each table replica, such that load balancer can calculate the avg tablet size and know the "split status", for a given table. Avg size is aggregated carefully while taking RF of each DC into account (which might differ). When a table is done splitting its storage, it loads (mirror) the resize_seq_number from tablet metadata into its local state (in another words, my split status is ready). If a table is split ready, coordinator will see that table's seq number is the same as the one in tablet metadata. Helps to distinguish stale decisions from the latest one (in case decisions are revoked and re-emited later on). Also, it's aggregated carefully, by taking the minimum among all replicas, so coordinator will only update topology when all replicas are ready. When load balancer emits split decision, replicas will listen to need to split with a "split monitor" that is awakened once a table has replication metadata updated and detects the need for split (i.e. resize_type field is "split"). The split monitor will start splitting of compaction groups (using mechanism introduced here: `081f30d149`) for the table. And once splitting work is completed, the table updates its local state as having completed split. When coordinator pulls the split status of all replicas for a table via RPC, the balancer can see whether that table is ready for "finalizing" the decision, which is about updating tablet metadata to split each tablet into two. Once table replicas have their replication metadata updated with the new tablet count, they can update appropriately their set of compaction groups (that were previously split in the preparation step). Fixes #16536. Closes scylladb/scylladb#16580 * github.com:scylladb/scylladb: test/topology_experimental_raft: Add tablet split test replica: Bypass reshape on boot with tablets temporarily replica: Fix table::compaction_group_for_sstable() for tablet streaming test/topology_experimental_raft: Disable load balancer in test fencing replica: Remap compaction groups when tablet split is finalized service: Split tablet map when split request is finalized replica: Update table split status if completed split compaction work storage_service: Implement split monitor topology_cordinator: Generate updates for resize decisions made by balancer load_balancer: Introduce metrics for resize decisions db: Make target tablet size a live-updateable config option load_balancer: Implement resize decisions service: Wire table_resize_plan into migration_plan service: Introduce table_resize_plan tablet_mutation_builder: Add set_resize_decision() topology_coordinator: Wire load stats into load balancer storage_service: Allow tablet split and migration to happen concurrently topology_coordinator: Periodically retrieve table_load_stats locator: Introduce topology::get_datacenter_nodes() storage_service: Implement table_load_stats RPC replica: Expose table_load_stats in table replica: Introduce storage_group::live_disk_space_used() locator: Introduce table_load_stats tablets: Add resize decision metadata to tablet metadata locator: Introduce resize_decision	2024-01-31 13:59:56 +02:00
Kefu Chai	bd71e0b794	tracing: add formatter for tracing::span_id before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `tracing::span_id`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17058	2024-01-31 13:43:46 +02:00
Kefu Chai	f5e3a2d98e	test.py: add `boost_tests()` to suite this change is a cleanup. so it only returns tests, to be more symmetric with `junit_tests()`. this allows us to drop the dummy `get_test_case()` in `PythonTestSuite`. as only the BoostTest will be asked for `get_test_case()` after this change. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16961	2024-01-31 13:43:21 +02:00
Botond Dénes	181f68f248	Merge 'raft_group0: trigger snapshot if existing snapshot index is 0' from Kamil Braun The persisted snapshot index may be 0 if the snapshot was created in older version of Scylla, which means snapshot transfer won't be triggered to a bootstrapping node. Commands present in the log may not cover all schema changes --- group 0 might have been created through the upgrade upgrade procedure, on a cluster with existing schema. So a deployment with index=0 snapshot is broken and we need to fix it. We can use the new `raft::server::trigger_snapshot` API for that. Also add a test. Fixes scylladb/scylladb#16683 Closes scylladb/scylladb#17072 * github.com:scylladb/scylladb: test: add test for fixing a broken group 0 snapshot raft_group0: trigger snapshot if existing snapshot index is 0	2024-01-31 13:04:59 +02:00
Kefu Chai	843d74428d	configure.py: s/-DBOOST_TEST_DYN_LINK/-DBOOST_ALL_DYN_LINK/ we add `-DBOOST_TEST_DYN_LINK` to the cflags when `--static-boost` is not passed to `configure.py`. but we don't never pass this option to `configure.py` in our CI/CD. also, we don't install `boost-static` in `install-dependencies.sh`, so the linker always use the boost shared libraries when building scylla and other executables in this project. this fact has been verified with the latest master HEAD, after building scylla from `build.ninja` which was in turn created using `configure.py`. Seastar::seastar_testing exposes `Boost::dynamic_linking` in its public interface, and `Boost::dynamic_linking` exposes `-DBOOST_ALL_DYN_LINK` as one of its cflags. so, when building testings using CMake, the tests are compiled with `-DBOOST_ALL_DYN_LINK`, while when building tests using `configure.py`, they are compiled with `-DBOOST_TEST_DYN_LINK`. the former is exposed by `Boost::dynamic_linking`, the latter is hardwired using `configure.py`. but the net results are identical. it would be better to use identical cflags on these two building systems. so, let's use `-DBOOST_ALL_DYN_LINK` in `configure.py` also. furthermore, this is what non-static-boost implies. please note, we don't consume the cflags exposed by `seastar-testing.pc`, so they don't override the ones we set using `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17070	2024-01-31 12:21:31 +02:00
Botond Dénes	ecf654ea11	schema: column_mapping::{static,regular}_column_at(): use on_internal_error() Instead of std::out_of_range(). Accessing a non-existing column is a serious bug and the backtrace coming with on_internal_error() can be very valuable when debugging it. As can be the coredump that is possible to trigger with --abort-on-internal-error. This change follows another similar change to schema::column_at().	2024-01-31 05:12:33 -05:00
Botond Dénes	03ed9f77ff	schema: column_mapping: move column accessors out-of-line To faciliate further patching.	2024-01-31 05:06:34 -05:00
Lakshmi Narayanan Sreethar	b5e1097858	build: cmake: include raft.cc in api library When building with cmake, include the raft source files introduced by commit `617e0913` as sources for api library target. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#17075	2024-01-31 11:39:41 +02:00

1 2 3 4 5 ...

41043 Commits