If a node restarts just before it stores the bootstrapping node's IP, it will
not have an ID-to-IP mapping for the bootstrapping node, which may cause a failure
on the write path. Detect this and fail bootstrapping if it happens.
(cherry picked from commit 1faef47952)
(cherry picked from commit 27445f5291)
(cherry picked from commit 6853b02c00)
(cherry picked from commit f91db0c1e4)
Refs #18927
Closes scylladb/scylladb#19118
* github.com:scylladb/scylladb:
raft topology: fix indentation after previous commit
raft topology: do not add bootstrapping node without IP as pending
test: add test of bootstrap where the coordinator crashes just before storing IP mapping
schema_tables: remove unused code
Fetching only the first page is not the intuitive behavior users expect.
This causes flakiness in some tests which generate a variable number of
keys depending on execution speed and later verify that all keys were
written using a single SELECT statement. When the number of keys
becomes larger than the page size, the test fails.
Fixes #18774
(cherry picked from commit 2c3f7c996f)
Closes scylladb/scylladb#19130
Assigning to a member of an uninitialized optional
does not initialize the object before assigning to it.
This resulted in AddressSanitizer detecting an attempt
to double-free when the uninitialized string contained
an apparently bogus pointer.
The change emplaces the returned optional when needed,
without resorting to the copy-assignment operator,
so it is not susceptible to assigning to uninitialized
memory, and it is more efficient as well.
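As an illustration, a minimal sketch of the bug pattern and of the emplace-based fix (simplified toy code, not the actual Scylla sources):
```cpp
#include <optional>
#include <string>

struct row {
    std::string value;
};

std::optional<row> make_row(bool has_value) {
    std::optional<row> r;   // disengaged: the row inside has not been constructed
    if (has_value) {
        // Buggy pattern: `r->value = "x";` on a disengaged optional assigns into
        // an unconstructed string, whose garbage internals can later be freed
        // again -- the double-free AddressSanitizer reported.
        r.emplace();        // fix: construct the contained row in place first
        r->value = "x";
    }
    return r;
}
```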
Fixes scylladb/scylladb#19041
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit b2fa954d82)
Closes scylladb/scylladb#19117
Task manager's tasks stay in memory after they are finished.
Moreover, even if a child task is unregistered from the task manager,
it is still alive, since its parent keeps a foreign pointer to it. Also,
when a task has finished successfully there is no point in keeping
all of its descendants in memory.
The patch introduces folding of task manager's tasks. Whenever
a task which has a parent finishes, it is unregistered from the task
manager and the foreign_ptr to it (kept in its parent) is replaced
with its status. The statuses of the task's children are dropped unless
they or one of their descendants failed. So for each operation we
keep a tree of tasks which contains:
- the root task and its direct children (a status if they are finished, a task
otherwise);
- running tasks and their direct children (same as above);
- a path of statuses from the root to each failed task.
/task_manager/wait_task/ does not unregister tasks anymore.
Refs: https://github.com/scylladb/scylladb/issues/16694.
Backport reason: requires backport to 6.0, as the number of tasks exploded with tablets.
(cherry picked from commit 6add9edf8a)
(cherry picked from commit 319e799089)
(cherry picked from commit e6c50ad2d0)
(cherry picked from commit a82a2f0624)
(cherry picked from commit c1b2b8cb2c)
(cherry picked from commit 30f97ea133)
(cherry picked from commit fc0796f684)
(cherry picked from commit d7e80a6520)
(cherry picked from commit beef77a778)
Refs https://github.com/scylladb/scylladb/pull/18735
Closes scylladb/scylladb#19104
* github.com:scylladb/scylladb:
docs: describe task folding
test: rest_api: add test for task tree structure
test: rest_api: modify new_test_module
tasks: test: modify test_task methods
api: task_manager: do not unregister task in /task_manager/wait_task/
tasks: unregister tasks with parents when they are finished
tasks: fold finished tasks into their parents
tasks: make task_manager::task::impl::finish_failed noexcept
tasks: change _children type
If there is no mapping from host id to IP while a node is in bootstrap
state, there is no point adding it as a pending endpoint, since the write
handler will not be able to map it back to a host id anyway. If the
transition state requires double writes, though, we still want to fail.
If the state is write_both_read_old we fail the barrier, which will
cause the topology operation to roll back; in the write_both_read_new case
we assert, but this should not happen since the mapping is persisted by
this point (or we already failed in the write_both_read_old state).
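A rough sketch of the resulting decision, using hypothetical helper and enum names (the real logic lives in the raft topology path):
```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

enum class transition_state { none, write_both_read_old, write_both_read_new };

// Hypothetical helper for illustration only.
void handle_bootstrapping_node(const std::optional<std::string>& ip, transition_state state) {
    if (ip) {
        return;   // mapping exists: the node can be added as a pending endpoint
    }
    switch (state) {
    case transition_state::write_both_read_old:
        // Fail the barrier so the topology operation rolls back.
        throw std::runtime_error("no IP mapping for bootstrapping node");
    case transition_state::write_both_read_new:
        // The mapping must already be persisted by this point.
        assert(false);
        break;
    default:
        // No double writes required: simply do not add the node as pending.
        break;
    }
}
```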
Fixes: scylladb/scylladb#18676
(cherry picked from commit 6853b02c00)
On the next boot there is no host ID to IP mapping, which causes the node to
crash again with the "No mapping for :: in the passed effective replication map"
assertion.
(cherry picked from commit 27445f5291)
The values of `tablets_enabled` were nonempty strings, so they
always evaluated to `True` in the if statement responsible for
enabling writing workers only if tablets are disabled. Hence, the
writing workers were always disabled.
The original commit, ea4717da65,
contains one more change, which is not needed (and conflicting)
in 6.0 because scylladb/scylladb#18898 has been backported first.
Closes scylladb/scylladb#19111
Retrieval of tablet stats must be serialized with mutations to token metadata, as the former requires tablet id stability.
If a tablet split is finalized while retrieving stats, the saved erm, used by all shards, can have a lower tablet count than the one in a particular shard, causing an abort, as the tablet map requires that any id fed into it is lower than its current tablet count.
Fixes https://github.com/scylladb/scylladb/issues/18085.
(cherry picked from commit abcc68dbe7)
(cherry picked from commit 551bf9dd58)
(cherry picked from commit e7246751b6)
Refs https://github.com/scylladb/scylladb/pull/18287
Closes scylladb/scylladb#19095
* github.com:scylladb/scylladb:
topology_experimental_raft/test_tablets: restore usage of check_with_down
test: Fix flakiness in topology_experimental_raft/test_tablets
service: Use tablet read selector to determine which replica to account table stats
storage_service: Fix race between tablet split and stats retrieval
This doesn't apply to auth-v2, as we improved data placement and
removed the Cassandra quirk which was setting a different CL for some
operations involving the default superuser.
Fixes #18773
(cherry picked from commit 9adf74ae6c)
Closes scylladb/scylladb#18860
Currently, there is no indication of tablets in the logged KSMetaData.
Print the tablets configuration: either the `initial` number of tablets,
if enabled, or {'enabled':false} otherwise.
For example:
```
migration_manager - Create new Keyspace: KSMetaData{name=tablets_ks, strategyClass=org.apache.cassandra.locator.NetworkTopologyStrategy, strategyOptions={"datacenter1": "1"}, cfMetaData={}, durable_writes=true, tablets={"initial":0}, userTypes=org.apache.cassandra.config.UTMetaData@0x600004d446a8}
migration_manager - Create new Keyspace: KSMetaData{name=vnodes_ks, strategyClass=org.apache.cassandra.locator.NetworkTopologyStrategy, strategyOptions={"datacenter1": "1"}, cfMetaData={}, durable_writes=true, tablets={"enabled":false}, userTypes=org.apache.cassandra.config.UTMetaData@0x600004c33ea8}
```
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4fe700a962)
Closes scylladb/scylladb#19009
Tablet allocation does not guarantee fairness of
the first replica in the replica set across DCs.
The lack of this fix causes the following dtest to fail:
repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_multi_dc
Use the tablet_map get_primary_replica or get_primary_replica_within_dc,
respectively, to determine whether this node is the primary replica for each
tablet or not.
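A rough sketch of how these helpers are meant to be used, with hypothetical, simplified stand-ins for locator::tablet_map and tablet_replica (the real signatures differ):
```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-ins for illustration only.
struct tablet_replica {
    uint64_t host = 0;
    unsigned shard = 0;
};

struct tablet_map_sketch {
    std::vector<std::vector<tablet_replica>> replicas;   // per-tablet replica sets
    std::vector<std::string> dc_of_host;                  // toy DC lookup, indexed by host id

    std::size_t tablet_count() const { return replicas.size(); }

    // The primary replica is the first replica in the tablet's replica set.
    tablet_replica get_primary_replica(std::size_t tablet) const {
        return replicas[tablet].front();
    }

    // The primary replica within a DC: here, the first replica belonging to that DC.
    tablet_replica get_primary_replica_within_dc(std::size_t tablet, const std::string& dc) const {
        for (const auto& r : replicas[tablet]) {
            if (dc_of_host[r.host] == dc) {
                return r;
            }
        }
        return {};
    }
};

// Repair only the tablets for which this node is the primary replica,
// either globally or within the local DC.
std::vector<std::size_t> tablets_to_repair(const tablet_map_sketch& tmap, uint64_t my_host,
                                           const std::string& my_dc, bool within_dc) {
    std::vector<std::size_t> out;
    for (std::size_t t = 0; t < tmap.tablet_count(); ++t) {
        auto primary = within_dc ? tmap.get_primary_replica_within_dc(t, my_dc)
                                 : tmap.get_primary_replica(t);
        if (primary.host == my_host) {
            out.push_back(t);
        }
    }
    return out;
}
```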
Fixes https://github.com/scylladb/scylladb/issues/17752
No backport is required before 6.0, as tablets (and tablet repair) were introduced in 6.0.
(cherry picked from commit c52f70f92c)
(cherry picked from commit 2de79c39dc)
(cherry picked from commit 84761acc31)
(cherry picked from commit 009767455d)
(cherry picked from commit 18df36d920)
Refs #18784
Closes scylladb/scylladb#19068
* github.com:scylladb/scylladb:
repair: repair_tablets: use get_primary_replica
repair: repair_tablets: no need to check ranges_specified per tablet
locator: tablet_map: add get_primary_replica_within_dc
locator: tablet_map: get_primary_replica: do not copy tablet info
locator: tablet_map: get_primary_replica: return tablet_replica
Wait until the task is done in test_task::finish_failed and
test_task::finish to ensure that it is folded into its parent.
(cherry picked from commit 30f97ea133)
If /task_manager/wait_task/ unregisters the task, then there is no
way to examine children's failures, since their statuses can be checked
only through their parent.
(cherry picked from commit c1b2b8cb2c)
Currently, when a child task is unregistered, it is still kept by its parent. This leads
to excessive memory usage, especially when tasks are configured to be kept in the task
manager after they are finished (task_ttl_in_seconds).
Introduce a task_essentials struct which keeps only the data necessary for the task manager API.
When a task which has a parent finishes, the foreign pointer to it in its parent is replaced
with the respective task_essentials. Once a parent task finishes, it is also folded into
its parent (if it has one). The children's details of a folded task are lost, unless they
(or some of their subtrees) failed. That is, when a task is finished, we keep:
- the root task (until it is unregistered);
- task_essentials of the root's direct children;
- a path (of task_essentials) from the root to each failed task (so that the reason
for a failure can be examined).
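As a rough illustration of the folding idea, a minimal sketch with hypothetical, simplified types (the real task_manager code and names differ):
```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <variant>
#include <vector>

// "task_essentials": only the data the task manager API needs about a finished task.
struct task_essentials {
    std::string id;
    bool failed = false;
    std::vector<task_essentials> children;   // kept only on paths that lead to a failure
};

struct task {
    std::string id;
    // A child is either still alive (owned task) or already folded into its essentials.
    std::vector<std::variant<std::unique_ptr<task>, task_essentials>> children;

    // When a child finishes, it is unregistered and only its essentials are kept.
    void fold_child(std::size_t i, task_essentials ess) {
        children[i] = std::move(ess);
    }

    // Build this task's own essentials when it finishes and gets folded into its
    // parent: successful subtrees are dropped, failing ones are kept so the
    // failure reason can still be examined through the parent.
    task_essentials fold(bool failed) const {
        task_essentials ess{id, failed, {}};
        for (const auto& child : children) {
            if (const auto* folded = std::get_if<task_essentials>(&child)) {
                if (folded->failed || !folded->children.empty()) {
                    ess.children.push_back(*folded);
                }
            }
        }
        return ess;
    }
};
```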
(cherry picked from commit e6c50ad2d0)
storage_proxy is responsible for intersecting the range of the read
with tablets and calling the replica with a single tablet range, therefore
it makes sense to avoid touching memtables of tablets that don't
intersect with a particular range.
Note this is a performance issue, not a correctness one, as memtable
readers that don't intersect with the current range won't produce any
data, but CPU is wasted until that's realized (they're added to the list
of readers in mutation_reader_merger, more allocations, more data
sources to peek into, etc.).
That's also important for streaming, e.g. after decommission, which
consumes one tablet at a time through a reader, so we don't want
memtables of streamed tablets (that weren't cleaned up yet) to
be consumed.
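Conceptually, the change amounts to something like the following sketch (toy stand-ins for the real Scylla types; hypothetical names):
```cpp
#include <vector>

// Toy token range; the real dht::token_range differs.
struct token_range {
    long first;
    long last;
    bool overlaps(const token_range& o) const {
        return first <= o.last && o.first <= last;
    }
};

struct memtable {
    token_range tablet_range;   // token range of the tablet this memtable belongs to
};

// Only memtables whose tablet range intersects the queried (single-tablet) range
// get readers; the others would produce no data but still cost allocations and
// CPU in the reader merger.
std::vector<const memtable*> memtables_for_range(const std::vector<memtable>& memtables,
                                                 const token_range& query_range) {
    std::vector<const memtable*> selected;
    for (const auto& mt : memtables) {
        if (mt.tablet_range.overlaps(query_range)) {
            selected.push_back(&mt);
        }
    }
    return selected;
}
```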
Refs #18904.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 832fb43fb4)
Closes scylladb/scylladb#18983
before this change, in order to avoid repeating/hardwiring the
compile options set by Seastar, we just inherited Seastar's compile
options for building Abseil, as Seastar exposes the
options needed to enable sanitizers.
this works fine, even though, strictly speaking, not all options
are necessary for building Abseil, as Abseil is not a Seastar
application -- it is just a C++ library.
but when we introduce dependencies which are only generated at
build time, and these dependencies are passed to the compiler
at build time, this breaks the build of Abseil, because these
dependencies are exposed by Seastar's .pc file and consumed
by Abseil. when Abseil is built, the build process
driven by ninja has apparently not started yet, so we are not able
to build Abseil with these settings due to the missing dependencies.
so instead of inheriting the compile options from Seastar, just
set the sanitizer-related compile options directly, to avoid
referencing these missing dependencies.
the upside is that we pass a much smaller set of compile options
to the compiler when building Abseil; the downside is that we hardwire
these sanitizer-related options manually, even though they are also
detected by Seastar's build system. but fortunately, these options are
relatively stable across the build environments we support.
Fixes #19055
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit c436dfd2db)
Closes scylladb/scylladb#19064
Tablet allocation does not guarantee fairness of
the first replica in the replica set across DCs.
The lack of this fix causes the following dtest to fail:
repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_multi_dc
Use the tablet_map get_primary_replica* functions to get
the primary replica for each tablet, possibly within a DC.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 18df36d920)
The code already turns off `primary_replica_only`
if `!ranges_specified.empty()`, so there's no need to
check it again inside the per-tablet loop.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 009767455d)
Currently, the function needlessly copies the tablet_info
(all tablet replicas in particular) to a local variable.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 2de79c39dc)
This is required by repair, which will start using get_primary_replica
in a following patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit c52f70f92c)
In order to avoid per-table tablet load imbalance from forming
in the cluster after adding nodes, the load balancer now picks the
candidate tablet at random. This should keep the per-table
distribution on the target node similar to the distribution on the
source nodes.
Currently, candidate selection picks the first tablet in the
unordered_set, so the distribution depends on hashing in the unordered
set. Due to the way hash is calculated, table id dominates the hash
and a single table can be chosen more often for migration away. This
can result in imbalance of tablets for any given table after
bootstrapping a new node.
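A sketch of the changed selection (illustrative only; the real load balancer code differs):
```cpp
#include <cstddef>
#include <iterator>
#include <random>
#include <unordered_set>

// Old behaviour: *candidates.begin(), which is biased by the hash layout and can
// keep picking tablets of the same table. New behaviour: pick uniformly at random.
// Assumes candidates is non-empty.
template <typename T>
const T& pick_random_candidate(const std::unordered_set<T>& candidates, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> dist(0, candidates.size() - 1);
    auto it = candidates.begin();
    std::advance(it, dist(rng));   // linear walk; fine for illustrating the idea
    return *it;
}
```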
For example, consider the following results of a simulation which
starts with a 6-node cluster and does a sequence of node bootstraps
and decommissions. One table has 4096 tablets and RF=1, and the other
has 256 tablets and RF=2. Before the patch, the smaller table has
node overcommit of 2.34 in the worst topology state, while after the
patch it has an overcommit of 1.65. Overcommit is calculated as the max load
(tablet count per node) divided by the perfect average load (all tablets / nodes):
Run #861, params: {iterations=6, nodes=6, tablets1=4096 (10.7/sh), tablets2=256 (1.3/sh), rf1=1, rf2=2, shards=64}
Overcommit : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}}
Overcommit : worst: {table1={shard=1.23, node=1.10}, table2={shard=9.85, node=1.65}}
Overcommit (old) : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}}
Overcommit (old) : worst: {table1={shard=1.31, node=1.12}, table2={shard=64.00, node=2.34}}
The worst state before the patch had the following distribution of tablets for the smaller table:
Load on host ba7f866d...: total=171, min=1, max=7, spread=6, avg=2.67, overcommit=2.62
Load on host 4049ae8d...: total=102, min=0, max=6, spread=6, avg=1.59, overcommit=3.76
Load on host 3b499995...: total=89, min=0, max=4, spread=4, avg=1.39, overcommit=2.88
Load on host ad33bede...: total=63, min=0, max=3, spread=3, avg=0.98, overcommit=3.05
Load on host 0c2e65dc...: total=57, min=0, max=3, spread=3, avg=0.89, overcommit=3.37
Load on host 3f2d32d4...: total=27, min=0, max=2, spread=2, avg=0.42, overcommit=4.74
Load on host 9de9f71b...: total=3, min=0, max=1, spread=1, avg=0.05, overcommit=21.33
One node has as many as 171 tablets of that table and another one has as few as 3.
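To relate these numbers to the overcommit formula: the smaller table has 256 tablets with RF=2, i.e. 512 tablet replicas spread over the 7 nodes listed, so the perfect average load is 512/7 ≈ 73 tablets per node; the 171 tablets on the most loaded node give a node overcommit of 171/73 ≈ 2.34, and the post-patch worst case of 121 tablets gives 121/73 ≈ 1.65.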
After the patch, the worst distribution looks like this:
Load on host 94a02049...: total=121, min=1, max=6, spread=5, avg=1.89, overcommit=3.17
Load on host 65ac6145...: total=87, min=0, max=5, spread=5, avg=1.36, overcommit=3.68
Load on host 856a66d1...: total=80, min=0, max=5, spread=5, avg=1.25, overcommit=4.00
Load on host e3ac4a41...: total=77, min=0, max=4, spread=4, avg=1.20, overcommit=3.32
Load on host 81af623f...: total=66, min=0, max=4, spread=4, avg=1.03, overcommit=3.88
Load on host 4a038569...: total=47, min=0, max=2, spread=2, avg=0.73, overcommit=2.72
Load on host c6ab3fe9...: total=34, min=0, max=3, spread=3, avg=0.53, overcommit=5.65
Most-loaded node has 121 tablets and least loaded node has 34 tablets.
It's still not good, a better distribution is possible, but it's an improvement.
Refs #16824
(cherry picked from commit 3be6120e3b)
(cherry picked from commit c9bcb5e400)
(cherry picked from commit 7b1eea794b)
(cherry picked from commit 603abddca9)
Refs #18885
Closes scylladb/scylladb#19036
* github.com:scylladb/scylladb:
tablets: load balancer: Use random selection of candidates when moving tablets
test: perf: Add test for tablet load balancer effectiveness
load_sketch: Extract get_shard_minmax()
load_sketch: Allow populating only for a given table
A tablet snapshot, used by migration, can race with compaction and
can find files deleted. That won't cause data loss, because the
error is propagated back to the coordinator, which decides to
retry the streaming stage. So the consequence is a delayed migration,
which might in turn reduce node operation throughput (e.g.
when decommissioning a node). It should be rare though, so it
shouldn't have drastic consequences.
Fixes #18977.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit b396b05e20)
Closes scylladb/scylladb#19008
Incremented the components_memory_reclaim_threshold config's default
value to 0.2 as the previous value was too strict and caused unnecessary
eviction in otherwise healthy clusters.
Fixes #18607
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit 3d7d1fa72a)
Closes scylladb/scylladb#19014
... and replace it with a boolean enable_tablets option. All the places
in the code are patched to check the latter option instead of the former
feature.
The option is OFF by default, but the default scylla.yaml file sets it
to true, so that newly installed clusters turn tablets ON.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit 83d491af02)
Closes scylladb/scylladb#19012
In order to avoid per-table tablet load imbalance from forming
in the cluster after adding nodes, the load balancer now picks the
candidate tablet at random. This should keep the per-table
distribution on the target node similar to the distribution on the
source nodes.
Currently, candidate selection picks the first tablet in the
unordered_set, so the distribution depends on hashing in the unordered
set. Due to the way hash is calculated, table id dominates the hash
and a single table can be chosen more often for migration away. This
can result in imbalance of tablets for any given table after
bootstrapping a new node.
For example, consider the following results of a simulation which
starts with a 6-node cluster and does a sequence of node bootstraps
and decommissions. One table has 4096 tablets and RF=1, and the other
has 256 tablets and RF=2. Before the patch, the smaller table has
node overcommit of 2.34 in the worst topology state, while after the
patch it has an overcommit of 1.65. Overcommit is calculated as the max load
(tablet count per node) divided by the perfect average load (all tablets / nodes):
Run #861, params: {iterations=6, nodes=6, tablets1=4096 (10.7/sh), tablets2=256 (1.3/sh), rf1=1, rf2=2, shards=64}
Overcommit : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}}
Overcommit : worst: {table1={shard=1.23, node=1.10}, table2={shard=9.85, node=1.65}}
Overcommit (old) : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}}
Overcommit (old) : worst: {table1={shard=1.31, node=1.12}, table2={shard=64.00, node=2.34}}
The worst state before the patch had the following distribution of tablets for the smaller table:
Load on host ba7f866d...: total=171, min=1, max=7, spread=6, avg=2.67, overcommit=2.62
Load on host 4049ae8d...: total=102, min=0, max=6, spread=6, avg=1.59, overcommit=3.76
Load on host 3b499995...: total=89, min=0, max=4, spread=4, avg=1.39, overcommit=2.88
Load on host ad33bede...: total=63, min=0, max=3, spread=3, avg=0.98, overcommit=3.05
Load on host 0c2e65dc...: total=57, min=0, max=3, spread=3, avg=0.89, overcommit=3.37
Load on host 3f2d32d4...: total=27, min=0, max=2, spread=2, avg=0.42, overcommit=4.74
Load on host 9de9f71b...: total=3, min=0, max=1, spread=1, avg=0.05, overcommit=21.33
One node has as many as 171 tablets of that table and another one has as few as 3.
After the patch, the worst distribution looks like this:
Load on host 94a02049...: total=121, min=1, max=6, spread=5, avg=1.89, overcommit=3.17
Load on host 65ac6145...: total=87, min=0, max=5, spread=5, avg=1.36, overcommit=3.68
Load on host 856a66d1...: total=80, min=0, max=5, spread=5, avg=1.25, overcommit=4.00
Load on host e3ac4a41...: total=77, min=0, max=4, spread=4, avg=1.20, overcommit=3.32
Load on host 81af623f...: total=66, min=0, max=4, spread=4, avg=1.03, overcommit=3.88
Load on host 4a038569...: total=47, min=0, max=2, spread=2, avg=0.73, overcommit=2.72
Load on host c6ab3fe9...: total=34, min=0, max=3, spread=3, avg=0.53, overcommit=5.65
Most-loaded node has 121 tablets and least loaded node has 34 tablets.
It's still not good, a better distribution is possible, but it's an improvement.
Refs #16824
(cherry picked from commit 603abddca9)
Update docs for backup procedure to use `DESC SCHEMA WITH INTERNALS`
instead of plain `DESC SCHEMA`.
Add a note to use a proper cqlsh version (at least 6.0.19).
Closes scylladb/scylladb#18953
(cherry picked from commit 5b4e688668)
A separate keyspace which also behaves as a system one brings
little benefit while creating some compatibility problems,
such as a schema digest mismatch during rollback. So we decided
to move the auth tables into the system keyspace.
Fixes https://github.com/scylladb/scylladb/issues/18098
Closes scylladb/scylladb#18769
(cherry picked from commit 2ab143fb40)
[avi: adjust test/alternator/suite.yaml to reflect new keyspace]
When calculating the base-view mapping while the topology
is changing, we may encounter a situation where the base
table noticed the change in its effective replication map
while the view table hasn't, or vice-versa. This can happen
because the ERM update may be performed during the preemption
between taking the base ERM and view ERM, or, due to f2ff701,
the update may have just been performed partially when we are
taking the ERMs.
Until now, we assumed that the ERMs are synchronized when
finding the base-view endpoint mapping, so in particular, we were
using the topology from the base's ERM to check the datacenters of
all endpoints. Now that the ERMs are more likely to not be the same,
we may try to get the datacenter of a view endpoint that doesn't
exist in the base's topology, causing us to crash.
This is fixed in this patch by using the view table's topology for
endpoints coming from the view ERM. The mapping resulting from the
call might now be a temporary mapping between endpoints in different
topologies, but it still maps base and view replicas 1-to-1.
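In rough sketch form (hypothetical, simplified types; not the actual view-building code), the fix boils down to this:
```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Toy stand-in: each ERM carries its own topology, here an endpoint-to-DC map.
using topology_view = std::unordered_map<std::string, std::string>;

std::optional<std::string> dc_of(const topology_view& topo, const std::string& endpoint) {
    auto it = topo.find(endpoint);
    if (it == topo.end()) {
        return std::nullopt;
    }
    return it->second;
}

// When pairing base and view replicas, look up each endpoint's DC in the topology
// of the ERM that endpoint came from. Before the fix the base topology was used
// for both, and a view endpoint unknown to it caused a crash.
bool same_dc(const topology_view& base_topology, const std::string& base_endpoint,
             const topology_view& view_topology, const std::string& view_endpoint) {
    auto base_dc = dc_of(base_topology, base_endpoint);
    auto view_dc = dc_of(view_topology, view_endpoint);   // was: dc_of(base_topology, ...)
    return base_dc && view_dc && *base_dc == *view_dc;
}
```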
Fixes: #17786
Fixes: #18709
(cherry-picked from 519317dc58)
This commit also includes the follow-up patch that removes the
flakiness from the test that is introduced by the commit above.
The flakiness was caused by enabling the
delay_before_get_view_natural_endpoint injection on a node
and not disabling it before the node is shut down. The patch
removes the enabling of the injection on the node in the first
place.
By squashing the commits, we won't introduce a place in the
commit history where a potential bisect could mistakenly fail.
Fixes: https://github.com/scylladb/scylladb/issues/18941
(cherry-picked from 0de3a5f3ff)
Closes scylladb/scylladb#18974
This commit adds an explanation of how the `nodetool describering` command
works if tablets are enabled.
(cherry picked from commit 888d7601a2)
Closes scylladb/scylladb#18981
This change supports changing the replication factor in tablets-enabled keyspaces.
This covers both increasing and decreasing the number of tablet replicas, through
first building topology mutations (`alter_keyspace_statement.cc`) and then
tablets/topology/schema mutations (`topology_coordinator.cc`).
For the limitations of the current solution, please see the docs changes attached to this PR.
refs: scylladb/scylladb#16723
* br-backport-alter-ks-tablets:
test: Do not check tablets mutations on nodes that don't have them
test: Fix the way tablets RF-change test parses mutation_fragments
test/tablets: Unmark RF-changing test with xfail
docs: document ALTER KEYSPACE with tablets
Return response only when tablets are reallocated
cql-pytest: Verify RF changes by at most 1 when tablets on
cql3/alter_keyspace_statement: Do not allow for change of RF by more than 1
Reject ALTER with 'replication_factor' tag
Implement ALTER tablets KEYSPACE statement support
Parameterize migration_manager::announce by type to allow executing different raft commands
Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks
Extend system.topology with 3 new columns to store data required to process alter ks global topo req
Allow query_processor to check if global topo queue is empty
Introduce new global topo `keyspace_rf_change` req
New raft cmd for both schema & topo changes
Add storage service to query processor
tablets: tests for adding/removing replicas
tablet_allocator: make load_balancer_stats_manager configurable by name
The check is performed by selecting from mutation_fragments(table), but
it's known that this query crashes Scylla when there's no tablet replica
on that node.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When the test changes RF from 2 to 3, the extra node executes a "rebuild"
transition, which means that it streams tablet replicas from two other
peers. When doing so, the node receives two sets of sstables with
mutations from the given tablet. The test part that checks whether the extra
node received the mutations notices two mutation fragments on the new
replica and erroneously fails by seeing that RF=3 is not equal to the
number of mutations found, which is 4.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>