Soon, we want to log a warning on too many cell tombstones as well.
Extract the logging code to allow reuse between the row and cell
tombstone warnings.
Rewrite the function as a coroutine to make it easier to read and maintain, following lifetime issues we had and fixed in this function.
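A minimal sketch of what coroutinization buys here, assuming a generic Seastar-style function (the names read_value/write_value and the body are hypothetical stand-ins, not the actual do_apply_counter_update code): locals live in the coroutine frame across co_await points, avoiding the lambda-capture lifetime pitfalls of continuation-chained code.

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/coroutine.hh>

// Hypothetical async steps standing in for the real counter-update stages.
seastar::future<int> read_value() { return seastar::make_ready_future<int>(41); }
seastar::future<> write_value(int) { return seastar::make_ready_future<>(); }

seastar::future<> apply_update_sketch() {
    int v = co_await read_value();   // 'v' lives in the coroutine frame...
    co_await write_value(v + 1);     // ...and stays valid across this suspension point
}
```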
The second commit adds a test that drops a table while there is a counter update operation ongoing in the table.
The test reproduces issue https://github.com/scylladb/scylla-enterprise/issues/4475 and verifies it is fixed.
Follow-up to https://github.com/scylladb/scylladb/pull/19948
Doesn't require backport because the fix to the issue was already done and backported. This is just cleanup and a test.
Closes scylladb/scylladb#19982
* github.com:scylladb/scylladb:
db: test counter update while table is dropped
db: coroutinize do_apply_counter_update
Recently, some users have seen "Key size too large" errors in various
places. Cassandra and Scylla impose a 64KB length limit on keys, and
we have known about bugs in this area for a long time - and even had
some translated Cassandra unit tests that cover some of them. But these
tests did not cover all the corner cases and left us with partial and
fragmented knowledge of this problem, spread over many test files and
many issues.
In this patch, we add a single test file, test/cql-pytest/test_key_length.py
which attempts to rigorously explore the various bugs we have with
CQL key length limits. These tests aim to reproduce all known bugs in
this area:
* Refs #3017 - CQL layer accepts set values too large to be written to
an sstable
* Refs #10366 - Enforce Key-length limits during SELECT
* Refs #12247 - Better error reporting for oversized keys during INSERT
* Refs #16772 - Key length should be limited to exactly 65535, not less
The following less interesting bug is already covered by many tests so
I decided not to test it again:
* Refs #7745 - Length of map keys and set items are incorrectly limited
to 64K in unprepared CQL
There's also a situation in materialized views and secondary indexes,
where a column that was _not_ a key, now becomes a key, and a length
limit needs to be enforced on it. We already have good test coverage
for this (in test/cql-pytest/test_secondary_index.py and in
test/cql-pytest/test_materialized_view.py), and we have an issue:
* Refs #8627 - Cleanly reject updates with indexed values where value > 64k
All 16 tests added here pass on Cassandra 5 except one that fails on
https://issues.apache.org/jira/browse/CASSANDRA-19270, but 11 of the
tests currently fail on Scylla (6 on #12247, 2 on #10366, 3 on #16772).
It is possible that our decision in #16772 will not be to fix Scylla
to match Cassandra but rather to declare that strict compatibility isn't
needed in this case or even that Cassandra is wrong. But even then,
having these tests which demonstrate the behavior of both Cassandra
and Scylla will be important.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16779
When debugging coredumps, some (small but useful) information is hidden
in the notes of the core ELF file. Add some words about the fact that it exists,
what it includes, and the thing that is always forgotten -- the way to get one
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19962
Commit ad0e6b79 (replica: Remove all_datadir from keyspace config) removed all_datadirs from keyspace config; now it's datadir's turn. After this change, keyspace no longer references any on-disk directories, only the sstables storage driver attached to the keyspace's tables does.
refs #12707
Closes scylladb/scylladb#19866
* github.com:scylladb/scylladb:
replica: Remove keyspace::config::datadir
sstables/storage: Evaluate path for keyspace directory in storage
sstables/storage: Add sstables_manager arg to init_keyspace_storage()
This commit adds OS support for version 6.1 and removes OS support for 5.4
(according to our support policy for versions).
Closes scylladb/scylladb#19992
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
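A minimal sketch of the idea (assumed, not the actual contents of utils/assert.hh): an assert-style macro whose check is compiled in unconditionally, independent of NDEBUG.

```cpp
#include <cstdio>
#include <cstdlib>

// Always-on assertion: evaluated in all build modes, unlike assert() under NDEBUG.
#define SCYLLA_ASSERT(cond)                                              \
    do {                                                                 \
        if (!(cond)) {                                                   \
            std::fprintf(stderr, "Assertion failed: %s (%s:%d)\n",       \
                         #cond, __FILE__, __LINE__);                     \
            std::abort();                                                \
        }                                                                \
    } while (0)
```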
[1] 66ef711d68
Closes scylladb/scylladb#20006
The for_tests constructor has a metrics parameter defaulted to
register_metrics::no, but when delegating to the other constructor, a
hard-coded register_metrics::no is passed. This makes no difference
currently, because all callers use the default and the hard-coded value
corresponds to it. Let's fix it nevertheless to avoid any future
surprises.
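A simplified illustration of the bug class, with hypothetical names (the real code is the for_tests constructor mentioned above):

```cpp
struct config {};
struct for_tests_tag {};

struct service {
    enum class register_metrics { no, yes };

    service(config, register_metrics) {}   // primary constructor

    // Before: the defaulted parameter was ignored because a literal was forwarded:
    //   service(for_tests_tag, register_metrics m = register_metrics::no)
    //       : service(config{}, register_metrics::no) {}
    // After: forward the parameter, so a future caller passing ::yes is honored.
    service(for_tests_tag, register_metrics m = register_metrics::no)
        : service(config{}, m) {}
};
```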
Closes scylladb/scylladb#20007
repair_service::insert_repair_meta gets a reference to a table
and passes it to continuations. If the table is dropped in the meantime,
the reference becomes invalid.
Use find_column_family at each table occurrence in insert_repair_meta
instead.
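A self-contained sketch of the lookup-per-use pattern, using hypothetical stand-in types rather than ScyllaDB's replica::database / find_column_family API:

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

struct table { void apply() {} };

struct database {
    std::map<std::string, std::shared_ptr<table>> tables;

    // Throws if the table was dropped in the meantime, like find_column_family does.
    table& find_table(const std::string& name) {
        auto it = tables.find(name);
        if (it == tables.end()) { throw std::runtime_error("table dropped: " + name); }
        return *it->second;
    }
};

void long_running_operation(database& db, const std::string& name) {
    db.find_table(name).apply();   // use #1: resolve just before use
    // ... a yield/await would happen here; the table may be dropped concurrently ...
    db.find_table(name).apply();   // use #2: resolve again instead of keeping a stale reference
}
```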
Closes scylladb/scylladb#19953
In fabab2f4, we introduced preemption_source, and added the
`SCYLLA_ENABLE_PREEMPTION_SOURCE` preprocessor macro to opt in to
the pluggable preemption check.
But the CMake build system was not updated accordingly.
So, in this change, let's sync the CMake build system with
`configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19951
This reverts commit c3bea539b6.
Since it breaks the offline-installer artifact-tests. Also, it seems that we should not have merged it in the first place, since we don't need scylla-housekeeping checks for the offline-installer
Closes scylladb/scylladb#19976
Currently when a partition is deleted from the base table, we generate a
row tombstone update for each one of the view rows in the partition.
When the partition key in the view is the same as in the base, possibly in a
different order, this can be done more efficiently: the whole corresponding
view partition can be deleted with one partition tombstone update.
With this commit, when generating view updates, if the update mutation has a
partition tombstone, then for the views which have the same partition key
we generate a partition tombstone update and skip the individual
row tombstone updates.
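A heavily simplified sketch of the decision (all types and helpers below are hypothetical stand-ins; the real code operates on ScyllaDB's mutation and view-update machinery):

```cpp
#include <optional>
#include <vector>

struct tombstone { long timestamp = 0; };

struct base_update {
    std::optional<tombstone> partition_tombstone;  // set when the whole base partition is deleted
    std::vector<int> rows;                         // stand-in for the affected base rows
};

struct view_update {
    std::optional<tombstone> partition_tombstone;
    std::vector<int> row_tombstones;
};

// pk_matches: the view partition key consists of exactly the base partition key
// columns (possibly reordered), so base partition == view partition.
view_update make_view_update(const base_update& u, bool pk_matches) {
    view_update out;
    if (u.partition_tombstone && pk_matches) {
        out.partition_tombstone = u.partition_tombstone;  // one tombstone covers the whole view partition
        return out;                                       // skip per-row tombstones (and reading the rows)
    }
    for (int r : u.rows) {
        out.row_tombstones.push_back(r);  // previous behavior: one row tombstone per base row
    }
    return out;
}
```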
Fixes scylladb/scylladb#8199
Closes scylladb/scylladb#19338
* github.com:scylladb/scylladb:
mv: skip reading rows when generating partition tombstone update
mv: delete a partition in a single operation when applicable
cql-pytest: move ScyllaMetrics to util file to allow reuse
Add a test that drops a table while there is a counter update operation
ongoing in the table.
The test reproduces issue scylladb/scylla-enterprise#4475 and verifies
it is fixed.
The tablet load balancer tries to equalize tablet load between shards by
moving tablets. Currently, the tablet load balancer assumes that each
tablet has the same hotness. This may not be true, and some tables may
be hotter than others. If some nodes end up getting more tablets of
the hot table, we can end up with request load imbalance and reduced
performance.
In 79d0711c7e we implemented a
mitigation for the problem by randomly choosing the table whose tablet
replica should be moved. This should improve fairness of
movement. However, this proved to not be enough to get a good
distribution of tablets.
This change improves candidate selection to not rely on randomness
but rather to evaluate candidates with respect to their impact on load
imbalance. Also, if there is no good candidate, we consider picking
other source shards, not the most-loaded one. This is helpful because
when finishing node drain we get just a few candidates per shard, all
of which may belong to a single table, and the destination may already
be overloaded with that table. Another shard may contain tablets of
another table which is not yet overloaded on the destination. And
shards may be of similar load, so it doesn't matter much which shard
we choose to unload.
We also consider other destinations, not the least-loaded one. This
helps when draining nodes and the source node has few shard
candidates. Shards on the destination may have similar load so there
is more than one good destination candidate. By limiting ourselves to a
single shard, we increase the chance that we'll overload the table on
that shard.
The algorithm was evaluated using "scylla perf-load-balancing", which
simulates a sequence of 8 node bootstraps and decommissions for
different node and shard counts, RF, and tablet counts.
For example, for the following parameters:
params: {iterations=8, nodes=5, tablets1=128 (2.4/sh), tablets2=512 (9.6/sh), rf1=3, rf2=3, shards=32}
The results are:
Before:
Overcommit (old) : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
Overcommit (old) : worst: {table1={shard=4.00 (best=1.25), node=1.81}, table2={shard=1.25 (best=1.04), node=1.11}}
Overcommit (old) : last : {table1={shard=2.50 (best=1.25), node=1.41}, table2={shard=1.25 (best=1.04), node=1.05}}
After:
Overcommit : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
Overcommit : worst: {table1={shard=1.50 (best=1.25), node=1.02}, table2={shard=1.12 (best=1.04), node=1.01}}
Overcommit : last : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
So worst shard overcommit for table1 was reduced from 4 to 1.5. Overcommit
of 4 means that the most-loaded shard has 4 times more tablets than
the average per-shard load in the cluster.
Also, node overcommit for table1 was reduced from 1.81 to 1.02.
The magnitude of improvement depends greatly on the test configuration, i.e., on topology and tablet distribution.
The algorithm is not perfect; it finds a local optimum. In the above
test, overcommit of 1.5 is not the best possible (1.25).
One of the reasons why the current algorithm doesn't achieve the best
distribution is that it works with a single movement at a time and
replication constraints limit the choice of destinations. Viable
destinations for remaining candidates may be only on nodes which are
not least-loaded, and we won't be able to fill the least loaded
node. Doing so would require more complex movement involving moving a
tablet from one of the destination nodes which doesn't have a replica
on the least loaded node and then replacing it with the candidate from
the source node.
Another limitation is that the algorithm can only fix balance by
moving tablets away from most loaded nodes, and it does so due to
imbalance between nodes. So it cannot fix the imbalance which is
already present on the nodes if there is not much to move due to
similar load between nodes. It is designed to not make the imbalance
worse, so it works well if we started in good shape.
Fixes https://github.com/scylladb/scylladb/issues/16824
Closes scylladb/scylladb#19779
* github.com:scylladb/scylladb:
test: perf: tablet_load_balancing: Test with higher shard and tablet counts
tablets: load_balancer: Avoid quadratic complexity when finding best candidate
tablets: load_balancer: Maintain load sketch properly during intra-node migration
tablets: load_balancer: Use "drained" flag
test: perf: tablet_load_balancing: Report load balancer stats
tablets: load_balancer: Move load_balancer_stats_manager to header file
tablets: load_balancer: Split evaluate_candidate() into src and dst part
tablets: load_balancer: Optimize evaluate_candidate()
tablets: load_balancer: Add more statistics
tablets: load_balancer: Track load per table on cluster level
tablets: load_balancer: Track load per table on node level
tablets: load_balancer: Use a single load sketch for tracking all nodes
locator: load_sketch: Introduce populate_dc()
tablets: load_balancer: Modify target load sketch only when emitting migration
locator: load_sketch: Introduce get_most_loaded_shard()
locator: load_sketch: Introduce get_least_loaded_shard()
locator: load_sketch: Optimize pick()/unload()
locator: load_sketch: Introduce load_type
test: perf: tablet_load_balancing: Report total tablet counts
test: perf: tablet_load_balancing: Print run parameters in the single simulation case too
test: perf: tablet_load_balancing: Report time it took to schedule migrations
tablets: load_balancer: Log table load stats after each migration
tablets: load_balancer: Log per-shard load distribution in debug level
tablets: load_balancer: Improve per-table balance
tablets: load_balancer: Extract check_convergence()
tablets: load_balancer: Extract nodes_by_load_cmp
tablets: load_balancer: Maintain tablet count per table
tablets: load_balancer: Reuse src_node_info
test: perf: tablet_load_balancing: Print warnings about bad overcommit
test: perf: tablet_load_balancing: Allow running a single simulation
test: perf: tablet_load_balancing: Report best possible shard overcommit
test: perf: tablet_load_balancing: Report global shard overcommit
The gossiper is only needed to start hints via the proxy, but the proxy can do that without a gossiper argument
Closes scylladb/scylladb#19894
* github.com:scylladb/scylladb:
storage_service: Remove gossiper argument from join_cluster()
proxy: Use remote gossiper to start hints resource manager
hints: Const-ify gossiper references and anchor pointers
When a table is dropped, we should wait for all pending operations on the
table before the table is destroyed, because the operations may use the
table's resources.
With counter update operations, currently this is not the case. The
table may be destroyed while there is a counter update operation in
progress, causing an assert to be triggered due to a resource being
destroyed while it's in use.
The reason the operation is not waited for is a mistake in the lifetime
management of the object representing the write in progress. The commit
fixes it so the object lives for the duration of the entire counter
update operation, by moving it to the `do_with` list.
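A sketch of the lifetime pattern (the guard type below is hypothetical; this is not the actual counter-write code): seastar::do_with keeps its arguments alive until the future returned by the lambda resolves, so the in-progress-write object is guaranteed to outlive the whole asynchronous operation, and the table's drop path can wait for it.

```cpp
#include <seastar/core/do_with.hh>
#include <seastar/core/future.hh>

// Hypothetical guard the table's drop path waits on.
struct write_in_progress {
    write_in_progress() { /* register the write with the table */ }
    ~write_in_progress() { /* unregister; drop can complete only after this runs */ }
};

seastar::future<> apply_counter_update_sketch() {
    return seastar::do_with(write_in_progress{}, [] (write_in_progress&) {
        // The guard lives until this future resolves, covering the entire update.
        return seastar::make_ready_future<>();
    });
}
```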
Fixes scylladb/scylla-enterprise#4475
Closes scylladb/scylladb#19948
A user complained that ScyllaDB is incompatible with Cassandra when it
requires ALLOW FILTERING on a restriction like WHERE x=1 AND y=1 where
x and y are two columns with secondary indexes.
In the tests added in this patch we show that:
1. Scylla *is* compatible with Cassandra when the traditional "CREATE
INDEX" is used - ALLOW FILTERING *is* required in this case in both
Cassandra and Scylla.
2. If SAI is used in Cassandra (CREATE CUSTOM INDEX USING 'SAI'),
indeed ALLOW FILTERING becomes optional. I believe this is incorrect
so I opened CASSANDRA-19795.
These two tests combined show that we're not incompatible with Cassandra,
rather Cassandra's two index implementations are incompatible between
themselves, and Scylla is in fact compatible in this case with Cassandra's
traditional index and not with SAI.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#19909
If the source and destination shards picked for migration based on
global tablet balance do not have a good candidate in terms of effect
on per-table balance, the algorithm explores other source shards and
destinations. This has quadratic complexity in terms of shard count in
the worst case, when there are no good candidates.
Since we can have up to ~200 shards, this can slow down scheduling
significantly. I saw a total scheduling time of 5 min in the following run:
scylla perf-load-balancing -c1 -m1G --iterations=8 \
--nodes=4 --tablets1=1024 --tablets2=8096 \
--rf1=2 --rf2=3 --shards=256
To improve, change the approach to first find the best source shard and
then the best target shard, sequentially. So the search is now linear in
terms of shard count.
After the change, the total scheduling time in that run is down to 4s.
Minimizing the source and destination metrics piece-wise also minimizes
the combined metric, so the badness of the best candidate doesn't suffer
after this change.
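A toy sketch of the piece-wise selection (the types and the "badness" metric are hypothetical): instead of scoring every (source, destination) pair, which is quadratic in the shard count, pick the best source by its own metric and then the best destination, which is linear.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct shard_candidate { unsigned shard; double badness; };  // lower badness is better

unsigned pick_best(const std::vector<shard_candidate>& shards) {
    return std::min_element(shards.begin(), shards.end(),
        [] (const shard_candidate& a, const shard_candidate& b) { return a.badness < b.badness; })->shard;
}

// Source and destination are chosen independently; under the assumption that the
// combined metric is monotone in each part, this also minimizes the combination.
std::pair<unsigned, unsigned> pick_migration(const std::vector<shard_candidate>& sources,
                                             const std::vector<shard_candidate>& destinations) {
    return { pick_best(sources), pick_best(destinations) };
}
```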
This affects only intra-node migration. The code was recording the
destination shard as taken and did not un-take it when we skipped the
migration due to lack of candidates.
Noticed during code review. Impact is minor, since even if this leads
to suboptimal balance, the next scheduling round should fix it.
Also, the source shard was not unloaded, but that should have no
impact on decisions. But to be future-proof, better to maintain the
load accurately in case the algorithm is extended with more steps.
Some of the calls inside the `raft_group0_client::start_operation()` method were missing the abort source parameter. This caused repair tests to get stuck in the shutdown phase: the abort source had been triggered, but the operations were not checking it.
This was in particular the case for operations that try to take ownership of the raft group semaphore (`get_units(semaphore)`); these waits should be cancelled when the abort source is triggered, as sketched after the test list below.
This should fix the following tests that were failing in some percentage of dtest runs (about 1-3 of 100):
* TestRepairAdditional::test_repair_kill_1
* TestRepairAdditional::test_repair_kill_3
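A sketch of the idea (not the actual group0 client code; it assumes Seastar's get_units() overload that takes an abort_source): passing the abort source makes a fiber waiting on the semaphore fail promptly when shutdown fires, instead of hanging.

```cpp
#include <seastar/core/abort_source.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/core/semaphore.hh>

seastar::future<> start_operation_sketch(seastar::semaphore& group0_sem,
                                         seastar::abort_source& as) {
    // Before: co_await seastar::get_units(group0_sem, 1);        // not abortable
    auto units = co_await seastar::get_units(group0_sem, 1, as);  // fails with an abort exception on shutdown
    // ... perform the group0 operation while holding the units ...
    co_return;
}
```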
Fixes scylladb/scylladb#19223
Closes scylladb/scylladb#19860
* github.com:scylladb/scylladb:
raft: fix the shutdown phase being stuck
raft: use the abort source reference in raft group0 client interface
They are executed frequently during tablet scheduling. Currently, they
have time complexity of O(N*log(N)) in terms of shard count. With
large shard counts, that has significant overhead.
This patch optimizes them down to O(log(N)).
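One way to get there, sketched with hypothetical types (the actual load_sketch code may differ): keep a node's shards in a container ordered by load, so the least/most loaded shard is read off the ends and a load update costs O(log(N)), instead of sorting all shards on every query.

```cpp
#include <cstddef>
#include <set>

struct shard_load {
    unsigned shard;
    std::size_t tablets;
    bool operator<(const shard_load& o) const {
        return tablets != o.tablets ? tablets < o.tablets : shard < o.shard;
    }
};

struct node_load_index {
    std::set<shard_load> by_load;  // ordered by (tablets, shard)

    unsigned least_loaded() const { return by_load.begin()->shard; }
    unsigned most_loaded() const { return by_load.rbegin()->shard; }

    // Adjust one shard's load in O(log(N)).
    void update(shard_load old_entry, std::size_t new_tablets) {
        by_load.erase(old_entry);
        by_load.insert({old_entry.shard, new_tablets});
    }
};
```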