scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 03:45:11 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	86066f5313	Merge '[Backport 6.0] replica: Fix tombstone GC during tablet split preparation' from Raphael Raph Carvalho During split prepare phase, there will be more than 1 compaction group with overlapping token range for a given replica. Assume tablet 1 has sstable A containing deleted data, and sstable B containing a tombstone that shadows data in A. Then split starts: sstable B is split first, and moved from main (unsplit) group to a split-ready group now compaction runs in split-ready group before sstable A is split tombstone GC logic today only looks at underlying group, so compaction is step 2 will discard the deleted data in A, since it belongs to another group (the unsplit one), and so the tombstone can be purged incorrectly. To fix it, compaction will now work with all uncompacting sstables that belong to the same replica, since tombstone GC requires all sstables that possibly contain shadowed data to be available for correct decision to be made. Fixes https://github.com/scylladb/scylladb/issues/20044. Please replace this line with justification for the backport/* labels added to this PR Branches 6.0, 6.1 and 6.2 are vulnerable, so backport is needed. (cherry picked from commit `bcd358595f`) (cherry picked from commit `93815e0649`) Refs https://github.com/scylladb/scylladb/pull/20939 Closes scylladb/scylladb#21204 * github.com:scylladb/scylladb: replica: Fix tombstone GC during tablet split preparation service: Improve error handling for split	2024-10-23 11:48:45 +02:00
Raphael S. Carvalho	553803ac0f	replica: Fix tombstone GC during tablet split preparation During split prepare phase, there will be more than 1 compaction group with overlapping token range for a given replica. Assume tablet 1 has sstable A containing deleted data, and sstable B containing a tombstone that shadows data in A. Then split starts: 1) sstable B is split first, and moved from main (unsplit) group to a split-ready group 2) now compaction runs in split-ready group before sstable A is split tombstone GC logic today only looks at underlying group, so compaction is step 2 will discard the deleted data in A, since it belongs to another group (the unsplit one), and so the tombstone can be purged incorrectly. To fix it, compaction will now work with all uncompacting sstables that belong to the same replica, since tombstone GC requires all sstables that possibly contain shadowed data to be available for correct decision to be made. Fixes #20044. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `93815e0649`)	2024-10-20 20:38:21 -03:00
Benny Halevy	f61e0e0f3e	sstable_directory: create_pending_deletion_log: place pending_delete log under the base directory To be able to atomically delete sstables both in base table directory and in its sub-directories, like `staging/`, use a shared pending_delete_dir under under the base directory. Note that this requires loading and processing the base directory first. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `f47b5e60bc`) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> # Conflicts: # sstables/sstable_directory.hh	2024-10-20 09:17:06 +03:00
Calle Wilund	c311a93f0a	commitlog: Fix buffer_list_bytes not updated correctly Fixes #20862 With the change in `60af2f3cb2` the bookkeep for buffer memory was changed subtly, the problem here that we would shrink buffer size before we after flush use said buffer's size to decrement the buffer_list_bytes value, previously inc:ed by the full, allocated size. I.e. we would slowly grow this value instead of adjusting properly to actual used bytes. Test included. (cherry picked from commit `ee5e71172f`) Closes scylladb/scylladb#20913	2024-10-03 09:12:33 +03:00
Botond Dénes	276af11202	Merge '[Backport 6.0] replica: ignore cleanup of deallocated storage group' from Aleksandra Martyniuk Cleanup of a deallocated tablet throws an exception. Since failed cleanup is retried, we end up in an infinite loop. Ignore cleanup of deallocated storage groups. Fixes: https://github.com/scylladb/scylladb/issues/19752. Needs to be backported to all branches with tablets (6.0 and later) (cherry picked from commit `20d6cf55f2`) (cherry picked from commit `2c4b1d6b45`) Refs https://github.com/scylladb/scylladb/pull/20584 Closes scylladb/scylladb#20628 * github.com:scylladb/scylladb: test: check if cleanup of deallocated sg is ignored replica: ignore cleanup of deallocated storage group	2024-09-26 10:58:01 +03:00
Botond Dénes	7d421abec4	Merge '[Manual Backport 6.0] generic_server: convert connection tracking to seastar::gate' from Laszlo Ersek This is a manual backport of #20212 to 6.0, superseding #20346 (which had run into conflicts). Please see the individual commit messages for backport notes. Fixes #10305 Closes scylladb/scylladb#20349 * github.com:scylladb/scylladb: generic_server: make server::stop() idempotent generic_server: coroutinize server::shutdown() generic_server: make server::shutdown() idempotent test/generic_server: add test case configure, cmake: sort the lists of boost unit tests generic_server: convert connection tracking to seastar::gate	2024-09-19 09:18:48 +03:00
Tomasz Grabiec	e38b42cedf	Merge '[Backport 6.0] tablets: Fix race between repair and split ' from Raphael "Raph" Carvalho Consider the following: ``` T 0 split prepare starts 1 repair starts 2 split prepare finishes 3 repair adds unsplit sstables 4 repair ends 5 split executes ``` If repair produces sstable after split prepare phase, the replica will not split that sstable later, as prepare phase is considered completed already. That causes split execution to fail as replicas weren't really prepared. This also can be triggered with load-and-stream which shares the same write (consumer) path. The approach to fix this is the same employed to prevent a race between split and migration. If migration happens during prepare phase, it can happen source misses the split request, but the tablet will still be split on the destination (if needed). Similarly, the repair writer becomes responsible for splitting the data if underlying table is in split mode. That's implemented in replica::table for correctness, so if node crashes, the new sstable missing split is still split before added to the set. Fixes https://github.com/scylladb/scylladb/issues/19378. Fixes https://github.com/scylladb/scylladb/issues/19416. Please replace this line with justification for the backport/* labels added to this PR (cherry picked from commit `239344ab55`) (cherry picked from commit `74612ad358`) Refs https://github.com/scylladb/scylladb/pull/19427 Closes scylladb/scylladb#20593 * github.com:scylladb/scylladb: tablets: Fix race between repair and split compaction: Allow "offline" sstable to be split	2024-09-17 13:24:36 +02:00
Aleksandra Martyniuk	62593ee85f	test: check if cleanup of deallocated sg is ignored (cherry picked from commit `2c4b1d6b45`)	2024-09-16 17:49:59 +02:00
Avi Kivity	ad52caac55	cql3: add option to not unify bind variables with the same name Bind variables in CQL have two formats: positional (`?`) where a variable is referred to by its relative position in the statement, and named (`:var`), where the user is expected to supply a name->value mapping. In `19a6e69001` we identified the case where a named bind variable appears twice in a query, and collapsed it to a single entry in the statement metadata. Without this, a driver using the named variable syntax cannot disambiguate which variable is referred to. However, it turns out that users can use the positional call form even with the named variable syntax, by using the positional API of the driver. To support this use case, we add a configuration variable to disable the same-variable detection. Because the detection has to happen when the entire statement is visible, we have to supply the configuration to the parser. We call it the `dialect` and pass it from all callers. The alternative would be to add a pre-prepare call similar to fill_prepare_context that rewrites all expressions in a statement to deduplicate variables. A unit test is added. Fixes #15559 (cherry picked from commit `ea8441dfa3`) (cherry picked from commit `edb3068ecf`)	2024-09-11 22:55:22 +03:00
Avi Kivity	aabad7e88f	cql3: introduce dialect infrastructure A dialect is a different way to interpret the same CQL statement. Examples: - how duplicate bind variable names are handled (later in this series) - whether `column = NULL` in LWT can return true (as is now) or whether it always returns NULL (as in SQL) Currently, dialect is an empty structure and will be filled in later. It is passed to query_processor methods that also accept a CQL string, and from there to the parser. It is part of the prepared statement cache key, so that if the dialect is changed online, previous parses of the statement are ignored and the statement is prepared again. The patch is careful to pick up the dialect at the entry point (e.g. CQL protocol server) so that the dialect doesn't change while a statement is parsed, prepared, and cached. (cherry picked from commit `d69bf4f010`)	2024-09-11 22:55:22 +03:00
Laszlo Ersek	272c409b26	test/generic_server: add test case Check whether we can stop a generic server without first asking it to listen. The test fails currently; the failure mode is a hang, which triggers the 5 minute timeout set in the test: > unknown location(0): fatal error: in "stop_without_listening": > seastar::timed_out_error: timedout > seastar/src/testing/seastar_test.cc(43): last checkpoint > test/boost/generic_server_test.cc(34): Leaving test case > "stop_without_listening"; testing time: 300097447us Backport notes for 6.0: - Replace #include "utils/assert.hh" SCYLLA_ASSERT(false); with #include <cassert> assert(false); due to 6.0 lacking commit `aa1270a00c` ("treewide: change assert() to SCYLLA_ASSERT()", 2024-08-05). The header file "utils/assert.hh" wouldn't be difficult to backport, but separating it from the treewide changes in commit `aa1270a00c` might not be the best idea. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> (cherry picked from commit `dbc0ca6354`)	2024-08-30 15:22:18 +02:00
Laszlo Ersek	5490092abf	configure, cmake: sort the lists of boost unit tests Both lists were obviously meant to be sorted originally, but by today we've introduced many instances of disorder -- thus, inserting a new test in the proper place leaves the developer scratching their head. Sort both lists. Backport notes for 6.0: - Conflicts in "configure.py" and "test/boost/CMakeLists.txt", unsurprisingly. For the backport, I sorted the boost unit test list in each file manually, from scratch. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> (cherry picked from commit `931f2f8d73`)	2024-08-30 14:43:26 +02:00
Botond Dénes	e33fcfe27b	Merge '[Backport 6.0] Make Summary support histogram with infinite bucket vlaues' from ScyllaDB This series fixes an issue where histogram Summaries return an infinite value. It updated the quantile calculation logic to address cases where values fall into the infinite bucket of a histogram. Now, instead of returning infinite (max int), the calculation will return the last bucket limit, ensuring finite outputs in all cases. The series adds a test for summaries with a specific test case for this scenario. Fixes #20255 Need backport to 6.0, 6.1 and 2023.1 and above (cherry picked from commit `011aa91a8c`) (cherry picked from commit `644e6f0121`) Refs #20257 Closes scylladb/scylladb#20304 * github.com:scylladb/scylladb: test/estimated_histogram_test Add summary tests utils/histogram.hh: Make summary support inifinite bucket.	2024-08-29 07:52:36 +03:00
Lakshmi Narayanan Sreethar	3f23780650	boost/sstable_datafile_test: wait for total memory reclaimed update The testcase `test_bloom_filter_reclaim_during_reload` checks the SSTable manager's `_total_memory_reclaimed` against an expected value to verify that a Bloom filter was reloaded. However, it does not wait for the manager to update the variable, causing the check to fail if the update has not occurred yet. Fix it by making the testcase wait until the variable is updated to the expected value. Fixes #19879 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#19883 (cherry picked from commit `27b305b9d1`) Closes scylladb/scylladb#19963	2024-08-28 06:33:29 +03:00
Benny Halevy	e06be56c28	sstable_directory: delete_atomically: allow sstables from multiple prefixes Currently, delete_atomically can be called with a list of sstables from mixed prefixes in two cases: 1. truncate: where we delete all the sstables in the table directory 2. tablet cleanup: similar to truncate but restricted to sstables in a single tablet replica In both cases, it is possible that sstables in staging (or quarantine) are mixed with sstables in the base directory. Until a more comprehensive fix is in place, (see https://github.com/scylladb/scylladb/pull/19555) this change just lifts the ban on atomic deletion of sstables from different prefixes, and acknowledging that the implementation is not atomic across prefixes. This is better than crashing for now, and can be backported more easily to branches that support tablets so tablet migration can be done safely in the presence of repair of tables with views. Refs scylladb/scylladb#18862 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `26abad23d9`) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#19920	2024-08-28 06:32:12 +03:00
Amnon Heiman	043855c574	test/estimated_histogram_test Add summary tests This patch adds tests for summary calculation. It adds two tests, the first is a basic calculation for P50, P95, P99 by adding 100 elements into 20 buckets. The second test look that if elements are found in the infinite bucket, the result would be the lower limit (33s) and not infinite. Relates to #20255 Signed-off-by: Amnon Heiman <amnon@scylladb.com> (cherry picked from commit `644e6f0121`)	2024-08-27 12:12:39 +00:00
Raphael S. Carvalho	b8099b9e83	compaction: Allow "offline" sstable to be split In order to fix the race between split and repair, we must introduce the ability to split an "offline" sstable, one that wasn't added to any of the table's sstable set yet. It's not safe to split a sstable after adding it to the set, because a failure to split can result in unsplit data left in the set, causing split to fail down the road, since the coordinator thinks this replica has only split data in the set. Refs #19378. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `239344ab55`)	2024-08-20 10:39:31 +00:00
Lakshmi Narayanan Sreethar	ab6b8be69a	boost/sstable_set_test: add testcase to test tablet_sstable_set copy constructor Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `ec47b50859`)	2024-08-19 12:13:11 +00:00
Raphael S. Carvalho	39eb44dfa0	replica: get rid of fragile compaction group intrusive list It was added to make integration of storage groups easier, but it's complicated since it's another source of truth and we could have problems if it becomes inconsistent with the group map. Fixes #18506. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `ad5c5bca5f`)	2024-08-13 12:26:11 -03:00
Tomasz Grabiec	89a93a784e	tablets: Do not allocate tablets on nodes being decommissioned If tablet-based table is created concurrently with node being decommissioned after tablets are already drained, the new table may be permanently left with replicas on the node which is no longer in the topology. That creates an immidiate availability risk because we are running with one replica down. This also violates invariants about replica placement and this state cannot be fixed by topology operations. One effect is that this will lead to load balancer failure which will inhibit progress of any topology operations: load_balancer - Replica 154b0380-1dd2-11b2-9fdd-7156aa720e1a:0 of tablet 7e03dd40-537b-11ef-9fdd-7156aa720e1a:1 not found in topology, at: ... Fixes #20032 (cherry picked from commit `f5c74a5df2`) Closes scylladb/scylladb#20067	2024-08-08 11:57:09 +03:00
Kefu Chai	e1dab2779d	test/boost: include test/lib/test_utils.hh this change was created in the same spirit of 505900f18f. because we are deprecating the operator<< for vector and unorderd_map in Seastar, some tests do not compile anymore if we disable these operators. so to be prepared for the change disabling them, let's include test/lib/test_utils.hh for accessing the printer dedicated for Boost.test. and also '#include <fmt/ranges.h>' when necessary, because, in order to format the ranges using {fmt}, we need to use fmt/ranges.h. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-02 15:04:34 +02:00
Dawid Medrek	13183069f7	test/boost/hint_test.cc: Add missing parse() callback Before these changes, compilation was failing with the following error: In file included from test/boost/hint_test.cc:12: /usr/include/fmt/ranges.h:298:7: error: no member named 'parse' in 'fmt::formatter<db::hints::sync_point::host_id_or_addr>' 298 \| f.parse(ctx); \| ~ ^ We add the missing callback. Closes scylladb/scylladb#19375	2024-08-01 14:49:36 +02:00
Michael Litvak	df0503afd6	db/hints: migrate sync point to host ID Change the format of sync points to use host ID instead of IPs, to be consistent with the use of host IDs in hinted handoff module. Introduce sync point v3 format which is the same as v2 except it stores host IDs instead of IPs. The encoding of sync points now always uses the new v3 format with host IDs. The decoding supports both formats with host IDs and IPs, so a sync point contains now a variant of either types, and in the case of the new format the translation from IP to host ID is avoided.	2024-07-31 18:00:28 +02:00
Tomasz Grabiec	416cbafd16	Merge '[Backport 6.0] sstables: fix some mixups between the writer's schema and the sstable's schema' from Michał Chojnowski There are two schemas associated with a sstable writer: the sstable's schema (i.e. the schema of the table at the time when the sstable object was created), and the writer's schema (equal to the schema of the reader which is feeding into the writer). It's easy to mix up the two and break something as a result. The writer's schema is needed to correctly interpret and serialize the data passing through the writer, and to populate the on-disk metadata about the on-disk schema. The sstables's schema is used to configure some parameters for newly created sstable, such as bloom filter false positive ratio, or compression. This series fixes the known mixups between the two — when setting up compression, and when setting up the bloom filters. Fixes scylladb/scylladb#16065 The bug is present in all supported versions, so the patch has to be backported to all of them. (cherry picked from commit `a1834efd82`) (cherry picked from commit `d10b38ba5b`) (cherry picked from commit `1a8ee69a43`) Refs scylladb/scylladb#19695 Closes scylladb/scylladb#19877 * github.com:scylladb/scylladb: sstables/mx/writer: when creating local_compression, use the sstables's schema, not the writer's sstables/mx/writer: when creating filter, use the sstables's schema, not the writer's sstables: for i_filter downcasts, use dynamic_cast instead of static_cast	2024-07-29 15:36:52 +02:00
Michał Chojnowski	43ba44ce97	sstables/mx/writer: when creating local_compression, use the sstables's schema, not the writer's There are two schema's associated with a sstable writer: the sstable's schema (i.e. the schema of the table at the time when the sstable object was created), and the writer's schema (equal to the schema of the reader which is feeding into the writer). It's easy to mix up the two and break something as a result. The writer's schema is needed to correctly interpret and serialize the data passing through the writer, and to populate the on-disk metadata about the on-disk schema. The sstables's schema is used to configure some parameters for newly created sstable, such as bloom filter false positive ratio, or compression. The problem fixed by this patch is that the writer was wrongly creating the compressor objects based on its own schema, but using them based based on the sstable's schema the sstable's schema. This patch forces the writer to use the sstable's schema for both. (cherry picked from commit `1a8ee69a43`)	2024-07-25 12:23:58 +02:00
Michał Chojnowski	d6d3a91283	sstables/mx/writer: when creating filter, use the sstables's schema, not the writer's There are two schema's associated with a sstable writer: the sstable's schema (i.e. the schema of the table at the time when the sstable object was created), and the writer's schema (equal to the schema of the reader which is feeding into the writer). It's easy to mix up the two and break something as a result. The writer's schema is needed to correctly interpret and serialize the data passing through the writer, and to populate the on-disk metadata about the on-disk schema. The sstables's schema is used to configure some parameters for newly created sstable, such as bloom filter false positive ratio, or compression. The problem fixed by this patch is that the writer was wrongly creating the filter based on its own schema, while the layer outside the writer was interpreting it as if it was created with the sstable's schema. This patch forces the writer to pick the filter's parameters based on the sstable's schema instead. (cherry picked from commit `d10b38ba5b`)	2024-07-25 12:23:58 +02:00
Lakshmi Narayanan Sreethar	3c1fd843c8	[Backport 6.0]: sstables: do not reload components of unlinked sstables The SSTable is removed from the reclaimed memory tracking logic only when its object is deleted. However, there is a risk that the Bloom filter reloader may attempt to reload the SSTable after it has been unlinked but before the SSTable object is destroyed. Prevent this by removing the SSTable from the reclaimed list maintained by the manager as soon as it is unlinked. The original logic that updated the memory tracking in `sstables_manager::deactivate()` is left in place as (a) the variables have to be updated only when the SSTable object is actually deleted, as the memory used by the filter is not freed as long as the SSTable is alive, and (b) the `_reclaimed.erase(sst)` is still useful during shutdown, for example, when the SSTable is not unlinked but just destroyed. Fixes https://github.com/scylladb/scylladb/issues/19722 Closes scylladb/scylladb#19717 github.com:scylladb/scylladb: boost/bloom_filter_test: add testcase to verify unlinked sstables are not reloaded sstables: do not reload components of unlinked sstables sstables/sstables_manager: introduce on_unlink method (cherry picked from commit `591876b44e`) Backported from #19717 to 6.0 Closes scylladb/scylladb#19830	2024-07-23 23:16:53 +03:00
Botond Dénes	f42e8e872a	tools/schema_loader: introduce load_schema_from_sstable() Allows loading the schema from an sstable's serialization header. This schema is incomplete, but it is enough to parse and display the content of the sstable. (cherry picked from commit `8f2ba03465`)	2024-07-12 10:36:59 +00:00
Michał Chojnowski	c5c19e90ac	logalloc: add hold_reserve mutation_partition_v2::apply_monotonically() needs to perform some allocations in a destructor, to ensure that the invariants of the data structure are restored before returning. But it is usually called with reclaiming disabled, so the allocations might fail even in a perfectly healthy node with plenty of reclaimable memory. This patch adds a mechanism which allows to reserve some LSA memory (by asking the allocator to keep it unused) and make it available for allocation right when we need to guarantee allocation success. (cherry picked from commit `7b3f55a65f`)	2024-07-10 08:36:11 +00:00
Botond Dénes	88d3c2eb4b	test/boost/reader_concurrency_semaphore_test: add test for live-configurable cpu concurrency (cherry picked from commit `b4f3809ad2`)	2024-07-08 08:13:07 +03:00
Botond Dénes	4307631950	test/boost/reader_concurrency_semaphore_test: hoist require_can_admit This is currently a lambda in a test, hoist it into the global scope and make it into a function, so other tests can use it too (in the next patch). (cherry picked from commit `9cbdd8ef92`)	2024-07-08 08:12:34 +03:00
Piotr Dulikowski	8b9e62e107	Merge '[Backport 6.0] cql3/statement/select_statement: do not parallelize single-partition aggregations' from Michał Jadwiszczak This patch adds a check if aggregation query is doing single-partition read and if so, makes the query to not use forward_service and do not parallelize the request. Fixes scylladb/scylladb#19349 (cherry picked from commit `e9ace7c203`) (cherry picked from commit `8eb5ca8202`) Refs scylladb/scylladb#19350 Closes scylladb/scylladb#19499 * github.com:scylladb/scylladb: test/boost/cql_query_test: add test for single-partition aggregation cql3/select_statement: do not parallelize single-partition aggregations	2024-07-02 21:03:24 +02:00
Michał Jadwiszczak	29c6a4cf44	test/boost/cql_query_test: add test for single-partition aggregation (cherry picked from commit `8eb5ca8202`)	2024-06-25 23:56:49 +02:00
Raphael S. Carvalho	3d9aa9d49e	compaction: Reduce twcs off-strategy space overhead to 10% of free space TWCS off-strategy suffers with 100% space overhead, so a big TWCS table can cause scylla to run out of disk space during node ops. To not penalize TWCS tables, that take a small percentage of disk, with increased write ampl, TWCS off-strategy will be restricted to 10% of free disk space. Then small tables can still compact all disjoint sstables in a single round. Fixes #16514. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `ace4e5111e`)	2024-06-20 20:41:41 +00:00
Raphael S. Carvalho	ef72075920	compaction: wire storage free space into reshape procedure After this, TWCS reshape procedure can be changed to limit job to 10% of available space. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `0ce8ee03f1`)	2024-06-20 20:41:41 +00:00
Raphael S. Carvalho	37f1af2646	sstables: Allow to get free space from underlying storage That will be used in turn to restrict reshape to 10% of available space in underlying storage. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `51c7ee889e`)	2024-06-20 20:41:41 +00:00
Lakshmi Narayanan Sreethar	e64e659ef1	test/boost/chunked_managed_vector_test: fix testcase tests_reserve_partial Update the maximum size tested by the testcase. The test always created only one chunk as the maximum size tested by it (1 << 12 = 4KB) was less than the default max chunk size (12.8 KB). So, use twice the max_chunk_capacity as the test size distribution upper limit to verify that partial_reserve can reserve multiple chunks. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `310c5da4bb`)	2024-06-14 15:48:57 +00:00
Lakshmi Narayanan Sreethar	397b04b2a4	utils/lsa/chunked_managed_vector: fix reserve_partial() Fix the method comment and return types of chunked_managed_vector's reserve_partial() similar to chunked_vector's reserve_partial() as it has the same issues mentioned in #19254. Also update the usage in the chunked_managed_vector_test. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `d4f8b91bd6`)	2024-06-14 15:48:56 +00:00
Lakshmi Narayanan Sreethar	4e68599b17	test/boost/chunked_vector_test: fix testcase tests_reserve_partial Fix the usage of reserve_partial in the testcase. Also update the maximum chunk size used by the testcase. The test always created only one chunk as the maximum size tested by it (1 << 12 = 4KB) was less than the default max chunk size (128 KB). So, use smaller chunk size, 512 bytes, to verify that partial_reserve can reserve multiple chunks. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `29f036a777`)	2024-06-14 15:48:56 +00:00
Kefu Chai	b39c0a1d15	test: memtable_test: increase unspooled_dirty_soft_limit before this change, when performing memtable_test, we expect that the memtables of ks.cf is the only memtables being flushed. and we inject 4 failures in the code path of flush, and wait until 4 of them are triggered. but in the background, `dirty_memory_manager` performs flush on all tables when necessary. so, the total number of failures is not necessary the total number of failures triggered when flushing ks.cf, some of them could be triggered when flushing system tables. that's why we have sporadict test failures from this test. as we might check `t.min_memtable_timestamp()` too soon. after this change, we increase `unspooled_dirty_soft_limit` setting, in order to disable `dirty_memory_manager`, so that the only flush is performed by the test. Fixes #19034 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `223fba3243`)	2024-06-12 15:44:11 +00:00
Kefu Chai	548fd01bd4	test: memtable_test: replace BOOST_ASSERT with BOOST_REQURE before this change, we verify the behavior of design under test using `BOOST_ASSERT()`, which is a wrapper around `assert()`, so if a test fails, the test just aborts. this is not very helpful for postmortem debugging. after this change, we use `BOOST_REQUIRE` macro for verifying the behavior, so that Boost.Test prints out the condition if it does not hold when we test it. Refs #19034 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `2df4e9cfc2`)	2024-06-12 15:44:11 +00:00
Lakshmi Narayanan Sreethar	85805f6472	db/config.cc: increment components_memory_reclaim_threshold config default Incremented the components_memory_reclaim_threshold config's default value to 0.2 as the previous value was too strict and caused unnecessary eviction in otherwise healthy clusters. Fixes #18607 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `3d7d1fa72a`) Closes scylladb/scylladb#19014	2024-06-03 12:19:16 +03:00
Pavel Emelyanov	62a23fd86a	config: Remove experimental TABLETS feature ... and replace it with boolean enable_tablets option. All the places in the code are patched to check the latter option instead of the former feature. The option is OFF by default, but the default scylla.yaml file sets this to true, so that newly installed clusters turn tablets ON. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `83d491af02`) Closes scylladb/scylladb#19012	2024-06-03 12:16:41 +03:00
Marcin Maliszkiewicz	cbf47319c1	db: auth: move auth tables to system keyspace Separate keyspace which also behaves as system brings little benefit while creating some compatibility problems like schema digest mismatch during rollback. So we decided to move auth tables into system keyspace. Fixes https://github.com/scylladb/scylladb/issues/18098 Closes scylladb/scylladb#18769 (cherry picked from commit `2ab143fb40`) [avi: adjust test/alternator/suite.yaml to reflect new keyspace]	2024-06-02 21:41:14 +03:00
Piotr Smaron	5afa3028a3	Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks	2024-05-30 08:33:15 +03:00
Paweł Zakrzewski	242caa14fe	tablets: tests for adding/removing replicas Note we're suppressing a UBSanitizer overflow error in UTs. That's because our linter complains about a possible overflow, which never happens, but tests are still failing because of it.	2024-05-30 08:33:15 +03:00
Avi Kivity	33ec6ccea9	test: boost: chunked_vector_test: include <optional> std::optional is used but not imported. This fails on libstdc++-14. Closes scylladb/scylladb#18739	2024-05-21 07:37:11 +03:00
Avi Kivity	61505d057e	Merge 'Sort user-defined types in describe statements' from Michał Jadwiszczak User-defined types can depend on each other, creating directed acyclic graph. In order to support restoring schema from `DESC SCHEMA`, UDTs should be ordered topologically, not alphabetically as it was till now. This patch changes the way UDTs are ordered in `DESC SCHEMA`/`DESC KEYSPACE <ks>` statements, so the output can be safely copy-pasted to restore the schema. Fixes #18539 Closes scylladb/scylladb#18302 * github.com:scylladb/scylladb: test/cql-pytest/test_describe: add test for UDTs ordering cql3/statements/describe_statement: UDTs topological sorting cql3/statements/describe_statement: allow to skip alphabetical sorting types: add a method to get all referenced user types db/cql_type_parser: use generic topological sorting db/cql_type_parses: futurize raw_builder::build() test/boost: add test for topological sorting utils: introduce generic topological sorting algorithm	2024-05-20 16:58:17 +03:00
Avi Kivity	52fe351c31	Merge 'Balance tablets within nodes (intra-node migration)' from Tomasz Grabiec This is needed to avoid severe imbalance between shards which can happen when some table grows and is split. The inter-node balance can be equal, so inter-node migration cannot fix the imbalance. Also, if RF=N then there is not even a possibility of moving tablets around to fix the imbalance. The only way to bring the system to balance is to move tablets within the nodes. The system is not prepared for intra-node migration currently. Request coordination is host-based, while for intra-node migration it should be (also) shard-based. The solution employed here is to keep the coordination between nodes as-is, and for intra-node migration storage_proxy-level coordinator is not aware of the migration (no pending host). The replica-side request handler will be a second-level coordinator which routes requests to shards, similar to how the first-level coordinator routes them to hosts. Tablet sharder is adjusted to handle intra-migration where a tablet can have two replicas on the same host. For reads, sharder uses the read selector to resolve the conflict. For writes, the write selector is used. The old shard_of() API is kept to represent shard for reads, and new method is introduced to query the shards for writing: shard_for_writes(). All writers should be switched to that API, which is not done in this patch yet. The request handler on replica side acts as a second-level coordinator, using sharder to determine routing to shards. A given sharder has a scope of a single topology version, a single effective_replication_map_ptr, which should be kept alive during writes. perf-simple-query test results show no signs of regression: Command: perf-simple-query -c1 -m1G --write --tablets --duration=10 Before: > 83294.81 tps ( 59.5 allocs/op, 14.3 tasks/op, 53725 insns/op, 0 errors) > 87756.72 tps ( 59.5 allocs/op, 14.3 tasks/op, 54049 insns/op, 0 errors) > 86428.47 tps ( 59.6 allocs/op, 14.3 tasks/op, 54208 insns/op, 0 errors) > 86211.38 tps ( 59.7 allocs/op, 14.3 tasks/op, 54219 insns/op, 0 errors) > 86559.89 tps ( 59.6 allocs/op, 14.3 tasks/op, 54188 insns/op, 0 errors) > 86609.39 tps ( 59.6 allocs/op, 14.3 tasks/op, 54117 insns/op, 0 errors) > 87464.06 tps ( 59.5 allocs/op, 14.3 tasks/op, 54039 insns/op, 0 errors) > 86185.43 tps ( 59.6 allocs/op, 14.3 tasks/op, 54169 insns/op, 0 errors) > 86254.71 tps ( 59.6 allocs/op, 14.3 tasks/op, 54139 insns/op, 0 errors) > 83395.35 tps ( 60.2 allocs/op, 14.4 tasks/op, 54693 insns/op, 0 errors) > > median 86428.47 tps ( 59.6 allocs/op, 14.3 tasks/op, 54208 insns/op, 0 errors) > median absolute deviation: 243.04 > maximum: 87756.72 > minimum: 83294.81 > After: > 85523.06 tps ( 59.5 allocs/op, 14.3 tasks/op, 53872 insns/op, 0 errors) > 89362.47 tps ( 59.6 allocs/op, 14.3 tasks/op, 54226 insns/op, 0 errors) > 88167.55 tps ( 59.7 allocs/op, 14.3 tasks/op, 54400 insns/op, 0 errors) > 87044.40 tps ( 59.7 allocs/op, 14.3 tasks/op, 54310 insns/op, 0 errors) > 88344.50 tps ( 59.6 allocs/op, 14.3 tasks/op, 54289 insns/op, 0 errors) > 88355.06 tps ( 59.6 allocs/op, 14.3 tasks/op, 54242 insns/op, 0 errors) > 88725.46 tps ( 59.6 allocs/op, 14.3 tasks/op, 54230 insns/op, 0 errors) > 88640.08 tps ( 59.6 allocs/op, 14.3 tasks/op, 54210 insns/op, 0 errors) > 90306.31 tps ( 59.4 allocs/op, 14.3 tasks/op, 54043 insns/op, 0 errors) > 87343.62 tps ( 59.8 allocs/op, 14.3 tasks/op, 54496 insns/op, 0 errors) > > median 88355.06 tps ( 59.6 allocs/op, 14.3 tasks/op, 54242 insns/op, 0 errors) > median absolute deviation: 1007.41 > maximum: 90306.31 > minimum: 85523.06 Command (reads): perf-simple-query -c1 -m1G --tablets --duration=10 Before: > 95860.18 tps ( 63.1 allocs/op, 14.1 tasks/op, 42476 insns/op, 0 errors) > 97537.69 tps ( 63.1 allocs/op, 14.1 tasks/op, 42454 insns/op, 0 errors) > 97549.23 tps ( 63.1 allocs/op, 14.1 tasks/op, 42470 insns/op, 0 errors) > 97511.29 tps ( 63.1 allocs/op, 14.1 tasks/op, 42470 insns/op, 0 errors) > 97227.32 tps ( 63.1 allocs/op, 14.1 tasks/op, 42471 insns/op, 0 errors) > 94031.94 tps ( 63.1 allocs/op, 14.1 tasks/op, 42441 insns/op, 0 errors) > 96978.04 tps ( 63.1 allocs/op, 14.1 tasks/op, 42462 insns/op, 0 errors) > 96401.70 tps ( 63.1 allocs/op, 14.1 tasks/op, 42473 insns/op, 0 errors) > 96573.77 tps ( 63.1 allocs/op, 14.1 tasks/op, 42440 insns/op, 0 errors) > 96340.54 tps ( 63.1 allocs/op, 14.1 tasks/op, 42468 insns/op, 0 errors) > > median 96978.04 tps ( 63.1 allocs/op, 14.1 tasks/op, 42462 insns/op, 0 errors) > median absolute deviation: 571.20 > maximum: 97549.23 > minimum: 94031.94 > After: > 99794.67 tps ( 63.1 allocs/op, 14.1 tasks/op, 42471 insns/op, 0 errors) > 101244.99 tps ( 63.1 allocs/op, 14.1 tasks/op, 42472 insns/op, 0 errors) > 101128.37 tps ( 63.1 allocs/op, 14.1 tasks/op, 42485 insns/op, 0 errors) > 101065.27 tps ( 63.1 allocs/op, 14.1 tasks/op, 42465 insns/op, 0 errors) > 101212.98 tps ( 63.1 allocs/op, 14.1 tasks/op, 42456 insns/op, 0 errors) > 101413.31 tps ( 63.1 allocs/op, 14.1 tasks/op, 42463 insns/op, 0 errors) > 101464.92 tps ( 63.1 allocs/op, 14.1 tasks/op, 42466 insns/op, 0 errors) > 101086.74 tps ( 63.1 allocs/op, 14.1 tasks/op, 42488 insns/op, 0 errors) > 101559.09 tps ( 63.1 allocs/op, 14.1 tasks/op, 42468 insns/op, 0 errors) > 100742.58 tps ( 63.1 allocs/op, 14.1 tasks/op, 42491 insns/op, 0 errors) > > median 101212.98 tps ( 63.1 allocs/op, 14.1 tasks/op, 42456 insns/op, 0 errors) > median absolute deviation: 200.33 > maximum: 101559.09 > minimum: 99794.67 > Fixes #16594 Closes scylladb/scylladb#18026 * github.com:scylladb/scylladb: Implement fast streaming for intra-node migration test: tablets_test: Test sharding during intra-node migration test: tablets_test: Check sharding also on the pending host test: py: tablets: Test writes concurrent with migration test: py: tablets: Test crash during intra-node migration api, storage_service: Introduce API to wait for topology to quiesce dht, replica: Remove deprecated sharder APIs test: Avoid using deprecated sharded API db: do_apply_many() avoid deprecated sharded API replica: mutation_dump: Avoid deprecated sharder API repair: Avoid deprecated sharder API table: Remove optimization which returns empty reader when key is not owned by the shard dht: is_single_shard: Avoid deprecated sharder API dht: split_range_to_single_shard: Work with static_sharder only dht: ring_position_range_sharder: Avoid deprecated sharder APIs dht: token: Avoid use of deprecated sharder API by switching to static_sharder selective_token_sharder: Avoid use of deprecated sharder API docs: Document tablet sharding vs tablet replica placement readers/multishard.cc: use shard_for_reads() instead of shard_of() multishard_mutation_query.cc: use shard_for_reads() instead of shard_of() storage_proxy: Extract common code to apply mutations on many shards according to sharder storage_proxy: Prepare per-partition rate-limiting for intra-node migration storage_proxy: Avoid shard_of() use in mutate_counter_on_leader_and_replicate() storage_proxy: Prepare mutate_hint() for intra-node tablet migration commitlog_replayer: Avoid deprecated sharder::shard_of() lwt: Avoid deprecated sharder::shard_of() compaction: Avoid deprecated sharder::shard_of() dht: Extract dht::static_sharder replica: Deprecate table::shard_of() locator: Deprecate effective_replication_map::shard_of() dht: Deprecate old sharder API: shard_of/next_shard/token_for_next_shard tests: tablets: py: Add intra-node migration test tests: tablets: Test that drained nodes are not balanced internally tests: tablets: Add checks of replica set validity to test_load_balancing_with_random_load tests: tablets: Verify that disabling balancing results in no intra-node migrations tests: tablets: Check that nodes are internally balanced tests: tablets: Improve debuggability by showing which rows are missing tablets, storage_service: Support intra-node migration in move_tablet() API tablet_allocator: Generate intra-node migration plan tablet_allocator: Extract make_internode_plan() tablet_allocator: Maintain candidate list and shard tablet count for target nodes tablet_allocator: Lift apply_load/can_accept_load lambdas to member functions tablets, streaming: Implement tablet streaming for intra-node migration dht, auto_refreshing_sharder: Allow overriding write selector multishard_writer: Handle intra-node migration storage_proxy: Handle intra-node tablet migration for writes tablets: Get rid of tablet_map::get_shard() tablets: Avoid tablet_map::get_shard in cleanup tablets: test: Use sharder instead of tablet_map::get_shard() tablets: tablet_sharder: Allow working with non-local host sharding: Prepare for intra-node-migration docs: Document sharder use for tablets tablets: Introduce tablet transition kind for intra-node migration tests: tablets: Fix use-after-move of skiplist in rebalance_tablets() sstables, gdb: Track readers in a linked list raft topology: Fix global token metadata barrier to not fence ahead of what is drained	2024-05-20 16:13:01 +03:00
Kefu Chai	40ce52c3cc	test: use generic boost_test_print_type() in this change, we trade the `boost_test_print_type()` overloads for the generic template of `boost_test_print_type()`, except for those in the very small tests, which presumably want to keep themselves relative self-contained. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18727	2024-05-20 12:56:20 +03:00

1 2 3 4 5 ...

3228 Commits