scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 13:06:57 +00:00

Author	SHA1	Message	Date
Michał Chojnowski	6a982ee0dc	service: make Raft group 0 aware of system.dicts Adds glue which causes the contents of system.dicts to be sent in group 0 snapshots, and causes a callback to be called when system.dicts is updated locally. The callback is currently empty and will be hooked up to the RPC compressor tracker in one of the next commits.	2024-12-23 23:37:02 +01:00
Michał Chojnowski	cc15ca329e	db/system_keyspace: add system.dicts Adds a new system table which will act as the medium for distributing compression dictionaries over the cluster. This table will be managed by Raft (group 0). It will be hooked up to it in follow-up commits.	2024-12-23 23:37:02 +01:00
Michał Chojnowski	0fd1050784	utils: add advanced_rpc_compressor Adds glue needed to pass lz4 and zstd with streaming and/or dictionaries as the network traffic compressors for Seastar's RPC servers. The main jobs of this glue are: 1. Implementing the API expected by Seastar from RPC compressors. 2. Expose metrics about the effectiveness of the compression. 3. Allow dynamically switching algorithms and dictionaries on a running connection, without any extra waits. The biggest design decision here is that the choice of algorithm and dictionary is negotiated by both sides of the connection, not dictated unilaterally by the sender. The negotiation algorithm is fairly complicated (a TLA+ model validating it is included in the commit). Unilateral compression choice would be much simpler. However, negotiation avoids re-sending the same dictionary over every connection in the cluster after dictionary updates (with one-way communication, it's the only reliable way to ensure that our receiver possesses the dictionary we are about to start using), lets receivers ask for a cheaper compression mode if they want, and lets them refuse to update a dictionary if they don't think they have enough free memory for that. In hindsight, those properties probably weren't worth the extra complexity and extra development effort. Zstd can be quite expensive, so this patch also includes a mechanism which temporarily downgrades the compressor from zstd to lz4 if zstd has been using too much CPU in a given slice of time. But it should be noted that this can't be treated as a reliable "protection" from negative performance effects of zstd, since a downgrade can happen on the sender side, and receivers are at the mercy of senders.	2024-12-23 23:37:02 +01:00
Michał Chojnowski	5294762ac7	utils: add dict_trainer	2024-12-23 23:37:02 +01:00
Michał Chojnowski	9de52b1c98	utils: introduce reservoir_sampling We are planning to improve some usages of compression in Scylla (in which we compress small blocks of data) by pre-training compression dictionaries on similar data seen so far. For example, many RPC messages have similar structure (and likely similar data), so the similarity could be exploited for better compression. This can be achieved e.g. by training a dictionary on the RPC traffic, and compressing subsequent RPC messages against that dictionary. To work well, the training should be fed a representative sample of the compressible data. Such a sample can be approached by taking a random subset (of some given reasonable size) of the data, with uniform probability. For our purposes, we need an online algorithm for this -- one which can select the random k-subset from a stream of arbitrary size (e.g. all RPC traffic over an hour), while requiring only the necessary minimum of memory. This is a known problem, called "reservoir sampling". This PR introduces `reservoir_sampler`, which implements an optimal algorithm for reservoir sampling. Additionally, it introduces `page_sampler` -- a wrapper for `reservoir_sampler`, which uses it to select a random sample of pages from a stream of bytes.	2024-12-23 23:37:02 +01:00
Michał Chojnowski	d301c29af5	utils: introduce alien_worker Introduces a util which launches a new OS thread and accepts callables for concurrent execution. Meant to be created once at startup and used until shutdown, for running nonpreemptible, 3rd party, non-interactive code. Note: this new utility is almost identical to wasm::alien_thread_runner. Maybe we should unify them.	2024-12-23 23:37:02 +01:00
Michał Chojnowski	866326efe4	utils: add stream_compressor Adds utilities for "advanced" methods of compression with lz4 and zstd -- with streaming (a history buffer persisted across messages) and/or precomputed dictionaries. This patch is mostly just glue needed to use the underlying libraries with discontiguous input and output buffers, and for reusing the same compressor context objects across messages. It doesn't contain any innovations of its own. There is one "design decision" in the patch. The block format of LZ4 doesn't contain the length of the compressed blocks. At decompression time, that length must be delivered to the decompressor by a channel separate to the compressed block itself. In `lz4_cstream`, we deal with that by prepending a variable-length integer containing the compressed size to each compressed block. This is suboptimal for single-fragment messages, since the user of lz4_cstream is likely going to remember the length of the whole message anyway, which makes the length prepended to the block redundant. But a loss of 1 byte is probably acceptable for most uses.	2024-12-23 23:28:12 +01:00
Kefu Chai	cd2a2bd021	repair: correct misspelling of "corespondent" replace "corespondent" with "corresponding" in a logging message. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22003	2024-12-23 11:29:58 +02:00
Takuya ASADA	03461d6a54	test: compile unit tests into a single executable To reduce test executable size and speed up compilation time, compile unit tests into a single executable. Here is a file size comparison of the unit test executable: - Before applying the patch $ du -h --exclude='.o' --exclude='.o.d' build/release/test/boost/ build/debug/test/boost/ 11G build/release/test/boost/ 29G build/debug/test/boost/ - After applying the patch du -h --exclude='.o' --exclude='.o.d' build/release/test/boost/ build/debug/test/boost/ 5.5G build/release/test/boost/ 19G build/debug/test/boost/ It reduces executable sizes 5.5GB on release, and 10GB on debug. Closes #9155 Closes scylladb/scylladb#21443	2024-12-22 19:14:09 +02:00
Piotr Smaron	200f0bb219	alternator: use get_datacenters() in get_network_topology_options() Currently, `get_network_topology_options()` is using gossip data and iterates over topology using IPs and not host IDs, which may result in operating on inconsistent data. This method's implementation has been changed to instead use `get_datacenters()`, which should always return consistent data. Fixes: scylladb/scylladb#21490 Closes scylladb/scylladb#21940	2024-12-22 18:57:10 +02:00
Avi Kivity	f8ce49ebe9	cql3: implement NOT IN Where the grammar supports IN, we add NOT IN. This includes the WHERE clause and LWT IF clause. Evaluation of NOT IN follows from IN. In statement_restrictions analysis, they are different, as NOT IN doesn't enable any clever query plan and must filter. Some tests are added. An error message was changed ('in' changed to 'IN'), so some tests are adjusted. Closes scylladb/scylladb#21992	2024-12-22 15:15:23 +02:00
Kefu Chai	10c79a4d47	test/pylib: do not check for self.cmd when tearing down ScyllaServer we already check `self.cmd` for null at the very beginning of the `ScyllaServer.stop()`, and in the `try` block, we don't reset `self.cmd`, hence there is no need to check it again. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21936	2024-12-20 16:21:40 +02:00
Avi Kivity	eb62593f2c	treewide: use angle brackets when including seastar headers We treat Seastar as a "system" library, and those are included with angle brackets. Closes scylladb/scylladb#21959	2024-12-20 16:16:28 +02:00
Kefu Chai	f1a0613a39	mutation: remove unused function `prefixed()` is a static function in `mutation_partition_v2.cc`. and this function is not used in this translation unit. so let's remove it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22006	2024-12-20 16:12:10 +02:00
Yaniv Michael Kaul	dbe4ac7465	LICENSE-ScyllaDB-Source-Available.md: fix markdown Codespell complained. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#21980	2024-12-20 16:11:39 +02:00
Aleksandra Martyniuk	1c29726477	replica: do not set tablet_task_info if it isn't valid Currently, in tablet_map_to_mutation, repair's and migration's tablet_task_info is always set. Do not set the tablet_task_info if there is no running operation. Closes scylladb/scylladb#22005	2024-12-20 16:10:53 +02:00
Kefu Chai	2a9f34bb85	test/pytest.ini: put `repair` marker declaration back During the consolidation of per-suite pytest.ini files (commit `8bf62a086f`), the 'repair' marker was inadvertently dropped. This led to pytest warnings for tests using the @pytest.mark.repair decorator. This patch restores the marker declaration to eliminate the distracting PytestUnknownMarkWarning: ``` test/topology_experimental_raft/test_tablets.py:396 /home/kefu/dev/scylladb/test/topology_experimental_raft/test_tablets.py:396: PytestUnknownMarkWarning: Unknown pytest.mark.repair - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html @pytest.mark.repair ``` Restoring the marker allows tests to use the 'repair' mark without generating warnings. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21931	2024-12-20 14:04:50 +02:00
Botond Dénes	42d24b2a8a	Merge 'Retire topology::sort_by_proximity and compare_endpoints flavors using gms::inet_address' from Benny Halevy This series converts the call site using compare_endpoints with gms::inet_address. With that both flavors of compare_endpoints and sort_by_proximity for inet_address can be retired as no other uses remain. Also, add a unit test for topology::sort_by_proximity before further changes to it are considered. * Code cleanup, no backport is needed Closes scylladb/scylladb#21976 * github.com:scylladb/scylladb: test: network_topology_strategy_test: add test_topology_sort_by_proximity locator/topology: retire sort_by_proximity/compare_endpoints for inet_address test: test_topology_compare_endpoints: use host_id:s	2024-12-20 13:34:55 +02:00
Yaron Kaikov	74c5aabd23	build_docker: add option for building container based on Ubuntu Pro Today our container is based on ubuntu:22.04, we need to build another container based on Ubuntu Pro for FIPS support (currently the latest one is 20.04) The default docker build process doesn't change, if FIPS is required I have added `--type pro` to build a supported container. To enable FIPS there is a need to attach an Ubuntu Pro subscription (it will be done as part of https://github.com/scylladb/scylla-pkg/issues/4186) Closes scylladb/scylladb#21974	2024-12-20 13:09:24 +02:00
Asias He	0141906c4a	repair: Enable small table optimization for RBNO rebuild Similar to `9ace191616` (repair: Enable small table optimization for RBNO bootstrap and decommission), this patch enables small table optimization for RBNO rebuild. This is useful for rebuild ops which is used for building an empty DC. Fixes: #21951 Closes scylladb/scylladb#21952	2024-12-20 13:03:34 +02:00
Kefu Chai	24283d9dd0	test/topology: rename manager_internal to manager_client instead of reusing the variable name and overriding the parameter, use a new name for the return value of `manager_internal()` for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21932	2024-12-20 13:01:45 +02:00
Kefu Chai	6914892a1b	repair: do not include unused headers these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21837	2024-12-20 08:55:56 +02:00
Botond Dénes	d4129ddaa6	Merge 'sstables_manager: do not reclaim unlinked sstables' from Lakshmi Narayanan Sreethar When an sstable is unlinked, it remains in the _active list of the sstable manager. Its memory might be reclaimed and later reloaded, causing issues since the sstable is already unlinked. This patch updates the on_unlink method to reclaim memory from the sstable upon unlinking, remove it from memory tracking, and thereby prevent the issues described above. Added a testcase to verify the fix. Fixes #21887 This is a bug fix in the bloom filter reload/reclaim mechanism and should be backported to older versions. Closes scylladb/scylladb#21895 * github.com:scylladb/scylladb: sstables_manager: reclaim memory from sstables on unlink sstables_manager: introduce reclaim_memory_and_stop_tracking_sstable() sstables: introduce disable_component_memory_reload() sstables_manager: log sstable name when reclaiming components	2024-12-19 15:18:16 +02:00
Kefu Chai	16397d8cba	message: do not include unused header In commit `bfee93c7`, repair verbs were moved to IDL. During this refactoring, the `gc_clock.hh` header became unused as its references were relocated. `clang-include-cleaner` helped identify this unnecessary include, which is now removed to clean up the codebase. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21919	2024-12-19 15:16:34 +02:00
Michał Chojnowski	f6ebd445e4	test_tablets.py: limit concurrency in test_tablet_storage_freeing Apparently the python driver can't deal with the current concurrency sometimes. Lower it from 1000 to 100. Fixes scylladb/scylladb#20489 Closes scylladb/scylladb#20494	2024-12-19 15:14:41 +02:00
Kefu Chai	df36985fc3	raft: do not include unused headers these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21838	2024-12-19 14:57:22 +02:00
Kefu Chai	93be8f3a0c	db,sstables: migate boost::range::stable_partition to std library now that we are allowed to use C++23, we have the luxury of using `std::ranges::stable_partition`. in this change, we: - replace `boost::range::stable_partition()` with `std::ranges::stable_partition()` - since `std::ranges::stable_partition()` returns a subrange instead of an iterator, change the names of variables which were previously used for holding the return value of `boost::range::stable_partition()` accordingly for better readability. - remove unused `#include` of boost headers Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21911	2024-12-19 14:56:07 +02:00
Avi Kivity	a4440392d7	build: update dependencies for features to be ported from enterprise ldap/slapd/toxiproxy/cyrus-sasl - for ldap authentication and authorization git-lfs/bolt - for profile-guided optimization lz4-static - for dictionary based network compression jwt - for Oauth/GCP connectivity (for key management) openkmip - for kmip testing fipscheck - for FIPS validation Frozen toolchain regenerated, with optimized clang from https://devpkg.scylladb.com/clang/clang-18.1.8-Fedora-40-aarch64.tar.gz https://devpkg.scylladb.com/clang/clang-18.1.8-Fedora-40-x86_64.tar.gz	2024-12-19 14:26:31 +02:00
Wojciech Mitros	37a25d3af4	mv: avoid stalls when calculating affected clustering ranges Currently, when finishing db::view::calculate_affected_clustering_ranges we deoverlap, transform and copy all ranges prepared before. This is all done within a single continuation and can cause stalls. We fix this by adding yields after each transform and moving elements to the final vector one by one instead of copying them all at the end. After this change, the longest continuation in this code will be deoverlapping the initial ranges (and one transform). While it has a relatively high computational complexity (we sort all ranges), it should execute quickly because we're operating on views there and we don't need to copy the actual bytes. If we encounter a stall there, we'll need to implement an asynchronous `deoverlap` method. Fixes scylladb/scylladb#21843 Closes scylladb/scylladb#21846	2024-12-19 12:50:30 +01:00
Kamil Braun	91cddcc17f	Merge 'Do not reset quarantine list in non raft mode' from Gleb Natapov The series contains small fixes to the gossiper, one of which fixes #21930. I noticed the others while debugging the issue. Fixes: scylladb/scylladb#21930 Closes scylladb/scylladb#21956 * github.com:scylladb/scylladb: gossiper: do not reset _just_removed_endpoints in non raft mode gossiper: do not send echo message to yourself gossiper: do not call apply for the node's old state	2024-12-19 11:03:35 +01:00
Pavel Emelyanov	bb094cc099	Merge 'Make restore task abortable' from Calle Wilund Fixes #20717 Enables abortable interface and propagates abort_source to all s3 objects used for reading the restore data. Note: because restore is done on each shard, we have to maintain a per-shard abort source proxy for each, and do a background per-shard abort on abort call. This is synced at the end of "run()". Abort source is added as an optional parameter to s3 storage and the s3 path in distributed loader. There is no attempt to "clean up" an aborted restore. As we read on a mutation level from remote sstables, we should not cause incomplete sstables as such, even though we might end up of course with partial data restored. Closes scylladb/scylladb#21567 * github.com:scylladb/scylladb: test_backup: Add restore abort test case sstables_loader: Make restore task abortable distributed_loader: Add optional abort_source to get_sstables_from_object_store s3_storage: Add optional abort_source to params/object s3::client: Make "readable_file" abortable	2024-12-19 12:23:33 +03:00
Benny Halevy	67b7015ced	test: network_topology_strategy_test: add test_topology_sort_by_proximity Before further changes are made to sort_by_proximity add a unit test for it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-12-19 09:45:02 +02:00
Benny Halevy	1c5b0eca41	locator/topology: retire sort_by_proximity/compare_endpoints for inet_address Those are not used anymore now that the last call site for compare_endpoints by inet_address is converted to use host_id. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-12-19 09:44:41 +02:00
Benny Halevy	dcdc60fffd	test: test_topology_compare_endpoints: use host_id:s This is the last call site requiring the compare_endpoints flavour for inet_address. Once this test is converted to use host_id:s instead, compare_endpoints and sort_by_proximity can be simplified to support only host_id:s. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-12-19 09:44:26 +02:00
Kefu Chai	2a31a82ae2	.github: Ensure header generation before include analysis When running clang-include-cleaner, the tool performs static analysis by "compiling" specified source files. Previously, non-existent included headers caused the tool to skip source files, reducing the effectiveness of unused include detection. Problem: - Header files like 'rust/wasmtime_bindings.hh' were not pre-generated - Compilation errors led to skipping source file analysis ``` /__w/scylladb/scylladb/lang/wasm.hh:15:10: fatal error: 'rust/wasmtime_bindings.hh' file not found 15 \| #include "rust/wasmtime_bindings.hh" \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~ Skipping file /__w/scylladb/scylladb/lang/wasm.hh due to compiler errors. clang-include-cleaner expects to work on compilable source code. 1 error generated. ``` - This significantly reduced clang-include-cleaner's coverage Solution: - Build the `wasmtime_bindings` target to generate required header files - Ensure all necessary headers are created before running static analysis - Enable full source file checking for unused includes By generating headers before analysis, we prevent skipping of source files and improve the comprehensiveness of our include cleaner workflow. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21739	2024-12-19 09:41:46 +02:00
Ferenc Szili	dc375b8cd3	test: enable test_truncate_with_coordinator_crash This test was added in PR #19789 but was disabled with xfail because of a bug in the way truncate saved the commit log replay positions. More specifically, the replay positions for shards that had no mutations were saved to system.truncated with shard_id == 0, regardless of which shard they were actually saved for (see #21719). The bug was fixed in #21722, so this change removes the xfail tag from the test. Closes scylladb/scylladb#21902	2024-12-18 18:02:52 +01:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Botond Dénes	1a717f3014	service/storage_proxy: data_resolver::resolve(): apply mutations gently The data resolver has to apply all mutations from all replicas to a single mutation. In the extreme case, when all rows are dead, the mutations can have around 10K rows in them. This is not a huge amount, but it is enough to cause moderate stalls of <20ms. To avoid this, use the gentle variant of apply(), which can yield in the middle. Fixes: scylladb/scylladb#21818 Closes scylladb/scylladb#21884	2024-12-18 15:21:19 +01:00
Kefu Chai	e65fc35b5e	replica: do not include unused headers these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21836	2024-12-18 13:52:57 +02:00
Avi Kivity	5a849b0a6a	Merge "Move more subsystems to use host ids instead of ips" from Gleb " This series converts repair, streaming and node_ops (and some parts of alternator) to work on host ids instead of ips. This allows to remove a lot of (but not all) functions that work on ips from effective replication map. CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/13830/ Refs: scylladb/scylladb#21777 " * 'gleb/move-to-host-id-more' of github.com:scylladb/scylla-dev: locator: topology: remove no longer use get_all_ips() gossiper: change get_unreachable_nodes to host ids locator: drop no longer used ip based functions from effective replication map and friends test: move network_topology_strategy_test and token_metadata_test to use host id based APIs replica/database: drop usage of ip in favor of host id in get_keyspace_local_ranges replica/mutation_dump: use host ids instead of ips alternator: move ttl to work with host ids instead of ips storage_service: move node_ops code to use host ids instead of host ips streaming: move streaming code to use host ids instead of host ips repair: move repair code to use host ids instead of host ips gossiper: add get_unreachable_host_ids() function locator: topology: add more function that return host ids to effective replication map locator: add more function that return host ids to effective replication map	2024-12-18 13:48:22 +02:00
Piotr Dulikowski	d067d8caef	Merge 'More Python tests for materialized view and Alternator GSI feature' from Nadav Har'El This patch includes more tests (in Python) that I wrote while implementing the Alternator UpdateTable feature for adding a GSI to an existing table (https://github.com/scylladb/scylladb/issues/11567). I explain each of these tests in the separate patches below, but basically they fall into two types: 1. Tests which pass with today's materialized views and Alternator GSI/LSI, and serve to ensure that whatever changes I make to the view update implementation don't break corner cases that already worked. 2. Tests for the UpdateTable feature in Alternator which doesn't work today so xfail - and will need to work for #11567. We already had a few tests for this, but here I add more and improve coverage of various corner cases I discovered while implementing the feature. I already have a working prototype for #11567 which passes all these tests. Many of these tests helped expose various bugs in earlier versions of my code. Closes scylladb/scylladb#21927 * github.com:scylladb/scylladb: test/cqlpy: a few more functional tests for materialized views test/alternator: more tests for UpdateTable create and delete GSI test/alternator: make UpdateTable tests wait less test/alternator: move UpdateTable tests to a separate file test/alternator: add another test for elaborate GSI updates test/alternator: test that DescribeTable returns IndexStatus for GSI test/alternator: fix wrong test for UpdateTable metrics test/alternator: add test for missing attribute in item in LSI test/alternator: test that DescribeTable doesn't return IndexStatus for LSI test/alternator: add tests for RBAC for create and delete GSI	2024-12-17 20:43:07 +01:00
Yaron Kaikov	3a00ffd2eb	build_docker.sh: remove rsyslog installation and conf It seems that no one is using rsyslog, so there is no point having it inside our container (see https://github.com/scylladb/scylladb/issues/21923#issuecomment-2545191667) Refs: https://github.com/scylladb/scylladb/issues/21923 Closes scylladb/scylladb#21953	2024-12-17 17:34:35 +02:00
Gleb Natapov	e318dfb83a	gossiper: do not reset _just_removed_endpoints in non raft mode By the time the function is called during start it may already be populated. Fixes: scylladb/scylladb#21930	2024-12-17 16:57:13 +02:00
Gleb Natapov	3368019982	gossiper: do not send echo message to yourself When sending by ID we should check that we do not translate our old address to our ID and send locally. mark_alive should not be called with the node's old IP anyway.	2024-12-17 16:57:13 +02:00
Gleb Natapov	e80355d3a1	gossiper: do not call apply for the node's old state If a node changed its address, an old state may still be in the gossiper, so ignore it.	2024-12-17 16:57:13 +02:00
Avi Kivity	01cdba9a98	Merge 'cache_algorithm_test: fix flaky failures' from Michał Chojnowski This series attempts to get rid of flakiness in `cache_algorithm_test` by solving two problems. Problem 1: The test needs to create some arbitrary partition keys of a given size. It intends to create keys of the form: 0x0000000000000000000000000000000000000000... 0x0100000000000000000000000000000000000000... 0x0200000000000000000000000000000000000000... But instead, unintentionally, it creates partially initialized keys of the form: 0x0000000000000000garbagegarbagegarbagegar... 0x0100000000000000garbagegarbagegarbagegar... 0x0200000000000000garbagegarbagegarbagegar... Each of these keys is created several times and -- for the test to pass -- the result must be the same each time. By coincidence, this is usually the case, since the same allocator slots are used. But if some background task happens to overwrite the allocator slot during a preemption, the keys used during "SELECT" will be different than the keys used during "INSERT", and the test will fail due to extra cache misses. Problem 2: Cache stats are global, so there's no good way to reliably verify that e.g. a given read causes 0 cache misses, because something done by Scylla in the background can trigger a cache miss. This can cause the test to fail spuriously. With how the test framework and the cache are designed, there's probably no good way to test this properly. It would require ensuring that cache stats are per-read, or at least per-table, and that Scylla's background activity doesn't cause enough memory pressure to evict the tested rows. This patch tries to deal with the flakiness without deleting the test altogether by letting it retry after a failure if it notices that it can be explained by a read which wasn't done by the test. (Though, if the test can't be written well, maybe it just shouldn't be written...) Fixes #21536 Should be backported to prevent flaky failures in older branches. 
Closes scylladb/scylladb#21948 * github.com:scylladb/scylladb: cache_algorithm_test: harden against stats being confused by background activity cache_algorithm_test: fix a use of an uninitialized variable	2024-12-17 14:46:43 +02:00
Lakshmi Narayanan Sreethar	4fe4367242	sstables_manager: reclaim memory from sstables on unlink When an sstable is unlinked, it remains in the _active list of the sstable manager. Its memory might be reclaimed and later reloaded, causing issues since the sstable is already unlinked. This patch updates the on_unlink method to reclaim memory from the sstable upon unlinking, remove it from memory tracking, and thereby prevent the issues described above. Added a testcase to verify the fix. Fixes #21887 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-12-17 18:14:43 +05:30
Lakshmi Narayanan Sreethar	5dffc19f2d	sstables_manager: introduce reclaim_memory_and_stop_tracking_sstable() When an sstable is unlinked or deactivated, it should be removed from the component memory tracking metrics and any further reload/reclaim should be disabled. This patch adds a new method that implements the above mentioned functionality. This patch also updates the deactivate() to use the new method. Next patch will use it to disable tracking when an sstable is unlinked. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-12-17 18:14:43 +05:30
Lakshmi Narayanan Sreethar	b7b4c5c661	sstables: introduce disable_component_memory_reload() Added a new method to disable reload of previously reclaimed components from the sstable. This will be used to disable reload of bloom filters after an sstable has been unlinked or deactivated. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-12-17 18:14:43 +05:30
Lakshmi Narayanan Sreethar	6ad962cb38	sstables_manager: log sstable name when reclaiming components Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-12-17 18:14:36 +05:30
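
The `utils: introduce reservoir_sampling` commit (9de52b1c98) above describes selecting a uniform random k-subset from a stream of unknown length using only O(k) memory. As a rough illustration of that idea, here is a minimal sketch of the classic, simpler "Algorithm R". It is not ScyllaDB's `reservoir_sampler` (which the commit describes as an optimal algorithm), and the class name `simple_reservoir` and its methods are hypothetical names chosen for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical illustration of reservoir sampling (Algorithm R).
// NOT the reservoir_sampler from the ScyllaDB commit; names are made up.
template <typename T>
class simple_reservoir {
    std::size_t _capacity;      // k: the desired sample size
    std::uint64_t _seen = 0;    // number of items offered so far
    std::vector<T> _items;      // the current sample (at most k items)
public:
    explicit simple_reservoir(std::size_t capacity) : _capacity(capacity) {
        _items.reserve(capacity);
    }

    // Offer the next item of the stream to the sampler.
    template <typename Rng>
    void offer(T item, Rng& rng) {
        ++_seen;
        if (_items.size() < _capacity) {
            // The first k items always enter the reservoir.
            _items.push_back(std::move(item));
            return;
        }
        // Item number n (1-based) replaces a random slot with probability k/n:
        // draw j uniformly from [0, n-1] and replace slot j only if j < k.
        std::uniform_int_distribution<std::uint64_t> dist(0, _seen - 1);
        const std::uint64_t j = dist(rng);
        if (j < _capacity) {
            _items[j] = std::move(item);
        }
    }

    const std::vector<T>& sample() const { return _items; }
    std::uint64_t seen() const { return _seen; }
};
```

A caller would construct `simple_reservoir<int> r(5)`, feed it every element of the stream through `r.offer(value, rng)` with any standard random engine such as `std::mt19937_64`, and read the sample from `r.sample()` at any point. Algorithm R draws one random number per stream element; faster variants skip ahead over elements that will not be selected, which is presumably the kind of optimization the commit refers to.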


45920 Commits