scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 11:00:35 +00:00

Author	SHA1	Message	Date
Benny Halevy	9413afce41	chunked_vector_test: exception_safe_class: count also moved objects We have to account for moved objects as well as copied objects so they will be balanced with the respective `del_live_object` calls called by the destructor. However, since chunked_vector requires the value_type to be nothrow_move_constructible, just count the additional live object, but do not modify _countdown or, respectively, throw an exception, as this should be considered only for the default and copy constructors. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-21 11:05:38 +03:00
Benny Halevy	8e20379305	utils: chunked_vector: fill ctor: make exception safe Currently, if the fill ctor throws an exception, the destructor won't be called, as the object is not fully constructed yet. Call the default ctor first (which doesn't throw) to make sure the destructor will be called on exception. Fixes scylladb/scylladb#18635 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-21 11:05:38 +03:00
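The following is a minimal sketch of the idea behind this fix and the preceding test change. It uses a hypothetical `toy_vector` and `counted` type, not the real `utils::chunked_vector` or `exception_safe_class`. The fill constructor delegates to a non-throwing default constructor, so once that delegation returns the object counts as constructed, and its destructor runs if a later element copy throws. The element type counts every live object, including moved ones, but only its default and copy constructors can inject a failure.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>
#include <type_traits>
#include <utility>

// Hypothetical test element: every constructor registers a live object,
// but only the default and copy constructors may inject a failure.
struct counted {
    static inline int live = 0;        // objects currently alive
    static inline int countdown = -1;  // throw when this reaches 0; -1 means never

    static void maybe_throw() {
        if (countdown >= 0 && countdown-- == 0) {
            throw std::runtime_error("injected failure");
        }
    }
    counted() { maybe_throw(); ++live; }
    counted(const counted&) { maybe_throw(); ++live; }
    counted(counted&&) noexcept { ++live; }  // counted, but never throws
    ~counted() { --live; }
};
static_assert(std::is_nothrow_move_constructible_v<counted>);

// Hypothetical container mirroring the constructor structure of the fix.
template <typename T>
class toy_vector {
    T* _data = nullptr;
    std::size_t _size = 0;
public:
    toy_vector() noexcept = default;

    // Delegating to the default constructor means the object is considered
    // constructed once it returns, so ~toy_vector() runs if a copy below throws.
    toy_vector(std::size_t n, const T& value) : toy_vector() {
        _data = static_cast<T*>(::operator new(n * sizeof(T)));
        for (; _size < n; ++_size) {
            new (_data + _size) T(value);  // may throw; _size counts only built elements
        }
    }
    ~toy_vector() {
        for (std::size_t i = 0; i < _size; ++i) {
            _data[i].~T();
        }
        ::operator delete(_data);
    }
    toy_vector(const toy_vector&) = delete;
    toy_vector& operator=(const toy_vector&) = delete;

    std::size_t size() const noexcept { return _size; }
};
```

Without the delegation, a throw from the third copy would leak the two copies that were already built. With it, the destructor destroys exactly `_size` elements.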
Botond Dénes	4e9ed69a75	Merge '[Backport 5.4] mutation_fragment_stream_validating_filter: respect validating_level::none' from ScyllaDB Even when configured to not do any validation at all, the validator still did some. This small series fixes this, and adds a test to check that validation levels in general are respected, and the validator doesn't validate more than it is asked to. Fixes: #18662 (cherry picked from commit `f6511ca1b0`) (cherry picked from commit `e7b07692b6`) (cherry picked from commit `78afb3644c`) Refs #18667 Closes scylladb/scylladb#18724 * github.com:scylladb/scylladb: test/boost/mutation_fragment_test.cc: add test for validator validation levels mutation: mutation_fragment_stream_validating_filter: fix validation_level::none mutation: mutation_fragment_stream_validating_filter: add raises_error ctor parameter	2024-05-20 09:02:52 +03:00
Botond Dénes	7552c4b187	test/boost/mutation_fragment_test.cc: add test for validator validation levels To make sure that the validator doesn't validate what the validation level doesn't include. (cherry picked from commit `78afb3644c`)	2024-05-17 07:55:05 +00:00
Botond Dénes	87dcd29ec3	mutation: mutation_fragment_stream_validating_filter: fix validation_level::none Despite its name, this validation level still did some validation. Fix this, by short-circuiting the catch-all operator(), preventing any validation when the user asked for none. (cherry picked from commit `e7b07692b6`)	2024-05-17 07:55:04 +00:00
Botond Dénes	9e7cd767dd	mutation: mutation_fragment_stream_validating_filter: add raises_error ctor parameter When set to false, no exceptions will be raised from the validator on validation error. Instead, it will just return false from the respective validator methods. This makes testing simpler, as asserting exceptions is clunky. When true (default), the previous behaviour will remain: any validation error will invoke on_internal_error(), resulting in either std::abort() or an exception. (cherry picked from commit `f6511ca1b0`)	2024-05-17 07:55:04 +00:00
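The two validator changes above can be sketched together in a heavily simplified form. This is not the real `mutation_fragment_stream_validating_filter`. The hypothetical `toy_level::none` returns before any check is made, and a `raises_error` flag chooses between throwing (standing in for `on_internal_error()`) and returning `false`. The only check it performs is that integer keys increase.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical, heavily reduced validation levels.
enum class toy_level { none, key_order };

class toy_validator {
    toy_level _level;
    bool _raises_error;
    long _last_key = -1;

    bool fail(const std::string& msg) {
        if (_raises_error) {
            throw std::runtime_error(msg);  // stands in for on_internal_error()
        }
        return false;  // test-friendly mode: report the error via the return value
    }
public:
    explicit toy_validator(toy_level level, bool raises_error = true)
        : _level(level), _raises_error(raises_error) {}

    // Returns true if the key is accepted.
    bool operator()(long key) {
        if (_level == toy_level::none) {
            return true;  // short-circuit: no validation at all, as the level promises
        }
        if (key <= _last_key) {
            return fail("out-of-order key " + std::to_string(key));
        }
        _last_key = key;
        return true;
    }
};
```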
Botond Dénes	63d1c763fc	Merge '[Backport 5.4] tools/scylla-sstable: add `scylla sstable shard-of` command' from Kefu Chai when migrating to the uuid-based identifiers, the mapping from the integer-based generation to the shard-id is preserved. we used to have "gen % smp_count" for calculating the shard which is responsible to host a given sstable. despite that this is not a documented behavior, this is handy when we try to correlate an sstable to a shard, typically when looking at a performance issue. in this change, a new subcommand is added to expose the connection between the sstable and its "owner" shards. Fixes #16343 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes https://github.com/scylladb/scylladb/pull/16345 (cherry picked from commit `273ee36bee`) Fixes #18381 - [x] need to backport, because we have needs in production to figure out the mapping from an sstable identifier to the shard which "owns" it. Closes scylladb/scylladb#18681 * github.com:scylladb/scylladb: tools: Make sstable shard-of efficient by loading minimum to compute owners test/cql-pytest/test_tools.py: test shard-of with a single partition tools/scylla-sstable: add `scylla sstable shard-of` command	2024-05-16 11:07:47 +03:00
Pavel Emelyanov	29c892ea5a	functions: Do not crash when schema is missing Getting the token() function first tries to find a schema for the underlying table and continues with nullptr if there's none. Later, when creating token_fct, the schema is passed as is and referenced. If it's null, a crash happens. It used to throw before `5983e9e7b2` (cql3: test_assignment: pass optional schema everywhere) on missing schema, but this commit changed the way schema is looked up, so nullptr is now possible. fixes: #18637 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `df8a446437`) Closes scylladb/scylladb#18698	2024-05-16 11:06:25 +03:00
Raphael S. Carvalho	9bb175852d	tools: Make sstable shard-of efficient by loading minimum to compute owners Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18440 (cherry picked from commit `d7a01598ce`)	2024-05-15 14:32:43 +08:00
Kefu Chai	daf4ffb9b4	test/cql-pytest/test_tools.py: test shard-of with a single partition test_scylla_sstable_shard_of takes lots of time preparing the keys for a certain shard. with the debug build, it takes 3 minutes to complete the test. so in order to test the "shard-of" subcommand in a more efficient way, in this change, we improve the test in two ways: 1. cache the output of `scylla types shardof`, so we can avoid the overhead of running a seastar application repeatedly for the same keys. 2. reduce the number of partitions from 42 to 1, as the number of partitions in an sstable does not matter when testing the output of "shard-of" command of a certain sstable, because the sstable is always generated by a certain shard. before this change, with pytest-profiling: ``` ncalls tottime percall cumtime percall filename:lineno(function) 4/3 0.000 0.000 181.950 60.650 runner.py:219(call_and_report) 4/3 0.000 0.000 181.948 60.649 runner.py:247(call_runtest_hook) 4/3 0.000 0.000 181.948 60.649 runner.py:318(from_call) 4/3 0.000 0.000 181.948 60.649 runner.py:262(<lambda>) 44/11 0.000 0.000 181.935 16.540 _hooks.py:427(__call__) 43/11 0.000 0.000 181.935 16.540 _manager.py:103(_hookexec) 43/11 0.000 0.000 181.935 16.540 _callers.py:30(_multicall) 361 0.001 0.000 181.531 0.503 contextlib.py:141(__exit__) 782/81 0.001 0.000 177.578 2.192 {built-in method builtins.next} 1044 0.006 0.000 92.452 0.089 base_events.py:1894(_run_once) 11 0.000 0.000 91.129 8.284 fixtures.py:686(<lambda>) 17/11 0.000 0.000 91.129 8.284 fixtures.py:1025(finish) 4 0.000 0.000 91.128 22.782 fixtures.py:913(_teardown_yield_fixture) 2/1 0.000 0.000 91.055 91.055 runner.py:111(pytest_runtest_protocol) 2/1 0.000 0.000 91.055 91.055 runner.py:119(runtestprotocol) 2 0.000 0.000 91.052 45.526 conftest.py:50(cql) 2 0.000 0.000 91.040 45.520 util.py:161(cql_session) 1 0.000 0.000 91.040 91.040 runner.py:180(pytest_runtest_teardown) 1 0.000 0.000 91.040 91.040 runner.py:509(teardown_exact) 1945 0.002 0.000 90.722 0.047 events.py:82(_run) ``` after this change: ``` ncalls tottime percall cumtime percall filename:lineno(function) 4/3 0.000 0.000 8.271 2.757 runner.py:219(call_and_report) 44/11 0.000 0.000 8.270 0.752 _hooks.py:427(__call__) 44/11 0.000 0.000 8.270 0.752 _manager.py:103(_hookexec) 44/11 0.000 0.000 8.270 0.752 _callers.py:30(_multicall) 4/3 0.000 0.000 8.269 2.756 runner.py:247(call_runtest_hook) 4/3 0.000 0.000 8.269 2.756 runner.py:318(from_call) 4/3 0.000 0.000 8.269 2.756 runner.py:262(<lambda>) 48 0.000 0.000 8.269 0.172 {method 'send' of 'generator' objects} 27 0.000 0.000 5.671 0.210 contextlib.py:141(__exit__) 11 0.000 0.000 4.297 0.391 fixtures.py:686(<lambda>) 2/1 0.000 0.000 4.228 4.228 runner.py:111(pytest_runtest_protocol) 2/1 0.000 0.000 4.228 4.228 runner.py:119(runtestprotocol) 2 0.000 0.000 4.213 2.106 capture.py:877(pytest_runtest_teardown) 1 0.000 0.000 4.213 4.213 runner.py:180(pytest_runtest_teardown) 1 0.000 0.000 4.213 4.213 runner.py:509(teardown_exact) 2 0.000 0.000 3.628 1.814 capture.py:872(pytest_runtest_call) 1 0.000 0.000 3.627 3.627 runner.py:160(pytest_runtest_call) 1 0.000 0.000 3.627 3.627 python.py:1797(runtest) 114/81 0.001 0.000 3.505 0.043 {built-in method builtins.next} 15 0.784 0.052 3.183 0.212 subprocess.py:417(check_output) ``` Fixes #16516 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16523 (cherry picked from commit `642652efab`)	2024-05-15 14:32:43 +08:00
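The caching half of this test change is plain memoization. The real test is written in Python and caches subprocess output. This C++ version is only an illustration: a hypothetical `shard_of_cache` wraps any expensive lookup so that each key is computed once.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical memoizer: the expensive computation (in the real test, running
// `scylla types shardof`) is supplied by the caller as a std::function.
class shard_of_cache {
    std::function<unsigned(const std::string&)> _compute;
    std::map<std::string, unsigned> _cache;
public:
    explicit shard_of_cache(std::function<unsigned(const std::string&)> compute)
        : _compute(std::move(compute)) {}

    unsigned get(const std::string& key) {
        if (auto it = _cache.find(key); it != _cache.end()) {
            return it->second;  // cache hit: skip the expensive call
        }
        unsigned shard = _compute(key);
        _cache.emplace(key, shard);
        return shard;
    }
};
```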
Kefu Chai	03a54a4c07	tools/scylla-sstable: add `scylla sstable shard-of` command when migrating to the uuid-based identifiers, the mapping from the integer-based generation to the shard-id is preserved. we used to have "gen % smp_count" for calculating the shard which is responsible to host a given sstable. despite that this is not a documented behavior, this is handy when we try to correlate an sstable to a shard, typically when looking at a performance issue. in this change, a new subcommand is added to expose the connection between the sstable and its "owner" shards. Fixes #16343 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16345 (cherry picked from commit `273ee36bee`)	2024-05-15 14:32:42 +08:00
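The legacy, undocumented convention quoted above ("gen % smp_count") is shown below as a one-line function. The name `legacy_owner_shard` is hypothetical. The sketch only covers integer generations and is not meant to describe how the new `shard-of` subcommand computes owners.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: with integer generations, the sstable with generation
// `generation` was conventionally hosted by shard (generation % smp_count).
constexpr unsigned legacy_owner_shard(std::uint64_t generation, unsigned smp_count) {
    return static_cast<unsigned>(generation % smp_count);
}

static_assert(legacy_owner_shard(42, 8) == 2);
```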
Lakshmi Narayanan Sreethar	4b0c60cdc3	compaction: improve partition estimates for garbage collected sstables When a compaction strategy uses garbage collected sstables to track expired tombstones, do not use complete partition estimates for them, instead, use a fraction of it based on the droppable tombstone ratio estimate. Fixes #18283 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#18465 (cherry picked from commit `d39adf6438`) Closes scylladb/scylladb#18656	2024-05-14 07:53:07 +03:00
Patryk Wrobel	28d0fc1b6b	scylla_io_setup: ensure correct RLIMIT_NOFILE for iotune The default limit of open file descriptors per process may be too small for iotune on certain machines with a large number of cores. In such cases, iotune reports failure due to an inability to create files or to set up the seastar framework. This change configures the limit of open file descriptors before running iotune to ensure that the failure does not occur. The limit is set via 'resource.setrlimit()' in the parent process. The limit is then inherited by the child process. Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> (cherry picked from commit `ec820e214c`) Closes scylladb/scylladb#18655	2024-05-14 07:48:53 +03:00
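The parent-side step can be sketched with the POSIX `getrlimit()`/`setrlimit()` calls that Python's `resource.setrlimit()` wraps. The helper name `raise_nofile_soft_limit` and the `wanted` value are assumptions for illustration. The commit does not say which limit the script actually picks. A child started afterwards through fork/exec inherits the raised limit.

```cpp
#include <cassert>
#include <sys/resource.h>

// Hypothetical helper: raise the soft RLIMIT_NOFILE towards `wanted`.
// An unprivileged process may raise its soft limit only up to the hard limit,
// so the target is clamped; the result may therefore stay below `wanted`.
// Returns false if the limits could not be read or written.
bool raise_nofile_soft_limit(rlim_t wanted) {
    rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return false;
    }
    if (lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur >= wanted) {
        return true;  // already high enough, nothing to do
    }
    rlim_t target = wanted;
    if (lim.rlim_max != RLIM_INFINITY && target > lim.rlim_max) {
        target = lim.rlim_max;
    }
    lim.rlim_cur = target;
    return setrlimit(RLIMIT_NOFILE, &lim) == 0;
}
```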
Israel Fruchter	393880f355	Update tools/cqlsh submodule to v6.0.17 Mostly a set of fixes in the area of ssl handling * tools/cqlsh 99b2b777...9d49b385 (21): > cqlshlib/sslhandling: fix logic of `ssl_check_hostname` > cqlshlib/sslhandling.py: don't use empty userkey/usercert > Dockerfile: noninteractive isn't enough for answering yet on apt-get > fix cqlsh version print > cqlshlib/sslhandling: change `check_hostname` deafult to False > Introduce new ssl configuration for disableing check_hostname > set the hostname in ssl_options.server_hostname when SSL is used > issue-73 Fixed a bug where username and password from the credentials file were ignored. > issue-73 Fixed a bug where username and password from the credentials file were ignored. > issue-73 > github actions: update `cibuildwheel==v2.16.5` > dist/debian: fix the trailer line format > `COPY TO STDOUT` shouldn't put None where a function is expected > Make cqlsh work with unix domain sockets > Bump python-driver version > dist/debian: add trailer line > dist/debian: wrap long line > Draft: explicit build-time packge dependencies > stop retruning status_code=2 on schema disagreement > Fix minor typos in the code > Dockerfile: apt-get update and apt-get upgrade to get latest OS packages Ref: #18590 Closes scylladb/scylladb#18652	2024-05-14 07:47:37 +03:00
Botond Dénes	e008060f39	Merge 'doc: fix Rust Driver release information' from Anna Stuchlik This PR removes the incorrect information that the ScyllaDB Rust Driver is not GA. In addition, it replaces "Scylla" with "ScyllaDB". Fixes https://github.com/scylladb/scylladb/issues/16178 Closes scylladb/scylladb#16199 * github.com:scylladb/scylladb: doc: remove the "preview" label from Rust driver doc: fix Rust Driver release information (cherry picked from commit `56c3515751`)	2024-05-10 12:22:03 +02:00
Aleksandra Martyniuk	7589981898	tasks: use default task_ttl in scylla.yaml Currently default task_ttl_in_seconds is 0, but scylla.yaml changes the value to 10. Change task_ttl_in_seconds in scylla.yaml to 0, so that there are consistent defaults. Comment it out. Fixes: #16714. (cherry picked from commit `67bbaad62e`) Closes scylladb/scylladb#18584	2024-05-09 16:10:14 +03:00
Kamil Braun	ed89deab40	direct_failure_detector: increase ping timeout and make it tunable The direct failure detector design is simplistic. It sends pings sequentially and times out listeners that reached the threshold (i.e. didn't hear from a given endpoint for too long) in-between pings. Given the sequential nature, the previous ping must finish so the next ping can start. We timeout pings that take too long. The timeout was hardcoded and set to 300ms. This is too low for wide-area setups -- latencies across the Earth can indeed go up to 300ms. 3 subsequent timed out pings to a given node were sufficient for the Raft listener to "mark server as down" (the listener used a threshold of 1s). Increase the ping timeout to 600ms which should be enough even for pinging the opposite side of Earth, and make it tunable. Increase the Raft listener threshold from 1s to 2s. Without the increased threshold, one timed out ping would be enough to mark the server as down. Increasing it to 2s requires 3 timed out pings which makes it more robust in the presence of transient network hiccups. In the future we'll most likely want to decrease the Raft listener threshold again, if we use Raft for data path -- so leader elections start quickly after leader failures. (Faster than 2s). To do that we'll have to improve the design of the direct failure detector. Ref: scylladb/scylladb#16410 Fixes: scylladb/scylladb#16607 --- I tested the change manually using `tc qdisc ... netem delay`, setting network delay on local setup to ~300ms with jitter. Without the change, the result is as observed in scylladb/scylladb#16410: interleaving ``` raft_group_registry - marking Raft server ... as dead for Raft groups raft_group_registry - marking Raft server ... as alive for Raft groups ``` happening once every few seconds. The "marking as dead" happens whenever we get 3 subsequent failed pings, which happens with certain (high) probability depending on the latency jitter. Then as soon as we get a successful ping, we mark the server back as alive. With the change, the phenomenon no longer appears. (cherry picked from commit `8df6d10e88`) Closes scylladb/scylladb#18559	2024-05-08 14:57:09 +02:00
Pavel Emelyanov	905b8f59bd	Update seastar submodule (iotune iodepth underflow fix) * seastar ae05c138...cfb015d0 (1): > iotune: ignore shards with id above max_iodepth Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-06 19:27:31 +03:00
Nadav Har'El	862e2affe0	cql3: Fix invalid JSON parsing for JSON object with different key types More than three years ago, in issue #7949, we noticed that trying to set a `map<ascii, int>` from JSON input (i.e., using INSERT JSON or the fromJson() function) fails - the ascii key is incorrectly parsed. We fixed that issue in commit `75109e9519` but unfortunately, did not do our due diligence: We did not write enough tests inspired by this bug, and failed to discover that actually we have the same bug for many other key types, not just for "ascii". Specifically, the following key types have exactly the same bug: * blob * date * inet * time * timestamp * timeuuid * uuid Other types, like numbers or booleans, worked "by accident" - instead of parsing them as a normal string, we asked the JSON parser to parse them again after removing the quotes, and because unquoted numbers and unquoted true/false happen to work in JSON, this didn't fail. The fix here is very simple - for all native types (i.e., not collections or tuples), the encoding of the key in JSON is simply a quoted string - and removing the quotes is all we need to do and there's no need to run the JSON parser a second time. Only for more elaborate types - collections and tuples - we need to run the JSON parser a second time on the key string to build the more elaborate object. This patch also includes tests for fromJson() reading a map with all native key types, confirming that all the aforementioned key types were broken before this patch, and all key types (including the numbers and booleans which worked even before this patch) work with this patch. Fixes #18477. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `21557cfaa6`) Closes scylladb/scylladb#18522	2024-05-05 23:53:19 +03:00
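Why numbers and booleans worked "by accident" can be illustrated with a small check. This is not ScyllaDB code, and the function names are hypothetical. Re-running a JSON parser on the already-unquoted key succeeds only when the key text is itself a valid JSON scalar: a number in JSON's grammar, or `true`/`false`. Keys such as dates or inet addresses are not, so they failed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Does `s` match the JSON number grammar: -?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?
bool is_json_number(std::string_view s) {
    std::size_t i = 0;
    auto digit = [&](std::size_t k) {
        return k < s.size() && std::isdigit(static_cast<unsigned char>(s[k]));
    };
    if (i < s.size() && s[i] == '-') ++i;
    if (!digit(i)) return false;
    if (s[i] == '0') {
        ++i;  // a leading zero may not be followed by more integer digits
    } else {
        while (digit(i)) ++i;
    }
    if (i < s.size() && s[i] == '.') {
        ++i;
        if (!digit(i)) return false;
        while (digit(i)) ++i;
    }
    if (i < s.size() && (s[i] == 'e' || s[i] == 'E')) {
        ++i;
        if (i < s.size() && (s[i] == '+' || s[i] == '-')) ++i;
        if (!digit(i)) return false;
        while (digit(i)) ++i;
    }
    return i == s.size();
}

// Would a second JSON parse of the unquoted key text succeed as a scalar?
bool reparses_as_json_scalar(std::string_view unquoted_key) {
    return unquoted_key == "true" || unquoted_key == "false" || is_json_number(unquoted_key);
}
```

Under the fix described above, native-type keys skip this second parse entirely, so the check no longer matters for them.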
Pavel Emelyanov	d68d765247	view-builder: Print correct exception in built ste exception handler Inside .handle_exception() continuation std::current_exception() doesn't work, there's std::exception ex argument to handler's lambda instead fixes #18423 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18349 (cherry picked from commit `4ac30e5337`)	2024-05-01 10:19:28 +03:00
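The pitfall can be shown in standard C++ without Seastar. Outside an active `catch` handler, `std::current_exception()` returns a null `std::exception_ptr`, so a continuation-style handler has to describe the exception it was handed as an argument. The helper name `describe` is hypothetical.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: turn an exception_ptr into a printable message.
std::string describe(std::exception_ptr ep) {
    if (!ep) {
        return "no exception";  // what you get from current_exception() outside a catch
    }
    try {
        std::rethrow_exception(ep);
    } catch (const std::exception& e) {
        return e.what();
    } catch (...) {
        return "unknown exception";
    }
}
```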
Michał Chojnowski	d8df02f490	docs: clarify that `DELETE` can be used with `USING TIMEOUT` The current text seems to suggest that `USING TIMEOUT` doesn't work with `DELETE` and `BATCH`. But that's wrong. Closes scylladb/scylladb#18424 (cherry picked from commit `c1146314a1`)	2024-05-01 10:17:05 +03:00
Anna Stuchlik	d85d37921a	doc: run repair after changing RF of system_auth This commit adds the requirement to run repair after changing the replication factor of the system_auth keyspace in the procedure of adding a new node to a cluster. Refs: https://github.com/scylladb/scylla-enterprise/issues/4129 Closes scylladb/scylladb#18466	2024-04-30 19:15:55 +03:00
Asias He	2d4825835c	streaming: Fix use after move in fire_stream_event The event is used in a loop. Found by clang-tidy: ``` streaming/stream_result_future.cc:80:49: warning: 'event' used after it was moved [bugprone-use-after-move] listener->handle_stream_event(std::move(event)); ^ streaming/stream_result_future.cc:80:39: note: move occurred here listener->handle_stream_event(std::move(event)); ^ streaming/stream_result_future.cc:80:49: note: the use happens in a later loop iteration than the move listener->handle_stream_event(std::move(event)); ^ ``` Fixes #18332 Closes scylladb/scylladb#18333 (cherry picked from commit `1ca779d287`)	2024-04-26 11:00:01 +03:00
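A minimal sketch of the bug class clang-tidy flagged here, using a hypothetical `stream_event` payload and listener type. Moving `event` inside a loop hands a moved-from object to every listener after the first. Passing the lvalue makes one copy per listener instead.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct stream_event { std::string description; };  // hypothetical payload

// Each listener takes the event by value. Writing listener(std::move(event))
// here would leave `event` moved-from (typically empty) for later iterations.
void fire_stream_event(const std::vector<std::function<void(stream_event)>>& listeners,
                       const stream_event& event) {
    for (const auto& listener : listeners) {
        listener(event);  // copy for every listener, never move inside the loop
    }
}
```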
Lakshmi Narayanan Sreethar	201d990072	sstables: reclaim_memory_from_components: do not update _recognised_components When reclaiming memory from bloom filters, do not remove them from _recognised_components, as that leads to the on-disk filter component being left behind when the SSTable is deleted. Fixes #18398 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#18400 (cherry picked from commit `6af2659b57`)	2024-04-26 10:59:13 +03:00
Wojciech Mitros	678948e671	mv: keep semaphore units alive until the end of a remote view update When a view update has both a local and remote target endpoint, it extends the lifetime of its memory tracking semaphore units only until the end of the local update, while the resources are actually used until the remote update finishes. This patch changes the semaphore transferring so that in case of both local and remote endpoints, both view updates share the units, causing them to be released only after the update that takes longer finishes. Fixes #17890 (cherry picked from commit `9789a3dc7c`) Refs #17891 Closes scylladb/scylladb#18108	2024-04-25 12:45:01 +02:00
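Sharing the units so that they are released only after the slower update finishes can be sketched with `std::shared_ptr`. The `memory_budget`, `units` and `view_update` types are hypothetical stand-ins, not Seastar's semaphore types or ScyllaDB's view-update code. The units return to the budget when the last holder drops them.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

struct memory_budget { std::size_t available; };  // hypothetical tracked budget

// Hypothetical RAII units: consumed on construction, returned on destruction.
class units {
    memory_budget& _budget;
    std::size_t _n;
public:
    units(memory_budget& b, std::size_t n) : _budget(b), _n(n) { _budget.available -= n; }
    ~units() { _budget.available += _n; }
    units(const units&) = delete;
    units& operator=(const units&) = delete;
};

// Both the local and the remote update hold the same units, so the memory
// is released only when the slower of the two finishes.
struct view_update { std::shared_ptr<units> tracked; };
```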
Raphael S. Carvalho	8acedb9255	sstables: Fix use-after-move in an error path of FS-based sstable writer ``` sstables/storage.cc:152:21: warning: 'file_path' used after it was moved [bugprone-use-after-move] remove_file(file_path).get(); ^ sstables/storage.cc:145:64: note: move occurred here auto w = file_writer(output_stream<char>(std::move(sink)), std::move(file_path)); ``` It's a regression when TOC is found for a new sstable, and we try to delete temporary TOC. courtesy of clang-tidy. Fixes #18323. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `2fba1f936d`) Closes scylladb/scylladb#18382	2024-04-25 13:13:48 +03:00
Kefu Chai	3ed0826292	thrift: avoid use-after-move in `make_non_overlapping_ranges()` in handler.cc, `make_non_overlapping_ranges()` references a moved instance of `ColumnSlice` when something unexpected happens to format the error message in an exception; the move constructor of `ColumnSlice` is default-generated, so the members' move constructors are used to construct the new instance in the move constructor. this could lead to undefined behavior when dereferencing the moved-from instance. in this change, in order to avoid use-after-move, let's keep a copy of the referenced member variables and reference them when formatting the error message in the exception. this use-after-move issue was introduced in `822a315dfa`, which implemented `get_multi_slice` verb and this piece in the first place. since both 5.2 and 5.4 include this commit, we should backport this change to them. Refs `822a315dfa` Fixes #18356 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `1ad3744edc`) Closes scylladb/scylladb#18374	2024-04-25 11:35:38 +03:00
Botond Dénes	154fcffbc7	Merge '[branch-5.4] Fix false-positive errors in scrub validate-mode' from Botond Dénes The new MX-native validator, which validates the index in tandem with the data file, was discovered to print false-positive errors, related to range-tombstones and promoted-index positions. This series fixes that. But first, it refactors the scrub-related tests. These are currently dominated by boiler-plate code. They are hard to read and hard to write. In the first half of the series, a new scrub_test is introduced, which moves all the boiler-plate to a central place, allowing the tests to focus on just the aspect of scrub that is tested. Then, all the found bugs in validate are fixed and finally a new test, checking validate with valid sstable is introduced. This PR backports https://github.com/scylladb/scylladb/pull/16327. Fixes: https://github.com/scylladb/scylladb/issues/16326 Closes scylladb/scylladb#18404 * github.com:scylladb/scylladb: test/boost/sstable_compaction_test: add validation test with valid sstable sstablex/mx/reader: validate(): print trace message when finishing the PI block sstablex/mx/reader: validate(): make index-data PI position check message consistent sstablex/mx/reader: validate(): only load the next PI block if current is exhausted sstablex/mx/reader: validate(): reset the current PI block on partition-start sstablex/mx/reader: validate(): consume_range_tombstone(): check for finished clustering blocked sstablex/mx/reader: validate(): fix validator for range tombstone end bounds test/boost/sstable_compaction_test: drop write_corrupt_sstable() helper test/boost/sstable_compaction_test: fix indentation test/boost/sstable_compaction_test: use test_scrub_framework in test_scrub_quarantine_mode_test test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_segregate_mode_test test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_skip_mode_test test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_validate_mode_test test/boost/sstable_compaction_test: introduce scrub_test_framework test/lib/random_schema: add uncompatible_timestamp_generator()	2024-04-25 08:36:30 +03:00
Jenkins Promoter	7e946925c3	Update ScyllaDB version to: 5.4.7	2024-04-24 23:13:50 +03:00
Botond Dénes	d09e2a2311	test/boost/sstable_compaction_test: add validation test with valid sstable Add a positive test, as it turns out we had some false-positive validation bugs in the validator and we need a regression test for this. (cherry picked from commit `2335f42b2b`)	2024-04-24 09:38:57 -04:00
Botond Dénes	6462a2f391	sstablex/mx/reader: validate(): print trace message when finishing the PI block (cherry picked from commit `a19a2d76c9`)	2024-04-24 09:37:15 -04:00
Botond Dénes	2e19e7cb6d	sstablex/mx/reader: validate(): make index-data PI position check message consistent The message says "index-data" but when printing the position, the data position is printed first, causing confusion. Fix this and while at it, also print the position of the partition start. (cherry picked from commit `677be168c4`)	2024-04-24 09:33:36 -04:00
Botond Dénes	fef7498da2	sstablex/mx/reader: validate(): only load the next PI block if current is exhausted The validate() consumes the content of partitions in a consume-loop. Every time the consumer asks for a "break", the next PI block is loaded and set on the validator, so it can validate that further clustering elements are indeed from this block. This loop assumed the consumer would only request interruption when the current clustering block is finished. This is wrong: the consumer can also request interruption when yielding is needed. When this is the case, the next PI block should not be loaded yet, as the current one is not exhausted. Check this condition before loading the next PI block, to prevent false-positive errors due to a mismatched PI block and clustering elements from the sstable. (cherry picked from commit `5bff7c40d3`)	2024-04-24 09:32:28 -04:00
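The effect of this fix can be modelled as a toy simulation, which is not the real mx reader. Each hypothetical `row` records which promoted-index block it belongs to and whether the consumer yields just before it. Loading the next block on every interruption makes later rows mismatch, whereas loading only when the block is finished keeps the validator in sync.

```cpp
#include <cassert>
#include <vector>

// Hypothetical row: the PI block it belongs to, and whether a yield happens first.
struct row { int pi_block; bool needs_yield_before; };

// Counts rows validated against the wrong PI block.
// only_when_exhausted == false models the old loop, true models the fix.
int count_mismatches(const std::vector<row>& rows, bool only_when_exhausted) {
    int current = 0;  // the first PI block is loaded up front
    int mismatches = 0;
    for (const row& r : rows) {
        const bool block_finished = r.pi_block != current;
        const bool interrupted = block_finished || r.needs_yield_before;
        if (interrupted && (block_finished || !only_when_exhausted)) {
            ++current;  // load the next promoted-index block
        }
        if (r.pi_block != current) {
            ++mismatches;  // a false-positive error would be reported here
        }
    }
    return mismatches;
}
```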
Botond Dénes	f45c878149	sstablex/mx/reader: validate(): reset the current PI block on partition-start It is possible that the next partition has no PI and thus there won't be a new PI block to overwrite the old one. This will result in false-positive messages about rows being outside of the finished PI block. (cherry picked from commit `e073df1dbb`)	2024-04-24 09:31:10 -04:00
Botond Dénes	7225410af7	sstablex/mx/reader: validate(): consume_range_tombstone(): check for finished clustering blocked Promoted index entries can be written on any clustering elements, including range tombstones. So the validating consumer also has to check whether the current expected clustering block is finished when consuming a range tombstone. If it is, consumption has to be interrupted, so that the outer-loop can load up the next promoted index block, before moving on to the next clustering element. (cherry picked from commit `2737899c21`)	2024-04-24 09:29:32 -04:00
Botond Dénes	564b01fda9	sstablex/mx/reader: validate(): fix validator for range tombstone end bounds For range tombstone end-bounds, the validate_fragment_order() should be passed a null tombstone, not a disengaged optional. The latter means no change in the current tombstone. This caused the end bound of range tombstones to not make it to the validator and the latter complained later on partition-end that the partition has unclosed range tombstone. (cherry picked from commit `f46b458f0d`)	2024-04-24 09:28:18 -04:00
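The difference between a disengaged optional ("no change") and an engaged null tombstone ("close the range") can be sketched as follows. The `tombstone` and `rt_tracker` types are hypothetical simplifications, not the real validator: a tombstone with timestamp 0 counts as null, and the tracker flags a range tombstone that is still open at partition end.

```cpp
#include <cassert>
#include <optional>

// Hypothetical tombstone: timestamp 0 means "null" (no deletion in effect).
struct tombstone {
    long timestamp = 0;
    explicit operator bool() const { return timestamp != 0; }
};

class rt_tracker {
    tombstone _active;  // currently open range tombstone, if any
public:
    // std::nullopt: no change; engaged value: the new current tombstone
    // (an engaged null tombstone therefore closes the open range).
    void on_fragment(std::optional<tombstone> change) {
        if (change) {
            _active = *change;
        }
    }
    // An active tombstone at partition end means an unclosed range.
    bool partition_end_ok() const { return !_active; }
};
```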
Botond Dénes	1d80427888	test/boost/sstable_compaction_test: drop write_corrupt_sstable() helper It is not used anymore. (cherry picked from commit `8be97884ec`)	2024-04-24 09:26:46 -04:00
Botond Dénes	ff17ec81e4	test/boost/sstable_compaction_test: fix indentation (cherry picked from commit `da0f4d3a9f`)	2024-04-24 09:26:30 -04:00
Botond Dénes	121f2a530e	test/boost/sstable_compaction_test: use test_scrub_framework in test_scrub_quarantine_mode_test The test becomes a lot shorter and it now uses random schema and random data. Indentation is left broken, to be fixed in a future patch. (cherry picked from commit `c35092aff6`)	2024-04-24 09:03:44 -04:00
Botond Dénes	b77581c84c	test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_segregate_mode_test The test becomes a lot shorter and it now uses random schema and random data. Indentation is left broken, to be fixed in a future patch. (cherry picked from commit `3f76aad609`)	2024-04-24 08:58:18 -04:00
Botond Dénes	ea176bf4ce	test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_skip_mode_test The test becomes a lot shorter and it now uses random schema and random data. The test is also split in two: one test for abort mode and one for skip mode. Indentation is left broken, to be fixed in a future patch. (cherry picked from commit `5237e8133b`)	2024-04-24 08:45:53 -04:00
Botond Dénes	3835fd681d	test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_validate_mode_test The test becomes a lot shorter and it now uses random schema and random data. Indentation is left broken, to be fixed in a future patch. (cherry picked from commit `76785baf43`)	2024-04-24 07:57:54 -04:00
Botond Dénes	14da273c4c	test/boost/sstable_compaction_test: introduce scrub_test_framework Scrub tests require a lot of boilerplate code to work. This has a lot of disadvantages: * Tests are long * The "meat" of the test is lost between all the boiler-plate, it is hard to glean what a test actually does * Tests are hard to write, so we have only a few of them and they test multiple things. * The boiler-plate differs slightly from test to test. To solve this, this patch introduces a new class, `scrub_test_framework`, which is a central place for all the boiler-plate code needed to write scrub-related tests. In the next patches, we will migrate scrub related tests to this class. (cherry picked from commit `b6f0c4efa0`)	2024-04-24 07:57:54 -04:00
Botond Dénes	33d5f27244	test/lib/random_schema: add uncompatible_timestamp_generator() Guarantees that produced mutations will not be compactible. (cherry picked from commit `e412673c44`)	2024-04-24 07:57:54 -04:00
Wojciech Mitros	c4515a9b99	mv: adjust memory tracking of single view updates within a batch Currently, when dividing memory tracked for a batch of updates we do not take into account the overhead that we have for processing every update. This patch adds the overhead for single updates and joins the memory calculation path for batches and their parts so that both use the same overhead. Fixes #17854 (cherry picked from commit `efcb718e0a`) Closes scylladb/scylladb#18107	2024-04-24 09:42:18 +02:00
Asias He	10f137e367	repair: Improve estimated_partitions to reduce memory usage Currently, we use the sum of the estimated_partitions from each participant node as the estimated_partitions for sstable produced by repair. This way, the estimated_partitions is the biggest possible number of partitions repair would write. Since repair will write only the difference between repair participant nodes, using the biggest possible estimation will overestimate the partitions written by repair, most of the time. The problem is that overestimating partitions makes the bloom filter consume more memory, and this has been observed to cause OOM in the field. This patch changes the estimation to use a fraction of the average partitions per node instead of sum. It is still not a perfect estimation but it already improves memory usage significantly. Fixes #18140 Closes scylladb/scylladb#18141 (cherry picked from commit `642f9a1966`) scylla-5.4.6 scylla-5.4.6-candidate	2024-04-18 11:40:06 +03:00
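The change in the estimate can be sketched as arithmetic. The commit does not state which fraction is used, so `fraction` below is a hypothetical parameter, and both function names are illustrative only. The old estimate is the sum over participant nodes. The new one is a fraction of the per-node average.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Old: sum of the per-node estimates (the largest possible number of partitions).
std::uint64_t old_estimate(const std::vector<std::uint64_t>& per_node) {
    return std::accumulate(per_node.begin(), per_node.end(), std::uint64_t{0});
}

// New (sketch): a fraction of the average; `fraction` is a hypothetical parameter.
std::uint64_t new_estimate(const std::vector<std::uint64_t>& per_node, double fraction) {
    if (per_node.empty()) {
        return 0;
    }
    const double avg = static_cast<double>(old_estimate(per_node)) / per_node.size();
    return static_cast<std::uint64_t>(avg * fraction);
}
```

With three nodes estimating 1000, 1200 and 800 partitions, the old estimate is 3000. A hypothetical fraction of 0.25 gives 250, which shrinks the bloom filter the repair writer sizes from that number.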
Kamil Braun	53e1ed0ebb	Merge '[Backport 5.4] gossiper: lock local endpoint when updating heart_beat' from ScyllaDB In testing, we've observed multiple cases where nodes would fail to observe updated application states of other nodes in gossiper. For example: - in scylladb/scylladb#16902, a node would finish bootstrapping and enter NORMAL state, propagating this information through gossiper. However, other nodes would never observe that the node entered NORMAL state, still thinking that it is in joining state. This would lead to further bad consequences down the line. - in scylladb/scylladb#15393, a node got stuck in bootstrap, waiting for schema versions to converge. Convergence would never be achieved and the test eventually timed out. The node was observing outdated schema state of some existing node in gossip. I created a test that would bootstrap 3 nodes, then wait until they all observe each other as NORMAL, with timeout. Unfortunately, thousands of runs of this test on different machines failed to reproduce the problem. After banging my head against the wall failing to reproduce, I decided to sprinkle randomized sleeps across multiple places in gossiper code and finally: the test started catching the problem in about 1 in 1000 runs. With additional logging and additional head-banging, I determined the root cause. The following scenario can happen, 2 nodes are sufficient, let's call them A and B: - Node B calls `add_local_application_state` to update its gossiper state, for example, to propagate its new NORMAL status. - `add_local_application_state` takes a copy of the endpoint_state, and updates the copy: ``` auto local_state = ep_state_before; for (auto& p : states) { auto& state = p.first; auto& value = p.second; value = versioned_value::clone_with_higher_version(value); local_state.add_application_state(state, value); } ``` `clone_with_higher_version` bumps `version` inside gms/version_generator.cc. 
- `add_local_application_state` calls `gossiper.replicate(...)`.
- `replicate` works in 2 phases to achieve exception safety: in the first phase it copies the updated `local_state` to all shards into a separate map. In the second phase the values from the separate map are used to overwrite the endpoint_state map used for gossiping. Due to the cross-shard calls of the first phase, there is a yield before the second phase. During this yield the following happens:
  - the `gossiper::run()` loop on B executes and bumps node B's `heart_beat`. This uses the monotonic version_generator, so it uses a higher version than the ones we used for the states added above. Let's call this new version X. Note that X is larger than the versions used by the application_states added above.
  - now node B handles a SYN or ACK message from node A, creating an ACK or ACK2 message in response. This message contains:
    - old application states (NOT including the update described above, because `replicate` is still sleeping before phase 2),
    - but the bumped heart_beat == X from the `gossiper::run()` loop,
    and node B sends the message.
  - node A receives the message and remembers that the max version across all states (including heart_beat) of node B is X. This means that it will no longer request or apply states from node B with versions smaller than X.
- `gossiper.replicate(...)` on B wakes up and overwrites endpoint_state with the ones it saved in phase 1. In particular, it reverts heart_beat back to a smaller value, but the larger problem is that it saves updated application_states that use versions smaller than X.
- now when node B sends the updated application_states in an ACK or ACK2 message to node A, node A will ignore them, because their versions are smaller than X. Or node B will never send them, because whenever node A requests states from node B, it only requests states with versions > X. Either way, node A will fail to observe the new states of node B.
If I understand correctly, this is a regression introduced in `38c2347a3c`, which introduced a yield in `replicate`. Before that, the updated state would be saved atomically on shard 0, so there could be no `heart_beat` bump in between making a copy of the local state, updating it, and then saving it. With the description above, it's easy to make a consistent reproducer for the problem: introduce a longer sleep in `add_local_application_state` before the second phase of replicate, to increase the chance that the gossiper loop will execute and bump the heart_beat version during the yield. A further commit adds a test based on that. The fix is to bump the heart_beat under the local endpoint lock, which is also taken by `replicate`. The PR also adds a regression test.
Fixes: scylladb/scylladb#15393
Fixes: scylladb/scylladb#15602
Fixes: scylladb/scylladb#16668
Fixes: scylladb/scylladb#16902
Fixes: scylladb/scylladb#17493
Fixes: scylladb/scylladb#18118
Ref: scylladb/scylla-enterprise#3720
(cherry picked from commit `a0b331b310`)
(cherry picked from commit `72955093eb`)
Refs scylladb/scylladb#18184
Closes scylladb/scylladb#18245
* github.com:scylladb/scylladb:
  test: reproducer for missing gossiper updates
  gossiper: lock local endpoint when updating heart_beat	2024-04-17 17:50:30 +02:00
Botond Dénes	1aedc7372d	Merge '[Backport 5.4] : Track and limit memory used by bloom filters' from Lakshmi Narayanan Sreethar Added support to track and limit the memory used by sstable components. A reclaimable component of an SSTable is one from which memory can be reclaimed. SSTables and their managers now track such reclaimable memory and limit the component memory usage accordingly. A new configuration variable defines the memory reclaim threshold. If the total memory of the reclaimable components exceeds this limit, memory will be reclaimed to keep the usage under the limit. This PR considers only the bloom filters as reclaimable and adds support to track and limit them as required. The feature can be manually verified by doing the following:
1. run a single-node single-shard 1GB cluster
2. create a table with bloom-filter-false-positive-chance of 0.001 (to intentionally cause a large bloom filter)
3. populate with tiny partitions
4. watch the bloom filter metrics get capped at 100MB
The default value of the `components_memory_reclaim_threshold` config variable, which controls the reclamation process, is `.1`. This can also be reduced further during manual tests to easily hit the threshold and verify the feature. Fixes https://github.com/scylladb/scylladb/issues/17747 Backported from #17771 to 5.4. Closes scylladb/scylladb#18248
* github.com:scylladb/scylladb:
  test_bloom_filter.py: disable reclaiming memory from components
  sstable_datafile_test: add tests to verify auto reclamation of components
  test/lib: allow overriding available memory via test_env_config
  sstables_manager: support reclaiming memory from components
  sstables_manager: store available memory size
  sstables_manager: add variable to track component memory usage
  db/config: add a new variable to limit memory used by table components
  sstable_datafile_test: add testcase to verify reclamation from sstables
  sstables: support reclaiming memory from components	2024-04-17 14:33:19 +03:00
Kamil Braun	28781ca37e	test: reproducer for missing gossiper updates Regression test for scylladb/scylladb#17493. (cherry picked from commit `72955093eb`) Backport note: removed the `timeout` parameter passed to `server_add`, which is missing on this branch. (If adding a server hangs, it will time out after `TOPOLOGY_TIMEOUT` from scylla_cluster.py.) Removed the `force_gossip_join_boot` error injection from the test, as it is not present in this branch. Starting nodes with `experimental_features` disabled. Added the missing `handle_state_normal.*finished` message.	2024-04-17 13:09:39 +02:00
Beni Peled	9218bbb9b9	test.py: add the pytest junit_suite_name parameter By default, the suite name in the JUnit files generated by pytest is `pytest` for all suites instead of the actual suite name (e.g. `topology_experimental_raft`). With this change, the JUnit files use the real suite name. This change doesn't affect the Test Report in Jenkins, but it is part of the separate task of publishing the test results to Elasticsearch (https://github.com/scylladb/scylla-pkg/pull/3950), where we parse the XMLs and need the correct suite name. Closes scylladb/scylladb#18172 (cherry picked from commit `223275b4d1`)	2024-04-17 07:02:33 +03:00


39578 Commits