scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-06 06:53:12 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	c905554121	test: sstables: sstable_3_x_test: Improve failure message	2024-10-03 14:16:05 +02:00
Tomasz Grabiec	7f077893ed	sstables: mx: writer: Never include partition_end marker in promoted index block width Currently, it may happen that the last promoted index block includes the partition_end marker. That's because we first write the partition end marker and then emit the unclosed block. This behavior matches Cassandra (checked in 3.x and 5.0.1). This is problematic for ruling out data file reads based on index. The width field is currently unused, but it will be used later where the width of the last block is used to compute the skip position past the last block for lookups which land after all keys in the partition. If width includes the marker then such a skip would land in the next partition, which is incorrect, as the reader context expects a cell element. Even if that was recognized, it's wrong - if this is not a single partition read (so upper bound is not at the next partition too), then we would read from the wrong (next) partition. We want to be able to make such skips in order to avoid unnecessary data file IO for reads of missing rows. Currently, we would always read the last block even if the key is past its "end" position. Another way to solve this would be to propagate the "past the last block" condition from the index cursor to the reader and let it deal with it, but the logic for that would be complicated. With this fix, there is no special logic required.	2024-10-03 14:09:57 +02:00
Tomasz Grabiec	a29501ed67	sstables: Reduce amount of I/O for clustering-key-bounded reads from large partitions Single-row reads from large partition issue 64 KiB reads to the data file, which is equal to the default span of the promoted index block in the data file. If users would want to reduce selectivity of the index to speed up single-row reads, this won't be effective. The reason is that the reader uses promoted index to look up the start position in the data file of the read, but end position will in practice extend to the next partition, and amount of I/O will be determined by the underlying file input stream implementation and its read-ahead heuristics. By default, that results in at least 2 IOs 32KB each. There is already infrastructure to lookup end position based on upper bound of the read, but it's not effective becasue it's a non-populating lookup and the upper bound cursor has its own private cached_promoted_index, which is cold when positions are computed. It's non-populating on purpose, to avoid extra index file IO to read upper bound. In case upper bound is far-enough from the lower bound, this will only increase the cost of the read. The solution employed here is to warm up the lower bound cursor's cache before positions are computed, and use that cursor for non-populating lookup of the upper bound. We use the lower bound cursor and the slice's lower bound so that we read the same blocks as later lower-bound slicing would, so that we don't incur extra IO for cases where looking up upper bound is not worth it, that is when upper bound is far from the lower bound. If upper bound is near lower bound, then warming up using lower bound will populate cached_promoted_index with blocks which will allow us to locate the upper bound block accurately. This is especially important for single-row reads, where the bounds are around the same key. In this case we want to read the data file range which belongs to a single promoted index block. It doesn't matter that the upper bound is not exactly the same. They both will likely lie in the same block, and if not, binary search will bring adjacent blocks into cache. Even if upper bound is not near, the binary search will populate the cache with blocks which can be used to narrow down the data file range somewhat. Fixes #10030. The change was tested with perf-fast-forward. I populated the data set with `column_index_size_in_kb` set to 1 scylla perf-fast-forward --populate --run-tests=large-partition-slicing --column-index-size-in-kb=1 Test run: build/release/scylla perf-fast-forward --run-tests=large-partition-select-few-rows -c1 --keep-cache-across-test-cases --test-case-duration=0 This test reads two rows from the middle of a large partition (1M rows), of subsequent keys. The first read will miss in the index file page cache, the second read will hit. Notice that before the change, the second read issued 2 aio requests worth of 64KiB in total. After the change, the second read issued 1 aio worth of 2 KiB. That's because promoted index block is larger than 1 KiB. I verified using logging that the data file range matches a single promoted index block. Also, the first read which misses in cache is still faster after the change. Before: running: large-partition-select-few-rows on dataset large-part-ds1 Testing selecting few rows from a large partition: stride rows time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk allocs tasks insns/f cpu 500000 1 0.009802 1 1 102 0 102 102 21.0 21 196 2 1 0 1 1 0 0 0 568 269 4716050 53.4% 500001 1 0.000321 1 1 3113 0 3113 3113 2.0 2 64 1 0 1 0 0 0 0 0 116 26 555110 45.0% After: running: large-partition-select-few-rows on dataset large-part-ds1 Testing selecting few rows from a large partition: stride rows time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk allocs tasks insns/f cpu 500000 1 0.009609 1 1 104 0 104 104 20.0 20 137 2 1 0 1 1 0 0 0 561 268 4633407 43.1% 500001 1 0.000217 1 1 4602 0 4602 4602 1.0 1 2 1 0 1 0 0 0 0 0 110 26 313882 64.1% (cherry picked from commit dfb339376aff1ed961b26c4759b1604f7df35e54)	2024-10-01 18:40:34 +02:00
Tomasz Grabiec	41be5d1daf	sstables: clustered_cursor: Track current block Will be needed by the reader to jump to the current block even if we already advanced to it before, when setting up the reader context. We want to advance to lower bound earlier, before the praser skips to the lower bound. We want that in order to set input stream data file range based on index. If we didn't have access to the current block and used the result from advance_to(), the parser will think we're already in the block which has lower_bound when it attempts to skip, and will not skip, falling back to scanning.	2024-10-01 18:40:34 +02:00
Avi Kivity	fb8743b2d6	Merge 'sstables: Fix use-after-free on page cache buffer when parsing promoted index entries across pages' from Tomasz Grabiec This fixes a use-after-free bug when parsing clustering key across pages. Also includes a fix for allocating section retry, which is potentially not safe (not in practice yet). Details of the first problem: Clustering key index lookup is based on the index file page cache. We do a binary search within the index, which involves parsing index blocks touched by the algorithm. Index file pages are 4 KB chunks which are stored in LSA. To parse the first key of the block, we reuse clustering_parser, which is also used when parsing the data file. The parser is stateful and accepts consecutive chunks as temporary_buffers. The parser is supposed to keep its state across chunks. In `93482439`, the promoted index cursor was optimized to avoid fully page copy when parsing index blocks. Instead, parser is given a temporary_buffer which is a view on the page. A bit earlier, in `b1b5bda`, the parser was changed to keep shared fragments of the buffer passed to the parser in its internal state (across pages) rather than copy the fragments into a new buffer. This is problematic when buffers come from page cache because LSA buffers may be moved around or evicted. So the temporary_buffer which is a view on the LSA buffer is valid only around the duration of a single consume() call to the parser. If the blob which is parsed (e.g. variable-length clustering key component) spans pages, the fragments stored in the parser may be invalidated before the component is fully parsed. As a result, the parsed clustering key may have incorrect component values. This never causes parsing errors because the "length" field is always parsed from the current buffer, which is valid, and component parsing will end at the right place in the next (valid) buffer. The problematic path for clustering_key parsing is the one which calls primitive_consumer::read_bytes(), which is called for example for text components. Fixed-size components are not parsed like this, they store the intermediate state by copying data. This may cause incorrect clustering keys to be parsed when doing binary search in the index, diverting the search to an incorrect block. Details of the solution: We adapt page_view to a temporary_buffer-like API. For this, a new concept is introduced called ContiguousSharedBuffer. We also change parsers so that they can be templated on the type of the buffer they work with (page_view vs temporary_buffer). This way we don't introduce indirection to existing algorithms. We use page_view instead of temporary_buffer in the promoted index parser which works with page cache buffers. page_view can be safely shared via share() and stored across allocating sections. It keeps hold to the LSA buffer even across allocating sections by the means of cached_file::page_ptr. Fixes #20766 Closes scylladb/scylladb#20837 * github.com:scylladb/scylladb: sstables: bsearch_clustered_cursor: Add trace-level logging sstables: bsearch_clustered_cursor: Move definitions out of line test, sstables: Verify parsing stability when allocating section is retried test, sstables: Verify parsing stability when buffers cross page boundary sstables: bsearch_clustered_cursor: Switch parsers to work with page_view cached_file: Adapt page_view to ContiguousSharedBuffer cached_file: Change meaning of page_view::_size to be relative to _offset rather than page start sstables, utils: Allow parsers to work with different buffer types sstables: promoted_index_block_parser: Make reset() always bring parser to initial state sstables: bsearch_clustered_cursor: Switch read_block_offset() to use the read() method sstables: bsearch_clustered_cursor: Fix parsing when allocating section is retried	2024-10-01 00:02:55 +03:00
Calle Wilund	b5d167699c	commitlog: Fix buffer_list_bytes not updated correctly Fixes #20862 With the change in `60af2f3cb2` the bookkeep for buffer memory was changed subtly, the problem here that we would shrink buffer size before we after flush use said buffer's size to decrement the buffer_list_bytes value, previously inc:ed by the full, allocated size. I.e. we would slowly grow this value instead of adjusting properly to actual used bytes. Test included. Closes scylladb/scylladb#20886	2024-09-30 18:04:00 +03:00
Raphael S. Carvalho	cf58674029	replica: Fix schema change during migration cleanup During migration cleanup, there's a small window in which the storage group was stopped but not yet removed from the list. So concurrent operations traversing the list could work with stopped groups. During a test which emitted schema changes during migrations, a failure happened when updating the compaction strategy of a table, but since the group was stopped, the compaction manager was unable to find the state for that group. In order to fix it, we'll skip stopped groups when traversing the list since they're unused at this stage of migration and going away soon. Fixes #20699. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#20798	2024-09-30 17:30:38 +03:00
David Garcia	b94fbbf30c	docs: update command Removes the update command from the setup command. This is required because versions now are not strictly pinned in the poetry.lock file since Sphinx ScyllaDB Theme 1.8. Closes scylladb/scylladb#20876	2024-09-30 17:06:07 +03:00
Andrei Chekun	cdd0c0b7fc	test.py: Do not attach logs for passed tests To reduce the amount of space needed for reports, this PR will modify logs attachment in allure, so it will attach logs only for the tests that have status other than PASSED. To simplify the solution, with the current way it's not possible to switch off these logs completely. Closes scylladb/scylladb#20786	2024-09-30 14:55:55 +02:00
Kefu Chai	1c8100d3f1	test/unit: remove unused #include following headers are no longer used by this compilation unit: - "utils/managed_ref.hh" - "test/perf/perf.hh" this was identified by clang-include-cleaner. As the code is audited, we can safely remove the #include directive. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20850	2024-09-30 14:46:39 +03:00
Kefu Chai	c3be4a36af	test.py: pass "count" to re.sub() with kwarg since Python 3.13, passing count to `re.sub()` as positional argument has been deprecated. and when runnint `test.py` with Python 3.13, we have following warning: ``` /home/kefu/dev/scylladb/./test.py:1477: DeprecationWarning: 'count' is passed as positional argument args.tests = set(re.sub(r'.* List configured unit tests\n(.*)\n', r'\1', out, 1, re.DOTALL).split("\n")) ``` see also https://github.com/python/cpython/issues/56166 in order to silence this distracting warning, let's pass `count` using kwarg. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20859	2024-09-30 13:57:02 +03:00
Kefu Chai	947d9d5a97	scylla_coredump_setup: fix typos in comment these typos were identified by the codespell workflow. and fixed a syntax error along the way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20877	2024-09-30 13:29:34 +03:00
Aleksandra Martyniuk	efc7ad8547	node_ops: fix task_manager_module::get_nodes() Currently, node ops virtual task gathers its children from all nodes contained in a sum of service::topology::normal_nodes and service::topology::transition_nodes. The maps may contain nodes that are down but weren't removed yet. So, if a user requests the status of a node ops virtual task, the task's attempt to retrieve its children list may fail with seastar::rpc::closed_error. Filter out the tasks that are down in node_ops::task_manager_module::get_nodes. Fixes: #20843. Closes scylladb/scylladb#20856	2024-09-30 12:32:23 +03:00
Pavel Emelyanov	423b5a3ba7	Merge 'directories: cleanups to silence clang-tidy false alarms' from Kefu Chai clang-tidy warns: ``` Warning: /__w/scylladb/scylladb/utils/directories.cc:132:52: warning: 'path' used after it was moved [bugprone-use-after-move] 132 \| bool can_access = co_await file_accessible(path.string(), access_flags::read \| access_flags::write \| access_flags::execute); \| ^ /__w/scylladb/scylladb/utils/directories.cc:121:28: note: move occurred here 121 \| verification_error(std::move(path), "File not owned by current euid: {}. Owner is: {}", geteuid(), sd.uid); \| ^ ``` because we pass `std::move(path)` to `verification_error()`, and "then" use this variable again in this same function. this is a false alarm, but we could make it very clear to convince this tool that it's safe to do so. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#20875 * github.com:scylladb/scylladb: directories: mark verification_error() with [[noreturn]] directories: pass const ref of path to verification_error()	2024-09-30 12:02:39 +03:00
Kamil Braun	322efb54c2	Merge 'raft_group0_client: place on a #include diet' from Avi Kivity Reduce compile time and unnecessary compilations by reducing #include load. Minor refactoring, no backport. Closes scylladb/scylladb#20864 * github.com:scylladb/scylladb: raft_group0_client: uninclude "raft_group0_registry.hh" raft_group_registry: extract raft_timeout raft_group0_client: uninclude "mutation/mutation.hh" raft_group0_client: uninclude "db/system_keyspace.hh" db: system_keyspace: extract auth_version_t into its own header	2024-09-30 10:43:44 +02:00
Kefu Chai	faec71e666	directories: mark verification_error() with [[noreturn]] this helps the compiler or static analyzers do make the right decision. for instance, clang-tidy thinks a parameter like `std::move(path)` could be reused after being moved away. with this attribute, this tool should be able to tell that this never happens. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-09-30 12:07:15 +08:00
Kefu Chai	0ef72475fc	directories: pass const ref of path to verification_error() before this change, we pass a `path` to `verification_error()` by moving away from the original `path`. this works fine in the sense that it is correct and does not incur potential performance issues. but clang-tidy considers it a used-after-move, because it cannot tell `verification_error()` does not return at all, and believes that `path` could be accessed again after being moved away. so it warns like: ``` Warning: /__w/scylladb/scylladb/utils/directories.cc:132:52: warning: 'path' used after it was moved [bugprone-use-after-move] 132 \| bool can_access = co_await file_accessible(path.string(), access_flags::read \| access_flags::write \| access_flags::execute); \| ^ /__w/scylladb/scylladb/utils/directories.cc:121:28: note: move occurred here 121 \| verification_error(std::move(path), "File not owned by current euid: {}. Owner is: {}", geteuid(), sd.uid); \| ^ ``` in this change, instead of passing `fs::path` to `verification_error()`, we pass a `const fs::path&` to this function. because `verification_error()` is not coroutine, neither does it not pass `path` to another continuation to be scheduled. so it's perfectly fine to pass `path` to it. this change address the false alarms from clang-tidy. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-09-30 12:07:15 +08:00
Nadav Har'El	64c0540d02	cql-pytest: test a few small materialized views syntax issue While documenting materialized view in a new document (Refs #16569) I encountered a few questions and this patch contains tests that clarify their answer - and can later guarantee that the answer doesn't unintentionally change in the future. The questions that these tests answer are: 1. It is not allowed to filter a view on a static column (a comment on the test explains why). 2. We already tested that it's not allowed to SELECT a static column into a view. Here we add the check that "SELECT *" is also not allowed if a static column exists in the base table. 3. We check that CREATE MATERIALIZED VIEW ... WITH COMMENT='..' works. 4. We check that CREATE MATERIALIZED VIEW ... WITH COMPACT STORAGE is forbidden. 5. We check that CREATE MATERIALIZED VIEW ... WITH garbage=.. fails with a clean InvalidRequest. All these tests pass on both Scylla and Cassandra. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#20873	2024-09-29 21:34:24 +03:00
Nadav Har'El	b008dabee5	test/cql-pytest: fix support for Cassandra 3 One of the design goals of the test/cql-pytest frameworks was to be able to run these tests against Cassandra. Preferably, we should be able to run most of the tests against any popular version of Cassandra, including Cassandra 3. This is admittingly a very old version, but was still maintained until just a year ago, it's the version that Scylla is most compatible with, and we can still be curious about how it worked. Until recently cql-pytest indeed worked on Cassandra 3, but it broke on some change related to tablet detection that cause our most basic fixture - "text_keyspace" - to use the Cassandra 4 feature of "auto expand". This is trivial to fix - we should just use the this_dc fixture that we already had exactly for this purpose. Fixes #20781 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#20782	2024-09-29 19:36:33 +03:00
Botond Dénes	a4c41755de	Update seastar submodule * ./seastar 69f88e2f...3c9c2696 (14): > core/reactor: don't check AIO block count when they are not needed > build: do not print the default value of --c++-standard in help output > json_formatter: Add tests for formatter::write > Add APIs to get group details and to change ownership of file. > scripts/perftune.py: improve a dry-run printout > build: drop the workaround for a GCC bug > cmake: Depend on libbsd if DPDK depends on it > http: clarify the ownership in the router's doxygen comment > build: check for P2582R1 support > python: introduce a python formatting CI check > addr2line: reformat with black > scripts: add pyproject.toml > json_formatter: Make formatter::write work for std::pair > README.md: use the github homepage of Ceph for Crimson Closes scylladb/scylladb#20836	2024-09-29 13:47:40 +03:00
Avi Kivity	5a470b2bfb	Merge 'scylla_raid_setup: configure SELinux file context' from Takuya ASADA On RHEL9, systemd-coredump fails to coredump on /var/lib/scylla/coredump because the service only have write acess with systemd_coredump_var_lib_t. To make it writable, we need to add file context rule for /var/lib/scylla/coredump, and run restorecon on /var/lib/scylla. Fixes #19325 Closes scylladb/scylladb#20528 * github.com:scylladb/scylladb: scylla_raid_setup: configure SELinux file context scylla_coredump_setup: fix SELinux configuration for RHEL9	2024-09-29 12:53:00 +03:00
Avi Kivity	884297ae2e	raft_group0_client: uninclude "raft_group0_registry.hh" Reduce unnecessary recompilations.	2024-09-28 17:25:11 +03:00
Avi Kivity	67cdd0d389	raft_group_registry: extract raft_timeout It is a vocabulary term that shouldn't need the registry to be visible. Extract it to a new header.	2024-09-28 17:25:03 +03:00
Avi Kivity	93afc77307	raft_group0_client: uninclude "mutation/mutation.hh" Lighten the dependency load. Some constructors and destructors are uninlined to avoid the header depending on the mutation class.	2024-09-28 16:31:53 +03:00
Avi Kivity	5d68efe0bd	raft_group0_client: uninclude "db/system_keyspace.hh" It doesn't need it apart from a forward declaration. Files that lost necessary includes are adjusted, and some users of auth_version_t are redirected to the definition outside system_keyspace.	2024-09-28 16:31:53 +03:00
Avi Kivity	df3ee94467	db: system_keyspace: extract auth_version_t into its own header Users of auth_version_t shouldn't need to include the heavyweight system_keyspace.hh.	2024-09-28 16:31:50 +03:00
Pavel Emelyanov	c17d353718	Revert "[script/pull_github_pr.sh] Check Gating status before merging" This reverts commit `fac682df7e`. Again, this patch broke maintainer workflows, it needs even more care.	2024-09-27 19:12:18 +03:00
Benny Halevy	23d6b996b8	test/pylib: scylla_cluster: set endpoint_snitch in scylla conf When `property_file` is provided, we generate a `cassandra-rackdc.properties` file, but to actually use it, `endpoint_snitch` must be set to `GossipingPropertyFileSnitch`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#20730	2024-09-27 16:46:54 +03:00
David Garcia	4900e4b1ac	docs: update theme 1.8.1 chore: update README Closes scylladb/scylladb#20832	2024-09-27 14:35:39 +02:00
Tomasz Grabiec	b5ae7da9d2	sstables: bsearch_clustered_cursor: Add trace-level logging	2024-09-27 01:25:15 +02:00
Tomasz Grabiec	8e54ecd38e	sstables: bsearch_clustered_cursor: Move definitions out of line In order to later use the formatter for the inner class promoted_index_block, which is defined out of line after cached_promoted_index class definition.	2024-09-27 01:25:15 +02:00
Tomasz Grabiec	0279ac5faa	test, sstables: Verify parsing stability when allocating section is retried	2024-09-27 01:25:15 +02:00
Tomasz Grabiec	c09fa0cb98	test, sstables: Verify parsing stability when buffers cross page boundary	2024-09-27 01:25:15 +02:00
Tomasz Grabiec	7670ee701a	sstables: bsearch_clustered_cursor: Switch parsers to work with page_view This fixes a use-after-free bug when parsing clustering key across pages. Clustering key index lookup is based on the index file page cache. We do a binary search within the index, which involves parsing index blocks touched by the algorithm. Index file pages are 4 KB chunks which are stored in LSA. To parse the first key of the block, we reuse clustering_parser, which is also used when parsing the data file. The parser is stateful and accepts consecutive chunks as temporary_buffers. The parser is supposed to keep its state across chunks. In `b1b5bda`, the parser was changed to keep shared fragments of the buffer passed to the parser in its internal state (across pages) rather than copy the fragments into a new buffer. This is problematic when buffers come from page cache because LSA buffers may be moved around or evicted. So the temporary_buffer which is a view on the LSA buffer is valid only around the duration of a single consume() call to the parser. If the blob which is parsed (e.g. variable-length clustering key component) spans pages, the fragments stored in the parser may be invalidated before the component is fully parsed. As a result, the parsed clustering key may have incorrect component values. This never causes parsing errors because the "length" field is always parsed from the current buffer, which is valid, and component parsing will end at the right place in the next (valid) buffer. The problematic path for clustering_key parsing is the one which calls primitive_consumer::read_bytes(), which is called for example for text components. Fixed-size components are not parsed like this, they store the intermediate state by copying data. This may cause incorrect clustering keys to be parsed when doing binary search in the index, diverting the search to an incorrect block. The solution is to use page_view instead of temporary_buffer, which can be safely shared via share() and stored across allocating section. The page_view maintains its hold to the LSA buffer even across allocating sections. Fixes #20766	2024-09-27 01:25:15 +02:00
Tomasz Grabiec	c15145b71d	cached_file: Adapt page_view to ContiguousSharedBuffer	2024-09-27 01:25:15 +02:00
Tomasz Grabiec	29498a97ae	cached_file: Change meaning of page_view::_size to be relative to _offset rather than page start Will be easier to implement ContiguousSharedBuffer API as the buffer size will be equal to _size.	2024-09-27 01:25:15 +02:00
Tomasz Grabiec	c0fa49bab5	sstables, utils: Allow parsers to work with different buffer types Currently, parsers work with temporary_buffer<char>. This is unsafe when invoked by bsearch_clustered_cursor, which reuses some of the parsers, and passes temporary_buffer<char> which is a view onto LSA buffer which comes from the index file page cache. This view is stable only around consume(). If parsing requires more than one page, it will continue with a different input buffer. The old buffer will be invalid, and it's unsafe for the parser to store and access it. Unfortunetly, the temporary_buffer API allows sharing the buffer via the share() method, which shares the underlying memory area. This is not correct when the underlying is managed by LSA, because storage may move. Parser uses this sharing when parsing blobs, e.g. clustering key components. When parsing resumes in the next page, parser will try to access the stored shared buffers pointing to the previous page, which may result in use-after-free on the memory area. In prearation for fixing the problem, parametrize parsers to work with different kinds of buffers. This will allow us to instantiate them with a buffer kind which supports sharing of LSA buffers properly in a safe way. It's not purely mechanical work. Some parts of the parsing state machine still works with temporary_buffer<char>, and allocate buffers internally, when reading into linearized destination buffer. They used to store this destination in _read_bytes vector, same field which is used to store the shared buffers. Now it's not possible, since shared buffer type may be different than temporary_buffer<char>. So those paths were changed to use a new field: _read_bytes_buf.	2024-09-27 01:24:54 +02:00
Tomasz Grabiec	93bfaf4282	sstables: promoted_index_block_parser: Make reset() always bring parser to initial state When reset() is done due to allocating section retry, it can be theoretically in an arbitrary point. So we should not assume that it finished parsing and state was reset by previous parsing. We should reset all the fields.	2024-09-27 01:23:43 +02:00
Tomasz Grabiec	ac823b1050	sstables: bsearch_clustered_cursor: Switch read_block_offset() to use the read() method To unify logic which handles allocating section retry, and thus improve safety.	2024-09-27 01:22:35 +02:00
Nadav Har'El	9af43dcd06	Merge 'Move collections stress tests from unit/ to boost/' from Pavel Emelyanov Collection stress tests include testing of B- B+- and radix trees, and those tests live in unit/ suite. There are also small corner-case tests for those collections in boost/ suite. There's an attempt to get rid of unit suite in favor of boost one, and this PR moves the collections stress testing from unit suite into their boost counterparts. refs: scylladb/qa-tasks#1655 Closes scylladb/scylladb#20475 * github.com:scylladb/scylladb: test: Move other collection-testing headers from unit to boost test: Move stress-collecton header from unit to boost test: Move B+tree compactiont test from unit to boost test: Move radix tree compactiont test from unit to boost test: Move B-tree compactiont test from unit to boost test: Move radix tree stress test from unit to boost test: Move B-tree stress test from unit to boost test: Move b+tree stress test from unit to boost test: Add bool in_thread argument to stress_collection function	2024-09-26 18:11:23 +03:00
Botond Dénes	9fe64b5d70	Merge 'Remove datadir string from table::config' from Pavel Emelyanov The datadir keeps path to directory where local sstables can be. The very same information is now kept in table's storage options (#20542). This set fixes the remaining places that still use table::config::datadir and table::dir() and removes the datadir field. Closes scylladb/scylladb#20675 * github.com:scylladb/scylladb: treewide: Remove table::config::datadir distributed_loader: Print storage options, not datadir data_dictionary: Add formatter for storage_options test: Construct table_for_tests with table storage options test: Generalize pair of make_table_for_tests helpers tests: Add helper to get snapshot directory from storage options table: snapshot_exists: Get directory from storage options table: snapshot_on_all_shards: Get directory from storage options	2024-09-26 15:26:45 +03:00
Kamil Braun	9224e48d6b	Merge 'Populate raft address map from gossiper on raft configuration change' from Gleb Natapov For each new node added to the raft config populate its ID to IP mapping in raft address map from the gossiper. The mapping may have expired if a node is added to the raft configuration long after it first appears in the gossiper. Fixes scylladb/scylladb#20600 Backport to all supported versions since the bug may cause bootstrapping failure. Closes scylladb/scylladb#20601 * github.com:scylladb/scylladb: test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join group0: make sure that address map has an entry for each new node in the raft configuration	2024-09-26 12:41:25 +02:00
Tomasz Grabiec	8aca93b3ec	sstables: bsearch_clustered_cursor: Fix parsing when allocating section is retried Parser's state was not reset when allocating section was retried. This doesn't cause problems in practice, because reserves are enough to cover allocation demands of parsing clustering keys, which are at most 64K in size. But it's still potentially unsafe and needs fixing.	2024-09-26 12:34:41 +02:00
Laszlo Ersek	ed91d35171	sstables: coroutinize sstable::load() Best viewed with "git show -b -W". Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> Closes scylladb/scylladb#20822	2024-09-26 13:26:22 +03:00
Lakshmi Narayanan Sreethar	7beea03196	build: cmake: link cql3 library to the service library After commit `d16ea0af`, compiling the server using cmake fails with the following error : ``` FAILED: service/CMakeFiles/service.dir/Dev/qos/service_level_controller.cc.o ... /home/Scylla/scylladb/cql3/util.hh:21:10: fatal error: 'cql3/CqlParser.hpp' file not found 21 \| #include "cql3/CqlParser.hpp" \| ^~~~~~~~~~~~~~~~~~~~ 1 error generated. ``` Fix it by linking the cql3 to the service library. Closes scylladb/scylladb#20805	2024-09-26 09:17:30 +03:00
Yaron Kaikov	fac682df7e	[script/pull_github_pr.sh] Check Gating status before merging Maintainers use scripts/pull_github_pr.sh from scylladb.git when merging PRs and before pushing to the next. We want to prevent merges from piling up on top of unstable builds. This change will check Gating's current status and notify the maintainers Related to scylladb/scylla-pkg#3644 Closes scylladb/scylladb#20742	2024-09-26 08:44:06 +03:00
Nadav Har'El	7715abfc56	Merge 'Alternator store ProvisionedThroughput' from Amnon Heiman When users create a table using the Alternator API, they can decide if the billing is PROVISIONED of PAY_PER_REQUEST. If the billing is set to PROVISIONED, they need to set the ProvisionedThroughput ReadCapacityUnits (RCU) and WriteCapacityUnits (WCU). This series adds support for getting and setting the ProvisionedThroughput. The values will be stored as table extension tags. Following how TTL is stored within the Alternator, we will use ```system:rcu_attribute``` and ```system:wcu_attribute``` for the labels. The series adds a test that sets ProvisionedThroughput and validates that it gets the value back. It was tested with both Alternator and AWS. This series is part of the effort to monitor, limit, and bill Alternator operations. New code, no need to backport. Closes scylladb/scylladb#20056 * github.com:scylladb/scylladb: docs/alternator/compatibility.md: explain the consumed capacity provisioned Add test/alternator/test_provisioned_throughput.py test/alternator/util.py: Allow override BillingMode alternator/executor.cc: Store ProvisionedThroughput	2024-09-26 01:23:17 +03:00
Avi Kivity	357168114b	cql3: statement_restrictions: use the evaluator to calculate token for constrained global index query A global index has a primary key of the form (indexed_column, token, partition_key_column..., clustering_key_column...) The primary key columns are used to point at the base table row, and the token (computed as token(partition_key_column...) is used to maintain sort order. The query planner has an optimization: if the partition key is fully constrained to a unique value, then we compute the token from the partition key and use that to seek directly into the clustering row range for that base table partition. If the clustering key is also partially constrained, it is used to refine the index clustering key. Currently, this optimization is implemented as a hack: the partition key is extracted from the prepared statement + query options in get_global_index_token_clustering_ranges(), then used to calculate the token, which is then substituted in the expression passed to get_single_column_clustering_bounds() (the expression is shared across all running queries, so this is quite dangerous). We simplify the whole thing: - Let prepare_index_global() recognize that if the partition key is not fully constrained, then there is no way that we'll be able to compute the token (as it needs all partition key columns). Since the token is the first clustering key column of the index table, we can truncate it to length zero and bail out. - Otherwise, the partition key is fully constrained. We refactor the predicate (pk1 = :a AND pk2 = :b) to (pk1, pk2) := (:a, :b). We then pass expressions representing the partition key to the token function, ending up with token(:a, :b). We then substitute this expression into (*_idx_tbl_ck_prefix)[0], which computes the first clustering key column for the index table. - Remove the runtime component in get_global_index_clustering_ranges(). Note this include the early return if the partition key wasn't fully constrained (though the comment only mentions over-constraining), and the token computation, which is now done by evaluate(). Closes scylladb/scylladb#20733	2024-09-25 22:48:16 +03:00
Yaron Kaikov	d164fd45bc	install-dependencies.sh: update node_exporter to 1.8.2 Update node_exporter to 1.8.2 Fixes: #18493 Closes scylladb/scylladb#20254 [avi: regenerate frozen toolchain, with new clang in https://devpkg.scylladb.com/clang/clang-18.1.8-Fedora-40-aarch64.tar.gz https://devpkg.scylladb.com/clang/clang-18.1.8-Fedora-40-x86_64.tar.gz new clang regenerated due to new packaging format (`f6fe4d9e73`) and some other minor changes.]	2024-09-25 18:42:25 +03:00
Gleb Natapov	9e4cd32096	test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join	2024-09-25 17:10:09 +03:00

1 2 3 4 5 ...

44774 Commits