Commit Graph

Nadav Har'El
6a6115cd86 mv: fix missing view deletions in some cases of range tombstones
For efficiency, if a base-table update generates many view updates that
go to the same partition, they are collected into one mutation. If this
mutation grows too big it can lead to memory exhaustion, so since
commit 7d214800d0 we split the output
mutation into mutations of no more than 100 rows (max_rows_for_view_updates)
each.

This patch fixes a bug where this split was done incorrectly when
the update involved range tombstones, a bug which was discovered by
a user in a real use case (#17117).

Range tombstones are read in two parts, a beginning and an end, and the
code could split the processing between these two parts, with the result
that some of the range tombstones in the update could be missed - and the
view could miss some deletions that happened in the base table.

This patch fixes the code in two places to avoid breaking up the
processing between range tombstones:

1. The counter "_op_count" that decides where to break the output mutation
   should only be incremented when adding rows to this output mutation.
   The existing code strangely incremented it on every read (!?), which
   resulted in the counter being incremented on every *input* fragment,
   and in particular it could reach the limit of 100 between two range
   tombstone pieces.

2. Moreover, the length of the output was checked in the wrong place:
   the existing code could reach 100 rows, not check at that point,
   read the next input - half a range tombstone - and only *then*
   check that it had reached 100 rows and stop. The fix is to calculate
   the number of rows in the right place - exactly when it's needed,
   before consuming the next input fragment rather than after (see the
   sketch below).
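
A minimal sketch of the corrected accounting, using stand-in types rather
than the real fragment and mutation classes:

```cpp
#include <cstddef>
#include <vector>

// Stand-ins for the real mutation-fragment types: a fragment is either a row
// or one half of a range tombstone.
struct fragment { bool is_row; };
struct output_mutation { std::size_t rows = 0; };

constexpr std::size_t max_rows_for_view_updates = 100;

// Consume input fragments into `out`; return the index of the first
// unconsumed fragment so the caller can continue with a fresh mutation.
std::size_t build_some(const std::vector<fragment>& in, std::size_t pos,
                       output_mutation& out) {
    while (pos < in.size()) {
        // Check the limit *before* consuming the next fragment. Since the
        // counter never advances between the two halves of a range tombstone,
        // a consumed first half is always followed by its second half in the
        // same output mutation.
        if (out.rows >= max_rows_for_view_updates) {
            break;
        }
        // Only fragments that add rows to the *output* mutation advance the
        // counter; tombstone halves pass through without counting.
        if (in[pos].is_row) {
            ++out.rows;
        }
        ++pos;
    }
    return pos;
}
```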

The first change needs more justification: the old code, which incremented
_op_count on every input fragment and not just on output fragments, did not
fit the stated goal of its introduction - avoiding large allocations.
In one test it resulted in breaking up the output mutation into chunks of
25 rows instead of the intended 100 rows. But maybe there was another
goal - to stop the iteration after 100 *input* rows and avoid the possibility
of stalls if there are no output rows? It turns out the answer is no -
we don't need this _op_count increment to avoid stalls: the function
build_some() uses `co_await on_results()` to run one step of processing
one input fragment - and `co_await` always checks for preemption.
I verified that indeed no stalls happen by using the existing test
test_long_skipped_view_update_delete_with_timestamp. It generates a
very long base update where all the view updates go to the same partition,
but all rows except the last few don't generate any view updates.
I confirmed that the fixed code loops over all these input rows without
increasing _op_count and without generating any view update yet, but it
does NOT stall.

This patch also includes two tests reproducing this bug and confirming
it's fixed, and also two additional tests for breaking up long deletions
that I wanted to make sure don't fail after this patch (they don't).

By the way, this fix would also have fixed issue #12297 - which we
fixed a year ago in a different way. That issue happened when the code
went through 100 input rows without generating *any* output rows,
and incorrectly concluded that there was no view update to send.
With this fix, the code no longer stops generating the view
update just because it saw 100 input rows - it would wait
until it generated 100 output rows in the view update (or the
input is really done).

Fixes #17117

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17164

(cherry picked from commit 14315fcbc3)
2024-02-22 15:36:58 +02:00
Avi Kivity
e0e46fbc50 Regenerate frozen toolchain
For gnutls 3.8.3.

Since Fedora 37 is end-of-life, pick the package from Fedora 38. libunistring
needs to be updated to satisfy the dependency solver.

Fixes #17285.

Closes scylladb/scylladb#17287

Signed-off-by: Avi Kivity <avi@scylladb.com>

Closes #17411
2024-02-20 12:34:46 +02:00
Wojciech Mitros
27ab3b1744 rust: update dependencies
The currently used version of the "rustix" dependency had a minor
security vulnerability. This patch updates the corresponding
crate.
The update was performed by running "cargo update" on the "rustix"
package, bumping it to version "0.36.17".

Refs #15772

Closes #17408
2024-02-19 22:12:50 +02:00
Michał Jadwiszczak
0d22471222 schema::describe: print 'synchronous_updates' only if it was specified
When describing a materialized view, print the `synchronous_updates` option
only if the tag is present in the schema's extensions map. Previously, if the
key wasn't present, the default value (false) was printed anyway.

Fixes: #14924

Closes #14928

(cherry picked from commit b92d47362f)
2024-02-19 09:10:34 +02:00
Botond Dénes
422a731e85 query: do not kill unpaged queries when they reach the tombstone-limit
The reason we introduced the tombstone limit
(query_tombstone_page_limit) was to allow paged queries to return
incomplete/empty pages in the face of large tombstone spans. This works
by cutting the page after the limit's worth of tombstones has been
processed. If the read was unpaged, it was killed instead. This was a
mistake. First, it doesn't really make sense: the point of the tombstone
limit was to let paged queries process large tombstone spans without
timing out, and it does not help unpaged queries at all. Furthermore,
the tombstone limit can kill internal queries done on behalf of user
queries, because all our internal queries are unpaged. This can cause
denial of service.

So in this patch we disable the tombstone limit for unpaged queries
altogether; they are allowed to continue even after having processed
the configured limit of tombstones.
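
A sketch of the adjusted policy, with illustrative names (the real check
sits wherever the query code consumes `query_tombstone_page_limit`):

```cpp
#include <cstdint>

// Per-read bookkeeping: tombstones processed so far, and whether the
// query is paged at all.
struct read_state {
    uint64_t tombstones_seen = 0;
    bool paged = true;
};

// The tombstone limit now only ends a *page*. An unpaged read has no page
// to cut, so it keeps going (previously the whole query was failed here).
bool should_cut_page(const read_state& rs, uint64_t tombstone_page_limit) {
    return rs.paged && rs.tombstones_seen >= tombstone_page_limit;
}
```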

Fixes: #17241

Closes scylladb/scylladb#17242

(cherry picked from commit f068d1a6fa)
2024-02-15 12:50:30 +02:00
Yaron Kaikov
1fa8327504 release: prepare for 5.2.15 scylla-5.2.15 2024-02-11 14:17:31 +02:00
Pavel Emelyanov
f3c215aaa1 Update seastar submodule
* seastar 29badd99...ad0f2d5d (1):
  > Merge "Slowdown IO scheduler based on dispatched/completed ratio" into branch-5.2

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-09 12:22:58 +03:00
Botond Dénes
94af1df2cf Merge 'Fix mintimeuuid() call that could crash Scylla' from Nadav Har'El
This PR fixes a bug where certain calls to the `mintimeuuid()` CQL function with large negative timestamps could crash Scylla. It turns out we already had protections in place against very large positive timestamps, but very negative timestamps could still cause bugs.

The actual fix in this series is just a few lines, but the bigger effort was improving the test coverage in this area. I added tests for the "date" type (the original reproducer for this bug used totimestamp(), which takes a date parameter), and also added reproducers for this bug - one calling mintimeuuid() directly without the totimestamp() function, and one with that function.

Finally, this PR also replaces with a throw the assert() that turned this molehill of a bug into a mountain.
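
A hedged sketch of the idea behind the timeuuid-minimum patch; the constants
are the standard UUID epoch offset and the 60-bit timestamp bound, but the
function shape is illustrative, not the exact Scylla code:

```cpp
#include <cstdint>

// timeuuid timestamps count 100ns units since 1582-10-15; this offset is the
// number of such units between 1582-10-15 and the Unix epoch.
constexpr int64_t uuid_epoch_offset = 0x01B21DD213814000LL;
constexpr int64_t max_uuid_ts = (int64_t(1) << 60) - 1;  // 60-bit field

// Clamp a millisecond timestamp into the representable timeuuid range, so a
// very negative input maps to the minimum timeuuid instead of tripping an
// assertion deeper in the conversion.
int64_t clamp_timeuuid_millis(int64_t ms) {
    constexpr int64_t min_ms = -uuid_epoch_offset / 10000;
    constexpr int64_t max_ms = (max_uuid_ts - uuid_epoch_offset) / 10000;
    if (ms < min_ms) return min_ms;
    if (ms > max_ms) return max_ms;
    return ms;
}
```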

Fixes #17035

Closes scylladb/scylladb#17073

* github.com:scylladb/scylladb:
  utils: replace assert() by on_internal_error()
  utils: add on_internal_error with common logger
  utils: add a timeuuid minimum, like we had maximum
  test/cql-pytest: tests for "date" type

(cherry picked from commit 2a4b991772)
2024-02-07 14:19:32 +02:00
Botond Dénes
9291eafd4a Merge '[Backport 5.2] Raft snapshot fixes' from Kamil Braun
Backports required to fix scylladb/scylladb#16683 in 5.2:
- when creating the first group 0 server, create a snapshot with a non-empty ID, and start it at index 1 instead of 0 to force snapshot transfer to servers that join group 0
- add an API to trigger a Raft snapshot
- use the API when we restart and see that the existing snapshot is at index 0, to trigger a new one --- in order to fix broken deployments that already bootstrapped with an index-0 snapshot.

Closes #17087

* github.com:scylladb/scylladb:
  test_raft_snapshot_request: fix flakiness (again)
  test_raft_snapshot_request: fix flakiness
  Merge 'raft_group0: trigger snapshot if existing snapshot index is 0' from Kamil Braun
  Merge 'Add an API to trigger snapshot in Raft servers' from Kamil Braun
  raft: server: add workaround for scylladb/scylladb#12972
  raft: Store snapshot update and truncate log atomically
  service: raft: force initial snapshot transfer in new cluster
  raft_sys_table_storage: give initial snapshot a non zero value
2024-02-07 11:55:20 +02:00
Michał Chojnowski
4546d0789f row_cache: update _prev_snapshot_pos even if apply_to_incomplete() is preempted
Commit e81fc1f095 accidentally broke the control
flow of row_cache::do_update().

Before that commit, the body of the loop was wrapped in a lambda.
Thus, to break out of the loop, `return` was used.

The bad commit removed the lambda, but didn't update the `return` accordingly.
Thus, since the commit, the statement doesn't just break out of the loop as
intended, but also skips the code after the loop, which updates `_prev_snapshot_pos`
to reflect the work done by the loop.

As a result, whenever `apply_to_incomplete()` (the `updater`) is preempted,
`do_update()` fails to update `_prev_snapshot_pos`. It remains in a
stale state, until `do_update()` runs again and either finishes or
is preempted outside of `updater`.
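
A distilled illustration of the control-flow bug, using hypothetical helpers
rather than the real row_cache code; inside the old lambda, `return` meant
"end this step", but after inlining, the same `return` exits `do_update()`
entirely:

```cpp
// Hypothetical helpers standing in for the real row_cache machinery.
bool have_work();
void apply_one_partition();
bool preempted();
void update_prev_snapshot_pos();

void do_update() {
    while (have_work()) {
        apply_one_partition();
        if (preempted()) {
            break;  // the bad commit effectively kept the lambda-era `return`
                    // here, which exited do_update() and skipped the
                    // bookkeeping below
        }
    }
    update_prev_snapshot_pos();  // must run no matter how the loop ends
}
```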

If we read a partition processed by `do_update()` but not covered by
`_prev_snapshot_pos`, we will read stale data (from the previous snapshot),
which will be remembered in the cache as the current data.

This results in outdated data being returned by the replica.
(And perhaps in something worse if range tombstones are involved.
I didn't investigate this possibility in depth).

Note: for queries with CL>1, occurrences of this bug are likely to be hidden
by reconciliation, because the reconciled query will only see stale data if
the queried partition is affected by the bug on *all* queried replicas
at the time of the query.

Fixes #16759

Closes scylladb/scylladb#17138

(cherry picked from commit ed98102c45)
2024-02-04 14:46:57 +02:00
Kamil Braun
4e257c5c74 test_raft_snapshot_request: fix flakiness (again)
At the end of the test, we wait until a restarted node receives a
snapshot from the leader, and then verify that the log has been
truncated.

To check the snapshot, the test used the `system.raft_snapshots` table,
while the log is stored in `system.raft`.

Unfortunately, the two tables are not updated atomically when Raft
persists a snapshot (scylladb/scylladb#9603). We first update
`system.raft_snapshots`, then `system.raft` (see
`raft_sys_table_storage::store_snapshot_descriptor`). So after the wait
finishes, there's no guarantee the log has been truncated yet -- there's
a race between the test's last check and Scylla doing that last delete.

But we can check the snapshot using `system.raft` instead of
`system.raft_snapshots`, as `system.raft` has the latest ID. And since
1640f83fdc, storing that ID and truncating
the log in `system.raft` happens atomically.

Closes scylladb/scylladb#17106

(cherry picked from commit c911bf1a33)
2024-02-02 11:31:19 +01:00
Kamil Braun
08021dc906 test_raft_snapshot_request: fix flakiness
Add workaround for scylladb/python-driver#295.

Also, an assertion made at the end of the test was false; it is fixed, with
an appropriate comment added.

(cherry picked from commit 74bf60a8ca)
2024-02-02 11:31:19 +01:00
Botond Dénes
db586145aa Merge 'raft_group0: trigger snapshot if existing snapshot index is 0' from Kamil Braun
The persisted snapshot index may be 0 if the snapshot was created in an
older version of Scylla, which means snapshot transfer won't be
triggered for a bootstrapping node. Commands present in the log may not
cover all schema changes --- group 0 might have been created through the
upgrade procedure, on a cluster with existing schema. So a
deployment with an index-0 snapshot is broken and we need to fix it. We can
use the new `raft::server::trigger_snapshot` API for that, as sketched below.
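
A hedged sketch of the startup check; the API shape is assumed and may differ
from the real `trigger_snapshot` signature:

```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <cstdint>

// Stand-in for the relevant slice of raft::server; trigger_snapshot is the
// API added by the companion series.
struct raft_server {
    seastar::future<> trigger_snapshot();
    uint64_t persisted_snapshot_index();
};

// On restart, an index-0 snapshot identifies a deployment that bootstrapped
// before forced snapshot transfers existed; request a fresh snapshot, which
// also truncates the log.
seastar::future<> maybe_fix_legacy_snapshot(raft_server& server) {
    if (server.persisted_snapshot_index() == 0) {
        co_await server.trigger_snapshot();
    }
}
```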

Also add a test.

Fixes scylladb/scylladb#16683

Closes scylladb/scylladb#17072

* github.com:scylladb/scylladb:
  test: add test for fixing a broken group 0 snapshot
  raft_group0: trigger snapshot if existing snapshot index is 0

(cherry picked from commit 181f68f248)

Backport note: test_raft_fix_broken_snapshot had to be removed because
the "error injections enabled at startup" feature does not yet exist in
5.2.
2024-02-01 15:39:14 +01:00
Botond Dénes
ce0ed29ad6 Merge 'Add an API to trigger snapshot in Raft servers' from Kamil Braun
This allows the user of `raft::server` to cause it to create a snapshot
and truncate the Raft log (leaving no trailing entries; in the future we
may extend the API to specify the number of trailing entries to leave, if
needed). In a later commit we'll add a REST endpoint to Scylla to
trigger group 0 snapshots.

One use case for this API is to create group 0 snapshots in Scylla
deployments which upgraded to Raft in version 5.2 and started with an
empty Raft log with no snapshot at the beginning. This causes problems,
e.g. when a new node bootstraps to the cluster, it will not receive a
snapshot that would contain both schema and group 0 history, which would
then lead to inconsistent schema state and trigger assertion failures as
observed in scylladb/scylladb#16683.

In 5.4 the logic of initial group 0 setup was changed to start the Raft
log with a snapshot at index 1 (ff386e7a44),
but a problem remains with existing deployments coming from 5.2:
we need a way to trigger a snapshot in them (other than performing 1000
arbitrary schema changes).

Another potential use case in the future would be to trigger snapshots
based on external memory pressure in tablet Raft groups (for strongly
consistent tables).

The PR adds the API to `raft::server` and an HTTP endpoint that uses it.

In a follow-up PR, we plan to modify the group 0 server startup logic to
automatically call this API if it sees that no snapshot is present yet (to
automatically fix the aforementioned 5.2 deployments once they upgrade).

Closes scylladb/scylladb#16816

* github.com:scylladb/scylladb:
  raft: remove `empty()` from `fsm_output`
  test: add test for manual triggering of Raft snapshots
  api: add HTTP endpoint to trigger Raft snapshots
  raft: server: add `trigger_snapshot` API
  raft: server: track last persisted snapshot descriptor index
  raft: server: framework for handling server requests
  raft: server: inline `poll_fsm_output`
  raft: server: fix indentation
  raft: server: move `io_fiber`'s processing of `batch` to a separate function
  raft: move `poll_output()` from `fsm` to `server`
  raft: move `_sm_events` from `fsm` to `server`
  raft: fsm: remove constructor used only in tests
  raft: fsm: move trace message from `poll_output` to `has_output`
  raft: fsm: extract `has_output()`
  raft: pass `max_trailing_entries` through `fsm_output` to `store_snapshot_descriptor`
  raft: server: pass `*_aborted` to `set_exception` call

(cherry picked from commit d202d32f81)

Backport notes:
- `has_output()` has a smaller condition in the backported version
  (because the condition was smaller in `poll_output()`)
- `process_fsm_output` has a smaller body (because `io_fiber` had a
  smaller body) in the backported version
- the HTTP API is only started if `raft_group_registry` is started
2024-02-01 15:38:51 +01:00
Kamil Braun
cbe8e05ef6 raft: server: add workaround for scylladb/scylladb#12972
When a node joins the cluster, it closes connections after learning
topology information from other nodes, in order to reopen them with
correct encryption, compression etc.

In ScyllaDB 5.2, this mechanism may interrupt an ongoing Raft snapshot
transfer. This was fixed in later versions by putting some order into
the bootstrap process with 50e8ec77c6, but
the fix was not backported due to its many prerequisites and complexity.

Raft automatically recovers from an interrupted snapshot transfer by
retrying it eventually, and everything works. However, an ERROR is
reported due to that one failed snapshot transfer, and dtests don't like
ERRORs -- they report the test case as failed if an ERROR appears in
any node's logs, even if the test otherwise passed.

Here we apply a simple workaround to please dtests -- in this particular
scenario, turn the ERROR into a WARN.
2024-02-01 14:29:56 +01:00
Michael Huang
84004ab83c raft: Store snapshot update and truncate log atomically
In case the snapshot update fails, we don't truncate the commit log.

Fixes scylladb/scylladb#9603

Closes scylladb/scylladb#15540

(cherry picked from commit 1640f83fdc)
2024-02-01 13:10:05 +01:00
Kamil Braun
753e2d3c57 service: raft: force initial snapshot transfer in new cluster
When we upgrade a cluster to use Raft, or perform manual Raft recovery
procedure (which also creates a fresh group 0 cluster, using the same
algorithm as during upgrade), we start with a non-empty group 0 state
machine; in particular, the schema tables are non-empty.

In this case we need to ensure that nodes which join group 0 receive the
group 0 state. Right now this is not the case. In previous releases,
where group 0 consisted only of schema, and schema pulls were also done
outside Raft, those nodes received schema through this outside
mechanism. In 91f609d065 we disabled
schema pulls outside Raft; we're also extending group 0 with other
things, like topology-specific state.

To solve this, we force snapshot transfers by setting the initial
snapshot index on the first group 0 server to `1` instead of `0`. During
replication, Raft will see that the joining servers are behind,
triggering snapshot transfer and forcing them to pull group 0 state.
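
A sketch of the bootstrap-side idea, using a simplified stand-in for the real
snapshot descriptor:

```cpp
#include <cstdint>

// Simplified stand-in for the real snapshot descriptor type.
struct snapshot_descriptor {
    uint64_t id;   // snapshot id; must be non-zero to be loaded on restart
    uint64_t idx;  // log index the snapshot covers
};

// The first group 0 server starts from a snapshot at index 1 rather than 0.
// Raft then sees joining servers as behind the snapshot, which forces a
// snapshot transfer to them, and with it the full group 0 state.
snapshot_descriptor initial_group0_snapshot(uint64_t random_nonzero_id) {
    return snapshot_descriptor{
        .id = random_nonzero_id,
        .idx = 1,
    };
}
```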

It's unnecessary to do this for a cluster which bootstraps with Raft
enabled right away, but it also doesn't hurt, so we keep the logic simple
and don't introduce branches based on that.

Extend Raft upgrade tests with a node bootstrap step at the end to
prevent regressions (without this patch, the step would hang - node
would never join, waiting for schema).

Fixes: #14066

Closes #14336

(cherry picked from commit ff386e7a44)

Backport note: contrary to the claims above, it turns out that it is
actually necessary to create snapshots in clusters which bootstrap with
Raft, because tombstones in the current schema state expire, hence
applying schema mutations from old Raft log entries is not really
idempotent. Snapshot transfer, which transfers group 0 history and
state_ids, prevents old entries from applying schema mutations over the
latest schema state.

Ref: scylladb/scylladb#16683
2024-01-31 17:00:10 +01:00
Gleb Natapov
42cf25bcbb raft_sys_table_storage: give initial snapshot a non zero value
We create a snapshot (config only, but still), but do not assign it any
id. Because of that it is not loaded on start. We do want it to be
loaded, though, since the state of group 0 will not be re-created from the
log on restart: the entries will have an outdated id and will be
skipped. As a result, the in-memory state machine state will not be restored.
This is not a problem now, since schema state is restored outside of the raft
code.

Message-Id: <20230316112801.1004602-5-gleb@scylladb.com>
(cherry picked from commit a690070722)
2024-01-31 16:50:42 +01:00
Aleksandra Martyniuk
f85375ff99 api: ignore future in task_manager_json::wait_task
Before returning the task status, wait_task waits for the task to finish
with the done() method and calls get() on the resulting future.

If the requested task fails, an exception is thrown and the user gets an
internal server error instead of the failed task's status.

Now the result of the done() method is ignored instead.

Fixes: #14914.
(cherry picked from commit ae67f5d47e)

Closes #16438
2024-01-30 10:54:33 +02:00
Aleksandra Martyniuk
35a0a459db compaction: ignore future explicitly
discard_result() ignores only successful futures. Thus, if the
perform_compaction<regular_compaction_task_executor> call fails,
the failure is considered abandoned, causing tests to fail.

Explicitly ignore the failed future as well.
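
A sketch of the semantics with seastar futures, assuming a stand-in
perform_compaction():

```cpp
#include <seastar/core/future.hh>
#include <exception>

seastar::future<int> perform_compaction();  // stand-in: may resolve exceptionally

void fire_and_forget() {
    // discard_result() only drops the *value* of a successful future; a
    // failed future still carries its exception, and if nothing consumes it
    // the reactor reports an abandoned failed future:
    //(void)perform_compaction().discard_result();

    // Consuming a possible failure explicitly keeps the fire-and-forget
    // intent without leaving an abandoned exception behind:
    (void)perform_compaction().handle_exception(
            [] (std::exception_ptr) { return 0; /* intentionally ignored */ });
}
```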

Fixes: #14971.

Closes #15000

(cherry picked from commit 7a28cc60ec)

Closes #16441
2024-01-30 10:53:09 +02:00
Kamil Braun
784695e3ac system_keyspace: use system memory for system.raft table
`system.raft` was using the "user memory pool", i.e. the
`dirty_memory_manager` for this table was set to
`database::_dirty_memory_manager` (instead of
`database::_system_dirty_memory_manager`).

This meant that if a write workload caused memory pressure on the user
memory pool, internal `system.raft` writes would have to wait for
memtables of user tables to get flushed before the write would proceed.

This was observed in SCT longevity tests which ran a heavy workload on
the cluster and concurrently, schema changes (which underneath use the
`system.raft` table). Raft would often get stuck waiting many seconds
for user memtables to get flushed. More details in issue #15622.
Experiments showed that moving Raft to system memory fixed this
particular issue, bringing the waits to reasonable levels.

Currently `system.raft` stores only one group, group 0, which is
internally used for cluster metadata operations (schema and topology
changes) -- so it makes sense to keep using system memory.
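
A distilled sketch of the routing decision, using stand-in types (the real
wiring assigns the manager at table construction):

```cpp
// Simplified stand-ins for the two pools named in this message.
struct dirty_memory_manager {};
struct database {
    dirty_memory_manager user_pool;    // database::_dirty_memory_manager
    dirty_memory_manager system_pool;  // database::_system_dirty_memory_manager
};

// The essence of the change: system.raft now routes to the system pool, so
// internal Raft writes never wait behind user-table memtable flushes.
dirty_memory_manager& manager_for(database& db, bool is_system_table) {
    return is_system_table ? db.system_pool : db.user_pool;
}
```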

In the future we'd like to have other groups, for strongly consistent
tables. These groups should use the user memory pool. It means we won't
be able to use `system.raft` for them -- we'll just have to use a
separate table.

Fixes: scylladb/scylladb#15622

Closes scylladb/scylladb#15972

(cherry picked from commit f094e23d84)
2024-01-25 17:59:49 +01:00
Avi Kivity
351d6d6531 Merge 'Invalidate prepared statements for views when their schema changes.' from Eliran Sinvani
When a base table is altered, so are the views that might refer to the
added column (which includes "SELECT *" views, and also views that might
need to use this column for row lifetimes, via virtual columns).
However, the query processor's implementation of the view change
notification was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe, in the future, also do some MV-specific work).
This commit adds a call to `on_update_column_family` from within
`on_update_view`, as sketched below.
The visible effect, as of today, is that prepared statements for views
which changed due to a base table change will be invalidated.
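
A minimal sketch of the fix, with hypothetical signatures:

```cpp
#include <string>

// Hypothetical skeleton of the query processor's listener; only the two
// notifications relevant here are shown.
struct qp_migration_listener {
    void on_update_column_family(const std::string& ks, const std::string& cf,
                                 bool columns_changed) {
        // invalidates prepared statements referencing ks.cf (elided)
    }
    // Previously an empty function; a view change is now treated at least
    // like a table change:
    void on_update_view(const std::string& ks, const std::string& view,
                        bool columns_changed) {
        on_update_column_family(ks, view, columns_changed);
    }
};
```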

Fixes https://github.com/scylladb/scylladb/issues/16392

This series also adds a test which fails without this fix and passes when the fix is applied.

Closes scylladb/scylladb#16897

* github.com:scylladb/scylladb:
  Add test for mv prepared statements invalidation on base alter
  query processor: treat view changes at least as table changes

(cherry picked from commit 5810396ba1)
2024-01-23 21:31:47 +02:00
Takuya ASADA
5a05ccc2f8 scylla_raid_setup: fall back to other paths when UUID not available
On some environments, such as VMware instances, /dev/disk/by-uuid/<UUID> is
not available, and scylla_raid_setup will fail while mounting the volume.

To avoid failing to mount /dev/disk/by-uuid/<UUID>, fetch all available
paths for mounting the disk and fall back to other paths like by-partuuid,
by-id, by-path, or just use the real device path like /dev/md0.

To get the device path, and also to dump device status when the UUID is not
available, this introduces a UdevInfo class which communicates with udev
using pyudev.

Related #11359

Closes scylladb/scylladb#13803

(cherry picked from commit 58d94a54a3)

[syuu: regenerate tools/toolchain/image for new python3-pyudev package]

Closes #16938
2024-01-23 16:05:28 +02:00
Botond Dénes
a1603bcb40 readers/multishard: evictable_reader::fast_forward_to(): close reader on exception
When the reader is currently paused, it is resumed, fast-forwarded, then
paused again. The fast-forwarding part can throw, and this will lead to
destroying the reader without it being closed first.
Add a try-catch around this part of the code. Also mark
`maybe_pause()` and `do_pause()` as noexcept, to make it clear why
that part doesn't need to be in the try-catch.
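
A simplified sketch of the fix; the reader type and signatures are stand-ins
for the real ones:

```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <exception>

// Stand-in for the underlying reader interface.
struct reader {
    seastar::future<> fast_forward_to(int range);  // may fail
    seastar::future<> close();                     // must run before destruction
};

// If fast-forwarding fails, close the reader before rethrowing, so it is
// never destroyed while still open. The exception is stashed in an
// exception_ptr because co_await is not allowed inside a catch block.
seastar::future<> safe_fast_forward(reader& r, int range) {
    std::exception_ptr ex;
    try {
        co_await r.fast_forward_to(range);
    } catch (...) {
        ex = std::current_exception();
    }
    if (ex) {
        co_await r.close();
        std::rethrow_exception(ex);
    }
}
```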

Fixes: #16606

Closes scylladb/scylladb#16630

(cherry picked from commit 204d3284fa)
2024-01-16 16:57:28 +02:00
Michał Jadwiszczak
29da20b9e0 schema: add scylla specific options to schema description
Add `paxos_grace_seconds`, `tombstone_gc`, `cdc` and `synchronous_updates`
options to schema description.

Fixes: #12389
Fixes: scylladb/scylla-enterprise#2979

Closes #16786
2024-01-16 09:56:08 +02:00
Botond Dénes
7c4ec8cf4b Update tools/java submodule
* tools/java 843096943e...a1eed2f381 (1):
  > Update JNA dependency to 5.14.0

Fixes: https://github.com/scylladb/scylla-tools-java/issues/371
2024-01-15 15:51:32 +02:00
Aleksandra Martyniuk
5def443cf0 tasks: keep task's children in list
If std::vector is resized its iterators and references may
get invalidated. While task_manager::task::impl::_children's
iterators are avoided throughout the code, references to its
elements are being used.

Since children vector does not need random access to its
elements, change its type to std::list<foreign_task_ptr>, which
iterators and references aren't invalidated on element insertion.
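
A minimal illustration of why the container choice matters:

```cpp
#include <list>
#include <vector>

// Growing a std::vector may reallocate its storage and invalidate references
// to existing elements; std::list never moves its nodes.
void demo() {
    std::vector<int> v{1};
    int& vr = v.front();
    v.resize(1000);   // may reallocate: vr is now dangling
    (void)vr;         // using vr here would be undefined behavior

    std::list<int> l{1};
    int& lr = l.front();
    l.resize(1000);   // nodes never move: lr is still valid
    lr = 2;
}
```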

Fixes: #16380.

Closes scylladb/scylladb#16381

(cherry picked from commit 9b9ea1193c)

Closes #16777
2024-01-15 15:38:00 +02:00
Anna Mikhlin
c0604a31fa release: prepare for 5.2.14 scylla-5.2.14 2024-01-14 16:34:38 +02:00
Pavel Emelyanov
96bb602c62 Update seastar submodule (token bucket duration underflow)
* seastar 43a1ce58...29badd99 (1):
  > shared_token_bucket: Fix duration_for() underflow

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-12 15:15:56 +03:00
Botond Dénes
d96440e8b6 Merge '[Backport 5.2] Validate compaction strategy options in prepare' from Aleksandra Martyniuk
Table properties validation is performed on statement execution.
Thus, when one attempts to create a table with invalid options,
an incorrect command gets committed to Raft, but then its
application fails, leading to the Raft state machine being stopped.

Check table properties when create and alter statements are prepared.

Fixes: https://github.com/scylladb/scylladb/issues/14710.

Closes #16750

* github.com:scylladb/scylladb:
  cql3: statements: delete execute override
  cql3: statements: call check_restricted_table_properties in prepare
  cql3: statements: pass data_dictionary::database to check_restricted_table_properties
2024-01-12 10:56:54 +02:00
Aleksandra Martyniuk
ea41a811d6 cql3: statements: delete execute override
Delete the overridden create_table_statement::execute, as it only calls its
direct parent's (schema_altering_statement) execute method anyway.

(cherry picked from commit 6c7eb7096e)
2024-01-11 16:43:17 +01:00
Aleksandra Martyniuk
8b77fbc904 cql3: statements: call check_restricted_table_properties in prepare
Table properties validation is performed on statement execution.
Thus, when one attempts to create a table with invalid options,
an incorrect command gets committed to Raft, but then its
application fails, leading to the Raft state machine being stopped.

Check table properties when create and alter statements are prepared.

The error is no longer returned as an exceptional future, but it
is thrown. Adjust the tests accordingly.

(cherry picked from commit 60fdc44bce)
2024-01-11 16:10:26 +01:00
Aleksandra Martyniuk
3ab3a2cc1b cql3: statements: pass data_dictionary::database to check_restricted_table_properties
Pass data_dictionary::database to check_restricted_table_properties
as an argument instead of query_processor, as the method will be called
from a context which does not have access to the query processor.

(cherry picked from commit ec98b182c8)
2024-01-11 16:10:26 +01:00
Botond Dénes
7e9107cc97 Update tools/java submodule
* tools/java 79fa02d8a3...843096943e (1):
  > build.xml: update io.airlift to 0.9

Fixes: scylladb/scylla-tools-java#374
2024-01-11 11:03:29 +02:00
Botond Dénes
abb7ae4309 Update ./tools/jmx submodule
* tools/jmx f21550e...50909d6 (1):
  > scylla-apiclient: drop hk2-locator dependency

Fixes: scylladb/scylla-jmx#231
2024-01-10 14:22:14 +02:00
Botond Dénes
2820c63734 Update tools/java submodule
* tools/java d7ec9bf45f...79fa02d8a3 (2):
  > build.xml: update scylla-driver-core to 3.11.5.1
  > treewide: update "guava" package

Fixes: scylla-tools-java#365
Fixes: scylla-tools-java#343

Closes #16693
2024-01-10 08:19:43 +02:00
Nadav Har'El
ac0056f4bc Merge 'Fix partition estimation with TWCS tables during streaming' from Raphael "Raph" Carvalho
TWCS tables require partition estimation adjustment, as incoming streaming data can be segregated into time windows.

It turns out we had two problems in this area that lead to suboptimal bloom filters.

1) With off-strategy enabled, data segregation is postponed, but partition estimation was adjusted as if segregation wasn't postponed. Solved by not adjusting estimation if segregation is postponed.
2) With off-strategy disabled, data segregation is not postponed, but streaming didn't feed any metadata into the partition estimation procedure, meaning it had to assume the maximum number of windows the input data can be segregated into (100). Solved by using the schema's default TTL for a precise estimation of the window count.

For the future, we want to dynamically size filters (see https://github.com/scylladb/scylladb/issues/2024), especially for TWCS, which might have SSTables that are left uncompacted until they're fully expired, meaning the system won't heal a badly wrong partition estimation in a timely manner through compaction.

Fixes https://github.com/scylladb/scylladb/issues/15704.

Closes scylladb/scylladb#15938

* github.com:scylladb/scylladb:
  streaming: Improve partition estimation with TWCS
  streaming: Don't adjust partition estimate if segregation is postponed

(cherry picked from commit 64d1d5cf62)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #16672
2024-01-08 09:06:43 +02:00
Calle Wilund
aaa25e1a78 Commitlog replayer: Range-check skip call
Fixes #15269

If the segment being replayed is corrupted/truncated, we can attempt to skip
completely bogus byte amounts, which can cause an assert (i.e. a crash) in
file_data_source_impl. This is not a crash-level error, so ensure we
range-check the distance in the reader.

v2: Add to corrupt_size if trying to skip more than is available. The amount
    added is "wrong", but at least it ensures we log the fact that things
    are broken
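
A sketch of the range check, with illustrative names:

```cpp
#include <cstddef>

// Clamp the requested skip to what the segment still contains, and account
// the shortfall as corruption instead of letting a bogus distance trip the
// assert in the data source.
std::size_t checked_skip(std::size_t pos, std::size_t segment_size,
                         std::size_t requested, std::size_t& corrupt_size) {
    std::size_t available = segment_size - pos;
    if (requested > available) {
        // The exact corrupt amount is unknowable in a truncated segment; the
        // value is "wrong" but guarantees the breakage gets logged.
        corrupt_size += requested - available;
        requested = available;
    }
    return pos + requested;
}
```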

Closes scylladb/scylladb#15270

(cherry picked from commit 6ffb482bf3)
2024-01-05 09:19:45 +02:00
Beni Peled
c57a0a7a46 release: prepare for 5.2.13 scylla-5.2.13 2024-01-03 17:48:59 +02:00
Botond Dénes
740ba3ac2a tools/schema_loader: read_schema_table_mutation(): close the reader
The reader used to read the sstables was not closed. This could
sometimes trigger an abort(), because the reader was destroyed without
being closed first.
Why only sometimes? This is due to two factors:
* read_mutation_from_flat_mutation_reader() - the method used to extract
  a mutation from the reader, uses consume(), which does not trigger
  `set_close_is_required()` (#16520). Due to this, the top-level
  combined reader did not complain when destroyed without close.
* The combined reader closes underlying readers that have no more data
  for the current range. If the circumstances are just right, all
  underlying readers are closed before the combined reader is
  destroyed. It looks like this is what happens most of the time.

This bug was discovered in SCT testing. After fixing #16520, all
invocations of `scylla-sstable` which use this code would trigger the
abort without this patch. So no further testing is required.

Fixes: #16519

Closes scylladb/scylladb#16521

(cherry picked from commit da033343b7)
2023-12-31 18:13:10 +02:00
Gleb Natapov
76c3dda640 storage_service: register schema version observer before joining group0 and starting gossiper
The schema version is updated by group 0, so if group 0 starts before
the schema version observer is registered, some updates may be missed.
Since the observer is used to update the node's gossiper state, the
gossiper may contain a wrong schema version.

Fix by registering the observer before starting group 0, and even before
starting the gossiper, to avoid the theoretical case that something pulls
the schema after gossiping starts and before the observer is registered.

Fixes: #15078

Message-Id: <ZOYZWhEh6Zyb+FaN@scylladb.com>
(cherry picked from commit d1654ccdda)
2023-12-20 11:14:27 +01:00
Kamil Braun
287546923e Merge 'db: hints: add checksum to sync_point encoding' from Patryk Jędrzejczak
Fixes #9405

The `sync_point` API, provided with an incorrect sync point id, might
allocate a crazy amount of memory and fail with `std::bad_alloc`.

To fix this, we can check whether the encoded sync point has been modified
before decoding. We achieve this by calculating a checksum before
encoding, appending it to the encoded sync point, and comparing it with
a checksum calculated in `db::hints::decode` before decoding.
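
A self-contained sketch of the scheme; the real code computes the checksum
with `xx_hasher`, FNV-1a below is just a stand-in:

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

uint64_t checksum(const std::string& bytes) {
    uint64_t h = 1469598103934665603ULL;  // FNV-1a 64-bit
    for (unsigned char c : bytes) {
        h = (h ^ c) * 1099511628211ULL;
    }
    return h;
}

std::string encode(std::string payload) {
    uint64_t sum = checksum(payload);
    payload.append(reinterpret_cast<const char*>(&sum), sizeof(sum));
    return payload;
}

std::string decode(const std::string& blob) {
    if (blob.size() < sizeof(uint64_t)) {
        throw std::runtime_error("sync point too short");
    }
    std::string payload = blob.substr(0, blob.size() - sizeof(uint64_t));
    uint64_t stored;
    std::memcpy(&stored, blob.data() + payload.size(), sizeof(stored));
    if (checksum(payload) != stored) {
        // A garbled sync point is rejected here, before any length fields
        // inside the payload are trusted for allocations.
        throw std::runtime_error("sync point checksum mismatch");
    }
    return payload;
}
```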

Closes #14534

* github.com:scylladb/scylladb:
  db: hints: add checksum to sync point encoding
  db: hints: add the version_size constant

(cherry picked from commit eb6202ef9c)

The only difference from the original merge commit is the include
path of `xx_hasher.hh`. On branch 5.2, this file is in the root
directory, not `utils`.

Closes #16458
2023-12-19 17:39:50 +02:00
Botond Dénes
c0dab523f9 Update tools/java submodule
* tools/java e2aad6e3a0...d7ec9bf45f (1):
  > Merge "build: take care of old libthrift" from Piotr Grabowski

Fixes: scylladb/scylla-tools-java#352

Closes #16464
2023-12-19 17:37:27 +02:00
Michael Huang
5499f7b5a8 cdc: use chunked_vector for topology_description entries
Lists can grow very big. Let's use a chunked vector to prevent large contiguous
allocations.
Fixes: #15302.

Closes scylladb/scylladb#15428

(cherry picked from commit 62a8a31be7)
2023-12-19 13:43:23 +01:00
Piotr Grabowski
7055ac45d1 test: use more frequent reconnection policy
The default reconnection policy in Python Driver is an exponential
backoff (with jitter) policy, which starts at 1 second reconnection
interval and ramps up to 600 seconds.

This is a problem in tests (refs #15104), especially in tests that restart
or replace nodes. In such a scenario, a node can be unavailable for an
extended period of time and the driver will try to reconnect to it
multiple times, eventually reaching very long reconnection interval
values, exceeding the timeout of a test.

Fix the issue by using an exponential reconnection policy with a maximum
interval of 4 seconds. A smaller value was not chosen, as each retry
clutters the logs with a reconnection exception stack trace.

Fixes #15104

Closes #15112

(cherry picked from commit 17e3e367ca)
2023-12-19 13:43:23 +01:00
Gleb Natapov
4ff29d1637 raft: drop assert in server_impl::apply_snapshot for a condition that may happen
server_impl::apply_snapshot() assumes that it cannot receive a snapshot
from the same host until the previous one is handled, and usually this is
true, since a leader will not send another snapshot until it gets a
response to the previous one. But it may happen that the snapshot-sending
RPC fails after the snapshot was sent, but before the reply is received,
because of a connection disconnect. In this case the leader may send
another snapshot, and there is no guarantee that the previous one was
already handled, so the assumption may break.

Drop the assert that verifies the assumption and return an error in this
case instead.

Fixes: #15222

Message-ID: <ZO9JoEiHg+nIdavS@scylladb.com>
(cherry picked from commit 55f047f33f)
2023-12-19 13:43:23 +01:00
Alexey Novikov
6bcf9e6631 When adding a duration field to a UDT, check whether the UDT is used in some clustering key
Values of the duration type are not allowed for clustering
columns, because durations can't be ordered. This is correctly validated
when creating a table, but was not validated when altering the type.

Fixes #12913

Closes scylladb/scylladb#16022

(cherry picked from commit bd73536b33)
2023-12-19 06:58:41 -05:00
Takuya ASADA
74dd8f08e3 dist: fix local-fs.target dependency
systemd man page says:

systemd-fstab-generator(3) automatically adds dependencies of type Before= to
all mount units that refer to local mount points for this target unit.

So "Before=local-fs.taget" is the correct dependency for local mount
points, but we currently specify "After=local-fs.target", it should be
fixed.

Also replace "WantedBy=multi-user.target" with "WantedBy=local-fs.target",
since .mount units are not related to multi-user but depend on local
filesystems.

Fixes #8761

Closes scylladb/scylladb#15647

(cherry picked from commit a23278308f)
2023-12-19 13:15:00 +02:00
Botond Dénes
68507ed4d9 Merge '[Backport 5.2] Shard of shard repair task impl' from Aleksandra Martyniuk
The shard id is logged twice in repair (once explicitly, once added by the
logger). The redundant occurrence is deleted.

shard_repair_task_impl::id (which contains the global repair shard)
is renamed to avoid further confusion.

Fixes: https://github.com/scylladb/scylladb/issues/12955

Closes #16439

* github.com:scylladb/scylladb:
  repair: rename shard_repair_task_impl::id
  repair: delete redundant shard id from logs
2023-12-19 10:28:57 +02:00
Botond Dénes
46a29e9a02 Merge 'alternator: fix isolation of concurrent modifications to tags' from Nadav Har'El
Alternator's implementation of TagResource, UntagResource and UpdateTimeToLive (the latter uses tags to store the TTL configuration) was unsafe under concurrent modifications - some of the modifications could be lost. This short series fixes the bug, and also adds (in the last patch) a test that reproduces the bug and verifies that it's fixed.

The cause of the incorrect isolation was that we separately read the old tags and wrote the modified tags. In this series we introduce a new function, `modify_tags()`, which does both under one lock, so concurrent tag operations are serialized and therefore isolated as expected.
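
A hedged sketch of the shape of `modify_tags()` (helper names are
illustrative; the real code reads and writes the tags through the schema):

```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/core/semaphore.hh>
#include <functional>
#include <map>
#include <string>
#include <utility>

using tags_map = std::map<std::string, std::string>;

seastar::future<tags_map> read_tags();   // hypothetical helpers; the real
seastar::future<> write_tags(tags_map);  // code reads/writes schema tags

// The read and the write happen under one lock, so concurrent tag operations
// serialize instead of losing each other's updates.
seastar::future<> modify_tags(std::function<void(tags_map&)> f) {
    static thread_local seastar::semaphore lock{1};
    auto units = co_await seastar::get_units(lock, 1);
    tags_map tags = co_await read_tags();
    f(tags);                               // caller edits the map in place
    co_await write_tags(std::move(tags));  // units released on return
}
```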

Fixes #6389.

Closes #13150

* github.com:scylladb/scylladb:
  test/alternator: test concurrent TagResource / UntagResource
  db/tags: drop unsafe update_tags() utility function
  alternator: isolate concurrent modification to tags
  db/tags: add safe modify_tags() utility functions
  migration_manager: expose access to storage_proxy

(cherry picked from commit dba1d36aa6)

Closes #16453
2023-12-19 10:19:31 +02:00