scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-24 10:30:38 +00:00

Author	SHA1	Message	Date
Michał Chojnowski	c5c19e90ac	logalloc: add hold_reserve mutation_partition_v2::apply_monotonically() needs to perform some allocations in a destructor, to ensure that the invariants of the data structure are restored before returning. But it is usually called with reclaiming disabled, so the allocations might fail even in a perfectly healthy node with plenty of reclaimable memory. This patch adds a mechanism which allows to reserve some LSA memory (by asking the allocator to keep it unused) and make it available for allocation right when we need to guarantee allocation success. (cherry picked from commit `7b3f55a65f`)	2024-07-10 08:36:11 +00:00
Michał Chojnowski	985f5a50f6	logalloc: generalize refill_emergency_reserve() In the next patch, we will want to do the same thing as refill_emergency_reserve() does, just with a quantity different than _emergency_reserve_max. So we split off the shareable part to a new function, and use it to implement refill_emergency_reserve(). (cherry picked from commit `f784be6a7e`)	2024-07-10 08:36:11 +00:00
Botond Dénes	ae11381d7c	Merge '[Backport 6.0] reader_concurrency_semaphore: make CPU concurrency configurable' from Botond Dénes The reader concurrency semaphore restricts the concurrency of reads that require CPU (intention: they read from the cache) to 1, meaning that if there is even a single active read which declares that it needs just CPU to proceed, no new read is admitted. This is meant to keep the concurrency of reads in the cache at 1. The idea is that concurrency in the cache is not useful: it just leads to the reactor rotating between these reads, all of them finishing later than they could if they were the only active read in the cache. This was observed to backfire in the case where reads from a single table are mostly very fast, but on some keys are very slow (hint: collection full of tombstones). In this case the slow read holds up the fast reads in the queue, increasing the 99th percentile latencies significantly. This series proposes to fix this, by making the CPU concurrency configurable. We don't like tunables like this and this is not a proper fix, but a workaround. The proper fix would be to allow cutting any page early, but we cannot cut a page in the middle of a row. We could maybe have a way of detecting slow reads and excluding them from the CPU concurrency. This would be a heuristic and it would be hard to get right. So in this series a robust and simple configurable is offered, which can be used on those few clusters which do suffer from the too strict concurrency limit. We have seen it in very few cases so far, so this doesn't seem to be widespread. Fixes: https://github.com/scylladb/scylladb/issues/19017 This PR backports https://github.com/scylladb/scylladb/pull/19018 and its follow-up https://github.com/scylladb/scylladb/pull/19600. 
Closes scylladb/scylladb#19644 * github.com:scylladb/scylladb: reader_concurrency_semaphore: execution_loop(): move maybe_admit_waiters() to the inner loop test/boost/reader_concurrency_semaphore_test: add test for live-configurable cpu concurrency test/boost/reader_concurrency_semaphore_test: hoist require_can_admit reader_concurrency_semaphore: wire in the configurable cpu concurrency reader_concurrency_semaphore: add cpu_concurrency constructor parameter db/config: introduce reader_concurrency_semahore_cpu_concurrency	2024-07-10 07:23:08 +03:00
Anna Stuchlik	4ec5a06101	doc: update Scylla Doctor installation This commit updates the instructions on how to download and run Scylla Doctor, following the changes in how Scylla Doctor is released. (cherry picked from commit `2ffda9b262`) Closes scylladb/scylladb#19525	2024-07-09 14:32:21 +03:00
Anna Stuchlik	dcf4c757b2	doc: remove support for Debian 10 This PR removes support for Debian 10, which reached end of life on June 30, 2024. Refs https://github.com/scylladb/scylla-enterprise/issues/4377 (cherry picked from commit `1f340428ea`) Closes scylladb/scylladb#19630	2024-07-09 12:55:11 +02:00
Wojciech Przytuła	a7fe9eeffd	storage_proxy: fix uninitialized LWT contention counter When debugging the issue of high LWT contention metric, we (the drivers team) discovered that at least 3 drivers (Go, Java, Rust) cause high numbers in that metric in LWT workloads - we doubted that all those drivers route LWT queries badly. We tried to understand that metric and its semantics. It took 3 people over 10 hours to figure out what it is supposed to count. People from core team suspected that it was the drivers sending requests to different shards, causing contention. Then we ran the workload against a single node single shard cluster... and observed contention. Finally, we looked into the Scylla code and saw it. Uninitialized stack value. The core member was shocked. But we, the drivers people, felt we always knew it. It's yet another time that we are blamed for a server-side issue. We rebuilt scylla with the variable initialized to 0 and the metric kept being 0. To prevent such errors in the future, let's consider some lints that warn against uninitialized variables. This is such an obvious feature of e.g. Rust, and yet this has been shown to cause a painful bug in 2024. Fixes: scylladb/scylladb#19654 (cherry picked from commit `36a125bf97`) Closes scylladb/scylladb#19657	2024-07-09 11:41:10 +02:00
Michael Litvak	ad6eb1cadf	view: drain view builder before database The view builder is doing write operations to the database. In order for the view builder to shutdown gracefully without errors, we need to ensure the database can handle writes while it is drained. The commit changes the drain order, so that view builder is drained before the database shuts down. Fixes scylladb/scylladb#18929 (cherry picked from commit `9d9318c564`) Closes scylladb/scylladb#19636	2024-07-08 19:16:26 +02:00
Botond Dénes	dadc0c32e1	reader_concurrency_semaphore: execution_loop(): move maybe_admit_waiters() to the inner loop Now that the CPU concurrency limit is configurable, new reads might be ready to execute right after the current one was executed. So move the poll for admitting new reads into the inner loop, to prevent the situation where the inner loop yields and a concurrent do_wait_admission() finds that there are waiters (queued because at the time they arrived to the semaphore, the _ready_list was not empty) but it is possible to admit a new read. When this happens the semaphore will dump diagnostics to help debug the apparent contradiction, which can generate a lot of log spam. Moving the poll into the inner loop prevents the false-positive contradiction detection from firing. Refs: scylladb/scylladb#19017 Closes scylladb/scylladb#19600 (cherry picked from commit `155acbb306`)	2024-07-08 08:13:40 +03:00
Botond Dénes	88d3c2eb4b	test/boost/reader_concurrency_semaphore_test: add test for live-configurable cpu concurrency (cherry picked from commit `b4f3809ad2`)	2024-07-08 08:13:07 +03:00
Botond Dénes	4307631950	test/boost/reader_concurrency_semaphore_test: hoist require_can_admit This is currently a lambda in a test, hoist it into the global scope and make it into a function, so other tests can use it too (in the next patch). (cherry picked from commit `9cbdd8ef92`)	2024-07-08 08:12:34 +03:00
Botond Dénes	abc4a9b635	reader_concurrency_semaphore: wire in the configurable cpu concurrency Before this patch, the semaphore was hard-wired to stop admission, if there is even a single permit, which is in the need_cpu state. Therefore, keeping the CPU concurrency at 1. This patch makes use of the new cpu_concurrency parameter, which was wired in in the last patches, allowing for a configurable amount of concurrent need_cpu permits. This is to address workloads where some small subset of reads are expected to be slow, and can hold up faster reads behind them in the semaphore queue. (cherry picked from commit `07c0a8a6f8`)	2024-07-08 08:12:34 +03:00
Botond Dénes	052cef2621	reader_concurrency_semaphore: add cpu_concurrency constructor parameter In the case of the user semaphore, this receives the new reader_concurrency_semaphore_cpu_limit config item. Not used yet. (cherry picked from commit `59faa6d4ff`)	2024-07-08 08:12:20 +03:00
Botond Dénes	5a7af93c7c	db/config: introduce reader_concurrency_semahore_cpu_concurrency To allow increasing the semaphore's CPU concurrency, which is currently hard-limited to 1. Not wired yet. (cherry picked from commit `c7317be09a`)	2024-07-08 08:06:28 +03:00
Pavel Emelyanov	78f3fc8890	tablet_allocator: Put more info into failed-to-drain exception When balancer fails to find a node to balance drained tablets into, it throws an exception with tablet id and node id, but it's also good to know more details about the balancing state that led to the failure. refs: #19504 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `c3d9831c5f`) Closes scylladb/scylladb#19619	2024-07-05 11:17:37 +03:00
None	3e06c882f0	.github: remove pull_request_template The reason for the pr template is to explain why do we need to backport a PR. On release branches there is no need for it Closes scylladb/scylladb#19615	2024-07-04 16:52:27 +03:00
Avi Kivity	c6e8a7f762	Merge '[Backport 6.0] Close output_stream in get_compaction_history() API handler' from ScyllaDB If an httpd body writer is called with output_stream<>, it must close the stream on its own regardless of any exceptions it may generate while working, otherwise stream destructor may step on non-closed assertion. Stepped on with different handler, see #19541 Coroutinize the handler as the first step while at it (though the fix would have been notably shorter if done with .finally() lambda) (cherry picked from commit `acb351f4ee`) (cherry picked from commit `6d4ba98796`) (cherry picked from commit `b4f9387a9d`) Refs #19543 Closes scylladb/scylladb#19603 * github.com:scylladb/scylladb: api: Close response stream of get_compaction_history() api: Flush output stream in get_compaction_history() call api: Coroutinize get_compaction_history inner function	2024-07-04 15:08:08 +03:00
Pavel Emelyanov	941ec80a00	api: Close response stream of get_compaction_history() The function must close the stream even if it throws along the way. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `b4f9387a9d`)	2024-07-03 18:30:17 +00:00
Pavel Emelyanov	ab5041cb03	api: Flush output stream in get_compaction_history() call It's currently implicitly flushed on its close, but in that case close can throw while flushing. Next patch wants close not to throw and that's possible if the stream is flushed in advance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `6d4ba98796`)	2024-07-03 18:30:17 +00:00
Pavel Emelyanov	009f5eb69e	api: Coroutinize get_compaction_history inner function The handler returns a function which is then invoked with output_stream argument to render the json into. This function is converted into coroutine. It has yet another inner lambda that's passed into compaction_manager::get_compaction_history() as consumer lambda. It's coroutinized too. The indentation looks weird as preparation for future patching. Hopefully it's still possible to understand what's going on. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `acb351f4ee`)	2024-07-03 18:30:17 +00:00
Tzach Livyatan	c9cd171f42	Docs: Fix a typo in sstable-corruption.rst (cherry picked from commit `a7115124ce`) Closes scylladb/scylladb#19591 scylla-6.0.2-candidate-20240703050547 scylla-6.0.2	2024-07-03 10:24:44 +02:00
Piotr Dulikowski	8b9e62e107	Merge '[Backport 6.0] cql3/statement/select_statement: do not parallelize single-partition aggregations' from Michał Jadwiszczak This patch adds a check if aggregation query is doing single-partition read and if so, makes the query to not use forward_service and do not parallelize the request. Fixes scylladb/scylladb#19349 (cherry picked from commit `e9ace7c203`) (cherry picked from commit `8eb5ca8202`) Refs scylladb/scylladb#19350 Closes scylladb/scylladb#19499 * github.com:scylladb/scylladb: test/boost/cql_query_test: add test for single-partition aggregation cql3/select_statement: do not parallelize single-partition aggregations	2024-07-02 21:03:24 +02:00
Kamil Braun	4e21421ddc	Merge '[Backport 6.0] Do not expire local address in raft address map since the local node cannot disappear' from ScyllaDB A node may wait in the topology coordinator queue for a while before being joined. Since the local address is added as an expiring entry to the raft address map, it may expire in the meantime and the bootstrap will fail. The series makes the entry non-expiring. Fixes scylladb/scylladb#19523 Needs to be backported to 6.0 since the bug may cause bootstrap to fail. (cherry picked from commit `5d8f08c0d7`) (cherry picked from commit `3f136cf2eb`) Refs #19557 Closes scylladb/scylladb#19574 * github.com:scylladb/scylladb: test: add test that checks that local address cannot expire between join request placement and its processing storage_service: make node's entry non expiring in raft address map	2024-07-01 16:20:17 +02:00
Gleb Natapov	724ec62e22	test: add test that checks that local address cannot expire between join request placement and its processing (cherry picked from commit `3f136cf2eb`)	2024-07-01 10:44:31 +00:00
Gleb Natapov	a6c5f8192d	storage_service: make node's entry non expiring in raft address map Local address map entry should never expire in the address map. (cherry picked from commit `5d8f08c0d7`)	2024-07-01 10:44:31 +00:00
Pavel Emelyanov	20b99246fd	Merge '[Backport 6.0] Close output stream in task manager's API get_tasks handler' from ScyllaDB If client stops reading response early, the server-side stream throws but must be closed anyway. Seen in another endpoint and fixed by #19541 (cherry picked from commit `4897d8f145`) (cherry picked from commit `986a04cb11`) (cherry picked from commit `1be8b2fd25`) Refs #19542 Closes scylladb/scylladb#19562 * github.com:scylladb/scylladb: api: Fix indentation after previous patch api: Close response stream on error api: Flush response output stream before closing	2024-07-01 10:47:30 +03:00
Pavel Emelyanov	8e74ac5140	Merge '[Backport 6.0] Close output_stream in get_snapshot_details() API handler' from ScyllaDB All streams used by httpd handlers are to be closed by the handler itself, caller doesn't take care of that. fixes: #19494 (cherry picked from commit `d1fd886608`) (cherry picked from commit `a0c1552cea`) (cherry picked from commit `1839030e3b`) Refs #19541 Closes scylladb/scylladb#19563 * github.com:scylladb/scylladb: api: Fix indentation after previous patch api: Close output_stream on error api: Flush response output stream before closing	2024-07-01 10:47:08 +03:00
Pavel Emelyanov	4e17a5a1c2	api: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `1839030e3b`)	2024-06-30 19:20:11 +00:00
Pavel Emelyanov	c5c168a1db	api: Close output_stream on error If the get_snapshot_details() lambda throws, the output stream remains non-closed which is bad. Close it regardless of what. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `a0c1552cea`)	2024-06-30 19:20:10 +00:00
Pavel Emelyanov	09272d2478	api: Flush response output stream before closing Otherwise close() may throw and this is what next patch will want not to happen. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `d1fd886608`)	2024-06-30 19:20:10 +00:00
Pavel Emelyanov	1e7f377b0a	api: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `1be8b2fd25`)	2024-06-30 19:19:52 +00:00
Pavel Emelyanov	b038177f19	api: Close response stream on error The handler's lambda is called with && stream object and must close the stream on its own regardless of what. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `986a04cb11`)	2024-06-30 19:19:52 +00:00
Pavel Emelyanov	426bc6a4e1	api: Flush response output stream before closing The .close() method flushes the stream, but it may throw doing it. Next patch will want .close() not to throw, for that stream must be flushed explicitly before closing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `4897d8f145`)	2024-06-30 19:19:52 +00:00
Piotr Smaron	6a1e0489c6	cql: forbid switching from tablets to vnodes in ALTER KS This check is already in place, but isn't fully working, i.e. switching from a vnode KS to a tablets KS is not allowed, but this check doesn't work in the other direction. To fix the latter, `ks_prop_defs::get_initial_tablets()` has been changed to handle 3 states: (1) init_tablets is set, (2) it was skipped, (3) tablets are disabled. These couldn't fit into std::optional, so a new local struct to hold these states has been introduced. Callers of this function have been adjusted to set init_tablets to an appropriate value according to the circumstances, i.e. if tablets are globally enabled, but have been skipped in the CQL, init_tablets is automatically set to 0, but if someone executes ALTER KS and doesn't provide tablets options, they're inherited from the old KS. I tried various approaches and this one resulted in the least lines of code changed. I also provided testcases to explain how the code behaves. Fixes: #18795 (cherry picked from commit `758139c8b2`) Closes scylladb/scylladb#19540	2024-06-28 17:58:35 +03:00
Yaron Kaikov	1577765a20	.github/scripts/label_promoted_commits.py: fix adding labels when PR is closed `prs = response.json().get("items", [])` will return empty when there are no merged PRs, and this will just skip the all-label replacement process. This is a regression following the work done in #19442 Adding another part to handle closed PRs (which is the majority of the cases we have in Scylla core) Fixes: https://github.com/scylladb/scylladb/issues/19441 (cherry picked from commit `2eb8344b9a`) Closes scylladb/scylladb#19527	2024-06-27 19:35:18 +03:00
Botond Dénes	c4f1f129c3	Merge '[Backport 6.0] batchlog replay: bypass tombstones generated by past replays' from ScyllaDB The `system.batchlog` table has a partition for each batch that failed to complete. After finally applying the batch, the partition is deleted. Although the table has gc_grace_seconds = 0, tombstones can still accumulate in memory, because we don't purge partition tombstones from either the memtable or the cache. This can lead to the cache and memtable of this table accumulating many thousands or even millions of tombstones, making batchlog replay very slow. We didn't notice this before, because we would only replay all failed batches on unbootstrap, which is rare and a heavy and slow operation in its own right already. With repair-based tombstone-gc however, we do a full batchlog replay at the beginning of each repair, and now this extra delay is noticeable. Fix this by making sure batchlog replays don't have to scan through all the tombstones generated by previous replays: * flush the `system.batchlog` memtable at the end of each batchlog replay, so it is cleared of tombstones * bypass the cache Fixes: https://github.com/scylladb/scylladb/issues/19376 Although this is not a regression -- replay was like this since forever -- now that repair calls into batchlog replay, every release which uses repair-based tombstone-gc should get this fix (cherry picked from commit `4e96e320b4`) (cherry picked from commit `2dd057c96d`) (cherry picked from commit `29f610d861`) (cherry picked from commit `31c0fa07d8`) Refs #19377 Closes scylladb/scylladb#19502 * github.com:scylladb/scylladb: db/batchlog_manager: bypass cache when scanning batchlog table db/batchlog_manager: replace open-coded paging with internal one db/batchlog_manager: implement cleanup after all batchlog replay cql3/query_processor: for_each_cql_result(): move func to the coro frame	2024-06-27 14:46:50 +03:00
Botond Dénes	fa644c6269	Merge '[Backport 6.0] tasks: fix tasks abort' from Aleksandra Martyniuk Currently if task_manager::task::impl::abort preempts before children are recursively aborted and then the task gets unregistered, we hit use after free since abort uses children vector which is no longer alive. Modify abort method so that it goes over all tasks in task manager and aborts those with the given parent. Fixes: https://github.com/scylladb/scylladb/issues/19304. Requires backport to all versions containing task manager (cherry picked from commit `3463f495b1`) (cherry picked from commit `50cb797d95`) Refs https://github.com/scylladb/scylladb/pull/19305 Closes scylladb/scylladb#19437 * github.com:scylladb/scylladb: test: add test for abort while a task is being unregistered tasks: fix tasks abort	2024-06-27 14:45:34 +03:00
Botond Dénes	cb4b4fe678	Merge '[Backport 6.0] test_tablets: add test_tablet_storage_freeing' from ScyllaDB Before work on tablets was completed, it was noticed that — due to some missing pieces of implementation — Scylla doesn't properly close sstables for migrated-away tablets. Because of this, disk space wasn't being reclaimed properly. Since the missing pieces of implementation were added, the problem should be gone now. This patch adds a test which was used to reproduce the problem earlier. It's expected to pass now, validating that the issue was fixed. Should be backported to branch-6.0, because the tested problem was also affecting that branch. Fixes #16946 (cherry picked from commit `7741491b47`) (cherry picked from commit `823da140dd`) Refs #18906 Closes scylladb/scylladb#19295 * github.com:scylladb/scylladb: test_tablets: add test_tablet_storage_freeing test: pylib: add get_sstables_disk_usage()	2024-06-27 14:40:06 +03:00
Kamil Braun	aca08bb1d1	Merge '[Backport 6.0] join_token_ring, gossip topology: recalculate sync nodes in wait_alive' from ScyllaDB The node booting in gossip topology waits until all NORMAL nodes are UP. If we removed a different node just before, the booting node could still see it as NORMAL and wait for it to be UP, which would time out and fail the bootstrap. This issue caused scylladb/scylladb#17526. Fix it by recalculating the nodes to wait for in every step of the `wait_alive` loop. Although the issue fixed by this PR caused only test flakiness, it could also manifest in real clusters. It's best to backport this PR to 5.4 and 6.0. Fixes scylladb/scylladb#17526 (cherry picked from commit `017134fd38`) (cherry picked from commit `7735bd539b`) (cherry picked from commit `bcc0a352b7`) Refs #19387 Closes scylladb/scylladb#19419 * github.com:scylladb/scylladb: join_token_ring, gossip topology: update obsolete comment join_token_ring, gossip topology: fix indentation after previous patch join_token_ring, gossip topology: recalculate sync nodes in wait_alive	2024-06-26 12:38:06 +02:00
Yaron Kaikov	9f31426ead	.github/workflow: close and replace label when backport promoted Today after Mergify opened a Backport PR, it will stay open until someone manually close the backport PR , also we can't track using labels which backport was done or not since there is no indication for that except digging into the PR and looking for a comment or a commit ref The following changes were made in this PR: * trigger add-label-when-promoted.yaml also when the push was made to `branch-x.y` * Replace label `backport/x.y` with `backport/x.y-done` in the original PR (this will automatically update the original Issue as well) * Add a comment on the backport PR and close it Fixes: https://github.com/scylladb/scylladb/issues/19441 (cherry picked from commit `394cba3e4b`) Closes scylladb/scylladb#19496	2024-06-26 12:42:34 +03:00
Botond Dénes	22622a94ca	db/batchlog_manager: bypass cache when scanning batchlog table Scans should not pollute the cache with cold data, in general. In the case of the batchlog table, there is another reason to bypass the cache: this table can have a lot of partition tombstones, which currently are not purged from the cache. So in certain cases, using the cache can make batch replay very slow, because it has to scan past tombstones of already replayed batches. (cherry picked from commit `31c0fa07d8`)	2024-06-26 09:05:14 +00:00
Botond Dénes	35a64856b0	db/batchlog_manager: replace open-coded paging with internal one query_processor has built-in paging support, no need to open-code paging in batchlog manager code. (cherry picked from commit `29f610d861`)	2024-06-26 09:05:13 +00:00
Botond Dénes	4e66b3c9ce	db/batchlog_manager: implement cleanup after all batchlog replay We have a commented code snippet from Origin with cleanup and a FIXME to implement it. Origin flushes the memtables and kicks a compaction. We only implement the flush here -- the flush will trigger a compaction check and we leave it up to the compaction manager to decide when a compaction is worthwhile. This method used to be called only from unbootstrap, so a cleanup was not really needed. Now it is also called at the end of repair, if the table is using repair-based tombstone-gc. If the memtable is filled with tombstones, this can add a lot of time to the runtime of each repair. So flush the memtable at the end, so the tombstones can be purged (they aren't purged from memtables yet). (cherry picked from commit `2dd057c96d`)	2024-06-26 09:05:13 +00:00
Botond Dénes	5e422ceefb	cql3/query_processor: for_each_cql_result(): move func to the coro frame Said method has a func parameter (called just f), which it receives as rvalue ref and just uses as a reference. This means that if caller doesn't keep the func alive, for_each_cql_result() will run into use-after-free after the first suspension point. This is unexpected for callers, who don't expect to have to keep something alive, which they passed in with std::move(). Adjust the signature to take a value instead, value parameters are moved to the coro frame and survive suspension points. Adjust internal callers (query_internal()) the same way. There are no known vulnerable external callers. (cherry picked from commit `4e96e320b4`)	2024-06-26 09:05:13 +00:00
Michał Jadwiszczak	29c6a4cf44	test/boost/cql_query_test: add test for single-partition aggregation (cherry picked from commit `8eb5ca8202`)	2024-06-25 23:56:49 +02:00
Dawid Medrek	7201efc2f2	db/hints: Initialize endpoint managers only for valid hint directories Before these changes, it could happen that Scylla initialized endpoint managers for hint directories representing * host IDs before migrating hinted handoff to using host IDs, * IP addresses after the migration. One scenario looked like this: 1. Start Scylla and upgrade the cluster to using host IDs. 2. Create, by hand, a hint directory representing an IP address. 3. Trigger changing the host filter in hinted handoff; it could be achieved by, for example, restricting the set of data centers Scylla is allowed to save hints for. When changing the host filter, we browse the hint directories and create endpoint managers if we can send hints towards the node corresponding to a given hint directory. We only accepted hint directories representing IP addresses and host IDs. However, we didn't check whether the local node has already been upgraded to host-ID-based hinted handoff or not. As a result, endpoint managers were created for both IP addresses and host IDs, no matter whether we were before or after the migration. These changes make sure that any time we browse the hint directories, we take that into account. Fixes scylladb/scylladb#19172 (cherry picked from commit `c9bb0a4da6`) Closes scylladb/scylladb#19426	2024-06-23 19:32:57 +03:00
Kefu Chai	1b2f10a4e7	sstables: use "me" sstable format by default in `7952200c`, we changed the `selected_format` from `mc` to `me`, but to be backward compatible the cluster starts with "md", so when the nodes in cluster agree on the "ME_SSTABLE_FORMAT" feature, the format selector believes that the node is already using "ME", which is specified by `_selected_format`. even though it is actually still using "md", which is specified by `sstable_manager::_format`, as changed by `54d49c04`. as explained above, it was specified to "md" in hope to be backward compatible when upgrading from an existing installation which might be still using "md". but after a second thought, since we are able to read sstables persisted with older formats, this concern is not valid. in other words, `7952200c` introduced a regression which changed the "default" sstable format from `me` to `md`. to address this, we just change `sstable_manager::_format` to "me", so that all sstables are created using "me" format. a test is added accordingly. Fixes #18995 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `5a0d30f345`) Closes scylladb/scylladb#19422	2024-06-23 19:26:53 +03:00
Jenkins Promoter	1f2bbf52cc	Update ScyllaDB version to: 6.0.2	2024-06-23 15:15:46 +03:00
Aleksandra Martyniuk	169dfaf037	test: add test for abort while a task is being unregistered (cherry picked from commit `50cb797d95`)	2024-06-22 15:47:03 +02:00
Botond Dénes	cfac9d8bef	Merge '[Backport 6.0] Reduce TWCS off-strategy space overhead' from ScyllaDB Normally, the space overhead for TWCS is 1/N, where N is the number of windows. But during off-strategy, the overhead is 100% because input sstables cannot be released earlier. Reshaping a TWCS table that takes ~50% of available space can result in system running out of space. That's fixed by restricting every TWCS off-strategy job to 10% of free space in disk. Tables that aren't big will not be penalized with increased write amplification, as all input (disjoint) sstables can still be compacted in a single round. Fixes #16514. (cherry picked from commit `b8bd4c51c2`) (cherry picked from commit `51c7ee889e`) (cherry picked from commit `0ce8ee03f1`) (cherry picked from commit `ace4e5111e`) Refs #18137 Closes scylladb/scylladb#19404 * github.com:scylladb/scylladb: compaction: Reduce twcs off-strategy space overhead to 10% of free space compaction: wire storage free space into reshape procedure sstables: Allow to get free space from underlying storage replica: don't expose compaction_group to reshape task	2024-06-21 20:00:10 +03:00
Anna Stuchlik	aca9d657ca	doc: remove the link to Scylladb Google group The group is no longer active and should be removed from resources. (cherry picked from commit `027cf3f47d`) Closes scylladb/scylladb#19402	2024-06-21 19:59:35 +03:00
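
The reader_concurrency_semaphore commits in the log above replace a hard-wired limit of one CPU-bound read with a configurable one. The admission rule they describe can be illustrated with a toy model. This is a minimal sketch, not ScyllaDB's implementation: `toy_read_semaphore` and all of its members are hypothetical names, and the model covers only the rule "admit a read while fewer than `cpu_concurrency` reads need CPU", ignoring memory and count resources entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical toy model of CPU-concurrency admission.
// It is NOT the real reader_concurrency_semaphore.
class toy_read_semaphore {
    std::size_t _cpu_concurrency;  // max reads allowed to need CPU at once
    std::size_t _need_cpu = 0;     // reads currently admitted and needing CPU
    std::deque<int> _waiters;      // ids of reads queued for admission

public:
    explicit toy_read_semaphore(std::size_t cpu_concurrency)
        : _cpu_concurrency(cpu_concurrency) {}

    bool can_admit() const { return _need_cpu < _cpu_concurrency; }

    // Returns true if the read is admitted immediately, false if it is queued.
    bool admit(int id) {
        if (_waiters.empty() && can_admit()) {
            ++_need_cpu;
            return true;
        }
        _waiters.push_back(id);
        return false;
    }

    // Called when an admitted read finishes. Admits queued reads for as long
    // as the limit allows and returns how many were admitted.
    std::size_t finish() {
        assert(_need_cpu > 0);
        --_need_cpu;
        std::size_t admitted = 0;
        while (!_waiters.empty() && can_admit()) {
            _waiters.pop_front();
            ++_need_cpu;
            ++admitted;
        }
        return admitted;
    }

    std::size_t waiting() const { return _waiters.size(); }
    std::size_t active() const { return _need_cpu; }
};
```

With a limit of 1, a second read queues behind the first, which is the behavior that let one slow read delay many fast ones. With a limit of 2, both run and only a third read waits.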

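Several api commits in the log above apply the same two-step pattern: flush the response stream explicitly so that closing cannot throw, then close the stream on every path, including when the handler throws. A minimal sketch of that pattern follows. It does not use Seastar's output_stream; `toy_stream` and `write_body` are hypothetical stand-ins, and the assumption that `close()` cannot fail once the stream is flushed is modeled with `noexcept`.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a response stream; not Seastar's output_stream.
struct toy_stream {
    std::string buffered;       // data written but not yet flushed
    std::string written;        // data that reached the client
    bool closed = false;
    bool fail_on_write = false; // simulates a client that stopped reading

    void write(const std::string& s) {
        if (fail_on_write) {
            throw std::runtime_error("client stopped reading");
        }
        buffered += s;
    }
    void flush() {
        written += buffered;
        buffered.clear();
    }
    // Assumed not to throw because the caller flushes before closing.
    void close() noexcept { closed = true; }
};

// Writes the body, flushes explicitly, then closes the stream on every path.
// Returns the captured exception (or a null pointer) after the close.
std::exception_ptr write_body(toy_stream& out, const std::string& body) {
    std::exception_ptr ex;
    try {
        out.write(body);
        out.flush();  // flush first so close() has nothing left that can fail
    } catch (...) {
        ex = std::current_exception();
    }
    out.close();      // reached whether or not the write threw
    return ex;
}
```

A caller can rethrow the returned exception with `std::rethrow_exception` once the stream is known to be closed, which keeps the error visible without leaving the stream open.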

43011 Commits