scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-24 10:30:38 +00:00

Author	SHA1	Message	Date
Michał Chojnowski	136ccff353	cql_test_env: ensure shutdown() before stop() for system_keyspace If system_keyspace::stop() is called before system_keyspace::shutdown(), it will never finish, because the uncleared shared pointers will keep it alive indefinitely. Currently this can happen if an exception is thrown before the construction of the shutdown() defer. This patch moves the shutdown() call to immediately before stop(). I see no reason why it should be elsewhere. Fixes scylladb/scylla-enterprise#4380 (cherry picked from commit `eeaf4c3443`) Closes scylladb/scylladb#20146	2024-08-14 20:15:50 +03:00
Michael Litvak	29c352d9c8	db: fix waiting for counter update operations on table stop When a table is dropped it should wait for all pending operations in the table before the table is destroyed, because the operations may use the table's resources. With counter update operations, currently this is not the case. The table may be destroyed while there is a counter update operation in progress, causing an assert to be triggered due to a resource being destroyed while it's in use. The reason the operation is not waited for is a mistake in the lifetime management of the object representing the write in progress. The commit fixes it so the object lives for the duration of the entire counter update operation, by moving it to the `do_with` list. Fixes scylladb/scylla-enterprise#4475 (cherry picked from commit `ff86c864ff`) Closes scylladb/scylladb#19980	2024-08-07 10:52:39 +02:00
Kamil Braun	888d5fe1a3	Merge '[Backport 5.4] raft: fix the shutdown phase being stuck' from Emil Maskovsky Some of the calls inside the raft_group0_client::start_operation() method were missing the abort source parameter. This caused the repair test to be stuck in the shutdown phase - the abort source has been triggered, but the operations were not checking it. This was in particular the case of operations that try to take the ownership of the raft group semaphore (get_units(semaphore)) - these waits should be cancelled when the abort source is triggered. This should fix the following tests that were failing in some percentage of dtest runs (about 1-3 of 100): * TestRepairAdditional::test_repair_kill_1 * TestRepairAdditional::test_repair_kill_3 Fixes #19223 (cherry picked from commit `2dbe9ef2f2`) (cherry picked from commit `5dfc50d354`) Refs #19860 Closes scylladb/scylladb#19986 * github.com:scylladb/scylladb: raft: fix the shutdown phase being stuck raft: use the abort source reference in raft group0 client interface	2024-08-05 16:28:19 +02:00
Emil Maskovsky	6e8911ed51	raft: fix the shutdown phase being stuck Some of the calls inside the `raft_group0_client::start_operation()` method were missing the abort source parameter. This caused the repair test to be stuck in the shutdown phase - the abort source has been triggered, but the operations were not checking it. This was in particular the case of operations that try to take the ownership of the raft group semaphore (`get_units(semaphore)`) - these waits should be cancelled when the abort source is triggered. This should fix the following tests that were failing in some percentage of dtest runs (about 1-3 of 100): * TestRepairAdditional::test_repair_kill_1 * TestRepairAdditional::test_repair_kill_3 Fixes scylladb/scylladb#19223 (cherry picked from commit `5dfc50d354`)	2024-08-02 11:00:08 +02:00
Emil Maskovsky	9c4fa2652c	raft: use the abort source reference in raft group0 client interface Most callers of the raft group0 client interface are passing a real source instance, so we can use the abort source reference in the client interface. This change makes the code simpler and more consistent. (cherry picked from commit `2dbe9ef2f2`)	2024-08-02 10:59:44 +02:00
Avi Kivity	58377036b0	Merge '[Backport 5.4] sstables: fix some mixups between the writer's schema and the sstable's schema ' from Michał Chojnowski There are two schemas associated with a sstable writer: the sstable's schema (i.e. the schema of the table at the time when the sstable object was created), and the writer's schema (equal to the schema of the reader which is feeding into the writer). It's easy to mix up the two and break something as a result. The writer's schema is needed to correctly interpret and serialize the data passing through the writer, and to populate the on-disk metadata about the on-disk schema. The sstables's schema is used to configure some parameters for newly created sstable, such as bloom filter false positive ratio, or compression. This series fixes the known mixups between the two — when setting up compression, and when setting up the bloom filters. Fixes scylladb/scylladb#16065 The bug is present in all supported versions, so the patch has to be backported to all of them. (cherry picked from commit `a1834efd82`) (cherry picked from commit `d10b38ba5b`) (cherry picked from commit `1a8ee69a43`) Refs scylladb/scylladb#19695 Closes scylladb/scylladb#19878 * github.com:scylladb/scylladb: sstables/mx/writer: when creating local_compression, use the sstables's schema, not the writer's sstables/mx/writer: when creating filter, use the sstables's schema, not the writer's sstables: for i_filter downcasts, use dynamic_cast instead of static_cast	2024-07-28 18:15:27 +03:00
Michał Chojnowski	5b29da123f	sstables/mx/writer: when creating local_compression, use the sstables's schema, not the writer's There are two schemas associated with a sstable writer: the sstable's schema (i.e. the schema of the table at the time when the sstable object was created), and the writer's schema (equal to the schema of the reader which is feeding into the writer). It's easy to mix up the two and break something as a result. The writer's schema is needed to correctly interpret and serialize the data passing through the writer, and to populate the on-disk metadata about the on-disk schema. The sstable's schema is used to configure some parameters for the newly created sstable, such as bloom filter false positive ratio, or compression. The problem fixed by this patch is that the writer was wrongly creating the compressor objects based on its own schema, but using them based on the sstable's schema. This patch forces the writer to use the sstable's schema for both. (cherry picked from commit `1a8ee69a43`)	2024-07-25 12:28:00 +02:00
Michał Chojnowski	92ee525f22	sstables/mx/writer: when creating filter, use the sstables's schema, not the writer's There are two schemas associated with a sstable writer: the sstable's schema (i.e. the schema of the table at the time when the sstable object was created), and the writer's schema (equal to the schema of the reader which is feeding into the writer). It's easy to mix up the two and break something as a result. The writer's schema is needed to correctly interpret and serialize the data passing through the writer, and to populate the on-disk metadata about the on-disk schema. The sstable's schema is used to configure some parameters for the newly created sstable, such as bloom filter false positive ratio, or compression. The problem fixed by this patch is that the writer was wrongly creating the filter based on its own schema, while the layer outside the writer was interpreting it as if it was created with the sstable's schema. This patch forces the writer to pick the filter's parameters based on the sstable's schema instead. (cherry picked from commit `d10b38ba5b`)	2024-07-25 12:28:00 +02:00
Michał Chojnowski	bc1c6275a4	sstables: for i_filter downcasts, use dynamic_cast instead of static_cast As of this patch, those static_casts are actually invalid in some cases (they cast to the wrong type) because of an oversight. A later patch will fix that. But to even write a reliable reproducer for the problem, we must force the invalid casts to manifest as a crash (instead of weird results). This patch both allows writing a reproducer for the bug and serves as a bit of defensive programming for the future. (cherry picked from commit `a1834efd82`) # Conflicts: # sstables/sstables.cc	2024-07-25 12:28:00 +02:00
Nadav Har'El	79629a80cd	alternator: fix "/localnodes" to not return nodes still joining Alternator's "/localnodes" HTTP request is supposed to return the list of nodes in the local DC to which the user can send requests. The existing implementation incorrectly used gossiper::is_alive() to check for which nodes to return - but "alive" nodes include nodes which are still joining the cluster and not really usable. These nodes can remain in the JOINING state for a long time while they are copying data, and an attempt to send requests to them will fail. The fix for this bug is trivial: change the call to is_alive() to a call to is_normal(). But the hard part of this patch is the testing: 1. An existing multi-node test for "/localnodes" assumed that right after a new node was created, it appears on "/localnodes". But after this patch, it may take a bit more time for the bootstrapping to complete and the new node to appear in /localnodes - so I had to add a retry loop. 2. I added a test that reproduces the bug fixed here, and verifies its fix. The test is in the multi-node topology framework. It adds an injection which delays the bootstrap, which leaves a new node in JOINING state for a long time. The test then verifies that the new node is alive (as checked by the REST API), but is not returned by "/localnodes". 3. The new injection for delaying the bootstrap is unfortunately not very pretty - I had to do it in three places because we have several code paths of how bootstrap works without repair, with repair, without Raft and with Raft - and I wanted to delay all of them. Fixes #19694. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19725 (cherry picked from commit `bac7c33313`) (deleted test for cherry-pick) (cherry picked from commit `af39675c38`)	2024-07-24 11:58:16 +03:00
Lakshmi Narayanan Sreethar	9f0b75bcd2	[Backport 5.4]: sstables: do not reload components of unlinked sstables The SSTable is removed from the reclaimed memory tracking logic only when its object is deleted. However, there is a risk that the Bloom filter reloader may attempt to reload the SSTable after it has been unlinked but before the SSTable object is destroyed. Prevent this by removing the SSTable from the reclaimed list maintained by the manager as soon as it is unlinked. The original logic that updated the memory tracking in `sstables_manager::deactivate()` is left in place as (a) the variables have to be updated only when the SSTable object is actually deleted, as the memory used by the filter is not freed as long as the SSTable is alive, and (b) the `_reclaimed.erase(sst)` is still useful during shutdown, for example, when the SSTable is not unlinked but just destroyed. Fixes https://github.com/scylladb/scylladb/issues/19722 Closes scylladb/scylladb#19717 github.com:scylladb/scylladb: boost/bloom_filter_test: add testcase to verify unlinked sstables are not reloaded sstables: do not reload components of unlinked sstables sstables/sstables_manager: introduce on_unlink method (cherry picked from commit `591876b44e`) Backported from #19717 to 5.4 Closes scylladb/scylladb#19831	2024-07-23 21:31:23 +03:00
Kamil Braun	0fbec200e9	Merge '[Backport 5.4] Fix lwt semaphore guard accounting' from ScyllaDB Currently the guard does not account correctly for ongoing operation if semaphore acquisition fails. It may signal a semaphore when it is not held. Should be backported to all supported versions. (cherry picked from commit `87beebeed0`) (cherry picked from commit `4178589826`) Refs #19699 Closes scylladb/scylladb#19795 * github.com:scylladb/scylladb: test: add test to check that coordinator lwt semaphore continues functioning after locking failures paxos: do not signal semaphore if it was not acquired	2024-07-22 10:34:47 +02:00
Gleb Natapov	972b799773	test: add test to check that coordinator lwt semaphore continues functioning after locking failures (cherry picked from commit `4178589826`)	2024-07-19 19:18:35 +02:00
Gleb Natapov	68c581314a	paxos: do not signal semaphore if it was not acquired The guard signals a semaphore during destruction if it is marked as locked, but currently it may be marked as locked even if locking failed. Fix this by using semaphore_units instead of managing the locked flag manually. Fixes: https://github.com/scylladb/scylladb/issues/19698 (cherry picked from commit `87beebeed0`)	2024-07-18 15:34:16 +00:00
Michael Litvak	380ce9a6d8	storage_proxy: remove response handler if no targets When writing a mutation, it might happen that there are no live targets to send the mutation to, yet the request can be satisfied. For example, when writing with CL=ANY to a dead node, the request is completed by storing a local hint. Currently, in that case, a write response handler is created for the request and it remains active until it times out because it is not removed anywhere, even though the write is completed successfully after storing the hint. The response handler should usually be removed when receiving responses from all targets, but in this case there are no targets to trigger the removal. In this commit we check if we don't have live targets to send the mutation to. If so, we remove the response handler immediately. Fixes scylladb/scylladb#19529 (cherry picked from commit `a9fdd0a93a`) Closes scylladb/scylladb#19679	2024-07-15 16:43:28 +02:00
Wojciech Mitros	2c01dfe12b	test: account for multiple flushes of commitlog segments Currently, when we calculate the number of deactivated segments in test_commitlog_delete_when_over_disk_limit, we only count the segments that were active during the first flush. However, during the test, there may have been more than one flush, and a segment could have been created between them. This segment would sometimes get deactivated and even destroyed, and as a result, the count of destroyed segments would appear larger than the count of deactivated ones. This patch fixes this behavior by accounting for all segments that were active during any flush instead of just segments active during the first flush. Fixes #10527 (cherry-picked from commit `39a8f4310d`) Closes scylladb/scylladb#19706	2024-07-14 23:21:02 +02:00
Jenkins Promoter	ab22cb7253	Update ScyllaDB version to: 5.4.10	2024-07-11 15:43:24 +03:00
Tomasz Grabiec	0e02128d28	Merge '[Backport 5.4] mutation_partition_v2: in apply_monotonically(), avoid bad_alloc on sentinel insertion' from ScyllaDB apply_monotonically() is run with reclaim disabled. So with some bad luck, sentinel insertion might fail with bad_alloc even on a perfectly healthy node. We can't deal with the failure of sentinel insertion, so this will result in a crash. This patch prevents the spurious OOM by reserving some memory (1 LSA segment) and only making it available right before the critical allocations. Fixes https://github.com/scylladb/scylladb/issues/19552 (cherry picked from commit `f784be6a7e`) (cherry picked from commit `7b3f55a65f`) (cherry picked from commit `78d6471ce4`) Refs #19617 Closes scylladb/scylladb#19676 * github.com:scylladb/scylladb: mutation_partition_v2: in apply_monotonically(), avoid bad_alloc on sentinel insertion logalloc: add hold_reserve logalloc: generalize refill_emergency_reserve()	2024-07-10 14:29:39 +02:00
Botond Dénes	1e548770cf	Merge '[Backport 5.4] reader_concurrency_semaphore: make CPU concurrency configurable' from Botond Dénes The reader concurrency semaphore restricts the concurrency of reads that require CPU (intention: they read from the cache) to 1, meaning that if there is even a single active read which declares that it needs just CPU to proceed, no new read is admitted. This is meant to keep the concurrency of reads in the cache at 1. The idea is that concurrency in the cache is not useful: it just leads to the reactor rotating between these reads, all of them finishing later than they could if they were the only active read in the cache. This was observed to backfire in the case where reads from a single table are mostly very fast, but on some keys are very slow (hint: collection full of tombstones). In this case the slow read holds up the fast reads in the queue, increasing the 99th percentile latencies significantly. This series proposes to fix this, by making the CPU concurrency configurable. We don't like tunables like this and this is not a proper fix, but a workaround. The proper fix would be to allow cutting any page early, but we cannot cut a page in the middle of a row. We could maybe have a way of detecting slow reads and excluding them from the CPU concurrency. This would be a heuristic and it would be hard to get right. So in this series a robust and simple configurable is offered, which can be used on those few clusters which do suffer from the too strict concurrency limit. We have seen it in very few cases so far, so this doesn't seem to be wide-spread. Fixes: https://github.com/scylladb/scylladb/issues/19017 This PR backports https://github.com/scylladb/scylladb/pull/19018 and its follow-up https://github.com/scylladb/scylladb/pull/19600. 
Closes scylladb/scylladb#19646 * github.com:scylladb/scylladb: reader_concurrency_semaphore: execution_loop(): move maybe_admit_waiters() to the inner loop test/boost/reader_concurrency_semaphore_test: add test for live-configurable cpu concurrency test/boost/reader_concurrency_semaphore_test: hoist require_can_admit reader_concurrency_semaphore: wire in the configurable cpu concurrency reader_concurrency_semaphore: add cpu_concurrency constructor parameter db/config: introduce reader_concurrency_semahore_cpu_concurrency	2024-07-10 13:32:07 +03:00
Wojciech Przytuła	3e879c1bfa	storage_proxy: fix uninitialized LWT contention counter When debugging the issue of high LWT contention metric, we (the drivers team) discovered that at least 3 drivers (Go, Java, Rust) cause high numbers in that metric in LWT workloads - we doubted that all those drivers route LWT queries badly. We tried to understand that metric and its semantics. It took 3 people over 10 hours to figure out what it is supposed to count. People from core team suspected that it was the drivers sending requests to different shards, causing contention. Then we ran the workload against a single node single shard cluster... and observed contention. Finally, we looked into the Scylla code and saw it. Uninitialized stack value. The core member was shocked. But we, the drivers people, felt we always knew it. It's yet another time that we are blamed for a server-side issue. We rebuilt scylla with the variable initialized to 0 and the metric kept being 0. To prevent such errors in the future, let's consider some lints that warn against uninitialized variables. This is such an obvious feature of e.g. Rust, and yet this has been shown to cause a painful bug in 2024. Fixes: #19654 (cherry picked from commit `36a125bf97`) Closes scylladb/scylladb#19656	2024-07-10 13:29:31 +03:00
Michał Chojnowski	5e9a2193db	mutation_partition_v2: in apply_monotonically(), avoid bad_alloc on sentinel insertion apply_monotonically() is run with reclaim disabled. So with some bad luck, sentinel insertion might fail with bad_alloc even on a perfectly healthy node. We can't deal with the failure of sentinel insertion, so this will result in a crash. This patch prevents the spurious OOM by reserving some memory (1 LSA segment) and only making it available right before the critical allocations. Fixes scylladb/scylladb#19552 (cherry picked from commit `78d6471ce4`)	2024-07-10 08:36:12 +00:00
Michał Chojnowski	c2e5d9e726	logalloc: add hold_reserve mutation_partition_v2::apply_monotonically() needs to perform some allocations in a destructor, to ensure that the invariants of the data structure are restored before returning. But it is usually called with reclaiming disabled, so the allocations might fail even in a perfectly healthy node with plenty of reclaimable memory. This patch adds a mechanism which allows to reserve some LSA memory (by asking the allocator to keep it unused) and make it available for allocation right when we need to guarantee allocation success. (cherry picked from commit `7b3f55a65f`)	2024-07-10 08:36:12 +00:00
Michał Chojnowski	80ff0688b1	logalloc: generalize refill_emergency_reserve() In the next patch, we will want to do the same thing as refill_emergency_reserve() does, just with a quantity different than _emergency_reserve_max. So we split off the shareable part to a new function, and use it to implement refill_emergency_reserve(). (cherry picked from commit `f784be6a7e`)	2024-07-10 08:36:10 +00:00
Raphael S. Carvalho	a319085870	compaction: Check for key presence in memtable when calculating max purgeable timestamp It was observed that some use cases might append old data constantly to memtable, blocking GC of expired tombstones. That's because timestamp of memtable is unconditionally used for calculating max purgeable, even when the memtable doesn't contain the key of the tombstone we're trying to GC. The idea is to treat memtable as we treat L0 sstables, i.e. it will only prevent GC if it contains data that is possibly shadowed by the expired tombstone (after checking for key presence and timestamp). Memtable will usually have a small subset of keys in largest tier, so after this change, a large fraction of keys containing expired tombstones can be GCed when memtable contains old data. Fixes #17599. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `38699f6c3d`) Closes scylladb/scylladb#19551	2024-07-10 07:30:40 +03:00
Botond Dénes	b24bd4d176	Merge '[Backport 5.4] Reduce TWCS off-strategy space overhead' from Raphael "Raph" Carvalho Normally, the space overhead for TWCS is 1/N, where N is the number of windows. But during off-strategy, the overhead is 100% because input sstables cannot be released earlier. Reshaping a TWCS table that takes ~50% of available space can result in the system running out of space. That's fixed by restricting every TWCS off-strategy job to 10% of free space in disk. Tables that aren't big will not be penalized with increased write amplification, as all input (disjoint) sstables can still be compacted in a single round. Fixes https://github.com/scylladb/scylladb/issues/16514. (cherry picked from commit `b8bd4c51c2`) (cherry picked from commit `51c7ee889e`) (cherry picked from commit `0ce8ee03f1`) (cherry picked from commit `ace4e5111e`) Refs https://github.com/scylladb/scylladb/pull/18137 note for maintainer: first cleanup (conflicting) patch was removed and doesn't change anything. Closes scylladb/scylladb#19549 * github.com:scylladb/scylladb: compaction: Reduce twcs off-strategy space overhead to 10% of free space compaction: wire storage free space into reshape procedure sstables: Allow to get free space from underlying storage	2024-07-09 14:36:27 +03:00
Botond Dénes	f628e7439c	reader_concurrency_semaphore: execution_loop(): move maybe_admit_waiters() to the inner loop Now that the CPU concurrency limit is configurable, new reads might be ready to execute right after the current one was executed. So move the poll for admitting new reads into the inner loop, to prevent the situation where the inner loop yields and a concurrent do_wait_admission() finds that there are waiters (queued because at the time they arrived to the semaphore, the _ready_list was not empty) but it is possible to admit a new read. When this happens the semaphore will dump diagnostics to help debug the apparent contradiction, which can generate a lot of log spam. Moving the poll into the inner loop prevents the false-positive contradiction detection from firing. Refs: scylladb/scylladb#19017 Closes scylladb/scylladb#19600 (cherry picked from commit `155acbb306`)	2024-07-09 13:11:25 +03:00
Botond Dénes	42da43b5b4	test/boost/reader_concurrency_semaphore_test: add test for live-configurable cpu concurrency (cherry picked from commit `b4f3809ad2`)	2024-07-09 13:11:25 +03:00
Botond Dénes	679fa0f72a	test/boost/reader_concurrency_semaphore_test: hoist require_can_admit This is currently a lambda in a test, hoist it into the global scope and make it into a function, so other tests can use it too (in the next patch). (cherry picked from commit `9cbdd8ef92`)	2024-07-09 11:40:13 +03:00
Botond Dénes	89733a1f18	reader_concurrency_semaphore: wire in the configurable cpu concurrency Before this patch, the semaphore was hard-wired to stop admission, if there is even a single permit, which is in the need_cpu state. Therefore, keeping the CPU concurrency at 1. This patch makes use of the new cpu_concurrency parameter, which was wired in in the last patches, allowing for a configurable amount of concurrent need_cpu permits. This is to address workloads where some small subset of reads are expected to be slow, and can hold up faster reads behind them in the semaphore queue. (cherry picked from commit `07c0a8a6f8`)	2024-07-09 11:40:13 +03:00
Botond Dénes	f185a227a2	reader_concurrency_semaphore: add cpu_concurrency constructor parameter In the case of the user semaphore, this receives the new reader_concurrency_semaphore_cpu_limit config item. Not used yet. (cherry picked from commit `59faa6d4ff`)	2024-07-09 11:40:12 +03:00
Botond Dénes	89fd08b955	db/config: introduce reader_concurrency_semahore_cpu_concurrency To allow increasing the semaphore's CPU concurrency, which is currently hard-limited to 1. Not wired yet. (cherry picked from commit `c7317be09a`)	2024-07-08 08:41:02 +03:00
Nadav Har'El	f29d51d9c3	cql: don't crash when creating a view during a truncate The test dtest materialized_views_test.py::TestMaterializedViews:: test_mv_populating_from_existing_data_during_truncate reproduces an assertion failure, and crash, while doing a CREATE MATERIALIZED VIEW during a TRUNCATE operation. This patch fixes the crash by removing the assert() call for a view (replacing it by a warning message) - we'll explain below why this is fine. Also, for base tables we change the assertion to an on_internal_error (Refs #7871). This makes the test stop crashing Scylla, but it still fails due to issue #17635. Let's explain the crash, and the fix: The test starts TRUNCATE on a table that doesn't yet have a view. truncate_table_on_all_shards() begins by disabling compaction on the table and all its views (of which there are none, at this point). At this point, the test creates a new view on this table. The new view has, by default, compaction enabled. Later, TRUNCATE calls discard_sstables() on this new view, asserts that it has compaction disabled - and this assertion fails. The fix in this patch is to not do the assert() for views. In other words, we acknowledge that in this use case, the view will have compactions enabled while being truncated. I claim that this is "good enough", if we remember why we disable compaction in the first place: It's important to disable compaction while truncating because truncating during compaction can lead us to data resurrection when the old sstable is deleted during truncation but the result of the compaction is written back. True, this can now happen in a new view (a view created DURING the truncation). But I claim that worse things can happen for this new view: Notably, we may truncate a view and then the ongoing view building (which happens in a new view) might copy data from the base to the view and only then truncate the base - ending up with an empty base and non-empty view. 
This problem - issue #17635 - is more likely, and more serious, than the compaction problem, so will need to be solved in a separate patch. Fixes #17543. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#17634 (cherry picked from commit `8df2ea3f95`) Closes scylladb/scylladb#19580	2024-07-05 11:19:33 +03:00
Avi Kivity	01d5169593	Merge '[Backport 5.4] Close output_stream in get_compaction_history() API handler' from ScyllaDB If an httpd body writer is called with output_stream<>, it must close the stream on its own regardless of any exceptions it may generate while working, otherwise stream destructor may step on non-closed assertion. Stepped on with different handler, see #19541 Coroutinize the handler as the first step while at it (though the fix would have been notably shorter if done with .finally() lambda) (cherry picked from commit `acb351f4ee`) (cherry picked from commit `6d4ba98796`) (cherry picked from commit `b4f9387a9d`) Refs #19543 Closes scylladb/scylladb#19602 * github.com:scylladb/scylladb: api: Close response stream of get_compaction_history() api: Flush output stream in get_compaction_history() call api: Coroutinize get_compaction_history inner function	2024-07-04 15:06:53 +03:00
Pavel Emelyanov	3116ea7d8e	api: Close response stream of get_compaction_history() The function must close the stream even if it throws along the way. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `b4f9387a9d`)	2024-07-03 18:30:16 +00:00
Pavel Emelyanov	36ccf67bee	api: Flush output stream in get_compaction_history() call It's currently implicitly flushed on its close, but in that case close can throw while flushing. Next patch wants close not to throw and that's possible if flushing the stream in advance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `6d4ba98796`)	2024-07-03 18:30:15 +00:00
Pavel Emelyanov	bae47ca197	api: Coroutinize get_compaction_history inner function The handler returns a function which is then invoked with output_stream argument to render the json into. This function is converted into coroutine. It has yet another inner lambda that's passed into compaction_manager::get_compaction_history() as consumer lambda. It's coroutinized too. The indentation looks weird as preparation for future patching. Hopefully it's still possible to understand what's going on. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `acb351f4ee`)	2024-07-03 18:30:15 +00:00
Tzach Livyatan	fdcbbb85ad	Docs: Fix a typo in sstable-corruption.rst (cherry picked from commit `a7115124ce`) Closes scylladb/scylladb#19590 scylla-5.4.9-candidate scylla-5.4.9	2024-07-03 10:24:15 +02:00
Piotr Dulikowski	65daae0fbe	Merge '[Backport 5.4] cql3/statement/select_statement: do not parallelize single-partition aggregations #19414 ' from Michał Jadwiszczak This patch adds a check whether an aggregation query is doing a single-partition read and, if so, makes the query not use forward_service and not parallelize the request. Fixes scylladb/scylladb#19349 (cherry picked from commit `e9ace7c203`) (cherry picked from commit `8eb5ca8202`) Refs scylladb/scylladb#19350 Closes scylladb/scylladb#19500 * github.com:scylladb/scylladb: test/boost/cql_query_test: add test for single-partition aggregation cql3/select_statement: do not parallelize single-partition aggregations	2024-07-03 05:57:49 +02:00
Amnon Heiman	f07fbcf929	replica/table.cc: Add metrics per-table-per-node This patch adds metrics that will be reported per-table per-node. The added metrics (that are part of the per-table per-shard metrics) are: scylla_column_family_cache_hit_rate scylla_column_family_read_latency scylla_column_family_write_latency scylla_column_family_live_disk_space Fixes #18642 Signed-off-by: Amnon Heiman <amnon@scylladb.com> (cherry picked from commit `bc3cc6777b`) Closes scylladb/scylladb#19553	2024-07-02 10:49:00 +02:00
Pavel Emelyanov	1d0a6672d6	Merge '[Backport 5.4] Close output stream in task manager's API get_tasks handler' from Pavel Emelyanov If client stops reading response early, the server-side stream throws but must be closed anyway. Seen in another endpoint and fixed by https://github.com/scylladb/scylladb/pull/19541 Original commit is `0ce00ebfbd` Had simple conflict with `ffb5ad494f` closes: #19560 Closes scylladb/scylladb#19571 * https://github.com/scylladb/scylladb: api: Fix indentation after previous patch api: Close response stream on error api: Flush response output stream before closing	2024-07-01 18:02:35 +03:00
Pavel Emelyanov	e16327034c	[Backport 5.4] Close output_stream in get_snapshot_details() API handler All streams used by httpd handlers are to be closed by the handler itself, caller doesn't take care of that. On master this place is coroutinized, thus the original fix doesn't fit as is. Original commit is `0ce00ebfbd` refs: #19494 closes: #19561 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19570	2024-07-01 18:02:26 +03:00
Pavel Emelyanov	3510ff3179	api: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `1be8b2fd25`)	2024-07-01 10:58:57 +03:00
Pavel Emelyanov	e9b8d08b74	api: Close response stream on error The handler's lambda is called with && stream object and must close the stream on its own regardless of what. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `986a04cb11`)	2024-07-01 10:58:47 +03:00
Pavel Emelyanov	d67af8da1c	api: Flush response output stream before closing The .close() method flushes the stream, but it may throw doing it. Next patch will want .close() not to throw, for that stream must be flushed explicitly before closing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `4897d8f145`)	2024-07-01 10:58:14 +03:00
Jenkins Promoter	66fc7c0494	Update ScyllaDB version to: 5.4.9	2024-06-30 16:23:57 +03:00
Raphael S. Carvalho	67be26ff7d	compaction: Reduce twcs off-strategy space overhead to 10% of free space TWCS off-strategy suffers with 100% space overhead, so a big TWCS table can cause scylla to run out of disk space during node ops. To not penalize TWCS tables, that take a small percentage of disk, with increased write ampl, TWCS off-strategy will be restricted to 10% of free disk space. Then small tables can still compact all disjoint sstables in a single round. Fixes #16514. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `ace4e5111e`)	2024-06-29 11:29:59 -03:00
Raphael S. Carvalho	97893a4f6d	compaction: wire storage free space into reshape procedure After this, TWCS reshape procedure can be changed to limit job to 10% of available space. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `0ce8ee03f1`)	2024-06-29 11:29:59 -03:00
Raphael S. Carvalho	ab9683d182	sstables: Allow to get free space from underlying storage That will be used in turn to restrict reshape to 10% of available space in underlying storage. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `51c7ee889e`)	2024-06-29 11:29:57 -03:00
Yaron Kaikov	892ffa966d	.github/scripts/label_promoted_commits.py: fix adding labels when PR is closed `prs = response.json().get("items", [])` will return empty when there are no merged PRs, and this will just skip the all-label replacement process. This is a regression following the work done in #19442 Adding another part to handle closed PRs (which is the majority of the cases we have in Scylla core) Fixes: https://github.com/scylladb/scylladb/issues/19441 (cherry picked from commit `efa94b06c2`) Closes scylladb/scylladb#19526	2024-06-27 20:58:56 +03:00
Botond Dénes	4b6e462266	Merge 'alternator: fix REST API access to an Alternator LSI' from Nadav Har'El The name of the Scylla table backing an Alternator LSI looks like `basename:!lsiname`. Some REST API clients (including Scylla Manager) when they send a "!" character in the REST API request path may decide to "URL encode" it - convert it to `%21`. Because of a Seastar bug (https://github.com/scylladb/seastar/issues/725) Scylla's REST API server forgets to do the URL decoding on the path part of the request, which leads to the REST API request failing to address the LSI table. The first patch in this PR fixes the bug by using a new Seastar API introduced in https://github.com/scylladb/seastar/pull/2125 that does the URL decoding as appropriate. The second patch in the PR is a new test for this bug, which fails without the fix, and passes afterwards. Fixes #5883. Closes scylladb/scylladb#18286 * github.com:scylladb/scylladb: test/alternator: test addressing LSI using REST API REST API: stop using deprecated, buggy, path parameter (cherry picked from commit `0438febdc9`) Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-06-27 20:58:56 +03:00

39690 Commits