Commit Graph

43041 Commits

Author SHA1 Message Date
Tomasz Grabiec
416cbafd16 Merge '[Backport 6.0] sstables: fix some mixups between the writer's schema and the sstable's schema' from Michał Chojnowski
There are two schemas associated with a sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).

It's easy to mix up the two and break something as a result.

The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.

The sstables's schema is used to configure some parameters for newly created
sstable, such as bloom filter false positive ratio, or compression.

This series fixes the known mixups between the two — when setting up compression,
and when setting up the bloom filters.

Fixes scylladb/scylladb#16065

The bug is present in all supported versions, so the patch has to be backported to all of them.

(cherry picked from commit a1834efd82)

(cherry picked from commit d10b38ba5b)

(cherry picked from commit 1a8ee69a43)

Refs scylladb/scylladb#19695

Closes scylladb/scylladb#19877

* github.com:scylladb/scylladb:
  sstables/mx/writer: when creating local_compression, use the sstables's schema, not the writer's
  sstables/mx/writer: when creating filter, use the sstables's schema, not the writer's
  sstables: for i_filter downcasts, use dynamic_cast instead of static_cast
2024-07-29 15:36:52 +02:00
Jenkins Promoter
36cb61589d Update ScyllaDB version to: 6.0.3 2024-07-29 15:21:14 +03:00
Takuya ASADA
fefa76bffc scylla_raid_setup: install update-initramfs when it's not available
scylla_raid_setup may fail on Ubuntu minimal image since it calls
update-initramfs without installing.

(cherry picked from commit b6dedf1ee1)

Closes scylladb/scylladb#19871
2024-07-25 13:58:11 +03:00
Michał Chojnowski
43ba44ce97 sstables/mx/writer: when creating local_compression, use the sstables's schema, not the writer's
There are two schema's associated with a sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).

It's easy to mix up the two and break something as a result.

The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.

The sstables's schema is used to configure some parameters for newly created
sstable, such as bloom filter false positive ratio, or compression.

The problem fixed by this patch is that the writer was wrongly creating
the compressor objects based on its own schema, but using them based
based on the sstable's schema the sstable's schema.
This patch forces the writer to use the sstable's schema for both.

(cherry picked from commit 1a8ee69a43)
2024-07-25 12:23:58 +02:00
Michał Chojnowski
d6d3a91283 sstables/mx/writer: when creating filter, use the sstables's schema, not the writer's
There are two schema's associated with a sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).

It's easy to mix up the two and break something as a result.

The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.

The sstables's schema is used to configure some parameters for newly created
sstable, such as bloom filter false positive ratio, or compression.

The problem fixed by this patch is that the writer was wrongly creating
the filter based on its own schema, while the layer outside the writer
was interpreting it as if it was created with the sstable's schema.

This patch forces the writer to pick the filter's parameters based on the
sstable's schema instead.

(cherry picked from commit d10b38ba5b)
2024-07-25 12:23:58 +02:00
Michał Chojnowski
0555d4c30b sstables: for i_filter downcasts, use dynamic_cast instead of static_cast
As of this patch, those static_casts are actually invalid in some cases
(they cast to the wrong type) because of an oversight.
A later patch will fix that. But to even write a reliable reproducer
for the problem, we must force the invalid casts to manifest as a crash
(instead of weird results).

This patch both allows writing a reproducer for the bug and serves
as a bit of defensive programming for the future.

(cherry picked from commit a1834efd82)

# Conflicts:
#	sstables/sstables.cc
2024-07-25 12:23:58 +02:00
Nadav Har'El
af39675c38 alternator: fix "/localnodes" to not return nodes still joining
Alternator's "/localnodes" HTTP request is supposed to return the list of
nodes in the local DC to which the user can send requests.

The existing implementation incorrectly used gossiper::is_alive() to check
for which nodes to return - but "alive" nodes include nodes which are still
joining the cluster and not really usable. These nodes can remain in the
JOINING state for a long time while they are copying data, and an attempt
to send requests to them will fail.

The fix for this bug is trivial: change the call to is_alive() to a call
to is_normal().

But the hard part of this test is the testing:

1. An existing multi-node test for "/localnodes" assummed that right after
   a new node was created, it appears on "/localnodes". But after this
   patch, it may take a bit more time for the bootstrapping to complete
   and the new node to appear in /localnodes - so I had to add a retry loop.

2. I added a test that reproduces the bug fixed here, and verifies its
   fix. The test is in the multi-node topology framework. It adds an
   injection which delays the bootstrap, which leaves a new node in JOINING
   state for a long time. The test then verifies that the new node is
   alive (as checked by the REST API), but is not returned by "/localnodes".

3. The new injection for delaying the bootstrap is unfortunately not
   very pretty - I had to do it in three places because we have several
   code paths of how bootstrap works without repair, with repair, without
   Raft and with Raft - and I wanted to delay all of them.

Fixes #19694.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19725

(cherry picked from commit bac7c33313)
(deleted test for cherry-pick)
2024-07-24 11:29:36 +03:00
Lakshmi Narayanan Sreethar
3c1fd843c8 [Backport 6.0]: sstables: do not reload components of unlinked sstables
The SSTable is removed from the reclaimed memory tracking logic only
when its object is deleted. However, there is a risk that the Bloom
filter reloader may attempt to reload the SSTable after it has been
unlinked but before the SSTable object is destroyed. Prevent this by
removing the SSTable from the reclaimed list maintained by the manager
as soon as it is unlinked.

The original logic that updated the memory tracking in
`sstables_manager::deactivate()` is left in place as (a) the variables
have to be updated only when the SSTable object is actually deleted, as
the memory used by the filter is not freed as long as the SSTable is
alive, and (b) the `_reclaimed.erase(*sst)` is still useful during
shutdown, for example, when the SSTable is not unlinked but just
destroyed.

Fixes https://github.com/scylladb/scylladb/issues/19722

Closes scylladb/scylladb#19717

* github.com:scylladb/scylladb:
  boost/bloom_filter_test: add testcase to verify unlinked sstables are not reloaded
  sstables: do not reload components of unlinked sstables
  sstables/sstables_manager: introduce on_unlink method

(cherry picked from commit 591876b44e)

Backported from #19717 to 6.0

Closes scylladb/scylladb#19830
2024-07-23 23:16:53 +03:00
Piotr Dulikowski
9cc20e7c4d Merge '[Backport 6.0] schema: fix describe of indexes on collections' from ScyllaDB
If the index was created on collection (both frozen or not), its description wasn't a correct create statement.
This patch fixes the bug and includes functions like `full()`, `keys()`, `values()`, ... used to create index on collections.

Fixes scylladb/scylladb#19278

(cherry picked from commit 253feb6811)

(cherry picked from commit b65a4c66f0)

 Refs #19381

Closes scylladb/scylladb#19700

* github.com:scylladb/scylladb:
  cql-pytest/test_describe: add a test for describe indexes
  schema/schema: fix column names in index description
2024-07-22 12:33:47 +02:00
Kamil Braun
9efaca0bd2 Merge '[Backport 6.0] test: raft: fix the flaky test_raft_recovery_stuck' from ScyllaDB
Use the rolling restart to avoid spurious driver reconnects.

This can be eventually reverted once the scylladb/python-driver#295 is fixed.

Fixes scylladb/scylladb#19154

(cherry picked from commit ef3393bd36)

(cherry picked from commit a89facbc74)

 Refs #19771

Closes scylladb/scylladb#19809

* github.com:scylladb/scylladb:
  test: raft: fix the flaky `test_raft_recovery_stuck`
  test: raft: code cleanup in `test_raft_recovery_stuck`
2024-07-22 11:14:44 +02:00
Emil Maskovsky
62c9709f4a test: raft: fix the flaky test_raft_recovery_stuck
Use the rolling restart to avoid spurious driver reconnects.

This can be eventually reverted once the scylladb/python-driver#295 is
fixed.

Fixes scylladb/scylladb#19154

(cherry picked from commit a89facbc74)
2024-07-20 02:17:50 +00:00
Emil Maskovsky
64d414f10a test: raft: code cleanup in test_raft_recovery_stuck
Cleaning up the imports.

(cherry picked from commit ef3393bd36)
2024-07-20 02:17:50 +00:00
Kamil Braun
f32ed716ed Merge '[Backport 6.0] Fix lwt semaphore guard accounting' from ScyllaDB
Currently the guard does not account correctly for ongoing operation if semaphore acquisition fails. It may signal a semaphore when it is not held.

Should be backported to all supported versions.

(cherry picked from commit 87beebeed0)

(cherry picked from commit 4178589826)

 Refs #19699

Closes scylladb/scylladb#19796

* github.com:scylladb/scylladb:
  test: add test to check that coordinator lwt semaphore continues functioning after locking failures
  paxos: do not signal semaphore if it was not acquired
2024-07-19 19:06:36 +02:00
Gleb Natapov
c437c8be36 test: add test to check that coordinator lwt semaphore continues functioning after locking failures
(cherry picked from commit 4178589826)
2024-07-18 15:34:17 +00:00
Gleb Natapov
1c04b95c68 paxos: do not signal semaphore if it was not acquired
The guard signals a semaphore during destruction if it is marked as
locked, but currently it may be marked as locked even if locking failed.
Fix this by using semaphore_units instead of managing the locked flag
manually.

Fixes: https://github.com/scylladb/scylladb/issues/19698
(cherry picked from commit 87beebeed0)
2024-07-18 15:34:16 +00:00
Emil Maskovsky
5649b55e08 test: raft: fix the flaky test_change_ip
The python driver might currently trigger spurios reconnects that cause
the `NoHostAvailable` to be thrown, which is not expected.

This patch adds a retry mechanism to the test to make skip this failure
if it occurs, as a work-around.

The proper fix is expected to be done in the scylladb/python-driver#295,
once fixed there this work-around can be reverted.

Fixes: scylladb/scylla#18547
(cherry picked from commit 6b9992737a)

Closes scylladb/scylladb#19773
2024-07-18 15:06:23 +02:00
Emil Maskovsky
b4745406da raft: Fix crash in leader_host API handler
The leader_host API handler was eventually using the `req` unique_ptr
after it has been already destroyed (passed down to the future lambda
by reference). This was causing an occassional crash in some tests.

Reworked the leader_host handler to use the req only outside of the
future lambda.

Also updated the code to handle the possibility that the non-default
leader group (other than Group 0) might reside on a different shard
than the shard 0 - using the same concept of calling on all shards via
`invoke_on_all()` as done for the other requests.

Fixes scylladb/scylladb#19714

(cherry picked from commit b2db8f4b9b)

Closes scylladb/scylladb#19742
2024-07-16 13:29:37 +02:00
Anna Stuchlik
27faec3015 doc: replace a link on the CDC+Kafka page
This commit replaces a link to the installation section with a link to the getting started section.

(cherry picked from commit f90867c740)

Closes scylladb/scylladb#19712
2024-07-16 13:15:45 +02:00
Emil Maskovsky
06c356df8f test: raft: fix the topology failure recovery test flakiness
Setting the error condition for all nodes in the cluster to avoid
having to check which one is the coordinator. This should make the test
more stable and avoid the flakiness observed when the coordinator node
is the one that got the error condition injected.

Randomizing the retrieved running servers to reproduce the issue more
frequently and to avoid making any assumptions about the order of the
servers.

Note that only the "raft_topology_barrier_fail" needs to run
on a non-coordinator node, the other error "stream_ranges_fail" can be
injected on any node (including the coordinator).

Fixes: #18614
(cherry picked from commit 9dbad34205)

Closes scylladb/scylladb#19708
2024-07-15 16:27:22 +02:00
Michael Litvak
815a707b0a storage_proxy: remove response handler if no targets
When writing a mutation, it might happen that there are no live targets
to send the mutation to, yet the request can be satisfied. For example,
when writing with CL=ANY to a dead node, the request is completed by
storing a local hint.

Currently, in that case, a write response handler is created for the
request and it remains active until it timeouts because it is not
removed anywhere, even though the write is completed successfuly after
storing the hint. The response handler should be removed usually when
receiving responses from all targets, but in this case there are no
targets to trigger the removal.

In this commit we check if we don't have live targets to send the
mutation to. If so, we remove the response handler immediately.

Fixes scylladb/scylladb#19529

(cherry picked from commit a9fdd0a93a)

Closes scylladb/scylladb#19680
2024-07-15 08:24:18 +02:00
Botond Dénes
16452f9cf5 Merge '[Backport 6.0] scylla-sstable: add method to load the schema from the sstable itself' from ScyllaDB
As it turns out, each sstable carries its own schema in its serialization header (Statistics component). This schema is incomplete -- the names of the key columns are not stored, just their type. Static and regular columns do have names and types stored however. This bare-bones schema is enough to parse and display the content of the sstable. Another thing missing is schema options (the stuff after the `WITH` keyword, except the clustering order). The only options stored are the compression options (in the CompressionInfo component), this is actually needed to read the Data component.

This series adds a new method to `tools/schema_loader.cc` to extract the schema stored in the sstable itself. This new schema load method is used as the last fall-back for obtaining the schema, in case scylla-sstable is trying to autodetect the schema of the sstable. Although, right now this bare-bones schema is enough for everything scylla-sstable does, it is more future proof to stick to the "full" schema if possible, so this new method is the last resort for now.

Fixes: https://github.com/scylladb/scylladb/issues/17869
Fixes: https://github.com/scylladb/scylladb/issues/18809

New functionality, no backport needed.

(cherry picked from commit 435c01d1e6)

(cherry picked from commit 0d7335dd27)

(cherry picked from commit 8f2ba03465)

(cherry picked from commit 43c44f0af5)

(cherry picked from commit 145a67f77c)

 Refs #19169

Closes scylladb/scylladb#19711

* github.com:scylladb/scylladb:
  tools/scylla-sstable: log loaded schema with trace level
  tools/scylla-sstable: load schema from the sstable as fallback
  tools/schema_loader: introduce load_schema_from_sstable()
  test/lib/random_schema: remove assert on min number of regular columns
  sstables: introduce load_metadata()
2024-07-12 16:55:44 +03:00
Botond Dénes
5d94a08250 tools/scylla-sstable: log loaded schema with trace level
The schema of the sstable can be interesting, so log it with trace
level. Unfortunately, this is not the nice CQL statement we are used to
(that requires a database object), but the not-nearly-so-nice CFMetadata
printout. Still, it is better then nothing.

(cherry picked from commit 145a67f77c)
2024-07-12 10:36:59 +00:00
Botond Dénes
4f74e6f28e tools/scylla-sstable: load schema from the sstable as fallback
When auto-detecting the schema of the sstable, if all other methods
failed, load the schema from the sstable's serialization header. This
schema is incomplete. It is just enough to parse and display the content
of the sstable. Although parsing and displaying the content of the
sstable is all scylla-sstable does, it is more future-compatible to us
the full schema when possible. So the always-available but minimal
schema that each sstable has on itself, is used just as a fallback.

The test which tested the case when all schema load attempts fail,
doesn't work now, because loading the serialization header always
succeeds. So convert this test into two positive tests, testing the
serialization header schema fallback instead.

(cherry picked from commit 43c44f0af5)
2024-07-12 10:36:59 +00:00
Botond Dénes
f42e8e872a tools/schema_loader: introduce load_schema_from_sstable()
Allows loading the schema from an sstable's serialization header. This
schema is incomplete, but it is enough to parse and display the content
of the sstable.

(cherry picked from commit 8f2ba03465)
2024-07-12 10:36:59 +00:00
Botond Dénes
f7c8c32929 test/lib/random_schema: remove assert on min number of regular columns
It is legal for a schema to have 0 regular columns, so remove the assert
on the schema specification's regular column count.

(cherry picked from commit 0d7335dd27)
2024-07-12 10:36:59 +00:00
Botond Dénes
4f165eb3e9 sstables: introduce load_metadata()
Loads just the metadata components. No validation.
Split off from load(), to allow scylla-sstable to partially load an
sstable.

(cherry picked from commit 435c01d1e6)
2024-07-12 10:36:59 +00:00
Michał Jadwiszczak
25f8fd0b5c cql-pytest/test_describe: add a test for describe indexes
(cherry picked from commit b65a4c66f0)
2024-07-11 12:59:27 +00:00
Michał Jadwiszczak
67764e7d66 schema/schema: fix column names in index description
Previously description of index didn't include functions for
indexes on collections like full(), keys(), values(), etc...

(cherry picked from commit 253feb6811)
2024-07-11 12:59:27 +00:00
Tomasz Grabiec
43ff19273c Merge '[Backport 6.0] mutation_partition_v2: in apply_monotonically(), avoid bad_alloc on sentinel insertion' from ScyllaDB
apply_monotonically() is run with reclaim disabled. So with some bad luck,
sentinel insertion might fail with bad_alloc even on a perfectly healthy node.
We can't deal with the failure of sentinel insertion, so this will result in a
crash.

This patch prevents the spurious OOM by reserving some memory (1 LSA segment)
and only making it available right before the critical allocations.

Fixes https://github.com/scylladb/scylladb/issues/19552

(cherry picked from commit f784be6a7e)

(cherry picked from commit 7b3f55a65f)

(cherry picked from commit 78d6471ce4)

Refs #19617

Closes scylladb/scylladb#19675

* github.com:scylladb/scylladb:
  mutation_partition_v2: in apply_monotonically(), avoid bad_alloc on sentinel insertion
  logalloc: add hold_reserve
  logalloc: generalize refill_emergency_reserve()
2024-07-10 14:28:01 +02:00
Michał Chojnowski
aee0150506 mutation_partition_v2: in apply_monotonically(), avoid bad_alloc on sentinel insertion
apply_monotonically() is run with reclaim disabled. So with some bad luck,
sentinel insertion might fail with bad_alloc even on a perfectly healthy node.
We can't deal with the failure of sentinel insertion, so this will result in a
crash.

This patch prevents the spurious OOM by reserving some memory (1 LSA segment)
and only making it available right before the critical allocations.

Fixes scylladb/scylladb#19552

(cherry picked from commit 78d6471ce4)
2024-07-10 08:36:11 +00:00
Michał Chojnowski
c5c19e90ac logalloc: add hold_reserve
mutation_partition_v2::apply_monotonically() needs to perform some allocations
in a destructor, to ensure that the invariants of the data structure are
restored before returning. But it is usually called with reclaiming disabled,
so the allocations might fail even in a perfectly healthy node with plenty of
reclaimable memory.

This patch adds a mechanism which allows to reserve some LSA memory (by
asking the allocator to keep it unused) and make it available for allocation
right when we need to guarantee allocation success.

(cherry picked from commit 7b3f55a65f)
2024-07-10 08:36:11 +00:00
Michał Chojnowski
985f5a50f6 logalloc: generalize refill_emergency_reserve()
In the next patch, we will want to do the thing as
refill_emergency_reserve() does, just with a quantity different
than _emergency_reserve_max. So we split off the shareable part
to a new function, and use it to implement refill_emergency_reserve().

(cherry picked from commit f784be6a7e)
2024-07-10 08:36:11 +00:00
Botond Dénes
ae11381d7c Merge '[Backport 6.0] reader_concurrency_semaphore: make CPU concurrency configurable' from Botond Dénes
The reader concurrency semaphore restricts the concurrency of reads that require CPU (intention: they read from the cache) to 1, meaning that if there is even a single active read which declares that it needs just CPU to proceed, no new read is admitted. This is meant to keep the concurrency of reads in the cache at 1. The idea is that concurrency in the cache is not useful: it just leads to the reactor rotating between these reads, all of the finishing later then they could if they were the only active read in the cache.
This was observed to backfire in the case where there reads from a single table are mostly very fast, but on some keys are very slow (hint: collection full of tombstones). In this case the slow read keeps up the fast reads in the queue, increasing the 99th percentile latencies significantly.

This series proposes to fix this, by making the CPU concurrency configurable. We don't like tunables like this and this is not a proper fix, but a workaround. The proper fix would be to allow to cut any page early, but we cannot cut a page in the middle of a row. We could maybe have a way of detecting slow reads and excluding them from the CPU concurrency. This would be a heuristic and it would be hard to get right. So in this series a robust and simple configurable is offered, which can be used on those few clusters which do suffer from the too strict concurrency limit. We have seen it in very few cases so far, so this doesn't seem to be wide-spread.

Fixes: https://github.com/scylladb/scylladb/issues/19017

This PR backports https://github.com/scylladb/scylladb/pull/19018 and its follow-up https://github.com/scylladb/scylladb/pull/19600.

Closes scylladb/scylladb#19644

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: execution_loop(): move maybe_admit_waiters() to the inner loop
  test/boost/reader_concurrency_semaphore_test: add test for live-configurable cpu concurrency
  test/boost/reader_concurrency_semaphore_test: hoist require_can_admit
  reader_concurrency_semaphore: wire in the configurable cpu concurrency
  reader_concurrency_semaphore: add cpu_concurrency constructor parameter
  db/config: introduce reader_concurrency_semahore_cpu_concurrency
2024-07-10 07:23:08 +03:00
Anna Stuchlik
4ec5a06101 doc: update Scylla Doctor installation
This commit updates the instuctions on how to download and run Scylla Doctor,
following the changes in how Scylla Doctor is released.

(cherry picked from commit 2ffda9b262)

Closes scylladb/scylladb#19525
2024-07-09 14:32:21 +03:00
Anna Stuchlik
dcf4c757b2 doc: remove support for Debian 10
This PR removes support for Debian 10, which reached end of life on June 30, 2024.

Refs https://github.com/scylladb/scylla-enterprise/issues/4377

(cherry picked from commit 1f340428ea)

Closes scylladb/scylladb#19630
2024-07-09 12:55:11 +02:00
Wojciech Przytuła
a7fe9eeffd storage_proxy: fix uninitialized LWT contention counter
When debugging the issue of high LWT contention metric, we (the
drivers team) discovered that at least 3 drivers (Go, Java, Rust)
cause high numbers in that metrics in LWT workloads - we doubted that
all those drivers route LWT queries badly. We tried to understand that
metric and its semantics. It took 3 people over 10 hours to figure out
what it is supposed to count.

People from core team suspected that it was the drivers sending
requests to different shards, causing contention. Then we ran the
workload against a single node single shard cluster... and observed
contention. Finally, we looked into the Scylla code and saw it.

**Uninitialized stack value.**

The core member was shocked. But we, the drivers people, felt we always
knew it. It's yet another time that we are blamed for a server-side
issue. We rebuilt scylla with the variable initialized to 0 and the
metric kept being 0.

To prevent such errors in the future, let's consider some lints that
warn against uninitialized variables. This is such an obvious feature
of e.g. Rust, and yet this has shown to be cause a painful bug in 2024.

Fixes: scylladb/scylladb#19654
(cherry picked from commit 36a125bf97)

Closes scylladb/scylladb#19657
2024-07-09 11:41:10 +02:00
Michael Litvak
ad6eb1cadf view: drain view builder before database
The view builder is doing write operations to the database.
In order for the view builder to shutdown gracefully without errors, we
need to ensure the database can handle writes while it is drained.
The commit changes the drain order, so that view builder is drained
before the database shuts down.

Fixes scylladb/scylladb#18929

(cherry picked from commit 9d9318c564)

Closes scylladb/scylladb#19636
2024-07-08 19:16:26 +02:00
Botond Dénes
dadc0c32e1 reader_concurrency_semaphore: execution_loop(): move maybe_admit_waiters() to the inner loop
Now that the CPU concurency limit is configurable, new reads might be
ready to execute right after the current one was executed. So move the
poll for admitting new reads into the inner loop, to prevent the
situation where the inner loop yields and a concurrent
do_wait_admission() finds that there are waiters (queued because at the
time they arrived to the semaphore, the _ready_list was not empty) but it
is is possible to admit a new read. When this happens the semaphore will
dump diagnostics to help debug the apparent contradiction, which can
generate a lot of log spam. Moving the poll into the inner loop prevents
the false-positive contradiction detection from firing.

Refs: scylladb/scylladb#19017

Closes scylladb/scylladb#19600

(cherry picked from commit 155acbb306)
2024-07-08 08:13:40 +03:00
Botond Dénes
88d3c2eb4b test/boost/reader_concurrency_semaphore_test: add test for live-configurable cpu concurrency
(cherry picked from commit b4f3809ad2)
2024-07-08 08:13:07 +03:00
Botond Dénes
4307631950 test/boost/reader_concurrency_semaphore_test: hoist require_can_admit
This is currently a lambda in a test, hoist it into the global scope and
make it into a function, so other tests can use it too (in the next
patch).

(cherry picked from commit 9cbdd8ef92)
2024-07-08 08:12:34 +03:00
Botond Dénes
abc4a9b635 reader_concurrency_semaphore: wire in the configurable cpu concurrency
Before this patch, the semaphore was hard-wired to stop admission, if
there is even a single permit, which is in the need_cpu state.
Therefore, keeping the CPU concurrency at 1.
This patch makes use of the new cpu_concurrency parameter, which was
wired in in the last patches, allowing for a configurable amount of
concurrent need_cpu permits. This is to address workloads where some
small subset of reads are expected to be slow, and can hold up faster
reads behind them in the semaphore queue.

(cherry picked from commit 07c0a8a6f8)
2024-07-08 08:12:34 +03:00
Botond Dénes
052cef2621 reader_concurrency_semaphore: add cpu_concurrency constructor parameter
In the case of the user semaphore, this receives the new
reader_concurrency_semaphore_cpu_limit config item.
Not used yet.

(cherry picked from commit 59faa6d4ff)
2024-07-08 08:12:20 +03:00
Botond Dénes
5a7af93c7c db/config: introduce reader_concurrency_semahore_cpu_concurrency
To allow increasing the semaphore's CPU concurrency, which is currently
hard-limited to 1. Not wired yet.

(cherry picked from commit c7317be09a)
2024-07-08 08:06:28 +03:00
Pavel Emelyanov
78f3fc8890 tablet_allocator: Put more info into failed-to-drain exception
When balancer fails to find a node to balance drained tablets into, it
throws an exception with tablet id and node id, but it's also good to
know more details about the balancing state that lead to failure

refs: #19504

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit c3d9831c5f)

Closes scylladb/scylladb#19619
2024-07-05 11:17:37 +03:00
None
3e06c882f0 .github: remove pull_request_template
The reason for the pr template is to explain why do we need to backport
a PR.

On release branches there is no need for it

Closes scylladb/scylladb#19615
2024-07-04 16:52:27 +03:00
Avi Kivity
c6e8a7f762 Merge '[Backport 6.0] Close output_stream in get_compaction_history() API handler' from ScyllaDB
If an httpd body writer is called with output_stream<>, it mist close the stream on its own regardless of any exceptions it may generate while working, otherwise stream destructor may step on non-closed assertion. Stepped on with different handler, see #19541

Coroutinize the handler as the first step while at it (though the fix would have been notably shorter if done with .finally() lambda)

(cherry picked from commit acb351f4ee)

(cherry picked from commit 6d4ba98796)

(cherry picked from commit b4f9387a9d)

 Refs #19543

Closes scylladb/scylladb#19603

* github.com:scylladb/scylladb:
  api: Close response stream of get_compaction_history()
  api: Flush output stream in get_compaction_history() call
  api: Coroutinize get_compaction_history inner function
2024-07-04 15:08:08 +03:00
Pavel Emelyanov
941ec80a00 api: Close response stream of get_compaction_history()
The function must close the stream even if it throws along the way.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit b4f9387a9d)
2024-07-03 18:30:17 +00:00
Pavel Emelyanov
ab5041cb03 api: Flush output stream in get_compaction_history() call
It's currently implicitly flushed on its close, but in that case close
can throw while flusing. Next patch wants close not to throw and that's
possible if flushing the stream in advance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit 6d4ba98796)
2024-07-03 18:30:17 +00:00
Pavel Emelyanov
009f5eb69e api: Coroutinize get_compaction_history inner function
The handler returns a function which is then invoked with output_stream
argument to render the json into. This function is converted into
coroutine. It has yet another inner lambda that's passed into
compaction_manager::get_compaction_history() as consumer lambda. It's
coroutinized too.

The indentation looks weird as preparation for future patching.
Hopefullly it's still possible to understand what's going on.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit acb351f4ee)
2024-07-03 18:30:17 +00:00
Tzach Livyatan
c9cd171f42 Docs: Fix a typo in sstable-corruption.rst
(cherry picked from commit a7115124ce)

Closes scylladb/scylladb#19591
scylla-6.0.2-candidate-20240703050547 scylla-6.0.2
2024-07-03 10:24:44 +02:00