Commit Graph

39623 Commits

Author SHA1 Message Date
Raphael S. Carvalho
67be26ff7d compaction: Reduce twcs off-strategy space overhead to 10% of free space
TWCS off-strategy suffers from 100% space overhead, so a big TWCS table
can cause scylla to run out of disk space during node ops.

To avoid penalizing TWCS tables that take up a small percentage of disk
with increased write amplification, TWCS off-strategy will be restricted
to 10% of free disk space. Small tables can then still compact all
disjoint sstables in a single round.
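
A minimal sketch of the idea, not the actual reshape code: the function name, the list-of-sizes input, and the rule of always admitting at least one sstable are assumptions made for illustration.

```python
def select_reshape_job(sstable_sizes, free_space, ratio=0.10):
    """Pick sstables for one reshape round within a space budget.

    Hypothetical helper: sizes and free_space are in the same unit.
    """
    budget = free_space * ratio
    job, total = [], 0
    for size in sstable_sizes:
        # Always admit the first sstable so a round makes progress,
        # then stop once the budget would be exceeded.
        if job and total + size > budget:
            break
        job.append(size)
        total += size
    return job
```

With 1000 units free, a small table with sizes `[10, 20, 30]` fits in one round, while `[50, 60, 70]` is cut to `[50]` and needs further rounds.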

Fixes #16514.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit ace4e5111e)
2024-06-29 11:29:59 -03:00
Raphael S. Carvalho
97893a4f6d compaction: wire storage free space into reshape procedure
After this, TWCS reshape procedure can be changed to limit job
to 10% of available space.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 0ce8ee03f1)
2024-06-29 11:29:59 -03:00
Raphael S. Carvalho
ab9683d182 sstables: Allow to get free space from underlying storage
That will be used in turn to restrict reshape to 10% of available space
in underlying storage.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 51c7ee889e)
2024-06-29 11:29:57 -03:00
Calle Wilund
19999554e7 main/minio_server.py: Respect any preexisting AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY vars
Fixes scylladb/scylla-pkg#3845

Don't overwrite (or rather change) AWS credentials variables if already set in
enclosing environment. Ensures EAR tests for AWS KMS can run properly in CI.

v2:
* Allow environment variables in reading obj storage config - allows CI to
  use real credentials in env without risking putting them into less secure
  files
* Don't write credentials info from miniserver into config, instead use said
  environment vars to propagate creds.

v3:
* Fix python launch scripts to not clear environment, thus retaining above aws envs.

(cherry picked from commit 5056a98289)

Closes scylladb/scylladb#19336
2024-06-20 13:23:40 +03:00
Botond Dénes
43f77c71c7 [Backport 5.4] : Merge 'Fix usage of utils/chunked_vector::reserve_partial' from Lakshmi Narayanan Sreethar
utils/chunked_vector::reserve_partial: fix usage in callers

The method reserve_partial(), when used as documented, quits before the
intended capacity can be reserved fully. This can lead to overallocation
of memory in the last chunk when data is inserted to the chunked vector.
The method itself doesn't have any bug but the way it is being used by
the callers needs to be updated to get the desired behaviour.

Instead of calling it repeatedly with the value returned from the
previous call until it returns zero, it should be repeatedly called with
the intended size until the vector's capacity reaches that size.
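
The corrected calling pattern can be sketched with a toy model. `ToyChunkedVector` below is a hypothetical stand-in that only tracks capacity and grows by at most one chunk per call; it is not the real utils/chunked_vector.

```python
class ToyChunkedVector:
    """Toy model: tracks capacity only, grows one chunk per call."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self._capacity = 0

    def capacity(self):
        return self._capacity

    def reserve_partial(self, n):
        # Reserve at most one more chunk towards the target n.
        if self._capacity < n:
            self._capacity = min(n, self._capacity + self.chunk_size)


def reserve_fully(vec, n):
    # Call with the intended size each time, until capacity reaches it.
    while vec.capacity() < n:
        vec.reserve_partial(n)
        # A real caller would yield to the scheduler here.
```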

This PR updates the method comment and all the callers to use the
right way.

Fixes #19254

Closes scylladb/scylladb#19279

* github.com:scylladb/scylladb:
  utils/large_bitset: remove unused includes identified by clangd
  utils/large_bitset: use thread::maybe_yield()
  test/boost/chunked_managed_vector_test: fix testcase tests_reserve_partial
  utils/lsa/chunked_managed_vector: fix reserve_partial()
  utils/chunked_vector: return void from reserve_partial and make_room
  test/boost/chunked_vector_test: fix testcase tests_reserve_partial
  utils/chunked_vector::reserve_partial: fix usage in callers

(cherry picked from commit b2ebc172d0)

Backported from #19308 to 5.4

Closes scylladb/scylladb#19355
scylla-5.4.8 scylla-5.4.8-candidate
2024-06-19 14:34:29 +03:00
Botond Dénes
4aa0b84ba7 Merge '[Backport 5.4] sstables_manager: use maintenance scheduling group to run components reload fiber' from Lakshmi Narayanan Sreethar
PR https://github.com/scylladb/scylladb/pull/18186 introduced a fiber that reloads reclaimed bloom filters when memory becomes available. Use maintenance scheduling group to run that fiber instead of running it in the main scheduling group.

Fixes https://github.com/scylladb/scylladb/issues/18675

(cherry picked from commit 79f6746298)

(cherry picked from commit 6f58768c46)

Backported from https://github.com/scylladb/scylladb/pull/18721 to 5.4.

Closes scylladb/scylladb#19354

* github.com:scylladb/scylladb:
  sstables_manager: use maintenance scheduling group to run components reload fiber
  sstables_manager: add member to store maintenance scheduling group
2024-06-18 16:29:07 +03:00
Botond Dénes
427127de57 Merge ' [Backport 5.4] alternator: keep TTL work in the maintenance scheduling group' from Nadav Har'El
This is a fairly elaborate backport of commit b2a500a9a1 to branch 5.4
The code patch itself is trivial, and backported cleanly. The big problem was the test, which was written using the "topology" test framework - because it needs to test a cluster, not a single node, because the scheduling group problem only happened when sending requests between different Scylla nodes.

I had to fix in the backport the following problems:
1. The test used a library function add_servers() which didn't exist in branch 5.4, so needed to switch to making three individual add_server() calls.
2. The test was randomly placed in the topology_experimental_raft directory, which runs with the tablets experimental flag enabled. In 5.4, the tablets code was broken with Alternator, and CreateTable fails (it fails in the callback to create tablets, and doesn't even get to check that tablets weren't requested). So I needed to move the test file to a different directory.
3. Even after moving the file, it still ran with the tablets experimental feature! Turns out that test.py enabled tablets experimental feature unconditionally. This is a mistake, and I'm sure was never intended (tablets were never meant to be supported in 5.4), so I removed enabling this feature. It's still enabled in the topology_experimental_raft directory, where it is explicitly enabled.

After all that, the test passes with the patch, showing that the code fix is correct also for 5.4.

Closes scylladb/scylladb#19321

* github.com:scylladb/scylladb:
  alternator, scheduler: test reproducing RPC scheduling group bug
  test.py: don't enable "tablets" experimental feature
  main: add maintenance tenant to messaging_service's scheduling config
2024-06-18 12:29:20 +03:00
Lakshmi Narayanan Sreethar
d7b1116170 sstables_manager: use maintenance scheduling group to run components reload fiber
Fixes #18675

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit 6f58768c46)
2024-06-18 14:41:46 +05:30
Lakshmi Narayanan Sreethar
72155312e5 sstables_manager: add member to store maintenance scheduling group
Store that maintenance scheduling group inside the sstables_manager. The
next patch will use this to run the components reloader fiber.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit 79f6746298)
2024-06-18 14:39:45 +05:30
Nadav Har'El
f8dcbc6037 alternator, scheduler: test reproducing RPC scheduling group bug
This patch adds a test for issue #18719: Although the Alternator TTL
work is supposedly done in the "streaming" scheduling group, it turned
out we had a bug where work sent on behalf of that code to other nodes
failed to inherit the correct scheduling group, and was done in the
normal ("statement") group.

Because this problem only happens when more than one node is involved,
the test is in the multi-node test framework test/topology_experimental_raft.

The test uses the Alternator API. We already had in that framework a
test using the Alternator API (a test for alternator+tablets), so in
this patch we move the common Alternator utility functions to a common
file, test_alternator.py, where I also put the new test.

The test is based on metrics: We write expiring data, wait for it to expire,
and then check the metrics on how much CPU work was done in the wrong
scheduling group ("statement"). Before #18719 was fixed, a lot of work
was done there (more than half of the work done in the right group).
After the issue was fixed in the previous patch, the work on the wrong
scheduling group went down to zero.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 1fe8f22d89)

Modifications in the cherry-pick:
 * Moved test to topology_custom directory, so it runs without tablets
 * use the server_add() function instead of the newer add_servers() which
   didn't yet exist in this branch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-06-18 10:17:45 +03:00
Nadav Har'El
dc1968cb9e test.py: don't enable "tablets" experimental feature
This branch (5.4) does NOT support tablets, and we don't want to run
any tests with the "tablets" experimental feature. When we made test.py
enable that feature by default, it was probably considered harmless -
the partial implementation we had in this branch won't do anything if
tablets aren't actually enabled for a specific keyspace.

But unfortunately, Alternator doesn't work with tablets enabled (there
was a bug in the callback during table creation), so we can't run any
Alternator tests from test.py (like the one we want to backport for
Alternator TTL scheduling groups) unless we drop that experimental
feature.

Note that one specific test subdirectory,
test/topology_experimental_raft, does enable this experimental
flag. The others shouldn't.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-06-18 10:14:53 +03:00
Botond Dénes
7e40b658c8 main: add maintenance tenant to messaging_service's scheduling config
Currently only the user tenant (statement scheduling group) and system
(default scheduling group) tenants exist, as we used to have only
user-initiated operations and system (internal) ones. Now there is need
to distinguish between two kinds of system operation: foreground and
background ones. The former should use the system tenant while the
latter will use the new maintenance tenant (streaming scheduling group).

(cherry picked from commit 5d3f7c13f9)
2024-06-18 10:08:46 +03:00
Tomasz Grabiec
58671274d8 test: pylib: Fetch all pages by default in run_async
Fetching only the first page is not the intuitive behavior expected by users.

This causes flakiness in some tests which generate a variable number of
keys depending on execution speed and later verify, with a single SELECT
statement, that all keys were written. When the number of keys
becomes larger than the page size, the test fails.
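
A generic sketch of "fetch all pages", independent of any driver API. Both `fetch_all` and `make_paged_source` are hypothetical helpers, and the paging state is modelled as a plain offset.

```python
def make_paged_source(data, page_size):
    """Hypothetical paged source: returns (rows, next_state)."""
    def fetch_page(state):
        start = state or 0
        end = start + page_size
        # next_state is None once the last page has been served.
        next_state = end if end < len(data) else None
        return data[start:end], next_state
    return fetch_page


def fetch_all(fetch_page):
    # Keep requesting pages until the source reports no more.
    rows, state = [], None
    while True:
        page, state = fetch_page(state)
        rows.extend(page)
        if state is None:
            return rows
```

Reading only the first page of a 7-row result with page size 3 would return 3 rows; `fetch_all` returns all 7.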

Fixes #18774

(cherry picked from commit 43b907b499)

Closes scylladb/scylladb#19129
2024-06-17 10:41:20 +02:00
Botond Dénes
c18f14cd78 Merge '[Backport 5.4] test: memtable_test: increase unspooled_dirty_soft_limit ' from ScyllaDB
before this change, when performing memtable_test, we expect that
the memtables of ks.cf are the only memtables being flushed. and
we inject 4 failures in the code path of flush, and wait until 4
of them are triggered. but in the background, `dirty_memory_manager`
performs flush on all tables when necessary. so, the total number of
failures is not necessarily the total number of failures triggered
when flushing ks.cf, some of them could be triggered when flushing
system tables. that's why we have sporadic test failures from
this test. as we might check `t.min_memtable_timestamp()` too soon.

after this change, we increase `unspooled_dirty_soft_limit` setting,
in order to disable `dirty_memory_manager`, so that the only flush
is performed by the test.

Fixes https://github.com/scylladb/scylladb/issues/19034

---

the issue applies to both 5.4 and 6.0, and this issue hurts the CI stability, hence we should backport it.

(cherry picked from commit 2df4e9cfc2)

(cherry picked from commit 223fba3243)

Refs #19252

Closes scylladb/scylladb#19256

* github.com:scylladb/scylladb:
  test: memtable_test: increase unspooled_dirty_soft_limit
  test: memtable_test: replace BOOST_ASSERT with BOOST_REQUIRE
2024-06-14 15:50:58 +03:00
Michał Chojnowski
c19f980802 storage_proxy: avoid infinite growth of _throttled_writes
storage_proxy has a throttling mechanism which attempts to limit the number
of background writes by forcefully raising CL to ALL
(it's not implemented exactly like that, but that's the effect) when
the amount of background and queued writes is above some fixed threshold.
If this is applied to a write, it becomes "throttled",
and its ID is appended to _throttled_writes.

Whenever the amount of background and queued writes falls below the threshold,
writes are "unthrottled" — some IDs are popped from _throttled_writes
and the writes represented by these IDs — if their handlers still exist —
have their CL lowered back.

The problem here is that IDs are only ever removed from _throttled_writes
if the number of queued and background writes falls below the threshold.
But this doesn't have to happen in any finite time, if there's constant write
pressure. And in fact, in one load test, it hasn't happened in 3 hours,
eventually causing the buffer to grow into gigabytes and trigger OOM.

This patch is intended to be a good-enough-in-practice fix for the problem.

Fixes #17476
Fixes #1834

(cherry picked from commit 97e1518eb9)

Closes scylladb/scylladb#19179
2024-06-14 15:49:34 +03:00
Kamil Braun
0abccd212d raft: fsm: add details to on_internal_error_noexcept message
If we receive a message in the same term but from a different leader
than we expect, we print:
```
Got append request/install snapshot/read_quorum from an unexpected leader
```
For some reason the message did not include the details (who the leader
was and who the sender was) which requires almost zero effort and might
be useful for debugging. So let's include them.

Ref: scylladb/scylla-enterprise#4276
(cherry picked from commit 99a0599e1e)

Closes scylladb/scylladb#19264
2024-06-13 11:24:28 +02:00
Kefu Chai
b8d0df24ed test: memtable_test: increase unspooled_dirty_soft_limit
before this change, when performing memtable_test, we expect that
the memtables of ks.cf are the only memtables being flushed. and
we inject 4 failures in the code path of flush, and wait until 4
of them are triggered. but in the background, `dirty_memory_manager`
performs flush on all tables when necessary. so, the total number of
failures is not necessarily the total number of failures triggered
when flushing ks.cf, some of them could be triggered when flushing
system tables. that's why we have sporadic test failures from
this test. as we might check `t.min_memtable_timestamp()` too soon.

after this change, we increase `unspooled_dirty_soft_limit` setting,
in order to disable `dirty_memory_manager`, so that the only flush
is performed by the test.

Fixes #19034
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 223fba3243)
2024-06-12 15:43:58 +00:00
Kefu Chai
b3de65a8fb test: memtable_test: replace BOOST_ASSERT with BOOST_REQUIRE
before this change, we verify the behavior of design under test using
`BOOST_ASSERT()`, which is a wrapper around `assert()`, so if a test
fails, the test just aborts. this is not very helpful for postmortem
debugging.

after this change, we use `BOOST_REQUIRE` macro for verifying the
behavior, so that Boost.Test prints out the condition if it does not
hold when we test it.

Refs #19034
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 2df4e9cfc2)
2024-06-12 15:43:58 +00:00
Kefu Chai
3eb15e841a docs: correct the link pointing to Scylla U
before this change it points to
https://university.scylladb.com/courses/scylla-operations/lessons/change-data-capture-cdc/
which then redirects the browser to
https://university.scylladb.com/courses/scylla-operations/,
but it should have pointed to
https://university.scylladb.com/courses/data-modeling/lessons/change-data-capture-cdc/

in this change, the hyperlink is corrected.

Fixes #19163
Refs 6e97b83b60
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit b5dce7e3d0)

Closes scylladb/scylladb#19197
2024-06-11 18:13:58 +03:00
Jenkins Promoter
017524c7d8 Update ScyllaDB version to: 5.4.8
2024-06-10 12:22:35 +03:00
Wojciech Mitros
1680bc2902 mv gossip: check errno instead of value returned by strtoull
Currently, when a view update backlog is changed and sent
using gossip, we check whether the strtoll/strtoull
function used for reading the backlog returned
LLONG_MAX/ULLONG_MAX, signaling an error of a value
exceeding the type's limit, and if so, we do not store
it as the new value for the node.

However, the ULLONG_MAX value can also be used as the max
backlog size when sending empty backlogs that were never
updated. In theory, we could avoid sending the default
backlog because each node has its real backlog (based on
the node's memory, different than the ULLONG_MAX used in
the default backlog). In practice, if the node's
backlog changed to 0, the backlog sent by it will be
likely the default backlog, because when selecting
the biggest backlog across node's shards, we use the
operator<=>(), which treats the default backlog as
equal to an empty backlog and we may get the default
backlog during comparison if the backlog of some shard
was never changed (also it's the initial max value
we compare shard's backlogs against).

This patch removes the (U)LLONG_MAX check and replaces
it with the errno check, which is also set to ERANGE during
the strtoll error, and which won't prevent empty backlogs
from being read
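
A rough Python analogue of the change, for illustration only. `parse_u64` mimics the relevant strtoull behaviour (clamp on overflow) and returns a separate overflow flag standing in for `errno == ERANGE`; both helper names are hypothetical.

```python
ULLONG_MAX = 2**64 - 1


def parse_u64(text):
    """Mimic strtoull: clamp on overflow and flag it separately."""
    value = int(text)
    if value > ULLONG_MAX:
        return ULLONG_MAX, True   # analogous to errno == ERANGE
    return value, False


def read_backlog(text, current):
    # Decide on the error flag, not on the returned value, so a
    # legitimate ULLONG_MAX (the default backlog) is still accepted.
    value, overflowed = parse_u64(text)
    return current if overflowed else value
```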

Fixes: #18462
(cherry picked from commit 5154429713)

Closes scylladb/scylladb#18697
2024-06-05 09:16:07 +03:00
Lakshmi Narayanan Sreethar
2e836fa077 db/config.cc: increment components_memory_reclaim_threshold config default
Incremented the components_memory_reclaim_threshold config's default
value to 0.2 as the previous value was too strict and caused unnecessary
eviction in otherwise healthy clusters.

Fixes #18607

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit 3d7d1fa72a)

Closes scylladb/scylladb#19013
2024-06-04 07:11:43 +03:00
Botond Dénes
98139a8716 Merge '[Backport 5.4] : Reload reclaimed bloom filters when memory is available' from Lakshmi Narayanan Sreethar
PR #17771 introduced a threshold for the total memory used by all bloom filters across SSTables. When the total usage surpasses the threshold, the largest bloom filter will be removed from memory, bringing the total usage back under the threshold. This PR adds support for reloading such reclaimed bloom filters back into memory when memory becomes available (i.e., within the 10% of available memory earmarked for the reclaimable components).

The SSTables manager now maintains a list of all SSTables whose bloom filter was removed from memory and attempts to reload them when an SSTable, whose bloom filter is still in memory, gets deleted. The manager reloads from the smallest to the largest bloom filter to maximize the number of filters being reloaded into memory.
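
The smallest-first reload order can be sketched as follows. This is an illustration under assumed inputs (a list of filter sizes and a free-memory figure), not the sstables_manager code.

```python
def pick_filters_to_reload(reclaimed_sizes, available):
    """Greedy smallest-first selection (hypothetical helper)."""
    reloaded = []
    for size in sorted(reclaimed_sizes):
        # Every remaining filter is at least this large, so stop.
        if size > available:
            break
        reloaded.append(size)
        available -= size
    return reloaded
```

With 45 units available and reclaimed filters of sizes 50, 10 and 30, the sketch reloads the 10 and 30 filters. Starting from the largest would reload nothing.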

Backported from https://github.com/scylladb/scylladb/pull/18186 to 5.4.

Closes scylladb/scylladb#18660

* github.com:scylladb/scylladb:
  sstable_datafile_test: add testcase to test reclaim during reload
  sstable_datafile_test: add test to verify auto reload of reclaimed components
  sstables_manager: reload previously reclaimed components when memory is available
  sstables_manager: start a fiber to reload components
  sstable_directory_test: fix generation in sstable_directory_test_table_scan_incomplete_sstables
  sstable_datafile_test: add test to verify reclaimed components reload
  sstables: support reloading reclaimed components
  sstables_manager: add new intrusive set to track the reclaimed sstables
  sstable: add link and comparator class to support new intrusive set
  sstable: renamed intrusive list link type
  sstable: track memory reclaimed from components per sstable
  sstable: rename local variable in sstable::total_reclaimable_memory_size
scylla-5.4.7 scylla-5.4.7-candidate
2024-05-30 11:09:51 +03:00
Kefu Chai
ee942874de docs: fix typos in upgrade document
s/Montioring/Monitoring/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit f1f3f009e7)

Closes scylladb/scylladb#18911
2024-05-30 11:06:40 +03:00
Nadav Har'El
4099833587 cql3, secondary index: consistently choose index to use in a query
When a table has secondary indexes on *multiple* columns, and several
such columns are used for filtering in a query, Scylla chooses one
of these indexes as the main driver of the query, and the second
column's restriction is implemented as filtering.

Before this patch, the index to use was chosen fairly randomly, based on
the order of the indexes in the schema. This order may be different in
different coordinators, and may even change across restarts on the same
coordinators. This is not only inconsistent, it can cause outright wrong
results when using *paging* and switching (or restarting) coordinators
in the middle of a paged scan... One coordinator saves one index's key
in the paging state, and then the other coordinator gets this paging
state and wrongly believes it is supposed to be a key of a *different*
index.

The fix in this patch is to pick the index suitable for the first
indexed column mentioned in the query. This has two benefits over
the situation before the patch:

1. The decision of which index to use no longer changes between
   coordinators or across restarts - it just depends on the schema
   and the specific query.

2. Different indexes can have different "specificity" so using one
   or the other can change the query's performance. After this patch,
   the user is in control over which index is used by changing the
   order of terms in the query. A curious user can use tracing to
   check which index was used to implement a particular query.
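
The selection rule can be sketched as below. `choose_index` is a hypothetical helper that takes the restricted columns in query order and the set of indexed columns; it is not the cql3 implementation.

```python
def choose_index(query_columns, indexed_columns):
    """Return the first restricted column that has an index, else None."""
    for column in query_columns:
        if column in indexed_columns:
            return column
    return None
```

The result depends only on the query text and the schema, so every coordinator picks the same index: `WHERE b = 1 AND a = 2` uses the index on `b`, and swapping the terms uses the index on `a`.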

An xfailing test we had for this issue no longer fails, so the "xfail"
marker is removed.

Fixes #7969

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 77c61f907e)

Closes scylladb/scylladb#18963
2024-05-29 18:04:16 +03:00
Botond Dénes
27802511c0 Merge '[Backport 5.4] repair: Introduce repair_partition_count_estimation_ratio config option' from Asias He
This PR backports the "repair_partition_count_estimation_ratio config option" support to the 5.4 branch.
In addition to the main patch "repair: Introduce repair_partition_count_estimation_ratio config option",
the patch "repair: Add missing db/config.hh" is added too.

Closes scylladb/scylladb#18881

* github.com:scylladb/scylladb:
  repair: Introduce repair_partition_count_estimation_ratio config option
  repair: Add missing db/config.hh
2024-05-27 15:13:11 +03:00
Asias He
30ffce4c79 repair: Introduce repair_partition_count_estimation_ratio config option
In commit 642f9a1966 (repair: Improve
estimated_partitions to reduce memory usage), a 10% hard coded
estimation ratio is used.

This patch introduces a new config option to specify the estimation
ratio of partitions written by repair out of the total partitions.

It is set to 0.1 by default.

Fixes #18615

(cherry picked from commit 340eae007a)
2024-05-27 16:32:56 +08:00
Asias He
9869276192 repair: Add missing db/config.hh
Since commit 952dfc6157 "repair: Introduce
repair_partition_count_estimation_ratio config option", get_config() is
used. We need to include db/config.hh for that.

Spotted when backporting to 5.4 branch.

Refs #18615

Closes scylladb/scylladb#18780

(cherry picked from commit 1a03e3d5ae)
2024-05-27 16:32:56 +08:00
Takuya ASADA
53a9dfba3a dist/docker: revert dropping systemd package
In 7ce6962141 we dropped openssh-server, which also dropped the systemd
package and caused an error in Scylla Operator
(#17787).

This restores the systemd package, which fixes the issue.

Fix #17787

(cherry picked from commit 0c7aa9342d)

Closes scylladb/scylladb#18834
2024-05-23 12:00:16 +03:00
Nadav Har'El
0d4e22ef55 cql: fix hang during certain SELECT statements
The function intersection(r1,r2) in statement_restrictions.cc is used
when several WHERE restrictions were applied to the same column.
For example, for "WHERE b<1 AND b<2" the intersection of the two ranges
is calculated to be b<1.

As noted in issue #18690, Scylla is inconsistent in where it allows or
doesn't allow these intersecting restrictions. But where they are
allowed they must be implemented correctly. And it turns out the
function intersection() had a bug that caused it to sometimes enter
an infinite loop - when the intent was only to call itself once with
swapped parameters.
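
The "call itself once with swapped parameters" pattern can be sketched like this. It is a toy for upper bounds only, with each range given as a hypothetical `(bound, inclusive)` pair, and it does not reproduce the actual bug in statement_restrictions.cc.

```python
def intersect_upper(r1, r2):
    """Intersect two upper bounds, e.g. b<1 AND b<2 gives b<1."""
    # Normalize so that r1 has the smaller bound. The comparison is
    # strict, so the swapped call can never swap back and recurse
    # forever.
    if r2[0] < r1[0]:
        return intersect_upper(r2, r1)
    if r1[0] == r2[0]:
        # Same bound: inclusive only if both sides are inclusive.
        return (r1[0], r1[1] and r2[1])
    return r1
```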

This patch includes a test reproducing this bug, and a fix for the
bug. The test hangs before the fix, and passes after the fix.

While at it, I carefully reviewed the entire code used to implement
the intersection() function to try to make sure that the bug we found
was the only one. I also added a few more comments where I thought they
were needed to understand complicated logic of the code.

The bug, the fix and the test were originally discovered by
Michał Chojnowski.

Fixes #18688
Refs #18690

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 27ab560abd)

Closes scylladb/scylladb#18717
2024-05-21 16:31:21 +03:00
Botond Dénes
ae6c8753e6 Merge '[Backport 5.4] utils: chunked_vector: fill ctor: make exception safe' from ScyllaDB
Currently, if the fill ctor throws an exception,
the destructor won't be called, as the object is not fully constructed yet.

Call the default ctor first (which doesn't throw)
to make sure the destructor will be called on exception.

Fixes scylladb/scylladb#18635

- [x] Although the fix is for a rare bug, it has very low risk and so it's worth backporting to all live versions

(cherry picked from commit 64c51cf32c)

(cherry picked from commit 88b3173d03)

(cherry picked from commit 4bbb66f805)

Refs #18636

Closes scylladb/scylladb#18679

* github.com:scylladb/scylladb:
  chunked_vector_test: add more exception safety tests
  chunked_vector_test: exception_safe_class: count also moved objects
  utils: chunked_vector: fill ctor: make exception safe
2024-05-21 16:29:46 +03:00
Benny Halevy
36c66d5a8f chunked_vector_test: add more exception safety tests
For insertion, with and without reservation,
and for fill and copy constructors.

Reproduces https://github.com/scylladb/scylladb/issues/18635

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-21 11:31:23 +03:00
Benny Halevy
9413afce41 chunked_vector_test: exception_safe_class: count also moved objects
We have to account for moved objects as well
as copied objects so they will be balanced with
the respective `del_live_object` calls called
by the destructor.

However, since chunked_vector requires the
value_type to be nothrow_move_constructible,
just count the additional live object, but
do not modify _countdown or, respectively, throw
an exception, as this should be considered only
for the default and copy constructors.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-21 11:05:38 +03:00
Benny Halevy
8e20379305 utils: chunked_vector: fill ctor: make exception safe
Currently, if the fill ctor throws an exception,
the destructor won't be called, as the object is not
fully constructed yet.

Call the default ctor first (which doesn't throw)
to make sure the destructor will be called on exception.

Fixes scylladb/scylladb#18635

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-21 11:05:38 +03:00
Kefu Chai
d32550f953 service/storage_proxy: capture tr_state by copy in handle_paxos_accept()
this change is inspired by following warning from clang-tidy

```
Warning: /home/runner/work/scylladb/scylladb/service/storage_proxy.cc:884:13: warning: 'tr_state' used after it was moved [bugprone-use-after-move]
  884 |         if (tr_state) {
      |             ^
/home/runner/work/scylladb/scylladb/service/storage_proxy.cc:872:139: note: move occurred here
  872 |         auto f = get_schema_for_read(proposal.update.schema_version(), src_addr, *timeout).then([&sp = _sp, &sys_ks = _sys_ks, tr_state = std::move(tr_state),
      |                                                                                                                                           ^
```

this is not a false positive. as `tr_state` is captured by move for
constructing a variable in the capture list of a lambda which is in
turn passed to the expression evaluated to `f`.

even though the expression itself is not evaluated yet when we reference
`tr_state` to check if it is empty after preparing the expression,
`tr_state` is already moved away into the captured variable. so
at that moment, the statement of `f = f.finally(...)` is never
evaluated, because `tr_state` is always empty by then.

so before this change, the trace message is never recorded.

in this change, we address this issue by capturing `tr_state` by
copying it. as `tr_state` is backed by a `lw_shared_ptr`, the overhead is
negligible.

after this change, the tracing message is recorded.

the change that introduced this issue was 548767f91e.

please note, we could coroutinize this function to improve its
readability, but since this is a fix and should be backported,
let's start with a minimal fix, and worry about the readability
in a follow-up change.

Refs 548767f91e
Fixes #18725
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit a429e7b1fe)

Closes scylladb/scylladb#18763
2024-05-21 09:03:01 +03:00
Botond Dénes
4e9ed69a75 Merge '[Backport 5.4] mutation_fragment_stream_validating_filter: respect validating_level::none' from ScyllaDB
Even when configured to not do any validation at all, the validator still did some. This small series fixes this, and adds a test to check that validation levels in general are respected, and the validator doesn't validate more than it is asked to.

Fixes: #18662

(cherry picked from commit f6511ca1b0)

(cherry picked from commit e7b07692b6)

(cherry picked from commit 78afb3644c)

Refs #18667

Closes scylladb/scylladb#18724

* github.com:scylladb/scylladb:
  test/boost/mutation_fragment_test.cc: add test for validator validation levels
  mutation: mutation_fragment_stream_validating_filter: fix validation_level::none
  mutation: mutation_fragment_stream_validating_filter: add raises_error ctor parameter
2024-05-20 09:02:52 +03:00
Botond Dénes
7552c4b187 test/boost/mutation_fragment_test.cc: add test for validator validation levels
To make sure that the validator doesn't validate what the validation
level doesn't include.

(cherry picked from commit 78afb3644c)
2024-05-17 07:55:05 +00:00
Botond Dénes
87dcd29ec3 mutation: mutation_fragment_stream_validating_filter: fix validation_level::none
Despite its name, this validation level still did some validation. Fix
this, by short-circuiting the catch-all operator(), preventing any
validation when the user asked for none.

(cherry picked from commit e7b07692b6)
2024-05-17 07:55:04 +00:00
Botond Dénes
9e7cd767dd mutation: mutation_fragment_stream_validating_filter: add raises_error ctor parameter
When set to false, no exceptions will be raised from the validator on
validation error. Instead, it will just return false from the respective
validator methods. This makes testing simpler, asserting exceptions is
clunky.
When true (default), the previous behaviour will remain: any validation
error will invoke on_internal_error(), resulting in either std::abort()
or an exception.
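
The effect of such a flag on test code can be sketched with a toy validator. `ToyValidator` and its key-ordering check are hypothetical, and `ValueError` merely stands in for on_internal_error().

```python
class ToyValidator:
    """Toy validator: keys must arrive in strictly increasing order."""

    def __init__(self, raises_error=True):
        self.raises_error = raises_error
        self._last_key = None

    def on_key(self, key):
        if self._last_key is not None and key <= self._last_key:
            if self.raises_error:
                # Stand-in for on_internal_error().
                raise ValueError(f"out-of-order key: {key!r}")
            return False
        self._last_key = key
        return True
```

With `raises_error=False`, a test can assert on the return value directly instead of wrapping each call in exception handling.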

(cherry picked from commit f6511ca1b0)
2024-05-17 07:55:04 +00:00
Botond Dénes
63d1c763fc Merge '[Backport 5.4] tools/scylla-sstable: add scylla sstable shard-of command' from Kefu Chai
when migrating to the uuid-based identifiers, the mapping from the
integer-based generation to the shard-id is preserved. we used to have
"gen % smp_count" for calculating the shard which is responsible to host
a given sstable. despite that this is not a documented behavior, this is
handy when we try to correlate an sstable to a shard, typically when
looking at a performance issue.
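
The old, undocumented mapping for integer generations amounts to the following sketch. The function name is hypothetical.

```python
def shard_of_generation(generation, smp_count):
    """Shard for an integer-based generation: gen % smp_count."""
    return generation % smp_count
```

For example, generation 42 on a node with 8 shards maps to shard 2.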

in this change, a new subcommand is added to expose the connection
between the sstable and its "owner" shards.

Fixes #16343
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes https://github.com/scylladb/scylladb/pull/16345

(cherry picked from commit 273ee36bee)

Fixes #18381

- [x] need to backport, because we have needs in production to figure out the mapping from an sstable identifier to the shard which "owns" it.

Closes scylladb/scylladb#18681

* github.com:scylladb/scylladb:
  tools: Make sstable shard-of efficient by loading minimum to compute owners
  test/cql-pytest/test_tools.py: test shard-of with a single partition
  tools/scylla-sstable: add `scylla sstable shard-of` command
2024-05-16 11:07:47 +03:00
Pavel Emelyanov
29c892ea5a functions: Do not crash when schema is missing
Getting the token() function first tries to find a schema for the
underlying table and continues with nullptr if there is none. Later, when
creating token_fct, the schema is passed as is and dereferenced. If it is
null, a crash happens.

It used to throw before 5983e9e7b2 (cql3: test_assignment: pass optional
schema everywhere) on missing schema, but this commit changed the way
schema is looked up, so nullptr is now possible.

fixes: #18637

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit df8a446437)

Closes scylladb/scylladb#18698
2024-05-16 11:06:25 +03:00
Raphael S. Carvalho
9bb175852d tools: Make sstable shard-of efficient by loading minimum to compute owners
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18440

(cherry picked from commit d7a01598ce)
2024-05-15 14:32:43 +08:00
Kefu Chai
daf4ffb9b4 test/cql-pytest/test_tools.py: test shard-of with a single partition
test_scylla_sstable_shard_of takes lots of time preparing the keys for a
certain shard. with the debug build, it takes 3 minutes to complete the
test.

so in order to test the "shard-of" subcommand in a more efficient way,
in this change, we improve the test in two ways:

1. cache the output of `scylla types shardof`, so we can avoid the
   overhead of repeatedly running a seastar application for the
   same keys.
2. reduce the number of partitions from 42 to 1. the number of
   partitions in an sstable does not matter when testing the
   output of the "shard-of" command for a certain sstable, because
   the sstable is always generated by a certain shard.
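
the caching idea in item 1 can be sketched as below. this is a minimal illustration, not the code from test_tools.py: the helper name `run_cached` is hypothetical, and the real command line passed to the tool is not shown in this message, so the caller supplies it as a tuple.

```python
import functools
import subprocess


@functools.lru_cache(maxsize=None)
def run_cached(argv):
    """Run a command once per distinct argv and memoize its stdout.

    argv must be a tuple (hashable) so it can serve as the cache key.
    """
    return subprocess.check_output(argv, text=True)
```

repeated calls with the same tuple return the stored output without spawning a new process, which is where the saving comes from when the same keys are queried many times.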

before this change, with pytest-profiling:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      4/3    0.000    0.000  181.950   60.650 runner.py:219(call_and_report)
      4/3    0.000    0.000  181.948   60.649 runner.py:247(call_runtest_hook)
      4/3    0.000    0.000  181.948   60.649 runner.py:318(from_call)
      4/3    0.000    0.000  181.948   60.649 runner.py:262(<lambda>)
    44/11    0.000    0.000  181.935   16.540 _hooks.py:427(__call__)
    43/11    0.000    0.000  181.935   16.540 _manager.py:103(_hookexec)
    43/11    0.000    0.000  181.935   16.540 _callers.py:30(_multicall)
      361    0.001    0.000  181.531    0.503 contextlib.py:141(__exit__)
   782/81    0.001    0.000  177.578    2.192 {built-in method builtins.next}
     1044    0.006    0.000   92.452    0.089 base_events.py:1894(_run_once)
       11    0.000    0.000   91.129    8.284 fixtures.py:686(<lambda>)
    17/11    0.000    0.000   91.129    8.284 fixtures.py:1025(finish)
        4    0.000    0.000   91.128   22.782 fixtures.py:913(_teardown_yield_fixture)
      2/1    0.000    0.000   91.055   91.055 runner.py:111(pytest_runtest_protocol)
      2/1    0.000    0.000   91.055   91.055 runner.py:119(runtestprotocol)
        2    0.000    0.000   91.052   45.526 conftest.py:50(cql)
        2    0.000    0.000   91.040   45.520 util.py:161(cql_session)
        1    0.000    0.000   91.040   91.040 runner.py:180(pytest_runtest_teardown)
        1    0.000    0.000   91.040   91.040 runner.py:509(teardown_exact)
     1945    0.002    0.000   90.722    0.047 events.py:82(_run)
```

after this change:
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      4/3    0.000    0.000    8.271    2.757 runner.py:219(call_and_report)
    44/11    0.000    0.000    8.270    0.752 _hooks.py:427(__call__)
    44/11    0.000    0.000    8.270    0.752 _manager.py:103(_hookexec)
    44/11    0.000    0.000    8.270    0.752 _callers.py:30(_multicall)
      4/3    0.000    0.000    8.269    2.756 runner.py:247(call_runtest_hook)
      4/3    0.000    0.000    8.269    2.756 runner.py:318(from_call)
      4/3    0.000    0.000    8.269    2.756 runner.py:262(<lambda>)
       48    0.000    0.000    8.269    0.172 {method 'send' of 'generator' objects}
       27    0.000    0.000    5.671    0.210 contextlib.py:141(__exit__)
       11    0.000    0.000    4.297    0.391 fixtures.py:686(<lambda>)
      2/1    0.000    0.000    4.228    4.228 runner.py:111(pytest_runtest_protocol)
      2/1    0.000    0.000    4.228    4.228 runner.py:119(runtestprotocol)
        2    0.000    0.000    4.213    2.106 capture.py:877(pytest_runtest_teardown)
        1    0.000    0.000    4.213    4.213 runner.py:180(pytest_runtest_teardown)
        1    0.000    0.000    4.213    4.213 runner.py:509(teardown_exact)
        2    0.000    0.000    3.628    1.814 capture.py:872(pytest_runtest_call)
        1    0.000    0.000    3.627    3.627 runner.py:160(pytest_runtest_call)
        1    0.000    0.000    3.627    3.627 python.py:1797(runtest)
   114/81    0.001    0.000    3.505    0.043 {built-in method builtins.next}
       15    0.784    0.052    3.183    0.212 subprocess.py:417(check_output)
```

Fixes #16516
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16523

(cherry picked from commit 642652efab)
2024-05-15 14:32:43 +08:00
Kefu Chai
03a54a4c07 tools/scylla-sstable: add scylla sstable shard-of command
when migrating to the uuid-based identifiers, the mapping from the
integer-based generation to the shard-id is preserved. we used to have
"gen % smp_count" for calculating the shard which is responsible for
hosting a given sstable. although this is not a documented behavior, it
is handy when we try to correlate an sstable to a shard, typically when
looking at a performance issue.
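
for reference, the legacy "gen % smp_count" rule quoted above can be written as the short sketch below. the function name `legacy_owner_shard` is hypothetical, the rule applies to integer-based generations only, and this is not how the new subcommand computes owners.

```python
def legacy_owner_shard(generation: int, smp_count: int) -> int:
    """Shard implied by the old rule: generation modulo the shard count."""
    if smp_count <= 0:
        raise ValueError("smp_count must be positive")
    return generation % smp_count
```

for example, with 4 shards, generation 10 maps to shard 2.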

in this change, a new subcommand is added to expose the connection
between the sstable and its "owner" shards.

Fixes #16343
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16345

(cherry picked from commit 273ee36bee)
2024-05-15 14:32:42 +08:00
Lakshmi Narayanan Sreethar
4b0c60cdc3 compaction: improve partition estimates for garbage collected sstables
When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use the complete partition estimates for
them. Instead, use a fraction of each estimate based on the droppable
tombstone ratio estimate.
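
A rough sketch of the scaling idea, under the assumption that the fraction is simply the droppable tombstone ratio clamped to [0, 1]. The exact formula used by the commit is not given in this message, and the function name `gc_sstable_partition_estimate` is hypothetical.

```python
import math


def gc_sstable_partition_estimate(partition_estimate: int, droppable_ratio: float) -> int:
    """Scale a partition estimate by the droppable tombstone ratio (assumed formula)."""
    # Clamp so a noisy ratio estimate can neither inflate nor negate the result.
    ratio = min(max(droppable_ratio, 0.0), 1.0)
    return math.ceil(partition_estimate * ratio)
```

Under this assumption, an sstable estimated at 1000 partitions with a ratio of 0.25 contributes 250 partitions instead of the full 1000.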

Fixes #18283

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#18465

(cherry picked from commit d39adf6438)

Closes scylladb/scylladb#18656
2024-05-14 07:53:07 +03:00
Patryk Wrobel
28d0fc1b6b scylla_io_setup: ensure correct RLIMIT_NOFILE for iotune
The default limit of open file descriptors
per process may be too small for iotune on
certain machines with a large number of cores.

In such a case iotune reports failure due to
the inability to create files or to set up the
seastar framework.

This change configures the limit of open file
descriptors before running iotune to ensure
that the failure does not occur.

The limit is set via 'resource.setrlimit()' in
the parent process. The limit is then inherited
by the child process.
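
A minimal sketch of this parent-sets, child-inherits pattern,
not the scylla_io_setup code itself. The helper name
`run_with_raised_nofile` and the `wanted` parameter are
hypothetical; `resource.getrlimit()`, `resource.setrlimit()`
and `subprocess.run()` are standard library calls.

```python
import resource
import subprocess
import sys


def run_with_raised_nofile(argv, wanted):
    """Raise the soft RLIMIT_NOFILE if needed, then run argv as a child."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process can raise its soft limit only up to the hard limit.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    # Never lower an existing limit; only raise it when it is below the target.
    if soft != resource.RLIM_INFINITY and soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    # The child inherits the limits in effect in the parent at spawn time.
    return subprocess.run(argv, check=True, capture_output=True, text=True)
```

Setting the limit in the parent avoids having to change
the child program, at the cost of also changing the limit
for the rest of the parent's lifetime.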

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
(cherry picked from commit ec820e214c)

Closes scylladb/scylladb#18655
2024-05-14 07:48:53 +03:00
Israel Fruchter
393880f355 Update tools/cqlsh submodule to v6.0.17
Mostly a set of fixes in the area of ssl handling

* tools/cqlsh 99b2b777...9d49b385 (21):
  > cqlshlib/sslhandling: fix logic of `ssl_check_hostname`
  > cqlshlib/sslhandling.py: don't use empty userkey/usercert
  > Dockerfile: noninteractive isn't enough for answering yet on apt-get
  > fix cqlsh version print
  > cqlshlib/sslhandling: change `check_hostname` deafult to False
  > Introduce new ssl configuration for disableing check_hostname
  > set the hostname in ssl_options.server_hostname when SSL is used
  > issue-73 Fixed a bug where username and password from the credentials file were ignored.
  > issue-73 Fixed a bug where username and password from the credentials file were ignored.
  > issue-73
  > github actions: update `cibuildwheel==v2.16.5`
  > dist/debian: fix the trailer line format
  > `COPY TO STDOUT` shouldn't put None where a function is expected
  > Make cqlsh work with unix domain sockets
  > Bump python-driver version
  > dist/debian: add trailer line
  > dist/debian: wrap long line
  > Draft: explicit build-time packge dependencies
  > stop retruning status_code=2 on schema disagreement
  > Fix minor typos in the code
  > Dockerfile: apt-get update and apt-get upgrade to get latest OS packages

Ref: #18590

Closes scylladb/scylladb#18652
2024-05-14 07:47:37 +03:00
Lakshmi Narayanan Sreethar
e30a2af700 sstable_datafile_test: add testcase to test reclaim during reload
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit 4d22c4b68b)
2024-05-14 01:04:42 +05:30
Lakshmi Narayanan Sreethar
e0b4483bb8 sstable_datafile_test: add test to verify auto reload of reclaimed components
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit a080daaa94)
2024-05-14 00:10:28 +05:30
Lakshmi Narayanan Sreethar
8a6300be4c sstables_manager: reload previously reclaimed components when memory is available
When an SSTable is dropped, the associated bloom filter gets discarded
from memory, bringing down the total memory consumption of bloom
filters. Any bloom filter that was previously reclaimed from memory
because the total usage crossed the threshold can now be reloaded into
memory if the total usage still stays below the threshold. Add support
for reloading such reclaimed filters into memory when memory becomes
available.
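
A toy model of the reclaim and reload bookkeeping described above, not the
sstables_manager implementation. The class name `FilterTracker`, the choice
to mark a filter as reclaimed when it does not fit, and the smallest-first
reload order are all assumptions made for illustration.

```python
class FilterTracker:
    """Tracks which bloom filters are in memory under a total-size threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.in_memory = {}  # sstable name -> filter size
        self.reclaimed = {}  # sstable name -> filter size (not in memory)

    def total(self):
        return sum(self.in_memory.values())

    def add(self, name, size):
        # Simplification: a filter that would cross the threshold is reclaimed.
        if self.total() + size > self.threshold:
            self.reclaimed[name] = size
        else:
            self.in_memory[name] = size

    def drop(self, name):
        # Dropping an sstable discards its filter and may free room for others.
        self.in_memory.pop(name, None)
        self.reclaimed.pop(name, None)
        self._reload()

    def _reload(self):
        # Assumed policy: reload smallest filters first while they still fit.
        for name, size in sorted(self.reclaimed.items(), key=lambda kv: kv[1]):
            if self.total() + size <= self.threshold:
                self.in_memory[name] = self.reclaimed.pop(name)
```

For example, with a threshold of 100 and filters of sizes 60, 30 and 40
added in that order, the third is reclaimed; dropping the first one frees
enough room for it to be reloaded.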

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit 0b061194a7)
2024-05-14 00:10:21 +05:30