45405 Commits

Author SHA1 Message Date
Paweł Zakrzewski
b893e63b4a test: enable PER PARTITION LIMIT + GROUP BY tests 2024-11-19 09:28:01 +01:00
Paweł Zakrzewski
08eb853a96 cql3: respect PER PARTITION LIMIT for aggregates
This change adds support for PER PARTITION LIMIT for aggregate queries.
result_set_builder gets two new functions handling partition start and
end:
- accept_partition_end for notifying that a partition has been finished.
  This is also called when a page ends, so we cannot simply flush here,
  as a naive implementation could do.
- accept_new_partition, where we flush_selectors() if it's indeed a new
  partition (and not a continuation of the previous) and the query has a
  grouping: we don't want to flush on new partition in a query like
  SELECT COUNT(*) FROM foo;
2024-11-18 17:56:53 +01:00
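
To illustrate the behavior that 08eb853a96 describes, here is a minimal Python sketch. It models only the end result (at most N groups returned per partition for a grouped aggregate), not the paging or the `result_set_builder` flow. The function name `count_per_group` and the row shape `(partition_key, clustering_key)` are assumptions made for this example and do not come from the ScyllaDB source.

```python
from itertools import groupby


def count_per_group(rows, per_partition_limit):
    """Toy model of SELECT pk, ck, COUNT(*) ... GROUP BY pk, ck PER PARTITION LIMIT n.

    `rows` is an iterable of (partition_key, clustering_key) tuples, already
    sorted by partition and then by clustering key (an assumption of this sketch).
    """
    result = []
    for pk, partition_rows in groupby(rows, key=lambda r: r[0]):
        emitted = 0  # groups emitted so far for this partition
        for ck, group_rows in groupby(partition_rows, key=lambda r: r[1]):
            if emitted == per_partition_limit:
                break  # the limit applies to groups, not to input rows
            result.append((pk, ck, sum(1 for _ in group_rows)))
            emitted += 1
    return result
```

With the limit set to 2, a partition holding three groups contributes only its first two aggregated rows, while the counts inside each group are unaffected.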
Paweł Zakrzewski
8190d76dd6 cql3: selection: count input rows in the selector
This will allow result_set_builder::flush_selectors() to only flush when
there are input rows.
2024-11-18 17:56:53 +01:00
Paweł Zakrzewski
aea3c3851e cql3: selection: pass per partition limit to the result_set_builder
Aggregates require the limit to be applied from within the builder
class, so it needs to be passed to it.
2024-11-18 17:56:53 +01:00
Paweł Zakrzewski
cb1483037c cql3: show different messages for LIMIT and PER PARTITION LIMIT in get_limit
select_statement::get_limit is used to evaluate the LIMIT value for both
LIMIT and PER PARTITION LIMIT. This change fixes the error message for
incorrect values passed by the user.
2024-11-18 17:56:53 +01:00
Botond Dénes
fed2c6ba83 sstables/mx/reader: release column value buffer after consumed
data_consume_rows_context_m has a _column_value buffer it uses to read
key and column values into, preparing for parsing and consuming them.
This buffer is reset (released) in a few different cases:
* When using it for key - after consuming its content
* When using it for column value - when a column has no value

However, the buffer is not released when used for a column value and the
column is consumed. This means that if a large column is read from the
sstable, this buffer can potentially linger and keep consuming memory
until either one of the other release scenarios is hit, or the reader is
destroyed.
Add a third release scenario, releasing the buffer after the row end was
consumed. This allows the buffer to be re-used between columns of the
same row, at the same time ensuring that a large buffer will not linger.

This patch can almost halve the memory consumption of reads in certain
circumstances. Case in point: the test
test_reader_concurrency_semaphore_memory_limit_engages starts to fail
after this fix, because the read doesn't trigger the OOM limit anymore
and needs doubling of the concurrency to keep passing.

This issue was found in a dtest
(`test_ics_refresh_with_big_sstable_files`), which writes some large
cells of up to 7MiB. After reading the row containing this large cell,
the reader holds on to the 7MiB buffer causing the semaphore's OOM
protection to kick in down the line.

Fixes: https://github.com/scylladb/scylladb/issues/21160

Closes scylladb/scylladb#21132
2024-11-14 17:24:53 +01:00
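
The release scenario added in fed2c6ba83 can be pictured with a small Python toy. This is only a sketch of the idea (reuse one buffer across the columns of a row, then drop it once the row end is consumed). The class `ToyRowReader` and its methods are hypothetical and do not mirror `data_consume_rows_context_m`.

```python
class ToyRowReader:
    """Toy stand-in for a reader with one reusable column-value buffer."""

    def __init__(self):
        self._column_value = bytearray()

    def consume_row(self, raw_columns):
        values = []
        for raw in raw_columns:
            # Reuse the same buffer object for every column of the row.
            self._column_value[:] = raw
            values.append(bytes(self._column_value))
        # Release after the row end is consumed, so a large cell does not
        # stay allocated until the reader is destroyed.
        self._column_value = bytearray()
        return values

    def held_bytes(self):
        # Number of bytes the reader still holds between rows.
        return len(self._column_value)
```

Without the release at the end of `consume_row`, a row ending in a 7MiB cell would leave `held_bytes()` at 7MiB until another release scenario is hit.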
Kefu Chai
00810e6a01 treewide: include seastar/core/format.hh instead of seastar/core/print.hh
The latter includes the former and in addition to `seastar::format()`,
`print.hh` also provides helpers like `seastar::fprint()` and
`seastar::print()`, which are deprecated and not used by scylladb.

Previously, we included `seastar/core/print.hh` to use
`seastar::format()`. In seastar 5b04939e, `seastar::format()` was
extracted into `seastar/core/format.hh`, which allows us
to include a much smaller header.

In this change, we just include `seastar/core/format.hh` in place of
`seastar/core/print.hh`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21574
2024-11-14 17:45:07 +02:00
Michael Pedersen
309f1606ae docs: correct the storage size for n2-highmem-32 to 9000GB
updated storage size for n2-highmem-32 to 9000GB as this is default in SC

Closes scylladb/scylladb#21537
2024-11-14 17:16:44 +03:00
Pavel Emelyanov
298602b32d Merge 'message: do not include unused headers' from Kefu Chai
these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed.

also, update the workflow to prevent future regressions of including unused headers in this subdirectory.

---

it's a cleanup, hence no need to backport.

Closes scylladb/scylladb#21560

* github.com:scylladb/scylladb:
  .github: add "message" to CLEANER_DIR
  message: do not include unused headers
2024-11-14 17:15:16 +03:00
Kefu Chai
6955b8238e docs: fix monospace formatting for rm command
Add missing space before `rm` to ensure proper rendering
in monospace font within documentation.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21576
2024-11-14 17:14:32 +03:00
Kefu Chai
5b8c2ad600 test/object_store: various cleanups
just for better readability:

* chain comparison statement when appropriate
* do not use f-string when there are no place holders
* use list comprehension when initializing a set
* remove unused import statement
* move import statement of the standard library before
  those which import the 3rd-party modules
* put two empty lines in-between top-level functions.
  this is recommended by PEP8.
* remove the extraneous spaces around `=` in parameter
  list.
* remove the extraneous spaces in a list like `[ 1, 2, 3 ]`
  so it looks like `[1, 2, 3]`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21561
2024-11-14 16:57:15 +03:00
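
As an illustration of the cleanups listed in 5b8c2ad600, the following Python sketch shows what code looks like after each rule is applied. The function names and values are made up for the example and are not taken from test/object_store.

```python
import os  # standard-library imports come before third-party ones


def in_range(value, low=0, high=10):  # no spaces around `=` in the parameter list
    # Chained comparison instead of `low <= value and value < high`.
    return low <= value < high


def basenames(paths):
    # A comprehension builds the set directly instead of filling it in a loop.
    return {os.path.basename(p) for p in paths}


def banner():
    # Plain string literal: no f-prefix when there are no placeholders.
    return "backup finished"


SIZES = [1, 2, 3]  # no padding inside the brackets
```

Top-level functions are separated by two blank lines, as PEP 8 recommends.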
Nadav Har'El
99d420daa5 test: move a materialized-view test from boost to cqlpy
This patch moves (after straightforward translation) the test
"test_views_with_future_tombstone", a regression test for #5793,
from the C++ boost framework to the Python cqlpy framework.

The main motivation for this move is the ease of debugging failures:
During the work on a patch for #20679 (eliminating read-before-write)
this test began to fail, and understanding where the C++ failed was
near impossible: the Boost test framework reports that the test failed,
but not in which line or why, and adding printouts to this huge source
file requires a ridiculous amount of time for recompilation every time.
In contrast, the new pytest-based version shows exactly where the
error is, beautifully:

```
>               assert [] == list(cql.execute(f'select * from {mv}'))
E               assert [] == [Row(b=2, a=1, c=3, d=4, e=5)]
test_materialized_view.py:1614: AssertionError
```

It shows exactly which assertion failed, and exactly what were the
values that were compared. Beautiful and super helpful for debugging.

Beyond the ease of debugging, moving this (and later, other) test to
the cql-pytest framework has additional advantages:

1. The test was misplaced, in the cql_test source file, and it belongs
   with materialized views tests so let's use this opportunity to move
   it to the right place.
2. Can easily run the same test on multiple versions of Scylla, and
   also on Cassandra. It's a good way to confirm the test is correct.
3. No need to recompile the test after every attempt to fix the bug.
   The cql_query_test.cc is huge - over 6,000 lines - and takes over
   a minute to compile after every attempt to fix a bug.

Refs #16134 (the issue asks to move all MV tests to cql-pytest)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#21552
2024-11-14 16:55:58 +03:00
André LFA
703e6f3b1f Update report-scylla-problem.rst removing references to old Health Check Report
Closes scylladb/scylladb#21467
2024-11-14 15:12:26 +02:00
Anna Stuchlik
3bd2ecff63 doc: add the 6.0-to-2024.2 upgrade guide-from-6
This commit adds an upgrade guide from ScyllaDB 6.0
to ScyllaDB Enterprise 2024.2.

Fixes https://github.com/scylladb/scylladb/issues/20063
Fixes https://github.com/scylladb/scylladb/issues/20062
Refs https://github.com/scylladb/scylla-enterprise/issues/4544

Closes scylladb/scylladb#20133
2024-11-14 15:07:43 +02:00
Kefu Chai
1cedc45c35 doc: import the new pub keys used to sign the package
before this change, when user follows the instruction, they'd get

```console
$ sudo apt-get update
Hit:1 http://us-east-1.ec2.archive.ubuntu.com/ubuntu noble InRelease
Hit:2 http://us-east-1.ec2.archive.ubuntu.com/ubuntu noble-updates InRelease
Hit:3 http://us-east-1.ec2.archive.ubuntu.com/ubuntu noble-backports InRelease
Hit:4 http://security.ubuntu.com/ubuntu noble-security InRelease
Get:5 https://downloads.scylladb.com/downloads/scylla/deb/debian-ubuntu/scylladb-6.2 stable InRelease [7550 B]
Err:5 https://downloads.scylladb.com/downloads/scylla/deb/debian-ubuntu/scylladb-6.2 stable InRelease
 The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A43E06657BAC99E3
Reading package lists... Done
W: GPG error: https://downloads.scylladb.com/downloads/scylla/deb/debian-ubuntu/scylladb-6.2 stable InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A43E06657BAC99E3
E: The repository 'https://downloads.scylladb.com/downloads/scylla/deb/debian-ubuntu/scylladb-6.2 stable InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
```

because the packages were signed with a different keyring.

in this change, we import the new pubkey, so that the package manager
can verify the new packages (2024.2+ and 6.2+) signed with the new key.

see also https://github.com/scylladb/scylla-ansible-roles/issues/399
and https://forum.scylladb.com/t/release-scylla-manager-3-3-1/2516
for the announcement on using the new key.

Fixes scylladb/scylladb#21557
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21524
2024-11-14 13:33:47 +02:00
Botond Dénes
89c68d4ba7 Update seastar submodule
* seastar 1b0a3087...a5432364 (2):
  > rpc: Emplace buffers into vector, not push
  > core: reactor_config: add reserve_io_control_blocks

Refs: https://github.com/scylladb/scylladb/issues/19185

Closes scylladb/scylladb#21573
2024-11-14 12:44:10 +02:00
Tomasz Grabiec
1d0c6aa26f utils: UUID: Make get_time_UUID() respect the clock offset
schema_change_test currently fails due to failure to start a cql test
env in unit tests after the point where this is called (in one of the
test cases):

   forward_jump_clocks(std::chrono::seconds(60*60*24*31));

The problem manifests with a failure to join the cluster due to
missing_column exception ("missing_column: done") being thrown from
system_keyspace::get_topology_request_state(). It's a symptom of
join request being missing in system.topology_requests. It's missing
because the row is expired.

When a request is created, we insert the mutations with an intended TTL
of 1 month. The actual TTL value is computed like this:

  ttl_opt topology_request_tracking_mutation_builder::ttl() const {
      return std::chrono::duration_cast<std::chrono::seconds>(std::chrono::microseconds(_ts)) + std::chrono::months(1)
          - std::chrono::duration_cast<std::chrono::seconds>(gc_clock::now().time_since_epoch());
  }

_ts comes from the request_id, which is supposed to be a timeuuid set
from current time when request starts. It's set using
utils::UUID_gen::get_time_UUID(). It reads the system clock without
adding the clock offset, so after forward_jump_clocks(), _ts and
gc_clock::now() may be far off. In some cases the accumulated offset
is larger than 1 month and the ttl becomes negative, causing the
request row to expire immediately and failing the boot sequence.

The fix is to use db_clock, which respects offsets and is consistent
with gc_clock.

The test doesn't fail in CI because there each test case runs in a
separate process, so there is no bootstrap attempt (by new cql test
env) after forward_jump_clocks().

Closes scylladb/scylladb#21558
2024-11-14 10:32:07 +02:00
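
The arithmetic behind the negative TTL in 1d0c6aa26f can be worked through in Python. This is a sketch under stated assumptions: `ttl_seconds` is a hypothetical stand-in for the C++ `ttl()` shown in the message, the wall-clock value is arbitrary, and `MONTH_S` is the length of `std::chrono::months(1)` (one twelfth of the average Gregorian year).

```python
MONTH_S = 2_629_746          # std::chrono::months(1) in seconds
JUMP_S = 60 * 60 * 24 * 31   # argument passed to forward_jump_clocks() in the test


def ttl_seconds(ts_s, now_s):
    # Same shape as the C++ expression: _ts + 1 month - gc_clock::now().
    return ts_s + MONTH_S - now_s


wall_clock = 1_731_500_000   # arbitrary system-clock reading, in seconds

# Before the fix: _ts ignores the clock offset, gc_clock::now() includes it.
buggy = ttl_seconds(ts_s=wall_clock, now_s=wall_clock + JUMP_S)

# After the fix: both sides see the same offset, so it cancels out.
fixed = ttl_seconds(ts_s=wall_clock + JUMP_S, now_s=wall_clock + JUMP_S)

print(buggy, fixed)  # -48654 2629746
```

A 31-day jump exceeds the roughly 30.44-day month by 48654 seconds, which is why the request row expires immediately in the buggy case.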
Botond Dénes
c14ace54e3 Merge 'Add testcases for tablet migration involving views' from Lakshmi Narayanan Sreethar
Added test cases to reproduce issues with tablet migration involving views.

Refs #19149
Refs #21564

No backport needed as the PR adds only testcases.

Closes scylladb/scylladb#21566

* github.com:scylladb/scylladb:
  topology_custom/test_tablets.py: add testcase for tablet migration of staged sstables
  topology_custom/test_tablets.py: add testcase for tablet migration with unbuilt views
2024-11-14 08:32:38 +02:00
Lakshmi Narayanan Sreethar
c1d447c932 topology_custom/test_tablets.py: add testcase for tablet migration of staged sstables
Tablet migration mixes staged and non staged sstables causing base view
inconsistencies in the pending replica. Added a testcase to reproduce
this issue.

Refs #19149.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-11-13 18:17:20 +05:30
Lakshmi Narayanan Sreethar
4cc12e1b7e topology_custom/test_tablets.py: add testcase for tablet migration with unbuilt views
When a tablet gets migrated right after view was created but before the
view builder registered the new view, the pending replica will not
register the sstables in the tablet for view building causing base view
inconsistencies. This commit adds a testcase to reproduce the issue.

Refs #21564

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-11-13 18:17:14 +05:30
Nadav Har'El
3fda9651cc test/alternator: option to run alternator tests against specific release
We recently added a "--release <version>" option to test/cql-pytest/run
to run a cql-pytest test against a released version of Scylla, downloaded
automatically from ScyllaDB's precompiled binary repository. This patch
adds the same capability also to test/alternator/run - allowing to run
a current test/alternator test on older releases of Scylla. The
implementation in this patch reuses the same implementation from the
cql-pytest patch.

Here is an example use case: the pull request #19941 claimed that
a certain bug fix was backported to release 6.0. Was it? Let's run
the test reproducing that bug on two releases:

test/alternator/run --release 6.0 test_streams.py::test_stream_list_tables
test/alternator/run --release 6.1 test_streams.py::test_stream_list_tables

It shows that the test passes on 6.1 (so the bug is fixed there) but the
test fails on 6.0. It turns out that although the fix was backported to
branch-6.0, this happened shortly after 6.0.4 was released and no later
6.0 minor release came afterwards! So the bug wasn't actually fixed
on any official release of 6.0.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#21343
2024-11-13 09:38:09 +02:00
Kefu Chai
6d65e1a73c Update seastar submodule
* seastar fba36a3d...1b0a3087 (9):
  > program-options: add missing include <memory>
  > reactor: Always retry waitpid
  > treewide: include core/format.hh when appropriate
  > print: remove unused fmt/ostream.h
  > print: extract format() into format.hh
  > net: route error messages to logger instead of to stderr
  > net: stop printing when reaching unreachable branch
  > reactor: Mark drain() private
  > rpc: optimize tuple deserialization when the types are default-constructible

Closes scylladb/scylladb#21520
2024-11-13 09:33:00 +02:00
Kefu Chai
e0525bbac0 .github: add "message" to CLEANER_DIR
in order to prevent future inclusion of unused headers, let's add the
"message" subdirectory to CLEANER_DIR, so that this workflow can
identify regressions in the future.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-11-13 14:29:52 +08:00
Kefu Chai
876c4ec78a message: do not include unused headers
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-11-13 14:29:52 +08:00
Emil Maskovsky
92db2eca0b test/topology_custom: fix the flaky test_raft_recovery_stuck
The test is only sending a subset of the running servers for the rolling
restart. The rolling restart is checking the visibility of the restarted
node agains the other nodes, but if that set is incomplete some of the
running servers might not have seen the restarted node yet.

Improved the manager client rolling restart method to consider all the
running nodes for checking the restarted node visibility.

Fixes: scylladb/scylladb#19959

Closes scylladb/scylladb#21477
2024-11-12 16:38:28 +01:00
Kefu Chai
45e8d6793e test: include fmt/iostream.h and iostream when appropriate
this change was created in the same spirit of aebb5329, which
included the fmt/iostream.h and iostream when appropriate so that
the tree can build with seastar submodule including e96932b0.
in the seastar change, we stopped including unused `fmt/ostream.h`
in a public header in seastar, so the parent projects relying on
the header to indirectly include fmt/ostream.h and iostream would
have to include these headers explicitly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21525
2024-11-12 17:34:08 +02:00
Yaron Kaikov
3bc2b34a18 ./github/scripts/label_promoted_commits.py: fix search for closes prefix on merge PRs
In cc71077e33, I added a check that looks for the `closes` prefix in the
last line of the PR body.

This turns out to be wrong, since in a merge PR the `closes` prefix is
not on the last line.

Instead, search for the last line that contains the `closes` prefix.

Closes scylladb/scylladb#21545
2024-11-12 13:56:37 +02:00
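
The search that 3bc2b34a18 describes can be sketched in Python as follows. This is not the code of `label_promoted_commits.py`: the regular expression `CLOSES_RE` and the function `last_closed_pr` are assumptions made for illustration, based only on the `Closes scylladb/scylladb#NNNNN` and `Closes #NNNNN` forms visible in this log.

```python
import re

# Hypothetical pattern: "Closes #123" or "Closes owner/repo#123", case-insensitive.
CLOSES_RE = re.compile(r"^\s*closes\s+(?:[\w.-]+/[\w.-]+)?#(\d+)\s*$", re.IGNORECASE)


def last_closed_pr(message):
    """Return the PR number from the last line carrying a `closes` prefix, or None."""
    for line in reversed(message.splitlines()):
        match = CLOSES_RE.match(line)
        if match:
            return int(match.group(1))
    return None
```

Scanning from the bottom handles both cases mentioned in the two commits: a merge message where the `Closes` line is followed by the list of merged commits, and a message that carries more than one `Closes` line.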
Botond Dénes
1c212df62d Merge 'scylla_raid_setup: fix failure on SELinux package installation' from Takuya ASADA
After merging 5a470b2bfb, we found that scylla_raid_setup fails on offline mode
installation.
This is because pkg_install() just prints an error and exits the script in offline mode instead of installing packages, since offline mode is not supposed to be able to connect to the
internet.
It seems to occur because of the missing "policycoreutils-python-utils"
package, which provides the "semanage" command.
So we need to implement the relabeling patch without using the command.

Fixes https://github.com/scylladb/scylladb/issues/21441

Also, since Amazon Linux 2 has a different package name for semanage, we need to
adjust the package name.

Fixes https://github.com/scylladb/scylladb/issues/21351

Closes scylladb/scylladb#21474

* github.com:scylladb/scylladb:
  scylla_raid_setup: support installing semanage on Amazon Linux 2
  scylla_raid_setup: fix failure on SELinux package installation
2024-11-12 09:20:56 +02:00
Pavel Emelyanov
b158ca7346 api: Remove param field from req_param
The req_param class is used to help parsing http request parameters from
strings into exact types (typically some simple types like strings,
integrals or booleans). It has three fields:

- name -- the parameter name
- param -- the parameter string value
- value -- the parameter value of desired type

The `param` thing is not really needed, it's only used by a few places
that print it into logs, but they may as well just print the `value`
thing itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#21502
2024-11-11 17:47:55 +02:00
Pavel Emelyanov
87ec2af6f0 api: Remove dead if-branch that collects all tables from ks
After calling api::parse_tables() the resulting vector of table names
cannot be empty, because if the parameter is missing, the parse_tables
function returns all tables from keyspace anyway.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#21501
2024-11-11 17:46:38 +02:00
Botond Dénes
30cb58b3e4 Merge 'compaction: use better partition estimate for split compaction' from Lakshmi Narayanan Sreethar
Split compaction divides the partitions in an existing sstable into two groups and writes them into two new sstables, which replace the original one. The partition count from the original sstable is used as an estimate when writing the new ones, but this estimate is not accurate as the partitions are split between the two new sstables and each will contain only a portion of the original partition count. This also causes the bloom filters to be rebuilt at the end of compaction, as they were initially built with inaccurate estimates.

Fix this by using a better estimate for the output sstables, which is half the original partition count.

Fixes #20253

Improvement; No need to backport.

Closes scylladb/scylladb#20908

* github.com:scylladb/scylladb:
  compaction: use better partition estimate for split compaction
  compaction::table_state: implement `get_token_range_after_split()` wrapper
  replica/table: implement `get_token_range_after_split()` wrappers
  tablet_map: introduce `get_token_range_after_split()`
  tablet_map: implement existing get_token_range() using the new variant
  tablet_map: introduce `get_token_range()` variant
  tablet_map: introduce `get_last_token()` variant
2024-11-11 16:25:08 +02:00
Kefu Chai
3fb1112c18 readers/multishard: fix a typo in comment
s/fullfill/fulfill/

this misspelling was identified by the codespell workflow.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21521
2024-11-11 16:14:47 +02:00
Kefu Chai
88410b75c9 test/object_store: verify backup fails on missing snapshot
Add test to ensure backup tasks properly handle non-existent snapshots
by:

- Verifying backup task reports failure status
- Ensuring error is propagated through task status API

Previously untested edge case when backing up a snapshot that doesn't
exist in the test_backup.py tests.

Refs scylladb/scylladb#21381
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21385
2024-11-11 13:50:07 +03:00
Yaron Kaikov
cc71077e33 .github/scripts/label_promoted_commits.py: only match the Close tag in the last line in the commit message
When a backport PR is promoted to the release branch, we automatically close the backport PR (since GitHub will only close the one based on the default branch) and update the labels in the original PRs

In a situation when we have multiple `closes` prefixes, the script will use the first one (which is not the correct one), see 3ddb61c90e

Fixing this by always using the last line with the `closes` prefix

Closes scylladb/scylladb#21498
2024-11-11 11:04:33 +02:00
Dani Tweig
381faa2649 Rename .github/ISSUE_TEMPLATE.md to .github/ISSUE_TEMPLATE/bug_report.yml
GitHub issue template process has changed.
The issue template file should be replaced and renamed.

Closes scylladb/scylladb#21518
2024-11-11 11:00:38 +02:00
Takuya ASADA
6fe09a5a16 scylla_raid_setup: support installing semanage on Amazon Linux 2
Since Amazon Linux 2 has a different package name for semanage, we need to
adjust the package name.

Fixes #21351
2024-11-11 17:27:24 +09:00
Takuya ASADA
7ad5e69c54 scylla_raid_setup: fix failure on SELinux package installation
After merging 5a470b2, we found that scylla_raid_setup fails on offline mode
installation.
This is because pkg_install() just prints an error and exits the script in offline mode instead of installing packages, since offline mode is not supposed to be able to connect to the
internet.
It seems to occur because of the missing "policycoreutils-python-utils"
package, which provides the "semanage" command.
So we need to implement the relabeling patch without using the command.

Fixes #21441
2024-11-11 17:27:24 +09:00
Nikita Kurashkin
3032d8ccbf add check to refuse usage of DESC TABLE on a materialized view
Fixes #21026

Closes scylladb/scylladb#21500
2024-11-11 10:23:30 +02:00
Yaron Kaikov
2596d1577b ./github/workflows/add-label-when-promoted.yaml: Run auto-backport only on default branch
In https://github.com/scylladb/scylladb/pull/21496#event-15221789614
```
scylladbbot force-pushed the backport/21459/to-6.1 branch from 414691c to 59a4ccd Compare 2 days ago
```

Backport automation is triggered by `push`, but it should only start from the `master` branch (or the `enterprise` branch in Enterprise), so we need to verify this by also checking for the default branch.

Fixes: https://github.com/scylladb/scylladb/issues/21514

Closes scylladb/scylladb#21515
2024-11-11 09:16:35 +02:00
Lakshmi Narayanan Sreethar
eb4b407085 compaction: use better partition estimate for split compaction
Split compaction divides the partitions in an existing sstable into two
groups and writes them into two new sstables, which replace the original
one. The partition count from the original sstable is used as an
estimate when writing the new ones, but this estimate is not accurate as
the partitions are split between the two new sstables and each will
contain only a portion of the original partition count. This also causes
the bloom filters to be rebuilt at the end of compaction, as they were
initially built with inaccurate estimates.

Fix this by using a better estimate for the output sstables based on the
token ranges written to them.

Fixes scylladb#20253

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-11-11 12:26:51 +05:30
Lakshmi Narayanan Sreethar
67dad99ab5 compaction::table_state: implement get_token_range_after_split() wrapper
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-11-11 12:24:00 +05:30
Lakshmi Narayanan Sreethar
c4db4abcae replica/table: implement get_token_range_after_split() wrappers
Expose the functionality of `tablet_map::get_token_range_after_split()`
via the replica::table class.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-11-11 12:24:00 +05:30
Lakshmi Narayanan Sreethar
4130187e78 tablet_map: introduce get_token_range_after_split()
Added `get_token_range_after_split()`, which returns the token range the
given token will belong to after a tablet split. This is required to
estimate the token ranges of resultant sstables after a split.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-11-11 12:23:47 +05:30
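
To make the idea behind 4130187e78 and the related `tablet_map` commits concrete, here is a Python sketch of how a token range after a split could be derived. It rests on assumptions that are not confirmed by this log: tokens are treated as signed 64-bit integers, the token space is divided into `2**log2_tablets` equal ranges, and ranges are returned as inclusive `(first, last)` pairs. The function names echo the commit titles, but the signatures are hypothetical.

```python
BIAS = 1 << 63  # shifts signed 64-bit tokens into the unsigned range


def tablet_id(token, log2_tablets):
    # The top `log2_tablets` bits of the biased token select the tablet.
    return (token + BIAS) >> (64 - log2_tablets)


def get_last_token(tablet, log2_tablets):
    # Largest token owned by `tablet` when there are 2**log2_tablets tablets.
    return (((tablet + 1) << (64 - log2_tablets)) - 1) - BIAS


def get_token_range(tablet, log2_tablets):
    first = (tablet << (64 - log2_tablets)) - BIAS
    return first, get_last_token(tablet, log2_tablets)


def get_token_range_after_split(token, log2_tablets):
    # A split doubles the tablet count, so look the token up with one more bit.
    new_log2 = log2_tablets + 1
    return get_token_range(tablet_id(token, new_log2), new_log2)
```

Under these assumptions, with two tablets (`log2_tablets=1`), token 5 lives in the range `(0, 2**63 - 1)`; after a split it falls into `(0, 2**62 - 1)`, which is the kind of narrower range a split compaction could use when estimating its output sstables.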
Lakshmi Narayanan Sreethar
1e2c1d7f25 tablet_map: implement existing get_token_range() using the new variant
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-11-11 12:22:14 +05:30
Lakshmi Narayanan Sreethar
f536c7d15b tablet_map: introduce get_token_range() variant
Implement `get_token_range()` to return the token range of the specified
tablet with the given `log2_tablets` size. This will be used to deduce
which range a token will end up in if the tablet is split.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-11-11 12:21:05 +05:30
Lakshmi Narayanan Sreethar
f655136091 tablet_map: introduce get_last_token() variant
Implement `get_last_token()`, which returns the largest token owned
by the specified tablet with the given `log2_tablets` size. This will be
used to deduce token ranges for a tablet with any arbitrary
`tablet_count`.

Also, update the existing public `get_last_token()` to utilize the new
variant.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-11-11 12:18:04 +05:30
Pavel Emelyanov
57af69e15f Merge 'Add retries to the S3 client' from Ernest Zaslavsky
1. Add `retry_strategy` interface and default implementation for exponential back-off retry strategy.
2. Add new S3 related errors, also introduce additional errors to describe pure http errors that have no additional information in the body.
3. Add retries to the s3 client, all retries are coordinated by an instance of `retry_strategy`. In a case of error also parse response body in attempt to retrieve additional and more focused error information as suggested by AWS. See https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html. Introduce `aws_exception` to carry the original `aws_error`.
4. Discard whatever exception is thrown in `abort_upload` when aborting multipart upload since we don't care about cleanly aborting it since there are other means to clean up dangling parts, for example `rclone cleanup` or S3 bucket's Lifecycle Management Policy.
5. Add tests to cover retries, and retry exhaustion. Also add tests for jumbo upload.
6. Add the S3 proxy which is used to randomly inject retryable S3 errors to test the "retry" part of the S3 client. Switch the `s3_test` to use the S3 proxy. `s3_test` surfaced a `put_object` problem that caused a segmentation fault when retrying; this is fixed.
7. Extend the `s3_test` to use both `minio` and `proxy` configurations.
8. Add parameter to the proxy to seed the error injection randomization to make it replayable.

fixes: #20611
fixes: #20613

Closes scylladb/scylladb#21054

* github.com:scylladb/scylladb:
  aws_errors: Make error messages more verbose.
  test: Make the minio proxy randomization re-playable
  test/boost/s3_test: add error injection scenarios to existing test suite
  test: Switch `s3_test` to use proxy
  test: Add more tests
  client: Stop returning error on `DELETE` in multipart upload abortion
  client: Fix sigsegv when retrying
  client: Add retries
  client: Adjust `map_s3_client_exception` to return exception instance
  aws_errors: Change aws_error::parse to return std::optional<>
  aws_errors: Add http errors mapping into aws_error
  client: Add aws_exception mapping
  aws_error: Add `aws_exception` to carry original `aws_error`
  aws_errors: Add new error codes
  client: Introduce retry strategy
2024-11-11 08:35:55 +03:00
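
The retry coordination described in the merge above (57af69e15f) can be sketched in Python. This is an illustration of exponential back-off in general, not the C++ `retry_strategy` interface: the class `ExponentialBackoffStrategy`, the exception `RetryableError`, the helper `call_with_retries`, and all default values are assumptions made for the example.

```python
import time


class RetryableError(Exception):
    """Stands in for an error the client considers safe to retry."""


class ExponentialBackoffStrategy:
    def __init__(self, max_retries=3, base_delay=0.1, max_delay=5.0):
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.max_delay = max_delay

    def should_retry(self, attempted_retries):
        return attempted_retries < self.max_retries

    def delay_before_retry(self, attempted_retries):
        # The delay doubles with every retry and is capped at max_delay.
        return min(self.base_delay * (2 ** attempted_retries), self.max_delay)


def call_with_retries(operation, strategy, sleep=time.sleep):
    attempted_retries = 0
    while True:
        try:
            return operation()
        except RetryableError:
            if not strategy.should_retry(attempted_retries):
                raise  # retries exhausted: surface the last error
            sleep(strategy.delay_before_retry(attempted_retries))
            attempted_retries += 1
```

Passing `sleep` as a parameter keeps the sketch easy to test: a test can record the requested delays instead of waiting, much as the proxy in the merge injects errors to exercise the retry path.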
Takuya ASADA
92af373fab unified: drop scylla-tools from unified package
In b8634fb, we dropped scylla-tools from rpm and deb; we should drop it
from the unified package as well.

Closes #20739

Closes scylladb/scylladb#20740
2024-11-10 12:56:43 +02:00
Avi Kivity
b58dbe57aa Merge 'repair: introduce and use buffer size hint for mixed-shard multishard reader' from Botond Dénes
Add a buffer hint to the multishard reader. This is an internal hint, used by the multishard reader to provide a hint to the shard reader, on how much data exactly is needed by the multishard reader from the respective shard. This hint allows eliminating extraneous cross-shard round-trips and possible shard reader evict-recreate cycles. Building on this, repair sets its own row buffer size as the max buffer size on the multishard reader, ensuring that the row buffer is filled with the minimum amount of cross-shard round trips and minimal reader recreation.
To further eliminate unnecessary evictions, this PR also disables the multishard reader's read-ahead which is a mechanism that was designed to reduce latency for user-reads but it can be too aggressive for repair, causing unnecessary extra congestion on the already struggling streaming semaphores.

Refs: https://github.com/scylladb/scylladb/issues/18269
Fixes: https://github.com/scylladb/scylladb/issues/21113

The performance impact was measured with an SCT test, which creates a cluster of 3 nodes with 16 shards, then adds a 4th one with 12 shards.
Currently, bootstrap time is what suffers the most in mixed-shard clusters; see below for the improvement measured during bootstrap:

|              | master        | buffer-hint   | metric                                              |
| ------------ | ------------- | ------------- | --------------------------------------------------- |
| evictions    |          0.9M |         93.0K | scylla_database_paused_reads_permit_based_evictions |
| read (bytes) |          9.0T |          3.9T | scylla_reactor_aio_bytes_read                       |
| read (ops)   |         88.0M |         33.5M | scylla_reactor_aio_reads                            |
| time         |         56min |         20min | N/A                                                 |

This is a performance improvement, no backport required.

Closes scylladb/scylladb#20815

* github.com:scylladb/scylladb:
  test/boost/mutation_reader_test: add test for multishard reader buffer hint
  repair/row_level: disable read-ahead
  db/config: introduce repair_multishard_reader_enable_read_ahead
  readers/multishard: implement the read_ahead flag
  replica/database: make_multishard_streaming_reader(): expose the read_ahead parameter
  readers/multishard: add read_ahead parameter
  repair/row_level: set max buffer size on multishard reader
  replica/database: make_multishard_streaming_reader(): expose buffer_hint parameter
  db/config: introduce enable_repair_multishard_reader_buffer_hint
  readers/multishard: multishard_reader: pass hint to shard_reader
  readers/multishard: shard_reader_v2::fill_reader_buffer(): respect the hint
  readers/multishard: propagate fill_buffer_hint to shard_reader:fill_reader_buffer()
  readers/multishard: shard_reader: extract buffer-fill into its own method
2024-11-10 12:55:19 +02:00
Kefu Chai
961a53f716 dist: systemd: use default KillMode
before this change, we explicitly set the KillMode of the scylla-service
service unit to "process". according to
https://www.freedesktop.org/software/systemd/man/latest/systemd.kill.html,

> If set to process, only the main process itself is killed (not recommended!).

and the document suggests using "control-group" over "process".
but scylla server is not a multi-process server, it is a multi-threaded
server. so it should not make any difference even if we switch to
the recommended "control-group".

in light of the "defunct" scylla processes we've been seeing after
stopping the scylla service using systemd, we are wondering if we should
try to change the `KillMode` to "control-group", which is the default
value of this setting.

in this change, we just drop the setting so that systemd stops the
service by stopping all processes in the control group of this unit.

Refs scylladb/scylladb#21507

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21508
2024-11-09 20:07:11 +02:00