Commit Graph

44318 Commits

Author SHA1 Message Date
Amnon Heiman
9b5f29b6bc test/alternator/util.py: Allow override BillingMode
This patch adds the ability to override the BillingMode. If a
BillingMode is provided to the create_test_table function, it will
override the default BillingMode.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-09-10 17:06:47 -04:00
Amnon Heiman
c76347032d alternator/executor.cc: Store ProvisionedThroughput
This patch adds the ability to store and retrieve the
ProvisionedThroughput in a table.

The information is stored in the table tags. We use the TTL convention
used in alternator, and the tags will be: system:provisioned_rcu and
system:provisioned_wcu.

verify_billing_mode function now return a struct with the billing mode
information.

The code of describe_table now check if the provision tags exists and
return the RCU and WCU accordingly.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-09-10 17:06:40 -04:00
Botond Dénes
fc690a60d8 Update tools/cqlsh submodule
* tools/cqlsh 86a280a1...b09bc793 (6):
  > build(deps): bump actions/download-artifact in /.github/workflows
  > cqlshlib/test: Add test_formatting.py
  > cqlshlib/test: Use assertEqual instead of assertEquals
  > cqlsh.py: Send DESCRIBE statement to server before parsing
  > cqlsh.py: Fix indentation
  > cqlsh.py: change shebang to /usr/bin/env python3
2024-09-10 08:11:40 +03:00
Lakshmi Narayanan Sreethar
2148e33d37 compaction: remove unnecessary share bump for split, scrub, and upgrade
When split, scrub, and upgrade compactions ran under the compaction
group, they had to bump up their shares to a minimum of 200 to prevent
slow progress as they neared completion, especially in workloads with
inconsistent ingestion rates. Since commit e86965c2 moved these
compactions to the maintenance group, this share bump is no longer
necessary. This patch removes the unnecessary share allocation.

Fixes #20224

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#20495
2024-09-09 22:03:38 +03:00
Avi Kivity
9448260b30 Merge 'major compaction: check only sstables being compacted for tombstone garbage collection' from Lakshmi Narayanan Sreethar
Any expired tombstone can be garbage collected if it doesn't shadow data in the commit log, memtable, or uncompacting SSTables.

This PR introduces a new mode to major compaction, enabled by the `consider_only_existing_data` flag that bypasses these checks. When enabled, memtables and old commitlog segments are cleared with a system-wide flush and all the sstables (after flush) are included in the compaction, so that it works with all data generated up to a given time point.

This new mode works with the assumption that newly written data will not be shadowed by expired tombstones. So it ignores new sstables (and new data written to memtable) created after compaction started. Since there was a system wide flush, commitlog checks can also be skipped when garbage collecting tombstones. Introducing data shadowed by a tombstone during compaction can lead to undefined behavior, even without this PR, as the tombstone may or may not have already been garbage collected.

Fixes #19728

Closes scylladb/scylladb#20031

* github.com:scylladb/scylladb:
  cql-pytest: add test to verify consider_only_existing_data compaction option
  tools/scylla-nodetool: add consider-only-existing-data option to compact command
  api: compaction: add `consider_only_existing_data` option
  compaction: consider gc_check_only_compacting_sstables when deducing max purgeable timestamp
  compaction: do not check commitlog if gc_check_only_compacting_sstables is enabled
  tombstone_gc_state: introduce with_commitlog_check_disabled()
  compaction: introduce new option to check only compacting sstables for gc
  compaction: rename maybe_flush_all_tables to maybe_flush_commitlog
  compaction: maybe_flush_all_tables: add new force_flush param
2024-09-09 20:45:41 +03:00
Avi Kivity
894b85ce95 Merge 'hints: send hints with CL=ALL if target is leaving' from Piotr Dulikowski
Currently, when attempting to send a hint, we might choose its recipients in one of two ways:

- If the original destination is a natural endpoint of the hint, we only send the hint to that node and none other,
- Otherwise, we send the hint to all current replicas of the mutation.

There is a problem when we decommission a node: while data is streamed away from that node, it is still considered to be a natural endpoint of the data that it used to own. Because of that, it might happen that a hint is sent directly to it but streaming will miss it, effectively resulting in the hint being discarded.

As sending the hint _only_ to the leaving replica is a rather bad idea, send the hint to all replicas also in the case when the original destination of the hint is leaving.

Note that this is a conservative fix written only with the decommission + vnode-based keyspaces combo in mind. In general, such "data loss" can occur in other situations where the replica set is changing and we go through a streaming phase, i.e. other topology operations in case of vnodes and tablet load balancing. However, the consistency guarantees of hinted handoff in the face of topology changes are not defined and it is not clear what they should be, if there should be any at all. The picture is further complicated by the fact that hints are used by materialized views, and sending view updates to more replicas than necessary can introduce inconsistencies in the form of "ghost rows". This fix was developed in response to a failing test which checked the hint replay + decommission scenario, and it makes it work again.

Fixes scylladb/scylla-dtest#4582
Refs scylladb/scylladb#19835

Should be backported to 6.0 and 6.1; the dtest started failing due to topology on raft, which sped up execution of the test and exposed the preexisting problem.

Closes scylladb/scylladb#20488

* github.com:scylladb/scylladb:
  test: topology_custom/test_hints: consistency test for decommission
  test: topology_custom/test_hints: move sync point helpers to top level
  test: topology/util: extract find_server_by_host_id
  hints: send hints with CL=ALL if target is leaving
  hints: inline do_send_one_mutation
2024-09-09 18:23:13 +03:00
Avi Kivity
c3e19425bd Merge 'docs/dev/docker-hub.md: refresh aio-max-nr calculation' from Laszlo Ersek
~~~
What we have today in "docs/dev/docker-hub.md" on "aio-max-nr" dates back
to scylla commit f4412029f4 ("docs/docker-hub.md: add quickstart section
with --smp 1", 2020-09-22). Problems with the current language:

- The "65K" claim as default value on non-production systems is wrong;
  "fs/aio.c" in Linux initializes "aio_max_nr" to 0x10000, which is 64K.

- The section in question uses equal signs (=) incorrectly. The intent was
  probably to say "which means the same as", but that's not what equality
  means.

- In the same section, the relational operator "<" is bogus. The available
  AIO count must be at least as high (>=) as the requested AIO count.

- Clearer names should be used;
  adjust_max_networking_aio_io_control_blocks() in "src/core/reactor.cc"
  sets a great example:

  - "reactor::max_aio" should be called "storage_iocbs",

  - "detect_aio_poll" should be called "preempt_iocbs",

  - "reactor_backend_aio::max_polls" should be called "network_iocbs".

- The specific value 10000 for the last one ("network_iocbs") is not
  correct in scylla's context. It is correct as the Seastar default, but
  scylla has used 50000 since commit 2cfc517874 ("main, test: adjust
  number of networking iocbs", 2021-07-18).

Rewrite the section to address these problems.

See also:
- https://github.com/scylladb/scylladb/issues/5981
- https://github.com/scylladb/seastar/pull/2396
- https://github.com/scylladb/scylladb/pull/19921

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
~~~

No need for backporting; the documentation being refreshed targets developers as audience, not end-users.

Closes scylladb/scylladb#20398

* github.com:scylladb/scylladb:
  docs/dev/docker-hub.md: refresh aio-max-nr calculation
  docs/dev/docker-hub.md: strip trailing whitespace
2024-09-09 15:04:38 +03:00
Botond Dénes
3e0bff161c Merge 'Use yielding directory lister in sstable_directory' from Pavel Emelyanov
The yielding lister is considered to be better replacement that scan_dir(lambda) one.
Also, the sstable directory will be patched to scan the contents of S3 bucket and yielding lister fits better for generalization.

Closes scylladb/scylladb#20114

* github.com:scylladb/scylladb:
  sstable_directory: Fix indentation after previous patches
  sstable_directory: Use yielding lister in .handle_sstables_pending_delete()
  sstable_directory: Use yielding lister in .cleanup_column_family_temp_sst_dirs()
  sstable_directory: Use yielding lister in .prepare()
  sstable_directory: Shorten lister loop
  sstable_directory: Use with_closeable() in .process()
  directory_lister: Add noexcept default move-constructor
2024-09-09 14:35:51 +03:00
Pavel Emelyanov
0f48847d02 test: Use shorter with_sstable_directory overload()
In sstable directory test there are two of those -- one that works on
path, state, env and callback, and the other one that just needs env and
callback, getting path from env and assuming state is normal.

Two test cases in this test can enjoy the shorter one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20395
2024-09-09 14:25:24 +03:00
Pavel Emelyanov
2bfbbaffac test: Use sstables::test_env to make sstables for schema loader test
This test calls manager directly, but it's shorter to ask test_env for
that

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20431
2024-09-09 14:22:58 +03:00
Takuya ASADA
e36c939505 dist: tune LimitNOFILES for large nodes
On very large node, LimitNOFILES=80000 may not enough size, it can cause
"Too many files" error.

To avoid that, let's increase LimitNOFILES on scylla_setup stage,
generate optimal value calurated from memory size and number of cpus.

Closes scylladb/scylla-enterprise#4304

Closes scylladb/scylladb#20443
2024-09-09 14:13:49 +03:00
Piotr Smaron
60af48f5fd cql: fix exception when validating KS in CREATE TABLE
c70f321c6f added an extra check if KS
exists. This check can throw `data_dictionary::no_such_keyspace`
exception, which is supposed to be caught and a more user-friendly
exception should be thrown instead.
This commit fixes the above problem and adds a testcase to validate it
doesn't appear ever again.
Also, I moved the check for the keyspace outside of the `for` loop, as
it doesn't need to be checked repeatedly.

Fixes: scylladb/scylladb#20097

Closes scylladb/scylladb#20404
2024-09-09 13:30:57 +03:00
Avi Kivity
9a5061209f Merge '[test.py] Enable allure for python test' from Andrei Chekun
To enhance the test reports UX:
1. switching off/on passed/failed/skipped test for better visibility
2. better searching in test results
3. understanding the trends of execution for each test
4. better configurability of the final report

Enable allure adapter for all python tests.
Add tags and parameters to the test to be able to distinguish them across modes and runs.

Related: https://github.com/scylladb/qa-tasks/issues/1665

Related: https://github.com/scylladb/scylladb/pull/19335

Related: https://github.com/scylladb/scylladb/pull/18169

Closes scylladb/scylladb#19942

* github.com:scylladb/scylladb:
  [test.py] Clean duplicated arg for test suite
  [test.py] Enable allure for python test
2024-09-09 12:53:00 +03:00
Kefu Chai
ccbd3eb9f7 main: do not register redis and alternator services if not enabled
in main.cc, we start redis with `ss.local().register_protocol_server()`
only if it is enabled. but `storage_service` always calls
`stop_server()` with _all_ registered server, no matter if they have
started or not. in general, it does not hurt. for instance,

`redis::controller::stop_server()` is a noop, if the controller
is not started. but `storage_service` still print the logging message
like:
```
INFO  2024-09-04 11:20:02,224 [shard 0:main] storage_service - Shutting down redis server
INFO  2024-09-04 11:20:02,224 [shard 0:main] storage_service - Shutting down redis server was successful
```

this could be confusing or at least distracting when a field engineer
looks at the log. also, please note, `redis_port` and `redis_ssl_port`
cannot be changed dynamically once scylla server is up, so we do not
need to worry about "what if the redis server is started at runtime,
how can is be stopped?".

the same applies to alternator service.

in this change, to avoid surprises, we conditionally register the
protocol servers with the storage service based on their enabled statuses.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20472
2024-09-09 08:44:50 +03:00
Avi Kivity
58713f3080 types: remove some unused free functions
These functions are unused, so safe to remove, and reduce the work
to convert to managed_bytes{,_view}.

Closes scylladb/scylladb#20482
2024-09-09 08:36:33 +03:00
Kefu Chai
720997d1de cql3/statements: mark format string as constexpr const
after switching over to the new `seastar::format()` which enables
the compile-time format check, the fmt string should be a constexpr,
otherwise `fmt::format()` is not able to perform the check at compile
time.

to prepare for bumping up the seastar module to a version which
contains the change of `seastar::format()`, let's mark the format
string with `constexpr const`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20484
2024-09-09 08:35:45 +03:00
Piotr Dulikowski
6f3d0af994 test: topology_custom/test_hints: consistency test for decommission
Adds the test_hints_consistency_during_decommission test which
reproduces the failure observed in scylladb/scylla-dtest#4582. It uses
error injections, including the newly added
topology_coordinator_pause_after_streaming injection, to reliably
orchestrate the scenario observed there.

In a nutshell, the test makes sure to replay hints after streaming
during decommission has finished, but before the cluster switches to
reading from new replicas. Without the fix, hints would be replayed to
the decommissioned node and then would be lost forever after the cluster
start reading from new replicas.
2024-09-08 10:51:38 +02:00
Piotr Dulikowski
30d53167c9 test: topology_custom/test_hints: move sync point helpers to top level
Move create_sync_point and await_sync_point from the scope of the
test_sync_point test to the file scope. They will be used in a test that
will be introduced in the commit that follows.
2024-09-08 10:51:38 +02:00
Piotr Dulikowski
a75d0c0bfa test: topology/util: extract find_server_by_host_id
Move it out from test_mv_tablets_replace.py. It will be used by a test
introduced in a later commit.
2024-09-08 10:51:38 +02:00
Piotr Dulikowski
61ac0a336d hints: send hints with CL=ALL if target is leaving
Currently, when attempting to send a hint, we might choose its
recipients in one of two ways:

- If the original destination is a natural endpoint of the hint, we only
  send the hint to that node and none other,
- Otherwise, we send the hint to all current replicas of the mutation.

There is a problem when we decommission a node: while data is streamed
away from that node, it is still considered to be a natural endpoint of
the data that it used to own. Because of that, it might happen that a
hint is sent directly to it but streaming will miss it, effectively
resulting in the hint being discarded.

As sending the hint _only_ to the leaving replica is a rather bad idea,
send the hint to all replicas also in the case when the original
destiantion of the hint is leaving.

Note that this is a conservative fix written only with the decommission
+ vnode-based keyspaces combo in mind. In general, such "data loss" can
occur in other situations where the replica set is changing and we go
through a streaming phase, i.e. other topology operations in case of
vnodes and tablet load balancing. However, the consistency guarantees of
hinted handoff in the face of topology changes are not defined and it is
not clear what they should be, if there should be any at all. The
picture is further complicated by the fact that hints are used by
materialized views, and sending view updates to more replicas than
necessary can introduce inconsistencies in the form of "ghost rows".
This fix was developed in response to a failing test which checked the
hint replay + decommission scenario, and it makes it work again.

Fixes scylladb/scylla-dtest#4582
Refs scylladb/scylladb#19835
2024-09-08 10:50:59 +02:00
Piotr Dulikowski
8abb06ab82 hints: inline do_send_one_mutation
It's a small method and it is only used once in send_one_mutation.
Inlining it lets us get rid of its declaration in the header - now, if
one needs to change the variables passed from one function to another,
it is no longer necessary to change the header.
2024-09-08 07:19:35 +02:00
Avi Kivity
ab32ce6b45 Merge 'Coroutinize sstable::read_summary() method' from Pavel Emelyanov
Shorter and simpler this way. Hopefully it doesn't sit on critical paths

Closes scylladb/scylladb#20460

* github.com:scylladb/scylladb:
  sstables: Fix indentation after previous patch
  sstables: Coroutinize sstable::read_summary()
2024-09-06 18:45:54 +03:00
Kefu Chai
aeaeaf345d compaction: use structured binding when appropriate
for better readability

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20473
2024-09-06 18:17:48 +03:00
Kamil Braun
427ad2040f Merge 'test: randomized failure injection for Raft-based topology' from Evgeniy Naydanov
The idea of the test is to have a cluster where one node is stressed with injections and failures and the rest of the cluster is used to make progress of the raft state machine.

To achieve this following two lists introduced in the PR:

  - ERROR_INJECTIONS in error_injections.py
  - CLUSTER_EVENTS in cluster_events.py

Each cluster event is an async generator which has 2 yields and should be used in the following way:

   0. Start the generator:
       ```python
       >>> cluster_event_steps = cluster_event(manager, random_tables, error_injection)
       ```

   1. Run the prepare part (before the first yield)
       ```python
       >>> await anext(cluster_event_steps)
       ```

   2.  Run the cluster event itself (between the yields)
       ```python
       >>> await anext(cluster_event_steps)
       ```

   3. Run the check part (after the second yield)
       ```python
       >>> await anext(cluster_event, None)
       ```

Closes scylladb/scylladb#16223

* github.com:scylladb/scylladb:
  test: randomized failure injection for Raft-based topology
  test: error injections for Raft-based topology
  [test.py] topology.util: add get_non_coordinator_host() function
  [test.py] random_tables: add UDT methods
  [test.py] random_tables: add CDC methods
  [test.py] api: get scylla process status
  [test.py] api: add expected_server_up_state argument to server_add()
2024-09-06 14:00:41 +02:00
Pavel Emelyanov
226fd03bae Merge 'service/qos: remove unused marked_for_deletion field from service_level struct' from Piotr Dulikowski
The `service_level::marked_for_deletion` field is always set to `false`. It might have served some purpose in the past, but now it can be just removed, simplifying the code and eliminating confusion about the field.

This is just code cleanup, no backport is needed.

Closes scylladb/scylladb#20452

* github.com:scylladb/scylladb:
  service/qos: remove the marked_for_deletion parameter
  service/qos: add constructors to service_level
2024-09-06 11:44:25 +03:00
Kamil Braun
52fdf5b4c9 test: test_raft_no_quorum: increase raft timeout in debug mode
The test cases in this file use an error injection to reduce raft group
0 timeouts (from the default 1 minute), in order to speed up the tests;
the scenarios expect these timeouts to happen, so we want them to happen
as quick as possible, but we don't want to reduce timeouts so much that
it will make other operations fail when we don't expect them to (e.g.
when the test wants to add a node to the cluster).

Unfortunately the selected 5 seconds in debug mode was not enough and
made the tests flaky: scylladb/scylladb#20111.

Increase it to 10 seconds. This unfortunately will slow down these tests
as they have to sometimes wait for 10 seconds for the timeout to happen.
But better to have this than a flaky test.

Fixes: scylladb/scylladb#20111

Closes scylladb/scylladb#20320
2024-09-06 11:40:09 +03:00
Avi Kivity
384a09585b repair: row_level: repair_get_row_diff_with_rpc_stream_process_op: simplify return value
During review of 0857b63259 it was noticed that the function

  repair_get_row_diff_with_rpc_stream_process_op()

and its _slow_path callee only ever return stop_iteration::no (or throw
an exception). As such, its return value is useless, and in fact the
only caller ignores it. Simplify by returning a plain future<>.

Closes scylladb/scylladb#20441
2024-09-06 11:39:21 +03:00
Kefu Chai
034c1df29b auth/authentication_options: move fmt::formatter up
so that it is accessible from its caller. if we enforce the
compile-time format string check, the formatter would need the access to
the specialization of `fmt::formatter` of the arguments being foramtted.
to be prepared for this change, let's move the `fmt::formatter`
specialization up, otherwise we'd have following error after switching
to the compile-time format string check introduced by a recent seastar
change:

```
In file included from ./auth/authenticator.hh:22:                                                                                                             ./auth/authentication_options.hh:50:49: error: call to consteval function 'fmt::basic_format_string<char, auth::authentication_option &>::basic_format_string<
char[32], 0>' is not a constant expression
   50 |             : std::invalid_argument(fmt::format("The {} option is not supported.", k)) {
      |                                                 ^                                                                                                     ./auth/authentication_options.hh:57:13: error: explicit specialization of 'fmt::formatter<auth::authentication_option>' after instantiation
   57 | struct fmt::formatter<auth::authentication_option> : fmt::formatter<string_view> {
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/base.h:1228:17: note: implicit instantiation first required here
 1228 |     -> decltype(typename Context::template formatter_type<T>().format(
      |                 ^
In file included from replica/distributed_loader.cc:30:
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20447
2024-09-06 09:12:38 +03:00
Pavel Emelyanov
527fc9594a sstables: Fix indentation after previous patch
And move the comment inside if while at it, it looks better in there
(and makes less churn in the patch itself)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-06 08:43:08 +03:00
Pavel Emelyanov
f7325586f3 sstables: Coroutinize sstable::read_summary()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-06 08:43:07 +03:00
Evgeniy Naydanov
dd99cf197d test: randomized failure injection for Raft-based topology
The idea of the test is to have a small cluster, where one node
is stressed with injections and failures and the rest of
the cluster is used to make progress of the Raft state machine.

To achieve this following two lists introduced in the commit:

  - ERROR_INJECTIONS in error_injections.py
  - CLUSTER_EVENTS in cluster_events.py

Each cluster event is an async generator which has 2 yields and should be
used in the following way:

   0. Start the generator:
       >>> cluster_event_steps = cluster_event(manager, random_tables, error_injection)

   1. Run the prepare part (before the first yield)
       >>> await anext(cluster_event_steps)

   2. Run the cluster event itself (between the yields)
       >>> await anext(cluster_event_steps)

   3. Run the check part (after the second yield)
       >>> await anext(cluster_event, None)
2024-09-05 22:11:32 +00:00
Evgeniy Naydanov
769424723b test: error injections for Raft-based topology
Add following error injections:
 - stop_after_init_of_system_ks
 - stop_after_init_of_schema_commitlog
 - stop_after_starting_gossiper
 - stop_after_starting_raft_address_map
 - stop_after_starting_migration_manager
 - stop_after_starting_commitlog
 - stop_after_starting_repair
 - stop_after_starting_cdc_generation_service
 - stop_after_starting_group0_service
 - stop_after_starting_auth_service
 - stop_during_gossip_shadow_round
 - stop_after_saving_tokens
 - stop_after_starting_gossiping
 - stop_after_sending_join_node_request
 - stop_after_setting_mode_to_normal_raft_topology
 - stop_before_becoming_raft_voter
 - topology_coordinator_pause_after_updating_cdc_generation
 - stop_before_streaming
 - stop_after_streaming
 - stop_after_bootstrapping_initial_raft_configuration
2024-09-05 22:11:31 +00:00
Evgeniy Naydanov
ac4ffbad5c [test.py] topology.util: add get_non_coordinator_host() function
Add get_non_coordinator_host() function which returns
ServerInfo for the first host which is not a coordinator
or None if there is no such host.

Also rework get_coordinator_host() to not fail if some
of the hosts don't have a host id.
2024-09-05 22:11:31 +00:00
Evgeniy Naydanov
d95d698601 [test.py] random_tables: add UDT methods
Add .add_udt() / .drop_udt() methods.
2024-09-05 22:11:31 +00:00
Evgeniy Naydanov
8cb442ca50 [test.py] random_tables: add CDC methods
Add .enabled_cdc() / .disable_cdc() methods.
2024-09-05 22:11:31 +00:00
Evgeniy Naydanov
a7119cf420 [test.py] api: get scylla process status
Add `server_get_process_status(server_id)` API call and
wait_for_scylla_process_status() helper function.
2024-09-05 22:11:31 +00:00
Evgeniy Naydanov
241bbb4172 [test.py] api: add expected_server_up_state argument to server_add()
Allow to return from server_add() when a server reaches specified state.
One of:
 - PROCESS_STARTED
 - HOST_ID_QUERIED (previously called NOT_CONNECTED)
 - CQL_CONNECTED (renamed from CONNECTED)
 - CQL_QUERIED (was just QUERIED)

Also, rename CqlUpState to ServerUpState and move to internal_types.
2024-09-05 22:11:31 +00:00
Pavel Emelyanov
69a5ec69c4 test: Use table storage options in sstable_directory_test
When creating sstables this test allocates temporary local options.
That works, because this test doesn't run on object storage, but it's
more correct to pick storage options from the table at hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20440
2024-09-05 17:48:25 +03:00
Gleb Natapov
807e37502a db/consistency_level: do not use result from heat weighted load balancer if it contains duplicates
Because of https://github.com/scylladb/scylladb/issues/9285 heat weighted
load balancer may sometimes return same node twice. It may cause wrong
data to be read or unexpected errors to be returned to a client. Since
the original bug is not easy to fix and it is rare lets introduce a
workaround. We will check for duplicates and will use non HWLB one if
one is found.

Fixes scylladb/scylladb#20430

Closes scylladb/scylladb#20414
2024-09-05 15:21:35 +03:00
Wojciech Mitros
c1b0434c16 test: finish mv view update explicitly instead of relying on delay duration
When testing mv admission control, we perform a large view update
and check if the following view update can be admitted due to the
high view backlog usage. We rely on a delay which keeps the backlog
high for longer to make sure the backlog is still increased during
the second write. However, in some test runs the delay is not long
enough, causing the second write to miss the large backlog and not
hit admission control.

In this patch we keep the increased backlog high using another
injection instead of relying on a delay to make absolute sure
that the backlog is still high during the second write.

Fixes scylladb/scylladb#20382

Closes scylladb/scylladb#20445
2024-09-05 15:08:04 +03:00
Lakshmi Narayanan Sreethar
7c5efab7d5 cql-pytest: add test to verify consider_only_existing_data compaction option
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:34:13 +05:30
Lakshmi Narayanan Sreethar
68a902f74a tools/scylla-nodetool: add consider-only-existing-data option to compact command
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:34:06 +05:30
Lakshmi Narayanan Sreethar
84d06a13c7 api: compaction: add consider_only_existing_data option
Added a new parameter `consider_only_existing_data` to major compaction
API endpoints. When enabled, major compaction will:

- Force-flush all tables.
- Force a new active segment in the commit log.
- Compact all existing SSTables and garbage-collect tombstones by only
  checking the SSTables being compacted. Memtables, commit logs, and
  other SSTables not part of the compaction will not be checked, as they
  will only contain newer data that arrived after the compaction
  started.

The `consider_only_existing_data` is passed down to the compaction
descriptor's `gc_check_only_compacting_sstables` option to ensure that
only the existing data is considered for garbage collection.

The option is also passed to the `maybe_flush_commitlog` method to make
sure all the tables are flushed and a new active segment is created in
the commit log.

Fixes #19728

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar
98bc44f900 compaction: consider gc_check_only_compacting_sstables when deducing max purgeable timestamp
When gc_check_only_compacting_sstables is enabled,
get_max_purgeable_timestamp should not check memtables and other
sstables that are not part of the compaction to deduce the max purgeable
timestamp.

Refs #19728

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar
7b9ce8e040 compaction: do not check commitlog if gc_check_only_compacting_sstables is enabled
When the compaction_descriptor's gc_check_only_compacting_sstables flag
is enabled, create and pass a copy of the get_tombstone_gc_state that
will skip checking the commitlog.

Refs #19728

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar
12fa40154b tombstone_gc_state: introduce with_commitlog_check_disabled()
Added a new method, `with_commitlog_check_disabled`, that returns a new
copy of the tombstone_gc_state but with commitlog check disabled. This
will be used by a following patch to disable commitlog checks during
compaction.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar
5b8c6a8a5e compaction: introduce new option to check only compacting sstables for gc
Added new option, `gc_check_only_compacting_sstables`, to
compaction_descriptor to control the garbage collection behavior. The
subsequent patches will use this flag to decide if the garbage
collection has to check only the SSTables being compacted to collect
tombstones. This option is disabled for now and will be enabled based on
a new compaction parameter that will be added later in this patch
series.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar
5e6bffc146 compaction: rename maybe_flush_all_tables to maybe_flush_commitlog
Major compaction flushes all tables as a part of flushing the commitlog.
After forcing new active segments in the commitlog, all the tables are
flushed to enable reclaim of older commitlog segments. The main goal is
to flush the commitlog and flushing all the table is just a dependency.

Rename maybe_flush_all_tables to maybe_flush_commitlog so that it
reflects the actual intent of the major compaction code. Added a new
wrapper method to database::flush_all_tables(),
database::flush_commitlog(), that is now called from
maybe_flush_commitlog.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar
fa2488cc83 compaction: maybe_flush_all_tables: add new force_flush param
Add a new parameter, `force_flush` to the maybe_flush_all_tables()
method. Setting `force_flush` to true will flush all the tables
regardless of when they were flushed last. This will be used by the new
compaction option in a following patch.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Laszlo Ersek
53524974db docs/dev/maintainer.md: clarify "Updating submodule references"
Before the introduction of "scripts/refresh-submodules.sh", there was
indeed some manual work for the maintainer to do, hence "publish your
work" must have sounded correct. Today, the phrase "publish your work"
sounds confusing.

Commit 71da4e6e79 ("docs: Document sync-submodules.sh script in
maintainer.md", 2020-06-18) should have arguably reworded the last step of
the submodule refresh procedure; let's do it now.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>

Closes scylladb/scylladb#20333
2024-09-05 13:57:32 +03:00