44545 Commits

Author SHA1 Message Date
Michael Litvak
5c95aaae0d view_builder: common write view_build_status function
When writing to the view_build_status we have common logic related to
upgrade and deciding whether to write to sys_dist ks or group0.
Move this common logic to a generic function used by all functions
writing to the table.
2024-09-05 15:42:35 +03:00
Michael Litvak
c1f3517a75 view_builder: improve migration to v2 with intermediate phase
Add an intermediate phase to the view builder migration to v2 where we
write to both the old and new table in order to not lose writes during
the migration.
We add an additional view builder version v1_5 between v1 and v2 where
we write to both tables. We perform a barrier before moving to v2 to
ensure all the operations to the old table are completed.
2024-09-05 15:42:35 +03:00
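The intermediate phase described in c1f3517a75 can be pictured with a small toy model. This is only a sketch in Python, not the ScyllaDB C++ code: the `StatusWriter` class, the `migrate` helper, the dictionary "tables" and the exact ordering of the copy step are all assumptions made for illustration.

```python
from enum import Enum


class Version(Enum):
    V1 = "v1"
    V1_5 = "v1_5"
    V2 = "v2"


class StatusWriter:
    """Toy model: routes a status write according to the current version."""

    def __init__(self):
        self.version = Version.V1
        self.old_table = {}  # stands in for system_distributed.view_build_status
        self.new_table = {}  # stands in for system.view_build_status_v2

    def write(self, key, status):
        # v1 and v1_5 write the old table; v1_5 and v2 write the new one.
        if self.version in (Version.V1, Version.V1_5):
            self.old_table[key] = status
        if self.version in (Version.V1_5, Version.V2):
            self.new_table[key] = status


def migrate(writer, barrier):
    """Hypothetical migration driver: dual-write, copy, barrier, switch."""
    writer.version = Version.V1_5  # start writing to both tables
    for key, status in writer.old_table.items():
        writer.new_table.setdefault(key, status)  # copy rows written under v1
    barrier()  # wait for in-flight writes to the old table
    writer.version = Version.V2
```

In this model a write issued at any point during `migrate` reaches the new table, which is the property the intermediate phase is meant to provide.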
Michael Litvak
446ad3c184 view: delete node rows from view_build_status on node removal
When a node is removed we want to clean its rows from the
view_build_status table.
Now, when removing a node and generating the topology state update, we
also generate the mutations to delete all rows that may belong to
the node from the table.
2024-09-05 15:42:35 +03:00
Michael Litvak
08462aaff7 view: sanitize view_build_status during migration
When migrating the view_build_status to v2, skip adding any leftover
rows that don't correspond to an existing node or an existing view.

Previously such rows could have been created and not cleaned, for
example when a node is removed.
2024-09-05 15:42:35 +03:00
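The filtering step in 08462aaff7 amounts to dropping rows that reference an unknown node or view. A minimal Python sketch, assuming each row is a dict with `keyspace_name`, `view_name` and `host_id` keys (the function name and the row representation are hypothetical, not taken from the patch):

```python
def sanitize_rows(rows, live_hosts, existing_views):
    """Keep only rows whose host and view both still exist."""
    return [
        row
        for row in rows
        if row["host_id"] in live_hosts
        and (row["keyspace_name"], row["view_name"]) in existing_views
    ]
```

A row left behind by a removed node fails the `host_id` check and is skipped rather than copied to the v2 table.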
Michael Litvak
78d6ff6598 view: make old view_build_status table a virtual table
After migrating the view build status from
system_distributed.view_build_status to system.view_build_status_v2, we
set system_distributed.view_build_status to be a virtual table, such
that reading from it is actually reading from the underlying new table.

The reason for this is that we want to keep compatibility with the old
table, since it exists also in Cassandra and it is used by various external
tools to check the view build status. Making the table virtual makes the
transition transparent for external users.

The two tables are in different keyspaces and have different shard
mapping. The v1 table is a distributed table with a normal shard
mapping, and the v2 table is a local table using the null sharder. The
virtual reader works by constructing a multishard reader which reads the rows
from shard zero, and then filtering it to get only the rows owned by the
current shard.
2024-09-05 15:42:35 +03:00
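The read path in 78d6ff6598 (read everything from shard zero, keep only what the current shard owns) can be sketched as below. This is a simplified Python model: `shard_of` is a hypothetical stand-in based on CRC32 and is not ScyllaDB's sharding algorithm, and rows are assumed to be tuples whose first element is the partition key.

```python
import zlib


def shard_of(partition_key: str, shard_count: int) -> int:
    # Hypothetical placement function, for illustration only.
    return zlib.crc32(partition_key.encode()) % shard_count


def read_virtual(rows_on_shard_zero, this_shard, shard_count):
    """Return the rows from shard zero that `this_shard` owns."""
    return [
        row
        for row in rows_on_shard_zero
        if shard_of(row[0], shard_count) == this_shard
    ]
```

Because every row maps to exactly one shard, the per-shard results taken together contain each row once, which is what a reader of the old distributed table would expect.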
Michael Litvak
09eadcff08 replica: move streaming_reader_lifecycle_policy to header file
Move the class streaming_reader_lifecycle_policy to a header file in
order to make it reusable in other places.
2024-09-05 15:42:35 +03:00
Michael Litvak
22f4f1fa49 view_builder: test view_build_status_v2
Add tests to verify the new view_build_status_v2 is used by the
view_builder and can be read from all nodes with the expected values.
Also test a migration from the v1 layout to v2.
2024-09-05 15:42:35 +03:00
Michael Litvak
fcf66ad541 storage_service: add view_build_status to raft snapshot
Include the table system.view_build_status_v2 in the raft snapshot, and
also the view_builder version parameter.
2024-09-05 15:42:30 +03:00
Michael Litvak
8d25a4d678 view_builder: migration to v2
Migrate view_builder to v2, to store the view build status of all nodes
in the group0 based table view_build_status_v2.

Introduce a feature view_build_status_on_group0 so we know when all
nodes are ready to migrate and use the new table.

A new cluster is initialized to use v2. Otherwise, the topology coordinator
initiates the migration when the feature is enabled, if it was not done
already.

The migration reads all the rows in the v1 table and writes them via
group0 to the v2 table, together with a mutation that updates the
view_builder parameter in scylla_local to v2. When this mutation is
applied, it updates the view_builder service to start using the v2
table.
2024-09-05 15:41:04 +03:00
Michael Litvak
f3887cd80b db:system_keyspace: add view_builder_version to scylla_local
Add a new scylla_local parameter view_builder_version, and functions to
read and mutate the value.
The version value defaults to v1 if it doesn't exist in the table.
2024-09-05 15:41:04 +03:00
Michael Litvak
d58a8930c4 view_builder: read view status from v2 table
Update the view_status function to read from the new
view_build_status_v2 table when enabled.
The code to read and extract the values is identical for v1 and v2 except that it
accesses a different keyspace and table, so the common code is extracted
to the view_status_common function and used by both v1 and v2 flows with
appropriate parameters.
2024-09-05 15:41:04 +03:00
Michael Litvak
05d18b818f view_builder: introduce writing status mutations via raft
Introduce the announce_with_raft function as an alternative to writing view build
status mutations to the table in system_distributed. Instead, we can
apply the mutations via group0 operation to the view_build_status_v2
table.
All the view_builder functions that write to the view_build_status table
can be configured by a flag to either write the legacy way or via raft.
2024-09-05 15:41:04 +03:00
Michael Litvak
b8c7a10ae6 view_builder: pass group0_client and qp to view_builder
Store references of group0_client and query_processor in the
view_builder service.
They are required for generating mutations and writing them via group0.
2024-09-05 15:41:04 +03:00
Michael Litvak
b2332c5a72 view_builder: extract sys_dist status operations to functions
Extract all the update and read operations of a view build status in the
table system_distributed.view_build_status to separate functions.
2024-09-05 15:41:04 +03:00
Michael Litvak
bf4a58bf91 db:system_keyspace: add view_build_status_v2 table
Add the table system.view_build_status_v2 with the same schema as
system_distributed.view_build_status.
2024-09-05 15:41:04 +03:00
Gleb Natapov
807e37502a db/consistency_level: do not use result from heat weighted load balancer if it contains duplicates
Because of https://github.com/scylladb/scylladb/issues/9285, the heat-weighted
load balancer may sometimes return the same node twice. This may cause wrong
data to be read or unexpected errors to be returned to a client. Since
the original bug is not easy to fix and is rare, let's introduce a
workaround: check for duplicates and use the non-HWLB result if
one is found.

Fixes scylladb/scylladb#20430

Closes scylladb/scylladb#20414
2024-09-05 15:21:35 +03:00
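The workaround in 807e37502a boils down to a duplicate check followed by a fallback. A minimal Python sketch, assuming both selections are plain lists of node identifiers (the `pick_replicas` name and its parameters are hypothetical):

```python
def pick_replicas(hwlb_pick, plain_pick):
    """Use the heat-weighted selection unless it lists a node twice."""
    if len(set(hwlb_pick)) != len(hwlb_pick):
        # Duplicate detected: fall back to the non-HWLB selection.
        return plain_pick
    return hwlb_pick
```

The check costs one pass over a short list, which is cheap compared with reading from the same replica twice.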
Wojciech Mitros
c1b0434c16 test: finish mv view update explicitly instead of relying on delay duration
When testing mv admission control, we perform a large view update
and check if the following view update can be admitted due to the
high view backlog usage. We rely on a delay which keeps the backlog
high for longer to make sure the backlog is still increased during
the second write. However, in some test runs the delay is not long
enough, causing the second write to miss the large backlog and not
hit admission control.

In this patch we keep the increased backlog high using another
injection instead of relying on a delay, to make absolutely sure
that the backlog is still high during the second write.

Fixes scylladb/scylladb#20382

Closes scylladb/scylladb#20445
2024-09-05 15:08:04 +03:00
Lakshmi Narayanan Sreethar
7c5efab7d5 cql-pytest: add test to verify consider_only_existing_data compaction option
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:34:13 +05:30
Lakshmi Narayanan Sreethar
68a902f74a tools/scylla-nodetool: add consider-only-existing-data option to compact command
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:34:06 +05:30
Lakshmi Narayanan Sreethar
84d06a13c7 api: compaction: add consider_only_existing_data option
Added a new parameter `consider_only_existing_data` to major compaction
API endpoints. When enabled, major compaction will:

- Force-flush all tables.
- Force a new active segment in the commit log.
- Compact all existing SSTables and garbage-collect tombstones by only
  checking the SSTables being compacted. Memtables, commit logs, and
  other SSTables not part of the compaction will not be checked, as they
  will only contain newer data that arrived after the compaction
  started.

The `consider_only_existing_data` option is passed down to the compaction
descriptor's `gc_check_only_compacting_sstables` option to ensure that
only the existing data is considered for garbage collection.

The option is also passed to the `maybe_flush_commitlog` method to make
sure all the tables are flushed and a new active segment is created in
the commit log.

Fixes #19728

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar
98bc44f900 compaction: consider gc_check_only_compacting_sstables when deducing max purgeable timestamp
When gc_check_only_compacting_sstables is enabled,
get_max_purgeable_timestamp should not check memtables and other
sstables that are not part of the compaction to deduce the max purgeable
timestamp.

Refs #19728

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
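The effect of `gc_check_only_compacting_sstables` on the purgeable timestamp (98bc44f900) can be modelled roughly as follows. This is a simplified Python sketch, not the actual `get_max_purgeable_timestamp` implementation: it assumes the limit is the smallest write timestamp found in sources outside the compaction, and that a tombstone is purgeable only when it is older than that limit. The function names, parameters and `MAX_TIMESTAMP` constant are illustrative.

```python
MAX_TIMESTAMP = 2**63 - 1  # assumed "no limit" sentinel


def max_purgeable_timestamp(memtable_min_ts, other_sstable_min_ts,
                            only_compacting_sstables):
    """Upper bound on tombstone timestamps that may be purged."""
    if only_compacting_sstables:
        # Sources outside the compaction are assumed to hold only newer data.
        return MAX_TIMESTAMP
    return min([MAX_TIMESTAMP, *memtable_min_ts, *other_sstable_min_ts])


def can_purge(tombstone_ts, max_purgeable):
    return tombstone_ts < max_purgeable
```

With the flag off, data with timestamp 50 in a memtable blocks purging a tombstone at timestamp 60; with the flag on, the memtable is not consulted and the tombstone becomes purgeable.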
Lakshmi Narayanan Sreethar
7b9ce8e040 compaction: do not check commitlog if gc_check_only_compacting_sstables is enabled
When the compaction_descriptor's gc_check_only_compacting_sstables flag
is enabled, create and pass a copy of the get_tombstone_gc_state that
will skip checking the commitlog.

Refs #19728

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar
12fa40154b tombstone_gc_state: introduce with_commitlog_check_disabled()
Added a new method, `with_commitlog_check_disabled`, that returns a new
copy of the tombstone_gc_state but with commitlog check disabled. This
will be used by a following patch to disable commitlog checks during
compaction.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
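The copy-with-one-change pattern behind `with_commitlog_check_disabled` (12fa40154b) looks like this in Python. The class below is a hypothetical stand-in with a single field; the real tombstone_gc_state carries more state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TombstoneGcState:
    check_commitlog: bool = True

    def with_commitlog_check_disabled(self):
        # Return a modified copy; the original object is left untouched.
        return replace(self, check_commitlog=False)
```

Returning a copy lets compaction opt out of the commitlog check without affecting other users of the shared state.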
Lakshmi Narayanan Sreethar
5b8c6a8a5e compaction: introduce new option to check only compacting sstables for gc
Added new option, `gc_check_only_compacting_sstables`, to
compaction_descriptor to control the garbage collection behavior. The
subsequent patches will use this flag to decide if the garbage
collection has to check only the SSTables being compacted to collect
tombstones. This option is disabled for now and will be enabled based on
a new compaction parameter that will be added later in this patch
series.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar
5e6bffc146 compaction: rename maybe_flush_all_tables to maybe_flush_commitlog
Major compaction flushes all tables as a part of flushing the commitlog.
After forcing new active segments in the commitlog, all the tables are
flushed to enable reclaim of older commitlog segments. The main goal is
to flush the commitlog and flushing all the tables is just a dependency.

Rename maybe_flush_all_tables to maybe_flush_commitlog so that it
reflects the actual intent of the major compaction code. Added a new
wrapper method to database::flush_all_tables(),
database::flush_commitlog(), that is now called from
maybe_flush_commitlog.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar
fa2488cc83 compaction: maybe_flush_all_tables: add new force_flush param
Add a new parameter, `force_flush` to the maybe_flush_all_tables()
method. Setting `force_flush` to true will flush all the tables
regardless of when they were flushed last. This will be used by the new
compaction option in a following patch.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
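The `force_flush` behaviour in fa2488cc83 can be summarised as a predicate. This is a sketch that assumes the non-forced path skips the flush when the last one is recent enough; `should_flush`, `min_interval` and the time units are hypothetical.

```python
def should_flush(last_flush, now, min_interval, force_flush=False):
    """Flush when forced, or when the last flush is older than min_interval."""
    return force_flush or (now - last_flush) >= min_interval
```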
Laszlo Ersek
53524974db docs/dev/maintainer.md: clarify "Updating submodule references"
Before the introduction of "scripts/refresh-submodules.sh", there was
indeed some manual work for the maintainer to do, hence "publish your
work" must have sounded correct. Today, the phrase "publish your work"
sounds confusing.

Commit 71da4e6e79 ("docs: Document sync-submodules.sh script in
maintainer.md", 2020-06-18) should have arguably reworded the last step of
the submodule refresh procedure; let's do it now.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>

Closes scylladb/scylladb#20333
2024-09-05 13:57:32 +03:00
Pavel Emelyanov
1f0db29ef6 test: Remove unused directory semaphore
The with_sstable_dir() helper no longer needs one. It used to pass it as
an argument to the sstable_directory constructor, but now the directory doesn't
need it (it takes the semaphore via the table object).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20396
2024-09-05 13:11:35 +03:00
Kefu Chai
b4fc24cc1f github: use needs.read-toolchain.outputs.image for build-scylla
so we don't need to hardwire the image on which we build scylla.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20370
2024-09-05 12:58:36 +03:00
Pavel Emelyanov
955391d209 sstable_directory: Fix indentation after previous patches
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 11:19:19 +03:00
Pavel Emelyanov
2febde24f3 sstable_directory: Use yielding lister in .handle_sstables_pending_delete()
Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 11:19:19 +03:00
Pavel Emelyanov
02aac3e407 sstable_directory: Use yielding lister in .cleanup_column_family_temp_sst_dirs()
Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 11:19:19 +03:00
Pavel Emelyanov
ff77a677a6 sstable_directory: Use yielding lister in .prepare()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 11:19:19 +03:00
Pavel Emelyanov
7b5fe6bee6 sstable_directory: Shorten lister loop
Squash the call to lister.get() and the check of the returned value into
while()'s condition. This saves a few more lines of code as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 11:19:19 +03:00
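The loop shape from 7b5fe6bee6, with the fetch and the end-of-listing check folded into the loop condition, has a direct Python analogue using an assignment expression. The `Lister` class is a hypothetical stand-in whose `get()` returns `None` when the listing is exhausted.

```python
class Lister:
    """Toy lister: get() yields entries one by one, then None."""

    def __init__(self, names):
        self._names = iter(names)

    def get(self):
        return next(self._names, None)


def drain(lister):
    names = []
    # Fetch and test in the loop condition instead of a separate check.
    while (entry := lister.get()) is not None:
        names.append(entry)
    return names
```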
Pavel Emelyanov
5dc266cefa sstable_directory: Use with_closeable() in .process()
The method already uses a yielding lister, but handles exceptions
explicitly. Use the with_closeable() helper, which makes the code shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 11:19:19 +03:00
Pavel Emelyanov
7742b90cb1 directory_lister: Add noexcept default move-constructor
It's required to make it possible to push the lister into with_closeable().
Its requirement of nothrow-move-constructible doesn't accept
the default-generated one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 11:10:21 +03:00
Nikos Dragazis
2450afb934 sstables: Replace assert with on_internal_error
The `skip()` method of the compressed data source implementation uses an
assert statement to check if the given offset is valid.

Replace this with `on_internal_error()` to fail gracefully. An invalid
offset shouldn't bring the whole server down.

Also, enhance the error message for unsynced compressed readers.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-05 11:03:54 +03:00
Pavel Emelyanov
da598a6210 test: Restore indentation after previous changes
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:38:01 +03:00
Pavel Emelyanov
e16c07c896 test: Threadify tombstone_in_tombstone2()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
28d016f312 test: Threadify range_tombstone_reading()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
7d567d07ad test: Threadify tombstone_in_tombstone()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
a34e38f070 test: Threadify broken_ranges_collection()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
eac4ec47f8 test: Threadify compact_storage_dense_read()
Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
322c1ee9c5 test: Threadify compact_storage_simple_dense_read()
Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
df71b3e446 test: Threadify compact_storage_sparse_read()
Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
142ccc64fb test: Simplify test_range_reads() counting
It used to keep the counter with the help of a smart pointer; now it can
just use an on-stack variable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
a78ab2e998 test: Simplify test_range_reads() inner loop
It used to rely on a bool (wrapped in a pointer) and a future<>-based loop
helper; now it can just break from the while loop.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
c84ae64562 test: Threadify test_range_reads() itself
And update its callers again.
Preserve the no-longer-relevant local smart pointers until the next patch.
Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
253d53b6a1 test: Threadify test_range_reads() callers
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:00 +03:00
Pavel Emelyanov
fd8bb0c46c test: Threadify generate_clustered() itself
And update its callers again.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:35:59 +03:00