Commit Graph

3414 Commits

Author SHA1 Message Date
Kefu Chai
b36cef6f1a sstable: remove _remote_prefix from s3_storage
since we use the sstable.generation() for the remote prefix of
the key of the object for storing the sstable component, there is
no need to set remote_prefix beforehand.

since `s3_storage::ensure_remote_prefix()` and
`system_kesypace::sstables_registry_lookup_entry()` are not used
anymore, they are removed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 10:08:22 +08:00
Kefu Chai
af8bc8ba63 sstable: switch to uuid identifier for naming S3 sstable objects
before this change, we create a new UUID for a new sstable managed
by the s3_storage, and we use the string representation of UUID
defined by RFC4122 like "0aa490de-7a85-46e2-8f90-38b8f496d53b" for
naming the objects stored on s3_storage. but this representation is
not what we are using for storing sstables on local filesystem when
the option of "uuid_sstable_identifiers_enabled" is enabled. instead,
we are using a base36-based representation which is shorter.

to be consistent with the naming of the sstables created for local
filesystem, and more importantly, to simplify the interaction between
the local copy of sstables and those stored on object storage, we should
use the same string representation of the sstable identifier.

so, in this change:

1. instead of creating a new UUID, just reuse the generation of the
   sstable for the object's key.
2. do not store the uuid in the sstable_registry system table. As
   we already have the generation of the sstable for the same purpose.
3. switch the sstable identifier representation from the one defined
   by the RFC4122 (implemented by fmt::formatter<utils::UUID>) to the
   base36-based one (implemented by
   fmt::formatter<sstables::generation_type>)
4. enable the `uuid_sstable_identifers` cluster feature if it is
   enabled in the `test_env_config`, so that it the sstable manager
   can enable the uuid-based uuid when creating a new uuid for
   sstable.
5. throw if the generation of sstable is not UUID-based when
   accessing / manipulating an sstable with S3 storage backend. as
   the S3 storage backend now relies on this option. as, otherwise
   we'd have sstables with key like s3://bucket/number/basename, which
   is just unable to serve as a unique id for sstable if the bucket is
   shared across multiple tables.

Fixes #14175
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 10:08:22 +08:00
Kamil Braun
c1486fee40 Merge 'commitlog: drop truncation_records after replay' from Petr Gusev
This is a follow-up for #15279 and it fixes two problems.

First, we restore flushes on writes for the tables that were switched to the schema commitlog if `SCHEMA_COMMITLOG` feature is not yet enabled. Otherwise durability is not guaranteed.

Second, we address the problem with truncation records, which could refer to the old commitlog if any of the switched tables were truncated in the past. If the node crashes later, and we replay schema commitlog, we may skip some mutations since their `replay_position`s will be smaller than the `replay_position`s stored for the old commitlog in the `truncated` table.

It turned out that this problem exists even if we don't switch commitlogs for tables. If the node was rebooted the segment ids will start from some small number - they use `steady_clock` which is usually bound to boot time. This means that if the node crashed we may skip the mutations because their RPs will be smaller than the last truncation record RP.

To address this problem we delete truncation records as soon as commitlog is replayed. We also include a test which demonstrates the problem.

Fixes #15354

Closes scylladb/scylladb#15532

* github.com:scylladb/scylladb:
  add test_commitlog
  system.truncated: Remove replay_position data from truncated on start
  main.cc: flush only local memtables when replaying schema commitlog
  main.cc: drop redundant supervisor::notify
  system_keyspace: flush if schema commitlog is not available
2023-10-18 11:14:31 +02:00
Botond Dénes
7f81957437 Merge 'Initialize datadir for system and non-system keyspaces the same way' from Pavel Emelyanov
When populating system keyspace the sstable_directory forgets to create upload/ subdir in the tables' datadir because of the way it's invoked from distributed loader. For non-system keyspaces directories are created in table::init_storage() which is self-contained and just creates the whole layout regardless of what.

This PR makes system keyspace's tables use table::init_storage() as well so that the datadir layout is the same for all on-disk tables.

Test included.

fixes: #15708
closes: scylladb/scylla-manager#3603

Closes scylladb/scylladb#15723

* github.com:scylladb/scylladb:
  test: Add test for datadir/ layout
  sstable_directory: Indentation fix after previous patch
  db,sstables: Move storage init for system keyspace to table creation
2023-10-18 12:12:19 +03:00
Avi Kivity
f42eb4d1ce Merge 'Store and propagage GC timestamp markers from commitlog' from Calle Wilund
Fixes #14870

(Originally suggested by @avikivity). Use commit log stored GC clock min positions to narrow compaction GC bounds.
(Still requires augmented manual flush:es with extensive CL clearing to pass various dtest, but this does not affect "real" execution).

Adds a lowest timestamp of GC clock whenever a CF is added to a CL segment the first time. Because GC clock is wall
clock time and only connected to TTL (not cell/row timestamps), this gives a fairly accurate view of GC low bounds
per segment. This is then (in a rather ugly way) propagated to tombstone_gc_state to narrow the allowed GC bounds for
a CF, based on what is currently left in CL.

Note: this is a rather unoptimized version - no caching or anything. But even so, should not be excessively expensive,
esp. since various other code paths already cache the results.

Closes scylladb/scylladb#15060

* github.com:scylladb/scylladb:
  main/cql_test_env: Augment compaction mgr tombstone_gc_state with CL GC info
  tombstone_gc_state: Add optional callback to augment GC bounds
  commitlog: Add keeping track of approximate lowest GC clock for CF entries
  database: Force new commitlog segment on user initiated flush
  commitlog: Add helper to force new active segment
2023-10-17 18:27:43 +03:00
Calle Wilund
6fbd210679 system.truncated: Remove replay_position data from truncated on start
Once we've started clean, and all replaying is done, truncation logs
commit log regarding replay positions are invalid. We should exorcise
them as soon as possible. Note that we cannot remove truncation data
completely though, since the time stamps stored are used by things like
batch log to determine if it should use or discard old batch data.
2023-10-17 18:16:48 +04:00
Petr Gusev
c89ead55ff system_keyspace: flush if schema commitlog is not available
In PR #15279 we removed flushes when writing to a number
of tables from the system keyspace. This was made possible
by switching these tables to the schema commitlog.
Schema commitlog is enabled only when the SCHEMA_COMMITLOG
feature is supported by all nodes in the cluster. Before that
these tables will use the regular commitlog, which is not
durable because it uses db::commitlog::sync_mode::PERIODIC. This
means that we may lose data if a node crashes during upgrade
to the version with schema commitlog.

In this commit we fix this problem by restoring flushes
after writes to the tables if the schema commitlog
is not enabled yet.

The patch also contains a test that demonstrates the
problem. We need flush_schema_tables_after_modification
option since otherwise schema changes are not durable
and node fails after restart.
2023-10-17 18:14:27 +04:00
Calle Wilund
560d3c17f0 commitlog: Add keeping track of approximate lowest GC clock for CF entries
Adds a lowest timestamp of GC clock whenever a CF is added to a CL segment
first. Because GC clock is wall clock time and only connected to TTL (not
cell/row timestamps), this gives a fairly accurate view of GC low bounds
per segment.

Includes of course a function to get the all-segment lowest per CF.
2023-10-17 10:26:41 +00:00
Calle Wilund
810d06946f commitlog: Add helper to force new active segment
When called, if active segment holds data, close and replace with pristine one.
2023-10-17 10:26:40 +00:00
Tomasz Grabiec
0aef0f900b Merge 'truncation records refactorings' from Petr Gusev
This PR contains several refactoring, related to truncation records handling in `system_keyspace`, `commitlog_replayer` and `table` clases:
* drop map_reduce from `commitlog_replayer`, it's sufficient to load truncation records from the null shard;
* add a check that `table::_truncated_at` is properly initialized before it's accessed;
* move its initialization after `init_non_system_keyspaces`

Closes scylladb/scylladb#15583

* github.com:scylladb/scylladb:
  system_keyspace: drop truncation_record
  system_keyspace: remove get_truncated_at method
  table: get_truncation_time: check _truncated_at is initialized
  database: add_column_family: initialize truncation_time for new tables
  database: add_column_family: rename readonly parameter to is_new
  system_keyspace: move load_truncation_times into distributed_loader::populate_keyspace
  commitlog_replayer: refactor commitlog_replayer::impl::init
  system_keyspace: drop redundant typedef
  system_keyspace: drop redundant save_truncation_record overload
  table: rename cache_truncation_record -> set_truncation_time
  system_keyspace: get_truncated_position -> get_truncated_positions
2023-10-17 10:55:30 +02:00
Pavel Emelyanov
059d7c795e db,sstables: Move storage init for system keyspace to table creation
User and system keyspaces are created and populated slightly
differently.

System keyspace is created via system_keyspace::make() which eventually
calls calls add_column_family(). Then it's populated via
init_system_keyspace() which calls sstable_directory::prepare() which,
in turn, optionally creates directories in datadir/ or checks the
directory permissions if it exists

User keyspaces are created with the help of
add_column_family_and_make_directory() call which calls the
add_column_family() mentioned above _and_ calls table::init_storage() to
create directories. When it's populated with init_non_system_keyspaces()
it also calls sstable_directory::prepare() which notices that the
directory exists and then checks the permissions.

As a result, sstable_directory::prepare() initializes storage for system
keyspace only and there's a BUG (#15708) that the upload/ subdir is not
created.

This patch makes the directories creation for _all_ keyspaces with the
table::init_storage(). The change only touches system keyspace by moving
the creation of directories from sstable_directory::prepare() into
system_keyspace::make().

Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-16 16:19:25 +03:00
Jan Ciolek
940e44f887 db/view: change log level of failed view updates to WARN
When a remote view update doesn't succeed there's a log message
saying "Error applying view update...".
This message had log level ERROR, but it's not really a hard error.
View updates can fail for a multitude of reasons, even during normal operation.
A failing view update isn't fatal, it will be saved as a view hint a retried later.

Let's change the log level to WARN. It's something that shouldn't happen too much,
but it's not a disaster either.
ERROR log level causes trouble in tests which assume that an ERROR level message
means that the test has failed.

Refs: https://github.com/scylladb/scylladb/issues/15046#issuecomment-1712748784

For local view updates the log level stays at "ERROR", local view updates shouldn't fail.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes scylladb/scylladb#15640
2023-10-11 18:19:23 +03:00
Avi Kivity
35849fc901 Revert "Merge 'Don't calculate hashes for schema versions in Raft mode' from Kamil Braun"
This reverts commit 3d4398d1b2, reversing
changes made to 45dfce6632. The commit
causes some schema changes to be lost due to incorrect timestamps
in some mutations. More information is available in [1].

Reopens: scylladb/scylladb#7620
Reopens: scylladb/scylladb#13957

Fixes scylladb/scylladb#15530.

[1] https://github.com/scylladb/scylladb/pull/15687
2023-10-11 00:32:05 +03:00
Dawid Medrek
6fdca0d3a8 db/hints/manager: Reword comments about state
The current comments should be clearer to someone
not familiar with the module. This commit also makes
them abide by the limit of 120 characters per line.
2023-10-06 13:25:30 +02:00
Dawid Medrek
aa38ea3642 db/hints/manager: Unfriend space_watchdog
space_watchdog is a friend of shard hint manager just to
be able to execute one of its functions. This commit changes
that by unfriending the class and exposing the function.
2023-10-06 13:25:30 +02:00
Dawid Medrek
6cd0153954 db/hints: Remove a redundant alias 2023-10-06 13:25:30 +02:00
Dawid Medrek
ddc385bce0 db/hints: Remove an unused namespace 2023-10-06 13:25:30 +02:00
Dawid Medrek
76d414012b db/hints: Coroutinize change_host_filter() 2023-10-06 13:25:30 +02:00
Dawid Medrek
09eb30e6f1 db/hints: Coroutinize drain_for()
This commit turns the function into a coroutine
and makes the code less compact and more readable.
2023-10-06 13:25:30 +02:00
Dawid Medrek
907a572e24 db/hints: Clean up can_hint_for()
This commit gets rid of unnecessary additional calls to functions
and makes all lines abide by the limit of 120 characters.
2023-10-06 13:25:30 +02:00
Dawid Medrek
596e1f9859 db/hints: Clean up store_hint()
This commit makes the function abide by the limit
of 120 characters per line.
2023-10-06 13:25:30 +02:00
Dawid Medrek
8a43f94ca6 db/hints: Clean up too_many_in_flight_hints_for()
This commit makes the return statement more readable.
It also makes the comment abide by the limit of 120 characters per line.
2023-10-06 13:25:30 +02:00
Dawid Medrek
96a5906621 db/hints: Refactor get_ep_manager() 2023-10-06 13:25:30 +02:00
Dawid Medrek
8b591be3c3 db/hints: Coroutinize wait_for_sync_point()
This commit coroutinizes the function and adds
a comment explaining a non-trivial case.
2023-10-06 13:25:27 +02:00
Dawid Medrek
fee3aafd80 db/hints: Use std::span in calculate_current_sync_point
std::span is a lot more flexible than std::vector as it allows
for arbitrary contiguous ranges.
2023-10-06 12:36:05 +02:00
Dawid Medrek
64fd4d6323 db/hints: Clean up manager::forbid_hints_for_eps_with_pending_hints() 2023-10-06 12:26:55 +02:00
Dawid Medrek
58cd5c4167 db/hints: Clean up manager::forbid_hints() 2023-10-06 12:26:55 +02:00
Dawid Medrek
f8ed93f5bc db/hints: Clean up manager::allow_hints() 2023-10-06 12:26:52 +02:00
Dawid Medrek
bfe32bcf89 db/hints: Coroutinize compute_hints_dir_device_id() 2023-10-06 12:18:30 +02:00
Dawid Medrek
8f28eb6522 db/hints: Clean up manager::stop()
This commit gets rid of boilerplate in the function,
leverages a range pipe and explicit types to make
the code more readable, and changes the logs to
make it clearer what happens.
2023-10-06 12:18:30 +02:00
Dawid Medrek
a384caece0 db/hints: Clean up manager::start()
This commit coroutinizes the function and makes it less compact.
2023-10-06 12:18:30 +02:00
Dawid Medrek
2db97aaf81 db/hints/manager: Clean up the constructor
fmt::to_string should be preferred to seastar::format.
It's clearer and simpler. Besides that, this commit makes
the code abide by the limit of 120 characters per line.
2023-10-06 12:18:30 +02:00
Dawid Medrek
6c10a86791 db/hints: Remove boilerplate drain_lock() 2023-10-06 12:18:30 +02:00
Dawid Medrek
f1f35ba819 db/hints: Let drain_for() return a future
Currently, the function doesn't return anything.
However, if the futurue doesn't need to be awaited,
the caller can decide that. There is no reason
to make that decision in the function itself.
2023-10-06 12:18:25 +02:00
Dawid Medrek
79e1412f14 db/hints: Remove ep_managers_end
The methods are redundant and are effectively
code boilerplate.
2023-10-06 12:15:04 +02:00
Dawid Medrek
cfbacb29bb db/hints: Remove find_ep_manager
The methods are redundant and are effectively
code boilerplate.
2023-10-06 12:15:04 +02:00
Dawid Medrek
1c70a18fc7 db/hints: Use manager as API for hint_endpoint_manager
This commit makes with_file_update_mutex() a method of hint_endpoint_manager
and introduces db::hints::manager::with_file_update_mutex_for() for accessing
it from the outside. This way, hint_endpoint_manager is hidden and no one
needs to know about its existence.
2023-10-06 12:15:01 +02:00
Dawid Medrek
d068143b83 db/hints: Don't mark have_ep_manager()'s definition as inline
Doing that doesn't allow for external linkage, so
it's not accessible from other files.
2023-10-06 11:54:15 +02:00
Dawid Medrek
58249363bc db/hints: Remove make_directory_initializer()
The function is never used. It's not even implemented.
2023-10-06 11:54:15 +02:00
Dawid Medrek
f47a669f75 db/hints/manager: Order constructors
This commit orders constructors of db::hints::manager for readability.
2023-10-06 11:54:15 +02:00
Dawid Medrek
4663f72990 db/hints: Move ~manager() and mark it as noexcept
The destructor is trivial and there is no reason
to keep in the source file. We mark it as noexcept too.
2023-10-06 11:54:15 +02:00
Dawid Medrek
18a2831186 db/hints: Use reference for storage proxy
This commit makes db::hints::manager store service::storage_proxy
as a reference instead of a seastar::shared_ptr. The manager is
owned by storage proxy, so it only lives as long as storage proxy
does. Hence, it makes little sense to store the latter as a shared
pointer; in fact, it's very confusing and may be error-prone.
The field never changes, so it's safe to keep it as a reference
(especially because copy and move constructors of db::hints::manager
are both deleted). What's more, we ensure that the hint manager
has access to storage proxy as soon as it's created.

The same changes were applied to db::hints::resource_manager.
The rationale is the same.
2023-10-06 11:54:15 +02:00
Dawid Medrek
3c347cc196 db/hints/manager: Explicitly delete copy constructor
This commit explicitly deletes the copy constructor of
db::hints::manager and its copy assignment. They're not
used in the code, and they should not.
2023-10-06 11:54:15 +02:00
Dawid Medrek
ee5a5c1661 db/hints: Capitalize constants
This is a common convention. Follow it for readability.
2023-10-06 11:54:15 +02:00
Dawid Medrek
fd30bac7b1 db/hints/manager: Hide declarations 2023-10-06 11:54:15 +02:00
Dawid Medrek
4b03cba1bf db/hints/manager: Move the defintions of static members to the header
If the variables are accessible from the outside, it makes
sense to also expose their initial values to the user.
This commit moves them to the header and marks as inline.
2023-10-06 11:54:15 +02:00
Dawid Medrek
c3ab28f5e9 db/hints: Move make_dummy() to the header
The function is trivial. It can also be marked as noexcept.
2023-10-06 11:54:15 +02:00
Dawid Medrek
5e333f0a52 db/hints: Don't explicitly define ~directory_initializer()
The destructor is the default destructor, and it is safe
to drop it altogether.
2023-10-06 11:53:02 +02:00
Dawid Medrek
9f215d3cf1 db/hints: Change the order of logging in ensure_created_and_verified()
The new logging order seems to make more sense, i.e.
we first log that we're creating and validating directories,
and only then do we start doing that.
The previous order when those actions were reversed didn't
match the log's message because the action was already
done when we informed the user of it.
2023-10-06 11:14:41 +02:00
Dawid Medrek
4ad3e8d37b db/hints: Coroutinize ensure_rebalanced() 2023-10-06 11:14:41 +02:00