When purging regular tombstone consult the min_live_timestamp, if available.
This is safe since we don't need to protect dead data from resurrection, as it is already dead.
For shadowable_tombstones, consult the min_memtable_live_row_marker_timestamp,
if available, otherwise fallback to the min_live_timestamp.
If we see in a view table a shadowable tombstone with time T, then in any row where the row marker's timestamp is higher than T the shadowable tombstone is completely ignored and it doesn't hide any data in any column, so the shadowable tombstone can be safely purged without any effect or risk resurrecting any deleted data.
In other words, rows which might cause problems for purging a shadowable tombstone with time T are rows with row markers older or equal T. So to know if a whole sstable can cause problems for shadowable tombstone of time T, we need to check if the sstable's oldest row marker (and not oldest column) is older or equal T. And the same check applies similarly to the memtable.
If both extended timestamp statistics are missing, fallback to the legacy (and inaccurate) min_timestamp.
Fixesscylladb/scylladb#20423Fixesscylladb/scylladb#20424
> [!NOTE]
> no backport needed at this time
> We may consider backport later on after given some soak time in master/enterprise
> since we do see tombstone accumulation in the field under some materialized views workloads
Closesscylladb/scylladb#20446
* github.com:scylladb/scylladb:
cql-pytest: add test_compaction_tombstone_gc
sstable_compaction_test: add mv_tombstone_purge_test
sstable_compaction_test: tombstone_purge_test: test that old deleted data do not inhibit tombstone garbage collection
sstable_compaction_test: tombstone_purge_test: add testlog debugging
sstable_compaction_test: tombstone_purge_test: make_expiring: use next_timestamp
sstable, compaction: add debug logging for extended min timestamp stats
compaction: get_max_purgeable_timestamp: use memtable and sstable extended timestamp stats
compaction: define max_purgeable_fn
tombstone: can_gc_fn: move declaration to compaction_garbage_collector.hh
sstables: scylla_metadata: add ext_timestamp_stats
compaction_group, storage_group, table_state: add extended timestamp stats getters
sstables, memtable: track live timestamps
memtable_encoding_stats_collector: update row_marker: do nothing if missing
To return the minimum live timestamp and live row-marker
timestamp across a compaction_group, storage_group, or
table_state.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The test-env in question is mostly started in one-shard mode. Also there
are several boost tests that start sharded<> environment. In that case
instances on different shards live in different temp dirs. That's not
critical yet, but better to have single directory for the whole test.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#20412
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
[1] 66ef711d68Closesscylladb/scylladb#20006
Added a new boost test, index_reader_test, with a testcase to verifyi
the abort behaviour during an index read using
index_consume_entry_context.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Add a new member that stores the abort_source. This can later be used by
the sstables to check if an abort has been requested. Also implement
sstables_manager::get_abort_source() that returns a const reference to
the abort source.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
C++23 made std::unique_ptr constexpr. A side effect of this (presumably)
is that the compiler compiles it more eagerly, requiring the full definition
of the class in std::make_unique, while it previously was content with
finding the definition later.
One victim of this change is table_for_tests; define it earlier to
build with C++23.
C++ standard does not define the order in which the parameters
passed to a function are evaluated. so in theory, in
```c++
reusable_sst(sst->get_schema(), std::move(sst));
```
`std::move(sst)` could be evaluated before `sst->get_schema`.
but please note, `std::move(sst)` does not move `sst`
away, it merely cast `sst` to a rvalue reference, it is
`reusable_sst()` which *could* move `sst` away by
consuming it. so following call is much more dangerous
than the above one:
```c++
reusable_sst(sst->get_schema(), modify_sst(std::move(sst)))
```
nevertheless, this usage is still confusing. so instead
of passing a copy of `sst` to `reusable_sst`.
this change is inspired by clang-tidy, it warns like:
```
Warning: /home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:25: warning: 'sst' used after it was moved [bugprone-use-after-move]
397 | return reusable_sst(sst->get_schema(), std::move(sst));
| ^
/home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:44: note: move occurred here
397 | return reusable_sst(sst->get_schema(), std::move(sst));
| ^
/home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:25: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
397 | return reusable_sst(sst->get_schema(), std::move(sst));
|
```
per the analysis above, this is a false alarm.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18775
Store that maintenance scheduling group inside the sstables_manager. The
next patch will use this to run the components reloader fiber.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.
Refs scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
sstables::test_env uses the pimpl idiom, but incompletely. This
prevents reaping some of the benefits.
Complete the pimplification:
- the `impl` nested struct is moved out-of-line
- all non-template member functions are moved out-of-line
- a destructor is declared and defined out-of-line
- the move constructor is also defined (necessary after the destructor is
defined)
After this, we can forward-declare more components.
test_env implementation is scattered around two .cc, concentrate it
in test_services.cc, which happens to be the file that doesn't cause
link errors.
Move toc_filename with it, as it is its only caller and it is static.
sstables::test_env is intended for sstable unit tests, but to satisfy its
dependency of an sstables_registry we instantiate an entire database.
Remove the dependency by having a mock implementation of sstables_registry
and using that instead.
Closesscylladb/scylladb#17895
sstables_manager now depends on system_keyspace for access to the
system.sstables table, needed by object storage. This violates
modularity, since sstables_manager is a relatively low-level leaf
module while system_keyspace integrates large parts of the system
(including, indirectly, sstables_manager).
One area where this is grating is sstables::test_env, which has
to include the much higher level cql_test_env to accommodate it.
Fix this by having sstables_manager expose its dependency on
system_keyspace as an interface, sstables_registry, and have
system_keyspace implement the glue logic in
system_keyspace_sstables_manager.
Closesscylladb/scylladb#17868
It was observed that some use cases might append old data constantly to
memtable, blocking GC of expired tombstones.
That's because timestamp of memtable is unconditionally used for
calculating max purgeable, even when the memtable doesn't contain the
key of the tombstone we're trying to GC.
The idea is to treat memtable as we treat L0 sstables, i.e. it will
only prevent GC if it contains data that is possibly shadowed by the
expired tombstone (after checking for key presence and timestamp).
Memtable will usually have a small subset of keys in largest tier,
so after this change, a large fraction of keys containing expired
tombstones can be GCed when memtable contains old data.
Fixes#17599.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#17835
Store schema_ptr in reader permit instead of storing a const pointer to
schema to ensure that the schema doesn't get changed elsewhere when the
permit is holding on to it. Also update the constructors and all the
relevant callers to pass down schema_ptr instead of a raw pointer.
Fixes#16180
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closesscylladb/scylladb#16658
To be used in the next patch to control whether the semaphore registers
and exports metrics or not. We want to move metric registration to the
semaphore but we don't want all semaphores to export metrics. The
decision on whether a semaphore should or shouldn't export metrics
should be made on a case-by-case basis so this new parameter has no
default value (except for the for_tests constructor).
Soon, the reader_concurrency_semaphore will require a unique
and meaningful name in order to label its metrics. To prepare
for that, name sstable_manager instances. This will be used
to generate a name for sstable_manager's reader_concurrency_semaphore.
Now its plain updateable_value, but without the ..._source object the
updateable_value is just a no-op value holder. In order for the
observers to operate there must be the value source, updating it would
update the attached updateable values _and_ notify the observers.
In order for the config to be the u.v._source, config entries should be
comparable to each other, thus the <=> operator for it
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Take it from compaction_manager_test::run() which is simplified overwite
of the compaction_manager::perform_compaction().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The purpose of this method is to turn public the private
compaction_manager method of the same name. The caller of this method is
having sstable_test_env at hand with its test_env_compaction_manager, so
the de-private-isation call can be moved.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Make it public and add `bool enable` flag so that test cases could start
the compaction manager (to call make_table_for_tests() later) but keep
it disabled for their testing purposes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now when the sstables::test_env provides the compaction manager
instance, the table_for_tests can start using it and can remove c.m. and
the sidecar task_manager.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Most of the test cases that use sstables::test_env do not mess with
table objects, they only need sstables. However, compaction test cases
do need table objects and, respectively, a compaction manager instance.
Today those test cases create compaction manager instance for each table
they create, but that's a bit heaviweight and doesn't work the way core
code works. This patch prepares the sstables::test_env to provide
compaction manager on demand by starting it as soon as it's asked to
create table object.
For now this compaction manager is unused, but it will be in next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patches will stop using compaction manager from table_for_tests in
favor of external one (spoiler: the one from sstables::test_env), thus
the compaction manager would outsurvive the table_for_tests object and
the table object wrapped by it. So in order for the table_for_tests to
stop correctly, it also needs to stop the wrapped table too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's not used any longer and can be removed. This make table_for_tests
stopping code a bit shorter as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is the continuation of the previous patch. Make the caller of
table_for_tests constructor provide the table::config. This makes the
table_for_tests constructor shorter and more self-contained.
Also, the caller now needs to provide the reference to reader
concurrency semaphore, and that's good news, because the only caller for
today is the sstables::test_env that does have it. This makes the
semaphore sitting on table_for_tests itself unused and it will be
removed eventually.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The table_for_tests keeps a copy of table::config on board. That's not
"idiomatic" as table config is a temporary object that should only be
needed while creating table object. Fortunately, the copy of config on
table_for_tests is no longer needed and it can be made temporary.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Making compaction permit needs a semaphore. Current code gets it from
the table_for_tests, but the very same semaphore reference sits on the
table. So get it from table, as the core code does. This will allow
removing the dedicated semaphore from table_for_tests in the future.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When making table object it needs the cache tracker reference. The
table_for_tests keeps one on board, but the very same object already
sits on the sstables manager which has public getter.
This makes the table_for_tests's cache tracker object not needed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's going to get larger, so better to move.
Also when coroutinized it's goind to be easier to extend.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, we create a new UUID for a new sstable managed
by the s3_storage, and we use the string representation of UUID
defined by RFC4122 like "0aa490de-7a85-46e2-8f90-38b8f496d53b" for
naming the objects stored on s3_storage. but this representation is
not what we are using for storing sstables on local filesystem when
the option of "uuid_sstable_identifiers_enabled" is enabled. instead,
we are using a base36-based representation which is shorter.
to be consistent with the naming of the sstables created for local
filesystem, and more importantly, to simplify the interaction between
the local copy of sstables and those stored on object storage, we should
use the same string representation of the sstable identifier.
so, in this change:
1. instead of creating a new UUID, just reuse the generation of the
sstable for the object's key.
2. do not store the uuid in the sstable_registry system table. As
we already have the generation of the sstable for the same purpose.
3. switch the sstable identifier representation from the one defined
by the RFC4122 (implemented by fmt::formatter<utils::UUID>) to the
base36-based one (implemented by
fmt::formatter<sstables::generation_type>)
4. enable the `uuid_sstable_identifers` cluster feature if it is
enabled in the `test_env_config`, so that it the sstable manager
can enable the uuid-based uuid when creating a new uuid for
sstable.
5. throw if the generation of sstable is not UUID-based when
accessing / manipulating an sstable with S3 storage backend. as
the S3 storage backend now relies on this option. as, otherwise
we'd have sstables with key like s3://bucket/number/basename, which
is just unable to serve as a unique id for sstable if the bucket is
shared across multiple tables.
Fixes#14175
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
some of the tests are still relying on the integer-based sstable
identifier, so let's add a method to test_env, so that the tests
relying on this can opt-out. we will change the default setting
of sstables::test_env to use uuid-base sstable identifier in the
next commit. this change does not change the existing behavior.
it just adds a new knob to test_env_config. and let the tests
relying on this to customize the test_env_config to disable
use_uuid.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
when accessing AWS resources, uses are allowed to long-term security
credentials, they can also the temporary credentials. but if the latter
are used, we have to pass a session token along with the keys.
see also https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_use-resources.html
so, if we want to programatically get authenticated, we need to
set the "x-amz-security-token" header,
see
https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#UsingTemporarySecurityCredentials
so, in this change, we
1. add another member named `token` in `s3::endpoint_config::aws_config`
for storing "AWS_SESSION_TOKEN".
2. populate the setting from "object_storage.yaml" and
"$AWS_SESSION_TOKEN" environment variable.
3. set "x-amz-security-token" header if
`s3::endpoint_config::aws_config::token` is not empty.
this should allow us to test s3 client and s3 object store backend
with S3 bucket, with the temporary credentials.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15486
We want to switch system.scylla_local table to the
schema commitlog, but load phases hamper here - schema
commitlog is initialized after phase1,
so a table which is using it should be moved to phase2,
but system.scylla_local contains features, and we need
them before schema commitlog initialization for
SCHEMA_COMMITLOG feature.
In this commit we are taking a different approach to
loading system tables. First, we load them all in
one pass in 'readonly' mode. In this mode, the table
cannot be written to and has not yet been assigned
a commit log. To achieve this we've added _readonly bool field
to the table class, it's initialized to true in table's
constructor. In addition, we changed the table constructor
to always assign nullptr to commitlog, and we trigger
an internal error if table.commitlog() property is accessed
while the table is in readonly mode. Then, after
triggering on_system_tables_loaded notifications on
feature_service and sstable_format_selector, we call
system_keyspace::mark_writable and eventually
table::mark_ready_for_writes which selects the
proper commitlog and marks the table as writable.
In sstable_compaction_test we drop several
mark_ready_for_writes calls since they are redundant,
the table has already been made writable in
env.make_table_for_tests call.
The table::commitlog function either returns the current
commitlog or causes an error if the table is readonly. This
didn't work for virtual tables, since they never called
mark_ready_for_writes. In this commit we add this
call to initialize_virtual_tables.
In this refactoring commit we remove the db::config::host_id
field, as it's hacky and duplicates token_metadata::get_my_id.
Some tests want specific host_id, we add it to cql_test_config
and use in cql_test_env.
We can't pass host_id to sstables_manager by value since it's
initialized in database constructor and host_id is not loaded yet.
We also prefer not to make a dependency on shared_token_metadata
since in this case we would have to create artificial
shared_token_metadata in many tools and tests where sstables_manager
is used. So we pass a function that returns host_id to
sstables_manager constructor.
Currently minio applies anonymous public policy for the test bucket and
all tests just use unsigned S3 requests. This patch generates a policy
for the temporary minio user and removes the anon public one. All tests
are updated respectively to use the provided key:secret pair.
The use-https bit is off by default as minio still starts with plain
http. That's OK for now, all tests are local and have no secret data
anyway
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The bucket is going to stop being public, rename the env variable in
advance to make the essential patch smaller
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The option was introduced to bootstrap the project. It's still
useful for testing, but that translates into maintaining an
additional option and code that will not be really used
outside of testing. A possible option is to later map the
option in boost tests to initial_tablets, which may yield
the same effect for testing.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
There are a few good reasons for this change.
1) compaction_group doesn't have to be aware of # of groups
2) thinking forward to dynamic tablets, # of groups cannot be
statically embedded in group id, otherwise it gets stale.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
perform_cleanup may be waiting for those sstables
to become eligible for cleanup so signal it
when table::move_sstables_from_staging detects an
sstable that requires cleanup.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Adding new APIs /column_family/tombstone_gc and /storage_service/tombstone_gc, that will allow for disabling tombstone garbage collection (GC) in compaction.
Mimicks existing APIs /column_family/autocompaction and /storage_service/autocompaction.
column_family variant must specify a single table only, following existing convention.
whereas the storage_service one can specify an entire keyspace, or a subset of a tables in a keyspace.
column_family API usage
-----
```
The table name must be in keyspace:name format
Get status:
curl -s -X GET "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"
Enable GC
curl -s -X POST "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"
Disable GC
curl -s -X DELETE "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"
```
storage_service API usage
-----
```
Tables can be specified using a comma-separated list.
Enable GC on keyspace
curl -s -X POST "http://127.0.0.1:10000/storage_service/tombstone_gc/ks"
Disable GC on keyspace
curl -s -X DELETE "http://127.0.0.1:10000/storage_service/tombstone_gc/ks"
Enable GC on a subset of tables
curl -s -X POST
"http://127.0.0.1:10000/storage_service/tombstone_gc/ks?cf=table1,table2"
```
Closes#13793
* github.com:scylladb/scylladb:
test: Test new API for disabling tombstone GC
test: rest_api: extract common testing code into generic functions
Add API to disable tombstone GC in compaction
api: storage_service: restore indentation
api: storage_service: extract code to set attribute for a set of tables
tests: Test new option for disabling tombstone GC in compaction
compaction_strategy: bypass tombstone compaction if tombstone GC is disabled
table: Allow tombstone GC in compaction to be disabled on user request
If tombstone GC was disabled, compaction will ensure that fully expired
sstables won't be bypassed and that no expired tombstones will be
purged. Changing the value takes immediate effect even on ongoing
compactions.
Not wired into an API yet.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Right now the map<endpoint, config> sits on the sstables manager and its
update is governed by database (because it's peering and can kick other
shards to update it as well).
Having the sharded<storage_manager> at hand lets freeing database from
the need to update configs and keeps sstables_manager a bit smaller.
Also this will allow keeping s3 clients shared between sstables via this
map by next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>