The newly added testcase is based on the already existing
`test_alter_dropped_tablets_keyspace`.
A new error injection is created, which stops the ALTER execution just
before the changes are submitted to RAFT. In the meantime, a new schema
change is performed using the 2nd node in the cluster, thus causing the
1st node to retry the ALTER statement.
ALTER tablets-enabled KEYSPACES (KS) may fail due to
`group0_concurrent_modification`, in which case it's repeated by a `for`
loop surrounding the code. But because raft's `add_entry` consumes the
raft's guard (by `std::move`'ing the guard object), retries of ALTER KS
will use a moved-from guard object, which is UB, potentially a crash.
The fix is to remove the aforementioned `for` loop altogether and rethrow
the exception. The `rf_change` event will be repeated by the topology
state machine if it receives the concurrent modification exception,
because the event remains in the global requests queue and is therefore
executed as the very next event.
`topology_coordinator::handle_topology_coordinator_error` handling the
case of `group0_concurrent_modification` has been extended with logging
in order not to write catch-log-throw boilerplate.
Note: refactor is implemented in the follow-up commit.
Fixes: scylladb/scylladb#21102
Having tablet metadata with more than 1 pending replica will prevent this metadata from being (re)loaded due to a sanity check on load. This patch fails the operation that tries to save such invalid metadata by applying a similar sanity check. To do that, changes submitted to raft are validated, and if a change is a topology_change that affects system.tablets, the new "replicas" and "new_replicas" values are checked the same way they will be on (re)load.
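A minimal sketch of the check described above, assuming replicas are modelled as simple `(host, shard)` tuples. The function names are hypothetical and are not the ones used in the patch; only the rule itself (pending replicas are `new_replicas` minus `replicas`, and at most one is allowed) comes from the description.

```python
from typing import Iterable, Set, Tuple

# Simplified stand-in for a tablet replica: (host id, shard).
Replica = Tuple[str, int]


def pending_replicas(replicas: Iterable[Replica],
                     new_replicas: Iterable[Replica]) -> Set[Replica]:
    """Replicas present in new_replicas but not yet in replicas."""
    return set(new_replicas) - set(replicas)


def is_valid_tablet_update(replicas: Iterable[Replica],
                           new_replicas: Iterable[Replica]) -> bool:
    """Accept the update only if it leaves at most one pending replica."""
    return len(pending_replicas(replicas, new_replicas)) <= 1
```

Rejecting the update before it is saved keeps the write path and the load path in agreement about what counts as valid metadata.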
Fixes #20043
Closes scylladb/scylladb#21020
* github.com:scylladb/scylladb:
tablets: Validate system.tablets update
group0_client: Introduce change validation
group0_client: Add shared_token_metadata dependency
The testcase is flaky due to a known python driver issue:
https://github.com/scylladb/python-driver/issues/317.
This issue causes the `CREATE KEYSPACE` statement to be sometimes
executed twice in a row, and the 2nd CREATE statement causes the test to
fail.
In order to work around it, it's enough to add `if not exists` when
creating a ks.
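The effect of the workaround can be illustrated with a toy model. `FakeCluster` below is a hypothetical stand-in, not the python driver API; it only mimics the "already exists" failure on a repeated `CREATE KEYSPACE`.

```python
class AlreadyExists(Exception):
    """Raised when a keyspace is created twice without IF NOT EXISTS."""


class FakeCluster:
    """Toy stand-in for a cluster; not the real driver API."""

    def __init__(self):
        self.keyspaces = set()

    def create_keyspace(self, name, if_not_exists=False):
        if name in self.keyspaces:
            if if_not_exists:
                return  # the second execution becomes a no-op
            raise AlreadyExists(name)
        self.keyspaces.add(name)


def execute_twice(cluster, name, if_not_exists):
    """Mimic the statement being sent twice; True if both executions succeed."""
    try:
        cluster.create_keyspace(name, if_not_exists)
        cluster.create_keyspace(name, if_not_exists)
    except AlreadyExists:
        return False
    return True
```

With `if_not_exists=True` the duplicated statement is harmless, which is why the test no longer depends on the driver sending it exactly once.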
Fixes: scylladb/scylladb#21034
Needs to be backported to all 6.x branches, as the PR introducing this flakiness is backported to every 6.x branch.
Closes scylladb/scylladb#21056
aiohttp 3.10.5 complains when 'unix+http' is used for a unix-domain
socket. Use 'http', which works with 3.10.5 and the toolchain's 3.9.5.
Closes scylladb/scylladb#21080
The SCYLLA-VERSION-GEN file skips updating the SCYLLA-*-FILE files if
the commit hash from SCYLLA-RELEASE-FILE is the same. The original
reason for this was to prevent the date in the version string from
changing if multiple modes are built across midnight
(scylladb/scylla-pkg#826). However - intentionally or not - it serves
another purpose: it prevents an infinite loop in the build process.
If the build.ninja file needs to be rebuilt, the configure.py script
unconditionally calls ./SCYLLA-VERSION-GEN. On the other hand, if one
of the SCYLLA-*-FILE files is updated then this triggers rebuild
of build.ninja. Apparently, this is sufficient for ninja to enter an
infinite loop.
However, the check assumes that the RELEASE is in the format
<build identifier>.<date>.<commit hash>
and assumes that none of the components contains a dot - otherwise the
check picks the wrong field and misbehaves. Specifically, when building
a private version, it is recommended to set the build identifier to
`count.yourname`, which contains a dot.
Previously, before 85219e9, this problem wasn't noticed most likely
because the reconfigure process was broken and stopped overwriting
the build.ninja file after the first iteration.
Fix the problem by fixing the logic that extracts the commit hash -
instead of looking at the third dot-separated field counting from the
left side, look at the last field.
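The difference between the two extraction strategies can be sketched as follows. The actual script is a shell script; the Python functions and the sample release strings below are hypothetical and only illustrate the field selection.

```python
def commit_hash_third_field(release: str) -> str:
    """Old logic: take the third dot-separated field counting from the left."""
    return release.split(".")[2]


def commit_hash_last_field(release: str) -> str:
    """New logic: take the last dot-separated field."""
    return release.rsplit(".", 1)[-1]


# Hypothetical release strings in <build identifier>.<date>.<commit hash> form.
plain = "0.20241010.abc1234"
private = "7.yourname.20241010.abc1234"  # build identifier contains a dot
```

For `plain` both functions agree, while for `private` the old logic returns the date field instead of the hash, so the "same commit" check never matches.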
Fixes: scylladb/scylladb#21027
Closes scylladb/scylladb#21049
The system.sstables (a.k.a. sstables registry) primary key is "string location" as the partition key and "uuid generation" as the clustering one. The "location" part was taken from the table.config.datadir value which, in turn, is a string containing the path to on-disk files if the table was located locally, e.g. /var/lib/scylla/data/ks/cf-abc123. Recently [1] the datadir was moved from the table config onto storage options, but this string is still used as the registry key.
Other than being owned by a table with ID, sstables are accessed by restore-from-object-storage code [2]. To make it work, both storage driver and sstable_directory helper class maintain two formats of object prefixes for sstables components. For S3-backed sstables having a record in registry, the path used is s3://bucket/generation/component. For restore code there are user-provided prefixes that do not match the aforementioned pattern. The selection between those two is now made by checking sstable state, which is not obvious and may cause troubles for tiered storage driver.
This patch changes the registry schema so that partition key becomes "uuid owner" and is set to be table.id() value. This is to stop using the local path by S3 backed sstables. Also this change makes it possible for storage driver and sstable directory to rely on the storage options only to tell different bucket prefixes formats from each other.
As a side effect, the make_s3_object_name() helper, which generates the proper object name, becomes explicit for restore-from-S3 usage. Today it relies on sstable::filename() calling this->prefix() behind the scenes and on the latter returning the user-provided prefix, which is a pretty fragile construction.
No need to backport (and it's not going to be easy to do it), storage options feature is still experimental
Refs #20675 [1]
Refs #20305 [2]
Closes scylladb/scylladb#20998
* github.com:scylladb/scylladb:
sstables: Flatten S3 object name making
sstable_directory: Flatten directory lister creation
treewide: Rename sstable registry location field to be owner
system_keyspace: Change sstables registry partition key type
sstables: Keep location variant on s3 backend too
storage_options: Use variant on S3 options
sstables: Split sstable::filename() helper
sstables: Add s3_storage::owner() helper
seastar extracted the `addr2line` python module back in
e078d7877273e4a6698071dc10902945f175e8bc. but `install.sh` was
not updated accordingly. it still installs `seastar-addr2line`
without installing its new dependency. this leaves us with a
broken `seastar-addr2line` in the relocatable tarball.
```console
$ /opt/scylladb/scripts/seastar-addr2line
Traceback (most recent call last):
File "/opt/scylladb/scripts/libexec/seastar-addr2line", line 26, in <module>
from addr2line import BacktraceResolver
ModuleNotFoundError: No module named 'addr2line'
```
in this change, we redistribute `addr2line.py` as well. this
should address the issue above.
Fixes scylladb/scylladb#21077
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21078
When we made the raft-based topology mandatory, all boost
tests started using it. Then, `test_read_required_hosts` started
failing. We left investigating it for later and started running it
with `force-gossip-topology-changes` to make it pass.
Currently, the test doesn't fail with the raft-based topology
anymore. Hence, we remove the FIXME and run the test with a normal
config.
We don't know when and why the test stopped failing. Investigating
it wouldn't be easy, since we don't even know why it failed in the
first place. We suspect that there was some bug that is now fixed.
This patch only fixes a test, there is no need to backport it.
Fixes scylladb/scylladb#18463
Closes scylladb/scylladb#20960
During the investigation of scylladb/scylladb#20282, it was discovered that implementations of speculating read executors have undefined behavior when called with an incorrect number of read replicas. This PR introduces two levels of condition checking:
- Condition checking in speculating read executors for the number of replicas.
- Checking the consistency of the Effective Replication Map in filter_for_query(): the map is considered incorrect if the list of replicas contains a node from a data center whose replication factor is 0.
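The second check in the list above can be sketched roughly as follows, assuming replicas are given as `(node, datacenter)` pairs and the replication factors as a per-datacenter mapping. The function name and data shapes are hypothetical simplifications, not the actual `filter_for_query()` code.

```python
from typing import Dict, List, Tuple


def replicas_consistent_with_rf(replicas: List[Tuple[str, str]],
                                rf_per_dc: Dict[str, int]) -> bool:
    """Return False if any replica lives in a datacenter whose RF is 0.

    A datacenter missing from rf_per_dc is treated as RF 0 here, which is
    an assumption of this sketch.
    """
    return all(rf_per_dc.get(dc, 0) > 0 for _node, dc in replicas)
```

A caller would treat a `False` result as an inconsistent replication map and refuse to build a read executor from it, rather than proceeding with inputs that lead to undefined behavior.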
Please note: This PR does not fix the issue found in scylladb/scylladb#20282; it only adds condition checks to prevent undefined behavior in cases of inconsistent inputs.
Refs scylladb/scylladb#20625
As this issue applies to released versions and can affect clients, we need backports to 6.0, 6.1, 6.2.
Closes scylladb/scylladb#20851
* github.com:scylladb/scylladb:
Add conditions checking for get_read_executor
Avoid an extra call to block_for in db::filter_for_query.
Improve code readability in consistency_level.cc and storage_proxy.cc
tools: Add build_info header with functions providing build type information
tests: Add tests for alter table with RF=1 to RF=0
The s3_storage backend driver has a method that generates object path
within the bucket. Depending on options alternative it picks one of two
formats:
- for string prefix, it uses it implicitly via sstable::filename() call
that calls storage->prefix() which, in turn, returns prefix value
- for registry-backed sstables, the /bucket/generation/component path is
generated
This patch brushes this place up. Similarly to the previous patch, this
change also makes the selection based on the location alternative, not
on the sstable state. It's also an idempotent change, as S3 sstables
with 'upload' state only appear when restoring from object store, and in
this case the string location is in use.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
After the previous patching, the way the components lister is created for
S3 storage options became quite hairy. This patch brushes things up to be
easier to read.
The only "functional" change here, is that selection between registry
lister and S3 lister is made based on options' location held
alternative, not on the sstable state value. That's in fact idempotent
change, the only caller that provides string location on options is the
"restore from object store" code that also sets state to be 'upload'.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is sort of continuation of the previous patch. The partition key in
the registry is now table_id, not string, and is better called "owner",
not "location". This patch is s/location/owner/ over specific places
that include field name in the schema, argument names in registry
maintenance classes and tests accessing the selected row fields by name.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Today, the system.sstables schema uses string as partition key. Callers,
in turn, use table's datadir value to reference entries in it. That's
wrong, S3-backed sstables don't have any local paths to work with. The
table's ID is better in this role.
This patch only changes the field type to be table_id and fixes the
callers to provide one. In particular, see init_table_storage() change
-- instead of generating a datadir string, it sets table.id() as the
options' location. Other fixed places are tests. Internally, this id
value is propagated via s3_storage::owner() method, that's fixed as
well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Enables debugging inside pytest subprocesses as well. It seems that pydev automatically attaches itself also to all python subprocesses. Since we used to call "pytest" wrapper it was deemed a different program, and we could not debug individual tests.
Closes scylladb/scylladb#21050
Previous patch put variant<string, table_id> as location of S3 options.
This patch makes the S3 sstables backend driver keep variant as sstable
location. As with the previous patch, driver only keeps variant, but
continues using its string alternative internally. This will be changed
later on.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Describing S3 storage for sstables nowadays has two options -- via
sstables registry entry and by using the direct prefix string. The
former is used when putting a keyspace on S3. In this case each sstable
has the corresponding entry in the system.sstables table. The latter is
used by "restore from object storage" code. In that case, sstables don't
have entries in the registry, but are accessed by a specific S3 object
path.
This patch reflects this difference by making s3_options::location be
variant of string prefix and table_id owner. The owner needs more
explanation, here it is.
Today, the system.sstables schema defines partition key to be "string
location" and clustering key to be "UUID generation". The partition key
is table's datadir string, but it's wrong to use it this way. Next
patches will change the partition key to be table's ID (there's table_id
type for it), and before doing it storage options must be prepared to
carry it onboard. This patch does it, but the table_id alternative of
the location is still unused, the rest of the code keeps using the
string location to reference a row in the registry table. Next patches
will eventually make use of the table_id value.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add the gossip state for broadcasting the nodes state_id.
Implemented the Group0 state broadcaster (based on the gossip) that will broadcast the state id of each node and check the minimal state id for the tombstone GC.
When there is a change in the tombstone GC minimal state id, the state broadcaster will update the tombstone GC time for the group0-managed tables.
The main component of the change is the newly added `group0_state_id_handler` that keeps track, broadcasts and receives the last group0 state_ids across all nodes and sets the tombstone GC deletion time accordingly:
* on each group0 change applied, the state_id handler broadcasts the state_id as a gossip state (only if the value has changed)
* the handler checks for the node state ids every refresh period (configurable, 1h by default)
* on every check, the handler figures out the lowest state_id (timeuuid), which is state_id that all of the nodes already have
* the timestamp of this minimum state_id is then used to set the tombstone GC deletion time
* the tombstone GC calculation then uses that deletion time to provide the GC time back to the callers, e.g. when doing the compaction
* (as the time for tombstone GC calculation has 1s granularity, we actually deduct 1s from the determined timestamp, because it can happen that there were some newer mutations received in the same second that were not distributed across the nodes yet)
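The steps in the list above can be sketched in a few lines. This is a simplified model, assuming state ids are version-1 (time-based) UUIDs and that the gossiped values are available as a plain mapping; the function names are hypothetical and do not mirror `group0_state_id_handler`.

```python
import uuid
from typing import Iterable, Mapping, Optional

# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns ticks.
_UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_from_unix_seconds(seconds: int, node: int = 1) -> uuid.UUID:
    """Build a version-1 UUID carrying the given timestamp (helper for examples)."""
    ticks = seconds * 10_000_000 + _UUID_EPOCH_OFFSET
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0, node))


def unix_seconds(state_id: uuid.UUID) -> int:
    """Extract the timestamp of a version-1 UUID as whole Unix seconds."""
    return (state_id.time - _UUID_EPOCH_OFFSET) // 10_000_000


def gc_deletion_time(members: Iterable[str],
                     gossiped: Mapping[str, uuid.UUID]) -> Optional[int]:
    """Return the tombstone GC time, or None if this check round must be skipped."""
    timestamps = []
    for member in members:
        state_id = gossiped.get(member)
        if state_id is None:
            return None  # a node has not reported yet: no GC this round
        timestamps.append(unix_seconds(state_id))
    if not timestamps:
        return None
    # Deduct 1s: newer mutations from the same second may not have spread yet.
    return min(timestamps) - 1
```

Taking the minimum guarantees that every node has already applied all group0 changes up to the chosen time, so tombstones older than it can be collected safely.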
This change introduces a new flag to the static schema descriptor (`is_group0_table`) that is being checked for this newly added mode in the tombstone GC. We also add a check (in non-release builds only) on every group0 modification that the table has this flag set.
The group0 tombstone GC handling is similar to the "repair" tombstone GC mode in a sense (that the tombstone GC time is determined according to a reconciliation action), however it is not explicitly visible to (nor editable by) the user. And also the tombstone GC calculation is much simpler than the "repair" mode calculation - for example, we always use the whole range (as opposed to the "repair" mode that can have specific repair times set for specific ranges).
We use the group0 configuration to determine the set of nodes (both current and previous in case of joint configuration) - we need to make sure that we account for all the group0 nodes (if any node didn't provide the state_id yet, the current check round will be skipped, i.e. no GC will be done until all known nodes provide their state_id timestamp value).
Also note that the group0 state_id handling works on all nodes independently, i.e. each node might have its own (possibly different) state depending on the gossip application state propagation. This is however not a problem, as some nodes might be behind, but they will catch up eventually, and this solution has the benefit of being distributed (as opposed to having a central point to handle the state, like for example the topology coordinator that has been considered in the early stages of the design).
Fixes: scylladb/scylla#15607
New feature, should not be backported.
Closes scylladb/scylladb#20394
* github.com:scylladb/scylladb:
raft: add the check for the group0 tables
raft: fast tombstone GC for group0-managed tables
tombstone_gc: refactor the repair map
raft: flag the group0-managed tables
gossip: broadcast the group0 state id
raft/test: add test for the group0 tombstone GC
treewide: code cleanup and refactoring
To have the filename(type, prefix) one, next patches will provide prefix
on their own, to avoid storage->prefix() call.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This driver uses sstring _location as part of the lookup key in the
sstables registry. Next patches will need to change that and put more
checks on the registry access, so introduce a helper method beforehand.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
During the investigation of scylladb/scylladb#20282, it was discovered that
implementations of speculating read executors have undefined behavior
when called with an incorrect number of read replicas. This PR
introduces two levels of condition checking:
- Condition checking in speculating read executors for the number of replicas.
- Checking the consistency of the Effective Replication Map in
get_endpoints_for_reading(): the map is considered incorrect if the
number of read replica nodes is higher than the replication factor. The
check is applied only when built in non-release mode.
Please note: This PR does not fix the issue found in scylladb/scylladb#20282;
it only adds condition checks to prevent undefined behavior in cases of
inconsistent inputs.
Refs scylladb/scylladb#20625
A new header provides `constexpr` functions to retrieve build
type information: `get_build_type()`, `is_release_build()`,
and `is_debug_build()`. These functions are useful when adding
changes that should be enabled at compile time only for
specific build types.
Adding Vnodes and Tablets tests for alter keyspace operation that decreases replication factor
from 1 to 0 for one of two data centers. Tablet version fails due to issue described in
scylladb/scylladb#20625.
Test for scylladb/scylladb#20625
When generating readers for the set of sstables, the end size of this
vector is known in advance and its storage can be reserved.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#21055
Implement change validation for raft topology_change command. For now
the only check is that the "pending replicas" contains at most one
entry. The check mirrors similar one in `process_one_row` function.
If not passed, this prevents system.tablets from being updated with the
mutation(s) that will not be loaded later.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add validate_change() methods (well, a template and an overload) that
are called by prepare_command() and are supposed to validate the
proposed change before it hits persistent storage
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It will be needed later to get tablet_metadata from.
The dependency is "OK", shared_token_metadata is low-level sharded
service. Client already references db::system_keyspace, which in turn
references replica::database which, finally, references token_metadata
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The schema module (everything in schema/) is supposed to be towards the
leaves in the ScyllaDB inter-module dependency graph. In other words, it
should not depend on many other modules. On the other hand, almost the
entire codebase depends on the schema module itself.
Currently there is a circular dependency between schema and
replica::database, as the latter is a required argument for
schema::describe(). This is bad, not just because of the dependency mess
it introduces, but also because now schema::describe() can only be used
by code which has a reference to the database handy.
This patch breaks this circular dependency, by introducing the
schema_describe_helper interface and providing an implementation for it
in database.hh.
There is another circular dependency: schema <-> replica::table. This is
not addressed by this patch.
Closes scylladb/scylladb#20893
Use clear_gently to avoid the following stalls.
```
~frozen_mutation_fragment at ././frozen_mutation.hh:268
std::destroy_at<frozen_mutation_fragment> at
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_construct.h:88
std::allocator_traits<std::allocator<std::_List_node<frozen_mutation_fragment>
> >::destroy<frozen_mutation_fragment> at
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/alloc_traits.h:537
std::__cxx11::_List_base<frozen_mutation_fragment,
std::allocator<frozen_mutation_fragment> >::_M_clear at
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/list.tcc:77
~_List_base at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_list.h:499
~partition_key_and_mutation_fragments at ././repair/repair.hh:298
~repair_row_on_wire_with_cmd at ././repair/repair.hh:335
operator() at ./repair/row_level.cc:1881
```
Fixes #21016
Performance improvement only. No backport.
Closes scylladb/scylladb#21017
* github.com:scylladb/scylladb:
repair: Fix stall in repair_get_row_diff_with_rpc_stream_process_op_slow_path
repair: Add clear_gently for partition_key_and_mutation_fragments
Keep a copy of the sstable uuid generation in a new
scylla_metadata sstable_identifier attribute.
If the SSTable happens to have a numerical generation
just create a new time-uuid and log a message about that.
Dump this new attribute in scylla sstable dump tool.
And add a unit test to verify that the written (and then
loaded) sstable identifier matches the sstable's generation.
The motivation for this change stems from backup
deduplication. In essence, an sstable may already have been
backed up in a previous snapshot, and we don't want to
back it up again if it's already present on external storage.
Today this is based on rclone that compares file checksums,
but once scylla backs up the sstables using the native
object-storage stack (#19890), we would like to use the sstable
globally-unique identifier for deduplication. Although the
uuid-generation is encoded in the sstable path, the latter
may change, e.g. due to intra-node migration, so keep a copy
of the original unique identifier in scylla-metadata, and that
attribute would survive file-based or intra-node migrations.
Fixes scylladb/scylladb#20459
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#21002
When tablets are migrated with file-based streaming, we can have a situation where a tombstone is garbage collected before the data it shadows lands. For instance, if we have a tablet replica with 3 sstables:
1. sstable containing an expired tombstone
2. sstable with additional data
3. sstable containing data which is shadowed by the expired tombstone in sstable 1
If this tablet is migrated, and the sstables are streamed in the order listed above, the first two sstables can be compacted before the third sstable arrives. In that case, the expired tombstone will be garbage collected, and data in the third sstable will be resurrected after it arrives to the pending replica.
This change fixes this problem by disabling tombstone garbage collection for pending replicas.
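The resurrection scenario can be reproduced with a toy model. The sketch below assumes an sstable is a dictionary mapping a key to a `(timestamp, value)` cell and treats every tombstone as already expired; `compact` and `read` are hypothetical helpers, not ScyllaDB APIs.

```python
TOMBSTONE = object()  # marker value for a deletion


def compact(sstables, tombstone_gc_enabled):
    """Merge sstables keeping the newest cell per key; optionally purge tombstones."""
    merged = {}
    for sst in sstables:
        for key, (ts, value) in sst.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    if tombstone_gc_enabled:
        # All tombstones are considered expired in this toy model.
        merged = {k: cell for k, cell in merged.items() if cell[1] is not TOMBSTONE}
    return merged


def read(sstables, key):
    """Return the live value for key, or None if it is absent or deleted."""
    newest = None
    for sst in sstables:
        cell = sst.get(key)
        if cell is not None and (newest is None or cell[0] > newest[0]):
            newest = cell
    if newest is None or newest[1] is TOMBSTONE:
        return None
    return newest[1]


# The three sstables from the description, in streaming order.
sst1 = {"k": (10, TOMBSTONE)}   # expired tombstone
sst2 = {"other": (5, "x")}      # additional data
sst3 = {"k": (3, "old")}        # data shadowed by the tombstone in sst1
```

Compacting `sst1` and `sst2` with GC enabled before `sst3` arrives drops the tombstone, so a later read of `"k"` returns the old value; with GC disabled the tombstone survives and keeps shadowing it.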
This fixes a problem in Enterprise, but the change is in OSS in order to have as few differences as possible between OSS and Enterprise and to have a common infrastructure for disabling tombstone GC on pending replicas.
This change has to be backported to all active versions: 6.0, 6.1 and 6.2, as well as Enterprise 2024.2
Closes scylladb/scylladb#20788
* github.com:scylladb/scylladb:
test: test tombstone GC disabled on pending replica
tablet_storage_group_manager: update tombstone_gc_enabled in compaction group
database::table: add tombstone_gc_enabled(locator::tablet_id)
This PR builds upon the PR for checksum validation (#20207) to further enhance scrub's corruption detection capabilities by validating digests as well. The digest (full checksum) is the checksum over the entire data, as opposed to per-chunk checksums which apply to individual chunks. Until now, digests were not examined on any code paths. This PR integrates digest checking into the compressed/checksummed data sources as an optional feature and enables it only through the validation path of the sstable layer (`sstable::validate()`). The validation path is used by the following tools:
* scrub in validate mode
* `sstable validate`
All other reads, including normal user reads, are unaffected by this change.
The PR consists of:
* Extensions to the compressed and checksummed data sources to support digest checking. The data sources receive the expected digest as a parameter and calculate the actual digest incrementally across multiple get() calls. The check happens on the get() call that reaches EOF and results in an exception if the digest is invalid. A digest check requires reading the whole file range. Therefore, a partial read or skip() is treated as an internal error.
* A new shareable digest component loaded on demand by the validation code. No lifecycle management.
* Grouping of old scrub/validate tests for compressed and uncompressed SSTables to reduce code duplication.
* scrub/validate tests for SSTables with valid checksums but invalid digests, and SSTables with no digests at all.
* scrub/validate tests with 3.x Cassandra SSTables to ensure compatibility.
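The incremental digest check from the first item of the list can be sketched as a generator that updates a running checksum per chunk and verifies it once the input is exhausted. CRC32 via `zlib.crc32` is used purely for illustration; the checksum algorithm, the names and the exception type are assumptions of this sketch, not the data source implementation.

```python
import zlib
from typing import Iterable, Iterator


class DigestMismatch(Exception):
    """Raised at EOF when the computed digest differs from the expected one."""


def checked_chunks(chunks: Iterable[bytes], expected_digest: int) -> Iterator[bytes]:
    """Yield chunks unchanged while accumulating a digest over all of them."""
    running = 0
    for chunk in chunks:
        running = zlib.crc32(chunk, running)  # incremental update
        yield chunk
    # Reached EOF: only now is the full-file digest known.
    if running != expected_digest:
        raise DigestMismatch(f"expected {expected_digest:#x}, got {running:#x}")


def validate(chunks: Iterable[bytes], expected_digest: int) -> bool:
    """Consume the whole stream and report whether the digest matched."""
    try:
        for _ in checked_chunks(chunks, expected_digest):
            pass
    except DigestMismatch:
        return False
    return True
```

Because the digest covers the entire stream, the comparison is only meaningful after every chunk has been consumed, which matches the requirement that a partial read or skip cannot be validated.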
Refs #19058.
New feature, no backport is needed.
Closes scylladb/scylladb#20720
* github.com:scylladb/scylladb:
test: Test scrub/validate with SSTables from Cassandra
compaction: Make quarantine optional for perform_sstable_scrub()
test: Make random schema optional in scrub_test_framework
test: Add tests for invalid digests
test: Merge scrub/validate tests for compressed and uncompressed cases
sstables: Verify digests on validation path
sstables: Check if digest component exists
sstables: Add digest in the SSTable components
sstables: Add digest check in compressed data source
sstables: Add digest check in checksummed data source
The test/cql-pytest/run-cassandra prefers to use Java 11 if installed on
the system because this is the only version of Java that all modern
versions of Cassandra run on (Cassandra 3 and 4 can run on Java 8 and 11,
Cassandra 5 can run on Java 11 and 17).
However, in our search order we tried the "java" in the user's path
first, before trying Java 11. This means that if the user for some
reason had the ancient Java 8 (which is now a decade old) as the
default "java", we used that instead of Java 11, and couldn't run
Cassandra 5.
While at it, update the comments to reflect the new reality that
Cassandra 5 needs Java 17 or 11 - *not* 11 or 8 as the older Cassandra.
We should eventually change the code logic as well (searching for
versions that depend on the Cassandra version - not always Java 8 and
11), but let's do it later. This patch already fixes a real bug for
developers that did install Java 11 but their default "java" pointed to
Java 8.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#21001
It was not possible to link to configuration parameters groups in docs/reference/configuration-parameters.rst if they contained a space.
Closes scylladb/scylladb#21018
The process_one_row() evaluates pending_replica by subtracting replicas
from new_replicas. There's a convenience helper for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#21019
To avoid depending on two similar libraries (boost ranges and std \<ranges>), replace
uses of the former with the latter. This series tackles the utils/ directory.
Code cleanup, no backport.
Closes scylladb/scylladb#20997
* github.com:scylladb/scylladb:
utils: logalloc: replace boost with std
utils: lsa: chunked_managed_vector: replace boost with std
utils: config_file: replace boost with std
utils: loading_cache: replace boost with std
utils: fragment_range: replace boost with std
utils: error_injector: replace boost with std
utils: crc: replace boost for_each with built-in range for
utils: class_registrator: replace boost with std
utils: chunked_vector: replace boost with std
utils: observable: replace boost with std
There's a long-pending issue in the distributed loader. When it populates sstables on boot it loops over table.config.all_datadirs, but ignores the loop cursor (the datadir itself), instead loading sstables from table.config.dir, which is the 0th element of all_datadirs. There's a test for that, but it's also broken. Effectively, collection happens from table.config.dir several times. For local sstables that's just wasted work and potentially lost sstables (but nobody seems to configure more than 1 datadir anyway). For S3 sstables it's also wasted work and incorrectness.
The fix is for both -- populator and test. The former is to use all_datadirs to construct sstable_directory. To make it happen, creation of sstable_directory now depends on the storage options, and the loop is moved into the branch that creates sstable_directory for the local storage type. The test fix is to make sure that some sstables are placed in a non-default datadir before running the population code.
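The shape of the bug can be shown with a deliberately small model. The functions below are hypothetical and take a `list_sstables` callback instead of touching the filesystem; they only illustrate the ignored loop cursor.

```python
def collect_buggy(all_datadirs, list_sstables):
    """Loops over every datadir but always reads the first one."""
    found = []
    for _datadir in all_datadirs:
        found.extend(list_sstables(all_datadirs[0]))  # cursor is ignored
    return found


def collect_fixed(all_datadirs, list_sstables):
    """Reads each datadir exactly once."""
    found = []
    for datadir in all_datadirs:
        found.extend(list_sstables(datadir))
    return found


# Hypothetical layout: one sstable in each of two data directories.
layout = {"/data0": ["sst-a"], "/data1": ["sst-b"]}
```

With this layout the buggy variant returns `sst-a` twice and never sees `sst-b`, which corresponds to both the wasted work and the potentially lost sstables mentioned above.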
Closes scylladb/scylladb#20819
* github.com:scylladb/scylladb:
test: Fix test_multiple_data_dirs
distributed_loader: Indentation fix after previous patch
distributed_loader: Use correct datadir to collect local sstable
distributed_loader: Move all-datadirs loop to local storage collecting
distributed_loader: Collect table subdirs based on its storage options
distributed_loader: Indentation fix after previous patch
distributed_loader: Squash loop of collect_subdir into one method
distributed_loader: Convert map of directories into a vector
distributed_loader: Make start_subdir() method work with directory
distributed_loader: Drop local reference variable
distributed_loader: Split start_subdir()
distributed_loader: Remove allow-offstrategy argument
distributed_loader: Make populate() method work with directory
distributed_loader: Remove check for sstable_directory presense
distributed_loader: Out-line table_populator() methods
distributed_loader: Print storage options, not datadir
distributed_loader: Print prepared message
sstable_directory: Add sstable_state argument ot one of constructors
sstable_directory: Add state() method
can_admit_read() returns reason::memory_resources when the permit is queued due
to lack of count resources, and it returns reason::count_resources when the
permit is queued due to lack of memory resources. It's supposed to be the other
way around.
This bug is causing the two counts to be swapped in the stat dumps printed to
the logs when semaphores time out.
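The intended mapping can be sketched as follows. This is a heavily simplified model with a hypothetical signature; the real `can_admit_read()` has more inputs and outcomes, and the order of the two checks here is an assumption.

```python
from enum import Enum


class Reason(Enum):
    ALL_OK = "all_ok"
    MEMORY_RESOURCES = "memory_resources"
    COUNT_RESOURCES = "count_resources"


def can_admit_read(need_count, need_memory, avail_count, avail_memory):
    """Report which resource, if any, forces the permit to be queued."""
    if avail_count < need_count:
        return Reason.COUNT_RESOURCES    # queued for lack of count
    if avail_memory < need_memory:
        return Reason.MEMORY_RESOURCES   # queued for lack of memory
    return Reason.ALL_OK
```

The bug amounted to the two non-OK return values being exchanged, so the per-reason counters in the stat dump attributed waits to the wrong resource.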
Closes scylladb/scylladb#20714
During shutdown, the compaction_manager starts stopping ongoing
compaction tasks through `really_do_stop()` method as soon as it
receives a signal from the abort source. Later, when the database object
shuts down, it calls `compaction_manager::drain` to ensure that all
compaction tasks have stopped. However, `compaction_manager::drain` is
currently implemented in such a way that, during shutdown, it
effectively becomes a no-op because the compaction_manager has already
initiated the stopping of tasks. As a result the caller assumes that all
the compaction tasks have stopped and proceeds to close all the tables.
This can lead to race conditions where table closures overlap with
compaction tasks that are still running, resulting in exceptions like :
```
exception during mutation write to 127.0.0.1:
utils::internal::nested_exception<std::runtime_error> (Could not write
mutation system:compaction_history
(pk{0010b70d31705e0411efb2edf6467f094c8b}) to commitlog):
seastar::gate_closed_exception (gate closed)
```
This commit fixes the issue by updating `compaction_manager::drain` to
invoke `stop_ongoing_compactions` even during shutdown to ensure that it
waits for the ongoing compaction tasks to complete. The
`stop_ongoing_compactions` method will also send a stop request to these
tasks before waiting, but the request will be ignored by the tasks as
they would have already received one earlier from `really_do_stop()`.
Fixes #20197
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#20715
Fixes #20517
Adds `aws_error`, which can hold errors found in the S3 response body. Adds a check for a possible error to the multipart upload completion and issues a retry if the error is retryable.
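A rough sketch of the idea: the completion response body is inspected for an `<Error>` document even when the request itself appeared to succeed, and the error code decides whether to retry. The function names and the set of retryable codes are assumptions of this example and do not reflect the actual `aws_error` implementation.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed set of retryable error codes, for illustration only.
RETRYABLE_CODES = {"InternalError", "SlowDown", "ServiceUnavailable", "RequestTimeout"}


def error_code_from_body(body: bytes) -> Optional[str]:
    """Return the error code if the body is an <Error> document, else None."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # empty or non-XML body: no error information
    if root.tag != "Error":
        return None  # e.g. a regular completion result
    return root.findtext("Code")


def should_retry(body: bytes) -> bool:
    """True if the body carries an error that is worth retrying."""
    return error_code_from_body(body) in RETRYABLE_CODES
```

A caller would run `should_retry()` on the body of the completion response and repeat the request when it returns `True`, treating any other error code as final.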
Closes scylladb/scylladb#20518
* github.com:scylladb/scylladb:
test: add complete_multipart_upload completion tests
code: s3 client error handling
code: add response parsing and error handling to the complete_multipart_upload
code: Introduce AWS errors parsing
ALTERing tablets-enabled KEYSPACES (KS) didn't account for materialized
views (MV), and only produced tablets mutations changing tables.
With this patch we're producing tablets mutations for both tables and
MVs, hence when e.g. we change the replication factor (RF) of a KS, both the
tables' RFs and MVs' RFs are updated along with tablets replicas.
The `test_tablet_rf_change` testcase has been extended to also verify
that MVs' tablets replicas are updated when RF changes.
Fixes: #20240
Closes scylladb/scylladb#21007
As part of the effort to standardize on a single range library, convert the unconst helper
and its only user to \<ranges>.
The only user, mutation_partitions, happens to use intrusive_btree::iterator as the payload. That
iterator didn't fully conform to iterator requirements, so it's fixed in a preliminary patch.
Code cleanup; no backport.
Closes scylladb/scylladb#20986
* github.com:scylladb/scylladb:
utils/unconst, mutation_partition: switch to ranges
utils: intrusive_btree: improve conformity with iterator requirements
Use clear_gently to avoid the following stalls.
```
~frozen_mutation_fragment at ././frozen_mutation.hh:268
std::destroy_at<frozen_mutation_fragment> at
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_construct.h:88
std::allocator_traits<std::allocator<std::_List_node<frozen_mutation_fragment>
> >::destroy<frozen_mutation_fragment> at
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/alloc_traits.h:537
std::__cxx11::_List_base<frozen_mutation_fragment,
std::allocator<frozen_mutation_fragment> >::_M_clear at
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/list.tcc:77
~_List_base at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_list.h:499
~partition_key_and_mutation_fragments at ././repair/repair.hh:298
~repair_row_on_wire_with_cmd at ././repair/repair.hh:335
operator() at ./repair/row_level.cc:1881
```
Fixes #21016