When the view builder starts to build a new view, each shard registers
itself by writing the shard id and current token to the
scylla_views_builds_in_progress table.
Previously, each shard did this independently. We now register all shards
"atomically" - when a shard registers itself, it also registers all other
shards with an empty status if they aren't registered yet. This ensures
that the table never contains a partial state where only some of the
shards are registered; we always have a status for every shard.
The reason we want to register all shards atomically is that if only some
of the shards were registered and we then restart and load the status from
the table, things break for multiple reasons.
One example: to know how many shards we had previously, we take the
maximum shard id we see in the table. If it's different from the current
shard count, we execute the reshard code. But of course, if the last shard
is missing from the table because it never registered itself, this
calculation is wrong, and we can't know the previous number of shards.
This is a problem: suppose we have two shards, and shard 0 finished
building the view but shard 1 didn't start. When we come up, we will think
that we previously had only a single shard and that it completed building
everything, when in fact we built only approximately half the view. The
problem is that we don't have enough information in the tables to know
that.
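A minimal sketch of the shard-count inference described above (illustrative
code, not the actual view builder):

```
#include <algorithm>
#include <vector>

// Infer the previous shard count from the rows found in the progress table.
// If the highest shard never registered, the result is too small.
unsigned inferred_previous_shard_count(const std::vector<unsigned>& shard_ids_in_table) {
    if (shard_ids_in_table.empty()) {
        return 0;
    }
    return *std::max_element(shard_ids_in_table.begin(), shard_ids_in_table.end()) + 1;
}

// The reshard path is entered whenever the inferred count differs from the
// current one - including when the difference is only due to a missing row.
bool needs_reshard(const std::vector<unsigned>& shard_ids_in_table, unsigned current_shard_count) {
    return inferred_previous_shard_count(shard_ids_in_table) != current_shard_count;
}
```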
There are additional problems related to resharding. In the reshard
function - whether it runs because the node actually resharded or because
we calculated the wrong number of previous shards - if the status of some
shard is missing, the calculation of the new ranges will be wrong. When a
shard didn't make progress we should start building the view from scratch.
However, this doesn't happen if we don't have a status for the shard,
because the code looks only at shards that have a status. In effect, such
a shard is considered complete even though it never started. This could
cause the view building to get stuck or to complete without building all
token ranges.
By registering all shards atomically, we should avoid all the above
problems because we will always have statuses for all shards.
Fixes https://github.com/scylladb/scylladb/issues/22989
backport not needed - the issue is probably not common and there's a workaround
Closes scylladb/scylladb#25790
* github.com:scylladb/scylladb:
test: mv: add a test for view build interrupt during registration
view_builder: register view on all shards atomically
In 789a4a1ce7, we adjusted the test file
to work with the configuration option `rf_rack_valid_keyspaces`. Part of
the commit was making the two tables used in the test replicate in
separate data centers.
Unfortunately, that destroyed the point of the test because the tables
no longer competed for resources. We fix that by enforcing the same
replication factor for both tables.
We still accept different values of replication factor when provided
manually by the user (via the `--rf1` and `--rf2` command-line options). Scylla
won't allow creating RF-rack-invalid keyspaces, but there's no reason
to take away the flexibility the user of the test already has.
Fixes scylladb/scylladb#26026
Closes scylladb/scylladb#26115
In a multi-DC setup, the tablet load balancer might generate multiple migrations of the same tablet_id, but only one is actually committed to the `system.tablets` table.
This PR moves the aborting of view building tasks from the very start of the migration (`<no tablet transition> -> allow_write_both_read_old`) to the next step (`allow_write_both_read_old -> write_both_read_old`). This way, we abort only tasks whose tablet migration was actually started.
The PR also includes a reproducer test.
Fixes scylladb/scylladb#25912
View building coordinator hasn't been released yet, so no backport is needed.
Closes scylladb/scylladb#26144
* github.com:scylladb/scylladb:
test/test_view_building_coordinator: add reproducer
topology_coordinator: abort view building a bit later in case of tablet migration
The configuration setting vector_store_uri is renamed to
vector_store_primary_uri according to the final design.
In the future, the vector_store_secondary_uri setting will
be introduced.
This setting now also accepts a comma-separated list of URIs to prepare
for future support for redundancy and load balancing. Currently, only the
first URI in the list is used.
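A small sketch of the comma-separated handling, under the behaviour stated
above that only the first URI is used for now (`parse_uris`/`first_uri` are
illustrative helpers, not the actual config API):

```
#include <string>
#include <string_view>
#include <vector>

std::vector<std::string> parse_uris(std::string_view value) {
    std::vector<std::string> uris;
    size_t start = 0;
    while (start <= value.size()) {
        size_t comma = value.find(',', start);
        std::string_view item = value.substr(start,
            comma == std::string_view::npos ? std::string_view::npos : comma - start);
        if (!item.empty()) {
            uris.emplace_back(item);
        }
        if (comma == std::string_view::npos) {
            break;
        }
        start = comma + 1;
    }
    return uris;
}

// Until redundancy / load balancing lands, only the first entry is honoured.
std::string first_uri(std::string_view vector_store_primary_uri) {
    auto uris = parse_uris(vector_store_primary_uri);
    return uris.empty() ? std::string{} : uris.front();
}
```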
This change must be included before the next release.
Otherwise, users will be affected by a breaking change.
References: VECTOR-187
Closes scylladb/scylladb#26033
Add a new test that reproduces issue #22989. The test starts view
building and interrupts it by restarting the node while some shards
registered their status and some didn't.
This patch corrects the column name formatting whenever
an "Undefined column name" exception is thrown.
Until now we used the `name()` function, which returns a bytes object.
This resulted in a message with a garbled ASCII rendering of the bytes
instead of a proper column name. We switch to the `text()` function,
which returns an sstring instead, making the message readable.
Tests are adjusted to confirm this behavior.
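An illustrative sketch of the change; the column type, exception type and
exact signatures here are assumptions - only the `name()`/`text()` switch
comes from the description above:

```
#include <cstdint>
#include <format>
#include <stdexcept>
#include <string>
#include <vector>

struct column_def {
    std::string _text;                 // human-readable column name
    std::vector<uint8_t> _bytes;       // serialized name; garbled when printed directly
    std::string name() const { return std::string(_bytes.begin(), _bytes.end()); }
    const std::string& text() const { return _text; }
};

[[noreturn]] void throw_undefined_column(const column_def& col) {
    // Before: the message was built from name() (raw bytes), producing garbled output.
    // After: use text() so the message shows the actual column name.
    throw std::runtime_error(std::format("Undefined column name {}", col.text()));
}
```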
Fixes: VECTOR-228
Closes scylladb/scylladb#26120
Adds a test which reproduces the issue described
in scylladb/scylladb#25912.
The test creates a situation where a single tablet is replicated across
multiple DCs / racks, and all those tablet replicas are eligible for
migration. The tablet load balancer is unpaused at that moment, which
currently causes it to attempt to generate multiple migrations for
different tablet replicas of the same tablet. Before the fix for #25912,
this used to confuse the view build coordinator, which reacted to each
migration attempt: it paused view building work for every tablet replica
that was attempted to migrate, but only unpaused it for the tablet
replica that was actually migrated. After the fix, the view build
coordinator only reacts to the migration that has "won", so the test
passes.
Our sstable format selection logic is weird, and hard to follow.
If I'm not misunderstanding, the pieces are:
1. There's the `sstable_format` config entry, which currently
doesn't do anything, but in the past it used to disable
cluster features for versions newer than the specified one.
2. There are deprecated and unused config entries for individual
versions (`enable_sstables_mc_format`, `enable_sstables_md_format`,
etc).
3. There is a cluster feature for each version:
ME_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, etc.
(Currently all sstable version features have been grandfathered,
and aren't checked by the code anymore).
4. There's an entry in `system.scylla_local` which contains the
latest enabled sstable version. (Why? Isn't this directly derived
from cluster features anyway?)
5. There's `sstable_manager::_format` which contains the
sstable version to be used for new writes.
This field is updated by `sstables_format_selector`
based on cluster features and the `system.scylla_local` entry.
I don't see why those pieces are needed. Version selection has the
following constraints:
1. New sstables must be written with a format that supports existing
data. For example, range tombstones with an infinite bound are only
supported by sstables since version "mc". So if a range tombstone
with an infinite bound exists somewhere in the dataset,
the format chosen for new sstables has to be at least as new as "mc".
2. A new format might only be used after a corresponding cluster feature
is enabled. (Otherwise new sstables might become unreadable if they
are sent to another node, or if a node is downgraded).
3. The user should have a way to inhibit format upgrades if they wish.
So far, constraint (1) has been fulfilled by never using formats older
than the newest format ever enabled on the node. (With an exception
for resharding and reshaping system tables).
Constraint (2) has been fulfilled by calling `sstable_manager::set_format`
only after the corresponding cluster feature is enabled.
Constraint (3) has been fulfilled by the ability to inhibit cluster
features by setting `sstable_format` to some fixed value.
The main thing I don't like about this whole setup is that it doesn't
let me downgrade the preferred sstable format. After a format is
enabled, there is no way to go back to writing the old format again.
That is no good -- after I make some performance-sensitive changes
in a new format, it might turn out to be a pessimization for the
particular workload, and I want to be able to go back.
This patch aims to give a way to downgrade formats without violating
the constraints. What it does is:
1. The entry in `system.scylla_local` becomes obsolete.
After the patch we no longer update or read it.
As far as I understand, the purpose of this entry is to prevent
unwanted format downgrades (which is something cluster features
are designed for) and it's updated if and only if relevant
cluster features are updated. So there's no reason to have it,
we can just directly use cluster features.
2. `sstable_format_selector` gets deleted.
Without the `system.scylla_local` around, it's just a glorified
feature listener.
3. The format selection logic is moved into `sstable_manager`.
It already sees the `db::config` and the `gms::feature_service`.
For the foreseeable future, the knowledge of enabled cluster features
and current config should be enough information to pick the right formats.
4. The `sstable_format` entry in `db::config` is no longer intended to
inhibit cluster features. Instead, it is intended to select the
format for new sstables, and it becomes live-updatable.
5. Instead of writing new sstables with "highest supported" format,
(which used to be set by `sstables_format_selector`) we write
them with the "preferred" format, which is determined by
`sstable_manager` based on the combination of enabled features
and the current value of `sstable_format`.
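A hedged sketch of the selection rule described in point (5) - take the
configured format, capped by what the enabled cluster features allow; the
enum and feature names here are illustrative, not the actual ScyllaDB types:

```
#include <algorithm>

enum class sstable_version { mc, md, me };   // ordered oldest to newest

struct enabled_features {
    bool md_format;
    bool me_format;
};

// Newest version every node in the cluster can read, per constraint (2).
sstable_version feature_ceiling(const enabled_features& f) {
    if (f.me_format) { return sstable_version::me; }
    if (f.md_format) { return sstable_version::md; }
    return sstable_version::mc;
}

// The live-updatable config entry may request an older format (e.g. to back
// out of a regression), but can never exceed the feature ceiling.
sstable_version preferred_format(sstable_version configured, const enabled_features& f) {
    return std::min(configured, feature_ceiling(f));
}
```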
Closes scylladb/scylladb#26092
[avi: Pavel found the reason for the scylla_local entry -
it predates stable storage for cluster features]
`sharded<>` is the name recommended in seastar, and `distributed<>` was left
as a compatibility alias. Latest seastar explicitly marks it as deprecated,
so once the submodule is updated, compilation logs will explode.
Most of the patch is generated with
for f in $(git grep -l '\<distributed<[A-Za-z0-9:_]*>') ; do sed -e 's/\<distributed<\([A-Za-z0-9:_]*\)>/sharded<\1>/g' -i $f; done
for f in $(git grep -l distributed.hh); do sed -e 's/distributed.hh/sharded.hh/' -i $f ; done
and a small manual change in test/perf/perf.hh
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#26136
test_streaming_deadlock_removenode starts 10240 inserts at once,
overloading a node. Because of that, the test fails with a timeout.
Limit the insert concurrency.
Fixes: #25945.
Closes scylladb/scylladb#26102
When a cluster repair is run for an entire keyspace, nodetool makes a
repair API request for each table.
If the keyspace contains colocated tables, then the API request for the
colocated tables will fail, because currently Scylla doesn't allow making
repair requests for specific colocated tables, only for base tables.
If the request is to repair an entire keyspace, we can ignore this,
because we will make a repair request for all base tables, and this in
turn also repairs all the colocated tables in the keyspace.
However, if specific tables are requested and some of them are colocated,
then we should propagate the error to let the user know the request is
invalid.
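An illustrative sketch of that decision on the nodetool side (the function
and its signature are hypothetical):

```
#include <stdexcept>
#include <string>

// Whole-keyspace repair: the base tables' repair already covers colocated
// tables, so the per-table failure can be swallowed.
// Explicit table list: the request is invalid, surface the error.
void handle_colocated_table_repair_error(bool whole_keyspace_repair, const std::string& table) {
    if (whole_keyspace_repair) {
        return;   // the base-table repairs will also repair this table's data
    }
    throw std::invalid_argument("cannot repair colocated table " + table +
                                "; repair its base table instead");
}
```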
Refs https://github.com/scylladb/scylladb/issues/24816
no backport - no colocated tablets in previous releases
Closes scylladb/scylladb#26051
* github.com:scylladb/scylladb:
nodetool: ignore repair request error of colocated tables
storage_service: improve error message on repair of colocated tables
Previously, LSI keys were stored as separate, top-level columns in the base table. This patch changes this behavior for newly created tables, so that the key columns are stored inside the `:attrs` map. Then, we use top-level computed columns instead of regular ones.
This makes LSI storage consistent with GSIs and allows the use of a collection tombstone on `:attrs` to delete all attributes in a row except for keys in new tables.
Refs https://github.com/scylladb/scylladb/pull/24991
Refs https://github.com/scylladb/scylladb/issues/6930
Closes scylladb/scylladb#25796
* github.com:scylladb/scylladb:
alternator: Store LSI keys in :attrs for newly created tables
alternator/test: Add LSI tests based mostly on the existing GSI tests
Alternator, when creating a GSI, artificially adds columns that the user
did not ask for. This patch prevents those columns from showing up in
DescribeTable's output.
Fixes #5320
Closes scylladb/scylladb#25978
In the current scenario, the shard receiving the removenode REST API request takes a conditional lock depending on whether raft is enabled or not. Since non-zero shards return false for `raft_topology_change_enabled()`, requests routed to non-zero shards are prone to taking this lock, which is unnecessary and hampers the ability to perform concurrent operations - something that is possible for raft-enabled clusters.
This PR modifies the conditional lock logic and routes the removenode execution directly to shard 0, so `raft_topology_change_enabled()` is now checked on shard 0 and execution proceeds accordingly.
Earlier, `storage_service::find_raft_nodes_from_hoeps` threw an error upon observing any non-topology member in ignore_nodes. Since we perform concurrent removenode operations, the timing can lead to one node being fully removed before the other node's remove operation begins processing, which can lead to a runtime error in `storage_service::find_raft_nodes_from_hoeps`. This error was added to prevent users from putting random, non-existent nodes in the ignore_nodes list. Hence, the function is changed to account for already-removed nodes and ignore them instead of throwing an error.
A test is also added to confirm the new behaviour, where concurrent removenode operations are now performed seamlessly.
This PR doesn't fix a critical bug. No need to backport it.
Fixes: scylladb/scylladb#24737
Closes scylladb/scylladb#25713
* https://github.com/scylladb/scylladb:
raft_topology: Modify the conditional logic in remove node operation to enhance concurrency for raft enabled clusters.
storage_service: remove assumptions and checks for ignore_nodes to be normal.
When the monitor is started, the first disk utilization value is
obtained from the actual host filesystem and not from the fake
space source function.
Thus, register a fake space source function before the monitor
is started.
Fixes: https://github.com/scylladb/scylladb/issues/26036
Backport is not required. The test has been added recently.
Closes scylladb/scylladb#26054
Initial implementation to support CDC in tablets-enabled keyspaces.
The design is described in https://docs.google.com/document/d/1qO5f2q5QoN5z1-rYOQFu6tqVLD3Ha6pphXKEqbtSNiU/edit?usp=sharing
It is followed closely for the most part, except for "Deciding when to change streams" - instead, streams are changed synchronously with tablet split / merge.
Instead of the stream switching algorithm with the double writes, we use a scheme similar to the previous method for vnodes - we add the new streams with a timestamp that is sufficiently far in the future.
In this PR we:
* add new group0-based internal system tables for tablet stream metadata, and load it into the in-memory CDC metadata
* add virtual tables for CDC consumers
* make the write coordinator choose a stream by looking up the appropriate stream in the CDC metadata
* enable creating tables with CDC enabled in tablets-enabled keyspaces; tablets are allocated for the CDC table, and a stream is created for each tablet
* on tablet resize (split / merge), have the topology coordinator create a new stream set with a new stream for each new tablet
* co-locate the CDC tablets with the base tablets
Fixes https://github.com/scylladb/scylladb/issues/22576
backport not needed - new feature
update dtests: https://github.com/scylladb/scylla-dtest/pull/5897
update java cdc library: https://github.com/scylladb/scylla-cdc-java/pull/102
update rust cdc library: https://github.com/scylladb/scylla-cdc-rust/pull/136
Closes scylladb/scylladb#23795
* github.com:scylladb/scylladb:
docs/dev: update CDC dev docs for tablets
doc: update CDC docs for tablets
test: cluster_events: enable add_cdc and drop_cdc
test/cql: enable cql cdc tests to run with tablets
test: test_cdc_with_alter: adjust for cdc with tablets
test/cqlpy: adjust cdc tests for tablets
test/cluster/test_cdc_with_tablets: introduce cdc with tablets tests
cdc: enable cdc with tablets
topology coordinator: change streams on tablet split/merge
cdc: virtual tables for cdc with tablets
cdc: generate_stream_diff helper function
cdc: choose stream in tablets enabled keyspaces
cdc: rename get_stream to get_vnode_stream
cdc: load tablet streams metadata from tables
cdc: helper functions for reading metadata from tables
cdc: colocate cdc table with base
cdc: remove streams when dropping CDC table
cdc: create streams when allocating tablets
migration_listener: add on_before_allocate_tablet_map notification
cdc: notify when creating or dropping cdc table
cdc: move cdc table creation to pre_create
cdc: add internal tables for cdc with tablets
cdc: add cdc_with_tablets feature flag
cdc: add is_log_schema helper
This is yet another part in the BTI index project.
Overarching issue: https://github.com/scylladb/scylladb/issues/19191
Previous part: https://github.com/scylladb/scylladb/pull/25626
Next parts: introducing the new components, Partitions.db and Rows.db
This is the preparatory, uncontroversial part of https://github.com/scylladb/scylladb/pull/26039, which has been split out to a separate PR to make the main part (which, after a revision, will be posted later) smaller.
This series contains several small fixes and changes to BTI-related code added earlier, which either have to be done (i.e. propagating `reader_permit` to IO calls in index reads) or just deserved to be done. There's no single theme for the changes in this PR, refer to the individual commits for details.
The changes are for the sake of new and unreleased code. No backporting should be done.
Closes scylladb/scylladb#26075
* github.com:scylladb/scylladb:
sstables/mx/reader: remove mx::make_reader_with_index_reader
test/boost/bti_index_test: fix indentation
sstables/trie/bti_index_reader: in last_block_offset(), return offset from the beginning of partition, not file
sstables/trie: support reader_permit and trace_state properly
sstables/trie/bti_node_reader: avoid calling into `cached_file` if the target position is already cached
sstables/trie/bti_index_reader: get rid of the seastar::file wrapper in read_row_index_header
sstables/trie/bti_index_reader: support BYPASS CACHE
test/boost/bti_index_test: use read_bti_partitions_db_footer where appropriate
sstables/trie: change the signature of bti_partition_index_writer::finish
sstables/bti_index: improve signatures of special member functions in index writers
streaming/stream_transfer_task: coroutinize `estimate_partitions()`
types/comparable_bytes: add a missing implementation for date_type_impl
sstables: remove an outdated FIXME
storage_service: delete `get_splits()`
sstables/trie: fix some comment typos in bti_index_reader.cc
sstables/mx/writer: rename _pi_write_m.tomb to partition_tombstone
`sl:driver` is expected to be used for new and control connections,
but other connections that run user load should not use it after
the user is authenticated.
Refs: scylladb/scylladb#24411
Before this change, new connections were handled in a default
scheduling group (`main`), because before the user is authenticated
we do not know which service level should be used. With the new
`sl:driver` service level, creation of new connections can be moved to
`sl:driver`.
We switch the service level as early as possible, in `do_accepts`.
There is a possibility that `sl:driver` will not exist yet, for
instance, in specific upgrade cases, or if it was removed. Therefore,
we also switch to `sl:driver` after a connection is accepted.
Refs: scylladb/scylladb#24411
Driver service level is a special service level that is created
automatically by the system. Therefore, it requires special handling
in DESC SCHEMA WITH INTERNALS, and these tests verify the special
behavior.
Refs: scylladb/scylladb#24411
This commit:
- Increases the number of allowed scheduling groups to allow the
creation of `sl:driver`.
- Adds the `DRIVER_SERVICE_LEVEL` feature, which prevents creating
`sl:driver` until all nodes have increased the number of
scheduling groups.
- Starts using `get_create_driver_service_level_mutations`
to unconditionally create `sl:driver` on
`raft_initialize_discovery_leader`. The purpose of this code
path is to ensure the existence of `sl:driver` in new systems and in tests.
- Starts using `migrate_to_driver_service_level` to create `sl:driver`
if it is not already present. The creation of `sl:driver` is
managed by `topology_coordinator`, similar to other system keyspace
updates, such as the `view_builder` migration. The purpose of this
code path is handling upgrades.
- Modifies related tests to pass after `sl:driver` is added.
Later in this patch series, `sl:driver` will be used by
`transport/server` to handle selected traffic, such as the driver's
schema and topology fetches.
Refs: scylladb/scylladb#24411
Previously, tests used the hardcoded value 7 for the maximum number of
user service levels. This commit introduces a named variable that can
be shared across tests to avoid cases where this magic number goes
out of sync.
According to the changes in the Vector Store API (VECTOR-148), the `embedding`
term should be changed to `vector`. As `vector` is already used by the STL
class, the internal type and variable names are changed to `vs_vector` (aka
vector store vector). This patch also changes the HTTP ANN JSON request
payload according to the Vector Store API changes.
Fixes: VECTOR-229
Closes scylladb/scylladb#26050
An E2E test runs a multi-column CAS workload (LOCAL_QUORUM/LOCAL_SERIAL) while
tablets are repeatedly migrated between nodes. Uncertainty timeouts are
resolved via LOCAL_SERIAL reads; guards use max(row, lower_bound). The final
assertion: s{i} per (pk,i) equals the count of confirmed CAS operations by
worker i (no lost or phantom updates) despite tablet moves.
Closes scylladb/scylladb#25402
As requested in #22099, the files were moved, and other includes and the build system were updated accordingly.
Moved files:
- cache_temperature.hh
- cell_locking.hh
Fixes: #22099
Closes scylladb/scylladb#25079
We were recently surprised (in pull request #25797) to "discover" that
Scylla does not allow granting SELECT permissions on individual
materialized views. Instead, all materialized views of a base table
are readable if the base table is readable.
In this patch we document this fact, and also add a test to verify
that it is indeed true. As usual for cqlpy tests, this test can also
be run on Cassandra - and it passes showing that Cassandra also
implemented it the same way (which isn't surprising, given that we
probably copied our initial implementation from them).
The test demonstrates that neither Scylla nor Cassandra prints an error
when attempting to GRANT permissions on a specific materialized view -
but this GRANT is simply ignored. This is not ideal, but it is the
existing behavior in both and it's not important now to change it.
Additionally, because pull request #25797 made CDC-log permissions behave
the same as materialized views - i.e., you need to make the base table
readable to allow reading from the CDC log - this patch documents
this fact and adds a test for it as well.
Fixes #25800
Closes scylladb/scylladb#25827
Upcoming changes in Seastar cause `rest::simple_send` to move the
`http::request` into `seastar::http::experimental::client::make_request`
when called multiple times. This leaves the original request in an
invalid state. Specifically, the `_version` field becomes empty,
causing request validation to fail. This patch ensures `version` is
explicitly set to prevent such failures.
Fixes: https://github.com/scylladb/scylladb/issues/26018
Closes scylladb/scylladb#26066
The load balancer aims to preserve a balance in rack loads when generating
tablet migrations. However, this balance might get broken when dead nodes
are present. Currently, these nodes aren't included in rack load calculations,
even if they own tablet replicas. As a result, the load balancer treats racks
with dead nodes as racks with a lower load, so it generates migrations to these
racks.
This is incorrect, because a dead node might come back alive, which would result
in having multiple tablet replicas on the same rack. It's also inefficient
even if we know that the node won't come back - when it's being replaced or removed.
In that case we know we are going to rebuild the lost tablet replicas
so migrating tablets to this rack just doubles the work. Allowing such migrations
to happen would also require adjustments in the materialized view pairing code
because we'd temporarily allow having multiple tablet replicas on the same rack.
So in this patch we include dead nodes when calculating rack loads in the load
balancer. The dead nodes still aren't treated as potential migration sources or
destinations.
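A simplified sketch of the adjusted accounting (the types and helper names
are illustrative, not the actual load balancer code):

```
#include <map>
#include <string>
#include <vector>

struct node_info {
    std::string rack;
    bool alive;
    size_t tablet_replicas;
};

// Dead nodes still contribute their replicas to the rack load, so the balancer
// no longer sees their rack as underloaded.
std::map<std::string, size_t> rack_loads(const std::vector<node_info>& nodes) {
    std::map<std::string, size_t> load;
    for (const auto& n : nodes) {
        load[n.rack] += n.tablet_replicas;   // counted whether alive or not
    }
    return load;
}

// But dead nodes are still never picked as a migration source or destination.
bool eligible_for_migration(const node_info& n) {
    return n.alive;
}
```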
We also add a test which verifies that no migrations are performed by doing a node
replace with a mv workload in parallel. Before the patch, we'd get pairing errors
and after the patch, no pairing errors are detected.
Fixes https://github.com/scylladb/scylladb/issues/24485
Closes scylladb/scylladb#26028
Currently, in repair_tablet we retrieve session_id from tablet
map (and throw if it isn't specified). In case of topology
coordinator failover, we may end up in a situation where a node
runs an outdated repair, treating the session of a different operation
as the repair's session:
- topology coordinator starts repair transition (A);
- topology coordinator sends tablet repair rpc to node1;
- topology coordinator is separated from the cluster;
- new topology coordinator is elected;
- new topology coordinator sees waiting repair request (A_2)
and executes it;
- new repair of the same tablet is requested (B);
- new topology coordinator starts repair transition (B);
- new topology coordinator sends tablet repair rpc to node2;
- node2 starts repair (B) as repair master;
- node1 starts repair (A), checks the current session (B), proceeds
with repair (B) as repair master.
Send the current session_id in the repair_tablet rpc. If this session_id
and the session id obtained from the tablet map don't match, an exception
is thrown.
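A sketch of the added check, assuming a hypothetical session_id type; the
real change passes the coordinator's session in the repair_tablet rpc and
compares it on the replica:

```
#include <stdexcept>
#include <string>

struct session_id {
    std::string value;
    bool operator==(const session_id&) const = default;
};

// The replica refuses to run a repair that belongs to a stale transition.
void check_repair_session(const session_id& from_rpc, const session_id& from_tablet_map) {
    if (from_rpc != from_tablet_map) {
        throw std::runtime_error("tablet repair session mismatch: coordinator session "
                                 + from_rpc.value + " != tablet map session "
                                 + from_tablet_map.value);
    }
}
```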
Fixes: https://github.com/scylladb/scylladb/issues/23318.
No backport; changes in rpc signatures
Closes scylladb/scylladb#25369
* github.com:scylladb/scylladb:
test: check that repair with outdated session_id fails
service: pass current session_id to repair rpc
The script test/cqlpy/run-cassandra aims to make it easy to run
any version of Cassandra using whatever version of Java the user
has installed. Sadly, the fact that Java keeps changing and the
Cassandra developers are very slow to adapt to new Javas makes
doing this non-trivial.
This patch makes it possible for run-cassandra to run Cassandra 5
on the Java 21 that is now the default on Fedora 42. Fedora 42
no longer carries antique versions of Java (like Java 8 or 11), not
even as an optional package.
Sadly, even with this patch it is not possible to run older
versions of Cassandra (4 and 3) with Java 21, because the new
Java is missing features such as Netty that the older Cassandra versions
require. But at least it restores the ability to run our cqlpy
tests against Cassandra 5.
Also, this patch adds to test/cqlpy/README.md simple instructions on
how to install Java 11 (in addition to the system's default Java 21)
on Fedora 42. Doing this is very easy and very recommended because
it restores the ability to run Cassandra 3 and 4, not just Cassandra 5.
Fixes #25822.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#25825
compaction/scrub: register sstables for compaction before validation
When `scrub --validate` runs, it collects all candidate sstables at the
start and validates them one by one in separate compaction tasks.
However, scrub in validate mode does not register these sstables for
compaction, which allows regular compaction to pick them up and
potentially compact them away before validation begins. This leads to
scrub failures because the sstables can no longer be found.
This patch fixes the issue by first disabling compaction, collecting the
sstables, and then registering them for compaction before starting
validation. This ensures that the enqueued sstables remain available for
the entire duration of the scrub validation task.
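An ordering sketch of the fix; the table_state interface below is a
stand-in for the real compaction manager API:

```
#include <utility>
#include <vector>

struct sstable {};

struct table_state {
    bool regular_compaction_enabled = true;
    std::vector<const sstable*> live_sstables;
    std::vector<const sstable*> pinned_for_scrub;

    void disable_regular_compaction() { regular_compaction_enabled = false; }
    std::vector<const sstable*> collect_candidates() const { return live_sstables; }
    void register_for_compaction(std::vector<const sstable*> ssts) {
        pinned_for_scrub = std::move(ssts);   // keeps them out of other compaction jobs
    }
};

// The fix: stop regular compaction, snapshot the candidates, and pin them
// before any validation starts, so they can't be compacted away mid-scrub.
void scrub_validate(table_state& t) {
    t.disable_regular_compaction();
    auto candidates = t.collect_candidates();
    t.register_for_compaction(candidates);
    // ...then validate each candidate in its own compaction task...
}
```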
Fixes #23363
This reported scrub failure occurs on all versions that have the
checksum/digest validation feature for uncompressed sstables.
So, backport it to older versions.
Closes scylladb/scylladb#26034
* github.com:scylladb/scylladb:
compaction/scrub: register sstables for compaction before validation
compaction/scrub: handle exceptions when moving invalid sstables to quarantine
Depends on https://github.com/scylladb/seastar/pull/2651
These columns have been missing values since probably forever -
they were added to the schema but never assigned any value:
```
cqlsh> select * from system.clients;
------------------+------------------------
...
ssl_cipher_suite | null
ssl_enabled | null
ssl_protocol | null
...
```
This patch sets values of these columns:
- with a TLS connection, the 3 TLS-related fields are filled in,
- without TLS, `ssl_enabled` is set to `false` and other columns are
`null`,
- if there's an error while inspecting TLS values, the connection is
dropped.
We want to save the TLS info of a connection just after accepting it,
but without waiting for a TLS handshake to complete. So once the
connection is accepted, we inspect it in the background, letting the
server accept the next connections immediately.
Later, when we construct the system.clients virtual table, the previously
saved data can be assigned instantly to client_data, which is a
struct representing a row in the system.clients table. This way we don't
slow down constructing this table more than necessary, which is
relevant for cases with plenty of connections.
Fixes: #9216
Closes scylladb/scylladb#22961
In test/alternator/test_returnvalues.py we had tests for the
ReturnValues feature on UpdateItem requests - but we only tested
UpdateItem requests with the "modern" UpdateExpression, and forgot to
test the combination of ReturnValues with the old AttributeUpdates API.
It turns out this combination is buggy: when both ReturnValues=ALL_OLD
and AttributeUpdates need the previous value of the item, we may wrongly
std::move() the value out, and the operation will fail with a strange
error:
An error occurred (ValidationException) when calling the UpdateItem
operation: JSON assert failed on condition 'IsObject()'
The fix in this patch is trivial - just move the std::move() to the
correct place, after both UpdateExpression and AttributeUpdates
handling is done.
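A minimal illustration of the bug class and of the fix (this is not the
actual Alternator code; the structs and helper below are hypothetical):

```
#include <cassert>
#include <optional>
#include <string>

struct item { std::string json; };

// Stands in for the AttributeUpdates handling, which still needs the previous
// item; the real code asserted IsObject() on the JSON and failed when the
// value had already been moved out.
std::string apply_attribute_updates(const item& previous) {
    assert(!previous.json.empty());
    return previous.json;
}

std::string handle_update(std::optional<item> previous) {
    std::string all_old_response;
    if (previous) {
        all_old_response = previous->json;   // fix: read it, don't std::move() it out yet
    }
    std::string new_item;
    if (previous) {
        new_item = apply_attribute_updates(*previous);
    }
    previous.reset();                        // safe to discard only after both consumers ran
    return all_old_response + new_item;
}
```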
This patch also includes a reproducing test, which fails before this
patch and passes with it - and of course passes on DynamoDB. This
test reproduces two cases where the bug happened, as well as one
case where it didn't (to make sure we don't regress in what already
worked).
Fixes #25894
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#25900
Previously the test set tablets to disabled because CDC wasn't supported
with tablets. Now we can change this to use the default, so the test runs
with either tablets or vnodes.
Update the CDC-related tests in test/cqlpy for CDC with tablets.
* test_cdc_log_entries_use_cdc_streams: this test depends on the
implementation of the cdc tables, which is different for tablets, so
it's changed to run for both vnodes and tablets keyspaces, and we add
the implementation for tablets.
* Some CDC-related tests are unskipped for tablets so they will be run with
both tablets and vnodes keyspaces. These are tests where the
implementation may differ between tablets and vnodes and we want
to have coverage of both.
* Other CDC-related tests do not depend on the implementation
differences between tablets and vnodes, so we can just enable them to
run with the default configuration. Previously they were disabled for
tablets keyspaces because CDC with tablets wasn't supported, so now we remove this.
Introduce basic tests creating CDC tables in tablets-enabled keyspaces,
verifying we can create and drop CDC tables, write and consume CDC log
entries, and consume the log while splitting streams.
Read the CDC stream metadata from the internal system tables, and store
it in the cdc metadata data structures.
The metadata is stored in the tables as diffs, which is more
storage-efficient, but in memory we store it as full stream sets for each
timestamp. This is more useful because we need to be able to find a
stream given a timestamp and a token.
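A sketch of that in-memory layout: full stream sets per timestamp, looked up
by (timestamp, token). The types and names are illustrative, not the actual
CDC metadata structures:

```
#include <cstdint>
#include <map>
#include <vector>

using stream_id = std::vector<uint8_t>;
using token = int64_t;
using timestamp = int64_t;

// One complete stream set per generation timestamp; each set maps a token
// range (represented here by its end token) to the stream serving it.
using stream_set = std::map<token, stream_id>;
using tablet_streams = std::map<timestamp, stream_set>;

// Find the stream for a write: pick the newest generation not after the
// write's timestamp, then the stream owning the write's token.
const stream_id* find_stream(const tablet_streams& streams, timestamp ts, token t) {
    auto gen = streams.upper_bound(ts);
    if (gen == streams.begin()) {
        return nullptr;                  // no generation active at this timestamp
    }
    --gen;
    auto s = gen->second.lower_bound(t); // first range whose end token covers t
    if (s == gen->second.end()) {
        return nullptr;
    }
    return &s->second;
}
```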
This patch adds the possibility to track metrics
per secondary index. Currently, only a histogram
of query latencies is tracked, but more metrics
can be added in the future. To add a new metric,
it needs to be added to the index_metrics struct
in index/secondary_index_manager.hh and then
initialized in index/secondary_index_manager.cc
in the constructor of the index_metrics struct.
The metrics are created when the index is created
and removed when the index is dropped.
First lines of the new metric:
# HELP scylla_index_query_latencies Index query latencies
# TYPE scylla_index_query_latencies histogram
scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640
scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1
Fixes: https://github.com/scylladb/scylladb/issues/25970
Closes scylladb/scylladb#25995
* github.com:scylladb/scylladb:
test: verify that the index metric is added
index, metrics: add per-index metrics
When `mx::make_reader` is used to construct an sstable reader,
it constructs its own index reader internally.
`mx::make_reader_with_index_reader` was originally added
as a variant of `mx::make_reader` which can be used to inject
a custom `index_reader` for testing that the mx Data reader
tolerates inexact indexes.
But now we want the ability to choose between BIG index readers
and BTI index readers if both are present. And at this point,
it seems to me that it makes sense to just construct the index
reader in the caller and pass it via argument to `mx::make_reader`
instead of putting the index selection inside it.
So that's what we do in this patch. And we remove `mx::make_reader_with_index_reader`
because it's no longer different from `mx::make_reader`.
Before this patch, `bti_index_reader::last_block_offset()` returns the
offset of the last block within the file.
But the old `index_reader::last_block_offset()` returns the offset
within the partition, and that's what the callers (i.e. reversed
sstable reader) expect.
Fix `bti_index_reader::last_block_offset()` (and the corresponding
comment and test) to match `index_reader::last_block_offset()`.