This patch is part of the vector_store_client sharded service
implementation for communication with the vector-store service.
It implements refreshing the IP address behind the vector-store
service DNS name and creating a new HTTP client with that address. It
also cleans up unused HTTP clients. The intervals for DNS refresh and
old-HTTP-client cleanup, as well as the timeout for requesting a new
HTTP client, are hardcoded.
The patch introduces two background tasks: one for DNS resolution and
one for cleaning up old HTTP clients.
It adds unit tests for possible DNS refreshing issues.
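The refresh-and-cleanup flow can be sketched, independently of Seastar, as a small pool of clients keyed by resolved address. `http_client`, `client_pool`, and the mark-and-sweep logic below are illustrative stand-ins, not the actual patch API:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for the real HTTP client type.
struct http_client {
    std::string addr;  // resolved IP:port this client talks to
};

// Sketch of the patch's idea: keep one client per resolved address,
// mark clients on use, and sweep the unused ones periodically from
// the cleanup background task.
class client_pool {
    std::map<std::string, std::shared_ptr<http_client>> clients_;
    std::map<std::string, bool> used_since_sweep_;
public:
    // Called after each DNS refresh: returns (and caches) a client
    // for the freshly resolved address.
    std::shared_ptr<http_client> get(const std::string& addr) {
        auto it = clients_.find(addr);
        if (it == clients_.end()) {
            it = clients_.emplace(addr,
                std::make_shared<http_client>(http_client{addr})).first;
        }
        used_since_sweep_[addr] = true;
        return it->second;
    }

    // Called by the cleanup background task: drop clients that were
    // not used since the previous sweep (e.g. the DNS name no longer
    // resolves to their address).
    void sweep() {
        for (auto it = clients_.begin(); it != clients_.end();) {
            if (!used_since_sweep_[it->first]) {
                used_since_sweep_.erase(it->first);
                it = clients_.erase(it);
            } else {
                used_since_sweep_[it->first] = false;
                ++it;
            }
        }
    }

    std::size_t size() const { return clients_.size(); }
};
```

In the real service the sweep runs on a hardcoded interval; here it is invoked explicitly so the logic can be exercised directly.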
Reference: VS-47
Fixes: VS-45
This patch is part of the vector_store_client sharded service
implementation for communication with the vector-store service.
There is a need for an abortable sequention_producer operator(). The
existing operator() is changed to accept a timeout argument defaulting
to time_point::max() (matching the current usage), and a new
operator() overload is added with an abort_source parameter.
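The two-overload pattern can be sketched like this; `producer` and the `std::atomic<bool>` abort flag are illustrative stand-ins for the real sequention_producer and seastar's abort_source:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <optional>

using clock_t_ = std::chrono::steady_clock;

struct producer {
    int next_ = 0;

    // Existing call path: a timeout argument is added with a default
    // of time_point::max(), preserving the old "wait forever" usage.
    std::optional<int> operator()(
            clock_t_::time_point timeout = clock_t_::time_point::max()) {
        if (clock_t_::now() > timeout) {
            return std::nullopt;  // timed out
        }
        return next_++;
    }

    // New overload: abortable via an externally owned flag.
    std::optional<int> operator()(const std::atomic<bool>& aborted) {
        if (aborted.load()) {
            return std::nullopt;  // aborted before producing
        }
        return next_++;
    }
};
```

Existing callers keep calling `operator()` with no arguments and are unaffected, while new callers can pass an abort flag.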
Reference: VS-47
This patch is part of the vector_store_client sharded service
implementation for communication with the vector-store service.
It adds a `services/vector_store_client.{cc|hh}` sharded service and a
configuration parameter `vector_store_uri` with a
`http://vector-store.dns.name:port` format. If parsing that parameter
fails, an exception is thrown during construction.
For future unit-testing purposes the patch adds
`vector_store_client_tester` as a way to inject mock functionality.
This service will be used by the select statements for the Vector
search indexes (see VS-46). For this reason I've added the
vector_store_client service to the query processor.
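A minimal sketch of the fail-on-construction behavior, assuming a plain `http://host:port` parse; the type name and parsing details are illustrative, not the actual service code:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Illustrative parse of a "http://vector-store.dns.name:port" style
// parameter that throws on a malformed value, so a bad config value
// fails service construction instead of being silently accepted.
struct vector_store_uri {
    std::string host;
    unsigned port;

    explicit vector_store_uri(const std::string& uri) {
        const std::string scheme = "http://";
        if (uri.rfind(scheme, 0) != 0) {
            throw std::invalid_argument("vector_store_uri: expected http:// scheme");
        }
        auto rest = uri.substr(scheme.size());
        auto colon = rest.rfind(':');
        if (colon == std::string::npos || colon == 0 || colon + 1 == rest.size()) {
            throw std::invalid_argument("vector_store_uri: expected host:port");
        }
        host = rest.substr(0, colon);
        port = static_cast<unsigned>(std::stoul(rest.substr(colon + 1)));
    }
};
```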
Reference: VS-47 VS-45
When replaying a failed batch and sending the mutation to all replicas, make the write response handler cancellable and abort it on shutdown or if some target is marked down. Also set a reasonable timeout so it gets aborted if it's stuck for some other unexpected reason.
Previously, the write response handler was not cancellable and had no timeout. This could cause a scenario where some write operation by the batchlog manager is stuck indefinitely, and node shutdown gets stuck as well because it waits for the batchlog manager to complete, without aborting the operation.
backport to relevant versions since the issue can cause node shutdown to hang
Fixes scylladb/scylladb#24599
Closes scylladb/scylladb#24595
* github.com:scylladb/scylladb:
test: test_batchlog_manager: batchlog replay includes cdc
test: test_batchlog_manager: test batch replay when a node is down
batchlog_manager: set timeout on writes
batchlog_manager: abort writes on shutdown
batchlog_manager: create cancellable write response handler
storage_proxy: add write type parameter to mutate_internal
The series adds more logging and provides a new REST API around topology command rpc execution to allow easier debugging of stuck topology operations.
Backport since we want to have it in production as quickly as possible.
Fixes #24860
Closes scylladb/scylladb#24799
* https://github.com/scylladb/scylladb:
topology coordinator: log a start and an end of topology coordinator command execution at info level
topology coordinator: add REST endpoint to query the status of ongoing topology cmd rpc
Several audit test issues caused test failures, and as a result almost all of the audit syslog tests were marked with xfail.
This patch series enables the syslog audit tests, which should finally pass after the following fixes are introduced:
- bring back commas to audit syslog (scylladb#24410 fix)
- synchronize audit syslog server
- fix parsing of syslog messages
- generate unique uuid for each line in syslog audit
- allow audit logging from multiple nodes
Fixes: scylladb/scylladb#24410
Test improvements, no backport required.
Closes scylladb/scylladb#24553
* github.com:scylladb/scylladb:
test: audit: use automatic comparators in AuditEntry
test: audit: enable syslog audit tests
test: audit: sort new audit entries before comparing with expected ones
test: audit: check audit logging from multiple nodes
test: audit: generate unique uuid for each line in syslog audit
test: audit: fix parsing of syslog messages
test: audit: synchronize audit syslog server
docs: audit: update syslog audit format to the current one
audit: bring back commas to audit syslog
This check validates that static values of supported versions are "in
sync" with each other. It's enough to do it once when compiling
sstable_version.cc, not every time the header is included.
refs: #1 (not that it helps noticeably, but technically it fits)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#24839
Add a new test that verifies that when replaying batch mutations from
the batchlog, the mutations include cdc augmentation if needed.
This is done in order to verify that it works currently as expected and
doesn't break in the future.
Add a test of the batchlog manager replay loop applying failed batches
while some replica is down.
The test reproduces an issue where the batchlog manager tries to replay
a failed batch, doesn't get a response from some replica, and becomes
stuck.
It verifies that the batchlog manager can eventually recover from this
situation and continue applying failed batches.
Set a timeout on writes of replayed batches by the batchlog manager.
We want to avoid an infinite timeout for these writes in case one
gets stuck for some unexpected reason.
The timeout is set to be high enough to allow any reasonable write to
complete.
On shutdown of the batchlog manager, abort all of its writes of
replayed batches.
To achieve this we set the appropriate write_type to BATCH, and on
shutdown cancel all write handlers with this type.
When replaying a batch mutation from the batchlog manager and sending it
to all replicas, create the write response handler as cancellable.
To achieve this we define a new wrapper type for batchlog mutations -
batchlog_replay_mutation, and this allows us to overload
create_write_response_handler for this type. This is similar to how it's
done with hint_wrapper and read_repair_mutation.
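The wrapper-type overloading trick can be sketched as follows; `mutation`, `write_response_handler`, and their fields here are simplified stand-ins for the storage_proxy types:

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for the real mutation type.
struct mutation {
    std::string key;
};

// Wrapping the mutation in a distinct type lets overload resolution
// pick a different create_write_response_handler for batchlog replay,
// the same way hint_wrapper and read_repair_mutation are handled.
struct batchlog_replay_mutation {
    mutation m;  // same payload, distinct type
};

struct write_response_handler {
    bool cancellable;
};

// Plain mutations keep the existing non-cancellable behavior...
write_response_handler create_write_response_handler(const mutation&) {
    return {false};
}

// ...while batchlog replay mutations get a cancellable handler.
write_response_handler create_write_response_handler(const batchlog_replay_mutation&) {
    return {true};
}
```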
Currently mutate_internal has a boolean parameter `counter_write` that
indicates whether the write is of counter type or not.
We replace it with a more general parameter that allows indicating the
write type.
It is compatible with the previous behavior - for a counter write, the
type COUNTER is passed, and otherwise a default value will be used
as before.
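A sketch of the signature change, with a hypothetical `effective_write_type` helper standing in for mutate_internal's parameter handling; the enum values mirror the ones named above:

```cpp
#include <cassert>
#include <optional>

// The write types mentioned in this series; SIMPLE stands in for the
// default used by non-counter, non-batch writes.
enum class write_type { SIMPLE, COUNTER, BATCH };

// Before: mutate_internal(..., bool counter_write).
// After: an optional write type. Callers that used to pass
// counter_write=true now pass write_type::COUNTER; callers that
// passed false pass nothing and get the default, as before.
write_type effective_write_type(std::optional<write_type> type = std::nullopt) {
    return type.value_or(write_type::SIMPLE);
}
```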
The topology coordinator executes several topology cmd rpcs against some nodes
during a topology change. A topology operation will not proceed until the
rpc completes (successfully or not), but sometimes it appears that it
hangs, and it is hard to tell on which nodes it has not completed yet.
Introduce new REST endpoint that can help with debugging such cases.
If executed on the topology coordinator it returns currently running
topology rpc (if any) and a list of nodes that did not reply yet.
As seen in #23284, when the tablet_metadata contains many tables, even empty ones,
we see a long queue of seastar tasks coming from the individual destruction of
`tablet_map_ptr = foreign_ptr<lw_shared_ptr<const tablet_map>>` objects.
This change improves `tablet_metadata::clear_gently` to destroy the `tablet_map_ptr` objects
on their owner shard by sorting them into per-owner-shard vectors.
Also, a background call to clear_gently was added to `~token_metadata`, as it is destroyed
arbitrarily when automatic token_metadata_ptr variables go out of scope, so that the
contained tablet_metadata is cleared gently.
Finally, a unit test was added to reproduce the `Too long queue accumulated for gossip` symptom
and verify that it is gone with this change.
Fixes #24814
Refs #23284
This change is not marked as fixing the issue since we still need to verify that there is no impact on query performance, reactor stalls, or large allocations, with a large number of tablet-based tables.
* Since the issue exists in 2025.1, requesting backport to 2025.1 and upwards
Closes scylladb/scylladb#24618
* github.com:scylladb/scylladb:
token_metadata_impl: clear_gently: release version tracker early
test: cluster: test_tablets_merge: add test_tablet_split_merge_with_many_tables
token_metadata: clear_and_destroy_impl when destroyed
token_metadata: keep a reference to shared_token_metadata
token_metadata: move make_token_metadata_ptr into shared_token_metadata class
replica: database: get and expose a mutable locator::shared_token_metadata
locator: tablets: tablet_metadata: clear_gently: optimize foreign ptr destruction
No need to wait for all members to be cleared gently.
We can release the version earlier since the
held version may be awaited on in barriers.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Reproduces #23284
Currently skipped in release mode since it requires
the `short_tablet_stats_refresh_interval` interval.
Ref #24641
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We have a lot of places in the code where
a token_metadata_ptr is kept in an automatic
variable and destroyed when it leaves the scope.
Since it's a reference-counted lw_shared_ptr,
the token_metadata object is rarely destroyed in
those cases, but when it is, it doesn't go through
clear_gently, and in particular its tablet_metadata
is not cleared gently, leading to inefficient destruction
of potentially many foreign_ptr:s.
This patch calls clear_and_destroy_impl, which gently
clears and destroys the impl object in the background
using the shared_token_metadata.
Fixes #13381
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
So we can use the local shared_token_metadata instance
for safe background destruction of token_metadata_impl:s.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Prepare for the next patch, which will use this shared_token_metadata
to make mutable_token_metadata_ptr:s.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Sort all tablet_map_ptr:s by shard_id
and then destroy them on each shard to prevent
long cross-shard task queues for foreign_ptr destructions.
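The bucketing step can be sketched shard-agnostically; `bucket_by_owner_shard` and the pair-of-(shard, pointer) representation are illustrative stand-ins for the foreign_ptr bookkeeping:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch: instead of destroying each (foreign) pointer individually,
// with one cross-shard task per pointer, sort the pointers into one
// vector per owner shard so each shard can free its whole batch with
// a single cross-shard task.
template <typename Ptr>
std::vector<std::vector<Ptr>> bucket_by_owner_shard(
        std::vector<std::pair<unsigned, Ptr>> ptrs, unsigned shard_count) {
    std::vector<std::vector<Ptr>> per_shard(shard_count);
    for (auto& [owner, p] : ptrs) {
        per_shard[owner].push_back(std::move(p));
    }
    // Each per_shard[i] would then be destroyed on shard i.
    return per_shard;
}
```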
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The db::config is a top-level configuration class that includes options for pretty much everything in Scylla. Instead of messing with this large thing, individual services have their own smaller configs that are initialized with values from db::config. This PR does it for transport::server (transport::controller will be next) and its cql_server_config. One pitfall to avoid is that updateable_value is not shard-safe (#7316), but the code in the controller that creates cql_server_config already takes care of that.
Closes scylladb/scylladb#24841
* github.com:scylladb/scylladb:
transport: Stop using db::config by transport::server
transport: Keep uninitialized_connections_semaphore_cpu_concurrency on cql_server_config
transport: Move cql_duplicate_bind_variable_names_refer_to_same_variable to cql_server_config
transport: Move max_concurrent_requests to struct config
transport: Use cql_server_config::max_request_size
cql_server_config
This also repeats the previous patch for another updateable_value. The
thing here is that this config option is passed further to
generic_server, but not used by transport::server itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
to cql_server_config
Similarly to the previous patch -- move yet another updateable_value to
let transport::server eventually stop messing with db::config.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is an updateable_value that's initialized from a db::config
named_value to tackle its shard-unsafety. However, the
cql_server_config is created by the controller using the
sharded_parameter() helper, so that it can be safely passed to the
server.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Replace manual comparator implementations with generated comparators.
This simplifies future maintenance and ensures comparators
remain accurate when new fields are added.
Reorder fields in AuditEntry so the less-than comparator evaluates
the most significant fields first.
Several audit test issues were resolved in numerous commits of this
patch series. This commit enables the syslog audit tests, which should
finally pass.
In some corner cases, the order of audit entries can change. For
instance, ScyllaDB is allowed to apply BATCH statements in an order
different from the order in which they are listed in the statement.
To prevent test failures in such cases, this commit sorts new
audit entries.
Additionally, it is possible that some of the audit entries won't be
received by the SYSLOG server immediately. To prevent test failures
in this scenario, waiting for the expected number of new audit entries
is added.
Before this change, the `assert_audit_row_eq` check assumed that
audit logs were always generated by the same (first) node. However,
this assumption is invalid in a multi-node setup.
This commit modifies the check to just verify that one of the nodes
in the cluster generated the audit log.
Audit to TABLE uses a time UUID as a clustering key, while audit to
SYSLOG simply appends new lines. As a result, having such a detailed
time UUID is unnecessary for SYSLOG. However, TABLE tests expect each
line to be unique, and a similar check is performed (and fails)
in SYSLOG tests.
This commit updates the test framework to generate a unique UUID for
each line in SYSLOG audit. This ensures the tests remain consistent
for both TABLE and SYSLOG audit.
Before this commit, there were the following issues with parsing of
syslog messages in audit tests:
- `line_to_row()` function was never called
- `line_to_row()` was not prepared for changes introduced in
scylladb#23099 (i.e. key=value pairs)
- `line_to_row()` didn't handle newlines in queries
- `line_to_row()` didn't handle "\\" escaping in queries
Due to the aforementioned issues, the syslog audit tests were failing.
This commit fixes all of those issues, by parsing each audit syslog
message using a regexp.
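A sketch of the regexp approach, assuming an illustrative key="value" message layout rather than the exact Scylla audit format; quoted values may contain escaped characters and newlines, which naive splitting on spaces mishandles:

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Parse key="value" pairs where the value allows \" and \\ escapes
// (and embedded newlines), returning a row keyed by field name.
std::map<std::string, std::string> parse_syslog_line(const std::string& line) {
    std::map<std::string, std::string> row;
    // A value is a run of non-quote, non-backslash characters, or a
    // backslash followed by any character (including a newline).
    static const std::regex kv(R"re((\w+)="((?:[^"\\]|\\[\s\S])*)")re");
    for (auto it = std::sregex_iterator(line.begin(), line.end(), kv);
         it != std::sregex_iterator(); ++it) {
        row[(*it)[1]] = (*it)[2];
    }
    return row;
}
```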
Copy `auth_test.py` from the scylla-dtest test suite, remove all non-next_gating tests from it, and make it work with `test.py`.
As part of the porting process, remove unused imports and markers, remove non-next_gating tests and tests marked with the `required_features("!consistent-topology-changes")` marker.
Remove the `test_permissions_caching` test because it's too flaky when running under test.py.
Also, make few time execution optimizations:
- remove redundant `time.sleep(10)`
- use smaller timeouts for CQL sessions
Enable the test in `suite.yaml` (run in dev mode only).
Additional modifications to test.py/dtest shim code:
- Modify ManagerClient.server_update_config() method to change multiple config options in one call in addition to one `key: value` pair.
- Implement the method using slightly modified `set_configuration_options()` method of `ScyllaCluster`.
- Copy generate_cluster_topology() function from tools/cluster_topology.py module.
- Add support for `bootstrap` parameter for `new_node()` function.
- Rework `wait_for_any_log()` function.
Closes scylladb/scylladb#24648
* github.com:scylladb/scylladb:
test.py: dtest: make auth_test.py run using test.py
test.py: dtest: rework wait_for_any_log()
test.py: dtest: add support for bootstrap parameter for new_node
test.py: dtest: add generate_cluster_topology() function
test.py: dtest: add ScyllaNode.set_configuration_options() method
test.py: pylib/manager_client: support batch config changes
test.py: dtest: copy unmodified auth_test.py
test.py: dtest: add missed markers to pytest.ini
01466be7b9 changed the summary entries, storing raw tokens in them
instead of dht::token. Adjust the command so that it works with both
pre- and post-change versions.
Also make it accept pointers to sstables as arguments; this is what
the scylla sstables listing provides.
Closes scylladb/scylladb#24759
Multiple tests are currently flaky due to graceful shutdown
timing out when flushing tables takes more than a minute. We still
don't understand why flushing is sometimes so slow, but we suspect
it is an issue with new machines spider9 and spider11 that CI runs
on. All observed failures happened on these machines, and most of
them on spider9.
In this commit, we increase the timeout of graceful shutdown as
a temporary workaround to improve CI stability. When we get to
the bottom of the issue and fix it, we will revert this change.
Ref #12028
It's a temporary workaround to improve CI stability, we don't
have to backport it.
Closes scylladb/scylladb#24802
Currently, when computing the mutation to be stored in system.batchlog,
we go through data_value. In turn this goes through `bytes` type
(#24810), so it causes a large contiguous allocation if the batch is
large.
Fix by going through the more primitive, but less contiguous,
atomic_cell API.
Fixes #24809.
Closes scylladb/scylladb#24811
C++20 introduced two new attributes--likely and unlikely--that
function as a built-in replacement for the __builtin_expect intrinsic
implemented in various compilers. Since they make code easier to read
and are an integral part of the language, there's no reason not to use
them instead.
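For example, an error-path check migrates from __builtin_expect to the standard attributes like this (the function is illustrative, not taken from the PR):

```cpp
#include <cassert>

// The [[likely]]/[[unlikely]] attributes annotate the branch itself,
// whereas __builtin_expect wrapped the condition expression.
int checked_div(int a, int b) {
    // Before: if (__builtin_expect(b == 0, 0)) { return 0; }
    if (b == 0) [[unlikely]] {
        return 0;  // error path, expected to be rare
    }
    return a / b;  // hot path
}
```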
Closes scylladb/scylladb#24786
Old nodes do not expect global topology request names to be in
request_type field, so set it only if a cluster is fully upgraded
already.
Closes scylladb/scylladb#24731
- Fix a missing negation in the `if` in the background downloading fiber
- Add a test to catch this case
- Improve the s3 proxy to inject errors if the same resource is requested more than once
- Suppress client retry, since retrying the same request when each request produces multiple buffers may lead to the same data appearing more than once in the buffer deque
- Inject an exception from the test to simulate a response callback failure in the middle
No need to backport anything since this class is not used yet
Closes scylladb/scylladb#24657
* github.com:scylladb/scylladb:
s3_test: Add s3_client test for non-retryable error handling
s3_test: Add trace logging for default_retry_strategy
s3_client: Fix edge case when the range is exhausted
s3_client: Fix indentation in try..catch block
s3_client: Stop retries in chunked download source
s3_client: Enhance test coverage for retry logic
s3_client: Add test for Content-Range fix
s3_client: Fix missing negation
s3_client: Refine logging
s3_client: Improve logging placement for current_range output
Replacing "from" is incorrect. The typo comes from recently
merged #24583.
Fixes #24732
Requires backport to 2025.2 since #24583 has been backported to 2025.2.
Closes scylladb/scylladb#24733
This PR introduces a new `comparable_bytes` class to add byte-comparable format support for all the [native cql3 data types](https://opensource.docs.scylladb.com/stable/cql/types.html#native-types) except the `counter` type, as that is not comparable. The byte-comparable format is a prerequisite for implementing the trie-based index format for our sstables (https://github.com/scylladb/scylladb/issues/19191). This implementation adheres to the byte-comparable format specification in https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/bytecomparable/ByteComparable.md
Note that support for composite data types like lists, maps, and sets has not been implemented yet and will be made available in a separate PR.
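The core trick for fixed-length signed integers can be sketched as flipping the sign bit and storing the value big-endian, so plain unsigned byte comparison matches signed numeric order; this is only an illustration of the principle, while the real comparable_bytes class covers many more types per the specification:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Encode an int32 into 4 bytes whose lexicographic (unsigned byte)
// order matches the signed numeric order: big-endian layout with the
// sign bit flipped, so negative values sort before non-negative ones.
std::array<uint8_t, 4> to_comparable(int32_t v) {
    uint32_t u = static_cast<uint32_t>(v) ^ 0x80000000u;  // flip sign bit
    return {static_cast<uint8_t>(u >> 24), static_cast<uint8_t>(u >> 16),
            static_cast<uint8_t>(u >> 8),  static_cast<uint8_t>(u)};
}

// Lexicographic byte comparison, as a trie-based sstable index would
// effectively perform.
int compare_bytes(const std::array<uint8_t, 4>& a,
                  const std::array<uint8_t, 4>& b) {
    return std::memcmp(a.data(), b.data(), a.size());
}
```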
Refs https://github.com/scylladb/scylladb/issues/19407
New feature - backport not required.
Closes scylladb/scylladb#23541
* github.com:scylladb/scylladb:
types/comparable_bytes: add testcase to verify compatibility with cassandra
types/comparable_bytes: support variable-length natively byte-ordered data types
types/comparable_bytes: support decimal cql3 types
types/comparable_bytes: introduce count_digits() method
types/comparable_bytes: support uuid and timeuuid cql3 types
types/comparable_bytes: support varint cql3 type
types/comparable_bytes: support skipping sign byte write in decode_signed_long_type
types/comparable_bytes: introduce encode/decode_varint_length
types/comparable_bytes: support float and double cql3 types
types/comparable_bytes: support date, time and timestamp cql3 types
types/comparable_bytes: support bigint cql3 type
types/comparable_bytes: support fixed length signed integers
types/comparable_bytes: support boolean cql3 type
types: introduce comparable_bytes class
bytes_ostream: overload write() to support writing from FragmentedView
docs: fix minor typo in docs/dev/cql3-type-mapping.md
When describing a table, we need to do it carefully: if some
columns were dropped, we must specify that explicitly by
```
ALTER TABLE {table} DROP {column} USING TIMESTAMP ...
```
in the result of the DESCRIBE statement. Failing to do so
could lead to data resurrection.
However, if a table has been altered many, many times,
we might end up with a huge create statement. Constructing
it could, in turn, trigger an oversized allocation.
Some tests ran into that very problem in fact.
In this commit, we want to mitigate the problem: instead of
allocating a contiguous chunk of memory for the create
statement, we use `bytes_ostream` and `managed_bytes` to
possibly keep data scattered in memory. It makes handling
`cql3::description` less convenient in the code, but since
the struct is pretty much immediately serialized after
creating it, it's a very good trade-off.
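The fragmented-buffer idea can be sketched as follows; the class, its tiny fragment size, and the linearize() helper are illustrative, not the actual bytes_ostream/managed_bytes API:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Append into fixed-size fragments instead of one contiguous buffer,
// so a huge DESCRIBE result never needs one oversized allocation.
class fragmented_ostream {
    static constexpr std::size_t fragment_size = 8;  // tiny, for demo
    std::vector<std::string> fragments_;
public:
    void write(const std::string& s) {
        std::size_t off = 0;
        while (off < s.size()) {
            if (fragments_.empty() || fragments_.back().size() == fragment_size) {
                fragments_.emplace_back();
                fragments_.back().reserve(fragment_size);
            }
            auto room = fragment_size - fragments_.back().size();
            auto n = std::min(room, s.size() - off);
            fragments_.back().append(s, off, n);
            off += n;
        }
    }
    std::size_t fragment_count() const { return fragments_.size(); }
    // Only for checking in tests; joining fragments defeats the point.
    std::string linearize() const {
        std::string out;
        for (auto& f : fragments_) out += f;
        return out;
    }
};
```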
A reproducer is intentionally not provided by this commit:
it's easy to test the change, but adding and dropping
a huge number of columns would take a really long time,
so we need to omit it.
Fixes scylladb/scylladb#24018
Backport: all of the supported versions are affected, so we want to backport the changes there.
Closes scylladb/scylladb#24151
* github.com:scylladb/scylladb:
cql3/description: Serialize only rvalues of description
cql3: Represent create_statement using managed_string
cql3/statements/describe_statement.cc: Don't copy descriptions
cql3: Use managed_bytes instead of bytes in DESCRIBE
utils/managed_string.hh: Introduce managed_string and fragmented_ostringstream