Commit Graph

48408 Commits

Author SHA1 Message Date
Pawel Pery
1f797e2fcd vector_store_client: implement ip addr retrieval from dns
This patch is a part of vector_store_client sharded service
implementation for a communication with vector-store service.

It implements functionality for refreshing ip address of the
vector-store service dns name and creating a new HTTP client with that
address. It also provides cleanup of unused http clients. There are
hardcoded intervals for dns refresh and old http clients cleanup, and
timeout for requesting new http client.

This patch introduces two background tasks - for dns resolving
task and for cleanup old http clients.

It adds unit tests for possible dns refreshing issues.

Reference: VS-47
Fixes: VS-45
2025-07-09 11:54:51 +02:00
Pawel Pery
8d3c33f74a utils: refactor sequential_producer as abortable
This patch is a part of vector_store_client sharded service
implementation for a communication with vector-store service.

There is a need for abortable sequention_producer operator(). The
existing operator() is changed to allow timeout argument with default
time_point::max() (as current default usage) and the new operator() is
created with abort_source parameter.

Reference: VS-47
2025-07-08 16:29:55 +02:00
Pawel Pery
7bf53fc908 vector_store_client: implement initial vector_store_client service
This patch is a part of vector_store_client sharded service
implementation for a communication with vector-store service.

It adds a `services/vector_store_client.{cc|hh}` sharded service and a
configuration parameter `vector_store_uri` with a
`http://vector-store.dns.name:port` format. If there will be an error
during parsing that parameter there will be an exception during
construction.

For the future unit testing purposes the patch adds
`vector_store_client_tester` as a way to inject mockup functionality.

This service will be used by the select statements for the Vector search
indexes (see VS-46). For this reason I've added vector_store_client
service in the query processor.

Reference: VS-47 VS-45
2025-07-08 16:29:55 +02:00
Yaniv Michael Kaul
82fba6b7c0 PowerPC: remove ppc stuff
We don't even compile-test it.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#24659
2025-07-08 10:38:23 +03:00
Piotr Dulikowski
6c65f72031 Merge 'batchlog_manager: abort replay of a failed batch on shutdown or node down' from Michael Litvak
When replaying a failed batch and sending the mutation to all replicas, make the write response handler cancellable and abort it on shutdown or if some target is marked down. also set a reasonable timeout so it gets aborted if it's stuck for some other unexpected reason.

Previously, the write response handler is not cancellable and has no timeout. This can cause a scenario where some write operation by the batchlog manager is stuck indefinitely, and node shutdown gets stuck as well because it waits for the batchlog manager to complete, without aborting the operation.

backport to relevant versions since the issue can cause node shutdown to hang

Fixes scylladb/scylladb#24599

Closes scylladb/scylladb#24595

* github.com:scylladb/scylladb:
  test: test_batchlog_manager: batchlog replay includes cdc
  test: test_batchlog_manager: test batch replay when a node is down
  batchlog_manager: set timeout on writes
  batchlog_manager: abort writes on shutdown
  batchlog_manager: create cancellable write response handler
  storage_proxy: add write type parameter to mutate_internal
2025-07-07 16:48:07 +02:00
Patryk Jędrzejczak
2a52834b7f Merge 'Make it easier to debug stuck raft topology operation.' from Gleb Natapov
The series adds more logging and provides new REST api around topology command rpc execution to allow easier debugging of stuck topology operations.

Backport since we want to have in the production as quick as possible.

Fixes #24860

Closes scylladb/scylladb#24799

* https://github.com/scylladb/scylladb:
  topology coordinator: log a start and an end of topology coordinator command execution at info level
  topology coordinator: add REST endpoint to query the status of ongoing topology cmd rpc
2025-07-07 15:40:44 +02:00
Piotr Dulikowski
ea35302617 Merge 'test: audit: enable syslog audit tests' from Andrzej Jackowski
Several audit test issues caused test failures, and in the result, almost all of audit syslog tests were marked with xfail.
This patch series enables the syslog audit tests, that should finally pass after the following fixes are introduced:
 - bring back commas to audit syslog (scylladb#24410 fix)
 - synchronize audit syslog server
 - fix parsing of syslog messages
 - generate unique uuid for each line in syslog audit
 - allow audit logging from multiple nodes

Fixes: scylladb/scylladb#24410

Test improvements, no backport required.

Closes scylladb/scylladb#24553

* github.com:scylladb/scylladb:
  test: audit: use automatic comparators in AuditEntry
  test: audit: enable syslog audit tests
  test: audit: sort new audit entries before comparing with expected ones
  test: audit: check audit logging from multiple nodes
  test: audit: generate unique uuid for each line in syslog audit
  test: audit: fix parsing of syslog messages
  test: audit: synchronize audit syslog server
  docs: audit: update syslog audit format to the current one
  audit: bring back commas to audit syslog
2025-07-07 12:45:44 +02:00
Pavel Emelyanov
84e1ac5248 sstables: Move versions static-assertion check to .cc file
Thiss check validates that static values of supported versions are "in
sync" with each other. It's enough to do it once when compiling
sstable_version.cc, not every time the header is included.

refs: #1 (not that it helps noticeably, but technically it fits)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#24839
2025-07-07 13:16:21 +03:00
Michael Litvak
d7af26a437 test: test_batchlog_manager: batchlog replay includes cdc
Add a new test that verifies that when replaying batch mutations from
the batchlog, the mutations include cdc augmentation if needed.

This is done in order to verify that it works currently as expected and
doesn't break in the future.
2025-07-07 12:24:05 +03:00
Michael Litvak
a9b476e057 test: test_batchlog_manager: test batch replay when a node is down
Add a test of the batchlog manager replay loop applying failed batches
while some replica is down.

The test reproduces an issue where the batchlog manager tries to replay
a failed batch, doesn't get a response from some replica, and becomes
stuck.

It verifies that the batchlog manager can eventually recover from this
situation and continue applying failed batches.
2025-07-07 12:23:06 +03:00
Michael Litvak
74a3fa9671 batchlog_manager: set timeout on writes
Set a timeout on writes of replayed batches by the batchlog manager.

We want to avoid having infinite timeout for the writes in case it gets
stuck for some unexpected reason.

The timeout is set to be high enough to allow any reasonable write to
complete.
2025-07-07 12:23:06 +03:00
Michael Litvak
7150632cf2 batchlog_manager: abort writes on shutdown
On shutdown of batchlog manager, abort all writes of replayed batches
by the batchlog manager.

To achieve this we set the appropriate write_type to BATCH, and on
shutdown cancel all write handlers with this type.
2025-07-07 12:23:06 +03:00
Michael Litvak
fc5ba4a1ea batchlog_manager: create cancellable write response handler
When replaying a batch mutation from the batchlog manager and sending it
to all replicas, create the write response handler as cancellable.

To achieve this we define a new wrapper type for batchlog mutations -
batchlog_replay_mutation, and this allows us to overload
create_write_response_handler for this type. This is similar to how it's
done with hint_wrapper and read_repair_mutation.
2025-07-07 12:23:06 +03:00
Michael Litvak
8d48b27062 storage_proxy: add write type parameter to mutate_internal
Currently mutate_internal has a boolean parameter `counter_write` that
indicates whether the write is of counter type or not.

We replace it with a more general parameter that allows to indicate the
write type.

It is compatible with the previous behavior - for a counter write, the
type COUNTER is passed, and otherwise a default value will be used
as before.
2025-07-07 12:23:06 +03:00
Gleb Natapov
4e6369f35b topology coordinator: log a start and an end of topology coordinator command execution at info level
Those calls a relatively rare and the output may help to analyze issues
in production.
2025-07-07 10:46:22 +03:00
Gleb Natapov
c8ce9d1c60 topology coordinator: add REST endpoint to query the status of ongoing topology cmd rpc
The topology coordinator executes several topology cmd rpc against some nodes
during a topology change. A topology operation will not proceed unless
rpc completes (successfully or not), but sometimes it appears that it
hangs and it is hard to tell on which nodes it did not complete yet.
Introduce new REST endpoint that can help with debugging such cases.
If executed on the topology coordinator it returns currently running
topology rpc (if any) and a list of nodes that did not reply yet.
2025-07-07 10:46:03 +03:00
Avi Kivity
d4efefbd9c Merge 'Improve background disposal of tablet_metadata' from Benny Halevy
As seen in #23284, when the tablet_metadata contains many tables, even empty ones,
we're seeing a long queue of seastar tasks coming from the individual destruction of
`tablet_map_ptr = foreign_ptr<lw_shared_ptr<const tablet_map>>`.

This change improves `tablet_metadata::clear_gently` to destroy the `tablet_map_ptr` objects
on their owner shard by sorting them into vectors, per- owner shard.

Also, background call to clear_gently was added to `~token_metadata`, as it is destroyed
arbitrarily when automatic token_metadata_ptr variables go out of scope, so that the
contained tablet_metadata would be cleared gently.

Finally, a unit test was added to reproduce the `Too long queue accumulated for gossip` symptom
and verify that it is gone with this change.

Fixes #24814
Refs #23284

This change is not marked as fixing the issue since we still need to verify that there is no impact on query performance, reactor stalls, or large allocations, with a large number of tablet-based tables.

* Since the issue exists in 2025.1, requesting backport to 2025.1 and upwards

Closes scylladb/scylladb#24618

* github.com:scylladb/scylladb:
  token_metadata_impl: clear_gently: release version tracker early
  test: cluster: test_tablets_merge: add test_tablet_split_merge_with_many_tables
  token_metadata: clear_and_destroy_impl when destroyed
  token_metadata: keep a reference to shared_token_metadata
  token_metadata: move make_token_metadata_ptr into shared_token_metadata class
  replica: database: get and expose a mutable locator::shared_token_metadata
  locator: tablets: tablet_metadata: clear_gently: optimize foreign ptr destruction
2025-07-06 19:43:50 +03:00
Benny Halevy
6e4803a750 token_metadata_impl: clear_gently: release version tracker early
No need to wait for all members to be cleared gently.
We can release the version earlier since the
held version may be awaited for in barriers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-07-06 15:07:31 +03:00
Benny Halevy
4a3d14a031 test: cluster: test_tablets_merge: add test_tablet_split_merge_with_many_tables
Reproduces #23284

Currently skipped in release mode since it requires
the `short_tablet_stats_refresh_interval` interval.
Ref #24641

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-07-06 15:07:31 +03:00
Benny Halevy
2c0bafb934 token_metadata: clear_and_destroy_impl when destroyed
We have a lot of places in the code where
a token_metadata_ptr is kept in an automatic
variable and destroyed when it leaves the scope.
since it's a referenced counted lw_shared_ptr,
the token_metadata object is rarely destroyed in
those cases, but when it is, it doesn't go through
clear_gently, and in particular its tablet_metadata
is not cleared gently, leading to inefficient destruction
of potentially many foreign_ptr:s.

This patch calls clear_and_destroy_impl that gently
clears and destroys the impl object in the background
using the shared_token_metadata.

Fixes #13381

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-07-06 15:07:31 +03:00
Benny Halevy
2b2cfaba6e token_metadata: keep a reference to shared_token_metadata
To be used by a following patch to gently clean and destroy
the token_data_impl in the background.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-07-06 15:07:31 +03:00
Benny Halevy
e0a19b981a token_metadata: move make_token_metadata_ptr into shared_token_metadata class
So we can use the local shared_token_metadata instance
for safe background destroy of token_metadata_impl:s.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-07-06 14:22:20 +03:00
Benny Halevy
493a2303da replica: database: get and expose a mutable locator::shared_token_metadata
Prepare for next patch, the will use this shared_token_metadata
to make mutable_token_metadata_ptr:s

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-07-06 14:22:20 +03:00
Benny Halevy
3acca0aa63 locator: tablets: tablet_metadata: clear_gently: optimize foreign ptr destruction
Sort all tablet_map_ptr:s by shard_id
and then destroy them on each shard to prevent
long cross-shard task queues for foreign_ptr destructions.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-07-06 14:20:46 +03:00
Pavel Emelyanov
4d6385fc27 api: Remove unused get_json_return_type() templates
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#24837
2025-07-05 18:42:02 +03:00
Avi Kivity
33225b730d Merge 'Do not reference db::config by transport::server' from Pavel Emelyanov
The db::config is top-level configuration class that includes options for pretty much everything in Scylla. Instead of messing with this large thing, individual services have their own smaller configs, that are initialized with values from db::config. This PR makes it for transport::server (transport::controller will be next) and its cql_server_config. One bad thing not to step on is that updateable_value is not shard-safe (#7316), but the code in controller that creates cql_server_config is already taking care.

Closes scylladb/scylladb#24841

* github.com:scylladb/scylladb:
  transport: Stop using db::config by transport::server
  transport: Keep uninitialized_connections_semaphore_cpu_concurrency on cql_server_config
  transport: Move cql_duplicate_bind_variable_names_refer_to_same_variable to cql_server_config
  transport: Move max_concurrent_requests to struct config
  transport: Use cql_server_config::max_request_size
2025-07-05 18:39:01 +03:00
Pavel Emelyanov
9b178df7dd transport: Stop using db::config by transport::server
Now the server is self-contained in the way it is being configured by
the controller.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-07-04 15:40:20 +03:00
Pavel Emelyanov
e2c1484d8d transport: Keep uninitialized_connections_semaphore_cpu_concurrency on
cql_server_config

This also repeats previous patch for another updateable_value. The thing
here is that this config option is passed further to generic_server, but
not used by transport::server itslef.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-07-04 15:40:20 +03:00
Pavel Emelyanov
64ffe67cbd transport: Move cql_duplicate_bind_variable_names_refer_to_same_variable
to cql_server_config

Similarly to previous patch -- move yet another updateable_value to let
transport::server eventually stop messing with db::config.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-07-04 15:40:14 +03:00
Pavel Emelyanov
b6546ed5ff transport: Move max_concurrent_requests to struct config
This is updateable_value that's initialized from db::config named_value
to tackle its shard-unsafety. However, the cql_server_config is created
by controller using sharded_parameter() helper, so that is can be safely
passed to server.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-07-04 15:35:55 +03:00
Pavel Emelyanov
6075eca168 transport: Use cql_server_config::max_request_size
It's duplicated on config and the transport::server that aggregates the
config itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-07-04 15:34:53 +03:00
Andrzej Jackowski
55e542e52e test: audit: use automatic comparators in AuditEntry
Replace manual comparator implementations with generated comparators.
This simplifies future maintenance and ensures comparators
remain accurate when new fields are added.

Reorder fields in AuditEntry so the less-than comparator evaluates
the most significant fields first.
2025-07-04 13:08:29 +02:00
Andrzej Jackowski
d7711a5f3a test: audit: enable syslog audit tests
Several audit test issues were resolved in numerous commits of this
patch series. This commit enables the syslog audit tests, that should
finally pass.
2025-07-04 12:40:57 +02:00
Andrzej Jackowski
3ebc693e70 test: audit: sort new audit entries before comparing with expected ones
In some corner cases, the order of audit entries can change. For
instance, ScyllaDB is allowed to apply BATCH statements in an order
different from the order in which they are listed in the statement.
To prevent test failures in such cases, this commit sorts new
audit entries.

Additionally, it is possible that some of the audit entries won't be
received by the SYSLOG server immediately. To prevent test failures
in this scenario, waiting for the expected number of new audit entries
is added.
2025-07-04 12:40:57 +02:00
Andrzej Jackowski
436e86d96a test: audit: check audit logging from multiple nodes
Before this change, the `assert_audit_row_eq` check assumed that
audit logs were always generated by the same (first) node. However,
this assumption is invalid in a multi-node setup.

This commit modifies the check to just verify that one of the nodes
in the cluster generated the audit log.
2025-07-04 12:40:57 +02:00
Andrzej Jackowski
2fefa29de7 test: audit: generate unique uuid for each line in syslog audit
Audit to TABLE uses a time UUID as a clustering key, while audit to
SYSLOG simply appends new lines. As a result, having such a detailed
time UUID is unnecessary for SYSLOG. However, TABLE tests expect each
line to be unique, and a similar check is performed (and fails)
in SYSLOG tests.

This commit updates the test framework to generate a unique UUID for
each line in SYSLOG audit. This ensures the tests remain consistent
for both TABLE and SYSLOG audit.
2025-07-04 12:40:57 +02:00
Andrzej Jackowski
f85e738b11 test: audit: fix parsing of syslog messages
Before this commit, there were following issues with parsing of syslog
messages in audit tests:
 - `line_to_row()` function was never called
 - `line_to_row()` was not prepared for changes introduced in
    scylladb#23099 (i.e. key=value pairs)
 - `line_to_row()` didn't handle newlines in queries
 - `line_to_row()` didn't handle "\\" escaping in queries

 Due to the aforementioned issues, the syslog audit tests were failing.
 This commit fixes all of those issues, by parsing each audit syslog
 message using a regexp.
2025-07-04 12:40:51 +02:00
Pavel Emelyanov
4d4406c5bc Merge 'test.py: dtest: port next_gating tests from auth_test.py' from Evgeniy Naydanov
Copy `auth_test.py` from scylla-dtest test suite, remove all not next_gating tests from it, and make it works with `test.py`

As a part of the porting process, remove unused imports and markers, remove non-next_gating tests and tests marked with `required_features("!consistent-topology-changes")` marker.

Remove `test_permissions_caching` test because it's too flaky when running using test.py

Also, make few time execution optimizations:
  - remove redundant `time.sleep(10)`
  - use smaller timeouts for CQL sessions

Enable the test in `suite.yaml` (run in dev mode only.)

Additional modifications to test.py/dtest shim code:

- Modify ManagerClient.server_update_config() method to change multiple config options in one call in addition to one `key: value` pair.
- Implement the method using slightly modified `set_configuration_options()` method of `ScyllaCluster`.
- Copy generate_cluster_topology() function from tools/cluster_topology.py module.
- Add support for `bootstrap` parameter for `new_node()` function.
- Rework `wait_for_any_log()` function.

Closes scylladb/scylladb#24648

* github.com:scylladb/scylladb:
  test.py: dtest: make auth_test.py run using test.py
  test.py: dtest: rework wait_for_any_log()
  test.py: dtest: add support for bootstrap parameter for new_node
  test.py: dtest: add generate_cluster_topology() function
  test.py: dtest: add ScyllaNode.set_configuration_options() method
  test.py: pylib/manager_client: support batch config changes
  test.py: dtest: copy unmodified auth_test.py
  test.py: dtest: add missed markers to pytest.ini
2025-07-04 10:51:52 +03:00
Botond Dénes
258bf664ee scylla-gdb.py: sstable-summary: adjust for raw-tokens
01466be7b9 changed the summary entries, storing raw tokens in them,
instead of dht::token. Adjust the command so that it works with both
pre- and post- versions.
Also make it accept pointers to sstables as arguments, this is what
scylla sstables listing provides.

Closes scylladb/scylladb#24759
2025-07-04 10:44:25 +03:00
Patryk Jędrzejczak
8d925b5ab4 test: increase the default timeout of graceful shutdown
Multiple tests are currently flaky due to graceful shutdown
timing out when flushing tables takes more than a minute. We still
don't understand why flushing is sometimes so slow, but we suspect
it is an issue with new machines spider9 and spider11 that CI runs
on. All observed failures happened on these machines, and most of
them on spider9.

In this commit, we increase the timeout of graceful shutdown as
a temporary workaround to improve CI stability. When we get to
the bottom of the issue and fix it, we will revert this change.

Ref #12028

It's a temporary workaround to improve CI stability, we don't
have to backport it.

Closes scylladb/scylladb#24802
2025-07-04 10:43:38 +03:00
Avi Kivity
60f407bff4 storage_proxy: avoid large allocation when storing batch in system.batchlog
Currently, when computing the mutation to be stored in system.batchlog,
we go through data_value. In turn this goes through `bytes` type
(#24810), so it causes a large contiguous allocation if the batch is
large.

Fix by going through the more primitive, but less contiguous,
atomic_cell API.

Fixes #24809.

Closes scylladb/scylladb#24811
2025-07-04 10:43:05 +03:00
Avi Kivity
5cbeae7178 sstables: drop minimum_key(), maximum_key()
Not used.

Closes scylladb/scylladb#24825
2025-07-04 10:42:44 +03:00
Dawid Mędrek
a151944fa6 treewide: Replace __builtin_expect with (un)likely
C++20 introduced two new attributes--likely and unlikely--that
function as a built-in replacement for __builtin_expect implemented
in various compilers. Since it makes code easier to read and it's
an integral part of the language, there's no reason to not use it
instead.

Closes scylladb/scylladb#24786
2025-07-03 13:34:04 +03:00
dependabot[bot]
59cc496757 build(deps): bump sphinx-scylladb-theme from 1.8.6 to 1.8.7 in /docs
Bumps [sphinx-scylladb-theme](https://github.com/scylladb/sphinx-scylladb-theme) from 1.8.6 to 1.8.7.
- [Release notes](https://github.com/scylladb/sphinx-scylladb-theme/releases)
- [Commits](https://github.com/scylladb/sphinx-scylladb-theme/compare/1.8.6...1.8.7)

---
updated-dependencies:
- dependency-name: sphinx-scylladb-theme
  dependency-version: 1.8.7
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Closes scylladb/scylladb#24805
2025-07-03 12:04:24 +03:00
Gleb Natapov
ca7837550d topology coordinator: do not set request_type field for truncation command if topology_global_request_queue feature is not enabled yet
Old nodes do not expect global topology request names to be in
request_type field, so set it only if a cluster is fully upgraded
already.

Closes scylladb/scylladb#24731
2025-07-02 17:09:29 +02:00
Pavel Emelyanov
fa0077fb77 Merge 'S3 chunked download source bug fixes' from Ernest Zaslavsky
- Fix missing negation in the `if` in the background downloading fiber
- Add test to catch this case
- Improve the s3 proxy to inject errors if the same resource requested more than once
- Suppress client retry since retrying the same request when each produces multiple buffers may lead to the same data appear more than once in the buffer deque
- Inject exception from the test to simulate response callback failure in the middle

No need to backport anything since this class in not used yet

Closes scylladb/scylladb#24657

* github.com:scylladb/scylladb:
  s3_test: Add s3_client test for non-retryable error handling
  s3_test: Add trace logging for default_retry_strategy
  s3_client: Fix edge case when the range is exhausted
  s3_client: Fix indentation in try..catch block
  s3_client: Stop retries in chunked download source
  s3_client: Enhance test coverage for retry logic
  s3_client: Add test for Content-Range fix
  s3_client: Fix missing negation
  s3_client: Refine logging
  s3_client: Improve logging placement for current_range output
2025-07-02 14:45:10 +03:00
Patryk Jędrzejczak
fa982f5579 docs: handling-node-failures: fix typo
Replacing "from" is incorrect. The typo comes from recently
merged #24583.

Fixes #24732

Requires backport to 2025.2 since #24583 has been backported to 2025.2.

Closes scylladb/scylladb#24733
2025-07-02 12:22:01 +03:00
Konstantin Osipov
37fc4edeb5 test.py: add a way to provide pytest arguments via test.py
Now that we use a single pytest.ini for all tests, different
developer preferences collide. There should be an easy way to override
pytest.ini defaults from the command line.

Fixes https://github.com/scylladb/scylladb/issues/21800

Closes scylladb/scylladb#24573
2025-07-02 12:20:43 +03:00
Avi Kivity
dfaed80f55 Merge 'types: add byte-comparable format support for native cql3 types' from Lakshmi Narayanan Sreethar
This PR introduces a new `comparable_bytes` class to add byte-comparable format support for all the [native cql3 data types](https://opensource.docs.scylladb.com/stable/cql/types.html#native-types) except `counter` type as that is not comparable. The byte-comparable format is a pre-requisite for implementing the trie based index format for our sstables(https://github.com/scylladb/scylladb/issues/19191). This implementation adheres to the byte-comparable format specification in https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/bytecomparable/ByteComparable.md

Note that support for composite data types like lists, maps, and sets has not been implemented yet and will be made available in a separate PR.

Refs https://github.com/scylladb/scylladb/issues/19407

New feature - backport not required.

Closes scylladb/scylladb#23541

* github.com:scylladb/scylladb:
  types/comparable_bytes: add testcase to verify compatibility with cassandra
  types/comparable_bytes: support variable-length natively byte-ordered data types
  types/comparable_bytes: support decimal cql3 types
  types/comparable_bytes: introduce count_digits() method
  types/comparable_bytes: support uuid and timeuuid cql3 types
  types/comparable_bytes: support varint cql3 type
  types/comparable_bytes: support skipping sign byte write in decode_signed_long_type
  types/comparable_bytes: introduce encode/decode_varint_length
  types/comparable_bytes: support float and double cql3 types
  types/comparable_bytes: support date, time and timestamp cql3 types
  types/comparable_bytes: support bigint cql3 type
  types/comparable_bytes: support fixed length signed integers
  types/comparable_bytes: support boolean cql3 type
  types: introduce comparable_bytes class
  bytes_ostream: overload write() to support writing from FragmentedView
  docs: fix minor typo in docs/dev/cql3-type-mapping.md
2025-07-02 11:58:32 +03:00
Avi Kivity
1e0b015c8b Merge 'cql3: Represent create_statement using managed_bytes' from Dawid Mędrek
When describing a table, we need to do it carefully: if some
columns were dropped, we must specify that explicitly by

```
ALTER TABLE {table} DROP {column} USING TIMESTAMP ...
```

in the result of the DESCRIBE statement. Failing to do so
could lead to data resurrection.

However, if a table has been altered many, many times,
we might end up with a huge create statement. Constructing
it could, in turn, trigger an oversized allocation.
Some tests ran into that very problem in fact.

In this commit, we want to mitigate the problem: instead of
allocating a contiguous chunk of memory for the create
statement, we use `bytes_ostream` and `managed_bytes` to
possibly keep data scattered in memory. It makes handling
`cql3::description` less convenient in the code, but since
the struct is pretty much immediately serialized after
creating it, it's a very good trade-off.

A reproducer is intentionally not provided by this commit:
it's easy to test the change, but adding and dropping
a huge number of columns would take a really long amount
of time, so we need to omit it.

Fixes scylladb/scylladb#24018

Backport: all of the supported versions are affected, so we want to backport the changes there.

Closes scylladb/scylladb#24151

* github.com:scylladb/scylladb:
  cql3/description: Serialize only rvalues of description
  cql3: Represent create_statement using managed_string
  cql3/statements/describe_statement.cc: Don't copy descriptions
  cql3: Use managed_bytes instead of bytes in DESCRIBE
  utils/managed_string.hh: Introduce managed_string and fragmented_ostringstream
2025-07-01 21:59:38 +03:00