Rename to make it more explicit where the error injection happens.
Also change how the error is injected, use the lambda overload instead
of is_enabled(), the former leaves better trace in logs, which helps
when debugging tests.
For tests that contain multiple assert_that() invokations, identifying
the one that failed is very challenging. Add source location to fail
messages to allow convenient identification of the call-site.
To enable assertions on columns which are sometimes null.
One existing user of with_typed_column() needs adjustment, because the
previous version of with_typed_column() covered up silently for null
value, but after this patch this caused a failure.
Although the value of this item is indeed derived from the write timeout
config, the name doesn't reflect what it is used for. Change it to
reflect it better.
New batchlogs are written to the batchlog_v2 table and replay also uses
the v2 table.
The content of system.batchlog is attempted to be migrated to
system.batchlog_v2 after each start of the batchlog_manager service.
The migration is retried on each replay if it fails. This is reduntant
but simple.
Batchlog cleanup now doesn't involve flushing memtables, the only
remaining user of replica/database.hh is gone, so the include is
dropped.
Rearranges the system.batchlog schema as follows:
CREATE TABLE system.batchlog_v2 (
version int,
stage int,
shard int,
written_at timestamp,
id uuid,
data blob,
PRIMARY KEY ((version, stage, shard), written_at, id));
With the following goals:
1) Make post-replay batchlog cleanup possible with a simple
range-tombstone. This allows dropping the individual dead batchlog
entries, as they are shadowed by a higher level tombstone. This
enables dropping tombstones without tombstone GC.
2) To make the above possible, introduce the stage key component:
batchlog entries that fail the first replay attempt, are moved to the
failed_replay stage, so the initial stage can be cleaned up safely.
3) Spread out the data among Scylla shards, via the batchlog shard
column.
4) Make batchlog entries ordered by the batchlog create time (id). This
allows for selecting batchlogs to replay, without post-filtering of
batchlogs that are too young to be replayed.
Don't build batchlog delete mutations in storage-proxy code. Move this
code into db/batchlog_manager.cc, exposed via db/batchlog.hh.
This serves multiple goals:
1) Concentrates low-level batchlog related logic in
db/batchlog_manager.cc
2) Reduce current and future code duplication.
3) Make future changes to this logic easier.
Don't build batchlog mutations in storage-proxy code. Move this code
into db/batchlog_manager.cc, exposed via db/batchlog.hh.
This serves multiple goals:
1) Concentrates low-level batchlog related logic in
db/batchlog_manager.cc
2) Reduce current and future code duplication.
2) Make future changes to this logic easier.
Just skip the mutation(s) whose tables were dropped instead.
Use the newly introduced data_dictionary::table::get_truncation_time()
to avoid looking up real table object.
The map_reduce achieves no concurrency, both map and reduce are
synchronous. It only achieves two redundant lookups for the table and
hard-to-read code. Convert it into a simple loop. Preserve the
stall-protection by adding a maybe_yield() to the loop.
Add cleanup flag value to start message and drop cpu, it is redundant as
Scylla already adds the shard number to the logs.
Add all_replayed to finish message.
set worksteal disribution for xdist(new sheduler)
Because now it shows better tests distribution that standart(load) in CI
Closesscylladb/scylladb#27354
Add comprehensive coding guidelines for GitHub Copilot to improve
quality and consistency of AI-generated code. Instructions cover C++
and Python development with language-specific best practices, build
system usage, and testing workflows.
Following GitHub Copilot's standard layout with general instructions
in .github/copilot-instructions.md and language-specific files in
.github/instructions/ directory using *.instructions.md naming.
No backport: This change is only for developers in master, so it doesn't
need to be backported.
Closesscylladb/scylladb#25374
In this series we implement Alternator's support for gzip-compressed
requests, i.e., requests with the "Content-Encoding: gzip" header,
other uncompressed header, and a gzip-compressed body.
The server needs to verify the signature of the *compressed* content,
and then uncompress the body before running the request.
We only support gzip compression because this is what DynamoDB supports.
But in the future we can easily add support for other compression
algorithms like lz4 or zstd.
This series Refs #5041 but doesn't "Fixes" it because it only implements
compressed requests (Content-Encoding), *not* compressed responses
(Accept-Encoding).
In addition to the code changes, the series also contains tests for this
feature that make sure it behaves like DynamoDB.
Note that while we will have now support in our server for compressed
requests, just like DynamoDB does, the clients (AWS SDKs) will probably
NOT make use of it because they do not enable request compression by
default. For example, see the tests for some hoops one needs to jump
through in boto3 (the Python SDK) to send compressed requests. However,
we are hoping that in the future Alternator's modified clients will
use compressed requests and enjoy this feature.
Closesscylladb/scylladb#27080
* github.com:scylladb/scylladb:
test/alternator: enable, and add, tests for gzip'ed requests
alternator: implement gzip-compressed requests
seastar::compat::source_location (which should not have been used
outside Seastar) is replaced with std::source_location to avoid
deprecation warnings. The relevant header, which was removed, is no
longer included.
* seastar 8c3fba7a...b5c76d6b (3):
> testing: There can be only one memory_data_sink
> util: Use std::source_location directly
> Merge 'net: support proxy protocol v2' from Avi Kivity
apps: httpd: add --load-balancing-algorithm
apps: httpd: add /shard endpoint
test: socket_test: add proxy protocol v2 test suite
test: socket_test: test load balancer with proxy protocol
net: posix_connected_socket: specialize for proxied connections
net: posix_server_socket_impl: implement proxy protocol in server sockets
net: posix_server_socket_impl: adjust indentation
net: posix_server_socket_impl: avoid immediately-invoked lambda
net: conntrack: complete handle nested class special member functions
net: posix_server_socket_impl: coroutinize accept()
Closesscylladb/scylladb#27316
This patch increases the compatibility with DynamoDB Streams by integrating the DynamoDB's event type rules (described in https://github.com/scylladb/scylladb/issues/6918) into Alternator. The main changes are:
- introduce a new flag `alternator_streams_strict_compatibility`, meant as a guard of performance-intensive operations that increase the compatibility with DynamoDB Streams. If enabled, Alternator always performs a RBW before a data-modifying operation, and propagates its result to CDC. Then, the old item is compared to the new one, to determine the mutation type (INSERT vs MODIFY). This option is a no-op for tables with disabled Alternator Streams,
- reduce splitting of simple Alternator mutations,
- correctly distinguish event types described in #6918, except for item deletes. Deleting a missing item with DeleteItem, BatchWriteItem, or a missing field with UpdateItem still emit REMOVEs.
To summarize, the emitted events of the data manipulation operations should be as follows:
- DeleteItem/BatchWriteItem.DeleteItem of existing item: REMOVE (OK)
- DeleteItem of nonexistent item: nothing (OK)
- BatchWriteItem.DeleteItem of nonexistent item: nothing (OK)
- PutItem/UpdateItem/BatchWriteItem.PutItem of existing and not equal item: MODIFY (OK)
- PutItem/UpdateItem/BatchWriteItem.PutItem of existing and equal item: nothing (OK)
- PutItem/UpdateItem/BatchWriteItem.PutItem of nonexistent item: INSERT (OK)
No backport is necessary.
Refs https://github.com/scylladb/scylladb/pull/26149
Refs https://github.com/scylladb/scylladb/pull/26396
Refs https://github.com/scylladb/scylladb/issues/26382
Fixes https://github.com/scylladb/scylladb/issues/6918
Closes scylladb/scylladb#26121
* github.com:scylladb/scylladb:
test/alternator: Enable the tests failing because of #6918
alternator, cdc: Don't emit events for no-op removes
alternator, cdc: Don't emit an event for equal items
alternator/streams, cdc: Differentiate item replace and item update in CDC
alternator: Change the return type of rmw_operation_return
config: Add alternator_streams_strict_compatibility flag
cdc: Don't split a row marker away from row cells
Consider this:
1) n1 is the topology coordinator
2) n1 schedules and executes a tablet repair with session id s1 for a
tablet on n3 an n4.
3) n3 and n4 take and store the in _rs._repair_compaction_locks[s1]
4) n1 steps down before it executes
locator::tablet_transition_stage::end_repair
5) n2 becomes the new topology coordinator
6) n2 runs locator::tablet_transition_stage::repair again
7) n3 and n4 try to take the lock again and hangs since the lock is
already taken.
To avoid the deadlock, we can throw in step 7 so that n2 will
proceed to end_repair stage and release the lock. After that, the
scheduler could schedule the tablet repair request again.
Fixes#26346Closesscylladb/scylladb#27163
Fix unlikely use-after-free in `encode_paging_state`. The function
incorrectly assumes that current position to encode will always have
data for all clustering columns the schema defines. It's possible to
encounter current position having less than all columns specified, for
eample in case of range tombstone. Those don't happen in Alternator
tables as DynamoDB doesn't allow range deletions and clustering key
might be of size at most 1. Alternator api can be used to read
scylla system tables and those do have range tombstones with more
than single clustering column.
The fix is to stop trying to encode columns, that don't have the value -
they are not needed anyway, as there's no possible position with those
values (range tombstone made sure of that).
Fixes#27001Fixes#27125Closesscylladb/scylladb#26960
Previously, `wait_until_driver_service_level_created` only waited for
the `driver` service level to appear in the output of
`LIST ALL SERVICE_LEVELS`. However, the fact that one node lists
`sl:driver` does not necessarily mean that all other nodes can see
it yet. This caused sporadic test failures, especially in DEBUG builds.
To prevent these failures, this change adds an extra wait for
a `raft/read_barrier` after the `driver` service level first appears.
This ensures the service level is globally visible across the cluster.
Fixes: https://github.com/scylladb/scylladb/issues/27019
Na backport - test fix for `sl:driver` tests, and this that is only available on `master`
Closesscylladb/scylladb#27076
* github.com:scylladb/scylladb:
test: wait for read_barrier in wait_until_driver_service_level_created
test: use ManagerClient in wait_until_driver_service_level_created
Before this commit, any attempt to create, alter, attach, or drop
the default service level would result in a syntax error whose
error message was unclear:
```
cqlsh> attach service level default to cassandra;
SyntaxException: line 1:21 no viable alternative at input 'default'
```
The error stems from the grammar not being able to parse `default`
as a correct service level name. To fix that, we cover it manually.
This way, the grammar accepts it and we can process it in Scylla.
The reason why we'd like to cover the default service level is that
it's an actual service level that the user should reference. Getting
a syntax error is not what should happen. Hence this fix.
We validate the input and if the given role is really the default
service level, we reject the query and provide an informative error
message.
Two validation tests are provided.
Fixesscylladb/scylladb#26699Closesscylladb/scylladb#27162
When we come across a segment truncation, this information may
be helpful to determine when the error occurred exactly and
hint at what code path might've led to it.
Closesscylladb/scylladb#27207
Fixes#24346
When reading, we check for each entry and each chunk, if advancing there
will hit EOF of the segment. However, IFF the last chunk being read has
the last entry _exactly_ matching the chunk size, and the chunk ending
at _exactly_ segment size (preset size, typically 32Mb), we did not check
the position, and instead complained about not being able to read.
This has literally _never_ happened in actual commitlog (that was replayed
at least), but has apparently happened more and more in hints replay.
Fix is simple, just check the file position against size when advancing
said position, i.e. when reading (skipping already does).
v2:
* Added unit test
Closesscylladb/scylladb#27236
In several exception handlers, only raft::request_aborted was being
caught and rethrown, while seastar::abort_requested_exception was
falling through to the generic catch(...) block. This caused the
exception to be incorrectly treated as a failure that triggers
rollback, instead of being recognized as an abort signal.
For example, during tablet draining, the error log showed:
"tablets draining failed with seastar::abort_requested_exception
(abort requested). Aborting the topology operation"
This change adds seastar::abort_requested_exception handling
alongside raft::request_aborted in all places where it was missing.
When rethrown, these exceptions propagate up to the main run() loop
where handle_topology_coordinator_error() recognizes them as normal
abort signals and allows the coordinator to exit gracefully without
triggering unnecessary rollback operations.
Fixes: scylladb/scylladb#27255
No backport: The problem was only seen in tests and not reported in
customer tickets, so it's enough to fix it in the main branch.
Closesscylladb/scylladb#27314
The tablet scheduler should not emit conflicting migrations for the same
tablet. This was addressed initially in scylladb/scylladb#26038 but the
check is missing in the merge colocation plan, so add it there as well.
Without this check, the merge colocation plan could generate a
conflicting migration for a tablet that is already scheduled for
migration, as the test demonstrates.
This can cause correctness problems, because if the load balancer
generates two migrations for a single tablet, both will be written as
mutations, and the resulting mutation could contain mixed cells from
both migrations.
Fixesscylladb/scylladb#27304Closesscylladb/scylladb#27312
And switch to std::source_location.
Upcoming seastar update will deprecate its compatibility layer.
The patch is
for f in $(git grep -l 'seastar::compat::source_location'); do
sed -e 's/seastar::compat::source_location/std::source_location/g' -i $f;
done
and removal of few header includes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#27309
`git log --format` doesn't add a newline after the last line. This
causes `read` to ignore that line, losing the last line (corresponding
to the first commit).
Use `git log --tformat` instead, which terminates the last line.
Closesscylladb/scylladb#27317
The reason for this seastar update is to have the fixed handling
of the `integer` type in `seastar-json2code` because it's needed
for further development of ScyllaDB REST API.
The following changes were introduced to ScyllaDB code to ensure it
compiles with the updated seastar:
- Remove `seastar/util/modules.hh` includes as the file was removed
from seastar
- Modified `metrics::impl::labels_type` construction in
`test/boost/group0_test.cc` because now it requires `escaped_string`
* seastar 340e14a7...8c3fba7a (32):
> Merge 'Remove net::packet usage from dns.cc' from Pavel Emelyanov
dns: Optimize packet sending for newer c-ares versions
dns: Replace net::packet with vector<temporary_buffer>
dns: Remove unused local variable
dns: Remove pointless for () loop wrapping
dns: Introduce do_sendv_tcp() method
dns: Introduce do_send_udp() method
> test: Add http rules test of matching order
> Merge 'Generalize packet_data_source into memory_data_source' from Pavel Emelyanov
memcached: Patch test to use memory_data_source
memcached: Use memory_data_source in server
rpc: Use memory_data_sink without constructing net::packet
util: Generalize packet_data_source into memory_data_source
> tests: coroutines: restore "explicit this" tests
> reactor: remove blocking of SIGILL
> Merge 'Update compilers in GH actions scripts' from Pavel Emelyanov
github: Use gcc-14
github: Use clang-20
> Merge 'Reinforce DNS reverse resolution test ' from Pavel Emelyanov
test: Make test_resolve() try several addresses
test: Coroutinize test_resolve() helper
> modules: make module support standards-compliant
> Merge 'Fix incorrect union access in dns resolver' from Pavel Emelyanov
dns: Squash two if blocks together
dns: Do not check tcp entry for udp type
> coroutine: Fix compilation of execute_involving_handle_destruction_in_await_suspend
> promise: Document that promise is resolved at most once
> coroutine: exception: workaround broken destroy coroutine handle in await_suspend
> socket: Return unspecified socket_address for unconnected socket
> smp: Fix exception safety of invoke_on_... internal copying
> Merge 'Improve loads evaluation by reactor' from Pavel Emelyanov
reactor: Keep loads timer on reactor
reactor: Update loads evaluation loop
> Merge 'scripts: add 'integer' type to seastar-json2code' from Andrzej Jackowski
test: extend tests/unit/api.json to use 'integer' type
scripts: add 'integer' type to seastar-json2code
> Merge 'Sanitize tls::session::do_put(_one)? overloads' from Pavel Emelyanov
tls: Rename do_put_one(temporary_buffer) into do_put()
tls: Fix indentation after previous patch
tls: Move semaphore grab into iterating do_put()
> net: tcp: change unsent queue from packets to temporary_buffer:s
> timer: Enable highres timer based on next timeout value
> rpc: Add a new constructor in closed_error to accept string argument
> memcache: Implement own data sink for responses
> Merge 'file: recursive_remove_directory: general cleanup' from Avi Kivity
file: do_recursive_remove_directory(): move object when popping from queue
file: do_recursive_remove_directory(): adjust indentation
file: do_recursive_remove_directory(): coroutinize
file: do_recursive_remove_directory(): simplify conditional
file: do_recursive_remove_directory(): remove wrong const
file: do_recursive_remove_directory(): clean up work_entry
> tests: Move thread_context_switch_test into perf/
> test: Add unit test for append_challenged_posix_file
> Merge 'Prometheus metrics handler optimization' from Travis Downs
prometheus: optimize metrics aggregation
prometheus: move and test aggregate_by helper
prometheus: various optimizations
metrics: introduce escaped_string for label values
metric:value: implement + in terms of +=
tests: add prometheus text format acceptance tests
extract memory_data_sink.hh
metrics_perf: enhance metrics bench
> demos: Simplify udp_zero_copy_demo's way of preparing the packet
> metrics: Remove deprecated make_...-ers
> Merge 'Make slab_test be BOOST kind' from Pavel Emelyanov
test: Use BOOST_REQUIRE checkers
test: Replace some SEASTAR_ASSERT-s with static_assert-s
test: Convert slab test into boost kind
> Merge 'Coroutinize lister_test' from Pavel Emelyanov
test: Fix indentation after previuous patch
test: Coroutinize lister_test lister::report() method
test: Coroutinize lister_test main code
> file: recursive_remove_directory(): use a list instead of a deque
> Merge 'Stop using packets in tls data_sink and session' from Pavel Emelyanov
tls: Stop using net::packet in session::put()
tls: Fix indentation after previous patch
tls: Split session::do_put()
tls: Mark some session methods private
Closesscylladb/scylladb#27240
After in the previous patch we implemented support in Alternator for
gzip-compressed requests ("Content-Encoding: gzip"), here we enable
an existing xfail-ing test for this feature, and also add more tests
for more cases:
* A test for longer compressed requests, or a short compressed
request which expands to a longer request. Since the decompression
uses small buffers, this test reaches additional code paths.
* Check for various cases of a malformed gzip'ed request, and also
an attempt to use an unsupported Content-Encoding. DynamoDB
returns error 500 for both cases, so we want to test that we
do to - and not silently ignore such errors.
* Check that two concatenated gzip'ed streams is a valid request,
and check that garbage at the end of the gzip - or a missing
character at the end of the gzip - is recognized as an error.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
We currently ignore the `_excluded` field in `node::clone()` and the verbose
formatter of `locator::node`. The first one is a bug that can have
unpredictable consequences on the system. The second one can be a minor
inconvenience during debugging.
We fix both places in this PR.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-72
This PR is a bugfix that should be backported to all supported branches.
Closesscylladb/scylladb#27265
* github.com:scylladb/scylladb:
locator/node: include _excluded in verbose formatter
locator/node: preserve _excluded in clone()
Otherwise, in a mixed cluster, the handle_tablet_resize_finalization
would fail because of the unknown rpc verb.
Fixes#26309Closesscylladb/scylladb#27218
We currently ignore the `_excluded` field in `clone()`. Losing
information about exclusion can have unpredictable consequences. One
observed effect (that led to finding this issue) is that the
`/storage_service/nodes/excluded` API endpoint sometimes misses excluded
nodes.
Commit 6e4803a750 broke notification about expired erms held for too long since it resets the tracker without calling its destructor (where notification is triggered). Fix the assign operator to call the destructor like it should.
Fixes https://github.com/scylladb/scylladb/issues/27141Closesscylladb/scylladb#27140
* https://github.com/scylladb/scylladb:
test: test that expired erm that held for too long triggers notification
token_metadata: fix notification about expiring erm held for to long
This patch enforces that vector indexes can only be created on keyspaces
that use tablets. During index validation, `check_uses_tablets()` verifies
the base keyspace configuration and rejects creation otherwise.
To support this, the `custom_index::validate()` API now receives a
`const data_dictionary::database&` parameter, allowing index
implementations to access keyspace-level settings during DDL validation.
Fixes https://scylladb.atlassian.net/browse/VECTOR-322Closesscylladb/scylladb#26786
A new feature was announced this week for Amazon DynamoDB, "multi-
attribute composite keys in global secondary indexes", which allows to
create GSIs with composite keys (multiple columns). This feature already
existed in CQL's materialized views, but didn't exist in DynamoDB until
now.
So this patch adds a paragraph to our docs/alternator/compatibility.md
mentioning that we don't support this DynamoDB feature yet.
See also issue #27182 which we opened to track this unimplemented
feature.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#27183