If an sstable unit test case is run individually, it fails with an
exception saying that the S3_... environment is not set. It's better to
skip the test case rather than fail. Anyone who wants to run it from a
shell will have to prepare an S3 server (minio/AWS public bucket) and
provide the proper environment for the test case.
refs: #13569
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes #13755
Currently there are only 2 tests for S3 -- the pure client test and the compound object_store test that launches scylla, creates an s3-backed table and CQL-queries it. At the same time there's a whole lot of small unit tests for sstables functionality, some of which can run over S3 storage too.
This PR adds this support and patches several test cases to use it. More test cases are to come later on demand.
fixes: #13015
Closes #13569
* github.com:scylladb/scylladb:
test: Make resharding test run over s3 too
test: Add lambda to fetch bloom filter size
test: Tune resharding test use of sstable::test_env
test: Make datafile test case run over s3 too
test: Propagate storage options to table_for_test
test: Add support for s3 storage_options in config
test: Outline sstables::test_env::do_with_async()
test: Keep storage options on sstable_test_env config
sstables: Add and call storage::destroy()
sstables: Coroutinize sstable::destroy()
The evictable reader must ensure that each buffer fill makes forward
progress, i.e. the last fragment in the buffer has a position larger
than the last fragment from the last buffer-fill. Otherwise, the reader
could get stuck in an infinite loop between buffer fills, if the reader
is evicted in-between.
The code guaranteeing this forward progress has a bug: when the next
expected position is a partition-start (another partition), the code
would loop forever, effectively reading all there is from the underlying
reader.
To avoid this, add a special case to ignore the progress guarantee loop
altogether when the next expected position is a partition start. In this
case, progress is guaranteed anyway, because there is exactly one
partition-start fragment in each partition.
Fixes: #13491
Closes #13563
This mini-series cleans up printing of ranges in utils/to_string.hh
It generalizes the helper function to work on a std::ranges::range,
with some exceptions, and adds a helper for boost::transformed_range.
It also changes the internal interface by moving `join` to the utils namespace
and using std::string rather than seastar::sstring.
Additional unit tests were added to test/boost/json_test
Fixes #13146
Closes #13159
* github.com:scylladb/scylladb:
utils: to_string: get rid of utils::join
utils: to_string: get rid of to_string(std::initializer_list)
utils: to_string: get rid of to_string(const Range&)
utils: to_string: generalize range helpers
test: add string_format_test
utils: chunked_vector: add std::ranges::range ctor
Now when the test case and used lib/utils code is using storage-agnostic
approach, it can be extended to run over S3 storage as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The resharding test compares bloom filter sizes before and after reshard
runs. For that it gets the filter on-disk filename and stat()s it. That
won't work with S3 as it has no accessible on-disk files.
Some time ago there existed the storage::get_stats() method, but now
it's gone. The new s3::client::get_object_stat() is coming, but it will
take time to switch to it. For now, generalize filter size fetching into
a local lambda. Next patch will make a stub in it for S3 case, and once
the get_object_stat() is there we'll be able to smoothly start using it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test case in question spawns an async context, then makes the test_env
instance on the stack (and a stopper for it too). There's a helper for the
above steps, better to use it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Most of the sstable_datafile test cases are capable of running with S3
storage, so this patch makes the simplest of them do it. Patching the
rest from this file is optional, because mostly the cases test how the
datafile data manipulations work without checking the files
manipulations. So even if making them all run over S3 is possible, it
will just increase the testing time w/o real test of the storage driver.
So this patch makes one test case run over local and S3 storages, more
patches to update more test cases with files manipulations are yet to
come.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In the previous patch we added a limit in Alternator for the magnitude
and precision of numbers, based on a function get_magnitude_and_precision
whose implementation was, unfortunately, rather elaborate and delicate.
Although we did add in the previous patches some end-to-end tests which
confirmed that the final decision made based on this function, to accept or
reject numbers, was a correct decision in a few cases, such an elaborate
function deserves a separate unit test for checking just that function
in isolation. In fact, this unit test uncovered some bugs in the first
implementation of get_magnitude_and_precision() which the other tests
missed.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
It's unused.
Just in case, add a unit test case for using the fmt library to
format it (that includes fmt::to_string(std::initializer_list)).
Note that the existing to_string implementation
used square brackets to enclose the initializer_list
but the new, standardized form uses curly braces.
This doesn't break anything since to_string(initializer_list)
wasn't used.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
As seen in https://github.com/scylladb/scylladb/issues/13146
the current implementation is not general enough
to provide print helpers for all kinds of containers.
Modernize the implementation using templates based
on std::ranges::range and using fmt::join.
Extend unit test for formatting different types of ranges,
boost::transformed ranges, deque.
Fixes #13146
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
No point in going through the vector<mutation> entry-point
just to discover in run time that it was called
with a single-element vector, when we know that
in advance.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes #13733
Let's remove `expr::token` and replace all of its functionality with `expr::function_call`.
`expr::token` is a struct whose job is to represent a partition key token.
The idea is that when the user types in `token(p1, p2) < 1234`, this will be internally represented as an expression which uses `expr::token` to represent the `token(p1, p2)` part.
The situation with `expr::token` is a bit complicated.
On the one hand it's supposed to represent the partition token, but sometimes it's also assumed that it can represent a generic call to the `token()` function, for example `token(1, 2, 3)` could be a `function_call`, but it could also be `expr::token`.
The query planning code assumes that each occurrence of expr::token
represents the partition token without checking the arguments.
Because of this allowing `token(1, 2, 3)` to be represented as `expr::token` is dangerous - the query planning might think that it is `token(p1, p2, p3)` and plan the query based on this, which would be wrong.
Currently `expr::token` is created only in one specific case.
When the parser detects that the user typed in a restriction which has a call to `token` on the LHS it generates `expr::token`.
In all other cases it generates an `expr::function_call`.
Even when the `function_call` represents a valid partition token, it stays a `function_call`. During preparation there is no check to see if a `function_call` to `token` could be turned into `expr::token`. This is a bit inconsistent - sometimes `token(p1, p2, p3)` is represented as `expr::token` and the query planner handles that, but sometimes it might be represented as `function_call`, which the query planner doesn't handle.
There is also a problem because there's a lot of code duplication between a `function_call` and `expr::token`.
All of the evaluation and preparation is the same for `expr::token` as it's for a `function_call` to the token function.
Currently it's impossible to evaluate `expr::token` and preparation has some flaws, but implementing it would basically consist of copy-pasting the corresponding code from token `function_call`.
One more aspect is multi-table queries.
With `expr::token` we turn a call to the `token()` function into a struct that is schema-specific.
What happens when a single expression is used to make queries to multiple tables? The schema is different, so something that is represented as `expr::token` for one schema would be represented as `function_call` in the context of a different schema.
Translating expressions to different tables would require careful manipulation to convert `expr::token` to `function_call` and vice versa. This could cause trouble for index queries.
Overall I think it would be best to remove `expr::token`.
Although having a clear marker for the partition token is sometimes nice for query planning, in my opinion the pros are outweighed by the cons.
I'm a big fan of having a single way to represent things, having two separate representations of the same thing without clear boundaries between them causes trouble.
Instead of having both `expr::token` and `function_call` we can just have the `function_call` and check if it represents a partition token when needed.
Refs: #12906
Refs: #12677
Closes: #12905
Closes #13480
* github.com:scylladb/scylladb:
cql3: remove expr::token
cql3: keep a schema in visitor for extract_clustering_prefix_restrictions
cql3: keep a schema inside the visitor for extract_partition_range
cql3/prepare_expr: make get_lhs_receiver handle any function_call
cql3/expr: properly print token function_call
expr_test: use unresolved_identifier when creating token
cql3/expr: split possible_lhs_values into column and token variants
cql3/expr: fix error message in possible_lhs_values
cql3: expr: reimplement is_satisfied_by() in terms of evaluate()
cql3/expr: add a schema argument to expr::replace_token
cql3/expr: add a comment for expr::has_partition_token
cql3/expr: add a schema argument to expr::has_token
cql3: use statement_restrictions::has_token_restrictions() wherever possible
cql3/expr: add expr::is_partition_token_for_schema
cql3/expr: add expr::is_token_function
cql3/expr: implement preparing function_call without a receiver
cql3/functions: make column family argument optional in functions::get
cql3/expr: make it possible to prepare expr::constant
cql3/expr: implement test_assignment for column_value
cql3/expr: implement test_assignment for expr::constant
Let's remove expr::token and replace all of its functionality with expr::function_call.
expr::token is a struct whose job is to represent a partition key token.
The idea is that when the user types in `token(p1, p2) < 1234`,
this will be internally represented as an expression which uses
expr::token to represent the `token(p1, p2)` part.
The situation with expr::token is a bit complicated.
On the one hand it's supposed to represent the partition token,
but sometimes it's also assumed that it can represent a generic
call to the token() function, for example `token(1, 2, 3)` could
be a function_call, but it could also be expr::token.
The query planning code assumes that each occurrence of expr::token
represents the partition token without checking the arguments.
Because of this allowing `token(1, 2, 3)` to be represented
as expr::token is dangerous - the query planning
might think that it is `token(p1, p2, p3)` and plan the query
based on this, which would be wrong.
Currently expr::token is created only in one specific case.
When the parser detects that the user typed in a restriction
which has a call to `token` on the LHS it generates expr::token.
In all other cases it generates an `expr::function_call`.
Even when the `function_call` represents a valid partition token,
it stays a `function_call`. During preparation there is no check
to see if a `function_call` to `token` could be turned into `expr::token`.
This is a bit inconsistent - sometimes `token(p1, p2, p3)` is represented
as `expr::token` and the query planner handles that, but sometimes it might
be represented as `function_call`, which the query planner doesn't handle.
There is also a problem because there's a lot of duplication
between a `function_call` and `expr::token`. All of the evaluation
and preparation is the same for `expr::token` as it's for a `function_call`
to the token function. Currently it's impossible to evaluate `expr::token`
and preparation has some flaws, but implementing it would basically
consist of copy-pasting the corresponding code from token `function_call`.
One more aspect is multi-table queries. With `expr::token` we turn
a call to the `token()` function into a struct that is schema-specific.
What happens when a single expression is used to make queries to multiple
tables? The schema is different, so something that is represented
as `expr::token` for one schema would be represented as `function_call`
in the context of a different schema.
Translating expressions to different tables would require careful
manipulation to convert `expr::token` to `function_call` and vice versa.
This could cause trouble for index queries.
Overall I think it would be best to remove expr::token.
Although having a clear marker for the partition token
is sometimes nice for query planning, in my opinion
the pros are outweighed by the cons.
I'm a big fan of having a single way to represent things,
having two separate representations of the same thing
without clear boundaries between them causes trouble.
Instead of having expr::token and function_call we can
just have the function_call and check if it represents
a partition token when needed.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
One test for expr::token uses raw column identifier
in the test.
Let's change it to unresolved_identifier, which is
a standard representation of unresolved column
names in expressions.
Once expr::token is removed it will be possible
to create a function_call with unresolved_identifiers
as arguments.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function to check whether the expression
represents a partition token - that is a call
to the token function with consecutive partition
key columns as the arguments.
For example for `token(p1, p2, p3)` this function
would return `true`, but for `token(1, 2, 3)` or `token(p3, p2, p1)`
the result would be `false`.
The function has a schema argument because a schema is required
to get the list of partition columns that should be passed as
arguments to token().
Maybe it would be possible to infer the schema from the information
given earlier during prepare_expression, but it would be complicated
and a bit dangerous to do this. Sometimes we operate on multiple tables
and the schema is needed to differentiate between them - a token() call
can represent the base table's partition token, but for an index table
this is just a normal function call, not the partition token.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
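The check described above can be sketched with simplified stand-in types. Everything below is hypothetical (the structs, their fields and the function bodies are not the real cql3 definitions), and it assumes that a partition token must list all partition key columns in schema order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Simplified stand-ins for the real cql3 types; all names are hypothetical.
struct column_value { std::string name; };   // reference to a column
struct constant { int value; };              // literal argument
using argument = std::variant<column_value, constant>;
struct function_call { std::string name; std::vector<argument> args; };
struct schema { std::vector<std::string> partition_key_columns; };

// True for any call to token(), whatever its arguments are.
bool is_token_function(const function_call& call) {
    return call.name == "token";
}

// True only if the call is token(<all partition key columns, in order>).
bool is_partition_token_for_schema(const function_call& call, const schema& s) {
    if (!is_token_function(call)) {
        return false;
    }
    if (call.args.size() != s.partition_key_columns.size()) {
        return false;
    }
    for (std::size_t i = 0; i < call.args.size(); ++i) {
        const auto* col = std::get_if<column_value>(&call.args[i]);
        if (col == nullptr || col->name != s.partition_key_columns[i]) {
            return false;
        }
    }
    return true;
}
```

Passing the schema explicitly is what lets the same expression be a partition token for the base table and a plain function call for an index table.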
Add a function that can be used to check
whether a given expression represents a call
to the token() function.
Note that a call to token() doesn't mean
that the expression represents a partition
token - it could be something like token(1, 2, 3),
just a normal function_call.
The code for checking has been taken from functions::get.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Currently trying to do prepare_expression(function_call)
with a nullptr receiver fails.
It should be possible to prepare function calls without
a known receiver.
When the user types in: `token(1, 2, 3)`
the code should be able to figure out that
they are looking for a function with name `token`,
which takes 3 integers as arguments.
In order to support that we need to prepare
all arguments that can be prepared before
attempting to find a function.
Prepared expressions have a known type,
which helps to find the right function
for the given arguments.
Additionally the current code for finding
a function requires all arguments to be
assignment_testable, which requires preparing
some expression types, e.g. column_values.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
try_prepare_expression(constant) used to throw an error
when trying to prepare expr::constant.
It would be useful to be able to do this
and it's not hard to implement.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
This PR introduces an experimental feature called "tablets". Tablets are
a way to distribute data in the cluster, which is an alternative to the
current vnode-based replication. Vnode-based replication strategy tries
to evenly distribute the global token space shared by all tables among
nodes and shards. With tablets, the aim is to start from a different
side. Divide resources of replica-shard into tablets, with a goal of
having a fixed target tablet size, and then assign those tablets to
serve fragments of tables (also called tablets). This will allow us to
balance the load in a more flexible manner, by moving individual tablets
around. Also, unlike with vnode ranges, tablet replicas live on a
particular shard on a given node, which will allow us to bind raft
groups to tablets. Those goals are not yet achieved with this PR, but it
lays the ground for this.
Things achieved in this PR:
- You can start a cluster and create a keyspace whose tables will use
tablet-based replication. This is done by setting `initial_tablets`
option:
```
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy',
'replication_factor': 3,
'initial_tablets': 8};
```
All tables created in such a keyspace will be tablet-based.
Tablet-based replication is a trait, not a separate replication
strategy. Tablets don't change the spirit of replication strategy, they
just alter the way in which data ownership is managed. In theory, we
could use it for other strategies as well like
EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy
is augmented to support tablets.
- You can create and drop tablet-based tables (no DDL language changes)
- DML / DQL work with tablet-based tables
Replicas for tablet-based tables are chosen from tablet metadata
instead of token metadata
Things which are not yet implemented:
- handling of views, indexes, CDC created on tablet-based tables
- sharding is done using the old method, it ignores the shard allocated in tablet metadata
- node operations (topology changes, repair, rebuild) are not handling tablet-based tables
- not integrated with compaction groups
- tablet allocator piggy-backs on tokens to choose replicas.
Eventually we want to allocate based on current load, not statically
Closes #13387
* github.com:scylladb/scylladb:
test: topology: Introduce test_tablets.py
raft: Introduce 'raft_server_force_snapshot' error injection
locator: network_topology_strategy: Support tablet replication
service: Introduce tablet_allocator
locator: Introduce tablet_aware_replication_strategy
locator: Extract maybe_remove_node_being_replaced()
dht: token_metadata: Introduce get_my_id()
migration_manager: Send tablet metadata as part of schema pull
storage_service: Load tablet metadata when reloading topology state
storage_service: Load tablet metadata on boot and from group0 changes
db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata()
migration_notifier: Introduce before_drop_keyspace()
migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
test: perf: Introduce perf-tablets
test: Introduce tablets_test
test: lib: Do not override table id in create_table()
utils, tablets: Introduce external_memory_usage()
db: tablets: Add printers
db: tablets: Add persistence layer
dht: Use last_token_of_compaction_group() in split_token_range_msb()
locator: Introduce tablet_metadata
dht: Introduce first_token()
dht: Introduce next_token()
storage_proxy: Improve trace-level logging
locator: token_metadata: Fix confusing comment on ring_range()
dht, storage_proxy: Abstract token space splitting
Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries"
db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms()
db: Introduce get_non_local_vnode_based_strategy_keyspaces()
service: storage_proxy: Avoid copying keyspace name in write handler
locator: Introduce per-table replication strategy
treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type
locator: Introduce effective_replication_map
locator: Rename effective_replication_map to vnode_effective_replication_map
locator: effective_replication_map: Abstract get_pending_endpoints()
db: Propagate feature_service to abstract_replication_strategy::validate_options()
db: config: Introduce experimental "TABLETS" feature
db: Log replication strategy for debugging purposes
db: Log full exception on error in do_parse_schema_tables()
db: keyspace: Remove non-const replication strategy getter
config: Reformat
in C++20, the compiler generates operator!=() if the corresponding
operator==() is already defined, the language now understands
that the comparison is symmetric in the new standard.
fortunately, our operator!=() is always equivalent to
`! operator==()`, this matches the behavior of the default
generated operator!=(). so, in this change, all `operator!=`
are removed.
in addition to the defaulted operator!=, C++20 also brings to us
the defaulted operator==() -- it is able to generate the
operator==() as the member-wise lexicographical comparison.
under some circumstances, this is exactly what we need. so,
in this change, if the operator==() is also implemented as
a lexicographical comparison of all member variables of the
class/struct in question, it is implemented using the default
generated one by removing its body and marking the function as
`default`. moreover, if the class happens to have other comparison
operators which are implemented using lexicographical comparison,
the default generated `operator<=>` is used in place of
the defaulted `operator==`.
sometimes, we fail to mark the operator== with the `const`
specifier, in this change, to fulfil the need of C++ standard,
and to be more correct, the `const` specifier is added.
also, to generate the defaulted operator==, the operand should
be `const class_name&`, but it is not always the case, in the
class of `version`, we use `version` as the parameter type, to
fulfill the need of the C++ standard, the parameter type is
changed to `const version&` instead. this does not change
the semantic of the comparison operator. and is a more idiomatic
way to pass non-trivial struct as function parameters.
please note, because in C++20, both operator== and operator<=> are
symmetric, some of the operators in `multiprecision` are removed.
they are the symmetric form of another variant. if they were
not removed, the compiler would, for instance, find ambiguous
overloaded operator '=='.
this change is a cleanup to modernize the code base with C++20
features.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13687
Common compression libraries work on contiguous buffers.
Contiguous buffers are a problem for the allocator. However, as long as they are short-lived,
we can avoid the expensive allocations by reusing buffers across tasks.
This idea is already applied to the compression of CQL frames, but with some deficiencies.
`utils: redesign reusable_buffer` attempts to improve upon it in a few ways. See its commit message for an extended discussion.
Compression buffer reuse also happens in the zstd SSTable compressor, but the implementation is misguided. Every `zstd_processor` instance reuses a buffer, but each instance has its own buffer. This is very bad, because a healthy database might have thousands of concurrent instances (because there is one for each sstable reader). Together, the buffers might require gigabytes of memory, and the reuse actually *increases* memory pressure significantly, instead of reducing it.
`zstd: share buffers between compressor instances` aims to improve that by letting a single buffer be shared across all instances on a shard.
Closes #13324
* github.com:scylladb/scylladb:
zstd: share buffers between compressor instances
utils: redesign reusable_buffer
Large contiguous buffers put large pressure on the allocator
and are a common source of reactor stalls. Therefore, Scylla avoids
their use, replacing it with fragmented buffers whenever possible.
However, the use of large contiguous buffers is impossible to avoid
when dealing with some external libraries (e.g. some compression
libraries, like LZ4).
Fortunately, calls to external libraries are synchronous, so we can
minimize the allocator impact by reusing a single buffer between calls.
An implementation of such a reusable buffer has two conflicting goals:
to allocate as rarely as possible, and to waste as little memory as
possible. The bigger the buffer, the more likely that it will be able
to handle future requests without reallocation, but also the more
memory it ties up.
If request sizes are repetitive, the near-optimal solution is to
simply resize the buffer up to match the biggest seen request,
and never resize down.
However, if we anticipate pathologically large requests, which are
caused by an application/configuration bug and are never repeated
again after they are fixed, we might want to resize down after such
pathological requests stop, so that the memory they took isn't tied
up forever.
The current implementation of reusable buffers handles this by
resizing down to 0 every 100'000 requests.
This patch attempts to solve a few shortcomings of the current
implementation.
1. Resizing to 0 is too aggressive. During regular operation, we will
surely need to resize it back to the previous size again. If something
is allocated in the hole left by the old buffer, this might cause
a stall. We prefer to resize down only after pathological requests.
2. When resizing, the current implementation allocates the new buffer
before freeing the old one. This increases allocator pressure for no
reason.
3. When resizing up, the buffer is resized to exactly the requested
size. That is, if the current size is 1MiB, following requests
of 1MiB+1B and 1MiB+2B will both cause a resize.
It's preferable to limit the set of possible sizes so that every
reset doesn't tend to cause multiple resizes of almost the same size.
The natural set of sizes is powers of 2, because that's what the
underlying buddy allocator uses. No waste is caused by rounding up
the allocation to a power of 2.
4. The interval of 100'000 uses is both too low and too arbitrary.
This is up for discussion, but I think that it's preferable to base
the dynamics of the buffer on time, rather than the number of uses.
It's more predictable to humans.
The implementation proposed in this patch addresses these as follows:
1. Instead of resizing down to 0, we resize to the biggest size
seen in the last period.
As long as at least one maximal (up to a power of 2) "normal" request
appears each period, the buffer will never have to be resized.
2. The capacity of the buffer is always rounded up to the nearest
power of 2.
3. The resize down period is no longer measured in number of requests
but in real time.
Additionally, since a shared buffer in asynchronous code is quite a
footgun, some rudimentary refcounting is added to assert that only
one reference to the buffer exists at a time, and that the buffer isn't
downsized while a reference to it exists.
Fixes #13437
std::rel_ops was deprecated in C++20, as C++20 provides a better solution for defining comparison operators. and all the use cases previously to be addressed by `using namespace std::rel_ops` have been addressed either by `operator<=>` or the default-generated `operator!=`.
so, in this series, to avoid using deprecated facilities, let's drop all these `using namespace std::rel_ops`. there are many more cases where we could either use `operator<=>` or the default-generated `operator!=` to simplify the implementation. but here, we care more about `std::rel_ops`, we will drop most (if not all) of the explicitly defined `operator!=` and other comparison operators later.
Closes #13676
* github.com:scylladb/scylladb:
treewide: do not use std::rel_ops
dht: token: s/tri_compare/operator<=>/
std::rel_ops was deprecated in C++20, as C++20 provides a better
solution for defining comparison operators. and all the use cases
previously to be addressed by `using namespace std::rel_ops` have
been addressed either by `operator<=>` or the default-generated
`operator!=`.
so, in this change, to avoid using deprecated facilities, let's
drop all these `using namespace std::rel_ops`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `range_tombstone_list` and `range_tombstone_entry`
without the help of `operator<<`.
the corresponding `operator<<()` for `range_tombstone_entry` is moved
into test, where it is used. and the other one is dropped in this change,
as all its callers are now using fmtlib for formatting.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13627
Adds a reproducer for #12462, which doesn't manifest in master any
more after f73e2c992f. It's still useful
to keep the test to avoid regressions.
The bug manifests by reader throwing:
std::logic_error: Stream ends with an active range tombstone: {range_tombstone_change: pos={position: clustered,ckp{},-1}, {tombstone: timestamp=-9223372036854775805, deletion_time=2}}
The reason is that prior to the rework of the cache reader,
range_tombstone_generator::flush() was used with end_of_range=true to
produce the closing range_tombstone_change and it did not handle
correctly the case when there are two adjacent range tombstones and
flush(pos, end_of_range=true) is called such that pos is the boundary
between the two.
Closes #13665
Consolidate `bytes_view_hasher` and abstract_replication_strategy `factory_key_hasher` which are the same into a reusable utils::basic_xx_hasher.
To be used in a followup series for netw::msg_addr.
Closes #13530
* github.com:scylladb/scylladb:
utils: hashing: use simple_xx_hasher
utils: hashing: add simple_xx_hasher
utils: hashers: add HasherReturning concept
hashing: move static_assert to source file
this change ensures that `dk._key` is formatted with the "pk" prefix.
as in 3738fcb, the `operator<<` for partition_key was removed. so the
compiler has to find an alternative when trying to fulfill the needs
when this operator<< is called. fortunately, from the compiler's
perspective, `partition_key` has an `operator managed_bytes_view`, and
this operator does not have the explicit specifier, and,
`managed_bytes_view` does support `operator<<`. so this ends up with a
change in the format of `decorated_key` when it is printed using
`operator<<`. the code compiles. but unfortunately, the behavior is
changed, and it breaks scylla-dtest/cdc_tracing_info_test.py where the
partition_key is supposed to be printed like "pk{010203}" instead of
"010203". the latter is how `managed_bytes_view` is formatted.
a test is added accordingly to avoid future changes which break the
dtest.
Fixes scylladb#13628
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13653
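The silent fallback described above can be reproduced with simplified stand-ins. The types below are hypothetical (they are not managed_bytes_view or partition_key); the sketch only shows how a non-explicit conversion operator lets `operator<<` keep compiling after the type's own printer is removed.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a view type that has its own operator<<.
struct bytes_view_like { std::string hex; };

std::ostream& operator<<(std::ostream& os, const bytes_view_like& v) {
    return os << v.hex;
}

// Hypothetical stand-in for a key type with no operator<< of its own.
struct key_like {
    std::string hex;
    // Non-explicit conversion: overload resolution can fall back to
    // operator<<(std::ostream&, const bytes_view_like&).
    operator bytes_view_like() const { return {hex}; }
};

// Compiles through the implicit conversion and prints only the raw bytes.
std::string print_via_fallback(const key_like& k) {
    std::ostringstream os;
    os << k;
    return os.str();
}

// Formatting the key explicitly keeps the "pk{...}" form.
std::string print_with_prefix(const key_like& k) {
    std::ostringstream os;
    os << "pk{" << k.hex << "}";
    return os.str();
}
```

Marking the conversion operator `explicit` would turn the fallback into a compile error, which is one way to catch this class of regression at build time.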
Currently, scans are splitting partition ranges around tokens. This
will have to change with tablets, where we should split at tablet
boundaries.
This patch introduces token_range_splitter which abstracts this
task. It is provided by effective_replication_map implementation.
This reverts commit 95bf8eebe0.
Later patches will adapt this code to work with token_range_splitter,
and the unit test added by the reverted commit will start to fail.
The unit test asks the query_ranges_to_vnodes_generator to split the range:
[t:end, t+1:start)
around token t, and expects the generator to produce an empty range
[t:end, t:end]
After adapting this code to token_range_splitter, the input range will
not be split because it is recognized as adjacent to t:end, and the
optimization logic will not kick in. Rather than adding more logic to
handle this case, I think it's better to drop the optimization, as it
is not very useful (rarely happens) and not required for correctness.
Will be used by tablet-based replication strategies, for which
effective replication map is different per table.
Also, this patch adapts existing users of effective replication map to
use the per-table effective replication map.
For simplicity, every table has an effective replication map, even if
the erm is per keyspace. This way the client code can be uniform and
doesn't have to check whether replication strategy is per table.
Not all users of per-keyspace get_effective_replication_map() are
adapted yet to work per-table. Those algorithms will throw an
exception when invoked on a keyspace which uses per-table replication
strategy.
This series adds handling of null std::unique_ptr to utils::clear_gently
and handling of std::optional and seastar::optimized_optional (both engaged and disengaged cases).
Also, unit tests were added to test the above cases.
Fixes #13636
Closes #13638
* github.com:scylladb/scylladb:
utils: clear_gently: add variants for optional values
utils: clear_gently: do not clear null unique_ptr
All users of global proxy are gone (*), proxy can be made fully main/cql_test_env local.
(*) one test case still needs it, but can get it via cql_test_env
Closes #13616
* github.com:scylladb/scylladb:
code: Remove global proxy
schema_change_test: Use proxy from cql_test_env
test: Carry proxy reference on cql_test_env
this is the first step to the uuid-based generation identifier. the goal is to encapsulate the generation related logic in generator, so its consumers do not have to understand the difference between the int64_t based generation and UUID v1 based generation.
this commit should not change the behavior of existing scylla. it just allows us to derive from `generation_generator` so we can have another generator which generates UUID based generation identifier.
Closes #13073
* github.com:scylladb/scylladb:
replica, test: create generation id using generator
sstables: add generation_generator
test: sstables: use generate_n for generating ids for testing
Implement clear_gently for std::optional<T>
and seastar::optimized_optional<T> and respective
unit tests.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Otherwise the null pointer is dereferenced.
Add a unit test reproducing the issue
and testing this fix.
Fixes #13636
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
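The null and disengaged cases handled by the two clear_gently commits above can be sketched synchronously. The names below are hypothetical; the real utils::clear_gently is asynchronous and yields between elements, and this sketch assumes that the pointer or optional itself stays in place while only its contents are cleared.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <vector>

// Synchronous, hypothetical sketch of the dispatch only.
template <typename T>
void clear_gently_sketch(std::vector<T>& v) {
    while (!v.empty()) {
        v.pop_back();   // the real code would yield between steps
    }
}

template <typename T>
void clear_gently_sketch(std::unique_ptr<T>& p) {
    if (p) {            // a null pointer has nothing to clear
        clear_gently_sketch(*p);
    }
}

template <typename T>
void clear_gently_sketch(std::optional<T>& o) {
    if (o) {            // a disengaged optional has nothing to clear
        clear_gently_sketch(*o);
    }
}
```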
The only reason why it's there (right next to compaction_fwd.hh) is
because the database::table_truncate_state subclass needs the definition
of compaction_manager::compaction_reenabler subclass.
However, the former sub is not used outside of database.cc and can be
defined in .cc. Keeping it outside of the header allows dropping the
compaction_manager.hh from database.hh thus greatly reducing its fanout
over the code (from ~180 indirect inclusions down to ~20).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes #13622
reuse generation_generator for generating generation identifiers to
reduce repetition. also, allow the generator to update its
latest known generation id.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
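A generator that hands out increasing identifiers and can be told about generations seen elsewhere might look roughly like this. It is a hypothetical sketch for the int64_t-based case only: the class name, the `update_known_generation` method and the starting value are assumptions, not the real sstables::generation_generator interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of an int64_t-based generation generator.
class generation_generator_sketch {
    std::int64_t _last;

public:
    explicit generation_generator_sketch(std::int64_t last = 0) : _last(last) {}

    // Record a generation observed elsewhere (for example on disk) so that
    // identifiers produced later never collide with it.
    void update_known_generation(std::int64_t seen) {
        if (seen > _last) {
            _last = seen;
        }
    }

    // Produce the next generation identifier.
    std::int64_t operator()() { return ++_last; }
};
```

Callers only invoke the generator, so a UUID-based variant could replace it later without changing them.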