the invalid sstable id is the NULL of an sstable identifier. with
this concept, it is a lot simpler to find/track the greatest
generation. the complexity is hidden in generation_type, which
compares a) integer-based identifiers, b) UUID-based identifiers, and
c) the invalid identifier in different ways.
so, in this change
* the default constructor of generation_type is now public.
* we don't check for empty generations anymore when loading
SSTables or enumerating them.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we assumed that the generation is always integer-based.
in order to enable the UUID-based generation identifier when the related
option is set, we should propagate this option down to the generation
generator. because we don't have access to the cluster features in some
places where a new generation is created, a new accessor exposing
feature_service from the sstable manager is added.
Fixes #10459
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
UUID_SSTABLE_IDENTIFIERS is a new cluster-wide feature. if it is
enabled, all nodes will generate new sstables with a UUID as their
generation identifier. this feature is configured using the
"uuid_sstable_identifiers_enabled" config option.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
unlike Cassandra 4.1, this option is true by default. it will be used
for enabling the "UUID_SSTABLE_IDENTIFIERS" cluster feature. not wired yet.
please note, because we are still using sstableloader and
sstabledump based on the 3.x branch, while Cassandra upstream
introduced the uuid sstable identifier in its 4.x branch, these tools
fail to work with sstables with uuid identifiers, so this option
is disabled when performing these tests. we will enable it once
these tools are updated to support the uuid-based sstable identifiers.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change generalizes the value of generation_type so it also
supports UUID-based identifiers.
* sstables/generation_type.h:
- add a formatter and parser for UUID. please note, Cassandra uses
a format different from the plain stringified UUID, and this format
suits our needs: it uses underscore "_" as the delimiter, while the
file names of components use dash "-" as the delimiter. instead of
reinventing the formatting or just using another delimiter in the
stringified UUID, we choose to use Cassandra's formatting.
- add accessors for accessing the type and value of generation_type
- add constructors for constructing generation_type from a UUID and
from a string.
- use a hash for placing sstables with uuid identifiers into shards,
for a more uniform distribution of sstables across shards.
* replica/table.cc:
- only update the generator if the given generation contains an
integer
* test/boost:
- add a simple test to verify the generation_type is able to
parse and format
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
simpler this way. and this also matches the "add" call at the
beginning of this function.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14239
Move the initialization of `storage_proxy` early in the startup procedure, before starting
`system_keyspace`, `messaging_service`, `gossiper`, `storage_service` and more.
As a follow-up, we'll be able to move initialization of `query_processor` right
after `storage_proxy` (but this requires a bit of refactoring in
`query_processor` too).
Local queries through `storage_proxy` can be done after the early initialization step.
In a follow-up, when we do a similar thing for `query_processor`, we'll be able
to perform local CQL queries early as well. (Before starting `gossiper` etc.)
Closes #14231
* github.com:scylladb/scylladb:
main, cql_test_env: initialize `storage_proxy` early
main, cql_test_env: initialize `database` early
storage_proxy: rename `init_messaging_service` to `start_remote`
storage_proxy: don't pass `gossiper&` and `messaging_service&` during initialization
storage_proxy: prepare for missing `remote`
storage_proxy: don't access `remote` during local queries in `query_partition_key_range_concurrent`
db: consistency_level: remove overload of `filter_for_query`
storage_proxy: don't access `remote` when calculating target replicas for local queries
storage_proxy: introduce const version of `remote()`
replica: table: introduce `get_my_hit_rate`
storage_proxy: `endpoint_filter`: remove gossiper dependency
try_emplace() is
- simpler than the lookup-and-insert dance, and
- presumably, it is more efficient.
- also, most importantly, it is simpler to read.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14237
This reverts commit 9526258b89.
Because the issue (#14213) it was supposed to fix only exists in the
enterprise branch. And that issue has been fixed in a different
way in a different place.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14234
to reduce the indentation level and to improve readability. also, take this opportunity to name some variables for better readability.
Closes #14236
* github.com:scylladb/scylladb:
migration_manager: coroutineize migration_manager::do_merge_schema_from()
migration_manager: coroutineize migration_manager::sync_schema()
Currently, the mapping is initialized from the gossiper state when
group0 server is started and updated from a gossiper change
listener. Gossiper state is restored from system.peers in
storage_service::join_cluster(), which is later than
setup_group0_if_exists() is called.
The restarted server will hang in
group0_service.setup_group0_if_exist(), which waits for snapshot
loading, which waits for storage_service::topology_state_load(), which
waits for IP mapping for servers mentioned in the topology, and
produces logs like this:
WARN 2023-06-12 15:45:21,369 [shard 0] storage_service - (rate limiting dropped 196 similar messages) raft topology: cannot map c94ae68f-869d-4727-8b2f-d40814e395f0 to ip, retrying.
This is a regression after f26179c, where group0 server is initialized
before the gossiper is started.
The fix is to load the mapping from system.peers before group0 is
started. Gossiper state is not available at this point, so we read the
mapping directly from system keyspace. This change will also be needed
to implement messaging by host id, even if raft is disabled, where we
will need to restore the mapping early.
Fixes #14217
Closes #14220
This is another part of splitting Scylla initialization into two phases:
local and remote parts. Performing queries is done with `storage_proxy`,
so for local queries we want to initialize it before we initialize
services specific to cluster communication such as `gossiper`,
`messaging_service`, `storage_service`.
`system_keyspace` should also be initialized after `storage_proxy` (and
is after this patch) so in the future we'll be able to merge the
multiple initialization steps of `system_keyspace` into one (it only
needs the local part to work).
We want to separate two phases of Scylla service initialization: first
we initialize the local part, which allows performing local queries,
then a remote part, which requires contacting other nodes in a cluster
and allows performing distributed queries.
The `database` object is crucial for both remote and local queries, but it
was created pretty late, after services such as `gossiper` or
`storage_service` which are used for distributed operations.
Fortunately we can easily move `database` initialization and all of its
prerequisites early in the init procedure.
These services are now passed during `init_messaging_service`, and
that's when the `remote` object is constructed.
The `remote` object is then destroyed in `uninit_messaging_service`.
Also, `migration_manager*` became `migration_manager&` in
`init_messaging_service`.
Prepare the users of `remote` for the possibility that it's gone.
The `remote()` accessor throws an error if it's gone. Observe that
`remote()` is only used in places where it's verified that we really
want to send a message to a remote node, with a small exception:
`truncate_blocking`, which truncates locally by sending an RPC to
ourselves (and truncate always sends RPC to the whole cluster; we might
want to change this behavior in the future, see #11087). Other places
are easy to check (it's either implementations of `apply_remotely`
which is only called for remote nodes, or there's an `if` that checks
we don't apply the operation to ourselves).
There is one direct access to `_remote` which checks first if `_remote`
is available: `storage_proxy::is_alive`. If `_remote` is unavailable, we
consider nodes other than us dead. Indeed, if `gossiper` is unavailable,
we didn't have a chance to gossip with other nodes and mark them alive.
In `query_partition_key_range_concurrent` there's a calculation of cache
hit rates which requires accessing `gossiper` through `remote`. We want
to support local queries when `remote` is unavailable.
Check if it's a local query and only if not, fetch `gossiper` from `remote`.
We only want to access `remote` when it's necessary - when we're
performing a query that involves remote nodes. We want to support local
queries when `remote` (in particular, `gossiper&`) is unavailable.
Add a helper, `storage_proxy::filter_replicas_for_read`, which will
check if it's a local query and return early in that case without
accessing `remote`.
The method sits on sstable, but it is called only from fs storage,
which is the only place that really needs it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes #14230
After recent changes, all wasm related logic has been moved from
the database class to the query_processor. As a result, the wasm
headers no longer need to be included there, and in particular,
files that include replica/database.hh no longer need to wait
on the generated header rust/wasmtime_bindings.hh to compile.
Fixes #14224
Closes #14223
This is a translation of Cassandra's CQL unit test source file
validation/operations/UpdateTest.java into our cql-pytest framework.
There are 18 tests, and they did not reproduce any previously-unknown
bug, but did provide additional reproducers for two known issues:
Refs #12243: Setting USING TTL of "null" should be allowed
Refs #12474: DELETE/UPDATE print misleading error message suggesting
ALLOW FILTERING would work
Note that we knew about this issue for the DELETE operation, and
the new test shows the same issue exists for UPDATE.
I had to modify some of the tests to allow for different error messages
in ScyllaDB (in cases where the different message makes sense), as well
as cases where we decided to allow in Scylla some behaviors that are
forbidden in Cassandra - namely Refs #12472.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #14222
When scylla starts it collects dangling sstables from the datadir. It includes temporary sstable directories and pending-deletion log. S3-backed sstables cannot be garbage-collected like that, instead "garbage" entries from the ownership table should be processed. Currently the g.c. code is unaware of storage and scans datadir for whatever sstable it's called for.
This PR prepares the garbage_collect() call to become virtual, but no-op for ownership-table lister. Proper S3 garbage-collecting is not yet here, it needs an extra patch to seastar http client.
refs: #13024
Closes #14023
* github.com:scylladb/scylladb:
sstable_directory: Do not collect filesystem garbage for S3-backed sstables
sstable_directory: Deduplicate .process() location argument
sstable_directory: Keep directory lister on stack
sstable_directory: Use directory_lister API directly
Fixes https://github.com/scylladb/scylladb/issues/14084
This commit adds OS support for version 5.3 to the table on the OS Support by Linux Distributions and Version page.
Closes #14228
* github.com:scylladb/scylladb:
doc: remove OS support for outdated ScyllaDB versions 2.x and 3.x
doc: add OS support for ScyllaDB 5.3
to reduce the indentation level, and to improve the readability.
also, take this opportunity to name some variables for better readability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
per clang's documentation, -Og is like -O1, which is in turn an
optimization level between -O0 and -O2. -O0 "generates the most
debuggable code": for instance, with -O0, variables are presumably not
allocated in registers, where they could later be overwritten; they are
always allocated on the stack. this helps with debugging.
in this change, -O0 is used for a better debugging experience. the
downside is that the emitted code size will be greater than the one
emitted with -Og, and the executable will be slower.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14210
This reverts commit 7e68ed6a5d. With -Og
simple 'gdb build/debug/scylla -ex start' shows the function parameters
as "optimized out" while with -O0 they display fine. This applies to
all variables, not just main's parameters.
Bisect revealed that this behavior started with the reverted commit; it's
not due to a later toolchain update.
Fixes#14196.
Closes #14197
This PR adds a set of tests for cluster features. The set covers those tests from the test plan for the upcoming "cluster features on raft" that do not depend on implementation-specific details. Those tests are applicable to the existing gossip-based implementation, so they can be useful for us right now.
The tests simulate cluster upgrades by conditionally marking support for the TEST_ONLY_FEATURE on the nodes via error injection. Therefore, the tests work only in non-release mode.
The `test_partial_upgrade_can_be_finished_with_removenode` test is marked as skipped because of a bug in gossip that prevents features from being enabled if a node was removed within the last 3 days (#14194).
Closes #14211
* github.com:scylladb/scylladb:
test/topology: add cluster feature tests
test: introduce get_supported_features/get_enabled_features
test: move wait_for_feature to pylib utils
feature_service: add a test-only cluster feature
count(col), unlike count(*), does not count rows for which col is NULL.
However, if col's data type is not a scalar (e.g. a collection, tuple,
or user-defined type) it behaves like count(*), counting NULLs too.
The cause is that get_dynamic_aggregate() converts count() to
the count(*) version. It works for scalars because get_dynamic_aggregate()
intentionally fails to match scalar arguments, and functions::get() then
matches the arguments against the pre-declared count functions.
As we can only pre-declare count(scalar) (there's an infinite number
of non-scalar types), we change the approach to be the same as min/max:
we make count() a generic function. In fact count(col) is much better
as a generic function, as it only examines its input to see if it is
NULL.
A unit test is added. It passes with Cassandra as well.
Fixes#14198.
Closes #14199
This commit introduces a new boolean flag, `shutdown`, to the
forward_service, along with a corresponding shutdown method. It also
adds checks throughout the forward_service to verify the value of the
shutdown flag before retrying or invoking functions that might use the
messaging service under the hood.
The flag is set before messaging service shutdown, by invoking
forward_service::shutdown in main. By checking the flag before each call
that potentially involves the messaging service, we can ensure that the
messaging service is still operational. If the flag is false, indicating
that the messaging service is still active, we can proceed with the
call. In the event that the messaging service is shut down during the
call, appropriate exceptions should be thrown somewhere down in the
called functions, avoiding potential hangs.
This fix should resolve the issue where forward_service retries could
block the shutdown.
Fixes #12604
Closes #13922
The `topology_node_mutation_builder::set` function, when passed a
non-empty set of tokens, will construct a mutation that adds given
tokens to the column instead of overwriting them. This is not a problem
today because we are always calling `set` on an empty column, but given
the fact that the function is called `set` not `add` and other overloads
of `set` do overwrite, the function might be misused in the future.
This commit fixes the problem by initializing the tombstone in
`collection_mutation_description` properly, causing the previous state
to be dropped before applying new tokens. The tombstone has a timestamp
which is one less than the timestamp of the added cells, mimicking the
CQL behavior which happens when a non-frozen collection is overwritten.
Closes #14216
The function used `gossiper&` to check whether an endpoint is considered
alive. Abstract this out through `noncopyable_function`.
This will allow us to use `endpoint_filter` during local queries when
`remote` (which contains the `gossiper` reference) is unavailable.
when compiling the generated source files, sometimes we can run
into an FTBFS like:
02:18:54 FAILED: build/release/gen/cql3/CqlParser.o
02:18:54 clang++ ... -o build/release/gen/cql3/CqlParser.o build/release/gen/cql3/CqlParser.cpp
...
02:18:54 In file included from build/release/gen/cql3/CqlParser.cpp:44:
02:18:54 In file included from build/release/gen/cql3/CqlParser.hpp:75:
02:18:54 In file included from ./cql3/statements/create_function_statement.hh:12:
02:18:54 In file included from ./cql3/functions/user_function.hh:16:
02:18:54 ./lang/wasm.hh:15:10: fatal error: 'rust/wasmtime_bindings.hh' file not found
02:18:54 #include "rust/wasmtime_bindings.hh"
02:18:54 ^~~~~~~~~~~~~~~~~~~~~~~~~~~
CqlParser.cpp is a source file generated from cql3/Cql.g. this source in
turn includes another source file generated from
wasmtime_bindings/src/lib.rs. but we failed to set up this dependency in
the build.ninja rules -- we only taught ninja that "to compile the
grammar source files, please prepare the `serializers` source files
first". but this is not enough.
so, in this change, we just replace `serializers` with `gen_headers`,
as the latter is a superset of the former, and should fulfill the needs
of CqlParser.cpp.
Fixes #14213
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14214
This commit adds a set of tests for cluster features. The set covers
those tests from the test plan for the upcoming "cluster features on
raft" that do not depend on implementation-specific details. Those tests
are applicable to the existing gossip-based implementation, so they can
be useful for us right now.
The tests simulate cluster upgrades by conditionally marking support for
the TEST_ONLY_FEATURE on the nodes via error injection. Therefore, the
tests work only in non-release mode.
The `test_partial_upgrade_can_be_finished_with_removenode` test is
marked as skipped because of a bug in gossip that prevents features from
being enabled if a node was removed within the last 3 days (#14194).
Introduces two helper functions that allow getting information about
supported/enabled features on a node, according to its system tables.
As a bonus, the `wait_for_feature` function is refactored to use
`get_enabled_features`.
Fixes https://github.com/scylladb/scylladb/issues/14097
This commit removes support for Ubuntu 18 from
platform support for ScyllaDB Enterprise 2023.1.
The update is in sync with the change made for
ScyllaDB 5.2.
This commit must be backported to branch-5.2 and
branch-5.3.
Closes #14118
Adds a cluster feature called TEST_ONLY_FEATURE. It can only be marked
as supported via error injection "features_enable_test_feature", which
can be enabled on node startup via CLI option or YAML configuration.
The purpose of this cluster feature is to simulate upgrading a node to a
version that supports a new feature. This allows us to write tests which
verify that the cluster feature mechanism works.
The fact that TEST_ONLY_FEATURE can only be enabled via error injection
should make it impossible to accidentally enable it in release mode and,
consequently, in production.
Prior to this, `table_name` was validated for every request in `find_table_name`, leading to unnecessary overhead (small, but unnecessary). Now, `table_name` is only validated in the creation request, and in other requests only if the table does not exist (to keep compatibility with DynamoDB's exceptions).
Fixes: #12538
Closes #13966
because we build the stripped package and the non-stripped package in
parallel using ninja, there are chances that the non-stripped build job
could be adding the build/node_exporter directory to the tarball while
the job building the stripped package is using objcopy to extract the
symbols from the build/node_exporter/node_exporter executable. but
objcopy creates temporary files when processing the executables, and
the temporary files can be spotted by the non-stripped build job. there
are two consequences:
1. the non-stripped build job includes the temporary files in its
tarball, even though they are not supposed to be distributed
2. the non-stripped build job fails to include the temporary file(s),
as they are removed after objcopy finishes its job. but the job did
spot them when preparing the tarball, so when the tarfile python module
tries to include the previously found temporary file(s), it throws.
neither of these consequences is expected. but fortunately, this only
happens when packaging the non-stripped package. when packaging the
stripped package, the build/node_exporter directory is not in flux
anymore, as ninja ensures the dependencies between the jobs.
so, in this change, we do not add the whole directory when packaging
the non-stripped version, as all its ingredients have been added
separately as regular files. and when packaging the stripped version,
we still use the existing step, as we don't have to list all the
files created by strip.sh:
node_exporter{,.debug,.dynsyms,.funcsyms,.keep_symbols,.minidebug.xz}
we could do so in this script, but the repetition is unnecessary and
error-prone. so, let's keep including the whole directory recursively,
so all the debug symbols are included.
Fixes https://github.com/scylladb/scylladb/issues/14079
Closes #14081
* github.com:scylladb/scylladb:
create-relocatable-package.py: package build/node_export only for stripped version
create-relocatable-package.py: use positive condition when possible