instead of relying on the operator<<() of an opaque type, use fmtlib
to print a timepoint for better support of new fmtlib which dropped
the default-generated formatter for types with operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16116
This PR contains two patches which get rid of unnecessary sleeps on cql_test_env teardown greatly reducing run time of tests.
Reduces run time of `build/dev/test/boost/schema_change_test` from 90s to 6s.
Closesscylladb/scylladb#16111
* github.com:scylladb/scylladb:
test: cql_test_env: Interrupt all components on cql_test_env teardown
tests: cql_test_env: Skip gossip shutdown sleep
This commit fixes the rollback procedure in
the 5.2-to-5.4 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
is fixed.
- The "Gracefully shutdown ScyllaDB" command
is fixed.
In addition, there are the following updates
to be in sync with the tests:
- The "Backup the configuration file" step is
extended to include a command to backup
the packages.
- The Rollback procedure is extended to restore
the backup packages.
- The Reinstallation section is fixed for RHEL.
Also, I've removed the optional step to enable
consistent schema management from the list of
steps - the appropriate section has already
been removed, but it remained in the procedure
description, which was misleading.
Refs https://github.com/scylladb/scylladb/issues/11907
This commit must be backported to branch-5.4
Closesscylladb/scylladb#16114
On non-interactive mode setup, RHEL/CentOS7 old kernel check causes
"Setup aborted", this is not what we want.
We should keep warning but proceed setup, so default value of the kernel
check should be True, since it will automatically applied on
non-interactive mode.
Fixes#16045Closesscylladb/scylladb#16100
Currently storage service starts too early and its initialization is split into several steps. This PR makes storage service start "late enough" and makes its initialization (minimally required before joining cluster) happen in on place.
refs: #2795
refs: #2737Closesscylladb/scylladb#16103
* github.com:scylladb/scylladb:
storage_service: Drop (un)init_messaging_service_part() pair
storage_service: Init/Deinit RPC handlers in constructor/stop
storage_service: Dont capture container() on RPC handler
storage_service: Use storage_service::_sys_dist_ks in some places
storage_service: Add explicit dependency on system dist. keyspace
storage_service: Rurn query processor pointer into reference
storage_service: Add explicity query_processor dependency
main: Start storage service later
This should interrupt all sleeps in component teardown.
Before this patch, there was a 1s sleep on gossiper shutdown, which I
don't know where it comes from. After the patch there is no such
sleep.
reconcilable_result_builder passes range tombstone changes to _rt_assembler
using table schema, not query schema.
This means that a tombstone with bounds (a; b), where a < b in query schema
but a > b in table schema, will not be emitted from mutation_query.
This is a very serious bug, because it means that such tombstones in reverse
queries are not reconciled with data from other replicas.
If *any* queried replica has a row, but not the range tombstone which deleted
the row, the reconciled result will contain the deleted row.
In particular, range deletes performed while a replica is down will not
later be visible to reverse queries which select this replica, regardless of the
consistency level.
As far as I can see, this doesn't result in any persistent data loss.
Only in that some data might appear resurrected to reverse queries,
until the relevant range tombstone is fully repaired.
This series fixes the bug and adds a minimal reproducer test.
Fixes#10598Closesscylladb/scylladb#16003
* github.com:scylladb/scylladb:
mutation_query_test: test that range tombstones are sent in reverse queries
mutation_query: properly send range tombstones in reverse queries
vla (variable length array) is an extension in GCC and Clang. and
it is not part of the C++ standard.
so let's avoid using it if possible, for better standard compliant.
it's also more consistent with other places where we calculate the size
of an array of T in the same source file.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16084
Currently CREATE KEYSPACE ... WITH STORAGE = { 'type' = 'S3' ... } will create keyspace even if the backend configuration is "invalid" in the sense that the requested endpoint is not known to scylla via object_storage.yaml config file. The first time after that when this misconfiguration will reveal itself is when flushing a memtable (see #15635), but it's good to know the endpoint is not configured earlier than that.
fixes: #15074Closesscylladb/scylladb#16038
* github.com:scylladb/scylladb:
test: Add validation of misconfigured storage creation
sstables: Throw early if endpoint for keyspace is not configured
replica: Move storage options validation to sstables manager
test/cql-pytest/test_keyspaces: Move DESCRIBE case to object store
sstables: Add has_endpoint_client() helper to manager
in this change,
* all `Seastar_OptimizationLevel_*` are dropped.
* mode.Sanitize.cmake:
s/CMAKE_CXX_FLAGS_COVERAGE/CMAKE_CXX_FLAGS_SANITIZE/
* mode.Dev.cmake:
s/CMAKE_CXX_FLAGS_RELEASE/CMAKE_CXX_FLAGS_DEV/
Seastar_OptimizationLevel_* variables have nothing to do with
Seastar, and they introduce unnecessary indirection. the function
of `update_cxx_flags()` already requires an option name for this
parameter, so there is no need to have a name for it.
the cached entry of `Seastar_OptimizationLevel_DEBUG` is also
dropped, if we really need to have knobs which can be configured
by user, we should define them in a more formal way. at this
moment, this is not necessary. so drop it along with this
variable.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16059
There are three of them, one is used by core, another by tests and the third one passes arguments between those two. And the ..._for_tests() helper in test utils. This PR leaves only one for tests out of three.
Closesscylladb/scylladb#16068
* github.com:scylladb/scylladb:
tests: Shorten the write_memtable_to_sstable_for_test()
replica: Squash two write_memtable_to_sstable()
replica: Coroutinize one of write_memtable_to_sstable() overloads
In an attempt to create a non-local keyspace with unknown endpoint,
there should pop up the configuration exception.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When a keyspace is created it initiaizes the storage for it and
initialization of S3 storage is the good place to check if the endpoint
for the storage is configured at all.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently the cql statement .validate() callback is responsible for
checking if the non-local storage options are allowed with the
respective feature. Next patch will need to extend this check to also
validate the details of the provided storage options, but doing it at
cql level doesn't seem correct -- it's "too far" from query processor
down to sstables manager.
Good news is that there's a lower-level validation of the new keyspace,
namely the database::validate_new_keyspace() call. Move the storage
options validation into sstables manager, while at it, reimplement it
as a visitor to facilitate further extentions and plug the new
validation to the aforementioned database::validate_new_keyspace().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
allow_mutation_read_page_without_live_row is a new option in the
partition_slice::option option set. In a mixed clusters, old nodes
possibly don't know this new option, so its usage must be protected by a
cluster feature. This patch does just that.
Fixes: #15795Closesscylladb/scylladb#15890
We're going to ban creation of a keyspace with S3 type in case the
requested endpoint is not configured. The problem is that this test case
of cql-pytest needs such keyspace to be created and in order to provide
the object storage configuration we'd need to touch the generic scylla
cluster management which is an overill for generic cql-pytest case.
Simpler solution is to make object_store test suite perform all the
S3-related checks, including the way DESCRIBE for S3-backed ks works.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's the get_endpoint_client() peer that only checks the client
presense. To be used by next patches.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The wrapper just calls the test-only core write_memtable_to_sstable()
overload, tests can do it on their own.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All the services that need to register RPC handlers do it in service
constructor or .start() method. Unregistration happens in .stop().
Storage service explicitly (de)initializes its RPC handlers in dedicated
calls, but there's no point in that. The handlers' accessibility is
determined by messaging service start_lister/shutdown, handlers
themselves can be registered any time before it and unregistered any
time after it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The handlers are about to be initialized from inside storage_service
constructor. At that time container() is not yet available and its
invalid to capture it on handlers' lambda. Fortunately, there's only one
handler that does it, other handlers capture 'this' and call container()
explicitly. This patch fixes the remaining one to do the same.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The main goal here is to drop sys.dist.ks argument from the
init_messaging_service call to make future patching simpler. While doing
it it turned out that the argument was needed to be passed all the way
down to the mark_existing_views_as_built(), so this patch also dropes
this argument from this whole call-trace.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This effectively reverts bc051387c5 (storage_service: Remove sys_dist_ks
from storage_service dependencies) since now storage service needs the
sys. disk. ks not only cluster join time. Next patch will make more use
of it as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's now set via a dedicated call that happens after query processor is
started. Now query processor is started before storage service and the
latter can get the q.p. local reference via constructor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The storage service is top-level service which depends on many other
services. Recently (see d42685d0cb storage_service: Load tablet
metadata on boot and from group0 changes) it also got implicit
dependency on query processor, but it still starts too early for
explicit reference on q.p.
This patch moves storage service start to later times. This is possible
because storage service is not explicitly needed by any other component
start/init in between its old and new start places. Also, cql_test_ent
starts storage service "that late" too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This miniset addresses two potential conversions to `global_schema_ptr` of incomplete materialized views schemas.
One of them was completely unnecessary and also is a "chicken and an egg" problem where on the sync schema procedure itself a view schema was converted to `global_schema_ptr` solely for the purposes of logging. This can create a
"hickup" in the materialized views updates if they are comming from a node with a different mv schema.
The reason why sometimes a synced schema can have no base info is because of deactivision and reactivision of the schema inside the `schema_registry` which doesn't restore the base information due to lack of context.
When a schema is synced the problem becomes easy since we can just use the latest base information from the database.
Fixes#14011Closesscylladb/scylladb#14861
* github.com:scylladb/scylladb:
migration manager: fix incomplete mv schemas returned from get_schema_for_write
migration_manager: do not globalize potentially incomplete schema
Patch 967ebacaa4 (view_update_generator: Move abort kicking to
do_abort()) moved unplugging v.u.g from database from .stop() to
.do_abort(). The latter call happens very early on stop -- once scylla
receives SIGINT. However, database may still need v.u.g. plugged to
flush views.
This patch moves unplug to later, namely to .stop() method of v.u.g.
which happens after database is drained and should no longer continue
view updates.
fixes: #16001
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#16091
this is a cleanup in `scylla.spec`.
Closesscylladb/scylladb#16097
* github.com:scylladb/scylladb:
dist/redhat: group sub-package preambles together
dist/redhat: drop unused `defines` variable
dist/redhat: remove tags for subpackage which are same as main preamble
this variable was introduced in 6d7d0231. back then, we were still
building the binaries in .spec, but we've switched to the relocatable
package now, so there is no need to use keep these compilation related
flags anymore.
in this change, the `defines` variable is dropped.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this is a cleanup.
if a subpackage is licensed under a different license from the one
specified in the main preamble, we need to use a distinct License
tag on a per-subpackage basis. but if it is licensed with the
identical license, it is not necessary. since all three
subpackages of "*-{server, conf, kernel-conf}" are licensed under
AGPLv3, there is no need to repeat the "License:" tag in their
own preamble section.
the same applies to the "URL" tag.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
get_schema_for_write
Sometimes a view registry can get deactivated inside the schema
registry, this happens due to dactivating and reactivating the registry
entry which doesn't rebuild the base table information in the view.
This error is later caught when trying to convert the schema into a
`global_schema_ptr`, however, the real bug here is that not all schemas
returned from `get_schema_for_write` are suitable for write because the
mv schemas can be incomplete.
This commit changes the aforementioned function in order to fix the bug.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Update node_exporter to 1.7.0.
The previous version (1.6.1) was flagged by security scanners (such as
Trivy) with HIGH-severity CVE-2023-39325. 1.7.0 release fixed that
problem.
[Botond: regenerate frozen toolchain]
Fixes#16085Closesscylladb/scylladb#16086Closesscylladb/scylladb#16090
Fixes#15269
If segment being replayed is corrupted/truncated we can attempt skipping
completely bogues byte amounts, which can cause assert (i.e. crash) in
file_data_source_impl. This is not a crash-level error, so ensure we
range check the distance in the reader.
v2: Add to corrupt_size if trying to skip more than available. The amount added is "wrong", but at least will
ensure we log the fact that things are broken
Closesscylladb/scylladb#15270
There was a case where maybe sync function of a materialized view could
fail to sync if the view version was old. This is because adding the
base information to the view is only relevant until the record is
synced. This triggers an internal error in the `global_schem_ptr`
constructor.
The conversion to global pointer in that case was solely for logging
purposes so instead, we pass the pieces of information needed for the
logging itself.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
this change is a cleanup to add `-Wignore-qualifiers` when building the tree.
to mark a return value without value semantics has no effect. these
`const` specifier useless. so let's drop them.
and, if we compile the tree with `-Wignore-qualifiers`, the compiler
would warn like:
```
/home/kefu/dev/scylladb/schema/schema.hh:245:5: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
245 | const index_metadata_kind kind() const;
| ^~~~~
```
so this change also silences the above warnings.
Closesscylladb/scylladb#16083
* github.com:scylladb/scylladb:
build: enable -Wignore-qualifiers
treewide: do not mark return value const if this has no effect
`-Wignore-qualifiers` is included by -Wextra. but we are not there yet,
with this change, we can keep the changes introducing -Wignore-qualifiers
warnings out of the repo, before applying `-Wextra`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change is a cleanup.
to mark a return value without value semantics has no effect. these
`const` specifier useless. so let's drop them.
and, if we compile the tree with `-Wignore-qualifiers`, the compiler
would warn like:
```
/home/kefu/dev/scylladb/schema/schema.hh:245:5: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
245 | const index_metadata_kind kind() const;
| ^~~~~
```
so this change also silences the above warnings.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This is a loose collection of fixes to rare row cache bugs flushed out by running test_concurrent_reads_and_eviction several million times. See individual commits for details.
Fixes#15483Closesscylladb/scylladb#15945
* github.com:scylladb/scylladb:
partition_version: fix violation of "older versions are evicted first" during schema upgrades
cache_flat_mutation_reader: fix a broken iterator validity guarantee in ensure_population_lower_bound()
cache_flat_mutation_reader: fix a continuity loss in maybe_update_continuity()
cache_flat_mutation_reader: fix continuity losses during cache population races with reverse reads
partition_snapshot_row_cursor: fix a continuity loss in ensure_entry_in_latest() with reverse reads
cache_flat_mutation_reader: fix some cache mispopulations with reverse reads
cache_flat_mutation_reader: fix a logic bug in ensure_population_lower_bound() with reverse reads
cache_flat_mutation_reader: never make an unlinked last dummy continuous
A schema upgrade appends a MVCC version B after an existing version A.
The last dummy in B is added to the front of LRU,
so it will be evicted after the entries in A.
This alone doesn't quite violate the "older versions are evicted first" rule,
because the new last dummy carries no information. But apply_monotonically
generally assumes that entries on the same position have the obvious
eviction order, even if they carry no information. Thus, after the merge,
the rule can become broken.
The proposed fix is as follows:
- In the case where A is merged into B, the merged last dummy
inherits the link of A.
- The merging of B into anything is prevented until its merge with A is finished.
This is relatively hacky, because it still involves a state that
goes against some natural expectations granted by the "older versions..."
rule. A less hacky fix would be to ensure that the new dummy is inserted
into a proper place in the eviction order to begin with.
Or, better yet, we could eliminate the rule altogether.
Aside from being very hard to maintain, it also prevents the introduction
of any eviction algorithm other than LRU.
ensure_population_lower_bound() guarantees that _last_row is valid or null.
However, it fails to provide this guarantee in the special rare case when
`_population_range_starts_before_all_rows == true` and _last_row is non-null.
(This can happen in practice if there is a dummy at before_all_clustering_rows
and eviction makes the `(before_all_clustering_rows, ...)` interval
discontinous. When the interval is read in this state, _last_row will point to
the dummy, while _population_range_starts_before_all_rows will still be true.)
In this special case, `ensure_population_lower_bound()` does not refresh
`_last_row`, so it can be non-null but invalid after the call.
If it is accessed in this state, undefined behaviour occurs.
This was observed to happen in a test,
in the `read_from_underlying() -- maybe_drop_last_entry()` codepath.
The proposed fix is to make the meaning of _population_range_starts_before_all_rows
closer to its real intention. Namely: it's supposed to handle the special case of a
left-open interval, not the case of an interval starting at -inf.
To reflect the final range tombstone change in the populated range,
maybe_update_continuity() might insert a dummy at `before_key(_next_row.table_position())`.
But the relevant logic breaks down in the special case when that position is
equal to `_last_row.position()`. The code treats the dummy as a part of
the (_last_row, _next_row) range, but this is wrong in the special case.
This can lead to inconsistent state. For example, `_last_row` can be wrongly made
continuous, or its range tombstone can be wrongly nulled.
The proposed fix is to only modify the dummy if it was actually inserted.
If it had been inserted beforehand (which is true in the special case, because
of the `ensure_population_lower_bound()` call earlier), then it's already in a
valid state and doesn't need changes.
Cache population routines insert new row entries.
In non-reverse reads, the new entries (except for the lower bound of the query
range) are filled with the correct continuity and range tombstones immediately
after insertion, because that information has already arrived from underlying.
when the entries are inserted.
But in reverse reads, it's the interval *after* the newly-inserted entry
that's made continuous. The continuity information in the new entries isn't
filled. When two population routines race, the one which comes later can
punch holes in the continuity left by the first routine, which can break
the "older versions are evicted first" rule and revert the affected
interval to an older version.
To fix this, we must make sure that inserting new row entries doesn't
change the total continuity of the version.
The FIXME comment claims that setting continity isn't very important in this
place, but in fact this is just wrong.
If two calls to read_from_underlying() get into a race, the one which finishes
later can call ensure_entry_in_latest() on a position which lies inside a
continuous interval in the newest version. If we don't take care to preserve
the total continuity of the version, this can punch a hole in the continuity of the
newest version, potentially reverting the affected interval to an older version.
Fix that.
`_last_row` is in table schema, but it is sometimes compared with positions in
query schema. This leads to unexpected behaviour when reverse reads
are used.
The previous patch fixed one such case, which was affecting correctness.
As far as I can tell, the three cases affected by this patch aren't
a correctness problem, but can cause some intervals to fail to be made continuous.
(And they won't be cached even if the same read is repeated many times).