The test case creates non-jumbo upload simk and puts some bytes into it,
then flushes. In order to make sure the fallback did took place the
multipar memory tracker sempahore is broken in advance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, we create a new UUID for a new sstable managed by the s3_storage, and we use the string representation of UUID defined by RFC4122 like "0aa490de-7a85-46e2-8f90-38b8f496d53b" for naming the objects stored on s3_storage. but this representation is not what we are using for storing sstables on local filesystem when the option of "uuid_sstable_identifiers_enabled" is enabled. instead, we are using a base36-based representation which is shorter.
to be consistent with the naming of the sstables created for local filesystem, and more importantly, to simplify the interaction between the local copy of sstables and those stored on object storage, we should use the same string representation of the sstable identifier.
so, in this change:
1. instead of creating a new UUID, just reuse the generation of the sstable for the object's key.
2. do not store the uuid in the sstable_registry system table. As we already have the generation of the sstable for the same purpose.
3. switch the sstable identifier representation from the one defined by the RFC4122 (implemented by fmt::formatter<utils::UUID>) to the base36-based one (implemented by fmt::formatter<sstables::generation_type>)
Fixes#14175
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#14406
* github.com:scylladb/scylladb:
sstable: remove _remote_prefix from s3_storage
sstable: switch to uuid identifier for naming S3 sstable objects
this series extracts the the env variables related functions out and remove unused `import`s for better readability.
Closesscylladb/scylladb#15796
* github.com:scylladb/scylladb:
test/pylib: remove duplicated imports
test/pylib: extract the env variable printing into MinIoServer
test/pylib: extract _set_environ() out
default_compaction_progress_monitor returns a reference to a static
object. So, it should be read-only, but its users need to modify it.
Delete default_compaction_progress_monitor and use one's own
compaction_progress_monitor instance where it's needed.
Closesscylladb/scylladb#15800
Said method is called in an allocating section, which will re-try the enclosed lambda on allocation failure. `read_context()` however moves the permit parameter so on the second and later calls, the permit will be in a moved-from state, triggering a `nullptr` dereference and therefore a segfault.
We already have a unit test (`test_exception_safety_of_reads` in `row_cache_test.cc`) which was supposed to cover this, but:
* It only tests range scans, not single partition reads, which is a separate path.
* Turns out allocation failure tests are again silently broken (no error is injected at all). This is because `test/lib/memtable_snapshot_source.hh` creates a critical alloc section which accidentally covers the entire duration of tests using it.
Fixes: #15578Closesscylladb/scylladb#15614
* github.com:scylladb/scylladb:
test/boost/row_cache_test: test_exception_safety_of_reads: also cover single-partition reads
test/lib/memtable_snapshot_source: disable critical alloc section while waiting
row_cache: make_reader_opt(): make make_context() reentrant
This patch adds a reproducer for a minor compatibility between Scylla's
and Cassandra's handling of a prepared statement when a bind marker with
the same name is used more than once, e.g.,
```
SELECT * FROM tbl WHERE p=:x AND c=:x
```
It turns out that Scylla tells the driver that there is only one bind
marker, :x, whereas Cassandra tells the driver that there are two bind
markers, both named :x. This makes no different if the user passes
a map `{'x': 3}`, but if the user passes a tuple, Scylla accepts only
`(3,)` (assigning both bind markers the same value) and Cassandra
accepts only `(3,3)`.
The test added in this patch demonstrates this incompatibility.
It fails on Scylla, passes on Cassandra, and is marked "xfail".
Refs #15559
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#15564
before this change, we create a new UUID for a new sstable managed
by the s3_storage, and we use the string representation of UUID
defined by RFC4122 like "0aa490de-7a85-46e2-8f90-38b8f496d53b" for
naming the objects stored on s3_storage. but this representation is
not what we are using for storing sstables on local filesystem when
the option of "uuid_sstable_identifiers_enabled" is enabled. instead,
we are using a base36-based representation which is shorter.
to be consistent with the naming of the sstables created for local
filesystem, and more importantly, to simplify the interaction between
the local copy of sstables and those stored on object storage, we should
use the same string representation of the sstable identifier.
so, in this change:
1. instead of creating a new UUID, just reuse the generation of the
sstable for the object's key.
2. do not store the uuid in the sstable_registry system table. As
we already have the generation of the sstable for the same purpose.
3. switch the sstable identifier representation from the one defined
by the RFC4122 (implemented by fmt::formatter<utils::UUID>) to the
base36-based one (implemented by
fmt::formatter<sstables::generation_type>)
4. enable the `uuid_sstable_identifers` cluster feature if it is
enabled in the `test_env_config`, so that it the sstable manager
can enable the uuid-based uuid when creating a new uuid for
sstable.
5. throw if the generation of sstable is not UUID-based when
accessing / manipulating an sstable with S3 storage backend. as
the S3 storage backend now relies on this option. as, otherwise
we'd have sstables with key like s3://bucket/number/basename, which
is just unable to serve as a unique id for sstable if the bucket is
shared across multiple tables.
Fixes#14175
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The following new commands are implemented:
* stop
* compactionhistory
All are associated with tests. All tests (both old and new) pass with both the scylla-native and the cassandra nodetool implementation.
Refs: https://github.com/scylladb/scylladb/issues/15588Closesscylladb/scylladb#15649
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement compactionhistory command
tools/scylla-nodetool: implement stop command
mutation/json: extract generic streaming writer into utils/rjson.hh
test/nodetool: rest_api_mock.py: add support for error responses
memtable_snapshot_source starts a background fiber in its constructor,
which compacts LSA memory in a loop. The loop's inside is covered with a
critical alloc section. It also contains a wait on a condition variable
and in its present form the critical section also covers the wait,
effectively turning off allocation failure injection for any test using
the memtable_snapshot_source.
This patch disables the critical alloc section while the loop waits on
the condition variable.
test.py calls `uninstall()` and `stop()` concurrently from exit
artifacts, and `uninstall()` internally calls `stop()`. This leads to
premature releasing of IP addresses from `uninstall()` (returning IPs to
the pool) while the servers using those IPs are still stopping. Then a
server might obtain that IP from the pool and fail to start due to
"Address already in use".
Put a lock around the body of `stop()` to prevent that.
Fixes: scylladb/scylladb#15755Closesscylladb/scylladb#15763
before this change, we print
marshaling error: Value not compatible with type org.apache.cassandra.db.marshal.AsciiType: '...'
but the wording is not quite user friendly, it is a mapping of the
underlying implementation, user would have difficulty understanding
"marshaling" and/or "org.apache.cassandra.db.marshal.AsciiType"
when reading this error message.
so, in this change
1. change the error message to:
Invalid ASCII character in string literal: '...'
which should be more straightforward, and easier to digest.
2. update the test accordingly
please note, the quoted non-ASCII string is preserved instead of
being printed in hex, as otherwise user would not be able to map it
with his/her input.
Refs #14320
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15678
messaging_service.cc depends on idl, but many source files in
scylla-main do no depend on idl, so let's
* move "message/*" into its own directory and add an inter-library
dependency between it and the "idl" library.
* rename the target of "message" under test/manual to "message_test"
to avoid the name collision
this should address the compilation failure of
```
FAILED: CMakeFiles/scylla-main.dir/message/messaging_service.cc.o
/usr/bin/clang++ -DBOOST_NO_CXX98_FUNCTION_BASE -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_BROKEN_SOURCE_LOCATION -DSEASTAR_DEBUG -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/cmake/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/cmake/seastar/gen/include -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -march=westmere -Og -g -gz -std=gnu++20 -fvisibility=hidden -U_FORTIFY_SOURCE -Wno-error=unused-result "-Wno-error=#warnings" -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT CMakeFiles/scylla-main.dir/message/messaging_service.cc.o -MF CMakeFiles/scylla-main.dir/message/messaging_service.cc.o.d -o CMakeFiles/scylla-main.dir/message/messaging_service.cc.o -c /home/kefu/dev/scylladb/message/messaging_service.cc
/home/kefu/dev/scylladb/message/messaging_service.cc:81:10: fatal error: 'idl/join_node.dist.hh' file not found
^~~~~~~~~~~~~~~~~~~~~~~
```
where the compiler failed to find the included `idl/join_node.dist.hh`,
which is exposed by the idl library as part of its public interface.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15657
It seems that Scylla has more values returned by DeleteTable operation than DynamoDB.
In this patch I added a table status check when generating output.
If we delete the table, values KeySchema, AttributeDefinitions and CreationDateTime won't be returned.
The test has also been modified to check that these attributes are not returned.
Fixes scylladb#14132
Closesscylladb/scylladb#15707
this series
1. let sstable tests using test_env to use uuid-based sstable identifiers by default
2. let the test who requires integer-based identifier keep using it
this should enable us to perform the s3 related test after enforcing the uuid-based identifier for s3 backend, otherwise the s3 related test would fail as it also utilize `test_env`.
Closesscylladb/scylladb#14553
* github.com:scylladb/scylladb:
test: set use_uuid to true by default in sstables::test_env
test: enable test to set uuid_sstable_identifiers
A memtable object contains two logalloc::allocating_section members
that track memory allocation requirements during reads and writes.
Because these are local to the memtable, each time we seal a memtable
and create a new one, these statistics are forgotten. As a result
we may have to re-learn the typical size of reads and writes, incurring
a small performance penalty.
The solution is to move the allocating_section object to the memtable_list
container. The workload is the same across all memtables of the same
table, so we don't lose discrimination here.
The performance penalty may be increased later if log changes to
memory reserve thresholds including a backtrace, so this reduces the
odds of incurring such a penalty.
Closesscylladb/scylladb#15737
pytest changes the test's sys.stdout and sys.stderr to the
captured fds when it captures the outputs of the test. so we
are not able to get the STDOUT_FILENO and STDERR_FILENO in C
by querying `sys.stdout.fileno()` and `sys.stderr.fileno()`.
their return values are not 1 and 2 anymore, unless pytest
is started with "-s".
so, to ensure that we always redirect the child process's
outputs to the log file. we need to use 1 and 2 for accessing
the well-known fds, which are the ones used by the child
process, when it writes to stdout and stderr.
this change should address the problem that the log file is
always empty, unless "-s" is specified.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15560
This is a follow-up for #15279 and it fixes two problems.
First, we restore flushes on writes for the tables that were switched to the schema commitlog if `SCHEMA_COMMITLOG` feature is not yet enabled. Otherwise durability is not guaranteed.
Second, we address the problem with truncation records, which could refer to the old commitlog if any of the switched tables were truncated in the past. If the node crashes later, and we replay schema commitlog, we may skip some mutations since their `replay_position`s will be smaller than the `replay_position`s stored for the old commitlog in the `truncated` table.
It turned out that this problem exists even if we don't switch commitlogs for tables. If the node was rebooted the segment ids will start from some small number - they use `steady_clock` which is usually bound to boot time. This means that if the node crashed we may skip the mutations because their RPs will be smaller than the last truncation record RP.
To address this problem we delete truncation records as soon as commitlog is replayed. We also include a test which demonstrates the problem.
Fixes#15354Closesscylladb/scylladb#15532
* github.com:scylladb/scylladb:
add test_commitlog
system.truncated: Remove replay_position data from truncated on start
main.cc: flush only local memtables when replaying schema commitlog
main.cc: drop redundant supervisor::notify
system_keyspace: flush if schema commitlog is not available
When populating system keyspace the sstable_directory forgets to create upload/ subdir in the tables' datadir because of the way it's invoked from distributed loader. For non-system keyspaces directories are created in table::init_storage() which is self-contained and just creates the whole layout regardless of what.
This PR makes system keyspace's tables use table::init_storage() as well so that the datadir layout is the same for all on-disk tables.
Test included.
fixes: #15708closes: scylladb/scylla-manager#3603Closesscylladb/scylladb#15723
* github.com:scylladb/scylladb:
test: Add test for datadir/ layout
sstable_directory: Indentation fix after previous patch
db,sstables: Move storage init for system keyspace to table creation
Fixes#14870
(Originally suggested by @avikivity). Use commit log stored GC clock min positions to narrow compaction GC bounds.
(Still requires augmented manual flush:es with extensive CL clearing to pass various dtest, but this does not affect "real" execution).
Adds a lowest timestamp of GC clock whenever a CF is added to a CL segment the first time. Because GC clock is wall
clock time and only connected to TTL (not cell/row timestamps), this gives a fairly accurate view of GC low bounds
per segment. This is then (in a rather ugly way) propagated to tombstone_gc_state to narrow the allowed GC bounds for
a CF, based on what is currently left in CL.
Note: this is a rather unoptimized version - no caching or anything. But even so, should not be excessively expensive,
esp. since various other code paths already cache the results.
Closesscylladb/scylladb#15060
* github.com:scylladb/scylladb:
main/cql_test_env: Augment compaction mgr tombstone_gc_state with CL GC info
tombstone_gc_state: Add optional callback to augment GC bounds
commitlog: Add keeping track of approximate lowest GC clock for CF entries
database: Force new commitlog segment on user initiated flush
commitlog: Add helper to force new active segment
Check that commitlog provides durability in case
of a node reboot:
* truncate table T, truncation_record RP=1000;
* clean shutdown node/reboot machine/restart node, now RP=~0
since segment ids count from boot time;
* write some data to T; crash/restart
* check data is retained
In PR #15279 we removed flushes when writing to a number
of tables from the system keyspace. This was made possible
by switching these tables to the schema commitlog.
Schema commitlog is enabled only when the SCHEMA_COMMITLOG
feature is supported by all nodes in the cluster. Before that
these tables will use the regular commitlog, which is not
durable because it uses db::commitlog::sync_mode::PERIODIC. This
means that we may lose data if a node crashes during upgrade
to the version with schema commitlog.
In this commit we fix this problem by restoring flushes
after writes to the tables if the schema commitlog
is not enabled yet.
The patch also contains a test that demonstrates the
problem. We need flush_schema_tables_after_modification
option since otherwise schema changes are not durable
and node fails after restart.
You can now pass `expected_error` to `ManagerClient.decommission_node`
and `ManagerClient.remove_node`. Useful in combination with error
injections, for example.
Closesscylladb/scylladb#15650
Fixes#14870 (yet another alternative solution)
(Originally suggested by @avikivity). Use store GC clock min positions from CL
to narrow compaction GC bounds.
Note: not optimized with caches or anything at this point. Can easily be added
though of course always somewhat risky.
This PR contains several refactoring, related to truncation records handling in `system_keyspace`, `commitlog_replayer` and `table` clases:
* drop map_reduce from `commitlog_replayer`, it's sufficient to load truncation records from the null shard;
* add a check that `table::_truncated_at` is properly initialized before it's accessed;
* move its initialization after `init_non_system_keyspaces`
Closesscylladb/scylladb#15583
* github.com:scylladb/scylladb:
system_keyspace: drop truncation_record
system_keyspace: remove get_truncated_at method
table: get_truncation_time: check _truncated_at is initialized
database: add_column_family: initialize truncation_time for new tables
database: add_column_family: rename readonly parameter to is_new
system_keyspace: move load_truncation_times into distributed_loader::populate_keyspace
commitlog_replayer: refactor commitlog_replayer::impl::init
system_keyspace: drop redundant typedef
system_keyspace: drop redundant save_truncation_record overload
table: rename cache_truncation_record -> set_truncation_time
system_keyspace: get_truncated_position -> get_truncated_positions
The estimation assumes that size of other components are irrelevant,
when estimating the number of partitions for each output sstable.
The sstables are split according to the data file size, therefore
size of other files are irrelevant for the estimation.
With certain data models, like single-row partitions containing small
values, the index could be even larger than data.
For example, assume index is as large as data, then the estimation
would say that 2x more sstables will be generated, and as a result,
each sstable are underestimated to have 2x less keys.
Fix it by only accounting size of data file.
Fixes#15726.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#15727
The test checks that
- for non-system keyspace datadir and its staging/ and upload/ subdirs
are created when the table is created _and_ that the directory is
re-populated on boot in case it was explicitly removed
- for system non-virtual tables it checks that the same directory layout
is created on boot
- for system virtual tables it checks that the directory layout doesn't
exist
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently, when the test is executed too quickly, the timestamp
insterted into the 'my_table' table might be the same as the
timestamp used in the SELECT statement for comparison. However,
the statement only selects rows where the inserted timestamp
is strictly lower than current timestamp. As a result, when this
comparison fails, we may skip executing the following comparison,
which uses a user-defined function, due to which the statement
is supposed to fail with an error. Instead, the select statement
simply returns no rows and the test case fails.
To fix this, simply use the less or equal operator instead
of using the strictly less operator for comparing timestamps.
Fixes#15616Closesscylladb/scylladb#15699
This is in order to aid investigation of falkiness of the test, which
fails due to a timeout during scan after cluster restart in debug mode.
See #14746.
I enable trace-level logging for some scylla-side loggers and
inject logging of sent and received messages on the driver side.
Closesscylladb/scylladb#15696
test_storage_service_keyspace_cleanup_with_no_owned_ranges
from test_storage_service.py creates snapshots with tags based
on current time. Thus if a test runs on the same node twice
with time distance short enough, there may be a name collision
between the snapshots from two runs. This will cause the second
run to fail on assertions.
Use new_test_snapshot fixture to drop snapshots after the test.
Delete my_snapshot_tags as it's no longer necessary.
Fixes: #15680.
Closesscylladb/scylladb#15683
In 20ff2ae5e1 mutating endpoints were
changed to use PUT. But some of them return a response, and I forgot to
provide `response_type` parameter to `put_json` (which causes
`RESTClient` to actually obtain the response). These endpoints now
return `None`.
Fix this.
Closesscylladb/scylladb#15674
The logger is not thread safe, so a multithreaded test can concurrently
write into the log, yielding unreadable XMLs.
Example:
boost/sstable_directory_test: failed to parse XML output '/scylladir/testlog/x86_64/release/xml/boost.sstable_directory_test.sstable_directory_shared_sstables_reshard_correctly.3.xunit.xml': not well-formed (invalid token): line 1, column 1351
The critical (today's unprotected) section is in boost/test/utils/xml_printer.hpp:
```
inline std::ostream&
operator<<( custom_printer<cdata> const& p, const_string value )
{
*p << BOOST_TEST_L( "<![CDATA[" );
print_escaped_cdata( *p, value );
return *p << BOOST_TEST_L( "]]>" );
}
```
The problem is not restricted to xml, but the unreadable xml file caused
the test to fail when trying to parse it, to present a summary.
New thread-safe variants of BOOST_REQUIRE and BOOST_REQUIRE_EQUAL are
introduced to help multithreaded tests. We'll start patching tests of
sstable_directory_test that will call BOOST_REQUIRE* from multiple
threads. Later, we can expand its usage to other tests.
Fixes#15654.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#15655
for better coverage of uuid-based sstable identifier. since this
option is enabled by default, this also match our tests with the
default behavior of scylladb.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
some of the tests are still relying on the integer-based sstable
identifier, so let's add a method to test_env, so that the tests
relying on this can opt-out. we will change the default setting
of sstables::test_env to use uuid-base sstable identifier in the
next commit. this change does not change the existing behavior.
it just adds a new knob to test_env_config. and let the tests
relying on this to customize the test_env_config to disable
use_uuid.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Make handlers return their results directly, without wrapping them into
`aiohttp.web.Response`s. Instead the wrapping is done in a generic way
when defining the routes.
Some endpoint handlers return JSON, some return text, some return empty
responses.
Reduce the number of different handler types by making the text case a
subcase of the JSON case. This also simplifies some code on the
`ManagerClient` side, which would have to deserialize data from text
(because some endpoint handlers would serialize data into text for no
particular reason). And it will allow reducing boilerplate in later
commits even further.
Exceptions flying from `RESTClient` (which used to communicate with
Scylla's REST API) are in fact not `RuntimeException`s, they are
`HTTPError`s (a type defined in the `rest_client` module). So they would
just fly through our catch branches, and the additional info (such as
log file path) would not be attached. Fix this.
Some of them are already done that way, so turn the rest to asserts as
well for consistency and code terseness, instead of using the `if` +
`raise` pattern.
Since 2f84e820fd it is possible to return errors from
`ScyllaClusterManager` handlers through exceptions without losing the
contents of these exceptions (the contents arrive at `ManagerClient` and
the test can inspect them, unlike in the past where the client would get
a generic `InternalServerError`)
Change all handlers to return errors through exceptions (like some
already do) and get rid of the `ActionReturn` boilerplate.
When checking for `self.cluster`, do it through assertions, like most of
the handlers already do, instead of using the `if` + `raise` pattern.
`ScyllaClusterManager` registers a bunch of HTTP endpoints which
`ManagerClient` uses to perform operations on a cluster during
a topology test.
The endpoints were inconsistently using verbs, like using GET for
endpoints that would have side effects. Use PUT for these.