There's one place where a test case needs the storage proxy and currently
gets it via a global reference. Time to switch it to cql_test_env's one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All sharded<> services are created by cql_test_env on the stack. The
cql_test_env() is then used to keep references on some of them and to
export them to test cases via its methods. Proxy is missing on that
exportable list, but will be needed, so add one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The tests in question use the MINIO_SERVER_ADDRESS environment variable to export the minio server address from pylib to test cases. They also use a hard-coded public bucket name. Both play badly with AWS S3: the former because of MINIO_... in its name, the latter because the public bucket name can be anything.
So this PR puts address and public bucket name into S3_..._FOR_TEST environment variables and fixes output stream closure on failure while at it.
Detached from #13493
Closes #13546
* github.com:scylladb/scylladb:
s3/test: Rename MINIO_SERVER_ADDRESS environment variable
s3/test: Keep public bucket name in environment
s3/test: Fix upload stream closure
test/lib: Add getenv_safe() helper
This `with` context is supposed to disable, then re-enable
autocompaction for the given keyspaces, but it used the wrong API for
it, it used the column_family/autocompaction API, which operates on
column families, not keyspaces. This oversight led to a silent failure
because the code didn't check the result of the request.
Both are fixed in this patch:
* switch to use `storage_service/auto_compaction/{keyspace}` endpoint
* check the result of the API calls and report errors as exceptions
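A minimal Python sketch of the fixed context manager follows. Only the `storage_service/auto_compaction/{keyspace}` path comes from this patch; the `send(method, path)` callable is a hypothetical stand-in for the test's REST client, and the choice of DELETE to disable and POST to re-enable is an assumption to verify against the API definition.

```python
from contextlib import contextmanager


@contextmanager
def no_autocompaction(send, keyspaces):
    """Disable autocompaction for `keyspaces`, re-enable it on exit.

    `send(method, path)` is a hypothetical callable that performs the
    REST request and returns the HTTP status code.
    """
    def call(method, keyspace):
        path = f"storage_service/auto_compaction/{keyspace}"
        status = send(method, path)
        # Report failures instead of silently ignoring them.
        if status != 200:
            raise RuntimeError(f"{method} {path} failed with status {status}")

    for keyspace in keyspaces:
        call("DELETE", keyspace)  # assumed: DELETE disables autocompaction
    try:
        yield
    finally:
        for keyspace in keyspaces:
            call("POST", keyspace)  # assumed: POST re-enables it
```

With this shape, a request against a wrong or missing endpoint surfaces as an exception at the `with` statement rather than as a test that quietly runs with autocompaction still enabled.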
Fixes: #13553
Closes #13568
server to see other servers after start/restart
When starting/restarting a server, provide a way to wait for the server
to see at least n other servers.
Also leave the implementation methods available for manual use and
update previous tests, one to wait for a specific server to be seen, and
one to wait for a specific server to not be seen (down).
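The waiting logic can be sketched as a polling loop with a deadline. This is an illustration only: `wait_for_seen`, `get_alive`, and the default timings are hypothetical names and values, not the actual pylib API.

```python
import asyncio
import time


async def wait_for_seen(get_alive, expected, timeout=5.0, period=0.01):
    """Poll until the server reports at least `expected` other servers.

    `get_alive` is a hypothetical coroutine function returning the list
    of servers that the freshly started server currently sees.
    """
    deadline = time.monotonic() + timeout
    while True:
        alive = await get_alive()
        if len(alive) >= expected:
            return alive
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"saw {len(alive)} servers, expected at least {expected}")
        await asyncio.sleep(period)
```

Waiting for a specific server to appear or disappear is the same loop with a different predicate on the returned list.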
Fixes #13147
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes #13438
The logger instance was removed in a previous commit but it is used in
the wrapper helper. Add it back.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
This short PR fixes a bug in SUM() aggregation where if the data contains +Inf and -Inf the returned sum should be NaN but we returned an error instead. This is a recent regression uncovered by a dtest (see issue #13551), but in the first patch we add additional tests in the cql-pytest framework which reproduce this bug and explore various other areas (wrongly) implicated by the failing dtest.
Fixes #13551
Closes #13564
* github.com:scylladb/scylladb:
cql3: allow SUM() aggregation to result in a NaN
test/cql-pytest: add tests for data casts and inf in sums
The pylib minio code uses it to export the minio address to tests. This
causes unneeded confusion when running the test over AWS S3, so it's
better to rename the variable so that it doesn't mention MINIO at all.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Local test.py runs minio with the public 'testbucket' bucket and all
test cases know that. This series adds an ability to run tests over real
S3 so the bucket name should be configurable.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If a multipart upload fails for some reason, the output stream is left
unclosed and the respective assertion masks the original failure.
Fix that by closing the stream in all cases.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The helper is like ::getenv() but checks that the variable exists and
throws a descriptive exception if it doesn't. So instead of
fatal error: in "...": std::logic_error: basic_string: construction from null is not valid
one could get something like
fatal error: in "...": std::logic_error: Environment variable ... not set
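For readers more at home in the Python side of the test suite, the same idea can be sketched as below. This is an analogue of the C++ helper, not the helper itself; only the error text mirrors the message shown above.

```python
import os


def getenv_safe(name):
    """Return the value of environment variable `name`.

    Unlike a bare lookup, a missing variable produces an error that
    names the variable instead of failing later on a null value.
    """
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"Environment variable {name} not set")
    return value
```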
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When floating-point data contains +Inf and -Inf, the sum is NaN.
Our SUM() aggregation calculated this sum correctly, but then instead
of returning it, complained that the sum overflowed by narrowing.
This was a false positive: The sum() finalizer wanted to test that no
precision was lost when casting the accumulator to the result type,
so checked that the result before and after the cast are the same.
But specifically for NaN, it is never equal to anything - not even
to itself. This check is wrong for floating point, but moreover -
isn't even necessary when the two types (accumulator type and result
type) are identical so in this patch we skip it in this case.
Note that in the current code, a different accumulator and result type
is only used in the case of integer types; When accumulating floating
point sums, the same type is used, so the broken check will be avoided.
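The false positive is easy to reproduce outside Scylla. The sketch below models the finalizer in Python; `finalize_sum` and `to_int32` are hypothetical names, and the 32-bit wrap merely stands in for a narrowing cast.

```python
import math


def to_int32(value):
    """Wrap an integer into the signed 32-bit range, like a narrowing cast."""
    return (value + 2**31) % 2**32 - 2**31


def finalize_sum(acc, narrow=None):
    """Return the final sum, checking for loss only when a cast happens.

    `narrow` is None when accumulator and result types are identical,
    in which case the round-trip check is skipped entirely.
    """
    if narrow is None:
        return acc
    result = narrow(acc)
    if result != acc:
        raise OverflowError("sum overflowed by narrowing")
    return result


total = math.inf + -math.inf   # NaN
# total != total is True, so an unconditional round-trip check
# (narrow=lambda x: x) would raise even though nothing was lost.
```

Calling `finalize_sum(total)` returns the NaN, while `finalize_sum(2**31, to_int32)` still reports a genuine integer overflow.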
The test for this issue starts to pass with this patch, so the xfail
tag is removed.
Fixes #13551
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds tests to reproduce issue #13551. The issue, discovered
by a dtest (cql_cast_test.py), claimed that either cast() or sum(cast())
from varint type broke. So we add two tests in cql-pytest:
1. A new test file, test_cast_data.py, for testing data casts (a
CAST (...) as ... in a SELECT), starting with testing casts from
varint to other types.
The test uncovers a lot of interesting cases (it is heavily
commented to explain these cases) but nothing there is wrong
and all tests pass on Scylla.
2. An xfailing test for sum() aggregate of +Inf and -Inf. It turns out
that this caused #13551. In Cassandra and older Scylla, the sum
returned a NaN. In Scylla today, it generates a misleading
error message.
As usual, the tests were run on both Cassandra (4.1.1) and Scylla.
Refs #13551.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Such namespace-wide imports can create conflicts between names that
are the same in seastar and std, such as {std,seastar}::future and
{std,seastar}::format, since we also have 'using namespace seastar'.
Replace the namespace imports with explicit qualification, or with
specific name imports.
Closes #13528
Currently, if index_node throws when trying to
add an already indexed node, pop_node might
unindex the existing node instead of the new one.
Instead, with this change, unindex_node looks up
the node by its pointer and removes it from the
index map only if it's found there, so as to clean up
safely after index_node throws (at any stage).
Add a unit test to verify that.
In addition, added a unit test to reproduce #13502 and test the fix.
Closes #13512
* github.com:scylladb/scylladb:
test: locator_topology: add test_update_node
topology: add_node, unindex_node: make exception safe
this is a part of a series to migrate from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print the following classes without the help of `operator<<`.
- partition_key_view
- partition_key
- partition_key::with_schema_wrapper
- key_with_schema
- clustering_key_prefix
- clustering_key_prefix::with_schema_wrapper
the corresponding `operator<<()` are dropped in this change,
as all their callers are now using fmtlib for formatting. the helper
`print_key()` is removed, as its only caller is
`operator<<(std::ostream&, const
clustering_key_prefix::with_schema_wrapper&)`.
the reason why all these operators are replaced in one go is that
we have a template function of `key_to_str()` in `db/large_data_handler.cc`.
this template function is actually the caller of operator<< of
`partition_key::with_schema_wrapper` and
`clustering_key_prefix::with_schema_wrapper`.
so, in order to drop either of these two operator<<, we need to remove
both of them, so that we can switch over to `fmt::to_string()` in this
template function.
Refs scylladb#13245
Closes #13513
* github.com:scylladb/scylladb:
keys: consolidate the formatter for partition_keys
keys: specialize fmt::formatter<partition_key> and friends
Currently, if index_node throws when trying to
add an already indexed node, pop_node might
unindex the existing node instead of the new one.
Instead, with this change, unindex_node looks up
the node by its pointer and removes it from the
index map only if it's found there, so as to clean up
safely after index_node throws (at any stage).
Add a unit test to verify that.
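The exception-safety rule above can be modelled in a few lines of Python. The class and its single `host_id` index are hypothetical simplifications of the real topology; the point is only that cleanup compares node identity before erasing anything.

```python
class Node:
    def __init__(self, host_id):
        self.host_id = host_id


class Topology:
    """Toy model with a single index map keyed by host id."""

    def __init__(self):
        self._by_host_id = {}

    def index_node(self, node):
        if node.host_id in self._by_host_id:
            raise ValueError(f"node {node.host_id} is already indexed")
        self._by_host_id[node.host_id] = node

    def unindex_node(self, node):
        # Remove the entry only if it refers to this very node object,
        # so cleaning up after a failed index_node() never drops the
        # node that was already there.
        if self._by_host_id.get(node.host_id) is node:
            del self._by_host_id[node.host_id]

    def add_node(self, node):
        try:
            self.index_node(node)
        except Exception:
            self.unindex_node(node)
            raise

    def find(self, host_id):
        return self._by_host_id.get(host_id)
```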
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
scylla-sstable currently has two ways to obtain the schema:
* via a `schema.cql` file.
* load schema definition from memory (only works for system tables).
This meant that in most cases it was necessary to export the schema into CQL format and write it to a file. This is very flexible: the sstable can be inspected anywhere, it doesn't have to be on the same host where it originates from. Yet in many cases the sstable is inspected on the same host where it originates from. In these cases, the schema is readily available in the schema tables on disk and it is plain annoying to have to export it into a file, just to quickly inspect an sstable file.
This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable. If that fails, the tool checks the usual locations of the scylla data dir and the scylla configuration file, and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a schema.cql is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override.
If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong.
A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes.
This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag; the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. I don't think this will break anybody's workflow as this tool is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change.
Example:
```
$ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db
{"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}}
```
As seen above, subdirectories like quarantine, staging, etc. are also supported.
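To illustrate the path-based part of the auto-detection, here is a rough Python sketch that recovers the data directory, keyspace, and table from an sstable path. It assumes the `<data dir>/<keyspace>/<table>-<32 hex digits>/...` layout visible in the example above; `locate_table` is a hypothetical name and the tool's real logic may differ.

```python
import re
from pathlib import Path

# Table directories look like "<table>-<32 hex digits>" in the example.
TABLE_DIR = re.compile(r"^(?P<table>.+)-(?P<uuid>[0-9a-f]{32})$")


def locate_table(sstable_path):
    """Return (data_dir, keyspace, table) or None if the layout is unknown.

    Walks up from the sstable file, so it also works for sstables in
    subdirectories such as quarantine or staging.
    """
    for parent in Path(sstable_path).parents:
        match = TABLE_DIR.match(parent.name)
        if match:
            keyspace_dir = parent.parent
            return keyspace_dir.parent, keyspace_dir.name, match.group("table")
    return None
```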
Fixes: https://github.com/scylladb/scylladb/issues/10126
Closes #13448
* github.com:scylladb/scylladb:
test/cql-pytest: test_tools.py: add tests for schema loading
test/cql-pytest: add no_autocompaction_context
docs: scylla-sstable.rst: remove accidentally added copy-pasta
docs: scylla-sstable.rst: remove paragraph with schema limitations
docs: scylla-sstable.rst: update schema section
test/cql-pytest: nodetool.py: add flush_keyspace()
tools/scylla-sstable: reform schema loading mechanism
tools/schema_loader: add load_schema_from_schema_tables()
db/schema_tables: expose types schema
Reproducers for https://github.com/scylladb/scylladb/issues/10770.
(Already fixed in 15ebd59071)
Includes necessary improvements and fixes to `pylib`.
Closes #12699
* github.com:scylladb/scylladb:
test/pytest: reproducers for store mutation...
test: pylib: Add a way to create cql connections with particular coordinators
test/pylib: get gossiper alive endpoints
test/topology: default replication factor 3
test/pylib: configurable replication factor
before this change, we just printed out the addresses of the elements
in `column_defs` if the arguments passed to the `token()` function were
not valid. this is not very helpful from the user's perspective, as
users would be more interested in the values. also, we could print
a more accurate error message for each kind of error.
in this change, following Cassandra 4.1's behavior, three cases are
identified, and corresponding errors are returned respectively:
* duplicated partition keys
* wrong order of partition key
* missing keys
where, if the partition key order is wrong, instead of printing the
keys specified by user, the correct order is printed in the error
message for helping user to correct the `token()` function.
for better performance, the checks are performed only if the keys
do not match, based on the assumption that the error handling path
is not likely to be executed.
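The three cases can be sketched as follows. The function name and the error strings are hypothetical (the real messages follow Cassandra 4.1), and the sketch assumes the arguments name partition key columns only.

```python
def check_token_args(partition_key, args):
    """Validate the column names passed to token().

    `partition_key` is the list of partition key columns in schema
    order; `args` is what the user wrote inside token(...).
    """
    partition_key, args = list(partition_key), list(args)
    # Fast path: the detailed checks run only when the keys don't match.
    if args == partition_key:
        return
    duplicates = sorted({a for a in args if args.count(a) > 1})
    if duplicates:
        raise ValueError(
            f"duplicated partition key columns: {', '.join(duplicates)}")
    missing = [c for c in partition_key if c not in args]
    if missing:
        raise ValueError(
            f"missing partition key columns: {', '.join(missing)}")
    # Same columns, different order: print the correct order, not the
    # user's, so the query can be fixed by copying it.
    raise ValueError(
        f"partition key columns must be ordered as: {', '.join(partition_key)}")
```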
tests are added accordingly. they were also tested with Cassandra 4.1.1.
Fixes #13468
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13470
Description of storage options is important for S3, as one
needs to know if underlying storage is either local or
remote, and if the latter, details about it.
This relies on server-side desc statement.
$ ./bin/cqlsh.py -e "describe keyspace1;"
CREATE KEYSPACE keyspace1 WITH replication = { ... } AND
storage = {'type': 'S3', 'bucket': 'sstables',
'endpoint': '127.0.0.1:9000'} AND
durable_writes = true;
Fixes #13507.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #13510
enable_optimized_twcs_queries is specific to TWCS, therefore it
belongs to TWCS, not replica::table.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #13489
this is a part of a series to migrate from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print the following classes without the help of `operator<<`.
- partition_key_view
- partition_key
- partition_key::with_schema_wrapper
- key_with_schema
- clustering_key_prefix
- clustering_key_prefix::with_schema_wrapper
the corresponding `operator<<()` are dropped in this change,
as all their callers are now using fmtlib for formatting. the helper
`print_key()` is removed, as its only caller is
`operator<<(std::ostream&, const
clustering_key_prefix::with_schema_wrapper&)`.
the reason why all these operators are replaced in one go is that
we have a template function of `key_to_str()` in `db/large_data_handler.cc`.
this template function is actually the caller of operator<< of
`partition_key::with_schema_wrapper` and
`clustering_key_prefix::with_schema_wrapper`.
so, in order to drop either of these two operator<<, we need to remove
both of them, so that we can switch over to `fmt::to_string()` in this
template function.
Refs scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
with schema change and host down
Reproducers for a failure during an LWT operation due to a missing
column mapping in the schema history table.
Issue #10770
For most tests there will be nodes down, increase replication factor to
3 to avoid having problems for partitions belonging to down nodes.
Use replication factor 1 for raft upgrade tests.
Inactive readers should only be evicted to free up resources for waiting
readers. Evicting them when waiters are not admitted for any other
reason than resources is wasteful and leads to extra load later on when
these evicted readers have to be recreated and requeued.
This patch changes the logic on both the registering path and the
admission path to not evict inactive readers unless there are readers
actually waiting on resources.
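The rule boils down to a predicate over the reasons the queued readers are waiting for. The sketch below is a deliberately tiny model: `WaitReason` and its members are hypothetical and do not mirror the semaphore's real bookkeeping.

```python
from enum import Enum, auto


class WaitReason(Enum):
    RESOURCES = auto()  # waiting for memory or count resources
    OTHER = auto()      # hypothetical catch-all for any other reason


def should_evict_inactive(waiter_reasons):
    """Evict inactive readers only if someone is waiting on resources."""
    return any(reason is WaitReason.RESOURCES for reason in waiter_reasons)
```

An empty wait queue, or one where every waiter is blocked for a non-resource reason, leaves the inactive readers alone.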
A unit-test is also added, reproducing the overly-aggressive eviction and
checking that it doesn't happen anymore.
Fixes: #11803
Closes #13286
A set of comprehensive tests covering all the supported ways of providing
the schema to scylla-sstable, either explicitly or implicitly
(auto-detect).
It would have been better if `flush()` could have been called with a
keyspace and optional table param, but changing it now is too much
churn, so we add a dedicated method to flush a keyspace instead.
Task manager task implementations of classes that cover the
rewrite sstables keyspace compaction, which can be started
through the /storage_service/keyspace_compaction/ API.
Top level task covers the whole compaction and creates child
tasks on each shard.
Closes #12714
* github.com:scylladb/scylladb:
test: extend test_compaction_task.py to test rewrite sstables compaction
compaction: create task manager's task for rewrite sstables keyspace compaction on one shard
compaction: create task manager's task for rewrite sstables keyspace compaction
compaction: create rewrite_sstables_compaction_task_impl
This series extends sstable cleanup to resharding and other (offstrategy, major, and regular) compaction types so to:
* cleanup uploaded sstables (#11933)
* cleanup staging sstables after they are moved back to the main directory and become eligible for compaction (#9559)
When perform_cleanup is called, all sstables are scanned, and those that require cleanup are marked as such, and are added for tracking to table_state::cleanup_sstable_set. They are removed from that set once released by compaction.
Along with that sstables set, we keep the owned_ranges_ptr used by cleanup in the table_state to allow other compaction types (offstrategy, major, or regular) to cleanup those sstables that are marked as require_cleanup and that were skipped by cleanup compaction for either being in the maintenance set (requiring offstrategy compaction) or in staging.
Resharding uses a more straightforward mechanism: the owned token ranges are passed when resharding uploaded sstables and are used to detect sstables that require cleanup, which is now piggybacked on the resharding compaction.
Closes #12422
* github.com:scylladb/scylladb:
table: discard_sstables: update_sstable_cleanup_state when deleting sstables
compaction_manager: compact_sstables: retrieve owned ranges if required
sstables: add a printer for shared_sstable
compaction_manager: keep owned_ranges_ptr in compaction_state
compaction_manager: perform_cleanup: keep sstables in compaction_state::sstables_requiring_cleanup
compaction: refactor compaction_state out of compaction_manager
compaction: refactor compaction_fwd.hh out of compaction_descriptor.hh
compaction_manager: compacting_sstable_registration: keep a ref to the compaction_state
compaction_manager: refactor get_candidates
compaction_manager: get_candidates: mark as const
table, compaction_manager: add requires_cleanup
sstable_set: add for_each_sstable_until
distributed_loader: reshard: update sstable cleanup state
table, compaction_manager: add update_sstable_cleanup_state
compaction_manager: needs_cleanup: delete unused schema param
compaction_manager: perform_cleanup: disallow empty sorted_owened_ranges
distributed_loader: reshard: consider sstables for cleanup
distributed_loader: process_upload_dir: pass owned_ranges_ptr to reshard
distributed_loader: reshard: add optional owned_ranges_ptr param
distributed_loader: reshard: get a ref to table_state
distributed_loader: reshard: capture creator by ref
distributed_loader: reshard: reserve num_jobs buckets
compaction: move owned ranges filtering to base class
compaction: move owned_ranges into descriptor
Alternator's implementation of TagResource, UntagResource and UpdateTimeToLive (the latter uses tags to store the TTL configuration) was unsafe for concurrent modifications - some of these modifications may be lost. This short series fixes the bug, and also adds (in the last patch) a test that reproduces the bug and verifies that it's fixed.
The cause of the incorrect isolation was that we separately read the old tags and wrote the modified tags. In this series we introduce a new function, `modify_tags()` which can do both under one lock, so concurrent tag operations are serialized and therefore isolated as expected.
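The difference between the two approaches can be shown with a small Python model. `TagStore` and its in-memory dictionary are hypothetical; only the idea of a `modify_tags()` that reads and writes under one lock comes from this series.

```python
import threading


class TagStore:
    """Toy in-memory tag storage keyed by table name."""

    def __init__(self):
        self._tags = {}
        self._lock = threading.Lock()

    def modify_tags(self, table, modify):
        """Read, modify, and write back the tags as one isolated step.

        `modify` receives a mutable dict of the current tags. Holding
        the lock across the whole read-modify-write means a concurrent
        caller cannot overwrite this update with a stale copy.
        """
        with self._lock:
            tags = dict(self._tags.get(table, {}))
            modify(tags)
            self._tags[table] = tags
            return dict(tags)
```

A TagResource-like operation then becomes `store.modify_tags(table, lambda tags: tags.update(new_tags))`, and an UntagResource-like one pops keys inside the same callback.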
Fixes #6389.
Closes #13150
* github.com:scylladb/scylladb:
test/alternator: test concurrent TagResource / UntagResource
db/tags: drop unsafe update_tags() utility function
alternator: isolate concurrent modification to tags
db/tags: add safe modify_tags() utility functions
migration_manager: expose access to storage_proxy
This is a translation of Cassandra's CQL unit test source file
validation/operations/DeleteTest.java into our cql-pytest framework.
There are 51 tests, and they did not reproduce any previously-unknown
bug, but did provide additional reproducers for three known issues:
Refs #4244 Add support for mixing token, multi- and single-column
restrictions
Refs #12474 DELETE prints misleading error message suggesting ALLOW
FILTERING would work
Refs #13250 one-element multi-column restriction should be handled like
a single-column restriction
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #13436
The PR adds sstables storage backend that keeps all component files as S3 objects and system.sstables_registry ownership table that keeps track of what sstables objects belong to local node and their names.
When a keyspace is configured with 'STORAGE = { 'type': 'S3' }', the respective class table object eventually gets the storage_options instance pointing to the target S3 endpoint and bucket. All the sstables created for that table attach the S3 storage implementation that maintains components' files as S3 objects. Writing to and reading from components is handled by the S3 client facilities from utils/. Changing the sstable state, that is, moving between normal, staging and quarantine states, is not yet implemented, but would eventually happen by updating entries in the sstables registry.
To keep track of which node owns which objects, to provide bucket-wide uniqueness of object names and to maintain sstable state the storage driver keeps records in the system.sstables_registry ownership table. The table maps sstable location and generation to the object format, version, status-state (*) and (!) unique identifier (some time soon this identifier is supposed to be replaced with UUID sstables generations). The component object name is thus s3://bucket/uuid/component_basename. The registry is also used on boot. The distributed loader picks up sstables from all the tables found in schema and for S3-backed keyspaces it lists entries in the registry to a) identify those and b) get their unique S3-side identifiers to open by name.
(*) About sstable's status and state.
The state field is the part of today's sstable path on disk -- staging, quarantine, normal (root table data dir), etc. Since S3 doesn't have the renaming facility, moving sstable between those states is only possible by updating the entry in the registry. This is not yet implemented in this set (#13017)
The status field tracks the sstable's transition through its creation and deletion. It starts with the 'creating' status, which corresponds to today's TemporaryTOC file. After being created and written to, the sstable moves into the 'sealed' status, which corresponds to today's normal sstable with the TOC file. To delete an sstable atomically, it first moves into the 'removing' status, which is equivalent to being in the deletion log for an on-disk sstable. Once removed from the bucket, the entry is removed from the registry.
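The status lifecycle described above can be pictured as a small state machine. The Python sketch below is purely illustrative: the function names and the plain dictionary standing in for system.sstables_registry are hypothetical, and only the three status names come from the text.

```python
# Allowed status transitions, taken from the description above.
NEXT_STATUS = {"creating": "sealed", "sealed": "removing"}


def create_entry(registry, key):
    """Register a new sstable; `key` stands for (location, generation)."""
    if key in registry:
        raise KeyError(f"{key} is already registered")
    registry[key] = "creating"


def advance(registry, key):
    """Move an entry to its next status: creating -> sealed -> removing."""
    current = registry[key]
    if current not in NEXT_STATUS:
        raise ValueError(f"no transition from status {current!r}")
    registry[key] = NEXT_STATUS[current]


def finish_removal(registry, key):
    """Drop the entry once the objects are gone from the bucket."""
    if registry[key] != "removing":
        raise ValueError("entry must be in 'removing' status first")
    del registry[key]
```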
To play with:
1. Start minio (installed by install-dependencies.sh)
```
export MINIO_ROOT_USER=${root_user}
export MINIO_ROOT_PASSWORD=${root_pass}
mkdir -p ${root_directory}
minio server ${root_directory}
```
2. Configure minio CLI, create anonymous bucket
```
mc config host rm local
mc config host add local http://127.0.0.1:9000 ${root_user} ${root_pass}
mc mb local/sstables
mc anonymous set public local/sstables
```
3. Start Scylla with object-storage feature enabled
```
scylla ... --experimental-features=keyspace-storage-options --workdir ${as_usual}
```
4. Create KS with S3 storage
```
create keyspace ... storage = { 'type': 'S3', 'endpoint': '127.0.0.1:9000', 'bucket': 'sstables' };
```
The S3 client has a logger named "s3"; it's useful to turn it on with `trace` verbosity.
Closes #12523
* github.com:scylladb/scylladb:
test: Add object-storage test
distributed_loader: Print storage type when populating
sstable_directory: Add ownership table components lister
sstable_directory: Make components_lister and API
sstable_directory: Create components lister based on storage options
sstables: Add S3 storage implementation
system_keyspace: Add ownership table
system_keyspace: Plug to user sstables manager too
sstable: Make storage instance based on storage options
sstable_directory: Keep storage_options aboard
sstable: Virtualize the helper that gets on-disk stats for sstable
sstable, storage: Virtualize data sink making for small components
sstable, storage: Virtualize data sink making for Data and Index
sstable/writer: Shuffle writer::init_file_writers()
sstable: Make storage an API
utils: Add S3 readable file impl for random reads
utils: Add S3 data sink for multipart upload
utils: Add S3 client with basic ops
cql-pytest: Add option to run scylla over stable directory
test.py: Equip it with minio server
sstables: Detach write_toc() helper
Refactor the printing logic in compaction::formatted_sstables_list
out to sstables::to_string(const shared_sstable&, bool include_origin)
and operator<<(const shared_sstable) on top of it.
So that we can easily print std::vector<shared_sstable>
from compaction_manager in the next patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Move the owned_ranges_ptr, currently used only by
cleanup and upgrade compactions, to the generic
compaction descriptor so we apply cleanup in other
compaction types.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
this is a part of a series to migrate from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `auth::auth_authentication_options` and `auth::resource_kind`
without the help of fmt::ostream. and their `operator<<(ostream,..)` are
dropped, as there are no users of them anymore.
Refs #13245
Closes #13460
* github.com:scylladb/scylladb:
auth: remove unused operator<<(.., resource_kind)
auth: specialize fmt::formatter<resource_kind>
auth: remove unused operator<<(.., authentication_option)
auth: specialize fmt::formatter<authentication_option>
The test does the following:
- starts scylla (over stable directory)
- creates S3-backed keyspace (minio is already up and running,
started by test.py)
- creates table in that keyspace and populates it with several rows
- flushes the keyspace to make sstables hit the storage
- checks that the ownership table is populated properly
- restarts scylla
- makes sure old entries exist
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch adds storage options lw-ptr to sstables_manager::make_sstable
and makes the storage instance creation depend on the options. For local
it just creates the filesystem storage instance, for S3 -- throws, but
next patch will fix that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>