This patch adds a couple of simple tests for the USE statement: that
without USE one cannot create a table without explicitly specifying
a keyspace name, and with USE, it is possible.
Beyond testing these specific feature, this patch also serves as an
example of how to write more tests that need to control the effective USE
setting. Specifically, it adds a "new_cql" function that can be used to
create a new connection with a fresh USE setting. This is necessary
in such tests, because if multiple tests use the same cql fixture
and its single connection, they will share their USE setting and there
is no way to undo or reset it after being set.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#11741
There is a flaw in how the raft rpc endpoints are
currently managed. The io_fiber in raft::server
is supposed to first add new servers to rpc, then
send all the messages and then remove the servers
which have been excluded from the configuration.
The problem is that the send_messages function
isn't synchronous, it schedules send_append_entries
to run after all the current requests to the
target server, which can happen
after we have already removed the server from address_map.
In this patch the remove_server function is changed to mark
the server_id as expiring rather than synchronously dropping it.
This means all currently scheduled requests to
that server will still be able to resolve
the ip address for that server_id.
Fixes: #11228Closes#11748
The `add_entry` and `modify_config` methods sometimes do an rpc to
execute the request on the current leader. If the tcp connection
was broken, a `seastar::rpc::closed_error` would be thrown to the client.
This exception was not documented in the method comments and the
client could have missed handling it. For example, this exception
was not handled when calling `modify_config` in `raft_group0`,
which sometimes broke the `removenode` command.
An `intermittent_connection_error` exception was added earlier to
solve a similar problem with the `read_barrier` method. In this patch it
is renamed to `transport_error`, as it seems to better describe the
situation, and an explicit specification for this exception
was added - the rpc implementation can throw it if it is not known
whether the call reached the destination and whether any mutations were made.
In case of `read_barrier` it does not matter and we just retry, in case
of `add_entry` and `modify_config` we cannot retry because of possible mutations,
so we convert this exception to `commit_status_unknown`, which
the client has to handle.
Explicit comments have also been added to `raft::server` methods
describing all possible exceptions.
Closes#11691
* github.com:scylladb/scylladb:
raft_group0: retry modify_config on commit_status_unknown
raft: convert raft::transport_error to raft::commit_status_unknown
modify_config can throw commit_status_unknown in case
of a leader change or when the leader is unavailable,
but the information about it has not yet reached the
current node. In this patch modify_config is run
again after some time in this case.
The add_entry and modify_config methods sometimes do an rpc to
execute the request on the current leader. If the tcp connection
was broken, a seastar::rpc::closed_error would be thrown to the client.
This exception was not documented in the method comments and the
client could have missed handling it. For example, this exception
was not handled when calling modify_config in raft_group0,
which sometimes broke the removenode command.
An intermittent_connection_error exception was added earlier to
solve a similar problem with the read_barrier method. In this patch it
is renamed to transport_error, as it seems to better describe the
situation, and an explicit specification for this exception
was added - the rpc implementation can throw it if it is not known
whether the call reached the target node and whether any
actions were performed on it.
In case of read_barrier it does not matter and we just retry. In case
of add_entry and modify_config we cannot retry
because the rpc calls are not idempotent, so we convert this
exception to commit_status_unknown, which the client has to handle.
Explicit comments have also been added to raft::server methods
describing all possible exceptions.
Yet another user of global qctx object. Making the method(s) non-static requires pushing the system_keyspace all the way down to size_estimate_virtual_reader and a small update of the cql_test_env
Closes#11738
* github.com:scylladb/scylladb:
system_keyspace: Make get_{local|saved}_tokens non static
size_estimates_virtual_reader: Pass sys_ks argument to get_local_ranges()
cql_test_env: Keep sharded<system_keyspace> reference
size_estimate_virtual_reader: Keep system_keyspace reference
system_keyspace: Pass sys_ks argument to install_virtual_readers()
system_keyspace: Make make() non-static
distributed_loader: Pass sys_ks argument to init_system_keyspace()
system_keyspace: Remove dangling forward declaration
do_with() is a sure indicator for coroutinization, since it adds
an allocation (like the coroutine does with its frame). Therefore
translating a function with do_with is at least a break-even, and
usually a win since other continuations no longer allocate.
This series converts most of storage_proxy's function that have
do_with to coroutines. Two remain, since they are not simple
to convert (the do_with() is kept running in the background and
its future is discarded).
Individual patches favor minimal changes over final readability,
and there is a final patch that restores indentation.
The patches leave some moves from coroutine reference parameters
to the coroutine frame, this will be cleaned up in a follow-up. I wanted
this series not to touch headers to reduce rebuild times.
Closes#11683
* github.com:scylladb/scylladb:
storage_proxy: reindent after coroutinization
storage_proxy: convert handle_read_digest() to a coroutine
storage_proxy: convert handle_read_mutation_data() to a coroutine
storage_proxy: convert handle_read_data() to a coroutine
storage_proxy: convert handle_write() to a coroutine
storage_proxy: convert handle_counter_mutation() to a coroutine
storage_proxy: convert query_nonsingular_mutations_locally() to a coroutine
In August 2022, DynamoDB added a "S3 Import" feature, which we don't yet
support - so let's document this missing feature in the compatibility
document.
Refs #11739.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#11740
This series adds support for detecting collections that have too many items
and recording them in `system.large_cells`.
A configuration variable was added to db/config: `compaction_collection_items_count_warning_threshold` set by default to 10000.
Collections that have more items than this threshold will be warned about and will be recorded as a large cell in the `system.large_cells` table. Documentation has been updated respectively.
A new column was added to system.large_cells: `collection_items`.
Similar to the `rows` column in system.large_partition, `collection_items` holds the number of items in a collection when the large cell is a collection, or 0 if it isn't. Note that the collection may be recorded in system.large_cells either due to its size, like any other cell, and/or due to the number of items in it, if it cross the said threshold.
Note that #11449 called for a new system.large_collections table, but extending system.large_cells follows the logic of system.large_partitions is a smaller change overall, hence it was preferred.
Since the system keyspace schema is hard coded, the schema version of system.large_cells was bumped, and since the change is not backward compatible, we added a cluster feature - `LARGE_COLLECTION_DETECTION` - to enable using it.
The large_data_handler large cell detection record function will populate the new column only when the new cluster feature is enabled.
In addition, unit tests were added in sstable_3_x_test for testing large cells detection by cell size, and large_collection detection by the number of items.
Closes#11449Closes#11674
* github.com:scylladb/scylladb:
sstables: mx/writer: optimize large data stats members order
sstables: mx/writer: keep large data stats entry as members
db: large_data_handler: dynamically update config thresholds
utils/updateable_value: add transforming_value_updater
db/large_data_handler: cql_table_large_data_handler: record large_collections
db/large_data_handler: pass ref to feature_service to cql_table_large_data_handler
db/large_data_handler: cql_table_large_data_handler: move ctor out of line
docs: large-rows-large-cells-tables: fix typos
db/system_keyspace: add collection_elements column to system.large_cells
gms/feature_service: add large_collection_detection cluster feature
test: sstable_3_x_test: add test_sstable_too_many_collection_elements
test: lib: simple_schema: add support for optional collection column
test: lib: simple_schema: build schema in ctor body
test: lib: simple_schema: cql: define s1 as static only if built this way
db/large_data_handler: maybe_record_large_cells: consider collection_elements
db/large_data_handler: debug cql_table_large_data_handler::delete_large_data_entries
sstables: mx/writer: pass collection_elements to writer::maybe_record_large_cells
sstables: mx/writer: add large_data_type::elements_in_collection
db/large_data_handler: get the collection_elements_count_threshold
db/config: add compaction_collection_elements_count_warning_threshold
test: sstable_3_x_test: add test_sstable_write_large_cell
test: sstable_3_x_test: pass cell_threshold_bytes to large_data_handler
test: sstable_3_x_test: large_data_handler: prepare callback for testing large_cells
test: sstable_3_x_test: large_data tests: use BOOST_REQUIRE_[GL]T
test: sstable_3_x_test: test_sstable_log_too_many_rows: use tests::random
What's contained in this series:
- Refactored compaction tests (and utilities) for integration with multiple groups
- The idea is to write a new class of tests that will stress multiple groups, whereas the existing ones will still stress a single group.
- Fixed a problem when cloning compound sstable set (cannot be triggered today so I didn't open a GH issue)
- Many changes in replica::table for allowing integration with multiple groups
Next:
- Introduce for_each_compaction_group() for iterating over groups wherever needed.
- Use for_each_compaction_group() in replica::table operations spanning all groups (API, readers, etc).
- Decouple backlog tracker from compaction strategy, to allow for backlog isolation across groups
- Introduce static option for defining number of compaction groups and implement function to map a token to its respective group.
- Testing infrastructure for multiple compaction groups (helpful when testing the dynamic behavior: i.e. merging / splitting).
Closes#11592
* github.com:scylladb/scylladb:
sstable_resharding_test: Switch to table_for_tests
replica: Move compacted_undeleted_sstables into compaction group
replica: Use correct compaction_group in try_flush_memtable_to_sstable()
replica: Make move_sstables_from_staging() robust and compaction group friendly
test: Rename column_family_for_tests to table_for_tests
sstable_compaction_test: Use column_family_for_tests::as_table_state() instead
test: Don't expose compound set in column_family_for_tests
test: Implement column_family_for_tests::table_state::is_auto_compaction_disabled_by_user()
sstable_compaction_test: Merge table_state_for_test into column_family_for_tests
sstable_compaction_test: use table_state_for_test itself in fully_expired_sstables()
sstable_compaction_test: Switch to table_state in compact_sstables()
sstable_compaction_test: Reduce boilerplate by switching to column_family_for_tests
Instead of `test.py.log`, use:
`test.py.dev.log`
when running with `--mode dev`,
`test.py.dev-release.log`
when running with `--mode dev --mode release`,
and so on.
This is useful in Jenkins which is running test.py multiple times in
different modes; a later run would overwrite a previous run's test.py
file. With this change we can preserve the test.py files of all of these
runs.
Closes#11678
The test was added recently and since then causes CI failures.
We suspect that it happens if the node being removed was the Raft group
0 leader. The removenode coordinator tries to send to it the
`remove_from_group0` request and fails.
A potential fix is in review: #11691.
Now all callers have system_keyspace reference at hand. This removes one
more user of the global qctx object
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This method static calls system_keyspace::get_local_tokens(). Having the
system_keyspace reference will make this method non-static
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a test_get_local_ranges() call in size-estimate reader which
will need system keyspace reference. There's no other place for tests to
get it from but the cql_test_env thing
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The s._e._v._reader::fill_buffer() method needs system keyspace to get
node's local tokens. Now it's a static method, having system_keyspace
reference will make it non-static
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The size-estimate-virtual-reader will need it, now it's available as
"this" from system_keyspace::make() method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This helper needs system_keyspace reference and using "this" as this
looks natural. Also this de-static-ification makes it possible to put
some sense into the invoke_on_all() call from init_system_keyspace()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's final destination is virtual tabls registration code called from
init_system_keyspace() eventually
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It doesn't match the real system_keyspace_make() definition and is in
fact not needed, as there's another "real" one in database.hh
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Fixes a regression introduced in 80917a1054:
"scylla_prepare: stop generating 'mode' value in perftune.yaml"
When cpuset.conf contains a "full" CPU set the negation of it from
the "full" CPU set is going to generate a zero mask as a irq_cpu_mask.
This is an illegal value that will eventually end up in the generated
perftune.yaml, which in line will make the scylla service fail to start
until the issue is resolved.
In such a case a irq_cpu_mask must represent a "full" CPU set mimicking
a former 'MQ' mode.
Fixes#11701
Tested:
- Manually on a 2 vCPU VM in an 'auto-selection' mode.
- Manually on a large VM (48 vCPUs) with an 'MQ' manually
enforced.
Message-Id: <20221004004237.2961246-1-vladz@scylladb.com>
Currently doing `CONTAINS NULL` or `CONTAINS KEY NULL` on a collection evaluates to `true`.
This is a really weird behaviour. Collections can't contain `NULL`, even if they wanted to.
Any operation that has a NULL on either side should evaluate to `NULL`, which is interpreted as `false`.
In Cassandra trying to do `CONTAINS NULL` causes an error.
Fixes: #10359
The only problem is that this change is not backwards compatible. Some existing code might break.
Closes#11730
* github.com:scylladb/scylladb:
cql3: Make CONTAINS KEY NULL return false
cql3: Make CONTAINS NULL return false
The removenode_abort logic that follows the warning
may throw, in which case information about
the original exception was lost.
Fixes: #11722Closes#11735
To keep the idl definition of plan_id from
getting out of sync with the one in stream_fwd.hh.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#11720
The raft_group0 code needs system_keyspace and now it gets one from gossiper. This gossiper->system_keyspace dependency is in fact artificial, gossiper doesn't need system ks, it's there only to let raft and snitch call gossiper.get_system_keyspace().
This makes raft use system ks directly, snitch is patched by another branch
Closes#11729
* github.com:scylladb/scylladb:
raft_group0: Use local reference
raft_group0: Add system keyspace reference
Commit aba475fe1d accidentally fixed a race, which happens in
the following sequence of events:
1) storage service starts drain() via API for example
2) main's abort source is triggered, calling compaction_manager's do_stop()
via subscription.
2.1) do_stop() initiates the stop but doesn't wait for it.
2.2) compaction_manager's state is set to stopped, such that
compaction_manager::stop() called in defer_verbose_shutdown()
will wait for the stop and not start a new one.
3) drain() calls compaction_manager::drain() changing the state from
stopped to disabled.
4) main calls compaction_manager::stop() (as described in 2.2) and
incorrectly tries to stop the manager again, because the state was
changed in step 3.
aba475fe1d accidentally fixed this problem because drain() will no
longer take place if it detects the shutdown process was initiated
(it does so by ignoring drain request if abort source's subscription
was unlinked).
This shows us that looking at the state to determine if stop should be
performed is fragile, because once the state changes from A to B,
manager doesn't know the state was A. To make it robust, we can instead
check if the future that stores stop's promise is engaged, meaning that
the stop was already initiated and we don't have to start a new one.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#11711
This series undoes some recent damage to clarity, then
goes further by renaming terms around dirty_memory_manager
to be clearer. Documentation is added.
Closes#11705
* github.com:scylladb/scylladb:
dirty_memory_manager: re-term "virtual dirty" to "unspooled dirty"
dirty_memory_manager: rename _virtual_region_group
api: column_family: fix memtable off-heap memory reporting
dirty_memory_manager: unscramble terminology
This is the continuation of the a980510654 that tries to catch ENOSPCs reported via storage_io_error similarly to how defer_verbose_shutdown() does on stop
Closes#11664
* github.com:scylladb/scylladb:
table: Handle storage_io_error's ENOSPC when flushing
table: Rewrap retry loop
Compacted undeleted sstables are relevant for avoiding data resurrection
in the purge path. As token ranges of groups won't overlap, it's
better to isolate this data, so to prevent one group from interfering
with another.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
We need to pass the compaction_group received as a param, not the one
retrieved via as_table_state(). Needed for supporting multiple
groups.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Off-strategy can happen in parallel to view building.
A semaphore is used to ensure they don't step on each other's
toe.
If off-strategy completes first, then move_sstables_from_staging()
won't find the SSTable alive and won't reach code to add
the file to the backlog tracker.
If view building completes first, the SSTable exists, but it's
not reshaped yet (has repair origin) and shouldn't be
added to the backlog tracker.
Off-strategy completion code will make sure new sstables added
to main set are accounted by the backlog tracker, so
move_sstables_from_staging() only need to add to tracker files
which are certainly not going through a reshape compaction.
So let's take these facts into account to make the procedure
more robust and compaction group friendly. Very welcome change
for when multiple groups are supported.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That's important for multiple compaction groups. Once replica::table
supports multiple groups, there will be no table::as_table_state(),
so for testing table with a single group, we'll be relying on
column_family_for_tests::as_table_state().
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The compound set shouldn't be exposed in main_sstables() because
once we complete the switch to column_family_for_tests::table_state,
can happen compaction will try to remove or add elements to its
set snapshot, and compound set isn't allowed to either ops.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Needed once we switch to column_family_for_tests::table_state, so unit
tests relying on correct value will still work
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This change will make table_state_for_test the table_state of
column_family_for_tests. Today, an unit test has to keep a reference
to them both and logically couple them, but that's error prone.
This change is also important when replica::table supports multiple
compaction groups, so unit tests won't have to directly reference
the table_state of table, but rather use the one managed by
column_family_for_tests.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The switch is important once we have multiple compaction groups,
as a single table may own several groups. There will no longer be
a replica::table::as_table_state().
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Lots of boilerplate is reduced, and will also help to complete the
switch from replica::table to compaction::table_state in the unit
tests.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
A binary operator like this:
{1: 2, 3: 4} CONTAINS KEY NULL
used to evaluate to `true`.
This is wrong, any operation involving null
on either side of the operator should evaluate
to NULL, which is interpreted as false.
This change is not backwards compatible.
Some existing code might break.
partially fixes: #10359
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
A binary operator like this:
[1, 2, 3] CONTAINS NULL
used to evaluate to `true`.
This is wrong, any operation involving null
on either side of the operator should evaluate
to NULL, which is interpreted as false.
This change is not backwards compatible.
Some existing code might break.
partially fixes: #10359
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
It now grabs one from gossiper which is weird. A bit later it will be
possible to remove gossiper->system_keyspace dependency
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All calls in the try block have been noexcept for some time.
Remove the try...catch and the associated misleading comment to avoid confusing
source code readers.
Closes#11715
Since `_partition_size_entry` and `_rows_in_partition_entry`
are accessed at the same time when updated, and similarly
`_cell_size_entry` and `_elements_in_collection_entry`,
place the member pairs closely together to improve data
cache locality.
Follow the same order when preparing the
`scylla_metadata::large_data_stats` map.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>