This series fixes a couple of issues around the generation and handling of no_such_keyspace and no_such_column_family exceptions.
First, it removes std::throw_with_nested around their throw sites in the respective database::find_* functions.
Fixes #9753
And then, it introduces a `validate_tables` helper in api/storage_service.cc that generates a `bad_param_exception` in order to set the correct http response status if a non-existing table name is provided in the `cf` http request parameter.
Fixes #9754
The series also adds a test for the REST API under test/rest_api that verifies the storage_service enable/disable auto_compaction api and checks the error codes for non-existing keyspace or table.
Test: unit(dev)
Closes #9755
* github.com:scylladb/scylla:
api: storage_service: add parse_tables
database: un-nest no_such_keyspace and no_such_column_family exceptions
database: throw internal error when failing uuid returned by find_uuid
database: find_uuid: throw no_such_column_family exception if ks/cf were not found
test: rest_api: add storage_service test
test: add basic rest api test
test: cql-pytest: wait for rest api when starting scylla
Splits and validates the `cf` parameter, which contains an optional
comma-separated list of table names.
If any table is not found and a no_such_column_family
exception is thrown, wrap it in a `bad_param_exception`
so it will translate to `reply::status_type::bad_request`
rather than `reply::status_type::internal_server_error`.
With that, hide the split_cf function from api/api.hh
since it was used only from api/storage_service
and new use sites should use validate_tables instead.
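The flow above can be sketched as follows; the exception types and helper signatures here are simplified stand-ins for illustration, not scylla's actual implementation:

```cpp
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-ins for scylla's exception types.
struct no_such_column_family : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct bad_param_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Split the optional comma-separated `cf` parameter into table names.
std::vector<std::string> split_cf(const std::string& cf) {
    std::vector<std::string> tables;
    std::istringstream ss(cf);
    std::string name;
    while (std::getline(ss, name, ',')) {
        if (!name.empty()) {
            tables.push_back(name);
        }
    }
    return tables;
}

// Validate that every requested table exists; translate the domain
// exception into bad_param_exception so the HTTP layer maps it to
// bad_request (400) instead of internal_server_error (500).
std::vector<std::string> validate_tables(const std::string& cf,
        const std::set<std::string>& existing_tables) {
    auto tables = split_cf(cf);
    for (const auto& t : tables) {
        if (existing_tables.count(t) == 0) {
            try {
                throw no_such_column_family("no such column family: " + t);
            } catch (const no_such_column_family& e) {
                throw bad_param_exception(e.what());
            }
        }
    }
    return tables;
}
```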
Fixes #9754
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
These were thrown in the respective database::find_*
functions as nested exceptions since
d3fe0c5182.
Wrapping them in nested exceptions just makes them
harder to figure out and work with, and apparently serves
no purpose.
Without these nested exceptions we can correctly detect
internal errors when synchronously failing to find
a uuid returned by find_uuid(ks, cf).
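A minimal illustration of the difference; the exception type and lookup function below are stand-ins, not scylla's:

```cpp
#include <stdexcept>
#include <string>

// Stand-in for the exception thrown by database::find_keyspace().
struct no_such_keyspace : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Before: the throw site wrapped the exception via
// std::throw_with_nested(...), so callers saw a nested_exception chain.
// After: throw the domain exception directly.
void find_keyspace_failure(const std::string& name) {
    throw no_such_keyspace("no such keyspace: " + name);
}

// With the direct throw, a synchronous caller can classify the failure
// with a plain catch clause, no std::rethrow_if_nested() unwrapping.
bool is_missing_keyspace(const std::string& name) {
    try {
        find_keyspace_failure(name);
        return false;  // unreachable: the lookup above always throws here
    } catch (const no_such_keyspace&) {
        return true;
    }
}
```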
Fixes #9753
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
find_uuid returns a uuid found for ks_name.table_name.
In some cases, we immediately and synchronously use that
uuid to look up other information, like the table
or the schema. Failing to find that uuid indicates
an internal error when no preemption is possible.
Note that yielding could allow deletion of the table
to sneak in and invalidate the uuid.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than masquerading all errors as std::out_of_range(""),
convert only the std::out_of_range error from _ks_cf_to_uuid.at()
to no_such_column_family(ks, cf). That relieves all callers of
find_uuid from doing that conversion themselves.
For example, get_uuid in api/column_family now only deals with converting
no_such_column_family to bad_param_exception, as it needs to do
at the api level, rather than generating a similar error from scratch.
Other call sites required no intervention.
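A sketch of that conversion; `table_id` and the class below are illustrative stand-ins for the real `utils::UUID` map inside `database`:

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

struct no_such_column_family : std::runtime_error {
    using std::runtime_error::runtime_error;
};

using table_id = int;  // stand-in for utils::UUID

class database_sketch {
    std::map<std::pair<std::string, std::string>, table_id> _ks_cf_to_uuid;
public:
    void add_table(std::string ks, std::string cf, table_id id) {
        _ks_cf_to_uuid[{std::move(ks), std::move(cf)}] = id;
    }
    // Convert only the std::out_of_range coming from .at() into the
    // domain exception; callers no longer translate it themselves.
    table_id find_uuid(const std::string& ks, const std::string& cf) const {
        try {
            return _ks_cf_to_uuid.at({ks, cf});
        } catch (const std::out_of_range&) {
            throw no_such_column_family(ks + "." + cf + " not found");
        }
    }
};
```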
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
FIXME: negative tests for not-found tables
should result in requests.codes.bad_request
but currently result in requests.codes.internal_server_error.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This patch addresses yet another FIXME from alternator/ttl.cc.
Namely, scans are now started from a random, owned token range
instead of always starting with the first range.
This mechanism is expected to reduce the probability of some
ranges being starved when the scanning process is often restarted,
e.g. due to nodes failing.
Should the mechanism prove insufficient for some users, a more complete
solution is to regularly persist the state of the scanning process
in a table (distributed if we want to allow other nodes to pick up
from where a dead node left off), but that induces overhead.
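The wrap-around visiting order can be sketched like this; the names are illustrative, not the alternator/ttl.cc code:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Visit all owned token ranges starting from an arbitrary offset and
// wrapping around, so repeated restarts of the scan don't always favor
// the first ranges. In practice the offset would be drawn randomly,
// e.g. with std::uniform_int_distribution over [0, ranges.size()).
std::vector<std::string> scan_order(const std::vector<std::string>& ranges,
        std::size_t start_offset) {
    std::vector<std::string> order;
    order.reserve(ranges.size());
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        order.push_back(ranges[(start_offset + i) % ranges.size()]);
    }
    return order;
}
```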
Tests: unit(release) (including a long loop over the ttl pytest)
Message-Id: <7fc3f6525ceb69725c41de10d0fb6b16188349e3.1638387924.git.sarna@scylladb.com>
Message-Id: <db198e743ca9ed1e5cc659e73da342fbce2c882a.1638473143.git.sarna@scylladb.com>
Some of the tests, like nodetool.py, use the scylla REST API.
Add a check_rest_api function that queries http://<node_addr>:10000/,
which is served once scylla starts listening on the API port,
and call it via run.wait_for_services.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
On my local machine, a 3 second deadline proved to cause flakiness
of test_ttl_expiration case, because its execution time is just
around 3 seconds.
This patch addresses the problem by bumping the local timeout to 10
(and 15 for test_ttl_expiration_long, since it's dangerously near
the 10 second deadline on my machine as well).
Moreover, some test cases short-circuited once they detected that
all needed items expired, but other ones lacked it and always used
their full time slot. Since 10 seconds is a little too long for
a single test case, even one marked with --veryslow, this patch
also adds a couple of other short-circuits.
One exception is test_ttl_expiration_hash_wrong_type, which actually
depends on the fact that we should wait for the whole loop to finish.
Since this case was never flaky for me with the 3 second timeout,
it's left as is.
Theoretically, test_ttl_expiration also kind of depends on checking
the condition more than once (because the TTL of one of the values
is bumped on each iteration), but empirical evidence shows that
multiple iterations always occur in this test case anyway - for
me, it always spun at least 3 times.
Tests: unit(release)
Message-Id: <a0a479929dac37daace744e0a970567a8aa3b518.1638431933.git.sarna@scylladb.com>
If fill_buffer() is called after EOS, the underlying reader will
be fast-forwarded to a range pointed to by an invalid iterator,
producing incorrect results.
fill_buffer() is changed to return early if EOS was reached,
meaning that the underlying reader has already been fast-forwarded
through all ranges managed by multi_range_reader.
Usually, consuming facilities check for EOS before calling
fill_buffer(), but most reader implementations check for EOS anyway
to avoid correctness issues. Let's do the same here.
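The guard can be sketched as follows, with a toy reader standing in for multi_range_reader:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for multi_range_reader: each inner vector is one range.
class multi_range_reader_sketch {
    std::vector<std::vector<int>> _ranges;
    std::size_t _current = 0;
    bool _end_of_stream = false;
public:
    explicit multi_range_reader_sketch(std::vector<std::vector<int>> ranges)
        : _ranges(std::move(ranges)), _end_of_stream(_ranges.empty()) {}

    // Return early on EOS: all managed ranges were already consumed,
    // and fast-forwarding further would use an invalid range iterator.
    std::vector<int> fill_buffer() {
        if (_end_of_stream) {
            return {};
        }
        auto buf = _ranges[_current];
        if (++_current == _ranges.size()) {
            _end_of_stream = true;
        }
        return buf;
    }

    bool is_end_of_stream() const { return _end_of_stream; }
};
```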
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211208131423.31612-1-raphaelsc@scylladb.com>
* seastar f8a038a0a2...8d15e8e67a (21):
> core/program_options: preserve defaultness of CLI arguments
> log: Silence logger when logging
> Include the core/loop.hh header inside when_all.hh header
> http: Fix deprecated wrappers
> foreign_ptr: Add concept
> util: file: add read_entire_file
> short_streams: move to util
> Revert "Merge: file: util: add read_entire_file utilities"
> foreign_ptr: declare destroy as a static method
> Merge: file: util: add read_entire_file utilities
> Merge "output_stream: handle close failure" from Benny
> net: bring local_address() to seastar::connected_socket.
> Merge "Allow programatically configuring seastar" from Botond
> Merge 'core: clean up memory metric definitions' from John Spray
> Add PopOS to debian list in install-dependencies.sh
> Merge "make shared_mutex functions exception safe and noexcept" from Benny
> on_internal_error: set_abort_on_internal_error: return current state
> Implementation of iterator-range version of when_any
> net: mark functions returning ethernet_address noexcept
> net: ethernet_address: mark functions noexcept
> shared_mutex: mark wake and unlock methods noexcept
Contains patch from Botond Dénes <bdenes@scylladb.com>:
db/config: configure logging based on app_template::seastar_options
Scylla has its own config file which supports configuring aspects of
logging, in addition to the built-in CLI logging options. When applying
this configuration, the CLI provided option values have priority over
the ones coming from the option file. To implement this scylla currently
reads CLI options belonging to seastar from the boost program options
variable map. The internal representation of CLI options, however, does
not constitute an API of seastar and is thus subject to change (even if
unlikely). This patch moves away from this practice and uses the shiny
new C++ API, `app_template::seastar_options`, to obtain the current
logging options.
This series wires up the schema state machine to process raft commands
and transfer snapshots. The series assumes that raft group zero is used
for schema transfer only and that a single raft command contains a single
schema change in the form of a canonical_mutation array. Both assumptions
may change in which case the code will be changed accordingly, but we
need to start somewhere.
* scylla-dev/gleb/schema-raft-sm-v2:
schema raft sm: request schema sync on schema_state_machine snapshot transfer
raft service: delegate snapshot transfer to a state machine implementation
schema raft sm: pass migration manager to schema_raft_state_machine and merge schema on apply()
Add a short test verifying that Alternator responds with the correct
error code (UnknownOperationException) when receiving an unknown or
unsupported operation.
The test passes on both AWS and Alternator, confirming that the behavior
is the same.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211206125710.1153008-1-nyh@scylladb.com>
We begin by preparing the `persistence` class so that the storage can be
reused across different Raft server instances: the test keeps a shared
pointer to the storage so that when a server stops, a new server with
the same ID can be reconstructed with this storage.
We then modify `environment` so that server instances can be removed and
replaced in middle of operations.
Finally we prepare a nemesis operation which gracefully stops or
immediately crashes a randomly picked server and run this operation
periodically in `basic_generator_test`.
One important change to the API of `raft::server` is included:
the metrics are not automatically registered in `start()`. This is
because metric registration modifies global data structures, which
cannot be done twice with the same set of metrics (and we would do it
when we restart a server with the same ID). Instead,
`register_metrics()` is exposed in the `raft::server` interface to be
called when running servers in production.
* kbr/crashes-v3:
raft: server: print the ID of aborted server
test: raft: randomized_nemesis_test: run stop_crash nemesis in `basic_generator_test`
test: raft: randomized_nemesis_test: introduce `stop_crash` operation
test: raft: randomized_nemesis_test: environment: implement server `stop` and `crash`
raft: server: don't register metrics in `start()`
test: raft: randomized_nemesis_test: raft_server: return `stopped_error` when called during abort
test: raft: randomized_nemesis_test: handle `raft::stopped_error`
test: raft: randomized_nemesis_test: handle missing servers in `environment` call functions
test: raft: randomized_nemesis_test: environment: split `new_server` into `new_node` and `start_server`
test: raft: randomized_nemesis_test: remove `environment::get_server`
test: raft: randomized_nemesis_test: construct `persistence_proxy` outside `raft_server<M>::create`
test: raft: randomized_nemesis_test: persistence_proxy: store a shared pointer to `persistence`
test: raft: randomized_nemesis_test: persistence: split into two classes
test: raft: logical_timer: introduce `sleep_until`
It turns out that -O3 enables -fslp-vectorize even if it was
disabled before -O3 on the command line. Rearrange the code
so that -O3 is before the more specific optimization options.
The problem was that such a command:
```
alter table ks.cf with cdc={'ttl': 120};
```
would assume that the "enabled" parameter has its default value ("false")
and, in effect, disable CDC on that table. This commit forces the user
to specify that key explicitly.
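The enforced check can be sketched like this (a hypothetical helper, not the actual CDC option-parsing code):

```cpp
#include <map>
#include <stdexcept>
#include <string>

// When altering CDC options on a table, refuse to silently treat a
// missing "enabled" key as the default "false"; require it explicitly.
void validate_cdc_alter_options(const std::map<std::string, std::string>& opts) {
    if (opts.find("enabled") == opts.end()) {
        throw std::invalid_argument(
            "altering cdc options must explicitly specify 'enabled'");
    }
}
```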
Fixes #6475
Closes #9720
get_max_result_size() is called on a slice that was moved from in the
previous argument.
This results in use-after-move with clang, whose evaluation order is
left-to-right.
For paged queries, max_result_size is later overridden by query_pager,
but for unpaged and/or reversed queries it can happen that the max result
size incorrectly contains the 1MB limit for paged, non-reversed queries.
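The bug pattern and its fix, reduced to a self-contained sketch (names are illustrative):

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Stand-in for the query call that takes the slice by value plus a
// size computed from it.
std::size_t consume(std::string slice, std::size_t max_result_size) {
    (void)slice;
    return max_result_size;
}

std::size_t do_query(std::string slice) {
    // BUGGY pattern (argument evaluation order is unspecified; clang
    // evaluates left-to-right, so slice.size() reads a moved-from string):
    //   return consume(std::move(slice), slice.size());
    // FIX: compute everything needed from `slice` before moving it.
    auto max_result_size = slice.size();
    return consume(std::move(slice), max_result_size);
}
```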
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211207145133.69764-1-raphaelsc@scylladb.com>
Since the sstable reader was already converted to flat_mutation_reader_v2, the compaction layer can naturally be converted too.
There are many dependencies that still use v1. Those strictly needed, like the readers in the sstable set which link compaction to the sstable reader, were converted to v2 in this series. For those that aren't essential we're relying on the v1<-->v2 adaptors, and conversion work on them is postponed. Those being postponed are: the scrub specialized reader (which needs a validator for mutation_fragment_v2), the interposer consumer, and the combined reader, which is used by the incremental selector. The incremental selector itself was converted to v2.
tests: unit(debug).
Closes #9725
* github.com:scylladb/scylla:
compaction: update compaction::make_sstable_reader() to flat_mutation_reader_v2
sstable_set: update make_crawling_reader() to flat_mutation_reader_v2
sstable_set: update make_range_sstable_reader() to flat_mutation_reader_v2
sstable_set: update make_local_shard_sstable_reader() to flat_mutation_reader_v2
sstable_set: update incremental_reader_selector to flat_mutation_reader_v2
Cannot be fully converted to flat_mutation_reader_v2 yet, as the
selector is built on the combined_reader interface, which is still not
converted, so it is only updated wherever possible.
Subsequent work will update the sstable_set readers, which use the
selector, to flat_mutation_reader_v2.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
There is a separate thread that periodically stops/crashes and restarts
a randomly chosen server, so the nemesis runs concurrently with
reconfigurations and network partitions.
An operation which chooses a server randomly, randomly chooses whether
to crash or gracefully stop it, performs the chosen operation, and
restarts the server after a selected delay.
`stop` gracefully stops a running server, `crash` immediately "removes" it
(from the point of view of the rest of the environment).
We cannot simply destroy a running server. Read the comments in `crash`
to understand how it's implemented.
Instead, expose `register_metrics()` at the `server` interface
(previously it was a private method of `server_impl`).
Metrics are global, so `register_metrics()` cannot be called on
two servers that have the same ID, even though having two such servers
is useful, e.g. in tests when we want to simulate server stops and restarts.
`environment` functions for performing operations on Raft servers:
`is_leader`, `call`, `reconfigure`, `get_configuration`,
currently assume that a server is running on each node at all times and
that it never changes. Prepare these functions for missing/restarting
servers.
Soon it will be possible to stop a server and then start a completely
new `raft::server` instance but which uses the same ID and persistence,
simulating a server restart. For this we introduce the concept of a
"node" which keeps the persistence alive (through a shared pointer). To
start a server - using `start_server` - we must first create a node on
which it will be running through `new_node`. `new_server` is now a
short function which does these two things.
To perform calls to servers in a Raft cluster, the test code would first
obtain a reference to a server through `get_server` and then call the
server directly. This will not be safe when we implement server crashes
and restarts as servers will disappear in middle of operations; we don't
want the test code to keep references to no-longer-existing servers.
In the new API the test will call the `environment` to perform
operations, giving it the server ID. `environment` will handle
disappearing servers underneath.
We want the test to be able to reuse `persistence` even after
`persistence_proxy` is destroyed for simulating server restarts. We'll
do it by having the test keep a shared pointer to `persistence`.
To do that, instead of storing `persistence` by value and constructing
it inside `persistence_proxy`, store it by `lw_shared_ptr` which is
taken through the constructor (so `persistence` itself is now
constructed outside of `persistence_proxy`).
The previous `persistence` implemented the `raft::persistence` interface
and had two different responsibilities:
- representing "persistent storage", with the ability to store and load
stuff to/from it,
- accessing in-memory state shared with a corresponding instance of
`impure_state_machine` that is running along `persistence` inside
a `raft::server`.
For example, `persistence::store_snapshot_descriptor` would persist not
only the snapshot descriptor, but also the corresponding snapshot. The
descriptor was provided through a parameter but the snapshot wasn't. To
obtain the snapshot we used a data structure (`snapshots_t`) that both
`persistence` and `impure_state_machine` had a reference to.
We split `persistence` into two classes:
- `persistence` which handles only the first responsibility, i.e.
storing and loading stuff; everything to store is provided through
function parameters (e.g. now we have a `store_snapshot` function
which takes both the snapshot and its descriptor through the
parameters) and everything to load is returned directly by functions
(e.g. `load_snapshot` returns a pair containing both the descriptor
and corresponding snapshot)
- `persistence_proxy` (for lack of a better name) which implements
`raft::persistence`, contains the above `persistence` inside and
shares a data structure with `impure_state_machine`
(so `persistence_proxy` corresponds to the old `persistence`).
The goal is to prepare for reusing the persisted stuff between different
instances of `raft::server` running in a single test when simulating
server shutdowns/crashes and restarts. When destroying a `raft::server`,
we destroy its `impure_state_machine` and `persistence_proxy` (we are
forced to because constructing a `raft::server` requires a `unique_ptr`
to `raft::persistence`), but we will be able to keep the underlying
`persistence` for the next instance (if we simulate a restart) - after a
slight modification made in the next commit.
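The split can be sketched with toy types (not the test's actual interfaces):

```cpp
#include <memory>
#include <string>
#include <utility>

struct snapshot_descriptor { int index; };
struct snapshot { std::string payload; };

// Pure storage: everything to store arrives through parameters and
// everything to load is returned directly.
class persistence_sketch {
    std::pair<snapshot_descriptor, snapshot> _snap{};
public:
    void store_snapshot(snapshot_descriptor d, snapshot s) {
        _snap = {d, std::move(s)};
    }
    std::pair<snapshot_descriptor, snapshot> load_snapshot() const {
        return _snap;
    }
};

// Adapter corresponding to the old class: holds the storage through a
// shared pointer, so a restarted server can reuse the same storage
// after the proxy is destroyed.
class persistence_proxy_sketch {
    std::shared_ptr<persistence_sketch> _storage;
public:
    explicit persistence_proxy_sketch(std::shared_ptr<persistence_sketch> s)
        : _storage(std::move(s)) {}
    void store_snapshot_descriptor(snapshot_descriptor d, snapshot s) {
        // The old version obtained the snapshot from a shared data
        // structure; now both pieces come in through parameters.
        _storage->store_snapshot(d, std::move(s));
    }
};
```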
Currently stateful (readers being saved and resumed on page boundaries)
multi-range scans are broken in multiple ways. Trying to use them can
result in anything from use-after-free (#6716) to corrupt data
(#9718). Luckily no-one is doing such queries today, but this started to
change recently as code such as Alternator TTL and distributed
aggregate reads started using this.
This series fixes both problems and also adds a unit test exercising
this previously completely unused code-path.
Fixes: #6716
Fixes: #9718
Tests: unit(dev, release, debug)
* 'fix-stateful-multi-range-scans/v1' of https://github.com/denesb/scylla:
test/boost/multishard_mutation_query_test: add multi-range test
test/boost/multishard_mutation_query_test: add multi-range support
multishard_mutation_query: don't drop data during stateful multi-range reads
multishard_combining_reader: reader_lifecycle_policy: allow saving read range on fast-forward
In the past, we had very similar shell scripts for test/alternator/run,
test/cql-pytest/run and test/redis/run. Most of the code of all three
scripts was identical - dealing with starting Scylla in a temporary
directory, running pytest, and so on. The code duplication meant that
every time we fixed a bug in one of those scripts, or added an important
boot-time parameter to Scylla, we needed to fix all three scripts.
The solution was to convert the run scripts to Python, and to use a
common library, test/cql-pytest/run.py, for the main features shared
by all scripts - starting Scylla, waiting for protocols to be available,
and running pytest.
However, we only did this conversion for alternator and cql-pytest -
redis remained the old shell scripts. This patch completes the
conversion also for redis. As expected, no change was needed to the
run.py library code, which was already strong enough for the needs of
the redis tests.
Fixes #9748.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211207081423.1187847-1-nyh@scylladb.com>
As part of the drive to move over to flat_mutation_reader_v2, update
make_filtering_reader(). Since it doesn't examine range tombstones
(only the partition_start, to filter the key) the entire patch
is just glue code upgrading and downgrading users in the pipeline
(or removing a conversion, in one case).
Test: unit (dev)
Closes #9723
We have three semaphores for serialization of maintenance ops:
1) _rewrite_sstables_sem: for scrub, cleanup and upgrade.
2) _major_compaction_sem: for major compaction.
3) _custom_job_sem: for reshape, resharding and offstrategy.
Scrub, cleanup and upgrade should be serialized with major compaction,
so the rewrite semaphore should be merged into the major one.
Offstrategy is also a maintenance op that should be serialized
with the others, to reduce compaction aggressiveness and space
requirements.
Resharding is a one-off operation, so it can be merged there too.
The same applies to reshape, which can take long; not serializing
it with other maintenance activity can lead to exhaustion of
resources and high space requirements.
Let's have a single semaphore to guarantee their serialization.
Deadlock isn't an issue because locks are always taken in the same
order.
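The merged serialization can be sketched with a single one-unit semaphore (modeled here with a std::mutex; illustrative, not the compaction_manager code):

```cpp
#include <mutex>
#include <string>
#include <vector>

// One semaphore of capacity 1 serializing every maintenance op:
// scrub, cleanup, upgrade, major, reshape, resharding and
// offstrategy, instead of three separate semaphores.
class maintenance_ops_sketch {
    std::mutex _maintenance_ops_sem;
    std::vector<std::string> _completed;
public:
    void run(const std::string& op) {
        // Taking the single semaphore in the same place for every op
        // guarantees serialization and avoids lock-order deadlocks.
        std::lock_guard<std::mutex> units(_maintenance_ops_sem);
        _completed.push_back(op);
    }
    const std::vector<std::string>& completed() const { return _completed; }
};
```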
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211201182046.100942-1-raphaelsc@scylladb.com>
Adding a stringify function for the node_ops_cmd enum
will make the log output more readable and will make it
possible (hopefully) to do initial analysis without consulting
the source code.
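A sketch of such a function; the enum values here are hypothetical, not the real node_ops_cmd set:

```cpp
#include <string>

// Hypothetical subset of node_ops_cmd values, for illustration only.
enum class node_ops_cmd {
    decommission_prepare,
    removenode_prepare,
    bootstrap_prepare,
};

// Map each command to its name so log lines print readable text
// instead of raw enum integers.
std::string node_ops_cmd_to_string(node_ops_cmd cmd) {
    switch (cmd) {
    case node_ops_cmd::decommission_prepare: return "decommission_prepare";
    case node_ops_cmd::removenode_prepare:   return "removenode_prepare";
    case node_ops_cmd::bootstrap_prepare:    return "bootstrap_prepare";
    }
    return "unknown node_ops_cmd";
}
```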
Refs #9629
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Closes #9745