table queries.
This change introduces a query_restrictions object into the virtual
table infrastructure, for now only holding a restriction on partition
ranges.
That partition range restriction is then implemented in
memtable_filling_virtual_table.
This change adds a more specific implementation of the virtual table
called memtable_filling_virtual_table. It produces results by filling
a memtable on each read.
This change introduces the basic interface we expect each virtual
table to implement. More specific implementations will then expand
upon it if needed.
This change adds a new type of mutation reader whose purpose
is to allow inserting operations before an invocation of the proper
reader. It takes a future to wait on and only after it resolves will
it forward the execution to the underlying flat_mutation_reader
implementation.
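The idea can be sketched outside of Seastar as follows. This is only
an illustration in Python's asyncio, not the actual reader: read_after,
fill and make_reader are hypothetical names, and a plain iterator
stands in for the underlying flat_mutation_reader.

```python
import asyncio


async def read_after(ready, make_reader):
    """Wait for `ready` to resolve, then drain the underlying reader.

    `make_reader` is only called after `ready` has resolved, so the
    preparatory step always runs before the first read.
    """
    await ready
    return list(make_reader())


def run_demo():
    log = []

    async def fill():
        # Hypothetical preparatory step that must finish before reading.
        log.append("fill")

    def make_reader():
        # Stand-in for constructing the underlying reader.
        log.append("read")
        return iter(["row1", "row2"])

    async def main():
        return await read_after(fill(), make_reader)

    rows = asyncio.run(main())
    return log, rows
```

Calling run_demo() records the preparatory step before the read, which
is the ordering guarantee described above.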
As a function returning a future, simplify
its interface by handling any exceptions and
returning an exceptional future instead of
propagating the exception.
In this specific case, throwing from advance_and_await()
will propagate through table::await_pending_* calls
short-circuiting a .finally clause in table::stop().
Also, mark as noexcept methods of class table calling
advance_and_await and table::await_pending_ops that depend on them.
Fixes #8636
A followup patch will convert advance_and_await to a coroutine.
This is done separately to facilitate backporting of this patch.
Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210511161407.218402-1-bhalevy@scylladb.com>
When an index is created without an explicit name, a default name
is chosen. However, there was no check whether a table with a
conflicting name already exists. The check is now in place and if any
conflicts are found, a new index name is chosen instead.
When an index is created *with* an explicit name and a conflicting
regular table is found, index creation should simply fail.
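The two rules can be sketched as follows. This is a simplified model,
not the code in cql3 or database: the function names are hypothetical,
and both the "<table>_<column>_idx" default and the numeric suffix used
to resolve conflicts are assumptions made for the example. The only
detail taken from this series is that an index is backed by a view
named <index-name>_index.

```python
def view_name(index_name):
    # An index is backed by a materialized view named <index-name>_index.
    return index_name + "_index"


def pick_default_index_name(table, column, existing_tables):
    """Choose a default index name whose backing view does not collide."""
    base = f"{table}_{column}_idx"  # assumed default naming scheme
    candidate = base
    attempt = 0
    while view_name(candidate) in existing_tables:
        attempt += 1
        candidate = f"{base}_{attempt}"  # assumed conflict resolution
    return candidate


def check_explicit_index_name(index_name, existing_tables):
    """An explicit name is never changed: a conflict is an error."""
    if view_name(index_name) in existing_tables:
        raise ValueError(
            f"index {index_name} conflicts with existing table "
            f"{view_name(index_name)}"
        )
    return index_name
```

With a regular table named t_v_idx_index already present, the default
name for an index on t(v) moves on to the next candidate, while
explicitly asking for t_v_idx raises an error.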
This series comes with a test.
Fixes #8620
Tests: unit(release)
Closes #8632
* github.com:scylladb/scylla:
cql-pytest: add regression tests for index creation
cql3: fail to create an index if there is a name conflict
database: check for conflicting table names for indexes
The operator== of enum_option<> (which we use to hold multi-valued
Scylla options) makes it easy to compare to another enum_option
wrapper, but ugly to compare the actual value held. So this patch
adds a nicer way to compare the value held.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210511120222.1167686-1-nyh@scylladb.com>
utils::phased_barrier holds a `lw_shared_ptr<gate>` that is
typically `enter()`ed in `phased_barrier::start()`,
and left when the operation is destroyed in `~operation`.
Currently, the operation move-assign implementation is the
default one that just moves the lw_shared gate ptr from the
other operation into this one, without calling `_gate->leave()` first.
This change first destroys *this when move-assigned (if not self)
to call _gate->leave() if engaged, before reassigning the
gate with the other operation::_gate.
A unit test that reproduces the issue before this change
and passes with the fix was added to serialized_action_test.
Fixes #8613
Test: unit(dev), serialized_action_test(debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210510120703.1520328-1-bhalevy@scylladb.com>
"
The current printout has multiple problems:
* It is segregated by state, each having its own sorting criteria;
* The number of permits and count resources are collapsed into a single
column, and it is not clear which one is printed;
* The number of available/initial units of the semaphore is not printed.
This series solves all these problems:
* It merges all states into a single table, sorted by memory
consumption, in descending order.
* It separates number of permits and count resources into separate
columns.
* Prints a summary of the semaphore units.
* Provides a cap on the maximum amount of printable lines, to not blow
up the logs.
The goal of all this is to make it easy to find the culprit of a semaphore
problem: easily spot the big memory consumers, then unpack the name
column to determine which table and code path is responsible.
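The table layout described above (sort by memory, cap the number of
lines, fold the rest into one row, then print a total) can be sketched
as follows. This is a hypothetical Python model of the logic, not the
C++ code in reader_concurrency_semaphore; PermitGroup and summarize are
names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class PermitGroup:
    permits: int      # number of permits in this group
    count: int        # count resources held by the group
    memory: int       # memory resources held by the group
    description: str  # table/description/state


def summarize(groups, max_lines):
    """Return report rows as (permits, count, memory, description)."""
    # Biggest memory consumers first.
    ordered = sorted(groups, key=lambda g: g.memory, reverse=True)
    shown, omitted = ordered[:max_lines], ordered[max_lines:]

    rows = [(g.permits, g.count, g.memory, g.description) for g in shown]
    if omitted:
        # Everything past the cap is folded into a single row.
        rows.append((
            sum(g.permits for g in omitted),
            sum(g.count for g in omitted),
            sum(g.memory for g in omitted),
            "permits omitted for brevity",
        ))
    rows.append((
        sum(g.permits for g in groups),
        sum(g.count for g in groups),
        sum(g.memory for g in groups),
        "total",
    ))
    return rows
```

Folding the omitted rows into one line keeps the total consistent with
the visible rows even when the cap is hit.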
This brings the printout close to the recently added `scylla reads`
scylla-gdb.py command, providing a uniform report format across the two
tools.
Example report:
INFO 2021-05-07 09:52:16,806 [shard 0] testlog - With max-lines=4: Semaphore reader_concurrency_semaphore_dump_reader_diganostics with 8/2147483647 count and 263599186/9223372036854775807 memory resources: user request, dumping permit diagnostics:
permits count memory table/description/state
7 2 77M ks.tbl1/op1/active
6 3 59M ks.tbl1/op0/active
4 0 36M ks.tbl1/op2/active
3 1 36M ks.tbl0/op2/active
11 2 43M permits omitted for brevity
31 8 251M total
"
* 'reader-concurrency-semaphore-dump-improvement/v1' of https://github.com/denesb/scylla:
test: reader_concurrency_test: add reader_concurrency_semaphore_dump_reader_diganostics
reader_concurrency_semaphore: dump_reader_diagnostics(): print more information in the header
reader_concurrency_semaphore: dump_reader_diagnostics(): cap number of printed lines
reader_concurrency_semaphore: dump_reader_diagnostics(): sort lines in descending order
reader_concurrency_semaphore: dump_reader_diagnostics(): merge all states into a single table
reader_concurrency_semaphore: dump_reader_diagnostics(): separate number of permits and count resources
In commit 3e39985c7a we added the Cassandra-compatible system table
system."IndexInfo" (note the capitalized table name) which lists built
indexes. Because we already had a table of built materialized views, and
indexes are implemented as materialized views, the index list was
implemented as a virtual table based on the view list.
However, the *name* of each materialized view listed in the list of
views looks like something_index, with the suffix "_index", while the
name of the table we need to print is "something". We forgot to do this
transformation in the virtual table - and this is what this patch does.
This bug can confuse applications which use this system table to wait for
an index to be built. Several tests translated from Cassandra's unit
tests, in cassandra_tests/validation/entities/secondary_index_test.py fail
in wait_for_index() because of this incompatibility, and pass after this
patch.
This patch also changes the unit test that enshrined the previous, wrong,
behavior, to test for the correct behavior. This problem is typical of
C++ unit tests which cannot be run against Cassandra.
Fixes #8600
Unfortunately, although this patch fixes "typical" applications (including
all tests which I tried) - applications which read from IndexInfo in the
"typical" way to look for a specific index being ready - the
implementation is technically NOT correct: The problem is that index
names are not sorted in the right order, because they are sorted with
the "_index" suffix.
To give an example, the index name "a" should be listed before "a1", but
the view name "a1_index" comes before "a_index" (because in ASCII, 1
comes before underscore). I can't think of any way to fix this bug
without completely reimplementing IndexInfo in a different way - probably
based on a temporary memtable (which is fine as this is not a
performance-critical operation). We'll need to do this rewrite eventually,
and I'll open a new issue.
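The ordering problem can be reproduced with plain string sorting. The
snippet below is only a Python illustration of the comparison, not the
virtual table code; the helper name is hypothetical.

```python
SUFFIX = "_index"


def listed_index_names(view_names):
    """Sort by view name, then strip the suffix, as the patch does."""
    return [name[:-len(SUFFIX)] for name in sorted(view_names)]


views = ["a_index", "a1_index"]

# Order produced when sorting happens on the view names.
listed = listed_index_names(views)

# Order a client would expect: sorted by the index names themselves.
expected = sorted(name[:-len(SUFFIX)] for name in views)
```

Here listed is ["a1", "a"] while expected is ["a", "a1"], because "1"
(0x31) sorts before "_" (0x5f).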
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210509140113.1084497-1-nyh@scylladb.com>
Ref: #7617
This series adds timeout parameters to service levels.
Per-service-level timeouts can be set up in the form of service level parameters, which can in turn be attached to roles. Setting up and modifying role-specific timeouts can be achieved like this:
```cql
CREATE SERVICE LEVEL sl2 WITH read_timeout = 500ms AND write_timeout = 200ms AND cas_timeout = 2s;
ATTACH SERVICE LEVEL sl2 TO cassandra;
ALTER SERVICE LEVEL sl2 WITH write_timeout = null;
```
Per-service-level timeouts take precedence over default timeout values from scylla.yaml, but can still be overridden for a specific query by per-query timeouts (e.g. `SELECT * from t USING TIMEOUT 50ms`).
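The precedence order can be summarized with a small sketch. This is a
hypothetical Python helper written for illustration, not Scylla code;
timeouts are in milliseconds and None stands for "not set" (as after
`write_timeout = null`).

```python
def effective_timeout(default_ms, service_level_ms=None, per_query_ms=None):
    """Pick the timeout that applies to a single query.

    Precedence, highest first: per-query USING TIMEOUT, then the
    per-service-level value, then the scylla.yaml default.
    """
    if per_query_ms is not None:
        return per_query_ms
    if service_level_ms is not None:
        return service_level_ms
    return default_ms
```

For example, with a 5000ms default and a 500ms service level timeout, a
query using `USING TIMEOUT 50ms` runs with 50ms, and other queries run
with 500ms.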
Closes #7913
* github.com:scylladb/scylla:
docs: add a paragraph describing service level timeouts
test: add per-service-level timeout tests
test: add refreshing client state
transport: add updating per-service-level params
client_state: allow updating per service level params
qos: allow returning combined service level options
qos: add a way of merging service level options
cql3: add preserving default values for per-sl timeouts
qos: make getting service level public
qos: make finding service level public
treewide: remove service level controller from query state
treewide: propagate service level to client state
sstables: disambiguate boost::find
cql3: add a timeout column to LIST SERVICE LEVEL statement
db: add extracting service level info via CQL
types: add a missing translation for cql_duration
cql3: allow unsetting service level timeouts
cql3: add validating service level timeout values
db: add setting service level params via system_distributed
cql3: add fetching service level attrs in ALTER and CREATE
cql3: add timeout to service level params
qos: add timeout to service level info
db,sys_dist_ks: add timeout to the service level table
migration_manager: allow table updates with timestamp
cql3: allow a null keyword for CQL properties
This patchset adds a basic scylla-gdb.py test to the test suite.
First two patches add the test itself (disabled), subsequent ones are
fixes for scylla-gdb.py to make the test pass, and the last one
enables the test.
Closes #8618
* github.com:scylladb/scylla:
test: enable scylla-gdb/run
scylla-gdb.py: "this" -> "self"
scylla-gdb.py: wrap std::unordered_{set,map} and flat_hash_map
scylla-gdb.py: robustify execution_strategy traversal
scylla-gdb.py: recognize new sstable reader types
scylla-gdb.py: make list_unordered_map more resilient
scylla-gdb.py: robustify netw & gms
scylla-gdb.py: redo find_db() in terms of sharded()
scylla-gdb.py: debug::logalloc_alignment may not exist
scylla-gdb.py: handle changed container type of keyspaces
scylla-gdb.py: walk intrusive containers using provided link fields
test: add a basic test for scylla-gdb.py
test.py: refine test mode control
"
Storage service needs migration notifier reference to pass it to cdc
service via get_local_storage_service(). This set removes
- get_local_storage_service from cdc
- migration notifier from storage service
- db_context::builder from cdc (released nuclear binding energy)
tests: unit(dev)
"
* 'br-cdc-no-storage-service' of https://github.com/xemul/scylla:
storage_service: Remove migration notifier dependency
cdc: Remove db_context::builder
cdc: Provide migration notifier right at once
cdc: Remove db_context::builder::with_migration_notifier
"
This patch-set builds on the existing very basic coverage generation
support and greatly improves it, adding an almost fully automated way of
generating reports, as well as a more manual way.
At the heart of this is a new build mode, coverage, that is dedicated
to coverage report generation, containing all the required build flags,
without interfering with that of the "host" build mode, like currently
(with the --coverage flag).
Additionally a new script, scripts/coverage.py, is added which automates
the magic behind the scenes needed to get from raw profile files to a
nice html report, as long as the raw files are at the expected place.
There are still some rough edges:
* There is no direct ninja support for coverage generation, one has to
build the tests, then run them via test.py.
* Building and running just a few tests is a miserable experience
(#8608).
* Only boost unit tests are supported at the moment when using test.py.
* A --verbose flag for coverage.py would be nice.
* coverage.py could have a way to run a test itself, automatically
adding the required ENV variable(s).
I plan on addressing all these in the future. In the meantime, with
this series, the coverage report generation is made available for
non-masochists as well.
"
* 'coverage-improvements/v1' of https://github.com/denesb/scylla:
HACKING.md: update the coverage guide
test.py: add basic coverage generation support
scripts: introduce coverage.py
configure.py: replace --coverage with a coverage build mode
configure.py: make the --help output more readable
configure.py: add build mode descriptions
configure.py: fix fallback mode selection for checkheaders target
configure.py: centralize the declaration of build modes
When decommission is done, all nodes that receive data from the
decommissioned node will run the node_ops_cmd::decommission_done handler.
Trigger off-strategy compaction inside the handler to wire off-strategy
for decommission.
Refs #5226
Closes #8607
This cuts an allocation in the write path. The instruction count reduction
is small (about 0.1%), but throughput improves by about 1.5% (results are
consistent):
before: 196369.48 tps ( 55.2 allocs/op, 13.2 tasks/op, 51658 insns/op)
after: 199290.32 tps ( 54.2 allocs/op, 13.2 tasks/op, 51600 insns/op)
(this is perf_simple_query --write --smp 1 --operations-per-shard 1000000)
Since small_vector requires a noexcept move constructor and assignment,
the corresponding unique_response_handler members are adjusted/added
respectively.
Closes #8606
* github.com:scylladb/scylla:
storage_proxy: place unique_response_handler:s in small_vector instead of std::vector
storage_proxy: make unique_response_handler friendly to small_vector
storage_proxy: give a name to a vector of unique_response_handlers
On several instance types in AWS and Azure, we get the following failure
during the scylla_io_setup process:
```
ERROR 2021-04-14 07:50:35,666 [shard 5] seastar - Could not setup Async
I/O: Resource temporarily unavailable. The most common cause is not
enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that
number or reducing the amount of logical CPUs available for your
application
```
We have scylla_prepare:configure_io_slots() running before
scylla-server.service starts, but scylla_io_setup takes place
before that.
1) Let's move configure_io_slots() to scylla_util.py, since both
scylla_io_setup and scylla_prepare import functions from it
2) Clean up scylla_prepare, since we don't need the same function twice
3) Let's use configure_io_slots() during scylla_io_setup to avoid such
a failure
Fixes: #8587
Closes #8512
The get_all_endpoints() should return the nodes that are part of the ring.
A node inside _endpoint_to_host_id_map does not guarantee that the node
is part of the ring.
To fix, return from _token_to_endpoint_map.
Fixes #8534
Closes #8536
* github.com:scylladb/scylla:
token_metadata: Get rid of get_all_endpoints_count
range_streamer: Handle everywhere_topology
range_streamer: Adjust use_strict_sources_for_ranges
token_metadata: Fix get_all_endpoints to return nodes in the ring
Some unordered_map instantiations have cache=true, some cache=false,
but we don't need to care.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
I haven't found a way to make it stay -- __attribute__((used)) is not
enough and apparently lld is going to ignore __attribute__((retain))
until at least LLVM 13.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
clang & gdb apparently conspire to not reveal template argument types
beyond the first one -- at least for some templates, and definitely
for Boost's intrusive container ones. This severely restricts our
ability to find the right intrusive list link by examining the
container type.
Allow the caller to simply provide the relevant field name, so we
don't have to guess.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
(And disable it initially, because it won't pass without subsequent
commits)
Runs only in release mode, to keep things more realistic.
Doesn't exercise Scylla much at present -- just stops it after several
compactions and tries (almost) all "scylla *" commands in order.
Refs #6952.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
* Add ability to skip tests in individual modes using "skip_in_<mode>".
* Add ability to allow tests in specific modes using "run_in_<mode>".
* Rename "skip_in_debug_mode" to "skip_in_debug_modes", because there
is an actual mode named "debug" and this is confusing.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
This commit adds unit tests for an issue with index creation
after a table with a conflicting name was previously created.
The cases cover both indexes with a default name and the ones with
an explicit name set.
When an index with an explicit name is created, its underlying
materialized view's name is set to <index-name>_index.
If there already exists a regular table with such a name,
the creation should fail with a proper error message.
When an index is created without an explicit name, a default name
is chosen. However, there was no check whether a table with a
conflicting name already exists. The check is now in place and if any
conflicts are found, a new index name is chosen instead.
Not really testing anything, at least not automatically. It just
provides coverage for the diagnostics dump code and allows
developers to inspect the printout visually when making changes.
With a helper client state refresher, some attributes
which are usually only refreshed after a client disconnects
and then reconnects, can be verified in the test suite.
Per-service-level parameters (currently timeouts)
are now updated when a new connection is established.
The other connections which have the changed role are currently
not immediately reloaded.
Originally, the API for finding a service level controller returned
its name, which also implied that only a single service level
may be active for a user and provide its options.
After adding timeout parameters it makes more sense to return a result
which combines multiple service level parameters - e.g. a user
can be attached to one level for read timeouts and a separate one
for write timeouts.
In order to combine multiple service level options coming from
multiple roles, a helper function is provided to merge two
of them. The semantics depend on each parameter, but for timeouts,
which are the only parameters at the time of writing this message,
the minimum value of the two is taken. That in particular means
that when service level A has timeout = 50ms and service level B
has timeout = 1s, the resulting service level options
would set the timeout to 50ms.
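The merge rule for timeouts can be sketched as follows. This is a
Python model for illustration, not the qos code: options are modeled as
dicts of millisecond values, the function names are hypothetical, and
treating an unset timeout (None) as "take the other value" is an
assumption.

```python
TIMEOUT_KEYS = ("read_timeout", "write_timeout", "cas_timeout")


def merge_timeout(a, b):
    """Merge two timeouts; None means the timeout is not set."""
    if a is None:
        return b
    if b is None:
        return a
    # When both service levels set a timeout, the smaller one wins.
    return min(a, b)


def merge_service_level_options(a, b):
    """Merge two service level option dicts, parameter by parameter."""
    return {key: merge_timeout(a.get(key), b.get(key)) for key in TIMEOUT_KEYS}
```

Merging {"read_timeout": 50} with {"read_timeout": 1000,
"write_timeout": 200} gives a 50ms read timeout and a 200ms write
timeout, matching the example above.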