Commit Graph

26438 Commits

Author SHA1 Message Date
Tomasz Grabiec
57ed93bf44 db/virtual_table: Add a way to specify a range of partitions for virtual
table queries.

This change introduces a query_restrictions object into the virtual
table infrastructure, for now only holding a restriction on partition
ranges.
Support for that partition range is then implemented in
memtable_filling_virtual_table.
2021-05-12 17:05:35 +02:00
Piotr Wojtczak
38720847f2 db/virtual_table: Introduce memtable_filling_virtual_table
This change adds a more specific implementation of the virtual table
called memtable_filling_virtual_table. It produces results by filling
a memtable on each read.
2021-05-12 17:05:34 +02:00
Juliusz Stasiewicz
61a0314952 db: Add virtual tables interface
This change introduces the basic interface we expect each virtual
table to implement. More specific implementations will then expand
upon it if needed.
2021-05-12 17:05:34 +02:00
Juliusz Stasiewicz
8333d66d4e db: Introduce chained_delegating_reader
This change adds a new type of mutation reader whose purpose
is to allow inserting operations before an invocation of the proper
reader. It takes a future to wait on and only after it resolves will
it forward the execution to the underlying flat_mutation_reader
implementation.
2021-05-12 17:05:34 +02:00
Benny Halevy
c0dafa75d9 utils: phased_barrier: advance_and_await: make noexcept
As a function returning a future, simplify
its interface by handling any exceptions and
returning an exceptional future instead of
propagating the exception.
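
A minimal sketch of this pattern, using only the standard library. `ready_future` is a hypothetical stand-in for an already-resolved future (it is not Seastar's API), and `start_phase` and `advance_and_await_sketch` are made-up names used only for illustration:

```cpp
#include <exception>
#include <stdexcept>
#include <utility>
#include <variant>

// Hypothetical stand-in for a future that is already resolved:
// it holds either a value or a stored exception.
template <typename T>
class ready_future {
    std::variant<T, std::exception_ptr> _state;
public:
    explicit ready_future(T value) : _state(std::move(value)) {}
    explicit ready_future(std::exception_ptr ex) : _state(std::move(ex)) {}

    bool failed() const noexcept {
        return std::holds_alternative<std::exception_ptr>(_state);
    }
    // The stored exception is rethrown only when the result is consumed.
    T get() const {
        if (failed()) {
            std::rethrow_exception(std::get<std::exception_ptr>(_state));
        }
        return std::get<T>(_state);
    }
};

// Hypothetical step that may throw before any future exists.
inline int start_phase(bool fail) {
    if (fail) {
        throw std::runtime_error("cannot start new phase");
    }
    return 1;
}

// noexcept: a failure is reported through the returned future, so a caller
// can always attach its cleanup continuation before looking at the result.
inline ready_future<int> advance_and_await_sketch(bool fail) noexcept {
    try {
        return ready_future<int>(start_phase(fail));
    } catch (...) {
        return ready_future<int>(std::current_exception());
    }
}
```

With this shape the caller sees the error only when it consumes the future, which is what keeps a trailing cleanup step from being skipped.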

In this specific case, throwing from advance_and_await()
will propagate through table::await_pending_* calls
short-circuiting a .finally clause in table::stop().

Also, mark as noexcept methods of class table calling
advance_and_await and table::await_pending_ops that depends on them.

Fixes #8636

A followup patch will convert advance_and_await to a coroutine.
This is done separately to facilitate backporting of this patch.

Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210511161407.218402-1-bhalevy@scylladb.com>
2021-05-12 01:36:11 +02:00
Benny Halevy
b4cbd46adb row_cache: create_underlying_reader: call read_context on_underlying_created only on success
ctx.on_underlying_created() mustn't be called if src.make_reader failed
and a reader isn't created.

Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210511054525.35090-1-bhalevy@scylladb.com>
2021-05-12 01:34:48 +02:00
Nadav Har'El
cee4c075d2 Merge 'Fix index name conflicts with regular tables' from Piotr Sarna
When an index is created without an explicit name, a default name
is chosen. However, there was no check if a table with conflicting
name already exists. The check is now in place and if any conflicts
are found, a new index name is chosen instead.
When an index is created *with* an explicit name and a conflicting
regular table is found, index creation should simply fail.
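
The two rules can be sketched as follows. This is an illustration under stated assumptions, not the code from the series: the set of existing table names, the function names, and the "_<n>" numbering of fallback names are all hypothetical. The only detail taken from the series is that the backing view of an index is named `<index-name>_index`:

```cpp
#include <optional>
#include <set>
#include <string>

// The backing view of an index is named "<index-name>_index".
inline std::string view_name_for(const std::string& index_name) {
    return index_name + "_index";
}

// True if the backing view name would collide with an existing table.
inline bool conflicts_with_table(const std::set<std::string>& tables,
                                 const std::string& index_name) {
    return tables.count(view_name_for(index_name)) > 0;
}

// Default name: keep trying numbered candidates until one is free.
// The "_<n>" numbering is an assumption of this sketch.
inline std::string pick_default_index_name(const std::set<std::string>& tables,
                                           const std::string& base) {
    std::string candidate = base;
    for (int n = 1; conflicts_with_table(tables, candidate); ++n) {
        candidate = base + "_" + std::to_string(n);
    }
    return candidate;
}

// Explicit name: never renamed; a conflict is an error (modeled as nullopt).
inline std::optional<std::string> check_explicit_index_name(
        const std::set<std::string>& tables, const std::string& name) {
    if (conflicts_with_table(tables, name)) {
        return std::nullopt;
    }
    return name;
}
```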

This series comes with a test.

Fixes #8620
Tests: unit(release)

Closes #8632

* github.com:scylladb/scylla:
  cql-pytest: add regression tests for index creation
  cql3: fail to create an index if there is a name conflict
  database: check for conflicting table names for indexes
2021-05-11 18:40:15 +03:00
Nadav Har'El
c7a814fd5c utils/enum_option.hh: make it easier to compare the value
The operator== of enum_option<> (which we use to hold multi-valued
Scylla options) makes it easy to compare to another enum_option
wrapper, but ugly to compare the actual value held. So this patch
adds a nicer way to compare the value held.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210511120222.1167686-1-nyh@scylladb.com>
2021-05-11 18:39:10 +03:00
Benny Halevy
9ba960a388 utils: phased_barrier::operation do not leak gate entry when reassigned
utils::phased_barrier holds a `lw_shared_ptr<gate>` that is
typically `enter()`ed in `phased_barrier::start()`,
and left when the operation is destroyed in `~operation`.

Currently, the operation move-assign implementation is the
default one that just moves the lw_shared gate ptr from the
other operation into this one, without calling `_gate->leave()` first.

This change first destroys *this when move-assigned (if not self)
to call _gate->leave() if engaged, before reassigning the
gate with the other operation::_gate.
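
A self-contained sketch of the leak and the fix. The `gate` here is a hypothetical counter standing in for the real gate, and `std::shared_ptr` stands in for `lw_shared_ptr`; only the shape of the move-assignment operator is the point:

```cpp
#include <memory>
#include <new>
#include <utility>

// Hypothetical stand-in for a gate: just counts entries.
struct gate {
    int count = 0;
    void enter() { ++count; }
    void leave() { --count; }
};

class operation {
    std::shared_ptr<gate> _gate;
public:
    operation() = default;
    // Assumes a non-null gate.
    explicit operation(std::shared_ptr<gate> g) : _gate(std::move(g)) {
        _gate->enter();
    }
    operation(const operation&) = delete;
    operation& operator=(const operation&) = delete;
    // A moved-from shared_ptr is empty, so the source will not leave().
    operation(operation&& o) noexcept : _gate(std::move(o._gate)) {}

    // A defaulted move-assignment would overwrite _gate without calling
    // leave() on the gate this operation already entered.
    operation& operator=(operation&& o) noexcept {
        if (this != &o) {
            this->~operation();                  // leaves the old gate, if engaged
            new (this) operation(std::move(o));  // then takes over the other one
        }
        return *this;
    }

    ~operation() {
        if (_gate) {
            _gate->leave();
        }
    }
};
```

Destroying and reconstructing in place mirrors the description above; explicitly calling `leave()` before moving the pointer would be an equivalent alternative in this sketch.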

A unit test that reproduces the issue before this change
and passes with the fix was added to serialized_action_test.

Fixes #8613

Test: unit(dev), serialized_action_test(debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210510120703.1520328-1-bhalevy@scylladb.com>
2021-05-11 18:39:10 +03:00
Avi Kivity
1d8234f52d Merge "reader_concurrency_semaphore: improve diagnostics printout" from Botond
"
The current printout has multiple problems:
* It is segregated by state, each having its own sorting criteria;
* Number of permits and count resources are collapsed into a single
  column, and it is not clear which one is printed.
* Number of available/initial units of the semaphore are not printed;

This series solves all these problems:
* It merges all states into a single table, sorted by memory
  consumption, in descending order.
* It separates number of permits and count resources into separate
  columns.
* Prints a summary of the semaphore units.
* Provides a cap on the maximum amount of printable lines, to not blow
  up the logs.

The goal of all this is to make it easy to find the culprit of a semaphore
problem: easily spot the big memory consumers, then unpack the name
column to determine which table and code path is responsible.
This brings the printout close to the recently added `scylla reads`
scylla-gdb.py command, providing a uniform report format across the two
tools.
Example report:
INFO  2021-05-07 09:52:16,806 [shard 0] testlog - With max-lines=4: Semaphore reader_concurrency_semaphore_dump_reader_diganostics with 8/2147483647 count and 263599186/9223372036854775807 memory resources: user request, dumping permit diagnostics:
permits count   memory  table/description/state
7       2       77M     ks.tbl1/op1/active
6       3       59M     ks.tbl1/op0/active
4       0       36M     ks.tbl1/op2/active
3       1       36M     ks.tbl0/op2/active
11      2       43M     permits omitted for brevity

31      8       251M    total
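
The merging, sorting and capping described above can be sketched like this. The types and the function name are hypothetical, and memory is kept as a plain number rather than formatted as in the report:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One row of the table: permits, count resources, memory, description.
struct permit_line {
    std::uint64_t permits;
    std::uint64_t count;
    std::uint64_t memory;
    std::string description;
};

struct report {
    std::vector<permit_line> shown;  // at most max_lines rows
    permit_line omitted;             // aggregate of the rows not shown
    permit_line total;               // aggregate of all rows
};

// Sort all rows by memory in descending order, keep the first max_lines,
// and fold the rest into a single "omitted" row.
inline report build_report(std::vector<permit_line> lines, std::size_t max_lines) {
    std::stable_sort(lines.begin(), lines.end(),
                     [](const permit_line& a, const permit_line& b) {
                         return a.memory > b.memory;
                     });
    report r{{}, {0, 0, 0, "permits omitted for brevity"}, {0, 0, 0, "total"}};
    for (std::size_t i = 0; i < lines.size(); ++i) {
        const permit_line& l = lines[i];
        r.total.permits += l.permits;
        r.total.count += l.count;
        r.total.memory += l.memory;
        if (i < max_lines) {
            r.shown.push_back(l);
        } else {
            r.omitted.permits += l.permits;
            r.omitted.count += l.count;
            r.omitted.memory += l.memory;
        }
    }
    return r;
}
```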
"

* 'reader-concurrency-semaphore-dump-improvement/v1' of https://github.com/denesb/scylla:
  test: reader_concurrency_test: add reader_concurrency_semaphore_dump_reader_diganostics
  reader_concurrency_semaphore: dump_reader_diagnostics(): print more information in the header
  reader_concurrency_semaphore: dump_reader_diagnostics(): cap number of printed lines
  reader_concurrency_semaphore: dump_reader_diagnostics(): sort lines in descending order
  reader_concurrency_semaphore: dump_reader_diagnostics(): merge all states into a single table
  reader_concurrency_semaphore: dump_reader_diagnostics(): separate number of permits and count resources
2021-05-11 18:39:10 +03:00
Avi Kivity
eed89a9b56 Update tools/jmx submodule (toppartitions multi-sampler query)
* tools/jmx 440313e...a7c4c39 (1):
  > storage_service: Fix getToppartitions to always return both reads and writes
2021-05-11 18:39:10 +03:00
Nadav Har'El
af485f5226 secondary index: fix index name in IndexInfo system table
In commit 3e39985c7a we added the Cassandra-compatible system table
system."IndexInfo" (note the capitalized table name) which lists built
indexes. Because we already had a table of built materialized views, and
indexes are implemented as materialized views, the index list was
implemented as a virtual table based on the view list.

However, the *name* of each materialized view listed in the list of
views looks like something_index, with the suffix "_index", while the
name of the table we need to print is "something". We forgot to do this
transformation in the virtual table - and this is what this patch does.

This bug can confuse applications which use this system table to wait for
an index to be built. Several tests translated from Cassandra's unit
tests, in cassandra_tests/validation/entities/secondary_index_test.py fail
in wait_for_index() because of this incompatibility, and pass after this
patch.

This patch also changes the unit test that enshrined the previous, wrong,
behavior, to test for the correct behavior. This problem is typical of
C++ unit tests which cannot be run against Cassandra.

Fixes #8600

Unfortunately, although this patch fixes "typical" applications (including
all tests which I tried) - applications which read from IndexInfo in a
"typical" method to look for a specific index being ready, the
implementation is technically NOT correct: The problem is that index
names are not sorted in the right order, because they are sorted with
the "_index" suffix.
To give an example, the index name "a" should be listed before "a1", but
the view name "a1_index" comes before "a_index" (because in ASCII, 1
comes before underscore). I can't think of any way to fix this bug
without completely reimplementing IndexInfo in a different way - probably
based on a temporary memtable (which is fine as this is not a
performance-critical operation). We'll need to do this rewrite eventually,
and I'll open a new issue.
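
Both the name transformation and the remaining ordering problem can be illustrated with a small sketch. The function names are hypothetical; only the "_index" suffix comes from the description above:

```cpp
#include <algorithm>
#include <string>
#include <string_view>
#include <vector>

constexpr std::string_view index_suffix = "_index";

// Turn a backing view name ("something_index") into the index name
// ("something"). Names without the suffix are returned unchanged.
inline std::string index_name_from_view(std::string_view view_name) {
    if (view_name.size() >= index_suffix.size() &&
        view_name.substr(view_name.size() - index_suffix.size()) == index_suffix) {
        view_name.remove_suffix(index_suffix.size());
    }
    return std::string(view_name);
}

// Sort by view name first and strip the suffix afterwards, which is the
// order a table derived from the sorted view list would produce.
inline std::vector<std::string> index_names_in_view_order(
        std::vector<std::string> view_names) {
    std::sort(view_names.begin(), view_names.end());
    std::vector<std::string> out;
    for (const auto& v : view_names) {
        out.push_back(index_name_from_view(v));
    }
    return out;
}
```

For the views "a_index" and "a1_index" this yields "a1" before "a", whereas sorting the index names themselves would put "a" first.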

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210509140113.1084497-1-nyh@scylladb.com>
2021-05-11 18:39:10 +03:00
Avi Kivity
61c7f874cc Merge 'Add per-service-level timeouts' from Piotr Sarna
Ref: #7617

This series adds timeout parameters to service levels.

Per-service-level timeouts can be set up in the form of service level parameters, which can in turn be attached to roles. Setting up and modifying role-specific timeouts can be achieved like this:
```cql
CREATE SERVICE LEVEL sl2 WITH read_timeout = 500ms AND write_timeout = 200ms AND cas_timeout = 2s;
ATTACH SERVICE LEVEL sl2 TO cassandra;
ALTER SERVICE LEVEL sl2 WITH write_timeout = null;
```
Per-service-level timeouts take precedence over default timeout values from scylla.yaml, but can still be overridden for a specific query by per-query timeouts (e.g. `SELECT * from t USING TIMEOUT 50ms`).
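
The precedence order can be sketched as a simple fallback chain. The function and parameter names are hypothetical, and an unset timeout is modeled as an empty `std::optional`:

```cpp
#include <chrono>
#include <optional>

using timeout = std::chrono::milliseconds;

// Per-query timeout wins, then the service level timeout, then the
// default configured in scylla.yaml.
inline timeout effective_timeout(std::optional<timeout> per_query,
                                 std::optional<timeout> service_level,
                                 timeout config_default) {
    if (per_query) {
        return *per_query;
    }
    if (service_level) {
        return *service_level;
    }
    return config_default;
}
```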

Closes #7913

* github.com:scylladb/scylla:
  docs: add a paragraph describing service level timeouts
  test: add per-service-level timeout tests
  test: add refreshing client state
  transport: add updating per-service-level params
  client_state: allow updating per service level params
  qos: allow returning combined service level options
  qos: add a way of merging service level options
  cql3: add preserving default values for per-sl timeouts
  qos: make getting service level public
  qos: make finding service level public
  treewide: remove service level controller from query state
  treewide: propagate service level to client state
  sstables: disambiguate boost::find
  cql3: add a timeout column to LIST SERVICE LEVEL statement
  db: add extracting service level info via CQL
  types: add a missing translation for cql_duration
  cql3: allow unsetting service level timeouts
  cql3: add validating service level timeout values
  db: add setting service level params via system_distributed
  cql3: add fetching service level attrs in ALTER and CREATE
  cql3: add timeout to service level params
  qos: add timeout to service level info
  db,sys_dist_ks: add timeout to the service level table
  migration_manager: allow table updates with timestamp
  cql3: allow a null keyword for CQL properties
2021-05-11 18:39:10 +03:00
Nadav Har'El
3c2e852dd9 Merge 'scylla-gdb unit test' from Michael Livshin
This patchset adds a basic scylla-gdb.py test to the test suite.

First two patches add the test itself (disabled), subsequent ones are
fixes for scylla-gdb.py to make the test pass, and the last one
enables the test.

Closes #8618

* github.com:scylladb/scylla:
  test: enable scylla-gdb/run
  scylla-gdb.py: "this" -> "self"
  scylla-gdb.py: wrap std::unordered_{set,map} and flat_hash_map
  scylla-gdb.py: robustify execution_strategy traversal
  scylla-gdb.py: recognize new sstable reader types
  scylla-gdb.py: make list_unordered_map more resilient
  scylla-gdb.py: robustify netw & gms
  scylla-gdb.py: redo find_db() in terms of sharded()
  scylla-gdb.py: debug::logalloc_alignment may not exist
  scylla-gdb.py: handle changed container type of keyspaces
  scylla-gdb.py: walk intrusive containers using provided link fields
  test: add a basic test for scylla-gdb.py
  test.py: refine test mode control
2021-05-11 18:39:10 +03:00
Avi Kivity
b1f9df279a Merge "Untie cdc, storage service and migration notifier knot" from Pavel E
"
Storage service needs migration notifier reference to pass it to cdc
service via get_local_storage_service(). This set removes

- get_local_storage_service from cdc
- migration notifier from storage service
- db_context::builder from cdc (released nuclear binding energy)

tests: unit(dev)
"

* 'br-cdc-no-storage-service' of https://github.com/xemul/scylla:
  storage_service: Remove migration notifier dependency
  cdc: Remove db_context::builder
  cdc: Provide migration notifier right at once
  cdc: Remove db_context::builder::with_migration_notifier
2021-05-11 18:39:10 +03:00
Michael Livshin
ff7d781988 test: enable scylla-gdb/run
It should pass now.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-05-11 18:39:10 +03:00
Avi Kivity
6548436db3 Merge "Improve coverage support" from Botond
"
This patch-set builds on the existing very basic coverage generation
support and greatly improves it, adding an almost fully automated way of
generating reports, as well as a more manual way.
At the heart of this is a new build mode, coverage, that is dedicated
to coverage report generation, containing all the required build flags,
without interfering with that of the "host" build mode, like currently
(with the --coverage flag).
Additionally a new script, scripts/coverage.py, is added which automates
the magic behind the scenes needed to get from raw profile files to a
nice html report, as long as the raw files are at the expected place.
There are still some rough edges:
* There is no direct ninja support for coverage generation, one has to
  build the tests, then run them via test.py.
* Building and running just a few tests is a miserable experience
  (#8608).
* Only boost unit tests are supported at the moment when using test.py.
* A --verbose flag for coverage.py would be nice.
* coverage.py could have a way to run a test itself, automatically
  adding the required ENV variable(s).

I plan on addressing all these in the future, in the meanwhile, with
this series, the coverage report generation is made available for
non-masochists as well.
"

* 'coverage-improvements/v1' of https://github.com/denesb/scylla:
  HACKING.md: update the coverage guide
  test.py: add basic coverage generation support
  scripts: introduce coverage.py
  configure.py: replace --coverage with a coverage build mode
  configure.py: make the --help output more readable
  configure.py: add build mode descriptions
  configure.py: fix fallback mode selection for checkheaders target
  configure.py: centralize the declaration of build modes
2021-05-11 18:39:10 +03:00
Michael Livshin
ee80c81593 scylla-gdb.py: "this" -> "self"
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-05-11 18:39:10 +03:00
Asias He
4f0a1cbca3 repair: Wire off-strategy compaction for decommission
When decommission is done, all nodes that receive data from the
decommission node will run node_ops_cmd::decommission_done handler.

Trigger off-strategy compaction inside the handler to wire off-strategy
for decommission.

Refs #5226

Closes #8607
2021-05-11 18:39:10 +03:00
Michael Livshin
b711fc5762 scylla-gdb.py: wrap std::unordered_{set,map} and flat_hash_map
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-05-11 18:39:10 +03:00
Nadav Har'El
df9faba652 Merge 'storage_proxy: place unique_response_handler:s in small_vector instead of std::vector' from Avi Kivity
This cuts an allocation in the write path. Instruction count reduction isn't
large, but performance does improve (results are consistent):

before: 196369.48 tps ( 55.2 allocs/op,  13.2 tasks/op,   51658 insns/op)
after:  199290.32 tps ( 54.2 allocs/op,  13.2 tasks/op,   51600 insns/op)

(this is perf_simple_query --write --smp 1 --operations-per-shard 1000000)

Since small_vector requires noexcept move constructor and assignment,
the corresponding unique_response_handler members are adjusted/added
respectively.
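
A sketch of what such an adjustment can look like. `handler_sketch` is a hypothetical move-only type, not the real unique_response_handler; the static assertions express the requirement stated above:

```cpp
#include <type_traits>
#include <utility>

// Hypothetical move-only handle; an id of 0 means "released".
class handler_sketch {
    unsigned long _id = 0;
public:
    explicit handler_sketch(unsigned long id) noexcept : _id(id) {}
    handler_sketch(const handler_sketch&) = delete;
    handler_sketch& operator=(const handler_sketch&) = delete;

    // Both move operations are noexcept and leave the source released.
    handler_sketch(handler_sketch&& o) noexcept : _id(std::exchange(o._id, 0)) {}
    handler_sketch& operator=(handler_sketch&& o) noexcept {
        if (this != &o) {
            _id = std::exchange(o._id, 0);
        }
        return *this;
    }

    unsigned long id() const noexcept { return _id; }
};

// Compile-time check of the container's requirement on its element type.
static_assert(std::is_nothrow_move_constructible_v<handler_sketch>);
static_assert(std::is_nothrow_move_assignable_v<handler_sketch>);
```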

Closes #8606

* github.com:scylladb/scylla:
  storage_proxy: place unique_response_handler:s in small_vector instead of std::vector
  storage_proxy: make unique_response_handler friendly to small_vector
  storage_proxy: give a name to a vector of unique_response_handlers
2021-05-11 18:39:10 +03:00
Michael Livshin
b0fbd0062e scylla-gdb.py: robustify execution_strategy traversal
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-05-11 18:39:10 +03:00
Yaron Kaikov
588a065304 scylla_io_setup: configure "aio-max-nr" before iotune
On several instance types in AWS and Azure, we get the following failure
during scylla_io_setup process:
```
ERROR 2021-04-14 07:50:35,666 [shard 5] seastar - Could not setup Async
I/O: Resource temporarily unavailable. The most common cause is not
enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that
number or reducing the amount of logical CPUs available for your
application
```

We have scylla_prepare:configure_io_slots() running before
scylla-server.service starts, but scylla_io_setup takes place
before that.

1) Let's move configure_io_slots() to scylla_util.py since both
   scylla_io_setup and scylla_prepare import functions from it
2) Clean up scylla_prepare since we don't need the same function twice
3) Let's use configure_io_slots() during scylla_io_setup to avoid such
   a failure

Fixes: #8587

Closes #8512
2021-05-11 18:39:10 +03:00
Michael Livshin
4ea6c7cd49 scylla-gdb.py: recognize new sstable reader types
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-05-11 18:39:10 +03:00
Nadav Har'El
fb0c4e469a Merge 'token_metadata: Fix get_all_endpoints to return nodes in the ring' from Asias He
The get_all_endpoints() should return the nodes that are part of the ring.

A node inside _endpoint_to_host_id_map does not guarantee that the node
is part of the ring.

To fix, return from _token_to_endpoint_map.
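
A minimal sketch of the idea, with hypothetical simplified types (an integer token and a string endpoint) instead of the real token_metadata members:

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

using token = std::int64_t;
using endpoint = std::string;

// Only endpoints that own at least one token are part of the ring, so the
// result is derived from the token map rather than from the host id map.
inline std::set<endpoint> ring_endpoints(
        const std::map<token, endpoint>& token_to_endpoint) {
    std::set<endpoint> out;
    for (const auto& entry : token_to_endpoint) {
        out.insert(entry.second);
    }
    return out;
}
```

A node that has a host id but owns no token does not appear in the result.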

Fixes #8534

Closes #8536

* github.com:scylladb/scylla:
  token_metadata: Get rid of get_all_endpoints_count
  range_streamer: Handle everywhere_topology
  range_streamer: Adjust use_strict_sources_for_ranges
  token_metadata: Fix get_all_endpoints to return nodes in the ring
2021-05-11 18:39:10 +03:00
Michael Livshin
513695c5ba scylla-gdb.py: make list_unordered_map more resilient
Some unordered_map instantiations have cache=true, some cache=false,
but we don't need to care.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-05-11 18:39:10 +03:00
Michael Livshin
2a386c06d9 scylla-gdb.py: robustify netw & gms
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-05-11 18:39:10 +03:00
Michael Livshin
76c2d792c9 scylla-gdb.py: redo find_db() in terms of sharded()
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-05-11 18:39:10 +03:00
Michael Livshin
ed2d471e79 scylla-gdb.py: debug::logalloc_alignment may not exist
I haven't found a way to make it stay -- __attribute__((used)) is not
enough and apparently lld is going to ignore __attribute__((retain))
until at least LLVM 13.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-05-11 18:39:10 +03:00
Michael Livshin
77d8272cca scylla-gdb.py: handle changed container type of keyspaces
Used to be std::unordered_map, but is a flat_hash_map now.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-05-11 18:39:10 +03:00
Michael Livshin
69a5aef620 scylla-gdb.py: walk intrusive containers using provided link fields
clang & gdb apparently conspire to not reveal template argument types
beyond the first one -- at least for some templates, and definitely
for Boost's intrusive container ones.  This severely restricts our
ability to find the right intrusive list link by examining the
container type.

Allow the caller to simply provide the relevant field name, so we
don't have to guess.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-05-11 18:39:10 +03:00
Michael Livshin
73f9f08df6 test: add a basic test for scylla-gdb.py
(And disable it initially, because it won't pass without subsequent
commits)

Runs only in release mode, to keep things more realistic.

Doesn't exercise Scylla much at present -- just stops it after several
compactions and tries (almost) all "scylla *" commands in order.

Refs #6952.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-05-11 18:39:10 +03:00
Michael Livshin
3bff94cd29 test.py: refine test mode control
* Add ability to skip tests in individual modes using "skip_in_<mode>".

* Add ability to allow tests in specific modes using "run_in_<mode>".

* Rename "skip_in_debug_mode" to "skip_in_debug_modes", because there
  is an actual mode named "debug" and this is confusing.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-05-11 18:39:10 +03:00
Piotr Sarna
1cb804f024 cql-pytest: add regression tests for index creation
This commit adds unit tests for an issue with index creation
when a table with a conflicting (malicious) name was created beforehand.
The cases cover both indexes with a default name and the ones with
an explicit name set.
2021-05-11 17:34:37 +02:00
Piotr Sarna
0ef0a4c78d cql3: fail to create an index if there is a name conflict
When an index with an explicit name is created, its underlying
materialized view's name is set to <index-name>_index.
If there already exists a regular table with such a name,
the creation should fail with a proper error message.
2021-05-11 15:21:00 +02:00
Piotr Sarna
fa53bf5c1e database: check for conflicting table names for indexes
When an index is created without an explicit name, a default name
is chosen. However, there was no check if a table with conflicting
name already exists. The check is now in place and if any conflicts
are found, a new index name is chosen instead.
2021-05-11 15:20:59 +02:00
Botond Dénes
69d04d161e test: reader_concurrency_test: add reader_concurrency_semaphore_dump_reader_diganostics
Not really testing anything, at least not automatically. It just
provides coverage for the diagnostics dump code, and allows
developers to inspect the printout visually when making changes.
2021-05-10 18:06:30 +03:00
Piotr Sarna
7f086d8f73 docs: add a paragraph describing service level timeouts
Along with examples.
2021-05-10 12:39:41 +02:00
Piotr Sarna
570c63d39b test: add per-service-level timeout tests
The test suite checks if per-service-level timeouts
work and validate their input.
2021-05-10 12:39:41 +02:00
Piotr Sarna
43f1f9e445 test: add refreshing client state
With a helper client state refresher, some attributes
which are usually only refreshed after a client disconnects
and then reconnects, can be verified in the test suite.
2021-05-10 12:39:41 +02:00
Piotr Sarna
6da59b8a38 transport: add updating per-service-level params
Per-service-level parameters (currently timeouts)
are now updated when a new connection is established.
The other connections which have the changed role are currently
not immediately reloaded.
2021-05-10 12:39:41 +02:00
Piotr Sarna
7ee5686d6c client_state: allow updating per service level params
Per-service-level params can now be updated with a helper function.
2021-05-10 12:39:41 +02:00
Piotr Sarna
368a6976ff qos: allow returning combined service level options
Originally, the API for finding a service level controller returned
its name, which also implied that only a single service level
may be active for a user and provide its options.
After adding timeout parameters it makes more sense to return a result
which combines multiple service level parameters - e.g. a user
can be attached to one level for read timeouts and a separate one
for write timeouts.
2021-05-10 12:39:41 +02:00
Piotr Sarna
cbedefb0f9 qos: add a way of merging service level options
In order to combine multiple service level options coming from
multiple roles, a helper function is provided to merge two
of them. The semantics depend on each parameter, but for timeouts,
which are the only parameters at the time of writing this message,
the minimum value of the two is taken. That in particular means
that when service level A has timeout = 50ms and service level B
has timeout = 1s, the resulting service level options
would set the timeout to 50ms.
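
The merge rule for timeouts can be sketched as below. The function name is hypothetical, and treating an unset timeout as "take the other side" is an assumption of this sketch rather than something stated in the message:

```cpp
#include <algorithm>
#include <chrono>
#include <optional>

using timeout = std::chrono::milliseconds;

// Merge one timeout parameter coming from two service levels:
// when both are set, the minimum of the two is taken.
inline std::optional<timeout> merge_timeouts(std::optional<timeout> a,
                                             std::optional<timeout> b) {
    if (!a) {
        return b;  // assumption: an unset value does not constrain the result
    }
    if (!b) {
        return a;
    }
    return std::min(*a, *b);
}
```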
2021-05-10 12:39:41 +02:00
Piotr Sarna
4ba1ac57a1 cql3: add preserving default values for per-sl timeouts
In order for per-service-level timeouts to work as expected,
a special value is reserved for internally marking the timeouts
as deleted.
2021-05-10 11:48:14 +02:00
Piotr Sarna
fb4e8951f5 qos: make getting service level public
2021-05-10 11:48:14 +02:00
Piotr Sarna
06d0e1853d qos: make finding service level public
2021-05-10 11:48:14 +02:00
Piotr Sarna
e257ec11c0 treewide: remove service level controller from query state
... since it's accessible through its member, client state.
2021-05-10 11:48:14 +02:00
Piotr Sarna
d1f2e8b469 treewide: propagate service level to client state
... since it's going to be used to set up per-service-level
timeouts.
2021-05-10 11:48:14 +02:00
Piotr Sarna
00e59a9823 sstables: disambiguate boost::find
There are multiple functions named `find` in boost,
so to avoid future clashes, this one is explicitly marked
as belonging to boost::range.
2021-05-10 11:48:14 +02:00