1456 Commits

Author SHA1 Message Date
Pavel Emelyanov
a4118a70ee database, messaging: Delete old connection drop notification
Database no longer needs it. Since the only user of the old-style
notification is gone -- remove it as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
bfd91d7b81 database, proxy: Relocate connection-drop activity
On start, the database subscribes to the messaging-service connection drop
notification to drop the hit-rate from column families. However, the
updater and reader of those hit-rates is the storage_proxy, so it
must be the _proxy_ who drops the hit-rate.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
b78e9b51b7 database, tests: Rework recommended format setting
Tests don't have sstable format selector and enforce the needed
format by hand with the help of a special database:: method. It's
more natural to provide it via config. Doing this makes database
initialization in main and cql_test_env closer to each other.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
a42383b127 database, sstables_manager: Sow some noexcepts
Setting sstables format into database and into sstables_manager is
all plain assignments. Mark them as noexcept; the next patch will
obviously be exception safe after that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
9a76df96e3 database: Eliminate unused helpers
There are some large-data-handler-related helpers left after previous
patches, they can be removed altogether.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
4b7846da86 database: Merge the stop_database() into database::stop()
After stop_database() became shard-local, it's possible to merge
it with database::stop() as they are both called one after another
on scylla stop. In cql-test-env there are a few more steps in
between, but they don't rely on the database being partially
stopped.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
469c734155 database: Flatten stop_database()
The method needs to perform four steps cross-shard synchronously:
first stop compaction manager, then close user and, after it,
system tables, finally shutdown the large data handler.

This patch reworks this synchronization with the help of cross-shard
barrier added to the database previously. The motivation is to merge
.stop_database() with .stop().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
b1013e09b4 database: Equip with cross-shard-barrier
Make sure a node-wide barrier exists on a database when scylla starts.
Also provide a barrier for cql_test_env. In all other cases keep a
solo-mode barrier so that single-shard db stop doesn't get blocked.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
634ea4b543 database: Move starting bits into start()
These include large_data_handler::start, compaction_manager::enable
and database::init_commitlog.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:48:48 +03:00
Pavel Emelyanov
e2308034ff database: Add .start() method
Called right after the sharded::start(). For now empty, to be populated
by next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:44:48 +03:00
Pavel Emelyanov
127e4fe8de main: Shorten commitlog creation
This does three things in one go:

- converts

    db.invoke_on_all([] (database& db) {
        return db.init_commitlog();
    });

  into a one-line version

    db.invoke_on_all(&database::init_commitlog);

- removes the shard-0 pre-initialization for tests, because
  tests don't have the problem this pre- solves

- makes init_commitlog() re-entrant so that the regular start
  does not need to check for shard-0 explicitly

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:37:07 +03:00
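
The shortened call and the re-entrant initializer described in the commit above can be illustrated with a minimal, self-contained sketch. It does not use Seastar: `toy_database` and `toy_sharded` are hypothetical stand-ins, and the real `sharded<>::invoke_on_all` is asynchronous and runs on separate shards. The only point shown is that a call wrapper built on `std::invoke` accepts a pointer to member function as readily as a lambda, and that a guarded initializer makes repeated calls harmless.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy stand-in for a per-shard database instance (not the real class).
struct toy_database {
    int commitlog_inits = 0;   // how many times the commitlog was actually created
    bool has_commitlog = false;

    // Re-entrant: a repeated call finds the commitlog in place and does nothing.
    void init_commitlog() {
        if (has_commitlog) {
            return;
        }
        has_commitlog = true;
        ++commitlog_inits;
    }
};

// Toy stand-in for a sharded service: one instance per "shard", visited in turn.
struct toy_sharded {
    std::vector<toy_database> shards;

    explicit toy_sharded(std::size_t n) : shards(n) {}

    // std::invoke handles both lambdas and pointers to member functions.
    template <typename Func>
    void invoke_on_all(Func func) {
        for (auto& db : shards) {
            std::invoke(func, db);
        }
    }
};
```

With this sketch, `db.invoke_on_all(&toy_database::init_commitlog)` behaves the same as the lambda form, and calling it after shard 0 was initialized separately leaves every shard with exactly one commitlog.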
Pavel Emelyanov
bd2b7dca0e database: Remove unused mm arg from init_non_system_keyspaces()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:35:37 +03:00
Pavel Emelyanov
bb23986826 wasm: Localize it to database usage
The wasm::engine exists as a sharded<> service in main, but it's only
passed by local reference into database on start. There's not much profit
in keeping it at main scope, things get much simpler if keeping the
engine purely on database.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:35:17 +03:00
Kamil Braun
c12e265eb8 table, database: query, mutation_query: remove unnecessary class_config param
The semaphore inside was never accessed and `max_memory_for_unlimited_query`
was always equal to `*cmd.max_result_size` so the parameter was completely
redundant.

`cmd.max_result_size` is supposed to be always set in the affected
functions - which are executed on the replica side - as soon as the
replica receives the `read_command` object, in case the parameter was
not set by the coordinator. However, we don't have a guarantee at the
type level (it's still an `optional`). Many places used
`*cmd.max_result_size` without even an assertion.

We make the code a bit safer, we check for `cmd.max_result_size` and if
it's indeed engaged, store it in `reader_permit`. We then access it from
`reader_permit` where necessary. If `cmd.max_result_size` is not set, we
assume this is an unlimited query and obtain the limit from
`get_unlimited_query_max_result_size`.
2021-09-14 13:39:56 +02:00
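
A minimal sketch of the check described in the commit above, using hypothetical simplified types: `toy_read_command`, `toy_permit` and the fallback constant are illustrative assumptions, not the real `read_command`, `reader_permit` or `get_unlimited_query_max_result_size`. It shows the optional being inspected once, with the result stored where later code can read it without dereferencing the optional again.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified stand-ins for the real types.
struct toy_read_command {
    std::optional<std::uint64_t> max_result_size;  // may be left unset by the sender
};

struct toy_permit {
    std::uint64_t max_result_size = 0;
};

// Assumed fallback limit for unlimited queries; the value is arbitrary.
constexpr std::uint64_t toy_unlimited_query_max_result_size = 1024 * 1024;

// Check the optional once and store the effective limit in the permit.
inline void set_result_size_limit(const toy_read_command& cmd, toy_permit& permit) {
    permit.max_result_size =
        cmd.max_result_size.value_or(toy_unlimited_query_max_result_size);
}
```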
Kamil Braun
fbb83dd5ca reader_concurrency_semaphore: remove default parameter values from constructors
It's easy to forget about supplying the correct value for a parameter
when it has a default value specified. It's safer if 'production code'
is forced to always supply these parameters manually.

The default values were mostly useful in tests, where some parameters
didn't matter that much and where the majority of uses of the class are.
Without default values adding a new parameter is a pain, forcing one to
modify every usage in the tests - and there are a bunch of them. To
solve this, we introduce a new constructor which requires passing the
`for_tests` tag, marking that the constructor is only supposed to be
used in tests (and the constructor has an appropriate comment). This
constructor uses default values, but the other constructors - used in
'production code' - do not.
2021-09-14 12:20:28 +02:00
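
The tag-constructor approach described in the commit above can be sketched as follows. `toy_semaphore`, its parameters and its default values are hypothetical and much simpler than the real `reader_concurrency_semaphore`; only the pattern is the point: the production constructor has no defaults, while the constructor taking the `for_tests` tag keeps them.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical simplified semaphore; parameter names are illustrative only.
class toy_semaphore {
    int _count;
    long _memory;
    std::string _name;
public:
    // Tag type that makes test-only construction visible at the call site.
    struct for_tests {};

    // "Production" constructor: every parameter must be supplied explicitly.
    toy_semaphore(int count, long memory, std::string name)
        : _count(count), _memory(memory), _name(std::move(name)) {}

    // Test-only constructor: the defaults live here, so adding a parameter
    // to the production constructor does not force edits in every test.
    explicit toy_semaphore(for_tests, int count = 10, long memory = 1 << 20,
                           std::string name = "test")
        : _count(count), _memory(memory), _name(std::move(name)) {}

    int count() const { return _count; }
    long memory() const { return _memory; }
    const std::string& name() const { return _name; }
};
```

A test would write `toy_semaphore sem{toy_semaphore::for_tests{}};`, while production code has to spell out all three arguments.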
Botond Dénes
502a45ad58 treewide: switch to native reversed format for reverse reads
We define the native reverse format as a reversed mutation fragment
stream that is identical to one that would be emitted by a table with
the same schema but with reversed clustering order. The main difference
to the current format is how range tombstones are handled: instead of
looking at their start or end bound depending on the order, we always
use them as-usual and the reversing reader swaps their bounds to
facilitate this. This allows us to treat reversed streams completely
transparently: just pass them a reversed schema and all the
reader, compacting and result building code is happily ignorant about
the fact that it is a reversed stream.
2021-09-09 15:42:15 +03:00
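
A rough sketch of the bound swapping described in the commit above, under strong simplifying assumptions: clustering positions are plain integers, the stream contains only non-overlapping range tombstones, and `toy_range_tombstone` and `reverse_stream` are hypothetical names. The real reversing reader works on mutation fragments and schemas; this only illustrates why swapping the bounds lets downstream code treat the reversed stream like an ordinary one.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical tombstone over integer clustering positions.
// In any stream, `start` is the bound that is reached first in stream order.
struct toy_range_tombstone {
    int start;
    int end;
};

// Turn a forward (ascending) stream into its native reversed form:
// emit the fragments in opposite order and swap each tombstone's bounds,
// so `start` is again the bound met first when reading in descending order.
inline std::vector<toy_range_tombstone>
reverse_stream(std::vector<toy_range_tombstone> forward) {
    std::reverse(forward.begin(), forward.end());
    for (auto& rt : forward) {
        std::swap(rt.start, rt.end);
    }
    return forward;
}
```

For a forward stream `[1, 3], [5, 8]` this yields `[8, 5], [3, 1]`, which is what a table with descending clustering order would emit for the same deletions.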
Benny Halevy
b7eaa22ce6 abstract_replication_strategy: create_replication_strategy: drop keyspace name parameter
It is not used.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210906133840.3307279-1-bhalevy@scylladb.com>
2021-09-06 16:51:21 +03:00
Benny Halevy
56e063ce93 keyspace: get rid of set_replication_strategy
It's unused.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210906133905.3307397-1-bhalevy@scylladb.com>
2021-09-06 16:48:35 +03:00
Avi Kivity
4d7e00d0f8 cql3: selection: make selectable.hh not include expr/expression.hh
We have this dependency now:

   column_identifier -> selectable -> expression

and want to introduce this:

   expression -> user types -> column_identifier

This leads to a loop, since expression is not (yet) forward
declarable.

Fix by moving any mention of expression from selectable.hh to a new
header selection-expr.hh.

database.cc lost access to timeout_config, so adjust its includes
to regain it.
2021-08-26 15:19:14 +03:00
Benny Halevy
4476800493 flat_mutation_reader: get rid of timeout parameter
Now that the timeout is taken from the reader_permit.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Benny Halevy
9b0b13c450 reader_concurrency_semaphore: adjust reactivated reader timeout
Update the reader's timeout where needed
after unregistering inactive_read.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Benny Halevy
fe479aca1d reader_permit: add timeout member
To replace the timeout parameter passed
to flat_mutation_reader methods.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 14:29:44 +03:00
Nadav Har'El
49aea3b301 Merge 'database: coroutinize schema load functions' from Avi Kivity
Simple coroutinization of the schema load functions, leaving the code tidier.

Test: unit (dev)

Closes #9217

* github.com:scylladb/scylla:
  database: adjust indentation after coroutinization of schema table parsing code
  database: convert database::parse_schema_tables() to a coroutine
  database: remove unneeded temporary in do_parse_schema_tables()
  database: convert do_parse_schema_tables() to a coroutine
2021-08-23 17:45:58 +03:00
Benny Halevy
4439e5c132 everywhere: cleanup defer.hh includes
Get rid of unused includes of seastar/util/{defer,closeable}.hh
and add a few that are missing from source files.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-22 21:11:39 +03:00
Avi Kivity
5450af8e1b database: coroutinize stop()
Make the code tidier.

The conversion is not mechanical: the finally block is converted
to straight line code. stop()/close() must not fail anyway, and we
cannot recover from such failures. The when_all_succeed() for stopping
the semaphores is also converted to straight-line code - there is no
advantage to stopping them in parallel, as we're just waiting for
running tasks to complete and clean up.

Test: unit (dev)

Closes #9218
2021-08-18 10:57:44 +02:00
Avi Kivity
73d6f2798d database: adjust indentation after coroutinization of schema table parsing code
2021-08-17 21:05:05 +03:00
Avi Kivity
4ca856157d database: convert database::parse_schema_tables() to a coroutine
In one case we have f = f.then(...), but we can just wait
for the first future where it's created.
2021-08-17 21:00:15 +03:00
Avi Kivity
4f91953ebf database: remove unneeded temporary in do_parse_schema_tables()
The coroutine can keep the cf_name parameter alive, provided we
pass it by value.
2021-08-17 20:45:41 +03:00
Avi Kivity
b2d5820d75 database: convert do_parse_schema_tables() to a coroutine
2021-08-17 20:44:28 +03:00
Asias He
cc44edb4e2 database: Detemplate run_async
I initially tried to use a noncopyable_function to avoid the unnecessary
template usage.

However, since database::apply_in_memory is a hot function, it is better
to use with_gate directly. The run_async function does nothing but call
with_gate anyway.

Closes #9160
2021-08-12 07:53:10 +03:00
Nadav Har'El
6c27000b98 Merge 'Propagate exceptions without throwing' from Piotr Sarna
NOTE: this series depends on a Seastar submodule update, currently queued in next: 0ed35c6af052ab291a69af98b5c13e023470cba3

In order to avoid needless throwing, exceptions are passed
directly wherever possible. Two mechanisms which help with that are:
 1. `make_exception_future<>` for futures
 2. `co_return coroutine::exception(...)` for coroutines
    which return `future<T>` (the mechanism does not work for `future<>`
    without parameters, unfortunately)

Tests: unit(release)

Closes #9079

* github.com:scylladb/scylla:
  system_keyspace: pass exceptions without throwing
  sstables: pass exceptions without throwing
  storage_proxy: pass exceptions without throwing
  multishard_mutation_query: pass exceptions without throwing
  client_state: pass exceptions without throwing
  flat_mutation_reader: pass exceptions without throwing
  table: pass exceptions without throwing
  commitlog: pass exceptions without throwing
  compaction: pass exceptions without throwing
  database: pass exceptions without throwing
2021-08-01 16:47:47 +03:00
Avi Kivity
a180cd240f atomic_cell: change compare_atomic_cell_for_merge() to std::strong_ordering
The implementation is in database.cc for some reason.

Ref #1449.
2021-07-28 13:26:27 +03:00
Piotr Sarna
66c4d58a8c database: pass exceptions without throwing
In order to avoid needless throwing, exceptions are passed
directly wherever possible. Two mechanisms which help with that are:
 1. make_exception_future<> for futures
 2. co_return coroutine::exception(...) for coroutines
    which return future<T> (the mechanism does not work for future<>
    without parameters, unfortunately)
2021-07-26 17:02:36 +02:00
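
The idea of handing an exception along as a value, instead of throwing it through every layer, can be sketched with the standard library alone. This is an analogy, not Seastar code: `toy_result`, `parse_positive` and `doubled` are hypothetical, and `std::make_exception_ptr` stands in for the role that `make_exception_future<>` plays for futures.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <variant>

// Minimal result type: holds either a value or a captured exception.
template <typename T>
struct toy_result {
    std::variant<T, std::exception_ptr> state;

    bool failed() const { return std::holds_alternative<std::exception_ptr>(state); }
    const T& value() const { return std::get<T>(state); }
    std::exception_ptr error() const { return std::get<std::exception_ptr>(state); }
};

// Reports failure by returning a captured exception rather than throwing.
inline toy_result<int> parse_positive(int x) {
    if (x <= 0) {
        return {std::make_exception_ptr(std::invalid_argument("not positive"))};
    }
    return {x};
}

// An intermediate layer forwards the failure untouched: no try/catch, no rethrow.
inline toy_result<int> doubled(int x) {
    auto r = parse_positive(x);
    if (r.failed()) {
        return r;
    }
    return {r.value() * 2};
}
```

Only the final consumer that actually needs the exception object calls `std::rethrow_exception(result.error())`; the layers in between just pass the pointer on.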
Botond Dénes
27fbca84f6 reader_concurrency_semaphore: remove prethrow_action
The semaphore accepts a functor in its constructor which is run just
before throwing on wait queue overload. This is used exclusively to bump
a counter in the database::stats, which counts queue overloads. However,
there is now an identical counter in
reader_concurrency_semaphore::stats, so the database can just use that
directly and we can retire the now unused prethrow action.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210716111105.237492-1-bdenes@scylladb.com>
2021-07-19 15:47:37 +03:00
Pavel Emelyanov
1ed582304d memtable_list: Shorten flush coalescing codeflow
The memtable_list::flush() maintains a shared_promise object
to coalesce the flushers until the get_flush_permit() resolves.
Also it needs to keep the extraneous flushes counter bumped
while doing the flush itself.

All this can be coded in a shorter form and without the need
to carry shared_promise<> around.

tests: unit(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210716164237.10993-1-xemul@scylladb.com>
2021-07-17 00:42:20 +02:00
Botond Dénes
ae4df99e6b database: remove now unused query execution stages
2021-07-14 17:19:02 +03:00
Botond Dénes
1b7eea0f52 reader_concurrency_semaphore: admission: flip the switch
This patch flips two "switches":
1) It switches admission to be up-front.
2) It changes the admission algorithm.

(1) by now all permits are obtained up-front, so this patch just yanks
out the restricted reader from all reader stacks and simultaneously
switches all `obtain_permit_nowait()` calls to `obtain_permit()`. By
doing this admission is now waited on when creating the permit.

(2) we switch to an admission algorithm that adds a new aspect to the
existing resource availability: the number of used/blocked reads. Namely
it only admits new reads if in addition to the necessary amount of
resources being available, all currently used readers are blocked. In
other words we only admit new reads if all currently admitted reads
requires something other than CPU to progress. They are either waiting
on I/O, a remote shard, or attention from their consumers (not used
currently).

We flip these two switches at the same time because up-front admission
means cache reads now need to obtain a permit too. For cache reads the
optimal concurrency is 1. Anything above that just increases latency
(without increasing throughput). So we want to make sure that if a cache
reader hits it doesn't get any competition for CPU and it can run to
completion. We admit new reads only if the read misses and has to go to
disk.

Another change made to accommodate this switch is the replacement of the
replica side read execution stages with the reader concurrency
semaphore as an execution stage. This replacement is needed because with
the introduction of up-front admission, reads are not independent of
each other anymore. One read executed can influence whether later reads
executed will be admitted or not, and execution stages require
independent operations to work well. By moving the execution stage into
the semaphore, we have an execution stage which is in control of both
admission and running the operations in batches, avoiding the bad
interaction between the two.
2021-07-14 17:19:02 +03:00
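
The admission rule described under (2) in the commit above can be written down as a small predicate. This is a sketch under stated assumptions: `toy_resources`, `toy_admission_state` and `can_admit` are hypothetical names, resources are reduced to a count and a memory amount, and the real semaphore additionally deals with queueing, timeouts and inactive reads.

```cpp
#include <cassert>

// Hypothetical resource pair: number of reads and bytes of memory.
struct toy_resources {
    int count;
    long memory;
};

// Hypothetical snapshot of the semaphore's bookkeeping.
struct toy_admission_state {
    toy_resources available;  // resources not yet handed out
    int used_reads;           // admitted reads that are currently in use
    int blocked_reads;        // of those, reads waiting on something other than CPU
};

// Admit a new read only if the resources are there AND every read
// currently in use is blocked (on I/O, a remote shard, or its consumer).
inline bool can_admit(const toy_admission_state& s, const toy_resources& needed) {
    const bool enough_resources = s.available.count >= needed.count
                               && s.available.memory >= needed.memory;
    const bool all_used_blocked = s.used_reads == s.blocked_reads;
    return enough_resources && all_used_blocked;
}
```

With one read in use and not blocked (for example a cache hit that is running on the CPU), `can_admit` returns false even when plenty of resources are free, which matches the goal of letting that read run to completion without competition.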
Botond Dénes
7bfa40a2f1 treewide: use make_tracking_only_permit()
For all those reads that don't (won't or can't) pass through admission
currently.
2021-07-14 17:19:02 +03:00
Botond Dénes
7f2813e3fa database: mutation_query(): handle querier lookup/save on the database level
Instead of passing down the querier_cache_ctx to table::mutation_query(),
handle the querier lookup/save on the level where the cache exists.

The real motivation behind this change however is that we need to move
the lookup outside the execution stage, because the current execution
stage will soon be replaced by the one provided by the semaphore and to
use that properly we need to know if we have a saved permit or not.
2021-07-14 16:48:43 +03:00
Botond Dénes
f9d302bf49 database: mutation_query(): convert into coroutine
To facilitate further patching (and reading).
2021-07-14 16:48:43 +03:00
Botond Dénes
d2f5393a43 database: query(): handle querier lookup/save on the database level
Instead of passing down the querier_cache_ctx to table::query(),
handle the querier lookup/save on the level where the cache exists.

The real motivation behind this change however is that we need to move
the lookup outside the execution stage, because the current execution
stage will soon be replaced by the one provided by the semaphore and to
use that properly we need to know if we have a saved permit or not.
2021-07-14 16:48:43 +03:00
Botond Dénes
c28a6e8537 database: query(): convert into coroutine
To facilitate further patching (and reading).
2021-07-14 16:48:43 +03:00
Botond Dénes
426b46c4ed mutation_reader: reader_lifecycle_policy: add obtain_reader_permit()
This method is both a convenience method to obtain the permit, as well
as an abstraction to allow different implementations to get creative.
For example, the main implementation, the one in multishard mutation
query, returns the permit of the saved reader if one was found. This
ensures that on a multi-paged read the same permit is used across as
many pages as possible. Much more importantly it ensures the evictable
reader wrapping the actual reader both use the same permit.
2021-07-14 16:48:43 +03:00
Botond Dénes
97a03f9027 database: make_multishard_streaming_reader: use external permit
As a preparation for up-front admission, add a permit parameter to
`make_multishard_streaming_reader()`, which will be the admitted permit
once we switch to up-front admission. For now it has to be a
non-admitted permit.
A nice side-effect of this patch is that now permits will have a
use-case specific description, instead of the generic
"multishard-streaming-reader" one.
2021-07-14 16:48:43 +03:00
Botond Dénes
999169e535 database: make_streaming_reader(): require permit
As a preparation for up-front admission, add a permit parameter to
`make_streaming_reader()`, which will be the admitted permit once we
switch to up-front admission. For now it has to be a non-admitted
permit.
A nice side-effect of this patch is that now permits will have a
use-case specific description, instead of the generic "streaming" one.
2021-07-14 16:48:43 +03:00
Botond Dénes
3ec149222d database: add obtain_reader_permit()
A convenience method for obtaining an admitted permit for a read on a
given table.
For now it uses the nowait semaphore obtaining method, as all normal
reads still use the old admission method. Migrating reads to this method
will make the switch easier, as there will be one central place to
replace the nowait method with the proper one.
2021-07-14 16:48:43 +03:00
Avi Kivity
f0e2f31839 Merge "Implement validation compaction" from Botond
"
Currently, when sstables are suspected to be corrupt, one has a few bad
choices on how to verify that they are indeed correct:
* Obtain suspect sstable files and manually inspect them. This is
  problematic because it requires a scylla engineer to have direct access
  to data, which is not always simple or even possible due to privacy
  protection rules.
* Run sstable scrub in abort mode. This is enough to confirm whether
  there is any corruption or not, but only in a binary manner. It is not
  possible to explore the full scope of the corruption, as the scrub
  will abort on the first corruption.
* Run sstable scrub in non-abort mode. Although this allows for
  exploring the full scope of the corruption and it even gets rid of it,
  it is a very intrusive and potentially destructive process that some
  users might not be willing to even risk.

This patchset offers an alternative: validation compaction. This is a
completely non-intrusive compaction that reads all sstables in turn and
validates their contents, logging any discrepancies it can find. It does
not mutate their content, it doesn't even rewrite them. It is akin to
a dry-run mode for sstable scrub. The reason it was not implemented as
such is that the current compaction infrastructure assumes that input
sstables are replaced by output sstables as part of the compaction
process. Lifting this assumption seemed error-prone and risky, so
instead I snatched the unused "Validation" compaction type for this
purpose. This compaction type completely bypasses the regular compaction
infrastructure but only at the low-level. It still integrates fully
into compaction-manager.

Fixes: #7736
Refs: https://github.com/scylladb/scylla-tools-java/issues/263

Tests: unit(dev)
"

* 'validation-compaction/v5' of https://github.com/denesb/scylla:
  test/boost/sstable_datafile_test: add test for validation compaction
  test/boost/sstable_datafile_test: scrub tests: extract corrupt sst writer code into function
  api: storage_service: expose validation compaction
  sstables/compaction_manager: add perform_sstable_validation()
  sstables/compaction_manager: rewrite_sstables(): resolve maintenance group FIXME
  sstables/compaction_manager: add maintenance scheduling group
  sstables/compaction_manager: drop _scheduling_group field
  sstables/compaction_manager: run_custom_job(): replace parameter name with compaction type
  sstables/compaction_manager: run_custom_job(): keep job function alive
  sstables/compaction_descriptor: compaction_options: add validation compaction type
  sstables/compaction: compaction_options::type(): add static assert for size of index_to_type
  sstables/compaction: implement validation compaction type
  sstables/compaction: extract compaction info creation into static method
  sstables/compaction: extract sstable list formatting to a class
  sstables/compaction: scrub_compaction: extract reporting code into static methods
  position_in_paritition{_view}: add has_key()
  mutation_fragment_stream_validator: add schema() accessor
2021-07-13 10:29:40 +03:00
Tomasz Grabiec
e947fac74c database: Fix cache metrics not being registered
Introduced in 6a6403d. The default constructor with dummy_app_stats is
also used by production code.

Fixes #9012
Message-Id: <20210712221447.71902-1-tgrabiec@scylladb.com>
2021-07-13 07:50:44 +03:00
Botond Dénes
c8f8e9232c sstables/compaction_manager: add maintenance scheduling group
rewrite_sstables() wants to be run in the maintenance group and soon we
will add another compaction type which also wants to be run in the
said group. To enable this propagate the maintenance scheduling group
(both CPU and IO) to the compaction manager.
2021-07-12 10:25:15 +03:00
Botond Dénes
c4e71fb9b8 reader_concurrency_semaphore: remove default name parameter
Naming the concurrency semaphore is currently optional, unnamed
semaphores defaulting to "Unnamed semaphore". Although the most
important semaphores are named, many still aren't, which makes for a
poor debugging experience when one of these times out.
To prevent this, remove the name parameter defaults from those
constructors that have it and require a unique name to be passed in.
Also update all sites creating a semaphore and make sure they use a
unique name.
2021-07-08 12:31:36 +03:00