Commit Graph

1477 Commits

Author SHA1 Message Date
Avi Kivity
0114244363 Merge 'replica/database: drop_column_family(): properly cleanup stale querier cache entries' from Botond Dénes
Said method has to evict all querier cache entries, belonging to the to-be-dropped table. This is already the case, but there was a window where new entries could sneak in, causing a stale reference to the table to be de-referenced later when they are evicted due to TTL. This window is now closed, the entries are evicted after the method has waited for all ongoing operations on said table to stop.

Fixes: #10450

Closes #10451

* github.com:scylladb/scylla:
  replica/database: drop_column_family(): drop querier cache entries after waiting for ops
  replica/database: finish coroutinizing drop_column_family()
  replica/database: make remove(const column_family&) private

(cherry picked from commit 7f1e368e92)
2022-05-01 17:11:52 +03:00
Avi Kivity
4b1b0a55c0 replica, atomic_cell: move atomic_cell merge code from replica module to atomic_cell.cc
compare_atomic_cell_for_merge() was placed in database.cc, before
atomic_cell.cc existed. Move it to its correct place.

Closes #9889

(cherry picked from commit 6c53717a39)
2022-03-24 18:07:11 +02:00
Avi Kivity
1aff7d19c2 treewide: replace seastar::fmt_print() with fmt::print()
We shouldn't be using Seastar as a text formatting library; that's
not its focus. Use fmt directly instead. fmt::print() doesn't return
the output stream which is a minor inconvenience, but that's life.

Closes #9556
2021-11-01 10:05:16 +02:00
Benny Halevy
0746b5add6 storage_service: replicate_to_all_cores: update all keyspaces
Currently we update the effective_replication_map
only on non-system keyspace, leaving the system keyspace,
that uses the local replication strategy, with the empty
replication_map, as it was first initialized.

This may lead to a crash when get_ranges is called later
as seen in #9494 where get_ranges was called from the
perform_sstable_upgrade path.

This change updates the effective_replication_map on all
keyspaces rather than just on the non-system ones
and adds a unit test that reproduces #9494 without
the fix and passes with it.

Fixes #9494

Test: unit(dev), database_test(debug)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211020143217.243949-1-bhalevy@scylladb.com>
2021-10-20 17:54:23 +03:00
Benny Halevy
e4dc81ec04 abstract_replication_strategy: add to_qualified_class_name
And use it from cql3 check_restricted_replication_strategy and
keyspace_metadata ctor that defined their own `replication_class_strategy`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-18 12:13:25 +03:00
Tomasz Grabiec
cc56a971e8 database, treewide: Introduce partition_slice::is_reversed()
Cleanup, reduces noise.

Message-Id: <20211014093001.81479-1-tgrabiec@scylladb.com>
2021-10-14 12:39:16 +03:00
Benny Halevy
8c85197c6c abstract_replication_strategy: get rid of shared_token_metadata member and ctor param
It is not used any more.

Methods either use the token_metadata_ptr in the
effective_replication_map, or receive an ad-hoc
token_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:10:06 +03:00
Benny Halevy
dfdc8d4ddb abstract_replication_strategy: move get_ranges and get_primary_ranges* to effective_replication_map
Provide a sync get_ranges method by effective_replication_map
that uses the precalculated map to get all token ranges owned by or
replicated on a given endpoint.

Reuse do_get_ranges as common infrastructure for all
3 cases: get_ranges, get_primary_ranges, and get_primary_ranges_within_dc.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:09:51 +03:00
Benny Halevy
991a6a8664 keyspace: update_effective_replication_map
And use it to get_natural_endpoints.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:55:34 +03:00
Benny Halevy
970b0a50b5 keyspace: futurize create_replication_strategy
And functions that use it, like:
keyspace::update_from
database::update_keyspace
database::create_in_memory_keyspace

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:53:41 +03:00
Benny Halevy
5001d261d4 abstract_replication_strategy: define replication_strategy_config_options
To be used for searching effective replication strategy instances.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00
Avi Kivity
fd8beeaea9 treewide: handle switch statements that return
A switch statement where every case returns triggers a gcc
warning if the surrounding function doesn't return/abort.

Fix by adding an abort(). The abort() will never trigger since we
have a warning on unhandled switch cases.
2021-10-10 18:16:50 +03:00
Tomasz Grabiec
e89b9799b8 Merge 'sstable mx reader: implement reverse single-partition reads' from Kamil Braun
Until now reversed queries were implemented inside
`querier::consume_page` (more precisely, inside the free function
`consume_page` used by `querier::consume_page`) by wrapping the
passed-in reader into `make_reversing_reader` and then consuming
fragments from the resulting reversed reader.

The first couple of commits change that by pushing the reversing down below
the `make_combined_reader` call in `table::query`. This allows
working on improving reversing for memtables independently from
reversing for sstables.

We then extend the `index_reader` with functions that allow
reading the promoted index in reverse.

We introduce `partition_reversing_data_source`, which wraps an sstable data
file and returns data buffers with contents of a single chosen partition
as if the rows were stored in reverse order.

We use the reversing source and the extended index reader in
`mx_sstable_mutation_reader` to implement efficient (at least in theory)
reversed single-partition reads.

The patchset disables cache for reversed reads. Fast-forwarding
is not supported in the mx reader for reversed queries at this point.

Details in commit messages. Read the commits in topological order
for best review experience.

Refs: #9134
(not saying "Fixes" because it's only for single-partition queries
without forwarding)

Closes #9281

* github.com:scylladb/scylla:
  table: add option to automatically bypass cache for reversed queries
  test: reverse sstable reader with random schema and random mutations
  sstables: mx: implement reversed single-partition reads
  sstables: mx: introduce partition_reversing_data_source
  sstables: index_reader: add support for iterating over clustering ranges in reverse
  clustering_key_filter: clustering_key_filter_ranges owning constructor
  flat_mutation_reader: mention reversed schema in make_reversing_reader docstring
  clustering_key_filter: document clustering_key_filter_ranges::get_ranges
2021-10-04 15:37:34 +02:00
Kamil Braun
703aed3277 table: add option to automatically bypass cache for reversed queries
Currently the new reversing sstable algorithms do not support fast
forwarding and the cache does not yet handle reversed results. This
forced us to disable the cache for reversed queries if we want to
guarantee bounded memory. We introduce an option that does this
automatically (without specifying `bypass cache` in the query) and turn
it on by default.

If the user decides that they prefer to keep the cache at the
cost of fetching entire partitions into memory (which may be viable
if their partitions are small) during reversed queries, the option can
be turned off. It is live-updateable.
2021-10-04 15:24:12 +02:00
Pavel Emelyanov
e9002e1e61 database: Get local host id from system_keyspace
It's now cached on database itself, and it can be removed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 10:55:20 +03:00
Avi Kivity
936de92876 Merge 'cql3: Add evaluate(expression) and use instead of term::bind()' from Jan Ciołek
This PR adds the function:
```c++
constant evaluate(const expression&, const query_options&);
```
which evaluates the given expression to a constant value.
It binds all the bound values, calls functions, and reduces the whole expression to just raw bytes and `data_type`, just like `bind()` and `get()` did for `term`.

The code is often similar to the original `bind()` implementation in `lists.cc`, `sets.cc`, etc.

* For some reason in the original code, when a collection contains `unset_value`, then the whole collection is evaluated to `unset_value`. I'm not sure why this is the case, considering it's impossible to have `unset_value` inside a collection, because we forbid bind markers inside collections. For example here: cc8fc73761/cql3/lists.cc (L134)
This seems to have been introduced by Pekka Enberg in 50ec81ee67, but he has left the company.
I didn't change the behaviour, maybe there is a reason behind it, although maybe it would be better to just throw `invalid_request_exception`.
* There was a strange limitation on map key size, it seems incorrect: cc8fc73761/cql3/maps.cc (L150), but I left it in.
* When evaluating a `user_type` value, the old code tolerated `unset_value` in a field, but it was later converted to NULL. This means that `unset_value` doesn't work inside a `user_type`, I didn't change it, will do in another PR.
* We can't fully get rid of `bind()` yet, because it's used in `prepare_term` to return a `terminal`. It will be removed in the next PR, where we finally get rid of `term`.

Closes #9353

* github.com:scylladb/scylla:
  cql3: types: Optimize abstract_type::contains_collection
  cql3: expr: Convert evaluate_IN_list to use evaluate(expression)
  cql3: expr: Use only evaluate(expression) to evaluate term
  cql3: expr: Implement evaluate(expr::function_call)
  cql3: expr: Implement evaluate(expr::usertype_constructor)
  cql3: expr: Implement evaluate(expr::collection_constructor)
  cql3: expr: Implement evaluate(expr::tuple_constructor)
  cql3: expr: Implement evaluate(expr::bind_variable)
  cql3: Add contains_collection/set_or_map to abstract_type
  cql3: expr: Add evaluate(expression, query_options)
  cql3: Implement term::to_expression for function_call
  cql3: Implement term::to_expression for user_type
  cql3: Implement term::to_expression for collections
  cql3: Implement term::to_expression for tuples
  cql3: Implement term::to_expression for marker classes
  cql3: expr: Add data_type to *_constructor structs
  cql3: Add term::to_expression method
  cql3: Reorganize term and expression includes
2021-09-26 12:58:11 +03:00
Pavel Emelyanov
88e5b7c547 database: Shutdown in tests
There's a circular dependency:

  query processor needs database
  database owns large_data_handler and compaction_manager
  those two need qctx
  qctx owns a query_processor

Respectively, the latter hidden dependency is not "tracked" by
constructor arguments -- the query processor is started after
the database and is deferred to be stopped before it. This works
in scylla, because query processor doesn't really stop there,
but in cql_test_env it's problematic as it stops everything,
including the qctx.

Recent database start-stop sanitation revealed this problem --
on database stop either l.d.h. or compaction manager try to
start (or continue) messing with the query processor. One problem
was faced immediatelly and pluged with the 75e1d7ea safety check
inside l.d.h., but still cql_test_env tests continue suffering
from use after free on stopped query processor.

The fix is to partially revert the 4b7846da by making the tests
stop some pieces of the database (inclusing l.d.h. and compaction
manager) as it used to before. In scylla this is, probably, not
needed, at least now -- the database shutdown code was and still
is run right before the stopping one.

tests: unit(debug)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210924080248.11764-1-xemul@scylladb.com>
2021-09-26 11:09:01 +03:00
Jan Ciolek
746e9c620f cql3: Reorganize term and expression includes
Make term.hh include expression.hh instead of the other way around.
expression can't be forward declared.
expression is needed in term.hh to declare term::to_expression().

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-09-24 11:05:53 +02:00
Benny Halevy
ad46ff8e5e database: coroutinize create_keyspace
Prepare for futurizing on create_in_memory_keyspace.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210923093200.1559734-10-bhalevy@scylladb.com>
2021-09-23 14:05:44 +03:00
Benny Halevy
91091e9d89 database: update_keyspace: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210923093200.1559734-9-bhalevy@scylladb.com>
2021-09-23 14:05:18 +03:00
Benny Halevy
c71cd2bed3 database: coroutinize update_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210923093200.1559734-8-bhalevy@scylladb.com>
2021-09-23 14:05:18 +03:00
Pavel Emelyanov
a4118a70ee database, messaging: Delete old connection drop notification
Database no longer needs it. Since the only user of the old-style
notification is gone -- remove it as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
bfd91d7b81 database, proxy: Relocate connection-drop activity
On start database is subscribed on messaging-service connection drop
notification to drop the hit-rate from column families. However, the
updater and reader of those hit-rates is the storage_proxy, so it
must be the _proxy_ who drops the hit-rate.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
b78e9b51b7 database, tests: Rework recommended format setting
Tests don't have sstable format selector and enforce the needed
format by hands with the help of special database:: method. It's
more natural to provide it via convig. Doing this makes database
initialization in main and cql_test_env closer to each other.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
a42383b127 database, sstables_manager: Sow some noexcepts
Setting sstables format into database and into sstables_manager is
all plain assignments. Mark them as noexcept, next patch will become
apparently exception safe after that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
9a76df96e3 database: Eliminate unused helpers
There are some large-data-handler-related helpers left after previous
patches, they can be removed altogehter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
4b7846da86 database: Merge the stop_database() into database::stop()
After stop_database() became shard-local, it's possible to merge
it with database::stop() as they are both called one after another
on scylla stop. In cql-test-env there are few more steps in
between, but they don't rely on the database being partially
stopped.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
469c734155 database: Flatten stop_database()
The method need to perform four steps cross-shard synchronously:
first stop compaction manager, then close user and, after it,
system tables, finally shutdown the large data handler.

This patch reworks this synchronization with the help of cross-shard
barrier added to the database previously. The motivation is to merge
.stop_database() with .stop().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
b1013e09b4 database: Equip with cross-shard-barrier
Make sure a node-wide barrier exists on a database when scylla starts.
Also provide a barrier for cql_test_env. In all other cases keep a
solo-mode barrier so that single-shard db stop doesn't get blocked.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
634ea4b543 database: Move starting bits into start()
Thse include large_data_handler::start, compaction_manager::enable
and database::init_commitlog.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:48:48 +03:00
Pavel Emelyanov
e2308034ff database: Add .start() method
Called right after the sharded::start(). For now empty, to be populated
by next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:44:48 +03:00
Pavel Emelyanov
127e4fe8de main: Shorten commitlog creation
This does three things in one go:

- converts

    db.invoke_on_all([] (database& db) {
        return db.init_commitlog();
    });

  into a one-line version

    db.invoke_on_all(&database::init_commitlog);

- removes the shard-0 pre-initialization for tests, because
  tests don't have the problem this pre- solves

- make the init_commitlog() re-entrable to let regular start
  not check for shard-0 explicitly

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:37:07 +03:00
Pavel Emelyanov
bd2b7dca0e database: Remove unused mm arg from init_non_system_keyspaces()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:35:37 +03:00
Pavel Emelyanov
bb23986826 wasm: Localize it to database usage
The wasm::engine exists as a sharded<> service in main, but it's only
passed by local reference into database on start. There's no much profit
in keeping it at main scope, things get much simpler if keeping the
engine purely on database.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:35:17 +03:00
Kamil Braun
c12e265eb8 table, database: query, mutation_query: remove unnecessary class_config param
The semaphore inside was never accessed and `max_memory_for_unlimited_query`
was always equal to `*cmd.max_result_size` so the parameter was completely
redundant.

`cmd.max_result_size` is supposed to be always set in the affected
functions - which are executed on the replica side - as soon as the
replica receives the `read_command` object, in case the parameter was
not set by the coordinator. However, we don't have a guarantee at the
type level (it's still an `optional`). Many places used
`*cmd.max_result_size` without even an assertion.

We make the code a bit safer, we check for `cmd.max_result_size` and if
it's indeed engaged, store it in `reader_permit`. We then access it from
`reader_permit` where necessary. If `cmd.max_result_size` is not set, we
assume this is an unlimited query and obtain the limit from
`get_unlimited_query_max_result_size`.
2021-09-14 13:39:56 +02:00
Kamil Braun
fbb83dd5ca reader_concurrency_semaphore: remove default parameter values from constructors
It's easy to forget about supplying the correct value for a parameter
when it has a default value specified. It's safer if 'production code'
is forced to always supply these parameters manually.

The default values were mostly useful in tests, where some parameters
didn't matter that much and where the majority of uses of the class are.
Without default values adding a new parameter is a pain, forcing one to
modify every usage in the tests - and there are a bunch of them. To
solve this, we introduce a new constructor which requires passing the
`for_tests` tag, marking that the constructor is only supposed to be
used in tests (and the constructor has an appropriate comment). This
constructor uses default values, but the other constructors - used in
'production code' - do not.
2021-09-14 12:20:28 +02:00
Botond Dénes
502a45ad58 treewide: switch to native reversed format for reverse reads
We define the native reverse format as a reversed mutation fragment
stream that is identical to one that would be emitted by a table with
the same schema but with reversed clustering order. The main difference
to the current format is how range tombstones are handled: instead of
looking at their start or end bound depending on the order, we always
use them as-usual and the reversing reader swaps their bounds to
facilitate this. This allows us to treat reversed streams completely
transparently: just pass along them a reversed schema and all the
reader, compacting and result building code is happily ignorant about
the fact that it is a reversed stream.
2021-09-09 15:42:15 +03:00
Benny Halevy
b7eaa22ce6 abstract_replication_strategy: create_replication_strategy: drop keyspace name parameter
It is not used.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210906133840.3307279-1-bhalevy@scylladb.com>
2021-09-06 16:51:21 +03:00
Benny Halevy
56e063ce93 keyspace: get rid of set_replication_strategy
It's unused.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210906133905.3307397-1-bhalevy@scylladb.com>
2021-09-06 16:48:35 +03:00
Avi Kivity
4d7e00d0f8 cql3: selection: make selectable.hh not include expr/expresion.hh
We have this dependency now:

   column_identifier -> selectable -> expression

and want to introduce this:

   expression -> user types -> column_identifier

This leads to a loop, since expression is not (yet) forward
declarable.

Fix by moving any mention of expression from selectable.hh to a new
header selection-expr.hh.

database.cc lost access to timeout_config, so adjust its includes
to regain it.
2021-08-26 15:19:14 +03:00
Benny Halevy
4476800493 flat_mutation_reader: get rid of timeout parameter
Now that the timeout is taken from the reader_permit.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Benny Halevy
9b0b13c450 reader_concurrency_semaphore: adjust reactivated reader timeout
Update the reader's timeout where needed
after unregistering inactive_read.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Benny Halevy
fe479aca1d reader_permit: add timeout member
To replace the timeout parameter passed
to flat_mutation_reader methods.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 14:29:44 +03:00
Nadav Har'El
49aea3b301 Merge 'database: coroutinize schema load functions' from Avi Kivity
Simple coroutinization of the schema load functions, leaving the code tidier.

Test: unit (dev)

Closes #9217

* github.com:scylladb/scylla:
  database: adjust indentation after coroutinization of schema table parsing code
  database: convert database::parse_schema_tables() to a coroutine
  database: remove unneeded temporary in do_parse_schema_tables()
  database: convert do_parse_schema_tables() to a coroutine
2021-08-23 17:45:58 +03:00
Benny Halevy
4439e5c132 everywhere: cleanup defer.hh includes
Get rid of unused includes of seastar/util/{defer,closeable}.hh
and add a few that are missing from source files.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-22 21:11:39 +03:00
Avi Kivity
5450af8e1b database: coroutinize stop()
Make the code tidier.

The conversion is not mechanical: the finally block is converted
to straight line code. stop()/close() must not fail anyway, and we
cannot recover from such failures. The when_all_succeed() for stopping
the semaphores is also converted to straight-line code - there is no
advantage to stopping them in parallel, as we're just waiting for
running tasks to complete and clean up.

Test: unit (dev)

Closes #9218
2021-08-18 10:57:44 +02:00
Avi Kivity
73d6f2798d database: adjust indentation after coroutinization of schema table parsing code 2021-08-17 21:05:05 +03:00
Avi Kivity
4ca856157d database: convert database::parse_schema_tables() to a coroutine
In one case we have f = f.then(...), but we can just wait
for the first future where it's created.
2021-08-17 21:00:15 +03:00
Avi Kivity
4f91953ebf database: remove unneeded temporary in do_parse_schema_tables()
The coroutine can keep the cf_name parameter alive, provided we
pass it by value.
2021-08-17 20:45:41 +03:00
Avi Kivity
b2d5820d75 database: convert do_parse_schema_tables() to a coroutine 2021-08-17 20:44:28 +03:00