Commit Graph

50 Commits

Author SHA1 Message Date
Avi Kivity
7f1e368e92 Merge 'replica/database: drop_column_family(): properly cleanup stale querier cache entries' from Botond Dénes
Said method has to evict all querier cache entries, belonging to the to-be-dropped table. This is already the case, but there was a window where new entries could sneak in, causing a stale reference to the table to be de-referenced later when they are evicted due to TTL. This window is now closed, the entries are evicted after the method has waited for all ongoing operations on said table to stop.

Fixes: #10450

Closes #10451

* github.com:scylladb/scylla:
  replica/database: drop_column_family(): drop querier cache entries after waiting for ops
  replica/database: finish coroutinizing drop_column_family()
  replica/database: make remove(const column_family&) private
2022-04-29 22:06:51 +03:00
Botond Dénes
024ceec61e replica/database: drop_column_family(): drop querier cache entries after waiting for ops
Reads (part of operations) running concurrent to `drop_column_family()`
can create querier cache entries while we wait for them to finish in
`await_pending_ops()`. Move the cache entry eviction to after this, to
ensure such entries are also cleaned up before destroying the table
object.
This moves the `_querier_cache.evict_all_for_table()` from
`database::remove()` to `database::drop_column_family()`. With that the
former doesn't have to return `future<>` anymore. While at it (changing
the signature) also rename `column_family` -> `table`.

Also add a regression unit test.
2022-04-28 13:40:13 +03:00
Botond Dénes
4c17da9996 replica/database: finish coroutinizing drop_column_family()
Said method was already coroutinized, but only halfway, possibly because
of the difficulty in expressing `finally()` with coroutines. We now have
`coroutines::as_future()` which makes this easier, so finish the job.
2022-04-28 13:40:13 +03:00
Avi Kivity
de0ee13f45 schema_tables: forward-declare user_function and user_aggerates
These bring in wasm.hh (though they really shouldn't) and make
everyone suffer. Forward declare instead and add missing includes
where needed.

Closes #10444
2022-04-28 07:22:02 +03:00
Benny Halevy
e88871f4ec replica: database: move shard_of implementation to mutation layer
We don't need the database to determine the shard of the mutation,
only its schema. So move the implementation to the respecive
definitions of mutation and frozen_mutation.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10430
2022-04-27 14:40:24 +03:00
Benny Halevy
db676e9e4a replica: database: apply: make sure the schema is synced or throw internal error
Currently an exception is thrown in the apply stage
when the schema is not synced, but it is too late
since returning an error doesn't pinpoint which code
path was using an unsync'ed schema so move the check
earlier, before _apply_stage is called.

We need to make sure the schema is synced earlier
when the mutation is applied so call on_internal_error
to generate a backtrace in testing and still throw
an error in production.

Typically storage_proxy::mutate_locally implicitly
ensures the schema is synced by making a global_schema_ptr
for it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220424110057.3957597-1-bhalevy@scylladb.com>
2022-04-25 12:18:47 +02:00
Avi Kivity
a4be927e23 Revert "memtable_list: futurize clear_and_add"
This reverts commit 2325c566d9. It
causes a use-after-free of a memtable.

Fixes #10421.
2022-04-24 21:09:48 +03:00
Botond Dénes
270aba0f51 Merge "Abort database stopping barriers on exception" by Pavel Emelyanov
"
The database::shutdown() and ::drain() methods are called inside the
invoke_on_all()s synchronizing with each other via the cross-shard
_stop_barrier.

If either shard throws in between all others may get stuck waiting for
the barrier to collect all arrivals. To fix it the throwing shard
should wake up others, resolving the wait somehow.

The fix is actually patch #4, the first and the second are the abort()
method for the barrier itself.

Fixes: #10304

tests: unit(dev), manual
"

* 'br-barrier-exception-2' of https://github.com/xemul/scylla:
  database: Abort barriers on exception
  database: Coroutinize close_tables
  test: Add test for cross_shard_barrier::abort()
  cross-shard-barrier: Add .abort() method
2022-04-11 13:48:43 +03:00
Pavel Emelyanov
f63f1c3d69 database: Abort barriers on exception
The database::shutdown() and ::drain() methods are called inside the
container().invoke_on_all() and synchronize with each other via the
cross-shard _stop_barrier. If either shard throws in between all others
may get stuck waiting for the barrier to collect all arrivals.

The fix is to abort the barrier on exception thus making all the
shards sitting in shutdown or drain to bail out with exceptions too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-11 13:47:02 +03:00
Piotr Sarna
58529591a9 database,cql3: add STORAGE option to keyspaces
The STORAGE option is designed to hold a map of options
used for customizing storage for given keyspace.
The option is kept in a system_schema.scylla_keyspaces table.
The option is only available if the whole cluster is aware
of it - guarded by a cluster feature.

Example of the table contents:
```
cassandra@cqlsh> select * from system_schema.scylla_keyspaces;

 keyspace_name | storage_options                                | storage_type
---------------+------------------------------------------------+--------------
           ksx | {'bucket': '/tmp/xx', 'endpoint': 'localhost'} |           S3
```
2022-04-08 09:17:01 +02:00
Pavel Emelyanov
2cab2a32b8 database: Coroutinize close_tables
To make next patch a bit simpler

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-06 18:43:32 +03:00
Botond Dénes
d0ea895671 readers: move multishard reader & friends to reader/multishard.cc
Since the multishard reader family weighs more than 1K SLOC, it gets
its own .cc file.
2022-03-30 15:42:51 +03:00
Benny Halevy
2325c566d9 memtable_list: futurize clear_and_add
Allow yielding to fix a reactor stall from table::clear.

Fixes #10281

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220327141259.213688-1-bhalevy@scylladb.com>
2022-03-27 17:25:43 +03:00
Avi Kivity
3c2271af52 Merge "De-globalize system keyspace local cache" from Pavel E
"
There's a static global sharded<local_cache> variable in system keyspace
the keeps several bits on board that other subsystems need to get from
the system keyspace, but what to have it in future<>-less manner.

Some time ago the system_keyspace became a classical sharded<> service
that references the qctx and the local cache. This set removes the global
cache variable and makes its instances be unique_ptr's sitting on the
system keyspace instances.

The biggest obstacle on this route is the local_host_id that was cached,
but at some point was copied onto db::config to simplify getting the value
from sstables manager (there's no system keyspace at hand there at all).
So the first thing this set does is removes the cached host_id and makes
all the users get it from the db::config.

(There's a BUG with config copy of host id  -- replace node doesn't
 update it. This set also fixes this place)

De-globalizing the cache is the prerequisite for untangling the snitch-
-messaging-gossiper-system_keyspace knot. Currently cache is initialized
too late -- when main calls system_keyspace.start() on all shards -- but
before this time messaging should already have access to it to store
its preferred IP mappings.

tests: unit(dev), dtest.simple_boot_shutdown(dev)
"

* 'br-trade-local-hostid-for-global-cache' of https://github.com/xemul/scylla:
  system_keyspace: Make set_local_host_id non-static
  system_keyspace: Make load_local_host_id non-static
  system_keyspace: Remove global cache instance
  system_keyspace: Make it peering service
  system_keyspace,snitch: Make load_dc_rack_info non-static
  system_keyspace,cdc,storage_service: Make bootstrap manipulations non-static
  system_keyspace: Coroutinize set_bootstrap_state
  gossiper: Add system keyspace dependency
  cdc_generation_service: Add system keyspace dependency
  system_keyspace: Remove local host id from local cache
  storage_service: Update config.host_id on replace
  storage_service: Indentation fix after previous patch
  storage_service: Coroutinize prepare_replacement_info()
  system_distributed_keyspace: Indentation fix after previous patch
  code,system_keyspace: Relax system_keyspace::load_local_host_id() usage
  code,system_keyspace: Remove system_keyspace::get_local_host_id()
2022-03-27 17:19:24 +03:00
Pavel Emelyanov
965d2a0a4f code,system_keyspace: Remove system_keyspace::get_local_host_id()
The host id is cached on db::config object that's available in
all the places that need it. This allows removing the method in
question from the system_keyspace and not caring that anyone that
needs host_id would have to depend on system_keyspace instance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-25 13:21:59 +03:00
Botond Dénes
421d4411f8 replica: move data_dictionary_impl into own header
As a first step towards disentangling it from database and allowing it
to be used by other classes (like table) too.
2022-03-25 11:44:31 +02:00
Avi Kivity
aab052c0d5 Merge 'replica/database: truncate: temporarily disable compaction on table and views before flush' from Benny Halevy
Flushing the base table triggers view building
and corresponding compactions on the view tables.

Temporarily disable compaction on both the base
table and all its view before flush and snapshot
since those flushed sstables are about to be truncated
anyway right after the snapshot is taken.

This should make truncate go faster.

In the process, this series also embeds `database::truncate_views`
into `truncate` and coroutinizes both

Refs #6309

Test: unit(dev)

Closes #10203

* github.com:scylladb/scylla:
  replica/database: truncate: fixup indentation
  replica/database: truncate: temporarily disable compaction on table and views before flush
  replica/database: truncate: coroutinize per-view logic
  replica/database: open-code truncate_view in truncate
  replica/database: truncate: coroutinize run_with_compaction_disabled lambda
  replica/database: coroutinize truncate
  compaction_manager: add disable_compaction method
2022-03-17 17:24:20 +02:00
Pavel Emelyanov
b80d5f8900 schema_tables: Add sharded<system_keyspace> argument to update_schema_version_and_announce
All its (indirect) callers had been patched to have it, now it's
possible to have the argument in it. Next patch will make use of it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
009c449cc3 replica: Push sharded<system_keyspace> down to parse_system_tables
The method needs to call merge_schema() that will need system keyspace
instance at hand. The parse_s._t. method is boot-time one, pushing the
main-local instance through it is fine

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Benny Halevy
70e1fdb0c8 replica/database: truncate: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-15 14:02:35 +02:00
Benny Halevy
5ca45b5c32 replica/database: truncate: temporarily disable compaction on table and views before flush
Flushing the base table triggers view building
and corresponding compactions on the view tables.

Temporarily disable compaction on both the base
table and all its view before flush and snapshot
since those flushed sstables are about to be truncated
anyway right after the snapshot is taken.

This should make truncate go faster.

Refs #6309

Test: unit(dev)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-15 14:02:29 +02:00
Benny Halevy
20fcf05586 replica/database: truncate: coroutinize per-view logic
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-15 12:51:20 +02:00
Benny Halevy
c6d72a1814 replica/database: open-code truncate_view in truncate
truncate-views is called only internally from database::truncate.

Next step will be to disable compactions on the base
table and view before flush and snapshot.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-15 12:47:46 +02:00
Benny Halevy
613e65e5d0 replica/database: truncate: coroutinize run_with_compaction_disabled lambda
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-15 12:47:13 +02:00
Benny Halevy
0318f48110 replica/database: coroutinize truncate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-15 12:46:41 +02:00
Mikołaj Sielużycki
1d84a254c0 flat_mutation_reader: Split readers by file and remove unnecessary includes.
The flat_mutation_reader files were conflated and contained multiple
readers, which were not strictly necessary. Splitting optimizes both
iterative compilation times, as touching rarely used readers doesn't
recompile large chunks of codebase. Total compilation times are also
improved, as the size of flat_mutation_reader.hh and
flat_mutation_reader_v2.hh have been reduced and those files are
included by many file in the codebase.

With changes

real	29m14.051s
user	168m39.071s
sys	5m13.443s

Without changes

real	30m36.203s
user	175m43.354s
sys	5m26.376s

Closes #10194
2022-03-14 13:20:25 +02:00
Benny Halevy
ebbbf1e687 lister: move to utils
There's nothing specific to scylla in the lister
classes, they could (and maybe should) be part of
the seastar library.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-02-28 12:36:03 +02:00
Avi Kivity
cbba80914d memtable: move to replica module and namespace
Memtables are a replica-side entity, and so are moved to the
replica module and namespace.

Memtables are also used outside the replica, in two places:
 - in some virtual tables; this is also in some way inside the replica,
   (virtual readers are installed at the replica level, not the
   cooordinator), so I don't consider it a layering violation
 - in many sstable unit tests, as a convenient way to create sstables
   with known input. This is a layering violation.

We could make memtables their own module, but I think this is wrong.
Memtables are deeply tied into replica memory management, and trying
to make them a low-level primitive (at a lower level than sstables) will
be difficult. Not least because memtables use sstables. Instead, we
should have a memtable-like thing that doesn't support merging and
doesn't have all other funky memtable stuff, and instead replace
the uses of memtables in sstable tests with some kind of
make_flat_mutation_reader_from_unsorted_mutations() that does
the sorting that is the reason for the use of memtables in tests (and
live with the layering violation meanwhile).

Test: unit (dev)

Closes #10120
2022-02-23 09:05:16 +02:00
Pavel Emelyanov
66b9a53808 database: Move is_replacing() and get_replace_address() (back) into storage_service
Both helpers (natuarally) used to be storage-service methods, but then
were moved to databse because bootstrapper code wanted to know this info.
Now the bootstraper is equipped with necessary arguments.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-02-07 12:43:08 +03:00
Piotr Wojtczak
0dd7739716 snapshots: Fix snapshot-ctl to include snapshots of dropped tables
Snapshot-ctl methods fetch information about snapshots from
column family objects. The problem with this is that we get rid
of these objects once the table gets dropped, while the snapshots
might still be present (the auto_snapshot option is specifically
made to create this kind of situation). This commit switches from
relying on column family interface to scanning every datadir
that the database knows of in search for "snapshots" folders.

Fixes #3463
Closes #7122

Closes #9884

Signed-off-by: Piotr Wojtczak <piotr.m.wojtczak@gmail.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-02-01 22:31:43 +02:00
Calle Wilund
7ca72ffd19 database: Make wrapped version of timed_out_error a timed_out_error
Refs #9919

in a6202ae  throw_commitlog_add_error was added to ensure we had more
info on errors generated writing to commit log.

However, several call sites catch timed_out_error explicitly, not
checking for nested etc.
97bb1be and 868b572 tried to deal with it, by using check routines.
It turns out there are call sites left, and while these should be
changed, it is safer and quicker for now to just ensure that
iff we have a timed_out_error, we throw yet another timed_out_error.

Closes #10002
2022-01-31 14:15:23 +02:00
Pavel Emelyanov
77532a6a36 database, dht: Move get_initial_tokens()
The helper in question has nothing to do with replica/database and
is only used by dht to convert config option to a set of tokens.
It sounds like the helper deserves living where it's needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-01-27 16:41:29 +03:00
Nadav Har'El
4aa9e86924 Merge 'alternator: move uses of replica module to data_dictionary' from Avi Kivity
Alternator is a coordinator-side service and so should not access
the replica module. In this series all but one of uses of the replica
module are replaced with data_dictionary.

One case remains - accessing the replication map which is not
available (and should not be available) via the data dictionary.

The data_dictionary module is expanded with missing accessors.

Closes #9945

* github.com:scylladb/scylla:
  alternator: switch to data_dictionary for table listing purposes
  data_dictionary: add get_tables()
  data_dictionary: introduce keyspace::is_internal()
2022-01-19 11:34:25 +02:00
Avi Kivity
f80d13c95c data_dictionary: add get_tables()
Unlike replica::database::get_column_families() which is replaces,
it returns a vector of tables rather than a map. Map-like access
is provided by get_table(), so it's redundant to build a new
map container to expose the same functionality.
2022-01-19 09:36:22 +02:00
Nadav Har'El
1ce73c2ab3 Merge 'utils::is_timeout_exception: Ensure we handle nested exception types' from Calle Wilund
Fixes #9922

storage proxy uses is_timeout_exception to traverse different code paths.
a6202ae079 broke this (because bit rot and
intermixing), by wrapping exception for information purposes.

This adds check of nested types in exception handling, as well as a test
for the routine itself.

Closes #9932

* github.com:scylladb/scylla:
  database/storage_proxy: Use "is_timeout_exception" instead of catch match
  utils::is_timeout_exception: Ensure we handle nested exception types
2022-01-18 23:49:41 +02:00
Calle Wilund
868b572ec8 database/storage_proxy: Use "is_timeout_exception" instead of catch match
Might miss cases otherwise.

v2: Fix broken control flow
v3: Avoid throw - use make_exception_future instead.
2022-01-18 15:40:41 +00:00
Avi Kivity
8350cabff3 data_dictionary: introduce keyspace::is_internal()
Instead of the replica module's is_internal_keyspace(), provide
it as part of data_dictionary. By making it a member of the keyspace
class, it is also more future proof in that it doesn't depend on
a static list of names.
2022-01-18 15:31:38 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Mikołaj Sielużycki
f6d9d6175f sstables: Harden bad_alloc handling during memtable flush.
dirty_memory_manager monitors memory and triggers memtable flushing if
there is too much pressure. If bad_alloc happens during the flush, it
may break the loop and flushes won't be triggered automatically, leading
to blocked writes as memory won't be automatically released.

The solution is to add exception handling to the loop, so that the inner
part always returns a non-exceptional future (meaning the loop will
break only on node shutdown).

try/catch is used around on_internal_error instead of
on_internal_error_noexcept, as the latter doesn't have a version that
accepts an exception pointer. To get the exception message from
std::exception_ptr a rethrow is needed anyway, so this was a simpler
approach.

Fixes: #4174

Message-Id: <20220114082452.89189-1-mikolaj.sieluzycki@scylladb.com>
2022-01-14 16:09:21 +02:00
Botond Dénes
d6efe27545 Merge 'db: config: add a flag to disable new reversed reads algorithm' from Kamil Braun
Just in case the new algorithm turns out to be buggy, or give a
performance regression, add a flag to fall-back to the old algorithm for
use in the field.

Closes #9908

* github.com:scylladb/scylla:
  db: config: add a flag to disable new reversed reads algorithm
  replica: table: remove obsolete comment about reversed reads
2022-01-13 23:09:02 +02:00
Kamil Braun
e98711cfcb db: config: add a flag to disable new reversed reads algorithm
Just in case the new algorithm turns out to be buggy, or give a
performance regression, add a flag to fall-back to the old algorithm for
use in the field.
2022-01-12 18:59:19 +01:00
Avi Kivity
631a19884d data_dictionary: add get_keyspaces() method
Mirroring replica::database::get_keyspaces(), for Thrift's use.

We return a vector instead of a hash map. Random access is already
available via database::find_keyspace(). The name is available
via the keyspace metadata, and in fact Thrift ignore the map
name and uses the metadata name. Using a simpler type reduces
include dependencies for this heavily used module.

The function is plumbed to replica::database::get_keyspaces() so
it returns the same data.
2022-01-12 18:24:38 +02:00
Calle Wilund
a6202ae079 database: Add error message with mutation info on commit log apply failure
Fixes #9408

While it is rare, some customer issues have shown that we can run into cases
where commit log apply (writing mutations to it) fails badly. In the known
cases, due to oversized mutations. While these should have been caught earlier
in the call chain really, it would probably help both end users and us (trying
to figure out how they got so big and how they got so far) iff we added info
to the errors thrown (and printed), such as ks, cf, and mutation content.
2022-01-12 14:04:23 +00:00
Calle Wilund
63ea666ca0 database: coroutinize do_apply and apply_with_commitlog
Somewhat controversial. Making the apply with CL decision path
coroutinized, mainly to be able to in next patch make error handling
more informative (because we will have exceptions that are immediate
and/or futurized).

This is as stated somewhat problematic, it adds an allocation to
perf_simple_query::write path (because of crap clang cr frame folding?).
However, tasks/op remain constant and actual tps (though unstable)
remain more or less the same (on my crappy measurements).

Counter path is unaffected, as coroutine frame alloc replaces with(...)
alloc, and all is same and dandy.

I am hoping that the simpler error + verbose code will compensate for
the extra alloc.
2022-01-12 14:04:15 +00:00
Michael Livshin
1f27e12dc6 convert make_multishard_streaming_reader() to flat_mutation_reader_v2
All changes are mechanical.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-01-11 10:49:26 +02:00
Michael Livshin
be5118a7c9 convert table::make_streaming_reader() to flat_mutation_reader_v2
All changes are mechanical.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-01-11 10:49:26 +02:00
Michael Livshin
221cd264db convert make_flat_multi_range_reader() to flat_mutation_reader_v2
Mechanical changes and a resulting downgrade in one caller (which is
itself converted later).

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-01-11 10:49:26 +02:00
Avi Kivity
6c53717a39 replica, atomic_cell: move atomic_cell merge code from replica module to atomic_cell.cc
compare_atomic_cell_for_merge() was placed in database.cc, before
atomic_cell.cc existed. Move it to its correct place.

Closes #9889
2022-01-09 11:08:10 +02:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Avi Kivity
ae3a360725 database: Move database, keyspace, table classes to replica/ directory
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.

As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
2022-01-06 17:07:30 +02:00