Commit Graph

957 Commits

Author SHA1 Message Date
Avi Kivity
b3c95a1fc6 commitlog: reduce inclusions of commitlog.hh due to db::commitlog::force_sync (#9379)
There are now 231 translation units that indirectly include commitlog.hh
due to the need to have access to db::commitlog::force_sync.

Move that type to a new file commitlog_types.hh and make it available
without access to the commitlog class.

This reduces the number of translation units that depend on commitlog.hh
to 84, improving compile time.
2021-09-29 16:13:44 +03:00
Raphael S. Carvalho
9718173598 compaction: Update backlog tracker correctly when schema is updated
Currently the following can happen:
1) there's ongoing compaction with input sstable A, so sstable set
and backlog tracker both contains A.
2) ongoing compaction replaces input sstable A by B, so sstable set
contains only B now.
3) schema is updated, so a new backlog tracker is built without A
because sstable set now contains only B.
4) ongoing compaction tries to remove A from tracker, but it was
excluded in step 3.
5) tracker can now have a negative value if table is decreasing in
size, which leads to log(<negative number>) == -NaN

This problem happens because backlog tracker updates are decoupled
from sstable set updates. Given that the essential content of
backlog tracker should be the same as one of sstable set, let's move
tracker management to table.
Whenever sstable set is updated, backlog tracker will be updated with
the same changes, making their management less error prone.

Fixes #9157

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-27 14:15:29 -03:00
Avi Kivity
d7ac699a55 Revert "Merge "compaction: Update backlog tracker correctly when schema is updated" from Raphael"
This reverts commit b5cf0b4489, reversing
changes made to e8493e20cb. It causes
segmentation faults when sstable readers are closed.

Fixes #9388.
2021-09-26 18:31:49 +03:00
Avi Kivity
bf94c06fc7 Revert "Merge "simplifications and layer violation fix for compaction manager" from Raphael"
This reverts commit 7127c92acc, reversing
changes made to 88480ac504. We need to
revert b5cf0b4489 to fix #9388, and this stands
in the way.

Ref #9388.
2021-09-26 18:30:36 +03:00
Pavel Emelyanov
88e5b7c547 database: Shutdown in tests
There's a circular dependency:

  query processor needs database
  database owns large_data_handler and compaction_manager
  those two need qctx
  qctx owns a query_processor

Respectively, the latter hidden dependency is not "tracked" by
constructor arguments -- the query processor is started after
the database and is deferred to be stopped before it. This works
in scylla, because query processor doesn't really stop there,
but in cql_test_env it's problematic as it stops everything,
including the qctx.

Recent database start-stop sanitation revealed this problem --
on database stop either l.d.h. or compaction manager try to
start (or continue) messing with the query processor. One problem
was faced immediatelly and pluged with the 75e1d7ea safety check
inside l.d.h., but still cql_test_env tests continue suffering
from use after free on stopped query processor.

The fix is to partially revert the 4b7846da by making the tests
stop some pieces of the database (inclusing l.d.h. and compaction
manager) as it used to before. In scylla this is, probably, not
needed, at least now -- the database shutdown code was and still
is run right before the stopping one.

tests: unit(debug)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210924080248.11764-1-xemul@scylladb.com>
2021-09-26 11:09:01 +03:00
Raphael S. Carvalho
5bf51ced14 compaction: split compaction info and data for control
compaction_info must only contain info data to be exported to the
outside world, whereas compaction_data will contain data for
controlling compaction behavior and stats which change as
compaction progresses.
This separation makes the interface clearer, also allowing for
future improvements like removing direct references to table
in compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 10:56:18 -03:00
Raphael S. Carvalho
0885376a85 compaction: move management of compaction_info to compaction_manager
Today, compaction is calling compaction manager to register / deregister
the compaction_info created by it.

This is a layer violation because manager sits one layer above
compaction, so manager should be responsible for managing compaction
info.

From now on, compaction_info will be created and managed by
compaction_manager. compaction will only have a reference to info,
which it can use to update the world about compaction progress.

This will allow compaction_manager to be simplified as info can be
coupled with its respective task, allowing duplication to be removed
and layer violation to be fixed.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 10:00:49 -03:00
Avi Kivity
b5cf0b4489 Merge "compaction: Update backlog tracker correctly when schema is updated" from Raphael
"
Backlog tracker isn't updated correctly when facing a schema change, and
may leak a SSTable if compaction strategy is changed, which causes
backlog to be computed incorrectly. Most of these problems happen because
sstable set and tracker are updated independently, so it could happen
that tracker lose track (pun intended) of changes applied to set.

The first patch will fix the leak when strategy is changed, and the third
patch will make sure that tracker is updated atomically with sstable set,
so these kind of problems will not happen anymore.

Fixes #9157

test: mode(debug)
"

* 'fixes_to_backlog_tracker_v3' of https://github.com/raphaelsc/scylla:
  compaction: Update backlog tracker correctly when schema is updated
  compaction: Don't leak backlog of input sstable when compaction strategy is changed
  compaction: introduce compaction_read_monitor_generator::remove_exhausted_sstables()
  compaction: simplify removal of monitors
2021-09-22 18:55:25 +03:00
Raphael S. Carvalho
ff38f59f67 compaction: Update backlog tracker correctly when schema is updated
Currently the following can happen:
1) there's ongoing compaction with input sstable A, so sstable set
and backlog tracker both contains A.
2) ongoing compaction replaces input sstable A by B, so sstable set
contains only B now.
3) schema is updated, so a new backlog tracker is built without A
because sstable set now contains only B.
4) ongoing compaction tries to remove A from tracker, but it was
excluded in step 3.
5) tracker can now have a negative value if table is decreasing in
size, which leads to log(<negative number>) == -NaN

This problem happens because backlog tracker updates are decoupled
from sstable set updates. Given that the essential content of
backlog tracker should be the same as one of sstable set, let's move
tracker management to table.
Whenever sstable set is updated, backlog tracker will be updated with
the same changes, making their management less error prone.

Fixes #9157

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-20 15:54:41 -03:00
Pavel Emelyanov
a4118a70ee database, messaging: Delete old connection drop notification
Database no longer needs it. Since the only user of the old-style
notification is gone -- remove it as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
b78e9b51b7 database, tests: Rework recommended format setting
Tests don't have sstable format selector and enforce the needed
format by hands with the help of special database:: method. It's
more natural to provide it via convig. Doing this makes database
initialization in main and cql_test_env closer to each other.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
a42383b127 database, sstables_manager: Sow some noexcepts
Setting sstables format into database and into sstables_manager is
all plain assignments. Mark them as noexcept, next patch will become
apparently exception safe after that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
9a76df96e3 database: Eliminate unused helpers
There are some large-data-handler-related helpers left after previous
patches, they can be removed altogehter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
4b7846da86 database: Merge the stop_database() into database::stop()
After stop_database() became shard-local, it's possible to merge
it with database::stop() as they are both called one after another
on scylla stop. In cql-test-env there are few more steps in
between, but they don't rely on the database being partially
stopped.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
469c734155 database: Flatten stop_database()
The method need to perform four steps cross-shard synchronously:
first stop compaction manager, then close user and, after it,
system tables, finally shutdown the large data handler.

This patch reworks this synchronization with the help of cross-shard
barrier added to the database previously. The motivation is to merge
.stop_database() with .stop().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
b1013e09b4 database: Equip with cross-shard-barrier
Make sure a node-wide barrier exists on a database when scylla starts.
Also provide a barrier for cql_test_env. In all other cases keep a
solo-mode barrier so that single-shard db stop doesn't get blocked.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
e2308034ff database: Add .start() method
Called right after the sharded::start(). For now empty, to be populated
by next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:44:48 +03:00
Pavel Emelyanov
f6ab69b7f8 database: Extract commitlog initialization from init_system_keyspace
The intention is to keep all database initialization code in one place.
The init_system_keyspace() is one the obstacles -- it initializes db's
commitlog as first step.

This patch moves the commitlog initialization out of the mentioned
helper. The result looks clumsy, but it's temporary, next patches will
brush it up.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:36:42 +03:00
Pavel Emelyanov
bd2b7dca0e database: Remove unused mm arg from init_non_system_keyspaces()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:35:37 +03:00
Pavel Emelyanov
dc92f220e4 database: Drop get_available_memory() helper
It's only used on start to provide the total_memory() value to the
repair configuration code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:35:32 +03:00
Pavel Emelyanov
bb23986826 wasm: Localize it to database usage
The wasm::engine exists as a sharded<> service in main, but it's only
passed by local reference into database on start. There's no much profit
in keeping it at main scope, things get much simpler if keeping the
engine purely on database.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:35:17 +03:00
Avi Kivity
08042c1688 Merge 'reader_permit: make query max result size accessible from the permit' from Kamil Braun
This will make it easier, for example, to enforce memory limits in lower
levels of the `flat_mutation_reader` stack.

By default, the query result size is unlimited. However, for specific queries it is
possible to store a different value (e.g. obtained from a `read_command` object)
through a setter. An example of this can be seen in the last commit of this PR,
where we set the limit to `cmd.max_result_size` if engaged, or to the 'unlimited
query' limit (using `database::get_unlimited_query_max_result_size()`) if not.

Refs: #9281. The v2 version of the reverse sstable reader PR will be based on this PR:
we'll use the query max result size parameter in one of the readers down the stack
where `read_command` is not available but `reader_permit` is.

Closes #9341

* github.com:scylladb/scylla:
  table, database: query, mutation_query: remove unnecessary class_config param
  reader_permit: make query max result size accessible from the permit
  reader_concurrency_semaphore: remove default parameter values from constructors
  query_class_config: remove query::max_result_size default constructor
2021-09-14 16:17:18 +03:00
Kamil Braun
c12e265eb8 table, database: query, mutation_query: remove unnecessary class_config param
The semaphore inside was never accessed and `max_memory_for_unlimited_query`
was always equal to `*cmd.max_result_size` so the parameter was completely
redundant.

`cmd.max_result_size` is supposed to be always set in the affected
functions - which are executed on the replica side - as soon as the
replica receives the `read_command` object, in case the parameter was
not set by the coordinator. However, we don't have a guarantee at the
type level (it's still an `optional`). Many places used
`*cmd.max_result_size` without even an assertion.

We make the code a bit safer, we check for `cmd.max_result_size` and if
it's indeed engaged, store it in `reader_permit`. We then access it from
`reader_permit` where necessary. If `cmd.max_result_size` is not set, we
assume this is an unlimited query and obtain the limit from
`get_unlimited_query_max_result_size`.
2021-09-14 13:39:56 +02:00
Avi Kivity
3f2c680b70 Merge 'Add initial support for WebAssembly in user-defined functions (UDF)' from Piotr Sarna
This series adds very basic support for WebAssembly-based user-defined functions.

This series comes with a basic set of tests which were used to designate a minimal goal for this initial implementation.

Example usage:
```cql
CREATE FUNCTION ks.fibonacci (str text)
    RETURNS NULL ON NULL INPUT
    RETURNS boolean
    LANGUAGE xwasm
    AS ' (module
  (func $fibonacci (param $n i32) (result i32)
    (if
      (i32.lt_s (local.get $n) (i32.const 2))
      (return (local.get $n))
    )
    (i32.add
      (call $fibonacci (i32.sub (local.get $n) (i32.const 1)))
      (call $fibonacci (i32.sub (local.get $n) (i32.const 2)))
    )
  )
  (export "fibonacci" (func $fibonacci))
) '
```

Note that the language is currently called "xwasm" as in "experimental wasm", because its interface is still subject to change in the future.

Closes #9108

* github.com:scylladb/scylla:
  docs: add a WebAssembly entry
  cql-pytest: add wasm-based tests for user-defined functions
  main: add wasm engine instantiation
  treewide: add initial WebAssembly support to UDF
  wasm: add initial WebAssembly runtime implementation
  db: add wasm_engine pointer to database
  lang: add wasm_engine service
  import wasmtime.hh
  lua: move to lang/ directory
  cql3: generalize user-defined functions for more languages
2021-09-14 11:34:20 +03:00
Avi Kivity
115d6d8d4c system_keyspace: prepare forward-declared members
In anticipation of making system_keyspace a class instead of a
namespace, rename any member that is currently forward-declared,
since one can't forward-declare a class member. Each member
is taken out of the system_keyspace namespace and gains a
system_keyspace prefix. Aliases are added to reduce code churn.

The result isn't lovely, but can be adjusted later.
2021-09-13 15:11:26 +03:00
Piotr Sarna
83f46e6e6f db: add wasm_engine pointer to database
WASM engine needs to be used from two separate contexts:
 - when a user-defined function is created via CQL
 - when a user-defined function is received during schema migration

The common instance that these two have in common is the database
object, so that's where the reference is stored.
2021-09-13 11:01:33 +02:00
Benny Halevy
56e063ce93 keyspace: get rid of set_replication_strategy
It's unused.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210906133905.3307397-1-bhalevy@scylladb.com>
2021-09-06 16:48:35 +03:00
Raphael S. Carvalho
3a1cf3aa88 database: document database::get_keyspace_local_ranges()
Documentation was extracted from abstract_replication_strategy::get_ranges(),
which says:
    // get_ranges() returns the list of ranges held by the given endpoint.
    // The list is sorted, and its elements are non overlapping and non wrap-around.

That's important because users of get_keyspace_local_ranges() expect
that the returned list is both sorted and non overlapping, so let's
document it to prevent someone from removing any of these properties.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210805140628.537368-1-raphaelsc@scylladb.com>
2021-08-17 21:44:24 +03:00
Asias He
cc44edb4e2 database: Detemplate run_async
I initially tried to use a noncopyable_function to avoid the unnecessary
template usage.

However, since database::apply_in_memory is a hot function. It is better
to use with_gate directly. The run_async function does nothing but calls
with_gate anyway.

Closes #9160
2021-08-12 07:53:10 +03:00
Asias He
4ae6eae00a table: Get rid of table::run_compaction helper
The table::run_compaction is a trivial wrapper for
table::compact_sstables.

We have lots of similar {start, trigger, run}_compaction functions.
Dropping the run_compaction wrapper to reduce confusion.

Closes #9161
2021-08-09 14:02:54 +03:00
Asias He
6350a19f73 compaction: Move compaction_strategy.hh to compaction dir
The top dir is a mess. Move compaction_strategy.hh and
compaction_strategy_type.hh to the new home.
2021-08-07 08:06:37 +08:00
Michael Livshin
64dca1fef9 memtables: count read row tombstones
Refs #7749.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-08-01 19:41:11 +03:00
Michael Livshin
2ee9f1b951 memtables: add metric and accounter for range tombstone reads
Refs #7749.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-08-01 19:41:11 +03:00
Raphael S. Carvalho
c399601833 table: kill move_sstables_from_staging()
not used anywhere.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210728175403.86867-1-raphaelsc@scylladb.com>
2021-07-29 10:42:36 +03:00
Tomasz Grabiec
b044db863f Merge 'db/virtual_table: Streaming tables for large data + describe_ring example table' from Juliusz Stasiewicz
This is the 2nd PR in series with the goal to finish the hackathon project authored by @tgrabiec, @kostja, @amnonh and @mmatczuk (improved virtual tables + function call syntax in CQL). This one introduces a new implementation of the virtual tables, the streaming tables, which are suitable for large amounts of data.

This PR was created by @jul-stas and @StarostaGit

Closes #8961

* github.com:scylladb/scylla:
  test/boost: run_mutation_source_tests on streaming virtual table
  system_keyspace: Introduce describe_ring table as virtual_table
  storage_service: Pass the reference down to system_keyspace
  endpoint_details: store `_host` as `gms::inet_address`
  queue_reader: implement next_partition()
  virtual_tables: Introduce streaming_virtual_table
  flat_mutation_reader: Add a new filtering reader factory method
2021-07-23 18:05:51 +02:00
Raphael S. Carvalho
e4eb7df1a1 table: Make correctness of concurrent sstable list update robust
Today, table relies on row_cache::invalidate() serialization for
concurrent sstable list updates to produce correct results.
That's very error prone because table is relying on an implementation
detail of invalidate() to get things right.
Instead, let's make table itself take care of serialization on
concurrent updates.
To achieve that, sstable_list_builder is introduced. Only one
builder can be alive for a given table, so serialization is guaranteed
as long as the builder is kept alive throughout the update procedure.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210721001716.210281-1-raphaelsc@scylladb.com>
2021-07-21 16:45:30 +03:00
Raphael S. Carvalho
aad72289e2 table: Kill load_sstable()
That function is dangerously used by distributed loader, as the latter
was responsible for invalidating cache for new sstable.
load_sstable() is an unsafe alternative to
add_sstable_and_update_cache() that should never have been used by
the outside world. Instead, let's kill it and make loader use
the safe alternative instead.
This will also make it easier to make sure that all concurrent updates
to sstable set are properly serialized.

Additionally, this may potentially reduce the amount of data evicted
from the cache, when the sstables being imported have a narrow range,
like high level sstables imported from a LCS table. Unlikely but
possible.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210721131949.26899-1-raphaelsc@scylladb.com>
2021-07-21 16:21:42 +03:00
Juliusz Stasiewicz
f8067d938d storage_service: Pass the reference down to system_keyspace
According to the policy of avoiding globals.
2021-07-20 14:18:24 +02:00
Botond Dénes
27fbca84f6 reader_concurrency_semaphore: remove prethrow_action
The semaphore accepts a functor as in its constructor which is run just
before throwing on wait queue overload. This is used exclusively to bump
a counter in the database::stats, which counts queue overloads. However,
there is now an identical counter in
reader_concurrency_semaphore::stats, so the database can just use that
directly and we can retire the now unused prethrow action.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210716111105.237492-1-bdenes@scylladb.com>
2021-07-19 15:47:37 +03:00
Raphael S. Carvalho
841e9227f9 table: Document the serialization requirement on sstable set rebuild
In order to avoid data loss bugs, that could come due to lack of
serialization when using the preemptable build_new_sstable_list(),
let's document the serialization requirement.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210714201301.188622-1-raphaelsc@scylladb.com>
2021-07-17 18:09:00 +03:00
Pavel Emelyanov
1ed582304d memtable_list: Shorten flush coalescing codeflow
The memtable_list::flush() maintains a shared_promise object
to coalesce the flushers until the get_flush_permit() resolves.
Also it needs to keep the extraneous flushes counter bumped
while doing the flush itself.

All this can be coded in a shorter form and without the need
to carry shared_promise<> around.

tests: unit(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210716164237.10993-1-xemul@scylladb.com>
2021-07-17 00:42:20 +02:00
Botond Dénes
ae4df99e6b database: remove now unused query execution stages 2021-07-14 17:19:02 +03:00
Botond Dénes
16d3cb4777 mutation_reader: remove now unused restricting_reader
Move the now orphaned new_reader_base_cost constant to
database.hh/table.cc, as its main user is now
`table::estimate_read_memory_cost()`.
2021-07-14 17:19:02 +03:00
Botond Dénes
01a4bb33de database: increase semaphore max queue size
Queued reads don't take 10KB (not even 1KB) for years now. But the real
motivation of this patch is that due to a soon-to-come change to
admission we expect larger queues especially in tests, so be more
forgiving with queue sizes.
2021-07-14 17:19:02 +03:00
Botond Dénes
7f2813e3fa database: mutation_query(): handle querier lookup/save on the database level
Instead of passing down the querier_cache_ctx to table::mutation_query(),
handle the querier lookup/save on the level where the cache exists.

The real motivation behind this change however is that we need to move
the lookup outside the execution stage, because the current execution
stage will soon be replaced by the one provided by the semaphore and to
use that properly we need to know if we have a saved permit or not.
2021-07-14 16:48:43 +03:00
Botond Dénes
d2f5393a43 database: query(): handle querier lookup/save on the database level
Instead of passing down the querier_cache_ctx to table::query(),
handle the querier lookup/save on the level where the cache exists.

The real motivation behind this change however is that we need to move
the lookup outside the execution stage, because the current execution
stage will soon be replaced by the one provided by the semaphore and to
use that properly we need to know if we have a saved permit or not.
2021-07-14 16:48:43 +03:00
Botond Dénes
97a03f9027 database: make_multishard_streaming_reader: use external permit
As a preparation for up-front admission, add a permit parameter to
`make_multishard_streaming_reader()`, which will be the admitted permit
once we switch to up-front admission. For now it has to be a
non-admitted permit.
A nice side-effect of this patch is that now permits will have a
use-case specific description, instead of the generic
"multishard-streaming-reader" one
2021-07-14 16:48:43 +03:00
Botond Dénes
999169e535 database: make_streaming_reader(): require permit
As a preparation for up-front admission, add a permit parameter to
`make_streaming_reader()`, which will be the admitted permit once we
switch to up-front admission. For now it has to be a non-admitted
permit.
A nice side-effect of this patch is that now permits will have a
use-case specific description, instead of the generic "streaming" one.
2021-07-14 16:48:43 +03:00
Botond Dénes
3ec149222d database: add obtain_reader_permit()
A convenience method for obtaining an admitted permit for a read on a
given table.
For now it uses the nowait semaphore obtaining method, as all normal
reads still use the old admission method. Migrating reads to this method
will make the switch easier, as there will be one central place to
replace the nowait method with the proper one.
2021-07-14 16:48:43 +03:00
Botond Dénes
a6b59f0d89 table: add estimate_read_memory_cost()
To be used for determining the base cost of reads used in admission. For
now it just returns the already used constant. This is a forward looking
change, to when this will be a real estimation, not just a hardcoded
number.
2021-07-14 16:48:43 +03:00