Commit Graph

4972 Commits

Author SHA1 Message Date
Botond Dénes
4bd4aa2e88 Merge 'memtable, cache: Eagerly compact data with tombstones' from Tomasz Grabiec
When memtable receives a tombstone it can happen under some workloads
that it covers data which is still in the memtable. Some workloads may
insert and delete data within a short time frame. We could reduce the
rate of memtable flushes if we eagerly drop tombstoned data.

One workload which benefits is the raft log. It stores a row for each
uncommitted raft entry. When entries are committed they are
deleted. So the live set is expected to be short under normal
conditions.

Fixes #652.

Closes #10807

* github.com:scylladb/scylla:
  memtable: Add counters for tombstone compaction
  memtable, cache: Eagerly compact data with tombstones
  memtable: Subtract from flushed memory when cleaning
  mvcc: Introduce apply_resume to hold state for partition version merging
  test: mutation: Compare against compacted mutations
  compacting_reader: Drop irrelevant tombstones
  mutation_partition: Extract deletable_row::compact_and_expire()
  mvcc: Apply mutations in memtable with preemption enabled
  test: memtable: Make failed_flush_prevents_writes() immune to background merging
2022-06-15 18:12:42 +03:00
Tomasz Grabiec
169025d9b4 memtable: Add counters for tombstone compaction 2022-06-15 11:30:25 +02:00
Pavel Emelyanov
9a88bc260c Merge 'various group0 start/stop issues' from Gleb
The series fixes a couple of crashes that were found during starting and
stopping Scylla with raft while doing ddl operations. Most of them
related to shutdown order between different components.

Also in scylla-dev gleb/group0-fixes-v1

CI https://jenkins.scylladb.com/job/releng/job/Scylla-CI/749/

* origin-dev/gleb/group0-fixes-v1:
  migration manager: remove unused code
  db/system_distributed_keyspace: do not announce empty schema
  main: stop raft before the migration manager
  storage_service: do not pass the raft group manager to storage_service constructor
  main: destroy the group0_client after stopping the group0
2022-06-15 11:44:03 +03:00
Michael Livshin
aab4cd850c allow pre-scrub snapshots of materialized views and secondary indices
Previously, any attempt to take a materialized view or secondary index
snapshot was considered a mistake and caused the snapshot operation to
abort, with a suggestion to snapshot the base table instead.

But an automatic pre-scrub snapshot of a view cannot be attributed to
user error, so the operation should not be aborted in that case.

(It is an open question whether the more correct thing to do during
pre-scrub snapshot would be to silently ignore views.  Or perhaps they
should be ignored in all cases except when the user explicitly asks to
snapshot them, by name)

Closes #10760.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-06-15 11:30:58 +03:00
Avi Kivity
5129280f45 Revert "Merge 'memtable, cache: Eagerly compact data with tombstones' from Tomasz Grabiec"
This reverts commit e0670f0bb5, reversing
changes made to 605ee74c39. It causes failures
in debug mode in
database_test.test_database_with_data_in_sstables_is_a_mutation_source_plain,
though with low probability.

Fixes #10780
Reopens #652.
2022-06-14 18:06:22 +03:00
Avi Kivity
c80999fab4 cql3: expr: push is_satisfied_by regular and static column extraction to callers
is_satisfied_by() rearranges the static and regular columns from
query::result_row_view form (which is a use-once iterator) to
std::vector<managed_bytes_opt> (which uses the standard value
representation, and allows random access which expression
evaluation needs). Doing it in is_saitisfied_by() means that it is
done every time an expression is evaluated, which is wasteful. It's
also done even if the expression doesn't need it at all.

Push it out to callers, which already eliminates some calls.

We still pass cql3::expr::selection, which is a layering violation,
but that is left to another time.

Note that in view.cc's check_if_matches(), we should have been
able to move static_and_regular_columns calculation outside the
loop. However, we get crashes if we do. This is likely due to a
preexisting bug (which the zero iterations loop avoids). However,
in selection.cc, we are able to avoid the computation when the code
claims it is only handling partition keys or clustering keys.
2022-06-12 16:12:41 +03:00
Avi Kivity
4b715226fe cql3: expr: convert is_satisfied_by() signature to evaluation_inputs
Callers are converted, but the internals are kept using the old
conventions until more APIs are converted.

Although the new API allows passing no query_options, the view code
keeps passing dummy query_options and improvement is left as a FIXME.
2022-06-12 12:53:44 +03:00
Gleb Natapov
727a9071d8 db/system_distributed_keyspace: do not announce empty schema 2022-06-09 09:40:55 +03:00
Tomasz Grabiec
0bc45f9666 memtable: Add counters for tombstone compaction 2022-06-06 19:25:41 +02:00
Michael Livshin
632b4e5a9a fix "ninja dev-headers"
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Michael Livshin
029508b77c flat_mutation_reader ist tot
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Avi Kivity
4b53af0bd5 treewide: replace parallel_for_each with coroutine::parallel_for_each in coroutines
coroutine::parallel_for_each avoids an allocation and is therefore preferred. The lifetime
of the function object is less ambiguous, and so it is safer. Replace all eligible
occurences (i.e. caller is a coroutine).

One case (storage_service::node_ops_cmd_heartbeat_updater()) needed a little extra
attention since there was a handle_exception() continuation attached. It is converted
to a try/catch.

Closes #10699
2022-05-31 09:06:24 +03:00
Pavel Emelyanov
7f2837824e system_keyspace: Save coroutine's captured variable on stack
Currently it works, but the newer version of seastar's map_reduce()
is compiled in a way to trigger use-after-free on accessing captured
value.

tests: unit(dev), unit.alternator(debug on v1)

Fixes #10689

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220523095409.6078-1-xemul@scylladb.com>
2022-05-30 17:46:32 +03:00
Benny Halevy
6677028212 sstables: mx/writer: auto-scale promoted index
Add column_index_auto_scale_threshold_in_kb to the configuration
(defaults to 10MB).

When the promoted index (serialized) size gets to this
threshold, it's halved by merging each two adjacent blocks
into one and doubling the desired_block_size.

Fixes #4217

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-24 13:32:35 +03:00
Avi Kivity
5285ccbb12 Merge 'Add prune ghost rows statement' from Piotr Sarna
This series is split from another, bigger RFC series which provides
manual remedies to deal with inconsistencies between the base table
and its views. This part deals with ghost rows by providing a statement
which fetches view rows from a given range, then reads its corresponding
rows from the base table (cl=ALL), and finally removes rows which were
not present in the base table at all, qualifying them as ghost rows.
Motivations for introducing such a statement:
 * in case of detected inconsistencies, it can be used to fix
   materialized views without recreating them from scratch, which can
   take days and generates lots of throughput
 * a tool which periodically scrubs a materialized view can be easily
   created on top of this statement, especially that it's possible
   to remove ghost rows from a user-defined view token range;

This series comes with a unit test.

The reason for digging up this series is because it's still possible to end up with ghost rows in certain rather improbable scenarios, and we lack a way of fixing them without rebuilding the whole view. For instance, in case of a failed synchronous update to a local view, the user will be notified that the query failed, but a ghost row can be created nonetheless. The pruning statement introduced in this series would allow healing the failure locally, without rebuilding the whole view.

Tests: unit(dev)

Closes #10426

* github.com:scylladb/scylla:
  docs: add a paragraph on PRUNE MATERIALIZED VIEW statement
  service,test: add a test case for error during pruning
  tests: add ghost row deletion test case
  cql3: enable ghost row deletion via CQL
  cql3: add a statement for deleting ghost rows
  cql3: convert is_json statement parameter to enum
  pager: add ghost row deleting pager
  db,view: add delete ghost rows visitor
2022-05-19 17:21:35 +03:00
Piotr Sarna
c3a9658535 db,view: add delete ghost rows visitor
The visitor is used to traverse view rows, and if it detects
a ghost row it qualifies it for deletion. Qualification is based
on a base table read with cl=ALL: if the corresponding row is not
present in the base table, it is considered a ghost.
2022-05-19 10:11:50 +02:00
Pavel Emelyanov
f81f1c7ef7 format-selector: Remove .sync() point
The feature listener callbacks are waited upon to finish in the
middle of the cluster joining process. I particular -- before
actually joining the cluster the format should have being selected.
For that there's a .sync() method that locks the semaphore thus
making sure that any update is finished and it's called right after
the wait_for_gossip_to_settle() finishes.

However, features are enabled inside the wait_for_gossip_to_settle()
in a seastar::async() context that's also waited upon to finish. This
waiting makes it possible for any feature listener to .get() any of
its futures that should be resolved until gossip is settled.

Said that, the format selection barrier can be moved -- instead of
waiting on the semaphore, the respective part of the selection code
can be .get()-ed (it all runs in async context). One thing to care
about -- the remainder should continue running with the gate held.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-16 14:14:14 +03:00
Pavel Emelyanov
7fee50f1e3 format-selector: Coroutinize maybe_select_format()
This method is run when a feature is enabled. It's a bit trickier than
the others, also there are two methods actually, that are merged into
one by this patch. By and large most of the care is about the _sel
gate and _sem semaphore.

The gate protects the whole selection code from the selector being freed
from underneath it on stop. The semaphore is only needed to keep two
different format selections from each other -- each update the system
keyspace, local variable and replica::database instance on all shards.
In the end there's a gossiper update, but it happens outside of the
semaphore.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-16 14:13:59 +03:00
Pavel Emelyanov
93df88aac4 format-selector: Coroutinize simple methods
These all are just straightfowrard usage of co_await's around the code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-16 14:13:59 +03:00
Avi Kivity
528ab5a502 treewide: change metric calls from make_derive to make_counter
make_derive was recently deprecated in favor of make_counter, so
make the change throughput the codebase.

Closes #10564
2022-05-14 12:53:55 +02:00
Avi Kivity
5937b1fa23 treewide: remove empty comments in top-of-files
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.

Closes #10562
2022-05-13 07:11:58 +02:00
Michael Livshin
00ed4ac74c batchlog_manager: warn when a batch fails to replay
Only for reasons other than "no such KS", i.e. when the failure is
presumed transient and the batch in question is not deleted from
batchlog and will be retried in the future.

(Would info be more appropriate here than warning?)

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>

Closes #10556
2022-05-12 13:34:03 +03:00
Benny Halevy
e1d58d4422 database: add snapshot_on_all
And move the logic from snapshot-ctl down to the
replica::database layer.

A following patch will move the flush phase
from the replica::table::snapshot layer
out to the caller.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-10 10:45:14 +03:00
Benny Halevy
aa127a2dbb snapshot-ctl: run_snapshot_modify_operation: reject views and secondary index using the schema
Detecting a secondary index by checking for a dot
in the table name is wrong as tables generated by Alternator
may contain a dot in their name.

Instead detect bot hmaterialized view and secondary indexes
using the schema()->is_view() method.

Fixes #10526

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-10 10:44:52 +03:00
Benny Halevy
1fbcdbd2e8 snapshot-ctl: refactor and coroutinize take_snapshot / take_column_family_snapshot
There is no functional change in this patch.
Only refactoring of the code.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-10 10:16:39 +03:00
Benny Halevy
5b4eb44795 database: add flush_on_all variants
Use by api layer.

Will be used in a later patch to flush
on all shards before taking a snapshot.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-10 09:56:44 +03:00
Botond Dénes
fd27fbfe64 Merge "Add user types carrier helper" from Pavel Emelyanov
"
There's a cql_type_parser::parse() method that needs to get user
types for a keyspace by its name. For this it uses the global
storage proxy instance as a place to get database from. This set
introduces an abstract user_types_storage helper object that's
responsible in providing the user types for the caller.

This helper, in turn, is provided to the parse() method by the
database itself or by the schema_ctxt object that needs parse()
to unfreeze schemas and doesn't have database at those times.

This removes one more get_storage_proxy() call.
"

* 'br-user-types-storage' of https://github.com/xemul/scylla:
  cql_type_parser: Require user_types_storage& in parse()
  schame_tables: Add db/ctxt args here and there
  user_types: Carry storage on database and schema_ctxt
  data_dictionary: Introduce user types storage
2022-05-09 17:38:52 +03:00
Pavel Emelyanov
0aea43a245 gossiper: Make state and locks maps private
Locks are not needed outside gossiper, state map is sometimes read from,
but there a const getter for such cases. Both methods now desrve the
underbar prefix, but it doesn't come with this short patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-06 10:34:48 +03:00
Piotr Sarna
eeec502aee Merge 'gms: feature_service: reduce boilerplate to add a cluster feature' from Avi Kivity
Currently, adding a cluster feature requires editing several files and
repeating the new feature name several times. This series reduces
the boilerplate to a single line (for non-experimental features), and
perhaps three for experimental features.

Closes #10488

* github.com:scylladb/scylla:
  gms: feature_service: remove variable/helper function duplication
  gms: feature: make `operator bool` implicit
  gms: feature_service: remove feature variable duplication in enable()
  gms: feature_service: remove feature variable declaration/definition duplication
  gms: features: de-quadruplicate active feature names
  gms: features: de-quadruplicate deprecated feature names
  gms: feature_service: avoid duplicating feature names when listing known features
2022-05-05 12:43:15 +02:00
Pavel Emelyanov
0f698910e8 cql_type_parser: Require user_types_storage& in parse()
Right now to get user types the method in question gets global proxy
instance to get database from it and then peek a keyspace, its metadata
and, finally, the user types. There's also a safety check for proxy not
being initialized, which happens in tests.

Instead of messing with the proxy, the parse() method now accepts the
user_types_storage reference from which it gets the types. All the
callers already have the needed storage at hand -- in most of the cases
it's one shared between the database and schema_ctxt. In case of tests
is's a dummy storage, in case of schema-loader it's its local one.

The get_column_mapping() is special -- it doesn't expect any user-types
to be parsed and passes "" keyspace into it, neither it has db/ctxt to
get types storage from, so it can safely use the dummy one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-05 13:11:18 +03:00
Pavel Emelyanov
44f38d4de2 schame_tables: Add db/ctxt args here and there
This is to have them in places that call cql_type_parser::parse.
Pure churn reduction for the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-05 13:11:18 +03:00
Pavel Emelyanov
2104d90dd0 user_types: Carry storage on database and schema_ctxt
The user types storage is needed in cql_type_parser::parse which is in
turn called with either replica::database or scema_ctxt at hand.

To facilitate the former case replica::database has its own user types
storage created in database constructor.

The latter case is a bit trickier. In many cases the ctxt is created as
a temporary object and the database is available at those places. Also
the ctxt object lives on the schema_registry instance which doesn't have
database nearby. However, that ctxt lifetime is the same as the registry
instance one and when it's created there's a database at hand (it's the
database constructor that calls schema_registry.init() passing "this"
into it). Thus, the solution is to make database's user types storage be
a shared pointer that's shared between database itself and all the ctxts
out there including the one that lives on schema_registry instance.

When database goes away it .deactivate()s its user types storage so that
any ctxts that may share it stay on the safe side and don't use database
after free. This part will go away when the schema_registry will be
deglobalized.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-05 13:06:04 +03:00
Avi Kivity
19ab3edd77 gms: feature_service: remove variable/helper function duplication
Each feature has a private variable and a public accessor. Since the
accessor effectively makes the variable public, avoid the intermediary
and make the variable public directly.

To ease mechanical translation, the variable name is chosen as
the function name (without the cluster_supports_ prefix).

References throughout the codebase are adjusted.
2022-05-04 18:59:56 +03:00
Michał Radwański
29e09a3292 db/config: command line arguments logger_stdout_timestamps and logger_ostream_type are no longer ignored
Closes #10452
2022-05-04 14:40:52 +03:00
Pavel Emelyanov
063d26bc9e system_keyspace/config: Swallow string->value cast exception
When updating an updateable value via CQL the new value comes as a
string that's then boost::lexical_cast-ed to the desired value. If the
cast throws the respective exception is printed in logs which is very
likely uncalled for.

fixes: #10394
tests: manual

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220503142942.8145-1-xemul@scylladb.com>
2022-05-04 08:35:12 +03:00
Pavel Emelyanov
11c99fc41b table: Don't use global gossiper
The table::get_hit_rate needs gossiper to get hitrates state from.
There's no way to carry gossiper reference on the table itself, so it's
up to the callers of that method to provide it. Fortunately, there's
only one caller -- the proxy -- but the call chain to carry the
reference it not very short ... oh, well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:33:08 +03:00
Eliran Sinvani
a16b4e407d internal queries: add caching to some queries
Some of the internal queries didn't have caching enabled even though
there are chances of the query executing in large bursts or relatively
often, example of the former is `default_authorized::authorize` and for
the later is `system_distributed_keyspace::get_service_levels`.

Fixes #10335

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2022-05-01 13:30:02 +03:00
Eliran Sinvani
e0c7178e75 query_processor: remove default internal query caching behavior
When executing internal queries, it is important that the developer
will decide if to cache the query internally or not since internal
queries are cached indefinitely. Also important is that the programmer
will be aware if caching is going to happen or not.
The code contained two "groups" of `query_processor::execute_internal`,
one group has caching by default and the other doesn't.
Here we add overloads to eliminate default values for caching behaviour,
forcing an explicit parameter for the caching values.
All the call sites were changed to reflect the original caching default
that was there.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2022-05-01 08:33:55 +03:00
Eliran Sinvani
38b7ebf526 query_processor: make execute_internal caching parameter more verbose
`execute_internal` has a parameter to indicate if caching a prepared
statement is needed for a specific call. However this parameter was a
boolean so it was easy to miss it's meaning in the various call sites.
This replaces the parameter type to a more verbose one so it is clear
from the call site what decision was made.
2022-05-01 08:33:55 +03:00
Avi Kivity
de0ee13f45 schema_tables: forward-declare user_function and user_aggerates
These bring in wasm.hh (though they really shouldn't) and make
everyone suffer. Forward declare instead and add missing includes
where needed.

Closes #10444
2022-04-28 07:22:02 +03:00
Benny Halevy
e88871f4ec replica: database: move shard_of implementation to mutation layer
We don't need the database to determine the shard of the mutation,
only its schema. So move the implementation to the respecive
definitions of mutation and frozen_mutation.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10430
2022-04-27 14:40:24 +03:00
Botond Dénes
3051fc3cbc Merge 'Fix some errors and issues found by gcc 12' from Avi Kivity
gcc 12 checks some things that clang doesn't, resulting in compile errors.
This series fixes some of theses issues, but still builds (and tests) with clang.

Unfortunately, we still don't have a clean gcc build due to an outstanding bug [1].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98056

Closes #10386

* github.com:scylladb/scylla:
  build: disable warnings that cause false-positive errors with gcc 12
  utils: result_loop: remove invalid and incorrect constraint
  service: forward_service: avoid using deprecated std::bind1st and std::not1
  repair: explicityl ignore tombstone gc update response
  treewide: abort() after switch in formatters
  db: view: explicitly ignore unused result
  compaction: leveled_compaction_strategy: avoid compares between signed and unsigned
  compaction_manager: compaction_reenabler: disambiguate compaction_state
  api: avoid function specialization in req_param
  alternator: ttl: avoid specializing class templates in non-namespace scope
  alternator: executor: fix signed/unsigned comparison in is_big()
2022-04-19 10:25:38 +03:00
Avi Kivity
a1df583dea db: view: explicitly ignore unused result
Otherwise, gcc complains.
2022-04-18 12:27:18 +03:00
Piotr Sarna
fea18943cd schema_tables: drop leftover change to system_schema.keyspaces
Series 59d56a3fd7 introduced
an accidental backward incompatible regression by adding
a column to system_schema.keyspaces and then not even using
it for anything. It's a leftover from the original hackathon
implementation and should never reach master in the first place.
Fortunately, the series isn't part of any stable release yet.

Fixes #10376
Tests: manual, verifying that the system_schema.keyspaces table
no longer contains the extraneous column.

Closes #10377
2022-04-18 12:00:43 +03:00
Kamil Braun
41f5b7e69e Merge branch 'raft_group0_early_startup_v3' of https://github.com/ManManson/scylla into next
* 'raft_group0_early_startup_v3' of https://github.com/ManManson/scylla:
  main: allow joining raft group0 before waiting for gossiper to settle
  service: raft_group0: make `join_group0` re-entrant
  service: storage_service: add `join_group0` method
  raft_group_registry: update gossiper state only on shard 0
  raft: don't update gossiper state if raft is enabled early or not enabled at all
  gms: feature_service: add `cluster_uses_raft_mgmt` accessor method
  db: system_keyspace: add `bootstrap_needed()` method
  db: system_keyspace: mark getter methods for bootstrap state as "const"
2022-04-14 16:42:20 +02:00
Avi Kivity
8aec146dec Merge "Remove qctx from repair" from Pavel E
"
Repair code keeps its history in system keyspace and uses the qctx
global thing to update and query it. This set replaces the qctx with
the explicit reference on the system_keyspace object.

tests: unit(dev), dtest.repair_test(dev)
"

* 'br-repair-vs-qctx' of https://github.com/xemul/scylla:
  repair, system_keyspace: Query repair_history with a helper
  repair: Update loader code to use system_keyspace entry
  repair, system_keyspace: Update repair_history with a helper
  repair: Keep system keyspace reference
2022-04-12 17:08:41 +03:00
Avi Kivity
546ee814dd Merge 'schema_tables, sstables: return instead of throwing' from Piotr Sarna
This miniseries rewrites a few unnecessary throws into forwarding the exception directly. It's partially possible thanks to the new `co_await coroutine::return_exception` mechanism which allows returning from a coroutine early, without explicitly calling co_return (d5843f6e88).

Closes #10360

* github.com:scylladb/scylla:
  sstables: : remove unnecessary throws
  schema_tables: remove unnecessary throws
2022-04-12 15:18:14 +03:00
Piotr Sarna
91f130bd9c schema_tables: remove unnecessary throws
Throws are translated to passing the exception directly.
2022-04-12 13:09:27 +02:00
Pavel Emelyanov
05eb9c9416 repair, system_keyspace: Query repair_history with a helper
Querying the table is now done with the help of qctx directly. This
patch replaces it with a querying helper that calls the consumer
function with the entry struct as the argument.

After this change repair code can stop including query_context and
mess with untyped_result_set.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-12 14:04:21 +03:00
Pavel Emelyanov
9940016e05 repair, system_keyspace: Update repair_history with a helper
Current code works directly on the qctx which is not nice. Instead,
make it use the system keyspace reference. To make it work, the patch
adds a helper method and introduces a helper struct for the table
entry. This struct will also be used to query the table (next patch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-12 13:57:57 +03:00