Commit Graph

1790 Commits

Author SHA1 Message Date
Avi Kivity
fd663bcb94 cql3: util: change where clause utilities to accept a single expression rather than a vector of terms
Conversion to terms happens internally via boolean_factors().
2022-07-22 20:14:48 +03:00
Avi Kivity
a5dd588465 cql3: statement_restrictions: accept a single expression rather than a vector
Move closer to the goal of accepting a generic expression for WHERE
clause by accepting a generic expression in statement_restrictions. The
various callers will synthesize it from a vector of terms.
2022-07-22 20:14:48 +03:00
Avi Kivity
8085b9f57a cql3: expr: add boolean_factors() function to factorize an expression
When analyzing a WHERE clause, we want to separate individual
factors (usually relations), and later partition them into
partition key, clustering key, and regular column relations. The
first step is separation, for which this helper is added.

Currently, it is not required since the grammar supplies the
expression in separated form, but this will not work once it is
relaxed to allow any expression in the WHERE clause.

A unit test is added.
2022-07-22 20:14:48 +03:00
Avi Kivity
13a64d8ab2 Merge 'Remove all remaining restrictions classes' from Jan Ciołek
This PR removes all code that used classes `restriction`, `restrictions` and their children.

There were two fields in `statement_restrictions` that needed to be dealt with: `_clustering_columns_restrictions` and `_nonprimary_key_restrictions`.

Each function was reimplemented to operate on the new expression representaiion and eventually these fields weren't needed anymore.

After that the restriction classes weren't used anymore and could be deleted as well.

Now all of the code responsible for analyzing WHERE clause and planning a query works on expressions.

Closes #11069

* github.com:scylladb/scylla:
  cql3: Remove all remaining restrictions code
  cql3: Move a function from restrictions class to the test
  cql3: Remove initial_key_restrictions
  cql3: expr: Remove convert_to_restriction
  cql3: Remove _new from _new_nonprimary_key_restrictions
  cql3: Remove _nonprimary_key_restrictions field
  cql3: Reimplement uses of _nonprimary_key_restrictions using expression
  cql3: Keep a map of single column nonprimary key restrictions
  cql3: Remove _new from _new_clustering_columns_restrictions
  cql3: Remove _clustering_columns_restrictions from statement_restrictions
  cql3: Use a variable instead of dynamic cast
  cql3: Use the new map of single column clustering restrictions
  cql3: Keep a map of single column clustering key restrictions
  cql3: Return an expression in get_clustering_columns_restrctions()
  cql3: Reimplement _clustering_columns_restrictions->has_supporting_index()
  cql3: Don't create single element conjunction
  cql3: Add expr::index_supports_some_column
  cql3: Reimplement has_unrestricted_components()
  cql3: Reimplement _clustering_columns_restrictions->need_filtering()
  cql3: Reimplement num_prefix_columns_that_need_not_be_filtered
  cql3: Use the new clustering restrictions field instead of ->expression
  cql3: Reimplement _clustering_columns_restrictions->size() using expressions
  cql3: Reimplement _clustering_columns_restrictions->get_column_defs() using expressions
  cql3: Reimplement _clustering_columns_restrictions->is_all_eq() using expressions
  cql3: expr: Add has_only_eq_binops function
  cql3: Reimplement _clustering_columns_restrictions->empty() using expressions
2022-07-20 18:01:15 +03:00
Jan Ciolek
599bcd6ea7 cql3: Remove all remaining restrictions code
The classes restriction, restrictions and its children
aren't used anywhere now and can be safely removed.

Some includes need to be modified for the code to compile.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-07-20 09:10:31 +02:00
Jan Ciolek
bff0b87c18 cql3: Move a function from restrictions class to the test
statement_restrictions_test uses a function that is defined
in multi_column_restriction.hh.

This file will be removed soon and for the test to still work
the function is moved to the test source.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-07-20 09:10:31 +02:00
Botond Dénes
6e20cb3255 Merge 'database_test: test_truncate_without_snapshot_during_writes: apply mutation on the correct shard' from Benny Halevy
Currently, all the mutations this test generates are applied on shard 0.
In rare cases, this may lead to the following crash, when the flushed
sstable doesn't contain any key that belongs to the current shard,
as seen in https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1390/artifact/testlog/x86_64/dev/database_test.test_truncate_without_snapshot_during_writes.114.log
```
WARN  2022-07-17 17:41:36,630 [shard 0] sstable - create_sharding_metadata: range=[{-468459073612751032, pk{00046b657930}}, {-468459073612751032, pk{00046b657930}}] has no intersection with shard=0 first_key={key: pk{00046b657930}, token:-468459073612751032} last_key={key: pk{00046b657930}, token:-468459073612751032} ranges_single_shard=[] ranges_all_shards={{1, {[{-468459073612751032, pk{00046b657930}}, {-468459073612751032, pk{00046b657930}}]}}}
ERROR 2022-07-17 17:41:36,630 [shard 0] table - failed to write sstable /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db: std::runtime_error (Failed to generate sharding metadata for /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db)
ERROR 2022-07-17 17:41:36,631 [shard 0] table - Memtable flush failed due to: std::runtime_error (Failed to generate sharding metadata for /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db). Aborting, at 0x329e28e 0x329e780 0x329ea88 0xf5bc69 0xf956b1 0x3196dc4 0x3198037 0x319742a 0x32be2e4 0x32bd8e1 0x32ba01c 0x317f97d /lib64/libpthread.so.0+0x92a4 /lib64/libc.so.6+0x100322
```

Instead, generate random keys and apply them on their
owning shard, and truncate all database shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11066

* github.com:scylladb/scylla:
  database_test: test_truncate_without_snapshot_during_writes: apply mutation on the correct shard
  table: try_flush_memtable_to_sstable: consume: close reader on error
2022-07-20 09:06:07 +03:00
Avi Kivity
5a30f9b789 Merge 'Distributed aggregate query' from Michał Jadwiszczak
This PR extends #9209. It consists of 2 main points:

To enable parallelization of user-defined aggregates, reduction function was added to UDA definition. Reduction function is optional and it has to be scalar function that takes 2 arguments with type of UDA's state and returns UDA's state

All currently implemented native aggregates got their reducible counterpart, which return their state as final result, so it can be reduced with other result. Hence all native aggregates can now be distributed.

Local 3-node cluster made with current master. `node1` updated to this branch. Accessing node with `ccm <node-name> cqlsh`

I've tested belowed things from both old and new node:
- creating UDA with reduce function - not allowed
- selecting count(*) - distributed
- selecting other aggregate function - not distributed

Fixes: #10224

Closes #10295

* github.com:scylladb/scylla:
  test: add tests for parallelized aggregates
  test: cql3: Add UDA REDUCEFUNC test
  forward_service: enable multiple selection
  forward_service: support UDA and native aggregate parallelization
  cql3:functions: Add cql3::functions::functions::mock_get()
  cql3: selection: detect parallelize reduction type
  db,cql3: Move part of cql3's function into db
  selection: detect if selectors factory contains only simple selectors
  cql3: reducible aggregates
  DB: Add `scylla_aggregates` system table
  db,gms: Add SCYLLA_AGGREGATES schema features
  CQL3: Add reduce function to UDA
  gms: add UDA_NATIVE_PARALLELIZED_AGGREGATION feature
2022-07-19 19:05:19 +03:00
Benny Halevy
1c26d49fba database_test: test_truncate_without_snapshot_during_writes: apply mutation on the correct shard
Currently, all the mutations this test generates are applied on shard 0.
In rare cases, this may lead to the following crash, when the flushed
sstable doesn't contain any key that belongs to the current shard,
as seen in https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1390/artifact/testlog/x86_64/dev/database_test.test_truncate_without_snapshot_during_writes.114.log
```
WARN  2022-07-17 17:41:36,630 [shard 0] sstable - create_sharding_metadata: range=[{-468459073612751032, pk{00046b657930}}, {-468459073612751032, pk{00046b657930}}] has no intersection with shard=0 first_key={key: pk{00046b657930}, token:-468459073612751032} last_key={key: pk{00046b657930}, token:-468459073612751032} ranges_single_shard=[] ranges_all_shards={{1, {[{-468459073612751032, pk{00046b657930}}, {-468459073612751032, pk{00046b657930}}]}}}
ERROR 2022-07-17 17:41:36,630 [shard 0] table - failed to write sstable /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db: std::runtime_error (Failed to generate sharding metadata for /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db)
ERROR 2022-07-17 17:41:36,631 [shard 0] table - Memtable flush failed due to: std::runtime_error (Failed to generate sharding metadata for /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db). Aborting, at 0x329e28e 0x329e780 0x329ea88 0xf5bc69 0xf956b1 0x3196dc4 0x3198037 0x319742a 0x32be2e4 0x32bd8e1 0x32ba01c 0x317f97d /lib64/libpthread.so.0+0x92a4 /lib64/libc.so.6+0x100322
```

Instead, generate random keys and apply them on their
owning shard, and truncate all database shards.

Fixes #11076

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-19 16:55:11 +03:00
Pavel Emelyanov
a56e2c83f3 sstables: Keep priority class on sstable_directory
Current code accepts priotity class as an argument to various functions
that need it and all its callers use streaming class. Next patches will
needs to sometimes use default class, but it will require heavy patching
of the distributed loader. Things get simpler if the priority class is
kept on sstable_directory on start.

This change also simplifies the ongoing effort on unification of sched
and IO classes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-07-19 12:14:41 +03:00
Jadw1
7497fda370 test: add tests for parallelized aggregates 2022-07-18 15:25:42 +02:00
Botond Dénes
9afd2dc428 Merge 'Make compaction manager switch to table abstraction ' from Raphael "Raph" Carvalho
This work gets us a step closer to compaction groups.

Everything in compaction layer but compaction_manager was converted to table_state.

After this work, we can start implementing compaction groups, as each group will be represented by its own table_state. User-triggered operations that span the entire table, not only a group, can be done by calling the manager operation on behalf of each group and then merging the results, if any.

Closes #11028

* github.com:scylladb/scylla:
  compaction: remove forward declaration of replica::table
  compaction_manager: make add() and remove() switch to table_state
  compaction_manager: make run_custom_job() switch to table_state
  compaction_manager: major: switch to table_state
  compaction_manager: scrub: switch to table_state
  compaction_manager: upgrade: switch to table_state
  compaction: table_state: add get_sstables_manager()
  compaction_manager: cleanup: switch to table_state
  compaction_manager: offstrategy: switch to table_state()
  compaction_manager: rewrite_sstables(): switch to table_state
  compaction_manager: make run_with_compaction_disabled() switch to table_state
  compaction_manager: compaction_reenabler: switch to table_state
  compaction_manager: make submit(T) switch to table_state
  compaction_manager: task: switch to table_state
  compaction: table_state: Add is_auto_compaction_disabled_by_user()
  compaction: table_state: Add on_compaction_completion()
  compaction: table_state: Add make_sstable()
  compaction_manager: make can_proceed switch to table_state
  compaction_manager: make stop compaction procedures switch to table_state
  compaction_manager: make get_compactions() switch to table_state
  compaction_manager: change task::update_history() to use table_state instead
  compaction_manager: make can_register_compaction() switch to table_state
  compaction_manager: make get_candidates() switch to table_state
  compaction_manager: make propagate_replacement() switch to table_state
  compaction: Move table::in_strategy_sstables() and switch to table_state
  compaction: table_state: Add maintenance sstable set
  compaction_manager: make has_table_ongoing_compaction() switch to table_state
  compaction_manager: make compaction_disabled() switch to table_state
  compaction_manager: switch to table_state for mapping of compaction_state
  compaction_manager: move task ctor into source
2022-07-18 15:18:29 +03:00
Benny Halevy
bbbbea65fb database: clear_snapshot: remove dropped table directory when it has no remaining snapshots
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-17 14:33:34 +03:00
Benny Halevy
c70a675d77 database: clear_snapshot: make it a coroutine and use thread
and use an async thread around `directory_lister`
rather than `lister::scan_dir` to simplify the implementation.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-17 14:33:34 +03:00
Benny Halevy
e710fe527c database_test: add clear_multiple_snapshots test
Based on the `clear_snapshot` test.

Test with multiple snapshots and different
combinations of parameters to database::clear_snapshot.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-17 14:33:34 +03:00
Benny Halevy
ae3b1b5a64 database_test: drop_table_with_snapshots: test auto_snapshot
Refactor test_drop_table_with_auto_snapshot out of
drop_table_with_snapshots, adding a auto_snapshot param,
controlling how to configure the cql_test_env db:.config::auto_snapshot,
so we can test both cases - auto_snapshot enabled and disabled.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-17 14:33:34 +03:00
Benny Halevy
af6805dd75 database_test: populate_from_quarantine_works: pass optional db:config to do_with_some_data
Instead of just `tmpdir_for_data`, so we can easily set auto_snapshot
for `drop_table_with_snapshots` in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-17 14:33:34 +03:00
Benny Halevy
2e37dcf62a database: drop_table_on_all_shards: remove table directory having no snapshots
If the table to remove has no snapshots then
completely remove its directory on storage
as the left-over directory slows down operations on the keyspace
and makes searching for live tables harder.

Fixes #10896

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-17 14:33:34 +03:00
Benny Halevy
e005629afb database: add drop_table_on_all_shards
Runs drop_column_family on all database shards.
Will be extended later to consider removing the table directory.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-17 14:33:34 +03:00
Raphael S. Carvalho
9a1efc69d0 compaction_manager: major: switch to table_state
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
cebe6e22cb compaction_manager: scrub: switch to table_state
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
d29f7070d9 compaction_manager: upgrade: switch to table_state
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
c2678ca661 compaction: table_state: add get_sstables_manager()
That will be needed for retrieving sstable manager in
perform_sstable_upgrade().

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
7c1d178f4e compaction_manager: make submit(T) switch to table_state
Now that submit() switched to table_state, compaction_reenabler
and friends can switch to table_state too.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
43136a3ca7 compaction: table_state: Add is_auto_compaction_disabled_by_user()
auto_compaction_disabled_by_user is a configuration that can be enabled
or disabled on a particular table. We're adding this interface to
avoid having to push the configuration for every compaction_state,
which would result in redundant information as the configuration
value is the same for all table states.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
1deeeff825 compaction: table_state: Add on_compaction_completion()
The idea is that we'll have a single on-completion interface for both
"in-strategy" and off-strategy compactions, so not to pollute table_state
with one interface for each.
replica::table::on_compaction_completion is being moved into private namespace.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
1520580212 compaction: table_state: Add make_sstable()
compaction_manager needs this interface when setting the sstable
creation lambda in compaction_descriptor, which is then forwarded
into the actual compaction procedure.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
cb05142d58 compaction: Move table::in_strategy_sstables() and switch to table_state
in_strategy_sstables() doesn't have to be implemented in table, as it's
simply about main set with maintenance and staging files filtered out.

Also, let's make it switch to table_state as part of ongoing work.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
23e21ed5bc compaction: table_state: Add maintenance sstable set
Needed for off-strategy compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-16 21:35:06 -03:00
Raphael S. Carvalho
f52ad722f3 compaction_manager: rename table_state's get_sstable_set to main_sstable_set
With compaction_manager switching to table_state, we'll need to
introduce a method in table_state to return maintenance set.
So better to have a descriptive name for main set.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-13 11:12:33 -03:00
Avi Kivity
e1e4a73793 Merge 'Always register fully expired sstables for compaction' from Benny Halevy
If the compaction_descriptor returned by `time_window_compaction_strategy::get_sstables_for_compaction`
is marked with `has_only_fully_expired::yes` it should always be compacted
since `time_window_compaction_strategy::get_sstables_for_compaction` is not idempotent.

It sets `_last_expired_check` and if compaction is postponed and retried before
`expired_sstable_check_frequency` has passed, it will not look for those fully-expired
sstables again. Plus, compacting them is the cheapest possible as it does not require
reading anything, just deleting the input sstables, so there's no reason not postpone it.

Also, extend `max_ongoing_compaction_test` to test serialization of compaction jobs with the same weight.

Fixes #10989

Closes #10990

* github.com:scylladb/scylla:
  compaction_manager: always register descriptor with fully expired sstables for compaction
  test: max_ongoing_compaction_test: test serialization of regular compaction with same weight
  test: max_ongoing_compaction_test: reindent refactored code
  test: max_ongoing_compaction_test: define compact_all_tables lambda
  test: max_ongoing_compaction_test: refactor make_table_with_single_fully_expired_sstable
  test: max_ongoing_compaction_test: reduce number of tables
2022-07-12 18:40:01 +03:00
Avi Kivity
ea4a907090 Merge 'mutation_compactor: remove emit only live rows parameter' from Botond Dénes
Said parameter is a convenience so downstream consumers of the
mutation compactors don't have to check the `bool is_live` already
passed to them. This convenience however causes a template parameter and
additional logic for the compactor. As the most prominent of these
consumers (the query result builder) will soon have to switch to
`emit_only_live_rows::no` for other reasons anyway (it will want to count
tombstones), we take the opportunity to switch everybody to ::no. This
can be done with very little additional complexity to these consumers --
basically an additional if or two. With everybody using the `::no` variant
of the compactor, we can remove this template parameter and the logic
associated with it altogether.

Closes #10931

* github.com:scylladb/scylla:
  multishard_mutation_query: remove now pointless compact_for_result_state typedef
  mutation_compactor: remove only-live related logic
  mutation_compactor: remove emit_only_live_rows template parameter
  mutation_compactor: remove unused compact_mutation_state::parameters
  querier: remove {data,mutation}_querier aliases
  querier: remove now pointless emit_only_live_rows template parameter
  tree: use emit_only_live_rows::no
  querier: querier_cache: de-override insert() methods
2022-07-12 17:30:46 +03:00
Benny Halevy
6332816ccf compaction_manager: always register descriptor with fully expired sstables for compaction
If the compaction_descriptor returned by time_window_compaction_strategy::get_sstables_for_compaction
is marked with has_only_fully_expired::yes it should always be compacted
since time_window_compaction_strategy::get_sstables_for_compaction is not idempotent.

It sets _last_expired_check and if compaction is postponed and retried before
expired_sstable_check_frequency has passed, it will not look for those fully-expired
sstables again. Plus, compacting them is the cheapest possible as it does not require
reading anything, just deleting the input sstables, so there's no reason not postpone it.

Fixes #10989

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-12 12:04:04 +03:00
Benny Halevy
cfc7a5065a test: max_ongoing_compaction_test: test serialization of regular compaction with same weight
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-12 12:04:03 +03:00
Benny Halevy
65a5e0a7bb test: max_ongoing_compaction_test: reindent refactored code
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-12 12:03:32 +03:00
Benny Halevy
5212e81475 test: max_ongoing_compaction_test: define compact_all_tables lambda
To test both expired and non-expired sstables scenarios
we need to pass this helper function the expected number
of sstables before compaction and after compaction.

When compaction a set of fully-expired sstables,
we expect none to remain, while when the set of sstables
is not fully expired, we'll expect 1 output sstable
after compaction.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-12 12:00:27 +03:00
Benny Halevy
fe4a59372e test: max_ongoing_compaction_test: refactor make_table_with_single_fully_expired_sstable
So we can use the lower-level build blocks to
test compaction serialization of both fully-expired
and non-fully-expired sstables scenarios in the following patches.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-12 11:56:41 +03:00
Benny Halevy
d18fc6a7ed test: max_ongoing_compaction_test: reduce number of tables
There is no need to test 100 tables.
10 tables are enough so make the test complete faster.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-12 11:53:01 +03:00
Botond Dénes
4d2ce5c304 mutation_compactor: remove emit_only_live_rows template parameter
Now that we use emit_only_live_rows::no everywhere we can remove this
template parameters. Only the template parameter is removed, the
internal logic around it is left in place (will be removed in a next
patch), by hard-wiring `only_live()`.
2022-07-12 08:43:49 +03:00
Botond Dénes
f912f5f373 querier: remove {data,mutation}_querier aliases
They now both mean the same thing: querier.
2022-07-12 08:41:51 +03:00
Botond Dénes
742dc10185 querier: querier_cache: de-override insert() methods
Soon, the currently two distinct types of queriers will be merged, as
the template parameter differentiating them will be gone. This will make
using type based overload for insert() impossible, as 2 out of the 3
types will be the same. Use different names instead.
2022-07-12 08:41:48 +03:00
Avi Kivity
53e0dc7530 bytes_ostream: base on managed_bytes
bytes_ostream is an incremental builder for a discontiguous byte container.
managed_bytes is a non-incremental (size must be known up front) byte
container, that is also compatible with LSA. So far, conversion between
them involves copying. This is unfortunate, since query_result is generated
as a bytes_ostream, but is later converted to managed_bytes (today, this
is done in cql3::expr::get_non_pk_values() and
compound_view_wrapper::explode(). If the two types could be made compatible,
we could use managed_bytes_view instead of creating new objects and avoid
a copy. It's also nicer to have one less vocabulary type.

This patch makes bytes_ostream use managed_bytes' internal representation
(blob_storage instead of bytes_ostream::chunk) and provides a conversion
to managed_bytes. All bytes_ostream users are left in place, but the goal
is to make bytes_ostream a write-only type with the only observer a conversion
to managed_bytes.

It turns out to be relatively simple. The internal representations were
already similar. I made blob_storage::ref_type self-initializing to
reduce churn (good practice anyway) and added a private constructor
to managed_bytes for the conversion.

Note that bytes_ostream can only be used to construct a non-LSA managed_bytes,
but LSA uses of managed_bytes are very strictly controlled (the entry
points to memtable and cache) so that's not a problem.

A unit test is added.

Closes #10986
2022-07-12 00:23:29 +03:00
Avi Kivity
34886ce1a1 Merge 'Allow regular compaction during major' from Benny Halevy
After acquiring the _compaction_state write lock,
select all sstables using get_candidates and register them
as compacting, then unlock the _compaction_state lock
to let regular compaction run in parallel.

Also, run major compaction in maintenance scheduling group.
We should separate the scheduling groups used for major compaction
from the the regular compaction scheduling group so that
the latter can be affected by the backlog tracker in case
backlog accumulates during a long running major compaction.

Fixes #10961

Closes #10984

* github.com:scylladb/scylla:
  compaction_manager: major_compaction_task: run in maintenance scheduling groupt
  compaction_manager: allow regular compaction to run in parallel to major
2022-07-11 17:11:51 +03:00
Avi Kivity
957bf48eb2 Merge 'Don't throw exceptions on the replica side when handling single partition reads and writes' from Piotr Dulikowski
This PR gets rid of exception throws/rethrows on the replica side for writes and single-partition reads. This goal is achieved without using `boost::outcome` but rather by replacing the parts of the code which throw with appropriate seastar idioms and by introducing two helper functions:

1.`try_catch` allows to inspect the type and value behind an `std::exception_ptr`. When libstdc++ is used, this function does not need to throw the exception and avoids the very costly unwind process. This based on the "How to catch an exception_ptr without even try-ing" proposal mentioned in https://github.com/scylladb/scylla/issues/10260.

This function allows to replace the current `try..catch` chains which inspect the exception type and account it in the metrics.

Example:

```c++
// Before
try {
    std::rethrow_exception(eptr);
} catch (std::runtime_exception& ex) {
    // 1
} catch (...) {
    // 2
}

// After
if (auto* ex = try_catch<std::runtime_exception>(eptr)) {
    // 1
} else {
    // 2
}
```

2. `make_nested_exception_ptr` which is meant to be a replacement for `std::throw_with_nested`. Unlike the original function, it does not require an exception being currently thrown and does not throw itself - instead, it takes the nested exception as an `std::exception_ptr` and produces another `std::exception_ptr` itself.

Apart from the above, seastar idioms such as `make_exception_future`, `co_await as_future`, `co_return coroutine::exception()` are used to propagate exceptions without throwing. This brings the number of exception throws to zero for single partition reads and writes (tested with scylla-bench, --mode=read and --mode=write).

Results from `perf_simple_query`:

```
Before (719724e4df):
  Writes:
    Normal:
      127841.40 tps ( 56.2 allocs/op,  13.2 tasks/op,   50042 insns/op,        0 errors)
    Timeouts:
      94770.81 tps ( 53.1 allocs/op,   5.1 tasks/op,   78678 insns/op,  1000000 errors)
  Reads:
    Normal:
      138902.31 tps ( 65.1 allocs/op,  12.1 tasks/op,   43106 insns/op,        0 errors)
    Timeouts:
      62447.01 tps ( 49.7 allocs/op,  12.1 tasks/op,  135984 insns/op,   936846 errors)

After (d8ac4c02bfb7786dc9ed30d2db3b99df09bf448f):
  Writes:
    Normal:
      127359.12 tps ( 56.2 allocs/op,  13.2 tasks/op,   49782 insns/op,        0 errors)
    Timeouts:
      163068.38 tps ( 52.1 allocs/op,   5.1 tasks/op,   40615 insns/op,  1000000 errors)
  Reads:
    Normal:
      151221.15 tps ( 65.1 allocs/op,  12.1 tasks/op,   43028 insns/op,        0 errors)
    Timeouts:
      192094.11 tps ( 41.2 allocs/op,  12.1 tasks/op,   33403 insns/op,   960604 errors)
```

Closes #10368

* github.com:scylladb/scylla:
  database: avoid rethrows when handling exceptions from commitlog
  database: convert throw_commitlog_add_error to use make_nested_exception_ptr
  utils: add make_nested_exception_ptr
  storage_proxy: don't rethrow when inspecting replica exceptions on write path
  database: don't rethrow rate_limit_exception
  storage_proxy: don't rethrow the exception in abstract_read_resolver::error
  utils/exceptions.cc: don't rethrow in is_timeout_exception
  utils/exceptions: add try_catch
  utils: add abi/eh_ia64.hh
  storage_proxy: don't rethrow exceptions from replicas when accounting read stats
  message: get rid of throws in send_message{,_timeout,_abortable}
  database/{query,query_mutations}: don't rethrow read semaphore exceptions
2022-07-11 14:01:41 +03:00
Nadav Har'El
cc69177dcc config: fix printing of experimental feature list
Recently we noticed a regression where with certain versions of the fmt
library,

   SELECT value FROM system.config WHERE name = 'experimental_features'

returns string numbers, like "5", instead of feature names like "raft".

It turns out that the fmt library keep changing their overload resolution
order when there are several ways to print something. For enum_option<T> we
happen to have to conflicting ways to print it:
  1. We have an explicit operator<<.
  2. We have an *implicit* convertor to the type held by T.

We were hoping that the operator<< always wins. But in fmt 8.1, there is
special logic that if the type is convertable to an int, this is used
before operator<<()! For experimental_features_t, the type held in it was
an old-style enum, so it is indeed convertible to int.

The solution I used in this patch is to replace the old-style enum
in experimental_features_t by the newer and more recommended "enum class",
which does not have an implicit conversion to int.

I could have fixed it in other ways, but it wouldn't have been much
prettier. For example, dropping the implicit convertor would require
us to change a bunch of switch() statements over enum_option (and
not just experimental_features_t, but other types of enum_option).

Going forward, all uses of enum_option should use "enum class", not
"enum". tri_mode_restriction_t was already using an enum class, and
now so does experimental_features_t. I changed the examples in the
comments to also use "enum class" instead of enum.

This patch also adds to the existing experimental_features test a
check that the feature names are words that are not numbers.

Fixes #11003.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11004
2022-07-11 09:17:30 +02:00
Benny Halevy
e3f561db31 compaction_manager: major_compaction_task: run in maintenance scheduling groupt
We should separate the scheduling groups used for major compaction
from the the regular compaction scheduling group so that
the latter can be affected by the backlog tracker in case
backlog accumulates during a long running major compaction.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-06 18:18:45 +03:00
Avi Kivity
419fe65259 Revert "Merge 'Block flush until compaction finishes if sstables accumulate' from Mikołaj Sielużycki"
This reverts commit aa8f135f64, reversing
changes made to 9a88bc260c. The patch
causes hangs during flush.

Also reverts parts of 411231da75 that impacted the unit test.

Fixes #10897.
2022-07-06 12:19:02 +03:00
Piotr Dulikowski
1b8aacfee1 utils: add make_nested_exception_ptr
The utils::make_nested_exception_ptr function works similar to
std::throw_with_nested, but instead of storing the currently thrown
exception as the nested exception and then immediately throwing the new
exception, it receives the nested exception as an std::exception_ptr and
also returns an std::exception_ptr.

If the standard library supports it, the function does not perform any
throws. Otherwise the fallback logic performs two throws.
2022-07-05 16:41:09 +02:00
Piotr Dulikowski
18f43fa00e utils/exceptions: add try_catch
Introduces a utility function which allows obtaining a pointer to the
exception data held behind an std::exception_ptr if the data matches the
requested type. It can be used to implement manual but concise
try..catch chains.

The `try_catch` has the best performance when used with libstdc++ as it
uses the stdlib specific functions for simulating a try..catch without
having to actually throw. For other stdlibs, the implementation falls
back to a throw surrounded by an actual try..catch.
2022-07-05 16:41:09 +02:00
Tomasz Grabiec
a6aef60b93 memtable: Fix missing range tombstones during reads under ceratin rare conditions
There is a bug introduced in e74c3c8 (4.6.0) which makes memtable
reader skip one a range tombstone for a certain pattern of deletions
and under certain sequence of events.

_rt_stream contains the result of deoverlapping range tombstones which
had the same position, which were sipped from all the versions. The
result of deoverlapping may produce a range tombstone which starts
later, at the same position as a more recent tombstone which has not
been sipped from the partition version yet. If we consume the old
range tombstone from _rt_stream and then refresh the iterators, the
refresh will skip over the newer tombstone.

The fix is to drop the logic which drains _rt_stream so that
_rt_stream is always merged with partition versions.

For the problem to trigger, there have to be multiple MVCC versions
(at least 2) which contain deletions of the following form:

[a, c] @ t0
[a, b) @ t1, [b, d] @ t2

c > b

The proper sequence for such versions is (assuming d > c):

[a, b) @ t1,
[b, d] @ t2

Due to the bug, the reader will produce:

[a, b) @ t1,
[b, c] @ t0

The reader also needs to be preempted right before processing [b, d] @
t2 and iterators need to get invalidated so that
lsa_partition_reader::do_refresh_state() is called and it skips over
[b, d] @ t2. Otherwise, the reader will emit [b, d] @ t2 later. If it
does emit the proper range tombstone, it's possible that it will violate
fragment order in the stream if _rt_stream accumulated remainders
(possible with 3 MVCC versions).

The problem goes away once MVCC versions merge.

Fixes #10913
Fixes #10830

Closes #10914
2022-06-29 19:02:23 +03:00