Commit Graph

1565 Commits

Author SHA1 Message Date
Michał Chojnowski
b114551d53 configure: don't reduce parsers' optimization level to 1 in release
The line modified in this patch was supposed to increase the
optimization levels of parsers in debug mode to 1, because they
were too slow otherwise. But as a side effect, it also reduced the
optimization level in release mode to 1. This is not a problem
for the CQL frontend, because statement preparation is not
performance-sensitive, but it is a serious performance problem
for Alternator, where it lies in the hot path.

Fix this by only applying the -O1 to debug modes.
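
A minimal sketch of the intended behavior, in Python. The mode names and the `parser_opt_flags` helper are hypothetical and are not taken from configure.py:

```python
# Hypothetical sketch: which modes count as "debug modes" is an assumption.
DEBUG_LIKE_MODES = {"debug"}

def parser_opt_flags(mode, mode_flags):
    """Return the compiler flags used for generated parser sources in `mode`."""
    if mode in DEBUG_LIKE_MODES:
        # Parsers are too slow without optimization, so append -O1 in debug
        # builds only (with GCC and Clang the last -O option wins).
        return list(mode_flags) + ["-O1"]
    # Other modes keep the optimization level the mode already specifies.
    return list(mode_flags)
```

With this shape, a release build keeps its own optimization flags and only debug builds get the extra -O1.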

Fixes #12463

Closes #12460

(cherry picked from commit 08b3a9c786)
2023-01-08 01:34:56 +02:00
Benny Halevy
cf6bcffc1b configure: add --perf-tests-debuginfo option
Provides separate control over debuginfo for perf tests,
since enabling --tests-debuginfo currently affects both,
causing the Jenkins archives of perf test binaries to
inflate considerably.

Refs https://github.com/scylladb/scylla-pkg/issues/3060

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 48021f3ceb)

Fixes #12191
2022-12-04 17:19:59 +02:00
Avi Kivity
268e4abe77 Merge 'wasm: reuse instances for wasm UDFs' from Wojciech Mitros
Calling WebAssembly UDFs requires a wasmtime instance. Creating such an instance is expensive,
but these instances can be reused for subsequent calls of the same UDF on various inputs.

This patch introduces a way of reusing wasmtime instances: a wasm instance cache.
The cache stores a wasmtime instance for each UDF and scheduling group. The instances are
evicted using an LRU strategy and their size is based on the size of their wasm memories.
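
The caching idea can be sketched in Python as follows. This is an illustration only: the class and method names are hypothetical, it is not the code in `lang/wasm_instance_cache.hh`, and it leaves out the timeout-based eviction mentioned below.

```python
from collections import OrderedDict

class InstanceCache:
    """LRU cache keyed by (udf_name, scheduling_group), bounded by total size."""

    def __init__(self, max_total_size):
        self.max_total_size = max_total_size
        self.total_size = 0
        self._entries = OrderedDict()  # key -> (instance, size), oldest first

    def get(self, udf_name, group, create):
        key = (udf_name, group)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key][0]
        instance, size = create()  # the expensive step, done only on a miss
        self._entries[key] = (instance, size)
        self.total_size += size
        self._evict()
        return instance

    def _evict(self):
        # Drop least recently used entries until the size budget is met,
        # always keeping the entry that was just inserted.
        while self.total_size > self.max_total_size and len(self._entries) > 1:
            _, (_, size) = self._entries.popitem(last=False)
            self.total_size -= size

    def drop_udf(self, udf_name):
        # Dropping a UDF removes its instances for every scheduling group.
        for key in [k for k in self._entries if k[0] == udf_name]:
            _, size = self._entries.pop(key)
            self.total_size -= size
```

Here `create` is an assumed callback returning an `(instance, size)` pair, where `size` stands in for the size of the instance's wasm memory.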

The instances stored in the cache are also dropped when the UDF itself is dropped. For that reason,
the first patch modifies the current implementation of UDF dropping, so that instance dropping can be added
later. The patch also removes the need to compile the UDF again when dropping it.

The second patch contains the implementation and use of the new cache. The cache is implemented
in `lang/wasm_instance_cache.hh` and the main ways of using it are the `run_script` methods from `wasm.hh`

The third patch adds tests to `test_wasm.py` that check the correctness and performance of the new
cache. The tests confirm the instance reuse, size limits, instance eviction after timeout and after dropping the UDF.

Closes #10306

* github.com:scylladb/scylladb:
  wasm: test instances reuse
  wasm: reuse UDF instances
  schema_tables: simplify merge_functions and avoid extra compilation
2022-08-02 13:51:16 +03:00
Avi Kivity
2c0932cc41 Merge 'Reduce the amount of per-table metrics' from Amnon Heiman
This series is the first step in the effort to reduce the number of metrics reported by Scylla.
The series focuses on the per-table metrics.

The combination of histograms, per-table reporting, and per-shard reporting makes the number of metrics in a cluster explode.
This series uses several techniques to reduce the number of metrics:
1. Multiple metrics should only be reported for user tables, but the condition that checked for this was not updated when more non-user keyspaces were added.
2. Instead of a histogram per table, per shard, it will report a summary per table, per shard, and a single histogram per node.
3. Histograms, summaries, and counters will be reported only if they are used (for example, the cas-related metrics will not be reported for tables that are not using cas).

Closes #11058

* github.com:scylladb/scylla:
  Add summary_test
  database: Reduce the number of per-table metrics
  replica/table.cc: Do not register per-table metrics for system
  histogram_metrics_helper.hh: Add to_metrics_summary function
  Unified histogram, estimated_histogram, rates, and summaries
  Split the timed_rate_moving_average into data and timer
  utils/histogram.hh: should_sample should use a bitmask
  estimated_histogram: add missing getter method
2022-07-27 22:01:08 +03:00
Amnon Heiman
3658aa9ec2 Add summary_test
This patch adds unit tests for the summary implementation.
2022-07-27 16:58:52 +03:00
Amnon Heiman
9a3e70adfb histogram_metrics_helper.hh: Add to_metrics_summary function
to_metrics_summary is a helper function that creates a metrics-type
summary from a timed_rate_moving_average_with_summary object.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2022-07-27 16:58:52 +03:00
Botond Dénes
81e20ceaab Merge 'logalloc, dirty_memory_manager: move region_groups to dirty_memory_manager' from Avi Kivity
logalloc manages regions of log-structured allocated memory, and region_groups
containing such regions and other region_groups. region_groups were introduced
for accounting purposes - first to limit the amount of memory in memtables, then to
match new dirty memory allocation rate with memtable flushing rate so we never
hit a situation where allocation rate exceeded flush rate, and we exceed our limit.

The problem is that the abstraction is very weak - if we want to change anything
in memtable flush control we'll need to change region_groups too - and also
expensive to maintain.

The solution is to break the abstraction and move region_groups to memtable
dirty memory management code. Instead introduce a new, simpler abstraction,
the region_listener, which communicates changes in region memory consumption
to an external piece of code, which can then choose to do with it what it likes.

The long term plan is to completely remove region_groups and fold them into dirty_memory_manager:
 - make each memtable a region_listener so it gets called back after size changes
 - make memtables inform their dirty_memory_manager about the size so dirty_memory_manager can decide whether to throttle writes and which memtable to pick to flush

Closes #10839

* github.com:scylladb/scylla:
  logalloc: drop region_impl public accessors
  logalloc, dirty_memory_manager: move size-tracking binomial heap out of logalloc
  logalloc: relax lifetime rules around region_listener
  logalloc, dirty_memory_manager: move region_group and associated code
  logalloc: expose tracker_reclaimer_lock
  logalloc: reimplement tracker_reclaim_lock to avoid using hidden classes
  logalloc: reduce friendship between region and region_group
  logalloc: decouple region_group from region
  memtable: stop using logalloc::region::group() to test for flushed memtables
2022-07-26 17:08:37 +03:00
Nadav Har'El
cb8a67dc98 Merge 'Allow materialized views to be synchronous' from Piotr Sarna
This pull request introduces a "synchronous mode" for global views. In this mode, all view updates are applied synchronously as if the view was local.

Marking a view as synchronous can be done using `CREATE MATERIALIZED VIEW` and `ALTER MATERIALIZED VIEW`. E.g.:
```cql
ALTER MATERIALIZED VIEW ks.v WITH synchronous_updates = true;
```

Marking a view as synchronous is implemented using tags (originally used by alternator). No big modifications to the view code were needed.

Fixes: https://github.com/scylladb/scylla/issues/10545

Closes #11013

* github.com:scylladb/scylla:
  cql-pytest: extend synchronous mv test with new cases
  cql-pytest: allow extra parameters in new_materialized_view
  docs: add a paragraph on view synchronous updates
  test/boost/cql_query_test: add test setting synchronous updates property
  test: cql-pytest: add a test for synchronous mode materialized views
  db: view: react to synchronous updates tag
  cql3: statements: cf_prop_defs: apply synchronous updates tag
  alternator, db: move the tag code to db/tags
  cql3: statements: add a synchronous_updates property
2022-07-26 15:42:51 +03:00
Avi Kivity
fbe8ea7727 logalloc, dirty_memory_manager: move region_group and associated code
region_group is an abstraction that allows accounting for groups of
regions, but the cost/benefit ratio of maintaining the abstraction
is poor. Each time we need to change the decision algorithm of memtable
flushing (admittedly rarely), we need to distill that into an abstraction
for region_groups and then use it. An example is virtual region groups;
we wanted to account for the partially flushed memtables and had to
invent region groups to stand in their place.

Rather than continuing to invest in the abstraction, break it now
and move it to the memtable dirty memory manager which is responsible
for making those decisions. The relevant code is moved to
dirty_memory_manager.hh and dirty_memory_manager.cc (new file), and
a new unit test file is added as well.

A downside of the change is that unit testing will be more difficult.
2022-07-26 11:12:10 +03:00
Yaron Kaikov
c42c5111eb SCYLLA-VERSION-GEN: use semver-compatible version
Setting Scylla to use semantic versioning. (Ref: https://semver.org/)

Closes: https://github.com/scylladb/scylla/issues/9543

Closes #10957
2022-07-25 18:06:28 +03:00
Michał Sala
041cb77ad0 alternator, db: move the tag code to db/tags
Tags are a useful mechanism that could be used outside of alternator
namespace. My motivation to move tags_extension and other utilities to
db/tags/ was that I wanted to use them to mark "synchronous mode" views.

I have extracted `get_tags_of_table`, `find_tag` and `update_tags`
method to db/tags/utils.cc and moved alternator/tags_extension.hh to
db/tags/.

The signature of `get_tags_of_table` was changed from `const
std::map<sstring, sstring>&` to `const std::map<sstring, sstring>*`
Original behavior of this function was to throw an
`alternator::api_error` exception. This was undesirable, as it
introduced a dependency on the alternator module. I chose to change it
to return a potentially null value, and added a wrapper function to the
alternator module - `get_tags_of_table_or_throw` to keep the previous
throwing behavior.
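
The split between a generic lookup and a throwing wrapper can be pictured with this Python sketch. The names mirror the ones above, but representing the schema as a plain dict and the `ApiError` class are assumptions made for illustration; the real code is C++.

```python
from typing import Dict, Optional

class ApiError(Exception):
    """Stand-in for alternator::api_error (hypothetical in this sketch)."""

def get_tags_of_table(schema: dict) -> Optional[Dict[str, str]]:
    # Generic layer: report "no tags" as None instead of raising an error
    # type that belongs to one particular module.
    return schema.get("tags")

def get_tags_of_table_or_throw(schema: dict) -> Dict[str, str]:
    # Alternator-side wrapper that keeps the previous throwing behavior.
    tags = get_tags_of_table(schema)
    if tags is None:
        raise ApiError("table has no tags")
    return tags
```

Callers outside alternator check for `None`, while alternator code keeps using the throwing variant.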
2022-07-25 09:53:33 +02:00
Wojciech Mitros
9281ba3919 wasm: reuse UDF instances
When executing a wasm UDF, most of the time is spent on
setting up the instance. To minimize its cost, we reuse
the instance using wasm::instance_cache.

This patch adds a wasm instance cache that stores
a wasmtime instance for each UDF and scheduling group.
The instances are evicted using an LRU strategy. The
cache may store some entries for the UDF after evicting
the instance, but they are evicted when the corresponding
UDF is dropped, which greatly limits their number.

The size of stored instances is estimated using the size
of their WASM memories. In order to be able to read the
size of memory, we require that the memory is exported
by the client.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2022-07-20 18:19:22 +02:00
Takuya ASADA
752be6536a rename relocatable packages
Currently, we use the following naming convention for relocatable package
filenames:
  ${package_name}-${arch}-package-${version}.${release}.tar.gz
But this is very different from standard Linux packaging systems such as
.rpm and .deb.
Let's align the convention with the .rpm style, so the new convention is:
  ${package_name}-${version}-${release}.${arch}.tar.gz
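
For illustration, the two conventions can be written as Python format functions. The function names are hypothetical and the real renaming is done in the build_reloc.sh scripts:

```python
def old_reloc_name(package_name, arch, version, release):
    # Old convention: ${package_name}-${arch}-package-${version}.${release}.tar.gz
    return f"{package_name}-{arch}-package-{version}.{release}.tar.gz"

def new_reloc_name(package_name, arch, version, release):
    # New, rpm-style convention: ${package_name}-${version}-${release}.${arch}.tar.gz
    return f"{package_name}-{version}-{release}.{arch}.tar.gz"
```

With made-up values such as package "scylla", arch "x86_64", version "5.1.0" and release "1", the new function yields scylla-5.1.0-1.x86_64.tar.gz.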

Closes #9799

Closes #10891

* tools/java de8289690e...d0143b447c (1):
  > build_reloc.sh: rename relocatable packages

* tools/jmx fe351e8...06f2735 (1):
  > build_reloc.sh: rename relocatable packages

* tools/python3 e48dcc2...bf6e892 (1):
  > reloc/build_reloc.sh: rename relocatable packages
2022-07-19 15:46:49 +03:00
Takuya ASADA
23973f9591 Support installing pip provided command symlinks to /usr/bin
This is part of the support for installing executables from PIP packages.
Executables from a PIP package can now be installed, but they are placed
under /opt/scylladb/python3/bin.
To call these commands without specifying the full path, we also need to
install symlinks in /usr/bin.
To do this, we need a new list that specifies the command names to symlink.

Closes #10748
2022-07-12 17:26:05 +03:00
Nadav Har'El
f5ff687b64 Merge 'cql3: Reorganize expr::to_restriction' from Jan Ciołek
This PR introduces improvements to `expr::to_restriction` and prepares the validation part for restriction classes removal.

`expr::to_restriction` is currently used to take a restriction from the WHERE clause, prepare it, perform some validation checks and finally convert it to an instance of the restriction class.

Soon we will get rid of the restriction class.

In preparation for that `expr::to_restriction` is split into two independent parts:
* The part that prepares and validates a binary_operator
* The part that converts a binary_operator to restriction

Thanks to this split, getting rid of the restriction class will be painless: we will just stop using the second part.

`to_restriction.cc` is replaced by `restrictions.hh/cc`. In the future we can put all the restriction expressions code there to avoid clutter in `expression.hh/cc`.

This change made it much easier to fix #10631, so I did that as well.

Fixes: #10631

Closes #10979

* github.com:scylladb/scylla:
  cql-pytest: Test that IS NOT only accepts NULL
  cql-pytest: Enable testInvalidCollectionNonEQRelation
  cql3: Move single element IN restrictions handling
  cql3: Check for disallowed operators early
  cql3: Simplify adding restrictions
  cql3: Reorganize to_restriction code
  cql3: Fix IS NOT NULL check in to_restriction
  cql3: Swap order of arguments in error message
2022-07-12 00:26:34 +03:00
Jan Ciolek
debd7399fd cql3: Reorganize to_restriction code
expr::to_restriction is currently used to
take a restriction from the WHERE clause,
prepare it, perform some validation checks
and finally convert it to an instance of
the restriction class.

Soon we will get rid of the restriction class.

In preparation for that expr::to_restriction
is split into two independent parts:
* The part that prepares and validates a binary_operator
* The part that converts a binary_operator to restriction

Thanks to this split, getting rid of the restriction class
will be painless: we will just stop using the
second part.

This commit splits expr::to_restriction into two functions:
* validate_and_prepare_new_restriction
* convert_to_restriction
that handle each of those parts.

All helper validation methods in the anonymous namespace
are copied from the to_restriction.cc file.

to_restriction.cc isn't the best filename for the new functionality,
so it has been renamed to restrictions.hh/cc.
In the future all the code regarding restrictions could be
put there to reduce clutter in expression.hh/cc

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-07-11 15:47:16 +02:00
Piotr Dulikowski
18f43fa00e utils/exceptions: add try_catch
Introduces a utility function which allows obtaining a pointer to the
exception data held behind an std::exception_ptr if the data matches the
requested type. It can be used to implement manual but concise
try..catch chains.

The `try_catch` has the best performance when used with libstdc++ as it
uses the stdlib specific functions for simulating a try..catch without
having to actually throw. For other stdlibs, the implementation falls
back to a throw surrounded by an actual try..catch.
2022-07-05 16:41:09 +02:00
Pavel Emelyanov
3a753068be Merge "Make permissions cache live updateable and add an API for resetting authorization cache" from Igor Ribeiro Barbosa Duarte
Currently, for users who have permissions_cache configs set to very high
values (and thus can't wait for the configured times to pass) having to restart
the service every time they make a change related to permissions or
prepared_statements cache (e.g. Adding a user and changing their permissions)
can become pretty annoying.
This patch series makes permissions_validity_in_ms, permissions_update_interval_in_ms
and permissions_cache_max_entries live updateable so that restarting the
service is not necessary anymore for these cases.
It also adds an API for flushing the cache to make it easier for users who
don't want to modify their permissions_cache config.

branch: https://github.com/igorribeiroduarte/scylla/tree/make_permissions_cache_live_updateable
CI: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1005/
dtests: https://github.com/igorribeiroduarte/scylla-dtest/tree/test_permissions_cache

* https://github.com/igorribeiroduarte/scylla/make_permissions_cache_live_updateable:
  loading_cache_test: Test loading_cache::reset and loading_cache::update_config
  api: Add API for resetting authorization cache
  authorization_cache: Make permissions cache and authorized prepared statements cache live updateable
  auth_prep_statements_cache: Make aut_prep_statements_cache accept a config struct
  utils/loading_cache.hh: Add update_config method
  utils/loading_cache.hh: Rename permissions_cache_config to loading_cache_config and move it to loading_cache.hh
  utils/loading_cache.hh: Add reset method
2022-06-29 11:14:13 +03:00
Igor Ribeiro Barbosa Duarte
a23c3d6338 api: Add API for resetting authorization cache
For cases where we have very high values set to permissions_cache validity and
update interval (E.g.: 1 day), whenever a change to permissions is made it's
necessary to update scylla config and decrease these values, since waiting for
all this time to pass wouldn't be viable.
This patch adds an API for resetting the authorization cache so that changing
the config won't be mandatory for these cases.

Usage:
    $ curl -X POST http://localhost:10000/authorization_cache/reset

Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
2022-06-28 19:58:06 -03:00
Avi Kivity
3131cbea62 Merge 'query: allow replica to provide arbitrary continue position' from Botond Dénes
Currently, we use the last row in the query result set as the position from which the query continues on the next page. Since only live rows make it into the query result set, this requires the query to stop on a live row on the replica; otherwise any dead rows or tombstones processed after the last live row would have to be re-processed on the next page (and the saved reader would have to be thrown away due to position mismatch). This requirement is problematic for datasets which have lots of dead rows or tombstones, especially if these form a prefix. In the extreme case, a query can time out before it can process a single live row and the dataset becomes effectively unreadable until compaction gets rid of the tombstones.
This series prepares the way for the solution: it allows the replica to determine what position the query should continue from on the next page. This position can be that of a dead row, if the query stopped on a dead row. For now, the replica supplies the same position that would have been obtained by looking at the last row in the result set; this series merely introduces the infrastructure for transferring a position together with the query result, and it prepares the paging logic to make use of this position. If the coordinator is not prepared for the new field, it will simply fall back to the old way of looking at the last row in the result set. As I said, for now this is still the same as the content of the new field, so there is no problem in mixed clusters.
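
One way to picture the fallback is the following Python sketch. The dict-based result and the `next_page_start` helper are hypothetical; only the `last_position` field name comes from this series:

```python
def next_page_start(result):
    """Pick the position the next page should continue from.

    `result` is assumed to be a dict with a "rows" list and an optional
    "last_position" entry supplied by the replica.
    """
    last_position = result.get("last_position")
    if last_position is not None:
        # New path: trust the position provided together with the result.
        return last_position
    # Old path: derive the position from the last (live) row, if any.
    rows = result["rows"]
    return rows[-1]["position"] if rows else None
```

As long as the replica reports the position of the last row, both paths give the same answer, which is why mixed clusters are not a problem for now.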

Refs: https://github.com/scylladb/scylla/issues/3672
Refs: https://github.com/scylladb/scylla/issues/7689
Refs: https://github.com/scylladb/scylla/issues/7933

Tests: manual upgrade test.
I wrote a data set with:
```
./scylla-bench -mode=write -workload=sequential -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -clustering-row-size=8096 -partition-count=1000
```
This creates large, 80MB partitions, which should fill many pages if read in full. Then I started a read workload:
```
./scylla-bench -mode=read -workload=uniform -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -duration=10m -rows-per-request=9000 -page-size=100
```
I confirmed that paging is happening as expected, then upgraded the nodes one-by-one to this PR (while the read-load was ongoing). I observed no read errors or any other errors in the logs.

Closes #10829

* github.com:scylladb/scylla:
  query: have replica provide the last position
  idl/query: add last_position to query_result
  mutlishard_mutation_query: propagate compaction state to result builder
  multishard_mutation_query: defer creating result builder until needed
  querier: use full_position instead of ad-hoc struct
  querier: rely on compactor for position tracking
  mutation_compactor: add current_full_position() convenience accessor
  mutation_compactor: s/_last_clustering_pos/_last_pos/
  mutation_compactor: add state accessor to compact_mutation
  introduce full_position
  idl: move position_in_partition into own header
  service/paging: use position_in_partition instead of clustering_key for last row
  alternator/serialization: extract value object parsing logic
  service/pagers/query_pagers.cc: fix indentation
  position_in_partition: add to_string(partition_region) and parse_partition_region()
  mutation_fragment.hh: move operator<<(partition_region) to position_in_partition.hh
2022-06-27 12:23:21 +03:00
Botond Dénes
119be5d5db idl: move position_in_partition into own header
So it can be used without pulling in all of partition_checksum.idl.hh.
2022-06-23 13:36:24 +03:00
Piotr Dulikowski
bc50163016 tests: add per_partition_rate_limit_test
Adds the per_partition_rate_limit_test.cc file. Currently, it only
contains a test which verifies that the feature correctly switches off
rate limiting for internal queries (!allow_limit || internal sg).
2022-06-22 20:16:49 +02:00
Piotr Dulikowski
dccb8a5729 schema: add per_partition_rate_limit schema extension
Adds the new `per_partition_rate_limit` schema extension. It has two
parameters: `max_writes_per_second` and `max_reads_per_second`.
In future commits they will control how many operations of a given
type are allowed for each partition in the given table.
2022-06-22 20:16:48 +02:00
Piotr Dulikowski
0fe8b55427 db: add rate_limiter
Introduces the rate_limiter, a replica-side data structure meant for
tracking the frequency with which each partition is being accessed
(separately for reads and writes) and deciding whether the request
should be accepted and processed further or rejected.

The limiter is implemented as a statically allocated hashmap which keeps
track of the frequency with which partitions are accessed. Its entries
are incremented when an operation is admitted and are decayed
exponentially over time.

If a partition is detected to be accessed more than its limit allows,
requests are rejected with a probability calculated in such a way that,
on average, the number of accepted requests is kept at the limit.

The structure currently weighs a bit above 1MB and each shard is meant
to keep a separate instance. All operations are O(1), including the
periodic timer.
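
A rough Python sketch of the idea, under stated assumptions: it uses a plain dict instead of a statically allocated hashmap, the acceptance probability `limit / (count + 1)` is an illustrative choice and not the formula from this patch, and all names are hypothetical.

```python
import random

class RateLimiter:
    """Per-key access counters that decay exponentially over time."""

    def __init__(self, decay=0.5, rng=random.random):
        self.decay = decay   # factor applied to every counter per timer tick
        self.epoch = 0       # number of timer ticks so far
        self.entries = {}    # key -> (count, epoch of last update)
        self.rng = rng       # injectable so tests can be deterministic

    def tick(self):
        # O(1) periodic timer: counters are decayed lazily on next access.
        self.epoch += 1

    def _current(self, key):
        count, seen = self.entries.get(key, (0.0, self.epoch))
        return count * self.decay ** (self.epoch - seen)

    def admit(self, key, limit):
        count = self._current(key)
        # Over the limit: reject probabilistically (illustrative formula).
        if count >= limit and self.rng() >= limit / (count + 1):
            return False
        # The counter is incremented only when the operation is admitted.
        self.entries[key] = (count + 1, self.epoch)
        return True
```

Decaying lazily on access keeps both `admit` and `tick` constant-time, which is the property the description above asks for.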
2022-06-22 20:16:48 +02:00
Piotr Dulikowski
621b7f35e2 replica: add rate_limit_exception and a simple serialization framework
Introduces `replica::rate_limit_exception` - an exception that is
supposed to be thrown/returned on the replica side when the request is
rejected due to exceeding the per-partition rate limit.

Additionally, introduces the `exception_variant` type which allows
transporting the new exception over RPC while preserving the type
information. This will be useful in later commits, as the coordinator
will have to know whether a replica has failed due to rate limit being
exceeded or another kind of error.

The `exception_variant` currently can only either hold "other exception"
(std::monostate) or the aforementioned `rate_limit_exception`, but can
be extended in a backwards-compatible way in the future to be able to
hold more exceptions that need to be handled in a different way.
2022-06-22 20:07:58 +02:00
Michael Livshin
43f2c55c5d configure.py: speed up and simplify compdb generation
The most time-consuming part is invoking "ninja -t compdb", and there
is no need to repeat that for every mode.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>

Closes #10733
2022-06-15 16:40:52 +03:00
Avi Kivity
8f690fdd47 Update seastar submodule
* seastar 1424d34c93...443e6a9b77 (5):
  > reactor: re-raise fatal signals
Ref #9242
  > test: initialize _earliest_started and _latest_finished
  > reactor: add io_uring backend
  > semaphore: add semaphore_unit operator bool
  > Merge 'map reduce: save mapper' from Benny Halevy

io_uring is disabled since the frozen toolchain's liburing is too old.

Closes #10794
2022-06-15 08:36:08 +03:00
Nadav Har'El
84e1fa0513 configure.py: make build.ninja the same every time
In several places, configure.py uses unsorted sets which results in
its output being in different order every time - both a different
order of targets, and a different order in dependencies of each
target.

This is both strange, and annoying when trying to debug configure.py
and trying to understand when, if at all, its output changes.

So in this patch, we use "sorted(...)" in the right places that
are needed to guarantee a fixed order. This fixed order is alphabetical,
but that's not the goal of this patch - the goal is to ensure a fixed
order.
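
A small Python sketch of the technique. The `generate` helper and its input shape are hypothetical and far simpler than configure.py, but the point is the same: iterate sets and dicts through `sorted(...)` before emitting output.

```python
def generate(targets):
    """Render ninja-style build lines from a {target: set_of_deps} mapping."""
    lines = []
    # Sort the targets so they are always emitted in the same order.
    for target in sorted(targets):
        # Sets have no stable iteration order, so sort the dependencies too.
        deps = " ".join(sorted(targets[target]))
        lines.append(f"build {target}: phony {deps}")
    return "\n".join(lines) + "\n"
```

Running this twice on the same input gives byte-identical output, regardless of how the sets happen to be ordered internally.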

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-05-30 16:20:37 +03:00
Nadav Har'El
8db9e62de9 configure.py: don't delete build.ninja when rebuild is interrupted
In commit 9cc9facbea, I fixed issue #4706.
That issue is about what happens when interrupting a rebuild of build.ninja
(which happens automatically when you run "ninja" after configure.py
changed). We don't want to leave behind a half-built build.ninja,
or leave it deleted.

The solution in that commit was for configure.py to build a temporary file
(build.ninja.tmp), and only as the very last step rename it build.ninja.

Unfortunately, since that time, we added more last steps after what
used to be that very last step :-(

If this new code running after the rename takes a noticeable amount of
time, and if the user is unlucky enough to interrupt it during that
time, ninja will see a modified output file (build.ninja) and a failed
rule, and will delete the output file!

The solution is to move the rename out of configure.py. Instead, we
add a "--out=filename" option to configure.py which allows it to write
directly to a different file name, not build.ninja. When rebuilding
build.ninja, the rule will now call configure.py with "--out=build.ninja.new"
and then rename it back to build.ninja. Any failure or interrupt at any
stage of configure.py will leave build.ninja untouched, so ninja will
not delete it - it will just delete the temporary build.ninja.new.
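
The write-then-rename pattern can be sketched in Python as below. In the patch the rename is performed by the ninja rule rather than by Python code; `write_build_file` is a hypothetical stand-in for running configure.py with `--out=<path>`.

```python
import os

def write_build_file(out_path, contents):
    # Stand-in for "configure.py --out=<out_path>": writes only to out_path.
    with open(out_path, "w") as f:
        f.write(contents)

def regenerate(build_file, contents):
    tmp = build_file + ".new"
    write_build_file(tmp, contents)
    # If anything above fails or is interrupted, build_file is untouched.
    # The rename is the last step and is atomic on POSIX filesystems.
    os.replace(tmp, build_file)
```

Only the temporary file is ever in a half-written state, so an interrupt can at worst leave a stale build.ninja.new behind.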

Fixes #4706 (again)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-05-30 16:17:41 +03:00
Konstantin Osipov
9512e076af test: remove tools/cql_repl
Remove tools/cql_repl from the source, build targets and
use in test.py. Superseded by ApprovalTest and
test/pylib/cql_repl/cql_repl.py.
2022-05-25 20:26:42 +03:00
Avi Kivity
5285ccbb12 Merge 'Add prune ghost rows statement' from Piotr Sarna
This series is split from another, bigger RFC series which provides
manual remedies to deal with inconsistencies between the base table
and its views. This part deals with ghost rows by providing a statement
which fetches view rows from a given range, then reads its corresponding
rows from the base table (cl=ALL), and finally removes rows which were
not present in the base table at all, qualifying them as ghost rows.
Motivations for introducing such a statement:
 * in case of detected inconsistencies, it can be used to fix
   materialized views without recreating them from scratch, which can
   take days and generates lots of throughput
 * a tool which periodically scrubs a materialized view can be easily
   created on top of this statement, especially that it's possible
   to remove ghost rows from a user-defined view token range;
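
The pruning logic described above boils down to the following Python sketch. The data shapes and the `delete_from_view` callback are assumptions; the real statement reads the base table with cl=ALL and works on a token range:

```python
def prune_ghost_rows(view_rows, base_keys, delete_from_view):
    """Delete view rows whose base row does not exist.

    view_rows: iterable of (view_key, base_key) pairs fetched from the view.
    base_keys: set of keys present in the base table.
    delete_from_view: callback that removes one row from the view.
    """
    removed = []
    for view_key, base_key in view_rows:
        if base_key not in base_keys:
            # No corresponding base row at all: this is a ghost row.
            delete_from_view(view_key)
            removed.append(view_key)
    return removed
```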

This series comes with a unit test.

The reason for digging up this series is because it's still possible to end up with ghost rows in certain rather improbable scenarios, and we lack a way of fixing them without rebuilding the whole view. For instance, in case of a failed synchronous update to a local view, the user will be notified that the query failed, but a ghost row can be created nonetheless. The pruning statement introduced in this series would allow healing the failure locally, without rebuilding the whole view.

Tests: unit(dev)

Closes #10426

* github.com:scylladb/scylla:
  docs: add a paragraph on PRUNE MATERIALIZED VIEW statement
  service,test: add a test case for error during pruning
  tests: add ghost row deletion test case
  cql3: enable ghost row deletion via CQL
  cql3: add a statement for deleting ghost rows
  cql3: convert is_json statement parameter to enum
  pager: add ghost row deleting pager
  db,view: add delete ghost rows visitor
2022-05-19 17:21:35 +03:00
Gleb Natapov
c2ef390a52 service: raft: move group0 write path into a separate file
Writing into the group0 raft group on the client side involves locking
the state machine, choosing a state id and checking for its presence
after the operation completes. The code that does this resides in the
migration manager, since it is currently the only user of group0. In
the near future we will have more clients for group0 and they will all
need the same logic, so the patch moves it to a separate class,
raft_group0_client, that any future user of group0 can use to write
into it.

Message-Id: <YoYAJwdTdbX+iCUn@scylladb.com>
2022-05-19 17:21:35 +03:00
Piotr Sarna
ec0a3bbbd4 cql3: add a statement for deleting ghost rows
In order to expose the API for deleting ghost rows from a view,
a CQL statement is created. It is loosely based on select_statement,
as its first step is to select view table rows.
2022-05-19 10:11:50 +02:00
cvybhu
d85f680df3 cql3: Remove relation class
Functionality of the relation class has been replaced by
expr::to_restriction.

Relation and all classes deriving from it can now be removed.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:17:58 +02:00
cvybhu
89950e02b5 cql3: expr: Add expr::to_restriction for single column relations
Add a function that will be used to convert expressions
received from the parser to restrictions.

Currently the parser creates relations with expressions inside
and then those relations are converted to restrictions.

Once this function is implemented we will be able to skip
creating relations altogether and convert straight from
expression to restriction. This will allow us to remove
the relation class.

Further functionality will be implemented in the following commits.
This commit implements converting single column relations to expressions.

The code is mostly taken from functions in single_column_relation.hh,
because we are replicating functionality of the functions called
single_column_relation::new_XX_restriction.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:17:57 +02:00
Avi Kivity
0dd5f02022 build: enable ABSL_PROPAGATE_CXX_STD
Recently Abseil started to ask to enable ABSL_PROPAGATE_CXX_STD,
warning that it will do so itself in the future. Do so, and
specify that we use C++20 to avoid inconsistencies.

Closes #10563
2022-05-13 07:12:03 +02:00
Piotr Sarna
eb6f4cc839 Merge 'dependencies: add rust' from Wojciech Mitros
The main reason for adding rust dependency to scylla is the
wasmtime library, which is written in rust. Although there
exist c++ bindings, they don't expose all of its features,
so we want to do that ourselves using rust's cxx.

The patch also includes an example rust source to be used in
c++, and its example use in tests/boost/rust_test.

The usage of wasmtime has been slightly modified to avoid
duplicate symbol errors, but as a result of adding a Rust
dependency, it is going to be removed from `configure.py`
completely anyway

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>

Closes #10341

* github.com:scylladb/scylla:
  docs: document rust
  tests: add rust example
2022-05-12 15:24:58 +02:00
Wojciech Mitros
4ad012cb6a tests: add rust example
The patch includes an example rust source to be used in
c++, and its example use in tests/boost/rust_test.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2022-05-11 16:49:31 +02:00
Nadav Har'El
043b1c7f89 Update seastar submodule. Unfortunately, also requires two changes
to Scylla itself to make it still compile - see below

* seastar 5e863627...96bb3a1b (18):
  > install-dependencies: add rocky as a supported distro
  > circleci: relax docker limits to allow running with new toolchain
  > core: memory: Add memory::free_memory() also in Debug mode
  > build: bump up zlib to 1.2.12
  > cmake: add FindValgrind.cmake
  > Merge 'seastar-addr2line: support sct syslogs' from Benny Halevy
  > rpc: lower log level for 'failed to connect' errors
  > scripts: Build validation
  > perftune.py: remove rx_queue_count from mode condition.
  > memory: add attributes to memalign for compatibility with glibc 2.35
  > condition-variable: Fix timeout "when" potentially not killing timer
  > Merge "tests: perf: measure coroutines performance" from Benny
  > Merge: Refine COUNTER metrics
  > Revert "Merge: Refine COUNTER metrics"
  > reactor: document intentional bitwise-on-bool op in smp_pollfn::poll()
  > Merge: Refine COUNTER metrics
  > SLES: additionally check irqbalance.service under /usr/lib
  > rpc_tester: job_cpu: mark virtual methods override

Changes to Scylla also included in this merge:

1. api: Don't export DERIVEs (Pavel Emelyanov)

Newer seastar doesn't have DERIVE metrics, but does have a REAL_COUNTER
one. Teach the collectd getter about the change.

(for the record: I don't understand how this endpoint works at all;
there's a HISTOGRAM metric out there that it would attempt to
expose with the v.ui() call, which is totally wrong)

2. test: use linux_perf_events.{cc,hh} from Seastar

Seastar now has linux_perf_events.{cc,hh}. Remove Scylla's version
of the same files and use Seastar's. Without this change, Scylla
fails to compile when some source files end up including both
versions and seeing double definitions.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-05-11 14:46:30 +02:00
Kamil Braun
0f7a1179c8 service: raft: remove raft_gossip_failure_detector
It's no longer used, having been replaced
by the direct_failure_detector listener.
2022-05-09 15:31:19 +02:00
Kamil Braun
915d329f1f test: raft: randomized_nemesis_test: use direct_failure_detector::failure_detector
Until now the nemesis test used its own failure detector implementation
which used one-way heartbeats.

Switch it to use the new direct failure detection service, which will
also be used in production code. Integrating it does require some work
however as we need to implement the `pinger` and `clock` interfaces
for the failure detector.

The service is sharded, but for simplicity of implementation we
implement rpcs and sleeps by routing the requests to shard 0, where
logical timers and network live.
2022-05-09 13:14:41 +02:00
Kamil Braun
e4f85cf425 test: unit test for new failure detector service 2022-05-09 13:14:41 +02:00
Kamil Braun
666e5a414d direct_failure_detector: introduce new failure detector service
The new service performs failure detection by periodically pinging
endpoints. The set of pinged endpoints can be dynamically extended and
shrunk. To learn about liveness of endpoints, the user of the service
registers a listener and chooses a threshold - a duration of time which
has to pass since the last successful ping in order to mark an endpoint
as dead. When an endpoint responds it's immediately marked as alive.

Endpoints are identified using abstract integer identifiers.
The method of performing a ping is a dependency of the service provided
by the user through the `pinger` interface. The implementation of `pinger`
is responsible for translating the abstract endpoint IDs to 'real'
addresses. For example, a production implementation may map endpoint IDs
to IP addresses and use TCP/IP to perform the ping, while a test/simulation
implementation may use a simulated network that also operates on
abstract identifiers.

Similarly, the method of measuring time is a dependency provided by the
user using the `clock` interface. The service operates on abstract time
intervals and timepoints. So, for example, in a production
implementation time can be measured using a stopwatch, while in
test/simulation we can use a logical clock.

The service distributes work across different shards. When an endpoint
is added to the set of detected endpoints, the service will choose a
shard with the smallest number of workers and create a worker that is
responsible for periodically pinging this endpoint on that shard and
sending notifications to listeners.

Endpoints can be added or removed only through the shard 0 instance of
the service and shard 0 is responsible for coordinating the endpoint
workers. Listeners can be registered on any shard.
2022-05-09 13:14:40 +02:00
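The threshold rule described in the direct_failure_detector commit above can be illustrated with a minimal single-threaded sketch. It is not the code from the commit: every name here (`failure_detector_sketch`, `clock_source`, `logical_clock`, `simulated_network`, `tick`) is hypothetical, sharding and per-endpoint workers are left out, and it assumes an endpoint's state is unknown until its first ping result.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <set>
#include <utility>
#include <vector>

// All names below are hypothetical; they are not the interfaces from the commit.
using endpoint_id = std::uint64_t;  // abstract endpoint identifier
using timepoint = std::uint64_t;    // abstract time, in ticks
using interval = std::uint64_t;

// How to ping an endpoint is a dependency supplied by the user.
struct pinger {
    virtual ~pinger() = default;
    virtual bool ping(endpoint_id ep) = 0;  // true if the endpoint responded
};

// How to measure time is supplied by the user as well.
struct clock_source {
    virtual ~clock_source() = default;
    virtual timepoint now() = 0;
};

class failure_detector_sketch {
public:
    using listener = std::function<void(endpoint_id, bool alive)>;

    failure_detector_sketch(pinger& p, clock_source& c, interval threshold, listener l)
        : _pinger(p), _clock(c), _threshold(threshold), _listener(std::move(l)) {}

    void add_endpoint(endpoint_id ep) { _state.emplace(ep, state{_clock.now(), std::nullopt}); }
    void remove_endpoint(endpoint_id ep) { _state.erase(ep); }

    // One round of pings over all endpoints (assumption: the caller drives the period).
    void tick() {
        for (auto& [ep, st] : _state) {
            const timepoint now = _clock.now();
            if (_pinger.ping(ep)) {
                st.last_success = now;
                if (!st.alive.value_or(false)) {  // unknown or dead -> alive, immediately
                    st.alive = true;
                    _listener(ep, true);
                }
            } else if (now - st.last_success >= _threshold && st.alive.value_or(true)) {
                st.alive = false;                 // threshold elapsed since last success
                _listener(ep, false);
            }
        }
    }

private:
    struct state {
        timepoint last_success;     // time of the last successful ping (or of addition)
        std::optional<bool> alive;  // empty until the first verdict
    };
    pinger& _pinger;
    clock_source& _clock;
    interval _threshold;
    listener _listener;
    std::map<endpoint_id, state> _state;
};

// Test doubles in the spirit of the "test/simulation" implementations mentioned above.
struct logical_clock : clock_source {
    timepoint t = 0;
    timepoint now() override { return t; }
};

struct simulated_network : pinger {
    std::set<endpoint_id> reachable;
    bool ping(endpoint_id ep) override { return reachable.count(ep) > 0; }
};
```

With a threshold of 3 ticks, an endpoint that stops responding at tick 0 is reported dead on the first round at or after tick 3, and reported alive again on the first successful ping afterwards.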
Avi Kivity
7129ddfa67 build: disable warnings that cause false-positive errors with gcc 12
gcc 12 generates some incorrect warnings (that we treat as errors).
Silence them so we can build.
2022-04-18 12:27:18 +03:00
Mikołaj Sielużycki
b16e12f3a1 repair: Add unit test for flushing repair_rows_on_wire to disk.
The unit test executes a simplified repair scenario by:
- producing a random stream of mutation_fragments,
- converting them to repair_rows_on_wire,
- converting them to a list of repair_rows using the conversion logic
  extracted in previous commits from repair_meta,
- flushing the rows to an sstable using the logic extracted in previous
  commits from repair_meta,
- comparing the sstable contents with the originally produced mutation
  fragments.

The test checks only the flushing part and is not concerned with any
other piece of the repair pipeline.
2022-04-12 09:22:10 +02:00
Michael Livshin
da7c7fd3dc delete code of the unused normalizing_reader class
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Message-Id: <20220406161107.2376568-3-michael.livshin@scylladb.com>
2022-04-07 09:29:41 +03:00
Botond Dénes
c9e30b9a6c tree: remove now empty mutation_reader.{hh,cc} 2022-03-30 15:42:51 +03:00
Botond Dénes
d0ea895671 readers: move multishard reader & friends to reader/multishard.cc
Since the multishard reader family weighs more than 1K SLOC, it gets
its own .cc file.
2022-03-30 15:42:51 +03:00
Botond Dénes
f8015d9c26 readers: move combined reader into readers/
Since the combined reader family weighs more than 1K SLOC, it gets its
own .cc file.
2022-03-30 15:42:51 +03:00
Calle Wilund
56c383ba8e test/perf/perf_commitlog: Add a small commitlog throughput test
Based on perf_simple_query, just bashes data into CL using
normal distribution min/max data chunk size, allowing direct
freeing of segments, _but_ delayed by a normal dist as well,
to "simulate" secondary delay in data persistence.

Needs more stuff.

Some baseline measurements on master:

--min-flush-delay-in-ms 10 --max-flush-delay-in-ms 200
--commitlog-use-hard-size-limit true
--commitlog-total-space-in-mb 10000 --min-data-size 160 --max-data-size 1024
--smp 1

median 2065648.59 tps (  1.1 allocs/op,   0.0 tasks/op,    1482 insns/op)
median absolute deviation: 48752.44
maximum: 2161987.06
minimum: 1984267.90

--min-data-size 256 --max-data-size 16384

median 269385.25 tps (  2.2 allocs/op,   0.7 tasks/op,    3244 insns/op)
median absolute deviation: 15719.13
maximum: 323574.43
minimum: 228206.28

--min-data-size 4096 --max-data-size 61440

median 67734.22 tps (  6.4 allocs/op,   2.9 tasks/op,    9153 insns/op)
median absolute deviation: 2070.93
maximum: 82833.17
minimum: 61473.57

--min-data-size 61440 --max-data-size 1843200

median 2281.37 tps ( 79.7 allocs/op,  43.5 tasks/op,  202963 insns/op)
median absolute deviation: 128.87
maximum: 3143.84
minimum: 2140.80

--min-data-size 368640 --max-data-size 6144000

median 679.76 tps (225.5 allocs/op, 116.3 tasks/op,  662700 insns/op)
median absolute deviation: 39.30
maximum: 1148.95
minimum: 586.86

Actual throughput is obviously meaningless, as it was run on my slow
machine, but IPS might be relevant.
Note that transaction throughput plummets as we increase median data
sizes above ~200k, since we then more or less always end up replacing
buffers in every call.

Closes #10230
2022-03-22 15:18:25 +02:00
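For readers unfamiliar with the "median absolute deviation" figure reported in the perf_commitlog measurements above, the following is a minimal sketch of the textbook definition: the median of the absolute differences between each sample and the sample median. The function names are hypothetical, and the perf test framework may compute its statistics differently (for example, in how it handles an even number of samples).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Median of a non-empty sample. Takes the vector by value because it sorts its copy.
// For an even number of samples it averages the two middle values (an assumption;
// other conventions exist).
double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const auto n = v.size();
    return n % 2 ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

// Median absolute deviation: median of |x - median(sample)| over all samples.
double median_absolute_deviation(const std::vector<double>& v) {
    const double m = median(v);
    std::vector<double> deviations;
    deviations.reserve(v.size());
    for (double x : v) {
        deviations.push_back(std::fabs(x - m));
    }
    return median(deviations);
}
```

Unlike the standard deviation, this measure is barely affected by a single outlier run, which makes it a reasonable spread indicator for noisy throughput samples.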