Commit Graph

31244 Commits

Author SHA1 Message Date
Kamil Braun
700a1fdd20 test: raft: randomized_nemesis_test: send modify_config requests in reconfiguration nemsesis
Extend the reconfiguration nemesis to send `modify_config` requests as
well as `reconfigure` requests. It chooses one or the other with
probability 1/2.
2022-05-24 11:39:08 +02:00
Kamil Braun
2222f095b3 test: raft: randomized_nemesis_test: fix rpc reply ID generation
When `rpc` wants to perform a two-way RPC call it sends a message
containing a `reply_id`. The other side will send the `reply_id` back
when answering, so the original side can match the response to the promise
corresponding to the future being waited on by the RPC caller.

Previously each instance of `rpc` generated reply IDs independently as
increasing integers starting from 0. The network delivers messages
based on Raft server IDs. A response message may thus be delievered not
to the original instance which invoked the RPC, but to a new instance
which uses the same Raft server ID (after we simulated a server
crash/stop and restart, creating a new server with the same ID that
reuses the previous instance's `persistence` instance but has a new `rpc`).
The new instance could have started a new RPC call using the same
`reply_id` as one currently being in-flight that was started by the
previous instance. The new instance could then receive and handle a
response that was intended for the previous instance, leading to weird
bugs.

Fix this by replacing the local reply ID counters by a global counter so
that every two-way RPC call gets a unique reply ID.
2022-05-24 11:39:08 +02:00
Kamil Braun
b9807f07e6 test: raft: randomized_nemesis_test: during bouncing call, allow a leader to reroute to itself
A server executing a `modify_config` call, even if it initially was a
leader and accepted the request, may end up throwing a `not_a_leader`
error, rerouting the caller to a new leader - but this new leader may be
that same server. This happens because `execute_modify_config`
translates certain errors that it considers transient (such as
`conf_change_in_progress`) into `not_a_leader{last_known_leader}`,
in attempt to notify the caller that they should retry the request; but
when this translation happens, the `last_known_leader` may be that same
server (it could have even lost leadership and then regained it back
while the request was being handled).

This is not strictly an error, and it should be safe for the client to
retry the request by sending it to the same server. The nemesis test
assumed that a server never returns `not_a_leader{itself}`; this commit
drops the assumption.

An alternative solution would be to extend the error types that are now
translated to `not_a_leader` so they include information about the last
known leader. This way the client does not lose information about the
original error and still gets a potential contact point for retry.
2022-05-24 11:36:51 +02:00
Kamil Braun
b33bc7a5d6 test: raft: randomized_nemesis_test: handle timed_out_error from modify_config
May be propagated from `rpc::send_modify_config` to the caller of
`modify_config`.
2022-05-24 11:36:51 +02:00
Kamil Braun
4767b163ef service: raft: rpc: don't call execute... functions after abort()
The functions are called from RPC when a follower forwards a request to
a leader (`add_entry`, `modify_config`, `read_barrier`). The call may be
attempted during shutdown. The Raft shutdown code cleans up data structures
created by those requests. Make sure that they are not updated
concurrently with shutdown. This can lead to problems such as using the
server object after it was aborted, or even destroyed.

After this change, the RPC implementation may wait for a `execute_modify_config`
call to finish before finishing abort. That call in turn may be stuck on
`wait_for_entry`. Thus the waiter may prevent RPC from aborting. Fix
this be moving the wait on the future returned from `_rpc->abort()` in
`server::abort()` until after waiters were destroyed.
2022-05-24 11:36:51 +02:00
Kamil Braun
5e06d0ad6f raft: server: fix bad_variant_access in modify_config
`modify_config` would call `execute_modify_config` or
`_rpc->send_modify_config`, which returned a reply of type
`add_entry_reply`. This is a variant of 3 options: `entry_id`,
`not_a_leader`, or `commit_status_unknown`. The code would check
for the `entry_id` option and otherwise assume that it was `not_a_leader`.
During nemesis testing however, the reply was sometimes
`commit_status_unknown`, which caused a `bad_variant_access` exception
during `std::get` call. Fix this.

There is a similar piece of code in `add_entry`, but there it should be
impossible to obtain `commit_status_unknown` even though the types don't
enforce it. Make it more explicit with a comment and an assertion.
2022-05-24 11:36:51 +02:00
Avi Kivity
6a3494442e Update abseil submodule
* abseil f70eadad...9e408e05 (109):
  > Cord: workaround a GCC 12.1 bug that triggers a spurious warning
  > Change workaround for MSVC bug regarding compile-time initialization to trigger from MSC_VER 1910 to 1930. 1929 is the last _MSC_VER for Visual Studio 2019.
  > Don't default to the unscaled cycle clock on any Apple targets.
  > Use SSE instructions for prefetch when __builtin_prefetch is unavailable
  > Replace direct uses of __builtin_prefetch from SwissTable with the wrapper functions.
  > Cast away an unused variable to play nice with -Wunused-but-set-variable.
  > Use NullSafeStringView for const char* args to absl::StrCat, treating null pointers as "" Fixes #1167
  > raw_logging: Extract the inlined no-hook-registered behavior for LogPrefixHook to a default implementation.
  > absl: fix use-after-free in Mutex/CondVar
  > absl: fix live-lock in CondVar
  > Add a stress test for base_internal::ThreadIdentity reuse.
  > Improve compiler errors for mismatched ParsedFormat inputs.
  > Internal change
  > Fix an msan warning in cord_ringbuffer_test
  > Fix spelling error "charachter"
  > Document that Consume(Prefix|Suffix)() don't modify the input on failure
  > Fixes for C++20 support when not using std::optional.
  > raw_logging: Document that AbortHook's buffers live for as long as the process remains alive.
  > raw_logging: Rename SafeWriteToStderr to indicate what about it is safe (answer: it's async-signal-safe).
  > Correct the comment about the probe sequence.  It's (i/2 + i)/2 not (i/2 - i)/2.
  > Improve analysis of the number of extra `==` operations, which was overly complicated, slightly incorrect.
  > In btree, move rightmost_ into the CompressedTuple instead of root_.
  > raw_logging: Rename LogPrefixHook to reflect the other half of it's job (filtering by severity).
  > Don't construct/destroy object twice
  > Rename function_ref_benchmark.cc into more generic function_type_benchmark.cc, add missing includes
  > Fixed typo in `try_emplace` comment.
  > Fix a typo in a comment.
  > Adds ABSL_CONST_INIT to initializing declarations where it is missing
  > Automated visibility attribute cleanup.
  > Fix typo in absl/time/time.h
  > Fix typo: "a the condition" -> "a condition".
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Fix build with uclibc-ng (#1145)
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Replace the implementation of the Mix function in arm64 back to 128bit multiplication (#1094)
  > Support for QNX (#1147)
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Exclude unsupported x64 intrinsics from ARM64EC (#1135)
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Add NetBSD support (#1121)
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Some trivial OpenBSD-related fixes (#1113)
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Add support of loongarch64 (#1110)
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Disable ABSL_INTERNAL_ENABLE_FORMAT_CHECKER under VsCode/Intellisense (#1097)
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > macos: support Apple Universal 2 builds (#1086)
  > cmake: make `random_mocking_bit_gen` library public. (#1084)
  > cmake: use target aliases from local Google Test checkout. (#1083)
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > cmake: add ABSL_BUILD_TESTING option (#1057)
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Fix googletest URL in CMakeLists.txt (#1062)
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Export of internal Abseil changes
  > Fix Randen and PCG on Big Endian platforms (#1031)
  > Export of internal Abseil changes

Closes #10630
2022-05-22 23:46:33 +03:00
Avi Kivity
4afe2a24b9 cql3: grammar: make 'relation' return an expression rather than append to a vector
The 'relation' production is self-contained, except for its interface
to the rest of the grammar where it appends to a vector of expressions
(that happens to represent a conjunction of relations). Make it stand-
alone by returning an expression, and move the responsibility for
appending to an expression vector to the whereClause production (later
we can make it build a conjunction expression rather than a vector
of expressions, paving the way for more boolean operators).

Closes #10623
2022-05-22 22:16:12 +03:00
Avi Kivity
a6b554409b storage_proxy: stop using deprecated std::not1 and std::bind1st in cas(), get_paxos_participants(), and create_write_response_handler_helper()
Use equivalent std::not_fn and std::bind_front instead.

Closes #10622
2022-05-22 22:13:12 +03:00
Alejo Sanchez
3904d3b96e install-dependencies.sh: add python3-pytest-asyncio
Add package pytest-asyncio for async pytest support.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #10616

[avi: regenerate frozen toolchain]
2022-05-22 17:46:56 +03:00
Botond Dénes
f873806c7c Merge 'Fix use-after-free when queue reader is destroyed before its handle' from Benny Halevy
The handle must not point at this reader implementation after it's destroyed.

This fixes use-after-free when the queue_reader_v2
is destroyed first as repair_writer_impl::_queue_reader,
before repair_writer_impl::_mq is destroyed.

The issue was introduced in 39205917a8
in the definition of `repair_writer_impl`.

Fixes #10528

While at it, fix also an ignored exceptional future seen in the test:
`repair_additional_test.py::TestRepairAdditional::test_repair_kill_3`

Closes #10591

* github.com:scylladb/scylla:
  mutation_readers: queue_reader_v2: detach from handle when destroyed
  messaging_service: do_make_sink_source: handle failed source future
2022-05-19 21:51:41 +03:00
Raphael S. Carvalho
b120cacdd1 compaction_manager: Allow off-strategy to proceed in parallel to in-strategy compactions
Off-strategy works on maintenance sstable set using maintenance
scheduling group, whereas "in-strategy" works on main sstable set
and uses compaction group.

Today, it can happen that off-strategy has to wait for an "in-strategy"
maintenance compaction, e.g. cleanup, to complete before getting
a chance to run. But that's not desired behavior as off-strategy uses
maintenance group, and its candidates don't add to the backlog that
influences "in-strategy" bandwidth. Therefore, "in-strategy" and
off-strategy should be decoupled, with off-strategy having its own
semaphore for guaranteeing serialization across tables.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #10595
2022-05-19 17:37:11 +03:00
Takuya ASADA
b6003989f9 scylla_setup: stop using sudo -u, use user/group parameter on subprocess module
To run scylla-housekeeping we currently use "sudo -u scylla <cmd>" to switch
scylla user, but it fails on some environment.
Since recent version of Python 3 supports to switch user on subprocess module,
let's use python native way and drop sudo.

Fixes #10483

Closes #10538
2022-05-19 17:21:35 +03:00
Avi Kivity
fcf292b5d1 Merge 'Add range deserializer' from Benny Halevy
Currently, the idl-generated deserialization code, e.g. mutation_partition_view::rows() deserializes and returns a complete utils::chunked_vector<deletable_row_view> . And that could be arbitrarily long.

To consume it gently, we don't need the whole vector in advance, but rather we can consume it one element at a time (and in a nested way for cells in a row in the future).

Use `range_deserializer` to consume range tombstones and rows one item at a time.

We may consider in the future also gently iterating over cells
in a row and then dipping into collection cells that might also
contain a large number of items.

Fixes #10558

Closes #10566

* github.com:scylladb/scylla:
  ser: use vector_deserializer by default for all idl vectors
  mutation_partition_view: do_accept_gently: use the range based deserializers
  idl-compiler: generate *_range methods using vector_deserializer
  serializer_impl: add vector_deserializer
  test: frozen_mutation_test: test_writing_and_reading_gently: log detailed error
2022-05-19 17:21:35 +03:00
Avi Kivity
5285ccbb12 Merge 'Add prune ghost rows statement' from Piotr Sarna
This series is split from another, bigger RFC series which provides
manual remedies to deal with inconsistencies between the base table
and its views. This part deals with ghost rows by providing a statement
which fetches view rows from a given range, then reads its corresponding
rows from the base table (cl=ALL), and finally removes rows which were
not present in the base table at all, qualifying them as ghost rows.
Motivations for introducing such a statement:
 * in case of detected inconsistencies, it can be used to fix
   materialized views without recreating them from scratch, which can
   take days and generates lots of throughput
 * a tool which periodically scrubs a materialized view can be easily
   created on top of this statement, especially that it's possible
   to remove ghost rows from a user-defined view token range;

This series comes with a unit test.

The reason for digging up this series is because it's still possible to end up with ghost rows in certain rather improbable scenarios, and we lack a way of fixing them without rebuilding the whole view. For instance, in case of a failed synchronous update to a local view, the user will be notified that the query failed, but a ghost row can be created nonetheless. The pruning statement introduced in this series would allow healing the failure locally, without rebuilding the whole view.

Tests: unit(dev)

Closes #10426

* github.com:scylladb/scylla:
  docs: add a paragraph on PRUNE MATERIALIZED VIEW statement
  service,test: add a test case for error during pruning
  tests: add ghost row deletion test case
  cql3: enable ghost row deletion via CQL
  cql3: add a statement for deleting ghost rows
  cql3: convert is_json statement parameter to enum
  pager: add ghost row deleting pager
  db,view: add delete ghost rows visitor
2022-05-19 17:21:35 +03:00
Gleb Natapov
c2ef390a52 service: raft: move group0 write path into a separate file
Writing into the group0 raft group on a client side involves locking
the state machine, choosing a state id and checking for its presence
after operation completes. The code that does it resides now in the
migration manager since the currently it is the only user of group0. In
the near future we will have more client for group0 and they all will
have to have the same logic, so the patch moves it to a separate class
raft_group0_client that any future user of group0 can use to write
into it.

Message-Id: <YoYAJwdTdbX+iCUn@scylladb.com>
2022-05-19 17:21:35 +03:00
Avi Kivity
78eccd8763 Merge "Remove sstable_format_slector::sync()" from Pavel E
"
There's an explicit barrier in main that waits for the sstable format
selector to finish selecting it by the time node start to join a cluter.
(Actually -- not quite, when restarting a normal node it joins cluster
in prepare_to_join()).

This explicit barrier is not needed, the sync point already exists in
the way features are enabled, the format-selector just needs to use it.

branch: https://github.com/xemul/scylla/tree/br-format-selector-sync
tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/351/
refs: #2795
"

* 'br-format-selector-sync' of https://github.com/xemul/scylla:
  format-selector: Remove .sync() point
  format-selector: Coroutinize maybe_select_format()
  format-selector: Coroutinize simple methods
2022-05-19 17:21:35 +03:00
Avi Kivity
08ed4d7405 Merge 'scylla-gdb.py: add commands to dump sstables summary and index-cache' from Botond Dénes
This series adds two commands:
* scylla sstable-summary
* scylla sstable-index-cache

The former dumps the content of the sstable summary. This component is kept in memory in its entirety, so this can be easily done. The latter command dumps the content of the sstable index cache. This contains all the index-pages that are currently cached. The promoted index is not dumped yet and there is no indication of whether a given entry is in the LRU or not, but this already allows at seeing what pages are in the cache and what aren't.

Closes #10546

* github.com:scylladb/scylla:
  scylla-gdb.py: add scylla sstable-index-cache command
  scylla-gdb.py: add scylla sstable-summary command
  test/scylla-gdb: add sstable fixture
  scylla-gdb.py: make chunked_vector a proper container wrapper"
  scylla-gdb.py: make small_vector a proper container wrapper"
  scylla-gdb.py: add sstring container wrapper
  scylla-gdb.py: add chunked_managed_vector container wrapper
  scylla-gdb.py: add managed_vector container wrapper
  scylla-gdb.py: std_variant: add workaround for clang template bug
  scylla-gdb.py: add bplus_tree container wrapper
2022-05-19 17:21:35 +03:00
Pavel Emelyanov
3e53a0965c scylla-gdb: Handle new seastar/fair_queue layout
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220519094452.14538-1-xemul@scylladb.com>
2022-05-19 13:29:03 +03:00
Benny Halevy
9c5feb2781 mutation_readers: queue_reader_v2: detach from handle when destroyed
The handle must not point at this reader implementation
after it's destroyed.

This fixes use-after-free when the queue_reader_v2
is destroyed first as repair_writer_impl::_queue_reader,
before repair_writer_impl::_mq is destroyed.

The issue was introduced in 39205917a8
in the definition of `repair_writer_impl`.

Fixes #10528

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-19 11:48:03 +03:00
Benny Halevy
1308b45c58 messaging_service: do_make_sink_source: handle failed source future
I've stumbled upon this with version a2901a376d in debug mode
when testing repair_additional_test.py::TestRepairAdditional::test_repair_kill_3:

WARN  2022-05-17 07:26:12,581 [shard 0] seastar - Exceptional future ignored: seastar::rpc::closed_error (connection is closed), backtrace: 0x137c33d0 0x1ad14d0d 0x1ad149cd 0x1ad16fc3 0x1ad17e52 0x19d8a809 0x19d8ab6a 0x139165a9 0x17be0d21 0x17bdcfb0 0x17bf3611 0x17bf39f0 0x17bf3c62 0x17bf3958 0x17bf57d8 0x17bf5468 0x19efe44e 0x19f04ac6 0x19f09732 0x19f072a1 0x19cca281 0x19cc7de5 0x13859cbf 0x13d309d6 0x13d3090b 0x13d30775 0x1391364d 0x13858521 /lib64/libc.so.6+0x27b74 0x137774ad

Decode:
```
seastar::report_failed_future(seastar::future_state_base::any&&) at //./seastar/src/core/future.cc:218
seastar::future_state_base::any::check_failure() at //./seastar/include/seastar/core/future.hh:573
seastar::future_state<seastar::rpc::source<repair_row_on_wire_with_cmd> >::clear() at ././seastar/include/seastar/core/future.hh:615
~future_state at ././seastar/include/seastar/core/future.hh:620
 (inlined by) ~future at ././seastar/include/seastar/core/future.hh:1343
~ at ./message/messaging_service.cc:841
```

Looks like if sink.close() fails after source.failed() then source gets abandoned.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-19 11:47:38 +03:00
Avi Kivity
5c481973a3 sstables/processing_result_generator.hh: refine check for coroutine standard
We have a check for whether we can use standard coroutines (in namespace
std) or the technical specification (in std::experimental), but it doesn't
work since Clang doesn't report the correct standard version. Use a
compiler versionspecific check, inspired by Seastar's check.

This allows building with clang 14.

Closes #10603
2022-05-19 11:31:40 +03:00
Nadav Har'El
0040e9e7f4 Merge 'cql: Add proper validation for null and unset inside collections send as bound values' from Jan Ciołek
Let's say we have a query like:
```cql
INSERT INTO ks.t (list_column) VALUES (?);
```
And the driver sends a list with null inside as the bound value, something like `[1, 2, null, 4]`.

In such case we should throw `invalid_request_exception` because `nulls` are not allowed inside collections.

Currently when a query like this gets executed Scylla throws an ugly marshalling error.
This is because the validation code reads size of the next element, interprets it as an unsigned integer and tries to read this much.
In case of `null` element the size is `-1`, which when converted to unsigned `size_t` gives 18446744073709551615 and it fails to read this much.

This PR adds proper validation checks to make the error message better.

I also added some tests.
I originally tried to write them in python, but python driver really doesn't like sending invalid values.
Trying to send `[1, None, 2]` results in a list with empty value instead of null.
Trying to send `[1, UNSET_VALUE, 2]` Fails before query even leaves the driver.

Fixes #10580

Closes #10599

* github.com:scylladb/scylla:
  cql3: Add tests for null and unset inside collections
  cql3: Add null and unset checks in collection validation
2022-05-19 11:25:24 +03:00
Piotr Sarna
e54a4ebdcb docs: add a paragraph on PRUNE MATERIALIZED VIEW statement 2022-05-19 10:16:04 +02:00
Piotr Sarna
b8a36ff253 service,test: add a test case for error during pruning
The test case checks that errors which occur during materialized
view pruning are properly propagated back to the user.
2022-05-19 10:16:04 +02:00
Piotr Sarna
995468520e tests: add ghost row deletion test case
The tests checks if manually injected ghost rows are properly deleted
by the ghost row delete statement - and, that non-ghost regular rows
are left intact.
2022-05-19 10:16:03 +02:00
Piotr Sarna
be2ef862bd cql3: enable ghost row deletion via CQL
This commit allows accepting a CQL request to clear ghost rows
from a given view partition range. Currently its syntax is a purposely
convoluted mix of existing keywords, which makes sure that the statement
is never issued by mistake. Example runs:

 -- try deleting all ghost rows, effectively performs a paged full scan
PRUNE MATERIALIZED VIEW my_mv;

 -- try deleting ghost rows from a single view partition
PRUNE MATERIALIZED VIEW my_mv WHERE mv_pk = 3;

 -- try deleting ghost rows from a token range (effective full scans)
PRUNE MATERIALIZED VIEW my_mv WHERE TOKEN(mv_pk) > 7 AND TOKEN(mv_pk) < 42
2022-05-19 10:11:50 +02:00
Piotr Sarna
ec0a3bbbd4 cql3: add a statement for deleting ghost rows
In order to expose the API for deleting ghost rows from a view,
a CQL statement is created. It is loosely based on select_statement,
as its first step is to select view table rows.
2022-05-19 10:11:50 +02:00
Piotr Sarna
d74e25be67 cql3: convert is_json statement parameter to enum
Right now is_json is used to decide if the statement needs to be treated
in a special way. For two types (regular statement and JSON statement),
a boolean is enough, but this series extends it for two more types,
so the flag is converted to an enum.
2022-05-19 10:11:50 +02:00
Piotr Sarna
2c6e1a5409 pager: add ghost row deleting pager
The pager is based on ghost row deleting visitor - it simply
traverses each fetched page with it in order to delete
ghost rows from the view table.
2022-05-19 10:11:50 +02:00
Piotr Sarna
c3a9658535 db,view: add delete ghost rows visitor
The visitor is used to traverse view rows, and if it detects
a ghost row it qualifies it for deletion. Qualification is based
on a base table read with cl=ALL: if the corresponding row is not
present in the base table, it is considered a ghost.
2022-05-19 10:11:50 +02:00
cvybhu
7adc572ec6 cql3: Add tests for null and unset inside collections
Add a bunch of tests that test what happens
when there is a null or unset value inside collections.

They are not allowed so every such attempt
should end with invalid_request_exception
with proper message.

I had to write a new function for collection serialization.
I tried to use data_value and its methods, but it's impossible
to create a data_value that represents an unset value.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-19 00:15:17 +02:00
Benny Halevy
4e78b4de1b ser: use vector_deserializer by default for all idl vectors
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-18 19:24:18 +03:00
Benny Halevy
ea89a9aa56 mutation_partition_view: do_accept_gently: use the range based deserializers
Currently use the range_deserializer for range tombstones and rows.

We may consider in the future also gently iterating over cells
in a row and then dipping into collection cells that might also
contain a large number of items.

Fixes #10558

Test: frozen_mutation_test(dev, debug)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-18 19:24:18 +03:00
Benny Halevy
29632e739d idl-compiler: generate *_range methods using vector_deserializer
Generate code for *_range methods that return a
vector_deserializer rather than constructing the complete
vector of views.

This would be useful for streamed mutation unfreezing
in the following patch.

Later, we should just use vector_deserializer for all vectors.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-18 19:24:18 +03:00
Benny Halevy
5b902d9fd6 serializer_impl: add vector_deserializer
To be used for streaming through a serialized
vector, deserializing the items as we go
when dereferencing or incrementing the iterator.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-18 19:10:13 +03:00
Avi Kivity
e6fe9cc683 Merge 'Reapply: "disable_auto_compaction: stop ongoing compactions"' from Eliran Sinvani
Reapply: "disable_auto_compaction: stop ongoing compactions"
This is a reapplication of a former commit
4affa801a5 which was reverted by
8e8dc2c930.
This commit is a fixed version of the original where a call to the
compaction_manager constructor accidentally issued (`compaction_manager()`)
instead a call to retrieve a compaction manager reference
(`get_compaction_manager()`), we don't use this function because it
doesn't exist anymore - it existed at the time the patch was written
bu was removed in 9066224cf4 later on,
instead, we just use the private table member _compaction_manager which refs
the compaction manager.

The explanation for the bad effect is probably that a `this` pointer
capture down the call chain, resulted in a use after free which had
an unknown effect on the system. (memory corruption at startup).

Test: unit (dev,debug)
      write performance test as the one used to find the bug.
A screenshot of the performance test can be found at
https://github.com/scylladb/scylla/issues/10146/#issuecomment-1129578381

Fixes https://github.com/scylladb/scylla/issues/9313
Refs https://github.com/scylladb/scylla/issues/10146

For completeness, the original commit message was:
The api call disables new regular compaction jobs from starting
but it doesn't wait for ongoing compaction to stop and so it's
much less useful.

Returning after stopping regular compaction jobs and waiting
for them to stop guarantees that no regular compactions job are
running when nodetool disableautocompaction returns successfully.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>

Closes #10597

* github.com:scylladb/scylla:
  compaction_manager: Make invoking the empty constructor more explicit
  Reapply: "disable_auto_compaction: stop ongoing compactions"
2022-05-18 18:33:12 +03:00
Eliran Sinvani
c5e5692a01 compaction_manager: Make invoking the empty constructor more explicit
The compaction manager's empty constructor is supposed to be invoked
only in testing environment, however, it is easy to invoke it by mistake
from production code.
Here we add a more verbose constructor and making the default compaction
private, the verbose compiler need to be invoked with a tag
for_testing_tag, this will ensure that this constructor will be invoked
only when intended.
The unit tests were changed according to this new paradigm.

Tests: unit (dev)

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2022-05-18 14:57:10 +03:00
Eliran Sinvani
c138981286 Reapply: "disable_auto_compaction: stop ongoing compactions"
This is a reapplication of a former commit
4affa801a5 which was reverted by
8e8dc2c930.
This commit is a fixed version of the original where a call to the
compaction_manager constructor accidentally issued (`compaction_manager()`)
instead a call to retrieve a compaction manager reference
(`get_compaction_manager()`), we don't use this function because it
doesn't exist anymore - it existed at the time the patch was written
bu was removed in 9066224cf4 later on,
instead, we just use the private table member _compaction_manager which refs
the compaction manager.

The explanation for the bad effect is probably that a `this` pointer
capture down the call chain, resulted in a use after free which had
an unknown effect on the system. (memory corruption at startup).

Test: unit (dev,debug)
      write performance test as the one used to find the bug.
A screenshot of the performance test can be found at
https://github.com/scylladb/scylla/issues/10146/#issuecomment-1129578381

Fixes #9313
Refs #10146

For completeness, the original commit message was:
The api call disables new regular compaction jobs from starting
but it doesn't wait for ongoing compaction to stop and so it's
much less useful.

Returning after stopping regular compaction jobs and waiting
for them to stop guarantees that no regular compactions job are
running when nodetool disableautocompaction returns successfully.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2022-05-18 14:57:10 +03:00
cvybhu
345e89756b cql3: Add null and unset checks in collection validation
Validating a collection should ensure that there
are no null or unset values inside the collection.

The validation already fails in case of such values,
but it does so in an ugly way.
Length of null and unset value is negative but is
cast to unsigned size_t. Then it tries to read
a really large value and fails with marshalling error.

The new checks are a better way to handle this.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-18 11:05:14 +02:00
Benny Halevy
9e1c76ea9e test: frozen_mutation_test: test_writing_and_reading_gently: log detailed error
The nested exception is also interesting
and boost doesn't print it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-18 08:21:54 +03:00
Botond Dénes
6fd1322bf3 test/boost/mutation_test: test_query_digest: use the same now everywhere
This test started failing sporadically of late. This failure is seen
quite often in CI tests but is very hard to reproduce locally. The
problem seems to be timing related, as the same seeds that fail in CI
don't fail locally. This patch is a speculative fix. The test has a
single time-related components: `gc_clock::now()`. This is invoked in 4
different places during a single iteration, giving ample opportunity for
off-by-one errors to appear. Although there is no solid proof for this
being the problem, this is a good candidate. This patch replaces all
those different invocations, with a single one per test: this value is
then propagated to all places that need it.

Fixes: #10554

Marking the patch as a fix for the issue, if the problem re-surfaces
after this patch we'll re-poen it.

Closes #10589
2022-05-17 16:25:04 +03:00
Takuya ASADA
883b97d8b2 dist/common/scripts: generate debug log when exception occurred
Using traceback_with_variables module, generate more detail traceback
with variables into debug log.
This will help fixing bugs which is hard to reproduce.

Closes #10472

[avi: regenerate frozen toolchain]
2022-05-17 13:18:27 +03:00
Raphael S. Carvalho
ca322fb7c2 compaction_manager: Quickly abort maintenance compaction waiting for its turn
Today, aborting a maintenance compaction like major, which is waiting for
its turn to run, can take lots of time because compaction manager will
only be able to bail out the task once it gets the "permit" from the
serialization mechanism, i.e. semaphore. Meaning that the command that
started the task will only complete after all this time waiting for
the "permit".

To allow a pending maintenance compaction to be quickly aborted, we
can use the abortable variant of get_units(). So when user submits an
abortion request, get_units() will be able to return earlier through
the abort exception.

Refs #10485.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #10581
2022-05-17 13:14:51 +03:00
Avi Kivity
a8f9ed56fb Merge 'cql3: Replace relation class with expression' from Jan Ciołek
Before this change parser used to output instances of the `relation` class, which were later converted to `restrction`.
`relation` took care of initial processing such as preparing and some validation checks.

This PR aims to remove the `relation` class and perform it's functionality using only `expression`.
This is a step towards removing the legacy classes and converting all AST analysis to work on `expressions`.

Closes #10409

* github.com:scylladb/scylla:
  cql3: Remove scalar from bind_variable_scalar_prepare_expression
  cql3: expr: Remove shape_type from bind_variable
  cql3: Remove prepare_expression_multi_column
  cql3: Remove relation class
  cql3: Add more tests for expr::printer
  cql3: Make parser output expression for relations
  cql3: expr: add printer for expression
  cql3: expr: expr::to_restriction: Handle token relations
  cql3: expr: expr::to_restriction: Handle multi column relations
  cql3: expr: Add expr::to_restriction for single column relations
  cql3: expr: Add prepare_binary_operator
  cql3: expr: Change how prepare_expression handles bind_variable
  clq3: expr: Add columns to expr::token struct
  cql3: expr: Modify list_prepare_expression to handle lists of IN values
  cql3: expr: Add expr::as_if for non-const expressions
2022-05-17 12:51:54 +03:00
Takuya ASADA
00ce34c29b scylla_prepare: describe error more correctly
Currently our error message on scylla_prepare says "Exception occurred
while creating perftune.yaml", even perftune.yaml is already generated,
and error occurred after that.
To describe error more correctly, add another error message after
perftune.yaml generated.

see scylladb/scylla-enterprise#2201

Closes #10575
2022-05-16 20:05:58 +03:00
cvybhu
21453ac9a4 cql3: Remove scalar from bind_variable_scalar_prepare_expression
There is now only one function to prepare bind_variable,
so we can remove 'scalar' from its name.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:17:58 +02:00
cvybhu
9b49d27a8d cql3: expr: Remove shape_type from bind_variable
shape_type was used in prepare_expression to differentiate
between a few cases and create the correct receivers.
This was used by the relation class.

Now creating the correct receiver has been delegated to the caller
of prepare_expression and all bind_variables can be handled
in the same simple way.

shape_type is not needed anymore.

Not having it is better because it simplifies things.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:17:58 +02:00
cvybhu
c0fc82d4be cql3: Remove prepare_expression_multi_column
This function was used by multi_column_relation.hh,
but now it isn't needed anymore.

The only way to prepare a bind_variable is now the standard prepare_expression.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:17:58 +02:00
cvybhu
d85f680df3 cql3: Remove relation class
Functionality of the relation class has been replaced by
expr::to_restriction.

Relation and all classes deriving from it can now be removed.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:17:58 +02:00