Commit Graph

1825 Commits

Author SHA1 Message Date
Botond Dénes
d2ddaced4e test/lib/reader_lifecycle_policy: get rid of lifecycle workarounds
The lifecycle of the reader lifecycle policy and all the resources the
reads use is now enclosed in that of the multishard reader, thanks to its
close() method. We can now remove all the workarounds we had in place to
keep the various resources alive until background reader cleanup finishes.
2021-06-16 11:29:36 +03:00
Botond Dénes
5a271e42a5 test/lib/reader_lifecycle_policy: destroy_reader(): stop the semaphore
So that when this method returns, the semaphore is safe to destroy. This
in turn will enable us to get rid of all the machinery we have in place
to deal with the semaphore having to out-live the lifecycle policy, with
no clear point in time at which it becomes safe to destroy.
2021-06-16 11:29:36 +03:00
Botond Dénes
c09c62a0fb test/lib/reader_lifecycle_policy: use a more robust eviction mechanism
The test reader lifecycle policy has a mode in which it wants to ensure
that all inactive readers are evicted, so tests can stress the
reader-recreation logic. For this it currently employs a trick of creating
a waiter on the semaphore. It is unclear how this works (or whether it
works at all), and it complicates the lifecycle policy code a lot.
So switch to the much more reliable and simpler method of creating the
semaphore with a single count and no memory. This ensures that all
inactive reads are immediately evicted, while still allowing a single read
to be admitted at any time.
2021-06-16 11:29:36 +03:00
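
For illustration, a toy C++ model of this admission scheme (hypothetical names, not the actual test lifecycle policy code): with a single count and no memory budget, admitting a new read first evicts every inactive reader, while one active read can always hold the unit.

```cpp
#include <functional>
#include <list>

// Toy admission control with a single unit and no memory budget.
struct toy_admission_control {
    unsigned units = 1;                          // single count, no memory
    std::list<std::function<void()>> inactive;   // eviction hooks; each hook
                                                 // releases its reader's unit

    bool try_admit() {
        if (units == 0) {
            // Evict all inactive readers to reclaim the single unit.
            for (auto& evict : inactive) {
                evict();
            }
            inactive.clear();
        }
        if (units == 0) {
            return false;   // an *active* read still holds the unit
        }
        --units;
        return true;
    }

    void release() { ++units; }
};
```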
Botond Dénes
578a092e4a reader_concurrency_semaphore: wait for all permits to be destroyed in stop()
To prevent use-after-free resulting from any permit out-living the
semaphore.
2021-06-16 11:29:36 +03:00
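
A minimal Seastar-flavored sketch of what such a stop() can look like (hypothetical names, not the actual reader_concurrency_semaphore interface):

```cpp
#include <seastar/core/future.hh>
#include <cassert>

// Tracks outstanding permits; stop() resolves only once all of them are
// destroyed, so destroying the semaphore afterwards cannot leave a
// dangling permit behind.
class toy_semaphore {
    unsigned _outstanding_permits = 0;
    bool _stopping = false;
    seastar::promise<> _all_permits_gone;
public:
    void on_permit_created() { ++_outstanding_permits; }
    void on_permit_destroyed() {
        assert(_outstanding_permits > 0);
        if (--_outstanding_permits == 0 && _stopping) {
            _all_permits_gone.set_value();
        }
    }
    seastar::future<> stop() {
        _stopping = true;
        if (_outstanding_permits == 0) {
            return seastar::make_ready_future<>();
        }
        return _all_permits_gone.get_future();
    }
};
```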
Botond Dénes
a10a6e253e test/lib/reader_lifecycle_policy: fix indentation
Left broken from the previous patch.
2021-06-16 11:29:36 +03:00
Botond Dénes
8c7447effd mutation_reader: reader_lifecycle_policy::destroy_reader(): require to be called on native shard
Currently shard_reader::close() (its caller) goes to the remote shard,
copies all fragments left there back to the local shard, then calls
`destroy_reader()`, which in the case of the multishard mutation query
copies it all back to the native shard. This was required before because
`shard_reader::stop()` (`close()`'s predecessor) couldn't wait on
`smp::submit_to()`. But `close()` can, so we can get rid of all this
back-and-forth and just call `destroy_reader()` on the shard the reader
lives on, just like we do with `create_reader()`.
2021-06-16 11:29:35 +03:00
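
A hedged sketch of the resulting dispatch, using Seastar's real smp::submit_to() but a hypothetical helper shape:

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/smp.hh>
#include <functional>

// Run the destroy step on the shard the reader lives on, mirroring how
// create_reader() is already dispatched; no fragment copying needed.
seastar::future<> close_on_native_shard(
        unsigned reader_shard,
        std::function<seastar::future<>()> destroy_reader) {
    return seastar::smp::submit_to(reader_shard,
            [destroy_reader = std::move(destroy_reader)] () mutable {
        return destroy_reader();
    });
}
```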
Botond Dénes
4ecf061c90 reader_lifecycle_policy implementations: fix indentation
Left broken from the previous patch.
2021-06-16 11:21:38 +03:00
Botond Dénes
a7e59d3e2c mutation_reader: reader_lifecycle_policy::destroy_reader(): de-futurize reader parameter
The shard reader is now able to wait on the stopped reader and pass the
already-stopped reader to `destroy_reader()`, so we can de-futurize the
reader parameter of said method. The shard reader was already patched to
pass a ready future, so adjusting the call-site is trivial.
The most prominent implementation, the multishard mutation query, can
now also drop its `_dismantling_gate`, which was put in place so that it
could wait on the background stopping of readers.

A consequence of this move is that errors that might happen while
stopping the reader are now handled in the shard reader, not in each
lifecycle policy implementation.
2021-06-16 11:21:38 +03:00
Tomasz Grabiec
9d49a26e79 Merge "raft: randomized_nemesis_test: tick servers less often than the network in basic_test" from Kamil
Previously `ticker` would use a single function, `on_tick`, which it
called in a loop with yields in-between. In `basic_test` we would use
this to tick every object in synchrony.

However, to closely simulate a production environment, we want the
tick ratios to be different. For example Raft servers should be ticked
rarely compared to the network.

We may also want to give the Seastar reactor more space between the
function calls (e.g. if they cause a bunch of work to be created for the
reactor that needs more than one tick to complete).

To support these use cases we first generalize `ticker` to take a set of
functions with associated numbers. These numbers are the call periods of
their corresponding functions: given {n, f}, `f` will be called every
`n`th tick.

We use this new functionality to tick Raft servers less often than the
network in basic_test.

This patchset effectively reverts 01b6a2eb38, which made the ticker call
`on_tick` only when the Seastar reactor had no work to do. That approach
is unfortunately incompatible with the one taken here: we *do* want the
ticker to race with other work, potentially producing more work while
already scheduled work is executing, and we want to see in tests what
happens when we adjust the ticking ratios of different subsystems.

The previous approach also had a problem: if an infinite task loop was
executing, the ticker would never tick.

The previous fix was introduced because the ticker produced too much work
(so the reactor couldn't keep up) by ticking the Raft servers too often
(after each yield). This patchset deals with the problem in a different
way, by ticking the servers rarely, which also better resembles
"real-life" scenarios.

* kbr/tick-network-often-v4:
  raft: randomized_nemesis_test: generalize `ticker` to take a set of functions
  raft: randomized_nemesis_test: split `environment::tick` into two functions
  raft: randomized_nemesis_test: fix potential use-after-free in basic_test
2021-06-15 01:54:57 +02:00
Kamil Braun
8f1caa6a90 raft: randomized_nemesis_test: generalize ticker to take a set of functions
... with associated calling periods and use the new API in `basic_test`.

Previously `ticker` would use a single function, `on_tick`, which it
called in a loop with yields in-between. In `basic_test` we would use
this to tick every object in synchrony.

However, to closely simulate a production environment, we may want the
tick ratios to be different. For example Raft servers should be ticked
rarely compared to the network.

We may also want to give the Seastar reactor more space between the
function calls (e.g. if they cause a bunch of work to be created for the
reactor that needs more than one tick to complete).

To support these use cases we generalize `ticker` to take a set of
functions with associated numbers. These numbers are the call periods of
their corresponding functions: given {n, f}, `f` will be called every
`n`th tick.

We also modify `basic_test` to use this new approach: we tick Raft
servers once per 10 network ticks (in particular, once per 10 reactor
yields).

This commit effectively reverts 01b6a2eb38, which made the ticker call
`on_tick` only when the Seastar reactor had no work to do. That approach
is unfortunately incompatible with the one taken here: we *do* want the
ticker to race with other work, potentially producing more work while
already scheduled work is executing, and we want to see in tests what
happens when we adjust the ticking ratios of different subsystems.

The previous approach also had a problem: if an infinite task loop was
executing, the ticker would never tick.

The previous fix was introduced because the ticker produced too much work
(so the reactor couldn't keep up) by ticking the Raft servers too often
(after each yield). This commit deals with the problem in a different
way, by ticking the servers rarely, which also better resembles
"real-life" scenarios.

With this change we must also wait a bit longer for the first node to
elect itself as a leader at the beginning of the test.
2021-06-14 16:54:38 +02:00
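
A minimal sketch of the generalized ticker (hypothetical shape, not the exact randomized_nemesis_test code):

```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/core/later.hh>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Each entry pairs a call period n with a function f; f runs on every
// n-th tick. Yielding between ticks lets the ticker race with other work.
seastar::future<> run_ticker(
        std::vector<std::pair<uint64_t, std::function<void()>>> on_tick,
        uint64_t total_ticks) {
    for (uint64_t tick = 1; tick <= total_ticks; ++tick) {
        for (auto& [period, f] : on_tick) {
            if (tick % period == 0) {
                f();
            }
        }
        co_await seastar::later();  // yield to the reactor between ticks
    }
}
```

Ticking the network on every tick and the Raft servers on every 10th tick would then correspond to `run_ticker({{1, tick_network}, {10, tick_servers}}, total)`.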
Kamil Braun
c0b80f1f8a raft: randomized_nemesis_test: split environment::tick into two functions
One for ticking the network and one for ticking the servers.
2021-06-14 16:54:38 +02:00
Kamil Braun
f42776aded raft: randomized_nemesis_test: fix potential use-after-free in basic_test
The test starts by waiting a certain number of ticks for the first node
to elect itself as a leader.

If this wait times out - i.e. the number of ticks passes before the node
manages to elect itself - the future associated with the task which checks
for the leader condition is discarded (it is passed to `with_timeout`),
and the task may keep using the `environment` (which it has a reference
to) even after the `environment` is destroyed.

Furthermore, the aforementioned task is a coroutine which uses lambda
captures in its body. Leaving `with_timeout` destroys the lambda object,
causing the coroutine to refer to no-longer-existing captures.

We fix the problems by:
- making `environment` `weakly_referencable` and checking that it's alive
  before it is used inside the task,
- not capturing anything in the lambda but passing whatever's needed as
  function arguments (so these things get allocated inside the coroutine
  frame).
2021-06-14 16:54:38 +02:00
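
A sketch of the two fixes with hypothetical names, using Seastar's real weakly_referencable/weak_ptr machinery:

```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/core/later.hh>
#include <seastar/core/weak_ptr.hh>

struct environment : seastar::weakly_referencable<environment> {
    bool leader_elected = false;
};

// Arguments live in the coroutine frame, so they stay valid even after
// with_timeout() abandons the future and the wrapping lambda is gone.
seastar::future<bool> wait_for_leader(seastar::weak_ptr<environment> env) {
    while (env) {                    // check liveness before every use
        if (env->leader_elected) {
            co_return true;
        }
        co_await seastar::later();
    }
    co_return false;                 // environment was destroyed under us
}
```

The test would then wrap `wait_for_leader(env.weak_from_this())` in `with_timeout()`; if the timeout fires and the environment dies, the orphaned task exits cleanly instead of dereferencing freed memory.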
Nadav Har'El
3645c7104b Merge: Wrap alternator start-stop into controller
Merged patch series by Pavel Emelyanov:

Alternator start and stop code sits inside main(), and it's a big piece
of code out there. Having it all in main complicates rework of the
start-stop sequences; it's much handier to have it in alternator/.

This set puts the mentioned code into a transport- and thrift-like
controller model. While doing so, one more call to the global storage
service goes away.

* 'br-alternator-clientize' of https://github.com/xemul/scylla:
  alternator: Move start-stop code into controller
  alternator: Move the whole starting code into a sched group
  alternator: Dont capture db, use cfg
  alternator: Controller skeleton
  alternator: Controller basement
  alternator: Drop storage service from executor
2021-06-14 15:44:10 +03:00
Alejo Sanchez
5c8092cf42 raft: fix election with disruptive candidate
This patch also fixes rare hangs in debug mode for drops_04 without
prevote.

Branch URL: https://github.com/alecco/scylla/tree/raft-fixes-05-v2-dueling

Tests: unit ({dev}), unit ({debug}), unit ({release})

Changes in v2:
    - Fixed commit message                               @kostja

Without prevote, a node disconnected for long enough becomes a candidate.
While disconnected, (A) keeps increasing its term.
When it rejoins, it disrupts the current leader (C), which steps down due
to the higher term in (A)'s append_entries_reply; (C) also increases
its term.

Meanwhile, followers (B) and (D) don't know that (C) stepped down, but see
it as alive according to the current failure detector implementation;
also, (A) has a shorter log than they do.
So they reject (A)'s vote requests (Raft 4.2.3, Disruptive servers).

Then (C) refuses to vote for (A) because (A) has a shorter log.
(C) becomes a candidate, but even though (A) votes for (C), the former
followers (B) and (D) ignore its vote request while they consider leader
(C) alive and their election timeout has not passed.

(A) and (C) alone, with 2 votes out of 4, can't reach a quorum. So
elections never succeed.

This patch addresses the problem by making followers not ignore vote
requests from whoever they think is the current leader, even if the
election timeout has not been reached.

As @kostja noted, if the failure detector considered a leader alive only
as long as it keeps sending heartbeats (append requests), this patch
would no longer be needed.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Message-Id: <20210611172734.254757-1-alejo.sanchez@scylladb.com>
2021-06-14 11:07:38 +02:00
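
A simplified sketch of the rule change (plain C++ with hypothetical names, not the actual raft code):

```cpp
#include <cstdint>
#include <optional>

using server_id = uint64_t;

// A follower normally ignores vote requests while it believes a leader is
// alive and its election timeout has not passed - unless the request
// comes from the very node it considers the current leader (the
// stepped-down (C) re-campaigning, in the scenario above).
bool should_ignore_vote_request(std::optional<server_id> current_leader,
                                bool leader_alive,
                                bool election_timeout_passed,
                                server_id candidate) {
    if (!leader_alive || election_timeout_passed) {
        return false;   // normal Raft voting rules apply
    }
    return !(current_leader && *current_leader == candidate);
}
```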
Raphael S. Carvalho
846f0bd16e sstables: Fix incremental selection with compound sstable set
Incremental selection may not work properly for LCS and ICS due to a
use-after-free bug in the partitioned set, which came into existence
after the compound set was introduced.

The use-after-free happens because the partitioned set wasn't taking into
account that the next position can become the current position in the
next iteration, where it will be used by all selectors managed by the
compound set. So if the next position is freed while being used as the
current position, subsequent selectors would find the current position
freed, making them produce incorrect results.

Fix this by moving ownership of the next pos from incremental_selector_impl
to incremental_selector, which makes it more robust, as the latter knows
better when the selection is done with the next pos. incremental_selector
will still return ring_position_view to avoid copies.

Fixes #8802.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210611130957.156712-1-raphaelsc@scylladb.com>
2021-06-13 16:45:07 +03:00
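
A sketch of the ownership move (stand-in types; the real code deals with ring_position_view and schema-aware comparisons):

```cpp
#include <optional>
#include <string>

struct ring_position { std::string token; };             // stand-in type
struct ring_position_view { const ring_position* pos; };  // non-owning view

class incremental_selector_impl {
public:
    // Returns the next position by value; the *caller* now keeps it alive.
    virtual ring_position next_position() = 0;
    virtual ~incremental_selector_impl() = default;
};

class incremental_selector {
    incremental_selector_impl& _impl;
    std::optional<ring_position> _next_pos;   // owned here, not in the impl
public:
    explicit incremental_selector(incremental_selector_impl& impl)
        : _impl(impl) {}
    // The view stays valid when "next" becomes "current" on the following
    // iteration, because the wrapper controls when _next_pos is replaced.
    ring_position_view select_next() {
        _next_pos = _impl.next_position();
        return ring_position_view{&*_next_pos};
    }
};
```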
Tomasz Grabiec
7521301b72 Merge "raft: add tests for non-voters and fix related bugs" from Kostja
Add test coverage inspired by etcd for non-voter servers,
and fix issues discovered when testing.

* scylla-dev/raft-learner-test-v4:
  raft: (testing) test non-voter can vote
  raft: (testing) test receiving a confchange in a snapshot
  raft: (testing) test voter-non-voter config change loop
  raft: (testing) test non-voter doesn't start election on election timeout
  raft: (testing) test what happens when a learner gets TimeoutNow
  raft: (testing) implement a test for a leader becoming non-voter
  raft: style fix
  raft: step down as a leader if converted to a non-voter
  raft: improve configuration consistency checks
  raft: (testing) test that non-voter stays in PIPELINE mode
  raft: (testing) always return fsm_debug in create_follower()
2021-06-12 21:36:47 +03:00
Nadav Har'El
9774c146cc cql-pytest: add test for connecting with different SSL/TLS versions
This is a reproducer for issue #8827: it checks that a client which
tries to connect to Scylla with an unsupported version of SSL or TLS
gets the expected error alert - not some sort of unexpected EOF.

Issue #8827 is still open, so this test is still xfailing. However,
I verified that with a fix for this issue, the test passes.

The test also prints which protocol versions worked - so it also helps
check issue #8837 (about the ancient SSL protocol being allowed).

Refs #8837
Refs #8827

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210610151714.1746330-1-nyh@scylladb.com>
2021-06-12 21:36:47 +03:00
Michael Livshin
2bbc293e22 tests: improve error reporting of test_env::reusable_sst()
Distinguish the "no such sstable" case from any reading errors.

While at it, coroutinize the function.

Refs #8785.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Message-Id: <20210610113304.264922-1-michael.livshin@scylladb.com>
2021-06-11 19:06:43 +02:00
Pavel Emelyanov
773d2fe2a4 alternator: Drop storage service from executor
It's completely unused there.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-06-11 18:05:11 +03:00
Konstantin Osipov
2be8a73c34 raft: (testing) test non-voter can vote
When a non-voter is asked for a vote, it must vote
to preserve liveness. In Raft, servers respond
to messages without consulting their current configuration,
and the non-voter may not have the latest configuration
when it is asked to vote.
2021-06-11 17:16:57 +03:00
Konstantin Osipov
eaf32f2c3c raft: (testing) test receiving a confchange in a snapshot 2021-06-11 17:16:56 +03:00
Konstantin Osipov
d08ad76c24 raft: (testing) test voter-non-voter config change loop 2021-06-11 17:16:55 +03:00
Konstantin Osipov
6e4619fe87 raft: (testing) test non-voter doesn't start election on election timeout 2021-06-11 17:16:55 +03:00
Konstantin Osipov
c8ae13a392 raft: (testing) test what happens when a learner gets TimeoutNow
Once a learner receives TimeoutNow it becomes a candidate, discovers it
can't vote, doesn't increase its term, and converts back to a
follower. Once entries arrive from a new leader, it updates its
term.
2021-06-11 17:16:55 +03:00
Konstantin Osipov
a972269630 raft: (testing) implement a test for a leader becoming non-voter 2021-06-11 17:16:55 +03:00
Konstantin Osipov
3e6fd5705b raft: (testing) test that non-voter stays in PIPELINE mode
Test that configuration changes preserve PIPELINE mode.
2021-06-11 17:07:39 +03:00
Konstantin Osipov
1dfe946c91 raft: (testing) always return fsm_debug in create_follower()
create_follower() is a test helper, so it's OK to return
a test-enabled FSM from it.
This will be used in a subsequent patch/test case.
2021-06-11 12:24:43 +03:00
Alejo Sanchez
ff34a6515d raft: replication test: fix elect_new_leader
Recently, the logic of elect_new_leader was changed to allow the old
leader to vote for the new candidate. But the implementation is wrong, as
it re-connects the old leader in all cases, regardless of whether the
nodes were already disconnected.

First check whether both the old leader and the requested new leader are
connected, and only in that case let the old leader participate in the
election.

There were occasional hangs in the loop of elect_new_leader because
other nodes besides the candidate were ticked. This patch fixes the
loop by removing the ticks inside it.

The loop is needed to handle prevote corner cases (e.g. 2 nodes).

While there, also wait for the log on all followers, to prevent a
previously dropped leader from becoming a dueling candidate.

And update _leader only if it has changed.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Message-Id: <20210609193945.910592-3-alejo.sanchez@scylladb.com>
2021-06-10 12:36:25 +02:00
Nadav Har'El
b26fcf5567 test/alternator: increase timeouts in test_tracing.py
The query tracing tests in test/alternator's test_tracing.py had one
timeout of 30 seconds to find the trace, and one unclearly-coded timeout
for finding the right content for the trace. We recently saw both
timeouts exceeded in tests, but only rarely and only in debug mode,
in a run 100 times slower than normal.

This patch increases both timeouts to 100 seconds. Whatever happens then,
we win: If the test stops failing, we know the new timeout was enough.
If the test continues to fail, we will be able to conclude that we have a
real bug - e.g., perhaps one of the LWT operations has a bug causing it
to hang indefinitely.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210608205026.1600037-1-nyh@scylladb.com>
2021-06-10 09:19:01 +03:00
Tomasz Grabiec
419ee84d86 Merge "sstable: validate first and last keys ordering" from Benny
In #8772, an assert validating first token <= last token
failed in leveled_manifest::overlapping.

It is unclear how we got to that state, so add validation
in sstable::set_first_and_last_keys() that the to-be-set
first and last keys are well ordered.
Otherwise, throw malformed_sstable_exception.

set_first_and_last_keys is called both on the write path
from the sstable writer before the sstable is sealed,
and on the open/load path via update_info_for_opened_data().

This series also fixes issues in unit tests with regard to first/last
keys, so they won't fail the validation.

Refs #8772

Test: unit(dev)
DTest: next-gating(dev), materialized_views_test:TestMaterializedViews.interrupt_build_process_and_resharding_half_to_max_test(debug)

* tag 'validate-first-and-last-keys-ordering-v1':
  sstable: validate first and last keys ordering
  test: lib: reusable_sst: save unexpected errors
  test: sstable_datafile_test: stcs_reshape_test: use token_generation_for_current_shard
  test: sstable_test: define primary key in schema for compressed sstable
2021-06-09 14:43:02 +02:00
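
A sketch of the added validation (a generic three-way comparator stands in for the schema-aware key comparison; names hypothetical):

```cpp
#include <stdexcept>
#include <string>

struct malformed_sstable_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Called both when sealing a freshly written sstable and when opening an
// existing one via update_info_for_opened_data().
template <typename Key, typename TriCompare>
void check_first_and_last_keys(const Key& first, const Key& last,
                               TriCompare cmp) {
    if (cmp(first, last) > 0) {
        throw malformed_sstable_exception(
                "first and last keys of the sstable are misordered");
    }
}
```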
Tomasz Grabiec
ce7a404f17 Merge "Cleanups/refactoring for Raft Group 0" from Kostja
* scylla-dev/raft-group-0-part-1-rebase:
  raft: (service) pass Raft service into storage_service
  raft: (service) add comments for boot steps
  raft: add ordering for raft::server_address based on id
  raft: (internal) simplify construction of tagged_id
  raft: (internal) tagged_id minor improvements
2021-06-09 10:48:05 +02:00
Konstantin Osipov
267a8e99ad raft: (service) pass Raft service into storage_service
Raft group 0 initialization and configuration changes
should be integrated with Scylla cluster assembly,
which happens when starting the storage service and joining
the cluster. Prepare for this.

Since the Raft service depends on the query processor, and the query
processor depends on the storage service, break the dependency loop by
splitting Raft initialization into two steps: first start an
under-constructed instance of the "sharded" Raft service, which accepts
an under-constructed instance of the "sharded" query_processor and is
then passed into the storage service start function; then load the local
state of the Raft groups from the system tables once the query processor
starts.

Consistently abbreviate the raft_services instance as raft_svcs, as
is the convention in Scylla.

Update the tests.
2021-06-08 14:52:32 +03:00
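
A hedged sketch of the two-step start-up using Seastar's sharded<> services (illustrative types and arguments, not Scylla's actual ones):

```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sharded.hh>
#include <functional>

struct query_processor {
    seastar::future<> stop() { return seastar::make_ready_future<>(); }
};

struct raft_services {
    explicit raft_services(seastar::sharded<query_processor>&) {}
    seastar::future<> init_from_system_tables() {
        return seastar::make_ready_future<>();
    }
    seastar::future<> stop() { return seastar::make_ready_future<>(); }
};

struct storage_service {
    explicit storage_service(seastar::sharded<raft_services>&) {}
    seastar::future<> stop() { return seastar::make_ready_future<>(); }
};

seastar::future<> boot(seastar::sharded<query_processor>& qp,
                       seastar::sharded<raft_services>& raft_svcs,
                       seastar::sharded<storage_service>& ss) {
    // Step 1: start Raft with a reference to the still-under-construction
    // query processor and pass it into the storage service start function.
    co_await raft_svcs.start(std::ref(qp));
    co_await ss.start(std::ref(raft_svcs));
    co_await qp.start();
    // Step 2: with the query processor up, load the local Raft group
    // state from the system tables on every shard.
    co_await raft_svcs.invoke_on_all([] (raft_services& r) {
        return r.init_from_system_tables();
    });
}
```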
Konstantin Osipov
d42d5aee8c raft: (internal) simplify construction of tagged_id
Make it easy to construct tagged_id from UUID.
2021-06-08 14:52:32 +03:00
Konstantin Osipov
c9a23e9b8a raft: (internal) tagged_id minor improvements
Introduce a syntax helper tagged_id::create_random_id(),
used to create a new Raft server or group id.

Provide a default ordering for tagged ids, for use
in Raft leader discovery, which selects the smallest
id as the leader.
2021-06-08 14:52:32 +03:00
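
An illustrative sketch of the two helpers (assumes Scylla's utils::UUID and utils::make_random_uuid(); the real tagged_id shape may differ):

```cpp
#include "utils/UUID.hh"  // assumed: utils::UUID, utils::make_random_uuid()

struct tagged_id {
    utils::UUID id;

    // Syntax helper used when creating a new Raft server or group id.
    static tagged_id create_random_id() {
        return tagged_id{utils::make_random_uuid()};
    }
    // Default ordering: leader discovery picks the smallest id as leader.
    bool operator<(const tagged_id& other) const { return id < other.id; }
    bool operator==(const tagged_id& other) const { return id == other.id; }
};
```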
Nadav Har'El
355dbf2140 test/cql-pytest: option for running the tests over SSL
This patch adds a "--ssl" option to test/cql-pytest's pytest, as well as
to the run script test/cql-pytest/run. When "test/cql-pytest/run --ssl"
is used, Scylla is started listening for encrypted connections on its
standard port (9042) - using a temporary unsigned certificate. Then, the
individual tests connect to this encrypted port using TLSv1.2 (Scylla
doesn't support earlier versions of SSL) instead of plain TCP.

This "--ssl" feature allows writing test which stress various aspects of
the connection (e.g., oversized requests - see PR #8800), and then be
able to run those tests in both TCP and SSL modes.

Fixes #8811

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210607200329.1536234-1-nyh@scylladb.com>
2021-06-08 11:43:20 +02:00
Avi Kivity
3e3003fcc1 Merge 'cql3: limit the concurrency of indexed statements' from Piotr Sarna
Indexed select statements fetch primary key information from
their internal materialized views and then use it to query
the base table. Unfortunately, the current mechanism for retrieving
base table rows makes it easy to overwhelm the replicas with unbounded
concurrency - the number of concurrent ops is increased exponentially
until a short read is encountered, but that is not enough to cap the
concurrency: if data is fetched row-by-row, short reads usually don't
occur, and as a result it's easy to see concurrency of 1M or higher.
In order to avoid overloading the replicas, the concurrency of indexed
queries is now capped at 4096 and additionally throttled once enough
results have been fetched. For paged queries this means that the query
returns as soon as 1MB of data is ready, and for unpaged ones the
concurrency is no longer doubled once the previous iteration has fetched
1MB of results.

The fixed 4096 value can be debated; the reasoning is as follows:
for 2KiB rows - moderately large, but not huge - it results in
fetching about 8MiB of data, close to the 10MB granularity used by
replicas. For 200B rows, which is rather small, the result would still be
around 1MB.
At the same time, 4096 separate tasks also mean 4096 allocations,
so increasing the number also strains the allocator.

Fixes #8799

Tests: unit(release),
       manual: observing metrics of modified index_paging_test

Closes #8814

* github.com:scylladb/scylla:
  cql3: limit the transitional result size for indexed queries
  cql3: return indexed pages after 1MB worth of data
  cql3: limit the concurrency of indexed statements
2021-06-07 18:00:51 +03:00
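
The growth rule boils down to a few lines; a sketch (illustrative, not the exact cql3 code):

```cpp
#include <algorithm>
#include <cstddef>

constexpr size_t max_concurrency = 4096;
constexpr size_t result_bytes_soft_limit = 1 << 20;  // 1MB

// Concurrency still grows exponentially, but never above the cap, and it
// stops growing once 1MB worth of results has already been fetched.
size_t next_concurrency(size_t current, size_t bytes_fetched_so_far) {
    if (bytes_fetched_so_far >= result_bytes_soft_limit) {
        return current;                        // throttle: stop doubling
    }
    return std::min(current * 2, max_concurrency);
}
```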
Gleb Natapov
01b6a2eb38 raft: randomized_nemesis_test: tick virtual clock less aggressively
Currently each tick of the virtual clock immediately schedules the next one
at the end of the task queue, but this is too aggressive. If a tick
generates work that needs two tasks to be scheduled one after another,
such an implementation will make the task queue grow without bound.
Considering that in debug mode even a ready future causes preemption, and
that task queue shuffling may cause two or more ticks to be executed
without any other work done in between, it is very easy to get into such
a situation.

The patch changes the virtual clock to tick only when a shard is idle.
Message-Id: <20210606140305.2930189-1-gleb@scylladb.com>
2021-06-07 16:54:56 +02:00
Piotr Sarna
df0d44486a cql3: limit the transitional result size for indexed queries
Unpaged indexed queries already have a concurrency limit of 4096,
but now the concurrency is further limited by the number of bytes
previously fetched. Once this number reaches 1MB, the concurrency will
not be increased in consecutive queries, to avoid overload.
2021-06-07 16:29:18 +02:00
Pavel Solodovnikov
76bea23174 treewide: reduce header interdependencies
Use forward declarations wherever possible.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>

Closes #8813
2021-06-07 15:58:35 +03:00
Tomasz Grabiec
50d64646cd Merge "raft: replication test fixes and OOP refactor" from Alejo
Feature requests, fixes, and an OOP refactor of replication_test.

Note: all known bugs and hangs are now fixed.

A new helper class "raft_cluster" is created.
Each move of a helper function into the class has its own commit.
New helpers are provided.

To simplify the code, for now only a single apply function can be set per
raft_cluster. No tests were using it in any other way. In the future,
there could be custom apply functions per server, dynamically assigned,
if this becomes needed.

* alejo/raft-tests-replication-02-v3-30: (66 commits)
  raft: replication test: wait for log for both index and term
  raft: replication test: reset network at construction
  raft: replication test: use lambda visitor for updates
  raft: replication test: move structs into class
  raft: replication test: move data structures to cluster class
  raft: replication test: remove shared pointers
  raft: replication test: move get_states() to raft_cluster
  raft: replication test: test_server inside raft_cluster
  raft: replication test: rpc declarative tests
  raft: replication test: add wait_log
  raft: replication test: add stop and reset server
  raft: replication test: disconnect 2 support
  raft: replication test: explicit node_id naming
  raft: replication test: move definitions up
  raft: replication test: no append entries support
  raft: replication test: fix helper parameter
  raft: replication test: stop servers out of config
  raft: replication test: wait log when removing leader from configuration
  raft: replication test: only manipulate servers in configuration
  raft: replication test: only cancel rearm ticker for removed server
  ...
2021-06-06 19:18:49 +03:00
Piotr Sarna
cb17aa1e53 Merge 'test/alternator: rewrite run script to share code with cql-pytest's run script' from Nadav Har'El
In this small series, I rewrite test/alternator/run in Python using the utility
functions developed for test/cql-pytest. In the future, we should do the same to
test/redis/run and test/scylla-gdb/run.

The benefit of this rewrite is less code duplication (all run scripts start
with the same duplicate code to deal with temporary directories, Scylla IP
addresses, etc.), but most importantly - future fixes we make to cql-pytest
(e.g., parameters needed to start Scylla efficiently, how to shut down
Scylla, etc.) will appear automatically in the alternator test without
needing to remember to change both.

Another benefit is that test/alternator/run will now be Python, not a shell
script. This should make it easier to integrate it into test.py (refs #6212) in
the future - if we want to.

Closes #8792

* github.com:scylladb/scylla:
  test/alternator: rewrite test/alternator/run script in Python
  test/cql-pytest: make test run code more general
2021-06-06 19:18:49 +03:00
Avi Kivity
872cd8f692 test: adjust copyright statement to use ScyllaDB rather than old name 2021-06-06 19:18:49 +03:00
Avi Kivity
a55b434a2b treewide: extend copyright statements to present day 2021-06-06 19:18:49 +03:00
Pavel Solodovnikov
2187a59089 treewide: move service::cas_request out from storage_proxy.hh
And remove all remaining inclusions of `storage_proxy.hh` in the
headers.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-06-06 19:18:49 +03:00
Pavel Solodovnikov
e0749d6264 treewide: some random header cleanups
Eliminate unused includes and replace some more includes
with forward declarations where appropriate.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-06-06 19:18:49 +03:00
Gleb Natapov
bb822c92ab raft: change raft::rpc api to return void for most sending functions
Most Raft packets are sent rarely, during special phases of the
protocol (like election or leader stepdown). The protocol itself does
not care whether a packet is sent or dropped, so returning futures from
their send functions does not serve any purpose. Change raft's rpc
interface to return void for all packet types but append_request. We
still want a future from sending append_request for backpressure
purposes: the replication protocol is more efficient when there is no
packet loss, so it is better to pause a sender than to drop packets
inside the rpc. The rpc is still allowed to drop append_requests if
overloaded.
2021-06-06 19:18:49 +03:00
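
The resulting interface shape, sketched with hypothetical signatures (the real raft::rpc carries more message types and arguments):

```cpp
#include <seastar/core/future.hh>
#include <cstdint>

using server_id = uint64_t;
struct vote_request {};
struct vote_reply {};
struct append_request {};

class rpc {
public:
    // Rare, phase-specific messages: losing one is harmless, so these
    // are fire-and-forget and the rpc may drop them.
    virtual void send_vote_request(server_id to, vote_request req) = 0;
    virtual void send_vote_reply(server_id to, vote_reply reply) = 0;
    // Replication traffic keeps a future for backpressure: the sender
    // can pause instead of the rpc dropping packets under overload
    // (though the rpc may still drop append_requests if overloaded).
    virtual seastar::future<> send_append_entries(server_id to,
                                                  append_request req) = 0;
    virtual ~rpc() = default;
};
```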
Benny Halevy
3f9bad0f0a test: compound_test: use tests::random
For reproducibility.

Test: compound_test(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210602061910.286893-2-bhalevy@scylladb.com>
2021-06-06 09:21:23 +03:00
Benny Halevy
40e032ff8b test: compound_test: switch to seastar test framework
Prepare for using tests::random instead of std::rand
for reproducibility.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210602061910.286893-1-bhalevy@scylladb.com>
2021-06-06 09:21:23 +03:00
Calle Wilund
3b55ef36d1 cf_prop_defs: Fix extensions merge to handle removal
Fixes #8773

When refactored for cdc, properties -> extensions merge
was modified so it did not handle _removal_ (i.e. an
extension function returning null -> no entry in new map).

This causes certain enterprise extensions to not be able
to disable themselves.

Fixed by filtering existing extensions by property keywords.
Unit test added.

Closes #8774
2021-06-06 09:21:23 +03:00
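
A sketch of the fixed merge step (stand-in types; the real code merges schema extensions keyed by property keywords):

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>

using extension = std::string;  // stand-in for the real extension type
using extension_fn =
    std::function<std::optional<extension>(const std::string& property)>;

// An extension function returning no value must *remove* the existing
// entry, rather than silently leaving the old one in the merged map.
void merge_extensions(std::map<std::string, extension>& extensions,
                      const std::map<std::string, extension_fn>& fns,
                      const std::string& property) {
    for (auto& [name, fn] : fns) {
        if (auto ext = fn(property)) {
            extensions[name] = std::move(*ext);  // added or updated
        } else {
            extensions.erase(name);              // removal now handled
        }
    }
}
```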
Nadav Har'El
f22ed3ff5c test/alternator: reduce very high timeout in one tracing test
In test_tracing.py::test_slow_query_log, there was what looked like
an innocent 30-second timeout, but it was in fact an 8-minute
timeout - because it started by sleeping 1 second, then 2 seconds,
then 3, and so on up to 30 seconds. Such a high timeout is frustrating
when trying to debug failures in the test - which is only expected to
take 2 seconds (and all of that because of an artificial timeout).

So fix the loop to stop iterating after 60 seconds (a compromise
between 30 seconds and 8 minutes...), sleeping a constant amount
between iterations.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210601150631.1037158-1-nyh@scylladb.com>
2021-06-06 09:21:23 +03:00