Commit Graph

27472 Commits

Author SHA1 Message Date
Wojciech Mitros
1ff72ca0a6 sstables: move kl row_consumer
In preparation for the next patch combining row_consumer and
mp_row_consumer_k_l, move row_consumer next to mp_row_consumer_k_l.

Because row_consumer is going to be removed, we retire some
old tests for different implementations of the row_consumer
interface; as a result, we don't need to expose internal
types of kl sstable reader for tests, so all classes from
reader_impl.hh are moved to reader.cc, and the reader_impl.hh
file is deleted, and the reader.cc file has an analogous
structure to the reader.cc file in sstables/mx directory.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2021-07-21 18:04:22 +02:00
Wojciech Mitros
fc17c48bc9 sstables: merge consumer_m into mp_row_consumer_m
The consumer_m interface has only one implementation:
mp_row_consumer_m; and we're not planning other ones,
so to reduce the number of inheritances, and the number
of lines in the sstable reader, these classes may be
combined.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2021-07-21 17:36:10 +02:00
Wojciech Mitros
fbb56e930c sstables: move mp_row_consumer_m
To make next patch combining consumer_m and mp_row_consumer_m
more readable, move mp_row_consumer_m next to consumer_m.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2021-07-21 17:36:04 +02:00
Botond Dénes
8fc55fa5bf reader_concurrency_semaphore: get rid of struct permit_list
struct permit_list exists so the intrusive list declaration which needs
the definition of reader_permit can be hidden in the .cc. But it turns
out that if the hook type is fully spelled out, the intrusive list
declaration doesn't need T to be defined. Exploit this to get rid of
this extra indirection.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210720073121.63027-2-bdenes@scylladb.com>
2021-07-20 10:35:12 +03:00
Botond Dénes
11b39cbc23 reader_concurrency_semaphore: merge permit_stats into stats
If there was any reason to have them separate when permit_stats was
conceived, it is gone now, so merge the two.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210720073121.63027-1-bdenes@scylladb.com>
2021-07-20 10:35:12 +03:00
Tomasz Grabiec
a8528cb24d lsa: Fix uninitialized field access resulting in hangs during segment compaction
_free_space may be initialized with garbage, so the kind() getter should
only look at the bit which corresponds to the kind. Misclassification
of a segment as being of a different kind may result in a hang during
segment compaction.

Surfaced in debug mode build where the field is filled with 0xbebebebe.
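
The idea of reading only the kind bit can be sketched as follows. This is a minimal Python model, not the LSA code: the bit position, the mask names and the "buggy" getter are all assumptions made for illustration.

```python
# Hypothetical layout: the top bit of a 32-bit word stores the segment kind,
# the remaining bits store the free space. Names and layout are assumptions.
KIND_BIT = 1 << 31
FREE_SPACE_MASK = KIND_BIT - 1


def kind_buggy(word: int) -> str:
    # Illustrative mistake: inspects the whole word, so garbage in the
    # free-space bits changes the reported kind.
    return "regular" if word == 0 else "bufs"


def kind_fixed(word: int) -> str:
    # Looks only at the bit that encodes the kind.
    return "bufs" if word & KIND_BIT else "regular"


# Debug-style fill pattern in the free-space bits, kind bit left cleared.
garbage = 0xBEBEBEBE & FREE_SPACE_MASK
```

With the masked getter, garbage in the free-space bits no longer affects the classification.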

Introduced in b5ca0eb2a2.

Fixes #9057
Message-Id: <20210719232734.443964-1-tgrabiec@scylladb.com>
2021-07-20 02:33:21 +03:00
Tomasz Grabiec
393b90112f gdb: segment-descs: Support debug mode builds
Debug mode builds have a different implementation of segment_store in LSA.
Message-Id: <20210719232125.442458-1-tgrabiec@scylladb.com>
2021-07-20 02:33:18 +03:00
Gleb Natapov
aa8c6b85fb raft: do not apply empty command list
Do not call user's state machine apply() if there is nothing to apply.
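
A minimal sketch of the guard, in Python rather than the project's C++; `apply_committed` and `RecordingStateMachine` are hypothetical names used only for illustration.

```python
class RecordingStateMachine:
    """Toy state machine that records every batch passed to apply()."""

    def __init__(self):
        self.batches = []

    def apply(self, commands):
        self.batches.append(list(commands))


def apply_committed(state_machine, commands):
    # Skip the call entirely when there is nothing to apply.
    if not commands:
        return False
    state_machine.apply(commands)
    return True
```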

Message-Id: <YO1dMitXnZhZlmra@scylladb.com>
2021-07-19 18:26:18 +02:00
Nadav Har'El
36ec1d792e Merge 'cql-pytest: Test selecting from indexed table using only clustering key' from Jan Ciołek
Add examples from issue #8991 to tests
Both of these tests pass on `cassandra 4.0` but fail on `scylla 4.4.3`

The first test checks that selecting values from an indexed table using only the clustering key returns correct values.
The second test checks that performing this operation requires filtering.

The filtering test looks similar to [the one for #7608](1924e8d2b6/test/cql-pytest/test_allow_filtering.py (L124)) but there are some differences - here the table has two clustering columns and an index, so it could test different code paths.

Contains a quick fix for the `needs_filtering()` function to make these tests pass.
It returns `true` for this case and the one described in #7708.

This implementation is a bit conservative - it might sometimes return `true` where filtering isn't actually needed, but at least it prevents scylla from returning incorrect results.

Fixes #8991.
Fixes #7708.

Closes #8994

* github.com:scylladb/scylla:
  cql3: Fix need_filtering on indexed table
  cql-pytest: Test selecting using only clustering key requires filtering
  cql-pytest: Test selecting from indexed table using clustering key
2021-07-19 18:23:08 +03:00
Tomasz Grabiec
049a1ef729 Merge 'flat_mutation_reader: downgrade_to_v1 - reset state of rt_assembler' from enedil
The downgrade_to_v1 didn't reset the state of range tombstone assembler
in case of the calls to next_partition or fast_forward_to, which caused
a situation where the closing range tombstone change is cleared from the
buffer before being emitted, without notifying the assembler. This patch
fixes the behaviour in fast_forward_to as well.

Fixes #9022

Closes #9023

* github.com:scylladb/scylla:
  flat_mutation_reader: downgrade_to_v1 - reset state of rt_assembler
  flat_mutation_reader: introduce public method returning the default size of internal buffer.
2021-07-19 17:10:23 +02:00
Jan Ciolek
54149242b4 cql3: Fix need_filtering on indexed table
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes #8991.
Fixes #7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-07-19 16:22:17 +02:00
Michał Radwański
67d99e02a7 flat_mutation_reader: downgrade_to_v1 - reset state of rt_assembler
The downgrade_to_v1 didn't reset the state of range tombstone assembler
in case of the calls to next_partition or fast_forward_to, which caused
a situation where the closing range tombstone change is cleared from the
buffer before being emitted, without notifying the assembler. This patch
fixes the behaviour in fast_forward_to as well.

Fixes #9022
2021-07-19 15:54:26 +02:00
Michał Radwański
c4089007a2 flat_mutation_reader: introduce public method returning the default size
of internal buffer.

This method is useful in tests that examine behaviour after the buffer
has been filled up.
2021-07-19 15:54:13 +02:00
Nadav Har'El
4c6dc5fce2 Merge 'continuous_data_consumer: properly skip bytes at the end of a range' from Wojciech Mitros
When skipping bytes at the end of a continuous_data_consumer range,
the position of the consumer is moved after the skipped bytes, but
the position of the underlying input_stream is not.

This patch adds skipping of the underlying input_stream, to make
its position consistent with the position of the consumer.

Fixes #9024

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>

Closes #9039

* github.com:scylladb/scylla:
  tests: add test for skipping bytes at end of consumer
  continuous_data_consumer: properly skip bytes at the end of a range
2021-07-19 15:57:26 +03:00
Botond Dénes
27fbca84f6 reader_concurrency_semaphore: remove prethrow_action
The semaphore accepts a functor in its constructor which is run just
before throwing on wait queue overload. This is used exclusively to bump
a counter in the database::stats, which counts queue overloads. However,
there is now an identical counter in
reader_concurrency_semaphore::stats, so the database can just use that
directly and we can retire the now unused prethrow action.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210716111105.237492-1-bdenes@scylladb.com>
2021-07-19 15:47:37 +03:00
Wojciech Mitros
507bdfc36a tests: add test for skipping bytes at end of consumer
The new test confirms that the regression issue, where
we didn't correctly skip bytes at the end of a
continuous_data_consumer range, is fixed.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2021-07-19 14:42:38 +02:00
Wojciech Mitros
7107e32390 continuous_data_consumer: properly skip bytes at the end of a range
When skipping bytes at the end of a continuous_data_consumer range,
the position of the consumer is moved after the skipped bytes, but
the position of the underlying input_stream is not.

This patch adds skipping of the underlying input_stream, to make
its position consistent with the position of the consumer.
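
The invariant described above can be illustrated with a toy model. This is a sketch in Python, not the Seastar/Scylla code; `RangeConsumer` and its fields are hypothetical.

```python
import io


class RangeConsumer:
    """Toy consumer of the byte range [0, end) on top of a seekable stream."""

    def __init__(self, stream, end):
        self.stream = stream
        self.end = end
        self.position = 0

    def skip(self, n):
        # Never skip past the end of the consumer's range.
        n = min(n, self.end - self.position)
        self.position += n
        # Advance the underlying stream too; without this line the stream
        # position would lag behind the consumer position.
        self.stream.seek(n, io.SEEK_CUR)
        return n
```

After a skip that reaches the end of the range, a subsequent read from the stream starts right after the range rather than inside it.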

Fixes #9024

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2021-07-19 11:43:30 +02:00
Piotr Sarna
38afef71b9 Merge 'Service Level Controller: Stop polling distributed data..
... when decommissioned (reworked)' from Eliran Sinvani

This is a rework of #8916. The polling loop of the service level
controller queries a distributed table in order to detect configuration
changes. If a node gets decommissioned, this loop continues to run until
shutdown. If a node stays in the decommissioned mode without being shut
down, the loop will fail to query the table, and this will result in
warnings and eventually errors in the log. This is not really harmful,
but it adds unnecessary noise to the log. The series below lays the
infrastructure for observing storage service state changes, which is
eventually used to break the loop upon preparation for
decommissioning.

Tests: Unit test (dev), failing tests in jenkins.

Fixes #8836

The previous merge (possibly due to conflict resolution) contained a
misplaced get that caused an abort on shutdown.

Closes #9035

* github.com:scylladb/scylla:
  Service Level Controller: Stop configuration polling loop upon leaving the cluster
  main: Stop using get_local_storage_service in main
2021-07-19 10:52:42 +02:00
Benny Halevy
3700702e90 cmake: update compaction source files location
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210718120906.701185-1-bhalevy@scylladb.com>
2021-07-19 11:47:35 +03:00
Botond Dénes
5aa733f933 sstables/mx/writer: initialize _range_tombstones at the end of the ctor
We need a permit to initialize said object which makes the semaphore
used and hence trigger an error if an exception is thrown in the
constructor. Move the initialization to the end of the constructor to
prevent this.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210719040449.9202-1-bdenes@scylladb.com>
2021-07-19 11:43:00 +03:00
Jan Ciolek
9bd62a07c9 cql-pytest: Test selecting using only clustering key requires filtering
Adds test that creates a table with primary key (p, c1, c2)
with a global index on c2 and then selects where c1 = 1 and c2 = 1.

This should require filtering, but doesn't.
Refs #8991.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-07-19 10:24:48 +02:00
Jan Ciolek
a041767aa3 cql-pytest: Test selecting from indexed table using clustering key
Adds test that creates a table with primary key (p, c1, c2)
with a global index on c2 and then selects where c1 = 1 and c2 = 1.

This currently fails.
Refs #8991.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-07-19 10:24:46 +02:00
Avi Kivity
2cfc517874 main, test: adjust number of networking iocbs
Seastar's default limit of 10,000 iocbs per shard is too low for
some workloads (it places an upper bound on the number of idle
connections, above which a crash occurs). Use the new Seastar
feature to raise the default to 50,000.

Also multiply the global reservation by 5, and round it upwards
so the number is less weird. This prevents io_setup() from failing.

For tests, the reservation is reduced since they don't create large
numbers of connections. This reduces surprise test failures when they
are run on machines that haven't been adjusted.

Fixes #9051

Closes #9052
2021-07-18 14:38:44 +03:00
Avi Kivity
9c3f8028f1 Update tools/java submodule (SLES 15)
* tools/java 79a441972d...4ef8049e07 (1):
  > dist/redhat: change PyYAML filepath to allow installing on SLES15

Fixes #9045.
2021-07-18 14:24:42 +03:00
Raphael S. Carvalho
841e9227f9 table: Document the serialization requirement on sstable set rebuild
In order to avoid data loss bugs that could arise from a lack of
serialization when using the preemptable build_new_sstable_list(),
let's document the serialization requirement.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210714201301.188622-1-raphaelsc@scylladb.com>
2021-07-17 18:09:00 +03:00
Avi Kivity
df822e09e0 Merge "Run test cases in parallel" from Pavel E
"
The debug-mode tests nowadays take ~1 hour to complete on a
24-core threadripper machine. This is mostly because of a bunch
of individual test cases that run sequentially (since they sit
in one test) each taking half-an-hour and longer.

The previous attempt was to break the longest tests into pieces,
and to update the list of long-running tests in suite.yaml file,
but the concern was that the linkage time and disk space would
grow without limits if this continues. Also the long-running tests
list needs to be revisited every so often.

So the new attempt is to resurrect Avi's patch that ran test
cases in parallel for boost tests. This set applies parallelism
to all tests and allows blacklisting those that shouldn't be (the
logalloc needs the very first case to prime_segment_pools so
that other cases run smoothly, thus it cannot be parallelized).

Although this wild parallelism adds an overhead for _each_ test
case this is good enough even for short dev-mode tests (saves
25% of runtime), but greatly relaxes the maintenance of the
"parallelizable list of tests".

For debug tests the problem is not 100% solved. There are 6 cases
that run longer than 30min, while all the others complete much,
much faster. So, excluding those 6 slow cases, the full parallel
run saves 50+% of the runtime -- 60+m now vs 25m with the patch.
Those 6 slowest cases will need more incremental care.

The --parallel-cases mode is not yet the default, because it requires
larger max-aio-nr value to be set, which is not (yet?) automatic.
Also it sometimes hits nr-open-files limit, which also needs more
work.

tests: unit(dev), unit(debug)
"

* 'br-parallel-testpy-3' of https://github.com/xemul/scylla:
  tests: Update boost long tests list
  test.py: Parallelize test-cases run (for boost tests)
  test.py: Prepare BoostTest for running individual cases
  test.py: Prepare TestSuite::create_test() for parallelizm
  test.py: Treat shortname as composite
  test.py: Reformat tabluar output
2021-07-17 13:57:56 +03:00
Pavel Emelyanov
1ed582304d memtable_list: Shorten flush coalescing codeflow
The memtable_list::flush() maintains a shared_promise object
to coalesce the flushers until the get_flush_permit() resolves.
Also it needs to keep the extraneous flushes counter bumped
while doing the flush itself.

All this can be coded in a shorter form and without the need
to carry shared_promise<> around.

tests: unit(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210716164237.10993-1-xemul@scylladb.com>
2021-07-17 00:42:20 +02:00
Avi Kivity
3058c42171 Update seastar submodule
* seastar 8ed9771ae9...ef320940c2 (6):
  > reactor: reactor_backend_aio: allow tuning number of network iocbs
Ref #9051.
  > aio_general_context: flush: handle io_submit short return
  > aio_general_context: prevent overflow
  > file: Do not assume nowait_works by default
  > Merge "reactor: use sched_clock consistently" from Michael
  > testing: Lazily create seastar::app thread
2021-07-16 18:07:10 +03:00
Pavel Emelyanov
9d59f1daf3 tests: Update boost long tests list
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-16 17:25:07 +03:00
Pavel Emelyanov
cbb4837b77 test.py: Parallelize test-cases run (for boost tests)
The parallelism is achieved by listing the content of each (boost)
test and by adding a test for each case found appending the
'--run_test={case_name}' option.
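
The expansion of one test binary into per-case invocations might look roughly like this. It is a sketch, not the test.py code: the helper names are hypothetical, and the listing format (one case name per line, optionally followed by `*`) is an assumption about the output being parsed.

```python
def parse_case_listing(listing: str) -> list[str]:
    """Extract case names from a listing with one name per line.

    Assumption: a trailing '*' marks an enabled case and is not part of
    the name; blank lines are ignored.
    """
    names = []
    for line in listing.splitlines():
        name = line.strip().rstrip("*")
        if name:
            names.append(name)
    return names


def per_case_commands(binary: str, case_names: list[str]) -> list[list[str]]:
    # One command line per case, each restricted with --run_test=<case>.
    return [[binary, f"--run_test={name}"] for name in case_names]
```

Each resulting command can then be scheduled as an independent test entry.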

Also, a few tests (logalloc and memtable) have cases that depend on
each other (the former explicitly stated this in the head comment),
so these are marked as "no_parallel_cases" in the suite.yaml file.

In dev mode tests need 2m 5s to run by default. With parallelism
(and updated long-running tests list) -- 1m 35s.

In debug mode there are 6 slow _cases_ that overrun 30 minutes.
They finish last and deserve some special (incremental) care. All
the other tests run ~1h by default vs ~25m in parallel.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-16 17:25:07 +03:00
Pavel Emelyanov
3cac5173b7 test.py: Prepare BoostTest for running individual cases
This means adding the casename argument to its describing class
and handling it:

1. appending to the shortname
2. adding the --run_test= argument to boost args

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-16 17:25:07 +03:00
Pavel Emelyanov
0baee5d423 test.py: Prepare TestSuite::create_test() for parallelizm
The method in question is in charge of creating a single
entry in the list of tests to be run. The BoostTestSuite's
method is about to create several entries and this patch
prepares it for this:

- makes it distinguish individual arguments
- lets it select the test.id value itself

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-16 17:25:07 +03:00
Pavel Emelyanov
a547677502 test.py: Treat shortname as composite
When running tests in parallel-cases mode the test.uname must
include the case name to make different log and xml files for
different runs and to show which exact case is run when shown
by the tabular-output. At the same time the test shortname
identifies the binary with the whole test.

This patch makes class Test treat the shortname argument as
a dot-separated string where the 0th component is the binary
with the test and the rest is how test identifies itself.
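
A minimal sketch of that split, assuming a hypothetical helper name; the real class Test may structure this differently.

```python
def split_shortname(shortname: str) -> tuple[str, str]:
    """Split 'binary.case' into the binary name and the remaining part.

    Only the first dot separates the binary from the rest, so the rest
    may itself contain dots. A shortname without a dot has an empty rest.
    """
    binary, _, rest = shortname.partition(".")
    return binary, rest
```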

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-16 17:25:07 +03:00
Pavel Emelyanov
f188dd3396 test.py: Reformat tabluar output
This change solves several issues that would arise with the
case-by-case run.

First, the currently printed name is "$binary_name.$id". For
case-by-case run the binary name would coincide for many cases
and it will be inconvenient to identify the test case. So
the test's uname is printed instead.

Second, the test's uname doesn't contain the suite name (unlike the
test binary name which does), so this patch also adds the
explicit suite name back as a separate column (like MODE).

Third, the testname + casename string length will be far above
the allocated 50 characters, so the test name is moved at the
tail of the line.

Fourth, the total number of cases is 2100+, the field of 7
characters is not enough to print it, so it's extended.

Finally the test.py output would look like this for parallel run:
================================================================================
[N/TOTAL]   SUITE    MODE   RESULT   TEST
------------------------------------------------------------------------------
[1/2108]     raft     dev   [ PASS ] etcd_test.test_progress_leader.40 0.06s
[2/2108]     raft     dev   [ PASS ] etcd_test.test_vote_from_any_state.45 0.03s
[3/2108]     raft     dev   [ PASS ] etcd_test.test_progress_flow_control.43 0.04s
[4/2108]     raft     dev   [ PASS ] etcd_test.test_progress_resume_by_append_resp.41 0.05s
[5/2108]     raft     dev   [ PASS ] etcd_test.test_leader_election_overwrite_newer_logs.44 0.04s
[6/2108]     raft     dev   [ PASS ] etcd_test.test_progress_paused.42 0.05s
[7/2108]     raft     dev   [ PASS ] etcd_test.test_log_replication_2.47 0.06s
...

or like this for regular:
================================================================================
[N/TOTAL]   SUITE    MODE   RESULT   TEST
------------------------------------------------------------------------------
[1/184]      raft     dev   [ PASS ] fsm_test.41 0.06s
[2/184]      raft     dev   [ PASS ] etcd_test.40 0.06s
[3/184]      cql      dev   [ PASS ] cassandra_cql_test.2 1.87s
[4/184]      unit     dev   [ PASS ] btree_stress_test.30 1.82s
...

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-16 17:24:36 +03:00
Tomasz Grabiec
97aa335a60 Merge "test: raft: randomized_nemesis_test: refactors and improvements" from Kamil
A couple of improvements to prepare for the next patchset.

We move `logical_timer` and `ticker` to their own headers due to the
generality of these data structures. They are not very specific to the
test.

`logical_timer` is extended with a `schedule` function, which allows
scheduling any given function to be called at the given time point.
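
A rough Python model of such a timer, for illustration only; the class and method names below are assumptions and do not mirror the C++ test helper.

```python
import heapq
import itertools


class LogicalTimer:
    """Toy logical clock that runs callbacks when their time point is reached."""

    def __init__(self):
        self._now = 0
        self._queue = []  # heap of (time point, sequence number, callback)
        self._seq = itertools.count()  # tie-breaker keeps insertion order

    def now(self):
        return self._now

    def schedule(self, at, fn):
        heapq.heappush(self._queue, (at, next(self._seq), fn))

    def tick(self, ticks=1):
        for _ in range(ticks):
            self._now += 1
            # Run everything that is due at or before the current time.
            while self._queue and self._queue[0][0] <= self._now:
                _, _, fn = heapq.heappop(self._queue)
                fn()
```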

The interface of `network` in `randomized_nemesis_test` is extended by
`add_grudge` and `remove_grudge` functions for implementing network
partitioning nemeses.
Furthermore, `network` can now be constructed with an arbitrary network
delay, which was previously hardcoded.

`with_env_and_ticker` is now generic w.r.t. return values (previously
`future<>` was assumed).

`environment` exposes a reference to the `network` through a getter.

The `not_a_leader` exception now shows the leader's ID in the exception
message. Useful for logging.

In `logical_timer::with_timeout`, when we time out, we don't just return
`timed_out_error`. The returned exception now actually contains the
original future... well almost; in any case, the user can now do
something different to the future other than simply discarding it.

We also fix some `broken_promise` exceptions appearing in discarded
futures in certain scenarios. See the corresponding commit for detailed
explanation.

We handle `raft::dropped_entry` in the `call` function.

`persistence` is fixed to avoid creating gaps in the log when storing
snapshots and to support complex state types.

Waiting for leader was refactored into a separate function and
generalized (we wait for a set of nodes to elect a leader instead of a
single node to elect itself) to be useful in more situations.

Finally, we introduce `reconfigure`, a higher-level version of
`set_configuration` which performs error handling and supports timeouts.

* kbr/raft-nemesis-improvements-v4:
  test: raft: randomized_nemesis_test: `reconfigure` function
  test: raft: randomized_nemesis_test: refactor waiting for leader into a separate function
  test: raft: randomized_nemesis_test: persistence: avoid creating gaps in the log when storing snapshots
  test: raft: randomized_nemesis_test: persistence: handle complex state types
  test: raft: randomized_nemesis_test: `call`: handle `raft::dropped_entry`
  test: raft: randomized_nemesis_test: impure_state_machine/call: handle dropped channels
  test: raft: randomized_nemesis_test: environment: expose the network
  test: raft: randomized_nemesis_test: configurable network delay and FD convict threshold
  test: raft: randomized_nemesis_test: generalize `with_env_and_ticker`
  test: raft: randomized_nemesis_test: network: `add_grudge`, `remove_grudge` functions
  test: raft: randomized_nemesis_test: move `ticker` to its own header
  test: raft: randomized_nemesis_test: ticker: take `logger` as a constructor parameter
  test: raft: logical_timer: handle immediate timeout
  test: raft: logical_timer: on timeout, return the original future in the exception
  test: raft: logical_timer: add `schedule` member function
  test: raft: randomized_nemesis_test: move `logical_timer` to its own header
  test: raft: include the leader's ID in the `not_a_leader` exception's message
2021-07-16 16:12:05 +02:00
Benny Halevy
a44c06d776 storage_proxy: query: log also errors
If trace-level logging is enabled, also log errors.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210712070509.24102-1-bhalevy@scylladb.com>
2021-07-16 16:12:05 +02:00
Nadav Har'El
5183e0cbe9 Merge 'Fix artificial view update size limit' from Piotr Sarna
The series which split the view update process into smaller parts
accidentally put an artificial 10MB limit on the generated mutation
size, which is wrong - this limit is configurable for users,
and, what's more important, this data was already validated when
it was inserted into the base table. Thus, the limit is lifted.

The series comes with a cql-pytest which failed before the fix and succeeds now. This bug is also covered by `wide_rows_test.py:TestWideRows_with_LeveledCompactionStrategy.test_large_cell_in_materialized_view` dtest, but it needs over a minute to run, as opposed to cql-pytest's <1 second.

Fixes #9047

Tests: unit(release), dtest(wide_rows_test.py:TestWideRows_with_LeveledCompactionStrategy.test_large_cell_in_materialized_view)

Closes #9048

* github.com:scylladb/scylla:
  cql-pytest: add a materialized views suite with first cases
  db,view: drop the artificial limit on view update mutation size
2021-07-15 17:03:07 +03:00
Piotr Sarna
c05340c4bf cql-pytest: add a materialized views suite with first cases
cql-pytest did not have a suite for materialized views, so one is
created. At the same time, test cases for building/updating a view on
a base table with large cells are added as a regression test for #9047.
2021-07-15 15:40:38 +02:00
Piotr Sarna
697e2fc66d db,view: drop the artificial limit on view update mutation size
The series which split the view update process into smaller parts
accidentally put an artificial 10MB limit on the generated mutation
size, which is wrong - this limit is configurable for users,
and, what's more important, this data was already validated when
it was inserted into the base table. Thus, the limit is lifted.

Tests: unit(release), dtest(wide_rows_test)
2021-07-15 14:09:37 +02:00
Tomasz Grabiec
1f255c420e flat_mutation_reader_v2: Make is_end_of_stream() reflect consumer-side state of the stream
Currently, flat_mutation_reader_v2::is_end_of_stream() returns
flat_mutation_reader_v2::impl::_end_of_stream, which means the producer
is done. The stream may still not be fully consumed even if the
producer is done, due to internal buffering. So consumers need to make
a more elaborate check:

  rd.is_end_of_stream() && rd.is_buffer_empty()

It would be cleaner if flat_mutation_reader_v2::is_end_of_stream()
returned the state of the consumer-side of the stream, since it
belongs to the consumer-side of the API. The consumption will be as
simple as:

  while (!rd.is_end_of_stream()) {
    consume_fragment(rd());
  }

This patch makes the change on the v2 of the reader interface. v1 is
not changed to avoid problems which could happen when backporting code
which assumes new semantics into a version with the old semantics. v2
is not in any old branch yet so it doesn't have this problem and it's
a good time to make the API change.

Note that it's always safe to use the new semantics in the context
which assumes the old semantics, so v1 users can be safely converted
to v2 even if they are unaware of the change.
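
The difference between the producer-side and consumer-side notions of end of stream can be modelled with a small buffered reader. This is a Python sketch under assumptions, not the flat_mutation_reader_v2 API: the class, the `producer_done()` accessor and the buffer size are hypothetical.

```python
from collections import deque


class BufferedReader:
    """Toy reader that prefetches fragments into an internal buffer."""

    def __init__(self, fragments, buffer_size=4):
        self._source = iter(fragments)
        self._buffer = deque()
        self._buffer_size = buffer_size
        self._producer_done = False

    def _fill(self):
        while len(self._buffer) < self._buffer_size and not self._producer_done:
            try:
                self._buffer.append(next(self._source))
            except StopIteration:
                self._producer_done = True

    def producer_done(self):
        # Old meaning: the producer has nothing more to add.
        return self._producer_done

    def is_end_of_stream(self):
        # New meaning: nothing is left for the consumer either.
        return self._producer_done and not self._buffer

    def __call__(self):
        # Returns the next fragment, or None if the stream turned out empty.
        if not self._buffer:
            self._fill()
        return self._buffer.popleft() if self._buffer else None


def drain(rd):
    out = []
    while not rd.is_end_of_stream():
        fragment = rd()
        if fragment is not None:
            out.append(fragment)
    return out
```

In this model the producer can be done while fragments are still buffered, which is why the consumer-side check must also take the buffer into account.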

Fixes #3067

Message-Id: <20210715102833.146914-1-tgrabiec@scylladb.com>
2021-07-15 14:00:48 +03:00
Calle Wilund
b8b5f69111 messaging_service: Bind to listen address, not broadcast
Refs #8418

Broadcast can (apparently) be an address not actually on machine, but
on the other side of NAT. Thus binding local side of outgoing
connection there will fail.
Bind instead to listen_address (or broadcast, if listen_to_broadcast).
This requires routing + NAT to make the connection appear, to the node
being connected to, as coming from the broadcast address, so that the
connection is allowed (if using partial encryption).

Note: this has only been verified in a limited way. I would suggest
verifying various multi rack/dc setups before relying on it.

Closes #8974
2021-07-15 13:18:10 +03:00
Avi Kivity
ed6c01a9fa test: increase timeout to account for flat_mutation_reader_v2 tests
Since fce124bd90 ("Merge "Introduce flat_mutation_reader_v2" from
Tomasz") tests involving mutation_reader are a lot slower due to
the new API testing. On slower machines it's enough to time out.

Work is underway to improve the situation, and it will also revert
to the original timing once the flat_mutation_reader_v2 work is done,
but meanwhile, increase the timeout.

Closes #9046
2021-07-15 12:33:43 +03:00
Avi Kivity
1643549d08 Merge 'Coroutinize the sstable reader' from Wojciech Mitros
This patch applies the same changes to both kl and mx sstable readers, but because the kl reader is old, we'll focus on the newer one.

This patch makes the main sstable reader process a coroutine,
which allows simplifying it by:

- using the state saved in the coroutine instead of most of the states saved in the _state variable
- removing the switch statement and moving the code of former switch cases, resulting in reduced number of jumps in code
- removing repetitive ifs for read statuses, by adding them to the coroutine implementation

The coroutine is saved in a new class ```processing_result_generator```, which works like a generator: using its ```generate()``` method, one can order the coroutine to continue until it yields a data_consumer::processing_result value, which was achieved previously by calling the function that is now the coroutine(```do_process_state()```).
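
The generator-style wrapper described above can be approximated with a Python generator. This is only an analogy to the C++ coroutine: the toy parser, the string results and the snake_case class name are assumptions made for illustration.

```python
def parse_rows(chunks, out):
    """Stand-in for the coroutine: parses fixed 2-byte rows from chunks.

    `chunks` is a list the caller appends to; `out` collects parsed rows.
    Local state (`pending`) survives across suspensions, so no explicit
    state variable or switch statement is needed.
    """
    pending = b""
    while True:
        while len(pending) < 2:
            if not chunks:
                yield "need_more_data"  # suspend until the caller adds data
                continue
            pending += chunks.pop(0)
        out.append(pending[:2])
        pending = pending[2:]
        yield "row_consumed"


class ProcessingResultGenerator:
    """Wraps the coroutine; generate() resumes it until the next result."""

    def __init__(self, coroutine):
        self._coroutine = coroutine

    def generate(self):
        return next(self._coroutine)
```

Each `generate()` call continues from the previous suspension point instead of re-entering a state machine from the top.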

Before the patch, the main processing method had 558 lines. The patch reduces this number to 345 lines.

However, usage of c++ coroutines has a non-negligible effect on the performance of the sstable reader.
In the test cases from ```perf_fast_forward``` the new sstable reader performs up to 2% more instructions (per fragment) than the former implementation, and this loss occurs in cases where we're reading many subsequent rows, without any skips.
Thanks to finding an optimization during the development of the patch, the loss is mitigated when we do skip rows, and for some cases, we can even observe an improvement.
You can see the full results in attached files: [old_results.txt](https://github.com/scylladb/scylla/files/6793139/old_results.txt), [new_results.txt](https://github.com/scylladb/scylla/files/6793140/new_results.txt)

Test: unit(dev)
Refs: #7952

Closes #9002

* github.com:scylladb/scylla:
  mx sstable reader: reduce code blocks
  mx sstable reader: make ifs consistent
  sstable readers: make awaiter for read status
  mx sstable reader: don't yield if the data buffer is not empty
  mx sstable reader: combine FLAGS and FLAGS_2 states
  mx sstable reader: reduce placeholder state usage
  mx sstable reader: replace non_consuming states with a bool
  mx sstable reader: reduce placeholder state usage
  mx sstable reader: replace unnecessary states with a placeholder
  mx sstable reader: remove false if case
  mx sstable reader: remove row_body_missing_columns_label
  mx sstable reader: remove row_body_deletion_label
  mx sstable reader: remove column_end_label
  mx sstable reader: remove column_cell_path_label
  mx sstable reader: remove column_ttl_label
  mx sstable reader: remove column_deletion_time_label
  mx sstable reader: remove complex_column_2_label
  mx sstable reader: remove row_body_missing_columns_read_columns_label
  mx sstable reader: remove row_body_marker_label
  mx sstable reader: remove row_body_shadowable_deletion_label
  mx sstable reader: remove row_body_prev_size_label
  mx sstable reader: remove ck_block_label
  mx sstable reader: remove ck_block2_label
  mx sstable reader: remove clustering_row_label and complex_column_label
  mx sstable reader: remove labels with only one goto
  mx sstable reader: replace the switch cases with gotos and a new label
  mx sstable reader: remove states only reached consecutively or from goto
  mx sstable reader: remove switch breaks for consecutive states
  mx sstable reader: convert readers main method into a coroutine
  kl sstable reader: replace states for ending with one state, simplify non_consuming
  kl sstable reader: remove unnecessary states
  kl sstable reader: remove unnecessary yield
  kl sstable reader: remove unnecessary blocks
  kl sstable reader: fix indentation
  kl sstable reader: replace switch with standard flow control
  kl sstable reader: remove state::CELL case
  kl sstable reader: move states code only reachable from one place
  kl sstable reader: remove states only reached consecutively
  kl sstable reader: remove switch breaks for consecutive states
  kl sstable reader: remove unreachable case
  kl sstable reader: move testing hack for fragmented buffers outside the coroutine
  kl sstable reader: convert readers main method into a coroutine
  sstable readers: create a generator class for coroutines
2021-07-15 12:06:14 +03:00
Wojciech Mitros
45058776c2 mx sstable reader: reduce code blocks
Some blocks of code were surrounded by curly braces, because
a variable was declared inside a switch case. After changes,
some of the variable declarations are in if/else/while cases,
and no longer need to be in separate code blocks, while other
blocks can be extended to entire labels for simplicity.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
9b333908e4 mx sstable reader: make ifs consistent
In several places we're checking the return value of our
consumers' consume_* calls. Because the behaviour in all cases
is the same, let us use the same notation as well.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
dc38605f75 sstable readers: make awaiter for read status
After each read* call of the primitive_consumer we need to check
if the entire primitive was in our current buffer. We can check it
in the proceed_generator object by yielding the returned read status:
if the yielded status is ready, the yield_value method returns
a structure whose await_ready() method returns true. Otherwise it
returns false.
The returned structure is co_awaited by the coroutine (due to co_yield),
and if await_ready() returns true, the coroutine isn't stopped.
Conversely, if it returns false (technical: and because its await_suspend
method returns void), the coroutine stops, and a proceed::yes value
is saved, indicating that we need more buffers.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
09a0cd7c05 mx sstable reader: don't yield if the data buffer is not empty
The skip() method returns a skip_bytes object if we want to
skip the entire buffer, otherwise it returns a proceed::yes
and trims the buffer.

If the buffer is only trimmed we don't need to interrupt
the coroutine, we simply continue instead.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
5dc64532bd mx sstable reader: combine FLAGS and FLAGS_2 states
We don't differentiate between FLAGS and FLAGS_2 in
verify_end_state(), so we can merge them into one state.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
ab1e6f4211 mx sstable reader: reduce placeholder state usage
After the changes to non_consuming states, we can
remove some state::OTHER assignments again.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
c904ab12c8 mx sstable reader: replace non_consuming states with a bool
The non_consuming() method is only used after assuring that
primitive_consumer::active() (in continuous_data_consumer::process())
so we don't need states where primitive_consumer::active(), which
is most of them.

We still need to make sure that the states change when they need to,
so we replace all the concerned states with the placeholder state,
and for the few states from the non_consuming() OR, where the
primitive_consumer::active() returns true, we set the value of
_consuming to false, changing it back when the state is no longer
non_consuming.
2021-07-14 20:50:30 +02:00