Commit Graph

31418 Commits

Author SHA1 Message Date
Nadav Har'El
c8a3d0758a test/cql-pytest: de-duplicate code checking for an old buggy driver
We have in test_filtering.py two tests which fail when running on an old
version of the Python driver which has a specific bug, so we skip those
tests if the buggy driver is installed.

But the code to check the driver version is duplicated twice, so in this
patch we move the version-checking-and-skipping code to a fixture, which
we can use twice.

The motivation is that in the next patch we will want to introduce a
third use of the same code - and a fixture is cleaner than a third
duplicate.

This patch is supposed to be code-movement only, without functional
changes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-06-09 14:23:40 +03:00
Takuya ASADA
c82da0ea8e install-dependencies.sh: add scylla-api-client PIP package
Add scylla-api-client PIP package, and install into scylla-python3
package.

See scylladb/scylla-machine-image#340

Closes #10728

[avi: regenerate frozen toolchain]

Closes #10732
2022-06-07 09:43:50 +03:00
Takuya ASADA
ad2344a864 scylla_coredump_setup: support new format of Storage field
Storage field of "coredumpctl info" changed at systemd-v248, it added
"(present)" on the end of line when coredump file available.

Fixes #10669

Closes #10714
2022-06-07 02:21:32 +03:00
Avi Kivity
e0670f0bb5 Merge 'memtable, cache: Eagerly compact data with tombstones' from Tomasz Grabiec
When memtable receives a tombstone it can happen under some workloads
that it covers data which is still in the memtable. Some workloads may
insert and delete data within a short time frame. We could reduce the
rate of memtable flushes if we eagerly drop tombstoned data.

One workload which benefits is the raft log. It stores a row for each
uncommitted raft entry. When entries are committed they are
deleted. So the live set is expected to be short under normal
conditions.

Fixes #652.

Closes #10612

* github.com:scylladb/scylla:
  memtable: Add counters for tombstone compaction
  memtable, cache: Eagerly compact data with tombstones
  memtable: Subtract from flushed memory when cleaning
  mvcc: Introduce apply_resume to hold state for partition version merging
  test: mutation: Compare against compacted mutations
  compacting_reader: Drop irrelevant tombstones
  mutation_partition: Extract deletable_row::compact_and_expire()
  mvcc: Apply mutations in memtable with preemption enabled
  test: memtable: Make failed_flush_prevents_writes() immune to background merging
2022-06-07 02:17:09 +03:00
Tomasz Grabiec
0bc45f9666 memtable: Add counters for tombstone compaction 2022-06-06 19:25:41 +02:00
Tomasz Grabiec
beadd248e3 memtable, cache: Eagerly compact data with tombstones
When memtable receives a tombstone it can happen under some workloads
that it covers data which is still in the memtable. Some workloads may
insert and delete data within a short time frame. We could reduce the
rate of memtable flushes if we eagerly drpo tombstoned data.

One workload which benefits is the raft log. It stores a row for each
uncommitted raft entry. When entries are committed they are
deleted. So the live set is expected to be short under normal
conditions.

Fixes #652.
2022-06-06 19:25:41 +02:00
Tomasz Grabiec
9135d1fd1f memtable: Subtract from flushed memory when cleaning
This patch prevents virtual dirty from going negative during memtable
flush in case partition version merging erases data previously
accounted by the flush reader. There is an assert in
~flush_memory_accounter which guards for this.

This will start happening after tombstones are compacted with rows on
partition version merging.

This problem is prevented by the patch by having the cleaner notify
the memtable layer via callback about the amount of dirty memory released
during merging, so that the memtable layer can adjust its accounting.
2022-06-06 19:25:41 +02:00
Tomasz Grabiec
989ef88e26 mvcc: Introduce apply_resume to hold state for partition version merging
Partition version merging is preemptable. It may stop in the middle
and be resumed later. Currently, all state is kept inside the versions
themselves, in the form of elements in the source version which are
yet to be moved. This will change once we add compaction (tombstones
with rows) into the merging algorithm. There, state cannot be encoded
purley within versions. Consider applying a partition tombstone over
large number of rows.

This patch introduces apply_rows object to hold the necessary state to
make sure forward progress in case of preemption.

No change in behavior yet.
2022-06-06 19:25:41 +02:00
Tomasz Grabiec
374234cf76 test: mutation: Compare against compacted mutations
Memtables and cache will compact eagerly, so tests should not expect
readers to produce exact mutations written, only those which are
equivalant after applying copmaction.
2022-06-06 19:25:40 +02:00
Tomasz Grabiec
604e720706 compacting_reader: Drop irrelevant tombstones
The compacting reader created using make_compacting_reader() was not
dropping range_tombstone_change fragments which were shadowed by the
partition tombstones. As a result the output fragment stream was not
minimal.

Lack of this change would cause problems in unit tests later in the
series after the change which makes memtables lazily compact partition
versions. In test_reverse_reader_reads_in_native_reverse_order we
compare output of two readers, and assume that compacted streams are
the same. If compacting reader doesn't produce minimal output, then
the streams could differ if one of them went through the compaction in
the memtable (which is minimal).
2022-06-06 19:23:37 +02:00
Tomasz Grabiec
080c403d0b mutation_partition: Extract deletable_row::compact_and_expire() 2022-06-06 19:23:37 +02:00
Tomasz Grabiec
0e3c4fc641 mvcc: Apply mutations in memtable with preemption enabled
Preerequisite for eagerly applying tombstones, which we want to be
preemptible. Before the patch, apply path to the memtable was not
preemptible.

Because merging can now be defered, we need to involve snapshots to
kick-off background merging in case of preemption. This requires us to
propagate region and cleaner objects, in order to create a snapshot.
2022-06-06 19:23:37 +02:00
Tomasz Grabiec
0e78ad50ea test: memtable: Make failed_flush_prevents_writes() immune to background merging
Before the change, the test artificiallu set the soft pressure
condition hoping that the background flusher will flush the
memtable. It won't happen if by the time the background flusher runs
the LSA region is updated and soft pressure (which is not really
there) is lifted. Once apply() becomes preemptibe, backgroun partition
version merging can lift the soft pressure, making the memtable flush
not occur and making the test fail.

Fix by triggering soft pressure on retries.
2022-06-06 19:23:37 +02:00
Botond Dénes
605ee74c39 Merge 'sstables: save Scylla version & build id in metadata' from Michael Livshin
To provide a reasonably-definitive answer to "what exact version of
Scylla wrote this?".

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>

Closes #10712

* github.com:scylladb/scylla:
  docs: document recently-added Scylla sstable metadata sections
  sstables: save Scylla version & build id in metadata
  scylla_sstable: generalize metadata visitor for disk_string
  build_id: cache the value
2022-06-03 07:49:51 +03:00
Botond Dénes
49215fcff7 Merge 'Remove flat_mutation_reader (v1)' from Michael Livshin
- Introduce a simpler substitute for `flat_mutation_reader`-resulting-from-a-downgrade that is adequate for the remaining uses but is _not_ a full-fledged reader (does not redirect all logic to an `::impl`, does not buffer, does not really have `::peek()`), so hopefully carries a smaller performance overhead. The name `mutation_fragment_v1_stream` is kind of a mouthful but it's the best I have
- (not tests) Use the above instead of `downgrade_to_v1()`
- Plug it in as another option in `mutation_source`, in and out
- (tests) Substitute deliberate uses of `downgrade_to_v1()` with `mutation_fragment_v1_stream()`
- (tests) Replace all the previously-overlooked occurrences of `mutation_source::make_reader()` with  `mutation_source::make_reader_v2()`, or with `mutation_source::make_fragment_v1_stream()` where deliberate or still required (see below)
- (tests) This series still leaves some tests with `mutation_fragment_v1_stream` (i.e. at v1) where not called for by the test logic per se, because another missing piece of work is figuring out how to properly feed `mutation_fragment_v2` (i.e. range tombstone changes) to `mutation_partition`.  While that is not done (and I think it's better to punt on it in this PR), we have to produce `mutation_fragment` instances in tests that `apply()` them to `mutation_partition`, thus we still use downgraded readers in those tests
- Remove the `flat_mutation_reader` class and things downstream of it

Fixes #10586

Closes #10654

* github.com:scylladb/scylla:
  fix "ninja dev-headers"
  flat_mutation_reader ist tot
  tests: downgrade_to_v1() -> mutation_fragment_v1_stream()
  tests: flat_reader_assertions: refactor out match_compacted_mutation()
  tests: ms.make_reader() -> ms.make_fragment_v1_stream()
  repair/row_level: mutation_fragment_v1_stream() instead of downgrade_to_v1()
  stream_transfer_task: mutation_fragment_v1_stream() instead of downgrade_to_v1()
  sstables_loader: mutation_fragment_v1_stream() instead of downgrade_to_v1()
  mutation_source: add ::make_fragment_v1_stream()
  introduce mutation_fragment_v1_stream
  tests: ms.make_reader() -> ms.make_reader_v2()
  tests: remove test_downgrade_to_v1_clear_buffer()
  mutation_source_test: fix indentation
  tests: remove some redundant calls to downgrade_to_v1()
  tests: remove some to-become-pointless ms.make_reader()-using tests
  tests: remove some to-become-pointless reader downgrade tests
2022-06-03 07:26:29 +03:00
Michael Livshin
9a541c7c58 docs: document recently-added Scylla sstable metadata sections
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-06-02 19:40:52 +03:00
Kamil Braun
72f629c2b6 test: cdc_enable_disable_test: remove non-determinism
The test sometimes fails because the order of rows in the SELECT results
depends on how stream IDs for the different partition keys get generated.
In some runs the stream ID for pk=1 may go before the stream ID for
pk=4, in some runs the other way.

The fix is to use the same partition key but different clustering keys
for the different rows.

Refs: #10601

Closes #10718
2022-06-02 19:40:07 +03:00
Botond Dénes
0a25a2bff3 sstables: validate_checksums(): more readable checksum mismatch messages
Replace:

    Compressed chunk checksum mismatch at chunk {}, offset {}, for chunk of size {}: expected={}, actual={}

With:

    Compressed chunk checksum mismatch at offset {}, for chunk #{} of size {}: expected={}, actual={}

This is a follow-up for #10693. Also bring the uncompressed chunk
checksum check messages up to date with the compressed one (which #10693
forgot to do).
Another change included is merging the advancement of the chunk index
with the iteration over the chunks, so we don't maintain two counters
(one in the iterator and an explicit one).

Closes #10715
2022-06-02 19:38:39 +03:00
Anna Stuchlik
a309c2a1b6 conf: update the description of the seeds parameter in scylla.yaml
Closes #10719
2022-06-02 18:45:11 +03:00
Michael Livshin
fc1b957367 sstables: save Scylla version & build id in metadata
To provide a reasonably-definitive answer to "what exact version of
Scylla wrote this?".

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-06-02 11:21:05 +03:00
Michael Livshin
b60bc8bb8a scylla_sstable: generalize metadata visitor for disk_string
Some metadata fields have interesting types, and some are just
strings.  There can be more than one string field, which the visitor
would not be able to distinguish from one another by type alone, so no
reason to make `scylla_metadata::sstable_origin` special.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-06-02 11:21:05 +03:00
Michael Livshin
80c9455413 build_id: cache the value
The CPU cost of iterating over the relevant ELF structures is probably
negligible (despite the amount of code involved), but there is no need
to keep the containing page mapped in RAM when it doesn't have to be.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-06-02 11:21:05 +03:00
Avi Kivity
f5062f4b5a Merge 'Use generation_type for SSTable ancestors' from Raphael "Raph" Carvalho
To avoid a discrepancy about underlying generation type once something other than integer is allowed for the sstable generation.
Also simplifies one generic writer interface for sealing sstable statistics.

Closes #10703

* github.com:scylladb/scylla:
  sstables: Use generation_type for compaction ancestors
  sstables: Make compaction ancestors optional when sealing statistics
2022-06-01 19:55:08 +03:00
Michael Livshin
632b4e5a9a fix "ninja dev-headers"
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Michael Livshin
029508b77c flat_mutation_reader ist tot
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Michael Livshin
2a91323051 tests: downgrade_to_v1() -> mutation_fragment_v1_stream()
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Michael Livshin
eabe568d1c tests: flat_reader_assertions: refactor out match_compacted_mutation()
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Michael Livshin
a08ee649fc tests: ms.make_reader() -> ms.make_fragment_v1_stream()
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Michael Livshin
7a11a22cd6 repair/row_level: mutation_fragment_v1_stream() instead of downgrade_to_v1()
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Michael Livshin
8305ac26ca stream_transfer_task: mutation_fragment_v1_stream() instead of downgrade_to_v1()
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Michael Livshin
00bee4e0b3 sstables_loader: mutation_fragment_v1_stream() instead of downgrade_to_v1()
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Michael Livshin
00b2e7b2c5 mutation_source: add ::make_fragment_v1_stream()
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Michael Livshin
1b98692c8c introduce mutation_fragment_v1_stream
At this point, none of the remaining uses of
`flat_mutation_reader` (all of which are results of calling
`downgrade_to_v1()` anyway) actually need a full-featured flat
mutation reader with its own separate buffer etc.

`mutation_fragment_v1_stream` can only be constructed by wrapping a
`flat_mutation_reader_v2`, contains enough functionality for the
remaining consumers of `mutation_fragment_v1` sources and unit tests
and no more, and does not buffer.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Michael Livshin
d137b32994 tests: ms.make_reader() -> ms.make_reader_v2()
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Michael Livshin
1a9e0ed73d tests: remove test_downgrade_to_v1_clear_buffer()
The projected limited replacement of downgraded v1 mutation reader
will not do its own buffering, so this test will be pointless.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Michael Livshin
66ceb32612 mutation_source_test: fix indentation
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Michael Livshin
b9ada78ec2 tests: remove some redundant calls to downgrade_to_v1()
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Michael Livshin
63a61ccaad tests: remove some to-become-pointless ms.make_reader()-using tests
mutation_source are going to be created only from v2 readers and the
::make_reader() method family is scheduled for removal.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Michael Livshin
b288cc4f9f tests: remove some to-become-pointless reader downgrade tests
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-05-31 23:42:34 +03:00
Raphael S. Carvalho
2a7eb16c02 sstables: Use generation_type for compaction ancestors
Let's also use generation_type for compaction ancestors, so once we
support something other than integer for SSTable generation, we
won't have discrepancy about what the generation type is.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-05-31 15:28:02 -03:00
Raphael S. Carvalho
d36604703f sstables: Make compaction ancestors optional when sealing statistics
Compaction ancestors is only available in versions older than mx,
therefore we can make it optional in seal_statistics(). The motivation
is that mx writer will no longer call sstable::compaction_ancestors()
which return type will be soon changed to type generation_type, so the
returned value can be something other than an integer, e.g. uuid.
We could kill compaction_ancestors in seal_statistics interface, but
given that most generic write functions still work for older versions,
if there were still a writer for them, I decided to not do it now.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-05-31 15:26:03 -03:00
Calle Wilund
adda43edc7 CDC - do not remove log table on CDC disable
Fixes #10489

Killing the CDC log table on CDC disable is unhelpful in many ways,
partly because it can cause random exceptions on nodes trying to
do a CDC-enabled write at the same time as log table is dropped,
but also because it makes it impossible to collect data generated
before CDC was turned off, but which is not yet consumed.

Since data should be TTL:ed anyway, retaining the table should not
really add any overhead beyond the compaction to eventually clear
it. And user did set TTL=0 (disabled), then he is already responsible
for clearing out the data.

This also has the nice feature of meshing with the alternator streams
semantics.

Closes #10601
2022-05-31 19:07:07 +03:00
Konstantin Osipov
94a192a7aa Revert "test.py: temporarily disable raft"
This reverts commit 26128a222b.

The issue the commit depends on is fixed, so enable raft back.

Closes #10694
2022-05-31 14:39:26 +03:00
Avi Kivity
41b098f54e Udpate tools/jmx submodule (jackson dependency update)
* tools/jmx 53f7f55...fe351e8 (1):
  > Update jackson dependency
2022-05-31 13:46:46 +03:00
Mikołaj Sielużycki
bc18e97473 sstable_writer: Fix mutation order violation
The change
- adds a test which exposes a problem of a peculiar setup of
tombstones that trigger a mutation fragment stream validation exception
- fixes the problem

Applying tombstones in the order:

range_tombstone_change pos(ck1), after_all_prefixed, tombstone_timestamp=1
range_tombstone_change pos(ck2), before_all_prefixed, tombstone=NONE
range_tombstone_change pos(NONE), after_all_prefixed, tombstone=NONE

Leads to swapping the order of mutations when written and read from
disk via sstable writer. This is caused by conversion of
range_tombstone_change (in memory representation) to range tombstone
marker (on disk representation) and back.

When this mutation stream is written to disk, the range tombstone
markers type is calculated based on the relationship between
range_tombstone_changes. The RTC series as above produces markers
(start, end, start). When the last marker is loaded from disk, it's kind
gets incorrectly loaded as before_all_prefixed instead of
after_all_prefixed. This leads to incorrect order of mutations.

The solution is to skip writing a new range_tombstone_change with empty
tombstone if the last range_tombstone_change already has empty
tombstone. This is redundant information and can be safely removed,
while the logic of encoding RTCs as markers doesn't handle such
redundancy well.

Closes #10643
2022-05-31 13:39:48 +03:00
Piotr Sarna
7169e021e5 Merge 'cql3: support list subscripts in WHERE clause' from Avi Kivity
I noticed that `column_condition` (used in LWT `IF` clause) supports lists.
As part of the Grand Expression Unification we'll need to migrate that to
expressions, so we'll need to support list subscripts.

Use the opportunity to relax the normal filtering to allow filtering on
list subscripts: `WHERE my_list[:index] = :value`.

Closes #10645

* github.com:scylladb/scylla:
  test: cql-pytest: add test for list subscript filtering
  doc: document list subscripts usable in WHERE clause
  cql3: expr: drop restrictions on list subscripts
  cql3: expr: prepare_expr: support subscripted lists
  cql3: expressions: reindent get_value()
  cql3: expression: evaluate() support subscripting lists
2022-05-31 09:28:52 +02:00
Avi Kivity
4b53af0bd5 treewide: replace parallel_for_each with coroutine::parallel_for_each in coroutines
coroutine::parallel_for_each avoids an allocation and is therefore preferred. The lifetime
of the function object is less ambiguous, and so it is safer. Replace all eligible
occurences (i.e. caller is a coroutine).

One case (storage_service::node_ops_cmd_heartbeat_updater()) needed a little extra
attention since there was a handle_exception() continuation attached. It is converted
to a try/catch.

Closes #10699
2022-05-31 09:06:24 +03:00
Botond Dénes
02608bec9d Update tools/java submodule
* tools/java a4573759a2...d4133b54c9 (1):
  > removeNode: Remove other alias for --ignore-dead-nodes
2022-05-31 07:56:54 +03:00
Botond Dénes
660921eb22 Merge 'Two improvements to configure.py' from Nadav Har'El
This two-patch series makes two improvements to configure.py:

The first patch fixes, yet again, issue #4706 where interrupting ninja's rebuild of build.ninja can leave it without any build.ninja at all. The patch uses a different approach from the previous pull-request #10671 that aimed to solve the same problem.

The second patch makes the output of configure.py more reproducible, not resulting in a different random order every time. This is useful especially when debugging configure.py and wanting to check if anything changed in its output.

Closes #10696

* github.com:scylladb/scylla:
  configure.py: make build.ninja the same every time
  configure.py: don't delete build.ninja when rebuild is interrupted
2022-05-31 06:35:16 +03:00
Avi Kivity
248cdf0e34 test: cql-pytest: add test for list subscript filtering
Test match and mismatch, as well as out of bound cases.
2022-05-30 20:47:47 +03:00