Commit Graph

3451 Commits

Author SHA1 Message Date
Benny Halevy
c71ef330b2 query-request, everywhere: define and use query_id as a strong type
Define query_id as a tagged_uuid
So it can be differentiated from other uuid-class types.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:13:28 +03:00
Benny Halevy
2b017ce285 schema, everywhere: define and use table_schema_version as a strong type
Define table_schema_version as a distinct tagged_uuid class,
So it can be differentiated from other uuid-class types,
in particular table_id.

Added reversed(table_schema_version) for convenience
and uniformity since the same logic is currently open coded
in several places.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:45 +03:00
Benny Halevy
257d74bb34 schema, everywhere: define and use table_id as a strong type
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.

Fixes #11207

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:41 +03:00
Benny Halevy
813cffc2b5 counters: counter_id: use base class create_random_id
Rather than defining generate_random,
and use respectively in unit tests.
(It was inherited from raft::internal::tagged_id.)

This allows us to shorten counter_id's definition
to just using utils::tagged_uuid<struct counter_id_tag>.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:02:27 +03:00
Benny Halevy
1fda686f96 idl: make idl headers self-sufficient
Add include statements to satisfy dependencies.

Delete, now unneeded, include directives from the upper level
source files.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:02:27 +03:00
Benny Halevy
da4f0aae37 idl-compiler: add include statements
For generating #include directives in the generated files,
so we don't have to hand-craft include the dependencies
in the right order.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:02:27 +03:00
Benny Halevy
4f275a17b4 idl_test: add a struct depending on UUID
For testing the next change which adds
import and include statements to the idl language.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:02:27 +03:00
Benny Halevy
56f336d1aa database: get rid of timestamp_func
Pass an optional truncated_at time_point to
truncate_table_on_all_shards instead of the over-complicated
timestamp_func that returns the same time_point on all shards
anyhow, and was only used for coordination across shards.

Since now we synchronize the internal execution phase in
truncate_table_on_all_shards, there is no longer need
for this timestamp_func.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
be56a73e78 database: add snapshot_table_on_all_shards
We need to snapshot a single table in several paths.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
d96b56fee2 database: rename {flush,snapshot}_on_all and make static
Follow the convention of drop_table_on_all_shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
46e2a7c83b database: add truncate_table_on_all_shards
As a first step to decouple truncate from flush
and snpashot.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
5e8c05f1a8 database: drop_table_on_all_shards: do not accept a truncated_at
timestamp_func

Since in the drop_table case we want to discard ALL
sstables in the table, not only those with `max_data_age()`
up until drop started.

Fixes #11232

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:52:51 +03:00
Benny Halevy
9f5e13800d database_test: apply_mutation on the correct db shard
Following up on 1c26d49fba,
apply mutations on the correct db shard in all test cases
before we define and use database::truncate_table_on_all_shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 09:18:06 +03:00
Tomasz Grabiec
7f80602b01 db: range_tombstone_list: Avoid quadratic behavior when applying
Range tombstones are kept in memory (cache/memtable) in
range_tombstone_list. It keeps them deoverlapped, so applying a range
tombstone which covers many range tombstones will erase existing range
tombstones from the list. This operation needs to be exception-safe,
so range_tombstone_list maintains an undo log. This undo log will
receive a record for each range tombstone which is removed. For
exception safety reasons, before pushing an undo log entry, we reserve
space in the log by calling std::vector::reserve(size() + 1). This is
O(N) where N is the number of undo log entries. Therefore, the whole
application is O(N^2).

This can cause reactor stalls and availability issues when replicas
apply such deletions.

This patch avoids the problem by reserving exponentially increasing
amount of space. Also, to avoid large allocations, switches the
container to chunked_vector.

Fixes #11211

Closes #11215
2022-08-05 20:34:07 +03:00
Kamil Braun
d84a93d683 Merge 'Raft test topology part 1' from Alecco
These are the first commits out of #10815.

It starts by moving pytest logic out of the common `test/conftest.py`
and into `test/topology/conftest.py`, including removing the async
support as it's not used anywhere else.

There's a fix of a bug of leaving tables in `RandomTables.tables` after
dropping all of them.

Keyspace creation is moved out of `conftest.py` into `RandomTables` as
it makes more sense and this way topology tests avoid all the
workarounds for old version (topology needs ScyllaDB 5+ for Raft,
anyway).

And a minor fix.

Closes #11210

* github.com:scylladb/scylladb:
  test.py: fix type hint for seed in ScyllaServer
  test.py: create/drop keyspace in tables helper
  test.py: RandomTables clear list when dropping all tables
  test.py: move topology conftest logic to its own
  test.py: async topology tests auto run with pytest_asyncio
2022-08-05 17:56:16 +02:00
Alejo Sanchez
ec70e26f12 test.py: fix type hint for seed in ScyllaServer
Param seed can be None (e.g. first server) so fix type hint accordingly.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-05 13:05:26 +02:00
Alejo Sanchez
1d7789e5a9 test.py: create/drop keyspace in tables helper
Since all topology test will use the helper, create the keyspace in the
helper.

Avoid the need of dropping all tables per test and just drop the
keyspace.

While there, use blocking CQL execution so it can be used in the
constructor and avoids possible issues with scheduling on cleanup. Also,
creation and drop should happen only once per cluster and no test should
be running changes (either not started or finished).

All topology tests are for Scylla with Raft. So don't use the Cassandra
this_dc workaround as it's unnecessary for Scylla.

Remove return type of random_tables fixture to match other fixtures
everywhere else.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-05 13:05:26 +02:00
Alejo Sanchez
9a019628f5 test.py: RandomTables clear list when dropping all tables
Clear the list of active tables when dropping them.

While there do the list element exchange atomically across active and
removed tables lists.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-05 13:05:26 +02:00
Alejo Sanchez
f6aa0d7bd7 test.py: move topology conftest logic to its own
Move asyncio, Raft checks, and RandomTables to topology test suite's own
conftest file.

While there, use non-async version of pre-checks to avoid unnecessary
complexity (we want async tests, not async setup, for now).

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-05 13:05:26 +02:00
Alejo Sanchez
f665779cdb test.py: async topology tests auto run with pytest_asyncio
Async tests and fixtures in the topology directory are expected to run
with pytest_asyncio (not other async frameworks). Force this with auto
mode.

CI has an older pytest_asyncio version lacking pytest_asyncio.fixture.
Auto mode helps avoiding the need of it and tests and fixtures can just
be marked with regular @pytest.mark.async.

This way tests can run in both older and newer versions of the packages.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-08-05 13:05:26 +02:00
Botond Dénes
fbbe2529c1 Merge "Remove global snitch usage from consistency_level.cc" from Pavel Emelyanov
"
There are several helpers in this .cc file that need to get datacenter
for endpoints. For it they use global snitch, because there's no other
place out there to get that data from.

The whole dc/rack info is now moving to topology, so this set patches
the consistency_level.cc to get the topology. This is done two ways.
First, the helpers that have keyspace at hand may get the topology via
ks's effective_replication_map.

Two difficult cases are db::is_local() and db.count_local_endpoints()
because both have just inet_address at hand. Those are patched to be
methods of topology itself and all their callers already mess with
token metadata and can get topology from it.
"

* 'br-consistency-level-over-topology' of https://github.com/xemul/scylla:
  consistency_level: Remove is_local() and count_local_endpoints()
  storage_proxy: Use topology::local_endpoints_count()
  storage_proxy: Use proxy's topology for DC checks
  storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver
  storage_proxy: Use topology local_dc_filter in its methods
  storage_proxy: Mark some digest_read_resolver methods private
  forwarding_service: Use topology local_dc_filter
  storage_service: Use topology local_dc_filter
  consistency_level: Use topology local_dc_filter
  consitency-level: Call count_local_endpoints from topology
  consistency_level: Get datacenter from topology
  replication_strategy: Remove hold snitch reference
  effective_replication_map: Get datacenter from topology
  topology: Add local-dc detection shugar
2022-08-05 13:31:55 +03:00
Pavel Emelyanov
00f166809e replication_strategy: Remove hold snitch reference
When the strategy is constructed there's no place to get snitch from
so the global instance is used. However, after previous patch the
replication strategy no longer needs snitch, so this dependency can
be dropped

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:43 +03:00
Avi Kivity
785ea869fb Merge 'tools/scylla-sstable: introduce the write operation' from Botond Dénes
Implementing json2sstable functionality. It allows generating an sstable from a JSON description of its content. Uses identical schema to dump-data, so it is possible to regenerate an existing sstable, by feeding the output of dump-data to write.
Most of the scylla storage engine features are supported. The only non-supported features are counters and non-strictly atomic data types (including frozen collections, tuples and UDTs).

Example invocation:
```
scylla sstable write --system-schema system_schema.columns --input-file ./input.json --generation 0
```

Refs: https://github.com/scylladb/scylladb/issues/9681

Future plans:
* Complete support for remaining features (counters and non-atomic types).
* Make sstable format configurable on the command line.

Closes #11181

* github.com:scylladb/scylladb:
  test/cql-pytest: test_tools.py: add test for sstable write
  test/cql-pytest: test-tools.py actually test with multiple sstables
  test/cql-pytest: test_tools.py: reduce the number of test-cases
  tools/scylla-sstable: introduce the write operation
  tools/scylla-sstable: add support for writer operations
  tools/scylla-sstable: dump-data: write bound-weight as int
  tools/scylla-sstable: dump-data: always write deletion time for cell tombstones
  tools/scylla-sstable: dump-data: add timezone to deletion_time
  types: publish timestamp_from_string()
2022-08-03 19:18:31 +03:00
Botond Dénes
19441881bc test/cql-pytest: test_tools.py: add test for sstable write
We can now do a full circle: dump an sstable to json, generate an
sstable from it, then dump again and compare to the original json.
Expand the existing simple_no_clustering_table and
simple_clustering_table schema/data to improve coverage of things like
TTL, tombstones and static rows.
2022-08-03 14:00:50 +03:00
Botond Dénes
5d5c3b3fe3 test/cql-pytest: test-tools.py actually test with multiple sstables
The test-cases in this suite have a parameter to run with one or
multiple input sstables. This was broken as each test table generated a
single sstable. Fix this so we actually get single/multiple input
sstable coverage.
2022-08-03 14:00:50 +03:00
Botond Dénes
bd772d095f test/cql-pytest: test_tools.py: reduce the number of test-cases
Currently this test-case exercises all the available component dumpers
with many different schemas. This doesn't add any value for most of the
dumpers, save for the dump-data one. It does have a cost however in
run-time of these test-cases. Test the dumpers which are mostly
indifferent to the schema with just a single one, cutting the number of
generated test-cases from 70 to 30.
2022-08-03 14:00:50 +03:00
Avi Kivity
a4844826fc Merge 'Decouple compaction manager from database' from Benny Halevy
Start compaction_manager as a sharded service
and pass a reference to it to the database rather
than having the database construct its own compaction_manager.

This is part of the wider scope effort to decouple compaction from replica database and table.

Closes #11099

* github.com:scylladb/scylladb:
  compaction_manager: perform_cleanup, perform_sstable_upgrade: use a lw_shared_ptr for owned token ranges
  compaction: cleanup, upgrade: use a lw_shared_ptr for owned token ranges
  main: start compaction_manager as a sharded service
  compaction_manager: keep config as member
  backlog_controller: keep scheduling_group by value
  backlog_controller: scheduling_group: keep io_priority_class by value
  backlog_controller: scheduling_group: define default member initializers
  backlog_controller: get rid of _interval member
2022-08-02 19:02:46 +03:00
Avi Kivity
665c85aefe Merge 'multishard_mutation_query: don't unpop partition header of spent partition' from Botond Dénes
When stopping the read, the multishard reader will dismantle the
compaction state, pushing back (unpopping) the currently processed
partition's header to its originating reader. This ensures that if the
reader stops in the middle of a partition, on the next page the
partition-header is re-emitted as the compactor (and everything
downstream from it) expects.
It can happen however that there is nothing more for the current
partition in the reader and the next fragment is another partition.
Since we only push back the partition header (without a partition-end)
this can result in two partitions being emitted without being separated
by a partition end.
We could just add the missing partition-end when needed but it is
pointless, if the partition has no more data, just drop the header, we
won't need it on the next page.

The missing partition-end can generate an "IDL frame truncated" message
as it ends up causing the query result writer to create a corrupt
partition entry.

Fixes: https://github.com/scylladb/scylladb/issues/9482

Closes #11175

* github.com:scylladb/scylladb:
  test/cql-pytest: add regression test for "IDL frame truncated" error
  mutation_compactor: detach_state(): make it no-op if partition was exhausted
  querier: use full_position in shard_mutation_querier
2022-08-02 16:41:15 +03:00
Avi Kivity
268e4abe77 Merge 'wasm: reuse instances for wasm UDFs' from Wojciech Mitros
Calling WebAssembly UDFs requires wasmtime instance. Creating such an instance is expensive,
but these instances can be reused for subsequent calls of the same UDF on various inputs.

This patch introduces a way of reusing wasmtime instances: a wasm instance cache.
The cache stores a wasmtime instance for each UDF and scheduling group. The instances are
evicted using LRU strategy and their size is based on the size of their wasm memories.

The instances stored in the cache are also dropped when the UDF is dropped itself. For that reason,
the first patch modifies the current implementation of UDF dropping, so that the instance dropping may be added
later. The patch also removes the need of compiling the UDF again when dropping it.

The second patch contains the implementation and use of the new cache. The cache is implemented
in `lang/wasm_instance_cache.hh` and the main ways of using it are the `run_script` methods from `wasm.hh`

The third patch adds tests to `test_wasm.py` that check the correctness and performance of the new
cache. The tests confirm the instance reuse, size limits, instance eviction after timeout and after dropping the UDF.

Closes #10306

* github.com:scylladb/scylladb:
  wasm: test instances reuse
  wasm: reuse UDF instances
  schema_tables: simplify merge_functions and avoid extra compilation
2022-08-02 13:51:16 +03:00
Benny Halevy
14faa3b6f4 compaction_manager: perform_cleanup, perform_sstable_upgrade: use a lw_shared_ptr for owned token ranges
And completely get rid of the dependency on replica::database.

Also, add respective rest_api tests.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 08:08:11 +03:00
Benny Halevy
e1fe598760 compaction: cleanup, upgrade: use a lw_shared_ptr for owned token ranges
Currently they are copied for the get_sstables function
so this change reduces copies.

Also, it will allow further decoupling of compaction_manager
from replica::database, by letting the caller of
perform_cleanup and perform_sstable_upgrade get the
owned token ranges from db and pass it to the perform_*
functions in the following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 07:57:41 +03:00
Benny Halevy
e4e92d44ae main: start compaction_manager as a sharded service
And pass a reference to it to the database rather
than having the database construct its own compaction_manager.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 07:50:15 +03:00
Botond Dénes
11af489e84 test/cql-pytest: add regression test for "IDL frame truncated" error 2022-08-02 06:43:24 +03:00
Piotr Sarna
d4abb73389 Merge 'scrub compaction: count validation errors...
and return status over the rest api' from Aleksandra Martyniuk

Currently, scrub returns to user the number indicating operation
result as follows:
- 1 when the operation was aborted;
- 3 in validate and segregate modes when validation errors were found
  (and in segregate mode - fixed);
- 0 if operation ended successfully.

To achieve so, if an operation was aborted in abort mode, then
the exception is propagated to storage_service.cc. Also the number
of validation errors for current scrub is gathered and summed
from each shard there.

The number of validation errors is counted and registered in metrics.
Metrics provide common counters for all scrub operation within
a compaction manager, though. Thus, to check the exact number
of validation errors, the comparison of counter value before and after
scrub operation needs to be done.

Closes #11074

* github.com:scylladb/scylladb:
  scrub compaction: return status indicating aborted operations over the rest api
  test: move scylla_inject_error from alternator/ to cql-pytest/
  scrub compaction: count validation errors and return status over the rest api
  scrub compaction: count validation errors for specific scrub task
  compaction: extract statistics in compaction_result
  scrub compaction: register validation errors in metrics
  scrub compaction: count validation errors
2022-08-01 12:05:00 +02:00
Avi Kivity
00cec159d6 Revert "Merge 'multishard_mutation_query: don't unpop partition header of spent partition' from Botond Dénes"
This reverts commit c3bad157e5, reversing
changes made to e66809d051. The checks it
adds are triggered by some dtests. While it's possible the check is
triggered due to an existing problem, better to investigate it out-of-tree.

Fixes #11169.
2022-07-31 15:24:33 +03:00
Aleksandra Martyniuk
6ea5bc96d7 scrub compaction: return status indicating aborted operations
over the rest api

Performing compaction scrub user did not know whether an operation
was aborted.

If compaction scrub is aborted, return status the user gets over
rest api is set to 1.
2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk
8e892426e2 test: move scylla_inject_error from alternator/ to cql-pytest/
Move scylla_inject_error from alternator/ to cql-pytest/ so it
can be reached from various tests dirs. alternator/util.py is
renamed to alternator/alternator_util.py to avoid name shadowing.
2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk
7d457cffb8 scrub compaction: count validation errors for specific scrub task
The number of validation errors per given compaction scrub on given
shard is passed up to perform_task() function.
2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk
3a805a9d9b compaction: extract statistics in compaction_result
Statistics from compaction_result are extracted to new struct
compaction_stats and stored as a field of compaction_result.
2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk
ab85dab05d scrub compaction: count validation errors
The number of validation errors encountered during scrub compaction
is counted.
2022-07-29 09:35:20 +02:00
Avi Kivity
c3bad157e5 Merge 'multishard_mutation_query: don't unpop partition header of spent partition' from Botond Dénes
When stopping the read, the multishard reader will dismantle the
compaction state, pushing back (unpopping) the currently processed
partition's header to its originating reader. This ensures that if the
reader stops in the middle of a partition, on the next page the
partition-header is re-emitted as the compactor (and everything
downstream from it) expects.
It can happen however that there is nothing more for the current
partition in the reader and the next fragment is another partition.
Since we only push back the partition header (without a partition-end)
this can result in two partitions being emitted without being separated
by a partition end.
We could just add the missing partition-end when needed but it is
pointless, if the partition has no more data, just drop the header, we
won't need it on the next page.

The missing partition-end can generate an "IDL frame truncated" message
as it ends up causing the query result writer to create a corrupt
partition entry.

Fixes: https://github.com/scylladb/scylla/issues/9482

Closes #11137

* github.com:scylladb/scylladb:
  test/cql-pytest: add regression test for "IDL frame truncated" error
  query: query_result_builder: add check for missing partition-end
  mutation_compactor: detach_state(): make it no-op if partition was exhausted
  querier: use full_position in shard_mutation_querier
2022-07-28 20:14:15 +03:00
Avi Kivity
e66809d051 Merge 'Memtable flush: wait for sstable count reduction if needed' from Benny Halevy
Called from try_flush_memtable_to_sstable,
maybe_wait_for_sstable_count_reduction will wait for
compaction to catch up with memtable flush if there
the bucket to compact is inflated, having too many
sstables.  In that case we don't want to add fuel
to the fire by creating yet another sstable.

Fixes #4116

Closes #10954

* github.com:scylladb/scylla:
  table: Add test where compaction doesn't keep up with flush rate.
  compaction_manager: add maybe_wait_for_sstable_count_reduction
  time_window_compaction_strategy: get_sstables_for_compaction: clean up code
  time_window_compaction_strategy: make get_sstables_for_compaction idempotent
  time_window_compaction_strategy: get_sstables_for_compaction: improve debug messages
  leveled_manifest: pass compaction_counter as const&
2022-07-28 19:11:04 +03:00
Avi Kivity
09a6b93ddf Merge 'logalloc: region: properly track listeners when moved' from Benny Halevy
Currently logalloc::region is relying on boost binomial_heap handle to properly move listeners registration when the region (when derived from dirty_memory_manager_logalloc::size_tracked_region) is moved, like boost::intrusive link hooks do -
hence 81e20ceaab/dirty_memory_manager.cc (L89-L90) does nothing.

Unfortunately, this doesn't work as expected.

This series adds a unit test that verifies the move semantics
and a fix to size_tracked_region and region_group code to make it pass.

Also "logalloc: region: get_impl might be called on disengaged _impl when moved"
fixes a couple corner cases where the shared _impl could be dereferenced when disengaged, and
the change also adds a unit test for that too.

Closes #11141

* github.com:scylladb/scylla:
  logalloc: region: properly track listeners when moved
  logalloc: region_impl: add moved method
  logalloc: region: merge: optimize getting other impl
  logalloc: region: merge: call region_impl::unlisten
  logalloc: region: call unlisten rather than open coding it
  logalloc: region move-ctor: initialize _impl
  logalloc: region: get_impl might be called on disengaged _impl when moved
2022-07-28 15:29:54 +03:00
Mikołaj Sielużycki
e0c6e1ef3c table: Add test where compaction doesn't keep up with flush rate.
The test simulates a situation where 2 threads issue flushes to 2
tables. Both issue small flushes, but one has injected reactor stalls.
This can lead to a situation where lots of small sstables accumulate on
disk, and, if compaction never has a chance to keep up, resources can be
exhausted.

(cherry picked from commit b5684aa96d)
(cherry picked from commit 25407a7e41)
2022-07-28 14:43:33 +03:00
Botond Dénes
26f1295536 Merge 'mutation: Ignore dummy rows when consuming clustering fragments' from Mikołaj Sielużycki
consume_clustering_fragments already ignores dummy rows, but does it in
the wrong place. Currently they're ignored after comparing them with
range tombstones. This change skips them before any useful work is done
with them.

Consider a simplified mutation reversal scenario scenario (ckp is
clustering key prefix, -1, 0, 1 are bound_weights):

schema_ptr s = schema_builder{"ks", "cf"}
    .with_column("pk", bytes_type, column_kind::partition_key)
    .with_column("ck1", bytes_type, column_kind::clustering_key)
    .build();

Input range tombstone positions:
    {clustered, ckp{}, before}
    {clustered, ckp{1}, after}

Clustering rows:
    {clustered, ckp{2}, equal}
    {clustered, ckp{}, after} // dummy row

During reversal, clustering rows are read backwards, and reversed range
tombstone positions are read forwards (because the range tombstones are
reversed and applied backwards). The read order in the example above is:

Reversed range tombstone positions:
    1: {clustered, ckp{}, before}
    2: {clustered, ckp{1}, before}

Clustering rows read backwards:
    3: {clustered, ckp{}, after} // dummy row
    4: {clustered, ckp{2}, equal}

Then we effectively do the merge part of merge sort, trying to put all
fragments in order according to their positions from the two lists
above. However, the dummy row is used in the comparison, and it compares
to be gt each of the reversed range tombstone positions. Then we
try to emit the clustering row, but only at that point we notice it's
dummy and should be skipped. Subsequent row with ckp{2} is compared to
the last used range tombstone position and the fragments are out of
order (in reversed schema, ckp{2} should come before ckp{1}).

The solution is to move the logic skipping the dummy clustering rows to
the beginning of the loop, so they can be ignored before they're used.

Fixes: https://github.com/scylladb/scylla/issues/11147

Closes #11129

* github.com:scylladb/scylla:
  mutation: Add test if mutations are consumed in order
  test: Move validating_consumer to test/lib/mutation_assertions.hh
  mutation: Ignore dummy rows when consuming clustering fragments
2022-07-28 11:18:36 +03:00
Benny Halevy
f6645313d8 logalloc: region: properly track listeners when moved
And add targeted unit tests for that.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-28 11:17:55 +03:00
Benny Halevy
c7d77e4076 logalloc: region: get_impl might be called on disengaged _impl when moved
First check if _impl is engaged before accessing it
to set its _region = this in the move constructor and
move assignment operator.

Add unit test for these odd orner cases.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-28 10:48:58 +03:00
Botond Dénes
079e425ef1 test/cql-pytest: add regression test for "IDL frame truncated" error 2022-07-28 09:02:28 +03:00
Avi Kivity
2c0932cc41 Merge 'Reduce the amount of per-table metrics' from Amnon Heiman
This series is the first step in the effort to reduce the number of metrics reported by Scylla.
The series focuses on the per-table metrics.

The combination of histograms, per-tables, and per shard makes the number of metrics in a cluster explode.
The following series uses multiple tools to reduce the number of metrics.
1. Multiple metrics should only be reported for the user tables and the condition that checked it was not updated when more non-user keyspaces were added.
2. Second, instead of a histogram, per table, per shard, it will report a summary per table, per shard, and a single histogram per node.
3. Histograms, summaries, and counters will be reported only if they are used (for example, the cas-related metrics will not be reported for tables that are not using cas).

Closes #11058

* github.com:scylladb/scylla:
  Add summary_test
  database: Reduce the number of per-table metrics
  replica/table.cc: Do not register per-table metrics for system
  histogram_metrics_helper.hh: Add to_metrics_summary function
  Unified histogram, estimated_histogram, rates, and summaries
  Split the timed_rate_moving_average into data and timer
  utils/histogram.hh: should_sample should use a bitmask
  estimated_histogram: add missing getter method
2022-07-27 22:01:08 +03:00
Avi Kivity
4438865a26 Merge 'memtable flush error handling' from Benny Halevy
The series unifies memtable flush error handling into table::seal_active_memtable
following up on f6d9d6175f.

The goal here is to prevent an infinite retry loop as in #10498
by aborting on any error that is not bad_alloc.

Fixes #10498

Closes #10691

* github.com:scylladb/scylla:
  test: memtable_test: failed_flush_prevents_writes: notify_soft_pressure only once
  test: memtable_test: failed_flush_prevents_writes: extend error injection
  table: seal_active_memtable: abort if retried for too long
  table: seal_active_memtable: abort on unexpected error
  table: try_flush_memtable_to_sstable: propagate errors to seal_active_memtable
  dirty_memory_manager: flush_when_needed: move error handling to flush_one/seal_active_memtable
  dirty_memory_manager: flush_permit: add has_sstable_write_permit
  dirty_memory_manager: flush_permit: release_sstable_write_permit: mark noexcept
  dirty_memory_manager: flush_permit: make _sstable_write_permit optional
  table: reindent seal_active_memtable
  table: coroutinize seal_active_memtable
  memtable_list: mark functions noexcept
  commitlog: make discard_completed_segments and friends noexcept
  dirty_memory_manager: flush_when_needed: target error handling at flush_one
  database: delete unused seal_delayed_fn_type
  dirty_memory_manager: mark functions noexcept
  memtable: mark functions noexcept
  memtable: memtable_encoding_stats_collector: mark functions noexcept
  encoding_state: mark functions noexcept
  logalloc: mark free functions noexcept
  logalloc: allocating_section: mark functions noexcept
  logalloc: allocating_section: guard: mark constructor noexcept
  logalloc: reclaim_lock: mark functions noexcept
  logalloc: tracker_reclaimer_lock: mark constructor noexcept
  logalloc: mark shard_tracker noexcept
  logalloc: region: mark functions const/noexcept
  logalloc: basic_region_impl: mark functions noexcept
  logalloc: region_impl: mark functions noexcept
  utils: log_heap: mark functions noexcept
  logalloc: region_impl: object_descriptor: mark functions noexcept
  logalloc: region_group: mark functions noexcept
  logalloc: tracker: mark functions const/noexcept
  logalloc: tracker::impl: make region_occupancy and friends const
  logalloc: tracker::impl: occupancy: get rid of reclaiming_lock
  logalloc: tracker::impl: mark functions noexcept
  logalloc: segment: mark functions const / noexcept
  logalloc: segment_pool: add const variant of descriptor method
  logalloc: segment_pool: move descriptor method to class definition
  logalloc: segment_pool: mark functions const/noexcept
  logalloc: segment_pool: delete unused free_or_restore_to_reserve method
  utils: dynamic_bitset: mark functions noexcept
  utils: dynamic_bitset: delete unused members
  logalloc: segment_store, segment_pool: idx_from_segment: get a const segment* in const overload
  logalloc: segment_store, segment_pool: return const segment* from segment_from_idx() const
  logalloc: segment_store: make can_allocate_more_segments const
  logalloc: segment_store: mark functions noexcept
  logalloc: segment_descriptor: mark functions noexcept
  logalloc: occupancy_stats: mark functions noexcept
  min_max_tracker: mark functions noexcept
  gc_clock, db_clock: mark functions noexcept
  dirty_memory_manager: region_group: mark functions noexcept
  dirty_memory_manager: region_group: make simple constructor noexcept
  dirty_memory_manager: region_group_reclaimer mark functions noexcept
  logalloc: lsa_buffer: mark functions noexcept
2022-07-27 19:08:59 +03:00