Files
scylladb/test
Avi Kivity 371e37924f Merge 'Rebuild bloom filters that have bad partition estimates' from Lakshmi Narayanan Sreethar
The bloom filters are built with partition estimates because the actual
partition count might not be available in all cases. If the estimate is
inaccurate, the bloom filters might end up being too large or too small
compared to their optimal sizes. This PR rebuilds bloom filters with
inaccurate partition estimates using the actual partition count before
the filter is written to disk. A bloom filter is considered to have an
inaccurate estimate if its false positive rate based on the current
bitmap size is either less than 75% or more than 125% of the configured
false positive rate.

Fixes #19049

A manual test was run to check the impact of rebuild on compaction.

Table definition used : CREATE TABLE scylla_bench.simple_table (id int PRIMARY KEY);

Setup : 3 billion random rows with id in the range [0, 1e8) were inserted as batches of 5 rows into scylla_bench.simple_table via 80 threads.

Compaction statistics :

scylla_bench.simple_table :
(a) Total number of compactions : `1501`
(b) Total time spent in compaction : `9h58m47.269s`
(c) Number of compactions which rebuilt bloom filters : `16`
(d) Total time taken by these 16 compactions which rebuilt bloom filters : `2h55m11.89s`
(e) Total time spent by these 16 compactions to rebuild bloom filters : `8m6.221s` which is
- `4.63%` of the total time taken by the compactions which rebuilt filters (d)
- `1.35%` of the total compaction time (b).

(f) Total bytes saved by rebuilding filters : `388 MB`

system.compaction_history :
(a) Total number of compactions : `77`
(b) Total time spent in compaction : `21.24s`
(c) Number of compactions which rebuilt bloom filters : `74`
(d) Time taken by these 74 compactions which rebuilt bloom filters : `20.48s`
(e) Time spent by these 74 compactions to rebuild bloom filters : `377ms` which is
- `1.84%` of the total time taken by the compactions which rebuilt filters (d)
- `1.77%` of the total compaction time (b).

(f) Total bytes saved by rebuilding filters : `20 kB`

The following tables also had compactions and the bloom filter was rebuilt in all those compactions.
However, the time taken for every rebuild was observed as 0ms from the logs as it completed within a microsecond :

system.raft :
(a) Total number of compactions : `2`
(b) Total time spent in compaction : `106ms`
(c) Total bytes saved by rebuilding filters : `960 B`

system_schema.tables :
(a) Total number of compactions : `1`
(b) Total time spent in compaction : `25ms`
(c) Total bytes saved by rebuilding filter : `312 B`

system.topology :
(a) Total number of compactions : `1`
(b) Total time spent in compaction : `25ms`
(c) Total bytes saved by rebuilding filter : `320 B`

Closes scylladb/scylladb#19190

* github.com:scylladb/scylladb:
  bloom_filter_test: add testcase to verify filter rebuilds
  test/boost: move bloom filter tests from sstable_datafile_test into a new file
  sstables/mx/writer: rebuild bloom filters with bad partition estimates
  sstables/mx/writer: add variable to track number of partitions consumed
  sstable: introduce sstable::maybe_rebuild_filter_from_index()
  sstable: add method to return filter format for the given sstable version
  utils/i_filter: introduce get_filter_size()
2024-06-25 15:35:09 +03:00
..
2024-06-21 19:16:11 +03:00
2024-01-18 11:11:34 +02:00
2024-06-20 18:45:31 +03:00
2024-06-21 19:16:11 +03:00

Scylla in-source tests.

For details on how to run the tests, see docs/dev/testing.md

Shared C++ utils, libraries are in lib/, for Python - pylib/

alternator - Python tests which connect to a single server and use the DynamoDB API unit, boost, raft - unit tests in C++ cql-pytest - Python tests which connect to a single server and use CQL topology* - tests that set up clusters and add/remove nodes cql - approval tests that use CQL and pre-recorded output rest_api - tests for Scylla REST API Port 9000 scylla-gdb - tests for scylla-gdb.py helper script nodetool - tests for C++ implementation of nodetool

If you can use an existing folder, consider adding your test to it. New folders should be used for new large categories/subsystems, or when the test environment is significantly different from some existing suite, e.g. you plan to start scylladb with different configuration, and you intend to add many tests and would like them to reuse an existing Scylla cluster (clusters can be reused for tests within the same folder).

To add a new folder, create a new directory, and then copy & edit its suite.ini.