Add tests for the currently implemented raft features: replication_test
tests replication functionality with various initial log configurations;
raft_fsm_test tests the voting state machine functionality.
"
The last major untracked area of the reader pipeline is the reader
buffers. These scale with the number of readers as well as with the
size and shape of the data, so their memory consumption is unpredictable
and varies wildly. For example, many small rows will trigger larger
buffers allocated within the `circular_buffer<mutation_fragment>`, while
a few larger rows will consume a lot of external memory.
This series covers this area by tracking the memory consumption of both
the buffer and its content. This is achieved by passing a tracking
allocator to `circular_buffer<mutation_fragment>` so that each
allocation it makes is tracked. Additionally, we now track the memory
consumption of each and every mutation fragment through its whole
lifetime. Initially I contemplated just tracking the `_buffer_size` of
`flat_mutation_reader::impl`, but concluded that as our reader trees are
typically quite deep, this would result in a lot of unnecessary
`signal()`/`consume()` calls, which scale with the number of mutation
fragments and hence add to the already considerable per-mutation-fragment
overhead. The solution chosen in this series is to instead track the
memory consumption of the individual mutation fragments, based on the
observation that these are almost always moved and very rarely
copied, so the number of `signal()`/`consume()` calls will be minimal.
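The tracking-allocator idea can be sketched as follows. This is a minimal illustration with hypothetical names, not the actual Scylla types: `memory_tracker` stands in for the reader_permit/semaphore, and `std::vector` for `circular_buffer<mutation_fragment>`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Every allocation the container makes is reported to the tracker.
struct memory_tracker {
    std::size_t consumed = 0;
    void consume(std::size_t n) { consumed += n; } // stands in for semaphore consume()
    void signal(std::size_t n) { consumed -= n; }  // stands in for semaphore signal()
};

template <typename T>
struct tracking_allocator {
    using value_type = T;
    memory_tracker* tracker;

    explicit tracking_allocator(memory_tracker& t) : tracker(&t) {}
    template <typename U>
    tracking_allocator(const tracking_allocator<U>& o) : tracker(o.tracker) {}

    T* allocate(std::size_t n) {
        tracker->consume(n * sizeof(T)); // each allocation is accounted for
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) {
        std::allocator<T>{}.deallocate(p, n);
        tracker->signal(n * sizeof(T));
    }
    template <typename U>
    bool operator==(const tracking_allocator<U>& o) const { return tracker == o.tracker; }
};

// The buffer's internal allocations now show up on the tracker automatically.
std::size_t consumed_while_buffer_alive() {
    memory_tracker t;
    std::vector<int, tracking_allocator<int>> buf{tracking_allocator<int>(t)};
    buf.reserve(1024);
    return t.consumed; // at least 1024 * sizeof(int)
}

std::size_t consumed_after_buffer_destroyed() {
    memory_tracker t;
    {
        std::vector<int, tracking_allocator<int>> buf{tracking_allocator<int>(t)};
        buf.reserve(1024);
    } // buffer destroyed: the allocator returns the memory to the tracker
    return t.consumed; // back to zero
}
```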
This additional tracking introduces an interesting dilemma, however:
readers will now have significant memory on their account even before
being admitted, so it may happen that they prevent their own admission
via this memory consumption. To prevent this, memory
consumption is only forwarded to the semaphore upon admission. This
might be solved when the semaphore is moved to the front -- before the
cache.
Another consequence of this additional, more complete tracking is that
evictable readers now consume memory even when the underlying reader is
evicted. So it may happen that even though no reader is currently
admitted, all memory is consumed from the semaphore. To prevent any such
deadlocks, the semaphore now admits a reader unconditionally if no
reader is currently admitted -- that is, if all count resources are
available.
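The deadlock-avoiding admission rule can be sketched like this (a hypothetical simplification, not the real semaphore): admit when enough memory is available, or unconditionally when no reader is currently admitted, so memory held by not-yet-admitted or evicted readers can never block everyone forever.

```cpp
#include <cassert>
#include <cstdint>

struct reader_semaphore {
    int64_t memory_available;
    int admitted_count = 0;

    bool try_admit(int64_t memory_needed) {
        // Unconditional admission when nothing is admitted prevents the
        // deadlock where all memory is consumed but no reader runs.
        if (admitted_count == 0 || memory_needed <= memory_available) {
            memory_available -= memory_needed; // may go negative on the
                                               // unconditional path
            ++admitted_count;
            return true;
        }
        return false;
    }
};
```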
Refs: #4176
Tests: unit(dev, debug, release)
"
* 'track-reader-buffers/v2' of https://github.com/denesb/scylla: (37 commits)
test/manual/sstable_scan_footprint_test: run test body in statement sched group
test/manual/sstable_scan_footprint_test: move test main code into separate function
test/manual/sstable_scan_footprint_test: sprinkle some thread::maybe_yield():s
test/manual/sstable_scan_footprint_test: make clustering row size configurable
test/manual/sstable_scan_footprint_test: document sstable related command line arguments
mutation_fragment_test: add exception safety test for mutation_fragment::mutate_as_*()
test: simple_schema: add make_static_row()
reader_permit: reader_resources: add operator==
mutation_fragment: memory_usage(): remove unused schema parameter
mutation_fragment: track memory usage through the reader_permit
reader_permit: resource_units: add permit() and resources() accessors
mutation_fragment: add schema and permit
partition_snapshot_row_cursor: row(): return clustering_row instead of mutation_fragment
mutation_fragment: remove as_mutable_end_of_partition()
mutation_fragment: s/as_mutable_partition_start/mutate_as_partition_start/
mutation_fragment: s/as_mutable_range_tombstone/mutate_as_range_tombstone/
mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/
mutation_fragment: s/as_mutable_static_row/mutation_as_static_row/
flat_mutation_reader: make _buffer a tracked buffer
mutation_reader: extract the two fill_buffer_result into a single one
...
After cleaning up old cluster features (253a7640e3)
the code for special handling of 1.7.4 counter order was effectively
only used in its own tests, so it can be safely removed.
Closes #7289
The memory usage is now maintained and updated on each change to the
mutation fragment, so it need not be recalculated on a call to
`memory_usage()`; hence the schema parameter is unused and can be
removed.
We want to start tracking the memory consumption of mutation fragments.
For this we need the schema and permit during construction and on each
modification, so that the memory consumption can be recalculated and
passed to the permit.
In this patch we just add the new parameters and go through the insane
churn of updating all call sites. They will be used in the next patch.
We will soon want to update the memory consumption of a mutation
fragment after each modification done to it. To do that safely, we have
to forbid direct access to the underlying data and instead have callers
pass a lambda doing their modifications.
Call sites that used this method just to move the fragment away are
converted to use `as_clustering_row() &&`.
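The pattern can be sketched as follows, with hypothetical stand-in types (`permit` and `fragment` are illustrative, not the real reader_permit/mutation_fragment): direct mutable access is gone; callers hand in a lambda, and the fragment re-computes its footprint afterwards and forwards the delta to its permit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

struct permit {
    std::size_t consumed = 0;
    void adjust(std::size_t old_size, std::size_t new_size) {
        consumed += new_size;
        consumed -= old_size;
    }
};

struct fragment {
    permit* p;
    std::string payload; // stand-in for the fragment's real content
    std::size_t tracked = 0;

    fragment(permit& perm, std::string s) : p(&perm), payload(std::move(s)) {
        tracked = memory_usage();
        p->adjust(0, tracked);
    }

    std::size_t memory_usage() const { return sizeof(*this) + payload.capacity(); }

    // The only way to modify the payload -- accounting cannot be bypassed.
    template <typename Func>
    void mutate(Func&& f) {
        f(payload);
        std::size_t now = memory_usage();
        p->adjust(tracked, now); // the permit always sees the up-to-date size
        tracked = now;
    }
};
```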
Not used yet, this patch does all the churn of propagating a permit
to each impl.
In the next patch we will use it to track to track the memory
consumption of `_buffer`.
Current code uses a single counter to produce multiple buffer worth of
data. This uses carry-on from on buffer to the other, which happens to
work with the current memory accounting but is very fragile. Account
each buffer separately, resetting the counter between them.
The test consumes all resources off the semaphore, leaving just enough
to admit a single reader. However, this amount is calculated based on
the base cost of readers; as we are going to track reader buffers as
well, the amount of memory consumed will become much less predictable.
So to make sure background readers can finish during shutdown, release
all the consumed resources before leaving scope.
There is no point in continuing to process the entire buffer once a
failure was found, especially since an early failure might introduce
conditions that are not handled on the normal flow path. We could handle
these, but there is no point in the added complexity: at this point the
test has failed anyway.
Some tests rely on `consume*()` calls on the permit to take effect
immediately. Soon this will only be true once the permit has been
admitted, so make sure the permit is admitted in these tests.
The reader recreation mechanism is a very delicate and error-prone one,
as proven by the countless bugs it had. Most of these bugs were related
to the recreated reader not continuing the read from the expected
position, inserting out-of-order fragments into the stream.
This patch adds a defense mechanism against such bugs by validating the
start position of the recreated reader.
The intent is to prevent corrupt data from getting into the system as
well as to help catch these bugs as close to the source as possible.
Fixes: #7208
Tests: unit(dev), mutation_reader_test:debug (v4)
* botond/evictable-reader-validate-buffer/v5:
mutation_reader_test: add unit test for evictable reader self-validation
evictable_reader: validate buffer after recreation the underlying
evictable_reader: update_next_position(): only use peek'd position on partition boundary
mutation_reader_test: add unit test for evictable reader range tombstone trimming
evictable_reader: trim range tombstones to the read clustering range
position_in_partition_view: add position_in_partition_view before_key() overload
flat_mutation_reader: add buffer() accessor
Currently, sstable_manager is used to create sstables, but it loses track
of them immediately afterwards. This series makes an sstable's life fully
contained within its sstable_manager.
The first practical impact (implemented in this series) is that file removal
stops being a background job; instead it is tracked by the sstable_manager,
so when the sstable_manager is stopped, you know that all of its sstable
activity is complete.
Later, we can make use of this to track the data size on disk, but this is not
implemented here.
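The shape of the change can be sketched like this (hypothetical names; `std::async` stands in for the real asynchronous file removal): the manager keeps a handle on every background removal it starts, so `close()` can await all sstable activity.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <vector>

class sstables_manager_sketch {
    std::vector<std::future<void>> _background;
    std::atomic<int> _removed{0};
public:
    // Previously fire-and-forget; now the future is tracked by the manager.
    void remove_file_in_background() {
        _background.push_back(std::async(std::launch::async, [this] {
            // ... remove the file on disk ...
            ++_removed;
        }));
    }
    // When close() returns, all of this manager's sstable activity is done.
    void close() {
        for (auto& f : _background) {
            f.get();
        }
        _background.clear();
    }
    int removed() const { return _removed; }
};
```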
Closes #7253
* github.com:scylladb/scylla:
sstables: remove background_jobs(), await_background_jobs()
sstables: make sstables_manager take charge of closing sstables
test: test_env: hold sstables_manager with a unique_ptr
test: drop test_sstable_manager
test: sstables::test_env: take ownership of manager
test: broken_sstable_test: prepare for asynchronously closed sstables_manager
test: sstable_utils: close test_env after use
test: sstable_test: dont leak shared_sstable outside its test_env's lifetime
test: sstables::test_env: close self in do_with helpers
test: perf/perf_sstable.hh: prepare for asynchronously closed sstables_manager
test: view_build_test: prepare for asynchronously closed sstables_manager
test: sstable_resharding_test: prepare for asynchronously closed sstables_manager
test: sstable_mutation_test: prepare for asynchronously closed sstables_manager
test: sstable_directory_test: prepare for asynchronously closed sstables_manager
test: sstable_datafile_test: prepare for asynchronously closed sstables_manager
test: sstable_conforms_to_mutation_source_test: remove references to test_sstables_manager
test: sstable_3_x_test: remove test_sstables_manager references
test: schema_changes_test: drop use of test_sstables_manager
mutation_test: adjust for column_family_test_config accepting an sstables_manager
test: lib: sstable_utils: stop using test_sstables_manager
test: sstables test_env: introduce manager() accessor
test: sstables test_env: introduce do_with_async_sharded()
test: sstables test_env: introduce do_with_async_returning()
test: lib: sstable test_env: prepare for life as a sharded<> service
test: schema_changes_test: properly close sstables::test_env
test: sstable_mutation_test: avoid constructing temporary sstables::test_env
test: mutation_reader_test: avoid constructing temporary sstables::test_env
test: sstable_3_x_test: avoid constructing temporary sstables::test_env
test: lib: test_services: pass sstables_manager to column_family_test_config
test: lib: sstables test_env: implement tests_env::manager()
test: sstable_test: detemplate write_and_validate_sst()
test: sstable_test_env: detemplate do_with_async()
test: sstable_datafile_test: drop bad 'return'
table: clear sstable set when stopping
table: prevent table::stop() race with table::query()
database: close sstable_manager:s
sstables_manager: introduce a stub close()
sstable_directory_test: fix threading confusion in make_sstable_directory_for*() functions
test: sstable_datafile_test: reorder table stop in compaction_manager_test
test: view_build_test: test_view_update_generator_register_semaphore_unit_leak: do not discard future in timer
test: view_build_test: fix threading in test_view_update_generator_register_semaphore_unit_leak
view: view_update_generator: drop references to sstables when stopping
do_write_sst() creates a test_env, creates a shared_sstable using that
test_env, destroys the test_env, and returns the sstable. This works now
but will
stop working once sstable_manager becomes responsible for sstable lifetime.
Fortunately, do_write_sst() has one caller that isn't interested in the
return value at all, so fold it into that caller.
sstables_manager will soon be closed asynchronously, with a future-returning
close() function. To prepare for that, make the following changes
- replace test_sstables_manager with an sstables_manager obtained from test_env
- drop unneeded calls to await_background_jobs()
These changes allow lifetime management of the sstables_manager used
in the tests to be centralized in test_env.
sstables_manager will soon be closed asynchronously, with a future-returning
close() function. To prepare for that, make the following changes
- acquire a test_env with test_env::do_with() (or the sharded variant)
- change the sstable_from_existing_file function to be a functor that
works with either cql_test_env or test_env (as this is what individual
tests want); drop use of test_sstables_manager
- change new_sstable() to accept a test_env instead of using test_sstables_manager
- replace test_sstables_manager with an sstables_manager obtained from test_env
These changes allow lifetime management of the sstables_manager used
in the tests to be centralized in test_env.
sstables_manager will soon be closed asynchronously, with a future-returning
close() function. To prepare for that, make the following changes
- replace on-stack test_env with test_env::do_with()
- use the variant of column_family_for_tests that accepts an sstables_manager
- replace test_sstables_manager with an sstables_manager obtained from test_env
These changes allow lifetime management of the sstables_manager used
in the tests to be centralized in test_env.
Since test_env now calls await_background_jobs on termination, those
calls are dropped.
Use the sstables_manager from test_env. Use do_with_async() to create the test_env,
to allow for proper closing.
Since do_with_async() also takes care of await_background_jobs(), remove that too.
test_sstables_manager is going away, so replace it by test_env::manager().
column_family_test_config() has an implicit reference to test_sstables_manager,
so pass test_env::manager() as a parameter.
Calls to await_background_jobs() are removed, since test_env::stop() performs
the same task.
The large rows tests are special, since they use a custom sstables_manager,
so instead of using a test_env, they just close their local sstables_manager.
Acquire a test_env and extract an sstables_manager from that, passing it
to column_family_test_config, in preparation for losing the default
constructor of column_family_test_config.
sstables::test_env needs to be properly closed (and will soon need it
even more). Use test_env::do_with_async() to do that. Removed
await_background_jobs(), which is now done by test_env::close().
A test_env contains an sstables_manager, which will soon have a close() method.
As such, it can no longer be a temporary. Switch to using test_env::do_with_async().
As a bonus, test_env::do_with_async() performs await_background_jobs() for us, so
we can drop it from the call sites.
A test_env contains an sstables_manager, which will soon have a close() method.
As such, it can no longer be a temporary. Switch to using test_env::do_with_async().
A test_env contains an sstables_manager, which will soon have a close() method.
As such, it can no longer be a temporary. Switch to using test_env::do_with_async().
As a bonus, test_env::do_with_async() performs await_background_jobs() for us, so
we can drop it from the call sites.
The pattern
return function_returning_a_future().get();
is legal, but confusing. It returns an unexpected std::tuple<>. Here,
it doesn't do any harm, but if we try to coerce the surrounding code
into a signature (void ()), then that will fail.
Remove the unneeded and unexpected return.
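A minimal stand-in reproducing the confusion (`future_void` below is a hypothetical simplification, but its `get()` returns `std::tuple<>` just as described above):

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

struct future_void {
    std::tuple<> get() { return {}; }
};

future_void function_returning_a_future() { return {}; }

// Legal but surprising: the deduced return type is std::tuple<>, not void,
// so this function cannot be coerced into a `void ()` signature.
auto confusing() {
    return function_returning_a_future().get();
}
static_assert(std::is_same_v<decltype(confusing()), std::tuple<>>);

// The fix: drop the `return` and discard the tuple.
void fixed() {
    function_returning_a_future().get();
}
```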
The make_sstable_directory_for*() functions run in a thread, and
call functions that run in a thread, but return a future. This
more or less works but is a dangerous construct that can fail.
Fix by returning a regular value.
Stopping a table will soon close its sstables, so the next check will fail
as the number of sstables for the table will be zero.
Reorder the stop() call to make it safe.
We don't need the stop() for the check, since the previous loop made sure
compactions completed.
test_view_update_generator_register_semaphore_unit_leak creates a continuation chain
inside a timer, but does not wait for it. This can result in part of the chain
being executed after its captures have been destroyed.
This is unlikely to happen since the timer fires only if the test fails, and
tests never fail (at least in the way that one expects).
Fix by waiting for that future to complete before exiting the thread.
test_view_update_generator_register_semaphore_unit_leak uses a thread function
in do_with_cql_env(), even though the latter doesn't promise a thread and
accepts a regular function-returning-a-future. It happens to work because the
function happens to be called in a thread, but this isn't guaranteed.
Switch to do_with_cql_env_thread, which guarantees a thread context.
In preparation for non-inactive read stats being added to the semaphore,
rename its existing stats struct and member to a more generic name.
Fields whose names only made sense in the context of the old name are
adjusted accordingly.
new_sstable is defined as a template, and later used in a context
that requires an object. Somehow gcc uses an instantiation with
an empty template parameter list, but I don't think it's right,
and clang refuses.
Since the template is gratuitous anyway, just make it a regular
function.
Clang has a hard time dealing with single-element initializer lists. In this
case, adding an explicit conversion allows it to match the
initializer_list<data_value> parameter.
The set contains 3 small optimizations:
- avoid copying of partition key on lookup path
- reduce number of args carried around when creating a new entry
- save one partition key comparison on reader creation
Plus related satellite cleanups.
* https://github.com/xemul/scylla/tree/br-row-cache-less-copies:
row_cache: Revive do_find_or_create_entry concepts
populating reader: Do not copy decorated key too early
populating reader: Less allocator switching on population
populating reader: Fix indentation after previous patch
row_cache: Move missing entry creation into helper
test: Lookup an existing entry with its own helper
row_cache: Do not copy partition tombstone when creating cache entry
row_cache: Kill incomplete_tag
row_cache: Save one key compare on direct hit
"
This series fixes a bug in `appending_hash<row>` that caused it to ignore any cells after the first NULL. It also adds a cluster feature which starts using the new hashing only after the whole cluster is aware of it. The series comes with tests, which reproduce the issue.
Fixes #4567
Based on #4574
"
* psarna-fix_ignoring_cells_after_null_in_appending_hash:
test: extend mutation_test for NULL values
tests/mutation: add reproducer for #4567
gms: add a cluster feature for fixed hashing
digest: add null values to row digest
mutation_partition: fix formatting
appending_hash<row>: make publicly visible
The test is extended for another possible corner case:
[1, NULL, 2] vs [1, 2, NULL] should have different digests.
Also, a check for legacy behavior is added.
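The difference between the legacy and the fixed digest can be sketched as below. This is an illustrative simplification (a plain FNV-1a over single-byte cells, not Scylla's actual hasher): the fixed digest feeds an explicit marker for NULL cells instead of stopping at them, so [1, NULL, 2] and [1, 2, NULL] hash differently.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

using cell = std::optional<uint8_t>; // nullopt models a NULL cell

inline void feed(uint64_t& h, uint8_t b) {
    h = (h ^ b) * 1099511628211ULL; // FNV-1a step
}

// Buggy legacy behavior: the first NULL ended the loop, so any cells
// after it were ignored by the digest.
uint64_t legacy_digest(const std::vector<cell>& row) {
    uint64_t h = 14695981039346656037ULL;
    for (const cell& c : row) {
        if (!c) break; // the bug: everything after the first NULL vanishes
        feed(h, *c);
    }
    return h;
}

// Fixed behavior: NULLs contribute a marker, values a marker plus the value.
uint64_t fixed_digest(const std::vector<cell>& row) {
    uint64_t h = 14695981039346656037ULL;
    for (const cell& c : row) {
        if (!c) {
            feed(h, 0); // "null" marker
        } else {
            feed(h, 1); // "present" marker
            feed(h, *c);
        }
    }
    return h;
}
```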
There was a typo in get_column_defs_for_filtering(): it checked the
wrong pointer before dereferencing. Add a test exposing the NULL
dereference and fix the typo.
Tests: unit (dev)
Fixes #7198.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
The log-structured allocator (LSA) reserves memory when performing
operations, since those operations run with reclaiming disabled: if it
runs out of memory mid-operation, it cannot evict cache to gain more.
The amount of memory to reserve is remembered across calls so that the
fail/increase-reserve/retry cycle does not have to be repeated for every
operation.
However, the reserved amount currently never decays. This means
that if a single operation increased the reserve in the distant past,
all current operations also require this large reserve. Large reserves
are expensive since they can cause large amounts of cache to be evicted.
This patch adds reserve decay. The time-to-decay is inversely proportional
to reserve size: 10GB/reserve. This means that a 20MB reserve is halved
after 500 operations (10GB/20MB) while a 20kB reserve is halved after
500,000 operations (10GB/20kB). So large, expensive reserves are decayed
quickly while small, inexpensive reserves are decayed slowly to reduce
the risk of allocation failures and exceptions.
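Sketched in code (hypothetical member names; the real LSA bookkeeping is more involved), the decay rule reads:

```cpp
#include <cassert>
#include <cstdint>

struct reserve_decay {
    // Time-to-decay, counted in operations, is decay_constant / reserve,
    // so a 20MB reserve halves after 10GB/20MB = 500 operations while a
    // 20kB reserve halves only after 10GB/20kB = 500,000 operations.
    static constexpr uint64_t decay_constant = 10'000'000'000ULL; // 10GB
    uint64_t reserve;             // bytes reserved before each LSA operation
    uint64_t ops_since_decay = 0;

    void on_operation() {
        ++ops_since_decay;
        if (reserve && ops_since_decay >= decay_constant / reserve) {
            reserve /= 2;         // halve the reserve
            ops_since_decay = 0;
        }
    }
};
```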
A unit test is added.
Fixes#325.
Add a new validate_with_error_position function,
which returns -1 if the data is a valid UTF-8
string, or otherwise the byte position of the
first invalid character. The position is added to
the exception messages of all UTF-8 parsing errors
in Scylla.
validate_with_error_position works in two passes
in order to preserve the same performance in the
common case, when the string is valid.
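The two-pass approach can be sketched as follows, with a hypothetical, simplified validator (it only checks lead/continuation byte structure; a full validator also rejects overlong encodings and surrogates): the fast yes/no pass handles the common valid case, and only on failure is the slower positional scan run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Positional scan: returns the byte offset of the first invalid sequence.
int64_t find_error_position(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = s[i];
        std::size_t len = b < 0x80 ? 1            // ASCII
                        : (b >> 5) == 0x06 ? 2    // 110xxxxx
                        : (b >> 4) == 0x0e ? 3    // 1110xxxx
                        : (b >> 3) == 0x1e ? 4    // 11110xxx
                        : 0;                      // invalid lead byte
        if (len == 0 || i + len > s.size()) {
            return static_cast<int64_t>(i);
        }
        for (std::size_t j = 1; j < len; ++j) {
            if ((static_cast<unsigned char>(s[i + j]) & 0xc0) != 0x80) {
                return static_cast<int64_t>(i);   // bad continuation byte
            }
        }
        i += len;
    }
    return -1;
}

// Pass 1: fast yes/no validation. In the real code this is the existing
// optimized validator; here the same scan minus bookkeeping stands in.
bool validate(std::string_view s) {
    return find_error_position(s) == -1;
}

int64_t validate_with_error_position(std::string_view s) {
    if (validate(s)) {
        return -1;                   // common case: a single fast pass
    }
    return find_error_position(s);   // rare case: rescan to locate the byte
}
```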