scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 03:45:11 +00:00

Author	SHA1	Message	Date
Avi Kivity	d5e94ab224	test: partition_data_test: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:24:45 +03:00
Avi Kivity	77d54410d0	test: querier_cache_test: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:24:37 +03:00
Avi Kivity	b406af2556	test: mutation_test: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:24:28 +03:00
Tomasz Grabiec	f893516e55	Merge "lwt: store column_mapping's for each table schema version upon a DDL change" from Pavel Solodovnikov This patch introduces a new system table: `system.scylla_table_schema_history`, which is used to keep track of column mappings for obsolete table schema versions (i.e. schema becomes obsolete when it's being changed by means of `CREATE TABLE` or `ALTER TABLE` DDL operations). It is populated automatically when a new schema version is being pulled from a remote in get_schema_definition() at migration_manager.cc and also when schema change is being propagated to system schema tables in do_merge_schema() at schema_tables.cc. The data referring to the most recent table schema version is always present. Other entries are garbage-collected when the corresponding table schema version is obsoleted (they will be updated with a TTL equal to `DEFAULT_GC_GRACE_SECONDS` on `ALTER TABLE`). In case we failed to persist column mapping after a schema change, missing entries will be recreated on node boot. Later, the information from this table is used in `paxos_state::learn` callback in case we have a mismatch between the most recent schema version and the one that is stored inside the `frozen_mutation` for the accepted proposal. Such situation may arise under following circumstances: 1. The previous LWT operation crashed on the "accept" stage, leaving behind a stale accepted proposal, which waits to be repaired. 2. The table affected by LWT operation is being altered, so that schema version is now different. Stored proposal now references obsolete schema. 3. LWT query is retried, so that Scylla tries to repair the unfinished Paxos round and apply the mutation in the learn stage. When such mismatch happens, prior to that patch the stored `frozen_mutation` is able to be applied only if we are lucky enough and column_mapping in the mutation is "compatible" with the new table schema. 
It wouldn't work if, for example, the columns are reordered, or some columns, which are referenced by an LWT query, are dropped. With this patch we try to look up the column mapping for the obsolete schema version, then upgrade the stored mutation using obtained column mapping and apply an upgraded mutation instead. * git@github.com:ManManson/scylla.git feature/table_schema_history_v7: lwt: add column_mapping history persistence tests schema: add equality operator for `column_mapping` class lwt: store column_mapping's for each table schema version upon a DDL change schema_tables: extract `fill_column_info` helper frozen_mutation: introduce `unfreeze_upgrading` method	2020-10-15 20:48:29 +02:00
Pavel Solodovnikov	b59ac032c9	lwt: add column_mapping history persistence tests There are two basic tests, which: * Test that column mappings are serialized and deserialized properly on both CREATE TABLE and ALTER TABLE * Column mappings for obsoleted schema versions are updated with a TTL value on schema change Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-10-15 19:25:24 +03:00
Tomasz Grabiec	62d2979888	Merge "raft: snapshot support" from Gleb Support snapshotting for raft. The patch series only concerns itself with raft logic, not how a specific state machine implements take_snapshot() callback. * scylla-dev/raft-snapshots-v2: raft: test: add tests for snapshot functionality raft: preserve trailing raft log entries during snapshotting raft: implement periodic snapshotting of a state machine raft: add snapshot transfer logic	2020-10-15 12:45:30 +02:00
Gleb Natapov	36c67aef8b	raft: test: add tests for snapshot functionality The patch adds two tests; one for snapshot transfer and another for snapshot generation.	2020-10-15 11:50:27 +03:00
Nadav Har'El	509a41db04	alternator: change name of Alternator's SSL options When Alternator is enabled over HTTPS - by setting the "alternator_https_port" option - it needs to know some SSL-related options, most importantly where to pick up the certificate and key. Before this patch, we used the "server_encryption_options" option for that. However, this was a mistake: Although it sounds like these are the "server's options", in fact prior to Alternator this option was only used when communicating with other servers - i.e., connections between Scylla nodes. For CQL connections with the client, we used a different option - "client_encryption_options". This patch introduces a third option "alternator_encryption_options", which controls only Alternator's HTTPS server. Making it separate from the existing CQL "client_encryption_options" allows both Alternator and CQL to be active at the same time but with different certificates (if the user so wishes). For backward compatibility, we temporarily continue to allow server_encryption_options to control the Alternator HTTPS server if alternator_encryption_options is not specified. However, this generates a warning in the log, urging the user to switch. This temporary workaround should be removed in a future version. This patch also: 1. fixes the test run code (which has an "--https" option to test over https) to use the new name of the option. 2. Adds documentation of the new option in alternator.md and protocols.md - previously the information on how to control the location of the certificate was missing from these documents. Fixes #7204. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200930123027.213587-1-nyh@scylladb.com>	2020-10-14 18:13:57 +03:00
Calle Wilund	83339f4bac	Alternator::streams: Make SequenceNumber monotonically growing Fixes #7424 AWS sdk (kinesis) assumes SequenceNumbers are monotonically growing bigints. Since we sort on and use timeuuids, a "raw" bit representation of these will _not_ fulfill the requirement. However, we can "unwrap" the timestamp of uuid msb and give the value as timestamp<<64|lsb, which will ensure sort order == bigint order.	2020-10-14 16:45:21 +03:00
Calle Wilund	3f800d68c6	alternator::streams: Ensure shards are reported in string lexical order Fixes #7409 AWS kinesis Java sdk requires/expects shards to be reported in lexical order, and even worse, ignores lastevalshard. Thus not upholding said order will break their stream introspection badly. Added asserts to unit tests. v2: * Added more comments * use unsigned_cmp * unconditional check in streams_test	2020-10-14 16:45:21 +03:00
Benny Halevy	b3f46e9cbf	test: serialized_action_test: add test_serialized_action_exception Tests that the exceptional future returned by the serialized action is propagated to trigger, reproducing #7352. The test fails without the previous patch: "serialized_action: trigger: include also semaphore status to promise" Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-14 16:45:21 +03:00
Avi Kivity	86bbf1763d	Merge "reader concurrency semaphore: dump permit diagnostics on timeout or queue overflow" from Botond " The reader concurrency semaphore timing out or its queue being overflown are fairly common events both in production and in testing. At the same time it is a hard to diagnose problem that often has a benign cause (especially during testing), but it is equally possible that it points to something serious. So when this error starts to appear in logs, usually we want to investigate and the investigation is lengthy... either involves looking at metrics or coredumps or both. This patch intends to jumpstart this process by dumping a diagnostics on semaphore timeout or queue overflow. The diagnostics is printed to the log with debug level to avoid excessive spamming. It contains a histogram of all the permits associated with the problematic semaphore organized by table, operation and state. Example: DEBUG 2020-10-08 17:05:26,115 [shard 0] reader_concurrency_semaphore - Semaphore _read_concurrency_sem: timed out, dumping permit diagnostics: Permits with state admitted, sorted by memory memory count name 3499M 27 ks.test:data-query 3499M 27 total Permits with state waiting, sorted by count count memory name 1 0B ks.test:drain 7650 0B ks.test:data-query 7651 0B total Permits with state registered, sorted by count count memory name 0 0B total Total: permits: 7678, memory: 3499M This allows determining several things at glance: * What are the tables involved * What are the operations involved * Where is the memory This can speed up a follow-up investigation greatly, or it can even be enough on its own to determine that the issue is benign. 
Tests: unit(dev, debug) " * 'dump-diagnostics-on-semaphore-timeout/v2' of https://github.com/denesb/scylla: reader_concurrency_semaphore: dump permit diagnostics on timeout or queue overflow utils: add to_hr_size() reader_concurrency_semaphore: link permits into an intrusive list reader_concurrency_semaphore: move expiry_handler::operator()() out-of-line reader_concurrency_semaphore: move constructors out-of-line reader_concurrency_semaphore: add state to permits reader_concurrency_semaphore: name permits querier_cache_test: test_immediate_evict_on_insert: use two permits multishard_combining_reader: reader_lifecycle_policy: add permit param to create_reader() multishard_combining_reader: add permit parameter multishard_combining_reader: shard_reader: use multishard reader's permit	2020-10-13 12:44:23 +03:00
Botond Dénes	ff623e70b3	reader_concurrency_semaphore: name permits Require a schema and an operation name to be given to each permit when created. The schema is that of the table the read is executed against, and the operation name is some name identifying the operation the permit is part of. Ideally this should be different for each site the permit is created at, to be able to discern not only different kinds of reads, but also the different code paths the read took. As not all reads can be associated with one schema, the schema is allowed to be null. The name will be used for debugging purposes, both for coredump debugging and runtime logging of permit-related diagnostics.	2020-10-13 12:32:13 +03:00
Botond Dénes	40c5474022	querier_cache_test: test_immediate_evict_on_insert: use two permits The test currently uses a single permit shared between two simulated reads (to wait admission twice). This is not a supported way of using a permit and will stop working soon as we make the states the permit is in more pronounced.	2020-10-12 15:56:56 +03:00
Botond Dénes	307cdf1e0d	multishard_combining_reader: reader_lifecycle_policy: add permit param to create_reader() Allow the evictable reader managing the underlying reader to pass its own permit to it when creating it, making sure they share the same permit. Note that the two parts can still end up using different permits, when the underlying reader is kept alive between two pages of a paged read and thus keeps using the permit received on the previous page. Also adjust the `reader_context` in multishard_mutation_query.cc to use the passed-in permit instead of creating a new one when creating a new reader.	2020-10-12 15:56:56 +03:00
Botond Dénes	e09ab09fff	multishard_combining_reader: add permit parameter Don't create an own permit, take one as a parameter, like all other readers do, so the permit can be provided by the higher layer, making sure all parts of the logical read use the same permit.	2020-10-12 15:56:56 +03:00
Gleb Natapov	9d7c81c1b8	raft: fix boost/raft_fsm_test compilation Message-Id: <20201011063802.GA2628121@scylladb.com>	2020-10-12 12:09:21 +02:00
Nadav Har'El	977da3567f	Merge 'Alternator streams: Fix shard lengths, parenting, expiration, filter useless ones and improve paging' from Calle Wilund The remains of the defunct #7246. Fixes #7344 Fixes #7345 Fixes #7346 Fixes #7347 Shard ID length is now within limits. Shard end sequence number should be set when appropriate. Shard parent is selected a bit more carefully (sorting) Shards are filtered by time to exclude cdc generations we cannot get data from (too old) Shard paging improved Closes #7348 * github.com:scylladb/scylla: test_streams: Add some more sanity asserts alternator::streams: Set dynamodb data TTL explicitly in cdc options alternator::streams: Improve paging and fix parent-child calculation alternator::streams: Remove table from shard_id alternator::streams: Filter our cdc streams older than data/table alternator::error: Add a few dynamo exception types	2020-10-12 09:43:12 +03:00
Avi Kivity	4d6739c2e6	Merge "Use max_concurrent_for_each" from Benny " max_concurrent_for_each was added to seastar for replacing sstable_directory::parallel_for_each_restricted by using more efficient concurrency control that doesn't create an unlimited number of continuations. The series replaces the use of sstable_directory::parallel_for_each_restricted with max_concurrent_for_each and exposes the sstable_directory::do_for_each_sstable via a static method. This method is used here by table::snapshot to limit the concurrency of snapshot operations that suffer from the same unbounded concurrency problem sstable_directory solved. In addition sstable_directory::_load_semaphore that was used across calls to do_for_each_sstable was replaced by a static per-shard semaphore that caps concurrency across all calls to `do_for_each_sstable` on that shard. This makes sense since the disk is a shared resource. In the future, we may want to have a load semaphore per device rather than a single global one. We should experiment with that. Test: unit(dev) " * tag 'max_concurrent_for_each-v5' of github.com:bhalevy/scylla: table: snapshot: use max_concurrent_for_each sstable_directory: use an external load_semaphore test: sstable_directory_test: extract sstable_directory creation into with_sstable_directory distributed_loader: process_upload_dir: use initial_sstable_loading_concurrency sstables: sstable_directory: use max_concurrent_for_each	2020-10-12 09:43:12 +03:00
Avi Kivity	610fa83f28	test: database_test: fix threading confusion database_test contains several instances of calling do_with_cql_test_env() with a function that expects to be called in a thread. This mostly works because there is an internal thread in do_with_cql_test_env(), but is not guaranteed to. Fix by switching to the more appropriate do_with_cql_test_env_thread(). Closes #7333	2020-10-11 17:44:30 +03:00
Avi Kivity	58e02c216a	test: sstable_datafile_test: sstable_run_based_compaction_test: prevent use of uninitialized variable observer The variable 'observer' (an std::optional) may be left uninitialized if 'incremental_enabled' is false. However, it is used afterwards with a call to disconnect, accessing garbage. Fix by accessing it via the optional wrapper. A call to optional::reset() destroys the observable, which in turn calls disconnect(). Closes #7380	2020-10-11 17:36:08 +03:00
Avi Kivity	15ab6a3feb	test: cql_repl: use boost::regex instead of std::regex to avoid stack overflow libstdc++'s std::regex uses recursion[1], with a depth controlled by the input. Together with clang's debug mode, this overflows the stack. Use boost::regex instead, which is immune to the problem. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86164 Closes #7378	2020-10-11 17:12:21 +03:00
Avi Kivity	882ed2017a	test: network_topology_strategy_test: fix overflow in d2t() d2t() scales a fraction in the range [0, 1] to the range of a biased token (same as unsigned long). But x86 doesn't support conversion to unsigned, only signed, so this is a truncating conversion. Clang's ubsan correctly warns about it. Fix by reducing the range before converting, and expanding it afterwards. Closes #7376	2020-10-11 16:05:02 +03:00
Alejo Sanchez	5d408082b6	raft: log failed test case name Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:50:47 +02:00
Alejo Sanchez	664b3eddb1	raft: test add hasher Values seen by nodes were so far added but this does not provide a guarantee the order of these values was respected. Use a digest to check output, implicitly checking order. On the other hand, sum or a simple positional checksum like Fletcher's is easier to debug as rolling sum is evident. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:50:42 +02:00
Alejo Sanchez	670824c6fa	raft: declarative tests For convenience making Raft tests, use declarative structures. Servers are set up and initialized and then updates are processed. For now, updates are just adding entries to leader and change of leader. Updates and leader changes can be specified to run after initial test setup. An example test for 3 nodes, node 0 starting as leader having two entries 0 and 1 for term 1, and with current term 2, then adding 12 entries, changing leader to node 1, and adding 12 more entries. The test will automatically add more entries to the last leader until the test limit of total_values (default 100). {.name = "test_name", .nodes = 3, .initial_term = 2, .initial_states = {{.le = {{1,0},{1,1}}}, .updates = {entries{12},new_leader{1},entries{12}},}, Leader is isolated before change via is_leader returning false. Initial leader (default server 0) will be set with this method, too. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:50:31 +02:00
Alejo Sanchez	7d4b33d834	raft: test make app return proper exit int value Seastar app returns int result exit value. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:50:24 +02:00
Alejo Sanchez	093bc8fbb3	raft: test add support for disconnected server Failure detector support of disconnected servers with a global set of addresses. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:50:02 +02:00
Alejo Sanchez	21d7686766	raft: tests use custom server ids for easier debugging Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:49:57 +02:00
Alejo Sanchez	56683ae689	raft: test remove unnecessary header Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:49:45 +02:00
Alejo Sanchez	1bff357816	raft: fix typo snaphot snapshot Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-09 15:49:39 +02:00
Benny Halevy	57cc5f6ae1	sstable_directory: use an external load_semaphore Although each sstable_directory limits concurrency using max_concurrent_for_each, there could be a large number of calls to do_for_each_sstable running in parallel (e.g. per keyspace X per table in the distributed_loader). To cap parallelism across sstable_directory instances and concurrent calls to do_for_each_sstable, start a sharded<semaphore> and pass a shared semaphore& to the sstable_directory:s. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-08 11:57:06 +03:00
Benny Halevy	dc46aaa3fd	test: sstable_directory_test: extract sstable_directory creation into with_sstable_directory Use common code to create, start, and stop the sharded<sstable_directory> for each test. This will be used in the next patch for creating a sharded semaphore and passing it to the sstable_directory. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-08 11:57:06 +03:00
Gleb Natapov	0bff15a976	raft: Send multiple entries in one append_entry rpc Send more than one entry in a single append_entry message, but limit a packet's size according to the append_request_threshold parameter. Message-Id: <20201007142602.GA2496906@scylladb.com>	2020-10-07 16:43:33 +02:00
Calle Wilund	349c5ee21a	test_streams: Add some more sanity asserts Checking validity of returned shard sets etc.	2020-10-07 08:43:39 +00:00
Calle Wilund	3cdd7fe191	alternator::streams: Remove table from shard_id Fixes #7344 It is not data really needed, as shard_id:s are not required to be unique across streams, and also because of the length limit on shard_id text representation. As a side effect, shard iter instead carries the stream arn.	2020-10-07 08:43:39 +00:00
Avi Kivity	c6a3fa5a49	Merge "querier_cache: use the querier's permit for memory accounting" from Botond " The querier cache has a memory based eviction mechanism, which starts evicting freshly inserted queriers once their collective memory consumption goes above the configured limit. For determining the memory consumption of individual queriers, the querier cache uses `flat_mutation_reader::buffer_size()`. But we now have a much more comprehensive accounting of the memory used by queriers: the reader permit, which also happens to be available in each querier. So use this to determine the querier's memory consumption instead. Tests: unit(dev) " * 'querier-cache-use-permit-for-memory-accounting/v1' of https://github.com/denesb/scylla: flat_mutation_reader: de-virtualize buffer_size() querier_cache: use the reader permit for memory accounting querier_cache_test: use local semaphore not the test global one reader_permit: add consumed_resources() accessor	2020-10-06 16:52:44 +03:00
Tomasz Grabiec	46b7ba8809	Merge "Bring memory footprint test back to work" from Pavel Emelyanov The test was broken by recent sstables manager rework. In the middle the sstables::test_env is destroyed without being closed which leads to broken _closing assertion inside ~sstables_manager(). Fix is to use the test_env::do_with helper. tests: perf.memory_footprint * https://github.com/xemul/scylla/tree/br-memory-footprint-test-fix: test/perf/memory_footprint: Fix indentation after previous patch test/perf/memory_footprint: Don't forget to close sstables::test_env after usage	2020-10-06 11:49:03 +02:00
Pavel Emelyanov	8bceb916ea	test/perf/memory_footprint: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 11:08:09 +03:00
Pavel Emelyanov	3e4de0f748	test/perf/memory_footprint: Don't forget to close sstables::test_env after usage After recent sstables manager rework the sstables::test_env must be .close()d after usage, otherwise the ~sstables_manager() hits the _closing assertion. Do it with the help of .do_with(). The execution context is already seastar::async in this place, so .get() it explicitly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-06 11:06:35 +03:00
Botond Dénes	dd372c8457	flat_mutation_reader: de-virtualize buffer_size() The main user of this method, the one which required this method to return the collective buffer size of the entire reader tree, is now gone. The remaining two users just use it to check the size of the reader instance they are working with. So de-virtualize this method and reduce its responsibility to just returning the buffer size of the current reader instance.	2020-10-06 08:22:56 +03:00
Botond Dénes	f7eea06f61	querier_cache_test: use local semaphore not the test global one In the mutation source, which creates the reader for this test, the global test semaphore's permit was passed to the created reader (`tests::make_permit()`). This caused reader resources to be accounted on the global test semaphore, instead of the local one the test creates. Just forward the permit passed to the mutation sources to the reader to fix this.	2020-10-06 08:22:56 +03:00
Nadav Har'El	421f0c729d	merge: counters: Avoid signed integer overflow Merged patch series by Tomasz Grabiec: UBSAN complains in debug mode when the counter value overflows: counters.hh:184:16: runtime error: signed integer overflow: 1 + 9223372036854775807 cannot be represented in type 'long int' Aborting on shard 0. Overflow is supposed to be supported. Let's silence it by using casts. Fixes #7330. Tests: - build/debug/test/tools/cql_repl --input test/cql/counters_test.cql Tomasz Grabiec (2): counters: Avoid signed integer overflow test: cql: counters: Add tests reproducing signed integer overflow in debug mode counters.hh | 2 +- test/cql/counters_test.cql | 9 ++++++++ test/cql/counters_test.result | 48 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 58 insertions(+), 1 deletion(-)	2020-10-05 21:43:19 +03:00
Tomasz Grabiec	f01ffe063a	test: cql: counters: Add tests reproducing signed integer overflow in debug mode Reproduces #7330	2020-10-05 20:06:34 +02:00
Alejo Sanchez	bb67d15e2f	Raft: disable boost tests for now Disable raft fsm boost tests until raft is part of build. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-02 14:03:01 +02:00
Alejo Sanchez	4e26dad3a0	Raft: Remove tests for now Remove raft C++ tests until raft is included in build process. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-02 12:26:05 +02:00
Tomasz Grabiec	ca7f0c61f0	Merge "raft: initial implementation" from Gleb This is the beginning of raft protocol implementation. It only supports log replication and voter state machine. The main difference between this one and the RFC (besides having voter state machine) is that the approach taken here is to implement raft as a deterministic state machine and move all the IO processing away from the main logic. To do that some changes to the RPC interface were required: all verbs are now one way, meaning that sending a request does not wait for a reply and the reply arrives as a separate message (or not at all, it is safe to drop packets). * scylla-dev/raft-v4: raft: add a short readme file raft: compile raft tests raft: add raft tests raft: Implement log replication and leader election raft: Introduce raft interface header	2020-10-01 17:09:52 +02:00
Gleb Natapov	4959609589	raft: add raft tests Add tests for currently implemented raft features. replication_test tests replication functionality with various initial log configurations. raft_fsm_test tests voting state machine functionality.	2020-10-01 14:30:59 +03:00
Etienne Adam	98dc0dc03a	redis: only create required keyspaces/tables The 'redis_database_count' was already existing, but was not used when initializing the keyspaces. This patch merely uses it. I think it's better that way, it seems cleaner not to create 15 x 5 tables when we use only one redis database. Also change a test to test with a higher max number of database. Signed-off-by: Etienne Adam <etienne.adam@gmail.com> Message-Id: <20200930210256.4439-1-etienne.adam@gmail.com>	2020-10-01 10:27:03 +03:00
Avi Kivity	fd1dd0eac7	Merge "Track the memory consumption of reader buffers" from Botond " The last major untracked area of the reader pipeline is the reader buffers. These scale with the number of readers as well as with the size and shape of data, so their memory consumption is unpredictable and varies wildly. For example many small rows will trigger larger buffers allocated within the `circular_buffer<mutation_fragment>`, while few larger rows will consume a lot of external memory. This series covers this area by tracking the memory consumption of both the buffer and its content. This is achieved by passing a tracking allocator to `circular_buffer<mutation_fragment>` so that each allocation it makes is tracked. Additionally, we now track the memory consumption of each and every mutation fragment through its whole lifetime. Initially I contemplated just tracking the `_buffer_size` of `flat_mutation_reader::impl`, but concluded that as our reader trees are typically quite deep, this would result in a lot of unnecessary `signal()`/`consume()` calls, that scale with the number of mutation fragments and hence add to the already considerable per mutation fragment overhead. The solution chosen in this series is to instead track the memory consumption of the individual mutation fragments, with the observation that these are typically always moved and very rarely copied, so the number of `signal()`/`consume()` calls will be minimal. This additional tracking introduces an interesting dilemma however: readers will now have significant memory on their account even before being admitted. So it may happen that they can prevent their own admission via this memory consumption. To prevent this, memory consumption is only forwarded to the semaphore upon admission. This might be solved when the semaphore is moved to the front -- before the cache. 
Another consequence of this additional, more complete tracking is that evictable readers now consume memory even when the underlying reader is evicted. So it may happen that even though no reader is currently admitted, all memory is consumed from the semaphore. To prevent any such deadlocks, the semaphore now admits a reader unconditionally if no reader is admitted -- that is if all count resources are available. Refs: #4176 Tests: unit(dev, debug, release) " * 'track-reader-buffers/v2' of https://github.com/denesb/scylla: (37 commits) test/manual/sstable_scan_footprint_test: run test body in statement sched group test/manual/sstable_scan_footprint_test: move test main code into separate function test/manual/sstable_scan_footprint_test: sprinkle some thread::maybe_yield():s test/manual/sstable_scan_footprint_test: make clustering row size configurable test/manual/sstable_scan_footprint_test: document sstable related command line arguments mutation_fragment_test: add exception safety test for mutation_fragment::mutate_as_*() test: simple_schema: add make_static_row() reader_permit: reader_resources: add operator== mutation_fragment: memory_usage(): remove unused schema parameter mutation_fragment: track memory usage through the reader_permit reader_permit: resource_units: add permit() and resources() accessors mutation_fragment: add schema and permit partition_snapshot_row_cursor: row(): return clustering_row instead of mutation_fragment mutation_fragment: remove as_mutable_end_of_partition() mutation_fragment: s/as_mutable_partition_start/mutate_as_partition_start/ mutation_fragment: s/as_mutable_range_tombstone/mutate_as_range_tombstone/ mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/ mutation_fragment: s/as_mutable_static_row/mutation_as_static_row/ flat_mutation_reader: make _buffer a tracked buffer mutation_reader: extract the two fill_buffer_result into a single one ...	2020-09-29 16:08:16 +03:00
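
As a rough illustration of the SequenceNumber change in 83339f4bac (listed above), the following Python sketch shows why the raw 128-bit value of a timeuuid does not sort by time, while the timestamp<<64|lsb form described in the commit message does. This is not Scylla's code (which is C++): `make_timeuuid` and `sequence_number` are hypothetical helper names, and the fixed low half used here is an assumed placeholder value.

```python
import uuid

LSB_MASK = 0xFFFFFFFFFFFFFFFF


def make_timeuuid(timestamp: int, lsb: int = 0x8000000000000000) -> uuid.UUID:
    """Build a version-1 UUID from a 60-bit timestamp (hypothetical helper).

    A version-1 UUID stores the LOW 32 bits of the timestamp in its most
    significant bits, followed by the middle 16 bits and then the high 12 bits.
    """
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    msb = (time_low << 32) | (time_mid << 16) | time_hi_version
    return uuid.UUID(int=(msb << 64) | lsb)


def sequence_number(u: uuid.UUID) -> int:
    """Return timestamp<<64|lsb, as described in the commit message.

    uuid.UUID.time reassembles the 60-bit timestamp from the three
    time fields, so the result grows with the timestamp.
    """
    return (u.time << 64) | (u.int & LSB_MASK)


# Two timestamps one tick apart, straddling a 32-bit boundary.
earlier = make_timeuuid(0xFFFFFFFF)
later = make_timeuuid(0x100000000)

# The raw integer puts time_low first, so the earlier UUID looks larger.
print("raw order follows time:     ", earlier.int < later.int)
# The unwrapped form compares by timestamp first, then by the low half.
print("sequence order follows time:", sequence_number(earlier) < sequence_number(later))
```

The choice of timestamps just below and just above 2^32 is deliberate: that is where the low 32 bits wrap around, which makes the mismatch between raw bit order and time order visible with only two values.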


900 Commits