scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-05 14:33:08 +00:00

Author	SHA1	Message	Date
Piotr Dulikowski	0fd36e2579	api: allow changing hinted handoff configuration This commit makes it possible to change hints manager's configuration at runtime through HTTP API. To preserve backwards compatibility, we keep the old behavior of not creating and checking hints directories if they are not enabled at startup. Instead, hint directories are lazily initialized when hints are enabled for the first time through HTTP API.	2020-11-17 10:24:43 +01:00
Piotr Dulikowski	220a2ca800	hints_manager: implement change_host_filter Implements a function which is responsible for changing hints manager configuration while it is running. It first starts new endpoint managers for endpoints which weren't allowed by previous filter but are now, and then stops endpoint managers which are rejected by the new filter. The function is blocking and waits until all relevant ep managers are started or stopped.	2020-11-17 10:24:43 +01:00
Piotr Dulikowski	1302f1b5bf	storage_proxy: always create hints manager Now, the hints manager object for regular hints is always created, even if hints are disabled in configuration. Please note that the behavior of hints will be unchanged - no hints will be sent when they are disabled. The intent of this change is to make enabling and disabling hints in runtime easier to implement.	2020-11-17 10:24:43 +01:00
Piotr Dulikowski	cefe5214ff	config: plug in hints::host_filter object into configuration Uses db::hints::host_filter as the type of hinted_handoff_enabled configuration option. Previously, hinted_handoff_enabled used to be a string option, and it was parsed later in a separate function during startup. The function returned a std::optional<std::unordered_set<sstring>>, whose meaning in the context of hints is rather enigmatic for an observer not familiar with hints. Now, hinted_handoff_enabled has type of db::hints::host_filter, and it is plugged into the config parsing framework, so there is no need for later post-processing.	2020-11-17 10:24:42 +01:00
Piotr Dulikowski	5c3c7c946b	db/hints: introduce host_filter Adds a db::hints::host_filter structure, which determines if generating hints towards a given target is currently allowed. It supports serialization and deserialization between the hinted_handoff_enabled configuration/cli option. This patch only introduces this structure, but does not make other code use it. It will be plugged into the configuration architecture in the following commits.	2020-11-17 10:15:47 +01:00
Piotr Dulikowski	a4f03d72b3	hints/resource_manager: allow registering managers after start This change modifies db::hints::resource_manager so that it is now possible to add hints::managers after it was started. This change will make it possible to register the regular hints manager later in runtime, if it wasn't enabled at boot time.	2020-11-17 10:15:47 +01:00
Piotr Dulikowski	40710677d0	hints: introduce db::hints::directory_initializer Introduces a db::hints::directory_initializer object, which encapsulates the logic of initializing directories for hints (creating/validating directories, segment rebalancing). It will be useful for lazy initialization of hints manager.	2020-11-17 10:15:47 +01:00
Kamil Braun	d74f303406	cdc: ensure that CDC generation write is flushed to commitlog before ack When a node bootstraps or upgrades from a pre-CDC version, it creates a new CDC generation, writes it to a distributed table (system_distributed.cdc_generation_descriptions), and starts gossiping its timestamp. When other nodes see the timestamp being gossiped, they retrieve the generation from the table. The bootstrapping/upgrading node therefore assumes that the generation is made durable and other nodes will be able to retrieve it from the table. This assumption could be invalidated if periodic commitlog mode was used: replicas would acknowledge the write and then immediately crash, losing the write if they were unlucky (i.e. commitlog wasn't synced to disk before the write was acknowledged). This commit enforces all writes to the generations table to be synced to commitlog immediately. It does not matter for performance as these writes are very rare. Fixes https://github.com/scylladb/scylla/issues/7610. Closes #7619	2020-11-17 00:01:13 +02:00
Piotr Jastrzebski	d2897d8f8b	alternator: guard streams with an experimental flag Add new alternator-streams experimental flag for alternator streams control. CDC becomes GA and won't be guarded by an experimental flag any more. Alternator Streams stay experimental so now they need to be controlled by their own experimental flag. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-12 12:36:16 +01:00
Piotr Jastrzebski	e9072542c1	Mark CDC as GA Enable CDC by default. Rename CDC experimental feature to UNUSED_CDC to keep accepting cdc flag. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-12 12:36:13 +01:00
Piotr Sarna	d43ac783c6	db,view: degrade helper message from error to warn When a missing base column happens to be named `idx_token`, an additional helper message is printed in logs. This additional message does not need to have `error` severity, since the previous, generic message is already marked as `error`. This patch simply makes it easier to write tests, because in case this error is expected, only one message needs to be explicitly ignored instead of two. Closes #7597	2020-11-12 12:28:26 +02:00
Benny Halevy	3fab0f8694	storage_proxy: convert to shared_token_metadata get() the latest token_metadata_ptr from the shared_token_metadata before each use. expose get_token_metadata_ptr() rather than get_token_metadata() so that caller can keep it across continuations. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	6d06853e6c	abstract_replication_strategy: convert to shared_token_metadata To facilitate that, keep a const shared_token_metadata& in class database rather than a const token_metadata& Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	8bcdf39a18	hints/manager: scan_for_hints_dirs: fix use-after-move This use-after-move was apparently exposed after switching to clang in commit `eb861e68e9`. The directory_entry is required for std::stoi(de.name.c_str()) and later in the catch{} clause. This shows in the node logs as an "Ignore invalid directory" debug log message with an empty name, and caused the hintedhandoff_rebalance_test to fail when hints files aren't rebalanced. Test: unit(dev) DTest: hintedhandoff_additional_test.py:TestHintedHandoff.hintedhandoff_rebalance_test (dev, debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201106172017.823577-1-bhalevy@scylladb.com>	2020-11-09 16:32:54 +01:00
Piotr Wojtczak	72c7f25a29	db: add TransitionalAuthorizer and TransitionalAuthenticator... ... to config descriptions We allow setting the transitional auth as one of the options in scylla.yaml, but don't mention it at all in the field's description. Let's change that. Closes #7565	2020-11-09 10:51:54 +01:00
Avi Kivity	6b4a7fa515	Revert "Revert "config: Do not enable repair based node operations by default"" This reverts commit `71d0d58f8c`. Repair based node operations are still not ready and will be re-enabled after more testing and fixes.	2020-11-08 14:09:50 +02:00
Tomasz Grabiec	6d0d55aa72	Merge "Unglobal query processor instance" from Pavel Emelyanov The query processor is present in the global namespace and is widely accessed with global get(_local)?_query_processor(). There's a long-term task to get rid of this globality and make services and components reference each other and, because of this, start and stop in a specific order. This set does this for the query processor. The remaining users of it are alternator, controllers for client services, schema_tables and sys_dist_ks. All of them except for the schema_tables are fixed just by passing the reference on query processor with small patches. The schema tables accessing qp sit deep inside the paxos code, but can be "fixed" with the qctx thing until the qctx itself is de-globalized. * https://github.com/xemul/scylla/tree/br-rip-global-query-processor: code: RIP global query processor instance cql test env: Keep query processor reference on board system distributed keyspace: Start sharded service erarlier schema_tables: Use qctx to make internal requests transport: Keep sharded query processor reference on controller thrift: Keep sharded query processor reference on controller alternator: Use local query processor reference to get keys alternator: Keep local query processor reference in server	2020-11-06 14:24:41 +01:00
Piotr Sarna	b61d4bc8d0	db: degrade view building progress loading error to warning When the view builder cannot read view building progress from an internal CQL table it produces an error message, but that only confuses the user and the test suite -- this situation is entirely recoverable, because the builder simply assumes that there is no progress and the view building should start from scratch. Fixes #7527 Closes #7558	2020-11-06 10:19:11 +02:00
Nadav Har'El	7ff72b0ba5	Merge 'secondary_index: fix returned rows token ordering' from Piotr Grabowski Fixes returned rows ordering to proper signed token ordering. Before this change, rows were sorted by token, but using unsigned comparison, meaning that negative tokens appeared after positive tokens. Rename `token_column_computation` to `legacy_token_column_computation` and add some comments describing this computation. Added (new) `token_column_computation` which returns token as `long_type`, which is sorted using signed comparison - the correct ordering of tokens. Add new `correct_idx_token_in_secondary_index` feature, which flags that the whole cluster is able to use new `token_column_computation`. Switch token computation in secondary indexes to (new) `token_column_computation`, which fixes the ordering. This column computation type is only set if cluster supports `correct_idx_token_in_secondary_index` feature to make sure that all nodes will be able to compute new `token_column_computation`. Also old indexes will need to be rebuilt to take advantage of this fix, as new token column computation type is only set for new indexes. Fix tests according to new token ordering and add one new test to validate this aspect explicitly. Fixes #7443 Tested manually a scenario when someone created an index on old version of Scylla and then migrated to new Scylla. Old index continued to work properly (but returning in wrong order). Upon dropping and re-creating the index, it still returned the same data, but now in correct order. Closes #7534 * github.com:scylladb/scylla: tests: add token ordering test of indexed selects tests: fix tests according to new token ordering secondary_index: use new token_column_computation feature: add correct_idx_token_in_secondary_index column_computation: add token_column_computation token_column_computation: rename as legacy	2020-11-05 18:44:49 +01:00
Piotr Grabowski	b1350af951	token_column_computation: rename as legacy Rename token_column_computation to legacy_token_column_computation, as it will be replaced with a new column_computation. The reason is that this computation returns bytes, but all tokens in Scylla can now be represented by int64_t. Moreover, returning bytes causes invalid token ordering, as bytes comparison is done in an unsigned way (not signed as for int64_t). See issue: https://github.com/scylladb/scylla/issues/7443	2020-11-04 12:00:18 +01:00
Piotr Sarna	b66c285f94	schema_tables: fix fixing old secondary index schemas Old secondary index schemas did not have their idx_token column marked as computed, and there already exists code which updates them. Unfortunately, the fix itself contains an error and doesn't fire if computed columns are not yet supported by the whole cluster, which is a very common situation during upgrades. Fixes #7515 Closes #7516	2020-11-02 12:30:20 +02:00
Pavel Emelyanov	021b905773	schema_tables: Use qctx to make internal requests The query processor global instance is going away. The schema_tables usage of it requires a huge rework to push the qp reference to the needed places. However, those places talk to system keyspace and are thus the users of the "qctx" thing -- the query context for local internal requests. To make cql tests not crash on null qctx pointer, its initialization should come earlier (conforming to the main start sequence). The qctx itself is a global pointer, which waits for its fix too, of course. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-10-31 18:50:01 +03:00
Piotr Sarna	35887bf88b	view: add printing missing base column on errors When an out-of-sync view is attempted to be used in a write operation, the whole operation needs to be aborted with an error. After this patch, the error contains more context - namely, the missing column.	2020-10-31 12:22:07 +01:00
Piotr Sarna	ef3470fa34	view: simplify creating base-dependent info for reads only The code which created base-dependent info for materialized views can be expressed with fewer branches. Also, the constructor which takes a single parameter is made explicit.	2020-10-31 12:22:07 +01:00
Piotr Sarna	71b28d69b3	view: fix typo: s/dependant/dependent	2020-10-31 12:22:07 +01:00
Piotr Sarna	669e2ada92	view: add error logs if a view is out of sync with its base When Scylla finds out that a materialized view contains columns which are not present in the base table (and they are not computed), it now presents comprehensible errors in the log.	2020-10-31 12:22:07 +01:00
Tomasz Grabiec	158ae99c89	Merge 'view info: preserve integrity by allowing base info for reads only and by initializing base info' from Eliran Sinvani This PR's purpose is to handle schema integrity issues that can arise in races involving materialized views. The possibility of such integrity issues was found in #7420, where a view schema was used for reading without its _base_info member initialized, resulting in a segfault. We handle this by doing 3 things: 1. First guard against using an uninitialized base info - this will be considered as an internal error as it will indicate that there is a path in our code that creates a view schema to be used for reads or writes but is not initializing the base info. 2. We allow the base info to be initialized also from a partially matching base (most likely a newer one than the one used to create the view). 3. We fix the suspected path that creates such a view schema to initialize it. (in migration manager) It is worth mentioning that this PR is a workaround to a probable design flaw in our materialized views which requires the base table's information to be retrieved in the first place instead of just being self contained. Refs #7420 Closes #7469 * github.com:scylladb/scylla: materialized views: add a base table reference if missing view info: support partial match between base and view for only reading from view. view info: guard against null dereference of the base info	2020-10-21 16:21:00 +02:00
Eliran Sinvani	70e04c1123	view info: support partial match between base and view for only reading from view. The current implementation of materialized views does not keep the base table version to which a specific version of a materialized view schema corresponds. This complicates things, especially for old view versions that the schema doesn't support anymore. However, the views, being also independent tables, should allow reading from them as long as they exist, even if the base table changed since then. For the reading purpose, we don't need to know the exact composition of view primary key columns that are not part of the base primary key, we only need to know that there are any, and this is a much looser constraint on the schema. We can rely on table invariants such as the fact that pk columns are not going to disappear in a newer version of the table. This means that if we don't find a view column in the base table, it is not a part of the base table primary key. This information is enough for us to perform reads on the view. This commit adds support for being able to rely on such partial information along with a validation that it is not going to be used for writes. If it is, we simply abort since this means that our schema integrity is compromised.	2020-10-21 15:20:43 +03:00
Eliran Sinvani	372051c97d	view info: guard against null dereference of the base info The change's purpose is to guard against segfault that is the result of dereferencing the _base_info member when it is uninitialized. We already know this can happen (#7420). The only purpose of this change is to treat this condition as an internal error, the reason is that it indicates a schema integrity problem. Besides this change, other measures should be taken to ensure that the _base_table member is initialized before calling methods that rely on it. We call the internal_error as a last resort.	2020-10-21 12:12:51 +03:00
Avi Kivity	8e386a5f48	schema_tables: adjust altered_schema construction for clang Clang does not implement P0960R3, parenthesized initialization of aggregates, so we have to use brace initialization in altered_schema. As the parenthesized constructor call is done by emplace_back(), we have to do the braced call ourselves.	2020-10-19 14:57:21 +03:00
Avi Kivity	cb9a9584ac	db: hints/manager: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:24:09 +03:00
Avi Kivity	1986a74cc4	db: commitlog_replayer: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:24:01 +03:00
Tomasz Grabiec	f893516e55	Merge "lwt: store column_mapping's for each table schema version upon a DDL change" from Pavel Solodovnikov This patch introduces a new system table: `system.scylla_table_schema_history`, which is used to keep track of column mappings for obsolete table schema versions (i.e. schema becomes obsolete when it's being changed by means of `CREATE TABLE` or `ALTER TABLE` DDL operations). It is populated automatically when a new schema version is being pulled from a remote in get_schema_definition() at migration_manager.cc and also when schema change is being propagated to system schema tables in do_merge_schema() at schema_tables.cc. The data referring to the most recent table schema version is always present. Other entries are garbage-collected when the corresponding table schema version is obsoleted (they will be updated with a TTL equal to `DEFAULT_GC_GRACE_SECONDS` on `ALTER TABLE`). In case we failed to persist column mapping after a schema change, missing entries will be recreated on node boot. Later, the information from this table is used in `paxos_state::learn` callback in case we have a mismatch between the most recent schema version and the one that is stored inside the `frozen_mutation` for the accepted proposal. Such situation may arise under following circumstances: 1. The previous LWT operation crashed on the "accept" stage, leaving behind a stale accepted proposal, which waits to be repaired. 2. The table affected by LWT operation is being altered, so that schema version is now different. Stored proposal now references obsolete schema. 3. LWT query is retried, so that Scylla tries to repair the unfinished Paxos round and apply the mutation in the learn stage. When such mismatch happens, prior to that patch the stored `frozen_mutation` is able to be applied only if we are lucky enough and column_mapping in the mutation is "compatible" with the new table schema. 
It wouldn't work if, for example, the columns are reordered, or some columns, which are referenced by an LWT query, are dropped. With this patch we try to look up the column mapping for the obsolete schema version, then upgrade the stored mutation using obtained column mapping and apply an upgraded mutation instead. * git@github.com:ManManson/scylla.git feature/table_schema_history_v7: lwt: add column_mapping history persistence tests schema: add equality operator for `column_mapping` class lwt: store column_mapping's for each table schema version upon a DDL change schema_tables: extract `fill_column_info` helper frozen_mutation: introduce `unfreeze_upgrading` method	2020-10-15 20:48:29 +02:00
Pavel Solodovnikov	055fd3d8ad	lwt: store column_mapping's for each table schema version upon a DDL change This patch introduces a new system table: `system.scylla_table_schema_history`, which is used to keep track of column mappings for obsolete table schema versions (i.e. schema becomes obsolete when it's being changed by means of `CREATE TABLE` or `ALTER TABLE` DDL operations). It is populated automatically when a new schema version is being pulled from a remote in get_schema_definition() at migration_manager.cc and also when schema change is being propagated to system schema tables in do_merge_schema() at schema_tables.cc. The data referring to the most recent table schema version is always present. Other entries are garbage-collected when the corresponding table schema version is obsoleted (they will be updated with a TTL equal to `DEFAULT_GC_GRACE_SECONDS` on `ALTER TABLE`). In case we failed to persist column mapping after a schema change, missing entries will be recreated on node boot. Later, the information from this table is used in `paxos_state::learn` callback in case we have a mismatch between the most recent schema version and the one that is stored inside the `frozen_mutation` for the accepted proposal. Such situation may arise under following circumstances: 1. The previous LWT operation crashed on the "accept" stage, leaving behind a stale accepted proposal, which waits to be repaired. 2. The table affected by LWT operation is being altered, so that schema version is now different. Stored proposal now references obsolete schema. 3. LWT query is retried, so that Scylla tries to repair the unfinished Paxos round and apply the mutation in the learn stage. When such mismatch happens, prior to that patch the stored `frozen_mutation` is able to be applied only if we are lucky enough and column_mapping in the mutation is "compatible" with the new table schema. 
It wouldn't work if, for example, the columns are reordered, or some columns, which are referenced by an LWT query, are dropped. With this patch we try to look up the column mapping for the obsolete schema version, then upgrade the stored mutation using obtained column mapping and apply an upgraded mutation instead. In case we don't find a column_mapping we just return an error from the learn stage. Tests: unit(dev, debug), dtests(paxos_tests.py:TestPaxos.schema_mismatch_*_test) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-10-15 19:24:30 +03:00
Nadav Har'El	509a41db04	alternator: change name of Alternator's SSL options When Alternator is enabled over HTTPS - by setting the "alternator_https_port" option - it needs to know some SSL-related options, most importantly where to pick up the certificate and key. Before this patch, we used the "server_encryption_options" option for that. However, this was a mistake: Although it sounds like these are the "server's options", in fact prior to Alternator this option was only used when communicating with other servers - i.e., connections between Scylla nodes. For CQL connections with the client, we used a different option - "client_encryption_options". This patch introduces a third option "alternator_encryption_options", which controls only Alternator's HTTPS server. Making it separate from the existing CQL "client_encryption_options" allows both Alternator and CQL to be active at the same time but with different certificates (if the user so wishes). For backward compatibility, we temporarily continue to allow server_encryption_options to control the Alternator HTTPS server if alternator_encryption_options is not specified. However, this generates a warning in the log, urging the user to switch. This temporary workaround should be removed in a future version. This patch also: 1. fixes the test run code (which has an "--https" option to test over https) to use the new name of the option. 2. Adds documentation of the new option in alternator.md and protocols.md - previously the information on how to control the location of the certificate was missing from these documents. Fixes #7204. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200930123027.213587-1-nyh@scylladb.com>	2020-10-14 18:13:57 +03:00
Avi Kivity	86bbf1763d	Merge "reader concurrency semaphore: dump permit diagnostics on timeout or queue overflow" from Botond " The reader concurrency semaphore timing out or its queue overflowing are fairly common events both in production and in testing. At the same time it is a hard-to-diagnose problem that often has a benign cause (especially during testing), but it is equally possible that it points to something serious. So when this error starts to appear in logs, usually we want to investigate and the investigation is lengthy... either involves looking at metrics or coredumps or both. This patch intends to jumpstart this process by dumping diagnostics on semaphore timeout or queue overflow. The diagnostics are printed to the log with debug level to avoid excessive spamming. They contain a histogram of all the permits associated with the problematic semaphore organized by table, operation and state. Example: DEBUG 2020-10-08 17:05:26,115 [shard 0] reader_concurrency_semaphore - Semaphore _read_concurrency_sem: timed out, dumping permit diagnostics: Permits with state admitted, sorted by memory memory count name 3499M 27 ks.test:data-query 3499M 27 total Permits with state waiting, sorted by count count memory name 1 0B ks.test:drain 7650 0B ks.test:data-query 7651 0B total Permits with state registered, sorted by count count memory name 0 0B total Total: permits: 7678, memory: 3499M This allows determining several things at a glance: * What are the tables involved * What are the operations involved * Where is the memory This can speed up a follow-up investigation greatly, or it can even be enough on its own to determine that the issue is benign. 
Tests: unit(dev, debug) " * 'dump-diagnostics-on-semaphore-timeout/v2' of https://github.com/denesb/scylla: reader_concurrency_semaphore: dump permit diagnostics on timeout or queue overflow utils: add to_hr_size() reader_concurrency_semaphore: link permits into an intrusive list reader_concurrency_semaphore: move expiry_handler::operator()() out-of-line reader_concurrency_semaphore: move constructors out-of-line reader_concurrency_semaphore: add state to permits reader_concurrency_semaphore: name permits querier_cache_test: test_immediate_evict_on_insert: use two permits multishard_combining_reader: reader_lifecycle_policy: add permit param to create_reader() multishard_combining_reader: add permit parameter multishard_combining_reader: shard_reader: use multishard reader's permit	2020-10-13 12:44:23 +03:00
Botond Dénes	ff623e70b3	reader_concurrency_semaphore: name permits Require a schema and an operation name to be given to each permit when created. The schema is that of the table the read is executed against, and the operation name is some name identifying the operation the permit is part of. Ideally this should be different for each site the permit is created at, to be able to discern not only different kinds of reads, but different code paths the read took. As not all reads can be associated with one schema, the schema is allowed to be null. The name will be used for debugging purposes, both for coredump debugging and runtime logging of permit-related diagnostics.	2020-10-13 12:32:13 +03:00
Piotr Dulikowski	77a0f1a153	hints: don't read hint files when it's not allowed to send When there are hint files to be sent and the target endpoint is DOWN, end_point_hints_manager works in the following loop: - It reads the first hint file in the queue, - For each hint in the file it decides that it won't be sent because the target endpoint is DOWN, - After realizing that there are some unsent hints, it decides to retry this operation after sleeping 1 second. This causes the first segment to be wholly read over and over again, with 1 second pauses, until the target endpoint becomes UP or leaves the cluster. This causes unnecessary I/O load in the streaming scheduling group. This patch adds a check which prevents end_point_hints_manager from reading the first hint file at all when it is not allowed to send hints. First observed in #6964 Tests: - unit(dev) - hinted handoff dtests Closes #7407	2020-10-12 19:09:57 +03:00
Botond Dénes	dd372c8457	flat_mutation_reader: de-virtualize buffer_size() The main user of this method, the one which required this method to return the collective buffer size of the entire reader tree, is now gone. The remaining two users just use it to check the size of the reader instance they are working with. So de-virtualize this method and reduce its responsibility to just returning the buffer size of the current reader instance.	2020-10-06 08:22:56 +03:00
Avi Kivity	fd1dd0eac7	Merge "Track the memory consumption of reader buffers" from Botond " The last major untracked area of the reader pipeline is the reader buffers. These scale with the number of readers as well as with the size and shape of data, so their memory consumption is unpredictable and varies wildly. For example many small rows will trigger larger buffers allocated within the `circular_buffer<mutation_fragment>`, while few larger rows will consume a lot of external memory. This series covers this area by tracking the memory consumption of both the buffer and its content. This is achieved by passing a tracking allocator to `circular_buffer<mutation_fragment>` so that each allocation it makes is tracked. Additionally, we now track the memory consumption of each and every mutation fragment through its whole lifetime. Initially I contemplated just tracking the `_buffer_size` of `flat_mutation_reader::impl`, but concluded that as our reader trees are typically quite deep, this would result in a lot of unnecessary `signal()`/`consume()` calls, which scale with the number of mutation fragments and hence add to the already considerable per mutation fragment overhead. The solution chosen in this series is to instead track the memory consumption of the individual mutation fragments, with the observation that these are typically always moved and very rarely copied, so the number of `signal()`/`consume()` calls will be minimal. This additional tracking introduces an interesting dilemma however: readers will now have significant memory on their account even before being admitted. So it may happen that they can prevent their own admission via this memory consumption. To prevent this, memory consumption is only forwarded to the semaphore upon admission. This might be solved when the semaphore is moved to the front -- before the cache. 
Another consequence of this additional, more complete tracking is that evictable readers now consume memory even when the underlying reader is evicted. So it may happen that even though no reader is currently admitted, all memory is consumed from the semaphore. To prevent any such deadlocks, the semaphore now admits a reader unconditionally if no reader is admitted -- that is, if all count resources are available. Refs: #4176 Tests: unit(dev, debug, release) " * 'track-reader-buffers/v2' of https://github.com/denesb/scylla: (37 commits) test/manual/sstable_scan_footprint_test: run test body in statement sched group test/manual/sstable_scan_footprint_test: move test main code into separate function test/manual/sstable_scan_footprint_test: sprinkle some thread::maybe_yield():s test/manual/sstable_scan_footprint_test: make clustering row size configurable test/manual/sstable_scan_footprint_test: document sstable related command line arguments mutation_fragment_test: add exception safety test for mutation_fragment::mutate_as_*() test: simple_schema: add make_static_row() reader_permit: reader_resources: add operator== mutation_fragment: memory_usage(): remove unused schema parameter mutation_fragment: track memory usage through the reader_permit reader_permit: resource_units: add permit() and resources() accessors mutation_fragment: add schema and permit partition_snapshot_row_cursor: row(): return clustering_row instead of mutation_fragment mutation_fragment: remove as_mutable_end_of_partition() mutation_fragment: s/as_mutable_partition_start/mutate_as_partition_start/ mutation_fragment: s/as_mutable_range_tombstone/mutate_as_range_tombstone/ mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/ mutation_fragment: s/as_mutable_static_row/mutation_as_static_row/ flat_mutation_reader: make _buffer a tracked buffer mutation_reader: extract the two fill_buffer_result into a single one ...	2020-09-29 16:08:16 +03:00
Eliran Sinvani	925cdc9ae1	consistency level: fix wrong quorum calculation when RF = 0 We used to calculate the number of endpoints for quorum and local_quorum unconditionally as ((rf / 2) + 1). This formula doesn't take into account the corner case where RF = 0; in this situation quorum should also be 0. This commit adds the missing corner case. Tests: Unit Tests (dev) Fixes #6905 Closes #7296	2020-09-29 13:25:41 +03:00
Piotr Sarna	4b856cf62d	transport: make max_concurrent_requests_per_shard reloadable This configuration entry is expected to be used as a quick fix for an overloaded node, so it should be possible to reload this value without having to restart the server.	2020-09-29 10:11:36 +02:00
Piotr Sarna	b4db6d2598	transport,config: add a param for max request concurrency The newly introduced parameter - max_concurrent_requests_per_shard - can be used to limit the number of in-flight requests a single coordinator shard can handle. Each surplus request will be immediately refused by returning OverloadedException error to the client. The default value for this parameter is large enough to never actually shed any requests. Currently, the limit is only applied to CQL requests - other frontends like alternator and redis are not throttled yet.	2020-09-29 09:59:30 +02:00
Botond Dénes	6ca0464af5	mutation_fragment: add schema and permit We want to start tracking the memory consumption of mutation fragments. For this we need schema and permit during construction, and on each modification, so the memory consumption can be recalculated and passed to the permit. In this patch we just add the new parameters and go through the insane churn of updating all call sites. They will be used in the next patch.	2020-09-28 11:27:23 +03:00
Botond Dénes	4f5ccf82cb	mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/ We will soon want to update the memory consumption of mutation fragment after each modification done to it, to do that safely we have to forbid direct access to the underlying data and instead have callers pass a lambda doing their modifications. Uses where this method was just used to move the fragment away are converted to use `as_clustering_row() &&`.	2020-09-28 10:53:56 +03:00
Botond Dénes	3fab83b3a1	flat_mutation_reader: impl: add reader_permit parameter Not used yet, this patch does all the churn of propagating a permit to each impl. In the next patch we will use it to track the memory consumption of `_buffer`.	2020-09-28 10:53:48 +03:00
Piotr Dulikowski	39771967bb	hinted handoff: fix race - decommission vs. endpoint mgr init This patch fixes a race between two methods in hints manager: drain_for and store_hint. The first method is called when a node leaves the cluster, and it 'drains' end point hints manager for that node (sends out all hints for that node). If this method is called when the local node is being decommissioned or removed, it instead drains hints managers for all endpoints. In the case of decommission/remove, drain_for first calls parallel_for_each on all current ep managers and tells them to drain their hints. Then, after all of them complete, _ep_managers.clear() is called. End point hints managers are created lazily and inserted into _ep_managers map the first time a hint is stored for that node. If this happens between parallel_for_each and _ep_managers.clear() described above, the clear operation will destroy the new ep manager without draining it first. This is a bug and will trigger an assert in ep manager's destructor. To solve this, a new flag for the hints manager is added which is set when it drains all ep managers on removenode/decommission, and prevents further hints from being written. Fixes #7257 Closes #7278	2020-09-24 14:51:24 +03:00
Avi Kivity	844b675520	view: view_update_generator: drop references to sstables when stopping sstable_manager will soon wait for all sstables under its control to be deleted (if so marked), but that can't happen if someone is holding on to references to those sstables. To allow sstables_manager::stop() to work, drop remaining queued work when terminating.	2020-09-23 20:55:02 +03:00
Avi Kivity	a0ffcabd66	view: use nonwrapping_interval instead of nonwrapping_range to avoid clang deduction failure We use class template argument deduction (CTAD) in a few places, but it appears not to work for alias templates in clang. While it looks like a clang bug, using the class name is an improvement, so let's do that.	2020-09-21 16:32:53 +03:00
Pavel Solodovnikov	6e10f2b530	schema_registry: make grace period configurable Introduce new database config option `schema_registry_grace_period` describing the amount of time in seconds after which unused schema versions will be cleaned up from the schema registry cache. Default value is 1 second, the same value as was hardcoded before. Tests: unit(debug) Refs: #7225 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200915131957.446455-1-pa.solodovnikov@scylladb.com>	2020-09-15 17:53:27 +02:00


4972 Commits
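
The quorum fix described in commit 925cdc9ae1 above can be illustrated with a minimal sketch. The helper name `quorum_for` is hypothetical and is not taken from the ScyllaDB source; it only shows the formula from the commit message, `(rf / 2) + 1`, with the RF = 0 corner case handled separately.

```cpp
#include <cstddef>

// Hypothetical helper for illustration only; the real ScyllaDB function
// may have a different name and signature.
// Returns the number of replicas needed for a quorum at replication factor rf.
constexpr std::size_t quorum_for(std::size_t rf) {
    // Without the special case, rf == 0 would yield (0 / 2) + 1 == 1,
    // i.e. a quorum of one replica in a keyspace that has none.
    return rf == 0 ? 0 : (rf / 2) + 1;
}

// Compile-time checks of the formula for a few replication factors.
static_assert(quorum_for(0) == 0, "RF = 0 needs no replicas");
static_assert(quorum_for(1) == 1, "RF = 1 needs the single replica");
static_assert(quorum_for(3) == 2, "RF = 3 needs a majority of two");
```

Because the function is `constexpr`, the corner case is verified at compile time by the `static_assert` lines; at runtime it can be called like any other function, for example `quorum_for(5)`.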