scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 19:35:12 +00:00

Author	SHA1	Message	Date
Botond Dénes	b247f29881	Merge 'De-static system_keyspace::get_{saved|local}_tokens()' from Pavel Emelyanov Yet another user of global qctx object. Making the method(s) non-static requires pushing the system_keyspace all the way down to size_estimate_virtual_reader and a small update of the cql_test_env Closes #11738 * github.com:scylladb/scylladb: system_keyspace: Make get_{local|saved}_tokens non static size_estimates_virtual_reader: Pass sys_ks argument to get_local_ranges() cql_test_env: Keep sharded<system_keyspace> reference size_estimate_virtual_reader: Keep system_keyspace reference system_keyspace: Pass sys_ks argument to install_virtual_readers() system_keyspace: Make make() non-static distributed_loader: Pass sys_ks argument to init_system_keyspace() system_keyspace: Remove dangling forward declaration	2022-10-07 11:28:32 +03:00
Avi Kivity	20bad62562	Merge 'Detect and record large collections' from Benny Halevy This series adds support for detecting collections that have too many items and recording them in `system.large_cells`. A configuration variable was added to db/config: `compaction_collection_elements_count_warning_threshold` set by default to 10000. Collections that have more items than this threshold will be warned about and will be recorded as a large cell in the `system.large_cells` table. Documentation has been updated accordingly. A new column was added to system.large_cells: `collection_elements`. Similar to the `rows` column in system.large_partitions, `collection_elements` holds the number of items in a collection when the large cell is a collection, or 0 if it isn't. Note that the collection may be recorded in system.large_cells either due to its size, like any other cell, and/or due to the number of items in it, if it crosses the said threshold. Note that #11449 called for a new system.large_collections table, but extending system.large_cells follows the logic of system.large_partitions and is a smaller change overall, hence it was preferred. Since the system keyspace schema is hard coded, the schema version of system.large_cells was bumped, and since the change is not backward compatible, we added a cluster feature - `LARGE_COLLECTION_DETECTION` - to enable using it. The large_data_handler large cell detection record function will populate the new column only when the new cluster feature is enabled. In addition, unit tests were added in sstable_3_x_test for testing large cells detection by cell size, and large_collection detection by the number of items. 
Closes #11449 Closes #11674 * github.com:scylladb/scylladb: sstables: mx/writer: optimize large data stats members order sstables: mx/writer: keep large data stats entry as members db: large_data_handler: dynamically update config thresholds utils/updateable_value: add transforming_value_updater db/large_data_handler: cql_table_large_data_handler: record large_collections db/large_data_handler: pass ref to feature_service to cql_table_large_data_handler db/large_data_handler: cql_table_large_data_handler: move ctor out of line docs: large-rows-large-cells-tables: fix typos db/system_keyspace: add collection_elements column to system.large_cells gms/feature_service: add large_collection_detection cluster feature test: sstable_3_x_test: add test_sstable_too_many_collection_elements test: lib: simple_schema: add support for optional collection column test: lib: simple_schema: build schema in ctor body test: lib: simple_schema: cql: define s1 as static only if built this way db/large_data_handler: maybe_record_large_cells: consider collection_elements db/large_data_handler: debug cql_table_large_data_handler::delete_large_data_entries sstables: mx/writer: pass collection_elements to writer::maybe_record_large_cells sstables: mx/writer: add large_data_type::elements_in_collection db/large_data_handler: get the collection_elements_count_threshold db/config: add compaction_collection_elements_count_warning_threshold test: sstable_3_x_test: add test_sstable_write_large_cell test: sstable_3_x_test: pass cell_threshold_bytes to large_data_handler test: sstable_3_x_test: large_data_handler: prepare callback for testing large_cells test: sstable_3_x_test: large_data tests: use BOOST_REQUIRE_[GL]T test: sstable_3_x_test: test_sstable_log_too_many_rows: use tests::random	2022-10-06 18:28:21 +03:00
Pavel Emelyanov	59da903054	system_keyspace: Make get_{local|saved}_tokens non static Now all callers have system_keyspace reference at hand. This removes one more user of the global qctx object Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-06 18:02:09 +03:00
Pavel Emelyanov	b03f1e7b17	size_estimates_virtual_reader: Pass sys_ks argument to get_local_ranges() This method static calls system_keyspace::get_local_tokens(). Having the system_keyspace reference will make this method non-static Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-06 18:00:09 +03:00
Pavel Emelyanov	34e8e5959f	size_estimate_virtual_reader: Keep system_keyspace reference The s._e._v._reader::fill_buffer() method needs system keyspace to get node's local tokens. Now it's a static method, having system_keyspace reference will make it non-static Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-06 17:58:07 +03:00
Pavel Emelyanov	04552f2d58	system_keyspace: Pass sys_ks argument to install_virtual_readers() The size-estimate-virtual-reader will need it, now it's available as "this" from system_keyspace::make() method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-06 17:57:13 +03:00
Pavel Emelyanov	1938412d7a	system_keyspace: Make make() non-static This helper needs system_keyspace reference and using "this" as this looks natural. Also this de-static-ification makes it possible to put some sense into the invoke_on_all() call from init_system_keyspace() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-06 17:56:11 +03:00
Pavel Emelyanov	e996503f0d	system_keyspace: Remove dangling forward declaration It doesn't match the real system_keyspace_make() definition and is in fact not needed, as there's another "real" one in database.hh Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-06 17:54:22 +03:00
Benny Halevy	2c4ff71d2b	db: large_data_handler: dynamically update config thresholds make the various large data thresholds live-updateable and construct the observers and updaters in cql_table_large_data_handler to dynamically update the base large_data_handler class threshold members. Fixes #11685 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-05 10:53:40 +03:00
Avi Kivity	37c6b46d26	dirty_memory_manager: re-term "virtual dirty" to "unspooled dirty" The "virtual dirty" term is not very informative. "Virtual" means "not real", but it doesn't say in which way it isn't real. In this case, virtual dirty refers to real dirty memory, minus the portion of memtables that has been written to disk (but not yet sealed - in that case it would not be dirty in the first place). I chose to call "the portion of memtables that has been written to disk" as "spooled memory". At least the unique term will cause people to look it up and may be easier to remember. From that we have "unspooled memory". I plan to further change the accounting to account for spooled memory rather than unspooled, as that is a more natural term, but that is left for later. The documentation, config item, and metrics are adjusted. The config item is practically unused so it isn't worth keeping compatibility here.	2022-10-04 14:03:59 +03:00
Benny Halevy	46ebffcc93	db/large_data_handler: cql_table_large_data_handler: record large_collections When the large_collection_detection cluster feature is enabled, select the internal_record_large_cells_and_collections method to record the large collection cell, storing also the collection_elements column. We want to do that only when the cluster feature is enabled to facilitate rollback in case rolling upgrade is aborted, otherwise system.large_cells won't be backward compatible and will have to be deleted manually. Delete the sstable from system.large_cells if it contains elements_in_collection above threshold. Closes #11449 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:10 +03:00
Benny Halevy	3f8bba202f	db/large_data_handler: pass ref to feature_service to cql_table_large_data_handler For recording collection_elements of large_collections when the large_collection_detection feature is enabled. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:10 +03:00
Benny Halevy	dc4e7d8e01	db/large_data_handler: cql_table_large_data_handler: move ctor out of line Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:09 +03:00
Benny Halevy	2f49eebb04	db/system_keyspace: add collection_elements column to system.large_cells And bump the schema version offset since the new schema should be distinguishable from the previous one. Refs scylladb/scylladb#11660 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:08 +03:00
Benny Halevy	6dadca2648	db/large_data_handler: maybe_record_large_cells: consider collection_elements Detect large_collections when the number of collection_elements is above the configured threshold. Next step would be to record the number of collection_elements in the system.large_cells table, when the respective cluster feature is enabled. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:05 +03:00
Benny Halevy	27ee75c54e	db/large_data_handler: debug cql_table_large_data_handler::delete_large_data_entries Log in debug level when deleting large data entry from system table. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:04 +03:00
Benny Halevy	a107f583fd	db/large_data_handler: get the collection_elements_count_threshold Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:31:11 +03:00
Benny Halevy	167ec84eeb	db/config: add compaction_collection_elements_count_warning_threshold Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:31:10 +03:00
Botond Dénes	5621cdd7f9	db/view/view_builder: don't drop partition and range tombstones when resuming The view builder builds the views from a given base table in view_builder::batch_size batches of rows. After processing this many rows, it suspends so the view builder can switch to building views for other base tables in the name of fairness. When resuming the build step for a given base table, it reuses the reader used previously (also serving the role of a snapshot, pinning sstables read from). The compactor however is created anew. As the reader can be in the middle of a partition, the view builder injects a partition start into the compactor to prime it for continuing the partition. This however only included the partition-key, crucially missing any active tombstones: partition tombstone or -- since the v2 transition -- active range tombstone. This can result in base rows covered by either of these being resurrected and the view builder generating view updates for them. This patch solves this by using the detach-state mechanism of the compactor which was explicitly developed for situations like this (in the range scan code) -- resuming a read with the readers kept but the compactor recreated. Also included are two test cases reproducing the problem, one with a range tombstone, the other with a partition tombstone. Fixes: #11668 Closes #11671	2022-10-03 11:28:22 +03:00
Botond Dénes	ad04f200d3	Merge 'database: automatically take snapshot of base table views' from Benny Halevy The logic to reject explicit snapshot of views/indexes was improved in `aa127a2dbb`. However, we never implemented auto-snapshot of view/indexes when taking a snapshot of the base table. This is implemented in this patch. The implementation is built on top of `ba42852b0e` so it would be hard to backport to 5.1 or earlier releases. Fixes #11612 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11616 * github.com:scylladb/scylladb: database: automatically take snapshot of base table views api: storage_service: reject snapshot of views in api layer	2022-09-29 13:33:31 +03:00
Benny Halevy	d32c497cd9	database: automatically take snapshot of base table views The logic to reject explicit snapshot of views/indexes was improved in `aa127a2dbb`. However, we never implemented auto-snapshot of view/indexes when taking a snapshot of the base table. This is implemented in this patch. The implementation is built on top of `ba42852b0e` so it would be hard to backport to 5.1 or earlier releases. Fixes #11612 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 11:02:54 +03:00
Benny Halevy	55b0b8fe2c	api: storage_service: reject snapshot of views in api layer Rather than pushing the check to `snapshot_ctl::take_column_family_snapshot`, just check that explicitly when taking a snapshot of a particular table by name over the api. Other paths that call snapshot_ctl::take_column_family_snapshot are internal and use it to snap views already. With that, we can get rid of the allow_view_snapshots flag that was introduced in `aab4cd850c`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 10:44:56 +03:00
Benny Halevy	fcbbc3eb9c	db/large_data_handler: print static cell/collection description in log warning When warning about a large cell/collection in a static row, print that fact in the log warning to make it clearer. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-25 14:37:42 +03:00
Benny Halevy	4670829502	db/large_data_handler: separate pk and ck strings in log warning with delimiter Currently (since `f3089bf3d1`), when printing a warning to the log about large rows and/or cells the clustering key string is concatenated to the partition key string, rendering the warning confusing and much less useful. This patch adds a '/' delimiter to separate the fields, and also uses one to separate the clustering key from the column name for large cells. In case of a static cell, the clustering key is null hence the warning will look like: `pk//column`. This patch does NOT change anything in the large_* system table schema or contents. It changes only the log warning format that need not be backward compatible. Fixes #11620 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-25 14:36:41 +03:00
Botond Dénes	b9d55ee02f	Merge 'Add cassandra functional - show warn/err when tombstone threshold reached.' from Taras Borodin Add cassandra functional - show warn/err when tombstone_warn_threshold/tombstone_failure_threshold reached on select, by partitions. Propagate raw query_string from coordinator to replicas. Closes #11356 * github.com:scylladb/scylladb: add utf8:validate to operator<< partition_key with_schema. Show warn message if `tombstone_warn_threshold` reached on querier.	2022-09-23 05:53:47 +03:00
Tomasz Grabiec	ccbfe2ef0d	Merge 'Fix invalid mutation fragment stream issues' from Botond Dénes Found by a fragment stream validator added to the mutation-compactor (https://github.com/scylladb/scylladb/pull/11532). As that PR moves very slowly, the fixes for the issues found are split out into a PR of their own. The first two of these issues seem benign, but it is important to remember that how benign an invalid fragment stream is depends entirely on the consumer of said stream. The present consumer of said streams may swallow the invalid stream without problem now but any future change may cause it to enter into a corrupt state. The last one is a non-benign problem (again because the consumer reacts badly already) causing problems when building query results for range scans. Closes #11604 * github.com:scylladb/scylladb: shard_reader: do_fill_buffer(): only update _end_of_stream after buffer is copied readers/mutation_readers: compacting_reader: remember injected partition-end db/view: view_builder::execute(): only inject partition-start if needed	2022-09-22 17:57:27 +02:00
TarasBor	1f4a93da78	Show warn message if `tombstone_warn_threshold` reached on querier. When the querier reads a page with more tombstones than the `tombstone_warn_threshold` limit, a warning message appears in the logs. If `tombstone_warn_threshold` is 0, the feature is disabled. Refs scylladb#11410	2022-09-22 16:42:31 +03:00
Botond Dénes	681e6ae77f	db/view: view_builder::execute(): only inject partition-start if needed When resuming a build-step, the view builder injects the partition-start fragment of the last processed partition, to bring the consumer (compactor) into the correct state before it starts to consume the remainder of the partition content. This results in an invalid fragment stream when the partition was actually over or there is nothing left for the build step. Make the inject conditional on when the reader contains more data for the partition. Fixes: #11607	2022-09-22 13:54:36 +03:00
Piotr Sarna	481240b8b4	Merge 'Alternator: Run more TTL tests by default (and add a test for metrics)' from Nadav Har'El We had quite a few tests for Alternator TTL in test/alternator, but most of them did not run as part of the usual Jenkins test suite, because they were considered "very slow" (and require a special "--runveryslow" flag to run). In this series we enable six tests which run quickly enough to run by default, without an additional flag. We also make them even quicker - the six tests now take around 2.5 seconds. I also noticed that we don't have a test for the Alternator TTL metrics - and added one. Fixes #11374. Refs https://github.com/scylladb/scylla-monitoring/issues/1783 Closes #11384 * github.com:scylladb/scylladb: test/alternator: insert test names into Scylla logs rest api: add a new /system/log operation alternator ttl: log warning if scan took too long. alternator,ttl: allow sub-second TTL scanning period, for tests test/alternator: skip fewer Alternator TTL tests test/alternator: test Alternator TTL metrics	2022-09-22 09:47:50 +02:00
Kamil Braun	595472ac59	Merge 'Don't use qctx in CDC tables querying' from Pavel Emelyanov There's a bunch of helpers for CDC gen service in db/system_keyspace.cc. All are static and use global qctx to make queries. Fortunately, both callers -- storage_service and cdc_generation_service -- already have local system_keyspace references and can call the methods via it, thus reducing the global qctx usage. Closes #11557 * github.com:scylladb/scylladb: system_keyspace: De-static get_cdc_generation_id() system_keyspace: De-static cdc_is_rewritten() system_keyspace: De-static cdc_set_rewritten() system_keyspace: De-static update_cdc_generation_id()	2022-09-16 11:52:01 +02:00
Pavel Emelyanov	e221bb0112	system_keyspace: De-static get_cdc_generation_id() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-16 08:34:15 +03:00
Pavel Emelyanov	4f67898e7b	system_keyspace: De-static cdc_is_rewritten() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-15 18:44:59 +03:00
Pavel Emelyanov	736021ee98	system_keyspace: De-static cdc_set_rewritten() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-15 18:44:53 +03:00
Pavel Emelyanov	b3d139bbdb	system_keyspace: De-static update_cdc_generation_id() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-15 18:44:40 +03:00
Michał Chojnowski	cdb3e71045	sstables: add a flag for disabling long-term index caching Long-term index caching in the global cache, as introduced in 4.6, is a major pessimization for workloads where accesses to the index are (spatially) sparse. We want to have a way to disable it for the affected workloads. There is already infrastructure in place for disabling it for BYPASS CACHE queries. One way of solving the issue is hijacking that infrastructure. This patch adds a global flag (and a corresponding CLI option) which controls index caching. Setting the flag to `false` causes all index reads to behave like they would in BYPASS CACHE queries. Consequences of this choice: - The per-SSTable partition_index_cache is unused. Every index_reader has its own, and they die together. Independent reads can no longer reuse the work of other reads which hit the same index pages. This is not crucial, since partition accesses have no (natural) spatial locality. Note that the original reason for partition_index_cache -- the ability to share reads for the lower and upper bound of the query -- is unaffected. - The per-SSTable cached_file is unused. Every index_reader has its own (uncached) input stream from the index file, and every bsearch_clustered_cursor has its own cached_file, which dies together with the cursor. Note that the cursor still can perform its binary search with caching. However, it won't be able to reuse the file pages read by index_reader. In particular, if the promoted index is small, and fits inside the same file page as its index_entry, that page will be re-read. It can also happen that index_reader will read the same index file page multiple times. When the summary is so dense that multiple index pages fit in one index file page, advancing the upper bound, which reads the next index page, will read the same index file page. Since summary:disk ratio is 1:2000, this is expected to happen for partitions with size greater than 2000 partition keys. 
Fixes #11202	2022-09-15 17:16:26 +03:00
Raphael S. Carvalho	0a8afe18ca	cql: Reject create and alter table with DateTieredCompactionStrategy It's been ~1 year (`2bf47c902e`) since we set restrict_dtcs config option to WARN, meaning users have been warned about the deprecation process of DTCS. Let's set the config to TRUE, meaning that create and alter statements specifying DTCS will be rejected at the CQL level. Existing tables will still be supported. But the next step will be about throwing DTCS code into the shadow realm, and after that, Scylla will automatically fall back to STCS (or ICS) for users who ignored the deprecation process. Refs #8914. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #11458	2022-09-15 11:46:18 +03:00
Michał Chojnowski	9b6fc553b4	db: commitlog: don't print INFO logs on shutdown The intention was for these logs to be printed during the database shutdown sequence, but it was overlooked that it's not the only place where commitlog::shutdown is called. Commitlogs are started and shut down periodically by hinted handoff. When that happens, these messages spam the log. Fix that by adding INFO commitlog shutdown logs to database::stop, and change the level of the commitlog::shutdown log call to DEBUG. Fixes #11508 Closes #11536	2022-09-14 11:30:53 +03:00
Nadav Har'El	8ece63c433	Merge 'Safemode - Introduce TimeWindowCompactionStrategy Guardrails' This series introduces two configurable options when working with TWCS tables: - `restrict_twcs_without_default_ttl` - a LiveUpdate-able tri_mode_restriction which defaults to WARN and will notify the user whenever a TWCS table is created without a `default_time_to_live` setting - `twcs_max_window_count` - which forbids the user from creating TWCS tables whose window count (buckets) is past a certain threshold. We default to 50, which should be enough for most use cases, and a setting of 0 effectively disables the check. Refs: #6923 Fixes: #9029 Closes #11445 * github.com:scylladb/scylladb: tests: cql_query_test: add mixed tests for verifying TWCS guard rails tests: cql_query_test: add test for TWCS window size tests: cql_query_test: add test for TWCS tables with no TTL defined cql: add configurable restriction of default_time_to_live when for TimeWindowCompactionStrategy tables cql: add max window restriction for TimeWindowCompactionStrategy time_window_compaction_strategy: reject invalid window_sizes cql3 - create/alter_table_statement: Make check_restricted_table_properties accept a schema_ptr	2022-09-12 23:55:51 +03:00
Kamil Braun	2fe3e67a47	gms: feature_service: don't distinguish between 'known' and 'supported' features `feature_service` provided two sets of features: `known_feature_set` and `supported_feature_set`. The purpose of both and the distinction between them was unclear and undocumented. The 'supported' features were gossiped by every node. Once a feature is supported by every node in the cluster, it becomes 'enabled'. This means that whatever piece of functionality is covered by the feature, it can be used by the cluster from now on. The 'known' set was used to perform feature checks on node start; if the node saw that a feature is enabled in the cluster, but the node does not 'know' the feature, it would refuse to start. However, if the feature was 'known', but wasn't 'supported', the node would not complain. This means that we could in theory allow the following scenario: 1. all nodes support feature X. 2. X becomes enabled in the cluster. 3. the user changes the configuration of some node so feature X will become unsupported but still known. 4. The node restarts without error. So now we have a feature X which is enabled in the cluster, but not every node supports it. That does not make sense. It is not clear whether it was accidental or purposeful that we used the 'known' set instead of the 'supported' set to perform the feature check. What I think is clear, is that having two sets makes the entire thing unnecessarily complicated and hard to think about. Fortunately, at the base to which this patch is applied, the sets are always the same. So we can easily get rid of one of them. I decided that the name which should stay is 'supported', I think it's more specific than 'known' and it matches the name of the corresponding gossiper application state. Closes #11512	2022-09-12 13:09:12 +03:00
Nadav Har'El	e7e9adc519	alternator,ttl: allow sub-second TTL scanning period, for tests Alternator has the "alternator_ttl_period_in_seconds" parameter for controlling how often the expiration thread looks for expired items to delete. It is usually a very large number of seconds, but for tests to finish quickly, we set it to 1 second. With 1 second expiration latency, test/alternator/test_ttl.py took 5 seconds to run. In this patch, we change the parameter to allow a floating-point number of seconds instead of just an integer. Then, this allows us to halve the TTL period used by tests to 0.5 seconds, and as a result, the run time of test_ttl.py halves to 2.5 seconds. I think this is fast enough for now. I verified that even if I change the period to 0.1, there is no noticeable slowdown to other Alternator tests, so 0.5 is definitely safe. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-12 10:32:56 +03:00
Pavel Emelyanov	f3dfc9dbd4	system_keyspace: Don't load preferred IPs if not asked for If snitch->prefer_local() is false, advertised (via gossiper) INTERNAL_IPs are not suggested to messaging service to use. The same should apply to boot-time when messaging service is loaded with those IPs taken from the system.peers table. fixes: #11353 tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/2172/ Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220909144800.23122-1-xemul@scylladb.com>	2022-09-12 09:48:23 +03:00
Botond Dénes	5374f0edbf	Merge 'Task manager' from Aleksandra Martyniuk Task manager for observing and managing long-running, asynchronous tasks in Scylla with the interface for the user. It will allow listing of tasks, getting detailed task status and progression, waiting for their completion, and aborting them. The task manager will be configured with a “task ttl” that determines how long the task status is kept in memory after the task completes. At first it will support repair and compaction tasks, and possibly more in the future. Currently: Sharded `task_manager` is started in `main.cc` where it is further passed to `http_context` for the purpose of user interface. Task manager's tasks are implemented in two layers: the abstract and the implementation one. The latter is a pure virtual class which needs to be overridden by each module. Abstract layer provides the methods that are shared by all modules and the access to module-specific methods. Each module can access task manager, create and manage its tasks through `task_manager::module` object. This way data specific to a module can be separated from the other modules. User can access task manager rest api interface to track asynchronous tasks. The available options consist of: - getting a list of modules - getting a list of basic stats of all tasks in the requested module - getting the detailed status of the requested task - aborting the requested task - waiting for the requested task to finish To enable testing of the provided api, test specific task implementation and module are provided. Their lifetime can be simulated with the standalone test api. These components are compiled and the tests are run in all but release build modes. 
Fixes: #9809 Closes #11216 * github.com:scylladb/scylladb: test: task manager api test task_manager: test api layer implementation task_manager: add test specific classes task_manager: test api layer task_manager: api layer implementation task_manager: api layer task_manager: keep task_manager reference in http_context start sharded task manager task_manager: create task manager object	2022-09-12 09:26:46 +03:00
Felipe Mendes	7fec4fcaa6	cql: add configurable restriction of default_time_to_live when for TimeWindowCompactionStrategy tables TimeWindowCompactionStrategy (TWCS) tables are known for being used explicitly for time-series workloads. In particular, most of the time users should specify a default_time_to_live during table creation to ensure data is expired such as in a sliding window. Failure to do so may create unbounded windows - which - depending on the compaction window chosen, may introduce severe latency and operational problems, due to unbounded window growth. However, there may be some use cases which explicitly ingest data by using the `USING TTL` keyword, which effectively has the same effect. Therefore, we cannot simply forbid table creations without a default_time_to_live explicitly set to any value other than 0. The new restrict_twcs_without_default_ttl option has three values: "true", "false", and "warn": We default to "warn", which will notify the user of the consequences when creating a TWCS table without a default_time_to_live value set. However, users are encouraged to switch it to "true", as - ideally - a default_time_to_live value should always be expected to prevent applications failing to ingest data against the database omitting the `USING TTL` keyword.	2022-09-11 16:50:42 -03:00
Felipe Mendes	a3356e866b	cql: add max window restriction for TimeWindowCompactionStrategy The number of potential compaction windows (or buckets) is defined by the default_time_to_live / sstable_window_size ratio. Every now and then we end up in a situation where users of TWCS underestimate the number of window buckets they create. Unfortunately, scenarios in which one employs a default_time_to_live setting of 1 year but a window size of 30 minutes are not rare enough. Such a configuration is known to only do harm to a workload: As more and more windows are created, the number of SSTables will grow at the same pace, and the situation will only get worse as the number of shards increases. This commit introduces the twcs_max_window_count option, which defaults to 50, and will forbid the creation or alteration of tables which get past this threshold. A value of 0 will explicitly skip this check. Note: this option does not forbid the creation of tables with a default_time_to_live=0 as - even though not recommended - it is perfectly possible for a TWCS table with default TTL=0 to have a bound window, provided any ingestion statements make use of 'USING TTL' within the CQL statement, in addition to it.	2022-09-11 16:50:22 -03:00
Kamil Braun	dba595d347	Merge 'Minimal implementation of Broadcast Tables' from Mikołaj Grzebieluch Broadcast tables are tables for which all statements are strongly consistent (linearizable), replicated to every node in the cluster and available as long as a majority of the cluster is available. If a user wants to store a “small” volume of metadata that is not modified “too often” but provides high resiliency against failures and strong consistency of operations, they can use broadcast tables. The main goal of the broadcast tables project is to solve problems which need to be solved when we eventually implement general-purpose strongly consistent tables: designing the data structure for the Raft command, ensuring that the commands are idempotent, handling snapshots correctly, and so on. In this MVP (Minimum Viable Product), statements are limited to simple SELECT and UPDATE operations on the built-in table. In the future, other statements and data types will be available but with this PR we can already work on features like idempotent commands or snapshotting. Snapshotting is not handled yet which means that restarting a node or performing too many operations (which would cause a snapshot to be created) will give incorrect results. In a follow-up, we plan to add end-to-end Jepsen tests (https://jepsen.io/). With this PR we can already simulate operations on lists and test linearizability in linear complexity. This can also test Scylla's implementation of persistent storage, failure detector, RPC, etc. 
Design doc: https://docs.google.com/document/d/1m1IW320hXtsGulzSTSHXkfcBKaG5UlsxOpm6LN7vWOc/edit?usp=sharing Closes #11164 * github.com:scylladb/scylladb: raft: broadcast_tables: add broadcast_kv_store test raft: broadcast_tables: add returning query result raft: broadcast_tables: add execution of intermediate language raft: broadcast_tables: add compilation of cql to intermediate language raft: broadcast_tables: add definition of intermediate language db: system_keyspace: add broadcast_kv_store table db: config: add BROADCAST_TABLES feature flag	2022-09-09 18:05:37 +02:00
Aleksandra Martyniuk	2439e55974	task_manager: create task manager object Implementation of a task manager that allows tracking and managing asynchronous tasks. Tasks are represented by the task_manager::task class, which provides members common to all types of tasks. Methods that differ among tasks of different modules can be overridden in a class inheriting from the task_manager::task::impl class. Each task stores its status, which contains parameters such as id, sequence number, begin and end time, state, etc. After a task finishes, it is kept in memory for a configurable time or until it is unregistered. Tasks need to be created with the make_task method. Each module is represented by the task_manager::module type and should have access to the task manager through task_manager::module methods. This makes it easy to separate and collectively manage the data belonging to each module.	2022-09-09 14:29:28 +02:00
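The structure described in this commit message (a manager, per-module task collections, and a per-task status) can be sketched roughly as follows. This is a simplified Python illustration, not the C++ task_manager API: every class, method, and field name here is hypothetical except for the concepts named in the message (module, task, status with id, sequence number, begin and end time, state, and creation through make_task). The configurable retention time is left out.

```python
import itertools
import time
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Status:
    """Members common to all tasks, as listed in the commit message."""
    id: uuid.UUID
    sequence_number: int
    state: State = State.CREATED
    begin_time: Optional[float] = None
    end_time: Optional[float] = None


class Task:
    def __init__(self, status: Status) -> None:
        self.status = status

    def start(self) -> None:
        self.status.state = State.RUNNING
        self.status.begin_time = time.time()

    def finish(self, ok: bool = True) -> None:
        self.status.state = State.DONE if ok else State.FAILED
        self.status.end_time = time.time()


class Module:
    """Groups the tasks of one module so they can be managed together."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tasks: Dict[uuid.UUID, Task] = {}
        self._sequence = itertools.count(1)

    def make_task(self) -> Task:
        # Tasks are only created here, so every task is tracked by its module.
        task = Task(Status(id=uuid.uuid4(), sequence_number=next(self._sequence)))
        self.tasks[task.status.id] = task
        return task

    def unregister(self, task_id: uuid.UUID) -> None:
        # A finished task stays in memory until it is unregistered.
        self.tasks.pop(task_id, None)


class TaskManager:
    def __init__(self) -> None:
        self.modules: Dict[str, Module] = {}

    def register_module(self, name: str) -> Module:
        return self.modules.setdefault(name, Module(name))
```

Routing creation through `make_task` is what lets the module keep a complete registry of its tasks; a finished task remains queryable until `unregister` removes it.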
Benny Halevy	6fb4b5555d	db: view: get_tombstone_gc_state from compaction_manager Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:05:39 +03:00
Benny Halevy	71ede6124a	db: view: pass base table to view_update_builder To be used by generate_update() for getting the tombstone_gc_state via the table's compaction_manager. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:04:23 +03:00
Benny Halevy	5dd15aa3c8	tombstone_gc: introduce tombstone_gc_state and use it to access the repair history maps. In this introductory patch, we temporarily use a default-constructed tombstone_gc_state to access the thread-local maps. Those use sites will be replaced in the following patches, which gradually pass the tombstone_gc_state down from the compaction_manager to where it is used. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:02:54 +03:00
Mikołaj Grzebieluch	726658f073	db: system_keyspace: add broadcast_kv_store table The first implementation of strongly consistent everywhere tables operates on a simple table representing a string-to-string map. Add a hard-coded schema for the broadcast_kv_store table (key text primary key, value text). This table is in the system keyspace and is created if and only if the BROADCAST_TABLES feature is enabled.	2022-09-05 11:11:08 +02:00

2750 Commits