scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-21 17:10:35 +00:00

Author	SHA1	Message	Date
Avi Kivity	605bf6e221	range.hh: retire range.hh was deprecated in `bd794629f9` (2020) since its names conflict with the C++ library concept of an iterator range. The name ::range also mapped to the dangerous wrapping_interval rather than nonwrapping_interval. Complete the deprecation by removing range.hh and replacing all the aliases by the names they point to from the interval library. Note this now exposes uses of wrapping intervals as they are now explicit. The unit tests are renamed and range.hh is deleted. Closes scylladb/scylladb#17428	2024-02-21 00:24:25 +02:00
Avi Kivity	eedb997568	Merge 'compaction: upgrade: handle keyspaces that use tablets' from Lakshmi Narayanan Sreethar Tables in keyspaces governed by replication strategy that uses tablets, have separate effective_replication_maps. Update the upgrade compaction task to handle this when getting owned key ranges for a keyspace. Fixes #16848 Closes scylladb/scylladb#17335 * github.com:scylladb/scylladb: compaction: upgrade: handle keyspaces that use tablets replica/database: add an optional variant to get_keyspace_local_ranges	2024-02-15 21:31:54 +02:00
Lakshmi Narayanan Sreethar	7a98877798	compaction: upgrade: handle keyspaces that use tablets Tables in keyspaces governed by replication strategy that uses tablets, have separate effective_replication_maps. Update the upgrade compaction task to handle this when getting owned key ranges for a keyspace. Fixes #16848 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-02-15 17:47:39 +05:30
Kefu Chai	caa20c491f	storage_service: pass non-empty keyspace when performing cleanup_all this change addresses the regression introduced by `5e0b3671`, which fall backs to local cleanup in cleanup_all. but `5e0b3671` failed to pass the keyspace to the `shard_cleanup_keyspace_compaction_task_impl` is its constructor parameter, that's why the test fails like ``` error executing POST request to http://localhost:10000/storage_service/cleanup_all with parameters {}: remote replied with status code 400 Bad Request: Can't find a keyspace ``` where the string after "Can't find a keyspace" is empty. in this change, the keyspace name of the keyspace to be cleaned is passed to `shard_cleanup_keyspace_compaction_task_impl`. we always enable the topology coordinator when performing testing, that's why this issue does not pop up until the longevity test. Fixes #17302 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17320	2024-02-15 13:17:45 +02:00
Avi Kivity	784c2f8ad2	Merge 'treewide: replace calls to future::get0() by calls to future::get()' from Kefu Chai get0() dates back from the days where Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it. Replace with seastar::future::get(), which does the same thing. Closes scylladb/scylladb#17130 * github.com:scylladb/scylladb: treewide: replace seastar::future::get0() with seastar::future::get() sstable: capture return value of get0() using auto utils: result_loop: define result_type with decayed type [avi: add another one that snuck in while this was cooking]	2024-02-04 15:23:33 +02:00
Avi Kivity	7cb1c10fed	treewide: replace seastar::future::get0() with seastar::future::get() get0() dates back from the days where Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it. Replace with seastar::future::get(), which does the same thing.	2024-02-02 22:12:57 +08:00
Lakshmi Narayanan Sreethar	e86965c272	compaction: run rewrite_sstables_compaction_task_executor tasks in maintenance group Use maintenance group to run all the compaction tasks that use the rewrite_sstables_compaction_task_executor. Fixes #16699 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#17112	2024-02-02 11:18:49 +02:00
Kefu Chai	5e0b3671d3	storage_service: fall back to local cleanup in cleanup_all before this change, if no keyspaces are specified, scylla-nodetool just enumerate all non-local keyspaces, and call "/storage_service/keyspace_cleanup" on them one after another. this is not quite efficient, as each this RESTful API call force a new active commitlog segment, and flushes all tables. so, if the target node of this command has N non-local keyspaces, it would repeat the steps above for N times. this is not necessary. and after a topology change, we would like to run a global "nodetool cleanup" without specifying the keyspace, so this is a typical use case which we do care about. to address this performance issue, in this change, we improve an existing RESTful API call "/storage_service/cleanup_all", so if the topology coordinator is not enabled, we fall back to a local cleanup to cleanup all non-local keyspaces. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-01 11:25:53 +08:00
Kefu Chai	4f90a875f6	compaction: format flush_mode without the helper since flush_mode is moved out of major_compaction_task_impl, let's drop the helper hosted in that class as well, and implement the formatter witout it. please note, the `__builtin_unreachable()` is dropped. it should not change the behavior of the formatter. we don't put it in the `default` branch in hope that `-Wswitch` can warn us in the case when another enum of `flush_mode` is added, but we fail to handle it somehow. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-01 11:25:53 +08:00
Kefu Chai	b39cc01bb3	compaction_manager: flush all tables before cleanup according to the document "nodetool cleanup" > Triggers removal of data that the node no longer owns currently, scylla performs cleanup by rewriting the sstables. but commitlog segments may still contain the mutations to the tables which are dropped during sstable rewriting. when scylla server restarts, the dirty mutations are replayed to the memtable. if any of these dirty mutations changes the tables cleaned up. the stale data are reapplied. this would lead to data resurrection. so, in this change we following the same model of major compaction: 1. force new active segment, 2. flush all tables 3. perform cleanup using compaction, which rewrites the sstables of specified tables because we already `flush()` all tables in `cleanup_keyspace_compaction_task_impl::run()`, there is no need to call `flush()` again, in `table::perform_cleanup_compaction()`, so the `flush()` call is dropped in this function, and the tests using this function are updated to call `flush()` manually to preserve the existing behavior. there are two callers of `cleanup_keyspace_compaction_task_impl`, * one is `storage_service::sstable_cleanup_fiber()`, which listens for the events fired by topology_state_machine, which is in turn driven by, for instance, "/storage_service/cleanup_all" API. which cleanup all keyspaces in one after another. * another is "/storage_service/keyspace_cleanup", which cleans up the specified keyspace. in the first use case, we can force a new active segment for a single time, so another parameter to the ctor of `cleanup_keyspace_compaction_task_impl` is introduced to specify if the `db.flush_all_tables()` call should be skiped. please note, there are two possible optimizations, 1. force new active segment only if the mutations in it touches the tables being cleaned up 2. after forcing new active segment, only flush the (mem)tables mutated by the non-active segments but let's leave them for following-up changes. this change is a minimal fix for data resurrection issue. Fixes #16757 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-01 11:25:53 +08:00
Kefu Chai	9afec2e3e7	api, compaction: promote flush_mode so that this enum type can be shared by other task(s) as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-01 11:25:53 +08:00
Botond Dénes	c67698ea06	compaction/compaction_manager: perform_cleanup(): hold the compaction gate While the cleanup is ongoing. Otherwise, a concurrent table drop might trigger a use-after-free, as we have seen in dtests recently. Fixes: #16770 Closes scylladb/scylladb#16874	2024-01-25 14:52:50 +01:00
Benny Halevy	51a46aa83b	compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact Prevent the creation of a compaction task when the list of sstables is known to be empty ahead of time. Refs scylladb/scylladb#16694 Fixes scylladb/scylladb#16803 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-01-17 11:53:39 +02:00
Benny Halevy	bd1d65ec38	compaction_manager: perform_cleanup: use compaction_manager::eligible_for_compaction `3b424e391b` introduced a loop in `perform_cleanup` that waits until all sstables that require cleanup are cleaned up. However, with `f1bbf705f9`, an sstable that is_eligible_for_compaction (i.e. it is not in staging, awaiting view update generation), may already be compacted by e.g. regular compaction. And so perform_cleanup should interrupt that by calling try_perform_cleanup, since the latter reevaluates `update_sstable_cleanup_state` with compaction disabled - that stops ongoing compactions. Refs scylladb/scylladb#15673 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-01-17 11:53:39 +02:00
Benny Halevy	d6071945c8	compaction, table: ignore foreign sstables replay_position The sstables replay_position in stats_metadata is valid only on the originating node and shard. Therefore, validate the originating host and shard before using it in compaction or table truncate. Fixes #10080 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#16550	2024-01-16 18:45:59 +02:00
Botond Dénes	697ebef149	Merge 'tasks: compaction: drop regular compaction tasks after they are finished' from Aleksandra Martyniuk Make compaction tasks internal. Drop all internal tasks without parents immediately after they are done. Fixes: #16735 Refs: #16694. Closes scylladb/scylladb#16698 * github.com:scylladb/scylladb: compaction: make regular compaction tasks internal tasks: don't keep internal root tasks after they complete	2024-01-11 12:10:44 +02:00
Kefu Chai	eb9216ef11	compaction: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16707	2024-01-10 11:07:36 +02:00
Aleksandra Martyniuk	6b87778ef2	compaction: make regular compaction tasks internal Regular compaction tasks are internal. Adjust test_compaction_task accordingly: modify test_regular_compaction_task, delete test_running_compaction_task_abort (relying on regular compaction) which checks are already achived by test_not_created_compaction_task_abort. Rename the latter.	2024-01-09 13:13:54 +01:00
Aleksandra Martyniuk	6f13e55187	tasks: call release_resources when task is finished Call task_manager::task::impl::release_resources when task is finished instead of putting the responsibility on user. Closes scylladb/scylladb#16660	2024-01-09 11:41:54 +02:00
Lakshmi Narayanan Sreethar	1d6eaf2985	compaction manager: remove: cleanup _compaction_state on exceptions If for some reason an exception is thrown in compaction_manager::remove, it might leave behind stale table pointers in _compaction_state. Fix that by setting up a deffered action to perform the cleanup. Fixes #16635 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#16632	2024-01-03 22:03:24 +02:00
Raphael S. Carvalho	d1e6dfadea	sstables: Harden estimate_droppable_tombstone_ratio() interface The interface is fragile because the user may incorrectly use the wrong "gc before". Given that sstable knows how to properly calculate "gc before", let's do it in estimate__d__t__r(), leaving no room for mistakes. sstable_run's variant was also changed to conform to new interface, allowing ICS to properly estimate droppable ratio, using GC before that is calculated using each sstable's range. That's important for upcoming tablets, as we want to query only the range that belongs to a particular tablet in the repair history table. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#15931	2023-12-20 19:04:41 +02:00
Raphael S. Carvalho	dd1a6d6309	compaction: Add splitting compaction task to manager The task for splitting compaction will run until all sstables in the main set are split. The only exceptions are shutdown or user has explicitly asked for abort. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-12-17 11:40:09 -03:00
Raphael S. Carvalho	f87161e556	compaction: Prepare rewrite_sstables_compaction_task_executor to be reused for splitting Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-12-17 11:40:09 -03:00
Raphael S. Carvalho	c96938c49b	compaction: remove scrub-specific code from rewrite_sstables_compaction_task_executor Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-12-17 11:40:09 -03:00
Raphael S. Carvalho	b1c5d5dd4e	compaction: Add splitting compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-12-17 11:40:08 -03:00
Raphael S. Carvalho	3dcb800a96	flat_mutation_reader: Allow interposer consumers to be stacked reader_consumer_v2 being a noncopyable_function imposes a restriction when stacking one interposer consumer on top of another. Think for example of a token-based segregator on top of a timestamp based one. To achieve that, the interposer consumer creator must be reentrant, such that the consumer can be created on each "channel", but today the creator becomes unusable after first usage. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-12-17 11:26:32 -03:00
Botond Dénes	493b6bc65f	Merge 'Guard tables in compaction tasks' from Benny Halevy Currently, if a compaction function enters the table or compaction_group async_gate, we can't stop it on the table/compaction_group stop path as they co_await their respective async_gate.close(). This series introduces a table_ptr smart pointer to guards the table object by entering its async_gate, and it also defers awaiting the gate.close future till after stopping ongoing compaction so that closing the gate will prevent starting new compactions while ongoing compaction can be stopped and finally awaiting the close() future will wait for them to unwind and exit the gate after being stopped. Fixes #16305 Closes scylladb/scylladb#16351 * github.com:scylladb/scylladb: compaction: run_on_table: skip compaction also on gate_closed_exception compaction: run_on_table: hold table table: add table_holder and hold method table: stop: allow compactions to be stopped while closing async_gate	2023-12-12 12:50:17 +02:00
Benny Halevy	7843025a53	compaction: run_on_table: skip compaction also on gate_closed_exception Similar to the no_such_column_family error, gate_closed_exception indicates that the table is stopped and we should skip compaction on it gracefully. Fixes #16305 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-12 08:46:37 +02:00
Benny Halevy	92c718c60a	compaction: run_on_table: hold table To ensure the table will not be dropped while the compaction task is ongoing. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-12 08:45:59 +02:00
Aleksandra Martyniuk	ceec5577d8	api: compaction: pass pointer to top level compaction tasks As a preparation for asynchronous compaction api, from which we cannot take values by reference, top level compaction tasks get pointers which need to be set to nullptr when they are not needed (like in async api).	2023-12-11 11:36:10 +01:00
Kefu Chai	f483309165	compaction, api: drop unused functions run_on_existing_tables() is not used at all. and we have two of them. in this change, let's drop them. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16304	2023-12-06 14:31:08 +02:00
Benny Halevy	0bcce35abd	treewide: get rid of now unused fb_utilities Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 16:22:49 +02:00
Yaniv Kaul	c658bdb150	Typos: fix typos in comments Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2023-12-02 22:37:22 +02:00
Benny Halevy	b12b142232	api: add /storage_service/compact For major compacting all tables in the database. The advantage of this api is that `commitlog->force_new_active_segment` happens only once in `database::flush_all_tables` rather than once per keyspace (when `nodetool compact` translates to a sequence of `/storage_service/keyspace_compaction` calls). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-28 16:37:42 +02:00
Benny Halevy	66ba983fe0	compaction_manager: flush_all_tables before major compaction Major compaction already flushes each table to make sure it considers any mutations that are present in the memtable for the purpose of tombstone purging. See `64ec1c6ec6` However, tombstone purging may be inhibited by data in commitlog segments based on `gc_time_min` in the `tombstone_gc_state` (See `f42eb4d1ce`). Flushing all sstables in the database release all references to commitlog segments and there it maximizes the potential for tombstone purging, which is typically the reason for running major compaction. However, flushing all tables too frequently might result in tiny sstables. Since when flushing all keyspaces using `nodetool flush` the `force_keyspace_compaction` api is invoked for keyspace successively, we need a mechanism to prevent too frequent flushes by major compaction. Hence a `compaction_flush_all_tables_before_major_seconds` interval configuration option is added (defaults to 24 hours). In the case that not all tables are flushed prior to major compaction, we revert to the old behavior of flushing each table in the keyspace before major-compacting it. Fixes scylladb/scylladb#15777 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-28 16:37:42 +02:00
Benny Halevy	1fd85bd37b	api: compaction: add flush_memtables option When flushing is done externally, e.g. by running `nodetool flush` prior to `nodetool compact`, flush_memtables=false can be passed to skip flushing of tables right before they are major-compacted. This is useful to prevent creation of small sstables due to excessive memtable flushing. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-28 16:37:42 +02:00
Botond Dénes	3ccf1e020b	Merge ' compaction: abort compaction tasks' from Aleksandra Martyniuk Compaction tasks which do not have a parent are abortable through task manager. Their children are aborted recursively. Compaction tasks of the lowest level are aborted using existing compaction task executors stopping mechanism. Closes scylladb/scylladb#16177 * github.com:scylladb/scylladb: test: test abort of compaction task that isn't started yet test: test running compaction task abort tasks: fail if a task was aborted compaction: abort task manager compaction tasks	2023-11-28 09:08:04 +02:00
Aleksandra Martyniuk	9c2c964b8e	test: test abort of compaction task that isn't started yet Test whether a task which parent was aborted has a proper status.	2023-11-24 19:25:27 +01:00
Aleksandra Martyniuk	8639eae0ce	test: test running compaction task abort Test whether a task which is aborted while running has a proper status.	2023-11-24 19:25:20 +01:00
Aleksandra Martyniuk	aa7bba2d8b	compaction: abort task manager compaction tasks Set top level compaction tasks as abortable. Compaction tasks which have no children, i.e. compaction task executors, have abort method overriden to stop compaction data.	2023-11-24 15:44:34 +01:00
Kefu Chai	f99223919a	compaction: add formatter for map<timestamp_type, vector<shared_sstable>> before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define a formatter for map<timestamp_type, vector<shared_sstable>>. since the operator<< for this type is only used in the .cc file, and the only use case of it is to provide the formatter for fmt, so the operator<< based formatter is remove in this change. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16163	2023-11-24 11:56:28 +02:00
Raphael S. Carvalho	157a5c4b1b	treewide: Avoid using namespace sstables in header to avoid conflicts That's needed for compaction_group.hh to be included in headers. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-11-23 17:36:57 +02:00
Botond Dénes	0ae1335daa	Revert "Merge 'compaction: abort compaction tasks' from Aleksandra Martyniuk" This reverts commit `11cafd2fc8`, reversing changes made to `2bae14f743`. Reverting because this series causes frequent CI failures, and the proposed quickfix causes other failures of its own. Fixes: #16113	2023-11-22 17:44:07 +02:00
Nadav Har'El	64d1d5cf62	Merge 'Fix partition estimation with TWCS tables during streaming' from Raphael "Raph" Carvalho TWCS tables require partition estimation adjustment as incoming streaming data can be segregated into the time windows. Turns out we had two problems in this area that leads to suboptimal bloom filters. 1) With off-strategy enabled, data segregation is postponed, but partition estimation was adjusted as if segregation wasn't postponed. Solved by not adjusting estimation if segregation is postponed. 2) With off-strategy disabled, data segregation is not postponed, but streaming didn't feed any metadata into partition estimation procedure, meaning it had to assume the max windows input data can be segregated into (100). Solved by using schema's default TTL for a precise estimation of window count. For the future, we want to dynamically size filters (see https://github.com/scylladb/scylladb/issues/2024), especially for TWCS that might have SSTables that are left uncompacted until they're fully expired, meaning that the system won't heal itself in a timely manner through compaction on a SSTable that had partition estimation really wrong. Fixes https://github.com/scylladb/scylladb/issues/15704. Closes scylladb/scylladb#15938 * github.com:scylladb/scylladb: streaming: Improve partition estimation with TWCS streaming: Don't adjust partition estimate if segregation is postponed	2023-11-14 20:41:36 +02:00
Botond Dénes	11cafd2fc8	Merge 'compaction: abort compaction tasks' from Aleksandra Martyniuk Compaction tasks which do not have a parent are abortable through task manager. Their children are aborted recursively. Compaction tasks of the lowest level are aborted using existing compaction task executors stopping mechanism. Closes scylladb/scylladb#16050 * github.com:scylladb/scylladb: test: test abort of compaction task that isn't started yet test: test running compaction task abort tasks: fail if a task was aborted compaction: abort task manager compaction tasks	2023-11-14 14:55:17 +02:00
Aleksandra Martyniuk	6af581301b	test: test abort of compaction task that isn't started yet Test whether a task which parent was aborted has a proper status.	2023-11-14 10:36:38 +01:00
Botond Dénes	a66ec1d3c1	Merge 'Drop compaction_manager_test' from Pavel Emelyanov This is continuation of `a34c8dc4` (Drop compaction_manager_for_testing). There's one more wrapper over compaction_manager to access its private fields. All such access was recently moved to sstables::test_env's compaction manager, now it's time to drop the remaining legacy wrapper class. Closes scylladb/scylladb#16017 * github.com:scylladb/scylladb: test/utils: Drop compaction_manager_test test/utils: Get compaction manager from test_env test/sstables: Introduce test_env_compaction_manager::perform_compaction() test/env: Add sstables::test_env& to compaction_manager_test::run() test/utils: Add sstables::test_env& to compact_sstables() test/utils: Simplify and unify compaction_manager_test::run() test/utils: Squash two compact_sstables() helpers test/compaction: Use shorter compact_sstables() helper test/utils: Keep test task compaction gate on task itself test/utils: Move compaction_manager_test::propagate_replacement()	2023-11-14 11:25:17 +02:00
Aleksandra Martyniuk	a63a6dcd93	test: test running compaction task abort Test whether a task which is aborted while running has a proper status.	2023-11-13 16:06:36 +01:00
Aleksandra Martyniuk	599d6ebd52	compaction: abort task manager compaction tasks Set top level compaction tasks as abortable. Compaction tasks which have no children, i.e. compaction task executors, have abort method overriden to stop compaction data.	2023-11-13 15:46:58 +01:00
Pavel Emelyanov	f4696f21a8	test/utils: Drop compaction_manager_test This class only provides a .run() method which allocates a task and calls sstables::test_env::perform_compaction(). This can be done in a helper method, no need for the whole class for it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-13 11:44:51 +03:00

1 2 3 4 5 ...

788 Commits