table::run_compaction is a trivial wrapper for
table::compact_sstables.
We have lots of similar {start, trigger, run}_compaction functions.
Drop the run_compaction wrapper to reduce confusion.
Closes #9161
There are situations where a node outside the current configuration is
the only node that can become a leader. We become candidates in such
cases. But there is an easy check for when we don't need to; a comment was
added explaining that.
* kbr/candidate-outside-config-v3:
raft: sometimes become a candidate even if outside the configuration
raft: fsm: update _commit_idx when applying snapshot
This PR introduces a new feature to hinted handoff: the ability to wait until hints from a given node are replayed towards a chosen set of nodes.
It replaces the old mechanism which waits for hints to be replayed before repair and exposes it through an HTTP API. The implementation is completely different, so this PR begins with a revert of the old functionality and then introduces the new implementation.
Waiting for hints is made possible with the help of "hint sync points". A sync point is a collection of positions in some hint queues from one node - those positions are encoded into the sync point's description as a hexadecimal string. The sync point consists only of its hexadecimal description - there is no state kept on any of the nodes.
Two operations are available through the HTTP API:
- `/hints_manager/waiting_point` (POST) - _Create a sync point_. Given a set of `target_hosts`, creates a sync point which encodes positions currently at the end of all queues pointing to any of the `target_hosts`.
- `/hints_manager/waiting_point` (GET) - _Wait or check the sync point_. Given a description of a sync point, checks if the sync point was already reached. If you provide a non-zero `timeout` parameter and the sync point is not reached yet, this endpoint will wait until the point is reached or the timeout expires.
Hinted handoff uses the commitlog framework in order to store and replay hints. Each entry (here, a serialized hint) can be identified by a "replay position", which contains the ID of the segment containing the hint, and its position in the file. Replay positions are ordered with respect to segment ID and then position in the file; because segment IDs are assigned sequentially and entries are also written sequentially, this order corresponds to the chronological order in which hints were written. This order also corresponds to the order in which hints are replayed, provided that hint segments are processed starting with the one with the smallest ID first.
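A minimal sketch of the ordering just described (field names assumed for illustration, not the actual commitlog API):

    #include <cstdint>
    #include <tuple>

    // A replay position: segment ID first, then the byte offset of the entry
    // within the segment file (names are illustrative).
    struct replay_position {
        uint64_t id;   // segment ID, assigned sequentially, so it grows over time
        uint32_t pos;  // offset of the entry within the segment file

        // Lexicographic comparison: by segment ID, then by offset. Both grow
        // monotonically as hints are written, so this matches the chronological
        // write order - and the replay order, when segments are replayed
        // smallest-ID first.
        bool operator<(const replay_position& o) const {
            return std::tie(id, pos) < std::tie(o.id, o.pos);
        }
    };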
The main idea is to track the positions of both the most recently written hint and the most recently replayed hint. When creating a hint sync point, the position of the last written hint is encoded; when the sync point is waited on, the hints manager waits until the last replayed position reaches the position encoded in the sync point. The description of the sync point encodes positions on a per-hint queue basis - separately for each shard, destination endpoint and hint type (regular or MV).
Note: although the hints manager destroys and re-creates commitlog instances, the ordering described above still works - the ID of the first segment assigned by a commitlog instance corresponds to the number of milliseconds since the epoch, so segments created by newer commitlog instances will have larger IDs.
Before the hints manager is enabled, it performs segment _rebalancing_: for a given endpoint, it makes sure that each shard gets roughly the same number of hint segments. For example, if there are 3 shards and shard 1 has 7 segments, then shard 0 will get 2 segments, shard 1: 3 segments, and shard 2: 2 segments. Apart from distributing the work evenly between shards on startup, it also handles the case when the node is resharded - if the number of shards is reduced, segments from removed shards will be redistributed to lower shards.
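A minimal sketch of the split arithmetic (not the actual rebalancing code, which also moves segment files between shard directories):

    #include <cstddef>
    #include <vector>

    // Evenly split `total` segments over `shards` shards: every shard receives
    // either floor(total / shards) or ceil(total / shards) segments. Which
    // shards receive the extra segments is an implementation detail; for the
    // example above (7 segments, 3 shards) the counts are 3, 2 and 2 in some
    // order.
    std::vector<size_t> segments_per_shard(size_t total, size_t shards) {
        std::vector<size_t> out(shards, total / shards);
        for (size_t i = 0; i < total % shards; ++i) {
            ++out[i]; // hand out the remainder one segment at a time
        }
        return out;
    }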
Because segments can be moved between shards on restart, accurate tracking of hint replay becomes harder. In order to simplify the problem, this PR changes the order in which hint segments are replayed - segments from other shards (here called "foreign" segments) are replayed first, before any "local" segment from this shard. Foreign segments are treated as if they were placed before the 0 replay position - when waiting for a hint sync point, we will __always__ wait for foreign segments to be replayed.
This behavior makes sure that hints generated before the sync point was created will be replayed - and, if segment rebalancing happened in the meantime, we will potentially replay some more segments which were moved across shards.
This PR starts with a revert of the "hints: delay repair until hints are replayed" (#8452) functionality. Some infrastructure introduced in the original PR has since come to be used in other places in the code, so this is not a simple revert of the merge commit - instead, the commits of the old PR are reverted separately and modified in order to make the code compile.
The following commits from the original PR were omitted from the revert because the code introduced by them became used by other logic in repair:
- 0db45d1df5 (repair: introduce abort_source for repair abort)
- 3a2d09b644 (repair: introduce abort_source for shutdown)
- 49f4a2f968 (repair: plug in waiting for hints to be sent before repair)
Refs: #8102
Fixes: #8727
Tests: unit(dev)
Closes #8982
* github.com:scylladb/scylla:
api: add HTTP API for hint sync points
api: register hints HTTP API outside set_server_done
storage_proxy: add functions for creating and waiting for hint sync pts
hints: add functions for creating and waiting for sync points
hints: add hint sync point structure
utils,alternator: move base64 code from alternator to utils
hints: make it possible to wait until hints are replayed
hints: track the RP of the last replayed hint
hints: track the RP of the last written hint
hints: change last_attempted_rp to last_succeeded_rp
hints: rearrange error handling logic for hint sending
hints: sort segments by ID, divide into foreign and local
Revert "db/hints: allow to forcefully update segment list on flush"
Revert "db/hints: add a metric for counting processed files"
Revert "db/hints: make it possible to wait until current hints are sent"
Revert "storage_proxy: add functions for syncing with hints queue"
Revert "messaging_service: add verbs for hint sync points"
Revert "storage_proxy: implement verbs for hint sync points"
Revert "config: add wait_for_hint_replay_before_repair option"
Revert "storage_proxy: coordinate waiting for hints to be sent"
Revert "repair: plug in waiting for hints to be sent before repair"
Revert "hints: dismiss segment waiters when hint queue can't send"
Revert "storage_proxy: stop waiting for hints replay when node goes down"
Revert "storage_proxy: add abort_source to wait_for_hints_to_be_replayed"
Adds HTTP endpoints for manipulating hint sync points:
- /hinted_handoff/sync_point (POST) - creates a new sync point for
hints towards nodes listed in the `target_hosts` parameter
- /hinted_handoff/sync_point (GET) - checks the status of the sync
point. If a non-zero `timeout` parameter is given, it waits until the
sync point is reached or the timeout expires.
Registration of the currently unused hinted handoff endpoints is moved
out of the set_server_done function. They are now explicitly registered
in main.cc by calling api::set_hinted_handoff and later unregistered by
calling api::unset_hinted_handoff.
Setting/unsetting the HTTP API separately will make it possible to pass
a reference to the sync_point_service without polluting the
set_server_done function.
Adds a sync_point structure. A sync point is a (possibly incomplete)
mapping from hint queues to replay positions within them. Users will be
able to create sync points consisting of the last written positions of
some hint queues, and then wait until hint replay in all of those queues
reaches these points.
The sync point supports serialization - first it is serialized with the
help of IDL to a binary form, and then converted to a hexadecimal
string. Deserialization is also possible.
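A rough sketch of the shape of the data (names below are illustrative; the real structure is defined in IDL):

    #include <cstdint>
    #include <string>
    #include <vector>

    struct replay_position { uint64_t id; uint32_t pos; };

    // For each shard, for each destination endpoint, and for each hint type
    // (regular or materialized-view), the sync point records the replay
    // position of the last written hint in that queue.
    struct per_endpoint_point {
        std::string endpoint;      // destination node
        replay_position regular;   // regular-hints queue
        replay_position mv;        // materialized-view hints queue
    };

    struct sync_point {
        // One entry per shard; possibly incomplete - only the queues pointing
        // at the requested target hosts are included.
        std::vector<std::vector<per_endpoint_point>> per_shard;
    };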
The base64 encoding/decoding functions will be used for serialization of
hint sync point descriptions. Base64 format is not specific to
Alternator, so it can be moved to utils.
Adds the necessary infrastructure for waiting, for a given endpoint
manager, until hints are replayed up to a specified position. An abort
source must be specified; if it is triggered, waiting for hint replay is
cancelled.
If the endpoint manager is stopped, current waiters are dismissed with
an exception.
Keeps track of a position which serves as an upper bound for positions
of already replayed hints - i.e. all hints with replay positions
strictly lower than it are considered replayed.
In order to accurately track this bound during hint replay, a std::map
is introduced which contains positions of hints which are currently
being sent.
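A minimal sketch of the bookkeeping (names assumed; replay positions simplified to pairs, which already compare in the required order):

    #include <algorithm>
    #include <cstdint>
    #include <map>
    #include <utility>

    using replay_position = std::pair<uint64_t, uint32_t>; // (segment ID, offset)

    class replay_progress {
        std::map<replay_position, size_t> _in_flight; // hints being sent right now
        replay_position _last_finished{};             // largest finished position
    public:
        void on_send_started(replay_position rp) { ++_in_flight[rp]; }
        void on_send_finished(replay_position rp) {
            auto it = _in_flight.find(rp);
            if (--it->second == 0) {
                _in_flight.erase(it);
            }
            _last_finished = std::max(_last_finished, rp);
        }
        // All hints with replay positions strictly lower than the returned value
        // are replayed: the smallest in-flight position if any, otherwise the
        // last finished one (the real code also advances past that last hint).
        replay_position bound() const {
            return _in_flight.empty() ? _last_finished : _in_flight.begin()->first;
        }
    };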
The position of the last written hint is now tracked by the endpoint
hints manager.
When the manager is constructed and no hints have been replayed yet, the last
written hint position is initialized to the beginning of a fake segment
with ID corresponding to the current number of milliseconds since the
epoch. This choice makes sure that, in case a new hint sync point is
created before any hints are written, the position recorded for that
hint queue will be larger than all replay positions in segments
currently stored on disk.
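A sketch of that initialization (helper name assumed; replay position simplified to a pair):

    #include <chrono>
    #include <cstdint>
    #include <utility>

    using replay_position = std::pair<uint64_t, uint32_t>; // (segment ID, offset)

    // Before any hint is written, pretend the last written hint sits at the
    // start of a fake segment whose ID is the current time in milliseconds
    // since the epoch. Real segment IDs are assigned the same way, so this
    // position sorts after every segment already present on disk.
    replay_position initial_last_written_position() {
        using namespace std::chrono;
        uint64_t fake_id = duration_cast<milliseconds>(
                system_clock::now().time_since_epoch()).count();
        return {fake_id, 0};
    }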
Instead of tracking the last position for which hint sending is
attempted, the last successfully replayed position is tracked.
The previous variable was used to calculate the position from which hint
replay should restart in case of an error, in the following way:
    _last_not_complete_rp = ctx_ptr->first_failed_rp.value_or(
            ctx_ptr->last_attempted_rp.value_or(_last_not_complete_rp));
Now, this formula uses the last_succeeded_rp in place of
last_attempted_rp. This change does not have an effect on the choice of
the starting position of the next retry:
- If the hint at `last_attempted_rp` has succeeded, in the new algorithm
the same position will be recorded in `last_succeeded_rp`, and the
formula will yield the same result.
- If the hint at `last_attempted_rp` has failed, it will be accounted
into `first_failed_rp`, so the formula will yield the same result.
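For reference, the formula after the change reads:

    _last_not_complete_rp = ctx_ptr->first_failed_rp.value_or(
            ctx_ptr->last_succeeded_rp.value_or(_last_not_complete_rp));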
The motivation for this change is that in the next commits of this PR we
will start tracking the position of the last replayed hint per hint
queue, and the meaning of the new variable makes it more useful - when
there are no failed hints in the hint sending attempt, last_succeeded_rp
gives us information that hints _up to this position_ were replayed; the
last_attempted_rp variable can only tell us that hints _before that
position_ were replayed successfully.
Instead of calling the `on_hint_send_failure` method inside the hint
sending task wherever an error occurs, we now let the exceptions
propagate and handle them inside a single `then_wrapped` attached to the
hint sending task.
Apart from the `then_wrapped`, there is one more place which calls
`on_hint_send_failure` - in the exception handler for the future which
spawns the asynchronous hint sending task. It needs to be kept separate
because it is a part of a separate task.
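A sketch of the resulting shape (function names are hypothetical; `then_wrapped` is the real Seastar API):

    #include <seastar/core/future.hh>
    #include <exception>

    seastar::future<> send_hints_task();              // may resolve to a failed future
    void on_hint_send_failure(std::exception_ptr ep); // single error-handling site

    seastar::future<> run_sending_task() {
        return send_hints_task().then_wrapped([] (seastar::future<> f) {
            if (f.failed()) {
                on_hint_send_failure(f.get_exception());
            }
        });
    }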
The endpoint hints manager keeps a commitlog instance which is used to write
hints into new segments. This instance is re-created every 10 seconds,
which causes the previous instance to leave its segments on disk.
On the other hand, the hints sender keeps a list of segments to replay which
is updated only after it becomes empty. The list is repopulated with
segments returned by the commitlog::get_segments_to_replay() method
which does not specify the order of the segments returned.
As a preparation for the upcoming hint sync points feature, this commit
changes the order in which segments are replayed:
- First, segments written by other shards are replayed. Such segments
may appear in the queue because of segment rebalancing which is done
at startup.
The purpose of replaying "foreign" segments first is that they are
problematic for hint sync points. For each hint queue, a hint sync
point encodes a replay position of the last written hint on the local
shard. Accounting for foreign segments precisely would make the
implementation more complicated. To make things simpler, waiting for
sync points will always make sure that all foreign segments are
replayed. This might sometimes cause more hints to be waited on than
necessary if a restart occurs in the meantime.
- Segments written by the local shard are replayed later, in order of
their IDs. This makes sure that local hints are replayed in the order
they were written to segments, and will make it possible to use replay
positions to track progress of hint replay.
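A minimal sketch of the ordering (descriptor type hypothetical):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct segment_desc {
        uint64_t id;               // segment ID
        unsigned written_by_shard; // shard that originally wrote the segment
    };

    // Foreign segments (written by other shards) first, then local segments
    // in increasing ID order, so local hints replay in the order they were
    // written and replay positions can track progress.
    void order_for_replay(std::vector<segment_desc>& segs, unsigned this_shard) {
        auto is_foreign = [&] (const segment_desc& s) {
            return s.written_by_shard != this_shard;
        };
        auto local_begin = std::stable_partition(segs.begin(), segs.end(), is_foreign);
        std::sort(local_begin, segs.end(),
                  [] (const segment_desc& a, const segment_desc& b) { return a.id < b.id; });
    }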
This reverts commit e48739a6da.
This commit removes the functionality from the endpoint hints manager
which allowed flushing hints immediately and forcefully updating the
list of segments to replay.
The new implementation of waiting for hints will be based on replay
positions returned by the commitlog API and it won't be necessary to
forcefully update the segment list when creating a sync point.
This reverts commit 5a49fe74bb.
This commit removes a metric which tracks how many segments were
replayed during the current runtime. It was necessary for the current
"wait for hints" mechanism, which is being replaced with a different one -
therefore we can remove the metric.
This reverts commit 427bbf6d86.
This commit removes the infrastructure which allows waiting until
current hints are replayed in a given hint queue.
It will be replaced with a different mechanism in later commits.
This reverts commit 244738b0d5.
This commit removes create_hint_queue_sync_point and
check_hint_queue_sync_point functions from storage_proxy, which were
used to wait until local hints are sent out to particular nodes.
Similar methods will be reintroduced later in this PR, with a completely
different implementation.
This reverts commit 82c419870a.
This commit removes the HINT_SYNC_POINT_CREATE and HINT_SYNC_POINT_CHECK
rpc verbs.
The upcoming HTTP API for waiting for hint replay will be restricted
to waiting for hints on the node handling the request, so there is no
need for new verbs.
This reverts commit 485036ac33.
This commit removes the handlers for HINT_SYNC_POINT_CREATE and
HINT_SYNC_POINT_CHECK verbs.
The upcoming HTTP API for waiting for hint replay will be restricted
to waiting for hints on the node handling the request, so there is no
need for new verbs.
This reverts commit 86d831b319.
This commit removes the wait_for_hints_before_repair option. Because a
previous commit in this series removes the logic from repair which
caused it to wait for hints to be replayed, this option is now useless.
We can safely remove this option because it is not present in any
release yet.
This reverts commit 46075af7c4.
This commit removes the logic responsible for waiting for other nodes to
replay their hints. The upcoming HTTP API for waiting for hint replay
will be restricted to waiting for hints on the node handling the
request, so there is no need for coordinating multiple nodes.
This reverts commit 49f4a2f968.
The idea to wait for hints to be replayed before repair is not always a
good one. For example, someone might want to repair a small token range
or just one table - but hinted handoff cannot selectively replay hints
like this.
The fact that we are waiting for hints before repair caused a small
number of regressions (#8612, #8831).
This commit removes the logic in repair which caused it to wait for
hints. Additionally, the `storage_proxy.hh` include, which was
introduced in the commit being reverted, is also removed, and smaller
header files are included instead (gossiper.hh and fb_utilities.hh).
This reverts commit 9d68824327.
First, we are reverting the existing infrastructure for waiting for
hints in order to replace it with a different one; therefore this commit
needs to be reverted as well.
Second, errors during hint replay can occur naturally and don't
necessarily indicate that no progress can be made - for example, the
target node is heavily loaded and some hints time out. The "waiting for
hints" operation becomes a user-issued command, so it's not as vital to
ensure liveness.
This reverts commit 22e06ace2c.
The upcoming HTTP API for waiting for hint replay will be restricted
to waiting for hints on the node handling the request, so we are
removing all infrastructure related to coordinating hint waiting -
therefore this commit needs to be reverted.
This reverts commit 958a13577c.
The `wait_for_hints_to_be_replayed` function is going to be completely
removed in this PR, so this commit needs to be reverted, too.
Even though we switched to an Ubuntu-based container image, housekeeping
is still using the yum repository.
It should be switched to the apt repository.
Fixes #9144
Closes #9147
* seastar ce3cc2687f...07758294ef (12):
> perftune.py: change hwloc-calc parameters order
  Fixes perftune on Fedora 34-based hwloc
> resource: pass configuration to nr_processing_units()
> semaphore: semaphore_timed_out: derive from timed_out_error
> Merge "resource: use hwloc_topology_holder" from Benny
> Merge "file: ioctl, fcntl and lifetime_hint interfaces in seastar::file" from Arun George
> pipe: mark pipe_reader and pipe_writer ctors as noexcept
> test: pipe: add simple unit test
> test: source_location_test: relax function name check for gcc 11
> http: add 429 too_many_requests status code
> Added [[nodiscard]] to abort-source's subscribe
> io_queue: Use on_internal_error in io_queue
> reactor: Remove unused epoll poller from reactor
read_schema_partition_for_keyspace() copies some parameters to capture them
in a coroutine, but the same can be achieved more cleanly by changing the
reference parameters to value parameters, so do that.
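A simplified illustration of the pattern (signatures abbreviated; `do_query` is a hypothetical stand-in):

    #include <seastar/core/coroutine.hh>
    #include <seastar/core/future.hh>
    #include <seastar/core/sstring.hh>

    seastar::future<> do_query(const seastar::sstring& name); // hypothetical helper

    // Before: a reference parameter may dangle once the coroutine suspends
    // and the caller's temporaries die, so the body must copy it explicitly.
    seastar::future<> read_before(const seastar::sstring& ks_name) {
        auto ks = ks_name; // manual copy to keep the name alive across co_await
        co_await do_query(ks);
    }

    // After: taking the parameter by value stores the copy in the coroutine
    // frame itself - same effect, no manual copying.
    seastar::future<> read_after(seastar::sstring ks_name) {
        co_await do_query(ks_name);
    }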
Test: unit (dev)
Closes #9154
This trial patch set moves compaction_strategy.hh and compaction_garbage_collector.hh to the compaction directory and drops the unused compact_for_mutation_query_state and compact_for_data_query_state.
Closes #9156
* github.com:scylladb/scylla:
compaction: Move compaction_garbage_collector.hh to compaction dir
compaction: Move compaction_strategy.hh to compaction dir
mutation_compactor: Drop compact_for_mutation_query_state and compact_for_data_query_state
The cluster would forget its configuration when taking a snapshot,
making it unable to reelect a leader.
We fix the problem and introduce a regression test.
The last commit introduces some additional assertions for safety.
* kbr/snapshot-preserve-config-v4:
raft: sanity checking of apply index
test: raft: regression test for storing cluster configuration when taking snapshots
raft: store cluster configuration when taking snapshots
There are situations where a node outside the current configuration is
the only node that can become a leader. We become candidates in such
cases. But there is an easy check for when we don't need to; a comment was
added explaining that.
All entries up to snapshot.idx must obviously be committed, so why not
update _commit_idx to reflect that.
With this we get a useful invariant:
`_log.get_snapshot().idx <= _commit_idx`.
For example, when checking whether the latest active configuration is
committed, it should be enough to compare the configuration index to the
commit index. Without the invariant we would need a special case if the
latest configuration comes from a snapshot.
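For illustration (stand-in names), the check becomes a single comparison:

    #include <cstdint>

    // With the invariant snapshot_idx <= commit_idx, the latest configuration
    // is committed iff its index is within the commit index - even when that
    // configuration came from the snapshot, since then its index equals
    // snapshot_idx, which is <= commit_idx.
    bool latest_config_committed(uint64_t latest_conf_idx, uint64_t commit_idx) {
        return latest_conf_idx <= commit_idx;
    }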
Before the fix introduced in the previous patch, the cluster would
forget its configuration when taking a snapshot, making it unable to
reelect a leader. This regression test catches that.
We add a function `log_last_conf_before(index_t)` to `fsm` which, given
an index greater than the last snapshot index, returns the configuration
at this index, i.e. the configuration of the last configuration entry
before this index.
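A standalone sketch of the lookup (all types simplified; not the actual fsm internals):

    #include <cassert>
    #include <cstdint>
    #include <optional>
    #include <vector>

    struct configuration { /* cluster membership */ };
    struct log_entry {
        uint64_t idx;
        std::optional<configuration> cfg; // set only for configuration entries
    };

    // Given the log suffix after the snapshot (sorted by idx) and an index
    // greater than the snapshot index, return the configuration in force at
    // that index: the last configuration entry before it, or the snapshot's
    // configuration if the log contains none.
    configuration log_last_conf_before(const std::vector<log_entry>& log,
                                       const configuration& snapshot_cfg,
                                       uint64_t snapshot_idx, uint64_t idx) {
        assert(idx > snapshot_idx);
        const configuration* last = &snapshot_cfg;
        for (const auto& e : log) {
            if (e.idx >= idx) {
                break;
            }
            if (e.cfg) {
                last = &*e.cfg;
            }
        }
        return *last;
    }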
This function is then used in `applier_fiber` to obtain the correct
configuration to be stored in a snapshot.
In order to ensure that the configuration can be obtained, i.e. the
index we're looking at is not smaller than the last snapshot index, we
strengthen the conditions required for taking a snapshot: we check that
`_fsm` has not yet applied a snapshot at a larger index (which it may
have due to a remote snapshot install request). This also causes fewer
unnecessary snapshots to be taken in general.
Calculating clustering ranges on a local index has been rewritten to use the new `expression` variant.
This allows us to finally remove the old `bounds_ranges` function.
Closes #9080
* github.com:scylladb/scylla:
cql3: Remove unused functions like bounds_ranges
cql3: Use expressions to calculate the local-index clustering ranges
statement_restrictions_test: tests for extracting column restrictions
expression: add a function to extract restrictions for a column
We must not apply remote snapshots with commit indexes smaller than our
local commit index; this could result in out-of-order command
application to the local state machine replica, leading to
serializability violations.
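A sketch of the guard (names are stand-ins):

    #include <cstdint>

    // Reject a remote snapshot that is behind our local commit index;
    // applying it would rewind the state machine and replay commands
    // out of order.
    bool may_apply_remote_snapshot(uint64_t snapshot_commit_idx,
                                   uint64_t local_commit_idx) {
        return snapshot_commit_idx >= local_commit_idx;
    }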
Message-Id: <20210805112736.35059-1-kbraun@scylladb.com>
This applies to the case when pages are broken by replicas based on
memory limits (not row or partition limits).
If replicas stop pages in the following places:
replica1 = {
row 1,
<end-of-page>
row 2
}
replica2 = {
row 3
}
The coordinator will reconcile the first page as:
{
row 1,
row 3
}
and row 2 will not be emitted at all in the following pages.
The coordinator should notice that replica1 returned a short read and
ignore everything past row 1 from other replicas, but it doesn't.
There is logic to do this trimming, but it is done in
got_incomplete_information_across_partitions(), which is executed only
for the partition for which row limits were exhausted.
Fix by running the logic unconditionally.
Fixes #9119
Tests:
- unit (dev)
- manual (2 node cluster, manual reproducer)
Message-Id: <20210802231539.156350-1-tgrabiec@scylladb.com>
In issue #9083 a user noted that whereas Cassandra's partition-count
estimation is accurate, Scylla's (rewritten in commit b93cc21) is very
inaccurate. The tests introduced here, which all xfail on Scylla, confirm
this suspicion.
The most important tests are the "simple" tests, involving a workload
which writes N *distinct* partitions and then asks for the estimated
partition count. Cassandra provides accurate estimates, which grow
more accurate with more partitions, so it passes these tests, while
Scylla provides bad estimates and fails them.
Additional tests demonstrate that neither Scylla nor Cassandra
can handle anything beyond the "simple" case of distinct partitions.
Two tests which xfail on both Cassandra and Scylla demonstrate that
if we write the same partitions to multiple sstables - or also delete
partitions - the estimated partition counts will be way off.
Refs #9083
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210726211315.1515856-1-nyh@scylladb.com>
We realized that the overall complexity of partition filtering in
cleanup is O(N * log(M)), where:
N is the # of tokens
M is the # of ranges owned by the node
Assuming N = 10,000,000 for a table and M = 257, N * log2(M) ~= 80,056,245
checks are performed during the whole cleanup.
This can be optimized by taking advantage of the fact that owned ranges
are both sorted and non-wrapping, so an incremental, iterator-oriented
checker is introduced to reduce the complexity from O(N * log(M)) to
O(N + M), i.e. effectively O(N).
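A minimal sketch of the incremental checker (types simplified; tokens must be fed in non-decreasing order):

    #include <cstdint>
    #include <utility>
    #include <vector>

    using token = int64_t;
    using token_range = std::pair<token, token>; // (start, end], simplified

    // Both the table's tokens and the owned ranges are sorted, and the ranges
    // are non-wrapping, so one forward pass over both replaces a binary
    // search per token: O(N + M) instead of O(N * log(M)).
    class incremental_owned_ranges_checker {
        const std::vector<token_range>& _ranges; // sorted, non-overlapping
        size_t _cur = 0;
    public:
        explicit incremental_owned_ranges_checker(const std::vector<token_range>& r)
            : _ranges(r) {}
        bool belongs_to_owned_ranges(token t) {
            // Ranges ending before t are never needed again.
            while (_cur < _ranges.size() && _ranges[_cur].second < t) {
                ++_cur;
            }
            return _cur < _ranges.size() && _ranges[_cur].first < t;
        }
    };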
BEFORE
240MB to 237MB (~98% of original) in 3239ms = 73MB/s. ~950016 total partitions merged to 949943.
719MB to 719MB (~99% of original) in 9649ms = 74MB/s. ~2900608 total partitions merged to 2900576.
1GB to 1GB (~100% of original) in 15231ms = 74MB/s. ~4536960 total partitions merged to 4536852.
1GB to 1GB (~100% of original) in 15244ms = 74MB/s. ~4536960 total partitions merged to 4536840.
1GB to 1GB (~100% of original) in 15263ms = 74MB/s. ~4536832 total partitions merged to 4536783.
1GB to 1GB (~100% of original) in 15216ms = 74MB/s. ~4536832 total partitions merged to 4536812.
AFTER
240MB to 237MB (~98% of original) in 3169ms = 74MB/s. ~950016 total partitions merged to 949943.
719MB to 719MB (~99% of original) in 9444ms = 76MB/s. ~2900608 total partitions merged to 2900576.
1GB to 1GB (~100% of original) in 14882ms = 76MB/s. ~4536960 total partitions merged to 4536852.
1GB to 1GB (~100% of original) in 14918ms = 76MB/s. ~4536960 total partitions merged to 4536840.
1GB to 1GB (~100% of original) in 14919ms = 76MB/s. ~4536832 total partitions merged to 4536783.
1GB to 1GB (~100% of original) in 14894ms = 76MB/s. ~4536832 total partitions merged to 4536812.
Fixes #6807.
test: mode(dev).
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210802213159.182393-1-raphaelsc@scylladb.com>
Finding clustering ranges has been rewritten to use the new
expression variant.
Old bounds_ranges() and other similar ones are no longer needed.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Removes the old code used to calculate the local-index clustering range
and replaces it with new code based on the expression variant.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Not all calculate_natural_endpoints implementations respect the
can_yield flag - for example, everywhere_replication_strategy.
This patch adds a yield at the call site to fix stalls we saw in
do_get_ranges.
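A sketch of the approach (surrounding code hypothetical; `seastar::thread::maybe_yield` is the real API):

    #include <seastar/core/thread.hh>
    #include <vector>

    struct range {};              // stand-in for a token range
    void process(const range& r); // e.g. calls calculate_natural_endpoints

    // Yield at the call site instead of relying on every strategy to yield
    // internally: between iterations, let the reactor preempt the loop.
    void walk_ranges(const std::vector<range>& ranges, bool can_yield) {
        for (const auto& r : ranges) {
            process(r);
            if (can_yield) {
                seastar::thread::maybe_yield();
            }
        }
    }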
Fixes #8943
Closes #9139