scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 12:47:02 +00:00

Author	SHA1	Message	Date
Botond Dénes	7c95bd3343	Merge 'Rename 'system.status' and 'system.describe_ring' virtual tables' from Avi Kivity 'system.status' and 'system.describe_ring' are imperfect names for what they do, so rename them. Fortunately they aren't exposed in any released version so there is no compatibility concern. Closes #9530 * github.com:scylladb/scylla: system_keyspace: rename 'system.describe_ring' to 'system.token_ring' system_keyspace: rename 'system.status' to 'system.cluster_status'	2021-10-28 11:46:20 +03:00
Takuya ASADA	13ffe3c094	scylla_util.py: detect ephemeral/EBS disks correctly on Nitro System Currently, aws_instance.ephemeral_disks() returns both ephemeral disks and EBS disks on Nitro System. This is because both are attached as NVMe disks, so we need to add disk type detection to the NVMe handling logic. Fixes #9440 Closes #9462	2021-10-28 08:58:25 +03:00
Piotr Sarna	f4cb8191fa	cql3: include system distributed tables in system stats Some time ago we started gathering stats for system tables in a separate class in order to be able to distinguish which queries come from the user - e.g. if the unpaged queries are internal or not. Originally, only local system tables were moved into this class, i.e. system and system_schema. It would make sense, however, to also include other internal keyspaces in this separate class - which includes system_distributed, system_traces, etc. Fixes #9380 Closes #9490	2021-10-28 08:58:25 +03:00
Avi Kivity	5e6e4aed53	Merge 'Add Scylla Sphinx Theme 1.0' from David Garcia Replaces https://github.com/scylladb/scylla/pull/9477 Related issue https://github.com/scylladb/sphinx-scylladb-theme/issues/133 Sphinx ScyllaDB Theme 1.0 is now released 🥳 We’ve made a number of updates to the look and feel of the theme to improve the overall user experience. You can read more about all notable changes [here](https://sphinx-theme.scylladb.com/stable/CHANGELOG#september-2021). This PR also cleans up the file ``conf.py``, removing several unused options. 1. Clone this PR. For more information, see [Cloning pull requests locally](https://docs.github.com/en/github/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally). 2. Enter the docs folder, and run: ``` make preview ``` 3. Open http://127.0.0.1:5500/ with your favorite browser. You will see the docs with the new look and feel. Closes #9515 * github.com:scylladb/scylla: Review docs config fix runtime errors upgrade theme to v1.x	2021-10-28 08:58:25 +03:00
Raphael S. Carvalho	affa1d9b04	utils/estimated_histogram.hh: fix division-by-zero in mean() if mean() is called when there are no elements in the histogram, a runtime error will happen due to division-by-zero. approx_exponential_histogram::mean() handles it but for some reason we forgot to do the same for estimated_histogram. this problem was found when adding a unit test which calls mean() in an empty histogram. Fixes #9531. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211027142813.56969-1-raphaelsc@scylladb.com>	2021-10-28 08:58:25 +03:00
Benny Halevy	b79e9b7396	tools: scylla-sstable: improve error reporting when loading schema from file Throw a proper exception from do_load_schemas if parse_statements fails to parse the schema cql. Catch it in scylla-sstable main() function so it won't be reported as seastar - unhandled exception. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211027124032.1787347-1-bhalevy@scylladb.com>	2021-10-28 08:58:25 +03:00
Avi Kivity	5ea0940ca9	system_keyspace: rename 'system.describe_ring' to 'system.token_ring' Table names are usually nouns, so SELECT/INSERT statements sound natural: "SELECT * FROM pets". 'system.describe_ring' defies this convention. Rename it to 'system.token_ring' so selects are natural. The name is not in any released version, so we can safely rename it.	2021-10-27 17:32:37 +03:00
Avi Kivity	5b21e4eb83	system_keyspace: rename 'system.status' to 'system.cluster_status' 'system.status' is too generic, it doesn't explain the status of what. 'system.node_status' is also ambiguous (this node? all nodes?) so I picked 'system.cluster_status'. The internal name, nodetool_status_table, was even worse (we're not querying the status of nodetool!) but fortunately wasn't exposed. The name is not in any released version, so we can safely rename it.	2021-10-27 17:31:45 +03:00
Botond Dénes	9ec55e054d	treewide: distinguish truncated frame errors We have two identical "Truncated frame" errors, at: * read_frame_size() in serialization_visitors.hh; * cql_server::connection::read_and_decompress_frame() in transport/server.cc; When such an exception is thrown, it is impossible to tell where it was thrown from and it doesn't have any further information contained in it (beyond the basic information it being thrown implies). This patch solves both problems: it makes the exception messages unique per location and it adds information about why it was thrown (the expected vs. real size of the frame). Ref: #9482 Closes #9520	2021-10-27 12:27:16 +02:00
Alejo Sanchez	0a63e72fa4	api: (minor) fix typo bool instead of boolean In definition for /column_family/major_compaction/{name} there is an incorrect use of "bool" instead of "boolean". Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #9516	2021-10-27 12:25:59 +02:00
Benny Halevy	a21b1fbb2f	large_data_handle: add sstable name to log messages Although the sstable name is part of the system.large_* records, it is not printed in the log. In particular, this is essential for the "too many rows" warning that currently does not record a row in any large_* table so we can't correlate it with a sstable. Fixes #9524 Test: unit(dev) DTest: wide_rows_test.py Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211027074104.1753093-1-bhalevy@scylladb.com>	2021-10-27 10:53:11 +03:00
Botond Dénes	6a76e12768	mutation_partition: row: make row marker shadowing symmetric Currently row marker shadowing the shadowable tombstone is only checked in `apply(row_marker)`. This means that shadowing will only be checked if the shadowable tombstone and row marker are set in the correct order. This at the very least can cause flakiness in tests when a mutation produced just the right way has a shadowable tombstone that can be eliminated when the mutation is reconstructed in a different way, leading to artificial differences when comparing those mutations. This patch fixes this by checking shadowing in `apply(shadowable_tombstone)` too, making the shadowing check symmetric. There is still one vulnerability left: `row_marker& row_marker()`, which allows overwriting the marker without triggering the corresponding checks. We cannot remove this overload as it is used by compaction so we just add a comment to it warning that `maybe_shadow()` has to be manually invoked if it is used to mutate the marker (compaction takes care of that). A caller which didn't do the manual check is mutation_source_test: this patch updates it to use `apply(row_marker)` instead. Fixes: #9483 Tests: unit(dev) Closes #9519	2021-10-26 20:40:31 +02:00
Benny Halevy	5f513ed28b	view_builder: consumer: flush_fragments: close reader on error Make sure to close the reader created by flush_fragments if an exception occurs before it's moved to `populate_views`. Note that it is also ok to close the reader _after_ it has been moved, in case populate_views itself throws after closing the reader that was moved into it. For convenience flat_mutation_reader::close supports close-after-move. Fixes #9479 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211024164138.1100304-1-bhalevy@scylladb.com>	2021-10-24 19:53:31 +03:00
Benny Halevy	4062cd17e0	test: hashers_test: mutation_fragment_sanity_check: stop semaphore To stop the semaphore as required we need to run the test in a seastar thread. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211024053402.990142-1-bhalevy@scylladb.com>	2021-10-24 11:29:23 +03:00
David Garcia	ff56b7e43e	Review docs config	2021-10-22 13:34:56 +01:00
Michał Radwański	9caf85f64a	partition_snapshot_reader: do not accidentally copy schema Functions `upper_bound` and `lower_bound` had signatures: ``` template<typename T, typename... Args> static rows_iter_type lower_bound(const T& t, Args... args); ``` This caused a decay from `const schema&` to `schema` as one of the args, which in turn copied the schema in a fair number of the queries. Fix that by setting the parameter type to `Args&&`, which doesn't discard the reference. Fixes #9502 Closes #9507	2021-10-20 19:09:08 +03:00
Avi Kivity	a9951588b4	Update seastar submodule * seastar 994b4b5a0c...083898a172 (24): > Revert "memory: always allocate buf using "malloc" for non reactor" > Revert dpdk update to 21.08. > tutorial: Fix typos > queue: add back template requirement for element type to be nothrow move-constructible > Revert "queue: require element type to be nothrow move-constructible" > build: add the closing "-Wl,--no-whole-archive" to the ldflags > build: add -Wno-error=volatile to CXX_FLAGS > build: Include dpdk as a single object in libseastar.a > Merge: queue: cleanup exception handling > build: drop dpdk-specific machine architecture names > reactor: call memory::configure() before initialize dpdk > core/loop: parallel_for_each(): make entire function critical alloc section > Merge 'scheduling groups: Add compile parameter for setting max scheduling groups count at compile time' from Eliran Sinvani > test: coroutines_test: assign spinner lambda to local variable > shared_ptr: mark shared_from_this functions noexcept > lw_shared_ptr: mark shared_from_this functions noexcept > build: update download URL for Boost > Merge "build: build with dpdk v21.08" from Kefu > cpu_stall_detector: handle wraparounds in Linux perf_event ring buffer > entry_point.cc: default-initialize sigaction struct > reactor: s/gettid()/syscall(SYS_gettid)/ > memory: always allocate buf using "malloc" for non reactor > Revert "memory: always allocate buf using "malloc" for non reactor" > memory: always allocate buf using "malloc" for non reactor	2021-10-20 18:38:18 +03:00
Benny Halevy	0746b5add6	storage_service: replicate_to_all_cores: update all keyspaces Currently we update the effective_replication_map only on non-system keyspace, leaving the system keyspace, that uses the local replication strategy, with the empty replication_map, as it was first initialized. This may lead to a crash when get_ranges is called later as seen in #9494 where get_ranges was called from the perform_sstable_upgrade path. This change updates the effective_replication_map on all keyspaces rather than just on the non-system ones and adds a unit test that reproduces #9494 without the fix and passes with it. Fixes #9494 Test: unit(dev), database_test(debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211020143217.243949-1-bhalevy@scylladb.com>	2021-10-20 17:54:23 +03:00
Calle Wilund	940058d25a	transport::server: Handle nested exceptions in cql execution/query Fixes #9491 CQL server, when encountering a "general" exception (i.e. not thrown by cql error checks), reports a wire error with simply the what() part of exception. However, if we have nested exceptions, we will most likely lose info here (hello encryption). General exception case should unwind exception and give back full, concatenated message to avoid confusion. Closes #9492	2021-10-20 17:54:17 +03:00
Nadav Har'El	e4a6569258	config: experimental flag UNUSED_CDC shouldn't be distinct from UNUSED When an experimental feature graduates from being experimental, we want to continue to allow the old "--experimental-features=..." option to work, in case some user's configuration uses it - just do nothing. The way we do it is to map in db::experimental_features_t::map() the feature's name to the UNUSED value - this way the feature's name is accepted, but doesn't change anything. When the CDC feature graduated from being experimental, a new bit UNUSED_CDC was introduced to do the same thing. This separate bit was not actually necessary - if we ever check for UNUSED_CDC bit anywhere in the code it means the flag isn't actually unused ;-) And we don't check it. So simplify the code by conflating UNUSED_CDC into UNUSED. This will also make it easy to build from db::experimental_features_t::map() a list of current experimental features - now it will simply be those that do not map to UNUSED. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211013105107.123544-1-nyh@scylladb.com>	2021-10-20 17:54:17 +03:00
Nadav Har'El	88afcc7fe3	Merge 'cql-pytest: Forbid deletions based on secondary index' from Piotr Sarna This series fixes a bug which allowed using a secondary index in a restriction for a DELETE statement, which resulted in generating incorrect slices and deleting the whole partition instead. Secondary indexes are not meant to be used for deletes, which this series enforces by marking the indexes as not queriable. It also comes with a reproducing test case, originally provided by @fee-mendes (thanks!). Fixes #9495 Tests: unit(release) Closes #9496 * github.com:scylladb/scylla: cql-pytest: add reproducer for deleting based on secondary index cql3: forbid querying indexes for deletions	2021-10-20 17:54:17 +03:00
Botond Dénes	995a41d422	test/perf/perf_sstable: add support for compaction strategies So the compaction perf of different compaction strategies can be compared. Data timestamps are diversified such that they fall into four different buckets if TWCS is used, in order to be able to stress the timestamp based splitting code path. Closes #9488	2021-10-20 17:54:17 +03:00
Benny Halevy	dc091fc952	effective_replication_map, abstract_replication_strategy: get_ranges: call on_internal_error in empty sorted_tokens case Accessing tm.sorted_tokens().back() causes undefined behavior if tm.sorted_tokens is empty. Check that first and throw/abort using on_internal_error in this case. This will prevent the segfault but it doesn't fix the root cause which is getting here with empty token_metadata. That will be fixed by the following patch. Refs #9494 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211019075710.1626808-1-bhalevy@scylladb.com>	2021-10-19 18:52:59 +03:00
Piotr Sarna	7c35d47690	cql3: make column names readable for invalid delete statements This commit makes the column names from an invalid delete statement human readable. Before that, they were printed in their hex representation, which is not convenient for debugging. Before: InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid where clause contains non PRIMARY KEY columns: 76616c" After: InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid where clause contains non PRIMARY KEY columns: val" Message-Id: <52923335e8837295fd5ba2dfd0921196e21f7f16.1634626777.git.sarna@scylladb.com>	2021-10-19 10:13:43 +03:00
Piotr Sarna	83722b5563	cql-pytest: add reproducer for deleting based on secondary index This commit adds a test case for a bug reported by Felipe <felipemendes@scylladb.com>. The bug involves trying to delete an entry from a partition based on a secondary index created on a column which is part of the compound clustering key, and the unfortunate result is that the whole partition gets wiped. Cassandra's behavior is in this case correct - deletion based on a secondary index column is not allowed. Refs #9495	2021-10-19 08:50:20 +02:00
Piotr Sarna	7e3649202e	cql3: forbid querying indexes for deletions Using secondary indexes for the purpose of a DELETE statement was never expected to be well-defined, but an edge case in #9495 showed that the index may sometimes be inadvertently used, which causes the whole partition to be deleted. In order to prevent such errors, it's now explicitly defined that an index is not queriable if it's going to be used for the purpose of a DELETE statement.	2021-10-19 08:49:58 +02:00
Raphael S. Carvalho	4271c4edcd	sstables: Fix metric currently_open_for_writing metric currently_open_for_writing, used to inform # of sstables opened for writing, holds the same value as total_open_for_writing. that means we aren't actually decreasing the counter, so it is bogus. Moved to sstable_writer, because sstable is used by writer to open files, which are then extracted from sstable object, and later the same object is reused for read-only mode. Fixes #9455. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211013134812.177398-1-raphaelsc@scylladb.com>	2021-10-18 18:29:33 +03:00
Kamil Braun	22061831c1	Merge 'cql3: keyspace prepare_options: expand replication_factor also for fully qualified NetworkTopologyStrategy' from Benny Halevy It was auto-expanded only if the strategy name was the short "NetworkTopologyStrategy" name. Fixes #9302. Closes #9304. * 'prepare_options' of https://github.com/bhalevy/scylla: cql3: keyspace prepare_options: expand replication_factor also for fully qualified NetworkTopologyStrategy abstract_replication_strategy: add to_qualified_class_name	2021-10-18 16:40:57 +03:00
Raphael S. Carvalho	ec1a55ffae	compaction/TWCS: reduce write amp for reshape of sstables spanning multiple windows TWCS can reshape at most 32 sstables spanning multiple windows, in a single compaction round. Which sstables are compacted together, when there are more than 32 sstables, is random. If sstables with overlapping windows are compacted together, then write amplification can be reduced because we may be able to push all the data to a window W in a single compaction round, so we'll not have to perform another compaction round later in W, to reduce its number of files. This is also very good to reduce the amount of transient file descriptors opened, because TWCS reshape first reshapes all sstables spanning multiple windows, so if all windows temporarily grow large in number of files, then there's a risk that file descriptors can be exhausted. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211013203046.233540-3-raphaelsc@scylladb.com>	2021-10-18 16:40:57 +03:00
Raphael S. Carvalho	062436829c	compaction/TWCS: optimize reshape for disjoint sstables spanning multiple windows After `a4053dbb72`, data segregation is postponed to offstrategy, so reshape procedure is called with disjoint sstables which belong to different windows, so let's extend the optimization for disjoint sstables which span more than one window. In this way, write amplification is reduced for offstrategy compaction, as all disjoint sstables will be compacted at once. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211013203046.233540-2-raphaelsc@scylladb.com>	2021-10-18 16:40:57 +03:00
Raphael S. Carvalho	aa4aba40aa	sstables: sstable_run: introduce estimate_droppable_tombstone_ratio Make it possible to estimate droppable tombstones for sstable runs. The result is averaged by number of fragments composing the run. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211014143424.353357-1-raphaelsc@scylladb.com>	2021-10-18 12:24:08 +03:00
Benny Halevy	b9aa92edd4	cql3: keyspace prepare_options: expand replication_factor also for fully qualified NetworkTopologyStrategy It was auto-expanded only if the strategy name was the short "NetworkTopologyStrategy" name. Fixes #9302 Test: cql_query_test.test_rf_expand(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-18 12:18:07 +03:00
Benny Halevy	e4dc81ec04	abstract_replication_strategy: add to_qualified_class_name And use it from cql3 check_restricted_replication_strategy and keyspace_metadata ctor that defined their own `replication_class_strategy`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-18 12:13:25 +03:00
Piotr Sarna	4bfaa7d9fc	Merge 'Service levels: fix undefined behaviours' from Eliran Sinvani This mini series contains two fixes that are bundled together since the second one assumes that the first one exists (or it will not fix anything really...), the two problems were: 1. When certain operations are called on a service level controller which doesn't have its data accessor set, it can lead to a crash since some operations will still try to dereference the accessor pointer. 2. The cql environment test initialized the accessor with a sharded<system_distributed_data>&, however this sharded instance itself is not initialized (sharded::start wasn't called), so the same operations that were unsafe due to the null dereference will now crash when the accessor tries to access the uninitialized sharded instance. Closes #9468 * github.com:scylladb/scylla: CQL test environment: Fix bad initialization order Service Level Controller: Fix possible dereference of a null pointer	2021-10-18 08:53:53 +02:00
Nadav Har'El	1d751491a3	test/alternator: recognize when Scylla crashes Before this patch, if Scylla crashes during some test in test/alternator, all tests after it will fail because they can't connect to Scylla - and we can get a report on hundreds of failures without a clear sign of where the real problem was. This patch introduces an autouse fixture (i.e., a fixture automatically used by every test) which tries to run a do-nothing health-check request after each test. If this health-check request fails, we conclude that Scylla crashed and report the test in which this happened - and exit pytest instead of failing a hundred more tests. The failure report looks something like this: ``` ! _pytest.outcomes.Exit: Scylla appears to have crashed in test test_batch.py::test_batch_get_item ! ``` And the entire test run fails. These extra health checks are not free, but they come fairly close to being free: In my tests I measured less than 0.1 seconds slowdown of the entire test suite (which has 618 tests) caused by the extra health checks. Fixes #9489 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211017123222.217559-1-nyh@scylladb.com>	2021-10-17 20:45:30 +03:00
Nadav Har'El	86e8979ff2	test/alternator, test/cql-pytest: enable specific experimental features Issue #9467 deprecated the blanket "--experimental" option which we used to enable all experimental Scylla features for testing, and suggests that individual experimental features should be enabled instead. So this is what we do in this patch for the Scylla-running scripts in test/alternator and test/cql-pytest: We need to enable UDF for the CQL tests, and to enable Alternator Streams and Alternator TTL for the Alternator tests. Refs #9467 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211012110312.719654-2-nyh@scylladb.com>	2021-10-15 16:36:35 +03:00
Nadav Har'El	ddba510e64	config: add name for the experimental Alternator TTL feature Earlier we added experimental (and very incomplete) support for Alternator's TTL feature, but forgot to set a name for this experimental feature. As a result, this feature can be enabled only with the blanket "--experimental" option and not with a specific "--experimental-features=..." option. Since issue #9467 deprecated the blanket "--experimental" option and users are encouraged to only enable specific experimental features, it is important that we have a name for it. So the name chosen in this patch is "alternator-ttl". Eventually this feature might evolve beyond Alternator-only, but for now, I think it's a good name and we'll probably graduate the experimental Alternator TTL feature before supporting CQL, so it will be a new experimental feature anyway. Refs #9467. db/config.cc Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211012110312.719654-1-nyh@scylladb.com>	2021-10-15 16:36:23 +03:00
Avi Kivity	acfe0a3803	build: reinstate -Wunknown-attributes The warning was disabled during the migration to clang, but now it appears unnecessary (perhaps clang added support for the attributes it did not have then). It is valuable for detecting misspelled attributes, so enable it again. Closes #9480	2021-10-14 14:26:56 +03:00
Tomasz Grabiec	cc56a971e8	database, treewide: Introduce partition_slice::is_reversed() Cleanup, reduces noise. Message-Id: <20211014093001.81479-1-tgrabiec@scylladb.com>	2021-10-14 12:39:16 +03:00
Nadav Har'El	cad039421a	config: automate help-string listing experimental features The help string from the "--experimental-features" command-line option lists the available experimental features, to help a user who might want to enable them. But this help string was manually written, and has since drifted from reality: * Two of the listed "experimental" features, cdc and lwt, have actually graduated from being experimental long ago. Although technically a user may still use the words "cdc" and "lwt" in the "experimental-features" parameter, doing so is pointless, and worse: This text in the help string can mislead a user into thinking that these two features are still experimental - while they are not! * One experimental feature - alternator-ttl - is missing from this list. Instead of updating the help string text now - and needing to do this again and again in the future as we change experimental features - what this patch does is to construct the list of features automatically from the map of supported feature names - excluding any features which map to UNUSED. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211013122635.132582-1-nyh@scylladb.com>	2021-10-14 10:39:58 +03:00
Avi Kivity	4f3b8f38e2	Merge "Add effective_replication_map" from Benny " The current api design of abstract_replication_strategy provides a can_yield parameter to calls that may stall when traversing the token metadata in O(n^2) and even in O(n) for a large number of token ranges. But, to use this option the caller must run in a seastar thread. It can't be used if the caller runs a coroutine or plain async tasks. Rather than keep adding threads (e.g. in storage_service::load_and_stream or storage_service::describe_ring), the series offers an infrastructure change: precalculating the token->endpoints map once, using an async task, and keeping the results in an `effective_replication_map` object. The latter can be used for efficient and stall-free calls, like get_natural_endpoints, or get_ranges/get_primary_range, replacing their equivalents in abstract_replication_strategy, and dropping the public abstract_replication_strategy::calculate_natural_endpoints and its internal cached_endpoints map. Besides the performance benefits: 1. The current calls require running a thread to yield. Precalculating the map (using async task) allows us to use synchronous calls without stalling the reactor. 2. The replication maps can and should be shared between keyspaces that use the same replication strategy. (Will be sent as a follow-up to the series) The bigger benefits (courtesy of Avi Kivity) are laying the groundwork for: 1. atomic replication metadata - an operation can capture a replication map once, and then use consistent information from the map without worrying that it changes under its feet. We may even be able to s/inet_address/replica_ptr/ later. 2. establish boundaries on the use of replication information - by making a replication map not visible, and observing when its reference count drops to zero, we can tell when the new replication map is fully in use. 
When we start writing to a new node we'll be able to locate a point in time where all writes that were not aware of the new node were completed (this is the point where we should start streaming). Notes: * The get_natural_endpoints method that uses the effective_replication_map is still provided as an abstract_replication_strategy virtual method so that local_strategy can override it and provide natural endpoints for any search token, even in the absence of token_metadata, when called early-on, before token_metadata has been established. The effective_replication_map materializes the replication strategy over the given replication strategy options and token_metadata. Whenever either of those change for a keyspace, we make a new effective_replication_map and keep it in the keyspace for later use. Methods that depend on an ad-hoc token_metadata (e.g. during node operations like bootstrap or replace) are still provided by abstract_replication_strategy. TODO: - effective_replication_map registry - Move pending ranges from token_metadata to replication map - get rid of abstract_replication_strategy::get_range_addresses(token_metadata&) - calculate replication map and use it instead. 
Test: unit(dev, debug) Dtest: next-gating, bootstrap_test.py update_cluster_layout_tests.py alternator_tests.py -a 'dtest-full,!dtest-heavy' (release) " * tag 'effective_replication_strategy-v6' of github.com:bhalevy/scylla: (44 commits) effective_replication_map: add get_range_addresses abstract_replication_strategy: get rid of shared_token_metadata member and ctor param abstract_replication_strategy: recognized_options: pass const topology& abstract_replication_strategy: precalculate get_replication_factor for effective_replication_map token_metadata: get rid of now-unused sync methods abstract_replication_strategy: get rid of do_calculate_natural_endpoints abstract_replication_strategy: futurize get_address_ranges abstract_replication_strategy: futurize get_range_addresses abstract_replication_strategy: futurize get_ranges(inet_address ep, token_metadata_ptr) abstract_replication_strategy: move get_ranges and get_primary_ranges to effective_replication_map compaction_manager: pass owned_ranges via cleanup/upgrade options abstract_replication_strategy: get rid of cached_endpoints all replication strategies: get rid of do_get_natural_endpoints storage_proxy: use effective_replication_map token_metadata_ptr along with endpoints abstract_replication_strategy: move get_natural_endpoints_without_node_being_replaced to effective_replication_map storage_service: bootstrap: add log messages storage_service: get_mutable_token_metadata_ptr: always invalidate_cached_rings shared_token_metadata: set: check version monotonicity token_metadata: use static ring version token_metadata: get rid of copy constructor and assignment operator ...	2021-10-13 20:28:30 +03:00
Tomasz Grabiec	d8832b9fd8	Merge 'Memtable make reversing reader' from Michał Radwański Make a reader that reads from memtable in reverse order. This draft PR includes two commits, out of which only the second is relevant for review. Described in #9133. Refs #1413. Closes #9174 * github.com:scylladb/scylla: partition_snapshot_reader: pop_range_tombstone returns reference (instead of value) when possible. memtable: enable native reversing partition_snapshot_reader: reverse ck_range when needed by Reversing memtable, partition_snapshot_reader: read from partition in reverse partition_snapshot_reader: rows_position and rows_iter_type supporting reverse iteration partition_snapshot_reader: split responsibility of ck_range partition_snapshot_reader: separate _schema into _query_schema and _partition_schema query: reverse clustering_range test: cql_query_test: fix test_query_limit for reversed queries	2021-10-13 20:24:02 +03:00
Nadav Har'El	ee8dc6847c	scylla.yaml: refresh list of experimental features Our scylla.yaml contains a comment listing the available experimental features, supposedly helping a user who might want to enable them. I think the usefulness of this comment is dubious, but as long as we have one, let's at least make it accurate: * Two of the listed "experimental" features, cdc and lwt, have actually graduated from being experimental long ago. Although technically a user may still use the words "cdc" and "lwt" in the "experimental-features" list, doing so is pointless, and worse: This comment suggests that these two features are still experimental - while they are not! * One experimental feature - alternator-ttl - is missing from this list. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211013083247.13223-1-nyh@scylladb.com>	2021-10-13 20:24:02 +03:00
Benny Halevy	17296cba4b	effective_replication_map: add get_range_addresses Equivalent to abstract_replication_strategy get_range_addresses, yet synchronous, as it uses the precalculated map. Call it from storage_service::get_new_source_ranges and range_streamer::get_all_ranges_with_sources_for. Consequently, get_new_source_ranges and removenode_add_ranges can become synchronous too. Unfortunately we can't entirely get rid of abstract_replication_strategy::get_range_addresses as it's still needed by range_streamer::get_all_ranges_with_strict_sources_for. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 16:10:06 +03:00
Benny Halevy	8c85197c6c	abstract_replication_strategy: get rid of shared_token_metadata member and ctor param It is not used any more. Methods either use the token_metadata_ptr in the effective_replication_map, or receive an ad-hoc token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 16:10:06 +03:00
Benny Halevy	91f2fd5f2c	abstract_replication_strategy: recognized_options: pass const topology& Prepare for deleting the _shared_token_metadata member. All we need for recognized_options is the topology (for network_topology_strategy). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 16:10:06 +03:00
Benny Halevy	4d2561ff75	abstract_replication_strategy: precalculate get_replication_factor for effective_replication_map Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 16:10:06 +03:00
Benny Halevy	d953e7b01a	token_metadata: get rid of now-unused sync methods Now that abstract_replication_strategy methods are all async, clone_only_token_map_sync and update_normal_tokens_sync are unused. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 16:10:06 +03:00
Benny Halevy	bdce6f93ca	abstract_replication_strategy: get rid of do_calculate_natural_endpoints It is no longer in use. And with it, the virtual calculate_natural_endpoint_sync method of which it was the only caller. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 16:10:06 +03:00
Benny Halevy	cbe58345b9	abstract_replication_strategy: futurize get_*address_ranges Remaining callers of get_address_ranges and get_pending_address_ranges are all either from a seastar thread or from a coroutine so we can make the methods always async and drop the can_yield param. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 16:10:06 +03:00


28695 Commits