scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-07 15:33:15 +00:00

Author	SHA1	Message	Date
Michael Livshin	d8cc535297	database_test: ensure host id continuity The "populate_from_quarantine_works" test case creates sstables with one db config, then reads them with another. Ensure that both configs have the same host id so the sstables pass validation. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-02-16 18:21:24 +02:00
Michael Livshin	3fef604075	sstables_manager: add get_local_host_id() method and support Since ME sstable format includes originating host id in stats metadata, local host id needs to be made available for writing and validation. Both Scylla server (where local host id comes from the `system.local` table) and unit tests (where it is fabricated) must be accommodated. Regardless of how the host id is obtained, it is stored in the db config instance and accessed through `sstables_manager`. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-02-16 18:21:24 +02:00
Michael Livshin	0895188851	sstables_manager: formalize inheritability The class is already inherited from in tests (along with overriding a non-virtual method), so this seems to be called for. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-02-16 18:21:24 +02:00
Michael Livshin	387c882dc7	sstable_3_x_test: test ME sstable format too Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-02-16 18:21:24 +02:00
Michael Livshin	0b1447c702	add "sstable_format" config Initialize it to "md" until ME format support is complete (i.e. storing originating host id in sstable stats metadata is implemented), so at present there is no observable change by default. Also declare "enable_sstables_md_format" unused -- the idea, going forward, being that only "sstable_format" controls the written sstable file format and that no more per-format enablement config options shall be added. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-02-16 18:21:24 +02:00
Michael Livshin	c96708d262	add support for the ME sstable format The ME format has been introduced in Cassandra 3.11.11: `11952fae77/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java (L123)` `d84c6e9810` It adds originating host id to sstable metadata in support of fixing loss of commit log data when moving sstables between nodes: https://issues.apache.org/jira/browse/CASSANDRA-16619 In Scylla: * The supported way to ingest sstables is via upload/, where stored commit log replay position should be disregarded (but see https://github.com/scylladb/scylla/issues/10080). * A later commit in this series implements originating host id validation for native ME sstables. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-02-16 18:21:24 +02:00
Benny Halevy	69fcc053bb	utils: uuid: add null_uuid and respective bool predicate and operator and unit test. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20220215113438.473400-1-bhalevy@scylladb.com>	2022-02-15 18:02:54 +02:00
Michael Livshin	04c1286a94	Add "me" sstables for the multi-format tests Prerequisite for the "ME sstable format support" series (which has been posted to the mailing list) -- to be merged or rejected together with that. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com> Closes #9939	2022-02-15 09:24:09 +02:00
Nadav Har'El	212c321c55	test/alternator: add reproducers for non-atomic table creation We add reproducing tests for two known Alternator issues, #6391 and #9868, which involve the non-atomicity of table creation. Creating a table currently involves multiple steps - creating a keyspace, a table, materialized views, and tags. If some of these steps succeed and some fail, we get an InternalServerError and potentially leave behind some half-built table. Both issues will be solved by making better use of the new Raft-based capabilities of making multiple modifications to the schema atomically, but this patch doesn't fix the problem - it just proves it exists. The new tests involve two threads - one repeatedly trying to create a table with a GSI or with tags - and the other thread repeatedly trying to delete the same table under its feet. Both bugs are reproduced almost immediately. Note that like all test/alternator tests, the new tests are usually run on just one node. So when we fix the bug and these tests begin to pass, it will not be a proof that concurrent schema modification works safely on different nodes. To prove that, we will also need a multi-node test. However, this test can prove that we used Raft-based schema modification correctly - and if we assume that the Raft-based schema modification feature is itself correct, then we can be sure that CreateTable will be correct also across multiple nodes. Although it won't hurt to check it directly. Refs #6391 Refs #9868 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220207223100.207074-1-nyh@scylladb.com>	2022-02-14 18:21:21 +02:00
Avi Kivity	7cc43f8aa8	Merge 'utils: add result_try and result_futurize_try' from Piotr Dulikowski Adds `utils::result_try` and `utils::result_futurize_try` - functions which allow to convert existing try..catch blocks into a version which handles C++ exceptions, failed results with exception containers and, depending on the function variant, exceptional futures using the same exception handling logic. For example, you can convert the following try..catch block: try { return a_function_that_may_throw(); } catch (const my_exception& ex) { return 123; } catch (...) { throw; } ...to this: return utils::result_try([&] { return a_function_that_may_throw_or_return_a_failed_result(); }, utils::result_catch<my_exception>([&] (const Ex&) { return 123; }), utils::result_catch_dots([&] (auto&& handle) { return handle.into_result(); }); Similarly, `utils::result_futurize_try` can be used to migrate `then_wrapped` or `f.handle_exception()` constructs. As an example of the usability of the new constructs, two places in the current code which need to simultaneously handle exceptions and failed results are converted to use `result_try` and `result_futurize_try`. Results of `perf_simple_query --smp 1 --operations-per-shard 1000000 --write`: ``` 127041.61 tps ( 67.2 allocs/op, 14.2 tasks/op, 52422 insns/op) 126958.60 tps ( 67.2 allocs/op, 14.2 tasks/op, 52409 insns/op) 127088.37 tps ( 67.2 allocs/op, 14.2 tasks/op, 52411 insns/op) 127560.84 tps ( 67.2 allocs/op, 14.2 tasks/op, 52424 insns/op) 127826.61 tps ( 67.2 allocs/op, 14.2 tasks/op, 52406 insns/op) 126801.02 tps ( 67.2 allocs/op, 14.2 tasks/op, 52420 insns/op) 125371.51 tps ( 67.2 allocs/op, 14.2 tasks/op, 52425 insns/op) 126498.51 tps ( 67.2 allocs/op, 14.2 tasks/op, 52427 insns/op) 126359.41 tps ( 67.2 allocs/op, 14.2 tasks/op, 52423 insns/op) 126298.27 tps ( 67.2 allocs/op, 14.2 tasks/op, 52410 insns/op) ``` The number of tasks and allocations is unchanged. 
The number of instructions per operation seems similar; it may have increased slightly (by 10-20), but it's hard to tell for sure because of the noisiness of the results. Tests: unit(dev) Closes #10045 * github.com:scylladb/scylla: transport: use result_try in process_request_one storage_proxy: use result_futurize_try in mutate_end storage_proxy: temporarily throw exception from result in mutate_end utils: add result_try and result_futurize_try	2022-02-13 19:38:13 +02:00
Avi Kivity	6572b297a2	treewide: clean up stray license blurbs After the mechanical change in `fcb8d040e8` ("treewide: use Software Package Data Exchange (SPDX) license identifiers"), a few stray license blurbs or fragments thereof remain. In two cases these were extra blurbs in code generators intended for the generated code, in others they were just missed by the script. Clean them up, adding an SPDX license identifier where needed. Closes #10072	2022-02-13 14:16:16 +02:00
Avi Kivity	6b380121e0	Merge 'utils/result: optimize result_parallel_for_each' from Piotr Dulikowski This PR rewrites the `utils::result_parallel_for_each`'s implementation to resemble the original `seastar::parallel_for_each` more closely instead of using the less efficient `seastar::map_reduce`. It uses fewer tasks and allocations now, as demonstrated in the results from the `perf_result_query` benchmark, attached at the end of the cover letter. The main drawback of the new implementation is that it needs to rethrow exceptions propagated as exceptional futures from the parallel sub-invocations. Contrary to the original `seastar::parallel_for_each` which uses a custom task to collect results, the new `utils::result_parallel_for_each` uses a coroutine and there doesn't currently seem to be a way to co_await for a future and inspect its state without either rethrowing or handling it in then_wrapped (which allocates a continuation). Fortunately, rethrowing is not needed for exceptions returned in failed result<>, which are already intended to be a more performant alternative to regular exceptions. As a bonus, definitions from `utils/result.hh` are now split across three different headers in order to improve (re)compilation times. Results from `perf_simple_query --smp 1 --operations-per-shard 1000000 --write` (before vs. 
after): ``` 126872.54 tps ( 67.2 allocs/op, 14.2 tasks/op, 52404 insns/op) 126532.13 tps ( 67.2 allocs/op, 14.2 tasks/op, 52408 insns/op) 126864.99 tps ( 67.2 allocs/op, 14.2 tasks/op, 52428 insns/op) 127073.10 tps ( 67.2 allocs/op, 14.2 tasks/op, 52404 insns/op) 126895.85 tps ( 67.2 allocs/op, 14.2 tasks/op, 52411 insns/op) 127894.02 tps ( 66.2 allocs/op, 13.2 tasks/op, 52036 insns/op) 127671.51 tps ( 66.2 allocs/op, 13.2 tasks/op, 52042 insns/op) 127541.42 tps ( 66.2 allocs/op, 13.2 tasks/op, 52044 insns/op) 127409.10 tps ( 66.2 allocs/op, 13.2 tasks/op, 52052 insns/op) 127831.30 tps ( 66.2 allocs/op, 13.2 tasks/op, 52043 insns/op) ``` Test: unit(dev, debug) Closes #10053 * github.com:scylladb/scylla: utils/result: optimize result_parallel_for_each utils/result: split into `combinators` and `loop` file	2022-02-13 12:04:40 +02:00
Piotr Dulikowski	dd3284ec38	utils/result: optimize result_parallel_for_each It now resembles the original parallel_for_each more, but uses a coroutine instead of a custom `task` to collect not-ready futures. Although the usage of a coroutine saves on allocations, the drawback is that there is currently no way to co_await on a future and handle its exception without throwing or without unconditionally allocating a then_wrapped or handle_exception continuation - so it introduces a rethrow. Furthermore, now failed results and exceptions are treated as equals. Previously, in case one parallel invocation returned failed result and another returned an exception, the exception would always be returned. Now, the failed result/exception of the invocation with the lowest index is always preferred, regardless of the failure type. The reimplementation manages to save about 350-400 instructions, one task and one allocation in the perf_simple_query benchmark in write mode. Results from `perf_simple_query --smp 1 --operations-per-shard 1000000 --write` (before vs. after): ``` 126872.54 tps ( 67.2 allocs/op, 14.2 tasks/op, 52404 insns/op) 126532.13 tps ( 67.2 allocs/op, 14.2 tasks/op, 52408 insns/op) 126864.99 tps ( 67.2 allocs/op, 14.2 tasks/op, 52428 insns/op) 127073.10 tps ( 67.2 allocs/op, 14.2 tasks/op, 52404 insns/op) 126895.85 tps ( 67.2 allocs/op, 14.2 tasks/op, 52411 insns/op) 127894.02 tps ( 66.2 allocs/op, 13.2 tasks/op, 52036 insns/op) 127671.51 tps ( 66.2 allocs/op, 13.2 tasks/op, 52042 insns/op) 127541.42 tps ( 66.2 allocs/op, 13.2 tasks/op, 52044 insns/op) 127409.10 tps ( 66.2 allocs/op, 13.2 tasks/op, 52052 insns/op) 127831.30 tps ( 66.2 allocs/op, 13.2 tasks/op, 52043 insns/op) ``` Test: unit(dev), unit(result_utils_test, debug)	2022-02-10 18:19:08 +01:00
Piotr Dulikowski	6abeec6299	utils/result: split into `combinators` and `loop` file Segregates result utilities into: - result.hh - basic definitions related to results with exception containers, - result_combinators.hh - combinators for working with results in conjunction with futures, - result_loop.hh - loop-like combinators, currently has only result_parallel_for_each. The motivation for the split is: 1. In headers, usually only result.hh will be needed, so no need to force most .cc files to compile definitions from other files, 2. Fewer files need to be recompiled when a combinator is added to result_combinators or result_loop. As a bonus, `result_with_exception` was moved from `utils::internal` to just `utils`.	2022-02-10 18:19:05 +01:00
Piotr Dulikowski	8d52ceca50	utils: add result_try and result_futurize_try Adds result_try and result_futurize_try - functions which allow converting existing try..catch blocks into a version which handles C++ exceptions, failed results with exception containers and, depending on the function variant, exceptional futures.	2022-02-10 17:35:32 +01:00
Nadav Har'El	4937270803	test/alternator: add option to run with Raft-based schema changes This patch adds a "--raft" option to test/alternator/run to enable the experimental Raft-based schema changes ("--experimental-features=raft") when running Scylla for the tests. This is the same option we added to test/cql-pytest/run in a previous patch. Note that we still don't have any Alternator tests that pass or fail differently in these two modes - these will probably come later as we fix issues #9868 and #6391. But in order to work on fixing those issues we need to be able to run the tests in Raft mode. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220209123144.321344-1-nyh@scylladb.com>	2022-02-10 09:43:10 +02:00
Nadav Har'El	a1635b553e	cql-pytest: fix detection of "raft" experimental feature In a previous patch we fixed the output of experimental features list (issue #10047), so we also need to fix the test code which detects the "raft" experimental feature - to use the string "raft" and not the silly byte 4 we had there before. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220209104331.312999-1-nyh@scylladb.com>	2022-02-10 09:10:24 +03:00
Nadav Har'El	de586ef856	test/cql-pytest: mechanism for tests requiring raft-based schema updates Issue #8968 no longer exists when Raft-based schema updates are enabled in Scylla (with --experimental-features=raft). Before we can close this issue we need a way to re-run its test test_keyspace.py::test_concurrent_create_and_drop_keyspace with Raft and see it pass. But we also want the tests to continue to run by default the older raft-less schema updates - so that this mode doesn't regress during the potentially-long duration that it's still the default! The solution in this patch is: 1. Introduce a "--raft" option to test/cql-pytest/run, which runs the tests against a Scylla with the raft experimental feature, while the default is still to run without it. 2. Introduce a test fixture "fails_without_raft" which marks a test which is expected to fail with the old pre-raft code, but is expected to pass in the new code. 3. Mark the test test_concurrent_create_and_drop_keyspace with this new "fails_without_raft". After this patch, running test/cql-pytest/run --raft test_keyspace.py::test_concurrent_create_and_drop_keyspace Passes, which shows that issue 8968 was fixed (in Raft mode) - so we can say: Fixes #8968 Running the same test without "--raft" still xfails (an expected failure). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220208162732.260888-1-nyh@scylladb.com>	2022-02-10 09:10:24 +03:00
Nadav Har'El	fef7934a2d	config: fix some types in system.config virtual table The system.config virtual table prints each configuration variable of type T based on the JSON printer specified in the config_type_for<T> in db/config.cc. For two variable types - experimental_features and tri_mode_restriction, the specified converter was wrong: We used value_to_json<string> or value_to_json<vector<string>> on something which was not a string. Unfortunately, value_to_json silently cast the given objects into strings, and the result was garbage: For example as noted in #10047, for experimental_features instead of printing a list of feature names, e.g., "raft", we got a bizarre list of one-byte strings with each feature's number (which isn't documented or even guaranteed to not change) as well as carriage-return characters (!?). So the solution is a new printable_to_json<T> which works on a type T that can be printed with operator<< - as in fact the above two types can - and the type is converted into a string or vector of strings using this operator<<, not a cast. Also added a cql-pytest test for reading system.config and in particular options of the above two types - checking that they contain sensible strings and not "garbage" like before this patch. Fixes #10047. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220209090421.298849-1-nyh@scylladb.com>	2022-02-10 09:10:24 +03:00
Avi Kivity	5099b1e272	Merge 'Propagate coordinator timeouts for regular writes and batches without throwing' from Piotr Dulikowski Currently, most of the failures that occur during CQL reads or writes are reported using C++ exceptions. Although the seastar framework avoids most of the cost of unwinding by keeping exceptions in futures as `std::exception_ptr`s, the exceptions need to be inspected at various points for the purposes of accounting metrics or converting them to a CQL error response. Analyzing the value and type of an exception held by `std::exception_ptr`'s cannot be done without rethrowing the exception, and that can be very costly even if the exception is immediately caught. Because of that, exceptions are not a good fit for reporting failures which happen frequently during overload, especially if the CPU is the bottleneck. This PR introduces facilities for reporting exceptions as values using the boost::outcome library. As a first step, the need to use exceptions for reporting timeouts was eliminated for regular and batch writes, and no exceptions are thrown between creation of a `mutation_write_timeout_exception` and its serialization as a CQL response in the `cql_server`. The types and helpers introduced here can be reused in order to migrate more exceptions and exception paths in a similar fashion. Results of `perf_simple_query --smp 1 --operations-per-shard 1000000`: Master (`00a9326ae7`) 128789.53 tps ( 82.2 allocs/op, 12.2 tasks/op, 49245 insns/op) This PR 127072.93 tps ( 82.2 allocs/op, 12.2 tasks/op, 49356 insns/op) The new version seems to be slower by about 100 insns/op, fortunately not by much (about 0.2%). 
Tests: unit(dev), unit(result_utils_test, debug) Closes #10014 * github.com:scylladb/scylla: cql_test_env: optimize handling result_message::exception transport/server: handle exceptions from coordinator_result without throwing transport/server: propagate coordinator_result to the error handling code transport/server: unwrap the exception result_message in process_xyz_internal query_processor: add exception-returning variants of execute_ methods modification_statement: propagate failed result through result_message::exception batch_statement: propagate failed result through result_message::exception cql_statement: add `execute_without_checking_exception_message` result_message: add result_message::exception storage_proxy: change mutate_with_triggers to return future<result<>> storage_proxy: add mutate_atomically_result storage_proxy: return result<> from mutate_result storage_proxy: return result<> from mutate_internal storage_proxy: properly propagate future from mutate_begin to mutate_end storage_proxy: handle exceptions as values in mutate_end storage_proxy: let mutate_end take a future<result<>> storage_proxy: resultify mutate_begin storage_proxy: use result in the _ready future of write handlers storage_proxy: introduce helpers for dealing with results exceptions: add coordinator_exception_container and coordinator_result utils: add result utils utils: add exception_container	2022-02-08 14:27:09 +02:00
Piotr Dulikowski	ffd439d908	cql_test_env: optimize handling result_message::exception The single_node_cql_env uses query_processor::execute_xyz family of methods to perform operations. Due to previous commits in this series, they allocate one more task than before - a continuation that converts result_message::exception into an exceptional future. We can recover that one task by using variants of those methods which do not perform a conversion, and turn .finally() invocations into .then()s which perform conversion manually.	2022-02-08 11:08:42 +01:00
Piotr Dulikowski	e4ff22b4ca	result_message: add result_message::exception In order to propagate exceptions as values through the CQL layer with minimal modifications to the interfaces, a new result_message type is introduced: result_message::exception. Similarly to result_message::bounce_to_shard, this is an internal type which is supposed to be handled before being returned to the client.	2022-02-08 11:08:42 +01:00
Piotr Dulikowski	11cb670881	utils: add result utils Adds a number of utilities for working with boost::outcome::result combined with exception_container. The utilities are meant to help with migration of the existing code to use the boost::outcome::result: - `exception_container_throw_policy` - a NoValuePolicy meant to be used as a template parameter for the boost::outcome::result. It protects the caller of `result::value()` and `result::error()` methods - if the caller wishes to get a value but the result has an error (exception_container in our case), the exception in the container will be thrown instead. In case it's the other way around, boost::outcome::bad_result_access is thrown. - `result_parallel_for_each` - a version of `parallel_for_each` which is aware of results and returns a failed result in case any of the parallel invocations return a failed result. - `result_into_future` - converts a result into a future. If the result holds a value, converts it into make_ready_future; if it holds an exception, the exception is returned as make_exception_future. - `then_ok_result` takes a `future<T>` and converts it into a `future<result<T>>`. - `result_wrap` adapts a callable of type `T -> future<result<T>>` and returns a callable of type `result<T> -> future<result<T>>`.	2022-02-08 11:08:42 +01:00
Nadav Har'El	9982a28007	alternator: allow REMOVE of non-existent nested attribute DynamoDB allows an UpdateItem operation "REMOVE x.y" when a map x exists in the item, but x.y doesn't - the removal silently does nothing. Alternator incorrectly generated an error in this case, and unfortunately we didn't have a test for this case. So in this patch we add the missing test (which fails on Alternator before this patch - and passes on DynamoDB) and then fix the behavior. After this patch, "REMOVE x.y" will remain an error if "x" doesn't exist (saying "document paths not valid for this item"), but if "x" exists and is a map, but "x.y" doesn't, the removal will silently do nothing and will not be an error. Fixes #10043. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220207133652.181994-1-nyh@scylladb.com>	2022-02-07 18:40:48 +02:00
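The REMOVE behavior described in the commit above can be sketched in a few lines of Python, assuming an item is modelled as nested dicts. The names `remove_nested` and `ValidationError` are hypothetical and are not Alternator's actual code; only the error text comes from the commit message. Treating a non-map parent as an error is also an assumption of this sketch.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for the error reported to the client."""


def remove_nested(item, path):
    """Apply "REMOVE a.b.c" (given as ["a", "b", "c"]) to item in place.

    A missing parent attribute is an error; a missing final attribute
    under an existing map is a silent no-op.
    """
    current = item
    for name in path[:-1]:
        # Every parent on the path must exist and be a map.
        if not isinstance(current, dict) or name not in current:
            raise ValidationError("document paths not valid for this item")
        current = current[name]
    if not isinstance(current, dict):
        # Assumption: removing a member of a non-map value is rejected.
        raise ValidationError("document paths not valid for this item")
    # pop() with a default does nothing when the key is absent.
    current.pop(path[-1], None)
```

With `item = {"x": {"a": 1}}`, calling `remove_nested(item, ["x", "y"])` leaves the item unchanged, while `remove_nested({}, ["x", "y"])` raises.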
Nadav Har'El	203291f7ba	cql: reject a map literal with the same key twice The CQL parser currently accepts a command like: ALTER KEYSPACE ksname WITH replication = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 2, 'dc1' : 3 } But because these options are read into an std::map, one of the definitions of 'dc1' is silently ignored (counter-intuitively, it is the first setting which is kept, and the second setting is ignored.) But this is most likely a user's typo, so a better choice is to report this as a parse error instead of arbitrarily and silently keeping just one of the settings. This is what Cassandra does since version 3.11 (see https://issues.apache.org/jira/browse/CASSANDRA-13369 and Cassandra commit 1a83efe2047d0138725d5e102cc40774f3b14641), and this is what we do in this patch. The unit test cassandra_tests/validation/operations/alter_test.py:: testAlterKeyspaceWithMultipleInstancesOfSameDCThrowsSyntaxException, translated from Cassandra's unit tests, now passes. Fixes #10037. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220207113709.78613-1-nyh@scylladb.com>	2022-02-07 18:40:48 +02:00
Nadav Har'El	cc57ac8c1c	cql3: add a cql3::util::quote() function The function cql3::util::maybe_quote() is used throughout Scylla to convert identifier names (column names, table names, etc.) into strings that can be embedded in CQL commands. maybe_quote() sometimes needs to quote these identifier names, but when the identifier name is lowercase, and not a CQL keyword, it is not quoted. Not quoting identifier names when not needed is nice and pretty, but has a forward-compatibility problem: If some CQL command with an unquoted identifier is saved somewhere, and a new version of Scylla adds this identifier as a new reserved keyword - the CQL command will break. So this patch introduces a new function, cql3::util::quote(), which unconditionally quotes the given identifier. The new function is not yet used in Scylla, but we add a unit test (based on the test of maybe_quote()) to confirm it behaves correctly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220118161217.231811-2-nyh@scylladb.com>	2022-02-07 11:33:57 +02:00
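The difference between silently keeping one setting and rejecting the literal can be illustrated with a short Python sketch. It assumes the parser has already produced the map literal as an ordered list of key/value pairs; `build_options` and `SyntaxException` are hypothetical names, not the CQL grammar's actual code.

```python
class SyntaxException(Exception):
    """Hypothetical stand-in for a CQL parse error."""


def build_options(pairs):
    """Collect (key, value) pairs from a map literal, rejecting repeats."""
    options = {}
    for key, value in pairs:
        if key in options:
            # Report the likely typo instead of keeping one value silently.
            raise SyntaxException(f"multiple definitions for key {key!r}")
        options[key] = value
    return options
```

For the statement quoted in the commit message, the pairs would be `[("class", "NetworkTopologyStrategy"), ("dc1", 2), ("dc1", 3)]`, and the third pair triggers the error.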
Nadav Har'El	5d2f694a90	cql3: fix cql3::util::maybe_quote() for keywords cql3::util::maybe_quote() is a utility function formatting an identifier name (table name, column name, etc.) that needs to be embedded in a CQL statement - and might require quoting if it contains non-alphanumeric characters, uppercase characters, or a CQL keyword. maybe_quote() made an effort to only quote the identifier name if necessary, e.g., a lowercase name usually does not need quoting. But lowercase names that are CQL keywords - e.g., to or where - cannot be used as identifiers without quoting. This can cause problems for code that wants to generate CQL statements, such as the materialized-view problem in issue #9450 - where a user had a column called "to" and wanted to create a materialized view for it. So in this patch we fix maybe_quote() to recognize invalid identifiers by using the CQL parser, and quote them. This will quote reserved keywords, but not so-called unreserved keywords, which are allowed as identifiers and don't need quoting. This addition slows down maybe_quote(), but maybe_quote() is anyway only used in heavy operations which need to generate CQL. This patch also adds two tests that reproduce the bug and verify its fix: 1. Add to the low-level maybe_quote() test (a C++ unit test) also tests that maybe_quote() quotes reserved keywords like "to", but doesn't quote unreserved keywords like "int". 2. Add a test reproducing issue #9450 - creating a materialized view whose key column is a keyword. This new test passes on Cassandra, failed on Scylla before this patch, and passes after this patch. It is worth noting that maybe_quote() now has a "forward compatibility" problem: If we save CQL statements generated by maybe_quote(), and a future version introduces a new reserved keyword, the parser of the future version may not be able to parse the saved CQL statement that was generated with the old maybe_quote() and didn't quote what is now a keyword. 
This problem can be solved in two ways: 1. Try hard not to introduce new reserved keywords. Instead, introduce unreserved keywords. We've been doing this even before recognizing this maybe_quote() future-compatibility problem. 2. In the next patch we will introduce quote() - which unconditionally quotes identifier names, even if lowercase. These quoted names will be uglier for lowercase names - but will be safe from future introduction of new keywords. So we can consider switching some or all uses of maybe_quote() to quote(). Fixes #9450 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220118161217.231811-1-nyh@scylladb.com>	2022-02-07 11:33:56 +02:00
Nadav Har'El	b3cfd4ce07	cql-pytest: translate Cassandra's tests for ALTER operations This is a translation of Cassandra's CQL unit test source file validation/operations/AlterTest.java into our cql-pytest framework. This test file includes 24 tests for various types of ALTER operations (of keyspaces, tables and types). Two additional tests which required multiple data centers to test were dropped with a comment explaining why. All 24 tests pass on Cassandra, with 8 failing on Scylla reproducing one already known Scylla issue and 5 previously-unknown ones: Refs #8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression" for compression settings by default Refs #9929: Cassandra added "USING TIMESTAMP" to "ALTER TABLE", we didn't. Refs #9930: Forbid re-adding static columns as regular and vice versa Refs #9935: Scylla stores un-expanded compaction class name in system tables. Refs #10036: Reject empty options while altering a keyspace Refs #10037: If there are multiple values for a key, CQL silently chooses last value Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220206163820.1875410-2-nyh@scylladb.com>	2022-02-07 10:57:43 +02:00
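The trade-off between conditional and unconditional quoting can be sketched in Python. This is not the Scylla implementation: the commit says the real maybe_quote() consults the CQL parser, whereas this sketch swaps in a regular expression plus a hypothetical, deliberately tiny set of reserved keywords (`RESERVED`). The quoting rule itself (wrap in double quotes, double any embedded double quote) is standard CQL.

```python
import re

# Hypothetical subset of reserved keywords, for illustration only.
RESERVED = {"to", "where", "select", "from"}

# Identifiers that CQL accepts unquoted without changing their case.
_UNQUOTED = re.compile(r"[a-z][a-z0-9_]*")


def quote(identifier):
    """Always quote: safe even if the name later becomes a keyword."""
    return '"' + identifier.replace('"', '""') + '"'


def maybe_quote(identifier):
    """Quote only when the bare name would not parse as this identifier."""
    if _UNQUOTED.fullmatch(identifier) and identifier not in RESERVED:
        return identifier
    return quote(identifier)
```

A statement saved with `maybe_quote("ttl_col")` breaks if `ttl_col` ever joins the reserved set, while one saved with `quote("ttl_col")` keeps parsing, at the cost of noisier output for ordinary lowercase names.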
Nadav Har'El	b61876f4ff	test/cql-pytest: implement nodetool.compact() Implement the nodetool.compact() function, requesting a major compaction of the given table. As usual for the nodetool.* functions, this is implemented with the REST API if available (i.e., testing Scylla), or with the external "nodetool" command if not (for testing Cassandra). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220206163820.1875410-1-nyh@scylladb.com>	2022-02-07 10:57:42 +02:00
Konstantin Osipov	caeaba60f9	cql_repl: use POSIX primitives to reset input/output Seastar uses POSIX IO for output in addition to C++ iostreams, e.g. in print_safe(), where it write()s directly to stdout. Instead of manipulating C++ output streams to reset stdout/log files, reopen the underlying file descriptors to output/log files. Fixes #9962 "cql_repl prints junk into the log" Message-Id: <20220204205032.1313150-1-kostja@scylladb.com>	2022-02-07 10:53:20 +02:00
Piotr Dulikowski	80f6224959	utils: add exception_container Adds `exception_container` - a helper type used to hold exceptions as a value, without involving the std::exception_ptr. The motivation behind this type is that it allows inspecting exception's type and value without having to rethrow that exception and catch it, unlike std::exception_ptr. In our current codebase, some exception handling paths need to rethrow the exception multiple times in order to account for it in metrics or encode it as an error response to the CQL client. Some types of exceptions can be thrown very frequently in case of overload (e.g. timeouts) and inspecting those exceptions with rethrows can make the overload even worse. For those kinds of exceptions it is important to handle them as cheaply as possible, and exception_container used in conjunction with boost::outcome::result can help achieve that.	2022-02-04 20:18:00 +01:00
Avi Kivity	fe65122ccd	Merge 'Distribute `select count()` queries' from Michał Sala This pull request speeds up execution of `count()` queries. It does so by splitting given query into sub-queries and distributing them across some group of nodes for parallel execution. New level of coordination was added. Node called super-coordinator splits aggregation query into sub-queries and distributes them across some group of coordinators. Super-coordinator is also responsible for merging results. To develop a mechanism for speeding up `count()` queries, there was a need to detect which queries have a `count()` selector. Due to this pull request being a proof of concept, detection was realized rather poorly. It only allows catching the simplest cases of `count()` queries (with only one selector and no column name specified). After detecting that a query is a `count()` it should be split into sub-queries and sent to other coordinators. Splitting part wasn't that difficult, it has been achieved by limiting original query's partition ranges. Sending modified query to another node was much harder. The easiest scenario would be to send whole `cql3::statements::select_statement`. Unfortunately `cql3::statements::select_statement` can't be [de]serialized, so sending it was out of the question. Even more unfortunately, some non-[de]serializable members of `cql3::statements::select_statement` are required to start the execution process of this statement. Finally, I have decided to send a `query::read_command` paired with required [de]serializable members. Objects, that cannot be [de]serialized (such as query's selector) are mocked on the receiving end. When a super-coordinator receives a `count()` query, it splits it into sub-queries. It does so, by splitting original query's partition ranges into a list of vnodes, grouping them by their owner and creating sub-queries with partition ranges set to successive results of such grouping. 
After creation, each sub-query is sent to the owner of its partition ranges. Owner dispatches received sub-query to all of its shards. Shards slice partition ranges of the received sub-query, so that they will only query data that is owned by them. Each shard becomes a coordinator and executes so prepared sub-query. 3 node cluster set up on powerful desktops located in the office (3x32 cores) Filled the cluster with ~2 10^8 rows using scylla-bench and run: ``` time cqlsh <ip> <port> --request-timeout=3600 -e "select count() from scylla_bench.test using timeout 1h;" ``` master: 68s * this branch: 2s 3 node cluster (each node had 2 shards, `murmur3_ignore_msb_bits` was set to 1, `num_tokens` was set to 3) ``` > cqlsh -e 'tracing on; select count() from ks.t; Now Tracing is enabled count ------- 1000 (1 rows) Tracing session: e5852020-7fc3-11ec-8600-4c4c210dd657 activity \| timestamp \| source \| source_elapsed \| client ---------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2022-01-27 22:53:08.770000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 1] \| 2022-01-27 22:53:08.770451 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 1] \| 2022-01-27 22:53:08.770487 \| 127.0.0.1 \| 36 \| 127.0.0.1 Dispatching forward_request to 3 endpoints [shard 1] \| 2022-01-27 22:53:08.770509 \| 127.0.0.1 \| 58 \| 127.0.0.1 Sending forward_request to 127.0.0.1:0 [shard 1] \| 2022-01-27 22:53:08.770516 \| 127.0.0.1 \| 64 \| 127.0.0.1 Executing forward_request [shard 1] \| 2022-01-27 22:53:08.770519 \| 127.0.0.1 \| -- \| 127.0.0.1 read_data: querying locally [shard 1] \| 2022-01-27 22:53:08.770528 \| 127.0.0.1 \| 9 \| 127.0.0.1 Start querying token range ({-4242912715832118944, end}, {-4075408479358018994, end}] [shard 1] \| 2022-01-27 22:53:08.770531 \| 127.0.0.1 \| 12 \| 127.0.0.1 
Creating shard reader on shard: 1 [shard 1] \| 2022-01-27 22:53:08.770537 \| 127.0.0.1 \| 18 \| 127.0.0.1 Scanning cache for range ({-4242912715832118944, end}, {-4075408479358018994, end}] and slice {(-inf, +inf)} [shard 1] \| 2022-01-27 22:53:08.770541 \| 127.0.0.1 \| 22 \| 127.0.0.1 Page stats: 12 partition(s), 0 static row(s) (0 live, 0 dead), 12 clustering row(s) (12 live, 0 dead) and 0 range tombstone(s) [shard 1] \| 2022-01-27 22:53:08.770589 \| 127.0.0.1 \| 70 \| 127.0.0.1 Sending forward_request to 127.0.0.2:0 [shard 1] \| 2022-01-27 22:53:08.770600 \| 127.0.0.1 \| 149 \| 127.0.0.1 Sending forward_request to 127.0.0.3:0 [shard 1] \| 2022-01-27 22:53:08.770608 \| 127.0.0.1 \| 157 \| 127.0.0.1 Executing forward_request [shard 0] \| 2022-01-27 22:53:08.770627 \| 127.0.0.1 \| -- \| 127.0.0.1 read_data: querying locally [shard 0] \| 2022-01-27 22:53:08.770639 \| 127.0.0.1 \| 11 \| 127.0.0.1 Start querying token range ({2507462623645193091, end}, {3897266736829642805, end}] [shard 0] \| 2022-01-27 22:53:08.770643 \| 127.0.0.1 \| 15 \| 127.0.0.1 Creating shard reader on shard: 0 [shard 0] \| 2022-01-27 22:53:08.770646 \| 127.0.0.1 \| 19 \| 127.0.0.1 Scanning cache for range ({2507462623645193091, end}, {3897266736829642805, end}] and slice {(-inf, +inf)} [shard 0] \| 2022-01-27 22:53:08.770649 \| 127.0.0.1 \| 22 \| 127.0.0.1 Executing forward_request [shard 1] \| 2022-01-27 22:53:08.770658 \| 127.0.0.2 \| -- \| 127.0.0.1 Executing forward_request [shard 1] \| 2022-01-27 22:53:08.770674 \| 127.0.0.3 \| 5 \| 127.0.0.1 read_data: querying locally [shard 1] \| 2022-01-27 22:53:08.770698 \| 127.0.0.2 \| 40 \| 127.0.0.1 Start querying token range [{4611686018427387904, start}, {5592106830937975806, end}] [shard 1] \| 2022-01-27 22:53:08.770704 \| 127.0.0.2 \| 46 \| 127.0.0.1 Creating shard reader on shard: 1 [shard 1] \| 2022-01-27 22:53:08.770710 \| 127.0.0.2 \| 52 \| 127.0.0.1 read_data: querying locally [shard 1] \| 2022-01-27 22:53:08.770712 \| 127.0.0.3 \| 43 \| 
127.0.0.1 Scanning cache for range [{4611686018427387904, start}, {5592106830937975806, end}] and slice {(-inf, +inf)} [shard 1] \| 2022-01-27 22:53:08.770714 \| 127.0.0.2 \| 56 \| 127.0.0.1 Start querying token range [{-4611686018427387904, start}, {-4242912715832118944, end}] [shard 1] \| 2022-01-27 22:53:08.770718 \| 127.0.0.3 \| 49 \| 127.0.0.1 Creating shard reader on shard: 1 [shard 1] \| 2022-01-27 22:53:08.770739 \| 127.0.0.3 \| 70 \| 127.0.0.1 Scanning cache for range [{-4611686018427387904, start}, {-4242912715832118944, end}] and slice {(-inf, +inf)} [shard 1] \| 2022-01-27 22:53:08.770743 \| 127.0.0.3 \| 73 \| 127.0.0.1 Page stats: 17 partition(s), 0 static row(s) (0 live, 0 dead), 17 clustering row(s) (17 live, 0 dead) and 0 range tombstone(s) [shard 1] \| 2022-01-27 22:53:08.770814 \| 127.0.0.3 \| 145 \| 127.0.0.1 Executing forward_request [shard 0] \| 2022-01-27 22:53:08.770846 \| 127.0.0.3 \| -- \| 127.0.0.1 read_data: querying locally [shard 0] \| 2022-01-27 22:53:08.770862 \| 127.0.0.3 \| 16 \| 127.0.0.1 Page stats: 71 partition(s), 0 static row(s) (0 live, 0 dead), 71 clustering row(s) (71 live, 0 dead) and 0 range tombstone(s) [shard 0] \| 2022-01-27 22:53:08.770865 \| 127.0.0.1 \| 238 \| 127.0.0.1 Start querying token range ({-6683686776653114062, end}, {-6473446911791631266, end}] [shard 0] \| 2022-01-27 22:53:08.770867 \| 127.0.0.3 \| 21 \| 127.0.0.1 Creating shard reader on shard: 0 [shard 0] \| 2022-01-27 22:53:08.770874 \| 127.0.0.3 \| 28 \| 127.0.0.1 Scanning cache for range ({-6683686776653114062, end}, {-6473446911791631266, end}] and slice {(-inf, +inf)} [shard 0] \| 2022-01-27 22:53:08.770879 \| 127.0.0.3 \| 33 \| 127.0.0.1 Page stats: 48 partition(s), 0 static row(s) (0 live, 0 dead), 48 clustering row(s) (48 live, 0 dead) and 0 range tombstone(s) [shard 1] \| 2022-01-27 22:53:08.770880 \| 127.0.0.2 \| 222 \| 127.0.0.1 Querying is done [shard 1] \| 2022-01-27 22:53:08.770888 \| 127.0.0.1 \| 369 \| 127.0.0.1 read_data: querying 
locally [shard 1] \| 2022-01-27 22:53:08.770909 \| 127.0.0.1 \| 390 \| 127.0.0.1 Start querying token range ({-4075408479358018994, end}, {-3391415989210253693, end}] [shard 1] \| 2022-01-27 22:53:08.770911 \| 127.0.0.1 \| 392 \| 127.0.0.1 Creating shard reader on shard: 1 [shard 1] \| 2022-01-27 22:53:08.770914 \| 127.0.0.1 \| 395 \| 127.0.0.1 Scanning cache for range ({-4075408479358018994, end}, {-3391415989210253693, end}] and slice {(-inf, +inf)} [shard 1] \| 2022-01-27 22:53:08.770936 \| 127.0.0.1 \| 418 \| 127.0.0.1 Executing forward_request [shard 0] \| 2022-01-27 22:53:08.770951 \| 127.0.0.2 \| -- \| 127.0.0.1 read_data: querying locally [shard 0] \| 2022-01-27 22:53:08.770966 \| 127.0.0.2 \| 15 \| 127.0.0.1 Page stats: 12 partition(s), 0 static row(s) (0 live, 0 dead), 12 clustering row(s) (12 live, 0 dead) and 0 range tombstone(s) [shard 0] \| 2022-01-27 22:53:08.770969 \| 127.0.0.3 \| 123 \| 127.0.0.1 Start querying token range (-inf, {-6683686776653114062, end}] [shard 0] \| 2022-01-27 22:53:08.770969 \| 127.0.0.2 \| 18 \| 127.0.0.1 Creating shard reader on shard: 0 [shard 0] \| 2022-01-27 22:53:08.770974 \| 127.0.0.2 \| 23 \| 127.0.0.1 Scanning cache for range (-inf, {-6683686776653114062, end}] and slice {(-inf, +inf)} [shard 0] \| 2022-01-27 22:53:08.770977 \| 127.0.0.2 \| 26 \| 127.0.0.1 Querying is done [shard 1] \| 2022-01-27 22:53:08.770993 \| 127.0.0.3 \| 324 \| 127.0.0.1 read_data: querying locally [shard 1] \| 2022-01-27 22:53:08.770998 \| 127.0.0.3 \| 329 \| 127.0.0.1 Start querying token range ({-3391415989210253693, end}, {0, start}) [shard 1] \| 2022-01-27 22:53:08.771001 \| 127.0.0.3 \| 332 \| 127.0.0.1 Creating shard reader on shard: 1 [shard 1] \| 2022-01-27 22:53:08.771004 \| 127.0.0.3 \| 335 \| 127.0.0.1 Scanning cache for range ({-3391415989210253693, end}, {0, start}) and slice {(-inf, +inf)} [shard 1] \| 2022-01-27 22:53:08.771007 \| 127.0.0.3 \| 338 \| 127.0.0.1 Page stats: 48 partition(s), 0 static row(s) (0 live, 0 dead), 48 
clustering row(s) (48 live, 0 dead) and 0 range tombstone(s) [shard 1] \| 2022-01-27 22:53:08.771044 \| 127.0.0.1 \| 525 \| 127.0.0.1 Querying is done [shard 0] \| 2022-01-27 22:53:08.771069 \| 127.0.0.1 \| 442 \| 127.0.0.1 On shard execution result is [71] [shard 0] \| 2022-01-27 22:53:08.771145 \| 127.0.0.1 \| 518 \| 127.0.0.1 Querying is done [shard 1] \| 2022-01-27 22:53:08.771308 \| 127.0.0.1 \| 789 \| 127.0.0.1 On shard execution result is [60] [shard 1] \| 2022-01-27 22:53:08.771351 \| 127.0.0.1 \| 832 \| 127.0.0.1 Page stats: 127 partition(s), 0 static row(s) (0 live, 0 dead), 127 clustering row(s) (127 live, 0 dead) and 0 range tombstone(s) [shard 0] \| 2022-01-27 22:53:08.771379 \| 127.0.0.2 \| 427 \| 127.0.0.1 Page stats: 183 partition(s), 0 static row(s) (0 live, 0 dead), 183 clustering row(s) (183 live, 0 dead) and 0 range tombstone(s) [shard 1] \| 2022-01-27 22:53:08.771385 \| 127.0.0.3 \| 716 \| 127.0.0.1 Querying is done [shard 0] \| 2022-01-27 22:53:08.771402 \| 127.0.0.3 \| 556 \| 127.0.0.1 Querying is done [shard 1] \| 2022-01-27 22:53:08.771403 \| 127.0.0.2 \| 745 \| 127.0.0.1 read_data: querying locally [shard 1] \| 2022-01-27 22:53:08.771408 \| 127.0.0.2 \| 750 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2022-01-27 22:53:08.771409 \| 127.0.0.3 \| 563 \| 127.0.0.1 Start querying token range ({5592106830937975806, end}, +inf) [shard 1] \| 2022-01-27 22:53:08.771411 \| 127.0.0.2 \| 754 \| 127.0.0.1 Start querying token range ({-6272011798787969456, end}, {-4611686018427387904, start}) [shard 0] \| 2022-01-27 22:53:08.771412 \| 127.0.0.3 \| 566 \| 127.0.0.1 Creating shard reader on shard: 0 [shard 0] \| 2022-01-27 22:53:08.771415 \| 127.0.0.3 \| 569 \| 127.0.0.1 Creating shard reader on shard: 1 [shard 1] \| 2022-01-27 22:53:08.771415 \| 127.0.0.2 \| 757 \| 127.0.0.1 Scanning cache for range ({5592106830937975806, end}, +inf) and slice {(-inf, +inf)} [shard 1] \| 2022-01-27 22:53:08.771419 \| 127.0.0.2 \| 761 \| 127.0.0.1 Scanning cache 
for range ({-6272011798787969456, end}, {-4611686018427387904, start}) and slice {(-inf, +inf)} [shard 0] \| 2022-01-27 22:53:08.771419 \| 127.0.0.3 \| 573 \| 127.0.0.1 Received forward_result=[131] from 127.0.0.1:0 [shard 1] \| 2022-01-27 22:53:08.771454 \| 127.0.0.1 \| 1003 \| 127.0.0.1 Page stats: 74 partition(s), 0 static row(s) (0 live, 0 dead), 74 clustering row(s) (74 live, 0 dead) and 0 range tombstone(s) [shard 0] \| 2022-01-27 22:53:08.771764 \| 127.0.0.3 \| 918 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2022-01-27 22:53:08.771768 \| 127.0.0.3 \| 922 \| 127.0.0.1 Start querying token range [{0, start}, {2507462623645193091, end}] [shard 0] \| 2022-01-27 22:53:08.771771 \| 127.0.0.3 \| 925 \| 127.0.0.1 Creating shard reader on shard: 0 [shard 0] \| 2022-01-27 22:53:08.771775 \| 127.0.0.3 \| 929 \| 127.0.0.1 Scanning cache for range [{0, start}, {2507462623645193091, end}] and slice {(-inf, +inf)} [shard 0] \| 2022-01-27 22:53:08.771779 \| 127.0.0.3 \| 933 \| 127.0.0.1 Querying is done [shard 1] \| 2022-01-27 22:53:08.771935 \| 127.0.0.3 \| 1265 \| 127.0.0.1 Querying is done [shard 0] \| 2022-01-27 22:53:08.771950 \| 127.0.0.2 \| 998 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2022-01-27 22:53:08.771956 \| 127.0.0.2 \| 1004 \| 127.0.0.1 Start querying token range ({-6473446911791631266, end}, {-6272011798787969456, end}] [shard 0] \| 2022-01-27 22:53:08.771959 \| 127.0.0.2 \| 1008 \| 127.0.0.1 Creating shard reader on shard: 0 [shard 0] \| 2022-01-27 22:53:08.771963 \| 127.0.0.2 \| 1011 \| 127.0.0.1 Scanning cache for range ({-6473446911791631266, end}, {-6272011798787969456, end}] and slice {(-inf, +inf)} [shard 0] \| 2022-01-27 22:53:08.771966 \| 127.0.0.2 \| 1014 \| 127.0.0.1 Page stats: 13 partition(s), 0 static row(s) (0 live, 0 dead), 13 clustering row(s) (13 live, 0 dead) and 0 range tombstone(s) [shard 0] \| 2022-01-27 22:53:08.772008 \| 127.0.0.2 \| 1057 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2022-01-27 
22:53:08.772012 \| 127.0.0.2 \| 1061 \| 127.0.0.1 Start querying token range ({3897266736829642805, end}, {4611686018427387904, start}) [shard 0] \| 2022-01-27 22:53:08.772014 \| 127.0.0.2 \| 1063 \| 127.0.0.1 Creating shard reader on shard: 0 [shard 0] \| 2022-01-27 22:53:08.772016 \| 127.0.0.2 \| 1065 \| 127.0.0.1 Scanning cache for range ({3897266736829642805, end}, {4611686018427387904, start}) and slice {(-inf, +inf)} [shard 0] \| 2022-01-27 22:53:08.772019 \| 127.0.0.2 \| 1067 \| 127.0.0.1 On shard execution result is [200] [shard 1] \| 2022-01-27 22:53:08.772053 \| 127.0.0.3 \| 1384 \| 127.0.0.1 Page stats: 56 partition(s), 0 static row(s) (0 live, 0 dead), 56 clustering row(s) (56 live, 0 dead) and 0 range tombstone(s) [shard 0] \| 2022-01-27 22:53:08.772138 \| 127.0.0.2 \| 1186 \| 127.0.0.1 Page stats: 190 partition(s), 0 static row(s) (0 live, 0 dead), 190 clustering row(s) (190 live, 0 dead) and 0 range tombstone(s) [shard 1] \| 2022-01-27 22:53:08.772364 \| 127.0.0.2 \| 1706 \| 127.0.0.1 Page stats: 149 partition(s), 0 static row(s) (0 live, 0 dead), 149 clustering row(s) (149 live, 0 dead) and 0 range tombstone(s) [shard 0] \| 2022-01-27 22:53:08.772407 \| 127.0.0.3 \| 1561 \| 127.0.0.1 Querying is done [shard 0] \| 2022-01-27 22:53:08.772417 \| 127.0.0.3 \| 1571 \| 127.0.0.1 Querying is done [shard 1] \| 2022-01-27 22:53:08.772418 \| 127.0.0.2 \| 1760 \| 127.0.0.1 Querying is done [shard 0] \| 2022-01-27 22:53:08.772426 \| 127.0.0.2 \| 1475 \| 127.0.0.1 Querying is done [shard 0] \| 2022-01-27 22:53:08.772428 \| 127.0.0.2 \| 1476 \| 127.0.0.1 Querying is done [shard 0] \| 2022-01-27 22:53:08.772449 \| 127.0.0.3 \| 1604 \| 127.0.0.1 On shard execution result is [196] [shard 0] \| 2022-01-27 22:53:08.772555 \| 127.0.0.2 \| 1603 \| 127.0.0.1 On shard execution result is [238] [shard 1] \| 2022-01-27 22:53:08.772674 \| 127.0.0.2 \| 2016 \| 127.0.0.1 On shard execution result is [235] [shard 0] \| 2022-01-27 22:53:08.772770 \| 127.0.0.3 \| 1924 \| 
127.0.0.1 Received forward_result=[435] from 127.0.0.3:0 [shard 1] \| 2022-01-27 22:53:08.772933 \| 127.0.0.1 \| 2482 \| 127.0.0.1 Received forward_result=[434] from 127.0.0.2:0 [shard 1] \| 2022-01-27 22:53:08.773110 \| 127.0.0.1 \| 2658 \| 127.0.0.1 Merged result is [1000] [shard 1] \| 2022-01-27 22:53:08.773111 \| 127.0.0.1 \| 2660 \| 127.0.0.1 Done processing - preparing a result [shard 1] \| 2022-01-27 22:53:08.773114 \| 127.0.0.1 \| 2663 \| 127.0.0.1 Request complete \| 2022-01-27 22:53:08.772666 \| 127.0.0.1 \| 2666 \| 127.0.0.1 ``` Fixes #1385 Closes #9209 github.com:scylladb/scylla: docs: add parallel aggregations design doc db: config: add a flag to disable new parallelized aggregation algorithm test: add parallelized select count test forward_service: add metrics forward_service: parallelize execution across shards forward_service: add tracing cql3: statements: introduce parallelized_select_statement cql3: query_processor: add forward_service reference to query_processor gms: add PARALLELIZED_AGGREGATION feature service: introduce forward_service storage_proxy: extract query_ranges_to_vnodes_generator to a separate file messaging_service: add verb for count() request forwarding cql3: selection: detect if a selection represents count()	2022-02-04 12:34:19 +02:00
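The grouping and merging steps described in the merge above can be sketched as follows. This is only an illustrative Python model, not the forward_service code: the function names, the `(start, end)` tuples standing in for vnode token ranges, and the range-to-owner mapping are assumptions made for the example. The per-node partial counts (131, 435, 434) and the merged total (1000) are the `forward_result` values shown in the trace.

```python
from collections import defaultdict


def group_ranges_by_owner(vnode_ranges, owner_of):
    """Group vnode token ranges by the node that owns them."""
    groups = defaultdict(list)
    for token_range in vnode_ranges:
        groups[owner_of(token_range)].append(token_range)
    return dict(groups)


def distributed_count(vnode_ranges, owner_of, count_rows_in):
    """Build one sub-query per owner, then merge partial counts by summing.

    count_rows_in(owner, token_range) is a hypothetical stand-in for
    sending a sub-query to a node and receiving its partial result.
    """
    grouped = group_ranges_by_owner(vnode_ranges, owner_of)
    partial_results = {
        owner: sum(count_rows_in(owner, r) for r in ranges)
        for owner, ranges in grouped.items()
    }
    return partial_results, sum(partial_results.values())


# Toy data (assumed): four made-up token ranges and their row counts.
ROWS_PER_RANGE = {(0, 10): 131, (10, 20): 200, (20, 30): 235, (30, 40): 434}
OWNERS = {
    (0, 10): "127.0.0.1",
    (10, 20): "127.0.0.3",
    (20, 30): "127.0.0.3",
    (30, 40): "127.0.0.2",
}

partials, total = distributed_count(
    list(ROWS_PER_RANGE),
    OWNERS.get,
    lambda owner, token_range: ROWS_PER_RANGE[token_range],
)
```

Summation is the only merge step needed here because `count()` partial results are plain integers; other aggregates would need their own merge function.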
Nadav Har'El	b54e85088d	Merge 'snapshots: Fix snapshot-ctl to include snapshots of dropped tables' from Benny Halevy Snapshot-ctl methods fetch information about snapshots from column family objects. The problem with this is that we get rid of these objects once the table gets dropped, while the snapshots might still be present (the auto_snapshot option is specifically made to create this kind of situation). This commit switches from relying on column family interface to scanning every datadir that the database knows of in search for "snapshots" folders. This PR is a rebased version of #9539 (and slightly cleaned-up, cosmetically) and so it replaces the previous PR. Fixes #3463 Closes #7122 Closes #9884 * github.com:scylladb/scylla: snapshots: Fix snapshot-ctl to include snapshots of dropped tables table: snapshot: add debug messages	2022-02-04 12:34:19 +02:00
Botond Dénes	d309a86708	Merge 'Add keyspace_offstrategy_compaction api' from Benny Halevy This series adds methods to perform offstrategy compaction, if needed, returning a future<bool> so the caller can wait on it until compaction completes. The returned value is true iff offstrategy compaction was needed. The added keyspace_offstrategy_compaction api calls perform_offstrategy_compaction on the specified keyspace and tables and returns the number of tables that required offstrategy compaction. A respective unit test was added to the rest_api pytest. This PR replaces https://github.com/scylladb/scylla/pull/9095, which suggested adding an option to `keyspace_compaction`, since the offstrategy compaction triggering logic is different enough from major compaction to merit a new api. Test: unit (dev) Closes #9980 * github.com:scylladb/scylla: test: rest_api: add unit tests for keyspace_offstrategy_compaction api api: add keyspace_offstrategy_compaction compaction_manager: get rid of submit_offstrategy table: add perform_offstrategy_compaction compaction_manager: perform_offstrategy: print ks.cf in log messages compaction_manager: allow waiting on offstrategy compaction	2022-02-02 13:15:31 +02:00
Piotr Wojtczak	0dd7739716	snapshots: Fix snapshot-ctl to include snapshots of dropped tables Snapshot-ctl methods fetch information about snapshots from column family objects. The problem with this is that we get rid of these objects once the table gets dropped, while the snapshots might still be present (the auto_snapshot option is specifically made to create this kind of situation). This commit switches from relying on column family interface to scanning every datadir that the database knows of in search for "snapshots" folders. Fixes #3463 Closes #7122 Closes #9884 Signed-off-by: Piotr Wojtczak <piotr.m.wojtczak@gmail.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-02-01 22:31:43 +02:00
Michał Sala	140bab279c	test: add parallelized select count test Added test that checks if a SELECT COUNT(*) query was transformed and processed in a parallel way. Checking is done by looking at the cql statistics and comparing subsequent counts of parallelized aggregation SELECT query executions.	2022-02-01 21:14:41 +01:00
Michał Sala	66a93d3000	cql3: query_processor: add forward_service reference to query_processor	2022-02-01 21:14:41 +01:00
Michał Sala	0fe59082ec	storage_proxy: extract query_ranges_to_vnodes_generator to a separate file Such separation allows using query_ranges_to_vnodes_generator by other services without needing a storage_proxy dependency.	2022-02-01 21:14:41 +01:00
Tomasz Grabiec	00a9326ae7	Merge "raft: let `modify_config` finish on a follower that removes itself" from Kamil When forwarding a reconfiguration request from follower to a leader in `modify_config`, there is no reason to wait for the follower's commit index to be updated. The only useful information is that the leader committed the configuration change - so `modify_config` should return as soon as we know that. There is a reason not to wait for the follower's commit index to be updated: if the configuration change removes the follower, the follower will never learn about it, so a local waiter will never be resolved. `execute_modify_config` - the part of `modify_config` executed on the leader - is thus modified to finish when the configuration change is fully complete (including the dummy entry appended at the end), and `modify_config` - which does the forwarding - no longer creates a local waiter, but returns as soon as the RPC call to the leader confirms that the entry was committed on the leader. We still return an `entry_id` from `execute_modify_config` but that's just an artifact of the implementation. Fixes #9981. A regression test was also added in randomized_nemesis_test. * kbr/modify-config-finishes-v1: test: raft: randomized_nemesis_test: regression test for #9981 raft: server: don't create local waiter in `modify_config`	2022-01-31 20:14:50 +01:00
Nadav Har'El	8a745593a2	Merge 'alternator: fill UnprocessedKeys for failed batch reads' from Piotr Sarna The DynamoDB protocol specifies that when getting items in a batch fails only partially, unprocessed keys can be returned so that the user can perform a retry. Alternator used to fail the whole request if any of the reads failed, but now it instead produces the list of unprocessed keys and returns them to the user, as long as at least 1 read was successful. This series comes with a test based on Scylla's error injection mechanism, and thus is only useful in modes which come with error injection compiled in. In release mode, expect to see the following message: SKIPPED (Error injection not enabled in Scylla - try compiling in dev/debug/sanitize mode) Fixes #9984 Closes #9986 * github.com:scylladb/scylla: test: add total failure case for GetBatchItem test: add error injection case for GetBatchItem test: add a context manager for error injection to alternator alternator: add error injection to BatchGetItem alternator: fill UnprocessedKeys for failed batch reads	2022-01-31 15:28:24 +02:00
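The partial-failure behaviour described in the merge above can be modelled roughly as below. This is a simplified Python sketch, not Alternator's implementation: `batch_get` and `read_item` are hypothetical names, and the response is flattened to plain lists, whereas the real DynamoDB response groups `Responses` and `UnprocessedKeys` per table.

```python
def batch_get(keys, read_item):
    """Read every key; report failed keys instead of failing the whole batch.

    read_item(key) is a hypothetical callable that returns an item or raises.
    If no read succeeds, the last error is raised, so a total failure is an
    error rather than a success response listing all keys as unprocessed.
    """
    items, unprocessed = [], []
    last_error = None
    for key in keys:
        try:
            items.append(read_item(key))
        except Exception as exc:  # any read failure marks the key unprocessed
            unprocessed.append(key)
            last_error = exc
    if keys and not items:
        raise last_error
    return {"Responses": items, "UnprocessedKeys": unprocessed}


def flaky_read(key):
    """Toy reader (assumed): fails for key 'b', upper-cases the others."""
    if key == "b":
        raise IOError("injected read failure")
    return key.upper()


def always_failing_read(key):
    """Toy reader (assumed): every read fails."""
    raise IOError("injected read failure")


partial = batch_get(["a", "b", "c"], flaky_read)

try:
    batch_get(["a", "b"], always_failing_read)
    total_failure_raised = False
except IOError:
    total_failure_raised = True
```

A client would typically resubmit the keys listed under `UnprocessedKeys` in a follow-up request.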
Piotr Sarna	c87126198d	test: add total failure case for GetBatchItem The test verifies that if all reads from a batch operation failed, the result is an error, and not a success response with UnprocessedKeys parameter set to all keys.	2022-01-31 14:21:55 +01:00
Piotr Sarna	e79c2943fc	test: add error injection case for GetBatchItem The new test case is based on Scylla error injection mechanism and forces a partial read by failing some requests from the batch.	2022-01-31 14:21:55 +01:00
Piotr Sarna	99c5bec0e2	test: add a context manager for error injection to alternator With the new context manager it's now easier to request an error to be injected via REST API. Note that error injection is only enabled in certain build modes (dev, debug, sanitize) and the test case will be skipped if it's not possible to use this mechanism.	2022-01-31 14:21:55 +01:00
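A context manager of the kind described above might look like the following Python sketch. The `enable` and `disable` callables are hypothetical placeholders for whatever REST calls the real test helper makes, and the injection name is made up. The sketch only shows that the injection is switched off again even when the test body raises.

```python
from contextlib import contextmanager


@contextmanager
def error_injection(name, enable, disable):
    """Enable an injection for the duration of a `with` block.

    enable(name) and disable(name) are hypothetical stand-ins for the
    REST requests; the `finally` clause guarantees cleanup on failure.
    """
    enable(name)
    try:
        yield name
    finally:
        disable(name)


# Toy backend (assumed): a set that records which injections are active.
active = set()

try:
    with error_injection("some_injection", active.add, active.discard):
        enabled_inside = "some_injection" in active
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass

enabled_after = "some_injection" in active
```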
Tomasz Grabiec	8297ae531d	Merge "Automatically retry CQL DDL statements in presence of concurrent changes" from Kamil Schema changes on top of Raft do not allow concurrent changes. If two changes are attempted concurrently, one of them gets a `group0_concurrent_modification` exception. Catch the exception in the CQL DDL statement execution function and retry. In addition, improve the description of CQL DDL statements in the group 0 history table. Add a test which checks that group 0 history grows iff a schema change does not throw `group0_concurrent_modification`. Also check that the retry mechanism works as expected. * kbr/ddl-retry-v1: test: unit test for group 0 concurrent change protection and CQL DDL retries cql3: statements: schema_altering_statement: automatically retry in presence of concurrent changes	2022-01-31 14:12:35 +01:00
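The catch-and-retry idea from the merge above can be illustrated with a small Python sketch. It is not the `schema_altering_statement` code: `ConcurrentModification` is a stand-in for `group0_concurrent_modification`, and the `max_attempts` bound is an assumption added so the example terminates; the commit message does not say whether retries are bounded.

```python
class ConcurrentModification(Exception):
    """Stand-in for the group0_concurrent_modification exception."""


def execute_with_retry(apply_change, max_attempts=10):
    """Call apply_change() again whenever it loses a race with another change.

    max_attempts is an assumed safety bound for this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return apply_change()
        except ConcurrentModification:
            if attempt == max_attempts:
                raise


# Toy schema change (assumed): loses the race twice, then succeeds.
calls = {"count": 0}


def create_table():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ConcurrentModification()
    return "created"


result = execute_with_retry(create_table)
```

Only the concurrent-modification error is retried; any other exception propagates to the caller on the first attempt.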
Tomasz Grabiec	b78bab7286	Merge "raft: fixes and improvements to the library and nemesis test" from Kamil Raft randomized nemesis test was improved by adding some more chaos: randomizing the network delay, server configuration, ticking speed of servers. This made it possible to catch a serious bug, which is fixed in the first patch. The patchset also fixes bugs in the test itself and adds quality of life improvements such as better diagnostics when inconsistency is detected. * kbr/nemesis-random-v1: test: raft: randomized_nemesis_test: print state of each state machine when detecting inconsistency test: raft: randomized_nemesis_test: print details when detecting inconsistency test: raft: randomized_nemesis_test: print snapshot details when taking/loading snapshots in `impure_state_machine` test: raft: randomized_nemesis_test: keep server id in impure_state_machine test: raft: randomized_nemesis_test: frequent snapshotting configuration test: raft: randomized_nemesis_test: tick servers at different speeds in generator test test: raft: randomized_nemesis_test: simplify ticker test: raft: randomized_nemesis_test: randomize network delay test: raft: randomized_nemesis_test: fix use-after-free in `environment::crash()` test: raft: randomized_nemesis_test: fix use-after-free in two-way rpc functions test: raft: randomized_nemesis_test: rpc: don't propagate `gate_closed_exception` outside test: raft: randomized_nemesis_test: fix obsolete comment raft: fsm: print configuration entries appearing in the log raft: `operator<<(ostream&, ...)` implementation for `server_address` and `configuration` raft: server: abort snapshot applications before waiting for rpc abort raft: server: logging fix raft: fsm: don't advance commit index beyond matched entries	2022-01-31 13:25:27 +01:00
Mikołaj Sielużycki	93d6eb6d51	compacting_reader: Support fast_forward_to position range. Fast forwarding is delegated to the underlying reader and assumes that it's supported. The only corner case requiring special handling that has shown up in the tests is producing the partition start mutation in the forwarding case if there are no other fragments. The compacting state keeps track of the uncompacted partition start, but doesn't emit it by default. If end of stream is reached without producing a mutation fragment, partition start is not emitted. This is invalid behaviour in the forwarding case, so I've added a public method to the compacting state to force marking the partition as non-empty. I don't like this solution, as it feels like breaking an abstraction, but I didn't come across a better idea. Tests: unit(dev, debug, release) Message-Id: <20220128131021.93743-1-mikolaj.sieluzycki@scylladb.com>	2022-01-31 13:37:36 +02:00
Nadav Har'El	a25e265373	test/alternator: improve comment on why we need "global_random" Improve the comment that explains why we needed to use an explicitly shared random sequence instead of the usual "random". We now understand that we need this workaround to undo what the pytest-randomly plugin does. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220130155557.1181345-1-nyh@scylladb.com>	2022-01-31 10:07:56 +01:00
Nadav Har'El	59fe6a402c	test/cql-pytest: use unique keys instead of random keys Some of the tests in test/cql-pytest share the same table but use different keys to ensure they don't collide. Before this patch we used a random key, which was usually fine, but we recently noticed that the pytest-randomly plugin may cause different tests to run through the same sequence of random numbers and ruin our intent that different tests use different keys. So instead of using a random key, let's use a unique key. We can achieve this uniqueness trivially - using a counter variable - because anyway the uniqueness is only needed inside a single temporary table - which is different in every run. Another benefit is that it will now be clearer that the tests are deterministic and not random - the intent of a random_string() key was never to randomly walk the entire key space (random_string() anyway had a pretty narrow idea of what a random string looks like) - it was just to get a unique key. Refs #9988 (fixes it for cql-pytest, but not for test/alternator) Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-01-31 09:01:23 +02:00
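The counter-based approach described above can be sketched in a few lines of Python. The helper name `unique_key` and the key format are assumptions for illustration, not the actual helper in test/cql-pytest. Unlike a random key, a counter is unaffected when something reseeds the `random` module between tests.

```python
import itertools
import random

# Module-level counter: every call hands out the next integer exactly once.
_key_counter = itertools.count(1)


def unique_key(prefix="key"):
    """Return a key that is unique within this process (hypothetical helper)."""
    return f"{prefix}{next(_key_counter)}"


# Reseeding the random module, as a plugin might do before each test,
# makes random values repeat but leaves the counter untouched.
random.seed(0)
first_random, first_key = random.random(), unique_key()
random.seed(0)
second_random, second_key = random.random(), unique_key()
```

Uniqueness only holds within one process, which matches the commit's observation that it is only needed inside a single temporary table that is different in every run.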
Benny Halevy	1c25934399	test: rest_api: add unit tests for keyspace_offstrategy_compaction api Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-01-30 20:40:40 +02:00
Tomasz Grabiec	b734615f51	util: cached_file: Fix corruption after memory reclamation was triggered from population If memory reclamation is triggered inside _cache.emplace(), the _cache btree can get corrupted. Reclaimers erase from it, and emplace() assumes that the tree is not modified during its execution. It first locates the target node and then does memory allocation. Fix by running emplace() under allocating section, which disables memory reclamation. The bug manifests with assert failures, e.g: ./utils/bptree.hh:1699: void bplus::node<unsigned long, cached_file::cached_page, cached_file::page_idx_less_comparator, 12, bplus::key_search::linear, bplus::with_debug::no>::refill(Less) [Key = unsigned long, T = cached_file::cached_page, Less = cached_file::page_idx_less_comparator, NodeSize = 12, Search = bplus::key_search::linear, Debug = bplus::with_debug::no]: Assertion `p._kids[i].n == this' failed. Fixes #9915 Message-Id: <20220130175639.15258-1-tgrabiec@scylladb.com>	2022-01-30 19:57:35 +02:00

2742 Commits