scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-08 16:03:20 +00:00

Author	SHA1	Message	Date
Avi Kivity	854188a486	Merge 'database, storage_proxy: Reconcile pages with dead rows and partitions incrementally' from Botond Dénes Currently, mutation query on replica side will not respond with a result which doesn't have at least one live row. This causes problems if there is a lot of dead rows or partitions before we reach a live row, which stem from the fact that resulting reconcilable_result will be large: 1. Large allocations. Serialization of reconcilable_result causes large allocations for storing result rows in std::deque 2. Reactor stalls. Serialization of reconcilable_result on the replica side and on the coordinator side causes reactor stalls. This impacts not only the query at hand. For 1M dead rows, freezing takes 130ms, unfreezing takes 500ms. Coordinator does multiple freezes and unfreezes. The reactor stall on the coordinator side is >5s 3. Too large repair mutations. If reconciliation works on large pages, repair may fail due to too large mutation size. 1M dead rows is already too much: Refs https://github.com/scylladb/scylladb/issues/9111. This patch fixes all of the above by making mutation reads respect the memory accounter's limit for the page size, even for dead rows. This patch also addresses the problem of client-side timeouts during paging. Reconciling queries processing long strings of tombstones will now properly page tombstones,like regular queries do. My testing shows that this solution even increases efficiency. I tested with a cluster of 2 nodes, and a table of RF=2. The data layout was as follows (1 partition): * Node1: 1 live row, 1M dead rows * Node2: 1M dead rows, 1 live row This was designed to trigger reconciliation right from the very start of the query. Before: ``` Running query (node2, CL=ONE, cold cache) Query done, duration: 140.0633503ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)] Running query (node2, CL=ONE, hot cache) Query done, duration: 66.7195275ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)] Running query (all-nodes, CL=ALL, reconcile, cold-cache) Query done, duration: 873.5400742ms, pages: 2, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)] ``` After: ``` Running query (node2, CL=ONE, cold cache) Query done, duration: 136.9035122ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)] Running query (node2, CL=ONE, hot cache) Query done, duration: 69.5286021ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)] Running query (all-nodes, CL=ALL, reconcile, cold-cache) Query done, duration: 162.6239498ms, pages: 100, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)] ``` Non-reconciling queries have almost identical duration (1 few ms changes can be observed between runs). Note how in the after case, the reconciling read also produces 100 pages, vs. just 2 pages in the before case, leading to a much lower duration (less than 1/4 of the before). Refs https://github.com/scylladb/scylladb/issues/7929 Refs https://github.com/scylladb/scylladb/issues/3672 Refs https://github.com/scylladb/scylladb/issues/7933 Fixes https://github.com/scylladb/scylladb/issues/9111 Closes scylladb/scylladb#15414 * github.com:scylladb/scylladb: test/topology_custom: add test_read_repair.py replica/mutation_dump: detect end-of-page in range-scans tools/scylla-sstable: write: abort parser thread if writing fails test/pylib: add REST methods to get node exe and workdir paths test/pylib/rest_client: add load_new_sstables, keyspace_{flush,compaction} service/storage_proxy: add trace points for the actual read executor type service/storage_proxy: add trace points for read-repair storage_proxy: Add more trace-level logging to read-repair database: Fix accounting of small partitions in mutation query database, storage_proxy: Reconcile pages with no live rows incrementally	2023-10-05 22:39:34 +03:00
Michael Huang	75109e9519	cql3: Fix invalid JSON parsing for JSON objects with ASCII keys For JSON objects represented as map<ascii, int>, don't treat ASCII keys as a nested JSON string. We were doing that prior to the patch, which led to parsing errors. Included the error offset where JSON parsing failed for rjson::parse related functions to help identify parsing errors better. Fixes: #7949 Signed-off-by: Michael Huang <michaelhly@gmail.com> Closes scylladb/scylladb#15499	2023-10-05 22:26:08 +03:00
Avi Kivity	e600f35d1e	Merge 'logalloc, reader_concurrency_semaphore: cooperate on OOM kills' from Botond Dénes Consider the following code snippet: ```c++ future<> foo() { semaphore.consume(1024); } future<> bar() { return _allocating_section([&] { foo(); }); } ``` If the consumed memory triggers the OOM kill limit, the semaphore will throw `std::bad_alloc`. The allocating section will catch this, bump std reserves and retry the lambda. Bumping the reserves will not do anything to prevent the next call to `consume()` from triggering the kill limit. So this cycle will repeat until std reserves are so large that ensuring the reserve fails. At this point LSA gives up and re-throws the `std::bad_alloc`. Beyond the useless time spent on code that is doomed to fail, this also results in expensive LSA compaction and eviction of the cache (while trying to ensure reserves). Prevent this situation by throwing a distinct exception type which is derived from `std::bad_alloc`. Allocating section will not retry on seeing this exception. A test reproducing the bug is also added. Fixes: #15278 Closes scylladb/scylladb#15581 * github.com:scylladb/scylladb: test/boost/row_cache_test: add test_cache_reader_semaphore_oom_kill utils/logalloc: handle utils::memory_limit_reached in with_reclaiming_disabled() reader_concurrency_semaphore: use utils::memory_limit_reached exception utils: add memory_limit_reached exception	2023-10-05 19:47:21 +03:00
Botond Dénes	498e3ec435	Merge 'Remove _schema field from sstable_set' from Piotr Jastrzębski All `sstable_set_impl` subclasses/implementations already keep a `schema_ptr` so we can make `sstable_set_impl::make_incremental_selector` function return both the selector and the schema that's being used by it. That way, we can use the returned schema in `sstable_set::make_incremental_selector` function instead of `sstable_set::_schema` field which makes the field unused and allows us to remove it alltogether and reduce the memory footprint of `sstable_set` objects. Closes scylladb/scylladb#15570 * github.com:scylladb/scylladb: sstable_set: Remove unused _schema field sstable_set_impl: Return also schema from make_incremental_selector	2023-10-05 11:46:08 +03:00
Botond Dénes	36b00710c1	querier: add more information about the read on semaphore mismatch Also rephase the messages a bit so they are more uniform. The goal of this change is to make semaphore mismatches easier to diagnose, by including the table name and the permit name in the printout. While at it, add a test for semaphore mismatch, it didn't have one. Refs: #15485 Closes scylladb/scylladb#15508	2023-10-05 10:27:53 +03:00
Botond Dénes	19ed3393b3	Merge 'Sanitize tracing start-stop calls' from Pavel Emelyanov Tracing is one of two global service left out there with its starting and stopping being pretty hairy. In order to de-globalize it and keep its start-stop under control the existing start-stop sequence is worth cleaning. This PR * removes create_ , start_ and stop_ wrappers to un-hide the global tracing_instance thing * renames tracing::stop() to shutdown() as it's in fact shutdown * coroutinizes start/shutdown/stop while at it Squeezed parts from #14156 that don't reorder start-stop calls Closes scylladb/scylladb#15611 * github.com:scylladb/scylladb: main: Capture local tracing reference to stop tracing tracing: Pack testing code tracing: Remove stop_tracing() wrapper tracing: Remove start_tracing() wrapper tracing: Remove create_tracing() wrapper tracing: Make shutdown() re-entrable tracing: Coroutinize start/shutdown/stop tracing: Rename helper's stop() to shutdown()	2023-10-05 10:27:19 +03:00
Michał Chojnowski	83b71ed6b2	row_cache_test: fix test_exception_safety_of_update_from_memtable The test does (among other things) the following: 1. Create a cache reader with buffer of size 1 and fill the buffer. 2. Update the cache. 3. Check that the reader produces the first mutation as seen before the update (because the buffer fill should have snapshotted the first mutation), and produces other mutation as seen after the update. However, the test is not guaranteed to stop after the update succeeds. Even during a successful update, an allocation might have failed (and been retried by an allocation_section), which will cause the body of with_allocation_failures to run again. On subsequent runs the last check (the "3." above) fails, because the first mutation is snapshotted already with the new version. Fix that. Closes scylladb/scylladb#15634	2023-10-04 23:42:03 +02:00
Piotr Jastrzebski	9edf6e4653	sstable_set: Remove unused _schema field Signed-off-by: Piotr Jastrzebski <haaawk@gmail.com>	2023-10-04 18:50:23 +02:00
Avi Kivity	d217c6f7c1	Merge 'tools/nodetool: implement additional commands, part 1/N' from Botond Dénes The following new commands are implemented: * disablebackup * disablebinary * disablegossip * enablebackup * enablebinary * enablegossip * gettraceprobability * help * settraceprobability * statusbackup * statusbinary * statusgossip * version All are associated with tests. All tests (both old and new) pass with both the scylla-native and the cassandra nodetool implementation. Refs: https://github.com/scylladb/scylladb/issues/15588 Closes scylladb/scylladb#15593 * github.com:scylladb/scylladb: tools/scylla-nodetool: implement help operation tools/scylla-nodetool: implement the traceprobability commands tools/scylla-nodetool: implement the gossip commands tools/scylla-nodetool: implement the binary commands tools/scylla-nodetool: implement backup related commands tools/scylla-nodetool: implement version command test/nodetool: introduce utils.check_nodetool_fails_with() test/nodetool: return stdout of nodetool invokation test/nodetool/rest_api_mock.py: fix request param matching tools/scylla-nodetool: compact: remove --partition argument tools/scylla-nodetool: scylla_rest_client: add support delete method tools/scylla-nodetool: get rid of check_json_type() tools/scylla-nodetool: log more details for failed requests tools/scylla-*: use operation_option for positional options tools/utils: add support for operation aliases	2023-10-04 14:33:16 +03:00
Botond Dénes	62cdc36a74	tools/scylla-nodetool: implement help operation Nodetool considers "help" to be just another operation. So implement it as such. The usual --help and --help <command> is also supported.	2023-10-04 05:27:09 -04:00
Botond Dénes	1efabca515	tools/scylla-nodetool: implement the traceprobability commands gettraceprobability and settraceprobability	2023-10-04 05:27:09 -04:00
Botond Dénes	25d41f72c4	tools/scylla-nodetool: implement the gossip commands disablegossip, enablegossip and statusgossip	2023-10-04 05:27:09 -04:00
Botond Dénes	5bc25dbebe	tools/scylla-nodetool: implement the binary commands disablebinary, enablebinary and statusbinary	2023-10-04 05:27:09 -04:00
Botond Dénes	2ac1705c90	tools/scylla-nodetool: implement backup related commands disablebackup, enablebackup and statusbackup	2023-10-04 05:27:09 -04:00
Botond Dénes	91e62413c8	tools/scylla-nodetool: implement version command	2023-10-04 05:27:09 -04:00
Botond Dénes	5ad9b1424c	test/nodetool: introduce utils.check_nodetool_fails_with() Checking that nodetool fails with a given message turned out to be a common pattern, so extract the logic for checking this into a method of its own. Refactor the existing tests to use it, instead of the hand-coded equivalent.	2023-10-04 05:27:09 -04:00
Botond Dénes	644d91fe95	test/nodetool: return stdout of nodetool invokation So the test can inspect it.	2023-10-04 05:09:49 -04:00
Botond Dénes	dd62299355	test/nodetool/rest_api_mock.py: fix request param matching Turns out expected request params were dropped on the floor, so any expected param matched any actual params.	2023-10-04 05:09:41 -04:00
Botond Dénes	4f66e0208b	tools/scylla-nodetool: compact: remove --partition argument This argument is not recognized by the current nodetool either. It is mentioned only in our documentation, but it should be removed from there too.	2023-10-04 05:08:33 -04:00
Pavel Emelyanov	65b7aa3387	tracing: Pack testing code There's a finally-chain stopping tracing out there, now it can just use the deferred stop call and that's it Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-03 10:46:47 +03:00
Pavel Emelyanov	61381feaad	tracing: Remove start_tracing() wrapper Callers can make one-like stop with the help of invoke_on_all() overload that wraps std::invoke Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-03 10:46:47 +03:00
Pavel Emelyanov	89c43f6677	tracing: Remove create_tracing() wrapper It doesn't make callers' life easier, but hides global tracing instance Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-03 10:46:47 +03:00
Botond Dénes	471e125592	Merge 'Use REST API client in object_store test' from Pavel Emelyanov The test needs to call flush-keyspace API endpoint and currently it does it by hand. Not very convenient. Also in the future there will be the need for _background_ API kicking, the currently used requests package cannot do it, while pylib REST API can Closes scylladb/scylladb#15565 * github.com:scylladb/scylladb: test/object_store: Use REST client from pylib test/pylib: Add flush_keyspace() method to rest client test/object_store: Wrap yielded managed cluster	2023-10-03 08:50:55 +03:00
Benny Halevy	6dc1ac768d	cql-pytest/test_select_from_mutation_fragments: disable compaction on test_table Use NullCompactionStrategy for the test_table fixture rather than using the `no_autocompaction_context`. Besides being simpler, as regular compaction just comes in the way for all tests that use `SELECT MUTATION_FRAGMENTS` The latter would be problematic when we start run cql-pytest test cases in parallel rather than in serial since it will inadvertantly affect other test cases. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#15574	2023-10-02 10:28:59 +03:00
Michael Huang	1640f83fdc	raft: Store snapshot update and truncate log atomically In case the snapshot update fails, we don't truncate commit log. Fixes scylladb/scylladb#9603 Closes scylladb/scylladb#15540	2023-09-29 17:57:49 +02:00
Botond Dénes	5d8384eff0	Merge 'Fix `test_fencing.py::test_fence_hints` flakiness' from Kamil Braun Add a REST API to reload Raft topology state without having to restart a node and use it in `test_fence_hints`. Restarting the node has undesired side effects which cause test flakiness; more details provided in commit messages. Refactor the test a bit while at it. Fixes: #15285 Closes scylladb/scylladb#15523 * github.com:scylladb/scylladb: test: test_fencing.py: enable hints_manager=trace logs in `test_fence_hints` test: test_fencing.py: reload topology through REST API in `test_fence_hints` test: refactor test_fencing.py api: storage_service: add REST API to reload topology state	2023-09-28 16:30:23 +03:00
Benny Halevy	3709a43ccc	cql-pytest.nodetool: no_autocompaction_context: support ks.tbl syntax Allow disabling auto-compaction for given table(s) using either the ks.table syntax or ks:table (as the api suggests). The first syntax would likely be more common since the test tables we automatically create are named as test_keyspace.test_table so we can pass that name to `no_autocompaction_context` as is. test_tools.system_scylla_local_sstable_prepared was modified to disable auto-compaction only only the `system.scylla_local` table rather than the whole `system` keyspace, since it only relies on this table. Plus, it helps test this change :) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#15575	2023-09-28 13:59:48 +03:00
Kamil Braun	53c01b121a	test: test_fencing.py: enable hints_manager=trace logs in `test_fence_hints` Enable TRACE level logging on the server that's supposed to send the hints. Should make it easier to debug failures in the future, if any happen again.	2023-09-28 11:59:17 +02:00
Kamil Braun	706734f76c	test: test_fencing.py: reload topology through REST API in `test_fence_hints` Restarting a node in order to reload topology may have side effects that lead to test flakiness. While the node is shutting down, it gives up leadership. Before it finishes shutting down, another node may become Raft group 0 leader, then topology coordinator, then send a topology command, triggering topology state reload on the shutting down node, causing its topology version to get updated, allowing it to send a successful hint before it shuts down and restarts. After it restarts, no more hints will be sent, so the metrics condition we're waiting for (for a hint to be sent) will never become true (metrics are not persisted between restarts). Instead of restarting, reload topology state through the new REST API. This also makes the test a bit faster. Fixes #15285	2023-09-28 11:59:17 +02:00
Kamil Braun	02dd297ba1	test: refactor test_fencing.py - use `manager.get_cql()` to silence mypy (`manager.cql` is `Optional`) - extract `metrics.lines_by_prefix('scylla_hints_manager_')` to a helper function - when waiting for conditions on metrics, split the condition into safety and liveness part, and fail early if the safety part does not hold - in `exactly_one_hint` send, don't check that `send_errors_metric` is `0` (it won't be after the next commit)	2023-09-28 11:59:17 +02:00
Kamil Braun	060f2de14e	Merge 'Cluster features on raft: new procedure for joining group 0' from Piotr Dulikowski This PR implements a new procedure for joining nodes to group 0, based on the description in the "Cluster features on Raft (v2)" document. This is a continuation of the previous PRs related to cluster features on raft (https://github.com/scylladb/scylladb/pull/14722, https://github.com/scylladb/scylladb/pull/14232), and the last piece necessary to replace cluster feature checks in gossip. Current implementation relies on gossip shadow round to fetch the set of enabled features, determine whether the node supports all of the enabled features, and joins only if it is safe. As we are moving management of cluster features to group 0, we encounter a problem: the contents of group 0 itself may depend on features, hence it is not safe to join it unless we perform the feature check which depends on information in group 0. Hence, we have a dependency cycle. In order to solve this problem, the algorithm for joining group 0 is modified, and verification of features and other parameters is offloaded to an existing node in group 0. Instead of directly asking the discovery leader to unconditionally add the node to the configuration with `GROUP0_MODIFY_CONFIG`, two different RPCs are added: `JOIN_NODE_REQUEST` and `JOIN_NODE_RESPONSE`. The main idea is as follows: - The new node sends `JOIN_NODE_REQUEST` to the discovery leader. It sends a bunch of information describing the node, including supported cluster features. The discovery leader verifies some of the parameters and adds the node in the `none` state to `system.topology`. - The topology coordinator picks up the request for the node to be joined (i.e. the node in `none` state), verifies its properties - including cluster features - and then: - If the node is accepted, the coordinator transitions it to `boostrap`/`replace` state and transitions the topology to `join_group0` state. The node is added to group 0 and then `JOIN_NODE_RESPONSE` is sent to it with information that the node was accepted. - Otherwise, the node is moved to `left` state, told by the coordinator via `JOIN_NODE_RESPONSE` that it was rejected and it shuts down. The procedure is not retryable - if a node fails to do it from start to end and crashes in between, it will not be allowed to retry it with the same host_id - `JOIN_NODE_REQUEST` will fail. The data directory must be cleared before attempting to add it again (so that a new host_id is generated). More details about the procedure and the RPC are described in `topology-over-raft.md`. Fixes: #15152 Closes scylladb/scylladb#15196 * github.com:scylladb/scylladb: tests: mark test_blocked_bootstrap as skipped storage_service: do not check features in shadow round storage_service: remove raft_{boostrap,replace} topology_coordinator: relax the check in enable_features raft_group0: insert replaced node info before server setup storage_service: use join node rpc to join the cluster topology_coordinator: handle joining nodes topology_state_machine: add join_group0 state storage_service: add join node RPC handlers raft: expose current_leader in raft::server storage_service: extract wait_for_live_nodes_timeout constant raft_group0: abstract out node joining handshake storage_service: pass raft_topology_change_enabled on rpc init rpc: add new join handshake verbs docs: document the new join procedure topology_state_machine: add supported_features to replica_state storage_service: check destination host ID in raft verbs group_state_machine: take reference to raft address map raft_group0: expose joined_group0	2023-09-28 11:45:09 +02:00
Pavel Emelyanov	0eb8d1b438	test/object_store: Use REST client from pylib Test cases kick scylla to force keyspaces flush (to have the objects on object store) by hand. Equip the wrapped cluster object with the REST API class instance for convenience The assertion for 200 return status code is dropped, REST client does it behind the scenes Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-28 11:33:00 +03:00
Pavel Emelyanov	4fdf12b1c7	test/pylib: Add flush_keyspace() method to rest client Which does POST /storage_service/keyspace_flush/{ks} Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-28 11:19:04 +03:00
Pavel Emelyanov	9ce99a01d5	test/object_store: Wrap yielded managed cluster Test cases use temporary cluster object which is, in fact, cql cluster. In the future there will be the need to perform more actions on it rather than just querying it with cql client, so wrap the cluster with an extendable object Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-28 11:19:03 +03:00
Botond Dénes	08c0456b88	test/boost/row_cache_test: add test_cache_reader_semaphore_oom_kill Check that the cache reader reacts correctly to semaphore's OOM kill attempt, letting the read to fail, instead of going berserk, trying to reserve more-and-more memory, until the reserve cannot be satisfied.	2023-09-28 04:12:52 -04:00
Piotr Dulikowski	2c17f81f44	tests: mark test_blocked_bootstrap as skipped With the new procedure to join nodes, testing the scenario in `test_blocked_bootstrap` becomes very tricky. To recap, the test does the following: - Starts a 3-node cluster, - Shuts down node 1, - Tries to replace node 1 with node 4, but an error injection is triggered which causes node 4 to fail after it joins group 0. Note that pre-JOIN_NODE handshake, this would only result in node 4 being added to group 0 config, but no modification to the group 0 state itself is being done - the joining node is supposed to write a request to join. - Tries to replace node 1 again with node 5, which should succeed. The bug that this regression test was supposed to check for was that node 5 would try to resolve all IPs of nodes added to group 0 config. Because node 4 shuts down before advertising itself in gossip, the node 5 would get stuck. The new procedure to join group 0 complicates the situation because a request to join is written first to group 0 and only then the topology coordinator modifies the group 0 config. It is possible to add an error injection to the topology coordinator code so that it doesn't change the group 0 state and proceeds with bootstrapping the node, but it will only get stuck trying to add the node. If node 5 tries to join in the meantime, the topology coordinator may switch to it and try to bootstrap it instead, but this is basically a 50% chance because it depends on the order of node 4 and node 5's host IDs in the topology_state_machine struct. It should be possible to fix the test with error recovery, but until then it is marked as skipped.	2023-09-27 15:53:15 +02:00
Avi Kivity	301b0a989a	Merge ' cql3/prepare_context: fix generating pk_indexes for duplicate named bind variables' from Jan Ciołek When presented with queries that use the same named bind variables twice, like this one: ```cql SELECT p FROM table WHERE p = :x AND c = :x ``` Scylla generated empty `partition_key_bind_indexes` (`pk_indexes`). `pk_indexes` tell the driver which bind variables it should use to calculate the partition token for a query. Without it, the driver is unable to determine the token and it will send the query to a random node. Scylla should generate pk_indexes which tell the driver that it can use bind variable with `bind_index = 0` to calculate the partition token for this query. The problem was that `_target_columns` keep only a single target_column for each bind variable. In the example above `:x` is compared with both `p` and `c`, but `_target_columns` would contain only one of them, and Scylla wasn't able to tell that this bind variable is compared with a partition key column. To fix it, let's replace `_target_columns` with `_targets`. `_targets` keeps all comparisons between bind variables and other expressions, so none of them will be forgotten/overwritten. A `cql-pytest` reproducer is added. I also added some comments in `prepare_context.hh/cc` to make it easier to read. Fixes: https://github.com/scylladb/scylladb/issues/15374 Closes scylladb/scylladb#15526 * github.com:scylladb/scylladb: cql-pytest/test-prepare: remove xfail marker from *pk_indexes_duplicate_named_variables cql3/prepare_context: fix generating pk_indexes for duplicate named bind variables cql3: improve readability of prepare_context cql-pytest: test generation of pk indexes during PREPARE	2023-09-26 19:47:04 +03:00
Nadav Har'El	9dea20539d	Merge 'Sanitize forward-service shutdown' from Pavel Emelyanov There's a dedicated forward_service::shutdown() method that's defer-scheduled in main for very early invocation. That's not nice, the fwd service start-shutdown-stop sequence can be made "canonical" by moving the shutting down code into abort source subscription. Similar thing was done for view updates generator in `3b95f4f107` refs: #2737 refs: #4384 Closes scylladb/scylladb#15545 * github.com:scylladb/scylladb: forward_service: Remove .shutdown() method forward_service: Set _shutdown in abort-source subscription forward_service: Add abort_source to constructor	2023-09-26 18:36:52 +03:00
Alexander Turetskiy	024ba84637	cql3: SELECT CAST column names should match Cassandra's When doing a SELECT CAST(b AS int), Cassandra returns a column named cast(b as int). Currently, Scylla uses a different name - system.castasint(b). For Cassandra compatibility, we should switch to the same name. fixes #14508 Closes scylladb/scylladb#14800	2023-09-26 17:26:14 +03:00
Pavel Emelyanov	b18c54f56c	forward_service: Add abort_source to constructor It will be used by the next patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-26 10:38:26 +03:00
Raphael S. Carvalho	8997fe0625	compaction: Switch to strategy_control::candidates() for regular compaction Now everything is prepared for the switch, let's do it. Now let's wait for ICS to enjoy the set of changes. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:21 -03:00
Raphael S. Carvalho	761a37022f	tests: Prepare sstable_compaction_test for change in compaction_strategy interface Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:21 -03:00
Raphael S. Carvalho	02f1f24f27	compaction: Allow strategy to retrieve candidates either as sstables or runs That's needed for upcoming changes that will allow ICS to efficiently retrieve sstable runs. Next patch will remove candidates from compaction_strategy's interface to retrieve candidates using this one instead. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:21 -03:00
Raphael S. Carvalho	8235889b8a	sstables: tag sstable_run::insert() with nodiscard sstable_run may reject insertion of a sstable if it's going to break the disjoint invariant of the run, but it's important that the caller is aware of it, so it can act on it like generating a new run id for the sstable so it can be inserted in another run. the tag is important to avoid unknown problems in this area. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:21 -03:00
Raphael S. Carvalho	0fe2630d70	sstables: Make all_sstable_runs() more efficient by exposing frozen shared runs Users of all_sstable_runs() don't want to mutate the runs, but rather work with their content. So let's avoid copy and make the intention explicit with the new frozen_sstable_run used as return type for the interface. This will guarantee that ICS will be able to fetch uncompacting runs efficiently. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:20 -03:00
Raphael S. Carvalho	9f6c3369d2	sstables: Simplify sstable_set interface to retrieve runs This interface selects all runs that store at least one of the sstables in the vector. But that's very fragile, to the point that even ICS had to stop using it. A better interface is to return all runs managed by the set and allow compaction manager to do its filtering. We want to use it in ICS to avoid the overhead of rebuilding sstable runs which may be expensive as sorting is performed to guarantee the disjoint invariant. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:04:20 -03:00
Jan Ciolek	649b634c63	cql-pytest/test-prepare: remove xfail marker from *pk_indexes_duplicate_named_variables Issue #15374 has been fixed, so these tests can be enabled. Duplicate bind variable names are now handled correctly. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-09-25 17:19:07 +02:00
Jan Ciolek	f3ecd279f2	cql-pytest: test generation of pk indexes during PREPARE Add some tests that test whether `pk indexes` are generated correctly. When a driver asks to prepare a statement, Scylla's response includes the metadata for this prepared statement. In this metadata there's `pk indexes`, which tells the driver which bind variable values it should use to calculate the partition token. For a query like: SELECT * FROM t WHERE p2 = ? AND p1 = ? AND p3 = ? The correct pk_indexes would be [1, 0, 2], which means "To calculate the token calculate Hash(bind_vars[1] \| bind_vars[0] \| bind_vars[2])". More information is available in the specification: `1959502d8b/doc/native_protocol_v4.spec (L699-L707)` Two tests are marked as xfail because of #15374 - Scylla doesn't correctly handle using the same named variable in multiple places. This will be fixed soon. I couldn't find a good place for these tests, so I created a new file - test_prepare.py. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-09-25 17:12:17 +02:00
Pavel Emelyanov	652153c291	Merge 'populate_keyspace: use datadir' from Benny Halevy Currently the datadir is ignored. Use it to construct the table's base path. Fixes scylladb/scylladb#15418 Closes scylladb/scylladb#15480 * github.com:scylladb/scylladb: distributed_loader: populate_keyspace: access cf by ref distributed_loader: table_populator: use datadir for base_path distributed_loader: populate_keyspace: issue table mark_ready_for_writes after all datadirs are processed distributed_loader: populate_keyspace: fixup indentation distributed_loader: populate_keyspace: iterate over datadirs in the inner loop test: sstable_directory_test: add test_multiple_data_dirs table: init_storage: create upload and staging subdirs on all datadirs	2023-09-25 13:40:50 +03:00
Nadav Har'El	1a5debac5c	test/cql-pytest: cleaner reproducer for spurious static row returned Issue #10357 is about a SELECT with a filter on a regular column which incorrectly returns a static row without regular columns set (so the filter would not have matched). We already have four tests reproducing this issue, but each of them is a small part of a large tests translated from Cassandra, making it hard to understand the scope of this bug. So in this patch we add two new tests, one passing and one xfailing, which clarify the scope of this bug. It turns out that the bug only occurs when a partition has no clustering rows and only has a static row. If the partition does have clustering rows - even if those don't match the filter - the bug doesn't happen. The xfailing test is just two statements long - a single INSERT and a single SELECT Refs #10357. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#15120	2023-09-25 11:01:22 +03:00

1 2 3 4 5 ...

5649 Commits