Commit Graph

5649 Commits

Author SHA1 Message Date
Avi Kivity
854188a486 Merge 'database, storage_proxy: Reconcile pages with dead rows and partitions incrementally' from Botond Dénes
Currently, mutation query on replica side will not respond with a result which doesn't have at least one live row. This causes problems if there is a lot of dead rows or partitions before we reach a live row, which stem from the fact that resulting reconcilable_result will be large:

1. Large allocations.  Serialization of reconcilable_result causes large allocations for storing result rows in std::deque
2. Reactor stalls. Serialization of reconcilable_result on the replica side and on the coordinator side causes reactor stalls. This impacts not only the query at hand. For 1M dead rows, freezing takes 130ms, unfreezing takes 500ms. Coordinator  does multiple freezes and unfreezes. The reactor stall on the coordinator side is >5s
3. Too large repair mutations. If reconciliation works on large pages, repair may fail due to too large mutation size. 1M dead rows is already too much: Refs https://github.com/scylladb/scylladb/issues/9111.

This patch fixes all of the above by making mutation reads respect the memory accounter's limit for the page size, even for dead rows.

This patch also addresses the problem of client-side timeouts during paging. Reconciling queries processing long strings of tombstones will now properly page tombstones,like regular queries do.

My testing shows that this solution even increases efficiency. I tested with a cluster of 2 nodes, and a table of RF=2. The data layout was as follows (1 partition):
* Node1: 1 live row, 1M dead rows
* Node2: 1M dead rows, 1 live row

This was designed to trigger reconciliation right from the very start of the query.

Before:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 140.0633503ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 66.7195275ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 873.5400742ms, pages: 2, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```

After:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 136.9035122ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 69.5286021ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 162.6239498ms, pages: 100, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```

Non-reconciling queries have almost identical duration (1 few ms changes can be observed between runs). Note how in the after case, the reconciling read also produces 100 pages, vs. just 2 pages in the before case, leading to a much lower duration (less than 1/4 of the before).

Refs https://github.com/scylladb/scylladb/issues/7929
Refs https://github.com/scylladb/scylladb/issues/3672
Refs https://github.com/scylladb/scylladb/issues/7933
Fixes https://github.com/scylladb/scylladb/issues/9111

Closes scylladb/scylladb#15414

* github.com:scylladb/scylladb:
  test/topology_custom: add test_read_repair.py
  replica/mutation_dump: detect end-of-page in range-scans
  tools/scylla-sstable: write: abort parser thread if writing fails
  test/pylib: add REST methods to get node exe and workdir paths
  test/pylib/rest_client: add load_new_sstables, keyspace_{flush,compaction}
  service/storage_proxy: add trace points for the actual read executor type
  service/storage_proxy: add trace points for read-repair
  storage_proxy: Add more trace-level logging to read-repair
  database: Fix accounting of small partitions in mutation query
  database, storage_proxy: Reconcile pages with no live rows incrementally
2023-10-05 22:39:34 +03:00
Michael Huang
75109e9519 cql3: Fix invalid JSON parsing for JSON objects with ASCII keys
For JSON objects represented as map<ascii, int>, don't treat ASCII keys
as a nested JSON string. We were doing that prior to the patch, which
led to parsing errors.

Included the error offset where JSON parsing failed for
rjson::parse related functions to help identify parsing errors
better.

Fixes: #7949

Signed-off-by: Michael Huang <michaelhly@gmail.com>

Closes scylladb/scylladb#15499
2023-10-05 22:26:08 +03:00
Avi Kivity
e600f35d1e Merge 'logalloc, reader_concurrency_semaphore: cooperate on OOM kills' from Botond Dénes
Consider the following code snippet:
```c++
future<> foo() {
    semaphore.consume(1024);
}

future<> bar() {
    return _allocating_section([&] {
        foo();
    });
}
```

If the consumed memory triggers the OOM kill limit, the semaphore will throw `std::bad_alloc`. The allocating section will catch this, bump std reserves and retry the lambda. Bumping the reserves will not do anything to prevent the next call to `consume()` from triggering the kill limit. So this cycle will repeat until std reserves are so large that ensuring the reserve fails. At this point LSA gives up and re-throws the `std::bad_alloc`. Beyond the useless time spent on code that is doomed to fail, this also results in expensive LSA compaction and eviction of the cache (while trying to ensure reserves).
Prevent this situation by throwing a distinct exception type which is derived from `std::bad_alloc`. Allocating section will not retry on seeing this exception.
A test reproducing the bug is also added.

Fixes: #15278

Closes scylladb/scylladb#15581

* github.com:scylladb/scylladb:
  test/boost/row_cache_test: add test_cache_reader_semaphore_oom_kill
  utils/logalloc: handle utils::memory_limit_reached in with_reclaiming_disabled()
  reader_concurrency_semaphore: use utils::memory_limit_reached exception
  utils: add memory_limit_reached exception
2023-10-05 19:47:21 +03:00
Botond Dénes
498e3ec435 Merge 'Remove _schema field from sstable_set' from Piotr Jastrzębski
All `sstable_set_impl` subclasses/implementations already keep a `schema_ptr` so we can make `sstable_set_impl::make_incremental_selector` function return both the selector and the schema that's being used by it.

That way, we can use the returned schema in `sstable_set::make_incremental_selector` function instead of `sstable_set::_schema` field which makes the field unused and allows us to remove it alltogether and reduce the memory footprint of `sstable_set` objects.

Closes scylladb/scylladb#15570

* github.com:scylladb/scylladb:
  sstable_set: Remove unused _schema field
  sstable_set_impl: Return also schema from make_incremental_selector
2023-10-05 11:46:08 +03:00
Botond Dénes
36b00710c1 querier: add more information about the read on semaphore mismatch
Also rephase the messages a bit so they are more uniform.
The goal of this change is to make semaphore mismatches easier to
diagnose, by including the table name and the permit name in the
printout.
While at it, add a test for semaphore mismatch, it didn't have one.

Refs: #15485

Closes scylladb/scylladb#15508
2023-10-05 10:27:53 +03:00
Botond Dénes
19ed3393b3 Merge 'Sanitize tracing start-stop calls' from Pavel Emelyanov
Tracing is one of two global service left out there with its starting and stopping being pretty hairy. In order to de-globalize it and keep its start-stop under control the existing start-stop sequence is worth cleaning. This PR

 * removes create_ , start_ and stop_ wrappers to un-hide the global tracing_instance thing
 * renames tracing::stop() to shutdown() as it's in fact shutdown
 * coroutinizes start/shutdown/stop while at it

Squeezed parts from #14156 that don't reorder start-stop calls

Closes scylladb/scylladb#15611

* github.com:scylladb/scylladb:
  main: Capture local tracing reference to stop tracing
  tracing: Pack testing code
  tracing: Remove stop_tracing() wrapper
  tracing: Remove start_tracing() wrapper
  tracing: Remove create_tracing() wrapper
  tracing: Make shutdown() re-entrable
  tracing: Coroutinize start/shutdown/stop
  tracing: Rename helper's stop() to shutdown()
2023-10-05 10:27:19 +03:00
Michał Chojnowski
83b71ed6b2 row_cache_test: fix test_exception_safety_of_update_from_memtable
The test does (among other things) the following:

1. Create a cache reader with buffer of size 1 and fill the buffer.
2. Update the cache.
3. Check that the reader produces the first mutation as seen before
the update (because the buffer fill should have snapshotted the first
mutation), and produces other mutation as seen after the update.

However, the test is not guaranteed to stop after the update succeeds.
Even during a successful update, an allocation might have failed
(and been retried by an allocation_section), which will cause the
body of with_allocation_failures to run again. On subsequent runs
the last check (the "3." above) fails, because the first mutation
is snapshotted already with the new version.

Fix that.

Closes scylladb/scylladb#15634
2023-10-04 23:42:03 +02:00
Piotr Jastrzebski
9edf6e4653 sstable_set: Remove unused _schema field
Signed-off-by: Piotr Jastrzebski <haaawk@gmail.com>
2023-10-04 18:50:23 +02:00
Avi Kivity
d217c6f7c1 Merge 'tools/nodetool: implement additional commands, part 1/N' from Botond Dénes
The following new commands are implemented:
* disablebackup
* disablebinary
* disablegossip
* enablebackup
* enablebinary
* enablegossip
* gettraceprobability
* help
* settraceprobability
* statusbackup
* statusbinary
* statusgossip
* version

All are associated with tests. All tests (both old and new) pass with both the scylla-native and the cassandra nodetool implementation.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#15593

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement help operation
  tools/scylla-nodetool: implement the traceprobability commands
  tools/scylla-nodetool: implement the gossip commands
  tools/scylla-nodetool: implement the binary commands
  tools/scylla-nodetool: implement backup related commands
  tools/scylla-nodetool: implement version command
  test/nodetool: introduce utils.check_nodetool_fails_with()
  test/nodetool: return stdout of nodetool invokation
  test/nodetool/rest_api_mock.py: fix request param matching
  tools/scylla-nodetool: compact: remove --partition argument
  tools/scylla-nodetool: scylla_rest_client: add support delete method
  tools/scylla-nodetool: get rid of check_json_type()
  tools/scylla-nodetool: log more details for failed requests
  tools/scylla-*: use operation_option for positional options
  tools/utils: add support for operation aliases
2023-10-04 14:33:16 +03:00
Botond Dénes
62cdc36a74 tools/scylla-nodetool: implement help operation
Nodetool considers "help" to be just another operation. So implement it
as such. The usual --help and --help <command> is also supported.
2023-10-04 05:27:09 -04:00
Botond Dénes
1efabca515 tools/scylla-nodetool: implement the traceprobability commands
gettraceprobability and settraceprobability
2023-10-04 05:27:09 -04:00
Botond Dénes
25d41f72c4 tools/scylla-nodetool: implement the gossip commands
disablegossip, enablegossip and statusgossip
2023-10-04 05:27:09 -04:00
Botond Dénes
5bc25dbebe tools/scylla-nodetool: implement the binary commands
disablebinary, enablebinary and statusbinary
2023-10-04 05:27:09 -04:00
Botond Dénes
2ac1705c90 tools/scylla-nodetool: implement backup related commands
disablebackup, enablebackup and statusbackup
2023-10-04 05:27:09 -04:00
Botond Dénes
91e62413c8 tools/scylla-nodetool: implement version command 2023-10-04 05:27:09 -04:00
Botond Dénes
5ad9b1424c test/nodetool: introduce utils.check_nodetool_fails_with()
Checking that nodetool fails with a given message turned out to be a
common pattern, so extract the logic for checking this into a method of
its own. Refactor the existing tests to use it, instead of the
hand-coded equivalent.
2023-10-04 05:27:09 -04:00
Botond Dénes
644d91fe95 test/nodetool: return stdout of nodetool invokation
So the test can inspect it.
2023-10-04 05:09:49 -04:00
Botond Dénes
dd62299355 test/nodetool/rest_api_mock.py: fix request param matching
Turns out expected request params were dropped on the floor, so any
expected param matched any actual params.
2023-10-04 05:09:41 -04:00
Botond Dénes
4f66e0208b tools/scylla-nodetool: compact: remove --partition argument
This argument is not recognized by the current nodetool either. It is
mentioned only in our documentation, but it should be removed from there
too.
2023-10-04 05:08:33 -04:00
Pavel Emelyanov
65b7aa3387 tracing: Pack testing code
There's a finally-chain stopping tracing out there, now it can just use
the deferred stop call and that's it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-03 10:46:47 +03:00
Pavel Emelyanov
61381feaad tracing: Remove start_tracing() wrapper
Callers can make one-like stop with the help of invoke_on_all() overload
that wraps std::invoke

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-03 10:46:47 +03:00
Pavel Emelyanov
89c43f6677 tracing: Remove create_tracing() wrapper
It doesn't make callers' life easier, but hides global tracing instance

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-03 10:46:47 +03:00
Botond Dénes
471e125592 Merge 'Use REST API client in object_store test' from Pavel Emelyanov
The test needs to call flush-keyspace API endpoint and currently it does it by hand. Not very convenient.
Also in the future there will be the need for _background_ API kicking, the currently used requests package cannot do it, while pylib REST API can

Closes scylladb/scylladb#15565

* github.com:scylladb/scylladb:
  test/object_store: Use REST client from pylib
  test/pylib: Add flush_keyspace() method to rest client
  test/object_store: Wrap yielded managed cluster
2023-10-03 08:50:55 +03:00
Benny Halevy
6dc1ac768d cql-pytest/test_select_from_mutation_fragments: disable compaction on test_table
Use NullCompactionStrategy for the test_table fixture
rather than using the `no_autocompaction_context`.

Besides being simpler, as regular compaction just comes in
the way for all tests that use `SELECT MUTATION_FRAGMENTS`

The latter would be problematic when we start run cql-pytest
test cases in parallel rather than in serial since it
will inadvertantly affect other test cases.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#15574
2023-10-02 10:28:59 +03:00
Michael Huang
1640f83fdc raft: Store snapshot update and truncate log atomically
In case the snapshot update fails, we don't truncate commit log.

Fixes scylladb/scylladb#9603

Closes scylladb/scylladb#15540
2023-09-29 17:57:49 +02:00
Botond Dénes
5d8384eff0 Merge 'Fix test_fencing.py::test_fence_hints flakiness' from Kamil Braun
Add a REST API to reload Raft topology state without having to restart a node and use it in `test_fence_hints`. Restarting the node has undesired side effects which cause test flakiness; more details provided in commit messages.

Refactor the test a bit while at it.

Fixes: #15285

Closes scylladb/scylladb#15523

* github.com:scylladb/scylladb:
  test: test_fencing.py: enable hints_manager=trace logs in `test_fence_hints`
  test: test_fencing.py: reload topology through REST API in `test_fence_hints`
  test: refactor test_fencing.py
  api: storage_service: add REST API to reload topology state
2023-09-28 16:30:23 +03:00
Benny Halevy
3709a43ccc cql-pytest.nodetool: no_autocompaction_context: support ks.tbl syntax
Allow disabling auto-compaction for given table(s)
using either the ks.table syntax or ks:table (as the
api suggests).

The first syntax would likely be more common since
the test tables we automatically create are named
as test_keyspace.test_table so we can pass that name
to `no_autocompaction_context` as is.

test_tools.system_scylla_local_sstable_prepared was
modified to disable auto-compaction only only
the `system.scylla_local` table rather than
the whole `system` keyspace, since it only relies
on this table. Plus, it helps test this change :)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#15575
2023-09-28 13:59:48 +03:00
Kamil Braun
53c01b121a test: test_fencing.py: enable hints_manager=trace logs in test_fence_hints
Enable TRACE level logging on the server that's supposed to send the
hints. Should make it easier to debug failures in the future, if any
happen again.
2023-09-28 11:59:17 +02:00
Kamil Braun
706734f76c test: test_fencing.py: reload topology through REST API in test_fence_hints
Restarting a node in order to reload topology may have side effects that
lead to test flakiness. While the node is shutting down, it gives up
leadership. Before it finishes shutting down, another node may become
Raft group 0 leader, then topology coordinator, then send a topology
command, triggering topology state reload on the shutting down node,
causing its topology version to get updated, allowing it to send a
successful hint before it shuts down and restarts. After it restarts, no
more hints will be sent, so the metrics condition we're waiting for (for
a hint to be sent) will never become true (metrics are not persisted
between restarts).

Instead of restarting, reload topology state through the new REST API.
This also makes the test a bit faster.

Fixes #15285
2023-09-28 11:59:17 +02:00
Kamil Braun
02dd297ba1 test: refactor test_fencing.py
- use `manager.get_cql()` to silence mypy (`manager.cql` is `Optional`)
- extract `metrics.lines_by_prefix('scylla_hints_manager_')` to a helper
  function
- when waiting for conditions on metrics, split the condition into
  safety and liveness part, and fail early if the safety part does not
  hold
- in `exactly_one_hint` send, don't check that `send_errors_metric` is
  `0` (it won't be after the next commit)
2023-09-28 11:59:17 +02:00
Kamil Braun
060f2de14e Merge 'Cluster features on raft: new procedure for joining group 0' from Piotr Dulikowski
This PR implements a new procedure for joining nodes to group 0, based on the description in the "Cluster features on Raft (v2)" document. This is a continuation of the previous PRs related to cluster features on raft (https://github.com/scylladb/scylladb/pull/14722, https://github.com/scylladb/scylladb/pull/14232), and the last piece necessary to replace cluster feature checks in gossip.

Current implementation relies on gossip shadow round to fetch the set of enabled features, determine whether the node supports all of the enabled features, and joins only if it is safe. As we are moving management of cluster features to group 0, we encounter a problem: the contents of group 0 itself may depend on features, hence it is not safe to join it unless we perform the feature check which depends on information in group 0. Hence, we have a dependency cycle.

In order to solve this problem, the algorithm for joining group 0 is modified, and verification of features and other parameters is offloaded to an existing node in group 0. Instead of directly asking the discovery leader to unconditionally add the node to the configuration with `GROUP0_MODIFY_CONFIG`, two different RPCs are added: `JOIN_NODE_REQUEST` and `JOIN_NODE_RESPONSE`. The main idea is as follows:

- The new node sends `JOIN_NODE_REQUEST` to the discovery leader. It sends a bunch of information describing the node, including supported cluster features. The discovery leader verifies some of the parameters and adds the node in the `none` state to `system.topology`.
- The topology coordinator picks up the request for the node to be joined (i.e. the node in `none` state), verifies its properties - including cluster features - and then:
	- If the node is accepted, the coordinator transitions it to `boostrap`/`replace` state and transitions the topology to `join_group0` state. The node is added to group 0 and then `JOIN_NODE_RESPONSE` is sent to it with information that the node was accepted.
	- Otherwise, the node is moved to `left` state, told by the coordinator via `JOIN_NODE_RESPONSE` that it was rejected and it shuts down.

The procedure is not retryable - if a node fails to do it from start to end and crashes in between, it will not be allowed to retry it with the same host_id - `JOIN_NODE_REQUEST` will fail. The data directory must be cleared before attempting to add it again (so that a new host_id is generated).

More details about the procedure and the RPC are described in `topology-over-raft.md`.

Fixes: #15152

Closes scylladb/scylladb#15196

* github.com:scylladb/scylladb:
  tests: mark test_blocked_bootstrap as skipped
  storage_service: do not check features in shadow round
  storage_service: remove raft_{boostrap,replace}
  topology_coordinator: relax the check in enable_features
  raft_group0: insert replaced node info before server setup
  storage_service: use join node rpc to join the cluster
  topology_coordinator: handle joining nodes
  topology_state_machine: add join_group0 state
  storage_service: add join node RPC handlers
  raft: expose current_leader in raft::server
  storage_service: extract wait_for_live_nodes_timeout constant
  raft_group0: abstract out node joining handshake
  storage_service: pass raft_topology_change_enabled on rpc init
  rpc: add new join handshake verbs
  docs: document the new join procedure
  topology_state_machine: add supported_features to replica_state
  storage_service: check destination host ID in raft verbs
  group_state_machine: take reference to raft address map
  raft_group0: expose joined_group0
2023-09-28 11:45:09 +02:00
Pavel Emelyanov
0eb8d1b438 test/object_store: Use REST client from pylib
Test cases kick scylla to force keyspaces flush (to have the objects on
object store) by hand. Equip the wrapped cluster object with the REST
API class instance for convenience

The assertion for 200 return status code is dropped, REST client does it
behind the scenes

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-28 11:33:00 +03:00
Pavel Emelyanov
4fdf12b1c7 test/pylib: Add flush_keyspace() method to rest client
Which does POST /storage_service/keyspace_flush/{ks}

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-28 11:19:04 +03:00
Pavel Emelyanov
9ce99a01d5 test/object_store: Wrap yielded managed cluster
Test cases use temporary cluster object which is, in fact, cql cluster.
In the future there will be the need to perform more actions on it
rather than just querying it with cql client, so wrap the cluster with
an extendable object

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-28 11:19:03 +03:00
Botond Dénes
08c0456b88 test/boost/row_cache_test: add test_cache_reader_semaphore_oom_kill
Check that the cache reader reacts correctly to semaphore's OOM kill
attempt, letting the read to fail, instead of going berserk, trying to
reserve more-and-more memory, until the reserve cannot be satisfied.
2023-09-28 04:12:52 -04:00
Piotr Dulikowski
2c17f81f44 tests: mark test_blocked_bootstrap as skipped
With the new procedure to join nodes, testing the scenario in
`test_blocked_bootstrap` becomes very tricky. To recap, the test does
the following:

- Starts a 3-node cluster,
- Shuts down node 1,
- Tries to replace node 1 with node 4, but an error injection is
  triggered which causes node 4 to fail after it joins group 0. Note
  that pre-JOIN_NODE handshake, this would only result in node 4 being
  added to group 0 config, but no modification to the group 0 state
  itself is being done - the joining node is supposed to write a request
  to join.
- Tries to replace node 1 again with node 5, which should succeed.

The bug that this regression test was supposed to check for was that
node 5 would try to resolve all IPs of nodes added to group 0 config.
Because node 4 shuts down before advertising itself in gossip, the node
5 would get stuck.

The new procedure to join group 0 complicates the situation because a
request to join is written first to group 0 and only then the topology
coordinator modifies the group 0 config. It is possible to add an error
injection to the topology coordinator code so that it doesn't change the
group 0 state and proceeds with bootstrapping the node, but it will only
get stuck trying to add the node. If node 5 tries to join in the
meantime, the topology coordinator may switch to it and try to bootstrap
it instead, but this is basically a 50% chance because it depends on the
order of node 4 and node 5's host IDs in the topology_state_machine
struct.

It should be possible to fix the test with error recovery, but until
then it is marked as skipped.
2023-09-27 15:53:15 +02:00
Avi Kivity
301b0a989a Merge ' cql3/prepare_context: fix generating pk_indexes for duplicate named bind variables' from Jan Ciołek
When presented with queries that use the same named bind variables twice, like this one:
```cql
SELECT p FROM table WHERE p = :x AND c = :x
```

Scylla generated empty `partition_key_bind_indexes` (`pk_indexes`).
`pk_indexes` tell the driver which bind variables it should use to calculate the partition token for a query. Without it, the driver is unable to determine the token and it will send the query to a random node.

Scylla should generate pk_indexes which tell the driver that it can use bind variable with `bind_index = 0` to calculate the partition token for this query.

The problem was that `_target_columns` keep only a single target_column for each bind variable.
In the example above `:x` is compared with both `p` and `c`, but `_target_columns` would contain only one of them, and Scylla wasn't able to tell that this bind variable is compared with a partition key column.

To fix it, let's replace `_target_columns` with `_targets`. `_targets` keeps all comparisons
between bind variables and other expressions, so none of them will be forgotten/overwritten.

A `cql-pytest` reproducer is added.

I also added some comments in `prepare_context.hh/cc` to make it easier to read.

Fixes: https://github.com/scylladb/scylladb/issues/15374

Closes scylladb/scylladb#15526

* github.com:scylladb/scylladb:
  cql-pytest/test-prepare: remove xfail marker from *pk_indexes_duplicate_named_variables
  cql3/prepare_context: fix generating pk_indexes for duplicate named bind variables
  cql3: improve readability of prepare_context
  cql-pytest: test generation of pk indexes during PREPARE
2023-09-26 19:47:04 +03:00
Nadav Har'El
9dea20539d Merge 'Sanitize forward-service shutdown' from Pavel Emelyanov
There's a dedicated forward_service::shutdown() method that's defer-scheduled in main for very early invocation. That's not nice, the fwd service start-shutdown-stop sequence can be made "canonical" by moving the shutting down code into abort source subscription. Similar thing was done for view updates generator in 3b95f4f107

refs: #2737
refs: #4384

Closes scylladb/scylladb#15545

* github.com:scylladb/scylladb:
  forward_service: Remove .shutdown() method
  forward_service: Set _shutdown in abort-source subscription
  forward_service: Add abort_source to constructor
2023-09-26 18:36:52 +03:00
Alexander Turetskiy
024ba84637 cql3: SELECT CAST column names should match Cassandra's
When doing a SELECT CAST(b AS int), Cassandra returns a column named
cast(b as int). Currently, Scylla uses a different name -
system.castasint(b). For Cassandra compatibility, we should switch to
the same name.

fixes #14508

Closes scylladb/scylladb#14800
2023-09-26 17:26:14 +03:00
Pavel Emelyanov
b18c54f56c forward_service: Add abort_source to constructor
It will be used by the next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-26 10:38:26 +03:00
Raphael S. Carvalho
8997fe0625 compaction: Switch to strategy_control::candidates() for regular compaction
Now everything is prepared for the switch, let's do it.

Now let's wait for ICS to enjoy the set of changes.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-09-25 17:18:21 -03:00
Raphael S. Carvalho
761a37022f tests: Prepare sstable_compaction_test for change in compaction_strategy interface
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-09-25 17:18:21 -03:00
Raphael S. Carvalho
02f1f24f27 compaction: Allow strategy to retrieve candidates either as sstables or runs
That's needed for upcoming changes that will allow ICS to efficiently
retrieve sstable runs.

Next patch will remove candidates from compaction_strategy's interface
to retrieve candidates using this one instead.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-09-25 17:18:21 -03:00
Raphael S. Carvalho
8235889b8a sstables: tag sstable_run::insert() with nodiscard
sstable_run may reject insertion of a sstable if it's going
to break the disjoint invariant of the run, but it's important
that the caller is aware of it, so it can act on it like
generating a new run id for the sstable so it can be inserted
in another run. the tag is important to avoid unknown
problems in this area.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-09-25 17:18:21 -03:00
Raphael S. Carvalho
0fe2630d70 sstables: Make all_sstable_runs() more efficient by exposing frozen shared runs
Users of all_sstable_runs() don't want to mutate the runs, but rather
work with their content. So let's avoid copy and make the intention
explicit with the new frozen_sstable_run used as return type
for the interface.

This will guarantee that ICS will be able to fetch uncompacting
runs efficiently.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-09-25 17:18:20 -03:00
Raphael S. Carvalho
9f6c3369d2 sstables: Simplify sstable_set interface to retrieve runs
This interface selects all runs that store at least one of the
sstables in the vector.

But that's very fragile, to the point that even ICS had to
stop using it. A better interface is to return all runs
managed by the set and allow compaction manager to do its
filtering.

We want to use it in ICS to avoid the overhead of rebuilding
sstable runs which may be expensive as sorting is performed
to guarantee the disjoint invariant.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-09-25 17:04:20 -03:00
Jan Ciolek
649b634c63 cql-pytest/test-prepare: remove xfail marker from *pk_indexes_duplicate_named_variables
Issue #15374 has been fixed, so these tests can be enabled.
Duplicate bind variable names are now handled correctly.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-09-25 17:19:07 +02:00
Jan Ciolek
f3ecd279f2 cql-pytest: test generation of pk indexes during PREPARE
Add some tests that test whether `pk indexes` are generated correctly.
When a driver asks to prepare a statement, Scylla's response includes
the metadata for this prepared statement.
In this metadata there's `pk indexes`, which tells the driver which
bind variable values it should use to calculate the partition token.

For a query like:
SELECT * FROM t WHERE p2 = ? AND p1 = ? AND p3 = ?

The correct pk_indexes would be [1, 0, 2], which means
"To calculate the token calculate Hash(bind_vars[1] | bind_vars[0] | bind_vars[2])".

More information is available in the specification:
1959502d8b/doc/native_protocol_v4.spec (L699-L707)

Two tests are marked as xfail because of #15374 - Scylla doesn't correctly handle using the same
named variable in multiple places. This will be fixed soon.

I couldn't find a good place for these tests, so I created a new file - test_prepare.py.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-09-25 17:12:17 +02:00
Pavel Emelyanov
652153c291 Merge 'populate_keyspace: use datadir' from Benny Halevy
Currently the datadir is ignored.
Use it to construct the table's base path.

Fixes scylladb/scylladb#15418

Closes scylladb/scylladb#15480

* github.com:scylladb/scylladb:
  distributed_loader: populate_keyspace: access cf by ref
  distributed_loader: table_populator: use datadir for base_path
  distributed_loader: populate_keyspace: issue table mark_ready_for_writes after all datadirs are processed
  distributed_loader: populate_keyspace: fixup indentation
  distributed_loader: populate_keyspace: iterate over datadirs in the inner loop
  test: sstable_directory_test: add test_multiple_data_dirs
  table: init_storage: create upload and staging subdirs on all datadirs
2023-09-25 13:40:50 +03:00
Nadav Har'El
1a5debac5c test/cql-pytest: cleaner reproducer for spurious static row returned
Issue #10357 is about a SELECT with a filter on a regular column which
incorrectly returns a static row without regular columns set (so the
filter would not have matched). We already have four tests reproducing
this issue, but each of them is a small part of a large tests translated
from Cassandra, making it hard to understand the scope of this bug.

So in this patch we add two new tests, one passing and one xfailing,
which clarify the scope of this bug. It turns out that the bug only
occurs when a partition has no clustering rows and only has a static
row. If the partition does have clustering rows - even if those don't
match the filter - the bug doesn't happen. The xfailing test is just
two statements long - a single INSERT and a single SELECT

Refs #10357.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#15120
2023-09-25 11:01:22 +03:00