7150 Commits

Author SHA1 Message Date
Nadav Har'El
edc5bca6b1 alternator: do not allow authentication with a non-"login" role
Alternator allows authentication into the existing CQL roles, but
roles which have the flag "login=false" should be refused in
authentication, and this patch adds the missing check.

The patch also adds a regression test for this feature in the
test/alternator test framework, in a new test file
test/alternator/cql_rbac.py. This test file will later include more
tests of how the CQL RBAC commands (CREATE ROLE, GRANT, REVOKE)
affect authentication and authorization in Alternator.
In particular, these tests need to use not just the DynamoDB API but
also CQL, so this new test file includes the "cql" fixture that allows
us to run CQL commands, to create roles, to retrieve their secret keys,
and so on.

Fixes scylladb/scylladb#19735

Closes scylladb/scylladb#19740
2024-07-24 08:20:23 +02:00
Botond Dénes
84db147c58 Merge 'tasks: introduce virtual tasks' from Aleksandra Martyniuk
Introduce virtual tasks - task manager tasks which cover
cluster-wide operations.

Virtual tasks aren't kept in memory, instead their statuses
are retrieved from associated service when user requests
them with task manager API. From API users' perspective,
virtual tasks behave similarly to regular tasks, but they can
be queried from any node in a cluster.

Virtual tasks cannot have a parent task. They can have
children on each node in a cluster, but do not keep references
to them. So, if a direct child of a virtual task is unregistered
from task manager, it will no longer be shown in parent's
children vector.

virtual_task class corresponds to all virtual tasks in one
group. If users want to list all tasks in a module, a virtual_task
returns all recent supported operations; if they request virtual
task's status - info about the one specified operation is
presented. Time to live, number of tracked operations etc.
depend on the implementation of individual virtual_task.
All virtual_tasks are kept only on shard 0.

Refs: https://github.com/scylladb/scylladb/issues/15852

New feature, no backport needed.

Closes scylladb/scylladb#16374

* github.com:scylladb/scylladb:
  docs: describe virtual tasks
  db: node_ops: filter topology request entries
  test: add a topology suite for testing tasks
  node_ops: service: create streaming tasks
  node_ops: register node_ops_virtual_task in task manager
  service: node_ops: keep node ops module in storage service
  node_ops: implement node_ops_virtual_task methods
  db: service: modify methods to get topology_requests data
  db: service: add request type column to topology_requests
  node_ops: add task manager module and node_ops_virtual_task
  tasks: api: add virtual task support to get_task_status_recursively
  tasks: api: add virtual task support
  tasks: api: add virtual tasks support to get_tasks
  tasks: add task_handler to hide task and virtual_task differences from user
  tasks: modify invoke_on_task
  tasks: implement task_manager::virtual_task::impl::get_children
  tasks: keep virtual tasks in task manager
  tasks: introduce task_manager::virtual_task
2024-07-24 08:34:28 +03:00
Avi Kivity
3c930a61c9 Merge 'test: scylla_cluster: support more test scenarios' from Patryk Jędrzejczak
We modify `ScyllaCluster.server_start` so that it changes seeds of the
starting node to all currently running nodes. This allows writing tests like
```python
s1 = await manager.server_add(start=False)
await manager.server_add()
await manager.server_start(s1.server_id)
```
However, it disallows writing tests that start multiple clusters. To fix this,
we add the `seeds` parameter to `server_start`.

We also improve the logic in `ScyllaCluster.add_server` to allow writing
tests like
```python
await manager.server_add(expected_error="...")
await manager.server_add()
```

This PR only adds improvements to the `test.py` framework, no need
to backport it.

Closes scylladb/scylladb#19847

* github.com:scylladb/scylladb:
  test: scylla_cluster: improve expected_error in add_server
  test: scylla_cluster: support more test scenarios
  test: scylla_cluster: correctly change seeds in server_start
2024-07-23 22:05:31 +03:00
Patryk Jędrzejczak
02ccd2e3af test: scylla_cluster: improve expected_error in add_server
We make two changes:
- we lease the IP address of a node that failed to boot because of
  an expected error,
- we don't log "Cluster ... added ..." when a node fails to boot
  because of an expected error.
2024-07-23 14:35:09 +02:00
Patryk Jędrzejczak
4079cd1a7b test: scylla_cluster: support more test scenarios
Here are some examples of tests that don't work with no initial
nodes, but they should work:

1.
```
await manager.server_add(expected_error="...")
await manager.server_add()
```

2.
```
await manager.servers_add(2, expected_error="...")
await manager.servers_add(2)
```

3.
```
s1 = await manager.server_add(start=False)
await manager.server_start(s1.server_id, expected_error="...")
await manager.server_add()
```

4.
```
[s1, s2] = await manager.servers_add(2, start=False)
await manager.server_start(s1.server_id, expected_error="...")
await manager.server_start(s2.server_id, expected_error="...")
await manager.servers_add(2)
```

5.
```
s1 = await manager.server_add(start=False)
await manager.server_add()
await manager.server_start(s1.server_id)
```

6.
```
[s1, s2] = await manager.servers_add(2, start=False)
await manager.servers_add(2)
await manager.server_start(s1.server_id)
await manager.server_start(s2.server_id)
```

In this patch, we make a few improvements to make tests like the ones
presented above work. I tested all the examples above manually.

From now on, servers receive correct seeds if the first servers added
in the test didn't start or failed to boot.

Also, we remove the assertion preventing the creation of a second
cluster. This assertion failed the tests presented above. We could
weaken it to make these tests pass, but it would require some work.
Moreover, we have tests that intentionally create two clusters.
Therefore, we go for the easiest solution and accept that a single
`ScyllaCluster` may not correspond to a single Scylla cluster.
2024-07-23 14:35:09 +02:00
Patryk Jędrzejczak
e196c1727e test: scylla_cluster: correctly change seeds in server_start
We change seeds in `ScyllaCluster.server_start` to all currently
running nodes. The previous code only pretended that it did it.

After doing this change, writing tests that create multiple clusters
is impossible. To allow it, we add the `seeds` parameter to
`ManagerClient.server_start`. We use it to fix and simplify the only
test that creates two clusters - `test_different_group0_ids`.
2024-07-23 14:35:08 +02:00
Aleksandra Martyniuk
c64cb98bcf db: node_ops: filter topology request entries
system_keyspace::get_topology_request_entries returns entries for
requests which are running or have finished after specified time.

In the task manager, node ops tasks set this time so that they are shown
for task_ttl seconds after they have finished.
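
The filtering rule can be sketched as follows, in Python rather than the project's C++. The record layout (`id`, `end_time`) and the function name are hypothetical; only the rule itself (keep running requests and those that finished within the last task_ttl seconds) comes from the description above.

```python
from typing import Optional, TypedDict


class RequestEntry(TypedDict):
    id: str
    end_time: Optional[float]  # None while the request is still running


def visible_entries(entries: list[RequestEntry], now: float, task_ttl: float) -> list[RequestEntry]:
    """Keep requests that are running or that finished after (now - task_ttl)."""
    cutoff = now - task_ttl
    return [e for e in entries if e["end_time"] is None or e["end_time"] > cutoff]
```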
2024-07-23 13:35:02 +02:00
Aleksandra Martyniuk
36b77c0592 test: add a topology suite for testing tasks
Add topology_tasks test suite for testing task manager's node ops
tasks. Add TaskManagerClient to topology_tasks for easy use of the
task manager REST API.

Write a test for bootstrap, replace, rebuild, decommission and remove
top level tasks using the above.
2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
8e56913fdf service: node_ops: keep node ops module in storage service
Keep task manager node ops module in storage service. It will be
used to create and manage tasks related to topology changes.

The module is created and registered in storage service constructor.
In storage_service::stop() the module is stopped and so all the remaining
tasks would be unregistered immediately after they are finished.
2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
5f7f403a15 tasks: api: add virtual task support
Virtual tasks are supported by get_task_status, abort_task and
wait_task.

Task status returned by get_task_status and wait_task:
- contains task_kind to indicate whether it's virtual (cluster) or
  regular (node) task;
- children list apart from task_id contains node address of the task.
2024-07-23 13:35:01 +02:00
Nadav Har'El
bac7c33313 alternator: fix "/localnodes" to not return nodes still joining
Alternator's "/localnodes" HTTP request is supposed to return the list of
nodes in the local DC to which the user can send requests.

The existing implementation incorrectly used gossiper::is_alive() to check
for which nodes to return - but "alive" nodes include nodes which are still
joining the cluster and not really usable. These nodes can remain in the
JOINING state for a long time while they are copying data, and an attempt
to send requests to them will fail.

The fix for this bug is trivial: change the call to is_alive() to a call
to is_normal().

But the hard part of this patch is the testing:

1. An existing multi-node test for "/localnodes" assumed that right after
   a new node was created, it appears on "/localnodes". But after this
   patch, it may take a bit more time for the bootstrapping to complete
   and the new node to appear in /localnodes - so I had to add a retry loop.

2. I added a test that reproduces the bug fixed here, and verifies its
   fix. The test is in the multi-node topology framework. It adds an
   injection which delays the bootstrap, which leaves a new node in JOINING
   state for a long time. The test then verifies that the new node is
   alive (as checked by the REST API), but is not returned by "/localnodes".

3. The new injection for delaying the bootstrap is unfortunately not
   very pretty - I had to do it in three places because we have several
   code paths of how bootstrap works without repair, with repair, without
   Raft and with Raft - and I wanted to delay all of them.
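
A rough Python sketch of the two ideas above: returning only nodes that have finished joining, and polling until a new node shows up. The state names, `local_nodes` and `wait_for` are illustrative assumptions, not the actual gossiper or test-framework API.

```python
import time
from typing import Callable


def local_nodes(states: dict[str, str]) -> list[str]:
    """Return only nodes in NORMAL state; JOINING nodes are alive but not usable yet."""
    return sorted(addr for addr, state in states.items() if state == "NORMAL")


def wait_for(predicate: Callable[[], bool], timeout: float = 10.0, period: float = 0.01) -> None:
    """Poll predicate until it holds, or raise TimeoutError once the deadline passes."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before the deadline")
        time.sleep(period)
```

A test would call `wait_for(lambda: new_node in local_nodes(states))` instead of asserting immediately after adding the node.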

Fixes #19694.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19725
2024-07-23 13:51:16 +03:00
Kefu Chai
061def001d s3/client: add client::upload_file()
this member function prepares for the backup feature, where the
object to be stored in the object storage is already persisted as a
file on local filesystem. this brings us two benefits:

- with the file, we don't need to accumulate the payloads in memory
  and send them in batch, as we do in upload_sink and in
  upload_jumbo_sink. this puts less pressure on the memory subsystem.
- with the file, we can read multiple parts in parallel if multipart
  upload applies to it, this helps to improve the throughput.

so, this new helper is introduced to help upload an sstable from local
filesystem to the object storage.

Fixes #16287
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-23 14:39:30 +08:00
Aleksandra Martyniuk
dfe3af40ed test: tasks: adjust tests to new wait_task behavior
After c1b2b8cb2c /task_manager/wait_task/
does not unregister tasks anymore.

Delete the check if the task was unregistered from test_task_manager_wait.
Check task status in drain_module_tasks to ensure that the task
is removed from task manager.

Fixes: #19351.

Closes scylladb/scylladb#19834
2024-07-22 18:24:54 +03:00
Nadav Har'El
9eb47b3ef0 Merge 'config: round-trip boolean configuration variables' from Avi Kivity
When you SELECT a boolean from system.config, it reads as true/false, but this isn't accepted
on UPDATE (instead, we accept 1/0). This is surprising and annoying, so accept true/false in
both directions.

Not a regression, so a backport isn't strictly necessary.

Closes scylladb/scylladb#19792

* github.com:scylladb/scylladb:
  config: specialize from-string conversion for bool
  config: wrap boost::lexical_cast<> when converting from strings
2024-07-22 17:53:02 +03:00
Botond Dénes
d3135db457 Merge 'commitlog: Add optional max lifetime parameter to cl instance' from Calle Wilund
If set, any remaining segment that has data older than this threshold will request flushing, regardless of data pressure. I.e., even a system where nothing happens will, after X seconds, flush data to free up the commit log.
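
As an illustration only (Python, not the commitlog's C++), the rule amounts to an age check per segment. `Segment`, `oldest_data_time` and `segments_to_flush` are hypothetical names; treating an unset parameter as "never flush by age" is an assumption based on the word "optional".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    name: str
    oldest_data_time: float  # timestamp (seconds) of the oldest data in the segment


def segments_to_flush(segments: list[Segment], now: float,
                      max_lifetime: Optional[float]) -> list[Segment]:
    """Return segments whose oldest data exceeds max_lifetime, regardless of data pressure."""
    if max_lifetime is None:  # parameter not set: age never triggers a flush
        return []
    return [s for s in segments if now - s.oldest_data_time > max_lifetime]
```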

Related to #15820

The functionality here is to prevent pathological/test cases where a silent system cannot fully process stuff like compaction, GC etc due to things like CL forcing smaller GC windows etc.

Closes scylladb/scylladb#15971

* github.com:scylladb/scylladb:
  commitlog: Make max data lifetime runtime-configurable
  db::config: Expose commitlog_max_data_lifetime_in_s parameter
  commitlog: Add optional max lifetime parameter to cl instance
2024-07-22 17:21:33 +03:00
Botond Dénes
591876b44e Merge 'sstables: do not reload components of unlinked sstables' from Lakshmi Narayanan Sreethar
The SSTable is removed from the reclaimed memory tracking logic only
when its object is deleted. However, there is a risk that the Bloom
filter reloader may attempt to reload the SSTable after it has been
unlinked but before the SSTable object is destroyed. Prevent this by
removing the SSTable from the reclaimed list maintained by the manager
as soon as it is unlinked.

The original logic that updated the memory tracking in
`sstables_manager::deactivate()` is left in place as (a) the variables
have to be updated only when the SSTable object is actually deleted, as
the memory used by the filter is not freed as long as the SSTable is
alive, and (b) the `_reclaimed.erase(*sst)` is still useful during
shutdown, for example, when the SSTable is not unlinked but just
destroyed.
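
A simplified Python model of the bookkeeping described above. The class and its members are hypothetical stand-ins for the manager's state; the point is only that `on_unlink` removes the sstable from the reload candidates early, while the memory accounting is still adjusted in `deactivate`.

```python
class ManagerSketch:
    """Toy model: tracks sstables whose Bloom filter memory was reclaimed."""

    def __init__(self) -> None:
        self._reclaimed: set[str] = set()   # candidates for filter reload
        self._sizes: dict[str, int] = {}    # reclaimed bytes per sstable
        self.total_reclaimed_bytes = 0

    def reclaim_filter(self, sst: str, nbytes: int) -> None:
        self._reclaimed.add(sst)
        self._sizes[sst] = nbytes
        self.total_reclaimed_bytes += nbytes

    def on_unlink(self, sst: str) -> None:
        # Unlinked sstables must never be reloaded, so drop them right away.
        self._reclaimed.discard(sst)

    def deactivate(self, sst: str) -> None:
        # Called when the object is destroyed: update accounting, and erase
        # from the set for sstables that were destroyed without being unlinked.
        self._reclaimed.discard(sst)
        self.total_reclaimed_bytes -= self._sizes.pop(sst, 0)

    def reload_candidates(self) -> list[str]:
        return sorted(self._reclaimed)
```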

Fixes https://github.com/scylladb/scylladb/issues/19722

Closes scylladb/scylladb#19717

* github.com:scylladb/scylladb:
  boost/bloom_filter_test: add testcase to verify unlinked sstables are not reloaded
  sstables: do not reload components of unlinked sstables
  sstables/sstables_manager: introduce on_unlink method
2024-07-22 12:08:25 +03:00
Avi Kivity
358147959e Merge 'keep table directory open for flushing' from Laszlo Ersek
`filesystem_storage` methods frequently call `sync_directory()`, for the sake of flushing (sync'ing) a directory. `sync_directory()` always brackets the sync with open and close, and given that most `sync_directory()` calls target the sstable base directory, those repeated opens and closes are considered wasteful. Rework the `filesystem_storage::_dir` member (from a mere pathname) so that it stands for an `opened_directory` object, which keeps the sstable base directory open, for the purpose of repeated sync'ing.
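
The idea can be sketched in Python on a POSIX system (the real class is C++ and asynchronous; this `OpenedDirectory` is only an analogy, and its method names are assumptions): open the directory once, then fsync the same descriptor as often as needed.

```python
import os


class OpenedDirectory:
    """Keep a directory open so repeated syncs skip the open/close pair."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)  # POSIX only

    def sync(self) -> None:
        # Flush directory metadata (new, renamed or removed entries) to disk.
        os.fsync(self._fd)

    def close(self) -> None:
        if self._fd is not None:
            os.close(self._fd)
            self._fd = None
```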

Resolves #2399.

Closes scylladb/scylladb#19624

* github.com:scylladb/scylladb:
  sstables/storage: synch "dst_dir" more leanly in create_links_common()
  sstables/storage: close previous directory asynchronously upon dir change
  sstables/storage: futurize change_dir_for_test()
  sstables/storage: sync through "opened_directory" in filesystem...::move()
  sstables/storage: sync through "opened_directory" in the "easy" cases
  sstables/storage: introduce "opened_directory" class
2024-07-21 17:07:44 +03:00
Łukasz Paszkowski
781eb7517c api/system: add highest_supported_sstable_format path
The current upgrade dtests rely on a ccm node function,
get_highest_supported_sstable_version(), that looks for
r'Feature (.*)_SSTABLE_FORMAT is enabled' in the log files.

Starting from scylla-6.0, ME_SSTABLE_FORMAT is enabled by default
and there is no cluster feature for it. Thus get_highest_supported_sstable_version()
returns an empty list, resulting in upgrade test failures.
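
To illustrate the failure mode with the regular expression quoted above (the helper name and the sample log lines are made up for this sketch; this is not the ccm implementation):

```python
import re

FEATURE_RE = re.compile(r'Feature (.*)_SSTABLE_FORMAT is enabled')


def formats_seen_in_log(lines: list[str]) -> list[str]:
    """Collect the sstable format names announced as cluster features in a log."""
    found = []
    for line in lines:
        match = FEATURE_RE.search(line)
        if match:
            found.append(match.group(1).lower())
    return found


# Hypothetical log excerpts: an older node announces the feature, a newer one does not.
old_log = ["INFO gossip - Feature MD_SSTABLE_FORMAT is enabled"]
new_log = ["INFO init - starting", "INFO gossip - Feature SOME_OTHER_FEATURE is enabled"]
```

With no matching line, as in `new_log`, the result is an empty list, which is what breaks callers that expect at least one format.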

This change introduces a separate API path that returns the highest
supported sstable format (one of la, mc, md, me) by a scylla node.

Fixes scylladb/scylladb#19772

Backports to 6.0 and 6.1 required. The current upgrade test in dtest
checks scylla upgrades up to version 5.4 only. This patch is a
prerequisite to backport the upgrade tests fix in dtest.

Closes scylladb/scylladb#19787
2024-07-21 17:00:19 +03:00
Avi Kivity
36b57f3432 Merge 'token: inline optimizations' from Benny Halevy
This series contains several optimizations for dht::token
around its comparison functions as well as minimum_token and maximum_token definitions,
by moving them inline into dht/token.hh

This results in a nice improvement in perf-simple-query:
```
==> perf-simple-query.pre <== (21c67a5a64)
         throughput: mean=95774.01 standard-deviation=1129.83 median=96243.64 median-absolute-deviation=1090.08 maximum=96864.09 minimum=94471.19
instructions_per_op: mean=41813.68 standard-deviation=16.27 median=41809.29 median-absolute-deviation=7.02 maximum=41841.64 minimum=41799.41
  cpu_cycles_per_op: mean=22383.19 standard-deviation=331.01 median=22254.53 median-absolute-deviation=332.26 maximum=22744.11 minimum=21996.73

==> perf-simple-query.post.0 <== (token: move ordering operator inline)
         throughput: mean=96350.01 standard-deviation=640.10 median=96228.88 median-absolute-deviation=621.45 maximum=96988.16 minimum=95478.51
instructions_per_op: mean=41627.13 standard-deviation=37.55 median=41627.06 median-absolute-deviation=2.43 maximum=41679.44 minimum=41573.31
  cpu_cycles_per_op: mean=22184.65 standard-deviation=151.03 median=22163.05 median-absolute-deviation=120.83 maximum=22348.49 minimum=21967.30

==> perf-simple-query.post.1 <== (token: operator<=>: optimize the common case)
         throughput: mean=96778.29 standard-deviation=1719.34 median=97021.72 median-absolute-deviation=1059.56 maximum=98300.99 minimum=93893.75
instructions_per_op: mean=41590.25 standard-deviation=5.53 median=41589.50 median-absolute-deviation=4.17 maximum=41598.39 minimum=41584.57
  cpu_cycles_per_op: mean=22135.33 standard-deviation=471.98 median=21969.30 median-absolute-deviation=244.89 maximum=22905.24 minimum=21685.33

==> perf-simple-query.post.3 <== (token: always initialize data member)
         throughput: mean=98264.33 standard-deviation=998.49 median=98533.02 median-absolute-deviation=780.45 maximum=99075.40 minimum=96656.51
instructions_per_op: mean=41657.61 standard-deviation=22.53 median=41648.49 median-absolute-deviation=12.89 maximum=41696.81 minimum=41642.07
  cpu_cycles_per_op: mean=21808.57 standard-deviation=93.63 median=21794.56 median-absolute-deviation=75.41 maximum=21949.46 minimum=21719.55

==> perf-simple-query.post.4 <== (token: constexpr ctors, methods, and minimum/maximum_token)
         throughput: mean=98095.05 standard-deviation=1333.32 median=98930.22 median-absolute-deviation=906.80 maximum=99209.38 minimum=96194.25
instructions_per_op: mean=41572.28 standard-deviation=6.04 median=41574.49 median-absolute-deviation=4.76 maximum=41579.56 minimum=41564.72
  cpu_cycles_per_op: mean=21831.35 standard-deviation=169.56 median=21732.86 median-absolute-deviation=102.93 maximum=22091.66 minimum=21689.63

==> perf-simple-query.post.5 <== (token: initialize non-key tokens with min() value)
         throughput: mean=99502.32 standard-deviation=1003.70 median=99744.03 median-absolute-deviation=388.87 maximum=100482.95 minimum=97813.42
instructions_per_op: mean=41593.48 standard-deviation=17.27 median=41585.25 median-absolute-deviation=8.46 maximum=41619.41 minimum=41575.86
  cpu_cycles_per_op: mean=21545.90 standard-deviation=86.66 median=21578.01 median-absolute-deviation=43.17 maximum=21612.41 minimum=21395.42
```
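
For a quick read of the numbers, the relative change between the first and last runs can be computed from the mean values above. This assumes each `post.N` run includes the earlier patches; the throughput standard deviation is around 1%, so the result is indicative rather than precise.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Mean values copied from the "pre" and "post.5" runs above.
pre = {"throughput": 95774.01, "instructions_per_op": 41813.68, "cpu_cycles_per_op": 22383.19}
post5 = {"throughput": 99502.32, "instructions_per_op": 41593.48, "cpu_cycles_per_op": 21545.90}

for metric in pre:
    print(f"{metric}: {pct_change(pre[metric], post5[metric]):+.2f}%")
```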

Optimization only. No backport required

Closes scylladb/scylladb#19782

* github.com:scylladb/scylladb:
  token: initialize non-key tokens with min() value
  token: make kind-based ctor private
  token: constexpr ctors, methods, and minimum/maximum_token
  token: always initialize data member
  everywhere: use dht::token is_{minimum,maximum}
  token: operator<=>: optimize the common case
  token: move ordering operator inline
  partitioner_test: add more token-level tests
2024-07-21 15:07:36 +03:00
Benny Halevy
9f05072527 token: make kind-based ctor private
Users outside of the token module don't
need to mess with the token::kind.
They can only create key tokens,
never minimum or maximum tokens with a particular
data value.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-07-20 21:21:42 +03:00
Benny Halevy
6806112189 token: constexpr ctors, methods, and minimum/maximum_token
sizeof(dht::token) is only 16 bytes and therefore
it can be passed with 2 registers.

There is no sense in defining minimum_token
and maximum_token out of line, returning a token&
to statically allocated values that require memory
access/copy, while the only call sites that need
to point to the static min/max tokens are in
dht::ring_position_view.
Instead, they can be defined inline as constexpr
functions and return their const values.

Respectively, define token ctors and methods
as constexpr where applicable (and noexcept while at it
where applicable)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-07-20 21:21:42 +03:00
Benny Halevy
7e745d31ed partitioner_test: add more token-level tests
Before changing how minimum and maximum
tokens are represented in memory.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-07-20 21:21:37 +03:00
Kamil Braun
ad68a7f799 Merge 'test: raft: fix the flaky test_raft_recovery_stuck' from Emil Maskovsky
Use the rolling restart to avoid spurious driver reconnects.

This can be eventually reverted once the scylladb/python-driver#295 is fixed.

Fixes scylladb/scylladb#19154

Closes scylladb/scylladb#19771

* github.com:scylladb/scylladb:
  test: raft: fix the flaky `test_raft_recovery_stuck`
  test: raft: code cleanup in `test_raft_recovery_stuck`
2024-07-19 19:34:43 +02:00
Piotr Dulikowski
4571262e46 Merge 'Improve constness of functions schema code' from Marcin Maliszkiewicz
In v4 of scylladb/scylladb#19598 the last commit of the patch was replaced, but this change missed the merge, so it is submitted here as a separate patch.

In the current patch, the original functions class correctly marks methods as const where appropriate, and the instance() method now returns a const object. This ensures protection against accidental modifications, as all changes must go through the change_batch object.

Since the functions_changer class was intended to serve the same purpose, it is now redundant. Therefore, we are reverting the commit that introduced it.

Relates scylladb/scylladb#19153

Closes scylladb/scylladb#19647

* github.com:scylladb/scylladb:
  cql3: functions: replace template with std::function in with_udf_iter()
  cql3: functions: improve functions class constness handling
  Revert "cql3: functions: make modification functions accessible only via batch class"
2024-07-19 19:23:11 +02:00
Emil Maskovsky
9ab25e5cbf test: raft: replace the use of read_barrier work-around
Replaced the old `read_barrier` helper from "test/pylib/util.py"
by the new helper from "test/pylib/rest_client.py" that is calling
the newly introduced direct REST API.

Replaced in all relevant tests and decommissioned the old helper.

Introduced a new helper `get_host_api_address` to retrieve the host API
address - which in some cases can be different from the host address
(e.g. if the RPC address is changed).

Fixes: scylladb/scylladb#19662

Closes scylladb/scylladb#19739
2024-07-19 19:20:44 +02:00
Laszlo Ersek
6711574646 sstables/storage: futurize change_dir_for_test()
Currently change_dir_for_test() is synchronous. Make it return a future,
so that we can use async operations in change_dir_for_test() overrides.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-07-19 15:43:19 +02:00
Piotr Dulikowski
204a479e82 Merge 'db/hints: Test manager::too_many_in_flight_hints_for()' from Dawid Mędrek
In 6e79d64, the behavior of `manager::too_many_in_flight_hints_for()`
was accidentally modified. It remained unnoticed for some time
and was then fixed. In this commit, we add a test verifying that
the concurrency of hints being written to disk is indeed limited
and the limitations are imposed properly.

Refs scylladb/scylladb#17636
Fixes scylladb/scylladb#17660

Closes scylladb/scylladb#19741

* github.com:scylladb/scylladb:
  db/hints: Verify that Scylla limits the concurrency of written hints
  db/hints: Coroutinize `hint_endpoint_manager::store_hint()`
  db/hints: Move a constant value to the TU it's used in
2024-07-19 13:26:34 +02:00
Lakshmi Narayanan Sreethar
0615c8a46b boost/bloom_filter_test: add testcase to verify unlinked sstables are not reloaded
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-19 13:15:57 +05:30
Lakshmi Narayanan Sreethar
31ff69a13c sstables: do not reload components of unlinked sstables
The SSTable is removed from the reclaimed memory tracking logic only
when its object is deleted. However, there is a risk that the Bloom
filter reloader may attempt to reload the SSTable after it has been
unlinked but before the SSTable object is destroyed. Prevent this by
removing the SSTable from the reclaimed list maintained by the manager
as soon as it is unlinked.

The original logic that updated the memory tracking in
`sstables_manager::deactivate()` is left in place as (a) the variables
have to be updated only when the SSTable object is actually deleted, as
the memory used by the filter is not freed as long as the SSTable is
alive, and (b) the `_reclaimed.erase(*sst)` is still useful during
shutdown, for example, when the SSTable is not unlinked but just
destroyed.

Fixes #19722

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-19 13:15:57 +05:30
Avi Kivity
c3b9e64713 Merge 'sstable::open_sstable: pass origin from the writer' from Lakshmi Narayanan Sreethar
Pass origin when opening the sstable from the writer and store it in the
sstable object. This will make the origin available for the entire write
path.

Closes scylladb/scylladb#19721

* github.com:scylladb/scylladb:
  sstables: use _origin in write path
  sstable::open_sstable: pass and store origin
2024-07-18 19:30:32 +03:00
Avi Kivity
926a02451e Merge 'sstables/index_reader: abort reading during shutdown' from Lakshmi Narayanan Sreethar
This PR adds support for aborting index reads from within `index_consume_entry_context::consume_input` when the server is being stopped. The abort source is now propagated down to the `index_consume_entry_context`, making it available for `consume_input` to check if an abort has been requested. If an abort is detected, `consume_input` will throw an exception to stop the index read operation.
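
The pattern is roughly the following, shown in Python with `threading.Event` standing in for the abort source. `consume_index` and `AbortRequested` are hypothetical names, not the sstables API.

```python
import threading
from typing import Iterable, Iterator


class AbortRequested(Exception):
    """Raised when the read is stopped because shutdown was requested."""


def consume_index(entries: Iterable[bytes], abort: threading.Event) -> Iterator[bytes]:
    """Yield index entries, checking the abort flag before each one."""
    for entry in entries:
        if abort.is_set():
            raise AbortRequested("index read aborted")
        yield entry
```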

Closes scylladb/scylladb#19453

* github.com:scylladb/scylladb:
  test/boost: test abort behaviour during index read
  sstables/index_reader: stop consuming index when abort has been requested
  sstables::index_consume_entry_context: store abort_source
  sstable: drop old filter only after the new filter is built during rebuild
  sstables/sstables_manager: store abort_source in sstable_manager
  replica/database: pass abort_source to database constructor
2024-07-18 19:26:22 +03:00
Avi Kivity
0780228aa2 config: specialize from-string conversion for bool
The yaml/json representation for bool is true/false, but boost::lexical_cast
is 1/0. Specialize bool conversion to accept true/false (for yaml/json
compatibility) and 1/0 (for backward compatibility). This provides
round-trip conversion for bool configs in system.config.
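
A minimal sketch of the round-trip rule, in Python rather than the project's C++. `parse_bool` and `format_bool` are hypothetical names, and the exact set of accepted spellings (for example, letter case) is an assumption beyond the true/false and 1/0 forms named above.

```python
def parse_bool(text: str) -> bool:
    """Accept the yaml/json spellings (true/false) and the legacy 1/0 spellings."""
    accepted = {"true": True, "false": False, "1": True, "0": False}
    try:
        return accepted[text]
    except KeyError:
        raise ValueError(f"not a boolean: {text!r}") from None


def format_bool(value: bool) -> str:
    """Render the value the way it reads back, so the output can be fed in again."""
    return "true" if value else "false"
```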
2024-07-18 18:38:22 +03:00
Avi Kivity
47e99f4e04 Merge 'Fix lwt semaphore guard accounting' from Gleb Natapov
Currently the guard does not account correctly for ongoing operation if semaphore acquisition fails. It may signal a semaphore when it is not held.
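
The class of bug can be modelled in Python with a guard that remembers whether acquisition succeeded (a sketch only; `SemaphoreGuard` is a hypothetical name and the real code is Seastar C++):

```python
import threading


class SemaphoreGuard:
    """Release the semaphore on cleanup only if this guard actually acquired it."""

    def __init__(self, sem: threading.Semaphore) -> None:
        self._sem = sem
        self._held = False

    def acquire(self, timeout: float) -> None:
        if not self._sem.acquire(timeout=timeout):
            raise TimeoutError("could not acquire semaphore")
        self._held = True

    def release(self) -> None:
        # Without the _held check, a failed acquire followed by cleanup
        # would signal a semaphore that was never taken.
        if self._held:
            self._sem.release()
            self._held = False
```

Calling `release()` from a `finally` block is then safe whether or not `acquire()` succeeded.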

Should be backported to all supported versions.

Closes scylladb/scylladb#19699

* github.com:scylladb/scylladb:
  test: add test to check that coordinator lwt semaphore continues functioning after locking failures
  paxos: do not signal semaphore if it was not acquired
2024-07-18 14:58:31 +03:00
Dawid Medrek
8b6e887e02 db/hints: Verify that Scylla limits the concurrency of written hints
In 6e79d64, the behavior of `manager::too_many_in_flight_hints_for()`
was accidentally modified. It remained unnoticed for some time
and was then fixed. In this commit, we add a test verifying that
the concurrency of hints being written to disk is indeed limited
and the limitations are imposed properly.
2024-07-18 13:49:29 +02:00
Botond Dénes
8cc99973eb Merge 'Apply sstable io error handler to exceptions generated when opening file' from Calle Wilund
Fixes #19753

SSTable file open provides an `io_error_handler` instance which is applied to a file-wrapper to process any IO errors happening during read/write via the handler in `storage_service`, which in turn will effectively disable the node. However, this is not applied to the actual open operation itself, i.e. any exception generated by the file open call itself will instead just escape to the caller.

This PR adds filtering via the `error_handler` to sstable open + makes `storage_service` "isolate" mechanism non-module-static (thus making it testable) and adds tests to check we exhibit the same behaviour in both cases.
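
In outline, and in Python rather than C++, the change routes failures from the open call itself through the same handler as later read/write errors. `open_with_handler` and the handler signature are assumptions made for this sketch.

```python
from typing import BinaryIO, Callable


def open_with_handler(path: str, handler: Callable[[OSError], None]) -> BinaryIO:
    """Open a file, passing any error from the open call itself to the handler."""
    try:
        return open(path, "rb")
    except OSError as exc:
        handler(exc)  # e.g. record the error and isolate the node
        raise         # the caller still sees the original exception
```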

The main motivation for this issue is the discussion that secondary-level IO issues (i.e. caused by extensions) should trigger the same behaviour as, for example, running out of disk space.

Closes scylladb/scylladb#19766

* github.com:scylladb/scylladb:
  memtable_test: Add test for isolate behaviour on exceptions during flush
  cql_test_env: Expose storage service
  storage_service: Make isolate guard non-static and add test accessor
  sstable: apply error_handler on open exceptions
2024-07-18 08:14:40 +03:00
Avi Kivity
d5af86bd8a test: cql-pytest: config_value_context: remove strange ast.literal_eval call
cql-pytest's config_value_context is used to run a code sequence with
different ScyllaDB configuration applied for a while. When it reads
the original value (in order to restore it later), it applies
ast.literal_eval() to it. This is strange, since the config variable isn't
a Python literal.

It was added in 8c464b2ddb ("guardrails: restrict replication
strategy (RS)"). Presumably, as a workaround for #19604 - it sufficiently
massaged the input we read via SELECT to be acceptable later via UPDATE.

Now that #19604 is fixed, we can remove the call to ast.literal_eval,
but have to fix up the parameters to config_value_context to something
that will be accepted without further massaging.

This is a step towards fixing #15559, where we want to run some tests
with a boolean configuration variable changed, and literal_eval is
transforming the string representation of integers to integers and
confusing the driver.
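
The effect of `ast.literal_eval` on such strings is easy to see in isolation. The `describe` helper below is only for illustration and is not part of the test framework.

```python
import ast


def describe(value: str) -> str:
    """Report what ast.literal_eval does with a config value read as a string."""
    try:
        return type(ast.literal_eval(value)).__name__
    except (ValueError, SyntaxError):
        return "rejected"


# "1" silently becomes an int, while "true" is not a Python literal at all.
for sample in ["1", "true", "'quoted'"]:
    print(sample, "->", describe(sample))
```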

Closes scylladb/scylladb#19696
2024-07-18 08:11:26 +03:00
Calle Wilund
91b1be6736 memtable_test: Add test for isolate behaviour on exceptions during flush
Tests that certain exceptions thrown during flush to sstable do not
crash the node, but do trigger io_error_handler and cause node isolation.
2024-07-17 09:36:28 +00:00
Calle Wilund
f996dfc4fa cql_test_env: Expose storage service
So tests can play with it.
2024-07-17 09:36:28 +00:00
Emil Maskovsky
a89facbc74 test: raft: fix the flaky test_raft_recovery_stuck
Use the rolling restart to avoid spurious driver reconnects.

This can be eventually reverted once the scylladb/python-driver#295 is
fixed.

Fixes scylladb/scylladb#19154
2024-07-17 09:16:06 +02:00
Emil Maskovsky
ef3393bd36 test: raft: code cleanup in test_raft_recovery_stuck
Cleaning up the imports.
2024-07-17 09:09:46 +02:00
Lakshmi Narayanan Sreethar
b762a09dcd sstable::open_sstable: pass and store origin
Pass origin when opening the sstable from the writer and store it in the
sstable object. This will make the origin available for the entire write
path.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-16 20:43:30 +05:30
Lakshmi Narayanan Sreethar
7d0f3ace4a test/boost: test abort behaviour during index read
Added a new boost test, index_reader_test, with a testcase to verify
the abort behaviour during an index read using
index_consume_entry_context.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-16 20:42:50 +05:30
Lakshmi Narayanan Sreethar
6a3e7a5e7a sstables/sstables_manager: store abort_source in sstable_manager
Add a new member that stores the abort_source. This can later be used by
the sstables to check if an abort has been requested. Also implement
sstables_manager::get_abort_source() that returns a const reference to
the abort source.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-16 20:36:06 +05:30
Lakshmi Narayanan Sreethar
e2142974f8 replica/database: pass abort_source to database constructor
This is in preparation for the following patch that adds abort_source
variable to the sstables_manager.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-16 20:36:06 +05:30
Emil Maskovsky
21c67a5a64 test: raft: fix the flaky test_change_ip
The python driver might currently trigger spurious reconnects that cause
`NoHostAvailable` to be thrown, which is not expected.

This patch adds a retry mechanism to the test to skip this failure
if it occurs, as a work-around.

The proper fix is expected to be done in scylladb/python-driver#295.
Once it is fixed there, this work-around can be reverted.
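
A retry work-around of this kind could look roughly like the sketch below. It is not the code from the patch: the `NoHostAvailable` class here is a local stand-in for the driver's exception, and `retry_on_no_host`, `attempts` and `delay` are hypothetical names chosen for illustration.

```python
import time


class NoHostAvailable(Exception):
    """Local stand-in for the driver's exception of the same name."""


def retry_on_no_host(operation, attempts=3, delay=0.0):
    """Call operation(), retrying when NoHostAvailable is raised.

    Re-raises the last error if every attempt fails.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except NoHostAvailable as exc:
            last_error = exc
            time.sleep(delay)
    raise last_error


# Simulated query that fails twice before succeeding.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise NoHostAvailable("spurious reconnect")
    return "ok"


result = retry_on_no_host(flaky_query)
print(result, calls["count"])
```

Bounding the number of attempts keeps a genuine outage visible as a test failure instead of hiding it behind endless retries.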

Fixes: scylladb/scylla#18547

Closes scylladb/scylladb#19759
2024-07-16 15:46:16 +02:00
Gleb Natapov
4178589826 test: add test to check that coordinator lwt semaphore continues functioning after locking failures
2024-07-16 12:32:25 +03:00
Avi Kivity
dde209390f Merge 'sstables: fix some mixups between the writer's schema and the sstable's schema' from Michał Chojnowski
There are two schemas associated with a sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).

It's easy to mix up the two and break something as a result.

The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.

The sstable's schema is used to configure some parameters of the newly created
sstable, such as the bloom filter false positive ratio, or compression.

This series fixes the known mixups between the two — when setting up compression,
and when setting up the bloom filters.

Fixes #16065

The bug is present in all supported versions, so the patch has to be backported to all of them.

Closes scylladb/scylladb#19695

* github.com:scylladb/scylladb:
  sstables/mx/writer: when creating local_compression, use the sstables's schema, not the writer's
  sstables/mx/writer: when creating filter, use the sstables's schema, not the writer's
  sstables: for i_filter downcasts, use dynamic_cast instead of static_cast
2024-07-16 12:17:41 +03:00
Raphael S. Carvalho
c061ec8d1c test: Fix max_ongoing_compaction_test test
```
DEBUG 2024-07-03 00:59:58,291 [shard 0:main] compaction_manager - Compaction task 0x51800002a480 for table tests.3 compaction_group=0 [0x503000062050]: switch_state: none -> pending: pending=2 active=0 done=0 errors=0

DEBUG 2024-07-03 01:00:02,868 [shard 0:main] compaction - Checking droppable sstables in tests.3, candidates=0
DEBUG 2024-07-03 01:00:02,868 [shard 0:main] compaction - time_window_compaction_strategy::newest_bucket:
  now 1720314000000000
  buckets = {
    key=1720314000000000, size=2
    key=1720310400000000, size=2

1720314000000000: GMT: Sunday, July 7, 2024 1:00:00 AM
1720310400000000: GMT: Sunday, July 7, 2024 12:00:00 AM
```

The test failed to complete when run across different clock hours, as it
expected all produced sstables to belong to the same 1h window.
Fix it by reusing timestamps, so the outcome is always consistent.
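
The failure mode can be illustrated with a simplified bucketing function. This is only a sketch that assumes windows are aligned to whole hours of a microsecond timestamp, which matches the two bucket keys in the log above; `window_start` is a hypothetical helper, not the strategy's real code.

```python
HOUR_US = 60 * 60 * 1_000_000  # one hour in microseconds


def window_start(timestamp_us, window_us=HOUR_US):
    """Round a microsecond timestamp down to the start of its window."""
    return timestamp_us - (timestamp_us % window_us)


# Two writes two seconds apart that straddle the 01:00:00 boundary.
before_boundary = 1720313999000000
after_boundary = 1720314001000000

buckets = {window_start(before_boundary), window_start(after_boundary)}
print(sorted(buckets))

# Reusing a single timestamp always yields a single window.
reused = {window_start(before_boundary) for _ in range(4)}
print(len(reused))
```

With fresh wall-clock timestamps the number of windows depends on when the test happens to run; with a reused timestamp it does not.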

Fixes #13280.
Fixes #18564.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#19749
2024-07-16 07:29:10 +03:00
Marcin Maliszkiewicz
85d38e013c cql3: functions: improve functions class constness handling
Declares getters as const methods. Makes the instance() function
return a const object so that it may only be modified via the
change_batch class.
2024-07-15 09:39:20 +02:00
Botond Dénes
53a6ec05ed Merge 'replica: remove rwlock for protecting iteration over storage group map' from Raphael "Raph" Carvalho
rwlock was added to protect iterations against concurrent updates to the map.

the updates can happen when allocating a new tablet replica or removing an old one (tablet cleanup).

the rwlock is very problematic because it can block topology changes: updating token metadata takes the exclusive lock, which is serialized with table-wide ops like split / major / explicit flush (and those can take a long time).

to get rid of the lock, we can copy the storage group map and guard individual groups with a gate (not a problem since map is expected to have a maximum of ~100 elements). so cleanup can close that gate (carefully closed after stopping individual groups such that migrations aren't blocked by long-running ops like major), and ongoing iterations (e.g. triggered by nodetool flush) can skip a group that was closed, as such a group is being migrated out.
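
The copy-and-gate idea can be sketched in a few lines. This is a toy, single-threaded Python analogy and not the Seastar gate API or the actual replica code: `Gate`, `StorageGroup`, `cleanup` and `flush_all` are hypothetical names, and the `before_each` hook only simulates a cleanup that runs while an iteration is in progress.

```python
class Gate:
    """Toy gate: open until closed (an analogy, not Seastar's gate)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class StorageGroup:
    def __init__(self, name):
        self.name = name
        self.gate = Gate()


groups = {i: StorageGroup(f"g{i}") for i in range(3)}


def cleanup(group_id):
    """Simulated tablet cleanup: close the group's gate, then drop it."""
    groups[group_id].gate.close()
    del groups[group_id]


def flush_all(live_groups, before_each=None):
    """Iterate over a copy of the map and skip groups whose gate is closed."""
    flushed = []
    for group_id, group in dict(live_groups).items():
        if before_each is not None:
            before_each(group_id)  # may mutate the live map
        if group.gate.closed:
            continue  # group is being migrated out
        flushed.append(group.name)
    return flushed


def interfere(group_id):
    # While group 0 is being visited, group 1 gets cleaned up.
    if group_id == 0:
        cleanup(1)


flushed = flush_all(groups, before_each=interfere)
print(flushed, sorted(groups))
```

Because the loop walks a copy, removing an entry from the live map does not disturb it, and the closed gate tells the loop which groups to skip.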

Fixes #18821.

```
WRITE
=====

./build/release/scylla perf-simple-query --smp 1 --memory 2G --initial-tablets 10 --tablets --write

- BEFORE

65559.52 tps ( 59.6 allocs/op,  16.4 logallocs/op,  14.3 tasks/op,   52841 insns/op,   30946 cycles/op,        0 errors)
67408.05 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53018 insns/op,   30874 cycles/op,        0 errors)
67714.72 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53026 insns/op,   30881 cycles/op,        0 errors)
67825.57 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53015 insns/op,   30821 cycles/op,        0 errors)
67810.74 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53009 insns/op,   30828 cycles/op,        0 errors)

         throughput: mean=67263.72 standard-deviation=967.40 median=67714.72 median-absolute-deviation=547.02 maximum=67825.57 minimum=65559.52
instructions_per_op: mean=52981.61 standard-deviation=79.09 median=53014.96 median-absolute-deviation=36.54 maximum=53025.79 minimum=52840.56
  cpu_cycles_per_op: mean=30869.90 standard-deviation=50.23 median=30874.06 median-absolute-deviation=42.11 maximum=30945.94 minimum=30820.89

- AFTER
65448.76 tps ( 59.5 allocs/op,  16.4 logallocs/op,  14.3 tasks/op,   52788 insns/op,   31013 cycles/op,        0 errors)
67290.83 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53025 insns/op,   30950 cycles/op,        0 errors)
67646.81 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53025 insns/op,   30909 cycles/op,        0 errors)
67565.90 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53058 insns/op,   30951 cycles/op,        0 errors)
67537.32 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   52983 insns/op,   30963 cycles/op,        0 errors)

         throughput: mean=67097.93 standard-deviation=931.44 median=67537.32 median-absolute-deviation=467.97 maximum=67646.81 minimum=65448.76
instructions_per_op: mean=52975.85 standard-deviation=108.07 median=53024.55 median-absolute-deviation=49.45 maximum=53057.99 minimum=52788.49
  cpu_cycles_per_op: mean=30957.17 standard-deviation=37.43 median=30951.31 median-absolute-deviation=7.51 maximum=31013.01 minimum=30908.62

READ
=====

./build/release/scylla perf-simple-query --smp 1 --memory 2G --initial-tablets 10 --tablets

- BEFORE

79423.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41840 insns/op,   26820 cycles/op,        0 errors)
81076.70 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41837 insns/op,   26583 cycles/op,        0 errors)
80927.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41829 insns/op,   26629 cycles/op,        0 errors)
80539.44 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41841 insns/op,   26735 cycles/op,        0 errors)
80793.10 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41864 insns/op,   26662 cycles/op,        0 errors)

         throughput: mean=80551.99 standard-deviation=661.12 median=80793.10 median-absolute-deviation=375.37 maximum=81076.70 minimum=79423.36
instructions_per_op: mean=41842.20 standard-deviation=13.26 median=41840.14 median-absolute-deviation=5.68 maximum=41864.50 minimum=41829.29
  cpu_cycles_per_op: mean=26685.88 standard-deviation=93.31 median=26662.18 median-absolute-deviation=56.47 maximum=26820.08 minimum=26582.68

- AFTER
79464.70 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41799 insns/op,   26761 cycles/op,        0 errors)
80954.58 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41803 insns/op,   26605 cycles/op,        0 errors)
81160.90 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41811 insns/op,   26555 cycles/op,        0 errors)
81263.10 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41814 insns/op,   26527 cycles/op,        0 errors)
81162.97 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41806 insns/op,   26549 cycles/op,        0 errors)

         throughput: mean=80801.25 standard-deviation=755.54 median=81160.90 median-absolute-deviation=361.72 maximum=81263.10 minimum=79464.70
instructions_per_op: mean=41806.47 standard-deviation=5.85 median=41806.05 median-absolute-deviation=4.05 maximum=41813.86 minimum=41799.36
  cpu_cycles_per_op: mean=26599.22 standard-deviation=94.84 median=26554.54 median-absolute-deviation=50.51 maximum=26761.06 minimum=26527.05
```

Closes scylladb/scylladb#19469

* github.com:scylladb/scylladb:
  replica: remove rwlock for protecting iteration over storage group map
  replica: get rid of fragile compaction group intrusive list
2024-07-12 15:45:36 +03:00