39757 Commits

Author SHA1 Message Date
Nadav Har'El
92f591dc38 test/cql-pytest: fix test_materialized_view.py to not fail on Cassandra
The test function test_mv_synchronous_updates checks the
synchronous_updates feature, which is a ScyllaDB extension and
doesn't exist in Cassandra. So it should be marked with "scylla_only"
so that it doesn't fail when running the tests on Cassandra.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
301189ee28 test/cql-pytest: fix test_keyspace.py to not fail on Cassandra
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).

When testing some invalid cases of ALTER TABLE, the test required
that you cannot choose SimpleStrategy without specifying a
replication_factor. As explained in Refs #16028, this isn't true
in Cassandra 4.1 and up - it now has a default value for
replication_factor and it's no longer required.

So in this patch we split that part of the test to a separate test
function and mark it scylla_only.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
2b67cd3921 test/cql-pytest: test_guardrail_replication_strategy.py is Scylla-only
The tests in test/cql-pytest/test_guardrail_replication_strategy.py
are for a Scylla-only feature that doesn't exist in Cassandra, so
obviously they all fail on Cassandra. Let's mark them all as
scylla_only.

We use an autouse fixture to automatically mark all tests in this file
as scylla-only, instead of marking each one separately.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
c4d3e08987 test/cql-pytest: partial fix for test_compaction_strategy_validation.py on Cassandra
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).

This patch is only a partial fix - it fixes trivial differences in error
messages, but some potentially-real differences remain so three of the
tests still fail:

1. Trying to set tombstone_threshold to 5.5 is an error in ScyllaDB
   ("must be between 0.0 and 1.0") but allowed in Cassandra.

2. Trying to set bucket_low to 0.0 is an error in ScyllaDB, giving the
   wrong-looking error message "must be between 0.0 and 1.0" (so 0.0 should
   have been fine?!) but allowed in Cassandra.

3. Trying to set timestamp_resolution to SECONDS is an error in ScyllaDB
   ("invalid timestamp resolution SECONDS") but allowed in Cassandra.
   I don't think anybody wants to actually use "SECONDS", but it seems
   legal in Cassandra, so do we need to support it?

The patch also simplifies the test to use cql-pytest's util.py, instead
of cassandra_tests/porting.py. The latter was meant to make porting
existing Cassandra tests easier - not for writing new ones - and made
using a regular expression for testing error messages harder so I
switched to using pytest.raises() whose "match=" accepts a regular
expression.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
8e51ebd8a0 test/cql-pytest: fix test_filtering.py to not fail on Cassandra
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).

It turns out that when the token() function is used with incorrect
parameters (it needs to be passed all partition-key columns), the
error message is different in ScyllaDB and Cassandra. Both are
reasonable error messages, so if we insist on checking the error
message - we should allow both.
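
A minimal sketch of "allowing both" messages with a single regular expression. The two error texts below are hypothetical placeholders, not the actual ScyllaDB or Cassandra messages, which this commit message does not quote:

```python
import re

# Hypothetical error texts, for illustration only.
SCYLLA_STYLE = "Invalid number of arguments in call to function token"
CASSANDRA_STYLE = "Invalid number of arguments for token() function"

# One pattern with an alternation accepts either wording.
BOTH = re.compile(r"in call to function token|for token\(\) function")

def accepted(message: str) -> bool:
    """Return True if the message matches either known wording."""
    return BOTH.search(message) is not None

print(accepted(SCYLLA_STYLE), accepted(CASSANDRA_STYLE), accepted("timeout"))
```

The same alternation can be passed as the "match=" argument of pytest.raises(), which applies it with re.search().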

Also the same test called its second partition-key column "ck". This
is confusing, because we usually use the name "ck" to refer to a clustering
key. So just for clarity, we change this name to "pk2". This is not a
functional change in the test.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Kefu Chai
58f3ced4d6 scylla-gdb: raise if no tasks are found
the "task" fixture is supposed to return a task for the test. if it
fails to do so, the problem is not directly related to the test,
so let's fail early.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16042
2023-11-14 11:12:43 +02:00
Botond Dénes
22381441b0 migration_manager: also reload schema on enabling digest_insensitive_to_expiry
Currently, when said feature is enabled, we recalculate the schema
digest. But this feature also influences how table versions are
calculated, so it has to trigger a recalculation of all table versions,
so that we can guarantee correct versions.
Before, this used to happen by happy accident. Another feature --
table_digest_insensitive_to_expiry -- used to take care of this, by
triggering a table version recalculation. However this feature only takes
effect if digest_insensitive_to_expiry is also enabled. This used to be
the case incidentally, by the time the reload triggered by
table_digest_insensitive_to_expiry ran, digest_insensitive_to_expiry was
already enabled. But this was not guaranteed whatsoever and as we've
recently seen, any change to the feature list, which changes the order
in which features are enabled, can cause this intricate balance to
break.
This patch makes digest_insensitive_to_expiry also kick off a schema
reload, to eliminate our dependence on (unguaranteed) feature order, and
to guarantee that table schemas have a correct version after all features
are enabled. In fact, all schema feature notification handlers now kick
off a full schema reload, to ensure bugs like this don't creep in, in
the future.

Fixes: #16004

Closes scylladb/scylladb#16013
2023-11-13 23:32:20 +02:00
Kamil Braun
d24b305712 Merge 'raft topology: join: do not time out waiting for the node to be joined' from Patryk Jędrzejczak
When a node tries to join the cluster, it asks the topology
coordinator to add them and then waits for the response. The
response is not guaranteed to come back. If the topology
coordinator cannot contact the joining node, it moves the node to
the left state and moves on.

Currently, to handle the case when the response does not come back,
the joining node gives up waiting for it after 3 minutes. However,
it might take more time for the topology coordinator to start
handling the request to join, as it might be working on other tasks
like adding other nodes, performing tablet migrations, etc. In
general, any timeout duration would be unreliable.

Therefore, we get rid of the timeout. From now on, the operator
will be responsible for shutting down the node if the topology
coordinator fails to deliver the rejection.

Additionally, after removing the timeout, we adjust the topology
coordinator. We make it try sending the response (both acceptance
and rejection) only once since we do not care if it fails anymore. We
only need to ensure that the joining node is moved to the left state
if sending fails.

Fixes #15865

Closes scylladb/scylladb#15944

* github.com:scylladb/scylladb:
  raft topology: fix indentation
  raft topology: join: try sending the response only once
  raft topology: join: do not time out waiting for the node to be joined
  group 0: group0_handshaker: add the abort_source parameter to post_server_start
2023-11-13 15:02:27 +01:00
Takuya ASADA
85339d1820 scylla_setup: add warning for CentOS7 default kernel
Since the CentOS7 default kernel is too old, has performance issues and also
has some bugs, the kernel-ml kernel is recommended instead.
Let's check the kernel version in scylla_setup and print a warning if the
kernel is the CentOS7 default one.
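
A rough sketch of such a check, not the code from this patch. It assumes the stock kernel is a 3.10.0 build whose release string carries an ".el7" tag (for example "3.10.0-1160.el7.x86_64"); the function names and the warning text are made up for illustration:

```python
import platform
import re
from typing import Optional

def is_centos7_default_kernel(release: str) -> bool:
    """Guess whether a kernel release string is the stock CentOS 7 kernel.

    Assumption: the stock kernel is 3.10.0-<build>.el7[.<arch>].
    """
    return re.match(r"^3\.10\.0-.*\.el7(\.|$)", release) is not None

def kernel_warning(release: str) -> Optional[str]:
    """Return a warning line for the stock kernel, or None otherwise."""
    if is_centos7_default_kernel(release):
        return f"WARNING: kernel {release} looks like the CentOS 7 default kernel"
    return None

if __name__ == "__main__":
    message = kernel_warning(platform.release())
    if message:
        print(message)
```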

related #7365

Closes scylladb/scylladb#15705
2023-11-13 13:47:06 +02:00
Botond Dénes
2b11a02b67 Merge 'Improvements to gossiper shadow round' from Kamil Braun
Remove `fall_back_to_syn_msg` which is not necessary in newer Scylla versions.
Fix the calculation of `nodes_down` which could count a single node multiple times.
Make shadow round mandatory during bootstrap and replace -- these operations are unsafe to do without checking features first, which are obtained during the shadow round (outside raft-topology mode).
Finally, during node restart, allow the shadow round to be skipped when getting `timeout_error`s from contact points, not only when getting `closed_error`s (during restart it's best-effort anyway, and in general it's impossible to distinguish between a dead node and a partitioned node).
More details in commit messages.
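
To illustrate the `nodes_down` problem: if a counter is incremented once per failed attempt, a node that is retried gets counted several times. The sketch below shows one way to count each node once; it is an illustration in Python with made-up addresses, not the actual gossiper code:

```python
def count_nodes_down(failed_attempts):
    """Count distinct nodes that failed to respond.

    `failed_attempts` holds one address per failed attempt, so a node
    that was retried appears more than once.
    """
    return len(set(failed_attempts))

attempts = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.1"]
per_attempt = len(attempts)             # counts 10.0.0.1 three times
per_node = count_nodes_down(attempts)   # counts every node once
print(per_attempt, per_node)
```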

Ref: https://github.com/scylladb/scylladb/issues/15675

Closes scylladb/scylladb#15941

* github.com:scylladb/scylladb:
  gossiper: do_shadow_round: increment `nodes_down` in case of timeout
  gossiper: do_shadow_round: fix `nodes_down` calculation
  storage_service: make shadow round mandatory during bootstrap/replace
  gossiper: do_shadow_round: remove default value for nodes param
  gossiper: do_shadow_round: remove `fall_back_to_syn_msg`
2023-11-13 13:37:13 +02:00
Botond Dénes
dfd7981fa7 api/storage_service: start/stop native transport in the statement sg
Currently, it is started/stopped in the streaming/maintenance sg, which
is what the API itself runs in.
Starting the native transport in the streaming sg, will lead to severely
degraded performance, as the streaming sg has significantly less
CPU/disk shares and reader concurrency semaphore resources.
Furthermore, it will lead to multi-paged reads possibly switching
between scheduling groups mid-way, triggering an internal error.

To fix, use `with_scheduling_group()` for both starting and stopping
native transport. Technically, it is only strictly necessary for
starting, but I added it for stop as well for consistency.

Also apply the same treatment to RPC (Thrift). Although no one uses it,
best to fix it, just to be on the safe side.

I think we need a more systematic approach for solving this once and for
all, like passing the scheduling group to the protocol server and have
it switch to it internally. This allows the server to always run on the
correct scheduling group, not depending on the caller to remember using
it. However, I think this is best done in a follow-up, to keep this
critical patch small and easily backportable.

Fixes: #15485

Closes scylladb/scylladb#16019
2023-11-13 14:08:01 +03:00
Anna Stuchlik
8a4a8f077a doc: document full support for RBNO
This commit updates the Repair-Based Node
Operations page. In particular:
- Information about RBNO enabled for all
  node operations is added (before 5.4, RBNO
  was enabled for the replace operation, while
  it was experimental for others).
- The content is rewritten to remove redundant
  information about previous versions.

The improvement is part of the 5.4 release.
This commit must be backported to branch-5.4

Closes scylladb/scylladb#16015
2023-11-13 13:06:15 +02:00
Pavel Emelyanov
492b842929 messaging_service: Define metrics domain for client connections
The recent seastar update included RPC metrics (scylladb/seastar#1753). The
reported metrics group sockets together based on their "metrics_domain"
configuration option. This patch makes use of this domain to make scylla
metrics sane.

The domain as this patch defines it includes two strings:

First, the datacenter the server lives in. This is because grouping
metrics for connections to different datacenters makes little sense for
several reasons. For example -- packet delays _will_ differ for local-DC
vs cross-DC traffic and mixing those latencies together is pointless.
Another example -- the amount of traffic may also differ for local- vs
cross-DC connections e.g. because of different usage of encryption and/or
compression.

Second, each verb-idx gets its own domain. That's to be able to tell
e.g. query-related traffic apart from gossiper traffic. For that the
existing isolation cookie is taken as is.

Note that the metrics are _not_ per-server node. So e.g. two gossiper
connections to two different nodes (in one DC) will belong to the same
domain and thus their stats will be summed when reported.
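
The grouping can be pictured with a small Python model. The "dc:cookie" layout of the domain string, the cookie names and the counter name are assumptions made for this example; only the two ingredients (datacenter and isolation cookie) come from the description above:

```python
from collections import defaultdict

def metrics_domain(datacenter: str, isolation_cookie: str) -> str:
    """Build a domain name from the datacenter and the isolation cookie.

    The "dc:cookie" layout is an assumption made for this example.
    """
    return f"{datacenter}:{isolation_cookie}"

def sum_by_domain(connections):
    """Sum a per-connection counter over all connections sharing a domain."""
    totals = defaultdict(int)
    for conn in connections:
        totals[metrics_domain(conn["dc"], conn["cookie"])] += conn["sent_bytes"]
    return dict(totals)

connections = [
    {"node": "node1", "dc": "dc1", "cookie": "gossip", "sent_bytes": 100},
    {"node": "node2", "dc": "dc1", "cookie": "gossip", "sent_bytes": 50},
    {"node": "node3", "dc": "dc2", "cookie": "gossip", "sent_bytes": 70},
    {"node": "node1", "dc": "dc1", "cookie": "statement", "sent_bytes": 900},
]
totals = sum_by_domain(connections)
print(totals)
```

The two gossip connections to node1 and node2 share a DC, so they land in one domain and their counters are summed, while the cross-DC connection to node3 is reported separately.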

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15785
2023-11-13 11:13:20 +01:00
Kefu Chai
efd65aebb2 build: cmake: add check-header target
to have feature parity with `configure.py`. we won't need this
once we migrate to C++20 modules. but before that day comes, we
need to stick with C++ headers.

we generate a rule for each .hh file to create a corresponding
.cc and then compile it, in order to verify the self-containedness of
that header. so the number of rules is quite large. to avoid the
unnecessary overhead, the check-header target is enabled only if the
`Scylla_CHECK_HEADERS` option is enabled.
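
the idea behind the generated .cc can be sketched in Python as follows. the file naming scheme and the header path are made up for this illustration; the real rules are generated by CMake:

```python
def check_source_for(header: str):
    """Return (file name, contents) of a source file that includes only `header`.

    If the header is self-contained, compiling this file succeeds without
    any other include. The naming scheme is hypothetical.
    """
    name = header.replace("/", "_") + ".cc"
    contents = f'#include "{header}"\n'
    return name, contents

name, contents = check_source_for("utils/example.hh")
print(name)
print(contents, end="")
```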

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15913
2023-11-13 10:27:06 +02:00
Avi Kivity
7b08886e8d Update tools/java submodule (dependencies update)
* tools/java 86a200e324...97c490947c (1):
  > Merge 'build: update several dependencies' from Piotr Grabowski

Ref https://github.com/scylladb/scylla-tools-java/issues/348
Ref https://github.com/scylladb/scylla-tools-java/issues/349
Ref https://github.com/scylladb/scylla-tools-java/issues/350
2023-11-12 18:17:04 +02:00
Tomasz Grabiec
457d170078 Merge 'Multishard mutation query test fix misses expectations' from Botond Dénes
There are two tests, test_read_all and test_read_with_partition_row_limits, which assert on every page as well
as at the end that there are no misses whatsoever. This is incorrect, because it is possible that on a given page, not all shards participate and thus there won't be a saved reader on every shard. On the subsequent page, a shard without a reader may produce a miss. This is fine. Refine the asserts to check that we have at most as many misses as
there are shards without readers on them.
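
The refined expectation amounts to an upper bound rather than an exact zero. A small Python sketch of that bound, with hypothetical names (the real test is a C++ Boost test):

```python
def misses_within_bound(misses: int, shards_with_saved_reader: set, shard_count: int) -> bool:
    """Check that cache misses do not exceed the number of shards lacking a saved reader."""
    shards_without_reader = shard_count - len(shards_with_saved_reader)
    return misses <= shards_without_reader

# 4 shards, only shards 0 and 2 saved a reader on the previous page:
print(misses_within_bound(2, {0, 2}, 4))  # two misses are acceptable
print(misses_within_bound(3, {0, 2}, 4))  # a third miss is a real failure
```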

Fixes: https://github.com/scylladb/scylladb/issues/14087

Closes scylladb/scylladb#15806

* github.com:scylladb/scylladb:
  test/boost/multishard_mutation_query_test: fix querier cache misses expectations
  test/lib/test_utils: add require_* variants for all comparators
2023-11-12 13:15:29 +01:00
Benny Halevy
68a7bbe582 compaction_manager: perform_cleanup: ignore condition_variable_timed_out
The polling loop was intended to ignore
`condition_variable_timed_out` and check for progress
using a longer `max_idle_duration` timeout in the loop.
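
The intended loop shape can be sketched in Python. This is an analogy, not the compaction_manager code: in Python a timed-out `Condition.wait()` returns False instead of throwing, and the parameter names other than `max_idle_duration` are invented for the example:

```python
import threading
import time

def wait_for_progress(cond, get_progress, is_done, poll_interval,
                      max_idle_duration, clock=time.monotonic):
    """Wait until is_done(); fail only if progress stalls for max_idle_duration.

    A timed-out wait on the condition variable is not an error here.
    It only means it is time to look at the progress again.
    """
    last_progress = get_progress()
    last_change = clock()
    with cond:
        while not is_done():
            cond.wait(timeout=poll_interval)  # timeout result ignored on purpose
            current = get_progress()
            if current != last_progress:
                last_progress, last_change = current, clock()
            elif clock() - last_change > max_idle_duration:
                raise TimeoutError("no progress for longer than max_idle_duration")
```

The short `poll_interval` only drives how often progress is checked; giving up is governed by the longer `max_idle_duration`.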

Fixes #15669

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#15671
2023-11-12 13:53:51 +02:00
Patryk Jędrzejczak
2d7bfeb3fa raft topology: fix indentation
Broken in the previous commit.
2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak
e94c7cff28 raft topology: join: try sending the response only once
When a node tries to join the cluster, it asks the topology
coordinator to add them and then waits for the response.
In the previous commit, we have made the operator responsible for
shutting down the joining node if the topology coordinator fails
to deliver a response by removing the timeout. In this commit, we
adjust the topology coordinator. We make it try sending the
response (both acceptance and rejection) only once since we do not
care if it fails anymore. We only need to ensure that the joining
node is moved to the left state if sending fails.
2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak
4ffa692cb3 raft topology: join: do not time out waiting for the node to be joined
When a node tries to join the cluster, it asks the topology
coordinator to add them and then waits for the response. The
response is not guaranteed to come back. If the topology
coordinator cannot contact the joining node, it moves the node to
the left state and moves on.

Currently, to handle the case when the response does not come back,
the joining node gives up waiting for it after 3 minutes. However,
it might take more time for the topology coordinator to start
handling the request to join, as it might be working on other tasks
like adding other nodes, performing tablet migrations, etc. In
general, any timeout duration would be unreliable.

Therefore, we get rid of the timeout. From now on, the operator
will be responsible for shutting down the node if the topology
coordinator fails to deliver the rejection.

This change additionally fixes the TODO in
raft_group0::join_group0.
2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak
5f36e1d7f2 group 0: group0_handshaker: add the abort_source parameter to post_server_start
Used in the following commit to enable the clean shutdown of a
node that does not receive the join rejection from the topology
coordinator.
2023-11-10 12:35:38 +01:00
Anna Stuchlik
8d618bbfc6 doc: update cqlsh compatibility with Python
This commit updates the cqlsh compatibility
with Python to Python 3.

In addition it:
- Replaces "Cassandra" with "ScyllaDB" in
  the description of cqlsh.
  The previous description was outdated, as
  we no longer can talk about using cqlsh
  released with Cassandra.
- Replaces occurrences of "Scylla" with "ScyllaDB".
- Adds additional locations of cqlsh (Docker Hub
  and PyPI), as well as the link to the scylla-cqlsh
  repository.

Closes scylladb/scylladb#16016
2023-11-10 09:19:41 +02:00
Avi Kivity
d8bf8f0f43 Merge 'Do not create directories in datadir for S3-backed sstables' from Pavel Emelyanov
After 146e49d0dd (Rewrap keyspace population loop) the datadir layout is no longer needed by the sstables boot-time loader, so directories can finally be omitted for S3-backed keyspaces. Tables of such a keyspace don't touch/remove their datadirs either (snapshots still don't work for S3).

fixes: #13020

Closes scylladb/scylladb#16007

* github.com:scylladb/scylladb:
  test/object_store: Check that keyspace directory doesn't appear
  sstables/storage: Do storage init/destroy based on storage options
  replica/{ks|cf}: Move storage init/destroy to sstables manager
  database: Add get_sstables_manager(bool_class is_system) method
2023-11-09 20:35:13 +02:00
Kamil Braun
3bcee6a981 Revert "Merge 'Change all tests to shut down gracefully on shutdown' from Eliran Sinvani"
This reverts commit 7c7baf71d5.

If `stop_gracefully` times out during test teardown phase, it crashes
the test framework reporting multiple errors, for example:
```
12:35:52  /jenkins/workspace/scylla-master/next/scylla/test/pylib/artifact_registry.py:41: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
12:35:52    self.exit_artifacts = {}
12:35:52  RuntimeWarning: Enable tracemalloc to get the object allocation traceback
12:35:52  Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:52  Traceback (most recent call last):
12:35:52    File "/usr/lib64/python3.11/asyncio/tasks.py", line 500, in wait_for
12:35:52      return fut.result()
12:35:52             ^^^^^^^^^^^^
12:35:52    File "/usr/lib64/python3.11/asyncio/subprocess.py", line 137, in wait
12:35:52      return await self._transport._wait()
12:35:52             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12:35:52    File "/usr/lib64/python3.11/asyncio/base_subprocess.py", line 230, in _wait
12:35:52      return await waiter
12:35:52             ^^^^^^^^^^^^
12:35:52  asyncio.exceptions.CancelledError
12:35:52
12:35:52  The above exception was the direct cause of the following exception:
12:35:52
12:35:52  Traceback (most recent call last):
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 521, in stop_gracefully
12:35:52      await asyncio.wait_for(wait_task, timeout=STOP_TIMEOUT_SECONDS)
12:35:52    File "/usr/lib64/python3.11/asyncio/tasks.py", line 502, in wait_for
12:35:52      raise exceptions.TimeoutError() from exc
12:35:52  TimeoutError
12:35:52
12:35:52  During handling of the above exception, another exception occurred:
12:35:52
12:35:52  Traceback (most recent call last):
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1615, in workaround_python26789
12:35:52      code = await main()
12:35:52             ^^^^^^^^^^^^
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1582, in main
12:35:52      await run_all_tests(signaled, options)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1359, in run_all_tests
12:35:52      await reap(done, pending, signaled)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1342, in reap
12:35:52      result = coro.result()
12:35:52               ^^^^^^^^^^^^^
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 201, in run
12:35:52      await test.run(options)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 957, in run
12:35:52      async with get_cluster_manager(self.mode + '/' + self.uname, self.suite.clusters, test_path) as manager:
12:35:52    File "/usr/lib64/python3.11/contextlib.py", line 211, in __aexit__
12:35:52      await anext(self.gen)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1330, in get_cluster_manager
12:35:52      await manager.stop()
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1024, in stop
12:35:52      await self.clusters.put(self.cluster, is_dirty=True)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/pool.py", line 104, in put
12:35:52      await self.destroy(obj)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 368, in recycle_cluster
12:35:52      await cluster.stop_gracefully()
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 689, in stop_gracefully
12:35:52      await asyncio.gather(*(server.stop_gracefully() for server in self.running.values()))
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 527, in stop_gracefully
12:35:52      raise RuntimeError(
12:35:52  RuntimeError: Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:58  sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.uninstall' was never awaited
12:35:58  sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
```
2023-11-09 12:30:35 +01:00
Gleb Natapov
2dd8152c8b storage_service: topology coordinator: log rollback event before changing node's state
The test for the rollback relies on the log message being there after the
operation fails, but if the node's state is changed before the message is
logged, the operation may fail before the message is printed.

Fixes scylladb/scylladb#15980

Message-ID: <ZUuwoq65SJcS+yTH@scylladb.com>
2023-11-09 12:11:58 +01:00
Botond Dénes
d8b6771eb8 Merge 'doc: add CQL Reference for Materialized Views and remove irrelevant version information' from Anna Stuchlik
This PR is a follow-up to https://github.com/scylladb/scylladb/pull/15742#issuecomment-1766888218.
It adds CQL Reference for Materialized Views to the Materialized Views page.

In addition, it removes the irrelevant information about when the feature was added and replaces "Scylla" with "ScyllaDB".

(nobackport)

Closes scylladb/scylladb#15855

* github.com:scylladb/scylladb:
  doc: remove versions from Materialized Views
  doc: add CQL Reference for Materialized Views
2023-11-09 10:43:11 +01:00
Botond Dénes
1cccc86813 Revert "Merge 'compaction: abort compaction tasks' from Aleksandra Martyniuk"
This reverts commit 2860d43309, reversing
changes made to a3621dbd3e.

Reverting because rest_api.test_compaction_task started failing after
this was merged.

Fixes: #16005
2023-11-09 10:43:11 +01:00
Eliran Sinvani
c5956957f3 use_statement: Convert an exception to a future exception
The use statement execution code can throw if the keyspace
doesn't exist. This can be a problem for code that uses
execute in a fiber, since the exception will break the fiber even
if `then_wrapped` is used.
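
The difference between throwing and returning a failed future can be shown with a Python analogy. It uses `concurrent.futures.Future` rather than Seastar futures, and the function and keyspace names are hypothetical:

```python
from concurrent.futures import Future

def use_keyspace(known_keyspaces, name) -> Future:
    """Always return a Future; report a missing keyspace through it.

    The caller never sees a synchronous exception, so a continuation
    attached to the returned future always gets a chance to handle it.
    """
    result = Future()
    try:
        if name not in known_keyspaces:
            raise KeyError(name)
        result.set_result(name)
    except Exception as exc:
        result.set_exception(exc)
    return result

failed = use_keyspace({"ks1"}, "missing")    # does not raise here
print(type(failed.exception()).__name__)     # the error travels in the future
print(use_keyspace({"ks1"}, "ks1").result())
```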

Fixes #14449

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>

Closes scylladb/scylladb#14394
2023-11-09 10:43:11 +01:00
Pavel Emelyanov
7e1017c7d8 test/object_store: Check that keyspace directory doesn't appear
When creating an S3-backed keyspace its storage dir shouldn't be made.
Also it shouldn't be "resurrected" by the boot-time loader of existing
keyspaces.

For extra confidence check that the system keyspace's directory does
exist where the test expects keyspaces' directories to appear.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-08 20:23:16 +03:00
Pavel Emelyanov
f6eae191ff sstables/storage: Do storage init/destroy based on storage options
It's only the local storage type that needs directories touched/removed.
S3 storage initialization is a no-op for now, maybe some day soon it will
appear.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-08 20:23:16 +03:00
Pavel Emelyanov
11b704e8b8 replica/{ks|cf}: Move storage init/destroy to sstables manager
It's the manager that knows about storages and it should init/destroy
it. Also the "upload" and "staging" paths are about to be hidden in
sstables/ code, this code move also facilitates that.

The indentation in storage.cc is deliberately broken to make next patch
look nicer (spoiler: it won't have to shift those lines right).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-08 20:23:16 +03:00
Pavel Emelyanov
68cf26587c database: Add get_sstables_manager(bool_class is_system) method
There's one place that does this selection, soon there will appear
another, so it's worth having a convenience helper getter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-08 20:23:16 +03:00
Nadav Har'El
6453f41ca9 Merge 'schema: add whitespaces to values of table options' from Michał Jadwiszczak
Add a space after each colon and comma (if there isn't one already) in the values of table options that are JSON objects (`caching`, `tombstone_gc` and `cdc`).
This improves readability and matches the client-side describe format.
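
A naive Python sketch of the transformation, for illustration only. It is not the code from this PR (which is C++), and it does not treat colons or commas inside quoted strings specially:

```python
import re

def add_spaces(value: str) -> str:
    """Insert a space after every ':' or ',' that is not already followed by whitespace."""
    return re.sub(r"([:,])(?=\S)", r"\1 ", value)

compact = '{"keys":"ALL","rows_per_partition":"ALL"}'
print(add_spaces(compact))
```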

Fixes: #14895

Closes scylladb/scylladb#15900

* github.com:scylladb/scylladb:
  cql-pytest:test_describe: add test for whitespaces in json objects
  schema: add whitespace to description of  table options
2023-11-08 15:26:49 +02:00
Anna Stuchlik
ca0f5f39b5 doc: fix info about Raft in 5.4 upgrade guide
This commit fixes the information about
Raft-based consistent cluster management
in the 5.2-to-5.4 upgrade guide.

This is a follow-up to https://github.com/scylladb/scylladb/pull/15880 and must be backported to branch-5.4.

In addition, it adds information about removing
DateTieredCompactionStrategy to the 5.2-to-5.4
upgrade guide, including the guideline to
migrate to TimeWindowCompactionStrategy.

Closes scylladb/scylladb#15988
2023-11-08 13:21:53 +01:00
Kamil Braun
3036a80334 docs: mention Raft getting enabled when upgrading to 5.4
Fixes: scylladb/scylladb#15952

Closes scylladb/scylladb#16000
2023-11-08 14:18:29 +02:00
Kamil Braun
f094e23d84 system_keyspace: use system memory for system.raft table
`system.raft` was using the "user memory pool", i.e. the
`dirty_memory_manager` for this table was set to
`database::_dirty_memory_manager` (instead of
`database::_system_dirty_memory_manager`).

This meant that if a write workload caused memory pressure on the user
memory pool, internal `system.raft` writes would have to wait for
memtables of user tables to get flushed before the write would proceed.

This was observed in SCT longevity tests which ran a heavy workload on
the cluster and concurrently, schema changes (which underneath use the
`system.raft` table). Raft would often get stuck waiting many seconds
for user memtables to get flushed. More details in issue #15622.
Experiments showed that moving Raft to system memory fixed this
particular issue, bringing the waits to reasonable levels.

Currently `system.raft` stores only one group, group 0, which is
internally used for cluster metadata operations (schema and topology
changes) -- so it makes sense for it to use system memory.

In the future we'd like to have other groups, for strongly consistent
tables. These groups should use the user memory pool. It means we won't
be able to use `system.raft` for them -- we'll just have to use a
separate table.

Fixes: scylladb/scylladb#15622

Closes scylladb/scylladb#15972
2023-11-08 11:21:14 +02:00
Nadav Har'El
284534f489 Merge 'Nodetool additional commands 4/N' from Botond Dénes
This PR implements the following new nodetool commands:
* snapshot
* drain
* flush
* disableautocompaction
* enableautocompaction

All commands come with tests and all tests pass with both the new and the current nodetool implementations.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#15939

* github.com:scylladb/scylladb:
  test/nodetool: add README.md
  tools/scylla-nodetool: implement enableautocompaction command
  tools/scylla-nodetool: implement disableautocompaction command
  tools/scylla-nodetool: implement the flush command
  tools/scylla-nodetool: extract keyspace/table parsing
  tools/scylla-nodetool: implement the drain command
  tools/scylla-nodetool: implement the snapshot command
  test/nodetool: add support for matching approximate query parameters
  utils/http: make dns_connection_factory::initialize() static
2023-11-08 11:18:35 +02:00
Kefu Chai
cf70970226 build: cmake: use $<CONFIG:cfgs> when appropriate
since CMake 3.19, we are able to use $<CONFIG:cfgs> instead of
the more cumbersome $<IN_LIST:$<CONFIG>,foo;bar> expression for
checking if a config is in a list of configurations.
and since the minimal required CMake of scylla is 3.27, let's
use $<CONFIG:cfgs> when possible.

see also https://cmake.org/cmake/help/git-stage/manual/cmake-generator-expressions.7.html#configuration-expressions

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15989
2023-11-08 08:50:44 +02:00
Nadav Har'El
3729ea8bfd cql-pytest: translate Cassandra's test for CREATE operations
This is a translation of Cassandra's CQL unit test source file
validation/operations/CreateTest.java into our cql-pytest framework.

The 15 tests did not reproduce any previously-unknown bug, but did provide
additional reproducers for several known issues:

Refs #6442: Always print all schema parameters (including default values)
Refs #8001: Documented unit "µs" not supported for assigning a duration
            type.
Refs #8892: Add an option for default RF for new keyspaces.
Refs #8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
            for compression settings by default

Unfortunately, I also had to comment out - and not translate - several
tests which weren't real "CQL tests" (tests that use only the CQL driver),
and instead relied on Cassandra's Java implementation details:

1. Tests for CREATE TRIGGER were commented out because testing them
   in Cassandra requires adding a Java class for the test. We're also
   not likely to ever add this feature to Scylla (Refs #2205).

2. Similarly, tests for CEP-11 (Pluggable memtable implementations)
   used internal Java APIs instead of CQL, and it is also unlikely
   we'll ever implement it in a way compatible with Cassandra because
   of its Java reliance.

3. One test for data center names used internal Cassandra Java APIs, not
   CQL to create mock data centers and snitches.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#15791
2023-11-08 08:46:27 +02:00
Botond Dénes
2860d43309 Merge 'compaction: abort compaction tasks' from Aleksandra Martyniuk
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.

Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.

Closes scylladb/scylladb#15083

* github.com:scylladb/scylladb:
  test: test abort of compaction task that isn't started yet
  test: test running compaction task abort
  tasks: fail if a task was aborted
  compaction: abort task manager compaction tasks
2023-11-08 08:45:16 +02:00
Nadav Har'El
a3621dbd3e Merge 'Alternator: Support new ReturnValuesOnConditionCheckFailure feature' from Marcin Maliszkiewicz
alternator: add support for ReturnValuesOnConditionCheckFailure feature

As announced in https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-dynamodb-cost-failed-conditional-writes/, DynamoDB added a new option for write operations (PutItem, UpdateItem, or DeleteItem), ReturnValuesOnConditionCheckFailure, which if set to ALL_OLD returns the current value of the item - but only if a condition check failed.
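
To make the semantics concrete, here is a minimal in-memory sketch in Python. It is a toy model, not Alternator's implementation and not an AWS SDK call; `ToyTable`, `ConditionalCheckFailed`, and the callable `condition` parameter are hypothetical. Only the option name and its `ALL_OLD` / `NONE` values come from the DynamoDB API.

```python
from typing import Any, Callable, Dict, Optional

Item = Dict[str, Any]


class ConditionalCheckFailed(Exception):
    """Hypothetical error type; carries the old item only when requested."""

    def __init__(self, item: Optional[Item] = None) -> None:
        super().__init__("The conditional request failed")
        self.item = item


class ToyTable:
    """Toy key-value table used only to illustrate the option's behavior."""

    def __init__(self) -> None:
        self._items: Dict[str, Item] = {}

    def put_item(
        self,
        key: str,
        item: Item,
        condition: Callable[[Optional[Item]], bool],
        return_values_on_condition_check_failure: str = "NONE",
    ) -> None:
        current = self._items.get(key)
        if not condition(current):
            # The current item is attached only on failure, and only
            # when the caller asked for ALL_OLD.
            want_old = return_values_on_condition_check_failure == "ALL_OLD"
            old = dict(current) if (want_old and current is not None) else None
            raise ConditionalCheckFailed(old)
        self._items[key] = dict(item)
```

A caller that passes `"ALL_OLD"` can read the conflicting item from the exception instead of issuing a separate read after the failed write.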

Fixes https://github.com/scylladb/scylladb/issues/14481

Closes scylladb/scylladb#15125

* github.com:scylladb/scylladb:
  alternator: add support for ReturnValuesOnConditionCheckFailure feature
  alternator: add ability to send additional fields in api_error
2023-11-07 23:19:51 +02:00
Takuya ASADA
a4aeef2eb0 scylla_util.py: run apt-get update before apt-get install if it necessary
Unlike yum, "apt-get install" may fail because the package cache is outdated.
Let's check the package cache mtime and run "apt-get update" if it's too old.
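
The mtime check could look roughly like the sketch below. It is not the code from scylla_util.py: the cache path, the 24-hour threshold, and both function names are assumptions made for illustration, and the helper only builds the command lines instead of running them.

```python
import os
import time
from typing import List, Optional

# Assumed values for illustration; not taken from scylla_util.py.
APT_CACHE_PATH = "/var/cache/apt/pkgcache.bin"
MAX_CACHE_AGE_SECONDS = 24 * 60 * 60


def apt_cache_is_stale(
    path: str = APT_CACHE_PATH,
    max_age: float = MAX_CACHE_AGE_SECONDS,
    now: Optional[float] = None,
) -> bool:
    """Return True if the cache file is missing or older than max_age."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # No readable cache file: treat it as stale so an update is run.
        return True
    return now - mtime > max_age


def apt_install_commands(packages: List[str], **kwargs) -> List[List[str]]:
    """Build the command lines to run, without executing anything."""
    commands = []
    if apt_cache_is_stale(**kwargs):
        commands.append(["apt-get", "update"])
    commands.append(["apt-get", "install", "-y", *packages])
    return commands
```

Returning the commands rather than executing them keeps the sketch side-effect free; a real caller would pass each list to something like `subprocess.run`.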

Fixes #4059

Closes scylladb/scylladb#15960
2023-11-07 20:40:16 +02:00
Wojciech Mitros
ab743271f1 test: increase timeout for lua UDF execution
When running on a particularly slow setup, for example on
an ARM machine in debug mode, the execution time of even
a small Lua UDF that we're using in tests may exceed our
default limits.
To avoid timeout errors, the limit in tests is now increased
to a value that won't be exceeded in any reasonable scenario
(for the current set of tested UDFs), while not making the
test take an excessive amount of time in case of an error in
the UDF execution.

Fixes #15977

Closes scylladb/scylladb#15983
2023-11-07 20:28:28 +02:00
Kamil Braun
07e9522d6c Merge 'raft topology: handle abort exceptions better in fence_previous_coordinator' from Piotr Dulikowski
When topology coordinator tries to fence the previous coordinator it
performs a group0 operation. The current topology coordinator might be
aborted in the meantime, which will result in a `raft::request_aborted`
exception being thrown. After the fix to scylladb/scylladb#15728 was
merged, the exception is caught, but then `sleep_abortable` is called
which immediately throws `abort_requested_exception` as it uses the same
abort source as the group0 operation. The `fence_previous_coordinator`
function which does all those things is not supposed to throw
exceptions; if it does, it causes `raft_state_monitor_fiber` to exit,
completely disabling the topology coordinator functionality on that
node.

Modify the code in the following way:

- Catch `abort_requested_exception` thrown from `sleep_abortable` and
  exit the function if it happens. In addition to the described issue,
  it will also handle the case when abort is requested while
  `sleep_abortable` happens,
- Catch `raft::request_aborted` thrown from group0 operation, log the
  exception with lower verbosity and exit the function explicitly.

Finally, wrap both `fence_previous_coordinator` and `run` functions in a
`try` block with `on_fatal_internal_error` in the catch handler in order
to implement the behavior that adding `noexcept` was originally supposed
to introduce.
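
The control flow described above can be sketched as follows. This is a language-neutral illustration written in Python, not the actual C++/Seastar code: the exception classes stand in for `raft::request_aborted` and `abort_requested_exception`, the retry loop is an assumption about the surrounding logic, and the callables passed in (`group0_op`, `sleep_abortable`, `log`, `on_fatal_internal_error`) are hypothetical hooks.

```python
from typing import Callable


class RequestAborted(Exception):
    """Stand-in for raft::request_aborted."""


class AbortRequested(Exception):
    """Stand-in for abort_requested_exception."""


def fence_previous_coordinator(
    group0_op: Callable[[], None],
    sleep_abortable: Callable[[float], None],
    log: Callable[[str], None],
    retry_delay: float = 1.0,
) -> bool:
    """Return True if fencing succeeded, False if it was aborted."""
    while True:
        try:
            group0_op()
            return True
        except RequestAborted as e:
            # An abort is expected during shutdown: log quietly and exit.
            log(f"debug: fencing aborted: {e}")
            return False
        except Exception as e:
            log(f"error: fencing failed, will retry: {e}")
        try:
            sleep_abortable(retry_delay)
        except AbortRequested:
            # The same abort source fired while sleeping: exit, do not throw.
            return False


def run_guarded(body: Callable[[], object],
                on_fatal_internal_error: Callable[[Exception], None]):
    """Wrap a function so that any escaping exception is treated as fatal."""
    try:
        return body()
    except Exception as e:
        on_fatal_internal_error(e)
```

The point of `run_guarded` is that an unexpected exception is reported loudly instead of silently ending the monitoring fiber.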

Fixes: scylladb/scylladb#15747

Closes scylladb/scylladb#15948

* github.com:scylladb/scylladb:
  raft topology: catch and abort on exceptions from topology_coordinator::run
  Revert "storage_service: raft topology: mark topology_coordinator::run function as noexcept"
  raft topology: don't print an error when fencing previous coordinator is aborted
  raft topology: handle abort exceptions from sleeping in fence_previous_coordinator
2023-11-07 17:17:49 +01:00
Botond Dénes
60ea940f9e Merge 'docs: render options with role' from Kefu Chai
This series tries to:

1. Render options with role, so that the options can be cross-referenced and defined.
2. Move the formatting out of the content, so that the representation can be defined in a more flexible way.

Closes scylladb/scylladb#15860

* github.com:scylladb/scylladb:
  docs: add divider using CSS
  docs: extract _clean_description as a filter
  docs: render option with role
  docs: parse source files right into rst
2023-11-07 17:01:22 +02:00
Botond Dénes
3088453a09 test/nodetool: add README.md 2023-11-07 09:49:56 -05:00
Botond Dénes
7ff7cdc86a tools/scylla-nodetool: implement enableautocompaction command 2023-11-07 09:49:56 -05:00
Botond Dénes
0e0401a5c5 tools/scylla-nodetool: implement disableautocompaction command 2023-11-07 09:49:56 -05:00
Botond Dénes
f5083f66f5 tools/scylla-nodetool: implement the flush command 2023-11-07 09:49:56 -05:00
Botond Dénes
f082cc8273 tools/scylla-nodetool: extract keyspace/table parsing
Having to extract 1 keyspace and N tables from the command-line is
proving to be a common pattern among commands. Extract this into a
method, so the boilerplate can be shared. Add a forward-looking
overload as well, which will be used in the next patch.
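
The "1 keyspace and N tables" pattern can be illustrated with a short sketch. The real helper lives in the C++ scylla-nodetool code; the Python function below, its name, and its error message are hypothetical and only show the shape of the shared logic.

```python
from typing import List, Sequence, Tuple


def parse_keyspace_and_tables(positional: Sequence[str]) -> Tuple[str, List[str]]:
    """Split positional arguments into one keyspace and zero or more tables.

    Assumption: the first positional argument is the keyspace and all
    remaining ones are table names within that keyspace.
    """
    if not positional:
        raise ValueError("missing required argument: keyspace")
    keyspace, *tables = positional
    return keyspace, list(tables)
```

For example, `parse_keyspace_and_tables(["ks", "t1", "t2"])` yields the keyspace `"ks"` and the tables `["t1", "t2"]`, while an empty table list would typically mean "all tables in the keyspace".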
2023-11-07 09:49:56 -05:00