scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 04:06:59 +00:00

Author	SHA1	Message	Date
Nadav Har'El	92f591dc38	test/cql-pytest: fix test_materialized_view.py to not fail on Cassandra The test function test_mv_synchronous_updates checks the synchronous_updates feature, which is a ScyllaDB extension and doesn't exist in Cassandra. So it should be marked with "scylla_only" so that it doesn't fail when running the tests on Cassandra. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-11-14 21:27:12 +02:00
Nadav Har'El	301189ee28	test/cql-pytest: fix test_keyspace.py to not fail on Cassandra Yet another test file in cql-pytest which failed when run on Cassandra (via test/cql-pytest/run-cassandra). When testing some invalid cases of ALTER TABLE, the test required that you cannot choose SimpleStrategy without specifying a replication_factor. As explained in Refs #16028, this isn't true in Cassandra 4.1 and up - it now has a default value for replication_factor and it's no longer required. So in this patch we split that part of the test to a separate test function and mark it scylla_only. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-11-14 21:27:12 +02:00
Nadav Har'El	2b67cd3921	test/cql-pytest: test_guardrail_replication_strategy.py is Scylla-only The tests in test/cql-pytest/test_guardrail_replication_strategy.py are for a Scylla-only feature that doesn't exist in Cassandra, so obviously they all fail on Cassandra. Let's mark them all as scylla_only. We use an autouse fixture to automatically mark all tests in this file as scylla-only, instead of marking each one separately. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-11-14 21:27:12 +02:00
Nadav Har'El	c4d3e08987	test/cql-pytest: partial fix for test_compaction_strategy_validation.py on Cassandra Yet another test file in cql-pytest which failed when run on Cassandra (via test/cql-pytest/run-cassandra). This patch is only a partial fix - it fixes trivial differences in error messages, but some potentially-real differences remain so three of the tests still fail: 1. Trying to set tombstone_threshold to 5.5 is an error in ScyllaDB ("must be between 0.0 and 1.0") but allowed in Cassandra. 2. Trying to set bucket_low to 0.0 is an error in ScyllaDB, giving the wrong-looking error message "must be between 0.0 and 1.0" (so 0.0 should have been fine?!) but allowed in Cassandra. 3. Trying to set timestamp_resolution to SECONDS is an error in ScyllaDB ("invalid timestamp resolution SECONDS") but allowed in Cassandra. I don't think anybody wants to actually use "SECONDS", but it seems legal in Cassandra, so do we need to support it? The patch also simplifies the test to use cql-pytest's util.py, instead of cassandra_tests/porting.py. The latter was meant to make porting existing Cassandra tests easier - not for writing new ones - and made using a regular expression for testing error messages harder so I switched to using pytest.raises() whose "match=" accepts a regular expression. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-11-14 21:27:12 +02:00
Nadav Har'El	8e51ebd8a0	test/cql-pytest: fix test_filtering.py to not fail on Cassandra Yet another test file in cql-pytest which failed when run on Cassandra (via test/cql-pytest/run-cassandra). It turns out that when the token() function is used with incorrect parameters (it needs to be passed all partition-key columns), the error message is different in ScyllaDB and Cassandra. Both are reasonable error messages, so if we insist on checking the error message - we should allow both. Also the same test called its second partition-key column "ck". This is confusing, because we usually use the name "ck" to refer to a clustering key. So just for clarity, we change this name to "pk2". This is not a functional change in the test. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-11-14 21:27:12 +02:00
Kefu Chai	58f3ced4d6	scylla-gdb: raise if no tasks are found The "task" fixture is supposed to return a task for the test. If it fails to do so, that is an issue not directly related to the test, so let's fail early. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16042	2023-11-14 11:12:43 +02:00
Botond Dénes	22381441b0	migration_manager: also reload schema on enabling digest_insensitive_to_expiry Currently, when said feature is enabled, we recalculate the schema digest. But this feature also influences how table versions are calculated, so it has to trigger a recalculation of all table versions, so that we can guarantee correct versions. Before, this used to happen by happy accident. Another feature -- table_digest_insensitive_to_expiry -- used to take care of this, by triggering a table version recalculation. However, this feature only takes effect if digest_insensitive_to_expiry is also enabled. This used to be the case incidentally: by the time the reload triggered by table_digest_insensitive_to_expiry ran, digest_insensitive_to_expiry was already enabled. But this was not guaranteed whatsoever and, as we've recently seen, any change to the feature list that changes the order in which features are enabled can break this intricate balance. This patch makes digest_insensitive_to_expiry also kick off a schema reload, to eliminate our dependence on (unguaranteed) feature order, and to guarantee that table schemas have a correct version after all features are enabled. In fact, all schema feature notification handlers now kick off a full schema reload, to ensure bugs like this don't creep in, in the future. Fixes: #16004 Closes scylladb/scylladb#16013	2023-11-13 23:32:20 +02:00
Kamil Braun	d24b305712	Merge 'raft topology: join: do not time out waiting for the node to be joined' from Patryk Jędrzejczak When a node tries to join the cluster, it asks the topology coordinator to add it and then waits for the response. The response is not guaranteed to come back. If the topology coordinator cannot contact the joining node, it moves the node to the left state and moves on. Currently, to handle the case when the response does not come back, the joining node gives up waiting for it after 3 minutes. However, it might take more time for the topology coordinator to start handling the request to join, as it might be working on other tasks like adding other nodes, performing tablet migrations, etc. In general, any timeout duration would be unreliable. Therefore, we get rid of the timeout. From now on, the operator will be responsible for shutting down the node if the topology coordinator fails to deliver the rejection. Additionally, after removing the timeout, we adjust the topology coordinator. We make it try sending the response (both acceptance and rejection) only once since we do not care if it fails anymore. We only need to ensure that the joining node is moved to the left state if sending fails. Fixes #15865 Closes scylladb/scylladb#15944 * github.com:scylladb/scylladb: raft topology: fix indentation raft topology: join: try sending the response only once raft topology: join: do not time out waiting for the node to be joined group 0: group0_handshaker: add the abort_source parameter to post_server_start	2023-11-13 15:02:27 +01:00
Takuya ASADA	85339d1820	scylla_setup: add warning for CentOS7 default kernel Since the CentOS7 default kernel is too old, has performance issues, and also has some bugs, we have been recommending the kernel-ml kernel instead. Let's check the kernel version in scylla_setup and print a warning if the kernel is the CentOS7 default one. related #7365 Closes scylladb/scylladb#15705	2023-11-13 13:47:06 +02:00
Botond Dénes	2b11a02b67	Merge 'Improvements to gossiper shadow round' from Kamil Braun Remove `fall_back_to_syn_msg` which is not necessary in newer Scylla versions. Fix the calculation of `nodes_down` which could count a single node multiple times. Make shadow round mandatory during bootstrap and replace -- these operations are unsafe to do without checking features first, which are obtained during the shadow round (outside raft-topology mode). Finally, during node restart, allow the shadow round to be skipped when getting `timeout_error`s from contact points, not only when getting `closed_error`s (during restart it's best-effort anyway, and in general it's impossible to distinguish between a dead node and a partitioned node). More details in commit messages. Ref: https://github.com/scylladb/scylladb/issues/15675 Closes scylladb/scylladb#15941 * github.com:scylladb/scylladb: gossiper: do_shadow_round: increment `nodes_down` in case of timeout gossiper: do_shadow_round: fix `nodes_down` calculation storage_service: make shadow round mandatory during bootstrap/replace gossiper: do_shadow_round: remove default value for nodes param gossiper: do_shadow_round: remove `fall_back_to_syn_msg`	2023-11-13 13:37:13 +02:00
Botond Dénes	dfd7981fa7	api/storage_service: start/stop native transport in the statement sg Currently, it is started/stopped in the streaming/maintenance sg, which is what the API itself runs in. Starting the native transport in the streaming sg will lead to severely degraded performance, as the streaming sg has significantly fewer CPU/disk shares and reader concurrency semaphore resources. Furthermore, it will lead to multi-paged reads possibly switching between scheduling groups mid-way, triggering an internal error. To fix, use `with_scheduling_group()` for both starting and stopping native transport. Technically, it is only strictly necessary for starting, but I added it for stop as well for consistency. Also apply the same treatment to RPC (Thrift). Although no one uses it, best to fix it, just to be on the safe side. I think we need a more systematic approach for solving this once and for all, like passing the scheduling group to the protocol server and having it switch to it internally. This allows the server to always run on the correct scheduling group, not depending on the caller to remember to use it. However, I think this is best done in a follow-up, to keep this critical patch small and easily backportable. Fixes: #15485 Closes scylladb/scylladb#16019	2023-11-13 14:08:01 +03:00
Anna Stuchlik	8a4a8f077a	doc: document full support for RBNO This commit updates the Repair-Based Node Operations page. In particular: - Information about RBNO enabled for all node operations is added (before 5.4, RBNO was enabled for the replace operation, while it was experimental for others). - The content is rewritten to remove redundant information about previous versions. The improvement is part of the 5.4 release. This commit must be backported to branch-5.4 Closes scylladb/scylladb#16015	2023-11-13 13:06:15 +02:00
Pavel Emelyanov	492b842929	messaging_service: Define metrics domain for client connections A recent seastar update included RPC metrics (scylladb/seastar#1753). The reported metrics group sockets together based on their "metrics_domain" configuration option. This patch makes use of this domain to make scylla metrics sane. The domain as this patch defines it includes two strings: First, the datacenter the server lives in. This is because grouping metrics for connections to different datacenters makes little sense for several reasons. For example -- packet delays _will_ differ for local-DC vs cross-DC traffic and mixing those latencies together is pointless. Another example -- the amount of traffic may also differ for local- vs cross-DC connections e.g. because of different usage of encryption and/or compression. Second, each verb-idx gets its own domain. That's to be able to analyze e.g. query-related traffic separately from gossiper traffic. For that, the existing isolation cookie is taken as is. Note that the metrics are _not_ per server node. So e.g. two gossiper connections to two different nodes (in one DC) will belong to the same domain and thus their stats will be summed when reported. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#15785	2023-11-13 11:13:20 +01:00
Kefu Chai	efd65aebb2	build: cmake: add check-header target to have feature parity with `configure.py`. We won't need this once we migrate to C++20 modules, but until that day comes, we need to stick with C++ headers. We generate a rule for each .hh file to create a corresponding .cc and then compile it, in order to verify that the header is self-contained. The number of rules is therefore quite large, so to avoid unnecessary overhead, the check-header target is enabled only if the `Scylla_CHECK_HEADERS` option is enabled. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15913	2023-11-13 10:27:06 +02:00
Avi Kivity	7b08886e8d	Update tools/java submodule (dependencies update) * tools/java 86a200e324...97c490947c (1): > Merge 'build: update several dependencies' from Piotr Grabowski Ref https://github.com/scylladb/scylla-tools-java/issues/348 Ref https://github.com/scylladb/scylla-tools-java/issues/349 Ref https://github.com/scylladb/scylla-tools-java/issues/350	2023-11-12 18:17:04 +02:00
Tomasz Grabiec	457d170078	Merge 'Multishard mutation query test fix misses expectations' from Botond Dénes There are two tests, test_read_all and test_read_with_partition_row_limits, which assert on every page as well as at the end that there are no misses whatsoever. This is incorrect, because it is possible that on a given page, not all shards participate and thus there won't be a saved reader on every shard. On the subsequent page, a shard without a reader may produce a miss. This is fine. Refine the asserts to check that we have only as many misses as there are shards without readers on them. Fixes: https://github.com/scylladb/scylladb/issues/14087 Closes scylladb/scylladb#15806 * github.com:scylladb/scylladb: test/boost/multishard_mutation_query_test: fix querier cache misses expectations test/lib/test_utils: add require_* variants for all comparators	2023-11-12 13:15:29 +01:00
Benny Halevy	68a7bbe582	compaction_manager: perform_cleanup: ignore condition_variable_timed_out The polling loop was intended to ignore `condition_variable_timed_out` and check for progress using a longer `max_idle_duration` timeout in the loop. Fixes #15669 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#15671	2023-11-12 13:53:51 +02:00
Patryk Jędrzejczak	2d7bfeb3fa	raft topology: fix indentation Broken in the previous commit.	2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak	e94c7cff28	raft topology: join: try sending the response only once When a node tries to join the cluster, it asks the topology coordinator to add it and then waits for the response. In the previous commit, by removing the timeout, we made the operator responsible for shutting down the joining node if the topology coordinator fails to deliver a response. In this commit, we adjust the topology coordinator. We make it try sending the response (both acceptance and rejection) only once since we do not care if it fails anymore. We only need to ensure that the joining node is moved to the left state if sending fails.	2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak	4ffa692cb3	raft topology: join: do not time out waiting for the node to be joined When a node tries to join the cluster, it asks the topology coordinator to add it and then waits for the response. The response is not guaranteed to come back. If the topology coordinator cannot contact the joining node, it moves the node to the left state and moves on. Currently, to handle the case when the response does not come back, the joining node gives up waiting for it after 3 minutes. However, it might take more time for the topology coordinator to start handling the request to join, as it might be working on other tasks like adding other nodes, performing tablet migrations, etc. In general, any timeout duration would be unreliable. Therefore, we get rid of the timeout. From now on, the operator will be responsible for shutting down the node if the topology coordinator fails to deliver the rejection. This change additionally fixes the TODO in raft_group0::join_group0.	2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak	5f36e1d7f2	group 0: group0_handshaker: add the abort_source parameter to post_server_start Used in the following commit to enable the clean shutdown of a node that does not receive the join rejection from the topology coordinator.	2023-11-10 12:35:38 +01:00
Anna Stuchlik	8d618bbfc6	doc: update cqlsh compatibility with Python This commit updates the documented Python compatibility of cqlsh to Python 3. In addition, it: - Replaces "Cassandra" with "ScyllaDB" in the description of cqlsh. The previous description was outdated, as we can no longer talk about using cqlsh released with Cassandra. - Replaces occurrences of "Scylla" with "ScyllaDB". - Adds additional locations of cqlsh (Docker Hub and PyPI), as well as the link to the scylla-cqlsh repository. Closes scylladb/scylladb#16016	2023-11-10 09:19:41 +02:00
Avi Kivity	d8bf8f0f43	Merge 'Do not create directories in datadir for S3-backed sstables' from Pavel Emelyanov After `146e49d0dd` (Rewrap keyspace population loop) the datadir layout is no longer needed by sstables boot-time loader and finally directories can be omitted for S3-backed keyspaces. Tables of that keyspace don't touch/remove their datadirs either (snapshots still don't work for S3) fixes: #13020 Closes scylladb/scylladb#16007 * github.com:scylladb/scylladb: test/object_store: Check that keyspace directory doesn't appear sstables/storage: Do storage init/destroy based on storage options replica/{ks\|cf}: Move storage init/destroy to sstables manager database: Add get_sstables_manager(bool_class is_system) method	2023-11-09 20:35:13 +02:00
Kamil Braun	3bcee6a981	Revert "Merge 'Change all tests to shut down gracefully on shutdown' from Eliran Sinvani" This reverts commit `7c7baf71d5`. If `stop_gracefully` times out during the test teardown phase, it crashes the test framework, reporting multiple errors, for example:
```
12:35:52 /jenkins/workspace/scylla-master/next/scylla/test/pylib/artifact_registry.py:41: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
12:35:52 self.exit_artifacts = {}
12:35:52 RuntimeWarning: Enable tracemalloc to get the object allocation traceback
12:35:52 Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:52 Traceback (most recent call last):
12:35:52 File "/usr/lib64/python3.11/asyncio/tasks.py", line 500, in wait_for
12:35:52 return fut.result()
12:35:52 ^^^^^^^^^^^^
12:35:52 File "/usr/lib64/python3.11/asyncio/subprocess.py", line 137, in wait
12:35:52 return await self._transport._wait()
12:35:52 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12:35:52 File "/usr/lib64/python3.11/asyncio/base_subprocess.py", line 230, in _wait
12:35:52 return await waiter
12:35:52 ^^^^^^^^^^^^
12:35:52 asyncio.exceptions.CancelledError
12:35:52
12:35:52 The above exception was the direct cause of the following exception:
12:35:52
12:35:52 Traceback (most recent call last):
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 521, in stop_gracefully
12:35:52 await asyncio.wait_for(wait_task, timeout=STOP_TIMEOUT_SECONDS)
12:35:52 File "/usr/lib64/python3.11/asyncio/tasks.py", line 502, in wait_for
12:35:52 raise exceptions.TimeoutError() from exc
12:35:52 TimeoutError
12:35:52
12:35:52 During handling of the above exception, another exception occurred:
12:35:52
12:35:52 Traceback (most recent call last):
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1615, in workaround_python26789
12:35:52 code = await main()
12:35:52 ^^^^^^^^^^^^
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1582, in main
12:35:52 await run_all_tests(signaled, options)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1359, in run_all_tests
12:35:52 await reap(done, pending, signaled)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1342, in reap
12:35:52 result = coro.result()
12:35:52 ^^^^^^^^^^^^^
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 201, in run
12:35:52 await test.run(options)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 957, in run
12:35:52 async with get_cluster_manager(self.mode + '/' + self.uname, self.suite.clusters, test_path) as manager:
12:35:52 File "/usr/lib64/python3.11/contextlib.py", line 211, in __aexit__
12:35:52 await anext(self.gen)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1330, in get_cluster_manager
12:35:52 await manager.stop()
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1024, in stop
12:35:52 await self.clusters.put(self.cluster, is_dirty=True)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/pool.py", line 104, in put
12:35:52 await self.destroy(obj)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 368, in recycle_cluster
12:35:52 await cluster.stop_gracefully()
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 689, in stop_gracefully
12:35:52 await asyncio.gather(*(server.stop_gracefully() for server in self.running.values()))
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 527, in stop_gracefully
12:35:52 raise RuntimeError(
12:35:52 RuntimeError: Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:58 sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.uninstall' was never awaited
12:35:58 sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
```	2023-11-09 12:30:35 +01:00
Gleb Natapov	2dd8152c8b	storage_service: topology coordinator: log rollback event before changing node's state The rollback test relies on the log message being present after the operation fails, but if the node's state is changed before the message is logged, the operation may fail before the message is printed. Fixes scylladb/scylladb#15980 Message-ID: <ZUuwoq65SJcS+yTH@scylladb.com>	2023-11-09 12:11:58 +01:00
Botond Dénes	d8b6771eb8	Merge 'doc: add CQL Reference for Materialized Views and remove irrelevant version information' from Anna Stuchlik This PR is a follow-up to https://github.com/scylladb/scylladb/pull/15742#issuecomment-1766888218. It adds CQL Reference for Materialized Views to the Materialized Views page. In addition, it removes the irrelevant information about when the feature was added and replaces "Scylla" with "ScyllaDB". (nobackport) Closes scylladb/scylladb#15855 * github.com:scylladb/scylladb: doc: remove versions from Materialized Views doc: add CQL Reference for Materialized Views	2023-11-09 10:43:11 +01:00
Botond Dénes	1cccc86813	Revert "Merge 'compaction: abort compaction tasks' from Aleksandra Martyniuk" This reverts commit `2860d43309`, reversing changes made to `a3621dbd3e`. Reverting because rest_api.test_compaction_task started failing after this was merged. Fixes: #16005	2023-11-09 10:43:11 +01:00
Eliran Sinvani	c5956957f3	use_statement: Convert an exception to a future exception The use statement execution code can throw if the keyspace doesn't exist. This can be a problem for code that calls execute in a fiber, since the exception will break the fiber even if `then_wrapped` is used. Fixes #14449 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Closes scylladb/scylladb#14394	2023-11-09 10:43:11 +01:00
Pavel Emelyanov	7e1017c7d8	test/object_store: Check that keyspace directory doesn't appear When creating an S3-backed keyspace, its storage directory shouldn't be created. It also shouldn't be "resurrected" by the boot-time loader of existing keyspaces. For extra confidence, check that the system keyspace's directory does exist where the test expects keyspaces' directories to appear. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-08 20:23:16 +03:00
Pavel Emelyanov	f6eae191ff	sstables/storage: Do storage init/destroy based on storage options Only the local storage type needs its directories touched/removed. S3 storage initialization is a no-op for now, though some may be added soon. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-08 20:23:16 +03:00
Pavel Emelyanov	11b704e8b8	replica/{ks\|cf}: Move storage init/destroy to sstables manager It's the manager that knows about storages and it should init/destroy it. Also the "upload" and "staging" paths are about to be hidden in sstables/ code, this code move also facilitates that. The indentation in storage.cc is deliberately broken to make next patch look nicer (spoiler: it won't have to shift those lines right). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-08 20:23:16 +03:00
Pavel Emelyanov	68cf26587c	database: Add get_sstables_manager(bool_class is_system) method There's one place that does this selection, soon there will appear another, so it's worth having a convenience helper getter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-08 20:23:16 +03:00
Nadav Har'El	6453f41ca9	Merge 'schema: add whitespaces to values of table options' from Michał Jadwiszczak Add a space after each colon and comma (if not already followed by one) in the values of table options that are JSON objects (`caching`, `tombstone_gc` and `cdc`). This improves readability and matches the client-side describe format. Fixes: #14895 Closes scylladb/scylladb#15900 * github.com:scylladb/scylladb: cql-pytest:test_describe: add test for whitespaces in json objects schema: add whitespace to description of table options	2023-11-08 15:26:49 +02:00
Anna Stuchlik	ca0f5f39b5	doc: fix info about in 5.4 upgrade guide This commit fixes the information about Raft-based consistent cluster management in the 5.2-to-5.4 upgrade guide. This is a follow-up to https://github.com/scylladb/scylladb/pull/15880 and must be backported to branch-5.4. In addition, it adds information about removing DateTieredCompactionStrategy to the 5.2-to-5.4 upgrade guide, including the guideline to migrate to TimeWindowCompactionStrategy. Closes scylladb/scylladb#15988	2023-11-08 13:21:53 +01:00
Kamil Braun	3036a80334	docs: mention Raft getting enabled when upgrading to 5.4 Fixes: scylladb/scylladb#15952 Closes scylladb/scylladb#16000	2023-11-08 14:18:29 +02:00
Kamil Braun	f094e23d84	system_keyspace: use system memory for `system.raft` table `system.raft` was using the "user memory pool", i.e. the `dirty_memory_manager` for this table was set to `database::_dirty_memory_manager` (instead of `database::_system_dirty_memory_manager`). This meant that if a write workload caused memory pressure on the user memory pool, internal `system.raft` writes would have to wait for memtables of user tables to get flushed before the write would proceed. This was observed in SCT longevity tests which ran a heavy workload on the cluster concurrently with schema changes (which use the `system.raft` table underneath). Raft would often get stuck waiting many seconds for user memtables to get flushed. More details in issue #15622. Experiments showed that moving Raft to system memory fixed this particular issue, bringing the waits to reasonable levels. Currently `system.raft` stores only one group, group 0, which is internally used for cluster metadata operations (schema and topology changes) -- so it makes sense for it to use system memory. In the future we'd like to have other groups, for strongly consistent tables. These groups should use the user memory pool. It means we won't be able to use `system.raft` for them -- we'll just have to use a separate table. Fixes: scylladb/scylladb#15622 Closes scylladb/scylladb#15972	2023-11-08 11:21:14 +02:00
Nadav Har'El	284534f489	Merge 'Nodetool additional commands 4/N' from Botond Dénes This PR implements the following new nodetool commands: * snapshot * drain * flush * disableautocompaction * enableautocompaction All commands come with tests and all tests pass with both the new and the current nodetool implementations. Refs: https://github.com/scylladb/scylladb/issues/15588 Closes scylladb/scylladb#15939 * github.com:scylladb/scylladb: test/nodetool: add README.md tools/scylla-nodetool: implement enableautocompaction command tools/scylla-nodetool: implement disableautocompaction command tools/scylla-nodetool: implement the flush command tools/scylla-nodetool: extract keyspace/table parsing tools/scylla-nodetool: implement the drain command tools/scylla-nodetool: implement the snapshot command test/nodetool: add support for matching approximate query parameters utils/http: make dns_connection_factory::initialize() static	2023-11-08 11:18:35 +02:00
Kefu Chai	cf70970226	build: cmake: use $<CONFIG:cfgs> when appropriate Since CMake 3.19, we are able to use $<CONFIG:cfgs> instead of the more cumbersome $<IN_LIST:$<CONFIG>,foo;bar> expression for checking if a config is in a list of configurations. Since the minimum CMake version required by Scylla is 3.27, let's use $<CONFIG:cfgs> when possible. See also https://cmake.org/cmake/help/git-stage/manual/cmake-generator-expressions.7.html#configuration-expressions Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15989	2023-11-08 08:50:44 +02:00
Nadav Har'El	3729ea8bfd	cql-pytest: translate Cassandra's test for CREATE operations This is a translation of Cassandra's CQL unit test source file validation/operations/CreateTest.java into our cql-pytest framework. The 15 tests did not reproduce any previously-unknown bug, but did provide additional reproducers for several known issues: Refs #6442: Always print all schema parameters (including default values) Refs #8001: Documented unit "µs" not supported for assigning a "duration" type. Refs #8892: Add an option for default RF for new keyspaces. Refs #8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression" for compression settings by default Unfortunately, I also had to comment out - and not translate - several tests which weren't real "CQL tests" (tests that use only the CQL driver), and instead relied on Cassandra's Java implementation details: 1. Tests for CREATE TRIGGER were commented out because testing them in Cassandra requires adding a Java class for the test. We're also not likely to ever add this feature to Scylla (Refs #2205). 2. Similarly, tests for CEP-11 (Pluggable memtable implementations) used internal Java APIs instead of CQL, and it is also unlikely we'll ever implement it in a way compatible with Cassandra because of its Java reliance. 3. One test for data center names used internal Cassandra Java APIs, not CQL, to create mock data centers and snitches. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#15791	2023-11-08 08:46:27 +02:00
Botond Dénes	2860d43309	Merge 'compaction: abort compaction tasks' from Aleksandra Martyniuk Compaction tasks which do not have a parent are abortable through task manager. Their children are aborted recursively. Compaction tasks of the lowest level are aborted using existing compaction task executors stopping mechanism. Closes scylladb/scylladb#15083 * github.com:scylladb/scylladb: test: test abort of compaction task that isn't started yet test: test running compaction task abort tasks: fail if a task was aborted compaction: abort task manager compaction tasks	2023-11-08 08:45:16 +02:00
Nadav Har'El	a3621dbd3e	Merge 'Alternator: Support new ReturnValuesOnConditionCheckFailure feature' from Marcin Maliszkiewicz alternator: add support for ReturnValuesOnConditionCheckFailure feature As announced in https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-dynamodb-cost-failed-conditional-writes/, DynamoDB added a new option for write operations (PutItem, UpdateItem, or DeleteItem), ReturnValuesOnConditionCheckFailure, which if set to ALL_OLD returns the current value of the item - but only if a condition check failed. Fixes https://github.com/scylladb/scylladb/issues/14481 Closes scylladb/scylladb#15125 * github.com:scylladb/scylladb: alternator: add support for ReturnValuesOnConditionCheckFailure feature alternator: add ability to send additional fields in api_error	2023-11-07 23:19:51 +02:00
Takuya ASADA	a4aeef2eb0	scylla_util.py: run apt-get update before apt-get install if necessary Unlike yum, "apt-get install" may fail because the package cache is outdated. Let's check the package cache mtime and run "apt-get update" if it's too old. Fixes #4059 Closes scylladb/scylladb#15960	2023-11-07 20:40:16 +02:00
Wojciech Mitros	ab743271f1	test: increase timeout for lua UDF execution When running on a particularly slow setup, for example on an ARM machine in debug mode, the execution time of even a small Lua UDF that we're using in tests may exceed our default limits. To avoid timeout errors, the limit in tests is now increased to a value that won't be exceeded in any reasonable scenario (for the current set of tested UDFs), while not making the test take an excessive amount of time in case of an error in the UDF execution. Fixes #15977 Closes scylladb/scylladb#15983	2023-11-07 20:28:28 +02:00
Kamil Braun	07e9522d6c	Merge 'raft topology: handle abort exceptions better in fence_previous_coordinator' from Piotr Dulikowski When the topology coordinator tries to fence the previous coordinator, it performs a group0 operation. The current topology coordinator might be aborted in the meantime, which will result in a `raft::request_aborted` exception being thrown. After the fix to scylladb/scylladb#15728 was merged, the exception is caught, but then `sleep_abortable` is called, which immediately throws `abort_requested_exception` because it uses the same abort source as the group0 operation. The `fence_previous_coordinator` function, which does all those things, is not supposed to throw exceptions; if it does, it causes `raft_state_monitor_fiber` to exit, completely disabling the topology coordinator functionality on that node. Modify the code in the following way: - Catch `abort_requested_exception` thrown from `sleep_abortable` and exit the function if it happens. In addition to the described issue, this also handles the case when abort is requested while `sleep_abortable` is in progress. - Catch `raft::request_aborted` thrown from the group0 operation, log the exception with lower verbosity and exit the function explicitly. Finally, wrap both the `fence_previous_coordinator` and `run` functions in a `try` block with `on_fatal_internal_error` in the catch handler in order to implement the behavior that adding `noexcept` was originally supposed to introduce. Fixes: scylladb/scylladb#15747 Closes scylladb/scylladb#15948 * github.com:scylladb/scylladb: raft topology: catch and abort on exceptions from topology_coordinator::run Revert "storage_service: raft topology: mark topology_coordinator::run function as noexcept" raft topology: don't print an error when fencing previous coordinator is aborted raft topology: handle abort exceptions from sleeping in fence_previous_coordinator	2023-11-07 17:17:49 +01:00
Botond Dénes	60ea940f9e	Merge 'docs: render options with role' from Kefu Chai This series tries to: 1. render options with a role, so the options can be cross-referenced and defined; 2. move the formatting out of the content, so the representation can be defined in a more flexible way. Closes scylladb/scylladb#15860 * github.com:scylladb/scylladb: docs: add divider using CSS docs: extract _clean_description as a filter docs: render option with role docs: parse source files right into rst	2023-11-07 17:01:22 +02:00
Botond Dénes	3088453a09	test/nodetool: add README.md	2023-11-07 09:49:56 -05:00
Botond Dénes	7ff7cdc86a	tools/scylla-nodetool: implement enableautocompaction command	2023-11-07 09:49:56 -05:00
Botond Dénes	0e0401a5c5	tools/scylla-nodetool: implement disableautocompaction command	2023-11-07 09:49:56 -05:00
Botond Dénes	f5083f66f5	tools/scylla-nodetool: implement the flush command	2023-11-07 09:49:56 -05:00
Botond Dénes	f082cc8273	tools/scylla-nodetool: extract keyspace/table parsing Having to extract 1 keyspace and N tables from the command line is proving to be a common pattern among commands. Extract this into a method, so the boilerplate can be shared. Add a forward-looking overload as well, which will be used in the next patch.	2023-11-07 09:49:56 -05:00


39757 Commits