Commit Graph

34649 Commits

Kamil Braun
2ebac52d2d test/pylib: scylla_cluster: return error details from test framework endpoints
If an endpoint handler throws an exception, the details of the exception
are not returned to the client. Normally this is desirable so that
information is not leaked, but in this test framework we do want to
return the details to the client so it can log a useful error message.

Do it by wrapping every handler into a catch clause that returns
the exception message.

Also modify a bit how HTTPErrors are rendered so it's easier to discern
the actual body of the error from other details (such as the params used
to make the request etc.)

Before:
```
E test.pylib.rest_client.HTTPError: HTTP error 500: 500 Internal Server Error
E
E Server got itself in trouble, params None, json None, uri http+unix://api/cluster/before-test/test_stuff
```

After:
```
E test.pylib.rest_client.HTTPError: HTTP error 500, uri: http+unix://api/cluster/before-test/test_stuff, params: None, json: None, body:
E Failed to start server at host 127.155.129.1.
E Check the log files:
E /home/kbraun/dev/scylladb/testlog/test.py.dev.log
E /home/kbraun/dev/scylladb/testlog/dev/scylla-1.log
```

Closes #12563

(cherry picked from commit 2f84e820fd)
2023-02-07 17:04:37 +01:00
Kamil Braun
b536614913 test/pylib: scylla_cluster: release cluster IPs when stopping ScyllaClusterManager
When we obtained a new cluster for a test case after the previous test
case left a dirty cluster, we would release the old cluster's used IP
addresses (`_before_test` function). However, we would not release the
last cluster's IP after the last test case. We would run out of IPs with
sufficiently many test files or `--repeat` runs. Fix this.

Also reorder the operations a bit: stop the cluster (and release its
IPs) before freeing up space in the cluster pool (i.e. call
`self.cluster.stop()` before `self.clusters.steal()`). This reduces
concurrency a bit - fewer Scyllas running at the same time, which is
good (the pool size gives a limit on the desired max number of
concurrently running clusters). Killing a cluster is quick so it won't
make a significant difference for the next guy waiting on the pool.

Closes #12564

(cherry picked from commit 3ed3966f13)
2023-02-07 17:04:19 +01:00
Kamil Braun
85df0fd2b1 test/pylib: scylla_cluster: mark cluster as dirty if it fails to boot
If a cluster fails to boot, it saves the exception in the
`self.start_exception` variable; the exception will be rethrown when
a test tries to start using this cluster. As explained in `before_test`:
```
    def before_test(self, name) -> None:
        """Check that  the cluster is ready for a test. If
        there was a start error, throw it here - the server is
        running when it's added to the pool, which can't be attributed
        to any specific test, throwing it here would stop a specific
        test."""
```
It's arguable whether we should blame some random test for a failure
that it didn't cause, but nevertheless, there's a problem here: the
`start_exception` will be rethrown and the test will fail, but then the
cluster will be simply returned to the pool and the next test will
attempt to use it... and so on.

Prevent this by marking the cluster as dirty the first time we rethrow
the exception.

Closes #12560

(cherry picked from commit 147dd73996)
2023-02-07 17:03:56 +01:00
Avi Kivity
cdf9fe7023 test: disable commitlog O_DSYNC, preallocation
Commitlog O_DSYNC is intended to make Raft and schema writes durable
in the face of power loss. To make O_DSYNC performant, we preallocate
the commitlog segments, so that the commitlog writes only change file
data and not file metadata (which would require the filesystem to commit
its own log).

However, in tests, this causes each ScyllaDB instance to write 384MB
of commitlog segments. This overloads the disks and slows everything
down.

Fix this by disabling O_DSYNC (and therefore preallocation) during
the tests. They can't survive power loss, and run with
--unsafe-bypass-fsync anyway.

Closes #12542

(cherry picked from commit 9029b8dead)
2023-02-07 17:02:59 +01:00
Beni Peled
8ff4717fd0 release: prepare for 5.2.0-rc1 scylla-5.2.0-rc1 2023-02-06 22:13:53 +02:00
Kamil Braun
291b1f6e7f service/raft: raft_group0: prevent double abort
There was a small chance that we called `timeout_src.request_abort()`
twice in the `with_timeout` function, first by timeout and then by
shutdown. `abort_source` fails on an assertion in this case. Fix this.
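
For illustration, a minimal sketch of the kind of guard this implies,
assuming Seastar's `abort_source` API (this is not the actual
`with_timeout` code):

```
#include <seastar/core/abort_source.hh>

// request_abort() asserts if called twice, so only the first of the
// timeout/shutdown paths should actually trigger the abort.
void abort_once(seastar::abort_source& timeout_src) {
    if (!timeout_src.abort_requested()) {
        timeout_src.request_abort();
    }
}
```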

Fixes: #12512

Closes #12514

(cherry picked from commit 54170749b8)
2023-02-05 18:31:50 +02:00
Kefu Chai
b2699743cc db: system_keyspace: take the reserved_memory into account
before this change, we returned the total memory managed by Seastar
in the "total" field of system.memory. but this value only reflects
the memory managed by Seastar's allocator. if
`reserve_additional_memory` is set when starting app_template,
Seastar's memory subsystem reserves a chunk of memory of the
specified size for the system and takes only the remaining memory. since
f05d612da8, we set this value to 50MB for the wasmtime runtime, hence
the `TestRuntimeInfoTable.test_default_content` test in dtest
fails. the test expects the size passed via the `--memory` option
to be identical to the value reported in system.memory's
"total" field.

after this change, the "total" field takes the reserved memory
for wasm udf into account. the "total" field should reflect the total
size of memory used by Scylla, no matter how we use a certain portion
of the allocated memory.
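
A hedged sketch of the adjustment (the names here are assumptions, not
the actual system_keyspace code):

```
#include <cstdint>

// "total" should cover both what Seastar's allocator manages and what
// was reserved for the system (e.g. the wasmtime reservation), so that
// it matches the value passed via --memory.
uint64_t reported_total_memory(uint64_t seastar_allocator_total,
                               uint64_t reserved_for_system) {
    return seastar_allocator_total + reserved_for_system;
}
```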

Fixes #12522
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12573

(cherry picked from commit 4a0134a097)
2023-02-05 18:30:05 +02:00
Botond Dénes
50ae73a4bd types: is_tuple(): handle reverse types
Currently reverse types match the default case (false), even though they
might be wrapping a tuple type. One user-visible effect of this is that
a schema which has a reversed<frozen<UDT>> clustering key component
will have this component incorrectly represented in the schema CQL dump:
the UDT will lose the frozen attribute. When attempting to recreate
this schema based on the dump, it will fail, as only frozen UDTs are
allowed in primary key components.
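
A hedged, fully hypothetical illustration of the unwrapping idea (this is
not the actual Scylla type hierarchy):

```
enum class kind { tuple, other };

struct type_desc {
    kind k = kind::other;
    const type_desc* reversed_of = nullptr; // non-null if this is reversed<T>
};

// Look through a "reversed" wrapper before deciding whether the
// underlying type is a tuple.
bool is_tuple(const type_desc& t) {
    const type_desc& underlying = t.reversed_of ? *t.reversed_of : t;
    return underlying.k == kind::tuple;
}
```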

Fixes: #12576

Closes #12579

(cherry picked from commit ebc100f74f)
2023-02-05 18:20:21 +02:00
Calle Wilund
c3dd4a2b87 alternator::streams: Sort tables in list_streams to ensure no duplicates
Fixes #12601 (maybe?)

Sort the set of tables on ID. This should ensure we never
generate duplicates in a paged listing here. Can obviously miss things if they
are added between paged calls and end up with a "smaller" UUID/ARN, but that
is to be expected.

(cherry picked from commit da8adb4d26)
2023-02-05 17:44:00 +02:00
Benny Halevy
0f9fe61d91 view: row_lock: lock_ck: find or construct row_lock under partition lock
Since we're potentially searching the row_lock in parallel to acquiring
the read_lock on the partition, we're racing with row_locker::unlock
that may erase the _row_locks entry for the same clustering key, since
there is no lock to protect it up until the partition lock has been
acquired and the lock_partition future is resolved.

This change moves the code to search for or allocate the row lock
_after_ the partition lock has been acquired to make sure we're
synchronously starting the read/write lock function on it, without
yielding, to prevent this use-after-free.

This adds an allocation for copying the clustering key in advance,
even if a row_lock entry already exists, which wasn't needed before.
It only slows us down (a bit) when there is contention and the lock
already existed when we want to take it. In the fast path there
is no contention, and then the code already had to create the lock
and copy the key. In any case, the penalty of copying the key once
is tiny compared to the rest of the work that view updates are doing.

This is required on top of 5007ded2c1 as
seen in https://github.com/scylladb/scylladb/issues/12632
which is closely related to #12168 but demonstrates a different race
causing use-after-free.

Fixes #12632

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4b5e324ecb)
2023-02-05 17:22:31 +02:00
Anna Stuchlik
59d30ff241 docs: fixes https://github.com/scylladb/scylladb/issues/12654, update the links to the Download Center
Closes #12655

(cherry picked from commit 64cc4c8515)
2023-02-05 17:19:56 +02:00
Anna Stuchlik
fb82dff89e doc: fixes https://github.com/scylladb/scylladb/issues/12672, fix the redirects to the Cloud docs
Closes #12673

(cherry picked from commit 2be131da83)
2023-02-05 17:17:35 +02:00
Kefu Chai
b588b19620 cql3/selection: construct string_view using char* not size
before this change, we constructed an sstring from a comma expression,
which evaluates to the return value of `name.size()`, while what we
expected was `sstring(const char*, size_t)`.

in this change

* instead of passing the size of the string_view,
  both its address and size are used
* `std::string_view` is constructed instead of sstring, for better
  performance, as we don't need to perform a deep copy

the issue is reported by GCC-13:

```
In file included from cql3/selection/selectable.cc:11:
cql3/selection/field_selector.hh:83:60: error: ignoring return value of function declared with 'nodiscard' attribute [-Werror,-Wunused-result]
        auto sname = sstring(reinterpret_cast<const char*>(name.begin(), name.size()));
                                                           ^~~~~~~~~~
```
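
For illustration, a self-contained sketch of the comma-operator pitfall and
the two-argument fix (the `name` buffer here is a hypothetical stand-in for
the field-name bytes in the real code):

```
#include <cstdint>
#include <iostream>
#include <string_view>
#include <vector>

int main() {
    std::vector<int8_t> name = {'u', 'd', 't', '_', 'f', 'i', 'e', 'l', 'd'};

    // Broken pattern: inside the cast, "name.data(), name.size()" is a comma
    // expression that evaluates to name.size(), so the pointer is never passed:
    //   auto broken = std::string(reinterpret_cast<const char*>(name.data(), name.size()));

    // Fixed pattern: pass the pointer and the length as two separate
    // arguments, and build a string_view to avoid a deep copy.
    std::string_view fixed(reinterpret_cast<const char*>(name.data()), name.size());
    std::cout << fixed << '\n'; // prints: udt_field
}
```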

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12666

(cherry picked from commit 186ceea009)

Fixes #12739.
2023-02-05 13:50:48 +02:00
Michał Chojnowski
608ef92a71 commitlog: fix total_size_on_disk accounting after segment file removal
Currently, segment file removal first calls `f.remove_file()` and
does `total_size_on_disk -= f.known_size()` later.
However, `remove_file()` resets `known_size` to 0, so in effect
the freed space is not accounted for.
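
One way to fix the ordering, as a hedged sketch (the names are taken from
the commit text, not the actual commitlog code): capture the size before
`remove_file()` zeroes it, then subtract.

```
#include <cstdint>

struct segment_file {
    uint64_t known_size = 0;
    void remove_file() { /* unlink the file, then: */ known_size = 0; }
};

void delete_segment(segment_file& f, uint64_t& total_size_on_disk) {
    auto freed = f.known_size;   // read before remove_file() resets it to 0
    f.remove_file();
    total_size_on_disk -= freed; // the freed space is now accounted for
}
```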

`total_size_on_disk` is not just a metric. It is also responsible
for deciding whether a segment should be recycled -- it is recycled
only if `total_size_on_disk - known_size < max_disk_size`.
Therefore this bug has dire performance consequences:
if `total_size_on_disk - known_size` ever exceeds `max_disk_size`,
the recycling of commitlog segments will stop permanently, because
`total_size_on_disk - known_size` will never go back below
`max_disk_size` due to the accounting bug. All new segments from this
point will be allocated from scratch.

The bug was uncovered by a QA performance test. It isn't easy to trigger --
it took the test 7 hours of constant high load to step into it.
However, the fact that the effect is permanent, and degrades the
performance of the cluster silently, makes the bug potentially quite severe.

The bug can be easily spotted with Prometheus as infinitely rising
`commitlog_total_size_on_disk` on the affected shards.

Fixes #12645

Closes #12646

(cherry picked from commit fa7e904cd6)
2023-02-01 21:54:37 +02:00
Kamil Braun
d2732b2663 Merge 'Enable Raft by default in new clusters' from Kamil Braun
New clusters that use a fresh conf/scylla.yaml will have `consistent_cluster_management: true`, which will enable Raft, unless the user explicitly turns it off before booting the cluster.

People using existing yaml files will continue without Raft, unless consistent_cluster_management is explicitly requested during/after upgrade.

Also update the docs: cluster creation and node addition procedures.

Fixes #12572.

Closes #12585

* github.com:scylladb/scylladb:
  docs: mention `consistent_cluster_management` for creating cluster and adding node procedures
  conf: enable `consistent_cluster_management` by default

(cherry picked from commit 5c886e59de)
2023-01-26 12:21:55 +01:00
Anna Mikhlin
34ab98e1be release: prepare for 5.2.0-rc0 scylla-5.2.0-rc0 2023-01-18 14:54:36 +02:00
Tomasz Grabiec
563998b69a Merge 'raft: improve group 0 reconfiguration failure handling' from Kamil Braun
Make it so that failures in `removenode`/`decommission` don't lead to reduced availability, and any leftovers in group 0 can be removed by `removenode`:
- In `removenode`, make the node a non-voter before removing it from the token ring. This removes the possibility of having a group 0 voting member which doesn't correspond to a token ring member. We can still be left with a non-voter, but that doesn't reduce the availability of group 0.
- As above but for `decommission`.
- Make it possible to remove group 0 members that don't correspond to token ring members from group 0 using `removenode`.
- Add an API to query the current group 0 configuration.

Fixes #11723.

Closes #12502

* github.com:scylladb/scylladb:
  test: test_topology: test for removing garbage group 0 members
  test/pylib: move some utility functions to util.py
  db: system_keyspace: add a virtual table with raft configuration
  db: system_keyspace: improve system.raft_snapshot_config schema
  service: storage_service: better error handling in `decommission`
  service: storage_service: fix indentation in removenode
  service: storage_service: make `removenode` work for group 0 members which are not token ring members
  service/raft: raft_group0: perform read_barrier in wait_for_raft
  service: storage_service: make leaving node a non-voter before removing it from group 0 in decommission/removenode
  test: test_raft_upgrade: remove test_raft_upgrade_with_node_remove
  service/raft: raft_group0: link to Raft docs where appropriate
  service/raft: raft_group0: more logging
  service/raft: raft_group0: separate function for checking and waiting for Raft
2023-01-17 21:23:15 +01:00
Kamil Braun
d134c458e5 test/pylib: increase timeout when waiting for cluster before test
Increase the timeout from default 5 minutes to 10 minutes.
Sent as a workaround for #12546 to unblock next promotions.

Closes #12547
2023-01-17 21:03:09 +02:00
Kamil Braun
4f1c317bdc test: test_raft_upgrade: stop servers gracefully in test_recovery_after_majority_loss
This test is frequently failing due to a timeout when we try to restart
one of the nodes. The shutdown procedure apparently hangs when we try to
stop the `hints_manager` service, e.g.:
```
INFO  2023-01-13 03:18:02,946 [shard 0] hints_manager - Asked to stop
INFO  2023-01-13 03:18:02,946 [shard 0] hints_manager - Stopped
INFO  2023-01-13 03:18:02,946 [shard 0] hints_manager - Asked to stop
INFO  2023-01-13 03:18:02,946 [shard 1] hints_manager - Asked to stop
INFO  2023-01-13 03:18:02,946 [shard 1] hints_manager - Stopped
INFO  2023-01-13 03:18:02,946 [shard 1] hints_manager - Asked to stop
INFO  2023-01-13 03:18:02,946 [shard 1] hints_manager - Stopped
INFO  2023-01-13 03:22:56,997 [shard 0] hints_manager - Stopped
```
Observe the 5-minute delay at the end.

There is a known issue about `hints_manager` stop hanging: #8079.

Now, for some reason, this is the only test case that is hitting this
issue. We don't completely understand why. There is one significant
difference between this test case and others: this is the only test case
which kills 2 (out of 3) servers in the cluster and then tries to
gracefully shutdown the last server. There's a hypothesis that the last
server gets stuck trying to send hints to the killed servers. We weren't
able to prove/falsify it yet. But if it's true, then this patch will:
- unblock next promotions,
- give us some important information when we see that the issue stops
  appearing.
In this patch we shut down all servers gracefully instead of killing them,
like we do in the other test cases.

Closes #12548
2023-01-17 20:51:09 +02:00
Pavel Emelyanov
4f415413d2 raft: Fix non-existing state_machine::apply_entry in docs
The docs mention that method, but it doesn't exist. Instead, the
state_machine interface defines a plain .apply() method.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12541
2023-01-17 12:53:05 +01:00
Kamil Braun
5545547d07 test: test_topology: test for removing garbage group 0 members
Verify that `removenode` can remove group 0 members which are not token
ring members.
2023-01-17 12:28:00 +01:00
Kamil Braun
c959ec455a test/pylib: move some utility functions to util.py
They were used in test_raft_upgrade, but we want to use them in other
test files too.
2023-01-17 12:28:00 +01:00
Kamil Braun
a483915c62 db: system_keyspace: add a virtual table with raft configuration
Add a new virtual table `system.raft_state` that shows the currently
operating Raft configuration for each present group. The schema is the
same as `system.raft_snapshot_config` (the latter shows the config from
the last snapshot). In the future we plan to add more columns to this
table, showing more information (like the current leader and term),
hence the generic name.

Adding the table requires some plumbing of
`sharded<raft_group_registry>&` through function parameters to make it
accessible from `register_virtual_tables`, but it's mostly
straightforward.

Also added some APIs to `raft_group_registry` to list all groups and
find a given group (returning `nullptr` if one isn't found, not throwing
an exception).
2023-01-17 12:28:00 +01:00
Kamil Braun
2bfe85ce9b db: system_keyspace: improve system.raft_snapshot_config schema
Remove the `ip_addr` column which was not used. IP addresses are not
part of Raft configuration now and they can change dynamically.

Swap the `server_id` and `disposition` columns in the clustering key, so
when querying the configuration, we first obtain all servers with the
current disposition and then all servers with the previous disposition
(note that a server may appear both in current and previous).
2023-01-17 12:28:00 +01:00
Kamil Braun
c3ed82e5fb service: storage_service: better error handling in decommission
Improve the error handling in `decommission` in case `leave_group0`
fails, informing the user what they should do (i.e. call `removenode` to
get rid of the group 0 member), and allowing decommission to finish; it
does not make sense to let the node continue to run after it leaves the
token ring. (And I'm guessing it's also not safe. Or maybe impossible.)
2023-01-17 12:28:00 +01:00
Kamil Braun
beb0eee007 service: storage_service: fix indentation in removenode 2023-01-17 12:28:00 +01:00
Kamil Braun
aba33dd352 service: storage_service: make removenode work for group 0 members which are not token ring members
Due to failures we might end up in a situation where we have a group 0
member which is not a token ring member: a decommission/removenode
which failed after leaving/removing a node from the token ring but
before leaving/removing a node from group 0.

There was no way to get rid of such a group 0 member. A node that left
the token ring must not be allowed to run further (or it can cause data
loss, data resurrection and maybe other fun stuff), so we can't run
decommission a second time (even if we tried, it would just say that
"we're not a member of the token ring" and abort). And `removenode`
would also not work, because it proceeds only if the node requested to
be removed is a member of the token ring.

We modify `removenode` so it can run in this situation and remove the
group 0 member. The parts of `removenode` related to token ring
modification are now conditioned on whether the node was a member of the
token ring. The final `remove_from_group0` step is in its own branch. Some
minor refactors were necessary. Some log messages were also modified so
it's easier to understand which messages correspond to the "token movement"
part of the procedure.

The `make_nonvoter` step happens only if token ring removal happens,
otherwise we can skip directly to `remove_from_group0`.

We also move `remove_from_group0` outside the "try...catch",
fixing #11723. The "node ops" part of the procedure is related strictly
to token ring movement, so it makes sense for `remove_from_group0` to
happen outside.

Indentation is broken in this commit for easier reviewability, fixed in
the following commit.

Fixes: #11723
2023-01-17 12:28:00 +01:00
Kamil Braun
ec2cd29e42 service/raft: raft_group0: perform read_barrier in wait_for_raft
Right now wait_for_raft is called before performing group 0
configuration changes. We want to also call it before checking for
membership; for that, it's desirable to have the most recent information,
hence the read_barrier call. In the existing use cases it's not strictly
necessary, but it doesn't hurt.
2023-01-17 12:28:00 +01:00
Kamil Braun
db734cd74f service: storage_service: make leaving node a non-voter before removing it from group 0 in decommission/removenode
removenode currently works roughly like this:
1. stream/repair data so it ends up on new replica sets (calculated
   without the node we want to remove)
2. remove the node from the token ring
3. remove the node from group 0 configuration.

If the procedure fails after step 2 but before step 3 finishes,
we're in trouble: the cluster is left with an additional voting group 0
member, which reduces group 0's availability, and there is no way to
remove this member because `removenode` no longer considers it to be
part of the cluster (it consults the token ring to decide).

Improve this failure scenario by including a new step at the beginning:
make the node a non-voter in group 0 configuration. Then, even if we
fail after removing the node from the token ring but before removing it
from group 0, we'll only be left with a non-voter which doesn't reduce
availability.

We make a similar change for `decommission`: between `unbootstrap()` (which
streams data) and `leave_ring()` (which removes our tokens from the
ring), become a non-voter. The difference here is that we don't become a
non-voter at the beginning, but only after streaming/repair. In
`removenode` it's desirable to make the node a non-voter as soon as
possible because it's already dead. In decommission it may be desirable
for us to remain a voter if we fail during streaming because we're still
alive and functional in that case.

In a later commit we'll also make it possible to retry `removenode` to
remove a node that is only a group 0 member and not a token ring member.
2023-01-17 12:28:00 +01:00
Kamil Braun
1eee349a17 test: test_raft_upgrade: remove test_raft_upgrade_with_node_remove
The test would create a scenario where one node was down while the others
started the Raft upgrade procedure. The procedure would get stuck, but
it was possible to `removenode` the downed node using one of the alive
nodes, which would unblock the Raft upgrade procedure.

This worked because:
1. the upgrade procedure starts by ensuring that all peers can be
   contacted,
2. `removenode` starts by removing the node from the token ring.

After removing the node from the token ring, the upgrade procedure
becomes able to contact all peers (the peers set no longer contains the
down node). At the end, after removing the node from the token ring,
`removenode` would actually get stuck for a while, waiting for the
upgrade procedure to finish before removing the peer from group 0.
After the upgrade procedure finished, `removenode` would also finish.
(so: first the upgrade procedure waited for removenode, then removenode
waited for the upgrade procedure).

We want to modify the `removenode` procedure and include a new step
before removing the node from the token ring: making the node a
non-voter. The purpose is to improve the possible failure scenarios.
Previously, if the `removenode` procedure failed after removing the node
from the token ring but before removing it from group 0, the cluster
would contain a 'garbage' group 0 member which is a voter - reducing
group 0's availability. If the node is made a non-voter first, then this
failure will not be as big of a problem, because the leftover group 0
member will be a non-voter.

However, to correctly perform group 0 operations including making
someone a nonvoter, we must first wait for the Raft upgrade procedure to
finish (or at least wait until everyone joins group 0). Therefore by
including this 'make the node a non-voter' step at the beginning of
`removenode`, we make it impossible to remove a token ring member in the
middle of the upgrade procedure, on which the test case relied. The test
case would get stuck waiting for the `removenode` operation to finish,
which would never finish because it would wait for the upgrade procedure
to finish, which would not finish because of the dead peer.

We remove the test case; it was "lucky" to pass in the first place. We
have a dedicated mechanism for handling dead peers during Raft upgrade
procedure: the manual Raft group 0 RECOVERY procedure. There are other
test cases in this file which are using that procedure.
2023-01-17 12:28:00 +01:00
Kamil Braun
4f0801406e service/raft: raft_group0: link to Raft docs where appropriate
Resolve some TODOs.
2023-01-17 12:28:00 +01:00
Kamil Braun
2befbaa341 service/raft: raft_group0: more logging
Make the logs in leave_group0 consistent with logs in
remove_from_group0.
2023-01-17 12:28:00 +01:00
Kamil Braun
77dc1c4c70 service/raft: raft_group0: separate function for checking and waiting for Raft
The leave_group0 and remove_from_group0 functions both start with the
following steps:
- if Raft is disabled or in RECOVERY mode, print a simple log message
  and abort
- if Raft cluster feature flag is not yet enabled, print a complex log
  message and abort
- wait for Raft upgrade procedure to finish
- then perform the actual group 0 reconfiguration.

Refactor these preparation steps to a separate function,
`wait_for_raft`. This reduces code duplication; the function will also
be used in more operations later (becoming a nonvoter or turning another
server into a nonvoter).

We also change the API so that the preparation function is called from
outside by the caller before they call the reconfiguration function.
This is because in later commits, some of the call sites (mainly
`removenode`) will want to check explicitly whether Raft is enabled and
wait for Raft's availability, then perform a sequence of steps related
to group 0 configuration depending on the result.

Also add a private function `raft_upgrade_complete()` which we use to
assert that Raft is ready to be used.
2023-01-17 12:27:58 +01:00
Wojciech Mitros
5f45b32bfa forward_service: prevent heap use-after-free of forward_aggregates
Currently, we create `forward_aggregates` inside a function that
returns the result of a future lambda that captures these aggregates
by reference. As a result, the aggregates may be destructed before
the lambda finishes, resulting in a heap use-after-free.

To prolong the lifetime of these aggregates, we cannot use a move
capture, because the lambda is wrapped in a with_thread_if_needed()
call on these aggregates. Instead, we fix this by wrapping the
entire return statement in a do_with().
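
A hedged sketch of the lifetime pattern (hypothetical names, not the actual
forward_service code):

```
#include <seastar/core/do_with.hh>
#include <seastar/core/future.hh>

struct forward_aggregates_stub { /* reduction state */ };

// Stand-in for the asynchronous merging work.
seastar::future<> merge_results(forward_aggregates_stub&) {
    return seastar::make_ready_future<>();
}

seastar::future<> dispatch() {
    // do_with() keeps the aggregates alive until the inner future resolves,
    // so the reference used below cannot dangle.
    return seastar::do_with(forward_aggregates_stub{}, [] (forward_aggregates_stub& aggrs) {
        return merge_results(aggrs);
    });
}
```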

Fixes #12528

Closes #12533
2023-01-17 13:25:57 +02:00
Gleb Natapov
15ebd59071 lwt: upgrade stored mutations to the latest schema during prepare
Currently they are upgraded during learn on a replica. There are two
problems with this. First, the column mapping may not exist on a replica
if it missed this particular schema (because it was down, for instance),
and the mapping history is not part of the schema. In this case "Failed
to look up column mapping for schema version" will be thrown. Second, the
LWT request coordinator may not have the schema for the mutation either
(because it was already freed from the registry), and when a replica
tries to retrieve the schema from the coordinator, the retrieval will fail,
causing the whole request to fail with "Schema version XXXX not found".

Both of those problems can be fixed by upgrading stored mutations
during prepare on the node they are stored at. To upgrade a mutation,
its column mapping is needed, and it is guaranteed to be present
at the node the mutation is stored at, since having the corresponding
schema available is a prerequisite for storing it. After that, the mutation
is processed using the latest schema, which will be available on all nodes.

Fixes #10770

Message-Id: <Y7/ifraPJghCWTsq@scylladb.com>
2023-01-17 11:14:46 +01:00
Raphael S. Carvalho
f2f839b9cc compaction: LCS: don't reshape all levels if only a single breaks disjointness
LCS reshape compacts all levels if a single one breaks
disjointness. That's unnecessary work, because rewriting that single
level is enough to restore disjointness. If multiple levels break
disjointness, each will now be reshaped in its own iteration,
reducing the operation time of each step and the disk space requirement,
as input files can be released incrementally.
Incremental compaction is not applied to reshape yet, so we need to
avoid "major compaction", to avoid the space overhead.
But space overhead is not the only problem: the inefficiency of
deciding what to reshape when overlap is detected also motivated
this patch.

Fixes #12495.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12496
2023-01-17 09:55:15 +02:00
Michał Chojnowski
9e17564c70 types: add some missing explicit instantiations
Some functions defined by a template in types.cc are used in other
translation units (via `cql3/untyped_result_set.hh`), but aren't
explicitly instantiated. Therefore their linking can fail, depending
on inlining decisions. (I experienced this when playing with compiler
options).
Fix that.
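
A hedged, self-contained illustration of the pattern with hypothetical
names (not the actual types.cc symbols):

```
#include <cstdint>
#include <cstring>

// A function template defined only in one translation unit must be
// explicitly instantiated there for the types other TUs use; otherwise
// the symbol may or may not be emitted, depending on inlining decisions.
template <typename T>
T read_as(const void* data) {
    T value;
    std::memcpy(&value, data, sizeof(T));
    return value;
}

// Explicit instantiations guarantee these symbols exist for linking from
// translation units that only see the declaration.
template int32_t read_as<int32_t>(const void*);
template int64_t read_as<int64_t>(const void*);
```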

Closes #12539
2023-01-17 10:46:01 +02:00
Nadav Har'El
5bf94ae220 cql: allow disabling of USING TIMESTAMP sanity checking
As requested by issue #5619, commit 2150c0f7a2
added a sanity check for USING TIMESTAMP - the number specified in the
timestamp must not be more than 3 days into the future (when viewed as
a number of microseconds since the epoch).

This sanity checking helps avoid some annoying client-side bugs and
mis-configurations, but some users genuinely want to use arbitrary
or futuristic-looking timestamps and are hindered by this sanity check
(which Cassandra doesn't have, by the way).

So in this patch we add a new configuration option, restrict_future_timestamp.
If set to "true", futuristic timestamps (more than 3 days into the future)
are forbidden. The "true" setting is the default (as has been the case
since #5619). Setting this option to "false" will allow using any 64-bit
integer as a timestamp, as is allowed in Cassandra (and was allowed in
Scylla prior to #5619).
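
A minimal, self-contained sketch of the check being made configurable; the
3-day window comes from the commit text, while the function and argument
names are assumptions:

```
#include <chrono>
#include <cstdint>
#include <stdexcept>

void validate_using_timestamp(int64_t ts_micros, bool restrict_future_timestamp) {
    using namespace std::chrono;
    const int64_t now_micros =
        duration_cast<microseconds>(system_clock::now().time_since_epoch()).count();
    const int64_t limit = now_micros + duration_cast<microseconds>(hours{72}).count();
    if (restrict_future_timestamp && ts_micros > limit) {
        throw std::invalid_argument(
            "USING TIMESTAMP is more than 3 days into the future; "
            "set restrict_future_timestamp: false to allow it");
    }
}
```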

The error message in the case where a futuristic timestamp is rejected
now mentions the configuration parameter that can be used to disable this
check (this, and the option's name "restrict_*", is similar to other
so-called "safe mode" options).

This patch also includes a test, which works in Scylla and Cassandra,
with either setting of restrict_future_timestamp, checking the right
thing in all these cases (the futuristic timestamp can either be written
and read, or can't be written). I used this test to manually verify that
the new option works, defaults to "true", and when set to "false" Scylla
behaves like Cassandra.

Fixes #12527

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12537
2023-01-16 23:18:56 +02:00
Kefu Chai
114f30016a main: use std::shift_left() to consume tool name
for better readability.
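
For context, a hedged sketch of the pattern (not the actual main.cc code):

```
#include <algorithm>

int run_tool(int argc, char** argv) {
    // argv: {"scylla", "sstable", "dump-data", ...} -> drop the tool name
    // "sstable" by shifting the remaining arguments one slot to the left.
    std::shift_left(argv + 1, argv + argc, 1);
    --argc;
    // ... continue with the tool's own argument parsing using (argc, argv).
    return argc;
}
```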

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12536
2023-01-16 21:01:34 +02:00
Nadav Har'El
feef3f9dda test/cql-pytest: test more than one restriction on same clustering column
Cassandra refuses a request with more than one relation to the same
clustering column, for example

    DELETE FROM tbl WHERE p = ? and c = ? AND c > ?

complains that

    c cannot be restricted by more than one relation if it includes an Equal

But it produces different error messages for different operators and
even order.

Currently, Scylla doesn't consider such requests an error. Whether or
not we should be compatible with Cassandra here is discussed in
issue #12472. But as long as we do accept these queries, we should be
sure we do the right thing: "WHERE c = 1 AND c > 2" should match
nothing, "WHERE c = 1 AND c > 0" should match the matches of c = 1,
and so on. This patch adds a test to verify that these requests indeed
yield correct results. The test is scylla_only because, as explained
above, Cassandra doesn't support these requests at all.

Refs #12472

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12498
2023-01-16 20:41:16 +02:00
Kefu Chai
86b451d45c SCYLLA-VERSION-GEN: remove unnecessary bashism
remove unnecessary bashism, so that this script can be interpreted
by a POSIX shell.

/bin/sh is specified in the shebang line. on debian derivatives,
/bin/sh is dash, which is POSIX compliant. but this script is
written in the bash dialect.

before this change, we could run into the following build failure
when building the tree on Debian:

[7/904] ./SCYLLA-VERSION-GEN
./SCYLLA-VERSION-GEN: 37: [[: not found

after this change, the build is able to proceed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12530
2023-01-16 20:34:01 +02:00
Avi Kivity
0b418fa7cf cql3, transport, tests: remove "unset" from value type system
The CQL binary protocol introduced "unset" values in version 4
of the protocol. Unset values can be bound to variables, which
cause certain CQL fragments to be skipped. For example, the
fragment `SET a = :var` will not change the value of `a` if `:var`
is bound to an unset value.

Unsets, however, are very limited in where they can appear. They
can only appear at the top-level of an expression, and any computation
done with them is invalid. For example, `SET list_column = [3, :var]`
is invalid if `:var` is bound to unset.

This causes the code to be littered with checks for unset, and there
are plenty of tests dedicated to catching unsets. However, a simpler
way is possible - prevent the infiltration of unsets at the point of
entry (when evaluating a bind variable expression), and introduce
guards to check for the few cases where unsets are allowed.

This is what this long patch does. It performs the following:

(general)

1. unset is removed from the possible values of cql3::raw_value and
   cql3::raw_value_view.

(external->cql3)

2. query_options is fortified with a vector of booleans,
   unset_bind_variable_vector, where each boolean corresponds to a bind
   variable index and is true when it is unset.
3. To avoid churn, two compatibility structs are introduced:
   cql3::raw_value{,_view}_vector_with_unset, which can be constructed
   from a std::vector<raw_value{,_view}>, which is what most callers
   have. They can also be constructed with explicit unset vectors, for
   the few cases they are needed.

(cql3->variables)

4. query_options::get_value_at() now throws if the requested bind variable
   is unset. This replaces all the throwing checks in expression evaluation
   and statement execution, which are removed.
5. A new query_options::is_unset() is added for the users that can tolerate
   unset; though it is not used directly.
6. A new cql3::unset_operation_guard class guards against unsets. It accepts
   an expression, and can be queried whether an unset is present. Two
   conditions are checked: the expression must be a singleton bind
   variable, and at runtime it must be bound to an unset value.
7. The modification_statement operations are split into two, via two
   new subclasses of cql3::operation. cql3::operation_no_unset_support
   ignores unsets completely. cql3::operation_skip_if_unset checks if
   an operand is unset (luckily all operations have at most one operand that
   tolerates unset) and applies unset_operation_guard to it.
8. The various sites that accept expressions or operations are modified
   to check for should_skip_operation(). These are the loops around
   operations in update_statement and delete_statement, and the checks
   for unset in attributes (LIMIT and PER PARTITION LIMIT).

(tests)

9. Many unset tests are removed. It's now impossible to enter an
   unset value into the expression evaluation machinery (there's
   just no unset value), so it's impossible to test for it.
10. Other unset tests now have to be invoked via bind variables,
   since there's no way to create an unset cql3::expr::constant.
11. Many tests have their exception message match strings relaxed.
   Since unsets are now checked very early, we don't know the context
   where they happen. It would be possible to reintroduce it (by adding
   a format string parameter to cql3::unset_operation_guard), but it
   seems not to be worth the effort. Usage of unsets is rare, and it is
   explicit (at least with the Python driver, an unset cannot be
   introduced by omission).

I tried as an alternative to wrap cql3::raw_value{,_view} (that doesn't
recognize unsets) with cql3::maybe_unset_value (that does), but that
caused huge amounts of churn, so I abandoned that in favor of the
current approach.

Closes #12517
2023-01-16 21:10:56 +02:00
Kamil Braun
7510144fba Merge 'Add replace-node-first-boot option' from Benny Halevy
Allow replacing a node given its Host ID rather than its IP address.

This series adds a replace_node_first_boot option to db/config
and makes use of it in storage_service.

The new option takes priority over the legacy replace_address* options.
When the latter are used, a deprecation warning is printed.

Documentation updated accordingly.

And a cql unit_test is added.

Ref #12277

Closes #12316

* github.com:scylladb/scylladb:
  docs: document the new replace_node_first_boot option
  dist/docker: support --replace-node-first-boot
  db: config: describe replace_address* options as deprecated
  test: test_topology: test replace using host_id
  test: pylib: ServerInfo: add host_id
  storage_service: get rid of get_replace_address
  storage_service: is_replacing: rely directly on config options
  storage_service: pass replacement_info to run_replace_ops
  storage_service: pass replacement_info to booststrap
  storage_service: join_token_ring: reuse replacement_info.address
  storage_service: replacement_info: add replace address
  init: do not allow cfg.replace_node_first_boot of seed node
  db: config: add replace_node_first_boot option
2023-01-16 15:08:31 +01:00
Michał Sala
bbbe12af43 forward_service: fix timeout support in parallel aggregates
`forward_request` verb carried information about timeouts using
`lowres_clock::time_point` (which comes from the local steady clock,
`seastar::lowres_clock`). The time point was produced on one node and
later compared against another node's `lowres_clock`. That behavior
was wrong (`lowres_clock::time_point`s produced by different
`lowres_clock`s cannot be compared) and could lead to delayed or
premature timeouts.

To fix this issue, `lowres_clock::time_point` was replaced with
`lowres_system_clock::time_point` in `forward_request` verb.
The representation to which both time point types serialize is the same
(64-bit integer denoting the count of elapsed nanoseconds), so it was
possible to do an in-place switch of those types using logic suggested
by @avikivity:
    - using steady_clock is just broken, so we aren't taking anything
        from users by breaking it further
    - once all nodes are upgraded, it magically starts to work
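
A hedged sketch of why the wall-clock type is the one that can travel
between nodes (not the actual forward_request serialization code):

```
#include <seastar/core/lowres_clock.hh>
#include <chrono>
#include <cstdint>

// A steady-clock deadline only has meaning on the node that produced it;
// a system-clock deadline can be reconstructed on the receiving node.
int64_t deadline_to_wire(seastar::lowres_system_clock::time_point deadline) {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
        deadline.time_since_epoch()).count();
}

seastar::lowres_system_clock::time_point deadline_from_wire(int64_t ns_since_epoch) {
    return seastar::lowres_system_clock::time_point(
        std::chrono::duration_cast<seastar::lowres_system_clock::duration>(
            std::chrono::nanoseconds(ns_since_epoch)));
}
```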

Closes #12529
2023-01-16 12:08:13 +02:00
Botond Dénes
3d9ab1d9eb Merge 'Get recursive tasks' statuses with task manager api call' from Aleksandra Martyniuk
The PR adds an API call allowing one to get the statuses of a given
task and all its descendants.

The parent-child tree is traversed in BFS order and the list of
statuses is returned to the user.

Closes #12317

* github.com:scylladb/scylladb:
  test: add test checking recursive task status
  api: get task statuses recursively
  api: change retrieve_status signature
2023-01-16 11:44:50 +02:00
Tzach Livyatan
073f0f00c6 Add Scylla Summit 2023 in the top banner
Closes #12519
2023-01-16 08:05:20 +02:00
Avi Kivity
5a07641b95 Update python3 submodule (license file fix)
* tools/python3 548e860...279b6c1 (1):
  > create-relocatable-package: s/pyhton3-libs/python3-libs/
2023-01-15 17:59:27 +02:00
Benny Halevy
de3142e540 docs: document the new replace_node_first_boot option
And mention that replacing a node using the legacy
replace_addr* options is deprecated.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-13 18:41:44 +02:00
Benny Halevy
d4f1563369 dist/docker: support --replace-node-first-boot
And mention that replace_address_first_boot is deprecated

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-13 18:36:09 +02:00
Benny Halevy
1577aa8098 db: config: describe replace_address* options as deprecated
The replace_address options are still supported,
but mention in their description that they are now deprecated
and that the user should use replace_node_first_boot instead.

While at it, fix a typo in ignore_dead_nodes_for_replace.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-13 18:36:09 +02:00