scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-24 18:40:38 +00:00

Author	SHA1	Message	Date
Piotr Smaron	dbb912c8dd	cql: remove unused helper function from test_tablets `change_default_rf` is not used anywhere, moreover it uses `replication_factor` tag, which is forbidden in ALTER tablets KS statement. (cherry picked from commit `042825247f`)	2024-10-08 18:06:42 +00:00
Pavel Emelyanov	190385ee2b	cql: Check that CREATEing tablets/vnodes is consistent with the CLI There are two bits that control whether replication strategy for a keyspace will use tablets or not -- the configuration option and CQL parameter. This patch tunes its parsing to implement the logic shown below: if (strategy.supports_tablets) { if (cql.with_tablets) { if (cfg.enable_tablets) { return create_keyspace_with_tablets(); } else { throw "tablets are not enabled"; } } else if (cql.with_tablets == off) { return create_keyspace_without_tablets(); } else { // cql.with_tablets is not specified if (cfg.enable_tablets) { return create_keyspace_with_tablets(); } else { return create_keyspace_without_tablets(); } } } else { // strategy doesn't support tablets if (cql.with_tablets == on) { throw "invalid cql parameter"; } else if (cql.with_tablets == off) { return create_keyspace_without_tablets(); } else { // cql.with_tablets is not specified return create_keyspace_without_tablets(); } } closes: #20088 In order to enable tablets "by default" for NetworkTopologyStrategy there's explicit check near ks_prop_defs::get_initial_tablets(), that's not very nice. It needs more care to fix it, e.g. provide feature service reference to abstract_replication_strategy constructor. But since ks_prop_defs code already hijacks options specifically for that strategy type (see prepare_options() helper), it's OK for now. There's also #20768 misbehavior that's preserved in this patch, but should be fixed eventually as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20928	2024-10-03 17:09:21 +03:00
Calle Wilund	4a1e83d6be	commitlog: Fix buffer_list_bytes not updated correctly Fixes #20862 With the change in `60af2f3cb2` the bookkeeping for buffer memory was changed subtly. The problem is that we would shrink the buffer size before using that size, after flush, to decrement the buffer_list_bytes value, which was previously incremented by the full allocated size. I.e. we would slowly grow this value instead of adjusting it properly to the bytes actually used. Test included. (cherry picked from commit `ee5e71172f`) Closes scylladb/scylladb#20914	2024-10-03 09:11:40 +03:00
Kamil Braun	a96654bea3	Merge '[Backport 6.1] Populate raft address map from gossiper on raft configuration change' from ScyllaDB For each new node added to the raft config populate its ID to IP mapping in raft address map from the gossiper. The mapping may have expired if a node is added to the raft configuration long after it first appears in the gossiper. Fixes scylladb/scylladb#20600 Backport to all supported versions since the bug may cause bootstrapping failure. (cherry picked from commit `bddaf498df`) (cherry picked from commit `9e4cd32096`) Refs #20601 Closes scylladb/scylladb#20848 * github.com:scylladb/scylladb: test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join group0: make sure that address map has an entry for each new node in the raft configuration	2024-09-30 17:03:03 +02:00
Kamil Braun	79119f58e8	Merge '[Backport 6.1] mark node as being replaced earlier' from Gleb Natapov Before `17f4a151ce` the node was marked as being replaced in join_group0 state, before it actually joins the group0, so by the time it actually joins and starts transferring snapshot/log no traffic is sent to it. The commit changed this to mark the node as being replaced after the snapshot/log is already transferred, so traffic can reach the node while it still has not caught up with a leader, and this may cause problems since the state is not complete. Mark the node as being replaced earlier, but still add the new node to the topology later as the commit above intended. Fixes: https://github.com/scylladb/scylladb/issues/20629 Needs to be backported since this is a regression (cherry picked from commit `644e7a2012`) (cherry picked from commit `c0939d86f9`) (cherry picked from commit `1b4c255ffd`) Closes scylladb/scylladb#20834 * github.com:scylladb/scylladb: test: amend test_replace_reuse_ip test to check that there is no stale writes after snapshot transfer starts topology coordinator:: mark node as being replaced earlier topology coordinator: do metadata barrier before calling finish_accepting_node() during replace	2024-09-27 16:10:07 +02:00
Andrei Chekun	392d95d2cd	test.py: Increase workers for cluster cleaning Increase the number of workers used in the async_rmtree() method, which is used for cleaning directories. This should help to reduce flakiness. Increasing the workers count was introduced in `f54b7f5427` but there is no need to backport the whole commit. Closes scylladb/scylladb#20795	2024-09-27 14:47:08 +02:00
Kamil Braun	be76d6f9d9	service: raft: fix rpc error message What it called "leader" is actually the destination of the RPC. Trivial fix, should be backported to all affected versions. (cherry picked from commit `84dd0e922b`) Closes scylladb/scylladb#20827	2024-09-27 11:22:02 +02:00
Gleb Natapov	39a8203160	test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join (cherry picked from commit `9e4cd32096`)	2024-09-26 21:13:39 +00:00
Gleb Natapov	d2d1ed92c2	group0: make sure that address map has an entry for each new node in the raft configuration ID->IP mapping is added to the raft address map when the mapping first appears in the gossiper, but it is added as expiring entry. It becomes non expiring when a node is added to raft configuration. But when a node joins those two events may be distant in time (since the node's request may sit in the topology coordinator queue for a while) and mappings may expire already from the map. This patch makes sure to transfer the mapping from the gossiper for a node that is added to the raft configuration instead of assuming that the mapping is already there. (cherry picked from commit `bddaf498df`)	2024-09-26 21:13:39 +00:00
Gleb Natapov	c7be05cc50	test: amend test_replace_reuse_ip test to check that there is no stale writes after snapshot transfer starts (cherry picked from commit `1b4c255ffd`)	2024-09-26 12:34:18 +03:00
Gleb Natapov	88712782de	topology coordinator:: mark node as being replaced earlier Before `17f4a151ce` the node was marked as being replaced in join_group0 state, before it actually joins the group0, so by the time it actually joins and starts transferring snapshot/log no traffic is sent to it. The commit changed this to mark the node as being replaced after the snapshot/log is already transferred, so traffic can reach the node while it still has not caught up with a leader, and this may cause problems since the state is not complete. Mark the node as being replaced earlier, but still add the new node to the topology later as the commit above intended. (cherry picked from commit `c0939d86f9`)	2024-09-26 12:34:04 +03:00
Gleb Natapov	eaade2f0ef	topology coordinator: do metadata barrier before calling finish_accepting_node() during replace During replace with the same IP a node may get queries that were intended for the node it was replacing since the new node declares itself UP before it advertises that it is a replacement. But after the node starts the replacing procedure the old node is marked as "being replaced" and queries are no longer sent there. It is important to do so before the new node starts to get the raft snapshot since the snapshot application is not atomic and queries that run in parallel with it may see partial state and fail in weird ways. Queries that are sent before that will fail because schema is empty, so they will not find any tables in the first place. This is pre-existing and not addressed by this patch. (cherry picked from commit `644e7a2012`)	2024-09-26 12:33:06 +03:00
Kefu Chai	ef32ba704d	docs: explain precedence of configure options to explain for instance which setting takes effect if both command line options and `scylla.yaml` configures the same parameter. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `1aa030a8cd`) Closes scylladb/scylladb#20775	2024-09-26 10:47:42 +03:00
Anna Stuchlik	10d71d2f4b	doc: update the unified installer instructions This commit updates the unified installer instructions to avoid specifying a given version. At the moment, we're technically unable to use variables in URLs, so we need to update the page each release. Fixes https://github.com/scylladb/scylladb/issues/20677 (cherry picked from commit `400a14eefa`) Closes scylladb/scylladb#20709	2024-09-26 10:45:53 +03:00
Anna Stuchlik	9afb3daf98	doc: fix a broken link This commit fixes a link to the Manager by adding a missing underscore to the external link. (cherry picked from commit `aa0c95c95c`) Closes scylladb/scylladb#20707	2024-09-26 10:45:17 +03:00
Tzach Livyatan	82e7cb5bf5	Update client-node-encryption: OpenSSL is FIPS enabled (cherry picked from commit `cb864b11d8`) Closes scylladb/scylladb#20651	2024-09-26 10:42:12 +03:00
Lakshmi Narayanan Sreethar	58da8fdbbc	[Backport 6.1]: database::get_all_tables_flushed_at: fix return value The `database::get_all_tables_flushed_at` method returns a variable without setting the computed all_tables_flushed_at value. This causes its caller, `maybe_flush_all_tables`, to flush all the tables every time regardless of when they were last flushed. Fix this by returning the computed value from `database::get_all_tables_flushed_at`. Fixes #20301 Closes scylladb/scylladb#20471 * github.com:scylladb/scylladb: cql-pytest: add test to verify compaction_flush_all_tables_before_major_seconds config database::get_all_tables_flushed_at: fix return value (cherry picked from commit `0e5b444777`) Backported from #20471 to 6.1. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#20581	2024-09-26 10:40:48 +03:00
Kamil Braun	92156e7930	test: fix `topology_custom/test_raft_recovery_stuck` flakiness The test performs consecutive schema changes in RECOVERY mode. The second change relies on the first. However the driver might route the changes to different servers and we don't have group 0 to guarantee linearizability. We must rely on the first change coordinator to push the schema mutations to other servers before returning, but that only happens when it sees other servers as alive when doing the schema change. It wasn't guaranteed in the test. Fix this. Fixes scylladb/scylladb#20791 Should be backported to all branches containing this test to reduce flakiness. (cherry picked from commit `f390d4020a`) Closes scylladb/scylladb#20809	2024-09-25 15:11:50 +02:00
Abhinav	33b50a9d3a	raft topology: add error for removal of non-normal nodes Currently, we check if a node being removed is normal on the node initiating the removenode request. However, we don't have a similar check on the topology coordinator. The node being removed could be normal when we initiate the request, but it doesn't have to be normal when the topology coordinator starts handling the request. For example, the topology coordinator could have removed this node while handling another removenode request that was added to the request queue earlier. This commit intends to fix this issue by adding more checks in the enqueuing phase and returning errors for duplicate requests for node removal. This PR fixes a bug. Hence we need to backport it. Fixes: scylladb/scylladb#20271 (cherry picked from commit `b25b8dccbd`) Closes scylladb/scylladb#20800	2024-09-25 11:35:27 +02:00
Gleb Natapov	43f9b3b997	test: skip test_lwt_semaphore::test_cas_semaphore in aarch64 debug mode The test configures write timeout to much smaller value to make the test run faster since for some writes sleep is inserted to hit the timeout, but it makes aarch64 debug flaky since timeout happens when it should not because of a natural slowness. (cherry picked from commit `71a5b1c6dd`) Closes scylladb/scylladb#20777	2024-09-24 15:20:09 +02:00
Botond Dénes	7ed2f87414	Merge '[Backport 6.1] cql3: add option to not unify bind variables with the same' from Avi Kivity Bind variables in CQL have two formats: positional (?) where a variable is referred to by its relative position in the statement, and named (:var), where the user is expected to supply a name->value mapping. In `19a6e69001` we identified the case where a named bind variable appears twice in a query, and collapsed it to a single entry in the statement metadata. Without this, a driver using the named variable syntax cannot disambiguate which variable is referred to. However, it turns out that users can use the positional call form even with the named variable syntax, by using the positional API of the driver. To support this use case, we add a configuration variable to disable the same-variable detection. Because the detection has to happen when the entire statement is visible, we have to supply the configuration to the parser. We call it the dialect and pass it from all callers. The alternative would be to add a pre-prepare call similar to fill_prepare_context that rewrites all expressions in a statement to deduplicate variables. A unit test is added. Fixes https://github.com/scylladb/scylladb/issues/15559 This may be useful to users transitioning from Cassandra, so merits a backport. (cherry picked from commit `f9322799af`) (cherry picked from commit `d69bf4f010`) (cherry picked from commit `ea8441dfa3`) Refs https://github.com/scylladb/scylladb/pull/19493 Closes scylladb/scylladb#20590 * github.com:scylladb/scylladb: cql3: add option to not unify bind variables with the same name cql3: introduce dialect infrastructure cql3: prepared_statement_cache: drop cache key default constructor Merge 'config: round-trip boolean configuration variables' from Avi Kivity	2024-09-24 15:15:05 +03:00
Jenkins Promoter	f4ad3436cb	Update ScyllaDB version to: 6.1.3	2024-09-24 15:07:23 +03:00
Benny Halevy	d13c77e1eb	time_window_compaction_strategy: get_reshaping_job: restrict sort of multi_window vector to its size Currently the function calls boost::partial_sort with a middle iterator that might be out of bound and cause undefined behavior. Check the vector size, and do a partial sort only if its longer than `max_sstables`, otherwise sort the whole vector. Fixes scylladb/scylladb#20608 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `39ce358d82`) Closes scylladb/scylladb#20663	2024-09-23 15:38:35 +03:00
Piotr Dulikowski	bf6dd16071	Merge '[Backport 6.1] message/messaging_service: guard adding maintenance tenant under cluster feature' from Michał Jadwiszczak In https://github.com/scylladb/scylladb/pull/18729, we introduced a new statement tenant $maintenance, but the change wasn't protected by any cluster feature. This wasn't a problem for OSS, since unknown isolation cookie just uses default scheduling group. However, in enterprise that leads to creating a service level on not-upgraded nodes, which may end up in an error if user create maximum number of service levels. This patch adds a cluster feature to guard adding the new tenant. It's done in the way to handle two upgrade scenarios: version without $maintenance tenant -> version with $maintenance tenant guarded by a feature version with $maintenance tenant but not guarded by a feature -> version with $maintenance tenant guarded by a feature The PR adds enabled flag to statement tenants. This way, when the tenant is disabled, it cannot be used to create a connection, but it can be used to accept an incoming connection. The $maintenance tenant is added to the config as disabled and it gets enabled once the corresponding feature is enabled. Fixes https://github.com/scylladb/scylladb/issues/20070 Refs https://github.com/scylladb/scylla-enterprise/issues/4403 (cherry picked from commit `d44844241d`) (cherry picked from commit `71a03ef6b0`) (cherry picked from commit `b4b91ca364`) Refs https://github.com/scylladb/scylladb/pull/19802 Closes scylladb/scylladb#20674 * github.com:scylladb/scylladb: message/messaging_service: guard adding maintenance tenant under cluster feature message/messaging_service: add feature_service dependency message/messaging_service: add `enabled` flag to statement tenants	2024-09-23 13:18:45 +02:00
Botond Dénes	f987afb2e1	Merge '[Manual Backport 6.1] generic_server: convert connection tracking to seastar::gate' from Laszlo Ersek This is a manual backport of #20212 to 6.1, superseding #20345 (which had run into conflicts). Please see the individual commit messages for backport notes. Fixes #10305 Closes scylladb/scylladb#20355 * github.com:scylladb/scylladb: generic_server: make server::stop() idempotent generic_server: coroutinize server::shutdown() generic_server: make server::shutdown() idempotent test/generic_server: add test case configure, cmake: sort the lists of boost unit tests generic_server: convert connection tracking to seastar::gate	2024-09-18 15:52:32 +03:00
Michał Jadwiszczak	7e14df5ba7	message/messaging_service: guard adding maintenance tenant under cluster feature Set `enabled` flag for `$maintenance` tenant to false and enable it when `MAINTENANCE_TENANT` feature is enabled. (cherry-picked from `b4b91ca364`)	2024-09-18 11:31:26 +02:00
Michał Jadwiszczak	d11df0fcbc	message/messaging_service: add feature_service dependency (cherry-picked from `71a03ef6b0`)	2024-09-18 11:26:56 +02:00
Michał Jadwiszczak	f928bb7967	message/messaging_service: add `enabled` flag to statement tenants Adding a new tenant needs to be done under cluster feature protection. However it wasn't the case for adding `$maintenance` statement tenant and to fix it we need to support an upgrade from node which doesn't know about maintenance tenant at all and from one which uses it without any cluster feature protection. This commit adds `enabled` flag to statement tenants. This way, when the tenant is disabled, it cannot be used to create a connection, but it can be used to accept an incoming connection. (cherry-picked from `d44844241d`)	2024-09-18 11:23:02 +02:00
Tomasz Grabiec	edea822bd7	Merge '[Backport 6.1] tablets: Fix race between repair and split' from Raphael "Raph" Carvalho Consider the following: ``` T 0 split prepare starts 1 repair starts 2 split prepare finishes 3 repair adds unsplit sstables 4 repair ends 5 split executes ``` If repair produces sstable after split prepare phase, the replica will not split that sstable later, as prepare phase is considered completed already. That causes split execution to fail as replicas weren't really prepared. This also can be triggered with load-and-stream which shares the same write (consumer) path. The approach to fix this is the same employed to prevent a race between split and migration. If migration happens during prepare phase, it can happen source misses the split request, but the tablet will still be split on the destination (if needed). Similarly, the repair writer becomes responsible for splitting the data if underlying table is in split mode. That's implemented in replica::table for correctness, so if node crashes, the new sstable missing split is still split before added to the set. Fixes https://github.com/scylladb/scylladb/issues/19378. Fixes https://github.com/scylladb/scylladb/issues/19416. Please replace this line with justification for the backport/* labels added to this PR (cherry picked from commit `239344ab55`) (cherry picked from commit `74612ad358`) Refs https://github.com/scylladb/scylladb/pull/19427 Closes scylladb/scylladb#20595 * github.com:scylladb/scylladb: tablets: Fix race between repair and split compaction: Allow "offline" sstable to be split	2024-09-17 13:25:03 +02:00
Avi Kivity	fb98d6f832	Merge '[Backport 6.1] replica: ignore cleanup of deallocated storage group' from Aleksandra Martyniuk Cleanup of a deallocated tablet throws an exception. Since failed cleanup is retried, we end up in an infinite loop. Ignore cleanup of deallocated storage groups. Fixes: https://github.com/scylladb/scylladb/issues/19752. Needs to be backported to all branches with tablets (6.0 and later) (cherry picked from commit `20d6cf55f2`) (cherry picked from commit `2c4b1d6b45`) Refs https://github.com/scylladb/scylladb/pull/20584 Closes scylladb/scylladb#20627 * github.com:scylladb/scylladb: test: check if cleanup of deallocated sg is ignored replica: ignore cleanup of deallocated storage group	2024-09-17 12:22:00 +03:00
Gleb Natapov	d2e9007442	paxos_state: release semaphore units before checking if a semaphore can be dropped To drop a semaphore it should not be held by anyone, so we need to release our units before checking if a semaphore can be dropped. Fixes: scylladb/scylladb#20602 (cherry picked from commit `9cc54932ae`) Closes scylladb/scylladb#20621	2024-09-16 22:08:45 +03:00
Aleksandra Martyniuk	032c9146d5	test: check if cleanup of deallocated sg is ignored (cherry picked from commit `2c4b1d6b45`)	2024-09-16 16:22:29 +02:00
Aleksandra Martyniuk	120ff5aeb8	replica: ignore cleanup of deallocated storage group Currently, an attempt to clean up a deallocated storage group throws an exception. Failed tablet cleanup is retried, so it gets stuck in an endless loop. Ignore cleanup of deallocated storage group. (cherry picked from commit `20d6cf55f2`)	2024-09-16 12:44:36 +00:00
Raphael S. Carvalho	fe56fa39c0	tablets: Fix race between repair and split Consider the following: T 0 split prepare starts 1 repair starts 2 split prepare finishes 3 repair adds unsplit sstables 4 repair ends 5 split executes If repair produces sstable after split prepare phase, the replica will not split that sstable later, as prepare phase is considered completed already. That causes split execution to fail as replicas weren't really prepared. This also can be triggered with load-and-stream which shares the same write (consumer) path. The approach to fix this is the same employed to prevent a race between split and migration. If migration happens during prepare phase, it can happen source misses the split request, but the tablet will still be split on the destination (if needed). Similarly, the repair writer becomes responsible for splitting the data if underlying table is in split mode. That's implemented in replica::table for correctness, so if node crashes, the new sstable missing split is still split before added to the set. Fixes #19378. Fixes #19416. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `74612ad358`)	2024-09-13 21:32:01 -03:00
Avi Kivity	8ddfd0d70d	cql3: add option to not unify bind variables with the same name Bind variables in CQL have two formats: positional (`?`) where a variable is referred to by its relative position in the statement, and named (`:var`), where the user is expected to supply a name->value mapping. In `19a6e69001` we identified the case where a named bind variable appears twice in a query, and collapsed it to a single entry in the statement metadata. Without this, a driver using the named variable syntax cannot disambiguate which variable is referred to. However, it turns out that users can use the positional call form even with the named variable syntax, by using the positional API of the driver. To support this use case, we add a configuration variable to disable the same-variable detection. Because the detection has to happen when the entire statement is visible, we have to supply the configuration to the parser. We call it the `dialect` and pass it from all callers. The alternative would be to add a pre-prepare call similar to fill_prepare_context that rewrites all expressions in a statement to deduplicate variables. A unit test is added. Fixes #15559 (cherry picked from commit `ea8441dfa3`) (cherry picked from commit `edb3068ecf`)	2024-09-13 18:17:15 +03:00
Avi Kivity	92dd47c6d6	cql3: introduce dialect infrastructure A dialect is a different way to interpret the same CQL statement. Examples: - how duplicate bind variable names are handled (later in this series) - whether `column = NULL` in LWT can return true (as is now) or whether it always returns NULL (as in SQL) Currently, dialect is an empty structure and will be filled in later. It is passed to query_processor methods that also accept a CQL string, and from there to the parser. It is part of the prepared statement cache key, so that if the dialect is changed online, previous parses of the statement are ignored and the statement is prepared again. The patch is careful to pick up the dialect at the entry point (e.g. CQL protocol server) so that the dialect doesn't change while a statement is parsed, prepared, and cached. (cherry picked from commit `d69bf4f010`)	2024-09-13 18:11:11 +03:00
Avi Kivity	4bf81f54b4	cql3: prepared_statement_cache: drop cache key default constructor It's unnecessary, and interferes with the following patch where we change the cache key type. (cherry picked from commit `f9322799af`)	2024-09-13 17:56:06 +03:00
Nadav Har'El	d9ba5423bb	Merge 'config: round-trip boolean configuration variables' from Avi Kivity When you SELECT a boolean from system.config, it reads as true/false, but this isn't accepted on UPDATE (instead, we accept 1/0). This is surprising and annoying, so accept true/false in both directions. Not a regression, so a backport isn't strictly necessary. Closes scylladb/scylladb#19792 * github.com:scylladb/scylladb: config: specialize from-string conversion for bool config: wrap boost::lexical_cast<> when converting from strings (cherry picked from commit `9eb47b3ef0`)	2024-09-13 17:54:37 +03:00
Piotr Smaron	b60f9ef4c2	cql: fix exception when validating KS in CREATE TABLE `c70f321c6f` added an extra check if KS exists. This check can throw `data_dictionary::no_such_keyspace` exception, which is supposed to be caught and a more user-friendly exception should be thrown instead. This commit fixes the above problem and adds a testcase to validate it doesn't appear ever again. Also, I moved the check for the keyspace outside of the `for` loop, as it doesn't need to be checked repeatedly. Additionally, I added an extra comment to both `no_such_keyspace` and `no_such_column_family` exceptions explaining they should not be returned directly to the caller, as they lack error code, which may not trigger correct exceptions handling mechanisms on the driver side. Fixes: #20097 (cherry picked from commit `f1e8976fbe`) Closes scylladb/scylladb#20553 scylla-6.1.2 scylla-6.1.2-candidate-20240915043632	2024-09-13 11:36:51 +03:00
Piotr Dulikowski	00e96d4b70	Merge '[Backport 6.1]: hints: send hints with CL=ALL if target is leaving' from Piotr Dulikowski Currently, when attempting to send a hint, we might choose its recipients in one of two ways: - If the original destination is a natural endpoint of the hint, we only send the hint to that node and none other, - Otherwise, we send the hint to all current replicas of the mutation. There is a problem when we decommission a node: while data is streamed away from that node, it is still considered to be a natural endpoint of the data that it used to own. Because of that, it might happen that a hint is sent directly to it but streaming will miss it, effectively resulting in the hint being discarded. As sending the hint _only_ to the leaving replica is a rather bad idea, send the hint to all replicas also in the case when the original destination of the hint is leaving. Note that this is a conservative fix written only with the decommission + vnode-based keyspaces combo in mind. In general, such "data loss" can occur in other situations where the replica set is changing and we go through a streaming phase, i.e. other topology operations in case of vnodes and tablet load balancing. However, the consistency guarantees of hinted handoff in the face of topology changes are not defined and it is not clear what they should be, if there should be any at all. The picture is further complicated by the fact that hints are used by materialized views, and sending view updates to more replicas than necessary can introduce inconsistencies in the form of "ghost rows". This fix was developed in response to a failing test which checked the hint replay + decommission scenario, and it makes it work again. Fixes scylladb/scylladb#20558 Fixes scylladb/scylla-dtest#4582 Refs scylladb/scylladb#19835 This is a backport of the original PR without the tests, done to avoid the need of resolving merge conflicts in that area. Closes scylladb/scylladb#20557 * github.com:scylladb/scylladb: hints: send hints with CL=ALL if target is leaving hints: inline do_send_one_mutation	2024-09-13 09:39:36 +02:00
Abhi	848054079b	raft: Add descriptions for requested abort errors Fixes: scylladb/scylladb#18902 This PR only improves error messages, no need to backport it. (cherry picked from commit `9b09439065`) Closes scylladb/scylladb#20526	2024-09-13 10:13:49 +03:00
Botond Dénes	c80cefe422	docs/cql/ddl.rst: fix description of sstable_compression ScyllaDB doesn't support custom compressors. The available compressors are the only available ones, not the default ones. Adjust the text to reflect this. (cherry picked from commit `08f109724b`) Closes scylladb/scylladb#20524	2024-09-13 10:12:59 +03:00
Takuya ASADA	b07c74a65c	install.sh: fix more incorrect permission on strict umask Even after `13caac7`, we still have more files with incorrect permissions, since we use "cp -r" and create new files with redirects. To fix this, we need to replace "cp -r" with "cp -pr", and run "chmod <perm>" on newly created files. Fixes #14383 Related #19775 (cherry picked from commit `9d7fed40b5`) Closes scylladb/scylladb#20432	2024-09-13 10:12:22 +03:00
Piotr Dulikowski	2556c7a0dc	hints: send hints with CL=ALL if target is leaving Currently, when attempting to send a hint, we might choose its recipients in one of two ways: - If the original destination is a natural endpoint of the hint, we only send the hint to that node and none other, - Otherwise, we send the hint to all current replicas of the mutation. There is a problem when we decommission a node: while data is streamed away from that node, it is still considered to be a natural endpoint of the data that it used to own. Because of that, it might happen that a hint is sent directly to it but streaming will miss it, effectively resulting in the hint being discarded. As sending the hint _only_ to the leaving replica is a rather bad idea, send the hint to all replicas also in the case when the original destination of the hint is leaving. Note that this is a conservative fix written only with the decommission + vnode-based keyspaces combo in mind. In general, such "data loss" can occur in other situations where the replica set is changing and we go through a streaming phase, i.e. other topology operations in case of vnodes and tablet load balancing. However, the consistency guarantees of hinted handoff in the face of topology changes are not defined and it is not clear what they should be, if there should be any at all. The picture is further complicated by the fact that hints are used by materialized views, and sending view updates to more replicas than necessary can introduce inconsistencies in the form of "ghost rows". This fix was developed in response to a failing test which checked the hint replay + decommission scenario, and it makes it work again. Fixes scylladb/scylla-dtest#4582 Refs scylladb/scylladb#19835 (cherry picked from commit `61ac0a336d`)	2024-09-12 10:55:29 +02:00
Piotr Dulikowski	132d77f447	hints: inline do_send_one_mutation It's a small method and it is only used once in send_one_mutation. Inlining it lets us get rid of its declaration in the header - now, if one needs to change the variables passed from one function to another, it is no longer necessary to change the header. (cherry picked from commit `8abb06ab82`)	2024-09-12 10:55:21 +02:00
Gleb Natapov	bb9249f055	db/consistency_level: do not use result from hit weighted load balancer if it contains duplicates Because of https://github.com/scylladb/scylladb/issues/9285 hit weighted load balancer may sometimes return the same node twice. It may cause wrong data to be read or unexpected errors to be returned to a client. Since the original bug is not easy to fix and it is rare, let's introduce a workaround. We will check for duplicates and will use the non-HWLB result if one is found. (cherry picked from commit `e06a772b87`) Closes scylladb/scylladb#20468	2024-09-10 17:18:47 +03:00
Kamil Braun	e4a18b0858	test: test_raft_no_quorum: increase raft timeout in debug mode The test cases in this file use an error injection to reduce raft group 0 timeouts (from the default 1 minute), in order to speed up the tests; the scenarios expect these timeouts to happen, so we want them to happen as quick as possible, but we don't want to reduce timeouts so much that it will make other operations fail when we don't expect them to (e.g. when the test wants to add a node to the cluster). Unfortunately the selected 5 seconds in debug mode was not enough and made the tests flaky: scylladb/scylladb#20111. Increase it to 10 seconds. This unfortunately will slow down these tests as they have to sometimes wait for 10 seconds for the timeout to happen. But better to have this than a flaky test. Fixes: scylladb/scylladb#20111 (cherry picked from commit `52fdf5b4c9`) Closes scylladb/scylladb#20477	2024-09-10 08:48:06 +03:00
Kefu Chai	105293b2ab	docs: do not install scylla/ppa repo when perform upgrade for following reasons: 1. the ppa in question does not provide the build for the latest ubuntu's LTS release. it only builds for trusty, xenial, bionic and jammy. according to https://wiki.ubuntu.com/Releases, the latest LTS release is ubuntu noble at the time of writing. 2. the ppa in question does not provide the packages used in production. it does provide the package for building scylla 3. after we introduced the relocatable package, there is no need to provide extra user space dependencies apart from scylla packages. so, in this change, we remove all references to enabling the Scylla/PPA repository. Fixes scylladb/scylladb#20449 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `fe0e961856`) Closes scylladb/scylladb#20453	2024-09-10 08:46:47 +03:00
Nadav Har'El	ad47c0e2f9	alternator ttl: fix use-after-free The Alternator TTL scanning code uses an object "scan_ranges_context" to hold the scanning context. One of the members of this object is a service::query_state, and that in turn holds a reference to a service::client_state. The existing constructor created a temporary client_state object and saved a reference to it - which can result in use after free as the temporary object is freed as soon as the constructor ends. The fix is to save a client_state in the scan_ranges_context object, instead of a temporary object. Fixes #19988 Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `15f8046fcb`) Closes scylladb/scylladb#20436	2024-09-10 08:43:14 +03:00
Kefu Chai	0eb66cbee5	sstables: correct the debugging message printed when removing temp dir in `372a4d1b79`, we introduced a change which was for debugging the logging message. but the logging message intended for printing the temp_dir now prints an `optional<int>`. this is both confusing, and more importantly, it hurts the debuggability. in this change, the related change is reverted. Fixes scylladb/scylladb#20408 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `d26bb9ae30`) Closes scylladb/scylladb#20434	2024-09-10 08:42:29 +03:00
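
The tablets/vnodes decision described in commit 190385ee2b above can be read as a small truth table over three inputs. The sketch below restates that pseudocode in Python purely for illustration; it is not ScyllaDB's C++ code, and every name in it (`KeyspaceKind`, `resolve_keyspace_kind`, the parameter names) is hypothetical. The CQL parameter is modelled as `True` (on), `False` (off), or `None` (not specified).

```python
from enum import Enum
from typing import Optional


class KeyspaceKind(Enum):
    """Hypothetical label for the two outcomes in the commit's pseudocode."""
    TABLETS = "tablets"
    VNODES = "vnodes"


def resolve_keyspace_kind(
    strategy_supports_tablets: bool,
    cql_with_tablets: Optional[bool],  # True = on, False = off, None = unspecified
    cfg_enable_tablets: bool,
) -> KeyspaceKind:
    """Mirror the if/else ladder from the commit message, branch by branch."""
    if strategy_supports_tablets:
        if cql_with_tablets is True:
            if cfg_enable_tablets:
                return KeyspaceKind.TABLETS
            raise ValueError("tablets are not enabled")
        if cql_with_tablets is False:
            return KeyspaceKind.VNODES
        # CQL parameter not specified: fall back to the configuration option.
        return KeyspaceKind.TABLETS if cfg_enable_tablets else KeyspaceKind.VNODES

    # Strategy does not support tablets.
    if cql_with_tablets is True:
        raise ValueError("invalid cql parameter")
    # Both "off" and "unspecified" create a keyspace without tablets.
    return KeyspaceKind.VNODES
```

For example, `resolve_keyspace_kind(True, None, True)` yields `KeyspaceKind.TABLETS`, while an explicit request for tablets with the configuration option disabled raises an error instead of silently falling back.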


43695 Commits