scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 11:30:36 +00:00

Author	SHA1	Message	Date
Anna Stuchlik	cd7a6b8892	doc: remove wrong image upgrade info (5.2-to-2023.1) This commit removes the information about the recommended way of upgrading ScyllaDB images - by updating ScyllaDB and OS packages in one step. This upgrade procedure is not supported (it was implemented, but then reverted). Refs https://github.com/scylladb/scylladb/issues/15733 Closes scylladb/scylladb#21876 Fixes https://github.com/scylladb/scylla-enterprise/issues/5041 Fixes https://github.com/scylladb/scylladb/issues/21898 (cherry picked from commit `98860905d8`)	2024-12-12 15:24:08 +02:00
Anna Stuchlik	a592393d5d	doc: add the 6.0-to-2024.2 upgrade guide-from-6 This commit adds an upgrade guide from ScyllaDB 6.0 to ScyllaDB Enterprise 2024.2. Fixes https://github.com/scylladb/scylladb/issues/20063 Fixes https://github.com/scylladb/scylladb/issues/20062 Refs https://github.com/scylladb/scylla-enterprise/issues/4544 (cherry picked from commit `3d4b7e41ef`) Closes scylladb/scylladb#21618	2024-11-18 17:17:56 +02:00
Gleb Natapov	8c8316e21d	topology coordinator: take a copy of a replication state in raft_topology_cmd_handler Current code takes a reference and holds it past preemption points. And while the state itself is not supposed to change, the reference may become stale because the state is re-created on each raft topology command. Fix it by taking a copy instead. This is a slow path anyway. Fixes: scylladb/scylladb#21220 (cherry picked from commit `fb38bfa35d`) Closes scylladb/scylladb#21372	2024-11-14 17:33:16 +02:00
Tomasz Grabiec	5d07be19c0	node-exporter: Disable hwmon collector This collector reads nvme temperature sensor, which was observed to cause bad performance on Azure cloud following the reading of the sensor for ~6 seconds. During the event, we can see elevated system time (up to 30%) and softirq time. CPU utilization is high, with nvm_queue_rq taking several orders of magnitude more time than normally. There are signs of contention, we can see __pv_queued_spin_lock_slowpath in the perf profile, called. This manifests as latency spikes and potentially also throughput drop due to reduced CPU capacity. By default, the monitoring stack queries it once every 60s. (cherry picked from commit `93777fa907`) Closes scylladb/scylladb#21306	2024-10-31 14:06:17 +01:00
Lakshmi Narayanan Sreethar	fd80dd2284	[Backport 6.0] replica/table: check memtable before discarding tombstone during read On the read path, the compacting reader is applied only to the sstable reader. This can cause an expired tombstone from an sstable to be purged from the request before it has a chance to merge with deleted data in the memtable leading to data resurrection. Fix this by checking the memtables before deciding to purge tombstones from the request on the read path. A tombstone will not be purged if a key exists in any of the table's memtables with a minimum live timestamp that is lower than the maximum purgeable timestamp. Fixes #20916 `perf-simple-query` stats before and after this fix : `build/Dev/scylla perf-simple-query --smp=1 --flush` : ``` // Before this Fix // --------------- 94941.79 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59393 insns/op, 24029 cycles/op, 0 errors) 97551.14 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59376 insns/op, 23966 cycles/op, 0 errors) 96599.92 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59367 insns/op, 23998 cycles/op, 0 errors) 97774.91 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59370 insns/op, 23968 cycles/op, 0 errors) 97796.13 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59368 insns/op, 23947 cycles/op, 0 errors) throughput: mean=96932.78 standard-deviation=1215.71 median=97551.14 median-absolute-deviation=842.13 maximum=97796.13 minimum=94941.79 instructions_per_op: mean=59374.78 standard-deviation=10.78 median=59369.59 median-absolute-deviation=6.36 maximum=59393.12 minimum=59367.02 cpu_cycles_per_op: mean=23981.67 standard-deviation=32.29 median=23967.76 median-absolute-deviation=16.33 maximum=24029.38 minimum=23947.19 // After this Fix // -------------- 95313.53 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59392 insns/op, 24058 cycles/op, 0 errors) 97311.48 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59375 insns/op, 24005 
cycles/op, 0 errors) 98043.10 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59381 insns/op, 23941 cycles/op, 0 errors) 96750.31 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59396 insns/op, 24025 cycles/op, 0 errors) 93381.21 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59390 insns/op, 24097 cycles/op, 0 errors) throughput: mean=96159.93 standard-deviation=1847.88 median=96750.31 median-absolute-deviation=1151.55 maximum=98043.10 minimum=93381.21 instructions_per_op: mean=59386.60 standard-deviation=8.78 median=59389.55 median-absolute-deviation=6.02 maximum=59396.40 minimum=59374.73 cpu_cycles_per_op: mean=24025.13 standard-deviation=58.39 median=24025.17 median-absolute-deviation=32.67 maximum=24096.66 minimum=23941.22 ``` This PR fixes a regression introduced in `ce96b472d3` and should be backported to older versions. Closes scylladb/scylladb#20985 * github.com:scylladb/scylladb: topology-custom: add test to verify tombstone gc in read path replica/table: check memtable before discarding tombstone during read compaction_group: track maximum timestamp across all sstables (cherry picked from commit `519e167611`) Backported from #20985 to 6.0 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#21249	2024-10-25 11:20:24 +03:00
Botond Dénes	21cdf8833f	Merge '[Backport 6.0] tablet: Fix single-sstable split when attaching new unsplit sstables' from Raphael Raph Carvalho To fix a race between split and repair here `c1de4859d8`, a new sstable generated during streaming can be split before being attached to the sstable set. That's to prevent an unsplit sstable from reaching the set after the tablet map is resized. So we can think this split is an extension of the sstable writer. A failure during split means the new sstable won't be added. Also, the duration of split is also adding to the time erm is held. For example, repair writer will only release its erm once the split sstable is added into the set. This single-sstable split is going through run_custom_job(), which serializes with other maintenance tasks. That was a terrible decision, since the split may have to wait for ongoing maintenance task to finish, which means holding erm for longer. Additionally, if split monitor decides to run split on the entire compaction group, it can cause single-sstable split to be aborted since the former wants to select all sstables, propagating a failure to the streaming writer. That results in new sstable being leaked and may cause problems on restart, since the underlying tablet may have moved elsewhere or multiple splits may have happened. We have some fragility today in cleaning up leaked sstables on streaming failure, but this single-sstable split made it worse since the failure can happen during normal operation, when there's e.g. no I/O error. It makes sense to kill run_custom_job() usage, since the single-sstable split is offline and an extension of sstable writing, therefore it makes no sense to serialize with maintenance tasks. It must also inherit the sched group of the process writing the new sstable. The inheritance happens today, but is fragile. Fixes https://github.com/scylladb/scylladb/issues/20626. 
(cherry picked from commit `999f1f1318`) (cherry picked from commit `38ce2c605d`) Refs https://github.com/scylladb/scylladb/pull/20737 Closes scylladb/scylladb#21201 * github.com:scylladb/scylladb: tablet: Fix single-sstable split when attaching new unsplit sstables replica: Fix tablet split execute after restart	2024-10-25 11:18:58 +03:00
Botond Dénes	79b7aee58b	Merge '[Backport 6.0] Check system.tablets update before putting it into the table' from ScyllaDB Having tablet metadata with more than 1 pending replica will prevent this metadata from being (re)loaded due to sanity check on load. This patch fails the operation which tries to save the wrong metadata with a similar sanity check. For that, changes submitted to raft are validated, and if it's topology_change that affects system.tablets, the new "replicas" and "new_replicas" values are checked similarly to how they will be on (re)load. fixes #20043 (cherry picked from commit `f09fe4f351`) (cherry picked from commit `e5bf376cbc`) (cherry picked from commit `1863ccd900`) Refs #21020 Closes scylladb/scylladb#21112 * github.com:scylladb/scylladb: tablets: Validate system.tablets update group0_client: Introduce change validation group0_client: Add shared_token_metadata dependency replica/tablets: Add to_tablet_metadata_(row_)?key helpers replica/tablets: extract tablet_replica_set_from_cell()	2024-10-25 11:18:32 +03:00
Benny Halevy	e45477811c	storage_service: rebuild: warn about tablets-enabled keyspaces Until we automatically support rebuild for tablets-enabled keyspaces, warn the user about them. The reason this is not an error, is that after increasing RF in a new datacenter, the current procedure is to run `nodetool rebuild` on all nodes in that dc to rebuild the new vnode replicas. This is not required for tablets, since the additional replicas are rebuilt automatically as part of ALTER KS. However, `nodetool rebuild` is also run after local data loss (e.g. due to corruption and removal of sstables). In this case, rebuild is not supported for tablets-enabled keyspaces, as tablet replicas that had lost data may have already been migrated to other nodes, and rebuilding the requested node will not know about it. It is advised to repair all nodes in the datacenter instead. Refs scylladb/scylladb#17575 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `ed1e9a1543`) Closes scylladb/scylladb#20724	2024-10-25 11:18:12 +03:00
Tomasz Grabiec	86066f5313	Merge '[Backport 6.0] replica: Fix tombstone GC during tablet split preparation' from Raphael Raph Carvalho During split prepare phase, there will be more than 1 compaction group with overlapping token range for a given replica. Assume tablet 1 has sstable A containing deleted data, and sstable B containing a tombstone that shadows data in A. Then split starts: 1) sstable B is split first, and moved from main (unsplit) group to a split-ready group 2) now compaction runs in split-ready group before sstable A is split. Tombstone GC logic today only looks at the underlying group, so compaction in step 2 will discard the deleted data in A, since it belongs to another group (the unsplit one), and so the tombstone can be purged incorrectly. To fix it, compaction will now work with all uncompacting sstables that belong to the same replica, since tombstone GC requires all sstables that possibly contain shadowed data to be available for a correct decision to be made. Fixes https://github.com/scylladb/scylladb/issues/20044. Branches 6.0, 6.1 and 6.2 are vulnerable, so backport is needed. (cherry picked from commit `bcd358595f`) (cherry picked from commit `93815e0649`) Refs https://github.com/scylladb/scylladb/pull/20939 Closes scylladb/scylladb#21204 * github.com:scylladb/scylladb: replica: Fix tombstone GC during tablet split preparation service: Improve error handling for split	2024-10-23 11:48:45 +02:00
Pavel Emelyanov	b71753a4bc	tablets: Validate system.tablets update Implement change validation for raft topology_change command. For now the only check is that the "pending replicas" contains at most one entry. The check mirrors similar one in `process_one_row` function. If not passed, this prevents system.tablets from being updated with the mutation(s) that will not be loaded later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-22 12:36:00 +03:00
Pavel Emelyanov	29e562af1a	group0_client: Introduce change validation Add validate_change() methods (well, a template and an overload) that are called by prepare_command() and are supposed to validate the proposed change before it hits persistent storage Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-22 12:35:44 +03:00
Pavel Emelyanov	ae5885abf5	group0_client: Add shared_token_metadata dependency It will be needed later to get tablet_metadata from. The dependency is "OK", shared_token_metadata is low-level sharded service. Client already references db::system_keyspace, which in turn references replica::database which, finally, references token_metadata Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-22 12:35:44 +03:00
Pavel Emelyanov	a0f029ceaa	replica/tablets: Add to_tablet_metadata_(row_)?key helpers Extracted from larger patch `f5976aa87b` (replica/tablets: add get_tablet_metadata_change_hint() and update_tablet_metadata_change_hint()) by Botond. The helpers are needed to decode mutations with tablets update to validate them later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-22 12:35:42 +03:00
Kefu Chai	648b017a36	replica/tablets: extract tablet_replica_set_from_cell() so it can be reused to implement a low-level tool which reads tablets data from sstables Refs scylladb/scylladb#16488 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-22 12:24:57 +03:00
Botond Dénes	f84cdc7569	Merge '[Backport 6.0] atomic_delete: allow deletion of sstables from several prefixes' from ScyllaDB Allow create_pending_deletion_log to delete a bunch of sstables that potentially reside in different prefixes (e.g. in the base directory and under staging/). The motivation arises from table::cleanup_tablet that calls compaction_group::cleanup on all cg:s via cleanup_compaction_groups. Cleanup, in turn, calls delete_sstables_atomically on all sstables in the compaction_group, in all states, including the normal state as well as staging - hence the requirement to support deleting sstables in different sub-directories. Also, apparently truncate calls delete_atomically for all sstables too, via table::discard_sstables, so if it happened to be executed during view update generation, i.e. when there are sstables in staging, it should hit the assertion failure reported in https://github.com/scylladb/scylladb/issues/18862 as well (although I haven't seen it yet, I see no reason why it wouldn't happen). So the issue was apparently present since the initial implementation of the pending_delete_log. It's just that with tablet migration it is more likely to be hit. Fixes scylladb/scylladb#18862 Needs backport to 6.0 since tablets require this capability (cherry picked from commit `a7b92d7b6f`) (cherry picked from commit `027e64876a`) (cherry picked from commit `44bd183187`) (cherry picked from commit `f47b5e60bc`) Refs #19555 Closes scylladb/scylladb#20645 * github.com:scylladb/scylladb: sstable_directory: create_pending_deletion_log: place pending_delete log under the base directory sstables: storage: keep base directory in base class sstables: storage: define opened_directory in header file sstable_directory: use only dirlog	2024-10-22 09:21:04 +03:00
Benny Halevy	3170f9abec	view: check_needs_view_update_path: get token_metadata_ptr check_needs_view_update_path is async and might yield so the token_metadata reference passed to it must be kept alive throughout the call. Fixes scylladb/scylladb#20979 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `d34878e96c`) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#21040	2024-10-22 09:18:49 +03:00
Raphael S. Carvalho	553803ac0f	replica: Fix tombstone GC during tablet split preparation During split prepare phase, there will be more than 1 compaction group with overlapping token range for a given replica. Assume tablet 1 has sstable A containing deleted data, and sstable B containing a tombstone that shadows data in A. Then split starts: 1) sstable B is split first, and moved from main (unsplit) group to a split-ready group 2) now compaction runs in split-ready group before sstable A is split. Tombstone GC logic today only looks at the underlying group, so compaction in step 2 will discard the deleted data in A, since it belongs to another group (the unsplit one), and so the tombstone can be purged incorrectly. To fix it, compaction will now work with all uncompacting sstables that belong to the same replica, since tombstone GC requires all sstables that possibly contain shadowed data to be available for a correct decision to be made. Fixes #20044. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `93815e0649`)	2024-10-20 20:38:21 -03:00
Raphael S. Carvalho	fde550360e	tablet: Fix single-sstable split when attaching new unsplit sstables To fix a race between split and repair here `c1de4859d8`, a new sstable generated during streaming can be split before being attached to the sstable set. That's to prevent an unsplit sstable from reaching the set after the tablet map is resized. So we can think this split is an extension of the sstable writer. A failure during split means the new sstable won't be added. Also, the duration of split is also adding to the time erm is held. For example, repair writer will only release its erm once the split sstable is added into the set. This single-sstable split is going through run_custom_job(), which serializes with other maintenance tasks. That was a terrible decision, since the split may have to wait for ongoing maintenance task to finish, which means holding erm for longer. Additionally, if split monitor decides to run split on the entire compaction group, it can cause single-sstable split to be aborted since the former wants to select all sstables, propagating a failure to the streaming writer. That results in new sstable being leaked and may cause problems on restart, since the underlying tablet may have moved elsewhere or multiple splits may have happened. We have some fragility today in cleaning up leaked sstables on streaming failure, but this single-sstable split made it worse since the failure can happen during normal operation, when there's e.g. no I/O error. It makes sense to kill run_custom_job() usage, since the single-sstable split is offline and an extension of sstable writing, therefore it makes no sense to serialize with maintenance tasks. It must also inherit the sched group of the process writing the new sstable. The inheritance happens today, but is fragile. Fixes #20626. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `38ce2c605d`)	2024-10-20 20:16:53 -03:00
Raphael S. Carvalho	3ba613b833	replica: Fix tablet split execute after restart let's assume there are 2 nodes, n1, n2. n1 is the coordinator. 1) n1 emits split 2) n1 and n2 complete split work 3) n1 becomes aware all replicas are ready for split 4) n2 restarts, but places split sstable into main group[1] 5) n1 executes split 6) n2 handles split completion, but see the main group is not empty [1]: During split, main group should only contain unsplit sstables. If all sstables are split, main must be empty. This is a result of replica not setting storage group to split mode on restart (using tablet map) and therefore sstables are incorrectly placed on main group. The fix is about looking at tablet map and setting group to split mode before sstables are populated into it. Refs #20626. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `999f1f1318`)	2024-10-20 20:16:51 -03:00
Benny Halevy	f61e0e0f3e	sstable_directory: create_pending_deletion_log: place pending_delete log under the base directory To be able to atomically delete sstables both in base table directory and in its sub-directories, like `staging/`, use a shared pending_delete_dir under the base directory. Note that this requires loading and processing the base directory first. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `f47b5e60bc`) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> # Conflicts: # sstables/sstable_directory.hh	2024-10-20 09:17:06 +03:00
Benny Halevy	f0511ab4eb	sstables: storage: keep base directory in base class so we can use the base (table) directory for e.g. pending_delete logs, in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `44bd183187`)	2024-10-20 09:17:06 +03:00
Benny Halevy	38bc9ee175	sstables: storage: define opened_directory in header file So it can be used outside the storage module in the following patches. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `027e64876a`)	2024-10-20 09:17:06 +03:00
Benny Halevy	75e19cdeb1	sstable_directory: use only dirlog Currently, there are leftover log messages using sstlog rather than dirlog, that was introduced in `aebd965f0e`, and that makes debugging harder. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `a7b92d7b6f`) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> # Conflicts: # sstables/sstable_directory.cc	2024-10-20 09:17:06 +03:00
Piotr Smaron	cdc5ee84ec	test: fix flaky `test_multidc_alter_tablets_rf` The testcase is flaky due to a known python driver issue: https://github.com/scylladb/python-driver/issues/317. This issue causes the `CREATE KEYSPACE` statement to be sometimes executed twice in a row, and the 2nd CREATE statement causes the test to fail. In order to work around it, it's enough to add `if not exists` when creating a ks. Fixes: #21034 Needs to be backported to all 6.x branches, as the PR introducing this flakiness is backported to every 6.x branch. (cherry picked from commit `3969ffb39f`) Closes scylladb/scylladb#21134	2024-10-17 11:03:14 +03:00
Piotr Smaron	5fa7c5dbc0	cql/tablets: handle MVs in ALTER tablets KEYSPACE ALTERing tablets-enabled KEYSPACES (KS) didn't account for materialized views (MV), and only produced tablets mutations changing tables. With this patch we're producing tablets mutations for both tables and MVs, hence when e.g. we change the replication factor (RF) of a KS, both the tables' RFs and MVs' RFs are updated along with tablets replicas. The `test_tablet_rf_change` testcase has been extended to also verify that MVs' tablets replicas are updated when RF changes. Fixes: #20240 (cherry picked from commit `e0c1a51642`) Closes scylladb/scylladb#21024	2024-10-17 09:36:35 +03:00
Kamil Braun	dddd9837b2	Merge '[Backport 6.0] cql: improve validating RF's change in ALTER tablets KS' from Piotr Smaron This patch series fixes a couple of bugs around validating if RF is not changed by too much when performing ALTER tablets KS. RF cannot change by more than 1 in total, because tablets load balancer cannot handle more work at once. Fixes: https://github.com/scylladb/scylladb/issues/20039 Should be backported to 6.0 & 6.1 (wherever tablets feature is present), as this bug may break the cluster. (cherry picked from commit `042825247f`) (cherry picked from commit `adf453af3f`) (cherry picked from commit `9c5950533f`) (cherry picked from commit `47acdc1f98`) (cherry picked from commit `93d61d7031`) (cherry picked from commit `6676e47371`) (cherry picked from commit `2aabe7f09c`) (cherry picked from commit `ee56bbfe61`) Refs https://github.com/scylladb/scylladb/pull/20208 Closes scylladb/scylladb#21047 * github.com:scylladb/scylladb: cql: sum of abs RFs diffs cannot exceed 1 in ALTER tablets KS cql: join new and old KS options in ALTER tablets KS cql: fix validation of ALTERing RFs in tablets KS cql: harden `alter_keyspace_statement.cc::validate_rf_difference` cql: validate RF change for new DCs in ALTER tablets KS cql: extend test_alter_tablet_keyspace_rf cql: refactor test_tablets::test_alter_tablet_keyspace cql: remove unused helper function from test_tablets	2024-10-15 12:45:40 +02:00
Kefu Chai	38f432b598	install.sh: install seastar/scripts/addr2line.py as well seastar extracted `addr2line` python module out back in e078d7877273e4a6698071dc10902945f175e8bc. but `install.sh` was not updated accordingly. it still installs `seastar-addr2line` without installing its new dependency. this leaves us with a broken `seastar-addr2line` in the relocatable tarball. ```console $ /opt/scylladb/scripts/seastar-addr2line Traceback (most recent call last): File "/opt/scylladb/scripts/libexec/seastar-addr2line", line 26, in <module> from addr2line import BacktraceResolver ModuleNotFoundError: No module named 'addr2line' ``` in this change, we redistribute `addr2line.py` as well. this should address the issue above. Fixes scylladb/scylladb#21077 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `da433aad9d`) Closes scylladb/scylladb#21086	2024-10-14 13:37:25 +03:00
Botond Dénes	45a3a1d460	Merge '[Backport 6.0] storage_proxy: Add conditions checking to avoid UB in speculating read executors.' from ScyllaDB During the investigation of scylladb/scylladb#20282, it was discovered that implementations of speculating read executors have undefined behavior when called with an incorrect number of read replicas. This PR introduces two levels of condition checking: - Condition checking in speculating read executors for the number of replicas. - Checking the consistency of the Effective Replication Map in filter_for_query(): the map is considered incorrect if the list of replicas contains a node from a data center whose replication factor is 0. Please note: This PR does not fix the issue found in scylladb/scylladb#20282; it only adds condition checks to prevent undefined behavior in cases of inconsistent inputs. Refs scylladb/scylladb#20625 As this issue applies to the released versions and can affect clients, we need backports to 6.0, 6.1, 6.2. (cherry picked from commit `132358dc92`) (cherry picked from commit `ae23d42889`) (cherry picked from commit `ad93cf5753`) (cherry picked from commit `8db6d6bd57`) (cherry picked from commit `c373edab2d`) Refs #20851 Closes scylladb/scylladb#21069 * github.com:scylladb/scylladb: Add conditions checking for get_read_executor Avoid an extra call to block_for in db::filter_for_query. Improve code readability in consistency_level.cc and storage_proxy.cc tools: Add build_info header with functions providing build type information tests: Add tests for alter table with RF=1 to RF=0	2024-10-14 13:37:08 +03:00
Michał Chojnowski	a4eab7cab6	reader_concurrency_semaphore: in stats, fix swapped count_resources and memory_resources can_admit_read() returns reason::memory_resources when the permit is queued due to lack of count resources, and it returns reason::count_resources when the permit is queued due to lack of memory resources. It's supposed to be the other way around. This bug is causing the two counts to be swapped in the stat dumps printed to the logs when semaphores time out. (cherry picked from commit `6cf3747c5f`) Closes scylladb/scylladb#21032	2024-10-14 13:35:47 +03:00
Sergey Zolotukhin	d490178a11	Add conditions checking for get_read_executor During the investigation of scylladb/scylladb#20282, it was discovered that implementations of speculating read executors have undefined behavior when called with an incorrect number of read replicas. This PR introduces two levels of condition checking: - Condition checking in speculating read executors for the number of replicas. - Checking the consistency of the Effective Replication Map in get_endpoints_for_reading(): the map is considered incorrect if the number of read replica nodes is higher than the replication factor. The check is applied only when built in non-release mode. Please note: This PR does not fix the issue found in scylladb/scylladb#20282; it only adds condition checks to prevent undefined behavior in cases of inconsistent inputs. Refs scylladb/scylladb#20625 (cherry picked from commit `c373edab2d`)	2024-10-11 18:20:43 +00:00
Sergey Zolotukhin	a8114ab91c	Avoid an extra call to block_for in db::filter_for_query. (cherry picked from commit `8db6d6bd57`)	2024-10-11 18:20:43 +00:00
Sergey Zolotukhin	3b0a161d14	Improve code readability in consistency_level.cc and storage_proxy.cc Add const correctness and rename some variables to improve code readability. (cherry picked from commit `ad93cf5753`)	2024-10-11 18:20:42 +00:00
Sergey Zolotukhin	e35746064a	tools: Add build_info header with functions providing build type information A new header provides `constexpr` functions to retrieve build type information: `get_build_type()`, `is_release_build()`, and `is_debug_build()`. These functions are useful when adding changes that should be enabled at compile time only for specific build types. (cherry picked from commit `ae23d42889`)	2024-10-11 18:20:42 +00:00
Sergey Zolotukhin	c5d7b66a5e	tests: Add tests for alter table with RF=1 to RF=0 Adding Vnodes and Tablets tests for alter keyspace operation that decreases replication factor from 1 to 0 for one of two data centers. Tablet version fails due to issue described in scylladb/scylladb#20625. Test for scylladb/scylladb#20625 (cherry picked from commit `132358dc92`)	2024-10-11 18:20:42 +00:00
Botond Dénes	9e5bdc069e	repair/row_level: remove reader timeout This timeout was added to catch reader related deadlocks. We have not seen such deadlocks for a long time, but we did see false-timeouts caused by this, see explanation below. Since the cost now outweighs the benefit, remove the timeout altogether. The false timeout happens during mixed-shard repair. The `reader_permit::set_timeout()` call is called on the top-level permit which repair has a handle on. In the case of the mixed-shard repair, this belongs to the multishard reader. Calling set_timeout() on the multishard reader has no effect on the actual shard readers, except in one case: when the shard reader is created, it inherits the multishard reader's current timeout. As the shard reader can be alive for a long time, this timeout is not refreshed and ultimately causes a timeout and fails the repair. Refs: #18269 (cherry picked from commit `3ebb124eb2`) Closes scylladb/scylladb#20957	2024-10-11 14:49:49 +03:00
Calle Wilund	edfa692b80	database: Also forced new schema commitlog segment on user initiated memtable flush Refs #20686 Refs #15607 In #15060 we added forced new commitlog segment on user initiated flush, mainly so that tests can verify tombstone gc and other compaction related things, without having to wait for "organic" segment deletion. Schema commitlog was not included, mainly because we did not have tests featuring compaction checks of schema related tables, but also because it was assumed to be lower general throughput. There is however no real reason to not include it, and it will make some testing much quicker and more predictable. (cherry picked from commit `60f8a9f39d`) Closes scylladb/scylladb#20706	2024-10-11 14:46:32 +03:00
Avi Kivity	16eab130fa	Merge '[Backport 6.0] scylla_raid_setup: configure SELinux file context' from ScyllaDB On RHEL9, systemd-coredump fails to coredump on /var/lib/scylla/coredump because the service only has write access with systemd_coredump_var_lib_t. To make it writable, we need to add file context rule for /var/lib/scylla/coredump, and run restorecon on /var/lib/scylla. Fixes #19325 (cherry picked from commit `56c971373c`) (cherry picked from commit `0ac450de05`) Refs #20528 Closes scylladb/scylladb#20872 * github.com:scylladb/scylladb: scylla_raid_setup: configure SELinux file context scylla_coredump_setup: fix SELinux configuration for RHEL9	2024-10-10 19:02:11 +03:00
Gleb Natapov	899c696a3e	storage_proxy: make sure there is no end iterator in _live_iterators array storage_proxy::cancellable_write_handlers_list::update_live_iterators assumes that iterators in _live_iterators can be dereferenced, but the code does not make any attempt to make sure this is the case. The iterator can be the end iterator which cannot be dereferenced. The patch makes sure that there is no end iterator in _live_iterators. Fixes scylladb/scylladb#20874 (cherry picked from commit `da084d6441`) Closes scylladb/scylladb#21005	2024-10-10 18:57:41 +03:00
Piotr Smaron	2557991f92	cql: sum of abs RFs diffs cannot exceed 1 in ALTER tablets KS Tablets load balancer is unable to process more than a single pending replica, thus ALTER tablets KS cannot accept an ALTER statement which would result in creating 2+ pending replicas, hence it has to validate that the sum of absolute differences of RFs specified in the statement is not greater than 1. (cherry picked from commit `ee56bbfe61`)	2024-10-10 12:38:00 +02:00
Piotr Smaron	f05b9adba6	cql: join new and old KS options in ALTER tablets KS A bug has been discovered while trying to ALTER tablets KS and specifying only 1 out of 2 DCs - the not specified DC's RF has been zeroed. This is because ALTER tablets KS updated the KS only with the RF-per-DC mapping specified in the ALTER tablets KS statement, so if a DC was omitted, it was assigned a value of RF=0. This commit fixes that plus additionally passes all the KS options, not only the replication options, to the topology coordinator, where the KS update is performed. `initial_tablets` is a special case, which requires special handling in the source code, as we cannot simply update the old `initial_tablets` settings with the new ones, because if only ` and TABLETS = {'enabled': true}` is specified in the ALTER tablets KS statement, we should not zero the `initial_tablets`, but rather keep the old value - this is tested by the `test_alter_preserves_tablets_if_initial_tablets_skipped` testcase. Other than that, the above mentioned testcase started to fail with these changes, and it appeared to be an issue with the test not waiting until ALTER is completed, and thus reading the old value, hence the test's body has been modified to wait for ALTER to complete before performing validation. (cherry picked from commit `2aabe7f09c`)	2024-10-10 12:37:54 +02:00
Kefu Chai	e0f1076f63	auth: capture boost::regex_error not std::regex_error in `a3db5401`, we introduced the TLS certificate authenticator, which is configured using the `auth_certificate_role_queries` option. the value of this option contains a regular expression. so there are chances the regular expression is malformed. in that case, when converting its value representing the regular expression to an instance of `boost::regex`, Boost.Regex throws a `boost::regex_error` exception, not `std::regex_error`. since we decided to use Boost.Regex, let's catch `boost::regex_error`. Refs `a3db5401` Fixes scylladb/scylladb#20941 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `439c52c7c5`) Closes scylladb/scylladb#20954	2024-10-09 21:56:09 +03:00
Piotr Smaron	7aceb8a763	cql: fix validation of ALTERing RFs in tablets KS The validation has been corrected with: 1. Checking if a DC specified in ALTER exists. 2. Removing the `REPLICATION_STRATEGY_CLASS_KEY` key from the map of options whose RFs need to be validated. (cherry picked from commit `6676e47371`)	2024-10-08 18:08:23 +00:00
Piotr Smaron	0913a15dc1	cql: harden `alter_keyspace_statement.cc::validate_rf_difference` This function assumed that strings passed as arguments will be of integer types, but that wasn't the case, and we missed that because this function didn't have any validation, so this change adds proper validation and error logging. Arguments passed to this function were forwarded from a call to `ks_prop_defs::get_replication_options`, which, along with the rf-per-dc mapping, also returns a `class:replication_strategy` pair. The second member of that pair was cast to an `int` type and somehow the code was still running fine, but only extra testing added later discovered the bug here. (cherry picked from commit `93d61d7031`)	2024-10-08 18:08:22 +00:00
Piotr Smaron	e03cc8aa6c	cql: validate RF change for new DCs in ALTER tablets KS ALTER tablets KS validated that RF is not changed by more than 1 for DCs that already had replicas, but not for DCs that didn't have them yet, so specifying an RF jump from 0 to 2 was possible when listing a new DC in ALTER tablets KS statement, which violated internal invariants of tablets load balancer. This PR fixes that bug and adds multi-dc testcases to check that adding replicas to a new DC and removing replicas from a DC honor the RF change constraints. Refs: #20039 (cherry picked from commit `47acdc1f98`)	2024-10-08 18:08:22 +00:00
Piotr Smaron	4172d34c5c	cql: extend test_alter_tablet_keyspace_rf Added cases to also test decreasing RF and setting the same RF. Also added extra explanatory comments. (cherry picked from commit `9c5950533f`)	2024-10-08 18:08:22 +00:00
Piotr Smaron	b3fcd9fc5b	cql: refactor test_tablets::test_alter_tablet_keyspace 1. Renamed the testcase to emphasize that it only focuses on testing changing RF - there are other tests that test ALTER tablets KS in general. 2. Fixed whitespaces according to PEP8 (cherry picked from commit `adf453af3f`)	2024-10-08 18:08:21 +00:00
Piotr Smaron	9810fb3efd	cql: remove unused helper function from test_tablets `change_default_rf` is not used anywhere, moreover it uses `replication_factor` tag, which is forbidden in ALTER tablets KS statement. (cherry picked from commit `042825247f`)	2024-10-08 18:08:21 +00:00
Raphael S. Carvalho	c4cdfb1d78	service: Improve error handling for split Retry wasn't really happening since the loop was broken and sleep part was skipped on error. Also, we were treating abort of split during shutdown as if it were an actual error and that confused longevity tests that parse for logs with error level. The fix is about demoting the level of logs when we know the exception comes from shutdown. Fixes #20890. (cherry picked from commit `bcd358595f`)	2024-10-04 11:17:41 +00:00
Pavel Emelyanov	1ff582f808	cql: Check that CREATEing tablets/vnodes is consistent with the CLI There are two bits that control whether replication strategy for a keyspace will use tablets or not -- the configuration option and CQL parameter. This patch tunes its parsing to implement the logic shown below: if (strategy.supports_tablets) { if (cql.with_tablets) { if (cfg.enable_tablets) { return create_keyspace_with_tablets(); } else { throw "tablets are not enabled"; } } else if (cql.with_tablets == off) { return create_keyspace_without_tablets(); } else { // cql.with_tablets is not specified if (cfg.enable_tablets) { return create_keyspace_with_tablets(); } else { return create_keyspace_without_tablets(); } } } else { // strategy doesn't support tablets if (cql.with_tablets == on) { throw "invalid cql parameter"; } else if (cql.with_tablets == off) { return create_keyspace_without_tablets(); } else { // cql.with_tablets is not specified return create_keyspace_without_tablets(); } } closes: #20088 In order to enable tablets "by default" for NetworkTopologyStrategy there's explicit check near ks_prop_defs::get_initial_tablets(), that's not very nice. It needs more care to fix it, e.g. provide feature service reference to abstract_replication_strategy constructor. But since ks_prop_defs code already hijacks options specifically for that strategy type (see prepare_options() helper), it's OK for now. There's also #20768 misbehavior that's preserved in this patch, but should be fixed eventually as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20929	2024-10-03 17:08:26 +03:00
Michael Litvak	e392531ca9	mv: skip building view updates on a pending replica Currently, a pending replica that applies a write on a table that has materialized views, will build all the view updates as a normal replica, only to realize at a late point, in db::view::get_view_natural_endpoint(), that it doesn't have a paired view replica to send the updates to. It will then either drop the view updates, or send them to a pending view replica, if such exists. This work is unnecessary since it may be dropped, and even if there is a pending view replica to send the updates to, the updates that are built by the pending replica may be wrong since it may have incomplete information. This commit fixes the inefficiency by skipping the view update building step when applying an update on a pending replica. The metric total_view_updates_on_wrong_node is added to count the cases that a view update is determined to be unnecessary. The test reproduces the scenario of writing to a table and applying the update on a pending replica, and verifies that the pending replica doesn't try to build view updates. Fixes scylladb/scylladb#19152 Closes scylladb/scylladb#19488 Fixes scylladb/scylladb#20787 (cherry picked from commit `08b29460fc`) Closes scylladb/scylladb#20934	2024-10-03 11:17:13 +02:00
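
The decision logic quoted in commit 1ff582f808 (CREATE keyspace with tablets or vnodes) can be restated as a small runnable sketch. This is only an illustration of the pseudocode in that commit message, not ScyllaDB's actual implementation: the function name `choose_keyspace_kind`, the `KeyspaceKind` enum, and the use of `None` for "cql.with_tablets is not specified" are assumptions made for this example.

```python
from enum import Enum
from typing import Optional


class KeyspaceKind(Enum):
    """Hypothetical result type: which flavour of keyspace gets created."""
    TABLETS = "tablets"
    VNODES = "vnodes"


def choose_keyspace_kind(strategy_supports_tablets: bool,
                         cql_with_tablets: Optional[bool],
                         cfg_enable_tablets: bool) -> KeyspaceKind:
    """Mirror the pseudocode from the commit message.

    cql_with_tablets is True for "on", False for "off", and None when the
    CQL statement does not specify the parameter (an assumption of this sketch).
    """
    if strategy_supports_tablets:
        if cql_with_tablets is True:
            if cfg_enable_tablets:
                return KeyspaceKind.TABLETS
            raise ValueError("tablets are not enabled")
        if cql_with_tablets is False:
            return KeyspaceKind.VNODES
        # Not specified in CQL: fall back to the configuration option.
        return KeyspaceKind.TABLETS if cfg_enable_tablets else KeyspaceKind.VNODES

    # The strategy does not support tablets.
    if cql_with_tablets is True:
        raise ValueError("invalid cql parameter")
    # Both "off" and "not specified" create a keyspace without tablets.
    return KeyspaceKind.VNODES
```

For example, `choose_keyspace_kind(True, None, True)` selects tablets because the configuration option decides when CQL is silent, while `choose_keyspace_kind(True, True, False)` raises because tablets were requested explicitly but are disabled in the configuration.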

43224 Commits