scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-24 10:30:38 +00:00

Author	SHA1	Message	Date
Benny Halevy	415bdf3160	database: get_keyspace_local_ranges: get vnode_effective_replication_map_ptr param Prepare for making the function async. Then, it will need to hold on to the erm while getting the token_ranges asynchronously. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `2bbbe2a8bc`)	2024-08-26 21:50:39 +00:00
Benny Halevy	6b2d0f5934	compaction: task_manager_module: open code maybe_get_keyspace_local_ranges It is used only here and can be simplified by checking if the keyspace replication strategy is per table by the caller. Prepare for making get_keyspace_local_ranges async. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `ea5a0cca10`)	2024-08-26 21:50:39 +00:00
Benny Halevy	0f990a8dc5	alternator: ttl: token_ranges_owned_by_this_shard: let caller make the ranges_holder Add static `make` methods to ranges_holder_{primary,secondary} and use them to make the ranges objects and pass them to `token_ranges_owned_by_this_shard`, rather than letting token_ranges_owned_by_this_shard invoke the right constructor of the ranges_holder class. Prepare for making `make` async. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `824bdf99d2`)	2024-08-26 21:50:39 +00:00
Benny Halevy	5f8b199253	alternator: ttl: can pass const gms::gossiper& to ranges_holder There's no need to pass a mutable reference to the gossiper. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `b2abbae24b`)	2024-08-26 21:50:38 +00:00
Benny Halevy	2288f98d83	alternator: ttl: ranges_holder_primary: unconstify _token_ranges member To allow the class to be nothrow_move_constructable. Prepare for returning it as a future value. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `333c0d7c88`)	2024-08-26 21:50:38 +00:00
Benny Halevy	3ed214a728	alternator: ttl: refactor token_ranges_owned_by_this_shard Rather than holding a variant member (and defining both ranges_holder_{primary,secondary} in both specializations of the class), just make the internal ranges_holder classes first-class citizens and parameterize the `token_ranges_owned_by_this_shard` template by this class type. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `d385219a12`)	2024-08-26 21:50:38 +00:00
Michał Jadwiszczak	b7e6f22999	cql3/statements/create_service_level: forbid creating SL starting with `$` Tenant names starting with `$` are reserved for internal ones. Forbid creating new service level which name starts with `$` and log a warning for existing service levels with `$` prefix. (cherry picked from commit `d729d1b272`) Closes scylladb/scylladb#20156	2024-08-26 13:03:16 +03:00
Anna Stuchlik	1683b07d2e	doc: extract the info about tablets default to a separate file This commit extracts the information about the default for tables in keyspace creation to a separate file in the _common folder. The file is then included using the scylladb_include_flag directive. The purpose of this commit is to make it possible to include a different file in the scylla-enterprise repo - with a different default. Refs https://github.com/scylladb/scylla-enterprise/issues/4585 (cherry picked from commit `107708434c`) Closes scylladb/scylladb#20220	2024-08-21 11:07:19 +03:00
David Garcia	853d2ec76f	docs: improve include flag directive The include flag directive now treats missing content as info logs instead of warnings. This prevents build failures when the enterprise-specific content isn't yet available. If the enterprise content is undefined, the directive automatically loads the open-source content. This ensures the end user has access to some content. address comments (cherry picked from commit `30887d096f`) Closes scylladb/scylladb#20226	2024-08-21 10:20:21 +03:00
Botond Dénes	0b1dbb3a64	Update tools/java submodule * tools/java 33938ec1...27999135 (1): > cassandra-stress: Make default repl. strategy NetworkTopologyStrategy Fixes: scylladb/scylla-tools-java#400 Closes scylladb/scylladb#20199	2024-08-21 10:02:59 +03:00
Anna Stuchlik	4b88ec4722	doc: fix a link on the RBAC page This commit fixes an external link on the Role Based Access Control page. Fixes https://github.com/scylladb/scylladb/issues/20166 (cherry picked from commit `c56c3ce469`) Closes scylladb/scylladb#20202	2024-08-19 15:29:54 +03:00
Dawid Medrek	8d90b81766	db/hints: Make commitlog use commitlog IO scheduling group Before these changes, we didn't specify which I/O scheduling group commitlog instances in hinted handoff should use. In this commit, we set it explicitly to the commitlog scheduling group. The rationale for this choice is the fact we don't want to cause a bottleneck on the write path -- if hints are written too slowly, new incoming mutations (NOT hints) might be rejected due to a too high number of hints currently being written to disk; see `storage_proxy::create_write_response_handler_helper()` for more context. (cherry picked from commit `6a7fb18b52`) Closes scylladb/scylladb#20093 scylla-6.1.1-candidate-20240815125859 scylla-6.1.1	2024-08-14 22:14:43 +03:00
Raphael S. Carvalho	bc0097688f	replica: Fix race between split compaction and migration After removal of rwlock (`53a6ec05ed`), the race was introduced because the order that compaction groups of a tablet are closed, is no longer deterministic. Some background first: Split compaction runs in main (unsplit) group, and adds sstable to left and right groups on completion. The race works as follow: 1) split compaction starts on main group of tablet X 2) tablet X reaches cleanup stage, so its compaction groups are closed in parallel 3) left or right group are closed before main (more likely when only main has flush work to do) 4) split compaction completes, and adds sstable to left and right 5) if e.g left is closed, adjusting backlog tracker will trigger an exception, and since that happens in row cache update's execute(), node crashes. The problem manifested as follow: [shard 0: gms] raft_topology - Initiating tablet cleanup of 5739b9b0-49d4-11ef-828f-770894013415:15 on 102a904a-0b15-4661-ba3f-f9085a5ad03c:0 ... [shard 0:strm] compaction - [Split keyspace1.standard1 009e2f80-49e5-11ef-85e3-7161200fb137] Splitting [/var/lib/scylla/data/keyspace1/...] ... [shard 0:strm] cache - Fatal error during cache update: std::out_of_range (Compaction state for table [0x600007772740] not found), at: ... -------- seastar::continuation<seastar::internal::promise_base_with_type<void>, row_cache::do_update(... -------- seastar::internal::do_with_state<std::tuple<row_cache::external_updater, std::function<seastar::future<void> ()> >, seastar::future<void> > -------- seastar::internal::coroutine_traits_base<void>::promise_type -------- seastar::internal::coroutine_traits_base<void>::promise_type -------- seastar::(anonymous namespace)::thread_wake_task -------- seastar::continuation<seastar::internal::promise_base_with_type<sstables::compaction_result>, seastar::async<sstables::compaction::run(... 
seastar::continuation<seastar::internal::promise_base_with_type<sstables::compaction_result>, seastar::future<sstables::compaction_resu... From the log above, it can be seen cache update failure happens under streaming sched group and during compaction completion, which was good evidence to the cause. Problem was reproduced locally with the help of tablet shuffling. Fixes: #19873. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `5af1f41ecd`) Closes scylladb/scylladb#20107	2024-08-14 22:13:53 +03:00
Aleksandra Martyniuk	69c1a0e2ca	repair: use find_column_family in insert_repair_meta repair_service::insert_repair_meta gets the reference to a table and passes it to continuations. If the table is dropped in the meantime, the reference becomes invalid. Use find_column_family at each table occurrence in insert_repair_meta instead. Fixes: #20057 (cherry picked from commit `719999b34c`) Refs #19953 Closes scylladb/scylladb#20076	2024-08-14 20:54:12 +03:00
Avi Kivity	c382e19e5e	Merge '[Backport 6.1] Prevent ALTERing non-existing KS with tablets' from ScyllaDB ALTER tablets KS executes in 2 steps: 1. ALTER KS's cql handler forms a global topo req, and saves data required to execute this req, 2. global topo req is executed by topo coordinator, which reads data attached to the req. The KS name is among the data attached to the req. There's a time window between these steps where a to-be-altered KS could have been DROPped, which results in topo coordinator forever trying to ALTER a non-existing KS. In order to avoid it, the code has been changed to first check if a to-be-altered KS exists, and if it's not the case, it doesn't perform any schema/tablets mutations, but just removes the global topo req from the coordinator's queue. BTW. just adding this extra check resulted in broader than expected changes, which is due to the fact that the code is written badly and needs to be refactored - an effort that's already planned under #19126 (I suggest to disable displaying whitespace differences when reviewing this PR). Fixes: #19576 Requires 6.0 backport (cherry picked from commit `5b089d8e10`) (cherry picked from commit `0ea2128140`) (cherry picked from commit `ddb5204929`) Refs #19666 Closes scylladb/scylladb#20143 * github.com:scylladb/scylladb: tests: ensure ALTER tablets KS doesn't crash if KS doesn't exist cql: refactor rf_change indentation Prevent ALTERing non-existing KS with tablets	2024-08-14 20:16:55 +03:00
Michał Chojnowski	b786e6a39a	cql_test_env: ensure shutdown() before stop() for system_keyspace If system_keyspace::stop() is called before system_keyspace::shutdown(), it will never finish, because the uncleared shared pointers will keep it alive indefinitely. Currently this can happen if an exception is thrown before the construction of the shutdown() defer. This patch moves the shutdown() call to immediately before stop(). I see no reason why it should be elsewhere. Fixes scylladb/scylla-enterprise#4380 (cherry picked from commit `eeaf4c3443`) Closes scylladb/scylladb#20145	2024-08-14 20:16:29 +03:00
Piotr Smaron	706761d8ec	tests: ensure ALTER tablets KS doesn't crash if KS doesn't exist Using the error injection framework, we inject a sleep into the processing path of ALTER tablets KS, so that the topology coordinator of the leader node sleeps after the rf_change event has been scheduled, but before it starts executing. During that time the second node executes a DROP KS statement, which is propagated to the leader node. Once the leader node wakes up and resumes processing of ALTER tablets KS, the KS no longer exists, and the node must not crash, as it did before. (cherry picked from commit `ddb5204929`)	2024-08-14 10:37:25 +00:00
Piotr Smaron	41e4c39087	cql: refactor rf_change indentation (cherry picked from commit `0ea2128140`)	2024-08-14 10:37:24 +00:00
Piotr Smaron	d5bdef9ee5	Prevent ALTERing non-existing KS with tablets ALTER tablets KS executes in 2 steps: 1. ALTER KS's cql handler forms a global topo req, and saves data required to execute this req, 2. global topo req is executed by topo coordinator, which reads data attached to the req. The KS name is among the data attached to the req. There's a time window between these steps where a to-be-altered KS could have been DROPped, which results in topo coordinator forever trying to ALTER a non-existing KS. In order to avoid it, the code has been changed to first check if a to-be-altered KS exists, and if it's not the case, it doesn't perform any schema/tablets mutations, but just removes the global topo req from the coordinator's queue. BTW. just adding this extra check resulted in broader than expected changes, which is due to the fact that the code is written badly and needs to be refactored - an effort that's already planned under #19126 Fixes: #19576 (cherry picked from commit `5b089d8e10`)	2024-08-14 10:37:24 +00:00
Jenkins Promoter	a4dcf3956e	Update ScyllaDB version to: 6.1.1	2024-08-14 12:28:43 +03:00
Anna Stuchlik	858fa914b1	doc: update Raft info in 6.1 This commit updates the Raft information regarding the Raft verification procedure. In 6.1, the procedure is no longer related to the upgrade. Fixes https://github.com/scylladb/scylladb/issues/19932 (cherry picked from commit `705e53d223`) Closes scylladb/scylladb#20083	2024-08-11 11:37:05 +03:00
Kamil Braun	ec923171a6	storage_service: raft topology: warn when `raft_topology_cmd_handler` fails due to abort Currently we print an ERROR on all exceptions in `raft_topology_cmd_handler`. This log level is too high, in some cases exceptions are expected -- like during shutdown. And it causes dtest failures. Turn exceptions from aborts into WARN level. Also improve logging by printing the command that failed. Fixes scylladb/scylladb#19754 (cherry picked from commit `7506709573`) Closes scylladb/scylladb#20071	2024-08-08 18:13:53 +02:00
Tomasz Grabiec	0144549cd6	tablets: Do not allocate tablets on nodes being decommissioned If a tablet-based table is created concurrently with a node being decommissioned, after tablets are already drained, the new table may be permanently left with replicas on the node which is no longer in the topology. That creates an immediate availability risk because we are running with one replica down. This also violates invariants about replica placement and this state cannot be fixed by topology operations. One effect is that this will lead to load balancer failure which will inhibit progress of any topology operations: load_balancer - Replica 154b0380-1dd2-11b2-9fdd-7156aa720e1a:0 of tablet 7e03dd40-537b-11ef-9fdd-7156aa720e1a:1 not found in topology, at: ... Fixes #20032 (cherry picked from commit `f5c74a5df2`) Closes scylladb/scylladb#20066	2024-08-08 11:56:13 +03:00
Kamil Braun	0f246bfbc9	raft topology: improve logging Add more logging for raft-based topology operations in INFO and DEBUG levels. Improve the existing logging, adding more details. Fix a FIXME in test_coordinator_queue_management (by readding a log message that was removed in the past -- probably by accident -- and properly awaiting for it to appear in test). Enable group0_state_machine logging at TRACE level in tests. These logs are relatively rare (group 0 commands are used for metadata operations) and relatively small, mostly consist of printing `system.group0_history` mutation in the applied command, for example: ``` TRACE 2024-08-02 18:47:12,238 [shard 0: gms] group0_raft_sm - apply() is called with 1 commands TRACE 2024-08-02 18:47:12,238 [shard 0: gms] group0_raft_sm - cmd: prev_state_id: optional(dd9d47c6-50ee-11ef-d77f-500b8e1edde3), new_state_id: dd9ea5c6-50ee-11ef-ae64-dfbcd08d72c3, creator_addr: 127.219.233.1, creator_id: 02679305-b9d1-41ef-866d-d69be156c981 TRACE 2024-08-02 18:47:12,238 [shard 0: gms] group0_raft_sm - cmd.history_append: {canonical_mutation: table_id 027e42f5-683a-3ed7-b404-a0100762063c schema_version c9c345e1-428f-36e0-b7d5-9af5f985021e partition_key pk{0007686973746f7279} partition_tombstone {tombstone: none}, row tombstone {range_tombstone: start={position: clustered, ckp{0010b4ba65c64b6e11ef8080808080808080}, 1}, end={position: clustered, ckp{}, 1}, {tombstone: timestamp=1722617232237511, deletion_time=1722617232}}{row {position: clustered, ckp{0010dd9ea5c650ee11efae64dfbcd08d72c3}, 0} tombstone {row_tombstone: none} marker {row_marker: 1722617232237511 0 0}, column description atomic_cell{ create system_distributed keyspace; create system_distributed_everywhere keyspace; create and update system_distributed(_everywhere) tables,ts=1722617232237511,expiry=-1,ttl=0}}} ``` note that the mutation contains a human-readable description of the command -- like "create system_distributed keyspace" above. 
These logs might help debugging various issues (e.g. when `apply` hangs waiting for read_apply mutex, or takes too long to apply a command). Ref: scylladb/scylladb#19105 Ref: scylladb/scylladb#19945 (cherry picked from commit `e8d5974961`) Closes scylladb/scylladb#20048	2024-08-07 13:39:30 +02:00
Anna Stuchlik	1a1583a5b6	doc: add post-installation configuration to the Web Installer page This commit extracts the information about the configuration the user should do right after installation (especially running scylla_setup) to a separate file. The file is included in the relevant pages, i.e., installing with packages and installing with Web Installer. In addition, the examples on the Web Installer page are updated with supported versions of ScyllaDB. Fixes https://github.com/scylladb/scylladb/issues/19908 (cherry picked from commit `849856b964`) Closes scylladb/scylladb#20050	2024-08-07 10:14:13 +03:00
Botond Dénes	f78b88b59b	Merge '[Backport 6.1] db/view: drop view updates to replaced node marked as left' from ScyllaDB When a node that is permanently down is replaced, it is marked as "left" but it still can be a replica of some tablets. We also don't keep IPs of nodes that have left and the `node` structure for such node returns an empty IP (all zeros) as the address. This interacts badly with the view update logic. The base replica paired with the left node might decide to generate a view update. Because storage proxy still uses IPs and not host IDs, it needs to obtain the view replica's IP and tell the storage proxy to write a view update to that node - so, it chooses 0.0.0.0. Apparently, storage proxy decides to write a hint towards this address - hinted handoff on the other hand operates on host IDs and not IPs, so it attempts to translate the IP back, which triggers an assertion as there is no replica with IP 0.0.0.0. As a quick workaround for this issue just drop view updates towards nodes which seem to have IPs that are all zeros. It would be more proper to keep the view updates as hints and replay them later to the new paired replica, but achieving this right now would require much more significant changes. For now, fixing a crash is more important than keeping views consistent with base replicas. In addition to the fix, this PR also includes a regression test heavily based on the test that @kbr-scylla prepared during his investigation of the issue. Fixes: scylladb/scylladb#19439 This issue can cause multiple nodes to crash at once and the fix is quite small, so I think this justifies backporting it to all affected versions. 6.0 and 6.1 are affected. No need to backport to 5.4 as this issue only happens with tablets, and tablets are experimental there. 
(cherry picked from commit `6af7882c59`) (cherry picked from commit `5ec8c06561`) Refs #19765 Closes scylladb/scylladb#19895 * github.com:scylladb/scylladb: test: regression test for MV crash with tablets during decommission db/view: drop view updates to replaced node marked as left	2024-08-07 09:18:26 +03:00
Tzach Livyatan	73d46ec548	Improve tombstone_compaction_interval description (cherry picked from commit `861a1cedea`) Closes scylladb/scylladb#20025	2024-08-07 09:06:56 +03:00
Tzach Livyatan	dcee7839d4	Update tracing.rst - fix table node_slow_log_time name (cherry picked from commit `858fd4d183`) Closes scylladb/scylladb#20023	2024-08-07 09:05:50 +03:00
Anna Stuchlik	75477f5661	doc: add OS support for version 6.1 This commit adds OS support for version 6.1 and removes OS support for 5.4 (according to our support policy for versions). (cherry picked from commit `eca2dfd8c3`) Closes scylladb/scylladb#20019	2024-08-07 09:04:13 +03:00
Nadav Har'El	78d7c953b0	test: increase timeouts for /localnodes test In commit `bac7c33313` we introduced a new test for the Alternator "/localnodes" request, checking that a node that is still joining does not get returned. The tests used what I thought were "very high" timeouts - we had a timeout of 10 seconds for starting a single node, and injected a 20 second sleep to leave us 10 seconds after the first sleep. But the test failed in one extremely slow run (a debug build on aarch64), where starting just a single node took more than 15 seconds! So in this patch I increase the timeouts significantly: We increase the wait for the node to 60 seconds, and the sleeping injection to 120 seconds. These should definitely be enough for anyone (famous last words...). The test doesn't actually wait for these timeouts, so the ridiculously high timeouts shouldn't affect the normal runtime of this test. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `ca8b91f641`) Closes scylladb/scylladb#19940	2024-08-07 08:55:23 +03:00
Nadav Har'El	753fc87efa	alternator: exclude CDC log table from ListTables The Alternator command ListTables is supposed to list actual tables created with CreateTable, and should not list things like materialized views (created for GSI or LSI) or CDC log tables. We already properly excluded materialized views from the list - and had the tests to prove it - but forgot both the exclusion and the testing for CDC log tables - so creating a table xyz with streams enabled would cause ListTables to also list "xyz_scylla_cdc_log". This patch fixes both oversights: It adds the code to exclude CDC logs from the output of ListTables, and adds a test which reproduces the bug before this fix, and verifies the fix works. Fixes #19911. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `d293a5787f`) Closes scylladb/scylladb#19938	2024-08-07 08:54:08 +03:00
Benny Halevy	c75dbc1f9c	sstable_directory: delete_atomically: allow sstables from multiple prefixes Currently, delete_atomically can be called with a list of sstables from mixed prefixes in two cases: 1. truncate: where we delete all the sstables in the table directory 2. tablet cleanup: similar to truncate but restricted to sstables in a single tablet replica In both cases, it is possible that sstables in staging (or quarantine) are mixed with sstables in the base directory. Until a more comprehensive fix is in place, (see https://github.com/scylladb/scylladb/pull/19555) this change just lifts the ban on atomic deletion of sstables from different prefixes, and acknowledging that the implementation is not atomic across prefixes. This is better than crashing for now, and can be backported more easily to branches that support tablets so tablet migration can be done safely in the presence of repair of tables with views. Refs scylladb/scylladb#18862 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `26abad23d9`) Closes scylladb/scylladb#19919	2024-08-06 16:27:57 +03:00
Lakshmi Narayanan Sreethar	96e5ebe28c	boost/bloom_filter_test: wait for total memory reclaimed update The testcase `test_bloom_filter_reclaim_during_reload` checks the SSTable manager's `_total_memory_reclaimed` against an expected value to verify that a Bloom filter was reloaded. However, it does not wait for the manager to update the variable, causing the check to fail if the update has not occurred yet. Fix it by making the testcase wait until the variable is updated to the expected value. Fixes #19879 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `27b305b9d1`) Closes scylladb/scylladb#19897	2024-08-06 16:26:36 +03:00
Takuya ASADA	c45e92142e	scylla_raid_setup: install update-initramfs when it's not available scylla_raid_setup may fail on the Ubuntu minimal image since it calls update-initramfs without installing it. (cherry picked from commit `02b20089cb`) Closes scylladb/scylladb#19869	2024-08-06 16:24:27 +03:00
Aleksandra Martyniuk	d69f0e529a	test: tasks: adjust tests to new wait_task behavior After `c1b2b8cb2c` /task_manager/wait_task/ does not unregister tasks anymore. Delete the check if the task was unregistered from test_task_manager_wait. Check task status in drain_module_tasks to ensure that the task is removed from task manager. Fixes: #19351. (cherry picked from commit `dfe3af40ed`) Closes scylladb/scylladb#19839	2024-08-06 16:23:02 +03:00
Łukasz Paszkowski	86ff3c2aa3	api/system: add highest_supported_sstable_format path The current upgrade dtests rely on a ccm node function, get_highest_supported_sstable_version(), that looks for r'Feature (.*)_SSTABLE_FORMAT is enabled' in the log files. Starting from scylla-6.0, ME_SSTABLE_FORMAT is enabled by default and there is no cluster feature for it. Thus get_highest_supported_sstable_version() returns an empty list, causing the upgrade tests to fail. This change introduces a separate API path that returns the highest sstable format (one of la, mc, md, me) supported by a scylla node. Fixes scylladb/scylladb#19772 Backports to 6.0 and 6.1 required. The current upgrade test in dtest checks scylla upgrades up to version 5.4 only. This patch is a prerequisite to backport the upgrade tests fix in dtest. (cherry picked from commit `781eb7517c`) Closes scylladb/scylladb#19814	2024-08-06 16:21:48 +03:00
Avi Kivity	efac73109e	Merge '[Backport 6.1] doc: add the 6.0-to-6.1 upgrade guide' from ScyllaDB This PR adds the 6.0-to-6.1 upgrade guide (including metrics) and removes the 5.4-to-6.0 upgrade guide. Compared to 5.4-to-6.0, the 6.0-to-6.1 guide: - Added the "Ensure Consistent Topology Changes Are Enabled" prerequisite. - Removed the "After Upgrading Every Node" section. Both Raft-based schema changes and topology updates are mandatory in 6.1 and don't require any user action after upgrading to 6.1. - Removed the "Validate Raft Setup" section. Raft was enabled in all 6.0 clusters (for schema management), so now there's no scenario that would require the user to follow the validation procedure. - Removed the references to the Enable Consistent Topology Updates page (which was in version 6.0 and is removed with this PR) across the docs. See the individual commits for more details. Fixes https://github.com/scylladb/scylladb/issues/19853 Fixes https://github.com/scylladb/scylladb/issues/19933 This PR must be backported to branch-6.1 as it is critical in version 6.1. (cherry picked from commit `9972e50134`) (cherry picked from commit `32fa5aa938`) Refs #19983 Closes scylladb/scylladb#20038 * github.com:scylladb/scylladb: doc: remove the 5.4-to-6.0 upgrade guide doc: add the 6.0-to-6.1 upgrade guide	2024-08-06 13:28:24 +03:00
Anna Stuchlik	8c975712d3	doc: remove the 5.4-to-6.0 upgrade guide This commit removes the 5.4-to-6.0 upgrade guide and all references to it. It mainly removes references to the Enable Consistent Topology Updates page, which was added as enabling the feature was optional. In rare cases, when a reference to that page is necessary, the internal link is replaced with an external link to version 6.0. Especially the Handling Cluster Membership Change Failures page was modified for troubleshooting purposes rather than removed. (cherry picked from commit `32fa5aa938`)	2024-08-06 10:20:09 +00:00
Anna Stuchlik	1fdfe11bb0	doc: add the 6.0-to-6.1 upgrade guide This commit adds the 6.0-to-6.1 upgrade guide. Compared to the previous upgrade guide: - Added the "Ensure Consistent Topology Changes Are Enabled" prerequisite. - Removed the "After Upgrading Every Node" section. Both Raft-based schema changes and topology updates are mandatory in 6.1 and don't require any user action after upgrading to 6.1. - Removed the "Validate Raft Setup" section. Raft was enabled in all 6.0 clusters (for schema management), so now there's no scenario that would require the user to follow the validation procedure. (cherry picked from commit `9972e50134`)	2024-08-06 10:20:09 +00:00
Botond Dénes	58c06819d7	Update ./tools/python3 submodule * ./tools/python3 18fa79ee...ea49f0ca (1): > install.sh: fix incorrect permission on strict umask Fixes: https://github.com/scylladb/scylladb/issues/19775 Closes scylladb/scylladb#20022	2024-08-06 10:02:07 +03:00
Michael Litvak	5b604509ce	db: fix waiting for counter update operations on table stop When a table is dropped it should wait for all pending operations in the table before the table is destroyed, because the operations may use the table's resources. With counter update operations, currently this is not the case. The table may be destroyed while there is a counter update operation in progress, causing an assert to be triggered due to a resource being destroyed while it's in use. The reason the operation is not waited for is a mistake in the lifetime management of the object representing the write in progress. The commit fixes it so the object lives for the duration of the entire counter update operation, by moving it to the `do_with` list. Fixes scylladb/scylla-enterprise#4475 Closes scylladb/scylladb#20018	2024-08-05 12:54:19 +02:00
Jenkins Promoter	abbf0b24a6	Update ScyllaDB version to: 6.1.0 scylla-6.1.0-candidate-20240804073311 scylla-6.1.0	2024-08-04 14:31:47 +03:00
Kamil Braun	347857e5e5	Merge '[Backport 6.1] raft: fix the shutdown phase being stuck' from ScyllaDB Some of the calls inside the `raft_group0_client::start_operation()` method were missing the abort source parameter. This caused the repair test to be stuck in the shutdown phase - the abort source has been triggered, but the operations were not checking it. This was in particular the case of operations that try to take the ownership of the raft group semaphore (`get_units(semaphore)`) - these waits should be cancelled when the abort source is triggered. This should fix the following tests that were failing in some percentage of dtest runs (about 1-3 of 100): * TestRepairAdditional::test_repair_kill_1 * TestRepairAdditional::test_repair_kill_3 Fixes scylladb/scylladb#19223 (cherry picked from commit `2dbe9ef2f2`) (cherry picked from commit `5dfc50d354`) Refs #19860 Closes scylladb/scylladb#19970 * github.com:scylladb/scylladb: raft: fix the shutdown phase being stuck raft: use the abort source reference in raft group0 client interface	2024-08-02 11:24:34 +02:00
Emil Maskovsky	cd2ca5ef57	raft: fix the shutdown phase being stuck Some of the calls inside the `raft_group0_client::start_operation()` method were missing the abort source parameter. This caused the repair test to be stuck in the shutdown phase - the abort source has been triggered, but the operations were not checking it. This was in particular the case of operations that try to take the ownership of the raft group semaphore (`get_units(semaphore)`) - these waits should be cancelled when the abort source is triggered. This should fix the following tests that were failing in some percentage of dtest runs (about 1-3 of 100): * TestRepairAdditional::test_repair_kill_1 * TestRepairAdditional::test_repair_kill_3 Fixes scylladb/scylladb#19223 (cherry picked from commit `5dfc50d354`)	2024-07-31 20:52:23 +00:00
Emil Maskovsky	5a4065ecd5	raft: use the abort source reference in raft group0 client interface Most callers of the raft group0 client interface are passing a real source instance, so we can use the abort source reference in the client interface. This change makes the code simpler and more consistent. (cherry picked from commit `2dbe9ef2f2`)	2024-07-31 20:52:23 +00:00
Kamil Braun	ed4f2ecca4	docs: extend "forbidden operations" section for Raft-topology upgrade The Raft-topology upgrade procedure must not be run concurrently with version upgrade. (cherry picked from commit `bb0c3cdc65`) Closes scylladb/scylladb#19836	2024-07-29 16:52:40 +02:00
Jenkins Promoter	8f80a84e93	Update ScyllaDB version to: 6.1.0-rc2	2024-07-29 15:50:26 +03:00
Piotr Dulikowski	95abb6d4a7	test: regression test for MV crash with tablets during decommission Regression test for scylladb/scylladb#19439. Co-authored-by: Kamil Braun <kbraun@scylladb.com> (cherry picked from commit `5ec8c06561`)	2024-07-26 14:02:51 +00:00
Piotr Dulikowski	30b0cb4f5d	db/view: drop view updates to replaced node marked as left When a node that is permanently down is replaced, it is marked as "left" but it still can be a replica of some tablets. We also don't keep IPs of nodes that have left and the `node` structure for such node returns an empty IP (all zeros) as the address. This interacts badly with the view update logic. The base replica paired with the left node might decide to generate a view update. Because storage proxy still uses IPs and not host IDs, it needs to obtain the view replica's IP and tell the storage proxy to write a view update to that node - so, it chooses 0.0.0.0. Apparently, storage proxy decides to write a hint towards this address - hinted handoff on the other hand operates on host IDs and not IPs, so it attempts to translate the IP back, which triggers an assertion as there is no replica with IP 0.0.0.0. As a quick workaround for this issue just drop view updates towards nodes which seem to have IPs that are all zeros. It would be more proper to keep the view updates as hints and replay them later to the new paired replica, but achieving this right now would require much more significant changes. For now, fixing a crash is more important than keeping views consistent with base replicas. Fixes: scylladb/scylladb#19439 (cherry picked from commit `6af7882c59`)	2024-07-26 14:02:50 +00:00
Nadav Har'El	97ae704f99	alternator: do not allow authentication with a non-"login" role Alternator allows authentication into the existing CQL roles, but roles which have the flag "login=false" should be refused in authentication, and this patch adds the missing check. The patch also adds a regression test for this feature in the test/alternator test framework, in a new test file test/alternator/cql_rbac.py. This test file will later include more tests of how the CQL RBAC commands (CREATE ROLE, GRANT, REVOKE) affect authentication and authorization in Alternator. In particular, these tests need to use not just the DynamoDB API but also CQL, so this new test file includes the "cql" fixture that allows us to run CQL commands, to create roles, to retrieve their secret keys, and so on. Fixes #19735 (cherry picked from commit `14cd7b5095`) Closes scylladb/scylladb#19863 scylla-6.1.0-rc1-candidate-20240727120202 scylla-6.1.0-rc1	2024-07-25 12:45:27 +03:00

43595 Commits