scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-21 09:00:35 +00:00

Author	SHA1	Message	Date
Patryk Jędrzejczak	4ff08decb8	virtual_tables: cluster_status_table: execute: set dc regardless of the token ownership If a node is in `locator::topology`, then it has a location. We remove the token ownership condition to make the table more descriptive.	2024-08-29 10:37:06 +02:00
Piotr Dulikowski	da5f4faac1	Merge 'mv: reject user requests by coordinator when a replica is overloaded by MVs' from Wojciech Mitros Currently, when a view update backlog of one replica is full, the write is still sent by the coordinator to all replicas. Because of the backlog, the write fails on the replica, causing inconsistency that needs to be fixed by repair. To avoid these inconsistencies, this patch adds a check on the coordinator for overloaded replicas. As a result, a write may be rejected before being sent to any replicas and later retried by the user, when the replica is no longer overloaded. This patch does not remove the replica write failures, because we still may reach a full backlog when more view updates are generated after the coordinator check is performed and before the write reaches the replica. Fixes scylladb/scylladb#17426 Closes scylladb/scylladb#18334 * github.com:scylladb/scylladb: mv: test the view update behavior mv: add test for admission control storage_proxy: return overloaded_exception instead of throwing mv: reject user requests by coordinator when a replica is overloaded by MVs	2024-08-27 12:50:34 +02:00
Pavel Emelyanov	ed6e6700ab	backup-task: Make it abortable (almost) Make impl::is_abortable() return 'yes' and check impl::_as in the file-listing loop. It's not a real abort: the file-listing loop is expected to be fast, and most of the time will be spent in s3::client code reading data from disk and sending it to S3, but the client doesn't support aborting its requests. That's some work yet to be done. Also add an injection for future testing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 19:47:06 +03:00
Pavel Emelyanov	a812f13ddd	code: Introduce backup API method The method starts a task that uploads all files from the given keyspace's snapshot to the requested endpoint/bucket. The task runs in the background; its task_id is returned from the method once it's spawned, and it should be used via the /task_manager API to track the task's execution and completion (hint: it's good to have a non-zero TTL value to make sure fast backups don't finish before the caller manages to call the wait_task API). If the snapshot doesn't exist, nothing happens (FIXME: need to return an error in that case). If the endpoint is not configured locally, the API call resolves with bad-request instantly. SSTable components are scanned for all tables in the keyspace and are uploaded into the /bucket/${cf_name}/${snapshot_name}/ path. The task is not abortable (FIXME -- to be added) and doesn't really report its progress other than running/done state (FIXME -- to be added too). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 19:47:06 +03:00
Pavel Emelyanov	dff51fd58c	snapshot-ctl: Add config to snapshot_ctl Pretty much all services in Scylla have their own config. Add one to snapshot-ctl too; it will be populated later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:57:20 +03:00
Pavel Emelyanov	f37857e20a	snapshot-ctl: Add sstables::storage_manager dependency The storage_manager maintains a set of clients to the configured object storage(s). The snapshot ctl is going to spawn tasks that will talk to those storages, thus it needs the storage manager to get the clients from. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:08:21 +03:00
Pavel Emelyanov	362331c89b	snapshot-ctl: Maintain task manager module This service is going to start tasks managed by task manager. For that, it should have its module set up and registered. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:08:21 +03:00
Pavel Emelyanov	4ae89a9c81	snapshot-ctl: Add "snapshots" logger Will be used later Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:08:21 +03:00
Pavel Emelyanov	90c794172b	snapshot-ctl: Outline stop() method and constructor These two are going to grow; keep them out of line so as not to pollute the header Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:08:21 +03:00
Pavel Emelyanov	96946a4b11	snapshot-ctl: Inline run_snapshot_list<> This helper will be used by code from another .cc file, so the template needs to be in the header for smooth instantiation Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:08:21 +03:00
Tomasz Grabiec	a3a97e8aad	Merge 'schema_tables: calculate_schema_digest: prevent stalls due to large mutations vector' from Benny Halevy With a large number of tables the schema mutations vector might get big enough to cause reactor stalls when freed. For example, the following stall was hit on 2023.1.0~rc1-20230208.fe3cc281ec73 with 5000 tables: ``` (inlined by) ~vector at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_vector.h:730 (inlined by) db::schema_tables::calculate_schema_digest(seastar::sharded<service::storage_proxy>&, enum_set<super_enum<db::schema_feature, (db::schema_feature)0, (db::schema_feature)1, (db::schema_feature)2, (db::schema_feature)3, (db::schema_feature)4, (db::schema_feature)5, (db::schema_feature)6, (db::schema_feature)7> >, seastar::noncopyable_function<bool (std::basic_string_view<char, std::char_traits<char> >)>) at ./db/schema_tables.cc:799 ``` This change returns a mutations generator from the `map` lambda coroutine so we can process them one at a time and destroy the mutations one at a time, thereby reducing the memory footprint and preventing reactor stalls. Fixes #18173 Closes scylladb/scylladb#18174 * github.com:scylladb/scylladb: schema_tables: calculate_schema_digest: filter the key earlier schema_tables: calculate_schema_digest: prevent stalls due to large mutations vector	2024-08-20 21:24:38 +02:00
Avi Kivity	7eb3b15fff	Merge 'utils/tagged_integer: remove conversion to underlying integer' from Laszlo Ersek ~~~ utils/tagged_integer: remove conversion to underlying integer Silently converting a tagged (i.e., "dimension-ful") integer to a naked ("dimensionless") integer defeats the purpose of having tagged integers, and is a source of practical bugs, such as <https://github.com/scylladb/scylladb/issues/20080>. We could make the conversion operator explicit, for enforcing static_cast<TAGGED_INTEGER_TYPE::value_type>(TAGGED_INTEGER_VALUE) in every conversion location -- but that's a mouthful to write. Instead, remove the conversion operator, and let clients call the (identically behaving) value() member function. ~~~ No backport needed (refactoring). The series is supposed to solve #20081. Two patches in the series touch up code that is known to be (orthogonally) buggy; see - `service/raft_sys_table_storage: tweak dead code` (#20080) - `test/raft/replication: untag index_t in test_case::get_first_val()` (#20151) Fixes for those (independent) issues will have to be rebased on this series, or this series will have to be rebased on those (due to context conflicts). The series builds at every stage. The debug and release unit test suites pass at the end. 
Closes scylladb/scylladb#20159 * github.com:scylladb/scylladb: utils/tagged_integer: remove conversion to underlying integer test/raft/randomized_nemesis_test: clean up remaining index_t usage test/raft/randomized_nemesis_test: clean up index_t usage in store_snapshot() test/raft/replication: clean up remaining index_t usage test/raft/replication: take an "index_t start_idx" in create_log() test/raft/replication: untag index_t in test_case::get_first_val() test/raft/etcd_test: tag index_t and term_t for comparisons and subtractions test/raft/fsm_test: tag index_t and term_t for comparisons and subtractions test/raft/helpers: tighten compare_log_entries() param types service/raft_sys_table_storage: tweak dead code service/raft_sys_table_storage: simplify (snap.idx - preserve_log_entries) service/raft_sys_table_storage: untag index_t and term_t for queries raft/server: clean up index_t usage raft/tracker: don't drop out of index_t space for subtraction raft/fsm: clean up index_t and term_t usage raft/log: clean up index_t usage db/system_keyspace: promise a tagged integer from increment_and_get_generation() gms/gossiper: return "strong_ordering" from compare_endpoint_startup() gms/gossiper: get "int32_t" value of "gms::version_type" explicitly	2024-08-19 19:52:54 +03:00
Benny Halevy	52234214e5	schema_tables: calculate_schema_digest: filter the key earlier Currently, each frozen mutation we get from system_keyspace::query_mutations is unfrozen in full to a mutation, and only then do we check its key with the provided `accept_keyspace` function. This is wasteful, since the key can be processed directly from the frozen mutation, before paying the cost of unfreezing it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-15 12:33:34 +03:00
Benny Halevy	95a5fba0ea	schema_tables: calculate_schema_digest: prevent stalls due to large mutations vector With a large number of tables the schema mutations vector might get big enough to cause reactor stalls when freed. For example, the following stall was hit on 2023.1.0~rc1-20230208.fe3cc281ec73 with 5000 tables: ``` (inlined by) ~vector at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_vector.h:730 (inlined by) db::schema_tables::calculate_schema_digest(seastar::sharded<service::storage_proxy>&, enum_set<super_enum<db::schema_feature, (db::schema_feature)0, (db::schema_feature)1, (db::schema_feature)2, (db::schema_feature)3, (db::schema_feature)4, (db::schema_feature)5, (db::schema_feature)6, (db::schema_feature)7> >, seastar::noncopyable_function<bool (std::basic_string_view<char, std::char_traits<char> >)>) at ./db/schema_tables.cc:799 ``` This change returns a mutations generator from the `map` lambda coroutine so we can process them one at a time and destroy the mutations one at a time, thereby reducing the memory footprint and preventing reactor stalls. Fixes #18173 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-15 12:33:34 +03:00
Laszlo Ersek	9e95f3a198	db/system_keyspace: promise a tagged integer from increment_and_get_generation() Internally, increment_and_get_generation() produces a "gms::generation_type" value. In turn, all callers of increment_and_get_generation() -- namely scylla_main() [main.cc] and single_node_cql_env::run_in_thread() [test/lib/cql_test_env.cc] -- pass the resolved value to storage_service::init_address_map() and storage_service::join_cluster(), both of which take a "gms::generation_type". Therefore it is pointless to "untag" the generation value temporarily between the producer and the consumers. Correct the return type of increment_and_get_generation(). Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 13:35:08 +02:00
Łukasz Paszkowski	da95f44adc	readers: Use reversed schema and native reversed slices The reconcilable_result is built as it would be constructed for forward read queries for tables with reversed order. Mutations constructed for reversed queries are consumed forward. Drop the overloaded functions that reverse read_command and reconcilable_result directly, since they are no longer used, and keep only those requiring smart pointers.	2024-08-13 10:03:46 +02:00
Łukasz Paszkowski	b270097f1f	config: drop reversed_reads_auto_bypass_cache Reverse reads have been with us for a while, so this back-door option to bypass the in-memory data cache for reversed queries can be retired.	2024-08-13 10:02:42 +02:00
Łukasz Paszkowski	80df313f49	config: drop enable_optimized_reversed_reads Reverse reads have been with us for a while, so this back-door option to read entire partitions forward and reverse them afterwards can be retired.	2024-08-13 10:02:42 +02:00
Avi Kivity	318278ff92	Merge 'tablets: reload only changed metadata' from Botond Dénes Currently, each change to tablet metadata triggers a full metadata reload from disk. This is very wasteful, especially if the metadata change affects only a single row in the `system.tablets` table. This is the case when the tablet load balancer triggers a migration: it affects a single row in the table, but today it triggers a full reload. We expect tablet count to potentially grow to thousands and beyond, and the overhead of this full reload can become significant. This PR makes tablet metadata reload partial: instead of reloading all metadata on topology or schema changes, reload only the partitions that are affected by the change, and copy the rest from the in-memory state. This is done in two passes: first the change mutations are scanned and a hint is produced. This hint is then passed down to the reload code, which uses it to reload only the parts (rows/partitions) of the metadata that have actually changed. The performance difference between full reload and partial reload is quite drastic: ``` INFO 2024-07-25 05:06:27,347 [shard 0:stat] testlog - Tablet metadata reload: full 616.39ms partial 0.18ms ``` This was measured with the modified (by this PR) `perf_tablets`, which creates 100 tables, each with 2K tablets. The test was modified to change a single tablet, then do a full and a partial reload respectively, measuring the time each takes. Fixes: #15294 New feature, no backport needed. 
Closes scylladb/scylladb#15541 * github.com:scylladb/scylladb: test/perf/perf_tablets: add tablet metadata reload perf measurement test/boost/tablets_test: add test for partial tablet metadata updates db/schema_tables: pass tablet hint to update_tablet_metadata() service/storage_service: load_tablet_metadata(): add hint parameter service/migration_listener: update_tablet_metadata(): add hint parameter service/raft/group0_state_machine: provide tablet change hint on topology change service/storage_service: topology_state_load(): allow providing change hint replica/tablets: add update_tablet_metadata() replica/tablets: fix indentation replica/tablets: extract tablet_metadata builder logic replica/tablets: add get_tablet_metadata_change_hint() and update_tablet_metadata_change_hint() locator/tablets: add tablet_map::clear_tablet_transition_info() locator/tablets: make tablet_metadata cheap to copy mutation/canonical_mutation: add key()	2024-08-11 21:27:18 +03:00
Botond Dénes	b886ed44a7	db/schema_tables: pass tablet hint to update_tablet_metadata() Replace the has_tablet_mutations in `merge_tables_and_views()` with a hint parameter, which is calculated in the caller, from the original schema change mutations. This hint is then forwarded to the notifier's `update_tablet_metadata()` so that subscribers can refresh only the tablet partitions that changed.	2024-08-11 09:53:19 -04:00
Botond Dénes	2cec0d8dd1	service/migration_listener: update_tablet_metadata(): add hint parameter The hint contains information related to what exactly changed, allowing listeners to do partial updates, instead of reloading all metadata on each notification.	2024-08-11 09:53:19 -04:00
Calle Wilund	e18a855abe	extensions: Add exception types for IO extensions and handle in memtable write path Fixes #19960 The write path for sstables/commitlog needs to handle the fact that IO extensions can generate errors, some of which should be considered retryable, and some of which should, similar to system IO errors, cause the node to go into isolate mode. One option would of course be for extensions to simply generate std::system_errors, with system_category and appropriate codes. But this is probably a bad idea, since it makes it less clear at which level an error happened, and it limits the expressiveness of the error. This adds three distinct types (sharing a base) distinguishing permission, availability and configuration errors. These are treated akin to EACCES, ENOENT and EINVAL in the disk error handler and the memtable write loop. Tests updated to use and verify behaviour. Closes scylladb/scylladb#19961	2024-08-11 13:52:35 +03:00
Dawid Medrek	e5d01d4000	db/hints: Make commitlog use commitlog IO scheduling group Before these changes, we didn't specify which I/O scheduling group commitlog instances in hinted handoff should use. In this commit, we set it explicitly to the commitlog scheduling group. The rationale for this choice is the fact that we don't want to cause a bottleneck on the write path -- if hints are written too slowly, new incoming mutations (NOT hints) might be rejected due to too high a number of hints currently being written to disk; see `storage_proxy::create_write_response_handler_helper()` for more context. Fixes scylladb/scylladb#18654 Closes scylladb/scylladb#19170	2024-08-08 16:14:07 +02:00
Calle Wilund	d6742e9bce	distributed_loader: Remove load_prio_keyspaces Fixes #13334 All required code paths (see enterprise) now use extensions::is_extension_internal_keyspace. The old mechanism can be removed. One less global var. Closes scylladb/scylladb#20047	2024-08-08 12:10:27 +03:00
Avi Kivity	3fe60560d2	Merge 'Coroutinize view_builder::start()' from Pavel Emelyanov It runs in the background and consists of two parts -- an async() lambda and the following .then()-s. This PR moves the background-running code into its own method and coroutinizes it in parts. With #19954 merged it finally looks really nice. Closes scylladb/scylladb#20058 * github.com:scylladb/scylladb: view_builder: Restore indentation after previous patches view_builder: Coroutinize inner start_in_background() calls view_builder: Coroutinize outer start_in_background() calls view_builder: Add helper method for background start	2024-08-07 19:47:32 +03:00
Dawid Medrek	96509c4cf7	db/hints: Make sync points be created for all hosts when not specified Sync points are created, via POST HTTP requests, for a subset of nodes in the cluster. Those nodes are specified in a request's parameter `target_hosts`. When the parameter is empty, Scylla should assume the user wants to create a sync point for ALL nodes. Before these changes, sync points were created only for LIVE nodes. If a node was dead but still part of the cluster and the user requested creating a sync point leaving the parameter `target_hosts` empty, the dead node was skipped during the creation of the sync point. That was inconsistent with the guarantees the sync point API provides. In this commit, we fix that issue and add a test verifying that the changes have made the implementation compliant with the design of the sync point API -- the test only passes after this commit. Fixes scylladb/scylladb#9413 Closes scylladb/scylladb#19750	2024-08-07 13:15:20 +02:00
Pavel Emelyanov	63afbc0fcb	view_builder: Restore indentation after previous patches Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-07 14:00:01 +03:00
Pavel Emelyanov	aa1a5d3201	view_builder: Coroutinize inner start_in_background() calls One of the co_await-ed parts of this method is an async() lambda. It can be coroutinized too. One thing to take care of is the semaphore units -- their scope should (?) end earlier than the whole start_in_background(), so release them explicitly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-07 14:00:01 +03:00
Pavel Emelyanov	167c6a9c5e	view_builder: Coroutinize outer start_in_background() calls The method consists of two parts -- one running in async() thread and continuations to it. This patch turns the latter chain into co_await-s. The mentioned chain is "guarded" by then_wrapped() catch of any exception, which is turned into a plain try-catch block. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-07 14:00:01 +03:00
Pavel Emelyanov	10a87f5c5b	view_builder: Add helper method for background start The view_builder::start() happens in the background. It's good to have an explicit start_in_background() method and coroutinize it next. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-07 13:59:57 +03:00
Piotr Dulikowski	1963619803	Merge 'Use cross shard barrier to start view builder' from Pavel Emelyanov When starting, view builder wants all shards to synchronize with each other in the middle of initialization. For that they all synchronize via shard-0's instance counter and a shared future. There's a cross-shard barrier in utils/ that provides the same facility. Closes scylladb/scylladb#19954 * github.com:scylladb/scylladb: view_builder: Drop unused members view_builder: Use cross-shard barrier on start view_builder: Add cross-shard barrier to its .start() method	2024-08-07 08:54:15 +02:00
Avi Kivity	aa1270a00c	treewide: change assert() to SCYLLA_ASSERT() assert() is traditionally disabled in release builds, but not in scylladb. This hasn't caused problems so far, but the latest abseil release includes a commit [1] that causes a 1000 insn/op regression when NDEBUG is not defined. Clearly, we must move towards a build system where NDEBUG is defined in release builds. But we can't just define it blindly without vetting all the assert() calls, as some were written with the expectation that they are enabled in release mode. To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT() macro in utils/assert.hh. This macro is always defined and is not conditional on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release mode. [1] `66ef711d68` Closes scylladb/scylladb#20006	2024-08-05 08:23:35 +03:00
Wojciech Mitros	795ac177c2	mv: add test for admission control In this patch we add 2 tests for checking that the mv admission control works. The first one simply checks whether, after increasing the backlog on one node over the admission control threshold, the following request is rejected with the error message corresponding to the admission control. The second one checks whether, after triggering admission control, the entire user request fails instead of just failing a replica write. This is done by performing a number of writes, some of which trigger the admission control and cause retries, then checking if the node that had a large view update backlog received all the writes. Before, the writes would succeed on enough replicas, reaching QUORUM, and allowing the user write to succeed and cause no retries, even though on the replica with a high backlog the write got rejected due to the backlog size.	2024-08-02 12:12:24 +02:00
Wojciech Mitros	5eaae05aaf	mv: reject user requests by coordinator when a replica is overloaded by MVs Currently, when a replica's view update backlog is full, the write is still sent by the coordinator to all replicas. Because of the backlog, the write fails on the replica, causing inconsistency that needs to be fixed by repair. To avoid these inconsistencies, this patch adds a check on the coordinator for overloaded replicas. As a result, a write may be rejected before being sent to any replicas and later retried by the user, when the replica is no longer overloaded. Fixes scylladb/scylladb#17426	2024-08-02 12:12:19 +02:00
Piotr Dulikowski	39b49a41cc	Merge 'mv: delete a partition in a single operation when applicable' from Michael Litvak Currently when a partition is deleted from the base table, we generate a row tombstone update for each one of the view rows in the partition. When the partition key in the view is the same as the base, maybe in a different order, this can be done more efficiently - The whole corresponding view partition can be deleted with one partition tombstone update. With this commit, when generating view updates, if the update mutation has a partition tombstone then for the views which have the same partition key we will generate a partition tombstone update, and skip the individual row tombstone updates. Fixes scylladb/scylladb#8199 Closes scylladb/scylladb#19338 * github.com:scylladb/scylladb: mv: skip reading rows when generating partition tombstone update mv: delete a partition in a single operation when applicable cql-pytest: move ScyllaMetrics to util file to allow reuse	2024-08-02 11:00:18 +02:00
Piotr Dulikowski	44f327675d	Merge 'Remove gossiper argument from storage_service::join_cluster()' from Pavel Emelyanov It's only needed to start hints via proxy, but proxy can do it without gossiper argument Closes scylladb/scylladb#19894 * github.com:scylladb/scylladb: storage_service: Remote gossiper argument from join_cluster() proxy: Use remote gossiper to start hints resource manager hints: Const-ify gossiper references and anchor pointers	2024-08-01 10:18:14 +02:00
Pavel Emelyanov	93ed978729	view_builder: Drop unused members There's a counter and a shared future on board that used to facilitate start-time barrier synchronization. Now they are not needed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-31 12:59:40 +03:00
Pavel Emelyanov	613161c7b9	view_builder: Use cross-shard barrier on start When starting, view builder spawns an async background fiber, and upon its completion each shard needs to wait for the other shards to do the same. This is exactly what the cross-shard barrier is about, so instead of synchronizing via v.b.'s shard-0 instance, use the barrier. This makes view_builder::start() shorter and easier to read. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-31 12:56:25 +03:00
Pavel Emelyanov	fb1b749445	view_builder: Add cross-shard barrier to its .start() method The barrier will be used by the next patch to synchronize shards with each other. When passed to the invoke_on_all() lambda like this, each lambda gets its own copy of the barrier "handler" that maintains shared state across shards. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-31 12:54:28 +03:00
Kefu Chai	36f5032b2d	db: correct the doxygen comment the parameter names do not match with the ones we are using. these comments were inherited from Origin, but we failed to update them accordingly. in this change, the comments are updated to reflect the function signatures. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19900	2024-07-28 18:24:57 +03:00
Pavel Emelyanov	dd7c7c301d	hints: Const-ify gossiper references and anchor pointers There are two places in hints code that need gossiper: hint_sender calling gossiper::is_alive() and the endpoint_downtime_not_bigger_than() helper in manager. Both can live with a const gossiper, so the dependency references and anchor pointers can be restricted to const too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-26 16:28:54 +03:00
Tomasz Grabiec	851da230c8	Merge 'db/view: drop view updates to replaced node marked as left' from Piotr Dulikowski When a node that is permanently down is replaced, it is marked as "left" but it still can be a replica of some tablets. We also don't keep IPs of nodes that have left and the `node` structure for such node returns an empty IP (all zeros) as the address. This interacts badly with the view update logic. The base replica paired with the left node might decide to generate a view update. Because storage proxy still uses IPs and not host IDs, it needs to obtain the view replica's IP and tell the storage proxy to write a view update to that node - so, it chooses 0.0.0.0. Apparently, storage proxy decides to write a hint towards this address - hinted handoff on the other hand operates on host IDs and not IPs, so it attempts to translate the IP back, which triggers an assertion as there is no replica with IP 0.0.0.0. As a quick workaround for this issue just drop view updates towards nodes which seem to have IPs that are all zeros. It would be more proper to keep the view updates as hints and replay them later to the new paired replica, but achieving this right now would require much more significant changes. For now, fixing a crash is more important than keeping views consistent with base replicas. In addition to the fix, this PR also includes a regression test heavily based on the test that @kbr-scylla prepared during his investigation of the issue. Fixes: scylladb/scylladb#19439 This issue can cause multiple nodes to crash at once and the fix is quite small, so I think this justifies backporting it to all affected versions. 6.0 and 6.1 are affected. No need to backport to 5.4 as this issue only happens with tablets, and tablets are experimental there. 
Closes scylladb/scylladb#19765 * github.com:scylladb/scylladb: test: regression test for MV crash with tablets during decommission db/view: drop view updates to replaced node marked as left	2024-07-25 11:47:14 +02:00
Michael Litvak	6f25f4b387	mv: skip reading rows when generating partition tombstone update when deleting a base partition, in some cases we can update the view by generating a single partition deletion update, instead of generating a row deletion update for each of the partition rows. If this is the case for all the affected views, and there are no other updates besides deleting the partition, then we can skip reading and iterating over all the rows, since this won't generate any additional updates that are not covered already.	2024-07-25 11:12:58 +03:00
Michael Litvak	d0b02dc0d0	mv: delete a partition in a single operation when applicable Currently when a partition is deleted from the base table, we generate a row tombstone update for each one of the view rows in the partition. When the partition key in the view is the same as the base, maybe in a different order, this can be done more efficiently - The whole corresponding view partition can be deleted with one partition tombstone update. With this commit, when generating view updates, if the update mutation has a partition tombstone then for the views which have the same partition key we will generate a partition tombstone update, and skip the individual row tombstone updates. Fixes scylladb/scylladb#8199	2024-07-25 11:12:58 +03:00
Aleksandra Martyniuk	c64cb98bcf	db: node_ops: filter topology request entries system_keyspace::get_topology_request_entries returns entries for requests which are running or have finished after the specified time. In the task manager node ops task, set the time so that the entries are shown for task_ttl seconds after they have finished.	2024-07-23 13:35:02 +02:00
Aleksandra Martyniuk	94282b5214	db: service: modify methods to get topology_requests data Modify get_topology_request_state (and wait_for_topology_request_completion), so that it doesn't call on_internal_error when request_id isn't in the topology_requests table if require_entry == false. Add other methods to get topology request entry.	2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk	880058073b	db: service: add request type column to topology_requests topology_requests table will be used by task manager node ops tasks, but it loses info about request type, which is required by tasks. Add request_type column to topology_requests.	2024-07-23 13:35:01 +02:00
Nadav Har'El	9eb47b3ef0	Merge 'config: round-trip boolean configuration variables' from Avi Kivity When you SELECT a boolean from system.config, it reads as true/false, but this isn't accepted on UPDATE (instead, we accept 1/0). This is surprising and annoying, so accept true/false in both directions. Not a regression, so a backport isn't strictly necessary. Closes scylladb/scylladb#19792 * github.com:scylladb/scylladb: config: specialize from-string conversion for bool config: wrap boost::lexical_cast<> when converting from strings	2024-07-22 17:53:02 +03:00
Botond Dénes	d3135db457	Merge 'commitlog: Add optional max lifetime parameter to cl instance' from Calle Wilund If set, any remaining segment that has data older than this threshold will request flushing, regardless of data pressure. I.e. even a system where nothing happens will, after X seconds, flush data to free up the commit log. Related to #15820 The functionality here is to prevent pathological/test cases where a silent system cannot fully process things like compaction, GC, etc. due to, for example, the CL forcing smaller GC windows. Closes scylladb/scylladb#15971 * github.com:scylladb/scylladb: commitlog: Make max data lifetime runtime-configurable db::config: Expose commitlog_max_data_lifetime_in_s parameter commitlog: Add optional max lifetime parameter to cl instance	2024-07-22 17:21:33 +03:00
Avi Kivity	36b57f3432	Merge 'token: inline optimizations' from Benny Halevy This series contains several optimizations for dht::token around its comparison functions as well as minimum_token and maximum_token definitions, by moving them inline into dht/token.hh This results in a nice improvement in perf-simple-query: ``` ==> perf-simple-query.pre <== (`21c67a5a64`) throughput: mean=95774.01 standard-deviation=1129.83 median=96243.64 median-absolute-deviation=1090.08 maximum=96864.09 minimum=94471.19 instructions_per_op: mean=41813.68 standard-deviation=16.27 median=41809.29 median-absolute-deviation=7.02 maximum=41841.64 minimum=41799.41 cpu_cycles_per_op: mean=22383.19 standard-deviation=331.01 median=22254.53 median-absolute-deviation=332.26 maximum=22744.11 minimum=21996.73 ==> perf-simple-query.post.0 <== (token: move ordering operator inline) throughput: mean=96350.01 standard-deviation=640.10 median=96228.88 median-absolute-deviation=621.45 maximum=96988.16 minimum=95478.51 instructions_per_op: mean=41627.13 standard-deviation=37.55 median=41627.06 median-absolute-deviation=2.43 maximum=41679.44 minimum=41573.31 cpu_cycles_per_op: mean=22184.65 standard-deviation=151.03 median=22163.05 median-absolute-deviation=120.83 maximum=22348.49 minimum=21967.30 ==> perf-simple-query.post.1 <== (token: operator<=>: optimize the common case) throughput: mean=96778.29 standard-deviation=1719.34 median=97021.72 median-absolute-deviation=1059.56 maximum=98300.99 minimum=93893.75 instructions_per_op: mean=41590.25 standard-deviation=5.53 median=41589.50 median-absolute-deviation=4.17 maximum=41598.39 minimum=41584.57 cpu_cycles_per_op: mean=22135.33 standard-deviation=471.98 median=21969.30 median-absolute-deviation=244.89 maximum=22905.24 minimum=21685.33 ==> perf-simple-query.post.3 <== (token: always initialize data member) throughput: mean=98264.33 standard-deviation=998.49 median=98533.02 median-absolute-deviation=780.45 maximum=99075.40 minimum=96656.51 
instructions_per_op: mean=41657.61 standard-deviation=22.53 median=41648.49 median-absolute-deviation=12.89 maximum=41696.81 minimum=41642.07 cpu_cycles_per_op: mean=21808.57 standard-deviation=93.63 median=21794.56 median-absolute-deviation=75.41 maximum=21949.46 minimum=21719.55 ==> perf-simple-query.post.4 <== (token: constexpr ctors, methods, and minimum/maximum_token) throughput: mean=98095.05 standard-deviation=1333.32 median=98930.22 median-absolute-deviation=906.80 maximum=99209.38 minimum=96194.25 instructions_per_op: mean=41572.28 standard-deviation=6.04 median=41574.49 median-absolute-deviation=4.76 maximum=41579.56 minimum=41564.72 cpu_cycles_per_op: mean=21831.35 standard-deviation=169.56 median=21732.86 median-absolute-deviation=102.93 maximum=22091.66 minimum=21689.63 ==> perf-simple-query.post.5 <== (token: initialize non-key tokens with min() value) throughput: mean=99502.32 standard-deviation=1003.70 median=99744.03 median-absolute-deviation=388.87 maximum=100482.95 minimum=97813.42 instructions_per_op: mean=41593.48 standard-deviation=17.27 median=41585.25 median-absolute-deviation=8.46 maximum=41619.41 minimum=41575.86 cpu_cycles_per_op: mean=21545.90 standard-deviation=86.66 median=21578.01 median-absolute-deviation=43.17 maximum=21612.41 minimum=21395.42 ``` Optimization only. No backport required Closes scylladb/scylladb#19782 * github.com:scylladb/scylladb: token: initialize non-key tokens with min() value token: make kind-based ctor private token: constexpr ctors, methods, and minimum/maximum_token token: always initialize data member everywhere: use dht::token is_{minimum,maximum} token: operator<=>: optimize the common case token: move ordering operator inline partitioner_test: add more token-level tests	2024-07-21 15:07:36 +03:00

3906 Commits
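Commit 94282b5214 changes get_topology_request_state so that a missing request_id is only fatal when require_entry is true. The sketch below illustrates that lookup contract under loudly stated assumptions: the table is a hypothetical in-memory std::array keyed by an integer id (the real topology_requests is a system table and its ids are not plain integers), get_request_state and request_state are hypothetical names, and the thrown exception merely stands in for on_internal_error.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical request states; the real table stores more information.
enum class request_state { pending, done };

// Hypothetical row of the topology_requests table (integer id is an assumption).
struct request_entry {
    std::uint64_t id;
    request_state state;
};

// Look up a request. A missing entry is an error only if require_entry is true;
// otherwise the caller gets an empty optional and can decide what to do.
template <std::size_t N>
constexpr std::optional<request_state> get_request_state(
        const std::array<request_entry, N>& table, std::uint64_t id, bool require_entry) {
    for (const auto& e : table) {
        if (e.id == id) {
            return e.state;
        }
    }
    if (require_entry) {
        // Stand-in for on_internal_error in this sketch.
        throw std::runtime_error("topology request not found");
    }
    return std::nullopt;
}
```

Returning an optional lets callers that only probe for a request (such as the task manager tasks mentioned in commit 880058073b) handle absence without treating it as an internal error.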
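Commit 9eb47b3ef0 makes boolean values in system.config round-trip: the true/false text produced on SELECT is now also accepted on UPDATE, alongside 1/0. A minimal sketch of such a from-string conversion follows; parse_bool_config is a hypothetical name, not ScyllaDB's actual specialization, and the sketch accepts exactly the four spellings named in the commit message (whether the real code accepts other variants, such as different letter case, is not stated there).

```cpp
#include <optional>
#include <string_view>

// Hypothetical parser for a boolean configuration value. Accepts both the
// textual form ("true"/"false") and the numeric form ("1"/"0"); anything
// else is rejected by returning an empty optional.
constexpr std::optional<bool> parse_bool_config(std::string_view s) {
    if (s == "true" || s == "1") {
        return true;
    }
    if (s == "false" || s == "0") {
        return false;
    }
    return std::nullopt;
}
```

Returning an empty optional instead of throwing keeps the sketch usable in constant expressions; a real config loader would turn that into a user-facing error.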
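Commit d3135db457 adds an optional maximum data lifetime (exposed as commitlog_max_data_lifetime_in_s) after which a segment requests a flush even without data pressure. The sketch below shows that decision rule only, assuming a hypothetical segment_info that records the age of the segment's oldest unflushed data; should_force_flush and count_forced_flushes are illustrative names, not commitlog APIs.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <optional>

using namespace std::chrono_literals;

// Hypothetical view of a commitlog segment: only the age of its oldest
// unflushed data matters for this decision.
struct segment_info {
    std::chrono::seconds oldest_data_age;
};

// True if the segment should request a flush regardless of pressure.
// An unset max_lifetime disables the feature (pressure-driven flushing only).
constexpr bool should_force_flush(const segment_info& seg,
                                  std::optional<std::chrono::seconds> max_lifetime) {
    if (!max_lifetime) {
        return false;
    }
    // "Older than the threshold" is taken as strictly greater.
    return seg.oldest_data_age > *max_lifetime;
}

// Count how many segments would request a flush on one pass.
template <std::size_t N>
constexpr std::size_t count_forced_flushes(const std::array<segment_info, N>& segs,
                                           std::optional<std::chrono::seconds> max_lifetime) {
    std::size_t n = 0;
    for (const auto& s : segs) {
        if (should_force_flush(s, max_lifetime)) {
            ++n;
        }
    }
    return n;
}
```

With a 300-second limit, segments whose oldest data is 400 s and 3600 s old request a flush while a 10-second-old segment does not; with no limit set, none do.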
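Commit 36b57f3432 includes "token: operator<=>: optimize the common case". The sketch below illustrates the general idea of such a fast path: compare two ordinary key tokens directly and fall back to kind-based ordering only when a sentinel is involved. It assumes a token model with three kinds (before all keys, key, after all keys) and an int64 payload; token_kind, token and tri_compare are hypothetical stand-ins, not dht::token. To stay C++17-compatible the sketch returns a plain int instead of implementing operator<=>.

```cpp
#include <cstdint>

// Hypothetical token kinds: sentinels bracket all real key tokens.
enum class token_kind : std::uint8_t { before_all_keys, key, after_all_keys };

// Hypothetical token: data is only meaningful when kind == key.
struct token {
    token_kind kind;
    std::int64_t data;
};

// Three-way comparison returning a negative value, zero, or a positive value.
constexpr int tri_compare(const token& a, const token& b) {
    // Fast path: the common case is comparing two real key tokens.
    if (a.kind == token_kind::key && b.kind == token_kind::key) {
        return (a.data > b.data) - (a.data < b.data);
    }
    // Slow path: at least one sentinel, so the kind decides the order.
    if (a.kind != b.kind) {
        return a.kind < b.kind ? -1 : 1;
    }
    // Two sentinels of the same kind compare equal.
    return 0;
}
```

Checking the key/key case first means the most frequent comparison costs two kind checks and one integer comparison; the rarer sentinel handling stays out of that path.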