scylladb

Author	SHA1	Message	Date
Botond Dénes	dd50bd9bd4	db/batchlog_manager: re-add v1 support system.batchlog will still have to be used while the cluster is upgrading from an older version, which doesn't know v2 yet. Re-add support for replaying v1 batchlogs. The switch to v2 will happen after the BATCHLOG_V2 cluster feature is enabled. The only external user -- storage_proxy -- only needs a minor adjustment: switch between the table names. The rest is handled transparently by the db/batchlog.hh interface and the batchlog_manager.	2026-02-20 07:03:46 +02:00
Botond Dénes	ca2bbbad97	db/batchlog_manager: make structs stats public Need to rename stats() -> get_stats() because it shadows the now exported type name.	2026-02-20 07:03:46 +02:00
Botond Dénes	ac059dadc6	db/batchlog_manager: add feature_service dependency Will be needed to check for batchlog_v2 feature.	2026-02-20 07:03:46 +02:00
Botond Dénes	8edd5b80ab	test/boost/batchlog_manager_test: add test for batchlog cleanup Add more tests covering different aspects of batchlog replay, cleanup, replay timeout and finally v1 -> v2 migration.	2025-12-02 14:21:26 +02:00
Botond Dénes	e309b5dbe1	db/batchlog_manager: config: s/write_timeout/replay_timeout/ Although the value of this item is indeed derived from the write timeout config, the name doesn't reflect what it is used for. Change it to reflect it better.	2025-12-02 14:21:26 +02:00
Botond Dénes	846b656610	db,service: switch to system.batchlog_v2 New batchlogs are written to the batchlog_v2 table and replay also uses the v2 table. The content of system.batchlog is attempted to be migrated to system.batchlog_v2 after each start of the batchlog_manager service. The migration is retried on each replay if it fails. This is redundant but simple. Batchlog cleanup now doesn't involve flushing memtables, the only remaining user of replica/database.hh is gone, so the include is dropped.	2025-12-02 14:21:26 +02:00
Aleksandra Martyniuk	7f20b66eff	db: repair: throw if replay fails Return a flag determining whether all the batches were sent successfully in batchlog_manager::replay_all_failed_batches (batches skipped due to being too fresh are not counted). Throw in repair_flush_hints_batchlog_handler if not all batches were replayed, to ensure that repair_time isn't updated.	2025-10-23 10:38:31 +02:00
Michael Litvak	a9b476e057	test: test_batchlog_manager: test batch replay when a node is down Add a test of the batchlog manager replay loop applying failed batches while some replica is down. The test reproduces an issue where the batchlog manager tries to replay a failed batch, doesn't get a response from some replica, and becomes stuck. It verifies that the batchlog manager can eventually recover from this situation and continue applying failed batches.	2025-07-07 12:23:06 +03:00
Michael Litvak	74a3fa9671	batchlog_manager: set timeout on writes Set a timeout on writes of replayed batches by the batchlog manager. We want to avoid having infinite timeout for the writes in case it gets stuck for some unexpected reason. The timeout is set to be high enough to allow any reasonable write to complete.	2025-07-07 12:23:06 +03:00
Botond Dénes	898ce98500	db/batchlog_manager: remove unused member _total_batches_replayed And its getter. There are no users for either. Closes scylladb/scylladb#24416	2025-06-16 09:37:00 +03:00
Benny Halevy	0672c9da5c	db: batchlog_manager: use named gate Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-04-12 11:28:48 +03:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Asias He	fed9b54664	db/batchlog_manager: Add get_last_replay It is used to get the time when the last replay is executed.	2024-10-30 11:07:57 +08:00
Botond Dénes	3361542e84	db/batchlog_manager: wire in batchlog_replay_cleanup_after_replays After the specified amount of replays, trigger a cleanup: flush batchlog table memtables. This allows the cleanup to happen on a configurable interval, instead of on every batchlog replay attempt, which might be too much.	2024-10-30 11:07:57 +08:00
Botond Dénes	169c74346d	db/batchlog_manager: do_batch_log_replay(): add cleanup flag Add a flag controlling whether cleanup (memtable flush) will be done after the replay. This is to allow repair to opt out from cleanup -- when many concurrent repairs are running, there can be storms of calls to do_batch_log_replay(), which will be mostly no-op, but they will all attempt to flush the memtable to clean-up after themselves. This is unnecessary and introduces latency to repairs, best to leave the cleanup to the periodic batch-log replay.	2024-10-30 11:07:57 +08:00
Kefu Chai	ee36358a60	db: remove unused includes these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed. please note, since we have `using seastar::shared_ptr` in `seastarx.h`, this renders `#include <seastar/core/shared_ptr.hh>` unnecessary if we don't need the full definition of `seastar::shared_ptr`. so, in this change, all the unused includes are removed. but there are some headers which are actually used, while still being identified by this tool. these includes are marked with "IWYU pragma: keep". Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-04 20:48:18 +08:00
Avi Kivity	cc8b4e0630	batchlog_manager, test: initialize delay configuration In `b4e66ddf1d` (4.0) we added a new batchlog_manager configuration named delay, but forgot to initialize it in cql_test_env. This somehow worked, but doesn't with clang 18. Fix it by initializing to 0 (there isn't a good reason to delay it). Also provide a default to make it safer. Closes scylladb/scylladb#18572	2024-05-13 07:57:35 +03:00
Kefu Chai	be364d30fd	db: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16664	2024-01-09 11:44:19 +02:00
Pavel Emelyanov	d48aff5789	batchlog_manager: Remove start() method It's now a no-op, can be dropped. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-12 16:37:52 +03:00
Pavel Emelyanov	3966a50ed4	batchlog_manager: Start replay loop in constructor ... and sanitize the future used on stop. The loop in question is now started in .start(), but all callers now construct the manager late enough, so the loop spawning can be moved. This also calls for renaming the future member of the class and allows to make it regular, not shared, future. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-12 16:35:53 +03:00
Pavel Emelyanov	1907518034	batchlog_manager: Add system_keyspace dependency The manager will need system ks to get truncation record from, so add it explicitly. Start-stop sequence now allows that Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-06 13:28:40 +03:00
Avi Kivity	c5e4bf51bd	Introduce mutation/ module Move mutation-related files to a new mutation/ directory. The names are kept in the global namespace to reduce churn; the names are unambiguous in any case. mutation_reader remains in the readers/ module. mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this patch. This is a step forward towards librarization or modularization of the source base. Closes #12788	2023-02-14 11:19:03 +02:00
Pavel Emelyanov	8a03683671	batchlog_manager: Drain it with shared future The .drain() method can be called from several places, each needs to wait for its completion. Now this is achieved with the help of a gate, but there's a simpler way Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-07-04 13:42:45 +03:00
Avi Kivity	5937b1fa23	treewide: remove empty comments in top-of-files After `fcb8d040` ("treewide: use Software Package Data Exchange (SPDX) license identifiers"), many dual-licensed files were left with empty comments on top. Remove them to avoid visual noise. Closes #10562	2022-05-13 07:11:58 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Nadav Har'El	3fbbad7d60	build performance: speed up inclusion of <gms/inet_address.hh> The header file <gms/inet_address.hh> is included, directly or indirectly, from 291 source files in Scylla. It is hard to reduce this number because Scylla relies heavily on IP addresses as keys to different things. So it is important that this header file be fast to include. Unfortunately it wasn't... ClangBuildAnalyzer measurements showed that each inclusion of this header file added a whopping 2 seconds (in dev build mode) to the build. A total of 600 CPU seconds - 10 CPU minutes - were spent just on this header file. It was actually worse because the build also spent additional time on template instantiation (more on this below). So in this patch we: 1. Remove some unnecessary stuff from gms/inet_address.hh, and avoid including it in one place that doesn't need it. This is just cosmetic, and doesn't significantly speed up the build. 2. Move the to_sstring() implementation from the .hh to the .cc. This saves a lot of time on template instantiations - previously every source file instantiated this to_sstring(), which was slow (that "format" thing is slow). 3. Do not include <seastar/net/ip.hh> which is a huge file including half the world. All we need from it is the type "ipv4_address", so instead include just the new <seastar/net/ipv4_address.hh>. This change brings most of the performance improvement. Some source files forgot to include various Seastar header files because the includes-everything ip.hh did it - so we need to add these missing includes in this patch. After this patch, ClangBuildAnalyzer reports that the cost of inclusion of <gms/inet_address.hh> is down from 2 seconds to 0.326 seconds. Additionally the format<inet_address> template instantiation 291 times - about half a second each - is also gone. All in all, this patch should reduce around 10 CPU minutes from the build. Refs #1 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-01-04 21:07:23 +02:00
Benny Halevy	d344765ec6	get rid of the global batchlog_manager Now that it's unused. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-23 08:27:30 +02:00
Benny Halevy	744275df73	batchlog_manager: get_batch_log_mutation_for: move to storage_proxy And rename to get_batchlog_mutation_for while at it, as it's about the batchlog, not batch_log. This resolves a circular dependency between the batchlog_manager and the storage_proxy that required it in the case. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-23 08:27:30 +02:00
Benny Halevy	55967a8597	batchlog_manager: endpoint_filter: move to gossiper There's nothing in this function that actually requires the batchlog manager instance. It uses a random number engine that's moved along with it to class gossiper. This resolves a circular dependency between the batchlog_manager and storage_proxy. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-23 08:27:30 +02:00
Benny Halevy	691afe1c4d	batchlog_manager: derive from peering_sharded_service So that do_batch_log_replay can get the sharded batchlog_manager as container(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-23 08:27:30 +02:00
Benny Halevy	03039e8f8a	main: allow setting the global batchlog_manager As a prerequisite to globalizing the batchlog_manager, allow setting a global pointer to it and instantiate the sharded<db::batchlog_manager> on the main/cql_test_env stack. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-23 08:27:30 +02:00
Benny Halevy	5165780d81	batchlog_manager: refactor drain out of stop drain() aborts the replay loop fiber and returns its future. It's grabbing _gate so stop() will wait on it. The intention is to call stop_replay_loop from storage_service::decommission and do_drain rather than stop, so we can stop the batchlog manager once, using a deferred action in main. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-07-20 20:23:06 +03:00
Benny Halevy	deef1b4f59	batchlog_manager: stop: use abort_source to abort batchlog_replay_loop Harden start/stop by using an abort_source to abort from the replay loop. Extract the loop into batchlog_replay_loop() coroutine, with the _stop abort source as a stop condition, plus use it for sleep_abortable to be able to promptly stop while sleeping. start() stores batchlog_replay_loop's future in a newly added _started member, which is waited on in stop() to synchronize with the start process at any stage. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-07-20 19:32:55 +03:00
Avi Kivity	4d70f3baee	storage_proxy: change unordered_set<inet_address> to small_vector in write path The write paths in storage_proxy pass replica sets as std::unordered_set<gms::inet_address>. This is a complex type, with N+1 allocations for N members, so we change it to a small_vector (via inet_address_vector_replica_set) which requires just one allocation, and even zero when up to three replicas are used. This change is more nuanced than the corresponding change to the read path `abe3d7d7` ("Merge 'storage_proxy: use small_vector for vectors of inet_address' from Avi Kivity"), for two reasons: - there is a quadratic algorithm in abstract_write_response_handler::response(): it searches for a replica and erases it. Since this happens for every replica, it happens N^2/2 times. - replica sets for writes always include all datacenters, while reads usually involve just one datacenter. So, a write to a keyspace that has 5 datacenters will invoke 15*(15-1)/2 =105 compares. We could remove this by sending the index of the replica in the replica set to the replica and ask it to include the index in the response, but I think that this is unnecessary. Those 105 compares need to be only 105/15 = 7 times cheaper than the corresponding unordered_set operation, which they surely will. Handling a response after a cross-datacenter round trip surely involves L3 cache misses, and a small_vector reduces these to a minimum compared to an unordered_set with its bucket table, linked list walking and management, and table rehashing. Tests using perf_simple_query --write --smp 1 --operations-per-shard 1000000 --task-quota-ms show two allocations removed (as expected) and a nice reduction in instructions executed. before: median 204842.54 tps ( 54.2 allocs/op, 13.2 tasks/op, 49890 insns/op) after: median 206077.65 tps ( 52.2 allocs/op, 13.2 tasks/op, 49138 insns/op) Closes #8847	2021-06-17 13:46:40 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Pavel Solodovnikov	e0749d6264	treewide: some random header cleanups Eliminate not used includes and replace some more includes with forward declarations where appropriate. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-06-06 19:18:49 +03:00
Pavel Emelyanov	b4e66ddf1d	batchlog: Use in-config ring-delay This kills the first (out of two) global reference on storage_service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Avi Kivity	89be47e291	batchlog_manager: remove dependency on db::config Extract configuration into a new struct batchlog_manager_config and have the callers populate it using db::config. This reduces dependencies on global objects.	2018-12-09 20:11:38 +02:00
Nadav Har'El	25bd139508	cross-tree: clean up use of std::random_device() std::random_device() uses the relatively slow /dev/urandom, and we rarely if ever intend to use it directly - we normally want to use it to seed a faster random_engine (a pseudo-random number generator). In many places in the code, we first created a random_device variable, and then using it created a random_engine variable. However, this practice created the risk of a programmer accidentally using the random_device object, instead of the random_engine object, because both have the same API; This hurts performance. This risk materialized in just two places in the code, utils/uuid.cc and gms/gossiper.cc. A patch for uuid.cc was sent previously by Pawel and is not included in this patch, and the fix for gossiper.{cc,hh} is included here. To avoid risking the same mistake in the future, this patch switches across the code to an idiom where the random_device object is not named, so cannot be accidentally used. We use the following idiom: std::default_random_engine _engine{std::random_device{}()}; Here std::random_device{}() creates the random device (/dev/urandom) and pulls a random integer from it. It then uses this seed to create the random_engine (the pseudo-random number generator). The std::random_device{} object is temporary and unnamed, and cannot be unintentionally used directly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180726154958.4405-1-nyh@scylladb.com>	2018-07-26 16:54:58 +01:00
Avi Kivity	86de6cc7fb	Merge seastar upstream * seastar f14d2a3...7a49ae5 (8): > sharded: improve support for cooperating sharded<> services > sharded: support for peer services > semaphore: add a version of with_semaphore that takes a duration timeout > scripts: perftune.py: fix the CPU mask generation for more than 64 CPUs > Revert "future-utils: make when_all() (vector variant) exception safe" > Revert "future-utils: fix gross compilation errors in when_all()" > future-utils: fix gross compilation errors in when_all() > future-utils: make when_all() (vector variant) exception safe Includes change to batchlog_manager constructor to adapt it to seastar::sharded::start() change.	2017-08-06 17:47:47 +03:00
Vlad Zolotarov	a9f6e5f8da	db::batchlog_manager: move collectd registration to the metrics registration layer Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-01-10 16:24:54 -05:00
Vlad Zolotarov	4ef5b11e9b	batchlog_manager: add a counter for a total number of write attempts Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-04-21 11:29:21 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Asias He	cdb43c5586	batchlog_manager: Allow user initiated batchlog replay operation During decommission, the storage_service::unbootstrap() needs to initiate a batchlog replay operation. To sync the replay operation initiated by the timer in batchlog_manager and storage_service, a semaphore is introduced. To simplify the semaphore locking, the management code now always runs on shard zero, but the real work is distributed to all shards.	2016-03-30 20:54:30 +08:00
Calle Wilund	42c086a5cd	batchlog_manager: Fixup includes + exception handling * Fix exception handling in batch loop (report + still re-arm) * Cleanup seastar include reference style	2015-10-07 17:06:34 +03:00
Calle Wilund	6f94a3bdad	batchlog_manager: Use gate instead of semaphore Since that exists now.	2015-10-07 14:30:09 +02:00
Avi Kivity	d5cf0fb2b1	Add license notices	2015-09-20 10:43:39 +03:00
Paweł Dziepak	ddec2b4d09	batchlog_manager: pass mutations by const ref Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-09-03 10:30:29 +02:00
Calle Wilund	9a52ad84b1	BatchlogManager: make blm globally reachable distributed like other objects	2015-08-11 17:10:17 +02:00
Calle Wilund	0ded44eeee	BatchlogManager: make endpoint_filter method + implement	2015-08-11 17:10:16 +02:00


52 Commits