scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 04:56:58 +00:00

Author	SHA1	Message	Date
Konstantin Osipov	25087536bc	main: developer-mode configuration option uses dash, not underscore Message-Id: <20190520115524.101871-1-kostja@scylladb.com>	2019-05-20 15:14:11 +03:00
Avi Kivity	b19792405f	main: RAII-ify shutdown Instead of app_template::run_deprecated() and at_exit() hooks, use app_template::run() and RAII (via defer()) to stop services. This makes it easier to add services that do support shutdown correctly. Ref #2737 Message-Id: <20190420175733.29454-1-avi@scylladb.com>	2019-04-23 16:13:39 +02:00
Benny Halevy	5a99023d4a	treewide: use lambda for io_check of *touch_directory This prepares for a seastar change that adds an optional file_permissions parameter to touch_directory and recursive_touch_directory. That change breaks the call to io_check, since the compiler can no longer deduce the Func&& argument. Therefore, use a lambda function to wrap the call to {recursive_,}touch_directory instead. Ref #4395 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190421085502.24729-1-bhalevy@scylladb.com>	2019-04-21 12:04:39 +03:00
Piotr Jastrzebski	da1eba5bdb	Use read_sstables_format in main.cc Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 10:37:12 +02:00
Piotr Jastrzebski	caa6798f2c	system_keyspace: add storage_service param to setup Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Tomasz Grabiec	a717e11026	Merge "row level repair shutdown fixes" from Asias This series fixes row level repair shutdown issues we saw with dtests, e.g., use-after-free of the repair meta object and failure to stop a table during shutdown. Fixes: #4044 Fixes: #4314 Fixes: #4333 Fixes: #4380 Tests: repair_additional_test.py:RepairAdditionalTest.repair_abort_test repair_additional_test.py:RepairAdditionalTest.repair_kill_2_test * sestar-dev.git asias/repair.fix.shutdown.v1: repair: Wait for pending repair_meta operation before removing it repair: Check shutdown in row level repair repair: Remove repair meta when node is dead repair: Remove all row level repair during shutdown	2019-04-05 15:47:25 +03:00
Asias He	344d0ee37d	repair: Remove repair meta when node is dead Repair follower nodes create a repair meta object when the repair master node starts a repair. Normally, the repair meta object is removed when the repair master finishes the repair and sends the REPAIR_ROW_LEVEL_STOP verb to all the followers. If the repair master is killed suddenly, no one removes the repair meta object. To avoid keeping such objects forever, remove them when the gossip listener detects that a node is dead. Fixes: #4380 Reviewed-by: Botond Dénes <bdenes@scylladb.com>	2019-04-02 19:28:53 +08:00
Asias He	70fbe85b3e	main: Add shutdown database log It is useful to know which step of the shutdown process we are in. Refs: #4044 Message-Id: <f7c94c60d039560bfacd6d473f7d828940cc55b7.1554172140.git.asias@scylladb.com>	2019-04-02 11:49:00 +03:00
Benny Halevy	e3f7fe44c0	init: validate file ownership and mode. Files and directories must be owned by the process uid. Files must have read access and directories must have read, write, and execute access. Refs #3117 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-28 14:40:12 +02:00
Benny Halevy	ff4d8b6e85	treewide: use std::filesystem Rather than {std::experimental,boost,seastar::compat}::filesystem On Sat, 2019-03-23 at 01:44 +0200, Avi Kivity wrote: > The intent for seastar::compat was to allow the application to choose > the C++ dialect and have seastar follow, rather than have seastar choose > the types and have the application follow (as in your patch). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-28 14:21:10 +02:00
Asias He	b91452ed4c	storage_service: Pass gossiper object to storage_service Pass the gossiper object to storage_service class in order to avoid the usage of the static object returned from get_local_gossiper().	2019-03-22 09:11:26 +08:00
Asias He	ee1227b3ae	gossiper: Pass db::config object to gossiper class Gossiper calls service::get_local_storage_service() to get cfg options. To avoid cyclic dependency, pass the cfg object to gossiper directly.	2019-03-22 08:25:16 +08:00
Asias He	1652ee512a	init: Pass gossiper object to init_ms_fd_gossiper In order to avoid the usage of the static gossiper object returned from get_local_gossiper().	2019-03-22 08:25:16 +08:00
Asias He	71bf757b2c	gossiper: Enable features only after gossip is settled With n1, n2, n3 in the cluster: shut down n1, n2, n3; start n1, n2; then start n3. We saw features being enabled using the system table while n1 and n2 were already up and running in the cluster. INFO 2019-02-27 09:24:41,023 [shard 0] gossip - Feature check passed. Local node 127.0.0.3 features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH} INFO 2019-02-27 09:24:41,025 [shard 0] storage_service - Starting up server gossip INFO 2019-02-27 09:24:41,063 [shard 0] gossip - Node 127.0.0.1 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH} INFO 2019-02-27 09:24:41,063 [shard 0] gossip - Node 127.0.0.2 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH} The problem is that we enable the features too early in the startup process. We should enable features only after gossip is settled. Fixes #4289 Message-Id: <04f2edb25457806bd9e8450dfdcccc9f466ae832.1551406991.git.asias@scylladb.com>	2019-03-18 18:25:29 +01:00
Duarte Nunes	2718c90448	Merge 'Add canceling long-standing view update requests' from Piotr " This series allows canceling view update requests when a node is discovered DOWN. View updates are sent in the background with a long timeout (5 minutes), and if we discover that the node is unavailable, there's no point in waiting that long for the request to finish. What's more, these requests are waited for on shutdown, which may delay Scylla's shutdown by up to 5 minutes, which is bad for both users and dtests. This series makes storage_proxy a lifecycle subscriber, so it can react to membership changes. It also keeps track of all "interruptible" writes per endpoint, so once a node is detected as DOWN, an artificial timeout can be triggered for all such write requests. Fixes #3826 Fixes #3966 Fixes #4028 " * 'write_hints_for_view_updates_on_shutdown_4' of https://github.com/psarna/scylla: service: remove unused stop_hints_manager storage_proxy: add drain_on_shutdown implementation main: register storage proxy as lifecycle subscriber storage_proxy: add endpoint_lifecycle_subscriber interface storage_proxy: register view update handlers for view write type storage_proxy: add intrusive list of view write handlers storage_proxy: add view_update_write_response_handler	2019-03-08 13:34:46 -03:00
Piotr Sarna	c61d0ee8aa	main: register storage proxy as lifecycle subscriber In order to be able to act when node joins/leaves, storage proxy is registered as an endpoint lifecycle subscriber. Fixes #3826 Fixes #4028	2019-03-07 12:10:40 +01:00
Rafael Ávila de Espíndola	765d8535f1	db: Add a stop_database helper This reduces code duplication. A followup patch will add more code to stop_database. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-05 18:04:45 -08:00
Nadav Har'El	0eddf19432	main: add INFO log messages at start, initialization end, and end. Scylla currently prints a welcome message when it starts, with the Scylla version, but this is not printed to the regular log, so in some cases (e.g., Jenkins runs) we do not see it in the log. So let's add a regular INFO-level log message with the same information. Also, Scylla currently doesn't print any specific log message when it normally completes its shutdown. In some cases, users may end up wondering whether Scylla hung in the middle of the shutdown, or in fact exited normally. Refs #4238. So in this patch we add a "shutdown complete" message as the very last message in a successful shutdown. We also print Scylla's version in the shutdown message, which may be useful to see in the logs when shutting down one version of Scylla and starting a different version. Finally, we also add a log message when initialization is complete, which may also be useful to understand whether Scylla hung during initialization. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190217140659.19512-1-nyh@scylladb.com>	2019-02-22 16:52:31 +01:00
Rafael Ávila de Espíndola	9cd14f2602	Don't write to system.large_partition during shutdown The included testcase used to crash because during database::stop() we would try to update system.large_partition. There doesn't seem to be an order in which we can stop the existing services in cql_test_env that avoids this. This patch therefore adds another step when shutting down a database: first stop updating system.large_partition. This means that during shutdown any memtable flush, compaction or sstable deletion will not be reflected in system.large_partition. This is hopefully not too bad since the data in the table is TTLed. This seems to impact only tests, since main.cc calls _exit directly. Tests: unit (release,debug) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190213194851.117692-1-espindola@scylladb.com>	2019-02-15 10:49:10 +01:00
Calle Wilund	12ebcf1ec7	commitlog_replay: Use dedicated table for truncation Fixes #4083 Instead of a sharded collection in system.local, use a dedicated system table (system.truncated) to store truncation positions. This makes queries and updates simpler and lighter on query memory. The code also migrates any existing truncation positions on startup and clears the old data.	2019-02-13 09:08:12 +00:00
Piotr Sarna	0eb703dc80	all: rename view_update_from_staging_generator The new name, view_update_generator, is both more concise and correct, since we now generate from directories other than "/staging".	2019-01-15 17:31:47 +01:00
Piotr Sarna	46305861c3	init: pass view update generator to storage service Storage service needs to access view update generator in order to register staging sstables from /upload directory.	2019-01-15 17:31:36 +01:00
Piotr Sarna	09401e0e71	sstables,table: rename is_staging to requires_view_building A generalized name will be more fitting once we treat uploaded sstables as requiring view building too.	2019-01-15 16:47:01 +01:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Nadav Har'El	da090a5458	materialized views: move hints to top-level directory While we keep ordinary hints in a directory parallel to the data directory, we decided to keep the materialized view hints in a subdirectory of the data directory, named "view_pending_updates". But during boot, we expect all subdirectories of data/ to be keyspace names, and when we notice this one, we print a warning: WARN: database - Skipping undefined keyspace: view_pending_updates This spurious warning annoyed users. But moreover, we could have bigger problems if the user actually tries to create a keyspace with that name. So in this patch, we move the view hints to a separate top-level directory, which defaults to /var/lib/scylla/view_hints, but as usual can be configured. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190107142257.16342-1-nyh@scylladb.com>	2019-01-07 16:43:43 +02:00
Piotr Sarna	a0003c52cf	main,repair: add params to row level repair init Row level repair needs references to system distributed keyspace and view update generator in order to enqueue some sstables as staging.	2019-01-03 08:31:41 +01:00
Avi Kivity	6641353854	tracing: remove static class_registry Static class_registries hinder librarification by requiring linking with all object files (instead of a library from which objects are linked on demand) and reduce readability by hiding dependencies and by their horrible syntax. Hide them behind a non-static, non-template tracing backend registry. Message-Id: <20181229121000.7885-1-avi@scylladb.com>	2018-12-31 13:24:54 +00:00
Avi Kivity	c180a18dbb	Distribute distributed_loader into its own header and source files distributed_loader is a sizeable fraction of database.cc, so moving it out reduces compile time and improves readability. Message-Id: <20181230200926.15074-1-avi@scylladb.com>	2018-12-31 14:27:27 +02:00
Avi Kivity	f0a709cfc8	commitlog_replayer: don't use query_processor During normal writes, query processing happens before the commitlog, so logically replaying the commitlog shouldn't need it. And in fact the dependency on query_processor can be eliminated; all it needs is the local node's database.	2018-12-29 11:00:29 +02:00
Avi Kivity	e4233262cf	legacy_schema_migrator: initialize with database reference Provide legacy_schema_migrator with a sharded<database> so it doesn't need to use the one from query_processor. We want to replace query_processor's sharded<database> with just a local database reference in order to simplify it, and this is standing in the way.	2018-12-29 10:58:22 +02:00
Avi Kivity	c96fc1d585	Merge "Introduce row level repair" from Asias " === How the partition level repair works - The repair master decides which ranges to work on. - The repair master splits the ranges into sub-ranges, each containing around 100 partitions. - The repair master computes the checksum of the 100 partitions and asks the related peers to compute the checksum of the same 100 partitions. - If the checksums match, the data in this sub-range is synced. - If the checksums mismatch, the repair master fetches the data from all the peers and sends the merged data back to the peers. === Major problems with partition level repair - A mismatch of a single row in any of the 100 partitions causes all 100 partitions to be transferred. A single partition can be very large, let alone 100 partitions. - Checksum (find the mismatch) and streaming (fix the mismatch) read the same data twice. === Row level repair Row level checksum and synchronization: detect row level mismatches and transfer only the mismatched rows. === How the row level repair works - To solve the problem of reading data twice: read the data only once for both checksum and synchronization between nodes. We work on a small range which contains only a few megabytes of rows. We read all the rows within the small range into memory, find the mismatch, and send the mismatched rows between peers. We need to find a sync boundary among the nodes which contains only N bytes of rows. - To solve the problem of sending unnecessary data: we need to find the mismatched rows between nodes and send only the delta. This is known as the set reconciliation problem, a common problem in distributed systems. For example: Node1 has set1 = {row1, row2, row3} Node2 has set2 = {row2, row3} Node3 has set3 = {row1, row2, row4} To repair: Node1 fetches nothing from Node2 (set2 - set1), fetches row4 (set3 - set1) from Node3. 
Node1 sends row1 and row4 (set1 + set2 + set3 - set2) to Node2. Node1 sends row3 (set1 + set2 + set3 - set3) to Node3. === How to implement repair with set reconciliation - Step A: Negotiate the sync boundary class repair_sync_boundary { dht::decorated_key pk; position_in_partition position; }; Read rows from disk into row buffers until the size is larger than N bytes. Return the repair_sync_boundary of the last mutation_fragment read from disk. The smallest repair_sync_boundary of all nodes is set as the current_sync_boundary. - Step B: Get missing rows from peer nodes so that the repair master contains all the rows Request combined hashes from all nodes between last_sync_boundary and current_sync_boundary. If the combined hashes from all nodes are identical, the data is synced; go to Step A. If not, request the full hashes from peers. At this point, the repair master knows exactly which rows are missing. Request the missing rows from peer nodes. Now the local node contains all the rows. - Step C: Send missing rows to the peer nodes Since the local node also knows what the peer nodes own, it sends the missing rows to the peer nodes. === What the RPC API looks like - repair_range_start() Step A: - request_sync_boundary() Step B: - request_combined_row_hashes() - request_full_row_hashes() - request_row_diff() Step C: - send_row_diff() - repair_range_stop() === Performance evaluation We created a cluster of 3 Scylla nodes on AWS using i3.xlarge instances. We created a keyspace with a replication factor of 3 and inserted 1 billion rows into each of the 3 nodes. Each node has 241 GiB of data. We tested the 3 cases below. 1) 0% synced: one of the nodes has zero data. The other two nodes have 1 billion identical rows. Time to repair: old = 87 min new = 70 min (rebuild took 50 minutes) improvement = 19.54% 2) 100% synced: all of the 3 nodes have 1 billion identical rows. 
Time to repair: old = 43 min new = 24 min improvement = 44.18% 3) 99.9% synced: each node has 1 billion identical rows and 1 billion * 0.1% distinct rows. Time to repair: old: 211 min new: 44 min improvement: 79.15% Bytes sent on the wire for repair: old: tx = 162 GiB, rx = 90 GiB new: tx = 1.15 GiB, rx = 0.57 GiB improvement: tx = 99.29%, rx = 99.36% It is worth noting that row level repair sends and receives exactly the theoretically required number of rows. In this test case, the repair master needs to receive 2 million rows and send 4 million rows. Here are the details: Each node has 1 billion * 0.1% distinct rows, that is, 1 million rows. So the repair master receives 1 million rows from repair slave 1 and 1 million rows from repair slave 2. The repair master sends 1 million rows of its own and 1 million rows received from repair slave 1 to repair slave 2. The repair master sends 1 million rows of its own and 1 million rows received from repair slave 2 to repair slave 1. In the results, the rows on the wire were as expected. 
tx_row_nr = 1000505 + 999619 + 1001257 + 998619 (4 shards, the numbers are for each shard) = 4'000'000 rx_row_nr = 500233 + 500235 + 499559 + 499973 (4 shards, the numbers are for each shard) = 2'000'000 Fixes: #3033 Tests: dtests/repair_additional_test.py " * 'asias/row_level_repair_v7' of github.com:cloudius-systems/seastar-dev: (51 commits) repair: Enable row level repair repair: Add row_level_repair repair: Add docs for row level repair repair: Add repair_init_messaging_service_handler repair: Add repair_meta repair: Add repair_writer repair: Add repair_reader repair: Add repair_row repair: Add fragment_hasher repair: Add decorated_key_with_hash repair: Add get_random_seed repair: Add get_common_diff_detect_algorithm repair: Add shard_config repair: Add suportted_diff_detect_algorithms repair: Add repair_stats to repair_info repair: Introduce repair_stats flat_mutation_reader: Add make_generating_reader storage_service: Introduce ROW_LEVEL_REPAIR feature messaging_service: Add RPC verbs for row level repair repair: Export the repair logger ...	2018-12-25 13:13:00 +02:00
Duarte Nunes	6df32bfb0c	main: Start and stop the view_update_backlog_broker Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	776fdd4d1a	service/storage_proxy: Expose local view update backlog The local view update backlog is the max backlog out of the relative memory backlog size and the relative hints backlog size. We leverage the db::view::node_update_backlog class so we can send the max backlog out of the node's shards. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Asias He	b9e0db801d	repair: Enable row level repair Finally, enable the new row level repair if the cluster supports it. If not, fall back to the old partition level repair. Fixes #3033	2018-12-12 16:49:01 +08:00
Avi Kivity	89be47e291	batchlog_manager: remove dependency on db::config Extract configuration into a new struct batchlog_manager_config and have the callers populate it using db::config. This reduces dependencies on global objects.	2018-12-09 20:11:38 +02:00
Tomasz Grabiec	6012a63660	Merge "Fix window during init where waiting for a feature can be ignored" from Avi storage_service keeps a bunch of "feature" variables, indicating cluster-wide supported features, and has the ability to wait until the entire cluster supports a given feature. The propagation of features depends on gossip, but gossip is initialized after storage_service, so the current code late-initializes the features. However, that means that whoever waits on a feature between storage_service initialization and gossip initialization loses their wait entry. In #3952, we have proof that this in fact happens. Fix this by removing the circular dependency. We now store features in a new service, feature_service, that is started before both gossip and storage_service. Gossip updates feature_service while storage_service reads from it. Fixes #3953. * https://github.com/avikivity/3953/v4.1: storage_service: deinline enable_all_features() gossiper: keep features registered tests/gossip: switch to seastar::thread storage_service: deinline init/deinit functions gossiper: split feature storage into a new feature_service gossiper: maybe enable features after start_gossiping() storage_service: fix gap when feature::when_enabled() doesn't work	2018-12-06 15:42:26 +01:00
Avi Kivity	4e553b692e	gossiper: split feature storage into a new feature_service Feature lifetime is tied to storage_service lifetime, but features are now managed by gossip. To avoid a circular dependency, add a new feature_service service to manage feature lifetime. To work around the problem, the current code re-initializes features after gossip is initialized. This patch does not fix this problem; it only makes it possible to solve it by untying features from gossip.	2018-12-06 16:31:04 +02:00
Glauber Costa	fee4d2eb9b	compaction_manager: delay initialization of the compaction manager. If the compaction manager is started, compactions may start (regardless of whether or not we trigger them). The problem is that they start while we are flushing the commitlog, and the initialization procedure waits for the commitlog to be fully flushed and the resulting memtables flushed before we move on. Because there are no incoming writes, the shares of memtable flushes decrease as memory usage decreases, and that can cause the startup procedure to take a long time. We have recently started to bump the shares manually for manual flushes. While that guarantees that we will not drive the shares to zero, I will make the argument that we can do better by making sure that those things are, at this point, running alone: user experience is affected by startup times, and the bump we give to user-triggered operations will only do so much. Even if we increase the shares a lot, flushes will still be fighting for resources with compactions, and startup will take longer than it could. By making sure that flushes are, at this point, running alone, we improve the user experience by making startup as fast as it can be. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-12-04 13:48:42 -05:00
Piotr Sarna	6ab8235369	main: fix deinitialization order for view update generator View update generator should be stopped only after drain_on_shutdown() is performed on storage service. Message-Id: <4d2bda4c73422a2ebf46d6dcd06c95d960839889.1543230849.git.sarna@scylladb.com>	2018-11-26 11:21:37 +00:00
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
Piotr Sarna	16c042039c	main: add registering staging sstables read from disk Staging sstables read from disk are registered to the view update generator right after initializing non-system keyspaces. Fixes #3275	2018-11-13 15:04:43 +01:00
Piotr Sarna	dc74887ff3	streaming: add system distributed keyspace ref to streaming Streaming code needs system distributed keyspace to check if streamed sstables should be staging, so a proper reference is added.	2018-11-13 15:01:53 +01:00
Piotr Sarna	7ef5e1b685	streaming: add view update generator reference to streaming Streaming code may need view update generator service to generate and send view updates, so a proper reference is added.	2018-11-13 15:01:53 +01:00
Piotr Sarna	eb0c507a45	main: add generating missed mv updates from staging sstables If any sstables are found in the staging directory, it means that they missed generating view updates, so it's performed now.	2018-11-13 15:01:53 +01:00
Avi Kivity	a71ab365e3	toplevel: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Vlad Zolotarov	aca0882a3f	hinted handoff: enable storing hints before starting messaging_service When messaging_service is started we may immediately receive a mutation from another node (e.g. in the MV update context). If hinted handoff is not ready to store hints at that point we may fail some of MV updates. We are going to resolve this by start()ing hints::managers before we start messaging_service and blocking hints replaying until all relevant objects are initialized. Refs #3828 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-18 16:49:58 -04:00
Avi Kivity	e7ae4beef0	main: run prometheus and API servers under streaming group Both the Prometheus and the API servers are used for maintenance operations, similarly to streaming. Run them under the streaming scheduling group to prevent them from impacting normal operations, and rename the streaming scheduling group to reflect the more generic role. This helps to prevent spikes from Prometheus or API requests from interfering with the normal workload. Using an existing group is preferable to creating a new group because in the worst case, all the non-main-workload groups compete with the main workload. Consolidating them allows us to give them significant shares in total without increasing competition in the worst case. The group's label is unchanged to preserve compatibility with dashboards. A nice side effect is that repair, which is initiated by API calls, gets placed into the maintenance group naturally. Compaction tasks which are run by compaction manager are not changed. Message-Id: <20180714160723.23655-1-avi@scylladb.com>	2018-07-30 15:07:33 +01:00
Avi Kivity	8c993e0728	messaging: tag RPC services with scheduling groups Assign a scheduling_group for each RPC service. Assignment is done by connection (get_rpc_client_idx()): all verbs on the same connection are assigned the same group. While this may seem arbitrary, it avoids priority inversion; if two verbs on the same connection have different scheduling groups, the verb with the low shares may cause a backlog and stall the connection, including following requests from verbs that ought to have higher shares. The scheduling_group parameters are encapsulated in different classes as they are passed around to avoid adding dependencies. Message-Id: <20180708140433.6426-1-avi@scylladb.com>	2018-07-13 13:57:08 +02:00
Vlad Zolotarov	c65a110839	main: remove the "experimental" tag from the hinted handoff feature Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-07-06 19:19:40 -04:00
Vlad Zolotarov	83ba6d84a1	db::hints::manager: implement rebalance() method Rebalance hints segments that need to be sent among all present shards. Ensure that after rebalancing the difference between the number of segments of any two shards is not greater than 1. Try to minimize the amount of "file rename" operations in order to achieve the needed result. Note: "Resharding" is a particular case of rebalancing. Tests: dtest: hintedhandoff_additional_test.py:TestHintedHandoff.hintedhandoff_rebalance_test Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-07-06 19:18:46 -04:00
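
The set-reconciliation example in the row level repair merge message (c96fc1d585) can be checked with a short sketch. It models each node's rows as a Python set and follows the two steps the message describes: the repair master (Node1) first fetches from each peer the rows that peer has and the master lacks (set_i - set1), then sends each peer the union of all rows minus what that peer already holds. This is a minimal illustration only, assuming whole rows are compared directly; `plan_repair` is a hypothetical helper, not Scylla's repair API, which exchanges row hashes between sync boundaries over the RPC verbs listed in the message.

```python
def plan_repair(master_rows, peer_rows):
    """Plan set-reconciliation transfers for a repair master (hypothetical helper).

    master_rows: set of rows held by the repair master.
    peer_rows:   dict mapping peer name -> set of rows held by that peer.
    Returns (fetch, send):
      fetch[p] = rows the master pulls from peer p (peer minus master)
      send[p]  = rows the master pushes to peer p (union minus peer)
    """
    # Step B: pull every row the master is missing from each peer.
    fetch = {p: rows - master_rows for p, rows in peer_rows.items()}
    # After the pulls, the master holds the union of all nodes' rows.
    union = set(master_rows).union(*peer_rows.values())
    # Step C: push to each peer whatever it does not already hold.
    send = {p: union - rows for p, rows in peer_rows.items()}
    return fetch, send


# The three-node example from the merge message.
set1 = {"row1", "row2", "row3"}  # Node1, the repair master
peers = {
    "Node2": {"row2", "row3"},
    "Node3": {"row1", "row2", "row4"},
}
fetch, send = plan_repair(set1, peers)
for p in peers:
    print(p, "fetch:", sorted(fetch[p]), "send:", sorted(send[p]))
```

With the message's sets, this yields the same plan the message states: nothing is fetched from Node2, row4 is fetched from Node3, row1 and row4 are sent to Node2, and row3 is sent to Node3.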


776 Commits