Currently any unhandled error during deferred shutdown
is rethrown in a noexcept context (in ~deferred_action),
generating a core dump.
The core dump is not helpful if the cause of the
error is "environmental", i.e. in the system, rather
than in scylla itself.
This change detects several such errors and calls
_Exit(255) to exit the process early, without leaving
a coredump behind. Otherwise, call abort() explicitly,
rather than letting terminate() be called implicitly
by the destructor exception handling code.
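In sketch form, the new destructor behavior looks like this
(is_environmental_error() is a hypothetical stand-in for the actual
error classification this change adds):

    // Sketch only; is_environmental_error() is a hypothetical stand-in
    // for the actual error classification added by this change.
    ~deferred_action() noexcept {
        try {
            _action();
        } catch (...) {
            if (is_environmental_error(std::current_exception())) {
                // Environmental failure (e.g. disk I/O error): a core
                // dump would not help, so leave quietly.
                std::_Exit(255);
            }
            // A scylla bug: abort() explicitly rather than letting the
            // noexcept machinery call std::terminate() implicitly.
            abort();
        }
    }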
Fixes #9573
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220227101054.1294368-1-bhalevy@scylladb.com>
Now that the connection_notifier is all gone, only the client_data bits are left.
To keep things consistent -- rename the files.
Also, while at it, brush up the header dependencies and remove the
not-really-used constexprs for client states.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This includes most of the connection_notifier stuff, as well as
the auxiliary code from system_keyspace.cc and a bunch of the
update calls made when the client state changes.
Other than less code and fewer disk updates on the client connection
paths, this removes one usage of the nasty global qctx thing.
Since system.clients goes away, rename system.clients_v here
too, so that the table is always present out there.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Since the ME sstable format includes the originating host id in its stats
metadata, the local host id needs to be made available for writing and
validation.
Both the Scylla server (where the local host id comes from the `system.local`
table) and unit tests (where it is fabricated) must be accommodated.
Regardless of how the host id is obtained, it is stored in the db
config instance and accessed through `sstables_manager`.
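Roughly, the plumbing looks like this (all names here are illustrative,
not the actual accessors):

    // Hypothetical sketch: the config caches the host id, the manager
    // exposes it to sstable writers and validators.
    struct db_config {
        utils::UUID local_host_id;  // from system.local, or fabricated in tests
    };

    class sstables_manager {
        const db_config& _cfg;
    public:
        explicit sstables_manager(const db_config& cfg) : _cfg(cfg) {}
        utils::UUID get_local_host_id() const { return _cfg.local_host_id; }
    };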
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
We want the host id to be cached before any sstable is written, so do it
right after system_keyspace::minimal_setup().
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
main() has some logic to select the main function it will delegate to
based on argv[1]. The intent is that when the value of argv[1] suggests
that the user did not specify a specific app to run, we default to
"server" (scylla proper).
This logic currently breaks down when there are no arguments at all: in
this case the following error is printed and scylla refuses to start:
error: unrecognized first argument: expected it to be "server", a regular command-line argument or a valid tool name (see `scylla --list-tools`), but got
Fix this by checking for empty argv[1] and defaulting to "server" in
that case.
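A minimal sketch of the resulting dispatch (error text shortened;
find_tool() is a hypothetical stand-in for the actual tool lookup):

    // Sketch of main()'s app selection; find_tool() is hypothetical.
    int main(int argc, char** argv) {
        std::string_view first = argc > 1 ? argv[1] : "";
        if (first.empty() || first == "server" || first.front() == '-') {
            // No app named (or an option given): default to the server.
            return scylla_main(argc, argv);
        }
        if (auto tool_main = find_tool(first)) {
            return tool_main(argc - 1, argv + 1);
        }
        std::cerr << "error: unrecognized first argument: ...\n";
        return 1;
    }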
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220210092125.293682-1-bdenes@scylladb.com>
The new service is responsible for:
* spreading forward_request execution across multiple nodes in the cluster
* collecting forward_request execution results and merging them
The `forward_service::dispatch` method takes a forward_request as an
argument and forwards its execution to a group of other nodes (using the rpc
verb added in previous commits). Each node in the group chosen by the
dispatch method is provided with a forward_request which is no different
from the original argument except for changed partition ranges. They are
changed so that the vnodes contained in them are owned by the recipient node.
Executing a forward_request is realized in the `forward_service::execute`
method, which is registered to be called on FORWARD_REQUEST verb receipt.
The process of executing a forward_request consists of mocking a few
non-serializable objects (such as `cql3::selection`) in order to create a
`service::pager::query_pagers::pager` and a `cql3::selection::result_set_builder`.
After the pager and result_set_builder are created, the execution process
resembles what might be seen in select_statement's execution path.
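In rough pseudocode the dispatch loop looks like this (helper names are
illustrative, and the real implementation contacts the nodes concurrently):

    // Illustrative sketch of forward_service::dispatch.
    future<query::forward_result> dispatch(query::forward_request req) {
        // Group the request's partition ranges by the node owning them.
        auto per_node = group_ranges_by_owner(req.pr);  // hypothetical helper
        query::forward_result merged;
        for (auto& [node, ranges] : per_node) {
            auto sub_req = req;
            sub_req.pr = std::move(ranges);  // only vnodes owned by `node`
            // FORWARD_REQUEST rpc verb; the remote side runs execute().
            merged.merge(co_await send_forward_request(node, sub_req));
        }
        co_return merged;
    }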
Alternator is a coordinator-side service and so should not access
the replica module. In this series all but one of the uses of the replica
module are replaced with data_dictionary.
One case remains - accessing the replication map which is not
available (and should not be available) via the data dictionary.
The data_dictionary module is expanded with missing accessors.
Closes #9945
* github.com:scylladb/scylla:
alternator: switch to data_dictionary for table listing purposes
data_dictionary: add get_tables()
data_dictionary: introduce keyspace::is_internal()
As a coordinator-side service, alternator shouldn't touch the
replica module, so it is migrated here to data_dictionary.
One use case still remains that uses replica::keyspace - accessing
the replication map. This really isn't a replica-side thing, but it's
also not logically part of the data dictionary, so it's left using
replica::keyspace (using the data_dictionary::database::real_database()
escape hatch). Figuring out how to expose the replication map to
coordinator-side services is left for later.
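The escape hatch use looks roughly like this (a sketch; the surrounding
names are illustrative):

    // Sketch: stay on the data_dictionary facade everywhere, dropping
    // down to replica::database only for the replication map.
    data_dictionary::database db = get_data_dictionary();   // illustrative
    replica::database& real = db.real_database();            // escape hatch
    auto& ks = real.find_keyspace(keyspace_name);
    auto& rs = ks.get_replication_strategy();  // not exposed by the facade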
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes were applied mechanically with a script, except to
licenses/README.md.
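For example, a dual-licensed file's header now shrinks to something like
(copyright line illustrative):

    /*
     * Copyright (C) 2022-present ScyllaDB
     *
     * SPDX-License-Identifier: AGPL-3.0-or-later and Apache-2.0
     */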
Closes #9937
distributed_loader is a replica-side thing, so it belongs in the
replica module ("distributed" refers to its ability to load
sstables into their correct shards). So move it to the replica
module.
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.
References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.
scylla-gdb.py is adjusted to look for both the new and old names.
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.
As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
Be silent when argv[1] starts with "-"; it is probably an option to
scylla (and "server" is missing from the command line).
Print an error and stop when argv[1] doesn't start with "-" and thus the
user presumably meant to start either the server or a tool and mis-typed
it. Instead of trying to guess what they meant, stop with a clear error
message.
And make it the central place listing the available tools (to minimize the
places to update when adding a new one). The description is edited to
point to this command instead of listing the tools directly.
Remove "compatible with Apache Cassandra", scylla is much more than that
already.
Rephrase the part describing the included tools such that it is clear
that the scylla server is the main thing and the tools are the "extra"
additions. Also use the term "tool" instead of the term "app".
The gc_grace_seconds mechanism is a very fragile and broken design inherited
from Cassandra. Deleted data can be resurrected if a cluster-wide repair is
not performed within gc_grace_seconds. This design pushes the job of keeping
the database consistent onto the user. In practice, it is very hard to
guarantee repair is performed within gc_grace_seconds all the time. For
example, the repair workload has the lowest priority in the system and can
be slowed down by higher-priority workloads, so there is no
guarantee when a repair will finish. A gc_grace_seconds value that
used to work might not work after the data volume grows in a cluster. Users
might also want to avoid running repair during a specific period where
latency is the top priority for their business.
To solve this problem, an automatic mechanism to protect data
resurrection is proposed and implemented. The main idea is to remove the
tombstone only after the range that covers the tombstone is repaired.
In this patch, a new table option tombstone_gc is added. The option is
used to configure tombstone gc mode. For example:
1) GC a tombstone after gc_grace_seconds
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ;
This is the default mode. If no tombstone_gc option is specified by the
user, the old gc_grace_seconds-based gc is used.
2) Never GC a tombstone
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'};
3) GC a tombstone immediately
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'};
4) GC a tombstone after repair
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'};
In addition to the 'mode' option, another option, 'propagation_delay_in_seconds',
is added. It defines the maximum time a write could possibly be delayed before it
eventually arrives at a node.
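Conceptually, the purge decision becomes something like the following
sketch (names illustrative; not the actual code):

    // Sketch of the per-mode tombstone purge check.
    bool can_purge(gc_clock::time_point deletion_time,
                   const dht::token_range& range,
                   const tombstone_gc_options& opts,
                   gc_clock::duration gc_grace_seconds) {
        switch (opts.mode) {
        case gc_mode::timeout:
            return gc_clock::now() >= deletion_time + gc_grace_seconds;
        case gc_mode::disabled:
            return false;
        case gc_mode::immediate:
            return true;
        case gc_mode::repair:
            // Purge only if the range covering the tombstone was repaired
            // after the tombstone was written, allowing for writes delayed
            // by up to propagation_delay_in_seconds.
            return min_repair_time(range)          // hypothetical lookup
                   >= deletion_time + opts.propagation_delay;
        }
    }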
A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc
option can only be used after the whole cluster supports the new
feature. A mixed cluster works with no problem.
Tests: compaction_test.py, ninja test
Fixes #3560
[avi: resolve conflicts vs data_dictionary]
"
A big problem with scylla tool executables is that they include the
entire scylla codebase and thus they are just as big as the scylla
executable itself, making them impractical to deploy on production
machines. We could try to combat this by selectively including only the
actually needed dependencies, but even ignoring the huge churn of
sorting out our dependency hell (which we should do at some point anyway),
some tools may genuinely depend on most of the scylla codebase.
A better solution is to host the tool executables in the scylla
executable itself, switching between the actual main function to run
some way. The tools themselves don't contain a lot of code so
this won't cause any considerable bloat in the size of the scylla
executable itself.
This series does exactly this: it folds all the tool executables into the
scylla one, with main() switching between the actual main it will
delegate to based on the argv[1] command line argument. If this is a known
tool name, the respective tool's main will be invoked.
If it is "server", missing or unrecognized, the scylla main is invoked.
Originally this series used argv[0] as the means to switch between the
mains to run. This approach was abandoned in favor of the approach mentioned
above for the following reasons:
* No launcher script, hard link, soft link or similar games are needed to
launch a specific tool.
* No packaging needed, all tools are automatically deployed.
* Explicit tool selection, no surprises after renaming scylla to
something else.
* Tools are discoverable via scylla's description.
* Follows the trend set by modern command line multi-command or multi-app
programs, like git.
Fixes: #7801
Tests: unit(dev)
"
* 'tools-in-scylla-exec-v5' of https://github.com/denesb/scylla:
main,tools,configure.py: fold tools into scylla exec
tools: prepare for inclusion in scylla's main
main: add skeleton switching code on argv[1]
main: extract scylla specific code into scylla_main()
Move saving features to `system.local#supported_features`
to the point after passing all remote feature checks in
the gossiper, right before joining the ring.
This makes the `system.local#supported_features` column store the
advertised feature set. Leave a comment in the definition of
the `system.local` schema to reflect that.
Since the column value is not actually used anywhere for now,
it shouldn't affect any tests or alter the existing behavior.
Later, we can optimize the gossip communication between nodes
in the cluster, removing the feature check altogether
in some cases (since the column value should now be monotonic).
Tests: unit(dev)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
The infrastructure is now in place. Remove the proxy main of the tools,
and add appropriate `else if` statements to the executable switch in
main.cc. Also remove the tool applications from the `apps` list and add
their respective sources as dependencies to the main scylla executable.
With this, we now have all tool executables living inside the scylla
main one.
To prepare for the scylla executable hosting more than one app,
switch between them using argv[1]. This is consistent with how most
modern multi-app/multi-command programs work, one prominent example
being git.
For now only one app is present: scylla itself, called "server". If
argv[1] is missing or unrecognized, this is what is used as the default
for backward-compatibility.
The scylla app also gets a description, which explains that scylla hosts
multiple apps and lists all the available ones.
main() now contains only generic setup and teardown code and it
delegates to scylla_main().
In the next patches we want to wire in tool executables into the scylla
one. This will require selecting the main to run at runtime.
scylla_main() will be just one of those (the default).
Stop using database (and including database.hh) for schema related
purposes and use data_dictionary instead.
data_dictionary::database::real_database() is called from several
places, for these reasons:
- calling yet-to-be-converted code
- callers with a legitimate need to access data (e.g. system_keyspace)
but with the ::database accessor removed from query_processor.
We'll need to find another way to supply system_keyspace with
data access.
- to gain access to the wasm engine for testing whether user-defined
functions compile. We'll have to find another way to
do this as well.
The change is a straightforward replacement. One case in
modification_statement had to change a capture, but everything else
was just a search-and-replace.
Some files that lost "database.hh" gained "mutation.hh", which they
previously had access to through "database.hh".
* seastar f8a038a0a2...8d15e8e67a (21):
> core/program_options: preserve defaultness of CLI arguments
> log: Silence logger when logging
> Include the core/loop.hh header inside when_all.hh header
> http: Fix deprecated wrappers
> foreign_ptr: Add concept
> util: file: add read_entire_file
> short_streams: move to util
> Revert "Merge: file: util: add read_entire_file utilities"
> foreign_ptr: declare destroy as a static method
> Merge: file: util: add read_entire_file utilities
> Merge "output_stream: handle close failure" from Benny
> net: bring local_address() to seastar::connected_socket.
> Merge "Allow programatically configuring seastar" from Botond
> Merge 'core: clean up memory metric definitions' from John Spray
> Add PopOS to debian list in install-dependencies.sh
> Merge "make shared_mutex functions exception safe and noexcept" from Benny
> on_internal_error: set_abort_on_internal_error: return current state
> Implementation of iterator-range version of when_any
> net: mark functions returning ethernet_address noexcept
> net: ethernet_address: mark functions noexcept
> shared_mutex: mark wake and unlock methods noexcept
Contains patch from Botond Dénes <bdenes@scylladb.com>:
db/config: configure logging based on app_template::seastar_options
Scylla has its own config file which supports configuring aspects of
logging, in addition to the built-in CLI logging options. When applying
this configuration, the CLI provided option values have priority over
the ones coming from the option file. To implement this scylla currently
reads CLI options belonging to seastar from the boost program options
variables map. The internal representation of CLI options, however, does not
constitute an API of seastar and is thus subject to change (even if
unlikely). This patch moves away from this practice and uses the shiny
new C++ API, `app_template::seastar_options`, to obtain the current
logging options.
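In sketch form (member names are assumptions from memory, not a verified
API):

    // Sketch: construct the app from the typed options struct, then read
    // the parsed logging options from it instead of from the boost
    // variables map.
    seastar::app_template::seastar_options opts;
    opts.name = "scylla";
    seastar::app_template app(std::move(opts));
    return app.run(argc, argv, [&] {
        // After parsing, the CLI logging settings live in a typed struct
        // (log_opts); scylla merges them with scylla.yaml's logging
        // configuration, with CLI values taking priority.
        return seastar::make_ready_future<>();
    });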
Currently the storage service acts as glue between the database schema value
and the migration manager's "passive_announce" call. This interposing is
not required; the migration manager can do all the management itself, and
the linkage can be done in main.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is to start schema pulls upon the on_join, on_alive and on_change events
in the next patch. The migration manager already has a gossiper reference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In the very recent commit 3c0e703 fixing issue #8757, we changed the
default prometheus_address setting in scylla.yaml to "localhost", to
match the default listen_address in the same file. We explained in that
commit how this helped developers who use an unchanged scylla.yaml, and
how it didn't hurt pre-existing users who already had their own scylla.yaml.
However, it was quickly noted by Tzach and Amnon that there is one use case
that was hurt by that fix:
Our existing documentation, such as the installation guide
https://www.scylladb.com/download/?platform=centos asks the user to take
our initial scylla.yaml, and modify listen_address, rpc_address, seeds,
and cluster_name - and that's it. That document - and others - don't
tell the user to also override prometheus_address, so users will likely
forget to do so - and monitoring will not work for them.
So this patch includes a different solution to #8757.
What it does is:
1. The setting of prometheus_address in scylla.yaml is commented out.
2. In config.cc, prometheus_address defaults to empty.
3. In main.cc, if prometheus_address is empty (i.e., was not explicitly
set by the user), the value of listen_address is used instead.
In other words, the idea is that prometheus_address, if not explicitly set
by the user, should default to listen_address - which is the address used
to listen to the internal Scylla inter-node protocol.
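The main.cc logic then amounts to this simplified sketch (accessor names
illustrative):

    // Sketch: fall back to listen_address when prometheus_address was
    // not explicitly set (its default in config.cc is now empty).
    auto prom_addr = cfg->prometheus_address();
    if (prom_addr.empty()) {
        prom_addr = cfg->listen_address();
    }
    // prom_addr is then used to bind the prometheus server.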
Because the documentation already tells the user to set listen_address
and to not leave it set to localhost, setting it will also open up
prometheus, thereby solving #9701. Meanwhile, developers who leave the
default listen_address=localhost will also get prometheus_address=localhost,
so the original #8757 is solved as well. Finally, for users who had an old
scylla.yaml where prometheus_address was explicitly set to something,
this setting will continue to be used. This was also a requirement of
issue #8757.
Fixes #9701.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211129155201.1000893-1-nyh@scylladb.com>
"
To ensure consistency of schema and topology changes,
Scylla needs a linearizable storage for this data
available at every member of the database cluster.
The series introduces such storage as a service,
available to all Scylla subsystems. Using this service, any other
internal service such as gossip or migrations (schema) could
persist changes to cluster metadata and expect this to be done in
a consistent, linearizable way.
The series uses the built-in Raft library to implement a
dedicated Raft group, running on shard 0, which includes all
members of the cluster (group 0); adds hooks to topology change
events, such as adding or removing nodes of the cluster, to update
group 0 membership; and ensures the group is started when the
server boots.
The state machine for the group, i.e. the actual storage
for cluster-wide information, still remains a stub. Extending
it to actually persist changes of schema or token ring
is the subject of a subsequent series.
Another Raft related service was implemented earlier: Raft Group
Registry. The purpose of the registry is to allow Scylla to have an
arbitrary number of groups, each with its own subset of cluster
members and a relevant state machine, sharing a common transport.
Group 0 is one (the first) group among many.
"
* 'raft-group-0-v12' of github.com:scylladb/scylla-dev:
raft: (server) improve tracing
raft: (metrics) fix spelling of waiters_awaken
raft: make forwarding optional
raft: (service) manage Raft configuration during topology changes
raft: (service) break a dependency loop
raft: (discovery) introduce leader discovery state machine
system_keyspace: mark scylla_local table as always-sync commitlog
system_keyspace: persistence for Raft Group 0 id and Raft Server Id
raft: add a test case for adding entries on follower
raft: (server) allow adding entries/modify config on a follower
raft: (test) replace virtual with override in derived class
raft: (server) fix a typo in exception message
raft: (server) implement id() helper
raft: (server) remove apply_dummy_entry()
raft: (test) fix missing initialization in generator.hh
In this patch series we add an implementation of an
expiration service to Alternator, which periodically scans the data in
the table, looking for expired items and deleting them.
We also continue to improve the TTL test suite to cover additional
corner cases discovered during the development of the code.
This implementation is good enough to make all existing tests but one,
plus a few new ones, pass, but is still a very partial and inefficient
implementation littered with FIXMEs throughout the code. Among other
things, this initial implementation doesn't do anything reasonable about pacing of
the scan or about multiple tables; it scans entire items instead of only the
needed parts, and because each shard "owns" a different subset of the
token ranges, if a node goes down, partitions which it "owns" will not
get expired.
The current tests cannot expose these problems, so we will need to develop
additional tests for them.
Because this implementation is very partial, the Alternator TTL continues
to remain "experimental", cannot be used without explicitly enabling this
experimental feature, and must not be used for any important deployment.
Refs #5060 but doesn't close the issue (let's not close it until we have a
reasonably complete implementation - not this partial one).
Closes #9624
* github.com:scylladb/scylla:
alternator: fix TTL expiration scanner's handling of floating point
test/alternator: add TTL test for more data
test/alternator: remove "xfail" tag from passing tests in test_ttl.py
test/alternator: make test_ttl.py tests fast on Alternator
alternator: initial implmentation of TTL expiration service
alternator: add another unwrap_number() variant
alternator: add find_tag() function
test/alternator: test another corner case of TTL setting
test/alternator: test TTL expiration for table with sort key
test/alternator: improve basic test for TTL expiration
test/alternator: extract is_aws() function
"
This set covers simple but diverse cases:
- cache hitrate calculator
- repair
- system keyspace (virtual table)
- dht code
- transport event notifier
All the places just require straightforward argument passing.
And a fix in transport -- the event notifier needs a backref
to the owning server.
Remaining after this set are the snitch<->gossiper interaction
and the cache hitrate app state update from the table code.
tests: unit(dev)
"
* 'br-unglobal-gossiper-cont' of https://github.com/xemul/scylla:
transport: Use server gossiper in event notifier
transport: Keep backreference from event_notifier
transport: Keep gossiper on server
dht: Pass gossiper to range_streamer::add_ranges
dht: Pass gossiper argument to bootstrap
system_keyspace: Keep gossiper on cluster_status_table
code: Carry gossiper down to virtual tables creation
repair: Use local gossiper reference
cache_hitrate_calculator: Keep reference on gossiper
Re-enable previously persisted enabled features on node
startup. The features list to be enabled is read from
`system.local#enabled_features`.
In case an unknown feature is encountered, the node
fails to boot with an exception, because that means
the node is doing a prohibited downgrade procedure.
Features should be enabled before commitlog starts replaying
since some features affect storage (for example, when
determining used sstable format).
This patch implements a part of the solution proposed by Tomek
in https://github.com/scylladb/scylla/issues/4458.
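The boot-time check amounts to the following sketch (helper names are
illustrative):

    // Sketch: re-enable persisted features before commitlog replay,
    // refusing to boot on an unknown (i.e. downgraded-away) feature.
    auto persisted = co_await db::system_keyspace::load_enabled_features();
    for (const sstring& f : persisted) {
        if (!feature_service.known_feature_set().contains(f)) {
            throw std::runtime_error(format(
                "Unknown feature '{}' in system.local#enabled_features; "
                "downgrading is not supported", f));
        }
        feature_service.enable(f);
    }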
Tests: unit(dev)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
In this patch we add an incomplete implementation of an expiration
service to Alternator, which periodically scans the data in the table,
looking for expired items and deleting them.
This implementation involves a new "expiration service" which runs a
background scan in each shard. Each shard "owns" a subset of the token
ranges - the intersection of the node's primary ranges with this shard's
token ranges - and scans those ranges over and over, deleting any items
which are found expired.
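Per shard, the service's main loop is conceptually this (a sketch with
hypothetical helpers, not the actual code):

    // Conceptual sketch of the per-shard expiration scan loop.
    future<> expiration_service::run() {
        while (!_stopping) {
            // This shard scans the intersection of the node's primary
            // ranges with the shard's own token ranges.
            for (const auto& range : owned_ranges_for_this_shard()) {
                co_await scan_table_range(range, [this] (item& i) -> future<> {
                    if (is_expired(i)) {
                        co_await delete_item(i);  // expired: delete it
                    }
                });
            }
            co_await seastar::sleep(_scan_interval);  // FIXME: real pacing
        }
    }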
This implementation is good enough to make all existing tests but one
pass, but is still a partial and inefficient implementation littered with
FIXMEs throughout the code. Among other things, this implementation
doesn't do anything reasonable about pacing of the scan or about multiple
tables; it scans entire items instead of only the needed parts, and
if a node goes down, the part of the token range which it "owns" will not
be scanned for expiration (we need living nodes to take over the
background expiration work for dead nodes).
The current tests cannot expose these problems, so we will need to develop
additional tests for them.
Because this implementation is very partial, the Alternator TTL continues
to remain "experimental", cannot be used without explicitly enabling this
experimental feature, and must not be used for any important deployment.
The new TTL expiration service will only run (at the moment) in the
background if the Alternator TTL experimental feature is enabled
and if Alternator is enabled as well.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Operations of adding a node to or removing it from the Raft configuration
are made idempotent: they do nothing if already done, and
they are safe to resume after a failure.
However, since topology changes are not transactional, if a
bootstrap or removal procedure fails midway, Raft group 0
configuration may go out of sync with topology state as seen by
gossip.
In future we must change gossip to avoid making any persistent
changes to the cluster: all changes to persistent topology state
will be done exclusively through Raft Group 0.
Specifically, instead of persisting the tokens by advertising
them through gossip, the bootstrap will commit a change to a system
table using Raft group 0. nodetool will switch from looking at
gossip-managed tables to consulting with Raft Group 0 configuration
or Raft-managed tables.
Once this transformation is done, naturally, adding a node to Raft
configuration (perhaps as a non-voting member at first) will become the
first persistent change to ring state applied when a node joins;
removing a node from the Raft Group 0 configuration will become the last
action when removing a node.
Until this is done, do our best to avoid a cluster state where
a removed node, or a node whose addition failed, is stuck in the Raft
configuration but is no longer present in gossip-managed
system tables. In other words, keep gossip the primary source of
truth. For this purpose, carefully choose the timing when we
join and leave Raft group 0:
Join Raft group 0 only after we've advertised our tokens, so the
cluster is aware of this node and it's visible in nodetool status,
but before the node state jumps to "normal", i.e. before it accepts
queries. Since the operation is idempotent, invoke it on each
restart.
Remove the node from Group 0 *before* its tokens are removed
from gossip-managed system tables. This guarantees
that if removal from Raft group 0 fails for whatever reason,
the node stays in the ring, so nodetool removenode and
friends can be re-tried.
Add tracing.
The gossiper is needed by the transport::event_notifier. There's
already a gossiper reference on the transport controller, but it's
a local reference, because the controller doesn't need more. This
patch upgrades the controller reference to sharded<> and propagates
it further up to the server.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
One of the tables needs the gossiper and uses the global one. This patch
prepares the fix by patching the main -> register_virtual_tables
stack with the gossiper reference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The calculator needs to update its app-state on the gossiper. Keeping
a reference is safe -- the gossiper starts early, the calculator at
the very end, and they stop in reverse order.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Today's idea of API reg/unreg is to carry the target service via
lambda captures down to the route handlers and unregister those
handlers before the target is about to stop.
This patch makes it so for the streaming API.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The manager is drained() on drain/decommission/isolate. Since it is
now storage_service that orchestrates all of the above, it needs
an explicit reference to the target.
The streaming manager registers itself in the gossiper, so it needs an
explicit dependency reference. It also forgets to unregister itself, so fix that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In the case of streaming this mostly means dropping the global
init/uninit calls and replacing them with a sharded<stream_manager>
instance. It's still global, but that's being fixed atm.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
As a prerequisite to globalizing the batchlog_manager,
allow setting a global pointer to it and instantiate
the sharded<db::batchlog_manager> on the main/cql_test_env
stack.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To be used for creating the effective_replication_map
when token_metadata changes, and for updating all
keyspaces with it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>