An unavailable exception means that the operation was not started and can be
retried safely. If LWT fails in the learn stage, though, it most
certainly means that its effect will already be observable. The patch
returns a timeout exception instead, which signals uncertainty.
Fixes #7258
Message-Id: <20201001130724.GA2283830@scylladb.com>
This is the beginning of the raft protocol implementation. It only supports
log replication and the voter state machine. The main difference between
this one and the RFC (besides having the voter state machine) is that the
approach taken here is to implement raft as a deterministic state
machine and move all the IO processing away from the main logic.
To do that, some changes to the RPC interface were required: all verbs are now
one-way, meaning that sending a request does not wait for a reply;
the reply arrives as a separate message (or not at all, it is safe to
drop packets).
* scylla-dev/raft-v4:
raft: add a short readme file
raft: compile raft tests
raft: add raft tests
raft: Implement log replication and leader election
raft: Introduce raft interface header
Compilation is not enabled by default as it requires coroutine support
and may require a special compiler (until the distributed one fixes all the
bugs related to coroutines). To enable compilation of the raft tests, a new
configure.py option is added (--build-raft).
Add tests for the currently implemented raft features. replication_test
tests replication functionality with various initial log configurations.
raft_fsm_test tests voting state machine functionality.
This patch introduces a partial Raft implementation. It has only log
replication and leader election support. Snapshotting and configuration
changes, along with other, smaller features, are not yet implemented.
The approach taken by this implementation is to have a deterministic
state machine coded in raft::fsm. What makes the FSM deterministic is
that it does not do any IO by itself. It only takes an input (which may
be a networking message, a time tick or a new append message), changes its
state and produces an output. The output contains the state that has
to be persisted, messages that need to be sent and entries that may
be applied (in that order). The input and output of the FSM are handled
by the raft::server class. It uses the raft::rpc interface to send and receive
messages and the raft::storage interface to implement persistence.
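For illustration only, here is a minimal sketch of the input/output flow described above; the type and member names (fsm_input, fsm_output, advance, take_output, server_step) are hypothetical and not the actual raft::fsm API:
```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Illustrative input/output types; the real raft::fsm types differ.
struct tick {};                                    // time tick
struct append_request { std::string entry; };      // new entry to append
struct network_message { std::string payload; };   // incoming RPC message
using fsm_input = std::variant<tick, append_request, network_message>;

struct fsm_output {
    std::vector<std::string> entries_to_persist;   // state that must hit storage first
    std::vector<network_message> messages_to_send; // then goes out over RPC
    std::vector<std::string> entries_to_apply;     // finally applied to the state machine
};

class fsm {
    fsm_output _pending;
public:
    // Purely deterministic: no IO, just a state transition plus buffered output.
    void advance(const fsm_input& in) {
        if (std::holds_alternative<append_request>(in)) {
            _pending.entries_to_persist.push_back(std::get<append_request>(in).entry);
        }
        // ... other transitions (votes, ticks, replication) elided ...
    }
    fsm_output take_output() { return std::exchange(_pending, {}); }
};

// The server side does all the IO, in the order described above.
void server_step(fsm& f, const fsm_input& in) {
    f.advance(in);
    fsm_output out = f.take_output();
    // 1. persist out.entries_to_persist via the storage interface
    // 2. send out.messages_to_send via the rpc interface
    // 3. apply out.entries_to_apply to the user state machine
}

int main() {
    fsm f;
    server_step(f, append_request{"set x = 1"});
    return 0;
}
```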
This commit introduces the public raft interfaces. raft::server represents a
single raft server instance. raft::state_machine represents a user-defined
state machine. raft::rpc, raft::rpc_client and raft::storage are
used to allow implementing custom networking and storage layers.
A shared failure detector interface defines keep-alive semantics,
required for efficient implementation of thousands of raft groups.
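As a rough illustration of the kind of pluggable interfaces described here; the actual Scylla headers use Seastar futures and richer types, so these signatures are assumptions, not the real API:
```cpp
#include <cstdint>
#include <string>
#include <vector>

// A log entry as seen by the pluggable layers.
struct log_entry { uint64_t term; uint64_t index; std::string data; };

// User-defined state machine: receives committed entries.
struct state_machine {
    virtual void apply(const std::vector<log_entry>& entries) = 0;
    virtual ~state_machine() = default;
};

// Custom networking layer used by the server to talk to its peers.
struct rpc {
    virtual void send(uint64_t server_id, std::string message) = 0;
    virtual ~rpc() = default;
};

// Custom persistence layer: log entries and the current term must be durable.
struct storage {
    virtual void store_log_entries(const std::vector<log_entry>& entries) = 0;
    virtual void store_term(uint64_t term) = 0;
    virtual ~storage() = default;
};

// Shared keep-alive source, so thousands of raft groups do not each ping peers.
struct failure_detector {
    virtual bool is_alive(uint64_t server_id) = 0;
    virtual ~failure_detector() = default;
};
```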
Recently, the cql_server_config::max_concurrent_requests field was
changed to be an updateable_value, so that it is updated when the
corresponding option in Scylla's configuration is live-reloaded.
Unfortunately, due to how cql_server is constructed, this caused
cql_server instances on all shards to store an updateable_value which
pointed to an updateable_value_source on shard 0. Unsynchronized
cross-shard memory operations ensue.
The fix changes the cql_server_config so that it holds a function which
creates an updateable_value appropriate for the given shard. This
pattern is similar to another, already existing option in the config:
get_service_memory_limiter_semaphore.
This fix can be reverted if updateable_value becomes safe to use across
shards.
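A minimal sketch of the pattern, assuming a simplified stand-in for updateable_value and a hypothetical get_max_concurrent_requests factory; the real config member names may differ:
```cpp
#include <cstdint>
#include <functional>

// Stand-in for the real updateable_value; just enough to show the shape of the fix.
template <typename T>
class updateable_value {
    T _value;
public:
    explicit updateable_value(T v) : _value(v) {}
    const T& operator()() const { return _value; }
};

struct cql_server_config {
    // Before: updateable_value<uint32_t> max_concurrent_requests;  // shared across shards
    // After: a factory called on each shard, so every shard gets a value bound
    // to its own shard-local updateable_value_source.
    std::function<updateable_value<uint32_t>()> get_max_concurrent_requests;
};

updateable_value<uint32_t> make_shard_local_limit() {
    // In the real code this would read the shard-local copy of the live config.
    return updateable_value<uint32_t>(10000);
}

int main() {
    cql_server_config cfg;
    cfg.get_max_concurrent_requests = make_shard_local_limit;
    auto limit = cfg.get_max_concurrent_requests();  // invoked on the owning shard
    return limit() > 0 ? 0 : 1;
}
```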
Tests: unit(dev)
Fixes: #7310
The 'redis_database_count' option already existed, but
was not used when initializing the keyspaces. This
patch merely uses it. I think it's better that way; it
seems cleaner not to create 15 x 5 tables when we
use only one redis database.
Also change a test to use a higher maximum number
of databases.
Signed-off-by: Etienne Adam <etienne.adam@gmail.com>
Message-Id: <20200930210256.4439-1-etienne.adam@gmail.com>
In order to improve observability, add a username field to the
system_traces.sessions table. The system table should be changed
while upgrading by running the fix_system_distributed_tables.py
script. Until the table is updated, the old behaviour is preserved.
Fixes #6737.
This miniseries enhances the code from #7279 by:
* adding metrics for shed requests, which will make it possible to pinpoint the problem if the max concurrent requests threshold is too low
* making the error message more comprehensive by pointing at the variable used to set the max concurrent requests threshold
An example of an enhanced error message:
```
ConnectionException('Failed to initialize new connection to 127.0.0.1: Error from server: code=1001 [Coordinator node overloaded] message="too many in-flight requests (configured via max_concurrent_requests_per_shard): 18"',)})
```
Closes #7299
* github.com:scylladb/scylla:
transport: make _requests_serving param uint32_t
transport: make overloaded error message more descriptive
transport: add requests_shed metrics
It's not realistic for a shard to have over 4 billion concurrent
requests, so this value can be safely represented in 32 bits.
Also, since the current concurrency limit is represented as a uint32_t,
it makes sense for these two to have matching types.
The last major untracked area of the reader pipeline is the reader
buffers. These scale with the number of readers as well as with the size
and shape of the data, so their memory consumption is unpredictable and varies
wildly. For example, many small rows will trigger larger buffers
allocated within the `circular_buffer<mutation_fragment>`, while a few
larger rows will consume a lot of external memory.
This series covers this area by tracking the memory consumption of both
the buffer and its content. This is achieved by passing a tracking
allocator to `circular_buffer<mutation_fragment>` so that each
allocation it makes is tracked. Additionally, we now track the memory
consumption of each and every mutation fragment through its whole
lifetime. Initially I contemplated just tracking the `_buffer_size` of
`flat_mutation_reader::impl`, but concluded that as our reader trees are
typically quite deep, this would result in a lot of unnecessary
`signal()`/`consume()` calls, which scale with the number of mutation
fragments and hence add to the already considerable per-mutation-fragment
overhead. The solution chosen in this series is to instead
track the memory consumption of the individual mutation fragments, with
the observation that these are typically always moved and very rarely
copied, so the number of `signal()`/`consume()` calls will be minimal.
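A rough sketch of the tracking-allocator idea, using a simplified stand-in for reader_permit and a std::vector instead of circular_buffer<mutation_fragment>; the real Scylla types differ in detail:
```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Minimal stand-in for a reader permit: it only keeps a byte counter; the
// real one forwards consumption to the reader concurrency semaphore.
class reader_permit {
    size_t _consumed = 0;
public:
    void consume(size_t bytes) { _consumed += bytes; }
    void signal(size_t bytes) { _consumed -= bytes; }
    size_t consumed() const { return _consumed; }
};

// Standard-conforming allocator that reports every allocation/deallocation
// to the permit, so buffer memory is accounted for automatically.
template <typename T>
class tracking_allocator {
    reader_permit* _permit;
public:
    using value_type = T;
    explicit tracking_allocator(reader_permit& p) : _permit(&p) {}
    template <typename U>
    tracking_allocator(const tracking_allocator<U>& o) : _permit(o.permit()) {}

    T* allocate(size_t n) {
        T* p = std::allocator<T>{}.allocate(n);
        _permit->consume(n * sizeof(T));
        return p;
    }
    void deallocate(T* p, size_t n) {
        std::allocator<T>{}.deallocate(p, n);
        _permit->signal(n * sizeof(T));
    }
    reader_permit* permit() const { return _permit; }

    template <typename U>
    bool operator==(const tracking_allocator<U>& o) const { return _permit == o.permit(); }
    template <typename U>
    bool operator!=(const tracking_allocator<U>& o) const { return !(*this == o); }
};

int main() {
    reader_permit permit;
    // The reader buffer would be circular_buffer<mutation_fragment, tracking_allocator<...>>;
    // a std::vector shows the same accounting effect.
    tracking_allocator<int> alloc(permit);
    std::vector<int, tracking_allocator<int>> buf(alloc);
    buf.reserve(1024);
    return permit.consumed() >= 1024 * sizeof(int) ? 0 : 1;
}
```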
This additional tracking introduces an interesting dilemma, however:
readers will now have significant memory on their account even before
being admitted. So it may happen that they can prevent their own
admission via this memory consumption. To prevent this, memory
consumption is only forwarded to the semaphore upon admission. This
might be solved when the semaphore is moved to the front -- before the
cache.
Another consequence of this additional, more complete tracking is that
evictable readers now consume memory even when the underlying reader is
evicted. So it may happen that even though no reader is currently
admitted, all memory is consumed from the semaphore. To prevent any such
deadlocks, the semaphore now admits a reader unconditionally if no
reader is admitted -- that is, if all count resources are available.
Refs: #4176
Tests: unit(dev, debug, release)
* 'track-reader-buffers/v2' of https://github.com/denesb/scylla: (37 commits)
test/manual/sstable_scan_footprint_test: run test body in statement sched group
test/manual/sstable_scan_footprint_test: move test main code into separate function
test/manual/sstable_scan_footprint_test: sprinkle some thread::maybe_yield():s
test/manual/sstable_scan_footprint_test: make clustering row size configurable
test/manual/sstable_scan_footprint_test: document sstable related command line arguments
mutation_fragment_test: add exception safety test for mutation_fragment::mutate_as_*()
test: simple_schema: add make_static_row()
reader_permit: reader_resources: add operator==
mutation_fragment: memory_usage(): remove unused schema parameter
mutation_fragment: track memory usage through the reader_permit
reader_permit: resource_units: add permit() and resources() accessors
mutation_fragment: add schema and permit
partition_snapshot_row_cursor: row(): return clustering_row instead of mutation_fragment
mutation_fragment: remove as_mutable_end_of_partition()
mutation_fragment: s/as_mutable_partition_start/mutate_as_partition_start/
mutation_fragment: s/as_mutable_range_tombstone/mutate_as_range_tombstone/
mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/
mutation_fragment: s/as_mutable_static_row/mutation_as_static_row/
flat_mutation_reader: make _buffer a tracked buffer
mutation_reader: extract the two fill_buffer_result into a single one
...
Today, whenever we build scylla in a single mode, we still
build jmx, tools and python3 for all of dev, release and debug.
Let's make sure we build only the relevant build mode.
Also add unified-tar to the ninja build.
Closes #7260
`trace_keyspace_helper::make_slow_query_mutation_data` expected a
"query" key in its parameters, which does not appear in the case of,
e.g., batches of prepared statements. This is an example of failing
`record.parameters`:
```
...{"query[0]" : "INSERT INTO ks.tbl (pk, i) values (?, ?);"},
{"query[1]" : "INSERT INTO ks.tbl (pk, i) values (?, ?);"}...
```
In such a case Scylla recorded no trace and said:
```
ERROR 2020-09-28 10:09:36,696 [shard 3] trace_keyspace_helper - No
"query" parameter set for a session requesting a slow_query_log record
```
The fix here is to leave the query empty if it is not found. Users can still
retrieve the query contents from the existing info.
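A sketch of the intended lookup behaviour, assuming a plain map of parameters; the actual trace_keyspace_helper code is more involved:
```cpp
#include <map>
#include <string>

// Batches of prepared statements only carry "query[0]", "query[1]", ... so a
// plain "query" key may legitimately be absent; record an empty string then
// instead of refusing to write the slow-query-log record.
std::string extract_query(const std::map<std::string, std::string>& params) {
    auto it = params.find("query");
    return it == params.end() ? std::string{} : it->second;
}
```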
Fixes #5843
Closes #7293
The current and max backlog in VIEW_BACKLOG have not changed, but the
nodes keep updating VIEW_BACKLOG all the time. For example:
```
INFO 2020-03-06 17:13:46,761 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486026590,718)
INFO 2020-03-06 17:13:46,821 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486026531,742)
INFO 2020-03-06 17:13:47,765 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486027590,721)
INFO 2020-03-06 17:13:47,825 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486027531,745)
INFO 2020-03-06 17:13:48,772 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486028590,726)
INFO 2020-03-06 17:13:48,833 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486028531,750)
INFO 2020-03-06 17:13:49,772 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486029590,729)
INFO 2020-03-06 17:13:49,832 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486029531,753)
```
The downsides of such updates:
- Introduces more gossip exchange traffic
- Updates system.peers all the time
The extra unnecessary gossip traffic is fine for a cluster in good
shape, but when some of the nodes or shards are loaded, such messages and
the handling of such messages can make the system even busier.
With this patch, VIEW_BACKLOG is updated only when the backlog has really
changed.
Btw, we could even make the update only when the change in the backlog is
greater than a threshold, e.g. 5%, which would reduce the traffic even
further.
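A minimal sketch of the publish-only-on-change idea, with hypothetical names; a relative-change threshold as suggested above could be added in should_publish:
```cpp
#include <cstdint>
#include <optional>

struct view_backlog {
    uint64_t current = 0;
    uint64_t max = 0;
    bool operator==(const view_backlog& o) const {
        return current == o.current && max == o.max;
    }
};

class backlog_publisher {
    std::optional<view_backlog> _last_published;
public:
    // Returns true only when VIEW_BACKLOG should actually be gossiped.
    bool should_publish(const view_backlog& b) const {
        return !_last_published || !(*_last_published == b);
    }
    void published(const view_backlog& b) { _last_published = b; }
};

int main() {
    backlog_publisher p;
    view_backlog b{10, 20};
    if (p.should_publish(b)) {
        p.published(b);                      // gossip VIEW_BACKLOG here
    }
    return p.should_publish(b) ? 1 : 0;      // unchanged backlog: no re-publish
}
```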
Fixes #5970
We used to calculate the number of endpoints for quorum and local_quorum
unconditionally as ((rf / 2) + 1). This formula doesn't take into
account the corner case where RF = 0; in this situation the quorum should
also be 0.
This commit adds the missing corner case.
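A sketch of the corrected calculation; the function name and types are illustrative, not the actual Scylla code:
```cpp
#include <cstddef>

// Endpoints needed for (LOCAL_)QUORUM given a replication factor. The old
// code returned (rf / 2) + 1 unconditionally, which yields 1 for rf == 0;
// with no replicas a quorum of 0 is the only sensible answer.
size_t quorum_for(size_t rf) {
    return rf == 0 ? 0 : (rf / 2) + 1;
}

int main() {
    return (quorum_for(0) == 0 && quorum_for(1) == 1 &&
            quorum_for(3) == 2 && quorum_for(5) == 3) ? 0 : 1;
}
```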
Tests: Unit Tests (dev)
Fixes #6905
Closes #7296
On some environments, systemctl enable <service> fails when we use a symlink.
So just directly copy the systemd units to ~/.config/systemd/user instead of
creating a symlink.
Fixes #7288
Closes #7290
After cleaning up old cluster features (253a7640e3)
the code for special handling of 1.7.4 counter order was effectively
only used in its own tests, so it can be safely removed.
Closes #7289
This series approaches issue #7072 and provides a very simple mechanism for limiting the number of concurrent CQL requests being served on a shard. Once the limit is hit, new requests will be instantly refused and OverloadedException will be returned to the client.
This mechanism has many improvement opportunities:
* shedding requests gradually instead of having one hard limit,
* having more than one limit per different types of queries (reads, writes, schema changes, ...),
* not using a preconfigured value at all, and instead figuring out the limit dynamically,
* etc.
... and none of these are taken into account in this series, which only adds a very basic configuration variable. The variable can be updated live without a restart - it can be done by updating the .yaml file and triggering a configuration re-read by sending the SIGHUP signal to Scylla.
The default value for this parameter is a very large number, which translates to effectively not shedding any requests at all.
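A minimal sketch of such a hard-limit check, with hypothetical names and a plain exception standing in for the CQL OverloadedException:
```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

struct overloaded_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

class request_limiter {
    uint32_t _serving = 0;
    const uint32_t& _limit;   // refers to the live-reloadable config value
public:
    explicit request_limiter(const uint32_t& limit) : _limit(limit) {}

    // Called when a new CQL request arrives on this shard.
    void admit() {
        if (_serving >= _limit) {
            throw overloaded_exception(
                "too many in-flight requests (configured via "
                "max_concurrent_requests_per_shard): " + std::to_string(_serving));
        }
        ++_serving;
    }
    // Called when a request finishes.
    void complete() { --_serving; }
};

int main() {
    uint32_t max_concurrent_requests_per_shard = 2;  // normally read from scylla.yaml
    request_limiter limiter(max_concurrent_requests_per_shard);
    limiter.admit();
    limiter.admit();
    try {
        limiter.admit();                             // third in-flight request gets shed
        return 1;
    } catch (const overloaded_exception&) {
        return 0;
    }
}
```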
Refs #7072
Closes #7279
* github.com:scylladb/scylla:
transport: make max_concurrent_requests_per_shard reloadable
transport: return exceptional future instead of throwing
transport,config: add a param for max request concurrency
exceptions: make a single-param constructor explicit
exceptions: add a constructor based on custom message
This reverts commit 8366d2231d because it
causes the following "scylla_setup" failure on Ubuntu 16.04:
Command: 'sudo /usr/lib/scylla/scylla_setup --nic ens5 --disks /dev/nvme0n1 --swap-directory / '
Exit code: 1
Stdout:
Setting up libtomcrypt0:amd64 (1.17-7ubuntu0.1) ...
Setting up chrony (2.1.1-1ubuntu0.1) ...
Creating '_chrony' system user/group for the chronyd daemon…
Creating config file /etc/chrony/chrony.conf with new version
Processing triggers for libc-bin (2.23-0ubuntu11.2) ...
Processing triggers for ureadahead (0.100.0-19.1) ...
Processing triggers for systemd (229-4ubuntu21.29) ...
501 Not authorised
NTP setup failed.
Stderr:
chrony.service is not a native service, redirecting to systemd-sysv-install
Executing /lib/systemd/systemd-sysv-install enable chrony
Traceback (most recent call last):
File "/opt/scylladb/scripts/libexec/scylla_ntp_setup", line 63, in <module>
run('chronyc makestep')
File "/opt/scylladb/scripts/scylla_util.py", line 504, in run
return subprocess.run(cmd, stdout=stdout, stderr=stderr, shell=shell, check=exception, env=scylla_env).returncode
File "/opt/scylladb/python3/lib64/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['chronyc', 'makestep']' returned non-zero exit status 1.
This configuration entry is expected to be used as a quick fix
for an overloaded node, so it should be possible to reload this value
without having to restart the server.
The newly introduced parameter - max_concurrent_requests_per_shard
- can be used to limit the number of in-flight requests a single
coordinator shard can handle. Each surplus request will be
immediately refused by returning an OverloadedException error to the client.
The default value for this parameter is large enough to never
actually shed any requests.
Currently, the limit is only applied to CQL requests - other frontends
like alternator and redis are not throttled yet.
The memory usage is now maintained and updated on each change to the
mutation fragment, so it need not be recalculated on a call to
`memory_usage()`; hence the schema parameter is unused and can be
removed.
The memory usage of mutation fragments is now tracked throughout their
lifetime through a reader permit. This was the last major (to my current
knowledge) untracked piece of the reader pipeline.
We want to start tracking the memory consumption of mutation fragments.
For this we need the schema and permit during construction, and on each
modification, so the memory consumption can be recalculated and passed to
the permit.
In this patch we just add the new parameters and go through the insane
churn of updating all call sites. They will be used in the next patch.
We will soon want to update the memory consumption of a mutation fragment
after each modification done to it. To do that safely, we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.
Uses where this method was just used to move the fragment away are
converted to use `as_mutation_start() &&`.
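A sketch of the mutate-through-a-lambda pattern, using generic names rather than the real mutation_fragment API:
```cpp
#include <cstddef>
#include <string>
#include <utility>

class fragment {
    std::string _data;
    size_t _tracked = 0;

    void retrack() { _tracked = _data.size(); }   // re-account after every change
public:
    // Mutable access only through a lambda, so the fragment can update the
    // tracked memory footprint once the caller's modification is done.
    template <typename Func>
    void mutate(Func&& f) {
        f(_data);
        retrack();
    }
    // Rvalue accessor for callers that only want to move the data away.
    std::string data() && { return std::move(_data); }
    size_t tracked() const { return _tracked; }
};

int main() {
    fragment fr;
    fr.mutate([](std::string& d) { d = "clustering row payload"; });
    return fr.tracked() == std::string("clustering row payload").size() ? 0 : 1;
}
```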
We will soon want to update the memory consumption of a mutation fragment
after each modification done to it. To do that safely, we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.
Uses where this method was just used to move the fragment away are
converted to use `as_range_tombstone() &&`.
We will soon want to update the memory consumption of a mutation fragment
after each modification done to it. To do that safely, we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.
Uses where this method was just used to move the fragment away are
converted to use `as_clustering_row() &&`.
We will soon want to update the memory consumption of a mutation fragment
after each modification done to it. To do that safely, we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.
Uses where this method was just used to move the fragment away are
converted to use `as_static_row() &&`.