scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 20:27:03 +00:00

Author	SHA1	Message	Date
Calle Wilund	04deacd7e7	alternator::streams: Improve paging and fix parent-child calculation Fixes #7345 Fixes #7346 Do a more efficient collection skip when doing paging, instead of iterating the full sets. Ensure some semblance of sanity in the parent-child relationship between shards by ensuring token order sorting and finding the apparent previous ID covering the approximate range of the new gen. Fix endsequencenumber generation by looking at whether we are the last gen or not, instead of the (not filled in) 'expired' column.	2020-10-07 08:43:39 +00:00
Calle Wilund	3cdd7fe191	alternator::streams: Remove table from shard_id Fixes #7344 It is not data that is really needed, as shard_id:s are not required to be unique across streams, and also because of the length limit on the shard_id text representation. As a side effect, the shard iter instead carries the stream arn.	2020-10-07 08:43:39 +00:00
Calle Wilund	f1ad66218a	alternator::streams: Filter out cdc streams older than data/table Fixes #7347 If cdc stream id:s are older than either table creation or now - 24h we can skip them in describe_stream, to minimize the amount of shards being returned.	2020-10-07 06:13:28 +00:00
Calle Wilund	5081d354be	alternator::error: Add a few dynamo exception types	2020-10-06 12:52:58 +00:00
Avi Kivity	4f30c479f3	Merge "token_metadata cleanup" from Benny " Misc. cleanups and minor optimizations of token_metadata methods in preparation to futurizing parts of the api around update_pending_ranges and abstract_replication_strategy::calculate_natural_endpoints, to prevent reactor stalls on these paths Test: unit(dev) " * 'token_metadata_cleanup' of github.com:bhalevy/scylla: token_metadata: get rid of unused calculate_pending_ranges_for_* methods token_metadata: get rid of clone_after_all_settled token_metadata_impl: remove_endpoint: do not sort tokens token_metadata_impl: always sort_tokens in place	2020-10-05 13:31:59 +03:00
Takuya ASADA	0f786f05fe	install.sh: logging to scylla-server.log when journalctl --user does not work On some environments such as CentOS8, journalctl --user -xe does not work since journald is running in volatile mode. The issue cannot be fixed in non-root mode, so as a workaround we should log to a file instead of the journal. Also added scylla_logrotate to ExecStartPre, which renames the previous log file, since StandardOutput=file:/path/to/file will erase the existing file when the service is restarted. Fixes #7131 Closes #7326	2020-10-05 13:17:27 +03:00
Avi Kivity	d72465531e	build: use consistent version-release strings across submodules Instead of relying on SCYLLA-VERSION-GEN to be consistently updated in each submodule, propagate the top-level product-version-release to all submodules. This reduces the churn required for each release, and makes the release strings consistent (previously, the git hash in each was different). Closes #7268	2020-10-05 12:32:49 +03:00
Avi Kivity	715d50bc85	Update seastar submodule * seastar 292ba734bc...8c8fd3ed28 (15): > semaphore_units: add return_units and return_all > semaphore_units: release: mark as noexcept > circular_buffer: support non-default-constructible allocators correctly > core/shared_ptr: Expose use count through {lw_}enable_shared_from_this > memory: align allocations to std::max_align_t > util/log: logger::do_log(): go easier on allocations > doc: add link to multipage version of tutorial > doc: fix the output directories of split and tutorial.html > build: do not copy htmlsplit.py to build dir > doc: add "--input" and "--output-dir" options to htmlsplit.py > doc: update split script to use xml.etree.ElementTree > Merge "shared_future: make functions noexcept" from Benny > tutorial: add linebreak between sections > tutorial: format "future<int>" as inline code block > docs: specify HTML language code for tutorial.html	2020-10-04 21:30:27 +03:00
Etienne Adam	46f0354cdb	redis: pass request as a reference This patch changes the way the request object is passed, using a reference instead of temporaries. 'exists' test is passing in debug mode, whereas it was always failing before. Fixes #7261 by ensuring request object is alive for all commands during the whole request duration. Signed-off-by: Etienne Adam <etienne.adam@gmail.com> Message-Id: <20200924202034.30399-1-etienne.adam@gmail.com>	2020-10-04 14:58:00 +03:00
Avi Kivity	5b5b8b3264	lua: be compatible with Lua 5.4's lua_resume() Lua 5.4 added an extra parameter to lua_resume()[1]. The parameter denotes the number of arguments yielded, but our coroutines don't yield any arguments, so we can just ignore it. Define a macro to allow adding extra stuff with Lua 5.4, and use it to supply the extra parameter. [1] https://www.lua.org/manual/5.4/manual.html#8.3 Closes #7324	2020-10-04 14:07:51 +03:00
Nadav Har'El	ad48d8b43c	Merge 'idl: fix definition order related build failures with clang' from Avi Kivity Clang eagerly instantiates templates, apparently with the following algorithm: - if both the declaration and definition are seen at the time of instantiation, instantiate the template - if only the declaration is seen at the time of instantiation, just emit a reference to the template; even if the definition is later seen, it is not instantiated The "reference" in the second case is a relocation entry in the object file that is satisfied at link time by the linker, but if no other object file instantiated the needed template, a link error results. These problems are hard to diagnose but easy to fix. This series fixes all known such issues in the code base. It was tested on gcc as well. Closes #7322 * github.com:scylladb/scylla: query-result-reader: order idl implementations correctly frozen_schema: order idl implementations correctly idl-compiler: generate views after serializers	2020-10-04 11:16:19 +03:00
Takuya ASADA	d611d74905	dist/common/scripts/scylla_setup: force developer mode on nonroot when NOFILE is too low On Ubuntu 16/18 and Debian 9, LimitNOFILE is set to 4096 and cannot be overridden from a user unit. To run scylla-server in such environments, we need to turn on developer mode and show warnings. Fixes #7133 Closes #7323	2020-10-04 10:16:30 +03:00
Avi Kivity	4b40bc5065	query-result-reader: order idl implementations correctly Clang eagerly instantiates templates, so if it needs a template function for which it has a declaration but not a definition, it will not instantiate the definition when it sees it. This causes link errors. Fix by ordering the idl implementation files so that definitions come before uses.	2020-10-03 19:56:29 +03:00
Avi Kivity	94fcec99d1	frozen_schema: order idl implementations correctly Clang eagerly instantiates templates, so if it needs a template function for which it has a declaration but not a definition, it will not instantiate the definition when it sees it. This causes link errors. Fix by ordering the idl implementation files so that definitions come before uses.	2020-10-03 19:56:28 +03:00
Avi Kivity	a99aba9e48	idl-compiler: generate views after serializers Clang eagerly instantiates templates, so if it needs a template function for which it has a declaration but not a definition, it will not instantiate the definition when it sees it. This causes link errors. In this case, the views use the serializer implementations, but are generated before them. Fix by generating the view implementations after the serializer implementations that they use.	2020-10-03 19:56:25 +03:00
Tomasz Grabiec	40b42393d2	Merge "Raft: disable boost tests, add disable to test.py" from Alejo Add disable option for test configuration. Tests in this list will be disabled for all modes. * alejo/next-disable-raft-tests-01: Raft: disable boost tests for now Tests: add disable to configuration Raft: Remove tests for now	2020-10-02 15:51:13 +02:00
Yaron Kaikov	bec0c15ee9	configure.py: Add version to unified tarball filename Let's add the version and release to unified tarball filename to avoid having to do that in release engineering pipelines, for example. Closes #7317	2020-10-02 15:48:11 +03:00
Alejo Sanchez	bb67d15e2f	Raft: disable boost tests for now Disable raft fsm boost tests until raft is part of build. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-02 14:03:01 +02:00
Alejo Sanchez	eff7b63c08	Tests: add disable to configuration For suite.yaml add an extra configuration option disable. Tests in this list will be disabled for all modes. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-02 14:01:50 +02:00
Alejo Sanchez	ef170a5088	Raft: Remove tests for now Remove raft C++ tests until raft is included in build process. [tgrabiec]: Fixes test.py failure. Tests are not compiled unless --build-raft is passed to configure.py and we cannot enable it by default yet. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20201002102847.1140775-1-alejo.sanchez@scylladb.com>	2020-10-02 12:42:21 +02:00
Alejo Sanchez	4e26dad3a0	Raft: Remove tests for now Remove raft C++ tests until raft is included in build process. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-10-02 12:26:05 +02:00
Tomasz Grabiec	864b2c5736	CMakeLists.txt: Add raft directory to source code directories Needed for IDE integration. Not used for building currently. Message-Id: <1601570008-19666-1-git-send-email-tgrabiec@scylladb.com>	2020-10-01 19:38:39 +03:00
Gleb Natapov	3e8dbb3c09	lwt: do not return unavailable exception from the 'learn' stage Unavailable exception means that operation was not started and it can be retried safely. If lwt fails in the learn stage though it most certainly means that its effect will be observable already. The patch returns timeout exception instead which means uncertainty. Fixes #7258 Message-Id: <20201001130724.GA2283830@scylladb.com>	2020-10-01 17:16:52 +02:00
Tomasz Grabiec	ca7f0c61f0	Merge "raft: initial implementation" from Gleb This is the beginning of raft protocol implementation. It only supports log replication and voter state machine. The main difference between this one and the RFC (besides having voter state machine) is that the approach taken here is to implement raft as a deterministic state machine and move all the IO processing away from the main logic. To do that some changes to the RPC interface were required: all verbs are now one way, meaning that sending a request does not wait for a reply and the reply arrives as a separate message (or not at all, it is safe to drop packets). * scylla-dev/raft-v4: raft: add a short readme file raft: compile raft tests raft: add raft tests raft: Implement log replication and leader election raft: Introduce raft interface header	2020-10-01 17:09:52 +02:00
Konstantin Osipov	9a5f2b87dc	raft: add a short readme file The file has a brief description of the code status, usage and some implementation assumptions.	2020-10-01 14:30:59 +03:00
Gleb Natapov	16cb009ea2	raft: compile raft tests Compilation is not enabled by default as it requires coroutines support and may require a special compiler (until the distributed one fixes all the bugs related to coroutines). To enable raft tests compilation, a new configure.py option is added (--build-raft).	2020-10-01 14:30:59 +03:00
Gleb Natapov	4959609589	raft: add raft tests Add tests for currently implemented raft features. replication_test tests replication functionality with various initial log configurations. raft_fsm_test tests voting state machine functionality.	2020-10-01 14:30:59 +03:00
Gleb Natapov	e1ac1a61c9	raft: Implement log replication and leader election This patch introduces a partial RAFT implementation. It has only log replication and leader election support. Snapshotting and configuration change along with other, smaller features are not yet implemented. The approach taken by this implementation is to have a deterministic state machine coded in raft::fsm. What makes the FSM deterministic is that it does not do any IO by itself. It only takes an input (which may be a networking message, time tick or new append message), changes its state and produces an output. The output contains the state that has to be persisted, messages that need to be sent and entries that may be applied (in that order). The input and output of the FSM are handled by the raft::server class. It uses the raft::rpc interface to send and receive messages and the raft::storage interface to implement persistence.	2020-10-01 14:30:59 +03:00
Gleb Natapov	c073997431	raft: Introduce raft interface header This commit introduces public raft interfaces. raft::server represents a single raft server instance. raft::state_machine represents a user defined state machine. raft::rpc, raft::rpc_client and raft::storage are used to allow implementing custom networking and storage layers. A shared failure detector interface defines keep-alive semantics, required for efficient implementation of thousands of raft groups.	2020-10-01 14:30:59 +03:00
Piotr Dulikowski	bfbf02a657	transport/config: fix cross-shard use of updateable_value Recently, the cql_server_config::max_concurrent_requests field was changed to be an updateable_value, so that it is updated when the corresponding option in Scylla's configuration is live-reloaded. Unfortunately, due to how cql_server is constructed, this caused cql_server instances on all shards to store an updateable_value which pointed to an updateable_value_source on shard 0. Unsynchronized cross-shard memory operations ensue. The fix changes the cql_server_config so that it holds a function which creates an updateable_value appropriate for the given shard. This pattern is similar to another, already existing option in the config: get_service_memory_limiter_semaphore. This fix can be reverted if updateable_value becomes safe to use across shards. Tests: unit(dev) Fixes: #7310	2020-10-01 14:10:56 +03:00
Etienne Adam	98dc0dc03a	redis: only create required keyspaces/tables The 'redis_database_count' option already existed, but was not used when initializing the keyspaces. This patch merely uses it. I think it's better that way, it seems cleaner not to create 15 x 5 tables when we use only one redis database. Also change a test to test with a higher max number of databases. Signed-off-by: Etienne Adam <etienne.adam@gmail.com> Message-Id: <20200930210256.4439-1-etienne.adam@gmail.com>	2020-10-01 10:27:03 +03:00
Wojciech Mitros	e79ad38425	tracing: add username to the session table In order to improve observability, add a username field to the system_traces.sessions table. The system table should be changed while upgrading by running the fix_system_distributed_tables.py script. Until the table is updated, the old behaviour is preserved. Fixes #6737.	2020-10-01 04:46:40 +02:00
Nadav Har'El	d73cf589e7	docs: fix typos in docs/alternator/alternator.md Discovered by running a spell-checker. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200930101046.76710-1-nyh@scylladb.com>	2020-10-01 04:46:40 +02:00
Nadav Har'El	8db01aeeb4	docs: fix typo in alternator/getting-started.md Fix a typo reported by a user. Ran spell-checker to verify there are no other obvious spelling mistakes in that file. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200930084304.74776-1-nyh@scylladb.com>	2020-10-01 04:46:40 +02:00
Avi Kivity	701d24a832	Merge 'Enhance max concurrent requests code' from Piotr Sarna This miniseries enhances the code from #7279 by: * adding metrics for shed requests, which will allow to pinpoint the problem if the max concurrent requests threshold is too low * making the error message more comprehensive by pointing at the variable used to set max concurrent requests threshold Example of an enhanced error message: ``` ConnectionException('Failed to initialize new connection to 127.0.0.1: Error from server: code=1001 [Coordinator node overloaded] message="too many in-flight requests (configured via max_concurrent_requests_per_shard): 18"',)}) ``` Closes #7299 * github.com:scylladb/scylla: transport: make _requests_serving param uint32_t transport: make overloaded error message more descriptive transport: add requests_shed metrics	2020-10-01 04:46:40 +02:00
Benny Halevy	5a250f529f	token_metadata: get rid of unused calculate_pending_ranges_for_* methods They are only called internally by token_metadata_impl. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-30 23:16:23 +03:00
Benny Halevy	41e5a3a245	token_metadata: get rid of clone_after_all_settled It's unused. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-30 23:15:11 +03:00
Benny Halevy	105a2f5244	token_metadata_impl: remove_endpoint: do not sort tokens Call sort_tokens at the caller as all call sites from within token_metadata_impl call remove_endpoint for multiple endpoints so the tokens can be re-sorted only once, when done removing all tokens. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-30 23:12:32 +03:00
Benny Halevy	86303f4fdd	token_metadata_impl: always sort_tokens in place No need to return the sorted tokens vector as it's always assigned to _sorted_tokens. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-30 23:08:56 +03:00
Piotr Sarna	876e9fe51a	transport: make _requests_serving param uint32_t It's not realistic for a shard to have over 4 billion concurrent requests, so this value can be safely represented in 32 bits. Also, since the current concurrency limit is represented in uint32_t, it makes sense for these two to have matching types.	2020-09-30 08:20:52 +02:00
Piotr Sarna	d18f68f1c1	transport: make overloaded error message more descriptive The message now mentions the config variable used to set the limit of max allowed concurrent requests.	2020-09-30 08:20:51 +02:00
Piotr Sarna	792ff3757a	transport: add requests_shed metrics The counter shows a total number of requests shed due to overload.	2020-09-30 08:20:50 +02:00
Avi Kivity	fd1dd0eac7	Merge "Track the memory consumption of reader buffers" from Botond " The last major untracked area of the reader pipeline is the reader buffers. These scale with the number of readers as well as with the size and shape of data, so their memory consumption is unpredictable and varies wildly. For example many small rows will trigger larger buffers allocated within the `circular_buffer<mutation_fragment>`, while few larger rows will consume a lot of external memory. This series covers this area by tracking the memory consumption of both the buffer and its content. This is achieved by passing a tracking allocator to `circular_buffer<mutation_fragment>` so that each allocation it makes is tracked. Additionally, we now track the memory consumption of each and every mutation fragment through its whole lifetime. Initially I contemplated just tracking the `_buffer_size` of `flat_mutation_reader::impl`, but concluded that as our reader trees are typically quite deep, this would result in a lot of unnecessary `signal()`/`consume()` calls, which scale with the number of mutation fragments and hence add to the already considerable per mutation fragment overhead. The solution chosen in this series is to instead track the memory consumption of the individual mutation fragments, with the observation that these are typically always moved and very rarely copied, so the number of `signal()`/`consume()` calls will be minimal. This additional tracking introduces an interesting dilemma however: readers will now have significant memory on their account even before being admitted. So it may happen that they can prevent their own admission via this memory consumption. To prevent this, memory consumption is only forwarded to the semaphore upon admission. This might be solved when the semaphore is moved to the front -- before the cache.
Another consequence of this additional, more complete tracking is that evictable readers now consume memory even when the underlying reader is evicted. So it may happen that even though no reader is currently admitted, all memory is consumed from the semaphore. To prevent any such deadlocks, the semaphore now admits a reader unconditionally if no reader is admitted -- that is if all count resources are available. Refs: #4176 Tests: unit(dev, debug, release) " * 'track-reader-buffers/v2' of https://github.com/denesb/scylla: (37 commits) test/manual/sstable_scan_footprint_test: run test body in statement sched group test/manual/sstable_scan_footprint_test: move test main code into separate function test/manual/sstable_scan_footprint_test: sprinkle some thread::maybe_yield():s test/manual/sstable_scan_footprint_test: make clustering row size configurable test/manual/sstable_scan_footprint_test: document sstable related command line arguments mutation_fragment_test: add exception safety test for mutation_fragment::mutate_as_*() test: simple_schema: add make_static_row() reader_permit: reader_resources: add operator== mutation_fragment: memory_usage(): remove unused schema parameter mutation_fragment: track memory usage through the reader_permit reader_permit: resource_units: add permit() and resources() accessors mutation_fragment: add schema and permit partition_snapshot_row_cursor: row(): return clustering_row instead of mutation_fragment mutation_fragment: remove as_mutable_end_of_partition() mutation_fragment: s/as_mutable_partition_start/mutate_as_partition_start/ mutation_fragment: s/as_mutable_range_tombstone/mutate_as_range_tombstone/ mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/ mutation_fragment: s/as_mutable_static_row/mutation_as_static_row/ flat_mutation_reader: make _buffer a tracked buffer mutation_reader: extract the two fill_buffer_result into a single one ...	2020-09-29 16:08:16 +03:00
Pekka Enberg	8f17ca2d1a	scripts/refresh-submodules.sh: Add python3 submodule Message-Id: <20200928075422.377888-1-penberg@scylladb.com>	2020-09-29 16:06:32 +03:00
Yaron Kaikov	d48df44f26	configure.py: build python3, jmx, tools and unified-tar only in relevant dist-{mode} Today, whenever we build scylla in a single mode, we still build jmx, tools and python3 for all of dev, release and debug. Let's make sure we build only in the relevant build mode. Also add unified-tar to the ninja build. Closes #7260	2020-09-29 15:41:52 +03:00
Juliusz Stasiewicz	0afa738a8f	tracing: Fix error on slow batches `trace_keyspace_helper::make_slow_query_mutation_data` expected a "query" key in its parameters, which does not appear in case of e.g. batches of prepared statements. This is an example of a failing `record.parameters`: ``` ...{"query[0]" : "INSERT INTO ks.tbl (pk, i) values (?, ?);"}, {"query[1]" : "INSERT INTO ks.tbl (pk, i) values (?, ?);"}... ``` In such a case Scylla recorded no trace and said: ``` ERROR 2020-09-28 10:09:36,696 [shard 3] trace_keyspace_helper - No "query" parameter set for a session requesting a slow_query_log record ``` The fix here is to leave the query empty if not found. The users can still retrieve the query contents from existing info. Fixes #5843 Closes #7293	2020-09-29 13:24:39 +02:00
Asias He	eedcee7f31	gossip: Reduce unnecessary VIEW_BACKLOG updates The backlog of current and max in VIEW_BACKLOG is not updated, but the nodes are updating VIEW_BACKLOG all the time. For example: ``` INFO 2020-03-06 17:13:46,761 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486026590,718) INFO 2020-03-06 17:13:46,821 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486026531,742) INFO 2020-03-06 17:13:47,765 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486027590,721) INFO 2020-03-06 17:13:47,825 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486027531,745) INFO 2020-03-06 17:13:48,772 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486028590,726) INFO 2020-03-06 17:13:48,833 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486028531,750) INFO 2020-03-06 17:13:49,772 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486029590,729) INFO 2020-03-06 17:13:49,832 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486029531,753) ``` The downside of such updates: - Introduces more gossip exchange traffic - Updates system.peers all the time The extra unnecessary gossip traffic is fine for a cluster in good shape but when some of the nodes or shards are loaded, such messages and the
handling of such messages can make the system even busier. With this patch, VIEW_BACKLOG is updated only when the backlog is really updated. Btw, we can even make the update only when the change of the backlog is greater than a threshold, e.g., 5%, which can reduce the traffic even further. Fixes #5970	2020-09-29 13:37:37 +03:00
Avi Kivity	6fdc8f28a9	Update tools/jmx submodule * tools/jmx 45e4f28...25bcd76 (1): > install.sh: stop using symlinks for systemd units on nonroot mode Fixes #7288.	2020-09-29 13:32:45 +03:00
Takuya ASADA	8504332e17	scylla_setup: skip offline warnings on nonroot mode Since most of the scripts require root privileges, we don't show offline warnings in nonroot mode. Fixes #7286 Closes #7287	2020-09-29 13:30:13 +03:00
Eliran Sinvani	925cdc9ae1	consistency level: fix wrong quorum calculation when RF = 0 We used to calculate the number of endpoints for quorum and local_quorum unconditionally as ((rf / 2) + 1). This formula doesn't take into account the corner case where RF = 0; in this situation quorum should also be 0. This commit adds the missing corner case. Tests: Unit Tests (dev) Fixes #6905 Closes #7296	2020-09-29 13:25:41 +03:00
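
The quorum rule described in commit 925cdc9ae1 can be illustrated with a small sketch. This is not Scylla's code: the function name `quorum_for` is hypothetical, and the only facts taken from the commit message are the formula ((rf / 2) + 1) with integer division and the requirement that RF = 0 yields a quorum of 0.

```python
def quorum_for(rf: int) -> int:
    """Number of replicas needed for a quorum at replication factor rf.

    Hypothetical helper for illustration only; it mirrors the rule stated
    in the commit message, not the actual Scylla implementation.
    """
    if rf < 0:
        raise ValueError("replication factor must be non-negative")
    if rf == 0:
        # Corner case from the commit: with no replicas, the unconditional
        # formula would demand 1 endpoint, so quorum must be 0 instead.
        return 0
    # General case: a strict majority of the replicas.
    return rf // 2 + 1


if __name__ == "__main__":
    for rf in range(6):
        print(f"RF={rf} -> quorum={quorum_for(rf)}")
```

Without the RF = 0 branch the formula would ask for one endpoint from a set of zero replicas, a requirement that can never be met.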


23815 Commits