scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-29 11:10:40 +00:00

Author	SHA1	Message	Date
Benny Halevy	2e93996473	streaming: make_streaming_consumer: close reader on errors Currently, if e.g. find_column_family throws an error, as seen in #8776 when the table was dropped during repair, the reader is not closed. Use a coroutine to simplify error handling and close the reader if an exception is caught. Also, catch an error inside the lambda passed to make_interposer_consumer when making the shared_sstable for streaming, and close the reader there and return an exceptional future early, since the reader will not be moved to sst->write_components, that assumes ownership over it and closes it in all cases. Fixes #8776 Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-06-08 08:50:46 +03:00
Benny Halevy	42028c324c	streaming: make_streaming_consumer: coroutinize returned function To simplify error handling in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-06-08 08:48:33 +03:00
Piotr Sarna	fa29b79c20	transport: close connections when too large requests arrive Too large requests are currently handled by the CQL server by skipping them and sending back an error response. That's however wasteful and dangerous: bogus request sizes will force Scylla to potentially skip gigabytes of data - and skipping is done by simply reading from the socket, so it may result in gigabytes of bandwidth wasted. Even if the request size is not bogus, closing the connection forces users to adjust their request sizes, which should be done anyway. Originally, there was a bug in handling too large requests which only read their headers and then left the connection in a broken, undefined state, trying to interpret the rest of the large request as a next CQL header. It was later fixed to skip the request, but closing the connection is a safer thing to do. Fixes #8798 Closes #8800	2021-06-07 12:23:55 +03:00
Avi Kivity	e6c5a63581	Merge "Fix several issues on transport stop" from Pavel E " There's a bunch of issues with starting and stopping of cql_server with the help of cql_controller. fixes: #8796 tests: manual(start + stop, start + exception on cql_set_state() ) unit not run, they don't mess with transport controller " * 'br-transport-stop-fixes' of https://github.com/xemul/scylla: transport/controller: Stop server on state change failure too transport/controller: Rollback server start on state change failure too transport/controller: Do not leave _server uninitialized transport/controller: Rework try-catch into defers	2021-06-07 11:41:36 +03:00
Alejo Sanchez	bd168d57ff	raft: fix vote reply handling in prevote Do not register a reply to prevote as a real vote Found and authored by @kostja. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20210604122530.1975388-1-alejo.sanchez@scylladb.com>	2021-06-06 19:18:49 +03:00
Tomasz Grabiec	50d64646cd	Merge "raft: replication test fixes and OOP refactor" from Alejo Feature requests, fixes, and OOP refactor of replication_test. Note: all known bugs and hangs are now fixed. A new helper class "raft_cluster" is created. Each move of a helper function to the class has its own commit. New helpers are provided To simplify code, for now only a single apply function can be set per raft_cluster. No tests were using in any other way. In the future, there could be custom apply functions per server dynamically assigned, if this becomes needed. * alejo/raft-tests-replication-02-v3-30: (66 commits) raft: replication test: wait for log for both index and term raft: replication test: reset network at construction raft: replication test: use lambda visitor for updates raft: replication test: move structs into class raft: replication test: move data structures to cluster class raft: replication test: remove shared pointers raft: replication test: move get_states() to raft_cluster raft: replication test: test_server inside raft_cluster raft: replication test: rpc declarative tests raft: replication test: add wait_log raft: replication test: add stop and reset server raft: replication test: disconnect 2 support raft: replication test: explicit node_id naming raft: replication test: move definitions up raft: replication test: no append entries support raft: replication test: fix helper parameter raft: replication test: stop servers out of config raft: replication test: wait log when removing leader from configuration raft: replication test: only manipulate servers in configuration raft: replication test: only cancel rearm ticker for removed server ...	2021-06-06 19:18:49 +03:00
Piotr Sarna	cb17aa1e53	Merge 'test/alternator: rewrite run script to share code with cql-pytest's run script' from Nadav Har'El In this small series, I rewrite test/alternator/run to Python using the utility functions developed for test/cql-pytest. In the future, we should do the same to test/redis/run and test/scylla-gdb/run. The benefit of this rewrite is less code duplication (all run scripts start with the same duplicate code to deal with temporary directories, to run Scylla IP addresses, etc.), but most importantly - in the future fixes we do to cql-pytest (e.g., parameters needed to start Scylla efficiently, how to shut down Scylla, etc.) will appear automatically in alternator test without needing to remember to change both. Another benefit is that test/alternator/run will now be Python, not a shell script. This should make it easier to integrate it into test.py (refs #6212) in the future - if we want to. Closes #8792 * github.com:scylladb/scylla: test/alternator: rewrite test/alternator/run script in Python test/cql-pytest: make test run code more general	2021-06-06 19:18:49 +03:00
Nadav Har'El	fe1fa9d72b	docs: update Alternator's compatibility.md In the last year, four new features were added to DynamoDB which we don't yet support - Kinesis Streams, PartiQL, Contributor Insights and Export to S3. Let's document them as missing Alternator features, and point to the four newly-created issues about these features. Refs #8786 Refs #8787 Refs #8788 Refs #8789 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210603125825.1179171-1-nyh@scylladb.com>	2021-06-06 19:18:49 +03:00
Avi Kivity	872cd8f692	test: adjust copyright statement to use ScyllaDB rather than old name	2021-06-06 19:18:49 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Avi Kivity	b3ce1c8b40	gdb: prepare for Seastar's "smp: allow having multiple instances of the smp class" scylladb/seastar@e6463df8a0 ("smp: allow having multiple instances of the smp class") changes the type of seastar::smp::_qs from a unique_ptr to a regular pointer. Adjust for that change, with a fallback to support older versions. Closes #8784	2021-06-06 19:18:49 +03:00
Nadav Har'El	48ff641f67	Merge 'commitlog: make_checked_file for segments, report and ignore other errors on shutdown' from Benny Halevy Shutdown must never fail, otherwise it may cause hangs as seen in https://github.com/scylladb/scylla/issues/8577. This change wraps the file created in `allocate_segment_ex` in `make_checked_file` so that scylla will abort when failing to write to the commitlog files. In case other errors are seen during shutdown, just log them and continue with shutting down to prevent scylla from hanging. Fixes #8577 Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #8578 * github.com:scylladb/scylla: commitlog: segment_manager::shutdown: abort on errors commitlog: allocate_segment_ex: make_checked_file	2021-06-06 19:18:49 +03:00
Avi Kivity	8a4abe9895	cql3: expression: don't copy expression in has_supporting_index() std::bind() copies the bound parameters for safekeeping. Here this includes expr, which can be quite heavyweight. Use std::ref() to prevent copying. This is safe since the bound expression is executed and discarded before has_supporting_index() returns. Closes #8791	2021-06-06 19:18:49 +03:00
Nadav Har'El	cee0340c89	scripts/pull_github_pr.sh: do not hard-code project name The current pull_github_pr.sh hard-codes the project name "scylladb/scylla". Let's determine it automatically, from the git origin url. This will allow using exactly the same script in other Scylla subprojects, e.g., Seastar. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210318142624.1794419-1-nyh@scylladb.com>	2021-06-06 19:18:49 +03:00
Pavel Solodovnikov	2187a59089	treewide: move `service::cas_request` out from `storage_proxy.hh` And remove all remaining inclusions of `storage_proxy.hh` in the headers. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-06-06 19:18:49 +03:00
Pavel Solodovnikov	e0749d6264	treewide: some random header cleanups Eliminate not used includes and replace some more includes with forward declarations where appropriate. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-06-06 19:18:49 +03:00
Pavel Solodovnikov	142d3b5ad9	cdc: self-sufficient headers fixup Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-06-06 19:18:49 +03:00
Gleb Natapov	bb822c92ab	raft: change raft::rpc api to return void for most sending functions Most RAFT packets are sent very rarely during special phases of the protocol (like election or leader stepdown). The protocol itself does not care if a packet is sent or dropped, so returning futures from their send function does not serve any purpose. Change the raft's rpc interface to return void for all packet types but append_request. We still want to get a future from sending append_request for backpressure purposes since replication protocol is more efficient if there is no packet loss, so it is better to pause a sender than dropping packets inside the rpc. Rpc is still allowed to drop append_requests if overloaded.	2021-06-06 19:18:49 +03:00
Gleb Natapov	f5a54d6c05	raft: move ELECTION_TIMEOUT definition to a public header Move ELECTION_TIMEOUT definition to be visible to outside modules.	2021-06-06 19:18:49 +03:00
Gleb Natapov	87844c0ce1	raft: remove unused clock type definition RAFT uses logical clock now and this define is from older times.	2021-06-06 19:18:49 +03:00
Gleb Natapov	90ea71da54	raft: wait for io and applier fiber to stop before aborting snapshots and waiters IO and applier fibers may update waiters and start new snapshot transfers, so abort() needs to wait for them to stop before proceeding to abort waiters and snapshot transfers.	2021-06-06 19:18:49 +03:00
Yaron Kaikov	6a447db8a8	scylla_util.py: Fix Azure support for machine-image In https://github.com/scylladb/scylla/pull/7807 we added support for Azure instance in Scylla. The following changes are required for machine-image to work: 1) fix wrong metadata URL and update metadata path values (introduced in `f627fcbb0c`) 2) fix naming of functions used by machine-image 3) add missing functions required by machine-image 4) cleanup unused functions Closes #8596	2021-06-06 09:21:23 +03:00
Asias He	2a7b855255	repair: Init repair metrics during startup The _node_ops_metrics is thread local, it is constructed when it is first accessed. If there are no node operations, the metrics will not be shown. To make the metrics more consistent, init during startup. Refs #8311 Closes #8780	2021-06-06 09:21:23 +03:00
Benny Halevy	3f9bad0f0a	test: compound_test: use tests::random For reproducibility. Test: compound_test(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210602061910.286893-2-bhalevy@scylladb.com>	2021-06-06 09:21:23 +03:00
Benny Halevy	40e032ff8b	test: compound_test: switch to seastar test framework Prepare for using tests::random instead of std::rand for reproducibility. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210602061910.286893-1-bhalevy@scylladb.com>	2021-06-06 09:21:23 +03:00
Asias He	9b902fad79	gossiper: Update timestamp for nodes in ack and ack2 msg handler In commit `425e3b1182` (gossip: Introduce direct failure detector), the call to notify_failure_detector inside ack and ack2 msg handler was removed since there is no need to update the old failure detector anymore. However, the timestamp for endpoint_state is also updated inside notify_failure_detector. With the new failure detector we still need the timestamp for endpoint_state. Otherwise, nodes might be removed from gossip wrongly. For example, as we saw in issue #8702: INFO 2021-05-24 22:45:24,713 [shard 0] gossip - FatClient 127.0.60.2 has been silent for 5000ms, removing from gossip To fix, update the timestamp as we did before in ack and ack2 msg handler. Fixes #8702 Closes #8777	2021-06-06 09:21:23 +03:00
Benny Halevy	f081e651b3	memtable_list: rename request_flush to just flush Now that it returns a future that always waits on pending flushes there is no point in calling it `request_flush`. `flush()` is simpler and better describes its function. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-06-06 09:21:23 +03:00
Benny Halevy	4f20cd3bea	memtable_list: rename seal_active_memtable_immediate to seal_active_memtable Now that there's no more seal_active_memtable_delayed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-06-06 09:21:23 +03:00
Benny Halevy	ba65b90b34	memtable_list: get rid of seal_active_memtable_delayed This path is unused since `e5be3352cf`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-06-06 09:21:23 +03:00
Calle Wilund	3b55ef36d1	cf_prop_defs: Fix extensions merge to handle removal Fixes #8773 When refactored for cdc, properties -> extensions merge was modified so it did not handle _removal_ (i.e. an extension function returning null -> no entry in new map). This causes certain enterprise extensions to not be able to disable themselves. Fixed by filtering existing extensions by property keywords. Unit test added. Closes #8774	2021-06-06 09:21:23 +03:00
Nadav Har'El	f22ed3ff5c	test/alternator: reduce very high timeout in one tracing test In test_tracing.py::test_slow_query_log, there was what looked like an innocent 30-second timeout, but this was in fact an 8-minute timeout - because it started with sleeping 1 second, then 2 seconds, then 3, ... until 30 seconds. Such a high timeout is frustrating when trying to debug failures in the test - which is only expected to take 2 seconds (and all of it because of an artificial timeout). So fix the loop to stop iterating after 60 seconds (a compromise between 30 seconds and 8 minutes...), sleeping a constant amount between iterations. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210601150631.1037158-1-nyh@scylladb.com>	2021-06-06 09:21:23 +03:00
Avi Kivity	100d6f4094	build: enable -Wunused-function Also drop a single violation in transport/server.cc. This helps prevent dead code from piling up. Three functions in row_cache_test that are not used in debug mode are moved near their user, and under the same ifdef, to avoid triggering the error. Closes #8767	2021-06-06 09:21:23 +03:00
Benny Halevy	6ce826206a	sstables: use vector empty method rather than size Testing std::vector::empty() is slightly more efficient than testing for `size() > 0`. Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210601115552.155148-2-bhalevy@scylladb.com>	2021-06-06 09:21:23 +03:00
Benny Halevy	0565ba31a1	compaction_info: is_stop_requested: use sstring::empty rather than size `!empty()` is slightly more efficient than `size() > 0`. While at it, mark the function noexcept. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210601115552.155148-1-bhalevy@scylladb.com>	2021-06-06 09:21:23 +03:00
Benny Halevy	948a9da832	table: do_apply: verify that _async_gate is open Applying changes to the memtable after table::stop is prohibited. Verify that by making sure that the _async_gate is still open in `do_apply`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210601055042.41380-1-bhalevy@scylladb.com>	2021-06-06 09:21:23 +03:00
Benny Halevy	82a263f672	database: apply_in_memory: run_when_memory_available under table::run_async Make sure to apply the mutation under the table's _async_gate. Fixes #8790 Test: unit(dev), view_build_test(debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #8794	2021-06-06 09:21:23 +03:00
Calle Wilund	131da30856	table: Always use explicit commitlog discard + clear out rp_set Fixes #8733 If a memtable flush is still pending when we call table::clear(), we can end up doing a "discard-all" call to commitlog, followed by a per-segment-count (using rp_set) _later_. This will foobar our internal usage counts and quite probably cause assertion failures. Fixed by always doing per-memtable explicit discard call. But to ensure this works, since a memtable being flushed remains on memtable list for a while (why?), we must also ensure we clear out the rp_set on discard. Closes #8766	2021-06-06 09:21:23 +03:00
Pavel Emelyanov	0944d69475	repair, streaming: Generalize consumer lambdas Both streaming and repair call the distributed sstables writing with equal lambdas each being ~30 lines of code. The only difference between them is repair might request offstrategy compaction for new sstable. Generalization of these two pieces saves lines of code and speeds the release/repair/row_level.o compilation by half a minute (out of twelve). tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210531133113.23003-1-xemul@scylladb.com>	2021-06-06 09:21:23 +03:00
Lubos Kosco	777771df34	scylla_util.py: Relax GCE setup NVMe device checks We don't want to fail I/O setup if there are more than one NVMe devices mounted as root nor if there are no NVMe devices. Fixes #8032 Closes #8444	2021-06-06 09:21:23 +03:00
Botond Dénes	b0056f88dc	test.py: revamp coverage support Instead of attempting to universally set the proper environment necessary for tests to generate profiling data such that coverage.py can process it, allow each Test subclass to set up the environment as needed by the specific Test variant. With this we now have support for all current test types, including cql, cql-pytest and alternator tests.	2021-06-06 09:21:23 +03:00
Botond Dénes	438391b4cc	scripts/coverage.py: check that --path is a directory To detect a bad --path that would fail coverage generation early.	2021-06-06 09:21:23 +03:00
Botond Dénes	ca91fd0e34	scripts/coverage.py: update main()'s docstring with new --run modifiers And fix a typo while there.	2021-06-06 09:21:23 +03:00
Botond Dénes	2ba3fc2e11	scripts/coverage.py: add --distinct-id parameter Yet another modifier for `--run`, allowing running the same executable multiple times and then generating a coverage report across all runs. This will also be used by test.py for those test suites (cql test) which run the same executable multiple times, with different inputs.	2021-06-06 09:21:23 +03:00
Botond Dénes	b1f46b3693	scripts/coverage.py: add --executable parameter Another modifier for `--run`, allowing to override the test executable path. This is useful when the real test is run through a run-script, like in the case of cql-pytest.	2021-06-06 09:21:23 +03:00
Pavel Emelyanov	76947c829e	transport/controller: Stop server on state change failure too If on stop the set_cql_state() throws the local sharded<cql_server> will be left not stopped and will fail the respective assertion on its destruction. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-04 16:53:21 +03:00
Pavel Emelyanov	f6ef148c76	transport/controller: Rollback server start on state change failure too If set_cql_state() throws the cserver remains started. If this happens on start before the controller stop defer action is scheduled the destruction of controller will fail on assertion that checks the _server must be stopped. Effectively this is the fix of #8796 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-04 16:50:51 +03:00
Pavel Emelyanov	6995e41e64	transport/controller: Do not leave _server uninitialized If an exception happens after sharded<cql_server>.start() the controller's _server pointer is left pointing to stopped sharded server. This makes it impossible to start the server again (via API) since the check for if (_server) will always be true. This is the continuation of the `ae4d5a60` fix. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-04 16:48:26 +03:00
Pavel Emelyanov	12220b74e8	transport/controller: Rework try-catch into defers This is to make further patching simpler. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-04 16:48:12 +03:00
Alejo Sanchez	3e91a8ca0d	raft: replication test: wait for log for both index and term Waiting on index alone does not guarantee correct leader log propagation. This patch also checks the term of the leader's last log entry. This was exposed with occasional problems with packet drops. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-04 08:38:19 -04:00
Alejo Sanchez	545893145e	raft: replication test: reset network at construction Reset network in constructor, not in unrelated function. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-06-04 08:18:32 -04:00
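
Commit 8a4abe9895 above notes that std::bind() copies its bound parameters and that std::ref() avoids the copy. The following is a minimal standalone sketch of that behaviour, not the Scylla code: the `expr` struct, the `mentions` predicate, the copy counter, and both helper functions are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Counts copy-constructions of the hypothetical expr type below.
int expr_copies = 0;

// Hypothetical stand-in for a heavyweight expression object.
struct expr {
    std::string text;
    expr(std::string t) : text(std::move(t)) {}
    expr(const expr& o) : text(o.text) { ++expr_copies; }
};

// Hypothetical predicate taking the expression by const reference.
bool mentions(const expr& e, const std::string& column) {
    return e.text.find(column) != std::string::npos;
}

// Binding the expression by value: std::bind stores its own copy.
int copies_with_plain_bind(const expr& e) {
    expr_copies = 0;
    auto f = std::bind(mentions, e, std::placeholders::_1);
    f("a");
    return expr_copies;
}

// Binding through std::ref: only a reference wrapper is stored.
// This is safe only while e outlives the bound callable, as it does here.
int copies_with_ref(const expr& e) {
    expr_copies = 0;
    auto f = std::bind(mentions, std::ref(e), std::placeholders::_1);
    f("a");
    return expr_copies;
}
```

Calling `copies_with_plain_bind` on an `expr` reports at least one copy, while `copies_with_ref` reports none. The lifetime caveat in the second helper mirrors the safety argument in the commit message: the bound callable is executed and discarded before the enclosing function returns.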


26827 Commits