scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 21:47:10 +00:00

Author	SHA1	Message	Date
Nadav Har'El	c51a41a885	test/cql-pytest: test for multiple restrictions on same column It turns out that there is a difference between how Scylla and Cassandra handle multiple restrictions on the same column - for example "WHERE c = 0 and c >0". Cassandra treats all such cases as invalid queries, whereas Scylla allows them. This test demonstrates this difference (it is marked "scylla_only" because it's a Scylla-only feature), and also verifies that the results of such queries on Scylla are correct - i.e., if the two restrictions conflict the result is empty, and if the two restrictions overlap, the result can be non-empty. The test passes, verifying that although Scylla differs from Cassandra on this, its behavior is correct. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220512165107.644932-1-nyh@scylladb.com>	2022-05-13 08:43:57 +03:00
Avi Kivity	5937b1fa23	treewide: remove empty comments in top-of-files After `fcb8d040` ("treewide: use Software Package Data Exchange (SPDX) license identifiers"), many dual-licensed files were left with empty comments on top. Remove them to avoid visual noise. Closes #10562	2022-05-13 07:11:58 +02:00
Botond Dénes	1f0d3d57eb	Merge 'Convert (almost) all uses of flat_mutation_reader_assertions to v2' from Michael Livshin "Almost" because 2 uses of the v1 asserter remain (as they are deliberate). Closes #10518 * github.com:scylladb/scylla: tests: remove obsolete utility functions tests: less trivial flat_reader_assertions{,_v2} conversions tests: trivial flat_reader_assertions{,_v2} conversions flat_mutation_reader_assertions_v2: improve range tombstone support	2022-05-13 08:04:20 +03:00
Piotr Sarna	eb6f4cc839	Merge 'dependencies: add rust' from Wojciech Mitros The main reason for adding rust dependency to scylla is the wasmtime library, which is written in rust. Although there exist c++ bindings, they don't expose all of its features, so we want to do that ourselves using rust's cxx. The patch also includes an example rust source to be used in c++, and its example use in tests/boost/rust_test. The usage of wasmtime has been slightly modified to avoid duplicate symbol errors, but as a result of adding a Rust dependency, it is going to be removed from `configure.py` completely anyway Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com> Closes #10341 * github.com:scylladb/scylla: docs: document rust tests: add rust example	2022-05-12 15:24:58 +02:00
Wojciech Mitros	4ad012cb6a	tests: add rust example The patch includes an example rust source to be used in c++, and its example use in tests/boost/rust_test. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2022-05-11 16:49:31 +02:00
Nadav Har'El	043b1c7f89	Update seastar submodule. Unfortunately, also requires two changes to Scylla itself to make it still compile - see below * seastar 5e863627...96bb3a1b (18): > install-dependencies: add rocky as a supported distro > circleci: relax docker limits to allow running with new toolchain > core: memory: Add memory::free_memory() also in Debug mode > build: bump up zlib to 1.2.12 > cmake: add FindValgrind.cmake > Merge 'seastar-addr2line: support sct syslogs' from Benny Halevy > rpc: lower log level for 'failed to connect' errors > scripts: Build validation > perftune.py: remove rx_queue_count from mode condition. > memory: add attributes to memalign for compatibility with glibc 2.35 > condition-variable: Fix timeout "when" potentially not killing timer > Merge "tests: perf: measure coroutines performance" from Benny > Merge: Refine COUNTER metrics > Revert "Merge: Refine COUNTER metrics" > reactor: document intentional bitwise-on-bool op in smp_pollfn::poll() > Merge: Refine COUNTER metrics > SLES: additionally check irqbalance.service under /usr/lib > rpc_tester: job_cpu: mark virtual methods override Changes to Scylla also included in this merge: 1. api: Don't export DERIVEs (Pavel Emelyanov) Newer seastar doesn't have DERIVE metrics, but does have REAL_COUNTER one. Teach the collectd getter the change. (for the record: I don't understand how this endpoint works at all, there's a HISTOGRAM metrics out there that would be attempted to get exposed with the v.ui() call which is totally wrong) 2. test: use linux_perf_events.{cc,hh} from Seastar Seastar now has linux_perf_events.{cc,hh}. Remove Scylla's version of the same files and use Seastar's. Without this change, Scylla fails to compile when some source files end up including both versions and seeing double definitions. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-05-11 14:46:30 +02:00
Tomasz Grabiec	f703e8ded5	Merge 'New failure detector for Raft' from Kamil Braun We introduce a new service that performs failure detection by periodically pinging endpoints. The set of pinged endpoints can be dynamically extended and shrunk. To learn about liveness of endpoints, user of the service registers a listener and chooses a threshold - a duration of time which has to pass since the last successful ping in order to mark an endpoint as dead. When an endpoint responds it's immediately marked as alive. Endpoints are identified using abstract integer identifiers. The method of performing a ping is a dependency of the service provided by the user through the `pinger` interface. The implementation of `pinger` is responsible for translating the abstract endpoint IDs to 'real' addresses. For example, production implementation may map endpoint IDs to IP addresses and use TCP/IP to perform the ping, while a test/simulation implementation may use a simulated network that also operates on abstract identifiers. Similarly, the method of measuring time is a dependency provided by the user using the `clock` interface. The service operates on abstract time intervals and timepoints. So, for example, in a production implementation time can be measured using a stopwatch, while in test/simulation we can use a logical clock. The service distributes work across different shards. When an endpoint is added to the set of detected endpoints, the service will choose a shard with the smallest number of workers and create a worker that is responsible for periodically pinging this endpoint on that shard and sending notifications to listeners. We modify the randomized nemesis test to use the new service. The service is sharded, but for simplicity of implementation in the test we implement rpcs and sleeps by routing the requests to shard 0, where logical timers and network live. rpcs are using the existing simulated network and clock using the existing logical timers. 
We also integrate the service with production code. There, `pinger` is implemented using existing GOSSIP_ECHO verb. The gossip echo message requires the node's gossip generation number. We handle this by embedding the pinger implementation inside `gossiper`, and making `gossiper` update the generation number (cached inside the pinger class) periodically. Production `clock` is a simple implementation which uses `std::chrono::steady_clock` and `seastar::sleep_until` underneath. Translating `steady_clock` durations to `direct_fd::clock` durations happens by taking the number of ticks. We connect the group 0 raft server rpc implementation to the new service, so that when servers are added or removed from the group 0 configuration, corresponding endpoints are added to the direct failure detector service. Thus the set of detected endpoints will be equal to the group 0 configuration. On each shard, we register a listener for the service. The listener maintains a set of live addresses; on mark_alive it adds a server to the set and on mark_dead it removes it. This set is then used to implement the `raft::failure_detector` interface, consisting of `is_alive()` function, which simply checks set membership. --- v6: - remove `_alive_start_index`. Instead, keep a map of `bool`s to track liveness of each endpoint. See the code for details (`listeners_liveness` struct and its usage in `ping_fiber()`, `notify_fiber()`, `add/remove_worker`, `add/remove_listener`). 
The diff is easy to read: `f617aeca62..d4b225437c` v5: - renamed `rpc` to `pinger` - replaced `bool` with `enum class endpoint_update` (with values `added` and `removed`) in `_endpoint_updates` - replaced `unsigned` with `shard_id` - fixed definition of `threshold(size_t n)` (it didn't use `n`, but `_alive_start`; fortunately all uses passed `_alive_start` as `n` so the bug wouldn't affect the behavior) - improve `_num_workers` assertions - signal `_alive_start_changed` only when `_alive_start` indeed changed - renamed `{_marked}_alive_start` to `{_marked}_alive_start_index` v4: - rearrange ping_fiber(). Remove the loop at the end of the big `while` which was timing out listeners (after the sleep). Instead: - rely on the loop before the sleep for timing out listeners - before calling ping(), check if there is a timed out listener, if so abandon the ping, immediately proceed to the timing-out-listeners loop, and then immediately proceed to the next iteration of the big `while` (without sleeping) - inline send_mark_dead() and send_mark_alive(); each was used in exactly one place after the rearrangement - when marking alive, instead of repeatedly doing `--_alive_start` and signalling the condition variable, just do `_alive_start = 0` and signal the condition variable once - fix the condition for stopping `endpoint_worker::notify_fiber()`: before, it was `_as.abort_requested()`, now it is `_as.abort_requested() && _alive_start == _fd._listeners.size()`. Indeed, we want to wait for the stopping code (`destroy_worker()`) to set `_alive_start = _fd._listeners.size()` before `notify_fiber()` finishes so `notify_fiber()` can send the final `mark_dead` notifications for this endpoint. 
There was a race before where `notify_fiber()` could finish before it sent those notifications (because it finished as soon as it noticed `_as.abort_requested()`) - fix some waits in the unit test; they depended on particular ordering of tasks by the Scylla reactor, the test could sometimes hang in debug mode which randomizes task order - fix `rpc::ping()` in randomized_nemesis_test so it doesn't give an exceptional discarded future in some cases v3: - fix a race in failure_detector::stop(): we must first wait for _destroy_subscriptions fiber to finish on all shards, only then we can set _impl to nullptr on any shard - invoke_abortable_on was moved from randomized_nemesis_test to raft/helpers - add a unit test (second patch) v2: - rename `direct_fd` namespace to `direct_failure_detector` - move gms/direct_failure_detector.{cc,hh} to direct_failure_detector/failure_detector.{cc,hh} - cleaned license comments - removed _mark_queue for sending notifications from ping_fiber() to notify_fiber(). Instead: - _listeners is now a boost::container::flat_multimap (previously it was std::multimap) - _alive_start is no longer an iterator to _listeners, but an index (size_t) - _mark_queue was replaced with a second index to _listeners, _marked_alive_start, together with a condition variable, _alive_start_changed - ping_fiber() signals _alive_start_changed when it changes _alive_start - notify_fiber() waits on _alive_start_changed. 
When it wakes up, it compares _marked_alive_start to _alive_start, sends notifications to listeners appropriately, and updates _marked_alive_start - replacing _mark_queue with index + condition variable allowed some better exception specifications: send_mark_alive and send_mark_dead are now noexcept, ping_fiber() is specified to not return exceptional futures other than sleep_aborted which can only happen when we destroy the worker (previously, ping_fiber() could silently stop due to exception happening when we insert to _mark_queue - it could probably only be bad_alloc, but still) - _shard_workers is now unordered_map<endpoint_id, endpoint_worker> instead of unordered_map<endpoint_id, unique_ptr<endpoint_worker>> (after learning how to construct map values in place - using either `emplace`+`forward_as_tuple` or `try_emplace`) - `failure_detector::impl::add_endpoint` now gives strong exception guarantee: if an exception is thrown, no state changes - same for `failure_detector::impl::remove_endpoint` - `failure_detector::impl::create_worker` now uses `on_internal_error` when it detects that there is a worker for this endpoint already - thanks to the strong exception guarantees of `add_endpoint` and `remove_endpoint` this should never happen - comment at _num_workers definition why we maintain this statistic (to pick a shard with smallest number of workers) - remove unnecessary `if (_as.abort_requested())` in `ping_fiber()` - in ping_fiber(), after a ping, we send notifications to listeners which we know will time-out before the next ping starts. Before, we would sleep until the threshold is actually passed by the clock. Now we send it immediately - we know ahead of time that the listener will time-out and we can notify it immediately. 
- due to above, comment at `register_listener` was adjusted, with the following note added: "Note: the `mark_dead` notification may be sent earlier if we know ahead of time that `threshold` will be crossed before the next `ping()` can start." - `register_listener` now takes a `listener&`, not `listener` - at `register_listener` comment why we allow different thresholds (second to last paragraph) - at `register_listener` mention that listeners can be registered on any shard (last paragraph) - add protected destructors to rpc, clock, listener, and mention that these objects are not owned/destroyed by `failure_detector`. - replaced _endpoint_queue (seastar::queue<pair<endpoint_id, bool>>) with unordered_map<endpoint_id, bool> + condition variable. When user calls add/remove_endpoint, an entry is inserted to this map, or existing entry is updated, and the condition variable is signaled. update_endpoint_fiber() waits on the condition variable, performs the add/remove operation, and removes entries from this map. Compared to the previous solution: - the new solution has at most one entry for a given endpoint, so the number of entries is bounded by the number of different endpoints (so in the main Scylla use case, by the number of different nodes that ever exist); the previous solution could in theory have a backlog of unprocessed events, with updates for a given endpoint appearing multiple times in the queue at once - when the add/remove operation fails in update_endpoint_fiber(), we don't remove the entry from the map so the operation can be retried later. Previously we would always remove the entry from the queue so it doesn't grow too big in presence of failures. - when the add/remove operation fails in update_endpoint_fiber(), we sleep for 10ping_period before retrying. 
Note that this codepath should not be reached in practice, it can basically only happen on bad_alloc - commented that `clock::sleep_until` should signalize aborts using `sleep_aborted` - `clock::now()` is `noexcept` - `add/remove_endpoint` can be called after `stop()`, they just won't do anything in that case. Reason: next item - in randomized_nemesis_test, stop failure detector before raft server (it was the other way before), so it stops using server's RPC before server is aborted. Before, the log was spammed with errors from failure detector because failure detector was getting gate_closed_exceptions from the RPC when the server was stopped. A side effect is that the raft server may continue adding/removing endpoints when the failure detector is stopped, which is fine due to above item - randomized_nemesis_test: direct_fd_clock::sleep_until translates abort_requested_exception to sleep_aborted (so sleep_until satisfies the interface specification) - message/rpc_protocol_impl: send_message_abortable: if abort_source::subscribe returns null, immediately throw abort_requested_exception (before we would send the message out and not react to an abort if it happened before we were called) - rebase Closes #10437 * github.com:scylladb/scylla: service: raft: remove `raft_gossip_failure_detector` service: raft: raft_group_registry: use direct failure detector notifications for raft server liveness service: raft: add/remove direct failure detector endpoints on group 0 configuration changes main: start direct failure detector service messaging_service: abortable version of `send_gossip_echo` message: abortable version of `send_message` test: raft: randomized_nemesis_test: remove old failure_detector test: raft: randomized_nemesis_test: use `direct_failure_detector::failure_detector` test: raft: randomized_nemesis_test: ping all shards on each tick test: unit test for new failure detector service direct_failure_detector: introduce new failure detector service	2022-05-11 14:46:27 +02:00
Michael Livshin	1e690e6773	tests: remove obsolete utility functions Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-10 22:10:40 +03:00
Michael Livshin	864882253a	tests: less trivial flat_reader_assertions{,_v2} conversions Dealing with the handful of tests that check range tombstones in interesting ways and need more than search-and-replace. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-10 22:10:40 +03:00
Michael Livshin	3cc2343775	tests: trivial flat_reader_assertions{,_v2} conversions (Which entails temporary cut-and-pasting some utility functions) Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-10 22:10:40 +03:00
Michael Livshin	51cf84e8c9	flat_mutation_reader_assertions_v2: improve range tombstone support * Track the active range tombstone. * Add `may_produce_tombstones()`. * Flesh out `produces_row_with_key()`. * Add more trace logs. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-10 22:10:40 +03:00
Benny Halevy	fc79787863	test: rest_api: storage_service: verify_snapshot_details: add truncate Truncate the test table and verify that the 'live' snapshot size is now non-zero. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-05-10 10:45:14 +03:00
Benny Halevy	e1d58d4422	database: add snapshot_on_all And move the logic from snapshot-ctl down to the replica::database layer. A following patch will move the flush phase from the replica::table::snapshot layer out to the caller. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-05-10 10:45:14 +03:00
Benny Halevy	aa127a2dbb	snapshot-ctl: run_snapshot_modify_operation: reject views and secondary index using the schema Detecting a secondary index by checking for a dot in the table name is wrong as tables generated by Alternator may contain a dot in their name. Instead detect both materialized views and secondary indexes using the schema()->is_view() method. Fixes #10526 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-05-10 10:44:52 +03:00
Benny Halevy	e95ecbbea6	test: rest_api: storage_service: add test_storage_service_snapshot Test the snapshot operations via the rest api. Added test/rest_api/rest_util.py with new_test_snapshot that creates a new test snapshot and automagically deletes it when the `with` block is exited, similar to new_test_keyspace and new_test_table. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-05-10 09:56:44 +03:00
Benny Halevy	5b4eb44795	database: add flush_on_all variants Use by api layer. Will be used in a later patch to flush on all shards before taking a snapshot. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-05-10 09:56:44 +03:00
Benny Halevy	05c7f4b832	test: rest_api: add test_storage_service_flush Add a basic rest_api test for keyspace_flush. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-05-10 09:56:44 +03:00
Nadav Har'El	1c6163d51f	Merge 'cql3: expr: allow bind markers in collection literals' from Michał Sala Allowing bind markers in collection literals is a change which causes minor differences in behavior between Scylla and Cassandra. Despite such an undesirable effect, I think allowing them is a good idea because it makes [refactoring work made by cvybhu](https://github.com/scylladb/scylla/pull/10409) easier - `469d03f8c2`. Also, making Scylla accept a superset of valid Cassandra cql expressions does not make us less compatible (maybe apart from test suite compatibility). Closes #10457 * github.com:scylladb/scylla: test/boost: cql_query_test: allow bound variables in test_list_of_tuples_with_bound_var test/boost: cql_query_test: test bound variables in collection literals cql3: expr: do not allow unset values inside collections cql3: expr: prepare_expr: allow bind markers in collection literals	2022-05-09 19:15:22 +03:00
Botond Dénes	fd27fbfe64	Merge "Add user types carrier helper" from Pavel Emelyanov " There's a cql_type_parser::parse() method that needs to get user types for a keyspace by its name. For this it uses the global storage proxy instance as a place to get database from. This set introduces an abstract user_types_storage helper object that's responsible in providing the user types for the caller. This helper, in turn, is provided to the parse() method by the database itself or by the schema_ctxt object that needs parse() to unfreeze schemas and doesn't have database at those times. This removes one more get_storage_proxy() call. " * 'br-user-types-storage' of https://github.com/xemul/scylla: cql_type_parser: Require user_types_storage& in parse() schame_tables: Add db/ctxt args here and there user_types: Carry storage on database and schema_ctxt data_dictionary: Introduce user types storage	2022-05-09 17:38:52 +03:00
Raphael S. Carvalho	48e3117ebc	compaction: move propagate_replacement() into private namespace propagate_replacement() is an internal function that shouldn't be in the public interface. No one besides a unit test for incremental compaction needs it. In the future, I want to revisit the incremental compaction unit test to stop using it and only rely on public interfaces. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220506171647.81063-1-raphaelsc@scylladb.com>	2022-05-09 16:49:50 +03:00
Kamil Braun	7e4bb68061	service: raft: add/remove direct failure detector endpoints on group 0 configuration changes We connect the group 0 raft server rpc implementation to the new direct failure detector service, so that when servers are added or removed from the group 0 configuration, corresponding endpoints are added to the direct failure detector service. Thus the set of detected endpoints will be equal to the group 0 configuration. This causes the failure detector service to start pinging endpoints, but no listeners are registered yet. The following commit changes that.	2022-05-09 15:31:19 +02:00
Kamil Braun	38f65e5a2e	main: start direct failure detector service We add the new direct failure detector to the list of services started in the Scylla process. To start the service, we need an implementation of `pinger` and `clock`. `pinger` is implemented using existing GOSSIP_ECHO verb. The gossip echo message requires the node's gossip generation number. We handle this by embedding the pinger implementation inside `gossiper`, and making `gossiper` update the generation number (cached inside the pinger class) periodically. `clock` is a simple implementation which uses `std::chrono::steady_clock` and `seastar::sleep_until` underneath. Translating `steady_clock` durations to `direct_failure_detector::clock` durations happens by taking the number of ticks. The service is currently not used, just initialized; no endpoints are added and no listeners are registered yet, but the following commits change that.	2022-05-09 13:14:42 +02:00
Kamil Braun	c15f3a9698	test: raft: randomized_nemesis_test: remove old failure_detector No longer used. Split from the previous commit for a better diff.	2022-05-09 13:14:41 +02:00
Kamil Braun	915d329f1f	test: raft: randomized_nemesis_test: use `direct_failure_detector::failure_detector` Until now the nemesis test used its own failure detector implementation which used one-way heartbeats. Switch it to use the new direct failure detection service, which will also be used in production code. Integrating it does require some work however as we need to implement the `pinger` and `clock` interfaces for the failure detector. The service is sharded, but for simplicity of implementation we implement rpcs and sleeps by routing the requests to shard 0, where logical timers and network live.	2022-05-09 13:14:41 +02:00
Kamil Braun	e5fc0681d9	test: raft: randomized_nemesis_test: ping all shards on each tick Right now the test is running entirely on shard 0, but we want to introduce a sharded service to the test. The initial naive attempt of doing that failed because the test would time out (reach the tick limit) before any work distributed to other shards could even start. The solution in this commit solves that by synchronizing the shards on each tick. When the test is run with smp=1, the behavior is as before.	2022-05-09 13:14:41 +02:00
Kamil Braun	e4f85cf425	test: unit test for new failure detector service	2022-05-09 13:14:41 +02:00
Avi Kivity	94f677b790	Merge 'sstables/index_reader: short-circuit fast-forward-to when at EOF' from Botond Dénes Attempting to call advance_to() on the index, after it is positioned at EOF, can result in an assert failure, because the operation results in an attempt to move backwards in the index-file (to read the last index page, which was already read). This only happens if the index cache entry belonging to the last index page is evicted, otherwise the advance operation just looks-up said entry and returns it. To prevent this, we add an early return conditioned on eof() to all the partition-level advance-to methods. A regression unit test reproducing the above described crash is also added. Fixes: #10403 Closes #10491 * github.com:scylladb/scylla: sstables/index_reader: short-circuit fast-forward-to when at EOF test/lib/random_schema: add a simpler overload for fixed partition count	2022-05-08 14:17:40 +03:00
Avi Kivity	1ecb87b7a8	Merge 'Harden table truncate' from Benny Halevy This series fixes a few issues on the table truncate path: - "memtable_list: safely futurize clear_and_add" - reinstates an async version of table::clear_and_add, just safe against #10421 - a unit test reproducing #10421 was added to make sure the new version is indeed safe. - "table: clear: serialize with ongoing flush" fixes #10423 - a unit test reproducing #10423 was added Fixes #10281 Fixes #10423 Test: unit(dev), database_test. test_truncate_without_snapshot_during_{writes,flushes} (debug) Closes #10424 * github.com:scylladb/scylla: test: database_test: add test_truncate_without_snapshot_during_writes memtable_list: safely futurize clear_and_add table: clear: serialize with ongoing flush	2022-05-08 11:30:21 +03:00
Botond Dénes	e8f3d7dd13	sstables/index_reader: short-circuit fast-forward-to when at EOF Attempting to call advance_to() on the index, after it is positioned at EOF, can result in an assert failure, because the operation results in an attempt to move backwards in the index-file (to read the last index page, which was already read). This only happens if the index cache entry belonging to the last index page is evicted, otherwise the advance operation just looks-up said entry and returns it. To prevent this, we add an early return conditioned on eof() to all the partition-level advance-to methods. A regression unit test reproducing the above described crash is also added.	2022-05-05 14:42:37 +03:00
Botond Dénes	98f3d516a2	test/lib/random_schema: add a simpler overload for fixed partition count Some tests want to generate a fixed amount of random partitions, make their life easier.	2022-05-05 14:33:37 +03:00
Benny Halevy	c9612855c7	query: coroutinize to_data_query_result Reduce stalls by maybe yielding in-between partitions, and by awaiting unfreeze_gently where possible. Refs #10038 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-05-05 13:32:25 +03:00
Benny Halevy	e12454f175	frozen_mutation: add unfreeze_gently And use in data_read_resolver::resolve Fixes #2361 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-05-05 13:32:25 +03:00
Pavel Emelyanov	2104d90dd0	user_types: Carry storage on database and schema_ctxt The user types storage is needed in cql_type_parser::parse which is in turn called with either replica::database or schema_ctxt at hand. To facilitate the former case replica::database has its own user types storage created in database constructor. The latter case is a bit trickier. In many cases the ctxt is created as a temporary object and the database is available at those places. Also the ctxt object lives on the schema_registry instance which doesn't have database nearby. However, that ctxt lifetime is the same as the registry instance one and when it's created there's a database at hand (it's the database constructor that calls schema_registry.init() passing "this" into it). Thus, the solution is to make database's user types storage be a shared pointer that's shared between database itself and all the ctxts out there including the one that lives on schema_registry instance. When database goes away it .deactivate()s its user types storage so that any ctxts that may share it stay on the safe side and don't use database after free. This part will go away when the schema_registry will be deglobalized. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-05 13:06:04 +03:00
Calle Wilund	78350a7e1b	cdc: Ensure columns removed from log table are registered as dropped If we are redefining the log table, we need to ensure any dropped columns are registered in "dropped_columns" table, otherwise clients will not be able to read data older than now. Includes unit test. Should probably be backported to all CDC enabled versions. Fixes #10473 Closes #10474	2022-05-04 14:19:39 +02:00
Botond Dénes	4440d4b41a	Merge "De-globalize gossiper" from Pavel Emelyanov " - Alternator gets gossiper for its proxy dependency - Forward service method that takes global gossiper can re-use proxy method (forward -> proxy reference is already there) - Table code is patched to require gossiper argument - Snitch gets a dependency reference on snitch_ptr and some extra care for snitch driver vs snitch-ptr interaction and gossip test - Cql test env should carry gossiper reference on-board - Few places can re-use the existing local gossiper reference - Scylla-gdb needs to get gossiper from debug namespace and needs _not_ to get feature service from gossiper " * 'br-gossiper-deglobal-2' of https://github.com/xemul/scylla: code: De-globalize gossiper scylla-gdb, main: Get feature service without gossiper help test: Use cql-test-env gossiper cql test env: Keep gossiper reference on board code: Use gossiper reference where possible snitch: Use local gossiper in drivers snitch: Keep gossiper reference test: Remove snitch from manual gossip test gossiper: Use container() instead of the global pointer main, cql_test_env: Start snitch later snitch: Move snitch_base::get_endpoint_info() forward service: Re-use proxy's helper with duplicated code table: Don't use global gossiper alternator: Don't use global gossiper	2022-05-03 15:56:07 +03:00
Nadav Har'El	6fb762630b	cql-pytest: translate Cassandra's tests for SELECT operations This is a translation of Cassandra's CQL unit test source file validation/operations/SelectTest.java into our cql-pytest framework. This large test file includes 78 tests for various types of SELECT operations. Four additional tests require UDF in Java syntax, and were skipped. All 78 tests pass on Cassandra. 25 of the tests fail on Scylla reproducing 3 already known Scylla issues and 8 previously-unknown issues: Previously known issues: Refs #2962: Collection column indexing Refs #4244: Add support for mixing token, multi- and single-column restrictions Refs #8627: Cleanly reject updates with indexed values where value > 64k Newly-discovered issues: Refs #10354: SELECT DISTINCT should allow filter on static columns, not just partition keys Refs #10357: Spurious static row returned from query with filtering, despite not matching filter Refs #10358: Comparison with UNSET_VALUE should produce an error Refs #10359: "CONTAINS NULL" and "CONTAINS KEY NULL" restrictions should match nothing Refs #10361: Null or UNSET_VALUE subscript should generate an invalid request error Refs #10366: Enforce Key-length limits during SELECT Refs #10443: SELECT with IN and ORDER BY orders rows per partition instead of for the entire response Refs #10448: The CQL token() function should validate its parameters Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #10449	2022-05-03 11:45:05 +03:00
Pavel Emelyanov	e80adbade3	code: De-globalize gossiper No code uses global gossiper instance, it can be removed. The main and cql-test-env code now have their own real local instances. This change also requires adding the debug:: pointer and fixing the scylle-gdb.py to find the correct global location. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:57:40 +03:00
Pavel Emelyanov	b25fc29801	test: Use cql-test-env gossiper There's yet another -test-env -- the alternator- one -- which needs gossiper. It now uses global reference, but can grab gossiper reference from the cql-test-env which partitipates in initialization. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:57:40 +03:00
Pavel Emelyanov	b0544ba7bd	cql test env: Keep gossiper reference on board The reference is already available at the env initialization, but it's not kept on the env instance itself. Will be used by the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:57:40 +03:00
Pavel Emelyanov	4bea0b7491	code: Use gossiper reference where possible Some places in the code have function-local gossiper reference but continue to use global instance. Re-use the local reference (it's going to become sharded<> instance soon). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:57:40 +03:00
Pavel Emelyanov	38c77d0d85	snitch: Keep gossiper reference The reference is put on the snitch_ptr because this is the sharded<> thing and because gossiper reference is the same for different snitch drivers. Also, getting gossiper from snitch_ptr by driver will look simpler than getting it from any base class. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:57:40 +03:00
Pavel Emelyanov	52fc4d6b22	test: Remove snitch from manual gossip test It's not in use out there Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:57:40 +03:00
Pavel Emelyanov	2d32c47d0d	main, cql_test_env: Start snitch later Snitch depends on gossiper and system keyspace, so it needs to be started after those two do. fixes #10402 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:57:32 +03:00
Botond Dénes	53c66fe24a	Merge "Make LCS reshape and major more efficient by picking the ideal output level" from Raphael S. Carvalho " Today, both operations are picking the highest level as the ideal level for placing the output, but the size of input should be used instead. The formula for calculating the ideal level is: ceil(log base(fan_out) of (total_input_size / max_fragment_size)) where fan_out = 10 by default, total_input_size = total size of input data and max_fragment_size = maximum size for fragment (160M by default) such that 20 fragments will be placed at level 2, as level 1 capacity is 10 fragments only. By placing the output in the incorrect level, tons of backlog will be generated for LCS because it will either have to promote or demote fragments until the levels are properly balanced. " * 'optimize_lcs_major_and_reshape/v2' of https://github.com/raphaelsc/scylla: compaction: LCS: avoid needless work post major compaction completion compaction: LCS: avoid needless work post reshape completion compaction: LCS: extract calculation of ideal level for input compaction: LCS: Fix off-by-one in formula used to calculate ideal level	2022-05-02 10:16:09 +03:00
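The ideal-level formula quoted in the merge message above can be sketched as follows. This is only an illustration, not the code from the series: the function name `ideal_level`, the reading of "160M" as 160 MiB, and the clamp to level 0 for inputs that fit in a single fragment are all assumptions. It uses integer arithmetic instead of a floating-point logarithm so that exact powers of the fan-out land on the expected level.

```python
def ideal_level(total_input_size: int,
                max_fragment_size: int = 160 * 1024 * 1024,  # assumed: "160M" read as MiB
                fan_out: int = 10) -> int:
    """Smallest level L with fan_out**L >= number of fragments, i.e.
    ceil(log base fan_out of (total_input_size / max_fragment_size)),
    clamped to 0 for inputs of at most one fragment (an assumption)."""
    if total_input_size <= 0:
        return 0
    # Number of fragments needed to hold the input (ceiling division).
    fragments = -(-total_input_size // max_fragment_size)
    level = 0
    capacity = 1  # fan_out ** level
    while capacity < fragments:
        capacity *= fan_out
        level += 1
    return level


FRAGMENT = 160 * 1024 * 1024
print(ideal_level(10 * FRAGMENT))  # 10 fragments fit in level 1
print(ideal_level(20 * FRAGMENT))  # 20 fragments exceed level 1, so level 2
```

With the defaults, 10 fragments map to level 1 and 20 fragments map to level 2, matching the example in the message.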
Avi Kivity	5169ce40ef	Merge 'loading_cache: force minimum size of unprivileged ' from Piotr Grabowski This series enforces a minimum size of the unprivileged section when performing `shrink()` operation. When the cache is shrunk, we still drop entries first from unprivileged section (as before this commit), however, if this section is already small (smaller than `max_size / 2`), we will drop entries from the privileged section. This is necessary, as before this change the unprivileged section could be starved. For example if the cache could store at most 50 entries and there are 49 entries in privileged section, after adding 5 entries (that would go to unprivileged section) 4 of them would get evicted and only the 5th one would stay. This caused problems with BATCH statements where all prepared statements in the batch have to stay in cache at the same time for the batch to correctly execute. To correctly check if the unprivileged section might get too small after dropping an entry, `_current_size` variable, which tracked the overall size of cache, is changed to two variables: `_unprivileged_section_size` and `_privileged_section_size`, tracking section sizes separately. New tests are added to check this new behavior and bookkeeping of the section sizes. A test is added, that sets up a CQL environment with a very small prepared statement cache, reproduces issue in #10440 and stresses the cache. Fixes #10440. Closes #10456 * github.com:scylladb/scylla: loading_cache_test: test prepared stmts cache loading_cache: force minimum size of unprivileged loading_cache: extract dropping entries to lambdas loading_cache: separately track size of sections loading_cache: fix typo in 'privileged'	2022-05-01 19:36:35 +03:00
Eliran Sinvani	e0c7178e75	query_processor: remove default internal query caching behavior When executing internal queries, it is important that the developer will decide if to cache the query internally or not since internal queries are cached indefinitely. Also important is that the programmer will be aware if caching is going to happen or not. The code contained two "groups" of `query_processor::execute_internal`, one group has caching by default and the other doesn't. Here we add overloads to eliminate default values for caching behaviour, forcing an explicit parameter for the caching values. All the call sites were changed to reflect the original caching default that was there. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2022-05-01 08:33:55 +03:00
Avi Kivity	7f1e368e92	Merge 'replica/database: drop_column_family(): properly cleanup stale querier cache entries' from Botond Dénes Said method has to evict all querier cache entries, belonging to the to-be-dropped table. This is already the case, but there was a window where new entries could sneak in, causing a stale reference to the table to be de-referenced later when they are evicted due to TTL. This window is now closed, the entries are evicted after the method has waited for all ongoing operations on said table to stop. Fixes: #10450 Closes #10451 * github.com:scylladb/scylla: replica/database: drop_column_family(): drop querier cache entries after waiting for ops replica/database: finish coroutinizing drop_column_family() replica/database: make remove(const column_family&) private	2022-04-29 22:06:51 +03:00
Tomasz Grabiec	dbef83af71	Merge 'raft: fix startup hangs' from Kamil Braun Fix hangs on Scylla node startup with Raft enabled that were caused by: - a deadlock when enabling the USES_RAFT feature, - a non-voter server forgetting who the leader is and not being able to forward a `modify_config` entry to become a voter. Read the commit messages for details. Fixes: #10379 Refs: #10355 Closes #10380 * github.com:scylladb/scylla: raft: actively search for a leader if it is not known for a tick duration raft: server: return immediately from `wait_for_leader` if leader is known service: raft: don't support/advertise USES_RAFT feature	2022-04-29 19:47:10 +02:00
Piotr Grabowski	6537dc6126	loading_cache_test: test prepared stmts cache Add a new test that sets up a CQL environment with a very small prepared statements cache. The test reproduces a scenario described in #10440, where a privileged section of prepared statement cache gets large and that could possibly starve the unprivileged section, making it impossible to execute BATCH statements. Additionally, at the end of the test, prepared statements/"simulated batches" with prepared statements are executed a random number of times, stressing the cache. To create a CQL environment with small prepared cache, cql_test_config is extended to allow setting custom memory_config value.	2022-04-29 19:22:55 +02:00
Piotr Grabowski	3f2224a47f	loading_cache: force minimum size of unprivileged This patch enforces a minimum size of unprivileged section when performing shrink() operation. When the cache is shrunk, we still drop entries first from unprivileged section (as before this commit), however if this section is already small (smaller than max_size / 2), we will drop entries from the privileged section. For example if the cache could store at most 50 entries and there are 49 entries in privileged section, after adding 5 entries (that would go to unprivileged section) 4 of them would get evicted and only the 5th one would stay. This caused problems with BATCH statements where all prepared statements in the batch have to stay in cache at the same time for the batch to correctly execute. New tests are added to check this behavior and bookkeeping of section sizes. Fixes #10440.	2022-04-29 19:19:04 +02:00
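The shrink policy described in the loading_cache commits above can be illustrated with a toy model. This is a rough sketch under stated assumptions, not the real `loading_cache`: the class `ToyCache`, its `enforce_min_unprivileged` flag, unit-sized entries, and oldest-first eviction within each section are all hypothetical simplifications.

```python
from collections import deque


class ToyCache:
    """Toy two-section cache; every entry is assumed to have size 1."""

    def __init__(self, max_size: int, enforce_min_unprivileged: bool = True):
        self.max_size = max_size
        self.enforce = enforce_min_unprivileged
        self.privileged = deque()    # oldest entry on the left
        self.unprivileged = deque()  # oldest entry on the left

    def add(self, key, privileged: bool = False) -> None:
        (self.privileged if privileged else self.unprivileged).append(key)
        self.shrink()

    def shrink(self) -> None:
        while len(self.privileged) + len(self.unprivileged) > self.max_size:
            # The unprivileged section is protected while it is smaller
            # than max_size / 2 and there is a privileged entry to drop.
            protect = (self.enforce
                       and len(self.unprivileged) < self.max_size / 2
                       and len(self.privileged) > 0)
            if self.unprivileged and not protect:
                self.unprivileged.popleft()
            else:
                self.privileged.popleft()


def run(enforce: bool) -> ToyCache:
    # Scenario from the commit message: 49 privileged entries, then 5 new ones.
    cache = ToyCache(max_size=50, enforce_min_unprivileged=enforce)
    for i in range(49):
        cache.add(f"p{i}", privileged=True)
    for i in range(5):
        cache.add(f"u{i}")
    return cache


old, new = run(enforce=False), run(enforce=True)
print(len(old.privileged), list(old.unprivileged))
print(len(new.privileged), list(new.unprivileged))
```

Without the minimum, only the fifth new entry survives in the unprivileged section. With it, all five survive and the four evictions come from the privileged section, which is what lets a batch of prepared statements stay cached together.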

3073 Commits