scylladb

Author	SHA1	Message	Date
Botond Dénes	fd69add579	test: move away from v1 reader from mutations Use the v2 variant instead.	2022-03-31 10:36:23 +03:00
Avi Kivity	39504dea61	Merge "Convert result builders to v2" from Botond " Namely the query result writer and the reconcilable result builder, used for building results for regular queries and mutation queries (used in read repair) respectively. With this, there are no users left for the v1 output of the compactor, so we remove that, making the compactor v2 all-the-way (and simpler). This means that for regular queries, a downgrade phase is eliminated completely, as regular queries don't store range tombstone in their result, so no need to convert them. Tests: unit(dev, release, debug) " * 'result-builders-v2/v1' of https://github.com/denesb/scylla: reconcilable_result_builder: remove v1 support query_result_builder: remove v1 support mutation_compactor: drop v1 related code-paths mutation_compactor: drop v1 support altogether from the API tree: migrate to the v2 consumer APIs test/boost/mutation_test: remove v1 specific test code querier: switch to v2 compactor output reconcilable_result_builder: add v2 support query_result_writer: add v2 support query_result_builder: make consume(range_tombstone) noop	2022-03-15 14:32:58 +02:00
Mikołaj Sielużycki	1d84a254c0	flat_mutation_reader: Split readers by file and remove unnecessary includes. The flat_mutation_reader files were conflated and contained multiple readers, which were not strictly necessary. Splitting optimizes both iterative compilation times, as touching rarely used readers doesn't recompile large chunks of codebase. Total compilation times are also improved, as the size of flat_mutation_reader.hh and flat_mutation_reader_v2.hh have been reduced and those files are included by many file in the codebase. With changes real 29m14.051s user 168m39.071s sys 5m13.443s Without changes real 30m36.203s user 175m43.354s sys 5m26.376s Closes #10194	2022-03-14 13:20:25 +02:00
Botond Dénes	0b5217052d	querier: switch to v2 compactor output The change is mostly mechanical: update all compactor instances to the _v2 variant and update all call-sites, of which there is not that many. As a consequence of this patch, queries -- both single-partition and range-scans -- now do the v2->v1 conversion in the consumers, instead of in the compactor.	2022-03-11 09:24:05 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	bbad8f4677	replica: move ::database, ::keyspace, and ::table to replica namespace Move replica-oriented classes to the replica namespace. The main classes moved are ::database, ::keyspace, and ::table, but a few ancillary classes are also moved. There are certainly classes that should be moved but aren't (like distributed_loader) but we have to start somewhere. References are adjusted treewide. In many cases, it is obvious that a call site should not access the replica (but the data_dictionary instead), but that is left for separate work. scylla-gdb.py is adjusted to look for both the new and old names.	2022-01-07 12:04:38 +02:00
Michael Livshin	a1b8ba23d2	reader_concurrency_semaphore: convert to flat_mutation_reader_v2 Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2021-12-21 11:26:17 +02:00
Botond Dénes	64bb48855c	flat_mutation_reader: revamp flat_mutation_reader_from_mutations() Add schema parameter so that: * Caller has better control over schema -- especially relevant for reverse reads where it is not possible to follow the convention of passing the query schema which is reversed compared to that of the mutations. * Now that we don't depend on the mutations for the schema, we can lift the restriction on mutations not being empty: this leads to safer code. When the mutations parameter is empty, an empty reader is created. Add "make_" prefix to follow convention of similar reader factory functions. Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20211115155614.363663-1-bdenes@scylladb.com>	2021-11-15 17:58:46 +02:00
Botond Dénes	42b677ef6f	querier: consume_page(): remove now unused max_size parameter	2021-09-29 12:15:48 +03:00
Kamil Braun	fbb83dd5ca	reader_concurrency_semaphore: remove default parameter values from constructors It's easy to forget about supplying the correct value for a parameter when it has a default value specified. It's safer if 'production code' is forced to always supply these parameters manually. The default values were mostly useful in tests, where some parameters didn't matter that much and where the majority of uses of the class are. Without default values adding a new parameter is a pain, forcing one to modify every usage in the tests - and there are a bunch of them. To solve this, we introduce a new constructor which requires passing the `for_tests` tag, marking that the constructor is only supposed to be used in tests (and the constructor has an appropriate comment). This constructor uses default values, but the other constructors - used in 'production code' - do not.	2021-09-14 12:20:28 +02:00
Benny Halevy	4476800493	flat_mutation_reader: get rid of timeout parameter Now that the timeout is taken from the reader_permit. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Benny Halevy	9b0b13c450	reader_concurrency_semaphore: adjust reactivated reader timeout Update the reader's timeout where needed after unregistering inactive_read. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Benny Halevy	fe479aca1d	reader_permit: add timeout member To replace the timeout parameter passed to flat_mutation_reader methods. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 14:29:44 +03:00
Botond Dénes	c07db00b70	test: move away from make_permit() Use the most appropriate up-front admission variant.	2021-07-14 17:19:02 +03:00
Avi Kivity	a55b434a2b	treewide: extent copyright statements to present day	2021-06-06 19:18:49 +03:00
Avi Kivity	50f3bbc359	Merge "treewide: various header cleanups" from Pavel S " The patch set is an assorted collection of header cleanups, e.g: * Reduce number of boost includes in header files * Switch to forward declarations in some places A quick measurement was performed to see if these changes provide any improvement in build times (ccache cleaned and existing build products wiped out). The results are posted below (`/usr/bin/time -v ninja dev-build`) for 24 cores/48 threads CPU setup (AMD Threadripper 2970WX). Before: Command being timed: "ninja dev-build" User time (seconds): 28262.47 System time (seconds): 824.85 Percent of CPU this job got: 3979% Elapsed (wall clock) time (h:mm:ss or m:ss): 12:10.97 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 2129888 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 1402838 Minor (reclaiming a frame) page faults: 124265412 Voluntary context switches: 1879279 Involuntary context switches: 1159999 Swaps: 0 File system inputs: 0 File system outputs: 11806272 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 After: Command being timed: "ninja dev-build" User time (seconds): 26270.81 System time (seconds): 767.01 Percent of CPU this job got: 3905% Elapsed (wall clock) time (h:mm:ss or m:ss): 11:32.36 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 2117608 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 1400189 Minor (reclaiming a frame) page faults: 117570335 Voluntary context switches: 1870631 Involuntary context switches: 1154535 Swaps: 0 File system inputs: 0 File system outputs: 11777280 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 The observed improvement is about 5% of total wall clock time for `dev-build` target. Also, all commits make sure that headers stay self-sufficient, which would help to further improve the situation in the future. " * 'feature/header_cleanups_v1' of https://github.com/ManManson/scylla: transport: remove extraneous `qos/service_level_controller` includes from headers treewide: remove evidently unneded storage_proxy includes from some places service_level_controller: remove extraneous `service/storage_service.hh` include sstables/writer: remove extraneous `service/storage_service.hh` include treewide: remove extraneous database.hh includes from headers treewide: reduce boost headers usage in scylla header files cql3: remove extraneous includes from some headers cql3: various forward declaration cleanups utils: add missing <limits> header in `extremum_tracking.hh`	2021-05-24 14:24:20 +03:00
Avi Kivity	7e5a0b6fd0	test: drop unused fields Drop unused fields in various tests and test libraries.	2021-05-21 21:04:49 +03:00
Pavel Solodovnikov	fff7ef1fc2	treewide: reduce boost headers usage in scylla header files `dev-headers` target is also ensured to build successfully. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-20 01:33:18 +03:00
Botond Dénes	300ee974f7	test: use with_cql_test_env_thread where needed Currently `with_cql_test_env()` is equivalent to `with_cql_test_env_thread()`, which resulted in many tests using the former while really needing the latter and getting away with it. This equivalence is incidental and will go away soon, so make sure all cql test env using tests that expect to be run in a thread use the appropriate variant. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210514141614.128213-1-bdenes@scylladb.com>	2021-05-18 13:44:52 +03:00
Botond Dénes	caaa8ef59a	reader_permit: always forward resources This commit conceptually reverts `4c8ab10`. Said commit was meant to prevent the scenario where memory-only permits -- those that don't pass admission but still consume memory -- completely prevent the admission of reads, possibly even causing a deadlock because a permit might even blocks its own admission. The protection introduced by said commit however proved to be very problematic. It made the status of resources on the permit very hard to reason about and created loopholes via which permits could accumulate without tracking or they could even leak resources. Instead of continuing to patch this broken system, this commit does away with this "protection" based on the observation that deadlocks are now prevented anyway by the admission criteria introduced by `0fe75571d9`, which admits a read anyway when all the initial count resources are available (meaning no admitted reader is alive), regardless of availability of memory. The benefits of this revert is that the semaphore now knows about all the resources and is able to do its job better as it is not "lied to" about resource by the permits. Furthermore the status of a permit's resources is much simpler to reason about, there are no more loopholes in unexpected state transitions to swallow/leak resources. To prove that this revert is indeed safe, in the next commit we add robust tests that stress test admission on a highly contested semaphore. This patch also does away with the registered/admitted differentiation of permits, as this doesn't make much sense anymore, instead these two are unified into a single "active" state. One can always tell whether a permit was admitted or not from whether it owns count resources anyway.	2021-04-26 15:56:56 +03:00
Benny Halevy	7c7569f0ad	querier_cache: implement stop Close the _closing_gate to wait on background close of dropped queries, and close all remaining queriers. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	87c62b5f59	test: querier_cache_test: close looked up querier Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	2f9cf01aa7	querier_cache: futurize evict api Prepare for futurizing the lower-level inactive reads api. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	43bf0f9356	reader_concurrency_semaphore: add stop method In addition to clear_inactive_reads, that's currently called when the database object is destroyed, introduce a stop() method that will: 1. wait on all background closes of inactive_reads. 2. close all present inactive_reads and waits on their close. 3. signal waiters on the wait_list via broken() with a proper exception indicating that the semaphore was closed. In addition, assert in the semaphore's destructor that it has no remaining inactive reads. Stop must be called from whoever owns the r_c_s. Mainly, from database::stop. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Botond Dénes	c822f0d02a	test: querier_cache_test: add memory based cache eviction test Ensure that the memory consumption of querier cache entries is kept under the limit.	2021-03-18 14:58:21 +02:00
Botond Dénes	594636ebbf	querier: insert(): account immediately evicted querier as resource based eviction `reader_concurrency_semaphore::register_inactive_read()` drops the registered inactive read immediately if there is a resource shortage. This is in effect a resource based eviction, so account it as such in `querier::insert()`.	2021-03-18 14:57:57 +02:00
Benny Halevy	46c2229b78	reader_concurrency_semaphore: separate set_notify_handler from register_inactive_reader Register the inactive reader first with no evict_notify_handler and ttl. Those can be set later, only if registration succeeded. Otherwise, as in the querier example, there is no need to to place the querier in the index and erase it on eviction. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-02-08 22:31:01 +02:00
Benny Halevy	e072199b8d	reader_concurrency_semaphore: unregister_inactive_read: calling on wrong semaphore is an internal error Calling unregister_inactive_read on the wrong semaphore is a blatant bug so better call on_internal_error so it'd be easier to catch and fix. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-02-08 20:32:40 +02:00
Avi Kivity	913d970c64	Merge "Unify inactive readers" from Botond " Currently inactive readers are stored in two different places: * reader concurrency semaphore * querier cache With the latter registering its inactive readers with the former. This is an unnecessarily complex (and possibly surprising) setup that we want to move away from. This series solves this by moving the responsibility if storing of inactive reads solely to the reader concurrency semaphore, including all supported eviction policies. The querier cache is now only responsible for indexing queriers and maintaining relevant stats. This makes the ownership of the inactive readers much more clear, hopefully making Benny's work on introducing close() and abort() a little bit easier. Tests: unit(release, debug:v1) " * 'unify-inactive-readers/v2' of https://github.com/denesb/scylla: reader_concurrency_semaphore: store inactive readers directly querier_cache: store readers in the reader concurrency semaphore directly querier_cache: retire memory based cache eviction querier_cache: delegate expiry to the reader_concurrency_semaphore reader_concurrency_semaphore: introduce ttl for inactive reads querier_cache: use new eviction notify mechanism to maintain stats reader_concurrency_semaphore: add eviction notification facility reader_concurrency_semaphore: extract evict code into method evict()	2021-02-03 10:59:04 +02:00
Benny Halevy	5e41228fe8	test: everywhere: use seastar::testing::local_random_engine Use the thread_local seastar::testing::local_random_engine in all seastar tests so they can be reproduced using the --random-seed option. Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210112103713.578301-2-bhalevy@scylladb.com>	2021-01-13 11:07:29 +02:00
Avi Kivity	77d54410d0	test: querier_cache_test: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accomodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:24:37 +03:00
Botond Dénes	ff623e70b3	reader_concurrency_semaphore: name permits Require a schema and an operation name to be given to each permit when created. The schema is of the table the read is executed against, and the operation name, which is some name identifying the operation the permit is part of. Ideally this should be different for each site the permit is created at, to be able to discern not only different kind of reads, but different code paths the read took. As not all read can be associated with one schema, the schema is allowed to be null. The name will be used for debugging purposes, both for coredump debugging and runtime logging of permit-related diagnostics.	2020-10-13 12:32:13 +03:00
Botond Dénes	40c5474022	querier_cache_test: test_immediate_evict_on_insert: use two permits The test currently uses a single permit shared between two simulated reads (to wait admission twice). This is not a supported way of using a permit and will stop working soon as we make the states the permit is in more pronounced.	2020-10-12 15:56:56 +03:00
Botond Dénes	f7eea06f61	querier_cache_test: use local semaphore not the test global one In the mutation source, which creates the reader for this test, the global test semaphore's permit was passed to the created reader (`tests::make_permit()`). This caused reader resources to be accounted on the global test semaphore, instead of the local one the test creates. Just forward the permit passed to the mutation sources to the reader to fix this.	2020-10-06 08:22:56 +03:00
Botond Dénes	3fab83b3a1	flat_mutation_reader: impl: add reader_permit parameter Not used yet, this patch does all the churn of propagating a permit to each impl. In the next patch we will use it to track to track the memory consumption of `_buffer`.	2020-09-28 10:53:48 +03:00
Botond Dénes	99388590da	querier_cache_test: test_resources_based_cache_eviction: use semaphore::consume() to drain semaphore It is much more reliable and simple this way, than playing with `reader_permit::wait_for_admission()`.	2020-09-28 08:46:22 +03:00
Botond Dénes	3c73cc2a4e	tests: prepare for permit forwarding consumption post admission Some tests rely on `consume*()` calls on the permit to take effect immediately. Soon this will only be true once the permit has been admitted, so make sure the permit is admitted in these tests.	2020-09-28 08:46:22 +03:00
Botond Dénes	c18756ce9a	reader_concurrency_semaphore: s/inactive_read_stats/stats/ In preparations of non-inactive read stats being added to the semaphore, rename its existing stats struct and member to a more generic name. Fields, whose name only made sense in the context of the old name are adjusted accordingly.	2020-09-23 13:11:55 +03:00
Wojciech Mitros	45215746fe	increase the maximum size of query results to 2^64 Currently, we cannot select more than 2^32 rows from a table because we are limited by types of variables containing the numbers of rows. This patch changes these types and sets new limits. The new limits take effect while selecting all rows from a table - custom limits of rows in a result stay the same (2^32-1). In classes which are being serialized and used in messaging, in order to be able to process queries originating from older nodes, the top 32 bits of new integers are optional and stay at the end of the class - if they're absent we assume they equal 0. The backward compatibility was tested by querying an older node for a paged selection, using the received paging_state with the same select statement on an upgraded node, and comparing the returned rows with the result generated for the same query by the older node, additionally checking if the paging_state returned by the upgraded node contained new fields with correct values. Also verified if the older node simply ignores the top 32 bits of the remaining rows number when handling a query with a paging_state originating from an upgraded node by generating and sending such a query to an older node and checking the paging_state in the reply(using python driver). Fixes #5101.	2020-08-03 17:32:49 +02:00
Botond Dénes	648ce473ab	test: set the allow_short_read slice option for paged queries Some tests use the lower level methods directly and meant to use paging but didn't and nobody noticed. This was revealed by the enforcement of max result size (introduced in a later patch), which caused these tests to fail due to exceeding the max result size. This patch fixes this by setting the `allow_short_reads` slice option.	2020-07-28 18:00:29 +03:00
Botond Dénes	9eab5bca27	query_*(): use the coordinator specified memory limit for unlimited queries It is important that all replicas participating in a read use the same memory limits to avoid artificial differences due to different amount of results. The coordinator now passes down its own memory limit for reads, in the form of max_result_size (or max_size). For unpaged or reverse queries this has to be used now instead of the locally set max_memory_unlimited_query configuration item. To avoid the replicas accidentally using the local limit contained in the `query_class_config` returned from `database::make_query_class_config()`, we refactor the latter into `database::get_reader_concurrency_semaphore()`. Most of its callers were only interested in the semaphore only anyway and those that were interested in the limit as well should get it from the coordinator instead, so this refactoring is a win-win.	2020-07-28 18:00:29 +03:00
Botond Dénes	159d37053d	storage_proxy: use read_command::max_result_size to pass max result size around Use the recently added `max_result_size` field of `query::read_command` to pass the max result size around, including passing it to remote nodes. This means that the max result size will be sent along each read, instead of once per connection. As we want to select the appropriate `max_result_size` based on the type of the query as well as based on the query class (user or internal) the previous method won't do anymore. If the remote doesn't fill this field, the old per-connection value is used.	2020-07-28 18:00:29 +03:00
Botond Dénes	92a7b16cba	query: read_command: add max_result_size This field will replace max size which is currently passed once per established rpc connection via the CLIENT_ID verb and stored as an auxiliary value on the client_info. For now it is unused, but we update all sites creating a read command to pass the correct value to it. In the next patch we will phase out the old max size and use this field to pass max size on each verb instead.	2020-07-28 18:00:29 +03:00
Botond Dénes	2ca118b2d5	query: read_command: add separate convenience constructor query::read_command currently has a single constructor, which serves both as an idl constructor (order of parameters is fixed) and a convenience one (most parameters have default values). This makes it very error prone to add new parameters, that everyone should fill. The new parameter has to be added as last, with a default value, as the previous ones have a default value as well. This means the compiler's help cannot be enlisted to make sure all usages are updated. This patch adds a separate convenience constructor to be used by normal code. The idl constructor looses all default parameters. New parameters can be added to any position in the convenience constructor (to force users to fill in a meaningful value) while the removed default parameters from the idl constructor means code cannot accidentally use it without noticing.	2020-07-28 18:00:29 +03:00
Botond Dénes	c364c7c6a2	result_memory_limiter: add unlimited_result_size constant To be used as the max result size for internal queries.	2020-07-28 18:00:29 +03:00
Botond Dénes	d5cc932a0b	database: query_mutations(): obtain the memory accounter inside Instead of requesting callers to do it and pass it as a parameter. This is in line with data_query().	2020-07-28 18:00:29 +03:00
Botond Dénes	92ce39f014	query: query_class_config: use max_result_size for the max_memory_for_unlimited_query field We want to switch from using a single limit to a dual soft/hard limit. As a first step we switch the limit field of `query_class_config` to use the recently introduced type for this. As this field has a single user at the moment -- reverse queries (and not a lot of propagation) -- we update it in this same patch to use the soft/hard limit: warn on reaching the soft limit and abort on the hard limit (the previous behaviour).	2020-07-28 18:00:29 +03:00
Botond Dénes	11105cbb78	reader_concurrency_semaphore: make inactive read handles unique across semaphores Currently inactive read handles are only unique within the same semaphore, allowing for an unregister against another semaphore to potentially succeed. This can lead to disasters ranging from crashes to data corruption. While a handle should never be used with another semaphore in the first place, we have recently seen a bug (#6613) causing exactly that, so in this patch we prevent such unregister operations from ever succeeding by making handles unique across all semaphores. This is achieved by adding a pointer to the semaphore to the handle.	2020-07-23 16:43:33 +03:00
Botond Dénes	e678f06a5e	querier_cache: get semaphore from querier Currently the `querier_cache` is passed a semaphore during its construction and it uses this semaphore to do all the inactive reader registering/unregistering. This is inaccurate as in theory cached reads could belong to different semaphores (although currently this is not yet the case). As all queriers store a valid permit now, use this permit to obtain the semaphore the querier is associated with, and register the inactive read with this semaphore.	2020-05-28 11:34:35 +03:00
Botond Dénes	e4c591aa67	database: introduce make_query_class_config() And use it to obtain any query-class specific configuration that was obtained from `table::config` before, such as the read concurrency semaphore and the max memory limit for unlimited queries. As all users of these items get these from the query class config now, we can remove them from `table::config`.	2020-05-28 11:34:35 +03:00

1 2

62 Commits