Add badness counters that allow tracking problems. The following
counters are added:
1) multishard_query_unpopped_fragments
2) multishard_query_unpopped_bytes
3) multishard_query_failed_reader_stops
4) multishard_query_failed_reader_saves
The first pair of counters observes the amount of work range scan queries
have to undo on each page. It is normal for these counters to be
non-zero; however, sudden spikes in their values can indicate problems.
This undoing of work is needed for stateful range scans to work.
When stateful queries are enabled the `multishard_combining_reader` is
dismantled and all unconsumed fragments in its buffer and in any of its
intermediate readers' buffers are pushed back into the originating shard
reader's buffer (via `unpop_mutation_fragment()`). This also includes
the `partition_start`, the `static_row` (if there is one) and all
extracted and active `range_tombstone` fragments. Together this can
amount to a substantial number of fragments.
(1) counts the number of fragments moved back, while (2) counts the
number of bytes. Monitoring size and quantity separately allows for
detecting edge cases like moving many small fragments or just a few huge
ones. The counters count the fragments/bytes moved back to readers
located on the shard they belong to.
The second pair of counters is added to detect problems around
saving readers. Since a failure to save a reader will not fail the
read itself, visibility into these failures has to be added by
other means.
(3) counts the number of times stopping a shard reader (waiting
on pending read-aheads and next-partitions) failed while (4)
counts the number of times inserting the reader into the `querier_cache`
failed.
Unlike the first two counters, which will almost certainly never be
zero, these latter two counters should always be zero. Any other value
indicates problems on the respective shards/nodes.
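As a rough illustration of how counters like these are typically wired up (the
group name, the stats struct and the exact use of seastar's metrics API below
are assumptions made for the example, not taken from the patch):

    #include <seastar/core/metrics.hh>
    #include <cstdint>

    namespace sm = seastar::metrics;

    struct multishard_query_stats {
        uint64_t unpopped_fragments = 0;
        uint64_t unpopped_bytes = 0;
        uint64_t failed_reader_stops = 0;
        uint64_t failed_reader_saves = 0;
    };

    class multishard_query_metrics {
        multishard_query_stats _stats;
        sm::metric_groups _metrics;
    public:
        multishard_query_metrics() {
            _metrics.add_group("database", {
                sm::make_derive("multishard_query_unpopped_fragments", _stats.unpopped_fragments,
                        sm::description("Fragments pushed back into shard readers when a page ends")),
                sm::make_derive("multishard_query_unpopped_bytes", _stats.unpopped_bytes,
                        sm::description("Bytes worth of fragments pushed back into shard readers")),
                sm::make_derive("multishard_query_failed_reader_stops", _stats.failed_reader_stops,
                        sm::description("Failed attempts to stop a shard reader")),
                sm::make_derive("multishard_query_failed_reader_saves", _stats.failed_reader_saves,
                        sm::description("Failed attempts to save a shard reader in the querier cache")),
            });
        }
        // e.g. the dismantling code would bump the first two counters per unpopped fragment.
    };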
This method allows for querying a range or ranges on all shards of the
node. Under the hood it uses the multishard_combining_reader for
executing the query.
It supports paging and stateful queries (saving and reusing the readers
between pages). All this is transparent to the client, who only needs to
supply the same query::read_command::query_uuid across the pages of the
query (and correct start positions on each page, matching the stop
position of the previous page).
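A toy caller-side sketch of this paging contract (every name below, including
run_page and the string-based positions, is an illustrative stand-in, not the
actual API):

    #include <optional>
    #include <string>
    #include <vector>

    // The client keeps the same query UUID for every page and starts each page
    // exactly where the previous one stopped.
    struct read_command {
        std::string query_uuid;                    // constant across all pages of one query
        std::optional<std::string> start_position; // must match the previous page's stop position
    };

    struct page {
        std::vector<std::string> rows;
        std::optional<std::string> stop_position;  // disengaged when the query is exhausted
    };

    // Stand-in for the multishard query entry point.
    page run_page(const read_command&) {
        return page{{}, std::nullopt};
    }

    std::vector<std::string> read_all_pages(std::string query_uuid) {
        read_command cmd{std::move(query_uuid), std::nullopt};
        std::vector<std::string> rows;
        for (;;) {
            auto p = run_page(cmd);
            rows.insert(rows.end(), p.rows.begin(), p.rows.end());
            if (!p.stop_position) {
                return rows;                                 // last page
            }
            cmd.start_position = std::move(p.stop_position); // resume the saved readers here
        }
    }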
Allow the state of the compaction to be detached. The detached state is
a set of mutation fragments which, if replayed through a new compactor
object, will result in the latter being in the same state as the previous
one was.
This allows for storing the compaction state in the compacted reader by
using `unpop_mutation_fragment()` to push back the fragments that
comprise the detached state into the reader. This way, if a new
compaction object is created it can just consume the reader and continue
where the previous compaction left off.
This is the inverse of `pop_mutation_fragment()`. Allow fragments to be
pushed back into the buffer of the reader to undo a previous consumption
of the fragments.
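A minimal model of the pop/unpop pair (the buffer is reduced to a deque of
opaque fragments; the real flat_mutation_reader is more involved). This is
also the primitive the compaction-state detach above relies on:

    #include <cassert>
    #include <deque>
    #include <string>
    #include <utility>

    // Simplified stand-in for a reader's fragment buffer.
    class buffered_reader {
        std::deque<std::string> _buffer; // fragments waiting to be consumed
    public:
        explicit buffered_reader(std::deque<std::string> fragments) : _buffer(std::move(fragments)) {}
        // Consume the next fragment from the front of the buffer.
        std::string pop_mutation_fragment() {
            auto mf = std::move(_buffer.front());
            _buffer.pop_front();
            return mf;
        }
        // Inverse of pop: push a fragment back to the front, undoing the consumption.
        void unpop_mutation_fragment(std::string mf) {
            _buffer.push_front(std::move(mf));
        }
    };

    int main() {
        buffered_reader r({"partition_start", "static_row", "clustering_row"});
        auto mf = r.pop_mutation_fragment();
        r.unpop_mutation_fragment(std::move(mf)); // reader is back in its original state
        assert(r.pop_mutation_fragment() == "partition_start");
    }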
test_streamed_mutation_forwarding_is_consistent_with_slicing already has
a REQUIRE() for the mutation read with the slicing reader. Add another
one for the forwarding reader. This makes it more consistent and also
helps find problems with either the forwarding or slicing reader.
read_mutation_from_flat_mutation_reader's internal adapter can build a
single mutation only and hence can consume only a single partition.
If more than one partition is pushed down from the producer, the
adapter will very likely crash. To avoid unnecessary investigations add
an assert() to fail early and make it clear what the real problem is.
All other consume_ methods already have an assert() for their
invariants, so this is just following suit.
The querier to be used for saving shard readers belonging to a
multishard range scan. This querier doesn't provide a `consume_page`
method, as it doesn't support being read from directly. It is more of
a storage facility for caching the reader and any objects it depends on.
In the next patch a querier will be added that reads multiple ranges, as
opposed to the single range that data and mutation queriers read.
To keep the `querier_cache` code seamless regarding this difference, change
all range-matching logic to work in terms of `dht::partition_ranges_view`.
This allows for a cheap and seamless way of having a single code base for
the insert/lookup logic. The code actually matching ranges is updated to
handle both singular and multi-ranges while maintaining backward
compatibility.
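A simplified model of such a view (only the single-range/multi-range
convertibility is shown; the real dht::partition_ranges_view and the actual
matching logic are richer than this):

    #include <cstddef>
    #include <vector>

    struct partition_range {};

    // Non-owning view over one or more partition ranges: constructible from a
    // single (singular) range as well as from a vector of ranges, so the same
    // insert/lookup code can iterate it regardless of the kind of query.
    class partition_ranges_view {
        const partition_range* _ranges = nullptr;
        std::size_t _size = 0;
    public:
        partition_ranges_view(const partition_range& r) : _ranges(&r), _size(1) {}
        partition_ranges_view(const std::vector<partition_range>& rs)
                : _ranges(rs.data()), _size(rs.size()) {}
        const partition_range* begin() const { return _ranges; }
        const partition_range* end() const { return _ranges + _size; }
        std::size_t size() const { return _size; }
    };

    bool ranges_match(partition_ranges_view a, partition_ranges_view b) {
        return a.size() == b.size(); // the real matching compares bounds; sizes suffice for the toy
    }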
Instead of hiding what compaction method the querier uses (and only
exposing it by rejecting mismatches in `can_be_used_for_page()`), make it
very explicit that these are really two different queriers. This allows
using different indexes for the two queriers in `querier_cache` and
eliminates the possibility of picking up a querier with the wrong
compaction method (read kind).
This also makes it possible to add new querier type(s) that suit the
multishard-query's needs without making a confusing mess of `querier` by
making it a union of all querying logic.
Splitting the queriers this way changes what happens when a lookup finds
a querier of the wrong kind (e.g. emit_only_live::yes for an
emit_only_live::no command). Instead of dropping the found (but
wrong) querier, the querier will now simply not be found by the lookup.
This is a result of using separate search indexes for the different
mutation kinds. This change should have no practical implications.
Splitting is done by making querier templated on `emit_only_live_rows`.
It doesn't make sense to duplicate the entire querier as the two share
99% of the code.
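A sketch of what the split can look like (the alias names follow the data/
mutation querier terminology used above; members are omitted and the real
class layout may differ):

    enum class emit_only_live_rows { no, yes };

    // One shared implementation instantiated as two distinct types. Because the
    // two queriers are different types, querier_cache can keep them in separate
    // indexes and a lookup can never hand back the wrong kind.
    template <emit_only_live_rows OnlyLive>
    class querier {
        // shared reading/compaction state ...
    };

    using data_querier = querier<emit_only_live_rows::yes>;
    using mutation_querier = querier<emit_only_live_rows::no>;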
In preparation for the now single querier being split into multiple more
specialized ones, make it possible for the multiple queriers to share the
same implementation. Also, the code can now be reused by outside code as
well, not just queriers.
So that they can be used for multiple querier classes easily, without
inheritance. The functions are not visible from the header.
Also update the comments on `querier` w.r.t. the removed
checking functions. Change the language to be more general. In practice
these checks are never done by client code; instead they are done by the
`querier_cache`.
In preparation for introducing support for multiple entry types in the
querier_cache, move all insert/lookup related logic into free functions.
Later these functions will be templated so they can handle multiple
entry types with the same code.
Requiring the caller of lookup() to pass in a `create_fun()` was not
such a good idea in hindsight. It leads to awkward call sites and even
more awkward code when trying to find out whether the lookup was
successful or not.
Returning an optional gives calling code much more flexibility and makes
the code cleaner.
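A toy illustrating the reshaped interface (key and value types are simplified
stand-ins for the real querier_cache types):

    #include <map>
    #include <optional>
    #include <string>

    struct querier { std::string state; };

    class querier_cache {
        std::map<int, querier> _entries;
    public:
        // A miss is simply an empty optional; the caller decides how and when to
        // create a new querier instead of passing in a create_fun() callback.
        std::optional<querier> lookup(int key) {
            auto it = _entries.find(key);
            if (it == _entries.end()) {
                return std::nullopt;
            }
            auto q = std::move(it->second);
            _entries.erase(it);
            return q;
        }
    };

    int main() {
        querier_cache cache;
        auto q = cache.lookup(42);
        if (!q) {
            q = querier{"freshly created"}; // caller-side creation on a cache miss
        }
    }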
Add a dismantler functor parameter. When the multishard reader is
destroyed this functor will be called for each shard reader, passing a
future to a `stopped_foreign_reader`. This future becomes available when
the shard reader is stopped, that is, when it has finished all in-progress
read-aheads and/or pending next-partition calls.
The intended use case for the dismantler functor is a client that needs
to be notified when readers are destroyed and/or has to have access to
any unconsumed fragments from the foreign readers wrapping the shard
readers.
Extend the `remote_reader_factory` interface so that it accepts all standard
mutation reader creation parameters. This allows factory lambdas to be
truly stateless, not having to capture any standard parameters that are
needed for creating the reader.
Standard parameters are those accepted by
`mutation_source::make_reader()`.
"
After the reloading flow was fixed, items may no longer be cached but still be
held in the underlying loading_shared_values object. Since loading_cache::size() returns
the size of its loading_shared_values object and loading_cache::begin()/end()/find() return
iterators based on loading_shared_values iterators, these APIs may return very weird values, e.g.
size() may return the same value after one of the items has been removed using the remove(key) API.
This series fixes this by switching the above-mentioned APIs to work on top of the lru_list object
instead of loading_shared_values.
"
* 'loading_cache_fix_api_semantics-v1' of https://github.com/vladzcloudius/scylla:
loading_cache: make iterator work on top of lru_list iterators instead of loading_shared_values'
loading_cache: make size() return the size of lru_list instead of loading_shared_values
A relocatable package contains the Scylla (and iotune)
executables (in a bin/ directory), any libraries they may need (lib/),
the configuration file defaults (conf/) and supporting scripts (dist/).
The libraries are picked up from the host, including libc and the dynamic
linker (ld.so).
We also provide a thunk script that forces the library path
(LD_LIBRARY_PATH) to point at our libraries, and overrides the
interpreter to point at our ld.so.
With these files, it is possible to run a fully functional Scylla
instance on any Linux distribution. This is similar to chroot or
containers, except that we run in the same namespace as the host.
The packages are created by running
ninja build/release/scylla-package.tar
or
ninja --mode debug build/debug/scylla-package.tar
Message-Id: <20180828065352.30730-1-avi@scylladb.com>
Reloading may hold a value in the underlying loading_shared_values while
the corresponding cache values have already been deleted.
This may create weird situations like this:
<populate cache with 10 entries>
cache.remove(key1);
for (auto& e : cache) {
    std::cout << e << std::endl;
}
<all 10 entries are printed, including the one for "key1">
In order to avoid such situations we are going to make loading_cache::iterator
a transform_iterator of lru_list::iterator instead of loading_shared_values::iterator,
because lru_list contains entries only for cached items.
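A toy version of the new iteration scheme (entry layout and helper names are
made up; the point is only that iteration is built on the lru_list, which holds
cached items exclusively):

    #include <boost/iterator/transform_iterator.hpp>
    #include <list>
    #include <string>

    struct cache_entry {
        std::string key;
        std::string value;
    };

    // The LRU list only ever holds entries that are actually cached, so iterators
    // built on top of it cannot expose an item that was removed but is still kept
    // alive by loading_shared_values while it reloads.
    using lru_list = std::list<cache_entry*>;

    inline std::string& entry_value(cache_entry* e) { return e->value; }

    auto cache_begin(lru_list& l) { return boost::make_transform_iterator(l.begin(), &entry_value); }
    auto cache_end(lru_list& l) { return boost::make_transform_iterator(l.end(), &entry_value); }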
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
The reloading flow may hold items in the underlying loading_shared_values
after they have been removed (e.g. via the remove(key) API), so loading_shared_values.size()
doesn't represent the correct value for the loading_cache. lru_list.size(), on the other hand, does.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
* seastar 12f18ce...5712816 (6):
> tests: add signal_test to test list
> Merge "Enhancements for memory_output_stream" from Paweł
> seastar-addr2line: don't print an empty line between backtrace lines
> seastar-addr2line: add --verbose option
> seastar-addr2line: make prefix matching non-greedy
> future: make available() const
When we load new SSTables, we use the directory information from the
entry descriptor to build information about those SSTables. When the
descriptor is created by flush_upload_dir, the sstable directory used in
the descriptor contains the `upload` part. Therefore, we will try to
load SSTables that are in the upload directory when we have already moved
them out, and fail.
Since the generation also changes, we have been historically fixing the
generation manually, but not the SSTable directory. The reason for that
is that up until recently, the SSTable directory was passed statically
to open_sstables, ignoring whatever the entry descriptor said. Now that
the sstable directory is also derived from the entry descriptor, we
should fix that too.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180829165326.12183-1-glauber@scylladb.com>
Additional tests for cases surrounding issue #3362, where base rows
disappear (or not) and view rows need to disappear (or not) as well.
These new tests focus on checking that view_updates::do_delete_old_entry()
is correct.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180829131914.16042-2-nyh@scylladb.com>
In previous patches, we gave up on an old (and broken) attempt to track
the timestamps of many unselected base-table columns through one row marker
in the view table - and replaced them by "virtual cells", one per unselected
cell.
The do_delete_old_entry() function still contains old code which maintained
that row marker and is no longer needed. That old code is not only no longer
needed, it also no longer did anything, because all columns now appear in
the view (as virtual columns), so the code ignored them when calculating the
row marker.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180829131914.16042-1-nyh@scylladb.com>
"
When a view's partition key contains only columns from the base's partition
key (and not an additional one), the liveness - existence or disappearance -
of a view-table row is tied to the liveness of the base table row. And
that, in turn, depends not only on selected columns (base-table columns
SELECTed to also appear in the view) but also on unselected columns.
This means that we may need to keep a view row alive even without data,
just because some unselected column is alive in the base table. Before this
patch set we tried to build a single "row marker" in the view column which
tried to summarize the liveness information in all unselected columns.
But this proved unworkable, as explained in issue #3362 and as will be
demonstrated in unit tests at the end of this series.
Because we can't replace several unselected cells by one row marker, what
we do in this series is to add, for each of the unselected cells, a "virtual
cell" which contains the cell's liveness information (timestamp, deletion,
ttl) but not its value. For collections, we can't represent the entire
collection by one virtual cell, and rather need a collection of virtual
cells.
Fixes #3362
"
* 'virtual-cols-v3' of https://github.com/nyh/scylla:
Materialized Views: test that virtual columns are not visible
Materialized Views: unit test reproducing fixed issue #3362
Materialized Views: no need for elaborate row marker calculations
Materialized Views: add unselected columns as virtual columns
Materialized Views: fill virtual columns
Do not allow selecting a virtual column
schema: persist "view virtual" columns to a separate system table
schema: add "view virtual" flag to schema's column_definition
Add "empty" type name to CQL parser, but only for internal parsing
"
Previous work (71471bb322) converted the CQL layer to inheriting
execution stages, paving the way to multiple users sharing the front-end.
This patchset does the same thing to the back-end, converting more execution
stages to preserve the caller's scheduling_group. Since RPC now (8c993e0728)
assigns the correct scheduling group within the replica, we can extend that
work so a statement is executed with the same scheduling group all the way
to sstable parsing, even if we cross nodes in the process. This improves
performance isolation and paves the way to multi-user SLA guarantees.
"
* tag 'inherit-sched_group/v1' of https://github.com/avikivity/scylla:
database: make database's mutation apply stage inherit its scheduling group from the caller
database: make database::_mutation_query_stage inherit the scheduling group
database: make database::_data_query_stage inheriting its caller's scheduling_group
storage_proxy: make _mutate_stage inherit its caller's scheduling_group
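For reference, a minimal sketch of the inheriting-stage mechanism this series
builds on (assuming seastar's inheriting_concrete_execution_stage; names and
signatures are simplified and not taken from the patches):

    #include <seastar/core/execution_stage.hh>
    #include <seastar/core/future.hh>

    seastar::future<> apply(int mutation_id) {
        // One logical stage, but one queue per scheduling group: work enqueued here
        // keeps running under the scheduling group of whoever called apply().
        static seastar::inheriting_concrete_execution_stage<seastar::future<>, int> stage(
                "mutation_apply_example", [] (int) {
            return seastar::make_ready_future<>();
        });
        return stage(mutation_id); // executed later, still in the caller's scheduling group
    }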
"This series introduces a few improvements related to a reload flow.
From now on the callback may assume that the "key" parameter value
is kept alive till the end of its execution in the reloading flow.
It may also safely evict as many items from the cache as needed."
Fixes #3606
* 'loading_cache_improve_reload-v1' of https://github.com/vladzcloudius/scylla:
utils::loading_cache: hold a shared_value_ptr to the value when we reload
utils::loading_cache::on_timer(): remove not needed capture of "this"
utils::loading_cache::on_timer(): use chunked_vector for storing elements we want to reload
"
Fix loading_cache_test flakiness by retrying assertions.
Tests: unit(loading_cache_test(debug, release))
Fixes #3723
"
* 'loading-cache-test-flake/v4' of https://github.com/duarten/scylla:
tests/loading_cache_test: Unflake test_loading_cache_loading_reloading
tests/loading_cache_test: Use eventually() instead of open-coding it
tests/mutation_reader_test: Extract eventually_true() to eventually.hh
tests/cql_test_env: Lift eventually() to its own header file
* seastar 9bb1611...12f18ce (17):
> correctly configure I/O Scheduler for usage with the YAML file
> Added support for user-defined signal handlers
> Added reactor method to modify blocked_reactor_notify_ms
> configure.py: Use the user-specified compiler for dialect detection
> seastar-addr2line: clear current trace when omitting already seen trace
> seastar-addr2line: fix redirecting output to a file
> seastar-addr2line: don't require a space before the addresses
> tests: Ensure test thread is always joined
> README.md: Add cute badges
> iotune: adjust num-io-queues recommendation
> dns: add SRV record lookup
> reactor: define max_aio_per_queue for C++14
> reactor,alien: silence GCC warnings
> core,json,net: silence GCC warnings
> fstream: "using data_sink_impl::put" to silence gcc warning
> Merge 'Ensure Seastar compiles in C++14 mode' from Jesse
> Revert "foreign_ptr: allow waiting for the destruction of the managed ptr"
Implement and test support for reading range tombstones in SSTables 3.
Does not yet support reads that use slicing or fast forwarding.
From github.com/scylladb/seastar-dev.git haaawk/sstables3/tombstones_v11:
Piotr Jastrzebski (5):
sstables: Add consumer_m::consume_range_tombstone
sstables: Support null columns in ck
sstables: Support reading range_tombstones
sstables: Test reading range_tombstones
sstables: Add test for RT with non-full key
Vladimir Krivopalov (2):
sstables: Add operator<< overload for bound_kind_m.
keys: Add clustering_key_prefix::make_full helper.
The `loading_cache_test::test_loading_cache_loading_reloading` test
case is flaky and fails in both debug and release mode. In an
over-provisioned environment, it's possible that when the reactor
runs, the timers for the `sleep()` and for reloading the
`loading_cache` have both expired, and their continuations are scheduled
in an arbitrary order, causing the test to fail.
Fixes #3723
Signed-off-by: Duarte Nunes <duarte@scylladb.com>