scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 04:37:00 +00:00

Author	SHA1	Message	Date
Vladimir Krivopalov	3d13ee3909	tests: Test filtering and forwarding on a partition with interleaved rows and RTs. In this test, rows lie inside range tombstones so we split them on reading. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-05 09:48:17 -07:00
Vladimir Krivopalov	d39e58a97a	tests: Add tests for reading wide partitions with range tombstones. Test the case where rows lie outside range tombstones. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-05 09:48:17 -07:00
Vladimir Krivopalov	ec2047e1e6	sstables: Support slicing for range tombstones. Both filtering on queried ranges and fast-forwarding are supported. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-05 09:48:17 -07:00
Vladimir Krivopalov	d57380f44c	sstables: Set/reset range tombstone start from end open marker. When we skip through a wide partition using the promoted index, we may land at a position that lies in the middle of a range tombstone, so we need to be aware of it. For this, we check if the previous promoted block has an end open marker and either set the range tombstone start using it or reset it if missing. Note several things about the implementation. Firstly, we have to peek back at the previous promoted index block for the end open marker, and so we have to always preserve one more promoted index block when we read the next batch so that we can still access it. Secondly, we use the previous promoted block end position to build the position in partition for the range tombstone start. Lastly, we don't have a notion of end open marker in older consumers that work with SSTables of ka/la formats, so we only call the corresponding methods if the consumer supports them. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-05 09:48:17 -07:00
Vladimir Krivopalov	939e4893ef	sstables: Fix end_open_marker population in promoted index blocks. We should not access the internal object stored in std::optional when passing the end_open_marker, moreover that it can be disengaged. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-05 09:48:17 -07:00
Vladimir Krivopalov	84bff86fbc	sstables: Add need_skip() helper to data_consume_context. This method tells whether we will need to skip to reach the input position or not. It can be used for skipping with index when reading SSTables 3.x because we only want to set/reset the open range tombstone bound when we actually move to another promoted index block. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-05 09:48:17 -07:00
Vladimir Krivopalov	ac0c71bdc1	sstables: For end_open_marker, return both position in partition and deletion time. Prior to this fix, the end_open_marker has been only accessible as a plain deletion_time structure. Now it also contains the start position of a promoted index block so that it can be used for setting range tombstone open bound. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-04 18:16:21 -07:00
Takuya ASADA	bd8a5664b8	dist/common/scripts/scylla_raid_setup: create scylla-server.service.d when it doesn't exist When /etc/systemd/system/scylla-server.service.d/capabilities.conf is not installed, we don't have /etc/systemd/system/scylla-server.service.d/, need to create it. Fixes #3738 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180904015841.18433-1-syuu@scylladb.com>	2018-09-04 10:12:32 +03:00
Tomasz Grabiec	4fb3f7e8eb	managed_vector: Make external_memory_usage() ignore reserved space This ensures that row::external_memory_usage() is invariant to insertion order of cells. It should be so, so that accounting of a clustering_row, merged from multiple MVCC versions by the partition_snapshot_flat_reader on behalf of a memtable flush, doesn't give a greater result than what is used by the memtable region. Overaccounting leads to assertion failure in ~flush_memory_accounter. Fixes #3625 (hopefully). Message-Id: <1535982513-19922-1-git-send-email-tgrabiec@scylladb.com>	2018-09-03 17:09:54 +03:00
Takuya ASADA	d78762d627	dist/debian: fix broken debian/changelog It also needs $MUSTACHE_DIST. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180903094558.3862-1-syuu@scylladb.com>	2018-09-03 14:04:01 +03:00
Duarte Nunes	e49a14e308	Merge 'Stateful range scans' from Botond " This series extends the query statefulness, introduced by `f8613a841` to point queries, to range scans as well. This means that queriers will be saved and reused for range scans too. This series builds heavily on the infrastructure introduced by stateful point queries, namely the querier object and the querier_cache. It also builds on another critical piece of infrastructure, the multishard_combining_reader, introduced by `2d126a79b`. To make the range scan on a given node suspendable and resumable we move away from the current code in `storage_proxy::query_nonsingular_mutations_locally()` and use a multishard_combining_reader to execute the read. When the page is filled this reader is dismantled and its shard readers are saved in the querier cache. There are of course a lot more details to it but this is the gist of it. Tests: unit(release, debug), dtest(paging_test.py, paging_additional_test.py) " * '1865/range-scans/v7.1' of https://github.com/denesb/scylla: (33 commits) query_pagers: generate query_uuid for range-scans as well storage_proxy: use preferred/last replicas storage_proxy: add preferred/last replicas to the signature of query_partition_key_range_concurrent db::consistency_level::filter_for_query() add preferred_endpoints storage_proxy: use query_mutations_from_all_shards() for range scans tests: add unit test for multishard_mutation_query() tests/mutation_assertions.hh: add missing include multishard_mutation_query: add badness counters database: add query_mutations_on_all_shards() mutation_compactor: add detach_state() flat_mutation_reader: add unpop_mutation_fragment() Move reconcilable_result_builder declaration to mutation_query.hh mutation_source_test: add an additional REQUIRE() mutation: add missing assert to mutation from reader querier: add shard_mutation_querier querier: prepare for multi-ranges tests/querier_cache: add tests specific for multiple entry-types querier: split querier into separate data and mutation querier types querier: move consume_page logic into a free function querier: move all matching related logic into free functions ...	2018-09-03 09:09:17 +01:00
Botond Dénes	cd49c23a66	query_pagers: generate query_uuid for range-scans as well And thus enable stateful range scans.	2018-09-03 10:31:44 +03:00
Botond Dénes	6486d6c8bd	storage_proxy: use preferred/last replicas	2018-09-03 10:31:44 +03:00
Botond Dénes	577a06ce1b	storage_proxy: add preferred/last replicas to the signature of query_partition_key_range_concurrent	2018-09-03 10:31:44 +03:00
Botond Dénes	6e59cee244	db::consistency_level::filter_for_query() add preferred_endpoints To the second overload (the one without read-repair related params) too.	2018-09-03 10:31:44 +03:00
Botond Dénes	2f66bde26f	storage_proxy: use query_mutations_from_all_shards() for range scans	2018-09-03 10:31:44 +03:00
Botond Dénes	6779b63dfe	tests: add unit test for multishard_mutation_query()	2018-09-03 10:31:44 +03:00
Botond Dénes	c678b665b4	tests/mutation_assertions.hh: add missing include	2018-09-03 10:31:44 +03:00
Botond Dénes	253407bdc8	multishard_mutation_query: add badness counters Add badness counters that allow tracking problems. The following counters are added: 1) multishard_query_unpopped_fragments 2) multishard_query_unpopped_bytes 3) multishard_query_failed_reader_stops 4) multishard_query_failed_reader_saves The first pair of counters observe the amount of work range scan queries have to undo on each page. It is normal for these counters to be non-zero, however sudden spikes in their values can indicate problems. This undoing of work is needed for stateful range-scans to work. When stateful queries are enabled the `multishard_combining_reader` is dismantled and all unconsumed fragments in its and any of its intermediate reader's buffers are pushed back into the originating shard reader's buffer (via `unpop_mutation_fragment()`). This also includes the `partition_start`, the `static_row` (if there is one) and all extracted and active `range_tombstone` fragments. This together can amount to a substantial amount of fragments. (1) counts the amount of fragments moved back, while (2) counts the number of bytes. Monitoring size and quantity separately allows for detecting edge cases like moving many small fragments or just a few huge ones. The counters count the fragments/bytes moved back to readers located on the shard they belong to. The second pair of counters are added to detect any problems around saving readers. Since the failure to save a reader will not fail the read itself, it is necessary to add visibility to these failures by other means. (3) counts the number of times stopping a shard reader (waiting on pending read-aheads and next-partitions) failed while (4) counts the number of times inserting the reader into the `querier_cache` failed. Contrary to the first two counters, which will almost certainly never be zero, these latter two counters should always be zero. Any other value indicates problems in the respective shards/nodes.	2018-09-03 10:31:44 +03:00
Botond Dénes	97364c7ad9	database: add query_mutations_on_all_shards() This method allows for querying a range or ranges on all shards of the node. Under the hood it uses the multishard_combining_reader for executing the query. It supports paging and stateful queries (saving and reusing the readers between pages). All this is transparent to the client, who only needs to supply the same query::read_command::query_uuid through the pages of the query (and supply correct start positions on each page, that match the stop position of the last page).	2018-09-03 10:31:44 +03:00
Botond Dénes	33d72efa49	mutation_compactor: add detach_state() Allow the state of the compaction to be detached. The detached state is a set of mutation fragments, which if replayed through a new compactor object will result in the latter being in the same state as the previous one was. This allows for storing the compaction state in the compacted reader by using `unpop_mutation_fragment()` to push back the fragments that comprise the detached state into the reader. This way, if a new compaction object is created it can just consume the reader and continue where the previous compaction left off.	2018-09-03 10:31:44 +03:00
Botond Dénes	48054ed810	flat_mutation_reader: add unpop_mutation_fragment() This is the inverse of `pop_mutation_fragment()`. Allow fragments to be pushed back into the buffer of the reader to undo a previous consumption of the fragments.	2018-09-03 10:31:44 +03:00
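To illustrate the pop/unpop relationship described in the commit above, here is a minimal sketch. It is not the `flat_mutation_reader` code: `toy_reader`, `pop_fragment()` and `unpop_fragment()` are hypothetical names, fragments are modelled as plain strings, and the buffer is assumed to behave like a double-ended queue.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for a reader with an internal fragment buffer.
// Fragments are plain strings here; the real reader holds mutation fragments.
class toy_reader {
    std::deque<std::string> _buffer;
public:
    explicit toy_reader(std::deque<std::string> fragments)
        : _buffer(std::move(fragments)) {}

    // Take the next fragment from the front of the buffer.
    std::string pop_fragment() {
        assert(!_buffer.empty());
        std::string f = std::move(_buffer.front());
        _buffer.pop_front();
        return f;
    }

    // Inverse of pop_fragment(): put a fragment back at the front.
    void unpop_fragment(std::string f) {
        _buffer.push_front(std::move(f));
    }

    const std::deque<std::string>& buffer() const { return _buffer; }
};

// Pops two fragments and pushes them back in reverse order of consumption,
// which restores the buffer to its original contents.
inline bool roundtrip_restores_buffer() {
    toy_reader r({"partition_start", "row1", "row2"});
    const std::deque<std::string> before = r.buffer();
    std::string a = r.pop_fragment();
    std::string b = r.pop_fragment();
    r.unpop_fragment(std::move(b));  // last popped goes back first
    r.unpop_fragment(std::move(a));
    return r.buffer() == before;
}
```

The point of the sketch is the ordering: fragments have to be pushed back in the reverse order of their consumption for the buffer to end up unchanged.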
Botond Dénes	3bcd577907	Move reconcilable_result_builder declaration to mutation_query.hh It will be used by code outside of mutation_partition.cc so it needs to be public. The definition remains in mutation_partition.cc.	2018-09-03 10:31:44 +03:00
Botond Dénes	b8b34223a4	mutation_source_test: add an additional REQUIRE() test_streamed_mutation_forwarding_is_consistent_with_slicing already has a REQUIRE() for the mutation read with the slicing reader. Add another one for the forwarding reader. This makes it more consistent and also helps finding problems with either the forwarding or slicing reader.	2018-09-03 10:31:44 +03:00
Botond Dénes	d347866664	mutation: add missing assert to mutation from reader read_mutation_from_flat_mutation_reader's internal adapter can build a single mutation only and hence can consume only a single partition. If more than one partition is pushed down from the producer the adapter will very likely crash. To avoid unnecessary investigations add an assert() to fail early and make it clear what the real problem is. All other consume_ methods have an assert() already for their invariants so this is just following suit.	2018-09-03 10:31:44 +03:00
Botond Dénes	ecb1e79bcc	querier: add shard_mutation_querier The querier to be used for saving shard readers belonging to a multishard range scan. This querier doesn't provide a `consume_page` method as it doesn't support reading from it directly. It is more of a storage to allow caching the reader and any objects it depends on.	2018-09-03 10:31:44 +03:00
Botond Dénes	07cdf766c5	querier: prepare for multi-ranges In the next patch a querier will be added that reads multiple ranges as opposed to a single range that data and mutation queriers read. To keep `querier_cache` code seamless regarding this difference change all range-matching logic to work in terms of `dht::partition_ranges_view`. This allows for cheap and seamless way of having a single code-base for the insert/lookup logic. Code actually matching ranges is updated to be able to handle both singular and multi-ranges while maintaining backward compatibility.	2018-09-03 10:31:44 +03:00
Botond Dénes	88a7effd8d	tests/querier_cache: add tests specific for multiple entry-types	2018-09-03 10:31:44 +03:00
Botond Dénes	c12008b8cb	querier: split querier into separate data and mutation querier types Instead of hiding what compaction method the querier uses (and only exposing it via rejecting `can_be_used_for_page()`) make it very explicit that these are really two different queriers. This allows using different indexes for the two queriers in `querier_cache` and eliminating the possibility of picking up a querier with the wrong compaction method (read kind). This also makes it possible to add new querier type(s) that suit the multishard-query's needs without making a confusing mess of `querier` by making it a union of all querying logic. Splitting the queriers this way changes what happens when a lookup finds a querier of the wrong kind (e.g. emit_only_live::yes for an emit_only_live::no command). As opposed to dropping the found (but wrong) querier the querier will now simply not be found by the lookup. This is a result of using separate search indexes for the different mutation kinds. This change should have no practical implications. Splitting is done by making querier templated on `emit_only_live_rows`. It doesn't make sense to duplicate the entire querier as the two share 99% of the code.	2018-09-03 10:31:44 +03:00
Botond Dénes	e46251ebf6	querier: move consume_page logic into a free function In preparation for the now single querier being split into multiple more specialized ones. Make it possible for the multiple queriers to share the same implementation. Also, the code can now be reused by outside code as well, not just queriers.	2018-09-03 10:31:44 +03:00
Botond Dénes	c53f17ddb8	querier: move all matching related logic into free functions So that they can be used for multiple querier classes easily, without inheritance. The functions are not visible from the header. Also update the comments on `querier` w.r.t. the disappeared checking functions. Change the language to be more general. In practice these checks are never done by client code, instead they are done by the `querier_cache`.	2018-09-03 10:31:44 +03:00
Botond Dénes	43f464c52d	querier: inline querier::current_position() and make it public	2018-09-03 10:31:44 +03:00
Botond Dénes	86a61ded7d	querier: s/position/position_view/ Also treat it as a view, that is take it by value in functions, instead of reference.	2018-09-03 10:31:44 +03:00
Botond Dénes	6e4ec53679	querier: move position outside of querier In preparation for having multiple querier types that can share code without inheritance.	2018-09-03 10:31:44 +03:00
Botond Dénes	a172dfec4e	querier: move clustering_position_tracker outside of querier In preparation for having multiple querier types that can share code without inheritance.	2018-09-03 10:31:44 +03:00
Botond Dénes	7bd955e993	querier_cache: move insert/lookup related logic into free functions In preparation for introducing support for multiple entry types in the querier_cache move all insert/lookup related logic into free functions. Later these functions will be templated so they can handle multiple entry types with the same code.	2018-09-03 10:31:44 +03:00
Botond Dénes	cded477b94	querier: return std::optional<querier> instead of using create_fun() Requiring the caller of lookup() to pass in a `create_fun()` was not such a good idea in hindsight. It leads to awkward call sites and even more awkward code when trying to find out whether the lookup was successful or not. Returning an optional gives calling code much more flexibility and makes the code cleaner.	2018-09-03 10:31:44 +03:00
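The API shape argued for in the commit above can be sketched as follows. This is an illustration only, not the `querier_cache` interface: `toy_querier`, `toy_querier_cache`, the integer key and `lookup_or_create()` are hypothetical, and removing the entry on a hit is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, heavily simplified querier: just a range label and a position.
struct toy_querier {
    std::string range;
    int position = 0;
};

// Hypothetical cache keyed by an integer standing in for a query id.
class toy_querier_cache {
    std::map<int, toy_querier> _entries;
public:
    void insert(int key, toy_querier q) {
        _entries.insert_or_assign(key, std::move(q));
    }

    // Returns the cached querier on a hit (removing it from the cache, an
    // assumption of this sketch) and std::nullopt on a miss.
    std::optional<toy_querier> lookup(int key) {
        auto it = _entries.find(key);
        if (it == _entries.end()) {
            return std::nullopt;
        }
        toy_querier q = std::move(it->second);
        _entries.erase(it);
        return q;
    }

    std::size_t size() const { return _entries.size(); }
};

// The caller decides what to do on a miss and can see directly whether the
// lookup succeeded, without passing a creation callback into the cache.
inline toy_querier lookup_or_create(toy_querier_cache& cache, int key, bool& was_hit) {
    std::optional<toy_querier> found = cache.lookup(key);
    was_hit = found.has_value();
    return found ? std::move(*found) : toy_querier{"fresh", 0};
}
```

With an optional return value the hit/miss distinction is visible at the call site, which is the flexibility the commit message refers to.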
Botond Dénes	5f726e9a89	querier: move all to query namespace To avoid name clashes.	2018-09-03 10:31:44 +03:00
Botond Dénes	867f69b9d1	dht::i_partitioner: add partition_ranges_view	2018-09-03 10:31:44 +03:00
Botond Dénes	a011a9ebf2	mutation_reader: multishard_combining_reader: support custom dismantler Add a dismantler functor parameter. When the multishard reader is destroyed this functor will be called for each shard reader, passing a future to a `stopped_foreign_reader`. This future becomes available when the shard reader is stopped, that is, when it finished all in-progress read-aheads and/or pending next partition calls. The intended use case for the dismantler functor is a client that needs to be notified when readers are destroyed and/or has to have access to any unconsumed fragments from the foreign readers wrapping the shard readers.	2018-09-03 10:31:44 +03:00
Botond Dénes	f13b878a94	mutation_reader: pass all standard reader params to `remote_reader_factory` Extend `remote_reader_factory` interface so that it accepts all standard mutation reader creation parameters. This allows factory lambdas to be truly stateless, not having to capture any standard parameters that are needed for creating the reader. Standard parameters are those accepted by `mutation_source::make_reader()`.	2018-09-03 10:31:44 +03:00
Botond Dénes	e67c6d9f39	flat_mutation_reader::impl: add protected buffer() member To allow implementations to access the buffer in a read-only way.	2018-09-03 10:31:44 +03:00
Botond Dénes	8915293257	multishard_combining_reader: fix incorrect comment	2018-09-03 10:31:44 +03:00
Botond Dénes	75d60b0627	docs: add paged-queries.md design doc	2018-09-03 10:31:44 +03:00
Duarte Nunes	6593226849	Merge branch 'loading_cache: fix a consistency of size() and iterators APIs' from Vlad " After we fixed reloading flow it enabled situations when items are no longer cached but still held in the underlying loading_shared_values object. Since loading_cache::size() returns the size of its loading_shared_values object and loading_cache::begin()/end()/find() are returning iterators based on loading_shared_values iterators these APIs may return very weird values, e.g. size() may return the same value after one of the items have been removed using remove(key) API. This series fixes this by switching mentioned above APIs to work on top of lru_list object instead of loading_shared_values. " * 'loading_cache_fix_api_semantics-v1' of https://github.com/vladzcloudius/scylla: loading_cache: make iterator work on top of lru_list iterators instead of loading_shared_values' loading_cache: make size() return the size of lru_list instead of loading_shared_values	2018-09-01 11:05:28 +01:00
Avi Kivity	fd8eae50db	build: add relocatable package target A relocatable package contains the Scylla (and iotune) executables (in a bin/ directory), any libraries they may need (lib/) the configuration file defaults (conf/) and supporting scripts (dist/). The libraries are picked up from the host; including libc and the dynamic linker (ld.so). We also provide a thunk script that forces the library path (LD_LIBRARY_PATH) to point at our libraries, and overrides the interpreter to point at our ld.so. With these files, it is possible to run a fully functional Scylla instance on any Linux distribution. This is similar to chroot or containers, except that we run in the same namespace as the host. The packages are created by running ninja build/release/scylla-package.tar or ninja --mode debug build/debug/scylla-package.tar Message-Id: <20180828065352.30730-1-avi@scylladb.com>	2018-08-31 23:14:42 +01:00
Vlad Zolotarov	945d26e4ee	loading_cache: make iterator work on top of lru_list iterators instead of loading_shared_values' Reloading may hold value in the underlying loading_shared_values while the corresponding cache values have already been deleted. This may create weird situations like this: <populate cache with 10 entries> cache.remove(key1); for (auto& e : cache) { std::cout << e << std::endl; } <all 10 entries are printed, including the one for "key1"> In order to avoid such situations we are going to make the loading_cache::iterator to be a transform_iterator of lru_list::iterator instead of loading_shared_values::iterator because lru_list contains entries only for cached items. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-08-30 20:56:44 -04:00
Vlad Zolotarov	1e56c7dd58	loading_cache: make size() return the size of lru_list instead of loading_shared_values reloading flow may hold the items in the underlying loading_shared_values after they have been removed (e.g. via remove(key) API) thereby loading_shared_values.size() doesn't represent the correct value for the loading_cache. lru_list.size() on the other hand - does. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-08-30 15:55:30 -04:00
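The inconsistency described in the two loading_cache commits above can be reproduced with a toy model. This is a sketch under stated assumptions, not the `loading_cache` code: `toy_loading_cache`, `begin_reload()`, `size_old()` and `size_fixed()` are hypothetical names, and an in-flight reload is modelled as an extra `std::shared_ptr` reference that keeps the value alive in the shared map after removal.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <memory>
#include <string>

// Toy model: _shared_values stands in for loading_shared_values and
// _lru for lru_list. Only _lru tracks what is actually cached.
class toy_loading_cache {
    std::map<std::string, std::shared_ptr<int>> _shared_values;
    std::list<std::string> _lru;
public:
    void insert(const std::string& key, int value) {
        if (_shared_values.find(key) == _shared_values.end()) {
            _lru.push_back(key);
        }
        _shared_values[key] = std::make_shared<int>(value);
    }

    // Models a reload in progress: the caller holds an extra reference.
    std::shared_ptr<int> begin_reload(const std::string& key) {
        return _shared_values.at(key);
    }

    // The item always leaves the LRU list, but the shared value is dropped
    // only when nothing else (such as a reload) still references it.
    void remove(const std::string& key) {
        _lru.remove(key);
        auto it = _shared_values.find(key);
        if (it != _shared_values.end() && it->second.use_count() == 1) {
            _shared_values.erase(it);
        }
    }

    // Size based on the shared values: may still count removed items.
    std::size_t size_old() const { return _shared_values.size(); }

    // Size based on the LRU list: counts only cached items.
    std::size_t size_fixed() const { return _lru.size(); }
};
```

In this model, removing a key while a reload still holds its value leaves `size_old()` unchanged, whereas `size_fixed()` reflects the removal, which mirrors the motivation given for basing `size()` and the iterators on the LRU list.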
Paweł Dziepak	dbbd664600	Update seastar submodule * seastar 12f18ce...5712816 (6): > tests: add signal_test to test list > Merge "Enhancements for memory_output_stream" from Paweł > seastar-addr2line: don't print an empty line between backtrace lines > seastar-addr2line: add --verbose option > seastar-addr2line: make prefix matching non-greedy > future: make available() const	2018-08-30 11:41:27 +01:00
Glauber Costa	8dea1b3c61	database: fix directory for information when loading new SSTables from upload dir When we load new SSTables, we use the directory information from the entry descriptor to build information about those SSTables. When the descriptor is created by flush_upload_dir, the sstable directory used in the descriptor contains the `upload` part. Therefore, we will try to load SSTables that are in the upload directory when we already moved them out and fail. Since the generation also changes, we have been historically fixing the generation manually, but not the SSTable directory. The reason for that is that up until recently, the SSTable directory was passed statically to open_sstables, ignoring whatever the entry descriptor said. Now that the sstable directory is also derived from the entry descriptor, we should fix that too. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180829165326.12183-1-glauber@scylladb.com>	2018-08-30 10:34:25 +03:00


16447 Commits