scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 12:47:02 +00:00

Author	SHA1	Message	Date
Botond Dénes	6ca0464af5	mutation_fragment: add schema and permit We want to start tracking the memory consumption of mutation fragments. For this we need schema and permit during construction, and on each modification, so the memory consumption can be recalculated and passed to the permit. In this patch we just add the new parameters and go through the insane churn of updating all call sites. They will be used in the next patch.	2020-09-28 11:27:23 +03:00
Botond Dénes	4f5ccf82cb	mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/ We will soon want to update the memory consumption of a mutation fragment after each modification done to it. To do that safely we have to forbid direct access to the underlying data and instead have callers pass a lambda doing their modifications. Uses where this method was just used to move the fragment away are converted to use `as_clustering_row() &&`.	2020-09-28 10:53:56 +03:00
Botond Dénes	3fab83b3a1	flat_mutation_reader: impl: add reader_permit parameter Not used yet, this patch does all the churn of propagating a permit to each impl. In the next patch we will use it to track the memory consumption of `_buffer`.	2020-09-28 10:53:48 +03:00
Avi Kivity	5e292897df	test: perf/perf_sstable.hh: prepare for asynchronously closed sstables_manager Obtain a test_env using do_with_async_returning(); and pass it to column_family_test_config so it can stop using test_sstables_manager.	2020-09-23 20:55:14 +03:00
Piotr Jastrzebski	c001374636	codebase wide: replace count with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously the `count` function was often used in various ways. `contains` not only expresses the intent of the code better but also does it in a more unified way. This commit replaces all the occurrences of `count` with `contains`. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>	2020-08-15 20:26:02 +03:00
Avi Kivity	3530e80ce1	Merge "Support md format" from Benny " This series adds support for the "md" sstable format. Support is based on the following: * do not use clustering based filtering in the presence of static row, tombstones. * Disabling min/max column names in the metadata for formats older than "md". * When updating the metadata, reset and disable min/max in the presence of range tombstones (like Cassandra does and until we process them accurately). * Fix the way we maintain min/max column names by: keeping whole clustering key prefixes as min/max rather than calculating min/max independently for each component, like Cassandra does in the "md" format. Fixes #4442 Tests: unit(dev), cql_query_test -t test_clustering_filtering* (debug) md migration_test dtest from git@github.com:bhalevy/scylla-dtest.git migration_test-md-v1 " * tag 'md-format-v4' of github.com:bhalevy/scylla: (27 commits) config: enable_sstables_md_format by default test: cql_query_test: add test_clustering_filtering unit tests table: filter_sstable_for_reader: allow clustering filtering md-format sstables table: create_single_key_sstable_reader: emit partition_start/end for empty filtered results table: filter_sstable_for_reader: adjust to md-format table: filter_sstable_for_reader: include non-scylla sstables with tombstones table: filter_sstable_for_reader: do not filter if static column is requested table: filter_sstable_for_reader: refactor clustering filtering conditional expression features: add MD_SSTABLE_FORMAT cluster feature config: add enable_sstables_md_format database: add set_format_by_config test: sstable_3_x_test: test both mc and md versions test: Add support for the "md" format sstables: mx/writer: use version from sstable for write calls sstables: mx/writer: update_min_max_components for partition tombstone sstables: metadata_collector: support min_max_components for range tombstones sstable: validate_min_max_metadata: drop outdated logic sstables: rename mc folder to mx sstables: may_contain_rows: always true for old formats sstables: add may_contain_rows ...	2020-08-11 13:29:11 +03:00
Piotr Jastrzebski	80e3923b3c	codebase wide: replace find(...) != end() with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously the code pattern looked like: <collection>.find(<element>) != <collection>.end() In C++20 the same can be expressed with: <collection>.contains(<element>) This is not only more concise but also expresses the intent of the code more clearly. This commit replaces all the occurrences of the old pattern with the new approach. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <f001bbc356224f0c38f06ee2a90fb60a6e8e1980.1597132302.git.piotr@scylladb.com>	2020-08-11 13:28:50 +03:00
Benny Halevy	e2340d0684	config: enable_sstables_md_format by default Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 19:19:32 +03:00
Benny Halevy	65239a6e50	config: add enable_sstables_md_format MD format is disabled by default at this point. The option extends enable_sstables_mc_format so that both are needed to be set for supporting the md format. The MD_FORMAT cluster feature will be added in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Avi Kivity	257c17a87a	Merge "Don't depend on seastar::make_(lw_)?shared idiosyncrasies" from Rafael " While working on another patch I was getting odd compiler errors saying that a call to ::make_shared was ambiguous. The reason was that seastar has both: template <typename T, typename... A> shared_ptr<T> make_shared(A&&... a); template <typename T> shared_ptr<T> make_shared(T&& a); The second variant doesn't exist in std::make_shared. This series drops the dependency in scylla, so that a future change can make seastar::make_shared a bit more like std::make_shared. " * 'espindola/make_shared' of https://github.com/espindola/scylla: Everywhere: Explicitly instantiate make_lw_shared Everywhere: Add a make_shared_schema helper Everywhere: Explicitly instantiate make_shared cql3: Add a create_multi_column_relation helper main: Return a shared_ptr from defer_verbose_shutdown	2020-08-02 19:51:24 +03:00
Botond Dénes	6660a5df51	result_memory_accounter: remove default constructor If somebody wants to bypass proper memory accounting they should at the very least be forced to consider if that is indeed wise and think a second about the limit they want to apply.	2020-07-28 18:00:29 +03:00
Rafael Ávila de Espíndola	efeaded427	Everywhere: Add a make_shared_schema helper This replaces a lot of make_lw_shared(schema(...)) with make_shared_schema(...). This makes it easier to drop a dependency on the differences between seastar::make_shared and std::make_shared. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-21 10:33:49 -07:00
Pavel Emelyanov	f8ffc31218	test: Print more sizes in memory_footprint_test The row cache memory footprint changed after switch to B+ because we no longer have a sole cache_entry allocation, but also the bplus::data and bplus::node. Knowing their sizes helps analyzing the footprint changes. Also print the size of memtable_entry that's now also stored in B+'s data. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:30:02 +03:00
Pavel Emelyanov	174b101a49	row_cache: Switch partition tree onto B+ rails The row_cache::partitions_type is replaced from boost::intrusive::set to bplus::tree<Key = int64_t, T = array_trusted_bounds<cache_entry>> Where token is used to quickly locate the partition by its token and the internal array -- to resolve hashing conflicts. Summary of changes in cache_entry: - compare's goes away as the new collection needs tri-compare one which is provided by ring_position_comparator - when initialized the dummy entry is added with "after_all_keys" kind, not "before_all_keys" as it was by default. This is to make tree entries sorted by token - insertion and removing of cache_entries happens inside double_decker, most of the changes in row_cache.cc are about passing constructor args from current_allocator.construct into double_decker.emplace_before() - the _flags is extended to keep array head/tail bits. There's room for it, sizeof(cache_entry) remains unchanged The rest fits smoothly into the double_decker API. Also, as was told in the previous patch, insertion and removal _may_ invalidate iterators, but may leave them intact. However, currently this doesn't seem to be a problem as the cache_tracker ::insert() and ::on_partition_erase do invalidate iterators unconditionally. Later this can be optimized, as iterators are invalidated by double-decker only in case of hash conflict, otherwise it doesn't change arrays and B+ tree doesn't invalidate its own. tests: unit(dev), perf(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:30:02 +03:00
Pavel Emelyanov	95f15ea383	utils: B+ tree implementation // The story is at // https://groups.google.com/forum/#!msg/scylladb-dev/sxqTHM9rSDQ/WqwF1AQDAQAJ This is the B+ version which satisfies several specific requirements to be suitable for row-cache usage. 1. Insert/Remove doesn't invalidate iterators 2. Elements should be LSA-compactable 3. Low overhead of data nodes (1 pointer) 4. External less-only comparator 5. As few actions on insert/delete as possible 6. Iterator walks the sorted keys The design, briefly is: There are 3 types of nodes: inner, leaf and data, inner and leaf keep built-in array of N keys and N(+1) nodes. Leaf nodes sit in a doubly linked list. Data nodes live separately from the leaf ones and keep pointers on them. Tree handler keeps pointers on root and left-most and right-most leaves. Nodes do _not_ keep pointers or references on the tree (except 3 of them, see below). changes in v9: - explicitly marked keys/kids indices with type aliases - marked the whole erase/clear stuff noexcept - disposers now accept object pointer instead of reference - clear tree in destructor - added more comments - style/readability review comments fixed Prior changes - Add noexcepts where possible - Restrict Less-comparator constraint -- it must be noexcept - Generalized node_id - Packed code for begin()/cbegin() - Unsigned indices everywhere - Cosmetics changes - Const iterators - C++20 concepts - The index_for() implementation is templatized the other way to make it possible for AVX key search specialization (further patching) - Insertion tries to push kids to siblings before split Before this change insertion into full node resulted in this node being split into two equal parts. This behaviour for random keys stress gives a tree with ~2/3 of nodes half-filled. With this change before splitting the full node try to push one element to each of the siblings (if they exist and not full). This slows the insertion a bit (but it's still way faster than the std::set), but gives 15% less total number of nodes. - Iterator method to reconstruct the data at the given position The helper creates a new data node, emplaces data into it and replaces the iterator's one with it. Needed to keep arrays of data in tree. - Milli-optimize erase() - Return back an iterator that will likely be not re-validated - Do not try to update ancestors separation key for leftmost kid This caused the clear()-like workload to perform poorly as compared to std::set. In particular the row_cache::invalidate() method does exactly this and this change improves its timing. - Perf test to measure drain speed - Helper call to collect tree counters - Fix corner case of iterator.emplace_before() - Clean heterogeneous lookup API - Handle exceptions from nodes allocations - Explicitly mark places where the key is copied (for future) - Extend the tree.lower_bound() API to report back whether the bound hit the key or not - Addressed style/cleanness review comments Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 16:29:43 +03:00
Pavel Emelyanov	9d38846ed2	test: Move perf measurement helpers into header To use the code in new perf tests in next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-14 12:58:26 +03:00
Tomasz Grabiec	19501d9ef2	sstables, tests: Allow disabling binary search in promoted index from perf tests	2020-06-16 16:15:23 +02:00
Raphael S. Carvalho	8e47f61df7	compaction: Enable tombstone expiration based on the presence of the sstable set For tombstone expiration to proceed correctly without the risk of resurrecting data, the sstable set must be present. Regular compaction and derivatives provide the sstable set, so they're able to expire tombstones with no resurrection risk. Resharding, on the other hand, can run on any shard, not necessarily on the same shard that one of the input sstables belongs to, so it currently cannot provide a sstable set for tombstone expiration to proceed safely. That being said, let's only do expiration based on the presence of the set. This makes room for the sstable set to be fed to compaction via descriptor, allowing even resharding to do expiration. Currently, compaction thinks that sstable set can only come from the table, and that also needs to be changed for further flexibility. It's theoretically possible that a given resharding job will resurrect data if a fully expired SSTable is resharded at a shard which it doesn't belong to. Resharding will have no way to tell that expiring all that data will lead to resurrection because the relevant SSTables are at different shards. This is fixed by checking for fully expired sstables only on presence of the sstable set. Fixes #6600. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200605200954.24696-1-raphaelsc@scylladb.com>	2020-06-07 11:46:48 +03:00
Piotr Sarna	d1f5d42a25	test: add big_decimal perf test This makes it possible to measure the impact of rewriting the parsing mechanism from std::regex to a hand-written state machine.	2020-06-01 16:11:49 +02:00
Botond Dénes	d68ac8bf18	treewide: remove all uses of no_reader_permit()	2020-05-28 11:34:35 +03:00
Botond Dénes	fe024cecdc	row_cache: pass a valid permit to underlying read All readers are soon going to require a valid permit, so make sure we have a valid permit which we can pass to the underlying reader when creating it. This means `row_cache::make_reader()` now also requires a permit to be passed to it.	2020-05-28 11:34:35 +03:00
Botond Dénes	9ede82ebf8	memtable: pass a valid permit to the delegate reader All readers are soon going to require a valid permit, so make sure we have a valid permit which we can pass to the delegate reader when creating it. This means `memtable::make_flat_reader()` now also requires a permit to be passed to it. Internally the permit is stored in `scanning_reader`, which is used both for flushes and normal reads. In the former case a permit is not required.	2020-05-28 11:34:35 +03:00
Botond Dénes	cc5137ffe3	table: require a valid permit to be passed to most read methods Now that the most prevalent users (range scan and single partition reads) all pass valid permits we require all users to do so and propagate the permit down towards `make_sstable_reader()`. The plan is to use this permit for restricting the sstable readers, instead of the semaphore the table is configured with. The various `make_streaming_*reader()` overloads keep using the internal semaphores, but they also create the permit before the read starts and pass it to `make_sstable_reader()`.	2020-05-28 11:34:35 +03:00
Piotr Sarna	92aadb94e5	treewide: propagate trace state to write path In order to add tracing to places where it can be useful, e.g. materialized view updates and hinted handoff, tracing state is propagated to all applicable call sites.	2020-05-18 16:05:23 +02:00
Glauber Costa	70a89ab4ab	compaction: do not assume I/O priority class We shouldn't assume the I/O priority class for compactions. For instance, if we are dealing with offstrategy compactions we may want to use the maintenance group priority for them. For now, all compactions are put in the compaction class. Rewrite compactions (scrub, cleanup) could be maintenance, but we don't have clear access to the database object at this time to derive the equivalent CPU priority. This is planned to be changed in the future, and when we do change it, we'll adjust. Same goes for resharding: while we could at this point change it we'd be risking memory pressure since resharding is run online and sstables are shared until resharding is done. When we move it to offline execution we'll do it with maintenance priority. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200512002233.306538-3-glauber@scylladb.com>	2020-05-12 08:23:19 +03:00
Tomasz Grabiec	2078016f84	test: memory_footprint: Avoid invalid identifiers as column names A column name should not start with a digit, as can be the case with random_string(). Message-Id: <1588860648-15796-1-git-send-email-tgrabiec@scylladb.com>	2020-05-07 17:33:34 +03:00
Pavel Emelyanov	ef181fb2d0	test: Add option to flush memtables for perf_simple_query The test in question measures the speed of memtables, not the row_cache. With this option it can do both. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200507140603.12350-1-xemul@scylladb.com>	2020-05-07 16:09:40 +02:00
Piotr Sarna	bec95a0605	treewide: use thread-safe variant of localtime In order to ensure thread-safety, all usages of localtime() are replaced with localtime_r(), which may accept a local buffer. Tests: unit(dev) Fixes #6364 Message-Id: <ad4a0c0e1707f0318325718715a3a647e3ebfdfe.1588592156.git.sarna@scylladb.com>	2020-05-04 14:46:08 +03:00
Raphael S. Carvalho	5ac0d31323	test: perf_simple_query: fix test with smp count > 1 That code doesn't run under a thread, so let's futurize it. The code worked with a single CPU because get() returns right away due to no deferring point. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200427155303.82763-1-raphaelsc@scylladb.com>	2020-04-27 18:58:25 +03:00
Tomasz Grabiec	92771e904a	test: memory_footprint: Silence logging by default	2020-04-17 11:34:13 +02:00
Tomasz Grabiec	1df63b60c3	test: memory_footprint: Introduce --partition-count option	2020-04-17 11:34:13 +02:00
Tomasz Grabiec	7c2f6dd75e	test: memory_footprint: Run under a cql_test_env	2020-04-17 11:34:13 +02:00
Tomasz Grabiec	04c093cbec	test: memory_footprint: Calculate sstable size for each format version	2020-04-17 11:34:12 +02:00
Avi Kivity	88ade3110f	treewide: replace calls to engine().some_api() with some_api() This removes the need to include reactor.hh, a source of compile time bloat. In some places, the call is qualified with seastar:: in order to resolve ambiguities with a local name. Includes are adjusted to make everything compile. We end up having 14 translation units including reactor.hh, primarily for deprecated things like reactor::at_exit(). Ref #1	2020-04-05 12:46:04 +03:00
Avi Kivity	1799cfa88a	logalloc: use namespace-scope seastar::idle_cpu_handler and related rather than reactor scope This allows us to drop a #include <reactor.hh>, reducing compile time. Several translation units that lost access to required declarations are updated with the required includes (this can be an include of reactor.hh itself, in case the translation unit that lost it got it indirectly via logalloc.hh) Ref #1.	2020-04-05 12:45:08 +03:00
Botond Dénes	240b5e0594	frozen_schema: key() remove unused schema parameter Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200402092249.680210-1-bdenes@scylladb.com>	2020-04-02 14:43:35 +02:00
Glauber Costa	e8801cd77b	compaction: enhance compaction_descriptor with creator and replace function There are many differences between resharding and compaction that are artificial, arising more from the way we ended up implementing it than necessity. This patch attempts to pass the creator and replacer functions through the compaction_descriptor. There is a difference between the creator function for resharding and regular compaction: resharding has to pass the shard number on behalf of which the SSTable is created. However regular compactions can just ignore this. No need to have a special path just for this. After this is done, the constructor for the compaction object can be greatly simplified. In further patches I intend to simplify it a bit further, but some more cleanup has to happen first. To make that happen we have to construct a compaction_descriptor object inside the resharding function. This is temporary: resharding currently works with a descriptor, but at some point that descriptor is lost and broken into pieces to be passed to this function. The overarching goal of this work is exactly to be able to keep that descriptor for as long as possible, which should simplify things a lot. Callers are patched, but there are plenty for sstable_datafile_test.cc. For their benefit, a helper function is provided to keep the previous signature (test only). Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-03-31 19:41:25 -04:00
Rafael Ávila de Espíndola	c5795e8199	everywhere: Replace engine().cpu_id() with this_shard_id() This is a bit simpler and might allow removing a few includes of reactor.hh. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200326194656.74041-1-espindola@scylladb.com>	2020-03-27 11:40:03 +03:00
Botond Dénes	575466b2cf	test/boost/sstable_test.hh: move generic stuff to test/lib/sstable_utils.hh sstable_test.hh started as collection of utilities shared between the various `_sstable_test.cc` files. Predictably other tests started using it as well, among them some that are non boost unit tests. This poses a problem as if we add the missing boost/test/unit_test.hpp include to sstable_test.hh these tests will suddenly have missing symbols from boost::test. To avoid linking boost::test into all these users, extract utilities more widely used into sstable_utils.hh	2020-03-23 09:29:45 +02:00
Rafael Ávila de Espíndola	80d969ce31	everywhere: Use uninitialized_string instead of sstring::initialized_later This is just a trivial wrapper over initialized_later when using sstring, but also works when std::string is used. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-10 13:17:49 -07:00
Rafael Ávila de Espíndola	caef2ef903	everywhere: Don't assume sstring::begin() and sstring::end() are pointers If we switch to using std::string we have to handle begin and end returning iterators. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-10 13:13:48 -07:00
Konstantin Osipov	ac0717fb64	test: consistently use a global testlog object in all tests Use test/lib/log.hh in all tests now that we have it.	2020-03-05 13:34:24 +03:00
Avi Kivity	157fe4bd19	Merge "Remove default timeouts" from Botond " Timeouts defaulted to `db::no_timeout` are dangerous. They allow any modifications to the code to drop timeouts and introduce a source of unbounded request queue to the system. This series removes the last such default timeouts from the code. No problems were found, only test code had to be updated. tests: unit(dev) " * 'no-default-timeouts/v1' of https://github.com/denesb/scylla: database: database::query(), database::apply(): remove default timeouts database: table::query(): remove default timeout mutation_query: data_query(): remove default timeout mutation_query: mutation_query(): remove default timeout multishard_mutation_query: query_mutations_on_all_shards(): remove default timeout reader_concurrency_semaphore: wait_admission(): remove default timeout utils/logalloc: run_when_memory_available(): remove default timeout	2020-03-01 17:29:17 +02:00
Rafael Ávila de Espíndola	2679c0cc87	perf_simple_query: Pass a string_view to make_counter_schema With this we don't need to construct a sstring just to call make_counter_schema. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Botond Dénes	1073094f04	database: database::query(), database::apply(): remove default timeouts	2020-02-27 19:14:12 +02:00
Rafael Ávila de Espíndola	17f12a8197	perf_simple_query: Call set_abort_on_internal_error(true) We should never ignore an internal error in a perf test. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200225055745.321086-2-espindola@scylladb.com>	2020-02-26 18:22:05 +02:00
Rafael Ávila de Espíndola	c6897dcbea	perf_simple_query: Simplify with seastar::thread There is no reason not to use a seastar::thread in setup code. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200225055745.321086-1-espindola@scylladb.com>	2020-02-26 18:22:04 +02:00
Piotr Jastrzebski	ca4a89d239	dht: add dht::decorate_key and replace all dht::global_partitioner().decorate_key with dht::decorate_key It is an improvement because dht::decorate_key takes schema and uses it to obtain partitioner instead of using global partitioner as it was before. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:06 +01:00
Piotr Jastrzebski	edd7398a0c	memory_footprint_test: Make it clear it has to run with -c1 The test fails when run on number of cores different than 1. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 10:22:32 +01:00
Piotr Jastrzebski	1a8fe4befd	tests: move memory_footprint_test to perf/ Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-05 10:18:28 +01:00


57 Commits
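
Commit bec95a0605 above replaces localtime() with localtime_r() for thread safety. As a rough illustration of that pattern (not code taken from the ScyllaDB tree), a minimal sketch might look like the following. The helper name to_local_time is hypothetical, and the error handling is an assumption made only to keep the example short.

```cpp
#include <ctime>
#include <time.h>   // localtime_r() is a POSIX function declared here

// Hypothetical helper, for illustration only: convert a timestamp to
// broken-down local time using the re-entrant POSIX call.
inline std::tm to_local_time(std::time_t t) {
    std::tm result{};  // caller-owned buffer, a fresh one on every call
    // localtime_r() writes into `result` rather than into the shared static
    // storage that localtime() may return, so concurrent callers do not
    // overwrite each other's results.
    if (localtime_r(&t, &result) == nullptr) {
        // Assumption for this sketch: fall back to a zeroed value on failure.
        // A real caller would more likely report the error.
        return std::tm{};
    }
    return result;
}
```

Because each call fills its own buffer, the returned value can be used freely on the calling thread without synchronizing with other callers.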