scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-22 17:40:34 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	8510389188	sstables: handle unrecognized sstable component As in C*, unrecognized sstable components should be ignored when loading a sstable. At the moment, Scylla fails to do so and will not boot as a result. In addition, unknown components should be remembered when moving a sstable or changing its generation. Fixes #1780. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <b7af0c28e5b574fd577a7a1d28fb006ac197aa0a.1478025930.git.raphaelsc@scylladb.com> (cherry picked from commit `53b7b7def3`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <e30115e089a4c3c3fb4aad956645c9d006c2ee55.1479141101.git.raphaelsc@scylladb.com>	2016-11-16 15:11:05 +02:00
Paweł Dziepak	91e5e50647	Merge "Remove quadratic behavior from atomic sstable deletion" from Avi "The atomic sstable deletion provides exception safety at the cost of quadratic behavior in the number of sstables awaiting deletion. This causes high cpu utilization during startup. Change the code to avoid quadratic complexity, and add some unit tests. See #1812." (cherry picked from commit `985d2f6d4a`)	2016-11-08 22:46:01 +02:00
Paweł Dziepak	210a390892	tests: add missing sstable for partition skipping test Commit `7dcd70124a` "tests/sstables: add test for fast forwarding reader" added a test for skipping parts of sstable. Unfortunately, it did not include the sstables it was trying to read. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 23:23:49 +01:00
Paweł Dziepak	0c24bbe639	tests/row_cache: add fast_forward_to() to throttled reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	69645455f3	tests/row_cache: count mutations read from _underlying Originally, cache tests checked how many times a mutation reader was created from the underlying mutation source to determine whether the continuity flag is working correctly. This is not going to work with fast forwarding mutation readers, so the test is switched to count the number of mutations (+ end of stream markers) returned from the underlying mutation readers, which is much less fragile. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	6755a679f6	drop key readers key_readers weren't used since introduction of continuity flag to cache entries. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	5ac9babe97	tests/mutation_reader: test fast forwarding combined reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	7dcd70124a	tests/sstables: add test for fast forwarding reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	5534dc2817	tests: add more helpers to mutation reader assertions Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	cf024975fe	sstables: enable fast forwarding for range readers Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	f49a9e0d64	sstables: drop unused read_range_rows() overload That overload was used only by unit test and violated guarantee that partition range lives until mutation reader is done. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	25b91c51e2	sstables: add data_consume_rows_context::reset() reset() is going to be used to restore valid state after fast forwarding the reader. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Tomasz Grabiec	308434f891	tests: memtable: Add test for partition version list consistency after compaction	2016-10-18 11:57:14 +02:00
Tomasz Grabiec	d836e8f64b	tests: memtable: Add tests for flushing reader Message-Id: <1476454187-11462-1-git-send-email-tgrabiec@scylladb.com>	2016-10-14 15:11:06 +01:00
Avi Kivity	9ac441d3b5	range: adjust split_after to allow split_point outside input range Make split_after() more generic by allowing split_point to be anywhere, not just within the input range. If the split_point is before, the entire range is returned; and if it is after, stdx::nullopt is returned. "before" and "after" are not well defined for wrap-around ranges, but we are phasing them out and soon there will not be wrapping_range::split_after() users. This is a prerequisite for converting partition_range and friends to nonwrapping_range. Message-Id: <1475765099-10657-1-git-send-email-avi@scylladb.com>	2016-10-06 17:54:44 +02:00
Avi Kivity	c94fb1bf12	build: reduce inclusions of messaging_service.hh Remove inclusions from header files (primary offender is fb_utilities.hh) and introduce new messaging_service_fwd.hh to reduce rebuilds when the messaging service changes. Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>	2016-10-05 11:46:49 +03:00
Avi Kivity	f8118d9fc2	Merge "Virtual dirty memory management" from Glauber "Description: ============ Scylla currently suffers from a brick wall behavior of the request throttler. Requests pile up until we reach the dirty memory limit, at which point we stop serving them until we have freed enough memory to allow for more requests. The problem is that freeing dirty memory means writing an SSTable to completion. That can take a long time, even if we are blessed with great disks. Those long waiting times can and will translate into timeouts. That is bad behavior. What this patch does is introduce one form of virtual dirty memory accounting. Instead of allowing 100 % of the dirty memory to be filled up until we stop accepting requests, we will do that when we reach 50 % of memory. However, instead of releasing requests only when an SSTable is fully written, we start releasing them when some memory was written. The practical effect of that, is that once we reach 50 % occupancy in our dirty memory region, we will bring the system from CPU speed to disk speed, and will start accepting requests only at the rate we are able to write memory back. Results ======= With this patchset running a load big enough to easily saturate the disk, (commitlog disabled to highlight the effects of the memtable writer), I am able to run scylla for many minutes, with timeouts occurring only when I run out of disk space, whereas without this patch a swarm of timeouts would start merely 2 seconds after the load started - and would never get stable. In V2, I have sent a set of graphs illustrating the performance of this solution. This version does not have any significant differences in that front. 
For details, please refer to https://groups.google.com/d/msg/scylladb-dev/iCvD-3Z-QqY/EM8KUh_MAQAJ Accuracy of the accounting: --------------------------- It is important for us to be as accurate as possible when accounting freed memory, since every byte we mark as freed may allow one or more requests to be executed. I have measured the accuracy of this approach (ignoring padding, object size for the mutation fragments) to be 99.83 % of used memory in the test workload I have run (large, 65k mutations). Memtables under this circumstance tend to have a very high occupancy ratio because throttle breeds idle, and idle breeds compact-on-idle. Known Issues: ------------- A lot of time can elapse between destroying the flush_reader and actually releasing memory. The release of memory only happens when the SSTable is fully sealed, and we have to flush the files, as well as finish writing all SSTable components at this point. This happened in practice with a buggy kernel that would result in flushes taking a long time. After that is fixed, this is just a theoretical problem and in practice it shouldn't matter given the time we expect those operations to take." * 'virtual-dirty-v6' of github.com:glommer/scylla: database: allow virtual dirty memory management streamed_mutation: make _buffer private add accounting of memory read to partition_snapshot_reader move partition_snapshot_reader code to header file LSA: allow a group to query its own region group memtables: split scanning reader in two sstables: use special reader for writing a memtable LSA: export information about object memory footprint LSA: export information about size of the throttle queue database: export virtual dirty bytes region group	2016-10-04 20:57:52 +03:00
Glauber Costa	28e3f2f6ee	LSA: export information about object memory footprint We allocate objects of a certain size, but we use a bit more memory to hold them. To get a clearer picture of how much memory an object will cost us, we need help from the allocator. This patch exports an interface that allows users to query a specific allocator to get that information. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-04 10:39:10 -04:00
Avi Kivity	a51804eca8	Merge "token_restriction: Deal with minimum tokens" from Duarte "This patch set ensures we can correctly handle queries where the minimum token is specified." * 'min-token/v3' of github.com:duarten/scylla: cql_query_test: Add test case for min/max token bounds token_restriction: Deal with minimum tokens partitioner: Parse token from bytes	2016-10-02 12:32:40 +03:00
Raphael S. Carvalho	a8ab4b8f37	lcs: fix starvation at higher levels When max sstable size is increased, higher levels are suffering from starvation because we decide to compact a given level if the following calculation results in a number greater than 1.001: level_size(L) / max_size_for_level_l(L) Fixes #1720. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-30 14:09:49 -03:00
Raphael S. Carvalho	a3bf7558f2	lcs: fix broken token range distribution at higher levels Uniform token range distribution across sstables in a level > 1 was broken, because we were only choosing sstable with lowest first key, when compacting a level > 0. This resulted in performance problem because L1->L2 may have a huge overlap over time, for example. Last compacted key will now be stored for each level to ensure sort of "round robin" selection of sstables for compactions at level >= 1. That's also done by C*, and they were once affected by it as described in https://issues.apache.org/jira/browse/CASSANDRA-6284. Fixes #1719. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-30 14:09:16 -03:00
Duarte Nunes	72af476397	cql_query_test: Add test case for min/max token bounds This patch adds a test case for specifying the minimum and maximum tokens in a cql3 query. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-09-30 11:45:45 +00:00
Raphael S. Carvalho	0eaa0f46c9	sstables: store first and last decorated keys in sstable object leveled strategy uses heavily first and last decorated keys of a sstable to get overlapping sstables in a given level. By storing first and last decorated keys in sstable object, it's expected that performance of leveled strategy (not compaction) will be improved. We will set first and last keys in sstable when either loading or sealing it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <0abca819454ab4c088541bb49714f1f6a7dc4f42.1473959677.git.raphaelsc@scylladb.com>	2016-09-19 13:25:58 +02:00
Raphael S. Carvalho	dffb41f9d8	sstables: remove schema parameter from some sstable methods schema can now be found in the sstable object itself. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <0fa44fedbe784d924522d7eeca77c16294479c6e.1473959677.git.raphaelsc@scylladb.com>	2016-09-19 13:25:58 +02:00
Tomasz Grabiec	2282599394	tests: Add test for UUID type ordering Message-Id: <1473956716-5209-2-git-send-email-tgrabiec@scylladb.com>	2016-09-16 11:07:14 +01:00
Gleb Natapov	2e8b255741	Merge seastar upstream * seastar 0303e0c...e534401 (6): > Merge "enable rpc to work on non contiguous memory for receive" from Gleb > install-dependencies.sh: install python3 for Ubuntu/Debian, which requires for configure.py > fix tcp stuck when output_stream write more than 212992 bytes once. > scripts/posix_net_conf.sh: supress 'ls: cannot access /sys/class/net/<NIC>/device/msi_irqs/' error message > scripts/posix_net_conf.sh: fix 'command not found' error when specifies --cpu-mask > native_network_stack: Fix use after free/missing wait in dhcp Includes: "Remove utils::fragmented_input_stream and utils::input_stream in favor of seastar version" from Gleb.	2016-09-15 12:12:16 +03:00
Raphael S. Carvalho	2a426ab248	tests: add test to check tombstone metadata Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 10:49:35 -03:00
Raphael S. Carvalho	026853fabb	tests: add test to check composite validity Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 10:49:30 -03:00
Raphael S. Carvalho	1f31223f32	sstables: store schema in sstable object That will be needed for optimization that will store decorated keys in the sstable object, and also for a subsequent work that will detect wrong metadata (min/max column names) by looking at columns in the schema. As schema is stored in sstable, there's no longer a need to store ks and cf names in it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 10:49:17 -03:00
Glauber Costa	dc5d8e33af	Revert "row_cache: update sstable histograms on cache hits" This reverts commit `1726b1d0cc`. Reverting this patch turns our SSTable access counter into a miss counter only. The estimated histogram always starts its first bucket at 1, so by marking cache accesses we will be wrongly feeding "1" into the buckets. Notice that this is not yet ideal: nodetool is supposed to show a histogram of all reads, and by doing this we are changing its meaning slightly. Workloads that serve mostly from cache will be distorted towards their misses. The real solution is to use a different histogram, but we will need to enforce a newer version of nodetool for that: the current issue is that nodetool expects an EstimatedHistogram in a specific format on the other side. Conflicts: row_cache.hh Message-Id: <a599fa9e949766e7c9697450ae34fc28e881e90a.1472742276.git.glauber@scylladb.com> Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-09-01 18:07:31 +03:00
Duarte Nunes	f4cf2f2aef	tracing: Make trace_state_ptr argument required This patch makes the optional trace_state_ptr arguments introduced in previous patches mandatory where possible. Functions which are called internally don't have a trace context, so for those we keep the argument's default value for convenience. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-09-01 12:04:32 +02:00
Glauber Costa	1726b1d0cc	row_cache: update sstable histograms on cache hits If we have a cache hit, we still need to update our sstable histogram, noting that we have touched 0 SSTables. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-08-31 15:14:22 -04:00
Paweł Dziepak	e981101fa9	Merge "Remove clustering_key_filtering_context" from Piotr "clustering_key_filtering_context is no longer needed. partition_slice can be used instead so this series removes clustering_key_filtering_context and passes partition_slice down where it's needed. Then a static get_ranges method is used to obtain clustering key ranges for a given partition. Fixes #1614."	2016-08-30 22:30:15 +01:00
Piotr Jastrzebski	3607d99269	Remove clustering_key_filtering_context. Remove clustering_key_filter_factory and clustering_key_filtering_context. Use partition_slice directly with a static get_ranges method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 20:31:55 +02:00
Paweł Dziepak	6012a7e733	mutation_partition: fix iterator invalidation in trim_rows Reversed iterators are adaptors for 'normal' iterators. These underlying iterators point to different objects than the reversed iterators themselves. The consequence of this is that removing an element pointed to by a reversed iterator may invalidate a reversed iterator which points to a completely different object. This is what happens in trim_rows for reversed queries. Erasing a row can invalidate the end iterator and the loop would fail to stop. The solution is to introduce the reversal_traits::erase_dispose_and_update_end() function, which erases and disposes the object pointed to by a given iterator but also takes a reference to an end iterator and updates it if necessary to make sure that it stays valid. Fixes #1609. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1472080609-11642-1-git-send-email-pdziepak@scylladb.com>	2016-08-25 16:52:35 +03:00
Paweł Dziepak	222bde7e6f	bytes_ostream: introduce upper bound on chunk size This patch makes append() and write() limit the maximum size of a single allocation to bytes_ostream::max_chunk_size. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-08-22 09:31:33 +01:00
Paweł Dziepak	0ee98ea4c4	tests: add fragmented input stream test Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-08-22 09:31:33 +01:00
Paweł Dziepak	148e9c5608	streamed_mutation_from_mutation: fix destroying bi::sets Once unlink_leftmost_without_rebalance() has been called on a bi::set no other method can be used. This includes clear_and_disposed() used by the mutation_partition destructor. We like unlink_leftmost_without_rebalance() because it is efficient, so the solution is to manually finish destroying clustering row and range tombstone sets in the reader destructor using that function. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-08-17 11:03:59 +01:00
Piotr Jastrzebski	bb0c4c3c40	Fix compilation errors query::range parameter in mutation_partition::range has to be changed to nonwrapping_range. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <36e444bfe90586f8d3b08ca36d8dc13d5898ef97.1471347402.git.piotr@scylladb.com>	2016-08-16 12:49:54 +01:00
Duarte Nunes	be4adf212a	nonwrapping_range: Add unit tests Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-08-15 14:48:57 +00:00
Duarte Nunes	bb16e194bc	range: Add nonwrapping_range class This patch introduces the nonwrapping_range class. This class is intended to be used by code that requires non wrapping ranges. Internally, it uses a wrapping_range. Users are responsible for ensuring the bounds are correct when creating a nonwrapping_range. The path proposed here is to incrementally replace usages of wrapping_range/range by nonwrapping_range, pushing usages of wrapping ranges as further to the edges as possible. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-08-15 14:48:57 +00:00
Tomasz Grabiec	d7f8ce7722	Merge branch 'raphael/fix_min_max_metadata_v2' from git@github.com:raphaelsc/scylla.git Fix for generation of sstables min/max clustering metadata from Raphael.	2016-08-10 10:43:35 +02:00
Raphael S. Carvalho	8deb1ca19d	tests: add test to check sstables's min and max clustering values Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-08-09 15:54:40 -03:00
Calle Wilund	1593f4254b	flush_queue_test: Start semaphore in propagation tests not initialized Somehow, all my local runs timed ok anyway, but obviously not on all machines. Message-Id: <1470727968-1759-1-git-send-email-calle@scylladb.com>	2016-08-09 09:35:28 +02:00
Avi Kivity	98226a14ac	Merge "Exception propagation writers in commitlog batch" "While periodic mode is an all-bets-off crap-shoot as far as knowing if data actually reached disk or not, batch mode is supposed to be somewhat more reliable/deterministic. Thus, if we get an exception writing/flushing the current buffer, we should propagate exceptions to all execution paths involved in this buffer. Flush queue can now (optionally) propagate exceptions to all clients, and commit log uses this to ensure that commit log writers in batch mode all generate exceptions on disk errors. Also includes some rudimentary tests for flush queue mechanisms. Note: other main user, sstable flushing, is not affected, as default mode is still to keep exceptions to individual worker continuations, not waiters."	2016-08-08 15:33:26 +03:00
Nadav Har'El	fc063ae62d	tests: add test for promoted index writing In this unit test, we create, using Scylla C++ code, the same large partition with 13520 CQL rows as we previously imported from Cassandra for the large partition test. We then verify that the sstable index file we just wrote is byte-for-byte identical to the one previously created by Cassandra. They should indeed be identical, because the data file has the same layout (even if timestamps are different) and our default promoted-index block size is the same (64K) so the sample of columns should be identical. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:47:15 +03:00
Nadav Har'El	1975abbfd6	sstables: disk_read_range Currently, the main sstable data parsing entry point data_consume_rows() takes a contiguous range of bytes to read from disk and parse. This range is supposed to be an entire partition or contiguous group of partitions, and is self-contained (can be parsed without extra information about the identity of these partitions). For the promoted index feature (which we will add in a following patch) we will want the range to span only a part of a partition, and will need the caller to provide some information not available to the parser (such as the partition's key). In the future, we will also want to support a vector of byte ranges, instead of just one. So in preparation for this, this patch simply replaces the start/end pair by a new class disk_read_range, which can be easily extended in later patches. No new functionality is introduced in this patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:47:02 +03:00
Nadav Har'El	3dd079fb7a	tests: add test for reading parts of a large partition This patch adds a test that takes an sstable with one partition of 13,520 clustering rows (spanning 700 KB in the data file), and attempts to read various slices of CQL rows, checking that we got back the expected number of rows. The sstable included here was generated by Cassandra, and includes a promoted index. Promoted index reading is not supported yet (we will add it in the next patch), so for now the code will always read the entire partition from disk; but the clustering-key filtering is already functional, and will drop some of the rows as requested, so this test will pass. Later, when we add promoted index support, we should check that this test still passes: the promoted index will make the reads in this test more efficient (which the test cannot verify), but the important thing to check is that it doesn't break any of these tests. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:46:59 +03:00
Calle Wilund	9098eed30b	flush_queue_test: Add tests for exception propagation v2: * Remove leading "_" in template types	2016-08-03 14:49:43 +00:00
Duarte Nunes	db1118e4f7	database_test: Add case for partition limit Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-08-02 22:11:15 +00:00
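
The split_after() behaviour described in commit `9ac441d3b5` can be illustrated with a minimal sketch. This is not Scylla's code: `int_range` is a hypothetical, simplified stand-in (a closed integer interval), and the handling of a split point inside the range (return the part strictly after it) is an assumption made for illustration. Only the two outside-the-range cases come from the commit message.

```cpp
#include <optional>

// Hypothetical, simplified stand-in for a non-wrapping range: the closed
// integer interval [start, end]. Scylla's real range types are templated
// and support open or unbounded ends; none of that is modelled here.
struct int_range {
    int start;
    int end;
};

// Returns the part of `r` that lies strictly after `split_point`.
std::optional<int_range> split_after(const int_range& r, int split_point) {
    if (split_point < r.start) {
        return r;             // split point before the range: entire range
    }
    if (split_point >= r.end) {
        return std::nullopt;  // split point at or after the end: nothing left
    }
    // Assumed behaviour for a split point inside the range.
    return int_range{split_point + 1, r.end};
}
```

With this sketch, splitting [10, 20] after 5 yields the whole range, splitting after 25 yields no range, and splitting after 15 yields [16, 20].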

1206 Commits