scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 03:56:42 +00:00

Author	SHA1	Message	Date
Avi Kivity	dc6be68852	Merge "promoted index for reading partial partitions" from Nadav "The goal of this patch series is to support reading and writing of a "promoted index" - the Cassandra 2.* SSTable feature which allows reading only a part of the partition without needing to read an entire partition when it is very long. To make a long story short, a "promoted index" is a sample of each partition's column names, written to the SSTable Index file with that partition's entry. See a longer explanation of the index file format, and the promoted index, here: https://github.com/scylladb/scylla/wiki/SSTables-Index-File There are two main features in this series - first enabling reading of parts of partitions (using the promoted index stored in an sstable), and then enable writing promoted indexes to new sstables. These two features are broken up into smaller stand-alone pieces to facilitate the review. Three features are still missing from this series and are planned to be developed later: 1. When we fail to parse a partition's promoted index, we silently fall back to reading the entire partition. We should log (with rate limiting) and count these errors, to help in debugging sstable problems. 2. The current code only uses the promoted index when looking for a single contiguous clustering-key range. If the ck range is non-contiguous, we fall back to reading the entire partition. We should use the promoted index in that case too. 3. The current code only uses the promoted index when reading a single partition, via sstable::read_row(). When scanning through all or a range of partitions (read_rows() or read_range_rows()), we do not yet use the promoted index; We read contiguously from data file (we do not even read from the index file, so unsurprisingly we can't use it)." (cherry picked from commit `700feda0db`)	2016-08-09 17:54:15 +03:00
Paweł Dziepak	e95f4eaee4	Merge "partition_limit: Don't count dead partitions" from Duarte "This patch series ensures we don't count dead partitions (i.e., partitions with no live rows) towards the partition_limit. We also enforce the partition limit at the storage_proxy level, so that limits with smp > 1 works correctly." (cherry picked from commit `5f11a727c9`)	2016-08-03 12:44:32 +03:00
Tomasz Grabiec	b224ff6ede	Merge 'pdziepak/row-cache-wide-entries/v4' from seastar-dev.git This series adds the ability for partition cache to keep information whether partition size makes it uncacheable. During reads, these entries save us IO operations since we already know that the partition is too big to be put in the cache. First part of the patchset makes all mutation_readers allow the streamed_mutations they produce to outlive them, which is a guarantee used later by the code handling reading large partitions. (cherry picked from commit `d2ed75c9ff`)	2016-08-02 20:24:29 +02:00
Piotr Jastrzebski	6960fce9b2	Use continuity flag correctly with concurrent invalidations Between reading a cache entry and actually using it, invalidations can happen, so we have to check that no flag was cleared; if one was, we need to read the entry again. Fixes #1464. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <7856b0ded45e42774ccd6f402b5ee42175bd73cf.1469701026.git.piotr@scylladb.com> (cherry picked from commit `fdfd1af694`)	2016-08-02 20:24:22 +02:00
Duarte Nunes	d11b0cac3b	sstable_mutation_test: Test non-compound cell name This patch adds a test case for reading non-compound cell names, validating that such a cell is not incorrectly marked as static. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1469616205-4550-5-git-send-email-duarte@scylladb.com>	2016-07-28 12:11:37 +02:00
Tomasz Grabiec	7d73599acd	tests: lsa_async_eviction_test: Use chunked_fifo<> To protect against large reallocations during push() which are done under reclaim lock and may fail.	2016-07-28 09:43:51 +02:00
Piotr Jastrzebski	bf27379583	Add tests for wide partition handling in cache. They shouldn't be cached. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> (cherry picked from commit `7d29cdf81f`)	2016-07-27 14:09:45 +03:00
Paweł Dziepak	4e43cb84ff	tests/sstables: test reading sstable with duplicated range tombstones Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> (cherry picked from commit `b405ff8ad2`)	2016-07-27 14:09:02 +03:00
Paweł Dziepak	a39bec0e24	tests: extract streamed_mutation assertions Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> (cherry picked from commit `50469e5ef3`)	2016-07-27 14:05:43 +03:00
Raphael S. Carvalho	66ebef7d10	tests: add new test for date tiered strategy This test sets the time window to 1 hour and checks that the strategy works accordingly. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `cf54af9e58`)	2016-07-21 12:00:26 +03:00
Raphael S. Carvalho	7b9cf528ad	tests: fix occasional failure in date tiered test That was a bug in the test itself. It could happen that an sstable would incorrectly belong to the next time window if the current minute is approaching its end. Fix is about having all sstables that we want in the same time window with the same min/max timestamp. Fixes #1448. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <ee25d49e7ed12b4cf7d018a08163404c3d122e56.1468782787.git.raphaelsc@scylladb.com>	2016-07-18 15:18:29 +02:00
Duarte Nunes	9792a77266	range: Add deoverlap function This patch adds the deoverlap function to range.hh, which takes in a vector of possibly overlapping ranges and returns a vector of non-overlapping ranges covering the same values. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-14 18:20:41 +02:00
Tomasz Grabiec	7227c537ce	Merge branch 'pdziepak/streamed-mutations-hashing/v5' from seastar-dev.git From Paweł: This is another episode in the "convert X to streamed mutations" series. Hashing mutations (mainly for repair) is converted so that it doesn't need to rebuild whole mutation. The first part of the series changes the way streamed mutations deal with range tombstones. Since it is not necessary to make sure we write disjoint tombstones to sstables there is no need anymore for streamed mutations to produce disjoint tombstones and, consequently, no need for range tombstones to be split into range_tombstone_begin and range_tombstone_end. The second part is the actual hashing implementation. However, to ensure that the hash depends only on the contents of the mutation and not the way it is stored in different data sources, range tombstones have to be made disjoint before they are hashed. This series also ensures that any changes caused by streamed mutations to hashing and streaming do not break repair during upgrade.	2016-07-13 11:24:00 +02:00
Duarte Nunes	674afc52bc	compound_test: Test singular composite_view::explode() This patch adds a test case for composite_view::explode() called on a non-compound composite. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1468353393-3074-1-git-send-email-duarte@scylladb.com>	2016-07-13 11:23:24 +02:00
Paweł Dziepak	c5662919df	tests/streamed_mutation: test hashing Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:23 +01:00
Paweł Dziepak	eb1dcf08e7	tests/streamed_mutation: add test for range_tombstones_stream Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:23 +01:00
Paweł Dziepak	93cc4454a6	streamed_mutation: emit range_tombstones directly Originally, streamed_mutations guaranteed that emitted tombstones are disjoint. In order to achieve that two separate objects were produced for each range tombstone: range_tombstone_begin and range_tombstone_end. Unfortunately, this forced sstable writer to accumulate all clustering rows between range_tombstone_begin and range_tombstone_end. However, since there is no need to write disjoint tombstones to sstables (see #1153 "Write range tombstones to sstables like Cassandra does") it is also not necessary for streamed_mutations to produce disjoint range tombstones. This patch changes that by making streamed_mutation produce range_tombstone objects directly. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:18 +01:00
Duarte Nunes	0b87d16699	composite: Add unit tests Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-11 16:55:11 +02:00
Tomasz Grabiec	8c4b5e4283	db: Avoiding checking bloom filters during compaction Checking bloom filters of sstables to compute max purgeable timestamp for compaction is expensive in terms of CPU time. We can avoid calculating it if we're not about to GC any tombstone. This patch changes compacting functions to accept a function instead of ready value for max_purgeable. I verified that bloom filter operations no longer appear on flame graphs during compaction-heavy workload (without tombstones). Refs #1322.	2016-07-10 09:54:20 +02:00
Paweł Dziepak	cba996a3ea	Merge "Implement missing functions for byte_ordered_partitioner" from Asias	2016-07-08 10:49:25 +01:00
Asias He	9c27b5c46e	byte_ordered_partitioner: Implement missing describe_ownership and midpoint In order to support ByteOrderedPartitioner, we need to implement the missing describe_ownership and midpoint functions in byte_ordered_partitioner class. As a starter, this patch uses a simple node token distance based method to calculate ownership. C* uses a complicated key samples based method. We can switch to what C* does later. Tests are added to tests/partitioner_test.cc. Fixes #1378	2016-07-08 17:44:55 +08:00
Paweł Dziepak	a7b6c1110f	sstables: do not require seal_sstable() to be run in thread Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-07 12:18:35 +01:00
Paweł Dziepak	4e34bd4e8a	tests/streamed_mutation: test fragment_and_freeze() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-07 12:18:35 +01:00
Raphael S. Carvalho	b5ec4d46c6	tests: add test for date tiered compaction strategy Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 02:11:47 -03:00
Raphael S. Carvalho	cab2892866	tests: add test for sstables::get_fully_expired_sstables Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 02:11:47 -03:00
Raphael S. Carvalho	69b3860662	tests: add test for leveled_manifest::overlapping Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 02:11:45 -03:00
Raphael S. Carvalho	1118cfc51a	tests: test that sstable max_local_deletion_time is properly updated Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 01:13:34 -03:00
Avi Kivity	2a46410f4a	Change sstable_list from a map to a set sstable_list is now a map<generation, sstable>; change it to a set in preparation for replacing it with sstable_set. The change simplifies a lot of code; the only casualty is the code that computes the highest generation number.	2016-07-03 10:26:57 +03:00
Avi Kivity	1b448877d7	Merge " thrift: Implement CQL over thrift" from Duarte "This patchset implements the CQL over thrift verbs. Only CQL3 is supported, and the CQL2 verbs are disabled."	2016-06-28 13:36:12 +03:00
Piotr Jastrzebski	68e5a199e9	Clean continuous flag of cache entry preceding invalidated decorated key even when it's not found. Add test. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <c7b8f4df37256363bf304e0396f84b5f37921b81.1467059472.git.piotr@scylladb.com>	2016-06-28 10:26:02 +02:00
Duarte Nunes	c8afb4cc46	query_processor: Support thrift prepared statements This patch adds support for thrift prepared statements. It specializes the result_message::prepared into two types: result_message::prepared::cql and result_message::prepared::thrift, as their identifiers have different types. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-27 15:39:02 +02:00
Duarte Nunes	1ffae6e6ee	database_test: Add test case for row limit This patch introduces database_test and adds a test case to ensure the row limit is respected when querying multiple partition ranges. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20160623111723.17523-1-duarte@scylladb.com>	2016-06-23 14:20:34 +02:00
Duarte Nunes	aacc7193f2	schema: Replace keyspace's schema_ptr on CF update This patch ensures we replace the schema_ptr held by its respective keyspace object when a column family is being updated. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20160623085710.26168-1-duarte@scylladb.com>	2016-06-23 11:11:52 +02:00
Piotr Jastrzebski	9b011bff18	row_cache: add contiguity flag to cache entry to reduce disk IO during scans Add contiguity flag to cache entry and set it in scanning reader. Partitions fetched during scanning are continuous and we know there's nothing between them. Clear contiguity flag on cache entries when the succeeding entry is removed. Use continuous flag in range queries. Don't go to disk if we know that there's nothing between two entries we have in cache. We know that when continuous flag of the first one is set to true. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <72bae432717037e95d1ac9465deaccfa7c7da707.1466627603.git.piotr@scylladb.com>	2016-06-23 09:43:15 +03:00
Duarte Nunes	69798df95e	query: Limit number of partitions returned This is required to implement a thrift verb. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-22 09:48:13 +02:00
Tomasz Grabiec	597cbbdedc	Merge branch 'pdziepak/streamed-mutations/v5' from seastar-dev.git From Paweł: This series introduces streaming_mutations which allow mutations to be streamed between the producers and the consumers as a series of mutation_fragments. Because of that the mutation streaming interface works well with partitions larger than available memory provided that actual producer and consumer implementations can support this as well. mutation_fragments are the basic objects that are emitted by streamed_mutations they can represent a static row, a clustering row, the beginning and the end of a range tombstone. They are ordered by their clustering keys (with static rows being always the first emitted mutation fragment). The beginning of range tombstone is emitted before any clustering row affected by that tombstone and the end of range tombstone is emitted after the last clustering row affected by it. Range tombstones are disjoint. In this series all producers are converted to fully support the new interface, that includes cache, memtables and sstables. Mutation queries and data queries are the only consumers converted so far. To minimize the per-mutation_fragment overhead streamed_mutations use batching. The actual producer implementation fills a buffer until it is full (currently, buffer size is 16, the limit should, however, be changed to depend on the actual size in memory of the stored elements) or end of stream is reached. In order to guarantee isolation of writes reads from cache and memtable use MVCC. When a reader is created it takes a snapshot of the particular cache or memtable entry. The snapshot is immutable and if there happen to be any incoming writes while the read is active a new version of partition is created. When the snapshot is destroyed partition versions are merged together as much as possible. 
Performance results with perf_simple_query (median of results with duration 15): write: 618652.70 before, 618047.58 after (-0.10%); read: 661712.44 before, 608070.49 after (-8.11%)	2016-06-21 12:15:21 +02:00
Tomasz Grabiec	e783b58e3b	Merge branch 'glommer/LSA-throttler-v6' from git@github.com:glommer/scylla.gi From Glauber: This is my new take at the "Move throttler to the LSA" series, except this one doesn't actually move anything anywhere: I am leaving all memtable conversion out, and instead I am sending just the LSA bits + LSA active reclaim. This should help us see where we are going, and then we can discuss all memtable changes in a series on its own, logically separated (and hopefully already integrated with virtual dirty). [tgrabiec: trivial merge conflicts in logalloc.cc]	2016-06-21 10:22:26 +02:00
Glauber Costa	7f29cb8aba	tests: add logalloc tests for pressure notification tests to make sure various scenarios of pressure notification for active asynchronous reclaim work. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-06-20 18:58:39 -04:00
Glauber Costa	8f5047fc5f	tests: add tests to new region_group throttle interface Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-06-20 18:51:00 -04:00
Paweł Dziepak	a3423bac38	tests/streamed_mutation: test freezing streamed_mutations Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:52 +01:00
Paweł Dziepak	494c6fa9c1	tests/mutation_query_test: make sure mutations are sliced properly Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:52 +01:00
Paweł Dziepak	983321f194	tests/mutation: do not create memtable on stack Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	4a5a9148e3	tests/row_cache: test slicing mutation reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	e1a8d94542	tests/row_cache: test mvcc Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	e4ae7894d4	tests/mutation: test slicing mutations Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	4992ea9949	tests: add test for anchorless_list Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	f991a2deb5	tests/row_cache_alloc_stress: use another memtable for underlying storage It is incorrect to update row_cache with a memtable that is also its underlying storage. The reason for that is that after memtable is merged into row_cache they share lsa region. Then when there is a cache miss it asks underlying storage for data. This will result with memtable reader running under row_cache allocation section. Since memtable reader also uses allocation section the result is an assertion fault since allocation sections from the same lsa region cannot be nested. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	5a5c519fa0	tests/row_cache_alloc_stress: use large cells instead of many rows With streamed_mutations a partition with many small rows doesn't stress the cache as much as the test expects. Use large clustering rows instead. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00
Paweł Dziepak	71e961427a	test/sstables: test reading sstables with incorrect ordering Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00
Paweł Dziepak	b6f78a8e2f	sstable: make sstable reads return streamed_mutation Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00

1148 Commits
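
Commit 9792a77266 above describes a deoverlap function that takes a vector of possibly overlapping ranges and returns non-overlapping ranges covering the same values. The idea can be sketched as a sort-and-merge pass. This is only an illustration, not the code in range.hh: it is written in Python rather than C++, it assumes closed integer intervals given as `(start, end)` tuples, and the `Interval` alias and the choice to merge ranges that merely touch are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical simplification: a closed interval [start, end] over integers.
Interval = Tuple[int, int]


def deoverlap(ranges: List[Interval]) -> List[Interval]:
    """Return sorted, non-overlapping intervals covering the same values."""
    result: List[Interval] = []
    # Sorting by start (then end) means each interval can only overlap
    # the most recently emitted one.
    for start, end in sorted(ranges):
        if result and start <= result[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            prev_start, prev_end = result[-1]
            result[-1] = (prev_start, max(prev_end, end))
        else:
            # Starts past the end of the previous interval: new entry.
            result.append((start, end))
    return result
```

For example, `deoverlap([(5, 8), (1, 3), (2, 6), (10, 12)])` yields `[(1, 8), (10, 12)]`. Handling inclusive and exclusive bounds, open-ended ranges, or a custom comparator would need extra care that this sketch leaves out.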