scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 19:35:12 +00:00

Author	SHA1	Message	Date
Paweł Dziepak	7e06499458	repair: convert hashing to streamed_mutations This patch makes hashing for repair calculate checksums in a way that doesn't require rebuilding whole mutation. Unfortunately, such checksums are incompatible with the old ones so the old way for computing checksums is preserved for compatibility reasons. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:23 +01:00
Paweł Dziepak	e779e2f0c9	streaming: do not fragment mutations in mixed cluster The receiving side needs to handle fragmented mutations properly so that isolation guarantees are not broken. If the receiving node may be an old one do not fragment mutations. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:23 +01:00
Paweł Dziepak	85c092c56c	storage_service: add LARGE_PARTITIONS_FEATURE Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:23 +01:00
Paweł Dziepak	c5662919df	tests/streamed_mutation: test hashing Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:23 +01:00
Paweł Dziepak	fe172484bd	streamed_mutation: add mutation_hasher mutation_hasher is a consumer of streamed_mutation that feeds its data to a specified hasher. It is not compatible with hashing_partition_visitor. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:23 +01:00
Paweł Dziepak	eb1dcf08e7	tests/streamed_mutation: add test for range_tombstones_stream Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:23 +01:00
Paweł Dziepak	93cc4454a6	streamed_mutation: emit range_tombstones directly Originally, streamed_mutations guaranteed that emitted tombstones are disjoint. In order to achieve that two separate objects were produced for each range tombstone: range_tombstone_begin and range_tombstone_end. Unfortunately, this forced sstable writer to accumulate all clustering rows between range_tombstone_begin and range_tombstone_end. However, since there is no need to write disjoint tombstones to sstables (see #1153 "Write range tombstones to sstables like Cassandra does") it is also not necessary for streamed_mutations to produce disjoint range tombstones. This patch changes that by making streamed_mutation produce range_tombstone objects directly. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:18 +01:00
Paweł Dziepak	c3a8539074	streamed_mutation: add more comparators to position_in_partition Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:50:08 +01:00
Paweł Dziepak	27fea7bf2c	mutation_partition: add non-const rows and tombstones accessors Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:50:07 +01:00
Paweł Dziepak	2208d4b53e	range_tombstone_list: add non-const begin() and end() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:50:07 +01:00
Paweł Dziepak	5a790a9b49	range_tombstone: add flip() range_tombstone::flip() flips range bounds. This is necessary in order to use range tombstone in reversed mutation fragment streams. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:50:07 +01:00
Paweł Dziepak	e1d306fa0d	range_tombstone: add memory_usage() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:50:07 +01:00
Paweł Dziepak	91a866501d	range_tombstone: add range_tombstone_accumulator range_tombstone_accumulator is a helper class that allows determining tombstone for a clustering row when range tombstones and clustering rows are streamed from streamed_mutation. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:50:07 +01:00
Paweł Dziepak	cd7937d33b	range_tombstone: add apply() range_tombstone::apply() allows merging two, possibly overlapping, range tombstones with the same start bound and produces one or two disjoint range tombstones as a result. It is intended to be used for merging tombstones coming from different sources. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:50:07 +01:00
Nadav Har'El	aec90a22da	sstable parsing: assert we do not lose clustering rows The sstable parsing code calls mp_row_consumer::flush() after every clustering row has been read, and this puts the now complete row in a single field "_ready". The assumption is that at this point parsing will stop, the consumer will move out this _ready (mp_row_consumer::get_mutation_fragment()) and when flush() is later called again, _ready will be empty again. This assumption is correct in our code, but is based on an intricate combination of esoteric parts of the code, such as: 1. In data_consume_row_context we stop parsing after reading the partition's header, before reading any clustering rows, giving the caller the chance to call sstable_streamed_mutation::read_next() to be prepared for the incoming mutations. 2. In mp_row_consumer::flush_if_needed(), we stop the parser after each individual clustering row. It is easy to break this assumption, and I did this in one of my code changes, and the result was silent loss of clustering rows, as "_ready" got silently overwritten before the reader had a chance to move it out. What this patch does is to add an assertion: If a clustering row is silently lost before being transferred to the mutation fragment reader, we croak. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1468389955-24600-1-git-send-email-nyh@scylladb.com>	2016-07-13 09:42:48 +01:00
Duarte Nunes	4eca7632ec	sstables: Replace composite fields with raw bytes This patch fixes a regression introduced in `f81329be60`, which made keys compound by default when using a particular ctor, in turn leading to mismatches when comparing the same key built with functions that properly consider compoundness. As a temporary fix, the sstable::key and sstable::key_view classes store raw bytes instead of a composite. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1468339295-3924-1-git-send-email-duarte@scylladb.com>	2016-07-12 18:08:04 +02:00
Duarte Nunes	f013425bb5	query: Ensure timestamp is last param in read_command Since the timestamp is not serialized, it must always be the last parameter of query::read_command. This patch reorders it with the partition_limit parameters and updates callers that specified a timestamp argument. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1468312334-10623-1-git-send-email-duarte@scylladb.com>	2016-07-12 10:41:54 +01:00
Tomasz Grabiec	c5e3c9bc35	Merge branch 'duarten/composite-v7' from git@github.com:duarten/scylla.git From Duarte: This patchset adds a representation of a legacy composite value to compound_compat.hh and replaces the one in sstables/key.hh. This patchset is needed for the thrift series.	2016-07-12 10:49:02 +02:00
Glauber Costa	73a70e6d0a	config: Use Scylla in user visible options We have imported most of our data about config options from Cassandra. Due to that, many options that mention the database by name are still using "Cassandra". Especially for the user-visible options, which are something that a user sees, we should really be using Scylla here. This patch was created by automatically replacing every occurrence of "Cassandra" with "Scylla" and then later on discarding the ones in which the change didn't make sense (such as Unused options and mentions of the Cassandra documentation) Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <1423e1d7e36874a1f46bd091aec96dcb4d8482d9.1468267193.git.glauber@scylladb.com>	2016-07-12 09:18:17 +03:00
Duarte Nunes	f81329be60	sstables: sstables::key delegates to composite The sstables::key class now delegates much of its functionality to the composite class. All existing behavior is preserved. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-11 23:37:33 +02:00
Gleb Natapov	726b79ea91	messaging_service: enable internode_compression option Use LZ4 for internode compression if enabled. Message-Id: <20160711141734.GZ18455@scylladb.com>	2016-07-11 18:30:21 +03:00
Avi Kivity	201f585ab6	Merge seastar upstream * seastar e7a7d41...e660d54 (1): > rpc: add factory class for lz4 compressor	2016-07-11 18:29:43 +03:00
Glauber Costa	f7706d51d1	scyllatop: fix typo Keyborad -> Keyboard Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <349f20fd69be2f2e05ae0b7800e34a336cd2472b.1468248179.git.glauber@scylladb.com>	2016-07-11 18:27:49 +03:00
Duarte Nunes	ad8ff1df7e	sstables: Replace composite class This patch replaces the sstables::composite class with the one in compound_compat.hh. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-11 16:55:11 +02:00
Duarte Nunes	0b87d16699	composite: Add unit tests Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-11 16:55:11 +02:00
Duarte Nunes	b179d8d378	compound_compat: Parse legacy compound values This patch adds support for parsing legacy compound values by introducing the composite class, a wrapper around a sequence of bytes serialized in the legacy format for compounds. Compound values can be sent through the thrift API. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-11 16:55:07 +02:00
Avi Kivity	9b08ddb639	Merge seastar upstream * seastar 9267dfa...e7a7d41 (3): > Merge "Compression support for RPC" from Gleb > reactor: allow sleeping while disk aio is pending > sstring: add resize method	2016-07-11 16:23:29 +03:00
Calle Wilund	4ab03e98cf	commitlog: Ensure we don't end up in a loop when we must wait for alloc Continuation reordering could cause us to repeatedly see the segment-local flag var even though actual write/sync ops are done. Can cause wild recursion without actual delayed continuation -> SOE. Fix by also checking queue status, since this is the wait object. Message-Id: <1468234873-13581-1-git-send-email-calle@scylladb.com>	2016-07-11 14:12:38 +03:00
Avi Kivity	f126efd7f2	transport: encode user-defined type metadata Right now we fall back to tuples, which confuses the client. Fixes #1443. Reviewed-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1468167120-1945-1-git-send-email-avi@scylladb.com>	2016-07-11 08:51:17 +03:00
Takuya ASADA	d2caa486ba	dist/redhat/centos_dep: disable go and ada language on scylla-gcc package, since ScyllaDB never use them centos-master jenkins job failed at building libgo, but we don't need go language, so let's disable it on scylla-gcc package. Also we never use ada, disable it too. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1468166660-23323-1-git-send-email-syuu@scylladb.com>	2016-07-10 19:12:52 +03:00
Avi Kivity	24e3026e32	Merge "compaction manager refactoring" from Raphael	2016-07-10 17:16:23 +03:00
Tomasz Grabiec	6a1f9a9b97	db: Improve logging Message-Id: <1467997671-16570-1-git-send-email-tgrabiec@scylladb.com>	2016-07-10 16:15:03 +03:00
Avi Kivity	b5bef73ad2	Merge "Avoiding checking bloom filters during compaction" from Tomasz "Checking bloom filters of sstables to compute max purgeable timestamp for compaction is expensive in terms of CPU time. We can avoid calculating it if we're not about to GC any tombstone. This patch changes compacting functions to accept a function instead of ready value for max_purgeable. I verified that bloom filter operations no longer appear on flame graphs during compaction-heavy workload (without tombstones). Refs #1322."	2016-07-10 11:33:41 +03:00
Tomasz Grabiec	8c4b5e4283	db: Avoiding checking bloom filters during compaction Checking bloom filters of sstables to compute max purgeable timestamp for compaction is expensive in terms of CPU time. We can avoid calculating it if we're not about to GC any tombstone. This patch changes compacting functions to accept a function instead of ready value for max_purgeable. I verified that bloom filter operations no longer appear on flame graphs during compaction-heavy workload (without tombstones). Refs #1322.	2016-07-10 09:54:20 +02:00
Tomasz Grabiec	c0233c877d	db: Avoid out-of-memory when flushing cannot keep up memtable_list::seal_on_overflow() is called on each mutation to check if current memtable should be flushed. It will call memtable_list::seal_active_memtable() when that is the case. The number of concurrent seals is guarded by a semaphore, starting from commit `0f64eb7e7d`, and allows at most 4 of them. If there are 4 flushes already pending, every incoming mutation will enqueue a new flush task on the semaphore's wait list, without waiting for it. The wait queue can grow without bounds, eventually leading to out-of-memory. The fix is to seal the memtable immediately to satisfy should_flush() condition, but limit concurrency of actual flushes. This way the wait queue size on the semaphore is limited by memtables pending a flush, which is fairly limited. Message-Id: <1467997652-16513-1-git-send-email-tgrabiec@scylladb.com>	2016-07-10 10:53:51 +03:00
Tomasz Grabiec	74ff30a31a	mutation_reader: Introduce stable_flattened_mutations_consumer adaptor Needed to make compact_mutation class non-movable later. It is used in do_with, so needs to be movable. Will be solved by using this adaptor.	2016-07-09 22:31:28 +02:00
Tomasz Grabiec	fb44f895b2	mutation_reader: Name template parameters after concepts With so many consumer concepts out there, it is confusing to name parameters using the generic "Consumer" name; let's name them after (already defined) concepts: CompactedMutationsConsumer, FlattenedConsumer.	2016-07-09 22:31:27 +02:00
Raphael S. Carvalho	ed5e7e6842	compaction: refactor compaction manager Previously, same function was used to handle both regular compaction and cleanup requests. That's bad because a lot of conditions were added for both compaction types to live in the same function. Now, cleanup and regular compaction will live in different functions. They share a lot of code, so helper functions were introduced. This change is also important for user-initiated compaction that will go through compaction manager in the future. Code is also a lot easier to read now. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-08 16:37:53 -03:00
Raphael S. Carvalho	da6a2b429d	compaction: add functions to register and deregister compacting sstables Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-08 16:00:51 -03:00
Raphael S. Carvalho	4d6dce8ec9	compaction: add helper function to get candidates for strategy Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-08 15:06:14 -03:00
Raphael S. Carvalho	e38f66c6fe	database: make certain column family functions const qualified Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-08 15:05:22 -03:00
Raphael S. Carvalho	bfc5376548	compaction: remove gate from compaction manager task There is no longer a need to use gate for regular termination of fiber that runs compaction. Now, we only set task->stopping to true, ask for compaction termination, and wait for its future to resolve. Code is simplified a lot with this change. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-08 15:05:10 -03:00
Paweł Dziepak	cba996a3ea	Merge "Implement missing functions for byte_ordered_partitioner" from Asias	2016-07-08 10:49:25 +01:00
Asias He	f4389349e4	config: Enable partitioner option Enable --partitioner option so that user can choose partitioner other than the default Murmur3Partitioner. Currently, only Murmur3Partitioner and ByteOrderedPartitioner are supported. When a non-supported partitioner is specified, the error will be propagated to the user.	2016-07-08 17:44:55 +08:00
Asias He	9c27b5c46e	byte_ordered_partitioner: Implement missing describe_ownership and midpoint In order to support ByteOrderedPartitioner, we need to implement the missing describe_ownership and midpoint function in byte_ordered_partitioner class. As a starter, this patch uses a simple node token distance based method to calculate ownership. C* uses a complicated key samples based method. We can switch to what C* does later. Tests are added to tests/partitioner_test.cc. Fixes #1378	2016-07-08 17:44:55 +08:00
Asias He	e0949a8f4f	storage_service: Exit shadow round state if it fails If a node fails to talk to any seed node, shadow round will fail. We should exit shadow round state before we continue. This issue is spotted by consistency_test.TestConsistency.data_query_digest_test dtest. Message-Id: <ba0613532a69bac369ca316ab61d907b320c8e68.1467963674.git.asias@scylladb.com>	2016-07-08 10:05:07 +01:00
Avi Kivity	8dab93a853	sstables: fix low disk utilization with compression and small chunk lengths As Nadav notes we use the chunk length as the buffer size for the compressed stream too. Fix by using it only for the outer (uncompressed) stream; the inner (compressed) stream uses the sstable buffer size, 128 kiB. Fixes #1402. Message-Id: <1467910556-5759-1-git-send-email-avi@scylladb.com> Reviewed-by: Nadav Har'El <nyh@scylladb.com>	2016-07-07 18:13:30 +01:00
Vlad Zolotarov	f2bf453be2	database: revive mutation retry in case of replay_position_reordered_exception The logic that would retry applying a mutation in case of a replay_position_reordered_exception error was broken by a commit `0c31f3e626` Author: Glauber Costa <glauber@scylladb.com> Date: Wed Apr 20 19:09:21 2016 -0400 database: move memtable throttler to the LSA throttler This patch makes it work again. Fixes #1439 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1467893342-30559-1-git-send-email-vladz@cloudius-systems.com>	2016-07-07 15:00:35 +02:00
Tomasz Grabiec	de429d6a53	Merge branch 'dev/pdziepak/streamed-mutations-streaming/v3' Support for streaming of large partitions from Paweł: This series converts streaming to streaming_mutations so that there is no need to store full mutation in memory in order to send or receive it. The first several patches add a way of estimating mutation fragment memory usage and introduce fragment_and_freeze() which produces a stream of reasonably sized frozen mutations from a single streamed mutation. The second part of this patchset makes sure that streaming mutations in fragments doesn't break isolation guarantees. This is achieved by delaying visibility of sstables produced by streaming until the streaming is completed. However, our current receiving code merges mutations from all streaming plans together thus making it impossible to track which data was received from a particular streaming plan. The solution to that problem is to introduce an additional flag to STREAM_MUTATION verb which informs the receiver whether the mutation is fragmented and care must be taken to preserve isolation. Small mutations behave as before, with writes from different stream plans coalesced, while big mutations are handled separately for each streaming task.	2016-07-07 13:23:39 +02:00
Paweł Dziepak	d9eb4d8028	streaming: use fragment_and_freeze() to send mutations Commit `206955e4` "streaming: Reduce memory usage when sending mutations" moved streaming mutation limiter from do_send_mutations() to send_mutations(). The reason for that was that send_mutation() did full mutation copies. That's no longer the case and streaming limiter should be moved back to do_send_mutation() in order to provide back pressure to fragment_and_freeze(). Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-07 12:18:36 +01:00
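
Note on cd7937d33b (range_tombstone::apply()): the commit message describes merging two possibly overlapping range tombstones with the same start bound into one or two disjoint ones. The sketch below illustrates one way such a merge can work, assuming integer bounds, half-open ranges and a single timestamp per tombstone. The type range_tombstone_model and the function apply_same_start are hypothetical, not the real signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical simplified tombstone: covers [start, end) and deletes data
// written at or before `timestamp`.
struct range_tombstone_model {
    int start;
    int end;
    int64_t timestamp;
};

// Merges two tombstones that share a start bound into one or two disjoint ones.
std::vector<range_tombstone_model> apply_same_start(range_tombstone_model a,
                                                    range_tombstone_model b) {
    assert(a.start == b.start);
    assert(a.start < a.end && b.start < b.end);
    // Make `a` the tombstone that ends first (or at the same position).
    if (b.end < a.end) {
        std::swap(a, b);
    }
    // Same range, or the longer tombstone is at least as new: one tombstone
    // covering the longer range is enough.
    if (a.end == b.end || b.timestamp >= a.timestamp) {
        return {{a.start, b.end, std::max(a.timestamp, b.timestamp)}};
    }
    // The shorter tombstone is newer: it wins on the shared prefix and the
    // longer one keeps only its tail.
    return {{a.start, a.end, a.timestamp}, {a.end, b.end, b.timestamp}};
}
```

For example, merging {0, 10, ts 3} with {0, 4, ts 5} yields [0, 4) at timestamp 5 followed by [4, 10) at timestamp 3.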

9876 Commits