scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-01 21:55:50 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	f7a143e7be	sstables: fix report of disk space used by bloom filter After change in boot, read_filter is called by distributed loader, so its update to _filter_file_size is lost. The load variant which receives foreign components that must do it. We were also not updating it for newly created sstables. Fixes #2449. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170606151129.5477-1-raphaelsc@scylladb.com> (cherry picked from commit `0ca1e5cca3`)	2017-06-06 19:00:00 +03:00
Avi Kivity	eb2fe0fbd3	Merge "reduce memory requirement for loading sstables" from Raphael "fixes a problem in which memory requirement for loading in-memory components of sstables is very high due to unlimited parallelism." * 'mem_requirement_sstable_load_v2_2' of github.com:raphaelsc/scylla: database: fix indentation of distributed_loader::open_sstable database: reduce memory requirement to load sstables sstables: loads components for a sstable in parallel sstables: enable read ahead for read of in-memory components sstables: make random_access_reader work with read ahead (cherry picked from commit `ef428d008c`)	2017-05-25 12:59:55 +03:00
Calle Wilund	50c8a08e91	scylla: fix compilation errors on gcc 5 Message-Id: <1495030581-2138-1-git-send-email-calle@scylladb.com> (cherry picked from commit `6ca07f16c1`)	2017-05-17 18:04:58 +03:00
Tomasz Grabiec	e2c75d8532	Merge "Fix performance problems with high shard counts tag" from Avi From http://github.com/avikivity/scylla exponential-sharder/v3. The sharder, which takes a range of tokens and splits it among shards, is slow with large shard count and the default murmur3_partitioner_ignore_msb_bits. This patchset fixes excessive iteration in sstable sharding metadata writer and nonsingular range scans. Without this patchset, sealing a memtable takes > 60 ms on a 48-shard system. With the patchset, it drops below the latency tracker threshold I used (5 ms). Fixes #2392. (cherry picked from commit `84648f73ef`)	2017-05-17 16:19:24 +03:00
Raphael S. Carvalho	1d26fab73e	sstables: add method to export ancestors Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-05-01 11:09:42 +03:00
Raphael S. Carvalho	82cc3d7aa5	dtcs: do not compact fully expired sstable which ancestor is not deleted yet Currently, fully expired sstable[1] is unconditionally chosen for compaction by DTCS, but that may lead to a compaction loop under certain conditions. Let's consider that an almost expired sstable is compacted, and it's not deleted yet, and that the new sstable becomes expired before its ancestor is deleted. Because this new sstable is expired, it will be chosen by DTCS, but it will not be purged because 'compacted undeleted' sstables are taken into account by calculation of max purgeable timestamp and prevents expired data from being purged. The problem is that this sequence of events can keep happening forever as reported by issue #2260. NOTE: This problem was easier to reproduce before improvement on compaction of expired cells, because fully expired sstable was being converted into a sstable full of tombstones, which is also considered fully expired. Fixes #2260. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170428233554.13744-1-raphaelsc@scylladb.com> (cherry picked from commit `687a4bb0c2`)	2017-04-30 19:36:00 +03:00
Tomasz Grabiec	8a21961ec9	sstables: Fix incorrect parsing of cell names in promoted index Range tombstones are serialized to cell names in this place: _sst.maybe_flush_pi_block(_out, start, {}); Note that the column set is empty. This is correct. A range tombstone only has a clustering part. The cell name is deserialized by promoted index reader using mp_row_consumer::column, like this: mp_row_consumer::column col(schema, std::move(col_name), api::max_timestamp); return std::move(col.clustering); The problem is, column constructor assumes that there is always a component corresponding to a cell name if the table is not dense, and will pop it from the set of components (the clustering field): , cell(!schema.is_dense() ? pop_back(clustering) : (*(schema.regular_begin())).name()) promoted index block which starts or ends with a range tombstone will appear as having incorrect bounds. This may result in an incorrect value for data file range start to be calculated. Fixes #2327.	2017-04-27 18:30:00 +02:00
Tomasz Grabiec	08698d9030	sstables: Fix find_disk_ranges() to not miss relevant range tombstones Suppose the promoted index looks like this: block0: start=1 end=2 block1: start=4 end=5 start and end are cell names of the first and last cell in the block. If there is a range tombstone covering [2,3], it will be only in block0, because it is no longer in effect when block1 starts. However, slicing the index for [3, +inf], which intersects with the tombstone, will yield block1. That's because the slicing looks for a block with an end which is greater than or equal to the start of the slice: if (!found_range_start) { if (!range_start || cmp(range_start->value(), end_ck) <= 0) { range_start_pos = ie.position() + offset; We should take into account that any given block may actually contain information for anything up to the start of the next block, so instead of using end_ck, effectively use next block's start_ck (exclusive). Fixes #2326.	2017-04-27 18:30:00 +02:00
Tomasz Grabiec	df5a291c63	sstables: Fix usage of wrong comparator in find_disk_ranges() This made a difference if clustering restriction bounds were not full keys but prefixes. Fixes #2272. Message-Id: <1493058357-24156-1-git-send-email-tgrabiec@scylladb.com>	2017-04-24 21:56:07 +03:00
Raphael S. Carvalho	2df7c80c66	compaction_manager: fix crash when dropping a resharding column family Problem is that column family field of task wasn't being set for resharding, so column family wasn't being properly removed from compaction manager. In addition to fixing this issue, we'll also interrupt ongoing compactions when dropping a column family, exactly like we do with shutdown. Fixes #2291. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170418125807.7712-1-raphaelsc@scylladb.com> (cherry picked from commit `e78db43b79`)	2017-04-18 17:40:09 +03:00
Raphael S. Carvalho	193b5d1782	partitioned_sstable_set: fix quadratic space complexity streaming generates lots of small sstables with large token range, which triggers O(N^2) in space in interval map. level 0 sstables will now be stored in a structure that has O(N) in space complexity and which will be included for every read. Fixes #2287. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170417185509.6633-1-raphaelsc@scylladb.com> (cherry picked from commit `11b74050a1`)	2017-04-18 13:05:00 +03:00
Nadav Har'El	dd56f1bec7	sstable decompression: fix skip() to end of file The skip() implementation for the compressed file input stream incorrectly handled the case of skipping to the end of file: In that case we just need to update the file pointer, but not skip anywhere in the compressed disk file; In particular, we must NOT call locate() to find the relevant on-disk compressed chunk, because there is none - locate() can only be called on actual positions of bytes, not on the one-past-end-of-file position. Fixes #2143 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170308100057.23316-1-nyh@scylladb.com> (cherry picked from commit `506e074ba4`)	2017-03-08 12:35:39 +02:00
Gleb Natapov	56725de0db	sstable: close sstable_writer's file if writing of sstable fails. Failing to close a file properly before destroying file's object causes crashes. [tgrabiec: fixed typo] Fixes #2122. Message-Id: <20170221144858.GG11471@scylladb.com> (cherry picked from commit `0977f4fdf8`)	2017-02-28 11:04:26 +02:00
Paweł Dziepak	83c6fc1114	sstables: write counter cells	2017-02-02 10:35:14 +00:00
Paweł Dziepak	5905729c4a	sstables: read counter cells	2017-02-02 10:35:14 +00:00
Tomasz Grabiec	6c75614d19	sstables: Fix input_stream not being closed by index_reader Fixes #2022 Message-Id: <1484912679-5729-1-git-send-email-tgrabiec@scylladb.com>	2017-01-20 11:58:33 +00:00
Paweł Dziepak	19ad35610b	sstables: do not discard future returned by fast_forward_to() continuous_data_consumer::fast_forward_to() returns a future which was later ignored by data_consume_context::fast_forward_to(). With the current implementation, the future in question is always ready and that's why the problem didn't manifest itself in the form of crashes or invalid results. Message-Id: <20170120105746.7300-1-pdziepak@scylladb.com>	2017-01-20 12:22:17 +01:00
Benoît Canet	bcc826cc34	mutation_reader: Short circuit the read path on empty range Add a boolean to short circuit the read path on empty range hoping for some speedup. tested in read write with cs using: cl=QUORUM duration=1m -mode native cql3 -rate threads=700 -node localhost Will do some additional benchmark. Fixes #1056 Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <20170118194451.16836-1-benoit@scylladb.com>	2017-01-20 10:05:40 +00:00
Tomasz Grabiec	dd0fb48564	sstables: Close _file even if random_access_reader::close() reports errors close() operation is like a destructor, it cannot fail. It just reports errors, but close itself succeeds. So we should proceed with the closing even if it fails. Message-Id: <1484245886-7269-1-git-send-email-tgrabiec@scylladb.com>	2017-01-18 12:41:55 +00:00
Avi Kivity	c227e3e706	Merge "move a few files in the ScyllaDB project to use the new metrics registration API" from Vlad * 'rearrange-scylla-collectd-stats-registration-v3' of github.com:cloudius-systems/seastar-dev: thrift::server: move collectd counters registration to the metrics registration layer gms::gossiper: move collectd counters registration to the metrics registration layer utils::logalloc: move collectd counters registration to metrics registration layer streaming::stream_manager: move a collectd counters registration to the metrics registration layer db::commitlog::commitlog: move collectd counters registration to the metrics registration layer sstables::compaction_manager: move collectd metrics registration to the metrics registration layer db::batchlog_manager: move collectd registration to the metrics registration layer transport::server: move collectd metrics registration to the metrics registration layer cql3::query_processor: move collectd metrics registration to the metrics registration layer database: move collectd registrations to metrics registration layer tracing::trace_keyspace_helper: move collectd metrics registration to a metric registration layer tracing::trace_keyspace_helper: fix alignment tracing::tracing: move collectd metrics registration to metrics registration layer	2017-01-12 17:13:08 +02:00
Tomasz Grabiec	33e1f9af6b	sstables: Close input_stream from random_access_reader Spotted by destroy-without-close detector. Message-Id: <1484072527-13058-1-git-send-email-tgrabiec@scylladb.com>	2017-01-11 09:40:00 +00:00
Vlad Zolotarov	00e37c389b	sstables::compaction_manager: move collectd metrics registration to the metrics registration layer Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-01-10 16:24:54 -05:00
Avi Kivity	0591303b72	Merge "avoid excessive memory usage during resharding" from Raphael "Intended to reduce memory usage when resharding by sharing sstable components among shards. File descriptors are also shared from now on, meaning that a much smaller number of file descriptors will be used during resharding. Fixes #1951." branch 'excessive_memory_usage_v4' of github.com:raphaelsc/scylla * 'excessive_memory_usage_v4' of github.com:raphaelsc/scylla: db: avoid excessive memory usage during resharding checked_file_impl: add support to dup sstables: group sstable components that can be shared among shards sstables: rename sstable member	2017-01-09 20:43:50 +02:00
Raphael S. Carvalho	68dfcf5256	db: avoid excessive memory usage during resharding After resharding, sstables may be owned by all shards, which means that file descriptors and memory usage for metadata will increase by a factor equal to number of shards. That can easily lead to OOM. SSTable components are immutable, so they can be stored in one shard and shared with others that need it. We use the following formula to decide which shard will open the sstable and share it with the others: (generation % smp::count), which is the inverse of how we calculate generation for new sstables. So if no resharding is performed, everything is shard-local. With this approach, resource usage due to loaded sstables will be evenly distributed among shards. For this approach to work, we now only populate keyspaces from shard 0. It's now the sole responsible for iterating through column family dirs. In addition, most of population functions are now free and take distributed database object as parameter. Fixes #1951. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-01-09 15:24:36 -02:00
Paweł Dziepak	3339cced05	sstables: file_writer: make write() non-virtual No one overrides file_writer::write() so there is no reason to inhibit optimisations and cause compiler to emit indirect calls. Message-Id: <20170104163618.26251-1-pdziepak@scylladb.com>	2017-01-09 09:47:37 +02:00
Raphael S. Carvalho	eed2a7d065	sstables: group sstable components that can be shared among shards We intend to share immutable sstable components among shards to reduce excessive memory usage when resharding shared sstables. This change is about grouping those components into a structure, and using foreign ptr to make sure that the structure will be deleted by whichever shard created it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-01-06 15:16:19 -02:00
Raphael S. Carvalho	a492f8dfaf	sstables: rename sstable member Rename _components to _recognized_components because _components will be used to name a field with shareable components. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-01-06 15:16:17 -02:00
Tomasz Grabiec	f2a63270d1	sstables: Fix double close on index and data files when writing fails file output streams take the responsibility of closing the file, they will close the file as part of closing the stream. During sstable writing we create sstable object and keep file references there as well. Sstable object also has responsibility for closing the files, and does so from sstable::~sstable(). Double close was supposed to be avoided by a construct like this: writer.close().get(); _file = {}; However if close() failed, which can happen when write-ahead failed, _file would not be cleared, and both the writer and sstable would close the file. This will result in a crash in append_challenged_posix_file_impl::close(), which is not prepared to be closed twice. Another problem is that if exception happened before we reached that construct, we still should close the writer. Currently we don't, so there's no double close on the file, but that's a bug which needs to be fixed and once that's fixed double close on _file will be even more likely. The fix employed here is to not keep files inside sstable object when writing. As soon as the writer is constructed, it's the only owner of the file. Fixes #1764. Message-Id: <1482428648-22553-1-git-send-email-tgrabiec@scylladb.com>	2016-12-23 11:44:43 +02:00
Avi Kivity	74ecd7072a	Merge "Reduce overhead of get_max_purgeable_timestamp() during compaction" from Tomasz * 'tgrabiec/calculate-hash-once-compaction' of github.com:cloudius-systems/seastar-dev: sstables: Calculate key hash only once during compaction tests: sstables: Add more test cases to tombstone_purge_test db: Expose column_family::add_sstable tests: sstables: Ensure timestamps are increasing tests: sstables: Simplify tombstone_purge_test	2016-12-22 14:33:30 +02:00
Tomasz Grabiec	045b9fd7c1	sstables: Calculate key hash only once during compaction Improves compaction performance.	2016-12-22 13:24:46 +01:00
Raphael S. Carvalho	c26090a6b2	sstables/compress: fix error message for snappy uncompression Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <898ad07db705355bdbf780afdb3aa982b8ca3823.1482364125.git.raphaelsc@scylladb.com>	2016-12-22 09:08:34 +01:00
Avi Kivity	875635554d	Merge "Reduce overhead of partition presence checker during cache update" from Tomasz Refs #1943. * 'tgrabiec/optimize-bloom-filter' of github.com:cloudius-systems/seastar-dev: db: Compute key hash once in partition_presence_checker bloom_filter: Allow checking presence using pre-hashed key db: Use incremental selector in partition_presence_checker	2016-12-21 14:24:54 +02:00
Raphael S. Carvalho	e28537b56f	sstables: fix calculation of memory footprint for summary size of keys weren't taken into account, so value reported via collectd is much smaller than actual footprint. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <3ca24612e4e84d1cbdea4f2d79e431a4f4479291.1482255327.git.raphaelsc@scylladb.com>	2016-12-20 18:28:47 +00:00
Tomasz Grabiec	0e487b3499	db: Compute key hash once in partition_presence_checker I measured reduction of cache update time by 20% for 6 sstables and by 40% for 16. Refs #1943.	2016-12-19 14:20:58 +01:00
Asias He	937f28d2f1	Convert to use dht::partition_range_vector and dht::token_range_vector	2016-12-19 14:08:50 +08:00
Asias He	e5485f3ea6	Get rid of query::partition_range Use dht::partition_range instead	2016-12-19 08:09:25 +08:00
Asias He	85034c1b57	Convert to use dht::partition_range	2016-12-19 08:04:30 +08:00
Asias He	d1178fa299	Convert to use dht::token_range	2016-12-19 08:04:29 +08:00
Glauber Costa	56df53f51e	compaction_manager: fix shutdown sequence By the time we are able to acquire this semaphore, we may be stopped already. So we need to test it before we go ahead. I can see shutdown hangs before this patch that are fixed with it applied. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <e5b378893128d086d584ffbb2acd3fb687648e5c.1481655433.git.glauber@scylladb.com>	2016-12-14 09:26:24 +01:00
Avi Kivity	299d1fad0b	Merge "reduce bloom filter overhead in compaction" from Raphael "Function to calculate maximum purgeable timestamp is made 10 times faster when compacting sstables overlap with 10% of all sstables. That's possible with an incremental selector that will incrementally select sstables based on key being compacted. Currently, we iterate through all non-compacting sstables and consult their bloom filter to determine max purgeable timestamp, and that will be very expensive for compactions that are frequently deciding whether or not to purge tombstones." * 'filter_overhead_fix_v4' of github.com:raphaelsc/scylla: compaction: reduce bloom filter overhead with incremental selector tests: add test for sstable set's incremental selector sstable_set: introduce incremental selector compatible_ring_position: add function to return token	2016-12-11 09:46:58 +02:00
Glauber Costa	5803957ab5	compaction: fix build Commit `732ee275` moved tracking of one statistics value inside a lambda without capturing this in that lambda. Compilation fails as a result. Signed-off-by: Glauber Costa <glauber@scylladb.com> Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <68860640f4533dd43e43f341f1620e25464b700b.1481313455.git.glauber@scylladb.com>	2016-12-10 09:00:20 +02:00
Raphael S. Carvalho	fcfc84e836	compaction: reduce bloom filter overhead with incremental selector The procedure to calculate max purgeable timestamp is optimized by only visiting sstables that overlap with key being currently compacted. That's done using incremental sstable selector. Function to calculate maximum purgeable timestamp is made 10 times faster when compacting sstables overlap with 10% of all sstables. Fixes #1322. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-12-09 16:17:17 -02:00
Raphael S. Carvalho	02541e15c1	sstable_set: introduce incremental selector Incrementally select sstables from sstable set using token in ascending order. For leveled strategy, it returns all sstables that belong to current interval. For other strategies, it just returns all sstables from the set. Useful for compaction which needs all sstables that overlap with key being currently compacted to calculate maximum purgeable timestamp. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-12-09 16:17:16 -02:00
Raphael S. Carvalho	732ee275f8	compaction: fix running compaction counter when splitting sstables The counter was being increased before taking the semaphore, so every pending split would count as a running compaction which misleads the user as a result. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <f2050cc3599cee7af29d4579368a154708b37731.1481248048.git.raphaelsc@scylladb.com>	2016-12-09 15:01:43 +02:00
Avi Kivity	872b5ef5f0	sstables: fix probe with Unknown component Commit `53b7b7def3` ("sstables: handle unrecognized sstable component") ignores unrecognized components, but misses one code path during probe_file(). Ignore unrecognized components there too. Fixes #1922. Message-Id: <20161208131027.28939-1-avi@scylladb.com>	2016-12-08 15:24:25 +01:00
Avi Kivity	5530a61975	sstables: fix build with older boost (boost::variant::get<T&>) Older boost doesn't support boost::variant::get<T&> (where the type parameter is reference qualified); remove (unneeded anyway).	2016-12-08 10:56:05 +02:00
Avi Kivity	3c3a18f222	sstables: move sharding metadata from Statistics component to a new Scylla component The Cassandra derived sstable tools (and likely Cassandra itself) object to a new sub-component in the Statistics component; create a new Scylla component instead to host this data.	2016-12-07 15:20:13 +02:00
Avi Kivity	24140ec8c6	sstables: add support for sets of discriminated union types Allow declaring discriminated unions (with an enum type as the discriminant and any sstable serializable type as a value) and sets of these unions, with the discriminant as the key. Parsers and writers are auto-generated.	2016-12-07 13:27:52 +02:00
Raphael S. Carvalho	b30a2cb21a	lcs: generate info that preserves token distribution in higher levels The information (last compacted keys) is lost after node is restarted or schema is updated, which causes strategy to be rebuilt. We need it for strategy to guarantee uniform distribution of token range across sstables, or we could end up with 1 sstable of level L overlapping with lots of sstables of level L+1, and that results in a compaction of undesired length. That information can be generated from scratch by getting last key of newest sstable in each level > 0. Fixes #1906. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <35ebd15977d5a8418239febb160c796cdc0e98fa.1480533805.git.raphaelsc@scylladb.com>	2016-12-01 11:19:58 +02:00
Raphael S. Carvalho	38743c1948	sstables: provide write time of data component Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <59686148149f2159990329775e0cd8780bc54254.1480533805.git.raphaelsc@scylladb.com>	2016-12-01 11:19:57 +02:00
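
The block-selection problem described in commit 08698d9030 (find_disk_ranges()) can be illustrated with a minimal sketch. This is not the Scylla implementation: `index_block`, `first_block_by_end`, and `first_block_by_next_start` are hypothetical names, and plain integers stand in for clustering keys and comparators.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for a promoted index block.
// Integers replace the clustering keys used by the real code.
struct index_block {
    int start;  // cell name of the first cell in the block
    int end;    // cell name of the last cell in the block
};

// Behaviour before the fix, as described in the commit message:
// select the first block whose end is >= the start of the slice.
std::optional<std::size_t> first_block_by_end(
        const std::vector<index_block>& blocks, int range_start) {
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        if (range_start <= blocks[i].end) {
            return i;
        }
    }
    return std::nullopt;
}

// Behaviour after the fix: a block may carry information (such as a range
// tombstone) for anything up to the start of the next block, so compare
// against the next block's start (exclusive). The last block is unbounded.
std::optional<std::size_t> first_block_by_next_start(
        const std::vector<index_block>& blocks, int range_start) {
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        const bool is_last = (i + 1 == blocks.size());
        if (is_last || range_start < blocks[i + 1].start) {
            return i;
        }
    }
    return std::nullopt;
}
```

With the index from the commit message (block0: start=1 end=2, block1: start=4 end=5) and a slice starting at 3, the first function selects block1 and skips the range tombstone [2,3] stored in block0, while the second selects block0.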

817 Commits