scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 20:16:43 +00:00

Author	SHA1	Message	Date
Paweł Dziepak	9d82a1ebfd	abstract_read_executor: make make_requests() exception safe Message-Id: <20170821162934.25386-5-pdziepak@scylladb.com>	2017-08-22 12:09:42 +02:00
Paweł Dziepak	31afc2f242	shared_index_lists: restore indentation Message-Id: <20170821162934.25386-4-pdziepak@scylladb.com>	2017-08-22 12:09:42 +02:00
Paweł Dziepak	93eaa95378	sstables: make shared_index_lists::get_or_load exception safe Message-Id: <20170821162934.25386-3-pdziepak@scylladb.com>	2017-08-22 12:09:42 +02:00
Avi Kivity	ef85cf1cb3	Merge "Compress in-memory compression-info" from Botond "Overly large metadata can hog memory, which especially hurts in setups with a bad disk/memory ratio. To ease the pain, compress the in-memory compression-info. The compression is implemented based on Avi's idea, which is to group n offsets together into segments, where each segment stores a base absolute offset into the file, the other offsets in the segment being relative offsets (and thus of reduced size). Also, offsets are allocated only just enough bits to store their maximum value. The offsets are thus packed in a buffer like so: arrrarrrarrr... where n is 4, a is an absolute offset and r are offsets relative to a. This of course means that stored offsets will not be aligned, not even on a byte boundary, but the size reduction is pretty convincing. In addition, segments are stored in buckets, where each bucket has its own base offset, and the segments in a bucket are optimized to address as large a chunk of the data as possible for a given chunk size." Ref #1946. * 'bdenes/compress-compression-v3' of https://github.com/denesb/scylla: Add unit test for compress::offsets Optimise the storage of compression chunk offsets Add script to precompute segmented compression parameters	2017-08-22 10:30:58 +03:00
Botond Dénes	62c18da35c	Add unit test for compress::offsets	2017-08-21 17:06:20 +03:00
Botond Dénes	028c7a0888	Optimise the storage of compression chunk offsets To reduce the memory footprint of compression-info, n offsets are grouped together into segments, where each segment stores a base absolute offset into the file, the other offsets in the segment being relative offsets (and thus of reduced size). Also, offsets are allocated only just enough bits to store their maximum value. The offsets are thus packed in a buffer like so: arrrarrrarrr... where n is 4, a is an absolute offset and r are offsets relative to a. The optimal value of n can be calculated for a given file_size (f) and chunk_size (c) by finding the minimum of the following function: f(n) = (f/c)/n * (log2(f) + (n - 1) * log2((n - 1)(c + 64))). This is done empirically, using a script (see below). Furthermore, segments are stored in buckets, where each bucket has its own base offset. Each bucket can therefore address an equal chunk of the file, and each segment in a bucket can in turn address an equal sub-chunk of this area. The value of a given offset i is thus: bucket_base_offset_for(i) + segment_base_offset_for(i) + offset(i). To account for the bucketed storage we calculate a local_f, which is optimized so that a bucketful of segmented offsets can address the largest possible chunk of f. As the value of this local_f only depends on the bucket_size (b) and c, the value of n can be made independent of f and therefore depend on only one dynamic value, c. This makes life much simpler, as we don't need to know the size of the file up front; we can just append buckets to the storage on demand, while the required storage is still less than a third [1] of the original storage requirements (std::deque<uint64>). The table with minima(f(n)) for different f and c values is pre-computed by gen_segmented_compress_params.py and stored in sstables/segmented_compress_params.hh. This script also creates a table with the best values of local_f for the given bucket_size. At runtime we only select the best params based on c. [1] This was calculated for c=4K and b=4K	2017-08-21 17:06:12 +03:00
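The segment layout described in this commit ("arrrarrr..." with bit-packed, unaligned values) can be sketched in C++ as below. This is a minimal illustration under stated assumptions, not Scylla's `compress::offsets` implementation: the names `bits_needed`, `bit_buffer` and `segmented_offsets` are hypothetical, the bucket level (each bucket with its own base offset) is omitted, and the relative-offset width is derived from the (n - 1)(c + 64) term of the formula, i.e. it assumes a compressed chunk is never larger than c + 64 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of bits needed to store any value in [0, max_value].
unsigned bits_needed(uint64_t max_value) {
    unsigned bits = 1;
    while (bits < 64 && (max_value >> bits) != 0) {
        ++bits;
    }
    return bits;
}

// Append-only buffer that stores values back to back, so stored values are
// not aligned to bytes or words (bit-by-bit for clarity, not speed).
class bit_buffer {
    std::vector<uint64_t> _words;
    uint64_t _size_bits = 0;
public:
    void push(uint64_t value, unsigned width) {
        for (unsigned i = 0; i < width; ++i, ++_size_bits) {
            if (_size_bits % 64 == 0) {
                _words.push_back(0);
            }
            if ((value >> i) & 1) {
                _words.back() |= uint64_t(1) << (_size_bits % 64);
            }
        }
    }
    uint64_t get(uint64_t pos, unsigned width) const {
        uint64_t value = 0;
        for (unsigned i = 0; i < width; ++i) {
            uint64_t bit = pos + i;
            value |= ((_words[bit / 64] >> (bit % 64)) & 1) << i;
        }
        return value;
    }
};

// Offsets grouped n per segment: one absolute offset followed by n - 1
// offsets relative to it (the "arrr" layout for n = 4).
class segmented_offsets {
    unsigned _n, _abs_bits, _rel_bits;
    bit_buffer _buf;
    uint64_t _count = 0;
    uint64_t _segment_base = 0;
public:
    segmented_offsets(unsigned n, unsigned abs_bits, unsigned rel_bits)
        : _n(n), _abs_bits(abs_bits), _rel_bits(rel_bits) {}

    // Offsets must be appended in non-decreasing order.
    void push_back(uint64_t offset) {
        if (_count % _n == 0) {
            assert(bits_needed(offset) <= _abs_bits);
            _segment_base = offset;
            _buf.push(offset, _abs_bits);
        } else {
            assert(offset >= _segment_base && bits_needed(offset - _segment_base) <= _rel_bits);
            _buf.push(offset - _segment_base, _rel_bits);
        }
        ++_count;
    }

    // Every segment has the same bit size, so its start is computed directly.
    uint64_t at(uint64_t i) const {
        uint64_t segment_bits = _abs_bits + uint64_t(_n - 1) * _rel_bits;
        uint64_t start = (i / _n) * segment_bits;
        uint64_t base = _buf.get(start, _abs_bits);
        uint64_t k = i % _n;
        return k == 0 ? base : base + _buf.get(start + _abs_bits + (k - 1) * _rel_bits, _rel_bits);
    }

    uint64_t size() const { return _count; }
};
```

With n = 4 and c = 4096, each relative offset needs only bits_needed(3 * 4160) = 14 bits instead of 64, which is where the savings over a plain `std::deque<uint64_t>` come from.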
Avi Kivity	de011ece52	main: deprecate non-murmur3 partitioners more forcefully Some (most?) users don't read logs or release notes, so they won't notice that the ByteOrdered and Random partitioners were deprecated in 2.0. Make them notice by refusing to start with a deprecated partitioner, unless a switch is explicitly enabled. Message-Id: <20170820073424.8331-1-avi@scylladb.com>	2017-08-21 14:32:22 +02:00
Avi Kivity	9f415ef870	sstables: accurate summary entry size calculation Calculate the summary entry size correctly, so we don't end up with oversize summaries. Message-Id: <20170819184255.14181-2-avi@scylladb.com>	2017-08-21 14:28:57 +02:00
Avi Kivity	17c372bf0e	sstables: get rid of 64kB minimum index advance to generate summary Limiting summary entry generation to at most one summary entry per 64k of index data can lead to large index pages, with thousands of index entries per summary entry. These are slow to parse, and there is no real gain from the limit, since we already enforce a size limit on the summary. Remove the limit and allow summary entry generation based solely on spanned data size. Fixes #2711. Message-Id: <20170819184255.14181-1-avi@scylladb.com>	2017-08-21 14:26:44 +02:00
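A minimal sketch of sampling summary entries by spanned data size alone, with no minimum amount of index data between entries. The function name and the `data_bytes_per_summary_entry` threshold are hypothetical; how Scylla derives its threshold (for example from the `sstable_summary_ratio` option) and how it caps the total summary size are not modeled here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// `data_positions` holds, in order, the data-file position of each partition
// that has an index entry (assumed non-decreasing). Returns the indexes of the
// entries that start a new summary entry: the first one, then every entry
// reached once the data spanned since the previous summary entry is at least
// `data_bytes_per_summary_entry`.
std::vector<size_t> sample_summary_entries(const std::vector<uint64_t>& data_positions,
                                           uint64_t data_bytes_per_summary_entry) {
    std::vector<size_t> sampled;
    for (size_t i = 0; i < data_positions.size(); ++i) {
        if (sampled.empty() ||
            data_positions[i] - data_positions[sampled.back()] >= data_bytes_per_summary_entry) {
            sampled.push_back(i);
        }
    }
    return sampled;
}
```

Because the only criterion is data spanned, a summary entry never covers an arbitrarily large run of index entries just because the index itself advanced by less than some fixed amount.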
Avi Kivity	81a33df25d	dht: reduce split_range_to_single_shard contiguous memory demand split_range_to_single_shard() returns a vector of size 4096, with each element (a partition_range) of size 100. The total of 400k can cause defragmentation if memory is fragmented. Fix by using a deque. Fixes #2707. Message-Id: <20170819141017.28287-1-avi@scylladb.com>	2017-08-21 14:25:45 +02:00
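Why a deque helps here: a `std::vector` keeps all elements in one contiguous buffer, whereas `std::deque` allocates fixed-size blocks plus a small array of block pointers, so no single allocation has to hold every element. The sketch uses a hypothetical 100-byte stand-in for `partition_range` (the size quoted in the commit message); the deque block size is implementation-defined.

```cpp
#include <array>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a partition_range of ~100 bytes.
struct fake_partition_range {
    std::array<char, 100> payload{};
};

constexpr std::size_t shard_ranges = 4096;

// What a std::vector of these would need as one contiguous buffer (~400 KB).
constexpr std::size_t contiguous_bytes_if_vector = shard_ranges * sizeof(fake_partition_range);

// The deque grows block by block, so memory does not need one large free run.
std::deque<fake_partition_range> make_ranges() {
    std::deque<fake_partition_range> ranges;
    for (std::size_t i = 0; i < shard_ranges; ++i) {
        ranges.emplace_back();
    }
    return ranges;
}
```

The trade-off is slightly slower indexing (two indirections) in exchange for avoiding a single large allocation.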
Piotr Jastrzebski	c602ffd610	Make Scylla ttl expiration behave like in Cassandra Fixes #2497 [tgrabiec: reworked the title] Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <2f5a99dce6ef11fe0ef135c9fa0592078fc9a056.1502886874.git.piotr@scylladb.com>	2017-08-21 14:25:45 +02:00
Botond Dénes	eae33a1f19	Add script to precompute segmented compression parameters The script generates sstables/segmented_compress_params.hh, which contains a list with the optimal number of grouped offsets for different data and chunk sizes, as well as a list with the best nominal data sizes for different chunk sizes, given a bucket size. Data sizes are in the range [2^4, 2^50] and chunk sizes in the range [2^4, 2^30]. Data sizes that are not used with the current bucket_size are omitted. See next commit for details of how the calculated values are used.	2017-08-21 10:44:08 +03:00
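The empirical search for n can be sketched as evaluating the cost formula from the "Optimise the storage of compression chunk offsets" commit for every candidate n and keeping the smallest. The real generator is the Python script gen_segmented_compress_params.py; this C++ version is only an illustration, the function names and the `max_n = 64` search bound are hypothetical, and it uses the formula's continuous log2 terms rather than whole-bit widths.

```cpp
#include <cmath>

// Estimated bits needed for all chunk offsets of a file of size f (bytes)
// with chunk size c, grouping n offsets per segment:
//   f(n) = (f/c)/n * (log2(f) + (n - 1) * log2((n - 1)(c + 64)))
double offsets_storage_bits(double f, double c, unsigned n) {
    double segments = (f / c) / n;
    double absolute_bits = std::log2(f);
    // For n == 1 there are no relative offsets; avoid 0 * log2(0) = NaN.
    double relative_bits = n > 1 ? (n - 1) * std::log2((n - 1) * (c + 64.0)) : 0.0;
    return segments * (absolute_bits + relative_bits);
}

// Try every n in [1, max_n] and return the one with the smallest estimate.
unsigned best_segment_size(double f, double c, unsigned max_n = 64) {
    unsigned best = 1;
    double best_bits = offsets_storage_bits(f, c, 1);
    for (unsigned n = 2; n <= max_n; ++n) {
        double bits = offsets_storage_bits(f, c, n);
        if (bits < best_bits) {
            best = n;
            best_bits = bits;
        }
    }
    return best;
}
```

For f = 2^30 and c = 4096 this picks n = 10, at roughly 16.7 bits per offset versus 30 bits with no grouping; the minimum is shallow, so neighbouring values of n cost almost the same.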
Avi Kivity	5a2439e702	main: check for large allocations Large allocations can require cache evictions to be satisfied, and can therefore induce long latencies. Enable the seastar large allocation warning so we can hunt them down and fix them. Message-Id: <20170819135212.25230-1-avi@scylladb.com>	2017-08-21 10:25:40 +03:00
Pekka Enberg	318423d50b	Merge seastar upstream * seastar 2d16aca...e96881a (4): > memory: add detector for large allocations > memory: reduce large allocations for small pools > net: Fix potential NULL pointer dereference in udp.cc > Update dpdk submodule	2017-08-21 10:24:08 +03:00
Tomasz Grabiec	8f2ca52740	tests: Run test_query_only_static_row test case on all mutation sources The test checks behavior common to all mutation readers, so it's better to run it against all mutation sources rather than only for cache reader. Message-Id: <1503072333-17995-1-git-send-email-tgrabiec@scylladb.com>	2017-08-20 12:23:28 +03:00
Raphael S. Carvalho	10eaa2339e	compaction: Make resharding go through compaction manager Two reasons for this change: 1) every compaction should be multiplexed through the manager, which in turn decides when to schedule it; improvements to the manager will immediately benefit every existing compaction type. 2) the active tasks metric will now track ongoing reshard jobs. Fixes #2671. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170817224334.6402-1-raphaelsc@scylladb.com>	2017-08-20 11:35:14 +03:00
Takuya ASADA	38b2ff617f	dist/redhat: follow the change in libgcc/libstdc++ package names Since we moved to an external 3rdparty repository, we added a '53' suffix to the gcc packages, so follow that change. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20170819092039.1090-2-syuu@scylladb.com>	2017-08-19 16:01:28 +03:00
Takuya ASADA	f1b5401d1f	dist/redhat: Change g++ command name on CentOS We have added a '-5.3' suffix to the g++ command from scylla-gcc53-c++-5.3.1-2.2; follow that change in the scylla build script. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20170819092039.1090-1-syuu@scylladb.com>	2017-08-19 16:01:27 +03:00
Avi Kivity	e428805ba5	Merge "Optimize query result partition and row counts" from Duarte "Now that range queries go through the normal digest path, we rely on query::result::calculate_counts() to count the number of partitions and rows returned. This series optimizes it, in case it is needed, and also changes the result message to include the partition and row counts, avoiding the calculation altogether." * 'calculate-counts/v3' of github.com:duarten/scylla: query-result: Send row and partition count over the wire query::result: Optimize calculate_counts()	2017-08-17 13:41:21 +03:00
Alexys Jacob	e5ff8efea3	dist: Fix Gentoo Linux scylla-jmx and scylla-tools packages detection These two admin related packages will be packaged under the "app-admin" category and not the "dev-db" one. This fixes the detection path of the packages for scylla_setup. Signed-off-by: Alexys Jacob <ultrabug@gentoo.org> Message-Id: <20170817094756.21550-1-ultrabug@gentoo.org>	2017-08-17 13:20:43 +03:00
Nadav Har'El	7832d8a883	get rid of unused part in configure.py Scylla's configure.py contains stuff we copied from Seastar's configure.py, but is no longer used. Let's get rid of some of it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170813150842.12603-1-nyh@scylladb.com>	2017-08-17 12:05:44 +03:00
Duarte Nunes	1e7f0eab82	memtable: Created readers should be fast forwardable by default mutation_reader::forwarding defaults to yes. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170816180304.2121-1-duarte@scylladb.com>	2017-08-17 10:21:01 +03:00
Botond Dénes	e70cfc8f36	incremental_reader_selector: account for possibly disengaged lower bound In addition to the constructor (fixed previously) the check for no sstables on the first call to select() also has to be prepared for the lower bound of the range being disengaged. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4ab1296c71814fcd492996fa36fd00fd7bbbbc7f.1502949875.git.bdenes@scylladb.com>	2017-08-17 10:07:26 +03:00
Botond Dénes	af83b7f57b	incremental_reader_selector: use lazy_deref instead of ternary operator Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4f4b884c6a1f517bd654f3b27608d854b17a66e1.1502948635.git.bdenes@scylladb.com>	2017-08-17 08:45:46 +03:00
Botond Dénes	eb7eee510d	combined_mutation_reader_test: use the global const objects directly Instead of local ones. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <3ec1a70e4c0198c0563dff9688bbaa7fcfcace71.1502891190.git.bdenes@scylladb.com>	2017-08-16 16:56:42 +03:00
Paweł Dziepak	784dcbf1ca	sstables: initialise index metrics on all shards Fixes #2702. Message-Id: <20170816085454.21554-1-pdziepak@scylladb.com>	2017-08-16 15:44:26 +03:00
Avi Kivity	d7e3fbc6fe	Merge seastar upstream * seastar 2a43102...2d16aca (1): > fstream: do not ignore unresolved future Fixes #2697.	2017-08-16 15:09:59 +03:00
Botond Dénes	611774b1d9	Use the incremental reader for compaction Leveled compaction strategy in particular stands to gain the most from incrementally opening sstables. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <292648d3fa4ea97376c0b4360754a20132194f63.1502822066.git.bdenes@scylladb.com>	2017-08-15 21:38:04 +03:00
Takuya ASADA	0f9b095867	dist/common/scripts: prevent ignoring a flag passed after another flag that requires a parameter When a user mistakenly forgets to pass a parameter to a flag, our scripts misparse the next flag as that parameter. For example, the correct usage is '--ntp-domain <domain> --setup-nic', but the user passed '--ntp-domain --setup-nic'. As a result, the next flag is ignored by the scripts. To prevent this, reject any parameter that starts with '--'. Fixes #2609 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20170815114751.6223-1-syuu@scylladb.com>	2017-08-15 18:27:32 +03:00
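The check described in this commit, sketched in C++17 purely for illustration: the scripts in dist/common/scripts are not written in C++, and `flag_value` is a hypothetical helper. A value that itself starts with '--' is treated as a missing parameter instead of being consumed, so the following flag is no longer silently swallowed.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Returns the parameter that follows `flag` in argv-style `args`, or
// std::nullopt if `flag` is absent. Throws if the parameter is missing or
// looks like another flag (starts with "--").
std::optional<std::string> flag_value(const std::vector<std::string>& args, const std::string& flag) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] != flag) {
            continue;
        }
        if (i + 1 >= args.size()) {
            throw std::invalid_argument(flag + " requires a parameter");
        }
        const std::string& value = args[i + 1];
        if (value.rfind("--", 0) == 0) {  // value starts with "--"
            throw std::invalid_argument(flag + " requires a parameter, got flag " + value);
        }
        return value;
    }
    return std::nullopt;
}
```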
Duarte Nunes	c7aa3ea069	mutation_partition: Remove obsolete short read detection When compacting a partition for querying we would read an extra row, to include any tombstones between that one and the previous row. This is no longer needed since we have a general mechanism to detect short reads in the storage_proxy. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170811103031.22866-1-duarte@scylladb.com>	2017-08-15 12:01:55 +01:00
Avi Kivity	8df6dd1fa0	database: make incremental_reader_selector robust vs. full-range partition_range incremental_reader_selector assumes the partition_range it receives has a lower bound, but it was seen in mutation_test that this is not so. Fix by checking whether the bound exists or not. Message-Id: <20170815095852.14149-1-avi@scylladb.com>	2017-08-15 11:03:22 +01:00
Avi Kivity	a35bfb3ea9	Merge seastar upstream * seastar 47b31f6...2a43102 (1): > Merge "Fix crash in rpc due to access to already destroyed server socket" from Gleb Fixes #2690	2017-08-14 16:23:02 +03:00
Avi Kivity	e892a0082a	Merge "Drop exhausted mutation_readers when possible" from Duarte "Exhausted readers belonging to a combined_mutation_reader can be fast forwarded, so we have to keep them around. However, if the reader is not fast forwardable, then we can drop the contained readers and their buffers." * 'ff-reader/v2' of github.com:duarten/scylla: combined_mutation_reader: Drop exhausted readers if not in FF mode combined_mutation_reader: Remove superfluous mutation_readers list memtable_snapshot_source: Created readers should be fast forwardable	2017-08-14 16:20:38 +03:00
Duarte Nunes	7fb6a74302	combined_mutation_reader: Drop exhausted readers if not in FF mode Exhausted readers can be fast forwarded, so we have to keep them around. However, if the current reader is not fast forwardable, then we can drop those readers and their buffers. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-08-14 14:37:27 +02:00
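A sketch of the ownership rule in this commit, using hypothetical types (`sub_reader`, `combined_reader_sketch`) rather than Scylla's mutation_reader API: exhausted sub-readers are destroyed only when the combined reader cannot be fast-forwarded, because otherwise a later fast-forward to a new range may need them again.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sub-reader; only tracks exhaustion.
struct sub_reader {
    bool exhausted = false;
};

class combined_reader_sketch {
    std::vector<std::unique_ptr<sub_reader>> _readers;
    bool _fast_forwardable;
public:
    explicit combined_reader_sketch(bool fast_forwardable) : _fast_forwardable(fast_forwardable) {}

    void add(std::unique_ptr<sub_reader> r) { _readers.push_back(std::move(r)); }

    // In FF mode an exhausted reader may produce data after a fast-forward,
    // so it is kept. Otherwise it can never be used again, and dropping it
    // frees the reader and its buffers early.
    void drop_exhausted_readers() {
        if (_fast_forwardable) {
            return;
        }
        _readers.erase(std::remove_if(_readers.begin(), _readers.end(),
                                      [](const std::unique_ptr<sub_reader>& r) { return r->exhausted; }),
                       _readers.end());
    }

    std::size_t size() const { return _readers.size(); }
};
```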
Duarte Nunes	0b53f88a42	combined_mutation_reader: Remove superfluous mutation_readers list The _all_readers variable can do the same job. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-08-14 14:37:27 +02:00
Duarte Nunes	77477605c1	memtable_snapshot_source: Created readers should be fast forwardable As they're used by the cache tests. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-08-14 14:37:27 +02:00
Avi Kivity	afff29bdb9	Merge seastar upstream * seastar edb73ab...47b31f6 (1): > tls: Only recurse once in shutdown code Fixes #2691.	2017-08-14 15:09:42 +03:00
Duarte Nunes	a17cef76b2	query-result-writer: Remove unneeded field Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170811102940.22747-1-duarte@scylladb.com>	2017-08-14 12:33:33 +01:00
Duarte Nunes	ec75eac37d	ring_position_exponential_vector_sharder: Take ranges by rvalue Avoids some copies. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170814093310.29200-1-duarte@scylladb.com>	2017-08-14 12:55:43 +03:00
Duarte Nunes	3b9a9b7321	query-result: Send row and partition count over the wire To avoid calculating them on the coordinator side. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-08-14 10:29:06 +02:00
Duarte Nunes	d7bab684ea	query::result: Optimize calculate_counts() Now that range queries go through the normal digest path, we rely on query::result::calculate_counts() to count the number of partitions and rows returned. This patch makes it a bit faster. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-08-14 10:28:29 +02:00
Avi Kivity	cb2c5016ea	Merge seastar upstream * seastar 7a49ae5...edb73ab (11): > scripts: perftune.py: change the network module mode auto selection heuristic > net/tls: explicitly ignore ready future during shutdown > Use python2 explicitly as an interpreter for Python v2 scripts > peering_sharded_service: prevent over-run the container > Add link to documentation to the README.md > Add guidelines for contributing to Seastar > sharded: fix move constructor for peering_sharded_service services > Provide a convenient way to lazy-convert to string the values of pointers > tutorial: overhaul semaphores section > simple-stream: Make fragmented::write_substream return simple if possible > simple-stream: Make simple/fragmented memory output stream top level	2017-08-14 10:29:27 +03:00
Raphael S. Carvalho	050a7019b8	sstables/index_reader: fix index reader for summary entry spanning lots of keys Using quantity prevents index_reader from reading all index entries of a summary entry that spans more than min_index_interval entries. That can happen after the introduction of size-based sampling, and consequently, the sstable will not be able to return a key whose logical position in the summary entry is beyond min_index_interval. It's ok not to use quantity because index_reader will read all index entries until either the next summary entry or the end of file is reached. Fixes test_sstable_conforms_to_mutation_source Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170812045821.25269-1-raphaelsc@scylladb.com>	2017-08-12 09:44:16 +03:00
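The idea behind this fix, sketched with a hypothetical helper: the index entries belonging to a summary entry are bounded by file positions (up to the next summary entry, or the end of the index file for the last one) rather than by a fixed count such as min_index_interval, so a summary entry covering many keys is read in full.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Byte range [begin, end) of the index file covered by summary entry i.
// `summary_positions` holds the index-file position of each summary entry.
std::pair<uint64_t, uint64_t> index_range_for_summary_entry(
        const std::vector<uint64_t>& summary_positions, std::size_t i, uint64_t index_file_size) {
    uint64_t begin = summary_positions[i];
    uint64_t end = i + 1 < summary_positions.size() ? summary_positions[i + 1] : index_file_size;
    return {begin, end};
}
```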
Duarte Nunes	08e284a07e	combined_mutation_reader: Don't drop mutation readers This patch fixes a regression introduced in `a6b9186ca`. We should keep the readers around in case a subsequent call to fast_forward() will require them. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170811160444.12795-1-duarte@scylladb.com>	2017-08-11 19:17:29 +03:00
Duarte Nunes	44b6da2e90	test.py: Add combined_mutation_reader_test Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170811155017.9899-1-duarte@scylladb.com>	2017-08-11 18:54:11 +03:00
Avi Kivity	dbf8625ac9	Merge "size-based sampling for sstable summary" from Raphael "Fixes #1842." * 'size_based_sampling_v3' of github.com:raphaelsc/scylla: tests: test summary entry spanning more keys than min interval db/config: introduce sstable_summary_ratio option sstables: introduce size-based sampling for sstable summary sstables: make components_writer::offset const qualified and uint64_t sstables: make writer::offset const qualified and uint64_t	2017-08-11 18:41:45 +03:00
Duarte Nunes	e7d56884c0	list_reader_selector: Prevent infinite loop In case the readers are empty. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170811153142.8926-1-duarte@scylladb.com>	2017-08-11 18:34:55 +03:00
Vladimir Krivopalov	003e8cf250	Use python2 explicitly as an interpreter for Python v2 scripts Signed-off-by: Vladimir Krivopalov <vladimir.krivopalov@gmail.com> Message-Id: <20170811032712.4362-1-vladimir.krivopalov@gmail.com>	2017-08-11 18:08:11 +03:00
Duarte Nunes	20337053ad	Don't use literal lambdas These are only available in C++17. Fixes the build after `b5460c2`. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-08-11 13:08:42 +02:00
Duarte Nunes	b5460c2990	Merge "Support `duration` type" from Jesse "This patch series adds support for the `duration` type in CQL, which was added to Cassandra in 3.10. As part of this work, it was necessary also to add support for the `vint` and `unsigned vint` types to the native protocol implementation, which are part of v5 of the specification. To test interactively, it is necessary to use cqlsh distributed with Cassandra, as the version we distribute does not yet support the duration type." * 'jhk/duration_protocol/v5' of https://github.com/hakuch/scylla: Support `duration` CQL native type CQL native protocol: Add support for `vint` serialization duration_test.cc: Add test for printing zero duration duration.cc: Remove nop `const` qualifier on return type Change `const` qualifier declaration order for `duration` duration.cc: Simplify range checking Rename `duration` to `cql_duration`	2017-08-11 10:56:55 +01:00
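For reference, the v5 native protocol specification describes an `[unsigned vint]` as 1 to 9 bytes, most significant byte first, where the number of leading 1 bits in the first byte gives the number of extra bytes to read; a signed `[vint]` is zig-zag encoded first, and a `duration` value is three signed vints (months, days, nanoseconds). The functions below are a hypothetical sketch of that encoding, not Scylla's implementation.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

std::vector<uint8_t> encode_unsigned_vint(uint64_t value) {
    // With `extra` extra bytes the value gets 7 * (extra + 1) bits; extra == 8
    // means a 0xFF prefix byte followed by all 64 bits.
    unsigned extra = 0;
    while (extra < 8 && (value >> (7 * (extra + 1))) != 0) {
        ++extra;
    }
    std::vector<uint8_t> out(extra + 1);
    for (unsigned i = 0; i <= extra; ++i) {  // big-endian value bytes
        unsigned shift = 8 * (extra - i);
        out[i] = shift < 64 ? uint8_t(value >> shift) : uint8_t(0);
    }
    out[0] |= uint8_t(~(0xFFu >> extra));  // `extra` leading 1 bits
    return out;
}

uint64_t decode_unsigned_vint(const std::vector<uint8_t>& in) {
    if (in.empty()) {
        throw std::invalid_argument("empty vint");
    }
    unsigned extra = 0;
    while (extra < 8 && (in[0] & (0x80u >> extra))) {
        ++extra;
    }
    if (in.size() < extra + 1) {
        throw std::invalid_argument("truncated vint");
    }
    uint64_t value = in[0] & (0xFFu >> extra);  // strip the prefix bits
    for (unsigned i = 1; i <= extra; ++i) {
        value = (value << 8) | in[i];
    }
    return value;
}

// Zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so small magnitudes stay short.
// (v >> 63 relies on arithmetic right shift, guaranteed since C++20.)
std::vector<uint8_t> encode_vint(int64_t v) {
    return encode_unsigned_vint((uint64_t(v) << 1) ^ uint64_t(v >> 63));
}

int64_t decode_vint(const std::vector<uint8_t>& in) {
    uint64_t zz = decode_unsigned_vint(in);
    return int64_t(zz >> 1) ^ -int64_t(zz & 1);
}
```

For example, 128 encodes as 0x80 0x80 and 256000 as 0xC3 0xE8 0x00, while values up to 127 fit in a single byte.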


12919 Commits