scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 19:10:42 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	99aa3d1964	tests: row_cache_test: Don't assume mvcc snapshots are not evictable The test was not updating the underlying mutation source but still expecting to get the right data after calling invalidate(). If snapshots are evictable, that's not guaranteed. Apply to underlying as well, so data is read from underlying if necessary.	2017-09-13 17:38:08 +02:00
Tomasz Grabiec	2df6f356b1	mvcc: Store LSA region reference in partition_snapshot Will be useful for improving encapsulation.	2017-09-13 17:38:08 +02:00
Tomasz Grabiec	4c920c9891	tests: cql_test_env: Use cancel_prior_atomic_deletions() This fixes a failure in view_schema_test, which starts many instances of single_node_cql_env. cancel_atomic_deletions() causes later deletions to fail, which causes some of the test cases to fail. Message-Id: <1505311250-3118-2-git-send-email-tgrabiec@scylladb.com>	2017-09-13 17:11:34 +03:00
Tomasz Grabiec	8a425cedc6	tests: cql_test_env: Cancel pending sstable deletions on shutdown Fixes a hang on shutdown with --smp 2 in perf_fast_forward. The hang is in sstables::await_background_jobs_on_all_shards(), which is waiting on sstable deletions. Not all shards agree to delete certain sstables, because e.g. not all shards decide to compact them yet. Cancel those deletes after database is stopped on all shards, like we do in main.cc Fixes #2796. Message-Id: <1505292239-26032-1-git-send-email-tgrabiec@scylladb.com>	2017-09-13 11:56:48 +03:00
Tomasz Grabiec	423142ec81	tests: row_cache_test: Fix abort in debug mode The test used an apply() variant which assumed that it was invoked in a seastar thread, which is no longer the case after commit `d22fdf4`. Fix by copying outside the cache update, and use the non-deferring apply() variant for the cache update. Message-Id: <1505200142-3650-1-git-send-email-tgrabiec@scylladb.com>	2017-09-12 10:57:36 +03:00
Avi Kivity	f7023501d6	treewide: use shared_sstable, make_sstable in place of lw_shared_ptr<sstable> Since shared_sstable is going to be its own type soon, we can't use the old alias.	2017-09-12 10:43:05 +03:00
Avi Kivity	02028df9b1	cql_test_env: add forward declaration Not worthwhile to add a new #include for this.	2017-09-12 10:43:05 +03:00
Avi Kivity	5ebb15b9d4	sstable_mutation_test: add missing include	2017-09-12 10:43:05 +03:00
Avi Kivity	fdab47ab32	perf_fast_forward: add missing include	2017-09-12 10:43:05 +03:00
Avi Kivity	9b540eccb0	database: remove dependency on compaction.hh and compaction_manager.hh	2017-09-11 20:09:45 +03:00
Botond Dénes	9ebeb9d5ce	Fix -Wreturn-type warnings in tests: use abort() instead of assert(0) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <95927f933411302e84d57d169ee0147def7bc643.1504890922.git.bdenes@scylladb.com>	2017-09-10 17:09:53 +03:00
Tomasz Grabiec	121cd8cb6c	tests: Fix cql_query_test.cc::test_duration_restrictions validate_request_failure() assumed that the future returned by execute_cql() is always ready, which doesn't have to be the case, and caused aborts in debug mode build. Message-Id: <1504701342-13300-1-git-send-email-tgrabiec@scylladb.com>	2017-09-06 15:49:03 +03:00
Tomasz Grabiec	3986486cb3	tests: cql_test_env: Avoid exceptions to make debugging easier Message-Id: <1504701375-13491-1-git-send-email-tgrabiec@scylladb.com>	2017-09-06 15:48:59 +03:00
Paweł Dziepak	ed68a75b75	tests/counter: verify counter_id ordering	2017-09-05 10:52:54 +01:00
Paweł Dziepak	2b614201a7	tests/sstables: add storage_service_for_tests to counter write test Writing counters to an sstable is going to require cluster feature information, which requires accessing some singletons.	2017-09-05 10:32:48 +01:00
Paweł Dziepak	5007c9290a	tests/sstables: add test for reading wrong-order counter cells	2017-09-05 10:32:48 +01:00
Paweł Dziepak	1e03c4acbe	tests/counter: test 1.7.4 compatible shard ordering	2017-09-05 10:32:47 +01:00
Paweł Dziepak	fd25a09db2	tests/counter: add tests for 1.7.4 counter shard order	2017-09-05 10:32:47 +01:00
Paweł Dziepak	b0f67c1680	tests/counter: verify order of counter shards	2017-09-05 10:32:47 +01:00
Paweł Dziepak	27397b5dad	tests/counter: add test for sorting and deduplicating shards	2017-09-05 10:32:47 +01:00
Tomasz Grabiec	d22fdf4261	row_cache: Improve safety of cache updates Cache imposes requirements on how updates to the on-disk mutation source are made: 1) each change to the on-disk mutation source must be followed by cache synchronization reflecting that change 2) The two must be serialized with other synchronizations 3) must have strong failure guarantees (atomicity) Because of that, sstable list update and cache synchronization must be done under a lock, and cache synchronization cannot fail to synchronize. Normally cache synchronization achieves the no-failure guarantee by wiping the cache (which is noexcept) in case a failure is detected. There are some setup steps however which cannot be skipped, e.g. taking a lock followed by switching cache to use the new snapshot. That truly cannot fail. The lock inside cache synchronizers is redundant, since the user needs to take it anyway around the combined operation. In order to make ensuring strong exception guarantees easier, and to make the cache interface easier to use correctly, this patch moves the control of the combined update into the cache. This is done by having cache::update() et al accept a callback (external_updater) which is supposed to perform modification of the underlying mutation source when invoked. This is in-line with the layering. Cache is layered on top of the on-disk mutation source (it wraps it) and reading has to go through cache. After the patch, modification also goes through cache. This way more of cache's requirements can be confined to its implementation. The failure semantics of update() and other synchronizers needed to change due to strong exception guarantees. Now if it fails, it means the update was not performed, neither to the cache nor to the underlying mutation source. The database::_cache_update_sem goes away, serialization is done internally by the cache. The external_updater needs to have strong exception guarantees. This requirement is not new.
It is however currently violated in some places. This patch marks those callbacks as noexcept and leaves a FIXME. Those should be fixed, but that's not in the scope of this patch. Aborting is still better than corrupting the state. Fixes #2754. Also fixes the following test failure: tests/row_cache_test.cc(949): fatal error: in "test_update_failure": critical check it->second.equal(*s, mopt->partition()) has failed which started to trigger after commit `318423d50b`. Thread stack allocation may fail, in which case we did not do the necessary invalidation.	2017-09-04 10:04:29 +02:00
Tomasz Grabiec	56e3ce05db	row_cache: Don't require presence checker to be supplied externally The API is simpler and safer this way.	2017-09-04 10:04:29 +02:00
Paweł Dziepak	d5fa07f6df	Merge "sstables: switch from deque<> to a custom container" from Avi Large deques require contiguous storage, which may not be available (or may be expensive to obtain). Switch to new custom container instead, which allocates less contiguous storage. Allocation problems were observed with the summary and compression info. While there is work to reduce compression info contiguous space use, this solves all std::deque problems (and should not conflict with that work). Fixes #2708 * tag '2708/v6' of https://github.com/avikivity/scylla: sstables: switch std::deque to chunked_vector tests: add test for chunked_vector utils: add a new container type chunked_vector	2017-08-29 11:11:01 +01:00
Avi Kivity	5224ab9c92	Merge "Fix sstable reader not working for empty set of clustering ranges" from Tomasz "Fixes #2734." * 'tgrabiec/make-sstable-reader-work-with-empty-range-set' of github.com:scylladb/seastar-dev: tests: Introduce clustering_ranges_walker_test tests: simple_schema: Add missing include sstables: reader: Make clustering_ranges_walker work with empty range set clustering_ranges_walker: Make adjacency more accurate	2017-08-29 10:28:49 +03:00
Tomasz Grabiec	05e0ca6546	tests: Introduce clustering_ranges_walker_test	2017-08-28 21:08:55 +02:00
Tomasz Grabiec	dcbc1282a9	tests: simple_schema: Add missing include	2017-08-28 21:00:06 +02:00
Botond Dénes	eec451bcf8	segmented_offsets: use _current_bucket_segment_index consistently Previously _current_bucket_segment_index was used differently depending on whether update_position_trackers() is used in a random or sequential access. In the former case it was used as the absolute index of the segment (independent of the buckets) and in the latter as the relative index of the segment within its bucket. This caused problems when there was a switch between random and sequential access, meaning one could get different results for an at() call depending on what was the previous at() call. Fix this by consistently using _current_bucket_segment_index as, like its name suggests, the bucket relative segment index. Ref #1946. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <7f68ac1d32c80e8dea6dfa11be02acaa961bce2a.1503924927.git.bdenes@scylladb.com>	2017-08-28 16:14:25 +03:00
Avi Kivity	fa8d0fe4d0	Revert "Revert "Revert "Revert "Merge "Compress in-memory compression-info" from Botond"""" This reverts commit `238877a0c6`. A fix was found and will be committed shortly.	2017-08-28 16:14:13 +03:00
Tomasz Grabiec	3241018c79	tests: mutation_source_test: Add more tests for fast forwarding across partitions	2017-08-28 10:30:08 +02:00
Avi Kivity	238877a0c6	Revert "Revert "Revert "Merge "Compress in-memory compression-info" from Botond""" This reverts commit `9d27455744`. It's still broken. To reproduce: ./tools/bin/cassandra-stress write -schema compression=LZ4Compressor (on a clean database) .0 0x00007ffff32aa69b in raise () from /lib64/libc.so.6 .1 0x00007ffff32ac4a0 in abort () from /lib64/libc.so.6 .2 0x000000000054a0e8 in seastar::memory::abort_on_underflow (size=<optimized out>) at core/memory.cc:1189 .3 seastar::memory::allocate_large (size=<optimized out>) at core/memory.cc:1194 .4 0x000000000054b305 in seastar::memory::allocate (size=size@entry=18446744073702885265) at core/memory.cc:1227 .5 0x000000000054b45e in malloc (n=n@entry=18446744073702885265) at core/memory.cc:1452 .6 0x00000000006013e4 in seastar::temporary_buffer<char>::temporary_buffer (this=0x6010195fc800, size=18446744073702885265) at /home/avi/urchin/seastar/core/temporary_buffer.hh:72 .7 0x0000000000a3908b in seastar::input_stream<char>::read_exactly (this=0x6010053d0248, n=18446744073702885265) at /home/avi/urchin/seastar/core/iostream-impl.hh:189 .8 0x0000000000a9c77f in compressed_file_data_source_impl::get (this=0x6010053d0240) at sstables/compress.cc:499 .9 0x0000000000aa1b01 in seastar::data_source::get (this=<optimized out>) at /home/avi/urchin/seastar/core/iostream.hh:63 .10 seastar::future<> seastar::input_stream<char>::consume<sstables::data_consume_rows_context>(sstables::data_consume_rows_context&)::{lambda()#1}::operator()() const (__closure=__closure@entry=0x6010195fcab0) at /home/avi/urchin/seastar/core/iostream-impl.hh:204 .11 0x0000000000aa22f0 in seastar::futurize<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >::apply<seastar::future<> seastar::input_stream<char>::consume<sstables::data_consume_rows_context>(sstables::data_consume_rows_context&)::{lambda()#1}&>(sstables::data_consume_rows_context&&) (func=...) 
at /home/avi/urchin/seastar/core/future.hh:1312 .12 seastar::repeat<seastar::future<> seastar::input_stream<char>::consume<sstables::data_consume_rows_context>(sstables::data_consume_rows_context&)::{lambda()#1}>(sstables::data_consume_rows_context&&) (action=...) at /home/avi/urchin/seastar/core/future-util.hh:203 .13 0x0000000000a9e730 in seastar::input_stream<char>::consume<sstables::data_consume_rows_context> (consumer=..., this=<optimized out>) at /home/avi/urchin/seastar/core/iostream-impl.hh:237 .14 data_consumer::continuous_data_consumer<sstables::data_consume_rows_context>::consume_input<sstables::data_consume_rows_context> (c=..., this=<optimized out>) at sstables/consumer.hh:226 .15 sstables::data_consume_context::impl::read (this=<optimized out>) at sstables/row.cc:411 .16 sstables::data_consume_context::read (this=<optimized out>) at sstables/row.cc:437 .17 0x0000000000aafbae in sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}::operator()() const (__closure=<optimized out>) at sstables/partition.cc:843 .18 seastar::apply_helper<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}, std::tuple<>&&, std::integer_sequence<unsigned long> >::apply({lambda()#2}&&, std::tuple) (args=..., func=...) at ./seastar/core/apply.hh:36 .19 seastar::apply<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&, std::tuple<>&&) (args=..., func=...) at ./seastar/core/apply.hh:44 .20 seastar::futurize<seastar::future<> >::apply<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&, std::tuple<>&&) (args=..., func=...) 
at ./seastar/core/future.hh:1302 .21 seastar::future<>::then<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}, seastar::future<> >(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&) ( this=this@entry=0x6010195fcbb0, func=...) at ./seastar/core/future.hh:890 .22 0x0000000000ac273f in sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const (__closure=0x6010195fcc28) at sstables/partition.cc:843 .23 seastar::do_until_continued<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}, sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#1}>(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#1}&&, sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}&&, seastar::promise<>) (stop_cond=..., action=..., p=...) at /home/avi/urchin/seastar/core/future-util.hh:155 .24 0x0000000000ac29c3 in seastar::do_until<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}, sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#1}>(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#1}&&, sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}&&) (action=..., stop_cond=..., this=<optimized out>) at /home/avi/urchin/seastar/core/future-util.hh:330 .25 sstables::sstable_streamed_mutation::fill_buffer (this=<optimized out>) at sstables/partition.cc:844 .26 0x0000000000ad3d2b in streamed_mutation::fill_buffer (this=0x6010195fcd10) at ./streamed_mutation.hh:489 .27 consume_flattened_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (streamed_mutation const&)> >(mutation_reader&, stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >&, std::function<bool (streamed_mutation const&)>&&) ( (gdb) p addr $1 = { chunk_start = 13330037, chunk_len = 18446744073702885265, offset = 0 
}	2017-08-27 13:32:37 +03:00
Avi Kivity	576e33149f	Merge seastar upstream * seastar 0083ee8...85ca12d (1): > Merge "Run-time logging configuration" from Jesse Includes patch from Jesse: "Switch to Seastar for logging option handling In addition to updating the abstraction layer for Seastar logging in `log.hh`, the configuration system (`db/config.{hh,cc}`) has been updated in two ways: - The string-map type for Boost.program_options is now defined in Seastar. - A configuration value can be marked as `UsedFromSeastar`. This is like `Used`, except the option is expected to be defined in the Boost.Program_options description for Seastar. If the option is not defined in Seastar, or it is defined with a different type, then a run-time exception is thrown early in Scylla's initialization. This is necessary because logging options which are now defined in Seastar were previously defined in Scylla and support for these options in the YAML file cannot be dropped. In order to be able to verify that options marked `UsedFromSeastar` are actually defined in Seastar, the interface for adding options to `db::config` has changed from taking a `boost::program_options::options_description_easy_init` (which is a handle into a `boost::program_options::options_description` which only allows adding options) to taking a `boost::program_options::options_description` directly (which also allows querying existing options). Scylla also fully defers to Seastar's support for run-time logging configuration." Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <ef26cffb91bef1ae95d508187a6dd861a6c4fc84.1503344007.git.jhaberku@scylladb.com>	2017-08-27 13:11:33 +03:00
Avi Kivity	204659ef40	tests: add test for chunked_vector	2017-08-26 16:44:47 +03:00
Avi Kivity	9d27455744	Revert "Revert "Merge "Compress in-memory compression-info" from Botond"" This reverts commit `9656fd79a0`. A fix is now available.	2017-08-24 13:37:35 +03:00
Tomasz Grabiec	9656fd79a0	Revert "Merge "Compress in-memory compression-info" from Botond" This reverts commit `ef85cf1cb3`, reversing changes made to `de011ece52`. Vlad reports that this causes SIGSEGV on cluster restarts. seastar::backtrace_buffer::append_backtrace() at /home/vladz/work/urchin/seastar/core/reactor.cc:274 (inlined by) print_with_backtrace at /home/vladz/work/urchin/seastar/core/reactor.cc:289 seastar::print_with_backtrace(char const) at /home/vladz/work/urchin/seastar/core/reactor.cc:296 sigsegv_action at /home/vladz/work/urchin/seastar/core/reactor.cc:3512 (inlined by) operator() at /home/vladz/work/urchin/seastar/core/reactor.cc:3498 (inlined by) _FUN at /home/vladz/work/urchin/seastar/core/reactor.cc:3494 ?? ??:0 operator()<seastar::temporary_buffer<char> > at /home/vladz/work/urchin/sstables/sstables.cc:870 (inlined by) apply at /home/vladz/work/urchin/seastar/core/apply.hh:36 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>::<lambda(auto:104)>, seastar::temporary_buffer<char> > at /home/vladz/work/urchin/seastar/core/apply.hh:44 (inlined by) do_void_futurize_apply_tuple<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>::<lambda(auto:104)>, seastar::temporary_buffer<char> > at /home/vladz/work/urchin/seastar/core/future.hh:1270 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>::<lambda(auto:104)>, seastar::temporary_buffer<char> > at /home/vladz/work/urchin/seastar/core/future.hh:1290 (inlined by) then<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>::<lambda(auto:104)> > at /home/vladz/work/urchin/seastar/core/future.hh:890 (inlined by) operator() at /home/vladz/work/urchin/sstables/sstables.cc:873 (inlined by) 
do_until_continued<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>, sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>&> at /home/vladz/work/urchin/seastar/core/future-util.hh:155 do_until<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>, sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>&> at /home/vladz/work/urchin/seastar/core/future-util.hh:330 (inlined by) operator() at /home/vladz/work/urchin/sstables/sstables.cc:874 (inlined by) apply at /home/vladz/work/urchin/seastar/core/apply.hh:36 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()> > at /home/vladz/work/urchin/seastar/core/apply.hh:44 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()> > at /home/vladz/work/urchin/seastar/core/future.hh:1302 then<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()> > at /home/vladz/work/urchin/seastar/core/future.hh:890 (inlined by) operator() at /home/vladz/work/urchin/sstables/sstables.cc:875 (inlined by) apply at /home/vladz/work/urchin/seastar/core/apply.hh:36 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()> > at /home/vladz/work/urchin/seastar/core/apply.hh:44 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()> > at /home/vladz/work/urchin/seastar/core/future.hh:1302 operator()<seastar::future_state<> > at /home/vladz/work/urchin/seastar/core/future.hh:900 (inlined by) run at /home/vladz/work/urchin/seastar/core/future.hh:395 seastar::reactor::run_tasks(seastar::circular_buffer<std::unique_ptr<seastar::task, std::default_delete<seastar::task> >, 
std::allocator<std::unique_ptr<seastar::task, std::default_delete<seastar::task> > > >&) at /home/vladz/work/urchin/seastar/core/reactor.cc:2317 seastar::reactor::run() at /home/vladz/work/urchin/seastar/core/reactor.cc:2775 seastar::app_template::run_deprecated(int, char*, std::function<void ()>&&) at /home/vladz/work/urchin/seastar/core/app-template.cc:142	2017-08-24 11:44:14 +02:00
Avi Kivity	ef85cf1cb3	Merge "Compress in-memory compression-info" from Botond "Overly large metadata can hog memory which especially hurts in setups with bad disk/memory ratio. To ease the pain compress the in-memory compression-info. The compression is implemented based on Avi's idea which is to group n offsets together into segments, where each segment stores a base absolute offset into the file, the other offsets in the segments being relative offsets (and thus of reduced size). Also offsets are allocated only just enough bits to store their maximum value. The offsets are thus packed in a buffer like so: arrrarrrarrr... where n is 4, a is an absolute offset and r are offsets relative to a. This of course means that stored offsets will not be aligned, not even on a byte boundary, but the size reduction is pretty convincing. In addition, segments are stored in buckets, where each bucket has its own base offset. In addition, segments in a bucket are optimized to address as large of a chunk of the data as possible for a given chunk size." Ref #1946. * 'bdenes/compress-compression-v3' of https://github.com/denesb/scylla: Add unit test for compress::offsets Optimise the storage of compression chunk offsets Add script to precompute segmented compression parameters	2017-08-22 10:30:58 +03:00
Botond Dénes	62c18da35c	Add unit test for compress::offsets	2017-08-21 17:06:20 +03:00
Botond Dénes	028c7a0888	Optimise the storage of compression chunk offsets To reduce the memory footprint of compression-info, n offsets are grouped together into segments, where each segment stores a base absolute offset into the file, the other offsets in the segments being relative offsets (and thus of reduced size). Also offsets are allocated only just enough bits to store their maximum value. The offsets are thus packed in a buffer like so: arrrarrrarrr... where n is 4, a is an absolute offset and r are offsets relative to a. The optimal value of n can be calculated for a given file_size (f) and chunk_size (c), by finding the minima of the following function: f(n) = (f/c)/n * (log2(f) + (n - 1)log2((n-1)(c + 64))) This is done in an empirical way, using a script (see below). Furthermore segments are stored in buckets, where each bucket has its own base offset. Each bucket therefore can address an equal chunk of the file and furthermore each segment in a bucket can address an equal sub-chunk of this area. The value of a given offset i is thus: bucket_base_offset_for(i) + segment_base_offset_for(i) + offset(i) To account for the bucketed storage we calculate a local_f, which is optimized so that a bucketful of segmented offsets can address the largest possible chunk of f. As the value of this local_f only depends on the bucket_size (b) and c, the value of n can be made independent of f and therefore only depend on one dynamic value, c. This makes life much simpler as we don't need to know the size of the file up-front, we can just append buckets to the storage on demand, while the required storage is still less than a third [1] of the original storage requirements (std::deque<uint64>). The table with the minima(f(n)) for different f and c values is pre-computed by gen_segmented_compress_params.py and stored in sstables/segmented_compress_params.hh. This script also creates a table with the best values of local_f for the given bucket_size.
At runtime we only select the best params based on c. [1] This was calculated for c=4K and b=4K	2017-08-21 17:06:12 +03:00
Piotr Jastrzebski	c602ffd610	Make Scylla ttl expiration behave like in Cassandra Fixes #2497 [tgrabiec: reworked the title] Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <2f5a99dce6ef11fe0ef135c9fa0592078fc9a056.1502886874.git.piotr@scylladb.com>	2017-08-21 14:25:45 +02:00
Tomasz Grabiec	8f2ca52740	tests: Run test_query_only_static_row test case on all mutation sources The test checks behavior common to all mutation readers, so it's better to run it against all mutation sources rather than only for cache reader. Message-Id: <1503072333-17995-1-git-send-email-tgrabiec@scylladb.com>	2017-08-20 12:23:28 +03:00
Botond Dénes	eb7eee510d	combined_mutation_reader_test: use the global const objects directly Instead of local ones. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <3ec1a70e4c0198c0563dff9688bbaa7fcfcace71.1502891190.git.bdenes@scylladb.com>	2017-08-16 16:56:42 +03:00
Duarte Nunes	7fb6a74302	combined_mutation_reader: Drop exhausted readers if not in FF mode Exhausted readers can be fast forwarded, so we have to keep them around. However, if the current reader is not fast forwardable, then we can drop those readers and their buffers. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-08-14 14:37:27 +02:00
Duarte Nunes	77477605c1	memtable_snapshot_source: Created readers should be fast forwardable As they're used by the cache tests. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-08-14 14:37:27 +02:00
Raphael S. Carvalho	050a7019b8	sstables/index_reader: fix index reader for summary entry spanning lots of keys quantity prevents index_reader from reading all index entries of a summary entry that span more than min_index_interval entries. That can happen after introduction of size-based sampling, and consequently, sstable will not be able to return a key whose logical position in the summary entry is beyond min_index_interval. It's ok to not use quantity because index_reader will read all indexes until either next summary entry or end of file is reached. Fixes test_sstable_conforms_to_mutation_source Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170812045821.25269-1-raphaelsc@scylladb.com>	2017-08-12 09:44:16 +03:00
Avi Kivity	dbf8625ac9	Merge "size-based sampling for sstable summary" from Raphael "Fixes #1842." * 'size_based_sampling_v3' of github.com:raphaelsc/scylla: tests: test summary entry spanning more keys than min interval db/config: introduce sstable_summary_ratio option sstables: introduce size-based sampling for sstable summary sstables: make components_writer::offset const qualified and uint64_t sstables: make writer::offset const qualified and uint64_t	2017-08-11 18:41:45 +03:00
Duarte Nunes	20337053ad	Don't use literal lambdas These are only available in C++17. Fixes the build after `b5460c2`. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-08-11 13:08:42 +02:00
Duarte Nunes	b5460c2990	Merge "Support `duration` type" from Jesse "This patch series adds support for the `duration` type in CQL, which was added to Cassandra in 3.10. As part of this work, it was necessary also to add support for the `vint` and `unsigned vint` types to the native protocol implementation, which are part of v5 of the specification. To test interactively, it is necessary to use cqlsh distributed with Cassandra, as the version we distribute does not yet support the duration type." * 'jhk/duration_protocol/v5' of https://github.com/hakuch/scylla: Support `duration` CQL native type CQL native protocol: Add support for `vint` serialization duration_test.cc: Add test for printing zero duration duration.cc: Remove nop `const` qualifier on return type Change `const` qualifier declaration order for `duration` duration.cc: Simplify range checking Rename `duration` to `cql_duration`	2017-08-11 10:56:55 +01:00
Raphael S. Carvalho	5124f94358	tests: test summary entry spanning more keys than min interval Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-08-11 01:37:06 -03:00
Raphael S. Carvalho	8726ee937d	sstables: introduce size-based sampling for sstable summary Currently, a summary entry is added after min_index_interval index entries were written. Not taking into account size of index entries becomes a problem with large partitions which may create big index entries due to promoted indexes. Read performance is affected as a consequence because index entries spanned by summary are all read from disk to serve request. What we wanna do is to also add a summary entry after index reaches a boundary. To deal with oversampling, we want to write 1 byte to summary for every 2000 bytes written to data file (this will be eventually made into an option in the config file). Both conditions must be met to avoid under or oversampling. That way, the amount of data needed from index file to satisfy the request is drastically reduced. Fixes #1842. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-08-11 00:30:12 -03:00
Jesse Haber-Kucharsky	509626fe08	Support `duration` CQL native type `duration` is a new native type that was introduced in Cassandra 3.10 [1]. Support for parsing and the internal representation of the type was added in `8fa47b74e8`. Important note: The version of cqlsh distributed with Scylla does not have support for durations included (it was added to Cassandra in [2]). To test this change, you can use cqlsh distributed with Cassandra. Duration types are useful when working with time-series tables, because they can be used to manipulate date-time values in relative terms. Two interesting applications are: - Aggregation by time intervals [3]: `SELECT * FROM my_table GROUP BY floor(time, 3h)` - Querying on changes in date-times: `SELECT ... WHERE last_heartbeat_time < now() - 3h` (Note: neither of these is currently supported, though columns with duration values are.) Internally, durations are represented as three signed counters: one for months, for days, and for nanoseconds. Each of these counters is serialized using a variable-length encoding which is described in version 5 of the CQL native protocol specification. The representation of a duration as three counters means that a semantic ordering on durations doesn't exist: Is `1mo` greater than `1mo1d`? We cannot know, because some months have more days than others. Durations can only have a concrete absolute value when they are "attached" to absolute date-time references. For example, `2015-04-31 at 12:00:00 + 1mo`. That duration values are not comparable presents some difficulties for the implementation, because most CQL types are. Like in Cassandra's implementation [2], I adopted a similar strategy to the way restrictions on the `counter` type are checked. A type "references" a duration if it is either a duration or it contains a duration (like a `tuple<..., duration, ...>`, or a UDT with a duration member). The following restrictions apply on durations. 
Note that some of these contexts are either experimental features (materialized views), or not currently supported at run-time (though support exists in the parser and code, so it is prudent to add the restrictions now): - Durations cannot appear in any part of a primary key, either for tables or materialized views. - Durations cannot be directly used as the element type of a `set`, nor can they be used as the key type of a `map`. Because internal ordering on durations is based on a byte-level comparison, this property of Cassandra was intended to help avoid user confusion around ordering of collection elements. - Secondary indexes on durations are not supported. - "Slice" relations (<=, <, >=, >) are not supported on durations with `WHERE` restrictions (like `SELECT ... WHERE span <= 3d`). Multi-column restrictions only work with clustering columns, which cannot be `duration` due to the first rule. - "Slice" relations are not supported on durations with query conditions (like `UPDATE my_table ... IF span > 5us`). Backwards incompatibility note: As described in the documentation [4], duration literals take one of two forms: either ISO 8601 formats (there are three), or a "standard" format. The ISO 8601 formats start with "P" (like "P5W"). Therefore, identifiers that have this form are no longer supported. Fixes #2240. [1] https://issues.apache.org/jira/browse/CASSANDRA-11873 [2] `bfd57d13b7` [3] https://issues.apache.org/jira/browse/CASSANDRA-11871 [4] http://cassandra.apache.org/doc/latest/cql/types.html#working-with-durations	2017-08-10 15:01:10 -04:00
Jesse Haber-Kucharsky	91dab1d998	CQL native protocol: Add support for `vint` serialization Version 5 of the native protocol for CQL [1] adds the `vint` and `unsigned vint` types. An unsigned integer encoded as a `vint` has a variable size based on the magnitude of the value. The first byte indicates the total number of bytes. For signed integers, a "zig-zag" encoding scheme ensures that small negative values are encoded as short-length `vint`s (0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, etc). [1] https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v5.spec	2017-08-10 14:11:30 -04:00
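
The zig-zag step mentioned in commit `91dab1d998` can be illustrated with a minimal sketch. It covers only the signed-to-unsigned mapping, not the variable-length byte layout, and the function names `zigzag_encode` and `zigzag_decode` are hypothetical, not taken from the Scylla sources. Under the standard zig-zag mapping, -2 maps to 3 and 2 maps to 4.

```cpp
#include <cassert>
#include <cstdint>

// Zig-zag mapping: interleaves negative and non-negative values so that
// numbers of small magnitude become small unsigned numbers
// (0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...).
// Hypothetical helper names; this is not Scylla's implementation.
constexpr std::uint64_t zigzag_encode(std::int64_t n) {
    // Shift the magnitude left by one and flip all bits for negative inputs,
    // which leaves the sign in the lowest bit.
    const std::uint64_t sign_mask = n < 0 ? ~std::uint64_t{0} : std::uint64_t{0};
    return (static_cast<std::uint64_t>(n) << 1) ^ sign_mask;
}

constexpr std::int64_t zigzag_decode(std::uint64_t z) {
    // The lowest bit says whether the original value was negative.
    const std::uint64_t sign_mask = (z & 1) ? ~std::uint64_t{0} : std::uint64_t{0};
    return static_cast<std::int64_t>((z >> 1) ^ sign_mask);
}
```

Because small magnitudes of either sign map to small unsigned values, a variable-length unsigned encoding applied afterwards stays short for values such as -1 or -2.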

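The size estimate quoted in commit `028c7a0888` can be explored with a short sketch. It transcribes the formula from that commit message directly and searches for the minimizing n by brute force. It does not reproduce `gen_segmented_compress_params.py`, ignores the bucketing and local_f refinement, and the names `segmented_offsets_bits` and `best_segment_size` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Estimate from the commit message, read here as a total bit count:
//   (f/c)/n * (log2(f) + (n - 1) * log2((n - 1) * (c + 64)))
// f = file size in bytes, c = chunk size in bytes, n = offsets per segment.
// Hypothetical helper; not part of the Scylla sources.
double segmented_offsets_bits(double file_size, double chunk_size, unsigned n) {
    const double segments = (file_size / chunk_size) / n;
    const double absolute_bits = std::log2(file_size);
    // With n == 1 there are no relative offsets; skip the term so that
    // 0 * log2(0) is never evaluated.
    const double relative_bits =
        n > 1 ? (n - 1) * std::log2((n - 1) * (chunk_size + 64)) : 0.0;
    return segments * (absolute_bits + relative_bits);
}

// Brute-force search for the n in [1, max_n] with the smallest estimate.
unsigned best_segment_size(double file_size, double chunk_size, unsigned max_n) {
    unsigned best = 1;
    double best_bits = segmented_offsets_bits(file_size, chunk_size, 1);
    for (unsigned n = 2; n <= max_n; ++n) {
        const double bits = segmented_offsets_bits(file_size, chunk_size, n);
        if (bits < best_bits) {
            best_bits = bits;
            best = n;
        }
    }
    return best;
}
```

For example, `best_segment_size(1024.0 * 1024 * 1024, 4096.0, 64)` searches segment sizes up to 64 for a 1 GiB file with 4 KiB chunks. Under this formula any n greater than 1 trades one full-width absolute offset per segment against n - 1 narrower relative offsets.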

1597 Commits