scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 11:55:15 +00:00

Author	SHA1	Message	Date
Avi Kivity	b94997be0d	Merge "Extract MC sstable writer to a separate compilation unit" from Tomasz
"The motivation is to keep code related to each format separate, to make it easier to comprehend and to reduce incremental compilation times. It also reduces dependency on sstable writer code by removing writer bits from sstables.hh. The ka/la format writers are still left in sstables.cc; they could also be extracted."
* 'extract-sstable-writer-code' of github.com:tgrabiec/scylla:
  sstables: Make variadic write() not picked on substitution error
  sstables: Extract MC format writer to mc/writer.cc
  sstables: Extract maybe_add_summary_entry() out of components_writer
  sstables: Publish functions used by writers in writer.hh
  sstables: Move common write functions to writer.hh
  sstables: Extract sstable_writer_impl to a header
  sstables: Do not include writer.hh from sstables.hh
  sstables: mc: Extract bound_kind_m related stuff into mc/types.hh
  sstables: types: Extract sstable_enabled_features::all()
  sstables: Move components_writer to .cc
  tests: sstable_datafile_test: Avoid dependency on components_writer
(cherry picked from commit `b023e8b45d`)	2018-12-21 20:40:35 +02:00
Tomasz Grabiec	9b299241e5	Merge "Fixes for collecting stats in SST3 + more tests" from Vladimir
This patchset fixes several remaining issues found during thorough testing of SSTables 3.x statistics and enriches ~30 unit tests with statistics validation against Cassandra-generated golden copies.
* https://github.com/argenet/scylla/tree/projects/sstables-30/sst3-tests-statistics/v1:
  sstables: Enforce estimated_partitions in generate_summary() to be always positive.
  sstables: Don't enforce default max_local_deletion_time value for 'mc' files.
  sstables: Update TTL/local deletion stats for non-expiring and live liveness_info.
  sstables: Collect statistics when writing RT markers to SSTables 3.x.
  tests: Return sstable_assertions from validate_read() helper.
  tests: Introduce helper for validating stats metadata in SSTables 3.x tests.
  tests: Add stats metadata validation to test_write_static_row.
  tests: Add stats metadata validation to test_write_composite_partition_key.
  tests: Add stats metadata validation to test_write_composite_clustering_key.
  tests: Add stats metadata validation to test_write_wide_partitions.
  tests: Add stats metadata validation to write_ttled_row
  tests: Add stats metadata validation to write_ttled_column
  tests: Add stats metadata validation to write_deleted_column
  tests: Add stats metadata validation to write_deleted_row
  tests: Add stats metadata validation to write_collection_wide_update
  tests: Add stats metadata validation to write_collection_incremental_update
  tests: Add stats metadata validation to write_multiple_partitions
  tests: Add stats metadata validation to write_multiple_rows
  tests: Add stats metadata validation to write_missing_columns_large_set
  tests: Add stats metadata validation to write_different_types
  tests: Add stats metadata validation to write_empty_clustering_values
  tests: Add stats metadata validation to write_large_clustering_key
  tests: Add stats metadata validation to write_compact_table
  tests: Add stats metadata validation to write_user_defined_type_table
  tests: Add stats metadata validation to write_simple_range_tombstone
  tests: Add stats metadata validation to write_adjacent_range_tombstones
  tests: Add stats metadata validation to write_non_adjacent_range_tombstones
  tests: Add stats metadata validation to write_mixed_rows_and_range_tombstones
  tests: Add stats metadata validation to write_adjacent_range_tombstones_with_rows
  tests: Add stats metadata validation to write_range_tombstone_same_start_with_row
  tests: Add stats metadata validation to write_range_tombstone_same_end_with_row
  tests: Add stats metadata validation to write_two_non_adjacent_range_tombstones
  tests: Delete unused (bogus) Statistics.db file from write_ SST3 tests.
(cherry picked from commit `bb24d378b2`)	2018-12-08 14:08:46 +02:00
Avi Kivity	b9c99af18b	Merge "Fix tombstone histogram when writing SSTables 3.x" from Vladimir
"This patchset extends a number of existing tests to check SSTables statistics for 'mc' format and fixes an issue discovered with the help of one of the tests.
Tests: unit {release}"
* 'projects/sstables-30/check-stats/v2' of https://github.com/argenet/scylla:
  tests: Run sstable_timestamp_metadata_correcness_with_negative with all SSTables versions.
  tests: Run sstable_tombstone_histogram_test for all SSTables versions.
  tests: Run min_max_clustering_key_test on all SSTables versions.
  tests: Expand test_sstable_max_local_deletion_time_2 to run for all SSTables versions.
  tests: Run test_sstable_max_local_deletion_time on all SSTables versions.
  tests: Extend test checking tombstones histogram to cover all SSTables versions.
  sstables: Properly track row-level tombstones when writing SSTables 3.x.
  tests: Run min_max_clustering_key_test_2 for all SSTables versions.
  tests: Make reusable_sst() helper accept SSTables version parameter.
(cherry picked from commit `f073ea5f87`)	2018-12-08 14:08:44 +02:00
Avi Kivity	324dae3e12	Merge "compress: Restore lz4 as default compressor" from Duarte
"Enables sstable compression with LZ4 by default, which was the long-time behavior until a regression turned off compression by default.
Fixes #3926"
* 'restore-default-compression/v2' of https://github.com/duarten/scylla:
  tests/cql_query_test: Assert default compression options
  compress: Restore lz4 as default compressor
  tests: Be explicit about absence of compression
(cherry picked from commit `bb85a21a8f`)	2018-11-21 16:45:22 +02:00
Avi Kivity	1468ec62de	Merge "Handle simple column type schema changes in SST3" from Piotr
"This patchset enables very simple column type conversions. It covers only handling variable and fixed size type differences. Two types still have to be compatible on the bit level to be able to convert a field from one to the other."
* 'haaawk/sst3/column_type_schema_change/v4' of github.com:scylladb/seastar-dev:
  Fix check_multi_schema to actually check the column type change
  Handle very basic column type conversions in SST3
  Enable check_multi_schema for SST3
(cherry picked from commit `b9702222f8`)	2018-10-03 17:44:26 +03:00
Vladimir Krivopalov	c33e0f3f15	tests: Fix test_wrong_range_tombstone_order for 'mc' format. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 18:02:46 -07:00
Vladimir Krivopalov	9239195473	tests: Disable test_old_format_non_compound_range_tombstone_is_read for 'mc' format. This test is not applicable to the 'mc' format as it covers a backward compatibility case which may only occur with SSTables generated by older Scylla versions in 'ka' format. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 18:02:46 -07:00
Vladimir Krivopalov	952536c9f5	tests: Disable check_multi_schema for 'mc' format. Altering types in schema has been disabled in Origin (see CASSANDRA-12443). We do the same. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 18:02:46 -07:00
Vladimir Krivopalov	86aae36e04	tests: Fix test_promoted_index_read for 'mc' format by using normalizing_reader. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:55:52 -07:00
Botond Dénes	eb357a385d	flat_mutation_reader: make timeout opt-out rather than opt-in Currently the timeout is opt-in, that is, all methods that even have it default it to `db::no_timeout`. This means that ensuring the timeout is used where it should be is completely up to the author and the reviewers of the code. As humans are notoriously prone to mistakes, this has resulted in very inconsistent usage of the timeout, with many clients of `flat_mutation_reader` passing the timeout only to some members and only at certain call sites. This is hardly surprising, considering that some core operations like `operator()()` only recently received a timeout parameter and others like `peek()` didn't even have one until this patch. Both of these methods call `fill_buffer()`, which potentially talks to the lower layers and is supposed to propagate the timeout. All this makes the `flat_mutation_reader`'s timeout effectively useless. To bring order to this chaos, make the timeout parameter mandatory on all `flat_mutation_reader` methods that need it. This ensures that humans now get a reminder from the compiler when they forget to pass the timeout. Clients can still opt out of passing a timeout by passing `db::no_timeout` (the previous default value), but this will now be explicit and developers should think before typing it. There were surprisingly few core call sites to fix up. Where a timeout was available nearby, I propagated it so it could be passed to the reader; where it was not, I passed `db::no_timeout`. Authors of the latter kind of code (view, streaming and repair are some of the notable examples) should consider propagating a timeout down if needed. In the test code (the vast majority of the changes) I just used `db::no_timeout` everywhere. Tests: unit(release, debug) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1edc10802d5eb23de8af28c9f48b8d3be0f1a468.1536744563.git.bdenes@scylladb.com>	2018-09-20 11:31:24 +02:00
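A minimal C++ sketch of the opt-in versus opt-out distinction follows. The `db` namespace, the `timeout_clock` alias and both reader structs are hypothetical stand-ins, not ScyllaDB's real `flat_mutation_reader` declarations; the sketch only shows how dropping the default argument forces every call site to spell out a timeout, with `db::no_timeout` as the explicit opt-out.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-ins; the real db::timeout_clock / db::no_timeout may differ.
namespace db {
using timeout_clock = std::chrono::steady_clock;
constexpr timeout_clock::time_point no_timeout = timeout_clock::time_point::max();
}  // namespace db

// Before: the timeout is opt-in. A caller that forgets it silently gets no_timeout.
struct reader_opt_in {
    db::timeout_clock::time_point last_timeout{};
    void fill_buffer(db::timeout_clock::time_point timeout = db::no_timeout) {
        last_timeout = timeout;  // would be forwarded to the lower layers
    }
    void peek() { fill_buffer(); }  // peek() has no way to pass a timeout through
};

// After: the timeout is mandatory, so forgetting it is a compile error.
struct reader_opt_out {
    db::timeout_clock::time_point last_timeout{};
    void fill_buffer(db::timeout_clock::time_point timeout) { last_timeout = timeout; }
    void peek(db::timeout_clock::time_point timeout) { fill_buffer(timeout); }  // propagates it
};
```

With the mandatory form, calling `peek()` without an argument no longer compiles, which is the compiler reminder the commit message describes.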
Vladimir Krivopalov	054eb2df66	tests: Rename sstable_assertions.hh -> tests/index_reader_assertions.hh The previous name of the file was confusing, as we have several sstable_assertions classes throughout the tests, but this header only contains a class for index reader assertions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-07-20 13:50:17 -07:00
Botond Dénes	a8e795a16e	sstable_set::incremental_selector: use ring_position instead of token Currently `sstable_set::incremental_selector` works in terms of tokens. Sstables can be selected with tokens and internally the token-space is partitioned (in `partitioned_sstable_set`, used for LCS) with tokens as well. This is problematic for several reasons. The sub-range of the token-space that an sstable covers is defined in terms of decorated keys. It is even possible that multiple sstables cover multiple non-overlapping sub-ranges of a single token. The current system is unable to model this and will at best result in selecting unnecessary sstables. Using a token to provide the next position where the set of intersecting sstables changes [1] causes further problems. Attempting to walk over the token-space by repeatedly calling `select()` with the `next_position` returned from the previous call will quite possibly lead to an infinite loop, as a token cannot express inclusiveness/exclusiveness, and thus the incremental selector will not be able to make progress when the upper and lower bounds of two neighbouring intervals share the same token with different inclusiveness, e.g. [t1, t2](t2, t3]. To solve these problems, update incremental_selector to work in terms of ring position. This makes it possible to partition the token-space among sstables at decorated key granularity. It also makes it possible for select() to return a next_position that is guaranteed to make progress. partitioned_sstable_set now builds the internal interval map using the decorated keys of the sstables, not just the tokens. incremental_selector::select() now uses `dht::ring_position_view` as both the selector and the next_position. ring_position_view can express positions between keys, so it can also include information about the inclusiveness/exclusiveness of the next interval, guaranteeing forward progress. [1] `sstable_set::incremental_selector::selection::next_position`	2018-07-04 17:42:33 +03:00
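The progress problem can be reproduced with a small model. In this sketch, assumed for illustration only, a position is a token plus a weight (-1 before, 0 at, +1 after the keys with that token), loosely mimicking what `dht::ring_position_view` can express; `position`, `interval`, `select_interval()` and `walk()` are hypothetical names. Walking `[t1, t2](t2, t3]` with weighted positions finishes in two steps, while truncating `next_position` to a bare token keeps selecting the first interval until the step cap is hit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical simplified position: weight -1 = before all keys with this
// token, 0 = at the token, +1 = after all keys with this token.
struct position {
    int64_t token;
    int weight;
    bool operator<(const position& o) const {
        return std::tie(token, weight) < std::tie(o.token, o.weight);
    }
};

// An interval covers every position p with start <= p < end.
struct interval {
    position start;
    position end;
};

// Returns the index of the interval containing p, or -1 if none does.
int select_interval(const std::vector<interval>& v, const position& p) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (!(p < v[i].start) && p < v[i].end) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Feeds each selection's next_position back into select_interval(). If
// keep_weight is false, next_position is truncated to a bare token (weight 0),
// losing inclusiveness. Returns the number of steps, capped at max_steps.
int walk(const std::vector<interval>& v, position p, bool keep_weight, int max_steps) {
    int steps = 0;
    for (int i = select_interval(v, p); i >= 0 && steps < max_steps; i = select_interval(v, p)) {
        p = keep_weight ? v[i].end : position{v[i].end.token, 0};
        ++steps;
    }
    return steps;
}
```

A real caller has no step cap, so the token-only variant would simply never terminate.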
Paweł Dziepak	96b0577343	row_cache: deglobalise row cache tracker The row cache tracker has numerous implicit dependencies on other objects (e.g. LSA migrators for data held by mutation_cleaner). The fact that both the cache tracker and some of those dependencies are thread-local objects makes it hard to guarantee correct destruction order. Let's deglobalise the cache tracker and put it in the database class.	2018-06-25 09:37:43 +01:00
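The destruction-order argument rests on a C++ rule: non-static members are destroyed in reverse order of declaration. In this hypothetical sketch, `migrators`, `cache_tracker` and `database` are simplified stand-ins, not the real classes; it shows how owning both objects in one class pins the order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which objects are destroyed.
std::vector<std::string>& destruction_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for something the tracker depends on (e.g. LSA migrators).
struct migrators {
    ~migrators() { destruction_log().push_back("migrators"); }
};

// Hypothetical tracker that may still use its dependency while being destroyed.
struct cache_tracker {
    explicit cache_tracker(migrators& m) : _m(m) {}
    ~cache_tracker() { destruction_log().push_back("tracker"); }
    migrators& _m;
};

// Members are destroyed in reverse declaration order, so the tracker
// (declared last) always goes before the dependency it refers to.
struct database {
    migrators _migrators;
    cache_tracker _tracker{_migrators};
};
```

With two independent thread-local globals, no such ordering guarantee exists across translation units.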
Paweł Dziepak	aa25f0844f	atomic_cell: introduce fragmented buffer value interface In preparation for the switch to the new cell representation, this patch changes the type returned by atomic_cell_view::value() to one that requires explicit linearisation of the cell value. Even though the value is still implicitly linearised (and only when managed by the LSA), the new interface is the same as the target one, so no further changes to its users will be needed.	2018-05-31 15:51:11 +01:00
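A rough sketch of what an explicit-linearisation interface can look like; `fragmented_value` and `with_linearized()` are hypothetical names chosen for illustration, not the actual return type of `atomic_cell_view::value()`. Callers never get a contiguous buffer implicitly; they ask for one and receive a view that is valid only inside the callback.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical fragmented value: the bytes live in several fragments and a
// contiguous copy is produced only when the caller explicitly asks for one.
class fragmented_value {
    std::vector<std::string> _fragments;
public:
    explicit fragmented_value(std::vector<std::string> fragments)
        : _fragments(std::move(fragments)) {}

    std::size_t size_bytes() const {
        std::size_t n = 0;
        for (const auto& f : _fragments) {
            n += f.size();
        }
        return n;
    }

    std::size_t fragment_count() const { return _fragments.size(); }

    // Explicit linearisation: func sees a contiguous view that is valid only
    // for the duration of the call.
    template <typename Func>
    auto with_linearized(Func&& func) const {
        std::string buf;
        buf.reserve(size_bytes());
        for (const auto& f : _fragments) {
            buf += f;
        }
        return std::forward<Func>(func)(std::string_view(buf));
    }
};
```

Because linearisation is a visible call, switching the storage to truly fragmented buffers later does not change call sites.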
Paweł Dziepak	418c159057	treewide: require type to copy atomic_cell	2018-05-31 15:51:11 +01:00
Paweł Dziepak	27014a23d7	treewide: require type info for copying atomic_cell_or_collection	2018-05-31 15:51:11 +01:00
Paweł Dziepak	e9d6fc48ac	treewide: require type for creating atomic_cell	2018-05-31 15:51:11 +01:00
Paweł Dziepak	93130e80fb	atomic_cell: require column_definition for creating atomic_cell views	2018-05-31 15:51:11 +01:00
Raphael S. Carvalho	59c57861ae	tests/sstable_test: switch to dynamic temporary dir creation The sstable test fails when run concurrently (for example, in release and debug mode) because many of its tests use a static temporary dir. Let's fix it by switching to a dynamic temporary dir, created using mkdtemp(). Also, the sstable tests now run in /tmp, which makes them much faster. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180516042044.15336-1-raphaelsc@scylladb.com>	2018-05-16 08:00:29 +03:00
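A minimal POSIX sketch of per-run temporary directories using `mkdtemp()`; the helper names and the `sstable-test` prefix are assumptions, not the test suite's actual code. Each call returns a new, uniquely named directory under /tmp, so concurrent release and debug runs cannot collide.

```cpp
#include <cassert>
#include <cstdlib>     // mkdtemp() (POSIX; declared via <stdlib.h>)
#include <stdexcept>
#include <string>
#include <vector>
#include <sys/stat.h>  // stat(), S_ISDIR
#include <unistd.h>    // rmdir()

// Creates a fresh, uniquely named directory such as /tmp/<prefix>-a1B2c3.
std::string make_temp_dir(const std::string& prefix) {
    std::string templ = "/tmp/" + prefix + "-XXXXXX";
    std::vector<char> buf(templ.begin(), templ.end());
    buf.push_back('\0');  // mkdtemp() needs a mutable, NUL-terminated template
    if (mkdtemp(buf.data()) == nullptr) {
        throw std::runtime_error("mkdtemp failed");
    }
    return std::string(buf.data());
}

bool is_directory(const std::string& path) {
    struct stat st;
    return ::stat(path.c_str(), &st) == 0 && S_ISDIR(st.st_mode);
}
```

`mkdtemp()` replaces the trailing `XXXXXX` in place and creates the directory atomically, so two processes can never be handed the same path.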
Vladimir Krivopalov	1da6144f90	tests: Add test covering checksumming SSTables 3.0 with CRC32. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-13 12:38:25 -07:00
Vladimir Krivopalov	adb43959d1	sstables: Move adler32 routines under the scope of a class. This is a step towards making digest algorithm customizable at compile time. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-13 12:38:25 -07:00
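Moving the checksum routines into classes with an identical static interface lets the algorithm become a template parameter, which is also what makes a CRC32 variant easy to plug in. The sketch below assumes that shape; `adler32_utils`, `crc32_utils` and `checksum_chunks` are illustrative names and the real ScyllaDB code may differ. The test values are the well-known check values for Adler-32 ("Wikipedia") and CRC-32 ("123456789").

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical checksum policy classes with the same static interface.
struct adler32_utils {
    static uint32_t checksum(std::string_view data) {
        constexpr uint32_t mod = 65521;  // largest prime below 2^16
        uint32_t a = 1, b = 0;
        for (unsigned char c : data) {
            a = (a + c) % mod;
            b = (b + a) % mod;
        }
        return (b << 16) | a;
    }
};

struct crc32_utils {
    // Bitwise (table-less) CRC-32 with the reflected polynomial 0xEDB88320.
    static uint32_t checksum(std::string_view data) {
        uint32_t crc = 0xFFFFFFFFu;
        for (unsigned char c : data) {
            crc ^= c;
            for (int bit = 0; bit < 8; ++bit) {
                crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
            }
        }
        return ~crc;
    }
};

// The algorithm is chosen at compile time via the template parameter.
template <typename ChecksumType>
std::vector<uint32_t> checksum_chunks(std::string_view data, std::size_t chunk_size) {
    std::vector<uint32_t> out;
    for (std::size_t off = 0; off < data.size(); off += chunk_size) {
        out.push_back(ChecksumType::checksum(data.substr(off, chunk_size)));
    }
    return out;
}
```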
Piotr Sarna	fe02c3d0e2	database, sstables, tests: add large_partition_handler This commit makes database, sstables and tests aware of which large_partition_handler they use. The proper large_partition_handler is retrievable from the config and is based on the existing compaction_large_partition_warning_threshold_mb entry. Right now, the CQL TABLE variant of large_partition_handler is used in the database. Tests use a NOP version of large_partition_handler, which does not depend on CQL queries at all.	2018-05-04 14:38:13 +02:00
Botond Dénes	ed7bde99bc	simple_schema.hh: remove unused include	2018-04-30 17:17:44 +03:00
Vladimir Krivopalov	b3572acd6e	A few improvements to encoding_stats structure.
- Use the same default epoch as Origin
- Use default value for the encoding_stats parameter in sstable::write_components()
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <846c6d2cbb97d2dd25968cb00b8557c86ff5e35c.1524854727.git.vladimir@scylladb.com>	2018-04-27 22:03:38 +03:00
Vladimir Krivopalov	948c4d79d3	Collect encoding statistics for memtable updates. We keep track of all updates and store the minimal values of timestamps, TTLs and local deletion times across all the inserted data. These values are written as a part of serialization_header for Statistics.db and used for delta-encoding values when writing Data.db file in SSTables 3.0 (mc) format. For #1969. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-04-25 15:39:14 -07:00
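A simplified sketch of the collection step, assuming a flat collector struct (hypothetical; ScyllaDB's `encoding_stats` differs in detail). It keeps running minima of timestamp, TTL and local deletion time, and shows how a value is then expressed as a small delta from the stored minimum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical collector: remembers the smallest values seen across all updates
// so they can be stored once (e.g. in a serialization header).
struct encoding_stats_collector {
    int64_t min_timestamp = std::numeric_limits<int64_t>::max();
    int32_t min_ttl = std::numeric_limits<int32_t>::max();
    int32_t min_local_deletion_time = std::numeric_limits<int32_t>::max();

    void update(int64_t timestamp, int32_t ttl, int32_t local_deletion_time) {
        min_timestamp = std::min(min_timestamp, timestamp);
        min_ttl = std::min(min_ttl, ttl);
        min_local_deletion_time = std::min(min_local_deletion_time, local_deletion_time);
    }

    // Delta relative to the minimum; non-negative for any timestamp that was
    // passed to update().
    uint64_t timestamp_delta(int64_t timestamp) const {
        return static_cast<uint64_t>(timestamp - min_timestamp);
    }
};
```

Because real timestamps are large and clustered, the deltas are small numbers that take far fewer bytes when written with a variable-length encoding.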
Piotr Jastrzebski	d492e92b15	Extract sstable::component_type to separate header It will be used in other places which won't depend on sstable. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-24 11:29:57 +02:00
Avi Kivity	28be4ff5da	Revert "Merge "Implement loading sstables in 3.x format" from Piotr" This reverts commit `513479f624`, reversing changes made to `01c36556bf`. It breaks booting. Fixes #3376.	2018-04-23 06:47:00 +03:00
Avi Kivity	513479f624	Merge "Implement loading sstables in 3.x format" from Piotr
"Pass sstable version to parse, write and describe_type methods to make it possible to handle different versions. For now serialization header from 3.x format is ignored.
Tests: units (release)"
* 'haaawk/sstables3/loading_v3' of ssh://github.com/scylladb/seastar-dev:
  Add test for loading the whole sstable
  Add test for loading statistics
  Add support for 3_x stats metadata
  Pass sstable version to describe_type
  Pass sstable version to write methods
  metadata_type: add Serialization type
  Pass sstable_version_types to parse methods
  Add test for reading filter
  Add test for read_summary
  sstables 3.x: Add test for reading TOC
  sstable: Make component_map version dependent
  sstable::component_type: add operator<<
  Extract sstable::component_type to separate header
  Remove unused sstable::get_shared_components
  sstable_version_types: add mc version	2018-04-22 16:18:39 +03:00
Piotr Jastrzebski	82d483a1d3	Extract sstable::component_type to separate header It will be used in other places which won't depend on sstable. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-22 13:45:29 +02:00
Avi Kivity	70220d8f85	tests: sstable_datafile_test: peel off redundant parentheses around compression_parameters initializer The compression_parameters constructor is called with an extra level of parentheses. Presumably this caused a temporary object to be constructed and then moved into the argument being initialized, but gcc 8 complains about ambiguity. Make it happy by stripping off the redundant parentheses. Message-Id: <20180421121854.12314-1-avi@scylladb.com>	2018-04-21 13:53:29 +01:00
Raphael S. Carvalho	0c72781939	sstables/twcs: add support to millisecond timestamp resolution The lack of it is blocking KairosDB users, because KairosDB uses TWCS with millisecond timestamp resolution. Also, older drivers use millisecond instead of the default microsecond resolution. Fixes #3152. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180411171244.19958-1-raphaelsc@scylladb.com>	2018-04-12 12:46:52 +03:00
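To see why the resolution matters, consider how a time window is derived from a write timestamp. In this hypothetical sketch (the enum and helpers are illustrative, not ScyllaDB's TWCS code), a millisecond timestamp misread as microseconds is 1000 times too small and lands in the wrong window, close to the epoch. Cassandra-compatible TWCS exposes the setting as the `timestamp_resolution` compaction option.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: the resolution a client uses for write timestamps.
enum class timestamp_resolution { milliseconds, microseconds };

int64_t to_microseconds(int64_t ts, timestamp_resolution r) {
    return r == timestamp_resolution::milliseconds ? ts * 1000 : ts;
}

// Lower bound, in microseconds, of the time window that contains ts.
// Assumes non-negative timestamps and a positive window size.
int64_t window_lower_bound(int64_t ts, timestamp_resolution r, int64_t window_us) {
    int64_t us = to_microseconds(ts, r);
    return us - us % window_us;
}
```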
Vladimir Krivopalov	3a9cb54c76	Merge the pair of index_readers into just one tracking a range. Historically, we had two index_readers per sstable_mutation_reader, one for the lower bound and one for the upper bound. Most of the public members of the index_reader class were only ever called on one of the two. With the changes introduced in #2981, the two readers are even more tied together, as they now have a shared-per-pair list of index pages that needs proper cleanup and was protruding woefully into the caller code. This fix restructures index_reader so that it keeps track of both lower and upper bounds. The shared_index_lists structure is encapsulated within index_reader and becomes an internal detail rather than a liability. Fixes #3220. Tests: unit (debug, release) + Tested using cassandra-stress commands from #3189. perf_fast_forward results indicate that this fix causes no performance degradation:
=========================== Baseline ===================================
running: large-partition-skips
Testing scanning large partition with skips.
Reads whole range interleaving reads with skips according to read-skip pattern:
read    skip   time (s)   frags    frag/s   aio   (KiB)   blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu
1       0      0.494458   1000000  2022418  1018  126960  27      0       0       0        0       0     0      0     97.6%
1       1      1.754717   500000   284946   997   127064  6       0       0       3        3       0     0      0     99.9%
1       8      0.551664   111112   201413   997   127064  6       0       0       3        3       0     0      0     99.7%
1       16     0.383888   58824    153232   1001  127080  10      0       0       5        5       0     0      0     99.5%
1       32     0.289073   30304    104832   997   127064  28      0       0       3        3       0     0      0     99.3%
1       64     0.236963   15385    64926    997   127064  122     0       0       3        3       0     0      0     99.2%
1       256    0.172901   3892     22510    997   127064  217     0       0       3        3       0     0      0     95.5%
1       1024   0.117570   976      8301     997   127064  235     0       0       3        3       0     0      0     49.0%
1       4096   0.085811   245      2855     664   27172   375     274     0       3        3       0     0      0     21.4%
64      1      0.512781   984616   1920149  1142  127064  139     0       0       3        3       0     0      0     98.7%
64      8      0.479232   888896   1854833  1001  127080  10      0       0       5        5       0     0      0     99.6%
64      16     0.451193   800000   1773078  997   127064  6       0       0       3        3       0     0      0     99.6%
64      32     0.408684   666688   1631305  997   127064  6       0       0       3        3       0     0      0     99.5%
64      64     0.351906   500032   1420924  997   127064  14      0       0       3        3       0     0      0     99.5%
64      256    0.227008   200000   881026   997   127064  211     0       0       3        3       0     0      0     99.1%
64      1024   0.125803   58880    468032   997   127064  290     0       0       3        3       0     0      0     65.1%
64      4096   0.098155   15424    157139   703   27856   401     267     0       3        3       0     0      0     25.8%
running: large-partition-slicing
Testing slicing of large partition:
offset  read   time (s)   frags    frag/s   aio   (KiB)   blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu
0       1      0.000701   1        1427     9     296     6       4       0       3        3       0     0      0     12.4%
0       32     0.000698   32       45827    9     296     6       3       0       3        3       0     0      0     13.9%
0       256    0.000808   256      316920   10    328     6       3       0       3        3       0     0      0     24.9%
0       4096   0.004368   4096     937697   25    808     14      3       0       3        3       0     0      0     45.9%
500000  1      0.001196   1        836      13    412     9       4       0       3        3       0     0      0     22.7%
500000  32     0.001200   32       26664    13    412     9       4       0       3        3       0     0      0     22.2%
500000  256    0.001503   256      170338   14    444     10      4       0       3        3       0     0      0     25.3%
500000  4096   0.004351   4096     941465   30    956     20      4       0       3        3       0     0      0     50.7%
running: large-partition-slicing-clustering-keys
Testing slicing of large partition using clustering keys:
offset  read   time (s)   frags    frag/s   aio   (KiB)   blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu
0       1      0.000625   1        1601     7     176     6       0       0       3        3       0     0      0     23.2%
0       32     0.000604   32       53016    7     176     6       0       0       3        3       0     0      0     24.7%
0       256    0.000695   256      368498   8     180     6       0       0       3        3       0     0      0     36.4%
0       4096   0.004083   4096     1003106  20    692     12      1       0       3        3       0     0      0     47.0%
500000  1      0.001198   1        835      12    516     9       3       0       3        3       0     0      0     22.8%
500000  32     0.000981   32       32631    12    388     9       3       0       3        3       0     0      0     29.2%
500000  256    0.001320   256      194011   13    384     10      3       0       3        3       0     0      0     29.0%
500000  4096   0.003944   4096     1038567  25    840     17      2       0       3        3       0     0      0     52.2%
running: large-partition-slicing-single-key-reader
Testing slicing of large partition, single-partition reader:
offset  read   time (s)   frags    frag/s   aio   (KiB)   blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu
0       1      0.000849   1        1178     9     488     6       0       0       3        3       0     0      0     16.5%
0       32     0.000661   32       48415    9     296     6       0       0       3        3       0     0      0     22.2%
0       256    0.000756   256      338648   10    328     6       0       0       3        3       0     0      0     33.3%
0       4096   0.004147   4096     987610   22    840     12      1       0       3        3       0     0      0     47.9%
500000  1      0.001041   1        960      13    476     9       3       0       3        3       0     0      0     25.9%
500000  32     0.001020   32       31375    13    412     9       3       0       3        3       0     0      0     29.1%
500000  256    0.001265   256      202373   14    444     10      3       0       3        3       0     0      0     32.0%
500000  4096   0.004121   4096     994014   30    988     18      3       0       3        3       0     0      0     52.7%
running: large-partition-select-few-rows
Testing selecting few rows from a large partition:
stride  rows   time (s)   frags    frag/s   aio   (KiB)   blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu
1000000 1      0.000668   1        1498     9     296     6       4       0       3        3       0     0      0     19.8%
500000  2      0.000976   2        2048     13    412     9       4       0       3        3       0     0      0     29.0%
250000  4      0.001408   4        2842     18    572     12      6       0       3        3       0     0      0     28.8%
125000  8      0.002004   8        3993     29    912     19      10      0       3        3       0     0      0     34.0%
62500   16     0.002883   16       5551     50    1584    32      18      0       3        3       0     0      0     41.9%
2       500000 1.053215   500000   474737   1138  127080  120     0       0       5        5       0     0      0     99.7%
running: large-partition-forwarding
Testing forwarding with clustering restriction in a large partition:
pk-scan time (s)   frags    frag/s   aio   (KiB)   blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu
yes     0.002717   2        736      24    2684    8       16      0       3        3       0     0      0     19.7%
no      0.001004   2        1992     13    412     8       2       0       3        3       0     0      0     30.2%
running: small-partition-skips
Testing scanning small partitions with skips.
Reads whole range interleaving reads with skips according to read-skip pattern:
   read    skip   time (s)   frags    frag/s   aio   (KiB)   blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu
-> 1       0      1.466523   1000000  681885   1369  139732  33      1       0       0        0       0     0      0     99.7%
-> 1       1      12.792183  500000   39086    6235  177736  5155    0       0       5123     7663    0     0      0     96.4%
-> 1       8      3.451431   111112   32193    6235  177736  5155    0       0       5123     9673    0     0      0     84.8%
-> 1       16     2.223815   58824    26452    6234  177704  5154    0       0       5122     9965    0     0      0     75.0%
-> 1       32     1.512511   30304    20036    6233  177680  5155    1       0       5123     10090   0     0      0     61.8%
-> 1       64     1.129465   15385    13621    6227  177464  5154    0       0       5122     10159   0     0      0     49.5%
-> 1       256    0.733282   3892     5308     6211  175464  5178    24      0       5122     10220   0     0      0     33.8%
-> 1       1024   0.397302   976      2457     5946  142152  5369    217     0       5120     10235   0     0      0     32.1%
-> 1       4096   0.187746   245      1305     5499  81992   5296    142     0       5122     10240   0     0      0     46.8%
-> 64      1      2.428488   984616   405444   7332  177736  5155    25      0       5123     5208    0     0      0     79.9%
-> 64      8      2.262876   888896   392817   6235  177736  5155    0       0       5123     5654    0     0      0     78.1%
-> 64      16     2.137544   800000   374261   6234  177732  5154    0       0       5122     6110    0     0      0     77.1%
-> 64      32     1.862466   666688   357960   6235  177736  5155    0       0       5123     6844    0     0      0     73.7%
-> 64      64     1.547757   500032   323069   6234  177728  5155    0       0       5123     7651    0     0      0     68.7%
-> 64      256    0.914612   200000   218672   6233  177704  5154    0       0       5122     9202    0     0      0     55.5%
-> 64      1024   0.475472   58880    123835   6229  177492  5154    5       0       5122     9930    0     0      0     45.4%
-> 64      4096   0.271239   15424    56865    6158  169480  5257    114     0       5115     10142   0     0      0     44.1%
running: small-partition-slicing
Testing slicing small partitions:
offset  read   time (s)   frags    frag/s   aio   (KiB)   blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu
0       1      0.003209   1        312      3     260     2       7       0       1        1       0     0      0     15.5%
0       32     0.004205   32       7610     16    1428    10      0       0       5        5       0     0      0     15.7%
0       256    0.009830   256      26042    97    8572    62      0       0       31       31      0     0      0     18.7%
0       4096   0.015471   4096     264748   100   8704    64      0       0       32       32      0     0      0     48.4%
500000  1      0.003654   1        274      34    492     33      0       0       32       64      0     0      0     28.7%
500000  32     0.004287   32       7464     40    1260    36      0       0       32       64      0     0      0     26.0%
500000  256    0.009598   256      26673    100   8748    64      4       0       32       64      0     0      0     20.6%
500000  4096   0.014151   4096     289449   119   7892    85      0       0       53       64      0     0      0     54.1%
======================== With the patch ================================
running: large-partition-skips
Testing scanning large partition with skips.
Reads whole range interleaving reads with skips according to read-skip pattern:
read    skip   time (s)   frags    frag/s   aio   (KiB)   blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu
1       0      0.468887   1000000  2132711  1018  126960  29      0       0       0        0       0     0      0     98.4%
1       1      1.735113   500000   288166   1001  127080  10      0       0       5        5       0     0      0     99.9%
1       8      0.535616   111112   207447   997   127064  6       0       0       3        3       0     0      0     99.6%
1       16     0.365487   58824    160947   1001  127080  15      0       0       5        5       0     0      0     99.5%
1       32     0.272208   30304    111326   997   127064  21      0       0       3        3       0     0      0     99.3%
1       64     0.224049   15385    68668    997   127064  208     0       0       3        3       0     0      0     99.1%
1       256    0.159247   3892     24440    997   127064  250     0       0       3        3       0     0      0     94.7%
1       1024   0.102107   976      9559     997   127064  292     0       0       3        3       0     0      0     53.6%
1       4096   0.084310   245      2906     664   27172   371     273     0       3        3       0     0      0     20.2%
64      1      0.508340   984616   1936923  1142  127064  129     0       0       3        3       0     0      0     98.1%
64      8      0.470369   888896   1889786  997   127064  6       0       0       3        3       0     0      0     99.6%
64      16     0.439917   800000   1818526  1001  127080  10      0       0       5        5       0     0      0     99.6%
64      32     0.397938   666688   1675358  997   127064  6       0       0       3        3       0     0      0     99.5%
64      64     0.344144   500032   1452972  997   127064  18      0       0       3        3       0     0      0     99.4%
64      256    0.219996   200000   909107   997   127064  251     0       0       3        3       0     0      0     99.1%
64      1024   0.124294   58880    473715   997   127064  284     1       0       3        3       0     0      0     62.2%
64      4096   0.097580   15424    158065   703   27856   400     267     0       3        3       0     0      0     25.3%
running: large-partition-slicing
Testing slicing of large partition:
offset  read   time (s)   frags    frag/s   aio   (KiB)   blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu
0       1      0.000733   1        1365     9     296     6       4       0       3        3       0     0      0     19.3%
0       32     0.000705   32       45417    9     296     6       3       0       3        3       0     0      0     15.3%
0       256    0.000830   256      308364   10    328     6       3       0       3        3       0     0      0     26.7%
0       4096   0.004631   4096     884529   25    808     14      3       0       3        3       0     0      0     48.1%
500000  1      0.001184   1        845      13    412     9       4       0       3        3       0     0      0     23.7%
500000  32     0.001199   32       26690    13    412     9       4       0       3        3       0     0      0     21.9%
500000  256    0.001530   256      167296   14    444     10      4       0       3        3       0     0      0     26.8%
500000  4096   0.004379   4096     935474   30    956     19      4       0       3        3       0     0      0     51.5%
running: large-partition-slicing-clustering-keys
Testing slicing of large partition using clustering keys:
offset  read   time (s)   frags    frag/s   aio   (KiB)   blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu
0       1      0.000620   1        1614     7     176     6       0       0       3        3       0     0      0     27.4%
0       32     0.000625   32       51218    7     176     6       0       0       3        3       0     0      0     27.0%
0       256    0.000701   256      365148   8     180     6       0       0       3        3       0     0      0     35.2%
0       4096   0.004063   4096     1008130  20    692     12      1       0       3        3       0     0      0     47.6%
500000  1      0.001208   1        827      12    516     9       3       0       3        3       0     0      0     24.3%
500000  32     0.000973   32       32876    12    388     9       3       0       3        3       0     0      0     28.7%
500000  256    0.001315   256      194612   13    384     10      3       0       3        3       0     0      0     29.0%
500000  4096   0.003950   4096     1037068  25    840     17      2       0       3        3       0     0      0     52.7%
running: large-partition-slicing-single-key-reader
Testing slicing of large partition, single-partition reader:
offset  read   time (s)   frags    frag/s   aio   (KiB)   blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu
0       1      0.000844   1        1185     9     488     6       0       0       3        3       0     0      0     16.5%
0       32     0.000656   32       48753    9     296     6       0       0       3        3       0     0      0     23.1%
0       256    0.000751   256      341011   10    328     6       0       0       3        3       0     0      0     34.0%
0       4096   0.004173   4096     981632   22    840     12      1       0       3        3       0     0      0     47.0%
500000  1      0.001036   1        966      13    476     9       3       0       3        3       0     0      0     25.4%
500000  32     0.001014   32       31573    13    412     9       3       0       3        3       0     0      0     27.4%
500000  256    0.001280   256      200044   14    444     10      3       0       3        3       0     0      0     31.8%
500000  4096   0.004081   4096     1003746  30    988     18      3       0       3        3       0     0      0     51.6%
running: large-partition-select-few-rows
Testing selecting few rows from a large partition:
stride  rows   time (s)   frags    frag/s   aio   (KiB)   blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu
1000000 1      0.000668   1        1498     9     296     6       3       0       3        3       0     0      0     21.7%
500000  2      0.000958   2        2088     13    412     9       4       0       3        3       0     0      0     27.7%
250000  4      0.001495   4        2676     18    572     12      6       0       3        3       0     0      0     25.8%
125000  8      0.002069   8        3866     29    912     19      10      0       3        3       0     0      0     30.8%
62500   16     0.002856   16       5603     50    1584    32      18      0       3        3       0     0      0     41.7%
2       500000 1.063129   500000   470310   1138  127080  120     0       0       5        5       0     0      0     99.7%
running: large-partition-forwarding
Testing forwarding with clustering restriction in a large partition:
pk-scan time (s)   frags    frag/s   aio   (KiB)   blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu
yes     0.002567   2        779      24    2684    8       16      0       3        3       0     0      0     21.5%
no      0.001013   2        1975     13    412     8       2       0       3        3       0     0      0     28.9%
running: small-partition-skips
Testing scanning small partitions with skips.
Reads whole range interleaving reads with skips according to read-skip pattern:
   read    skip   time (s)   frags    frag/s   aio   (KiB)   blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu
-> 1       0      1.349959   1000000  740763   1369  139732  33      1       0       0        0       0     0      0     99.7%
-> 1       1      12.640751  500000   39555    8144  191168  7064    0       0       7032     11481   0     0      0     96.2%
-> 1       8      3.404269   111112   32639    6651  180660  5571    0       0       5539     10505   0     0      0     84.5%
-> 1       16     2.175424   58824    27040    6434  179116  5354    0       0       5322     10365   0     0      0     74.3%
-> 1       32     1.493365   30304    20292    6335  178404  5257    0       0       5225     10294   0     0      0     61.1%
-> 1       64     1.112168   15385    13833    6256  177672  5183    0       0       5151     10217   0     0      0     48.7%
-> 1       256    0.719282   3892     5411     6211  175464  5178    24      0       5122     10220   0     0      0     33.3%
-> 1       1024   0.393236   976      2482     5946  142152  5369    217     0       5120     10235   0     0      0     30.7%
-> 1       4096   0.185284   245      1322     5499  81992   5296    142     0       5122     10240   0     0      0     44.7%
-> 64      1      2.356711   984616   417792   7361  177944  5184    21      0       5152     5266    0     0      0     79.1%
-> 64      8      2.192331   888896   405457   6253  177868  5173    0       0       5141     5690    0     0      0     77.2%
-> 64      16     2.029835   800000   394121   6245  177812  5165    0       0       5133     6132    0     0      0     75.7%
-> 64      32     1.806448   666688   369060   6245  177808  5165    0       0       5133     6864    0     0      0     72.6%
-> 64      64     1.508492   500032   331478   6242  177788  5163    0       0       5131     7667    0     0      0     67.7%
-> 64      256    0.892881   200000   223994   6233  177704  5154    0       0       5122     9202    0     0      0     54.2%
-> 64      1024   0.465715   58880    126429   6229  177492  5154    0       0       5122     9930    0     0      0     44.0%
-> 64      4096   0.266582   15424    57858    6158  169480  5257    114     0       5115     10142   0     0      0     42.3%
running: small-partition-slicing
Testing slicing small partitions:
offset  read   time (s)   frags    frag/s   aio   (KiB)   blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu
0       1      0.003113   1        321      3     260     2       0       0       1        1       0     0      0     13.4%
0       32     0.004166   32       7682     16    1428    10      0       0       5        5       0     0      0     14.9%
0       256    0.009813   256      26088    97    8572    62      0       0       31       31      0     0      0     18.4%
0       4096   0.014798   4096     276794   100   8704    64      0       0       32       32      0     0      0     46.3%
500000  1      0.003700   1        270      34    492     33      0       0       32       64      0     0      0     28.4%
500000  32     0.004030   32       7940     40    1260    36      0       0       32       64      0     0      0     27.8%
500000  256    0.009514   256      26908    100   8748    64      0       0       32       64      0     0      0     20.2%
500000  4096   0.013368   4096     306413   119   7892    85      0       0       53       64      0     0      0     53.6%
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <a72818f79ca4081a606424545b0053fa581d49e7.1522173144.git.vladimir@scylladb.com>	2018-03-29 15:23:31 +03:00
Avi Kivity	03c22ad524	Merge "Support for Cassandra 2.2 (LA) SSTable formats" from Daniel
"These patches add support for C* 2.2 file(name) format. Namely:
  * It forces Scylla to write files in la format.
  * Adds storage-service feature for them.
  * cf and ks are determined from directory, not from file-name (for 2.2 format).
  * Adds some other fixes to make dtest happy.
  * Unit tests work with la format or with both formats."
* 'danfiala/filename-format-2.2-v4' of https://github.com/hagrid-the-developer/scylla:
  tests/sstables: Tests use la format or iterate over both formats.
  tests/sstables: Helper functions support 2.2 format directory structure.
  sstables: Use 2.2 (la) format as a default format to store sstables if it is enabled by feature-bits.
  storage_service: Support la sstable storage format as a feature.
  sstables: make_descriptor accepts sstable-directory, because it is necessary to determine cf and ks in 2.2 format.
  sstables: Throw more detailed exception for unknown item in reverse_map.
  sstables/compaction: Suppress NaN in a report of a throughput.	2018-03-19 17:49:44 +02:00
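The point that ks and cf must come from the directory in the 2.2 format can be illustrated with a small parser. This sketch assumes the usual layouts, `<ks>-<cf>-ka-<generation>-<component>` for ka and `la-<generation>-big-<component>` inside `<ks>/<cf>-<id>/` for la; `parse_sstable_name` and `sstable_name` are hypothetical and much simpler than Scylla's `make_descriptor`.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

struct sstable_name {
    std::string ks, cf, version, component;
    int generation;
};

// ka: "<ks>-<cf>-ka-<generation>-<component>"; ks and cf come from the file name.
// la: "la-<generation>-big-<component>" inside ".../<ks>/<cf>-<id>/"; ks and cf
// can only come from the directory, which is why it is a parameter.
std::optional<sstable_name> parse_sstable_name(const std::string& dir, const std::string& fname) {
    static const std::regex ka_re(R"(^([^-]+)-([^-]+)-(ka)-(\d+)-([\w.]+)$)");
    static const std::regex la_re(R"(^(la)-(\d+)-big-([\w.]+)$)");
    static const std::regex dir_re(R"(^.*/([^/]+)/([^/-]+)-[0-9a-f]+/?$)");
    std::smatch m;
    if (std::regex_match(fname, m, ka_re)) {
        return sstable_name{m[1].str(), m[2].str(), m[3].str(), m[5].str(), std::stoi(m[4].str())};
    }
    if (std::regex_match(fname, m, la_re)) {
        std::smatch d;
        if (!std::regex_match(dir, d, dir_re)) {
            return std::nullopt;  // la file outside a <ks>/<cf>-<id> directory
        }
        return sstable_name{d[1].str(), d[2].str(), m[1].str(), m[3].str(), std::stoi(m[2].str())};
    }
    return std::nullopt;
}
```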
Daniel Fiala	4d703f9c6a	tests/sstables: Tests use la format or iterate over both formats. Signed-off-by: Daniel Fiala <daniel@scylladb.com>	2018-03-19 14:12:10 +01:00
Glauber Costa	d15bfbe548	tests: change tests to make summary non-copyable Right now the summary can be copied, but in real life there is no reason for this to be a requirement. Tests want it, so we can destroy a summary, load another, and compare the two. We can achieve this by allowing the first summary to be moved, and then we can still have a reference to the second. I am about to make a change that will make the summary not copyable as a requirement, so we need to do this first. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-03-14 10:46:20 -04:00
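A minimal sketch of the pattern the commit describes: a summary type whose copy operations are deleted but which can still be moved, so a test can move the first summary out, load a second one, and compare the two. `summary` and `load_summary` here are hypothetical stand-ins, not the real sstables types:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for an sstable summary; only its copy/move
// semantics matter for this sketch.
struct summary {
    std::vector<std::uint64_t> positions;

    summary() = default;
    summary(const summary&) = delete;             // not copyable
    summary& operator=(const summary&) = delete;
    summary(summary&&) noexcept = default;        // but movable
    summary& operator=(summary&&) noexcept = default;
};

static_assert(!std::is_copy_constructible_v<summary>);
static_assert(std::is_move_constructible_v<summary>);

// Hypothetical loader standing in for reading a summary from disk.
inline summary load_summary(std::vector<std::uint64_t> positions) {
    summary s;
    s.positions = std::move(positions);
    return s;  // moved (or elided), never copied
}
```

The test then moves the first summary into a variable it keeps, loads the second, and compares, which never needs a copy.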
Raphael S. Carvalho	87035bd8d1	sstables: fix min and max timestamp when negative timestamp is specified unsigned type was incorrectly used for keeping track of min and max timestamp, so a negative number would be treated as a very high number that would incorrectly end up as max timestamp in sstable metadata. Fixes #3000. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180308162217.18963-1-raphaelsc@scylladb.com>	2018-03-08 18:31:30 +02:00
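The failure mode is easy to reproduce in isolation. The sketch below uses a hypothetical `min_max_tracker` (not Scylla's actual metadata collector) parameterised on the storage type. With `uint64_t`, a timestamp of -5 wraps around to 2^64 - 5 and wins the max comparison; with `int64_t` the ordering is the expected one:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical tracker: T is the type used to hold the running min/max.
template <typename T>
struct min_max_tracker {
    T min = std::numeric_limits<T>::max();
    T max = std::numeric_limits<T>::min();

    void update(std::int64_t timestamp) {
        // For T = uint64_t this conversion wraps negative timestamps
        // around to very large positive values (modulo 2^64).
        T v = static_cast<T>(timestamp);
        min = std::min(min, v);
        max = std::max(max, v);
    }
};

using buggy_tracker = min_max_tracker<std::uint64_t>;  // unsigned: the bug
using fixed_tracker = min_max_tracker<std::int64_t>;   // signed: the fix
```

Feeding -5 and 100 to `buggy_tracker` records 100 as the minimum and the wrapped value of -5 as the maximum, which is the incorrect max timestamp the commit describes.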
Avi Kivity	d973445a94	Merge "sstable/schema extensions" from Calle " Adds extension points to schema/sstables to enable hooking in stuff, like, say, something that modifies how sstable disk io works. (Cough, cough, encryption) Extensions are processed as property keywords in CQL. To add an extension, a "module" must register it into the extensions object at boot time. To avoid globals (and yet don't), extensions are reachable from config (and thus from db). Table/view tables already contain an extension element, so we utilize this to persist config. schema_tables tables/views from mutations now require a "context" object (currently only extensions, but abstracted for easier further changes). Because of how schemas currently operate, there is a super lame workaround to allow "schema_registry" access to config and by extension extensions. DB, upon instantiation, calls a thread local global "init" in schema_registry and registers the config. It, in turn, can then call table_from_mutations as required. Includes the (modified) patch to encapsulate compression into objects, mainly because it is nice to encapsulate, and isolate a little. 
" * 'calle/extensions-v5' of github.com:scylladb/seastar-dev: extensions: Small unit test sstables: Process extensions on file open sstables::types: Add optional extensions attribute to scylla metadata sstables::disk_types: Add hash and comparator(sstring) to disk_string schema_tables: Load/save extensions table cql: Add schema extensions processing to properties schema_tables: Require context object in schema load path schema_tables: Add opaque context object config_file_impl: Remove ostream operators main/init: Formalize configurables + add extensions to init call db::config: Add extensions as a config sub-object db::extensions: Configuration object to store various extensions cql3::statements::property_definitions: Use std::variant instead of any sstables: Add extension type for wrapping file io schema: Add opaque type to represent extensions sstables::compress/compress: Make compression a virtual object	2018-02-26 17:15:29 +02:00
Avi Kivity	e77ecda1da	tests: avoid signed/unsigned compares Container indices are size_t, and in other places we gratuitously declare a limit as unsigned and the loop index as signed. Tests: unit (release) Message-Id: <20180212121642.10525-1-avi@scylladb.com>	2018-02-12 12:25:21 +00:00
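A small illustration of the two points in this message, written as hypothetical helpers rather than the tests' actual loops: index with `std::size_t` when bounding by a container's `size()`, and be aware of what a mixed signed/unsigned comparison really computes:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the loop index is std::size_t, the same type that
// std::vector::size() returns, so the loop condition compares like with like.
inline std::size_t count_below(const std::vector<int>& values, int limit) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] < limit) {  // int vs int: no mixed-sign comparison
            ++count;
        }
    }
    return count;
}

// What a mixed comparison `a < b` (int a, unsigned b) actually evaluates:
// the usual arithmetic conversions turn `a` into unsigned first, so -1
// becomes UINT_MAX and the result is false. The cast makes that explicit.
inline bool minus_one_less_than_one_unsigned() {
    int a = -1;
    unsigned b = 1;
    return static_cast<unsigned>(a) < b;
}
```

The second function is why compilers warn about such comparisons (`-Wsign-compare`): the code reads as "-1 < 1" but evaluates to false.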
Avi Kivity	3f5a8229ac	tests: fix for sstable::get_index_reader() removal `71495691aa` removed sstable::get_index_reader(), but forgot to update its callers in tests/. Update the callers to construct a temporary shared_index_list and create the index_reader directly. This is none too clean, but shared_index_lists needs to be retired, and then the changes in this patch can go away too. Tests: unit (release) Message-Id: <20180211164739.17862-1-avi@scylladb.com>	2018-02-11 17:53:08 +00:00
Glauber Costa	98549775fa	sstable_tests: make sure min_threshold is set explicitly The SSTable tests are a bit fragile now because they rely on min_threshold having a particular value. That is the default value, but if I change that default - which I am planning to do - the test breaks. Right now the test is not broken, but if we are planning on relying on a property having a particular value in tests, we should explicitly set it. So I am proactively changing min_threshold in the tests to have the value of 4 explicitly, so we can change that in the future without breaking anything. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180207155513.12498-1-glauber@scylladb.com>	2018-02-07 18:45:52 +01:00
Calle Wilund	74758c87cd	sstables::compress/compress: Make compression a virtual object Make a "compressor" an actual class, that can be implemented and registered via class registry. For "common" compressors, the objects will be shared, but complex implementors can be semi-stateful. sstable compression is split into two parts: The "static" config which is shared across shards, and a "local" one, which holds a compressor pointer. The latter is encapsulated, along with actual compressed data writers, in sstables/compress.cc. For compression (write), compression writer is instantiated with the settings active in table metadata. For decompression (read), compression reader is instantiated with the settings stored in sstable metadata, which can differ from the currently active table metadata. v2: * Structured patch sets differently (dependencies) * Added more comments/api descs * Added patch to move all sstable compression into compress.cc, effectively separating top-level virtual compressor object from sstable io knowledge v3: * Rebased v4: * Moved all sstable compression logic/knowledge into compress.cc (local compression). Merged the two patches (separation just confuses reader).	2018-02-07 10:11:45 +00:00
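To make the idea of a compressor as a virtual object concrete, here is a minimal sketch under stated assumptions: the `compressor` interface, `identity_compressor`, and `compressor_registry` are invented for illustration and are not Scylla's real classes or its class registry API. It shows a name-to-factory lookup where a stateless compressor is shared by returning the same instance every time:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified interface (not Scylla's actual compressor class).
class compressor {
public:
    virtual ~compressor() = default;
    virtual std::string name() const = 0;
    virtual std::string compress(const std::string& input) const = 0;
    virtual std::string decompress(const std::string& input) const = 0;
};

// A stateless pass-through implementation, used only to exercise the interface.
class identity_compressor final : public compressor {
public:
    std::string name() const override { return "identity"; }
    std::string compress(const std::string& input) const override { return input; }
    std::string decompress(const std::string& input) const override { return input; }
};

// Maps a compressor name (as it might appear in metadata) to a factory.
// A factory may hand out one shared instance for stateless compressors,
// or construct a fresh object for semi-stateful ones.
class compressor_registry {
public:
    using factory = std::function<std::shared_ptr<compressor>()>;

    void add(const std::string& name, factory f) {
        _factories[name] = std::move(f);
    }

    std::shared_ptr<compressor> create(const std::string& name) const {
        auto it = _factories.find(name);
        if (it == _factories.end()) {
            throw std::invalid_argument("unknown compressor: " + name);
        }
        return it->second();
    }

private:
    std::map<std::string, factory> _factories;
};
```

Following the split described above, a writer would resolve the name from the table's current metadata and a reader from the name stored in the sstable's own metadata, so an sstable written under an older setting still decompresses with the compressor it was written with.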
Piotr Jastrzebski	6f468802f4	Delete unused consume_all(streamed_mutation&) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:49 +01:00
Piotr Jastrzebski	7729bc5e7b	Remove unused mutation_reader_assertions Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Raphael S. Carvalho	2c181b69c9	sstables: fix wildly inaccurate sstable key estimation after dynamic index sampling The reason sstable key estimation is inaccurate is that it doesn't take into account that index sampling is now dynamic. The estimation is done as follows:
    uint64_t get_estimated_key_count() const {
        return ((uint64_t)_components->summary.header.size_at_full_sampling + 1) * _components->summary.header.min_index_interval;
    }
The biggest problem is that _components->summary.header.min_index_interval isn't actually the minimum interval, but instead the default interval value set in the schema. So the estimation gets worse the larger the average partition, because the larger the average partition the lower the index sampling interval. One of the problems is that the estimation has a big influence on bloom filter size, so for large partitions we were generating bigger filters than we had to. From now on, size at full sampling is calculated as if sampling were static (which was the case until commit `8726ee937d`, which introduced size-based sampling), using the minimum index interval as a strict sampling interval. Tests: units (release) Fixes #3113. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180122233612.11147-1-raphaelsc@scylladb.com>	2018-01-23 10:42:24 +02:00
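The formula and the effect of the fix can be sketched with plain integers. `estimated_key_count` reproduces the quoted expression. `static_size_at_full_sampling` is a hypothetical helper for the new rule. The inputs used below (1000 partitions, an interval of 128, and a summary that before the fix held about one entry per partition because of dynamic sampling) are illustrative assumptions, not figures from the commit:

```cpp
#include <cassert>
#include <cstdint>

// The estimation formula quoted in the commit message, as a free function.
inline std::uint64_t estimated_key_count(std::uint64_t size_at_full_sampling,
                                         std::uint64_t min_index_interval) {
    return (size_at_full_sampling + 1) * min_index_interval;
}

// Hypothetical helper for the post-fix rule: size_at_full_sampling computed
// as if one summary entry were taken every min_index_interval partitions
// (static sampling). Integer division is an assumption of this sketch.
inline std::uint64_t static_size_at_full_sampling(std::uint64_t partitions,
                                                  std::uint64_t min_index_interval) {
    return partitions / min_index_interval;
}
```

With those inputs the old path estimates (1000 + 1) * 128 = 128128 keys, roughly 128 times too many, which in turn oversizes the bloom filter. The static computation gives 1000 / 128 = 7 and an estimate of (7 + 1) * 128 = 1024.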
José Guilherme Vanz	380bc0aa0d	Swap arguments order of mutation constructor Swap arguments in the mutation constructor to keep the argument order consistent with the other constructor variants. Refs #3084 Signed-off-by: José Guilherme Vanz <guilherme.sft@gmail.com> Message-Id: <20180120000154.3823-1-guilherme.sft@gmail.com>	2018-01-21 12:58:42 +02:00
Tomasz Grabiec	16e06b5b46	Merge "remove ability to create a non-flat mutation reader" from Piotr * seastar-dev.git haaawk/flat_reader_clean_up_mutation_source_v3: test_range_queries: create flat reader from source run_sstable_resharding_test: create flat reader from source make_sstable_containing: create flat reader from source test_cache_delegates_to_underlying_only_once_multiple_mutation: use flat reader Migrate materialized views to flat_mutation_reader test_can_write_and_read_non_compound_range_tombstone_as_compound: use flat reader test_writing_combined_stream_with_tombstones_at_the_same_position: use flat reader Add flat_mutation_reader::peek() Add flat_mutation_reader_assertions::produces_range_tombstone Accept clustering_row_ranges in flat_mutation_reader_assertions::produces Add flat_mutation_reader_assertions::produces_eos_or_empty_mutation Add flat_mutation_reader_assertions::fast_forward_to overload test_query_only_static_row: use flat reader Move mutation_rebuilder to header test_streamed_mutation_forwarding_is_consistent_with_slicing: use flat reader test_clustering_slices: use flat reader test_streamed_mutation_forwarding_guarantees: use flat reader test_streamed_mutation_forwarding_across_range_tombstones: use flat reader test_streamed_mutation_slicing_returns_only_relevant_tombstones: use flat reader Add flat_mutation_reader_assertions::is_buffer_full test_fast_forwarding_across_partitions_to_empty_range: use flat reader Remove unused mutation_source::operator() mutation_source: rename make_flat_mutation_reader to make_reader Clean up imports in tests	2018-01-19 12:43:50 +01:00
Piotr Jastrzebski	d266eaa01e	mutation_source: rename make_flat_mutation_reader to make_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-19 09:30:12 +01:00
Raphael S. Carvalho	f779877f43	tests/sstable_test: fix tests by not triggering compiler bug with c++17
    $ gcc --version
    gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
The following code
    struct S { S(int i = 42); };
    void f() { S( {} ); }
produces this assembly with g++ --std=c++14
    lea rax, [rbp-1]
    mov esi, 0
    mov rdi, rax
    call S::S(int)
and this one with g++ --std=c++17
    lea rax, [rbp-1]
    mov esi, 42
    mov rdi, rax
    call S::S(int)
For more details about the compiler bug, see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83937 NOTE: clang isn't affected by it. The test relied on braced initialization of compressor (an enum class) working properly when used as an argument to compression_parameters's ctor. Braced initialization of an integer-based type should yield zero, but the default argument (lz4) was used instead, which means compression was enabled when it shouldn't have been. The course of action is to work around the bug by explicitly setting the compressor type to none. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180119013655.32564-1-raphaelsc@scylladb.com>	2018-01-19 09:27:39 +02:00
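The same failure can be expressed with types shaped like the ones the message mentions. `compressor_kind` and `compression_parameters` below are hypothetical stand-ins (in particular, `none == 0` is an assumption), meant only to show what a conforming compiler must do with `{}` and what the explicit-value workaround looks like:

```cpp
#include <cassert>

// Hypothetical stand-ins for the types named in the commit message; the
// enumerator values (none == 0) are an assumption of this sketch.
enum class compressor_kind { none = 0, lz4 = 1 };

struct compression_parameters {
    compressor_kind kind;
    // The default argument plays the role of "default argument (lz4)".
    compression_parameters(compressor_kind k = compressor_kind::lz4) : kind(k) {}
};

// What the test relied on: {} value-initializes the enum parameter to 0
// (none). GCC 7.2 in -std=c++17 mode (bug 83937) used lz4 instead.
inline compression_parameters braced_params() {
    return compression_parameters({});
}

// The workaround adopted by the commit: name the compressor explicitly.
inline compression_parameters explicit_params() {
    return compression_parameters(compressor_kind::none);
}
```

On a conforming compiler both functions yield `compressor_kind::none`. Per the message, the affected GCC 7.2 build with `-std=c++17` picked the default argument for the braced form, which is why spelling the value out is the more robust choice.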
Raphael S. Carvalho	e641c0d333	tests: test for infinite recursion bug when doing high-level compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-01-03 16:23:02 -02:00
Piotr Jastrzebski	9e3da50ed1	Don't pass fwd to flat_mutation_reader_from_mutations if it's no Default value for fwd is no so there's no need to pass it explicitly. This is important because we will add additional parameter to flat_mutation_reader_from_mutations in next patch. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-21 17:00:43 +01:00


226 Commits