scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 22:25:48 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	538e041f22	Merge "Remove some dependencies on db::config" from Avi db::config is a global class; changes in any module can cause changes in db::config. Therefore, it is a cause of needless recompilation. Remove some of these dependencies by having consumers of db::config declare an intermediate config struct that is contains only configuration of interest to them, and have their caller fill it out (in the case of auth, it already followed this scheme and the patchset only moves the translation function). In addition, some outright pointless inclusions of db/config.hh are removed. The result is somewhat shorter compile times, and fewer needless recompiles. * https://github.com/avikivity/scylla unconfig-1/v1: config: remove inclusions of db/config.hh from header files repair: remove unneeded config.hh inclusion batchlog_manager: remove dependency on db::config auth: remove permissions_cache dependency on db::config auth: remove auth::service dependency on db::config auth: remove unneeded db/config.hh includes	2018-12-10 14:53:14 +01:00
Avi Kivity	7c7da0b462	sstables: fix overflow in clustering key blocks header bit access _ck_blocks_header is a 64-bit variable, so the mask should be 64 bits too. Otherwise, a shift in the range 32-63 will produce wrong results. Fix by using a 64-bit mask. Found by Fedora 29's ubsan. Fixes #3973. Message-Id: <20181209120549.21371-1-avi@scylladb.com>	2018-12-10 11:09:25 +00:00
Avi Kivity	40677fae37	Merge "Compaction strategy aware major compaction" from Raphael " Make major compaction aware of compaction strategy, by using an optimal approach which suits the strategy needs. Refs #1431. " * 'compaction_strategy_aware_major_compaction_v2' of github.com:raphaelsc/scylla: tests: add test for compaction-strategy-aware major compaction compaction: implement major compaction heuristic for leveled strategy compaction: introduce notion of compaction-strategy-aware major compaction	2018-12-10 10:10:22 +02:00
Avi Kivity	864f55e745	config: remove inclusions of db/config.hh from header files Instead, distribute those inclusions to .cc files that require them. This reduces rebuilds when config.hh changes, and makes it easier to locate files that need config disaggregation.	2018-12-09 20:11:38 +02:00
Vladimir Krivopalov	cc12449646	sstables: Collect statistics when writing RT markers to SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Vladimir Krivopalov	2e5c221865	sstables: Update TTL/local deletion stats for non-expiring and live liveness_info. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 16:40:27 -08:00
Raphael S. Carvalho	525ee18560	compaction: implement major compaction heuristic for leveled strategy Major compaction for leveled strategy will now create a run of non-overlapping sstables at the highest level. Until now, a single sstable would be created at level 0 which was very suboptimal because all data would need to climb up the levels again, making it a very expensive I/O process. Refs #1431. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-12-06 18:22:31 -02:00
Raphael S. Carvalho	3d9566e40d	compaction: introduce notion of compaction-strategy-aware major compaction That's only the very first step which introduces the machinery for making major compaction aware of all strategies. By the time being, default implementation is used for them all which only suits size tiered. Refs #1431. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-12-06 18:22:30 -02:00
Vladimir Krivopalov	d2dfa2e15d	sstables: Don't enforce default max_local_deletion_time value for 'mc' files. Commit `cc6c383249` has fixed an issue with incorrectly tracking max_local_deletion_time and the check in validate_max_local_deletion_time was called to work around old files. This fix relaxes conditions for enforcing defaut max_local_deletion_time so that they don't apply to SSTables in 'mc' format because the original problem has been resolved before 'mc' format have been introduced. This is needed to be able to read correct values from Cassandra-generated SSTables that don't have a Scylla.db component. Its presence or absence is used as an indicator of possibly affected files. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 10:15:07 -08:00
Vladimir Krivopalov	0b1e6427ad	sstables: Enforce estimated_partitions in generate_summary() to be always positive. For tiny index files (< 8 bytes long) it could turn to zero and trigger an assertion in prepare_summary(). Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-06 10:15:07 -08:00
Vladimir Krivopalov	cdae62ec29	sstables: Properly track row-level tombstones when writing SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 12:36:22 -08:00
Avi Kivity	b0cb69ec25	Merge "Make sstable reader fail on unknown colum names in MC format" from Piotr " Before the reader was just ignoring such columns but this creates a risk of data loss. Refs #2598 " * 'haaawk/2598/v3' of github.com:scylladb/seastar-dev: sstables: Add test_sstable_reader_on_unknown_column sstables: Exception on sstable's column not present in schema sstables: store column name in column_translation::column_info sstables: Make test_dropped_column_handling test dropped columns	2018-12-05 10:43:29 +02:00
Tomasz Grabiec	b8c405c019	Merge "Correct the usage of row ttl and add write-read test" from Piotr Fixes the condition which determines whether a row ttl should be used for a cell and adds a test that uses each generated mutation to populate mutation source and then verifies that it can read back the same mutation. * seastar-dev.git haaawk/sst3/write-read-test/v3: Fix use_row_ttl condition Add test_all_data_is_read_back	2018-12-04 19:47:28 +01:00
Piotr Jastrzebski	75b99838fc	Fix use_row_ttl condition Previous condition was wrong and was using row ttl too often. We also have to change test_dead_row_marker to compare resulting sstable with sstable generated by Origin not by sstableupgrade. This is because sstableupgrade transmits information about deleted row marker automatically to cells in that row. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-12-04 10:51:36 +01:00
Avi Kivity	c3e664eec2	Merge "Improve corrupt sstable reporting" from Rafael " This is a small step in fixing issue #2347. It is mostly tests and testing infrastructure, but it does include a fix for a case where we were missing the filename in the malformed_sstable_exception. " * 'espindola/sstable-corruption-v2' of https://github.com/espindola/scylla: Add a filename to a malformed_sstable_exception. Try to read the full sst in broken_sst. Convert tests to SEASTAR_THREAD_TEST_CASE. Check the exception message. Move some tests to broken_sstable_test.cc	2018-12-04 10:32:10 +02:00
Rafael Ávila de Espíndola	21199a7a5c	Add a filename to a malformed_sstable_exception. It is reasonable for parse() to throw when it finds something wrong with the format. This seems to be the best spot to add the filename and rethrow. Also add a testcase to make sure we keep handling this error gracefully. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-03 13:50:23 -08:00
Rafael Ávila de Espíndola	cf4dc38259	Simplify state machine loop. These loops have the structure : while (true) { switch (state) { case state1: ... break; case state2: if (...) { ... break; } else {... continue; } ... } break; } There a couple things I find a bit odd on that structure: * The break refers to the switch, the continue to the loop. * A while (true) loop always hits a break or a continue. This patch uses early returns to simplify the logic to while (true) { switch (state) { case state1: ... return case state2: if (...) { ... return; } ... } } Now there are no breaks or continues. Tests: unit (release) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181126171726.84629-1-espindola@scylladb.com>	2018-12-03 20:34:03 +01:00
Avi Kivity	b098b5b987	Merge "Optimize checksum_combine() for CRC32" from Tomek " zlib's crc32_combine() is not very efficient. It is faster to re-combine the buffer using crc32(). It's still substantial amount of work which could be avoided. This patch introduces a fast implementation of crc32_combine() which uses a different algorithm than zlib. It also utilizes intrinsics for carry-less multiplication instruction to perform the computation faster. The details of the algorithm can be found in code comments. Performance results using perf_checksum and second buffer of length 64 KiB: zlib CRC32 combine: 38'851 ns libdeflate CRC32: 4'797 ns fast_crc32_combine(): 11 ns So the new implementation is 3500x faster than zlib's, and 417x faster than re-checksumming the buffer using libdeflate. Tested on i7-5960X CPU @ 3.00GHz Performance was also evaluated using sstable writer benchmark: perf_fast_forward --populate --sstable-format=mc --data-directory /tmp/perf-mc \ --value-size=10000 --rows 1000000 --datasets small-part It yielded 9% improvement in median frag/s (129'055 vs 117'977). Refs #3874 " * tag 'fast-crc32-combine-v2' of github.com:tgrabiec/scylla: tests: perf_checksum: Test fast_crc32_combine() tests: Rename libdeflate_test to checksum_utils_test tests: libdeflate: Add more tests for checksum_combine() tests: libdeflate: Check both libdeflate and default checksummers sstables: Use fast_crc_combine() in the default checksummer utils/gz: Add fast implementation of crc32_combine() utils/gz: Add pre-computed polynomials utils/gz: Import Barett reduction implementation from libdeflate utils: Extract clmul() from crc.hh	2018-12-03 19:02:01 +02:00
Tomasz Grabiec	aa19f98d18	sstables: Write Statistics.db offset map entries in the same order as Cassandra Before this patch we were writing offset map enteies in unspecified order, the one returned by std::unorderd_map. Cassandra writes them sorted by metadata_type. Use the same order for improved compatibility. Fixes #3955. Message-Id: <1543846649-22861-1-git-send-email-tgrabiec@scylladb.com>	2018-12-03 16:40:24 +02:00
Tomasz Grabiec	1eb03b6ff1	sstables: Use fast_crc_combine() in the default checksummer	2018-12-03 14:40:35 +01:00
Benny Halevy	9e7125a9de	distributed_loader::populate_column_family: lookup and remove temp sstable directories These may be left over in case we crash while writing sstables. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	0b74927757	sstable: use std::experimental::filesystem rather than boost Note: Requires linking with -lstdc++fs Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	61d116a1f1	sstable::seal_sstable: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	90118fa9ef	sstable: create sstable component files in a subdirectory When writing the sstable, create a temporary directory for creating all components so that each sstable files' will be assigned a different allocaton groups on xfs. Files are immediately renamed to their default location after creation. Temp directory is removed when the sstable is sealed. Additional work to be introduced in the following patches: - When populating tables, temp directories need to be looked up and removed. Fixes #3167 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	23d8afb20d	sstable::new_sstable_component_file: pass component_type rather than filename So we can create the file in the sstable directory and then move into the final location Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	7b170eb0dc	sstable: cleanup filename related functions - use const sstring& params rather than sstring - returning const sstring is superfleous Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	ad5f1e4fbb	sstable: make write_crc, write_digest, and new_sstable_component_file private methods Prepare for per-sstable sub directory. Also, these functions get most of their parameters from the sst at hand so they might as well be first class members. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	d257e5c123	sstable: remove unused get_sstable_key_range Since `024c8ef8a1` db: adjust sstable load to use sstable self-reporting of shard ownership Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181202114523.14296-1-bhalevy@scylladb.com>	2018-12-02 18:32:34 +02:00
Vladimir Krivopalov	b0e5404071	sstables: When loading, read statistics before summary. In case if summary is missing and we attempt to re-generate it, statistics must be already read to provide us with values stored in serialization header to facilitate clustering prefixes deserialization. Fixes #3947 Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-11-30 11:56:56 -08:00
Piotr Jastrzebski	5cc3f904ce	sstables: Exception on sstable's column not present in schema Previously such column was ignored but it's better to be explicit about this situation. Refs #2598 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-30 08:59:13 +01:00
Piotr Jastrzebski	c0ce94c6f9	sstables: store column name in column_translation::column_info This will be used for better diagnostics. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-30 08:59:00 +01:00
Rafael Ávila de Espíndola	777ea893e6	Delete data_consume_rows_at_once. As far as I can tell the old sstable reading code required reading the data into a contiguous buffer. The function data_consume_rows_at_once implemented the old behavior and incrementally code was moved away from it. Right now the only use is in two tests. The sstables used in those tests are already used in other tests with data_consume_rows. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181127024319.18732-2-espindola@scylladb.com>	2018-11-27 14:11:50 +02:00
Piotr Jastrzebski	4366302c4c	sstables: Extract mp_row_cosumer_m::check_schema_mismatch This method will contain common logic used in multiple places and reduce code duplication. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <bbda2f4ea4f9325055f096dc549f63b1bb03d3b6.1543311990.git.piotr@scylladb.com>	2018-11-27 12:45:12 +01:00
Avi Kivity	5e759b0c07	Merge "Optimize checksum computation for the MC sstable format" from Tomek " One part of the improvement comes from replacing zlib's CRC32 with the one from libdeflate, which is optimized for modern architecture and utilizes the PCLMUL instruction. perf_checksum test was introduced to measure performance of various checksumming operations. Results for 514 B (relevant for writing with compression enabled): test iterations median mad min max crc_test.perf_deflate_crc32_combine 58414 16.711us 3.483ns 16.708us 16.725us crc_test.perf_adler_combine 165788278 6.059ns 0.031ns 6.027ns 7.519ns crc_test.perf_zlib_crc32_combine 59546 16.767us 26.191ns 16.741us 16.801us --- crc_test.perf_deflate_crc32_checksum 12705072 83.267ns 4.580ns 78.687ns 98.964ns crc_test.perf_adler_checksum 3918014 206.701ns 23.469ns 183.231ns 258.859ns crc_test.perf_zlib_crc32_checksum 2329682 428.787ns 0.085ns 428.702ns 510.085ns Results for 64 KB (relevant for writing with compression disabled): test iterations median mad min max crc_test.perf_deflate_crc32_combine 25364 38.393us 17.683ns 38.375us 38.545us crc_test.perf_adler_combine 169797143 5.842ns 0.009ns 5.833ns 6.901ns crc_test.perf_zlib_crc32_combine 26067 38.663us 95.094ns 38.546us 40.523us --- crc_test.perf_deflate_crc32_checksum 202821 4.937us 14.426ns 4.912us 5.093us crc_test.perf_adler_checksum 44684 22.733us 206.263ns 22.492us 25.258us crc_test.perf_zlib_crc32_checksum 18839 53.049us 36.117ns 53.013us 53.274us The new CRC32 implementation (deflate_crc32) doesn't provide a fast checksum_combine() yet, it delegates to zlib so it's as slow as the latter. Because for CRC32 checksum_combine() is several orders of magnitude slower than checksum(), we avoid calling checksum_combine() completely for this checksummer. We still do it for adler32, which has combine() which is faster than checksum(). SStable write performance was evaluated by running: perf_fast_forward --populate --data-directory /tmp/perf-mc \ --rows=10000000 -c1 -m4G --datasets small-part Below is a summary of the average frag/s for a memtable flush. Each result is an average of about 20 flushes with stddev of about 4k. Before: [1] MC,lz4: 330'903 [2] LA,lz4: 450'157 [3] MC,checksum: 419'716 [4] LA,checksum: 459'559 After: [1'] MC,lz4: 446'917 ([1] + 35%) [2'] LA,lz4: 456'046 ([2] + 1.3%) [3'] MC,checksum: 462'894 ([3] + 10%) [4'] LA,checksum: 467'508 ([4] + 1.7%) After this series, the performance of the MC format writer is similar to that of the LA format before the series. There seems to be a small but consistent improvement for LA too. I'm not sure why. " * tag 'improve-mc-sstable-checksum-libdeflate-v3' of github.com:tgrabiec/scylla: tests: perf: Introduce perf_checksum tests: Add test for libdeflate CRC32 implementation sstables: compress: Use libdeflate for crc32 sstables: compress: Rename crc32_utils to zlib_crc32_checksummer licenses: Add libdeflate license Integrate libdeflate with the build system Add libdeflate submodule sstables: Avoid checksum_combine() for the crc32 checksummer sstables: compress: Avoid unnecessary checksum_combine() sstables: checksum_utils: Add missing include	2018-11-26 20:10:46 +02:00
Tomasz Grabiec	bf0164cdaf	sstables: compress: Use libdeflate for crc32 Improves memtable flush performance by 10% in a CPU-bound case. Unlike the zlib implementation, libdeflate is optimized for modern CPUs. It utilizes the PCLMUL instruction.	2018-11-26 18:59:42 +01:00
Tomasz Grabiec	0ac1905f4f	sstables: compress: Rename crc32_utils to zlib_crc32_checksummer	2018-11-26 18:59:42 +01:00
Tomasz Grabiec	743cf43847	sstables: Avoid checksum_combine() for the crc32 checksummer checksum_combine() is much slower than re-feeding the buffer to checksum() for the zlib CRC32 checksummer. Introduce Checksum::prefer_combine() to determine this and select more optimal behavior for given checksummer. Improves performance of memtable flush with compression enabled by 30%.	2018-11-26 18:57:33 +01:00
Tomasz Grabiec	88cf1c61ba	sstables: compress: Avoid unnecessary checksum_combine()	2018-11-26 14:31:38 +01:00
Tomasz Grabiec	8372cf7bcc	sstables: checksum_utils: Add missing include	2018-11-26 14:31:38 +01:00
Paweł Dziepak	62ea153629	Merge "Check for schema mismatch after dropping dead cells" from Piotr " Previously we were checking for schema incompatibility between current schema and sstable serialization header before reading any data. This isn't the best approach because data in sstable may be already irrelevant due to column drop for example. This patchset moves the check after actual data is read and verified that it has a timestamp new enough to classify it as nonobsolete. Fixes #3924 " * 'haaawk/3924/v3' of github.com:scylladb/seastar-dev: sstables: Enable test_schema_change for MC format sstables3: Throw error on schema mismatch only for live cells sstables: Pass column_info to consume_*_column sstables: Add schema_mismatch to column_info sstables: Store column data type in column_info sstables: Remove code duplication in column_translation	2018-11-26 13:10:18 +00:00
Piotr Jastrzebski	4bdb86c712	sstables3: Throw error on schema mismatch only for live cells Previously we were throwing exception during the creation of column_translation. This wasn't always correct because sometimes column for which the mismatch appeared was already dropped and data present in sstable should be ignored anyway. Fixes #3924 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-11-26 13:25:10 +01:00
Rafael Ávila de Espíndola	6746907999	Use fully covered switches in continuous_data_consumer do_process_buffer had two unreachable default cases and a long if-else-if chain. This converts the the if-else-if chain to a switch and a helper function. This moves the error checking from run time to compile time. If we were to add a 128 bit integer for example, gcc would complain about it missing from the switch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181125221451.106067-1-espindola@scylladb.com>	2018-11-25 22:52:11 +00:00
Raphael S. Carvalho	2058001f94	sstables/compaction: propagate sstable replacement to all compaction of a CF This is needed for parallel compaction to work with sstable run based approach. That's because regular compaction clones a set containing all sstables of its column family. So compaction A can potentially hold a reference to a compacting sstable of compaction B, so preventing compacting B from releasing its exhausted sstable. So all replacements are propagated to all compactions of a given column family, and compactions in turn, including the one which initiated the propagation, will do the replacement. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:30 -02:00
Raphael S. Carvalho	953fdcc867	sstables: store cf pointer in compaction_info motivation is that we need a more efficient way to find compactions that belong to a given column family in compaction list. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:28 -02:00
Raphael S. Carvalho	824c20b76d	sstables: add sstable's on closed handling Motivation is that it will be useful for catching regression on compaction when releasing early exhausted sstables. That's because sstable's space is only released once it's closed. So this will allow us to write a test case and possibly use it for entities holding exhausted sstable. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:25 -02:00
Raphael S. Carvalho	e88d1d54b9	sstables/compaction_manager: prevent partial run from being selected for compaction Filter out sstable belonging to a partial run being generated by an ongoing compaction. Otherwise, that could lead to wrong decisions by the compaction strategy. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:22 -02:00
Raphael S. Carvalho	23884fe9f6	compaction: use same run identifier for sstables generated by same compaction SSTables composing the same run will share the same run identifier. Therefore, a new compaction strategy will be able to get all sstables belong to the same run from sstable_set, which now keeps track of existing runs. Same UUID is passed to writers of a given compaction. Otherwise, a new UUID is picked for every sstable created by compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:20 -02:00
Raphael S. Carvalho	4f68cb34a6	sstables: introduce sstable run sstable run is a structure that will hold all sstables that has the same run identifier. All sstables belonging to the same run will not overlap with one another. It can be used by compaction strategy to work on runs instead of individual sstables. sstable_set structure which holds all sstables for a given column family will be responsible for providing to its user an interface to work with runs instead of individual sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:18 -02:00
Raphael S. Carvalho	fc92fb955d	sstables/compaction_manager: release reference to exhausted sstable through callback That's important for the reference to sstable to not be kept throughout the compaction procedure, which would break the goal of releasing space during compaction. Manager passes a callback to compaction which calls it whenever there's sstable replacement. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:16 -02:00
Raphael S. Carvalho	3f309ebba9	sstables/compaction: stop tracking exhausted input sstable in compaction_read_monitor Motivation is that we want to release space for exhausted sstable and that will only happen when all references to it are gone and that backlog tracker takes the early replacement into account. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:13 -02:00

1 2 3 4 5 ...

1658 Commits