scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 11:55:15 +00:00

Author	SHA1	Message	Date
Vladimir Krivopalov	c53afd7bba	tests: Expand test_sstable_max_local_deletion_time_2 to run for all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 15:29:28 -08:00
Vladimir Krivopalov	cfbde5b89c	tests: Run test_sstable_max_local_deletion_time on all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 15:29:28 -08:00
Vladimir Krivopalov	9955710cac	tests: Extend test checking tombstones histogram to cover all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 12:36:22 -08:00
Vladimir Krivopalov	cdae62ec29	sstables: Properly track row-level tombstones when writing SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 12:36:22 -08:00
Vladimir Krivopalov	0f3fb32028	tests: Run min_max_clustering_key_test_2 for all SSTables versions. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 12:36:22 -08:00
Vladimir Krivopalov	c474b0d851	tests: Make reusable_sst() helper accept SSTables version parameter. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-12-05 12:36:22 -08:00
Asias He	a5d8b66f2c	gossip: Make favor newly added node log debug level It is not very useful for user to know this. Message-Id: <6c2dfc522d6974adb97c34fbc1e3a0339d2d530c.1543997137.git.asias@scylladb.com>	2018-12-05 10:45:03 +02:00
Avi Kivity	b0cb69ec25	Merge "Make sstable reader fail on unknown colum names in MC format" from Piotr " Before the reader was just ignoring such columns but this creates a risk of data loss. Refs #2598 " * 'haaawk/2598/v3' of github.com:scylladb/seastar-dev: sstables: Add test_sstable_reader_on_unknown_column sstables: Exception on sstable's column not present in schema sstables: store column name in column_translation::column_info sstables: Make test_dropped_column_handling test dropped columns	2018-12-05 10:43:29 +02:00
Takuya ASADA	9388f3d626	reloc: drop --jobs from build_deb.sh/build_rpm.sh scripts Since we merged relocatable package, build_deb.sh/build_rpm.sh only does packaging using prebuilt binary taken from relocatable package, won't compile anything. So passing --jobs option to build_deb.sh/build_rpm.sh becomes meaningless, we can drop it. Note that we still can specify --jobs option on reloc/build_reloc.sh, it runs "ninja-build -jN" to compile Scylla, then generate relocatable package. See #3956 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181204205652.25138-1-syuu@scylladb.com>	2018-12-04 21:00:51 +00:00
Tomasz Grabiec	b8c405c019	Merge "Correct the usage of row ttl and add write-read test" from Piotr Fixes the condition which determines whether a row ttl should be used for a cell and adds a test that uses each generated mutation to populate mutation source and then verifies that it can read back the same mutation. * seastar-dev.git haaawk/sst3/write-read-test/v3: Fix use_row_ttl condition Add test_all_data_is_read_back	2018-12-04 19:47:28 +01:00
Tomasz Grabiec	9a4c00beb7	utils/gz: Fix compilation on non-x86 archs gen_crc_combine_table is now executed on every build, so it should not fail on unsupported archs. The generated file will not contain data, but this is fine since it should not be used. Another problem is that u32 and u64 aliases were not visible in the #else branch in crc_combine.cc Message-Id: <1543864425-5650-1-git-send-email-tgrabiec@scylladb.com>	2018-12-04 18:17:27 +00:00
Piotr Jastrzebski	fed3b51abe	Add test_all_data_is_read_back This tests that a source after being populated with a mutation returns exactly the same mutation when read. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-12-04 11:42:08 +01:00
Piotr Sarna	7b0a3fbf8a	auth: add abort_source to waiting for schema agreement When the auth service is requested to stop during bootstrap, it might have still not reached schema agreement. Currently, waiting for this agreement is done in an infinite loop, without taking abort_source into account. This patch introduces checking if abort was requested and breaking the loop in such case, so auth service can terminate. Tests: unit (release) dtest (bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test) Message-Id: <1b7ded14b7c42254f02b5d2e10791eb767aae7fc.1543914769.git.sarna@scylladb.com>	2018-12-04 10:41:09 +00:00
Piotr Jastrzebski	75b99838fc	Fix use_row_ttl condition Previous condition was wrong and was using row ttl too often. We also have to change test_dead_row_marker to compare resulting sstable with sstable generated by Origin not by sstableupgrade. This is because sstableupgrade transmits information about deleted row marker automatically to cells in that row. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-12-04 10:51:36 +01:00
Avi Kivity	c3e664eec2	Merge "Improve corrupt sstable reporting" from Rafael " This is a small step in fixing issue #2347. It is mostly tests and testing infrastructure, but it does include a fix for a case where we were missing the filename in the malformed_sstable_exception. " * 'espindola/sstable-corruption-v2' of https://github.com/espindola/scylla: Add a filename to a malformed_sstable_exception. Try to read the full sst in broken_sst. Convert tests to SEASTAR_THREAD_TEST_CASE. Check the exception message. Move some tests to broken_sstable_test.cc	2018-12-04 10:32:10 +02:00
Avi Kivity	414b14a6bd	Merge "Make inactive shard readers evictable" from Botond " This series attempts to solve the regressions recently discovered in performance of multi-partition range-scans. Namely that they: * Flood the reader concurrency semaphore's queues, trampling other reads. * Behave very badly when too many of them is running concurrently (trashing). * May deadlock if enough of them is running without a timeout. The solution for these problems is to make inactive shard readers evictable. This should address all three issues listed above, to varying degrees: * Shard readers will now not cling onto their permits for the entire duration of the scan, which might be a lot of time. * Will be less affected by infinite concurrency (more than the node can handle) as each scan now can make progress by evicting inactive shard readers belonging to other scans. * Will not deadlock at all. In addition to the above fix, this series also bundles two further improvements: * Add a mechanism to `reader_concurrecy_semaphore` to be notified of newly inserted evictables. * General cleanups and fixes for `multishard_combining_reader` and `foreign_reader`. I can unbundle these mini series and send them separately, if the maintainers so prefer, altough considering that this series will have to be backported to 3.0, I think this present form is better. Fixes: #3835 " * 'evictable-inactive-shard-readers/v7' of https://github.com/denesb/scylla: (27 commits) tests/multishard_mutation_query_test: test stateless query too tests/querier_cache: fail resource-based eviction test gracefully tests/querier_cache: simplify resource-based eviction test tests/mutation_reader_test: add test_multishard_combining_reader_next_partition tests/mutation_reader_test: restore indentation tests/mutation_reader_test: enrich pause-related multishard reader test multishard_combining_reader: use pause-resume API query::partition_slice: add clear_ranges() method position_in_partition: add region() accessor foreign_reader: add pause-resume API tests/mutation_reader_test: implement the pause-resume API query_mutations_on_all_shards(): implement pause-resume API make_multishard_streaming_reader(): implement the pause-resume API database: add accessors for user and streaming concurrency semaphores reader_lifecycle_policy: extend with a pause-resume API query_mutations_on_all_shards(): restore indentation query_mutations_on_all_shards(): simplify the state-machine multishard_combining_reader: use the reader lifecycle policy multishard_combining_reader: add reader lifecycle policy multishard_combining_reader: drop unnecessary `reader_promise` member ...	2018-12-04 10:22:35 +02:00
Botond Dénes	9de4f3a834	tests/multishard_mutation_query_test: test stateless query too In the `test_read_all`, do a stateless read as well to ensure that path works correctly as well.	2018-12-04 08:51:05 +02:00
Botond Dénes	6676ceba7f	tests/querier_cache: fail resource-based eviction test gracefully Currently when this test fails, resources are not released in the correct order, which results in ASAN complaining about use-after-free in debug builds. This is due to the BOOST_REQUIRE macro aborting the test when the predicate fails, not allowing for correct destruction order to take place. To avoid this ugly failure, that adds noise and might cause a developer investigating into the failure starting on the wrong path, use the more mild BOOST_CHECK family of test macros. These will allow the test to run to completion even when the predicate fails, allowing for the correct destruction of the resources.	2018-12-04 08:51:05 +02:00
Botond Dénes	93e41397f7	tests/querier_cache: simplify resource-based eviction test Now that we have an accessor for all concurrency semaphores, we don't need the tricks of creating a dummy keyspace to get them. Use the accessors instead.	2018-12-04 08:51:05 +02:00
Botond Dénes	dcd2d116a3	tests/mutation_reader_test: add test_multishard_combining_reader_next_partition Test the interaction of the multishard reader with the foreign reader w.r.t next_partition(). next_partition() is a special operation, as it its execution is deferred until the next cross-shard operations. Give it some extra stress-testing.	2018-12-04 08:51:05 +02:00
Botond Dénes	20e994e526	tests/mutation_reader_test: restore indentation Left over from the previous patch.	2018-12-04 08:51:05 +02:00
Botond Dénes	a577ff97e9	tests/mutation_reader_test: enrich pause-related multishard reader test Enrich the existing test_multishard_combining_reader_as_mutation_source test case with delaying the pause/resume and eviction of paused readers.	2018-12-04 08:51:05 +02:00
Botond Dénes	22b14d593b	multishard_combining_reader: use pause-resume API Refactor the multishard combining reader to make use of the new pause-resume API to pause inactive shard readers. Make the pause-resume API mandatory to implement, as by now all existing clients have adapted it.	2018-12-04 08:51:05 +02:00
Botond Dénes	77b758707c	query::partition_slice: add clear_ranges() method Allows for clearing any custom partition ranges, effectively resetting them to the default ones. Useful for code that needs to set several different specific partition ranges, one after the other, but doesn't want to remember the last key it set a range for to be able to clear the previous range with `clear_range()`.	2018-12-04 08:51:05 +02:00
Botond Dénes	a594fd39ce	position_in_partition: add region() accessor	2018-12-04 08:51:05 +02:00
Botond Dénes	9601d23e0d	foreign_reader: add pause-resume API Allowing for pausing the reader and later resume it. Pausing the reader waits on the ongoing read ahead (if any), executes any pending `next_partition()` calls and than detaches the shard reader's buffer. The paused shard reader is returned to the client. Resuming the reader consists of getting the previously detached reader back, or one that has the same position as the old reader had. This API allows for making the inactive shard readers of the `multishard_combining_reader` evictable. The API is private, it's only accessible for classes knowing the full definition of the `foreign_reader` (which resides in a .cc file).	2018-12-04 08:51:05 +02:00
Botond Dénes	a12fae366d	tests/mutation_reader_test: implement the pause-resume API	2018-12-04 08:51:05 +02:00
Botond Dénes	f334d3717f	query_mutations_on_all_shards(): implement pause-resume API	2018-12-04 08:51:05 +02:00
Botond Dénes	72ed655ef0	make_multishard_streaming_reader(): implement the pause-resume API	2018-12-04 08:51:05 +02:00
Botond Dénes	bf0d1f4eea	database: add accessors for user and streaming concurrency semaphores These will soon be needed to register inactive user and streaming reads with the respective semaphores.	2018-12-04 08:51:05 +02:00
Botond Dénes	5f67a065c6	reader_lifecycle_policy: extend with a pause-resume API This API provides a way for the mulishard reader to pause inactive shard readers and later resume them when they are needed again. This allows for these paused shard readers to be evicted when the node is under pressure. How the readers are made evictable while paused is up to the clients. Using this API in the `multishard_combining_reader` and implementing it in the clients will be done in the next patches. Provide default implementation for the new virtual methods to facilitate gradual adoption.	2018-12-04 08:51:05 +02:00
Botond Dénes	6f0e0c4ed7	query_mutations_on_all_shards(): restore indentation The previous patch added half-aligned lines to improve readability of that patch.	2018-12-04 08:51:05 +02:00
Botond Dénes	aa6083a75b	query_mutations_on_all_shards(): simplify the state-machine The `read_context` which handles creating, saving and looking-up the shard readers has to deal with its `destroy_reader()` method called any time, even before some other method finished its work. For example it is valid for a reader to be requested to be destroyed, even before the contexts finishes creating it. This means that state transitions that take time can be interleaved with another state transition request. To deal with this the read context uses `future_` states, states that mark an ongoing state transitions. This allows for state transition request that arrive in the middle of another state transition to be attached as a continuation to the ongoing transition, and to be executed after that finishes. This however resulted in complex code, that has to handle readers being in all sorts of different states, when the `save_readers()` method is called. To avoid all this complexity, exploit the fact that `destroy_reader()` receives a future<> as its argument, which resolves when all previous state transitions have finished. Use a gate to wait on all these futures to resolve. This way we don't need all those transitional states, instead in `save_readers()` we only need to wait on the gate to close. Thus the number of states `save_readers()` has to consider drops drastically. This has the theoretical drawback of the process of saving the readers having to wait on each of the readers to stop, but in practice the process finishes when the last reader is saved anyway, so I don't expect this to result in any slowdown.	2018-12-04 08:51:05 +02:00
Botond Dénes	007619de4c	multishard_combining_reader: use the reader lifecycle policy Refactor the multishard combining reader and its clients to use the reader lifecycle policy introduced in the previous patch.	2018-12-04 08:51:05 +02:00
Botond Dénes	0a616c899e	multishard_combining_reader: add reader lifecycle policy Currently `multishard_combining_reader` takes two functors, one for creating the readers and optionally one for destroying them. A bag of functors (`std::function`) however make for a terrible interface, and as we are about to add some more customization points, it's time to use something more formal: policy based design, a well-known design pattern. As well as merging the job of the two functors into a single policy class, also widen the area of responsibility of the policy to include keeping alive any resource the shard readers might need on their home shard. Implementing a proper reader cleanup is now not optional either. This patch only adds the `reader_managing_policy` interface, refactoring the multishard reader to use it will be done in the next patch.	2018-12-04 08:51:05 +02:00
Botond Dénes	301abaca07	multishard_combining_reader: drop unnecessary `reader_promise` member The `reader_promise` member of the `shard_reader` was used to synchronize a foreground request to create the underlying reader with an ongoing background request with the same goal. This is however unnecessary. The underlying reader is created in the background only as part of a read ahead. In this case there is no need for extra synchronization point, the foreground reader create request can just wait for the read ahead to finish, for which there already exists a mean. Furthermore, foreground reader create requests are always followed by a `fill_buffer()` request, so by waiting on the read ahead we ensure that the following `fill_buffer()` call will not block.	2018-12-04 08:51:05 +02:00
Botond Dénes	a73175fdbc	multishard_combining_reader: drop tracking of pending next_partition calls Shard readers used to track pending `next_partition()` calls that they couldn't execute, because their underlying reader wasn't created yet. These pending calls were then executed after the reader was created. However the only situation where a shard reader can receive a `next_partition()` call, before its underlying reader wasn't created is when `next_partition()` is called on the multishard reader before a single fragment is read. In this case we know we are at a partition boundary and thus this call has no effect, therefore it is safe to ignore it.	2018-12-04 08:51:05 +02:00
Botond Dénes	ab3e639c3b	foreign_reader: use bool for pending_next_partition Foreign reader doesn't execute `next_partition()` calls straight away, when this would require interaction with the remote reader. Instead these calls are "remembered" and executed on the next occasion the foreign reader has to interact with the remote reader. This was implemented with a counter that counts the number of pending `next_partition()` calls. However when `next_partition()` is called multiple times, without interleaving calls to `operator()()` or `fast_forward_to()`, only the first such call has effect. Thus it doesn't make sense to count these calls, it is enough to just set a flag if there was at least one such call.	2018-12-04 08:51:05 +02:00
Botond Dénes	5a4fd1abab	multishard_combining_reader: drop support for streamed_mutation fast-forwarding It doesn't make sense for the multishard reader anyway, as it's only used by the row-cache. We are about to introduce the pausing of inactive shard readers, and it would require complex data structures and code to maintain support for this feature that is not even used. So drop it.	2018-12-04 08:51:05 +02:00
Botond Dénes	b36733971b	mutation_source_test: add option to skip intra-partition fast-forwarding tests To allow for using this test suite for testing mutation sources that don't support intra-partition fast-forwarding.	2018-12-04 08:51:05 +02:00
Botond Dénes	37f0117747	reader_concurrency_semaphore: refactor eviction mechanism As we are about to add multiple sources of evictable readers, we need a more scalable solution than a single functor being passed that opaquely evicts a reader when called. Add a generic way to register and unregister evictable (inactive) readers to the semaphore. The readers are expected to be registered when they become evictable and are expected to be unregistered when they cease to become evictable. The semaphore might evict any reader that is registered to it, when it sees fit. This also solves the problem of notifying the semaphore when new readers become evictable. Previously there was no such mechanism, and the semaphore would only evict any such new readers when a new permit was requested from it.	2018-12-04 08:51:00 +02:00
Rafael Ávila de Espíndola	21199a7a5c	Add a filename to a malformed_sstable_exception. It is reasonable for parse() to throw when it finds something wrong with the format. This seems to be the best spot to add the filename and rethrow. Also add a testcase to make sure we keep handling this error gracefully. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-03 13:50:23 -08:00
Rafael Ávila de Espíndola	a6e25e4bd0	Try to read the full sst in broken_sst. With this patch we use data_consume_rows to try to read the entire sstable. The patch also adds a test with a particular corruption that would not be found without parsing the file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-03 13:47:49 -08:00
Rafael Ávila de Espíndola	b1190c58ec	Convert tests to SEASTAR_THREAD_TEST_CASE. This will simplify future changes to broken_sst. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-03 13:26:06 -08:00
Rafael Ávila de Espíndola	e5c5afffc9	Check the exception message. This makes the tests a bit more strict by also checking the message returned by the what() function. This shows that some of the tests are out of sync with which errors they check for. I will hopefully fix this in another pass. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-03 12:31:53 -08:00
Rafael Ávila de Espíndola	f9d81bcd43	Move some tests to broken_sstable_test.cc sstable_test.cc was already a bit too big and there is potential for having a lot of tests about broken sstables. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2018-12-03 12:16:30 -08:00
Rafael Ávila de Espíndola	cf4dc38259	Simplify state machine loop. These loops have the structure : while (true) { switch (state) { case state1: ... break; case state2: if (...) { ... break; } else {... continue; } ... } break; } There a couple things I find a bit odd on that structure: * The break refers to the switch, the continue to the loop. * A while (true) loop always hits a break or a continue. This patch uses early returns to simplify the logic to while (true) { switch (state) { case state1: ... return case state2: if (...) { ... return; } ... } } Now there are no breaks or continues. Tests: unit (release) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20181126171726.84629-1-espindola@scylladb.com>	2018-12-03 20:34:03 +01:00
Avi Kivity	b098b5b987	Merge "Optimize checksum_combine() for CRC32" from Tomek " zlib's crc32_combine() is not very efficient. It is faster to re-combine the buffer using crc32(). It's still substantial amount of work which could be avoided. This patch introduces a fast implementation of crc32_combine() which uses a different algorithm than zlib. It also utilizes intrinsics for carry-less multiplication instruction to perform the computation faster. The details of the algorithm can be found in code comments. Performance results using perf_checksum and second buffer of length 64 KiB: zlib CRC32 combine: 38'851 ns libdeflate CRC32: 4'797 ns fast_crc32_combine(): 11 ns So the new implementation is 3500x faster than zlib's, and 417x faster than re-checksumming the buffer using libdeflate. Tested on i7-5960X CPU @ 3.00GHz Performance was also evaluated using sstable writer benchmark: perf_fast_forward --populate --sstable-format=mc --data-directory /tmp/perf-mc \ --value-size=10000 --rows 1000000 --datasets small-part It yielded 9% improvement in median frag/s (129'055 vs 117'977). Refs #3874 " * tag 'fast-crc32-combine-v2' of github.com:tgrabiec/scylla: tests: perf_checksum: Test fast_crc32_combine() tests: Rename libdeflate_test to checksum_utils_test tests: libdeflate: Add more tests for checksum_combine() tests: libdeflate: Check both libdeflate and default checksummers sstables: Use fast_crc_combine() in the default checksummer utils/gz: Add fast implementation of crc32_combine() utils/gz: Add pre-computed polynomials utils/gz: Import Barett reduction implementation from libdeflate utils: Extract clmul() from crc.hh	2018-12-03 19:02:01 +02:00
Tomasz Grabiec	aa19f98d18	sstables: Write Statistics.db offset map entries in the same order as Cassandra Before this patch we were writing offset map enteies in unspecified order, the one returned by std::unorderd_map. Cassandra writes them sorted by metadata_type. Use the same order for improved compatibility. Fixes #3955. Message-Id: <1543846649-22861-1-git-send-email-tgrabiec@scylladb.com>	2018-12-03 16:40:24 +02:00
Avi Kivity	4dc402b53f	Merge "Create sstable in a sub-directory" from Benny " Due to an XFS heuristic, if all files are in one (or a few) directories, then block allocation can become very slow. This is because XFS divides the disk into a few allocation groups (AGs), and each directory allocates preferentially from a single AG. That AG can become filled long before the disk is full. This patchset works around the problem by: - creating sstable component files in their own temporary, per-sstable sub-directory, - moving the files back into the canonical location right after begin created, and finally - removing the temp sub-directory when the sstable is sealed. - In addition, any temporary sub-directories that might have been left over if scylla crashed while creating sstables are looked up and removed when populating the table. Fixes: #3167 Tests: unit (release) " * 'issues/3167/v7' of https://github.com/bhalevy/scylla: distributed_loader::populate_column_family: lookup and remove temp sstable directories database: directly use std::experimental::filesystem::path for lister::path database: use std::experimental::filesystem::path for lister::path sstable: use std::experimental::filesystem rather than boost sstable::seal_sstable: fixup indentation sstable: create sstable component files in a subdirectory sstable::new_sstable_component_file: pass component_type rather than filename sstable: cleanup filename related functions sstable: make write_crc, write_digest, and new_sstable_component_file private methods	2018-12-03 16:26:12 +02:00

1 2 3 4 5 ...

17174 Commits