scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-13 03:12:13 +00:00

Author	SHA1	Message	Date
Takuya ASADA	efa4d24deb	dist/redhat: fix systemd unit name of scylla-node-exporter systemd unit name of scylla-node-exporter is scylla-node-exporter.service, not node-exporter.service. Fixes #8966 Closes #8967 (cherry picked from commit `f19ebe5709`)	2021-07-07 18:37:42 +03:00
Takuya ASADA	3e1d608111	dist: stop removing /etc/systemd/system/.mount on package uninstall Listing /etc/systemd/system/.mount as ghost file seems incorrect, since user may want to keep using RAID volume / coredump directory after uninstalling Scylla, or user may want to upgrade enterprise version. Also, we mixed two types of files as ghost file, it should handle differently: 1. automatically generated by postinst scriptlet 2. generated by user invoked scylla_setup The package should remove only 1, since 2 is generated by user decision. However, just dropping .mount from %files section causes another problem, rpm will remove these files during upgrade, instead of uninstall (#8924). To fix both problem, specify .mount files as "%ghost %config". It will keep files both package upgrade and package remove. See scylladb/scylla-enterprise#1780 Closes #8810 Closes #8924 Closes #8959 (cherry picked from commit `f71f9786c7`)	2021-07-07 18:37:42 +03:00
Pavel Emelyanov	a87bb38c29	hasher: More picky noexcept marking of feed_hash() Commit `5adb8e555c` marked the ::feed_hash() and a visitor lambda of digester::feed_hash() as noexcept. This was quite reckless as the appending_hash<>::operator()s called by ::feed_hash() are not all marked noexcept. In particular, the appending_hash<row>() is not such and seems to throw. The original intent of the mentioned commit was to facilitate the partition_hasher in repair/ code. The hasher itself had been removed by the `0af7a22c21`, so it no longer needs the feed_hash-s to be noexcepts. The fix is to inherit noexcept from the called hashers, but for the digester::feed_hash part the noexcept is just removed until clang compilation bug #50994 is fixed. fixes: #8983 tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210706153608.4299-1-xemul@scylladb.com> (cherry picked from commit `63a2fed585`)	2021-07-07 18:36:00 +03:00
Raphael S. Carvalho	ab3e284e04	LCS/reshape: Don't reshape single sstable in level 0 with strict mode With strict mode, it could happen that a sstable alone in level 0 is selected for offstrategy compaction, which means that we could run into an infinite reshape process. This is fixed by respecting the offstrategy threshold. Unit test is added. Fixes #8573. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210506181324.49636-1-raphaelsc@scylladb.com> (cherry picked from commit `8480839932`)	2021-07-07 14:06:20 +03:00
Raphael S. Carvalho	b0e833d9e5	LCS: reshape: Fix overlapping check when determining if a sstable set is disjoint Wrong comparison operator is used when checking for overlapping. It would miss overlapping when last key of a sstable is equal to the first key of another sstable that comes next in the set, which is sorted by first key. Fixes #8531. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `39ecddbd34`)	2021-07-07 14:04:04 +03:00
Hagit Segev	9e55d9bd04	release: prepare for 4.5.rc4 scylla-4.5.rc4	2021-07-07 12:17:19 +03:00
Avi Kivity	f89f4e69a0	Merge 'Commitlog: Handle disk usage and disk footprint discrepancies, ensuring we flush when needed (#8695 ) (v3)' from Calle Wilund Fixes #8270 If we have an allocation pattern where we leave large parts of segments "wasted" (typically because the segment has empty space, but cannot hold the mutation being added), we can have a disk usage that is below threshold, yet still get a disk footprint that is over limit causing new segment allocation to stall. We need to take a few things into account: 1.) Need to include wasted space in the threshold check. Whether or not disk is actually used does not matter here. 2.) If we stall a segment alloc, we should just flush immediately. No point in waiting for the timer task. 3.) Need to adjust the thresholds a bit. Depending on sizes, we should probably consider start flushing once we've used up space enough to be in the last available segment, so a new one is hopefully available by the time we hit the limit. 4.) (v2) Must ensure discard/delete routines are executed. Because we can race with background disk syncs, we may need to issue segment prunes from end_flush() so we wake up actual file deletion/recycling 5.) (v2) Shutdown must ensure discard/delete is run after we've disabled background task etc, otherwise we might fail waking up replenish and get stuck in gate 6.) (v2) Recycling or deleting segments must be consistent, regardless of shutdown. For same reason as above. 7.) (v3) Signal recycle/delete queues/promise on shutdown (with recognized marker) to handle edge case where we only have a single (allocating) segment in the list, and cannot wake up replenisher in any more civilized way. Also fix edge case (for tests), when we have too few segment to have an active one (i.e. need flush everything). New attempt at this, should fix intermittent shutdown deadlocks in commitlog_test. 
Closes #8764 * github.com:scylladb/scylla: commitlog_test: Add test case for usage/disk size threshold mismatch commitlog_test: Improve test assertion commitlog: Add waitable future for background sync/flush commitlog: abort queues on shutdown commitlog: break out "abort" calls into member functions commitlog: Do explicit discard+delete in shutdown commitlog: Recycle or not should not depend on shutdown state commitlog: Issue discard_unused_segments on segment::flush end IFF deletable commitlog: Flush all segments if we only have one. commitlog: Always force flush if segment allocation is waiting commitlog: Include segment wasted (slack) size in footprint check commitlog: Adjust (lower) usage threshold (cherry picked from commit `14252c8b71`)	2021-06-27 14:05:36 +03:00
Nadav Har'El	7146646bf4	Merge 'commitlog: make_checked_file for segments, report and ignore other errors on shutdown' from Benny Halevy Shutdown must never fail, otherwise it may cause hangs as seen in https://github.com/scylladb/scylla/issues/8577. This change wraps the file created in `allocate_segment_ex` in `make_checked_file` so that scylla will abort when failing to write to the commitlog files. In case other errors are seen during shutdown, just log them and continue with shutting down to prevent scylla from hanging. Fixes #8577 Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #8578 * github.com:scylladb/scylla: commitlog: segment_manager::shutdown: abort on errors commitlog: allocate_segment_ex: make_checked_file (cherry picked from commit `48ff641f67`)	2021-06-27 14:04:04 +03:00
Benny Halevy	b3aba49ab0	commitlog: segment_manager: max_size must be aligned This was triggered by the test_total_space_limit_of_commitlog dtest. When it passes a very large commitlog_segment_size_in_mb (1/6th of the free memory size, in mb), segment_manager constructor limits max_size to std::numeric_limits<position_type>::max() which is 0xffffffff. This causes allocate_segment_ex to loop forever when writing the segment file since `dma_write` returns 0 when the count is unaligned (seen 4095). The fix here is to select a slightly smaller max_size that is aligned down to a multiple of 1MB. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210407121059.277912-1-bhalevy@scylladb.com> (cherry picked from commit `705f9c4f79`)	2021-06-27 14:03:41 +03:00
Hagit Segev	706de00ef2	release: prepare for 4.5.rc3 scylla-4.5.rc3	2021-06-20 17:14:50 +03:00
Piotr Sarna	cd5b915460	Merge 'view: fix use-after-move when handling view update failures' Backport of `6726fe79b6`. The code was susceptible to use-after-move if both local and remote updates were going to be sent. The whole routine for sending view updates is now rewritten to avoid use-after-move. Fixes #8830 Tests: unit(release), dtest(secondary_indexes_test.py:TestSecondaryIndexes.test_remove_node_during_index_build) Closes #8834 * backport-6726fe7: view: fix use-after-move when handling view update failures db,view: explicitly move the mutation to its helper function db,view: pass base token by value to mutate_MV	2021-06-16 13:27:12 +02:00
Piotr Sarna	247d30f075	view: fix use-after-move when handling view update failures The code was susceptible to use-after-move if both local and remote updates were going to be sent. The whole routine for sending view updates is now rewritten to avoid use-after-move. Refs #8830 Tests: unit(release), dtest(secondary_indexes_test.py:TestSecondaryIndexes.test_remove_node_during_index_build)	2021-06-16 13:25:50 +02:00
Piotr Sarna	13da17e6fe	db,view: explicitly move the mutation to its helper function The `apply_to_remote_endpoints` helper function used to take its `mut` parameter by reference, but then moved the value from it, which is confusing and prone to errors. Since the value is moved-from, let's pass it to the helper function as rvalue ref explicitly.	2021-06-16 13:25:46 +02:00
Piotr Sarna	6e29d74ab8	db,view: pass base token by value to mutate_MV The base token is passed cross-continuations, so the current way of passing it by const reference probably only works because the token copying is cheap enough to optimize the reference out. Fix by explicitly taking the token by value.	2021-06-16 13:23:10 +02:00
Tomasz Grabiec	e820e7f3c5	Merge 'Backport for 4.5: Fix replacing node takes writes' from Asias He This backport fixes the follow issue: Cassandra stress fails to achieve consistency during replace node operation #8013 without the NODE_OPS_CMD infrastructure. The commit `c82250e0cf` (gossip: Allow deferring advertise of local node to be up) which fixes for During replace node operation - replacing node is used to respond to read queries #7312 is already present in 4.5 branch. Closes #8703 * github.com:scylladb/scylla: storage_service: Delay update pending ranges for replacing node gossip: Add helper to wait for a node to be up	2021-06-08 23:20:25 +02:00
Nadav Har'El	0cebafd104	alternator: fix equality check of nested document containing a set In issue #5021 we noticed that the equality check in Alternator's condition expressions needs to handle sets differently - we need to compare the set's elements ignoring their order. But the implementation we added to fix that issue was only correct when the entire attribute was a set... In the general case, an attribute can be a nested document, with only some inner set. The equality-checking function needs to traverse this nested document, and compare the sets inside it as appropriate. This is what we do in this patch. This patch also adds a new test comparing equality of a nested document with some inner sets. This test passes on DynamoDB, failed on Alternator before this patch, and passes with this patch. Refs #5021 Fixes #8514 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210419184840.471858-1-nyh@scylladb.com> (cherry picked from commit `dae7528fe5`)	2021-06-07 09:10:42 +03:00
Nadav Har'El	9abd4677b1	alternator: fix inequality check of two sets In issue #5021 we noted that Alternator's equality operator needs to be fixed for the case of comparing two sets, because the equality check needs to take into account the possibility of different element order. Unfortunately, we fixed only the equality check operator, but forgot there is also an inequality operator! So in this patch we fix the inequality operator, and also add a test for it that was previously missing. The implementation of the inequality operator is trivial - it's just the negation of the equality test. Our pre-existing tests verify that this is the correct implementation (e.g., if attribute x doesn't exist, then "x = 3" is false but "x <> 3" is true). Refs #5021 Fixes #8513 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210419141450.464968-1-nyh@scylladb.com> (cherry picked from commit `50f3201ee2`)	2021-06-07 08:42:46 +03:00
Nadav Har'El	9a07d7ca76	alternator: fix equality check of two unset attributes When a condition expression (ConditionExpression, FilterExpression, etc.) checks for equality of two item attributes, i.e., "x = y", and when one of these attributes was missing we correctly returned false. However, we also need to return false when both attributes are missing in the item, because this is what DynamoDB does in this case. In other words an unset attribute is never equal to anything - not even to another unset attribute. This was not happening before this patch: When x and y were both missing attributes, Alternator incorrectly returned true for "x = y", and this patch fixes this case. It also fixes "x <> y" which should be true when both x and y are unset (but was false before this patch). The other comparison operators - <, <=, >, >=, BETWEEN, were all implemented correctly even before this patch. This patch also includes tests for all the two-unset-attribute cases of all the operators listed above. As usual, we check that these tests pass on both DynamoDB and Alternator to confirm our new behavior is the correct one - before this patch, two of the new tests failed on Alternator and passed on DynamoDB. Fixes #8511 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210419123911.462579-1-nyh@scylladb.com> (cherry picked from commit `46448b0983`)	2021-06-06 15:55:57 +03:00
Takuya ASADA	d92a26636a	dist: add DefaultDependencies=no to .mount units To avoid ordering cycle error on Ubuntu, add DefaultDependencies=no on .mount units. Fixes #8482 Closes #8495 (cherry picked from commit `0b01e1a167`) scylla-4.5.rc2	2021-05-31 12:11:51 +03:00
Yaron Kaikov	7445bfec86	install.sh: Setup aio-max-nr upon installation This is a follow up change to #8512. Let's add aio conf file during scylla installation process and make sure we also remove this file when uninstall Scylla As per Avi Kivity's suggestion, let's set aio value as static configuration, and make it large enough to work with 500 cpus. Closes #8650 Refs: #8713 (cherry picked from commit `dd453ffe6a`)	2021-05-27 14:12:19 +03:00
Yaron Kaikov	b077b198bf	scylla_io_setup: configure "aio-max-nr" before iotune On several instance types in AWS and Azure, we get the following failure during scylla_io_setup process: ``` ERROR 2021-04-14 07:50:35,666 [shard 5] seastar - Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application ``` We have scylla_prepare:configure_io_slots() running before the scylla-server.service start, but the scylla_io_setup is taking place before 1) Let's move configure_io_slots() to scylla_util.py since both scylla_io_setup and scylla_prepare import functions from it 2) cleanup scylla_prepare since we don't need the same function twice 3) Let's use configure_io_slots() during scylla_io_setup to avoid such failure Fixes: #8587 Closes #8512 Refs: #8713 (cherry picked from commit `588a065304`)	2021-05-27 14:11:17 +03:00
Avi Kivity	44f85d2ba0	Update seastar submodule (httpd handler not reading content) * seastar dadd299e7d...dab10ba6ad (1): > httpd: allow handler to not read an empty content Fixes #8691.	2021-05-25 11:32:18 +03:00
Asias He	ccfe1d12ea	storage_service: Delay update pending ranges for replacing node In commit `c82250e0cf` (gossip: Allow deferring advertise of local node to be up), the replacing node is changed to postpone the responding of gossip echo message to avoid other nodes sending read requests to the replacing node. It works as following: 1) replacing node does not respond echo message to avoid other nodes to mark replacing node as alive 2) replacing node advertises hibernate state so other nodes knows replacing node is replacing 3) replacing node responds echo message so other nodes can mark replacing node as alive This is problematic because after step 2, the existing nodes in the cluster will start to send writes to the replacing node, but at this time it is possible that existing nodes haven't marked the replacing node as alive, thus failing the write request unnecessarily. For instance, we saw the following errors in issue #8013 (Cassandra stress fails to achieve consistency when only one of the nodes is down) ``` scylla: [shard 1] consistency - Live nodes 2 do not satisfy ConsistencyLevel (2 required, 1 pending, live_endpoints={127.0.0.2, 127.0.0.1}, pending_endpoints={127.0.0.3}) [shard 0] gossip - Fail to send EchoMessage to 127.0.0.3: std::runtime_error (Not ready to respond gossip echo message) c-s: java.io.IOException: Operation x10 on key(s) [4c4f4d37324c35304c30]: Error executing: (UnavailableException): Not enough replicas available for query at consistency QUORUM (2 required but only 1 alive ``` To solve this problem for older releases without the patch "repair: Switch to use NODE_OPS_CMD for replace operation", a minimum fix is implemented in this patch. Once existing nodes learn the replacing node is in HIBERNATE state, they add the replacing as replacing, but only add the replacing to the pending list only after the replacing node is marked as alive. 
With this patch, when the existing nodes start to write to the replacing node, the replacing node is already alive. Tests: replace_address_test.py:TestReplaceAddress.replace_node_same_ip_test + manual test Fixes: #8013 Closes #8614 (cherry picked from commit `e4872a78b5`)	2021-05-25 14:12:31 +08:00
Asias He	b0399a7c3b	gossip: Add helper to wait for a node to be up This patch adds gossiper::wait_alive helper to wait for nodes to be up on all shards. Refs #8013 (cherry picked from commit `f690f3ee8e`)	2021-05-25 14:12:11 +08:00
Takuya ASADA	b81919dbe2	scylla_raid_setup: use /dev/disk/by-uuid to specify filesystem Currently, var-lib-scylla.mount may fail because it can start before the MDRAID volume is initialized. We may be able to add "After=dev-disk-by\x2duuid-<uuid>.device" to wait for the device to become available, but the systemd manual says it automatically configures the dependency for a mount unit when we specify the filesystem path by "absolute path of a device node". So we need to replace What=UUID=<uuid> with What=/dev/disk/by-uuid/<uuid>. Fixes #8279 Closes #8681 (cherry picked from commit `3d307919c3`)	2021-05-24 17:23:56 +03:00
Takuya ASADA	5651a20ba1	install.sh: apply correct file security context when copying files Currently, unified installer does not apply correct file security context while copying files, it causes permission error on scylla-server.service. We should apply default file security context while copying files, using '-Z' option on /usr/bin/install. Also, because install -Z requires normalized path to apply correct security context, use 'realpath -m <PATH>' on path variables on the script. Fixes #8589 Closes #8602 (cherry picked from commit `60c0b37a4c`)	2021-05-19 12:40:12 +03:00
Takuya ASADA	8c30b83ea4	install.sh: fix not such file or directory on nonroot Since we have added scylla-node-exporter, we needed to do 'install -d' for systemd directory and sysconfig directory before copying files. Fixes #8663 Closes #8664 (cherry picked from commit `6faa8b97ec`)	2021-05-19 12:40:12 +03:00
Avi Kivity	fce7eab9ac	Merge 'Fix type checking in index paging' from Piotr Sarna When recreating the paging state from an indexed query, a bunch of panic checks were introduced to make sure that the code is correct. However, one of the checks is too eager - namely, it throws an error if the base column type is not equal to the view column type. It usually works correctly, unless the base column type is a clustering key with DESC clustering order, in which case the type is actually "reversed". From the point of view of the paging state generation it's not important, because both types deserialize in the same way, so the check should be less strict and allow the base type to be reversed. Tests: unit(release), along with the additional test case introduced in this series; the test also passes on Cassandra Fixes #8666 Closes #8667 * github.com:scylladb/scylla: test: add a test case for paging with desc clustering order cql3: relax a type check for index paging (cherry picked from commit `593ad4de1e`)	2021-05-19 12:40:09 +03:00
Raphael S. Carvalho	ab8eefade7	compaction_manager: Don't swallow exception in procedure used by reshape and resharding run_custom_job() was swallowing all exceptions, which is definitely wrong because failure in a resharding or reshape would be incorrectly interpreted as success, which means upper layer will continue as if everything is ok. For example, ignoring a failure in resharding could result in a shared sstable being left unresharded, so when that sstable reaches a table, scylla would abort as shared ssts are no longer accepted in the main sstable set. Let's allow the exception to be propagated, so failure will be communicated, and resharding and reshape will be all or nothing, as originally intended. Fixes #8657. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210515015721.384667-1-raphaelsc@scylladb.com> (cherry picked from commit `10ae77966c`)	2021-05-18 13:00:07 +03:00
Hagit Segev	e2704554b5	release: prepare for 4.5.rc2	2021-05-12 14:55:53 +03:00
Lauro Ramos Venancio	e36e490469	TWCS: initialize _highest_window_seen The timestamp_type is an int64_t. So, it has to be explicitly initialized before using it. This missing initialization prevented the major compaction from happening when a time window finishes, as described in #8569. Fixes #8569 Signed-off-by: Lauro Ramos Venancio <lauro.venancio@incognia.com> Closes #8590 (cherry picked from commit `15f72f7c9e`)	2021-05-06 08:51:58 +03:00
Pavel Emelyanov	c97005fbb8	tracing: Stop tracing in main's deferred action Tracing is created in two steps and is destroyed in two too. The 2nd step doesn't have the corresponding stop part, so here it is -- defer tracing stop after it was started. But need to keep in mind, that tracing is also shut down on drain, so the stopping should handle this. Fixes #8382 tests: unit(dev), manual(start-stop, aborted-start) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210331092221.1602-1-xemul@scylladb.com> (cherry picked from commit `887a1b0d3d`)	2021-05-05 15:22:05 +03:00
Nadav Har'El	d881d539f3	Update tools/java submodule Backport sstableloader fix in tools/java submodule. Fixes #8230. * tools/java 768a59a6f1...dbcea78e7d (1): > sstableloader: Handle non-prepared batches with ":" in identifier names Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-05-03 10:02:00 +03:00
Avi Kivity	b8a502fab0	Merge '[branch 4.5] Backport reader_permit: always forward resources to the semaphore' from Botond Dénes This is a backport of `8aaa3a7bb8` to <= branch-4.5. The main conflicts were around Benny's reader close series (`fa43d7680`), but it also turned out that an additional patch (`2f1d65ca11`) also has to backported to make sure admission on signaling resources doesn't deadlock. Refs: https://github.com/scylladb/scylla/issues/8493 Closes #8558 * github.com:scylladb/scylla: test: mutation_reader_test: add test_reader_concurrency_semaphore_forward_progress test: mutation_reader_test: add test_reader_concurrency_semaphore_readmission_preserves_units reader_concurrency_semaphore: add dump_diagnostics() reader_permit: always forward resources test: multishard_mutation_query_test: fuzzy-test: don't consume resource up-front reader_concurrency_semaphore: make admission conditions consistent	2021-04-29 15:36:03 +03:00
Botond Dénes	f7f2bb482f	test: mutation_reader_test: add test_reader_concurrency_semaphore_forward_progress This unit test checks that the semaphore doesn't get into a deadlock when contended, in the presence of many memory-only reads (that don't wait for admission). This is tested by simulating the 3 kind of reads we currently have in the system: * memory-only: reads that don't pass admission and only own memory. * admitted: reads that pass admission. * evictable: admitted reads that are furthermore evictable. The test creates and runs a large number of these reads in parallel, read kinds being selected randomly, then creates a watchdog which kills the test if no progress is being made. (cherry picked from commit `45d580f056`)	2021-04-29 15:26:21 +03:00
Botond Dénes	b16db6512c	test: mutation_reader_test: add test_reader_concurrency_semaphore_readmission_preserves_units This unit test passes a read through admission again-and-again, just like an evictable reader would be during its lifetime. When readmitted the read sometimes has to wait and sometimes not. This is to check that the readmitting a previously admitted reader doesn't leak any units. (cherry picked from commit `cadc26de38`)	2021-04-29 15:26:21 +03:00
Botond Dénes	1a7c8223fe	reader_concurrency_semaphore: add dump_diagnostics() Allow semaphore related tests to include a diagnostics printout in error messages to help determine why the test failed. (cherry picked from commit `d246e2df0a`)	2021-04-29 15:26:21 +03:00
Botond Dénes	ac6aa66a7b	reader_permit: always forward resources This commit conceptually reverts `4c8ab10`. Said commit was meant to prevent the scenario where memory-only permits -- those that don't pass admission but still consume memory -- completely prevent the admission of reads, possibly even causing a deadlock because a permit might even blocks its own admission. The protection introduced by said commit however proved to be very problematic. It made the status of resources on the permit very hard to reason about and created loopholes via which permits could accumulate without tracking or they could even leak resources. Instead of continuing to patch this broken system, this commit does away with this "protection" based on the observation that deadlocks are now prevented anyway by the admission criteria introduced by `0fe75571d9`, which admits a read anyway when all the initial count resources are available (meaning no admitted reader is alive), regardless of availability of memory. The benefits of this revert is that the semaphore now knows about all the resources and is able to do its job better as it is not "lied to" about resource by the permits. Furthermore the status of a permit's resources is much simpler to reason about, there are no more loopholes in unexpected state transitions to swallow/leak resources. To prove that this revert is indeed safe, in the next commit we add robust tests that stress test admission on a highly contested semaphore. This patch also does away with the registered/admitted differentiation of permits, as this doesn't make much sense anymore, instead these two are unified into a single "active" state. One can always tell whether a permit was admitted or not from whether it owns count resources anyway. (cherry picked from commit `caaa8ef59a`)	2021-04-29 15:26:21 +03:00
Botond Dénes	98a39884c3	test: multishard_mutation_query_test: fuzzy-test: don't consume resource up-front The fuzzy test consumes a large chunk of resource from the semaphore up-front to simulate a contested semaphore. This isn't an accurate simulation, because no permit will have more than 1 units in reality. Furthermore this can even cause a deadlock since `8aaa3a7` as now we rely on all count units being available to make forward progress when memory is scarce. This patch just cuts out this part of the test, we now have a dedicated unit test for checking a heavily contested semaphore, that does it properly, so no need to try to fix this clumsy attempt that is just making trouble at this point. Refs: #8493 Tests: release(multishard_mutation_query_test:fuzzy_test) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210429084458.40406-1-bdenes@scylladb.com> (cherry picked from commit `26ae9555d1`)	2021-04-29 15:26:21 +03:00
Eliran Sinvani	88192811e7	Materialized views: fix possibly old views coming from other nodes Migration manager has a function to get a schema (for read or write), this function queries a peer node and retrieves the schema from it. One scenario where it can happen is if an old node queries an old not fixed index. This makes a hole through which views that are only adjusted for reading can slip through. Here we plug the hole by fixing such views before they are registered. Closes #8509 (cherry picked from commit `480a12d7b3`) Fixes #8554.	2021-04-29 14:02:01 +03:00
Botond Dénes	32f21f7281	reader_concurrency_semaphore: make admission conditions consistent Currently there are two places where we check admission conditions: `do_wait_admission()` and `signal()`. Both use `has_available_units()` to check resource availability, but the former has some additional resource related conditions on top (in `may_proceed()`), which lead to the two paths working with slightly different conditions. To fix, push down all resource availability related checks to `has_available_units()` to ensure admission conditions are consistent across all paths. (cherry picked from commit `d90cd6402c`)	2021-04-27 18:12:29 +03:00
Piotr Sarna	c9eaf95750	Merge 'commitlog: Fix race and edge condition in delete_segments' from Calle Wilund Fixes #8363 Fixes #8376 Delete segments has two issues when running with size-limited commit log and strict adherence to said limit. 1.) It uses parallel processing, with deferral. This means that the disk usage variables it looks at might not be fully valid - i.e. we might have already issued a file delete that will reduce disk footprint such that a segment could instead be recycled, but since vars are (and should) only updated _post_ delete, we don't know. 2.) It does not take into account edge conditions, when we only delete a single segment, and this segment is the border segment - i.e. the one pushing us over the limit, yet allocation is desperately waiting for recycling. In this case we should allow it to live on, and assume that next delete will reduce footprint. Note: to ensure exact size limit, make sure total size is a multiple of segment size. if we had an error in recycling (disk rename?), and no elements are available, we could have waiters hoping they will get segments. abort the queue (not permanent, but wakes up waiters), and let them retry. Since we did deletions instead, disk footprint should allow for new allocs at least. Or more likely, everything is broken, but we will at least make more noise. Closes #8372 * github.com:scylladb/scylla: commitlog: Add signalling to recycle queue iff we fail to recycle commitlog: Fix race and edge condition in delete_segments commitlog: coroutinize delete_segments commitlog_test: Add test for deadlock in recycle waiter (cherry picked from commit `8e808a56d2`)	2021-04-21 18:01:37 +03:00
Avi Kivity	44c6d0fcf9	Update seastar submodule (low bandwidth disks) * seastar 72e3baed9c...dadd299e7d (2): > io_queue: Honor disks with tiny request rate > io_queue: Shuffle fair_group creation Fixes #8378.	2021-04-21 14:00:35 +03:00
Avi Kivity	4bcc0badb2	Point seastar submodule at scylla-seastar.git This allows us to backport fixes to seastar selectively.	2021-04-21 14:00:00 +03:00
Tomasz Grabiec	97664e63fe	Merge 'Make sure that cache_flat_mutation_reader::do_fill_buffer does not fast forward finished underlying reader' from Piotr Jastrzębski It is possible that a partition is in cache but is not present in sstables that are underneath. In such case: 1. cache_flat_mutation_reader will fast forward underlying reader to that partition 2. The underlying reader will enter the state when it's empty and its is_end_of_stream() returns true 3. Previously cache_flat_mutation_reader::do_fill_buffer would try to fast forward such empty underlying reader 4. This PR fixes that Test: unit(dev) Fixes #8435 Fixes #8411 Closes #8437 * github.com:scylladb/scylla: row_cache: remove redundant check in make_reader cache_flat_mutation_reader: fix do_fill_buffer read_context: add _partition_exists read_context: remove skip_first_fragment arg from create_underlying read_context: skip first fragment in ensure_underlying (cherry picked from commit `163f2be277`)	2021-04-20 12:49:10 +03:00
Kamil Braun	204964637a	time_series_sstable_set: return partition start if some sstables were ck-filtered out When a particular partition exists in at least one sstable, the cache expects any single-partition query to this partition to return a `partition_start` fragment, even if the result is empty. In `time_series_sstable_set::create_single_key_sstable_reader` it could happen that all sstables containing data for the given query get filtered out and only sstables without the relevant partition are left, resulting in a reader which immediately returns end-of-stream (while it should return a `partition_start` and if not in forwarding mode, a `partition_end`). This commit fixes that. We do it by extending the reader queue (used by the clustering reader merger) with a `dummy_reader` which will be returned by the queue as the very first reader. This reader only emits a `partition_start` and, if not in forwarding mode, a `partition_end` fragment. Fixes #8447. Closes #8448 (cherry picked from commit `5c7ed7a83f`)	2021-04-20 12:49:10 +03:00
Kamil Braun	c402abe8e9	clustering_order_reader_merger: handle empty readers The merger could return end-of-stream if some (but not all) of the underlying readers were empty (i.e. not even returning a `partition_start`). This could happen in places where it was used (`time_series_sstable_set::create_single_key_sstable_reader`) if we opened an sstable which did not have the queried partition but passed all the filters (specifically, the bloom filter returned a false positive for this sstable). The commit also extends the random tests for the merger to include empty readers and adds an explicit test case that catches this bug (in a limited scope: when we merge a single empty reader). It also modifies `test_twcs_single_key_reader_filtering` (regression test for #8432) because the time where the clustering key filter is invoked changes (some invocations move from the constructor of the merger to operator()). I checked manually that it still catches the bug when I reintroduce it. Fixes #8445. Closes #8446 (cherry picked from commit `7ffb0d826b`)	2021-04-20 12:49:10 +03:00
Yaron Kaikov	4a78d6403e	release: prepare for 4.5.rc1 scylla-4.5.rc1	2021-04-14 17:02:17 +03:00
Kamil Braun	2f20d52ac7	sstables: fix TWCS single key reader sstable filter The filter passed to `min_position_reader_queue`, which was used by `clustering_order_reader_merger`, would incorrectly include sstables as soon as they passed through the PK (bloom) filter, and would include sstables which didn't pass the PK filter (if they passed the CK filter). Fortunately this wouldn't cause incorrect data to be returned, but it would cause sstables to be opened unnecessarily (these sstables would immediately return eof), resulting in a performance drop. This commit fixes the filter and adds a regression test which uses statistics to check how many times the CK filter was invoked. Fixes #8432. Closes #8433 (cherry picked from commit `3687757115`)	2021-04-11 12:59:31 +03:00
Raphael S. Carvalho	540439ee46	sstable_set: Implement compound_sstable_set's create_single_key_sstable_reader() The compound set isn't overriding create_single_key_sstable_reader(), so the default implementation is always called. Although the default impl provides correct behavior, specialized ones which provide better perf, currently only available for TWCS, were being ignored. The compound set impl of the single key reader basically combines the single key readers of all sets managed by it. Fixes #8415. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210406205009.75020-1-raphaelsc@scylladb.com> (cherry picked from commit `8e0a1ca866`)	2021-04-11 12:59:31 +03:00

25836 Commits