scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 19:35:12 +00:00

Author	SHA1	Message	Date
Calle Wilund	d70cab0444	database: Do not assert on replay positions if truncate does not flush Fixes #6995 In `c2c6c71` the assert on replay positions in flushed sstables discarded by truncate was broken, by the fact that we no longer flush all sstables unless auto snapshot is enabled. This means the low_mark assertion does not hold, because we maybe/probably never got around to creating the sstables that would hold said mark. Note that the (old) change to not create sstables and then just delete them is in itself good. But in that case we should not try to verify the rp mark. (cherry picked from commit `9620755c7f`)	2020-08-11 00:00:43 +03:00
Avi Kivity	0ce3799187	Update seastar submodule * seastar 4641f4f2d3...2775a54dcb (1): > memory: fix small aligned free memory corruption Fixes #6831	2020-08-09 18:35:44 +03:00
Avi Kivity	ee113eca52	Merge 'hinted handoff: fix commitlog memory leak' from Piotr D " When commitlog is recreated in hints manager, only shutdown() method is called, but not release(). Because of that, some internal commitlog objects (`segment_manager` and `segment`s) may be left pointing to each other through shared_ptr reference cycles, which may result in memory leak when the parent commitlog object is destroyed. This PR prevents memory leaks that may happen this way by calling release() after shutdown() from the hints manager. Fixes: #6409, Fixes #6776 " * piodul-fix-commitlog-memory-leak-in-hinted-handoff: hinted handoff: disable warnings about segments left on disk hinted handoff: release memory on commitlog termination (cherry picked from commit `4c221855a1`)	2020-08-09 17:25:20 +03:00
Tomasz Grabiec	be11514985	thrift: Fix crash on unsorted column names in SlicePredicate The column names in SlicePredicate can be passed in arbitrary order. We converted them to clustering ranges in read_command preserving the original order. As a result, the clustering ranges in read command may appear out of order. This violates storage engine's assumptions and lead to undefined behavior. It was seen manifesting as a SIGSEGV or an abort in sstable reader when executing a get_slice() thrift verb: scylla: sstables/consumer.hh:476: seastar::future<> data_consumer::continuous_data_consumer<StateProcessor>::fast_forward_to(size_t, size_t) [with StateProcessor = sstables::data_consume_rows_context_m; size_t = long unsigned int]: Assertion `end >= _stream_position.position' failed. Fixes #6486. Tests: - added a new dtest to thrift_tests.py which reproduces the problem Message-Id: <1596725657-15802-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `bfd129cffe`)	2020-08-08 19:47:57 +03:00
Rafael Ávila de Espíndola	ec874bdc31	alternator: Fix use after return Avoid a copy of timeout so that we don't end up with a reference to a stack allocated variable. Fixes #6897 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200721184939.111665-1-espindola@scylladb.com> (cherry picked from commit `e83e91e352`)	2020-08-03 22:24:12 +03:00
Nadav Har'El	43169ffa2c	alternator: fix Expected's "NULL" operator with missing AttributeValueList The "NULL" operator in Expected (old-style conditional operations) doesn't have any parameters, so we insisted that the AttributeValueList be empty. However, we forgot to allow it to also be missing - a possibility which DynamoDB allows. This patch adds a test to reproduce this case (the test passes on DyanmoDB, fails on Alternator before this patch, and succeeds after this patch), and a fix. Fixes #6816. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200709161254.618755-1-nyh@scylladb.com> (cherry picked from commit `f549d147ea`)	2020-08-03 20:39:01 +03:00
Yaron Kaikov	c5ed14bff6	release: prepare for 4.2.rc2 scylla-4.2.rc2	2020-08-03 16:50:38 +03:00
Takuya ASADA	8366eda943	scylla_util.py: always use relocatable CLI tools On some CLI tools, command options may different between latest version vs older version. To maximize compatibility of setup scripts, we should always use relocatable CLI tools instead of distribution version of the tool. Related #6954 (cherry picked from commit `a19a62e6f6`)	2020-08-03 10:39:26 +03:00
Takuya ASADA	5d0b0dd4c4	create-relocatable-package.py: add lsblk for relocatable CLI tools We need latest version of lsblk that supported partition type UUID. Fixes #6954 (cherry picked from commit `6ba2a6c42e`)	2020-08-03 10:39:07 +03:00
Juliusz Stasiewicz	6f259be5f1	aggregate_fcts: Use per-type comparators for dynamic types For collections and UDTs the `MIN()` and `MAX()` functions are generated on the fly. Until now they worked by comparing just the byte representations of arguments. This patch uses specific per-type comparators to provide semantically sensible, dynamically created aggregates. Fixes #6768 (cherry picked from commit `5b438e79be`)	2020-08-03 10:26:02 +03:00
Calle Wilund	16e512e21c	cql3::lists: Fix setter_by_uuid not handing null value Fixes #6828 When using the scylla list index from UUID extension, null values were not handled properly causing throws from underlying layer. (cherry picked from commit `3b74b9585f`)	2020-08-03 10:19:13 +03:00
Avi Kivity	c61dc4e87d	tools: toolchain: regenerate for gcc 10.2 Fixes #6813. As a side effect, this also brings in xxhash 0.7.4. (matches commit `66c2b4c8bf`)	2020-07-31 08:48:12 +03:00
Takuya ASADA	af76a3ba79	scylla_post_install.sh: generate memory.conf for CentOS7 On CentOS7, systemd does not support percentage-based parameter. To apply memory parameter on CentOS7, we need to override the parameter in bytes, instead of percentage. Fixes #6783 (cherry picked from commit `3a25e7285b`)	2020-07-30 16:41:10 +03:00
Tomasz Grabiec	8fb5ebb2c6	commitlog: Fix use-after-free on mutation object during replay The mutation object may be freed prematurely during commitlog replay in the schema upgrading path. We will hit the problem if the memtable is full and apply_in_memory() needs to defer. This will typically manifest as a segfault. Fixes #6953 Introduced in `79935df` Tests: - manual using scylla binary. Reproduced the problem then verified the fix makes it go away Message-Id: <1596044010-27296-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `3486eba1ce`)	2020-07-30 16:36:42 +03:00
Takuya ASADA	bfb11defdd	scylla_setup: skip boot partition On GCE, /dev/sda14 reported as unused disk but it's BIOS boot partition, should not use for scylla data partition, also cannot use for it since it's too small. It's better to exclude such partiotion from unsed disk list. Fixes #6636 (cherry picked from commit `d7de9518fe`)	2020-07-29 09:48:10 +03:00
Asias He	2d1ddcbb6a	repair: Fix race between create_writer and wait_for_writer_done We saw scylla hit user after free in repair with the following procedure during tests: - n1 and n2 in the cluster - n2 ran decommission - n2 sent data to n1 using repair - n2 was killed forcely - n1 tried to remove repair_meta for n1 - n1 hit use after free on repair_meta object This was what happened on n1: 1) data was received -> do_apply_rows was called -> yield before create_writer() was called 2) repair_meta::stop() was called -> wait_for_writer_done() / do_wait_for_writer_done was called with _writer_done[node_idx] not engaged 3) step 1 resumed, create_writer() was called and _repair_writer object was referenced 4) repair_meta::stop() finished, repair_meta object and its member _repair_writer was destroyed 5) The fiber created by create_writer() at step 3 hit use after free on _repair_writer object To fix, we should call wait_for_writer_done() after any pending operations were done which were protected by repair_meta::_gate. This prevents wait for writer done finishes before the writer is in the process of being created. Fixes: #6853 Fixes: #6868 Backports: 4.0, 4.1, 4.2 (cherry picked from commit `e6f640441a`)	2020-07-29 09:48:10 +03:00
Raphael S. Carvalho	4c560b63f0	sstable: index_reader: Make sure streams are all properly closed on failure Turns out the fix `f591c9c710` wasn't enough to make sure all input streams are properly closed on failure. It only closes the main input stream that belongs to context, but it misses all the input streams that can be opened in the consumer for promote index reading. Consumer stores a list of indexes, where each of them has its own input stream. On failure, we need to make sure that every single one of them is properly closed before destroying the indexes as that could cause memory corruption due to read ahead. Fixes #6924. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200727182214.377140-1-raphaelsc@scylladb.com> (cherry picked from commit `0d70efa58e`)	2020-07-29 09:48:10 +03:00
Nadav Har'El	00155e32b1	merge: db/view: view_update_generator: make staging reader evictable Merged patch set by Botond Dénes: The view update generation process creates two readers. One is used to read the staging sstables, the data which needs view updates to be generated for, and another reader for each processed mutation, which reads the current value (pre-image) of each row in said mutation. The staging reader is created first and is kept alive until all staging data is processed. The pre-image reader is created separately for each processed mutation. The staging reader is not restricted, meaning it does not wait for admission on the relevant reader concurrency semaphore, but it does register its resource usage on it. The pre-image reader however is restricted. This creates a situation, where the staging reader possibly consumes all resources from the semaphore, leaving none for the later created pre-image reader, which will not be able to start reading. This will block the view building process meaning that the staging reader will not be destroyed, causing a deadlock. This patch solves this by making the staging reader restricted and making it evictable. To prevent thrashing -- evicting the staging reader after reading only a really small partition -- we only make the staging reader evictable after we have read at least 1MB worth of data from it. test/boost: view_build_test: add test_view_update_generator_buffering test/boost: view_build_test: add test test_view_update_generator_deadlock reader_permit: reader_resources: add operator- and operator+ reader_concurrency_semaphore: add initial_resources() test: cql_test_env: allow overriding database_config mutation_reader: expose new_reader_base_cost db/view: view_updating_consumer: allow passing custom update pusher db/view: view_update_generator: make staging reader evictable db/view: view_updating_consumer: move implementation from table.cc to view.cc database: add make_restricted_range_sstable_reader() Signed-off-by: Botond Dénes <bdenes@scylladb.com> (cherry picked from commit `f488eaebaf`) Fixes #6892.	2020-07-28 17:02:09 +03:00
Avi Kivity	b06dffcc19	Merge "messaging: make verb handler registering independent of current scheduling group" from Botond " `0c6bbc8` refactored `get_rpc_client_idx()` to select different clients for statement verbs depending on the current scheduling group. The goal was to allow statement verbs to be sent on different connections depending on the current scheduling group. The new connections use per-connection isolation. For backward compatibility the already existing connections fall-back to per-handler isolation used previously. The old statement connection, called the default statement connection, also used this. `get_rpc_client_idx()` was changed to select the default statement connection when the current scheduling group is the statement group, and a non-default connection otherwise. This inadvertently broke `scheduling_group_for_verb()` which also used this method to get the scheduling group to be used to isolate a verb at handle register time. This method needs the default client idx for each verb, but if verb registering is run under the system group it instead got the non-default one, resulting in the per-handler isolation not being set-up for the default statement connection, resulting in default statement verb handlers running in whatever scheduling group the process loop of the rpc is running in, which is the system scheduling group. This caused all sorts of problems, even beyond user queries running in the system group. Also as of `0c6bbc8` queries on the replicas are classified based on the scheduling group they are running on, so user reads also ended up using the system concurrency semaphore. In particular this caused severe problems with ranges scans, which in some cases ended up using different semaphores per page resulting in a crash. This could happen because when the page was read locally the code would run in the statement scheduling group, but when the request arrived from a remote coordinator via rpc, it was read in a system scheduling group. This caused a mismatch between the semaphore the saved reader was created with and the one the new page was read with. The result was that in some cases when looking up a paused reader from the wrong semaphore, a reader belonging to another read was returned, creating a disconnect between the lifecycle between readers and that of the slice and range they were referencing. This series fixes the underlying problem of the scheduling group influencing the verb handler registration, as well as adding some additional defenses if this semaphore mismatch ever happens in the future. Inactive read handles are now unique across all semaphores, meaning that it is not possible anymore that a handle succeeds in looking up a reader when used with the wrong semaphore. The range scan algorithm now also makes sure there is no semaphore mismatch between the one used for the current page and that of the saved reader from the previous page. I manually checked that each individual defense added is already preventing the crash from happening. Fixes: #6613 Fixes: #6907 Fixes: #6908 Tests: unit(dev), manual(run the crash reproducer, observe no crash) " * 'query-classification-regressions/v1' of https://github.com/denesb/scylla: multishard_mutation_query: use cached semaphore messaging: make verb handler registering independent of current scheduling group multishard_mutation_query: validate the semaphore of the looked-up reader reader_concurrency_semaphore: make inactive read handles unique across semaphores reader_concurrency_semaphore: add name() accessor reader_concurrency_semaphore: allow passing name to no-limit constructor (cherry picked from commit `3f84d41880`)	2020-07-27 17:41:51 +03:00
Botond Dénes	508e58ef9e	sstables: clamp estimated_partitions to [1, +inf) in writers In some cases estimated number of partitions can be 0, which is albeit a legit estimation result, breaks many low-level sstable writer code, so some of these have assertions to ensure estimated partitions is > 0. To avoid hitting this assert all users of the sstable writers do the clamping, to ensure estimated partitions is at least 1. However leaving this to the callers is error prone as #6913 has shown it. As this clamping is standard practice, it is better to do it in the writers themselves, avoiding this problem altogether. This is exactly what this patch does. It also adds two unit tests, one that reproduces the crash in #6913, and another one that ensures all sstable writers are fine with estimated partitions being 0 now. Call sites previously doing the clamping are changed to not do it, it is unnecessary now as the writer does it itself. Fixes #6913 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200724120227.267184-1-bdenes@scylladb.com> (cherry picked from commit `fe127a2155`)	2020-07-27 15:00:00 +03:00
Piotr Sarna	776faa809f	Merge 'view_update_generator: use partitioned sstable set' from Botond. Recently it was observed (#6603) that since 4e6400293ea, the staging reader is reading from a lot of sstables (200+). This consumes a lot of memory, and after this reaches a certain threshold -- the entire memory amount of the streaming reader concurrency semaphore -- it can cause a deadlock within the view update generation. To reduce this memory usage, we exploit the fact that the staging sstables are usually disjoint, and use the partitioned sstable set to create the staging reader. This should ensure that only the minimum number of sstable readers will be opened at any time. Refs: #6603 Fixes: #6707 Tests: unit(dev) * 'view-update-generator-use-partitioned-set/v1' of https://github.com/denesb/scylla: db/view: view_update_generator: use partitioned sstable set sstables: make_partitioned_sstable_set(): return an sstable_set (cherry picked from commit `e4b74356bb`)	2020-07-21 15:40:02 +03:00
Raphael S. Carvalho	7037f43a17	table: Fix Staging SSTables being incorrectly added or removed from the backlog tracker Staging SSTables can be incorrectly added or removed from the backlog tracker, after an ALTER TABLE or TRUNCATE, because the add and removal don't take into account if the SSTable requires view building, so a Staging SSTable can be added to the tracker after a ALTER table, or removed after a TRUNCATE, even though not added previously, potentially causing the backlog to become negative. Fixes #6798. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200716180737.944269-1-raphaelsc@scylladb.com> (cherry picked from commit `b67066cae2`)	2020-07-21 12:57:09 +03:00
Avi Kivity	bd713959ce	Update seastar submodule * seastar 8aad24a5f8...4641f4f2d3 (4): > httpd: Don't warn on ECONNABORTED > httpd: Avoid calling future::then twice on the same future Fixes #6709. > httpd: Use handle_exception instead of then_wrapped > httpd: Use std::unique_ptr instead of a raw pointer	2020-07-19 11:49:02 +03:00
Rafael Ávila de Espíndola	b7c5a918cb	mutation_reader_test: Wait for a future Nothing was waiting for this future. Found while testing another patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200630183929.1704908-1-espindola@scylladb.com> (cherry picked from commit `6fe7706fce`) Fixes #6858. scylla-4.2.rc1	2020-07-16 14:44:31 +03:00
Asias He	fb2ae9e66b	repair: Relax node selection in bootstrap when nodes are less than RF Consider a cluster with two nodes: - n1 (dc1) - n2 (dc2) A third node is bootstrapped: - n3 (dc2) The n3 fails to bootstrap as follows: [shard 0] init - Startup failed: std::runtime_error (bootstrap_with_repair: keyspace=system_distributed, range=(9183073555191895134, 9196226903124807343], no existing node in local dc) The system_distributed keyspace is using SimpleStrategy with RF 3. For the keyspace that does not use NetworkTopologyStrategy, we should not require the source node to be in the same DC. Fixes: #6744 Backports: 4.0 4.1, 4.2 (cherry picked from commit `38d964352d`)	2020-07-16 12:02:38 +03:00
Asias He	7a7ed8c65d	repair: Relax size check of get_row_diff and set_diff In case a row hash conflict, a hash in set_diff will get more than one row from get_row_diff. For example, Node1 (Repair master): row1 -> hash1 row2 -> hash2 row3 -> hash3 row3' -> hash3 Node2 (Repair follower): row1 -> hash1 row2 -> hash2 We will have set_diff = {hash3} between node1 and node2, while get_row_diff({hash3}) will return two rows: row3 and row3'. And the error below was observed: repair - Got error in row level repair: std::runtime_error (row_diff.size() != set_diff.size()) In this case, node1 should send both row3 and row3' to peer node instead of fail the whole repair. Because node2 does not have row3 or row3', otherwise node1 won't send row with hash3 to node1 in the first place. Refs: #6252 (cherry picked from commit `a00ab8688f`)	2020-07-15 14:48:49 +03:00
Nadav Har'El	7b9be752ec	alternator test: configurable temporary directory The test/alternator/run script creates a temporary directory for the Scylla database in /tmp. The assumption was that this is the fastest disk (usually even a ramdisk) on the test machine, and we didn't need anything else from it. But it turns out that on some systems, /tmp is actually a slow disk, so this patch adds a way to configure the temporary directory - if the TMPDIR environment variable exists, it is used instead of /tmp. As before this patch, a temporary subdirectry is created in $TMPDIR, and this subdirectory is automatically deleted when the test ends. The test.py script already passes an appropriate TMPDIR (testlog/$mode), which after this patch the Alternator test will use instead of /tmp. Fixes #6750 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200713193023.788634-1-nyh@scylladb.com> (cherry picked from commit `8e3be5e7d6`)	2020-07-14 12:34:26 +03:00
Konstantin Osipov	903e967a16	Export TMPDIR pointing at subdir of testlog/ Export TMPDIR environment variable pointing at a subdir of testlog. This variable is used by seastar/scylla tests to create a a subdirectory with temporary test data. Normally a test cleans up the temporary directory, but if it crashes or is killed the directory remains. By resetting the default location from /tmp to testlog/{mode} we allow test.py we consolidate all test artefacts in a single place. Fixes #6062, "test.py uses tmpfs" (cherry picked from commit `e628da863d`)	2020-07-14 12:34:06 +03:00
Avi Kivity	b84946895c	Update seastar submodule * seastar 1e762652c4...8aad24a5f8 (2): > futures: Add a test for a broken promise in a parallel_for_each > future: Call set_to_broken_promise earlier Fixes #6749 (probably).	2020-07-13 20:08:16 +03:00
Asias He	a27188886a	repair: Switch to btree_set for repair_hash. In one of the longevity tests, we observed 1.3s reactor stall which came from repair_meta::get_full_row_hashes_source_op. It traced back to a call to std::unordered_set::insert() which triggered big memory allocation and reclaim. I measured std::unordered_set, absl::flat_hash_set, absl::node_hash_set and absl::btree_set. The absl::btree_set was the only one that seastar oversized allocation checker did not warn in my tests where around 300K repair hashes were inserted into the container. - unordered_set: hash_sets=295634, time=333029199 ns - flat_hash_set: hash_sets=295634, time=312484711 ns - node_hash_set: hash_sets=295634, time=346195835 ns - btree_set: hash_sets=295634, time=341379801 ns The btree_set is a bit slower than unordered_set but it does not have huge memory allocation. I do not measure real difference of total time to finish repair of the same dataset with unordered_set and btree_set. To fix, switch to absl btree_set container. Fixes #6190 (cherry picked from commit `67f6da6466`)	2020-07-13 10:09:23 +03:00
Dmitry Kropachev	51d4efc321	dist/common/scripts/scylla-housekeeping: wrap urllib.request with try ... except We could hit "cannot serialize '_io.BufferedReader' object" when request get 404 error from the server Now you will get legit error message in the case. Fixes #6690 (cherry picked from commit `de82b3efae`)	2020-07-09 18:24:55 +03:00
Avi Kivity	0847eea8d6	Update seastar submodule * seastar 11e86172ba...1e762652c4 (1): > sharded: Do not hang on never set freed promise Fixes #6606.	2020-07-09 15:52:26 +03:00
Avi Kivity	35ad57cb9c	Point seastar submodule at scylla-seastar.git This allows us to backport seastar patches to 4.2.	2020-07-09 15:50:25 +03:00
Hagit Segev	42b0b9ad08	release: prepare for 4.2.rc1	2020-07-08 23:01:10 +03:00
Dejan Mircevski	68b95bf2ac	cql/restrictions: Handle `WHERE a>0 AND a<0` WHERE clauses with start point above the end point were handled incorrectly. When the slice bounds are transformed to interval bounds, the resulting interval is interpreted as wrap-around (because start > end), so it contains all values above 0 and all values below 0. This is clearly incorrect, as the user's intent was to filter out all possible values of a. Fix it by explicitly short-circuiting to false when start > end. Add a test case. Fixes #5799. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> (cherry picked from commit `921dbd0978`)	2020-07-08 13:20:10 +03:00
Botond Dénes	fea83f6ae0	db/view: view_update_generator: re-balance wait/signal on the register semaphore The view update generator has a semaphore to limit concurrency. This semaphore is waited on in `register_staging_sstable()` and later the unit is returned after the sstable is processed in the loop inside `start()`. This was broken by `4e64002`, which changed the loop inside `start()` to process sstables in per table batches, however didn't change the `signal()` call to return the amount of units according to the number of sstables processed. This can cause the semaphore units to dry up, as the loop can process multiple sstables per table but return just a single unit. This can also block callers of `register_staging_sstable()` indefinitely as some waiters will never be released as under the right circumstances the units on the semaphore can permanently go below 0. In addition to this, `4e64002` introduced another bug: table entries from the `_sstables_with_tables` are never removed, so they are processed every turn. If the sstable list is empty, there won't be any update generated but due to the unconditional `signal()` described above, this can cause the units on the semaphore to grow to infinity, allowing future staging sstables producers to register a huge amount of sstables, causing memory problems due to the amount of sstable readers that have to be opened (#6603, #6707). Both outcomes are equally bad. This patch fixes both issues and modifies the `test_view_update_generator` unit test to reproduce them and hence to verify that this doesn't happen in the future. Fixes: #6774 Refs: #6707 Refs: #6603 Tests: unit(dev) Signed-off-by: Botond DÃ©nes <bdenes@scylladb.com> Message-Id: <20200706135108.116134-1-bdenes@scylladb.com> (cherry picked from commit `5ebe2c28d1`)	2020-07-08 11:13:24 +03:00
Takuya ASADA	76618a7e06	scylla_setup: don't add same disk device twice We shouldn't accept adding same disk twice for RAID prompt. Fixes #6711 (cherry picked from commit `835e76fdfc`)	2020-07-07 13:07:59 +03:00
Takuya ASADA	189a08ac72	scylla_setup: follow hugepages package name change on Ubuntu 20.04LTS hugepages package now renamed to libhugetlbfs-bin, we need to follow the change. Fixes #6673 (cherry picked from commit `03ce19d53a`) scylla-4.2.rc0	2020-07-05 14:41:33 +03:00
Takuya ASADA	a3e9915a83	dist/debian: apply generated package version for .orig.tar.gz file We currently does not able to apply version number fixup for .orig.tar.gz file, even we applied correct fixup on debian/changelog, becuase it just reading SCYLLA-VERSION-FILE. We should parse debian/{changelog,control} instead. Fixes #6736 (cherry picked from commit `a107f086bc`)	2020-07-05 14:08:37 +03:00
Asias He	e4bc14ec1a	boot_strapper: Ignore node to be replaced explicitly as stream source After commit `7d86a3b208` (storage_service: Make replacing node take writes), during replace operation, tokens in _token_metadata for node being replaced are updated only after the replace operation is finished. As a result, in range_streamer::add_ranges, the node being replaced will be considered as a source to stream data from. Before commit `7d86a3b208`, the node being replaced will not be considered as a source node because it is already replaced by the replacing node before the replace operation is finished. This is the reason why it works in the past. To fix, filter out the node being replaced as a source node explicitly. Tests: replace_first_boot_test and replace_stopped_node_test Backports: 4.1 Fixes: #6728 (cherry picked from commit e338028b7e22b0a80be7f80c337c52f958bfe1d7)	2020-07-01 14:36:43 +03:00
Takuya ASADA	972acb6d56	scylla_swap_setup: handle <1GB environment Show better error message and exit with non-zero status when memory size <1GB. Fixes #6659 (cherry picked from commit `a9de438b1f`)	2020-07-01 12:40:25 +03:00
Yaron Kaikov	7fbfedf025	dist/docker/redhat/Dockerfile: update 4.2 params Set SCYLLA_REPO and VERSION values for scylla-4.2	2020-06-30 13:09:06 +03:00
Avi Kivity	5f175f8103	Merge "Fix handling of decimals with negative scales" from Rafael " Before this series scylla would effectively infinite loop when, for example, casting a decimal with a negative scale to float. Fixes #6720 " * 'espindola/fix-decimal-issue' of https://github.com/espindola/scylla: big_decimal: Add a test for a corner case big_decimal: Correctly handle negative scales big_decimal: Add a as_rational member function big_decimal: Move constructors out of line (cherry picked from commit `3e2eeec83a`)	2020-06-29 12:05:17 +03:00
Benny Halevy	674ad6656a	comapction: restore % in compaction completion message The % sign fell off in `c4841fa735` Fixes #6727. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200625151352.736561-1-bhalevy@scylladb.com> (cherry picked from commit `a843945115`)	2020-06-28 12:10:21 +03:00
Hagit Segev	58498b4b6c	release: prepare for 4.2.rc0	2020-06-26 13:06:07 +03:00
Raphael S. Carvalho	b17d20b5f4	reshape: LCS: avoid unnecessary work on level 0 No need to sort level 0 as we only check if levels > 0 are disjoint. Also taking the opportunity to avoid copies when sorting. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200624151921.20160-1-raphaelsc@scylladb.com>	2020-06-24 18:27:22 +03:00
Rafael Ávila de Espíndola	67c22c8697	commitlog::read_log_file: Don't discard a future This makes the code a bit easier to read as there are no discarded futures and no references to having to keep a subscription alive, which we don't with current seastar. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200527013120.179763-1-espindola@scylladb.com>	2020-06-24 17:22:29 +03:00
Botond Dénes	5ff6ac52b2	scylla-gdb.py: collection element func: accept references and pointers to collections Add support to references (both lvalue and rvalue) and pointers to collections as well, in addition to plain values. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200624101305.428925-1-bdenes@scylladb.com>	2020-06-24 13:31:18 +03:00
Avi Kivity	a9c7a1a86c	Merge "repair: row_level: prevent deadlocks when repairing homogenous nodes" from Botond " Row level repair, when using a local reader, is prone to deadlocking on the streaming reader concurrency semaphore. This has been observed to happen with at least two participating nodes, running more concurrent repairs than the maximum allowed amount of reads by the concurrency semaphore. In this situation, it is possible that two repair instances, competing for the last available permits on both nodes, get a permit on one of the nodes and get queued on the other one respectively. As neither will let go of the permit it already acquired, nor give up waiting on the failed-to-acquired permit, a deadlock happens. To prevent this, we make the local repair reader evictable. For this we reuse the already existing evictable reader mechanism of the multishard combining reader. This patchset refactors this evictable reader mechanism into a standalone flat mutation reader, then exposes it to the outside world. The repair reader is paused after the repair buffer is filled, which is currently 32MB, so the cost of a possible reader recreation is amortized over 32MB read. The repair reader is said to be local, when it can use the shard-local partitioner. This is the case if the participating nodes are homogenous (their shard configuration is identical), that is the repair instance has to read just from one shard. A non-local reader uses the multishard reader, which already makes its shard readers evictable and hence is not prone to the deadlock described here. Fixes: #6272 Tests: unit(dev, release, debug) " * 'repair-row-level-evictable-local-reader/v3' of https://github.com/denesb/scylla: repair: row_level: destroy reader on EOS or error repair: row_level: use evictable_reader for local reads mutation_reader: expose evictable_reader mutation_reader: evictable_reader: add auto_pause flag mutation_reader: make evictable_reader a flat_mutation_reader mutation_reader: s/inactive_shard_read/inactive_evictable_reader/ mutation_reader: move inactive_shard_reader code up mutation_reader: fix indentation mutation_reader: shard_reader: extract remote_reader as evictable_reader mutation_reader: reader_lifecycle_policy: make semaphore() available early	2020-06-24 12:55:34 +03:00
Piotr Sarna	c2939c67b2	test: add a case for local altering of distributed tables Local altering, which does not propagate the change to other nodes, should not be allowed for a non-local table. Refs #6700 Message-Id: <34a2b191c0e827f296e6d720dc31bf8bda0fd160.1592990796.git.sarna@scylladb.com>	2020-06-24 12:51:41 +03:00

1 2 3 4 5 ...

22587 Commits