scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-05 06:23:03 +00:00

Author	SHA1	Message	Date
Botond Dénes	5d6a7272e7	sstables: clamp estimated_partitions to [1, +inf) in writers In some cases estimated number of partitions can be 0, which is albeit a legit estimation result, breaks many low-level sstable writer code, so some of these have assertions to ensure estimated partitions is > 0. To avoid hitting this assert all users of the sstable writers do the clamping, to ensure estimated partitions is at least 1. However leaving this to the callers is error prone as #6913 has shown it. As this clamping is standard practice, it is better to do it in the writers themselves, avoiding this problem altogether. This is exactly what this patch does. It also adds two unit tests, one that reproduces the crash in #6913, and another one that ensures all sstable writers are fine with estimated partitions being 0 now. Call sites previously doing the clamping are changed to not do it, it is unnecessary now as the writer does it itself. Fixes #6913 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200724120227.267184-1-bdenes@scylladb.com> [avi: adjust sstable_datafile_test's use of compaction_descriptor and make_permit] (cherry picked from commit `fe127a2155`)	2020-07-28 09:55:34 +03:00
Piotr Sarna	96625fa54b	Merge 'view_update_generator: use partitioned sstable set' from Botond. Recently it was observed (#6603) that since 4e6400293ea, the staging reader is reading from a lot of sstables (200+). This consumes a lot of memory, and after this reaches a certain threshold -- the entire memory amount of the streaming reader concurrency semaphore -- it can cause a deadlock within the view update generation. To reduce this memory usage, we exploit the fact that the staging sstables are usually disjoint, and use the partitioned sstable set to create the staging reader. This should ensure that only the minimum number of sstable readers will be opened at any time. Refs: #6603 Fixes: #6707 Tests: unit(dev) * 'view-update-generator-use-partitioned-set/v1' of https://github.com/denesb/scylla: db/view: view_update_generator: use partitioned sstable set sstables: make_partitioned_sstable_set(): return an sstable_set (cherry picked from commit `e4b74356bb`)	2020-07-21 15:41:46 +03:00
Raphael S. Carvalho	4f5f404619	table: Fix Staging SSTables being incorrectly added or removed from the backlog tracker Staging SSTables can be incorrectly added or removed from the backlog tracker, after an ALTER TABLE or TRUNCATE, because the add and removal don't take into account if the SSTable requires view building, so a Staging SSTable can be added to the tracker after a ALTER table, or removed after a TRUNCATE, even though not added previously, potentially causing the backlog to become negative. Fixes #6798. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200716180737.944269-1-raphaelsc@scylladb.com> (cherry picked from commit `b67066cae2`)	2020-07-21 12:57:28 +03:00
Asias He	cd4502ee64	repair: Relax size check of get_row_diff and set_diff In case a row hash conflict, a hash in set_diff will get more than one row from get_row_diff. For example, Node1 (Repair master): row1 -> hash1 row2 -> hash2 row3 -> hash3 row3' -> hash3 Node2 (Repair follower): row1 -> hash1 row2 -> hash2 We will have set_diff = {hash3} between node1 and node2, while get_row_diff({hash3}) will return two rows: row3 and row3'. And the error below was observed: repair - Got error in row level repair: std::runtime_error (row_diff.size() != set_diff.size()) In this case, node1 should send both row3 and row3' to peer node instead of fail the whole repair. Because node2 does not have row3 or row3', otherwise node1 won't send row with hash3 to node1 in the first place. Refs: #6252 (cherry picked from commit `a00ab8688f`)	2020-07-15 14:49:09 +03:00
Hagit Segev	3e6c6d5f58	release: prepare for 4.1.2 scylla-4.1.2	2020-07-14 23:56:02 +03:00
Avi Kivity	564b4c32b0	Update seastar submodule * seastar 78f626af6c...c9c1dc5fa7 (2): > futures: Add a test for a broken promise in a parallel_for_each > future: Call set_to_broken_promise earlier Fixes #6749 (probably).	2020-07-13 20:17:54 +03:00
Dmitry Kropachev	dfafc4e1a9	dist/common/scripts/scylla-housekeeping: wrap urllib.request with try ... except We could hit "cannot serialize '_io.BufferedReader' object" when request get 404 error from the server Now you will get legit error message in the case. Fixes #6690 (cherry picked from commit `de82b3efae`)	2020-07-09 18:25:16 +03:00
Dejan Mircevski	db286c5ca4	cql/restrictions: Handle `WHERE a>0 AND a<0` WHERE clauses with start point above the end point were handled incorrectly. When the slice bounds are transformed to interval bounds, the resulting interval is interpreted as wrap-around (because start > end), so it contains all values above 0 and all values below 0. This is clearly incorrect, as the user's intent was to filter out all possible values of a. Fix it by explicitly short-circuiting to false when start > end. Add a test case. Fixes #5799. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> (cherry picked from commit `921dbd0978`)	2020-07-08 13:21:00 +03:00
Botond Dénes	519fcd4729	db/view: view_update_generator: re-balance wait/signal on the register semaphore The view update generator has a semaphore to limit concurrency. This semaphore is waited on in `register_staging_sstable()` and later the unit is returned after the sstable is processed in the loop inside `start()`. This was broken by `4e64002`, which changed the loop inside `start()` to process sstables in per table batches, however didn't change the `signal()` call to return the amount of units according to the number of sstables processed. This can cause the semaphore units to dry up, as the loop can process multiple sstables per table but return just a single unit. This can also block callers of `register_staging_sstable()` indefinitely as some waiters will never be released as under the right circumstances the units on the semaphore can permanently go below 0. In addition to this, `4e64002` introduced another bug: table entries from the `_sstables_with_tables` are never removed, so they are processed every turn. If the sstable list is empty, there won't be any update generated but due to the unconditional `signal()` described above, this can cause the units on the semaphore to grow to infinity, allowing future staging sstables producers to register a huge amount of sstables, causing memory problems due to the amount of sstable readers that have to be opened (#6603, #6707). Both outcomes are equally bad. This patch fixes both issues and modifies the `test_view_update_generator` unit test to reproduce them and hence to verify that this doesn't happen in the future. Fixes: #6774 Refs: #6707 Refs: #6603 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200706135108.116134-1-bdenes@scylladb.com> (cherry picked from commit `5ebe2c28d1`)	2020-07-08 12:00:12 +03:00
Juliusz Stasiewicz	9bcbcbbcf2	counters: Read the state under timeout Counter update is a RMW operation. Until now the "Read" part was not guarded by a timeout, which is changed in this patch. Fixes #5069 (cherry picked from commit `e04fd9f774`)	2020-07-07 20:45:01 +03:00
Takuya ASADA	c622e5bfab	scylla_setup: don't add same disk device twice We shouldn't accept adding same disk twice for RAID prompt. Fixes #6711 (cherry picked from commit `835e76fdfc`)	2020-07-07 13:08:22 +03:00
Nadav Har'El	905643bbc2	docker: add option to start Alternator with HTTPS We already have a docker image option to enable alternator on an unencrypted port, "--alternator-port", but we forgot to also allow the similar option for enabling alternator on an encrypted (HTTPS) port: "--alternator-https-port" so this patch adds the missing option, and documents how to use it. Note that using this option is not enough. When this option is used, Alternator also requires two files, /etc/scylla/scylla.crt and /etc/scylla/scylla.key, to be inserted into the image. These files should contain the SSL certificate, and key, respectively. If these files are missing, you will get an error in the log about the missing file. Fixes #6583. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200621125219.12274-1-nyh@scylladb.com> (cherry picked from commit `e4eca5211a`)	2020-07-06 08:22:22 +02:00
Juliusz Stasiewicz	d396a298d6	cdc: Fix segfault when stream ID key is too short When a token is calculated for stream_id, we check that the key is exactly 16 bytes long. If it's not - `minimum_token` is returned and client receives empty result. This used to be the expected behavior for empty keys; now it's extended to keys of any incorrect length. Fixes #6570 (cherry picked from commit `8628ede009`)	2020-07-05 15:09:44 +03:00
Asias He	1d9bbbc957	boot_strapper: Ignore node to be replaced explicitly as stream source After commit `7d86a3b208` (storage_service: Make replacing node take writes), during replace operation, tokens in _token_metadata for node being replaced are updated only after the replace operation is finished. As a result, in range_streamer::add_ranges, the node being replaced will be considered as a source to stream data from. Before commit `7d86a3b208`, the node being replaced will not be considered as a source node because it is already replaced by the replacing node before the replace operation is finished. This is the reason why it works in the past. To fix, filter out the node being replaced as a source node explicitly. Tests: replace_first_boot_test and replace_stopped_node_test Backports: 4.1 Fixes: #6728 (cherry picked from commit e338028b7e22b0a80be7f80c337c52f958bfe1d7) scylla-4.1.1	2020-07-01 14:35:28 +03:00
Raphael S. Carvalho	4f1878803e	compaction: Fix the 2x disk space requirement in SSTable upgrade SSTable upgrade is requiring 2x the space of input SSTables because we aren't releasing references of the SSTables that were already upgraded. So if we're upgrading 1TB, it means that up to 2TB may be required for the upgrade operation to succeed. That can be fixed by moving all input SSTables when rewrite_sstables() asks for the set of SSTables to be compacted, so allowing their space to be released as soon as there is no longer any ref to them. Spotted while auditing code. Fixes #6682. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200619205701.92891-1-raphaelsc@scylladb.com> (cherry picked from commit `52180f91d4`)	2020-07-01 12:36:52 +03:00
Avi Kivity	c5e2fad1c8	Merge "Fix handling of decimals with negative scales" from Rafael " Before this series scylla would effectively infinite loop when, for example, casting a decimal with a negative scale to float. Fixes #6720 " * 'espindola/fix-decimal-issue' of https://github.com/espindola/scylla: big_decimal: Add a test for a corner case big_decimal: Correctly handle negative scales big_decimal: Add a as_rational member function big_decimal: Move constructors out of line (cherry picked from commit `3e2eeec83a`)	2020-06-29 12:05:39 +03:00
Hagit Segev	abd0fa52c0	release: prepare for 4.1.1	2020-06-25 08:06:32 +03:00
Piotr Sarna	dfa464c35b	alternator: fix propagating tags Updating tags was erroneously done locally, which means that the schema change was not propagated to other nodes. The new code announces new schema globally. Fixes #6513 Branches: 4.0,4.1 Tests: unit(dev) dtest(alternator_tests.AlternatorTest.test_update_condition_expression_and_write_isolation) Message-Id: <3a816c4ecc33c03af4f36e51b11f195c231e7ce1.1592935039.git.sarna@scylladb.com> (cherry picked from commit `f4e8cfe03b`)	2020-06-24 13:56:09 +03:00
Avi Kivity	be29b35c4b	Merge 'range_streamer: Handle table of RF 1 in get_range_fetch_map' from Asias " After "Make replacing node take writes" series, with repair based node operations disabled, we saw the replace operation fail like: ``` [shard 0] init - Startup failed: std::runtime_error (unable to find sufficient sources for streaming range (9203926935651910749, +inf) in keyspace system_auth) ``` The reason is the system_auth keyspace has default RF of 1. It is impossible to find a source node to stream from for the ranges owned by the replaced node. In the past, the replace operation with keyspace of RF 1 passes, because the replacing node calls token_metadata.update_normal_tokens(tokens, ip_of_replacing_node) before streaming. We saw: ``` [shard 0] range_streamer - Bootstrap : keyspace system_auth range (-9021954492552185543, -9016289150131785593] exists on {127.0.0.6} ``` Node 127.0.0.6 is the replacing node 127.0.0.5. The source node check in range_streamer::get_range_fetch_map will pass if the source is the node itself. However, it will not stream from the node itself. As a result, the system_auth keyspace will not get any data. After the "Make replacing node take writes" series, the replacing node calls token_metadata.update_normal_tokens(tokens, ip_of_replacing_node) after the streaming finishes. We saw: ``` [shard 0] range_streamer - Bootstrap : keyspace system_auth range (-9049647518073030406, -9048297455405660225] exists on {127.0.0.5} ``` Since 127.0.0.5 was dead, the source node check failed, and so did the bootstrap operation. To fix, we ignore the table of RF 1 when it is unable to find a source node to stream. Fixes #6351 " * asias-fix_bootstrap_with_rf_one_in_range_streamer: range_streamer: Handle table of RF 1 in get_range_fetch_map streaming: Use separate streaming reason for replace operation (cherry picked from commit `9afd599d7c`)	2020-06-23 13:53:03 +03:00
Asias He	97b7024c0c	streaming: Do not send end of stream in case of error Current sender sends stream_mutation_fragments_cmd::end_of_stream to receiver when an error is received from a peer node. To be safe, send stream_mutation_fragments_cmd::error instead of stream_mutation_fragments_cmd::end_of_stream to prevent end_of_stream to be written into the sstable when a partition is not closed yet. In addition, use mutation_fragment_stream_validator to valid the mutation fragments emitted from the reader, e.g., check if partition_start and partition_end are paired when the reader is done. If not, fail the stream session and send stream_mutation_fragments_cmd::error instead of stream_mutation_fragments_cmd::end_of_stream to isolate the problematic sstables on the sender node. Refs: #6478 (cherry picked from commit `a521c429e1`)	2020-06-23 12:47:35 +03:00
Alejo Sanchez	194ff1d226	lwt: validate before constructing metadata LWT batches conditions can't span multiple tables. This was detected in batch_statement::validate() called in ::prepare(). But ::cas_result_set_metadata() was built in the constructor, causing a bitset assert/crash in a reported scenario. This patch moves validate() to the constructor before building metadata. Closes #6332 Tested with https://github.com/scylladb/scylla-dtest/pull/1465 [avi: adjust spelling of exception message to 4.1 spelling] Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> (cherry picked from commit `d1521e6721`)	2020-06-21 18:20:41 +03:00
Gleb Natapov	b8f7fb35e1	cql transport: do not log broken pipe error when a client closes its side of a connection abruptly Fixes #5661 Message-Id: <20200615075958.GL335449@scylladb.com> (cherry picked from commit `7ca937778d`)	2020-06-21 13:08:58 +03:00
Amnon Heiman	f7d53ff607	api/storage_service.cc: stream result of token_range The get token range API can become big which can cause large allocation and stalls. This patch replace the implementation so it would stream the results using the http stream capabilities instead of serialization and sending one big buffer. Fixes #6297 Signed-off-by: Amnon Heiman <amnon@scylladb.com> (cherry picked from commit `7c4562d532`)	2020-06-21 12:57:15 +03:00
Rafael Ávila de Espíndola	eb190643f8	configure: Reduce the dynamic linker path size gdb has a SO_NAME_MAX_PATH_SIZE of 512, so we use that as the path size. Fixes: #6494 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200528202741.398695-2-espindola@scylladb.com> (cherry picked from commit `aa778ec152`)	2020-06-21 12:26:51 +03:00
Piotr Sarna	3f8345f1b8	alternator: fix the return type of PutItem Even if there are no attributes to return from PutItem requests, we should return a valid JSON object, not an empty string. Fixes #6568 Tests: unit(dev) (cherry picked from commit `8fc3ca855e`)	2020-06-21 12:21:19 +03:00
Piotr Sarna	891a3fa243	alternator: fix returning UnprocessedKeys unconditionally Client libraries (e.g. PynamoDB) expect the UnprocessedKeys and UnprocessedItems attributes to appear in the response unconditionally - it's hereby added, along with a simple test case. Fixes #6569 Tests: unit(dev) (cherry picked from commit `3aff52f56e`)	2020-06-21 12:19:18 +03:00
Tomasz Grabiec	db31542805	row_cache: Fix undefined behavior on key linearization This is relevant only when using partition or clustering keys which have a representation in memory which is larger than 12.8 KB (10% of LSA segment size). There are several places in code (cache, background garbage collection) which may need to linearize keys because of performing key comparison, but it's not done safely: 1) the code does not run with the LSA region locked, so pointers may get invalidated on linearization if it needs to reclaim memory. This is fixed by running the code inside an allocating section. 2) LSA region is locked, but the scope of with_linearized_managed_bytes() encloses the allocating section. If allocating section needs to reclaim, linearization context will contain invalidated pointers. The fix is to reorder the scopes so that linearization context lives within an allocating section. Example of 1 can be found in range_populating_reader::handle_end_of_stream() where it performs a lookup: auto prev = std::prev(it); if (prev->key().equal(_cache._schema, _last_key->_key)) { it->set_continuous(true); but handle_end_of_stream() is not invoked under allocating section. Example of 2 can be found in mutation_cleaner_impl::merge_some() where it does: return with_linearized_managed_bytes([&] { ... return _worker_state->alloc_section(region, [&] { Fixes #6637. Refs #6108. Tests: - unit (all) Message-Id: <1592218544-9435-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `e81fc1f095`)	2020-06-21 11:56:31 +03:00
Yaron Kaikov	b443b2574a	release: prepare for 4.1.0 scylla-4.1.0	2020-06-18 14:42:57 +03:00
Asias He	2ee321d88e	gossip: Do not send shutdown message when a node is in unknown status When a replacing node is in early boot up and is not in HIBERNATE state yet, if the node is killed by a user, the node will wrongly send a shutdown message to other nodes. This is because UNKNOWN is not in SILENT_SHUTDOWN_STATES, so in gossiper::do_stop_gossiping, the node will send shutdown message. Other nodes in the cluster will call storage_service::handle_state_normal for this node, since NORMAL and SHUTDOWN status share the same status handler. As a result, other nodes will incorrectly think the node is part of the cluster and the replace operation is finished. Such problem was seen in replace_node_no_hibernate_state_test dtest: n1, n2 are in the cluster n2 is dead n3 is started to replace n2, but n3 is killed in the middle n3 announces SHUTDOWN status wrongly n1 runs storage_service::handle_state_normal for n3 n1 get tokens for n3 which is empty, because n3 hasn't gossip tokens yet n1 skips update normal tokens for n3, but think n3 has replaced n2 n4 starts to replace n2 n4 checks the tokens for n2 in storage_service::join_token_ring (Cannot replace token {} which does not exist!) or storage_service::prepare_replacement_info (Cannot replace_address {} because it doesn't exist in gossip) To fix, we add UNKNOWN into SILENT_SHUTDOWN_STATES and avoid sending shutdown message. Tests: replace_address_test.py:TestReplaceAddress.replace_node_no_hibernate_state_test Fixes: #6436 (cherry picked from commit `dddde33512`)	2020-06-16 15:03:48 +03:00
Avi Kivity	4563f4b992	tools: toolchain: regenerate for gnutls 3.6.14 CVE-2020-13777. Fixes #6627. Toolchain source image registry disambiguated due to tighter podman defaults.	2020-06-15 07:49:21 +03:00
Kamil Braun	81dc8eeec7	cdc: rename CDC description tables Commit `968177da04` has changed the schema of cdc_topology_description and cdc_description tables in the system_distributed keyspace. Unfortunately this was a backwards-incompatible change: these tables would always be created, irrespective of whether or not "experimental" was enabled. They just wouldn't be populated with experimental=off. If the user now tries to upgrade Scylla from a version before this change to a version after this change, it will work as long as CDC is protected by the experimental flag and the flag is off. However, if we drop the flag, or if the user turns experimental on, weird things will happen, such as nodes refusing to start because they try to populate cdc_topology_description while assuming a different schema for this table. The simplest fix for this problem is to rename the tables. This fix must get merged in before CDC goes out of experimental. If the user upgrades his cluster from a pre-rename version, he will simply have two garbage tables that he is free to delete after upgrading. sstables and digests need to be regenerated for schema_digest_test since this commit effectively adds new tables to the system_distributed keyspace. This doesn't result in schema disagreement because the table is announced to all nodes through the migration manager. (cherry picked from commit `d89b7a0548`) Fixes #6537.	2020-06-14 09:15:36 +03:00
Raphael S. Carvalho	2d72f7d8e5	compaction: Disable garbage collected writer if interposer consumer is used GC writer, used for incremental compaction, cannot be currently used if interposer consumer is used. That's because compaction assumes that GC writer will be operated only by a single compaction writer at a given point in time. With interposer consumer, multiple writers will concurrently operate on the same GC writer, leading to race condition which potentially result in use-after-free. Let's disable GC writer if interposer consumer is enabled. We're not losing anything because GC writer is currently only needed on strategies which don't implement an interposer consumer. Resharding will always disable GC writer, which is the expected behavior because it doesn't support incremental compaction yet. The proper fix, which allows GC writer and interposer consumer to work together, will require more time to implement and test, and for that reason, I am postponing it as #6472 is a showstopper for the current release. Fixes #6472. tests: mode(dev). [Raphael: Fixed compilation failure in unit test test_bug_6472 for backport] Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Glauber Costa <glauber@scylladb.com> (cherry picked from commit `097a5e9e07`) Message-Id: <20200610203928.86717-1-raphaelsc@scylladb.com>	2020-06-11 13:21:56 +03:00
Takuya ASADA	c6ee86b512	aws: update enhanced networking supported instance list Sync enhanced networking supported instance list to latest one. Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html Fixes #6540 (cherry picked from commit `969c4258cf`)	2020-06-09 16:02:09 +03:00
Hagit Segev	67348cd6e8	release: prepare for 4.1.rc2 scylla-4.1.rc2	2020-06-08 16:37:36 +03:00
Israel Fruchter	44cc4843f1	fix "scylla_coredump_setup: Remove the coredump create by the check" In 28c3d4 `out()` was used without `shell=True` and the splitting of arguments failed because of the complex commands in the cmd (pipe and such) Fixes #6159 (cherry picked from commit `a2bb48f44b`)	2020-06-04 20:54:51 +03:00
Israel Fruchter	f1f5586bf6	scylla_coredump_setup: Remove the coredump create by the check We generate a coredump as part of "scylla_coredump_setup" to verify that coredumps are working. However, we need to remove that test coredump to avoid people and test infrastructure reporting those coredumps. Fixes #6159 (cherry picked from commit `28c3d4f8e8`)	2020-06-03 16:52:51 +03:00
Amos Kong	3a447cd755	active the coredump directory mount during coredump setup Currently we use a systemd mount (var-lib-systemd-coredump.mount) to mount default coredump directory (/var/lib/systemd/coredump) to (/var/lib/scylla/coredump). The /var/lib/scylla had been mounted to a big storage, so we will have enough space for coredump after the mount. Currently in coredump_setup, we only enabled var-lib-systemd-coredump.mount, but not start it. The directory won't be mounted after coredump_setup, so the coredump will still be saved to default coredump directory. The mount will only take effect after reboot. Fixes #6566 (cherry picked from commit `abf246f6e5`)	2020-06-03 09:25:59 +03:00
Pekka Enberg	176aa91be5	Revert "scylla_coredump_setup: Fix incorrect coredump directory mount" This reverts commit `e77dad3adf` because its incorrect. Amos explains: "Quote from https://www.freedesktop.org/software/systemd/man/systemd.mount.html What= Takes an absolute path of a device node, file or other resource to mount. See mount(8) for details. If this refers to a device node, a dependency on the respective device unit is automatically created. Where= Takes an absolute path of a file or directory for the mount point; in particular, the destination cannot be a symbolic link. If the mount point does not exist at the time of mounting, it is created as directory. So the mount point is '/var/lib/systemd/coredump' and '/var/lib/scylla/coredump' is the file to mount, because /var/lib/scylla had mounted a second big storage, which has enough space for Huge coredumps. Bentsi or other touched problem with old scylla-master AMI, a coredump occurred but not successfully saved to disk for enospc. The directory /var/lib/systemd/coredump wasn't mounted to /var/lib/scylla/coredump. They WRONGLY thought the wrong mount was caused by the config problem, so he posted a fix. Actually scylla-ami-setup / coredump wasn't executed on that AMI, err: unit scylla-ami-setup.service not found Because 'scylla-ami-setup.service' config file doesn't exist or is invalid. Details of my testing: https://github.com/scylladb/scylla/issues/6300#issuecomment-637324507 So we need to revert Bentsi's patch, it changed the right config to wrong." (cherry picked from commit `9d9d54c804`)	2020-06-03 09:25:49 +03:00
Avi Kivity	4a3eff17ff	Revert "Revert "config: Do not enable repair based node operations by default"" This reverts commit `71d0d58f8c`. Repair-based node operations are still not ready.	2020-06-02 18:08:03 +03:00
Nadav Har'El	2e00f6d0a1	alternator: fix support for bytes type in Query's KeyConditions Our parsing of values in a KeyConditions parameter of Query was done naively. As a result, we got bizarre error messages "condition not met: false" when these values had incorrect type (this is issue #6490). Worse - the naive conversion did not decode base64-encoded bytes value as needed, so KeyConditions on bytes-typed keys did not work at all. This patch fixes these bugs by using our existing utility function get_key_from_typed_value(), which takes care of throwing sensible errors when types don't match, and decoding base64 as needed. Unfortunately, we didn't have test coverage for many of the KeyConditions features including bytes keys, which is why this issue escaped detection. A patch will follow with much more comprehensive tests for KeyConditions, which also reproduce this issue and verify that it is fixed. Refs #6490 Fixes #6495 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200524141800.104950-1-nyh@scylladb.com> (cherry picked from commit `6b38126a8f`)	2020-05-31 13:53:45 +03:00
Nadav Har'El	bf509c3b16	alternator: add mandatory configurable write isolation mode Alternator supports four ways in which write operations can use quorum writes or LWT or both, which we called "write isolation policies". Until this patch, Alternator defaulted to the most generally safe policy, "always_use_lwt". This default could have been overridden for each table separately, but there was no way to change this default for all tables. This patch adds a "--alternator-write-isolation" configuration option which allows changing the default. Moreover, @dorlaor asked that users must explicitly choose this default mode, and not get "always_use_lwt" without noticing. The previous default, "always_use_lwt" supports any workload correctly but because it uses LWT for all writes it may be disappointingly slow for users who run write-only workloads (including most benchmarks) - such users might find the slow writes so disappointing that they will drop Scylla. Conversely, a default of "forbid_rmw" will be faster and still correct, but will fail on workloads which need read-modify-write operations - and surprise users that need these operations. So Dor asked that none of the write modes be made the default, and users must make an informed choice between the different write modes, rather than being disappointed by a default choice they weren't aware of. So after this patch, Scylla refuses to boot if Alternator is enabled but a "--alternator-write-isolation" option is missing. The patch also modifies the relevant documentation, adds the same option to our docker image, and modifies the test-running script test/alternator/run to run Scylla with the old default mode (always_use_lwt), which we need because we want to test RMW operations as well. Fixes #6452 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200524160338.108417-1-nyh@scylladb.com> (cherry picked from commit `c3da9f2bd4`)	2020-05-31 13:42:11 +03:00
Avi Kivity	84ef30752f	Update seastar submodule * seastar e708d1df3a...78f626af6c (1): > reactor: don't mlock all memory at once Fixes #6460.	2020-05-31 13:34:42 +03:00
Avi Kivity	f1b71ec216	Point seastar submodule at scylla-seastar.git This allows us to backport seastar patches to the 4.1 branch.	2020-05-31 13:34:42 +03:00
Piotr Sarna	93ed536fba	alternator: wait for schema agreement after table creation In order to be sure that all nodes acknowledged that a table was created, the CreateTable request will now only return after seeing that schema agreement was reached. Rationale: alternator users check if the table was created by issuing a DescribeTable request, and assume that the table was correctly created if it returns nonempty results. However, our current implementation of DescribeTable returns local results, which is not enough to judge if all the other nodes acknowledge the new table. CQL drivers are reported to always wait for schema agreement after issuing DDL-changing requests, so there should be no harm in waiting a little longer for alternator's CreateTable as well. Fixes #6361 Tests: alternator(local) (cherry picked from commit `5f2eadce09`)	2020-05-31 13:18:11 +03:00
Nadav Har'El	ab3da4510c	docs, alternator: improve description of status of global tables support The existing text did not explain what happens if additional DCs are added to the cluster, so this patch improves the explanation of the status of our support for global tables, including that issue. Fixes #6353 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200513175908.21642-1-nyh@scylladb.com> (cherry picked from commit `f3fd976120`)	2020-05-31 13:13:13 +03:00
Asias He	bb8fcbff68	repair: Abort the queue in write_end_of_stream in case of error In write_end_of_stream, it does: 1) Write write_partition_end 2) Write empty mutation_fragment_opt If 1) fails, 2) will be skipped, the consumer of the queue will wait for the empty mutation_fragment_opt forever. Found this issue when injecting random exceptions between 1) and 2). Refs #6272 Refs #6248 (cherry picked from commit `b744dba75a`)	2020-05-27 20:11:30 +03:00
Hagit Segev	af43d0c62d	release: prepare for 4.1.rc1 scylla-4.1.rc1	2020-05-26 18:57:30 +03:00
Amnon Heiman	8c8c266f67	storage_service: get_range_to_address_map prevent use after free The implementation of get_range_to_address_map has a default behaviour, when getting an empty keyspace, it uses the first non-system keyspace (first here is basically, just a keyspace). The current implementation has two issues, first, it uses a reference to a string that is held on a stack of another function. In other words, there's a use after free that is not clear why we never hit. The second, it calls get_non_system_keyspaces twice. Though this is not a bug, it's redundant (get_non_system_keyspaces uses a loop, so calling that function does have a cost). This patch solves both issues, by changing the implementation to hold a string instead of a reference to a string. Second, it stores the results from get_non_system_keyspaces and reuses them, which is more efficient and holds the returned values on the local stack. Fixes #6465 Signed-off-by: Amnon Heiman <amnon@scylladb.com> (cherry picked from commit `69a46d4179`)	2020-05-25 12:48:11 +03:00
Nadav Har'El	6d1301d93c	alternator: better error messages when 'forbid_rmw' mode is on When the 'forbid_rmw' write isolation policy is selected, read-modify-write are intentionally forbidden. The error message in this case used to say: "Read-modify-write operations not supported" Which can lead users to believe that this operation isn't supported by this version of Alternator - instead of realizing that this is in fact a configurable choice. So in this patch we just change the error message to say: "Read-modify-write operations are disabled by 'forbid_rmw' write isolation policy. Refer to https://github.com/scylladb/scylla/blob/master/docs/alternator/alternator.md#write-isolation-policies for more information." Fixes #6421. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200518125538.8347-1-nyh@scylladb.com> (cherry picked from commit `5ef9854e86`)	2020-05-25 08:49:48 +03:00
Tomasz Grabiec	be545d6d5d	sstables: index_reader: Fix overflow when calculating promoted index end When index file is larger than 4GB, offset calculation will overflow uint32_t and _promoted_index_end will be too small. As a result, promoted_index_size calculation will underflow and the rest of the page will be interpreted as a promoted index. The partitions which are in the remainder of the index page will not be found by single-partition queries. Data is not lost. Introduced in `6c5f8e0eda`. Fixes #6040 Message-Id: <20200521174822.8350-1-tgrabiec@scylladb.com> (cherry picked from commit `a6c87a7b9e`)	2020-05-24 09:45:42 +03:00


22072 Commits