scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 03:56:42 +00:00

Author	SHA1	Message	Date
Nadav Har'El	2796b0050d	storage_service: correct missing exception in logging rebuild failure When failing to rebuild a node, we would print the error with the useless explanation "<no exception>". The problem was a typo in the logging command which used std::current_exception() - which wasn't relevant at that point - instead of "ep". Refs #8089 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210314113118.1690132-1-nyh@scylladb.com> (cherry picked from commit `d73934372d`)	2021-03-21 10:51:23 +02:00
Nadav Har'El	6bc005643e	alternator-test: increase read timeout and avoid retries By default the boto3 library waits up to 60 seconds for a response, and if it gets no response, it sends the same request again, multiple times. We already noticed in the past that it retries too many times thus slowing down failures, so in our test configuration we lowered the number of retries to 3, but the setting of 60-second-timeout plus 3 retries still causes two problems: 1. When the test machine and the build are extremely slow, and the operation is long (usually, CreateTable or DeleteTable involving multiple views), the 60 second timeout might not be enough. 2. If the timeout is reached, boto3 silently retries the same operation. This retry may fail because the previous one really succeeded at least partially! The symptom is tests which report an error when creating a table which already exists, or deleting a table which doesn't exist. The solution in this patch is first of all to never do retries - if a query fails on internal server error, or times out, just report this failure immediately. We don't expect to see transient errors during local tests, so this is exactly the right behavior. The second thing we do is to increase the default timeout. If 1 minute was not enough, let's raise it to 5 minutes. 5 minutes should be enough for every operation (famous last words...). Even if 5 minutes is not enough for something, at least we'll now see the timeout errors instead of some weird errors caused by retrying an operation which was already almost done. Fixes #8135 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210222125630.1325011-1-nyh@scylladb.com> (cherry picked from commit `0b2cf21932`)	2021-03-19 00:09:17 +02:00
Raphael S. Carvalho	d591ff5422	LCS: reshape: tolerate more sstables in level 0 with relaxed mode Relaxed mode, used during initialization, of reshape only tolerates min_threshold (default: 4) L0 sstables. However, relaxed mode should tolerate more sstables in level 0, otherwise boot will have to reshape level 0 every time it crosses the min threshold. So let's make LCS reshape tolerate a max of max_threshold and 32. This change is beneficial because once table is populated, LCS regular compaction can decide to merge those sstables in level 0 into level 1 instead, therefore reducing WA. Refs #8297. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210318131442.17935-1-raphaelsc@scylladb.com> (cherry picked from commit `e53cedabb1`)	2021-03-18 19:19:58 +02:00
Raphael S. Carvalho	acb1c3eebf	compaction_manager: Fix performance of cleanup compaction due to unlimited parallelism Prior to `463d0ab`, only one table could be cleaned up at a time on a given shard. Since then, all tables belonging to a given keyspace are cleaned up in parallel. Cleanup serialization on each shard was enforced with a semaphore, which was incorrectly removed by the aforementioned patch. So space requirement for cleanup to succeed can be up to the size of keyspace, increasing the chances of node running out of space. Node could also run out of memory if there are tons of tables in the keyspace. Memory requirement is at least #_of_tables * 128k (not taking into account write behind, etc). With 5k tables, it's ~0.64G per shard. Also all tables being cleaned up in parallel will compete for the same disk and cpu bandwidth, making them all much slower, and consequently the operation time is significantly higher. This problem was detected with cleanup, but scrub and upgrade go through the same rewrite procedure, so they're affected by exactly the same problem. Fixes #8247. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210312162223.149993-1-raphaelsc@scylladb.com> (cherry picked from commit `7171244844`)	2021-03-18 14:29:20 +02:00
Dejan Mircevski	a04242ea62	cql3/expr: Handle `IN ?` bound to null Previously, we crashed when the IN marker is bound to null. Throw invalid_request_exception instead. This is a 4.3 backport of the #8265 fix. Tests: unit (dev) (cherry picked from commit `8db24fc03b`) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #8308	2021-03-18 10:39:19 +02:00
Nadav Har'El	7131c7c523	update tools/java submodule Backported fix for Refs #8229 into submodule. * tools/java f2e8666d7e...d49ae89b4b (1): > sstableloader: Only escape column names once Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-03-15 16:56:32 +02:00
Raphael S. Carvalho	6af7cf8a39	compaction: Prevent cleanup and regular from compacting the same sstable Due to regression introduced by `463d0ab`, regular can compact in parallel a sstable being compacted by cleanup, scrub or upgrade. This redundancy causes resources to be wasted, write amplification is increased and so does the operation time, etc. That's a potential source of data resurrection because the no-longer-owned data from a sstable being compacted by both cleanup and regular will still exist in the node afterwards, so resurrection can happen if node regains ownership. Fixes #8155. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210225172641.787022-1-raphaelsc@scylladb.com> (cherry picked from commit `2cf0c4bbf1`) Includes fixup patch: compaction_manager: Fix use-after-free in rewrite_sstables() Use-after-free introduced by `2cf0c4bbf1`. That's because compacting is moved into then_wrapped() lambda, so it's potentially freed on the next iteration of repeat(). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210309232940.433490-1-raphaelsc@scylladb.com> (cherry picked from commit `f7cc431477`)	2021-03-11 08:24:42 +02:00
Asias He	e2d4940b6d	gossip: Handle timeout error in gossiper::do_shadow_round Currently, the rpc timeout error for the GOSSIP_GET_ENDPOINT_STATES verb is not handled in gossiper::do_shadow_round. If the GOSSIP_GET_ENDPOINT_STATES rpc call to any of the remote nodes times out, gossiper::do_shadow_round will throw an exception and fail the whole boot up process. It is fine that some of the remote nodes time out in shadow round. It is not a must to talk to all nodes. This patch fixes an issue we saw recently in our sct tests: ``` INFO | scylla[1579]: [shard 0] init - Shutting down gossiping INFO | scylla[1579]: [shard 0] gossip - gossip is already stopped INFO | scylla[1579]: [shard 0] init - Shutting down gossiping was successful ... ERR | scylla[1579]: [shard 0] init - Startup failed: seastar::rpc::timeout_error (rpc call timed out) ``` Fixes #8187 Closes #8213 (cherry picked from commit `dc40184faa`)	2021-03-09 19:04:08 +02:00
Benny Halevy	09f9ff3f96	repair: repair_writer: do not capture lw_shared_ptr cross-shard The shared_from_this lw_shared_ptr must not be accessed across shards. Capturing it in the lambda passed to mutation_writer::distribute_reader_and_consume_on_shards causes exactly that since the captured lw_shared_ptr is copied on other shards, and ends up in memory corruption as seen in #7535 (probably due to lw_shared_ptr._count going out-of-sync when incremented/decremented in parallel on other shards with no synchronization). This was introduced in `289a08072a`. The writer is not needed in the body of this lambda anyway so it doesn't need to capture it. It is already held by the continuations until the end of the chain. Fixes #7535 Test: repair_additional_test:RepairAdditionalTest.repair_disjoint_row_3nodes_diff_shard_count_test (dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201104142216.125249-1-bhalevy@scylladb.com> (cherry picked from commit `f93fb55726`)	2021-03-03 21:27:06 +02:00
Dejan Mircevski	d671185828	cql3: Fix maps::setter_by_key for unset values Unset values for key and value were not handled. Handle them in a manner matching Cassandra. This fixes all cases in testMapWithUnsetValues, so re-enable it (and fix a comment typo in it). Signed-off-by: Dejan Mircevski <dejan@scylladb.com> (cherry picked from commit `9eed26ca3d`) Fixes #7740.	2021-03-02 16:38:30 +02:00
Dejan Mircevski	8d1784805a	cql3: Fix `IN ?` for unset values When the right-hand side of IN is an unset value, we must report an error, like Cassandra does. This fixes testListWithUnsetValues, so re-enable it. Signed-off-by: Dejan Mircevski <dejan@scylladb.com> (cherry picked from commit `4515a49d4d`) Fixes #7740.	2021-03-02 16:38:10 +02:00
Dejan Mircevski	1d4ce229eb	cql3: Fix handling of scalar unset value Make the bind() operation of the scalar marker handle the unset-value case (which it previously didn't). Signed-off-by: Dejan Mircevski <dejan@scylladb.com> (cherry picked from commit `5bee97fa51`) Fixes #7740.	2021-03-02 16:37:45 +02:00
Dejan Mircevski	ba9897a34e	cql3: Fix crash when removing unset_value from set Avoid crash described in #7740 by ignoring the update when the element-to-remove is UNSET_VALUE. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> (cherry picked from commit `8b2f459622`) Fixes #7740.	2021-03-02 16:37:15 +02:00
Hagit Segev	5cdc1fa662	release: prepare for 4.3.2 scylla-4.3.2	2021-03-01 22:04:21 +02:00
Avi Kivity	81347037d3	Update seastar submodule * seastar 69f8394742...b70b444924 (1): > io_queue: Fix "delay" metrics Fixes #8166.	2021-03-01 13:57:57 +02:00
Avi Kivity	49c3b812b9	Update seastar submodule * seastar 6973080cd1...69f8394742 (1): > rpc: streaming sink: order outgoing messages Fixes #7552.	2021-03-01 12:20:57 +02:00
Avi Kivity	6ffd23a957	Point seastar submodule at scylla-seastar.git This allows us to backport Seastar patches to branch-4.3.	2021-03-01 12:19:40 +02:00
Raphael S. Carvalho	a0b78956e8	sstables: Fix TWCS reshape for windows with at least min_threshold sstables TWCS reshape was silently ignoring windows which contain at least min_threshold sstables (can happen with data segregation). When resizing candidates, size of multi_window was incorrectly used and it was always empty in this path, which means candidates was always cleared. Fixes #8147. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210224125322.637128-1-raphaelsc@scylladb.com> (cherry picked from commit `21608bd677`)	2021-02-28 16:42:43 +02:00
Pavel Solodovnikov	74941f67e6	large_data_handler: fix segmentation fault when constructing `data_value` from a `nullptr` It turns out that `cql_table_large_data_handler::record_large_rows` and `cql_table_large_data_handler::record_large_cells` were broken for reporting static cells and static rows from the very beginning: In case a large static cell or a large static row is encountered, it tries to execute `db::try_record` with `nullptr` additional values, denoting that there is no clustering key to be recorded. These values are next passed to `qctx.execute_cql()`, which creates `data_value` instances for each statement parameter, hence invoking `data_value(nullptr)`. This uses `const char*` overload which delegates to `std::string_view` ctor overload. It is UB to pass `nullptr` pointer to `std::string_view` ctor. Hence leading to segmentation faults in the aforementioned large data reporting code. What we want here is to make a null `data_value` instead, so just add an overload specifically for `std::nullptr_t`, which will create a null `data_value` with `text` type. A regression test is provided for the issue (written in `cql-pytest` framework). Tests: test/cql-pytest/test_large_cells_rows.py Fixes: #6780 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20201223204552.61081-1-pa.solodovnikov@scylladb.com> (cherry picked from commit `219ac2bab5`)	2021-02-23 12:13:51 +02:00
Avi Kivity	8c9c0807ef	Merge 'cdc: Limit size of topology description' from Piotr Jastrzębski Currently, whole topology description for CDC is stored in a single row. This means that for a large cluster of strong machines (say 100 nodes 64 cpus each), the size of the topology description can reach 32MB. This causes multiple problems. First of all, there's a hard limit on mutation size that can be written to Scylla. It's related to commit log block size which is 16MB by default. Mutations bigger than that can't be saved. Moreover, such big partitions/rows cause reactor stalls and negatively influence latency of other requests. This patch limits the size of topology description to about 4MB. This is done by reducing the number of CDC streams per vnode and can lead to CDC data not being fully colocated with Base Table data on shards. It can impact performance and consistency of data. This is just a quick fix to make it easily backportable. A full solution to the problem is under development. For more details see #7961, #7993 and #7985. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Closes #8048 * github.com:scylladb/scylla: cdc: Limit size of topology description cdc: Extract create_stream_ids from topology_description_generator (cherry picked from commit `c63e26e26f`)	2021-02-22 20:39:08 +02:00
Takuya ASADA	f316e1db54	scylla_util.py: resolve /dev/root to get actual device on aws When psutil.disk_partitions() reports / is /dev/root, aws_instance mistakenly reports root partition is part of ephemeral disks, and RAID construction will fail. This prevents the error and reports correct free disks. Fixes #8055 Closes #8040 (cherry picked from commit `32d4ec6b8a`)	2021-02-21 16:23:21 +02:00
Nadav Har'El	675db3e65e	alternator: fix ValidationException in FilterExpression - and more The first condition expressions we implemented in Alternator were the old "Expected" syntax of conditional updates. That implementation had some specific assumptions on how it handles errors: For example, in the "LT" operator in "Expected", the second operand is always part of the query, so an error in it (e.g., an unsupported type) resulted in a ValidationException error. When we implemented ConditionExpression and FilterExpression, we wrongly used the same functions check_compare(), check_BETWEEN(), etc., to implement them. This results in some inaccurate error handling. The worst example is what happens when you use a FilterExpression with an expression such as "x < y" - this filter is supposed to silently skip items whose "x" and "y" attributes have unsupported or different types, but in our implementation a bad type (e.g., a list) for y resulted in a ValidationException which aborted the entire scan! Interestingly, in one case (that of BEGINS_WITH) we actually noticed the slightly different behavior needed and implemented the same operator twice - with ugly code duplication. But in other operators we missed this problem completely. This patch first adds extensive tests of how the different expressions (Expected, QueryFilter, FilterExpression, ConditionExpression) and the different operators handle various input errors - unsupported types, missing items, incompatible types, etc. Importantly, the tests demonstrate that there is often different behavior depending on whether the bad input comes from the query, or from the item. Some of the new tests fail before this patch, but others pass and were useful to verify that the patch doesn't break anything that already worked correctly previously. As usual, all the tests pass on DynamoDB. Finally, this patch fixes all these problems. 
The comparison functions like check_compare() and check_BETWEEN() now not only take the operands, they also take booleans saying if each of the operands came from the query or from an item. The old-syntax caller (Expected or QueryFilter) always says that the first operand is from the item and the second is from the query - but in the new-syntax caller (ConditionExpression or FilterExpression) any or all of the operands can come from the query and need verification. The old duplicated code for check_BEGINS_WITH() - which had a TODO to remove it - is finally removed. Instead we use the same idea of passing booleans saying if each of its operands came from an item or from the query. Fixes #8043 Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `653610f4bc`)	2021-02-21 09:47:40 +02:00
Nadav Har'El	5a45c2b947	alternator: fix UpdateItem ADD for non-existent attribute UpdateItem's "ADD" operation usually adds elements to an existing set or adds a number to an existing counter. But it can also be used to create a new set or counter (as if adding to an empty set or zero). We unfortunately did not have a test for this case (creating a new set or counter), and when I wrote such a test now, I discovered the implementation was missing. So this patch adds both the test and the implementation. The new test used to fail before this patch, and passes with it - and passes on DynamoDB. Note that we only had this bug for the newer UpdateItem syntax. For the old AttributeUpdates syntax, we already support ADD actions on missing attributes, and already tested it in test_update_item_add(). I just forgot to test the same thing for the newer syntax, so I missed this bug :-( Fixes #7763. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201207085135.2551845-1-nyh@scylladb.com> (cherry picked from commit `a8fdbf31cd`)	2021-02-21 08:24:43 +02:00
Benny Halevy	b446cbad97	stream_session: prepare: fix missing string format argument As seen in mv_populating_from_existing_data_during_node_decommission_test dtest: ``` ERROR 2021-02-11 06:01:32,804 [shard 0] stream_session - failed to log message: fmt::v7::format_error (argument not found) ``` Fixes #8067 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210211100158.543952-1-bhalevy@scylladb.com> (cherry picked from commit `d01e7e7b58`)	2021-02-14 13:10:22 +02:00
Shlomi Livne	da2c5fd549	scylla_io_setup did not configure pre tuned gce instances correctly scylla_io_setup condition for nr_disks was using the bitwise operator (&) instead of logical and operator (and) causing the io_properties files to have incorrect values Fixes #7341 Reviewed-by: Lubos Kosco <lubos@scylladb.com> Signed-off-by: Shlomi Livne <shlomi@scylladb.com> Closes #8019 (cherry picked from commit `718976e794`)	2021-02-14 13:10:19 +02:00
Piotr Wojtczak	b44b814d94	Validate ascii values when creating from CQL Although the code for it existed already, the validation function hasn't been invoked properly. This change fixes that, adding a validating check when converting from text to specific value type and throwing a marshal exception if some characters are not ASCII. Fixes #5421 Closes #7532 (cherry picked from commit `caa3c471c0`)	2021-02-10 19:37:30 +02:00
Yaron Kaikov	46650adcd0	release: prepare for 4.3.1 scylla-4.3.1	2021-02-10 08:22:38 +02:00
Botond Dénes	baeddc3cb5	query: use local limit for non-limited queries in mixed cluster Since `fea5067df` we enforce a limit on the memory consumption of otherwise non-limited queries like reverse and non-paged queries. This limit is sent down to the replicas by the coordinator, ensuring that each replica is working with the same limit. This however doesn't work in a mixed cluster, when upgrading from a version which doesn't have this series. This has been worked around by falling back to the old max_result_size constant of 1MB in mixed clusters. This however resulted in a regression when upgrading from a pre `fea5067df` to a post `fea5067df` one. Pre `fea5067df` already had a limit for reverse queries, which was generalized to also cover non-paged ones by `fea5067df`. The regression manifested in previously working reverse queries being aborted. This happened because even though the user has set a generous limit for them before the upgrade, in the mixed cluster replicas fall back to the much stricter 1MB limit, temporarily ignoring the configured limit if the coordinator is an old node. This patch solves this problem by using the locally configured limit instead of the max_result_size constant. This means that the user has to take extra care to configure the same limit on all replicas, but at least they will have working reverse queries during the upgrade. Fixes: #8035 Tests: unit(release), manual test by user who reported the issue Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210209075947.1004164-1-bdenes@scylladb.com> (cherry picked from commit `3d001b5587`)	2021-02-09 20:00:09 +02:00
Piotr Sarna	33831c49cc	Merge 'select_statement: Fix aggregate results on indexed selects (timeouts fixed) ' from Piotr Grabowski Overview Fixes #7355. Before this change, there were a few invalid results of aggregates/GROUP BY on tables with secondary indexes (see below). Unfortunately, it still does NOT fix the problem in issue #7043. Although this PR moves forward fixing of that issue, there is still a bug with `TOKEN(...)` in `WHERE` clauses of indexed selects that is not addressed in this PR. It will be fixed in my next PR. It does NOT fix the problems in issues #7432, #7431 as those are out-of-scope of this PR and do not affect the correctness of results (only return a too large page). GROUP BY (first commit) Before the change, `GROUP BY` `SELECT`s with some `WHERE` restrictions on an indexed column would return invalid results (same grouped column values appearing multiple times): ``` CREATE TABLE ks.t(pk int, ck int, v int, PRIMARY KEY(pk, ck)); CREATE INDEX ks_t on ks.t(v); INSERT INTO ks.t(pk, ck, v) VALUES (1, 2, 3); INSERT INTO ks.t(pk, ck, v) VALUES (1, 4, 3); SELECT pk FROM ks.t WHERE v=3 GROUP BY pk; pk ---- 1 1 ``` This is fixed by correctly passing `_group_by_cell_indices` to `result_set_builder`. Fixes the third failing example from issue #7355. Paging (second commit) Fixes two issues related to improper paging on indexed `SELECT`s. As those two issues are closely related (fixing one without fixing the other causes invalid results of queries), they are in a single commit (second commit). The first issue is that when using `slice.set_range`, the existing `_row_ranges` (which specify clustering key prefixes) are not taken into account. 
This caused the wrong rows to be included in the result, as the clustering key bound was set to a half-open range: ``` CREATE TABLE ks.t(a int, b int, c int, PRIMARY KEY ((a, b), c)); CREATE INDEX kst_index ON ks.t(c); INSERT INTO ks.t(a, b, c) VALUES (1, 2, 3); INSERT INTO ks.t(a, b, c) VALUES (1, 2, 4); INSERT INTO ks.t(a, b, c) VALUES (1, 2, 5); SELECT COUNT(*) FROM ks.t WHERE c = 3; count ------- 2 ``` The second commit fixes this issue by properly trimming `row_ranges`. The second fixed problem is related to setting the `paging_state` to `internal_options`. It was improperly set to the value just after reading from index, making the base query start from invalid `paging_state`. The second commit fixes this issue by setting the `paging_state` after both index and base table queries are done. Moreover, the `paging_state` is now set based on `paging_state` of index query and the results of base table query (as base query can return more rows than index query). The second commit fixes the first two failing examples from issue #7355. Tests (fourth commit) Extensively tests queries on tables with secondary indices with aggregates and `GROUP BY`s. Tests three cases that are implemented in `indexed_table_select_statement::do_execute` - `partition_slices`, `whole_partitions` and (non-`partition_slices` and non-`whole_partitions`). As some of the issues found were related to paging, the tests check scenarios where the inserted data is smaller than a page, larger than a page and larger than two pages (and some in-between page boundaries scenarios). I found all those parameters (case of `do_execute`, number of inserted rows) to have an impact on those fixed bugs, therefore the tests validate a large number of those scenarios. Configurable internal_paging_size (third commit) Before this change, internal `page_size` when doing aggregate, `GROUP BY` or nonpaged filtering queries was hard-coded to `DEFAULT_COUNT_PAGE_SIZE` (10,000). 
This change adds new internal_paging_size variable, which is configurable by `set_internal_paging_size` and `reset_internal_paging_size` free functions. This functionality is only meant for testing purposes. Closes #7497 github.com:scylladb/scylla: tests: Add secondary index aggregates tests select_statement: Introduce internal_paging_size select_statement: Fix paging on indexed selects select_statement: Fix GROUP BY on indexed select (cherry picked from commit `8c645f74ce`)	2021-02-08 20:17:49 +02:00
Amnon Heiman	47fc8389fb	API: Fix aggregation in column_family A few methods in the column_family API were doing the aggregation wrong, specifically, bloom filter disk size. The issue is not always visible, it happens when there are multiple filter files per shard. Fixes #4513 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Closes #8007 (cherry picked from commit `4498bb0a48`)	2021-02-08 17:04:07 +02:00
Avi Kivity	a7a979b794	Merge 'Add waiting for flushes on table drops' from Piotr Sarna This series makes sure that before the table is dropped, all pending memtable flushes related to its memtables would finish. Normally, flushes are not problematic in Scylla, because all tables are by default `auto_snapshot=true`, which also implies that a table is flushed before being dropped. However, with `auto_snapshot=false` the flush is not attempted at all. It leads to the following race: 1. Run a node with `auto_snapshot=false` 2. Schedule a memtable flush (e.g. via nodetool) 3. Get preempted in the middle of the flush 4. Drop the table 5. The flush that already started wakes up and starts operating on freed memory, which causes a segfault Tests: manual(artificially preempting for a long time in bullet point 2. to ensure that the race occurs; segfaults were 100% reproducible before the series and do not happen anymore after the series is applied) Fixes #7792 Closes #7798 * github.com:scylladb/scylla: database: add flushes to waiting for pending operations table: unify waiting for pending operations database: add a phaser for flush operations database: add waiting for pending streams on table drop (cherry picked from commit `7636799b18`)	2021-02-02 17:12:17 +02:00
Avi Kivity	413e03ce5e	row_cache: linearize key in cache_entry::do_read() do_read() does not linearize cache_entry::_key; this can cause a crash with keys larger than 13k. Fixes #7897. Closes #7898 (cherry picked from commit `d508a63d4b`)	2021-01-17 09:30:23 +02:00
Hagit Segev	000585522e	release: prepare for 4.3.0 scylla-4.3.0	2021-01-10 10:04:40 +02:00
Evgeniy Naydanov	47b121130a	scylla_raid_setup: try /dev/md[0-9] if no --raiddev provided If the scylla_raid_setup script is called without the --raiddev argument, try to use any of the /dev/md[0-9] devices instead of only /dev/md0. Do it this way because on Ubuntu 20.04 /dev/md0 is already used by the OS. Closes #7628 (cherry picked from commit `587b909c5c`) Fixes #7627.	2021-01-03 16:46:16 +02:00
Takuya ASADA	15f55141ec	scylla_raid_setup: use sysfs to detect existing RAID volume We may not be able to detect an existing RAID volume by device file existence; we should use sysfs instead to make sure it's running. Fixes #7383 Closes #7399 (cherry picked from commit `fc1c4f2261`)	2021-01-03 16:45:36 +02:00
Avi Kivity	69fbeaa27e	Update tools/jmx submodule * tools/jmx c51906e...47b355e (1): > install.sh: set a valid WorkingDirectory for nonroot offline install Ref scylladb/scylla-jmx#151	2020-12-31 14:11:51 +02:00
Benny Halevy	a366de2a63	compaction: compaction_writer: destroy shared_sstable after the sstable_writer sstable_writer may depend on the sstable throughout its whole lifecycle. If the sstable is freed before the sstable_writer we might hit use-after-free as in the following case: ``` std::_Deque_iterator<sstables::compression::segmented_offsets::bucket, sstables::compression::segmented_offsets::bucket&, sstables::compression::segmented_offsets::bucket>::operator+=(long) at /usr/include/c++/10/bits/stl_deque.h:240 (inlined by) std::operator+(std::_Deque_iterator<sstables::compression::segmented_offsets::bucket, sstables::compression::segmented_offsets::bucket&, sstables::compression::segmented_offsets::bucket> const&, long) at /usr/include/c++/10/bits/stl_deque.h:378 (inlined by) std::_Deque_iterator<sstables::compression::segmented_offsets::bucket, sstables::compression::segmented_offsets::bucket&, sstables::compression::segmented_offsets::bucket>::operator[](long) const at /usr/include/c++/10/bits/stl_deque.h:252 (inlined by) std::deque<sstables::compression::segmented_offsets::bucket, std::allocator<sstables::compression::segmented_offsets::bucket> >::operator[](unsigned long) at /usr/include/c++/10/bits/stl_deque.h:1327 (inlined by) sstables::compression::segmented_offsets::push_back(unsigned long, sstables::compression::segmented_offsets::state&) at ./sstables/compress.cc:214 sstables::compression::segmented_offsets::writer::push_back(unsigned long) at ./sstables/compress.hh:123 (inlined by) compressed_file_data_sink_impl<crc32_utils, (compressed_checksum_mode)1>::put(seastar::temporary_buffer<char>) at ./sstables/compress.cc:519 seastar::output_stream<char>::put(seastar::temporary_buffer<char>) at table.cc:? (inlined by) seastar::output_stream<char>::put(seastar::temporary_buffer<char>) at ././seastar/include/seastar/core/iostream-impl.hh:432 seastar::output_stream<char>::flush() at table.cc:? seastar::output_stream<char>::close() at table.cc:? 
sstables::file_writer::close() at sstables.cc:? sstables::mc::writer::~writer() at writer.cc:? (inlined by) sstables::mc::writer::~writer() at ./sstables/mx/writer.cc:790 sstables::mc::writer::~writer() at writer.cc:? flat_mutation_reader::impl::consumer_adapter<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> > >::~consumer_adapter() at compaction.cc:? (inlined by) std::_Optional_payload_base<sstables::compaction_writer>::_M_destroy() at /usr/include/c++/10/optional:260 (inlined by) std::_Optional_payload_base<sstables::compaction_writer>::_M_reset() at /usr/include/c++/10/optional:280 (inlined by) std::_Optional_payload<sstables::compaction_writer, false, false, false>::~_Optional_payload() at /usr/include/c++/10/optional:401 (inlined by) std::_Optional_base<sstables::compaction_writer, false, false>::~_Optional_base() at /usr/include/c++/10/optional:474 (inlined by) std::optional<sstables::compaction_writer>::~optional() at /usr/include/c++/10/optional:659 (inlined by) sstables::compacting_sstable_writer::~compacting_sstable_writer() at ./sstables/compaction.cc:229 (inlined by) compact_mutation<(emit_only_live_rows)0, (compact_for_sstables)1, sstables::compacting_sstable_writer, noop_compacted_fragments_consumer>::~compact_mutation() at ././mutation_compactor.hh:468 (inlined by) compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer>::~compact_for_compaction() at ././mutation_compactor.hh:538 (inlined by) std::default_delete<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >::operator()(compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer>) const at /usr/include/c++/10/bits/unique_ptr.h:85 (inlined by) std::unique_ptr<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer>, 
std::default_delete<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> > >::~unique_ptr() at /usr/include/c++/10/bits/unique_ptr.h:361 (inlined by) stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >::~stable_flattened_mutations_consumer() at ././mutation_reader.hh:342 (inlined by) flat_mutation_reader::impl::consumer_adapter<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> > >::~consumer_adapter() at ././flat_mutation_reader.hh:201 auto flat_mutation_reader::impl::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >, flat_mutation_reader::no_filter>(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >, flat_mutation_reader::no_filter, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ././flat_mutation_reader.hh:272 (inlined by) auto flat_mutation_reader::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >, flat_mutation_reader::no_filter>(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> >, flat_mutation_reader::no_filter, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ././flat_mutation_reader.hh:383 (inlined by) auto flat_mutation_reader::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, noop_compacted_fragments_consumer> > >(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer, 
noop_compacted_fragments_consumer> >, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at ././flat_mutation_reader.hh:389 (inlined by) seastar::future<void> sstables::compaction::setup<noop_compacted_fragments_consumer>(noop_compacted_fragments_consumer)::{lambda(flat_mutation_reader)#1}::operator()(flat_mutation_reader)::{lambda()#1}::operator()() at ./sstables/compaction.cc:612 ``` What happens here is that: compressed_file_data_sink_impl(output_stream<char> out, sstables::compression* cm, sstables::local_compression lc) : _out(std::move(out)) , _compression_metadata(cm) , _offsets(_compression_metadata->offsets.get_writer()) , _compression(lc) , _full_checksum(ChecksumType::init_checksum()) _compression_metadata points to a buffer held by the sstable object. and _compression_metadata->offsets.get_writer returns a writer that keeps a reference to the segmented_offsets in the sstables::compression that is used in the ~writer -> close path. Fixes #7821 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201227145726.33319-1-bhalevy@scylladb.com> (cherry picked from commit `8a745a0ee0`)	2020-12-29 15:07:12 +02:00
Yaron Kaikov	5bd52e4dba	release: prepare for 4.3.rc3 scylla-4.3.rc3	2020-12-17 14:27:38 +02:00
Gleb Natapov	8a3a69bc3e	mutation_writer: pass exceptions through feed_writer feed_writer() eats exception and transforms it into an end of stream instead. Downstream validators hate when this happens. Fixes #7482 Message-Id: <20201216090038.GB3244976@scylladb.com> (cherry picked from commit `61520a33d6`)	2020-12-16 17:19:39 +02:00
Aleksandr Bykov	50c01f7331	dist: scylla_util: fix aws_instance.ebs_disks method aws_instance.ebs_disks() method should return ebs disk instead of ephemeral Signed-off-by: Aleksandr Bykov <alex.bykov@scylladb.com> Closes #7780 (cherry picked from commit `e74dc311e7`)	2020-12-16 11:58:19 +02:00
Avi Kivity	ecfe466e7b	dist: rpm: uninstall tuned when installing scylla-kernel-conf tuned 2.11.0-9 and later writes to kernel.sched_wakeup_granularity_ns and other sysctl tunables that we so laboriously tuned, dropping performance by a factor of 5 (due to increased latency). Fix by obsoleting tuned during install (in effect, we are a better tuned, at least for us). Not needed for .deb, since Debian/Ubuntu do not install tuned by default. Fixes #7696 Closes #7776 (cherry picked from commit `615b8e8184`)	2020-12-12 14:29:58 +02:00
Kamil Braun	69e5caadb6	cdc: produce postimage when inserting with no regular columns When a row was inserted into a table with no regular columns, and no such row existed in the first place, postimage would not be produced. Fix this. Fixes #7716. Closes #7723 (cherry picked from commit `2da723b9c8`)	2020-12-11 20:14:03 +02:00
Piotr Sarna	0ff3c0dcb5	Merge 'Cleanup CDC tests after CDC became GA' from Piotr Jastrzębski Now that CDC is GA, it should be enabled in all the tests by default. To achieve that, the PR adds a special db::config::add_cdc_extension() helper which is used in cql_test_env to make sure CDC is usable in all the tests that use cql_test_env. As a result, cdc_tests can be simplified. Finally, some trailing whitespaces are removed from cdc_tests. Tests: unit(dev) Closes #7657 * github.com:scylladb/scylla: cdc: Remove trailing whitespaces from cdc_tests cdc: Remove mk_cdc_test_config from tests config: Add add_cdc_extension function for testing cdc: Add missing includes to cdc_extension.hh (cherry picked from commit `5a9dc6a3cc`)	2020-12-11 20:13:08 +02:00
Nadav Har'El	2148a194c2	alternator: fix broken Scan/Query paging with bytes keys When an Alternator table has partition keys or sort keys of type "bytes" (blobs), a Scan or Query which required paging used to fail - we used an incorrect function to output LastEvaluatedKey (which tells the user where to continue at the next page), and this incorrect function was correct for strings and numbers - but NOT for bytes (for bytes, we need to encode them as base-64). This patch also includes two tests - for bytes partition key and for bytes sort key - that failed before this patch and now pass. The test test_fetch_from_system_tables also used to fail after a Limit was added to it, because one of the tables it scans had a bytes key. That test is also fixed by this patch. Fixes #7768 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201207175957.2585456-1-nyh@scylladb.com> (cherry picked from commit `86779664f4`)	2020-12-10 18:58:36 +02:00
Piotr Sarna	77ab7b1221	db: fix getting local ranges for size estimates table When getting local ranges, an assumption is made that if a range does not contain an end or when its end is a maximum token, then it must contain a start. This assumption proved untrue during manual tests, so it's now fortified with an additional check. Here's a gdb output for a set of local ranges which causes an assertion failure when calling `get_local_ranges` on it: (gdb) p ranges $1 = std::vector of length 2, capacity 2 = {{_interval = {_start = std::optional<interval_bound<dht::token>> = {[contained value] = {_value = {_kind = dht::token_kind::before_all_keys, _data = 0}, _inclusive = false}}, _end = std::optional<interval_bound<dht::token>> [no contained value], _singular = false}}, {_interval = { _start = std::optional<interval_bound<dht::token>> [no contained value], _end = std::optional<interval_bound<dht::token>> = {[contained value] = {_value = { _kind = dht::token_kind::before_all_keys, _data = 0}, _inclusive = true}}, _singular = false}}} Closes #7764 (cherry picked from commit `1cc4ed50c1`)	2020-12-10 18:58:36 +02:00
Nadav Har'El	59bcd7f029	alternator, test: make test_fetch_from_system_tables faster The test test_fetch_from_system_tables tests Alternator's system-table feature by reading from all system tables. The intention was to confirm we don't crash reading any of them - as they have different schemas and can run into different problems (we had such problems in the initial implementation). The intention was not to read a lot from each table - we only make a single "Scan" call on each, to read one page of data. However, the Scan call did not set a Limit, so the single page can get pretty big. This is not normally a problem, but in extremely slow runs - such as when running the debug build on an extremely overcommitted test machine (e.g., issue #7706) reading this large page may take longer than our default timeout. I'll send a separate patch for the timeout issue, but for now, there is really no reason why we need to read a big page. It is good enough to just read 50 rows (with Limit=50). This will still read all the different types and make the test faster. As an example, in the debug run on my laptop, this test spent 2.4 seconds to read the "compaction_history" table before this patch, and only 0.1 seconds after this patch. 2.4 seconds is close to our default timeout (10 seconds), 0.1 is very far. Fixes #7706 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201207075112.2548178-1-nyh@scylladb.com> (cherry picked from commit `220d6dde17`)	2020-12-10 18:58:36 +02:00
Calle Wilund	bc5008b165	alternator::streams: Use end-of-record info in get_records Fixes #7496 Since cdc log now has an end-of-batch/record marker that tells us explicitly that we've read the last row of a change, we can use this instead of timestamp checks + limit extra to ensure we have complete records. Note that this does not try to fulfill the user's query limit exactly. To do this we would need to add a loop and potentially re-query if the queried rows are not enough. But that is a separate exercise, and superbly suited for coroutines! (cherry picked from commit `c79108edbb`)	2020-12-10 18:58:36 +02:00
Nadav Har'El	dd7e3d3eab	alternator: fix query with both projection and filtering We had a bug when a Query/Scan had both projection (ProjectionExpression or AttributesToGet) and filtering (FilterExpression or Query/ScanFilter). The problem was that projection left only the requested attributes, and the filter might have needed - and not got - additional attributes. The solution in this patch is to also add to the generated JSON item the extra attributes needed by filtering (if any), run the filter on that, and only at the end remove the extra filtering attributes from the item to be returned. The two tests test_query_filter.py::test_query_filter_and_attributes_to_get and test_filter_expression.py::test_filter_expression_and_projection_expression, which failed before this patch, now pass, so we drop their "xfail" tag. Fixes #6951. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `282742a469`)	2020-12-10 18:58:36 +02:00
Lubos Kosco	3b617164dc	scylla_util.py: Increase disk to ram ratio for GCP Increase accepted disk-to-RAM ratio to 105 to accommodate even 7.5GB of RAM for one NVMe. Log various reasons for not recommending the instance type. Fixes #7587 Closes #7600 (cherry picked from commit `a0b1474bba`)	2020-12-09 09:38:01 +02:00
Benny Halevy	bb99d7ced6	large_data_handler: disable deletion of large data entries Currently we decide whether to delete large data entries based on the overall sstable data_size. Since the entries themselves are typically much smaller than the whole sstable (especially cells and rows), this causes overzealous deletions (#7668) and inefficiency in the rows cache due to the large number of range tombstones created. Refs #7575 Test: sstable_3_x_test(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> This patch is targeted for branch-4.3 or earlier. In 4.4, the problem was fixed in #7669, but the fix is out of scope for backporting. Branch: 4.3 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201203130018.1920271-1-bhalevy@scylladb.com>	2020-12-06 11:33:41 +02:00


24099 Commits