scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-29 19:21:01 +00:00

Author	SHA1	Message	Date
Takuya ASADA	5d3ff1e8a1	dist/redhat: stop using systemd macros, call systemctl directly Fedora version of systemd macros does not work correctly on CentOS7, since CentOS7 does not support "file trigger" feature. To fix the issue we need to stop using systemd macros, call systemctl directly. See scylladb/scylla-jmx#94 Closes #8005 (cherry picked from commit `7b310c591e`)	2021-05-18 13:50:51 +03:00
Raphael S. Carvalho	5358eaf1d6	compaction_manager: Don't swallow exception in procedure used by reshape and resharding run_custom_job() was swallowing all exceptions, which is definitely wrong because failure in a resharding or reshape would be incorrectly interpreted as success, which means upper layer will continue as if everything is ok. For example, ignoring a failure in resharding could result in a shared sstable being left unresharded, so when that sstable reaches a table, scylla would abort as shared ssts are no longer accepted in the main sstable set. Let's allow the exception to be propagated, so failure will be communicated, and resharding and reshape will be all or nothing, as originally intended. Fixes #8657. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210515015721.384667-1-raphaelsc@scylladb.com> (cherry picked from commit `10ae77966c`)	2021-05-18 13:00:32 +03:00
Avi Kivity	e78b96ee49	Update tools/jmx submodule (rpm systemd macros) * tools/jmx 0457674...5fcba13 (1): > dist/redhat: stop using systemd macros, call systemctl directly Ref scylladb/scylla-jmx#94.	2021-05-13 18:26:07 +03:00
Lauro Ramos Venancio	add245a27e	TWCS: initialize _highest_window_seen The timestamp_type is an int64_t. So, it has to be explicitly initialized before using it. This missing initialization prevented the major compaction from happening when a time window finishes, as described in #8569. Fixes #8569 Signed-off-by: Lauro Ramos Venancio <lauro.venancio@incognia.com> Closes #8590 (cherry picked from commit `15f72f7c9e`)	2021-05-06 08:52:31 +03:00
Avi Kivity	108f56c6ed	Update tools/jmx submodule (nodetool cfstats deadlock) * tools/jmx 47b355e...0457674 (1): > APIBuilder: Unlock RW-lock in remove() Fixes #7991.	2021-05-03 16:52:41 +03:00
Nadav Har'El	d01ce491c0	Update tools/java submodule Backport sstableloader fix in tools/java submodule. Fixes #8230. * tools/java 2bedecd3a7...1489e7c539 (1): > sstableloader: Handle non-prepared batches with ":" in identifier names Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-05-03 10:15:06 +03:00
Tomasz Grabiec	7b2f65191c	thrift: Validate cell names when constructing clustering keys Currently, if the user provides a cell name with too many components, we will accept it and construct an invalid clustering key. This may result in undefined behavior downstream. It was caught by ASAN in a debug build when executing dtest cql_tests.py:MiscellaneousCQLTester.cql3_insert_thrift_test with nodetool flush manually added after the write. Triggered during sstable writing to an MC-format sstable: seastar::shared_ptr<abstract_type const>::operator*() const at ././seastar/include/seastar/core/shared_ptr.hh:577 sstables::mc::clustering_blocks_input_range::next() const at ./sstables/mx/writer.cc:180 To prevent corrupting the state in this way, we should fail early. This patch adds validation which will fail thrift requests which attempt to create invalid clustering keys. Fixes #7568. Example error: Internal server error: Cell name of ks.test has too many components, expected 1 got 2 in 0x0004000000040000017600 Message-Id: <1605550477-24810-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `0c5d23d274`)	2021-05-02 12:09:35 +03:00
Avi Kivity	add5ffa787	Merge '[branch 4.4] Backport reader_permit: always forward resources to the semaphore ' from Botond Dénes This is a backport of `8aaa3a7` to branch-4.4. The main conflicts were around Benny's reader close series (`fa43d76`), but it also turned out that an additional patch (2f1d65c) had to be backported to make sure admission on signaling resources doesn't deadlock. Refs: #8493 Closes #8571 * github.com:scylladb/scylla: test: mutation_reader_test: add test_reader_concurrency_semaphore_forward_progress test: mutation_reader_test: add test_reader_concurrency_semaphore_readmission_preserves_units reader_concurrency_semaphore: add dump_diagnostics() reader_permit: always forward resources test: multishard_mutation_query_test: fuzzy-test: don't consume resource up-front reader_concurrency_semaphore: make admission conditions consistent (cherry picked from commit `bf9e1f6d2e`) [avi: convert coroutine in mutation_reader_test.cc to seastar thread]	2021-05-01 12:43:00 +03:00
Eliran Sinvani	32a1f2dcd9	Materialized views: fix possibly old views coming from other nodes Migration manager has a function to get a schema (for read or write); this function queries a peer node and retrieves the schema from it. One scenario where this can happen is when an old node queries an old, not-yet-fixed index. This opens a hole through which views that are only adjusted for reading can slip. Here we plug the hole by fixing such views before they are registered. Closes #8509 (cherry picked from commit `480a12d7b3`) Fixes #8554.	2021-04-29 14:03:41 +03:00
Botond Dénes	f2072665d1	database: clear inactive reads in stop() If any inactive read is left in the semaphore, it can block `database::stop()` from shutting down, as sstables pinned by these reads will prevent `sstables::sstables_manager::close()` from finishing. This causes a deadlock. It is not clear how inactive reads can be left in the semaphore, as all users are supposed to clean up after themselves. Post-4.4 releases don't have this problem anymore, as the inactive read handle was made a RAII object, removing the associated inactive read when destroyed. In 4.4 and earlier releases this wasn't so, so errors could be made. Normally this is not a big issue, as these orphaned inactive reads are just evicted when the resources they own are needed, but it does become a serious issue during shutdown. To prevent a deadlock, clear the inactive reads earlier, in `database::stop()` (currently they are cleared in the destructor). This is a simple and foolproof way of ensuring any leftover inactive reads don't cause problems. Fixes: #8561 Tests: unit(dev) Closes #8562 (cherry picked from commit `840ca41393`)	2021-04-28 22:36:41 +03:00
Takuya ASADA	beb2bcb8bd	dist: increase fs.aio-max-nr value for other apps The current fs.aio-max-nr value, cpu_count() * 11026, is exactly the amount Scylla uses; if other apps in the environment also try to use AIO, the AIO slots will run out. So increase the value by 65536 for other apps. Related #8133 Closes #8228 (cherry picked from commit `53c7600da8`)	2021-04-25 16:16:31 +03:00
Takuya ASADA	8255b7984d	dist: tune fs.aio-max-nr based on the number of cpus Current aio-max-nr is set up statically to 1048576 in /etc/sysctl.d/99-scylla-aio.conf. This is sufficient for most use cases, but falls short on larger machines such as i3en.24xlarge on AWS that has 96 vCPUs. We need to tune the parameter based on the number of cpus, instead of static setting. Fixes #8133 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Closes #8188 (cherry picked from commit `d0297c599a`)	2021-04-25 16:16:26 +03:00
Hagit Segev	28f5e0bd20	release: prepare for 4.3.3 scylla-4.3.3	2021-04-07 10:24:04 +03:00
Gleb Natapov	09f3bb93a3	storage_proxy: do not crash on LOCAL_QUORUM access to a DC with zero replication If a table that is not replicated to a certain DC (rf=0) is accessed with LOCAL_QUORUM on that DC the current code will crash since the 'targets' array will be empty and read executor does not handle it. Fix it by replying with empty result. Fixes #8354 Message-Id: <YGro+l2En3fF80CO@scylladb.com> (cherry picked from commit `cd24dfc7e5`) [avi: re-added virtual keyword when backporting, since 4.4 and below don't have `020da49c89`]	2021-04-06 19:37:32 +03:00
Nadav Har'El	76642eb00d	update submodule tools/java * tools/java d49ae89b4b...2bedecd3a7 (1): > sstableloader: fix handling of rewritten partition Backported refs #8390. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-04-05 18:33:39 +03:00
Pavel Emelyanov	a60f394d9a	test: Fix exit condition of row_cache_test::test_eviction_from_invalidated The test populates the cache, then invalidates it, then tries to push huge (10 times the segment size) chunks into seastar memory hoping that the invalid entries will be evicted. The exit condition on the last stage is -- total memory of the region (sum of both -- used and free) becomes less than the size of one chunk. However, the condition is wrong, because cache usually contains a dummy entry that's not necessarily on lru and on some test iteration it may happen that evictable size < chunk size < evictable size + dummy size. In this case the test fails with bad_alloc, being unable to evict the memory from under the dummy. fixes: #7959 tests: unit(row_cache_test), unit(the failing case with the triggering seed from the issue + 200 times more with random seeds) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210309134138.28099-1-xemul@scylladb.com> (cherry picked from commit `096e452db9`)	2021-04-04 18:11:08 +03:00
Avi Kivity	f2af68850c	Merge 'Fix inconsistent mv si backport to 4.3' from Eliran Sinvani This is a backport of the fix for #7709 to 4.3 version. Closes #8375 * github.com:scylladb/scylla: Merge 'Fix inconsistencies in MV and SI (reworked)' from Eliran Sinvani storage_proxy: Add .local_db() getters	2021-04-04 17:01:26 +03:00
Takuya ASADA	c7781f8c9e	node_exporter_install: fix bad owner of node_exporter node_exporter files are installed with weird ownership, 3434:3434. This is because the upstream node_exporter tar.gz contains the owner information, but we should overwrite it with a valid one. Fixes #6222 Closes #8379	2021-04-01 17:34:19 +03:00
Piotr Sarna	8f37924694	Merge 'Fix inconsistencies in MV and SI (reworked)' from Eliran Sinvani This is a reworked submission of #7686 which has been reverted. This series fixes some race conditions in MV/SI schema creation and load; we spotted some places where a schema without a base table reference can sneak into the registry. This can cause an unrecoverable error, since write commands with those schemas can't be issued from other nodes. Most of these cases can occur in two main, uncommon situations: in a mixed cluster (during an upgrade) and in a small window after a view or base table is altered. Fixes #7709 Closes #8091 * github.com:scylladb/scylla: database: Fix view schemas in place when loading global_schema_ptr: add support for view's base table materialized views: create view schemas with proper base table reference. materialized views: Extract fix legacy schema into its own logic (cherry picked from commit `d473bc9b06`)	2021-03-30 08:08:14 +03:00
Pavel Emelyanov	8588eef807	storage_proxy: Add .local_db() getters To facilitate the next patching Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `4c7bc8a3d1`)	2021-03-30 08:06:51 +03:00
Piotr Sarna	c50a2898cf	transport: return error on correct stream during size shedding When a request is shed due to being too large, its response was sent with stream id 0 instead of the stream id that matches the communication lane. That in turn confused the client, which is no longer the case. (cherry picked from commit `8635094144`)	2021-03-25 09:22:36 +01:00
Piotr Sarna	44f7251809	transport: return error on correct stream during shedding When a request is shed due to exceeding the max number of concurrent requests, its response was sent with stream id 0 instead of the stream id that matches the communication lane. That in turn confused the client, which is no longer the case. (cherry picked from commit `d6ea6937ee`)	2021-03-25 09:22:36 +01:00
Piotr Sarna	fc070d3dc6	transport: skip the whole request if it is too large When a request is shed due to being too large, only the header was actually read, and the body was still stuck in the socket - and would be read in the next iteration, which would expect to actually read a new request header. Instead, the whole message is now skipped, so that a new request can be correctly read and parsed. Fixes #8193 (cherry picked from commit `4a24d7dca0`)	2021-03-25 09:22:35 +01:00
Piotr Sarna	901784e122	transport: skip the whole request during shedding When a request is shed due to exceeding the number of max concurrent requests, only its header was actually read, and the body was still stuck in the socket - and would be read in the next iteration, which would expect to actually read a new request header. Instead, the whole message is now skipped, so that a new request can be correctly read and parsed. Refs #8193 (cherry picked from commit `3eb7e768cb`)	2021-03-25 09:22:29 +01:00
Botond Dénes	2ccda04d57	result_memory_accounter: abort unpaged queries hitting the global limit The `result_memory_accounter` terminates a query if it reaches either the global or shard-local limit. This used to be so only for paged queries; unpaged ones could grow indefinitely (until the node OOM'd). This was changed in `fea5067`, which enforces the local limit on unpaged queries as well, by aborting them. However, a loophole remained in the code: `result_memory_accounter::check_and_update()` has another stop condition besides `check_local_limit()`: it also checks the global limit. This stop condition was not updated to enforce itself on unpaged queries by aborting them; instead it silently terminated them, causing them to return less data than requested. This was masked by most queries reaching the local limit first. This patch fixes this by aborting unpaged mutation queries when they hit the global limit. Fixes: #8162 Tests: unit(release) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210226102202.51275-1-bdenes@scylladb.com> (cherry picked from commit `dd5a601aaa`)	2021-03-24 13:00:46 +02:00
Amos Kong	e8facb1932	schema.cc/describe: fix invalid compaction options in schema There is a typo in the snapshot's schema.cql: a comma is missing after the compaction strategy, so restoring the schema from the file fails. AND compaction = {'class': 'SizeTieredCompactionStrategy''max_compaction_threshold': '32'} The map_as_cql_param() function has a `first` parameter to decide whether to add a comma; the compaction_strategy_options are never first. Fixes #7741 Signed-off-by: Amos Kong <amos@scylladb.com> Closes #7734 (cherry picked from commit `6b1659ee80`)	2021-03-24 12:57:50 +02:00
Tomasz Grabiec	6f338e7656	sstable: writer: ka/la: Write row marker cell after row tombstone Row marker has a cell name which sorts after the row tombstone's start bound. The old code was writing the marker first, then the row tombstone, which is incorrect. This was harmless to our sstable reader, which recognized both as belonging to the current clustering row fragment, and collects both fine. However, if both atoms trigger creation of promoted index blocks, the writer will create a promoted index with entries which violate the cell name ordering. It's very unlikely to run into in practice, since to trigger promoted index entries for both atoms, the clustering key would have to be so large that the size of the marker cell exceeds the desired promoted index block size, which is 64KB by default (but user-controlled via column_index_size_in_kb option). 64KB is also the limit on clustering key size accepted by the system. This was caught by one of our unit tests: sstable_conforms_to_mutation_source_test ...which runs a battery of mutation reader tests with various desired promoted index block sizes, including the target size of 1 byte, which triggers an entry for every atom. The test started to fail for some random seeds after commit `ecb6abe` inside the test_streamed_mutation_forwarding_is_consistent_with_slicing test case, reporting a mutation mismatch in the following line: assert_that(sliced_m).is_equal_to(fwd_m, slice_with_ranges.row_ranges(*m.schema(), m.key())); It compares mutations read from the same sstable using different methods, slicing using clustering key restrictions, and fast forwarding. The reported mismatch was that fwd_m contained the row marker, but sliced_m did not. The sstable does contain the marker, so both reads should return it. After reverting the commit which introduced dynamic adjustments, the test passes, but both mutations are missing the marker, so both are wrong! They are wrong because the promoted index contains entries whose starting positions violate the ordering, so binary search gets confused and selects the row tombstone's position, which is emitted after the marker, thus skipping over the row marker. The explanation for why the test started to fail after dynamic adjustments is the following. The promoted index cursor works by incrementally parsing buffers fed by the file input stream. It first parses the whole block and then does a binary search within the parsed array. The entries which the cursor touches during binary search depend on the size of the block read from the file. The commit which enabled dynamic adjustments causes the block size to be different for subsequent reads, which allows one of the reads to walk over the corrupted entries and read the correct data by selecting the entry corresponding to the row marker. Fixes #8324 Message-Id: <20210322235812.1042137-1-tgrabiec@scylladb.com> (cherry picked from commit `9272e74e8c`)	2021-03-24 10:39:13 +02:00
Avi Kivity	7bb9230cfa	Merge "mutation_writer: explicitly close writers" from Benny " _consumer_fut is expected to return an exception on the abort path. Wait for it and drop any exception so it won't be abandoned as seen in #7904. A future<> close() method was added to return _consumer_fut. It is called both after abort() in the error path, and after consume_end_of_stream, on the success path. With that, consume_end_of_stream was made void as it doesn't return a future<> anymore. Fixes #7904 Test: unit(release) " * tag 'close-bucket-writer-v5' of github.com:bhalevy/scylla: mutation_writer: bucket_writer: add close mutation_writer/feed_writers: refactor bucket/shard writers mutation_writer: update bucket/shard writers consume_end_of_stream (cherry picked from commit `f11a0700a8`)	2021-03-21 18:11:52 +02:00
Benny Halevy	2898e98733	dist: scylla_util: prevent IndexError when no ephemeral_disks were found Currently we call firstNvmeSize before checking that we have enough (at least 1) ephemeral disks. When none are found, we hit the following error (see #7971): ``` File "/opt/scylladb/scripts/libexec/scylla_io_setup", line 239, in if idata.is_recommended_instance(): File "/opt/scylladb/scripts/scylla_util.py", line 311, in is_recommended_instance diskSize = self.firstNvmeSize File "/opt/scylladb/scripts/scylla_util.py", line 291, in firstNvmeSize firstDisk = ephemeral_disks[0] IndexError: list index out of range ``` This change reverses the order and first checks that we found enough disks before getting the first disk size. Fixes #7971 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #8027 (cherry picked from commit `55e3df8a72`)	2021-03-21 12:20:19 +02:00
Nadav Har'El	2796b0050d	storage_service: correct missing exception in logging rebuild failure When failing to rebuild a node, we would print the error with the useless explanation "<no exception>". The problem was a typo in the logging command which used std::current_exception() - which wasn't relevant in that point - instead of "ep". Refs #8089 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210314113118.1690132-1-nyh@scylladb.com> (cherry picked from commit `d73934372d`)	2021-03-21 10:51:23 +02:00
Nadav Har'El	6bc005643e	alternator-test: increase read timeout and avoid retries By default the boto3 library waits up to 60 seconds for a response, and if it gets no response, it sends the same request again, multiple times. We already noticed in the past that it retries too many times, thus slowing down failures, so in our test configuration we lowered the number of retries to 3, but the setting of 60-second-timeout plus 3 retries still causes two problems: 1. When the test machine and the build are extremely slow, and the operation is long (usually, CreateTable or DeleteTable involving multiple views), the 60 second timeout might not be enough. 2. If the timeout is reached, boto3 silently retries the same operation. This retry may fail because the previous one really succeeded at least partially! The symptom is tests which report an error when creating a table which already exists, or deleting a table which doesn't exist. The solution in this patch is first of all to never do retries - if a query fails on internal server error, or times out, just report this failure immediately. We don't expect to see transient errors during local tests, so this is exactly the right behavior. The second thing we do is to increase the default timeout. If 1 minute was not enough, let's raise it to 5 minutes. 5 minutes should be enough for every operation (famous last words...). Even if 5 minutes is not enough for something, at least we'll now see the timeout errors instead of some weird errors caused by retrying an operation which was already almost done. Fixes #8135 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210222125630.1325011-1-nyh@scylladb.com> (cherry picked from commit `0b2cf21932`)	2021-03-19 00:09:17 +02:00
Raphael S. Carvalho	d591ff5422	LCS: reshape: tolerate more sstables in level 0 with relaxed mode Reshape's relaxed mode, used during initialization, only tolerates min_threshold (default: 4) L0 sstables. However, relaxed mode should tolerate more sstables in level 0; otherwise boot will have to reshape level 0 every time it crosses the min threshold. So let's make LCS reshape tolerate a max of max_threshold and 32. This change is beneficial because once the table is populated, LCS regular compaction can decide to merge those sstables in level 0 into level 1 instead, therefore reducing WA. Refs #8297. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210318131442.17935-1-raphaelsc@scylladb.com> (cherry picked from commit `e53cedabb1`)	2021-03-18 19:19:58 +02:00
Raphael S. Carvalho	acb1c3eebf	compaction_manager: Fix performance of cleanup compaction due to unlimited parallelism Prior to `463d0ab`, only one table could be cleaned up at a time on a given shard. Since then, all tables belonging to a given keyspace are cleaned up in parallel. Cleanup serialization on each shard was enforced with a semaphore, which was incorrectly removed by the aforementioned patch. So the space requirement for cleanup to succeed can be up to the size of the keyspace, increasing the chances of the node running out of space. The node could also run out of memory if there are tons of tables in the keyspace. The memory requirement is at least #_of_tables * 128k (not taking into account write behind, etc). With 5k tables, it's ~0.64G per shard. Also, all tables being cleaned up in parallel will compete for the same disk and cpu bandwidth, making them all much slower, and consequently the operation time is significantly higher. This problem was detected with cleanup, but scrub and upgrade go through the same rewrite procedure, so they're affected by exactly the same problem. Fixes #8247. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210312162223.149993-1-raphaelsc@scylladb.com> (cherry picked from commit `7171244844`)	2021-03-18 14:29:20 +02:00
Dejan Mircevski	a04242ea62	cql3/expr: Handle `IN ?` bound to null Previously, we crashed when the IN marker is bound to null. Throw invalid_request_exception instead. This is a 4.3 backport of the #8265 fix. Tests: unit (dev) (cherry picked from commit `8db24fc03b`) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #8308	2021-03-18 10:39:19 +02:00
Nadav Har'El	7131c7c523	update tools/java submodule Backported fix for Refs #8229 into submodule. * tools/java f2e8666d7e...d49ae89b4b (1): > sstableloader: Only escape column names once Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-03-15 16:56:32 +02:00
Raphael S. Carvalho	6af7cf8a39	compaction: Prevent cleanup and regular from compacting the same sstable Due to a regression introduced by `463d0ab`, regular compaction can compact, in parallel, an sstable that is being compacted by cleanup, scrub or upgrade. This redundancy wastes resources and increases both write amplification and operation time. It is also a potential source of data resurrection, because the not-owned data from an sstable being compacted by both cleanup and regular compaction will still exist in the node afterwards, so resurrection can happen if the node regains ownership. Fixes #8155. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210225172641.787022-1-raphaelsc@scylladb.com> (cherry picked from commit `2cf0c4bbf1`) Includes fixup patch: compaction_manager: Fix use-after-free in rewrite_sstables() Use-after-free introduced by `2cf0c4bbf1`. That's because compacting is moved into the then_wrapped() lambda, so it's potentially freed on the next iteration of repeat(). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210309232940.433490-1-raphaelsc@scylladb.com> (cherry picked from commit `f7cc431477`)	2021-03-11 08:24:42 +02:00
Asias He	e2d4940b6d	gossip: Handle timeout error in gossiper::do_shadow_round Currently, the rpc timeout error for the GOSSIP_GET_ENDPOINT_STATES verb is not handled in gossiper::do_shadow_round. If the GOSSIP_GET_ENDPOINT_STATES rpc call to any of the remote nodes times out, gossiper::do_shadow_round will throw an exception and fail the whole boot up process. It is fine if some of the remote nodes time out in the shadow round; it is not necessary to talk to all nodes. This patch fixes an issue we saw recently in our sct tests: ``` INFO | scylla[1579]: [shard 0] init - Shutting down gossiping INFO | scylla[1579]: [shard 0] gossip - gossip is already stopped INFO | scylla[1579]: [shard 0] init - Shutting down gossiping was successful ... ERR | scylla[1579]: [shard 0] init - Startup failed: seastar::rpc::timeout_error (rpc call timed out) ``` Fixes #8187 Closes #8213 (cherry picked from commit `dc40184faa`)	2021-03-09 19:04:08 +02:00
Benny Halevy	09f9ff3f96	repair: repair_writer: do not capture lw_shared_ptr cross-shard The shared_from_this lw_shared_ptr must not be accessed across shards. Capturing it in the lambda passed to mutation_writer::distribute_reader_and_consume_on_shards causes exactly that, since the captured lw_shared_ptr is copied on other shards, resulting in memory corruption as seen in #7535 (probably due to lw_shared_ptr._count going out-of-sync when incremented/decremented in parallel on other shards with no synchronization). This was introduced in `289a08072a`. The writer is not needed in the body of this lambda anyway, so it doesn't need to capture it. It is already held by the continuations until the end of the chain. Fixes #7535 Test: repair_additional_test:RepairAdditionalTest.repair_disjoint_row_3nodes_diff_shard_count_test (dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201104142216.125249-1-bhalevy@scylladb.com> (cherry picked from commit `f93fb55726`)	2021-03-03 21:27:06 +02:00
Dejan Mircevski	d671185828	cql3: Fix maps::setter_by_key for unset values Unset values for key and value were not handled. Handle them in a manner matching Cassandra. This fixes all cases in testMapWithUnsetValues, so re-enable it (and fix a comment typo in it). Signed-off-by: Dejan Mircevski <dejan@scylladb.com> (cherry picked from commit `9eed26ca3d`) Fixes #7740.	2021-03-02 16:38:30 +02:00
Dejan Mircevski	8d1784805a	cql3: Fix `IN ?` for unset values When the right-hand side of IN is an unset value, we must report an error, like Cassandra does. This fixes testListWithUnsetValues, so re-enable it. Signed-off-by: Dejan Mircevski <dejan@scylladb.com> (cherry picked from commit `4515a49d4d`) Fixes #7740.	2021-03-02 16:38:10 +02:00
Dejan Mircevski	1d4ce229eb	cql3: Fix handling of scalar unset value Make the bind() operation of the scalar marker handle the unset-value case (which it previously didn't). Signed-off-by: Dejan Mircevski <dejan@scylladb.com> (cherry picked from commit `5bee97fa51`) Fixes #7740.	2021-03-02 16:37:45 +02:00
Dejan Mircevski	ba9897a34e	cql3: Fix crash when removing unset_value from set Avoid crash described in #7740 by ignoring the update when the element-to-remove is UNSET_VALUE. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> (cherry picked from commit `8b2f459622`) Fixes #7740.	2021-03-02 16:37:15 +02:00
Hagit Segev	5cdc1fa662	release: prepare for 4.3.2 scylla-4.3.2	2021-03-01 22:04:21 +02:00
Avi Kivity	81347037d3	Update seastar submodule * seastar 69f8394742...b70b444924 (1): > io_queue: Fix "delay" metrics Fixes #8166.	2021-03-01 13:57:57 +02:00
Avi Kivity	49c3b812b9	Update seastar submodule * seastar 6973080cd1...69f8394742 (1): > rpc: streaming sink: order outgoing messages Fixes #7552.	2021-03-01 12:20:57 +02:00
Avi Kivity	6ffd23a957	Point seastar submodule at scylla-seastar.git This allows us to backport Seastar patches to branch-4.3.	2021-03-01 12:19:40 +02:00
Raphael S. Carvalho	a0b78956e8	sstables: Fix TWCS reshape for windows with at least min_threshold sstables TWCS reshape was silently ignoring windows which contain at least min_threshold sstables (can happen with data segregation). When resizing candidates, size of multi_window was incorrectly used and it was always empty in this path, which means candidates was always cleared. Fixes #8147. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210224125322.637128-1-raphaelsc@scylladb.com> (cherry picked from commit `21608bd677`)	2021-02-28 16:42:43 +02:00
Pavel Solodovnikov	74941f67e6	large_data_handler: fix segmentation fault when constructing `data_value` from a `nullptr` It turns out that `cql_table_large_data_handler::record_large_rows` and `cql_table_large_data_handler::record_large_cells` were broken for reporting static cells and static rows from the very beginning: In case a large static cell or a large static row is encountered, it tries to execute `db::try_record` with `nullptr` additional values, denoting that there is no clustering key to be recorded. These values are next passed to `qctx.execute_cql()`, which creates `data_value` instances for each statement parameter, hence invoking `data_value(nullptr)`. This uses `const char*` overload which delegates to `std::string_view` ctor overload. It is UB to pass `nullptr` pointer to `std::string_view` ctor. Hence leading to segmentation faults in the aforementioned large data reporting code. What we want here is to make a null `data_value` instead, so just add an overload specifically for `std::nullptr_t`, which will create a null `data_value` with `text` type. A regression test is provided for the issue (written in `cql-pytest` framework). Tests: test/cql-pytest/test_large_cells_rows.py Fixes: #6780 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20201223204552.61081-1-pa.solodovnikov@scylladb.com> (cherry picked from commit `219ac2bab5`)	2021-02-23 12:13:51 +02:00
Avi Kivity	8c9c0807ef	Merge 'cdc: Limit size of topology description' from Piotr Jastrzębski Currently, whole topology description for CDC is stored in a single row. This means that for a large cluster of strong machines (say 100 nodes 64 cpus each), the size of the topology description can reach 32MB. This causes multiple problems. First of all, there's a hard limit on mutation size that can be written to Scylla. It's related to commit log block size which is 16MB by default. Mutations bigger than that can't be saved. Moreover, such big partitions/rows cause reactor stalls and negatively influence latency of other requests. This patch limits the size of topology description to about 4MB. This is done by reducing the number of CDC streams per vnode and can lead to CDC data not being fully colocated with Base Table data on shards. It can impact performance and consistency of data. This is just a quick fix to make it easily backportable. A full solution to the problem is under development. For more details see #7961, #7993 and #7985. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Closes #8048 * github.com:scylladb/scylla: cdc: Limit size of topology description cdc: Extract create_stream_ids from topology_description_generator (cherry picked from commit `c63e26e26f`)	2021-02-22 20:39:08 +02:00
Takuya ASADA	f316e1db54	scylla_util.py: resolve /dev/root to get actual device on aws When psutil.disk_partitions() reports that / is /dev/root, aws_instance mistakenly reports that the root partition is one of the ephemeral disks, and RAID construction will fail. This prevents the error and reports the correct free disks. Fixes #8055 Closes #8040 (cherry picked from commit `32d4ec6b8a`)	2021-02-21 16:23:21 +02:00
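
The large_data_handler fix (74941f67e6) comes down to C++ overload resolution: `data_value(nullptr)` picked a `const char*` overload that delegated to `std::string_view`, and constructing a `std::string_view` from a null pointer is undefined behavior. Adding a `std::nullptr_t` overload makes the literal `nullptr` produce a null value instead. The sketch below illustrates only that mechanism; `text_value` and `check_text_value` are hypothetical stand-ins, not Scylla's `data_value`, whose real members this log does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical nullable text value, standing in for data_value (not Scylla's API).
class text_value {
public:
    // Non-null text value.
    explicit text_value(std::string_view v) : _value(std::string(v)) {}

    // Delegates to the string_view overload, as described in the commit.
    // A null const char* passed here would still be UB; only the literal
    // nullptr is redirected by the overload below.
    explicit text_value(const char* v) : text_value(std::string_view(v)) {}

    // The fix: nullptr is an exact match for std::nullptr_t, which beats the
    // pointer conversion to const char*, so it now yields a null value.
    explicit text_value(std::nullptr_t) : _value(std::nullopt) {}

    bool is_null() const { return !_value.has_value(); }
    const std::string& get() const { return *_value; }

private:
    std::optional<std::string> _value;
};

// Usage: a real string stays non-null; nullptr selects the dedicated overload.
inline void check_text_value() {
    text_value s("abc");
    assert(!s.is_null() && s.get() == "abc");
    text_value n(nullptr);
    assert(n.is_null());
}
```

Because the new overload is selected purely by the argument type, existing call sites that pass `nullptr` pick it up without modification.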

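The two fs.aio-max-nr commits (8255b7984d and beb2bcb8bd) describe a simple formula: 11026 AIO slots per CPU for Scylla, plus 65536 slots of headroom for other applications. The sketch below only reproduces that arithmetic as stated in the commit messages; the constant names and `tuned_aio_max_nr` are illustrative assumptions, not the actual dist script, which this log does not show. The `static_assert` checks the claim that the old static value of 1048576 falls short on a 96-vCPU i3en.24xlarge.

```cpp
#include <cassert>
#include <cstdint>

// Values quoted in the commit messages; the names are hypothetical.
constexpr std::uint64_t aio_slots_per_cpu = 11026;        // Scylla's own AIO usage per CPU
constexpr std::uint64_t aio_headroom = 65536;             // extra slots for other applications
constexpr std::uint64_t old_static_aio_max_nr = 1048576;  // previous value in 99-scylla-aio.conf

// fs.aio-max-nr = cpu_count * 11026 + 65536
constexpr std::uint64_t tuned_aio_max_nr(std::uint64_t cpu_count) {
    return cpu_count * aio_slots_per_cpu + aio_headroom;
}

// 96 vCPUs need 96 * 11026 = 1058496 slots, more than the old static 1048576.
static_assert(96 * aio_slots_per_cpu > old_static_aio_max_nr,
              "static value too small for 96 vCPUs");
static_assert(tuned_aio_max_nr(96) == 1124032, "1058496 + 65536");

// Usage: a single-CPU host gets 11026 + 65536 slots.
inline void check_aio_max_nr() {
    assert(tuned_aio_max_nr(1) == 76562);
}
```

Deriving the value from the CPU count lets it scale with the machine, whereas the fixed 1048576 is already exceeded at 96 CPUs before any headroom is added.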

24128 Commits