scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 19:10:42 +00:00

Author	SHA1	Message	Date
Takuya ASADA	f7e5339c14	scylla_io_setup: support ARM instances on AWS Add preset parameters for AWS ARM instances. Fixes #9493 (cherry picked from commit `4e8060ba72`)	2021-11-15 13:35:15 +02:00
Asias He	4c4972cb33	gossip: Fix use-after-free in real_mark_alive and mark_dead In commit `11a8912093` (gossiper: get_gossip_status: return string_view and make noexcept) get_gossip_status returns a pointer to an endpoint_state in endpoint_state_map. After commit `425e3b1182` (gossip: Introduce direct failure detector), gossiper::mark_dead and gossiper::real_mark_alive can yield in the middle of the function. It is possible that endpoint_state can be removed, causing use-after-free to access it. To fix, make a copy before we yield. Fixes #8859 Closes #8862 (cherry picked from commit `7a32cab524`)	2021-11-15 13:23:02 +02:00
Takuya ASADA	50ce5bef2c	scylla_util.py: On is_gce(), return False when it's on GKE The GKE metadata server does not provide the same metadata as GCE, so we should not return True on is_gce(). So try to fetch machine-type from the metadata server, and return False if it responds with 404 Not Found. Fixes #9471 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Closes #9582 (cherry picked from commit `9b4cf8c532`)	2021-11-15 13:18:02 +02:00
Asias He	f864eea844	repair: Return HTTP 400 when repair id is not found There are two APIs for checking the repair status and they behave differently in case the id is not found. ``` {"host": "192.168.100.11:10001", "method": "GET", "uri": "/storage_service/repair_async/system_auth?id=999", "duration": "1ms", "status": 400, "bytes": 49, "dump": "HTTP/1.1 400 Bad Request\r\nContent-Length: 49\r\nContent-Type: application/json\r\nDate: Wed, 03 Nov 2021 10:49:33 GMT\r\nServer: Seastar httpd\r\n\r\n{\"message\": \"unknown repair id 999\", \"code\": 400}"} {"host": "192.168.100.11:10001", "method": "GET", "uri": "/storage_service/repair_status?id=999&timeout=1", "duration": "0ms", "status": 500, "bytes": 49, "dump": "HTTP/1.1 500 Internal Server Error\r\nContent-Length: 49\r\nContent-Type: application/json\r\nDate: Wed, 03 Nov 2021 10:49:33 GMT\r\nServer: Seastar httpd\r\n\r\n{\"message\": \"unknown repair id 999\", \"code\": 500}"} ``` The correct status code is 400 as this is a parameter error and should not be retried. Returning status code 500 makes smarter http clients retry the request in hopes of server recovering. After this patch: curl -X GET 'http://127.0.0.1:10000/storage_service/repair_async/system_auth?id=9999' {"message": "unknown repair id 9999", "code": 400} curl -X GET 'http://127.0.0.1:10000/storage_service/repair_status?id=9999' {"message": "unknown repair id 9999", "code": 400} Fixes #9576 Closes #9578 (cherry picked from commit `f5f5714aa6`)	2021-11-15 13:15:59 +02:00
Calle Wilund	b9735ab079	cdc: fix broken function signature in maybe_back_insert_iterator Fixes #9103 compare overload was declared as "bool" even though it is a tri-cmp. causes us to never use the speed-up shortcut (lessen search set), in turn meaning more overhead for collections. Closes #9104 (cherry picked from commit `59555fa363`)	2021-11-15 13:13:04 +02:00
Takuya ASADA	766e16f19e	scylla_io_setup: handle nr_disks on GCP correctly nr_disks is int, should not be string. Fixes #9429 Closes #9430 (cherry picked from commit `3b798afc1e`)	2021-11-15 13:06:30 +02:00
Michał Chojnowski	e6520df41c	utils: fragment_range: fix FragmentedView utils for views with empty fragments The copying and comparing utilities for FragmentedView are not prepared to deal with empty fragments in non-empty views, and will fall into an infinite loop in such case. But data coming in result_row_view can contain such fragments, so we need to fix that. Fixes #8398. Closes #8397 (cherry picked from commit `f23a47e365`)	2021-11-15 12:55:25 +02:00
Hagit Segev	26aca7b9f7	release: prepare for 4.5.2 scylla-4.5.2	2021-11-14 14:19:34 +02:00
Avi Kivity	103c85a23f	build: clobber user/group info from node_exporter tarball node_exporter is packaged with some random uid/gid in the tarball. When extracting it as an ordinary user this isn't a problem, since the uid/gid are reset to the current user, but that doesn't happen under dbuild since `tar` thinks the current user is root. This causes a problem if one wants to delete the build directory later, since it becomes owned by some random user (see /etc/subuid). Reset the uid/gid information so this doesn't happen. Closes #9579 Fixes #9610. (cherry picked from commit `e1817b536f`)	2021-11-10 14:18:56 +02:00
Nadav Har'El	db66b62e80	alternator: fix bug in ReturnValues=ALL_NEW This patch fixes a bug in UpdateItem's ReturnValues=ALL_NEW, which in some cases returned the OLD (pre-modification) value of some of the attributes, instead of its NEW value. The bug was caused by a confusion in our JSON utility function, rjson::set(), which sounds like it can set any member of a map, but in fact may only be used to add a new member - if a member with the same name (key) already existed, the result is undefined (two values for the same key). In ReturnValues=ALL_NEW we did exactly this: we started with a copy of the original item, and then used set() to override some of the members. This is not allowed. So in this patch, we introduce a new function, rjson::replace(), which does what we previously thought that rjson::set() does - i.e., replace a member if it exists, or if not, add it. We call this function in the ReturnValues=ALL_NEW code. This patch also adds a test case that reproduces the incorrect ALL_NEW results - and gets fixed by this patch. In an upcoming patch, we should rename the confusingly-named set() functions and audit all their uses. But we don't do this in this patch yet. We just add some comments to clarify what set() does - but don't change it, and just add one new function for replace(). Fixes #9542 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211104134937.40797-1-nyh@scylladb.com> (cherry picked from commit `b95e431228`)	2021-11-08 13:11:37 +02:00
Asias He	098fcf900f	storage_service: Abort restore_replica_count when node is removed from the cluster Consider the following procedure: - n1, n2, n3 - n3 is down - n1 runs nodetool removenode uuid_of_n3 to remove n3 from the cluster - n1 is down in the middle of removenode operation Node n1 will set n3 to removing gossip status during removenode operation. Whenever existing nodes learn a node is in removing gossip status, they will call restore_replica_count to stream data from other nodes for the ranges n3 loses if n3 was removed from the cluster. If the streaming fails, the streaming will sleep and retry. The current max number of retry attempts is 5. The sleep interval starts at 60 seconds and increases 1.5 times per sleep. This can leave the cluster in a bad state. For example, nodes can go out of disk space if the streaming continues. We need a way to abort such streaming attempts. To abort the removenode operation and forcibly remove the node, users can run `nodetool removenode force` on any existing nodes to move the node from removing gossip status to removed gossip status. However, the restore_replica_count will not be aborted. In this patch, a status checker is added in restore_replica_count, so that once a node is in removed gossip status, restore_replica_count will be aborted. This patch is for older releases without the new NODE_OPS_CMD infrastructure where such abort will happen automatically in case of error. Fixes #8651 Closes #8655 (cherry picked from commit `0858619cba`)	2021-11-02 17:25:34 +02:00
Benny Halevy	84025f6ce0	large_data_handle: add sstable name to log messages Although the sstable name is part of the system.large_* records, it is not printed in the log. In particular, this is essential for the "too many rows" warning that currently does not record a row in any large_* table so we can't correlate it with a sstable. Fixes #9524 Test: unit(dev) DTest: wide_rows_test.py Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211027074104.1753093-1-bhalevy@scylladb.com> (cherry picked from commit `a21b1fbb2f`)	2021-10-29 10:41:21 +03:00
Asias He	9898a114a6	repair: Handle everywhere_topology in bootstrap_with_repair The everywhere_topology returns the number of nodes in the cluster as RF. This makes only streaming from the node losing the range impossible since no node is losing the range after bootstrap. Shortcut to stream from all nodes in local dc in case the keyspace is everywhere_topology. Fixes #8503 (cherry picked from commit `3c36517598`)	2021-10-28 18:56:01 +03:00
Yaron Kaikov	4c0eac0491	release: prepare for 4.5.1 scylla-4.5.1	2021-10-24 14:11:07 +03:00
Benny Halevy	c1d8ce7328	date_tiered_manifest: get_now: fix use after free of sstable_list The sstable_list is destroyed right after the temporary lw_shared_ptr<sstable_list> returned from `cf.get_sstables()` is dereferenced. Fixes #9138 Test: unit(dev) DTest: resharding_test.py:ReshardingTombstones_with_DateTieredCompactionStrategy.disable_tombstone_removal_during_reshard_test (debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210804075813.42526-1-bhalevy@scylladb.com> (cherry picked from commit `3ad0067272`)	2021-10-20 17:03:49 +03:00
Jan Ciolek	5c5a71d2d7	cql3: Fix need_filtering on indexed table There were cases where a query on an indexed table needed filtering but need_filtering returned false. This is fixed by using new conditions in cases where we are using an index. Fixes #8991. Fixes #7708. For now this is an overly conservative implementation that returns true in some cases where filtering is not needed. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com> (cherry picked from commit `54149242b4`)	2021-10-18 11:31:20 +03:00
Benny Halevy	454ff04ff6	utils: phased_barrier: advance_and_await: make noexcept As a function returning a future, simplify its interface by handling any exceptions and returning an exceptional future instead of propagating the exception. In this specific case, throwing from advance_and_await() will propagate through table::await_pending_* calls short-circuiting a .finally clause in table::stop(). Also, mark as noexcept methods of class table calling advance_and_await and table::await_pending_ops that depends on them. Fixes #8636 A followup patch will convert advance_and_await to a coroutine. This is done separately to facilitate backporting of this patch. Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210511161407.218402-1-bhalevy@scylladb.com> (cherry picked from commit `c0dafa75d9`)	2021-10-13 12:26:03 +03:00
Takuya ASADA	f5f5b9a307	scylla_raid_setup: enabling mdmonitor.service on Debian variants On Debian variants, mdmonitor.service cannot be enabled because it is missing the [Install] section, so 'systemctl enable mdmonitor.service' will fail and mdmonitor will not run after the system restarts. To force running the service, add Wants=mdmonitor.service on var-lib-scylla.mount. Fixes #8494 Closes #8530 (cherry picked from commit `c9324634ca`)	2021-10-12 13:58:49 +03:00
Avi Kivity	a433c5fe06	Merge 'rjson: Add throwing allocator' from Piotr Sarna This series adds a wrapper for the default rjson allocator which throws on allocation/reallocation failures. It's done to work around several rapidjson (the underlying JSON parsing library) bugs - in a few cases, malloc/realloc return value is not checked, which results in dereferencing a null pointer (or an arbitrary pointer computed as 0 + `size`, with the `size` parameter being provided by the user). The new allocator will throw an `rjson:error` if it fails to allocate or reallocate memory. This series comes with unit tests which checks the new allocator behavior and also validates that an internal rapidjson structure which we indirectly rely upon (Stack) is not left in invalid state after throwing. The last part is verified by the fact that its destructor ran without errors. Fixes #8521 Refs #8515 Tests: * unit(release) * YCSB: inserting data similar to the one mentioned in #8515 - 1.5MB objects clustered in partitions 30k objects in size - nothing crashed during various YCSB workloads, but nothing also crashed for me locally before this patch, so it's not 100% robust relevant YCSB workload config for using 1.5MB objects: ```yaml fieldcount=150 fieldlength=10000 ``` Closes #8529 * github.com:scylladb/scylla: test: add a test for rjson allocation test: rename alternator_base64_test to alternator_unit_test rjson: add a throwing allocator (cherry picked from commit `c36549b22e`)	2021-10-12 13:56:59 +03:00
Benny Halevy	7f96ee6689	streaming: stream_session: do not escape curly braces in format strings Those turn into '{}' in the formatted strings and trigger a logger error in the following sstlog.warn(err.c_str()) call. Fixes #8436 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210408173048.124417-1-bhalevy@scylladb.com> (cherry picked from commit `76cd315c42`)	2021-10-12 13:49:13 +03:00
Calle Wilund	ebe196e32d	table: ensure memtable is actually in memtable list before erasing Fixes #8749 if a table::clear() was issued while we were flushing a memtable, the memtable is already gone from list. We need to check this before erase. Otherwise we get random memory corruption via std::vector::erase v2: * Make interface more set-like (tolerate non-existence in erase). Closes #8904 (cherry picked from commit `373fa3fa07`)	2021-10-12 13:47:25 +03:00
Benny Halevy	38aa455e83	utils: merge_to_gently: prevent stall in std::copy_if std::copy_if runs without yielding. See https://github.com/scylladb/scylla/issues/8897#issuecomment-867522480 Note that the standard states that no iterators or references are invalidated on insert so we can keep inserting before last1 when merging the remainder of list2 at the tail of list1. Fixes #8897 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `453e7c8795`)	2021-10-12 13:05:45 +03:00
Avi Kivity	a99382a076	main: start background reclaim before bootstrap We start background reclaim after we bootstrap, so bootstrap doesn't benefit from it, and sees long stalls. Fix by moving background reclaim initialization early, before storage_service::join_cluster(). (storage_service::join_cluster() is quite odd in that main waits for it synchronously, compared to everything else which is just a background service that is only initialized in main). Fixes #8473. Closes #8474 (cherry picked from commit `935378fa53`)	2021-10-12 13:00:49 +03:00
Pavel Emelyanov	6e2d055be3	mutation: Keep range tombstone in tree when consuming Current code std::move()-s the range tombstone into consumer thus moving the tombstone's linkage to the containing list as well. As the result the original range tombstone itself leaks as it leaves the tree and cannot be reached on .clear(). Another danger is that the iterator pointing to the tombstone becomes invalid while it's then ++-ed to advance to the next entry. The immediate fix is to keep the tombstone linked to the list while moving. fixes: #9207 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210825100834.3216-1-xemul@scylladb.com> (cherry picked from commit `b012040a76`)	2021-10-12 12:57:49 +03:00
Michael Livshin	152f710dec	avoid race between compaction and table stop Also add a debug-only compaction-manager-side assertion that tests that no new compaction tasks were submitted for a table that is being removed (debug-only because not constant-time). Fixes #9448. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com> Message-Id: <20211007110416.159110-1-michael.livshin@scylladb.com> (cherry picked from commit `e88891a8af`)	2021-10-12 12:51:34 +03:00
Asias He	14620444a2	storage_service: Fix argument in send_meta_data::do_receive The extra status print is not needed in the log. Fixes the following error: ERROR 2021-08-10 10:54:21,088 [shard 0] storage_service - service/storage_service.cc:3150 @do_receive: failed to log message: fmt='send_meta_data: got error code={}, from node={}, status={}': fmt::v7::format_error (argument not found) Fixes #9183 Closes #9189 (cherry picked from commit `ce8fd051c9`)	2021-10-12 12:45:47 +03:00
Yaron Kaikov	47be33a104	release: prepare for 4.5.0 scylla-4.5.0	2021-10-06 14:12:31 +03:00
Takuya ASADA	18b8388958	scylla_cpuscaling_setup: add --force option To build an Ubuntu AMI with CPU scaling configuration, we need a force mode for scylla_cpuscaling_setup, which runs setup without checking scaling_governor support. See scylladb/scylla-machine-image#204 Closes #9326 (cherry picked from commit `f928dced0c`)	2021-10-05 16:20:10 +03:00
Takuya ASADA	56b24818ec	scylla_cpuscaling_setup: disable ondemand.service on Ubuntu On Ubuntu, scaling_governor becomes powersave after a reboot, even if we configured cpufrequtils. This is because ondemand.service unconditionally changes scaling_governor to ondemand or powersave. cpufrequtils starts before ondemand.service, so scaling_governor is overwritten by ondemand.service. To configure scaling_governor correctly, we have to disable this service. Fixes #9324 Closes #9325 (cherry picked from commit `cd7fe9a998`)	2021-10-03 14:08:22 +03:00
Raphael S. Carvalho	5ed149b7e1	compaction_manager: prevent unbounded growth of pending tasks There will be unbounded growth of pending tasks if they are submitted faster than retiring them. That can potentially happen if memtables are frequently flushed too early. It was observed that this unbounded growth caused task queue violations as the queue will be filled with tons of tasks being reevaluated. By avoiding duplication in pending task list for a given table T, growth is no longer unbounded and consequently reevaluation is no longer aggressive. Refs #9331. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210930125718.41243-1-raphaelsc@scylladb.com> (cherry picked from commit `52302c3238`)	2021-10-03 13:10:37 +03:00
Eliran Sinvani	9265dbd5f7	dist: rpm: Add specific versioning and python3 dependency The Red Hat packages were missing two things: first, the metapackage did not depend on the python3 package at all, and second, the scylla-server package dependencies didn't contain a version as part of the dependency, which can cause problems during upgrade. Doing both of the things listed here is a bit of an overkill as either one of them separately would solve the problem described in #XXXX but both should be applied in order to express the correct concept. Fixes #8829 Closes #8832 (cherry picked from commit `9bfb2754eb`)	2021-09-12 16:01:01 +03:00
Avi Kivity	443fda8fb1	Merge "evictable_readers: don't drop static rows, drop assumption about snapshot isolation" from Botond " This mini-series fixes two loosely related bugs around reader recreation in the evictable reader (related by both being around reader recreation). A unit test is also added which reproduces both of them and checks that the fixes indeed work. More details in the patches themselves. This series replaces the two independent patches sent before: * [PATCH v1] evictable_reader: always reset static row drop flag * [PATCH v1] evictable_reader: relax partition key check on reader recreation As they depend on each other, it is easier to add a test if they are in a series. Fixes: #8923 Fixes: #8893 Tests: unit(dev, mutation_reader_test:debug) " * 'evictable-reader-recreation-more-bugs/v1' of https://github.com/denesb/scylla: test: mutation_reader_test: add more test for reader recreation evictable_reader: relax partition key check on reader recreation evictable_reader: always reset static row drop flag (cherry picked from commit `4209dfd753`)	2021-09-06 20:35:14 +03:00
Pavel Emelyanov	02da29fd05	btree: Destroy, not drop, node on clone roll-back The node in this place is not yet attached to its parent, so in btree::debug::yes (tests only) mode the node::drop()'s parent checks will access null parent pointer. However, in non-testing runtime there's a chance that a linear node fails to clone one of its keys and gets here. In this case it will carry both leftmost and rightmost flags and the assertion in drop will fire. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `1d857d604a`) Ref #9248.	2021-09-06 11:16:29 +03:00
Yaron Kaikov	edead1caf9	release: prepare for 4.5.rc7 scylla-4.5.rc7	2021-09-06 08:40:20 +03:00
Tomasz Grabiec	8dbd4edbb5	Merge 'hints: use token_metadata to tell if node has left the ring' from Piotr Dulikowski This PR changes the `can_send` function so that it looks at the `token_metadata` in order to tell if the destination node is in the ring. Previously, gossiper state was used for that purpose and required a relatively complicated condition to check. The new logic just uses `token_metadata::is_member` which reduces complexity of the `can_send` function. Additionally, `storage_service` is slightly modified so that during a removenode operation the `token_metadata` is first updated and only then endpoint lifecycle subscribers are notified. This was done in order to prevent a race just like the one which happened in #5087 - hints manager is a lifecycle subscriber and starts a draining operation when a node is removed, and in order for draining to work correctly, `can_send` should keep returning true for that node. Tests: - unit(dev) - dtest(hintedhandoff_additional_test.py) - dtest(topology_test.py) Closes #8387 * github.com:scylladb/scylla: hints: clarify docstring comment for can_send hints: use token_metadata to tell if node is in the ring hints: slightly reorganize "if" statement in can_send storage_service: release token_metadata lock before notify_left storage_service: notify_left after token_metadata is replicated (cherry picked from commit `307bd354d2`) Ref #5087.	2021-09-05 17:38:11 +03:00
Pavel Emelyanov	02bb2e1f4c	btree: Dont leak kids on clone roll-back When failed-to-be-cloned node cleans itself it must also clear all its child nodes. Plain destroy() doesn't do it, it only frees the provided node. fixes: #9248 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `d1a1a2dac2`)	2021-09-05 17:33:48 +03:00
Benny Halevy	4bae31523d	distributed_loader: distributed_loader::get_sstables_from_upload_dir: do not copy vector containing foreign shared sstables lw_shared_ptr must not be copied on a foreign shard. Copying the vector on shard 0 increases the reference count of lw_shared_ptr<sstable> elements that were created on other shards, as seen in https://github.com/scylladb/scylla/issues/9278. Fixes #9278 DTest: migration_test.py:TestLoadAndStream_with_3_0_md.load_and_stream_increase_cluster_test(debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210902084313.2003328-1-bhalevy@scylladb.com> (cherry picked from commit `33f579f783`)	2021-09-05 17:02:38 +03:00
Michał Chojnowski	55348131f9	utils: compact-radix-tree: fix accidental cache line bouncing Whenever a node_head_ptr is assigned to nil_root, the _backref inside it is overwritten. But since nil_root is shared between shards, this causes severe cache line bouncing. (It was observed to reduce the total write throughput of Scylla by 90% on a large NUMA machine). This backreference is never read anyway, so fix this bug by not writing it. Fixes #9252 Closes #9246 (cherry picked from commit `126baa7850`)	2021-08-29 15:45:33 +03:00
Avi Kivity	c1b9de3d5e	Revert "messaging_service: Enforce dc/rack membership iff required for non-tls connections" This reverts commit `a0745f9498`. It breaks multiregion clusters on AWS. Ref #8418.	2021-08-29 15:44:11 +03:00
Avi Kivity	9956bce436	Revert "messaging_service: Bind to listen address, not broadcast" This reverts commit `6dc7ef512d`. It stands in the way of reverting `a0745f9498`, which is implicated in #8418.	2021-08-29 15:43:29 +03:00
Pavel Solodovnikov	95f32428e4	raft: create system tables only when `raft` experimental feature is set Also introduce a tiny function to return raft-enabled db config for cql testing. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210826091432.279532-1-pa.solodovnikov@scylladb.com> (cherry picked from commit `c0854a0f62`) Ref #9239.	2021-08-29 14:01:36 +03:00
Pavel Solodovnikov	5b3319816a	db: add experimental option for raft Introduce `raft` experimental option. Adjust the tests accordingly to accommodate the new option. It's not enabled by default when providing `--experimental=true` config option and should be requested explicitly via `--experimental-options=raft` config option. Hide the code related to `raft_group_registry` behind the switch. The service object is still constructed but no initialization is performed (`init()` is not called) if the flag is not set. Later, other raft-related things, such as raft schema changes, will also use this flag. Also, don't introduce a corresponding gossiper feature just yet, because again, it should be done after the raft schema changes API contract is stabilized. This will be done in a separate series, probably related to implementing the feature itself. Tests: unit(dev) Ref #9239. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210823121956.167682-1-pa.solodovnikov@scylladb.com> (cherry picked from commit `22794efc22`)	2021-08-29 14:01:33 +03:00
Avi Kivity	8f63a9de31	Update seastar submodule (perftune failure on bond NIC) * seastar dab10ba6ad...70ea9312a1 (1): > perftune.py: instrument bonding tuning flow with 'nic' parameter Fixes #9225.	2021-08-19 16:58:15 +03:00
Hagit Segev	88314fedfa	release: prepare for 4.5.rc6 scylla-4.5.rc6	2021-08-17 14:17:04 +03:00
Calle Wilund	b0edfa6d70	commitlog/config: Make hard size enforcement false by default + add config opt Refs #9053 Flips default for commitlog disk footprint hard limit enforcement to off due to observed latency stalls with stress runs. Instead adds an optional flag "commitlog_use_hard_size_limit" which can be turned on to in fact do enforce it. Sort of tape and string fix until we can properly tweak the balance between cl & sstable flush rate. Closes #9195 (cherry picked from commit `3633c077be`)	2021-08-16 10:05:08 +03:00
Takuya ASADA	9338f6b6b8	scylla_cpuscaling_setup: change scaling_governor path On some environments /sys/devices/system/cpu/cpufreq/policy0/scaling_governor does not exist even if CPU scaling is supported. Instead, /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor is available in both environments, so we should switch to it. Fixes #9191 Closes #9193 (cherry picked from commit `e5bb88b69a`)	2021-08-12 12:10:04 +03:00
Asias He	28940ef505	table: Fix is_shared assert for load and stream The reader is used by load and stream to read sstables from the upload directory which are not guaranteed to belong to the local shard. Using the make_range_sstable_reader instead of make_local_shard_sstable_reader. Tests: backup_restore_tests.py:TestBackupRestore.load_and_stream_using_snapshot_test backup_restore_tests.py:TestBackupRestore.load_and_stream_to_new_cluster_2_test backup_restore_tests.py:TestBackupRestore.load_and_stream_to_new_cluster_1_test migration_test.py:TestLoadAndStream.load_and_stream_asymmetric_cluster_test migration_test.py:TestLoadAndStream.load_and_stream_decrease_cluster_test migration_test.py:TestLoadAndStream.load_and_stream_frozen_pk_test migration_test.py:TestLoadAndStream.load_and_stream_increase_cluster_test migration_test.py:TestLoadAndStream.load_and_stream_primary_replica_only_test Fixes #9173 Closes #9185 (cherry picked from commit `040b626235`)	2021-08-12 11:17:33 +03:00
Raphael S. Carvalho	4c03bcce4c	compaction: Prevent tons of compaction of fully expired sstable from happening in parallel Compaction manager can start tons of compaction of fully expired sstable in parallel, which may consume a significant amount of resources. This problem is caused by weight being released too early in compaction, after data is all compacted but before table is called to update its state, like replacing sstables and so on. Fully expired sstables aren't actually compacted, so the following can happen: - compaction 1 starts for expired sst A with weight W, but there's nothing to be compacted, so weight W is released, then calls table to update state. - compaction 2 starts for expired sst B with weight W, but there's nothing to be compacted, so weight W is released, then calls table to update state. - compaction 3 starts for expired sst C with weight W, but there's nothing to be compacted, so weight W is released, then calls table to update state. - compaction 1 is done updating table state, so it finally completes and releases all the resources. - compaction 2 is done updating table state, so it finally completes and releases all the resources. - compaction 3 is done updating table state, so it finally completes and releases all the resources. This happens because, with expired sstable, compaction will release weight faster than it will update table state, as there's nothing to be compacted. With my reproducer, it's very easy to reach 50 parallel compactions on a single shard, but that number can be easily worse depending on the amount of sstables with fully expired data, across all tables. This high parallelism can happen only with a couple of tables, if there are many time windows with expired data, as they can be compacted in parallel. Prior to `55a8b6e3c9`, weight was released earlier in compaction, before last sstable was sealed, but right now, there's no need to release weight earlier. 
Weight can be released in a much simpler way, after the compaction is actually done. So such compactions will be serialized from now on. Fixes #8710. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210527165443.165198-1-raphaelsc@scylladb.com> [avi: drop now unneeded storage_service_for_tests] (cherry picked from commit `a7cdd846da`)	2021-08-10 18:16:24 +03:00
Piotr Jastrzebski	860e2190a9	api: use proper type to reduce partition count Partition count is of a type size_t but we use std::plus<int> to reduce values of partition count in various column families. This patch changes the argument of std::plus to the right type. Using std::plus<int> for size_t compiles but does not work as expected. For example plus<int>(2147483648LL, 1LL) = -2147483647 while the code would probably want 2147483649. Fixes #9090 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Closes #9074 (cherry picked from commit `90a607e844`)	2021-08-10 18:16:15 +03:00
Nadav Har'El	6cf88812f6	secondary index: fix regression in CREATE INDEX IF NOT EXISTS The recent commit `0ef0a4c78d` added helpful error messages in case an index cannot be created because the intended name of its materialized view is already taken - but accidentally broke the "CREATE INDEX IF NOT EXISTS" feature. The checking code was correct, but in the wrong place: we need to first check maybe the index already exists and "IF NOT EXISTS" was chosen - and only do this new error checking if this is not the case. This patch also includes a cql-pytest test for reproducing this bug. The bug is also reproduced by the translated Cassandra unit tests cassandra_tests/validation/entities/secondary_index_test.py:: testCreateAndDropIndex and this is how I found this bug. After this patch, all these tests pass. Fixes #8717. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210526143635.624398-1-nyh@scylladb.com> (cherry picked from commit `97e827e3e1`)	2021-08-10 18:02:02 +03:00
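
Note on commit 860e2190a9 (api: use proper type to reduce partition count): the arithmetic quoted in that message can be reproduced with a minimal sketch. The helper names below are hypothetical and are not taken from the Scylla source; the sketch only mirrors the `plus<int>(2147483648LL, 1LL)` example from the commit message, and it assumes a 32-bit two's-complement `int` (the narrowing result is implementation-defined before C++20).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical helper: std::plus<int> converts both operands to int
// before adding, so a value above INT_MAX is narrowed first.
inline long long add_with_plus_int(long long a, long long b) {
    return std::plus<int>()(a, b);
}

// Hypothetical helper: the functor type matches the operand type,
// so no narrowing happens.
inline std::size_t add_with_plus_size_t(std::size_t a, std::size_t b) {
    return std::plus<std::size_t>()(a, b);
}

// Checks the two results quoted in the commit message
// (assuming 32-bit two's-complement int).
inline void check_commit_example() {
    assert(add_with_plus_int(2147483648LL, 1LL) == -2147483647LL);
    assert(add_with_plus_size_t(2147483648ULL, 1ULL) == 2147483649ULL);
}
```

The code compiles in both forms, which is why the wrong functor type is easy to miss: the narrowing is an implicit conversion at the call to `operator()`.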


25893 Commits