scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 19:35:12 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	ee48ed2864	LCS: reshape: tolerate more sstables in level 0 with relaxed mode Relaxed mode, used during initialization, of reshape only tolerates min_threshold (default: 4) L0 sstables. However, relaxed mode should tolerate more sstables in level 0, otherwise boot will have to reshape level 0 every time it crosses the min threshold. So let's make LCS reshape tolerate a max of max_threshold and 32. This change is beneficial because once table is populated, LCS regular compaction can decide to merge those sstables in level 0 into level 1 instead, therefore reducing WA. Refs #8297. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210318131442.17935-1-raphaelsc@scylladb.com> (cherry picked from commit `e53cedabb1`)	2021-03-18 19:19:46 +02:00
Raphael S. Carvalho	03f2eb529f	compaction_manager: Fix performance of cleanup compaction due to unlimited parallelism Prior to `463d0ab`, only one table could be cleaned up at a time on a given shard. Since then, all tables belonging to a given keyspace are cleaned up in parallel. Cleanup serialization on each shard was enforced with a semaphore, which was incorrectly removed by the aforementioned patch. So space requirement for cleanup to succeed can be up to the size of keyspace, increasing the chances of node running out of space. Node could also run out of memory if there are tons of tables in the keyspace. Memory requirement is at least #_of_tables * 128k (not taking into account write behind, etc). With 5k tables, it's ~0.64G per shard. Also all tables being cleaned up in parallel will compete for the same disk and cpu bandwidth, making them all much slower, and consequently the operation time is significantly higher. This problem was detected with cleanup, but scrub and upgrade go through the same rewrite procedure, so they're affected by exactly the same problem. Fixes #8247. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210312162223.149993-1-raphaelsc@scylladb.com> (cherry picked from commit `7171244844`)	2021-03-18 14:28:57 +02:00
Dejan Mircevski	c270014121	cql3/expr: Handle `IN ?` bound to null Previously, we crashed when the IN marker is bound to null. Throw invalid_request_exception instead. This is a 4.4 backport of the #8265 fix. Tests: unit (dev) (cherry picked from commit `8db24fc03b`) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #8307 Fixes #8265.	2021-03-18 10:35:55 +02:00
Pavel Emelyanov	35804855f9	test: Fix exit condition of row_cache_test::test_eviction_from_invalidated The test populates the cache, then invalidates it, then tries to push huge (10x the segment size) chunks into seastar memory hoping that the invalid entries will be evicted. The exit condition on the last stage is -- total memory of the region (sum of both -- used and free) becomes less than the size of one chunk. However, the condition is wrong, because cache usually contains a dummy entry that's not necessarily on lru and on some test iteration it may happen that evictable size < chunk size < evictable size + dummy size. In this case the test fails with bad_alloc, being unable to evict the memory from under the dummy. fixes: #7959 tests: unit(row_cache_test), unit(the failing case with the triggering seed from the issue + 200 times more with random seeds) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210309134138.28099-1-xemul@scylladb.com> (cherry picked from commit `096e452db9`)	2021-03-16 23:42:11 +01:00
Piotr Sarna	a810e57684	Merge 'Alternator: support nested attribute paths... in all expressions' from Nadav Har'El. This series fixes #5024 - which is about adding support for nested attribute paths (e.g., a.b.c[2]) to Alternator. The series adds complete support for this feature in ProjectionExpression, ConditionExpression, FilterExpression and UpdateExpression - and also its combination with ReturnValues. Many relevant tests - and also some new tests added in this series - now pass. The first patch in the series fixes #8043, a bug in some error cases in conditions, which was discovered while working on this series, and is conceptually separate from the rest of the series. Closes #8066 * github.com:scylladb/scylla: alternator: correct implementation of UpdateItem with nested attributes and ReturnValues alternator: fix bug in ReturnValues=UPDATED_NEW alternator: implemented nested attribute paths in UpdateExpression alternator: limit the depth of nested paths alternator: prepare for UpdateItem nested attribute paths alternator: overhaul ProjectionExpression hierarchy implementation alternator: make parsed::path object printable alternator-test: a few more ProjectionExpression conflict test cases alternator-test: improve tests for nested attributes in UpdateExpression alternator: support attribute paths in ConditionExpression, FilterExpression alternator-test: improve tests for nested attributes in ConditionExpression alternator: support attribute paths in ProjectionExpression alternator: overhaul attrs_to_get handling alternator-test: additional tests for attribute paths in ProjectionExpression alternator-test: harden attribute-path tests for ProjectionExpression alternator: fix ValidationException in FilterExpression - and more (cherry picked from commit `cbbb7f08a0`)	2021-03-15 18:40:12 +02:00
Nadav Har'El	7b19cc17d6	updated tools/java submodule * tools/java 8080009794...56470fda09 (1): > sstableloader: Only escape column names once Backporting fix to Refs #8229. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-03-15 16:47:47 +02:00
Benny Halevy	101e0e611b	storage_service: use atomic_vector for lifecycle_subscribers So it can be modified while walked to dispatch subscribed event notifications. In #8143, there is a race between scylla shutdown and notify_down(), causing use-after-free of cql_server. Using an atomic vector instead and futurizing unregister_subscriber allows deleting from _lifecycle_subscribers while walked using atomic_vector::for_each. Fixes #8143 Test: unit(release) DTest: update_cluster_layout_tests:TestUpdateClusterLayout.add_node_with_large_partition4_test(release) materialized_views_test.py:TestMaterializedViews.double_node_failure_during_mv_insert_4_nodes_test(release) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210224164647.561493-2-bhalevy@scylladb.com> (cherry picked from commit `baf5d05631`)	2021-03-15 15:25:18 +02:00
Benny Halevy	ba23eb733d	cql_server: event_notifier: unregister_subscriber in stop Move unregister_subscriber from the destructor to stop as preparation for moving storage_service lifecycle_subscribers to atomic_vector and futurizing unregister_subscriber. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210224164647.561493-1-bhalevy@scylladb.com> (cherry picked from commit `1ed04affab`) Ref #8143.	2021-03-15 15:25:10 +02:00
Hagit Segev	ec20ff0988	release: prepare for 4.4.rc4 scylla-4.4.rc4	2021-03-11 23:57:55 +02:00
Raphael S. Carvalho	3613b082bc	compaction: Prevent cleanup and regular from compacting the same sstable Due to regression introduced by `463d0ab`, regular can compact in parallel a sstable being compacted by cleanup, scrub or upgrade. This redundancy causes resources to be wasted, write amplification is increased and so does the operation time, etc. That's a potential source of data resurrection because the now-owned data from a sstable being compacted by both cleanup and regular will still exist in the node afterwards, so resurrection can happen if node regains ownership. Fixes #8155. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210225172641.787022-1-raphaelsc@scylladb.com> (cherry picked from commit `2cf0c4bbf1`) Includes fixup patch: compaction_manager: Fix use-after-free in rewrite_sstables() Use-after-free introduced by `2cf0c4bbf1`. That's because compacting is moved into then_wrapped() lambda, so it's potentially freed on the next iteration of repeat(). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210309232940.433490-1-raphaelsc@scylladb.com> (cherry picked from commit `f7cc431477`)	2021-03-11 08:24:01 +02:00
Asias He	b94208009f	gossip: Handle timeout error in gossiper::do_shadow_round Currently, the rpc timeout error for the GOSSIP_GET_ENDPOINT_STATES verb is not handled in gossiper::do_shadow_round. If the GOSSIP_GET_ENDPOINT_STATES rpc call to any of the remote nodes times out, gossiper::do_shadow_round will throw an exception and fail the whole boot up process. It is fine if some of the remote nodes time out in the shadow round. It is not a must to talk to all nodes. This patch fixes an issue we saw recently in our sct tests: ``` INFO | scylla[1579]: [shard 0] init - Shutting down gossiping INFO | scylla[1579]: [shard 0] gossip - gossip is already stopped INFO | scylla[1579]: [shard 0] init - Shutting down gossiping was successful ... ERR | scylla[1579]: [shard 0] init - Startup failed: seastar::rpc::timeout_error (rpc call timed out) ``` Fixes #8187 Closes #8213 (cherry picked from commit `dc40184faa`)	2021-03-10 16:27:47 +02:00
Nadav Har'El	05c266c02a	Merge 'Fix alternator streams management regression' from Calle Wilund Refs: #8012 Fixes: #8210 With the update to CDC generation management, the way we retrieve and process these changed. One very bad bug slipped through though; the code for getting versioned streams did not take into account the late-in-pr change to make clustering of CDC gen timestamps reversed. So our alternator shard info became quite rump-stumped, leading to more or less no data depending on when generations changed w.r. data. Also, the way we track the above timestamps changed, so we should utilize this for our end-of-iterator check. Closes #8209 * github.com:scylladb/scylla: alternator::streams: Use better method for generation timestamp system_distributed_keyspace: Add better routine to get latest cdc gen. timestamp system_distributed_keyspace: Fix cdc_get_versioned_streams timestamp range (cherry picked from commit `e12e57c915`)	2021-03-10 16:27:43 +02:00
Avi Kivity	4b7319a870	Merge 'Split CDC streams table partitions into clustered rows ' from Kamil Braun Until now, the lists of streams in the `cdc_streams_descriptions` table for a given generation were stored in a single collection. This solution has multiple problems when dealing with large clusters (which produce large lists of streams): 1. large allocations 2. reactor stalls 3. mutations too large to even fit in commitlog segments This commit changes the schema of the table as described in issue #7993. The streams are grouped according to token ranges, each token range being represented by a separate clustering row. Rows are inserted in reasonably large batches for efficiency. The table is renamed to enable easy upgrade. On upgrade, the latest CDC generation's list of streams will be (re-)inserted into the new table. Yet another table is added: one that contains only the generation timestamps clustered in a single partition. This makes it easy for CDC clients to learn about new generations. It also enables an elegant two-phase insertion procedure of the generation description: first we insert the streams; only after ensuring that a quorum of replicas contains them, we insert the timestamp. Thus, if any client observes a timestamp in the timestamps table (even using a ONE query), it means that a quorum of replicas must contain the list of streams. --- Nodes automatically ensure that the latest CDC generation's list of streams is present in the streams description table. When a new generation appears, we only need to update the table for this generation; old generations are already inserted. However, we've changed the description table (from `cdc_streams_descriptions` to `cdc_streams_descriptions_v2`). The existing mechanism only ensures that the latest generation appears in the new description table. We add an additional procedure that rewrites the older generations as well, if we find that it is necessary to do so (i.e. when some CDC log tables may contain data in these generations). Closes #8116 * github.com:scylladb/scylla: tests: add a simple CDC cql pytest cdc: add config option to disable streams rewriting cdc: rewrite streams to the new description table cql3: query_processor: improve internal paged query API cdc: introduce no_generation_data_exception exception type docs: cdc: mention system.cdc_local table cdc: coroutinize do_update_streams_description sys_dist_ks: split CDC streams table partitions into clustered rows cdc: use chunked_vector for streams in streams_version cdc: remove `streams_version::expired` field system_distributed_keyspace: use mutation API to insert CDC streams storage_service: don't use `sys_dist_ks` before it is started (cherry picked from commit `f0950e023d`)	2021-03-09 14:08:44 +02:00
Takuya ASADA	5ce71f3a29	scylla_raid_setup: don't abort using raiddev when array_state is 'clear' On Ubuntu 20.04 AMI, scylla_raid_setup --raiddev /dev/md0 causes '/dev/md0 is already using' (issue #7627). So we merged the patch to find free mdX (`587b909`). However, looking into /proc/mdstat on the AMI, it actually says no active md device is available: ubuntu@ip-10-0-0-43:~$ cat /proc/mdstat Personalities : unused devices: <none> We currently decide mdX is used when os.path.exists('/sys/block/mdX/md/array_state') == True, but according to the kernel doc, the file may be available even when the array is STOPPED: clear No devices, no size, no level Writing is equivalent to STOP_ARRAY ioctl https://www.kernel.org/doc/html/v4.15/admin-guide/md.html So we should also check array_state != 'clear', not just array_state existence. Fixes #8219 Closes #8220 (cherry picked from commit `2d9feaacea`)	2021-03-08 14:28:58 +02:00
Pekka Enberg	f06f4f6ee1	Update tools/jmx submodule * tools/jmx 2c95650...c510a56 (1): > APIBuilder: Unlock RW-lock in remove()	2021-03-04 14:36:45 +02:00
Hagit Segev	c2d9247574	release: prepare for 4.4.rc3 scylla-4.4.rc3	2021-03-04 13:38:43 +02:00
Raphael S. Carvalho	b4e393d215	compaction: Fix leak of expired sstable in the backlog tracker expired sstables are skipped in the compaction setup phase, because they don't need to be actually compacted, but rather only deleted at the end. that is causing such sstables to not be removed from the backlog tracker, meaning that backlog caused by expired sstables will not be removed even after their deletion, which means shares will be higher than needed, making compaction potentially more aggressive than it has to be. to fix this bug, let's manually register these sstables into the monitor, such that they'll be removed from the tracker once compaction completes. Fixes #6054. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210216203700.189362-1-raphaelsc@scylladb.com> (cherry picked from commit `5206a97915`)	2021-03-01 14:14:33 +02:00
Avi Kivity	1a2b7037cd	Update seastar submodule * seastar 74ae29bc17...2c884a7449 (1): > io_queue: Fix "delay" metrics Fixes #8166.	2021-03-01 13:57:04 +02:00
Raphael S. Carvalho	048f5efe1c	sstables: Fix TWCS reshape for windows with at least min_threshold sstables TWCS reshape was silently ignoring windows which contain at least min_threshold sstables (can happen with data segregation). When resizing candidates, size of multi_window was incorrectly used and it was always empty in this path, which means candidates was always cleared. Fixes #8147. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210224125322.637128-1-raphaelsc@scylladb.com> (cherry picked from commit `21608bd677`)	2021-02-28 17:20:26 +02:00
Takuya ASADA	056293b95f	dist/debian: don't run dh_installinit for scylla-node-exporter when service name == package name dh_installinit --name <service> is for forcing install of debian/.service and debian/.default that do not match the package name. And if we have subpackages, the packager has the responsibility to rename debian/.service to debian/<subpackage>.service. However, we are currently mistakenly running dh_installinit --name scylla-node-exporter for debian/scylla-node-exporter.service, the packaging system tries to find the destination package for the .service, and does not find a subpackage name on it, so it will pick the first subpackage ordered by name, scylla-conf. To solve the issue, we just need to run dh_installinit without --name when $product == 'scylla'. Fixes #8163 Closes #8164 (cherry picked from commit `aabc67e386`)	2021-02-28 17:20:26 +02:00
Takuya ASADA	f96ea8e011	scylla_setup: allow running scylla_setup with strict umask setting We currently deny running scylla_setup when umask != 0022. To remove this limitation, run os.chmod(0o644) on every file creation to allow reading from scylla user. Note that perftune.yaml is not really needed to set 0644 since perftune.py is running in root user, but setting it to align permission with other files. Fixes #8049 Closes #8119 (cherry picked from commit `f3a82f4685`)	2021-02-26 08:49:59 +02:00
Hagit Segev	49cd0b87f0	release: prepare for 4.4.rc2 scylla-4.4.rc2	2021-02-24 19:15:29 +02:00
Asias He	0977a73ab2	messaging_service: Move gossip ack message verb to gossip group Fix a scheduling group leak: INFO [shard 0] gossip - gossiper::run sg=gossip INFO [shard 0] gossip - gossiper::handle_ack_msg sg=statement INFO [shard 0] gossip - gossiper::handle_syn_msg sg=gossip INFO [shard 0] gossip - gossiper::handle_ack2_msg sg=gossip After the fix: INFO [shard 0] gossip - gossiper::run sg=gossip INFO [shard 0] gossip - gossiper::handle_ack_msg sg=gossip INFO [shard 0] gossip - gossiper::handle_syn_msg sg=gossip INFO [shard 0] gossip - gossiper::handle_ack2_msg sg=gossip Fixes #7986 Closes #8129 (cherry picked from commit `7018377bd7`)	2021-02-24 14:11:16 +02:00
Pekka Enberg	9fc582ee83	Update seastar submodule * seastar 572536ef...74ae29bc (3): > perftune.py: fix assignment after extend and add asserts > scripts/perftune.py: convert nic option in old perftune.yaml to list for compatibility > scripts/perftune.py: remove repeated items after merging options from file Fixes #7968.	2021-02-23 15:18:00 +02:00
Avi Kivity	4be14c2249	Revert "repair: Make removenode safe by default" This reverts commit `829b4c1438`. It ended up causing repair failures. Fixes #7965.	2021-02-23 14:14:07 +02:00
Tomasz Grabiec	3160dd4b59	table: Fix schema mismatch between memtable reader and sstable writer The schema used to create the sstable writer has to be the same as the schema used by the reader, as the former is used to interpret mutation fragments produced by the reader. Commit `9124a70` introduced a deferring point between reader creation and writer creation which can result in schema mismatch if there was a concurrent alter. This could lead to the sstable write to crash, or generate a corrupted sstable. Fixes #7994 Message-Id: <20210222153149.289308-1-tgrabiec@scylladb.com>	2021-02-23 13:48:33 +02:00
Avi Kivity	50a8eab1a2	Update seastar submodule * seastar a287bb1a3...572536ef4 (1): > rpc: streaming sink: order outgoing messages Fixes #7552.	2021-02-23 10:19:45 +02:00
Avi Kivity	04615436a0	Point seastar submodule at scylla-seastar.git This allows us to backport Seastar fixes to this branch.	2021-02-23 10:17:22 +02:00
Takuya ASADA	d1ab37654e	scylla_util.py: resolve /dev/root to get actual device on aws When psutil.disk_partitions() reports / is /dev/root, aws_instance mistakenly reports root partition is part of ephemeral disks, and RAID construction will fail. This prevents the error and reports correct free disks. Fixes #8055 Closes #8040 (cherry picked from commit `32d4ec6b8a`)	2021-02-21 16:22:51 +02:00
Nadav Har'El	b47bdb053d	alternator: fix ValidationException in FilterExpression - and more The first condition expressions we implemented in Alternator were the old "Expected" syntax of conditional updates. That implementation had some specific assumptions on how it handles errors: For example, in the "LT" operator in "Expected", the second operand is always part of the query, so an error in it (e.g., an unsupported type) resulted it a ValidationException error. When we implemented ConditionExpression and FilterExpression, we wrongly used the same functions check_compare(), check_BETWEEN(), etc., to implement them. This results in some inaccurate error handling. The worst example is what happens when you use a FilterExpression with an expression such as "x < y" - this filter is supposed to silently skip items whose "x" and "y" attributes have unsupported or different types, but in our implementation a bad type (e.g., a list) for y resulted in a ValidationException which aborted the entire scan! Interestingly, in once case (that of BEGINS_WITH) we actually noticed the slightly different behavior needed and implemented the same operator twice - with ugly code duplication. But in other operators we missed this problem completely. This patch first adds extensive tests of how the different expressions (Expected, QueryFilter, FilterExpression, ConditionExpression) and the different operators handle various input errors - unsupported types, missing items, incompatible types, etc. Importantly, the tests demonstrate that there is often different behavior depending on whether the bad input comes from the query, or from the item. Some of the new tests fail before this patch, but others pass and were useful to verify that the patch doesn't break anything that already worked correctly previously. As usual, all the tests pass on Cassandra. Finally, this patch fixes all these problems. 
The comparison functions like check_compare() and check_BETWEEN() now not only take the operands, they also take booleans saying if each of the operands came from the query or from an item. The old-syntax caller (Expected or QueryFilter) always says that the first operand is from the item and the second is from the query - but in the new-syntax caller (ConditionExpression or FilterExpression) any or all of the operands can come from the query and need verification. The old duplicated code for check_BEGINS_WITH() - which had a TODO to remove it - is finally removed. Instead we use the same idea of passing booleans saying if each of its operands came from an item or from the query. Fixes #8043 Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `653610f4bc`)	2021-02-21 09:25:01 +02:00
Piotr Sarna	e11ae8c58f	test: fix a flaky timeout test depending on TTL One of the USING TIMEOUT tests relied on a specific TTL value, but that's fragile if the test runs on the boundary of 2 seconds. Instead, the test case simply checks if the TTL value is present and is greater than 0, which makes the test robust unless its execution lasts for more than 1 million seconds, which is highly unlikely. Fixes #8062 Closes #8063 (cherry picked from commit `2aa4631148`)	2021-02-14 13:08:39 +02:00
Benny Halevy	e4132edef3	stream_session: prepare: fix missing string format argument As seen in mv_populating_from_existing_data_during_node_decommission_test dtest: ``` ERROR 2021-02-11 06:01:32,804 [shard 0] stream_session - failed to log message: fmt::v7::format_error (argument not found) ``` Fixes #8067 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210211100158.543952-1-bhalevy@scylladb.com> (cherry picked from commit `d01e7e7b58`)	2021-02-14 13:08:20 +02:00
Shlomi Livne	492f0802fb	scylla_io_setup did not configure pre tuned gce instances correctly scylla_io_setup condition for nr_disks was using the bitwise operator (&) instead of logical and operator (and) causing the io_properties files to have incorrect values Fixes #7341 Reviewed-by: Lubos Kosco <lubos@scylladb.com> Signed-off-by: Shlomi Livne <shlomi@scylladb.com> Closes #8019 (cherry picked from commit `718976e794`)	2021-02-14 13:08:00 +02:00
Takuya ASADA	34f22e1df1	dist/debian: install scylla-node-exporter.service correctly node-exporter systemd unit name is "scylla-node-exporter.service", not "node-exporter.service". Fixes #8054 Closes #8053 (cherry picked from commit `856fe12e13`)	2021-02-14 13:07:29 +02:00
Nadav Har'El	acb921845f	cql-pytest: fix flaky timeuuid_test.py The test timeuuid_test.py::testTimeuuid sporadically failed, and it turns out the reason was a bug in the test - which this patch fixes. The buggy test created a timeuuid and then compared the time stored in it to the result of the dateOf() CQL function. The problem is that dateOf() returns a CQL "timestamp", which has millisecond resolution, while the timeuuid may have finer than millisecond resolution. The reason why this test rarely failed is that in our implementation, the timeuuid almost always gets a millisecond-resolution timestamp. Only if now() gets called more than once in one millisecond, does it pick a higher time incremented by less than a millisecond. What this patch does is to truncate the time read from the timeuuid to millisecond resolution, and only then compare it to the result of dateOf(). We cannot hope for more. Fixes #8060 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210211165046.878371-1-nyh@scylladb.com> (cherry picked from commit `a03a8a89a9`)	2021-02-14 13:06:59 +02:00
Botond Dénes	5b6c284281	query: use local limit for non-limited queries in mixed cluster Since `fea5067df` we enforce a limit on the memory consumption of otherwise non-limited queries like reverse and non-paged queries. This limit is sent down to the replicas by the coordinator, ensuring that each replica is working with the same limit. This however doesn't work in a mixed cluster, when upgrading from a version which doesn't have this series. This has been worked around by falling back to the old max_result_size constant of 1MB in mixed clusters. This however resulted in a regression when upgrading from a pre `fea5067df` to a post `fea5067df` one. Pre `fea5067df` already had a limit for reverse queries, which was generalized to also cover non-paged ones by `fea5067df`. The regression manifested in previously working reverse queries being aborted. This happened because even though the user has set a generous limit for them before the upgrade, in the mixed cluster replicas fall back to the much stricter 1MB limit, temporarily ignoring the configured limit if the coordinator is an old node. This patch solves this problem by using the locally configured limit instead of the max_result_size constant. This means that the user has to take extra care to configure the same limit on all replicas, but at least they will have working reverse queries during the upgrade. Fixes: #8022 Tests: unit(release), manual test by user who reported the issue Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210209075947.1004164-1-bdenes@scylladb.com> (cherry picked from commit `3d001b5587`)	2021-02-09 18:06:43 +02:00
Yaron Kaikov	7d15319a8a	release: prepare for 4.4 Update Docker parameters for the 4.4 release. Closes #7932	2021-02-09 09:42:53 +02:00
Amnon Heiman	a06412fd24	API: Fix aggregation in column_family A few methods in the column_family API were doing the aggregation wrong, specifically, bloom filter disk size. The issue is not always visible, it happens when there are multiple filter files per shard. Fixes #4513 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Closes #8007 (cherry picked from commit `4498bb0a48`)	2021-02-08 17:03:45 +02:00
Avi Kivity	2500dd1dc4	Merge 'dist/offline_installer/redhat: fix umask error' from Takuya ASADA Since makeself script changes current umask, scylla_setup causes "scylla does not work with current umask setting (0077)" error. To fix that we need to use the latest version of makeself, and specify the --keep-umask option. Fixes #6243 Closes #6244 * github.com:scylladb/scylla: dist/offline_redhat: fix umask error dist/offline_installer/redhat: support cross build (cherry picked from commit `bb202db1ff`)	2021-02-01 13:03:06 +02:00
Hagit Segev	fd868722dd	release: prepare for 4.4.rc1 scylla-4.4.rc1	2021-01-31 14:09:44 +02:00
Pekka Enberg	f470c5d4de	Update tools/python3 submodule * tools/python3 c579207...199ac90 (1): > dist: debian: adjust .orig tarball name for .rc releases	2021-01-25 09:26:33 +02:00
Pekka Enberg	3677a72a21	Update tools/python3 submodule * tools/python3 1763a1a...c579207 (1): > dist/debian: handle rc version correctly	2021-01-22 09:36:54 +02:00
Hagit Segev	46e6273821	release: prepare for 4.4.rc0 scylla-4.4.rc0	2021-01-18 20:29:53 +02:00
Jenkins Promoter	ce7e31013c	release: prepare for 4.4	2021-01-18 15:49:55 +02:00
Avi Kivity	60f5ec3644	Merge 'managed_bytes: switch to explicit linearization' from Michał Chojnowski This is a revival of #7490. Quoting #7490: The managed_bytes class now uses implicit linearization: outside LSA, data is never fragmented, and within LSA, data is linearized on-demand, as long as the code is running within with_linearized_managed_bytes() scope. We would like to stop linearizing managed_bytes and keep it fragmented at all times, since linearization can require large contiguous chunks. Large contiguous allocations are hard to satisfy and cause latency spikes. As a first step towards that, we remove all implicitly linearizing accessors and replace them with an explicit linearization accessor, with_linearized(). Some of the linearization happens long before use, by creating a bytes_view of the managed_bytes object and passing it onwards, perhaps storing it for later use. This does not work with with_linearized(), which creates a temporary linearized view, and does not work towards the longer term goal of never linearizing. As a substitute a managed_bytes_view class is introduced that acts as a view for managed_bytes (for interoperability it can also be a view for bytes and is compatible with bytes_view). By the end of the series, all linearizations are temporary, within the scope of a with_linearized() call and can be converted to fragmented consumption of the data at leisure. This has limited practical value directly, as current uses of managed_bytes are limited to keys (which are limited to 64k). However, it enables converting the atomic_cell layer back to managed_bytes (so we can remove IMR) and the CQL layer to managed_bytes/managed_bytes_view, removing contiguous allocations from the coordinator. 
Closes #7820 * github.com:scylladb/scylla: test: add hashers_test memtable: fix accounting of managed_bytes in partition_snapshot_accounter test: add managed_bytes_test utils: fragment_range: add a fragment iterator for FragmentedView keys: update comments after changes and remove an unused method mutation_test: use the correct preferred_max_contiguous_allocation in measuring_allocator row_cache: more indentation fixes utils: remove unused linearization facilities in `managed_bytes` class misc: fix indentation treewide: remove remaining `with_linearized_managed_bytes` uses memtable, row_cache: remove `with_linearized_managed_bytes` uses utils: managed_bytes: remove linearizing accessors keys, compound: switch from bytes_view to managed_bytes_view sstables: writer: add write_* helpers for managed_bytes_view compound_compat: transition legacy_compound_view from bytes_view to managed_bytes_view types: change equal() to accept managed_bytes_view types: add parallel interfaces for managed_bytes_view types: add to_managed_bytes(const sstring&) serializer_impl: handle managed_bytes without linearizing utils: managed_bytes: add managed_bytes_view::operator[] utils: managed_bytes: introduce managed_bytes_view utils: fragment_range: add serialization helpers for FragmentedMutableView bytes: implement std::hash using appending_hash utils: mutable_view: add substr() utils: fragment_range: add compare_unsigned utils: managed_bytes: make the constructors from bytes and bytes_view explicit utils: managed_bytes: introduce with_linearized() utils: managed_bytes: constrain with_linearized_managed_bytes() utils: managed_bytes: avoid internal uses of managed_bytes::data() utils: managed_bytes: extract do_linearize_pure() thrift: do not depend on implicit conversion of keys to bytes_view clustering_bounds_comparator: do not depend on implicit conversion of keys to bytes_view cql3: expression: linearize get_value_from_mutation() earlier bytes: add to_bytes(bytes) cql3: expression: mark do_get_value() as static	2021-01-18 11:01:28 +02:00
Avi Kivity	ab44464911	Revert "docker: remove sshd from the image" This reverts commit `32fd38f349`. Some tests (in scylla-cluster-tests) depend on it.	2021-01-17 14:34:40 +02:00
Raphael S. Carvalho	00c29e1e24	table: Move notify_bootstrap_or_replace_*() out of line Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210117045747.69891-9-raphaelsc@scylladb.com>	2021-01-17 10:36:13 +02:00
Michał Chojnowski	5b72fb65ae	test: add hashers_test This test is a sanity check. It verifies that our wrappers over well known hashes (xxhash, md5, sha256) actually calculate exactly those hashes. It also checks that the `update()` methods of used hashers are linear with respect to concatenation: that is, `update(a + b)` must be equivalent to `update(a); update(b)`. This wasn't relied on before, but now we need to confirm that hashing fragmented keys without linearizing them won't break backward compatibility.	2021-01-15 18:28:24 +01:00
Michał Chojnowski	85048b349b	memtable: fix accounting of managed_bytes in partition_snapshot_accounter managed_bytes has a small overhead per each fragment. Due to that, managed_bytes containing the same data can have different total memory usage in different allocators. The smaller the preferred max allocation size setting is, the more fragments are needed and the greater total per-fragment overhead is. In particular, managed_bytes allocated in the LSA could grow in memory usage when copied to the standard allocator, if the standard allocator had a preferred max allocation setting smaller than the LSA. partition_snapshot_accounter calculates the amount of memory used by mutation fragments in the memtable (where they are allocated with LSA) based on the memory usage after they are copied to the standard allocator. This could result in an overestimation, as explained above. But partition_snapshot_accounter must not overestimate the amount of freed memory, as doing otherwise might result in OOM situations. This patch prevents the overaccounting by adding minimal_external_memory_usage(): a new version of external_memory_usage(), which ignores allocator-dependent overhead. In particular, it includes the per-fragment overhead in managed_bytes only once, no matter how many fragments there are.	2021-01-15 18:21:13 +01:00
Michał Chojnowski	d31771c0b2	test: add managed_bytes_test	2021-01-15 18:21:13 +01:00
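
The concatenation property described in the `test: add hashers_test` commit above (`update(a + b)` must be equivalent to `update(a); update(b)`) can be illustrated with a minimal sketch. This uses Python's standard `hashlib` rather than Scylla's own C++ hasher wrappers, so it only demonstrates the property for md5 and sha256; it is not the test from the commit. The helper name `hash_fragments` and the sample key are hypothetical.

```python
import hashlib


def hash_fragments(algorithm, fragments):
    """Feed each fragment to the hasher in order and return the hex digest.

    Hypothetical helper for illustration; not part of Scylla.
    """
    hasher = hashlib.new(algorithm)
    for fragment in fragments:
        hasher.update(fragment)
    return hasher.hexdigest()


# A made-up key, hashed once as a single buffer and once in three fragments.
key = b"partition-key-bytes"
fragments = [key[:5], key[5:12], key[12:]]

for algorithm in ("md5", "sha256"):
    whole = hash_fragments(algorithm, [key])
    pieces = hash_fragments(algorithm, fragments)
    # The digests should agree regardless of how the input is split.
    print(algorithm, whole == pieces)
```

If the two digests ever differed for some hasher, hashing a fragmented key without linearizing it first would change the result, which is the backward-compatibility concern the commit message raises.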


24892 Commits