scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-06 06:53:12 +00:00

Author	SHA1	Message	Date
Botond Dénes	ec6c540c30	sstables: stats: remove now unused sstable_partition_reads counter	2021-01-27 17:38:17 +02:00
Botond Dénes	5f18e9eb37	sstable: remove read_.row._flat() methods	2021-01-27 17:38:17 +02:00
Botond Dénes	c3b4e990a2	tree-wide: use sstables::make_reader() instead of the read_.row._flat() methods	2021-01-27 17:38:17 +02:00
Botond Dénes	080bc2ffec	sstables: pass partition_range to create_single_key_sstable_reader() We want to unify the various sstable reader creation methods and this method taking a ring position instead of a partition range like everybody else stands in the way of that. This in effect reverts `68663d0de`.	2021-01-27 17:38:14 +02:00
Botond Dénes	a5a8037f6e	sstables: sstable: add make_reader() This will be the only method to create sstable readers with. For now we leave the other variants, they as well as their users will be removed in a following patch.	2021-01-27 15:20:06 +02:00
Pekka Enberg	9fc83ac627	Update tools/java submodule * tools/java 8080009794...4a55b81941 (1): > cassandra.in.sh: remove debug message	2021-01-26 15:56:58 +02:00
Avi Kivity	90a6c3bd7a	build: reduce release mode inline tuning on aarch64 I see a miscompile on aarch64 where a call to format("{}", uuid) translates a function pointer to -1. When called, this crashes. Reduce the inline threshold from 2500 to 600. This doesn't guarantee no miscompiles but all the tests pass with this parameter. Closes #7953	2021-01-26 11:14:42 +02:00
Tomasz Grabiec	90f6bb754e	Merge "raft: replication tests: fixes for debug mode" from Alejo The following patches fix issues seen occasionally in debug mode. Notes: - In debug mode there's still the UB nullptr arithmetic warning. * https://github.com/alecco/scylla/tree/raft-ale-tests-07h-wait-propagation: raft: replication test: wait for log propagation raft: replication test: move wait for log to a function raft: replication test: remove unused member raft: replication test: use later() raft: testing: remove election wait time and just yield	2021-01-26 11:14:42 +02:00
Avi Kivity	f58151d191	test: mutation_test: fix initialization order bug with thread local storage test_cell_external_memory_usage uses with_allocator() to observe how some types allocate memory. However, compiler reordering (observed with clang 11 on aarch64) can move the various thread-local CQL type object initialization into the with_allocator() scope; so any managed object allocated as part of this initialization also gets measured, and the test fails. The code movement is legal, as far as I can tell. Fix this by initializing the type object early; use an atomic_thread_fence as an optimization barrier so the compiler doesn't eliminate or move the early initialization. Closes #7951	2021-01-26 11:14:42 +02:00
Nadav Har'El	356250f720	cql-pytest: tests for fromJson() failing to set tuple elements to null This patch adds a test for trying to set a tuple element to null with fromJson(), which works on Cassandra but fails on Scylla. So the test xfails on Scylla. Reproduces issue #7954. Refs #7954. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210124082311.126300-1-nyh@scylladb.com>	2021-01-26 11:14:42 +02:00
Avi Kivity	05c435dddc	Merge "mutation readers: remove next_partition() workarounds" from Botond " `next_partition()` used to return void, so readers that had to call future returning code had to work around this. Now that `next_partition()` returns a future, we can get rid of these workarounds. Tests: unit(release, debug) " * 'next-partition-cross-shard-readers/v1' of https://github.com/denesb/scylla: mutation_reader: reader_lifecycle_policy::stopped_reader: drop pending_next_partition flag mutation_reader: evictable_reader: remove next_partition() workaround mutation_reader: shard_reader: remove next_partition() workaround mutation_reader: foreign_reader: remove next_partition() workaround	2021-01-26 11:14:42 +02:00
Nadav Har'El	067330c08f	Merge 'redis: support large redis message' from Takuya ASADA If the message is larger than the current buffer size, we need to consume more data until we reach the tail of the message. To do so, we need to return nullptr when we have not reached the tail. Fixes #7273 Closes #7903 * github.com:scylladb/scylla: redis: rename _args_size/_size_left There are two types of numerical parameter in redis protocol: - *[0-9]+ defined array size - $[0-9]+ defined string size redis: fix large message handling	2021-01-25 10:11:17 +02:00
Takuya ASADA	229940aaff	redis: rename _args_size/_size_left There are two types of numerical parameter in redis protocol: - *[0-9]+ defined array size - $[0-9]+ defined string size Currently, array size is stored to args_count, and string size is stored to _arg_size / _size_left. This is a bit hard to understand since both use the same word "arg(s)", so let's rename string size variables to _bytes_count / _bytes_left.	2021-01-25 10:26:37 +09:00
Takuya ASADA	7a6ee9858f	redis: fix large message handling If the message is larger than the current buffer size, we need to consume more data until we reach the tail of the message. To do so, we need to return nullptr when we have not reached the tail. Fixes #7273	2021-01-25 10:26:37 +09:00
Alejo Sanchez	0d694990cf	raft: replication test: wait for log propagation Wait until entries propagate after adding and before changing leader using the same code as done for partitioning. This fixes occasional hangs in debug mode when a test switches to a different leader without leaving enough time for full propagation. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-01-24 20:33:54 -04:00
Alejo Sanchez	4d1ec88f90	raft: replication test: move wait for log to a function Move wait for log propagation to its own function for reuse. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-01-24 20:25:48 -04:00
Alejo Sanchez	72f9b108e3	raft: replication test: remove unused member Initial state doesn't need to specify total entries anymore. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-01-24 20:25:48 -04:00
Alejo Sanchez	db95d6e7f1	raft: replication test: use later() Instead of sleep 1us use later() Also use later to yield after sending append entries in rpc test impl. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-01-24 20:25:48 -04:00
Alejo Sanchez	f875ff72c9	raft: testing: remove election wait time and just yield Replace sleep time for elect_me_leader with yield to speed things up. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-01-24 20:25:48 -04:00
Pekka Enberg	8258556832	Update tools/python3 submodule * tools/python3 c579207...199ac90 (1): > dist: debian: adjust .orig tarball name for .rc releases	2021-01-24 21:30:59 +02:00
Gleb Natapov	020da49c89	storage_proxy: remove no longer needed range_slice_read_executor After support for mixed cluster compatibility feature DIGEST_MULTIPARTITION_READ was dropped in `854a44ff9b` range_slice_read_executor and never_speculating_read_executor become identical, so remove the former for good. Message-Id: <20210124122731.GA1122499@scylladb.com>	2021-01-24 14:45:22 +02:00
Benny Halevy	088f92e574	paxos_state: learn: fix injected error description It was copy-pasted from another injection point. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201220091439.3604201-1-bhalevy@scylladb.com>	2021-01-24 11:51:23 +02:00
Takuya ASADA	5d527bd17e	scylla_ntp_setup: use chrony on all distributions To simplify scylla_ntp_setup, use chrony on all distributions. Closes #7922	2021-01-24 11:45:58 +02:00
Takuya ASADA	984dc44ebf	dist: drop /etc/security/limits.d/scylla.conf Drop limits.d conf file, since we don't use it. We set these parameters via systemd unit file instead. Fixes #7925 Closes #7941	2021-01-24 11:43:39 +02:00
Benny Halevy	1847d49971	test: test_env: pick the highest sstable version by default If possible, test the highest sstable format version, as it's the most used. If there are pre-written sstables from an older version that we need to load from the test directory, either specify their version explicitly, or use the new test_env::reusable_sst method that looks up the latest sstable version in the given directory and generation. Test: unit(release) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201210161822.2833510-1-bhalevy@scylladb.com>	2021-01-24 10:38:55 +02:00
Botond Dénes	226088d12e	mutation_reader: reader_lifecycle_policy::stopped_reader: drop pending_next_partition flag It's not used anymore.	2021-01-22 16:18:59 +02:00
Botond Dénes	4eb65b12a0	mutation_reader: evictable_reader: remove next_partition() workaround `next_partition()` now returns a future<>, so we can forward it to the remote shard in the scope of the next partition call, remove the now obsolete workaround for the synchronous next partition.	2021-01-22 16:18:30 +02:00
Botond Dénes	febd2feb4c	mutation_reader: shard_reader: remove next_partition() workaround `next_partition()` now returns a future<>, so we can forward it to the remote shard in the scope of the next partition call, remove the now obsolete workaround for the synchronous next partition.	2021-01-22 15:53:05 +02:00
Botond Dénes	81da6b756f	mutation_reader: foreign_reader: remove next_partition() workaround `next_partition()` now returns a future<>, so we can forward it to the remote shard in the scope of the next partition call, remove the now obsolete workaround for the synchronous next partition.	2021-01-22 15:30:36 +02:00
Nadav Har'El	cb9e2ee00a	cql-pytest: tests for fromJson() setting a map<ascii, int> The fromJson() function can take a map JSON and use it to set a map column. However, the specific example of a map<ascii, int> doesn't work in Scylla (it does work in Cassandra). The xfailing tests in this patch demonstrate this. Although the tests use perfectly legal ASCII, scylla fails the fromJson() function, with a misleading error. Refs #7949. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210121233855.100640-1-nyh@scylladb.com>	2021-01-22 14:29:25 +01:00
Pavel Emelyanov	90d445464b	compaction: Remove compaction_manager::enabled() This method was marked with 'FIXME -- should not be public' when it was introduced. Since then it has stopped being used and can even be removed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210122083146.5886-1-xemul@scylladb.com>	2021-01-22 14:07:38 +02:00
Kamil Braun	570d15c7bc	multishard_combining_reader: do not use `smp::count` `multishard_combining_reader` currently only works under the assumption that every table uses the same sharder configured using the node's number of shards. But we could potentially specify a different sharder for a chosen table, e.g. one that puts everything on shard 0. Then this assumption will be broken and the reader causes a segfault. Fixes #7945.	2021-01-21 18:28:18 +02:00
Nadav Har'El	328be1ca7c	cql-pytest: tests for fromJson() not accepting empty string as integer When writing to an integer column, Cassandra's fromJson() function allows not just JSON number constants, it also allows a string containing a number. Strings which do not hold a number fail with a FunctionFailure. In particular, the empty string "" is an invalid number, and should fail. The tests in this patch check this for two integer types: int and varint. Curiously, Cassandra and Scylla have opposite bugs here: Scylla fails to recognize the error for varint, while Cassandra fails to recognize the error for int. The tests in this patch reproduce these bugs. The tests demonstrating Scylla's bug are marked xfail, and the test demonstrating Cassandra's bug is marked "cassandra_bug" (which means it is marked xfail only when running against Cassandra, but expected to succeed on Scylla). Refs #7944. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210121133833.66075-1-nyh@scylladb.com>	2021-01-21 15:24:48 +01:00
Nadav Har'El	702b1b97bf	cql: fix error return from execution of fromJson() and other functions As reproduced in cql-pytest/test_json.py and reported in issue #7911, failing fromJson() calls should return a FUNCTION_FAILURE error, but currently produce a generic SERVER_ERROR, which can lead the client to think the server experienced some unknown internal error and the query can be retried on another server. This patch adds a new cassandra_exception subclass that we were missing - function_execution_exception - properly formats this error message (as described in the CQL protocol documentation), and uses this exception in two cases: 1. Parse errors in fromJson()'s parameters are converted into a function_execution_exception. 2. Any exception during the execute() of a native_scalar_function_for function is converted into a function_execution_exception. In particular, fromJson() uses a native_scalar_function_for. Note, however, that when a function already took care to produce a specific Cassandra error, this error is passed through and not converted to a function_execution_exception. An example is blobAsText(), which can return an invalid_request error, so it is left as such and not converted. This also happens in Cassandra. All relevant tests in cql-pytest/test_json.py now pass, and are no longer marked xfail. This patch also includes a few more improvements to test_json.py. Fixes #7911 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210118140114.4149997-1-nyh@scylladb.com>	2021-01-21 15:21:13 +01:00
Nadav Har'El	49440d67ad	Merge: Fix multiple issues with timeuuid type Merged patch series by Konstantin Osipov: "This series improves uniqueness of generated timeuuids and changes list append/prepend logic to use client/LWT timestamp in timeuuids generated for list keys. Timeuuid compare functions are optimized. The test coverage is extended for all of the above." uuid: add a comment warning against UUID::operator< uuid: replace slow versions of timeuuid compare with optimized/tested versions. test: add tests for legacy uuid compare & msb monotonicity test: add a test case for append/prepend limit test: add a test case for monotonicity of timeuuid least significant bits uuid: implement optimized timeuuid compare test: add a test case for list prepend/append with custom timestamp lists: rewrite list prepend to use append machinery lists: use query timestamp for list cell values during append uuid: fill in UUID node identifier part of UUID test: add a CQL test for list append/prepend operations	2021-01-21 13:20:07 +02:00
Konstantin Osipov	e18e2cb9f2	uuid: add a comment warning against UUID::operator<	2021-01-21 13:03:59 +03:00
Konstantin Osipov	845f6c667b	uuid: replace slow versions of timeuuid compare with optimized/tested versions.	2021-01-21 13:03:59 +03:00
Konstantin Osipov	56d8d166cb	test: add tests for legacy uuid compare & msb monotonicity	2021-01-21 13:03:59 +03:00
Konstantin Osipov	257c5b0879	test: add a test case for append/prepend limit	2021-01-21 13:03:59 +03:00
Konstantin Osipov	d6e65a3735	test: add a test case for monotonicity of timeuuid least significant bits Ensure that timeuuid least significant bits are compared correctly.	2021-01-21 13:03:59 +03:00
Konstantin Osipov	0af3758aff	uuid: implement optimized timeuuid compare Introduce uint64_t based comparator for serialized timeuuids. Respect Cassandra legacy for timeuuid compare order. Scylla uses two versions of timeuuid compare: - one for timeuuid values stored in uuid columns - a different one for timeuuid values stored in timeuuid columns. This commit re-implements the implementations of these comparators in types.cc and deprecates the respective implementations types.cc. They will be removed in a following patch. A micro-benchmark at https://github.com/alecco/timeuuid-bench/ shows 2-4x speed up of the new comparators.	2021-01-21 13:03:59 +03:00
Konstantin Osipov	b4500a55c7	test: add a test case for list prepend/append with custom timestamp Scylla now takes a custom timestamp into account when executing list append/prepend operations. Test the new semantics.	2021-01-21 13:03:59 +03:00
Konstantin Osipov	232ce6f611	lists: rewrite list prepend to use append machinery Rewrite list prepend to use the same machinery as append, and thus produce correct results when used in LWT. After this patch, list prepend begins to honor user supplied timestamps. If a user supplied timestamp for prepend is less than 2010-01-01 00:00:00 an exception is thrown. Fixes #7611	2021-01-21 13:03:59 +03:00
Konstantin Osipov	2b8ce83eea	lists: use query timestamp for list cell values during append Scylla list cells are represented internally as a map of timeuuid => value. To append a new value to a list the coordinator generates a timeuuid reflecting the current time as key and adds a value to the map using this key. Before this patch, Scylla always generated a timeuuid for a new value, even if the query had a user supplied or LWT timestamp. This could break LWT linearizability. User supplied timestamps were ignored. This is reported as https://github.com/scylladb/scylla/issues/7611 A statement which appended multiple values to a list or a BATCH generated its own microsecond-resolution timeuuid for each value: BEGIN BATCH UPDATE ... SET a = a + [3] UPDATE ... SET a = a + [4] APPLY BATCH UPDATE ... SET a = a + [3, 4] To fix the bug, it's necessary to preserve monotonicity of timeuuids within a batch or multi-value append, but make sure they all use the microsecond time, as is set by LWT or user. To explain the fix, it's first necessary to recall the structure of time-based UUIDs: 60 bits: time since start of GMT epoch, year 1582, represented in 100-nanosecond units 4 bits: version 14 bits: clock sequence, a random number to avoid duplicates in case system clock is adjusted 2 bits: type 48 bits: MAC address (or other hardware address) The purpose of clockseq bits, as defined in https://tools.ietf.org/html/rfc4122#section-4.1.5, is to reduce the probability of UUID collision in case clock goes back in time or node id changes. The implementation should reset it whenever one of these events may occur. Since LWT microsecond time is guaranteed to be unique by Paxos, the RFC provisioning for clockseq and MAC slots becomes excessive. The fix thus changes timeuuid slot content in the following way: - time component now contains the same microsecond time for all values of a statement or a batch. The time is unique and monotonic in case of LWT. Otherwise it's almost always monotonic, but may not be unique if two timestamps are created on different coordinators. - clockseq component is used to store a sequence number which is unique and monotonic for all values within the statement/batch. - to protect against time back-adjustments and duplicates if time is auto-generated, MAC component contains a random (spoof) MAC address, re-created on each restart. The address is different at each shard. The change is made for all sources of time: user, generated, LWT. Conditioning the list key generation algorithm on the source of time would unnecessarily complicate the code while not increasing the quality (uniqueness) of created list keys. Since 14 bits of clockseq provide us with only 16383 distinct slots per statement or batch, 3 extra bits in nanosecond part of the time are used to extend the range to 131071 values per statement/batch. If the range is exceeded, an exception is produced. A twist on the use of clockseq to extend timeuuid uniqueness is that Scylla, like Cassandra, uses int8 compare to compare lower bits of timeuuid for ordering. The patch takes this into account and sign-complements the clockseq value to make it monotonic according to the legacy compare function. Fixes #7611 test: unit (dev)	2021-01-21 13:03:59 +03:00
Konstantin Osipov	6d1781be36	uuid: fill in UUID node identifier part of UUID Before this patch, UUID generation code was not creating sufficiently unique IDs: the 6 byte node identifier was mostly empty, i.e. only containing shard id. This could lead to collisions between queries executed concurrently at different coordinators, and, since timeuuid is used as key in list append and prepend operations, lead to lost updates. To generate a unique node id, the patch uses a combination of hardware MAC address (or a random number if no hardware address is available) and the current shard id. The shard id is mixed into higher bits of MAC, to reduce the chance of a NIC collision within the same network. With sufficiently unique timeuuids as list cell keys, such updates are no longer lost, but multi-value update can still be "merged" with another multi-value update. E.g. if node A executes SET l = l + [4, 5] and node B executes SET l = l + [6, 7], the list value could be any of [4, 5, 6, 7], [4, 6, 5, 7], [6, 4, 5, 7] and so on. At least we are now less likely to get any value lost. Fixes #6208. @todo: initialize UUID subsystem explicitly in main() and switch to using seastar::engine().net().network_interfaces() test: unit (dev)	2021-01-21 13:03:53 +03:00
Avi Kivity	4cfaab208e	allocation_strategy: set preferred max contiguous allocation to 128k for standard allocations Now that managed_bytes and its users do not assume that a managed_bytes instance allocated using standard_allocation_strategy is non-fragmented, we can set the preferred max contiguous allocation to 128k. This causes managed_bytes to fragment instances that are larger than this size. Note that managed_bytes is the only user. Closes #7943	2021-01-21 11:15:13 +02:00
Tomasz Grabiec	f08a3e3fd8	Merge "raft: test fixes, etcd tests, simplification" from Alejo This patch set adds etcd unit tests for raft. It also includes a fix for replication test in debug mode and a simplification for append_request. Tests: unit ({dev}), unit ({debug}), unit ({release}) * https://github.com/alecco/scylla/tree/raft-ale-tests-09b: raft: etcd unit tests: test log replication raft: boost test etcd: test fsm can vote from any state raft: boost test etcd: port TestLeaderElectionOverwriteNewerLogs raft: replication test: add etcd test for cycling leaders raft: testing: provide primitives to wait for log propagation raft: etcd unit tests: initial boost tests raft: combine append_request _receive and _send	2021-01-21 10:41:33 +02:00
Pekka Enberg	7d98e05923	Update tools/python3 submodule * tools/python3 1763a1a...c579207 (1): > dist/debian: handle rc version correctly	2021-01-21 10:41:33 +02:00
Avi Kivity	daa0e964fc	dbuild: avoid --pids-limit with podman and cgroupsv1 Podman doesn't correctly support --pids-limit with cgroupsv1. Some versions ignore it, and some versions reject the option. To avoid the error, don't supply --pids-limit if cgroupsv2 is not available (detected by its presence in /proc/filesystems). The user is required to configure the pids limit in /etc/containers/containers.conf. Fixes #7938. Closes #7939	2021-01-21 10:41:33 +02:00
Botond Dénes	4d581f1bb3	docs/README.md: guides: also mention running and debugging Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210120083304.36447-1-bdenes@scylladb.com>	2021-01-20 16:07:29 +02:00
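
The time-based UUID layout recalled in commit 2b8ce83eea (60 bits of time, 4 bits of version, 14 bits of clock sequence, 2 bits of type, 48 bits of node) can be sketched with Python's standard `uuid` module. This is an illustration only, not Scylla's implementation: the helper name `make_timeuuid` and the sample values are hypothetical, and the sketch does not model the 3 extra time bits or the sign-complemented clockseq that the commit describes.

```python
import uuid

# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100-ns units.
UUID_EPOCH_OFFSET_100NS = 0x01B21DD213814000

# Largest value that fits in the 14-bit clock sequence, and in 14 + 3 bits.
MAX_CLOCKSEQ = (1 << 14) - 1
MAX_EXTENDED_SEQ = (1 << 17) - 1


def make_timeuuid(unix_micros: int, seq: int, node: int) -> uuid.UUID:
    """Pack a microsecond timestamp, a sequence number and a node id
    into a version 1 UUID (hypothetical helper, for illustration)."""
    if not 0 <= seq <= MAX_CLOCKSEQ:
        raise ValueError("sequence number does not fit in 14 bits")
    if not 0 <= node < (1 << 48):
        raise ValueError("node id does not fit in 48 bits")
    # 60-bit timestamp in 100-ns units since the UUID epoch.
    ts = unix_micros * 10 + UUID_EPOCH_OFFSET_100NS
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | (1 << 12)   # version 1
    clock_seq_hi_variant = ((seq >> 8) & 0x3F) | 0x80     # RFC 4122 variant
    clock_seq_low = seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


if __name__ == "__main__":
    # Two values of one statement: same microsecond time, increasing sequence.
    for seq in (0, 1):
        print(make_timeuuid(1611226439000000, seq, 0x0123456789AB))
```

Reading the fields back through `UUID.time`, `UUID.clock_seq` and `UUID.node` returns the three components that were packed, which is a convenient way to check how the slots are reused.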


24937 Commits