The overloaded_functor class template combines multiple lambdas accepting
different types into a single callable object that can be invoked with
any of those types.
One application is visitors for std::variant where different handling is
required for different types.
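As an illustration, this is the classic C++17 "overloaded" idiom that such a class template is typically built on; a minimal sketch, where everything except the overloaded_functor name itself (e.g. the describe() helper) is illustrative:

```cpp
#include <cassert>
#include <string>
#include <variant>

// Inherit from all lambdas and pull in all their call operators,
// so one object is callable with any of the handled types.
template <typename... Fs>
struct overloaded_functor : Fs... {
    using Fs::operator()...;
};
// Deduction guide so template arguments are inferred from the lambdas.
template <typename... Fs>
overloaded_functor(Fs...) -> overloaded_functor<Fs...>;

// Usage: a std::variant visitor assembled from per-type lambdas.
inline std::string describe(const std::variant<int, std::string>& v) {
    return std::visit(overloaded_functor{
        [](int i) { return "int: " + std::to_string(i); },
        [](const std::string& s) { return "string: " + s; }
    }, v);
}
```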
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
This facilitates position_in_partition creation when parsing range tombstone bounds from SSTable files.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
This code will be re-used in promoted_index_blocks_parser to parse
clustering key prefixes from SSTables 3.x format.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
The previous name of the file was also confusing: we have several
sstable_assertions classes throughout the tests, but this header only
contains a class for index reader assertions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
With this patch, index_reader is capable of reading index_entries from
both 'ka'/'la' and 'mc' formats.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
"
Queries that use a secondary index and have a full partition key restriction
or a full primary key restriction should not require filtering - it's
sufficient to add these restrictions to the index query.
This also adds secondary index tests to cover this case.
Tests: unit (release)
"
* 'si_and_pk_restrictions_2' of https://github.com/psarna/scylla:
tests: add index + partition key test
cql3: make index+primary key restrictions filtering-independent
cql3: use primary key restrictions in filtering index queries
cql3: add is_all_eq to primary key restrictions
cql3: add explicit conversion between key restrictions
cql3: add apply_to() method to single column restriction
cql3: make primary key restrictions' values unambiguous
Now that verb categorizations also affect scheduling, getting them
correct is more important. The first three patches in this series
improve the infrastructure a little, and the fourth fixes some
categorization errors with respect to repair/streaming verbs.
* https://github.com/avikivity/scylla msg-idx-sanity/v1:
messaging: choose connection index via a look-up table
messaging: convert do_get_rpc_client_idx into a switch
messaging: remove default when computing rpc client index
messaging: categorize more streaming/repair verbs as streaming
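The look-up-table approach named in the series can be sketched as follows; the verb names, index values, and table name here are hypothetical stand-ins, not Scylla's actual messaging verbs:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical subset of messaging verbs; "count" is a sentinel.
enum class messaging_verb : size_t {
    mutation,
    read_data,
    stream_mutation,
    repair_checksum_range,
    count,
};

// One connection-index entry per verb, filled in at compile time.
// The array size is tied to the sentinel, so adding a verb without
// categorizing it fails to compile, unlike a switch with a default.
constexpr std::array<unsigned, static_cast<size_t>(messaging_verb::count)>
s_rpc_client_idx = {
    0, // mutation              -> general RPC connection
    0, // read_data             -> general RPC connection
    1, // stream_mutation       -> streaming connection
    1, // repair_checksum_range -> streaming connection
};

constexpr unsigned do_get_rpc_client_idx(messaging_verb verb) {
    return s_rpc_client_idx[static_cast<size_t>(verb)];
}
```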
For example, to bootstrap a 50th node in a cluster:
[shard 0] range_streamer - Bootstrap with
[127.0.0.8, 127.0.0.2, 127.0.0.24, 127.0.0.21, 127.0.0.49, 127.0.0.44,
127.0.0.9, 127.0.0.7, 127.0.0.47, 127.0.0.15, 127.0.0.5, 127.0.0.30,
127.0.0.14, 127.0.0.12, 127.0.0.36, 127.0.0.11, 127.0.0.48, 127.0.0.28,
127.0.0.33, 127.0.0.10, 127.0.0.41, 127.0.0.4, 127.0.0.40, 127.0.0.3,
127.0.0.6, 127.0.0.43, 127.0.0.22, 127.0.0.26, 127.0.0.42, 127.0.0.25,
127.0.0.17, 127.0.0.37, 127.0.0.23, 127.0.0.13, 127.0.0.38, 127.0.0.1,
127.0.0.18, 127.0.0.20, 127.0.0.39, 127.0.0.27, 127.0.0.34, 127.0.0.32,
127.0.0.19, 127.0.0.16, 127.0.0.31, 127.0.0.45, 127.0.0.29, 127.0.0.35,
127.0.0.46]
for keyspace=keyspace1 started, nodes_to_stream=49, nodes_in_parallel=49
the new node will get data from 49 existing nodes.
Currently, it will stream from all 49 existing nodes at the same
time. It is not a good idea to stream from all the nodes in parallel,
since this can overwhelm the bootstrap node: 49 nodes sending, 1 node
receiving.
To fix this, limit the number of nodes to stream from in parallel. We
should have better control over memory usage and parallelism eventually,
but for now, limit the number of nodes to a maximum of 16 as a starting
point. With this limit, each shard can work with as many as 16 remote
nodes in parallel, which should provide enough parallelism for streaming
performance.
This change affects the bootstrap/decommission/removenode operations
and has no effect on repair.
Refs #2782
Message-Id: <980610dc97490d4f16281a0c3203b9bee73e04e4.1531989557.git.asias@scylladb.com>
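The batching idea can be sketched like this; make_stream_batches() and the node representation are illustrative assumptions, only the limit of 16 comes from the patch:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stream from at most 16 source nodes at a time instead of all at once.
constexpr size_t max_nodes_in_parallel = 16;

// Split the source-node list into batches of at most 16 nodes. Each batch
// would be streamed from concurrently; the next batch starts only after
// the current one completes, bounding the load on the receiving node.
std::vector<std::vector<std::string>>
make_stream_batches(const std::vector<std::string>& sources) {
    std::vector<std::vector<std::string>> batches;
    for (size_t i = 0; i < sources.size(); i += max_nodes_in_parallel) {
        size_t end = std::min(i + max_nodes_in_parallel, sources.size());
        batches.emplace_back(sources.begin() + i, sources.begin() + end);
    }
    return batches;
}
```

With 49 source nodes, as in the bootstrap example above, this yields batches of 16, 16, 16, and 1 node.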
"
Use chunked_vector instead of vector. It won't introduce compatibility
issues because chunked_vector and vector have the same on-wire format.
Refs #278
"
* 'asias/gossip_memory_v2' of github.com:scylladb/seastar-dev:
gossip: Reduce continuous memory usage
to_string: Add std::list and utils::chunked_vector support
serializer: Add chunked_vector support
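Why this is wire-compatible can be illustrated with a small sketch, where std::deque stands in for utils::chunked_vector (both store elements in fixed-size chunks rather than one contiguous allocation) and the [count][elem0][elem1]... framing is a simplified assumption:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <deque>
#include <vector>

// Serialization only iterates over elements, so the output bytes do not
// depend on the container's in-memory layout.
template <typename Container>
std::vector<uint8_t> serialize_u32s(const Container& c) {
    std::vector<uint8_t> out;
    auto put_u32 = [&out](uint32_t v) {
        uint8_t buf[4];
        std::memcpy(buf, &v, sizeof(v));
        out.insert(out.end(), buf, buf + sizeof(buf));
    };
    put_u32(static_cast<uint32_t>(c.size()));
    for (uint32_t v : c) {
        put_u32(v);
    }
    return out;
}
```

A contiguous vector and a chunked container holding the same elements then serialize to identical bytes.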
If the full partition key (or the full primary key) is used in an indexed
query, it should not require filtering, because such queries
can be efficiently narrowed down with stricter index restrictions.
If both an index and the partition key are used in a query, it should not
require filtering, because the indexed query can be narrowed down
with the partition key information. This commit appends partition key
restrictions to the index query.
"
The problem happens under the following circumstances:
- we have a partially populated partition in cache, with a gap in the middle
- a read with no clustering restrictions trying to populate that gap
- eviction of the entry for the lower bound of the gap concurrent with population
The population may incorrectly mark the range before the gap as continuous.
This may result in temporary loss of writes in that clustering range. The
problem heals by clearing cache.
Caught by row_cache_test::test_concurrent_reads_and_eviction, which has been
failing sporadically.
The problem is in ensure_population_lower_bound(), which returns true if
the current clustering range covers all rows, meaning that the populator has
the right to set the continuity flag to true on the row it inserts. This is
correct only if the current population range actually starts before all
clustering rows. Otherwise, we're populating from _last_row and should
consult it.
Fixes #3608.
"
* 'tgrabiec/fix-violation-of-continuity-on-concurrent-read-and-eviction' of github.com:tgrabiec/scylla:
row_cache: Fix violation of continuity on concurrent eviction and population
position_in_partition: Introduce is_before_all_clustered_rows()
This series contains a couple of fixes to the bookkeeping of the view
build process, which could cause data to be left behind in the system
tables.
* git@github.com:duarten/scylla.git materialized-views/view-build-fixes/v1:
Duarte Nunes (3):
db/system_keyspace: Add function to remove view build status of a
shard
db/view: Don't have shard 0 clear other shard's status on drop
db/view: Restrict writes to the distributed system keyspace to shard 0
This series contains a couple of fixes to the adjusting of clustering
keys in the build_progress_virtual_reader, some of which could
potentially cause heap overflows when querying the legacy system table.
* git@github.com:duarten/scylla.git materialized-views/build-progress-virtual-reader-fixes/v1:
Duarte Nunes (3):
db/view/build_progress_virtual_reader: Use correct schema to adjust ck
db/view/build_progress_virtual_reader: Fix full ck detection
db/view/build_progress_virtual_reader: Also adjust end RT bound
"
This series fixes two issues related to bad_allocs and keys which require
linearization (larger than 12.8 KiB). With such keys, comparators may throw if
memory allocation fails. This may cause lookups in partition and rows trees to
fail with bad_alloc.
The first issue (#3583) was that partition version merging
(mutation_partition::apply_monotonically()) was not taking into account that
lookups may fail. On failure, the partition which is being applied may be
incorrectly left with the clustering range from the beginning of the range up
to the current row marked as continuous, if the current row has the continuity
flag set, because we've moved all of the preceding rows into the target, and
the correct lower bound row is no longer there in the source. This may mark
some discontinuous ranges as continuous. Merging is retried by
allocating_section, and there will be no problem if it eventually succeeds:
the original continuity will be reflected in the sum. The problem will persist
if it doesn't eventually succeed, when we're really out of memory.
The user-perceivable effect of this would be a temporary loss of writes in the
clustering range which was marked as continuous but should not have been.
Introduced in 2.2-rc1.
The second issue (#3585) is that the code which inserts partitions into
memtable and cache will leak the entry if boost::intrusive_set::insert()
throws. This will also cause a SIGSEGV when the cache tries to evict such
a leaked entry.
"
* tag 'tgrabiec/fix-bad-continuity-on-oom-in-apply-v2' of github.com:tgrabiec/scylla:
managed_bytes: Mark read_linearize() as an allocation point
tests: Relax expectation about continuity after failed merging
tests: mutation_partition: Verify continuity is consistent on bad_alloc on merging
tests: Switch to seastar's allocation failure injector
mutation_partition: Introduce set_continuity()
clustering_interval_set: Introduce contained_in()
clustering_interval_set: Introduce add() overload accepting another interval set
mutation_partition: Fix merging to not leave the source with broader continuity on bad_alloc
mutation_partition: Preserve continuity in case row merging with no tracker throws
memtable, cache: Fix exception safety of partition entry insertions
ensure_population_lower_bound() returned true if the current clustering
range covers all rows, which means that the populator has the right to
set the continuity flag to true on the row it inserts. This is correct
only if the current population range actually starts before all
clustering rows. Otherwise we're populating from _last_row, and
should consult it.
The fix introduces a new flag, set when starting to populate, which
indicates whether we're populating from the beginning of the range or
not. We cannot simply check whether _last_row is set in
ensure_population_lower_bound(), because _last_row can be set and then
become empty again.
Fixes #3608
Currently we check that the sum of continuities on failure is exactly the
same as expected. Relax this to require only that continuity is not
broader, since in some bad_alloc or preemption scenarios we will
have to mark some ranges as discontinuous.
When clustering keys are larger than 12.8 KiB they may get fragmented,
and the key comparator will need to linearize them on comparison. This may
cause lookups in the rows tree to fail with bad_alloc. Partition
version merging (mutation_partition::apply_monotonically()) was not
taking this into account. If we fail on a lookup, the partition which is
being applied may be incorrectly left with the clustering range from
the beginning up to the current row marked as continuous, if the current
row has the continuity flag set, because we've moved all of the
preceding rows into the target, and the correct lower bound row is no
longer there in the source. This may mark some discontinuous ranges as
continuous.
Merging is retried by allocating_section, and there will be no problem
if it eventually succeeds: the original continuity will be reflected in
the sum. The problem will persist if it doesn't eventually succeed, when
we're really out of memory.
To protect against this, we could reset the continuity flag of the
current row in the source when exiting on exception.
Fixes #3583
Example:
p: row{key=A, cont=0} row{key=C, cont=1}
this: row{key=C, cont=0}
When we get to processing key=C, key=A has already been moved to this, so p
has stale continuity on key=C, which marks (-inf, C) as continuous,
whereas it should mark only (A, C). That's not a problem if merging
succeeds, but if an exception happens at this point, we will violate the
invariant which says that the sum of p and this should yield the same
logical partition. It wouldn't, because the continuity of the sum is
calculated as a set union, and (-inf, A) would be incorrectly turned
into a continuous range.
This is not a problem currently, because continuity is always full when
there is no tracker (memtables), so it won't change anyway, and when
there is a tracker (cache) we never merge but overwrite instead, so
there is no memory allocation and thus no possibility of failure. But
better to be safe.
boost::intrusive::set::insert() may throw if keys require
linearization and that fails, in which case we will leak the entry.
When this happens in the cache, we will also violate the invariant for
entry eviction, which assumes all tracked entries are linked, and
cause a SEGFAULT.
Use the non-throwing and faster insert_before() instead. Where we
can't use insert_before(), use alloc_strategy_unique_ptr<> to ensure
that the entry is deallocated on insert failure.
Fixes #3585.
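The ownership pattern behind the alloc_strategy_unique_ptr<> part of the fix can be sketched with standard containers; all names below are illustrative, with std::set of raw pointers standing in for the intrusive tree:

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>

struct cache_entry {
    std::string key;
};

struct entry_compare {
    // A real comparator may need to allocate in order to linearize a
    // fragmented key, and can therefore throw bad_alloc.
    bool operator()(const cache_entry* a, const cache_entry* b) const {
        return a->key < b->key;
    }
};

using entry_set = std::set<cache_entry*, entry_compare>;

// The entry stays owned by the unique_ptr until insert() has succeeded,
// so a throwing comparator cannot leak it; ownership is released to the
// container only on success.
bool insert_entry(entry_set& s, std::unique_ptr<cache_entry> e) {
    auto res = s.insert(e.get()); // if this throws, e still frees the entry
    if (res.second) {
        e.release();
    }
    return res.second;
}
```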
"
Partition keys are currently stored in serialized form in the
system.large_partitions table. This is an obstacle to operators
who usually can't deserialize partition keys in their heads.
Improve the situation by deserializing the partition key for them.
"
* tag 'pkey-print/v1' of https://github.com/avikivity/scylla:
large_partition_handler: output friendly partition key
keys: schema-aware printing of a partition_key
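The schema-aware printing idea can be sketched as follows; the component representation and to_friendly_string() are illustrative assumptions, not Scylla's actual API:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// A composite partition key modeled as typed components (two example
// types only); a real key would carry one value per partition column.
using component = std::variant<int, std::string>;

// Render the key as "(v0, v1, ...)" using each component's real type,
// instead of dumping the serialized byte form.
std::string to_friendly_string(const std::vector<component>& key) {
    std::string out = "(";
    for (size_t i = 0; i < key.size(); ++i) {
        if (i) {
            out += ", ";
        }
        std::visit([&out](const auto& v) {
            if constexpr (std::is_same_v<std::decay_t<decltype(v)>, int>) {
                out += std::to_string(v);
            } else {
                out += "'" + v + "'";
            }
        }, key[i]);
    }
    return out + ")";
}
```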
* seastar aac6cf1...6b97e00 (5):
> Merge "changes to fix travis CI builds" from Kefu
> tls.cc: Make "close" timeout delay exception proof
> core/sharded: mark foreign_ptr::get_owner_shard() const
> core/memory: Expose counter of large allocations
> tests: add test for multi-fragmented net::packet
Fixes #3461.
Ref scylladb/seastar#474.