scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-21 00:50:35 +00:00

Author	SHA1	Message	Date
Kamil Braun	96bc78905d	readers: evictable_reader: don't accidentally consume the entire partition The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the previous buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guaranteeing this forward progress had a bug: the comparison between the position after the last buffer-fill and the current last fragment position was done in the wrong direction. So if the condition that we wanted to achieve was already true, we would continue filling the buffer until partition end which may lead to OOMs such as in #13491. There was already a fix in this area to handle `partition_start` fragments correctly - #13563 - but it missed that the position comparison was done in the wrong order. Fix the comparison and adjust one of the tests (added in #13563) to detect this case. Fixes #13491	2023-06-27 14:37:29 +02:00
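The forward-progress rule described in the commit above can be illustrated with a toy model. This is only a sketch under stated assumptions: positions are plain integers rather than `position_in_partition` values, and `fill_buffer` is a hypothetical helper, not the code in `evictable_reader`.

```cpp
#include <cstddef>
#include <vector>

// Toy model: a fragment is represented only by its position (an assumption
// made for illustration; the real reader compares positions in a partition).
using position = int;

// Hypothetical helper. Appends fragments from `source` (starting at index
// `next`) until the buffer holds at least `max_fragments` fragments AND the
// last buffered position is strictly greater than the last position of the
// previous fill. The second condition is the forward-progress guarantee.
std::vector<position> fill_buffer(const std::vector<position>& source,
                                  std::size_t& next,
                                  position last_pos_of_previous_fill,
                                  std::size_t max_fragments) {
    std::vector<position> buffer;
    while (next < source.size()) {
        const bool full = buffer.size() >= max_fragments;
        // Correct direction: the current last position must exceed the old one.
        // Writing `last_pos_of_previous_fill > buffer.back()` instead would
        // never become true here, so the loop would run to the end of the input.
        const bool progressed =
            !buffer.empty() && buffer.back() > last_pos_of_previous_fill;
        if (full && progressed) {
            break;
        }
        buffer.push_back(source[next++]);
    }
    return buffer;
}
```

With input positions `1, 1, 1, 1, 2, 3` and a limit of two fragments, the second fill grows past the limit to three fragments so that it ends on position 2 rather than repeating position 1.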
Kamil Braun	5800ce8ddd	test: flat_mutation_reader_assertions: squash `r_t_c`s with the same position test_range_tombstones_v2 is too strict for this reader -- it expects a particular sequence of `range_tombstone_change`s, but multishard_combining_reader, when tested with a small buffer, may generate -- as expected -- additional (redundant) range tombstone change pairs (end+start). Currently we don't observe these redundant fragments due to a bug in `evictable_reader_v2` but they start appearing once we fix the bug and the test must be prepared first. To prepare the test, modify `flat_reader_assertions_v2` so it squashes redundant range tombstone change pairs. This happens only in non-exact mode. Enable exact mode in `test_sstable_reversing_reader_random_schema` for comparing two readers -- the squashing of `r_t_c`s may introduce an artificial difference.	2023-06-27 14:37:25 +02:00
Botond Dénes	f5e3b8df6d	Merge 'Optimize creation of reader excluding staging for view building' from Raphael "Raph" Carvalho View building from staging creates a reader from scratch (memtable + sstables - staging) for every partition, in order to calculate the diff between new staging data and data in base sstable set, and then pushes the result into the view replicas. perf shows that the reader creation is very expensive: ``` + 12.15% 10.75% reactor-3 scylla [.] lexicographical_tri_compare<compound_type<(allow_prefixes)0>::iterator, compound_type<(allow_prefixes)0>::iterator, legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes + 10.01% 9.99% reactor-3 scylla [.] boost::icl::is_empty<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> > + 8.95% 8.94% reactor-3 scylla [.] legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator() + 7.29% 7.28% reactor-3 scylla [.] dht::ring_position_tri_compare + 6.28% 6.27% reactor-3 scylla [.] dht::tri_compare + 4.11% 3.52% reactor-3 scylla [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+ 4.09% 4.07% reactor-3 scylla [.] sstables::index_consume_entry_context<sstables::index_consumer>::process_state + 3.46% 0.93% reactor-3 scylla [.] sstables::sstable_run::will_introduce_overlapping + 2.53% 2.53% reactor-3 libstdc++.so.6 [.] std::_Rb_tree_increment + 2.45% 2.45% reactor-3 scylla [.] boost::icl::non_empty::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> > + 2.14% 2.13% reactor-3 scylla [.] boost::icl::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> > + 2.07% 2.07% reactor-3 scylla [.] 
logalloc::region_impl::free + 2.06% 1.91% reactor-3 scylla [.] sstables::index_consumer::consume_entry(sstables::parsed_partition_index_entry&&)::{lambda()#1}::operator()() const::{lambda()#1}::operator() + 2.04% 2.04% reactor-3 scylla [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+ 1.87% 0.00% reactor-3 [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe + 1.86% 0.00% reactor-3 [kernel.kallsyms] [k] do_syscall_64 + 1.39% 1.38% reactor-3 libc.so.6 [.] __memcmp_avx2_movbe + 1.37% 0.92% reactor-3 scylla [.] boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables:: + 1.34% 1.33% reactor-3 scylla [.] logalloc::region_impl::alloc_small + 1.33% 1.33% reactor-3 scylla [.] seastar::memory::small_pool::add_more_objects + 1.30% 0.35% reactor-3 scylla [.] seastar::reactor::do_run + 1.29% 1.29% reactor-3 scylla [.] seastar::memory::allocate + 1.19% 0.05% reactor-3 libc.so.6 [.] syscall + 1.16% 1.04% reactor-3 scylla [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst + 1.07% 0.79% reactor-3 scylla [.] sstables::partitioned_sstable_set::insert ``` That shows some significant amount of work for inserting sstables into the interval map and maintaining the sstable run (which sorts fragments by first key and checks for overlapping). 
The interval map is known for having issues with L0 sstables, as it will have to be replicated almost to every single interval stored by the map, causing terrible space and time complexity. With enough L0 sstables, it can fall into quadratic behavior. This overhead is fixed by not building a new fresh sstable set when recreating the reader, but rather supplying a predicate to sstable set that will filter out staging sstables when creating either a single-key or range scan reader. This could have another benefit over today's approach which may incorrectly consider a staging sstable as non-staging, if the staging sst wasn't included in the current batch for view building. With this improvement, view building was measured to be 3x faster. from `INFO 2023-06-16 12:36:40,014 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 963957ms = 50kB/s` to `INFO 2023-06-16 14:47:12,129 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 319899ms = 150kB/s` Refs https://github.com/scylladb/scylladb/issues/14089. Fixes scylladb/scylladb#14244. Closes #14364 * github.com:scylladb/scylladb: table: Optimize creation of reader excluding staging for view building view_update_generator: Dump throughput and duration for view update from staging utils: Extract pretty printers into a header	2023-06-27 07:25:30 +03:00
Raphael S. Carvalho	1d8cb32a5d	table: Optimize creation of reader excluding staging for view building View building from staging creates a reader from scratch (memtable + sstables - staging) for every partition, in order to calculate the diff between new staging data and data in base sstable set, and then pushes the result into the view replicas. perf shows that the reader creation is very expensive: + 12.15% 10.75% reactor-3 scylla [.] lexicographical_tri_compare<compound_type<(allow_prefixes)0>::iterator, compound_type<(allow_prefixes)0>::iterator, legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes + 10.01% 9.99% reactor-3 scylla [.] boost::icl::is_empty<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> > + 8.95% 8.94% reactor-3 scylla [.] legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator() + 7.29% 7.28% reactor-3 scylla [.] dht::ring_position_tri_compare + 6.28% 6.27% reactor-3 scylla [.] dht::tri_compare + 4.11% 3.52% reactor-3 scylla [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+ 4.09% 4.07% reactor-3 scylla [.] sstables::index_consume_entry_context<sstables::index_consumer>::process_state + 3.46% 0.93% reactor-3 scylla [.] sstables::sstable_run::will_introduce_overlapping + 2.53% 2.53% reactor-3 libstdc++.so.6 [.] std::_Rb_tree_increment + 2.45% 2.45% reactor-3 scylla [.] boost::icl::non_empty::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> > + 2.14% 2.13% reactor-3 scylla [.] boost::icl::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> > + 2.07% 2.07% reactor-3 scylla [.] 
logalloc::region_impl::free + 2.06% 1.91% reactor-3 scylla [.] sstables::index_consumer::consume_entry(sstables::parsed_partition_index_entry&&)::{lambda()#1}::operator()() const::{lambda()#1}::operator() + 2.04% 2.04% reactor-3 scylla [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+ 1.87% 0.00% reactor-3 [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe + 1.86% 0.00% reactor-3 [kernel.kallsyms] [k] do_syscall_64 + 1.39% 1.38% reactor-3 libc.so.6 [.] __memcmp_avx2_movbe + 1.37% 0.92% reactor-3 scylla [.] boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables:: + 1.34% 1.33% reactor-3 scylla [.] logalloc::region_impl::alloc_small + 1.33% 1.33% reactor-3 scylla [.] seastar::memory::small_pool::add_more_objects + 1.30% 0.35% reactor-3 scylla [.] seastar::reactor::do_run + 1.29% 1.29% reactor-3 scylla [.] seastar::memory::allocate + 1.19% 0.05% reactor-3 libc.so.6 [.] syscall + 1.16% 1.04% reactor-3 scylla [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst + 1.07% 0.79% reactor-3 scylla [.] sstables::partitioned_sstable_set::insert That shows some significant amount of work for inserting sstables into the interval map and maintaining the sstable run (which sorts fragments by first key and checks for overlapping). 
The interval map is known for having issues with L0 sstables, as it will have to be replicated almost to every single interval stored by the map, causing terrible space and time complexity. With enough L0 sstables, it can fall into quadratic behavior. This overhead is fixed by not building a new fresh sstable set when recreating the reader, but rather supplying a predicate to sstable set that will filter out staging sstables when creating either a single-key or range scan reader. This could have another benefit over today's approach which may incorrectly consider a staging sstable as non-staging, if the staging sst wasn't included in the current batch for view building. With this improvement, view building was measured to be 3x faster. from INFO 2023-06-16 12:36:40,014 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 963957ms = 50kB/s to INFO 2023-06-16 14:47:12,129 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 319899ms = 150kB/s Refs #14089. Fixes #14244. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-06-26 22:30:39 -03:00
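The idea in the commit above, filtering an existing sstable set with a predicate instead of rebuilding a fresh set for every partition, might be sketched as follows. All names here (`sstable_stub`, `sstable_predicate`, `select_for_read`) are hypothetical stand-ins chosen for illustration; the real change works on Scylla's sstable set and its reader-creation paths.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an sstable: only the bits the sketch needs.
struct sstable_stub {
    std::string name;
    bool staging;  // true if the sstable still lives in the staging directory
};

// Hypothetical predicate type: returns true for sstables the reader may use.
using sstable_predicate = std::function<bool(const sstable_stub&)>;

// Selects sstables for a read from an EXISTING set. Nothing is copied or
// re-inserted into a new container keyed by token ranges; the set is only
// walked and filtered, which is the point of the optimization.
std::vector<const sstable_stub*> select_for_read(
        const std::vector<sstable_stub>& set,
        const sstable_predicate& pred) {
    std::vector<const sstable_stub*> selected;
    for (const auto& sst : set) {
        if (pred(sst)) {
            selected.push_back(&sst);
        }
    }
    return selected;
}
```

A caller that wants "everything except staging" would pass a predicate such as `[](const sstable_stub& s) { return !s.staging; }`. Because the predicate is evaluated at read time against the live set, a staging sstable that was not part of the current batch is still excluded.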
Raphael S. Carvalho	83c70ac04f	utils: Extract pretty printers into a header Can be easily reused elsewhere. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-06-26 21:58:20 -03:00
Alexey Novikov	ca4e7f91c6	compact and remove expired rows from cache on read When reading from cache: compact and expire row tombstones; remove expired empty rows from cache; do not expire range tombstones in this patch. Refs #2252, #6033 Closes #12917	2023-06-26 15:29:01 +02:00
Alejo Sanchez	4999cbc1cf	test/boost/cql_functions_test: split long running tests Split long running test_aggregate_functions to one case per type. This allows test.py to run them in parallel. Before this it would take 18 minutes to run in debug mode. Afterwards each case takes 30-45 seconds. Refs #13905 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #14368	2023-06-26 11:29:36 +03:00
Alejo Sanchez	8b1968cfbb	test/boost/schema_changes_test: split long-running test Split long running test test_schema_changes in 3 parts, one for each writable_sstable_versions so it can be run in parallel by test.py. Add static checks to alert if the array of types changed. Original test takes around 24 minutes in debug mode, and each new split test takes around 8 minutes. Refs #13905 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #14367	2023-06-26 11:24:07 +03:00
Alejo Sanchez	633f026d63	test/boost/memtable_test: allow parallel run Remove previous configuration blocking parallel run. Test cases run fine in local debug. Refs #13905 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #14369	2023-06-26 11:23:43 +03:00
Alejo Sanchez	3cbfd863eb	test/boost/database_test: split long running tests Split long running tests test_database_with_data_in_sstables_is_a_mutation_source_plain and test_database_with_data_in_sstables_is_a_mutation_source_reverse. They run with x_log2_compaction_groups of 0 and 1, each one taking from 10 to 15 minutes each in debug mode, for a total of 28 and 22 minutes. Split the test cases to run with 0 and 1, so test.py can run them in parallel. Refs #13905 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #14356	2023-06-26 11:20:27 +03:00
Avi Kivity	b858a4669d	cql3: expr: break up expression.hh header Adding a function declaration to expression.hh causes many recompilations. Reduce that by: - moving some restrictions-related definitions to the existing expr/restrictions.hh - moving evaluation related names to a new header expr/evaluate.hh - move utilities to a new header expr/expr-utilities.hh expression.hh contains only expression definitions and the most basic and common helpers, like printing.	2023-06-22 14:21:03 +03:00
Kefu Chai	f014ccf369	Revert "Revert "Merge 'treewide: add uuid_sstable_identifier_enabled support' from Kefu Chai"" This reverts commit `562087beff`. The regressions introduced by the reverted change have been fixed. So let's revert this revert to resurrect the uuid_sstable_identifier_enabled support. Fixes #10459	2023-06-21 13:02:40 +03:00
Avi Kivity	e233f471b8	Merge 'Respect tablet shard assignment' from Tomasz Grabiec This PR changes the system to respect shard assignment to tablets in tablet metadata (system.tablets): 1. The tablet allocator is changed to distribute tablets evenly across shards taking into account currently allocated tablets in the system. Each tablet has equal weight. vnode load is ignored. 2. CDC subsystem was not adjusted (not supported yet) 3. sstable sharding metadata reflects tablet boundaries 5. resharding is NOT supported yet (the node will abort on boot if there is a need to reshard tablet-based tables) 6. The system is NOT prepared to handle tablet migration / topology changes in a safe way. 7. Sstable cleanup is not wired properly yet After this PR, dht::shard_of() and schema::get_sharder() are deprecated. One should use table::shard_of() and effective_replication_map::get_sharder() instead. To make life easier, support was added to obtain the table pointer from the schema pointer: ``` schema_ptr s; s->table().shard_of(...) 
``` Closes #13939 * github.com:scylladb/scylladb: locator: network_topology_startegy: Allocate shards to tablets locator: Store node shard count in topology service: topology: Extract topology updating to a lambda test: Move test_tablets under topology_experimental sstables: Add trace-level logging related to shard calculation schema: Catch incorrect uses of schema::get_sharder() dht: Rename dht::shard_of() to dht::static_shard_of() treewide: Replace dht::shard_of() uses with table::shard_of() / erm::shard_of() storage_proxy: Avoid multishard reader for tablets storage_proxy: Obtain shard from erm in the read path db, storage_proxy: Drop mutation/frozen_mutation ::shard_of() forward_service: Use table sharder alternator: Use table sharder db: multishard: Obtain sharder from erm sstable_directory: Improve trace-level logging db: table: Introduce shard_of() helper db: Use table sharder in compaction sstables: Compute sstable shards using sharder from erm when loading sstables: Generate sharding metadata using sharder from erm when writing test: partitioner: Test split_range_to_single_shard() on tablet-like sharder dht: Make split_range_to_single_shard() prepared for tablet sharder sstables: Move compute_shards_for_this_sstable() to load() dht: Take sharder externally in splitting functions locator: Make sharder accessible through effective_replication_map dht: sharder: Document guarantees about mapping stability tablets: Implement tablet sharder tablets: Include pending replica in get_shard() dht: sharder: Introduce next_shard() db: token_ring_table: Filter out tablet-based keyspaces db: schema: Attach table pointer to schema schema_registry: Fix SIGSEGV in learn() when concurrent with get_or_load() schema_registry: Make learn(schema_ptr) attach entry to the target schema test: lib: cql_test_env: Expose feature_service test: Extract throttle object to separate header	2023-06-21 10:20:41 +03:00
Calle Wilund	f18e967939	storage_proxy: Make split_stats resilient to being called from different scheduling group Fixes #11017 When doing writes, storage proxy creates types deriving from abstract_write_response_handler. These are created in the various scheduling groups executing the write inducing code. They pick up a group-local reference to the various metrics used by SP. Normally all code using (and esp. modifying) these metrics are executed in the same scheduling group. However, if gossip sees a node go down, it will notify listeners, which eventually calls get_ep_stat and register_metrics. This code (before this patch) uses _active_ scheduling group to eventually add metrics, using a local dict as guard against double regs. If, as described above, we're called in a different sched group than the original one however, this can cause double registrations. Fixed here by keeping a reference to creating scheduling group and using this, not active one, when/if creating new metrics. Closes #14294	2023-06-21 10:08:27 +03:00
Tomasz Grabiec	ebdebb982b	locator: network_topology_startegy: Allocate shards to tablets Uses a simple algorithm for allocating shards which chooses least-loaded shard on a given node, encapsulated in load_sketch. Takes load due to current tablet allocation into account. Each tablet, new or allocated for other tables, is assumed to have an equal load weight.	2023-06-21 00:58:25 +02:00
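A least-loaded shard allocator of the kind the commit above describes could look roughly like this. It is a minimal sketch, assuming every tablet weighs the same and that per-shard tablet counts for one node are already known; `toy_load_sketch` and `next_shard` are hypothetical names and do not reproduce the interface of the real `load_sketch`.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of per-node shard load: one tablet count per shard.
class toy_load_sketch {
    std::vector<std::size_t> _tablets_per_shard;
public:
    // `current` holds the tablets already allocated on each shard, including
    // tablets that belong to other tables. Must not be empty.
    explicit toy_load_sketch(std::vector<std::size_t> current)
        : _tablets_per_shard(std::move(current)) {}

    // Picks the least-loaded shard (lowest index wins ties), accounts for the
    // newly placed tablet, and returns the chosen shard index.
    std::size_t next_shard() {
        auto it = std::min_element(_tablets_per_shard.begin(),
                                   _tablets_per_shard.end());
        ++*it;
        return static_cast<std::size_t>(it - _tablets_per_shard.begin());
    }
};
```

Starting from loads `{2, 0, 1}`, successive calls return shards 1, 1, 2 and then 0, at which point every shard holds three tablets. A linear scan is enough for a sketch; a heap would avoid rescanning when the shard count is large.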
Tomasz Grabiec	29cbdb812b	dht: Rename dht::shard_of() to dht::static_shard_of() This is in order to prevent new incorrect uses of dht::shard_of() from being accidentally added. Also, makes sure that all current uses are caught by the compiler and require an explicit rename.	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	21198e8470	treewide: Replace dht::shard_of() uses with table::shard_of() / erm::shard_of() dht::shard_of() does not use the correct sharder for tablet-based tables. Code which is supposed to work with all kinds of tables should use erm::get_sharder().	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	e48ec6fed3	db, storage_proxy: Drop mutation/frozen_mutation ::shard_of() dht::shard_of() does not use the correct sharder for tablet-based tables. Code which is supposed to work with all kinds of tables should use erm::get_sharder().	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	d92287f997	db: multishard: Obtain sharder from erm This is not strictly necessary, as the multishard reader will be later avoided altogether for tablet-based tables, but it is a step towards converting all code to use the erm->get_sharder() instead of schema::get_sharder().	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	36da062bcb	db: Use table sharder in compaction	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	ad983ac23d	sstables: Compute sstable shards using sharder from erm when loading schema::get_sharder() does not use the correct sharder for tablet-based tables. Code which is supposed to work with all kinds of tables should obtain the sharder from erm::get_sharder().	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	36e12020b9	test: partitioner: Test split_range_to_single_shard() on tablet-like sharder	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	28b972a588	dht: Make split_range_to_single_shard() prepared for tablet sharder The function currently assumes that shard assignment for subsequent tokens is round robin, which will not be the case for tablets. This can lead to incorrect split calculation or infinite loop. Another assumption was that subsequent splits returned by the sharder have distinct shards. This also doesn't hold for tablets, which may return the same shard for subsequent tokens. This assumption was embedded in the following line: start_token = sharder.token_for_next_shard(end_token, shard); If the range which starts with end_token is also owned by "shard", token_for_next_shard() would skip over it.	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	390bcf3fae	dht: Take sharder externally in splitting functions We need those functions to work with tablet sharder, which is not accessible through schema::get_sharder(). In order to propagate the right sharder, those functions need to take it externally rather than from the schema object. The sharder will come from the effective_replication_map attached to the table object. Those splitting functions are used when generating sharding metadata of an sstable. We need to keep this sharding metadata consistent with tablet mapping to shards in order for node restart to detect that those sstables belong to a single shard and that resharding is not necessary. Resharding of sstables based on tablet metadata is not implemented yet and will abort after this series. Keeping sharding metadata accurate for tablets is only necessary until compaction group integration is finished. After that, we can use the sstable token range to determine the owning tablet and thus the owning shard. Before that, we can't, because a single sstable may contain keys from different tablets, and the whole key range may overlap with keys which belong to other shards.	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	22ab100b41	tablets: Implement tablet sharder	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	e44e6033d8	tablets: Include pending replica in get_shard() We need to move get_shard() from tablet_info to tablet_map in order to have access to transition_info.	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	2303466375	db: schema: Attach table pointer to schema This will make it easier to access table properties in places which only have schema_ptr. This is in particular useful when replacing dht::shard_of() uses with s->table().shard_of(), now that sharding is no longer static, but table-specific. Also, it allows us to install a guard which catches invalid uses of schema::get_sharder() on tablet-based tables. It will be helpful for other uses as well. For example, we can now get rid of the static_props hack.	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	ad6d2b42f2	test: Extract throttle object to separate header	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	87b4606cd6	Merge 'atomic_cell: compare value last' from Benny Halevy Currently, when two cells have the same write timestamp and both are alive or expiring, we compare their value first, before checking if either of them is expiring and if both are expiring, comparing their expiration time and ttl value to determine which of them will expire later or was written later. This was based on an early version of Cassandra. However, the Cassandra implementation rightfully changed in `e225c88a65` ([CASSANDRA-14592](https://issues.apache.org/jira/browse/CASSANDRA-14592)), where the cell expiration is considered before the cell value. To summarize, the motivation for this change is threefold: 1. Cassandra compatibility 2. Prevent an edge case where a null value is returned by select query when an expired cell has a larger value than a cell with later expiration. 3. A generalization of the above: value-based reconciliation may cause select query to return a mixture of upserts, if multiple upserts use the same timestamp but have different expiration times. If the cell value is considered before expiration, the select result may contain cells from different inserts, while reconciling based on the expiration times will choose cells consistently from either upsert, as all cells in the respective upsert will carry the same expiration time. Fixes #14182 Also, this series: - updates dml documentation - updates internal documentation - updates and adds unit tests and cql pytest reproducing #14182 Closes #14183 * github.com:scylladb/scylladb: docs: dml: add update ordering section cql-pytest: test_using_timestamp: add tests for rewrites using same timestamp mutation_partition: compare_row_marker_for_merge: consider ttl in case expiry is the same atomic_cell: compare_atomic_cell_for_merge: update and add documentation compare_atomic_cell_for_merge: compare value last for live cells mutation_test: test_cell_ordering: improve debuggability	2023-06-20 12:11:48 +02:00
Benny Halevy	761d62cd82	compare_atomic_cell_for_merge: compare value last for live cells Currently, when two cells have the same write timestamp and both are alive or expiring, we compare their value first, before checking if either of them is expiring and if both are expiring, comparing their expiration time and ttl value to determine which of them will expire later or was written later. This was changed in CASSANDRA-14592 for consistency with the preference for dead cells over live cells, as expiring cells will become tombstones at a future time and then they'd win over live cells with the same timestamp, hence they should win also before expiration. In addition, comparing the cell value before expiration can lead to unintuitive corner cases where rewriting a cell using the same timestamp but different TTL may cause scylla to return the cell with null value if it expired in the meanwhile. Also, when multiple columns are written using two upserts using the same write timestamp but with different expiration, selecting cells by their value may return a mixed result where each cell is selected individually from either upsert, by picking the cells with the largest values for each column, while using the expiration time to break ties will lead to more consistent results where a set of cells from only one of the upserts will be selected. Fixes scylladb/scylladb#14182 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-06-20 10:10:39 +03:00
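The tie-breaking order for live cells described above (timestamp, then expiring over non-expiring, then later expiry, then later write, and value only last) can be sketched as a comparator. This is an illustration under assumptions, not the body of `compare_atomic_cell_for_merge`: `toy_cell`, `tri` and `compare_live_cells_for_merge` are hypothetical, dead cells are left out, and "written later" is inferred as "smaller TTL at equal expiry" because write time equals expiry minus TTL.

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified live cell. Times are plain integers.
struct toy_cell {
    std::int64_t timestamp;  // write timestamp
    bool expiring;           // true if the cell has a TTL
    std::int64_t expiry;     // expiration time, meaningful only if expiring
    std::int64_t ttl;        // time to live, meaningful only if expiring
    std::string value;
};

// Three-way comparison helper: -1, 0 or 1.
template <typename T>
int tri(const T& a, const T& b) {
    return a < b ? -1 : (b < a ? 1 : 0);
}

// Returns > 0 if `l` wins the merge, < 0 if `r` wins, 0 if they are equal.
int compare_live_cells_for_merge(const toy_cell& l, const toy_cell& r) {
    if (int c = tri(l.timestamp, r.timestamp)) {
        return c;  // newer write timestamp wins
    }
    if (l.expiring != r.expiring) {
        return l.expiring ? 1 : -1;  // expiring cell wins over non-expiring
    }
    if (l.expiring) {
        if (int c = tri(l.expiry, r.expiry)) {
            return c;  // the cell that expires later wins
        }
        if (int c = tri(r.ttl, l.ttl)) {
            return c;  // same expiry: smaller TTL means written later, so it wins
        }
    }
    return tri(l.value, r.value);  // value is only the last tie-breaker
}
```

With this order, two upserts that share a timestamp but differ in TTL resolve the same way for every column, so a row is never assembled from a mix of the two.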
Benny Halevy	ec034b92c0	mutation_test: test_cell_ordering: improve debuggability Currently, it is hard to tell which of the many sub-cases fail in this unit test, in case any of them fails. This change uses logging in debug and trace level to help with that by reproducing the error with --logger-log-level testlog=trace (The cases are deterministic so reproducing should not be a problem) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-06-20 10:10:39 +03:00
Tomasz Grabiec	5fa08adc88	Merge 'cache_flat_mutation_reader: use the correct schema in prepare_hash' from Michał Chojnowski Since `mvcc: make schema upgrades gentle` (`51e3b9321b`), rows pointed to by the cursor can have different (older) schema than the schema of the cursor's snapshot. However, one place in the code wasn't updated accordingly, causing a row to be processed with the wrong schema in the right circumstances. This passed through unit testing because it requires a digest-computing cache read after a schema change, and no test exercised this. This series fixes the bug and adds a unit test which reproduces the issue. Fixes #14110 Closes #14305 * github.com:scylladb/scylladb: test: boost/row_cache_test: add a reproducer for #14110 cache_flat_mutation_reader: use the correct schema in prepare_hash mutation: mutation_cleaner: add pause()	2023-06-20 01:30:11 +02:00
Michał Chojnowski	02bcb5d539	test: boost/row_cache_test: add a reproducer for #14110	2023-06-19 22:50:46 +02:00
Botond Dénes	bd7a3e5871	Merge 'Sanitize sstables-making utils in tests' from Pavel Emelyanov There are tons of wrappers that help test cases make sstables for their needs. And lots of code duplication in test cases that do parts of those helpers' work on their own. This set cleans some bits of those Closes #14280 * github.com:scylladb/scylladb: test/utils: Generalize making memtable from vector<mutation> test/util: Generalize make_sstable_easy()-s test/sstable_mutation: Remove useless helper test/sstable_mutation: Make writer config in make_sstable_mutation_source() test/utils: De-duplicate make_sstable_containing-s test/sstable_compaction: Remove useless one-line local lambda test/sstable_compaction: Simplify sstable making test/sstables*: Make sstable from vector of mutations test/mutation_reader: Remove create_sstable() helper from test	2023-06-19 14:05:29 +03:00
Pavel Emelyanov	6bec03f96f	test: Remove sstable_utils' storage_prefix() helper It's excessive, test case that needs it can get storage prefix without this fancy wrapper-helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #14273	2023-06-19 13:51:04 +03:00
Pavel Emelyanov	1a332ef5e2	test: Check sstable bytes correctness on S3 too Commit `4e205650` (test: Verify correctness of sstable::bytes_on_disk()) added a test to verify that sstable::bytes_on_disk() is equal to the real size of real files. The same test case makes sense for S3-backed sstables as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #14272	2023-06-19 13:47:31 +03:00
Nadav Har'El	ac3d0d4460	Merge 'cql3: expr: support evaluate(column_mutation_attribute)' from Avi Kivity In preparation for converting selectors to evaluate expressions, add support for evaluating column_mutation_attribute (representing the WRITETIME/TTL pseudo-functions). A unit test is added. Fixes #12906 Closes #14287 * github.com:scylladb/scylladb: test: expr: test evaluation of column_mutation_attribute test: lib: enhance make_evaluation_inputs() with support for ttls/timestamps cql3: expr: evaluate() column_mutation_attribute	2023-06-19 11:11:49 +03:00
Botond Dénes	562087beff	Revert "Merge 'treewide: add uuid_sstable_identifier_enabled support' from Kefu Chai" This reverts commit `d1dc579062`, reversing changes made to `3a73048bc9`. Said commit caused regressions in dtests. We need to investigate and fix those, but in the meanwhile let's revert this to reduce the disruption to our workflows. Refs: #14283	2023-06-19 08:49:27 +03:00
Avi Kivity	0f98e9f8c8	test: expr: test evaluation of column_mutation_attribute There's no way to evaluate a column_mutation_attribute via CQL yet (the only user uses old-style cql3::selection::selector), so we only supply a unit test.	2023-06-18 22:47:46 +03:00
Nadav Har'El	97d444bbf7	Merge 'cql3/expression: implement evaluate(field_selection) ' from Jan Ciołek Implement `expr::evaluate()` for `expr::field_selection`. `field_selection` is used to represent access to a struct field. For example, with a UDT value: ``` CREATE TYPE my_type (a int, b int); ``` The expression `my_type_value.a` would be represented as a `field_selection`, which selects the field `a`. Evaluating such an expression consists of finding the right element's value in a serialized UDT value and returning it. Note that it's still not possible to use `field_selection` inside the `WHERE` clause. Enabling it would require changes to the grammar, as well as query planning. Current `statement_restrictions` just reacts with `on_internal_error` when it encounters a `field_selection`. Nonetheless it's a step towards relaxing the grammar, and now it's finally possible to evaluate all kinds of prepared expressions (#12906) Fixes: https://github.com/scylladb/scylladb/issues/12906 Closes #14235 * github.com:scylladb/scylladb: boost/expr_test: test evaluate(field_selection) cql3/expr: fix printing of field_selection cql3/expression: implement evaluate(field_selection) types/user: modify idx_of_field to use bytes_view column_identifer: add column_identifier_raw::text() types: add read_nth_user_type_field() types: add read_nth_tuple_element()	2023-06-18 11:08:25 +03:00
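"Finding the right element's value in a serialized UDT value" can be sketched as a walk over length-prefixed fields. The sketch assumes the CQL native-protocol layout for UDT values: fields in declaration order, each a 4-byte big-endian signed length followed by that many bytes, with a negative length meaning null. `read_nth_field` is a hypothetical helper and is not the signature of `read_nth_user_type_field()` from the series.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Hypothetical helper: returns the bytes of field `n` (0-based), or
// std::nullopt if the field is null, absent (fewer fields were serialized),
// or the input is truncated.
std::optional<bytes> read_nth_field(const bytes& serialized, std::size_t n) {
    std::size_t pos = 0;
    for (std::size_t i = 0;; ++i) {
        if (serialized.size() - pos < 4) {
            return std::nullopt;  // no more fields
        }
        // Decode the 4-byte big-endian length prefix.
        const std::uint32_t raw =
            (static_cast<std::uint32_t>(serialized[pos]) << 24) |
            (static_cast<std::uint32_t>(serialized[pos + 1]) << 16) |
            (static_cast<std::uint32_t>(serialized[pos + 2]) << 8) |
            static_cast<std::uint32_t>(serialized[pos + 3]);
        pos += 4;
        if (raw >= 0x80000000u) {  // sign bit set: negative length, null field
            if (i == n) {
                return std::nullopt;
            }
            continue;
        }
        const std::size_t len = raw;
        if (serialized.size() - pos < len) {
            return std::nullopt;  // truncated value
        }
        if (i == n) {
            return bytes(serialized.begin() + pos, serialized.begin() + pos + len);
        }
        pos += len;  // skip this field and move on to the next one
    }
}
```

For `my_type_value.a` with the type from the commit message, the evaluator would look up the index of `a` (0) and return the four bytes of the serialized `int`, leaving deserialization to the field's type.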
Pavel Emelyanov	85310bc043	test/sstable_mutation: Remove useless helper There are two make_sstable_mutation_source() helpers that call one another and test cases only need one of them, so leave just one that's in use. Also don't pass env's tempdir to make_sstable() util call, it can get env's tempdir on its own. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-06-16 21:21:40 +03:00
Pavel Emelyanov	4a7be304ac	test/sstable_mutation: Make writer config in make_sstable_mutation_source() These local helpers accept a writer config that all callers construct the same way, so the helpers can construct it on their own Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-06-16 21:20:50 +03:00
Pavel Emelyanov	753b674c31	test/sstable_compaction: Remove useless one-line local lambda The get_usable_sst() wrapper lambda is not needed, calling the make_sstable_containing() is shorter Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-06-16 21:19:15 +03:00
Pavel Emelyanov	5b46993438	test/sstable_compaction: Simplify sstable making There's a temporary memtable and on-stack lambda that makes the mutation. Both are overkill, make_sstable_containing() can work on just a plain on-stack-constructed mutation Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-06-16 21:18:13 +03:00
Pavel Emelyanov	ce29f41436	test/sstables*: Make sstable from vector of mutations There are many cases that want to call make_sstable_containing() with the vector of mutations at hand. For that they apply it to a temporary memtable, but sstable-utils can work with the mutations vector as well Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-06-16 21:17:12 +03:00
Pavel Emelyanov	c2eb3e2c4c	test/mutation_reader: Remove create_sstable() helper from test It's a one-liner wrapper, caller can get the same result with existing utils facilities Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-06-16 21:16:34 +03:00
Pavel Emelyanov	900c609269	Merge 'Initialize `query_processor` early, without `messaging_service` or `gossiper`' from Kamil Braun In https://github.com/scylladb/scylladb/pull/14231 we split `storage_proxy` initialization into two phases: for local and remote parts. Here we do the same with `query_processor`. This allows performing queries for local tables early in the Scylla startup procedure, before we initialize services used for cluster communication such as `messaging_service` or `gossiper`. Fixes: #14202 As a follow-up we will simplify `system_keyspace` initialization, making it available earlier as well. Closes #14256 * github.com:scylladb/scylladb: main, cql_test_env: start `query_processor` early cql3: query_processor: split `remote` initialization step cql3: query_processor: move `migration_manager&`, `forwarder&`, `group0_client&` to a `remote` object cql3: query_processor: make `forwarder()` private cql3: query_processor: make `get_group0_client()` private cql3: strongly_consistent_modification_statement: fix indentation cql3: query_processor: make `get_migration_manager` private tracing: remove `qp.get_migration_manager()` calls table_helper: remove `qp.get_migration_manager()` calls thrift: handler: move implementation of `execute_schema_command` to `query_processor` data_dictionary: add `get_version` cql3: statements: schema_altering_statement: move `execute0` to `query_processor` cql3: statements: pass `migration_manager&` explicitly to `prepare_schema_mutations` main: add missing `supervisor::notify` message	2023-06-16 17:41:08 +03:00
Jan Ciolek	d6728a7eb5	boost/expr_test: test evaluate(field_selection) Add a unit test which tests evaluating field selections. Alas at the moment it's impossible to add a cql-pytest, as the grammar and query planning don't handle field selections inside the WHERE clause. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-06-16 01:21:02 +02:00
Nadav Har'El	e1513f1199	Merge 'cql3: prepare selectors' from Avi Kivity CQL statements carry expressions in many contexts: the SELECT, WHERE, SET, and IF clauses, plus various attributes. Previously, each of these contexts had its own representation for an expression, and another one for the same expression but before preparation. We have been gradually moving towards a uniform representation of expressions. This series tackles SELECT clause elements (selectors), in their unprepared phase. It's relatively simple since there are only five types of expression components (column references, writetime/ttl modifiers, function calls, casts, and field selections). Nevertheless, there isn't much commonality with previously converted expression elements so quite a lot of code is involved. After the series, we are still left with a custom post-prepare representation of expressions. It's quite complicated since it deals with two passes, for aggregation, so it will be left for another series. Closes #14219 * github.com:scylladb/scylladb: cql3: seletor: drop inheritance from assignment_testable cql3: selection: rely on prepared expressions cql3: selection: prepare selector expressions cql3: expr: match counter arguments to function parameters expecting bigint cql3: expr: avoid function constant-folding if a thread is needed cql3: add optional type annotation to assignment_testable cql3: expr: wire unresolved_identifier to test_assignment() cql3: expr: support preparing column_mutation_attribute cql3: expr: support preparing SQL-style casts cql3: expr: support preparing field_selection expressions cql3: expr: make the two styles of cast expressions explicit cql3: error injection functions: mark enabled_injections() as impure cql3: eliminate dynamic_cast<selector> from functions::get() cql3: test_assignment: pass optional schema everywhere cql3: expr: prepare_expr(): allow aggregate functions cql3: add checks for aggregation functions after prepare cql3: expr: add verify_no_aggregate_functions() helper test: add regression test for rejection of aggregates in the WHERE clause cql3: expr: extract column_mutation_attribute_type cql3: expr: add fmt formatter for column_mutation_attribute_kind cql3: statements: select_statement: reuse to_selectable() computation in SELECT JSON	2023-06-15 15:59:41 +03:00
Kefu Chai	2d265e860d	replica,sstable: introduce invalid generation id the invalid sstable id is the NULL of a sstable identifier. with this concept, it would be a lot simpler to find/track the greatest generation. the complexity is hidden in the generation_type, which compares a) integer-based identifiers b) uuid-based identifiers c) the invalid identifier in different ways. so, in this change * the default constructor of generation_type is now public. * we don't check for empty generation anymore when loading SSTables or enumerating them. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-06-15 17:54:59 +08:00

2623 Commits