scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-08 07:53:20 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	4e9d95d78c	Merge 'Compact data before streaming' from Botond Dénes Currently, streaming and repair process and send data as-is. This is wasteful: streaming might be sending data which is expired or covered by tombstones, taking up valuable bandwidth and processing time. Repair additionally could be exposed to artificial differences, due to different nodes being in different states of compactness. This PR adds opt-in compaction to `make_streaming_reader()`, then opts in all users. The main difference is in how these choose the current compaction time to use: * Load'n'stream and streaming use the current time on the local node. * Repair uses a centrally chosen compaction time, generated on the repair master and propagated to all repair followers. This is to ensure all repair participants work with the exact same state of compactness. Importantly, this compaction does not purge tombstones (tombstone GC is disabled completely). Fixes: https://github.com/scylladb/scylladb/issues/3561 Closes #14756 * github.com:scylladb/scylladb: replica: make_[multishard_]streaming_reader(): make compaction_time mandatory repair/row_level: opt in to compacting the stream streaming: opt-in to compacting the stream sstables_loader: opt-in for compacting the stream replica/table: add optional compacting to make_multishard_streaming_reader() replica/table: add optional compacting to make_streaming_reader() db/config: add config item for enabling compaction for streaming and repair repair: log the error which caused the repair to fail readers: compacting_reader: use compact_mutation_state::abandon_current_partition() mutation/mutation_compactor: allow user to abandon current partition	2023-07-28 16:42:13 +02:00
Kefu Chai	cc2bbde8f1	test: use BOOST_CHECK_EQUAL when appropriate in compaction_manager_basic_test compaction_manager_basic_test checks the stats of compaction_manager to verify that there are no ongoing or pending compactions after triggering the compaction and waiting for its completion. but in #14865, there are still active compaction(s) after the compaction_manager's stats show there is at least one task completed. to understand this issue better, let's use `BOOST_CHECK_EQUAL()` instead of `BOOST_REQUIRE()`, so that the test does not error out when the check fails, and we can have a better understanding of the status when the test fails. Refs #14865 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14872	2023-07-28 15:45:07 +03:00
Avi Kivity	cf81eef370	Merge 'schema_mutations, migration_manager: Ignore empty partitions in per-table digest' from Tomasz Grabiec Schema digest is calculated by querying for mutations of all schema tables, then compacting them so that all tombstones in them are dropped. However, even if the mutation becomes empty after compaction, we still feed its partition key. If the same mutations were compacted prior to the query, because the tombstones expire, we won't get any mutation at all and won't feed the partition key. So schema digest will change once an empty partition of some schema table is compacted away. Tombstones expire 7 days after the schema change which introduces them. If one of the nodes is restarted after that, it will compute a different table schema digest on boot. This may cause performance problems. When sending a request from coordinator to replica, the replica needs the schema_ptr of the exact schema version requested by the coordinator. If it doesn't know that version, it will request it from the coordinator and perform a full schema merge. This adds latency to every such request. Schema versions which are not referenced are currently kept in cache for only 1 second, so if the request flow has a low-enough rate, this situation results in perpetual schema pulls. After `ae8d2a550d` (5.2.0), it is more likely to run into this situation, because table creation generates tombstones for all schema tables relevant to the table, even the ones which will be otherwise empty for the new table (e.g. computed_columns). This change introduces a cluster feature which when enabled will change digest calculation to be insensitive to expiry by ignoring empty partitions in digest calculation. When the feature is enabled, schema_ptrs are reloaded so that the window of discrepancy during transition is short and no rolling restart is required. A similar problem was fixed for per-node digest calculation in c2ba94dc39e4add9db213751295fb17b95e6b962. 
Per-table digest calculation was not fixed at that time because we didn't persist enabled features and they were not enabled early-enough on boot for us to depend on them in digest calculation. Now they are enabled before non-system tables are loaded so digest calculation can rely on cluster features. Fixes #4485. Manually tested using ccm on cluster upgrade scenarios and node restarts. Closes #14441 * github.com:scylladb/scylladb: test: schema_change_test: Verify digests also with TABLE_DIGEST_INSENSITIVE_TO_EXPIRY enabled schema_mutations, migration_manager: Ignore empty partitions in per-table digest migration_manager, schema_tables: Implement migration_manager::reload_schema() schema_tables: Avoid crashing when table selector has only one kind of tables	2023-07-28 00:01:33 +03:00
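The effect of ignoring empty partitions in the commit above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Scylla's digest code: `table_digest`, the `ignore_empty_partitions` flag, the dict-based partition model and the use of SHA-256 are all hypothetical choices made for the example.

```python
import hashlib

def table_digest(partitions, ignore_empty_partitions):
    """Hash partition keys and rows, roughly as a per-table digest might.

    `partitions` maps a partition key to the rows left after compaction.
    An empty list models a partition that became empty once its
    tombstones were dropped. (Hypothetical model, not Scylla's format.)
    """
    h = hashlib.sha256()
    for key in sorted(partitions):
        rows = partitions[key]
        if ignore_empty_partitions and not rows:
            continue  # an empty partition contributes nothing
        h.update(key.encode())
        for row in rows:
            h.update(row.encode())
    return h.hexdigest()

# The same schema state on two nodes: one still returns an empty
# partition, the other has already compacted it away on disk.
before_expiry = {"computed_columns": [], "columns": ["pk int", "v int"]}
after_expiry = {"columns": ["pk int", "v int"]}
```

With `ignore_empty_partitions=False` the two inputs hash differently, because the key of the empty partition is still fed to the hash; with the flag set, both nodes arrive at the same digest.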
Alexey Novikov	ff721ec3e3	make timestamp string format cassandra compatible when we convert timestamp into string it must look like: '2017-12-27T11:57:42.500Z' it concerns any conversion except JSON timestamp format JSON string has space as time separator and must look like: '2017-12-27 11:57:42.500Z' both formats always contain milliseconds and timezone specification Fixes #14518 Fixes #7997 Closes #14726	2023-07-27 12:01:09 +03:00
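The two string formats named in the commit above can be reproduced with a short Python sketch. The helper name `format_timestamp` and its `json_style` parameter are hypothetical; only the two target formats come from the commit message.

```python
from datetime import datetime, timezone

def format_timestamp(ts, json_style=False):
    """Render a timezone-aware datetime with milliseconds and a 'Z' suffix.

    The regular form uses 'T' between date and time; the JSON form
    uses a space. Both always carry milliseconds and the timezone.
    """
    ts = ts.astimezone(timezone.utc)
    sep = " " if json_style else "T"
    millis = ts.microsecond // 1000
    return f"{ts:%Y-%m-%d}{sep}{ts:%H:%M:%S}.{millis:03d}Z"

ts = datetime(2017, 12, 27, 11, 57, 42, 500000, tzinfo=timezone.utc)
print(format_timestamp(ts))                   # 2017-12-27T11:57:42.500Z
print(format_timestamp(ts, json_style=True))  # 2017-12-27 11:57:42.500Z
```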
Botond Dénes	fdaf908967	repair/row_level: opt in to compacting the stream Using a centrally generated compaction-time, generated on the repair master and propagated to all repair followers. For repair it is imperative that all participants use the exact same compaction time, otherwise there can be artificial differences between participants, generating unnecessary repair activity. If a repair follower doesn't get a compaction-time from the repair master, it uses a locally generated one. This is no worse than the previous state of each node being on some undefined state of compaction.	2023-07-27 04:57:50 -04:00
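Why repair needs a shared compaction time can be shown with a toy model. The sketch below is an assumption-laden illustration, not the repair code: `compact`, `repair_view` and the key-to-expiry dict are hypothetical, and expiry stands in for everything compaction may drop.

```python
def compact(rows, compaction_time):
    """Drop rows whose expiry is at or before `compaction_time`."""
    return {key: exp for key, exp in rows.items() if exp > compaction_time}

def repair_view(rows, master_time=None, local_time=0):
    """Return the rows a repair participant would compare.

    A follower falls back to its own clock when the master did not
    send a compaction time (hypothetical parameter names).
    """
    effective = master_time if master_time is not None else local_time
    return compact(rows, effective)

rows = {"a": 100, "b": 205}  # key -> expiry timestamp, identical on both nodes

# Local clocks 199 and 210 disagree about whether "b" has expired.
node1_local = repair_view(rows, local_time=199)
node2_local = repair_view(rows, local_time=210)

# With a master-chosen time of 200, both nodes see the same rows.
node1_shared = repair_view(rows, master_time=200, local_time=199)
node2_shared = repair_view(rows, master_time=200, local_time=210)
```

Here the two nodes hold identical data, yet their locally compacted views differ, which repair would report as a mismatch; the shared time removes that artificial difference.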
Botond Dénes	2f8d77e97b	replica/table: add optional compacting to make_multishard_streaming_reader() Doing to make_multishard_streaming_reader() what the previous commit did to make_streaming_reader(). In fact, the new compaction_time parameter is simply forwarded to the make_streaming_reader() on the shard readers. Call sites are updated, but none opt in just yet.	2023-07-27 03:22:11 -04:00
Raphael S. Carvalho	050ce9ef1d	cached_file: Evict unused pages that aren't linked to LRU yet It was found that cached_file dtor can hit the following assert after OOM: cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion `_cache.empty()' failed. cached_file's dtor iterates through all entries and evicts those that are linked to LRU, under the assumption that all unused entries were linked to LRU. That's partially correct. get_page_ptr() may fetch more than 1 page due to read ahead, but it will only call cached_page::share() on the first page, the one that will be consumed now. share() is responsible for automatically placing the page into LRU once refcount drops to zero. If the read is aborted midway, before cached_file has a chance to hit the 2nd page (read ahead) in cache, it will remain there with refcount 0 and unlinked to LRU, in hope that a subsequent read will bring it out of that state. Our main user of cached_file is per-sstable index caching. If the scenario above happens, and the sstable and its associated cached_file is destroyed, before the 2nd page is hit, cached_file will not be able to clear all the cache because some of the pages are unused and not linked. A page read ahead will be linked into LRU so it doesn't sit in memory indefinitely. Also allowing for cached_file dtor to clear all cache if some of those pages brought in advance aren't fetched later. A reproducer was added. Fixes #14814. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14818	2023-07-27 00:01:46 +02:00
Nadav Har'El	056d04954c	Merge 'view_updating_consumer: account empty partitions memory usage' from Botond Dénes The view updating consumer uses `_buffer_size` to decide when to flush the accumulated mutations, passing them to the actual view building code. This `_buffer_size` is incremented every time a mutation fragment is consumed. This is not exact, as e.g. range tombstones are represented differently in the mutation object than in the fragment, but it is good enough. There is one flaw however: `_buffer_size` is not incremented when consuming a partition-start fragment. This is when the mutation object is created in the mutation rebuilder. This is not a big problem when partitions have many rows, but if the partitions are tiny, the error in accounting quickly becomes significant. If the partitions are empty, `_buffer_size` is not bumped at all, and any number of these can accumulate in the buffer. We have recently seen this causing stalls and OOM as the buffer got to immense size, only containing empty and tiny partitions. This PR fixes this by accounting the size of the freshly created `mutation` object in `_buffer_size`, after the partition-start fragment is consumed. Fixes: #14819 Closes #14821 * github.com:scylladb/scylladb: test/boost/view_build_test: add test_view_update_generator_buffering_with_empty_mutations db/view/view_updating_consumer: account for the size of mutations mutation/mutation_rebuilder*: return const mutation& from consume_new_partition() mutation/mutation: add memory_usage()	2023-07-26 20:04:28 +03:00
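The accounting flaw described above is easy to see in a toy model. The following Python sketch is not the view updating consumer; `ViewUpdateBuffer`, its `limit` and its `partition_overhead` are hypothetical names, with `partition_overhead=0` standing in for the behaviour before the fix.

```python
class ViewUpdateBuffer:
    """Toy buffer that flushes once its accounted size reaches a limit."""

    def __init__(self, limit, partition_overhead):
        self.limit = limit
        # Assumed per-partition cost of the freshly created mutation object.
        self.partition_overhead = partition_overhead
        self.size = 0
        self.partitions = 0
        self.flushes = 0

    def consume_partition_start(self):
        self.partitions += 1
        # The fix: a partition start is accounted for, too.
        self.size += self.partition_overhead
        self._maybe_flush()

    def consume_row(self, row_size):
        self.size += row_size
        self._maybe_flush()

    def _maybe_flush(self):
        if self.size >= self.limit:
            self.flushes += 1
            self.size = 0
            self.partitions = 0

def feed_empty_partitions(buf, count):
    """Consume `count` partitions that contain no rows at all."""
    for _ in range(count):
        buf.consume_partition_start()
    return buf

unfixed = feed_empty_partitions(ViewUpdateBuffer(1000, partition_overhead=0), 25)
fixed = feed_empty_partitions(ViewUpdateBuffer(1000, partition_overhead=100), 25)
```

In the unfixed case all 25 empty partitions stay buffered and no flush ever happens, whereas the fixed buffer flushes after every tenth partition.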
Avi Kivity	ff1f461a42	Merge 'Introduce tablet load balancer' from Tomasz Grabiec After this series, tablet replication can handle the scenario of bootstrapping new nodes. The ownership is distributed indirectly by the means of a load-balancer which moves tablets around in the background. See docs/dev/topology-over-raft.md for details. The implementation is by no means meant to be perfect, especially in terms of performance, and will be improved incrementally. The load balancer will be also kicked by schema changes, so that allocation/deallocation done during table creation/drop will be rebalanced. Tablet data is streamed using existing `range_streamer`, which is the infrastructure for "the old streaming". This will be later replaced by sstable transfer once integration of tablets with compaction groups is finished. Also, cleanup is not wired yet, also blocked by compaction group integration. Closes #14601 * github.com:scylladb/scylladb: tests: test_tablets: Add test for bootstraping a node storage_service: topology_coordinator: Implement tablet migration state machine tablets: Introduce tablet_mutation_builder service: tablet_allocator: Introduce tablet load balancer tablets: Introduce tablet_map::for_each_tablet() topology: Introduce get_node() token_metadata: Add non-const getter of tablet_metadata storage_service: Notify topology state machine after applying schema change storage_service: Implement stream_tablet RPC tablets: Introduce global_tablet_id stream_transfer_task, multishard_writer: Work with table sharder tablets: Turn tablet_id into a struct db: Do not create per-keyspace erm for tablet-based tables tablets: effective_replication_map: Take transition stage into account when computing replicas tablets: Store "stage" in transition info doc: Document tablet migration state machine and load balancer locator: erm: Make get_endpoints_for_reading() always return read replicas storage_service: topology_coordinator: Sleep on failure between retries 
storage_service: topology_coordinator: Simplify coordinator loop main: Require experimental raft to enable tablets	2023-07-26 12:30:29 +03:00
Botond Dénes	d0f725c1b9	test/boost/view_build_test: add test_view_update_generator_buffering_with_empty_mutations A test reproducing #14819, that is, the view update builder not flushing the buffer when only empty partitions are consumed (with only a tombstone in them).	2023-07-26 03:09:53 -04:00
Botond Dénes	ad2ddffb22	Merge 'Remove qctx from system_keyspace::save_truncation_record()' from Pavel Emelyanov The method is called by db::truncate_table_on_all_shards(), its call-chain, in turn, starts from - proxy::remote::handle_truncate() - schema_tables::merge_schema() - legacy_schema_migrator - tests All of the above are easy to get system_keyspace reference from. This, in turn, allows making the method non-static and using the query_processor reference from the system_keyspace object instead of the global qctx Closes #14778 * github.com:scylladb/scylladb: system_keyspace: Make save_truncation_record() non-static code: Pass sharded<db::system_keyspace>& to database::truncate() db: Add sharded<system_keyspace>& to legacy_schema_migrator	2023-07-26 08:48:49 +03:00
Tomasz Grabiec	5c681a1d63	tablets: Introduce tablet_mutation_builder	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	6f4a35f9ae	service: tablet_allocator: Introduce tablet load balancer Will be invoked by the topology coordinator later to decide which tablets to migrate.	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	f88220aeee	stream_transfer_task, multishard_writer: Work with table sharder So that we can use it on tablet-based tables.	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	8cf92d4c86	tablets: Turn tablet_id into a struct The IDL compiler cannot deal with enum classes like this.	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	dc2ec3f81c	tablets: Store "stage" in transition info It's needed to implement tablet migration. It stores the current step of tablet migration state machine. The state machine will be advanced by the topology change coordinator. See the "Tablet migration" section of topology-over-raft.md	2023-07-25 21:08:02 +02:00
Tomasz Grabiec	7851694eaa	locator: erm: Make get_endpoints_for_reading() always return read replicas Just a simplification. Drop the test case from token_metadata which creates pending endpoints without normal tokens. It fails after this change with exception: "sorted_tokens is empty in first_token_index!" thrown from token_metadata::first_token_index(), which is used when calculating normal endpoints. This test case is not valid, first node inserts its tokens as normal without going through bootstrap procedure.	2023-07-25 21:08:01 +02:00
Botond Dénes	3eec990e4e	Merge 'test: use different table names in simple_backlog_controller_test ' from Kefu Chai in this series, we use different table names in simple_backlog_controller_test. this test is a test exercising sstables compaction strategies. and it creates and keeps multiple tables in a single test session. but we are going to add metrics on per-table basis, and will use the table's ks and cf as the counter's labels. as the metrics subsystem does not allow multiple counters to share the same label. the test will fail when the metrics are being added. to address this problem, in this change 1. a new ctor is added for `simple_schema`, so we can create `simple_schema` with different names 2. use the new ctor in simple_backlog_controller_test Fixes #14767 Closes #14783 * github.com:scylladb/scylladb: test: use different table names in simple_backlog_controller_test test/lib/simple_schema: add ctor for customizing ks.cf test/lib/simple_schema: do not hardwire ks.cf	2023-07-25 10:26:33 +03:00
Botond Dénes	a8feb7428d	Merge 'semaphore mismatch: don't throw an error if both semaphores belong to user' from Michał Jadwiszczak If semaphore mismatch occurs, check whether both semaphores belong to user. If so, log a warning, log a `querier_cache_scheduling_group_mismatches` stat and drop cached reader instead of throwing an error. Until now, semaphore mismatch was only checked in multi-partition queries. The PR pushes the check to `querier_cache` and performs it on all `lookup_*_querier` methods. The mismatch can happen if user's scheduling group changed during a query. We don't want to throw an error then, but drop and reset cached reader. This patch doesn't solve a problem with mismatched semaphores because of changes in service levels/scheduling groups but only mitigates it. Refers: https://github.com/scylladb/scylla-enterprise/issues/3182 Refers: https://github.com/scylladb/scylla-enterprise/issues/3050 Closes: #14770 Closes #14736 * github.com:scylladb/scylladb: querier_cache: add stats of scheduling group mismatches querier_cache: check semaphore mismatch during querier lookup querier_cache: add reference to `replica::database::is_user_semaphore()` replica:database: add method to determine if semaphore is user one	2023-07-24 14:13:09 +03:00
Michał Jadwiszczak	a5fc53aa11	querier_cache: check semaphore mismatch during querier lookup Previously semaphore mismatch was checked only in multi-partition queries and if happened, an internal error was thrown. This commit pushed the check down to `querier_cache`, so each `lookup_*_querier` method will check for the mismatch. What's more, if semaphore mismatch occurs, check whether both semaphores belong to user. If so, log a warning and drop cached reader instead of throwing an error. The mismatch can happen if user's scheduling group changed during a query. We don't want to throw an error then, but drop and reset cached reader.	2023-07-21 19:05:50 +02:00
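The lookup policy from the commit above can be sketched as follows. This is a simplified Python model, not the `querier_cache` implementation: the class, its method names, the string semaphore names and `SemaphoreMismatch` are hypothetical; only the policy (warn, count and drop for user semaphores, error otherwise) is taken from the commit message.

```python
import logging

class SemaphoreMismatch(Exception):
    """Raised when a cached reader belongs to an unexpected semaphore."""

class QuerierCache:
    def __init__(self, user_semaphores):
        self._user_semaphores = set(user_semaphores)
        self._entries = {}  # key -> (querier, semaphore name)
        self.scheduling_group_mismatches = 0

    def insert(self, key, querier, semaphore):
        self._entries[key] = (querier, semaphore)

    def lookup(self, key, current_semaphore):
        entry = self._entries.pop(key, None)
        if entry is None:
            return None
        querier, cached_semaphore = entry
        if cached_semaphore == current_semaphore:
            return querier
        if {cached_semaphore, current_semaphore} <= self._user_semaphores:
            # Both semaphores belong to the user: warn, count the
            # mismatch and drop the cached reader instead of failing.
            logging.warning("semaphore mismatch for %s, dropping cached reader", key)
            self.scheduling_group_mismatches += 1
            return None
        raise SemaphoreMismatch(key)
```

A caller that gets `None` back simply creates a fresh reader, which is what makes dropping the cached one a safe mitigation.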
Michał Jadwiszczak	e5c965b280	querier_cache: add reference to `replica::database::is_user_semaphore()`	2023-07-21 18:58:57 +02:00
Kefu Chai	d78c6d5f50	test: use different table names in simple_backlog_controller_test in `simple_backlog_controller_test`, we need to have multiple tables at the same time. but the default constructor of `simple_schema` always creates schema with the table name of "ks.cf". we are going to have a per-table metrics. and the new metric group will use the table name as its counter labels, so we need to either disable this per-table metrics or use a different table name for each table. as in real world, we don't have multiple tables at the same time. it would be better to stop reusing the same table name in a single test session. so, in this change, we use a random cf_name for each of the created table. Fixes #14767 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-21 19:08:29 +08:00
Pavel Emelyanov	eaeffcdb81	code: Pass sharded<db::system_keyspace>& to database::truncate() The arguments goes via the db::(drop\|truncate)_table_on_all_shards() pair of calls that start from - storage_proxy::remote: has its sys.ks reference already - schema_tables::merge_schema: has sys.ks argument already - legacy_schema_migrator: the reference was added by previous patch - tests: run in cql_test_env with sys.ks on board Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-21 13:11:59 +03:00
Nadav Har'El	5860820934	Merge 'mutation/mutation_compactor: validate the input stream' from Botond Dénes The mutation compactor has a validator which it uses to validate the stream of mutation fragments that passes through it. This validator is supposed to validate the stream as it enters the compactor, as opposed to its compacted form (output). This was true for most fragment kinds except range tombstones, as purged range tombstones were not visible to the validator for the most part. This mistake was introduced by https://github.com/scylladb/scylladb/commit `e2c9cdb576`, which itself was a flawed attempt at fixing an error seen because purged tombstones were not terminated by the compactor. This patch corrects this mistake by fixing the above problem properly: on page-cut, if the validator has an active tombstone, a closing tombstone is generated for it, to avoid the false-positive error. With this, range tombstones can be validated again as they come in. The existing unit test checking the validation in the compactor is greatly expanded to check all (I hope) different validation scenarios. Closes #13817 * github.com:scylladb/scylladb: test/mutation_test: test_compactor_validator_sanity_test mutation/mutation_compactor: fix indentation mutation/mutation_compactor: validate the input stream mutation: mutation_fragment_stream_validating_filter: add accessor to underlying validator readers: reader-from-fragment: don't modify stream when created without range	2023-07-21 00:26:46 +03:00
Pavel Emelyanov	98609e2115	Merge 's3/test: close using deferred_close() or deferred()' from Kefu Chai let's use RAII to tear down the client and the input file, so we can always perform the cleanups even if the test throws. Closes #14765 * github.com:scylladb/scylladb: s3/test: use seastar::deferred() to perform cleanup s3/test: close using deferred_close()	2023-07-20 20:05:34 +03:00
Botond Dénes	53da97416a	Merge 'Remove qctx from system.paxos table access methods' from Pavel Emelyanov The "fix" is straightforward -- callers of system_keyspace::paxos methods need to get system keyspace from somewhere. This time the only caller is storage_proxy::remote that can have system keyspace via direct dependency reference. Closes #14758 * github.com:scylladb/scylladb: db/system_keyspace: Move and use qctx::execute_cql_with_timeout() db/system_keyspace: Make paxos methods non-static service/paxos: Add db::system_keyspace& argument to some methods test: Optionally initialize proxy remote for cql_test_env proxy/remote: Keep sharded<db::system_keyspace>& dependency	2023-07-20 16:53:25 +03:00
Botond Dénes	e62325babc	Merge 'Compaction reshard task' from Aleksandra Martyniuk Task manager tasks covering reshard compaction. Reattempt on https://github.com/scylladb/scylladb/pull/14044. Bugfix for https://github.com/scylladb/scylladb/issues/14618 is squashed with 95191f4. Regression test added. Closes #14739 * github.com:scylladb/scylladb: test: add test for resharding with non-empty owned_ranges_ptr test: extend test_compaction_task.py to test resharding compaction compaction: add shard_reshard_sstables_compaction_task_impl compaction: invoke resharding on sharded database compaction: move run_resharding_jobs into reshard_sstables_compaction_task_impl::run() compaction: add reshard_sstables_compaction_task_impl compaction: create resharding_compaction_task_impl	2023-07-20 16:43:22 +03:00
Botond Dénes	a35f4f6985	test/mutation_test: test_compactor_validator_sanity_test Greatly expand this test to check that the compactor validates the input stream properly. The test is renamed (the _sanity_test suffix is removed) to reflect the expanded scope.	2023-07-20 08:48:50 -04:00
Raphael S. Carvalho	3117f2f066	tests: Add test for table's mutation source excluding staging Commit `f5e3b8df6d` introduced an optimization for as_mutation_source_excluding_staging() and added a test that verifies correctness of single key and range reads based on supplied predicates. This new test aims to improve the coverage by testing directly both table::as_mutation_source() and as_mutation_source_excluding_staging(), therefore guaranteeing that both supply the correct predicate to sstable set. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14763	2023-07-20 07:14:36 +03:00
Kefu Chai	77faec4f38	s3/test: use seastar::deferred() to perform cleanup let's use RAII to remove the object use as a fixture, so we don't leave some object in the bucket for testing. this might interfere with other tests which share the same minio server with the test which fails to do its clean up if an exception is thrown. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-20 10:04:54 +08:00
Kefu Chai	7a9c802fc3	s3/test: close using deferred_close() let's use RAII to tear down the client and the input file, so we can always perform the cleanups even if the test throws. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-20 10:04:54 +08:00
Pavel Emelyanov	ea9db1b35c	Merge 'cql3: expr: remove the default constructor' from Avi Kivity `expression`'s default constructor is dangerous as it can leak into computations and generate surprising results. Fix that by removing the default constructor. This is made somewhat difficult by the parser generator's reliance on default construction, and we need to expand our workaround (`uninitialized<>`) capabilities to do so. We also remove some incidental uses of default-constructed expressions. Closes #14706 * github.com:scylladb/scylladb: cql3: expr: make expression non-default-constructible cql3: grammar: don't default-construct expressions cql3: grammar: improve uninitialized<> flexibility cql3: grammar: adjust uninitialized<> wrapper test: expr_test: don't invoke expression's default constructor cql3: statement_restrictions: explicitly initialize expressions in index match code cql3: statement_restrictions: explicitly intitialize some expression fields cql3: statement_restrictions: avoid expression's default constructor when classifying restrictions cql3: expr: prepare_expression: avoid default-constructed expression cql3: broadcast_tables: prepare new_value without relying on expression default constructor	2023-07-19 21:46:03 +03:00
Pavel Emelyanov	b4fc1076e3	test: Optionally initialize proxy remote for cql_test_env Some test cases that use cql_test_env involve paxos state updates. Since this update now goes via proxy->remote->system_keyspace, those test cases need cql_test_env to initialize the remote part of the proxy too Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-19 19:32:10 +03:00
Aleksandra Martyniuk	bfb81b8cdd	test: add test for resharding with non-empty owned_ranges_ptr	2023-07-19 17:19:10 +02:00
Kefu Chai	665135553d	build: cmake: remove nonexistent test the test of "type_json_test" was added locally, and has not landed on master. but it somehow was spilled into `87170bf07a` by accident. so, let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14749	2023-07-19 11:58:34 +03:00
Avi Kivity	460b28d067	Merge 'Introduce `SELECT MUTATION FRAGMENTS` statement' from Botond Dénes SELECT MUTATION FRAGMENTS is a new select statement sub-type, which allows dumping the underlying mutations making up the data of a given table. The output of this statement is mutation-fragments presented as CQL rows. Each row corresponds to a mutation-fragment. Consequently, the output of this statement has a schema that is different than that of the underlying table. The output schema is derived from the table's schema, as follows: * The table's partition key is copied over as-is * The clustering key is formed from the following columns: - mutation_source (text): the kind of the mutation source, one of: memtable, row-cache or sstable; and the identifier of the individual mutation source. - partition_region (int): represents the enum with the same name. - the copy of the table's clustering columns - position_weight (int): -1, 0 or 1, has the same meaning as that in position_in_partition, used to disambiguate range tombstone changes with the same clustering key, from rows and from each other. * The following regular columns: - metadata (text): the JSON representation of the mutation-fragment's metadata. - value (text): the JSON representation of the mutation-fragment's value. Data is always read from the local replica, on which the query is executed. Migrating queries between coordinators is forbidden. More details in the documentation commit (last commit). 
Example: ```cql cqlsh> CREATE TABLE ks.tbl (pk int, ck int, v int, PRIMARY KEY (pk, ck)); cqlsh> DELETE FROM ks.tbl WHERE pk = 0; cqlsh> DELETE FROM ks.tbl WHERE pk = 0 AND ck > 0 AND ck < 2; cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (0, 0, 0); cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (0, 1, 0); cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (0, 2, 0); cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (1, 0, 0); cqlsh> SELECT * FROM ks.tbl; pk \| ck \| v ----+----+--- 1 \| 0 \| 0 0 \| 0 \| 0 0 \| 1 \| 0 0 \| 2 \| 0 (4 rows) cqlsh> SELECT * FROM MUTATION_FRAGMENTS(ks.tbl); pk \| mutation_source \| partition_region \| ck \| position_weight \| metadata \| mutation_fragment_kind \| value ----+-----------------+------------------+----+-----------------+--------------------------------------------------------------------------------------------------------------------------+------------------------+----------- 1 \| memtable:0 \| 0 \| \| \| {"tombstone":{}} \| partition start \| null 1 \| memtable:0 \| 2 \| 0 \| 0 \| {"marker":{"timestamp":1688122873341627},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122873341627}}} \| clustering row \| {"v":"0"} 1 \| memtable:0 \| 3 \| \| \| null \| partition end \| null 0 \| memtable:0 \| 0 \| \| \| {"tombstone":{"timestamp":1688122848686316,"deletion_time":"2023-06-30 11:00:48z"}} \| partition start \| null 0 \| memtable:0 \| 2 \| 0 \| 0 \| {"marker":{"timestamp":1688122860037077},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122860037077}}} \| clustering row \| {"v":"0"} 0 \| memtable:0 \| 2 \| 0 \| 1 \| {"tombstone":{"timestamp":1688122853571709,"deletion_time":"2023-06-30 11:00:53z"}} \| range tombstone change \| null 0 \| memtable:0 \| 2 \| 1 \| 0 \| {"marker":{"timestamp":1688122864641920},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122864641920}}} \| clustering row \| {"v":"0"} 0 \| memtable:0 \| 2 \| 2 \| -1 \| {"tombstone":{}} \| range tombstone change \| null 0 \| 
memtable:0 \| 2 \| 2 \| 0 \| {"marker":{"timestamp":1688122868706989},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122868706989}}} \| clustering row \| {"v":"0"} 0 \| memtable:0 \| 3 \| \| \| null \| partition end \| null (10 rows) ``` Perf simple query: ``` /build/release/scylla perf-simple-query -c1 -m2G --duration=60 ``` Before: ``` median 141596.39 tps ( 62.1 allocs/op, 13.1 tasks/op, 43688 insns/op, 0 errors) median absolute deviation: 137.15 maximum: 142173.32 minimum: 140492.37 ``` After: ``` median 141889.95 tps ( 62.1 allocs/op, 13.1 tasks/op, 43692 insns/op, 0 errors) median absolute deviation: 167.04 maximum: 142380.26 minimum: 141025.51 ``` Fixes: https://github.com/scylladb/scylladb/issues/11130 Closes #14347 * github.com:scylladb/scylladb: docs/operating-scylla/admin-tools: add documentation for the SELECT * FROM MUTATION_FRAGMENTS() statement test/topology_custom: add test_select_from_mutation_fragments.py test/boost/database_test: add test for mutation_dump/generate_output_schema_from_underlying_schema test/cql-pytest: add test_select_mutation_fragments.py test/cql-pytest: move scylla_data_dir fixture to conftest.py cql3/statements: wire-in mutation_fragments_select_statement cql3/restrictions/statement_restrictions: fix indentation cql3/restrictions/statement_restrictions: add check_indexes flag cql3/statments/select_statement: add mutation_fragments_select_statement cql3: add SELECT MUTATION FRAGMENTS select statement sub-type service/pager: allow passing a query functor override service/storage_proxy: un-embed coordinator_query_options replica: add mutation_dump replica: extract query_state into own header replica/table: add make_nonpopulating_cache_reader() replica/table: add select_memtables_as_mutation_sources() tools,mutation: extract the low-level json utilities into mutation/json.hh tools/json_writer: fold SstableKey() overloads into callers tools/json_writer: allow writing metadata and value separately tools/json_writer: 
split mutation_fragment_json_writer in two classes tools/json_writer: allow passing custom std::ostream to json_writer	2023-07-19 11:54:11 +03:00
Asias He	c29e7e4644	Revert "Revert "view_update_generator: Increase the registration_queue_size"" This reverts commit `4cee8206f8`. The test is fixed. Closes #14750	2023-07-19 11:46:28 +03:00
Avi Kivity	503d21b570	cql3: expr: avoid separating column_mutation_attribute from its column_value when levellizing aggregation depth Since `ec77172b4b` (" Merge 'cql3: convert the SELECT clause evaluation phase to expressions' from Avi Kivity"), we rewrite non-aggregating selectors to include an aggregation, in order to have the rest of the code either deal with no aggregation, or all selectors aggregating, with nothing in between. This is done by wrapping column selectors with "first" function calls: col -> first(col). This broke non-aggregating selectors that included the ttl() or writetime() pseudo functions. This is because we rewrote them as writetime(first(col)), and writetime() isn't a function that operates on any values; it operates on mutations and so must have access to a column, not an expression. Fix by detecting this scenario and rewriting the expression as first(writetime(col)). Unit and integration tests are added. Fixes #14715. Closes #14716	2023-07-19 11:35:01 +03:00
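The rewrite described in the commit above can be illustrated on a tiny expression tree. This Python sketch is not the cql3 expression code; `Column`, `Call`, `wrap_with_first` and the `MUTATION_ATTRIBUTES` set are hypothetical stand-ins for the real expression types.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    name: str

@dataclass(frozen=True)
class Call:
    func: str
    arg: object

# Pseudo functions that operate on a column's mutation data, not on values.
MUTATION_ATTRIBUTES = {"writetime", "ttl"}

def wrap_with_first(expr):
    """Make a non-aggregating selector aggregating by adding first().

    writetime(col) and ttl(col) are wrapped as a unit, so the
    attribute is never separated from its column.
    """
    if (isinstance(expr, Call) and expr.func in MUTATION_ATTRIBUTES
            and isinstance(expr.arg, Column)):
        return Call("first", expr)  # first(writetime(col))
    if isinstance(expr, Column):
        return Call("first", expr)  # col -> first(col)
    if isinstance(expr, Call):
        return Call(expr.func, wrap_with_first(expr.arg))
    return expr
```

Without the first branch, `writetime(v)` would fall through to the generic case and become `writetime(first(v))`, which is the broken form the commit describes.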
Botond Dénes	7540e62522	test/boost/database_test: add test for mutation_dump/generate_output_schema_from_underlying_schema Checking that the generated schema has deterministic id and version.	2023-07-19 01:28:28 -04:00
Kamil Braun	6f22ed9145	Merge 'raft: move group0_state_machine::merger to its own header and add unit test for it' from Mikołaj Grzebieluch Move `merger` to its own header file. Leave the logic of applying commands to `group0_state_machine`. Remove `group0_state_machine` dependencies from `merger` to make it an independent module. Add a test that checks if `group0_state_machine_merger` preserves timeuuid monotonicity. `last_id()` should be equal to the largest timeuuid, based on its timestamps. This test combines two commands in the reverse order of their timeuuids. The timeuuids yield different results when compared in both timeuuid order and uuid order. Consequently, the resulting command should have a more recent timeuuid. Fixes #14568 Closes #14682 * github.com:scylladb/scylladb: raft: group0_state_machine_merger: add test for timeuuid ordering raft: group0_state_machine: extract merger to its own header	2023-07-18 17:43:50 +02:00
Raphael S. Carvalho	da18a9badf	Fix test.py with compaction groups test.py with --x-log2-compaction-groups option rotted a little bit. Some boost tests added later didn't use the correct header which parses the option or they didn't adjust suite.yaml. Perhaps it's time to set up a weekly (or bi-weekly) job to verify there are no regressions with it. It's important as it stresses the data plane for tablets reusing the existing tests available. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14732	2023-07-18 16:57:11 +03:00
Botond Dénes	7d5cca1958	Merge 'Regular compaction task' from Aleksandra Martyniuk Task manager's tasks covering regular compaction. Uses multiple inheritance on already existing regular_compaction_task_executor to keep track of the operation with task manager. Closes #14377 * github.com:scylladb/scylladb: test: add regular compaction task test compaction: turn regular_compaction_task_executor into regular_compaction_task_impl compaction: add compaction_manager::perform_compaction method test: modify sstable_compaction_test.cc compaction: add regular_compaction_task_impl compaction: switch state after compaction is done	2023-07-18 16:52:53 +03:00
Michał Jadwiszczak	62ced66702	schema: add scylla specific options to schema description Add `paxos_grace_seconds`, `tombstone_gc`, `cdc` and `synchronous_updates` options to schema description. Fixes: #12389 Fixes: scylladb/scylla-enterprise#2979 Closes #14275	2023-07-18 11:16:19 +03:00
Botond Dénes	21ff6efd74	test/boost/view_build_test: improve test_view_update_generator_register_semaphore_unit_leak By making it independent of the number of units the view update generator's registration semaphore is created with. We want to increase this number significantly and that would destabilize this test significantly. To prevent this, detach the test from the number of units completely, while still preserving the original intent behind it, as best as it could be determined. Closes #14727	2023-07-18 09:18:28 +03:00
Botond Dénes	b3cb611be7	Merge 'treewide: enable -Wsign-compare and address the warnings from this option' from Kefu Chai in order to identify the problems caused by integer type promotion when comparing unsigned and signed integers, in this series, we - address the warnings raised by `-Wsign-compare` compiler option - add `-Wsign-compare` compiler option to the building systems Closes #14652 * github.com:scylladb/scylladb: treewide: use unsigned variable to compare with unsigned treewide: compare signed and unsigned using std::cmp_*()	2023-07-18 09:05:30 +03:00
Botond Dénes	f03efd7ea9	Merge 'build: cmake: fix the build of some tests' from Kefu Chai this series addresses the FTBFS of tests with CMake, and also checks for the unknown parameters in `add_scylla_test()` Closes #14650 * github.com:scylladb/scylladb: build: cmake: build SEASTAR tests as SEASTAR tests build: cmake: error out if found unknown keywords build: cmake: link tests against necessary libraries	2023-07-18 06:51:40 +03:00
Kefu Chai	fa3129fa29	treewide: use unsigned variable to compare with unsigned sometimes we initialize a loop variable like `auto i = 0;` or `int i = 0;`. since the type of `0` is `int`, what we get is a variable of `int` type, but later we compare it with an unsigned number. if we compile the source code with the `-Werror=sign-compare` option, the compiler warns when it sees this. in general, this is a false alarm, as we are not likely to have a wrong comparison result here. but in order to prevent issues due to integer promotion in comparisons in other places, and to prepare for enabling `-Werror=sign-compare`, let's use unsigned to silence this warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-18 10:27:18 +08:00
Kefu Chai	3129ae3c8c	treewide: compare signed and unsigned using std::cmp_*() when comparing signed and unsigned numbers, the compiler promotes the signed number to a common type -- in this case, the unsigned type -- so they can be compared. usually this is harmless, but sometimes it matters: after the promotion, the comparison yields the wrong result. this can be demonstrated using a short sample like: ``` int main(int argc, char** argv) { int x = -1; unsigned y = 2; fmt::print("{}\n", x < y); return 0; } ``` this error can be identified by `-Werror=sign-compare`, but before enabling this compiler option, let's use `std::cmp_*()` to compare them. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-18 10:27:18 +08:00
Aleksandra Martyniuk	ab4ae6b84a	test: modify sstable_compaction_test.cc Modify sstable_compaction_test.cc so that it does not depend on how quick compaction manager stats are updated after compaction is triggered. It is required since in the following changes the context may switch before the stats are updated.	2023-07-17 15:54:33 +02:00
Mikołaj Grzebieluch	bdf3959ae6	raft: group0_state_machine_merger: add test for timeuuid ordering This test checks if `group0_state_machine_merger` preserves timeuuid monotonicity. `last_id()` should be equal to the largest timeuuid, based on its timestamps. This test combines two commands in the reverse order of their timeuuids. The timeuuids yield different results when compared in both timeuuid order and uuid order. Consequently, the resulting command should have a more recent timeuuid. Closes #14568	2023-07-17 15:51:20 +02:00

2731 Commits