scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-09 16:33:35 +00:00

Author	SHA1	Message	Date
Botond Dénes	2f8d77e97b	replica/table: add optional compacting to make_multishard_streaming_reader() Doing to make_multishard_streaming_reader() what the previous commit did to make_streaming_reader(). In fact, the new compaction_time parameter is simply forwarded to the make_streaming_reader() on the shard readers. Call sites are updated, but none opt in just yet.	2023-07-27 03:22:11 -04:00
Raphael S. Carvalho	050ce9ef1d	cached_file: Evict unused pages that aren't linked to LRU yet It was found that cached_file dtor can hit the following assert after OOM: `cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion '_cache.empty()' failed.` cached_file's dtor iterates through all entries and evicts those that are linked to LRU, under the assumption that all unused entries were linked to LRU. That's only partially correct. get_page_ptr() may fetch more than 1 page due to read ahead, but it will only call cached_page::share() on the first page, the one that will be consumed now. share() is responsible for automatically placing the page into LRU once its refcount drops to zero. If the read is aborted midway, before cached_file has a chance to hit the 2nd page (read ahead) in cache, that page will remain there with refcount 0 and unlinked from LRU, in the hope that a subsequent read will bring it out of that state. Our main user of cached_file is per-sstable index caching. If the scenario above happens, and the sstable and its associated cached_file are destroyed before the 2nd page is hit, cached_file will not be able to clear all the cache because some of the pages are unused and not linked. With this fix, a page read ahead is linked into LRU so it doesn't sit in memory indefinitely. This also allows the cached_file dtor to clear all of the cache even if some of the pages brought in advance are never fetched later. A reproducer was added. Fixes #14814. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14818	2023-07-27 00:01:46 +02:00
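A minimal Python sketch of the fix, assuming a toy model rather than Scylla's C++ `cached_file`: `ToyCachedFile`, `Page`, `get_page()`, `release()` and `close()` are hypothetical names, and the "LRU" is modeled as membership in a dict only (no eviction order). Pages fetched purely as read-ahead are linked into the LRU immediately, so the destructor analogue can evict every unused page and its emptiness check holds.

```python
class Page:
    def __init__(self, idx):
        self.idx = idx
        self.refcount = 0
        self.in_lru = False


class ToyCachedFile:
    """Hypothetical model of a page cache with read-ahead (not Scylla's cached_file)."""

    def __init__(self, lru):
        self.cache = {}  # page index -> Page
        self.lru = lru   # shared LRU membership, keyed by (owner id, page index)

    def _link(self, page):
        if not page.in_lru:
            self.lru[(id(self), page.idx)] = page
            page.in_lru = True

    def get_page(self, idx, read_ahead=1):
        # Fetch the requested page plus `read_ahead` following pages.
        for i in range(idx, idx + 1 + read_ahead):
            if i not in self.cache:
                self.cache[i] = Page(i)
                if i != idx:
                    # The fix: a read-ahead page is unused (refcount 0),
                    # so link it into the LRU right away.
                    self._link(self.cache[i])
        page = self.cache[idx]
        # share() analogue: pin the consumed page and unlink it from the LRU.
        if page.in_lru:
            del self.lru[(id(self), idx)]
            page.in_lru = False
        page.refcount += 1
        return page

    def release(self, page):
        page.refcount -= 1
        if page.refcount == 0:
            self._link(page)

    def close(self):
        # Destructor analogue: evict every page that is linked into the LRU.
        for idx, page in list(self.cache.items()):
            if page.in_lru:
                del self.lru[(id(self), idx)]
                del self.cache[idx]
        assert not self.cache, "unused page was never linked into the LRU"
```

Without the `_link()` call in the read-ahead branch, page 1 would stay in `cache` with refcount 0 after the read is abandoned, and `close()` would trip its assertion, mirroring the failure described in #14814.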
Nadav Har'El	59c1498338	test/alternator: don't forget to delete tables on test failures Most of the Alternator tests are careful to unconditionally remove the test tables, even if the test fails. This is important when testing on a shared database (e.g., DynamoDB) but also useful to make clean shutdown faster as there should be no user table to flush. We missed a few such cases in test_gsi.py, and this patch corrects them. We do this by using the context manager new_test_table() - which automatically deletes the table when done - instead of the function create_test_table() which needs an explicit delete at the end. There are no functional changes in this patch - most of the lines changed are just reindents. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #14835	2023-07-26 21:51:22 +03:00
Nadav Har'El	056d04954c	Merge 'view_updating_consumer: account empty partitions memory usage' from Botond Dénes The view updating consumer uses `_buffer_size` to decide when to flush the accumulated mutations, passing them to the actual view building code. This `_buffer_size` is incremented every time a mutation fragment is consumed. This is not exact, as e.g. range tombstones are represented differently in the mutation object than in the fragment, but it is good enough. There is one flaw however: `_buffer_size` is not incremented when consuming a partition-start fragment. This is when the mutation object is created in the mutation rebuilder. This is not a big problem when partitions have many rows, but if the partitions are tiny, the error in accounting quickly becomes significant. For empty partitions, `_buffer_size` is not bumped at all, and any number of these can accumulate in the buffer. We have recently seen this causing stalls and OOM as the buffer grew to an immense size, containing only empty and tiny partitions. This PR fixes this by accounting the size of the freshly created `mutation` object in `_buffer_size`, after the partition-start fragment is consumed. Fixes: #14819 Closes #14821 * github.com:scylladb/scylladb: test/boost/view_build_test: add test_view_update_generator_buffering_with_empty_mutations db/view/view_updating_consumer: account for the size of mutations mutation/mutation_rebuilder*: return const mutation& from consume_new_partition() mutation/mutation: add memory_usage()	2023-07-26 20:04:28 +03:00
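A toy Python model of the accounting change, assuming arbitrary size units: `ToyViewUpdatingConsumer` and `MUTATION_OVERHEAD` are hypothetical and do not reflect the C++ `view_updating_consumer` API or the real `memory_usage()` values. Charging a fixed cost for each newly created mutation at partition start means a stream of empty partitions still reaches the flush threshold.

```python
class ToyViewUpdatingConsumer:
    """Hypothetical model of size-based flushing; sizes are arbitrary units."""

    MUTATION_OVERHEAD = 64  # assumed footprint of an empty mutation object

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold
        self.buffer = []
        self.buffer_size = 0
        self.flushes = 0

    def consume_partition_start(self, key):
        self.buffer.append({"key": key, "rows": []})
        # The fix: account for the freshly created mutation itself,
        # so even empty partitions grow buffer_size.
        self.buffer_size += self.MUTATION_OVERHEAD

    def consume_row(self, row_size):
        self.buffer[-1]["rows"].append(row_size)
        self.buffer_size += row_size

    def consume_partition_end(self):
        # Flush only between partitions, once the threshold is reached.
        if self.buffer_size >= self.flush_threshold:
            self.flushes += 1  # stands in for handing mutations to view building
            self.buffer.clear()
            self.buffer_size = 0
```

With `MUTATION_OVERHEAD` set to 0 (the pre-fix behavior), a stream of empty partitions would never flush and every one of them would stay in the buffer.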
Nadav Har'El	d2ca600eec	test/*/run: kill Scylla with SIGTERM Today, test/*/run always kills Scylla at the end of the test with SIGKILL (kill -9), so the Scylla shutdown code doesn't run. It was believed that a clean shutdown would take a long time, but in fact, it turns out that 99% of the shutdown time was a silly sleep in the gossip code, which this patch disables with the "--shutdown-announce-in-ms" option. After enabling this option, clean shutdown takes (in a dev build on my laptop) just 0.02 seconds. It's worth noting that this shutdown has no real work to do (no tables to flush, and so on) because the pytest framework removes all the tables in its own fixture cleanup phase. So in this patch, to kill Scylla we use SIGTERM (15) instead of SIGKILL. We then wait up to a timeout of 10 seconds (much, much more than 0.02 seconds!) for Scylla to exit. If for some reason it didn't exit (e.g., it hung during the shutdown), it is killed again with SIGKILL, which is guaranteed to succeed. This change gives us two advantages: 1. Every test run with test/*/run exercises the shutdown path. It is perhaps excessive, but since the shutdown is so quick, there is no big downside. 2. In a test-coverage run, a clean shutdown allows flushing the counter files, which wasn't possible when Scylla was killed with kill -9. Fixes #8543 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #14825	2023-07-26 14:06:24 +03:00
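The stop sequence described above (SIGTERM, a bounded wait, then SIGKILL) can be sketched with Python's `subprocess` module. `stop_scylla()` is a hypothetical helper, not the code in test/*/run; only the 10-second default timeout comes from the commit message.

```python
import signal
import subprocess


def stop_scylla(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """Ask the process to shut down cleanly, escalating to SIGKILL on timeout."""
    proc.send_signal(signal.SIGTERM)  # lets the shutdown path run
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Hung during shutdown: SIGKILL cannot be caught or ignored.
        proc.kill()
        return proc.wait()
```

On POSIX, `Popen.kill()` sends SIGKILL, and a negative return code such as `-15` means the child was terminated by that signal number.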
Avi Kivity	ff1f461a42	Merge 'Introduce tablet load balancer' from Tomasz Grabiec After this series, tablet replication can handle the scenario of bootstrapping new nodes. The ownership is distributed indirectly by means of a load-balancer which moves tablets around in the background. See docs/dev/topology-over-raft.md for details. The implementation is by no means meant to be perfect, especially in terms of performance, and will be improved incrementally. The load balancer will also be kicked by schema changes, so that allocation/deallocation done during table creation/drop will be rebalanced. Tablet data is streamed using the existing `range_streamer`, which is the infrastructure for "the old streaming". This will later be replaced by sstable transfer once integration of tablets with compaction groups is finished. Cleanup is not wired up yet either; it is also blocked by compaction group integration. Closes #14601 * github.com:scylladb/scylladb: tests: test_tablets: Add test for bootstraping a node storage_service: topology_coordinator: Implement tablet migration state machine tablets: Introduce tablet_mutation_builder service: tablet_allocator: Introduce tablet load balancer tablets: Introduce tablet_map::for_each_tablet() topology: Introduce get_node() token_metadata: Add non-const getter of tablet_metadata storage_service: Notify topology state machine after applying schema change storage_service: Implement stream_tablet RPC tablets: Introduce global_tablet_id stream_transfer_task, multishard_writer: Work with table sharder tablets: Turn tablet_id into a struct db: Do not create per-keyspace erm for tablet-based tables tablets: effective_replication_map: Take transition stage into account when computing replicas tablets: Store "stage" in transition info doc: Document tablet migration state machine and load balancer locator: erm: Make get_endpoints_for_reading() always return read replicas storage_service: topology_coordinator: Sleep on failure between retries
storage_service: topology_coordinator: Simplify coordinator loop main: Require experimental raft to enable tablets	2023-07-26 12:30:29 +03:00
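The idea of a balancer that moves tablets from overloaded nodes to underloaded ones can be illustrated with a greedy Python sketch. This is a hypothetical simplification, not ScyllaDB's `tablet_allocator` algorithm: it ignores racks, shards, replication factor and in-flight migrations, and only plans the moves that the topology coordinator would then carry out.

```python
def plan_migrations(tablets_per_node: dict[str, list[int]]) -> list[tuple[int, str, str]]:
    """Plan (tablet_id, source, destination) moves that even out tablet counts.

    Greedy sketch: repeatedly move one tablet from the most loaded node
    to the least loaded node until counts differ by at most one.
    """
    load = {node: list(tablets) for node, tablets in tablets_per_node.items()}
    moves = []
    while True:
        src = max(load, key=lambda n: len(load[n]))
        dst = min(load, key=lambda n: len(load[n]))
        if len(load[src]) - len(load[dst]) <= 1:
            return moves
        tablet = load[src].pop()
        load[dst].append(tablet)
        moves.append((tablet, src, dst))
```

For example, bootstrapping an empty node `n3` next to two nodes holding six tablets each yields four moves, all targeting `n3`, leaving four tablets per node.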
Botond Dénes	d0f725c1b9	test/boost/view_build_test: add test_view_update_generator_buffering_with_empty_mutations A test reproducing #14819, that is, the view update builder not flushing the buffer when only empty partitions are consumed (with only a tombstone in them).	2023-07-26 03:09:53 -04:00
Botond Dénes	ad2ddffb22	Merge 'Remove qctx from system_keyspace::save_truncation_record()' from Pavel Emelyanov The method is called by db::truncate_table_on_all_shards(), whose call chain, in turn, starts from - proxy::remote::handle_truncate() - schema_tables::merge_schema() - legacy_schema_migrator - tests. From all of the above it is easy to get a system_keyspace reference. This, in turn, allows making the method non-static and using the query_processor reference from the system_keyspace object instead of the global qctx. Closes #14778 * github.com:scylladb/scylladb: system_keyspace: Make save_truncation_record() non-static code: Pass sharded<db::system_keyspace>& to database::truncate() db: Add sharded<system_keyspace>& to legacy_schema_migrator	2023-07-26 08:48:49 +03:00
Tomasz Grabiec	ae8ffe23fc	tests: test_tablets: Add test for bootstraping a node	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	5c681a1d63	tablets: Introduce tablet_mutation_builder	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	6f4a35f9ae	service: tablet_allocator: Introduce tablet load balancer Will be invoked by the topology coordinator later to decide which tablets to migrate.	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	f88220aeee	stream_transfer_task, multishard_writer: Work with table sharder So that we can use it on tablet-based tables.	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	8cf92d4c86	tablets: Turn tablet_id into a struct The IDL compiler cannot deal with enum classes like this.	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	dc2ec3f81c	tablets: Store "stage" in transition info It's needed to implement tablet migration. It stores the current step of tablet migration state machine. The state machine will be advanced by the topology change coordinator. See the "Tablet migration" section of topology-over-raft.md	2023-07-25 21:08:02 +02:00
Tomasz Grabiec	7851694eaa	locator: erm: Make get_endpoints_for_reading() always return read replicas Just a simplification. Drop the test case from token_metadata which creates pending endpoints without normal tokens. It fails after this change with the exception "sorted_tokens is empty in first_token_index!" thrown from token_metadata::first_token_index(), which is used when calculating normal endpoints. This test case is not valid: the first node inserts its tokens as normal without going through the bootstrap procedure.	2023-07-25 21:08:01 +02:00
Tomasz Grabiec	b294932cf1	main: Require experimental raft to enable tablets Tablets depend on the topology changes on raft feature. Drop "tablets" from suite.yaml of the topology/ suite, which doesn't use tablets anymore.	2023-07-25 21:08:01 +02:00
Botond Dénes	3eec990e4e	Merge 'test: use different table names in simple_backlog_controller_test ' from Kefu Chai In this series, we use different table names in simple_backlog_controller_test. This test exercises sstable compaction strategies, and it creates and keeps multiple tables in a single test session. But we are going to add per-table metrics, which will use the table's ks and cf as the counters' labels. Since the metrics subsystem does not allow multiple counters to share the same labels, the test would fail once the metrics are added. To address this problem, this change: 1. adds a new ctor for `simple_schema`, so we can create a `simple_schema` with different names; 2. uses the new ctor in simple_backlog_controller_test. Fixes #14767 Closes #14783 * github.com:scylladb/scylladb: test: use different table names in simple_backlog_controller_test test/lib/simple_schema: add ctor for customizing ks.cf test/lib/simple_schema: do not hardwire ks.cf	2023-07-25 10:26:33 +03:00
Botond Dénes	a8feb7428d	Merge 'semaphore mismatch: don't throw an error if both semaphores belong to user' from Michał Jadwiszczak If a semaphore mismatch occurs, check whether both semaphores belong to the user. If so, log a warning, bump the `querier_cache_scheduling_group_mismatches` stat and drop the cached reader instead of throwing an error. Until now, semaphore mismatch was only checked in multi-partition queries. The PR pushes the check down to `querier_cache` and performs it in all `lookup_*_querier` methods. The mismatch can happen if the user's scheduling group changed during a query. We don't want to throw an error then, but rather drop and reset the cached reader. This patch doesn't solve the problem of mismatched semaphores caused by changes in service levels/scheduling groups; it only mitigates it. Refers: https://github.com/scylladb/scylla-enterprise/issues/3182 Refers: https://github.com/scylladb/scylla-enterprise/issues/3050 Closes: #14770 Closes #14736 * github.com:scylladb/scylladb: querier_cache: add stats of scheduling group mismatches querier_cache: check semaphore mismatch during querier lookup querier_cache: add reference to `replica::database::is_user_semaphore()` replica:database: add method to determine if semaphore is user one	2023-07-24 14:13:09 +03:00
Michał Jadwiszczak	a5fc53aa11	querier_cache: check semaphore mismatch during querier lookup Previously semaphore mismatch was checked only in multi-partition queries and if happened, an internal error was thrown. This commit pushed the check down to `querier_cache`, so each `lookup_*_querier` method will check for the mismatch. What's more, if semaphore mismatch occurs, check whether both semaphores belong to user. If so, log a warning and drop cached reader instead of throwing an error. The mismatch can happen if user's scheduling group changed during a query. We don't want to throw an error then, but drop and reset cached reader.	2023-07-21 19:05:50 +02:00
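The lookup-time decision can be modeled in a few lines of Python. `ToyQuerierCache`, `SemaphoreMismatch` and the `is_user_semaphore` callback are hypothetical stand-ins (the real check relies on `replica::database::is_user_semaphore()` and otherwise still treats a mismatch as an error); semaphores here are plain objects compared by identity.

```python
import logging

logger = logging.getLogger("querier_cache")


class SemaphoreMismatch(Exception):
    pass


class ToyQuerierCache:
    """Hypothetical model of the semaphore check done on querier lookup."""

    def __init__(self, is_user_semaphore):
        self.entries = {}  # key -> (semaphore, cached reader)
        self.is_user_semaphore = is_user_semaphore
        self.scheduling_group_mismatches = 0  # stat counter

    def insert(self, key, semaphore, reader):
        self.entries[key] = (semaphore, reader)

    def lookup(self, key, current_semaphore):
        entry = self.entries.pop(key, None)
        if entry is None:
            return None
        cached_semaphore, reader = entry
        if cached_semaphore is current_semaphore:
            return reader
        if self.is_user_semaphore(cached_semaphore) and self.is_user_semaphore(current_semaphore):
            # The user's scheduling group changed mid-query: drop the cached
            # reader so the caller creates a fresh one, instead of failing.
            self.scheduling_group_mismatches += 1
            logger.warning("semaphore mismatch for %r, dropping cached reader", key)
            return None
        raise SemaphoreMismatch(f"semaphore mismatch for {key!r}")
```

Returning `None` makes the caller behave as on a cache miss, which matches "drop and reset the cached reader"; only a mismatch involving a non-user semaphore still raises.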
Michał Jadwiszczak	e5c965b280	querier_cache: add reference to `replica::database::is_user_semaphore()`	2023-07-21 18:58:57 +02:00
Jan Ciolek	cbc97b41d4	cql.g: make the parser reject INSERT JSON without a JSON value We allow inserting column values using a JSON value, eg: ```cql INSERT INTO mytable JSON '{ "\"myKey\"": 0, "value": 0}'; ``` When no JSON value is specified, the query should be rejected. Scylla used to crash in such cases. A recent change fixed the crash (https://github.com/scylladb/scylladb/pull/14706), it now fails on unwrapping an uninitialized value, but really it should be rejected at the parsing stage, so let's fix the grammar so that it doesn't allow JSON queries without JSON values. A unit test is added to prevent regressions. Refs: https://github.com/scylladb/scylladb/pull/14707 Fixes: https://github.com/scylladb/scylladb/issues/14709 Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com> Closes #14785	2023-07-21 18:52:47 +03:00
Kefu Chai	d78c6d5f50	test: use different table names in simple_backlog_controller_test In `simple_backlog_controller_test`, we need to have multiple tables at the same time, but the default constructor of `simple_schema` always creates a schema with the table name "ks.cf". We are going to have per-table metrics, and the new metric group will use the table name as its counter labels, so we need to either disable the per-table metrics or use a different table name for each table. As in the real world we don't have multiple tables with the same name at the same time, it would be better to stop reusing the same table name in a single test session. So, in this change, we use a random cf_name for each of the created tables. Fixes #14767 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-21 19:08:29 +08:00
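The constraint driving this change can be shown with a hypothetical Python registry that, like the metrics subsystem described above, rejects a second metric group with the same ks/cf labels. `random_cf_name()` mirrors the fix of giving each test table a random name. Both names are illustrative only and are not test/lib or Seastar metrics APIs.

```python
import uuid


class ToyMetricsRegistry:
    """Hypothetical stand-in: rejects two metric groups with identical labels."""

    def __init__(self):
        self._labels = set()

    def register_table(self, ks, cf):
        key = (ks, cf)
        if key in self._labels:
            raise ValueError(f"metric group for {ks}.{cf} already registered")
        self._labels.add(key)


def random_cf_name(prefix="cf"):
    # uuid4().hex has 32 random hex digits, so collisions are practically impossible.
    return f"{prefix}_{uuid.uuid4().hex}"
```

Registering "ks.cf" twice fails, while any number of tables created with `random_cf_name()` can coexist in one test session.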
Kefu Chai	1f596e4669	test/lib/simple_schema: add ctor for customizing ks.cf Some low-level tests, like the ones exercising sstables, create multiple tables, and we are going to add per-table metrics, which use the ks.cf as part of their unique id. So, once the per-table metrics are enabled, the sstable tests would fail, as the metrics subsystem does not allow registering multiple metric groups with the same name. So, in this change, we add a new constructor for `simple_schema`, so that we can customize the schema's ks and cf when creating the `simple_schema`. In the next commit, we will use this new constructor in an sstable test which creates multiple tables. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-21 19:07:45 +08:00
Kefu Chai	306439d3aa	test/lib/simple_schema: do not hardwire ks.cf Instead, query the names of the ks and cf from the schema. This change prepares us for a simple_schema whose ks and cf can be customized by its constructor. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-21 19:07:45 +08:00
Mikołaj Grzebieluch	37ceef23a6	test: raft: skip test_old_ip_notification_repro in debug mode Closes #14777	2023-07-21 12:41:03 +02:00
Pavel Emelyanov	eaeffcdb81	code: Pass sharded<db::system_keyspace>& to database::truncate() The argument goes via the db::(drop|truncate)_table_on_all_shards() pair of calls, which start from - storage_proxy::remote: has its sys.ks reference already - schema_tables::merge_schema: has a sys.ks argument already - legacy_schema_migrator: the reference was added by the previous patch - tests: run in cql_test_env with sys.ks on board Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-21 13:11:59 +03:00
Kefu Chai	a87b0d68cd	s3/test: remove the tempdir if test succeeds In `46616712`, we tried to keep the tmpdir only if the test failed, and to keep up to 1 of them using the recently introduced `tmp_path_retention_count` option. But it turns out this option is not supported by the pytest used by our Jenkins nodes, where we have pytest 6.2.5, the one shipped with Fedora 36. So, in this change, the tempdir is removed if the test completes without failures. The tempdir contains a huge number of files, and Jenkins is quite slow scanning them; after nuking the tempdir, Jenkins will be much faster when scanning for the artifacts. Fixes #14690 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14772	2023-07-21 12:21:51 +03:00
Nadav Har'El	5860820934	Merge 'mutation/mutation_compactor: validate the input stream' from Botond Dénes The mutation compactor has a validator which it uses to validate the stream of mutation fragments that passes through it. This validator is supposed to validate the stream as it enters the compactor, as opposed to its compacted form (output). This was true for most fragment kinds except range tombstones, as purged range tombstones were not visible to the validator for the most part. This mistake was introduced by https://github.com/scylladb/scylladb/commit `e2c9cdb576`, which itself was a flawed attempt at fixing an error seen because purged tombstones were not terminated by the compactor. This patch corrects this mistake by fixing the above problem properly: on page-cut, if the validator has an active tombstone, a closing tombstone is generated for it, to avoid the false-positive error. With this, range tombstones can be validated again as they come in. The existing unit test checking the validation in the compactor is greatly expanded to check all (I hope) different validation scenarios. Closes #13817 * github.com:scylladb/scylladb: test/mutation_test: test_compactor_validator_sanity_test mutation/mutation_compactor: fix indentation mutation/mutation_compactor: validate the input stream mutation: mutation_fragment_stream_validating_filter: add accessor to underlying validator readers: reader-from-fragment: don't modify stream when created without range	2023-07-21 00:26:46 +03:00
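A minimal Python model of the page-cut fix, assuming a drastically simplified fragment stream: `ToyValidator` only tracks whether a range tombstone is open, and `close_active_tombstone_on_page_cut()` is a hypothetical helper, not the mutation compactor's API. If the validator still sees an open tombstone when a page is cut, a closing range tombstone change is emitted and fed to the validator, so the end of the partition is not flagged as a false positive.

```python
class ToyValidator:
    """Hypothetical validator: only checks that range tombstones get closed."""

    def __init__(self):
        self.active_tombstone = None

    def consume(self, kind, tombstone=None):
        if kind == "range_tombstone_change":
            # A change to None closes the currently open range.
            self.active_tombstone = tombstone
        elif kind == "partition_end" and self.active_tombstone is not None:
            raise ValueError("partition end with an open range tombstone")


def close_active_tombstone_on_page_cut(validator, emit):
    """On page cut, close any tombstone the validator still sees as active."""
    if validator.active_tombstone is not None:
        closing = ("range_tombstone_change", None)
        emit(closing)
        validator.consume(*closing)
```

Because the closing change is fed to the validator itself, the validator can keep checking the input stream, including range tombstones, rather than only the compacted output.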
Avi Kivity	e00811caac	cql3: grammar: reject intValue with no contents The grammar mistakenly allows nothing to be parsed as an intValue (itself accepted in LIMIT and similar clauses). Easily fixed by removing the empty alternative. A unit test is added. Fixes #14705. Closes #14707	2023-07-21 00:24:51 +03:00
Pavel Emelyanov	98609e2115	Merge 's3/test: close using deferred_close() or deferred()' from Kefu Chai let's use RAII to tear down the client and the input file, so we can always perform the cleanups even if the test throws. Closes #14765 * github.com:scylladb/scylladb: s3/test: use seastar::deferred() to perform cleanup s3/test: close using deferred_close()	2023-07-20 20:05:34 +03:00
Botond Dénes	53da97416a	Merge 'Remove qctx from system.paxos table access methods' from Pavel Emelyanov The "fix" is straightforward -- callers of system_keyspace::paxos methods need to get system keyspace from somewhere. This time the only caller is storage_proxy::remote that can have system keyspace via direct dependency reference. Closes #14758 * github.com:scylladb/scylladb: db/system_keyspace: Move and use qctx::execute_cql_with_timeout() db/system_keyspace: Make paxos methods non-static service/paxos: Add db::system_keyspace& argument to some methods test: Optionally initialize proxy remote for cql_test_env proxy/remote: Keep sharded<db::system_keyspace>& dependency	2023-07-20 16:53:25 +03:00
Botond Dénes	e62325babc	Merge 'Compaction reshard task' from Aleksandra Martyniuk Task manager tasks covering reshard compaction. Reattempt on https://github.com/scylladb/scylladb/pull/14044. Bugfix for https://github.com/scylladb/scylladb/issues/14618 is squashed with 95191f4. Regression test added. Closes #14739 * github.com:scylladb/scylladb: test: add test for resharding with non-empty owned_ranges_ptr test: extend test_compaction_task.py to test resharding compaction compaction: add shard_reshard_sstables_compaction_task_impl compaction: invoke resharding on sharded database compaction: move run_resharding_jobs into reshard_sstables_compaction_task_impl::run() compaction: add reshard_sstables_compaction_task_impl compaction: create resharding_compaction_task_impl	2023-07-20 16:43:22 +03:00
Botond Dénes	a35f4f6985	test/mutation_test: test_compactor_validator_sanity_test Greatly expand this test to check that the compactor validates the input stream properly. The test is renamed (the _sanity_test suffix is removed) to reflect the expanded scope.	2023-07-20 08:48:50 -04:00
Raphael S. Carvalho	3117f2f066	tests: Add test for table's mutation source excluding staging Commit `f5e3b8df6d` introduced an optimization for as_mutation_source_excluding_staging() and added a test that verifies correctness of single key and range reads based on supplied predicates. This new test aims to improve the coverage by testing directly both table::as_mutation_source() and as_mutation_source_excluding_staging(), therefore guaranteeing that both supply the correct predicate to sstable set. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14763	2023-07-20 07:14:36 +03:00
Kefu Chai	77faec4f38	s3/test: use seastar::deferred() to perform cleanup Let's use RAII to remove the object used as a fixture, so we don't leave stray objects in the bucket used for testing. Otherwise, if an exception is thrown and the test fails to do its cleanup, the leftover object might interfere with other tests which share the same minio server. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-20 10:04:54 +08:00
Kefu Chai	7a9c802fc3	s3/test: close using deferred_close() let's use RAII to tear down the client and the input file, so we can always perform the cleanups even if the test throws. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-20 10:04:54 +08:00
Pavel Emelyanov	ea9db1b35c	Merge 'cql3: expr: remove the default constructor' from Avi Kivity `expression`'s default constructor is dangerous as it can leak into computations and generate surprising results. Fix that by removing the default constructor. This is made somewhat difficult by the parser generator's reliance on default construction, and we need to expand our workaround (`uninitialized<>`) capabilities to do so. We also remove some incidental uses of default-constructed expressions. Closes #14706 * github.com:scylladb/scylladb: cql3: expr: make expression non-default-constructible cql3: grammar: don't default-construct expressions cql3: grammar: improve uninitialized<> flexibility cql3: grammar: adjust uninitialized<> wrapper test: expr_test: don't invoke expression's default constructor cql3: statement_restrictions: explicitly initialize expressions in index match code cql3: statement_restrictions: explicitly intitialize some expression fields cql3: statement_restrictions: avoid expression's default constructor when classifying restrictions cql3: expr: prepare_expression: avoid default-constructed expression cql3: broadcast_tables: prepare new_value without relying on expression default constructor	2023-07-19 21:46:03 +03:00
Pavel Emelyanov	b4fc1076e3	test: Optionally initialize proxy remote for cql_test_env Some test cases that use cql_test_env involve paxos state updates. Since this update now goes via proxy->remote->system_keyspace, those test cases need cql_test_env to initialize the remote part of the proxy too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-19 19:32:10 +03:00
Aleksandra Martyniuk	bfb81b8cdd	test: add test for resharding with non-empty owned_ranges_ptr	2023-07-19 17:19:10 +02:00
Aleksandra Martyniuk	4fc4c2527c	test: extend test_compaction_task.py to test resharding compaction	2023-07-19 17:19:10 +02:00
Mikołaj Grzebieluch	00db47292b	test: raft: do not update raft address map with obsolete gossip data Regression test for #14257. It starts two nodes. It introduces a sleep in raft_group_registry::on_alive (in raft_group_registry.cc) when receiving a gossip notification about HOST_ID update from the second node. Then it restarts the second node with a different IP. Due to the sleep, the old notification from the old IP arrives after the second node has restarted. If the bug is present, this notification overrides the address map entry and the second read barrier times out, since the first node cannot reach the second node with the old IP. Closes #14609. Closes #14728	2023-07-19 11:57:49 +02:00
Kefu Chai	665135553d	build: cmake: remove nonexistent test The "type_json_test" test was added locally and has not landed on master, but it somehow spilled into `87170bf07a` by accident. So, let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14749	2023-07-19 11:58:34 +03:00
Avi Kivity	460b28d067	Merge 'Introduce `SELECT MUTATION FRAGMENTS` statement' from Botond Dénes SELECT MUTATION FRAGMENTS is a new select statement sub-type, which allows dumping the underlying mutations making up the data of a given table. The output of this statement is mutation-fragments presented as CQL rows. Each row corresponds to a mutation-fragment. Consequently, the output of this statement has a schema that is different from that of the underlying table. The output schema is derived from the table's schema as follows: * The table's partition key is copied over as-is * The clustering key is formed from the following columns: - mutation_source (text): the kind of the mutation source, one of: memtable, row-cache or sstable; and the identifier of the individual mutation source. - partition_region (int): represents the enum with the same name. - the copy of the table's clustering columns - position_weight (int): -1, 0 or 1, has the same meaning as that in position_in_partition, used to disambiguate range tombstone changes with the same clustering key from rows and from each other. * The following regular columns: - metadata (text): the JSON representation of the mutation-fragment's metadata. - value (text): the JSON representation of the mutation-fragment's value. Data is always read from the local replica on which the query is executed. Migrating queries between coordinators is forbidden. More details in the documentation commit (last commit).
Example:
```cql
cqlsh> CREATE TABLE ks.tbl (pk int, ck int, v int, PRIMARY KEY (pk, ck));
cqlsh> DELETE FROM ks.tbl WHERE pk = 0;
cqlsh> DELETE FROM ks.tbl WHERE pk = 0 AND ck > 0 AND ck < 2;
cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (0, 0, 0);
cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (0, 1, 0);
cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (0, 2, 0);
cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (1, 0, 0);
cqlsh> SELECT * FROM ks.tbl;

 pk | ck | v
----+----+---
  1 |  0 | 0
  0 |  0 | 0
  0 |  1 | 0
  0 |  2 | 0

(4 rows)
cqlsh> SELECT * FROM MUTATION_FRAGMENTS(ks.tbl);

 pk | mutation_source | partition_region | ck | position_weight | metadata | mutation_fragment_kind | value
----+-----------------+------------------+----+-----------------+--------------------------------------------------------------------------------------------------------------------------+------------------------+-----------
  1 | memtable:0 | 0 |  |  | {"tombstone":{}} | partition start | null
  1 | memtable:0 | 2 | 0 | 0 | {"marker":{"timestamp":1688122873341627},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122873341627}}} | clustering row | {"v":"0"}
  1 | memtable:0 | 3 |  |  | null | partition end | null
  0 | memtable:0 | 0 |  |  | {"tombstone":{"timestamp":1688122848686316,"deletion_time":"2023-06-30 11:00:48z"}} | partition start | null
  0 | memtable:0 | 2 | 0 | 0 | {"marker":{"timestamp":1688122860037077},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122860037077}}} | clustering row | {"v":"0"}
  0 | memtable:0 | 2 | 0 | 1 | {"tombstone":{"timestamp":1688122853571709,"deletion_time":"2023-06-30 11:00:53z"}} | range tombstone change | null
  0 | memtable:0 | 2 | 1 | 0 | {"marker":{"timestamp":1688122864641920},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122864641920}}} | clustering row | {"v":"0"}
  0 | memtable:0 | 2 | 2 | -1 | {"tombstone":{}} | range tombstone change | null
  0 | memtable:0 | 2 | 2 | 0 | {"marker":{"timestamp":1688122868706989},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122868706989}}} | clustering row | {"v":"0"}
  0 | memtable:0 | 3 |  |  | null | partition end | null

(10 rows)
```
Perf simple query:
```
/build/release/scylla perf-simple-query -c1 -m2G --duration=60
```
Before:
```
median 141596.39 tps ( 62.1 allocs/op, 13.1 tasks/op, 43688 insns/op, 0 errors)
median absolute deviation: 137.15
maximum: 142173.32
minimum: 140492.37
```
After:
```
median 141889.95 tps ( 62.1 allocs/op, 13.1 tasks/op, 43692 insns/op, 0 errors)
median absolute deviation: 167.04
maximum: 142380.26
minimum: 141025.51
```
Fixes: https://github.com/scylladb/scylladb/issues/11130 Closes #14347 * github.com:scylladb/scylladb: docs/operating-scylla/admin-tools: add documentation for the SELECT * FROM MUTATION_FRAGMENTS() statement test/topology_custom: add test_select_from_mutation_fragments.py test/boost/database_test: add test for mutation_dump/generate_output_schema_from_underlying_schema test/cql-pytest: add test_select_mutation_fragments.py test/cql-pytest: move scylla_data_dir fixture to conftest.py cql3/statements: wire-in mutation_fragments_select_statement cql3/restrictions/statement_restrictions: fix indentation cql3/restrictions/statement_restrictions: add check_indexes flag cql3/statments/select_statement: add mutation_fragments_select_statement cql3: add SELECT MUTATION FRAGMENTS select statement sub-type service/pager: allow passing a query functor override service/storage_proxy: un-embed coordinator_query_options replica: add mutation_dump replica: extract query_state into own header replica/table: add make_nonpopulating_cache_reader() replica/table: add select_memtables_as_mutation_sources() tools,mutation: extract the low-level json utilities into mutation/json.hh tools/json_writer: fold SstableKey() overloads into callers tools/json_writer: allow writing metadata and value separately tools/json_writer: split mutation_fragment_json_writer in two classes tools/json_writer: allow passing custom std::ostream to json_writer	2023-07-19 11:54:11 +03:00
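The output-schema derivation listed above can be sketched as a Python function over (name, CQL type) pairs. `mutation_fragments_schema()` is a hypothetical helper, not the `mutation_dump` API, and it follows only the columns the message lists; the sample output additionally shows a `mutation_fragment_kind` column that the list does not describe, so it is left out here. The table's own regular columns (`v` in the example) are not copied; their data appears inside the JSON `value` column.

```python
def mutation_fragments_schema(partition_key, clustering_key):
    """Derive the MUTATION_FRAGMENTS output layout from a table's key columns.

    Arguments are lists of (column name, CQL type) pairs.
    """
    return {
        # The table's partition key is copied over as-is.
        "partition_key": list(partition_key),
        # Clustering key: source, region, the table's clustering columns, weight.
        "clustering_key": [
            ("mutation_source", "text"),
            ("partition_region", "int"),
            *clustering_key,
            ("position_weight", "int"),
        ],
        # Regular columns hold JSON renderings of the fragment.
        "regular": [("metadata", "text"), ("value", "text")],
    }
```

For `ks.tbl` from the example, the call would be `mutation_fragments_schema([("pk", "int")], [("ck", "int")])`.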
Asias He	c29e7e4644	Revert "Revert "view_update_generator: Increase the registration_queue_size"" This reverts commit `4cee8206f8`. The test is fixed. Closes #14750	2023-07-19 11:46:28 +03:00
Avi Kivity	503d21b570	cql3: expr: avoid separating column_mutation_attribute from its column_value when levellizing aggregation depth Since `ec77172b4b` ("Merge 'cql3: convert the SELECT clause evaluation phase to expressions' from Avi Kivity"), we rewrite non-aggregating selectors to include an aggregation, in order to have the rest of the code either deal with no aggregation, or all selectors aggregating, with nothing in between. This is done by wrapping column selectors with "first" function calls: col -> first(col). This broke non-aggregating selectors that included the ttl() or writetime() pseudo functions. This is because we rewrote them as writetime(first(col)), and writetime() isn't a function that operates on any values; it operates on mutations and so must have access to a column, not an expression. Fix by detecting this scenario and rewriting the expression as first(writetime(col)). Unit and integration tests are added. Fixes #14715. Closes #14716	2023-07-19 11:35:01 +03:00
Botond Dénes	a8fc71dbc0	test/topology_custom: add test_select_from_mutation_fragments.py	2023-07-19 01:28:28 -04:00
Botond Dénes	7540e62522	test/boost/database_test: add test for mutation_dump/generate_output_schema_from_underlying_schema Checking that the generated schema has deterministic id and version.	2023-07-19 01:28:28 -04:00
Botond Dénes	6709a71b96	test/cql-pytest: add test_select_mutation_fragments.py	2023-07-19 01:28:28 -04:00
Botond Dénes	05e010b1d3	test/cql-pytest: move scylla_data_dir fixture to conftest.py It will soon be used by more than one test file.	2023-07-19 01:28:28 -04:00
Pavel Emelyanov	8bc42f54d4	Merge 'feature_service: handle deprecated features correctly in feature check' from Piotr Dulikowski The feature check in `enable_features_on_startup` loads the list of features that were enabled previously, goes over every one of them and checks whether each feature is considered supported and whether there is a corresponding `gms::feature` object for it (i.e. the feature is "registered"). The second part of the check is unnecessary and wrong. A feature can be marked as supported but its `gms::feature` object not be present anymore: after a feature is supported for long enough (i.e. we only support upgrades from versions that support the feature), we can consider such a feature to be deprecated. When a feature is deprecated, its `gms::feature` object is removed and the feature is always considered enabled, which allows some legacy code to be removed. We still consider this feature to be supported and advertise it in gossip, for the sake of old nodes which, even though they always support the feature, still check whether other nodes support it. The problem with the check as it is now is that it disallows moving features to the disabled list. If one tries to do it, they will find out that upgrading the node to the new version does not work: `enable_features_on_startup` will load the feature, notice that it is not "registered" (there is no `gms::feature` object for it) and fail to boot. This commit fixes the problem by modifying `enable_features_on_startup` not to look at the registered features list at all. In addition to this, some other small cleanups are performed: - "LARGE_COLLECTION_DETECTION" is removed from the deprecated features list. For some reason, it was put there when the feature was being introduced. It does not break anything because there is a `gms::feature` object for it, but it's slightly confusing and therefore is removed. 
- The comment in `supported_feature_set` that invites developers to add features there as they are introduced is removed. It is no longer necessary to do so because registered features are put there automatically. Deprecated features should still be put there, as indicated by another comment. Fortunately, this issue does not break any upgrades as of now - since we added enabled cluster feature persisting, no features were deprecated, and we only add registered features to the persisted feature list. An error injection and a regression test are added. Closes #14701 * github.com:scylladb/scylladb: topology_custom: add deprecated features test feature_service: add error injection for deprecated cluster feature feature_service: move error injection check to helper function feature_service: handle deprecated features correctly in feature check	2023-07-18 21:01:48 +03:00

5340 Commits