scylladb

Author	SHA1	Message	Date
Botond Dénes	24cb351655	Merge 'test: sstable_test: avoid using helper using generation_type::int_t ' from Kefu Chai the series drops some of the callers using SSTable generation as integer. as the generation of SSTable is but an identifier, we should not use it as an integer out of generation_type's implementation. Closes #13845 github.com:scylladb/scylladb: test: drop unused helper functions test: sstable_mutation_test: avoid using helper using generation_type::int_t test: sstable_move_test: avoid using helper using generation_type::int_t test: sstable_*test: avoid using helper using generation_type::int_t test: sstable_3_x_test: do not use reuseable_sst() accepting integer	2023-05-11 10:17:02 +03:00
Kefu Chai	29284d64a5	test: drop unused helper functions all users of these two helpers have switched to their alternatives, so there is no need to keep them. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-05-11 12:32:37 +08:00
Kefu Chai	bfd6caffbb	test: sstable_*test: avoid using helper using generation_type::int_t this change is one of the series which drops most of the callers using SSTable generation as integer. as the generation of SSTable is but an identifier, we should not use it as an integer outside of generation_type's implementation. so, in this change, instead of using the helper accepting int, we switch to the one which accepts generation_type by offering a default parameter, which is a generation created using 1. this preserves the existing behavior. we will divert other callers of `reusable_sst(..., generation_type::int)` in follow-up changes in different ways. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-05-11 12:32:22 +08:00
Nadav Har'El	e57252092c	Merge 'cql3: result_set, selector: change value type to managed_bytes_opt' from Avi Kivity CQL evolved several expression evaluation mechanisms: WHERE clause, selectors (the SELECT clause), and the LWT IF clause are just some examples. Most now use expressions, which use managed_bytes_opt as the underlying value representation, but selectors still use bytes_opt. This poses two problems: 1. bytes_opt generates large contiguous allocations when used with large blobs, impacting latency 2. trying to use expressions with bytes_opt will incur a copy, reducing performance To solve the problem, we harmonize the data types to managed_bytes_opt (#13216 notwithstanding). This is somewhat difficult since the source of the values are views into a bytes_ostream. However, luckily bytes_ostream and managed_bytes_view are mostly compatible so with a little effort this can be done. The series is neutral wrt performance: before: ``` 222118.61 tps ( 61.1 allocs/op, 12.1 tasks/op, 43092 insns/op, 0 errors) 224250.14 tps ( 61.1 allocs/op, 12.1 tasks/op, 43094 insns/op, 0 errors) 224115.66 tps ( 61.1 allocs/op, 12.1 tasks/op, 43092 insns/op, 0 errors) 223508.70 tps ( 61.1 allocs/op, 12.1 tasks/op, 43107 insns/op, 0 errors) 223498.04 tps ( 61.1 allocs/op, 12.1 tasks/op, 43087 insns/op, 0 errors) ``` after: ``` 220708.37 tps ( 61.1 allocs/op, 12.1 tasks/op, 43118 insns/op, 0 errors) 225168.99 tps ( 61.1 allocs/op, 12.1 tasks/op, 43081 insns/op, 0 errors) 222406.00 tps ( 61.1 allocs/op, 12.1 tasks/op, 43088 insns/op, 0 errors) 224608.27 tps ( 61.1 allocs/op, 12.1 tasks/op, 43102 insns/op, 0 errors) 225458.32 tps ( 61.1 allocs/op, 12.1 tasks/op, 43098 insns/op, 0 errors) ``` Though I expect with some more effort we can eliminate some copies. 
Closes #13637 * github.com:scylladb/scylladb: cql3: untyped_result_set: switch to managed_bytes_view as the cell type cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt cql3: untyped_result_set: always own data types: abstract_type: add mixed-type versions of compare() and equal() utils/managed_bytes, serializer: add conversion between buffer_view<bytes_ostream> and managed_bytes_view utils: managed_bytes: add bidirectional conversion between bytes_opt and managed_bytes_opt utils: managed_bytes: add managed_bytes_view::with_linearized() utils: managed_bytes: mark managed_bytes_view::is_linearized() const	2023-05-10 15:01:45 +03:00
Avi Kivity	42a1ced73b	cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt The expression system uses managed_bytes_opt for values, but result_set uses bytes_opt. This means that processing values from the result set in expressions requires a copy. Out of the two, managed_bytes_opt is the better choice, since it prevents large contiguous allocations for large blobs. So we switch result_set to use managed_bytes_opt. Users of the result_set API are adjusted. The db::function interface is not modified to limit churn; instead we convert the types on entry and exit. This will be adjusted in a following patch.	2023-05-07 17:17:36 +03:00
Kefu Chai	bd3e8d0460	test: drop a reusable_sst() variant which accepts int as generation this is one of the changes to reduce the usage of integer-based generations in tests. in future, we will need to expand the tests to exercise the UUID-based generation, or at least to be neutral to the underlying generation's identifier type. so, removing the helpers which only accept `generation_type::int_t` helps us to make this happen. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-05-06 18:24:48 +08:00
Kefu Chai	05a172c7e7	build: cmake: link against Boost::unit_test_framework we introduced the linkage to Boost::unit_test_framework in `fe70333c19`, this library is used by test/lib/test_utils.cc, so update CMake accordingly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13781	2023-05-05 13:55:00 +03:00
Botond Dénes	687a8bb2f0	Merge 'Sanitize test::filename(sstable) API' from Pavel Emelyanov There are two of them currently with slightly different declarations. Better to leave only one. Closes #13772 * github.com:scylladb/scylladb: test: Deduplicate test::filename() static overload test: Make test::filename return fs::path	2023-05-05 11:36:08 +03:00
Avi Kivity	1d351dde06	Merge 'Make S3 client work with real S3' from Pavel Emelyanov The current S3 client was tested against minio, and it takes a few more touches to work with Amazon S3. The main challenge here is to support signed requests. The AWS S3 server explicitly bans unsigned multipart-upload requests, which in turn are an essential part of the sstables S3 backend, so we do need signing. Signing a request has many options and requirements; one of them is whether the request _body_ is included in the signature calculation. This is called "(un)signed payload". Requests sent over plain HTTP require payload signing (i.e. the request body should be included in the signature calculation), which can be a bit troublesome, so instead the PR uses an unsigned payload (i.e. it doesn't include the request body in the signature calculation, only the necessary headers and query parameters), but thus also needs HTTPS. So what this set does is make the existing S3 client code sign requests. In order to sign the request the code needs to get the AWS key and secret (and region) from somewhere, and this somewhere is the conf/object_storage.yaml config file. The signature generating code was previously merged (moved from alternator code) and updated to suit S3 client needs. In order to properly support HTTPS the PR adds a special connection factory to be used with the seastar http client. The factory resolves AWS endpoint names via DNS and configures the gnutls system trust.
fixes: #13425 Closes #13493 * github.com:scylladb/scylladb: doc: Add a document describing how to configure S3 backend s3/test: Add ability to run boost test over real s3 s3/client: Sign requests if configured s3/client: Add connection factory with DNS resolve and configurable HTTPS s3/client: Keep server port on config s3/client: Construct it with config s3/client: Construct it with sstring endpoint sstables: Make s3_storage with endpoint config sstables_manager: Keep object storage configs onboard code: Introduce conf/object_storage.yaml configuration file	2023-05-04 18:08:54 +03:00
Pavel Emelyanov	56dfc21ba0	test: Deduplicate test::filename() static overload There are two of them currently, both returning fs::path for sstable components. One is static and can be dropped, callers are patched to use the non-static one making the code tiny bit shorter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-04 17:16:00 +03:00
Pavel Emelyanov	3f30a253be	test: Make test::filename return fs::path The sstable::filename() is private and is not supposed to be used as a path to open any files. However, tests are different and they sometimes know it is. For that they use a test wrapper that has access to private members and may make assumptions about the meaning of sstable::filename(). That said, the test::filename() should return fs::path, not sstring. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-04 17:14:04 +03:00
Tomasz Grabiec	e385ce8a2b	Merge "fix stack use after free during shutdown" from Gleb storage_service uses raft_group0, but during shutdown the latter is destroyed before the former is stopped. This series moves raft_group0 destruction to after storage_service is stopped. For the move to work, some existing dependencies of raft_group0 are dropped since they are not really needed during object creation. Fixes #13522	2023-05-04 15:14:18 +02:00
Pavel Emelyanov	fe70333c19	test: Auto-skip object-storage test cases if run from shell In case an sstable unit test case is run individually, it would fail with exception saying that S3_... environment is not set. It's better to skip the test-case rather than fail. If someone wants to run it from shell, it will have to prepare S3 server (minio/AWS public bucket) and provide proper environment for the test-case. refs: #13569 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13755	2023-05-04 14:15:18 +03:00
Gleb Natapov	dc6c3b60b4	init: move raft_group0 creation before storage_service storage_service uses raft_group0, so the latter needs to exist until the former is stopped.	2023-05-04 13:03:18 +03:00
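The lifetime rule this fix relies on can be sketched in plain C++: objects with automatic storage are destroyed in reverse order of construction, so creating raft_group0 first makes it outlive storage_service. This is only an illustration under that assumption; `service_sketch` and `run_lifetime_demo` are hypothetical names, not types from the ScyllaDB tree.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a service; it only records its own lifetime events.
struct service_sketch {
    std::string name;
    std::vector<std::string>& log;

    service_sketch(std::string n, std::vector<std::string>& l)
        : name(std::move(n)), log(l) {
        log.push_back("create " + name);
    }
    ~service_sketch() { log.push_back("destroy " + name); }
};

// Returns the create/destroy events in the order they happen.
inline std::vector<std::string> run_lifetime_demo() {
    std::vector<std::string> log;
    {
        // Constructed first, therefore destroyed last.
        service_sketch group0("raft_group0", log);
        // Constructed second, therefore destroyed first, while group0 is still alive.
        service_sketch ss("storage_service", log);
    }
    return log;
}
```

With this ordering the dependent object (storage_service) is always torn down before the object it uses.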
Gleb Natapov	e9fb885e82	service/raft: raft_group0: drop dependency on cdc::generation_service raft_group0 does not really depend on cdc::generation_service; it needs it only transiently, so pass it to the appropriate methods of raft_group0 instead of during its creation.	2023-05-04 13:03:07 +03:00
Pavel Emelyanov	3bec5ea2ce	s3/client: Keep server port on config Currently the code temporarily assumes that the endpoint port is 9000. This is what tests' local minio is started with. This patch keeps the port number on endpoint config and makes test get the port number from minio starting code via environment. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:19:43 +03:00
Pavel Emelyanov	2f6aa5b52e	code: Introduce conf/object_storage.yaml configuration file In order to access a real S3 bucket, the client should use signed requests over https. Partially this is due to security considerations, partially this is unavoidable, because multipart-uploading is banned for unsigned requests on S3. Also, signed requests over plain http require signing the payload as well, which is a bit troublesome, so it's better to stick to secure https and keep the payload unsigned. To prepare signed requests the code needs to know three things: - aws key - aws secret - aws region name The latter could be derived from the endpoint URL, but it's simpler to configure it explicitly, all the more so since there's an option to use S3 URLs without a region name in them, which we may want to use some day. The proposed place to keep this configuration is the object_storage.yaml file with the format endpoints: - name: a.b.c port: 443 aws_key: 12345 aws_secret: abcdefghijklmnop ... When loaded, the map gets into db::config and later will be propagated down to sstables code (see next patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:19:15 +03:00
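A rough sketch of how one `endpoints:` entry and the "unsigned payload over https" choice could look in memory. All names here (`endpoint_config_sketch`, `has_credentials`, `payload_hash_header`, the `use_https` and `aws_region` fields) are hypothetical and are not taken from the ScyllaDB sources. The only external fact assumed is that AWS Signature Version 4 accepts the literal `UNSIGNED-PAYLOAD` as the `x-amz-content-sha256` value when the body is left out of the signature.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical in-memory form of one entry under "endpoints:".
struct endpoint_config_sketch {
    std::string name;          // e.g. "a.b.c"
    std::uint16_t port = 443;
    bool use_https = true;     // assumption: not a field shown in the commit message
    std::string aws_key;
    std::string aws_secret;
    std::string aws_region;    // assumption: field name chosen for this sketch
};

// Requests can only be signed when key, secret and region are all known.
inline bool has_credentials(const endpoint_config_sketch& c) {
    return !c.aws_key.empty() && !c.aws_secret.empty() && !c.aws_region.empty();
}

// Value for the x-amz-content-sha256 header when the payload stays unsigned.
// Plain HTTP would need a real SHA-256 of the body, which this sketch omits.
inline std::string payload_hash_header(const endpoint_config_sketch& c) {
    if (!c.use_https) {
        throw std::invalid_argument("plain http requires a signed payload");
    }
    return "UNSIGNED-PAYLOAD";
}
```

The sketch throws for plain HTTP rather than computing a body hash, mirroring the commit's choice to stay on https with an unsigned payload.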
Nadav Har'El	b5f28e2b55	Merge 'Add S3 support to sstables::test_env' from Pavel Emelyanov Currently there are only 2 tests for S3 -- the pure client test and the compound object_store test that launches scylla, creates an s3-backed table and CQL-queries it. At the same time there's a whole lot of small unit tests for sstables functionality, part of which can run over S3 storage too. This PR adds this support and patches several test cases to use it. More test cases are to come later on demand. fixes: #13015 Closes #13569 * github.com:scylladb/scylladb: test: Make resharding test run over s3 too test: Add lambda to fetch bloom filter size test: Tune resharding test use of sstable::test_env test: Make datafile test case run over s3 too test: Propagate storage options to table_for_test test: Add support for s3 storage_options in config test: Outline sstables::test_env::do_with_async() test: Keep storage options on sstable_test_env config sstables: Add and call storage::destroy() sstables: Coroutinize sstable::destroy()	2023-05-02 21:48:05 +03:00
Pavel Emelyanov	f7df238545	test: Propagate storage options to table_for_test Teach table_for_tests to use any storage options, not just the local one. For now the only user that passes non-local options is sstables::test_env. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-02 11:30:03 +03:00
Pavel Emelyanov	fa1de16f30	test: Add support for s3 storage_options in config When the sstable test case wants to run over S3 storage it needs to specify that in test config by providing the S3 storage options. So first thing this patch adds is the helper that makes these options based on the env left by minio launcher from test.py. Next, in order to make sstables_manager work with S3 it needs the plugged system keyspace which, in turn, needs query processor, proxy, database, etc. All this stuff lives in cql_test_env, so the test case running with S3 options will run in a sstables::test_env nested inside cql_test_env. The latter would also need to plug its system keyspace to the former's sstables manager and turn the experimental feature ON. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-02 11:30:03 +03:00
Pavel Emelyanov	1e03733e8c	test: Outline sstables::test_env::do_with_async() It's growing larger, better to keep it in .cc file Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-02 11:15:45 +03:00
Pavel Emelyanov	f223f5357d	test: Keep storage options on sstable_test_env config So that it could be set to s3 by the test case on demand. Default is local storage which uses env's tempdir or explicit path argument. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-02 11:15:45 +03:00
Benny Halevy	ba883859c7	utils: to_string: get rid of to_string(const Range&) Use fmt::to_string instead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:48:46 +03:00
Botond Dénes	022465d673	Merge 'Tone down offstrategy log message' from Benny Halevy In many cases we trigger offstrategy compaction opportunistically also when there's nothing to do. In this case we still print lots of info-level messages to the log and call `run_offstrategy_compaction`, which wastes more cpu cycles on learning that it has nothing to do. This change bails out early if the maintenance set is empty and prints a "Skipping off-strategy compaction" message in debug level instead. Fixes #13466 Also, add a group_id class and return it from compaction_group and table_state. Use that to identify the compaction_group / table_state by "ks_name.cf_name compaction_group=idx/total" in log messages. Fixes #13467 Closes #13520 * github.com:scylladb/scylladb: compaction_manager: print compaction_group id compaction_group, table_state: add group_id member compaction_manager: offstrategy compaction: skip compaction if no candidates are found	2023-05-02 08:05:18 +03:00
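The early bail-out and the log identifier described in this merge can be sketched roughly as follows. The struct and function names are hypothetical, and the returned strings merely stand in for log lines; only the "ks_name.cf_name compaction_group=idx/total" shape and the "Skipping off-strategy compaction" wording come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical identifier for a compaction group within a table.
struct group_id_sketch {
    std::string ks_name;
    std::string cf_name;
    std::size_t idx;
    std::size_t total;
};

// Formats the id as "ks_name.cf_name compaction_group=idx/total".
inline std::string describe(const group_id_sketch& g) {
    return g.ks_name + "." + g.cf_name + " compaction_group=" +
           std::to_string(g.idx) + "/" + std::to_string(g.total);
}

// Returns the line that would be logged; the prefix stands in for the log level.
inline std::string maybe_run_offstrategy(const group_id_sketch& g,
                                         const std::vector<std::string>& maintenance_set) {
    if (maintenance_set.empty()) {
        // Nothing to do: bail out early and stay quiet at info level.
        return "[debug] Skipping off-strategy compaction for " + describe(g);
    }
    return "[info] Starting off-strategy compaction for " + describe(g) + ": " +
           std::to_string(maintenance_set.size()) + " candidates";
}
```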
Raphael S. Carvalho	2dbae856f8	sstable: Piggyback on sstable parser and writer to provide bytes_on_disk bytes_on_disk is the sum of all sstable components. As read_simple() fetches the file size before parsing the component, bytes_on_disk can be added incrementally rather than in an additional step after all components were already parsed. Likewise, write_simple() tracks the offset for each new component, and therefore bytes_on_disk can also be added incrementally. This simplifies s3 life as it no longer has to care about feeding a bytes_on_disk, which is currently limited to data and index sizes only. Refs #13649. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-04-27 12:06:48 -03:00
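A minimal sketch of the incremental accounting idea, assuming hypothetical hook names (`on_component_read`, `on_component_written`) rather than the real read_simple() / write_simple() signatures:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sstable stand-in that only tracks its on-disk footprint.
struct sstable_sketch {
    std::uint64_t bytes_on_disk = 0;

    // Reader path: the file size is known before the component is parsed.
    void on_component_read(std::uint64_t file_size) {
        bytes_on_disk += file_size;
    }

    // Writer path: the final write offset equals the component's size.
    void on_component_written(std::uint64_t final_offset) {
        bytes_on_disk += final_offset;
    }
};
```

Because every component passes through one of the two hooks, no separate pass over the component files is needed to obtain the total.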
Raphael S. Carvalho	bc486b05fa	test: sstable_utils: reuse set_values() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-04-27 12:04:52 -03:00
Kamil Braun	30cc07b40d	Merge 'Introduce tablets' from Tomasz Grabiec This PR introduces an experimental feature called "tablets". Tablets are a way to distribute data in the cluster, which is an alternative to the current vnode-based replication. Vnode-based replication strategy tries to evenly distribute the global token space shared by all tables among nodes and shards. With tablets, the aim is to start from a different side. Divide resources of replica-shard into tablets, with a goal of having a fixed target tablet size, and then assign those tablets to serve fragments of tables (also called tablets). This will allow us to balance the load in a more flexible manner, by moving individual tablets around. Also, unlike with vnode ranges, tablet replicas live on a particular shard on a given node, which will allow us to bind raft groups to tablets. Those goals are not yet achieved with this PR, but it lays the ground for this. Things achieved in this PR: - You can start a cluster and create a keyspace whose tables will use tablet-based replication. This is done by setting `initial_tablets` option: ``` CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3, 'initial_tablets': 8}; ``` All tables created in such a keyspace will be tablet-based. Tablet-based replication is a trait, not a separate replication strategy. Tablets don't change the spirit of replication strategy, it just alters the way in which data ownership is managed. In theory, we could use it for other strategies as well like EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy is augmented to support tablets. 
- You can create and drop tablet-based tables (no DDL language changes) - DML / DQL work with tablet-based tables Replicas for tablet-based tables are chosen from tablet metadata instead of token metadata Things which are not yet implemented: - handling of views, indexes, CDC created on tablet-based tables - sharding is done using the old method, it ignores the shard allocated in tablet metadata - node operations (topology changes, repair, rebuild) are not handling tablet-based tables - not integrated with compaction groups - tablet allocator piggy-backs on tokens to choose replicas. Eventually we want to allocate based on current load, not statically Closes #13387 * github.com:scylladb/scylladb: test: topology: Introduce test_tablets.py raft: Introduce 'raft_server_force_snapshot' error injection locator: network_topology_strategy: Support tablet replication service: Introduce tablet_allocator locator: Introduce tablet_aware_replication_strategy locator: Extract maybe_remove_node_being_replaced() dht: token_metadata: Introduce get_my_id() migration_manager: Send tablet metadata as part of schema pull storage_service: Load tablet metadata when reloading topology state storage_service: Load tablet metadata on boot and from group0 changes db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata() migration_notifier: Introduce before_drop_keyspace() migration_manager: Make prepare_keyspace_drop_announcement() return a future<> test: perf: Introduce perf-tablets test: Introduce tablets_test test: lib: Do not override table id in create_table() utils, tablets: Introduce external_memory_usage() db: tablets: Add printers db: tablets: Add persistence layer dht: Use last_token_of_compaction_group() in split_token_range_msb() locator: Introduce tablet_metadata dht: Introduce first_token() dht: Introduce next_token() storage_proxy: Improve trace-level logging locator: token_metadata: Fix confusing comment on ring_range() 
dht, storage_proxy: Abstract token space splitting Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries" db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms() db: Introduce get_non_local_vnode_based_strategy_keyspaces() service: storage_proxy: Avoid copying keyspace name in write handler locator: Introduce per-table replication strategy treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type locator: Introduce effective_replication_map locator: Rename effective_replication_map to vnode_effective_replication_map locator: effective_replication_map: Abstract get_pending_endpoints() db: Propagate feature_service to abstract_replication_strategy::validate_options() db: config: Introduce experimental "TABLETS" feature db: Log replication strategy for debugging purposes db: Log full exception on error in do_parse_schema_tables() db: keyspace: Remove non-const replication strategy getter config: Reformat	2023-04-27 09:40:18 +02:00
Gleb Natapov	9849409c2a	service/raft: raft_group0: drop dependency on migration_manager raft_group0 does not really depend on migration_manager; it needs it only transiently, so pass it to the appropriate methods of raft_group0 instead of during its creation.	2023-04-25 12:38:01 +03:00
Gleb Natapov	d5d156d474	service/raft: raft_group0: drop dependency on query_processor raft_group0 does not really depend on query_processor; it needs it only transiently, so pass it to the appropriate methods of raft_group0 instead of during its creation.	2023-04-25 12:35:57 +03:00
Gleb Natapov	029f1737ef	service/raft: raft_group0: drop dependency on storage_service raft_group0 does not really depend on storage_service; it needs it only transiently, so pass it to the appropriate methods of raft_group0 instead of during its creation.	2023-04-25 11:07:47 +03:00
Tomasz Grabiec	5e89f2f5ba	service: Introduce tablet_allocator Currently, responsible for injecting mutations of system.tablets to schema changes. Note that not all migrations are handled currently. Dependent view or cdc table drops are not handled.	2023-04-24 10:49:37 +02:00
Tomasz Grabiec	d42685d0cb	storage_service: Load tablet metadata on boot and from group0 changes	2023-04-24 10:49:37 +02:00
Tomasz Grabiec	b4ac329367	test: lib: Do not override table id in create_table() It is already set by schema_maker. In tablets_test we will depend on the id being the same as that set in the schema_builder, so don't change it to something else.	2023-04-24 10:49:37 +02:00
Tomasz Grabiec	9b17ad3771	locator: Introduce per-table replication strategy Will be used by tablet-based replication strategies, for which effective replication map is different per table. Also, this patch adapts existing users of effective replication map to use the per-table effective replication map. For simplicity, every table has an effective replication map, even if the erm is per keyspace. This way the client code can be uniform and doesn't have to check whether replication strategy is per table. Not all users of per-keyspace get_effective_replication_map() are adapted yet to work per-table. Those algorithms will throw an exception when invoked on a keyspace which uses per-table replication strategy.	2023-04-24 10:49:36 +02:00
Benny Halevy	dabf46c37f	compaction_group, table_state: add group_id member To help identify the compaction group / table_state. Ref #13467 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-24 10:06:04 +03:00
Botond Dénes	9e757d9c6d	Merge 'De-globalize storage proxy' from Pavel Emelyanov All users of global proxy are gone (*), proxy can be made fully main/cql_test_env local. (*) one test case still needs it, but can get it via cql_test_env Closes #13616 * github.com:scylladb/scylladb: code: Remove global proxy schema_change_test: Use proxy from cql_test_env test: Carry proxy reference on cql_test_env	2023-04-24 09:38:00 +03:00
Botond Dénes	1750bb34b7	Merge 'sstables, replica: add generation generator' from Kefu Chai this is the first step to the uuid-based generation identifier. the goal is to encapsulate the generation related logic in generator, so its consumers do not have to understand the difference between the int64_t based generation and UUID v1 based generation. this commit should not change the behavior of existing scylla. it just allows us to derive from `generation_generator` so we can have another generator which generates UUID based generation identifier. Closes #13073 * github.com:scylladb/scylladb: replica, test: create generation id using generator sstables: add generation_generator test: sstables: use generate_n for generating ids for testing	2023-04-24 09:31:08 +03:00
Tomasz Grabiec	bd0b299322	Merge 'Manage CDC generations when bootstrapping nodes using Raft Group 0 topology coordinator' from Kamil Braun Introduce a new table `CDC_GENERATIONS_V3` (`system.cdc_generations_v3`). The table schema is a copy-paste of the `CDC_GENERATIONS_V2` schema. The difference is that V2 lives in `system_distributed_keyspace` and writes to it are distributed using regular `storage_proxy` replication mechanisms based on the token ring. The V3 table lives in `system_keyspace` and any mutations written to it will go through group 0. Extend the `TOPOLOGY` schema with new columns: - `new_cdc_generation_data_uuid` will be stored as part of a bootstrapping node's `ring_slice`, it stores UUID of a newly introduced CDC generation which is used as partition key for the `CDC_GENERATIONS_V3` table to access this new generation's data. It's a regular column, meaning that every row (corresponding to a node) will have its own. - `current_cdc_generation_uuid` and `current_cdc_generation_timestamp` together form the ID of the newest CDC generation in the cluster. (the uuid is the data key for `CDC_GENERATIONS_V3`, the timestamp is when the CDC generation starts operating). Those are static columns since there's a single newest CDC generation. When topology coordinator handles a request for node to join, calculate a new CDC generation using the bootstrapping node's tokens, translate it to mutation format, and insert this mutation to the CDC_GENERATIONS_V3 table through group 0 at the same time we assign tokens to the node in Raft topology. The partition key for this data is stored in the bootstrapping node's `ring_slice`. After inserting new CDC generation data , we need to pick a timestamp for this generation and commit it, telling all nodes in the cluster to start using the generation for CDC log writes once their clocks cross that timestamp. We introduce a separate step to the bootstrap saga, before `write_both_read_old`, called `commit_cdc_generation`. 
In this step, the coordinator takes the `new_cdc_generation_data_uuid` stored in a bootstrapping node's `ring_slice` - which serves as the key to the table where the CDC generation data is stored - and combines it with a timestamp which it generates a bit into the future (as in old gossiper-based code, we use 2 * ring_delay, by default 1 minute). This gives us a CDC generation ID which we commit into the topology state as the `current_cdc_generation_id` while switching the saga to the next step, `write_both_read_old`. Once a new CDC generation is committed to the cluster by the topology coordinator, we also need to publish it to the user-facing description tables so CDC applications know which streams to read from. This uses regular distributed table writes underneath (tables living in the `system_distributed` keyspace) so it requires `token_metadata` to be nonempty. We need a hack for the case of bootstrapping the first node in the cluster - turning the tokens into normal tokens earlier in the procedure in `token_metadata`, but this is fine for the single-node case since no streaming is happening. When a node notices that a new CDC generation was introduced in `storage_service::topology_state_load`, it updates its internal data structures that are used when coordinating writes to CDC log tables. We include the current CDC generation data in topology snapshot transfers. Some fixes and refactors included. 
Closes #13385 * github.com:scylladb/scylladb: docs: cdc: describe generation changes using group 0 topology coordinator cdc: generation_service: add a FIXME cdc: generation_service: add legacy_ prefix for gossiper-based functions storage_service: include current CDC generation data in topology snapshots db: system_keyspace: introduce `query_mutations` with range/slice storage_service: hold group 0 apply mutex when reading topology snapshot service: raft_group0_client: introduce `hold_read_apply_mutex` storage_service: use CDC generations introduced by Raft topology raft topology: publish new CDC generation to the user description tables raft topology: commit a new CDC generation on node bootstrap raft topology: create new CDC generation data during node bootstrap service: topology_state_machine: make topology::find const db: system_keyspace: small refactor of `load_topology_state` cdc: generation: extract pure parts of `make_new_generation` outside db: system_keyspace: add storage for CDC generations managed by group 0 service: topology_state_machine: better error checking for state name (de)serialization service: raft: plumbing `cdc::generation_service&` cdc: generation: `get_cdc_generation_mutations`: take timestamp as parameter cdc: generation: make `topology_description_generator::get_sharding_info` a parameter sys_dist_ks: make `get_cdc_generation_mutations` public sys_dist_ks: move find_schema outside `get_cdc_generation_mutations` sys_dist_ks: move mutation size threshold calculation outside `get_cdc_generation_mutations` service/raft: group0_state_machine: signal topology state machine in `load_snapshot`	2023-04-21 18:11:27 +02:00
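The `commit_cdc_generation` step described above pairs the stored data UUID with a timestamp 2 * ring_delay in the future. A small sketch of that arithmetic, with hypothetical type and function names and a string standing in for the UUID:

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>

// Hypothetical CDC generation id: when the generation starts operating,
// plus the key of its data in the CDC_GENERATIONS_V3 table.
struct cdc_generation_id_sketch {
    std::chrono::milliseconds timestamp;
    std::string data_uuid;
};

// Picks a start time slightly in the future so every node can learn about the
// generation before its clock crosses that timestamp.
inline cdc_generation_id_sketch commit_cdc_generation_sketch(
        std::chrono::milliseconds now,
        std::chrono::milliseconds ring_delay,
        std::string new_cdc_generation_data_uuid) {
    return cdc_generation_id_sketch{now + 2 * ring_delay,
                                    std::move(new_cdc_generation_data_uuid)};
}
```

With a ring_delay of 30 seconds this gives the one-minute lead mentioned in the description.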
Kefu Chai	576adbdbc5	replica, test: create generation id using generator reuse generation_generator for generating generation identifiers to reduce repetition. also, allow the generator to update its latest known generation id. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-21 22:02:30 +08:00
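The encapsulation goal behind the generator can be sketched like this: consumers receive opaque generation values and never touch the integer underneath. The class names and the `update_known_generation` method name are hypothetical; the real `generation_type` and `generation_generator` may differ.

```cpp
#include <cassert>
#include <cstdint>

class generation_generator_sketch;

// Opaque identifier: comparable, but the integer is private to the implementation.
class generation_sketch {
    std::int64_t _value;
    explicit generation_sketch(std::int64_t v) : _value(v) {}
    friend class generation_generator_sketch;
public:
    friend bool operator==(const generation_sketch& a, const generation_sketch& b) {
        return a._value == b._value;
    }
    friend bool operator<(const generation_sketch& a, const generation_sketch& b) {
        return a._value < b._value;
    }
};

// Hands out fresh generations; an alternative generator could produce UUID-based ids.
class generation_generator_sketch {
    std::int64_t _last_generation;
public:
    explicit generation_generator_sketch(std::int64_t last = 0) : _last_generation(last) {}

    // Returns a generation greater than any this generator has produced or seen.
    generation_sketch operator()() { return generation_sketch(++_last_generation); }

    // Lets the generator learn about a generation created elsewhere.
    void update_known_generation(const generation_sketch& gen) {
        if (gen._value > _last_generation) {
            _last_generation = gen._value;
        }
    }
};
```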
Pavel Emelyanov	739455c3aa	code: Remove global proxy No code needs global proxy anymore. Keep on-stack values in main and cql_test_env and keep the pointer on debug:: namespace. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-21 14:18:59 +03:00
Pavel Emelyanov	681a19f54c	test: Carry proxy reference on cql_test_env All sharded<> services are created by cql_test_env on the stack. The cql_test_env() is then used to keep references on some of them and to export them to test cases via its methods. Proxy is missing on that exportable list, but will be needed, so add one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-21 14:16:54 +03:00
Kamil Braun	59b692e799	service: raft: plumbing `cdc::generation_service&` Pass a reference to the service into places. It shall be used later, by the group 0 state machine and topology coordinator.	2023-04-20 15:38:37 +02:00
Pavel Emelyanov	b239e0d368	test/lib: Add getenv_safe() helper The helper is like ::getenv() but checks if the variable exists and throws descriptive exception. So instead of fatal error: in "...": std::logic_error: basic_string: construction from null is not valid one could get something like fatal error: in "...": std::logic_error: Environment variable ... not set Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-19 12:49:26 +03:00
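A helper of this kind might look roughly like the following. This is a sketch written from the description above, not the code in test/lib; the exact signature and return type are assumptions.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Like std::getenv(), but throws a descriptive error instead of returning a
// null pointer that would later blow up inside std::string's constructor.
inline std::string getenv_safe(const std::string& name) {
    const char* value = std::getenv(name.c_str());
    if (value == nullptr) {
        throw std::logic_error("Environment variable " + name + " not set");
    }
    return std::string(value);
}
```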
Pavel Emelyanov	4bb885b759	sstable: Make storage instance based on storage options This patch adds storage options lw-ptr to sstables_manager::make_sstable and makes the storage instance creation depend on the options. For local it just creates the filesystem storage instance, for S3 -- throws, but next patch will fix that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Pavel Emelyanov	525a261a4e	sstable: Make storage an API Currently sstable carries a filesystem_storage instance on board. Next patches will make it possible to use some other storage with different data accessing methods. This patch makes sstable carry abstract storage interface and make the existing filesystem_storage implement it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Botond Dénes	0a46a574e6	Merge 'Topology: introduce nodes' from Benny Halevy As a first step towards using host_id to identify nodes instead of ip addresses this series introduces a node abstraction, kept in topology, indexed by both host_id and endpoint. The revised interface also allows callers to handle cases where nodes are not found in the topology more gracefully by introducing `find_node()` functions that look up nodes by host_id or inet_address and also get a `must_exist` parameter that, if false (the default parameter value) would return nullptr if the node is not found. If true, `find_node` throws an internal error, since this indicates a violation of an internal assumption that the node must exist in the topology. Callers that may handle missing nodes, should use the more permissive flavor and handle the !find_node() case gracefully. Closes #11987 * github.com:scylladb/scylladb: topology: add node state topology: remove dead code locator: add class node topology: rename update_endpoint to add_or_update_endpoint topology: define get_{rack,datacenter} inline shared_token_metadata: mutate_token_metadata: replicate to all shards locator: endpoint_dc_rack: refactor default_location locator: endpoint_dc_rack: define default operator== test: storage_proxy_test: provide valid endpoint_dc_rack	2023-04-06 13:47:22 +03:00
Tomasz Grabiec	bbabf07f69	Merge 'test/boost/multishard_mutation_query: use random schema' from Botond Dénes This test currently uses `test/lib/test_table.hh` to generate data for its test cases. This data generation facility is used by no other tests. Worse, it is redundant as we already have a random data generator with fixed schema, in `test/lib/mutation_source_test.hh`. So in this series, we migrate the test cases in said test file to random schema and its random data generation facilities. These are used by several other test cases and using random schema allows us to cover a wider (quasi-infinite) number of possibilities. After migrating all tests away from it, `test/lib/test_table.hh` is removed. This series also reduces the runtime of `fuzzy_test` drastically. It should now run in a few minutes or even in seconds (depending on the machine). Fixes: #12944 Closes #12574 * github.com:scylladb/scylladb: test/lib: rm test_table.hh test/boos/multishard_mutation_query_test: migrate other tests to random schema test/boost/multishard_mutation_query_test: use ks keyspace test/boost/multishard_mutation_query_test: improve test pager test/boost/multishard_mutation_query_test: refactor fuzzy_test test/boost: add multishard_mutation_query_test more memory types/user: add get_name() accessor test/lib/random_schema: add create_with_cql() test/lib/random_schema: fix udt handling test/lib/random_schema: type_generator(): also generate frozen types test/lib/random_schema: type_generator(): make static column generation conditional test/lib/random_schema: type_generator(): don't generate duration_type for keys test/lib/random_schema: generate_random_mutations(): add overload with seed test/lib/random_schema: generate_random_mutations(): respect range tombstone count param test/lib/random_schema: generate_random_mutations(): add yields test/lib/random_schema: generate_random_mutations(): fix indentation test/lib/random_schema: generate_random_mutations(): coroutinize method 
test/lib/random_schema: generate_random_mutations(): expand comment	2023-04-05 10:32:58 +02:00
Botond Dénes	8167f11a23	Merge 'Move compaction manager tasks out of compaction manager' from Aleksandra Martyniuk Task manager compaction tasks that cover compaction group compaction need access to compaction_manager::tasks. To avoid circular dependency and be able to rely on forward declaration, task needs to be moved out of compaction manager. To avoid naming confusion compaction_manager::task is renamed. Closes #13226 * github.com:scylladb/scylladb: compaction: use compaction namespace in compaction_manager.cc compaction: rename compaction::task compaction: move compaction_manager::task out of compaction manager compaction: move sstable_task definition to source file	2023-04-03 15:40:42 +03:00
Benny Halevy	f3d5df5448	locator: add class node And keep per node information (idx, host_id, endpoint, dc_rack, is_pending) in node objects, indexed by topology on several indices like: idx, host_id, endpoint, current/pending, per dc, per dc/rack. The node index is a shorthand identifier for the node. node* and index are valid while the respective topology instance is valid. To be used, the caller must hold on to the topology / token_metadata object (e.g. via a token_metadata_ptr or effective_replication_map) Refs #6403 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> topology: add node idx Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:13:02 +03:00
Tomasz Grabiec	4d6443e030	Merge 'Schema commitlog separate dir' from Gusev Petr The commitlog api originally implied that the commitlog_directory would contain files from a single commitlog instance. This is checked in segment_manager::list_descriptors, if it encounters a file with an unknown prefix, an exception occurs in `commitlog::descriptor::descriptor`, which is logged with the `WARN` level. A new schema commitlog was added recently, which shares the filesystem directory with the main commitlog. This causes warnings to be emitted on each boot. This patch solves the warnings problem by moving the schema commitlog to a separate directory. In addition, the user can employ the new `schema_commitlog_directory` parameter to move the schema commitlog to another disk drive. This is expected to be released in 5.3. As #13134 (raft tables->schema commitlog) is also scheduled for 5.3, and it already requires a clean rolling restart (no cl segments to replay), we don't need to specifically handle upgrade here. Fixes: #11867 Closes #13263 * github.com:scylladb/scylladb: commitlog: use separate directory for schema commitlog schema commitlog: fix commitlog_total_space_in_mb initialization	2023-03-30 23:48:58 +02:00

905 Commits