scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 20:46:56 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	b3df2d0db0	s3/test: Tune-up multipart upload test alignment Currently the test uses a sequence of 1024-byte buffers. This lets minio server actively de-duplicate those blocks by page boundary (it's a guess, but it's truish because minio reports back equivalent ETags for lots of uploading parts). Make the buffer size not be a power of two so that when squashed together the resulting 2^X buffers don't get equal. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-16 12:23:18 +03:00
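The alignment argument in this commit can be checked with a small sketch. It assumes the upload stream is one buffer repeated back to back and that the server fingerprints fixed-size 2^X chunks; `count_distinct_chunks`, the fill pattern, and the SHA-256 fingerprint are illustrative stand-ins, not the test's or minio's actual code.

```python
import hashlib


def count_distinct_chunks(buffer_size: int, chunk_size: int, chunks: int) -> int:
    """Count distinct fingerprints among fixed-size chunks of a repeated buffer."""
    # Hypothetical fill pattern, chosen only so that bytes within one buffer vary.
    buf = bytes(i % 251 for i in range(buffer_size))
    total = chunk_size * chunks
    # Repeat the buffer enough times to cover `total` bytes, then trim.
    stream = (buf * (total // buffer_size + 1))[:total]
    fingerprints = {
        hashlib.sha256(stream[off:off + chunk_size]).hexdigest()
        for off in range(0, total, chunk_size)
    }
    return len(fingerprints)


if __name__ == "__main__":
    # Power-of-two buffer: every 4096-byte chunk starts on a buffer boundary.
    print(count_distinct_chunks(1024, 4096, 8))
    # Non-power-of-two buffer: chunk starts drift relative to the buffer.
    print(count_distinct_chunks(1000, 4096, 8))
```

With a 1024-byte buffer every 4096-byte chunk has the same content, while a 1000-byte buffer shifts each chunk's starting offset within the buffer, so the chunks differ.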
Pavel Emelyanov	fffa04fa67	s3/test: Add jumbo upload test It re-uses most of the existing upload sink test, but configures the jumbo sink with at most 3 parts in each intermediate object, so that the test does not have to upload a 50Gb part to switch to the next one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-16 12:23:18 +03:00
Alejo Sanchez	19687b54f1	test/pytest: yaml configuration cluster section Separate cluster_size into a cluster section and specify this value as initial_size. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13440	2023-05-15 09:48:39 +02:00
Botond Dénes	0cff0ffa08	Merge 'alternator,config: make alternator_timeout_in_ms live-updateable' from Kefu Chai before this change, alternator_timeout_in_ms is not live-updatable, as after setting executor's default timeout right before creating sharded executor instances, they never get updated with this option anymore. but many users would like to set the driver timers based on server timers. we need to enable them to configure timeout even when the server is still running. in this change, * `alternator_timeout_in_ms` is marked as live-updateable * `executor::_s_default_timeout` is changed to a thread_local variable, so it can be updated by a per-shard updateable_value. and it is now a updateable_value, so its variable name is updated accordingly. this value is set in the ctor of executor, and it is disconnected from the corresponding named_value<> option in the dtor of executor. * alternator_timeout_in_ms is passed to the constructor of executor via sharded_parameter, so `executor::_timeout_in_ms` can be initialized on per-shard basis * `executor::set_default_timeout()` is dropped, as we already pass the option to executor in its ctor. Fixes #12232 Closes #13300 * github.com:scylladb/scylladb: alternator: split the param list of executor ctor into multi lines alternator,config: make alternator_timeout_in_ms live-updateable	2023-05-15 10:16:29 +03:00
Botond Dénes	6c27297406	Merge 'test: sstable_test: use generator to create new generations' from Kefu Chai in this series, instead of hardwiring to integer, we switch to generation generator for creating new generations. this should help us to migrate to a generation identifier which can also be represented by a UUID. and potentially can help to improve the testing coverage once we switch over to UUID-based generation identifier. will need to parameterize these tests by then, for sure. Closes #13863 github.com:scylladb/scylladb: test: sstable: use generator to generate generations test: sstable: pass generation_type in helper functions test: sstable: use generator to generate generations	2023-05-15 10:04:30 +03:00
Botond Dénes	20ff122a84	Merge 'Delete S3 sstables without the help of deletion log' from Pavel Emelyanov There are two layers of sstables deletion -- delete-atomically and wipe. The former is in fact the "API" method, it's called by table code when the specific sstable(s) are no longer needed. It's called "atomically" because it's expected to fail in the middle in a safe manner so that subsequent boot would pick the dangling parts and proceed. The latter is a low-level removal function that can fail in the middle, but it's not of _its_ care. Currently the atomic deletion is implemented with the help of sstable_directory::delete_atomically() method that commits sstable file names into deletion log, then calls wipe (indirectly), then drops the deletion log. On boot all found deletion logs are replayed. The described functionality is used regardless of the sstable storage type, even for S3, though deletion log is an overkill for S3, it would be better implemented with the help of ownership table. In fact, S3 storage already implements atomic deletion in its wipe method thus being overly careful. So this PR - makes atomic deletion be storage-specific - makes S3 wipe non-atomic fixes: #13016 note: Replaying sstables deletion from ownership table on boot is not here, see #13024 Closes #13562 * github.com:scylladb/scylladb: sstables: Implement atomic deleter for s3 storage sstables: Get atomic deleter from underlying storage sstables: Move delete_atomically to manager and rename	2023-05-15 08:57:47 +03:00
Wojciech Mitros	96e912e1cf	auth: disallow CREATE permission on a specific function Similarly to how we handle Roles and Tables, we do not allow permissions on non-existent objects, so the CREATE permission on a specific function is meaningless, because for the permission to be granted to someone, the function must be already created. This patch removes the CREATE permission from the set of permissions applicable to a specific function. Fixes #13822 Closes #13824	2023-05-14 18:40:34 +03:00
Wojciech Mitros	1e18731a69	cql-pytest: translate Cassandra's UFTypesTest This is a translation of Cassandra's CQL unit test source file validation/entities/UFTypesTest.java into our cql-pytest framework. There are 7 tests, which reproduce one known bug: Refs #13746: UDF can only be used in SELECT, and abort when used in WHERE, or in INSERT/UPDATE/DELETE commands And uncovered two previously unknown bugs: Refs #13855: UDF with a non-frozen collection parameter cannot be called on a frozen value Refs #13860: A non-frozen collection returned by a UDF cannot be used as a frozen one Additionally, we encountered an issue that can be treated as either a bug or a hole in documentation: Refs #13866: Argument and return types in UDFs can be frozen Closes #13867	2023-05-14 15:22:03 +03:00
Avi Kivity	31e820e5a1	Merge 'Allow tombstone GC in compaction to be disabled on user request' from Raphael "Raph" Carvalho Adding new APIs /column_family/tombstone_gc and /storage_service/tombstone_gc, that will allow for disabling tombstone garbage collection (GC) in compaction. Mimics existing APIs /column_family/autocompaction and /storage_service/autocompaction. column_family variant must specify a single table only, following existing convention. whereas the storage_service one can specify an entire keyspace, or a subset of tables in a keyspace. column_family API usage ----- ``` The table name must be in keyspace:name format Get status: curl -s -X GET "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf" Enable GC curl -s -X POST "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf" Disable GC curl -s -X DELETE "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf" ``` storage_service API usage ----- ``` Tables can be specified using a comma-separated list. Enable GC on keyspace curl -s -X POST "http://127.0.0.1:10000/storage_service/tombstone_gc/ks" Disable GC on keyspace curl -s -X DELETE "http://127.0.0.1:10000/storage_service/tombstone_gc/ks" Enable GC on a subset of tables curl -s -X POST "http://127.0.0.1:10000/storage_service/tombstone_gc/ks?cf=table1,table2" ``` Closes #13793 * github.com:scylladb/scylladb: test: Test new API for disabling tombstone GC test: rest_api: extract common testing code into generic functions Add API to disable tombstone GC in compaction api: storage_service: restore indentation api: storage_service: extract code to set attribute for a set of tables tests: Test new option for disabling tombstone GC in compaction compaction_strategy: bypass tombstone compaction if tombstone GC is disabled table: Allow tombstone GC in compaction to be disabled on user request	2023-05-14 14:16:16 +03:00
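The curl calls in this merge message map onto a small request builder. This is a minimal sketch that only constructs the HTTP method and URL shown in the examples; the helper names `table_gc_request` and `keyspace_gc_request` are hypothetical, and nothing here sends a request.

```python
from typing import Optional, Sequence, Tuple

# Address taken from the curl examples in the commit message.
BASE_URL = "http://127.0.0.1:10000"

# HTTP verbs as used in the examples: GET = status, POST = enable, DELETE = disable.
_METHODS = {"status": "GET", "enable": "POST", "disable": "DELETE"}


def table_gc_request(action: str, keyspace: str, table: str) -> Tuple[str, str]:
    """Build (method, url) for the column_family variant (single table only)."""
    # The table name must be in keyspace:name format.
    return _METHODS[action], f"{BASE_URL}/column_family/tombstone_gc/{keyspace}:{table}"


def keyspace_gc_request(
    action: str, keyspace: str, tables: Optional[Sequence[str]] = None
) -> Tuple[str, str]:
    """Build (method, url) for the storage_service variant (keyspace or subset)."""
    # Only enable/disable appear in the storage_service examples.
    if action not in ("enable", "disable"):
        raise ValueError(f"unsupported action for keyspace scope: {action}")
    url = f"{BASE_URL}/storage_service/tombstone_gc/{keyspace}"
    if tables:
        # Tables are passed as a comma-separated list in the cf parameter.
        url += "?cf=" + ",".join(tables)
    return _METHODS[action], url


if __name__ == "__main__":
    print(table_gc_request("disable", "ks", "cf"))
    print(keyspace_gc_request("enable", "ks", ["table1", "table2"]))
```

The returned pair can be handed to any HTTP client; keeping construction separate from sending makes the mapping easy to verify without a running node.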
Tomasz Grabiec	a91e83fad6	Merge "issue raft read barrier before pulling schema" from Gleb Schema pull may fail because the pull does not contain everything that is needed to instantiate a schema pointer. For instance it does not contain a keyspace. This series changes the code to issue raft read barrier before the pull which will guarantee that the keyspace is created before the actual schema pull is performed.	2023-05-14 14:14:24 +03:00
Raphael S. Carvalho	a7ceb987f5	test: Fix sporadic failures of database_test database_test is failing sporadically and the cause was traced back to commit `e3e7c3c7e5`. The commit forces a subset of tests in database_test, to run once for each of predefined x_log2_compaction_group settings. That causes two problems: 1) test becomes 240% slower in dev mode. 2) queries on system.auth are timing out, and the reason is a small table being spread across hundreds of compaction groups in each shard. so to satisfy a range scan, there will be multiple hops, making the overhead huge. additionally, the compaction group aware sstable set is not merged yet. so even point queries will unnecessarily scan through all the groups. Fixes #13660. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #13851	2023-05-14 14:14:24 +03:00
Avi Kivity	97694d26c4	Merge 'reader_permit: minor improvements to resource consume/release safety' from Botond Dénes This PR contains some small improvements to the safety of consuming/releasing resources to/from the semaphore: * reader_permit: make the low-level `consume()/signal()` API private, making the only user (an RAII class) friend. * reader_resources: split `reset()` into `noexcept` and potentially throwing variant. * reader_resources::reset_to(): try harder to avoid calling `consume()` (when the new resource amount is smaller than the previous one) Closes #13678 * github.com:scylladb/scylladb: reader_permit: resource_units::reset_to(): try harder to avoid calling consume() reader_permit: split resource_units::reset() reader_permit: make consume()/signal() API private	2023-05-14 14:14:23 +03:00
Avi Kivity	0a78995e2b	Merge 'Share s3 clients between sstables' from Pavel Emelyanov Currently s3::client is created for each sstable::storage. It's later shared between sstable's files and upload sink(s). Also foreign_sstable_open_info can produce a file from a handle making a new standalone client. Coupled with the seastar's http client spawning connections on demand, this makes it impossible to control the number of open connections to object storage server. In order to put some policy on top of that (as well as apply workload prioritization) s3 clients should be collected in one place and then shared by users. Since s3::client uses seastar::http::client under the hood which, in turn, can generate many connections on demand, it's enough to produce a single s3::client per configured endpoint on each shard and then share it between all the sstables, files and sinks. There's one difficulty however, solving which is most of what this PR does. The file handle, that's used to transfer sstable's file across shards, should keep aboard all it needs to re-create a file on another shard. Since there's a single s3::client per shard, creation of a file out of a handle should grab that shard's client somehow. The meaningful shard-local object that can help is the sstables_manager and there are three ways to make use of it. All deal with the fact that sstables_manager-s are not sharded<> services, but are owned by the database independently on each shard. 1. walk the client -> sst.manager -> database -> container -> database -> sst.manager -> client chain by keeping its first half on the handle and unrolling the second half to produce a file 2. keep sharded peering service referenced by the sstables_manager that's initialized in main and passed through the database constructor down to sstables_manager(s) 3. equip file_handle::to_file with the "context" argument and teach sstables foreign info opener to push sstables_manager down to s3 file ... somehow This PR chooses the 2nd way and introduces the sstables::storage_manager main-local sharded peering service that maintains all the s3::clients. "While at it" the new manager gets the object_storage_config updating facilities from the database (it's overloaded even without it already). Later the manager will also be in charge of collecting and exporting S3 metrics. In order to limit the number of S3 connections it also needs a patch to seastar http::client, there's a PR already doing that, once (if) merged there'll come one more fix on top. refs: #13458 refs: #13369 refs: scylladb/seastar#1652 Closes #13859 * github.com:scylladb/scylladb: s3: Pick client from manager via handle s3: Generalize s3 file handle s3: Live-update clients' configs sstables: Keep clients shared across sstables storage_manager: Rewrap config map sstables, database: Move object storage config maintenance onto storage_manager sstables: Introduce sharded<storage_manager>	2023-05-14 14:14:23 +03:00
Pavel Emelyanov	5985f00da9	sstables: Move delete_atomically to manager and rename This is to let manager decide which storage driver to call for atomic sstables deletion in the next patch. While at it -- rename the sstable_directory's method into something more descriptive (to make compiler catch all callers of it). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-12 17:52:12 +03:00
Raphael S. Carvalho	107999c990	test: Test new API for disabling tombstone GC Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-05-12 10:34:38 -03:00
Raphael S. Carvalho	c396db2e4c	test: rest_api: extract common testing code into generic functions Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-05-12 10:34:38 -03:00
Raphael S. Carvalho	6c32148751	tests: Test new option for disabling tombstone GC in compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-05-12 10:14:28 -03:00
Raphael S. Carvalho	3b28c26c77	table: Allow tombstone GC in compaction to be disabled on user request If tombstone GC was disabled, compaction will ensure that fully expired sstables won't be bypassed and that no expired tombstones will be purged. Changing the value takes immediate effect even on ongoing compactions. Not wired into an API yet. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-05-12 10:14:28 -03:00
Kefu Chai	e89e0d4b28	test: sstable: use generator to generate generations instead of assuming the integer-based generation id, let's use the generation generator for creating a new generation id. this helps us to improve the testing coverage once we migrate to the UUID-based generation identifier. this change uses generator to generate generations for `make_sstable_for_all_shards()`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-05-12 13:22:32 +08:00
Kefu Chai	e3d6dd46b7	test: sstable: pass generation_type in helper functions always avoid using generation_type::int_t if possible. this helps us to hide the underlying type of generation identifier, which could also be a UUID in future. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-05-12 13:22:32 +08:00
Kefu Chai	e788bfbb43	test: sstable: use generator to generate generations instead of assuming the integer-based generation id, let's use the generation generator for creating a new generation id. this helps us to improve the testing coverage once we migrate to the UUID-based generation identifier. this change uses generator to create generations for `make_sstable_for_this_shard()`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-05-12 13:22:30 +08:00
Pavel Emelyanov	a59096aa70	sstables, database: Move object storage config maintenance onto storage_manager Right now the map<endpoint, config> sits on the sstables manager and its update is governed by database (because it's peering and can kick other shards to update it as well). Having the sharded<storage_manager> at hand lets us free the database from the need to update configs and keeps sstables_manager a bit smaller. Also this will allow keeping s3 clients shared between sstables via this map by next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-11 19:39:00 +03:00
Pavel Emelyanov	2153751d45	sstables: Introduce sharded<storage_manager> The manager in question keeps track of whatever sstables_manager needs to work with the storage (spoiler: only S3 one). It's main-local sharded peering service, so that container() call can be used by next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-11 19:36:01 +03:00
Gleb Natapov	091ec285fe	serialized_action: make serialized_action abortable Add an ability to abort waiting for a result of a specific trigger() invocation.	2023-05-11 16:31:23 +03:00
Botond Dénes	3d75158fda	Merge 'Allow no owned token ranges in cleanup compaction' from Benny Halevy It is possible that a node will have no owned token ranges in some keyspaces based on their replication strategy, if the strategy is configured to have no replicas in this node's data center. In this case we should go ahead with cleanup that will effectively delete all data. Note that this is currently very inefficient as we need to filter every partition and drop it as unowned. It can be optimized by either special casing this case or, better, use skip forward to the next owned range. This will skip to end-of-stream since there are no owned ranges. Fixes #13634 Also, add a respective rest_api unit test Closes #13849 * github.com:scylladb/scylladb: test: rest_api: test_storage_service: add test_storage_service_keyspace_cleanup_with_no_owned_ranges compaction_manager: perform_cleanup: handle empty owned ranges	2023-05-11 15:05:06 +03:00
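The per-partition filtering described in this merge message can be pictured with a minimal sketch. It models tokens as plain integers and owned ranges as half-open `(start, end)` pairs; `cleanup_filter` and this range representation are assumptions for illustration, not the compaction code.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical representation: a half-open token range [start, end).
TokenRange = Tuple[int, int]


def cleanup_filter(tokens: Iterable[int], owned_ranges: Sequence[TokenRange]) -> List[int]:
    """Keep only partitions whose token falls inside some owned range."""
    return [
        token
        for token in tokens
        if any(start <= token < end for start, end in owned_ranges)
    ]


if __name__ == "__main__":
    partitions = [1, 5, 9]
    # Some ranges owned: only matching partitions survive cleanup.
    print(cleanup_filter(partitions, [(0, 6)]))
    # No owned ranges: every partition is checked and dropped as unowned.
    print(cleanup_filter(partitions, []))
```

With an empty range list every partition is still visited once before being dropped, which is the inefficiency the message notes.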
Botond Dénes	24cb351655	Merge 'test: sstable_test: avoid using helper using generation_type::int_t ' from Kefu Chai the series drops some of the callers using SSTable generation as integer. as the generation of SSTable is but an identifier, we should not use it as an integer out of generation_type's implementation. Closes #13845 github.com:scylladb/scylladb: test: drop unused helper functions test: sstable_mutation_test: avoid using helper using generation_type::int_t test: sstable_move_test: avoid using helper using generation_type::int_t test: sstable_*test: avoid using helper using generation_type::int_t test: sstable_3_x_test: do not use reuseable_sst() accepting integer	2023-05-11 10:17:02 +03:00
Benny Halevy	0b91bfbcc5	test: rest_api: test_storage_service: add test_storage_service_keyspace_cleanup_with_no_owned_ranges Test cleanup on a keyspace after altering its replication factor to 0. Expect no sstables to remain. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-11 08:16:31 +03:00
Kefu Chai	29284d64a5	test: drop unused helper functions all users of these two helpers have switched to their alternatives, so there is no need to keep them. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-05-11 12:32:37 +08:00
Kefu Chai	b036d2b50c	test: sstable_mutation_test: avoid using helper using generation_type::int_t this change is one of the series which drops most of the callers using SSTable generation as integer. as the generation of SSTable is but an identifier, we should not use it as an integer out of generation_type's implementation. so, in this change, instead of using `generation_type::int_t` in the helper functions, we just pass `generation_type` in place of integer. also, since `generate_clustered()` is only used by functions in the same compilation unit, let's take the opportunity to mark it `static`. and there is no need to pass generation as a template parameter, we just pass it as a regular parameter. we will divert other callers of `reusable_sst(..., generation_type::int)` in following-up changes in different ways. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-05-11 12:32:22 +08:00
Kefu Chai	689e1e99d6	test: sstable_move_test: avoid using helper using generation_type::int_t this change is one of the series which drops most of the callers using SSTable generation as integer. as the generation of SSTable is but an identifier, we should not use it as an integer out of generation_type's implementation. so, in this change, instead of using `generation_type::int_t` in helper functions, we just use `generation_type`. please note, despite that we'd prefer generating the generations using generator, the SSTables used by the tests modified by this change are stored in the repo, to ensure that the tests are always able to find the SSTable files, we keep them unchanged instead of using generation_generator, or a random generation for the testing. we will divert other callers of `reusable_sst(..., generation_type::int)` in following-up changes in different ways. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-05-11 12:32:22 +08:00
Kefu Chai	bfd6caffbb	test: sstable_*test: avoid using helper using generation_type::int_t this change is one of the series which drops most of the callers using SSTable generation as integer. as the generation of SSTable is but an identifier, we should not use it as an integer out of generation_type's implementation. so, in this change, instead of using the helper accepting int, we switch to the one which accepts generation_type by offering a default parameter, which is a generation created using 1. this preserves the existing behavior. we will divert other callers of `reusable_sst(..., generation_type::int)` in following-up changes in different ways. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-05-11 12:32:22 +08:00
Kefu Chai	ab8efbf1ab	test: sstable_3_x_test: do not use reuseable_sst() accepting integer this change is one of the series which drops most of the callers using SSTable generation as integer. as the generation of SSTable is but an identifier, we should not use it as an integer out of generation_type's implementation. so, in this change, instead of using the helper accepting int, we switch to the one which accepts generation_type. also, as no callers are using the last parameter of `make_test_sstable()`, let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-05-11 12:32:21 +08:00
Nadav Har'El	f1cad230bb	Merge 'cql: enable setting permissions on resources with quoted UDT names' from Wojciech Mitros This series fixes an issue with altering permissions on UDFs with parameter types that are UDTs with quoted names and adds a test for it. The issue was caused by the format of the temporary string that represented the UDT in `auth::resource`. After parsing the user input to a raw type, we created a string representing the UDT using `ut_name::to_string()`. The segment of the resulting string that represented the name of the UDT was not quoted, making us unable to parse it again when the UDT was being `prepare`d. Other than for this purpose, the `ut_name::to_string()` is used only for logging, so the solution was modifying it to maybe quote the UDT name. Ref: https://github.com/scylladb/scylladb/pull/12869 Closes #13257 * github.com:scylladb/scylladb: cql-pytest: test permissions for UDTs with quoted names cql: maybe quote user type name in ut_name::to_string() cql: add a check for currently used stack in parser cql-pytest: add an optional name parameter to new_type()	2023-05-10 19:10:29 +03:00
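The "maybe quote" idea in this merge message can be sketched as follows. The rule used here (leave a name bare only if it is a lowercase identifier, otherwise wrap it in double quotes and double any embedded quote) is a simplified assumption: it ignores reserved keywords, and `maybe_quote` and `user_type_to_string` are hypothetical helpers, not the actual `ut_name::to_string()` code.

```python
import re

# Assumed rule: names made of lowercase letters, digits and underscores,
# starting with a letter, can be written without quotes.
_UNQUOTED_NAME = re.compile(r"[a-z][a-z0-9_]*")


def maybe_quote(name: str) -> str:
    """Return the name unchanged if it parses bare, else double-quote it."""
    if _UNQUOTED_NAME.fullmatch(name):
        return name
    # Embedded double quotes are escaped by doubling them.
    return '"' + name.replace('"', '""') + '"'


def user_type_to_string(keyspace: str, type_name: str) -> str:
    """Hypothetical string form of a user type that can be parsed again."""
    return f"{keyspace}.{maybe_quote(type_name)}"


if __name__ == "__main__":
    print(user_type_to_string("ks", "my_type"))
    print(user_type_to_string("ks", "MyType"))
```

Without the quoting step, a case-sensitive name such as `MyType` would be written bare and could not be resolved when the string is parsed back.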
Wojciech Mitros	1f45c7364c	cql: check permissions for used functions when creating a UDA Currently, when creating a UDA, we only check for permissions for creating functions. However, the creator gains all permissions to the UDA, including the EXECUTE permission. This enables the user to also execute the state/reduce/final functions that were used in the UDA, even if they don't have the EXECUTE permissions on them. This patch adds checks for the missing EXECUTE permissions, so that the UDA can be only created if the user has all required permissions. The new permissions that are now required when creating a UDA are now granted in the existing UDA test. Fixes #13818 Closes #13819	2023-05-10 18:06:04 +03:00
Wojciech Mitros	a86b9fa0bb	auth: fix formatting of function resource with no arguments Currently, when a function has no arguments, the function_args() method, which is supposed to return a vector of string_views representing the arguments of the function, returns a nullopt instead, as if it was a functions_resource on all functions or all functions in a keyspace. As a result, the functions_resource can't be properly formatted. This is fixed in this patch by returning an empty vector instead, and the fix is confirmed in a cql-pytest. Fixes #13842 Closes #13844	2023-05-10 17:07:33 +03:00
Nadav Har'El	e57252092c	Merge 'cql3: result_set, selector: change value type to managed_bytes_opt' from Avi Kivity CQL evolved several expression evaluation mechanisms: WHERE clause, selectors (the SELECT clause), and the LWT IF clause are just some examples. Most now use expressions, which use managed_bytes_opt as the underlying value representation, but selectors still use bytes_opt. This poses two problems: 1. bytes_opt generates large contiguous allocations when used with large blobs, impacting latency 2. trying to use expressions with bytes_opt will incur a copy, reducing performance To solve the problem, we harmonize the data types to managed_bytes_opt (#13216 notwithstanding). This is somewhat difficult since the source of the values are views into a bytes_ostream. However, luckily bytes_ostream and managed_bytes_view are mostly compatible so with a little effort this can be done. The series is neutral wrt performance: before: ``` 222118.61 tps ( 61.1 allocs/op, 12.1 tasks/op, 43092 insns/op, 0 errors) 224250.14 tps ( 61.1 allocs/op, 12.1 tasks/op, 43094 insns/op, 0 errors) 224115.66 tps ( 61.1 allocs/op, 12.1 tasks/op, 43092 insns/op, 0 errors) 223508.70 tps ( 61.1 allocs/op, 12.1 tasks/op, 43107 insns/op, 0 errors) 223498.04 tps ( 61.1 allocs/op, 12.1 tasks/op, 43087 insns/op, 0 errors) ``` after: ``` 220708.37 tps ( 61.1 allocs/op, 12.1 tasks/op, 43118 insns/op, 0 errors) 225168.99 tps ( 61.1 allocs/op, 12.1 tasks/op, 43081 insns/op, 0 errors) 222406.00 tps ( 61.1 allocs/op, 12.1 tasks/op, 43088 insns/op, 0 errors) 224608.27 tps ( 61.1 allocs/op, 12.1 tasks/op, 43102 insns/op, 0 errors) 225458.32 tps ( 61.1 allocs/op, 12.1 tasks/op, 43098 insns/op, 0 errors) ``` Though I expect with some more effort we can eliminate some copies. 
Closes #13637 * github.com:scylladb/scylladb: cql3: untyped_result_set: switch to managed_bytes_view as the cell type cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt cql3: untyped_result_set: always own data types: abstract_type: add mixed-type versions of compare() and equal() utils/managed_bytes, serializer: add conversion between buffer_view<bytes_ostream> and managed_bytes_view utils: managed_bytes: add bidirectional conversion between bytes_opt and managed_bytes_opt utils: managed_bytes: add managed_bytes_view::with_linearized() utils: managed_bytes: mark managed_bytes_view::is_linearized() const	2023-05-10 15:01:45 +03:00
Botond Dénes	bb62038119	Merge 'Scrub compaction task' from Aleksandra Martyniuk Task manager's tasks covering scrub compaction on top, shard and table level. For these levels we have common scrub tasks for each scrub mode since they share code. Scrub modes will be differentiated on compaction group level. Closes #13694 * github.com:scylladb/scylladb: test: extend test_compaction_task.py to test scrub compaction compaction: add table_scrub_sstables_compaction_task_impl compaction: add shard_scrub_sstables_compaction_task_impl compaction: add scrub_sstables_compaction_task_impl api: get rid of unnecessary std::optional in scrub compaction: rename rewrite_sstables_compaction_task_impl	2023-05-10 14:18:20 +03:00
Kamil Braun	7d9ab44e81	Merge 'token_metadata: read remapping for write_both_read_new' from Gusev Petr When new nodes are added or existing nodes are deleted, the topology state machine needs to shunt reads from the old nodes to the new ones. This happens in the `write_both_read_new` state. The problem is that previously this state was not handled in any way in `token_metadata` and the read nodes were only changed when the topology state machine reached the final 'owned' state. To handle `write_both_read_new` an additional `interval_map` inside `token_metadata` is maintained similar to `pending_endpoints`. It maps the ranges affected by the ongoing topology change operation to replicas which should be used for reading. When topology state sm reaches the point when it needs to switch reads to a new topology, it passes `request_read_new=true` in a call to `update_pending_ranges`. This forces `update_pending_ranges` to compute the ranges based on new topology and store them to the `interval_map`. On the data plane, when a read on coordinator needs to decide which endpoints to use, it first consults this `interval_map` in `token_metadata`, and only if it doesn't contain a range for current token it uses normal endpoints from `effective_replication_map`. Closes #13376 * github.com:scylladb/scylladb: storage_proxy, storage_service: use new read endpoints storage_proxy: rename get_live_sorted_endpoints->get_endpoints_for_reading token_metadata: add unit test for endpoints_for_reading token_metadata: add endpoints for reading sequenced_set: add extract_set method token_metadata_impl: extract maybe_migration_endpoints helper function token_metadata_impl: introduce migration_info token_metadata_impl: refactor update_pending_ranges token_metadata: add unit tests token_metadata: fix indentation token_metadata_impl: return unique_ptr from clone functions	2023-05-10 10:03:30 +02:00
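The read path described in this merge message (consult the read-remapping map first, fall back to the normal endpoints otherwise) can be sketched as below. A list of half-open `(start, end, replicas)` tuples stands in for the `interval_map`, and a callable stands in for the lookup in `effective_replication_map`; all names are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for the interval_map: [start, end) -> replicas to read from.
ReadRange = Tuple[int, int, List[str]]


def endpoints_for_reading(
    token: int,
    read_new_ranges: Sequence[ReadRange],
    normal_endpoints: Callable[[int], List[str]],
) -> List[str]:
    """Pick read replicas for a token during a topology change."""
    # First consult the ranges affected by the ongoing topology change.
    for start, end, replicas in read_new_ranges:
        if start <= token < end:
            return replicas
    # Not affected: use the normal endpoints for this token.
    return normal_endpoints(token)


if __name__ == "__main__":
    remapped = [(100, 200, ["new-node-1", "node-2"])]
    normal = lambda token: ["node-1", "node-2"]
    print(endpoints_for_reading(150, remapped, normal))
    print(endpoints_for_reading(50, remapped, normal))
```

When no topology change is in progress the remapping list is empty and every read goes straight to the normal endpoints.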
Avi Kivity	996f717dfc	Merge 'cql3/prepare_expr: force token() receiver name to be partition key token' from Jan Ciołek Let's say that we have a prepared statement with a token restriction: ```cql SELECT * FROM some_table WHERE token(p1, p2) = ? ``` After calling `prepare` the driver receives some information about the prepared statement, including names of values bound to each bind marker. In case of a partition token restriction (`token(p1, p2) = ?`) there's an expectation that the name assigned to this bind marker will be `"partition key token"`. In a recent change the code handling `token()` expressions has been unified with the code that handles generic function calls, and as a result the name has changed to `token(p1, p2)`. It turns out that the Java driver relies on the name being `"partition key token"`, so a change to `token(p1, p2)` broke some things. This patch sets the name back to `"partition key token"`. To achieve this we detect any restrictions that match the pattern `token(p1, p2, p3) = X` and set the receiver name for X to `"partition key token"`. Fixes: #13769 Closes #13815 * github.com:scylladb/scylladb: cql-pytest: test that bind marker is partition key token cql3/prepare_expr: force token() receiver name to be partition key token	2023-05-09 20:44:46 +03:00
Petr Gusev	15fe4d8d69	token_metadata: add unit test for endpoints_for_reading	2023-05-09 18:42:03 +04:00
Botond Dénes	287ccce1cc	Merge 'sstables: extract storage out ' from Kefu Chai this change extracts the storage class and its derived classes out into their own source files. for a couple of reasons: - for better readability. the sstables.hh is over 1005 lines. and sstables.cc 3602 lines. it's a little bit difficult to figure out how the different parts in these sources interact with each other. for instance, with this change, it's clear some of helper functions are only used by file_system_storage. - probably less inter-source dependency. by extracting the source files out, they can be compiled individually, so changing one .cc file does not impact others. this could speed up the compilation time. Closes #13785 * github.com:scylladb/scylladb: sstables: storage: coroutinize idempotent_link_file() sstables: extract storage out	2023-05-09 14:03:40 +03:00
Jan Ciolek	9ad1c5d9f2	cql-pytest: test that bind marker is partition key token When preparing a query each bind marker gets a name. For a query like: ```cql SELECT * FROM some_table WHERE token(p1, p2) = ? ``` The bind marker's name should be `"partition key token"`. Java driver relies on this name, having something else, like `"token(p1, p2)"` be the name breaks the Java driver. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-05-09 12:33:06 +02:00
Petr Gusev	3120cabf56	token_metadata: add unit tests We are going to refactor update_pending_ranges, so in this commit we add some simple unit tests to ensure we don't break it.	2023-05-09 13:56:06 +04:00
Aleksandra Martyniuk	f199ec5ec3	test: extend test_compaction_task.py to test scrub compaction	2023-05-09 11:15:26 +02:00
Kefu Chai	2eefcb37eb	sstables: extract storage out this change extracts the storage class and its derived classes out into storage.cc and storage.hh. for a couple of reasons: - for better readability. the sstables.hh is over 1005 lines. and sstables.cc 3602 lines. it's a little bit difficult to figure out how the different parts in these sources interact with each other. for instance, with this change, it's clear some of the helper functions are only used by file_system_storage. - probably less inter-source dependency. by extracting the source files out, they can be compiled individually, so changing one .cc file does not impact others. this could speed up the compilation time. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-05-09 16:47:00 +08:00
Botond Dénes	20f620feb9	Merge 'replica, sstable: replace generation_type::value() with generation_type::as_int()' from Kefu Chai this series prepares for the UUID based generation by replacing the general `value()` function with the function with more specific name: `as_int()`. Closes #13796 * github.com:scylladb/scylladb: test: drop a reusable_sst() variant which accepts int as generation treewide: replace generation_type::value() with generation_type::as_int()	2023-05-09 07:30:54 +03:00
Nadav Har'El	5f37d43ee6	Merge 'compaction: validate: validate the index too' from Botond Dénes In addition to the data file itself. Currently validation avoids the index altogether, using the crawling reader which only relies on the data file and ignores the index+summary. This is because a corrupt sstable usually has a corrupt index too and using both at the same time might hide the corruption. This patch adds targeted validation of the index, independent of and in addition to the already existing data validation: it validates the order of index entries as well as whether the entry points to a complete partition in the data file. This will usually result in duplicate errors for out-of-order partitions: one for the data file and one for the index file. Fixes: #9611 Closes #11405 * github.com:scylladb/scylladb: test/cql-pytest: add test_sstable_validation.py test/cql-pytest: extract scylla_path,temp_workdir fixtures to conftest.py tools/scylla-sstables: write validation result to stdout sstables/sstable: validate(): delegate to mx validator for mx sstables sstables/mx/reader: add mx specific validator mutation/mutation_fragment_stream_validator: add validator() accessor to validating filter sstables/mx/reader: template data_consume_rows_context_m on the consumer sstables/mx/reader: move row_processing_result to namespace scope sstables/mx/reader: use data_consumer::proceed directly sstables/mx/reader.cc: extend namespace to end-of-file (cosmetic) compaction/compaction: remove now unused scrub_validate_mode_validate_reader() compaction/compaction: move away from scrub_validate_mode_validate_reader() tools/scylla-sstable: move away from scrub_validate_mode_validate_reader() test/boost/sstable_compaction_test: move away from scrub_validate_mode_validate_reader() sstables/sstable: add validate() method compaction/compaction: scrub_sstables_validate_mode(): validate sstables one-by-one compaction: scrub: use error messages from validator mutation_fragment_stream_validator: produce error messages in low-level validator	2023-05-08 17:14:26 +03:00
Botond Dénes	b790f14456	reader_concurrency_semaphore: execution_loop(): trigger admission check when _ready_list is empty The execution loop consumes permits from the _ready_list and executes them. The _ready_list usually contains a single permit. When the _ready_list is not empty, new permits are queued until it becomes empty. The execution loop relies on admission checks triggered by the read releasing resources, to bring in any queued read into the _ready_list, while it is executing the current read. But in some cases the current read might not free any resources and thus fail to trigger an admission check and the currently queued permits will sit in the queue until another source triggers an admission check. I don't yet know how this situation can occur, if at all, but it is reproducible with a simple unit test, so it is best to cover this corner-case in the off-chance it happens in the wild. Add an explicit admission check to the execution loop, after the _ready_list is exhausted, to make sure any waiters that can be admitted with an empty _ready_list are admitted immediately and execution continues. Fixes: #13540 Closes #13541	2023-05-08 17:11:41 +03:00
Kamil Braun	153cb00e9d	test: test_random_tables: wait for token ring convergence before data queries The test performs an `INSERT` followed by a `SELECT`, checking if the previously inserted data is returned. This may fail because we're using `ring_delay = 0` in tests and the two queries may arrive at different nodes, whose `token_metadata` didn't converge yet (it's eventually consistent based on gossiping). I illustrated this here: https://github.com/scylladb/scylladb/issues/12937#issuecomment-1536147455 Ensure that the nodes' token rings are synchronized (by waiting until the token ring members on each node are the same as the group 0 configuration). Fixes #12937 Closes #13791	2023-05-08 13:22:52 +02:00
Kamil Braun	3f3dcf451b	test: pylib: random_tables: perform read barrier in `verify_schema` `RandomTables.verify_schema` is often called in topology tests after performing a schema change. It compares the schema tables fetched from some node to the expected latest schema stored by the `RandomTables` object. However there's no guarantee that the latest schema change has already propagated to the node which we query. We could have performed the schema change on a different node and the change may not have been applied yet on all nodes. To fix that, pick a specific node and perform a read barrier on it, then use that node to fetch the schema tables. Fixes #13788 Closes #13789	2023-05-08 13:21:10 +02:00
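The stall described in commit b790f14456 above (reader_concurrency_semaphore: execution_loop()) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the ScyllaDB implementation: `ToySemaphore`, `submit`, `admission_check` and the `explicit_check` flag are hypothetical names invented for illustration, and every read in the model is assumed to release no resources, so it never triggers an admission check on its own.

```python
from collections import deque


class ToySemaphore:
    """Hypothetical toy model of a ready list plus a wait queue."""

    def __init__(self, explicit_check):
        self.ready_list = deque()
        self.wait_queue = deque()
        self.explicit_check = explicit_check  # models the fix being on or off
        self.executed = []

    def submit(self, permit):
        # New permits are queued while the ready list is not empty.
        if self.ready_list:
            self.wait_queue.append(permit)
        else:
            self.ready_list.append(permit)

    def admission_check(self):
        # Admit one waiter, but only when the ready list is empty.
        if self.wait_queue and not self.ready_list:
            self.ready_list.append(self.wait_queue.popleft())

    def execution_loop(self):
        while self.ready_list:
            permit = self.ready_list.popleft()
            # The read runs but frees no resources (assumption of this
            # model), so nothing else triggers an admission check.
            self.executed.append(permit)
            if self.explicit_check and not self.ready_list:
                # The explicit check once the ready list is exhausted.
                self.admission_check()


def run(explicit_check):
    sem = ToySemaphore(explicit_check)
    for permit in ("a", "b", "c"):
        sem.submit(permit)
    sem.execution_loop()
    return sem.executed, list(sem.wait_queue)


if __name__ == "__main__":
    print(run(explicit_check=False))
    print(run(explicit_check=True))
```

Without the explicit check, the permits queued behind the first read stay in the wait queue once the loop drains the ready list; with it, each waiter is admitted as soon as the ready list is empty and execution continues.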

4891 Commits