scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 20:05:10 +00:00

Author	SHA1	Message	Date
Mikołaj Grzebieluch	e327478bb5	test.py: enable maintenance socket in tests by default	2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch	21b3ba4927	docs: add maintenance socket documentation	2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch	f96d30c2b5	main: add maintenance socket Add initialization of maintenance_auth_service and cql_maintenance_server_ctl. Create maintenance socket which enables interaction with the node through CQL protocol without authentication. The maintenance port is available by Unix domain socket. It gives full-permission access. It is created before the node joins the cluster.	2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch	16ab2c28e4	main: refactor initialization of cql controller and auth service Move initialization of cql controller and auth service to functions. It will make it easier to create a new cql controller with a separate auth service, for example for the maintenance socket. Make it possible to initialize new services before joining group0.	2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch	999be1d14b	auth/service: don't create system_auth keyspace when used by maintenance socket The maintenance socket is created before joining the cluster. When maintenance auth service is started it creates system_auth keyspace if it's missing. It is not synchronized with other nodes, because this node hasn't joined the group0 yet. Thus a node has a mismatched schema and is unable to join the cluster. The maintenance socket doesn't use role management, thus the problem is solved by not creating system_auth keyspace when maintenance auth service is created. The logic of regular CQL port's auth service won't be changed. A new separate auth service will be created for the maintenance socket.	2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch	2b9a88d17a	cql_controller: maintenance socket: fix indentation	2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch	ac61d0f695	cql_controller: add option to start maintenance socket Add an option to listen on the maintenance socket. It is set up on a Unix domain socket and the metrics are disabled. This enables having an independent authentication mechanism for this socket. To start the maintenance socket, a new cql_controller has to be created with `db::maintenance_socket_enabled::yes` argument. Creating maintenance socket will raise an exception if * the path is longer than 107 chars (due to Linux limits), * a file or a directory already exists in the path. The indentation is fixed in the next commit.	2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch	cf43787295	db/config: add maintenance_socket_enabled bool class	2023-12-18 11:42:40 +01:00
Mikołaj Grzebieluch	11a2748d7f	auth: add maintenance_socket_role_manager Add `maintenance_socket_role_manager` which will disable all operations associated with roles to not depend on system_auth keyspace, which may not yet be created when the maintenance socket starts listening.	2023-12-18 11:42:40 +01:00
Mikołaj Grzebieluch	e682e362a3	db/config: add maintenance_socket variable If set to "ignore", maintenance socket will be disabled. If set to "workdir", maintenance socket will be opened on <scylla's workdir>/cql.m. Otherwise it will be opened on path provided by maintenance_socket variable. It is set by default to 'ignore'.	2023-12-18 11:42:05 +01:00
Alexander Turetskiy	f30b5473ab	cql: Reject empty options while altering a keyspace Reject ALTER KEYSPACE request for NetworkTopologyStrategy when replication options are missing. Also reject CREATE KEYSPACE with no replication factor options. Cassandra has a default_keyspace_rf configuration that may allow such CREATE KEYSPACE commands, but Scylla doesn't have this option (refs #16028). fixes #10036 Closes scylladb/scylladb#16221	2023-12-10 17:44:35 +02:00
Kefu Chai	818343b57d	build: build session.cc in CMake building system this source file was added in `d3d83869`. so let's update cmake as well. sessions_tests was added in the same commit, so add it as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16344	2023-12-09 22:14:47 +02:00
Avi Kivity	d62a5fc60b	Merge 'tools/scylla-nodetool: implement additional commands, part 5/N ' from Botond Dénes This PR implements the following new nodetool commands: * decommission * rebuild * removenode * getlogginglevels * setlogginglevel * move * refresh All commands come with tests and all tests pass with both the new and the current nodetool implementations. Refs: https://github.com/scylladb/scylladb/issues/15588 Closes scylladb/scylladb#16348 * github.com:scylladb/scylladb: tools/scylla-nodetool: implement the refresh command tools/scylla-nodetool: implement the move command tools/scylla-nodetool: implement setlogginglevel command tools/sclla-sstable: implement the getlogginglevels command tools/scylla-nodetool: implement the removenode command tools/scylla-nodetool: implement the rebuild command tools/scylla-nodetool: implement the decommission command	2023-12-09 21:47:22 +02:00
Pavel Emelyanov	5e69415387	guardrails: Do not validate initial_tablets as replication factor When checking replication strategy options the code assumes (and it's stated in the preceding code comment) that all options are replication factors. This is no longer so: the initial_tablets option is not a replication factor and should be skipped. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#16335	2023-12-09 15:56:41 +02:00
Botond Dénes	496459165e	tools/scylla-nodetool: implement the refresh command	2023-12-08 08:58:16 -05:00
Botond Dénes	ad148a9dbc	tools/scylla-nodetool: implement the move command In the java nodetool, this command ends up calling an API endpoint which just throws an exception saying moving tokens is not supported. So in the native implementation we just throw an exception to the same effect in scylla-nodetool itself.	2023-12-08 08:29:39 -05:00
Botond Dénes	58d3850da1	tools/scylla-nodetool: implement setlogginglevel command	2023-12-08 08:18:56 -05:00
Botond Dénes	3a8590e1af	tools/sclla-sstable: implement the getlogginglevels command	2023-12-08 07:32:45 -05:00
Botond Dénes	c35ed794de	tools/scylla-nodetool: implement the removenode command	2023-12-08 07:32:31 -05:00
Botond Dénes	9a484cb145	tools/scylla-nodetool: implement the rebuild command	2023-12-08 07:05:30 -05:00
Botond Dénes	ea62f7c848	tools/scylla-nodetool: implement the decommission command	2023-12-08 06:14:36 -05:00
Kefu Chai	893f319004	sstables: add formatter for index_consume_entry_context_state before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, in order to enable the code in the header to access the formatter without being moved down after the full specialization's definition, we * move the enum definition out of the class and before the class, * rename the enum's name from state to index_consume_entry_context_state * define a formatter for index_consume_entry_context_state * remove its operator<<(). as fmt v10 is able to use `format_as()` as a fallback, the formatter full specialization is guarded with `#if FMT_VERSION < 10'00'00`. we will remove it after we start building with fmt v10. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16204	2023-12-08 12:45:38 +02:00
Kurashkin Nikita	c071cd92b5	cql3:statement_restrictions.cc add more conditions to prevent "allow filtering" error to pop up in delete/update statements Modified Cassandra tests to check for Scylla's error messages Fixes #12474 Closes scylladb/scylladb#15811	2023-12-07 21:25:18 +02:00
Avi Kivity	9c0f05efa1	Merge 'Track tablet streaming under global sessions to prevent side-effects of failed streaming' from Tomasz Grabiec Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later. This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted. The tracking and fencing is implemented by using global "sessions", created for streaming of a single tablet. Session is globally identified by UUID. The identifier is assigned by the topology change coordinator, and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained. The barrier is blocked only if there is some session with work which was left behind by unsuccessful streaming. In which case it should not be blocked for long, because streaming process checks often if the guard was left behind and stops if it was. This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas. Closes scylladb/scylladb#15847 * github.com:scylladb/scylladb: test: tablets: Add test for failed streaming being fenced away error_injection: Introduce poll_for_message() error_injection: Make is_enabled() public api: Add API to kill connection to a particular host range_streamer: Do not block topology change barriers around streaming range_streamer, tablets: Do not keep token metadata around streaming tablets: Fail gracefully when migrating tablet has no pending replica storage_service, api: Add API to disable tablet balancing storage_service, api: Add API to migrate a tablet storage_service, raft topology: Run streaming under session topology guard storage_service, tablets: Use session to guard tablet streaming tablets: Add per-tablet session id field to tablet metadata service: range_streamer: Propagate topology_guard to receivers streaming: Always close the rpc::sink storage_service: Introduce concept of a topology_guard storage_service: Introduce session concept tablets: Fix topology_metadata_guard holding on to the old erm docs: Document the topology_guard mechanism	2023-12-07 16:29:02 +02:00
Avi Kivity	4b1ef00dbb	Merge 'File stream for tablet preparation' from Asias He This series adds preparation patches for file stream tablet implementation in enterprise branch. It minimizes the differences between those two branches. Closes scylladb/scylladb#16297 * github.com:scylladb/scylladb: messaging_service: Introduce STREAM_BLOB and TABLET_STREAM_FILES verb compaction_group_for_token: Handle minimum_token and maximum_token token serializer: Add temporary_buffer support cql_test_env: Allow messaging_service to start listen	2023-12-07 16:26:22 +02:00
Avi Kivity	ed2a9b8750	Merge 'Commitlog: Fix reading/writing position calculations and allocation size checks' from Calle Wilund Fixes #16298 The adjusted buffer position calculation in buffer_position(), introduced in https://github.com/scylladb/scylladb/pull/15494 was in fact broken. It calculated (like previously) a "position" based on diff between underlying buffer size and ostream size() (i.e. avail), then adjusted this according to sector overhead rules. However, the underlying buffer size is in unadjusted terms, and the ostream is adjusted. The two cannot be compared as such, which means the "positions" we get here are borked. Luckily for us (sarcasm), the position calculation in replayer made a similar error, in that it adjusts up current position by one sector overhead too much, leading to us more or less getting the same, erroneous results in both ends. However, when/iff one needs to adjust the segment file format further, one might very quickly realize that this does not work well if, say, one needs to be able to safely read some extra bytes before first chunk in a segment. Conversely, trying to adjust this also exposes a latent potential error in the skip mechanism, manifesting here. Issue fixed by keeping track of the initial ostream capacity for segment buffer, and use this for position calculation, and in the case of replayer, move file pos adjustment from read_data() to subroutine (shared with skipping), that better takes data stream position vs. file position adjustment. In implementation terms, we first inc the "data stream" pos (i.e. pos in data without overhead), then adjust for overhead. Also fix replayer::skip, so that we handle the buffer/pos relation correctly now. Added test for initial entry position, as well as data replay consistency for single entry_writer paths. Fixes #16301 The calculation on whether data may be added is based on position vs. size of incoming data. However, it did not take sector overhead into account, which led us to writing past allowed segment end, which in turn also leads to metrics overflows. Closes scylladb/scylladb#16302 * github.com:scylladb/scylladb: commitlog: Fix allocation size check to take sector overhead into account. commitlog: Fix commitlog_segment::buffer_position() calculation and replay counterpart	2023-12-07 12:27:54 +02:00
Botond Dénes	fb9379edf1	test/cql-pytest: test_select_from_mutation_fragments: bump timeout for slow test The test test_many_partitions is very slow, as it tests a slow scan over a lot of partitions. This was observed to time out on the slower ARM machines, making the test flaky. To prevent this, create an extra-patient cql connection with a 10-minute timeout for the scan itself. Fixes: #16145 Closes scylladb/scylladb#16303	2023-12-07 11:55:53 +02:00
Yaniv Kaul	862909ee4f	Typos: fix typos in documentation Using codespell, went over the docs and fixed some typos. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#16275	2023-12-07 11:10:17 +02:00
Anna Stuchlik	8b01cb7fb8	doc: set 5.4 as the latest stable version This commit updates the configuration for ScyllaDB documentation so that: - 5.4 is the latest version. - 5.4 is removed from the list of unstable versions. It must be merged when ScyllaDB 5.4 is released. No backport is required. Closes scylladb/scylladb#16308	2023-12-07 10:04:26 +02:00
Calle Wilund	dba39b47bd	commitlog: Fix allocation size check to take sector overhead into account. Fixes #16301 The calculation on whether data may be added is based on position vs. size of incoming data. However, it did not take sector overhead into account, which led us to writing past allowed segment end, which in turn also leads to metrics overflows.	2023-12-07 07:36:27 +00:00
Calle Wilund	0d35c96ef4	commitlog: Fix commitlog_segment::buffer_position() calculation and replay counterpart Fixes #16298 The adjusted buffer position calculation in buffer_position(), introduced in #15494 was in fact broken. It calculated (like previously) a "position" based on diff between underlying buffer size and ostream size() (i.e. avail), then adjusted this according to sector overhead rules. However, the underlying buffer size is in unadjusted terms, and the ostream is adjusted. The two cannot be compared as such, which means the "positions" we get here are borked. Luckily for us (sarcasm), the position calculation in replayer made a similar error, in that it adjusts up current position by one sector overhead too much, leading to us more or less getting the same, erroneous results in both ends. However, when/iff one needs to adjust the segment file format further, one might very quickly realize that this does not work well if, say, one needs to be able to safely read some extra bytes before first chunk in a segment. Conversely, trying to adjust this also exposes a latent potential error in the skip mechanism, manifesting here. Issue fixed by keeping track of the initial ostream capacity for segment buffer, and use this for position calculation, and in the case of replayer, move file pos adjustment from read_data() to subroutine (shared with skipping), that better takes data stream position vs. file position adjustment. In implementation terms, we first inc the "data stream" pos (i.e. pos in data without overhead), then adjust for overhead. Also fix replayer::skip, so that we handle the buffer/pos relation correctly now. Added test for initial entry position, as well as data replay consistency for single entry_writer paths.	2023-12-07 07:36:27 +00:00
Asias He	6beadab9e6	messaging_service: Introduce STREAM_BLOB and TABLET_STREAM_FILES verb They will be used to implement file stream for tablet in the future. Reserve the verb ID.	2023-12-07 14:54:12 +08:00
Asias He	67cfa12c7d	compaction_group_for_token: Handle minimum_token and maximum_token token The following error was seen: [shard 0] table - compaction_group_for_token: compaction_group idx=0 range=(minimum token,-6917529027641081857] does not contain token=minimum token Since minimum_token or maximum_token will not be inside a token range, skip the in-token-range check for them.	2023-12-07 14:54:12 +08:00
Asias He	974b28a750	serializer: Add temporary_buffer support It will be used by file stream for tablet.	2023-12-07 09:46:37 +08:00
Asias He	faaf58f62c	cql_test_env: Allow messaging_service to start listen This is needed for rpc calls to work in the tests. With this patch, by default, messaging_service does not listen as it was before. This is useful for file stream for tablet test.	2023-12-07 09:46:36 +08:00
Avi Kivity	92d61def57	Merge 'scylla_swap_setup: run error check before allocating swap and increase swap allocation speed' from Takuya ASADA This patch fixes the error check and speeds up swap allocation. Following patches are included: - scylla_swap_setup: run error check before allocating swap avoid creating the swapfile before running the error check - scylla_swap_setup: use fallocate on ext4 this increases swap allocation speed on ext4 Closes scylladb/scylladb#12668 * github.com:scylladb/scylladb: scylla_swap_setup: use fallocate on ext4 scylla_swap_setup: run error check before allocating swap	2023-12-06 21:40:10 +02:00
Avi Kivity	55dacb8480	Merge 'Generalize atomic sstables deletion' from Pavel Emelyanov The current implementation starts in sstables_manager that gets the deletion function from storage which, in turn, should atomically do sst.unlink() over a list of sstables (s3 driver is still not atomic though #13567). This PR generalizes the atomic deletion inside sstables_manager method and removes the atomic deletor function that nobody liked when it was introduced (#13562) Closes scylladb/scylladb#16290 * github.com:scylladb/scylladb: sstables/storage: Drop atomic deleter sstables/storage: Reimplement atomic deletion in sstables_manager sstables/storage: Add prepare/complete skaffold for atomic deletion	2023-12-06 19:48:07 +02:00
Tomasz Grabiec	7d0f4c10a2	test: tablets: Add test for failed streaming being fenced away	2023-12-06 18:37:01 +01:00
Tomasz Grabiec	083a0279a9	error_injection: Introduce poll_for_message() To allow more complex waiting, which involves other exit conditions.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	ce0dc9e940	error_injection: Make is_enabled() public	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	733eb21601	api: Add API to kill connection to a particular host For testing failure scenarios.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	9dac0febce	range_streamer: Do not block topology change barriers around streaming Streaming was keeping effective_replication_map_ptr around the whole process, which blocks topology change barriers. This will inhibit progress of tablet load balancer or concurrent migrations, resulting in worse performance. Fix by switching to the most recent erm on sharder calls. multishard_writer calls shard_of() for each new partition. A better way would be to switch immediately when topology version changes, but this is left for later.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	c228f2c940	range_streamer, tablets: Do not keep token metadata around streaming It holds back global token metadata barrier during streaming, which limits parallelism of load balancing.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	7a59acf248	tablets: Fail gracefully when migrating tablet has no pending replica Before the patch we SIGSEGV trying to access pending replica in this case. Fail early instead.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	d1c1b59236	storage_service, api: Add API to disable tablet balancing Load balancing needs to be disabled before making a series of manual migrations so that we don't fight with the load balancer. Also will be used in tests to ensure tablets stick to expected locations.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	1f57d1ea28	storage_service, api: Add API to migrate a tablet Will be used in tests, or for hot fixes in production.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	31c995332c	storage_service, raft topology: Run streaming under session topology guard Prevents stale streaming operations from running beyond the topology operation they were started in. After the session field is cleared, or changed to something else, the old topology_guard used by streaming is interrupted and fenced and the next barrier will join with any remaining work.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	080169cad6	storage_service, tablets: Use session to guard tablet streaming	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	5381792401	tablets: Add per-tablet session id field to tablet metadata range_streamer will pick it up when creating topology_guard. It's materialized in memory only for migrating tablets in tablet_transition_info.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	fd3c089ccc	service: range_streamer: Propagate topology_guard to receivers	2023-12-06 18:36:16 +01:00
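
The maintenance socket commits listed above (e682e362a3 and ac61d0f695) describe how the socket path is chosen from the maintenance_socket setting and which paths are rejected. The sketch below restates those rules in Python for illustration only; it is not ScyllaDB code. The function names, the choice of exception types, and the constant name are assumptions made for this example; only the "ignore"/"workdir" values, the cql.m file name, the 107-character limit, and the "already exists" rule come from the commit messages.

```python
import os

# Limit quoted in commit ac61d0f695: paths longer than 107 chars are rejected
# (attributed there to Linux limits). The constant name is hypothetical.
MAX_SOCKET_PATH_LEN = 107


def resolve_maintenance_socket_path(setting, workdir):
    """Map a maintenance_socket setting to a socket path, or None if disabled."""
    if setting == "ignore":
        return None  # socket disabled; this is the documented default
    if setting == "workdir":
        return os.path.join(workdir, "cql.m")  # <scylla's workdir>/cql.m
    return setting  # any other value is treated as an explicit path


def check_maintenance_socket_path(path):
    """Reject the two cases the commit message lists (exception types are assumed)."""
    if len(path) > MAX_SOCKET_PATH_LEN:
        raise ValueError(f"socket path longer than {MAX_SOCKET_PATH_LEN} chars")
    if os.path.lexists(path):
        raise FileExistsError(f"a file or directory already exists at {path}")
```

For example, resolve_maintenance_socket_path("workdir", "/var/lib/scylla") yields /var/lib/scylla/cql.m on a POSIX system, and the result would then be passed to check_maintenance_socket_path before binding the socket.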


40181 Commits