scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 05:26:58 +00:00

Author	SHA1	Message	Date
Michael Huang	62a8a31be7	cdc: use chunked_vector for topology_description entries Lists can grow very big. Let's use a chunked vector to prevent large contiguous allocations. Fixes: #15302. Closes scylladb/scylladb#15428	2023-09-18 23:17:01 +03:00
Kefu Chai	30ef69fcb2	docs/dev/object_store: add more samples in hope to lower the bar to testing object store. * add language specifier for better readability of the document. to highlight the config with YAML syntax * add more specific comment on the AWS related settings * explain that endpoint should match in the CREATE KEYSPACE statement and the one defined by the YAML configuration. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #15433	2023-09-15 17:35:17 +03:00
Avi Kivity	a3d73bfba7	Merge 'Add support for decommission with tablets' from Tomasz Grabiec Load balancer will recognize decommissioning nodes and will move tablet replicas away from such nodes with highest priority. Topology changes have now an extra step called "tablet draining" which calls the load balancer. The step will execute tablet migration track as long as there are nodes which require draining. It will not do regular load balancing. If load balancer is unable to find new tablet replicas, because RF cannot be met or availability is at risk due to insufficient node distribution in racks, it will throw an exception. Currently, topology change will retry in a loop. We should make this error cause topology change to be aborted. There is no infrastructure for aborts yet, so this is not implemented. Closes #15197 * github.com:scylladb/scylladb: tablets, raft topology: Add support for decommission with tablets tablet_allocator: Compute load sketch lazily tablet_allocator: Set node id correctly tablet_allocator: Make migration_plan a class tablets: Implement cleanup step storage_service, tablets: Prevent stale RPCs from running beyond their stage locator: Introduce tablet_metadata_guard locator, replica: Add a way to wait for table's effective_replication_map change storage_service, tablets: Extract do_tablet_operation() from stream_tablet() raft topology: Add break in the final case clause raft topology: Fix SIGSEGV when trace-level logging is enabled raft topology: Set node state in topology raft topology: Always set host id in topology	2023-09-14 17:16:23 +03:00
Tomasz Grabiec	551cc0233d	tablets, raft topology: Add support for decommission with tablets Load balancer will recognize decommissioning nodes and will move tablet replicas away from such nodes with highest priority. Topology changes have now an extra step called "tablet draining" which calls the load balancer. The step will execute tablet migration track as long as there are nodes which require draining. It will not do regular load balancing. If load balancer is unable to find new tablet replicas, because RF cannot be met or availability is at risk due to insufficient node distribution in racks, it will throw an exception. Currently, topology change will retry in a loop. We should make this error cause topology change to be paused so that admin becomes aware of the problem and issues an abort on the topology change. There is no infrastructure for aborts yet, so this is not implemented.	2023-09-14 13:05:49 +02:00
Kefu Chai	60c293ed7d	doc/dev: correct the path to `object_storage.yaml` we get the path object storage config like: ```c++ db::config::get_conf_sub("object_storage.yaml").native() ``` so, the default path should be $SCYLLA_CONF/object_storage.yaml. in this change, it is corrected. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #15406	2023-09-14 10:40:55 +03:00
Patryk Jędrzejczak	1c58c6336a	system_keyspace: change id to timeuuid in CDC_GENERATIONS_V3 We change the type of IDs in CDC_GENERATIONS_V3 to timeuuid to give them a time-based order. We also change how we initialize them so that the new CDC generation always has the highest ID. This is the last step to enabling the efficient clearing of obsolete CDC generation data. Additionally, we change the types of current_cdc_generation_uuid, new_cdc_generation_data_uuid and the second values of the elements in unpublished_cdc_generations to timeuuid, so that they match id in CDC_GENERATIONS_V3.	2023-09-12 11:43:34 +02:00
Patryk Jędrzejczak	fab066cffe	cdc: generation: remove topology_description_generator After moving the creation of uuid out of make_new_generation_description, this function only calls the topology_description_generator's constructor and its generate method. We could remove this function, but we instead simplify the code by removing the topology_description_generator class. We can do this refactor because make_new_generation_description is the only place using it. We inline its generate method into make_new_generation_description and turn its private methods into static functions.	2023-09-12 11:18:54 +02:00
Patryk Jędrzejczak	2cd430ac80	system_kayspace: make CDC_GENERATIONS_V3 single-partition We make CDC_GENERATIONS_V3 single-partition by adding the key column and changing the clustering key from range_end to (id, range_end). This is the first step to enabling the efficient clearing of obsolete CDC generation data, which we need to prevent Raft-topology snapshots from endlessly growing as we introduce new generations over time. The next step is to change the type of the id column to timeuuid. We do it in the following commits. After making CDC_GENERATIONS_V3 single-partition, there is no easy way of preserving the num_ranges column. As it is used only for sanity checking, we remove it to simplify the implementation.	2023-09-12 09:51:45 +02:00
Patryk Jędrzejczak	2643ccc70e	docs: remove information about publish_cdc_generation We update documentation after replacing the topology::transition_state::publish_cdc_generation state with the CDC generation publisher fiber.	2023-09-08 09:05:01 +02:00
Patryk Jędrzejczak	5ed9d4db6d	raft topology: add unpublished_cdc_generations to system.topology In the following commits, we replace the topology::transition_state::publish_cdc_generation state with a background fiber that continually publishes committed CDC generations. To make these generations accessible to the topology coordinator, we store them in the new column of system.topology -- unpublished_cdc_generations.	2023-09-08 09:05:01 +02:00
Nadav Har'El	5625624533	doc/dev: add document about analyzing build time Add a document describing in detail how to use clang's "-ftime-trace" option, and the ClangBuildAnalyzer tool, to find the source files, header files and templates which slow down Scylla's build the most. I've used this tool in the past to reduce Scylla build time - see commits: `fa7a302130` (reduced 6.5%) `f84094320d` (reduced 0.1%) `6ebf32f4d7` (reduced 1%) `d01e1a774b` (reduced 4%) I'm hoping that documenting how to use this tool will allow other developers to suggest similar commits. Refs #1. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #15209	2023-09-01 11:33:36 +03:00
Kamil Braun	169d19e5b0	Merge 'raft topology: support --ignore-dead-nodes in removenode and replace' from Patryk Jędrzejczak We add support for `--ignore-dead-nodes` in `raft_removenode` and `--ignore-dead-nodes-for-replace` in `raft_replace`. For now, we allow passing only host ids of the ignored nodes. Supporting IPs is currently impossible because `raft_address_map` doesn't provide a mapping from IP to a host id. The main steps of the implementation are as follows: - add the `ignore_nodes` column to `system.topology`, - set the `ignore_nodes` value of the topology mutation in `raft_removenode` and `raft_replace`, - extend `service::request_param` with alternative types that allow storing a set of ids of the ignored nodes, - load `ignore_nodes` from `system.topology` into `request_param` in `system_keyspace::load_topology_state`, - add `ignore_nodes` to `exclude_nodes` in `topology_coordinator::exec_global_command`, - pass `ignore_nodes` to `replace_with_repair` and `remove_with_repair` in `storage_service::raft_topology_cmd_handler`. Additionally, we add `test_raft_ignore_nodes.py` with two tests that verify the added changes. Fixes #15025 Closes #15113 * github.com:scylladb/scylladb: test: add test_raft_ignore_nodes test: ManagerClient.remove_node: allow List[HostId] for ignore_dead raft topology: pass ignore_nodes to {replace, remove}_with_repair raft topology: exec_global_command: add ignore_nodes to exclude_nodes raft topology: exec_global_command: change type of exclude_nodes topology_state_machine: extend request_param with a set of raft ids raft topology: set ignore_nodes in raft_removenode and raft_replace utils: introduce split_comma_separated_list raft topology: add the ignore_nodes column to system.topology	2023-08-22 18:04:59 +02:00
Patryk Jędrzejczak	16f5db8af2	raft topology: add the ignore_nodes column to system.topology In the following commits, we add support for --ignore-dead-nodes in raft_removenode and --ignore-dead-nodes-for-replace in raft_replace. To make these request parameters accessible for the topology coordinator, we store them in the new ignore_nodes column of system.topology.	2023-08-22 10:30:12 +02:00
Avi Kivity	23be6f0336	tablets: change persistent type of replica set from set to list The system.tablets table stores replica sets as a CQL set type, which is sorted. This means that if, in a tablet replica set [n1, n2, n3] n2 is replaced with n4, then on reload we'll see [n1, n3, n4], changing the relative position of n3 from the third replica to the second. The relative position of replicas in a replica set is important for materialized views, as they use it to pair base replicas with view replicas. To prepare for materialized views using tablets, change the persistent data type to list, which preserves order. The code that generates new replica sets already preserves order: see locator::replace_replica(). While this changes the system schema, tablets are an experimental feature so we don't need to worry about upgrades. Closes #15111	2023-08-21 22:55:14 +02:00
Tomasz Grabiec	fe181b3bac	tablets: Balance tablets concurrently with active migrations After this change, the load balancer can make progress with active migrations. If the algorithm is called with active tablet migrations in tablet metadata, those are treated by the load balancer as if they were already completed. This allows the algorithm to incrementally make decisions which, when executed with active migrations, will produce the desired result. Overload of shards is limited by the fact that the algorithm tracks streaming concurrency on both source and target shards of active migrations and takes the concurrency limit into account when producing new migrations. The coordinator executes the load balancer on edges of tablet state machine transitions. This allows new migrations to be started as soon as tablets finish streaming. The load balancer is also continuously invoked as long as it produces a non-empty plan. This is in order to saturate the cluster with streaming. A single make_plan() call is still not saturating, due to the way the algorithm is implemented.	2023-07-31 01:45:23 +02:00
Tomasz Grabiec	05519bd5e5	doc: Document tablet migration state machine and load balancer	2023-07-25 21:08:02 +02:00
Benny Halevy	26ff8f7bf7	docs: dml: add update ordering section and add docs/dev/timestamp-conflict-resolution.md to document the details of the conflict resolution algorithm. Refs scylladb/scylladb#14063 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-06-20 11:55:54 +03:00
Avi Kivity	5acb137c2e	Merge 'docs/dev/reader-concurrency-semaphore.md: add section about operations' from Botond Dénes Containing two tables, describing all the possible operations seen in user, system and streaming semaphore diagnostics dumps. Closes #14171 * github.com:scylladb/scylladb: docs/dev/reader-concurrency-semaphore.md: add section about operations docs/dev/reader-concurrency-semaphore.md: switch to # headers markings reader_concurrency_semaphore: s/description/operation/ in diagnostics dumps	2023-06-07 22:53:18 +03:00
Botond Dénes	0c632b6e3d	docs/dev/reader-concurrency-semaphore.md: add section about operations Containing two tables, describing all the possible operations seen in user, system and streaming semaphore diagnostics dumps.	2023-06-07 14:22:52 +03:00
Botond Dénes	0067fa0a09	docs/dev/reader-concurrency-semaphore.md: switch to # headers markings As they allow for more levels, than the current `---` and `===` ones.	2023-06-07 14:22:10 +03:00
Botond Dénes	c4faa05888	reader_concurrency_semaphore: s/description/operation/ in diagnostics dumps "description" is not what the respective column contains, so fix the header.	2023-06-07 14:21:48 +03:00
Marcin Maliszkiewicz	8b06684a8c	docs: dev: document pytest run convenience script Closes #13995	2023-06-07 12:37:52 +03:00
Kefu Chai	8e7c7e1079	docs/dev/repair_based_node_ops: better formatting * indent the nested paragraphs of list items * use table to format the time sequence for better readability Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14016	2023-05-25 08:31:43 +03:00
Botond Dénes	eb457b6104	Merge 'fixed broken links, added community forum link, university link, spelling and other mistakes' from Guy Shtub Closes #13979 * github.com:scylladb/scylladb: Update docker-hub.md Update docs/dev/docker-hub.md Update docs/dev/docker-hub.md Update docs/dev/docker-hub.md Update docs/dev/docker-hub.md Update docs/dev/docker-hub.md fixed broken links, added community forum link, university link, other mistakes	2023-05-24 09:58:58 +03:00
Guy Shtub	65c0afc899	Update docker-hub.md	2023-05-24 07:34:58 +03:00
Guy Shtub	7e3d768369	Update docs/dev/docker-hub.md Co-authored-by: Anna Stuchlik <37244380+annastuchlik@users.noreply.github.com>	2023-05-24 07:27:07 +03:00
Guy Shtub	6329036656	Update docs/dev/docker-hub.md Co-authored-by: Anna Stuchlik <37244380+annastuchlik@users.noreply.github.com>	2023-05-24 07:26:42 +03:00
Guy Shtub	3538a2e1c2	Update docs/dev/docker-hub.md Co-authored-by: Anna Stuchlik <37244380+annastuchlik@users.noreply.github.com>	2023-05-24 07:23:51 +03:00
Guy Shtub	53183d6302	Update docs/dev/docker-hub.md Co-authored-by: Anna Stuchlik <37244380+annastuchlik@users.noreply.github.com>	2023-05-24 07:23:37 +03:00
Guy Shtub	2677d47bbc	Update docs/dev/docker-hub.md Co-authored-by: Anna Stuchlik <37244380+annastuchlik@users.noreply.github.com>	2023-05-24 07:23:28 +03:00
Kefu Chai	b8c565875b	docs/dev/system_keyspace: add raft table it is one of the non-volatile tables. we need to add more of them, but let's do this piecemeal. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-05-24 10:08:04 +08:00
Kefu Chai	eee0003312	docs/dev/system_keyspace: move sstables and tablets into another section not all tables in system keyspace are volatile. among other things, system.sstables and system.tablets are persisted using sstables like regular user tables. so move them into the section where we have other regular tables there. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-05-24 10:08:03 +08:00
Kefu Chai	1246568e3b	docs/dev/system_keyspace: use timeuuid for sstables.generation we changed the type of generation column in system.sstables from bigint to timeuuid in `74e9e6dd1a` but that change failed to update the document accordingly. so let's update the document to reflect the change. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13994	2023-05-23 14:37:28 +03:00
Guy Shtub	eefaad189a	fixed broken links, added community forum link, university link, other mistakes	2023-05-22 13:12:16 +03:00
Tomasz Grabiec	9d4bca26cc	Merge 'raft topology: implement `check_and_repair_cdc_streams` API' from Kamil Braun `check_and_repair_cdc_streams` is an existing API which you can use when the current CDC generation is suboptimal, e.g. after you decommissioned a node the current generation has more stream IDs than you need. In that case you can do `nodetool checkAndRepairCdcStreams` to create a new generation with fewer streams. It also works when you change number of shards on some node. We don't automatically introduce a new generation in that case but you can use `checkAndRepairCdcStreams` to create a new generation with restored shard-colocation. This PR implements the API on top of raft topology, it was originally implemented using gossiper. It uses the `commit_cdc_generation` topology transition state and a new `publish_cdc_generation` state to create new CDC generations in a cluster without any nodes changing their `node_state`s in the process. Closes #13683 * github.com:scylladb/scylladb: docs: update topology-over-raft.md test: topology_experimental_raft: test `check_and_repair_cdc` API raft topology: implement `check_and_repair_cdc_streams` API raft topology: implement global request handling raft topology: introduce `prepare_new_cdc_generation_data` raft_topology: `get_node_to_work_on_opt`: return guard if no node found raft topology: remove `node_to_work_on` from `commit_cdc_generation` transition raft topology: separate `publish_cdc_generation` state raft topology: non-node-specific `exec_global_command` raft topology: introduce `start_operation()` raft topology: non-node-specific `topology_mutation_builder` topology_state_machine: introduce `global_topology_request` topology_state_machine: use `uint16_t` for `enum_class`es raft topology: make `new_cdc_generation_data_uuid` topology-global	2023-05-22 11:33:58 +02:00
Calle Wilund	469e710caa	docs: Add initial doc on commitlog segment file format Refs #12849 Just a few lines on the file format of segments. Closes #13848	2023-05-15 16:22:44 +03:00
Kamil Braun	ddb5b45aef	docs: update topology-over-raft.md It was already outdated before this PR. Describe the version of topology state machine implemented in this PR. Fix some typos and make it proper markdown so it renders nicely on GitHub etc.	2023-05-08 16:49:01 +02:00
Pavel Emelyanov	0b18e3bff9	doc: Add a document describing how to configure S3 backend Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-03 20:23:38 +03:00
Tomasz Grabiec	9d786c1ebc	db: tablets: Add persistence layer	2023-04-24 10:49:37 +02:00
Kamil Braun	88aff50e8b	docs: cdc: describe generation changes using group 0 topology coordinator Update the `Generation switching` section: most of the existing description landed in `Gossiper-based topology changes` subsection, and a new subsection was added to describe Raft group 0 based topology changes. Marked as WIP - we expect further development in this area soon. The existing gossiper-based description was also updated a bit.	2023-04-20 16:36:41 +02:00
Botond Dénes	edc75f51ff	docs/dev/reader-concurrency-semaphore.md: expand on how the semaphore works Greatly expand on the details of how the semaphore works. Organize the content into thematic chapters to improve navigation. Improve formatting while at it.	2023-04-14 08:51:24 -04:00
Botond Dénes	943ae7fc69	reader_permit: give better names to active* states The names of these states have been the source of confusion ever since they were introduced. Give them names which better reflect their true meaning and give less room for misinterpretation. The changes are: * active/unused -> active * active/used -> active/need_cpu * active/blocked -> active/await Hopefully the new names do a better job at conveying what these states really mean: * active - a regular admitted permit, which is active (as opposed to an inactive permit). * active/need_cpu - an active permit which was marked as needing CPU for the read to make progress. This permit prevents admission of new permits while it is in this state. * active/await - a former active/need_cpu permit, which has to wait on I/O or a remote shard. While in this state, it doesn't block the admission of new permits (pending other criteria such as resource availability).	2023-04-14 08:40:46 -04:00
Pavel Emelyanov	08e9046d07	system_keyspace: Add ownership table The schema is CREATE TABLE system.sstables ( location text, generation bigint, format text, status text, uuid uuid, version text, PRIMARY KEY (location, generation) ) A sample entry looks like: location \| generation \| format \| status \| uuid \| version ---------------------------------------------------------------------+------------+--------+--------+--------------------------------------+--------- /data/object_storage_ks/test_table-d096a1e0ad3811ed85b539b6b0998182 \| 2 \| big \| sealed \| d0a743b0-ad38-11ed-85b5-39b6b0998182 \| me The uuid field points to the "folder" on the storage where the sstable components are. Like this: s3 `- test_bucket `- f7548f00-a64d-11ed-865a-0c1fbc116bb3 `- Data.db - Index.db - Filter.db - ... It's not very nice that the whole /var/lib/... path is in fact used as location, it needs the PR #12707 to fix this place. Also, the "status" part is not yet fully functional, it only supports three options: - creating -- the same as TemporaryTOC file exists on disk - sealed -- default state - deleting -- the analogy for the deletion log on disk The latter needs support from the distributed_loader, which is not yet there. In fact, distributed_loader also needs to be patched to actually select entries from this table on load. Also it needs the mentioned PR #12707 to support staging and quarantine sstables. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:44:28 +03:00
Kefu Chai	c24a9600af	docs: dev: correct a typo s/By expending/By expanding/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13392	2023-03-31 17:19:08 +03:00
Kefu Chai	11cea36c12	docs: dev: write mathematical expressions in LaTeX for better readability Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13341	2023-03-29 15:07:14 +03:00
Gleb Natapov	5e232ebee5	system_keyspace: add a table to persist topology change state machine's state Add local table to store topology change state machine's state there. Also add a function that loads the state to memory.	2023-03-21 16:06:43 +02:00
Gleb Natapov	a2b7d2c1a1	service: Introduce topology state machine data structures The topology state machine will track all the nodes in a cluster, their state, properties (topology, tokens, etc) and requested actions. Node state can be one of those: none - the node is not yet in the cluster bootstrapping - the node is currently bootstrapping decommissioning - the node is being decommissioned removing - the node is being removed replacing - the node is replacing another node normal - the node is working normally rebuild - the node is being rebuilt left - the node has left the cluster Nodes in state left are never removed from the state. Tokens also can be in one of the states: write_both_read_old - writes are going to new and old replicas, but reads are still from old replicas write_both_read_new - writes are still going to old and new replicas, but reads are from new replicas owner - tokens are owned by the node and reads and writes go to the new replica set only Tokens that need to be moved start in the 'write_both_read_old' state. After the entire cluster learns about it, streaming starts. After the streaming, tokens move to the 'write_both_read_new' state and again the whole cluster needs to learn about it and make sure no reads started before that point exist in the system. After that tokens may move to the 'owner' state. topology_request is the field through which a topology operation request can be issued to a node. A request is one of the topology operations currently supported: join, leave, replace or remove.	2023-03-21 16:06:43 +02:00
Wojciech Mitros	52eb70aef0	docs: make wasm documentation visible for users Until now, the instructions on generating wasm files and using them for Scylla UDFs were stored in docs/dev, so they were not visible on the docs website. Now that the Rust helper library for UDFs is ready, and we're inviting users to try it out, we should also make the rest of the Wasm UDF documentation readily available for the users. Closes #13139	2023-03-14 16:21:23 +02:00
Wojciech Mitros	d4851ccae7	treewide: rename the "xwasm" UDF language to "wasm" When the WASM UDFs were first introduced, the LANGUAGE required in the CQL statements to use them was "xwasm", because the ABI for the UDFs was still not specified and changes to it could be backwards incompatible. Now, the ABI is stabilized, but if backwards incompatible changes are made in the future, we will add a new ABI version for them, so the name "xwasm" is no longer needed and we can finally change it to "wasm". Closes #13089	2023-03-07 10:21:11 +02:00
Wojciech Mitros	6d2e785b5c	docs: update wasm.md The WASM UDF implementation has changed since the last time the docs were written. In particular, the Rust helper library has been released, and using it should be the recommended method. Some decisions that were only experimental at the start, were also "set in stone", so we should refer to them as such. The docs also contain some code examples. This patch adds tests for these examples to make sure that they are not wrong and misleading. Closes #12941	2023-02-28 20:59:25 +02:00
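
The ordering problem described in commit 23be6f0336 above (replica sets persisted as a sorted CQL set rather than a list) can be illustrated with a minimal sketch. The `replace_replica` helper here is a hypothetical Python stand-in written for illustration, not Scylla's `locator::replace_replica()`, and the node names are plain strings taken from the commit message.

```python
def replace_replica(replicas, old, new):
    """Hypothetical helper: swap `old` for `new` while keeping every position."""
    return [new if r == old else r for r in replicas]

original = ["n1", "n2", "n3"]

# Persisted as a list: n4 takes over n2's slot and n3 stays third.
as_list = replace_replica(original, "n2", "n4")

# Persisted as a sorted set: on reload the elements come back in sorted
# order, so n3 moves from the third position to the second.
as_sorted_set = sorted(set(as_list))

print(as_list)
print(as_sorted_set)
```

Because the position of a replica is what pairs base replicas with view replicas, only the list form keeps that pairing stable across a replacement.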

341 Commits