scylladb

Author	SHA1	Message	Date
Michael Livshin	c96708d262	add support for the ME sstable format The ME format has been introduced in Cassandra 3.11.11: `11952fae77/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java (L123)` `d84c6e9810` It adds originating host id to sstable metadata in support of fixing loss of commit log data when moving sstables between nodes: https://issues.apache.org/jira/browse/CASSANDRA-16619 In Scylla: * The supported way to ingest sstables is via upload/, where stored commit log replay position should be disregarded (but see https://github.com/scylladb/scylla/issues/10080). * A later commit in this series implements originating host id validation for native ME sstables. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-02-16 18:21:24 +02:00
Benny Halevy	f6431824a7	api: add keyspace_offstrategy_compaction Perform offstrategy compaction via the REST API with a new `keyspace_offstrategy_compaction` option. This is useful for performing offstrategy compaction post repair, after repairing all token ranges. Otherwise, offstrategy compaction will only be auto-triggered after a 5 minutes idle timeout. Like major compaction, the api call returns the offstrategy compaction task future, so it's waited on. The `long` result counts the number of tables that required offstrategy compaction. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-01-30 20:40:39 +02:00
Benny Halevy	cc122984d6	compaction: scrub: add quarantine_mode option Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-05 18:29:04 +02:00
Botond Dénes	c1203618eb	api: storage_service: validate_keyspace -> scrub_keyspace (validate mode) Fold validate keyspace into scrub keyspace (validate mode).	2021-08-05 07:36:45 +03:00
Botond Dénes	b0ef57c833	api: storage_service: expose validation compaction	2021-07-12 10:25:15 +03:00
Benny Halevy	4169f56407	api: storage_service/snapshots: add sf (skip_flush) option Note: I tried adding the option and calling it "skip_flush" but I couldn't make it work with scylla-jmx, hence it's called by the abbreviated name - "sf". Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-06-02 17:20:19 +03:00
Botond Dénes	550a1cd036	api: storage_service/keyspace_scrub: expose new segregate mode Allow invoking scrub with the newly added segregate mode as well.	2021-05-05 14:35:04 +03:00
Botond Dénes	34643ac997	api: /storage_service/keyspace_scrub: add scrub mode param Add direct support to the newly added scrub mode enum. Instead of the legacy `skip_corrupted` flag, one can now select the desired mode from the mode enum. `skip_corrupted` is still supported for backwards compatibility but it is ignored when the mode enum is set.	2021-05-05 12:03:42 +03:00
Ivan Prisyazhnyy	778d9217f3	tracing: api: fast mode doc improvement Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com>	2021-03-30 16:22:56 +02:00
Piotr Wojtczak	c1daf2bb24	column_family: Make toppartitions queries more generic Right now toppartitions can only be invoked on one column family at a time. This change introduces a natural extension to this functionality, allowing to specify a list of families. We provide three ways for filtering in the query parameter "name_list": 1. A specific column family to include in the form "ks:cf" 2. A keyspace, telling the server to include all column families in it. Specified by omitting the cf name, i.e. "ks:" 3. All column families, which is represented by an empty list The list can include any amount of one or both of the 1. and 2. option. Fixes #4520 Closes #7864	2021-03-24 17:54:05 +02:00
Ivan Prisyazhnyy	7cbe2aa9c6	tracing: rest api for lightweight slow query tracing The patch adds REST API support for the lightweight slow query tracing (fast) mode that is implemented by omitting all of the trace events during the tracing. $ curl -v http://localhost:10000/storage_service/slow_query $ curl -v --request POST http://localhost:10000/storage_service/slow_query\?fast=true\&enable=true Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com>	2021-03-18 15:05:05 +02:00
Asias He	61ac8d03b9	repair: Add ignore_nodes option In some cases, user may want to repair the cluster, ignoring the node that is down. For example, run repair before run removenode operation to remove a dead node. Currently, repair will ignore the dead node and keep running repair without the dead node but report the repair is partial and report the repair is failed. It is hard to tell if the repair is failed only due to the dead node is not present or some other errors. In order to exclude the dead node, one can use the hosts option. But it is hard to understand and use, because one needs to list all the "good" hosts including the node itself. It will be much simpler, if one can just specify the node to exclude explicitly. In addition, we support ignore nodes option in other node operations like removenode. This change makes the interface to ignore a node explicitly more consistent. Refs: #7806 Closes #8233	2021-03-09 16:03:13 +01:00
Asias He	4d32d03172	storage_service: Introduce load_and_stream === Introduction === This feature extends the nodetool refresh to allow loading arbitrary sstables that do not belong to a node into the cluster. It loads the sstables from disk, calculates the owning nodes of the data, and streams the data to the owners automatically. For example, say the old cluster has 6 nodes and the new cluster has 3 nodes. We can copy the sstables from the old cluster to any of the new nodes and trigger the load and stream process. This can make restores and migrations much easier. === Performance === I managed to get 40MB/s per shard on my build machine. CPU: AMD Ryzen 7 1800X Eight-Core Processor DISK: Samsung SSD 970 PRO 512GB Assume 1TB sstables per node, each shard can do 40MB/s, each node has 32 shards, we can finish the load and stream 1TB of data in 13 mins on each node. 1TB / (40MB/s per shard * 32 shards) / 60 s per min ≈ 13 mins === Tests === backup_restore_tests.py:TestBackupRestore.load_and_stream_to_new_cluster_test which creates a cluster with 4 nodes and inserts data, then uses load_and_stream to restore to a 2-node cluster. === Usage === curl -X POST "http://{ip}:10000/storage_service/sstables/{keyspace}?cf={table}&load_and_stream=true" === Notes === Btw, with the old nodetool refresh, the node will not pick up the data that does not belong to this node but it will not delete it either. One has to run nodetool cleanup to remove that data manually, which is a surprise to me and probably to users as well. With load and stream, the process will delete the sstables once it finishes streaming, so no nodetool cleanup is needed. The name of this feature, load and stream, follows load and store in the CPU world. Fixes #7831	2021-01-18 16:32:33 +08:00
Asias He	829b4c1438	repair: Make removenode safe by default Currently removenode works like below: - The coordinator node advertises the node to be removed in REMOVING_TOKEN status in gossip - Existing nodes learn the node in REMOVING_TOKEN status - Existing nodes sync data for the ranges they own - Existing nodes send notification to the coordinator - The coordinator node waits for notification and announces the node in REMOVED_TOKEN Current problems: - Existing nodes do not tell the coordinator if the data sync is ok or failed. - The coordinator can not abort the removenode operation in case of error - Failed removenode operation will make the node to be removed in REMOVING_TOKEN forever. - The removenode runs in best effort mode which may cause data consistency issues. It means if a node that owns the range after the removenode operation is down during the operation, the removenode operation will continue to succeed without requiring that node to perform data syncing. This can cause data consistency issues. For example, five nodes in the cluster, RF = 3, for a range, n1, n2, n3 are the old replicas, n2 is being removed, after the removenode operation, the new replicas are n1, n5, n3. If n3 is down during the removenode operation, only n1 will be used to sync data with the new owner n5. This will break QUORUM read consistency if n1 happens to miss some writes. Improvements in this patch: - This patch makes the removenode safe by default. We require all nodes in the cluster to participate in the removenode operation and sync data if needed. We fail the removenode operation if any of them is down or fails. If the user wants the removenode operation to succeed even if some of the nodes are not available, the user has to explicitly pass a list of nodes that can be skipped for the operation. $ nodetool removenode --ignore-dead-nodes <list_of_dead_nodes_to_ignore> <host_id> Example restful api: $ curl -X POST "http://127.0.0.1:10000/storage_service/remove_node/?host_id=7bd303e9-4c7b-4915-84f6-343d0dbd9a49&ignore_nodes=127.0.0.3,127.0.0.5" - The coordinator can abort data sync on existing nodes For example, if one of the nodes fails to sync data. It makes no sense for other nodes to continue to sync data because the whole operation will fail anyway. - The coordinator can decide which nodes to ignore and pass the decision to other nodes Previously, there was no way for the coordinator to tell existing nodes to run in strict mode or best effort mode. Users had to modify the config file or run a restful api cmd on all the nodes to select strict or best effort mode. With this patch, the cluster wide configuration is eliminated. Fixes #7359 Closes #7626	2020-12-10 10:14:39 +02:00
Pekka Enberg	a37eaaa022	sstables: Add support for the "md" format enum value Add the sstable_version_types::md enum value and logically extend sstable_version_types comparisons to cover also the > sstable_version_types::mc cases. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Asias He	271fac56a3	repair: Add synchronous API to query repair status This new api blocks until the repair job finishes, fails, or times out. E.g., - Without timeout curl -X GET "http://127.0.0.1:10000/storage_service/repair_status/?id=123" - With timeout curl -X GET "http://127.0.0.1:10000/storage_service/repair_status/?id=123&timeout=5" The timeout is in seconds. The current asynchronous api returns immediately even if the repair is in progress. E.g., curl -X GET "http://127.0.0.1:10000/storage_service/repair_async/ks?id=123" Users can use the new synchronous API to avoid repeatedly polling to check whether the repair job has finished. Fixes #6445	2020-07-14 11:20:15 +03:00
Juliusz Stasiewicz	aadd2ffa6a	api: Added command `/storage_service/cdc_streams_check_and_repair` This commit introduces a placeholder for HTTP POST request at `/storage_service/cdc_streams_check_and_repair`.	2020-05-29 12:23:08 +02:00
Amnon Heiman	6b020e67ce	api/storage_service: Support specifying a table when deleting a snapshot This patch adds an optional parameter to DELETE /storage_service/snapshots After this patch the following will be supported: If a keyspace called keyspace1 and a table called standard1 exists. curl -X POST 'http://localhost:10000/storage_service/snapshots?tag=am1&kn=keyspace1' curl -X DELETE --header 'Accept: application/json' 'http://localhost:10000/storage_service/snapshots?tag=am1&kn=keyspace1&cf=standard1' Fixes #5658 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-02-18 16:34:10 +02:00
Amnon Heiman	f43285f39a	api: replace swagger definition to use long instead of int (#5380 ) In swagger 1.2 int is defined as int32. We originally used int following the jmx definition, in practice internally we use uint and int64 in many places. While the API format the type correctly, an external system that uses swagger-based code generator can face a type issue problem. This patch replace all use of int in a return type with long that is defined as int64. Changing the return type, have no impact on the system, but it does help external systems that use code generator from swagger. Fixes #5347 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-12-11 12:48:29 +02:00
Calle Wilund	298da3fc4b	api/storage_service: Add "sstable_info" command Assembles information and attributes of sstables in one or more column families. v2: * Use (not really legal) nested "type" in json * Rename "table" param to "cf" for consistency * Some comments on data sizes * Stream result to avoid huge string allocations on final json	2019-08-06 08:14:15 +00:00
Glauber Costa	98332de268	api: use longs instead of ints for snapshot sizes Int types in json will be serialized to int types in C++. They will then only be able to handle 4GB, and we tend to store more data than that. Without this patch, listsnapshots is broken in all versions. Fixes: #3845 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181012155902.7573-1-glauber@scylladb.com>	2018-10-12 21:17:24 +03:00
Duarte Nunes	ff15068a41	service/storage_service: Allow querying the view build status This patch adds support for the nodetool viewbuildstatus command, which shows the progress of a materialized view build across the cluster. A view can be absent from the result, successfully built, or currently being built. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:10 +01:00
Amnon Heiman	827723cec8	API: Add get active repair api This patch adds an API to return an array of the ids of current active repairs. After this patch a call to: curl http://localhost:10000/storage_service/active_repair/ Will return the active repairs ids Fixes #3193 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2018-02-14 11:43:41 +02:00
Asias He	6dc62c6215	api: Add force_terminate_repair API The api /storage_service/force_terminate is supposed to be /storage_service/force_terminate_repair. scylla-jmx uses /storage_service/force_terminate api. So instead of renaming it, it is better to add a new name for it.	2017-08-30 15:19:51 +08:00
Calle Wilund	5eb54f9bc4	api::storage_service: c3 compat - make query keyspaces a trinary choice all, user or non-local strategy ones.	2016-11-08 12:22:04 +00:00
Amnon Heiman	ed1d02b1a3	API: Add slow query API definition This adds the GET and POST api for slow query logging. The GET returns an object with the enable, ttl and threshold, and the POST lets you configure each of them. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2016-09-03 01:15:15 +03:00
Amnon Heiman	56ea8c943e	API: add scylla release version API This adds a definition for the scylla release version. The API already returns the compatibility version (i.e., the compatible origin version). This definition returns the scylla version; a call to the API should return the same result as running scylla --version. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2016-07-03 16:26:21 +03:00
Nadav Har'El	f9ee74f56f	repair: options for repairing only a subrange To implement nodetool's "--start-token"/"--end-token" feature, we need to be able to repair only part of the ranges held by this node. Our REST API already had a "ranges" option where the tool can list the specific ranges to repair, but using this interface in the JMX implementation is inconvenient, because it requires the Java code to be able to intersect the given start/end token range with the actual ranges held by the repaired node. A more reasonable approach, which this patch uses, is to add new "startToken"/"endToken" options to the repair's REST API. What these options do is find the node's token ranges as usual, and only then intersect them with the user-specified token range. The JMX implementation becomes much simpler (in a separate patch for scylla-jmx) and the real work is done in the C++ code, where it belongs, not in Java code. With the additional scylla-jmx patch to use the new REST API options provided here, this fixes #917. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1455807739-25581-1-git-send-email-nyh@scylladb.com>	2016-02-18 17:13:56 +02:00
Amnon Heiman	6942b41693	API: rename the map of string, double to map_string_double This replaces the confusing name to a more meaningful name. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1451466952-405-1-git-send-email-amnon@scylladb.com>	2016-01-03 19:10:49 +02:00
Pekka Enberg	0aa105c9cf	Merge "load report a negative value" from Amnon "This series solves an issue with the load broadcaster that reports negative values due to an integer wrap around. While fixing this issue an additional change was made so that the load_map would return doubles and not formatted strings. This is a better API, safer and better documented."	2015-12-30 10:21:55 +02:00
Amnon Heiman	ec379649ea	API: repair to use documented params The repair API used to have an undocumented parameter list similar to origin. This patch changes the way repair gets its parameters. Instead of one undocumented string, it now lists all the different optional parameters in the swagger file and accepts them explicitly. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2015-12-29 15:38:44 +02:00
Amnon Heiman	71905081b1	API: report the load map as an unformatted double In origin the storage_service reports the load map as a formatted string. For an API, a better option is to report the load map as a double and let the JMX proxy do the formatting. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2015-12-29 11:55:34 +02:00
Amnon Heiman	ef3c6b2647	API: Add describe ring API implementation This patch changes the API to support describe ring instead of describe ring jmx, which will be implemented in the jmx server. The API will return a list of objects instead of a string. An additional api was added as the equivalent of the jmx call with an empty param. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2015-11-03 16:33:39 +02:00
Asias He	06b19867d8	api: Fix storage_service remove_node parameter nodetool removenode takes the parameter of host_id instead of token. Refs: #496	2015-10-26 10:32:47 +02:00
Amnon Heiman	1e8752d55e	API: Fix a confusion in the storage service snapshot details There was a confusion between the snapshot key and the keyspace in the snapshot details, this fixes it. Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>	2015-10-20 13:53:09 +03:00
Amnon Heiman	6fd3c81db5	keyspace clean up should be a POST not a GET	2015-10-11 15:51:56 +03:00
Amnon Heiman	2c5716dac3	API: storage_service Add the swagger definition for ownership This adds the API for get_effective_ownership and get_ownership in storage_service. It is based on the StorageServiceMBean definition. Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>	2015-08-25 19:39:13 +03:00
Amnon Heiman	db30a588b2	API: Break the async repair into two operations This distinguishes between the async repair that starts the repair, which will now be a POST request, and the method that checks on the command progress, which will now be a GET command. After the change each operation gets the parameters that it needs. The GET will return an enum based on the repair_status. Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>	2015-08-25 15:45:30 +03:00
Amnon Heiman	34e60faaca	API: Adding the storage service metrics definition This adds the storage service metrics, which are based on the StorageServiceMetrics class. The following commands were added: get_metrics_load get_exceptions get_total_hints_in_progress get_total_hints Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>	2015-08-18 11:19:51 +03:00
Amnon Heiman	cae3de162c	API: Remove empty line from empty parameters list in storage_service.json Just for styling, empty parameters list will not include an empty line. Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>	2015-08-18 11:19:51 +03:00
Amnon Heiman	d067b9b111	API: Complete definition of the storage service This patch completes the definition of the storage service. It is a mapping of the StorageServiceMBean. When possible, the following conventions were used: an API that does not return a value will be POST or DELETE, depending on the context. When possible, POST, DELETE and GET use the same path. The following commands were added: get_leaving_nodes get_moving_nodes get_joining_nodes get_release_version get_schema_version get_all_data_file_locations get_saved_caches_location get_range_to_endpoint_map get_pending_range_to_endpoint_map describe_ring_jmx get_host_id_map get_load get_load_map get_current_generation_number get_natural_endpoints get_snapshot_details take_snapshot del_snapshot true_snapshots_size force_keyspace_compaction force_keyspace_cleanup scrub upgrade_sstables force_keyspace_flush repair_async force_terminate_all_repair_sessions decommission move remove_node get_removal_status force_remove_completion set_logging_level get_logging_levels get_operation_mode is_starting get_drain_progress drain truncate get_keyspaces update_snitch stop_gossiping start_gossiping is_gossip_running stop_daemon is_initialized stop_rpc_server start_rpc_server is_rpc_server_running start_native_transport stop_native_transport is_native_transport_running join_ring is_joined set_stream_throughput_mb_per_sec get_stream_throughput_mb_per_sec get_compaction_throughput_mb_per_sec set_compaction_throughput_mb_per_sec is_incremental_backups_enabled set_incremental_backups_enabled rebuild bulk_load bulk_load_async reschedule_failed_deletions load_new_ss_tables sample_key_range reset_local_schema set_trace_probability get_trace_probability enable_auto_compaction disable_auto_compaction deliver_hints get_cluster_name get_partitioner_name get_tombstone_warn_threshold set_tombstone_warn_threshold get_tombstone_failure_threshold set_tombstone_failure_threshold get_batch_size_failure_threshold set_batch_size_failure_threshold set_hinted_handoff_throttle_in_kb Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>	2015-06-16 14:30:09 +03:00
Amnon Heiman	bdb2a7ff47	api: Remove empty parameter from storage_service.json Empty parameter definition is not accepted by the swagger UI, instead, the parameter list itself should be empty. Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>	2015-05-31 18:17:29 +03:00
Amnon Heiman	a28b90bfd3	Adding definitions to the storage_service This adds the following definitions to the storage_service swagger definition file: /storage_service/tokens /storage_service/tokens/{endpoint} /storage_service/commitlog /storage_service/tokens_endpoint Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>	2015-05-27 15:28:20 +03:00
Amnon Heiman	aeb66fa409	API: Adding the storage service stub The storage service API will hold the equivalent information of the StorageServiceMBean. This adds the API with one stubbed method, the get local hostid. After the patch the storage_service doc will be available at: http://localhost:10000/api-doc/storage_service/ And the stubbed local host id will be under: http://localhost:10000/storage_service/local_hostid and will return an empty string Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>	2015-04-13 18:57:14 +03:00

44 Commits
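
Several of the commit messages above give curl examples whose query strings contain "&", which a shell treats as a control operator unless the URL is quoted. As a minimal sketch, the same request URLs can be assembled programmatically so that quoting and percent-encoding are handled in one place. The paths and parameter names (host_id, ignore_nodes, id, timeout, cf, load_and_stream) and the address 127.0.0.1:10000 are taken from the commit messages; the helper names build_url, remove_node_url, repair_status_url and load_and_stream_url are hypothetical and are not part of Scylla or scylla-jmx. The code only builds strings and does not contact a server.

```python
from urllib.parse import quote, urlencode

# Address and port as they appear in the commit examples above.
BASE = "http://127.0.0.1:10000"


def build_url(path, **params):
    """Hypothetical helper: join BASE, a path, and an encoded query string."""
    # Keep "," literal so list-valued options read as in the commit examples.
    query = urlencode(params, safe=",")
    return f"{BASE}{path}?{query}" if query else f"{BASE}{path}"


def remove_node_url(host_id, ignore_nodes=()):
    """URL for the remove_node call, optionally ignoring dead nodes."""
    params = {"host_id": host_id}
    if ignore_nodes:
        params["ignore_nodes"] = ",".join(ignore_nodes)
    return build_url("/storage_service/remove_node/", **params)


def repair_status_url(repair_id, timeout=None):
    """URL for the synchronous repair status query (timeout in seconds)."""
    params = {"id": repair_id}
    if timeout is not None:
        params["timeout"] = timeout
    return build_url("/storage_service/repair_status/", **params)


def load_and_stream_url(keyspace, table):
    """URL for loading sstables with load_and_stream enabled."""
    path = "/storage_service/sstables/" + quote(keyspace, safe="")
    return build_url(path, cf=table, load_and_stream="true")


if __name__ == "__main__":
    print(remove_node_url("7bd303e9-4c7b-4915-84f6-343d0dbd9a49",
                          ["127.0.0.3", "127.0.0.5"]))
    print(repair_status_url(123, timeout=5))
    print(load_and_stream_url("ks", "tbl"))
```

The resulting strings can be passed to curl inside double quotes, or to any HTTP client; whether a given endpoint expects GET or POST is stated in the corresponding commit message.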