scylladb

Author	SHA1	Message	Date
Avi Kivity	513faa5c71	Merge 'Use http Stream for describe ring' from Amnon " This series changes the describe_ring API to use HTTP stream instead of serializing the results and sending them as a single buffer. While testing the change I hit a 4-year-old issue inside service/storage_proxy.cc that causes a use after free, so I fixed it along the way. Fixes #6297 " * amnonh-stream_describe_ring: api/storage_service.cc: stream result of token_range storage_service: get_range_to_address_map prevent use after free	2020-05-17 14:05:26 +03:00
Amnon Heiman	7c4562d532	api/storage_service.cc: stream result of token_range The result of the get token range API can become big, which can cause large allocations and stalls. This patch replaces the implementation so that it streams the results using the http stream capabilities instead of serializing them and sending one big buffer. Fixes #6297 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-05-17 13:56:05 +03:00
Avi Kivity	f1fde537a9	Merge 'Support Snapshot of multiple tables' from Amnon This series adds support for taking a snapshot of multiple tables. Fixes #6333 * amnonh-snapshot_keyspace_table: api/storage_service.cc: Snapshot, support multiple tables service/storage_service: Take snapshot of multiple tables	2020-05-12 11:34:09 +03:00
Ivan Prisyazhnyy	84e25e8ba4	api: support table auto compaction control The patch implements: - /storage_service/auto_compaction API endpoint - /column_family/autocompaction/{name} API endpoint Those APIs allow controlling and querying the status of background compaction jobs for the existing tables. The implementation introduces the table::_compaction_disabled_by_user flag. The CompactionManager then checks whether it can push the background compaction job for the corresponding table. New members === table::enable_auto_compaction(); table::disable_auto_compaction(); bool table::is_auto_compaction_disabled_by_user() const Test === Tests: unit(sstable_datafile_test autocompaction_control_test), manual $ ninja build/dev/test/boost/sstable_datafile_test $ ./build/dev/test/boost/sstable_datafile_test --run_test=autocompaction_control_test -- -c1 -m2G --overprovisioned --unsafe-bypass-fsync 1 --blocked-reactor-notify-ms 2000000 The test tries to submit a compaction job after playing with the autocompaction control table switch. However, there is no reliable way to hook a pending compaction task. The code assumed that the with_scheduling_group() closure will never preempt execution of the stats check. Revert === Reverts commit `c8247ac`. In the previous version, execution sometimes resulted in the following error: test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test": critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed This version adds a few sstables to the cf, starts the compaction and waits until it is finished. API change === - `/column_family/autocompaction/` always returned `true` when answering the question "is autocompaction disabled?" (see https://github.com/scylladb/scylla-jmx/blob/master/src/main/java/org/apache/cassandra/db/ColumnFamilyStore.java#L321). Now it answers the question "is autocompaction enabled for the specific table?". The question logic is inverted. A patch to the JMX is required. However, the change is decent because all old values were invalid (it always reported all compactions are disabled). - `/column_family/autocompaction/` got support for POST/DELETE per table Fixes === Fixes #1488 Fixes #1808 Fixes #440 Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com> Reviewed-by: Glauber Costa <glauber@scylladb.com>	2020-05-07 16:23:38 +03:00
Amnon Heiman	ee7b40e31b	api/storage_service.cc: Snapshot, support multiple tables It is sometimes useful to take a snapshot of multiple tables inside a keyspace. This patch adds support for multiple table names when taking a snapshot. The change consists of splitting the table (column family) name and using the array of tables instead of just one. After this patch this will be supported: curl -X POST 'http://localhost:10000/storage_service/snapshots?tag=snapshottag&kn=system&cf=range_xfers,large_partitions' Fixes #6333 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-05-05 12:55:36 +03:00
Raphael S. Carvalho	02e046608f	api/service: fix segfault when taking a snapshot without keyspace specified If no keyspace is specified when taking snapshot, there will be a segfault because keynames is unconditionally dereferenced. Let's return an error because a keyspace must be specified when column families are specified. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200427195634.99940-1-raphaelsc@scylladb.com>	2020-04-27 23:37:00 +03:00
Pavel Emelyanov	83fe0427d2	api/cache_service: Relax getting partitions count This patch has two goals -- speed up the total partitions calculations (walking databases is faster than walking tables), and get rid of the row_cache._partitions.size() call, which will not be available on the new _partitions collection implementation. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200423133900.27818-1-xemul@scylladb.com>	2020-04-23 17:47:58 +02:00
Pavel Emelyanov	6ede253479	api/cache_service: Fix get_row_capacity calculation Current code gets table->row_cache->cache_tracker->region and sums up the region's used space for all tables found. The problem is that all row_cache-s share the same cache_tracker object from the database, thus the resulting number is not correct. Fix this by walking cache_tracker-s from databases instead. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200423133755.27187-1-xemul@scylladb.com>	2020-04-23 17:05:52 +03:00
Konstantin Osipov	18b9bb57ac	lwt: rename metrics to match accepted terminology Rename inherited metrics cas_propose and cas_commit to cas_accept and cas_learn respectively. A while ago we made a decision to stick to widely accepted terms for Paxos rounds: prepare, accept, learn. The rest of the code is using these terms, so rename the metrics to avoid confusion/technical debt. While at it, rename a few internal methods and functions. Fixes #6169 Message-Id: <20200414213537.129547-1-kostja@scylladb.com>	2020-04-15 12:20:30 +02:00
Pekka Enberg	c8247aced6	Revert "api: support table auto compaction control" This reverts commit `1c444b7e1e`. The test it adds sometimes fails as follows: test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test": critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed Ivan is working on a fix, but let's revert this commit to avoid blocking next promotion failing from time to time.	2020-04-11 17:56:02 +03:00
Ivan Prisyazhnyy	1c444b7e1e	api: support table auto compaction control This patch adds the API endpoint /column_family/autocompaction/{name} that listens to GET and POST requests to query and control table background compactions. To implement that, the patch introduces the "_compaction_disabled_by_user" flag that determines whether CompactionManager is allowed to push background compaction jobs into the work. It introduces table::enable_auto_compaction(); table::disable_auto_compaction(); bool table::is_auto_compaction_disabled_by_user() const to control auto compaction state. Fixes #1488 Fixes #1808 Fixes #440 Tests: unit(sstable_datafile_test autocompaction_control_test), manual	2020-04-08 21:18:38 +03:00
Avi Kivity	88ade3110f	treewide: replace calls to engine().some_api() with some_api() This removes the need to include reactor.hh, a source of compile time bloat. In some places, the call is qualified with seastar:: in order to resolve ambiguities with a local name. Includes are adjusted to make everything compile. We end up having 14 translation units including reactor.hh, primarily for deprecated things like reactor::at_exit(). Ref #1	2020-04-05 12:46:04 +03:00
Rafael Ávila de Espíndola	8da235e440	everywhere: Use futurize_invoke instead of futurize<T>::invoke No functionality change, just simpler. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200330165308.52383-1-espindola@scylladb.com>	2020-04-03 15:53:35 +02:00
Alejo Sanchez	3a4dd0a856	utils: error injection inject() returning a future Make inject() return a future. Suggested by Gleb. Botond helped on dealing with complex function/lambda overload. Refs #3295 (closed) Tests: unit ({dev}) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200331143839.1781424-7-alejo.sanchez@scylladb.com>	2020-04-01 16:22:52 +02:00
Rafael Ávila de Espíndola	eca0ac5772	everywhere: Update for deprecated apply functions Now apply is only for tuples, for varargs use invoke. This depends on the seastar changes adding invoke. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200324163809.93648-1-espindola@scylladb.com>	2020-03-25 08:49:53 +02:00
Ivan Prisyazhnyy	5ec7e77b2e	api: /column_family/major_compaction/{keyspace:table} implementation This implements support for triggering major compactions through the REST API. Please note that "split_output" is not supported and Glauber Costa confirmed that this is fine: "We don't support splits, nor do I think we should." Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com>	2020-03-23 13:48:29 +02:00
Piotr Sarna	331ddf41e5	api: add error injection to REST API Simple REST API for error injection is implemented. The API allows the following operations: * injecting an error at given injection name * listing injections * disabling an injection * disabling all injections Currently the API enables/disables on all shards. Closes #3295 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-03-20 20:49:03 +01:00
Rafael Ávila de Espíndola	c0072eab30	everywhere: Be more explicit that we don't want std::make_shared If sstring is made an alias to std::string ADL causes std::make_shared to be found. Explicitly ask for ::make_shared. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-03-10 13:13:48 -07:00
Konstantin Osipov	94ee511f6a	lwt: implement cas_failed_read_round_optimization metric Presently lightweight transactions piggy back the old row value on prepare round response. If one of the participants did not provide the old value or the values from peers don't match, we perform a full read round which will repair the Paxos table and the base table, if necessary, at all participants. Capture the fact that read optimization has failed in a metric. Message-Id: <20200304192955.84208-2-kostja@scylladb.com>	2020-03-05 12:20:45 +01:00
Pavel Emelyanov	7363d56946	sstables: Move get_highest_supported_format The global get_highest_supported_format helper and its declaration are scattered all over the code, so clean this up and prepare the ground for moving _sstables_format from the storage_service onto the sstables_manager (not this set). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:45 +03:00
Piotr Sarna	5e07c00eeb	Merge 'Delete table snapshot' from Amnon This series adds an option to the API that supports deleting a specific table from a snapshot. The implementation works in a similar way to the option to specify specific keyspaces when deleting a snapshot. The motivation is to allow reducing disk-space when using the snapshot for backup. A dtest PR is sent to the dtest repository. Fixes #5658 Original PR #5805 Tests: (database_test) (dtest snapshot_test.py:TestSnapshot.test_cleaning_snapshot_by_cf) * amnonh/delete_table_snapshot: test/boost/database_test: adopt new clear_snapshot signature api/storage_service: Support specifying a table when deleting a snapshot storage_service: Add optional table name to clear snapshot	2020-02-24 09:38:57 +01:00
Pavel Emelyanov	049b549fdc	api: Register /v2/config stuff after database is started The set_config registers lambdas that need db.local(), so these routes must be registered after database is started. Fixes: #5849 Tests: unit(dev), manual wget on API Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200219130654.24259-1-xemul@scylladb.com>	2020-02-23 17:09:03 +02:00
Amnon Heiman	6b020e67ce	api/storage_service: Support specifying a table when deleting a snapshot This patch adds an optional parameter to DELETE /storage_service/snapshots After this patch the following will be supported, assuming a keyspace called keyspace1 and a table called standard1 exist: curl -X POST 'http://localhost:10000/storage_service/snapshots?tag=am1&kn=keyspace1' curl -X DELETE --header 'Accept: application/json' 'http://localhost:10000/storage_service/snapshots?tag=am1&kn=keyspace1&cf=standard1' Fixes #5658 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-02-18 16:34:10 +02:00
Amnon Heiman	c3260bad25	storage_service: Add optional table name to clear snapshot There are cases when it is useful to delete a specific table from a snapshot. An example is when a snapshot is used for backup. Backup can take a long time; during that time, each of the tables can be deleted once it has been backed up, without waiting for the entire backup process to complete. This patch adds such an option to the database and to the storage_service wrapping method that calls it. If a table is specified, a filter function is created that selects only the column family with the given name. This is similar to the filtering at the keyspace level. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-02-18 16:34:10 +02:00
Tomasz Grabiec	76d1dd7ec6	Merge "nodetool scrub: implement validation and the skip-corrupted flag " from Botond Nodetool scrub rewrites all sstables, validating their data. If corrupt data is found the scrub is aborted. If the skip-corrupted flag is set, corrupt data is instead logged (just the keys) and skipped. The scrubbing algorithm itself is fairly simple, especially that we already have a mutation stream validator that we can use to validate the data. However currently scrub is piggy-backed on top of cleanup compaction. To implement this flag, we have to make scrub a separate compaction type and propagate down the flag. This required some massaging of the code: * Add support for more than two (cleanup or not) compaction types. * Allow passing custom options for each compaction type. * Allow stopping a compaction without the manager retrying it later. Additionally the validator itself needed some changes to allow different ways to handle errors, as needed by the scrub. Fixes: #5487 * https://github.com/denesb/nodetool-scrub-skip-corrupted/v7: table: cleanup_sstables(): only short-circuit on actual cleanup compaction: compaction_type: add Upgrade compaction: introduce compaction_options compaction: compaction_descriptor: use compaction options instead of cleanup flag compaction_manager: collect all cleanup related logic in perform_cleanup() sstables: compaction_stop_exception: add retry flag mutation_fragment_stream_validator: split into low-level and high-level API compaction: introduce scrub_compaction compaction_manager: scrub: don't piggy-back on upgrade_sstables() test: sstable_datafile_test: add scrub unit test	2020-02-17 15:28:07 +02:00
Botond Dénes	26d4c8be95	compaction_manager: scrub: don't piggy-back on upgrade_sstables() Now that we have the necessary infrastructure to do actual scrubbing, don't rely on `upgrade_sstables()` anymore behind the scenes, instead do an actual scrub. Also, use the skip-corrupted flag.	2020-02-13 15:02:37 +02:00
Amnon Heiman	8581617e78	api/storage_service: protect the objects during function call The list_snapshot API uses http stream to stream the result to the caller. It needs to keep all objects and the stream alive until the stream is closed. This patch adds do_with to hold these objects during the lifetime of the function. Fixes #5752	2020-02-12 13:08:34 +02:00
Pavel Emelyanov	5434e412e4	api: Keep and use reference on token_metadata	2020-02-10 20:54:32 +03:00
Amnon Heiman	687e554737	api/storage_service: use stream in get_snapshots get_snapshot should use http stream to reduce memory allocation and stalls. This patch changes the implementation so that it streams each of the snapshot objects instead of creating a single response and returning it. Fixes #5468 Depends on scylladb/seastar#723	2020-02-06 18:40:37 +02:00
Eliran Sinvani	971711a546	storage proxy: migrate to per scheduling group statistics This commit builds on top of the introduced per scheduling group statistics template and employs it for achieving a per scheduling group statistics in storage_proxy. Some of the statistics also had meaning as a global - per shard one. Those are the ones for determining if to throttle the write request. This was handled by creating a global stats struct that will hold those stats and by changing the stat update to also include the global one. One point that complicated it is an already existing aggregation over the per shard stats that now became a per scheduling group per shard stats, converting the aggregation to a two-dimensional aggregation. One thing this commit doesn't handle is validating that an individual statistic didn't "cross a scheduling group boundary", such validation is possible but it can easily be added in the future. There is a subtlety to doing so since if the operation did cross to other scheduling group two connected statistics can lose balance for example written bytes and completed write transactions. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2020-01-30 15:01:44 +01:00
Pavel Emelyanov	fd6b5efe75	api: Register snapshot API later In storage_service's snapshot code there are checks for _operation_mode being _not_ JOINING to proceed. The intention is apparently to allow for snapshots only after the cluster join. However, here's what the start-up code looks like - _operation_mode = STARTING in storage_service::constructor - snapshot API registered in api::set_server_storage_service - _operation_mode = JOINING in storage_service::join_token_ring So in between steps 2 and 3 snapshots can be taken. Although there's a quick and simple fix for that (check for the _operation_mode to be not STARTING either) I think it's better to register the snapshot API later instead. This will help greatly to de-bloat the storage_service, in particular -- to encapsulate the _operation_mode properly. Note, though the check for _operation_mode is made only for taking snapshot, I move all snapshot ops registration to the later phase. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-27 17:42:04 +03:00
Pavel Emelyanov	4886c1db74	api: Unwrap wrap_ks_cf This is preparation for the next patch -- the lambda in question (and the used type) will be needed in two functions, so make the lambda a "real" function. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-27 17:42:04 +03:00
Pavel Emelyanov	b6e1e6df64	misc_services: Introduce load_meter There's a lonely get_load_map() call on storage_service that needs only load broadcaster, always runs on shard 0 and that's it. Next patch will move this whole stuff into its own helper no-shard container and this is preparation for this. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-13 13:53:08 +03:00
Pavel Emelyanov	998f51579a	storage_service: Rip join_ring config option The option in question apparently does not work: several sharded objects are start()-ed (and thus instantiated) in join_token_ring, while instances of these objects are used during init of other stuff. This leads to a broken seastar local_is_initialized assertion on sys_dist_ks, but reading the code shows more examples, e.g. the auth_service is started on join, but is used for thrift and cql servers initialization. The suggestion is to remove the option instead of fixing it. The is_joined logic is kept since on-start joining still can take some time and it's safer to report real status from the API. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191203140717.14521-1-xemul@scylladb.com>	2019-12-18 12:45:13 +02:00
Amnon Heiman	f43285f39a	api: replace swagger definition to use long instead of int (#5380) In swagger 1.2 int is defined as int32. We originally used int following the jmx definition; in practice, internally we use uint and int64 in many places. While the API formats the type correctly, an external system that uses a swagger-based code generator can face a type problem. This patch replaces all uses of int in a return type with long, which is defined as int64. Changing the return type has no impact on the system, but it does help external systems that use a code generator from swagger. Fixes #5347 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-12-11 12:48:29 +02:00
Glauber Costa	73aff1fc95	api: export system uptime via REST This will be useful for tools like nodetool that want to query the uptime of the system. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190619110850.14206-1-glauber@scylladb.com>	2019-11-20 16:44:11 +02:00
Piotr Dulikowski	48f7b2e4fb	table: move out table::stats to table_stats This change was done in order to be able to forward-declare the table::stats structure.	2019-11-12 13:35:41 +01:00
Vladimir Davydov	e510288b6f	api: wire up column_family cas-related statistics	2019-10-29 19:26:18 +03:00
Vladimir Davydov	21c3c98e5b	api: wire up storage_proxy cas-related statistics	2019-10-29 19:26:18 +03:00
Asias He	f876580740	storage_service: Reject nodetool cleanup when there is pending ranges From Shlomi: 4 node cluster Node A, B, C, D (Node A: seed) cassandra-stress write n=10000000 -pop seq=1..10000000 -node <seed-node> cassandra-stress read duration=10h -pop seq=1..10000000 -node <seed-node> while read is progressing Node D: nodetool decommission Node A: nodetool status node - wait for UL Node A: nodetool cleanup (while decommission progresses) I get the error on c-s once decommission ends java.io.IOException: Operation x0 on key(s) [383633374d31504b5030]: Data returned was not validated The problem is when a node gets new ranges, e.g, the bootstrapping node, the existing nodes after a node is removed or decommissioned, nodetool cleanup will remove data within the new ranges which the node just gets from other nodes. To fix, we should reject the nodetool cleanup when there is pending ranges on that node. Note, rejecting nodetool cleanup is not a full protection because new ranges can be assigned to the node while cleanup is still in progress. However, it is a good start to reject until we have full protection solution. Refs: #5045	2019-10-23 19:20:36 +08:00
Vladimir Davydov	e8bcb34ed4	api: drop /storage_proxy/metrics/cas_read/condition_not_met There's no such metric in Cassandra (although Cassandra's docs mistakenly say it exists). Having it would make no sense anyway so let's drop it. Message-Id: <b4f7a6ad278235c443cb8ea740bfa6399f8e4ee1.1570434332.git.vdavydov@scylladb.com>	2019-10-07 16:54:39 +03:00
Nadav Har'El	6c4ad93296	api/compaction_manager: do not hold map on the stack Merged patch series by Amnon Heiman: This patch fixes a bug that a map is held on the stack and then is used by a future. Instead, the map is now moved to the relevant lambda function. Fixes #4824	2019-09-01 13:16:34 +03:00
Amnon Heiman	2d3185fa7d	column_family.cc: remove unhandled future The sum_ratio struct is a helper struct that is used when calculating a ratio over multiple shards. Originally it was created thinking that it may need to use a future; in practice it was never used and the future was ignored. This patch removes the future from the implementation and removes an unhandled future warning from the compilation. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-08-25 16:51:14 +03:00
Amnon Heiman	21dee3d8ef	API:column_family.cc Add get_build_index implementation This patch adds an implementation of the get build index API and removes a FIXME. The API returns the list of the built secondary indexes belonging to a column family. Example: CREATE KEYSPACE scylla_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}; CREATE TABLE scylla_demo.mytableID ( uid uuid, text text, time timeuuid, PRIMARY KEY (uid, time) ); CREATE index on scylla_demo.mytableID (time); $ curl -X GET 'http://localhost:10000/column_family/built_indexes/scylla_demo%3Amytableid' ["mytableid_time_idx"] Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-08-25 16:46:49 +03:00
Pekka Enberg	d0eecbf3bb	api/storage_proxy: Wire up hinted-handoff status to API We support hinted-handoff now, so let's return its status via the API. Message-Id: <20190819080006.18070-1-penberg@scylladb.com>	2019-08-20 00:24:50 +02:00
Amnon Heiman	6a0490c419	api/compaction_manager: indentation	2019-08-12 14:04:40 +03:00
Amnon Heiman	8181601f0e	api/compaction_manager: do not hold map on the stack This patch fixes a bug that a map is held on the stack and then is used by a future. Instead, the map is now wrapped with do_with. Fixes #4824 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-08-12 14:04:00 +03:00
Calle Wilund	298da3fc4b	api/storage_service: Add "sstable_info" command Assembles information and attributes of sstables in one or more column families. v2: * Use (not really legal) nested "type" in json * Rename "table" param to "cf" for consistency * Some comments on data sizes * Stream result to avoid huge string allocations on final json	2019-08-06 08:14:15 +00:00
Amnon Heiman	1c6dec139f	API: compaction_manager add get pending tasks by table The pending tasks by table name API returns an array of pending tasks by keyspace/table names. After this patch the following command would work: curl -X GET 'http://localhost:10000/compaction_manager/metrics/pending_tasks_by_table' Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-07-12 19:21:26 +03:00
Calle Wilund	4ef940169f	Replace use of "ipv4_addr" with socket_address Allows the various sockets to use ipv6 address binding if so configured.	2019-07-08 14:13:09 +00:00
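
The snapshot commits above (ee7b40e31b and 6b020e67ce) show curl calls against /storage_service/snapshots with `tag`, `kn`, and an optional comma-separated `cf` parameter. A minimal sketch of how such a URL could be assembled is shown here; the helper name `snapshot_url`, its parameters, and the default base address are hypothetical and not part of Scylla. Only the endpoint path and the query parameter names come from the commit messages.

```python
from urllib.parse import urlencode


def snapshot_url(tag, keyspace, tables=(), base="http://localhost:10000"):
    """Build a /storage_service/snapshots URL (hypothetical helper).

    `tables` is an optional sequence of table (column family) names;
    they are joined with commas into the `cf` parameter, as in the
    curl example from commit ee7b40e31b.
    """
    params = {"tag": tag, "kn": keyspace}
    if tables:
        params["cf"] = ",".join(tables)
    # safe="," keeps the comma literal instead of encoding it as %2C,
    # matching the form used in the commit message.
    return f"{base}/storage_service/snapshots?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    print(snapshot_url("snapshottag", "system", ["range_xfers", "large_partitions"]))
```

The same URL would be used with POST to take a snapshot and with DELETE to clear one, per the commit messages; the HTTP call itself is left out of the sketch.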


443 Commits