Use a forward declaration of cql3::expr::oper_t to reduce the
number of translation units depending on expression.hh.
Before:
$ find build/dev -name '*.d' | xargs cat | grep -c expression.hh
272
After:
$ find build/dev -name '*.d' | xargs cat | grep -c expression.hh
154
Some translation units adjust their includes to restore access
to required headers.
Closes #9229
Validation compaction -- although I still maintain that it is a good
descriptive name -- was an unfortunate choice for the underlying
functionality because Origin has burned the name already as it uses it
for a compaction type used during repair. This opens the door for
confusion for users coming from Cassandra who will associate Validation
compaction with the purpose it is used for in Origin.
Additionally, since Origin's validation compaction was not user
initiated, it didn't have a corresponding `nodetool` command to start
it. Adding such a command would create an operational difference between
us and Origin.
To avoid all this we fold validation compaction into scrub compaction,
under a new "validation" mode. I decided against using the also-suggested
`--dry-mode` flag, as I feel that a new mode is a more natural
choice: we don't have to define how it interacts with all the other
modes, as we would with a `--dry-mode` flag.
Fixes: #7736
Tests: unit(dev), manual(REST API)
* 'scrub-validation-mode/v2' of https://github.com/denesb/scylla:
compaction/compaction_descriptor: add comment to Validation compaction type
compaction/compaction_descriptor: compaction_options: remove validate
api: storage_service: validate_keyspace -> scrub_keyspace (validate mode)
compaction/compaction_manager: hide perform_sstable_validation()
compaction: validation compaction -> scrub compaction (validate mode)
compaction/compaction_descriptor: compaction_options: add options() accessor
compaction/compaction_descriptor: compaction_options::scrub::mode: add validate
Adds HTTP endpoints for manipulating hint sync points:
- /hinted_handoff/sync_point (POST) - creates a new sync point for
hints towards nodes listed in the `target_hosts` parameter
- /hinted_handoff/sync_point (GET) - checks the status of the sync
point. If a non-zero `timeout` parameter is given, it waits until the
sync point is reached or the timeout expires.
Registration of the currently unused hinted handoff endpoints is moved
out from the set_server_done function. They are now explicitly
registered in main.cc by calling api::set_hinted_handoff and also
uninitialized by calling api::unset_hinted_handoff.
Setting/unsetting the HTTP API separately will allow passing a reference to
the sync_point_service without polluting the set_server_done function.
We are folding validation compaction into scrub (at least on the
interface level), so remove the validation entry point accordingly and
have users go through `perform_sstable_scrub()` instead.
The reference in question is already there; handlers that need the
storage service can capture and use it. These handlers are not
yet stopped, but neither is the storage service itself, so the
potentially dangling reference is not being set up here.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Both set_server_storage_service and set_server_storage_proxy set up
API handlers that need the storage service to work. Currently they all
reach for the global storage service instance, but it's better if they
receive one from main. This patch carries the sharded storage service
reference down to the handler-setting functions; the next patch will
make use of it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Partition count is of type size_t, but we use std::plus&lt;int&gt;
to reduce partition-count values across various column families.
This patch changes the argument of std::plus to the right type.
Using std::plus<int> for size_t compiles but does not work as expected.
For example plus<int>(2147483648LL, 1LL) = -2147483647 while the code
would probably want 2147483649.
Fixes #9090
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Closes #9074
In an upcoming commit I will add a "system.describe_ring" table which uses
the endpoint's inet address as part of its clustering key and, therefore,
needs to keep addresses sorted with `inet_addr_type::less`.
This warning prevents using std::move() where it can hurt
- on an unnamed temporary or a named automatic variable being
returned from a function. In both cases the value could be
constructed directly in its final destination, but std::move()
prevents it.
Fix the handful of cases (all trivial), and enable the warning.
Closes #8992
Since compaction is layered on top of sstables, let's move all compaction code
into a new top-level directory.
This change will give me extra motivation to remove all layer violations, like
sstable calling compaction-specific code, and compaction entanglement with
other components like table and storage service.
Next steps:
- remove all layer violations
- move compaction code in sstables namespace into a new one for compaction.
- move compaction unit tests into its own file
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210707194058.87060-1-raphaelsc@scylladb.com>
Cassandra 3.0 deprecated the 'sstable_compression' attribute and added
'class' as a replacement. Follow suit by supporting both.
The SSTABLE_COMPRESSION variable is renamed to SSTABLE_COMPRESSION_DEPRECATED
to detect all uses and prevent future misuse.
To prevent old-version nodes from seeing the new name, the
compression_parameters class preserves the key name when it is
constructed from an options map, and emits the same key name when
asked to generate an options map.
Existing unit tests are modified to use the new name, and a test
is added to ensure the old name is still supported.
Fixes #8948.
Closes #8949
The option is provided by nodetool snapshot
https://docs.scylladb.com/operating-scylla/nodetool-commands/snapshot/
```
nodetool [(-h <host> | --host <host>)] [(-p <port> | --port <port>)]
[(-pp | --print-port)] [(-pw <password> | --password <password>)]
[(-pwf <passwordFilePath> | --password-file <passwordFilePath>)]
[(-u <username> | --username <username>)] snapshot
[(-cf <table> | --column-family <table> | --table <table>)]
[(-kc <kclist> | --kc.list <kclist>)]
[(-sf | --skip-flush)] [(-t <tag> | --tag <tag>)] [--] [<keyspaces...>]
-sf / --skip-flush Do not flush memtables before snapshotting (snapshot will not contain unflushed data)
```
But it is currently ignored by scylla-jmx (scylladb/scylla-jmx#167)
and not supported at the api level.
This patch adds support for the option in advance
from the api service level down via snapshot_ctl
to the table class and snapshot implementation.
In addition, a corresponding unit test was added to verify
that taking a snapshot with `skip_flush` does not flush the memtable
(at the table::snapshot level).
Refs #8725
Closes #8726
* github.com:scylladb/scylla:
test: database_test: add snapshot_skip_flush_works
api: storage_service/snapshots: support skip-flush option
snapshot: support skip_flush option
table: snapshot: add skip_flush option
api: storage_service/snapshots: add sf (skip_flush) option
Eliminate not used includes and replace some more includes
with forward declarations where appropriate.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
The option is provided by nodetool snapshot
https://docs.scylladb.com/operating-scylla/nodetool-commands/snapshot/
```
nodetool [(-h <host> | --host <host>)] [(-p <port> | --port <port>)]
[(-pp | --print-port)] [(-pw <password> | --password <password>)]
[(-pwf <passwordFilePath> | --password-file <passwordFilePath>)]
[(-u <username> | --username <username>)] snapshot
[(-cf <table> | --column-family <table> | --table <table>)]
[(-kc <kclist> | --kc.list <kclist>)]
[(-sf | --skip-flush)] [(-t <tag> | --tag <tag>)] [--] [<keyspaces...>]
-sf / --skip-flush Do not flush memtables before snapshotting (snapshot will not contain unflushed data)
```
But it is currently ignored by scylla-jmx (scylladb/scylla-jmx#167)
and not supported at the api level.
This patch wires the skip_flush option support to the
REST API.
Fixes#8725
Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Note: I tried adding the option and calling it "skip_flush"
but I couldn't make it work with scylla-jmx, hence it's
called by the abbreviated name - "sf".
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The reset_local_schema call needs the proxy and feature service to do its
job. Right now the features are retrieved from the global storage service,
but they are present on the proxy as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Instead of getting the database from the global storage service, it's
simpler and better to grab it from the http context at hand.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Clang warns when "return std::move(x)" is needed to elide a copy,
but the call to std::move() is missing. We disabled the warning during
the migration to clang. This patch re-enables the warning and fixes
the places it points out, usually by adding std::move() and in one
place by converting the returned variable from a reference to a local,
so normal copy elision can take place.
Closes #8739
The patch set is an assorted collection of header cleanups, e.g.:
* Reduce number of boost includes in header files
* Switch to forward declarations in some places
A quick measurement was performed to see if these changes
provide any improvement in build times (ccache cleaned and
existing build products wiped out).
The results are posted below (`/usr/bin/time -v ninja dev-build`)
for 24 cores/48 threads CPU setup (AMD Threadripper 2970WX).
Before:
Command being timed: "ninja dev-build"
User time (seconds): 28262.47
System time (seconds): 824.85
Percent of CPU this job got: 3979%
Elapsed (wall clock) time (h:mm:ss or m:ss): 12:10.97
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2129888
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1402838
Minor (reclaiming a frame) page faults: 124265412
Voluntary context switches: 1879279
Involuntary context switches: 1159999
Swaps: 0
File system inputs: 0
File system outputs: 11806272
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
After:
Command being timed: "ninja dev-build"
User time (seconds): 26270.81
System time (seconds): 767.01
Percent of CPU this job got: 3905%
Elapsed (wall clock) time (h:mm:ss or m:ss): 11:32.36
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2117608
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1400189
Minor (reclaiming a frame) page faults: 117570335
Voluntary context switches: 1870631
Involuntary context switches: 1154535
Swaps: 0
File system inputs: 0
File system outputs: 11777280
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
The observed improvement is about 5% of total wall clock time
for `dev-build` target.
Also, all commits make sure that headers stay self-sufficient,
which would help to further improve the situation in the future.
* 'feature/header_cleanups_v1' of https://github.com/ManManson/scylla:
transport: remove extraneous `qos/service_level_controller` includes from headers
treewide: remove evidently unneded storage_proxy includes from some places
service_level_controller: remove extraneous `service/storage_service.hh` include
sstables/writer: remove extraneous `service/storage_service.hh` include
treewide: remove extraneous database.hh includes from headers
treewide: reduce boost headers usage in scylla header files
cql3: remove extraneous includes from some headers
cql3: various forward declaration cleanups
utils: add missing <limits> header in `extremum_tracking.hh`
There is a lot of global stuff in repair -- a bunch of pointers to
sharded services, the tracker, the map of metas (maybe more). This set
removes the first group; all those services have become main-local
recently. Along the way a call to the global storage proxy is dropped.
To get there the repair_service is turned into a "classical"
sharded<> service, gets all the needed dependencies by references
from main and spreads them internally where needed. Tracker and other
stuff is left global, but tracker is now the candidate for merging
with the now sharded repair_service, since it emulates the sharded
concept internally.
Overall the change is
- make repair_service sharded and put all dependencies on it at start
- have sharded<repair_service> in API and storage service
- carry the service reference down to repair_info and repair_meta
  constructions to give them the dependencies
- use needed services in _info and _meta methods
tests: unit(dev), dtest.repair(dev)
* 'br-repair-service' of https://github.com/xemul/scylla: (29 commits)
repair: Drop most of globals from repair
repair: Use local references in messaging handler checks
repair: Use local references in create_writer()
repair: Construct repair_meta with local references
repair: Keep more stuff on repair_info
repair: Kill bunch of global usages from insert_repair_meta
repair: Pass repair service down to meta insertion
repair: Keep local migration manager on repair_info
repair: Move unused db captures
repair: Remove unused ms captures
repair: Construct repair_info with service
repair: Loop over repair sharded container
repair: Make sync_data_using_repair a method
repair: Use repair from storage service
repair: Keep repair on storage service
repair: Make do_repair_start a method
repair: Pass repair_service through the API until do_repair_start
repair: Fix indentation after previous patch
repair: Split sync_data_using_repair
repair: Turn repair_range a repair_info method
...
The current scrub compaction has a serious drawback: while it is
very effective at removing any corruption it recognizes, it is very
heavy-handed in its way of repairing it: it simply drops
all data that is suspected to be corrupt. While this *is* the safest way
to cleanse data, it might not be the best way from the point of view of
a user who doesn't want to lose data, even at the risk of retaining
some business-logic-level corruption. Mind you, no database-level scrub
can ever fully repair data from the business-logic point of view; it
can only do so on the database level. So in certain cases it might be
desirable to have a less heavy-handed approach to cleansing the data,
one that tries as hard as it can not to lose any data.
This series introduces a new scrub mode with the goal of addressing
this use-case: when the user doesn't want to lose any data. The new
mode is called "segregate" and it works by segregating its input into
multiple outputs such that each output contains a valid stream. This
approach can fix any out-of-order data, be that on the partition or
fragment level. Out-of-order partitions are simply written into a
separate output. Out-of-order fragments are handled by injecting a
partition-end/partition-start pair right before them, so that they are
now in a separate (duplicate) partition, which will just be written into
a separate output, like a regular out-of-order partition.
The reason this series is posted as an RFC is that although I consider
the code stable and tested, there are some questions related to the UX.
* First and foremost, every scrub that does more than just discard data
suspected to be corrupt (and even those, to a certain degree) has
to consider the possibility that it is rehabilitating corruption,
leaving it in the system without a warning, in the sense that the
user won't see any more problems due to low-level corruption and
hence might think everything is alright, while the data is still corrupt
from the business-logic point of view. It is very hard to draw a line
between what scrub should and shouldn't do, yet there is demand from
users for a scrub that can restore data without losing any of it. Note
that anybody executing such a scrub is already in bad shape; even if
they can read their data (they often can't), it is already corrupt, so
scrub is not making anything worse here.
* This series converts the previous `skip_corrupted` boolean into an
enum, which now selects the scrub mode. This means that
`skip_corrupted` cannot be combined with segregate to throw out what
segregate itself can't fix. This was chosen for simplicity: a bunch of
flags all interacting with each other is very hard to see through, in
my opinion, while a linear mode selector is much easier.
* The new segregate mode goes all-in, trying to fix even
fragment-level disorder. Maybe it should only do it on the partition
level, or maybe this should be made configurable, allowing the user to
select what happens to data that cannot be fixed.
Tests: unit(dev), unit(sstable_datafile_test:debug)
* 'sstable-scrub-segregate-by-partition/v1' of https://github.com/denesb/scylla:
test: boost/sstable_datafile_test: add tests for segregate mode scrub
api: storage_service/keyspace_scrub: expose new segregate mode
sstables: compaction/scrub: add segregate mode
mutation_fragment_stream_validator: add reset methods
mutation_writer: add segregate_by_partition
api: /storage_service/keyspace_scrub: add scrub mode param
sstables: compaction/scrub: replace skip_corrupted with mode enum
sstables: compaction/scrub: prevent infinite loop when last partition end is missing
tests: boost/sstable_datafile_test: use the same permit for all fragments in scrub tests
storage_proxy works with vectors of inet_addresses for replica sets
and for topology changes (pending endpoints, dead nodes). This patch
introduces new names for these (without changing the underlying
type - it's still std::vector<gms::inet_address>). This is so that
the following patch, that changes those types to utils::small_vector,
will be less noisy and highlight the real changes that take place.
Add direct support to the newly added scrub mode enum. Instead of the
legacy `skip_corrupted` flag, one can now select the desired mode from
the mode enum. `skip_corrupted` is still supported for backwards
compatibility but it is ignored when the mode enum is set.
* seastar 48376c76a...72e3baed9 (3):
> file: Add RFW_NOWAIT detection case for AuFS
> sharded: provide type info on no sharded instance exception
> iotune: Estimate accuarcy of measurement
Added missing include "database.hh" to api/lsa.cc since seastar::sharded<>
now needs full type information.
Right now toppartitions can only be invoked on one column family at a time.
This change introduces a natural extension to this functionality,
allowing the user to specify a list of column families.
We provide three ways for filtering in the query parameter "name_list":
1. A specific column family to include in the form "ks:cf"
2. A keyspace, telling the server to include all column families in it.
Specified by omitting the cf name, i.e. "ks:"
3. All column families, which is represented by an empty list
The list can include any number of entries of forms 1 and 2.
Fixes #4520
Closes #7864
* 'preparatory_work_for_compound_set' of github.com:raphaelsc/scylla:
sstable_set: move all() implementation into sstable_set_impl
sstable_set: preparatory work to change sstable_set::all() api
sstables: remove bag_sstable_set
Users of sstable_set::all() rely on the set itself keeping a reference
to the returned list, so a user can iterate through the list assuming
that it is alive all the way through.
This will change in the future, though, because there will be a
compound set impl which will have to merge the all() of multiple
managed sets, and the result is a temporary value.
So even range-based loops on all() have to keep a ref to the returned
list, to avoid the list being prematurely destroyed.
So the following code
for (auto& sst : *sstable_set.all()) { ... }
becomes
for (auto sstables = sstable_set.all(); auto& sst : *sstables) { ... }
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
In some cases, a user may want to repair the cluster ignoring a node
that is down. For example, running repair before a removenode operation
to remove a dead node.
Currently, repair will ignore the dead node and keep running
without it, but will report the repair as partial and failed. It is
hard to tell whether the repair failed only because the dead node was
not present, or due to some other error.
In order to exclude the dead node, one can use the hosts option. But it
is hard to understand and use, because one needs to list all the "good"
hosts including the node itself. It will be much simpler, if one can
just specify the node to exclude explicitly.
In addition, we support an ignore-nodes option in other node operations
like removenode. This change makes the interface for ignoring a node
explicitly more consistent.
Refs: #7806
Closes #8233
Evicts objects from caches which reflect sstable content, like the row
cache. In the future, it will also drop the page cache
and sstable index caches.
Unlike lsa/compact, doesn't cause reactor stalls.
The old lsa/compact call invokes memory reclamation, which is
non-preemptible. It also compacts LSA segments, so does more
work. Some use cases don't need to compact LSA segments, just want the
row cache to be wiped.
Message-Id: <20210301120211.36195-1-tgrabiec@scylladb.com>
Currently, the sstable_set in a table is copied before every change
to allow accessing the unchanged version by existing sstable readers.
This patch changes the sstable_set to a structure that keeps all its
versions that are referenced somewhere and provides a way of getting
a reference to an immutable version of the set.
Each sstable in the set is associated with the versions it is alive in,
and is removed when all such versions don't have references anymore.
To avoid copying, the object holding all sstables in the set version is
changed to a new structure, sstable_list, which was previously an alias
for std::unordered_set<shared_sstable>, and which implements most of the
methods of an unordered_set, but its iterator uses the actual set with
all sstables from all referenced versions and iterates over those
sstables that belong to the captured version.
The methods that modify the set's contents give a strong exception
guarantee by trying to insert new sstables into its containers first,
and erasing them in the case of a caught exception.
To release shared_sstables as soon as possible (i.e. when all references
to versions that contain them die), each time a version is removed, all
sstables that were referenced exclusively by this version are erased. We
are able to find these sstables efficiently by storing, for each version,
all sstables that were added and erased in it, and, when a version is
removed, merging it with the next one. When a version that adds an sstable
gets merged with a version that removes it, this sstable is erased.
Fixes #2622
Signed-off-by: Wojciech Mitros &lt;wojciech.mitros@scylladb.com&gt;
Closes #8111
* github.com:scylladb/scylla:
sstables: add test for checking the latency of updating the sstable_set in a table
sstables: move column_family_test class from test/boost to test/lib
sstables: use fast copying of the sstable_set instead of rebuilding it
sstables: replace the sstable_set with a versioned structure
sstables: remove potential ub
sstables: make sstable_set constructor less error-prone
This patch adds the compaction id to the get_compaction structure.
While it was supported, it was not used and up until now wasn't needed.
After this patch a call to curl -X GET 'http://localhost:10000/compaction_manager/compactions'
will include the compaction id.
Relates to #7927
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Closes #8186
Currently all management of CDC generations happens in storage_service,
which is a big ball of mud that does many unrelated things.
Previous commits have introduced a new service for managing CDC
generations. This code moves most of the relevant code to this new
service.
However, some part still remains in storage_service: the bootstrap
procedure, which happens inside storage_service, must also do some
initialization regarding CDC generations, for example: on restart it
must retrieve the latest known generation timestamp from disk; on
bootstrap it must create a new generation and announce it to other
nodes. The order of these operations w.r.t the rest of the startup
procedure is important, hence the startup procedure is the only right
place for them.
Still, what remains in storage_service is a small part of the entire
CDC generation management logic; most of it has been moved to the
new service. This includes listening for generation changes and
updating the data structures for performing CDC log writes (cdc::metadata).
Furthermore these functions now return futures (and are internally
coroutines), where previously they required a seastar::async context.
If the range expression in a range-based for loop returns a temporary,
its lifetime is extended until the end of the loop. The same can't be said
about temporaries created within the range expression. In our case,
*t->get_sstables_including_compacted_undeleted() returns a reference to a
const sstable_list, but t->get_sstables_including_compacted_undeleted()
itself is a temporary lw_shared_ptr, so its lifetime is not prolonged until
the end of the loop, and it may be the sole owner of the referenced
sstable_list, so the referenced sstable_list may already be deleted inside
the loop too.
Fix by creating a local copy of the lw_shared_ptr and getting the reference
from it in the loop.
Fixes #7605
Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
A few methods in the column_family API were doing the aggregation wrong,
specifically for bloom filter disk size.
The issue is not always visible; it happens when there are multiple
filter files per shard.
Fixes #4513
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Closes #8007