scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 20:27:03 +00:00

Author	SHA1	Message	Date
Avi Kivity	bc75e2c1d1	treewide: wrap runtime formats with fmt::runtime for fmt 8 fmt 8 checks format strings at compile time, and requires that non-compile-time format strings be wrapped with fmt::runtime(). Do that, and to allow coexistence with fmt 7, supply our own do-nothing version of fmt::runtime() if needed. Strictly speaking we shouldn't be introducing names into the fmt namespace, but this is transitional only. Closes #9640	2021-11-17 15:21:36 +02:00
Calle Wilund	a8bb4dcd28	tls: Add certficate_revocation_list option for client/server encryption options Fixes #9630 Adds support for importing a CRL (certificate revocation list). This will be monitored and reloaded like certs/keys. Allows blacklisting individual certs. Closes #9655	2021-11-17 14:24:22 +02:00
Avi Kivity	edcdbc16d3	db: heat weighted load balancing: remove unused variable total_deficit The variable is write-only. Closes #9647	2021-11-17 09:02:23 +02:00
Avi Kivity	e2c27ee743	Merge 'commitlog: recalculate disk footprint on delete_segment exceptions' from Calle Wilund If we get errors/exceptions in delete_segments we can (and probably will) lose track of disk footprint counters. This can in turn, if using hard limits, cause us to block indefinitely on segment allocation since we might think we have a larger footprint than we actually do. Of course, if we actually fail deleting a segment, it is 100% true that we still technically hold this disk footprint (now unreachable), but for cases where for example outside forces (or wacky tests) delete a file behind our backs, this might not be true. One could also argue that our footprint is the segments and file names we keep track of, and the rest is exterior sludge. In any case, if we have any exceptions in delete_segments, we should recalculate disk footprint based on current state, and restart all new_segment paths etc. Fixes #9348 (Note: this is based on previous PR #9344 - so shows these commits as well. Actual changes are only the latter two). Closes #9349 * github.com:scylladb/scylla: commitlog: Recalculate footprint on delete_segment exceptions commitlog_test: Add test for exception in alloc w. deleted underlying file commitlog: Ensure failed-to-create-segment is re-deleted commitlog::allocate_segment_ex: Don't re-throw out of function	2021-11-16 17:44:56 +02:00
Pavel Emelyanov	a62631d441	config: Enable developer-mode by default in dev/debug modes Other than looking sane, this change continues the tradition founded by the --workdir option of freeing the developer from the annoying necessity to type too many options when scylla is started by hand for devel purposes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20211116104815.31822-1-xemul@scylladb.com>	2021-11-16 12:53:33 +02:00
Botond Dénes	64bb48855c	flat_mutation_reader: revamp flat_mutation_reader_from_mutations() Add schema parameter so that: * Caller has better control over schema -- especially relevant for reverse reads where it is not possible to follow the convention of passing the query schema which is reversed compared to that of the mutations. * Now that we don't depend on the mutations for the schema, we can lift the restriction on mutations not being empty: this leads to safer code. When the mutations parameter is empty, an empty reader is created. Add "make_" prefix to follow convention of similar reader factory functions. Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20211115155614.363663-1-bdenes@scylladb.com>	2021-11-15 17:58:46 +02:00
Michael Livshin	a7511cf600	system keyspace: record partitions with too many rows Add "rows" field to system.large_partitions. Add partitions to the table when they are too large or have too many rows. Fixes #9506 Signed-off-by: Michael Livshin <michael.livshin@scylladb.com> Closes #9577	2021-11-14 14:25:18 +02:00
Pavel Emelyanov	4a70e0aa57	system_keyspace: Table with config options A config option value is reported as 'text' type and contains a string as it would look in the json config. The table is UPDATE-able. Only the 'value' column can be set and the value accepted must be a string. It will be converted into the option type automatically, however in the current implementation it's not 100% precise -- the conversion is a lexicographical cast which only works for simple types. However, liveupdate-able values are only of those types, so it works in supported cases. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-11 16:39:34 +03:00
Pavel Emelyanov	947e4c9a10	code: Push db::config down to virtual tables The db::config reference is available on the database, which can be obtained from the virtual_table itself. The problem is that it's a const reference, while system.config will be updateable and will need a non-const reference. Adding a non-const get_config() on the database looks wrong. The database shouldn't be used as a config provider, even a const one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-11 16:39:34 +03:00
Pavel Emelyanov	1ea301ad07	storage_proxy: Propagate virtual table exceptions messages The intention is to return some meaningful info to the CQL caller if a virtual table update fails. Unfortunately the "generic" error reporting in CQL is not extremely flexible, so the best option seems to be to report a regular write failure with a custom message in it. For now this only works for virtual table errors. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-11 16:39:34 +03:00
Pavel Emelyanov	5aefc48e28	table: Virtual writer hook (mutation applier) Symmetrically to virtual reader one, add the virtual writer callback on a table that will be in charge of applying the provided mutation. If a virtual table doesn't override this apply method the dedicated exception is thrown. Next patch will catch it and propagate back to caller, so it's a new exception type, not existing/std one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-11 16:39:34 +03:00
Pavel Emelyanov	d513034ca4	utils: Ability to set_value(sstring) for an option There soon will appear an updateable system.config table that will push sstrings into names_value-s. Prepare for this change by adding the respective .set_value() call. Since the update only works for LiveUpdate-able options, and inability to do it can be propagated back to the caller make this method return true/false whether the update took place or not. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-11 15:15:05 +03:00
Calle Wilund	3929b7da1f	commitlog: Add explicit track var for "wasted space" to avoid double counting Refs #9331 In segment::close() we add space to managers "wasted" counter. In destructor, if we can cleanly delete/recycle the file we remove it. However, if we never went through close (shutdown - ok, exception in batch_cycle - not ok), we can end up subtracting numbers that were never added in the first place. Just keep track of the bytes added in a var. Observed behaviour in above issue is timeouts in batch_cycle, where we declare the segment closed early (because we cannot add anything more safely - chunks could get partial/misplaced). Exception will propagate to caller(s), but the segment will not go through actual close() call -> destructor should not assume such. Closes #9598	2021-11-09 09:15:44 +02:00
Avi Kivity	b0a2a9771f	Merge "Sanitize hostnames resolving on start" from Pavel E " On start scylla resolves several hostnames into addresses. Different places use different hostname selection logic, e.g. the API address can be the listen one if the dedicated option not set. Failure to resolve a hostname is reported with an exception that (sometimes) contains the hostname, but it doesn't look very convenient -- better to know the config option name. Also resolving of different hostnames has different decoration around, e.g. prometheus carries a main-local lambda just to nicely wrap the try/catch block. This set unifies this zoo and makes main() shorter and less hairy: 1. All failures to resolve a hostname are reported with an exception containing the relevant config option 2. The \|\| operator for named_value's is introduced to make the option selection look as short as resolve(cfg->some_address() \|\| cfg->another_address()) 3. All sanity checks are explicit and happen early in main 4. No dangling local variables carrying the cfg->...() value 5. Use resolved IP when logging a "... is listening on ..." message after a service start tests: unit(dev) " * 'br-ip-resolve-on-start' of https://github.com/xemul/scylla: main: Move fb-utilities initialization up the main code: Use utils::resolve instead of inet_address::lookup main: Remove unused variable main: Sanitize resolving of listen address main: Sanitize resolving of broadcast address main: Sanitize resolving of broadcast RPC address main: Sanitize resolving of API address main: Sanitize resolving of prometheus address utils: Introduce \|\| operator for named_values db.config: Verbose address resolver helper main: Remove api-port and prometheus-port variables alternator: Resolve address with the help of inet_address redis, thrift: Remove unused captures	2021-11-09 09:15:40 +02:00
Botond Dénes	5b3ac3147b	db/schema_tables: merge_tables_and_views(): match old/new view with old/new base table For altered tables, the above function creates schema objects representing before/after (old/new) table states. In case of views, there is a matching mechanism to set the base table field of the view to the appropriate base table object. This works by iterating over the list of altered tables and selecting the "new_schema" field of the first instance matching the keyspace/name of the base-table. This ends up pairing the after/old version of the base table to both the before and after version of the view. This means the base attached to the view is possibly incompatible with the view it is attached to. This patch fixes this by passing the schema generation (before/after) to the function responsible for this matching, so it can select the appropriate version of the base class. For example, given the following input to `merge_tables_and_views()`: tables_before = { t1_before } tables_after = { t1_after } views_before = { v1_before } views_after = { v1_after } Before this patch, the `base_schema` field of `v1_before` would be `t1_after`, while it obviously should be `t1_before`. This sounds scary but has no practical implications currently as `v1_before` is only computed and then discarded without being used. Tests: unit(dev) Fixes: #9586 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20211108124806.151268-1-bdenes@scylladb.com>	2021-11-09 09:13:51 +02:00
Pavel Emelyanov	2f9c21644b	code: Use utils::resolve instead of inet_address::lookup There are some users of the latter call left. They all suffer from the same problem -- the lack of verbosity on resolving errors. While at it also get rid of useless local variables that are only there to carry the cfg->...() option over. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-08 17:33:27 +03:00
Pavel Emelyanov	71ce7c6e87	db.config: Verbose address resolver helper The helper works on named_value() and throws an exception containing the option name for convenient error reporting. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-08 17:33:27 +03:00
Avi Kivity	247f2b69d5	Merge "system tables: create the schema more efficiently" from Botond " System tables currently almost uniformly use a pattern like this to create their schema: return schema_builder(make_shared_schema(...)) // [...] .with_version(...) .build(...); This pattern is very wasteful because it first creates a schema, then dismantles it just to recreate it again. This series abolishes this pattern without much churn by simply adding a constructor to schema builder that takes identical parameters to `make_shared_schema()`, then simply removing `make_shared_schema()` from these users, who now build a schema builder object directly and build the schema only once. Tests: unit(dev) " * 'schema-builder-make-shared-schema-ctor/v1' of https://github.com/denesb/scylla: treewide: system tables: don't use make_shared_schema() for creating schemas schema_builder: add a constructor providing make_shared_schema semantics schema_builder: without_column(): don't assume column_specification exists schema: add static variant of column_name_type()	2021-11-07 18:23:22 +02:00
Botond Dénes	d51aa66a8a	db/system_keyspace: add versions table Contains all version related information (`nodetool version` and more). Example printout: (cqlsh) select * from system.versions; key \| build_id \| build_mode \| version -------+------------------------------------------+------------+------------------------------- local \| aaecce2f5068b0160efd04a09b0e28e100b9cd9e \| dev \| 4.6.dev-0.20211021.0d744fd3fa	2021-11-05 15:42:42 +02:00
Botond Dénes	89cc016f07	db/system_keyspace: add runtime_info table Loosely contains the equivalent of the `nodetool info` command, with some notable differences: * Protocol server related information is in `system.protocol_servers`; * Information about memory, memtable and cache is reformatted to be tailored to scylla: C* specific terminology and metrics are dropped; * Information that doesn't change and is already in `system.local` is not contained; * Added trace-probability too (`nodetool gettraceprobability`); TODO(follow-up): exceptions.	2021-11-05 15:42:42 +02:00
Botond Dénes	78adda197f	db/system_keyspace: add protocol_servers table Lists all the client protocol server and their status. Example output: (cqlsh) select * from system.protocol_servers; name \| is_running \| listen_addresses \| protocol \| protocol_version ------------------+------------+---------------------------------------+----------+------------------ native transport \| True \| ['127.0.0.1:9042', '127.0.0.1:19042'] \| cql \| 3.3.1 alternator \| False \| [] \| dynamodb \| rpc \| False \| [] \| thrift \| 20.1.0 redis \| False \| [] \| redis \| This prints the equivalent of `nodetool statusbinary` and the "Thrift active" and "Native Transport active" fields from the `nodetool info` output with some additional information: * It contains alternator and redis status; * It contains the protocol version; * It contains the listen addresses (if respective server is running);	2021-11-05 15:42:42 +02:00
Botond Dénes	64f658aea4	db/system_keyspace: add snapshots virtual table Lists the equivalent of the `nodetool listsnapshots` command.	2021-11-05 15:42:41 +02:00
Botond Dénes	f0281eaa98	db/virtual_table: remove _db member This member is potentially dangerous as it only becomes non-null sometimes after the virtual table object is constructed. This is asking for nullptr dereference. Instead, remove this member and have virtual table implementations that need a db, ask for it in the constructor, it is available in `register_virtual_tables()` now.	2021-11-05 15:42:41 +02:00
Botond Dénes	200e2fad4d	db/system_keyspace: propagate distributed<> database and storage_service to register_virtual_tables() As some virtual tables will need the distributed versions of these.	2021-11-05 15:42:41 +02:00
Botond Dénes	ccf5c31776	treewide: system tables: don't use make_shared_schema() for creating schemas `make_shared_schema()` is a convenience method for creating a schema in a single function call, however it doesn't have all the advanced capabilities as `schema_builder`. So most users (which all happen to be system tables) pass the schema created by it to schema builder immediately to do some further tweaking, effectively building the schema twice. This is wasteful. This patch changes all these users to use the newly added `schema_builder()` constructor which has the same signature (and therefore ease-of-use) as `make_shared_schema()`.	2021-11-05 11:41:04 +02:00
Raphael S. Carvalho	2bf47c902e	cql: set configurable restriction of DateTieredCompactionStrategy to warn by default Setting a value of "warn" will still allow the create or alter commands, but will warn the user, with a message that will appear both at the log and also at cqlsh for example. This is another step towards deprecating DTCS. Users need to know we're moving towards this direction, and setting the default value to warn is needed for this. Next step is to set it to false, and finally remove it from the code base. Refs #8914. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211029184503.102936-1-raphaelsc@scylladb.com>	2021-10-31 09:28:17 +02:00
Nadav Har'El	666017f2f0	Merge 'Convert last uses of sprint() to fmt::format()' from Avi Kivity sprint() uses the printf-style formatting language while most of our code uses the Python-derived format language from fmt::format(). The last mass conversion of sprint() to fmt (in `1129134a4a`) missed some callers (principally those that were on multiple lines, and so the automatic converter missed them). Convert the remainder to fmt::format(), and some sprintf() and printf() calls, so we have just one format language in the code base. Seastar::sprint() ought to be deprecated and removed. Test: unit (dev) Closes #9529 * github.com:scylladb/scylla: utils: logalloc: convert debug printf to fmt::print() utils: convert fmt::fprintf() to fmt::print() main: convert fprint() to fmt::print() compress: convert fmt::sprintf() to fmt::format() tracing: replace seastar::sprint() with fmt::format() thrift: replace seastar::sprint() with fmt::format() test: replace seastar::sprint() with fmt::format() streaming: replace seastar::sprint() with fmt::format() storage_service: replace seastar::sprint() with fmt::format() repair: replace seastar::sprint() with fmt::format() redis: replace seastar::sprint() with fmt::format() locator: replace seastar::sprint() with fmt::format() db: replace seastar::sprint() with fmt::format() cql3: replace seastar::sprint() with fmt::format() cdc: replace seastar::sprint() with fmt::format() auth: replace seastar::sprint() with fmt::format()	2021-10-28 22:33:23 +03:00
Benny Halevy	a2fc3345bd	storage_service: futurize storage_service::describe_ring Convert storage_service::describe_ring to a coroutine to prevent reactor stalls as seen in #9280. Fixes #9280 Closes #9282 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #9282	2021-10-28 16:51:57 +03:00
Botond Dénes	7c95bd3343	Merge 'Rename 'system.status' and 'system.describe_ring' virtual tables' from Avi Kivity 'system.status' and 'system.describe_ring' are imperfect names for what they do, so rename them. Fortunately they aren't exposed in any released version so there is no compatibility concern. Closes #9530 * github.com:scylladb/scylla: system_keyspace: rename 'system.describe_ring' to 'system.token_ring' system_keyspace: rename 'system.status' to 'system.cluster_status'	2021-10-28 11:46:20 +03:00
Avi Kivity	5ea0940ca9	system_keyspace: rename 'system.describe_ring' to 'system.token_ring' Table names are usually nouns, so SELECT/INSERT statements sound natural: "SELECT * FROM pets". 'system.describe_ring' defies this convention. Rename it to 'system.token_ring' so selects are natural. The name is not in any released version, so we can safely rename it.	2021-10-27 17:32:37 +03:00
Avi Kivity	5b21e4eb83	system_keyspace: rename 'system.status' to 'system.cluster_status' 'system.status' is too generic, it doesn't explain the status of what. 'system.node_status' is also ambiguous (this node? all nodes?) so I picked 'system.cluster_status'. The internal name, nodetool_status_table, was even worse (we're not querying the status of nodetool!) but fortunately wasn't exposed. The name is not in any released version, so we can safely rename it.	2021-10-27 17:31:45 +03:00
Avi Kivity	d9d03383fa	db: replace seastar::sprint() with fmt::format() sprint() is obsolete.	2021-10-27 17:02:00 +03:00
Benny Halevy	a21b1fbb2f	large_data_handle: add sstable name to log messages Although the sstable name is part of the system.large_* records, it is not printed in the log. In particular, this is essential for the "too many rows" warning that currently does not record a row in any large_* table so we can't correlate it with a sstable. Fixes #9524 Test: unit(dev) DTest: wide_rows_test.py Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211027074104.1753093-1-bhalevy@scylladb.com>	2021-10-27 10:53:11 +03:00
Benny Halevy	5f513ed28b	view_builder: consumer: flush_fragments: close reader on error Make sure to close the reader created by flush_fragments if an exception occurs before it's moved to `populate_views`. Note that it is also ok to close the reader _after_ it has been moved, in case populate_views itself throws after closing the reader that was moved into it. For convenience, flat_mutation_reader::close supports close-after-move. Fixes #9479 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211024164138.1100304-1-bhalevy@scylladb.com>	2021-10-24 19:53:31 +03:00
Nadav Har'El	e4a6569258	config: experimental flag UNUSED_CDC shouldn't be distinct from UNUSED When an experimental feature graduates from being experimental, we want to continue to allow the old "--experimental-features=..." option to work, in case some user's configuration uses it - just do nothing. The way we do it is to map in db::experimental_features_t::map() the feature's name to the UNUSED value - this way the feature's name is accepted, but doesn't change anything. When the CDC feature graduated from being experimental, a new bit UNUSED_CDC was introduced to do the same thing. This separate bit was not actually necessary - if we ever check for UNUSED_CDC bit anywhere in the code it means the flag isn't actually unused ;-) And we don't check it. So simplify the code by conflating UNUSED_CDC into UNUSED. This will also make it easy to build from db::experimental_features_t::map() a list of current experimental features - now it will simply be those that do not map to UNUSED. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211013105107.123544-1-nyh@scylladb.com>	2021-10-20 17:54:17 +03:00
Nadav Har'El	ddba510e64	config: add name for the experimental Alternator TTL feature Earlier we added experimental (and very incomplete) support for Alternator's TTL feature, but forgot to set a name for this experimental feature. As a result, this feature can be enabled only with the blanket "--experimental" option and not with a specific "--experimental-features=..." option. Since issue #9467 deprecated the blanket "--experimental" option and users are encouraged to only enable specific experimental features, it is important that we have a name for it. So the name chosen in this patch is "alternator-ttl". Eventually this feature might evolve beyond Alternator-only, but for now, I think it's a good name and we'll probably graduate the experimental Alternator TTL feature before supporting CQL, so it will be a new experimental feature anyway. Refs #9467. db/config.cc Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211012110312.719654-1-nyh@scylladb.com>	2021-10-15 16:36:23 +03:00
Tomasz Grabiec	cc56a971e8	database, treewide: Introduce partition_slice::is_reversed() Cleanup, reduces noise. Message-Id: <20211014093001.81479-1-tgrabiec@scylladb.com>	2021-10-14 12:39:16 +03:00
Nadav Har'El	cad039421a	config: automate help-string listing experimental features The help string from the "--experimental-features" command-line option lists the available experimental features, to help a user who might want to enable them. But this help string was manually written, and has since drifted from reality: * Two of the listed "experimental" features, cdc and lwt, have actually graduated from being experimental long ago. Although technically a user may still use the words "cdc" and "lwt" in the "experimental-features" parameter, doing so is pointless, and worse: This text in the help string can mislead a user into thinking that these two features are still experimental - while they are not! * One experimental feature - alternator-ttl - is missing from this list. Instead of updating the help string text now - and needing to do this again and again in the future as we change experimental features - what this patch does is to construct the list of features automatically from the map of supported feature names - excluding any features which map to UNUSED. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211013122635.132582-1-nyh@scylladb.com>	2021-10-14 10:39:58 +03:00
Benny Halevy	4d2561ff75	abstract_replication_strategy: precalculate get_replication_factor for effective_replication_map Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 16:10:06 +03:00
Benny Halevy	cddd16f22d	db: view: use effective_replication_map to get_natural_endpoints Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 13:55:50 +03:00
Benny Halevy	96aa6161d8	db: hints manager: use effective_replication_map to get_natural_endpoints Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 13:54:52 +03:00
Benny Halevy	3393df45eb	token_metadata, storage_service: unify token_metadata_lock and merge_lock. Serialize the metadata changes with keyspace create, update, or drop. This will become necessary in the following patch when we update the effective_replication_map on all keyspaces and we want instances on all shards end up with the same replication map. Note that storage_service::keyspace_changed is called from the scheme_merge path so it already holds the merge_lock. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 13:01:25 +03:00
Pavel Solodovnikov	8b917f7c99	db: mark `--experimental` option deprecated The documentation for --experimental config option states that it enables all experimental features, but this is no longer true, i.e.: raft feature is not enabled with it and should be explicitly enabled via `--experimental-features=raft` switch (we don't want to enable it by default alongside other features). Since the flag doesn't do what it's intended to, we should mark it as "deprecated", because documenting each exception (there could be more than only raft in the future) will be a burden and docs will constantly go out-of-sync with the code. Adjust the description for the option to reflect that, mark it "deprecated" and suggest using --experimental-features, instead. Fixes: #9467 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20211012093005.20871-2-pa.solodovnikov@scylladb.com>	2021-10-12 13:22:12 +03:00
Pavel Solodovnikov	162f1899e8	db: update the list of supported experimental features `raft` and `alternator-streams` features were missing from the description for `experimental-features` config flag. Update `scylla.yaml` template comments to reflect that, too. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20211012093005.20871-1-pa.solodovnikov@scylladb.com>	2021-10-12 13:22:11 +03:00
Pavel Emelyanov	c504361c15	view_builder: Accept view_build_statuses The code itself is already in the relevant .cc file, now move it to the relevant class. The only significant change is where to get token metadata from. In its old location tokens were provided by the storage service itself, now when it's in the view builder there's no "native" place to get them from, however the rest of the view building code gets tokens from global storage proxy, so do the same here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-10-11 11:11:40 +03:00
Pavel Emelyanov	3b6e8c7d93	storage_service: Move view_build_statuses code This code belongs to view builder, so put it into its .cc. No changes, just move. This needs some ugly namespace breakage, but they will be patched away with the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-10-11 11:11:29 +03:00
Pavel Emelyanov	4b4ce015aa	system-keyspace: Keep UUID value when saving The set_local_host_id() accepts a UUID reference and starts to save it in the local keyspace and in all shards' local cache. Before it was coroutinized the UUID was copied into captures and survived; after that it remains a reference. The problem is that callers pass local variables as arguments that go away "really soon". Fix it to accept the UUID by value, it's short enough for a safe and painless copy. fixes: #9425 tests: dtest.ReplaceAddress_rbo_enabled.replace_node_diff_ip(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20211004145421.32137-1-xemul@scylladb.com>	2021-10-04 18:21:44 +03:00
Tomasz Grabiec	e89b9799b8	Merge 'sstable mx reader: implement reverse single-partition reads' from Kamil Braun Until now reversed queries were implemented inside `querier::consume_page` (more precisely, inside the free function `consume_page` used by `querier::consume_page`) by wrapping the passed-in reader into `make_reversing_reader` and then consuming fragments from the resulting reversed reader. The first couple of commits change that by pushing the reversing down below the `make_combined_reader` call in `table::query`. This allows working on improving reversing for memtables independently from reversing for sstables. We then extend the `index_reader` with functions that allow reading the promoted index in reverse. We introduce `partition_reversing_data_source`, which wraps an sstable data file and returns data buffers with contents of a single chosen partition as if the rows were stored in reverse order. We use the reversing source and the extended index reader in `mx_sstable_mutation_reader` to implement efficient (at least in theory) reversed single-partition reads. The patchset disables cache for reversed reads. Fast-forwarding is not supported in the mx reader for reversed queries at this point. Details in commit messages. Read the commits in topological order for best review experience. 
Refs: #9134 (not saying "Fixes" because it's only for single-partition queries without forwarding) Closes #9281 * github.com:scylladb/scylla: table: add option to automatically bypass cache for reversed queries test: reverse sstable reader with random schema and random mutations sstables: mx: implement reversed single-partition reads sstables: mx: introduce partition_reversing_data_source sstables: index_reader: add support for iterating over clustering ranges in reverse clustering_key_filter: clustering_key_filter_ranges owning constructor flat_mutation_reader: mention reversed schema in make_reversing_reader docstring clustering_key_filter: document clustering_key_filter_ranges::get_ranges	2021-10-04 15:37:34 +02:00
Kamil Braun	703aed3277	table: add option to automatically bypass cache for reversed queries Currently the new reversing sstable algorithms do not support fast forwarding and the cache does not yet handle reversed results. This forced us to disable the cache for reversed queries if we want to guarantee bounded memory. We introduce an option that does this automatically (without specifying `bypass cache` in the query) and turn it on by default. If the user decides that they prefer to keep the cache at the cost of fetching entire partitions into memory (which may be viable if their partitions are small) during reversed queries, the option can be turned off. It is live-updateable.	2021-10-04 15:24:12 +02:00
Piotr Dulikowski	6093c2378b	hints: assign _last_written_rp in ep manager's move constructor The end_point_hints_manager's field _last_written_rp is initialized in its regular constructor, but is not copied in the move constructor. Because the move constructor is always involved when creating a new endpoint manager, the _last_written_rp field is effectively always initialized with the zero-argument constructor, and is set to the zero value. This can cause the following erroneous situation to occur: - Node A accumulates hints towards B. - Sync point is created at A. It will be used later to wait for currently accumulated hints. - Node A is restarted. The endpoint manager A->B is created which has bogus value in the _last_written_rp (it is set to zero). - Node A replays its hints but does not write any new ones. - A hint flush occurs. If there are no hint segments on disk after flush, the endpoint manager sets its last sent position to the last written position, which is by design. However, the last written position has incorrect value, so the last sent position also becomes incorrect and too low. - Try to wait for the sync point created earlier. The sync point waiting mechanism waits until last sent hint position reaches or goes past the position encoded in the sync point, but it will not happen because the last sent position is incorrect. The above bug can be (sometimes) reproduced in hintedhandoff_sync_point_api_test dtest. Now, the _last_written_rp field is properly initialized in the move constructor, which prevents the bug described above. Fixes: #9320 Closes #9426	2021-10-04 13:21:34 +02:00
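
The failure mode described in commit 6093c2378b (a field set in the regular constructor but left out of the move constructor) can be illustrated with a small standalone sketch. This is not Scylla's code: `replay_position`, `buggy_manager`, `fixed_manager` and `position_after_move` are hypothetical names invented for the illustration, and only the C++ standard library is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for a replay position; not Scylla's real type.
struct replay_position {
    std::uint64_t id = 0;
    replay_position() = default;
    explicit replay_position(std::uint64_t i) : id(i) {}
};

// Buggy variant: the move constructor does not mention last_written_rp,
// so the member is default-constructed and ends up with id == 0.
struct buggy_manager {
    std::string endpoint;
    replay_position last_written_rp;

    buggy_manager(std::string ep, replay_position rp)
        : endpoint(std::move(ep)), last_written_rp(rp) {}

    buggy_manager(buggy_manager&& other) noexcept
        : endpoint(std::move(other.endpoint)) {}  // last_written_rp omitted
};

// Fixed variant: the move constructor carries the position over.
struct fixed_manager {
    std::string endpoint;
    replay_position last_written_rp;

    fixed_manager(std::string ep, replay_position rp)
        : endpoint(std::move(ep)), last_written_rp(rp) {}

    fixed_manager(fixed_manager&& other) noexcept
        : endpoint(std::move(other.endpoint)),
          last_written_rp(other.last_written_rp) {}
};

// Construct a manager with the given position, move it, and return the
// position id observed on the moved-to object.
template <typename Manager>
std::uint64_t position_after_move(std::uint64_t initial) {
    Manager original("node-b", replay_position(initial));
    Manager moved(std::move(original));
    return moved.last_written_rp.id;
}
```

Calling `position_after_move<buggy_manager>(42)` yields the zero value, while `position_after_move<fixed_manager>(42)` keeps the original position. That is the same shape as the "last written position is always zero" symptom the commit message describes.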


2341 Commits