Commit Graph

2341 Commits

Author SHA1 Message Date
Avi Kivity
bc75e2c1d1 treewide: wrap runtime formats with fmt::runtime for fmt 8
fmt 8 checks format strings at compile time, and requires that
non-compile-time format strings be wrapped with fmt::runtime().

Do that, and to allow coexistence with fmt 7, supply our own
do-nothing version of fmt::runtime() if needed. Strictly speaking
we shouldn't be introducing names into the fmt namespace, but this
is transitional only.

Closes #9640
2021-11-17 15:21:36 +02:00
Calle Wilund
a8bb4dcd28 tls: Add certficate_revocation_list option for client/server encryption options
Fixes #9630

Adds support for importing a CRL certificate reovcation list. This will be
monitored and reloaded like certs/keys. Allows blacklisting individual certs.

Closes #9655
2021-11-17 14:24:22 +02:00
Avi Kivity
edcdbc16d3 db: heat weighted load balancing: remove unused variable total_deficit
The variable is write-only.

Closes #9647
2021-11-17 09:02:23 +02:00
Avi Kivity
e2c27ee743 Merge 'commitlog: recalculate disk footprint on delete_segment exceptions' from Calle Wilund
If we get errors/exceptions in delete_segments we can (and probably will) loose track of disk footprint counters. This can in turn, if using hard limits, cause us to block indefinitely on segment allocation since we might think we have larger footprint than we actually do.

Of course, if we actually fail deleting a segment, it is 100% true that we still technically hold this disk footprint (now unreachable), but for cases where for example outside forces (or wacky tests) delete a file behind our backs, this might not be true. One could also argue that our footprint is the segments and file names we keep track of, and the rest is exterior sludge.

In any case, if we have any exceptions in delete_segments, we should recalculate disk footprint based on current state, and restart all new_segment paths etc.

Fixes #9348

(Note: this is based on previous PR #9344 - so shows these commits as well. Actual changes are only the latter two).

Closes #9349

* github.com:scylladb/scylla:
  commitlog: Recalculate footprint on delete_segment exceptions
  commitlog_test: Add test for exception in alloc w. deleted underlying file
  commitlog: Ensure failed-to-create-segment is re-deleted
  commitlog::allocate_segment_ex: Don't re-throw out of function
2021-11-16 17:44:56 +02:00
Pavel Emelyanov
a62631d441 config: Enable developer-mode by default in dev/debug modes
Other than looking sane, this change continues the founded by the
--workdir option tradition of freeing the developer form annoying
necessity to type too many options when scylla is started by hand
for devel purposes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20211116104815.31822-1-xemul@scylladb.com>
2021-11-16 12:53:33 +02:00
Botond Dénes
64bb48855c flat_mutation_reader: revamp flat_mutation_reader_from_mutations()
Add schema parameter so that:
* Caller has better control over schema -- especially relevant for
  reverse reads where it is not possible to follow the convention of
  passing the query schema which is reversed compared to that of the
  mutations.
* Now that we don't depend on the mutations for the schema, we can lift
  the restriction on mutations not being empty: this leads to safer
  code. When the mutations parameter is empty, an empty reader is
  created.
Add "make_" prefix to follow convention of similar reader factory
functions.

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211115155614.363663-1-bdenes@scylladb.com>
2021-11-15 17:58:46 +02:00
Michael Livshin
a7511cf600 system keyspace: record partitions with too many rows
Add "rows" field to system.large_partitions.  Add partitions to the
table when they are too large or have too many rows.

Fixes #9506

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>

Closes #9577
2021-11-14 14:25:18 +02:00
Pavel Emelyanov
4a70e0aa57 system_keyspace: Table with config options
A config option value is reported as 'text' type and contains
a string as it would looks like in json config.

The table is UPDATE-able. Only the 'value' columnt can be set
and the value accepted must be string. It will be converted into
the option type automatically, however in current implementation
is't not 100% precise -- conversion is lexicographical cast which
only works for simple types. However, liveupdate-able values are
only of those types, so it works in supported cases.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-11 16:39:34 +03:00
Pavel Emelyanov
947e4c9a10 code: Push db::config down to virtual tables
The db::config reference is available on the database, which
can be get from the virtual_table itself. The problem is that
it's a const refernece, while system.config will be updateable
and will need non-const reference.

Adding non-const get_config() on the database looks wrong. The
database shouldn't be used as config provider, even the const
one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-11 16:39:34 +03:00
Pavel Emelyanov
1ea301ad07 storage_proxy: Propagate virtual table exceptions messages
The intention is to return some meaningful info to the CQL caller
if a virtual table update fails. Unfortunately the "generic" error
reporting in CQL is not extremely flexible, so the best option
seems to report regular write failre with custom message in it.

For now this only works for virtual table errors.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-11 16:39:34 +03:00
Pavel Emelyanov
5aefc48e28 table: Virtual writer hook (mutation applier)
Symmetrically to virtual reader one, add the virtual writer
callback on a table that will be in charge of applying the
provided mutation.

If a virtual table doesn't override this apply method the
dedicated exception is thrown. Next patch will catch it and
propagate back to caller, so it's a new exception type, not
existing/std one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-11 16:39:34 +03:00
Pavel Emelyanov
d513034ca4 utils: Ability to set_value(sstring) for an option
There soon will appear an updateable system.config table that
will push sstrings into names_value-s. Prepare for this change
by adding the respective .set_value() call. Since the update
only works for LiveUpdate-able options, and inability to do it
can be propagated back to the caller make this method return
true/false whether the update took place or not.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-11 15:15:05 +03:00
Calle Wilund
3929b7da1f commitlog: Add explicit track var for "wasted space" to avoid double counting
Refs #9331

In segment::close() we add space to managers "wasted" counter. In destructor,
if we can cleanly delete/recycle the file we remove it. However, if we never
went through close (shutdown - ok, exception in batch_cycle - not ok), we can
end up subtracting numbers that were never added in the first place.
Just keep track of the bytes added in a var.

Observed behaviour in above issue is timeouts in batch_cycle, where we
declare the segment closed early (because we cannot add anything more safely
- chunks could get partial/misplaced). Exception will propagate to caller(s),
but the segment will not go through actual close() call -> destructor should
not assume such.

Closes #9598
2021-11-09 09:15:44 +02:00
Avi Kivity
b0a2a9771f Merge "Sanitize hostnames resolving on start" from Pavel E
"
On start scylla resolves several hostnames into addresses. Different
places use different hostname selection logic, e.g. the API address
can be the listen one if the dedicated option not set. Failure to
resolve a hostname is reported with an exception that (sometimes)
contains the hostname, but it doesn't look very convenient -- better
to know the config option name. Also resolving of different hostnames
has different decoration around, e.g. prometheus carries a main-local
lambda just to nicely wrap the try/catch block.

This set unifies this zoo and makes main() shorter and less hairy:

1. All failures to resolve a hostname are reported with an
   exception containing the relevant config option

2. The || operator for named_value's is introduced to make
   the option selection look as short as

     resolve(cfg->some_address() || cfg->another_address())

3. All sanity checks are explicit and happen early in main

4. No dangling local variables carrying the cfg->...() value

5. Use resolved IP when logging a "... is listening on ..."
   message after a service start

tests: unit(dev)
"

* 'br-ip-resolve-on-start' of https://github.com/xemul/scylla:
  main: Move fb-utilities initialization up the main
  code: Use utils::resolve instead of inet_address::lookup
  main: Remove unused variable
  main: Sanitize resolving of listen address
  main: Sanitize resolving of broadcast address
  main: Sanitize resolving of broadcast RPC address
  main: Sanitize resolving of API address
  main: Sanitize resolving of prometheus address
  utils: Introduce || operator for named_values
  db.config: Verbose address resolver helper
  main: Remove api-port and prometheus-port variables
  alternator: Resolve address with the help of inet_address
  redis, thrift: Remove unused captures
2021-11-09 09:15:40 +02:00
Botond Dénes
5b3ac3147b db/schema_tables: merge_tables_and_views(): match old/new view with old/new base table
For altered tables, the above function creates schema objects
representing before/after (old/new) table states. In case of views,
there is a matching mechanism to set the base table field of the view to
the appropriate base table object. This works by iterating over the list
of altered tables and selecting the "new_schema" field of the first
instance matching the keyspace/name of the base-table. This ends up
pairing the after/old version of the base table to both the before and
after version of the view. This means the base attached to the view is
possibly incompatible with the view it is attached to.
This patch fixes this by passing the schema generation (before/after) to
the function responsible for this matching, so it can select the
appropriate version of the base class.
For example, given the following input to `merge_tables_and_views()`:

    tables_before = { t1_before }
    tables_after = { t1_after }
    views_before = { v1_before }
    views_after = { v1_after }

Before this patch, the `base_schema` field of `v1_before` would be
`t1_after`, while it obviously should be `t1_before`. This sounds scary
but has no practical implications currently as `v1_before` is only
computed and then discarded without being used.

Tests: unit(dev)

Fixes: #9586
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211108124806.151268-1-bdenes@scylladb.com>
2021-11-09 09:13:51 +02:00
Pavel Emelyanov
2f9c21644b code: Use utils::resolve instead of inet_address::lookup
There are some users of the latter call left. They all suffer
from the same problem -- the lack of verbosity on resolving
errors.

While at it also get rid of useless local variables that are
only there to carry the cfg->...() option over.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-08 17:33:27 +03:00
Pavel Emelyanov
71ce7c6e87 db.config: Verbose address resolver helper
The helper works on named_value() and throws and exception containing
the option name for convenient error reporting.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-08 17:33:27 +03:00
Avi Kivity
247f2b69d5 Merge "system tables: create the schema more efficiently" from Botond
"
System tables currently almost uniformly use a pattern like this to
create their schema:

    return schema_builder(make_shared_schema(...))
    // [...]
    .with_version(...)
    .build(...);

This pattern is very wasteful because it first creates a schema, then
dismantles it just to recreate it again. This series abolishes this
pattern without much churn by simply adding a constructor to schema
builder that takes identical parameters to `make_shared_schema()`,
then simply removing `make_shared_schema()` from these users, who now
build a schema builder object directly and build the schema only once.

Tests: unit(dev)
"

* 'schema-builder-make-shared-schema-ctor/v1' of https://github.com/denesb/scylla:
  treewide: system tables: don't use make_shared_schema() for creating schemas
  schema_builder: add a constructor providing make_shared_schema semantics
  schema_builder: without_column(): don't assume column_specification exists
  schema: add static variant of column_name_type()
2021-11-07 18:23:22 +02:00
Botond Dénes
d51aa66a8a db/system_keyspace: add versions table
Contains all version related information (`nodetool version` and more).
Example printout:

    (cqlsh) select * from system.versions;

     key   | build_id                                 | build_mode | version
    -------+------------------------------------------+------------+-------------------------------
     local | aaecce2f5068b0160efd04a09b0e28e100b9cd9e |        dev | 4.6.dev-0.20211021.0d744fd3fa
2021-11-05 15:42:42 +02:00
Botond Dénes
89cc016f07 db/system_keyspace: add runtime_info table
Loosly contains the equivalent of the `nodetool info` command, with some
notable differences:
* Protocol server related information is in `system.protocol_servers`;
* Information about memory, memtable and cache is reformatted to be
  tailored to scylla: C* specific terminology and metrics are dropped;
* Information that doesn't change and is already in `system.local` is
  not contained;
* Added trace-probability too (`nodetool gettraceprobability`);

TODO(follow-up): exceptions.
2021-11-05 15:42:42 +02:00
Botond Dénes
78adda197f db/system_keyspace: add protocol_servers table
Lists all the client protocol server and their status. Example output:

    (cqlsh) select * from system.protocol_servers;

      name             | is_running | listen_addresses                      | protocol | protocol_version
    ------------------+------------+---------------------------------------+----------+------------------
     native transport |       True | ['127.0.0.1:9042', '127.0.0.1:19042'] |      cql |            3.3.1
	   alternator |      False |                                    [] | dynamodb |
		  rpc |      False |                                    [] |   thrift |           20.1.0
		redis |      False |                                    [] |    redis |

This prints the equivalent of `nodetool statusbinary` and the "Thrift
active" and "Native Transport active" fields from the `nodetool info`
output with some additional information:
* It contains alternator and redis status;
* It contains the protocol version;
* It contains the listen addresses (if respective server is running);
2021-11-05 15:42:42 +02:00
Botond Dénes
64f658aea4 db/system_keyspace: add snapshots virtual table
Lists the equivalent of the `nodetool listsnapshots` command.
2021-11-05 15:42:41 +02:00
Botond Dénes
f0281eaa98 db/virtual_table: remove _db member
This member is potentially dangerous as it only becomes non-null
sometimes after the virtual table object is constructed. This is asking
for nullptr dereference.
Instead, remove this member and have virtual table implementations that
need a db, ask for it in the constructor, it is available in
`register_virtual_tables()` now.
2021-11-05 15:42:41 +02:00
Botond Dénes
200e2fad4d db/system_keyspace: propagate distributed<> database and storage_service to register_virtual_tables()
As some virtual tables will need the distributed versions of these.
2021-11-05 15:42:41 +02:00
Botond Dénes
ccf5c31776 treewide: system tables: don't use make_shared_schema() for creating schemas
`make_shared_schema()` is a convenience method for creating a schema in
a single function call, however it doesn't have all the advanced
capabilities as `schema_builder`. So most users (which all happen to be
system tables) pass the schema created by it to schema builder
immediately to do some further tweaking, effectively building the schema
twice. This is wasteful.
This patch changes all these users to use the newly added
`schema_builder()` constructor which has the same signature (and
therefore ease-of-use) as `make_shared_schema()`.
2021-11-05 11:41:04 +02:00
Raphael S. Carvalho
2bf47c902e cql: set configurable restriction of DateTieredCompactionStrategy to warn by default
Setting a value of "warn" will still allow the create or alter commands,
but will warn the user, with a message that will appear both at the log
and also at cqlsh for example.

This is another step towards deprecating DTCS. Users need to know we're
moving towards this direction, and setting the default value to warn
is needed for this. Next step is to set it to false, and finally remove
it from the code base.

Refs #8914.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211029184503.102936-1-raphaelsc@scylladb.com>
2021-10-31 09:28:17 +02:00
Nadav Har'El
666017f2f0 Merge 'Convert last uses of sprint() to fmt::format()' from Avi Kivity
sprint() uses the printf-style formatting language while most of our
code uses the Python-derived format language from fmt::format().

The last mass conversion of sprint() to fmt (in 1129134a4a)
missed some callers (principally those that were on multiple lines, and
so the automatic converter missed them). Convert the remainder to
fmt::format(), and some sprintf() and printf() calls, so we have just
one format language in the code base. Seastar::sprint() ought to be
deprecated and removed.

Test: unit (dev)

Closes #9529

* github.com:scylladb/scylla:
  utils: logalloc: convert debug printf to fmt::print()
  utils: convert fmt::fprintf() to fmt::print()
  main: convert fprint() to fmt::print()
  compress: convert fmt::sprintf() to fmt::format()
  tracing: replace seastar::sprint() with fmt::format()
  thrift: replace seastar::sprint() with fmt::format()
  test: replace seastar::sprint() with fmt::format()
  streaming: replace seastar::sprint() with fmt::format()
  storage_service: replace seastar::sprint() with fmt::format()
  repair: replace seastar::sprint() with fmt::format()
  redis: replace seastar::sprint() with fmt::format()
  locator: replace seastar::sprint() with fmt::format()
  db: replace seastar::sprint() with fmt::format()
  cql3: replace seastar::sprint() with fmt::format()
  cdc: replace seastar::sprint() with fmt::format()
  auth: replace seastar::sprint() with fmt::format()
2021-10-28 22:33:23 +03:00
Benny Halevy
a2fc3345bd storage_service: futurize storage_service::describe_ring
Convert storage_service::describe_ring to a coroutine
to prevent reactor stalls as seen in #9280.

Fixes #9280
Closes #9282

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #9282
2021-10-28 16:51:57 +03:00
Botond Dénes
7c95bd3343 Merge 'Rename 'system.status' and 'system.describe_ring' virtual tables' from Avi Kivity
'system.status' and 'system.describe_ring' are imperfect names for
what they do, so rename them. Fortunately they aren't exposed in any
released version so there is no compatibility concern.

Closes #9530

* github.com:scylladb/scylla:
  system_keyspace: rename 'system.describe_ring' to 'system.token_ring'
  system_keyspace: rename 'system.status' to 'system.cluster_status'
2021-10-28 11:46:20 +03:00
Avi Kivity
5ea0940ca9 system_keyspace: rename 'system.describe_ring' to 'system.token_ring'
Table names are usually nouns, so SELECT/INSERT statements
sound natural: "SELECT * FROM pets". 'system.describe_ring' defies
this convention. Rename it to 'system.token_ring' so selects are natural.

The name is not in any released version, so we can safely rename it.
2021-10-27 17:32:37 +03:00
Avi Kivity
5b21e4eb83 system_keyspace: rename 'system.status' to 'system.cluster_status'
'system.status' is too generic, it doesn't explain the status of what.
'system.node_status' is also ambiguous (this node? all nodes?) so I
picked 'system.cluster_status'.

The internal name, nodetool_status_table, was even worse (we're
not querying the status of nodetool!) but fortunately wasn't exposed.

The name is not in any released version, so we can safely rename it.
2021-10-27 17:31:45 +03:00
Avi Kivity
d9d03383fa db: replace seastar::sprint() with fmt::format()
sprint() is obsolete.
2021-10-27 17:02:00 +03:00
Benny Halevy
a21b1fbb2f large_data_handle: add sstable name to log messages
Although the sstable name is part of the system.large_* records,
it is not printed in the log.
In particular, this is essential for the "too many rows" warning
that currently does not record a row in any large_* table
so we can't correlate it with a sstable.

Fixes #9524

Test: unit(dev)
DTest: wide_rows_test.py

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211027074104.1753093-1-bhalevy@scylladb.com>
2021-10-27 10:53:11 +03:00
Benny Halevy
5f513ed28b view_builder: consumer: flush_fragments: close reader on error
Make sure to close the reader created by flush_fragments
if an exception occurs before it's moved to `populate_views`.

Note that it is also ok to close the reader _after_ it has been
moved, in case populate_views itself throws after closing the
reader that was moved it.  For conveience flat_mutation_reader::close
supports close-after-move.

Fixes #9479

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211024164138.1100304-1-bhalevy@scylladb.com>
2021-10-24 19:53:31 +03:00
Nadav Har'El
e4a6569258 config: experimental flag UNUSED_CDC shouldn't be distinct from UNUSED
When an experimental feature graduates from being experimental, we want
to continue allow the old "--experimental-features=..." option to work,
in case some user's configuration uses it - just do nothing. The way
we do it is to map in db::experimental_features_t::map() the feature's
name to the UNUSED value - this way the feature's name is accepted, but
doesn't change anything.

When the CDC feature graduated from being experimental, a new bit
UNUSED_CDC was introduced to do the same thing. This separate bit was
not actually necessary - if we ever check for UNUSED_CDC bit anywhere
in the code it means the flag isn't actually unused ;-) And we don't
check it.

So simplify the code by conflating UNUSED_CDC into UNUSED.
This will also make it easy to build from db::experimental_features_t::map()
a list of current experimental features - now it will simply be those that
do not map to UNUSED.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211013105107.123544-1-nyh@scylladb.com>
2021-10-20 17:54:17 +03:00
Nadav Har'El
ddba510e64 config: add name for the experimental Alternator TTL feature
Earlier we added experimental (and very incomplete) support for
Alternator's TTL feature, but forgot to set a *name* for this
experimental feature. As a result, this feature can be enabled only with
the blanket "--experimental" option and not with a specific
"--experimental-features=..." option.

Since issue #9467 deprecated the blanket "--experimental" option
and users are encouraged to only enable specific experimental
features, it is important that we have a name for it.

So the name chosen in this patch is "alternator-ttl".
Eventually this feature might evolve beyond Alternator-only,
but for now, I think it's a good name and we'll probably
graduate the experimental Alternator TTL feature before
supporting CQL, so it will be a new experimental feature
anyway.

Refs #9467.

db/config.cc

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211012110312.719654-1-nyh@scylladb.com>
2021-10-15 16:36:23 +03:00
Tomasz Grabiec
cc56a971e8 database, treewide: Introduce partition_slice::is_reversed()
Cleanup, reduces noise.

Message-Id: <20211014093001.81479-1-tgrabiec@scylladb.com>
2021-10-14 12:39:16 +03:00
Nadav Har'El
cad039421a config: automate help-string listing experimental features
The help string from the "--experimental-features" command-line option
lists the available experimental features, to helping a user who might
want to enable them. But this help string was manually written, and has
since drifted from reality:

* Two of the listed "experimental" features, cdc and lwt, have actually
  graduated from being experimental long ago. Although technically a user
  may still use the words "cdc" and "lwt" in the "experimental-features"
  parameter, doing so is pointless, and worse: This text in the help
  string can mislead a user into thinking that these two features are
  still experimental - while they are not!

* One experimental feature - alternator-ttl - is missing from this list.

Instead of updating the help string text now - and needing to do this
again and again in the future as we change experimental features - what
this patch does is to construct the list of features automatically from
the map of supported feature names - excluding any features which map
to UNUSED.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211013122635.132582-1-nyh@scylladb.com>
2021-10-14 10:39:58 +03:00
Benny Halevy
4d2561ff75 abstract_replication_strategy: precacluate get_replication_factor for effective_replication_map
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:10:06 +03:00
Benny Halevy
cddd16f22d db: view: use effective_replication_map to get_natural_endpoints
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:55:50 +03:00
Benny Halevy
96aa6161d8 db: hints manager: use effective_replication_map to get_natural_endpoints
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:54:52 +03:00
Benny Halevy
3393df45eb token_metadata, storage_service: unify token_metadata_lock and merge_lock.
Serialize the metadata changes with
keyspace create, update, or drop.

This will become necessary in the following patch
when we update the effective_replication_map
on all keyspaces and we want instances on all shards
end up with the same replication map.

Note that storage_service::keyspace_changed is called
from the scheme_merge path so it already holds
the merge_lock.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:01:25 +03:00
Pavel Solodovnikov
8b917f7c99 db: mark --experimental option deprecated
The documentation for --experimental config option states
that it enables all experimental features, but this is no
longer true, i.e.: raft feature is not enabled with it and
should be explicitly enabled via `--experimental-features=raft`
switch (we don't want to enable it by default alongside
other features).

Since the flag doesn't do what it's intended to, we should
mark it as "deprecated", because documenting each exception
(there could be more than only raft in the future) will be
a burden and docs will constantly go out-of-sync with the
code.

Adjust the description for the option to reflect that, mark
it "deprecated" and suggest using --experimental-features, instead.

Fixes: #9467

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20211012093005.20871-2-pa.solodovnikov@scylladb.com>
2021-10-12 13:22:12 +03:00
Pavel Solodovnikov
162f1899e8 db: update the list of supported experimental features
`raft` and `alternator-streams` features were missing
from the description for `experimental-features` config
flag.

Update `scylla.yaml` template comments to reflect that, too.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20211012093005.20871-1-pa.solodovnikov@scylladb.com>
2021-10-12 13:22:11 +03:00
Pavel Emelyanov
c504361c15 view_builder: Accept view_build_statuses
The code itself is already in relevant .cc file, not move it to the
relevant class.

The only significant change is where to get token metadata from.
In its old location tokens were provided by the storage service
itself, now when it's in the view builder there's no "native" place
to get them from, however the rest of the view building code gets
tokens from global storage proxy, so do the same here.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-10-11 11:11:40 +03:00
Pavel Emelyanov
3b6e8c7d93 storage_service: Move view_build_statuses code
This code belongs to view builder, so put it into its .cc. No changes,
just move. This needs some ugly namespace breakage, but they will
be patched away with the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-10-11 11:11:29 +03:00
Pavel Emelyanov
4b4ce015aa system-keyspace: Keep UUID value when saving
The set_local_host_id() accepts UUID references and starts
to save it in local keyspace and in all shards' local cache.
Before it was coroutinized the UUID was copied on captures
and survived, after it it remains references. The problem is
that callers pass local variables as arguments that go away
"really soon".

Fix it to accept UUID as value, it's short enough for safe
and painless copy.

fixes: #9425
tests: dtest.ReplaceAddress_rbo_enabled.replace_node_diff_ip(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20211004145421.32137-1-xemul@scylladb.com>
2021-10-04 18:21:44 +03:00
Tomasz Grabiec
e89b9799b8 Merge 'sstable mx reader: implement reverse single-partition reads' from Kamil Braun
Until now reversed queries were implemented inside
`querier::consume_page` (more precisely, inside the free function
`consume_page` used by `querier::consume_page`) by wrapping the
passed-in reader into `make_reversing_reader` and then consuming
fragments from the resulting reversed reader.

The first couple of commits change that by pushing the reversing down below
the `make_combined_reader` call in `table::query`. This allows
working on improving reversing for memtables independently from
reversing for sstables.

We then extend the `index_reader` with functions that allow
reading the promoted index in reverse.

We introduce `partition_reversing_data_source`, which wraps an sstable data
file and returns data buffers with contents of a single chosen partition
as if the rows were stored in reverse order.

We use the reversing source and the extended index reader in
`mx_sstable_mutation_reader` to implement efficient (at least in theory)
reversed single-partition reads.

The patchset disables cache for reversed reads. Fast-forwarding
is not supported in the mx reader for reversed queries at this point.

Details in commit messages. Read the commits in topological order
for best review experience.

Refs: #9134
(not saying "Fixes" because it's only for single-partition queries
without forwarding)

Closes #9281

* github.com:scylladb/scylla:
  table: add option to automatically bypass cache for reversed queries
  test: reverse sstable reader with random schema and random mutations
  sstables: mx: implement reversed single-partition reads
  sstables: mx: introduce partition_reversing_data_source
  sstables: index_reader: add support for iterating over clustering ranges in reverse
  clustering_key_filter: clustering_key_filter_ranges owning constructor
  flat_mutation_reader: mention reversed schema in make_reversing_reader docstring
  clustering_key_filter: document clustering_key_filter_ranges::get_ranges
2021-10-04 15:37:34 +02:00
Kamil Braun
703aed3277 table: add option to automatically bypass cache for reversed queries
Currently the new reversing sstable algorithms do not support fast
forwarding and the cache does not yet handle reversed results. This
forced us to disable the cache for reversed queries if we want to
guarantee bounded memory. We introduce an option that does this
automatically (without specifying `bypass cache` in the query) and turn
it on by default.

If the user decides that they prefer to keep the cache at the
cost of fetching entire partitions into memory (which may be viable
if their partitions are small) during reversed queries, the option can
be turned off. It is live-updateable.
2021-10-04 15:24:12 +02:00
Piotr Dulikowski
6093c2378b hints: assign _last_written_rp in ep manager's move constructor
The end_point_hints_manager's field _last_written_rp is initialized in
its regular constructor, but is not copied in the move constructor.
Because the move constructor is always involved when creating a new
endpoint manager, the _last_written_rp field is effectively always
initialized with the zero-argument constructor, and is set to the zero
value.

This can cause the following erroneous situation to occur:

- Node A accumulates hints towards B.
- Sync point is created at A.
  It will be used later to wait for currently accumulated hints.
- Node A is restarted.
  The endpoint manager A->B is created which has bogus value in the
  _last_written_rp (it is set to zero).
- Node A replays its hints but does not write any new ones.
- A hint flush occurs.
  If there are no hint segments on disk after flush, the endpoint
  manager sets its last sent position to the last written position,
  which is by design. However, the last written position has incorrect
  value, so the last sent position also becomes incorrect and too low.
- Try to wait for the sync point created earlier.
  The sync point waiting mechanism waits until last sent hint position
  reaches or goes past the position encoded in the sync point, but it
  will not happen because the last sent position is incorrect.

The above bug can be (sometimes) reproduced in
hintedhandoff_sync_point_api_test dtest.

Now, the _last_written_rp field is properly initialized in the move
constructor, which prevents the bug described above.

Fixes: #9320
Closes #9426
2021-10-04 13:21:34 +02:00