Commit Graph

2303 Commits

Author SHA1 Message Date
Nadav Har'El
ddba510e64 config: add name for the experimental Alternator TTL feature
Earlier we added experimental (and very incomplete) support for
Alternator's TTL feature, but forgot to set a *name* for this
experimental feature. As a result, this feature can be enabled only with
the blanket "--experimental" option and not with a specific
"--experimental-features=..." option.

Since issue #9467 deprecated the blanket "--experimental" option
and users are encouraged to only enable specific experimental
features, it is important that we have a name for it.

So the name chosen in this patch is "alternator-ttl".
Eventually this feature might evolve beyond Alternator-only,
but for now, I think it's a good name and we'll probably
graduate the experimental Alternator TTL feature before
supporting CQL, so it will be a new experimental feature
anyway.

Refs #9467.

db/config.cc

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211012110312.719654-1-nyh@scylladb.com>
2021-10-15 16:36:23 +03:00
Tomasz Grabiec
cc56a971e8 database, treewide: Introduce partition_slice::is_reversed()
Cleanup, reduces noise.

Message-Id: <20211014093001.81479-1-tgrabiec@scylladb.com>
2021-10-14 12:39:16 +03:00
Nadav Har'El
cad039421a config: automate help-string listing experimental features
The help string from the "--experimental-features" command-line option
lists the available experimental features, to helping a user who might
want to enable them. But this help string was manually written, and has
since drifted from reality:

* Two of the listed "experimental" features, cdc and lwt, have actually
  graduated from being experimental long ago. Although technically a user
  may still use the words "cdc" and "lwt" in the "experimental-features"
  parameter, doing so is pointless, and worse: This text in the help
  string can mislead a user into thinking that these two features are
  still experimental - while they are not!

* One experimental feature - alternator-ttl - is missing from this list.

Instead of updating the help string text now - and needing to do this
again and again in the future as we change experimental features - what
this patch does is to construct the list of features automatically from
the map of supported feature names - excluding any features which map
to UNUSED.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211013122635.132582-1-nyh@scylladb.com>
2021-10-14 10:39:58 +03:00
Benny Halevy
4d2561ff75 abstract_replication_strategy: precacluate get_replication_factor for effective_replication_map
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:10:06 +03:00
Benny Halevy
cddd16f22d db: view: use effective_replication_map to get_natural_endpoints
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:55:50 +03:00
Benny Halevy
96aa6161d8 db: hints manager: use effective_replication_map to get_natural_endpoints
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:54:52 +03:00
Benny Halevy
3393df45eb token_metadata, storage_service: unify token_metadata_lock and merge_lock.
Serialize the metadata changes with
keyspace create, update, or drop.

This will become necessary in the following patch
when we update the effective_replication_map
on all keyspaces and we want instances on all shards
end up with the same replication map.

Note that storage_service::keyspace_changed is called
from the scheme_merge path so it already holds
the merge_lock.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:01:25 +03:00
Pavel Solodovnikov
8b917f7c99 db: mark --experimental option deprecated
The documentation for --experimental config option states
that it enables all experimental features, but this is no
longer true, i.e.: raft feature is not enabled with it and
should be explicitly enabled via `--experimental-features=raft`
switch (we don't want to enable it by default alongside
other features).

Since the flag doesn't do what it's intended to, we should
mark it as "deprecated", because documenting each exception
(there could be more than only raft in the future) will be
a burden and docs will constantly go out-of-sync with the
code.

Adjust the description for the option to reflect that, mark
it "deprecated" and suggest using --experimental-features, instead.

Fixes: #9467

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20211012093005.20871-2-pa.solodovnikov@scylladb.com>
2021-10-12 13:22:12 +03:00
Pavel Solodovnikov
162f1899e8 db: update the list of supported experimental features
`raft` and `alternator-streams` features were missing
from the description for `experimental-features` config
flag.

Update `scylla.yaml` template comments to reflect that, too.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20211012093005.20871-1-pa.solodovnikov@scylladb.com>
2021-10-12 13:22:11 +03:00
Pavel Emelyanov
c504361c15 view_builder: Accept view_build_statuses
The code itself is already in relevant .cc file, not move it to the
relevant class.

The only significant change is where to get token metadata from.
In its old location tokens were provided by the storage service
itself, now when it's in the view builder there's no "native" place
to get them from, however the rest of the view building code gets
tokens from global storage proxy, so do the same here.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-10-11 11:11:40 +03:00
Pavel Emelyanov
3b6e8c7d93 storage_service: Move view_build_statuses code
This code belongs to view builder, so put it into its .cc. No changes,
just move. This needs some ugly namespace breakage, but they will
be patched away with the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-10-11 11:11:29 +03:00
Pavel Emelyanov
4b4ce015aa system-keyspace: Keep UUID value when saving
The set_local_host_id() accepts UUID references and starts
to save it in local keyspace and in all shards' local cache.
Before it was coroutinized the UUID was copied on captures
and survived, after it it remains references. The problem is
that callers pass local variables as arguments that go away
"really soon".

Fix it to accept UUID as value, it's short enough for safe
and painless copy.

fixes: #9425
tests: dtest.ReplaceAddress_rbo_enabled.replace_node_diff_ip(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20211004145421.32137-1-xemul@scylladb.com>
2021-10-04 18:21:44 +03:00
Tomasz Grabiec
e89b9799b8 Merge 'sstable mx reader: implement reverse single-partition reads' from Kamil Braun
Until now reversed queries were implemented inside
`querier::consume_page` (more precisely, inside the free function
`consume_page` used by `querier::consume_page`) by wrapping the
passed-in reader into `make_reversing_reader` and then consuming
fragments from the resulting reversed reader.

The first couple of commits change that by pushing the reversing down below
the `make_combined_reader` call in `table::query`. This allows
working on improving reversing for memtables independently from
reversing for sstables.

We then extend the `index_reader` with functions that allow
reading the promoted index in reverse.

We introduce `partition_reversing_data_source`, which wraps an sstable data
file and returns data buffers with contents of a single chosen partition
as if the rows were stored in reverse order.

We use the reversing source and the extended index reader in
`mx_sstable_mutation_reader` to implement efficient (at least in theory)
reversed single-partition reads.

The patchset disables cache for reversed reads. Fast-forwarding
is not supported in the mx reader for reversed queries at this point.

Details in commit messages. Read the commits in topological order
for best review experience.

Refs: #9134
(not saying "Fixes" because it's only for single-partition queries
without forwarding)

Closes #9281

* github.com:scylladb/scylla:
  table: add option to automatically bypass cache for reversed queries
  test: reverse sstable reader with random schema and random mutations
  sstables: mx: implement reversed single-partition reads
  sstables: mx: introduce partition_reversing_data_source
  sstables: index_reader: add support for iterating over clustering ranges in reverse
  clustering_key_filter: clustering_key_filter_ranges owning constructor
  flat_mutation_reader: mention reversed schema in make_reversing_reader docstring
  clustering_key_filter: document clustering_key_filter_ranges::get_ranges
2021-10-04 15:37:34 +02:00
Kamil Braun
703aed3277 table: add option to automatically bypass cache for reversed queries
Currently the new reversing sstable algorithms do not support fast
forwarding and the cache does not yet handle reversed results. This
forced us to disable the cache for reversed queries if we want to
guarantee bounded memory. We introduce an option that does this
automatically (without specifying `bypass cache` in the query) and turn
it on by default.

If the user decides that they prefer to keep the cache at the
cost of fetching entire partitions into memory (which may be viable
if their partitions are small) during reversed queries, the option can
be turned off. It is live-updateable.
2021-10-04 15:24:12 +02:00
Piotr Dulikowski
6093c2378b hints: assign _last_written_rp in ep manager's move constructor
The end_point_hints_manager's field _last_written_rp is initialized in
its regular constructor, but is not copied in the move constructor.
Because the move constructor is always involved when creating a new
endpoint manager, the _last_written_rp field is effectively always
initialized with the zero-argument constructor, and is set to the zero
value.

This can cause the following erroneous situation to occur:

- Node A accumulates hints towards B.
- Sync point is created at A.
  It will be used later to wait for currently accumulated hints.
- Node A is restarted.
  The endpoint manager A->B is created which has bogus value in the
  _last_written_rp (it is set to zero).
- Node A replays its hints but does not write any new ones.
- A hint flush occurs.
  If there are no hint segments on disk after flush, the endpoint
  manager sets its last sent position to the last written position,
  which is by design. However, the last written position has incorrect
  value, so the last sent position also becomes incorrect and too low.
- Try to wait for the sync point created earlier.
  The sync point waiting mechanism waits until last sent hint position
  reaches or goes past the position encoded in the sync point, but it
  will not happen because the last sent position is incorrect.

The above bug can be (sometimes) reproduced in
hintedhandoff_sync_point_api_test dtest.

Now, the _last_written_rp field is properly initialized in the move
constructor, which prevents the bug described above.

Fixes: #9320
Closes #9426
2021-10-04 13:21:34 +02:00
Pavel Emelyanov
bbcf671276 config: Remove unused replacing options
The --replace-token and --replace-node were added some time ago, but
have never been used since then, just parsed and immediatelly aborted.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210930102222.16294-1-xemul@scylladb.com>
2021-09-30 14:56:04 +03:00
Piotr Jastrzebski
79de151158 cache_tracker: remove unused parameter from on_remove
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <f66ad391d86963b43b2a01e957887ea597e591e8.1632992165.git.piotr@scylladb.com>
2021-09-30 13:03:13 +03:00
Pavel Emelyanov
9f5fd8b5c0 system_keyspace: Keep local_host_id on local_cache
Some places in the code want to have future-less access to the
host id, now they do it all by themselves. Local cache seems to
be a better place (for the record -- some time ago the "better
place" argument justified cached host id relocation from the
storage_service onto the database).

While at it -- add the future-less getter for the host_id to be
used further.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 10:54:38 +03:00
Pavel Emelyanov
beb345c00a code: Rename get_local_host_id() into load_...()
There will appear the future-less method which better deserves
the get_ prefix, so give the existing method the load_ one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 10:33:57 +03:00
Pavel Emelyanov
e49dc4ed0d system_keyspace: Coroutinize get_/set_local_host_id
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 10:33:57 +03:00
Avi Kivity
b3c95a1fc6 commitlog: reduce inclusions of commitlog.hh due to db::commitlog::force_sync (#9379)
There are now 231 translation units that indirectly include commitlog.hh
due to the need to have access to db::commitlog::force_sync.

Move that type to a new file commitlog_types.hh and make it available
without access to the commitlog class.

This reduces the number of translation units that depend on commitlog.hh
to 84, improving compile time.
2021-09-29 16:13:44 +03:00
Botond Dénes
41facb3270 treewide: move reversing to the mutation sources
Push down reversing to the mutation-sources proper, instead of doing it
on the querier level. This will allow us to test reverse reads on the
mutation source level.
The `max_size` parameter of `consume_page()` is now unused but is not
removed in this patch, it will be removed in a follow-up to reduce
churn.
2021-09-29 12:15:45 +03:00
Botond Dénes
dec282e050 db/virtual_table: streaming_virtual_table::as_mutation_source(): use query schema instead of table schema
The two might not be the same in case the schema was upgraded (unlikely
for virtual tables) or if we are reading in reverse. It is important to
use the passed-in query schema consistently during a read.
2021-09-28 17:03:57 +03:00
Avi Kivity
369afe3124 treewide: use coroutine::maybe_yield() instead of co_await make_ready_future()
The dedicated API shows the intent, and may be a tiny bit faster.

Closes #9382
2021-09-23 12:28:56 +02:00
Avi Kivity
6702711d9c Merge "Gossiper start-stop sanitation (+ bonus track)" from Pavel E
"
The main challenge here is to move messaging_service.start_listen()
call from out of gossiper into main. Other changes are pretty minor
compared to that and include

- patch gossiper API towards a standard start-shutdown-stop form
- gossiping "sharder info" in initial state
- configure cluster name and seeds via gossip_config

tests: unit(dev)
       dtest.bootstrap_test.start_stop_test_node(dev)
       manual(dev): start+stop, nodetool enable-/disablegossip

refs: #2737
refs: #2795
refs: #5489

"

* 'br-gossiper-dont-start-messaging-listen-2' of https://github.com/xemul/scylla:
  code: Expell gossiper.hh from other headers
  storage_service: Gossip "sharder" in initial states
  gossiper: Relax set_seeds()
  gossiper, main: Turn init_gossiper into get_seeds_from_config
  storage_service: Eliminate the do-bind argument from everywhere
  gossiper: Drop ms-registered manipulations
  messaging, main, gossiper: Move listening start into main
  gossiper: Do handlers reg/unreg from start/stop
  gossiper: Split (un)init_messaging_handler()
  gossiper: Relocate stop_gossiping() into .stop()
  gossiper: Introduce .shutdown() and use where appropriate
  gossiper: Set cluster_name via gossip_config
  gossiper, main: Straighten start/stop
  tests/cql_test_env: Open-code tst_init_ms_fd_gossiper
  tests/cql_test_env: De-global most of gossiper
  gossiper: Merge start_gossiping() overloads into one
  gossiper: Use is_... helpers
  gossiper: Fix do_shadow_round comment
  gossiper: Dispose dead code
2021-09-23 12:18:38 +03:00
Pavel Emelyanov
598841a5dd code: Expell gossiper.hh from other headers
This needs to add forward declarations of the gossiper class and
re-include some other headers here and there.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-22 13:13:06 +03:00
Piotr Sarna
d3edca4b43 Merge 'alternator: add stub implementation of TTL's API operations'
... from Nadav Har'El

This small series adds a stub implementation of Alternator's
UpdateTimeToLive and DescribeTimeToLive operations. These operations can
enable, disable, or inquire about, the chosen expiration-time attribute.
Currently, the information about the chosen attribute is only saved,
with no actual expiration of any items taking place.

Because this is an incomplete implementation of this feature, it is not
enabled unless an experimental flag is enabled on all nodes in the
cluster.

See the individual patches for more information on what this series
does.

Refs #5060.

Closes #9345

* github.com:scylladb/scylla:
  test/alternator: rename utility function test_table_name()
  alternator: stub TTL operations
  alternator: make three utility functions in executor.cc non-static
  test/alternator: test another corner case of TTL
2021-09-21 09:58:17 +02:00
Avi Kivity
15819e0304 Merge "Database start/stop code sanitation" from Pavel E
"
Currently database start and stop code is quite disperse and
exists in two slightly different forms -- one in main and the
other one in cql_test_env. This set unifies both and makes
them look almost the perfect way:

    sharded<database> db;
    db.start(<dependencies>);
    auto stop = defer([&db] { db.stop().get(); });
    db.invoke_on_all(&database::start).get();

with all (well, most) other mentionings of the "db" variable
being arguments for other services' dependencies.

tests: unit(dev, release), unit.cross_shard_barrier(debug)
       dtest.simple_boot_shutdown(dev)
refs: #2737
refs: #2795
refs: #5489

"

* 'br-database-teardown-unification-2' of https://github.com/xemul/scylla: (26 commits)
  main: Log when database starts
  view_update_generator: Register staging sstables in constructor
  database, messaging: Delete old connection drop notification
  database, proxy: Relocate connection-drop activity
  messaging, proxy: Notify connection drops with boost signal
  database, tests: Rework recommended format setting
  database, sstables_manager: Sow some noexcepts
  database: Eliminate unused helpers
  database: Merge the stop_database() into database::stop()
  database: Flatten stop_database()
  database: Equip with cross-shard-barrier
  database: Move starting bits into start()
  database: Add .start() method
  main: Initialize directories before database
  main, api: Detach set_server_config from database and move up
  main: Shorten commitlog creation
  database: Extract commitlog initialization from init_system_keyspace
  repair: Shutdown without database help
  main: Shift iosched verification upward
  database: Remove unused mm arg from init_non_system_keyspaces()
  ...
2021-09-20 10:26:13 +03:00
Nadav Har'El
4ffd8c1f2b alternator: stub TTL operations
This patch adds stubs for the UpdateTimeToLive and DescribeTimeToLive
operations to Alternator. These operations can enable, disable, or inquire
about, the chosen expiration-time attribute.

Currently, the information about the chosen attribute is only saved, with
no actual expiration of any items taking place.

Some of the tests for the TTL feature start to pass, so their xfail tag
is removed.

Because this this new feature is incomplete, it is not enabled unless
the "alternator-ttl" experimental feature is enabled. Moreover, for
these operations to be allowed, the entire cluster needs to support
this experimental feature, because all nodes need to participate in the
data expiration - if some old nodes don't support Alternator TTL, some
of the data they hold won't get expired... So we don't allow enabling
TTL until all the nodes in the cluster support this feature.

The implementation is in a new source file, alternator/ttl.cc. This
source file will continue to grow as we implement the expiration feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2021-09-19 21:05:21 +03:00
Pavel Emelyanov
0de69136d4 view_update_generator: Register staging sstables in constructor
First, it's to fix the discarded future during the register. The
future is not actually such, as it's always the no-op ready one as
at that stage the view_update_generator is neither aborted nor is
in throttling state.

Second, this change is to keep database start-up code in main
shorter and cleaner. Registering staging sstables belongs to the
view_update_generator start code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:49:06 +03:00
Pavel Emelyanov
75e1d7ea74 large_data_handler: Prepare for stopped qctx
All the large data handler methods rely on global qctx thing to
write down its notes. This creates circular dependency:

query processor -> database -> large_data_handler -> qctx -> qp

In scylla this is not a technical problem, neither qctx nor the
query processor are stopped. It is a problem in cql_test_env
that stops everything, including resetting qctx to null. To avoid
tests stepping on nullptr qctx add the explicit check.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-15 17:35:24 +03:00
Avi Kivity
cc8fc73761 Merge 'hints: fix bugs in HTTP API for waiting for hints found by running dtest in debug mode' from Piotr Dulikowski
This series of commits fixes a small number of bugs with current implementation of HTTP API which allows to wait until hints are replayed, found by running the `hintedhandoff_sync_point_api_test` dtest in debug mode.

Refs: #9320

Closes #9346

* github.com:scylladb/scylla:
  commitlog: make it possible to provide base segment ID
  hints: fill up missing shards with zeros in decoded sync points
  hints: propagate abort signal correctly in wait_for_sync_point
  hints: fix use-after-free when dismissing replay waiters
2021-09-15 12:55:54 +03:00
Avi Kivity
daf028210b build: enable -Winconsistent-missing-override warning
This warning can catch a virtual function that thinks it
overrides another, but doesn't, because the two functions
have different signatures. This isn't very likely since most
of our virtual functions override pure virtuals, but it's
still worth having.

Enable the warning and fix numerous violations.

Closes #9347
2021-09-15 12:55:54 +03:00
Piotr Dulikowski
91163fcfa5 commitlog: make it possible to provide base segment ID
Adds a configuration option to the commitlog: base_segment_id. When
provided, the commitlog uses this ID as a base of its segment IDs
instead of calculating it based on the number of milliseconds between
the epoch and boot time.

This is needed in order for the feature which allows to wait for hints
to be replayed to work - it relies on the replay positions monotonically
increasing. Endpoint managers periodically re-creates its commitlog
instance - if it is re-created when there are no segments on disk,
currently it will choose the number of milliseconds between the epoch
and boot time, which might result in segments being generated with the
same IDs as some segments previously created and deleted during the same
runtime.
2021-09-15 11:04:34 +02:00
Piotr Dulikowski
486421c58c hints: fill up missing shards with zeros in decoded sync points
Between encoding and decoding of a sync point, the node might have been
restarted and resharded with increased shard count. During resharding,
existing hints segments might have been moved to new shards. Because of
that, we need to make sure that we wait for foreign segments to be
replayed on the new shards too.

This commit modifies the sync point decoding logic so that it places a
zero replay position for new shards. Additionally, a (incorrect) shard
count check is removed from `storage_proxy::wait_for_hint_sync_point`
because now the shard count in decoded sync point is guaranteed to be
not less than the node's current shard count.
2021-09-15 11:04:34 +02:00
Piotr Dulikowski
77f2448b2c hints: propagate abort signal correctly in wait_for_sync_point
When `manager::wait_for_sync_point` is called, the abort source from the
arguments (`as`) might have already been triggered. In such case, the
subscription which was supposed to trigger the `local_as` abort source
won't be run, and the code will wait indefinitely for hints to be
replayed instead of checking the replay status and returning
immediately.

This commit fixes the problem by manually triggering `local_as` if `as`
have been triggered.
2021-09-14 14:27:01 +02:00
Piotr Dulikowski
8e29ebc5d5 hints: fix use-after-free when dismissing replay waiters
When the promise waited on in the `wait_until_hints_are_replayed_up_to`
function is resolved, a continuation runs which prints a log line with
information about this event. The continuation captures a pointer to the
hints sender and uses it to get information about the endpoint whose
hints are waited for. However, at this point the sender might have been
deleted - for example, when the node is being stopped and everybody
waiting for hints is dismissed.

This commit fixes the use-after-free by getting all necessary
information while the sender is guaranteed to be alive and captures it
in the continuation's capture list.
2021-09-14 13:46:16 +02:00
Avi Kivity
3f2c680b70 Merge 'Add initial support for WebAssembly in user-defined functions (UDF)' from Piotr Sarna
This series adds very basic support for WebAssembly-based user-defined functions.

This series comes with a basic set of tests which were used to designate a minimal goal for this initial implementation.

Example usage:
```cql
CREATE FUNCTION ks.fibonacci (str text)
    RETURNS NULL ON NULL INPUT
    RETURNS boolean
    LANGUAGE xwasm
    AS ' (module
  (func $fibonacci (param $n i32) (result i32)
    (if
      (i32.lt_s (local.get $n) (i32.const 2))
      (return (local.get $n))
    )
    (i32.add
      (call $fibonacci (i32.sub (local.get $n) (i32.const 1)))
      (call $fibonacci (i32.sub (local.get $n) (i32.const 2)))
    )
  )
  (export "fibonacci" (func $fibonacci))
) '
```

Note that the language is currently called "xwasm" as in "experimental wasm", because its interface is still subject to change in the future.

Closes #9108

* github.com:scylladb/scylla:
  docs: add a WebAssembly entry
  cql-pytest: add wasm-based tests for user-defined functions
  main: add wasm engine instantiation
  treewide: add initial WebAssembly support to UDF
  wasm: add initial WebAssembly runtime implementation
  db: add wasm_engine pointer to database
  lang: add wasm_engine service
  import wasmtime.hh
  lua: move to lang/ directory
  cql3: generalize user-defined functions for more languages
2021-09-14 11:34:20 +03:00
Avi Kivity
e9ae9279e8 system_keyspace: reindent after conversion to class
Conversion to class left indentation in ruins, but that can be easily
fixed. 'git diff -w' reports no changes.

Closes #9339
2021-09-14 08:49:24 +03:00
Piotr Sarna
62e8c89a9c treewide: add initial WebAssembly support to UDF
This commit adds a very basic support for user-defined functions
coded in wasm. The support is very limited (only a few types work)
and was not tested against reactor stalls and performance in general.
2021-09-13 19:03:58 +02:00
Avi Kivity
e70b9d4835 system_keyspace: convert from namespace to class
All the namespace scope functions in system_keyspace have no place
to store context, so they must store their context in global
variables. This prevents conversion of those global variables
to constructor-provided depdendencies.

Take the first step towards providing a place to store the
context by converting system_keyspace to a class. All the functions
are static, so no context is yet available, but we can de-static-ify
them incrementally in the future and store the context in class members.

Indentation is a mess, but can be easily fixed later.
2021-09-13 15:14:14 +03:00
Avi Kivity
115d6d8d4c system_keyspace: prepare forward-declared members
In anticipation of making system_keyspace a class instead of a
namespace, rename any member that is currently forward-declared,
since one can't forward-declare a class member. Each member
is taken out of the system_keyspace namespace and gains a
system_keyspace prefix. Aliases are added to reduce code churn.

The result isn't lovely, but can be adjusted later.
2021-09-13 15:11:26 +03:00
Avi Kivity
c6ce81d6a0 system_keyspace: rearrange legacy subnamespace
Merge two fragments together, in anticipation of making 'legacy'
s struct instead of a namespace (when system_keyspace is a class,
we can't nest a namespace inside it).
2021-09-13 15:10:15 +03:00
Avi Kivity
6d379ae6f9 system_keyspace: remove outdated java code
This code has been rewritten and not removed, or is not needed.
Remove it to reduce clutter.
2021-09-13 15:08:57 +03:00
Piotr Sarna
4e952df470 lua: move to lang/ directory
Support for more languages is comming, so let's group them
in a separate directory.
2021-09-13 11:01:33 +02:00
Piotr Sarna
46c6603fe0 cql3: generalize user-defined functions for more languages
In order to support more languages than just Lua in the future,
Lua-specific configuration is now extracted to a separate
structure.
2021-09-13 11:01:33 +02:00
Avi Kivity
c5f52f9d97 schema_tables: don't flush in tests
Flushing schema tables is important for crash recovery (without a flush,
we might have sstables using a new schema before the commitlog entry
noting the schema change has been replayed), but not important for tests
that do not test crash recovery. Avoiding those flushes reduces system,
user, and real time on tests running on a consumer-level SSD.

before:
real	8m51.347s
user	7m5.743s
sys	5m11.185s

after:
real	7m4.249s
user	5m14.085s
sys	2m11.197s

Note real time is higher that user+sys time divided by the number
of hardware threads, indicating that there is still idle time due
to the disk flushing, so more work is needed.

Closes #9319
2021-09-12 11:32:13 +03:00
Tomasz Grabiec
83113d8661 Merge "raft: new schema for storing raft snapshots" from Pavel Solodovnikov
Previously, the layout for storing raft snapshot
descriptors contained a `config` field, which had `blob`
data type.

That means `raft::configuration` for the snapshot was serialized
as a whole in binary form. It's convenient to implement and
is the most compact form of representing the data, but:

1. Hard to debug due to the need to de-serialize the data.
2. Plants a time bomb wrt. changing data layout and also the
   documentation in the future.

Remove the `config` field from `system.raft_snapshots` and
extract it to a separate `system.raft_config` table to store
the data in exploded form.

Also, modify the schema of `system.raft_snapshots` table in
the following way: add a `server_id` field as a part of
composite partition key ((group_id, server_id)) to
be able to start multiple raft servers belonging to one raft
group on the same scylla node.

Rename `id` field in `raft_snapshots` to `snapshot_id` so
it's self-documenting.

Rename `snapshot_id` from clustering key since a given server
can have only one snapshot installed at a time.

Note that the `raft::server_address` stucture contains an opaque
`info` member, which is `bytes`, but in the `raft_config` table
we use `ip_addr inet` field, instead. We always know that the
corresponding member field is going to contain an IP address (either v4
or v6) of a given raft server.

So, now the snapshots schema looks like this:

    CREATE TABLE raft_snapshots (
        group_id timeuuid,
        server_id uuid,
        snapshot_id uuid,
        idx int,
        term int,
        -- no `config` field here, moved to `raft_config` table
        PRIMARY KEY ((group_id, server_id))
    )

    CREATE TABLE raft_config (
         group_id timeuuid,
         my_server_id uuid,
         server_id uuid,
         disposition text, -- can be either 'CURRENT` or `PREVIOUS'
         can_vote bool,
         ip_addr inet,
         PRIMARY KEY ((group_id, my_server_id), server_id, disposition)
    );

This way it's much easier to extend the schema with new fields,
very easy to debug and inspect via CQL, and it's much more descriptive
in terms of self-documentation.

Tests: unit(dev)

* manmanson/raft_snapshots_new_schema_v2:
  test: adjust `schema_change_test` to include new `system.raft_config` table
  raft: new schema for storing raft snapshots
  raft: pass server id to `raft_sys_table_storage` instance
2021-09-10 20:41:59 +02:00
Avi Kivity
16116ac631 interval: constrain comparator parameters
The interval template member functions mostly accept
tri-comparators but a few functions accept less-comparators.
To reduce the chance of error, and to provide better error
messages, constrain comparator parameters to the expected
signature.

In one case (db/size_estimates_virtual_reader.cc) the caller
had to be adjusted. The comparator supported comparisons
of the interval value type against other types, but not
against itself. To simplify things, we add that signature too,
even though it will never be called.

Closes #9291
2021-09-10 16:43:16 +02:00
Avi Kivity
c1028de22a Merge 'Introduce native reversed format' from Botond Dénes
We define the native reverse format as a reversed mutation fragment
stream that is identical to one that would be emitted by a table with
the same schema but with reversed clustering order. The main difference
to the current format is how range tombstones are handled: instead of
looking at their start or end bound depending on the order, we always
use them as-usual and the reversing reader swaps their bounds to
facilitate this. This allows us to treat reversed streams completely
transparently: just pass along them a reversed schema and all the
reader, compacting and result building code is happily ignorant about
the fact that it is a reversed stream.

This series is the first step towards implementing efficient reverse
reads. It allows us to remove all the special casing we have in various
places for reverse reads and thus treating reverse streams transparently
in all the middle layers. The only layers that have to know about the
actual reversing are mutation sources proper. The plan is that when
reading in reverse we create a reversed schema in the top layer then
pass this down as the schema for the read. There are two layers that
will need to act on this reversed schema:
* The layer sitting on top of the first layer which still can't handle
  reversed streams, this layer will create a reversed reader to handle
  the transition.
* The mutation source proper: which will obtain the underlying schema
  and will emit the data in reverse order.

Once all the mutation sources are able to handle reverse reads, we can
get rid of the reverse reader entirely.

Refs: #1413

Tests: unit(dev)

TODO:
* v2
* more testing

Also on: https://github.com/denesb/scylla.git reverse-reads/v3

Changelog

v3:
* Drop the entire schema transformation mechanism;
* Drop reversing from `schema_builder()`;
* Don't keep any information about whether the schema is reversed or not
  in the schema itself, instead make reversing deterministic w.r.t.
  schema version, such that:
  `s.version() == s.make_reversed().make_reversed().version()`;
* Re-reverse range tombstones in `streaming_mutation_freezer`, so
  `reconcilable_results` sent to the coordinator during read repair
  still use the old reverse format;

v2:
* Add `data_type reversed(data_type)`;
* Add `bound_kind reverse_kind(bound_kind)`;
* Make new API safer to use:
    - `schema::underlying_type()`: return this when unengaged;
    - `schema::make_transformed()`: noop when applying the same
      transformation again;
* Generalize reversed into transformation. Add support to transferring
  to remote nodes and shards by way of making `schema_tables` aware of
  the transformation;
* Use reverse schema everywhere in reverse reader;

Closes #9184

* github.com:scylladb/scylla:
  range_tombstone_accumulator: drop _reversed flag
  test/boost/mutation_test: add test for mutation::consume() monotonicity
  test/boost/flat_mutation_reader_test: more reversed reader tests
  flat_mutation_reader: make_reversing_reader(): implement fast_forward_to(partition_range)
  flat_mutation_reader: make_reversing_reader(): take ownership of the reader
  test/lib/mutation_source_test: add consistent log to all methods
  mutation: introduce reverse()
  mutation_rebuilder: make it standalone
  mutation: make copy constructor compatible with mutation_opt
  treewide: switch to native reversed format for reverse reads
  mutation: consume(): add native reverse order
  mutation: consume(): don't include dummy rows
  query: add slice reversing functions
  partition_slice_builder: add range mutating methods
  partition_slice_builder: add constructor with slice
  query: specific_ranges: add non-const ranges accessor
  range_tombstone: add reverse()
  clustering_bounds_comparator: add reverse_kind()
  schema: introduce make_reversed()
  schema: add a transforming copy constructor
  utils: UUID_gen: introduce negate()
  types: add reversed(data_type)
  docs: design-notes: add reverse-reads.md
2021-09-09 15:50:22 +03:00