Commit Graph

4972 Commits

Author SHA1 Message Date
Botond Dénes
0740019e4d db/view: migrate view_update_builder to v2
To avoid noise, the interface is left as v1 and inbound readers are
converted in the constructor.
2022-03-17 10:47:55 +02:00
Pavel Emelyanov
e8ba395fea system_keyspace: Make load_host_ids non-static
Same as previous patch -- just use the reference from storage service

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
28204cd83d system_keyspace: Make load_tokens non-static
Called from storage service that has system-keyspace instances

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
8f977814bc system_keyspace: Make remove_endpoint and update_tokens non-static
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
3f0f94b081 system_keyspace: Coroutinize update_tokens
While at it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
7b2f142e2d system_keyspace: Coroutinize remove_endpoint
Not to capture 'this' all over the method in next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
7d0d5642c0 system_keyspace: Make update_cached_values non-static
The update_table() helper template too. And the update_peer_info as
well. It can stop using global qctx and cache after that

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
5a1f7193b0 system_keyspace: Coroutinuze update_peer_info
Not to carry 'this' over captures in next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
c15359165d system_keyspace: Make update_schema_version non-static
It's called from two places -- .setup() and schema_tables code. Both
have the instance hanging around, so the method can be de-marked
static and set free from global qctx

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
b80d5f8900 schema_tables: Add sharded<system_keyspace> argument to update_schema_version_and_announce
All its (indirect) callers had been patched to have it, now it's
possible to have the argument in it. Next patch will make use of it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
835f39e0ba system_keyspace: Remove temporary qp variable
After the previous patches it's possible to clean local stack
of .setup() a little bit

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
ece4448ea9 system_keyspace: Make get_preferred_ips non-static
And mark the method private while at it, because it is such

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
a095e8d92d system_keyspace: Make cache_truncation_record non-static
This one is a bit more tricky that its four preceeders. The qctx's
qp().execute_cql() is replaced with qp().execute_internal() for
symmetry with the rest. Without data args it's the same.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
f54473427d system_keyspace: Make check_health non-static
Yet another same step. Drop static keyword and patch out globals.
Get config.cluster_name from _db while at it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
5944d5f663 system_keyspace: Make build_bootstrap_info non-static
The same -- drop static and forget global qctx and cache

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
08174eb868 system_keyspace: Make build_dc_rack_info non-static
Same here -- remove static and patch out global qctx and cache.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
66beaad1e5 system_keyspace: Make setup_version non-static
Just remove static mark and stop using global qctx.
Grab config from _db instead of argument while at it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
00a345c4d8 system_keyspace: Copy execute_internal() from query context
Before patching system_keyspace methods to use query processor from
its instance, the respective call is needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
f4c185c30e system_keyspace: Make setup method non-static
It's called only on start and actively uses both qctx and local
cache. Next patches will fix the whole setup code to stop using
global qctx/cache.

For now setup invocation is left in its place, but it must really
happen in start() method. More patching is needed to make it work.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 13:57:59 +03:00
Pavel Emelyanov
b761e558b5 system_keyspace: Keep local_cache reference on board
For now it's a reference, but all users of the cache will be
eventually switched into using system_keyspace.

In cql-test-env cache starting happens earlier than it was
before, but that's OK, it just initializes empty instances.
In main cache starts at the same time as before patching.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 13:57:59 +03:00
Pavel Emelyanov
1bcb6c13a5 system_keyspace: Move minimal_setup into start
Start happens at exactly the same place. One thing to take care
of is that it happens on all shards.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 13:57:59 +03:00
Pavel Emelyanov
7ef69b8189 system_keyspace: Make sharded object
The db::system_keyspace was made a class some time ago, time to create
a standard sharded<> object out of it. It needs query processor and
database. None of those depensencies is started early enough, so the
object for now starts in two steps -- early instances creation and
late start.

The instances will carry qctx and local_cache on board and all the
services that need those two will depend on system-keyspace. Its start
happens at exactly the same place where system_keyspace::setup happens
thus any service that will use system_keyspace will be on the same
safe side as it is now.

In the further future the system_keyspace will be equpped with its
own query processor backed by local replica database instance, instead
of the whole storage proxy as it is now.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 13:57:59 +03:00
Mikołaj Sielużycki
a8cb7bf677 readers: Make result_collector use queue reader handle v2.
Transitively modifies streaming_virtual_table::as_mutation_source,
along with the tests.

Closes #10223
2022-03-15 17:02:28 +02:00
Avi Kivity
39504dea61 Merge "Convert result builders to v2" from Botond
"
Namely the query result writer and the reconcilable result builder, used
for building results for regular queries and mutation queries (used in
read repair) respectively.
With this, there are no users left for the v1 output of the compactor,
so we remove that, making the compactor v2 all-the-way (and simpler).
This means that for regular queries, a downgrade phase is eliminated
completely, as regular queries don't store range tombstone in their
result, so no need to convert them.

Tests: unit(dev, release, debug)
"

* 'result-builders-v2/v1' of https://github.com/denesb/scylla:
  reconcilable_result_builder: remove v1 support
  query_result_builder: remove v1 support
  mutation_compactor: drop v1 related code-paths
  mutation_compactor: drop v1 support altogether from the API
  tree: migrate to the v2 consumer APIs
  test/boost/mutation_test: remove v1 specific test code
  querier: switch to v2 compactor output
  reconcilable_result_builder: add v2 support
  query_result_writer: add v2 support
  query_result_builder: make consume(range_tombstone) noop
2022-03-15 14:32:58 +02:00
Benny Halevy
90edddd7e3 everywhere: use make_flat_mutation_reader_from_mutations_v2
Rather than upgrade_to_v2(make_flat_mutation_reader_from_mutations)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220315083425.2786228-1-bhalevy@scylladb.com>
2022-03-15 11:41:10 +02:00
Mikołaj Sielużycki
1d84a254c0 flat_mutation_reader: Split readers by file and remove unnecessary includes.
The flat_mutation_reader files were conflated and contained multiple
readers, which were not strictly necessary. Splitting optimizes both
iterative compilation times, as touching rarely used readers doesn't
recompile large chunks of codebase. Total compilation times are also
improved, as the size of flat_mutation_reader.hh and
flat_mutation_reader_v2.hh have been reduced and those files are
included by many file in the codebase.

With changes

real	29m14.051s
user	168m39.071s
sys	5m13.443s

Without changes

real	30m36.203s
user	175m43.354s
sys	5m26.376s

Closes #10194
2022-03-14 13:20:25 +02:00
Botond Dénes
87ac2e9ab0 tree: migrate to the v2 consumer APIs 2022-03-11 09:24:05 +02:00
Nadav Har'El
ef43531fb6 materialized views: allow empty strings in views and indexes
Although Cassandra generally does not allow empty strings as partition
keys (note they are allowed as clustering keys!), it *does* allow empty
strings in regular columns to be indexed by a secondary index, or to
become an empty partition-key column in a materialized view. As noted in
issues #9375 and #9364 and verified in a few xfailing cql-pytest tests,
Scylla didn't allow these cases - and this patch fixes that.

The patch mostly *removes* unnecessary code: In one place, code
prevented an sstable with an empty partition key from being written.
Another piece of removed code was a function is_partition_key_empty()
which the materialized-view code used to check whether the view's
row will end up with an empty partition key, which was supposedly
forbidden. But in fact, should have been allowed like they are allowed
in Cassandra and required for the secondary-index implementation, and
the entire function wasn't necessary.

Note that the removed function is_partition_key_empty() was *NOT* required
for the "IS NOT NULL" feature of materialized views - this continues to
work as expected after this patch, and we add another test to confirm it.
Being null and being an empty string are two different things.

This patch also removes a part of a unit test which enshrined the
wrong behavior.

After this patch we are left with one interesting difference from
Cassandra: Though Cassandra allows a user to create a view row with an
empty-string partition key, and this row is fully visible in when
scanning the view, this row can *not* be queried individually because
"WHERE v=''" is forbidden when v is the partition key (of the view).
Scylla does not reproduce this anomaly - and such point query does work
in Scylla after this patch. We add a new test to check this case, and mark
it "cassandra_bug", i.e., it's a Cassandra behavior which we consider
wrong and don't want to emulate.

This patch relies on #9352 and #10178 having been fixed in previous patches,
otherwise the WHERE v='' does not work when reading from sstables.
We add to the already existing tests we had for empty materialized-views
keys a lookup with WHERE v='' which failed before fixing those two issues.

Fixes #9364
Fixes #9375

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-03-08 15:34:26 +02:00
Michael Livshin
a389cc520b system_keyspace, sstable: log local host id in key places
Specifically: when it is generated, when it is loaded from
`system.local`, and when there is a mismatch during sstable
validation; in the latter case log the in-sstable host id also.

Refs #10148

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Message-Id: <20220301123925.257766-1-michael.livshin@scylladb.com>
2022-03-02 09:49:37 +02:00
Benny Halevy
ebbbf1e687 lister: move to utils
There's nothing specific to scylla in the lister
classes, they could (and maybe should) be part of
the seastar library.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-02-28 12:36:03 +02:00
Tomasz Grabiec
7719f4cd91 Merge "Group 0 discovery: persist and restore peers" from Kamil
We add a `peers()` method to `discovery` which returns the peers
discovered until now (including seeds). The caller of functions which
return an output -- `tick` or `request` -- is responsible for persisting
`peers()` before returning the output of `tick`/`request` (e.g. before
sending the response produced by `request` back). The user of
`discovery` is also responsible for restoring previously persisted peers
when constructing `discovery` again after a restart (e.g. if we
previously crashed in the middle of the algorithm).

The `persistent_discovery` class is a wrapper around `discovery` which
does exactly that.

For storage we use a simple local table.

A simple bugfix is also included in the first patch.

* kbr/discovery-persist-v3:
  service: raft: raft_group0: persist discovered peers and restore on restart
  db: system_keyspace: introduce discovery table
  service: raft: discovery: rename `get_output` to `tick`
  service: raft: discovery: stop returning peer_list from `request` after becoming leader
2022-02-25 17:23:08 +01:00
Nadav Har'El
49a8164fb7 alternator: add configurable scan period to TTL expiration
Before this patch, the experimental TTL (expiration time) feature in
Alternator scans tables for expiration in a tight loop - starting the
next scan one second after the previous one completed.

In this patch we introduce a new configuration option,
alternator_ttl_period_in_seconds, which determines how frequently
to start the scan. The default is 24 hours - meaning that the next
scan is started 24 hours after the previous one started.

The tests (test/alternator/run) change this configuration back to one
second, so that expiration tests finish as quickly as possible.

Please note that the scan is *not* slowed down to fill this 24 hours -
if it finishes in one hour, it will then sleep for 23 hours. Additional
work would be needed to slow down the scan to not finish too quickly.
One idea not yet implemented is to move the expiration service from
the "maintenance" scheduling group which it uses today to a new
scheduling group, and modifying the number of shares that this group
gets.

Another thing worth noting about the configurable period (which defaults
to 24 hours) is that when TTL is enabled on an Alternator table, it can
take that amount of time until its scan starts and items start expiring
from it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-02-25 07:26:11 +02:00
Avi Kivity
cbba80914d memtable: move to replica module and namespace
Memtables are a replica-side entity, and so are moved to the
replica module and namespace.

Memtables are also used outside the replica, in two places:
 - in some virtual tables; this is also in some way inside the replica,
   (virtual readers are installed at the replica level, not the
   cooordinator), so I don't consider it a layering violation
 - in many sstable unit tests, as a convenient way to create sstables
   with known input. This is a layering violation.

We could make memtables their own module, but I think this is wrong.
Memtables are deeply tied into replica memory management, and trying
to make them a low-level primitive (at a lower level than sstables) will
be difficult. Not least because memtables use sstables. Instead, we
should have a memtable-like thing that doesn't support merging and
doesn't have all other funky memtable stuff, and instead replace
the uses of memtables in sstable tests with some kind of
make_flat_mutation_reader_from_unsorted_mutations() that does
the sorting that is the reason for the use of memtables in tests (and
live with the layering violation meanwhile).

Test: unit (dev)

Closes #10120
2022-02-23 09:05:16 +02:00
Botond Dénes
3aa05f7f03 Merge "Make system.clients table virtual" from Pavel Emelyanov
"
The table lists connected clients. For this the clients are
stored in real table when they connect, update their statuses
when needed and remove^w tombstone themselves when they
disconnect. On start the whole table is cleared.

This looks weird. Here's another approach (inspired by the
hackathon project) that makes this table a pure virtual one.
The schema is preserved so is the data returned.

The benefits of doing it virtual are

- no on-disk updates while processing clients
- no potentially failing updates on non-failing disconnect
- less usage of the global qctx thing
- less calls to global storage_proxy
- simpler support for thrift and alternator clients (today's
  table implementation doesn't track them)
- the need to make virtual tables reg/unreg dynamic

branch: https://github.com/xemul/scylla/tree/br-clients-virtual-table-4
tests: manual(dev), unit(dev)

The manual test used 80-shards node and 1M connections from
1k different IP addresses.
"

* 'br-clients-virtual-table-4' of https://github.com/xemul/scylla:
  test: Add cql-pytest sanity test for system.clients table
  client_data: Sanitize connection_notifier
  transport: Indentation fix after previous patch
  code: Remove old on-disk version of system.clients table
  system_keyspace: Add clients_v virtual table
  protocol_server: Add get_client_data call
  transport: Track client state for real
  transport: Add stringifiers to client_data class
  generic_server: Gentle iterator
  generic_server: Type alias
  docs: Add system.clients description
2022-02-22 20:58:25 +03:00
Botond Dénes
05c48ee0cc db/view/view_updating_consumer: migrate to v2
Not a completely mechanical transition. The consumer has to generate its
mutation via a mutation_rebuilder_v2 as mutation fragment v2 cannot be
applied to mutations directly yet.
2022-02-21 12:29:24 +02:00
Botond Dénes
45b36d91c6 db/view/view_builder: use v2 reader 2022-02-21 12:27:55 +02:00
Pavel Emelyanov
de6c60c1c9 client_data: Sanitize connection_notifier
Now the connection_notifier is all gone, only the client_data bits are left.
To keep it consistent -- rename the files.

Also, while at it, brush up the header dependencies and remove the not
really used constexprs for client states.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-02-18 15:02:26 +03:00
Pavel Emelyanov
971c431a23 code: Remove old on-disk version of system.clients table
This includes most of the connection_notifier stuff as well as
the auxiliary code from system_keyspace.cc and a bunch of
updating calls from the client state changing.

Other than less code and less disk updates on clients connection
paths, this removes one usage of the nasty global qctx thing.

Since the system.clients goes away rename the system.clients_v
here too so the table is always present out there.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-02-18 15:02:26 +03:00
Pavel Emelyanov
0c9ed01716 system_keyspace: Add clients_v virtual table
This table mirrors the existing clients one but temporarily
has its own name. The schema is the same as in system.clients.

The table gets client_data's from the registered protocol
servers, which in turn are obtained from the storage service.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-02-18 15:02:26 +03:00
Botond Dénes
948bc359c2 Merge "ME sstable format support" from Michael Livshin
"
This series implements support for the ME sstable format (introduced
in C* 3.11.11).

Tests: unit(dev)
"

* tag 'me-sstable-format-v5' of https://github.com/cmm/scylla:
  sstables: validate originating host id
  sstable: add is_uploaded() predicate
  config: make the ME sstable format default
  scylla-gdb.py: recognize ME sstables
  sstables: store originating host id in stats metadata
  system_keyspace: cache local host id before flushing
  database_test: ensure host id continuity
  sstables_manager: add get_local_host_id() method and support
  sstables_manager: formalize inheritability
  system_keyspace, main: load (or create) local host id earlier
  sstable_3_x_test: test ME sstable format too
  add "ME_SSTABLE" cluster feature
  add "sstable_format" config
  add support for the ME sstable format
  scylla-sstable: add ability to dump optionals and utils::UUID
  sstables: add ability to write and parse optionals
  globalize sstables::write(..., utils::UUID)
2022-02-16 18:28:16 +02:00
Michael Livshin
3bf1e137fc config: make the ME sstable format default
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
0ccd56e036 system_keyspace: cache local host id before flushing
Later in this series the ME sstable format is made default, which
means that `system.local` will likely be written as ME.

Since, in ME, originating host id is a part of sstable stats metadata,
the local host id needs to either already be cached by the time
`system.local` is flushed, or to somehow be special-case-ignored when
flushing `system.local`.

The former (done here) is optimistic (cache before flush), but the
alternative would be an abstraction violation and would also cost a
little time upon each sstable write.

(Cache-before-flush could be undone by catching any exceptions during
flush and un-caching, but inability to `co_await` in catch clauses
makes the code look rather awkward.  And there is no need to bother
because bootstrap failures should be fatal anyway)

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
3fef604075 sstables_manager: add get_local_host_id() method and support
Since ME sstable format includes originating host id in stats
metadata, local host id needs to be made available for writing and
validation.

Both Scylla server (where local host id comes from the `system.local`
table) and unit tests (where it is fabricated) must be accomodated.
Regardless of how the host id is obtained, it is stored in the db
config instance and accessed through `sstables_manager`.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
7d2af177eb system_keyspace, main: load (or create) local host id earlier
We want it to be cached before any sstable is written, so do it right
after system_keyspace::minimal_setup().

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
d370558279 add "ME_SSTABLE" cluster feature
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Michael Livshin
0b1447c702 add "sstable_format" config
Initialize it to "md" until ME format support is
complete (i.e. storing originating host id in sstable stats metadata
is implemented), so at present there is no observable change by
default.

Also declare "enable_sstables_md_format" unused -- the idea, going
forward, being that only "sstable_format" controls the written sstable
file format and that no more per-format enablement config options
shall be added.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-16 18:21:24 +02:00
Nadav Har'El
7be3129458 cdc: don't need current keyspace to create the log table
CDC registers to the table-creation hook (before_create_column_family)
to add a second table - the CDC log table - to the same keyspace.
The handler function (on_before_update_column_family() in cdc/log.cc)
wants to retrieve the keyspace's definition, but that does NOT WORK if
we create the keyspace and table in one operation (which is exactly what
we intend to do in Alternator to solve issue #9868) - because at the
time of the hook, the keyspace does not yet exist in the schema.

It turns out that on_before_update_column_family() does not REALLY need
the keyspace. It needed it to pass it on to make_create_table_mutations()
but that function doesn't use the keyspace parameter passed to it! All
it needs is the keyspace's name - which is in the schema anyway and
doesn't need to be looked up.

So in this patch we fix make_create_table_mutations() to not require the
unused keyspace parameter - and fix the CDC code not to look for the
keyspace that is no longer needed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220215162342.622509-1-nyh@scylladb.com>
2022-02-16 08:38:56 +02:00
Benny Halevy
244df07771 large_data_handler: use only basename to identify the sstable
SSTables may be created in one directory (e.g. staging)
and be removed from another directory (base table dir,
or quarantine if scrub moved them there), so identify
the sstable by its unique component basename rather than
the full path.

Fixes #10075

Test: unit(dev)
DTest: wide_rows_test.py (w/ https://github.com/scylladb/scylla-dtest/pull/2606)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220214131923.1468870-1-bhalevy@scylladb.com>
2022-02-14 17:57:49 +02:00
Benny Halevy
b131f94fc3 large_data_handler: maybe_delete_large_data_entries: data_size is unused
Since 64a4ffc579

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220214115258.1354372-1-bhalevy@scylladb.com>
2022-02-14 13:58:44 +02:00
Kamil Braun
5dbf86fa29 db: system_keyspace: introduce discovery table
This table will be used to persist the list of peers discovered by the
`discovery` algorithm that is used for creating Raft group 0 when
bootstrapping a fresh cluster.
2022-02-14 12:05:18 +01:00