The update_table() helper template too. And the update_peer_info as
well. It can stop using global qctx and cache after that
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's called from two places -- .setup() and schema_tables code. Both
have the instance hanging around, so the method can be de-marked
static and set free from global qctx
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All its (indirect) callers had been patched to have it, now it's
possible to have the argument in it. Next patch will make use of it
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This one is a bit more tricky that its four preceeders. The qctx's
qp().execute_cql() is replaced with qp().execute_internal() for
symmetry with the rest. Without data args it's the same.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Yet another same step. Drop static keyword and patch out globals.
Get config.cluster_name from _db while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Just remove static mark and stop using global qctx.
Grab config from _db instead of argument while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Before patching system_keyspace methods to use query processor from
its instance, the respective call is needed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's called only on start and actively uses both qctx and local
cache. Next patches will fix the whole setup code to stop using
global qctx/cache.
For now setup invocation is left in its place, but it must really
happen in start() method. More patching is needed to make it work.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
For now it's a reference, but all users of the cache will be
eventually switched into using system_keyspace.
In cql-test-env cache starting happens earlier than it was
before, but that's OK, it just initializes empty instances.
In main cache starts at the same time as before patching.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Start happens at exactly the same place. One thing to take care
of is that it happens on all shards.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The db::system_keyspace was made a class some time ago, time to create
a standard sharded<> object out of it. It needs query processor and
database. None of those depensencies is started early enough, so the
object for now starts in two steps -- early instances creation and
late start.
The instances will carry qctx and local_cache on board and all the
services that need those two will depend on system-keyspace. Its start
happens at exactly the same place where system_keyspace::setup happens
thus any service that will use system_keyspace will be on the same
safe side as it is now.
In the further future the system_keyspace will be equpped with its
own query processor backed by local replica database instance, instead
of the whole storage proxy as it is now.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
Namely the query result writer and the reconcilable result builder, used
for building results for regular queries and mutation queries (used in
read repair) respectively.
With this, there are no users left for the v1 output of the compactor,
so we remove that, making the compactor v2 all-the-way (and simpler).
This means that for regular queries, a downgrade phase is eliminated
completely, as regular queries don't store range tombstone in their
result, so no need to convert them.
Tests: unit(dev, release, debug)
"
* 'result-builders-v2/v1' of https://github.com/denesb/scylla:
reconcilable_result_builder: remove v1 support
query_result_builder: remove v1 support
mutation_compactor: drop v1 related code-paths
mutation_compactor: drop v1 support altogether from the API
tree: migrate to the v2 consumer APIs
test/boost/mutation_test: remove v1 specific test code
querier: switch to v2 compactor output
reconcilable_result_builder: add v2 support
query_result_writer: add v2 support
query_result_builder: make consume(range_tombstone) noop
The flat_mutation_reader files were conflated and contained multiple
readers, which were not strictly necessary. Splitting optimizes both
iterative compilation times, as touching rarely used readers doesn't
recompile large chunks of codebase. Total compilation times are also
improved, as the size of flat_mutation_reader.hh and
flat_mutation_reader_v2.hh have been reduced and those files are
included by many file in the codebase.
With changes
real 29m14.051s
user 168m39.071s
sys 5m13.443s
Without changes
real 30m36.203s
user 175m43.354s
sys 5m26.376s
Closes#10194
Although Cassandra generally does not allow empty strings as partition
keys (note they are allowed as clustering keys!), it *does* allow empty
strings in regular columns to be indexed by a secondary index, or to
become an empty partition-key column in a materialized view. As noted in
issues #9375 and #9364 and verified in a few xfailing cql-pytest tests,
Scylla didn't allow these cases - and this patch fixes that.
The patch mostly *removes* unnecessary code: In one place, code
prevented an sstable with an empty partition key from being written.
Another piece of removed code was a function is_partition_key_empty()
which the materialized-view code used to check whether the view's
row will end up with an empty partition key, which was supposedly
forbidden. But in fact, should have been allowed like they are allowed
in Cassandra and required for the secondary-index implementation, and
the entire function wasn't necessary.
Note that the removed function is_partition_key_empty() was *NOT* required
for the "IS NOT NULL" feature of materialized views - this continues to
work as expected after this patch, and we add another test to confirm it.
Being null and being an empty string are two different things.
This patch also removes a part of a unit test which enshrined the
wrong behavior.
After this patch we are left with one interesting difference from
Cassandra: Though Cassandra allows a user to create a view row with an
empty-string partition key, and this row is fully visible in when
scanning the view, this row can *not* be queried individually because
"WHERE v=''" is forbidden when v is the partition key (of the view).
Scylla does not reproduce this anomaly - and such point query does work
in Scylla after this patch. We add a new test to check this case, and mark
it "cassandra_bug", i.e., it's a Cassandra behavior which we consider
wrong and don't want to emulate.
This patch relies on #9352 and #10178 having been fixed in previous patches,
otherwise the WHERE v='' does not work when reading from sstables.
We add to the already existing tests we had for empty materialized-views
keys a lookup with WHERE v='' which failed before fixing those two issues.
Fixes#9364Fixes#9375
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
There's nothing specific to scylla in the lister
classes, they could (and maybe should) be part of
the seastar library.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We add a `peers()` method to `discovery` which returns the peers
discovered until now (including seeds). The caller of functions which
return an output -- `tick` or `request` -- is responsible for persisting
`peers()` before returning the output of `tick`/`request` (e.g. before
sending the response produced by `request` back). The user of
`discovery` is also responsible for restoring previously persisted peers
when constructing `discovery` again after a restart (e.g. if we
previously crashed in the middle of the algorithm).
The `persistent_discovery` class is a wrapper around `discovery` which
does exactly that.
For storage we use a simple local table.
A simple bugfix is also included in the first patch.
* kbr/discovery-persist-v3:
service: raft: raft_group0: persist discovered peers and restore on restart
db: system_keyspace: introduce discovery table
service: raft: discovery: rename `get_output` to `tick`
service: raft: discovery: stop returning peer_list from `request` after becoming leader
Before this patch, the experimental TTL (expiration time) feature in
Alternator scans tables for expiration in a tight loop - starting the
next scan one second after the previous one completed.
In this patch we introduce a new configuration option,
alternator_ttl_period_in_seconds, which determines how frequently
to start the scan. The default is 24 hours - meaning that the next
scan is started 24 hours after the previous one started.
The tests (test/alternator/run) change this configuration back to one
second, so that expiration tests finish as quickly as possible.
Please note that the scan is *not* slowed down to fill this 24 hours -
if it finishes in one hour, it will then sleep for 23 hours. Additional
work would be needed to slow down the scan to not finish too quickly.
One idea not yet implemented is to move the expiration service from
the "maintenance" scheduling group which it uses today to a new
scheduling group, and modifying the number of shares that this group
gets.
Another thing worth noting about the configurable period (which defaults
to 24 hours) is that when TTL is enabled on an Alternator table, it can
take that amount of time until its scan starts and items start expiring
from it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Memtables are a replica-side entity, and so are moved to the
replica module and namespace.
Memtables are also used outside the replica, in two places:
- in some virtual tables; this is also in some way inside the replica,
(virtual readers are installed at the replica level, not the
cooordinator), so I don't consider it a layering violation
- in many sstable unit tests, as a convenient way to create sstables
with known input. This is a layering violation.
We could make memtables their own module, but I think this is wrong.
Memtables are deeply tied into replica memory management, and trying
to make them a low-level primitive (at a lower level than sstables) will
be difficult. Not least because memtables use sstables. Instead, we
should have a memtable-like thing that doesn't support merging and
doesn't have all other funky memtable stuff, and instead replace
the uses of memtables in sstable tests with some kind of
make_flat_mutation_reader_from_unsorted_mutations() that does
the sorting that is the reason for the use of memtables in tests (and
live with the layering violation meanwhile).
Test: unit (dev)
Closes#10120
"
The table lists connected clients. For this the clients are
stored in real table when they connect, update their statuses
when needed and remove^w tombstone themselves when they
disconnect. On start the whole table is cleared.
This looks weird. Here's another approach (inspired by the
hackathon project) that makes this table a pure virtual one.
The schema is preserved so is the data returned.
The benefits of doing it virtual are
- no on-disk updates while processing clients
- no potentially failing updates on non-failing disconnect
- less usage of the global qctx thing
- less calls to global storage_proxy
- simpler support for thrift and alternator clients (today's
table implementation doesn't track them)
- the need to make virtual tables reg/unreg dynamic
branch: https://github.com/xemul/scylla/tree/br-clients-virtual-table-4
tests: manual(dev), unit(dev)
The manual test used 80-shards node and 1M connections from
1k different IP addresses.
"
* 'br-clients-virtual-table-4' of https://github.com/xemul/scylla:
test: Add cql-pytest sanity test for system.clients table
client_data: Sanitize connection_notifier
transport: Indentation fix after previous patch
code: Remove old on-disk version of system.clients table
system_keyspace: Add clients_v virtual table
protocol_server: Add get_client_data call
transport: Track client state for real
transport: Add stringifiers to client_data class
generic_server: Gentle iterator
generic_server: Type alias
docs: Add system.clients description
Not a completely mechanical transition. The consumer has to generate its
mutation via a mutation_rebuilder_v2 as mutation fragment v2 cannot be
applied to mutations directly yet.
Now the connection_notifier is all gone, only the client_data bits are left.
To keep it consistent -- rename the files.
Also, while at it, brush up the header dependencies and remove the not
really used constexprs for client states.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This includes most of the connection_notifier stuff as well as
the auxiliary code from system_keyspace.cc and a bunch of
updating calls from the client state changing.
Other than less code and less disk updates on clients connection
paths, this removes one usage of the nasty global qctx thing.
Since the system.clients goes away rename the system.clients_v
here too so the table is always present out there.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This table mirrors the existing clients one but temporarily
has its own name. The schema is the same as in system.clients.
The table gets client_data's from the registered protocol
servers, which in turn are obtained from the storage service.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
This series implements support for the ME sstable format (introduced
in C* 3.11.11).
Tests: unit(dev)
"
* tag 'me-sstable-format-v5' of https://github.com/cmm/scylla:
sstables: validate originating host id
sstable: add is_uploaded() predicate
config: make the ME sstable format default
scylla-gdb.py: recognize ME sstables
sstables: store originating host id in stats metadata
system_keyspace: cache local host id before flushing
database_test: ensure host id continuity
sstables_manager: add get_local_host_id() method and support
sstables_manager: formalize inheritability
system_keyspace, main: load (or create) local host id earlier
sstable_3_x_test: test ME sstable format too
add "ME_SSTABLE" cluster feature
add "sstable_format" config
add support for the ME sstable format
scylla-sstable: add ability to dump optionals and utils::UUID
sstables: add ability to write and parse optionals
globalize sstables::write(..., utils::UUID)
Later in this series the ME sstable format is made default, which
means that `system.local` will likely be written as ME.
Since, in ME, originating host id is a part of sstable stats metadata,
the local host id needs to either already be cached by the time
`system.local` is flushed, or to somehow be special-case-ignored when
flushing `system.local`.
The former (done here) is optimistic (cache before flush), but the
alternative would be an abstraction violation and would also cost a
little time upon each sstable write.
(Cache-before-flush could be undone by catching any exceptions during
flush and un-caching, but inability to `co_await` in catch clauses
makes the code look rather awkward. And there is no need to bother
because bootstrap failures should be fatal anyway)
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Since ME sstable format includes originating host id in stats
metadata, local host id needs to be made available for writing and
validation.
Both Scylla server (where local host id comes from the `system.local`
table) and unit tests (where it is fabricated) must be accomodated.
Regardless of how the host id is obtained, it is stored in the db
config instance and accessed through `sstables_manager`.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
We want it to be cached before any sstable is written, so do it right
after system_keyspace::minimal_setup().
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Initialize it to "md" until ME format support is
complete (i.e. storing originating host id in sstable stats metadata
is implemented), so at present there is no observable change by
default.
Also declare "enable_sstables_md_format" unused -- the idea, going
forward, being that only "sstable_format" controls the written sstable
file format and that no more per-format enablement config options
shall be added.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
CDC registers to the table-creation hook (before_create_column_family)
to add a second table - the CDC log table - to the same keyspace.
The handler function (on_before_update_column_family() in cdc/log.cc)
wants to retrieve the keyspace's definition, but that does NOT WORK if
we create the keyspace and table in one operation (which is exactly what
we intend to do in Alternator to solve issue #9868) - because at the
time of the hook, the keyspace does not yet exist in the schema.
It turns out that on_before_update_column_family() does not REALLY need
the keyspace. It needed it to pass it on to make_create_table_mutations()
but that function doesn't use the keyspace parameter passed to it! All
it needs is the keyspace's name - which is in the schema anyway and
doesn't need to be looked up.
So in this patch we fix make_create_table_mutations() to not require the
unused keyspace parameter - and fix the CDC code not to look for the
keyspace that is no longer needed.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220215162342.622509-1-nyh@scylladb.com>
This table will be used to persist the list of peers discovered by the
`discovery` algorithm that is used for creating Raft group 0 when
bootstrapping a fresh cluster.