Stop using database (and including database.hh) for schema related
purposes and use data_dictionary instead.
data_dictionary::database::real_database() is called from several
places, for these reasons:
- calling yet-to-be-converted code
- callers with a legitimate need to access data (e.g. system_keyspace)
but with the ::database accessor removed from query_processor.
We'll need to find another way to supply system_keyspace with
data access.
- to gain access to the wasm engine for testing whether used
defined functions compile. We'll have to find another way to
do this as well.
The change is a straightforward replacement. One case in
modification_statement had to change a capture, but everything else
was just a search-and-replace.
Some files that lost "database.hh" gained "mutation.hh", which they
previously had access to through "database.hh".
The new module will contain all schema related metadata, detached from
actual data access (provided by the database class). User types is the
first contents to be moved to the new module.
* seastar f8a038a0a2...8d15e8e67a (21):
> core/program_options: preserve defaultness of CLI arguments
> log: Silence logger when logging
> Include the core/loop.hh header inside when_all.hh header
> http: Fix deprecated wrappers
> foreign_ptr: Add concept
> util: file: add read_entire_file
> short_streams: move to util
> Revert "Merge: file: util: add read_entire_file utilities"
> foreign_ptr: declare destroy as a static method
> Merge: file: util: add read_entire_file utilities
> Merge "output_stream: handle close failure" from Benny
> net: bring local_address() to seastar::connected_socket.
> Merge "Allow programatically configuring seastar" from Botond
> Merge 'core: clean up memory metric definitions' from John Spray
> Add PopOS to debian list in install-dependencies.sh
> Merge "make shared_mutex functions exception safe and noexcept" from Benny
> on_internal_error: set_abort_on_internal_error: return current state
> Implementation of iterator-range version of when_any
> net: mark functions returning ethernet_address noexcept
> net: ethernet_address: mark functions noexcept
> shared_mutex: mark wake and unlock methods noexcept
Contains patch from Botond Dénes <bdenes@scylladb.com>:
db/config: configure logging based on app_template::seastar_options
Scylla has its own config file which supports configuring aspects of
logging, in addition to the built-in CLI logging options. When applying
this configuration, the CLI provided option values have priority over
the ones coming from the option file. To implement this scylla currently
reads CLI options belonging to seastar from the boost program options
variable map. The internal representation of CLI options however do not
constitute an API of seastar and are thus subject to change (even if
unlikely). This patch moves away from this practice and uses the new
shiny C++ api: `app_template::seastar_options` to obtain the current
logging options.
Since sstable reader was already converted to flat_mutation_reader_v2, compaction layer can naturally be converted too.
There are many dependencies that use v1. Those strictly needed like readers in sstable set, which links compaction to sstable reader, were converted to v2 in this series. For those that aren't essential we're relying on V1<-->V2 adaptors, and conversion work on them will be postponed. Those being postponed are: scrub specialized reader (needs a validator for mutation_fragment_v2), interposer consumer, combined reader which is used by incremental selector. incremental selector itself was converted to v2.
tests: unit(debug).
Closes#9725
* github.com:scylladb/scylla:
compaction: update compaction::make_sstable_reader() to flat_mutation_reader_v2
sstable_set: update make_crawling_reader() to flat_mutation_reader_v2
sstable_set: update make_range_sstable_reader() to flat_mutation_reader_v2
sstable_set: update make_local_shard_sstable_reader() to flat_mutation_reader_v2
sstable_set: update incremental_reader_selector to flat_mutation_reader_v2
As part of the drive to move over to flat_mutation_reader_v2, update
make_filtering_reader(). Since it doesn't examine range tombstones
(only the partition_start, to filter the key) the entire patch
is just glue code upgrading and downgrading users in the pipeline
(or removing a conversion, in one case).
Test: unit (dev)
Closes#9723
In the very recent commit 3c0e703 fixing issue #8757, we changed the
default prometheus_address setting in scylla.yaml to "localhost", to
match the default listen_address in the same file. We explained in that
commit how this helped developers who use an unchanged scylla.yaml, and
how it didn't hurt pre-existing users who already had their own scylla.yaml.
However, it was quickly noted by Tzach and Amnon that there is one use case
that was hurt by that fix:
Our existing documentation, such as the installation guide
https://www.scylladb.com/download/?platform=centos ask the user to take
our initial scylla.yaml, and modify listen_address, rpc_address, seeds,
and cluster_name - and that's it. That document - and others - don't
tell the user to also override prometheus_address, so users will likely
forget to do so - and monitoring will not work for them.
So this patch includes a different solution to #8757.
What it does is:
1. The setting of prometheus_address in scylla.yaml is commented out.
2. In config.cc, prometheus_address defaults to empty.
3. In main.cc, if prometheus_address is empty (i.e., was not explicitly
set by the user), the value of listen_address is used instead.
In other words, the idea is that prometheus_address, if not explicitly set
by the user, should default to listen_address - which is the address used
to listen to the internal Scylla inter-node protocol.
Because the documentation already tells the user to set listen_address
and to not leave it set to localhost, setting it will also open up
prometheus, thereby solving #9701. Meanwhile, developers who leave the
default listen_address=localhost will also get prometheus_address=localhost,
so the original #8757 is solved as well. Finally, for users who had an old
scylla.yaml where prometheus_address was explicitly set to something,
this setting will continue to be used. This was also a requirement of
issue #8757.
Fixes#9701.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211129155201.1000893-1-nyh@scylladb.com>
As part of changing the codebase to flat_mutation_reader_v2,
change size_estimates_virtual_reader.
Since the bulk of the work is done by
make_flat_mutation_reader_from_mutations() (which is unchanged),
only glue code is affected. It is also not performance sensitive,
so the extra conversions are unimportant.
Test: unit (dev)
Closes#9707
As part of changing the codebase to flat_mutation_reader_v2,
change chained_delegating_reader and its user virtual_table.
Since the reader does not process fragments (only forwarding
things around), only glue code is affected. It is also not
performance sensitive, so the extra conversions are unimportant.
Test: unit (dev)
Closes#9706
"
To ensure consistency of schema and topology changes,
Scylla needs a linearizable storage for this data
available at every member of the database cluster.
The series introduces such storage as a service,
available to all Scylla subsystems. Using this service, any other
internal service such as gossip or migrations (schema) could
persist changes to cluster metadata and expect this to be done in
a consistent, linearizable way.
The series uses the built-in Raft library to implement a
dedicated Raft group, running on shard 0, which includes all
members of the cluster (group 0), adds hooks to topology change
events, such as adding or removing nodes of the cluster, to update
group 0 membership, ensures the group is started when the
server boots.
The state machine for the group, i.e. the actual storage
for cluster-wide information still remains a stub. Extending
it to actually persist changes of schema or token ring
is subject to a subsequent series.
Another Raft related service was implemented earlier: Raft Group
Registry. The purpose of the registry is to allow Scylla have an
arbitrary number of groups, each with its own subset of cluster
members and a relevant state machine, sharing a common transport.
Group 0 is one (the first) group among many.
"
* 'raft-group-0-v12' of github.com:scylladb/scylla-dev:
raft: (server) improve tracing
raft: (metrics) fix spelling of waiters_awaken
raft: make forwarding optional
raft: (service) manage Raft configuration during topology changes
raft: (service) break a dependency loop
raft: (discovery) introduce leader discovery state machine
system_keyspace: mark scylla_local table as always-sync commitlog
system_keyspace: persistence for Raft Group 0 id and Raft Server Id
raft: add a test case for adding entries on follower
raft: (server) allow adding entries/modify config on a follower
raft: (test) replace virtual with override in derived class
raft: (server) fix a typo in exception message
raft: (server) implement id() helper
raft: (server) remove apply_dummy_entry()
raft: (test) fix missing initialization in generator.hh
System dirty memory space is limited by 10MB capacity.
This means that memtables cannot accumulate more than
5MB before they are flushed to sstables.
This can impact performance under load.
Move the `system.raft` table to the regular dirty
memory space.
Fixes: #9692
Tests: unit(dev)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20211129200044.1144961-1-pa.solodovnikov@scylladb.com>
The experimental_features_t has an all() method, supposedly returning
all values of the enum - but it's easy to forget to update it when
adding a new experimental feature - and it's currently out-of-sync
(it's missing the ALTERNATOR_TTL option).
We already have another method, map(), where a new experimental feature
must be listed otherwise it can't be used, so let's just take all()'s
values from map(), automatically, instead of forcing developers to keep
both lists up-to-date.
Note that using the all() function to enable all experimental features
is not recommended - the best practice is to enable specific experimental
features, not all of them. Nevertheless, this all() function is still used
in one place - in the cql_repl tool - which uses it to enable all
experimental features.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211108135601.78460-1-nyh@scylladb.com>
"
This set covers simple but diverse cases:
- cache hitrace calculator
- repair
- system keyspace (virtual table)
- dht code
- transport event notifier
All the places just require straightforward arguments passing.
And a reparation in transport -- event notifier needs a backref
to the owning server.
Remaining after this set is the snitch<->gossiper interaction
and the cache hitrate app state update from table code.
tests: unit(dev)
"
* 'br-unglobal-gossiper-cont' of https://github.com/xemul/scylla:
transport: Use server gossiper in event notifier
transport: Keep backreference from event_notifier
transport: Keep gossiper on server
dht: Pass gossiper to range_streamer::add_ranges
dht: Pass gossiper argument to bootstrap
system_keyspace: Keep gossiper on cluster_status_table
code: Carry gossiper down to virtual tables creation
repair: Use local gossiper reference
cache_hitrate_calculator: Keep reference on gossiper
Re-enable previously persisted enabled features on node
startup. The features list to be enabled is read from
`system.local#enabled_features`.
In case an unknown feature is encountered, the node
fails to boot with an exception, because that means
the node is doing a prohibited downgrade procedure.
Features should be enabled before commitlog starts replaying
since some features affect storage (for example, when
determining used sstable format).
This patch implements a part of solution proposed by Tomek
in https://github.com/scylladb/scylla/issues/4458.
Tests: unit(dev)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
It is infrequently updated (typically once at start) but stores
critical state for this instance survival (Raft Group 0 id, Raft
server id, sstables format), so always write it to commit log
in sync mode.
Implement system_keyspace helpers to persist Raft Group 0 id
and Raft Server id.
Do not use coroutines in a template function to work around
https://bugs.llvm.org/show_bug.cgi?id=50345
One of the tables needs gossiper and uses global one. This patch
prepares the fix by patching the main -> register_virtual_tables
stack with the gossiper reference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
And rename to get_batchlog_mutation_for while at it,
as it's about the batchlog, not batch_log.
This resolves a circular dependency between the
batchlog_manager and the storage_proxy that required
it in the case.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
There's nothing in this function that actually requries
the batchlog manager instance.
It uses a random number engine that's moved along with it
to class gossiper.
This resolves a circular dependency between the
batchlog_manager and storage_proxy.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Ssimplify the function implemention and error handling
by invoking a lambda coroutine on shard 0 that keeps
a gate holder and semaphore units on its stack, for RAII-
style unwinding.
It then may invoke a function on another shard, using
the peered service container() to do the
replay on the destination shard.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
As a prerequisite to globalizing the batchlog_manager,
allow setting a global pointer to it and instantiate
the sharded<db::batchlog_manager> on the main/cql_test_env
stack.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To avoid back-calling the system_keyspace from the messaging layer
let the system_keyspace get the preferred ips vector and pass it
down to the messaging_service.
This is part of the effort to deglobalize the system keyspace
and query context.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211119143523.3424773-1-bhalevy@scylladb.com>
fmt 8 checks format strings at compile time, and requires that
non-compile-time format strings be wrapped with fmt::runtime().
Do that, and to allow coexistence with fmt 7, supply our own
do-nothing version of fmt::runtime() if needed. Strictly speaking
we shouldn't be introducing names into the fmt namespace, but this
is transitional only.
Closes#9640
Fixes#9630
Adds support for importing a CRL certificate reovcation list. This will be
monitored and reloaded like certs/keys. Allows blacklisting individual certs.
Closes#9655
If we get errors/exceptions in delete_segments we can (and probably will) loose track of disk footprint counters. This can in turn, if using hard limits, cause us to block indefinitely on segment allocation since we might think we have larger footprint than we actually do.
Of course, if we actually fail deleting a segment, it is 100% true that we still technically hold this disk footprint (now unreachable), but for cases where for example outside forces (or wacky tests) delete a file behind our backs, this might not be true. One could also argue that our footprint is the segments and file names we keep track of, and the rest is exterior sludge.
In any case, if we have any exceptions in delete_segments, we should recalculate disk footprint based on current state, and restart all new_segment paths etc.
Fixes#9348
(Note: this is based on previous PR #9344 - so shows these commits as well. Actual changes are only the latter two).
Closes#9349
* github.com:scylladb/scylla:
commitlog: Recalculate footprint on delete_segment exceptions
commitlog_test: Add test for exception in alloc w. deleted underlying file
commitlog: Ensure failed-to-create-segment is re-deleted
commitlog::allocate_segment_ex: Don't re-throw out of function
Other than looking sane, this change continues the founded by the
--workdir option tradition of freeing the developer form annoying
necessity to type too many options when scylla is started by hand
for devel purposes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20211116104815.31822-1-xemul@scylladb.com>
Add schema parameter so that:
* Caller has better control over schema -- especially relevant for
reverse reads where it is not possible to follow the convention of
passing the query schema which is reversed compared to that of the
mutations.
* Now that we don't depend on the mutations for the schema, we can lift
the restriction on mutations not being empty: this leads to safer
code. When the mutations parameter is empty, an empty reader is
created.
Add "make_" prefix to follow convention of similar reader factory
functions.
Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211115155614.363663-1-bdenes@scylladb.com>
Add "rows" field to system.large_partitions. Add partitions to the
table when they are too large or have too many rows.
Fixes#9506
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Closes#9577
A config option value is reported as 'text' type and contains
a string as it would looks like in json config.
The table is UPDATE-able. Only the 'value' columnt can be set
and the value accepted must be string. It will be converted into
the option type automatically, however in current implementation
is't not 100% precise -- conversion is lexicographical cast which
only works for simple types. However, liveupdate-able values are
only of those types, so it works in supported cases.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The db::config reference is available on the database, which
can be get from the virtual_table itself. The problem is that
it's a const refernece, while system.config will be updateable
and will need non-const reference.
Adding non-const get_config() on the database looks wrong. The
database shouldn't be used as config provider, even the const
one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The intention is to return some meaningful info to the CQL caller
if a virtual table update fails. Unfortunately the "generic" error
reporting in CQL is not extremely flexible, so the best option
seems to report regular write failre with custom message in it.
For now this only works for virtual table errors.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Symmetrically to virtual reader one, add the virtual writer
callback on a table that will be in charge of applying the
provided mutation.
If a virtual table doesn't override this apply method the
dedicated exception is thrown. Next patch will catch it and
propagate back to caller, so it's a new exception type, not
existing/std one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There soon will appear an updateable system.config table that
will push sstrings into names_value-s. Prepare for this change
by adding the respective .set_value() call. Since the update
only works for LiveUpdate-able options, and inability to do it
can be propagated back to the caller make this method return
true/false whether the update took place or not.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Refs #9331
In segment::close() we add space to managers "wasted" counter. In destructor,
if we can cleanly delete/recycle the file we remove it. However, if we never
went through close (shutdown - ok, exception in batch_cycle - not ok), we can
end up subtracting numbers that were never added in the first place.
Just keep track of the bytes added in a var.
Observed behaviour in above issue is timeouts in batch_cycle, where we
declare the segment closed early (because we cannot add anything more safely
- chunks could get partial/misplaced). Exception will propagate to caller(s),
but the segment will not go through actual close() call -> destructor should
not assume such.
Closes#9598
"
On start scylla resolves several hostnames into addresses. Different
places use different hostname selection logic, e.g. the API address
can be the listen one if the dedicated option not set. Failure to
resolve a hostname is reported with an exception that (sometimes)
contains the hostname, but it doesn't look very convenient -- better
to know the config option name. Also resolving of different hostnames
has different decoration around, e.g. prometheus carries a main-local
lambda just to nicely wrap the try/catch block.
This set unifies this zoo and makes main() shorter and less hairy:
1. All failures to resolve a hostname are reported with an
exception containing the relevant config option
2. The || operator for named_value's is introduced to make
the option selection look as short as
resolve(cfg->some_address() || cfg->another_address())
3. All sanity checks are explicit and happen early in main
4. No dangling local variables carrying the cfg->...() value
5. Use resolved IP when logging a "... is listening on ..."
message after a service start
tests: unit(dev)
"
* 'br-ip-resolve-on-start' of https://github.com/xemul/scylla:
main: Move fb-utilities initialization up the main
code: Use utils::resolve instead of inet_address::lookup
main: Remove unused variable
main: Sanitize resolving of listen address
main: Sanitize resolving of broadcast address
main: Sanitize resolving of broadcast RPC address
main: Sanitize resolving of API address
main: Sanitize resolving of prometheus address
utils: Introduce || operator for named_values
db.config: Verbose address resolver helper
main: Remove api-port and prometheus-port variables
alternator: Resolve address with the help of inet_address
redis, thrift: Remove unused captures
For altered tables, the above function creates schema objects
representing before/after (old/new) table states. In case of views,
there is a matching mechanism to set the base table field of the view to
the appropriate base table object. This works by iterating over the list
of altered tables and selecting the "new_schema" field of the first
instance matching the keyspace/name of the base-table. This ends up
pairing the after/old version of the base table to both the before and
after version of the view. This means the base attached to the view is
possibly incompatible with the view it is attached to.
This patch fixes this by passing the schema generation (before/after) to
the function responsible for this matching, so it can select the
appropriate version of the base class.
For example, given the following input to `merge_tables_and_views()`:
tables_before = { t1_before }
tables_after = { t1_after }
views_before = { v1_before }
views_after = { v1_after }
Before this patch, the `base_schema` field of `v1_before` would be
`t1_after`, while it obviously should be `t1_before`. This sounds scary
but has no practical implications currently as `v1_before` is only
computed and then discarded without being used.
Tests: unit(dev)
Fixes: #9586
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211108124806.151268-1-bdenes@scylladb.com>
There are some users of the latter call left. They all suffer
from the same problem -- the lack of verbosity on resolving
errors.
While at it also get rid of useless local variables that are
only there to carry the cfg->...() option over.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The helper works on named_value() and throws and exception containing
the option name for convenient error reporting.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
System tables currently almost uniformly use a pattern like this to
create their schema:
return schema_builder(make_shared_schema(...))
// [...]
.with_version(...)
.build(...);
This pattern is very wasteful because it first creates a schema, then
dismantles it just to recreate it again. This series abolishes this
pattern without much churn by simply adding a constructor to schema
builder that takes identical parameters to `make_shared_schema()`,
then simply removing `make_shared_schema()` from these users, who now
build a schema builder object directly and build the schema only once.
Tests: unit(dev)
"
* 'schema-builder-make-shared-schema-ctor/v1' of https://github.com/denesb/scylla:
treewide: system tables: don't use make_shared_schema() for creating schemas
schema_builder: add a constructor providing make_shared_schema semantics
schema_builder: without_column(): don't assume column_specification exists
schema: add static variant of column_name_type()
Contains all version related information (`nodetool version` and more).
Example printout:
(cqlsh) select * from system.versions;
key | build_id | build_mode | version
-------+------------------------------------------+------------+-------------------------------
local | aaecce2f5068b0160efd04a09b0e28e100b9cd9e | dev | 4.6.dev-0.20211021.0d744fd3fa
Loosly contains the equivalent of the `nodetool info` command, with some
notable differences:
* Protocol server related information is in `system.protocol_servers`;
* Information about memory, memtable and cache is reformatted to be
tailored to scylla: C* specific terminology and metrics are dropped;
* Information that doesn't change and is already in `system.local` is
not contained;
* Added trace-probability too (`nodetool gettraceprobability`);
TODO(follow-up): exceptions.
Lists all the client protocol server and their status. Example output:
(cqlsh) select * from system.protocol_servers;
name | is_running | listen_addresses | protocol | protocol_version
------------------+------------+---------------------------------------+----------+------------------
native transport | True | ['127.0.0.1:9042', '127.0.0.1:19042'] | cql | 3.3.1
alternator | False | [] | dynamodb |
rpc | False | [] | thrift | 20.1.0
redis | False | [] | redis |
This prints the equivalent of `nodetool statusbinary` and the "Thrift
active" and "Native Transport active" fields from the `nodetool info`
output with some additional information:
* It contains alternator and redis status;
* It contains the protocol version;
* It contains the listen addresses (if respective server is running);