No code uses global gossiper instance, it can be removed. The main and
cql-test-env code now have their own real local instances.
This change also requires adding the debug:: pointer and fixing the
scylla-gdb.py to find the correct global location.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is needed not to mess with removed global gossiper in the next
patch. Other than this, it's better to access services by their own
debug:: pointers, not via under-the-hood dependency chains.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some places in the code have a function-local gossiper reference but
continue to use the global instance. Re-use the local reference (it's going
to become a sharded<> instance soon).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The reference is put on the snitch_ptr because this is the sharded<>
thing and because gossiper reference is the same for different snitch
drivers. Also, getting gossiper from snitch_ptr by driver will look
simpler than getting it from any base class.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Snitch depends on gossiper and system keyspace, so it needs to be
started after those two do.
Fixes #10402
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We start the memory threshold guard (that enables large memory allocation
warnings post-boot) but don't wait for it. I can't imagine it can hurt,
but it does carry a FIXME label.
Closes #10375
* 'raft_group0_early_startup_v3' of https://github.com/ManManson/scylla:
main: allow joining raft group0 before waiting for gossiper to settle
service: raft_group0: make `join_group0` re-entrant
service: storage_service: add `join_group0` method
raft_group_registry: update gossiper state only on shard 0
raft: don't update gossiper state if raft is enabled early or not enabled at all
gms: feature_service: add `cluster_uses_raft_mgmt` accessor method
db: system_keyspace: add `bootstrap_needed()` method
db: system_keyspace: mark getter methods for bootstrap state as "const"
"
There's a generic way to start-stop services in scylla, that includes
5 "actions" (some are optional and/or implicit though)
service_config cfg = ...
sharded<service>.start(cfg)
service.invoke_on_all(&service::start)
service.invoke_on_all(&service::shutdown)
service.invoke_on_all(&service::stop)
sharded<service>.stop()
and most of the services out there conform to that scheme. Not snitch
(spoiler: and not tracing), for which there are a couple of helpers that
do all that magic behind the scenes; "configuring" snitch is done with
the help of overloaded constructors. The latter is extra complicated
by the need to register snitch drivers in the class-registry for each
constructor overload. Also there's an external shards synchronization
on stop.
This set brings snitch start/stop code to the described standard: the
create/stop helpers are removed, creation accepts the config structure,
per-shard start/stop (snitch has no drain for now) happens in the
simple invoke-on-all manner.
The intended side effect of this change is the ability to add explicit
dependencies to snitch (in the future, not in this set).
tests: unit(dev)
"
* 'br-snitch-config' of https://github.com/xemul/scylla:
snitch: Remove create_snitch/stop_snitch
snitch: Simplify stop (and pause_io)
snitch: Move io_is_stopped to property-file driver
snitch: Remove init_snitch_obj()
snitch: Move instance creation into snitch_ptr constructor
snitch: Make config-based construction of all drivers
snitch: Declare snitch_ptr peering and rework container() method
snitch: Introduce container() method
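The five-step start/stop scheme from the cover letter above can be
sketched without seastar. This is only an illustration: toy_sharded and
toy_service are hypothetical stand-ins that keep one instance per "shard"
in a vector and run everything synchronously, which the real sharded<>
does not do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical config and service; the names are illustrative only.
struct service_config {
    std::string name;
};

struct toy_service {
    service_config cfg;
    std::vector<std::string> log; // records which lifecycle steps ran

    explicit toy_service(service_config c) : cfg(std::move(c)) {}
    void start()    { log.push_back("start"); }
    void shutdown() { log.push_back("shutdown"); }
    void stop()     { log.push_back("stop"); }
};

// Toy imitation of the shape of a sharded<> container: no real per-core
// execution, no futures.
template <typename Service>
class toy_sharded {
    std::vector<Service> _instances;
    std::size_t _shards;
public:
    explicit toy_sharded(std::size_t shards) : _shards(shards) {}

    // Step 1: construct one instance per shard from the same config.
    void start(const service_config& cfg) {
        assert(_instances.empty());
        for (std::size_t i = 0; i < _shards; ++i) {
            _instances.emplace_back(cfg);
        }
    }

    // Steps 2-4: run a member function on every instance.
    void invoke_on_all(void (Service::*fn)()) {
        for (auto& s : _instances) {
            (s.*fn)();
        }
    }

    // Step 5: destroy all instances.
    void stop() { _instances.clear(); }

    std::size_t size() const { return _instances.size(); }
    const Service& on_shard(std::size_t i) const { return _instances.at(i); }
};
```

With this shape the caller owns the whole sequence, so a dependency can
later be passed through the config or the constructor without touching
any helper.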
A node can join group0 without waiting for gossiper if
it is either a fresh node, or it's an existing node, which
is already part of some group0 (i.e. have `group0_id` persisted
in system tables).
In that case the second `join_group0()` call inside the
`storage_service::join_token_ring` will be a no-op.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
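The early-join rule and the re-entrant join described above can be
sketched as follows. node_state and toy_group0 are hypothetical models
made up for this example, not the real classes or system table layout.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the persisted node state.
struct node_state {
    bool fresh_node = false;              // nothing bootstrapped yet
    std::optional<std::string> group0_id; // set once the node is in group0
};

// The rule from the text: join early if the node is fresh, or if it
// already belongs to some group0.
inline bool can_join_group0_before_gossip_settles(const node_state& st) {
    return st.fresh_node || st.group0_id.has_value();
}

// A re-entrant join: the first call does the work, later calls are no-ops.
class toy_group0 {
    bool _joined = false;
    int _join_attempts = 0;
public:
    void join_group0() {
        if (_joined) {
            return; // e.g. the second call from join_token_ring
        }
        ++_join_attempts;
        _joined = true;
        assert(_join_attempts == 1);
    }
    int join_attempts() const { return _join_attempts; }
};
```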
Repair updates (and queries on start) the system.repair_history table
and thus depends on the system_keyspace object
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
After the previous patches, both create_snitch() and stop_snitch() now look
like the classical sharded service start/stop sequence. Finally, both
helpers can be removed and the rest of the users can just call start/stop
on locally obtained sharded references.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently snitch drivers register themselves in class-registry with all
sorts of construction options possible. All those different constructors
are in fact "config options".
When snitch later declares its dependencies (gossiper and system
keyspace), it will require patching all these registrations, which is very
inconvenient.
This patch introduces the snitch_config struct and replaces all the
snitch constructors with the snitch_driver(snitch_config cfg) one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
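A rough sketch of why a single config-taking constructor simplifies
registration. The snitch_config fields, the two driver classes and the
registry below are all assumptions made for this example; the real
class-registry and driver set look different.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical config: the fields stand in for whatever the former
// constructor overloads used to take.
struct snitch_config {
    std::string name = "SimpleSnitch";
    std::string properties_file_name;
};

struct snitch_driver {
    virtual ~snitch_driver() = default;
    virtual std::string describe() const = 0;
};

struct simple_snitch : snitch_driver {
    explicit simple_snitch(const snitch_config&) {}
    std::string describe() const override { return "simple"; }
};

struct property_file_snitch : snitch_driver {
    std::string file;
    explicit property_file_snitch(const snitch_config& cfg)
        : file(cfg.properties_file_name) {}
    std::string describe() const override { return "property-file:" + file; }
};

// One factory signature for every driver, so adding a field to
// snitch_config does not require touching any registration.
using snitch_factory =
    std::function<std::unique_ptr<snitch_driver>(const snitch_config&)>;

inline const std::map<std::string, snitch_factory>& snitch_registry() {
    static const std::map<std::string, snitch_factory> registry = {
        {"SimpleSnitch",
         [](const snitch_config& c) -> std::unique_ptr<snitch_driver> {
             return std::make_unique<simple_snitch>(c);
         }},
        {"PropertyFileSnitch",
         [](const snitch_config& c) -> std::unique_ptr<snitch_driver> {
             return std::make_unique<property_file_snitch>(c);
         }},
    };
    return registry;
}

inline std::unique_ptr<snitch_driver> make_snitch(const snitch_config& cfg) {
    auto it = snitch_registry().find(cfg.name);
    if (it == snitch_registry().end()) {
        throw std::invalid_argument("unknown snitch: " + cfg.name);
    }
    auto driver = it->second(cfg);
    assert(driver != nullptr);
    return driver;
}
```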
The gossiper reads peer features from system keyspace. Also the snitch
code needs system keyspace, and since now it gets all its dependencies
from gossiper (will be fixed some day, but not now), it will do the same
for the system keyspace. Thus it's worth having an explicit
gossiper->system_keyspace dependency.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The service uses system keyspace to, e.g., manage the generation id,
thus it depends on the system_keyspace instance and deserves the
explicit reference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Prior to the change, `USES_RAFT_CLUSTER_MANAGEMENT` feature wasn't
properly advertised upon enabling `SUPPORTS_RAFT_CLUSTER_MANAGEMENT`
raft feature.
This small series consists of 3 parts to fix the handling of supported
features for raft:
1. Move subscription for `SUPPORTS_RAFT_CLUSTER_MANAGEMENT` to the
`raft_group_registry`.
2. Update `system.local#supported_features` directly in the
`feature_service::support()` method.
3. Re-advertise gossiper state for `SUPPORTED_FEATURES` gossiper
value in the support callback within `raft_group_registry`.
* manmanson/track_supported_set_recalculation_v7:
raft: re-advertise gossiper features when raft feature support changes
raft: move tracking `SUPPORTS_RAFT_CLUSTER_MANAGEMENT` feature to raft
gms: feature_service: update `system.local#supported_features` when feature support changes
test: cql_test_env: enable features in a `seastar::thread`
Move the listener from feature service to the `raft_group_registry`.
Enable support for the `USES_RAFT_CLUSTER_MANAGEMENT`
feature when the former is enabled.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
The method needs to call merge_schema() that will need system keyspace
instance at hand. The parse_s._t. method is a boot-time one, so pushing the
main-local instance through it is fine.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The main target here is system_keyspace::update_schema_version() which
is now static, but needs to have system_keyspace at "this". Migration
manager is one of the places that calls that method indirectly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's called only on start and actively uses both qctx and local
cache. Next patches will fix the whole setup code to stop using
global qctx/cache.
For now setup invocation is left in its place, but it must really
happen in start() method. More patching is needed to make it work.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
For now it's a reference, but all users of the cache will be
eventually switched into using system_keyspace.
In cql-test-env cache starting happens earlier than it was
before, but that's OK, it just initializes empty instances.
In main cache starts at the same time as before patching.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Start happens at exactly the same place. One thing to take care
of is that it happens on all shards.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The db::system_keyspace was made a class some time ago, time to create
a standard sharded<> object out of it. It needs query processor and
database. Neither of those dependencies is started early enough, so the
object for now starts in two steps -- early instances creation and
late start.
The instances will carry qctx and local_cache on board and all the
services that need those two will depend on system-keyspace. Its start
happens at exactly the same place where system_keyspace::setup happens
thus any service that will use system_keyspace will be on the same
safe side as it is now.
In the further future the system_keyspace will be equipped with its
own query processor backed by local replica database instance, instead
of the whole storage proxy as it is now.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently any unhandled error during deferred shutdown
is rethrown in a noexcept context (in ~deferred_action),
generating a core dump.
The core dump is not helpful if the cause of the
error is "environmental", i.e. in the system, rather
than in scylla itself.
This change detects several such errors and calls
_Exit(255) to exit the process early, without leaving
a coredump behind. Otherwise, call abort() explicitly,
rather than letting terminate() be called implicitly
by the destructor exception handling code.
Fixes #9573
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220227101054.1294368-1-bhalevy@scylladb.com>
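The decision described above can be sketched like this. The text only
says "several such errors", so the errno values treated as environmental
here are an assumption chosen for illustration, and the function names
are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <exception>
#include <system_error>

enum class shutdown_action { exit_quietly, abort_with_core };

// Assumption: this particular set of errno values is illustrative only.
inline bool is_environmental(const std::system_error& e) {
    const std::error_code& code = e.code();
    if (code.category() != std::generic_category() &&
        code.category() != std::system_category()) {
        return false;
    }
    switch (code.value()) {
    case ENOSPC:
    case EIO:
    case EACCES:
    case EPERM:
        return true;
    default:
        return false;
    }
}

// Decide what to do with an error caught during deferred shutdown.
inline shutdown_action classify_shutdown_error(std::exception_ptr ep) {
    assert(ep != nullptr);
    try {
        std::rethrow_exception(ep);
    } catch (const std::system_error& e) {
        return is_environmental(e) ? shutdown_action::exit_quietly
                                   : shutdown_action::abort_with_core;
    } catch (...) {
        return shutdown_action::abort_with_core;
    }
}

// Exit without a core dump for environmental errors, abort() otherwise.
[[noreturn]] inline void handle_shutdown_error(std::exception_ptr ep) {
    if (classify_shutdown_error(ep) == shutdown_action::exit_quietly) {
        std::_Exit(255);
    }
    std::abort();
}
```

Keeping the classification separate from the exit call makes the policy
testable without killing the process.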
Now the connection_notifier is all gone, only the client_data bits are left.
To keep it consistent -- rename the files.
Also, while at it, brush up the header dependencies and remove the not
really used constexprs for client states.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This includes most of the connection_notifier stuff as well as
the auxiliary code from system_keyspace.cc and a bunch of
updating calls from the client state changing.
Other than less code and fewer disk updates on client connection
paths, this removes one usage of the nasty global qctx thing.
Since system.clients goes away, rename system.clients_v
here too, so the table is always present out there.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Since ME sstable format includes originating host id in stats
metadata, local host id needs to be made available for writing and
validation.
Both Scylla server (where local host id comes from the `system.local`
table) and unit tests (where it is fabricated) must be accommodated.
Regardless of how the host id is obtained, it is stored in the db
config instance and accessed through `sstables_manager`.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
We want it to be cached before any sstable is written, so do it right
after system_keyspace::minimal_setup().
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
main() has some logic to select the main function it will delegate to
based on argv[1]. The intent is that when the value of argv[1] suggests
that the user did not specify a specific app to run, we default to
"server" (scylla proper).
This logic currently breaks down when there are no arguments at all: in
this case the following error is printed and scylla refuses to start:
error: unrecognized first argument: expected it to be "server", a regular command-line argument or a valid tool name (see `scylla --list-tools`), but got
Fix this by checking for empty argv[1] and defaulting to "server" in
that case.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220210092125.293682-1-bdenes@scylladb.com>
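A minimal sketch of the selection rules, including the no-argument case
fixed here. The tool names and the select_app() helper are hypothetical;
only the rules themselves (empty or option-like argv[1] means "server",
anything else unknown is an error) come from the commit messages.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <string_view>

// Hypothetical tool names, used only to make the sketch concrete.
inline const std::set<std::string_view>& known_tools() {
    static const std::set<std::string_view> tools = {"example-tool-a",
                                                     "example-tool-b"};
    return tools;
}

struct selection {
    std::string app;    // "server" or a tool name
    bool consume_argv1; // true if argv[1] named the app and should be dropped
};

// Returns std::nullopt when argv[1] looks like a mistyped app name.
inline std::optional<selection> select_app(int argc, const char* const* argv) {
    assert(argc >= 1 && argv != nullptr);
    // With no arguments at all, treat argv[1] as empty.
    const std::string_view name = (argc >= 2 && argv[1] != nullptr)
                                      ? std::string_view(argv[1])
                                      : std::string_view();
    if (name.empty() || name.front() == '-') {
        // Missing, or an option meant for the server: default silently.
        return selection{"server", false};
    }
    if (name == "server") {
        return selection{"server", true};
    }
    if (known_tools().count(name) != 0) {
        return selection{std::string(name), true};
    }
    return std::nullopt; // caller prints the "unrecognized first argument" error
}
```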
The new service is responsible for:
* spreading forward_request execution across multiple nodes in cluster
* collecting forward_request execution results and merging them
The `forward_service::dispatch` method takes a forward_request as an
argument, and forwards its execution to a group of other nodes (using the rpc
verb added in previous commits). Each node (in the group chosen by the
dispatch method) is provided with forward_request, which is no different
from the original argument except for changed partition ranges. They are
changed so that vnodes contained in them are owned by recipient node.
Executing forward_request is realized in `forward_service::execute`
method, that is registered to be called on FORWARD_REQUEST verb receipt.
The process of executing a forward_request consists of mocking a few
non-serializable objects (such as `cql3::selection`) in order to create
`service::pager::query_pagers::pager` and `cql3::selection::result_set_builder`.
After pager and result_set_builder creation, execution process resembles
what might be seen in select_statement's execution path.
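The range-splitting step of dispatch can be sketched as below. Everything
here is a simplified assumption: token ranges are plain half-open integer
intervals, each vnode has exactly one owner, and the merge step is shown
for a count-style aggregate only. The type and function names are
hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <numeric>
#include <string>
#include <vector>

struct token_range {
    std::int64_t start; // inclusive
    std::int64_t end;   // exclusive
};

struct forward_request {
    std::string query;
    std::vector<token_range> ranges;
};

// Hypothetical ownership table: one owning node per vnode range.
struct vnode {
    token_range range;
    std::string owner;
};

// Build one request per node; it equals the original except that its
// ranges are narrowed to the vnodes owned by that node.
inline std::map<std::string, forward_request>
split_by_owner(const forward_request& req, const std::vector<vnode>& ring) {
    std::map<std::string, forward_request> per_node;
    for (const auto& r : req.ranges) {
        assert(r.start <= r.end);
        for (const auto& v : ring) {
            const auto lo = std::max(r.start, v.range.start);
            const auto hi = std::min(r.end, v.range.end);
            if (lo < hi) { // non-empty intersection
                auto& sub = per_node[v.owner];
                sub.query = req.query;
                sub.ranges.push_back({lo, hi});
            }
        }
    }
    return per_node;
}

// Merge partial results collected from the nodes (count-style aggregate).
inline std::uint64_t merge_counts(const std::vector<std::uint64_t>& partials) {
    return std::accumulate(partials.begin(), partials.end(), std::uint64_t{0});
}
```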
Alternator is a coordinator-side service and so should not access
the replica module. In this series all but one of the uses of the replica
module are replaced with data_dictionary.
One case remains - accessing the replication map which is not
available (and should not be available) via the data dictionary.
The data_dictionary module is expanded with missing accessors.
Closes #9945
* github.com:scylladb/scylla:
alternator: switch to data_dictionary for table listing purposes
data_dictionary: add get_tables()
data_dictionary: introduce keyspace::is_internal()
As a coordinator-side service, alternator shouldn't touch the
replica module, so it is migrated here to data_dictionary.
One use case still remains that uses replica::keyspace - accessing
the replication map. This really isn't a replica-side thing, but it's
also not logically part of the data dictionary, so it's left using
replica::keyspace (using the data_dictionary::database::real_database()
escape hatch). Figuring out how to expose the replication map to
coordinator-side services is left for later.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes were applied mechanically with a script, except for
licenses/README.md.
Closes #9937
distributed_loader is a replica-side thing, so it belongs in the
replica module ("distributed" refers to its ability to load
sstables in their correct shards). So move it to the replica
module.
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.
References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.
scylla-gdb.py is adjusted to look for both the new and old names.
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.
As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
Be silent when argv[1] starts with "-", it is probably an option to
scylla (and "server" is missing from the cmd line).
Print an error and stop when argv[1] doesn't start with "-" and thus the
user presumably meant to start either the server or a tool and mis-typed
it. Instead of trying to guess what they meant, stop with a clear error
message.
And make it the central place listing available tools (to minimize the
places to update when adding a new one). The description is edited to
point to this command instead of listing the tools itself.
Remove "compatible with Apache Cassandra", scylla is much more than that
already.
Rephrase the part describing the included tools such that it is clear
that the scylla server is the main thing and the tools are the "extra"
additions. Also use the term "tool" instead of the term "app".
The gc_grace_seconds is a very fragile and broken design inherited from
Cassandra. Deleted data can be resurrected if cluster wide repair is not
performed within gc_grace_seconds. This design pushes the job of keeping
the database consistent onto the user. In practice, it is very hard to
guarantee repair is performed within gc_grace_seconds all the time. For
example, repair workload has the lowest priority in the system which can
be slowed down by the higher priority workload, so that there is no
guarantee when a repair can finish. A gc_grace_seconds value that
used to work might stop working after the data volume grows in a cluster. Users
might want to avoid running repair during a specific period where
latency is the top priority for their business.
To solve this problem, an automatic mechanism to protect against data
resurrection is proposed and implemented. The main idea is to remove the
tombstone only after the range that covers the tombstone is repaired.
In this patch, a new table option tombstone_gc is added. The option is
used to configure tombstone gc mode. For example:
1) GC a tombstone after gc_grace_seconds
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ;
This is the default mode. If no tombstone_gc option is specified by the
user, the old gc_grace_seconds based gc will be used.
2) Never GC a tombstone
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'};
3) GC a tombstone immediately
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'};
4) GC a tombstone after repair
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'};
In addition to the 'mode' option, another option 'propagation_delay_in_seconds'
is added. It defines the max time a write could possibly delay before it
eventually arrives at a node.
A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc
option can only be used after the whole cluster supports the new
feature. A mixed cluster works with no problem.
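The four modes can be sketched as a single predicate. This is an
illustration under assumptions, not the implemented logic: in particular,
how the repair time and propagation delay combine (repair time minus
delay) and the strictness of the comparisons are guesses, and the type
and function names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

using seconds = std::chrono::seconds;

enum class tombstone_gc_mode { timeout, disabled, immediate, repair };

struct tombstone_gc_options {
    tombstone_gc_mode mode = tombstone_gc_mode::timeout;
    seconds propagation_delay{0}; // propagation_delay_in_seconds; value assumed
};

// All times are seconds since an arbitrary epoch. last_repair_time is the
// time of the last repair covering the tombstone's range, if any.
inline bool can_gc(const tombstone_gc_options& opts, seconds deletion_time,
                   seconds now, seconds gc_grace,
                   std::optional<seconds> last_repair_time) {
    assert(gc_grace >= seconds{0});
    switch (opts.mode) {
    case tombstone_gc_mode::timeout:
        // Classic behaviour: wait for gc_grace_seconds to pass.
        return deletion_time + gc_grace < now;
    case tombstone_gc_mode::disabled:
        return false;
    case tombstone_gc_mode::immediate:
        return true;
    case tombstone_gc_mode::repair:
        // Assumption: only tombstones older than the last repair, minus
        // the maximum write propagation delay, are safe to drop.
        if (!last_repair_time) {
            return false;
        }
        return deletion_time < *last_repair_time - opts.propagation_delay;
    }
    return false;
}
```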
Tests: compaction_test.py, ninja test
Fixes #3560
[avi: resolve conflicts vs data_dictionary]
"
A big problem with scylla tool executables is that they include the
entire scylla codebase and thus they are just as big as the scylla
executable itself, making them impractical to deploy on production
machines. We could try to combat this by selectively including only the
actually needed dependencies but even ignoring the huge churn of
sorting out our dependency hell (which we should do at one point anyway),
some tools may genuinely depend on most of the scylla codebase.
A better solution is to host the tool executables in the scylla
executable itself, switching between the actual main function to run
in some way. The tools themselves don't contain a lot of code so
this won't cause any considerable bloat in the size of the scylla
executable itself.
This series does exactly this, folds all the tool executables into the
scylla one, with main() switching between the actual main it will
delegate to based on the argv[1] command line argument. If this is a known
tool name, the respective tool's main will be invoked.
If it is "server", missing or unrecognized, the scylla main is invoked.
Originally this series used argv[0] as the means to switch between the
main to run. This approach was abandoned for the approach mentioned above
for the following reasons:
* No launcher script, hard link, soft link or similar games are needed to
launch a specific tool.
* No packaging needed, all tools are automatically deployed.
* Explicit tool selection, no surprises after renaming scylla to
something else.
* Tools are discoverable via scylla's description.
* Follows the trend set by modern command line multi-command or multi-app
programs, like git.
Fixes: #7801
Tests: unit(dev)
"
* 'tools-in-scylla-exec-v5' of https://github.com/denesb/scylla:
main,tools,configure.py: fold tools into scylla exec
tools: prepare for inclusion in scylla's main
main: add skeleton switching code on argv[1]
main: extract scylla specific code into scylla_main()
Move saving features to `system.local#supported_features`
to the point after passing all remote feature checks in
the gossiper, right before joining the ring.
This makes the `system.local#supported_features` column store the
advertised feature set. Leave a comment in the definition of
`system.local` schema to reflect that.
Since the column value is not actually used anywhere for now,
it shouldn't affect any tests or alter the existing behavior.
Later, we can optimize the gossip communication between nodes
in the cluster, removing the feature check altogether
in some cases (since the column value should now be monotonic).
Tests: unit(dev)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
The infrastructure is now in place. Remove the proxy main of the tools,
and add appropriate `else if` statements to the executable switch in
main.cc. Also remove the tool applications from the `apps` list and add
their respective sources as dependencies to the main scylla executable.
With this, we now have all tool executables living inside the scylla
main one.
This prepares the scylla executable for hosting more than one app,
switching between them using argv[1]. This is consistent with how most
modern multi-app/multi-command programs work, one prominent example
being git.
For now only one app is present: scylla itself, called "server". If
argv[1] is missing or unrecognized, this is what is used as the default
for backward-compatibility.
The scylla app also gets a description, which explains that scylla hosts
multiple apps and lists all the available ones.
main() now contains only generic setup and teardown code and it
delegates to scylla_main().
In the next patches we want to wire in tool executables into the scylla
one. This will require selecting the main to run at runtime.
scylla_main() will be just one of those (the default).