Some time ago it turned out that if unrecognized feature name is met in scylla.yaml, the whole experimental features list is ignored, but scylla continues to boot. There's UNUSED feature which is the proper way to deprecate a feature, and this PR improves its handling in several ways.
1. The recently removed "tablets" feature is partially brought back, but marked as UNUSED
2. Any UNUSED features met while parsing are printed into logs
3. The enum_option<> helper is enlightened along the way
refs: #18968Closesscylladb/scylladb#19230
* github.com:scylladb/scylladb:
config: Mark tablets feature as unused
main: Warn unused features
enum_option: Carry optional key on board
enum_option: Remove on-board _map member
The API endpoints are registered for particular services (with rare exceptions), and once the corresponding service is ready, its endpoints section can be registered too. Same but reversed is for shutdown, and it's automatic with deferred actions.
refs: #2737Closesscylladb/scylladb#19208
* github.com:scylladb/scylladb:
main: Register task manager API next to task manager itself
main: Register messaging API next to messaging service
main: Register repair API next to repair service
Commit 47dbf23773 (Rework view services and system-distributed-keyspace
dependencies) made streaming and repair services depend on view builder,
but missed the fact that the builder itself starts much later.
Move view builder earlier, that's safe, no activity is started upon
that, real building is kicked much later when invoke_on_all(start)
happens.
Other than than, start system distributed keyspace earlier, which also
looks safe, as it's also started "for real" later, by storage service
when it joins the ring.
fixes: #19133
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#19250
When seeing an UNUSED feature -- print it into log. This is where the
enum_option::key is in use. The thing is that experimental features map
different unused feature names into the single UNUSED feature enum
value, so once the feature is parsed its configured name only persists
in the option's key member (saved by previous patch).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Fixesscylladb/scylla-pkg#3845
Don't overwrite (or rather change) AWS credentials variables if already set in
enclosing environment. Ensures EAR tests for AWS KMS can run properly in CI.
v2:
* Allow environment variables in reading obj storage config - allows CI to
use real credentials in env without risking putting them info less seure
files
* Don't write credentials info from miniserver into config, instead use said
environment vars to propagate creds.
v3:
* Fix python launch scripts to not clear environment, thus retaining above aws envs.
Closesscylladb/scylladb#19086
After wasm udf appeared, code in main, create_function_statement and schema_tables got some involvements into details of wasm engine management. Also, even prior to this, there was duplication in how function context is created by statement code and schema_tables code.
This PR generalizes function context creation and encapsulates the management in sharded<lang::manager> service. Also it removes the wasm::startup_context thing and makes wasm start/stop be "classical" (see #2737)
Closesscylladb/scylladb#19166
* github.com:scylladb/scylladb:
code: Enlighten wasm headers usage
lang: Unfriend wasm context from manager
lang, cql3, schema_tables: Don't mess with db::config
lang: Don't use db::config to create lua context
lang: Don't use db::config to create wasm context
lang: Drop manager::precompile() method
cql3, schema_tables: Generalize function creation
wasm: Replace startup_context with wasm_config
lang: Add manager::start() method
lang: Move manager to lang namespace
lang: Move wasm::manager to its .cc/.hh files
Alternator has a custom TTL implementation. This is based on a loop, which scans existing rows in the table, then decides whether each row have reached its end-of-life and deletes it if it did. This work is done in the background, and therefore it uses the maintenance (streaming) scheduling group. However, it was observed that part of this work leaks into the statement scheduling group, competing with user workloads, negatively affecting its latencies. This was found to be causes by the reads and writes done on behalf of the alternator TTL, which looses its maintenance scheduling group when these have to go to a remote node. This is because the messaging service was not configured to recognize the streaming scheduling group, when statement verbs like read or writes are invoked. The messaging service currently recognizes two statement "tenants": the user tenant (statement scheduling group) and system (default scheduling group), as we used to have only user-initiated operations and sytsem (internal) ones. With alternator TTL, there is now a need to distinguish between two kinds of system operation: foreground and background ones. The former should use the system tenant while the latter will use the new maintenance tenant (streaming scheduling group).
This series adds a streaming tenant to the messaging service configuration and it adds a test which confirms that with this change, alternator TTL is entirely contained in the maintenance scheduling group.
Fixes: #18719
- [x] Scans executed on behalf of alternator TTL are running in the statement group, disturbing user-workloads, this PR has to be backported to fix this.
Closesscylladb/scylladb#18729
* github.com:scylladb/scylladb:
alternator, scheduler: test reproducing RPC scheduling group bug
main: add maintenance tenant to messaging_service's scheduling config
Currently they both run in streaming group and it may become busy during
repair/mv building and affect group0 functionality. Move it to the
gossiper group where it should have more time to run.
Fixesscylladb/scylladb#18863Closesscylladb/scylladb#19138
Now when function context creation is encapsulated in lang::manager,
some .cc files can stop using wasm-specific headers and just go with the
lang/manager.hh one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Similarly to previous patch, lua context needs db::config for creation.
It's better to get the configurables via lang::manager::config.
One thing to note -- lua config carries updateable_values on board, but
respective db::config options and _not_ LiveUpdate-able, so the lua
config could just use simple data types. This patch keeps updateable
values intact for brevity.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The managerr needs to get two "fuel" configurables from db::config in
order to create context. Instead of carrying db config from callers,
keep the options on existing lang::manager::config and use them.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The lang::manager starts with the help of a context because it needs to
have std::shared_ptr<> pointg to cross-shard shared wasm engine and
runner thread. For that a context is created in advance, that then helps
sharing the engine and runner across manager instances.
This patch removes the "context" and replaces it with classical
manager::config. With it, it's lang::manager who's now responsible for
initializing itself.
In order to have cross-shard engine and thread pointers, the start()
method uses invoke_on_others() facility to share the pointer.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Just like any other sharded<> service, the lang::manager now starts and
stops in a classical sequence of
await sharded<manager>::start()
defer([] { await sharded<manager>::stop() })
await sharded<manager>::invoke_on_all(&manager::start)
For now the method is no-op, next patches will start using it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
And, while at it, rename local variable to refer to it to as "manager"
not "wasm". Query processor and database also have getters named
"wasm()", these are not renamed yet to keep patch smaller (and those
getters are going to be reworked further anyway).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's going to become a facade in front of both -- wasm and lua, so keep
it in files with language independent names.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We want to exclude repair with tablet migrations to avoid races
between repair reads and writes with replica movement. Repair is not
prepared to handle topology transitions in the middle.
One reason why it's not safe is that repair may successfully write to
a leaving replica post streaming phase and consider all replicas to be
repaired, but in fact they are not, the new replica would not be
repaired.
Other kinds of races could result in repair failures. If repair writes
to a leaving replica which was already cleaned up, such writes will
fail, causing repair to fail.
Excluding works by keeping effective_replication_map_ptr in a version
which doesn't have table's tablets in transitions. That prevents later
transitions from starting because topology coordinator's barrier will
wait for that erm before moving to a stage later than
allow_write_both_read_old, so before any requests start using the new
topology. Also, if transitions are already running, repair waits for
them to finish.
A blocked tablet migration (e.g. due to down node) will block repair,
whereas before it would fail. Once admin resolves the cause of blocked migration,
repair will continue.
Fixes#17658.
Fixes#18561.
Closesscylladb/scylladb#18641
* github.com:scylladb/scylladb:
test: pylib: Do not block async reactor while removing directories
repair: Exclude tablet migrations with tablet repair
repair_service: Propagate topology_state_machine to repair_service
main, storage_service: Move topology_state_machine outside storage_service
storage_srvice, toplogy: Extract topology_state_machine::await_quiesced()
tablet_scheduler: Make disabling of balancing interrupt shuffle mode
tablet_scheduler: Log whether balancing is considered as enabled
All sharded<service>'s a supposed to have their own config and not use global db::config one. The service config, in turn, is to be created by main/cql_test_env/whatever out of db::config and, maybe, other data. Gossiper is almost there, but it still uses db::config in few places.
Closesscylladb/scylladb#19051
* github.com:scylladb/scylladb:
gossiper: Stop using db::config
gossiper: Move force_gossip_generation on gossip_config
gossiper: Move failure_detector_timeout_ms on gossip_config
main: Fix indentation after previous patch
main: Make gossiper config a sharded parameter
main: Add local variable for set of seeds
main: Add local variable for group0 id
main: Add local variable for cluster_name
Protocol servers are started last, and are registered in storage_service, which stops them. Also there are deferred actions scheduled to stop protocol servers on aborted start and a FIXME asking to make even this case rely on storage_service. Also, there's a (rather rare) aborted-start bug in alternator and redis. Yet, thrift can be left started in some weird circumstances. This patch fixes it all. As a side effect, the start-stop code becomes shorter and a bit better structured.
refs: #2737Closesscylladb/scylladb#19042
* github.com:scylladb/scylladb:
main: Start alternator expiration service earlier
main: Start redis transparently
main: Start alternator transparently
main: Start thrift transparently
main: Start native transport transparently
storage_service: Make register_protocol_server() start the server
storage_service: Turn register_protocol_server() async method
storage_service: Outline register_protocol_server()
main: Schedule deferred drain_on_shutdown() prior to protocol servers
main: Move some trailing startup earlier
Currently it gets the streaming/maintenance one from database, but it
can as well just assume that it's already running in the correct one,
and the main code fulfils this assumption.
This removes one more place that uses database as sched groups provider.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#19078
Next patches will put updateable_value's on it, but plain copy of them
across shard doesn't work (see #7316)
Indentation is deliberately left broken
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patch will do seeds assignment to gossiper config on each
shard, so it's good to have it once, then copy around
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patch will do group0_id assignment to gossiper config on each
shard, so it's good to have it once, then copy around
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's modified if its empty, next patch will make this code be called on
each shard, so modification must happen only once
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Prior to registering drain_on_shutdown and all the protorocl servers.
To keep the natural sequence
- start core
- register drain-on-shutdown
- start transport(s)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's now possible to start protocol server when registered. It will also
be stopped automatically on shutdown / aborted shutdown.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's now possible to start protocol server when registered. It will also
be stopped automatically on shutdown / aborted shutdown.
Also move the controller variable lower to keep it all next to each
other.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's now possible to start protocol server when registered. It will also
be stopped automatically on shutdown / aborted shutdown.
It also fixes a rare bug. If thrifst is not asked to be started on boot,
its deferred shutdown action isn't created, so it it's later started via
the API, it won't be stopped on shutdown.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's now possible to start protocol server when registered. It will also
be stopped automatically on shutdown / aborted shutdown.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Nex patches will remove protocol servers' deferred stops and will rely
on drain_on_shutdown -> stop_transport to do it, so the drain deferred
action some come before protocol servers' registration.
This also fixes a bug. Currently alternator and redis both rely on
protocol servers to stop them on shutdown. However, when startup is
aborted prior to drain_on_shutdown() registration, protocol servers are
not stopped and alternator and redis can remain stopped.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The set_abort_on_ebadf() call and some api endpoints registration come
after protocol servers. The latter is going to be shuffled, so move the
former earlier not to hang around.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently the loader is called via API, which inherits the maintenance scheduling group from API http server. The loader then can either do load_and_stream() or call (legacy) distributed_loader::upload_new_sstables(). The latter first switches into streaming scheduling group, but the former doesn't and continues running in the maintenance one.
All this is not really a problem, because streaming sched group and maintenance sched group is one group under two different variable names. However, it's messy and worth delegating the sched group switch (even if it's a no-op) to the sstables-loader. As a nice side effect, this patch removes one place that uses database as proxy object to get configuration parameters.
Closesscylladb/scylladb#18928
* github.com:scylladb/scylladb:
sstables-loader: Run loading in its scheduling group
sstables-loader: Add scheduling group to constructor
This change supports changing replication factor in tablets-enabled keyspaces.
This covers both increasing and decreasing the number of tablets replicas through
first building topology mutations (`alter_keyspace_statement.cc`) and then
tablets/topology/schema mutations (`topology_coordinator.cc`).
For the limitations of the current solution, please see the docs changes attached to this PR.
Fixes: #16129Closesscylladb/scylladb#16723
* github.com:scylladb/scylladb:
test: Do not check tablets mutations on nodes that don't have them
test: Fix the way tablets RF-change test parses mutation_fragments
test/tablets: Unmark RF-changing test with xfail
docs: document ALTER KEYSPACE with tablets
Return response only when tablets are reallocated
cql-pytest: Verify RF is changes by at most 1 when tablets on
cql3/alter_keyspace_statement: Do not allow for change of RF by more than 1
Reject ALTER with 'replication_factor' tag
Implement ALTER tablets KEYSPACE statement support
Parameterize migration_manager::announce by type to allow executing different raft commands
Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks
Extend system.topology with 3 new columns to store data required to process alter ks global topo req
Allow query_processor to check if global topo queue is empty
Introduce new global topo `keyspace_rf_change` req
New raft cmd for both schema & topo changes
Add storage service to query processor
tablets: tests for adding/removing replicas
tablet_allocator: make load_balancer_stats_manager configurable by name
Currently only the user tenant (statement scheduling group) and system
(default scheduling group) tenants exist, as we used to have only
user-initiated operations and sytem (internal) ones. Now there is need
to distinguish between two kinds of system operation: foreground and
background ones. The former should use the system tenant while the
latter will use the new maintenance tenant (streaming scheduling group).
The system-distributed-keyspace and view-update-generator often go in pair, because streaming, repair and sstables-loader (via distributed-loader) need them booth to check if sstable is staging and register it if it's such. The check is performed by messing directly with system_distributed.view_build_status table, and the registration happens via view-update-generator.
That's not nice, other services shouldn't know that view status is kept in system table. Also view-update-generator is a service to generae and push view updates, the fact that it keeps staging sstables list is the implementation detail.
This PR replaces dependencies on the mentioned pair of services with the single dependency on view-builder (repair, sstables-loader and stream-manager are enlightened) and hides the view building-vs-staging details inside the view_builder.
Along the way, some simplification of repair_writer_impl class is done.
Closesscylladb/scylladb#18706
* github.com:scylladb/scylladb:
stream_manager: Remove system_distributed_keyspace and view_update_generator
repair: Remove system_distributed_keyspace and view_update_generator
streaming: Remove system_distributed_keyspace and view_update_generator
sstables_loader: Remove system_distributed_keyspace and view_update_generator
distributed_loader: Remove system_distributed_keyspace and view_update_generator
view: Make register_staging_sstable() a method of view_builder
view: Make check_view_build_ongoing() helper a method of view_builder
streaming: Proparage view_builder& down to make_streaming_consumer()
repair: Keep view_builder& on repair_writer_impl
distributed_loader: Propagate view_builder& via process_upload_dir()
stream_manager: Add view builder dependency
repair_service: Add view builder dependency
sstables_loader: Add view_bulder dependency
main: Start sstables loader later
repair: Remove unwanted local references from repair_meta
There are four of them currently -- transport, thrift, alternator and
redis. This patch makes main pass to all the statement scheduling group
as constructor argument. Next patches will make use of it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>