Introduce `raft` experimental option.
Adjust the tests accordingly to accomodate the new option.
It's not enabled by default when providing
`--experimental=true` config option and should be
requested explicitly via `--experimental-options=raft`
config option.
Hide the code related to `raft_group_registry` behind
the switch. The service object is still constructed
but no initialization is performed (`init()` is not
called) if the flag is not set.
Later, other raft-related things, such as raft schema
changes, will also use this flag.
Also, don't introduce a corresponding gossiper feature
just yet, because again, it should be done after the
raft schema changes API contract is stabilized.
This will be done in a separate series, probably related to
implementing the feature itself.
Tests: unit(dev)
Ref #9239.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20210823121956.167682-1-pa.solodovnikov@scylladb.com>
Prepare for updating seastar submodule to a change
that requires deferred actions to be noexcept
(and return void).
Test: unit(dev, debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Registration of the currently unused hinted handoff endpoints is moved
out from the set_server_done function. They are now explicitly
registered in main.cc by calling api::set_hinted_handoff and also
uninitialized by calling api::unset_hinted_handoff.
Setting/unsetting HTTP API separately will allow to pass a reference to
the sync_point_service without polluting the set_server_done function.
There are 3 places that can now declare local instance:
- main
- cql_test_env
- boost gossiper test
The global pointer is saved in debug namespace for debugging.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
One of them just re-wraps arguments in std::ref and calls for
global storage service. The other one is dead code which also
calls the global s._s. Remove both and fix the only caller.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The thrift_handler class' methods need storage service. This
patch makes sure this class has sharded storage service
reference on board.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Both set_server_storage_service and set_server_storage_proxy set up
API handlers that need storage service to work. Now they all call for
global storage service instance, but it's better if they receive one
from main. This patch carries the sharded storage service reference
down to handlers setting function, next patch will make use of it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is the 2nd PR in series with the goal to finish the hackathon project authored by @tgrabiec, @kostja, @amnonh and @mmatczuk (improved virtual tables + function call syntax in CQL). This one introduces a new implementation of the virtual tables, the streaming tables, which are suitable for large amounts of data.
This PR was created by @jul-stas and @StarostaGit
Closes#8961
* github.com:scylladb/scylla:
test/boost: run_mutation_source_tests on streaming virtual table
system_keyspace: Introduce describe_ring table as virtual_table
storage_service: Pass the reference down to system_keyspace
endpoint_details: store `_host` as `gms::inet_address`
queue_reader: implement next_partition()
virtual_tables: Introduce streaming_virtual_table
flat_mutation_reader: Add a new filtering reader factory method
"
The cql-server -> storage-service dependency comes from the server's
event_notifier which (un)subscribes on the lifecycle events that come
from the storage service. To break this link the same trick as with
migration manager notifications is used -- the notification engine
is split out of the storage service and then is pushed directly into
both -- the listeners (to (un)subscribe) and the storage service (to
notify).
tests: unit(dev), dtest(simple_boot_shutdown, dev)
manual({ start/stop,
with/without started transport,
nodetool enable-/disablebinary
} in various combinations, dev)
"
* 'br-remove-storage-service-from-transport' of https://github.com/xemul/scylla:
transport.controller: Brushup cql_server declarations
code: Remove storage-service header from irrelevant places
storage_service: Remove (unlifecycle) subscribe methods
transport: Use local notifier to (un)subscribe server
transport: Keep lifecycle notifier sharded reference
main: Use local lifecycle notifier to (un)subscribe listeners
main, tests: Push notifier through storage service
storage_service: Move notification core into dedicated class
storage_service: Split lifecycle notification code
transport, generic_server: Remove no longer used functionality
transport: (Un)Subscribe cql_server::event_notifier from controller
tests: Remove storage service from manual gossiper test
The storage proxy and sl-manager get subscribed on lifecycle
events with the help of storage service. Now when the notifier
lives in main() they can use it directly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now it's time to move the lifecycle notifier from storage
service to the main's scope. Next patches will remove the
$lifecycle-subscriber -> storage_service dependency.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Stop the batchlog manager using a deferred
action in main to make sure it is stopped after
its start() method has been called, also
if we bail out of main early due to exception.
Change the bm.stop() calls in storage_service
to just stop the replay loop using drain().
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
... when decommissioned (reworked)' from Eliran Sinvani
This is a rework of #8916 The polling loop of the service level
controller queries a distributed table in order to detect configuration
changes. If a node gets decommissioned, this loop continues to run until
shutdown, if a node stays in the decommissioned mode without being shut
down, the loop will fail to query the table and this will result in
warnings and eventually errors in the log. This is not really harmful
but it adds unnecessary noise to the log. The series below lays the
infrastructure for observing storage service state changes, which
eventually being used to break the loop upon preparation for
decommissioning. Tests: Unit test (dev) Failing tests in jenkins.
Fixes#8836
The previous merge (possibly due to conflict resolution) contained a
misplaced get that caused an abort on shutdown.
Closes#9035
* github.com:scylladb/scylla:
Service Level Controller: Stop configuration polling loop upon leaving the cluster
main: Stop using get_local_storage_service in main
Seastar's default limit of 10,000 iocbs per shard is too low for
some workload (it places an upper bound on the number of idle
connections, above which a crash occurs). Use the new Seastar
feature to raise the default to 50000.
Also multiply the global reservation by 5, and round it upwards
so the number is less weird. This prevents io_setup() from failing.
For tests, the reservation is reduced since they don't create large
numbers of connections. This reduces surprise test failures when they
are run on machines that haven't been adjusted.
Fixes#9051Closes#9052
the cluster
This change subscribes service_level_controller for nodes life cycle
notifications and uses the notification of leaving the cluster for
the current node to stop the configuration polling loop. If the loop
continues to run it's queries will fail consistently since the nodes
will not answers to queries. It is worth mentioning that the queries
failing in the current state of code is harmles but noisy since after
90 seconsd, if the scylla process is not shut down the failures will
start to generate failure logs every 90 seconds which is confusing for
users.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
This change removes the use of service::get_local_storage_service,
instead, the approach taken is similar to other modules, for example
storage_proxy where a reference to the `sharded` module is obtained
once and then the local() method in combination with capturing is
used.
As per 32bcbe59ad, the sl_controller is stopped after set_distributed_data_accessor is called.
However if scylla shuts down before that happens, the sl_controller still needs to be stopped.
We need to drain the service level controller before storage_proxy::drain_on_shutdown
is called to prevent queries by the update loop from starting after the
storage_proxy has been drained - leading to issues similar to #9009.
Fixes#9014
Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210713055606.130901-1-bhalevy@scylladb.com>
Since compaction is layered on top of sstables, let's move all compaction code
into a new top-level directory.
This change will give me extra motivation to remove all layer violations, like
sstable calling compaction-specific code, and compaction entanglement with
other components like table and storage service.
Next steps:
- remove all layer violations
- move compaction code in sstables namespace into a new one for compaction.
- move compaction unit tests into its own file
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210707194058.87060-1-raphaelsc@scylladb.com>
logalloc has a nice leak/double-free sanitizer, with the nice
feature of capturing backtraces to make error reports easy to
track down. But capturing backtraces is itself very expensive.
This patch makes backtrace capture optional, reducing database_test
runtime from 30 minutes to 20 minutes on my machine.
Closes#8978
An incorrect spelling of max-io-requests was used in creating the
warning message text. Due to conversion to unsigned, the code would
crash due to bad_lexical_cast exception. The spelling of this
configuration name was fixed in the past (44f3ad836b), but only in the
'if' condition.
Fix by using the correct spelling.
Closes#8963
This is a more informative name. Helps see that, say, group0
is a separate service and not bundle all raft services together.
Message-Id: <20210619211412.3035835-3-kostja@scylladb.com>
The storage service pointer is only used so (un)subscribe
to (from) lifecycle events. Now the subscription is gone,
so can the storage service pointer.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We check for the existence of the option using one spelling,
then read it using another, so we crash with bad_lexical_cast
if it's present when casting the empty string to unsigned.
Fix by using the correct spelling.
Closes#8866
Merged patch series by Pavel Emelyanov:
Alternator start and stop code is sitting inside the main()
and it's a big piece of code out there. Havig it all in main
complicates rework of start-stop sequences, it's much more
handy to have it in alternator/.
This set puts the mentioned code into transport- and thrift-
like controller model. While doing it one more call for global
storage service goes away.
* 'br-alternator-clientize' of https://github.com/xemul/scylla:
alternator: Move start-stop code into controller
alternator: Move the whole starting code into a sched group
alternator: Dont capture db, use cfg
alternator: Controller skeleton
alternator: Controller basement
alternator: Drop storage service from executor
This patch fixes the following warning:
main.cc:307:53: warning: 'capacity' is deprecated: modern I/O queues
should use a property file [-Wdeprecated-declarations]
auto capacity = engine().get_io_queue().capacity();
It's fine to just check --max-io-requests directly because seastar
sets io_queue::capacity to the value of this parameter anyway.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
We check that the number of open files is sufficent for normal
work (with lots of connections and sstables), but we can improve
it a little. Systemd sets up a low file soft limit by default (so that
select() doesn't break on file descriptors larger than 1023) and
recommends[1] raising the soft limit to the more generous hard limit
if the application doesn't use select(), as ours does not.
Follow the recommendation and bump the limit. Note that this applies
only to scylla started from the command line, as systemd integration
already raises the soft limit.
[1] http://0pointer.net/blog/file-descriptor-limits.htmlCloses#8756
This move is not "just move", but also includes:
- putting the whole thing into seastar::async()
- switch from locally captured dependencies into controller's
class members
- making smp_service_groups optional because it doesn't have
default contructor and should somehow survive on constructed
controller until its start()
Also copy few bits from main that can be generalized later:
- get_or_default() helper from main
- sharded_parameter lambda for cdc
- net family and preferred thing from main
( this also fixed the indentation broken by previous patch )
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The controller won't have the database_config at hands to get
the sched group from. All other client services run the whole
controller start in the needed sched group, so prepare the
alternator controller for that.
To make it compile (and while-at-it) also move up the sharded
server and executor instances and the smp_service_group. All
of these will migrate onto the controller in the next patch.
( the indentation is deliberately left broken )
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When .init()ing the server one needs to provide the
max_concurrent_requests_per_shard value from config.
Instead of carrying the database around for it -- use the
db::config itself which is at hand. All the shards share
its instance anyway.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add the controller class with all the needed dependencies. For
now completely unused (thus a bunch of (void)-s here and there).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Both controller and server only need database to get config from.
Since controller creation only happens in main() code which has the
config itself, we may remove database mentioning from transport.
Previous attempt was not to carry the config down to the server
level, but it stepped on an updateable_value landmine -- the u._v.
isn't copyable cross-shard (despite the docs) and to properly
initialize server's max_concurrent_requests we need the config's
named_value member itself.
The db::config that flies through the stack is const reference, but
its named_values do not get copied along the way -- the updateable
value accepts both references and const references to subscribe on.
tests: start-stop in debug mode
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210607135656.18522-1-xemul@scylladb.com>
Raft group 0 initialization and configuration changes
should be integrated with Scylla cluster assembly,
happening when starting the storage service and joining
the cluster. Prepare for this.
Since Raft service depends on query processor, and query
processor depends on storage service, to break a dependency
loop split Raft initialization into two steps: starting
an under-constructed instance of "sharded" Raft service,
accepting an under-constructed instance of "sharded"
query_processor, and then passed into storage service start
function, and then the local state of Raft groups from system
tables once query processor starts.
Consistently abbreviate raft_services instance raft_svcs, as
is the convention at Scylla.
Update the tests.
All start-stop code in main has the sharded<database> at hands, there's
no need in getting it from global storage service.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
The patch set is an assorted collection of header cleanups, e.g:
* Reduce number of boost includes in header files
* Switch to forward declarations in some places
A quick measurement was performed to see if these changes
provide any improvement in build times (ccache cleaned and
existing build products wiped out).
The results are posted below (`/usr/bin/time -v ninja dev-build`)
for 24 cores/48 threads CPU setup (AMD Threadripper 2970WX).
Before:
Command being timed: "ninja dev-build"
User time (seconds): 28262.47
System time (seconds): 824.85
Percent of CPU this job got: 3979%
Elapsed (wall clock) time (h:mm:ss or m:ss): 12:10.97
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2129888
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1402838
Minor (reclaiming a frame) page faults: 124265412
Voluntary context switches: 1879279
Involuntary context switches: 1159999
Swaps: 0
File system inputs: 0
File system outputs: 11806272
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
After:
Command being timed: "ninja dev-build"
User time (seconds): 26270.81
System time (seconds): 767.01
Percent of CPU this job got: 3905%
Elapsed (wall clock) time (h:mm:ss or m:ss): 11:32.36
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2117608
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1400189
Minor (reclaiming a frame) page faults: 117570335
Voluntary context switches: 1870631
Involuntary context switches: 1154535
Swaps: 0
File system inputs: 0
File system outputs: 11777280
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
The observed improvement is about 5% of total wall clock time
for `dev-build` target.
Also, all commits make sure that headers stay self-sufficient,
which would help to further improve the situation in the future.
"
* 'feature/header_cleanups_v1' of https://github.com/ManManson/scylla:
transport: remove extraneous `qos/service_level_controller` includes from headers
treewide: remove evidently unneded storage_proxy includes from some places
service_level_controller: remove extraneous `service/storage_service.hh` include
sstables/writer: remove extraneous `service/storage_service.hh` include
treewide: remove extraneous database.hh includes from headers
treewide: reduce boost headers usage in scylla header files
cql3: remove extraneous includes from some headers
cql3: various forward declaration cleanups
utils: add missing <limits> header in `extremum_tracking.hh`
No code left that uses these globals, so rip them altogether. Also
drop the former messaging init/uninit methods that are now only
setting up those globals.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Storage service calls a bunch of do_something_with_repair() methods. All
of them need the local repair_service and the only way to get it is by
keeping it on storage service.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The repair service needs database, migration manager, messaging and
sys-dist-ks + view-update-generator pair. Put all these guys on it
in advance and equip the service with getters for future use.
Some dependencies are sharded because they are used in cross-shard
context and needs more care to be cleaned out later.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>