"
Working on database.hh or any header that is included in database.hh
(of which there is a lot), is a major pain as each change involves the
recompilation of half of our compilation units.
Reduce the impact by removing the `#include "database.hh"` directive
from as many header files as possible. Many headers can make do with
just some forward declarations and don't need to include the entire
headers. I also found some headers that included database.hh without
actually needing it.
Results
Before:
$ touch database.hh
$ ninja build/release/scylla
[1/154] CXX build/release/gen/cql3/CqlParser.o
After:
$ touch database.hh
$ ninja build/release/scylla
[1/107] CXX build/release/gen/cql3/CqlParser.o
"
* 'reduce-dependencies-on-database-hh/v2' of https://github.com/denesb/scylla:
treewide: remove include database.hh from headers where possible
database_fwd.hh: add keyspace fwd declaration
service/client_state: de-inline set_keyspace()
Move cache_temperature into its own header
Many headers don't really need to include database.hh, the include can
be replaced by forward declarations and/or including the actually needed
headers directly. Some headers don't need this include at all.
Each header was verified to be compilable on its own after the change,
by including it into an empty `.cc` file and compiling it. `.cc` files
that used to get `database.hh` through headers that no longer include it
were changed to include it themselves.
Different nodes can concurrently create the distributed system
keyspace on boot, before the "if not exists" clause can take effect.
However, the resulting schema mutations will be different since
different nodes use different timestamps. This patch forces the
timestamps to be the same across all nodes, so we save some schema
mismatches.
This fixes a bug exposed by ca5dfdf, whereby the initialization of the
distributed system keyspace is done before waiting for schema
agreement. While waiting for schema agreement in
storage_service::join_token_ring(), the node still hasn't joined the
ring and schemas can't be pulled from it, so nodes can deadlock. A
similar situation can happen between a seed node and a non-seed node,
where the seed node progresses to a different "wait for schema
agreement" barrier, but still can't make progress because it can't
pull the schema from the non-seed node still trying to join the ring.
Finally, it is assumed that changes to the schema of the current
distributed system keyspace tables will be protected by a cluster
feature and a subsequent schema synchronization, such that all nodes
will be at a point where schemas can be transferred around.
Fixes#3976
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181211113407.20075-1-duarte@scylladb.com>
Returning "auto" makes it harder to understand what the function is returning,
and impossible to de-inline.
Return a vector of pointers instead. The caller should iterate immediately, in
any case, and since the previous return value was a range of references to const
unique_ptrs, nothing else could be done with it anyway.
The extensions class reaches into cql's property_definitions class to grab
a map<sstring, sstring> type. This generates a few unneeded dependencies.
Reduce dependencies by defining the map type ourselves; if cql's property_definitions
changes in an incompatible way, it will have to adapt, rather than the extensions
class.
db::config is a global class; changes in any module can cause changes
in db::config. Therefore, it is a cause of needless recompilation.
Remove some of these dependencies by having consumers of db::config
declare an intermediate config struct that is contains only
configuration of interest to them, and have their caller fill it out
(in the case of auth, it already followed this scheme and the patchset
only moves the translation function).
In addition, some outright pointless inclusions of db/config.hh are
removed.
The result is somewhat shorter compile times, and fewer needless
recompiles.
* https://github.com/avikivity/scylla unconfig-1/v1:
config: remove inclusions of db/config.hh from header files
repair: remove unneeded config.hh inclusion
batchlog_manager: remove dependency on db::config
auth: remove permissions_cache dependency on db::config
auth: remove auth::service dependency on db::config
auth: remove unneeded db/config.hh includes
"
This series optimises the read path by replacing some usages of
std::vector by utils::small_vector. The motivation for this change was
an observation that memory allocation functions are pointed out by the
profiler as the ones where we spent most time and while they have a
large number of callers storage allocation for some vectors was close to
the top. The gains are not huge, since the problem is a lot of things
adding up and not a single slow thing, but we need to start with
something.
Unfortunately, the performance of boost::container::small_vector is
quite disappointing so a new implementation of a small_vector was
introduced.
perf_simple_query -c4 --duration 60, medians:
./perf_before ./perf_after diff
read 343086.80 360720.53 5.1%
Tests: unit(release, small_vector in debug)
"
* tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla:
partition_slice: use small_vector for column_ids
mutation_fragment_merger: use small_vector
auth: use small_vector in resource
auth: avoid list-initialisation of vectors
idl: serialiser: add serialiser for utils::small_vector
idl: serialiser: deduplicate vector serialisers
utils: introduce small_vector
intrusive_set_external_comparator: make iterator nothrow move constructible
mutation_fragment_merger: value-initialise iterator
Refs #3929
When deleting a segment, IFF we have not yet filled up all reserves,
instead of actually deleting the file, put it on a "recycle" list.
Next segment allocation will instead of creating a new one simply
rename the segment and reuse the file and its allocated space.
We rename the file twice: Once on adding to recycle list, with special
prefix so we don't mix up actual replayable segments and these. Second
when we actually re-use the file (also to ensure consecutive names).
Note that we limit the amount of recyclables, so a really stressed
application which somehow fills up the replenish queue might
cause us to still drop the segments. Could skip this but risk
getting to many files on disk.
Replay should be safe, since all entries are guarded by CRC based
on the file ID (i.e. file name). Thus replaying a recycled segment
will simply cause a CRC error in the main header and be ignored (see
previous patch).
Segments that are fully synced will have terminating zero-header (see
previous patch) so we know when to stop processing a recycled file.
If a file is the result of a mid-write crash, we will generate a CRC
processing error as "normally" in this case, when hitting partially
written block or coming to an old/new chunk boundary.
v2:
* Sync dir on rename
* auto -> const sstring&
* Allow recycling files as long as we're within disk space limits
v3:
* Use special names for files waiting for reuse
Writes a final chunk header of zero to the file on close, to mark
end-of-segment.
This allows us to gracefully stop replay processing of a segment file
even if it was not zeroed from the beginning (maybe recycled - hint
hint).
When reading the header chunk of a commitlog file, check the stored id
value against the id derived from the file name, and ignore if
mismatched. This is a prerequisite for re-using renamed commitlog files,
as we can then fail-fast should one such be left on disk, instead of
trying to replay it.
We also check said id via the CRC check for each chunk parsed. If we
find a chunk with
mismatched id, we will get a CRC error for the chunk, and replay will
terminate (albeit not gracefully).
Extract configuration into a new struct batchlog_manager_config and have the
callers populate it using db::config. This reduces dependencies on global objects.
Instead, distribute those inclusions to .cc files that require them. This
reduces rebuilds when config.hh changes, and makes it easier to locate files
that need config disaggregation.
We would like to get rid of boost::filesystem and gradually replace it with
std::experimental::filesystem.
TODO: using namespace fs = std::experimental::filesystem,
use fs::path directly, rather than lister::path
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It has two unrelated users: cql for validation, and storage_proxy for
complicated calculations. Split the simple stuff into a new header to reduce
dependencies.
Currently if hints directory contains unexpected directories Scylla fails to
start with unhandled std::invalid_argument exception. Make the manager
ignore malformed files instead and try to proceed anyway.
Message-Id: <20181121134618.29936-2-gleb@scylladb.com>
We scan hints directory in two places: to search for files to replay and
to search for directories to remove after resharding. The code that
translates directory name to a shard is duplicated. It is simple now, so
not a bit issue but in case it grows better have it in one place.
Message-Id: <20181121134618.29936-1-gleb@scylladb.com>
* seastar d59fcef...b924495 (2):
> build: Fix protobuf generation rules
> Merge "Restructure files" from Jesse
Includes fixup patch from Jesse:
"
Update Seastar `#include`s to reflect restructure
All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
Remove the timeout argument to
db::view::view_builder::wait_until_built(), a test-only function to
wait until a given materialized view has finished building.
This change is motivated by the fact that some tests running on slow
environments will timeout. Instead of incrementally increasing the
timeout, remove it completely since tests are already run under an
exterior timeout.
Fixes#3920
Tests: unit release(view_build_test, view_schema_test)
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181115173902.19048-1-duarte@scylladb.com>
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().
Mechanically converted with https://github.com/avikivity/unsprint.
When a node reshards (i.e., restarts with a different number of CPUs), and
is in the middle of building a view for a pre-existing table, the view
building needs to find the right token from which to start building on all
shards. We ran the same code on all shards, hoping they would all make
the same decision on which token to continue. But in some cases, one
shard might make the decision, start building, and make progress -
all before a second shard goes to make the decision, which will now
be different.
This resulted, in some rare cases, in the new materialized view missing
a few rows when the build was interrupted with a resharding.
The fix is to add the missing synchronization: All shards should make
the same decision on whether and how to reshard - and only then should
start building the view.
Fixes#3890Fixes#3452
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181028140549.21200-1-nyh@scylladb.com>
* seastar d152f2d...c1e0e5d (6):
> scripts: perftune.py: properly merge parameters from the command line and the configuration file
> fmt: update to 5.2.1
> io_queue: only increment statistics when request is admitted
> Adds `read_first_line.cc` and `read_first_line.hh` to CMake.
> fstream: remove default extent allocation hint
> core/semaphore: Change the access of semaphore_units main ctor
Due to a compile-time fight between fmt and boost::multiprecision, a
lexical_cast was added to mediate.
sprint("%s", var) no longer accepts numeric values, so some sprint()s were
converted to format() calls. Since more may be lurking we'll need to remove
all sprint() calls.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Limit message size according to the configuration, to avoid a huge message from
allocating all of the server's memory.
We also need to limit memory used in aggregate by thrift, but that is left to
another patch.
Fixes#3878.
Message-Id: <20181024081042.13067-1-avi@scylladb.com>
"
Hinted handoff should not overpower regular flows like READs, WRITEs or
background activities like memtable flushes or compactions.
In order to achieve this put its sending in the STEAMING CPU scheduling
group and its commitlog object into the STREAMING I/O scheduling group.
Fixes#3817
"
* 'hinted_handoff_scheduling_groups-v2' of https://github.com/vladzcloudius/scylla:
db::hints::manager: use "streaming" I/O scheduling class for reads
commitlog::read_log_file(): set the a read I/O priority class explicitly
db::hints::manager: add hints sender to the "streaming" CPU scheduling group
When messaging_service is started we may immediately receive a mutation
from another node (e.g. in the MV update context). If hinted handoff is not
ready to store hints at that point we may fail some of MV updates.
We are going to resolve this by start()ing hints::managers before we
start messaging_service and blocking hints replaying until all relevant
objects are initialized.
Refs #3828
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Hinting is allowed after "started" before "stopping".
Hints that attempted to be stored outside this time frame are going to
be dropped.
Refs #3828
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Introduce a multi-bit state field. In this patch it replaces the _stopping
boolean. We are going to add more states in the following patches.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
The space_watchdog enables or disables hints for the managers
associated with a particular device. We encapsulate this decision
inside the hints::managers by introducing the update_backlog()
function.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
A db::hints::resource_manager manages the resources for one or two
db::hints::managers. Each of these can be using the same or different
devices. The db::hints::space_watchdog periodically checks whether
each manager is within their resource allocation, and if not disables
it.
The watchdog iterates over the managers and accounts for the total
size they are using. This is wrong, since it can account in the same
variable the size consumed by managers using different devices.
We fix this while taking advantage of the fact that on_timer is now
called in the context of a seastar::thread, instead of using future
combinators.
Fixes#3821
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Registering a manager for a new device used
std::unordered_map::emplace(), which may not insert the specified
value if one with the same key has already been added. This could
happen if both managers were using the same device and the fiber
deferred in-between adding them.
Found during code reading. Could cause hints to not be disabled for an
overloaded manager.
Fixes#3822
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Disable the copy and move ctors and assignment operators for both the
hints::manager and the hints::resource_manager.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Make sure that read I/O in the context of HH sending do not overpower I/O
in the context of queries, memtable flushes or compactions.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>