Mutation sources will soon require a valid permit so make sure we have
one and pass it to the mutation sources when creating the underlying
readers.
For now, pass no_reader_permit() on call sites, deferring the obtaining
of a valid permit to later patches.
This contains a reader concurrency semaphore for the tests, that they
can use to obtain a valid permit for reads. Soon we are going to start
working towards a point where all APIs taking a permit will require a
valid one. Before we start this work we must ensure test code is able to
obtain a valid permit.
We want to make `read_permit` the single interface through which reads
interact with the concurrency limiting mechanism. So far it was only
usable to track memory consumption. Add the missing `wait_admission()`
and `consume_resources()` to the permit API. As opposed to
`reader_concurrency_semaphore::` equivalents which returned a
permit, the `reader_permit::` variants jut return
`reader_permit::resource_units` which is an RAII holder for the acquired
units. This also allows for the permit to be created earlier, before the
reader is admitted, allowing for tracking pre-admission memory usage as
well. In fact this is what we are going to do in the next patches.
This patch also introduces a `broken()` method on the reader concurrency
semaphore which resolves waiters with an exception. This method is also
called internally from the semaphore's destructor. This is needed
because the semaphore can now have external waiters, who has to be
resolved before the semaphore itself is destroyed.
We want to refactor reader_permit::memory_units to work in terms of
reader_resources, as we are planning to use it for guarding count
resources as well. This patch makes the first step: renames it from
memory_units to resources_units. Since this is a very noisy change, we
do it in a separate patch, the semantic change is in the next patch.
Tenants get their own connections for statement verbs and are further
isolated from each other by different scheduling groups. A tenant is
identified by a scheduling group and a name. When selecting the client
index for a statement verb, we look up the tenant whose scheduling group
matches the current one. This scheduling group is persisted across the
RPC call, using the name to identify the tenant on the remote end, where
a reverse lookup (name -> scheduling group) happens.
Instead of a single scheduling group to be used for all statement verbs,
messaging_service::scheduling_config now contains a list of tenants. The
first among these is the default tenant, the one we use when the current
scheduling group doesn't match that of any configured tenant.
To make this mapping easier, we reshuffle the client index assignment,
such that statement and statement-ack verbs have the idx 2 and 3
respectively, instead of 0 and 3.
The tenant configuration is configured at message service construction
time and cannot be changed after. Adding such capability should be easy
but is not needed for query classification, the current user of the
tenant concept.
Currently two tenants are configured: $user (default tenant) and
$system.
Per-user SLA means we have connection classifications determined dynamically,
as SLAs are added or removed. This means the classification information cannot
be static.
Fix by making it a non-static vector (instead of a static array), allowing it
to be extended. The scheduling group member pointer is replaced by a scheduling
group as a member pointer won't work anymore - we won't have a member to refer
to.
On the client side, we supply an isolation cookie based on the connection index
On the server side, we convert an isolation cookie back to a scheduling_group.
This has two advantages:
- rpc processes the entire connection using the scheduling group, so that code
is also isolated and accounted for
- we can later add per-user connections; the previous approach of looking at the
verb to decide the scheduling_group doesn't help because we don't have a set of
verbs per user
With this, the main group sees <0.1% usage under simple read and write loads.
Move it from a function-local static to a class static variable. We will want
to extend it in two ways:
- add more information per connection index (like the rpc isolation cookie)
- support adding more connections for per-user SLA
As a first step, make it an array of structures and make it accessible to all
of messaging_service.
In the next patches we will match reads to the appropriate reader
concurrency semaphore based on the scheduling group they run in. This
will result in a lot of system reads that are executed during startup
and that were up to now (incorrectly) using the user read semaphore to
switch to the system read semaphore. This latter has a much more
constrained concurrency, which was observed to cause system reads to
saturate and block on the semaphore, slowing down startup.
To solve this, boost the concurrency of the system read semaphore during
startup to match that of the user semaphore. This is ok, as during
startup there are no user reads to compete with. After startup, before
we start serving user reads the concurrency is reverted back to the
normal value.
The QueryFilter parameter of Query is only partially implemented (issue
tests for it.
In this patch, we add comprehensive tests for this feature and all its
various operators, types, and corner cases. The tests cover both the
parts we already implemented, and the parts we did not yet.
As usual, all tests succeed on DynamoDB, but many still xfail on Alternator
pending the complete implementation.
Refs #5028.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200525141242.133710-1-nyh@scylladb.com>
test_compaction_with_multiple_regions() has two calls to std::shuffle(),
one using std::default_random_engine() has the PRNG, but the other, later
on, using the std::random_device directly. This can cause failures due to
entropy pool exhaustion.
Fix by making the `random` variable refer to the PRNG, not the random_device,
and adjust the first std::shuffle() call. This hides the random_device so
it can't be used more than once.
Message-Id: <20200527124247.2187364-1-avi@scylladb.com>
Boost test macros are not safe to use in multiple shards (threads).
Doing so will result in their output being interwoven, making it
unreadable and generating invalid XML test reports. There was a lot of
back-and-forth on how to solve this, including introducing thread-safe
wrappers of the boost test macros, that use locks. This patch does
something much simple: it defines a bunch of replacement utility
functions for the used macros. These functions use the thread safe
seastar logger to log messages and throw exceptions when the
test has to be failed, which is pretty much what boost test does too.
With this the previously seen complaint about invalid XML is gone.
Example log messages from the utility functions:
DEBUG 2020-05-27 13:32:54,248 [shard 1] testlog - check_equal(): OK @ validate_result() test/boost/multishard_mutation_query_test.cc:863: ckp{0004fe57c8d2} == ckp{0004fe57c8d2}
DEBUG 2020-05-27 13:32:54,248 [shard 1] testlog - require(): OK @ validate_result() test/boost/multishard_mutation_query_test.cc:855
Fixes: #4774
Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200527104426.176342-1-bdenes@scylladb.com>
Boost test uses colored output by default, even when the output of the
test is redirected to a file. This makes the output quite hard to read
for example in Jenkins. This patch fixes this by disabling the colored
output when stdout is not a tty. This is in line with the colored output
of configure.py itself, which is also enabled only if stdout is a tty.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200526112857.76131-1-bdenes@scylladb.com>
seastar::apply() is deprecated in recent versions of seastar in favor
of std::apply(), so stop including its header. Calls to unqualified
apply(..., std::tuple<>) are resolved to std::apply() by argument
dependent lookup, so no changes to call sites are necessary.
This avoids a huge number of deprecation warnings with latest seastar.
Message-Id: <20200526090552.1969633-1-avi@scylladb.com>
We had to wait many years for it, but finally we have a starts_with()
method in C++20. Let's use it instead of ugly substr()-based code.
This is probably not a performance gain - substr() for a string_view
was already efficient. But it makes the code easier to understand,
and it allows us to rejoice in our decision to switch to C++20.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200526185812.165038-2-nyh@scylladb.com>
In commit cb7d3c6b55 we started to check
if two base64-encoded strings begin with each other without decoding
the strings first.
However, we missed the check_BEGINS_WITH function which does the same
thing. So this patch fixes this function as well.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200526185812.165038-1-nyh@scylladb.com>
"
In several tests we were calling random_device::operator() in a tight
loop. This is a slow operation, and in gcc 10 can fail if called too
frequently due to a bug [1].
Change to use a random_engine instead, seeded once from the
random_device.
Tests: unit (dev)
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94087
"
* 'entropy' of git://github.com/avikivity/scylla:
tests: lsa_sync_eviction_test: don't exhaust random number entropy
tests: querier_cache_test: don't exhaust random number entropy
tests: loading_cache_test: don't exhaust random number entropy
tests: dynamic_bitset_test: don't exhaust random number entropy
This gives us access to std::ranges, the spaceship operator, and more.
Note coroutines are not yet enabled (these require g++ -fcoroutines) as
we are still working our problem with address santizer support.
Tests: unit (dev, debug, release)
Message-Id: <20200521092157.1460983-1-avi@scylladb.com>
In python, `is` and `is not` checks object identity, not value
equivalence, yet in `idl-compiler.py` it is used to compare strings.
Newer python versions (that shipped in Fedora32) complains about this
misuse, so this patch fixes it.
Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200526091811.50229-1-bdenes@scylladb.com>
Alternator supports four ways in which write operations can use quorum
writes or LWT or both, which we called "write isolation policies".
Until this patch, Alternator defaulted to the most generally safe policy,
"always_use_lwt". This default could have been overriden for each table
separately, but there was no way to change this default for all tables.
This patch adds a "--alternator-write-isolation" configuration option which
allows changing the default.
Moreover, @dorlaor asked that users must *explicitly* choose this default
mode, and not get "always_use_lwt" without noticing. The previous default,
"always_use_lwt" supports any workload correctly but because it uses LWT
for all writes it may be disappointingly slow for users who run write-only
workloads (including most benchmarks) - such users might find the slow
writes so disappointing that they will drop Scylla. Conversely, a default
of "forbid_rmw" will be faster and still correct, but will fail on workloads
which need read-modify-write operations - and suprise users that need these
operations. So Dor asked that that *none* of the write modes be made the
default, and users must make an informed choice between the different write
modes, rather than being disappointed by a default choice they weren't
aware of.
So after this patch, Scylla refuses to boot if Alternator is enabled but
a "--alternator-write-isolation" option is missing.
The patch also modifies the relevant documentation, adds the same option to
our docker image, and the modifies the test-running script
test/alternator/run to run Scylla with the old default mode (always_use_lwt),
which we need because we want to test RMW operations as well.
Fixes#6452
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200524160338.108417-1-nyh@scylladb.com>
The format is currently sitting in storage_service, but the
previous set patched all the users not to call it, instead
they use sstables_manager to get the highest supported format.
So this set finalizes this effort and places the format on
sstables_manager(s).
The set introduces the db::sstables_format_selector, that
- starts with the lowest format (ka)
- reads one on start from system tables
- subscribes on sstables-related features and bumps
up the selection if the respective feature is enabled
During its lifetime the selector holds a reference to the
sharded<database> and updates the format on it, the database,
in turn, propagates it further to sstables_managers. The
managers start with the highest known format (mc) which is
done for tests.
* https://github.com/xemul/scylla br-move-sstables-format-4:
storage_service: Get rid of one-line helpers
system_keyspace: Cleanup setup() from storage_service
format_selector: Log which format is being selected
sstables_manager: Keep format on
format_selector: Make it standalone
format_selector: Move the code into db/
format_selector: Select format locally
storage_service: Introduce format_selector
storage_service: Split feature_enabled_listener::on_enabled
storage_service: Tossing bits around
features: Introduce and use masked features
features: Get rid of per-features booleans
Instead of waiting for all replicas to reply execute prune after quorum
of replicas. This will keep system.paxos smaller in the case where one
node is down.
Fixes#6330
Message-Id: <20200525110822.GC233208@scylladb.com>
* seastar ee516b1c...37774aa7 (12):
> task: specify the default constructor as noexcept
> scheduling: scheduling_group: specify explicit constructor as noexcept
> net: tcp: use var after std::move()ed
> future: implement make_exception_future_with_backtrace
> future: Add noexcept to a few functions
> scheduling: Add noexcept to a couple of functions
> future: Move current_exception_as_future out of internal
> future: Avoid a call to std::current_exception
> seastar.hh: fix typo in doxygen main page text
> future: Replace a call to futurize_apply with futurize_invoke
> rpc: document how isolation work
> future: Optimize any::move_it
rt is moved before rt.tomb.timestamp is retrieved, so there's a
something that looks like use-after-move here (but really isn't).
found it while auditting the code.
[avi: adjusted changelog to note that it's not really a use-after-move]
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200525141047.168968-1-raphaelsc@scylladb.com>
Merge pull request https://github.com/scylladb/scylla/pull/6484 by
Kamil Braun:
Allow a node to join without bootstrapping, even if it couldn't contact
other nodes.
Print a BIG WARNING saying that you should never join nodes without
bootstrapping (by marking it as a seed or using auto_bootstrap=off).
Only the very first node should (must) be joined as a seed.
If you want to have more seeds, first join them using the only supported
way (i.e. bootstrap them), and only AFTER they have bootstrapped, change
their configuration to include them in the seed list.
Does not fix, but closes#6005. Read the discussion: it's enlightening.
See scylladb/scylla-docs#2647 for the correct procedure of joining a node.
Reverts 7cb6ac3.
The tests for the contains() operator of FilterExpression were based on
an incorrect understanding of what this operator does. Because the tests
were (as usual) run against DynamoDB and passed, there was nothing wrong
in the test per se - but it contains comments based on the wrong
understanding, and also various corner cases which aren't as interesting
as I thought (and vice versa - missed interesting corner cases).
All these tests continue to pass on DynamoDB, and xfail on Alternator
(because we didn't implement FilterExpression yet).
Refs #5038.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200525123812.131209-1-nyh@scylladb.com>
Same as 9d91ac345a, drop dependency on pystache
since it nolonger present in Fedora 32.
To implement it, simplified debian package build process.
It will be generate debian/ directory when building relocatable package,
we just need to run debuild using the package.
To generate debian/ directory this commit added debian_files_gen.py,
it construct whole directory including control and changelog files
from template files.
Since we need to stop pystache, these template files swiched to
string.Template class which is included python3 standard library.
see: https://github.com/scylladb/scylla/pull/6313
This patch cleans the estimated histogram implementation.
It removes the FIXME that were left in the code from the migration time
and the if0 commented out code.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
We call shuffle() with a random_device, extracting a true random
number in each of the many calls shuffle() will invoke.
Change it to use a random_engine seeded by a random_device.
This avoids exhausting entropy, see [1] for details.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94087
rand_int() re-creates a random device each time it is called.
Change it to use a static random_device, and get random numbers
from a random_engine instead of from the device directly.
This avoids exhausting entropy, see [1] for details.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94087
rand_int() re-creates a random device each time it is called.
Change it to use a static random_device, and get random numbers
from a random_engine instead of from the device directly.
This avoids exhausting entropy, see [1] for details.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94087
tests_random_ops() extracts a real random number from a random_device.
Change it to use a random number engine.
This avoids exhausting entropy, see [1] for details.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94087
Make the database be the format_selector target, so
when the format is selected its set on database which
in turn just forwards the selection into sstables
managers. All users of the format are already patched
to read it from those managers.
The initial value for the format is the highest, which
is needed by tests. When scylla starts the format is
updated by format_selector, first after reading from
system tables, then by selectiing it from features.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Remove the selector from storage_service and introduce
an instance in main.cc that starts soon after the gossiper
and feature_service, starts listening for features and
sets the selected format on storage_service.
This change includes
- Removal of for_testing bit from format_selector constructor,
now tests just do not use it
- Adding a gate to selection routine to make sure on exit all
the selection stuff is done. Although before the cluster join
the selector waits for the feature listeners to finish (the
.sync() method) this gate is still required to handle aborted
start cases and wait for gossiper announcement from selector
to complete.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now format_selector uses storage_service as a place to
keep the selected format. Change this by keeping the
selected format on selector itself and after selection
update one on the target.
The selector starts with the lowest format to maybe bumps
it up later.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The final goal is to have a entity that will
- read the saved sstables format (if any)
- listen for sstables format related features enabling
- select the top-most format
- put the selected format onto a "target"
- spread the world about it (via gossiper)
The target is the service from which the selected format is
read (so the selector can be removed once features agreement
is reached). Today it's the storage_service, but at the end
of this series it will be sstables_manager.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The split is into two parts, the goal is to move the 2nd one (the
selection logic itself) into another class.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The goal is to have main.cc add code between prepare_to_join
and join_token_ring. As a side effect this drives us closer
to proper split of storage service into sharded service itslef
vs start/boot/join code.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Nowadays the knowledge about known/supported features is
scattered between frature_service and storage_service. The
latter uses knowledge about the selected _sstables_format
to alter the "supported" set.
Encapsulate this knowledge inside the feature_service with
the help of "masked_features" -- those, that shouldn't be
advertized to other nodes. When only maskable feature for
today is the UNBOUNDED_RANGE_TOMBSTONES one. Nowadays it's
reported as supported only if the sstables format is MC.
With this patch it starts as masked and gets unmasked when
the sstables format is selected to be MC, so the change is
correct.
This will make it possible to move sstables_format from
storage service to anywhere else.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The set of bool enable_something-s on feature_fonfig duplicates
the disabled_features set on it, so remove the former and make
full use of the latter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We had a very limited set of tests for the KeyConditions feature of
Query, which some error cases as well as important use cases (such as
bytes keys), leading to bugs #6490 and #6495 remaining undiscovered.
This patch adds a comprehensive test for the KeyConditions and (hopefully)
all its different combinations of operators, types, and many cases of errors.
We already had a comprehensive test suite for the newer
KeyConditionsExpression syntax, and this patch brings a similar level of
coverage for the older KeyConditions syntax.
Refs #6490
Refs #6495
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200524141800.104950-3-nyh@scylladb.com>
Improve error messages coming from Query's KeyCondition parameter when
wrong ComparisonOperators were used (issue discovered by @Orenef11).
At one point the error message was missing a parameter so resulted in an
internal error, while in another place the message mentioned an unuseful
number (enum) for the operator instead of its name. This patch fixes these
error messages.
Fixes#6490
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200524141800.104950-2-nyh@scylladb.com>