"
There seem to be two problems with handling snapshot API -- one
on start and the other one on stop. Here's the set that addresses
both.
The fix moved snapshot API registration later in time that required
Amnon's ACK. Now we have it :) so -- the rebase and resend.
Tests: unit(dev), start-stop
"
* 'br-snapshot-bugs-2' of https://github.com/xemul/scylla:
snapshot: Pass requests through gate
api: Register snapshot API later
api: Unwrap wrap_ks_cf
"
With these changes and a binutils compiled with
--enable-deterministic-archives, the only difference I get in the
build directory if I build scylla twice from scratch are:
* The various CMakeError.log because they have temporary file names.
* The various CMakeOutput.log for the same reason.
* .ninja_log and .ninja_deps. I am not sure what the contents are.
"
* 'espindola/fix-determinism' of https://github.com/espindola/scylla:
build: remove timestamps from then antlr output
build: Make the output of idl-compiler deterministic
This depends on the patch
mk: avoid combining -r and -export-dynamic linker options
being added to dpdk.
I benchmarked this on top of my patches to get a reproducible build. I
first compiled with ccache, deleted the build directory and recompiled
so that all the "gcc -c" invocations were served by ccache. The times
of the second "ninja release" invocations were:
lld:
ninja release 155.68s user 71.89s system 2077% cpu 10.953 total
gold:
ninja release 953.79s user 254.71s system 2533% cpu 47.699 total
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200127171516.26268-1-espindola@scylladb.com>
> memory: add scoped_heap_profiling
> build: add switch to enable heap profiling support
> io_tester: do not abort on end of test
> resource: clean up cgroups version determination.
> prometheus: Silence a bogus gcc warning in http server
> Update dpdk submodule
> resource: Support cgroups v2
> net: Don't use variable length arrays
> core/memory.hh: document set_heap_profiling_enabled()
> Revert "net: Don't use variable length arrays"
> cmake: fix pkgconfig boost deps
> thread: Avoid confusing comment by switching value
> net: posix-stack: fix allocator in ap listening sockets
> net: posix-stack: fix passing allocator to new sockets
> stall_detector: Add a counter for stall detector report
> Merge "Don't use variable length arrays" from Rafael
> treewide: fix minor issues reported by clang
> thread: Call mprotect in make_stack
> thread: Always allocate stack with aligned_alloc
> build: Make SEASTAR_THREAD_STACK_GUARDS private
> thread: Move code out of a header
Merged patch series from Konstantin Osipov:
This series sets cql_repl core count to 1 and adds LWT
unit tests.
test.py: invoke cql_repl with smp=1
lwt: add lightweight transactions unit tests
Merged patch series from Piotr Sarna:
In order to minimize the usage of throws and catches in code paths
that are potentially hot, these paths instead return appropriate errors
directly.
The server layer is still able to catch and translate errors,
but the preferred way is to return api_error directly in places
that may be performance-sensitive.
Tests: alternator-test(local)
Fixes#5472
Piotr Sarna (3):
alternator: change request return type to variant<value, error>
alternator: elide throwing in condition checks
alternator: replace top-level throws with returns in executor
alternator/executor.hh | 28 ++++----
alternator/server.hh | 4 +-
alternator/executor.cc | 141 +++++++++++++++++++++--------------------
alternator/server.cc | 44 ++++++++-----
4 files changed, 117 insertions(+), 100 deletions(-)
Exclude cql_repl from the list of tests, since it's not a test.
Build it as a separate app. Do not strip, so that any CQL test
failure is easy to debug without a rebuild.
All test-related targets are converted from lists to sets to avoid
quadratic lookup cost in the check inside the loop which creates the
ninja file.
Currently --ami does not check instance types, creates invalid
io_properties.yaml on unsupported instance types.
It actually won't occur on AMI startup, since scylla_ami_setup only
invoke scylla_io_setup --ami when the instance is supported, so we don't
get the issue on startup, but we still get when we run scylla_io_setup
manually.
It's better to check instance type on scylla_io_setup, too.
Refs #5438
Conditional updates inform the user that the condition is not met
by returning an error. An initial implementation was based on rethrowing
these errors, but returning them directly is considered better
for performance.
This will prevent contention in case of parallel updates of the same row
by the same coordinator. The patch does it by introducing a new per key
lock map and taking it before running PAXOS protocol (either for write
of for read).
Message-Id: <20200117101228.GA14816@scylladb.com>
In order to minimize the use of exceptions during normal operations,
each request handler is now able to return either a proper JSON value,
or an instance of api_error, which indicates that something went wrong,
but without having to throw, catch and rethrow C++ exceptions.
This is especially important for conditional updates, since it's
expected to be common to return ConditionalCheckFailedException.
Message-Id: <d8996a0a270eb0d9db8fdcfb7046930b96781e69.1579515640.git.sarna@scylladb.com>
Docker restricts the number of processes in a container to some
limit it calculates. This limit turns out to be too low on large
machines, since we run multiple links in parallel, and each link
runs many threads.
Remove the limit by specifying --pids-limit -1. Since dbuild is
meant to provide a build environment, not a security barrier,
this is okay (the container is still restricted by host limits).
I checked that --pids-limit is supported by old versions of
docker and by podman.
Fixes#5651.
Message-Id: <20200127090807.3528561-1-avi@scylladb.com>
"
Final set of changes for full scan metrics.
- allow filtering
- full scan (Note: non-system tables only)
- full scan without BYPASS CACHE option
- tests for all metrics (bypass cache, allow filtering, full scan)
- works with prepared statements (tested, too)
"
* 'as_full_scan_metrics' of https://github.com/alecco/scylla:
Range scan query counter
Counter of queries doing full scan.
ALLOW FILTERING query counter
These unit tests cover all CQL aspects of lightweight transactions,
such as grammar, null semantics, batch semantics, result set
format, and so on.
For now, comment out unicode tests: test output depends
on libjsoncpp version in use.
When the scylla process is stopped no code waits for
current snapshot operations to finish. Also, the API
server is not stopped either, so new snapshot requests
can creep into.
In seastar there's a useful abstraction to address both.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In storage_service's snapshot code there are checks for
_operation_mode being _not_ JOINING to proceed. The intention
is apparently to allow for snapshots only after the cluster
join. However, here's how the start-up code looks like
- _operation_mode = STARTING in storage_service::constructor
- snapshot API registered in api::set_server_storage_service
- _operation_mode = JOINING in storage_service::join_token_ring
So in between steps 2 and 3 snapshots can be taken.
Although there's a quick and simple fix for that (check for the
_operation_mode to be not STARTING either) I think it's better
to register the snapshot API later instead. This will help
greatly to de-bload the storage_service, in particular -- to
incapsulate the _operation_mode properly.
Note, though the check for _operation_mode is made only for
taking snapshot, I move all snapshot ops registration to the
later phase.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is preparation for the next patch -- the lambda in
question (and the used type) will be needed in two
functions, so make the lambda a "real" function.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We need to add '~' to handle rcX version correctly on Debian variants
(merged at ae33e9f), but when we moved to relocated package we mistakenly
dropped the code, so add the code again.
Fixes#5641
The method view_info::partition_ranges() is unused.
Also drop the now-dead _partition_ranges data member.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
This is a fourth iteration of the patch series adding LWT usage
(instead of the old naive - and wrong - read before write) to
Alternator, as well as full support for the ConditionExpression
syntax for conditional updates.
Changes in v4:
* Rebased to most recent master
* Replaced 3 booleans which had 2^3 = 8 theoretical combinations,
by just 4 options in enum write_isolation:
FORBID_RMW, LWT_ALWAYS, LWT_RMW_ONLY, UNSAFE_RMW
The four options are described in details comments.
* Fix reversed assertion in FORBID_RMW case.
* Two new metrics: write_using_lwt and shard_bounce_for_lwt.
* Fail boot if alternator is enabled, but LWT isn't.
* Add information about enabling LWT in docs/alternator/alternator.md
* nyh/v4-lwt:
alternator: add support for ConditionExpression
alternator: reimplement read-modify-write operations using LWT
alternator: make "executor" a peering_sharded_service
Implements a counter of executions of SELECT queries with ALLOW FILTERING option.
In scope of #5209
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
This patch adds support for the ConditionExpression parameter of the
item-writing operations in Alternator: PutItem, UpdateItem and DeleteItem.
We already supported conditional updates/put/delete using the "Expected"
parameter. The ConditionExpression parameter implemented here provides a
very similar feature, using a different - and also newer and more powerful -
syntax.
The implementation here reuses much of our existing expression-parsing
infrastructure. Unsurprisingly, ConditionExpression's syntax has much in
common with UpdateExpression which we already support) and also many of the
comparison functions already implemented for "Expected". However, it's still
quite a bit of new code, because of the many different comparisons, functions,
and syntax variations we need to support.
This patch also expands alternator-test/test_condition_expression.py with
a few additional corner cases discovered during the development of this
patch.
Almost all of the tests for this feature (35 out of 39) now pass.
Two tests still fail because we don't yet support nested attributes (this
is a missing feature across Alternator), and two tests fail because of minor
ideosyncracies in DynamoDB's error path that we chose not to duplicate
yet (but still remember the difference in the form of an xfailing test).
Fixes#5035
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In this patch, we re-implement the three read-modify-write operations -
PutItem, UpdateItem, DeleteItem. All three operations may need to read the
item before writing it to support conditional updates (the "Expected"
parameter) and UpdateItem may also need the previous item's value for
its update expression (e.g., a user may ask to "set a=a+1" or "set a=b").
Before this patch, the implementation of RMW operations simply did a read,
and then a write - without any attempt to protect concurrent operations.
In this patch, Scylla's LWT mechanism (storage_proxy::cas()) is used
instead, to ensure that concurrent update operations are correctly
isolated even if they are conditional. This means that Alternator now
requires the experimental LWT feature to be enabled (and refuses to
boot if it isn't).
The version presented here is configured to always use LWT for *every*
write, regardless of whether it has a condition or not. So it will
will significantly slow down write-only workloads like YCSB. But the code
in this patch actually includes three other modes, which can be chosen by
setting an enum constant in the code. In the future we will want to let the
user configure this mode, globally, per table or per attribute.
Note that read requests are NOT modified, and work exactly as they did
before: i.e., strongly-consistent reads are done using a normal
CL=LOCAL_QUORUM read - not via LWT. I believe this is good enough given
Dynamo's guarantees, and critical for our read performance.
Also note that patch doesn't yet fix the BatchWriteItem operation.
Although BatchWriteItem does not support any RMW operations - just pure
writes - we may still need to do those pure writes using LWT. This
should be fixed in a follow-up patch.
Unfortunately, this patch involves a large amount of code movement and
reorganization, because:
1. The cas operation requires each operation to be made into an object,
with a separate apply() function, forcing a lot of code to move.
2. Moreover, we need to do this for three different operations (PutItem,
UpdateItem, DeleteItem) so to avoid massive code duplication, I had
to move some common code.
3. The cas operation also forced us to change some of the utility functions'
APIs.
The end result is that this patch focuses more on a compact and
understandable *end result* than it does on an easy to understand *patch*,
so reviewers - sorry about that.
All alternator-test/ tests pass with this patch (and also with all of the
different optional modes enabled). However, other than that, I did not yet
do any real isolation tests (are concurrent operations really isolated
correctly? or is LWT just faking it? :-) ), performance tests or stress
tests - and I'll definitely need to do those as well.
Fixes#5054
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Alternator uses a sharded<executor> for handling execution of Alternator
requests on different shards. In this patch we make executor a subclass of
peering_sharded_service, to allow one of these executors to run an exector
method on a different shard: Any one of the shard-local executor instances
can call container() to get the full sharded<executor>.
We will need this capability later, when we need to bounce requests between
shards because of requirements of the storage_proxy::cas (LWT) code.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The output of antrl always has the timestamp of when it was
created. This expands the existing sed hack to remove that too.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
If at any point during the topological sort we had more than one node
with zero dependencies, the order they were printed was not
deterministic.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
"
This series refactors the code used by migration_notifier and gossiper
into an atomic_vector type.
"
* 'espindola/gossiper_atomic_vector' of https://github.com/espindola/scylla:
gossiper: Store subscribers in an atomic_vector
load_broadcaster: Unregister from load_broadcaster::stop_broadcasting
repair: add row_level::stop()
locator: Return future from i_endpoint_snitch::reload_gossiper_state
service: Refactor code into a atomic_vector class
migration_manager: Fix typo
load_meter: Use a shared_ptr to store a load_broadcaster
The new guarantees are a bit better IMHO:
Once a subscriber is removed, it is never notified. This was not true
in the old code since it would iterate over a copy that would still
have that subscriber.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This templates the code for listener_vector, renames it to
atomic_vector and moves it to the utils directory.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
load_broadcaster::stop_broadcasting uses shared_from_this(). Since
that is the only reference that the produced shared_ptr knows of, it
is deleted immediately. Fix that by also using a shared_ptr in
load_meter.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
* seastar afc46681...147d50b1 (6):
> perftune.py: Use safe_load() for fix arbitrary code execution
Fixes#5630
> clang: current_exception_as_future must be in namespaced
> tests: add an expected failures version of thread fixture
> Enable stack guards in Dev builds
> net: posix: Introduce load_balancing_algorithm::fixed
> stream: Move _next from subscription to stream
`parsed_statement::get_bound_variables` is assumed to always
return a nonnull pointer to `variable_specifications` instance.
In this case using a pointer is superfluous and can be safely
replaced by a plain reference.
Also add a default ctor and a utility method `set_bound_variables`
to the `variable_specifications` class to actually reset the
contents of the class instance.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200120195839.164296-1-pa.solodovnikov@scylladb.com>
"
As a followup to 0bde590
This series implements suggestions from @avikivity and @espindola
It simplifies the template definitions for accumulator_for,
adds some debug logging for the overflow values,
and adds unit tests for float and double sum overflow.
Test: unit(dev),
paging_test:TestPagingWithIndexingAndAggregation.test_filter_{indexed,non_indexed,pk}_column(dev)
"
* 'simplify-sum-overflow' of https://github.com/bhalevy/scylla:
test: cql_query_test: test float/double sum overflow
cql3: aggregate_fcts: simplify accumulator_for template definitions
It is the only place where get_changed_ranges_for_leaving is not running
inside a thread. Preparing patch to futurize get_changed_ranges_for_leaving.
Refs: #5495
A mistake in handling legacy checks for special 'idx_token' column
resulted in not recognizing materialized views backing secondary
indexes properly. The mistake is really a typo, but with bad
consequences - instead of checking the view schema for being an index,
we asked for the base schema, which is definitely not an index of
itself.
Branches 3.1,3.2 (asap)
Fixes#5621Fixes#4744
Before this patch the iterations over migration_notifier::_listeners
could race with listeners being added and removed.
The addition side is not modified, since it is common to add a
listener during construction and it would require a fairly big
refactoring. Instead, the iteration is modified to use indexes instead
of iterators so that it is still valid if another listener is added
concurrently.
For removal we use a rw lock, since removing an element invalidates
indexes too. There are only a few places that needed refactoring to
handle unregister_listener returning a future<>, so this is probably
OK.
Fixes#5541.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200120192819.136305-1-espindola@scylladb.com>
"
This series fixes all abandoned failed futures in cql_query_test and
starts running it with --fail-on-abandoned-failed-futures to avoid
regressions.
"
* 'espindola/fix-abandoned-failed-futures' of https://github.com/espindola/scylla:
cql_query_test: Avoid new abandoned failed futures
cql_query_test: Explicitly ignore a failed future
cql_query_test: Remove duplicated do_with_cql_env_thread
cql_query_test: Fix cql and values in test_int_sum_with_cast