* seastar 6b21197...2ebe842 (6):
> Merge "Various improvements to execution stages" from Paweł
> app-template: allow apps to specify a name for help message
> bool_class: avoid initializing object of incomplete type
> app-template: make sure we can still get help with required options
> prometheus: Http handler that returns prometheus 0.4 protobuf or text format
> Update DPDK to 17.02
Includes patch from Pawel to adjust to updated execution_stage interface.
This patch changes the migration path for table updates such that the
base table mutations are sent and applied atomically with the view
schema mutations.
This ensures that after schema merging, we have a consistent mapping
of base table versions to view table versions, which will be used in
later patches.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Each metric should have a unique name. This patch renames the queue-length
metric from throttled_writes to current_throttled_writes.
Without it, metrics will be reported twice under the same name, which
may cause errors in the prometheus server.
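The uniqueness requirement can be illustrated with a minimal sketch. The `metric_registry` class below is hypothetical and is not the seastar metrics API; it only shows how a second registration under the same name would be detected.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical registry that refuses to register the same metric name twice.
class metric_registry {
    std::set<std::string> _names;
public:
    // Returns true if the name was new, false if it was already taken.
    bool add(const std::string& name) {
        assert(!name.empty());
        return _names.insert(name).second;
    }
};
```

In this sketch, registering throttled_writes a second time is rejected, while the renamed current_throttled_writes is accepted.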
This could be related to scylladb/seastar#250
Fixes #2163.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20170314081456.6392-1-amnon@scylladb.com>
"If a node is bootstrapped with auto_bootstrap disabled, it will not
wait for schema sync before creating global keyspaces for auth and
tracing. When such schema changes are then reconciled with schema on
other nodes, they may overwrite changes made by the user before the
node was started, because they will have higher timestamp.
To prevent that, use the minimum timestamp so that the default schema
always loses to manual modifications. This is what Cassandra does.
Fixes #2129."
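Why the minimum timestamp makes the default schema lose can be shown with a minimal sketch of last-write-wins reconciliation. The `cell` type, `min_timestamp` constant, `reconcile()` function and the tie-breaking rule are hypothetical simplifications, not the actual schema merging code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical, simplified model of a timestamped schema value.
struct cell {
    int64_t timestamp;
    std::string value;
};

// Assumed stand-in for the lowest possible write timestamp.
constexpr int64_t min_timestamp = std::numeric_limits<int64_t>::min();

// Last-write-wins: the cell with the higher timestamp survives.
// Ties keep the left-hand side (an arbitrary choice for this sketch).
inline cell reconcile(const cell& a, const cell& b) {
    return b.timestamp > a.timestamp ? b : a;
}
```

Under these assumptions, a default value written with `min_timestamp` never replaces a user-written value, regardless of the order in which the two are merged.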
* tag 'tgrabiec/prevent-keyspace-metadata-loss-v1' of github.com:scylladb/seastar-dev:
db: Create default auth and tracing keyspaces using lowest timestamp
migration_manager: Append actual keyspace mutations with schema notifications
There is a workaround for notification race, which attaches keyspace
mutations to other schema changes in case the target node missed the
keyspace creation. Currently the workaround generates keyspace mutations on
the spot instead of using the ones stored in schema tables. Those
mutations have the current timestamp, as if the keyspace had just been
modified. This is problematic because it may overwrite keyspace
parameters with a newer timestamp but stale values if the node is not
up to date with keyspace metadata.
That's especially the case when booting up a node without enabling
auto_bootstrap. In such case the node will not wait for schema sync
before creating auth tables. Such table creation will attach
potentially out of date mutations for keyspace metadata, which may
overwrite changes made to keyspace parameters earlier in the
cluster.
Refs #2129.
"This series adds various optimisations to counter implementation
(nothing extreme, mostly just avoiding unnecessary operations) as well
as some missing features such as tracing and dropping timed out queries.
Performance was tested using:
perf-simple-query -c4 --counters --duration 60
The following results are medians.
        before      after       diff
write   18640.41    33156.81    +77.9%
read    58002.32    62733.93    +8.2%"
* tag 'pdziepak/optimise-counters/v3' of github.com:cloudius-systems/seastar-dev: (30 commits)
cell_locker: add metrics for lock acquisition
storage_proxy: count counter updates for which the node was a leader
storage_proxy: use counter-specific timeout for writes
storage_proxy: transform counter timeouts to mutation_write_timeout_exception
db: avoid allocations in do_apply_counter_update()
tests/counters: add test for apply reversability
counters: attempt to apply in place
atomic_cell: add COUNTER_IN_PLACE_REVERT flag
counters: add equality operators
counters: implement decrement operators for shard_iterator
counters: allow using both views and mutable_views
atomic_cell: introduce atomic_cell_mutable_view
managed_bytes: add cast to mutable_view
bytes: add bytes_mutable_view
utils: introduce mutable_view
db: add more tracing events for counter writes
db: propagate tracing state for counter writes
tests/cell_locker: add test for timing out lock acquisition
counter_cell_locker: allow setting timeouts
db: propagate timeout for counter writes
...
boost::split() returns one empty string if called on an empty input.
Trying to cast an empty string to a token value results in a bad_lexical_cast
exception. Fix it by handling an empty token list explicitly.
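The failure mode and the fix can be sketched without Boost. The helpers `split_keep_empty()` and `parse_tokens()` are hypothetical: the first mimics the splitting behaviour described above, and `std::stoll` stands in for the lexical cast (it throws `std::invalid_argument` on an empty field).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Split on ',' keeping empty fields, so "" yields one empty field
// (the behaviour the message above describes for boost::split()).
inline std::vector<std::string> split_keep_empty(const std::string& s) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = s.find(',', start);
        if (pos == std::string::npos) {
            out.push_back(s.substr(start));
            return out;
        }
        out.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
}

// Parse a comma-separated token list; the empty list is handled explicitly
// so that the single empty field is never handed to the conversion.
inline std::vector<int64_t> parse_tokens(const std::string& input) {
    std::vector<int64_t> tokens;
    if (input.empty()) {
        return tokens;
    }
    for (const auto& field : split_keep_empty(input)) {
        tokens.push_back(std::stoll(field)); // throws on a malformed field
    }
    return tokens;
}
```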
Message-Id: <20170302125405.GU11471@scylladb.com>
Adds yet another magic function "SCYLLA_COUNTER_SHARD_LIST", indicating that
argument value, which must be a list of tuples <int, UUID, long, long>,
should be inserted as an actual counter value, not an update.
This is, of course, to allow counters to be read from the sstable loader.
Note that we also need to allow timestamps for counter mutations,
as well as convince the counter code itself to treat the data as
already baked. So ugly wormhole galore.
v2:
* Changed flag names
* More explicit wormholing, bypassing normal counter path, to
avoid read-before-write etc
* throw exceptions on unhandled shard types in marshalling
v3:
* Added counter id ordering check
* Added batch statement check for mixing normal and raw counter updates
Message-Id: <1487683665-23426-2-git-send-email-calle@scylladb.com>
There are several problems with storage_proxy::send_to_endpoint right
now. It uses the create_write_response_handler() overload that is specific
to read repair, which is suboptimal and creates incorrect logs; it does
not process errors; and it does not hold the storage_proxy object until the
write is complete. This patch fixes all of these problems.
Message-Id: <20170208101949.GA19474@scylladb.com>
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
"This series uses the newly added histogram and label support to add metrics to
the storage_proxy and to the column_family.
This adds latency histograms and the metrics that were missing from the column family."
* 'amnon/histogram_metrics' of github.com:cloudius-systems/seastar-dev:
database: add metrics registration for the coloumn family
storage_proxy: add read and write latency histogram
estimated_histogram: returns a metrics histogram
Add a function for sending one mutation to one remote replica owning
this mutation. This is needed for materialized views, where each
base replica sends each view mutation to one particular view replica.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
"Fixes #1531
Adds lookup to gms::inet_address and uses it in (hopefully all) the
salient places where configured symbolic names are interpreted.
Removes the dummy dns module in scylla in favour of the seastar one."
* 'calle/use-dns' of github.com:cloudius-systems/seastar-dev:
remove scylla dns code
service::storage_service: Remove depedency on scylla dns
main.cc: remove scylla dns dependency
main/init: Lookup inet addresses from config by dns lookup
db::system_keyspace: Find rpc_address by lookup
gms::inet_address: Add lookup functionality.
scylla tls: Add option support for client auth and tls opts
Merge commit 45b6070832 used a butchered version of the storage_proxy
patch to adjust to the rpc timer change instead of the one I sent. This
patch fixes the differences.
Message-Id: <20170206095237.GA7691@scylladb.com>
Refs #1813 (fixes scylla part)
Added require_client_auth and priority_string options to
server_encryption_options/client_encryption_options and process them.
Allows TLS method/algo specification. Also enables enforcing known cert
authentication for both node-to-node and client communication.
* seastar 397685c...c1dbd89 (13):
> lowres_clock: drop cache-line alignment for _timer
> net/packet: add missing include
> Merge "Adding histogram and description support" from Amnon
> reactor: Fix the error: cannot bind 'std::unique_ptr' lvalue to 'std::unique_ptr&&'
> Set the option '--server' of tests/tcp_sctp_client to be required
> core/memory: Remove superfluous assignment
> core/memory: Remove dead code
> core/reactor: Use logger instead of cerr
> fix inverted logic in overprovision parameter
> rpc: fix timeout checking condition
> rpc: use lowres_clock instead of high resolution one
> semaphore: make semaphore's clock configurable
> rpc: detect timedout outgoing packets earlier
Includes treewide change to accommodate rpc changing its timeout clock
to lowres_clock.
Includes fixup from Amnon:
collectd api should use the metrics getters
As part of the preparation for the change in the metrics layer, this changes
the way the collectd api uses metric values: it now calls the getters
instead of accessing the member directly.
This will be important when the internal implementation changes
from a union to a variant.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1485457657-17634-1-git-send-email-amnon@scylladb.com>
The reason is the same as why foreground writes are reported instead of
total writes (049ae37d08): It is much easier to see what is going on
this way.
Also fixes a typo in a counter's description.
Fixes #1217
Message-Id: <20170129093412.GS11469@scylladb.com>
That's because a single shard is used to calculate generation for new
sstables in upload directory, and that will result in that single shard
sharing all the resources with other shards.
For refresh without upload dir, it currently works fine because we
reshuffle column family dir instead.
flush_upload_dir() is now a free function, takes a distributed database
object, and uses calculate_shard_from_sstable_generation() to decide
which shard will move sstable using its own generation namespace.
Fixes #2008.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <b0cccf7bbb61416ff8718bac92fdca90cc5fb9c9.1484253232.git.raphaelsc@scylladb.com>
As the metrics migration progressed, some includes of scollectd.hh were
left behind.
Because of the nature of the scollectd implementation, those includes
bring a lot of code into the header files and eventually into many
source files.
This patch removes those includes and adds a missing include to
storage_proxy.cc.
That the compiler did not complain about the missing include indicates how
problematic those includes were in the first place.
Before this patch, a change in metrics.hh caused 169 files to
recompile; after this change, 17.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1484667536-2185-1-git-send-email-amnon@scylladb.com>
This patch ensures that the host only announces and registers the
MATERIALIZED_VIEWS feature if it was started with the experimental
flag.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170116123412.21365-1-duarte@scylladb.com>
Since ce083308a1
"random_mutation_generator: Generate RTs by default" random mutation
generator produces range tombstones. However, so far the tests were run
with all features disabled (because of incomplete initialization of all
services) which meant that RANGE_TOMBSTONE feature was not enabled and
the code couldn't handle range tombstones that weren't just prefixes.
This patch solves the problem by forcing all features to be enabled when
tests are run.
Message-Id: <20170116103324.22956-1-pdziepak@scylladb.com>
query_mutations_locally() takes one_or_two_partition_ranges by
reference and requires, indirectly, that it is kept alive until the
operation resolves. However, we were passing an expiring value to it, the
result of unwrap().
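The lifetime requirement can be sketched with plain std::function in place of Seastar futures. All names below (`ranges`, `query_later()`, `query_later_owning()`) are hypothetical; the sketch only shows one way to keep a by-reference argument alive until a deferred operation runs, and it is not claimed to be the fix applied here. Seastar code commonly uses a helper such as do_with for this purpose.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

using ranges = std::vector<int>;

// Takes the ranges by reference and reads them only when the returned
// callable runs, so the caller must keep them alive until then.
// Passing a temporary here would leave a dangling reference.
inline std::function<int()> query_later(const ranges& r) {
    return [&r] { return static_cast<int>(r.size()); };
}

// Safe wrapper: move the value into storage owned by the deferred
// operation itself, so the reference stays valid for as long as needed.
inline std::function<int()> query_later_owning(ranges r) {
    auto owned = std::make_shared<ranges>(std::move(r));
    return [owned, op = query_later(*owned)] { return op(); };
}
```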
Fixes dtest failure in consistent_bootstrap_test.py:TestBootstrapConsistency.consistent_reads_after_bootstrap_test
Another potential problem was that we were dereferencing "s" in the same
expression which move-constructs an argument out of it.
Message-Id: <1484222759-4967-1-git-send-email-tgrabiec@scylladb.com>
"Intended to reduce memory usage when resharding by sharing sstable
components among shards. File descriptors are also shared from now
on, meaning that a much smaller number of file descriptors will be
used during resharding.
Fixes #1951."
branch 'excessive_memory_usage_v4' of github.com:raphaelsc/scylla
* 'excessive_memory_usage_v4' of github.com:raphaelsc/scylla:
db: avoid excessive memory usage during resharding
checked_file_impl: add support to dup
sstables: group sstable components that can be shared among shards
sstables: rename sstable member
After resharding, sstables may be owned by all shards, which
means that file descriptors and memory usage for metadata will
increase by a factor equal to number of shards. That can easily
lead to OOM.
SSTable components are immutable, so they can be stored in one
shard and shared with others that need it. We use the following
formula to decide which shard will open the sstable and share
it with the others: (generation % smp::count), which is the
inverse of how we calculate generation for new sstables.
So if no resharding is performed, everything is shard-local.
With this approach, resource usage due to loaded sstables will
be evenly distributed among shards.
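The formula above can be sketched as follows. `owner_shard()` is a direct transcription of (generation % smp::count); `new_generation()` is an assumed, hypothetical form of the generation scheme for new sstables, included only to show why the two are inverses.

```cpp
#include <cassert>
#include <cstdint>

// Shard that opens an sstable and shares it with the others.
inline unsigned owner_shard(int64_t generation, unsigned shard_count) {
    assert(shard_count > 0 && generation >= 0);
    return static_cast<unsigned>(generation % shard_count);
}

// Assumed generation scheme (hypothetical): each shard draws generations
// from its own residue class, so owner_shard() maps a generation back to
// the shard that created it.
inline int64_t new_generation(int64_t base, unsigned shard, unsigned shard_count) {
    assert(shard < shard_count);
    return base * shard_count + shard;
}
```

Under this assumption, an sstable created by shard 3 of 4 is always opened by shard 3, so without resharding everything stays shard-local.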
For this approach to work, we now only populate keyspaces from
shard 0. It is now solely responsible for iterating through
column family dirs. In addition, most of the population functions
are now free functions and take a distributed database object as a parameter.
Fixes #1951.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
mutation_result_merger::get() assumes that the merged result may be a
short read if at least one of the partial results is a short read (in
other words, if none of the partial results is a short read, then the
merged result is also not a short read). However this is not true;
because we update the memory accounter incrementally, we may stop
scanning early. All the partial results are full; but we did not scan
the entire range.
Fix by changing the short_read variable initialization from `no`
(which assumes we'll encounter a short read indication when processing
one of the batches) to `this->short_read()`, which also takes into
account the memory accounter.
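The problem can be sketched with a simplified model. The types `partial_result`, `memory_accounter` and `merged`, and the function `merge_partials()`, are hypothetical and much simpler than mutation_result_merger; the sketch only shows that a merge which stops early on a memory limit must report a short read even if no partial result was short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct partial_result {
    std::size_t bytes;
    bool short_read;
};

// Hypothetical accounter, updated incrementally while merging.
class memory_accounter {
    std::size_t _limit;
    std::size_t _used = 0;
public:
    explicit memory_accounter(std::size_t limit) : _limit(limit) {}
    // Returns true once the limit has been reached.
    bool update_and_check(std::size_t bytes) {
        _used += bytes;
        return _used >= _limit;
    }
};

struct merged {
    std::size_t consumed; // number of partial results merged
    bool short_read;
};

// The merged result is short if any partial result is short OR if the
// accounter made us stop before consuming every partial result.
inline merged merge_partials(const std::vector<partial_result>& partials, std::size_t limit) {
    memory_accounter acc(limit);
    bool short_read = false;
    std::size_t consumed = 0;
    for (const auto& p : partials) {
        ++consumed;
        short_read = short_read || p.short_read;
        if (acc.update_and_check(p.bytes) && consumed < partials.size()) {
            short_read = true; // stopped scanning early
            break;
        }
    }
    return {consumed, short_read};
}
```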
Fixes #2001.
Message-Id: <20170108111315.17877-1-avi@scylladb.com>
Transform the supervisor_notify() and related functions into
the "supervisor" class and place this class implementation in
a separate .cc file.
This fixes the compilation breakage of tests introduced by
commit 8014adc2a1
"init: serialize the creation of system_traces KS objects".
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1483663955-20096-1-git-send-email-vladz@scylladb.com>
During a range scan, we try to avoid sorting according to partition range
when we can do so. This is when we scan fewer than smp::count shards --
each shard's range is strictly ordered with respect to the others.
However, we use the wrong key for the sort -- we use the shard number. But
if we started at shard s > 0 and wrapped around to shard 0, then shard 0's
range will be after the range belonging to shard s, but will sort before it.
Fix by storing the iteration order as the sort key. We use that when we
know that shards do not overlap (shards < smp::count) and the index within
the source partition range vector when they do.
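The wrap-around case can be sketched as follows. The `shard_result` type and the functions `visit_shards()` and `order_shards()` are hypothetical; they only contrast sorting by shard number with sorting by iteration order when fewer than the total number of shards are scanned.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct shard_result {
    unsigned shard; // shard that produced the result
    unsigned order; // position in iteration order, 0 = first shard visited
};

// Visit `count` consecutive shards starting at `first`, wrapping at shard_count.
inline std::vector<shard_result> visit_shards(unsigned first, unsigned count, unsigned shard_count) {
    assert(count < shard_count); // only then are the shard ranges strictly ordered
    std::vector<shard_result> out;
    for (unsigned i = 0; i < count; ++i) {
        out.push_back({(first + i) % shard_count, i});
    }
    return out;
}

// Return shard ids sorted either by shard number (the buggy key)
// or by iteration order (the fixed key).
inline std::vector<unsigned> order_shards(std::vector<shard_result> v, bool by_iteration_order) {
    std::sort(v.begin(), v.end(), [=](const shard_result& a, const shard_result& b) {
        return by_iteration_order ? a.order < b.order : a.shard < b.shard;
    });
    std::vector<unsigned> ids;
    for (const auto& r : v) {
        ids.push_back(r.shard);
    }
    return ids;
}
```

With 4 shards and a scan that starts at shard 3 and wraps to shard 0, sorting by shard number yields 0, 3, which puts the later range first; sorting by iteration order yields 3, 0.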
Fixes #1998.
Message-Id: <20170105114253.17492-1-avi@scylladb.com>
Serialize the creation of the system_traces KS objects when
they do not exist, i.e. at the initial cluster boot.
Avoid creating them in parallel on different cluster nodes
in order to avoid issue #420.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1483552503-12873-3-git-send-email-vladz@scylladb.com>