The integer_type_impl::parse_int() function uses boost::lexical_cast()
under the hood, which parses 8-bit numbers as characters. Fix the
function to lexical cast to 64-bit integer and convert the result to
integer_type_impl template type.
Previously, if the Prometheus port (by default, 0.0.0.0:9180) could not
be opened, the following message appeared in the log about 10 seconds into
the run, and Scylla crashed.
ERROR 2017-01-01 19:31:04,066 [shard 0] seastar - Exiting on unhandled exception: std::system_error (error system:98, Address already in use)
The puzzled user would have no idea *which* address was already in use, why,
or why Scylla stopped.
In this patch, before the above message we get the much more informative
message:
ERROR 2017-01-01 19:58:19,080 [shard 0] init - Could not start Prometheus API server on 0.0.0.0:9180: std::system_error (error system:98, Address already in use)
We continue to print the original message - and exit - in this case,
under the assumption that it's better not to run the database while
improperly configured.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170102121304.2060-1-nyh@scylladb.com>
The test case is interested in the upper boundary of 32-bit integer
because we already test the lower boundary in assertions below. The old
test passed, of course, but it wasn't very interesting.
Message-Id: <1483522773-6008-1-git-send-email-penberg@scylladb.com>
Fix the following build breakage:
FAILED: build/release/gen/cql3/CqlParser.o
g++ -MMD -MT build/release/gen/cql3/CqlParser.o -MF build/release/gen/cql3/CqlParser.o.d -std=gnu++1y -g -Wall -Werror -fvisibility=hidden -pthread -I/home/penberg/scylla/seastar -I/home/penberg/scylla/seastar/fmt -I/home/penberg/scylla/seastar/build/release/gen -march=nehalem -Ifmt -DBOOST_TEST_DYN_LINK -Wno-overloaded-virtual -DFMT_HEADER_ONLY -DHAVE_HWLOC -DHAVE_NUMA -DHAVE_LZ4_COMPRESS_DEFAULT -O2 -DBOOST_TEST_DYN_LINK -Wno-maybe-uninitialized -DHAVE_LIBSYSTEMD=1 -I. -I build/release/gen -I seastar -I seastar/build/release/gen -c -o build/release/gen/cql3/CqlParser.o build/release/gen/cql3/CqlParser.cpp
In file included from ./query-request.hh:31:0,
from ./locator/token_metadata.hh:51,
from ./locator/abstract_replication_strategy.hh:29,
from ./database.hh:26,
from ./service/storage_proxy.hh:44,
from ./db/schema_tables.hh:43,
from ./db/system_keyspace.hh:46,
from ./cql3/functions/function_name.hh:45,
from ./cql3/selection/selectable.hh:48,
from ./cql3/selection/writetime_or_ttl.hh:45,
from build/release/gen/cql3/CqlParser.hpp:63,
from build/release/gen/cql3/CqlParser.cpp:44:
./tracing/tracing.hh:357:5: error: ‘scollectd’ does not name a type
scollectd::registrations _registrations;
^~~~~~~~~
Message-Id: <1482939751-8756-1-git-send-email-penberg@scylladb.com>
Commit d41cd48a made the is_joined() method a future<bool> because
only cpu 0 knows its real value. This makes this function inconvenient
to use. So this patch reverts commit d41cd48a, and instead sets this
flag's value on all shards, so each shard can read its value locally
(and immediately).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20161228160450.5831-1-nyh@scylladb.com>
* seastar f32e4c2...1c8e389 (2):
> Merge "migrate network related seastar collectd metrics to the new metrics registration API" from Vlad
> file: add dup() support
This patch adds an overload to the apply() function,
which takes a clustering_row by reference, to copy. This will be
needed by future patches, when merging base table updates with the
existing data.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1482881106-3202-1-git-send-email-duarte@scylladb.com>
This patch removes references to the old begin_range_tombstone and
end_range_tombstone mutation_fragments, which have been replaced by a
single range_tombstone fragment.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1482880820-2831-1-git-send-email-duarte@scylladb.com>
- Change an exception type thrown by a tracing::tracing::set_trace_probability()
to make it different from the one thrown by an std::stod() when it fails to
parse a given string.
- Catch the std::out_of_range exception thrown by a tracing::tracing::set_trace_probability() and
wrap the exception string into the httpd::bad_param_exception() object.
- Throw a httpd::bad_param_exception() with a
"Bad format in a probability value: <a user given probability string value>"
message if std::invalid_argument is caught.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1465300738-1557-1-git-send-email-vladz@cloudius-systems.com>
Refuse to boot if we don't have at least 1 GiB per shard, unless in developer
mode.
The primary violator here is docker, but since it starts in developer mode,
it won't get fixed. We need some extra logic for this case.
Message-Id: <20161221090222.28677-1-avi@scylladb.com>
"This patchset contains fixes for the changes introduced in "Query result
size limiting". It also improves handling of short data reads.
I order to minimise chances of digest mismatch during data queries replicas
that were asked just to return a digest also keep track of the size of the
data (in the IDL representation) so that they would stop at the same point
nodes doing full data queries would. Moreover, data queries are not
affected by per-shard memory limit and the coordinator sends individual
result size limits to replicas in order not to depend on hardcoded values.
It is still possible to get digest mismatches if the IDL changes (e.g. a
new field is added), but, hopefully, that won't be a serious problem."
* 'pdziepak/short-read-fixes/v4' of github.com:cloudius-systems/seastar-dev:
query: introduce result_memory_accounter::foreign_state
storage_proxy: fix short reads in parallel range queries
storage_proxy: pass maximum result size to replicas
mutation_partition: use result limiter for digest reads
query: make result_memory_limiter constants available for linker
result_memory_limiter: add accounter for digest reads
idl: allow writers to use any output stream
result_memory_limiter: split new_read() to new_{data, mutation}_read()
idl: is_short_read() was added in 1.6
mutation_partition: honour allowed_short_read for static rows
storage_proxy: fix _is_short_read computation
storage_proxy: disallow short reads if got no live rows
storage_proxy: don't stop after result with no live rows
This reverts commit aa392810ff, reversing
changes made to a24ff47c637e6a5fd158099b8a65f1191fc2d023; it uses
boost::intrusive::detail directly, which it must not, and doesn't compile on
all boost versions as a consequence.
* seastar 0b98024...f32e4c2 (11):
> Merge "Moving the reactor counters to the metric layer" from Amnon
> metrics: Metrics function should take variable as a refernce
> Revert "Merge ""Moving the reactor counters to the metric layer from Amnon"
> Merge ""Moving the reactor counters to the metric layer from Amnon
> Revert "fstream: Auto-close data_sink and data_source"
> rpc: Avoid resource unit leaks on failure
> fstream: Auto-close data_sink and data_source
> http: Move metrics registration to the metrics layer
> output_stream: add batching to zero copy interface
> Revert "slab: Move the metrics registration to the metrics layer"
> slab: Move the metrics registration to the metrics layer
"Reduce the size of mutation_partition by implementing intrusive set using
bi::rbtree_algorithms directly and using tree nodes optimized for size.
This will reduce the size of mutation_partition by:
24 bytes + <number of cql rows> * 8 bytes
This should have a positive impact on performance because mutation_partitions
are stored both in memtable and cache.
Fixes #742."
* 'haaawk/742' of github.com:cloudius-systems/seastar-dev:
intrusive_set: rename size() to calculate_size()
Make intrusive_set_external_comparator::_value_traits static
Implement intrusive set using rbtree_algorithms
mutation_partition: make apply_reversibly_intrusive_set nongeneric
mutation_partition: take schema in find_row and clustered_row
mutation_partition: Extract intrusive set logic to a class.
mutation_partition: Replace value_comp with key_comp calls
This hopefully will make it more apparent that
the time complexity of this method is O(N) not O(1).
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
_value_traits can be shared among all instances
and there's no need to store it in every single one.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This new implementation takes less memory because it
does not store comparator.
It also uses tree nodes optimized for size. This means
that instead of storing an enum field |color| they embed
this information inside pointer to parent.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
apply_reversibly_intrusive_set is used only in one place
and always with rows_type. There's no need for it to be generic.
This will allow changing intrusive set implementation.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
file output streams take the responsibility of closing the file, they
will close the file as part of closing the stream.
During sstable writing we create sstable object and keep file
references there as well. Sstable object also has responsibility for
closing the files, and does so from sstable::~sstable().
Double close was supposed to be avoided by a construct like this:
writer.close().get();
_file = {};
However if close() failed, which can happen when write-ahead failed,
_file would not be cleared, and both the writer and sstable would
close the file. This will result in a crash in
append_challenged_posix_file_impl::close(), which is not prepared to
be closed twice.
Another problem is that if exception happened before we reached that
construct, we still should close the writer. Currently we don't, so
there's no double close on the file, but that's a bug which needs to
be fixed and once that's fixed double close on _file will be even more
likely.
The fix employed here is to not keep files inside sstable object when
writing. As soon as the writer is constructed, it's the only owner of
the file.
Fixes#1764.
Message-Id: <1482428648-22553-1-git-send-email-tgrabiec@scylladb.com>
Range queries used to be performed sequentially and the shard performing
part of the read was reading state of the merger's memory accounter
directly. Now, they may be performed in parallel so it is safer to just
pass relevant data by value to the intersted shards so that they are not
reading something that another shard is modyfing at the same time.
Since query is done in parallel there is a chance of overread. However,
the parallelism is high only in sparsely populated tables and that's
when the overread is less serious problem.
Since a1cafed370 "storage_proxy: handle
range scans of sparsely populated tables" nonsingular range queries may
be performed in parallel on multiple shards. The consequence of this
that result may be added to the merger out of order. This requires more
complex logic for handling short reads.
As soon as mutation_result_merger gets a short read it starts to discard
all subsequently received results that are known to contain partitions
with larger keys.
Then when the final result is being prepared the merger may need to
combine and sorts results which ordering is not known. If at least one
of these results is a short one all partitions with larger keys are
removed.
Due to request being performed in parallel it is possible that even
though there was a short read the merger has got enough live data to
satisfy specified limits. If this has happened the short read flag is
not set on the final result.
We may want to change the default individual result size limit in the
future. If it is provided by the coordinator and not hardcoded in the
replicas this can be done without causing data query digest mismatches
or wasteful mutation query results.
Even if we are performing a digest query we should do proper result
memory accounting so that the result ends exactly in the same place that
it would if it was a data query. This is to avoid digest mismatches
between replicas.
It was the observation that ring_position_range_sharder doesn't support
wrapping ranges that started the nonwrapping_range madness, but that
class still has some leftover wrapping ranges. Close the circle by
removing them.
Message-Id: <20161123153113.8944-1-avi@scylladb.com>
Digest reads differ from data reads in a way that they do not really
consume any memory. We still want them to stop in the same place that
data reads would, but the per-shard semaphore shouldn't be updated by
them.
Original IDL generated code was hardcoded to always use bytes_ostream.
This patch makes the output stream a template parameter so that any
valid output stream can be used.
Unfortunately, making IDL writers generic requires updates in the code
that uses them, this is fixed in C++17 which would be able to deduce the
parameter in most cases.
For data queries it is very important that all replicas get limited in
the same place (this includes replicas returning only digest). That's
why they shouldn't be affected by per-shard result memory limit.
Moreover, we should make sure that individual memory limits are the
same, making the coordinator provide it for replicas which allow to
safely change it in the future.
Mutation queries are not as sensitive but it is still beneficial to make
sure that all replicas use the same individual limit.