The scylla_cpuset_setup uses a verify_args() function that is defined in the scylla_lib.sh.
Fixes#2716
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
* seastar e96881a...b9f1eb7 (9):
> httpd: indentation patch
> httpd: handle exception when shutting down
> stall-detector: Allow backtrace throttling to be configured
> stall-detector: Fix messages about suppresssion not appearing
> scripts: posix_net_conf.sh: allow passing a perftune.py configuration file as a parameter
> scripts: perftune.py: add the possibility to pass the parameters in a configuration file and print the YAML file with the current configuration
> scripts: perftune.py: actually use the number of Rx queues when comparing to the number of CPU threads
> core: make current_backtrace() noexcept
> memory: add large allocation detector stubs for default allocator
Fixes#2723.
* tag 'asias/repair_issue_2723_v1' of github.com:cloudius-systems/seastar-dev:
repair: Do not allow repair until node is in NORMAL status
gossip: Add is_normal helper
The following backtrace was reported by user when running repair and keeping restarting the node at the same time.
#0 0x00007eff077281d7 in raise () from /lib64/libc.so.6
#1 0x00007eff07729a08 in abort () from /lib64/libc.so.6
#2 0x00007eff07721146 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007eff077211f2 in __assert_fail () from /lib64/libc.so.6
#4 0x00000000010ef2c2 in locator::token_metadata::first_token_index (this=0x641000214e98, start=...) at locator/token_metadata.cc:133
#5 0x00000000010ef2d9 in locator::token_metadata::first_token (this=0x641000214e98, start=...) at locator/token_metadata.cc:143
#6 0x00000000010e329d in locator::abstract_replication_strategy::get_natural_endpoints (this=0x641000494000, search_token=...)
at locator/abstract_replication_strategy.cc:66
#7 0x0000000001481186 in get_neighbors (hosts=std::vector of length 0, capacity 0, data_centers=std::vector of length 0, capacity 0,
range=<error reading variable: access outside bounds of object referenced via synthetic pointer>, ksname=..., db=...) at repair/repair.cc:196
#8 repair_range<nonwrapping_range<dht::token> > (range=..., ri=...) at repair/repair.cc:781
#9 <lambda(auto:99&)>::<lambda(auto:100&&)>::<lambda(auto:101&)>::<lambda()>::operator() (__closure=0x7efec07f7460) at repair/repair.cc:1005
#10 futurize<future<bool_class<stop_iteration_tag> > >::apply<repair_ranges(repair_info)::<lambda(auto:99&)>::
It is reproduced with
1) while true; do curl -X POST --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/ks3"; done
2) start node 127.0.0.1, stop node 127.0.0.1 in a loop
The problem is, during boot up, the token_metadata is not replicated to all shards until
the node goes into NORMAL status.
To fix, check until node is in NORMAL status before allowing repair.
Fixes#2723
The number of keysapce and column family metrics reported is
proportional to the number of shards times the number of keysapce/column
families.
This can cause a performance issue both on the reporting system and on
the collecting system.
This patch adds a configuration flag (set to false by default) to enable
or disable those metrics.
Fixes#2701
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20170821113843.1036-1-amnon@scylladb.com>
"Overly large metadata can hog memory which especially hurts in setups
with bad disk/memory ratio. To ease the pain compress the in-memory
compression-info.
The compression is implemented based on Avi's idea which is to group n
offsets together into segments, where each segment stores a base
absolute offset into the file, the other offsets in the segments being
relative offsets (and thus of reduced size). Also offsets are allocated
only just enough bits to store their maximum value. The offsets are thus
packed in a buffer like so:
arrrarrrarrr...
where n is 4, a is an absolute offset and r are offsets relative to a.
This of course means that stored offsets will not be aligned, not even
on a byte boundary, but the size reduction pretty convincing.
In addition, segments are stored in buckets, where each bucket has its
own base offset. In addition, segments in a buckets are optimized to
address as large of a chunk of the data as possible for a given chunk
size."
Ref #1946.
* 'bdenes/compress-compression-v3' of https://github.com/denesb/scylla:
Add unit test for compress::offsets
Optimise the storage of compression chunk offsets
Add script to precompute segmented compression parameters
To reduce the memory footprint of compression-info, n offsets are
grouped together into segments, where each segment stores a base
absolute offset into the file, the other offsets in the segments being
relative offsets (and thus of reduced size). Also offsets are
allocated only just enough bits to store their maximum value. The
offsets are thus packed in a buffer like so:
arrrarrrarrr...
where n is 4, a is an absolute offset and r are offsets relative to a.
The optimal value of n can be calculated for a given file_size (f) and
chunk_size (c), by finding the minima of the following function:
f(n) = (f/c)/n * (log2(f) + (n - 1)*log2((n-1)*(c + 64)))
This is done in an empirical way, using a script (see below).
Furthermore segments are stored in buckets, where each bucket has its
own base offset. Each bucket therefore can address an equal chunk of the
file and furthermore each segment in a bucket can address an equal
sub-chunk of this area.
The value of a given offset i is thus:
bucket_base_offset_for(i) + segment_base_offset_for(i) + offset(i)
To account for the bucketed storage we calculate a local_f, which is
optimized so that a bucketful of segmented offsets can address the
largest possible chunk of f. As value of this local_f only depends on
the bucket_size (b) and c the value of n can be made independent of f
and therefore only depend on one dynamic value, c. This makes life much
simpler as we don't need to know the size of the file up-front, we can
just append buckets to the storage on demand, while the required storage
is still less than a third [1] of the original storage requirements
(std::deque<uint64>).
The table with the minima(f(n)) for different f and c values is
pre-computed by gen_segmented_compress_params.py and
stored in sstables/segmented_compress_params.hh. This script also
creates a table with the best values of local_f for the given
bucket_size. At runtime we only select the best params based on c.
[1] This was calculated for c=4K and b=4K
Some (most?) users don't read logs or release notes, so they won't notice
that the ByteOrdered and Random partitioners were deprecated in 2.0. Make
them notice by refusing to start with a deprecated partitioner, unless a
switch is explicitly enabled.
Message-Id: <20170820073424.8331-1-avi@scylladb.com>
Limiting summary entry generation to at most one summary entry
per 64k of index data can lead to large index pages, with thousands
of index entries per summary entry. These are slow to parse, and there
is no real gain from the limit, since we already enforce a size
limit on the summary.
Remove the limit and allow summary entry generation based solely on
spanned data size.
Fixes#2711.
Message-Id: <20170819184255.14181-1-avi@scylladb.com>
split_range_to_single_shard() returns a vector of size 4096, with
each element (a partition_range) of size 100. The total of 400k can
cause defragmentation if memory is fragmented.
Fix by using a deque.
Fixes#2707.
Message-Id: <20170819141017.28287-1-avi@scylladb.com>
The script generates sstables/segmented_compress_params.hh which
contains a list with the optimal number of grouped offsets for
different data and chunk sizes as well as a list with the best
nominal data sizes for different chunk sizes, given a bucket size.
Data sizes are in the range of [2**4,2**50] and chunks in the
range of [2**4, 2**30]. Data sizes that are not used with the current
bucket_size are ommited.
See next commit for details of how the calculated values are used.
Large allocations can require cache evictions to be satisfied, and can
therefore induce long latencies. Enable the seastar large allocation
warning so we can hunt them down and fix them.
Message-Id: <20170819135212.25230-1-avi@scylladb.com>
Two reasons for this change:
1) every compaction should be multiplexed to manager which in turn
will make decision when to schedule. improvements on it will
immediately benefit every existing compaction type.
2) active tasks metric will now track ongoing reshard jobs.
Fixes#2671.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170817224334.6402-1-raphaelsc@scylladb.com>
"Now that range queries go through the normal digest path, we rely on
query::result::calculate_counts() to count the amount of partitions
and rows returned.
This series optimizes it, in case it is needed, and also changes the
result message to include the partition and row counts, avoiding the
calculation altogether."
* 'calculate-counts/v3' of github.com:duarten/scylla:
query-result: Send row and partition count over the wire
query::result: Optimize calculate_counts()
These two admin related packages will be packaged under the "app-admin"
category and not the "dev-db" one.
This fixes the detection path of the packages for scylla_setup.
Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20170817094756.21550-1-ultrabug@gentoo.org>
Scylla's configure.py contains stuff we copied from Seastar's
configure.py, but is no longer used. Let's get rid of some of it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170813150842.12603-1-nyh@scylladb.com>
When user mistakenly forgot to pass parameter for a flag, our scripts misparses
next flag as the parameter.
ex) Correct usage is '--ntp-domain <domain> --setup-nic', but passed
'--ntp-domain --setup-nic'.
Result of that, next flag will ignore by scripts.
To prevent such behavior, reject any parameter that start with '--'.
Fixes#2609
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20170815114751.6223-1-syuu@scylladb.com>
When compacting a partition for querying we would read an extra row,
to include any tombstones between that one and the previous row.
This is no longer needed since we have a general mechanism to detect
short reads in the storage_proxy.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170811103031.22866-1-duarte@scylladb.com>
incremental_reader_selector assumes the partition_range it receives has a lower
bound, but it was seen in mutation_test that this is not so.
Fix by checking whether the bound exists or not.
Message-Id: <20170815095852.14149-1-avi@scylladb.com>
"Exhausted readers belonging to a combined_mutation_reader can be fast
forwarded, so we have to keep them around. However, if the reader is
not fast forwardable, then we can drop the contained readers and their
buffers."
* 'ff-reader/v2' of github.com:duarten/scylla:
combined_mutation_reader: Drop exhausted readers if not in FF mode
combined_mutation_reader: Remove superfluous mutation_readers list
memtable_snapshot_source: Created readers should be fast forwardable
Exhausted readers can be fast forwarded, so we have to keep them
around. However, if the current reader is not fast forwardable, then
we can drop those readers and their buffers.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>