"
As I described at https://github.com/scylladb/scylla/issues/6973#issuecomment-669705374,
we need to avoid disk full on scylla_swap_setup, also we should provide manual configuration of swap size.
This PR provides following things:
- use psutil to get memtotal and disk free, since it provides better APIs
- calculate swap size in bytes to avoid causing error on low-memory environment
- prevent to use more than 50% of disk space when auto-configured swap size, abort setup when almost no disk space available (less than 2GB)
- add --swap-size option to specify swap size both on scylla_swap_setup and scylla_setup
- add interactive swap size prompt on scylla_setup
Fixes#6947
Related #6973
Related scylladb/scylla-machine-image#48
"
* syuu1228-scylla_swap_setup_improves:
scylla_setup: add swap size interactive prompt on swap setup
scylla_swap_setup: add --swap-size option to specify swap size
scylla_swap_setup: limit swapfile size to half of diskspace
scylla_swap_setup: calculate in bytes instead of GB
scylla_swap_setup: use psutil to get memtotal and disk free
Sometimes (e.g. when investigating a suspected heap use-after-free) it
is useful to include dead objects in the search results. This patch adds
a new option to scylla find to enable just that. Also make sure we
save and print the offset of the address in the object for dead
objects, just like we do for live ones.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200810091202.1401231-1-bdenes@scylladb.com>
We should not fill entire disk space with swapfile, it's safer to limit
swap size 50% of diskspace.
Also, if 50% of diskspace <= 1GB, abort setup since it's out of disk space.
Converting memory & disk sizes to an int value of N gigabytes was too rough,
it become problematic in low memory size / low disk size environment, such as
some types of EC2 instances.
We should calculate in bytes.
To get better API of memory & disk statistics, switch to psutil.
With the library we don't need to parse /proc/meminfo.
[avi: regenerate tools/toolchain/image for new python3-psutils package]
* seastar eb452a22a0...e615054c75 (13):
> memory: fix small aligned free memory corruption
Fixes#6831
> logger: specify log methods as noexcept
> logger: mark trivial methods as noexcept
> everywhere: Replace boost::optional with std::optional
> httpd: fix Expect header case sensitivity
> rpc: Avoid excessive casting in has_handler() helper
> test: Increase capacity of fair-queue unit test case
> file_io_test: Simplify a test with memory::with_allocation_failures
> Merge 'Add HTTP/1.1 100 Continue' from Wojciech
Fixes#6844
> future: Use "if constexpr"
> future: Drop code that was avoiding a gcc 8 warning
> file: specify alignment get methods noexcept
> merge: Specify abort_source subscription handlers as noexcept
Move the classes representing CQL expressions (and utility functions
on them) from the `restrictions` namespace to a new namespace `expr`.
Most of the restriction.hh content was moved verbatim to
expression.hh. Similarly, all expression-related code was moved from
statement_restrictions.cc verbatim to expression.cc.
As suggested in #5763 feedback
https://github.com/scylladb/scylla/pull/5763#discussion_r443210498
Tests: dev (unit)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
While answering a stackoverflow question on how to create an item but only
if we don't already have an item with the same key, I realized that we never
actually tested that ConditionExpressions works on key columns: all the
tests we had in test_condition_expression.py had conditions on non-key
attributes. So in this patch we add two tests with a condition on the key
attribute.
Most examples of conditions on the key attributes would be silly, but
in these two tests we demonstrate how a test on key attributes can be
useful to solve the above need of creating an item if no such item
exists yet. We demonstrate two ways to do this using a condition on
the key - using either the "<>" (not equal) operator, or the
"attribute_not_exists()" function.
These tests pass - we don't have a bug in this. But it's nice to have
a test that confirms that we don't (and don't regress in that area).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200806200322.1568103-1-nyh@scylladb.com>
* tools/java aa7898d771...f2c7cf8d8d (2):
> dist: debian: support non-x86
> NodeProbe: get all histogram values in a single call
* tools/jmx 626fd75...c5ed831 (1):
> dist: debian: support non-x86
"
The current implementation of B+ benefits from using SIMD
instruction in intra-nodes keys search. This set adds this
functionality.
The general idea behind the implementation is in "asking"
the less comparator if it is the plain "<" and allows for
key simplification to do this natural comparison. If it
does, the search key is simplified to int64_t, the node's
array of keys is casted to array of integers, then both are
fed into avx-optimized searcher.
The searcher should work on nodes that are not filled with
keys. For performance the "unused" keys are set to int64_t
minimum, the search loop compares them too (!) and adjusts
the result index by node size. This needs some care in the
maybe_key{} wrapper.
fixes: #186
tests: unit(dev)
"
* 'br-bptree-avx-b' of https://github.com/xemul/scylla:
utils: AVX searcher
bptree: Special intra-node key search when possible
bptree: Add lesses to maybe_key template
token: Restrict TokenCarrier concept with noexcept
The column names in SlicePredicate can be passed in arbitrary order.
We converted them to clustering ranges in read_command preserving the
original order. As a result, the clustering ranges in read command may
appear out of order. This violates storage engine's assumptions and
lead to undefined behavior.
It was seen manifesting as a SIGSEGV or an abort in sstable reader
when executing a get_slice() thrift verb:
scylla: sstables/consumer.hh:476: seastar::future<> data_consumer::continuous_data_consumer<StateProcessor>::fast_forward_to(size_t, size_t) [with StateProcessor = sstables::data_consume_rows_context_m; size_t = long unsigned int]: Assertion `end >= _stream_position.position' failed.
Fixes#6486.
Tests:
- added a new dtest to thrift_tests.py which reproduces the problem
Message-Id: <1596725657-15802-1-git-send-email-tgrabiec@scylladb.com>
In case of an initialization failure after
db.get_compaction_manager().enable();
But before stop_database, we would never stop the compaction manager
and it would assert during destruction.
I am trying to add a test for this using the memory failure injector,
but that will require fixing other crashes first.
Found while debugging #6831.
Refs #6831.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200805181840.196064-1-espindola@scylladb.com>
With all the preparations made so far it's now possible to implement
the avx-powered search in an array.
The array to search in has both -- capacity and size, so searching in
it needs to take allocated, but unused tail into account. Two options
for that -- limit the number of comparisons "by hands" or keep minimal
and impossible value in this tail, scan "capacity" elements, then
correct the result with "size" value. The latter approach is up to 50%
faster than any (tried) attempt to do the former one.
The run-time selection of the array search code is done with the gnu
target attribute. It's available since gcc 4.8. For AVX-less platforms
the default linear scanner is used.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If the key type is int64_t and the less-comparator is "natural" (i.e. it's
literally 'a < b') we may use the SIMD instructions to search for the key
on a node. Before doing so, the maybe_key and the searcher should be prepared
for that, in particular:
1. maybe_key should set unused keys to the minimal value
2. the searcher for this case should call the gt() helper with
primitive types -- int64_t search key and array of int64_t values
To tell to B+ code that the key-less pair is such the less-er should define
the simplify_key() method converting search keys to int64_t-s.
This searcher is selected automatically, if any mismatch happens it silently
falls back to default one. Thus also add a static assertion to the row-cache
to mitigate this.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The way maybe_key works will be in-sync with the intra-node searching
code and will require to know what the Less type is, so prepare for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The <search-key>.token() is noexcept currently and will have
to be explicitly such for future optimized key searcher, so
restrict the constraint and update the related classes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The single-node test Scylla run by test/alternator/run uses, as the
default, 256 vnodes. When we have 256 vnodes and two shards, our CDC
implementation produces 512 separate "streams" (called "shards" in
DynamoDB lingo). This causes each of the tests in test_streams.py
which need to read data from the stream to need to do 1024 (!) API
requests (512 calls to GetShardIterator and 512 calls to GetRecords)
which takes significant time - about a second per test.
In this patch, we reduce the number of vnodes to 16. We still have
a non-negligible number of stream "shards" (32) so this part of the
CDC code is still exercised. Moreover, to ensure we still routinely
test the paging feature of DescribeStream (whose default page size
is 100), the patch changes the request to use a Limit of 10, so
paging will still be used to retrieve the list of 32 shards.
The time to run the 27 tests in test_streams.py, on my laptop:
Before this patch: 26 seconds
After this patch: 6 seconds.
Fixes#6979
Message-Id: <20200805093418.1490305-1-nyh@scylladb.com>
This patch adds additional tests for Alternator Streams, which helped
uncover 9 new issues.
The stream tests are noticibly slower than most other Alternator tests -
test_streams.py now has 27 tests taking a total of 20 seconds. Much of this
slowness is attributed to Alternator Stream's 512 "shards" per stream in the
single-node test setup with 256 vnodes, meaning that we need over 1000 API
requests per test using GetRecords. These tests could be made significantly
faster (as little as 4 seconds) by setting a lower number of vnodes.
Issue #6979 is about doing this in the future.
The tests in this patch have comments explaining clearly (I hope) what they
test, and also pointing to issues I opened about the problems discovered
through these tests. In particular, the tests reproduce the following bugs:
Refs #6918
Refs #6926
Refs #6930
Refs #6933
Refs #6935
Refs #6939
Refs #6942
The tests also work around the following issues (and can be changed to
be more strict and reproduce these issues):
Refs #6918
Refs #6931
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200804154755.1461309-1-nyh@scylladb.com>
"
This patch series converts a few more global variables from sstring to
constexpr std::string_view.
Doing that makes it impossible for them to be part of any
initialization order problems.
"
* 'espindola/more-constexpr-v2' of https://github.com/espindola/scylla:
auth: Turn DEFAULT_USER_NAME into a std::string_view variable
auth: Turn SALTED_HASH into a std::string_view variable
auth: Turn meta::role_members_table::qualified_name into a std::string_view variable
auth: Turn meta::roles_table::qualified_name into a std::string_view variable
auth: Turn password_authenticator_name into a std::string_view variable
auth: Inline default_authorizer_name into only use
auth: Turn allow_all_authorizer_name into a std::string_view variable
auth: Turn allow_all_authenticator_name into a std::string_view variable
If initialization of a TLS variable fails there is nothing better to
do than call std::unexpected.
This also adds a disable_failure_guard to avoid errors when using
allocation error injection.
With init() being noexcept, we can also mark clear_functions.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200804180550.96150-1-espindola@scylladb.com>
There is no constexpr operator+ for std::string_view, so we have to
concatenate the strings ourselves.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/6910
by Wojciech Mitros:
This patch enables selecting more than 2^32 rows from a table. The change
becomes active after upgrading whole cluster - until then old limits are
used.
Tested reading 4.5*10^9 rows from a virtual table, manually upgrading a
cluster with ccm and performing cql SELECT queries during the upgrade,
ran unit tests in dev mode and cql and paging dtests.
tests: add large paging state tests
increase the maximum size of query results to 2^64
Add a unit test checking if the top 32 bits of the number of remaining rows in paging state
is used correctly and a manual test checking if it's possible to select over 2^32 rows from a table
and a virtual reader for this table.
This commit takes out some responsibilities of `cdc::transformer`
(which is currently a big ball of mud) into a separate class.
This class is a simple abstraction for creating entries in a CDC log
mutation.
Low-level calls to the mutation API (such as `set_cell`) inside
`cdc::transformer` were replaced by higher-level calls to the builder
abstraction, removing some duplication of logic.
util/loading_cache.hh includes adjusted.
* seastar 02ad74fa7d...eb452a22a0 (17):
> core: add missing include for std::allocator_traits
> exceptions: move timed_out_error and factory into its own header file
> future: parallel_for_each: add disable_failure_guard for parallel_for_each_state
> Merge "Improve file API noexcept correctness" from Rafael
> util: Add a with_allocation_failures helper
> future: Fix indentation
> future: Refactor duplicated try/catch
> future: Make set_to_current_exception public
> future: Add noexcept to continuation related functions
> core: mark timer cancellation functions as noexcept
> future: Simplify future::schedule
> test: add a case for overwriting exact routes
> http: throw on duplicated routes to prevent memory leaks
> metrics: Remove the type label
> fstream: turn file_data_source_impl's memory corruption bugs into aborts
> doc: update tutorial splitting script
> reactor_backend: let the reactor know again if any work was done by aio backend
Merged pull request https://github.com/scylladb/scylla/pull/6969 by
Calle Wilund:
Fixes#6942Fixes#6926Fixes#6933
We use clustering [lo:hi) range for iterator query.
To avoid encoding inclusive/exclusive range (depending on
init/last get_records call), instead just increment
the timeuuid threshold.
Also, dynamo result always contains a "records" entry. Include one for us as well.
Also, if old (or new) image for a change set is empty, dynamo will not include
this key at all. Alternator did return an empty object. This changes it to be excluded
on empty.
alternator::streams: Don't include empty new/old image
alternator::streams: Always include "Records" array in get_records reponse
alternator::streams: Incr shard iterator threshold in get_records
"
With this patches a monitor is destroyed before the writer, which
simplifies the writer destructor.
"
* 'espindola/simplify-write-monitor-v2' of https://github.com/espindola/scylla:
sstables: Delete write_failed
sstables: Move monitor after writer in compaction_writer
Fixes#6933
If old (or new) image for a change set is empty, dynamo will not
include this key at all. Alternator did return an empty object.
This changes it to be excluded on empty.
Fixes#6942
We use clustering [lo:hi) range for iterator query.
To avoid encoding inclusive/exclusive range (depending on
init/last get_records call), instead just increment
the timeuuid threshold.