The list of all endpoints for a query can be empty if we have
replication_factor 0 or there are no live endpoints for this token.
Do not access all_replicas.front() in this case.
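A minimal sketch of the guard. It assumes a hypothetical helper, first_replica(), and uses plain strings in place of the real endpoint type. The list is checked for emptiness before front() is called, because front() on an empty vector is undefined behaviour.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an endpoint address.
using endpoint = std::string;

// Hypothetical helper: returns the first replica, if there is one.
// all_replicas may legitimately be empty (replication_factor 0, or no live
// endpoints for the token), so front() must not be called unconditionally.
std::optional<endpoint> first_replica(const std::vector<endpoint>& all_replicas) {
    if (all_replicas.empty()) {
        return std::nullopt;
    }
    return all_replicas.front();
}
```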
Fixes #5935.
Message-Id: <20200306192521.73486-2-kostja@scylladb.com>
(cherry picked from commit 9827efe554)
The function was reimplemented to solve the following issues.
The custom implementation also improved its performance by
close to 19%.
Using regex_match("[a-z][a-z0-9_]*") may cause stack overflow on long input strings
as found with the limits_test.py:TestLimits.max_key_length_test dtest.
std::regex_replace does not replace in-place so no doubling of
quotes was actually done.
Add unit test that reproduces the crash without this fix
and tests various string patterns for correctness.
Note that defining the regex with std::regex::optimize
still ended up with stack overflow.
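A minimal sketch of a regex-free approach. The helper names (is_unquoted_identifier, maybe_quote_sketch) are hypothetical and are not the actual function touched by this commit. A simple loop checks the [a-z][a-z0-9_]* pattern in constant stack space, and embedded quotes are doubled explicitly into a new string.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical replacement for
// std::regex_match(s, std::regex("[a-z][a-z0-9_]*")). A plain loop uses
// constant stack space, so very long inputs cannot overflow the stack.
bool is_unquoted_identifier(std::string_view s) {
    if (s.empty() || !(s[0] >= 'a' && s[0] <= 'z')) {
        return false;
    }
    for (char c : s.substr(1)) {
        bool ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
        if (!ok) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: returns s unchanged if it is a valid unquoted
// identifier. Otherwise it wraps s in double quotes and doubles every
// embedded double quote. The result is built in a new string, unlike the
// ignored return value of std::regex_replace.
std::string maybe_quote_sketch(std::string_view s) {
    if (is_unquoted_identifier(s)) {
        return std::string(s);
    }
    std::string out;
    out.reserve(s.size() + 2);
    out += '"';
    for (char c : s) {
        if (c == '"') {
            out += '"'; // escape an embedded quote by doubling it
        }
        out += c;
    }
    out += '"';
    return out;
}
```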
Fixes #5671
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0329fe1fd1)
Due to a bug, the entire segment is written in one huge 32 MB write.
The intention was to split it into 128 KB writes, so fix that.
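A sketch of the intended chunking. It assumes a hypothetical write_fn callback in place of the real commitlog file write. A 32 MB segment is issued as 256 writes of 128 KB each.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>

// Issue a large buffer as a series of writes of at most max_write bytes
// (128 KiB by default) instead of one huge write. write_fn is a
// hypothetical stand-in for the underlying file write.
void write_in_chunks(std::string_view buf,
                     const std::function<void(std::string_view)>& write_fn,
                     std::size_t max_write = 128 * 1024) {
    for (std::size_t off = 0; off < buf.size(); off += max_write) {
        std::size_t len = std::min(max_write, buf.size() - off);
        write_fn(buf.substr(off, len));
    }
}
```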
Fixes #5857
Message-Id: <20200220102939.30769-1-gleb@scylladb.com>
(cherry picked from commit df2f67626b)
The result of the get token range API can become large, which can
cause large allocations and stalls.
This patch replaces the implementation so that it streams the results
using the HTTP stream capabilities instead of serializing everything
and sending one big buffer.
Fixes #6297
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 7c4562d532)
In bash, 'A || B && C' is a problem: && and || have the same precedence
and are evaluated left to right, so it is parsed as '(A || B) && C', and
when A is true, C is still evaluated.
To avoid the issue, B && C must be grouped into a single statement,
e.g. 'A || { B && C; }'.
Fixes #5764
(cherry picked from commit b6988112b4)
This is relevant only when using partition or clustering keys which
have a representation in memory which is larger than 12.8 KB (10% of
LSA segment size).
There are several places in the code (cache, background garbage
collection) which may need to linearize keys because they perform key
comparison, but it's not done safely:
1) the code does not run with the LSA region locked, so pointers may
get invalidated on linearization if it needs to reclaim memory. This
is fixed by running the code inside an allocating section.
2) LSA region is locked, but the scope of
with_linearized_managed_bytes() encloses the allocating section. If
allocating section needs to reclaim, linearization context will
contain invalidated pointers. The fix is to reorder the scopes so
that linearization context lives within an allocating section.
Example of 1 can be found in
range_populating_reader::handle_end_of_stream() where it performs a
lookup:
auto prev = std::prev(it);
if (prev->key().equal(*_cache._schema, *_last_key->_key)) {
it->set_continuous(true);
but handle_end_of_stream() is not invoked under allocating section.
Example of 2 can be found in mutation_cleaner_impl::merge_some() where
it does:
return with_linearized_managed_bytes([&] {
...
return _worker_state->alloc_section(region, [&] {
Fixes #6637.
Refs #6108.
Tests:
- unit (all)
Message-Id: <1592218544-9435-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit e81fc1f095)
Fixes #5808
It seems some GCC versions generate sign-extending code here. Mine does not,
but this should be more correct anyhow.
Added small stringify test to serialization_test for inet_address
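The commit does not show the affected code. The sketch below illustrates one common way sign extension creeps in: IPv4 address bytes held in plain char, which is signed on x86. Converting each byte through uint8_t avoids it. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: format four address bytes as dotted decimal.
// If a byte such as 0xC0 were promoted directly from a signed char, it would
// become a negative int (sign extension) and print as "-64". Converting
// through uint8_t first keeps every byte in the range 0..255.
std::string ipv4_to_string(const char (&bytes)[4]) {
    std::string out;
    for (int i = 0; i < 4; ++i) {
        if (i != 0) {
            out += '.';
        }
        out += std::to_string(static_cast<unsigned>(static_cast<std::uint8_t>(bytes[i])));
    }
    return out;
}
```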
(cherry picked from commit a14a28cdf4)
Consider
1. Start n1, n2 in the cluster
2. Stop n2 and delete all data for n2
3. Start n2 to replace itself with replace_address_first_boot: n2
4. Kill n2 before n2 finishes the replace operation
5. Remove replace_address_first_boot: n2 from scylla.yaml of n2
6. Delete all data for n2
7. Start n2
At step 7, n2 will be allowed to bootstrap as a new node, because the
application state of n2 in the cluster is HIBERNATE, which is not
rejected by the check in is_safe_for_bootstrap. As a result, the new n2
will replace the old n2 with a different set of tokens and a different
host_id, as if the old n2 node had been silently removed from the cluster.
Fixes #5172
(cherry picked from commit cdcedf5eb9)
Scylla returns the wrong error code (0000 - server internal error)
in response to authentication/authorization operations that involve
a non-existent role.
This commit changes those cases to return error code 2200 (invalid
query) which is the correct one and also the one that Cassandra
returns.
Tests:
Unit tests (Dev)
All auth and auth_role dtests
(cherry picked from commit ce8cebe34801f0ef0e327a32f37442b513ffc214)
Fixes #6363.
The implementation of get_range_to_address_map has a default behaviour:
when given an empty keyspace, it uses the first non-system keyspace
(first here basically means an arbitrary keyspace).
The current implementation has two issues. First, it uses a reference to
a string that is held on the stack of another function. In other words,
there's a use-after-free, and it is not clear why we never hit it.
Second, it calls get_non_system_keyspaces twice. Though this is not
a bug, it's redundant (get_non_system_keyspaces uses a loop, so calling
that function does have a cost).
This patch solves both issues. First, it changes the implementation to
hold a string instead of a reference to a string.
Second, it stores the results of get_non_system_keyspaces and reuses
them, which is more efficient and keeps the returned values on the local
stack.
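A sketch of the fixed pattern under simplified, hypothetical names (resolve_keyspace, get_non_system_keyspaces_stub). The chosen keyspace name is returned by value, and the keyspace list is fetched only once.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for get_non_system_keyspaces().
std::vector<std::string> get_non_system_keyspaces_stub() {
    return {"ks1", "ks2"};
}

// The keyspace name is held by value, and the keyspace list is fetched once
// and reused. By contrast, binding a `const std::string&` to an element of a
// temporary vector, or to a string on another function's stack frame, leaves
// a dangling reference once that storage goes away.
std::string resolve_keyspace(const std::string& requested) {
    if (!requested.empty()) {
        return requested;
    }
    std::vector<std::string> keyspaces = get_non_system_keyspaces_stub();
    // Default to the first non-system keyspace, if any.
    return keyspaces.empty() ? std::string() : keyspaces.front();
}
```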
Fixes #6465
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 69a46d4179)
When index file is larger than 4GB, offset calculation will overflow
uint32_t and _promoted_index_end will be too small.
As a result, the promoted_index_size calculation will underflow and the
rest of the page will be interpreted as a promoted index.
The partitions which are in the remainder of the index page will not
be found by single-partition queries.
Data is not lost.
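A small illustration of the overflow, using hypothetical helpers. The same end offset is computed in 32 bits and in 64 bits for an index position just past 4 GiB. The 32-bit result wraps around to a much smaller value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the end offset truncated to 32 bits, as in the bug.
// Once base + len passes 4 GiB, the value wraps around.
std::uint32_t end_offset_32(std::uint64_t base, std::uint64_t len) {
    return static_cast<std::uint32_t>(base + len);
}

// The same computation kept in 64 bits, which does not wrap for index files
// of any realistic size.
std::uint64_t end_offset_64(std::uint64_t base, std::uint64_t len) {
    return base + len;
}
```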
Introduced in 6c5f8e0eda.
Fixes #6040
Message-Id: <20200521174822.8350-1-tgrabiec@scylladb.com>
(cherry picked from commit a6c87a7b9e)
In a recent next failure I got the following backtrace
function=function@entry=0x270360 "seastar::rpc::sink_impl<Serializer, Out>::~sink_impl() [with Serializer = netw::serializer; Out = {repair_row_on_wire_with_cmd}]") at assert.c:101
at ./seastar/include/seastar/core/shared_ptr.hh:463
at repair/row_level.cc:2059
This patch changes a few functions to use finally to make sure the sink
is always closed.
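A synchronous sketch of the "always close" pattern. The patch uses Seastar's asynchronous finally() on futures. Here, a hypothetical scope_exit guard and sink type stand in for it, so the example runs without Seastar.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Synchronous analogue of future::finally(): runs the cleanup when the
// scope exits, whether it exits normally or by exception.
class scope_exit {
public:
    explicit scope_exit(std::function<void()> f) : _f(std::move(f)) {}
    ~scope_exit() { if (_f) _f(); }
    scope_exit(const scope_exit&) = delete;
    scope_exit& operator=(const scope_exit&) = delete;
private:
    std::function<void()> _f;
};

// Hypothetical sink. The backtrace above shows an assertion in ~sink_impl()
// firing, which is what an unclosed sink can trigger.
struct sink {
    bool closed = false;
    void close() { closed = true; }
};

void send_rows(sink& s, bool fail) {
    scope_exit guard([&] { s.close(); }); // close even if an exception escapes
    if (fail) {
        throw std::runtime_error("send failed");
    }
    // ... write rows to the sink ...
}
```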
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200515202803.60020-1-espindola@scylladb.com>
(cherry picked from commit 311fbe2f0a)
Ref #6414
Consider: n1, n2, n1 is the repair master, n2 is the repair follower.
=== Case 1 ===
1) n1 sends missing rows {r1, r2} to n2
2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1
is written to sstable, r2 is not written yet, r1 belongs to
partition 1, r2 belongs to partition 2. It yields after row r1 is
written.
data: partition_start, r1
3) n1 sends repair_row_level_stop to n2 because an error happened on n1
4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream()
data: partition_start, r1, partition_end
5) Step 2 resumes to apply the rows.
data: partition_start, r1, partition_end, partition_end, partition_start, r2
=== Case 2 ===
1) n1 sends missing rows {r1, r2} to n2
2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1
is written to sstable, r2 is not written yet, r1 belongs to partition
1, r2 belongs to partition 2. It yields after partition_start for r2
is written but before _partition_opened is set to true.
data: partition_start, r1, partition_end, partition_start
3) n1 sends repair_row_level_stop to n2 because an error happened on n1
4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream().
Since _partition_opened[node_idx] is false, partition_end is skipped,
end_of_stream is written.
data: partition_start, r1, partition_end, partition_start, end_of_stream
This causes unbalanced partition_start and partition_end in the stream
written to sstables.
To fix, serialize the write_end_of_stream and apply_rows with a semaphore.
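A sketch of the fix, with std::mutex standing in for the Seastar semaphore. It uses a simplified, hypothetical follower_writer that records stream events as strings. Because apply_rows() and write_end_of_stream() hold the same lock, neither can observe the other's half-finished state, so partition_start and partition_end stay balanced.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical, simplified writer: rows carry an int partition id, and the
// emitted stream is recorded as a list of event names.
class follower_writer {
public:
    struct row { int partition; int value; };

    void apply_rows(const std::vector<row>& rows) {
        std::lock_guard<std::mutex> lock(_mutex); // serialized with end-of-stream
        for (const row& r : rows) {
            if (_stopped) {
                return; // nothing may follow end_of_stream
            }
            if (_partition_opened && _current_partition != r.partition) {
                _stream.push_back("partition_end");
                _partition_opened = false;
            }
            if (!_partition_opened) {
                // Emit partition_start and update the flag in the same critical section.
                _stream.push_back("partition_start");
                _partition_opened = true;
                _current_partition = r.partition;
            }
            _stream.push_back("row");
        }
    }

    void write_end_of_stream() {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_stopped) {
            return;
        }
        if (_partition_opened) {
            _stream.push_back("partition_end");
            _partition_opened = false;
        }
        _stream.push_back("end_of_stream");
        _stopped = true;
    }

    const std::vector<std::string>& stream() const { return _stream; }

private:
    std::mutex _mutex;
    bool _partition_opened = false;
    bool _stopped = false;
    int _current_partition = 0;
    std::vector<std::string> _stream;
};
```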
Fixes: #6394
Fixes: #6296
Fixes: #6414
(cherry picked from commit b2c4d9fdbc)
When sending hints from one file, rps_set field in send_one_file_ctx
keeps track of commitlog positions of hints that are being currently
sent, or have failed to be sent. At the end of the operation, if sending
of some hints failed, we will choose position of the earliest hint that
failed to be sent, and will retry sending that file later, starting from
that position. This position is stored in _last_not_complete_rp.
Usually, this set has a bounded size, because we impose a limit of at
most 128 hints being sent concurrently. Because we do not attempt to
send any more hints after a failure is detected, rps_set should not have
more than 128 elements at a time.
Due to a bug, commitlog positions of old hints (older than
gc_grace_seconds of the destination table) were inserted into rps_set
but not removed after checking their age. This could cause rps_set to
grow very large when replaying a file with old hints.
Moreover, if the file mixed expired and non-expired hints (which could
happen if it had hints to two tables with different gc_grace_seconds),
and sending of some non-expired hints failed, then positions of expired
hints could influence the calculation of _last_not_complete_rp, and more hints
than necessary would be resent on the next retry.
This simple patch removes commitlog position of a hint from rps_set when
it is detected to be too old.
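A simplified model of the rps_set bookkeeping. It uses int positions and hypothetical hint and outcome types, and it ignores the fact that the real code stops sending after a failure is detected. Expired hints are erased from the set, so they no longer hold back the retry position.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <vector>

// Hypothetical, simplified hint model.
enum class outcome { sent, failed, expired };

struct hint { int position; outcome result; };

// Returns the position to resume from on the next retry, i.e. the earliest
// position left in the set (the analogue of _last_not_complete_rp).
std::optional<int> replay(const std::vector<hint>& hints) {
    std::set<int> rps_set;
    for (const hint& h : hints) {
        rps_set.insert(h.position); // tracked while the hint is in flight
        if (h.result == outcome::expired) {
            rps_set.erase(h.position); // the fix: too-old hints are dropped from the set
            continue;
        }
        if (h.result == outcome::sent) {
            rps_set.erase(h.position);
        }
        // failed: keep the position so the file is retried from here
    }
    if (rps_set.empty()) {
        return std::nullopt;
    }
    return *rps_set.begin();
}
```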
Fixes #6422
(cherry picked from commit 85d5c3d5ee)
Related commit: 85d5c3d
When attempting to send a hint, an exception might occur that results in
that hint being discarded (e.g. keyspace or table of the hint was
removed).
When such an exception is thrown, position of the hint will already be
stored in rps_set. We are only allowed to retain positions of hints that
failed to be sent and need to be retried later. Dropping a hint is not
an error, therefore its position should be removed from rps_set - but
current logic does not do that.
Because of that bug, hint files with many discardable hints might cause
rps_set to grow large when the file is replayed. Furthermore, leaving
positions of such hints in rps_set might cause more hints than necessary
to be re-sent if some non-discarded hints fail to be sent.
This commit fixes the problem by removing positions of discarded hints
from rps_set.
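A sketch of the same idea for discarded hints. It assumes a hypothetical discardable_hint_error exception type for hints that can be dropped, e.g. because their table no longer exists. Such hints are removed from rps_set just like successfully sent ones, while genuine send failures keep their position.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>

// Hypothetical exception for hints that may be dropped.
struct discardable_hint_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// `send` throws discardable_hint_error for a droppable hint and
// std::runtime_error for a genuine failure. Only genuine failures keep
// their position in rps_set.
template <typename SendFn>
void send_one(std::set<int>& rps_set, int position, SendFn send) {
    rps_set.insert(position);
    try {
        send();
        rps_set.erase(position);
    } catch (const discardable_hint_error&) {
        rps_set.erase(position); // dropping a hint is not an error
    } catch (const std::runtime_error&) {
        // keep the position: this hint must be re-sent later
    }
}
```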
Fixes #6433
(cherry picked from commit 0c5ac0da98)
A speculative reader has more targets than needed for the CL. In case
there is a digest mismatch, the repair runs between all of them, but
that violates the provided CL. The patch makes it so that repair runs
only between replicas that answered (there will be CL of them).
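A sketch of restricting repair to the replicas that answered. Endpoints are plain strings, and repair_targets() is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Keep only the targets that actually answered, preserving their order.
std::vector<std::string> repair_targets(const std::vector<std::string>& targets,
                                        const std::set<std::string>& answered) {
    std::vector<std::string> out;
    std::copy_if(targets.begin(), targets.end(), std::back_inserter(out),
                 [&](const std::string& ep) { return answered.count(ep) != 0; });
    return out;
}
```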
Fixes #6123
Reviewed-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200402132245.GA21956@scylladb.com>
(cherry picked from commit 36a24bbb70)
On a CentOS 7 machine:
fstrim.timer is not enabled, only unmasked, due to scylla_fstrim_setup on installation.
When trying to run the scylla-fstrim service manually, you get an error:
Traceback (most recent call last):
File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 60, in <module>
main()
File "/opt/scylladb/scripts/libexec/scylla_fstrim", line 44, in main
cfg = parse_scylla_dirs_with_default(conf=args.config)
File "/opt/scylladb/scripts/scylla_util.py", line 484, in parse_scylla_dirs_with_default
if key not in y or not y[k]:
NameError: name 'k' is not defined
It is caused by a typo in scylla_util.py: the check refers to `k` instead of `key`.
Fixes #6294.
(cherry picked from commit 068bb3a5bf)
Until now, attempts to print counter update cell would end up
calling abort() because `atomic_cell_view::value()` has no
specialized visitor for `imr::pod<int64_t>::basic_view<is_mutable>`,
i.e. the counter update IMR type. Such a visitor is not easy to write
if we want to intercept counters only (and not all int64_t values).
Anyway, the linearized byte representation of a counter cell would not
be helpful without knowing whether it consists of counter shards or a
counter update (delta), and this must be known upon `deserialize`.
This commit introduces a simple approach: it determines the cell type at
a high level (from `atomic_cell_view`) and prints counter contents using
`counter_cell_view` or `atomic_cell_view::counter_update_value()`.
Fixes #5616
(cherry picked from commit 0ea17216fe)
The intention of the code was to clear sharding metadata
chunked_vector so that it doesn't bloat memory.
The type of c is `chunked_vector*`. Assigning `{}`
clears the pointer while the intended behavior was to reset the
`chunked_vector` instance. The original instance is left unmodified
with all its reserved space.
Because of this, the previous fix had no effect: token ranges are
stored entirely inline, so popping them doesn't release memory.
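The difference between resetting the pointer and resetting the pointee, sketched with std::vector standing in for chunked_vector. The helper names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// The bug: assigning {} to the pointer only nulls the local pointer. The
// vector it pointed to, and its reserved memory, are left untouched.
void reset_buggy(std::vector<int>* c) {
    c = {};
    (void)c;
}

// The intent: replace the pointee with a fresh, empty vector. Move-assigning
// a default-constructed vector releases the old storage.
void reset_fixed(std::vector<int>* c) {
    *c = std::vector<int>();
}
```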
Fixes #4951
Tests:
- sstable_mutation_test (dev)
- manual using scylla binary on customer data on top of 2019.1.5
Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1584559892-27653-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 5fe626a887)
Row cache needs to be invalidated whenever data in sstables
changes. Cleanup removes data from sstables which doesn't belong to
the node anymore, which means cache must be invalidated on cleanup.
Currently, stale data can be returned when a node re-owns ranges whose
data are still stored in the node's row cache, because cleanup didn't
invalidate the cache.
Fixes #4446.
tests:
- unit tests (dev mode)
- dtests:
update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test
cleanup_test.py
(cherry picked from commit d0b6be0820)
In order to prevent users from creating a network topology
strategy instance with invalid inputs, it's not enough to use
std::stol() on the input: for the string "3abc" it still returns 3,
but the value will later confuse cqlsh and other drivers when they
ask for topology strategy details.
The error message is now also more human-readable. For incorrect
numeric inputs, it used to return a rather cryptic message:
ServerError: stol()
This commit fixes the issue and comes with a simple test.
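A sketch of strict parsing. It assumes a hypothetical parse_replication_factor() helper and error text. The pos out-parameter of std::stol reports how many characters were consumed, so trailing garbage such as "3abc" can be rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: accept the input only if std::stol consumed all of it.
long parse_replication_factor(const std::string& s) {
    std::size_t pos = 0;
    long value = 0;
    try {
        value = std::stol(s, &pos);
    } catch (const std::logic_error&) { // std::invalid_argument or std::out_of_range
        throw std::invalid_argument("Replication factor must be numeric; found " + s);
    }
    if (pos != s.size()) {
        throw std::invalid_argument("Replication factor must be numeric; found " + s);
    }
    return value;
}
```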
Fixes #3801
Tests: unit(dev)
Message-Id: <7aaae83d003738f047d28727430ca0a5cec6b9c6.1583478000.git.sarna@scylladb.com>
(cherry picked from commit 5b7a35e02b)
The "jobs" script is used to determine the amount of compilation
parallelism on a machine. It attempts to ensure that each GCC process
has at least 4 GB of memory. However, in the worst case scenario, we
could end up with the GCC processes taking up all the system memory,
forcing swapping or the OOM killer to kick in. For example, on a 4 core
machine with 16 GB of memory, this worst case scenario seems easy to
trigger in practice.
Fix the problem by keeping 1 GB of memory in reserve for other
processes and calculating parallelism based on the rest.
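A sketch of the arithmetic. It assumes the script effectively computes min(cores, (memory - reserve) / 4 GB). The real script may differ in details, such as rounding and the lower bound of 1 used here. With the reserve, the 4-core, 16 GB example gets 3 jobs instead of 4.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical formula: keep 1 GB for other processes, then allow one
// compiler process per 4 GB of what is left, capped by the core count and
// never below 1.
int compile_jobs(int cores, int mem_gb) {
    const int reserve_gb = 1;
    const int per_job_gb = 4;
    int by_memory = (mem_gb - reserve_gb) / per_job_gb;
    return std::max(1, std::min(cores, by_memory));
}
```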
Message-Id: <20200423082753.31162-1-penberg@scylladb.com>
(cherry picked from commit 7304a795e5)
Generating timestamps for auth signatures used a non-thread-safe
::gmtime function instead of thread-safe ::gmtime_r.
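A sketch of the thread-safe variant. The helper name and the %Y%m%dT%H%M%SZ layout (AWS SigV4 style) are assumptions for illustration. gmtime_r() fills a caller-provided struct instead of returning a pointer to static storage shared by all threads.

```cpp
#include <cassert>
#include <ctime>
#include <string>
#include <time.h> // POSIX gmtime_r

// Hypothetical helper: format a UTC timestamp without touching the static
// buffer that ::gmtime returns.
std::string format_signature_timestamp(std::time_t t) {
    std::tm tm_buf{};
    if (::gmtime_r(&t, &tm_buf) == nullptr) {
        return {};
    }
    char buf[32];
    std::size_t n = std::strftime(buf, sizeof(buf), "%Y%m%dT%H%M%SZ", &tm_buf);
    return std::string(buf, n);
}
```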
Tests: unit(dev)
Fixes #6345
(cherry picked from commit fb7fa7f442)
"
This series fixes a hang in multishard_writer when an error happens. It contains:
- multishard_writer: Abort the queue attached to consumers when producer fails
- repair: Fix hang when the writer is dead
Fixes #6241
Refs: #6248
"
* asias-stream_fix_multishard_writer_hang:
repair: Fix hang when the writer is dead
mutation_writer_test: Add test_multishard_writer_producer_aborts
multishard_writer: Abort the queue attached to consumers when producer fails
(cherry picked from commit 8925e00e96)
If no keyspace is specified when taking a snapshot, there will be a segfault
because keynames is unconditionally dereferenced. Let's return an error,
because a keyspace must be specified when column families are specified.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200427195634.99940-1-raphaelsc@scylladb.com>
(cherry picked from commit 02e046608f)
Fixes #6336.
In commit 388b492040, which was only supposed
to move around code, we accidentally lost the line which does
_executor.local()._stats.total_operations++;
So after this commit this counter was always zero...
This patch restores the line incrementing this counter.
Arguably, this counter is not very important - a user can also calculate
this number by summing up all the counters in the scylla_alternator_operation
array (these are counters for individual types of operations). Nevertheless,
as long as we do export a "scylla_alternator_total_operations" metric,
we need to correctly calculate it and can't leave it zero :-)
Fixes #5836
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200219162820.14205-1-nyh@scylladb.com>
(cherry picked from commit b8aed18a24)
When multiple key columns (clustering or partition) are passed to
the schema constructor, all having the same column id, the expectation
is that these columns will retain the order in which they were passed to
`schema_builder::with_column()`. Currently, however, this is not
guaranteed, as the schema constructor sorts key columns by column id with
`std::sort()`, which doesn't guarantee that equally comparing elements
retain their order. This can be an issue for indexes, the schemas of
which are built independently on each node. If there is any room for
variance in the key column order, this can result in different
nodes having incompatible schemas for the same index.
The fix is to use `std::stable_sort()` which guarantees that the order
of equally comparing elements won't change.
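A sketch with a simplified, hypothetical key_column type. std::stable_sort keeps columns with equal ids in the order they were added, so every node derives the same order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Simplified key column. Several columns may share the same id.
struct key_column {
    std::string name;
    int id;
};

// std::sort gives no guarantee about the relative order of elements that
// compare equal. std::stable_sort preserves their insertion order.
void sort_key_columns(std::vector<key_column>& cols) {
    std::stable_sort(cols.begin(), cols.end(),
                     [](const key_column& a, const key_column& b) { return a.id < b.id; });
}
```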
This is a suspected cause of #5856, although we don't have hard proof.
Fixes: #5856
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
[avi: upgraded "Refs" to "Fixes", since we saw that std::sort() becomes
unstable at 17 elements, and the failing schema had a
clustering key with 23 elements]
Message-Id: <20200417121848.1456817-1-bdenes@scylladb.com>
(cherry picked from commit a4aa753f0f)
While CQL does not allow creation of a materialized view with more than one
base regular column in the view's key, in Alternator we do allow this - both
partition and clustering key may be a base regular column. We had a bug in
the logic handling this case:
If the new base row is missing a value for *one* of the view key columns,
we shouldn't create a view row. Similarly, if the existing base row was
missing a value for *one* of the view key columns, a view row does not
exist and doesn't need to be deleted. This was done incorrectly, and made
decisions based on just one of the key columns, and the logic is now
fixed (and I think, simplified) in this patch.
With this patch, the Alternator test which previously failed because of
this problem now passes. The patch also includes new tests in the existing
C++ unit test test_view_with_two_regular_base_columns_in_key. This test
was already supposed to be testing various cases of two-new-key-columns
updates, but missed the cases explained above. These new tests failed
badly before this patch - some of them had clean write errors, others
caused crashes. With this patch, they pass.
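A sketch of the corrected decision logic for key columns only, assuming simplified, hypothetical types. A view row exists only if every view key column has a value in the base row. The decision considers both the existing and the new base row.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Values of the view key columns taken from a base row. nullopt means the
// column has no value.
using base_row = std::vector<std::optional<int>>;

// A view row exists only if the base row exists and *every* view key column
// has a value.
bool has_all_key_columns(const std::optional<base_row>& row) {
    if (!row) {
        return false;
    }
    for (const auto& v : *row) {
        if (!v) {
            return false;
        }
    }
    return true;
}

enum class view_action { none, create, remove, replace, update };

view_action decide(const std::optional<base_row>& existing, const std::optional<base_row>& updated) {
    bool had_view_row = has_all_key_columns(existing);
    bool has_view_row = has_all_key_columns(updated);
    if (!had_view_row && !has_view_row) return view_action::none;
    if (!had_view_row) return view_action::create;
    if (!has_view_row) return view_action::remove;
    // Both exist: a changed key means deleting the old view row and creating a new one.
    return *existing == *updated ? view_action::update : view_action::replace;
}
```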
Fixes #6008.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200312162503.8944-1-nyh@scylladb.com>
(cherry picked from commit 635e6d887c)
Some legacy `mc` SSTables (created in Scylla 3.0) may contain incorrect
serialization headers, which don't wrap frozen UDTs nested inside collections
with the FrozenType<...> tag. When reading such an SSTable,
Scylla would detect a mismatch between the schema saved in schema
tables (which correctly wraps UDTs in the FrozenType<...> tag) and the schema
from the serialization header (which doesn't have these tags).
SSTables created in Scylla versions 3.1 and above, in particular in
Scylla versions that contain this commit, create correct serialization
headers (which wrap UDTs in the FrozenType<...> tag).
This commit does two things:
1. for all SSTables created after this commit, include a new feature
flag, CorrectUDTsInCollections, presence of which implies that frozen
UDTs inside collections have the FrozenType<...> tag.
2. when reading a Scylla SSTable without the feature flag, we assume that UDTs
nested inside collections are always frozen, even if they don't have
the tag. This assumption is safe to make, because at the time of
this commit, Scylla does not allow non-frozen (multi-cell) types inside
collections or UDTs, and because of point 1 above.
There is one edge case not covered: if we don't know whether the SSTable
comes from Scylla or from C*. In that case we won't make the assumption
described in 2. Therefore, if we get a mismatch between schema and
serialization headers of a table which we couldn't confirm to come from
Scylla, we will still reject the table. If any user encounters such an
issue (unlikely), we will have to use another solution, e.g. using a
separate tool to rewrite the SSTable.
Fixes #6130.
(cherry picked from commit 3d811e2f95)
As @tgrabiec helpfully pointed out, creating a row tombstone
for a table which does not have a clustering key in its schema
creates something that looks like an open-ended range tombstone.
That's problematic for KA/LA sstable formats, which are incapable
of writing such tombstones, so a workaround is provided
in order to allow using KA/LA in alternator.
Fixes #6035
Cherry-picked from 0a2d7addc0
Consider 3 nodes in the cluster, n1, n2, n3 with gossip generation
number g1, g2, g3.
n1, n2, n3 are running a Scylla version with commit
0a52ecb6df (gossip: Fix max generation
drift measure).
One year later, the user wants to upgrade n1, n2, n3 to a new version.
When n3 does a rolling restart with the new version, n3 will use a
generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE and
g3' - g1 > MAX_GENERATION_DIFFERENCE, n1 and n2 will reject n3's
gossip update and mark n3 as down.
Such unnecessary marking of node down can cause availability issues.
For example:
DC1: n1, n2
DC2: n3, n4
When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which
causes the whole DC2 to be unavailable.
To fix, allow starting the node with a gossip generation that is within
MAX_GENERATION_DIFFERENCE of the generations of the existing nodes.
Once all the nodes run a version with commit
0a52ecb6df, the option is no longer
needed.
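A sketch of the generation check and the workaround as described above. The one-year value of max_generation_difference is an assumption borrowed from Cassandra's MAX_GENERATION_DIFFERENCE. choose_generation() is a hypothetical helper and is not the option added by this patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one year in seconds, as in Cassandra's
// MAX_GENERATION_DIFFERENCE. This value is not taken from this commit.
constexpr std::int64_t max_generation_difference = 86400LL * 365;

// Receiving side, as described above: reject an update whose generation is
// too far ahead of the receiver's own generation.
bool accept_generation(std::int64_t receiver_generation, std::int64_t remote_generation) {
    return remote_generation - receiver_generation <= max_generation_difference;
}

// Hypothetical workaround: cap the restarting node's new generation so it
// stays within max_generation_difference of a peer's generation.
std::int64_t choose_generation(std::int64_t now, std::int64_t peer_generation) {
    const std::int64_t cap = peer_generation + max_generation_difference;
    return now < cap ? now : cap;
}
```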
Fixes #5164
(cherry picked from commit 743b529c2b)
User reported an issue that after a node restart, the restarted node
is marked as DOWN by other nodes in the cluster while the node is up
and running normally.
Consider the following:
- n1, n2, n3 in the cluster
- n3 shuts itself down
- n3 sends the shutdown verb to n1 and n2
- n1 and n2 set n3 in SHUTDOWN status and force the heartbeat version to
INT_MAX
- n3 restarts
- n3 sends gossip shadow rounds to n1 and n2, in
storage_service::prepare_to_join,
- n3 receives a response from n1, in gossiper::handle_ack_msg, since
_enabled = false and _in_shadow_round == false, n3 will apply the
application state in fiber 1; fiber 1 finishes faster than fiber 2 and
sets _in_shadow_round = false
- n3 receives a response from n2, in gossiper::handle_ack_msg, since
_enabled = false and _in_shadow_round == false, n3 will apply the
application state in fiber 2; fiber 2 yields
- n3 finishes the shadow round and continues
- n3 resets gossip endpoint_state_map with
gossiper.reset_endpoint_state_map()
- n3 resumes fiber 2 and applies the application state about n3 to
endpoint_state_map; at this point endpoint_state_map contains
information about n3 itself, obtained from n2.
- n3 calls gossiper.start_gossiping(generation_number, app_states, ...)
with new generation number generated correctly in
storage_service::prepare_to_join, but in
maybe_initialize_local_state(generation_nbr), it will not set new
generation and heartbeat if the endpoint_state_map contains itself
- n3 continues with the old generation and heartbeat learned in fiber 2
- n3 continues the gossip loop, in gossiper::run,
hbs.update_heart_beat() the heartbeat is set to the number starting
from 0.
- n1 and n2 will not accept updates from n3 because they have the same
generation number, but n1 and n2 have a larger heartbeat version
- n1 and n2 will mark n3 as down even though n3 is alive.
To fix, always use the new generation number.
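A simplified, hypothetical sketch of why the stale generation is harmful and what the fix changes. Peers compare (generation, version) pairs. A restarted node that keeps the old generation, with its version counting up from 0, never supersedes the very large version its peers recorded at shutdown. A fresh generation does.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified heartbeat state: (generation, version). Hypothetical type.
struct heart_beat_state {
    int generation;
    int version;
};

// How a peer decides whether an incoming heartbeat supersedes the one it
// already knows: a higher generation always wins; within the same
// generation, the higher version wins.
bool supersedes(const heart_beat_state& incoming, const heart_beat_state& known) {
    if (incoming.generation != known.generation) {
        return incoming.generation > known.generation;
    }
    return incoming.version > known.version;
}

// Local state initialization after the fix: the freshly generated generation
// number is always installed, even if endpoint_state_map already holds an
// entry for this node that was learned from a peer.
void initialize_local_state(std::map<std::string, heart_beat_state>& endpoint_state_map,
                            const std::string& self, int generation_nbr) {
    endpoint_state_map[self] = heart_beat_state{generation_nbr, 0};
}
```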
Fixes: #5800
Backports: 3.0 3.1 3.2
(cherry picked from commit 62774ff882)
When qualifying columns to be fetched for filtering, we also check
if the target column is not used as an index, in which case there's
no need to fetch it. However, the check incorrectly assumed that
any restriction is eligible for indexing, while currently that is
only true for EQ. The fix makes a more specific check and contains
many dynamic casts, but these will hopefully be gone once our
long-planned "restrictions rewrite" is done.
This commit comes with a test.
Fixes #5708
Tests: unit(dev)
(cherry picked from commit 767ff59418)
SimpleStrategy creates a list of endpoints by iterating over the set of
all configured endpoints for the given token, until we reach keyspace
replication factor.
There is a trivial coding bug when we first add at least one endpoint
to the list, and then compare list size and replication factor.
If RF=0 this never yields true.
Fix by moving the RF check before at least one endpoint is added to the
list.
Cassandra never had this bug since it uses a less fancy while()
loop.
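A sketch of the corrected loop, using a simplified ring of endpoint names and a hypothetical function signature. Checking the size before adding an endpoint returns an empty list for RF=0. A size == RF check made only after adding the first endpoint never matches RF=0, so every endpoint would be returned.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Walk the ring from `start` and collect endpoints until there are
// replication_factor of them. Duplicate-endpoint handling is omitted.
std::vector<std::string> calculate_natural_endpoints(const std::vector<std::string>& ring,
                                                     std::size_t start,
                                                     std::size_t replication_factor) {
    std::vector<std::string> endpoints;
    for (std::size_t i = 0; i < ring.size(); ++i) {
        // Check before adding: for RF=0 we stop before the first endpoint.
        if (endpoints.size() >= replication_factor) {
            break;
        }
        endpoints.push_back(ring[(start + i) % ring.size()]);
    }
    return endpoints;
}
```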
Fixes #5962
Message-Id: <20200306193729.130266-1-kostja@scylladb.com>
(cherry picked from commit ac6f64a885)
Reclaim consults the _regions vector, so we don't want it moving around while
allocating more capacity. For that we take the reclaim lock. However, that
can cause a false-positive OOM during startup:
1. all memory is allocated to LSA as part of priming (2baa16b371)
2. the _regions vector is resized from 64k to 128k, requiring a segment
to be freed (plenty are free)
3. but reclaiming_lock is taken, so we cannot reclaim anything.
To fix, resize the _regions vector outside the lock.
Fixes #6003.
Message-Id: <20200311091217.1112081-1-avi@scylladb.com>
(cherry picked from commit c020b4e5e2)
By default, `/usr/lib/rpm/find-debuginfo.sh` will tamper with
the binary's build-id when stripping its debug info, as it is passed
the `--build-id-seed <version>.<release>` option.
To prevent that, we need to set the following macros:
unset `_unique_build_ids`
set `_no_recompute_build_ids` to 1
Fixes #5881
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 25a763a187)
... when clustering key is unavailable' from Benny
This series fixes the null pointer dereference seen in #5794.
efd7efe cql3: generate_base_key_from_index_pk; support optional index_ck
7af1f9e cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable
7fe1a9e cql3: do_execute_base_query: fixup indentation
Fixes #5794
Branches: 3.3
Test: unit(dev) secondary_indexes_test:TestSecondaryIndexes.test_truncate_base(debug)
* bhalevy/fix-5794-generate_base_key_from_index_pk:
cql3: do_execute_base_query: fixup indentation
cql3: do_execute_base_query: generate open-ended slice when clustering key is unavailable
cql3: generate_base_key_from_index_pk; support optional index_ck
(cherry picked from commit 4e95b67501)
It seems like *.service conflicts at install time because the file is
installed twice, by both debian/*.service and debian/scylla-server.install.
We don't need to use *.install, so we can just drop the line.
Fixes #5640
(cherry picked from commit 29285b28e2)
Cells in CDC logs used to be created while completely neglecting
TTLs (the TTLs from `cdc = {...'ttl':600}`). This patch adds TTLs
to all cells; there are no row markers, so we need not set TTL
there.
Fixes #5688
(cherry picked from commit 67b92c584f)
There may be other commitlog writes waiting for zeroing to complete, so
not using proper scheduling class causes priority inversion.
Fixes #5858.
Message-Id: <20200220102939.30769-2-gleb@scylladb.com>
(cherry picked from commit 6a78cc9e31)
The maybe_schedule_schema_pull waits for schema_tables_v3 to
become available. This is unsafe in case migration manager
goes away before the feature is enabled.
Fix this by subscribing to the feature with feature::listener and
waiting for a condition variable in maybe_schedule_schema_pull.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The sleep is interrupted with the abort source, and the "wait" part
is done with the existing _background_tasks gate. We also need
to make sure the gate stays alive until the end of the function,
so make use of async_sharded_service (migration manager
already is one).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This unregistration doesn't happen currently, but this doesn't seem to
cause any problems in general: on stop, gossiper is stopped and
nothing from it reaches the storage_service.
However (!) if an exception pops up between the point where
storage_service is subscribed to gossiper and the point where the
drain_on_shutdown defer action is set up, then we _may_ get into the
following situation:
- main's stuff gets unrolled back
- gossiper is not stopped (drain_on_shutdown defer is not set up)
- migration manager is stopped (with deferred action in main)
- a notification comes from gossiper
-> storage_service::on_change might want to pull schema with
the help of local migration manager
-> assert(local_is_initialized) strikes
Fix this by registering storage_service to gossiper a bit earlier
(both are already initialized by that time) and setting up the
unregister defer right afterwards.
Test: unit(dev), manual start-stop
Bug: #5628
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200130190343.25656-1-xemul@scylladb.com>
* seastar dd686552ff...f54084c08f (2):
> reactor: fallback to epoll backend when fs.aio-max-nr is too small
> util: move read_sys_file_as() from iotune to seastar header, rename read_first_line_as()
Fixes #5638.
This patch fixes a bug that appears because of an incorrect interaction
between counters and hinted handoff.
When a counter is updated on the leader, it sends mutations to other
replicas that contain all counter shards from the leader. If consistency
level is achieved but some replicas are unavailable, a hint with
mutation containing counter shards is stored.
When a hint's destination node is no longer a replica, the hint is
sent to all of its current replicas instead. Previously,
storage_proxy::mutate was used for that purpose. It was incorrect
because that function treats mutations for counter tables as mutations
containing only a delta (by how much to increase/decrease the counter).
These two types of mutations have different serialization format, so in
this case a "shards" mutation is reinterpreted as "delta" mutation,
which can cause data corruption to occur.
This patch backports `storage_proxy::mutate_hint_from_scratch`
function, which bypasses special handling of counter mutations and
treats them as regular mutations - which is the correct behavior for
"shards" mutations.
Refs #5833.
Backports: 3.1, 3.2, 3.3
Tests: unit(dev)
(cherry picked from commit ec513acc49)
The update generation path must track and apply all tombstones,
both from the existing base row (if read-before-write was needed)
and for the new row. One such path contained an error, because
it assumed that if the existing row is empty, then the update
can be simply generated from the new row. However, lack of the
existing row can also be the result of a partition/range tombstone.
If that's the case, it needs to be applied, because it's entirely
possible that this tombstone also shadows the new row.
Without taking the partition tombstone into account, creating
a future tombstone and inserting an out-of-order write before it
in the base table can result in ghost rows in the view table.
This patch comes with a test which was proven to fail before the
changes.
Branches 3.1,3.2,3.3
Fixes #5793
Tests: unit(dev)
Message-Id: <8d3b2abad31572668693ab585f37f4af5bb7577a.1581525398.git.sarna@scylladb.com>
(cherry picked from commit e93c54e837)
Before this patch the iterations over migration_notifier::_listeners
could race with listeners being added and removed.
The addition side is not modified, since it is common to add a
listener during construction and it would require a fairly big
refactoring. Instead, the iteration is modified to use indexes instead
of iterators so that it is still valid if another listener is added
concurrently.
For removal we use a rw lock, since removing an element invalidates
indexes too. There are only a few places that needed refactoring to
handle unregister_listener returning a future<>, so this is probably
OK.
Fixes #5541.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200120192819.136305-1-espindola@scylladb.com>
(cherry picked from commit 27bd3fe203)
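A minimal sketch of why index-based iteration tolerates listeners being added concurrently, assuming a simplified, hypothetical `notifier` class that stores listeners in a `std::vector` (this is not Scylla's actual migration_notifier). Removal is not modeled: as described above, it still needs the rw lock, because erasing an element shifts the indexes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, simplified notifier; not Scylla's migration_notifier.
class notifier {
public:
    using listener = std::function<void(notifier&)>;

    void register_listener(listener l) { _listeners.push_back(std::move(l)); }

    // Iterate by index and re-read size() on every step. A push_back() made
    // from inside a listener may reallocate the vector, which would invalidate
    // iterators, but index i still names the same element afterwards.
    void notify() {
        for (std::size_t i = 0; i < _listeners.size(); ++i) {
            // Copy first: the vector may reallocate while the listener runs,
            // so we must not call through a reference into its storage.
            listener l = _listeners[i];
            l(*this);
        }
    }

    std::size_t size() const { return _listeners.size(); }

private:
    std::vector<listener> _listeners;
};
```

Because the loop re-reads `size()`, a listener added during a notification is also called in the same pass.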
After 546556b71b we can have mixed writes into the commitlog:
some are flushed immediately and some are not. If a non-flushing write races
with a flushing one and becomes responsible for writing back its buffer into
the file, the flush will be skipped, which will cause the assert in
batch_cycle() to trigger, since the flush position will not be advanced.
Fix that by checking whether the flush was skipped and, in that case,
explicitly flushing up to our file position.
Fixes #5670
Message-Id: <20200128145103.GI26048@scylladb.com>
(cherry picked from commit c654ffe34b)
This patch affects the LWT queries with IF conditions of the
following form: `IF col in :value`, i.e. if the parameter
marker is used.
When executing a prepared query with a bound value
of `(None,)` (a tuple containing null, e.g. with the Python driver), it is
serialized not as NULL but as an "empty" value (the serialization
format differs in each case).
Therefore, Scylla deserializes the parameters in the request as
empty `data_value` instances, which are, in turn, translated
to non-empty `bytes_opt` with empty byte-string value later.
Account for this case too in the CAS condition evaluation code.
Example of a problem this patch aims to fix:
Suppose we have a table `tbl` with a boolean field `test` and
INSERT a row with NULL value for the `test` column.
Then the following update query fails to apply due to an
error in the IF condition evaluation code (assume `v=(null)`):
`UPDATE tbl SET test=false WHERE key=0 IF test IN :v`
It returns false in the `[applied]` column, but is expected to succeed.
Tests: unit(debug, dev), dtest(prepared stmt LWT tests at https://github.com/scylladb/scylla-dtest/pull/1286)
Fixes: #5710
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200205102039.35851-1-pa.solodovnikov@scylladb.com>
(cherry picked from commit bcc4647552)
"
client_state is used simultaneously by many requests running in parallel,
while the tracing state pointer is per request. These two facts do not sit
well together, and as a result the tracing state is sometimes overwritten
while still being used by an active request, which may cause an incorrect
trace or even a crash.
"
Fixes #5700.
Backported from 9f1f60fc38
* 'gleb/trace_fix_3.3_backport' of ssh://github.com/scylladb/seastar-dev:
client_state: drop the pointer to a tracing state from client_state
transport: pass tracing state explicitly instead of relying on it been in the client_state
alternator: pass tracing state explicitly instead of relying on it been in the client_state
table::flush_streaming_mutations dates from the days when streaming
data went to memtables. After switching to the new streaming, data goes
directly to sstables, so the sstables generated in
table::flush_streaming_mutations will be empty.
It is unnecessary to invalidate the cache if no sstables are added. To
avoid unnecessary cache invalidation, which pokes holes in the cache, skip
calling _cache.invalidate() if the set of sstables is empty.
The steps are:
- STREAM_MUTATION_DONE verb is sent when streaming is done with old or
new streaming
- table::flush_streaming_mutations is called in the verb handler
- cache is invalidated for the streaming ranges
In summary, this patch will avoid a lot of cache invalidation for
streaming.
Backports: 3.0 3.1 3.2
Fixes: #5769
(cherry picked from commit 5e9925b9f0)
This assert, added by 060e3f8, is supposed to make sure that the invariant
of append() is respected, in order to prevent building an invalid row.
The assert, however, proved to be too harsh, as it converts any bug
causing out-of-order clustering rows into cluster unavailability.
Downgrade it to on_internal_error(). This will still prevent corrupt
data from spreading in the cluster, without the unavailability caused by
the assert.
Fixes: #5786
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200211083829.915031-1-bdenes@scylladb.com>
(cherry picked from commit 3164456108)
client_state is shared between requests and tracing state is per
request. It is not safe to use the former as a container for the latter,
since a state can be overwritten prematurely by subsequent requests.
(cherry picked from commit 31cf2434d6)
Multiple requests can use the same client_state simultaneously, so it is
not safe to use it as a container for a tracing state which is per request.
Currently, the next request may overwrite the tracing state of the previous
one, causing, in the best case, a wrong trace to be taken, or a crash if the
overwritten pointer is freed prematurely.
Fixes #5700
(cherry picked from commit 9f1f60fc38)
Multiple requests can use the same client_state simultaneously, so it is
not safe to use it as a container for a tracing state which is per
request. This is not yet an issue for alternator, since it creates a
new client_state object for each request, but, first, it should not do
that, and second, the tracing state will be dropped from client_state by
a later patch.
(cherry picked from commit 38fcab3db4)
Since dpkg does not re-install conffiles when they are removed by the user,
we are currently missing dependencies.conf and sysconfdir.conf on rollback.
To prevent this, we need to stop running
'rm -rf /etc/systemd/system/scylla-server.service.d/' on 'remove'.
Fixes #5734
(cherry picked from commit 43097854a5)
Change the SCYLLA_REPO_URL variable to point to branch-3.3 instead of
master. This ensures that Docker image builds that don't specify the
variable build from the right repository by default.
awk returns a float value on Debian, which causes the postinst script to
fail, since we compare it as an integer value.
Replace it with sed + bash.
Fixes #5569
(cherry picked from commit 5627888b7c)
Treat writes to local.paxos as user memory, as the number of writes is
dependent on the amount of user data written with LWT.
Fixes #5682
Message-Id: <20200130150048.GW26048@scylladb.com>
(cherry picked from commit b08679e1d3)
We would sometimes produce an unnecessary extra 0xff prefix byte.
The new encoding matches what Cassandra does.
This was both an efficiency and a correctness issue, as using varint in a
key could produce different tokens.
Fixes #5656
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
(cherry picked from commit c89c90d07f)
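To illustrate the minimal encoding rule, here is a sketch that encodes a 64-bit signed integer as big-endian two's complement and strips redundant leading sign bytes. This is the form Java's `BigInteger.toByteArray()` produces, which, to our understanding, is what Cassandra's varint serialization relies on. It is an illustration only, not Scylla's serializer: a real varint is arbitrary-precision, while this sketch covers only `int64_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal big-endian two's complement encoding of a 64-bit value.
std::vector<std::uint8_t> encode_minimal_twos_complement(std::int64_t v) {
    std::vector<std::uint8_t> bytes;
    const auto u = static_cast<std::uint64_t>(v);
    for (int shift = 56; shift >= 0; shift -= 8) {
        bytes.push_back(static_cast<std::uint8_t>(u >> shift));
    }
    // A leading 0xff (or 0x00) byte is redundant when the next byte already
    // carries the same sign bit. Keeping one is the "extra prefix byte" case.
    std::size_t start = 0;
    while (start + 1 < bytes.size()) {
        const bool next_negative = (bytes[start + 1] & 0x80) != 0;
        const bool redundant = (bytes[start] == 0xff && next_negative) ||
                               (bytes[start] == 0x00 && !next_negative);
        if (!redundant) {
            break;
        }
        ++start;
    }
    return std::vector<std::uint8_t>(bytes.begin() + static_cast<std::ptrdiff_t>(start),
                                      bytes.end());
}
```

For example, -128 encodes as the single byte 0x80, while -129 needs 0xff 0x7f: there the 0xff is required, because 0x7f alone would read as positive.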
We use eventually() in tests to wait for eventually consistent data
to become consistent. However, we see spurious failures indicating
that we wait too little.
Increasing the timeout has a negative side effect: tests that
fail will now take longer to do so. However, this negative side effect
is negligible compared to false-positive failures, since they throw away
large test efforts and sometimes require a person to investigate the
problem, only to conclude it is a false positive.
This patch therefore makes eventually() more patient, by a factor of
32.
Fixes #4707.
Message-Id: <20200130162745.45569-1-avi@scylladb.com>
(cherry picked from commit ec5b721db7)
We need to add '~' to handle rcX versions correctly on Debian variants
(merged in ae33e9f), but when we moved to the relocatable package we
mistakenly dropped the code, so add the code again.
Fixes #5641
(cherry picked from commit dd81fd3454)
Docker restricts the number of processes in a container to some
limit it calculates. This limit turns out to be too low on large
machines, since we run multiple links in parallel, and each link
runs many threads.
Remove the limit by specifying --pids-limit -1. Since dbuild is
meant to provide a build environment, not a security barrier,
this is okay (the container is still restricted by host limits).
I checked that --pids-limit is supported by old versions of
docker and by podman.
Fixes #5651.
Message-Id: <20200127090807.3528561-1-avi@scylladb.com>
(cherry picked from commit 897320f6ab)
A mistake in handling legacy checks for the special 'idx_token' column
resulted in materialized views backing secondary indexes not being
recognized properly. The mistake is really a typo, but with bad
consequences - instead of checking the view schema for being an index,
we asked for the base schema, which is definitely not an index of
itself.
Branches 3.1,3.2 (asap)
Fixes #5621
Fixes #4744
(cherry picked from commit 9b379e3d63)
The atomic_cell pretty printers use a mix of commas and semicolons.
This change makes them use commas everywhere, for consistency.
Message-Id: <20200116133327.2610280-1-avi@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/5567
from Calle Wilund:
Fixes #5314
Instead of tying CDC handling into cql statement objects, this patch set
moves it to storage proxy, i.e. shared code for mutating stuff. This means
we automatically handle cdc for code paths outside cql (e.g. alternator).
It also adds api handling (though initially inefficient) for batch statements.
CDC is tied into storage proxy by giving the former a ref to the latter (per
shard). Initially this is not a constructor parameter, because right now we
have chicken and egg issues here. Hopefully, Pavel's refactoring of migration
manager and notifications will untie these and this relationship can become
nicer.
The actual augmentation can (as stated above) be made much more efficient.
Hopefully, the stream management refactoring will deal with expensive stream
lookup, and eventually, we can maybe coalesce pre-image selects for batches.
However, that is left as an exercise for when deemed needed.
The augmentation API has an optional return value, a "post-image handler",
to be invoked, if returned, after the mutation call has finished
(successfully). It is not yet actually invoked from storage_proxy, but it
is at least in the call chain.
This set makes the dependencies between mm and other services cleaner;
in particular, after the set:
- the query processor no longer needs migration manager
(which doesn't need query processor either)
- the database no longer needs migration manager, thus the mutual
dependency between these two is dropped, only migration manager
-> database is left
- the migration manager -> storage_service dependency is relaxed,
one more patchset will be needed to remove it, thus dropping one
more mutual dependency between them, only the storage_service
-> migration manager will be left
- the migration manager is stopped on drain, but several more
services need it on stop, thus causing use-after-free problems;
in particular, a bug was caught where the view builder crashes
when unregistering from the notifier list on stop. Fixed.
Tests: unit(dev)
Fixes: #5404
Enabling asan enables a few cleanup optimizations in gcc. The net
result is that using
-fsanitize=address -fno-sanitize-address-use-after-scope
produces code that uses a lot less stack than if the file is compiled
with just -O0.
This patch adds -O1 in addition to
-fno-sanitize-address-use-after-scope to protect the unfortunate
developer who decides to build in dev mode with --cflags='-O0 -g'.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200116012318.361732-2-espindola@scylladb.com>
It is sometimes convenient to build with flags that don't match any
existing mode.
Recently I was tracking a bug that would not reproduce with debug, but
reproduced with dev, so I tried debugging the result of
./configure.py --cflags="-O0 -g"
While the binary had debug info, it still had optimizations because
configure.py put the mode flags after the user flags (-O0 -O1). This
patch flips the order (-O1 -O0) so that the flags passed on the
command line win.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200116012318.361732-1-espindola@scylladb.com>
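GCC's documentation states that when multiple `-O` options are given, the last one takes effect. The toy helper below models only that rule, to show why placing the user's `--cflags` after the mode flags lets `-O0` win. `effective_opt_level` is a hypothetical name, not part of configure.py.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Return the optimization flag GCC would honor: the last -O option wins.
std::string effective_opt_level(const std::vector<std::string>& flags) {
    std::string level = "-O0"; // GCC's default when no -O option is given
    for (const auto& f : flags) {
        if (f.rfind("-O", 0) == 0) { // the flag starts with "-O"
            level = f;
        }
    }
    return level;
}
```

With the old order (user flags, then mode flags) the mode's `-O1` wins; with the new order the user's `-O0` wins.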
CQL transport code relies on an exception's C++ type to create the correct
reply, but in LWT we converted some mutation_timeout exceptions to the more
generic request_timeout while forwarding them, which broke the protocol.
Do not drop the type information.
Fixes #5598.
Message-Id: <20200115180313.GQ9084@scylladb.com>
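A sketch of the type-preservation issue, using hypothetical stand-in exception classes rather than Scylla's real `mutation_timeout`/`request_timeout` types. Constructing a new base-class exception while forwarding loses the derived type. Rethrowing the original `std::exception_ptr` keeps it, so a reply builder that dispatches on the C++ type still sees the specific exception.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; the real exception classes carry more data.
struct request_timeout : std::runtime_error {
    explicit request_timeout(const std::string& msg) : std::runtime_error(msg) {}
};
struct mutation_timeout : request_timeout {
    explicit mutation_timeout(const std::string& msg) : request_timeout(msg) {}
};

// Wrong: throwing a fresh base-class object drops the derived type.
void forward_lossy(const std::exception_ptr& ep) {
    try {
        std::rethrow_exception(ep);
    } catch (const request_timeout& e) {
        throw request_timeout(e.what());
    }
}

// Right: rethrow the original exception object with its dynamic type intact.
void forward_preserving(const std::exception_ptr& ep) {
    std::rethrow_exception(ep);
}

// Classify the forwarded exception the way a type-dispatching reply builder might.
std::string classify(void (*forward)(const std::exception_ptr&)) {
    auto ep = std::make_exception_ptr(mutation_timeout("write timed out"));
    try {
        forward(ep);
    } catch (const mutation_timeout&) {
        return "mutation_timeout";
    } catch (const request_timeout&) {
        return "request_timeout";
    }
    return "none";
}
```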
Murmur3 is the default partitioner.
ByteOrder and Random are the deprecated ones,
and this should be mentioned in the description.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/5294 from
Amnon Heiman:
To use a snapshot we need a schema file that is similar to the result of
running cql DESCRIBE command.
DESCRIBE is implemented in the CQL driver, so the functionality needs
to be re-implemented inside scylla.
This series adds a describe method to the schema and uses it when taking
a snapshot.
There are different approaches to handling materialized views and
secondary indexes.
This implementation creates each schema.cql file in its own relevant
directory, so the schema for a materialized view, for example, will be
placed in the snapshot directory of the table of that view.
Fixes #4192
This commit makes most sleeps in gossip.cc abortable. It is now possible
to quickly shut down a node during startup, most notably during the
phase while it waits for gossip to settle.
This reduces network traffic and eliminates installation time when
building packages from the frozen toolchain, as well as isolating the
build from updates to those dependencies which may cause breakage.
This patch set adds support for CQL tests to test.py,
as well as many other improvements:
* --name is now a positional argument
* test output is preserved in testlog/${mode}
* concise output format
* better color support
* arbitrary number of test suites
* per-suite yaml-based configuration
* options --jenkins and --xunit are removed and xml
files are generated for all runs
A simple driver is written in C++ to read CQL from
standard input, execute it in embedded mode and produce output.
The patch is checked with BYO.
Reviewed-by: Dejan Mircevski <dejan@scylladb.com>
* 'test.py' of github.com:/scylladb/scylla-dev: (39 commits)
test.py: introduce BoostTest and virtualize custom boost arguments
test.py: sort tests within a suite, and sort suites
test.py: add a basic CQL test
test.py: add CQL .reject files to gitignore
test.py: print a colored unidiff in case of test failure
test.py: add CqlTestSuite to run CQL tests
test.py: initial import of CQL test driver, cql_repl
test.py: remove custom colors and define a color palette
test.py: split test output per test mode
test.py: remove tests_to_run
test.py: virtualize Test.run(), to introduce CqlTest.Run next
test.py: virtualize test search pattern per TestSuite
test.py: virtualize write_xunit_report()
test.py: ensure print_summary() is agnostic of test type
test.py: tidy up print_summary()
test.py: introduce base class Test for CQL and Unit tests
test.py: move the default arguments handling to UnitTestSuite
test.py: move custom unit test command line arguments to suite.yaml
test.py: move command line argument processing to UnitTestSuite
test.py: introduce add_test(), which is suite-specific
...
The Ubuntu-based Docker image uses Scylla 1.0 and has not been updated
since 2017. Let's remove it as unmaintained.
Message-Id: <20200115102405.23567-1-penberg@scylladb.com>
"
Currently commitlog supports two modes of operation. First is 'periodic'
mode where all commitlog writes are ready the moment they are stored in
a memory buffer and the memory buffer is flushed to storage periodically.
Second is a 'batch' mode where each write is flushed as soon as possible
(after previous flush completed) and writes are only ready after they
are flushed.
The first option is not very durable, the second is not very efficient.
This series adds an option to mark some writes as "more durable" in
periodic mode meaning that they will be flushed immediately and reported
complete only after the flush is complete (flushing a durable write also
flushes all writes that came before it). It also changes paxos to use
those durable writes to store paxos state.
Note that, strictly speaking, the last patch is not needed, since after
writing to an actual table the code updates the paxos table, and the latter
uses durable writes that make sure all previous writes are flushed. Given
that both writes are supposed to run on the same shard, this should be
enough. But it feels right to make base table writes durable as well.
"
* 'gleb/commilog_sync_v4' of github.com:scylladb/seastar-dev:
paxos: immediately sync commitlog entries for writes made by paxos learn stage
paxos: mark paxos table schema as "always sync"
schema: allow schema to be marked as 'always sync to commitlog'
commitlog: add test for per entry sync mode
database: pass sync flag from db::apply function to the commitlog
commitlog: add sync method to entry_writer
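The durability model described above can be sketched with a toy, single-threaded commitlog, assuming in-memory vectors for the buffer and for flushed storage. The class and method names are hypothetical, not Scylla's commitlog API. A write added with `sync == true` is flushed together with every write buffered before it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of "periodic" mode with per-entry sync; not Scylla's commitlog.
class toy_commitlog {
public:
    // Buffer an entry. With sync == true, flush before returning, which also
    // makes every previously buffered entry durable.
    void add(const std::string& entry, bool sync) {
        _buffer.push_back(entry);
        if (sync) {
            flush();
        }
    }

    // What the periodic timer would call.
    void flush() {
        _durable.insert(_durable.end(), _buffer.begin(), _buffer.end());
        _buffer.clear();
    }

    std::size_t durable_count() const { return _durable.size(); }
    std::size_t pending_count() const { return _buffer.size(); }

private:
    std::vector<std::string> _buffer;  // in memory, not yet flushed
    std::vector<std::string> _durable; // flushed to storage
};
```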
Before this patch result_set_assertions was handling both null values
and missing values in the same way.
This patch changes the handling of missing values so that now checking
for a null value is not the same as checking for a value not being
present.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200114184116.75546-1-espindola@scylladb.com>
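One way to keep "null" and "missing" distinct, sketched with a hypothetical row type: an absent map key means the column is missing, while a present key holding `std::nullopt` means the value is null. This is not result_set_assertions' actual representation.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical row model for illustration only.
using row = std::map<std::string, std::optional<int>>;

enum class cell_state { missing, null, value };

cell_state inspect(const row& r, const std::string& column) {
    auto it = r.find(column);
    if (it == r.end()) {
        return cell_state::missing; // column not present at all
    }
    return it->second ? cell_state::value : cell_state::null;
}
```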
3ec889816 changed cell::make_collection() to take different code paths
depending whether its `data` argument is nothrow copyable/movable or
not. In case it is not, it is wrapped in a view to make it so (see the
above mentioned commit for a full explanation), relying on the method's
pre-existing requirement for callers to keep `data` alive while the
created writer is in use.
On closer look however it turns out that this requirement is neither
respected, nor enforced, at least not on the code level. The real
requirement is that the underlying data represented by `data` is kept
alive. If `data` is a view, it is not expected to be kept alive, and
callers don't keep it alive; it is instead copied into `make_collection()`.
Non-views however *are* expected to be kept alive. This makes the API
error prone.
To avoid any future errors due to this ambiguity, require all `data`
arguments to be nothrow copyable and movable. Callers are now required
to pass views of nonconforming objects.
This patch is a usability improvement and is not fixing a bug. The
current code works as-is because it happens to conform to the underlying
requirements.
Refs: #5575
Refs: #5341
Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200115084520.206947-1-bdenes@scylladb.com>
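A sketch of enforcing the new requirement at compile time with standard type traits. `make_collection_sketch` is a hypothetical function, not the real `cell::make_collection()` signature, and `std::string_view` stands in for a fragment-range view. It only shows the constraint: an owning type like `std::string`, whose copy constructor may throw, is rejected, while a view passes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical stand-in that only demonstrates the compile-time constraint.
template <typename Data>
std::size_t make_collection_sketch(Data data) {
    static_assert(std::is_nothrow_copy_constructible_v<Data> &&
                  std::is_nothrow_move_constructible_v<Data>,
                  "pass a view of the data instead of the data itself");
    return data.size(); // placeholder for the real serialization work
}
```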
This patch adds tests for the describe method.
test_describe_simple_schema tests regular tables.
test_describe_view_schema tests view and index.
Each test creates a table, finds the schema, calls the describe method and
compares the results to the string that was used to create the table.
The view tests also verify that adding an index or view does not change
the base table.
When comparing results, leading and trailing whitespace is ignored
and all combinations of whitespace and newlines are treated equally.
Additional tests may be added at a future phase if required.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
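The whitespace-insensitive comparison described above could look like the following sketch, which collapses each run of whitespace into a single space and trims both ends before comparing. `normalize_whitespace` is a hypothetical helper, not necessarily what the tests use.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Collapse runs of spaces, tabs and newlines into one space; trim both ends.
std::string normalize_whitespace(const std::string& s) {
    std::string out;
    bool pending_space = false;
    for (unsigned char c : s) {
        if (std::isspace(c)) {
            pending_space = !out.empty(); // ignore leading whitespace
        } else {
            if (pending_space) {
                out.push_back(' ');
                pending_space = false;
            }
            out.push_back(static_cast<char>(c));
        }
    }
    return out; // trailing whitespace was never emitted
}
```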
When creating a snapshot we need to add a schema.cql file in the
snapshot directory that describes the table in that snapshot.
This patch adds the file using the schema describe method.
get_snapshot_details and manifest_json_filter were modified to ignore
the schema.cql file.
Fixes #4192
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This patch adds a describe method to a table schema.
It acts similarly to the DESCRIBE CQL command, which is implemented in the
CQL driver.
The method supports tables, secondary indexes, local indexes and
materialized views.
relates to: #4192
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
index_name_from_table_name is the reverse of index_table_name:
it takes a table name that was generated for an index and returns the name
of the index that generated that table.
Relates to #4192
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
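A sketch of the reverse mapping. It assumes, for illustration only, that `index_table_name()` derives the backing table's name by appending a `_index` suffix to the index name. Both functions below are hypothetical stand-ins, and the real naming scheme should be checked before relying on this.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Assumed naming scheme for illustration; verify against the real code.
const std::string index_table_suffix = "_index";

std::string index_table_name_sketch(const std::string& index_name) {
    return index_name + index_table_suffix;
}

// Returns std::nullopt if the name was not generated for an index
// under the assumed scheme.
std::optional<std::string> index_name_from_table_name_sketch(const std::string& table_name) {
    const auto n = index_table_suffix.size();
    if (table_name.size() <= n ||
        table_name.compare(table_name.size() - n, n, index_table_suffix) != 0) {
        return std::nullopt;
    }
    return table_name.substr(0, table_name.size() - n);
}
```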
The factory is purely stateless, so it makes no difference which
instance of it is used; thus we can omit referencing the storage_service
in passive_announce.
This is 2nd simple migration_manager -> storage_service link to cut
(more to come later).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are several places where migration_manager needs storage_service
reference to get the database from, thus forming the mutual dependency
between them. This is the simplest case where the migration_manager
link to the storage_service can be cut -- the database reference can be
obtained from storage_proxy instead.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is the last place where database code needs the migration_manager
instance to be alive, so now the mutual dependency between these two
is gone: only the migration_manager needs the database, not
vice versa.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The storage_server needs migration_manager for notifications and
carefully handles the manager's stop process so as not to demolish the
listeners list from under itself. From now on this dependency is
no longer valid (however, the storage_service still seems to need the
migration_manager, but that is a different story).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch removes an implicit cql_server -> migration_manager
dependency, as the former's event notifier uses the latter
for notifications.
Dropping this dependency also breaks a loop:
storage_service -> cql_server -> migration_manager -> storage_service
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch breaks one (probably harmless but still) dependency
loop. The query_processor -> migration_manager -> storage_proxy
-> tracing -> query_processor.
The first link is not needed, as the query_processor needs the
migration_manager purely to (un)subscribe to notifications.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The same as with view builder. The constructor still needs both,
but the lifetime reference is now to the notifier only.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The migration manager itself is still needed on start to wait
for schema agreement, but there's no longer the need for the
life-time reference on it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Do not use the local migration manager instance to send notifications;
use the local migration notifier, which will always be alive.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The storage service will need this guy to initialize sub-services
with. Also it registers itself with notifiers.
That said, it's convenient to have the migration notifier on board.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The _listeners list on migration_manager class and the corresponding
notify_xxx helpers have nothing to do with its instances; they
are just transport for notification delivery.
At the same time some services need the migration manager to be alive
at their stop time to unregister from it, while the manager itself
may need them for its needs.
The proposal is to move the migration notifier into a completely separate
sharded "service". This service doesn't need anything, so it's started
first and stopped last.
While it's not effectively a "migration" notifier, we inherited the name
from Cassandra and renaming it will "scramble neurons in the old-timers'
brains but will make it easier for newcomers" as Avi says.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The comparator is refreshed to ensure the following:
- null compares less than all other types;
- null, true and false are comparable against each other,
while other types are only comparable against themselves and null.
Comparing mixed types is not currently reachable from the alternator
API, because it's only used for sets, which can only use
strings, binary blobs and numbers - thus, no new pytest cases are added.
Fixes #5454
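A sketch of the comparator rules above, using a simplified, hypothetical value type where `std::monostate` stands for null (not alternator's actual types). Two assumptions go beyond the text: `false` orders before `true`, and a pair of mutually incomparable types (such as a number and a string) compares as "not less" in both directions.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical value model; std::monostate represents null.
using value = std::variant<std::monostate, bool, double, std::string>;

bool value_less(const value& a, const value& b) {
    const bool a_null = std::holds_alternative<std::monostate>(a);
    const bool b_null = std::holds_alternative<std::monostate>(b);
    if (a_null || b_null) {
        return a_null && !b_null; // null is less than every non-null value
    }
    if (a.index() != b.index()) {
        return false; // different non-null types are not comparable
    }
    return a < b; // same alternative: compares the held values
}
```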
When a counter mutation is about to be sent, a leader is elected, but
if the leader fails after election, we get `rpc::closed_error`. The
exception propagates high up, causing all connections to be dropped.
This patch intercepts `rpc::closed_error` in `storage_proxy::mutate_counters`
and translates it to `mutation_write_failure_exception`.
References #2859
Run the test and compare results. Manage temporary
and .reject files.
Now that there are CQL tests, improve logging.
run_test success no longer means test success.
cql_repl is a simple program which reads CQL from stdin,
executes it, and writes results to stdout.
It supports --input, --output and --log options.
--log is directed to cql_test.log by default.
--input is stdin by default
--output is stdout by default.
The result set output is printed with a basic
JSON visitor.
Store test temporary files and logs in ${testdir}/${mode}.
Remove --jenkins and --xunit, and always write XML
files at a predefined location: ${testdir}/${mode}/xml/.
Use .xunit.xml extension for tests whose XML output is
in xunit format, and junit.xml for an accumulated output
of all non-boost tests in junit format.
Load the command line arguments, if any, from suite.yaml, rather
than keep them hard-coded in test.py.
This allows the operations team to have easier access to these.
Note I had to sacrifice dynamic smp count for mutation_reader_test
(the new smp count is fixed at 3) since this is part
of test configuration now.
This way we can avoid iterating over all tests
to handle --repeat.
Besides, going forward the tests will be stored
in two places: in the global list of all tests,
for the runner, and per suite, for suite-based
reporting, so it's easier if TestSuite
is fully responsible for finding and adding tests.
Scan entire test/ for folders that contain suite.yaml,
and load tests from these folders. Skip the rest.
Each folder with a suite.yaml is expected to have a valid
suite configuration in the yaml file.
A suite is a folder with tests of the same type. E.g.
it can be a folder with unit tests, boost tests, or CQL
tests.
The harness will use suite.yaml to create an appropriate
suite test driver, to execute tests in different formats.
It reduces the number of configurations to re-test when test.py is
modified, and simplifies usage of test.py in build tools, since you no
longer need to bother with extra arguments.
Going forward I'd like to make terminal output brief and tabular,
but some test details are necessary to preserve so that a failure
is easy to debug. This information now goes to the log file.
- open and truncate the log file on each harness start
- log options of each invoked test in the log, so that
a failure is easy to reproduce
- log test result in the log
Since tests are run concurrently, having an exact
trace of concurrent execution also helps
debugging flaky tests.
The storage_service struct is a collection of diverse things,
most of them needed only on start and on stop and/or running
on shard 0 (but it is nonetheless sharded).
As part of cleaning up this structure and the inter-component
dependencies it generates, here's the sanitation of load_broadcaster.
Fixes #5582
... but only populate log on shard 0.
Migration manager callbacks are slightly asymmetric. Notifications
for pre-create/update mutations are sent only on the initiating shard
(necessary, because we consider the mutations mutable).
But "created" callbacks are sent on all shards (immutable).
We must subscribe on all shards, but still populate the cdc table
only once; otherwise we can either miss table creation or populate
more than once.
v2:
- Add test case
Message-Id: <20200113140524.14890-1-calle@scylladb.com>
* seastar 36cf5c5ff0...3f3e117de3 (16):
> memcached: don't use C++17-only std::optional
> reactor: Comment why _backend is assigned in constructor body
> log: restore --log-to-stdout for backward compatibility
> used_size.hh: Include missing headers
> core: Move some code from reactor.cc to future.cc
> future-util: move parallel_for_each to future-util.cc
> task: stop wrapping tasks with unique_ptr
> Merge "Setup timer signal handler in backend constructor" from Pavel
Fixes #5524
> future: avoid a branch in future's move constructor if type is trivial
> utils: Expose used_size
> stream: Call get_future early
> future-util: Move parallel_for_each_state code to a .cc
> memcached: log exceptions
> stream: Delete dead code
> core: Turn pollable_fd into a simple proxy over pollable_fd_state.
> Merge "log to std::cerr" from Benny
This is the part of de-bloating storage_service.
The field in question is used to temporarily keep the _token_metadata
value during shard-wide replication. There's no need to have it as a
class member; any "local" copy is enough.
Also, as the size of token_metadata is huge, and invoke_on_all()
copies the function for each shard, keep one local copy of metadata
using do_with() and pass it into the invoke_on_all() by reference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Reviewed-by: Asias He <asias@scylladb.com>
Message-Id: <20200113171657.10246-1-xemul@scylladb.com>
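A toy demonstration of why capturing the metadata by reference matters, assuming, as the text says, that `invoke_on_all()` copies the function once per shard. `invoke_on_all_sketch` and `big_metadata` are hypothetical. In the real code `do_with()` keeps the single copy alive until all shards are done, which this synchronous sketch does not need.

```cpp
#include <cassert>

// Counts copies; stands in for a large object such as token_metadata.
struct big_metadata {
    static inline int copies = 0;
    big_metadata() = default;
    big_metadata(const big_metadata&) { ++copies; }
};

// Toy stand-in for invoke_on_all(): copies the function once per shard.
template <typename Func>
void invoke_on_all_sketch(unsigned shards, const Func& func) {
    for (unsigned shard = 0; shard < shards; ++shard) {
        Func copy = func; // copying the closure copies its by-value captures
        copy(shard);
    }
}
```

Capturing by value costs one copy for the capture plus one per shard, while capturing by reference costs none.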
The query option always_return_static_content was added for lightweight
transactions in commits e0b31dd273 (infrastructure) and 65b86d155e
(actual use). However, the flag was added unconditionally to
update_parameters::options. This caused it to be set for list
read-modify-write operations, not just for lightweight transactions.
This is a little wasteful, and worse, it breaks compatibility as old
nodes do not understand the always_return_static_content flag and
complain when they see it.
To fix, remove the always_return_static_content from
update_parameters::options and only set it from compare-and-swap
operations that are used to implement lightweight transactions.
Fixes #5593.
Reviewed-by: Gleb Natapov <gleb@scylladb.com>
Message-Id: <20200114135133.2338238-1-avi@scylladb.com>
The drain_in_progress variable here is the future that's set by the
drain() operation itself. Its promise is set when the drain() finishes.
The check for this future in the beginning of drain() is pointless.
No two drain()-s can run in parallel because of run_with_api_lock()
protection. Doing the 2nd drain after a successful 1st one is also
impossible due to the _operation_mode check. The 2nd drain after an
_exceptioned_ (and thus incomplete) 1st one currently deadlocks; after
this patch it will try to drain for the 2nd time, but that should be ok.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200114094724.23876-1-xemul@scylladb.com>
This change introduces system.clients table, which provides
information about CQL clients connected.
PK is the client's IP address, CK consists of outgoing port number
and client_type (which will be extended in future to thrift/alternator/redis).
The table also supplies shard_id and username. Other columns,
like connection_stage, driver_name, driver_version...,
are currently empty but exist for C* compatibility and future use.
This is an ordinary table (i.e. non-virtual) and it's updated upon
accepting connections. This is also why C*'s column request_count
was not introduced. In case of abrupt DB stop, the table should not persist,
so it's being truncated on startup.
Resolves #4820
"
Most of the code in `cell` and the `imr` infrastructure it is built on
is `noexcept`. This means that extra care must be taken to avoid rogue
exceptions, as they will bring down the node. The changes introduced by
0a453e5d3a did just that: they introduced a rogue `std::bad_alloc` into this
code path by violating an undocumented and unvalidated assumption --
that fragment ranges passed to `cell::make_collection()` are nothrow
copyable and movable.
This series refactors `cell::make_collection()` such that it does not
have this assumption anymore and is safe to use with any range.
Note that the unit test included in this series, that was used to find
all the possible exception sources will not be currently run in any of
our build modes, due to `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` not
being set. I plan to address this in a followup because setting this
flag fails other tests using the failure injection mechanism. This is
because these tests are normally run with the failure injection disabled
so failures managed to lurk in without anyone noticing.
Fixes: #5575
Refs: #5341
Tests: unit(dev, debug)
"
* 'data-cell-make-collection-exception-safety/v2' of https://github.com/denesb/scylla:
test: mutation_test: add exception safety test for large collection serialization
data/cell.hh: avoid accidental copies of non-nothrow copiable ranges
utils/fragment_range.hh: introduce fragment_range_view
We do not yet support the ScanIndexForward=false option for reversing
the sort order of a Query operation, as reported in issue #5153.
But even before implementing this feature, it is important that we
produce an error if a user attempts to use it - instead of outright
ignoring this parameter and giving the user wrong results. This is
what this patch does.
Before this patch, the reverse-order query in the xfailing test
test_query.py::test_query_reverse seems to succeed - yet gives
results in the wrong order. With this patch, the query itself fails -
stating that the ScanIndexForward=false argument is not supported.
Refs #5153
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200105113719.26326-1-nyh@scylladb.com>
Here's another theoretical problem that involves 3 sequential calls
to removenode, force_removenode and some other operation, respectively.
Let's walk through them:
First goes the removenode:
run_with_api_lock
_operation_in_progress = "removenode"
storage_service::remove_node
sleep in replicating_nodes.empty() loop
Now the force_removenode can run:
run_with_no_api_lock
storage_service::force_removenode
check _operation_in_progress (not empty)
_force_remove_completion = true
sleep in _operation_in_progress.empty loop
Now the 1st call wakes up and:
if _force_remove_completion == true
throw <some exception>
.finally() handler in run_with_api_lock
_operation_in_progress = <empty>
At this point some other operation may start. Say, drain:
run_with_api_lock
_operation_in_progress = "drain"
storage_service::drain
...
go to sleep somewhere
Now let's go back to the 2nd op (force_removenode) that wakes up from its sleep.
The code it executes is:
while (!ss._operation_in_progress.empty()) {
    sleep_abortable();
}
and while the drain is running it will never exit.
However (! and this is the core of the race) should the drain
operation happen _before_ the force_removenode, another check
for _operation_in_progress would have made the latter exit with
the "Operation drain is in progress, try again" message.
Fix this inconsistency by checking the current operation on
every wake-up from sleep_abortable().
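A minimal C++ sketch of the wait loop after this change, under stated assumptions: the operation state is modeled as a plain std::string, and `current_op` and `sleep_once` are hypothetical callbacks standing in for reading `_operation_in_progress` and for sleep_abortable(). There is no real concurrency here; the point is only that the "another operation is running" check is repeated on every wake-up instead of once on entry.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical model of force_removenode's wait loop. current_op() returns the
// operation in progress ("" when none) and sleep_once() stands in for
// sleep_abortable(); neither is the real storage_service API.
void wait_for_removenode_to_finish(const std::function<std::string()>& current_op,
                                   const std::function<void()>& sleep_once) {
    for (;;) {
        const std::string op = current_op();
        if (op.empty()) {
            return;  // removenode finished (or bailed out)
        }
        if (op != "removenode") {
            // The same check as on entry, now repeated after every wake-up.
            throw std::runtime_error("Operation " + op + " is in progress, try again");
        }
        sleep_once();
    }
}
```

With the state sequence removenode, removenode, drain, the loop throws on the third check instead of sleeping forever, which matches what force_removenode reports when drain is already running at entry.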
Fixes #5591
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Here's a theoretical problem that involves 3 sequential calls
to removenode, force_removenode and removenode (again),
respectively. Let's walk through them:
First goes the removenode:
run_with_api_lock
_operation_in_progress = "removenode"
storage_service::remove_node
sleep in replicating_nodes.empty() loop
Now the force_removenode can run:
run_with_no_api_lock
storage_service::force_removenode
check _operation_in_progress (not empty)
_force_remove_completion = true
sleep in _operation_in_progress.empty loop
Now the 1st call wakes up and:
if _force_remove_completion == true
_force_remove_completion = false
throw <some exception>
.finally() handler in run_with_api_lock
_operation_in_progress = <empty>
! at this point we have _force_remove_completion = false and
_operation_in_progress = <empty>, which opens the following
opportunity for the 3rd removenode:
run_with_api_lock
_operation_in_progress = "removenode"
storage_service::remove_node
sleep in replicating_nodes.empty() loop
Now here's what we have in 2nd and 3rd ops:
1. _operation_in_progress = "removenode" (set by 3rd) prevents the
force_removenode from exiting its loop
2. _force_remove_completion = false (set by 1st on exit) prevents
the removenode from waiting on replicating_nodes list
One can start a 4th call with force_removenode; it will proceed and
wake up the 3rd op, but after that we'll have two force_removenode-s
running in parallel and killing each other.
I propose not to set _force_remove_completion to false in removenode,
but to just exit and let the owner of this flag unset it once it gets
control back.
Fixes #5590
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Other types do not have a wider accumulator at the moment.
And the check static_cast<accumulator_type>(ret) != _sum is not meaningful
for NaN/Inf floating point values (NaN compares unequal even to itself).
Fixes #5586
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200112183436.77951-1-bhalevy@scylladb.com>
Now that atomic_cell_view and collection_mutation_view have
type-aware printers, we can use them in the type-aware atomic_cell_or_collection
printer.
Message-Id: <20191231142832.594960-1-avi@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/5533
from Avi Kivity:
canonical_mutation objects are used for schema reconciliation, which is a
fragile area and thus deserves some debugging help.
This series makes canonical_mutation objects printable.
Merged patch series from Piotr Sarna:
"Previous assumption was that there can only be one regular base column
in the view key. The assumption is still correct for tables created
via CQL, but it's internally possible to create a view with multiple
such columns - the new assumption is that if there are multiple columns,
they share their liveness.
This series is vital for indexing to work properly on alternator,
so it would be best to solve the issue upstream. I strived to leave
the existing semantics intact as long as only up to one regular
column is part of the materialized view primary key, which is the case
for Scylla's materialized views. For alternator it may not be true,
but all regular columns in alternator share liveness info (since
alternator does not support per-column TTL), which is sufficient
to compute view updates in a consistent way.
Fixes #5006
Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo)"
Piotr Sarna (3):
db,view: fix checking if partition key is empty
view: handle multiple regular base columns in view pk
test: add a case for multiple base regular columns in view key
alternator-test/test_gsi.py | 1 -
view_info.hh | 5 +-
cql3/statements/alter_table_statement.cc | 2 +-
db/view/view.cc | 77 ++++++++++++++----------
mutation_partition.cc | 2 +-
test/boost/cql_query_test.cc | 58 ++++++++++++++++++
6 files changed, 109 insertions(+), 36 deletions(-)
Merged patch series from Gleb Natapov:
"LWT is much more efficient if a request is processed on a shard that owns
a token for the request. This is because otherwise the processing will
bounce to an owning shard multiple times. The patch proposes a way to
move a request to the correct shard before running lwt. It works by returning
an error from the lwt code if the shard is the wrong one, specifying the shard
the request should be moved to. The error is processed by the transport
code, which jumps to the correct shard and re-processes the incoming message there.
The nicer way to achieve the same would be to jump to the right shard
inside of storage_proxy::cas(), but unfortunately, with the current
implementation of the modification statements, a statement is unusable on
a shard other than the one where it was created, so the jump should happen
before a modification statement for a cas() is created. When we fix our
cql code to be more cross-shard friendly, this can be reworked to do the
jump in the storage_proxy."
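The bounce protocol described above can be sketched in C++ as follows. This is illustrative only: `lwt_result`, `execute_lwt`, `process_message` and the modulo-based `shard_of` are hypothetical (Scylla's real token-to-shard mapping is different), and the "jump" to another shard is modeled as a plain second call rather than a cross-shard submission.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical result of running an LWT request on some shard: either a final
// answer, or a request to re-process the message on another shard.
struct lwt_result {
    std::optional<std::string> value;   // set when processed locally
    std::optional<unsigned> bounce_to;  // set when this is the wrong shard
};

// Assumption: a simple modulo mapping from token to shard, for illustration.
unsigned shard_of(int64_t token, unsigned shard_count) {
    return static_cast<unsigned>(static_cast<uint64_t>(token) % shard_count);
}

// LWT code: refuse to run on a non-owning shard and name the owner instead.
lwt_result execute_lwt(unsigned this_shard, int64_t token, unsigned shard_count) {
    const unsigned owner = shard_of(token, shard_count);
    if (owner != this_shard) {
        return {std::nullopt, owner};
    }
    return {"applied on shard " + std::to_string(owner), std::nullopt};
}

// Transport code: on a bounce, re-process the same message on the indicated
// shard (modeled here as a direct second call).
std::string process_message(unsigned arrival_shard, int64_t token, unsigned shard_count) {
    lwt_result r = execute_lwt(arrival_shard, token, shard_count);
    if (r.bounce_to) {
        r = execute_lwt(*r.bounce_to, token, shard_count);
    }
    return *r.value;
}
```

Whichever shard the message arrives on, at most one bounce is needed, and the paxos work then runs entirely on the owning shard.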
Gleb Natapov (4):
transport: change make_result to takes a reference to cql result
instead of shared_ptr
storage_service: move start_native_transport into a thread
lwt: Process lwt request on a owning shard
lwt: drop invoke_on in paxos_state prepare and accept
auth/service.hh | 5 +-
message/messaging_service.hh | 2 +-
service/client_state.hh | 30 +++-
service/paxos/paxos_state.hh | 10 +-
service/query_state.hh | 6 +
service/storage_proxy.hh | 2 +
transport/messages/result_message.hh | 20 +++
transport/messages/result_message_base.hh | 4 +
transport/request.hh | 4 +
transport/server.hh | 25 ++-
cql3/statements/batch_statement.cc | 6 +
cql3/statements/modification_statement.cc | 6 +
cql3/statements/select_statement.cc | 8 +
message/messaging_service.cc | 2 +-
service/paxos/paxos_state.cc | 48 ++---
service/storage_proxy.cc | 47 ++++-
service/storage_service.cc | 120 +++++++------
test/boost/cql_query_test.cc | 1 +
thrift/handler.cc | 3 +
transport/messages/result_message.cc | 5 +
transport/server.cc | 203 ++++++++++++++++------
21 files changed, 377 insertions(+), 180 deletions(-)
Use `seastar::memory::local_failure_injector()` to inject all possible
`std::bad_alloc`:s into the collection serialization code path. The test
just checks that there are no `std::abort()`:s caused by any of the
exceptions.
The test will not be run if `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` is
not defined.
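The failure-injection technique can be illustrated with a self-contained sketch. The `failure_injector` below is a hypothetical stand-in, not seastar's `local_failure_injector()` API: it fails the N-th allocation point, and the driver reruns the operation with N = 0, 1, 2, ... until a run completes without hitting the injected failure, so every allocation site is exercised once.

```cpp
#include <cstddef>
#include <new>
#include <optional>
#include <string>
#include <vector>

// Hypothetical injector: throws std::bad_alloc at the allocation point whose
// ordinal equals fail_at (if set). Not seastar's failure injector API.
struct failure_injector {
    std::optional<std::size_t> fail_at;
    std::size_t count = 0;
    void on_alloc() {
        const std::size_t n = count++;
        if (fail_at && n == *fail_at) {
            throw std::bad_alloc();
        }
    }
};

// Toy operation under test: one simulated allocation point per item.
std::vector<std::string> serialize(const std::vector<std::string>& items, failure_injector& inj) {
    std::vector<std::string> out;
    for (const auto& item : items) {
        inj.on_alloc();
        out.push_back(item);
    }
    return out;
}

// Fail allocation point 0, then 1, then 2, ... until a run succeeds.
// Each injected failure must surface as a catchable exception; inside a
// noexcept path it would call std::terminate() instead.
std::size_t exercise_all_failure_points(const std::vector<std::string>& items) {
    failure_injector inj;
    for (std::size_t i = 0;; ++i) {
        inj.fail_at = i;
        inj.count = 0;
        try {
            const auto out = serialize(items, inj);
            (void)out;
            return i;  // no injected failure was hit: all points exercised
        } catch (const std::bad_alloc&) {
            // Expected for every i below the number of allocation points.
        }
    }
}
```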
`cell::make_collection()` assumes that all ranges passed to it are
nothrow copyable and movable views. This is not guaranteed, is not
expressed in the interface and is not mentioned in the comments either.
The changes introduced by 0a453e5d3a to collection serialization, making
it use fragmented buffers, fell into this trap, as it passes
`bytes_ostream` to `cell::make_collection()`. `bytes_ostream`'s copy
constructor allocates and hence can throw, triggering an
`std::terminate()` inside `cell::make_collection()` as the latter is
noexcept.
To solve this issue, non-nothrow copyable and movable ranges are now
wrapped in a `fragment_range_view` to make them so.
`cell::make_collection()` already requires callers to keep alive the
range for the duration of the call, so this does not introduce any new
requirements to the callers. Additionally, to avoid any future
accidents, do not accept temporaries for the `data` parameter. We don't
ever want to move this param anyway, we will either have a trivially
copyable view, or a potentially heavy-weight range that we will create a
trivially copyable view of.
A lightweight, trivially copyable and movable view for fragment ranges.
Allows for uniform treatment of all kinds of ranges, i.e. treating all
of them as a view. Currently `fragment_range.hh` provides lightweight,
view-like adaptors for empty and single-fragment ranges (`bytes_view`). To
allow code to treat owning multi-fragment ranges the same way as the
former two, we need a view for the latter as well -- this is
`fragment_range_view`.
Resolves #4820. Execution path in main.cc now cleans up system.clients
table if it exists (this is done on startup). Also, server.cc now calls
functions that notify about cql clients connecting/disconnecting.
This simplifies the storage_service API and fixes the
complaint about shared_ptr usage instead of unique_ptr.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a lonely get_load_map() call on storage_service that
needs only the load broadcaster and always runs on shard 0, and that's it.
The next patch will move all of this into its own helper no-shard
container, and this patch is a preparation for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Since lwt requests now run on an owning shard, there is no longer
a need to make a cross-shard call at the paxos_state level. RPC calls may
still arrive at the wrong shard, so we need to make a cross-shard call there.
LWT is much more efficient if a request is processed on a shard that owns
a token for the request. This is because otherwise the processing will
bounce to an owning shard multiple times. The patch proposes a way to
move a request to the correct shard before running lwt. It works by returning
an error from the lwt code if the shard is the wrong one, specifying the shard
the request should be moved to. The error is processed by the transport code,
which jumps to the correct shard and re-processes the incoming message there.
"
The original fix (10f6b125c8) didn't
take into account that if there was a failed memtable flush (Refs
flush) but is not a flushable memtable because it's not the latest in
the memtable list. If that happens, it means no other memtable is
flushable as well, because otherwise it would be picked due to
evictable_occupancy(). Therefore the right action is to not flush
anything in this case.
Suspected to be observed in #4982. I didn't manage to reproduce after
triggering a failed memtable flush.
Fixes #3717
"
* tag 'avoid-ooming-with-flush-continuations-v2' of github.com:tgrabiec/scylla:
database: Avoid OOMing with flush continuations after failed memtable flush
lsa: Introduce operator bool() to occupancy_stats
lsa: Expose region_impl::evictable_occupancy in the region class
"
Fix overflow handling in sum() and avg().
sum:
- aggregated into __int128
- detect overflow when computing result and log a warning if found
avg:
- fix division function to divide the accumulator type _sum (__int128 for integers) by _count
Add unit tests for both cases
Test:
- manual test against Cassandra 3.11.3 to make sure the results in the scylla unit test agree with it.
- unit(dev), cql_query_test(debug)
Fixes #5536
"
* 'cql3-sum-overflow' of https://github.com/bhalevy/scylla:
test: cql_query_test: test avg overflow
cql3: functions: protect against int overflow in avg
test: cql_query_test: test sum overflow
cql3: functions: detect and handle int overflow in sum
exceptions: sort exception_code definitions
exceptions: define additional cassandra CQL exceptions codes
"
We were failing to start a thread when the UDF call was nested in an
aggregate function call like SUM.
"
* 'espindola/fix-sum-of-udf' of https://github.com/espindola/scylla:
cql3: Fix indentation
cql3: Add missing with_thread_if_needed call
cql3: Implement abstract_function_selector::requires_thread
remove make_ready_future call
This was initialized to api::missing_timestamp but
should be set to either a client-provided timestamp or
the server's.
Unlike for write operations, this timestamp need not be unique
like the one generated by client_state::get_timestamp.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200108074021.282339-2-bhalevy@scylladb.com>
exec->_cmd->read_timestamp may be initialized by default to api::min_timestamp,
causing:
service/storage_proxy.cc:3328:116: runtime error: signed integer overflow: 1577983890961976 - -9223372036854775808 cannot be represented in type 'long int'
Aborting on shard 1.
Do not optimize cross-dc repair if read_timestamp is missing (or just negative)
We're interested in reads that happen within write_timeout of a write.
Fixes #5556
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200108074021.282339-1-bhalevy@scylladb.com>
The cdc service is assigned from outside, post construction, mainly
because of the chickens and eggs in main startup. Would be nice to
have it unconditionally, but this is workable.
It is (and shall only be) called from inside storage proxy,
and we would like this to be reflected in the interface
so that eventually moving cdc logic into the mutate call
chains becomes easier to verify and comprehend.
To eventually replace the free function.
The main difference is that this is built both to handle batches correctly
and to eventually allow hanging the cdc object on storage proxy,
and caches on the cdc object.
The test case checks that having two base regular columns
in the materialized view key (not obtainable via CQL),
still works fine when values are inserted or deleted.
If TTL was involved and these columns would have different expiration
rules, the case would be more complicated, but it's not possible
for a user to reach that case - neither with CQL, nor with alternator.
Previous assumption was that there can only be one regular base column
in the view key. The assumption is still correct for tables created
via CQL, but it's internally possible to create a view with multiple
such columns - the new assumption is that if there are multiple columns,
they share their liveness.
This patch is vital for indexing to work properly on alternator,
so it would be best to solve the issue upstream. I strived to leave
the existing semantics intact as long as only up to one regular
column is part of the materialized view primary key, which is the case
for Scylla's materialized views. For alternator it may not be true,
but all regular columns in alternator share liveness info (since
alternator does not support per-column TTL), which is sufficient
to compute view updates in a consistent way.
Fixes #5006
Tests: unit(dev), alternator(test_gsi_update_second_regular_base_column, tic-tac-toe demo)
Message-Id: <c9dec243ce903d3a922ce077dc274f988bcf5d57.1567604945.git.sarna@scylladb.com>
Now that position_in_partition_view has type-aware printing, use it
to provide a human readable version of clustering keys.
Message-Id: <20191231151315.602559-2-avi@scylladb.com>
If the position_in_partition_view represents a clustering key,
we can now see it with the clustering key decoded according to
the schema.
Message-Id: <20191231151315.602559-1-avi@scylladb.com>
Previous implementation did not take into account that a column
in a partition key might exist in a mutation, but in a DEAD state
- if it's deleted. There are no regressions for CQL, while for
alternator and its capability of having two regular base columns
in a view key, this additional check must be performed.
This reduces code bloat and makes the code friendlier for IDEs, as the
IDE now understands the type of create_schema.
Message-Id: <20191231134803.591190-1-avi@scylladb.com>
sstables::write_simple() has quite a lot of boilerplate
which gets replicated into each template instance. Move
all of that into a non-template do_write_simple(), leaving
only things that truly depend on the component being written
in the template, and encapsulating them with a
noncopyable_function.
An explicit template instantiation was added, since this
is used in a header file. Before, it likely worked by
accident and stopped working when the template became
small enough to inline.
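The refactoring pattern can be sketched as follows. `std::function` stands in for seastar's `noncopyable_function`, and the "[open ...]"/"[close]" strings are hypothetical placeholders for the real boilerplate (opening the file, setting up the output stream, closing). Only the small lambda is instantiated per component type; `do_write_simple` is compiled once.

```cpp
#include <functional>
#include <ostream>
#include <sstream>
#include <string>

// Non-template part: all the boilerplate lives here and is compiled once.
std::string do_write_simple(const std::string& component_name,
                            const std::function<void(std::ostream&)>& write_component) {
    std::ostringstream out;
    out << "[open " << component_name << "]";  // placeholder boilerplate
    write_component(out);                      // the only type-dependent step
    out << "[close]";                          // placeholder boilerplate
    return out.str();
}

// Thin template: only the part that depends on T is instantiated per type.
template <typename T>
std::string write_simple(const std::string& component_name, const T& component) {
    return do_write_simple(component_name, [&component](std::ostream& out) {
        out << component;
    });
}

// Explicit instantiation, as done for a template that is used from a header.
template std::string write_simple<int>(const std::string&, const int&);
```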
Tests: unit (dev)
Message-Id: <20200106135453.1634311-1-avi@scylladb.com>
mutation_partition_view now supports a compile-time resolved visitor.
This is performant but results in bloat when the performance is not
needed. Furthermore, the template function that applies the object
to the visitor is private and out-of-line, to reduce compile time.
To allow visitation on mutation_partition_view objects, add a virtual
visitor type and a non-template accept function.
Note: mutation_partition_visitor is very similar to the new type,
but different enough to break the template visitor which is used
to implement the new visitor.
The new visitor will be used to implement pretty printing for
canonical_mutation.
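A reduced C++ sketch of adding a virtual visitor on top of a template visitor. The types `row`, `partition_view`, `virtual_visitor` and `printer` are hypothetical simplifications, not mutation_partition_view's real interface. The non-template `accept(virtual_visitor&)` reuses the template path by instantiating it exactly once, with `virtual_visitor` as the visitor type.

```cpp
#include <string>
#include <utility>
#include <vector>

struct row {
    int key;
    std::string value;
};

// Runtime-polymorphic visitor: one vtable, no per-visitor instantiation.
class virtual_visitor {
public:
    virtual ~virtual_visitor() = default;
    virtual void accept_row(const row& r) = 0;
};

class partition_view {
    std::vector<row> _rows;
public:
    explicit partition_view(std::vector<row> rows) : _rows(std::move(rows)) {}

    // Compile-time visitor: fast, but instantiated for every Visitor type.
    template <typename Visitor>
    void accept(Visitor& v) const {
        for (const auto& r : _rows) {
            v.accept_row(r);
        }
    }

    // Non-template entry point: one instantiation, virtual dispatch per call.
    void accept(virtual_visitor& v) const {
        accept<virtual_visitor>(v);
    }
};

// Example visitor, e.g. for pretty printing.
struct printer : virtual_visitor {
    std::string out;
    void accept_row(const row& r) override {
        out += std::to_string(r.key) + "=" + r.value + ";";
    }
};
```

Calling `accept()` with a `printer&` would select the template overload (an exact match); passing a `virtual_visitor&` goes through the non-template, virtual path.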
Consider this:
1) Write partition_start of p1
2) Write clustering_row of p1
3) Write partition_end of p1
4) Repair is stopped due to error before writing partition_start of p2
5) Repair calls repair_row_level_stop() to tear down which calls
wait_for_writer_done(). A duplicate partition_end is written.
To fix this, track the partition_start and partition_end written and
avoid unpaired writes.
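A minimal sketch of the tracking, assuming a hypothetical `tracking_writer` that records fragments as strings: `partition_end()` is a no-op unless a partition is open, so the teardown path can call it unconditionally without producing the duplicate described in step 5.

```cpp
#include <string>
#include <vector>

// Hypothetical writer that remembers whether a partition is currently open.
class tracking_writer {
    std::vector<std::string> _out;
    bool _partition_open = false;
public:
    void partition_start(const std::string& key) {
        _out.push_back("partition_start " + key);
        _partition_open = true;
    }
    void clustering_row(const std::string& row) {
        _out.push_back("clustering_row " + row);
    }
    // Only emit partition_end if it pairs with an earlier partition_start.
    void partition_end() {
        if (!_partition_open) {
            return;
        }
        _out.push_back("partition_end");
        _partition_open = false;
    }
    // Teardown path: closes an open partition, never adds an unpaired end.
    void wait_for_writer_done() { partition_end(); }
    const std::vector<std::string>& output() const { return _out; }
};
```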
Backports: 3.1 and 3.2
Fixes: #5527
commit 21dec3881c introduced
a bug that will cause scylla debian build to fail. This is
because the commit relied on the PRODUCT environment variable
being exported (and, as a result, propagating to the rename
command that is executed by find in a subshell).
This commit fixes it by explicitly passing the PRODUCT variable
into the rename command.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20200106102229.24769-1-eliransin@scylladb.com>
In scylla, the replacing node is set as HIBERNATE status. It is the only
place we use HIBERNATE status. The replacing node is supposed to be
alive and updating its heartbeat, so it is not supposed to be in dead
state.
This patch fixes the following problem in replacing.
1) start n1, n2
2) n2 is down
3) start n3 to replace n2, but kill n3 in the middle of the replace
4) start n4 to replace n2
After step 3 and step 4, the old n3 will stay in gossip forever until a
full cluster shutdown. Note that n3 will only stay in gossip, not in the
system.peers table. Users will see annoying, endless log messages like
the following on all the nodes:
rpc - client $ip_of_n3:7000: fail to connect: Connection refused
Fixes: #5449
Tests: replace_address_test.py + manual test
VERSION_ID of centos7 is "7", but VERSION_ID of oel7.7 is "7.7", so
scylla_ntp_setup fails on OEL7.7 with a ValueError:
- ValueError: invalid literal for int() with base 10: '7.7'
This patch changes redhat_version() to return the version string and
compares it with parse_version().
Fixes #5433
Signed-off-by: Amos Kong <amos@scylladb.com>
When the progress is queried, e.g., by nodetool netstats,
the progress info might not have been updated yet.
Fix it by checking the map before accessing it, to avoid errors like:
std::out_of_range (_Map_base::at)
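The check-before-access pattern in C++, as a hedged sketch: the map type, key format and `get_progress` helper are hypothetical. `find()` reports a missing entry instead of throwing `std::out_of_range` the way `at()` does.

```cpp
#include <map>
#include <optional>
#include <string>

// Look up progress without throwing when the entry is not populated yet.
std::optional<long> get_progress(const std::map<std::string, long>& progress,
                                 const std::string& key) {
    auto it = progress.find(key);
    if (it == progress.end()) {
        return std::nullopt;  // not updated yet: report "no progress"
    }
    return it->second;
}
```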
Fixes: #5437
Tests: nodetool_additional_test.py:TestNodetool.netstats_test
This depends on the just emailed fixes to undefined behavior in
tests. With this change we should quickly notice if a change
introduces undefined behavior.
Fixes #4054
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191230222646.89628-1-espindola@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/5538 from
Avi Kivity and Piotr Jastrzębski.
This series prepares CDC for rolling upgrade. This consists of
reducing the footprint of cdc, when disabled, on the schema, adding
a cluster feature, and redacting the cdc column when transferring
it to other nodes. The latter is needed because we'll want to backport
this to 3.2, which doesn't have canonical_mutations yet.
If the in-memory buffer does not have enough space for an incoming mutation, it is
written into a file, but the code missed updating the timestamp of the last
sync, so we may sync too often.
Message-Id: <20200102155049.21291-9-gleb@scylladb.com>
The code that enters the gate never defers before leaving, so the gate
behaves like a flag. Let's use the existing flag to prohibit adding data to a
closed segment.
Message-Id: <20200102155049.21291-8-gleb@scylladb.com>
Currently segment closing code is spread over several functions and
activated based on the _closed flag. Make segment closing explicit
by moving all the code into a close() function and calling it wherever the _closed
flag is set.
Message-Id: <20200102155049.21291-6-gleb@scylladb.com>
Currently sync() does two completely different things based on the
shutdown parameter. Separate the code into two different functions.
Message-Id: <20200102155049.21291-3-gleb@scylladb.com>
The original "test_schema_digest_does_not_change" test case ensures
that schema digests will match for older nodes that do not support
all the features yet (including computed columns).
The additional case uses sstables generated after CDC was enabled
and a table with CDC enabled was created,
in order to make sure that the digest computed
including the CDC column does not change spuriously either.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
The addition of the cdc column to scylla_tables changes how schema
digests are calculated, and affects the ABI of schema update
messages (adding a column changes other columns' indexes
in frozen_mutation).
To fix this, extend the schema_tables mechanism with support
for the cdc column, and adjust schemas and mutations to remove
that column when sending schemas during upgrade.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
An empty cdc column in scylla_tables is hashed differently from
a missing column. This causes schema mismatch when a schema is
propagated to another node, because the other node will redact
the schema column completely if the cluster feature isn't enabled,
and an empty value is hashed differently from a missing value.
Store a tombstone instead. Tombstones are removed before
digesting, so they don't affect the outcome.
This change also undoes the changes in 386221da84 ("schema_tables:
handle 'cdc' options") to schema_change_test
test_merging_does_not_alter_tables_which_didnt_change. That change
enshrined the breakage into the test, instead of fixing the root cause,
which was that we added an extra mutation to the schema (for
cdc options, which were disabled).
Different versions of boost have different rules for what conversions
from cpp_int to smaller integers are allowed.
We already had a function that worked with all supported versions, but
it was not being used by lua.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200104041028.215153-1-espindola@scylladb.com>
I noticed this while looking at the crashes next is currently
experiencing.
While I have no idea if this fixes the issue, it does avoid broken
future warnings (for no_sharded_instance_exception) in a debug build.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200103201540.65324-1-espindola@scylladb.com>
* seastar 0525bbb08...36cf5c5ff (6):
> memcached: Fix use after free in shutdown
> Revert "task: stop wrapping tasks with unique_ptr"
> task: stop wrapping tasks with unique_ptr
> http: Change exception formating to the generic seastar one
> Merge "Avoid a few calls to ~exception_ptr" from Rafael
> tests: fix core generation with asan
This patch adds a very comprehensive test for the ConditionExpression
feature, i.e., the newer syntax of conditional writes replacing
the old-style "Expected" - for the UpdateItem, PutItem and DeleteItem
operations.
I wrote these tests while closely following the DynamoDB ConditionExpression
documentation, and attempted to cover all conceivable features, subfeatures
and subcases of the ConditionExpression syntax - to serve as a test for a
future support for this feature in Alternator (see issue #5053).
As usual, all these tests pass on AWS DynamoDB, but because we haven't yet
implemented this feature in Alternator, all but one xfail on Alternator.
Refs #5053.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191229143556.24002-1-nyh@scylladb.com>
If Alternator is requested to be enabled on a specific port but the port is
already taken, the boot fails as expected, but the error log is confusing.
It currently looks something like this:
WARN 2019-12-24 11:22:57,303 [shard 0] alternator-server - Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)
... (many more messages about the server shutting down)
INFO 2019-12-24 11:22:58,008 [shard 0] init - Startup failed: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)
There are two problems here. First, the "WARN" should really be an "ERROR",
because it causes the server to be shut down and the user must see this error.
Second, the final line in the log, something the user is likely to see first,
contains only the ultimate cause for the exception (an address already in use)
but not what this address was needed for.
This patch solves both issues, and the log now looks like:
ERROR 2019-12-24 14:00:54,496 [shard 0] alternator-server - Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)
...
INFO 2019-12-24 14:00:55,056 [shard 0] init - Startup failed: std::_Nested_exception<std::runtime_error> (Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043): std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)
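The standard-library mechanism visible in the new log line (`std::_Nested_exception<std::runtime_error>` is the libstdc++ type thrown by `std::throw_with_nested`) can be sketched as below. `listen_on`, `start_server` and `describe` are hypothetical; the real server code and its log formatting differ.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical stand-in for a listen() that fails with EADDRINUSE.
void listen_on(int port) {
    throw std::system_error(std::make_error_code(std::errc::address_in_use),
                            "posix_listen failed for port " + std::to_string(port));
}

void start_server(int port) {
    try {
        listen_on(port);
    } catch (...) {
        // Keep the low-level error, but add what the port was needed for.
        std::throw_with_nested(std::runtime_error(
            "Failed to set up Alternator HTTP server on port " + std::to_string(port)));
    }
}

// Flatten a nested-exception chain into "outer: inner: ..." for logging.
std::string describe(const std::exception& e) {
    std::string msg = e.what();
    try {
        std::rethrow_if_nested(e);
    } catch (const std::exception& inner) {
        msg += ": " + describe(inner);
    }
    return msg;
}
```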
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191224124127.7093-1-nyh@scylladb.com>
We don't yet support the ReturnValues option on PutItem, UpdateItem or
DeleteItem operations (see issue #5053), but if a user tries to use such
an option anyway, we silently ignore this option. It's better to fail,
reporting the unsupported option.
In this patch we check the ReturnValues option and if it is anything but
the supported default ("NONE"), we report an error.
Also added a test to confirm this fix. The test verifies that "NONE" is
allowed, and something which is unsupported (e.g., "DOG") is not ignored
but rather causes an error.
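A sketch of the validation in C++; `check_return_values` is a hypothetical helper, and `std::invalid_argument` stands in for Alternator's actual ValidationException error. An absent option and the explicit default "NONE" are accepted; anything else is rejected instead of silently ignored.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Reject any ReturnValues other than the supported default.
void check_return_values(const std::optional<std::string>& return_values) {
    if (return_values && *return_values != "NONE") {
        throw std::invalid_argument("Unsupported ReturnValues=" + *return_values);
    }
}
```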
Refs #5053.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191216193310.20060-1-nyh@scylladb.com>
These are flags we always want to enable. In particular, we want them
to be used by the bots, but the bots run this script with
--configure-flags, so they were being discarded.
We put the user options last so that they can override the common
options.
Fixes #5505
Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Reviewed-by: Takuya ASADA <syuu@scylladb.com>
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
There is no requirement that all notes be placed in a single
PT_NOTE. It looks like recent lld versions actually put each section in its
own PT_NOTE.
This change looks for build-id in all PT_NOTE headers.
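A self-contained sketch of scanning every PT_NOTE segment for the GNU build-id note. It works on byte buffers already extracted from the program headers (in a running process they could come from `dl_iterate_phdr()`), which keeps the example testable; `find_in_segment` and `find_build_id` are hypothetical names. The note layout follows the ELF format: a header of three 32-bit words (namesz, descsz, type), then the name and the descriptor, each padded to a multiple of 4 bytes (the common case; notes with 8-byte alignment are ignored here).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// ELF note header: the same three 32-bit words for ELF32 and ELF64.
struct note_header {
    uint32_t namesz, descsz, type;
};
constexpr uint32_t nt_gnu_build_id = 3;  // NT_GNU_BUILD_ID in <elf.h>

constexpr std::size_t align4(std::size_t n) { return (n + 3) & ~std::size_t(3); }

// Scan one PT_NOTE segment's bytes for a "GNU" build-id note.
std::optional<std::vector<uint8_t>> find_in_segment(const std::vector<uint8_t>& seg) {
    std::size_t off = 0;
    while (off + sizeof(note_header) <= seg.size()) {
        note_header h;
        std::memcpy(&h, seg.data() + off, sizeof(h));
        const std::size_t name_off = off + sizeof(h);
        const std::size_t desc_off = name_off + align4(h.namesz);
        const std::size_t next = desc_off + align4(h.descsz);
        if (next > seg.size()) {
            break;  // truncated note
        }
        if (h.type == nt_gnu_build_id && h.namesz == 4 &&
            std::memcmp(seg.data() + name_off, "GNU", 4) == 0) {
            return std::vector<uint8_t>(seg.begin() + desc_off,
                                        seg.begin() + desc_off + h.descsz);
        }
        off = next;
    }
    return std::nullopt;
}

// The fix: look through every PT_NOTE segment, not only the first one.
std::optional<std::vector<uint8_t>> find_build_id(
        const std::vector<std::vector<uint8_t>>& note_segments) {
    for (const auto& seg : note_segments) {
        if (auto id = find_in_segment(seg)) {
            return id;
        }
    }
    return std::nullopt;
}
```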
Fixes #5525
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191227000311.421843-1-espindola@scylladb.com>
rpm compression uses xz, which is painfully slow. Adjust the
compression settings to run on all threads.
The xz utility documentation suggests that 0 threads is
equivalent to all CPUs, but apparently the library interface
(which rpmbuild uses) doesn't think the same way.
Message-Id: <20200101141544.1054176-1-avi@scylladb.com>
In the current code, support for case-sensitive (quoted) user-defined type
names is broken. For example, a test doing:
CREATE TYPE "PHone" (country_code int, number text)
CREATE TABLE cf (pk blob, pn "PHone", PRIMARY KEY (pk))
Fails - the first line creates the type with the case-sensitive name PHone,
but the second line wrongly ends up looking for the lowercased name phone,
and fails with an exception "Unknown type ks.phone".
The problem is in cql3_type_name_impl. This class is used to convert a
type object into its proper CQL syntax - for example frozen<list<int>>.
The problem is that for a user-defined type, we forgot to quote its name
if it is not lowercase, and the result is wrong CQL. For example, a list of
PHone will be written as list<PHone> - but this is wrong because the CQL
parser, when it sees this expression, lowercases the unquoted type name
PHone and it becomes just phone. It should be list<"PHone">, not list<PHone>.
The solution is for cql3_type_name_impl to use for a user-defined type
its get_name_as_cql_string() method instead of get_name_as_string().
get_name_as_cql_string() is a new method which prints the name of the
user type as it should be in a CQL expression, i.e., quoted if necessary.
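A hedged sketch of the quoting rule: a name that matches [a-z][a-z0-9_]* can appear bare, and anything else is wrapped in double quotes with embedded quotes doubled. `name_as_cql_string` is a hypothetical free function, not the actual get_name_as_cql_string() method, and it ignores reserved keywords, which also require quoting in CQL. The character check is a plain loop rather than std::regex.

```cpp
#include <string>
#include <string_view>

// True if `name` can appear unquoted in CQL: [a-z][a-z0-9_]*.
// (Assumption: reserved keywords, which also need quoting, are not handled.)
bool is_unquoted_identifier(std::string_view name) {
    if (name.empty() || name[0] < 'a' || name[0] > 'z') {
        return false;
    }
    for (char c : name) {
        const bool ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
        if (!ok) {
            return false;
        }
    }
    return true;
}

// Render a type name for use inside a CQL type expression.
std::string name_as_cql_string(std::string_view name) {
    if (is_unquoted_identifier(name)) {
        return std::string(name);
    }
    std::string out = "\"";
    for (char c : name) {
        if (c == '"') {
            out += '"';  // a literal quote is written as two quotes
        }
        out += c;
    }
    out += '"';
    return out;
}
```

With this rule a list of PHone is rendered as list<"PHone">, which the CQL parser reads back as the case-sensitive name.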
The bug in the above test was apparently caused when our code serialized
the type name to disk as the string PHone (without any quoting), and then
later deserialized it using the CQL type parser, which converted it into
a lowercase phone. With this patch, the type's name is serialized as
"PHone", with the quotes, and deserialized properly as the type PHone.
While the extra quotes may seem excessive, they are necessary for the
correct CQL type expression - remember that the type expression may be
significantly more complex, e.g., frozen<list<"PHone">> and all of this,
including the quotes, is necessary for our parser to be able to translate
this string back into a type object.
This patch may cause breakage to existing databases which used case-
sensitive user-defined types, but I argue that these use cases were
already broken (as demonstrated by this test) so we won't break anything
that actually worked before.
Fixes #5544
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200101160805.15847-1-nyh@scylladb.com>
The class in question wants to run its own instances on different
shards; for this purpose it keeps a reference to its sharded self to call
invoke_on() on. There's a handy peering_sharded_service<> in seastar
for the same, using it makes the code nicer and shorter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191226112401.23960-1-xemul@scylladb.com>
We had a lot of code in a .hh file that, while using templates, was
only used for creating functions during startup.
This moves it to a new .cc file.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200101002158.246736-1-espindola@scylladb.com>
I can't quite figure out how we were trying to write a sstable with
the large data handler already stopped, but the backtrace suggests a
good place to add extra checks.
This patch adds two checks, one at the start and one at the end of
sstable::write_components. The first one should give us better
backtraces if the large_data_handler is already stopped. The second
one should help catch a race condition.
Refs: #5470
Message-Id: <20191231173237.19040-1-espindola@scylladb.com>
The standard printer for atomic_cell prints the value as hex,
because atomic_cell does not include the type. Add a type-aware
printer that allows the user to provide the type.
When the product name is other than "scylla", the debian
packaging scripts go over all files that start with "scylla-"
and change the prefix to the actual product name.
However, if there are no such files in the directory,
the script fails, since the renaming command
gets the literal wildcard string instead of an actual file name.
This patch replaces the command with an equivalent one
that only operates on files if there are any.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20191230143250.18101-1-eliransin@scylladb.com>
Since we merged /usr/lib/scylla with /opt/scylladb, we removed
/usr/lib/scylla and replaced it with a symlink pointing to /opt/scylladb.
However, RPM does not support replacing a directory with a symlink;
we are using a dirty hack in an RPM scriptlet, but it causes
multiple issues on upgrade/downgrade.
(See: https://docs.fedoraproject.org/en-US/packaging-guidelines/Directory_Replacement/)
To minimize Scylla upgrade/downgrade issues on the user side, it's better
to keep the /usr/lib/scylla directory.
Instead of creating a single symlink /usr/lib/scylla -> /opt/scylladb,
we can create a symlink for each setup script, like
/usr/lib/scylla/<script> -> /opt/scylladb/scripts/<script>.
Fixes #5522
Fixes #4585
Fixes #4611
The '--builddir' option value is assigned to the "builddir" variable,
which is wrong. The correct variable is "BUILDDIR" so use that instead
to fix the '--builddir' option.
Also, add logging to the script when executing the "dist/redhat_build.rpm.sh"
script to simplify debugging.
Hit the following ubsan error with bootstrap_test:TestBootstrap.manual_bootstrap_test in debug mode:
service/storage_service.cc:3519:37: runtime error: load of value 190, which is not a valid value for type 'bool'
The use site is:
service::storage_service::is_cleanup_allowed(seastar::basic_sstring<char, unsigned int, 15u, true>)::{lambda(service::storage_service&)#1}::operator()(service::storage_service&) const at /local/home/bhalevy/dev/scylla/service/storage_service.cc:3519
While at it, initialize `_initialized` to false as well, just in case.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
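A minimal illustration of the class of bug UBSan reports here, assuming a bool member that was left uninitialized: reading an indeterminate bool is undefined behavior, and the sanitizer flags the out-of-range byte (190) it happens to find. The class below is hypothetical; the message does not name the member read at storage_service.cc:3519, only that `_initialized` is now also set to false.

```cpp
#include <cassert>

// Hypothetical class, not storage_service itself.
class service_state {
    // Without an initializer (`bool _initialized;`), a default-initialized
    // object holds an indeterminate value here, and loading it is undefined
    // behavior that UBSan reports as an invalid value for type 'bool'.
    bool _initialized = false;
public:
    void set_initialized() { _initialized = true; }
    bool is_initialized() const { return _initialized; }
};
```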
Avoid following UBSAN error:
repair/row_level.cc:2141:7: runtime error: load of value 240, which is not a valid value for type 'bool'
Fixes#5531
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
A Linux machine typically has multiple clocksources with different
performance characteristics. Setting a high-performance clocksource might
result in better performance for ScyllaDB, so this should be considered
whenever starting it up.
This patch adds the possibility of enforcing an optimized Linux
clocksource to Scylla's setup/start-up processes. It does so by adding
an interactive question about enforcing the clocksource setting to scylla_setup,
which modifies the parameter "CLOCKSOURCE" in the scylla_server configuration
file. This parameter is read by perftune.py which, if it is set to "yes", proceeds
to set the clocksource (non-persistently). On x86, the TSC clocksource is used.
Fixes #4474
Fixes #5474
Fixes #5480
Instances of `variable_specifications` are passed around as
shared_ptrs, which is unnecessary in this case since the class
is marked as `final`. Use `lw_shared_ptr` instead, since we know
for sure it's not a polymorphic pointer.
Tests: unit(debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191225232853.45395-1-pa.solodovnikov@scylladb.com>
If any two directories of data/commitlog/hints/view_hints
are the same, we still end up running verify_owner_and_mode
and disk_sanity(check_direct_io_support) in parallel
on the same directories and hit #5510.
This change uses std::set rather than std::vector to
collect a unique set of directories that need initialization.
Fixes #5510
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191225160645.2051184-1-bhalevy@scylladb.com>
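A small sketch of the deduplication, assuming the configured directories arrive as a list of paths; unique_dirs() is a hypothetical helper, not the code in this patch. std::set only needs operator< on fs::path, which the standard library provides. The comparison is lexical: lexically_normal() folds "a/./b" into "a/b", but a trailing slash or a symlink pointing at the same place still yields a distinct entry, and catching those would need something like fs::weakly_canonical().

```cpp
#include <cassert>
#include <filesystem>
#include <set>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: collect the configured directories into a std::set so
// that a directory listed twice (e.g. hints and view_hints pointing to the
// same place) is initialized only once.
std::set<fs::path> unique_dirs(const std::vector<fs::path>& configured) {
    std::set<fs::path> dirs;
    for (const auto& d : configured) {
        dirs.insert(d.lexically_normal());  // lexical cleanup only, no I/O
    }
    return dirs;
}
```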
db::commitlog::segment::batch_cycle() assumes that after a write
for a certain position completes (as reported by
_pending_ops.wait_for_pending()) it will also be flushed, but this is
true only if writing and flushing are atomic with respect to the _pending_ops lock.
They usually are, unless flush_after is set to false when cycle() is
called. In this case only writing is done under the lock. This
is exactly what happens when a segment is closed: the flush is skipped
because a zero header is added after the last entry and then flushed, but
this optimization breaks the batch_cycle() assumption. Fix it by flushing
after the write atomically even if a segment is being closed.
Fixes#5496
Message-Id: <20191224115814.GA6398@scylladb.com>
The hints and view_hints directories have per-shard sub-dirs,
and the directories code tries to create, check and lock
all of them, including the base one.
The manipulations in question are excessive -- it's enough
to check and lock either the base dir, or all the per-shard
ones, but not everything. Let's take the latter approach for
its simplicity.
Fixes #5510
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Looks-good-to: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223142429.28448-1-xemul@scylladb.com>
Pekka Enberg <penberg@scylladb.com> wrote:
> Image might not be present, but the subsequent "docker run" command will automatically pull it.
Just letting "docker run" fail produces a somewhat confusing error message
that refers to the docker help, but we want to provide the user
with our own help. So we still fail early, but also try to pull the image
if "docker image inspect" failed, indicating it's not present locally.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223085219.1253342-4-bhalevy@scylladb.com>
Suggested-by: Pekka Enberg <penberg@scylladb.com>
> This will print all the available Docker images,
> many (most?) of them completely unrelated.
> Why not just print an error saying that no image was specified,
> and then perhaps print usage.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191223085219.1253342-3-bhalevy@scylladb.com>
Add dbuild dependency on python3-colorama,
which will be used in test.py instead of a hand-made palette.
[avi: update tools/toolchain/image]
Message-Id: <20191223125251.92064-2-kostja@scylladb.com>
In this place we only need to know the number of endpoints,
while current code additionally shuffles them before counting.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are two _identical_ methods in token_metadata class:
get_all_endpoints_count() and number_of_endpoints().
The former is used (called) and the latter is not, so
let's remove the latter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This greatly helps to narrow down the source of schema digest mismatch
between nodes. The intended use is to enable this logger on the disagreeing
nodes, trigger schema digest recalculation, observe which
mutations differ in digest, and then examine their content.
Message-Id: <1574872791-27634-1-git-send-email-tgrabiec@scylladb.com>
In commit b463d7039c (repair: Introduce
get_combined_row_hash_response), working_row_buf_nr is returned in
REPAIR_GET_COMBINED_ROW_HASH in addition to the combined hash. It was
scheduled to be part of the 3.1 release, but it was not backported to 3.1
by accident.
In order for 3.1 and 3.2 repair to be compatible, we need to drop
the working_row_buf_nr in the 3.2 release.
Fixes: #5490
Backports: 3.2
Tests: Run repair in a mixed 3.1 and 3.2 cluster
Changes summary:
* make `cql3::result_set` movable-only
* change signature of `cql3::result::result_set` to return by cref
* adjust available call sites to the aforementioned method to accept cref
The motivation behind this change is to eliminate a dangerous API,
which can easily set a trap for developers who don't expect that
result_set would be returned by value.
There is no point in copying the `result_set` around, so make
`cql3::result::result_set` cache the `result_set` internally in a
`unique_ptr` member variable and return a const reference, so as to
minimize unnecessary copies here and there.
Tests: unit(debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191220115100.21528-1-pa.solodovnikov@scylladb.com>
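A minimal sketch of the resulting shape, using hypothetical stand-ins for cql3::result_set and cql3::result (the accessor is called get_result_set() here only to avoid clashing with the class name). Deleting the copy operations turns an accidental copy into a compile error, and the lazily built result_set is cached in a unique_ptr, so every call returns a reference to the same object.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for cql3::result_set: move-only.
class result_set {
    std::vector<std::string> _rows;
public:
    explicit result_set(std::vector<std::string> rows) : _rows(std::move(rows)) {}
    result_set(result_set&&) = default;
    result_set& operator=(result_set&&) = default;
    result_set(const result_set&) = delete;
    result_set& operator=(const result_set&) = delete;
    const std::vector<std::string>& rows() const { return _rows; }
};

static_assert(!std::is_copy_constructible_v<result_set>, "result_set must be move-only");

// Hypothetical stand-in for cql3::result: builds the result_set on first
// access, caches it in a unique_ptr and returns a const reference to it.
class result {
    std::vector<std::string> _raw;
    mutable std::unique_ptr<result_set> _cached;
public:
    explicit result(std::vector<std::string> raw) : _raw(std::move(raw)) {}
    const result_set& get_result_set() const {
        if (!_cached) {
            _cached = std::make_unique<result_set>(_raw);
        }
        return *_cached;
    }
};
```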
So a higher level component using the validator to validate a stream can
catch only validation errors, and let any other incidental exception
through.
This allows building data correctors on top of the
`mutation_fragment_stream_validator`, by filtering a fragment stream
through a validator, catching invalid fragment stream exceptions and
dropping the respective fragments from the stream.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191220073443.530750-1-bdenes@scylladb.com>
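A sketch of the kind of corrector this enables, under simplifying assumptions: fragments are modeled as integer keys that must strictly increase, and invalid_stream_error, validator and drop_invalid() are hypothetical names, not the actual mutation_fragment_stream_validator API. Because validation failures have their own exception type, the filter can drop the offending fragment while any other exception still propagates.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical dedicated exception type for stream validation failures.
class invalid_stream_error : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Hypothetical validator: keys of successive fragments must strictly increase.
class validator {
    bool _has_prev = false;
    int _prev = 0;
public:
    void operator()(int key) {
        if (_has_prev && key <= _prev) {
            throw invalid_stream_error("out-of-order fragment " + std::to_string(key));
        }
        _has_prev = true;
        _prev = key;
    }
};

// Corrector built on top of the validator: invalid fragments are dropped,
// and since only invalid_stream_error is caught, other errors propagate.
std::vector<int> drop_invalid(const std::vector<int>& stream) {
    validator v;
    std::vector<int> out;
    for (int key : stream) {
        try {
            v(key);
            out.push_back(key);
        } catch (const invalid_stream_error&) {
            // Skip the fragment; the validator threw before updating its state.
        }
    }
    return out;
}
```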
This reverts commit 237ba74743. While it
works for the scylla executable, it fails for iotune, which is built
by seastar. It should be reinstated after we pass the correct link
parameters to the seastar build system.
"
This series solves an issue with scylla_setup and prevents it from
waiting forever if housekeeping cannot check for a new Scylla version.
Fixes #5302
It should be backported to versions that support offline installations.
"
* 'scylla_setup_timeout' of git://github.com/amnonh/scylla:
scylla_setup: do not wait forever if no reply is return housekeeping
scylla_util.py: Add optional timeout to out function
Having a long path allows patchelf to change the interpreter without
changing the PT_LOAD headers and therefore without moving the
build-id out of the first page.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191213224803.316783-1-espindola@scylladb.com>
Suppose we have a multi-dc setup (e.g. 9 nodes distributed across
3 datacenters: [dc1, dc2, dc3] -> [3, 3, 3]).
When a query that uses LWT is executed with LOCAL_SERIAL consistency
level, the `storage_proxy::get_paxos_participants` function
incorrectly calculates the number of required participants to serve
the query.
In the example above it's calculated to be 5 (i.e. the number of
nodes needed for a regular QUORUM) instead of 2 (for LOCAL_SERIAL,
which is equivalent to LOCAL_QUORUM cl in this case).
This behavior results in an exception being thrown when executing
the following query with LOCAL_SERIAL cl:
INSERT INTO users (userid, firstname, lastname, age) VALUES (0, 'first0', 'last0', 30) IF NOT EXISTS
Unavailable: Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level for cl LOCAL_SERIAL. Requires 5, alive 3" info={'required_replicas': 5, 'alive_replicas': 3, 'consistency': 'LOCAL_SERIAL'}
Tests: unit(dev), dtest(consistency_test.py)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191216151732.64230-1-pa.solodovnikov@scylladb.com>
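The arithmetic behind the numbers in the error message, assuming a replication factor of 3 in each of the three datacenters (the message only gives 3 nodes per DC; RF=3 per DC is an assumption consistent with "Requires 5"). The helper names are hypothetical. A cluster-wide quorum is taken over the sum of all RFs, while LOCAL_SERIAL, like LOCAL_QUORUM, only needs a majority of the local DC's replicas.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model: replication factor per datacenter.
using rf_per_dc = std::map<std::string, int>;

// QUORUM / SERIAL: a majority of all replicas across all datacenters.
int quorum(const rf_per_dc& rf) {
    int total = 0;
    for (const auto& entry : rf) {
        total += entry.second;
    }
    return total / 2 + 1;
}

// LOCAL_QUORUM / LOCAL_SERIAL: a majority of the replicas in the local DC only.
int local_quorum(const rf_per_dc& rf, const std::string& local_dc) {
    return rf.at(local_dc) / 2 + 1;
}
```

With {dc1: 3, dc2: 3, dc3: 3}, quorum() gives 9 / 2 + 1 = 5 (the wrong requirement reported above) and local_quorum() gives 3 / 2 + 1 = 2.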
The actual buffer is now in a member called 'data'. Leave the old
`dummy.dummy` and `dummy` as fall-backs. This seems to change with every
Fedora release.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191218153544.511421-1-bdenes@scylladb.com>
Schema is node-global, update_schema_version_and_announce() updates
all shards. We don't need to recalculate it from every shard, so
install the listeners only on shard 0. Reduces noise in the logs.
Message-Id: <1574872860-27899-1-git-send-email-tgrabiec@scylladb.com>
The option in question apparently does not work: several sharded objects
are start()-ed (and thus instantiated) in join_token_ring, while the instances
of these objects are used during the init of other stuff.
This trips seastar's local_is_initialized assertion on sys_dist_ks,
but reading the code shows more examples, e.g. the auth_service is started
on join, but is used for thrift and cql servers initialization.
The suggestion is to remove the option instead of fixing it. The is_joined
logic is kept, since on-start joining can still take some time and it's safer
to report the real status from the API.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191203140717.14521-1-xemul@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/5366 from Calle Wilund:
Moves schema creation/alter/drop awareness to use new "before" callbacks from
migration manager, and adds/modifies log and streams table as part of the base
table modification.
Makes schema changes semi-atomic per node. While this does not deal with updates
coming in before a schema change has propagated across the cluster, it now falls into the
same pit as when this happens without CDC.
An added side effect is that schemas are now transparent across all subsystems,
not just cql.
Patches:
cdc_test: Add small test for altering base schema (add column)
cdc: Handle schema changes via migration manager callbacks
migration_manager: Invoke "before" callbacks for table operations
migration_listener: Add empty base class and "before" callbacks for tables
cql_test_env: Include cdc service in cql tests
cdc: Add sharded service that does nothing.
cdc: Move "options" to separate header to avoid to much header inclusion
cdc: Remove some code from header
* seastar 00da4c8760...0525bbb08f (7):
> future: Simplify future_state_base::any move constructor
> future: don't create temporary tuple on future::get().
> future: don't instantiate new future on future::then_wrapped().
> future: clean-up the Result handling in then_wrapped().
> Merge "Fix core dumps when asan is enabled" from Rafael
> future: Move ignore to the base class
> future: Don't delete in ignore
Currently `SCYLLA-VERSION-GEN` is not a dependency of any target and
hence changes done to it will not be picked up by ninja. To trigger a
rebuild and hence version changes to appear in the `scylla` target
binary, one has to do `touch configure.py`. This is counter-intuitive
and frustrating to people who don't know about it and wonder why their
changed version is not appearing as the output of `scylla --version`.
This patch makes `SCYLLA-VERSION-GEN` a dependency of `build.ninja`,
making the `build.ninja` target out-of-date whenever
`SCYLLA-VERSION-GEN` is changed and hence will trigger a rerun of
`configure.py` when the next target is built, allowing a build of e.g.
`scylla` to pick up any changes done to the version automatically.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191217123955.404172-1-bdenes@scylladb.com>
"
This patch set rearranges the test files so that
it is now possible to search for tests automatically,
and adds this functionality to test.py
"
* 'test.py.requeue' of ssh://github.com/scylladb/scylla-dev:
cmake: update CMakeLists.txt to scan test/ rather than tests/
test.py: automatically lookup all unit and boost tests
tests: move all test source files to their new locations
tests: move a few remaining headers
tests: move another set of headers to the new test layout
tests: move .hh files and resources to new locations
tests: remove executable property from data_listeners_test.cc
When scylla is installed without network connectivity, the check for whether a
newer version is available can cause scylla_setup to wait forever.
This patch adds a limit to the time scylla_setup will wait for a reply.
When there is no reply, an error saying that it was unable to check for a
newer version is shown, but this does not block the setup
script.
Fixes #5302
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/5343 from
Benny Halevy.
Fixes #5340
Hold the sstable_deletion_sem in table::move_sstables_from_subdirs to
serialize access to the staging directory. It now synchronizes snapshot,
compaction deletion of sstables, and view_update_generator moving of
sstables from staging.
Tests:
unit (dev) [except test_user_function_timestamp_return, which fails for me locally, but also on master]
snapshot_test.py (dev)
I used the following as a reference:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/virtual/ClientsTable.java
At this moment there is only info about the IP, the client's outgoing port,
the client 'type' (i.e. CQL/thrift/alternator), the shard ID and the username.
Column `request_count' is NOT present, and the CK consists of
(`port', `client_type'), unlike C*'s, which is just (`port').
Code that notifies `system.clients` about new connections goes
to top-level files `connection_notifier.*`. Currently only CQL
clients are observed, but enum `client_type` can be used in future
to notify about connections with other protocols.
Hold the _sstable_deletion_sem while moving sstables from the staging directory
so as not to move them from under the feet of table::snapshot.
Fixes #5340
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The consumer may throw; in this case, break from the loop and retry.
move_sstable_from_staging_in_thread may theoretically throw too;
ignore the error in this case, since the sstable was already processed,
individual move failures are already ignored, and moving from staging
will be retried upon restart.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To be used for "batch" move of several sstables from staging
to the base directory, allowing the caller to sync the directories
once when all are moved rather than for each one of them.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
distributed_loader::probe_file needlessly creates a seastar
thread for it and the next patch will use it as part of
a parallel_for_each loop to move a list of sstables
(and sync the directories once at the end).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We do not yet support the parallel Scan options (TotalSegments, Segment),
as reported in issue #5059. But even before implementing this feature, it
is important that we produce an error if a user attempts to use it - instead
of outright ignoring this parameter. This is what this patch does.
The patch also adds a full test, test_scan.py::test_scan_parallel, for the
parallel scan feature. The test passes on DynamoDB, and still xfails
on Alternator after this patch - but now the Scan request fails immediately
reporting the unsupported option - instead of what the pre-patch code did:
returning the wrong results, with the test failing only when the results
do not match the expectations.
Refs #5059.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191217084917.26191-1-nyh@scylladb.com>
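A minimal sketch of the rejection logic, assuming the request parameters have already been parsed into a string map; check_scan_parameters() is a hypothetical name, and a real implementation would presumably report a DynamoDB-style ValidationException rather than throw std::invalid_argument.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch: refuse a Scan request that uses parameters which are
// not implemented yet, instead of silently ignoring them and returning
// wrong results.
void check_scan_parameters(const std::map<std::string, std::string>& request) {
    for (const char* unsupported : {"TotalSegments", "Segment"}) {
        if (request.count(unsupported)) {
            throw std::invalid_argument(std::string("Scan parameter not supported yet: ") + unsupported);
        }
    }
}
```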
"
Only the first patch is needed to fix the undefined behavior, but the
followup ones simplify the memory management around user types.
"
* 'espindola/fix-5193-v2' of ssh://github.com/espindola/scylla:
db: Don't use lw_shared_ptr for user_types_metadata
user_types_metadata: don't implement enable_lw_shared_from_this
cql3: pass a const user_types_metadata& to prepare_internal
db: drop special case for top level UDTs
db: simplify db::cql_type_parser::parse
db: Don't create a reference to nullptr
Add test for loading a schema with a non native type
1. Move tests to test (using singular seems to be a convention
in the rest of the code base)
2. Move boost tests to test/boost, other
(non-boost) unit tests to test/unit, tests which are
expected to be run manually to test/manual.
Update configure.py and test.py with new paths to tests.
Move sstable_test.hh, test_table.hh and cql_assertions.hh from tests/ to
test/lib or test/boost and update dependent .cc files.
Move tests/perf_sstable.hh to test/perf/perf_sstable.hh
Move another small subset of headers to test/
with the same goals:
- preserve bisectability
- make the revision history traceable after a move
Update dependent files.
The plan is to move the unstructured content of tests/ directory
into the following directories of test/:
test/lib - shared header and source files for unit tests
test/boost - boost unit tests
test/unit - non-boost unit tests
test/manual - tests intended to be run manually
test/resource - binary test resources and configuration files
In order to not break git bisect and preserve the file history,
first move most of the header files and resources.
Update paths to these files in .cc files, which are not moved.
We're seeing the following error from the test from time to time:
fatal error: in "test_allocation_failure": std::runtime_error: Did not get expected exception from writing too large record
This is not reproducible and the error string does not contain
enough information to figure out what happened exactly, therefore
this patch adds an exception if the call succeeded unexpectedly
and also prints the unexpected exception if one was caught.
Refs #4714
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191215052434.129641-1-bhalevy@scylladb.com>
The is_podman check was depending on `docker -v` printing "podman" in
the output, but that doesn't actually work, since podman prints $0.
Use `docker --help` instead, which will output "podman".
Also return podman's return status, which was previously being
dropped.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
"
On start there are two things that scylla does on data/commitlog/etc.
dirs: locks and verifies permissions. Right now these two actions are
managed by different approaches, it's convenient to merge them.
Also, the directories class introduced in this set lays the groundwork for
better --workdir option handling. In particular, right now the db::config
entries are modified after options parsing to update directories with
the workdir prefix. With the directories class at hand, we will be able
to stop doing this.
"
* 'br-directories-cleanup' of https://github.com/xemul/scylla:
directories: Make internals work on fs::path
directories: Cleanup adding dirs to the vector to work on
directories: Drop seastar::async usage
directories: Do touch_and_lock and verify sequentially
directories: Do touch_and_lock in parallel
directories: Move the whole stuff into own .cc file
directories: Move all the dirs code into .init method
file_lock: Work with fs::path, not sstring
This reverts commit 4333b37f9e. It breaks upgrades,
and the user question is not informative enough for the user to make a correct
decision.
Fixes #5478.
Fixes #5480.
The unordered_set is turned into a vector, since there's no hash()
for fs::path, which unordered_set needs.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now the only future-returning operation remaining is the call to
parallel_for_each(); all the rest is non-blocking preparation,
so we can drop the seastar::async and just return the future
from parallel_for_each.
The indentation is now correct, as the previous patch prepared
it for just that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The goal is to drop the seastar::async() usage.
Currently we have two places that return futures -- calls to
parallel_for_each-s. We can either chain them together or,
since both are working on the same set of directories, chain
actions inside them.
For code simplicity I propose to chain actions.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The list of paths that should be touch-and-locked is already
at hand; this shortens the code and makes it slightly faster
(in theory).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In order not to pollute the root dir, place the code in the
utils/ directory, "utils" namespace.
While doing this, move touch_and_lock out of the
class declaration.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The seastar::async usage is temporary, added for bisect-safety;
soon it will go away. For this reason the indentation in the
.init method is not "canonical", but is prepared for a one-patch
drop of the seastar::async.
The hinted_handoff_enabled arg is there because it's not just a
parameter on config; it has been parsed in main.cc.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The main.cc code that converts sstring to fs::path
will be patched soon, the file_desc::open belongs
to seastar and works on sstrings.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
LWT is much more efficient if a request is processed on a shard that owns
a token for the request. This is because otherwise the processing will
bounce to an owning shard multiple times. The patch proposes a way to
move the request to the correct shard before running lwt. It works by returning
an error from the lwt code, if the shard is the wrong one, specifying the shard
the request should be moved to. The error is processed by the transport
code, which jumps to the correct shard and re-processes the incoming message there.
"
* 'gleb/bounce_lwt_request' of github.com:scylladb/seastar-dev:
lwt: take raw lock for entire cas duration
lwt: drop invoke_on in paxos_state prepare and accept
lwt: Process lwt request on a owning shard
storage_service: move start_native_transport into a thread
transport: change make_result to takes a reference to cql result instead of shared_ptr
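A simplified model of the bounce, under stated assumptions: shard_of() uses a plain modulo as a stand-in for Scylla's real token-to-shard mapping, and try_execute_lwt()/process_request() are hypothetical names. The LWT code does not forward the request itself; it returns the owning shard, and the transport-level loop re-processes the message there, after which the Paxos rounds can run without further cross-shard hops.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical token-to-shard mapping (Scylla's real sharder differs).
unsigned shard_of(uint64_t token, unsigned shard_count) {
    return static_cast<unsigned>(token % shard_count);
}

// Outcome of trying to run an LWT request: done, or "bounce to shard N".
struct lwt_outcome {
    std::optional<unsigned> bounce_to;  // set if this shard does not own the token
};

lwt_outcome try_execute_lwt(unsigned this_shard, uint64_t token, unsigned shard_count) {
    unsigned owner = shard_of(token, shard_count);
    if (owner != this_shard) {
        return {owner};  // report the owner instead of hopping there mid-request
    }
    // ... the Paxos rounds would run here, on the owning shard ...
    return {std::nullopt};
}

// Transport-side handling: re-process the message on the indicated shard.
// Returns the shard that finally executed the request.
unsigned process_request(unsigned receiving_shard, uint64_t token, unsigned shard_count) {
    unsigned shard = receiving_shard;
    while (auto target = try_execute_lwt(shard, token, shard_count).bounce_to) {
        shard = *target;
    }
    return shard;
}
```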
The implementation of Expected's BEGINS_WITH operator on blobs was
incorrect, naively comparing the base64-encoded strings, which doesn't
work. This patch fixes the code to compare the decoded strings.
The reason why the BEGINS_WITH test missed this bug was that we forgot
to check the blob case and only tested the string case; so this patch
also adds the missing test - which reproduces this bug, and verifies
its fix.
Fixes #5457
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191211115526.29862-1-nyh@scylladb.com>
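A minimal sketch of why the encoded comparison fails and what the fix compares instead. The blob "abcd" is "YWJjZA==" in base64 and the prefix "ab" is "YWI=", which is not a textual prefix of "YWJjZA==" even though "ab" is a prefix of "abcd". The decoder below is a deliberately small illustration (standard alphabet, stops at padding, no validation of malformed input), not Alternator's own base64 code; blob_begins_with() is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Minimal base64 decoder for illustration only: standard alphabet, stops at
// the first '=' (or any other non-alphabet character), no error handling.
std::string base64_decode(const std::string& in) {
    auto value = [](char c) -> int {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        return -1;
    };
    std::string out;
    int buffer = 0;  // pending bits, at most 13 wide
    int bits = 0;    // number of valid bits in buffer
    for (char c : in) {
        int v = value(c);
        if (v < 0) {
            break;
        }
        buffer = (buffer << 6) | v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += static_cast<char>((buffer >> bits) & 0xFF);
            buffer &= (1 << bits) - 1;  // keep only the not-yet-emitted bits
        }
    }
    return out;
}

// BEGINS_WITH on blobs: compare the decoded bytes, not the base64 text.
bool blob_begins_with(const std::string& value_b64, const std::string& prefix_b64) {
    std::string value = base64_decode(value_b64);
    std::string prefix = base64_decode(prefix_b64);
    return value.compare(0, prefix.size(), prefix) == 0;
}
```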
The LIKE operator requires filtering, so needs_filtering() must check
is_LIKE(). This already happens for partition columns, but it was
overlooked for clustering columns in the initial implementation of
LIKE.
Fixes #5400.
Tests: unit(dev)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
To be used by dtest as an indicator that endpoint's hints
were drained and hints directory is removed.
Refs #5354
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The user_types_metadata can simply be owned by the keyspace. This
simplifies the code since we never have to worry about nulls and the
ownership is now explicit.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
It looks like this was done just to avoid including
user_types_metadata.hh, which seems a bit much considering that it
requires adding a specialization to the seastar namespace.
A followup patch will also stop using lw_shared_ptr for
user_types_metadata.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
We never modify the user_types_metadata via prepare_internal, so we
can pass it a const reference.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This was originally done in 7f64a6ec4b,
but that commit was reverted in
8517eecc28.
The revert was done because the original change would call parse_raw
for non-UDT types. Unlike the old patch, this one doesn't change the
behavior of non-UDT types.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The variant of db::cql_type_parser::parse that has a
user_types_metadata argument was only used from the variant that
didn't. This inlines one in the other.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The user_types variable can be null during db startup since we have to
create types before reading the system table defining user types.
This avoids undefined behavior, but it is unlikely that it was causing
more serious problems, since the variable is only used when creating
user types, and we don't create any until after all system tables are
read, at which point the user_types variable is not null.
Fixes #5193
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
LWT is much more efficient if a request is processed on a shard that owns
a token for the request. This is because otherwise the processing will
bounce to an owning shard multiple times. The patch proposes a way to
move the request to the correct shard before running lwt. It works by returning
an error from the lwt code, if the shard is the wrong one, specifying the shard
the request should be moved to. The error is processed by the transport code,
which jumps to the correct shard and re-processes the incoming message there.
This patch adds comprehensive tests for the ReturnValue parameter of
the write operations (PutItem, UpdateItem, DeleteItem), which can return
pre-write or post-write values of the modified item. The tests are in
a new test file, alternator-test/test_returnvalues.py.
This feature is not yet implemented in Alternator, so all the new
tests xfail on Alternator (and all pass on AWS).
Refs #5053
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191127163735.19499-1-nyh@scylladb.com>
This patch adds tests for Query's "ScanIndexForward" parameter, which
can be used to return items in reversed sort order.
We test that a Limit works and returns the given number of *last* items
in the sort order, and also that such reverse queries can be resumed,
i.e., paging works in the reverse order.
These tests pass against AWS DynamoDB, but fail against Alternator (which
doesn't support ScanIndexForward yet), so they are marked xfail.
Refs #5153.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191127114657.14953-1-nyh@scylladb.com>
The JMX interface is implemented by the scylla-jmx project, not scylla.
Therefore, let's remove this historical reference to MBeans from
storage_proxy.
Message-Id: <20191211121652.22461-1-penberg@scylladb.com>
"
Add --experimental-features -- a vector of features to unlock. Make corresponding changes in the YAML parser.
Fixes #5338
"
* 'vecexper' of https://github.com/dekimir/scylla:
config: Add `experimental_features` option
utils: Add enum_option
* seastar e440e831c8...00da4c8760 (7):
> Merge "reactor: fix iocb pool underflow due to unaccounted aio fsync" from Avi
Fixes #5443.
> install-dependencies.sh: fix arch dependencies
> Merge " rpc: fix use-after-free during rpc teardown vs. rpc server message handling" from Benny
> Merge "testing: improve the observability of abandoned failed futures" from Botond
> rework the fair_queue tester
> directory_test: Update to use run instead of run_deprecated
> log: support fmt 6.0 branch with chrono.h for log
In the calculate_delay() code for view-backlog flow control, we calculate
a delay and cap it at a "budget" - the remaining timeout. This timeout is
measured in milliseconds, but the capping calculation converted it into
microseconds, which overflowed if the timeout is very large. This causes
some tests which enable the UB sanitizer to fail.
We fix this problem by comparing the delay to the budget in millisecond
resolution, not in microsecond resolution. Then, if the calculated delay
is short enough, we return it using its full microsecond resolution.
Fixes #5412
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191205131130.16793-1-nyh@scylladb.com>
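A minimal sketch of the capping logic with std::chrono, assuming a non-negative delay in microseconds and a budget in milliseconds; cap_delay() is a hypothetical name, not the actual calculate_delay() code. If the delay, truncated to milliseconds, is below the budget, it is returned at full microsecond resolution. Otherwise the budget is at most the delay, so converting it to microseconds can no longer overflow, even when the original budget would have been as large as milliseconds::max().

```cpp
#include <cassert>
#include <chrono>

using namespace std::chrono;

// Hypothetical sketch: cap a computed delay at the remaining timeout budget
// without first converting the (possibly huge) budget to microseconds.
microseconds cap_delay(microseconds delay, milliseconds budget) {
    if (duration_cast<milliseconds>(delay) < budget) {
        return delay;  // short enough: keep full microsecond resolution
    }
    // Here budget <= delay, so this implicit ms -> us conversion is safe.
    return budget;
}
```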
Merged pull request https://github.com/scylladb/scylla/pull/5453
from Piotr Sarna:
Checking the EQ relation for alternator attributes is usually performed
simply by comparing underlying JSON objects, but sets (SS, BS, NS types)
need a special routine, as we need to make sure that sets stored in
a different order underneath are still equal, e.g:
[1, 3, 2] == [1, 2, 3]
Fixes #5021
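A minimal sketch of order-insensitive set comparison, assuming set elements are held as strings; sets_equal() is a hypothetical name. Sorting copies of both sides makes the stored order irrelevant, so [1, 3, 2] and [1, 2, 3] compare equal. For number sets (NS) a real implementation would likely also have to compare numerically, since "1" and "1.0" denote the same number but differ as strings.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: two DynamoDB-style sets are equal if they hold the
// same elements, regardless of the order in which they are stored.
bool sets_equal(std::vector<std::string> a, std::vector<std::string> b) {
    if (a.size() != b.size()) {
        return false;
    }
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return a == b;
}
```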
If a set of mutations contains both an entry that deletes a table
and an entry that adds a table with the same name, it's expected
to be a replacement operation (delete old + create new),
rather than a useless "try to create a table even though it exists
already and then immediately delete the original one" operation.
As such, notifications about the deletions should be performed
before notifications about the creations. The place that originally
suffered from this wrong order is view building - which in this case
created an incorrect duplicated entry in the view building bookkeeping,
and then immediately deleted it, resulting in having old, deprecated
entries with stale UUIDs lying in the build queue and never proceeding,
because the underlying table is long gone.
The issue is fixed by ensuring the order of notifications:
- drops are announced first, view drops are announced before table drops;
- creations follow, table creations are announced before views;
- finally, changes to tables and views are announced.
Fixes #4382
Tests: unit(dev), mv_populating_from_existing_data_during_node_stop_test
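The required announcement order can be expressed as a sort key. This Python sketch models each notification as a hypothetical `(action, kind, name)` tuple. It relies on `sorted()` being stable, so tables and views updated in the same batch keep their relative order.

```python
from typing import List, Tuple

Event = Tuple[str, str, str]  # (action, kind, name)

_RANK = {
    ("drop", "view"): 0,     # view drops first...
    ("drop", "table"): 1,    # ...then table drops
    ("create", "table"): 2,  # table creations before...
    ("create", "view"): 3,   # ...view creations
    ("update", "table"): 4,  # changes to tables and views last
    ("update", "view"): 4,
}


def order_notifications(events: List[Event]) -> List[Event]:
    # sorted() is stable, so events with equal rank keep their input order.
    return sorted(events, key=lambda e: _RANK[(e[0], e[1])])
```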
Iterate over an array holding all rpm names to see if any
of them is missing from `dist/ami/files`. If they are missing,
look them up in build/redhat/RPMS/x86_64 so that if reloc/build_rpm.sh
was run manually before dist/ami/build_ami.sh we can just collect
the built rpms from its output dir.
If we're still missing any rpms, then run reloc/build_rpm.sh
and copy the required rpms from build/redhat/RPMS/x86_64.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Reviewed-by: Glauber Costa <glauber@scylladb.com>
In Swagger 1.2, int is defined as int32.
We originally used int to follow the JMX definition, but in practice
we use uint and int64 internally in many places.
While the API formats the type correctly, an external system that uses a
Swagger-based code generator can run into type mismatches.
This patch replaces every use of int in a return type with long, which is defined as int64.
Changing the return type has no impact on the system, but it does help
external systems that use code generators from Swagger.
Fixes #5347
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Provide some explanation on prio strings + direction to gnutls manual.
Document client auth option.
Remove confusing/misleading statement on "custom options"
Message-Id: <20191210123714.12278-1-calle@scylladb.com>
Enable existing NOT_CONTAINS test, add NOT_CONTAINS to the list of
recognized operators, implement check_NOT_CONTAINS, and hook it up to
verify_expected_one().
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
When the user wants to turn on only some experimental features, they
can use this new option. The existing `experimental` option is
preserved for backwards compatibility.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Almost all commands provided by `scylla-gdb.py` are safe to use. The
worst that could happen if they fail is that you won't get the desired
information. There is one notable exception: `scylla thread`. If
anything goes wrong while this command is executed - gdb crashes, a bug
in the command, etc. - there is a good chance the process under
examination will crash. Sometimes this is fine, but other times e.g.
when live debugging a production node, this is unacceptable.
To avoid any accidents add documentation to all commands working with
`seastar::thread`. And since most people don't read documentation,
especially when debugging under pressure, add a safety net to the
`scylla thread` command. When run, this command will now warn of the
dangers and will ask for explicit acknowledgment of the risk of crash,
by means of passing an `--iamsure` flag. When this flag is missing, it
will refuse to run. I am sure this will be very annoying but I am also
sure that the avoided crashes are worth it.
As part of making `scylla thread` safe, its argument parsing code is
migrated to `argparse`. This changes the usage but this should be fine
because it is well documented.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191129092838.390878-1-bdenes@scylladb.com>
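A minimal sketch of the `--iamsure` safety net using Python's `argparse`. It is written as a plain function rather than a `gdb.Command` subclass, so it runs outside gdb. The positional `thread` argument and the warning text are hypothetical, not the real command's usage.

```python
import argparse
from typing import List, Optional


def parse_scylla_thread_args(argv: List[str]) -> Optional[argparse.Namespace]:
    parser = argparse.ArgumentParser(
        prog="scylla thread",
        description="Switch to a seastar::thread. WARNING: may crash the "
                    "process under examination.")
    parser.add_argument("--iamsure", action="store_true",
                        help="acknowledge the risk of crashing the process")
    # Hypothetical positional argument identifying the thread.
    parser.add_argument("thread", help="address of the seastar::thread")
    args = parser.parse_args(argv)
    if not args.iamsure:
        # Refuse to run unless the risk was explicitly acknowledged.
        print("This command may crash the process under examination; "
              "re-run with --iamsure to acknowledge the risk.")
        return None
    return args
```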
The ami description attribute is only allowed to be 255
characters long. When build_ami.sh generates an ami, it
generates an ami description which is a concatenation
of all of the components' version strings. It can
happen that the description string is too long which
eventually causes the ami build to fail. This patch
trims the description string to 255 characters.
It is ok since the individual versions of the components
are also saved in tags attached to the image.
Tests:
1. Reproduced with a long description and
validated that it doesn't fail after the fix.
Fixes #5435
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20191209141143.28893-1-eliransin@scylladb.com>
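A Python sketch of the truncation. The "name-version" joining format is a hypothetical stand-in for whatever build_ami.sh actually concatenates. Slicing leaves short descriptions untouched.

```python
MAX_AMI_DESCRIPTION = 255  # limit on the AMI description attribute


def make_ami_description(versions: dict) -> str:
    # Hypothetical format: "name-version" pairs joined by spaces.
    desc = " ".join(f"{name}-{ver}" for name, ver in versions.items())
    # Trim to the limit; full versions are still kept in the image tags.
    return desc[:MAX_AMI_DESCRIPTION]
```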
A Linux machine typically has multiple clocksources with distinct
performance characteristics. Setting a high-performance clocksource might result in
better performance for ScyllaDB, so this should be considered whenever
starting it up.
This patch makes it possible to enforce an optimized Linux
clocksource during Scylla's setup/start-up processes. It does so by adding
an interactive question about enforcing the clocksource setting to scylla_setup,
which modifies the "CLOCKSOURCE" parameter in the scylla_server configuration
file. This parameter is read by perftune.py, which, if it is set to "yes", proceeds
to set the clocksource (non-persistently). On x86, the TSC clocksource is
used.
Fixes #4474
This allows us to create/alter/drop log and desc tables "atomically"
with the base, by including these mutations in the original mutation
set, i.e. batch create/alter tables.
Note that population does not happen until types are actually
already put into database (duh), thus there _is_ still a gap
between creating cdc and it being truly usable. This may or may
not need handling later.
A general build system knows about 3 machines:
* build: where the building is running
* host: where the built software will run
* target: the machine the software will produce code for
The target machine is only relevant for compilers, so we can ignore
it.
Until now we could ignore the build and host distinction too. This
patch adds the first difference: don't use host ld_flags when linking
build tools (gen_crc_combine_table).
The reason for this change is to make it possible to build with
-Wl,--dynamic-linker pointing to a path that will exist on the host
machine, but may not exist on the build machine.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191207030408.987508-1-espindola@scylladb.com>
Empty base type makes for less boilerplate in implementations.
The "before" callbacks are for listeners who need to potentially
react/augment type creation/alteration _before_ actually
committing type to schema tables (and holding the semaphore for this).
I.e. it is for cdc to add/modify log/desc tables "atomically" with base.
The vm.swappiness sysctl controls the kernel's preference for swapping
anonymous memory vs. reclaiming page cache. Since Scylla uses very large amounts
of anonymous memory, and tiny amounts of page cache, the correct setting
is to prefer reclaiming page cache. If the kernel swaps anonymous memory,
the reactor will stall until the page fault is satisfied. On the other
hand, page cache pages typically belong to other applications, most often
backup processes that read Scylla files.
This setting has been used in production in Scylla Cloud for a while
with good results.
Users can opt out by not installing the scylla-kernel-conf package
(same as with the other kernel tunables).
* seastar 166061da3...e440e831c (8):
> Fail tests on ubsan errors
> future: make a couple of asserts more strict
> future: Move make_ready out of line
> config: Do not allow zero rates
Fixes #5360
> future: add new state to avoid temporaries in get_available_state().
> future: avoid temporary future_state on get_available_state().
> future: inline future::abandoned
> noncopyable_function: Avoid uninitialized warning on empty types
Previously, scylla used min/max(blob)->blob overload for collections,
tuples and UDTs; effectively making the results being printed as blobs.
This PR adds "dynamically"-typed min()/max() functions for compound types.
These types can be complicated, like map<int,set<tuple<..., and created
at runtime, so functions for them are created on demand,
similarly to tojson(). The comparison remains unchanged: underneath,
this is still byte-by-byte weak lexicographic ordering.
Fixes #5139
* jul-stas/5139-minmax-bad-printing-collections:
cql_query_tests: Added tests for min/max/count on collections
cql3: min()/max() for collections/tuples/UDTs do not cast to blobs
This tests the new min/max functions for collections and tuples. CFs
in the test suite were named according to the types being tested, e.g.
`cf_map<int,text>', which is not a valid CF name. Therefore, these
names required "escaping" of invalid characters, here: simply
replacing with '_'.
Before:
cqlsh> insert into ks.list_types (id, val) values (1, [3,4,5]);
cqlsh> select max(val) from ks.list_types;
system.max(val)
------------------------------------------------------------
0x00000003000000040000000300000004000000040000000400000005
After:
cqlsh> select max(val) from ks.list_types;
system.max(val)
--------------------
[3, 4, 5]
This is accomplished similarly to `tojson()`/`fromjson()`: functions
are generated on demand from within `cql3::functions::get()`.
Because collections can have a variety of types, including UDTs
and tuples, it would be impossible to statically define max(T t)->T
for every T. Until now, max(blob)->blob overload was used.
Because `impl_max/min_function_for` is templated with the
input/output type, which may only be known at runtime, we need type-erased
("dynamic") versions of these functors. They work identically, i.e.
they compare byte representations of lhs and rhs with
`bytes::operator<`.
Resolves #5139
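A Python sketch of "compare the byte representations, then display the typed result." It assumes the list<int> layout visible in the "Before" output: a 4-byte element count, then a 4-byte length and a 4-byte value per element. The helper names are hypothetical.

```python
import struct
from typing import List


def serialize_list_int(values: List[int]) -> bytes:
    # Element count, then each element as a 4-byte length plus a 4-byte int.
    out = struct.pack(">i", len(values))
    for v in values:
        out += struct.pack(">ii", 4, v)
    return out


def deserialize_list_int(blob: bytes) -> List[int]:
    (count,) = struct.unpack_from(">i", blob, 0)
    values, off = [], 4
    for _ in range(count):
        length, value = struct.unpack_from(">ii", blob, off)
        assert length == 4
        values.append(value)
        off += 8
    return values


def dynamic_max(values: List[List[int]]) -> List[int]:
    # Compare serialized forms byte by byte (Python bytes order is
    # lexicographic), then deserialize the winner for display.
    return deserialize_list_int(max(serialize_list_int(v) for v in values))
```

Note that `[10]` loses to `[3, 4, 5]`: its serialized element count (1) sorts before 3. This is the byte-wise ordering that the text says remains unchanged.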
If you merge a pull request that contains multiple patches via
the github interface, it will document itself as the committer.
Work around this brain damage by using the command line.
Introduce a new verb dedicated for receiving and sending hints: HINT_MUTATION. It is handled on the streaming connection, which is separate from the one used for handling mutations sent by coordinator during a write.
The intent of using a separate connection is to increase fairness while handling hints and user requests - this way, a situation can be avoided in which one type of request saturates the connection, negatively impacting the other one.
Information about new RPC support is propagated through new gossip feature HINTED_HANDOFF_SEPARATE_CONNECTION.
Fixes #4974.
Tests: unit(release)
Make gms::inet_address::to_string() produce output similar to origin's.
The sole purpose is a quick and easy fix of API/JMX IPv6
formatting of endpoints etc., where the strings are compared lexically
instead of being treated as textual representations of addresses.
A better, but more laborious, solution is to fix the scylla-jmx
bridge to explicitly parse and re-format addresses, but there
are many such call points.
An even better solution would be to fix nodetool so that it does not
make the mistake of doing lexical comparisons, but then we risk breaking
merge compatibility. That could still be an option for a separate
nodeprobe implementation.
Message-Id: <20191204135319.1142-1-calle@scylladb.com>
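The lexical-comparison problem and the "parse + re-format" alternative can be sketched with Python's `ipaddress` module. This assumes origin prints IPv6 addresses in the uncompressed form that Java's `Inet6Address.getHostAddress()` produces, with eight groups, no "::", and no leading zeros. It is not Scylla's C++ change.

```python
import ipaddress


def origin_style_address(addr: str) -> str:
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return str(ip)
    # Full eight groups, no "::" compression, no leading zeros per group.
    return ":".join(format(int(group, 16), "x") for group in ip.exploded.split(":"))
```

Two spellings of the same address, such as "::1" and "0:0:0:0:0:0:0:1", differ as strings. Only after normalizing both sides does a string comparison become meaningful.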
Currently the query_options object is passed to a trace-stopping function,
which makes it mandatory to keep it alive until the end of the
query. The reason for that is to add prepared statement parameters to
the trace. All other query options that we want to put in the trace are
copied into trace_state::params_values, so let's copy prepared statement
parameters there too. The trace-enabled case will become a little more
expensive, but on the other hand we can drop a continuation that keeps
the query_options object alive from the fast path. It is safe to drop the call
to stop_foreground_prepared() here since the tracing will be stopped
in process_request_one().
Message-Id: <20191205102026.GJ9084@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/5381 by
Peng Jian, fixing multiple small issues with Redis:
* Rename the options related to Redis API, and describe them clearly.
* Rename redis_transport_port to redis_port
* Rename redis_transport_port_ssl to redis_ssl_port
* Rename redis_default_database_count to redis_database_count
* Remove unnecessary option enable_redis_protocol
* Modify the default value of options redis_read_consistency_level and redis_write_consistency_level to LOCAL_QUORUM
* Fix the DEL command: support deleting multiple keys in one command.
* Fix the GET command: return the empty string when the requested key does not exist.
* Fix the redis-test/test_del_non_existent_key: mark xfail.
On aarch64, asan detected a use-after-move. It doesn't happen on x86_64,
likely due to different argument evaluation order.
Fix by evaluating full_slice before moving the schema.
Note: I used "auto&&" and "std::move()" even though full_slice()
returns a reference. I think this is safer in case full_slice()
changes, and works just as well with a reference.
Fixes #5419.
This commit makes sure that single-partition readers for
read-before-write do not have fast-forwarding enabled,
as it may lead to huge read amplification. The observed case was:
1. Creating an index.
CREATE INDEX index1 ON myks2.standard1 ("C1");
2. Running cassandra-stress in order to generate view updates.
cassandra-stress write no-warmup n=1000000 cl=ONE -schema \
'replication(factor=2) compaction(strategy=LeveledCompactionStrategy)' \
keyspace=myks2 -pop seq=4000000..8000000 -rate threads=100 -errors \
skip-read-validation -node 127.0.0.1;
Without disabling fast-forwarding, single-partition readers
were turned into scanning readers in cache, which resulted
in reading 36GB (sic!) on a workload which generates less
than 1GB of view updates. After applying the fix, the number
dropped down to less than 1GB, as expected.
Refs #5409, Fixes #4615, Fixes #5418
This test's execution time dominates the total test execution time
in dev/release mode by a wide margin: reducing it
improves the test.py turnaround by over 70%.
Message-Id: <20191204135315.86374-2-kostja@scylladb.com>
Currently, 'scylla thread' uses arch_prctl() to extract the value of
fsbase, used to reference thread local variables. gdb 8 added support
for directly accessing the value as $fs_base, so use that instead. This
works from core dumps as well as live processes, as you don't need to
execute inferior functions.
The patch is required for debugging threads in core dumps, but not
sufficient, as we still need to set $rip and $rsp, and gdb still[1]
doesn't allow this.
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=9370
The feature introduced by this commit declares that hints can be sent
using the new dedicated RPC verb. Before using the new verb, nodes need
to know if other nodes in the cluster will be able to handle the new
RPC verb.
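A Python sketch of feature-gated verb selection. It assumes the rule is "use the new verb only once every known node advertises the gossip feature" and that the fallback is the regular mutation verb. Both the rule and the `"MUTATION"` fallback name are assumptions, not taken from the patch.

```python
from typing import Iterable, Set

FEATURE = "HINTED_HANDOFF_SEPARATE_CONNECTION"


def choose_hint_verb(peer_features: Iterable[Set[str]]) -> str:
    # Only switch to the dedicated verb once every known node advertises the
    # feature via gossip; otherwise keep using the regular mutation verb.
    if all(FEATURE in features for features in peer_features):
        return "HINT_MUTATION"
    return "MUTATION"
```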
Introduce a new verb dedicated for receiving and sending hints:
HINT_MUTATION. It is handled on the streaming connection, which is
separate from the one used for handling mutations sent by coordinator
during a write.
The intent of using a separate connection is to increase fairness while
handling hints and user requests - this way, a situation can be avoided
in which one type of request saturates the connection, negatively
impacting the other one.
"
In several cases in distributed testing (dtest) we trigger compaction using nodetool compact assuming that when it is done, it is indeed really done.
However, the way compaction is currently implemented in scylla, it may leave behind some background tasks to delete the old sstables that were compacted.
This commit changes major compaction (triggered via the ss::force_keyspace_compaction api) so it would wait on the background deletes and will return only when they finish.
Fixes #4909
Tests: unit(dev), nodetool_refresh_with_data_perms_test, test_nodetool_snapshot_during_major_compaction
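A Python asyncio sketch of "major compaction returns only after its background deletes finish." `CompactionManager` and its methods are a hypothetical model. The real code is C++/Seastar, and writing the compaction output is elided here.

```python
import asyncio
from typing import List, Set


class CompactionManager:
    """Hypothetical model: compaction deletes old sstables in the background."""

    def __init__(self) -> None:
        self._pending: Set[asyncio.Task] = set()
        self.deleted: List[str] = []

    async def _delete_sstable(self, name: str) -> None:
        await asyncio.sleep(0)  # stand-in for unlinking the files
        self.deleted.append(name)

    def _compact(self, sstables: List[str]) -> None:
        # Must be called from a running event loop; schedules background deletes.
        for name in sstables:
            task = asyncio.create_task(self._delete_sstable(name))
            self._pending.add(task)
            task.add_done_callback(self._pending.discard)

    async def major_compaction(self, sstables: List[str]) -> None:
        self._compact(sstables)
        # Return only once the background deletes have finished.
        await asyncio.gather(*list(self._pending))
```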
"
We may be able to use the chrony setup script on future versions of RHEL/CentOS,
so it is better to run chrony setup when the RHEL version is >= 8, not only 8.
Note that Fedora still provides the ntp/ntpdate packages, so we run
ntp setup on it for now (same on Debian variants).
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191203192812.5861-1-syuu@scylladb.com>
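The version check reduces to a simple predicate. In this Python sketch, the distro identifiers (`"rhel"`, `"centos"`, `"fedora"`) are assumed to be os-release-style IDs and the function name is hypothetical.

```python
def time_sync_setup(distro: str, major_version: int) -> str:
    # RHEL/CentOS 8 and anything newer: chrony.
    if distro in ("rhel", "centos") and major_version >= 8:
        return "chrony"
    # Fedora and Debian variants still ship ntp/ntpdate.
    return "ntp"
```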
`segment_manager' now uses a decorated version of `timed_out_error'
with hardcoded name. On the other hand `region_group' uses named
`on_request_expiry' within its `expiring_fifo'.
Fixes #5211
In 79935df959 replay apply-call was
changed from one with no continuation to one with. But the frozen
mutation arg was still just lambda local.
Change to use do_with for this case as well.
Message-Id: <20191203162606.1664-1-calle@scylladb.com>
Exception messages contain semaphore's name (provided in ctor).
This affects the queue overflow exception as well as timeout
exception. Also, custom throwing function in ctor was changed
to `prethrow_action', i.e. metrics can still be updated there but
now callers have no control over the type of the exception being
thrown. This affected `restricted_reader_max_queue_length' test.
`reader_concurrency_semaphore'-s docs are updated accordingly.
In a build configured with --debuginfo 0 the scylla binary still ends
up with some debug info from the libraries that are statically linked
in.
We should avoid compiling subprojects (including seastar) with debug
info when none is needed, but this at least avoids it showing up in
the binary.
The main motivation for this is that it is confusing to get a binary
with *some* debug info in it.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191127215843.44992-1-espindola@scylladb.com>
This series refactors the collection de/serialization code to use
fragmented buffers, avoiding the large allocations and the associated
pains when working with large collections. Currently all operations that
involve collections require deserializing them, executing the operation,
then serializing them again to their internal storage format. The
de/serialization operations happen in linearized buffers, which means
that we have to allocate a buffer large enough to hold the *entire*
collection. This can cause immense pressure on the memory allocator,
which, in the face of memory fragmentation, might be unable to serve the
allocation at all. We've seen this causing all sorts of nasty problems,
including but not limited to: failing compactions, failing memtable
flush, OOM crashes, etc.
Users are strongly discouraged from using large collections, yet they
are still a fact of life and have been haunting us since forever.
The proper solution for these problems would be to come up with an
in-memory format for collections, however that is a major effort, with a
lot of unknowns. This is something we plan on doing at some point but
until it happens we should make life less painful for those with large
collections.
The goal of this series is to avoid the need of allocating these large
buffers. Serialization now happens into a `bytes_ostream` which
automatically fragments the values internally. Deserialization happens
with `utils::linearizing_input_stream` (introduced by this series), which
linearizes only the individual collection cells, but not the entire
collection.
An important goal of this series was to introduce the least amount of
risk, and hence the least amount of code. This series does not try to
make a revolution and completely revamp and optimize the
de/serialization codepaths. These codepaths have their days numbered so
investing a lot of effort into them is in vain. We can apply incremental
optimizations where we deem it necessary.
Fixes: #5341
Support deleting multiple keys in one DEL command.
Returning the number of keys actually deleted is not supported yet.
Return an empty string to the client for the GET command when the requested key does not exist.
Fixes: #5334
Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
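A dict-backed Python sketch of the new DEL and GET semantics. `KeyValueStore` is a hypothetical stand-in for the Redis keyspace. Like the patch, `delete` does not report how many keys were removed.

```python
from typing import Dict


class KeyValueStore:
    """Hypothetical dict-backed stand-in for the Redis keyspace."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> str:
        # Missing keys yield an empty string, as the fix describes.
        return self._data.get(key, "")

    def delete(self, *keys: str) -> None:
        # One DEL may name several keys; missing keys are ignored.
        for key in keys:
            self._data.pop(key, None)
```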
Rename option redis_transport_port to redis_port, which the redis transport listens on for clients.
Rename option redis_transport_port_ssl to redis_ssl_port, which the redis TLS transport listens on for clients.
Rename option redis_default_database_count to redis_database_count, which sets the Redis database count.
Rename option redis_keyspace_opitons to redis_keyspace_replication_strategy_options. Set the replication strategy for redis keyspace.
Remove option enable_redis_protocol, which is unnecessary.
Fixes: #5335
Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
Commit 96009881d8 added diffutils to the dependencies via
Seastar's install-dependencies.sh, after it was inadvertently
dropped in 1164ff5329 (update to Fedora 31; diffutils is no
longer brought in as a side effect of something else).
Regenerate the image to include diffutils.
Ref #5401.
This patch adds subtests for EOF handling; they read and write the socket
directly using protocol commands.
More tests can be added in the future; tests that go through a Redis client
module would hide some protocol errors.
Signed-off-by: Amos Kong <amos@scylladb.com>
podman needs to relabel directories in exactly the same cases docker
does. The difference is that podman cannot relabel /tmp.
The reason it was working before is that in practice anyone using
dbuild has already relabeled any directories that need relabeling,
with the exception of /tmp, since it is recreated on every boot.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191201235614.10511-2-espindola@scylladb.com>
Use `utils::linearizing_input_stream` for the deserialization of the
collection. Allows for avoiding the linearization of the entire cell
value, instead only linearizing individual values as they are
deserialized from the buffer.
`linearizing_input_stream` allows transparently reading linearized
values from a fragmented buffer. This is done by linearizing on-the-fly
only those read values that happen to be split across multiple
fragments. This reduces the size of the largest allocation from the size
of the entire buffer (when the entire buffer is linearized) to the size
of the largest read value. This is a huge gain when the buffer contains
loads of small objects, and modest gains when the buffer contains few
large objects. But even in the worst case the size of the largest
allocation will be less or equal compared to the case where the entire
buffer is linearized.
This stream is planned to be used as glue code between the fragmented
cell value and the collection deserialization code which expects to be
reading linearized values.
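A Python sketch of the linearizing idea, assuming 4-byte big-endian length prefixes. Values contained in one fragment are returned as zero-copy views. Only values straddling a fragment boundary are copied, and `copies` counts those. This is not Scylla's C++ `utils::linearizing_input_stream`, and the class and method names are hypothetical.

```python
import struct
from typing import Iterable


class LinearizingInputStream:
    """Read values from a fragmented buffer, copying only straddling values."""

    def __init__(self, fragments: Iterable[bytes]) -> None:
        self._frags = [memoryview(f) for f in fragments if len(f) > 0]
        self._idx = 0    # index of the current fragment
        self._off = 0    # offset inside the current fragment
        self.copies = 0  # number of values that had to be linearized

    def _advance(self, n: int) -> None:
        self._off += n
        if self._off == len(self._frags[self._idx]):
            self._idx += 1  # only moves the index; nothing is dereferenced
            self._off = 0

    def read(self, n: int) -> memoryview:
        if n == 0:
            return memoryview(b"")
        if self._idx >= len(self._frags):
            raise EOFError("read past the end of the buffer")
        frag = self._frags[self._idx]
        if self._off + n <= len(frag):
            # Fast path: the value lies in one fragment, return a view (no copy).
            view = frag[self._off:self._off + n]
            self._advance(n)
            return view
        # Slow path: the value spans fragments, so linearize just this value.
        out = bytearray()
        while len(out) < n:
            if self._idx >= len(self._frags):
                raise EOFError("read past the end of the buffer")
            frag = self._frags[self._idx]
            take = min(n - len(out), len(frag) - self._off)
            out += frag[self._off:self._off + take]
            self._advance(take)
        self.copies += 1
        return memoryview(bytes(out))

    def read_int32(self) -> int:
        return struct.unpack(">i", self.read(4))[0]
```

The largest copy is bounded by the largest single value read, not by the size of the whole buffer.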
Currently the loop which writes the data from the fragmented origin to
the destination, moves to the next chunk eagerly after writing the value
of the current chunk, if the current chunk is exhausted.
This presents a problem when we are writing the last piece of data from
the last chunk, as the chunk will be exhausted and we eagerly attempt to
move to the next chunk, which doesn't exist and dereferencing it will
fail. The solution is to not be eager about moving to the next chunk and
only attempt it if we actually have more data to write and hence expect
more chunks.
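A Python sketch of the lazy chunk advance. It moves to the next chunk only while more data is still needed, so finishing exactly at the end of the last chunk never touches a chunk that does not exist. The function name is hypothetical.

```python
from typing import List


def copy_from_chunks(chunks: List[bytes], size: int) -> bytes:
    out = bytearray()
    idx, off = 0, 0
    while len(out) < size:
        # Advance lazily: only when the current chunk is exhausted AND more
        # data is still required (this also skips empty chunks).
        while off == len(chunks[idx]):
            idx += 1
            off = 0
        take = min(size - len(out), len(chunks[idx]) - off)
        out += chunks[idx][off:off + take]
        off += take
    return bytes(out)
```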
The presence of `const_iterator` seems to be a requirement as well
although it is not part of the concept. But perhaps it is just an
assumption made by code using it.
Not just bytes::output_iterator. Allow writing into streams other than
just `bytes`. In fact we should be very careful with writing into
`bytes` as they require potentially large contiguous allocations.
The `write()` method is now templatized also on the type of its first
argument, which now accepts any CharOutputIterator. Due to our poor
usage of namespaces this now collides with `write` defined inside
`db/commitlog/commitlog.cc`. Luckily, the latter doesn't really have to
be templatized on the data type it reads from, and de-templatizing it
resolves the clash.
Currently the interactive RAID setup prompt does not list virtio-blk devices, for the
following reasons:
- We fail to match the '-p' option in the 'lsblk --help' output because of a misused
regex function, so list_block_devices() always skips using the lsblk output.
- We don't check for the existence of /dev/vd* when we skip using lsblk.
- We mistakenly excluded virtio-blk devices from the 'lsblk -pnr' output using the '-e'
option, but we actually need them.
To fix the problem we need to use re.search() instead of re.match() to match the
'-p' option in 'lsblk --help', add '/dev/vd*' to the block device list,
and stop passing the '-e 252' option to lsblk, which excludes virtio-blk.
Additionally, it is better to parse the 'TYPE' field of the lsblk output; we should skip
'loop' devices and 'rom' devices since these are not disk devices.
Fixes #4066
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191201160143.219456-1-syuu@scylladb.com>
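A Python sketch of both fixes. `re.match()` only matches at the start of the string, while `re.search()` scans the whole help text. Device filtering uses the TYPE column of `lsblk -pnr -o NAME,TYPE` output. `SAMPLE_HELP` is an abbreviated, illustrative help text, and the helper names are hypothetical.

```python
import re
from typing import List

SAMPLE_HELP = """
Usage:
 lsblk [options] [<device> ...]

Options:
 -p, --paths          print complete device path
"""


def lsblk_supports_paths(help_text: str) -> bool:
    # re.match() would only look at the very start of the text and miss the
    # option listed further down; re.search() scans the whole text.
    return re.search(r"-p, --paths", help_text) is not None


def disk_devices(lsblk_output: str) -> List[str]:
    """Parse `lsblk -pnr -o NAME,TYPE` output, skipping loop and rom devices."""
    devices = []
    for line in lsblk_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        name, dev_type = fields[0], fields[1]
        if dev_type in ("loop", "rom"):
            continue
        devices.append(name)
    return devices
```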
Merged pull request https://github.com/scylladb/scylla/pull/5392 from
Dejan Mircevski.
Refs #5034
The patches:
alternator: Implement LE operator in Expected
alternator: Implement GE operator in Expected
alternator: Make cmp diagnostic a value, not funct
utils: Add operator<< for big_decimal
alternator: Implement BETWEEN operator in Expected
All check_compare diagnostics are static strings, so there's no need
to call functions to get them. Instead of a function, make diagnostic
a simple value.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
"Fix two problems in scylla_io_setup:
- Problem 1: paths of default directories are invalid, introduced by
commit 5ec1915 ("scylla_io_setup: assume default directories under
/var/lib/scylla").
- Problem 2: wrong path join, introduced by commit 31ddb21
("dist/common/scripts: support nonroot mode on setup scripts").
Fix a problem in scylla_io_setup, scylla_fstrim and scylla_blocktune.py:
- Fixed default scylla directories when they aren't assigned in
scylla.yaml"
Fixes #5370
Reviewed-by: Pavel Emelyanov <xemul@scylladb.com>
* 'scylla_io_setup' of git://github.com/amoskong/scylla:
use parse_scylla_dirs_with_default to get scylla directories
scylla_io_setup: fix data_file_directories check
scylla_util: introduce helper to process the default scylla directories
scylla_util: get workdir by datadir() if it's not assigned in scylla.yaml
scylla_io_setup: fix path join of default scylla directories
Use asyncio as a more modern way to work with concurrency.
Process signals in an event loop, and terminate all outstanding
tests before exiting.
Breaking change: this commit requires Python 3.7 or
newer to run this script. The patch adds a version
check and a message to enforce it.
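A minimal asyncio sketch of the pattern: a version check, signal handlers installed on the event loop, and cancellation of every outstanding test on SIGINT/SIGTERM. Tests are modeled as coroutines here, whereas the real script runs test binaries. `run_tests` is a hypothetical name. `loop.add_signal_handler()` is only available on Unix event loops in the main thread, so the sketch tolerates its absence.

```python
import asyncio
import signal
import sys

if sys.version_info < (3, 7):
    sys.exit("This script requires Python 3.7 or newer")


async def run_tests(coros):
    loop = asyncio.get_running_loop()
    tasks = [asyncio.create_task(c) for c in coros]

    def cancel_all() -> None:
        # Terminate every outstanding test before exiting.
        for task in tasks:
            task.cancel()

    installed = []
    for sig in (signal.SIGINT, signal.SIGTERM):
        try:
            loop.add_signal_handler(sig, cancel_all)
            installed.append(sig)
        except (NotImplementedError, RuntimeError):
            pass  # e.g. not a Unix event loop, or not on the main thread
    try:
        return await asyncio.gather(*tasks, return_exceptions=True)
    finally:
        for sig in installed:
            loop.remove_signal_handler(sig)
```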
Similar to trace_state keep shared_ptr<tracing> _local_tracing_ptr
in one_session_records when constructed so it can be used
during shutdown.
Fixes #5243
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
_user cannot outlive client_state class instance, so there is no point
in holding it in shared_ptr.
Tested: debug test.py and dtest auth_test.py
Message-Id: <20191128131217.26294-5-gleb@scylladb.com>
Only do_stop_rpc_server uses the shared_ptr to prolong server's
lifetime until stop() completes, but do_with() can be used to achieve the
same.
Message-Id: <20191128131217.26294-3-gleb@scylladb.com>
Only do_stop_native_transport() uses the shared_ptr to prolong server's
lifetime until stop() completes, but do_with() can be used to achieve the
same.
Message-Id: <20191128131217.26294-2-gleb@scylladb.com>
* seastar 6f0ef32514...5c25de907a (7):
> shared_future: Fix crash when all returned futures time out
Fixes #5322.
> future: don't create temporaries on get_value().
> reactor: lower the default stall threshold to 200ms
> reactor: Simplify network initialization
> reactor: Replace most std::function with noncopyable_function
> futures: Avoid extra moves in SEASTAR_TYPE_ERASE_MORE mode
> inet_address: Make inet_address == operator ignore scope (again)
Merged pull request https://github.com/scylladb/scylla/pull/5311 from
Juliusz Stasiewicz:
This is a partial solution to #5139 (only for two types) because of the
above and because collections are much harder to do. They are coming in
a separate PR.
References #5139. Aggregate functions, like max(), when invoked
on `inet_address' and `time_native_type' used to choose
max(blob)->blob overload, with casting of argument and result to
bytes. This is because appropriate calls to
`aggregate_fcts::make_XXX_function()' were missing. This commit
adds them. Functioning remains the same but now clients see
user-friendly representations of aggregate result, not binary.
Comparing inet addresses without inet::operator< is performed by a
trick, where ADL is bypassed by wrapping the name of std::min/max
and providing an overload of wrapper on inet type.
Currently we support assigning workdir in scylla.yaml, yet the setup scripts
hardcode '/var/lib/scylla' in many places.
Some setup scripts get scylla directories by parsing scylla.yaml, so this introduces
parse_scylla_dirs_with_default(), which adds default values if scylla directories
aren't assigned in scylla.yaml.
Signed-off-by: Amos Kong <amos@scylladb.com>
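A Python sketch of what a defaults-filling helper might do with an already-parsed scylla.yaml dict. It is named `scylla_dirs_with_default` to make clear it is not the real `parse_scylla_dirs_with_default()`. The defaults `<workdir>/data` and `<workdir>/commitlog`, with workdir defaulting to `/var/lib/scylla`, are assumptions. `posixpath` is used so the joined paths are POSIX-style on any host.

```python
import posixpath
from typing import Any, Dict


def scylla_dirs_with_default(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in missing directory settings from a parsed scylla.yaml dict."""
    out = dict(cfg)
    workdir = out.get("workdir") or "/var/lib/scylla"
    out["workdir"] = workdir
    if not out.get("data_file_directories"):
        out["data_file_directories"] = [posixpath.join(workdir, "data")]
    if not out.get("commitlog_directory"):
        out["commitlog_directory"] = posixpath.join(workdir, "commitlog")
    return out
```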
Currently we are checking invalid paths for some default scylla directories;
the directories don't exist, so the tuning is always skipped. This is caused by
two problems.
Problem 1: paths of default directories are invalid
Since commit 5ec191536e, we try to tune some scylla default directories
if they exist. But the directory paths we try are wrong.
For example:
- What we check: /var/lib/scylla/commitlog_directory
- Correct one: /var/lib/scylla/commitlog
Problem 2: wrong path join
Since commit 31ddb2145a, default_path might be changed from
'/var/lib/scylla/' to '/var/lib/scylla'.
Our code then checks an invalid, wrongly joined path, e.g.:
'/var/lib/scyllacommitlog'
Signed-off-by: Amos Kong <amos@scylladb.com>
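A Python illustration of problem 2, assuming the paths were built by plain string concatenation. A path-aware join inserts exactly one separator whether or not the base has a trailing slash.

```python
import posixpath

default_path = "/var/lib/scylla"  # trailing slash dropped, as in problem 2

# Plain string concatenation silently produces a wrong path...
broken = default_path + "commitlog"
# ...whereas join() inserts exactly one separator either way.
fixed = posixpath.join(default_path, "commitlog")
```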
The default values of data_file_directories and commitlog_directory were
commented out by commit e0f40ed16a, which causes scylla_util.py:get_scylla_dirs() to
fail in checking the values.
This patch changed get_scylla_dirs() to return default data/commitlog
directories if they aren't set.
Fixes #5358
Reviewed-by: Pavel Emelyanov <xemul@scylladb.com>
Signed-off-by: Amos Kong <amos@scylladb.com>
Add a test, test_query.py::test_query_limit, to verify that the Limit
parameter correctly limits the number of rows returned by the Query.
This was supposed to already work correctly - but we never had a test for
it. As we hoped, the test passes (on both Alternator and DynamoDB).
Another test, test_query.py::test_query_limit_paging, verifies that
paging can be done with any setting of Limit. We already had tests
for paging of the Scan operation, but not for the Query operation.
Refs #5153
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
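A Python sketch of the paging pattern the Limit tests exercise: call Query with `Limit`, then resume from `LastEvaluatedKey` via `ExclusiveStartKey` until no key is returned. The parameter names follow DynamoDB's Query API. `fake_query` is a hypothetical in-memory stand-in that uses list positions as keys.

```python
from typing import Any, Dict, List, Optional


def fake_query(items: List[Any], Limit: int,
               ExclusiveStartKey: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical in-memory Query: the 'key' is an item's list position."""
    start = 0 if ExclusiveStartKey is None else ExclusiveStartKey + 1
    page = items[start:start + Limit]
    resp: Dict[str, Any] = {"Items": page}
    if len(page) == Limit:
        # Report where we stopped whenever Limit was reached; the following
        # page may then turn out to be empty.
        resp["LastEvaluatedKey"] = start + Limit - 1
    return resp


def query_all_pages(items: List[Any], limit: int) -> List[List[Any]]:
    pages: List[List[Any]] = []
    kwargs: Dict[str, Any] = {"Limit": limit}
    while True:
        resp = fake_query(items, **kwargs)
        pages.append(resp["Items"])
        if "LastEvaluatedKey" not in resp:
            return pages
        kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]
```

As in test_query_limit_paging, the check is that every item comes back exactly once, in order, for any Limit.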
This is a comprehensive test for the "Select" parameter of Query and Scan
operations, but only for the base-table case, not index, so another future
patch should add similar tests in test_gsi.py and test_lsi.py as well.
The main use of the Select parameter is to allow returning just the count
of items, instead of their content, but it also has other esoteric options,
all of which we test here.
The test currently succeeds on AWS DynamoDB, demonstrating that the test
is correct, but fails on Alternator because the "Select" parameter is not
yet supported. So the test is marked xfail.
Refs #5058
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Currently the command tries to read all seastar smp queues in its
initialization code in the constructor. This constructor is run each
time `scylla-gdb.py` is sourced in `gdb` which leads to slowdowns and
sometimes also annoying errors because the sourcing happens in the wrong
context and seastar symbols are not available.
Avoid this by running this initializing code lazily, on the first
invocation.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191127095408.112101-1-bdenes@scylladb.com>
This patchset adds missing "const" function qualifiers throughout
the Scylla code base, which would make code less error-prone.
The changeset incorporates Kostja's work regarding const qualifiers
in the cql code hierarchy along with a follow-up patch addressing the
review comment of the corresponding patch set (the patch subject is
"cql: propagate const property through prepared statement tree.").
The boost 1.67 release notes say:
Changed maximum supported year from 10000 to 9999 to resolve various issues
So change the test to use a larger number so that we get an exception
with both boost 1.66 and boost 1.67.
Fixes #5344
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191126180327.93545-1-espindola@scylladb.com>
Docker on Fedora 31 is flaky, and is not supported at all on RHEL 8.
Podman is a drop-in replacement for docker; this series adds support
for using podman in dbuild.
Apart from actually working on Fedora 31 hosts,
podman is nicer in being more secure and not requiring a daemon.
Fixes #5332
As suggested in issue #4586 here is the helper that prints
"shutting down foo" message, then shuts the foo down, then
prints the "[it] was successful" one. In between it catches
the exception (if any) and logs a warning.
By "then" I mean literally then, not seastar's then() :)
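The sequence can be sketched synchronously. The real helper is C++ on seastar futures; stop_with_logging() is a hypothetical Python illustration of the log, stop, log order and of swallowing the failure with a warning.

```python
import logging

log = logging.getLogger("init")


def stop_with_logging(what, stop):
    """Log around one shutdown step; a failure is logged, not propagated.

    `what` names the component and `stop` is a callable that shuts it down.
    Returns True on success and False if `stop` raised.
    """
    log.info("Shutting down %s", what)
    try:
        stop()
    except Exception as exc:  # warn and let the remaining shutdown continue
        log.warning("Shutting down %s failed: %s", what, exc)
        return False
    log.info("Shutting down %s was successful", what)
    return True
```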
Fixes: #4586
By default, semaphore exceptions bring along very little context:
either that a semaphore was broken or that it timed out.
In order to make debugging easier without introducing significant
runtime costs, a notion of named semaphore is added.
A named semaphore is simply a semaphore with statically defined
name, which is present in its errors, bringing valuable context.
A semaphore defined as:
auto sem = semaphore(0);
will present the following message when it breaks:
"Semaphore broken"
However, a named semaphore:
auto named_sem = named_semaphore(0, named_semaphore_exception_factory{"io_concurrency_sem"});
will present a message with at least some debugging context:
"Semaphore broken: io_concurrency_sem"
It's not much, but it would really help in pinpointing bugs
without having to inspect core dumps.
At the same time, it does not incur any costs for normal
semaphore operations (except for its creation), but instead
only uses more CPU in case an error is actually thrown,
which is considered rare and not to be on the hot path.
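The idea translates to a small Python sketch. NamedSemaphore and BrokenSemaphore are hypothetical, built on threading.Condition rather than seastar. The name is stored once and only formatted into a message when an error is raised.

```python
import threading


class BrokenSemaphore(Exception):
    pass


class NamedSemaphore:
    """Counting semaphore whose errors carry an optional static name."""

    def __init__(self, count, name=None):
        self._count = count
        self._name = name
        self._broken = False
        self._cond = threading.Condition()

    def _broken_error(self):
        # Only built on the error path, so the name adds no cost otherwise.
        msg = "Semaphore broken"
        return BrokenSemaphore(f"{msg}: {self._name}" if self._name else msg)

    def broken(self):
        # Mark broken and wake all waiters so they fail with the error.
        with self._cond:
            self._broken = True
            self._cond.notify_all()

    def wait(self):
        with self._cond:
            while self._count == 0 and not self._broken:
                self._cond.wait()
            if self._broken:
                raise self._broken_error()
            self._count -= 1

    def signal(self):
        with self._cond:
            self._count += 1
            self._cond.notify()
```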
Refs #4999
Tests: unit(dev), manual: hardcoding a failure in view building code
Currently scylla_setup skips scylla_io_setup, because we didn't support
those new instance types.
I manually executed scylla_io_setup, and the scylla-server started and worked
well.
Let's apply this patch first, then check whether ami-test shows any new
problems.
Signed-off-by: Amos Kong <amos@scylladb.com>
cql_statement is a class representing a prepared statement in Scylla.
It is used concurrently during execution, so it is important that its
state is not changed by execution.
Add const qualifier to the execution methods family, throughout the
cql hierarchy.
Mark a few places which do mutate prepared statement state during
execution as mutable. While these are not affecting production today,
as code ages, they may become a source of latent bugs and should be
moved out of the prepared state or evaluated at prepare eventually:
cf_property_defs::_compaction_strategy_class
list_permissions_statement::_resource
permission_altering_statement::_resource
property_definitions::_properties
select_statement::_opts
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
---
v2:
- Schedule the stop earlier so that an exception in start/listen does
not prevent prometheus.stop from being called
As suggested in issue #4586 here is the helper that prints
"shutting down foo" message, then shuts the foo down, then
prints "shutting down foo was successful". In between
it catches the exception (if any) and logs a warning.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/5218
from Piotr Jastrzębski:
Users should be able to decide whether they need preimage or not. There is
already an option for that but it's not respected by the implementation.
This PR adds support for this functionality.
Tests: unit(dev).
Individual patches:
cdc: Don't take storage_proxy as transformer::pre_image_select param
cdc::append_log_mutations: use do_with instead of shared_ptr
cdc::append_log_mutations: fix undefined behavior
cdc: enable preimage in test_pre_image_logging test
cdc: Return preimage only when it's requested
cdc: test both enabled and disabled preimage in test_pre_image_logging
Before stopping the db itself, stop the migration service.
It must be stopped before RPC, but RPC is not stopped yet
itself, so we should be safe here.
Here's the tail of the resulting logs:
INFO 2019-11-20 11:22:35,193 [shard 0] init - shutdown migration manager
INFO 2019-11-20 11:22:35,193 [shard 0] migration_manager - stopping migration service
INFO 2019-11-20 11:22:35,193 [shard 1] migration_manager - stopping migration service
INFO 2019-11-20 11:22:35,193 [shard 0] init - Shutdown database started
INFO 2019-11-20 11:22:35,193 [shard 0] init - Shutdown database finished
INFO 2019-11-20 11:22:35,193 [shard 0] init - stopping prometheus API server
INFO 2019-11-20 11:22:35,193 [shard 0] init - Scylla version 666.development-0.20191120.25820980f shutdown complete.
Also, stop the mm on drain before the commitlog is stopped.
[Tomasz: mm needs the cl because pulling schema changes from other nodes
involves applying them into the database. So cl/db needs to be
stopped after mm is stopped.]
The drain logs would look like
...
INFO 2019-11-25 11:00:40,562 [shard 0] migration_manager - stopping migration service
INFO 2019-11-25 11:00:40,562 [shard 1] migration_manager - stopping migration service
INFO 2019-11-25 11:00:40,563 [shard 0] storage_service - DRAINED:
and then on stop
...
INFO 2019-11-25 11:00:46,427 [shard 0] init - shutdown migration manager
INFO 2019-11-25 11:00:46,427 [shard 0] init - Shutdown database started
INFO 2019-11-25 11:00:46,427 [shard 0] init - Shutdown database finished
INFO 2019-11-25 11:00:46,427 [shard 0] init - stopping prometheus API server
INFO 2019-11-25 11:00:46,427 [shard 0] init - Scylla version 666.development-0.20191125.3eab6cd54 shutdown complete.
Fixes #5300
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191125080605.7661-1-xemul@scylladb.com>
In get_full_row_hashes_with_rpc_stream and
repair_get_row_diff_with_rpc_stream_process_op which were introduced in
the "Repair switch to rpc stream" series, rx_hashes_nr metrics are not
updated correctly.
In the test we have 3 nodes and run repair on node3; we make sure the
following metrics are correct:
assertEqual(node1_metrics['scylla_repair_tx_hashes_nr'] + node2_metrics['scylla_repair_tx_hashes_nr'],
node3_metrics['scylla_repair_rx_hashes_nr'])
assertEqual(node1_metrics['scylla_repair_rx_hashes_nr'] + node2_metrics['scylla_repair_rx_hashes_nr'],
node3_metrics['scylla_repair_tx_hashes_nr'])
assertEqual(node1_metrics['scylla_repair_tx_row_nr'] + node2_metrics['scylla_repair_tx_row_nr'],
node3_metrics['scylla_repair_rx_row_nr'])
assertEqual(node1_metrics['scylla_repair_rx_row_nr'] + node2_metrics['scylla_repair_rx_row_nr'],
node3_metrics['scylla_repair_tx_row_nr'])
assertEqual(node1_metrics['scylla_repair_tx_row_bytes'] + node2_metrics['scylla_repair_tx_row_bytes'],
node3_metrics['scylla_repair_rx_row_bytes'])
assertEqual(node1_metrics['scylla_repair_rx_row_bytes'] + node2_metrics['scylla_repair_rx_row_bytes'],
node3_metrics['scylla_repair_tx_row_bytes'])
Tests: repair_additional_test.py:RepairAdditionalTest.repair_almost_synced_3nodes_test
Fixes: #5339
Backports: 3.2
The code was iterating over a collection that was modified
at the same time. Iterators were used for that and collection
modification can invalidate all iterators.
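The fix is in C++, but the underlying rule can be shown in Python: collect what to change first, then mutate. drop_expired() is a hypothetical illustration, not the patched cdc code. In Python, deleting from a dict while iterating over it raises RuntimeError. The C++ analogue, iterator invalidation, is undefined behavior instead.

```python
def drop_expired(entries, is_expired):
    """Remove entries whose value satisfies `is_expired`.

    The keys to delete are snapshotted before any mutation, so the dict
    is never modified while it is being iterated.
    """
    expired = [k for k, v in entries.items() if is_expired(v)]
    for k in expired:
        del entries[k]
    return entries
```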
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/5310 from
Avi Kivity:
This is a minor update as gcc and boost versions did not change. A notable
update is patchelf 0.10, which adds support for large binaries.
A few minor issues exposed by the update are fixed in preparatory patches.
Patches:
dist: rpm: correct systemd post-uninstall scriptlet
build: force xz compression on rpm binary payload
tools: toolchain: update to Fedora 31
Since 90d6c0b, cache will abort when trying to detach partition
entries while they're updated. This should never happen. It can happen
though, when the update fails on bad_alloc, because the cleanup guard
invalidates the cache before it releases partition snapshots (held by
"update" coroutine).
Fix by destroying the coroutine first.
Fixes #5327.
Tests:
- row_cache_test (dev)
Message-Id: <1574360259-10132-1-git-send-email-tgrabiec@scylladb.com>
By default rpm uses dwz to merge the debug info from various
binaries. Unfortunately, it looks like addr2line has not been updated
to handle this:
// This works
$ addr2line -e build/release/scylla 0x1234567
$ dwz -m build/release/common.debug build/release/scylla.debug build/release/iotune.debug
// now this fails
$ addr2line -e build/release/scylla 0x1234567
I think the issue is
https://sourceware.org/bugzilla/show_bug.cgi?id=23652
Fixes #5289
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123015734.89331-1-espindola@scylladb.com>
By default we were compressing debug info only in release
executables. The idea, if I understand it correctly, is that those are
the ones we ship, so we want a more compact binary.
I don't think that was doing anything useful. The compression is just
gzip, so when we ship a .tar.xz, having the debug info compressed
inside the scylla binary probably reduces the overall compression a
bit.
When building an rpm, the situation is amusing. As part of the rpm
build process the debug info is decompressed and extracted to an
external file.
Given that most of the link time goes to compressing debug info, it is
probably a good idea to just skip that.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123022825.102837-1-espindola@scylladb.com>
Structure the code to be able to introduce futures.
Apply trivial cleanups.
Switch to asyncio and use it to work with processes and
handle signals. Clean up all processes upon signal.
This patch implements a simple optimization for LWT: it makes PAXOS
prepare phase query locally and return the current value of the modified
key so that a separate query is not necessary. For more details see
patch 6. Patch 1 fixes a bug in next. Patches 2-5 contain trivial
preparatory refactoring.
Current LWT implementation uses at least three network round trips:
- first, execute PAXOS prepare phase
- second, query the current value of the updated key
- third, propose the change to participating replicas
(there's also learn phase, but we don't wait for it to complete).
The idea behind the optimization implemented by this patch is simple:
piggyback the current value of the updated key on the prepare response
to eliminate one round trip.
To generate less network traffic, only the replica closest to the
coordinator sends data, while the other participating replicas send digests which
are used to check data consistency.
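The data-plus-digests scheme can be sketched as below. Everything here is a hypothetical stand-in: the SHA-256 of repr() replaces Scylla's real digest, and reconcile_prepare_responses() is not Scylla code. What the coordinator does on a mismatch is not described above; returning None, meaning "fall back to a separate read", is an assumption.

```python
import hashlib


def digest(value):
    """Stand-in for a replica-side digest of the queried value."""
    return hashlib.sha256(repr(value).encode()).hexdigest()


def reconcile_prepare_responses(responses, closest):
    """Pick the current value from PAXOS prepare responses.

    `responses` maps replica -> ("data", value) or ("digest", hexdigest).
    Only `closest` is expected to send full data. Returns the value if
    every other replica's digest matches it, otherwise None.
    """
    kind, value = responses[closest]
    if kind != "data":
        return None
    expected = digest(value)
    for replica, (kind, payload) in responses.items():
        if replica == closest:
            continue
        if kind != "digest" or payload != expected:
            return None
    return value
```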
Note, this patch changes the API of some RPC calls used by PAXOS, but
this should be okay as long as the feature is in the early development
stage and marked experimental.
To assess the impact of this optimization on LWT performance, I ran a
simple benchmark that starts a number of concurrent clients each of
which updates its own key (uncontended case) stored in a cluster of
three AWS i3.2xlarge nodes located in the same region (us-west-1) and
measures the aggregate bandwidth and latency. The test uses shard-aware
gocql driver. Here are the results:
         latency 99% (ms)   bandwidth (rq/s)   timeouts (rq/s)
clients  before  after      before  after      before  after
1        2       2          626     637        0       0
5        4       3          2616    2843       0       0
10       3       3          4493    4767       0       0
50       7       7          10567   10833      0       0
100      15      15         12265   12934      0       0
200      48      30         13593   14317      0       0
400      185     60         14796   15549      0       0
600      290     94         14416   15669      0       0
800      568     118        14077   15820      2       0
1000     710     118        13088   15830      9       0
2000     1388    232        13342   15658      85      0
3000     1110    363        13282   15422      233     0
4000     1735    454        13387   15385      329     0
That is, this optimization improves max LWT bandwidth by about 15%
and allows running 3-4x more clients while maintaining the same level
of system responsiveness.
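The bandwidth gains can be recomputed directly from the table. This arithmetic is only a cross-check of the numbers above: peak against peak, the gain is about 7%. At 2000-4000 clients, where the unpatched version saturates, it is roughly 15-17%.

```python
# Bandwidth (rq/s) from the table above: clients -> (before, after).
bandwidth = {
    1: (626, 637), 5: (2616, 2843), 10: (4493, 4767), 50: (10567, 10833),
    100: (12265, 12934), 200: (13593, 14317), 400: (14796, 15549),
    600: (14416, 15669), 800: (14077, 15820), 1000: (13088, 15830),
    2000: (13342, 15658), 3000: (13282, 15422), 4000: (13387, 15385),
}


def gain_percent(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100


peak_before = max(b for b, _ in bandwidth.values())
peak_after = max(a for _, a in bandwidth.values())
peak_gain = gain_percent(peak_before, peak_after)
per_load_gain = {c: gain_percent(b, a) for c, (b, a) in bandwidth.items()}
```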
invoke_on() guarantees that captured objects won't be destroyed until the
future returned by the invoked function is resolved so there's no need
to move key, token, proposal for calling paxos_state::*_impl helpers.
The test_health_only_works_for_root_path test checks that while Alternator's
HTTP server responds to a "GET /" request with success ("health check"), it
should respond to different URLs with failures (page not found).
One of the URLs it tested was "/..", but unfortunately some versions of
Python's HTTP client canonicalize this request to just a "/", causing the
request to unexpectedly succeed - and the test to fail.
So this patch just drops the "/.." check. A few other nonsense URLs are
still attempted by the test - e.g., "/abc".
Fixes #5321
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
One of the fields still missing in DescribeTable's response (Refs #5026)
was the table's schema - KeySchema and AttributeDefinitions.
This patch adds this missing feature, and enables the previously-xfailing
test test_describe_table_schema.
A complication of this patch is that in a table with secondary indexes,
we need to return not just the base table's schema, but also the indexes'
schema. The existing tests did not cover that feature, so we add here
two more tests in test_gsi.py for that.
One of these secondary-index schema tests, test_gsi_2_describe_table_schema,
still fails, because it outputs a range-key which Scylla added to a view
because of its own implementation needs, but wasn't in the user's
definition of the GSI. I opened a separate issue #5320 for that.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Serialize reference_wrapper<T> as T and make sure is_equivalent<> treats
reference_wrapper<T> wrapped in std::optional<> or std::variant<>, or
std::tuple<> as T.
We need it to avoid copying query::result while serializing
paxos::promise.
Currently even if `-a` or `-s 0` is provided, `scylla task_histogram`
will scan a limited amount of pages due to a bug in the scan loop's stop
condition, which triggers a stop once the default sample limit is
reached. Fix the loop by skipping this check when the user wants to scan
all tasks.
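The corrected stop condition can be sketched as a plain loop. sample_tasks() and its parameters are hypothetical, not the scylla-gdb.py code. `scan_all` stands for the user passing `-a` or `-s 0`.

```python
def sample_tasks(pages, sample_limit, scan_all):
    """Collect candidate task objects page by page.

    `pages` is an iterable of lists of objects. The sample-limit check is
    skipped entirely when `scan_all` is set, so every page is scanned.
    """
    found = []
    for page in pages:
        found.extend(page)
        if not scan_all and len(found) >= sample_limit:
            break
    return found
```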
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191121141706.29476-1-bdenes@scylladb.com>
At least some versions of 'podman logs --follow' hang when the
container eventually exits (also happens with docker on recent
versions). Fortunately, we don't need to use 'podman logs --follow'
and can use the more natural non-detached 'podman run', because
podman does not proxy SIGTERM and instead shuts down the container
when it receives it.
So, to work around the problem, use the same code path in interactive
and non-interactive runs, when podman is in use instead of docker.
With docker, we went to considerable lengths to ensure that
access to mounted volume was done using the calling user, including
supplementary groups. This avoids root-owned files being left around
after a build, and ensures that access to group-shared files (like
/var/cache/ccache) works as expected.
All of this is unnecessary and broken when using podman. Podman
uses a proxy to access files on behalf of the container, so naturally
all access is done using the calling user's identity. Since it remaps
user and group IDs, assigning the host uid/gid is meaningless. Using
--userns host also breaks, because sudo no longer works.
Fix this by making all the uid/gid/selinux games specific to docker and
ignore them when using podman. To preserve the functionality of tools
that depend on $HOME, set that according to the host setting.
The original fix (10f6b125c8) didn't
take into account that if there was a failed memtable flush (Refs
flush) but is not a flushable memtable because it's not the latest in
the memtable list. If that happens, it means no other memtable is
flushable as well, because otherwise it would be picked due to
evictable_occupancy(). Therefore the right action is to not flush
anything in this case.
Suspected to be observed in #4982. I didn't manage to reproduce after
triggering a failed memtable flush.
Fixes #3717
podman refuses to start with duplicate volumes, which routinely
happen if the toplevel directory is the working directory. Detect
this and avoid the duplicate.
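dbuild itself is a shell script. This Python sketch only illustrates the dedup step. volume_args() is hypothetical, and it assumes each directory is mounted at the same path inside the container. It normalizes paths with os.path.abspath, which ignores symlinks, so "/src/scylla/" and "/src/scylla" count as the same mount.

```python
import os


def volume_args(toplevel, workdir):
    """Build -v arguments without mounting the same host path twice."""
    mounts = []
    for path in (toplevel, workdir):
        path = os.path.abspath(path)
        if path not in mounts:  # podman rejects duplicate volumes
            mounts.append(path)
    args = []
    for path in mounts:
        args += ["-v", f"{path}:{path}"]
    return args
```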
The UnitTest class juggles the name 'args' quite a bit to
construct the command line for a unit test, so let's set
the harness command line arguments apart from the unit test command line
arguments by consistently calling the harness command line
arguments 'options', and unit test command line arguments 'args'.
Rename usage() to parse_cmd_line().
Create unique UnitTest objects in find_tests() for each found match,
including repeat, to ensure each test has its own unique id.
This will also be used to store execution state in the test.
It somewhat stands in the way of using asyncio
This patch also implements a more comprehensive
fix for #5303, since we not only have --repeat, but
run some tests in different configurations, in which
case xml output is also overwritten.
When starting scylla daemon as non-root the initialization fails
because standard /var/lib/scylla is not accessible by regular users.
Making the default dir accessible for user is not very convenient
either, as it will cause conflicts if two or more instances of scylla
are in use.
This problem can be resolved by specifying --commitlog-directory,
--data-file-directories, etc on start, but it's too much typing. I
propose to revive Nadav's --home option that allows to move all the
directories under the same prefix in one go.
Unlike Nadav's approach the --workdir option doesn't do any tricky
manipulations with existing directories. Instead, as Pekka suggested,
the individual directories are placed under the workdir if and only
if the respective option is NOT provided. Otherwise the directory
configuration is taken as is, regardless of whether it's an absolute or
relative path.
The values substitution is done early on start. Avi suggested that
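The "only if unset" rule can be sketched as follows. resolve_dirs() is hypothetical, and so are the subdirectory names. The option names commitlog_directory, hints_directory and saved_caches_directory are existing scylla.yaml settings, each treated here as a single path for brevity.

```python
import os

# Hypothetical subdirectory used under the workdir for each option.
WORKDIR_SUBDIRS = {
    "commitlog_directory": "commitlog",
    "hints_directory": "hints",
    "saved_caches_directory": "saved_caches",
}


def resolve_dirs(options, workdir):
    """Place unset directory options under `workdir`.

    An explicitly provided value (absolute or relative) is kept as-is;
    only options that are missing or None get a path under the workdir.
    """
    resolved = {}
    for name, subdir in WORKDIR_SUBDIRS.items():
        value = options.get(name)
        resolved[name] = value if value is not None else os.path.join(workdir, subdir)
    return resolved
```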
this is unsafe wrt HUP config re-read and proper paths must be
resolved on the fly, but this patch doesn't address that yet, here's
why.
First of all, the respective options are MustRestart now and the
substitution is done before HUP handler is installed.
Next, commitlog and data_file values are copied on start, so marking
the options as LiveUpdate won't make any effect.
Finally, the existing named_value::operator() returns a reference,
so returning a calculated (and thus temporary) value is not possible
(from my current understanding, correct me if I'm wrong). Thus if we
want the *_directory() to return calculated value all callers of them
must be patched to call something different (e.g. *_directory.get() ?)
which will lead to more confusion and errors.
Changes v3:
- the option is --workdir back again
- the existing *directory are only affected if unset
- default config doesn't have any of these set
- added the short -W alias
Changes v2:
- the option is --home now
- all other paths are changed to be relative
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191119130059.18066-1-xemul@scylladb.com>
I found these mismatched types while converting some member functions
to standalone functions, since they have to use the public API that
has more type checks.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191120181213.111758-4-espindola@scylladb.com>
Use pkg-config to search for Lua dependencies rather
than hard-code include and link paths.
Avoid using boost internals, not present in earlier
versions of boost.
Reviewed-by: Rafael Avila de Espindola <espindola@scylladb.com>
Message-Id: <20191120170005.49649-1-kostja@scylladb.com>
Use `-Wl,--threads` flag to enable multi-threaded linking when
using `ld.gold` linker.
Additional compilation test is required because it depends on whether
or not the `gold` linker has been compiled with `--enable-threads` option.
This patch introduces a substantial improvement to the link times of
`scylla` binary in release and debug modes (around 30 percent).
Local setup reports the following numbers with release build for
linking only build/release/scylla:
Single-threaded mode:
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:09.30
Multi-threaded mode:
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:51.57
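The two timings above convert as follows. This is plain arithmetic on the reported m:ss values, nothing more.

```python
def to_seconds(elapsed):
    """Convert an "m:ss.ss" wall-clock string, as printed above, to seconds."""
    minutes, seconds = elapsed.split(":")
    return int(minutes) * 60 + float(seconds)


single = to_seconds("1:09.30")  # single-threaded gold link
multi = to_seconds("0:51.57")   # with -Wl,--threads
time_saved = (single - multi) / single  # fraction of link time removed
speedup = single / multi
```

This gives about 26% less wall-clock time, or a 1.34x speedup, so the "around 30 percent" figure falls between the two ways of expressing it.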
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191120163922.21462-1-pa.solodovnikov@scylladb.com>
Merged patch series from Peng Jian, adding optionally-enabled Redis API
support to Scylla. This feature is experimental, and partial - the extent
of this support is detailed in docs/redis/redis.md.
Patches:
Document: add docs/redis/redis.md
redis: Redis API in Scylla
Redis API: graft redis module to Scylla
redis-test: add test cases for Redis API
This is a minor update as gcc and boost versions do not change.
glibc-langpack-en no longer gets pulled in by default. As it is required
by some locale use somewhere, it is added to the explicit dependencies.
Fedora 31 switched the default compression to zstd, which isn't readable
by some older rpm distributions (CentOS 7 in particular). Tell it to use
the older xz compression instead, so packages produced on Fedora 31 can
be installed on older distributions.
The post-uninstall scriptlet requires a parameter, but older versions
of rpm survived without it. Fedora 31's rpm is more strict, so supply
this parameter.
In this document, the detailed design and implementation of Redis API in
Scylla is provided.
v2: build: work around ragel 7 generated code bug (suggested by Avi)
Ragel 7 incorrectly emits some unused variables that don't compile.
As a workaround, sed them away.
Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
Signed-off-by: Amos Kong <amos@scylladb.com>
Scylla has many advantages and powerful features. If Redis is built on top of Scylla,
it gets these features automatically. Scylla has made great progress
in cluster management, data persistence, failover and replication.
For users, the benefit is that it is easy to use and develop with in their production
environment, while taking advantage of Scylla.
Using Ragel to parse each Redis request, the server obtains the command name
and the parameters from the request, invokes Scylla's internal API to
read and write the data, and then replies to the client.
Signed-off-by: Peng Jian, <pengjian.uestc@gmail.com>
Merged patch set by Piotr Dulikowski:
This change corrects condition on which a row was considered expired by its
TTL.
The logic that decides when a row becomes expired was inconsistent with the
logic that decides if a single cell is expired. A single cell becomes expired
when expiry_timestamp <= now, while a row became expired when
expiry_timestamp < now (notice the strict inequality). For rows inserted
with TTL, this caused non-key cells to expire (change their values to null)
one second before the row disappeared. Now, row expiry logic uses non-strict
inequality.
Fixes #4263,
Fixes #5290.
Tests:
unit(dev)
python test described in issue #5290
It is useful to have an option to limit the execution time of a shell
script.
This patch adds an optional timeout parameter: if it is provided, the
command returns with a failure once that duration has elapsed.
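A minimal sketch of such a timeout using the standard library. run_with_timeout() is hypothetical. Returning 124 on timeout borrows the exit-code convention of coreutils `timeout` and is an assumption, not necessarily what the patch returns.

```python
import subprocess


def run_with_timeout(cmd, timeout=None):
    """Run `cmd` and return its exit code, or 124 if `timeout` seconds pass.

    `timeout=None` keeps the old behavior of waiting indefinitely.
    """
    try:
        return subprocess.run(cmd, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here.
        return 124
```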
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Merged patch series from Juliusz Stasiewicz:
Welcome to my first PR to Scylla!
The task was intended as a warm-up ("noob") exercise; its description is
here: #4182. Sorry, I also couldn't help it and did some scouting: I edited the
descriptions of some metrics and shortened a few annoyingly long lines of code.
Those are typically symptoms of use-after-free or memory corruption in
the program. It's better to catch such error sooner than later.
That situation is also dangerous: if a valid descriptor happened to occupy
the number used by the invalid access, instead of the one intended for the
operation, the operation could be performed on the wrong file and
result in corruption.
Message-Id: <1565206788-31254-1-git-send-email-tgrabiec@scylladb.com>
This change corrects condition on which a row was considered expired by
its TTL.
The logic that decides when a row becomes expired was inconsistent with
the logic that decides if a single cell is expired. A single cell
becomes expired when `expiry_timestamp <= now`, while a row became
expired when `expiry_timestamp < now` (notice the strict inequality).
For rows inserted with TTL, this caused non-key cells to expire (change
their values to null) one second before the row disappeared. Now, row
expiry logic uses non-strict inequality.
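The fix amounts to making both predicates agree. This sketch assumes expiry and now are whole seconds, as the one-second window above suggests. The function names are hypothetical.

```python
def cell_expired(expiry, now):
    # A cell has always been considered expired once expiry <= now.
    return expiry <= now


def row_expired(expiry, now):
    # Fixed: same non-strict inequality as for cells. The old check,
    # expiry < now, kept the row alive at now == expiry even though its
    # non-key cells were already expired (read as null).
    return expiry <= now
```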
Fixes: #4263, #5290.
Tests:
- unit(dev)
- python test described in issue #5290
Currently, we overwrite the same XML output file for each test repeat
cycle. This can cause invalid XML to be generated if the XML contents
don't match exactly for every iteration.
Fix the problem by appending the test repeat cycle to the XML filename
as follows:
$ ./test.py --repeat 3 --name vint_serialization_test --mode dev --jenkins jenkins_test
$ ls -1 *.xml
jenkins_test.release.vint_serialization_test.0.boost.xml
jenkins_test.release.vint_serialization_test.1.boost.xml
jenkins_test.release.vint_serialization_test.2.boost.xml
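The naming scheme can be sketched in one line. This assumes the file name is the --jenkins prefix, the mode, the test name and the repeat index joined by dots, as in the listing above. xml_output_path() is hypothetical, not test.py's actual code.

```python
def xml_output_path(prefix, mode, test_name, repeat_cycle):
    """Give each repeat cycle its own XML file instead of overwriting one."""
    return f"{prefix}.{mode}.{test_name}.{repeat_cycle}.boost.xml"
```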
Fixes #5303.
Message-Id: <20191119092048.16419-1-penberg@scylladb.com>
In a cross-dc large cluster, the receiver node of the gossip SYN message
might be slow to send the gossip ACK message. The ack messages can be
large if the payload of the application state is big, e.g.,
CACHE_HITRATES with a lot of tables. As a result, unlimited ACK
messages can consume an unlimited amount of memory, which eventually
causes OOM.
To fix this, this patch queues the SYN message and handles it later if the
previous ACK message is still being sent. However, we only store the
latest SYN message. Since the latest SYN message from a peer has the
latest information, it is safe to drop the previous SYN message and
keep the latest one only. After this patch, there can be at most 1
pending SYN message and 1 pending ACK message per peer node.
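The per-peer bookkeeping can be sketched as a small state machine. SynQueue is hypothetical, and `send_ack` stands in for building and sending the ACK. A newer SYN replaces any older pending one, so at most one SYN waits while one ACK is in flight.

```python
class SynQueue:
    """Per-peer SYN handling with at most one pending SYN."""

    def __init__(self, send_ack):
        self._send_ack = send_ack
        self._ack_in_flight = False
        self._pending_syn = None

    def on_syn(self, syn):
        if self._ack_in_flight:
            self._pending_syn = syn  # keep only the latest SYN
            return
        self._ack_in_flight = True
        self._send_ack(syn)

    def on_ack_sent(self):
        # The previous ACK finished sending; handle the pending SYN, if any.
        self._ack_in_flight = False
        if self._pending_syn is not None:
            syn, self._pending_syn = self._pending_syn, None
            self.on_syn(syn)
```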
"
This patch series adds only UDF support, UDA will be in the next patch series.
With this, all CQL types are mapped to Lua. Right now we set up a new
lua state and copy the values for each argument and return. This will
be optimized once profiled.
We require --experimental to enable UDF in case there is some change
to the table format.
"
* 'espindola/udf-only-v4' of https://github.com/espindola/scylla: (65 commits)
Lua: Document the conversions between Lua and CQL
Lua: Implement decimal subtraction
Lua: Implement decimal addition
Lua: Implement support for returning decimal
Lua: Implement decimal to string conversion
Lua: Implement decimal to floating point conversion
Lua: Implement support for decimal arguments
Lua: Implement support for returning varint
Lua: Implement support for returning duration
Lua: Implement support for duration arguments
Lua: Implement support for returning inet
Lua: Implement support for inet arguments
Lua: Implement support for returning time
Lua: Implement support for time arguments
Lua: Implement support for returning timeuuid
Lua: Implement support for returning uuid
Lua: Implement support for uuid and timeuuid arguments
Lua: Implement support for returning date
Lua: Implement support for date arguments
Lua: Implement support for returning timestamp
...
Add mode_list rule to ninja build and use it by default when searching
for tests in test.py.
Now it is no longer necessary to explicitly specify the test mode when
invoking test.py.
(cherry picked from commit a211ff30c7f2de12166d8f6f10d259207b462d4b)
The goal of this patch is to fix issue #5280, a rather serious Alternator
bug, where Scylla fails to restart when an Alternator table has secondary
indexes (LSI or GSI).
Traditionally, Cassandra allows table names to contain only alphanumeric
characters and underscores. However, most of our internal implementation
doesn't actually have this restriction. So Alternator uses the characters
':' and '!' in the table names to mark global and local secondary indexes,
respectively. And this actually works. Or almost...
This patch fixes a problem of listing, during boot, the sstables stored
for tables with such non-traditional names. The sstable listing code
needlessly assumes that the *directory* name, i.e., the CF names, matches
the "\w+" regular expression. When an sstable is found in a directory not
matching such regular expression, the boot fails. But there is no real
reason to require such a strict regular expression. So this patch relaxes
this requirement, and allows Scylla to boot with Alternator's GSI and LSI
tables and their names which include the ":" and "!" characters, and in
fact any other name allowed as a directory name.
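The difference between the strict and the relaxed check is easy to demonstrate. The "\w+" pattern is the one named above. The relaxed `[^/]+` pattern is an assumption standing for "any name allowed as a directory name", not necessarily the regex Scylla now uses.

```python
import re

# Old, overly strict check on the column-family directory name.
OLD_CF_DIR = re.compile(r"\w+")
# Relaxed check (assumed): any non-empty name without a path separator.
NEW_CF_DIR = re.compile(r"[^/]+")


def is_cf_dir(name, pattern):
    """True if the whole directory name matches `pattern`."""
    return pattern.fullmatch(name) is not None
```

Alternator's GSI (":") and LSI ("!") table names fail the old check but pass the relaxed one.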
Fixes #5280.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191114153811.17386-1-nyh@scylladb.com>
This document adds information about how fixes are tracked to be
backported into releases and what is the procedure that is followed to
backport those fixes.
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Allow filtering the resolved addresses by a startswith string.
The common use case is for resolving vtable ptrs, when resolving
the output of `find_vptrs` that may be too long for the host
(running gdb) memory size. In this case the number of vtable
ptrs is considerably smaller than the total number of objects
returned by find_ptrs (e.g. 462 vs. 69625 in an OOM core I
examined from scylla --smp=2 --memory=1024M)
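The filtering step can be sketched without gdb. resolve_filtered() is hypothetical, and `resolve` stands in for gdb's address-to-symbol lookup. The "vtable for" prefix is used as an example of how vtable symbols are typically named.

```python
def resolve_filtered(addresses, resolve, startswith=None):
    """Resolve addresses to symbol names, keeping only matching ones.

    `resolve` maps an address to a symbol name or None. With `startswith`,
    only names beginning with that prefix are kept.
    """
    result = []
    for addr in addresses:
        name = resolve(addr)
        if name is None:
            continue
        if startswith is None or name.startswith(startswith):
            result.append((addr, name))
    return result
```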
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
CQL tracing would only report file I/O involving one sstable, even if
multiple sstables were read from during the query.
Steps to reproduce:
create a table with NullCompactionStrategy
insert row, flush memtables
insert row, flush memtables
restart Scylla
tracing on
select * from table
The trace would only report DMA reads from one of the two sstables.
Kudos to @denesb for catching this.
Related issue: #4908
There are ... signs of massive start/stop code rework in the
main() function. While fixing the sub-modules interdependencies
during start/stop I've polished these signs too, so here are the
simplest ones.
This is just the minimum to pass a value to Lua. Right now you can't
actually do anything with it.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This adds support for all integer types. Followup commits will
implement the missing types.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This makes it substantially simpler to support both varint and
decimal, which will be implemented in a followup patch.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
With this we support all simple integer types. Followup patches will
implement the missing types.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This adds a wrapper around the lua interpreter so that function
executions are interruptible and return futures.
With this patch it is possible to write and use simple UDFs that take
and return integer values.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This adds a requires_thread predicate to functions and propagates that
up until we get to code that already returns futures.
We can then use the predicate to decide if we need to use
seastar::async.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This refactors test_schema_digest_does_not_change to also test a
schema with user defined functions and user defined aggregates.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
With this, it is possible to create user-defined functions and
aggregates; they are saved to disk and the schema change is
propagated.
It is just not possible to call them yet.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The parser now rejects having both OR REPLACE and IF NOT EXISTS in the
same statement.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
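The two clauses contradict each other. OR REPLACE overwrites an existing function, while IF NOT EXISTS leaves it untouched, so there is no sensible meaning for the combination. The following is a hypothetical validation sketch. The struct and function names are illustrative, and in Scylla the check is done by the parser rather than by a helper like this.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical flags collected while parsing CREATE FUNCTION / CREATE AGGREGATE.
struct create_options {
    bool or_replace = false;
    bool if_not_exists = false;
};

// Reject the contradictory combination; either flag alone is fine.
void validate_create_options(const create_options& opts) {
    if (opts.or_replace && opts.if_not_exists) {
        throw std::invalid_argument(
            "OR REPLACE and IF NOT EXISTS cannot be used together");
    }
}
```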
This updates UDF syntax to the current specification.
In particular, this removes DETERMINISTIC and adds "CALLED ON NULL
INPUT" and "RETURNS NULL ON NULL INPUT".
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
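The two null-input modes differ in whether the function body runs. With RETURNS NULL ON NULL INPUT, the body is skipped and null is returned whenever any argument is null. With CALLED ON NULL INPUT, the body always runs and must handle nulls itself. The sketch below assumes integer-only arguments modeled as std::optional. `invoke_udf` and `null_input_behavior` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical: a CQL int argument or result that may be null.
using cql_int = std::optional<std::int64_t>;

enum class null_input_behavior { called_on_null_input, returns_null_on_null_input };

cql_int invoke_udf(null_input_behavior behavior,
                   const std::function<cql_int(const std::vector<cql_int>&)>& body,
                   const std::vector<cql_int>& args) {
    if (behavior == null_input_behavior::returns_null_on_null_input) {
        for (const auto& arg : args) {
            if (!arg) {
                return std::nullopt;  // short-circuit: the body is never called
            }
        }
    }
    // CALLED ON NULL INPUT, or no null arguments: run the body.
    return body(args);
}
```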
At some point we should make the function list non-static, but this
allows us to write tests for now.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This avoids allocating a std::vector and is more flexible since the
iterator can be passed to erase.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
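The commit does not show the container involved. The sketch below assumes a std::multimap keyed by function name, where a lookup returns the equal_range iterator pair instead of copying the matches into a vector. `function_registry`, `find_overloads` and `remove_overload` are hypothetical names.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry: overloads share a name; the mapped value is the arity.
using function_registry = std::multimap<std::string, int>;
using overload_range =
    std::pair<function_registry::iterator, function_registry::iterator>;

// Return iterators into the container instead of copying every match
// into a freshly allocated std::vector.
overload_range find_overloads(function_registry& reg, const std::string& name) {
    return reg.equal_range(name);
}

// Because the lookup yields real iterators, a match can be erased directly.
bool remove_overload(function_registry& reg, const std::string& name, int arity) {
    auto [first, last] = find_overloads(reg, name);
    for (auto it = first; it != last; ++it) {
        if (it->second == arity) {
            reg.erase(it);
            return true;
        }
    }
    return false;
}
```

std::distance(range.first, range.second) gives the number of overloads without materializing them. Erasing through the returned iterator avoids a second lookup.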
This is a simple wrapper that allows code that is not in the types
hierarchy to visit a data_value.
Will be used by UDF.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
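The following sketch uses std::variant as a stand-in for the value representation. Scylla's data_value is not a std::variant, so the `cql_value` class and the `to_text` client are hypothetical. The point is that the caller supplies one generic visitor and never needs to know how the value is stored.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

// Hypothetical type-erased value; only models "dispatch on the runtime type".
class cql_value {
public:
    using storage = std::variant<std::int32_t, std::int64_t, std::string>;
    explicit cql_value(storage s) : _s(std::move(s)) {}

    // The wrapper: code outside the type hierarchy passes any callable that
    // accepts every alternative, without touching the internals.
    template <typename Visitor>
    decltype(auto) visit(Visitor&& v) const {
        return std::visit(std::forward<Visitor>(v), _s);
    }

private:
    storage _s;
};

// Example client (e.g. UDF glue code): render any value as text.
std::string to_text(const cql_value& val) {
    return val.visit([](const auto& x) -> std::string {
        if constexpr (std::is_same_v<std::decay_t<decltype(x)>, std::string>) {
            return x;
        } else {
            return std::to_string(x);
        }
    });
}
```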
Similar to "gossip: Limit number of pending gossip ACK messages", limit
the number of pending gossip ACK2 messages in gossiper::handle_ack_msg.
Fixes #5210
In a large cross-DC cluster, the node receiving a gossip SYN message
might be slow to send the gossip ACK message. ACK messages can be
large if the application state payload is big, e.g.,
CACHE_HITRATES with a lot of tables. As a result, an unbounded number of
pending ACK messages can consume an unbounded amount of memory, which
eventually causes OOM.
To fix this, this patch queues the SYN message and handles it later if the
previous ACK message is still being sent. However, we only store the
latest SYN message. Since the latest SYN message from a peer has the
latest information, it is safe to drop the previous SYN message and
keep only the latest one. After this patch, there can be at most 1
pending SYN message and 1 pending ACK message per peer node.
Fixes #5210
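The per-peer throttling can be modeled as a small state machine with one ACK in flight and at most one queued SYN, where a newer SYN overwrites the queued one. This is a synchronous, hypothetical model: `syn_throttle`, `syn_msg` and the integer `version` field are illustrative, while the real gossiper sends asynchronously with futures.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical SYN message; a larger version means newer gossip state.
struct syn_msg {
    std::string peer;
    int version;
};

class syn_throttle {
public:
    // Called when a SYN arrives. Returns true if an ACK send starts now.
    bool on_syn(const syn_msg& syn) {
        auto& st = _peers[syn.peer];
        if (st.ack_in_flight) {
            // Keep only the newest SYN: it carries the latest state,
            // so the previously queued one is safe to drop.
            st.pending = syn;
            return false;
        }
        start_ack(st, syn);
        return true;
    }

    // Called when the ACK to `peer` has been sent.
    // Returns true if a queued SYN was then handled.
    bool on_ack_sent(const std::string& peer) {
        auto& st = _peers[peer];
        st.ack_in_flight = false;
        if (st.pending) {
            syn_msg next = std::move(*st.pending);
            st.pending.reset();
            start_ack(st, next);
            return true;
        }
        return false;
    }

    // Versions answered with an ACK, in order.
    const std::vector<int>& acked_versions() const { return _acked; }

private:
    struct peer_state {
        bool ack_in_flight = false;
        std::optional<syn_msg> pending;  // at most one queued SYN per peer
    };

    void start_ack(peer_state& st, const syn_msg& syn) {
        assert(!st.ack_in_flight);  // at most one ACK in flight per peer
        st.ack_in_flight = true;
        _acked.push_back(syn.version);
    }

    std::map<std::string, peer_state> _peers;
    std::vector<int> _acked;
};
```

If three SYNs arrive from the same peer while the first ACK is in flight, the throttle answers the first and the third and drops the second. Memory per peer therefore stays bounded regardless of how slow the ACK path is.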
Now that compaction returns only after the compacted sstables are
deleted, we no longer need to stop the base to force waiting
for deletes (which were previously done asynchronously).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
// FIXME: still missing some response fields (issue #5026)
// FIXME: more attributes! Check https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_TableDescription.html#DDB-Type-TableDescription-TableStatus but also run a test to see what DynamoDB really fills
// maybe for TableId or TableArn use schema.id().to_sstring().c_str();
 "summary":"Return the generation value for this node.",
-"type":"int",
+"type":"long",
 "nickname":"get_current_generation_number",
 "produces":[
 "application/json"
@@ -646,7 +646,7 @@
 {
 "method":"POST",
 "summary":"Trigger a cleanup of keys on a single keyspace",
-"type":"int",
+"type":"long",
 "nickname":"force_keyspace_cleanup",
 "produces":[
 "application/json"
@@ -678,7 +678,7 @@
 {
 "method":"GET",
 "summary":"Scrub (deserialize + reserialize at the latest version, skipping bad rows if any) the given keyspace. If columnFamilies array is empty, all CFs are scrubbed. Scrubbed CFs will be snapshotted first, if disableSnapshot is false",
-"type":"int",
+"type":"long",
 "nickname":"scrub",
 "produces":[
 "application/json"
@@ -726,7 +726,7 @@
 {
 "method":"GET",
 "summary":"Rewrite all sstables to the latest version. Unlike scrub, it doesn't skip bad rows and do not snapshot sstables first.",
-"type":"int",
+"type":"long",
 "nickname":"upgrade_sstables",
 "produces":[
 "application/json"
@@ -800,7 +800,7 @@
 "summary":"Return an array with the ids of the currently active repairs",
 "type":"array",
 "items":{
-"type":"int"
+"type":"long"
 },
 "nickname":"get_active_repair_async",
 "produces":[
@@ -816,7 +816,7 @@
 {
 "method":"POST",
 "summary":"Invoke repair asynchronously. You can track repair progress by using the get supplying id",
-"type":"int",
+"type":"long",
 "nickname":"repair_async",
 "produces":[
 "application/json"
@@ -947,7 +947,7 @@
 "description":"The repair ID to check for status",
-sm::description("Counts a total number of CQL CREATE INDEX requests.")),
+sm::description("Counts the total number of CQL CREATE INDEX requests.")),
 sm::make_derive(
 "secondary_index_drops",
 _cql_stats.secondary_index_drops,
-sm::description("Counts a total number of CQL DROP INDEX requests.")),
+sm::description("Counts the total number of CQL DROP INDEX requests.")),
 // secondary_index_reads total count is also included in all cql reads
 sm::make_derive(
 "secondary_index_reads",
 _cql_stats.secondary_index_reads,
-sm::description("Counts a total number of CQL read requests performed using secondary indexes.")),
+sm::description("Counts the total number of CQL read requests performed using secondary indexes.")),
 // secondary_index_rows_read total count is also included in all cql rows read
 sm::make_derive(
 "secondary_index_rows_read",
 _cql_stats.secondary_index_rows_read,
-sm::description("Counts a total number of rows read during CQL requests performed using secondary indexes.")),
+sm::description("Counts the total number of rows read during CQL requests performed using secondary indexes.")),
 // read requests that required ALLOW FILTERING
 sm::make_derive(
 "filtered_read_requests",
 _cql_stats.filtered_reads,
-sm::description("Counts a total number of CQL read requests that required ALLOW FILTERING. See filtered_rows_read_total to compare how many rows needed to be filtered.")),
+sm::description("Counts the total number of CQL read requests that required ALLOW FILTERING. See filtered_rows_read_total to compare how many rows needed to be filtered.")),
 // rows read with filtering enabled (because ALLOW FILTERING was required)
 sm::make_derive(
 "filtered_rows_read_total",
 _cql_stats.filtered_rows_read_total,
-sm::description("Counts a total number of rows read during CQL requests that required ALLOW FILTERING. See filtered_rows_matched_total and filtered_rows_dropped_total for information how accurate filtering queries are.")),
+sm::description("Counts the total number of rows read during CQL requests that required ALLOW FILTERING. See filtered_rows_matched_total and filtered_rows_dropped_total for information how accurate filtering queries are.")),
 // rows read with filtering enabled and accepted by the filter
 sm::make_derive(
 "filtered_rows_matched_total",
 _cql_stats.filtered_rows_matched_total,
-sm::description("Counts a number of rows read during CQL requests that required ALLOW FILTERING and accepted by the filter. Number similar to filtered_rows_read_total indicates that filtering is accurate.")),
+sm::description("Counts the number of rows read during CQL requests that required ALLOW FILTERING and accepted by the filter. Number similar to filtered_rows_read_total indicates that filtering is accurate.")),
 // rows read with filtering enabled and rejected by the filter
-sm::description("Counts a number of rows read during CQL requests that required ALLOW FILTERING and dropped by the filter. Number similar to filtered_rows_read_total indicates that filtering is not accurate and might cause performance degradation.")),
+sm::description("Counts the number of rows read during CQL requests that required ALLOW FILTERING and dropped by the filter. Number similar to filtered_rows_read_total indicates that filtering is not accurate and might cause performance degradation.")),
 sm::make_derive(
 "select_bypass_caches",
 _cql_stats.select_bypass_caches,
 sm::description("Counts the number of SELECT statements with BYPASS CACHE option.")),
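Following the metric descriptions above, the matched and read counters can be combined into a single filtering-accuracy ratio. The sketch assumes the counter values have already been collected. Because these are derive (monotonic) counters, for a time window you would pass the deltas between two scrapes rather than raw totals. `filtering_counters` and `filtering_accuracy` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical snapshot of two counters (or their deltas over a window).
struct filtering_counters {
    std::uint64_t rows_read = 0;     // filtered_rows_read_total
    std::uint64_t rows_matched = 0;  // filtered_rows_matched_total
};

// Fraction of filtered rows that the filter accepted: near 1.0 means
// filtering is accurate; near 0.0 means most rows read were dropped.
std::optional<double> filtering_accuracy(const filtering_counters& c) {
    if (c.rows_read == 0) {
        return std::nullopt;  // no filtering reads in this window
    }
    return static_cast<double>(c.rows_matched) / static_cast<double>(c.rows_read);
}
```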
Some files were not shown because too many files have changed in this diff