Fixes#6459
When moving or removing endpoints, we should ensure
that the set of available racks reflect the nodes
known, i.e. match what would be the result of a
reboot + create sets initially.
Message-Id: <20200519153300.15391-1-calle@scylladb.com>
Although Python 2 is deprecated, some systems today still have "python"
and "pytest" pointing to Python 2, so it would be convenient for the
Alternator tests to work on both Python 2 and 3 if it's not too much
of an effort.
And it really isn't too much of an effort - they all work on both versions
except for one problem introduced in the previous test patch: The syntax b''
for an empty byte array works correctly on Python 3 but incorrectly on
Python 2: In Python 2, b'' is just a normal empty string, not byte array,
which confuses Boto3 which refuses to accept a string as a value for a
byte-array key.
The trivial fix is to replace b'' by bytearray('', 'utf-8').
Uglier, but works as expected on both Python 2 and 3.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200519214321.25152-1-nyh@scylladb.com>
* seastar 92365e7b8...ee516b1cc (17):
> build: use -fcommon compiler flag for dpdk
> coroutines: reduce template bloat
> thread: make async noexcept
> file: specify methods noexcept
> doc: drop grace period for old C++ standard revisions
> semaphore: specify consume_units as noexcept
> doc/tutorial.md: add short intro to seastar::sharded<>
> future: Move promise_base move constructor out of line
> coroutines: enable for C++20
> tutorial: adjust evaluation order warning to note it is C++14-only
> rpc_test: Fix test_stream_connection_error with valgrind
> file: Remove unused lambda capture
> install-dependencies: add valgrind to arch
> coroutines_test: Don't access a destroyed lambda
> tutorial: warn about evaluation order pitfall
> merge: apps: improvements in httpd and seawreck
> file: Move functions out of line
The Alternator test (test/alternator/run) runs the real Scylla executable
to test it. Users sometimes want to run Scylla manually in parallel (on
different IP addresses, of course) and sometimes use commands like
"killall scylla" to stop it, may be surprised that this command will also
unintentionally kill a running test.
So what this patch does is to name the Scylla process used for the test
with the name "test_scylla". It will be visible as "test_scylla" in top,
and a "killall scylla" will not touch it. You can, of course, kill it with
a "killall test_scylla" if you wish.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200519071604.19161-1-nyh@scylladb.com>
According to DynamoDB, string/binary blob keys cannot be empty
and this definition affects secondary indexes as well.
As a result, only nonempty strings/binary blobs are accepted
as values for columns which form a GSI or LSI key.
Clarify in README.md that the instructions there will build a Docker image
containing a Scylla executable downloaded from downloads.scylla.com - NOT
the one you built yourself. The image is also CentOS based - not Fedora-based
as claimed.
In addition, a new dist/docker/redhat/README.md explains the somewhat
steps needed to actually build a Docker image with the Scylla executable
that you built. In the future, these steps should be automated (e.g.,
"ninja docker") but until then, let's at least document the process.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200518151123.11313-1-nyh@scylladb.com>
C++20 deprecates capturing this in default-copy lambdas ([=]), with
good reason. Move to explicit captures to avoid any ambiguity and
reduce warning spew.
Message-Id: <20200517150834.753463-1-avi@scylladb.com>
CDC Log is a time series which uses time window compaction with some
time window. Data is TTLed with the same value. This means that sstable
won't become fully expired more often than once per time window
duration.
This patch sets expired_sstable_check_frequency_seconds compaction
strategy parameter to half of the time window. Default value of this
parameter is 10 minutes which in most cases won't be a good fit.
By default, we set TTL to 24h and time window to 1h. This means that
with a default value of the parameter we would be checking every 10
minutes but new expired sstable would appear only every 60 minutes.
The parameter is set to half of the time window duration because it's
the expected time we have to wait for sstable to become fully expired.
Half of the time we will wait longer and half of the time we will wait
shorter.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
LWT batches conditions can't span multiple tables.
This was detected in batch_statement::validate() called in ::prepare().
But ::cas_result_set_metadata() was built in the constructor,
causing a bitset assert/crash in a reported scenario.
This patch moves validate() to the constructor before building metadata.
Closes#6332
Tested with https://github.com/scylladb/scylla-dtest/pull/1465
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
C++20 deprecates capturing this in default-copy lambdas ([=]), with
good reason. Move to explicit captures to avoid any ambiguity and
reduce warning spew.
Message-Id: <20200517150921.754073-1-avi@scylladb.com>
C++20 deprecates capturing this in default-copy lambdas ([=]), with
good reason. Move to explicit captures to avoid any ambiguity and
reduce warning spew.
Message-Id: <20200517151023.754906-1-avi@scylladb.com>
In a recent next failure I got the following backtrace
#3 0x00007efd71251a66 in __GI___assert_fail (assertion=assertion@entry=0x2d0c00 "this->_con->get()->sink_closed()", file=file@entry=0x32c9d0 "./seastar/include/seastar/rpc/rpc_impl.hh", line=line@entry=795,
function=function@entry=0x270360 "seastar::rpc::sink_impl<Serializer, Out>::~sink_impl() [with Serializer = netw::serializer; Out = {repair_row_on_wire_with_cmd}]") at assert.c:101
#4 0x0000000001f5d2c3 in seastar::rpc::sink_impl<netw::serializer, repair_row_on_wire_with_cmd>::~sink_impl (this=<optimized out>, __in_chrg=<optimized out>) at ./seastar/include/seastar/core/future.hh:312
#5 0x0000000001f5d2f4 in seastar::shared_ptr_count_for<seastar::rpc::sink_impl<netw::serializer, repair_row_on_wire_with_cmd> >::~shared_ptr_count_for (this=0x60100075b680, __in_chrg=<optimized out>)
at ./seastar/include/seastar/core/shared_ptr.hh:463
#6 seastar::shared_ptr_count_for<seastar::rpc::sink_impl<netw::serializer, repair_row_on_wire_with_cmd> >::~shared_ptr_count_for (this=0x60100075b680, __in_chrg=<optimized out>) at ./seastar/include/seastar/core/shared_ptr.hh:463
#7 0x000000000240f2e6 in seastar::shared_ptr<seastar::rpc::sink<repair_row_on_wire_with_cmd>::impl>::~shared_ptr (this=0x601003118590, __in_chrg=<optimized out>) at ./seastar/include/seastar/core/future.hh:427
#8 seastar::rpc::sink<repair_row_on_wire_with_cmd>::~sink (this=0x601003118590, __in_chrg=<optimized out>) at ./seastar/include/seastar/rpc/rpc_types.hh:270
#9 <lambda(auto:134&)>::<lambda(const seastar::rpc::client_info&, uint64_t, seastar::rpc::source<repair_hash_with_cmd>)>::<lambda(std::__exception_ptr::exception_ptr)>::~<lambda> (this=0x601003118570, __in_chrg=<optimized out>)
at repair/row_level.cc:2059
This patch changes a few functions to use finally to make sure the sink
is always closed.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200515202803.60020-1-espindola@scylladb.com>
Some statements made in docs/alternator/alternator.md on having a single
keyspace, or recommending a DNS setup, are not up-to-date. So fix them.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200517132444.9422-1-nyh@scylladb.com>
The test/alternator/run script starts Scylla to be tested. It waits until
CQL is responsive and if Scylla dies earlier, recognizes the failure
immediately. This is useful so we see boot errors immediately instead of
waiting for the first test to timeout and fail.
However, Scylla starts the Alternator service after CQL. So it is possible
that after the "run" script found CQL to be up, Alternator couldn't start
(e.g., bad configuration parameters) and Scylla is shut down, and instead
of recognizing this situation, we start the actual test.
The fix is simple: don't start the tests until verifying that Alternator
is up. We verify this using the trivial healthcheck request (which is
nothing more than an HTTP GET request).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200517125851.8484-1-nyh@scylladb.com>
The instructions in README.md about building a docker image start with
"cd dist/docker", but it actually needs to be "cd dist/docker/redhat".
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200517152815.15346-1-nyh@scylladb.com>
"
This series changes the describe_ring API to use HTTP stream instead of serializing the results and send it as a single buffer.
While testing the change I hit a 4-year-old issue inside service/storage_proxy.cc that causes a use after free, so I fixed it along the way.
Fixes#6297
"
* amnonh-stream_describe_ring:
api/storage_service.cc: stream result of token_range
storage_service: get_range_to_address_map prevent use after free
The get token range API can become big which can cause large allocation
and stalls.
This patch replace the implementation so it would stream the results
using the http stream capabilities instead of serialization and sending
one big buffer.
Fixes#6297
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
The implementation of get_range_to_address_map has a default behaviour,
when getting an empty keypsace, it uses the first non-system keyspace
(first here is basically, just a keyspace).
The current implementation has two issues, first, it uses a reference to
a string that is held on a stack of another function. In other word,
there's a use after free that is not clear why we never hit.
The second, it calls get_non_system_keyspaces twice. Though this is not
a bug, it's redundant (get_non_system_keyspaces uses a loop, so calling
that function does have a cost).
This patch solves both issues, by chaning the implementation to hold a
string instead of a reference to a string.
Second, it stores the results from get_non_system_keyspaces and reuse
them it's more efficient and holds the returned values on the local
stack.
Fixes#6465
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
In add40d4e59, we relaxed the prohibition of unbounded DELETE and
stopped testing the failure message. But there are still scenarios
when unbounded DELETE is prohibited, so add a test to ensure we
continue to catch it where appropriate.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
"
The shutdown process of compaction manager starts with an explicit call
from the database object. However that can only happen everything is
already initialized. This works well today, but I am soon to change
the resharding process to operate before the node is fully ready.
One can still stop the database in this case, but reshardings will
have to finish before the abort signal is processed.
This patch passes the existing abort source to the construction of the
compaction_manager and subscribes to it. If the abort source is
triggered, the compaction manager will react to it firing and all
compactions it manages will be stopped.
We still want the database object to be able to wait for the compaction
manager, since the database is the object that owns the lifetime of
the compaction manager. To make that possible we'll use a future
that is return from stop(): no matter what triggered the abort, either
an early abort during initial resharding or a database-level event like
drain, everything will shut down in the right order.
The abort source is passed to the database, who is responsible from
constructing the compaction manager
Tests: unit (debug), manual start+stop, manual drain + stop, previously
failing dtests.
"
Fixed-size integer types are legal varints - both are serialized as
two's complement in network byte order. So there's tinyint, shortint,
int, and bigint can be interpreted as varints.
Change is_compatible_with() to reflect that.
Message-Id: <20200516115143.28690-2-avi@scylladb.com>
The short and byte types are two's complement network byte order,
just like varint (except fixed size) and so varint can read them
just fine.
Mark them as value compatible like int32_type and long_type.
A unit test is added.
Message-Id: <20200516115143.28690-1-avi@scylladb.com>
This avoids potential use-after-move, since undefined c++ sequencing order
may std::move(f) in the lambda capture before evaluating f.stat().
Also, this makes use of a more generic library function that doesn't
require to open and hold on to the file in the application.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200514152054.162168-1-bhalevy@scylladb.com>
Consider: n1, n2, n1 is the repair master, n2 is the repair follower.
=== Case 1 ===
1) n1 sends missing rows {r1, r2} to n2
2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1
is written to sstable, r2 is not written yet, r1 belongs to
partition 1, r2 belongs to partition 2. It yields after row r1 is
written.
data: partition_start, r1
3) n1 sends repair_row_level_stop to n2 because error has happened on n1
4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream()
data: partition_start, r1, partition_end
5) Step 2 resumes to apply the rows.
data: partition_start, r1, partition_end, partition_end, partition_start, r2
=== Case 2 ===
1) n1 sends missing rows {r1, r2} to n2
2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1
is written to sstable, r2 is not written yet, r1 belongs to partition
1, r2 belongs to partition 2. It yields after partition_start for r2
is written but before _partition_opened is set to true.
data: partition_start, r1, partition_end, partition_start
3) n1 sends repair_row_level_stop to n2 because error has happened on n1
4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream().
Since _partition_opened[node_idx] is false, partition_end is skipped,
end_of_stream is written.
data: partition_start, r1, partition_end, partition_start, end_of_stream
This causes unbalanced partition_start and partition_end in the stream
written to sstables.
To fix, serialize the write_end_of_stream and apply_rows with a semaphore.
Fixes: #6394Fixes: #6296Fixes: #6414
The Redis API in Scylla only supports a small subset of the Redis
commands. Let's document what we support so people have the right
expectations when they try it out.
Avoid `f(s).then([s = std::move(s)] {})` patterns,
where the move into the lambda capture may potentially be
sequenced by the compiler before passing `s` to function `f`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200514131701.140046-1-bhalevy@scylladb.com>
The existing text did not explain what happens if additional DCs are added
to the cluster, so this patch improves the explanation of the status of
our support for global tables, including that issue.
Fixes#6353
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200513175908.21642-1-nyh@scylladb.com>
The shutdown process of compaction manager starts with an explicit call
from the database object. However that can only happen everything is
already initialized. This works well today, but I am soon to change
the resharding process to operate before the node is fully ready.
One can still stop the database in this case, but reshardings will
have to finish before the abort signal is processed.
This patch passes the existing abort source to the construction of the
compaction_manager and subscribes to it. If the abort source is
triggered, the compaction manager will react to it firing and all
compactions it manages will be stopped.
We still want the database object to be able to wait for the compaction
manager, since the database is the object that owns the lifetime of
the compaction manager. To make that possible we'll use a future
that is return from stop(): no matter what triggered the abort, either
an early abort during initial resharding or a database-level event like
drain, everything will shut down in the right order.
The abort source is passed to the database, who is responsible from
constructing the compaction manager.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We want stop() to be callable just once. Having the compaction manager
stopped twice is a potential indication that something is wrong.
Still there are places where we want to stop all ongoing compactions
and prevent new from running - like the drain operation. Today the
only operation that allows for cancellation of all existing compations
is stop(). To unweave this, we will split those two things.
A drain operation is carved out, and it should be safe to be called many
times. The compaction manager is usable after this, and new compactions
can even be sent if it happen to be enabled again (we currently don't)
A stop operation, which includes a drain, will only be allowed once. After
a stop() the compaction_manager object is no longer usable.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We are having many issues with the stop code in the compaction_manager.
Part of the reason is that the "stopped" state has its meaning overloaded
to indicate both "compaction manager is not accepting compactions" and
"compaction manager is not ready or destructed".
In a later step we could default to enabled-at-start, but right now we
maintain current behavior to minimize noise.
It is only possible to stop the compaction manager once.
It is possible to enable / disable the compaction manager many times.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/6427
by Piotr Jastrzębski:
CDC Log is a time series so it makes sense to use time window compaction
strategy for it.
Our support for time series is limited so we make sure that we don't create
more than 24 sstables.
If TTL is configured to 0, meaning data does not expire, we don't use time
window compaction strategy.
This PR also sets gc_grace_seconds to 0 when TTL is not set to 0.
Print the test command line and the UBSAN and ASAN env settings to the log
so the run can be easily reproduced (optionally with providing --random-seed=XXX
that is printed by scylla unit tests when they start).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200513110959.32015-1-bhalevy@scylladb.com>
After commit 88d2486fca, removal of shared SSTables is not atomic anymore.
They can be first removed from the list of shared SSTables and only later be
removed from the SSTable set. That list is used to filter out shared SSTables
from regular compaction candidates.
So it can happen that regular compaction pick up a shared SSTable as candidate
after it was removed from that list but before it was removed from the set.
To fix this, let's only remove a shared SSTable from that aforementioned list
after it was successfully removed from the SSTable set, so that a shared
SSTable cannot be selected for regular compaction anymore.
Fixes#6439.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200512175224.114487-1-raphaelsc@scylladb.com>
C++20 makes string literals defined with u8"my string" as using
a new type char8_t. This is sensible, as plain char might not
have 8 bits, but conflicts with our bytes type.
Adjust by having overloads that cast back to char*. This limits
us to environments where char is 8 bits, but this is already a
restriction we have.
Reviewed-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <20200512101646.127688-1-avi@scylladb.com>