The deletable_row accepts clustering_row in constructor and
.apply() method. The next patch will make clustering_row
embed the deletable_row inside, so those two methods will
violate layering and should be fixed in advance.
The fix is in providing a clustering_row method to convert
itself into a deletable_row. There are two places that need
this: mutation_fragment_applier and partition_snapshot_row_cursor.
Both methods pass temporary clustering_row value, so the
method in question is also move-converter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Changing a user type may allow adding apparently duplicate rows to
tables where this type is used in a partitioning key. Fix by checking
all types of existing partitioning columns before allowing to add new
fields to the type.
Fixes#6941
The log-structured allocator (LSA) reserves memory when performing
operations, since its operations are performed with reclaiming disabled
and if it runs out, it cannot evict cache to gain more. The amount of
memory to reserve is remembered across calls so that it does not have
to repeat the fail/increase-reserve/retry cycle for every operation.
However, we currently lack decaying the amount to reserve. This means
that if a single operation increased the reserve in the distant past,
all current operations also require this large reserve. Large reserves
are expensive since they can cause large amounts of cache to be evicted.
This patch adds reserve decay. The time-to-decay is inversely proportional
to reserve size: 10GB/reserve. This means that a 20MB reserve is halved
after 500 operations (10GB/20MB) while a 20kB reserve is halved after
500,000 operations (10GB/20kB). So large, expensive reserves are decayed
quickly while small, inexpensive reserves are decayed slowly to reduce
the risk of allocation failures and exceptions.
A unit test is added.
Fixes#325.
This patch adds support for 2 hash commands HDEL and HGETALL.
Internally it introduces the hashes_result_builder class to
read hashes and stored them in a std::map.
Other changes:
- one exception return string was fixed
- tests now use pytest.raises
Signed-off-by: Etienne Adam <etienne.adam@gmail.com>
Message-Id: <20200907202528.4985-1-etienne.adam@gmail.com>
Due to #7175, microseconds are stored in a db_clock::time_point
as if they were milliseconds.
std::chrono::duration_cast<std::chrono::nanoseconds> may cause overflow
and end up with invalid/negative nanos.
This change specializes time_point_to_string to std::chrono::milliseconds
since it's currently only called to print db_clock::time_point
and uses boost::posix_time::milliseconds to print the count.
This would generate an exception in today's time stamps
and the output will look like:
1599493018559873 milliseconds (Year is out of valid range: 1400..9999)
instead of:
1799-07-16T19:57:52.175010
It is preferrable to print the numeric value annotated as out of valid range
than to print a bogus date in the past.
Test: unit(dev), commitlog_test:TestCommitLog.test_mixed_mode_commitlog_same_partition_smp_1
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200907162845.147477-1-bhalevy@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/7160
By Calle Wilund:
Stream descriptions, as returned from create, update and describe stream was missing "latest" stream arn.
Shard descriptions sequence number (for us timeuuid:s) were formatted wrongly. The spec states they should be numeric only.
Both these defects break Kinesis operations.
alternator: Set CDC delta to keys only for alternator streams
alternator: Include stream spec in desc for create/update/describe
alternator: Include LatestStreamLabel in resulting desc for create/update table
alternator: Make "StreamLabel" an iso8601 timestamp
alternator: Alloc BILLING_MODE in update_table
cdc: Add setter for delta mode
alternator: Fix sequence number range using wrong format
alternator: Include stream arn in table description if enabled
Add new validate_with_error_position function
which returns -1 if data is a valid UTF-8 string
or otherwise a byte position of first invalid
character. The position is added to exception
messages of all UTF-8 parsing errors in Scylla.
validate_with_error_position is done in two
passes in order to preserve the same performance
in common case when the string is valid.
Fixes#7190
Since we don't use any delta value when translating cdc -> streams
it is wasteful to write these to the log table, esp. since we already
write big fat pre- and post images.
Fixes#7163
If enabled, the resulting table description should include a
StreamDescription object with the appropriate members describing
current stream settings.
Previous patches removed those `BOOST_REQUIRE*` macros that could be
invoked from shards other than 0. The reason is that said macros are not
thread-safe, so calling them from multiple shards produces mangled
output to stdout as well as the XML report file. It was assumed that
only these invocations -- from a non-0 shard -- are problematic, but it
turns out even these can race with seastar log messages emitted from
other shards. This patch removes all such macros, replacing them with
the thread safe `require*` functions from `test/lib/test_utils.hh`.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200907125309.1199104-1-bdenes@scylladb.com>
Fixes#7158
A streams shard descriptions has a sequence range describing start/end
(if available) of the shard. This is specified as being "numeric only".
Alternator incorrectly used UUID here, which breaks kinesis.
v2:
* Fix uint128_t parsing from string. bmp::number constructor accepted
sstring, but did not interpret it as std::string/chars. Weird results.
As seen in https://github.com/scylladb/scylla/issues/7175,
1e676cd845
that was merged in bc77939ada
exposed a preexisting problem in time_point_to_string
where it tried printing a timestamp that was in microseconds
(taken from an api::timestamp_type instead of db_clock::time_point)
and hit `boost::wrapexcept<boost::gregorian::bad_year> (Year is out of valid range: 1400..9999)`
If hit, this patch with print the offending time_stamp in
nanoseconds and the error message.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200907083303.33229-1-bhalevy@scylladb.com>
Fixes#7157
When creating/altering/describing a table, if streams are enabled, the
"latest active" stream arn should be included as LatestStreamArn.
Not doing so breaks java kinesis.
This improves the build documentation beyond just packaging:
- Explain how the configure.py step works
- Explain how to build just executables and tests (for development)
- Explain how to build for specific build mode if you didn't specify a
build mode in configure.py step
- Fix build artifact locations, missing .debs, and add executables and
tests
Message-Id: <20200904084443.495137-1-penberg@iki.fi>
T& tp may have other period than milliseconds.
Cast the time_point duration to nanoseconds (or microseconds
if boost doesn't supports it) so it is printed in the best possible
resolution.
Note that we presume that the time_point epoch is the
Unix epoch of 1970-01-01, but the c++ standard doesn't guwarntee
that. See https://github.com/scylladb/scylla/issues/5498
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200906171106.690872-1-bhalevy@scylladb.com>
Documentation states that `SCYLLA_LWT_OPTIMIZATION_META_BIT_MASK`
is a 32-bit integer that represents bit mask. What it fails to mention
is that it's a unsigned value and in fact it takes value of 2147483648.
This is problematic for clients in languages that don't have unsigned
types (like Java).
This patch improves the documentation to make it clear that
`SCYLLA_LWT_OPTIMIZATION_META_BIT_MASK` is represented by unsigned
value.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <7166b736461ae6f3d8ffdf5733e810a82aa02abc.1599382184.git.piotr@scylladb.com>
statement_restrictions::process_partition_key_restrictions() was
checking has_unrestricted_components(), whereas just an empty() check
suffices there, because has_unrestricted_components() is implicitly
checked five lines down by needs_filtering().
The replacement check is cheaper and simpler to understand.
Tests: unit (dev)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Previously batch statement result set included rows for only
those updates which have a prefetch data present (i.e. there
was an "old" (pre-existing) row for a key).
Also, these rows were sorted not in the order in which statements
appear in the batch, but in the order of updated clustering keys.
If we have a batch which updates a few non-existent keys, then
it's impossible to figure out which update inserted a new key
by looking at the query response. Not only because the responses
may not correspond to the order of statements in the batch, but
even some rows may not show up in the result set at all.
Please see #7113 on Github for detailed description
of the problem:
https://github.com/scylladb/scylla/issues/7113
The patch set proposes the following fix:
For conditional batch statements the result set now always
includes a row for each LWT statement, in the same order
in which individual statements appear in the batch.
This way we can always tell which update did actually insert
a new key or update the existing one.
Technically, the following changes were made:
* `update_parameters::prefetch_data::row::is_in_cas_result_set`
member removed as well as the supporting code in
`cas_request::applies_to` which iterated through cas updates
and marked individual `prefetch_data` rows as "need to be in
cas result set".
* `cas_request::applies_to` substantially simplified since it
doesn't do anything more than checking `stmt.applies_to()`
in short-circuiting manner.
* `modification_statement::build_cas_result_set` method moved
to `cas_request`. This allows to easily iterate through
individual `cas_row_update` instances and preserve the order
of the rows in the result set.
* A little helper `cas_request::find_old_row`
is introduced to find a row in `prefetch_data` based on the
(pk, ck) combination obtained from the current `cas_request`
and a given `cas_row_update`.
* A few tests for the issue #7113 are written, other lwt-batch-related
tests adjusted accordingly.
Previously batch statement result set included rows for only
those updates which have a prefetch data present (i.e. there
was an "old" (pre-existing) row for a key).
Also, these rows were sorted not in the order in which statements
appear in the batch, but in the order of updated clustering keys.
If we have a batch which updates a few non-existent keys, then
it's impossible to figure out which update inserted a new key
by looking at the query response. Not only because the responses
may not correspond to the order of statements in the batch, but
even some rows may not show up in the result set at all.
The patch proposes the following fix:
For conditional batch statements the result set now always
includes a row for each LWT statement, in the same order
in which individual statements appear in the batch.
This way we can always tell which update did actually insert
a new key or update the existing one.
`update_parameters::prefetch_data::row::is_in_cas_result_set`
member variable was removed as well as supporting code in
`cas_request::applies_to` which iterated through cas updates
and marked individual `prefetch_data` rows as "need to be in
cas result set".
Instead now `cas_request::applies_to` is significantly
simplified since it doesn't do anything more than checking
`stmt.applies_to()` in short-circuiting manner.
A few tests for the issue are written, other lwt-batch-related
tests were adjusted accordingly to include rows in result set
for each statement inside conditional batches.
Tests: unit(dev, debug)
Co-authored-by: Konstantin Osipov <kostja@scylladb.com>
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
This is just a plain move of the code from `modification_statement`
to `cas_request` without changes in the logic, which will further
help to refactor `build_cas_result_set` behavior to include a row
for each LWT statement and order rows in the order of statements
in a batch.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Factor out little helper function which finds a pre-existing
row for a given `cas_row_update` (matching the primary key).
Used in `cas_request::applies_to`.
Will be used in a subsequent patch to move
`modification_statement::build_cas_result_set` into `cas_request`.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
We use a lot of TLS variables yet GDB is not of much help when working
with these. So in this patch I document where they are located in
memory, how to calculate the address of a known TLS variable and how to
find (identify) one given an address.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200903131802.1068288-1-bdenes@scylladb.com>
* seastar 7f7cf0f232...4ff91c4c3a (47):
> core/reactor: complete_timers(): restore previous scheduling group
Fixes#7117.
> future: Drop spurious 'mutable' from a lambda
> future: Don't put a std::tuple in future_state
> future: Prepare for changing the future_state storage type
> future: Add future_state::get0()
> when_all: Replace another untuple with get0
> when_all: Use get0 instead of untuple
> rpc: Don't assume that a future stores a std::tuple
> future: Add a futurize_base
> testing: Stop boost from installing its own signal handler
> future: Define futurize::value_type with future::value_type
> future: Move futurize down the file
> Merge "Put logging onto {fmt} rails" from Pavel E
> Merge "Make future_state non variadic" from Rafael
> sharded: propagate sharded instances stop error
> log: Fix misprint in docs
> future: Use destroy_at in destructor
> future: Add static_assert rejecting variadic future_state
> futures_test: Drop variadic test
> when_all: Drop a superfluous use of futurize::from_tuples
> everywhere: Use future::get0 when appropriate
> net: upgrade sockopt comments to doxygen
> iostream: document the read_exactly() function
> net:tls: fix clang-10 compilation duration cast
> future: Move continuation_base_from_future to future.hh
> repeat: Drop unnecessary make_tuple
> shared_future: Don't use future_state_type
> future: Add a static_assert against variadic futures
> future: Delete warn_variadic_future
> rpc_demo: Don't use a variadic future
> rpc_test: Don't use a variadic future
> futures_test: Don't use a variadic future
> future: Move disable_failure_guard to promise::schedule
> net: add an interface for custom socket options
> posix: add one more setsockopt overload
> Merge "Simplify pollfn and its inheritants" from Pavel E
> util:std-compat.hh: add forward declaration of std::pmr for clang-10
> rpc: Add protocol::has_handlers() helper
> Add a Seastar_DEBUG_ALLOCATIONS build option
> futures_test: Add a test for futures of references
> future: Simplify destruction of future_state
> Use detect_stack_use_after_return=1
> repeat: Fix indentation
> repeat: Delete try/catch
> repeat: Simplify loop
> Avoid call to std::exception_ptr's destructor from a .hh
> file: Add missing include
"
Before seastar is updated with the {fmt} engine under the
logging hood, some changes are to be made in scylla to
conform to {fmt} standards.
Compilation and tests checked against both -- old (current)
and new seastar-s.
tests: unit(dev), manual
"
* 'br-logging-update' of https://github.com/xemul/scylla:
code: Force formatting of pointer in .debug and .trace
code: Format { and } as {fmt} needs
streaming: Do not reveal raw pointer in info message
mp_row_consumer: Provide hex-formatting wrapper for bytes_view
heat_load_balance: Include fmt/ranges.h
This patch adds a test for the TRIM_HORIZON option of GetShardIterator in
Alternator Streams. This option asks to fetch again *all* the available
history in this shard stream. We had an implementation for it, but not a
test - so this patch adds one. The test passes.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200830131458.381350-1-nyh@scylladb.com>
Alternator Streams already support the AT_SEQUENCE_NUMBER and
AFTER_SEQUENCE_NUMBER options for iterators. These options allow to replay
a stream of changes from a known position or after that known position.
However, we never had a test verifying that these features actually work
as intended, beyond just checking syntax. Having such tests is important
because recently we changed the implementation of these iterators, but
didn't have a test verifying that they still work.
So in this patch we add such tests. The tests pass (as usual, on both
Alternator and DynamoDB).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200830115817.380075-1-nyh@scylladb.com>
We had a test, test_streams_last_result, that verifies that after reading from
an Alternator Stream the last event, reading again will find nothing.
But we didn't actually have a test which checks that if at that point a new event
*does* arrive, we can read it. This test checks this case, and it passes (we don't
have a bug there, but it's good as a regression test for NextShardIterator).
This test also verifies that after reading an event for a particular key on a
a specific stream "shard", the next event for the same key will arrive on the
same shard.
This test passes on both Alternator and DynamoDB.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200830105744.378790-1-nyh@scylladb.com>
A compaction strategy, that supports parallel compaction, may want to know
if the table has compaction running on its behalf before making a decision.
For example, a size-tiered-like strategy may not want to trigger a behavior,
like cross-tier compaction, when there's ongoing compaction.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200901134306.23961-1-raphaelsc@scylladb.com>
If available, use the recently added
`reader_concurrency_semaphore::_initial_resources` to calculate the
amount of memory used out of the initially configured amount. If not
available, the summary falls back to the previous mode of just printing
the remaining amount of memory.
Example:
Replica:
Read Concurrency Semaphores:
user sstable reads: 11/100, 263621214/ 42949672 B, queued: 847
streaming sstable reads: 0/ 10, 0/ 42949672 B, queued: 0
system sstable reads: 1/ 10, 251584/ 42949672 B, queued: 0
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200901091452.806419-1-bdenes@scylladb.com>
This patch fixes a bug which caused sporadic failures of the Alternator
test - test_streams.py::test_streams_last_result.
The GetRecords operation reads from an Alternator Streams shard and then
returns an "iterator" from where to continue reading next time. Because
we obviously don't want to read the same change again, we "incremented"
the current position, to start at the incremented position on the next read.
Unfortunately, the implementation of the increment() function wasn't quite
right. The position in the CDC log is a timeuuid, which has a really bizarre
comparison function (see compare_visitor in types.cc). In particular the
least-sigificant bytes of the UUID are compared as *signed* bytes. This
means that if the last byte of the UUID was 127, and increment() increased
it to 128, and this was wrong because the comparison function later deemed
that as a signed byte, where 128 is lower than 127, not higher! The result
was that with 1/256 probability (whenever the last byte of the position was
127) we would return an item twice. This was reproduced (with 1/256
probability) by the test test_streams_last_result, as reported in issue #7004.
The fix in this patch is to drop the increment() and replace it by a flag
whether an iterator is inclusive of the threshold (>=) or exclusive (>).
The internal representation of the iterator has a boolean flag "inclusive",
and the string representation uses the prefixes "I" or "i" to indicate an
inclusive or exclusive range, respectively - whereas before this patch we
always used the prefix "I".
Although increment() could have been fixed to work correctly, the result would
have been ugly because of the weirdness of the timeuuid comparison function.
increment() would also require extensive new unit-tests: we were lucky that
the high-level functional tests caught a 1 in 256 error, but they would not
have caught rarer errors (e.g., 1 in 2^32). Furthermore, I am looking at
Alternator as the first "user" of CDC, and seeing how complicated and
error-prone increment() is, we should not recommend to users to use this
technique - they should use exclusive (>) range queries instead.
Fixes#7004.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200901102718.435227-1-nyh@scylladb.com>
Users can set python3 and sysconfdir from cmdline of install.sh
according to the install mode (root or nonroot) and distro type.
It's helpful to correct the default python3/sysconfdir, otherwise
setup scripts or scylla-jmx doesn't work.
Fixes#7130
Signed-off-by: Amos Kong <amos@scylladb.com>
If the install.sh is executed by sudo, then the tmp scylla.yaml is owned
by root. It's difficult to overwrite it by non-privileged users.
Signed-off-by: Amos Kong <amos@scylladb.com>
Commit a6ad70d3da changed the format of
stream IDs: the lower 8 bytes were previously generated randomly, now
some of them have semantics. In particular, the least significant byte
contains a version (stream IDs might evolve with further releases).
This is a backward-incompatible change: the code won't properly handle
stream IDs with all lower 8 bytes generated randomly. To protect us from
subtle bugs, the code has an assertion that checks the stream ID's
version.
This means that if an experimental user used CDC before the change and
then upgraded, they might hit the assertion when a node attempts to
retrieve a CDC generation with old stream IDs from the CDC description
tables and then decode it.
In effect, the user won't even be able to start a node.
Similarly as with the case described in
d89b7a0548, the simplest fix is to rename
the tables. This fix must get merged in before CDC goes out of
experimental.
Now, if the user upgrades their cluster from a pre-rename version, the
node will simply complain that it can't obtain the CDC generation
instead of preventing the cluster from working. The user will be able to
use CDC after running checkAndRepairCDCStreams.
Since a new table is added to the system_distributed keyspace, the
cluster's schema has changed, so sstables and digests need to be
regenerated for schema_digest_test.
Merged pull request https://github.com/scylladb/scylla/pull/7121
By Calle Wilund:
Refs #7095Fixes#7128
CDC delta!=full both relied on post-filtering
to remove generated log row and/or cells. This is inefficient.
Instead, simply check if the data should be created in the
visitors.
Also removed delta_mode=off mode.
cdc: Remove post-filterings for keys-only/off cdc delta generation
cdc: Remove cdc delta_mode::off
Refs #7095
CDC delta!=full both relied on post-filtering
to remove generated log row and/or cells. This is inefficient.
Instead, simply check if the data should be created in the
visitors.
v2:
* Fixed delta logs rows created (empty) even when delta == off
v3:
* Killed delta == off
v4:
* Move checks into (const) member var(s)