According to the IETF spec uuid variant bits should be set to '10'. All
others are either invalid or reserved. The patch change the code to
follow the spec.
Closesscylladb/scylladb#27073
Our "sstring_view" is an historic alias for the standard std::string_view.
The patch changes the last remaining random uses of this old alias across
our source directory to the standard type name.
After this patch, there are no more uses of the "sstring_view" alias.
It will be removed in the following patch.
Refs #4062.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
schema_change_test currently fails due to failure to start a cql test
env in unit tests after the point where this is called (in one of the
test cases):
forward_jump_clocks(std::chrono::seconds(60*60*24*31));
The problem manifests with a failure to join the cluster due to
missing_column exception ("missing_column: done") being thrown from
system_keyspace::get_topology_request_state(). It's a symptom of
join request being missing in system.topology_requests. It's missing
because the row is expired.
When request is created, we insert the
mutations with intended TTL of 1 month. The actual TTL value is
computed like this:
ttl_opt topology_request_tracking_mutation_builder::ttl() const {
return std::chrono::duration_cast<std::chrono::seconds>(std::chrono::microseconds(_ts)) + std::chrono::months(1)
- std::chrono::duration_cast<std::chrono::seconds>(gc_clock::now().time_since_epoch());
}
_ts comes from the request_id, which is supposed to be a timeuuid set
from current time when request starts. It's set using
utils::UUID_gen::get_time_UUID(). It reads the system clock without
adding the clock offset, so after forward_jump_clocks(), _ts and
gc_clock::now() may be far off. In some cases the accumulated offset
is larger than 1month and the ttl becomes negative, causing the
request row to expire immediately and failing the boot sequence.
The fix is to use db_clock, which respects offsets and is consistent
with gc_clock.
The test doesn't fail in CI becuase there each test case runs in a
separate process, so there is no bootstrap attempt (by new cql test
env) after forward_jump_clocks().
Closesscylladb/scylladb#21558
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
[1] 66ef711d68Closesscylladb/scylladb#20006
In issue #17035 we had a situation where a certain input timestamp
could result in the create_time() utility function getting called on
a timestamp that cannot be represented as timeuuid, and this resulted
in an *assertion failure*, and a crash.
I guess we used an assertion because we believed that callers try to
avoid calling this function on excessively large timestamps, but
evidentally, they didn't tried hard enough and we got a crash.
The code in UUID_gen.hh changed a lot over the years and has become
very convoluted and it is almost impossible to understand all the
code paths that could lead to this assertion failures. So it's better
to replace this assertion by a on_internal_error, which by default
is just an exception - and also logs the backtrace of the failure.
Issue #17035 would have been much less serious if we had an exception
instead of an assert.
Refs #17035
Refs #7871, Refs #13970 (removes an assert)
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Our time-handling code in UUID_gen.hh is very fragile for very large
timestamps, because the different types - such as Cassandra "timestamp"
and Timeuuid use very different resolution and ranges.
In issue #17035 we discovered a situation where a certain CQL
"timestamp"-type value could cause an assertion-failure and a crash
in the create_time() function that creates a timeuuid - because that
timestamp didn't fit the place we have in timeuuid.
We already added in the past a limit, UUID_UNIXTIME_MAX, beyond which
we refuse timestamps, to avoid these assertions failure. However, we
missed the possibility of *negative* timestamps (which are allowed in
CQL), and indeed a negative timestamp (or a timestamp which was "wrapped"
to a negative value) is what caused issue #17035.
So this patch adds a second limit, UUID_UNIXTIME_MIN - limiting the
most negative timestamp that we support to well below the area which
causes problems, and adds tests that reproduce #17035 and that we
didn't break anything else (e.g., negative timestamps are still
allowed - just not extremely negative timestamps).
Fixes#17035.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In 10c1f1dc80 I fixed
`make_group0_history_state_id_mutation` to use correct timestamp
resolution (microseconds instead of milliseconds) which was supposed to
fix the flakiness of `test_group0_history_clearing_old_entries`.
Unfortunately, the test is still flaky, although now it's failing at a
later step -- this is because I was sloppy and I didn't adjust this
second part of the test to also use microsecond resolution. The test is
counting the number of entries in the `system.group0_history` table that
are older than a certain timestamp, but it's doing the counting using
millisecond resolution, causing it to give results that are off by one
sometimes.
Fix it by using microseconds everywhere.
Fixes#14653Closes#14670
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
A follow up for the patch for #7611. This change was requested
during review and moved out of #7611 to reduce its scope.
The patch switches UUID_gen API from using plain integers to
hold time units to units from std::chrono.
For one, we plan to switch the entire code base to std::chrono units,
to ensure type safety. Secondly, using std::chrono units allows to
increase code reuse with template metaprogramming and remove a few
of UUID_gen functions that beceme redundant as a result.
* switch get_time_UUID(), unix_timestamp(), get_time_UUID_raw(), switch
min_time_UUID(), max_time_UUID(), create_time_safe() to
std::chrono
* remove unused variant of from_unix_timestamp()
* remove unused get_time_UUID_bytes(), create_time_unsafe(),
redundant get_adjusted_timestamp()
* inline get_raw_UUID_bytes()
* collapse to similar implementations of get_time_UUID()
* switch internal constants to std::chrono
* remove unnecessary unique_ptr from UUID_gen::_instance
Message-Id: <20210406130152.3237914-2-kostja@scylladb.com>
Scylla list cells are represented internally as a map of
timeuuid => value. To append a new value to a list
the coordinator generates a timeuuid reflecting the current time as key
and adds a value to the map using this key.
Before this patch, Scylla always generated a timeuuid for a new
value, even if the query had a user supplied or LWT timestamp.
This could break LWT linearizability. User supplied timestamps were
ignored.
This is reported as https://github.com/scylladb/scylla/issues/7611
A statement which appended multiple values to a list or a BATCH
generated an own microsecond-resolution timeuuid for each value:
BEGIN BATCH
UPDATE ... SET a = a + [3]
UPDATE ... SET a = a + [4]
APPLY BATCH
UPDATE ... SET a = a + [3, 4]
To fix the bug, it's necessary to preserve monotonicity of
timeuuids within a batch or multi-value append, but make sure
they all use the microsecond time, as is set by LWT or user.
To explain the fix, it's first necessary to recall the structure
of time-based UUIDs:
60 bits: time since start of GMT epoch, year 1582, represented
in 100-nanosecond units
4 bits: version
14 bits: clock sequence, a random number to avoid duplicates
in case system clock is adjusted
2 bits: type
48 bits: MAC address (or other hardware address)
The purpose of clockseq bits is as defined in
https://tools.ietf.org/html/rfc4122#section-4.1.5
is to reduce the probability of UUID collision in case clock
goes back in time or node id changes. The implementation should reset it
whenever one of these events may occur.
Since LWT microsecond time is guaranteed to be
unique by Paxos, the RFC provisioning for clockseq and MAC
slots becomes excessive.
The fix thus changes timeuuid slot content in the following way:
- time component now contains the same microsecond time for all
values of a statement or a batch. The time is unique and monotonic in
case of LWT. Otherwise it's most always monotonic, but may not be
unique if two timestamps are created on different coordinators.
- clockseq component is used to store a sequence number which is
unique and monotonic for all values within the statement/batch.
- to protect against time back-adjustments and duplicates
if time is auto-generated, MAC component contains a random (spoof)
MAC address, re-created on each restart. The address is different
at each shard.
The change is made for all sources of time: user, generated, LWT.
Conditioning the list key generation algorithm on the source of
time would unnecessarily complicate the code while not increase
quality (uniqueness) of created list keys.
Since 14 bits of clockseq provide us with only 16383 distinct slots
per statement or batch, 3 extra bits in nanosecond part of the time
are used to extend the range to 131071 values per statement/batch.
If the rang is exceeded beyond the limit, an exception is produced.
A twist on the use of clockseq to extend timeuuid uniqueness is
that Scylla, like Cassandra, uses int8 compare to compare lower
bits of timeuuid for ordering. The patch takes this into account
and sign-complements the clockseq value to make it monotonic
according to the legacy compare function.
Fixes#7611
test: unit (dev)
Fixes#7424
AWS sdk (kinesis) assumes SequenceNumbers are monotonically
growing bigints. Since we sort on and use timeuuids are these
a "raw" bit representation of this will _not_ fulfill the
requirement. However, we can "unwrap" the timestamp of uuid
msb and give the value as timestamp<<64|lsb, which will
ensure sort order == bigint order.
Safely convert millis to "nanos_since" (number of 100
nanseconds since START_EPOCH) while type casting to uint64_t
to avoid possible int overflow.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
UUID_gen::create_time_safe() does not synchronize across cores. The
comment says that it assumes it runs on a single core. This is no
longer true, we can run urchin on many cores. This easily leads to
UUID conflicts with more than one core. Fix by adding a per-core
unique number to the node part of the UUID.
Convert Cassandra's UUIDGen class, which generates time-dependent UUID,
and parts of the java.util.UUID which I thought we need, to C++.
It is possible I missed some needed features of java.util.UUID that we'll
need to add later.
Also, part of the version-1 UUID is supposed to be node-unique (so that
if two nodes happen to boot at the same time and get a UUID at exactly
the same time, they still get different UUIDs). Cassandra uses for this
a hash function of the IP address, we should use in the future the MAC
address (from Seastar's network stack). But currently we just use 0.
Left a FIXME to fix that.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
[avi: add to ./configure.py]