when we convert timestamp into string it must look like: '2017-12-27T11:57:42.500Z'
it concerns any conversion except JSON timestamp format
JSON string has space as time separator and must look like: '2017-12-27 11:57:42.500Z'
both formats always contain milliseconds and timezone specification
Fixes#14518Fixes#7997Closes#14726Fixes#16575
(cherry picked from commit ff721ec3e3)
Closes#18852
We have had support for COUNTER columns for quite some time now, but some functionality was left unimplemented - various internal and CQL functions resulted in "unimplemented" messages when used, and the goal of this series is to fix those issues. The primary goal was to add the missing support for CASTing counters to other types in CQL (issue #14501), but we also add the missing CQL `counterasblob()` and `blobascounter()` functions (issue #14742).
As usual, the series includes extensive functional tests for these features, and one pre-existing test for CAST that used to fail now begins to pass.
Fixes#14501Fixes#14742Closes#14745
* github.com:scylladb/scylladb:
test/cql-pytest: test confirming that casting to counter doesn't work
cql: support casting of counter to other types
cql: implement missing counterasblob() and blobascounter() functions
cql: implement missing type functions for "counters" type
(cherry picked from commit a637ddd09c)
Check the first fragment before dereferencing it, the fragment might be
empty, in which case move to the next one.
Found by running range scan tests with random schema and random data.
Fixes: #12821Fixes: #12823Fixes: #12708Closes#12824
(cherry picked from commit ef548e654d)
Currently reverse types match the default case (false), even though they
might be wrapping a tuple type. One user-visible effect of this is that
a schema, which has a reversed<frozen<UDT>> clustering key component,
will have this component incorrectly represented in the schema cql dump:
the UDT will loose the frozen attribute. When attempting to recreate
this schema based on the dump, it will fail as the only frozen UDTs are
allowed in primary key components.
Fixes: #12576Closes#12579
(cherry picked from commit ebc100f74f)
Some functions defined by a template in types.cc are used in other
translation units (via `cql3/untyped_result_set.hh`), but aren't
explicitly instantiated. Therefore their linking can fail, depending
on inlining decisions. (I experienced this when playing with compiler
options).
Fix that.
Closes#12539
Now that we don't accept cql protocol version 1 or 2, we can
drop cql_serialization format everywhere, except when in the IDL
(since it's part of the inter-node protocol).
A few functions had duplicate versions, one with and one without
a cql_serialization_format parameter. They are deduplicated.
Care is taken that `partition_slice`, which communicates
the cql_serialization_format across nodes, still presents
a valid cql_serialization_format to other nodes when
transmitting itself and rejects protocol 1 and 2 serialization\
format when receiving. The IDL is unchanged.
One test checking the 16-bit serialization format is removed.
This PR fixes several bugs related to handling of non-full
clustering keys.
One is in trim_clustering_row_ranges_to(), which is broken for non-full keys in reverse
mode. It will trim the range to position_in_partition_view::after_key(full_key) instead of
position_in_partition_view::before_key(key), hence it will include the
key in the resulting range rather than exclude it.
Fixes#12180
after_key() was creating a position which is after all keys prefixed
by a non-full key, rather than a position which is right after that
key.
This will issue will be caught by cql_query_test::test_compact_storage
in debug mode when mutation_partition_v2 merging starts inserting
sentinels at position after_key() on preemption.
It probably already causes problems for such keys as after_key() is used
in various parts in the read path.
Refs #1446Closes#12234
* github.com:scylladb/scylladb:
position_in_partition: Make after_key() work with non-full keys
position_in_partition: Introduce before_key(position_in_partition_view)
db: Fix trim_clustering_row_ranges_to() for non-full keys and reverse order
types: Fix comparison of frozen sets with empty values
A frozen set can be part of the clustering key, and with compact
storage, the corresponding key component can have an empty value.
Comparison was not prepared for this, the iterator attempts to
deserialize the item count and will fail if the value is empty.
Fixes#12242
Now that scylla requries c++20 there's no
need to define our own implementation in utils/bit_cast.hh
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
As reported in #10867, newer versions of the fmt library
format %Y using 4-characters width, 0-padding the prefix
when needed, while older versions don't do that.
This change moves away from using %Y and friends
fmt specifiers to using explicit numeric-based formatting
conforming to ISO 8601 and making sure the year field
has at least 4 digits and is zero padded. When
negative, the width is upped to 5 so it would show as -0001
rather than -001.
The unit test was updated respectively.
Fixes#10867
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#10870
The time point is multiplied by an adjustment factor of 1000
for boost::posix_time::time_duration::ticks_per_second() = 1000000
when calling boost::posix_time::milliseconds(count).
That may lead to integer overflow as reported by the
UndefinedBehaviorSanitizer.
See https://github.com/scylladb/scylla/issues/10830#issuecomment-1158899187
This change uses gmtime_r to convert seconds since unix epoch
to std::tm and the fmt library to format the iso representation
of the time_point to avoid exceptions and undefined behavior.
gmtime_r may still detect an overflow "when the year does not fit into
an integer" (see ctime(3)). In this case we return a backward
compatible representation of "{count} milliseconds (out of range)".
Refs #10830
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Let's say we have a query like:
```cql
INSERT INTO ks.t (list_column) VALUES (?);
```
And the driver sends a list with null inside as the bound value, something like `[1, 2, null, 4]`.
In such case we should throw `invalid_request_exception` because `nulls` are not allowed inside collections.
Currently when a query like this gets executed Scylla throws an ugly marshalling error.
This is because the validation code reads size of the next element, interprets it as an unsigned integer and tries to read this much.
In case of `null` element the size is `-1`, which when converted to unsigned `size_t` gives 18446744073709551615 and it fails to read this much.
This PR adds proper validation checks to make the error message better.
I also added some tests.
I originally tried to write them in python, but python driver really doesn't like sending invalid values.
Trying to send `[1, None, 2]` results in a list with empty value instead of null.
Trying to send `[1, UNSET_VALUE, 2]` Fails before query even leaves the driver.
Fixes#10580Closes#10599
* github.com:scylladb/scylla:
cql3: Add tests for null and unset inside collections
cql3: Add null and unset checks in collection validation
Add a bunch of tests that test what happens
when there is a null or unset value inside collections.
They are not allowed so every such attempt
should end with invalid_request_exception
with proper message.
I had to write a new function for collection serialization.
I tried to use data_value and its methods, but it's impossible
to create a data_value that represents an unset value.
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
Validating a collection should ensure that there
are no null or unset values inside the collection.
The validation already fails in case of such values,
but it does so in an ugly way.
Length of null and unset value is negative but is
cast to unsigned size_t. Then it tries to read
a really large value and fails with marshalling error.
The new checks are a better way to handle this.
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
expression::printer is used to print CQL expressions
in a pretty way that allows them to be parsed back
to the same representation.
There is a bunch of things that need to be changed when
compared to the current implementation of opreatorr<<(expression)
to output something parsable.
column names should be printed without 'unresolved_identifier()'
and sometimes they need to be quoted to perserve case sensitivity.
I needed to write new code for printing constant values
because the current one did debug printing
(e.g. a set was printed as '1; 2; 3').
A list of IN values should be printed inside () intead of [],
but because it is internally represented as a list it is
by default printed with [].
To fix this a temporary tuple_constructor is created and printed.
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
Checking if the type is string is subtly broken for reversed types,
and these types will not be recognized as strings, even though they are.
As a result, if somebody creates a column with DESC order and then
tries to use operator LIKE on it, it will fail because the type
would not be recognized as a string.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
When a template is instantiated in a header file which is included by many
source files, the compiler needs to compile it again and again.
ClangBuildAnalyzer helps find the worst cases of this happening, and one
of the worst happens to be
seastar::throw_with_backtrace<marshal_exception, sstring>
This specific template function takes (according to ClangBuildAnalyzer)
362 milliseconds to instantiate, and this is done 312 (!) times, because
it reaches virtually every Scylla source file via either types.hh or
compound.hh which use this idiom.
Unfortunately, C++ as it exists today does not have a mechanism to
avoid compiling a specific template instantiation if this was already
done in some other source file. But we can do this manually using
the C++11 feature of "extern template":
1. For a specific template instance, in this case
seastar::throw_with_backtrace<marhsal_exception, sstring>,
all source files except one specify it as "extern template".
This means that the code for it will NOT be built in this source
file, and the compiler assumes the linker will eventually supply it.
2. At the same time, one source file instantiates this template
instance once regularly, without "extern".
The numbers from ClangBuildAnalyzer suggest that this patch should
reduce total build time by 1% (in dev build mode), but this is hard to
measure in practice because the very long build time (210 CPU minutes on
my laptop) usually fluctuates by more than 1% in consecutive runs.
However, we've seen in the past that a good estimate of build time is
the total produced object size (du -bc build/dev/**/*.o). This patch
indeed reduces this total object size (in dev build mode) by exactly 1%.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220105171453.308821-1-nyh@scylladb.com>
In a previous patch, we noticed that the header file <gm/inet_address.hh>,
which is included, directly or indirectly, by most source files,
includes <seastar/net/ip.hh> which is very slow to compile, and
replaced it by the much faster-to-include <seastar/net/ipv[46]_address.hh>.
However, we also included <seastar/net/ip.hh> in types.hh - and that
too is included by almost every file, so the actual saving from the
above patch was minimal. So in this patch we replace this include too.
After this patch Scylla does not include <seastar/net/ip.hh> at all.
According to ClangBuildAnalyzer, this reduces the average time to include
types.hh (multiply this by 312 times!) from 4 seconds to 1.8 seconds,
and reduces total build time (dev mode) by about 3%.
Some of the source files were now missing some include directives, that
were previously included in ip.hh - so we need to add those explicitly.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
deserialize_value() has a constraint that depends on another
deserialize_value() implementation. Apprently gcc wants to
instantiate the deserialize_value() instance we're constraining
while evaluating the constraint, leading to a loop.
Since this deserialize_value() is just an internal helper, drop
the constraint rather than fighting it.
contains_collection() and contains_set_or_map() used to be calculated on each call().
Now the result is calculated only once during type creation.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Implement evaluating a bind_variable.
To be able to evaluate a bind_variable we need to know the type of the bound value.
This is why a data_type has been added to the bind_variable struct.
There are some quirks when evaluating a bind_variable.
The first problem occurs when the variable has been sent with an older cql serialization format and contains collections.
In that case the value has to be reserialized to use the newest cql serialization format.
The second problem occurs when there is a set or a map in the value.
The set value sent by the driver might not have the elements in the correct order, contain duplicates etc.
When a set or map is detected in the value it is reserialized as well.
collection_type_impl::reserialize doesn't work for this purpose, because it uses data_value which does not perform sorting or removal.
New code corresponds to old bind() of lists::marker in cql3/lists.cc, sets::marker etc.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Sometimes we need to know whether some type contains
some collection, set, or map inside.
Introduce two functions that provide this information.
Information about collection is useful for reserializing
values with old serialization format.
Information about set/map is useful for reserializing
sets and maps to remove duplicates.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
If x is of type std::strong_ordering, then "x <=> 0" is equivalent to
x. These no-ops were inserted during #1449 fixes, but are now unnecessary.
They have potential for harm, since they can hide an accidental of the
type of x to an arithmetic type, so remove them.
Ref #1449.
This warning prevents using std::move() where it can hurt
- on an unnamed temporary or a named automatic variable being
returned from a function. In both cases the value could be
constructed directly in its final destination, but std::move()
prevents it.
Fix the handful of cases (all trivial), and enable the warning.
Closes#8992
The cql3 layer manipulates lists as `std::vector`s (of `managed_bytes_opt`). Since lists can be arbitrarily large, let's use chunked vectors there to prevent potentially large contiguous allocations.
Closes#8668
* github.com:scylladb/scylla:
cql3: change the internal type of tuples::in_value from std::vector to chunked_vector
cql3: change the internal type of lists::value from std::vector to chunked_vector
cql3: in multi_item_terminal, return the vector of items by value
"
The patch set is an assorted collection of header cleanups, e.g:
* Reduce number of boost includes in header files
* Switch to forward declarations in some places
A quick measurement was performed to see if these changes
provide any improvement in build times (ccache cleaned and
existing build products wiped out).
The results are posted below (`/usr/bin/time -v ninja dev-build`)
for 24 cores/48 threads CPU setup (AMD Threadripper 2970WX).
Before:
Command being timed: "ninja dev-build"
User time (seconds): 28262.47
System time (seconds): 824.85
Percent of CPU this job got: 3979%
Elapsed (wall clock) time (h:mm:ss or m:ss): 12:10.97
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2129888
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1402838
Minor (reclaiming a frame) page faults: 124265412
Voluntary context switches: 1879279
Involuntary context switches: 1159999
Swaps: 0
File system inputs: 0
File system outputs: 11806272
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
After:
Command being timed: "ninja dev-build"
User time (seconds): 26270.81
System time (seconds): 767.01
Percent of CPU this job got: 3905%
Elapsed (wall clock) time (h:mm:ss or m:ss): 11:32.36
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2117608
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1400189
Minor (reclaiming a frame) page faults: 117570335
Voluntary context switches: 1870631
Involuntary context switches: 1154535
Swaps: 0
File system inputs: 0
File system outputs: 11777280
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
The observed improvement is about 5% of total wall clock time
for `dev-build` target.
Also, all commits make sure that headers stay self-sufficient,
which would help to further improve the situation in the future.
"
* 'feature/header_cleanups_v1' of https://github.com/ManManson/scylla:
transport: remove extraneous `qos/service_level_controller` includes from headers
treewide: remove evidently unneded storage_proxy includes from some places
service_level_controller: remove extraneous `service/storage_service.hh` include
sstables/writer: remove extraneous `service/storage_service.hh` include
treewide: remove extraneous database.hh includes from headers
treewide: reduce boost headers usage in scylla header files
cql3: remove extraneous includes from some headers
cql3: various forward declaration cleanups
utils: add missing <limits> header in `extremum_tracking.hh`
Yet another patch aiming to prevent potentially large allocations.
abstract_type::hash somehow evaded the anti-linearization patches until now.
Fix that.
Note that decimals and varints are still linearized, but we leave it be,
under the assumption that nobody inserts 128KiB-large varints into a database.
Refs: #8120Closes#8689
Before this patch, deserializing a collection from a (prepared) CQL request
involved deserializing every element and serializing it again. Originally this
was a hacky method of validation, and it was also needed to reserialize nested
frozen collections from the CQLv2 format to the CQLv3 format.
But since then we started doing validation separately (before calls to
from_serialized) and CQLv2 became irrelevant, making reserialization of
elements (which, among other things, involves a memory alocation for every
element) pure waste.
This patch adds a faster path for collections in the v3 format, which does not
involve linearizing or reserializing the elements (since v3 is the same as
our internal format).
After this patch, the path from prepared CQL statements to
atomic_cell_or_collection is almost completely linearization-free. The last
remaining place is collection_mutation_description, where map keys are
linearized.
This patch switches the type used to store collection elements inside the
intermediate form used in lists::value, tuples::value etc. from bytes
to managed_bytes. After this patch, tuple and list elements are only linearized
in from_serialized, which will be corrected soon.
This commit introduces some additional copies in expression.cc, which
will be dealt with in a future commit.
We will use them to avoid linearization when going from the intermediate
std::vector<bytes> form in cql3/ to the atomic_cell format, by outputting
managed_bytes instead of bytes in get_with_protocol_version.
To avoid high latencies caused by large contigous allocations
needed by linearizing, work on fragmented buffers instead.
Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
In preparation for removing linearization from abstract_type::compare,
add options to avoid linearization in tuple_deserializing_iterator.
Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
We may want to store a tuple in a fragmented buffer. To split it
into a vector of optional bytes, tuple_type_impl::split can be used.
To split a contiguous buffer(bytes_view), simply pass
single_fragmented_view(bytes_view).
Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
The template method needs to be specialized in each file that is
using it. To avoid rewriting the specialization into multiple files,
move it to the header file.
Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
When a tuple value is serialized, we go through every element type and
use it to serialize element values. But an element type can be
reversed, which is artificially different from the type of the value
being read. This results in a server error due to the type mismatch.
Fix it by unreversing the element type prior to comparing it to the
value type.
Fixes#7902
Tests: unit (dev)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes#8316
time_point_to_string ensures its input is a time_point with
millisecond resolution (though it neglects to verify the epoch
is what it expects). Change the test from a clunky enable_if to
a nicer concept.
Closes#8321
In some places we use the `*reinterpret_cast<const net::packed<T>*>(&x)`
pattern to reinterpret memory. This is a violation of C++'s aliasing rules,
which invokes undefined behaviour.
The blessed way to correctly reinterpret memory is to copy it into a new
object. Let's do that.
Note: the reinterpret_cast way has no performance advantage. Compilers
recognize the memory copy pattern and optimize it away.