Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Adds two functions that take a range over pairs of serialized values
and return a serialized map value.
There are 2 functions - one operating on bytes and one operating on managed_bytes.
The version with managed_bytes is used in expression.cc, used to be a local static function.
The bytes version will be used in type_json.cc in the next commit.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
contains_collection() and contains_set_or_map() used to be calculated on each call().
Now the result is calculated only once during type creation.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Eliminate not used includes and replace some more includes
with forward declarations where appropriate.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
This patch switches the type used to store collection elements inside the
intermediate form used in lists::value, tuples::value etc. from bytes
to managed_bytes. After this patch, tuple and list elements are only linearized
in from_serialized, which will be corrected soon.
This commit introduces some additional copies in expression.cc, which
will be dealt with in a future commit.
We will need them to port the representation of collection types
in cql3/ from bytes to managed_bytes.
The version which takes an iterator of `bytes` as an argument will
be removed after that transition is complete.
Commit aab6b0ee27 introduced the
controversial new IMR format, which relied on a very template-heavy
infrastructure to generate serialization and deserialization code via
template meta-programming. The promise was that this new format, beyond
solving the problems the previous open-coded representation had (working
on linearized buffers), will speed up migrating other components to this
IMR format, as the IMR infrastructure reduces code bloat, makes the code
more readable via declarative type descriptions as well as safer.
However, the results were almost the opposite. The template
meta-programming used by the IMR infrastructure proved very hard to
understand. Developers don't want to read or modify it. Maintainers
don't want to see it being used anywhere else. In short, nobody wants to
touch it.
This commit does a conceptual revert of
aab6b0ee27. A verbatim revert is not
possible because related code evolved a lot since the merge. Also, going
back to the previous code would mean we regress as we'd revert the move
to fragmented buffers. So this revert is only conceptual, it changes the
underlying infrastructure back to the previous open-coded one, but keeps
the fragmented buffers, as well as the interface of the related
components (to the extent possible).
Fixes: #5578
Devirtualizes collection_type_impl::deserialize (so it can be templated) and
adds a FragmentedView overload. This will allow us to deserialize collections
with explicit cql_serialization_format directly from fragmented buffers.
Convert some more helper functions to accept const reference to
column_specification and column_identifier instead of shared_ptr.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
The purpose of collection_type_impl::to_value was to serialize a
collection for sending over CQL. The corresponding function in origin
is called serializeForNativeProtocol, but the name is a bit lengthy,
so I settled for serialize_for_cql.
The method now became a free-standing function, using the visit
function to perform a dispatch on the collection type instead
of a virtual call. This also makes it easier to generalize it to UDTs
in future commits.
Remove the old serialize_for_native_protocol with a FIXME: implement
inside. It was already implemented (to_value), just called differently.
remove dead methods: enforce_limit and serialized_values. The
corresponding methods in C* are auxiliary methods used inside
serializeForNativeProtocol. In our case, the entire algorithm
is wholly written in serialize_for_cql.
`collection_type_impl::serialize_mutation_form`
became `collection_mutation(_view)_description::serialize`.
Previously callers had to cast their data_type down to collection_type
to use serialize_mutation_form. Now it's done inside `serialize`.
In the future `serialize` will be generalized to handle UDTs.
`collection_type_impl::deserialize_mutation_form`
became a free standing function `deserialize_collection_mutation`
with similiar benefits. Actually, noone needs to call this function
manually because of the next paragraph.
A common pattern consisting of linearizing data inside a `collection_mutation_view`
followed by calling `deserialize_mutation_form` has been abstracted out
as a `with_deserialized` method inside collection_mutation_view.
serialize_mutation_form_only_live was removed,
because it hadn't been used anywhere.
collection_type_impl::mutation became collection_mutation_description.
collection_type_impl::mutation_view became collection_mutation_view_description.
These classes now reside inside collection_mutation.hh.
Additional documentation has been written for these classes.
Related function implementations were moved to collection_mutation.cc.
This makes it easier to generalize these classes to non-frozen UDTs in future commits.
The new names (together with documentation) better describe their purpose.
The classes 'collection_mutation' and 'collection_mutation_view'
were moved to a separate header, collection_mutation.hh.
Implementations of functions that operate on these classes,
including some methods of collection_type_impl, were moved
to a separate compilation unit, collection_mutation.cc.
This makes it easier to modify these structures in future commits
in order to generalize them for non-frozen User Defined Types.
Some additional documentation has been written for collection_mutation.
Multi-cell lists and maps may be stored in different formats: as sorted
vectors of pairs of values, when retreived from storage, or as sorted
vectors of values, when created from parser literals or supplied as
parameter values.
Implement a specialized compare for use when receiver and paramter
representation don't match.
Add helpers.
They are somewhat expensive (in code size at least) and not needed
everywhere.
Inside the getter the variables are 'const data_type&', so we can
return that. Everything still works when a copy is needed, but in code
that just wants to check a property we avoid the copy.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The type hierarchy is closed, so we can give each leaf an enum value.
This will be used to implement a visitor pattern and reduce code
duplication.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The new collector parameter is a pointer to a
`compaction_garbage_collector` implementation. This collector is passed
all atoms that are expired and would be discarded. The body of
`compact_and_expire()` was changed so that it checks cells' tombstone
coverage before it checks their expiry, so that cells that are both
covered by a tombstone and also expired are not passed to the collector.
The collector param is optional and defaults to nullptr. To accommodate
the collector, which needs to know the column id, a new `column_id`
parameter was added as well.
This is quick fix to the immediate problem of large collections causing
large allocations, triggering stalls or OOM. The proper fix is to
use IMR for storing the cells, but that is a complex change that will
require time, so let's not stall/OOM in the meanwhile.
Both cql3_type and abstract_type are normally used inside
shared_ptr. This creates a problem when an abstract_type needs to refer
to a cql3_type as that creates a cycle.
To avoid warnings from asan, we were using a std::unordered_map to
store one of the edges of the cycle. This avoids the warning, but
wastes even more memory.
Even before this patch cql3_type was a fairly light weight
structure. This patch pushes in that direction and now cql3_type is a
struct with a single member variable, a data_type.
This avoids the reference cycle and is easier to understand IMHO.
Tests: unit (dev)
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>