Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Eliminate not used includes and replace some more includes
with forward declarations where appropriate.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Yet another patch preventing potentially large allocations.
Currently, collection_mutation{_view,}_description linearize each collection
value during deserialization. It's not unthinkable that a user adds a
large element to a list or a map, so let's avoid that.
This patch removes the dependency on linearizing_input_stream, which does not
provide a way to read fragmented subbuffers, and replaces it with a new
helper, which does. (Extending linearizing_input_stream is not viable without
rewriting it completely).
Only linearization of collection values is corrected in this patch.
Collection keys are still linearized. Storing them in managed_bytes is likely
to be more harmful than helpful, because large map keys are extremely unlikely,
and UUIDs, which are used as keys in lists, do not fit into manages_bytes's
small value optimization, so this would incure an extra allocation for every
list element.
Note: this patch leaves utils/linearizing_input_stream.hh unused.
Refs: #8120Closes#8690
Commit aab6b0ee27 introduced the
controversial new IMR format, which relied on a very template-heavy
infrastructure to generate serialization and deserialization code via
template meta-programming. The promise was that this new format, beyond
solving the problems the previous open-coded representation had (working
on linearized buffers), will speed up migrating other components to this
IMR format, as the IMR infrastructure reduces code bloat, makes the code
more readable via declarative type descriptions as well as safer.
However, the results were almost the opposite. The template
meta-programming used by the IMR infrastructure proved very hard to
understand. Developers don't want to read or modify it. Maintainers
don't want to see it being used anywhere else. In short, nobody wants to
touch it.
This commit does a conceptual revert of
aab6b0ee27. A verbatim revert is not
possible because related code evolved a lot since the merge. Also, going
back to the previous code would mean we regress as we'd revert the move
to fragmented buffers. So this revert is only conceptual, it changes the
underlying infrastructure back to the previous open-coded one, but keeps
the fragmented buffers, as well as the interface of the related
components (to the extent possible).
Fixes: #5578
Use `utils::linearizing_input_stream` for the deserizalization of the
collection. Allows for avoiding the linearization of the entire cell
value, instead only linearizing individual values as they are
deserialized from the buffer.
We could use iterators over cells instead of a vector of cells
in collection_mutation(_view)_description. Then some use cases could
provide iterators that construct the cells "on the fly".
The purpose of collection_type_impl::to_value was to serialize a
collection for sending over CQL. The corresponding function in origin
is called serializeForNativeProtocol, but the name is a bit lengthy,
so I settled for serialize_for_cql.
The method now became a free-standing function, using the visit
function to perform a dispatch on the collection type instead
of a virtual call. This also makes it easier to generalize it to UDTs
in future commits.
Remove the old serialize_for_native_protocol with a FIXME: implement
inside. It was already implemented (to_value), just called differently.
remove dead methods: enforce_limit and serialized_values. The
corresponding methods in C* are auxiliary methods used inside
serializeForNativeProtocol. In our case, the entire algorithm
is wholly written in serialize_for_cql.
`collection_type_impl::serialize_mutation_form`
became `collection_mutation(_view)_description::serialize`.
Previously callers had to cast their data_type down to collection_type
to use serialize_mutation_form. Now it's done inside `serialize`.
In the future `serialize` will be generalized to handle UDTs.
`collection_type_impl::deserialize_mutation_form`
became a free standing function `deserialize_collection_mutation`
with similiar benefits. Actually, noone needs to call this function
manually because of the next paragraph.
A common pattern consisting of linearizing data inside a `collection_mutation_view`
followed by calling `deserialize_mutation_form` has been abstracted out
as a `with_deserialized` method inside collection_mutation_view.
serialize_mutation_form_only_live was removed,
because it hadn't been used anywhere.
collection_type_impl::mutation became collection_mutation_description.
collection_type_impl::mutation_view became collection_mutation_view_description.
These classes now reside inside collection_mutation.hh.
Additional documentation has been written for these classes.
Related function implementations were moved to collection_mutation.cc.
This makes it easier to generalize these classes to non-frozen UDTs in future commits.
The new names (together with documentation) better describe their purpose.
The classes 'collection_mutation' and 'collection_mutation_view'
were moved to a separate header, collection_mutation.hh.
Implementations of functions that operate on these classes,
including some methods of collection_type_impl, were moved
to a separate compilation unit, collection_mutation.cc.
This makes it easier to modify these structures in future commits
in order to generalize them for non-frozen User Defined Types.
Some additional documentation has been written for collection_mutation.