Citing #6138: > In the past few years we have converted most of our codebase to work in terms of fragmented buffers, instead of linearised ones, to help avoid large allocations that put large pressure on the memory allocator. > One prominent component that still works exclusively in terms of linearised buffers is the types hierarchy, more specifically the de/serialization code to/from CQL format. Note that for most types, this is the same as our internal format, notable exceptions are non-frozen collections and user types. > > Most types are expected to contain reasonably small values, but texts, blobs and especially collections can get very large. Since the entire hierarchy shares a common interface we can either transition all or none to work with fragmented buffers. This series gets rid of intermediate linearizations in deserialization. The next steps are removing linearizations from serialization, validation and comparison code. Series summary: - Fix a bug in `fragmented_temporary_buffer::view::remove_prefix`. (Discovered while testing. Since it wasn't discovered earlier, I guess it doesn't occur in any code path in master.) - Add a `FragmentedView` concept to allow uniform handling of various types of fragmented buffers (`bytes_view`, `temporary_fragmented_buffer::view`, `ser::buffer_view` and likely `managed_bytes_view` in the future). - Implement `FragmentedView` for relevant fragmented buffer types. - Add helper functions for reading from `FragmentedView`. - Switch `deserialize()` and all its helpers from `bytes_view` to `FragmentedView`. - Remove `with_linearized()` calls which just became unnecessary. - Add an optimization for single-fragment cases. The addition of `FragmentedView` might be controversial, because another concept meant for the same purpose - `FragmentRange` - is already used. Unfortunately, it lacks the functionality we need. The main (only?) thing we want to do with a fragmented buffer is to extract a prefix from it and `FragmentRange` gives us no way to do that, because it's immutable by design. We can work around that by wrapping it into a mutable view which will track the offset into the immutable `FragmentRange`, and that's exactly what `linearizing_input_stream` is. But it's wasteful. `linearizing_input_stream` is a heavy type, unsuitable for passing around as a view - it stores a pair of fragment iterators, a fragment view and a size (11 words) to conform to the iterator-based design of `FragmentRange`, when one fragment iterator (4 words) already contains all needed state, just hidden. I suggest we replace `FragmentRange` with `FragmentedView` (or something similar) altogether. Refs: #6138 Closes #7692 * github.com:scylladb/scylla: types: collection: add an optimization for single-fragment buffers in deserialize types: add an optimization for single-fragment buffers in deserialize cql3: tuples: don't linearize in in_value::from_serialized cql3: expr: expression: replace with_linearize with linearized cql3: constants: remove unneeded uses of with_linearized cql3: update_parameters: don't linearize in prefetch_data_builder::add_cell cql3: lists: remove unneeded use of with_linearized query-result-set: don't linearize in result_set_builder::deserialize types: remove unneeded collection deserialization overloads types: switch collection_type_impl::deserialize from bytes_view to FragmentedView cql3: sets: don't linearize in value::from_serialized cql3: lists: don't linearize in value::from_serialized cql3: maps: don't linearize in value::from_serialized types: remove unused deserialize_aux types: deserialize: don't linearize tuple elements types: deserialize: don't linearize collection elements types: switch deserialize from bytes_view to FragmentedView types: deserialize tuple types from FragmentedView types: deserialize set type from FragmentedView types: deserialize map type from FragmentedView types: deserialize list type from FragmentedView types: add FragmentedView versions of read_collection_size and read_collection_value types: deserialize varint type from FragmentedView types: deserialize floating point types from FragmentedView types: deserialize decimal type from FragmentedView types: deserialize duration type from FragmentedView types: deserialize IP address types from FragmentedView types: deserialize uuid types from FragmentedView types: deserialize timestamp type from FragmentedView types: deserialize simple date type from FragmentedView types: deserialize time type from FragmentedView types: deserialize boolean type from FragmentedView types: deserialize integer types from FragmentedView types: deserialize string types from FragmentedView types: remove unused read_simple_opt types: implement read_simple* versions for FragmentedView utils: fragmented_temporary_buffer: implement FragmentedView for view utils: fragment_range: add single_fragmented_view serializer: implement FragmentedView for buffer_view utils: fragment_range: add linearized and with_linearized for FragmentedView utils: fragment_range: add FragmentedView utils: fragmented_temporary_buffer: fix view::remove_prefix
Scylla
What is Scylla?
Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.
For more information, please see the ScyllaDB web site.
Build Prerequisites
Scylla is fairly fussy about its build environment, requiring very recent versions of the C++20 compiler and of many libraries to build. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain, This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything in your build machine to meet Scylla's requirements - you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).
Building Scylla
Building Scylla with the frozen toolchain dbuild is as easy as:
$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla
For further information, please see:
- Developer documentation for more information on building Scylla.
- Build documentation on how to build Scylla binaries, tests, and packages.
- Docker image build documentation for information on how to build Docker images.
Running Scylla
To start Scylla server, run:
$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1
This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory.
The --developer-mode is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (not relevant on development workstations).
Please note that you need to run Scylla with dbuild if you built it with the frozen toolchain.
For more run options, run:
$ ./tools/toolchain/dbuild ./build/release/scylla --help
Testing
See test.py manual.
Scylla APIs and compatibility
By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and Thrift. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.
Documentation
Documentation can be found in ./docs and on the wiki. There is currently no clear definition of what goes where, so when looking for something be sure to check both. Seastar documentation can be found here. User documentation can be found here.
Training
Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.
Contributing to Scylla
If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.
If you are a developer working on Scylla, please read the developer guidelines.
Contact
- The users mailing list and Slack channel are for users to discuss configuration, management, and operations of the ScyllaDB open source.
- The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.