`flat_mutation_reader::consume_pausable` is widely used in Scylla. Some places worth mentioning are memtables and combined readers but there are others as well.

This patchset improves `consume_pausable` in three ways:

1. it removes unnecessary allocation
2. it rearranges ifs to not check the same thing twice
3. for a consumer that returns plain stop_iteration not a future<stop_iteration> it reduces the amount of future usage

Test: unit(dev, release, debug)

Combined reader microbenchmark has shown from 2% to 22% improvement in median execution time while memtable microbenchmark has shown from 3.6% to 7.8% improvement in median execution time.

Before the change:
```
./build/release/test/perf/perf_mutation_readers --random-seed 3549335083
single run iterations:    0
single run duration:      1.000s
number of runs:           5
number of cores:          16
random seed:              3549335083

test                                          iterations      median         mad         min         max
combined.one_row                                 1316234   140.120ns     0.020ns   140.074ns   140.141ns
combined.single_active                              7332    91.484us    31.890ns    91.453us    91.778us
combined.many_overlapping                            945   870.973us   429.720ns   868.625us   871.403us
combined.disjoint_interleaved                       7102    85.989us     7.847ns    85.973us    85.997us
combined.disjoint_ranges                            7129    85.570us     7.840ns    85.562us    85.596us
combined.overlapping_partitions_disjoint_rows       5458   124.787us    56.738ns   124.731us   125.370us
clustering_combined.ranges_generic               1920688   217.940ns     0.184ns   217.742ns   218.275ns
clustering_combined.ranges_specialized           1935318   194.610ns     0.199ns   194.210ns   195.228ns
memtable.one_partition_one_row                    624001     1.600us     1.405ns     1.599us     1.605us
memtable.one_partition_many_rows                   79551    12.555us     1.829ns    12.549us    12.558us
memtable.many_partitions_one_row                   40557    24.748us    77.083ns    24.644us    25.135us
memtable.many_partitions_many_rows                  3220   310.429us    57.628ns   310.295us   311.189us
```

After the change:
```
./build/release/test/perf/perf_mutation_readers --random-seed 3549335083
single run iterations:    0
single run duration:      1.000s
number of runs:           5
number of cores:          16
random seed:              3549335083

test                                          iterations      median         mad         min         max
combined.one_row                                 1358839   109.222ns     0.122ns   109.089ns   109.348ns
combined.single_active                              7525    87.305us    25.540ns    87.273us    87.362us
combined.many_overlapping                            962   853.195us     1.904us   851.244us   855.142us
combined.disjoint_interleaved                       7310    81.988us    28.877ns    81.949us    82.032us
combined.disjoint_ranges                            7315    81.699us    37.144ns    81.662us    81.874us
combined.overlapping_partitions_disjoint_rows       5591   120.964us    15.294ns   120.949us   121.120us
clustering_combined.ranges_generic               1954722   211.993ns     0.052ns   211.883ns   212.084ns
clustering_combined.ranges_specialized           2042194   187.807ns     0.066ns   187.732ns   188.289ns
memtable.one_partition_one_row                    648701     1.542us     0.339ns     1.542us     1.543us
memtable.one_partition_many_rows                   85007    11.759us     1.168ns    11.752us    11.782us
memtable.many_partitions_one_row                   43893    22.805us    17.147ns    22.782us    22.843us
memtable.many_partitions_many_rows                  3441   290.220us    41.720ns   290.172us   290.306us
```

Closes #8359

* github.com:scylladb/scylla:
  flat_mutation_reader: optimize consume_pausable for some consumers
  flat_mutation_reader: special case consumers in consume_pausable
  flat_mutation_reader: Change order of checks in consume_pausable
  flat_mutation_reader: fix indentation in consume_pausable
  flat_mutation_reader: Remove allocation in consume_pausable
  perf: Add benchmarks for large partitions
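The third improvement in the patchset description, special-casing consumers that return a plain `stop_iteration` instead of a `future<stop_iteration>`, can be illustrated with a compile-time dispatch on the consumer's return type. The sketch below is not the actual Scylla code: it uses only the standard library, and `toy_future`, `consume_until_stop`, and the local `stop_iteration` enum are hypothetical stand-ins chosen for illustration. It only shows the general idea of picking a cheaper path when the consumer never needs a future.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for seastar::stop_iteration (assumption, not the real type).
enum class stop_iteration { no, yes };

// Hypothetical stand-in for a future that is always ready (assumption).
template <typename T>
struct toy_future {
    T value;
    T get() const { return value; }
};

// Feeds fragments to the consumer until it asks to stop.
// Returns the number of fragments the consumer has seen.
template <typename Consumer>
std::size_t consume_until_stop(const std::vector<int>& fragments, Consumer consumer) {
    using result_t = std::invoke_result_t<Consumer&, int>;
    std::size_t consumed = 0;
    for (int fragment : fragments) {
        ++consumed;
        if constexpr (std::is_same_v<result_t, stop_iteration>) {
            // Fast path: the consumer returns a plain value, so no
            // future object is created or inspected.
            if (consumer(fragment) == stop_iteration::yes) {
                break;
            }
        } else {
            // General path: the consumer returns a future-like object
            // that has to be unwrapped before the result is known.
            static_assert(std::is_same_v<result_t, toy_future<stop_iteration>>,
                          "consumer must return stop_iteration or toy_future<stop_iteration>");
            if (consumer(fragment).get() == stop_iteration::yes) {
                break;
            }
        }
    }
    return consumed;
}
```

Both kinds of consumer go through the same entry point, and the branch is resolved at compile time, so the plain-value consumer pays nothing for the future-based path.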
Scylla
What is Scylla?
Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.
For more information, please see the ScyllaDB web site.
Build Prerequisites
Scylla is fairly fussy about its build environment, requiring very recent versions of the C++20 compiler and of many libraries to build. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain. This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything on your build machine to meet Scylla's requirements: you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).
Building Scylla
Building Scylla with the frozen toolchain dbuild is as easy as:
$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla
For further information, please see:
- Developer documentation for more information on building Scylla.
- Build documentation on how to build Scylla binaries, tests, and packages.
- Docker image build documentation for information on how to build Docker images.
Running Scylla
To start the Scylla server, run:
$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1
This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory.
The --developer-mode option is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (not relevant on development workstations).
Please note that you need to run Scylla with dbuild if you built it with the frozen toolchain.
For more run options, run:
$ ./tools/toolchain/dbuild ./build/release/scylla --help
Testing
See test.py manual.
Scylla APIs and compatibility
By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and Thrift. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.
Documentation
Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.
Training
Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.
Contributing to Scylla
If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.
If you are a developer working on Scylla, please read the developer guidelines.
Contact
- The users mailing list and Slack channel are for users to discuss configuration, management, and operations of ScyllaDB open source.
- The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.