Replace throwing `protocol_exception` with returning it as a result or an exceptional future in the transport server module. The goal is to improve performance.

Most of the `protocol_exception` throws were made from the `fragmented_temporary_buffer` module, by passing `exception_thrower()` to its `read*` methods. `fragmented_temporary_buffer` is changed so that it now accepts an exception creator, not an exception thrower. The `fragmented_temporary_buffer_concepts::ExceptionCreator` concept replaced `fragmented_temporary_buffer_concepts::ExceptionThrower`, and all methods that used to throw now return a failed result of type `utils::result_with_eptr`. This change is then propagated to the callers.

The scope of this patch is `protocol_exception`, so commitlog just calls the `.value()` method on the result. If the result failed, that will throw the exception from the result, as defined by `utils::result_with_eptr_throw_policy`. This means that the behavior of the commitlog module stays the same.

The transport server module handles results gracefully. All the caller functions that return a non-future value `T` now return `utils::result_with_eptr<T>`. When the caller is a function that returns a future, and it receives a failed result, `make_exception_future(std::move(failed_result).error())` is returned. The rest of the callstack up to the transport server `handle_error` function already works without throwing, and that's how zero throws is achieved.

The cql3 module changes do the same as the transport server module.

The benchmark, which is not yet merged, has commit `67fbe35833e2d23a8e9c2dcb5e04580231d8ec96`, [GitHub diff view](https://github.com/scylladb/scylladb/compare/master...nuivall:scylladb:perf_cql_raw). It uses either a read or a write query.
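The result-based error propagation described in this message can be sketched roughly as follows. This is a minimal, standard-library-only illustration (assuming C++17), not the code from the patch: `result_sketch`, `protocol_error_sketch`, `make_out_of_range`, `read_byte` and `read_short` are hypothetical names, and `result_sketch` is only a simplified stand-in for `utils::result_with_eptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <variant>

// Hypothetical stand-in for utils::result_with_eptr<T>: holds either a value
// or a std::exception_ptr describing the failure.
template <typename T>
class result_sketch {
    std::variant<T, std::exception_ptr> _v;
public:
    result_sketch(T v) : _v(std::move(v)) {}
    result_sketch(std::exception_ptr e) : _v(std::move(e)) {}

    bool has_value() const { return _v.index() == 0; }

    // Mirrors the throw policy described above: value() rethrows on failure.
    T& value() {
        if (auto* e = std::get_if<std::exception_ptr>(&_v)) {
            std::rethrow_exception(*e);
        }
        return std::get<T>(_v);
    }

    // Non-throwing access to the stored exception (only valid on failure).
    std::exception_ptr error() const { return std::get<std::exception_ptr>(_v); }
};

// Hypothetical exception type standing in for protocol_exception.
struct protocol_error_sketch : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// An "exception creator": builds the exception object and returns it,
// leaving the decision to throw (or not) to the caller.
inline std::exception_ptr make_out_of_range(std::size_t wanted, std::size_t available) {
    return std::make_exception_ptr(protocol_error_sketch(
        "buffer underflow: wanted " + std::to_string(wanted) +
        " bytes, have " + std::to_string(available)));
}

// Low-level reader: returns a failed result instead of throwing.
template <typename ExceptionCreator>
result_sketch<std::uint8_t> read_byte(std::string_view& in, ExceptionCreator&& create) {
    if (in.empty()) {
        return create(1, 0);
    }
    auto b = static_cast<std::uint8_t>(in.front());
    in.remove_prefix(1);
    return b;
}

// Caller returning a non-future value: propagates the failure as a result.
result_sketch<std::uint16_t> read_short(std::string_view& in) {
    auto hi = read_byte(in, make_out_of_range);
    if (!hi.has_value()) {
        return hi.error();
    }
    auto lo = read_byte(in, make_out_of_range);
    if (!lo.has_value()) {
        return lo.error();
    }
    return static_cast<std::uint16_t>((hi.value() << 8) | lo.value());
}
```

A caller that only cares about the old behavior can call `.value()` and let the stored exception be rethrown, while a caller on the hot path can check `has_value()` and forward `error()` without any throw.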
Command line used:
```
./build/release/scylla perf-cql-raw --workdir ~/tmp/scylladir --smp 1 --developer-mode 1 --workload write --duration 300 --concurrency 1000 --username cassandra --password cassandra 2>/dev/null
```
The only thing changed across runs is `--workload write`/`--workload read`.

Built and run on `release` target.

<details>

```
throughput:
    mean= 36946.04 standard-deviation=1831.28
    median= 37515.49 median-absolute-deviation=1544.52
    maximum=39748.41 minimum=28443.36
instructions_per_op:
    mean= 108105.70 standard-deviation=965.19
    median= 108052.56 median-absolute-deviation=53.47
    maximum=124735.92 minimum=107899.00
cpu_cycles_per_op:
    mean= 70065.73 standard-deviation=2328.50
    median= 69755.89 median-absolute-deviation=1250.85
    maximum=92631.48 minimum=66479.36

⏱ real=5:11.08 user=2:00.20 sys=2:25.55 cpu=85%
```

```
throughput:
    mean= 40718.30 standard-deviation=2237.16
    median= 41194.39 median-absolute-deviation=1723.72
    maximum=43974.56 minimum=34738.16
instructions_per_op:
    mean= 117083.62 standard-deviation=40.74
    median= 117087.54 median-absolute-deviation=31.95
    maximum=117215.34 minimum=116874.30
cpu_cycles_per_op:
    mean= 58777.43 standard-deviation=1225.70
    median= 58724.65 median-absolute-deviation=776.03
    maximum=64740.54 minimum=55922.58

⏱ real=5:12.37 user=27.461 sys=3:54.53 cpu=83%
```

```
throughput:
    mean= 37107.91 standard-deviation=1698.58
    median= 37185.53 median-absolute-deviation=1300.99
    maximum=40459.85 minimum=29224.83
instructions_per_op:
    mean= 108345.12 standard-deviation=931.33
    median= 108289.82 median-absolute-deviation=55.97
    maximum=124394.65 minimum=108188.37
cpu_cycles_per_op:
    mean= 70333.79 standard-deviation=2247.71
    median= 69985.47 median-absolute-deviation=1212.65
    maximum=92219.10 minimum=65881.72

⏱ real=5:10.98 user=2:40.01 sys=1:45.84 cpu=85%
```

```
throughput:
    mean= 38353.12 standard-deviation=1806.46
    median= 38971.17 median-absolute-deviation=1365.79
    maximum=41143.64 minimum=32967.57
instructions_per_op:
    mean= 117270.60 standard-deviation=35.50
    median= 117268.07 median-absolute-deviation=16.81
    maximum=117475.89 minimum=117073.74
cpu_cycles_per_op:
    mean= 57256.00 standard-deviation=1039.17
    median= 57341.93 median-absolute-deviation=634.50
    maximum=61993.62 minimum=54670.77

⏱ real=5:12.82 user=4:10.79 sys=11.530 cpu=83%
```

This shows ~240 instructions per op increase for reads and ~180 instructions per op increase for writes.

Tests have been run multiple times, with almost identical results. Each run lasted 300 seconds. Number of operations executed is roughly 38k per second * 300 seconds = 11.4m ops.

Update:

I have repeated the benchmark with clean state - reboot computer, put in performance mode, rebuild, closed other apps that might affect CPU and disk usage.

run count: 5 times before and 5 times after the patch
duration: 300 seconds

Average write throughput median before patch: 41155.99
Average write throughput median after patch: 42193.22

Median absolute deviation is also lower now, with values in range 350-550, while the previous runs' values were in range 750-1350.

</details>

Built and run on `release` target.
<details>

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false --bypass-cache 2>/dev/null
```
throughput:
    mean= 14910.90 standard-deviation=477.72
    median= 14956.73 median-absolute-deviation=294.16
    maximum=16061.18 minimum=13198.68
instructions_per_op:
    mean= 659591.63 standard-deviation=495.85
    median= 659595.46 median-absolute-deviation=324.91
    maximum=661184.94 minimum=658001.49
cpu_cycles_per_op:
    mean= 213301.49 standard-deviation=2724.27
    median= 212768.64 median-absolute-deviation=1403.85
    maximum=225837.15 minimum=208110.12

⏱ real=5:19.26 user=5:00.22 sys=15.827 cpu=98%
```

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false 2>/dev/null
```
throughput:
    mean= 93345.45 standard-deviation=4499.00
    median= 93915.52 median-absolute-deviation=2764.41
    maximum=104343.64 minimum=79816.66
instructions_per_op:
    mean= 65556.11 standard-deviation=97.42
    median= 65545.11 median-absolute-deviation=71.51
    maximum=65806.75 minimum=65346.25
cpu_cycles_per_op:
    mean= 34160.75 standard-deviation=803.02
    median= 33927.16 median-absolute-deviation=453.08
    maximum=39285.19 minimum=32547.13

⏱ real=5:03.23 user=4:29.46 sys=29.255 cpu=98%
```

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache true 2>/dev/null
```
throughput:
    mean= 206982.18 standard-deviation=15894.64
    median= 208893.79 median-absolute-deviation=9923.41
    maximum=232630.14 minimum=127393.34
instructions_per_op:
    mean= 35983.27 standard-deviation=6.12
    median= 35982.75 median-absolute-deviation=3.75
    maximum=36008.24 minimum=35952.14
cpu_cycles_per_op:
    mean= 17374.87 standard-deviation=985.06
    median= 17140.81 median-absolute-deviation=368.86
    maximum=26125.38 minimum=16421.99

⏱ real=5:01.23 user=4:57.88 sys=0.124 cpu=98%
```

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false --bypass-cache 2>/dev/null
```
throughput:
    mean= 16198.26 standard-deviation=902.41
    median= 16094.02 median-absolute-deviation=588.58
    maximum=17890.10 minimum=13458.74
instructions_per_op:
    mean= 659752.73 standard-deviation=488.08
    median= 659789.16 median-absolute-deviation=334.35
    maximum=660881.69 minimum=658460.82
cpu_cycles_per_op:
    mean= 216070.70 standard-deviation=3491.26
    median= 215320.37 median-absolute-deviation=1678.06
    maximum=232396.48 minimum=209839.86

⏱ real=5:17.33 user=4:55.87 sys=18.425 cpu=99%
```

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false 2>/dev/null
```
throughput:
    mean= 97067.79 standard-deviation=2637.79
    median= 97058.93 median-absolute-deviation=1477.30
    maximum=106338.97 minimum=87457.60
instructions_per_op:
    mean= 65695.66 standard-deviation=58.43
    median= 65695.93 median-absolute-deviation=37.67
    maximum=65947.76 minimum=65547.05
cpu_cycles_per_op:
    mean= 34300.20 standard-deviation=704.66
    median= 34143.92 median-absolute-deviation=321.72
    maximum=38203.68 minimum=33427.46

⏱ real=5:03.22 user=4:31.56 sys=29.164 cpu=99%
```

./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache true 2>/dev/null
```
throughput:
    mean= 223495.91 standard-deviation=6134.95
    median= 224825.90 median-absolute-deviation=3302.09
    maximum=234859.90 minimum=193209.69
instructions_per_op:
    mean= 35981.41 standard-deviation=3.16
    median= 35981.13 median-absolute-deviation=2.12
    maximum=35991.46 minimum=35972.55
cpu_cycles_per_op:
    mean= 17482.26 standard-deviation=281.82
    median= 17424.08 median-absolute-deviation=143.91
    maximum=19120.68 minimum=16937.43

⏱ real=5:01.23 user=4:58.54 sys=0.136 cpu=99%
```

</details>

Fixes: #24567

This PR is a continuation of #24738 [transport: remove throwing protocol_exception on connection start](https://github.com/scylladb/scylladb/pull/24738). This PR does not solve a burning issue, but is rather an improvement in the same direction. As it is just an enhancement, it should not be backported.
Closes scylladb/scylladb#25408

* github.com:scylladb/scylladb:
  test/cqlpy: add protocol exception tests
  test/cqlpy: `test_protocol_exceptions.py` refactor message frame building
  test/cqlpy: `test_protocol_exceptions.py` refactor duplicate code
  transport: replace `make_frame` throw with return result
  cql3: remove throwing `protocol_exception`
  transport: replace throw in validate_utf8 with result_with_exception_ptr return
  transport: replace throwing protocol_exception with returns
  utils: add result_with_exception_ptr
  test/cqlpy: add unknown compression algorithm test case
Scylla unit tests using C++ and the Boost test framework
The source files in this directory are Scylla unit tests written in C++ using the Boost.Test framework. These unit tests come in three flavors:
- Some simple tests that check stand-alone C++ functions or classes use Boost's BOOST_AUTO_TEST_CASE.
- Some tests require Seastar features, and need to be declared with Seastar's extensions to Boost.Test, namely SEASTAR_TEST_CASE.
- Even more elaborate tests require not just a functioning Seastar environment but also a complete (or partial) Scylla environment. Those tests use the do_with_cql_env() or do_with_cql_env_thread() function to set up a mostly-functioning environment behaving like a single-node Scylla, in which the test can run.
While we have many tests of the third flavor, writing new tests of this type should be reserved for white-box tests - tests where it is necessary to inspect or control Scylla internals that do not have user-facing APIs such as CQL. In contrast, black-box tests - tests that can be written using only user-facing APIs - should be written in one of the newer test frameworks that we offer, such as test/cqlpy or test/alternator (in Python, using the CQL or DynamoDB APIs respectively) or test/cql (using textual CQL commands), or - if more than one Scylla node is needed for a test - using the test/topology* framework.
Running tests
Because these are C++ tests, they need to be compiled before running.
To compile a single test executable row_cache_test, use a command like
ninja build/dev/test/boost/row_cache_test
You can also use ninja dev-test to build all C++ tests, or use
ninja dev-build to build the C++ tests and also the full Scylla executable
(however, note that the full Scylla executable isn't needed to run Boost tests).
Replace "dev" by "debug" or "release" in the examples above and below to use the "debug" build mode (which, importantly, compiles the test with ASAN and UBSAN enabled and helps catch difficult-to-catch use-after-free bugs) or the "release" build mode (optimized for run speed).
To run an entire test file row_cache_test, including all its test
functions, use a command like:
build/dev/test/boost/row_cache_test -- -c1 -m1G
To run a single test function test_reproduce_18045() from the longer test
file, use a command like:
build/dev/test/boost/row_cache_test -t test_reproduce_18045 -- -c1 -m1G
In these command lines, the parameters before the -- are passed to
Boost.Test, while the parameters after the -- are passed to the test code,
and in particular to Seastar. In this example Seastar is asked to run on one
CPU (-c1) and use 1G of memory (-m1G) instead of hogging the entire
machine. The Boost.Test option -t test_reproduce_18045 asks it to run just
this one test function instead of all the test functions in the executable.
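The role of the -- separator can be illustrated with a small sketch. This is not how Boost.Test or Seastar actually parse their arguments; it is a standalone, hypothetical helper (`split_at_double_dash`, assuming C++17 and a list of arguments without the program name) that only shows which arguments end up on which side.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using arg_list = std::vector<std::string>;

// Splits the arguments at the first "--". The first list is what the test
// framework would see; the second is what is left for the test code
// (and, through it, Seastar). The separator itself is dropped.
std::pair<arg_list, arg_list> split_at_double_dash(const arg_list& args) {
    auto sep = std::find(args.begin(), args.end(), "--");
    arg_list before(args.begin(), sep);
    // If there is no "--", nothing is passed to the test code.
    arg_list after(sep == args.end() ? sep : sep + 1, args.end());
    return {std::move(before), std::move(after)};
}
```

For the command line shown above, the first list would hold -t and test_reproduce_18045, and the second would hold -c1 and -m1G.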
Unfortunately, interrupting a running test with control-C doesn't
work. This is a known bug (#5696). Kill a test with SIGKILL (-9) if you
need to stop it while it's running.
Boost tests can also be run using test.py - which is a script that provides a uniform way to run all tests in scylladb.git - C++ tests, Python tests, etc.
Execution with pytest
To run all tests with pytest execute
pytest test/boost
To execute all tests in one file, provide the path to the source filename as a parameter
pytest test/boost/aggregate_fcts_test.cc
Since it's a normal path, autocompletion works in the terminal out of the box.
To execute only one test function, provide the path to the source file and function name
pytest --mode dev test/boost/aggregate_fcts_test.cc::test_aggregate_avg
To select a specific mode, pass the --mode parameter, for example --mode dev.
If the parameter isn't provided, pytest tries to use ninja mode_list to find out the compiled modes.
Parallel execution is controlled by pytest-xdist and the parameter -n auto.
With -n auto, the tests are started with the number of workers equal to the number of CPU cores.
A useful command to discover the tests in a file or directory is
pytest --collect-only -q --mode dev test/boost/aggregate_fcts_test.cc
That will return all test functions in the file.
To execute only one function from the test, you can reuse a test name from the output of the previous command.
However, the mode suffix must be dropped.
For example,
the output in the terminal shows something like test/boost/aggregate_fcts_test.cc::test_aggregate_avg.dev.
So to execute this specific test function, use the following command
pytest --mode dev test/boost/aggregate_fcts_test.cc::test_aggregate_avg
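Dropping the mode suffix from a collected test name is a simple string operation. The sketch below is only an illustration of that step (assuming C++17); `strip_mode_suffix` is a hypothetical helper and not part of the test tooling.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Removes a trailing ".<mode>" (for example ".dev") from a collected test
// name, if present, so that the result can be passed back to pytest
// together with --mode <mode>.
std::string strip_mode_suffix(std::string_view test_id, std::string_view mode) {
    const std::string suffix = "." + std::string(mode);
    if (test_id.size() > suffix.size() &&
        test_id.substr(test_id.size() - suffix.size()) == suffix) {
        test_id.remove_suffix(suffix.size());
    }
    return std::string(test_id);
}
```

Called with the name from the example above and the mode dev, it returns test/boost/aggregate_fcts_test.cc::test_aggregate_avg; a name without the suffix is returned unchanged.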
Writing tests
Because of the large build time and build size of each separate test executable, it is recommended to put test functions into relatively large source files. But not too large - to keep compilation time of a single source file (during development) at reasonable levels.
When adding new source files in test/boost, don't forget to list the new source file in configure.py and also in CMakeLists.txt. The former is needed by our CI, but the latter is preferred by some developers.