scylladb/test/alternator
Nadav Har'El 3aca1ca572 alternator: make BatchGetItem group reads by partition
DynamoDB API's BatchGetItem invokes a number (up to 25) of read requests
in parallel, returning when all results are available. Alternator naively
implemented this by sending all read requests in parallel, no matter which
requests these were.

That implementation was inefficient when all the requests are for different
items (clustering rows) of the same partition. In a multi-node setup this
ends up sending 25 separate requests to the same remote node(s). Even
on a single-node setup, this may result in reading from disk more than
once and, even if the partition is cached, in repeating an O(log N)
search in it multiple times.

What we do in this patch, instead, is to group all the BatchGetItem
requests aimed at the same partition into a single read request
asking for a (sorted) list of clustering keys. This is similar to an
"IN" request in CQL.

As an example of the performance benefit of this patch, I tried a
BatchGetItem request asking for 20 random items from a 10-million item
partition. I measured the latency of this request on a single-node
Scylla. Before this patch, I saw a latency of 17-21 ms (the lower number
is when the request is retried and the requested items are already in
the cache). After this patch, the latency is 10-14 ms. The performance
improvement on multi-node clusters is expected to be even greater.

Unfortunately the patch is less trivial than I hoped it would be,
because some of the old code was organized under the assumption that
each read request only returned one item (and if it failed, it means
only one item failed), so this part of the code had to be reorganized
(and, for making the code more readable, coroutinized).

An unintended benefit of the code reorganization is that it also gave
me an opportunity to fail an attempt to ask BatchGetItem for the same
item more than once (issue #10757).
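
The grouping and the duplicate check described above can be illustrated
with a minimal Python sketch. It is not the actual Alternator code (which
is C++): the function name and the plain-value key format are assumptions
made only for illustration.

```python
from collections import defaultdict


def group_by_partition(keys, hash_key, range_key):
    """Group requested item keys by partition key.

    Hypothetical helper: `keys` is a list of dicts holding plain Python
    values, not DynamoDB's typed JSON. Returns a dict mapping each
    partition key to a sorted list of clustering keys, so that each
    partition can be read with one request.
    """
    groups = defaultdict(list)
    seen = set()
    for key in keys:
        ident = (key[hash_key], key[range_key])
        if ident in seen:
            # Asking for the same item twice is rejected.
            raise ValueError(f"duplicate key in batch: {ident}")
        seen.add(ident)
        groups[key[hash_key]].append(key[range_key])
    return {partition: sorted(rows) for partition, rows in groups.items()}


if __name__ == "__main__":
    batch = [{"p": "a", "c": 3}, {"p": "b", "c": 1}, {"p": "a", "c": 1}]
    # Three requested items become two partition reads.
    print(group_by_partition(batch, "p", "c"))
```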

The patch also adds a few more corner cases in the tests, to be even
more sure that the code reorganization doesn't introduce a regression
in BatchGetItem.

Fixes #10753
Fixes #10757

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-06-19 14:47:57 +03:00

Tests for Alternator that should also pass, identically, against DynamoDB.

Tests use the boto3 library for the AWS API and the pytest framework (both are available from Linux distributions, or with "pip install").

To run all tests against the local installation of Alternator on http://localhost:8000, just run pytest.

Some additional pytest options:

  • To run all tests in a single file, do pytest test_table.py.
  • To run a single specific test, do pytest test_table.py::test_create_table_unsupported_names.
  • Additional useful pytest options, especially useful for debugging tests:
    • -v: show the names of each individual test running instead of just dots.
    • -s: show the full output of running tests (by default, pytest captures the test's output and only displays it if a test fails).

Add the --aws option to test against AWS instead of the local installation. For example: pytest --aws test_item.py, or simply pytest --aws.

If you plan to run tests against AWS and not just a local Scylla installation, the file ~/.aws/credentials should be configured with your AWS key:

[default]
aws_access_key_id = XXXXXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

and ~/.aws/config with the default region to use in the test:

[default]
region = us-east-1
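
As a rough illustration of the difference between the two modes, the sketch below builds the arguments for boto3.resource in each case. It is not the tests' actual fixture code: the helper names are hypothetical, and the local credentials are the ones listed in the Authorization section. Against AWS, boto3 reads the two files above on its own; against a local Alternator, the endpoint, a region name and some credentials have to be passed explicitly.

```python
def dynamodb_kwargs(aws: bool) -> dict:
    """Keyword arguments for boto3.resource('dynamodb', ...).

    Hypothetical helper, for illustration only.
    """
    if aws:
        # boto3 picks up ~/.aws/credentials and ~/.aws/config by itself.
        return {}
    return {
        "endpoint_url": "http://localhost:8000",
        # boto3 insists on a region and credentials even for a local server.
        "region_name": "us-east-1",
        "aws_access_key_id": "alternator",
        "aws_secret_access_key": "secret_pass",
    }


def connect(aws: bool = False):
    # Imported lazily so dynamodb_kwargs() is usable without boto3 installed.
    import boto3
    return boto3.resource("dynamodb", **dynamodb_kwargs(aws))


if __name__ == "__main__":
    dynamodb = connect()
    print([table.name for table in dynamodb.tables.all()])
```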

HTTPS support

In order to run tests with HTTPS, run pytest with the --https parameter. Note that the Scylla cluster needs to be configured with the alternator_https_port option in order to start an HTTPS server. Moreover, running an HTTPS server requires a certificate. Here's how to easily generate a key and a self-signed certificate, which is sufficient to run --https tests:

openssl genrsa 2048 > scylla.key
openssl req -new -x509 -nodes -sha256 -days 365 -key scylla.key -out scylla.crt

If this pair is put into the conf/ directory, it will be enough to allow the alternator HTTPS server to think it's been authorized and properly certified. Still, the boto3 library issues warnings that the certificate used for communication is self-signed, and thus should not be trusted. For the sake of running local tests this warning is explicitly ignored.
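
A minimal sketch of connecting with a self-signed certificate follows. It is an illustration, not the tests' actual code: the helper names are hypothetical, port 8043 is an assumed value for alternator_https_port, and the credentials are placeholders. Passing verify=False tells boto3 to skip certificate verification, and urllib3.disable_warnings silences the resulting warning.

```python
def https_kwargs(port: int = 8043) -> dict:
    """Connection arguments for a local HTTPS endpoint.

    The port is an assumption; use your alternator_https_port value.
    """
    return {
        "endpoint_url": f"https://localhost:{port}",
        # Do not verify the self-signed certificate.
        "verify": False,
    }


def connect_https(port: int = 8043):
    # Imported lazily so https_kwargs() is usable without boto3 installed.
    import boto3
    import urllib3

    # Ignore the warning about the unverified, self-signed certificate.
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    return boto3.resource(
        "dynamodb",
        region_name="us-east-1",
        aws_access_key_id="alternator",         # placeholder credentials
        aws_secret_access_key="secret_pass",
        **https_kwargs(port),
    )


if __name__ == "__main__":
    print([table.name for table in connect_https().tables.all()])
```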

Authorization

By default, boto3 prepares a properly signed Authorization header with every request. In order to confirm the authorization, the server recomputes the signature by using user credentials (user-provided username + a secret key known by the server), and then checks if it matches the signature from the header. Early alternator code did not verify signatures at all, which is also allowed by the protocol. A partial implementation of the authorization verification can be enabled by setting a Scylla configuration parameter:

  alternator_enforce_authorization: true
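
The recompute-and-compare step described above can be sketched as follows, assuming AWS Signature Version 4, which is what boto3 uses for DynamoDB. This is not Alternator's implementation: the function names are hypothetical, and building the real string to sign from the request (timestamp, credential scope, hash of the canonical request) is left out.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signature(secret_key, date, region, service, string_to_sign):
    """Derive the SigV4 signing key and sign `string_to_sign` (hex digest)."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    k_signing = _hmac_sha256(k_service, "aws4_request")
    return hmac.new(
        k_signing, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def signature_matches(stored_secret, header_signature, date, region,
                      service, string_to_sign):
    """Server side: recompute with the stored secret, compare to the header."""
    expected = sigv4_signature(
        stored_secret, date, region, service, string_to_sign
    )
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, header_signature)


if __name__ == "__main__":
    # Placeholder input; a real string to sign is derived from the request.
    args = ("20220619", "us-east-1", "dynamodb", "example-string-to-sign")
    sig = sigv4_signature("secret_pass", *args)
    print(signature_matches("secret_pass", sig, *args))
    print(signature_matches("wrong_secret", sig, *args))
```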

The implementation is currently coupled with Scylla's system_auth.roles table, which means that an additional step needs to be performed when setting up Scylla as the test environment. Tests will use the following credentials:

  • Username: alternator
  • Secret key: secret_pass

With cqlsh, this can be achieved by executing this snippet:

cqlsh -x "INSERT INTO system_auth.roles (role, salted_hash) VALUES ('alternator', 'secret_pass')"

Most tests expect the authorization to succeed, so they will pass even with alternator_enforce_authorization turned off. However, test cases from test_authorization.py may require this option to be turned on, so enabling it is advised.