Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
BatchGetItem request is a map of table names and 'sub-requests', ExpressionAttributeNames is defined on
'sub-request' level but the code was instead checking the top level, obtaining nullptr every time which
effectively disables unused names check.
Fixes#13251
This reverts commit 8e892426e2 and fixes
the code in a different way:
That commit moved the scylla_inject_error function from
test/alternator/util.py to test/cql-pytest/util.py and renamed
test/alternator/util.py. I found the rename confusing and unnecessary.
Moreover, the moved function isn't even usable today by the test suite
that includes it, cql-pytest, because it lacks the "rest_api" fixture :-)
so test/cql-pytest/util.py wasn't the right place for it anyway.
test/rest_api/rest_util.py could have been a good place for this function,
but there is another complication: Although the Alternator and rest_api
tests both had a "rest_api" fixture, it has a different type, which led
to the code in rest_api which used the moved function to have to jump
through hoops to call it instead of just passing "rest_api".
I think the best solution is to revert the above commit, and duplicate
the short scylla_inject_error() function. The duplication isn't an
exact copy - the test/rest_api/rest_util.py version now accepts the
"rest_api" fixture instead of the URL that the Alternator version used.
In the future we can remove some of this duplication by having some
shared "library" code but we should do it carefully and starting with
agreeing on the basic fixtures like "rest_api" and "cql", without that
it's not useful to share small functions that operate on them.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#11275
Move scylla_inject_error from alternator/ to cql-pytest/ so it
can be reached from various tests dirs. alternator/util.py is
renamed to alternator/alternator_util.py to avoid name shadowing.
DynamoDB API's BatchGetItem invokes a number (up to 25) of read requests
in parallel, returning when all results are available. Alternator naively
implemented this by sending all read requests in parallel, no matter which
requests these were.
That implementation was inefficient when all the requests are to different
items (clustering rows) of the same partition. In a multi-node setup this
will end up sending 25 separate requests to the same remote node(s). Even
on a single-node setup, this may result in reading from disk more than
once, and even if the partition is cached - doing an O(logN) search in
each multiple times.
What we do in this patch, instead, is to group all the BatchGetItem
requests that aimed at the same partition into a single read request
asking for a (sorted) list of clustering keys. This is similar to an
"IN" request in CQL.
As an example of the performance benefit of this patch, I tried a
BatchGetItem request asking for 20 random items from a 10-million item
partition. I measured the latency of this request on a single-node
Scylla. Before this patch, I saw a latency of 17-21 ms (the lower number
is when the request is retried and the requested items are already in
the cache). After this patch, the latency is 10-14 ms. The performance
improvement on multi-node clusters are expected to be even higher.
Unfortunately the patch is less trivial than I hoped it would be,
because some of the old code was organized under the assumption that
each read request only returned one item (and if it failed, it means
only one item failed), so this part of the code had to be reorganized
(and, for making the code more readable, coroutinized).
An unintended benefit of the code reorganization is that it also gave
me an opportunity to fail an attempt to ask BatchGetItem the same
item more than once (issue #10757).
The patch also adds a few more corner cases in the tests, to be even
more sure that the code reorganization doesn't introduce a regression
in BatchGetItem.
Fixes#10753Fixes#10757
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Our simple test for BatchGetItem on a table with sort keys still has
requests with just one sort key per partition, so if BatchGetItem has
a bug with requesting multiple sort keys from the same partition,
such bug won't be caught by the simple tests. So in this test we add a
test that does. This will be useful for the next patch, we are planning
to refactor BatchGetItem's handling of multiple sort keys in the same
partition - so it will be useful to have more regression tests.
The tests test_batch_get_item_large and test_batch_get_item_partial
would actually also catch such bugs, but they are more elaborate tests
and it's nice to have smaller tests more focused on checking specific
features.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
It turns out that DynamoDB forbids requesting the same item more than
once in a GetBatchItem request. Trying to do it would obviously be a
waste, but DynamoDB outright refuses it - and Alternator currently
doesn't (refs #10757).
The test currently passes on DynamoDB and fails on Alternator, so it
is marked xfail.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#10758
In test_tracing.py and util.py, we already have three duplicates of code
which looks for the Scylla REST API. We'll soon want to add even more uses
of this REST API, so it's good time to add a single fixture, "rest_api",
which can be use in all tests that need the Scylla REST API instead of
duplicating the same code.
A test using the "rest_api" fixture will be skipped if the server isn't
Scylla, or its port 10000 is not available or not responsive.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220331195337.64352-1-nyh@scylladb.com>
The test verifies that if all reads from a batch operation
failed, the result is an error, and not a success response
with UnprocessedKeys parameter set to all keys.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
DynamoDB limits the number of items that a BatchWriteItem call can write
to 25. As noted in issue #5057, in Alternator we don't have this limit
or any limit on the number of items in a BatchWriteItem - which probably
isn't wise.
This patch adds a simple xfailing test for this.
Refs #5057
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210912140736.76995-1-nyh@scylladb.com>
This patch adds an Alternator test, test_batch_get_item_large,
which checks a BatchGetItem with a moderately large (1.5 MB) response.
The test passes - we do not have a bug in BatchGetItem - but it
does reproduce issue #8522 - the long response is stored in memory as
one long contiguous string and causes a warning about an over-sized
allocation:
WARN ... seastar_memory - oversized allocation: 2281472 bytes.
Incidentally, this test also reproduces a second contiguous
allocation problem - issue #8183 (in BatchWriteItem which we use
in this test to set up the item to read).
Refs #8522
Refs #8183
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210520161619.110941-1-nyh@scylladb.com>
Alternator request sizes can be up to 16 MB, but the current implementation
had the Seastar HTTP server read the entire request as a contiguous string,
and then processed it. We can't avoid reading the entire request up-front -
we want to verify its integrity before doing any additional processing on it.
But there is no reason why the entire request needs to be stored in one big
*contiguous* allocation. This always a bad idea. We should use a non-
contiguous buffer, and that's the goal of this patch.
We use a new Seastar HTTPD feature where we can ask for an input stream,
instead of a string, for the request's body. We then begin the request
handling by reading lthe content of this stream into a
vector<temporary_buffer<char>> (which we alias "chunked_content"). We then
use this non-contiguous buffer to verify the request's signature and
if successful - parse the request JSON and finally execute it.
Beyond avoiding contiguous allocations, another benefit of this patch is
that while parsing a long request composed of chunks, we free each chunk
as soon as its parsing completed. This reduces the peak amount of memory
used by the query - we no longer need to store both unparsed and parsed
versions of the request at the same time.
Although we already had tests with requests of different lengths, most
of them were short enough to only have one chunk, and only a few had
2 or 3 chunks. So we also add a test which makes a much longer request
(a BatchWriteItem with large items), which in my experiment had 17 chunks.
The goal of this test is to verify that the new signature and JSON parsing
code which needs to cross chunk boundaries work as expected.
Fixes#7213.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210309222525.1628234-1-nyh@scylladb.com>
Client libraries (e.g. PynamoDB) expect the UnprocessedKeys
and UnprocessedItems attributes to appear in the response
unconditionally - it's hereby added, along with a simple test case.
Fixes#6569
Tests: unit(dev)
To make the tests in alternator-test runnable by test.py, we need to
move the directory alternator-test/ to test/alternator, because test.py
only looks for tests in subdirectories of test/. Then, we need to create
a test/alternator/suite.yaml saying that this test directory is of type
"Run", i.e., it has a single run script "run" which runs all its tests.
The "run" script had to be slightly modified to be aware of its new
location relative to the source directory.
To run the Alternator tests from test.py, do:
./test.py --mode dev alternator
Note that in this version, the "--mode" has no effect - test/alternator/run
always runs the latest compiled Scylla, regardless of the chosen mode.
The Alternator tests can still be run manually and individually against
a running Scylla or DynamoDB as before - just go to the test/alternator
directory (instead of alternator-test previously) and run "pytest" with
the desired parameters.
Fixes#6046
Signed-off-by: Nadav Har'El <nyh@scylladb.com>