This series adds background reclaim to lsa, with the goal
that most large allocations can be satisfied from available
free memory, and and reclaim work can be done from a preemptible
context.
If the workload has free cpu, then background reclaim will
utilize that free cpu, reducing latency for the main workload.
Otherwise, background reclaim will compete with the main
workload, but since that work needs to happen anyway,
throughput will not be reduced.
A unit test is added to verify it works.
Fixes#1634.
Closes#8044
* github.com:scylladb/scylla:
test: logalloc_test: test background reclaim
logalloc: reduce gap between std min_free and logalloc min_free
logalloc: background reclaim
logalloc: preemptible reclaim
Test that the background reclaimer is able to compete with a
fake load and reclaim 10 MB/s. The test is quite stressful as the "LRU"
is fully randomized.
If the background reclaimer is disabled, the test fails as soon as the
20MB "gap" is exhausted. With the reclaimer enabled, it is able to
free memory ahead of the allocations.
This patch adds to Alternator support for the CORS (Cross-Origin Resource
Sharing) protocol - a simple extension over the HTTP protocol which
browsers use when Javascript code contacts HTTP-based servers.
Although we usually think of Alternator as being used in a three-tier
application, in some setups there is no middle layer and the user's
browser, running Javascript code, wants to communicate directly with the
database. However, for security reasons, by default Javascript loaded
from domain X is not allowed to communicate with different domains Y.
The CORS protocol is meant to allow this, and Alternator needs to
participate in this protocol if it is to be used directly from Javascript
in browsers.
To implement CORS, Alternator needs to respond to the OPTIONS method
which it didn't allow before - with certain headers based on the
input headers. It also needs to do some of these things for the
regular methods (mostly, POST). The patch includes a comprehensive
test that runs against both Alternator and DynamoDB and shows that
Alternator handles these headers and methods the same as DynamoDB.
Additionally, I tested manually a Javascript DynamoDB client - which
didn't work prior to this patch (the browser reported CORS errors),
and works after this patch.
Fixes#8025.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210217222027.1219319-1-nyh@scylladb.com>
"
Current storage of cells in a row is a union of vector and set. The
vector holds 5 cell_and_hash's inline, up to 32 ones in the external
storage and then it's switched to std::set. Once switched, the whole
union becomes the waste of space, as it's size is
sizeof(vector head) + 5 * sizeof(cell and hash) = 90+ bytes
and only 3 pointers from it are used (std::set header). Also the
overhead to keep cell_and_hash as a set entry is more then the size
of the structure itself.
Column ids are 32-bit integers that most likely come sequentialy.
For this kind of a search key a radix tree (with some care for
non-sequential cases) can be beneficial.
This set introduces a compact radix tree, that uses 7-bit sub values
from the search key to index on each node and compacts the nodes
themselves for better memory usage. Then the row::_storage is replaced
with the new tree.
The most notable result is the memory footprint decrease, for wide
rows down to 2x times. The performance of micro-benchmarks is a bit
lower for small rows and (!) higer for longer (8+ cells). The numbers
are in patch #12 (spoiler: they are better than for v2)
v3:
- trimmed size of radix down to 7 bits
- simplified the nodes layouts, now there are 2 of them (was 4)
- enhanced perf_mutation to test N-cells schema
- added AVX intra-nodes search for medium-sized nodes
- added .clone_from() method that helped to improve perf_mutation
- minor
- changed functions not to return values via refs-arguments
- fixed nested classes to properly use language constructors
- renamed index_to to key_t to distinguish from node_index_t
- improved recurring variadic templates not to use sentinel argument
- use standard concepts
v2:
- fixed potential mis-compilation due to strict-aliasing violation
- added oracle test (radix tree is compared with std::map)
- added radix to perf_collection
- cosmetic changes (concepts, comments, names)
A note on item 1 from v2 changelog. The nodes are no longer packed
perfectly, each has grown 3 bytes. But it turned out that when used
as cells container most of this growth drowned in lsa alignments.
next todo:
- aarch64 version of 16-keys node search
tests: unit(dev), unit(debug for radix*), pref(dev)
"
* 'br-radix-tree-for-cells-3' of https://github.com/xemul/scylla:
test/memory_footpring: Print radix tree node sizes
row: Remove old storages
row: Prepare row::equal for switch
row: Prepare row::difference for switch
row: Introduce radix tree storage type
row-equal: Re-declare the cells_equal lambda
test: Add tests for radix tree
utils: Compact radix tree
array-search: Add helpers to search for a byte in array
test/perf_collection: Add callback to check the speed of clone
test/perf_mutation: Add option to run with more than 1 columns
test/perf_mutation: Prepare to have several regular columns
test/perf_mutation: Use builder to build schema
In this patch, we port validation/entities/json_test.java, containing
21 tests for various JSON-related operations - SELECT JSON, INSERT JSON,
and the fromJson() and toJson() functions.
In porting these tests, I uncovered 19 (!!) previously unknown bugs in
Scylla:
Refs #7911: Failed fromJson() should result in FunctionFailure error, not
an internal error.
Refs #7912: fromJson() should allow null parameter.
Refs #7914: fromJson() integer overflow should cause an error, not silent
wrap-around.
Refs #7915: fromJson() should accept "true" and "false" also as strings.
Refs #7944: fromJson() should not accept the empty string "" as a number.
Refs #7949: fromJson() fails to set a map<ascii, int>.
Refs #7954: fromJson() fails to set null tuple elements.
Refs #7972: toJson() truncates some doubles to integers.
Refs #7988: toJson() produces invalid JSON for columns with "time" type.
Refs #7997: toJson() is missing a timezone on timestamp.
Refs #8001: Documented unit "µs" not supported for assigning a "duration"
type.
Refs #8002: toJson() of decimal type doesn't use exponents so can produce
huge output.
Refs #8077: SELECT JSON output for function invocations should be
compatible with Cassandra.
Refs #8078: SELECT JSON ignores the "AS" specification.
Refs #8085: INSERT JSON with bad arguments should yield InvalidRequest
error, not internal error.
Refs #8086: INSERT JSON cannot handle user-defined types with case-
sensitive component names.
Refs #8087: SELECT JSON incorrectly quotes strings inside map keys.
Refs #8092: SELECT JSON missing null component after adding field to
UDT definition.
Refs #8100: SELECT JSON with IN and ORDER BY does not obey the ORDER BY.
Due to these bugs, 8 out of the 21 tests here currently xfail and one
has to be skipped (issue #8100 causes the sanitizer to detect a use
after free, and crash Scylla).
As usual in these sort of tests, all 21 tests pass when running against
Cassandra.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210217130732.1202811-1-nyh@scylladb.com>
Until now, the lists of streams in the `cdc_streams_descriptions` table
for a given generation were stored in a single collection. This solution
has multiple problems when dealing with large clusters (which produce
large lists of streams):
1. large allocations
2. reactor stalls
3. mutations too large to even fit in commitlog segments
This commit changes the schema of the table as described in issue #7993.
The streams are grouped according to token ranges, each token range
being represented by a separate clustering row. Rows are inserted in
reasonably large batches for efficiency.
The table is renamed to enable easy upgrade. On upgrade, the latest CDC
generation's list of streams will be (re-)inserted into the new table.
Yet another table is added: one that contains only the generation
timestamps clustered in a single partition. This makes it easy for CDC
clients to learn about new generations. It also enables an elegant
two-phase insertion procedure of the generation description: first we
insert the streams; only after ensuring that a quorum of replicas
contains them, we insert the timestamp. Thus, if any client observes a
timestamp in the timestamps table (even using a ONE query),
it means that a quorum of replicas must contain the list of streams.
---
Nodes automatically ensure that the latest CDC generation's list of
streams is present in the streams description table. When a new
generation appears, we only need to update the table for this
generation; old generations are already inserted.
However, we've changed the description table (from
`cdc_streams_descriptions` to `cdc_streams_descriptions_v2`). The
existing mechanism only ensures that the latest generation appears in
the new description table. We add an additional procedure that
rewrites the older generations as well, if we find that it is necessary
to do so (i.e. when some CDC log tables may contain data in these
generations).
Closes#8116
* github.com:scylladb/scylla:
tests: add a simple CDC cql pytest
cdc: add config option to disable streams rewriting
cdc: rewrite streams to the new description table
cql3: query_processor: improve internal paged query API
cdc: introduce no_generation_data_exception exception type
docs: cdc: mention system.cdc_local table
cdc: coroutinize do_update_streams_description
sys_dist_ks: split CDC streams table partitions into clustered rows
cdc: use chunked_vector for streams in streams_version
cdc: remove `streams_version::expired` field
system_distributed_keyspace: use mutation API to insert CDC streams
storage_service: don't use `sys_dist_ks` before it is started
The `query_processor::query` method allowed internal paged queries.
However, it was quite limited, hardcoding a number of parameters:
consistency level, timeout config, page size.
This commit does the following improvements:
1. Rename `query` to `query_internal` to make it obvious that this API
is supposed to be used for internal queries only
2. Extend the method to take consistency level, timeout config, and page
size as parameters
3. Remove unused overloads of `query_internal`
4. Fix a bunch of typos / grammar issues in the docstring
Until now, the lists of streams in the `cdc_streams_descriptions` table
for a given generation were stored in a single collection. This solution
has multiple problems when dealing with large clusters (which produce
large lists of streams):
1. large allocations
2. reactor stalls
3. mutations too large to even fit in commitlog segments
This commit changes the schema of the table as described in issue #7993.
The streams are grouped according to token ranges, each token range
being represented by a separate clustering row. Rows are inserted in
reasonably large batches for efficiency.
The table is renamed to enable easy upgrade. On upgrade, the latest CDC
generation's list of streams will be (re-)inserted into the new table.
Yet another table is added: one that contains only the generation
timestamps clustered in a single partition. This makes it easy for CDC
clients to learn about new generations. It also enables an elegant
two-phase insertion procedure of the generation description: first we
insert the streams; only after ensuring that a quorum of replicas
contains them, we insert the timestamp. Thus, if any client observes a
timestamp in the timestamps table (even using a ONE query),
it means that a quorum of replicas must contain the list of streams.
Test log consistency after apply_snapshot() is called.
Ensure log::last_term() log::last_conf_index() and log::size()
work as expected.
Misc cleanups.
* scylla-dev/raft-confchange-test:
raft: add a unit test for voting
raft: do not account for the same vote twice
raft: remove fsm::set_configuration()
raft: consistently use configuration from the log
raft: add ostream serialization for enum vote_result
raft: advance commit index right after leaving joint configuration
raft: add tracker test
raft: tidy up follower_progress API
raft: update raft::log::apply_snapshot() assert
raft: add a unit test for raft::log
raft: rename log::non_snapshoted_length() to log::length()
raft: inline raft::log::truncate_tail()
raft: ignore AppendEntries RPC with a very old term
raft: remove log::start_idx()
raft: return a correct last term on an empty log
raft: do not use raft::log::start_idx() outside raft::log()
raft: rename progress.hh to tracker.hh
raft: extend single_node_is_quiet test
`_range_override` is used to store the modified range the reader reads
after it has to be recreated (when recreating a reader it's read range
is reduced to account for partitions it already read). When engaged,
this field overrides the `_pr` field as the definitive range the reader
is supposed to be currently reading. Fast forwarding conceptually
overrides the range the reader is currently reading, however currently
it doesn't reset the `_range_override` field. This resulted in
`_range_override` (containing the modified pre-fast-forward range)
incorrectly overriding the fast-forwarded-to range in `_pr` when
validating the first partition produced by the just recreated reader,
resulting in a false-positive validation failure.
Fixes: #8059
Tests: unit(release)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210217164744.420100-1-bdenes@scylladb.com>
Unlike flat_mutation_reader_opt that is defined using
optimized_optional<flat_mutation_reader>, std::optional<T> does not evaluate
to `false` after being moved, only after it is explicitly reset.
Use flat_mutation_reader_opt rather than std::optional<flat_mutation_reader>
to make it easier to check if it was closed before it's destroyed
or being assigned-over.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210215101254.480228-6-bhalevy@scylladb.com>
Currently, whole topology description for CDC is stored in a single row.
This means that for a large cluster of strong machines (say 100 nodes 64
cpus each), the size of the topology description can reach 32MB.
This causes multiple problems. First of all, there's a hard limit on
mutation size that can be written to Scylla. It's related to commit log
block size which is 16MB by default. Mutations bigger than that can't be
saved. Moreover, such big partitions/rows cause reactor stalls and
negatively influence latency of other requests.
This patch limits the size of topology description to about 4MB. This is
done by reducing the number of CDC streams per vnode and can lead to CDC
data not being fully colocated with Base Table data on shards. It can
impact performance and consistency of data.
This is just a quick fix to make it easily backportable. A full solution
to the problem is under development.
For more details see #7961, #7993 and #7985.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Closes#8048
* github.com:scylladb/scylla:
cdc: Limit size of topology description
cdc: Extract create_stream_ids from topology_description_generator
Currently, whole topology description for CDC is stored in a single row.
This means that for a large cluster of strong machines (say 100 nodes 64
cpus each), the size of the topology description can reach 32MB.
This causes multiple problems. First of all, there's a hard limit on
mutation size that can be written to Scylla. It's related to commit log
block size which is 16MB by default. Mutations bigger than that can't be
saved. Moreover, such big partitions/rows cause reactor stalls and
negatively influence latency of other requests.
This patch limits the size of topology description to about 4MB. This is
done by reducing the number of CDC streams per vnode and can lead to CDC
data not being fully colocated with Base Table data on shards. It can
impact performance and consistency of data.
This is just a quick fix to make it easily backportable. A full solution
to the problem is under development.
For more details see #7961, #7993 and #7985.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Commit aab6b0ee27 introduced the
controversial new IMR format, which relied on a very template-heavy
infrastructure to generate serialization and deserialization code via
template meta-programming. The promise was that this new format, beyond
solving the problems the previous open-coded representation had (working
on linearized buffers), will speed up migrating other components to this
IMR format, as the IMR infrastructure reduces code bloat, makes the code
more readable via declarative type descriptions as well as safer.
However, the results were almost the opposite. The template
meta-programming used by the IMR infrastructure proved very hard to
understand. Developers don't want to read or modify it. Maintainers
don't want to see it being used anywhere else. In short, nobody wants to
touch it.
This commit does a conceptual revert of
aab6b0ee27. A verbatim revert is not
possible because related code evolved a lot since the merge. Also, going
back to the previous code would mean we regress as we'd revert the move
to fragmented buffers. So this revert is only conceptual, it changes the
underlying infrastructure back to the previous open-coded one, but keeps
the fragmented buffers, as well as the interface of the related
components (to the extent possible).
Fixes: #5578Closes#8106
* github.com:scylladb/scylla:
imr: switch back to open-coded description of structures
utils: managed_bytes: add a few trivial helper methods
utils: fragment_range: move FragmentedView helpers to fragment_range.hh
utils: fragment_range: add single_fragmented_mutable_view
utils: fragment_range: implement FragmentRange for fragment_range
utils: mutable_view: add front()
types: remove an unused helper function
test: mutation_test: fix memory calculations in make_fragments_with_non_monotonic_positions
test: mutation_test: remove an obsolete assertion
test: mutation_test: initialize an uninitialized variable
test: sstable_datafile_test: fix tracking of closed sstables in sstable_run_based_compaction_test
Commit aab6b0ee27 introduced the
controversial new IMR format, which relied on a very template-heavy
infrastructure to generate serialization and deserialization code via
template meta-programming. The promise was that this new format, beyond
solving the problems the previous open-coded representation had (working
on linearized buffers), will speed up migrating other components to this
IMR format, as the IMR infrastructure reduces code bloat, makes the code
more readable via declarative type descriptions as well as safer.
However, the results were almost the opposite. The template
meta-programming used by the IMR infrastructure proved very hard to
understand. Developers don't want to read or modify it. Maintainers
don't want to see it being used anywhere else. In short, nobody wants to
touch it.
This commit does a conceptual revert of
aab6b0ee27. A verbatim revert is not
possible because related code evolved a lot since the merge. Also, going
back to the previous code would mean we regress as we'd revert the move
to fragmented buffers. So this revert is only conceptual, it changes the
underlying infrastructure back to the previous open-coded one, but keeps
the fragmented buffers, as well as the interface of the related
components (to the extent possible).
Fixes: #5578
The off-by-one error would cause
test_multishard_combining_reader_non_strictly_monotonic_positions to fail if
the added range_tombstones filled the buffer exactly to the end.
In such situation, with the old loop condition,
make_fragments_with_non_monotonic_positions would add one range_tombstone too
many to the deque, violating the test assumptions.
Due to small value optimizations, the removed assertions are not true in
general. Until now, atomic_cell did not use small value optimizations, but
it will after upcoming changes.
sstable_run_based_compaction_test assumed that sstables are freed immediately
after they are fully processed.
Hovewer, since commit b524f96a74,
mutation_reader_merger releases sstables in batches of 4, which breaks the
assumption. This fix adjusts the test accordingly.
Until now, the test only kept working by chance: by coincidence, the number of
test sstables processed by merging_reader in a single fill_buffer() call was
divisible by 4. Since the test checks happen between those calls,
the test never witnessed a situation when an sstable was fully processed,
but not released yet.
The error was noticed during the work on an upcoming patch which changes the
size of mutation_fragment, and reduces the number of test sstables processed
in a single fill_buffer() call, which breaks the test.
raft::log::start_idx() is currently not meaningful
in case the log is empty.
Avoid using it in fsm::replicate_to() and avoid manual search for
previous log term, instead encapsulate the search in log::term_for().
As a side effect we currently return a correct term (0)
when log matching rule is exercised for an empty log
and the very first snapshot with term 0. Update raft_etcd_test.cc
accordingly.
This change happens to reduce the overall line count.
While at it, improve the comments in raft::replicate_to().
This patch adds several additional tests o test/cql-pytest/test_json.py
to reproduce additional bugs or clarify some non-bugs.
First, it adds a reproducer for issue #8087, where SELECT JSON may create
invalid JSON - because it doesn't quote a string which is part of a map's
key. As usual for these reproducers, the test passes on Cassandra, and fails
on Scylla (so marked xfail).
We have a bigger test translated from Cassandra's unit tests,
cassandra_tests/validation/entities/json_test.py::testInsertJsonSyntaxWithNonNativeMapKeys
which demonstrates the same problem, but the test added in this patch is much
shorter and focuses on demonstrating exactly where the problem is.
Second, this patch adds a test test verifies that SELECT JSON works correctly
for UDTs or tuples where one of their components was never set - in such a
case the SELECT JSON should also output this component, with a "null" value.
And this test works (i.e., produces the same result in Cassandra and Scylla).
This test is interesting because it shows that issue #8092 is specific to the
case of an altered UDT, and doesn't happen for every case of null
component in a UDT.
Refs #8087
Refs #8092
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210216150329.1167335-1-nyh@scylladb.com>
We see long reactor stalls from `logalloc::prime_segment_pool`
in debug mode yet the stall detector's purpose is to detect
reactor stalls during normal operation where they can increase
the latency of other queries running in parallel.
Since this change doesn't actually fix the stalls but rather
hides them, the following annotations will just refrence
the respective github issues rather than auto-close them.
Refs #7150
Refs #5192
Refs #5960
Restore blocked_reactor_notify_ms right before
starting storage_proxy. Once storage_proxy is up, this node
affects cluster latency, and so stalls should be reported so
they can be fixed.
Test: secondary_index_test --blocked-reactor-notify-ms 1 (release)
DTest: CASSANDRA_DIR=../scylla/build/release SCYLLA_EXT_OPTS="--blocked-reactor-notify-ms 2" ./scripts/run_test.sh materialized_views_test:TestMaterializedViews.interrupt_build_process_with_resharding_half_to_max_test
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210216112052.27672-1-bhalevy@scylladb.com>
After switching cells storage onto compact radix tree it
becomes useful to know the tree nodes' sizes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now when the 3rd storage type (radix tree) is all in, old
storage can be safely removed. The result is:
1. memory footprint
sizeof(class row): 112 => 16 bytes
sizeof(rows_entry): 126 => 120 bytes
the "in cache" value depends on the number of cells:
num of cells master patch
1 752 656
2 808 712
3 864 768
4 920 824
5 968 936
6 1136 992
...
16 1840 1672
17 1904 1992 (+88)
18 1976 2048 (+72)
19 2048 2104 (+56)
20 2120 2160 (+40)
21 2184 2208 (+24)
22 2256 2264 ( +8)
23 2328 2320
...
32 2960 2808
After 32 cells the storage switches into rbtree with
24-bytes per-cell overhead and the radix tree improvement
rocketlaunches
64 7872 6056
128 15040 9512
256 29376 18568
2. perf_mutation test is enhanced by this series and the
results differ depending on the number of columns used
tps value
--column-count master patch
1 59.9k 57.6k (-3.8%)
2 59.9k 57.5k
4 59.8k 57.6k
8 57.6k 57.7k <- eq
16 56.3k 57.6k
32 53.2k 57.4k (+7.9%)
A note on this. Last time 1-column test was ~5% worse which
was explained by inline storage of 5 cells that's present on
current implementation and was absent in radix tree.
An attempt to make inline storage for small radix trees
resulted in complete loss of memory footprint gain, but gave
fraction of percent to perf_mutation performance. So this
version doesn't have inline nodes.
The 1.2% improvement from v2 surprisingly came from the
tree::clone_from() which in v2 was work-around-ed by slow
walk+emplace sequence while this version has the optimized
API call for cloning.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
As Gleb suggested in a previous review, remove ticker from raft and
leave calling tick() to external code.
While there, tick faster to speed up tests.
* https://github.com/alecco/scylla/tree/tests-17-remove-ticker:
raft: replication test: reduce ticker from 100ms to 1ms
raft: drop ticker from raft
In some places scylla clones collections of objects, so it's
sometimes needed to measure the speed of this operation.
This patch adds a placeholder for it, but no implementations
for any supported collections. It will be added soon for radix
tree.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The --column-count makes the test generate schema with
the given numbers of columns and make mutation maker
fill random column with the value on each iteration.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Teach the schema builder and test itself to work on more
than one regular column, but for now only use 1, as before.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
in all expressions' from Nadav Har'El.
This series fixes#5024 - which is about adding support for nested attribute
paths (e.g., a.b.c[2]) to Alternator. The series adds complete support for this
feature in ProjectionExpression, ConditionExpression, FilterExpression and
UpdateExpression - and also its combination with ReturnValues. Many relevant
tests - and also some new tests added in this series - now pass.
The first patch in the series fixes#8043 a bug in some error cases in
conditions, which was discovered while working in this series, and is
conceptually separate from the rest of the series.
Closes#8066
* github.com:scylladb/scylla:
alternator: correct implemention of UpdateItem with nested attributes and ReturnValues
alternator: fix bug in ReturnValues=UPDATED_NEW
alternator: implemented nested attribute paths in UpdateExpression
alternator: limit the depth of nested paths
alternator: prepare for UpdateItem nested attribute paths
alternator: overhaul ProjectionExpression hierarchy implementation
alternator: make parsed::path object printable
alternator-test: a few more ProjectionExpression conflict test cases
alternator-test: improve tests for nested attributes in UpdateExpression
alternator: support attribute paths in ConditionExpression, FilterExpression
alternator-test: improve tests for nested attributes in ConditionExpression
alternator: support attribute paths in ProjectionExpression
alternator: overhaul attrs_to_get handling
alternator-test: additional tests for attribute paths in ProjectionExpression
alternator-test: harden attribute-path tests for ProjectionExpression
alternator: fix ValidationException in FilterExpression - and more
This patch adds several more tests reproducing bugs in toJson() and
SELECT JSON.
First add two xfailing tests reproducing two toJson() issues - #7988
and #8002. The first is that toJson() incorrectly formats values of the
"time" type - it should be a string but Scylla forgets the quotes.
The second is that toJson() format "decimal" values as JSON numbers
without using an exponent, resulting in memory allocation failure
for numbers with high exponents, like 1e1000000000.
The actual test for 1e1000000000 has to be skipped because in
debug build mode we get a crash trying this huge allocation.
So instead, we check 1e1000 - this generates a string of 1000
characters, which is much too much (should just be "1e1000")
but doesn't crash.
Then we add a reproducing test for issue #8077: When using SELECT JSON
on a function, such as count(*), ttl(v) or intAsBlob(v), Cassandra has
a specific way how it formats the result in JSON, and Scylla should do
it the same way unless we have a good reason not to.
As usual, the new tests passes on Cassandra, fails on Scylla, so is marked
xfail.
Refs #7988
Refs #8002
Refs #8077.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210214210727.1098388-1-nyh@scylladb.com>
To speed up replication test reduce the tick time from 100ms to 1ms
Speed up: debug 3.7 to 2.5, dev 2.9 to 2.1 seconds
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
This test reproduces issue #7987, where Scylla cannot set a time column
with an integer - wheres the documentation says this should be possible
and it also works in Cassandra.
The test file includes tests for both ways of setting a time column
(using an integer and a string), with both prepared and unprepared
statements, and demonstrates that only one combination fails in Scylla -
an unprepared statement with an integer. This test xfails on Scylla
and passes on Cassandra, and the rest pass on both.
Refs #7987.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210128215103.370723-1-nyh@scylladb.com>
This patch fixes the last missing part of nested attribute support in
UpdateItem - returning the correct attributes when ReturnValues is requested.
When the expression says "a.b = :val" and ReturnValues is set to UPDATED_OLD
or UPDATED_NEW, only the actual updated attribute a.b should be returned, not
the entire top-level attribute a as we did before this patch.
This patch was made very simple because our existing hierarchy_filter()
function already does exactly the right thing, and can trivially be made to
accept any attribute_path_map<T> (in our case attribute_path_map<action>),
not just attrs_to_get as it did until now.
This patch also adds several more checks to the test in test_returnvalues.py
to improve the test's coverage even more. Interestingly, I discovered two
esoteric cases where DynamoDB does something which makes little sense, but
apparently simplified their implementation - but the beautiful thing is that
it also simplifies our implementation! See long comments about these two
cases in the test code.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds full support for nested attribute paths (e.g., a.b[3].c)
in UpdateExpression. After in previous patches we already added such
support for ProjectionExpression, ConditionExpression and FilterExpression
this means the nested attribute paths feature is now complete, so we
remove the warning from the documents. However, there is one last loose
end to tie and we will do it in the next patch: After this patch, the
combination of UpdateExpression with nested attributes and ReturnValues
is still wrong, and the test for it in test_returnvalues.py still xfails.
Note that previous patches already implemented support for attribute paths
in expression evaluations - i.e., the right-hand side of UpdateExpression
actions, and in this patch we just needed to implement the left hand side:
When an update action is on an attribute a.b we need to read the entire
content of the top-level a (an RWM operation), modify just the b part of
its json with the result of the action, and finally write back the entire
content of a. Of course everything gets complicated by the fact that we
can have multiple actions on multiple pieces of the same JSON, and we also
need to detect overlapping and conflicting actions (we already have this
detection in the attribute_path_map<> class we introduced in a previous
patch).
I decided to leave one small esoteric difference, reproduced by the xfailing
test_update_expression.py::test_nested_attribute_remove_from_missing_item:
As expected, "SET x.y = :val" fails for an item if its attribute x doesn't
exist or the item itself does not exist. For the update expression
"REMOVE x.y", DynamoDB fails if the attribute x doesn't exist, but oddly
silently passes if the entire item doesn't exist. Alternator does not
currently reproduce this oddity - it will fail this write as well.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
DynamoDB limits the depth of a nested path in expressions (e.g. "a.b.c.d")
to 32 levels. This patch adds the same limit also to Alternator.
The exact value of this limit is less important (although it did make
sense to choose the same limit as DynamoDB does), but it's important
to have *some* limit: It's often convenient to handle paths with a
recursive algorithm, and if we allow unlimited path depth, it can
result in unlimited recursion depth, and a crash. Let's avoid this
possibility.
We detect the over-long path while building the parsed::path object
in the parser, and generate a parse error.
This patch also includes a test that verifies that both Alternator
and DynamoDB have the same 32-level nesting limit on paths.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch prepares UpdateItem for updating of nested attribute paths
(e.g., "SET a.b = :val"), but does not yet support them.
Instead of _update_expression holding an unsorted list of "actions",
we change it to hold a attribute_path_map of actions. This will allow
us to process all the actions on a top-level attribute together, and
moreover gets us "for free" the correct checking for overlapping and
conflicting updates - exactly the same checking we already had in
attribute_path_map for ProjectionExpression. Other than this change,
most of this patch is just code movement, not functional changes.
After this patch, the tests for update path overlap and conflict pass:
test_update_expression_multi_overlap_nested and
test_update_expression_multi_conflict_nested.
We can also mark test_update_expression_nested_attribute_rhs as passing -
this test involves an attribute path in the right-hand-side of an update,
but the left-hand-side is still a top-level attribute, so it works (it
actually worked before this patch - it started working when we implemented
attribute paths in expressions, for ConditionExpression and
FilterExpression).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>