Close _delegate if it's engaged both in the close() method
and when ever it is currently reset by _delegate = {}.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We don't own _source therefore do not close it.
That said, we still need to make sure that the reversing reader
itself is closed to calm down the check when it's destroyed.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The underlying reader is owned by the caller if it is moved to it,
but not if it was constructed with a reference to the underlying reader.
Close the underlying reader on close() only in the former case.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
`with_closeable` simplifies scoped use of
flat_mutation_reader, making sure to always close
the reader after use.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This series adds a wrapper for the default rjson allocator which throws on allocation/reallocation failures. It's done to work around several rapidjson (the underlying JSON parsing library) bugs - in a few cases, malloc/realloc return value is not checked, which results in dereferencing a null pointer (or an arbitrary pointer computed as 0 + `size`, with the `size` parameter being provided by the user). The new allocator will throw an `rjson:error` if it fails to allocate or reallocate memory.
This series comes with unit tests which checks the new allocator behavior and also validates that an internal rapidjson structure which we indirectly rely upon (Stack) is not left in invalid state after throwing. The last part is verified by the fact that its destructor ran without errors.
Fixes#8521
Refs #8515
Tests:
* unit(release)
* YCSB: inserting data similar to the one mentioned in #8515 - 1.5MB objects clustered in partitions 30k objects in size - nothing crashed during various YCSB workloads, but nothing also crashed for me locally before this patch, so it's not 100% robust
relevant YCSB workload config for using 1.5MB objects:
```yaml
fieldcount=150
fieldlength=10000
```
Closes#8529
* github.com:scylladb/scylla:
test: add a test for rjson allocation
test: rename alternator_base64_test to alternator_unit_test
rjson: add a throwing allocator
sstable cells are parsed into temporary_buffers, which causes large contiguous allocations for some cells.
This is fixed by storing fragments of the cell value in a fragmented_temporary_buffer instead.
To achieve this, this patch also adds new methods to the fragmented_temporary_buffer(size(), ostream& operator<<()) and adds methods to the underlying parser(primitive_consumer) for parsing byte strings into fragmented buffers.
Fixes#7457Fixes#6376Closes#8182
* github.com:scylladb/scylla:
primitive_consumer: keep fragments of parsed buffer in a small_vector
sstables: add parsing of cell values into fragmented buffers
sstables: add non-contiguous parsing of byte strings to the primitive_consumer
utils: add ostream operator<<() for fragmented_temporary_buffer::view
compound_type: extend serialize_value for all FragmentedView types
"
Memory usage is considerably reduced by making reshape switch to partitioned set,
given that input sstables are disjoint. This will benefit reshape for all
strategies, not only TWCS.
Write amplification is reduced a lot by compacting all input sstables at once,
which is possible given that unbounded memory usage is fixed too.
With both these issues fixed, TWCS reshape will be much more efficient.
tests: mode(dev).
"
* 'twcs_reshape_fixes' of github.com:raphaelsc/scylla:
tests: sstables: Check that TWCS is able to reshape disjoint sstables efficiently
TWCS: Reshape all sstables in a time window at once if they're disjoint
sstables: Extract code to count amount of overlapping into a function
LCS: reshape: Fix overlapping check when determining if a sstable set is disjoint
compaction: Make reshape compaction always use partitioned_sstable_set
compaction: Allow a compaction type to override the sstable_set for input sstables
Reduce rebuilds and build time by removing unnecessary includes. Along the way,
improve header sanity.
Ref #1.
Test: dev-headers, unit(dev).
Closes#8524
* github.com:scylladb/scylla:
treewide: remove inclusions of storage_proxy.hh from headers
storage_proxy: unnest coordinator_query_result
treewide: make headers self-sufficient
utils: intrusive_btree: add missing #pragma once
storage_proxy.hh is huge and includes many headers itself, so
remove its inclusions from headers and re-add smaller headers
where needed (and storage_proxy.hh itself in source files that
need it).
Ref #1.
In issue #5021 we noted that Alternator's equality operator needs to be
fixed for the case of comparing two sets, because the equality check needs
to take into account the possibility of different element order.
Unfortunately, we fixed only the equality check operator, but forgot there
is also an inequality operator!
So in this patch we fix the inequality operator, and also add a test for
it that was previously missing.
The implementation of the inequality operator is trivial - it's just the
negation of the equality test. Our pre-existing tests verify that this is
the correct implementation (e.g., if attribute x doesn't exist, then "x = 3"
is false but "x <> 3" is true).
Refs #5021Fixes#8513
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210419141450.464968-1-nyh@scylladb.com>
In issue #5021 we noticed that the equality check in Alternator's condition
expressions needs to handle sets differently - we need to compare the set's
elements ignoring their order. But the implementation we added to fix that
issue was only correct when the entire attribute was a set... In the
general case, an attribute can be a nested document, with only some
inner set. The equality-checking function needs to tranverse this nested
document, and compare the sets inside it as appropriate. This is what
we do in this patch.
This patch also adds a new test comparing equality of a nested document with
some inner sets. This test passes on DynamoDB, failed on Alternator before
this patch, and passes with this patch.
Refs #5021Fixes#8514
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210419184840.471858-1-nyh@scylladb.com>
When a condition expression (ConditionExpression, FilterExpression, etc.)
checks for equality of two item attributes, i.e., "x = y", and when one of
these attributes was missing we correctly returned false.
However, we also need to return false when *both* attributes are missing in
the item, because this is what DynamoDB does in this case. In other words
an unset attribute is never equal to anything - not even to another unset
attribute. This was not happening before this patch:
When x and y were both missing attributes, Alternator incorrectly returned
true for "x = y", and this patch fixes this case. It also fixes "x <> y"
which should to be true when both x and y are unset (but was false
before this patch).
The other comparison operators - <, <=, >, >=, BETWEEN, were all
implemented correctly even before this patch.
This patch also includes tests for all the two-unset-attribute cases of
all the operators listed above. As usual, we check that these tests pass
on both DynamoDB and Alternator to confirm our new behavior is the correct
one - before this patch, two of the new tests failed on Alternator and
passed on DynamoDB.
Fixes#8511
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210419123911.462579-1-nyh@scylladb.com>
Currently said method uses the system semaphore as a catch-all for all
scheduling groups it doesn't know about. This is incompatible with the
recent forward-porting of the service-level infrastructure as it means
that all service level related scheduling groups will fall back to the
system scheduling group, which causes two problems:
* They will experience much limited concurrency, as the system semaphore
is assigned much less count units, to match the much more limited
internal traffic.
* They compete with internal reads, severely impacting the respective
internal processes, potentially causing extreme slowdown, or even
deadlock in the case of an internal query executed on behalf of a
user query being blocked on the latter.
Even if we don't have any custom service level scheduling groups at the
moment, it is better to change this such that unknown scheduling groups
fall-back to using the user semaphore. We don't expect any new internal
scheduling group to pop up any time soon (and if they do we can adjust
get_reader_concurrency_semaphore() accordingly), but we do expect user
scheduling groups to be created in the future, even dynamically.
To minimize the chance of the wrong workload being associated with the
user semaphore, all statically created scheduling groups are now
explicitly listed in `get_reader_concurrency_semaphore()`, to make their
association with the respective semaphore explicit and documented.
Added a unit test which also checks the correct association for all
these scheduling groups.
Fixes: #8508
Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210420105156.94002-1-bdenes@scylladb.com>
Support for random-seed was added in 4ad06c7eeb
but the program still uses std::rand() to draw random keys.
Use tests::random::get_int instead so we can get reprodicible
sequence of keys given a particular random-seed.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210418104455.82086-1-bhalevy@scylladb.com>
The database.hh is the central recursive-headers knot -- it has ~50
includes. This patch leaves only 34 (it remains the champion though).
Similar thing for database.cc.
Both changes help the latter compile ~4% faster :)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210414183107.30374-1-xemul@scylladb.com>
This patch fixes cql-pytest/run-cassandra to work on systems which
default to Java 11, including Fedora 33.
Recent versions of Cassandra can run on Java 11 fine, but requires a
bunch of weird JVM options to work around its JPMS (Java Platform Module
System) feature. Cassandra's start scripts require these options to
be listd in conf/jvm11-server.options, which is read by the startup
script cassandra.in.sh.
Because our "run-cassandra" builds its own "conf" directory, we need
to create a jvm11-server.options file in that directory. This is ugly,
but unfortunately necessary if cql-pytest/run-cassandra is to run with
on systems defaulting to Java 11.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210406220039.195796-1-nyh@scylladb.com>
We currently only update the failure detector for a node when a higher
version of application state is received. Since gossip syn messages do
not contain application state, so this means we do not update the
failure detector upon receiving gossip syn messages, even if a message
from peer node is received which implies the peer node is alive.
This patch relaxes the failure detector update rule to update the
failure detector for the sender of gossip messages directly.
Refs #8296Closes#8476
When a particular partition exists in at least one sstable, the cache
expects any single-partition query to this partition to return a `partition_start`
fragment, even if the result is empty.
In `time_series_sstable_set::create_single_key_sstable_reader` it could
happen that all sstables containing data for the given query get
filtered out and only sstables without the relevant partition are left,
resulting in a reader which immediately returns end-of-stream (while it
should return a `partition_start` and if not in forwarding mode, a
`partition_end`). This commit fixes that.
We do it by extending the reader queue (used by the clustering reader
merger) with a `dummy_reader` which will be returned by the queue as
the very first reader. This reader only emits a `partition_start` and,
if not in forwarding mode, a `partition_end` fragment.
Fixes#8447.
Closes#8448
"
These methods are generic ways to query a mutation source. At least they
used to be, but nowadays they are pretty specific to how tables are
queried -- they use a querier cache to lookup queriers from and save
them into. With the coming changes to how permits are obtained, they are
about to get even more specific to tables. Instead of forcing the
genericity and keep adding new parameters, this patchset bites the
bullet and moves them to table. `data_query()` is inlined into
`table::query()`, while `mutation_query()` is replaced with
`table::mutation_query()`.
The only other users besides table are tests and they are adjusted to
use similarly named local methods that just combine the right querier
with the right result builder. This combination is what the tests really
want to test, as this is also what is used by the table methods behind
the scenes.
Tests: unit(release, debug)
"
* 'mutation-query-move-query-methods-into-table/v1' of https://github.com/denesb/scylla:
mutation_query: remove now unused mutation_query()
test: mutation_query_test: use local mutation_query() implementation
database: mutation_query(): use table::mutation_query()
table: add mutation_query()
query: remove the now unused data_query()
test: mutation_query_test: use local data_query() implementation
table: query(): inline data_query() code into query()
table: make query() a coroutine
This series introduces service level syntax borrowed from https://docs.scylladb.com/using-scylla/workload-prioritization/ , but without workload prioritization itself - just for the sake of using identical syntax to provide different parameters later. The new parameters may include:
* per-service-level timeouts
* oltp/olap declaration, which may change the way Scylla treats long requests - e.g. time them out (the oltp way) or keep them sustained with empty pages (the olap way)
Refs #7617Closes#7867
* github.com:scylladb/scylla:
transport: initialize query state with service level controller
main: add initializing service level data accessor
service: make enable_shared_from_this inheritance public
cql3: add SERVICE LEVEL syntax (without an underscore)
unit test: Add unit test for per user sla syntax
cql: Add support for service level cql queries
auth: Add service_level resource for supporting in authorization of cql service_level
cql: Support accessing service_level_controller from query state
instantiate and initialize the service_level_controller
qos: Add a standard implementation for service level data accessor
qos: add waiting for the updater future
service/qos: adding service level controller
service_levels: Add documentation for distributed tables
service/qos: adding service level table to the distributed keyspace
service/qos: add common definitions
auth: add support for role attributes
This commit adds the infrastructure needed to test per user sla,
more specificaly, a service level accessor that triggers the
update_service_levels_from_distributed_data function uppon any
change to the dystributed sla data.
A test was added that indirectly consumes this infrastructure by
changing the distributed service level data with cql queries.
Message-Id: <23b2211e409446c4f4e3e57b00f78d9ff75fc978.1609249294.git.sarna@scylladb.com>
queries
In order to be able to manage service_level configuration one must be authorized
to do so, or to be a superuser. This commit adds the support for service_levels
resource. Since service_levels are relative, reconfiguring one service level is not locallized
only to that service level and will affect the QOS for all of the service levels,
so there is not much sense of granting permissions to manage individual service_levels.
This is why only root resource named service_levels that represents all service levels is used.
This commit also implements the unit test additions for the newly introduced resource.
Message-Id: <81ab16fa813b61be117155feea405da6266921e3.1609237687.git.sarna@scylladb.com>
fixes AddressSanitizer: stack-buffer-underflow on address 0x7ffd9a375820 at pc 0x555ac9721b4e bp 0x7ffd9a374e70 sp 0x7ffd9a374620
Backend registry holds a unique pointer to the backend implementation
that must outlive the whole tracing lifetime until the shutdown call.
So it must be catched/moved before the program exits its scope by
passing out the lambda chain.
Regarding deletion of the default destructor: moving object requires
a move constructor (for do_with) that is not implicitly provided if
there is a user-defined object destructor defined even tho its impl
is default.
Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com>
Closes#8461
The merger could return end-of-stream if some (but not all) of the
underlying readers were empty (i.e. not even returning a
`partition_start`). This could happen in places where it was used
(`time_series_sstable_set::create_single_key_sstable_reader`) if we
opened an sstable which did not have the queried partition but passed
all the filters (specifically, the bloom filter returned a false
positive for this sstable).
The commit also extends the random tests for the merger to include empty
readers and adds an explicit test case that catches this bug (in a
limited scope: when we merge a single empty reader).
It also modifies `test_twcs_single_key_reader_filtering` (regression
test for #8432) because the time where the clustering key filter is
invoked changes (some invocations move from the constructor of the
merger to operator()). I checked manually that it still catches the bug
when I reintroduce it.
Fixes#8445.
Closes#8446
This is a translation of Cassandra's CQL unit test source file
validation/entities/SecondaryIndexOnStaticColumnTest.java into our
our cql-pytest framework.
This test file checks various features of indexing (with secondary index)
static rows. All these tests pass on Cassandra, but fail on Scylla because
of issue #2963 - we do not yet support indexing of a static row.
The failing test currently fail as soon as they try to create the index,
with the message:
"Indexing static columns is not implemented yet."
Refs #2963.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210411153014.311090-1-nyh@scylladb.com>
This patch avoids an annoying warning
Warning: Unknown config ini key: flake8-ignore
when running one of the pytest-based test projects (cql-pytest,
alternator and redis) on recent versions of pytest.
In commit 2022da2405, we added to the
toplevel Scylla directory a "tox.ini" file with some intention to
configure Python syntax checking. One of the configurations in this
tox.ini is:
[pytest]
flake8-ignore =
E501
It turns out that pytest, if a certain test directory does not have its
own pytest.ini file, looks up in ancestor directory for various
configuration files (the configuration file precedence is described in
https://docs.pytest.org/en/stable/customize.html), and this includes
this tox.ini configuration section. Recent versions of pytest complain
about the "flake8-ignore" configuration parameter, which they don't
recognize. This parameter may be ok (?) if you install a flake8 pytest
plugin, but we do not require users to do this for running these tests.
Moreover, whatever noble intentions this commit and its tox.ini had,
nobody ever followed up on it. The three pytest-based test directories
never adhered to flake8's recommended syntax, and never intended to do
so. None of the developers of these tests use flake8, or seem to wish
to do so. If this ever changes, we can change the pytest.ini or undo this
commit and go back to a top-level tox.ini, but I don't see this happening
anytime soon.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210411085708.300851-1-nyh@scylladb.com>
There are some users of that tri_comparator which are also
converted to strong_ordering. Most of the code using those
is, in turn, already handling return values interchangeably.
The bound_view::tri_compare, which's used by the guy, is
still returning int.
tests: unit(dev)
* xemul/br-position-tri-compare:
code: Relax position_in_partition::tri_compare users
position_in_partition: Convert tri_compare to strong_ordering
test: Convert clustering_fragment_summary::tri_cmp to strong_ordering
repair: Convert repair_sync_boundary::tri_compare to strong_ordering
view: Don't expect int from position_in_partition::tri_compare
There are some pieces left doing res <=> 0 with the
res now being a strong_ordering itself. All these can
be just dropped.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add a local `mutation_query()` variant, which only contains the pieces
of logic the test really wants to test: invoking
`mutation_querier::consume_page()` with a `reconcilable_result_builder`.
This allows us to get rid of the now otherwise unused
`mutation_query()`.
The test only wants to test result size calculation so it doesn't need
the whole `data_query()` logic. Replace the call to `data_query()` with
one to a local alternative which contains just the necessary bits --
invoking `data_querier::consume_page()` with the right result builder.
This allows us get rid of the now otherwise unused `data_query()`.
Right now call to .row() method may create hash on row's cells.
It's counterintuitive to see a const method that transparently
changes something it points to. Since the only caller of a row()
who knows whether the hash creation is required is the cache
reader, it's better to move the call to prepare_hash() into it.
Other than making the .row() less surprising this also helps to
get rid of the whole method by the next patches.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method in question is test-only helper, there's no
need in keeping it as a part of the API.
Another reason to move is that the method is O(number of
rows) and doesn't preempt while looping, but cursor code
users try hard not to stall the reactor. So even though
this method has a meaningful semantics within the class,
it will better be reinvented if needed in core code.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The filter passed to `min_position_reader_queue`, which was used by
`clustering_order_reader_merger`, would incorrectly include sstables as
soon as they passed through the PK (bloom) filter, and would include
sstables which didn't pass the PK filter (if they passed the CK
filter). Fortunately this wouldn't cause incorrect data to be returned,
but it would cause sstables to be opened unnecessarily (these sstables
would immediately return eof), resulting in a performance drop. This commit
fixes the filter and adds a regression test which uses statistics to
check how many times the CK filter was invoked.
Fixes#8432.
Closes#8433