The value in question is cached from db::system_keyspace
for places that want to have it without waiting for
futures. So far the only place is database counters code,
so keep the value on database itself. Next patches will
make use of it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This abstraction is used to merge the output of multiple readers, each
opened for a single partition query, into a non-decreasing stream
of mutation_fragments.
It is similar to `mutation_reader_merger`,
but an important difference is that the new merger may select new readers
in the middle of a partition after it already returned some fragments
from that partition. It uses the new `position_reader_queue` abstraction
to select new readers. It doesn't support multi-partition (ring range) queries.
The new merger will be later used when reading from sstable sets created
by TimeWindowCompactionStrategy. This strategy creates many sstables
that are mostly disjoint w.r.t the contained clustering keys, so we can
delay opening sstable readers when querying a partition until after we have
processed all mutation fragments with positions before the keys
contained by these sstables.
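The idea of delaying reader opening can be sketched in a few lines of Python. This is only an illustration of the technique, not the C++ implementation: the `Reader` tuple, the function name and the use of integers as positions are all hypothetical, and the sorted `pending` list stands in for the `position_reader_queue`.

```python
import heapq
from typing import Iterator, List, Tuple

# Hypothetical stand-in for a reader: (lower bound, sorted list of positions).
Reader = Tuple[int, List[int]]

def merge_in_clustering_order(readers: List[Reader]) -> Iterator[int]:
    """Yield positions from all readers in non-decreasing order,
    opening each reader only when the merge reaches its lower bound."""
    pending = sorted(readers, key=lambda r: r[0])  # plays the role of the reader queue
    next_pending = 0
    heap: List[Tuple[int, int, Iterator[int]]] = []  # (position, reader id, iterator)

    def open_readers_up_to(bound: int) -> None:
        nonlocal next_pending
        while next_pending < len(pending) and pending[next_pending][0] <= bound:
            it = iter(pending[next_pending][1])
            first = next(it, None)
            if first is not None:
                heapq.heappush(heap, (first, next_pending, it))
            next_pending += 1

    while heap or next_pending < len(pending):
        if not heap:
            # Nothing is open: open the reader(s) with the smallest lower bound.
            open_readers_up_to(pending[next_pending][0])
            continue
        # Before emitting, open every reader that may hold a position
        # not greater than the smallest position currently available.
        open_readers_up_to(heap[0][0])
        pos, rid, it = heapq.heappop(heap)
        yield pos
        nxt = next(it, None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, rid, it))
```

With mostly disjoint readers, only one or two of them are open at any time, which is the property the merger relies on.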
A microbenchmark was added that compares the existing combining reader
(which uses `mutation_reader_merger` underneath) with a new combining reader
built using the new `clustering_order_reader_merger` and a simple queue of readers
that returns readers from some supplied set. The used set of readers is built from the following
ranges of keys (each range corresponds to a single reader):
`[0, 31]`, `[30, 61]`, `[60, 91]`, `[90, 121]`, `[120, 151]`.
The microbenchmark runs the reader and divides the measured time by the number of mutation fragments.
The results on my laptop were:
```
$ build/release/test/perf/perf_mutation_readers -t clustering_combined.* -r 10
single run iterations: 0
single run duration: 1.000s
number of runs: 10
test                                    iterations      median         mad         min         max
clustering_combined.ranges_generic         2911678   117.598ns     0.685ns   116.175ns   119.482ns
clustering_combined.ranges_specialized     3005618   111.015ns     0.349ns   110.063ns   111.840ns
```
`ranges_generic` denotes the existing combining reader, `ranges_specialized` denotes the new reader.
Split from https://github.com/scylladb/scylla/pull/7437.
Closes #7688
* github.com:scylladb/scylla:
tests: mutation_source_test for clustering_order_reader_merger
perf: microbenchmark for clustering_order_reader_merger
mutation_reader_test: test clustering_order_reader_merger in memory
test: generalize `random_subset` and move to header
mutation_reader: introduce clustering_order_reader_merger
In issue #7722, it was suggested that we should port Cassandra's CQL unit
tests into our own repository, by translating the Java tests into Python
using the new cql-pytest framework. Cassandra's CQL unit test framework is
orders of magnitude faster than dtest, and in-tree, so Cassandra have been
moving many CQL correctness tests there, and we can also benefit from their
test cases.
In this patch, we take the first step in a long journey:
1. I created a subdirectory, test/cql-pytest/cassandra_tests, where all the
translated Cassandra tests will reside. The structure of this directory
will mirror that of the test/unit/org/apache/cassandra/cql3 directory in
the Cassandra repository.
pytest conveniently looks for test files recursively, so when all the
cql-pytest are run, the cassandra_tests files will be run as well.
As usual, one can also run only a subset of all the tests, e.g.,
"test/cql-pytest/run -vs cassandra_tests" runs only the tests in the
cassandra_tests subdirectory (and its subdirectories).
2. I translated into Python two of the smallest test files -
validation/entities/{TimeuuidTest,DataTypeTest}.java - containing just
three test functions.
The plan is to translate entire Java test files one by one, and to mirror
their original location in our own repository, so it will be easier
to remember what we already translated and what remains to be done.
3. I created a small library, porting.py, of functions which resemble the
common functions of the Java tests (CQLTester.java). These functions aim
to make porting the tests easier. Despite the resemblance, the ported code
is not 100% identical (of course) and some effort is still required in
this porting. As we continue this porting effort, we'll probably need
more of these functions, and can also continue to improve them to reduce
the porting effort.
Refs #7722.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201201192142.2285582-1-nyh@scylladb.com>
This series introduces a `large_data_counters` element to `scylla_metadata` component to explicitly count the number of `large_{partitions,rows,cells}` and `too_many_rows` in the sstable. These are accounted for in the sstable writer whenever the respective large data entry is encountered.
It is taken into account in `large_data_handler::maybe_delete_large_data_entries`, when engaged.
Otherwise, if deleting a legacy sstable that has no such entry in `scylla_metadata`, just revert to using the current method of comparing the sstable's `data_size` to the various thresholds.
Fixes #7668
Test: unit(dev)
Dtest: wide_rows_test.py (in progress)
Closes #7669
* github.com:scylladb/scylla:
docs: sstable-scylla-format: add large_data_stats subcomponent
large_data_handler: maybe_delete_large_data_entries: use sstable large data stats
large_data_handler: maybe_delete_large_data_entries: accept shared_sstable
large_data_handler: maybe_delete_large_data_entries: move out of line
sstables: load large_data_stats from scylla_metadata
sstables: store large_data_stats in scylla_metadata
sstables: writer: keep track of large data stats
large_data_handler: expose methods to get threshold
sstables: kl/writer: never record too many rows
large_data_handler: indicate recording of large data entries
large_data_handler: move constructor out of line
In test/cql-pytest/run.py we have a 200 second timeout to boot Scylla.
I never expected to reach this timeout - it normally takes (in dev
build mode) around 2 seconds, but in one run on Jenkins we did reach it.
It turns out that the code did not recognize this timeout correctly:
it assumed that Scylla had booted correctly, and then all the
subtests failed when they could not connect to Scylla.
This patch fixes the timeout logic. After the timeout, if Scylla's
CQL port is still not responsive, the test run is failed - without
trying to run many individual tests.
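The intended behavior can be sketched as follows. This is not the code in run.py; the function name, the `probe` parameter and the polling interval are hypothetical, and port 9042 in the usage comment is assumed to be the CQL port.

```python
import socket
import time

def wait_for_port(host, port, timeout, interval=0.1, probe=None):
    """Return True once the port accepts connections,
    or False if `timeout` seconds pass first."""
    if probe is None:
        def probe():
            try:
                with socket.create_connection((host, port), timeout=interval):
                    return True
            except OSError:
                return False
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            # Report the timeout explicitly instead of falling through
            # as if the server had started.
            return False
        time.sleep(interval)

# Usage sketch: fail the whole run once, instead of failing every test.
# if not wait_for_port("127.0.0.1", 9042, timeout=200):
#     raise SystemExit("Scylla did not open its CQL port in time")
```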
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201201150927.2272077-1-nyh@scylladb.com>
When a row was inserted into a table with no regular columns, and no
such row existed in the first place, postimage would not be produced.
Fix this.
Fixes #7716.
Closes #7723
If the sstable has scylla_metadata::large_data_stats use them
to determine whether to delete the corresponding large data records.
Otherwise, defer to the current method of comparing the sstable
data_size to the respective thresholds.
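A minimal sketch of this decision, with hypothetical Python types standing in for the sstable and the handler's thresholds. The strict `>` comparison in the legacy path is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class LargeDataStats:
    large_partitions: int = 0
    large_rows: int = 0
    large_cells: int = 0

@dataclass
class SSTable:
    data_size: int
    large_data_stats: Optional[LargeDataStats] = None  # None models a legacy sstable

@dataclass
class Thresholds:
    partition_bytes: int
    row_bytes: int
    cell_bytes: int

def records_to_delete(sst: SSTable, th: Thresholds) -> Dict[str, bool]:
    """Say, per record kind, whether large data records need to be deleted."""
    stats = sst.large_data_stats
    if stats is not None:
        # New sstables: the writer counted the entries it recorded.
        return {
            "partitions": stats.large_partitions > 0,
            "rows": stats.large_rows > 0,
            "cells": stats.large_cells > 0,
        }
    # Legacy sstables: a record can exist only if the whole sstable
    # is larger than the corresponding threshold (assumed comparison).
    return {
        "partitions": sst.data_size > th.partition_bytes,
        "rows": sst.data_size > th.row_bytes,
        "cells": sst.data_size > th.cell_bytes,
    }
```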
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Since the actual deletion of the large data entries
is done in the background, and we don't capture the shared_sstable,
we can safely pass it to maybe_delete_large_data_entries when
deleting the sstable in sstable::unlink, and it will be released
as soon as maybe_delete_large_data_entries returns.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Load the large data stats from the scylla_metadata component
if they are present. Otherwise, if we're opening a legacy sstable
that has no scylla_metadata_type::LargeDataStats, leave
sstable::_large_data_stats disengaged.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Store the large data statistics in the scylla_metadata component.
These will be retrieved when loading the sstable and be
used for determining whether to delete the corresponding
large data entries upon sstable deletion.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Previously, statement_restrictions::find_idx() would happily return an
index for a non-EQ restriction (because it checked only the column
name, not the operator). This is incorrect: when the selected index
is for a non-EQ restriction, it is impossible to query that index
table.
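The fix amounts to checking the operator as well as the column name. A minimal sketch, with a hypothetical tuple representation of restrictions rather than the real statement_restrictions types:

```python
from typing import List, Optional, Tuple

Restriction = Tuple[str, str]  # (column name, operator), e.g. ("c", "=")

def find_index(indexed_columns: List[str],
               restrictions: List[Restriction]) -> Optional[str]:
    """Return the first indexed column restricted by EQ, or None."""
    for column, op in restrictions:
        # Checking only `column in indexed_columns` would reproduce the bug.
        if op == "=" and column in indexed_columns:
            return column
    return None
```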
Fixes #7659.
Tests: unit (dev)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes #7665
* seastar 010fb0df1e...8b400c7b45 (6):
> append_challenged_posix_file_impl::read_dma: allow iovec to cross _logical_size
> Merge "Extend per task-queue timing statistics" from Pavel E
> tls_test: Create test certs at build time
> cook: upgrade hwloc version
> memory: rate-limit diagnostics messages
> util/log: add rate-limited version of writer version of log()
Currently, if the user provides a cell name with too many components,
we will accept it and construct an invalid clustering key. This may
result in undefined behavior down the stream.
It was caught by ASAN in a debug build when executing dtest
cql_tests.py:MiscellaneousCQLTester.cql3_insert_thrift_test with
nodetool flush manually added after the write. Triggered during
sstable writing to an MC-format sstable:
seastar::shared_ptr<abstract_type const>::operator*() const at ././seastar/include/seastar/core/shared_ptr.hh:577
sstables::mc::clustering_blocks_input_range::next() const at ./sstables/mx/writer.cc:180
To prevent corrupting the state in this way, we should fail
early. This patch adds validation which will fail thrift requests
which attempt to create invalid clustering keys.
Fixes #7568.
Example error:
Internal server error: Cell name of ks.test has too many components, expected 1 got 2 in 0x0004000000040000017600
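The check can be illustrated on the bytes from the example error. The sketch below assumes that each component of a composite cell name is laid out as a 2-byte big-endian length, the value, and a 1-byte end-of-component marker. The function names are hypothetical and the real validation lives in the thrift handling code.

```python
import struct
from typing import List

def split_composite(raw: bytes) -> List[bytes]:
    """Split a composite cell name into its component values."""
    components = []
    pos = 0
    while pos < len(raw):
        if pos + 2 > len(raw):
            raise ValueError("truncated component length")
        (length,) = struct.unpack_from(">H", raw, pos)
        pos += 2
        if pos + length + 1 > len(raw):
            raise ValueError("truncated component value")
        components.append(raw[pos:pos + length])
        pos += length + 1  # skip the value and the end-of-component byte
    return components

def validate_cell_name(raw: bytes, expected: int, table: str) -> None:
    """Fail early when the name has more components than the table allows."""
    got = len(split_composite(raw))
    if got > expected:
        raise ValueError(
            f"Cell name of {table} has too many components, "
            f"expected {expected} got {got} in 0x{raw.hex()}")
```

Under that assumed layout, the name in the example error decodes to two components, a 4-byte value and the single byte `v`.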
Message-Id: <1605550477-24810-1-git-send-email-tgrabiec@scylladb.com>
This patch adds an option to scylla_setup to configure an rsyslog destination.
The monitoring stack has an option to get information from rsyslog; it
requires that rsyslog on the scylla machines send the trace line to
it.
The configuration will be in a Scylla configuration file, so it is safe to run it multiple times.
Fixes #7589
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Closes #7634
* github.com:scylladb/scylla:
dist/common/scripts/scylla_setup: Optionally config rsyslog destination
Adding dist/common/scripts/scylla_rsyslog_setup utility
This patch adds an option to scylla_setup to configure an rsyslog
destination.
The monitoring stack has an option to get information from rsyslog; it
requires that rsyslog on the scylla machines send the trace line to
it.
If the /etc/rsyslog.d/ directory exists (which means the current system
runs rsyslog), scylla_setup asks whether to add the rsyslog configuration
and, if the answer is yes, runs scylla_rsyslog_setup.
Fixes #7589
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
* scylla-dev/snapshot_fixes_v1:
raft: ignore append_reply from a peer in SNAPSHOT state
raft: Ignore outdated snapshots
raft: set next_idx to correct value after snapshot transfer
This abstraction is used to merge the output of multiple readers, each
opened for a single partition query, into a non-decreasing stream
of mutation_fragments.
It is similar to `mutation_reader_merger`,
but an important difference is that the new merger may select new readers
in the middle of a partition after it already returned some fragments
from that partition. It uses the new `position_reader_queue` abstraction
to select new readers. It doesn't support multi-partition (ring range) queries.
The new merger will be later used when reading from sstable sets created
by TimeWindowCompactionStrategy. This strategy creates many sstables
that are mostly disjoint w.r.t the contained clustering keys, so we can
delay opening sstable readers when querying a partition until after we have
processed all mutation fragments with positions before the keys
contained by these sstables.
Fix #7680 by never using a secondary index for multi-column restrictions.
Modify expr::is_supported_by() to handle multi-column correctly.
Tests: unit (dev)
Closes #7699
* github.com:scylladb/scylla:
cql3/expr: Clarify multi-column doesn't use indexing
cql3: Don't use index for multi-column restrictions
test: Add eventually_require_rows
The first two patches in this series are small improvements to cql-pytest to prepare for the third and main patch. This third patch adds cql-pytest tests which check that we fail CQL queries that try to inject non-ASCII and non-UTF-8 strings for ascii and text columns, respectively.
The tests do not discover any unknown bug in Scylla. However, they do show that Scylla is stricter in its definition of "valid UTF-8" than Cassandra.
Closes #7719
* github.com:scylladb/scylla:
test/cql-pytest: add tests for validation of inserted strings
test/cql-pytest: add "scylla_only" fixture
test/cql-pytest: enable experimental features
This change adds tracking of all the CQL errors that can be
raised in response to a CQL message from a client, as described
in the CQL v4 protocol and with Scylla's CDC_WRITE_FAILUREs
included.
Fixes #5859
Closes #7604
We have "Conflicts: kernel < 3.10.0-514" on the rpm package to make sure
the environment is running a newer kernel.
However, users may run a non-standard kernel with a different package name,
such as kernel-ml or kernel-uek.
In such environments the Conflicts tag does not work correctly:
even if the system is running a newer kernel, rpm only checks the version
of the "kernel" package.
To avoid this issue, drop the Conflicts tag.
Fixes #7675
This patch adds comprehensive cql-pytest tests for checking the validation
of strings - ASCII or UTF-8 - in CQL. Strings can be represented in CQL
using several methods - a string can be a string literal as
part of the statement, can be encoded as a blob (0x...), or
can be a binding parameter for a prepared statement, or returned
by user-defined functions - and these tests check all of them.
We already have low-level unit tests for UTF-8 parsing in
test/boost/utf8_test.cc, but the new tests here confirm that we really
call these low-level functions in the correct way. Moreover, since these
are CQL tests, they can also be run against Cassandra, and doing that
demonstrated that Scylla's UTF-8 parsing is *stricter* than Cassandra's -
Scylla's UTF-8 parser rejects the following sequences which Cassandra's
accepts:
1. \xC0\x80 as another non-minimal representation of null. Note that other
non-minimal encodings are rejected by Cassandra, as expected.
2. Characters beyond the official Unicode range (or what Scylla considers
the end of the range).
3. UTF-16 surrogates - these are not considered valid UTF-8, but Cassandra
accepts them, and Scylla does not.
In the future, we should consider whether Scylla is more correct than
Cassandra here (so we're fine), or whether compatibility is more important
than correctness (so this exposed a bug).
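The three classes of sequences can be tried out with any strict UTF-8 decoder. The sketch below uses Python's built-in decoder as a stand-in, not Scylla's parser, and assumes that "beyond the official Unicode range" means code points above U+10FFFF. The sample names are illustrative.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Check validity with Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

samples = {
    "non-minimal null": b"\xc0\x80",
    "beyond U+10FFFF": b"\xf4\x90\x80\x80",
    "UTF-16 surrogate": b"\xed\xa0\x80",
    "ordinary text": "h\u00e9llo".encode("utf-8"),
}

for name, data in samples.items():
    verdict = "valid" if is_valid_utf8(data) else "invalid"
    print(f"{name}: {verdict}")
```

Python's decoder rejects the first three samples and accepts the last one, which matches the stricter behavior described above.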
The ASCII tests reproduce issue #5421 - that trying to insert a
non-ASCII string into an "ascii" column should produce an error on
insert - not later when fetching the string. This test now passes,
because issue 5421 was already fixed.
These tests did not expose any bug in Scylla (other than the differences
with Cassandra mentioned above), so all of them pass on Scylla. Two
of the tests fail on Cassandra, because Cassandra does not recognize
some invalid UTF-8 (according to Scylla's definition) as invalid.
Refs #5421.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Reject the previously accepted case where the multi-column restriction
applied to just a single column, as it causes a crash downstream. The
user can drop the parentheses to avoid the rejection.
Fixes #7710
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes #7712
"
This series adds maybe_yield called from
cleanup_compaction::get_ranges_for_invalidation
to avoid reactor stalls.
To achieve that, we first extract bool_class can_yield
to utils/maybe_yield.hh, and add a convenience helper:
utils::maybe_yield(can_yield) that conditionally calls
seastar::thread::maybe_yield if it can (when called in a
seastar thread).
With that, we add a can_yield parameter to dht::to_partition_ranges
and dht::partition_range::deoverlap (defaults to false), and
use it from cleanup_compaction::get_ranges_for_invalidation,
as the latter is always called from `consume_in_thread`.
Fixes #7674
Test: unit(dev)
"
* tag 'unstall-get_ranges_for_invalidation-v2' of github.com:bhalevy/scylla:
compaction: cleanup_compaction: get_ranges_for_invalidation: add yield points
dht/i_partitioner: to_partition_ranges: support yielding
locator: extract can_yield to utils/maybe_yield.hh
It is used to force remove a node from gossip membership if something
goes wrong.
Note: run the force_remove_endpoint api at the same time on _all_ the
nodes in the cluster in order to prevent the removed nodes from coming back.
This is because nodes that have not run the force_remove_endpoint api cmd can
gossip the removed node's information to other nodes within 2 *
ring_delay (2 * 30 seconds by default).
For instance, in a 3-node cluster where node 3 is decommissioned, to remove
node 3 from gossip membership prior to the auto removal (3 days by
default), run the api cmd on both node 1 and node 2 at the same time.
$ curl -X POST --header "Accept: application/json" \
    "http://127.0.0.1:10000/gossiper/force_remove_endpoint/127.0.0.3"
$ curl -X POST --header "Accept: application/json" \
    "http://127.0.0.2:10000/gossiper/force_remove_endpoint/127.0.0.3"
Then run 'nodetool gossipinfo' on all the nodes to check the removed nodes
are not present.
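Since the calls should reach all remaining nodes at about the same time, they can also be issued concurrently from a script. This is a sketch only: the helper names are hypothetical, the URL shape and port are taken from the curl commands above, and the `send` parameter exists so the logic can be exercised without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def force_remove_url(api_host, endpoint, port=10000):
    """Build the REST URL used in the curl commands above."""
    return f"http://{api_host}:{port}/gossiper/force_remove_endpoint/{endpoint}"

def post(url):
    """Send a POST request and return the HTTP status code."""
    req = urllib.request.Request(
        url, method="POST", headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status

def force_remove_everywhere(remaining_nodes, endpoint, send=post):
    """Issue the request to every remaining node concurrently."""
    urls = [force_remove_url(node, endpoint) for node in remaining_nodes]
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        return list(pool.map(send, urls))

# Usage sketch for the 3-node example:
# force_remove_everywhere(["127.0.0.1", "127.0.0.2"], "127.0.0.3")
```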
Fixes #2134
Closes #5436
This patch adds a fixture "scylla_only" which can be used to mark tests
for Scylla-specific features. These tests are skipped when running against
other CQL servers - like Apache Cassandra.
We recognize Scylla by looking at whether any system table exists with
the name "scylla" in its name - Scylla has several of those, and Cassandra
has none.
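The detection rule itself is a one-liner. The sketch below shows only the rule, not the fixture. The function name is hypothetical and the table names in the usage comment are illustrative.

```python
def looks_like_scylla(system_table_names):
    """Apply the rule above: any system table with "scylla" in its name."""
    return any("scylla" in name for name in system_table_names)

# Usage sketch: a fixture would query the system table names once and call
# pytest.skip() for Scylla-only tests when this returns False.
# looks_like_scylla(["local", "peers", "scylla_local"])
```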
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Disk parsing expects the output of a recursive listing from the GCP
metadata REST call. The method used to list recursively by default,
but it now requires a boolean flag to run in recursive mode.
Fixes #7684
Closes #7685
Since f3bcd4d205 ("Merge 'Support SSL Certificate Hot
Reloading' from Calle"), we reload certificates as they are
modified on disk. This uses inotify, which is limited by a
sysctl fs.inotify.max_user_instances, with a default of 128.
This is enough for 64 shards only, if both rpc and cql are
encrypted; above that startup fails.
Increase to 1200, which is enough for 6 instances * 200 shards.
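The arithmetic behind both numbers, assuming each shard needs one inotify instance per watching service (the function name is hypothetical):

```python
def max_shards(inotify_limit: int, instances_per_shard: int) -> int:
    """Shards that fit under fs.inotify.max_user_instances."""
    return inotify_limit // instances_per_shard

print(max_shards(128, 2))   # default limit, rpc + cql encrypted -> 64
print(max_shards(1200, 6))  # new limit, 6 instances per shard -> 200
```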
Fixes #7700.
Closes #7701
When we introduced dependencies.conf, we mistakenly added it to the rpm as %ghost,
but it should be a normal file that is installed on package installation.
Fixes #7703
Closes #7704
Fixes #7211
If we start a sharded<> object, then proceed to do potentially
exceptional stuff, we should destroy it on said exception.
Otherwise, the exception propagation will abort on RAII
destruction of the sharded<>. And we get no exception logging.
Closes#7697
* github.com:scylladb/scylla:
redis::service: Shut down sharded<> subobject on startup exception
transport::controller: Shut down distributed object on startup exception
Refs #7211
If we start a sharded<> object, then proceed to do potentially
exceptional stuff, we should destroy it on said exception.
Otherwise, the exception propagation will abort on RAII
destruction of the sharded<>. And we get no exception logging.
Fixes #7211
If we start a sharded<> object, then proceed to do potentially
exceptional stuff, we should destroy it on said exception.
Otherwise, the exception propagation will abort on RAII
destruction of the sharded<>. And we get no exception logging.
The downstream code expects a single-column restriction when using an
index. We could fix it, but we'd still have to filter the rows
fetched from the index table, unlike the code that queries the base
table directly. For instance, WHERE (c1,c2,c3) = (1,2,3) with an
index on c3 can fetch just the right rows from the base table but all
the c3=3 rows from the index table.
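The example can be made concrete with a handful of made-up rows. Plain Python lists stand in for the base table and the index lookup.

```python
rows = [
    {"c1": 1, "c2": 2, "c3": 3},
    {"c1": 1, "c2": 5, "c3": 3},
    {"c1": 7, "c2": 2, "c3": 3},
    {"c1": 1, "c2": 2, "c3": 4},
]
target = (1, 2, 3)  # WHERE (c1,c2,c3) = (1,2,3)

def matches(row):
    return (row["c1"], row["c2"], row["c3"]) == target

# Base table: the multi-column restriction selects exactly the wanted rows.
via_base = [r for r in rows if matches(r)]

# Index on c3: the lookup returns every row with c3 = 3 ...
via_index = [r for r in rows if r["c3"] == 3]

# ... so the result still has to be filtered afterwards.
filtered = [r for r in via_index if matches(r)]

print(len(via_index), len(via_base))  # 3 1
```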
Fixes #7680
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>