Store the failure_detector object inside gossiper object.
- No more the global object sharded<failure_detector>
- No need to initialize sharded<failure_detector> manually which
simplifies the code in tests/cql_test_env.cc and init.cc.
When we're populating a partition range and the population range ends
with a partition key (not a token) which is present in sstables and
there was a concurrent memtable flush, we would abort on the following
assert in cache::autoupdating_underlying_reader:
utils::phased_barrier::phase_type creation_phase() const {
assert(_reader);
return _reader_creation_phase;
}
That's because autoupdating_underlying_reader::move_to_next_partition()
clears the _reader field when it tries to recreate a reader but it finds
the new range to be empty:
if (!_reader || _reader_creation_phase != phase) {
if (_last_key) {
auto cmp = dht::ring_position_comparator(*_cache._schema);
auto&& new_range = _range.split_after(*_last_key, cmp);
if (!new_range) {
_reader = {};
return make_ready_future<mutation_fragment_opt>();
}
Fix by not asserting on _reader. creation_phase() will now be
meaningful even after we clear the _reader. The meaning of
creation_phase() is now "the phase in which the reader was last
created or 0", which makes it valid in more cases than before.
If the reader was never created we will return 0, which is smaller
than any phase returned by cache::phase_of(), since cache starts from
phase 1. This shouldn't affect current behavior, since we'd abort() if
called for this case, it just makes the value more appropriate for the
new semantics.
Tests:
- unit.row_cache_test (debug)
Fixes#4236
Message-Id: <1553107389-16214-1-git-send-email-tgrabiec@scylladb.com>
"
This series adds support for local indexing, i.e. when the index table
resides on the same partition as base data.
It addresses the performance issue of having an indexed query
that also specifies a partition key - index will be queried
locally.
"
* 'add_local_indexing_11' of https://github.com/psarna/scylla: (30 commits)
tests: add cases for local index prefix optimization
tests: add create/drop local index test case
tests: add non-standard names cases to local index tests
tests: add multi pk case for local index tests
tests: add test for malformed local index definitions
tests: add local index paging test
tests: add local indexing test
cql3: add CREATE INDEX syntax for local indexes
cql3: use serialization function to create index target string
index: add serialization function for index targets
index: use proper local index target when adding index
index: add parsing target column name from local index targets
db: add checking for local index in schema tables
index: add checking if serialized target implies local index
index: enable parsing multi-key targets
index: move target parser code to .cc file
json: add non-throwing overload for to_json_value
cql3: add checking for local indexes in has_supporting_index()
cql3: move finding index restrictions to prepare stage
cql3: add picking an index by score
...
When a materialized view was created, the verification code artificially
forbade creating a view without a clustering key column. However, there
is no real reason to forbid this. In the trivial case, the original base
table might not have had a clustering key, and the view might want to use
the exact same key. In a more complex case, a view may want to have all the
primary key columns as *partition* key columns, and that should be fine.
The patch also includes a regression test, which failed before this patch,
and succeeds with it (we test that we can create materialized views in both
aforementioned scenarios, and these materialized views work as expected).
Duarte raised the opinion that the "trivial" case of a view table with
a key identical to that of the base should be disallowed. However, this
should be done, if at all (I think it shouldn't), in a follow-up patch,
which will implement the non-triviality requirement consistently (e.g.,
require view primary key to be different from base's, regardless of
the existance or non-existance of clustering columns).
Fixes#4340.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190320122925.10108-1-nyh@scylladb.com>
"
Static compact tables are tables with compact storage and no
clustering columns.
Before this patch, Scylla was writing rows of static compact tables as
clustered rows instead of as static rows. That's because in our in-memory
model such tables have regular rows and no static row. In Cassandra's
schema (since 3.x), those tables have columns which are marked as
static and there are no regular columns.
This worked fine as long as Scylla was writing and reading those
sstables. But when importing sstables from Cassandra, our reader was
skipping the static row, since it's not present in our schema, and
returning no rows as a result. Also, Cassandra, and Scylla tools,
would have problems reading those sstables.
Fix this by writing rows for such tables the same way as Cassandra
does. In order to support rolling downgrade, we do that only when all
nodes are upgraded.
Fixes#4139.
Tests:
- unit (dev)
"
* tag 'static-compact-mc-fix-v3.1' of github.com:tgrabiec/scylla:
tests: sstables: Test reading of static compact sstable generated by Cassandra
tests: sstables: Add test for writing and reading of static compact tables
sstables: mc: Write static compact tables the same way as Cassandra
sstable: mc: writer: Set _static_row_written inside write_static_row()
sstables: Add sstable::features()
sstables: mc: writer: Prepare write_static_row() for working with any column_kind
storage_service: Introduce the CORRECT_STATIC_COMPACT feature flag
sstables: mc: writer: Build indexed_columns together with serialization_header
sstables: mc: writer: De-optimize make_serialization_header()
sstable: mc: writer: Move attaching of mc-specific components out of generic code
Tomek and I recently had a discussion about whether or not a commitlog
replay would be safe after we dropped or truncated a table that is not
flushed (durable, but auto_snapshots being false).
While we agreed that would be the safe, we both agreed we would feel
better with a unit test covering that.
This patch adds such a test (btw, it passes)
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190318223811.6862-1-glauber@scylladb.com>
Make sure the exception is thrown when Scylla
tries to load an SSTable created with non-compatible partitioner.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This renames it to record_large_partitions, which matches
record_large_rows. It also changes the signature to be closer to
record_large_rows.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The code for deleting entries from system.large_partitions was almost
a duplicate from the code for deleting entries from system.large_rows.
This patch unifies the two, which also improves the error message when
we fail to delete entries from system.large_partitions.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Complex timestamp tests were ported from dtest and contained a potential
race - rows were updated with TTL 1 and then checked if the row exists
in both base and view replicas in an eventually() loop.
During this loop however, TTL of 1 second might have already passed
and the row could have been deleted from base.
This patch changes the mentioned TTL to 30 seconds, making the tests
extremely unlikely to be flaky.
Message-Id: <6b43fe31850babeaa43465eb771c0af45ee6e80d.1552041571.git.sarna@scylladb.com>
This series contains several improvements to perf_fast_forward that
either address some of the problems seen in the automated runs or help
understanding the results.
The main problem was that test small-partition-slicing had a preparation
stage disproportionally long compared to the actual testing phase. While
the fragments per second results wasn't affected by that, it restricted
the number of iterations of the test that we were able to run, and the
test which single iterations is short (and more prone to noise) was
executed only four times. This was solved by sharing the preparation
stage with all iterations, thus enabling the test to be run many times
and improving the stability of the results.
Another, improvement is the ability to dump all test results and process
them producing histograms. This allows us to see how the distribution of
particular statistics looks like and if there are some complications.
Refs #4278.
* https://github.com/pdziepak/scylla.git more-perf_fast_forward/v1:
tests/perf_fast_forward: print number of iterations of each test
tests/perf_fast_forward: reuse keys in small partition slicing test
tests/perf_fast_forward: extract json result file writing logic
tests/perf_fast_forward: add an option to dump all results
tests/perf_fast_forward: add script for analysing full results
perf_fast_forward with flag --dump-all-results reports the results of
every test iteration that was executed. This patch introduces a python
script that can analyse those results (in json format) and present them
in a more human-friendly way.
For now, the only option is to plot histograms of selected statistics.
perf_fast_forward runs each test case multiple times and reports a
summary of those results (median, min, max, and median absolute
deviation).
While very convenient the summary may hide some important information
(e.g. the distribution of the results). This patch adds an option to
report results of every single executed iteration.
"
Currently any large partitions found during shutdown are not
recorded. The reason is that the database commit log is already off,
so there is nowhere to record it to.
One possible solution is to have an independent system database. With
that the regular db is shutdown first and writes can continue to the
system db.
That is a pretty big change. It would also not allow us to record
large partitions in any system tables.
This patch series instead tries to stop the commit log later. With
that any large partitions are recorded to the log and moved to a
sstable on the next startup.
"
* 'espindola/shutdown-order-patches-v7' of https://github.com/espindola/scylla:
db: stop the commit log after the tables during shutdown
db: stop the compaction manager earlier
db: Add a stop_database helper
db: Don't record large partitions in system tables
According to the cql definitions, if no ORDER BY clause is present,
records should be returned ordered by the clustering keys. Since the
backend returns the ranges according to their order of appearance
in the request, the bounds should be sorted before sending it to the
backend. This kind of sorting is needed in queries that generates more
than one bound to be read, examples to such queris are:
1. a SELECT query with an IN clause.
2. a SELECT query on a mixed order tupple of columns (see #2050).
The assumption this commit makes is the correctness of the bounds
list, that is, the bounds are non overlapping. If this wasn't true, multiple
occurences of the same reccord could have returned for certain queries.
Tests:
1. Unit tests release
2. All dtest that requires #2050 and #2029Fixes#2029
"
Fixes#3708
This series adds JSON serialization and deserialization procedures
to tuples and user defined types.
Tests: unit (dev)
"
* 'add_tuple_and_udt_json_support_2' of https://github.com/psarna/scylla:
tests: add test cases for JSON and UDT
types: add JSON support to UDT
tests: add JSON tuple tests
types: add JSON support for tuples
fuzzy_test performs some checks that are expected to fail and whoose
failure does not influence the outcome of the test. For this it uses the
`BOOT_WARN_*` family of macros. These will just log a warning when their
predicate fails. This can however confuse someone looking at the logs
trying to determine the cause of a failure. Since these checks are
performed primarly to provide an aid in debugging failures, replace them
with a conditional debug-level log message.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f550a9d9ab1b5b4aeb4f81860cbd3d924fc86898.1551792035.git.bdenes@scylladb.com>
The `test_abandoned_read` verifies that an abandoned read does a proper
cleanup. One of the things checked is that after the querier TTL
expires, the saved queriers are cleaned-up. This check however had a
very tight timing. The TTL was 2s and the test waited 2s before it did
the check, which is wrapped in an `eventually_true()` (max +1s).
The TTL timer scans the queriers with a period of TTL/2 so a querier
can live 1.5*TTL time. This means that the 2s + 1s wait time is just on
the limit and with some bad luck (and a slow machine) it can fail.
Reduce the TTL in this test to 1s to relax the dependence on timing.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <ed0d45b5a07960b83b391d289cade9b9f60c7785.1551787638.git.bdenes@scylladb.com>
This change ammends on the functionality of the result generation,
it changes the behaviour to return the expected results vector sorted
in the expected order of appearance in the result set. Then the
result set is validated for both, content and also order.
Tests: unit tests (Release)
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Whenever a query with an IN clause on clustering keys is executed,
assuming only one partition, the rows are ordered according to the
clustering keys. This commit adds the order validation to the content
validation whenever possible (which means removing the
ignore order part).
Tests: unit tests (Release)
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
"
Futurize split_range_to_single_shard to fix reactor stall.
Fixes: #3846
"
* tag 'asias/split_range_to_single_shard/v4' of github.com:scylladb/seastar-dev:
partitioner: Futurize split_range_to_single_shard
tests: Use SEASTAR_THREAD_TEST_CASE for partitioner_test.cc
"
This fixes#3988.
We already have a system.large_partitions, but only a warning for
large rows. These patches close the gap by also recording large rows
into a new system.large_rows.
"
* 'espindola/large-row-add-table-v6' of https://github.com/espindola/scylla:
Add a testcase for large rows
Populate system.large_rows.
Create a system.large_rows table
Extract a key_to_str helper
Don't call record_large_rows if stopped
Add a delete_large_rows_entries method to large_data_handler
db::large_data_handler::(maybe_)?record_large_rows: Return future<> instead of void
Rename maybe_delete_large_partitions_entry
Rename log_large_row to record_large_rows
Rename maybe_log_large_row to maybe_record_large_rows
This patch adds fragmented_temporary_buffer_view::remove_suffix(). It is
also necessary to adjust remove_prefix() since now the total size of all
fragments may be larger than the size of the view if both those
operations are performed.
"
This series heavily refactors `auth_test` in anticipation of
the last patch, which fixes a bug and which should be backported.
Branches: branch-3.0, branch-2.3
"
Fixes#4284
* 'jhk/check_can_login/v2' of https://github.com/hakuch/scylla:
auth: Reject logins from disallowed roles
tests: Restrict the scope of a variable
tests: Simplify boolean assertions in `auth_test`
tests: Abstract out repeated assertion checking
tests: Do not use the `auth` namespace
tests: Validate authentication correctly
tests: Ensure test roles are created and dropped
tests: Use `static` variables in `auth_test`
tests: Remove non-useful test