* seastar-dev.git haaawk/sst3/test_clustering_slices/v8:
sstables: Extract on_end_of_stream from consume_partition_end
sstables: Don't call consume_range_tombstone_end in
consume_partition_end
sstables: Change the way fragments are returned from consumer
(cherry picked from commit 193efef950)
There is a mismatch between row markers used in SSTables 2.x (ka/la) and
liveness_info used by SSTables 3.x (mc) in that a row marker can be
written as a deleted cell but liveness_info cannot.
To handle this, for a dead row marker the corresponding liveness_info is
written as expiring liveness_info with a fake TTL set to 1.
This approach is adapted from the solution for CASSANDRA-13395 that
exercised similar issue during SSTables upgrades.
* github.com/argenet/scylla.git projects/sstables-30/dead-row-marker/v7:
sstables: Introduce TTL limitation and special 'expired TTL' value.
sstables: Write dead row marker as expired liveness info.
tests: Add test covering dead row marker writing to SSTables 3.x.
(cherry picked from commit a7a14e3af2)
Currently, when stopping a reader fails, it simply won't be attempted to
be saved, and it will be left in the `_readers` array as-is. This can
lead to an assertion failure as the reader state will contain futures
that were already waited upon, and that the cleanup code will attempt to
wait on again. To prevent this, when stopping a reader fails, reset it
to nonexistent state, so that the cleanup code doesn't attempt to do
anything with it.
Refs: #3830
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <a1afc1d3d74f196b772e6c218999c57c15ca05be.1539088164.git.bdenes@scylladb.com>
(cherry picked from commit d467b518bc)
A materialized views can provide a filter so as to pick up only a subset
of the rows from the base table. Usually, the filter operates on columns
from the base table's primary key. If we use a filter on regular (non-key)
columns, things get hairy, and as issue #3430 showed, wrong: merely updating
this column in the base table may require us to delete, or resurrect, the
view row. But normally we need to do the above when the "new view key column"
was updated, when there is one. We use shadowable tombstones with one
timestamp to do this, so it cannot take into account the two timestamp from
those two columns (the filtered column and the new key column).
So in the current code, filtering by a non-key column does not work correctly.
In this patch we provide two test cases (one involving TTLs, and one involves
only normal updates), which demonstrate vividly that it does *not* work
correctly. With normal updates, trying to resurect a view row that has
previously disappeared, fails. With TTLs, things are even worse, and the view
row fails to disappear when the filtered column is TTLed.
In Cassandra, the same thing doesn't work correctly as well (see
CASSANDRA-13798 and CASSANDRA-13832) so they decided to refuse creating
a materialized view filtering a non-key column. In this patch we also
do this - fail the creation of such an unsupported view. For this reason,
the two tests mentioned above are commented out in a "#if", with, instead,
a trivial test verifying a failure to create such a view.
Note that as explained above, when the filtered column and new view key
column are *different* we have a problem. But when they are the *same* - namely
we filter by a non-key base column which actually *is* a key in the view -
we are actually fine. This patch includes additional test cases verifying
that this case is really fine and provides correct results. Accordingly,
this case is *not* forbidden in the view creation code.
Fixes#3430.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181008185633.24616-1-nyh@scylladb.com>
(cherry picked from commit b8668dc0f8)
We had two commented out tests based on Cassandra's MV unit tests, for
the case that the view's filter (the "SELECT" clause used to define the
view) filtered by a non-primary-key column. These tests used to fail
because of problems we had in the filtering code, but they now succeed,
so we can enable them. This patch also adds some comments about what
the tests do, and adds a few more cases to one of the tests.
Refs #3430.
However, note that the success of these tests does not really prove that
the non-PK-column filtering feature works fully correctly and that issue
forbidding it, as explained in
https://issues.apache.org/jira/browse/CASSANDRA-13798. We can probably
fix this feature with our "virtual cells" mechanism, but will need to add
a test to confirm the possible problem and its (probably needed fix).
We do not add such a test in this patch.
In the meantime, issue #3430 should remain open: we still *allow* users
to create MV with such a filter, and, as the tests in this patch show,
this "mostly" works correctly. We just need to prove and/or fix what happens
with the complex row liveness issues a la issue #3362.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181004213637.32330-1-nyh@scylladb.com>
(cherry picked from commit e4ef7fc40a)
Fixes#3798Fixes#3694
Tests:
unit(release), dtest([new] cql_tests.py:TruncateTester.truncate_after_restart_test)
* tag 'fix-gossip-shard-replication-v1' of github.com:tgrabiec/scylla:
gms/gossiper: Replicate enpoint states in add_saved_endpoint()
gms/gossiper: Make reset_endpoint_state_map() have effect on all shards
gms/gossiper: Replicate STATUS change from mark_as_shutdown() to other shards
gms/gossiper: Always override states from older generations
(cherry picked from commit 48ebe6552c)
"
This patchset fixes a bug in SSTables 3.x reading when fast-forwarding
is enabled. It is possible that a mutation fragment, row or RT marker,
is read and then stored because it falls outside the current
fast-forwarding range.
If the reader is further fast-forwarded but the
row still falls outside of it, the reader would still continue reading
and get the next fragment, if any, that would clobber the currently
stored one. With this fix, the reader does not attempt to read on
after storing the current fragment.
Tests: unit {release}
"
* 'projects/sstables-30/row-skipped-on-double-ff/v2' of https://github.com/argenet/scylla:
tests: Add test for reading rows after multiple fast-forwarding with SSTables 3.x.
sstables: mp_row_consumer_m to notify reader on end of stream when storing a mutation fragment.
sstables: In mp_row_consumer_m::push_mutation_fragments(), return the called helper's value.
(cherry picked from commit 0fa60660b8)
The Antlr3 exception class has a null dereference bug that crashes
the system when trying to extract the exception message using
ANTLR_Exception<...>::displayRecognitionError(...) function. When
a parsing error occurs the CqlParser throws an exception which in
turn processesed for some special cases in scylla to generate a custom
message. The default case however, creates the message using
displayRecognitionError, causing the system to crash.
The fix is a simple workaround, making sure the pointer is not null
before the call to the function. A "proper" fix can't be implemented
because the exception class itself is implemented outside scylla
in antlr headers that resides on the host machine os.
Tested manualy 2 testcases, a typo causing scylla to crash and
a cql comment without a newline at the end also caused scylla to crash.
Ran unit tests (release).
Fixes#3740Fixes#3764
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <cfc7e0d758d7a855d113bb7c8191b0fd7d2e8921.1538566542.git.eliransin@scylladb.com>
(cherry picked from commit 20f49566a2)
"
This patchset enables very simple column type conversions.
It covers only handling variable and fixed size type differences.
Two types still have to be compatiple on bits level to be able to convert a field from one to the other.
"
* 'haaawk/sst3/column_type_schema_change/v4' of github.com:scylladb/seastar-dev:
Fix check_multi_schema to actually check the column type change
Handle very basic column type conversions in SST3
Enable check_multi_schema for SST3
(cherry picked from commit b9702222f8)
This reverts commit b443a9b930. The compaction
history table doesn't have enough information to be a replacement for this
log message yet.
(cherry picked from commit 7c8143c3c4)
The linker uses an opt-in system for non-executable stack: if all object files
opt into a non-executable stack, the binary will have a non-executable stack,
which is very desirable for security. The compiler cooperates by opting into
a non-executable stack whenever possible (always for our code).
However, we also have an assembly file (for fast power crc32 computations).
Since it doesn't opt into a non-executable stack, we get a binary with
executable stack, which Gentoo's build system rightly complains about.
Fix by adding the correct incantation to the file.
Fixes#3799.
Reported-by: Alexys Jacob <ultrabug@gmail.com>
Message-Id: <20181002151251.26383-1-avi@scylladb.com>
(cherry picked from commit aaab8a3f46)
"
This series adds support for range generator functors to multi range
reader. A range generator functor can lazily generate an uknown amount
of ranges on-the-fly for the reader to read.
The range generator support was added by refactoring
`flat_multi_range_mutation_reader` to work in terms of a generator
functor. The existing overload taking a `dht::partition_range_vector`
is adapted to the generator interface behind the scenes.
"
* 'multi-range-reader-generator/v9' of https://github.com/denesb/scylla:
tests/flat_mutation_reader_test: extend multi-range reader tests
make_flat_multi_range_reader: add documentation
make_flat_multi_range_reader: add generator overload
flat_multi_range_reader: refactor to work in terms of generator
make_flat_multi_range_reader(): better handle the 0 range case
flat_mutation_reader: add move_buffer_content_to()
flat_multi_range_mutation_reader: drop fwd_mr ctor parameter
"
Indexed select statement consists of two queries - the view query
used to extract base keys and the base query that uses those keys
to return base rows.
The main idea of this series is to replace raw proxy.query() call
during the view query to one that uses a pager.
Additionally, paging info from the view query needs to be returned
to the client, in order to be used later for requesting new pages.
"
* 'paging_indexes_7' of https://github.com/psarna/scylla:
tests: add test for secondary index with paging
cql3: remove execute(primary_keys) from select statement
cql3: add incremental base queries to index query
storage_proxy: make get_restricted_ranges public
cql3: add base query handling function to indexed statement
cql3: add generating base key from index keys
cql3: add paging state generation function
cql3: move getting index view schema to prepare stage
pager: make state() defined for exhausted pagers
cql3: add maybe_set_paging_state function
cql3: rename set_has_more_pages to set_paging_state
pager: add setters for partition/clustering keys
cql3: add paging to read_posting_list
cql3: add non-const get_result_metadata method
cql3: make find_index_* functions return paging state
cql3: make read_posting_list return future<rows>
cql3: make pagers use time_point instead of duration
Commit d6b0c4dda4 changed the built-in default
murmur3_ignore_msb_bits to 12 (from 0) and removed the scylla.yaml default.
Removal of the scylla.yaml default was a mistake for two reasons:
- if someone downgrades a cluster, keeping scylla.yaml derived from the
master branch, they will experience resharding since the built-in default,
which has changed, will take effect. While that scenario is not supported,
it already happened and caused much consternation.
- if, in the future, we wish to change the default, we will cause resharding
again. Embedding the default in scylla.yaml allows us to change the default
for new clusters while allowing upgraded clusters to retain older values.
Therefore, this patch restores murmur3_ignore_msb_bits in scylla.yaml. Future
changes to the configuration item should change both scylla.yaml and the
built-in default.
Message-Id: <20180930090053.21136-1-avi@scylladb.com>
Reusable buffers are meant to be used when protocol or third-party
library limiations force us to allocate large contiguous buffers. There
isn't much that can be done about this so there is little point in
warning about that.
Fixes#3788.
Message-Id: <20180928085141.6469-1-pdziepak@scylladb.com>
In commit 4a0b561376, "storage_service:
Get rid of moving operation", we removed remove_from_moving() in
update_normal_tokens(). However, remove_from_moving() calls
invalidate_cached_rings(). We should call invalidate_cached_rings() in
update_normal_tokens(), otherwise we will get wrong token range to
address map in the token_metadata cache.
This issue exists in master only. It is not in any of the releases.
Message-Id: <c03f2ed478cfdb84494f36dce9a8cfc05ed9e0cd.1538288364.git.asias@scylladb.com>
Allows creating a multi range reader from an arbitrary callable that
return std::optional<dht::partition_range>. The callable is expected to
return a new range on each call, such that passing each successive range
to `flat_mutation_reader::fast_forward_to` is valid. When exhausted the
callable is expected to return std::nullopt.
Instead of working with a dht::partition_range_vector directly, work
with an abstract generator that returns a pointer to the next range on
each invocation. When exhausted it returns nullptr. This opens up the
possibility to create multi range readers from a generator functor that
creates ranges lazily. This is indeed what the next path does.
Previously, when the passed in range of partition ranges contained 0
ranges, an empty reader was returned. This means that the returned
reader was forwardable or not depending on the number of passed in
ranges. This is inconsistent and can lead to nasty surprises.
To solve this problem add `forwardable_empty_mutation_reader`, a
specialized reader that delays creating the underlying reader until
fast_forward_to() is called on it, and thus a range is available.
When `make_flat_multi_range_mutation_reader()` is called with
`mutation_reader::forwarding::no` a simple empty reader is created, like
before.
`move_buffer_content_to()` makes it possible to implement more efficient
wrapping readers, readers that wrap another flat mutation reader but do
no transformation to the underlying fragment stream.
These readers, when filling their buffers, can simply fill the
underlying reader's buffer, then move its content into their own. When
the reader's own buffer is empty, this is very efficient, as it can be
done by simply swapping the buffers, avoiding the work of moving the
fragments one-by-one.
The factory function creating this reader ensures that the passed-in
ranges vector has more then one range, which effectively makes the
`fwd_mr` constructor parameter have no effect. The underlying reader
will always be created with `mutation_reader::forwarding::yes` as it has
to be able to fast-forward between the ranges.
When validating assignment between two types, it's possible one of
them is wrapped in a reverse_type, if it comes, for example, from the
type associated with a clustering column. When checking for weak
assignment the types are correctly unwrapped, but not when checking
for an exact match, which this patch fixes.
Technically, the receiver is never a reversed_type for the current
callers, but this is the morally correct implementation, as the type
being reversed or not plays no role in assignment.
Tests: unit(release)
Fixes#3789
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180927223201.28152-1-duarte@scylladb.com>
Right now, with specialized execute() that takes primary keys
for indexed_table_select_statement, the original execute()
method implemented in select_statement is not used anywhere,
so it's removed.
Base queries that are part of index queries are allowed to be short,
which can result in wasted work - e.g. when we query all replicas
in parallel, but have to discard most of the result, since the first
one (in token order) resulted in a short read.
Thus, we start by quering 1 range, check if the read is short,
and if not, continue by querying 2x more ranges than before.
Refs #2960
Searching for index view schema for an indexed statement can be done
once in prepare stage, so it's moved to indexed_table_select_statement
prepare method.
If service::pager is exhausted, state() function used to return
a nullptr instead of a pointer to a valid paging state and the
documented return type in this case was 'unspecified'.
Sometimes a paging state may be needed anyway, even if the pager
is already exhausted - thus, state() return value becomes defined
after this commit. Exhausted pagers will return a valid object
to a state with _remaining field set to 0.
We have found issues when a flush is requested outside the usual
memtable flush loop and because there is not a lot of data the
controller will not have a high amount of shares.
To prevent this, this patch guarantees some minimum amount of shares
when extraneous operations (nodetool flush, commitlog-driven flush, etc)
are requested.
Another option would be to add shares instead of guarantee a minimum.
But in my view the approach I am taking here has two main advantages:
1) It won't cause spikes when those operations are requested
2) It is cumbersome to add shares in the current infrastructure, as just
adding backlog can cause shares to spike. Consider this example:
Backlog is within the first range of very low backlog (~0.2). Shares
for this would be around ~20. If we want to add 200 shares, that is
equivalent to a backlog of 0.8. Once we add those two backlogs
together, we end up with 1 (max backlog).
Fixes#3761
Tests: unit (release)
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180927131904.8826-1-glauber@scylladb.com>