"
Fixes#3574
This series adds missing multi-column restrictions filtering to CQL.
The underlying infrastructure already allows checking multi-column
restrictions in a reasonable way, so this series consists of mostly
adding simple interfaces and parameters.
Also, unit test cases for multi-column restrictions are provided.
Tests: unit (dev)
"
* 'add_multi_column_restrictions_filtering_3' of https://github.com/psarna/scylla:
tests: add multi-column filtering tests
cql3: add multi-column restrictions filtering
cql3: add specified is_satisfied_by to multi-column restriction
cql3: rewrite raw loop in is_satisfied_by to boost::any_of
cql3: fix is_satisfied_by for multi-column restrictions
cql3: add missing include to multi-column restriction
"
As part of implementing sstables manager and fixing issue related
to updating large_data_handler on all delete paths, we want to funnel
all sstable creations, loading, and deletions through a manager.
The patchset lays out test infrastructure to funnel these opeations
through class sstables::test_env.
In the process, it cleans up many numerous call sites in the existing
unit tests that evolved over time.
Refs #4198
Refs #4149
Tests: unit (dev)
"
* 'projects/test_env/v3' of https://github.com/bhalevy/scylla:
tests: introduce sstables::test_env
tests: perf_sstable: rename test_env
tests: sstable_datafile_test: use useable_sst
tests: sstable_test: add write_and_validate_sst helper
tests: sstable_test: add test_using_reusable_sst helper
tests: sstable_test: use reusable_sst where possible
tests: sstable_test: add test_using_working_sst helper
tests: sstable_3_x_test: make_test_sstable
tests: run_sstable_resharding_test: use default parameters to make_sstable
tests: sstables::test::make_test_sstable: reorder params
tests: test_setup: do_with_test_directory is unused
tests: move sstable_resharding_strategy_tests to sstable_reharding_test
tests: move create_token_from_key helpers to test_services
tests: move column_family_for_tests to test_services
dht: move declaration of default_partitioner from sstable_datafile_test to i_partitioner.hh
"
This series introduces PER PARTITION LIMIT to CQL.
Protocol and storage is already capable of applying per-partition limits,
so for nonpaged queries the changes are superficial - a variable is parsed
and passed down.
For paged queries and filtering the situation is a little bit more complicated
due to corner cases: results for one partition can be split over 2 or more pages,
filtering may drop rows, etc. To solve these, another variable is added to paging
state - the number of rows already returned from last served partition.
Note that "last" partition may be stretched over any number of pages, not just the
last one, which is a case especially when considering filtering.
As a result, per-partition-limiting queries are not eligible for page generator
optimization, because they may need to have their results locally filtered
for extraneous rows (e.g. when the next page asks for per-partition limit 5,
but we already received 4 rows from the last partition, so need just 1 more
from last partition key, but 5 from all next ones).
Tests: unit (dev)
Fixes#2202
"
* 'add_per_partition_limit_3' of https://github.com/psarna/scylla:
tests: remove superficial ignore_order from filtering tests
tests: add filtering with per partition key limit test
tests: publish extract_paging_state and count_rows_fetched
tests: fix order of parameters in with_rows_ignore_order
cql3,grammar: add PER PARTITION LIMIT
idl,service: add persistent last partition row count
cql3: prevent page generator usage for per-partition limit
cql3: add checking for previous partition count to filtering
pager: add adjusting per-partition row limit
cql3: obey per partition limit for filtering
cql3: clean up unneeded limit variables
cql3: obey per partition limit for select statement
cql3: add get_per_partition_limit
cql3: add per_partition_limit to CQL statement
When reporting a failure, expected rows were mixed up with received
rows. Also, the message assumed it received more rows, but it can
as well be less, so now it reports a "different number" of rows.
"
get_restricted_ranges() is inefficient since it calculates all
vnodes that cover a requested key ranges in advance, but callers often
use only the first one. Replace the function with generator interface
that generates requested number of vnodes on demand.
"
* 'gleb/query_ranges_to_vnodes_generator' of github.com:scylladb/seastar-dev:
storage_proxy: limit amount of precaclulated ranges by query_ranges_to_vnodes_generator
storage_proxy: remove old get_restricted_ranges() interface
cql3/statements/select_statement: convert index query interface to new query_ranges_to_vnodes_generator interface
tests: convert storage_proxy test to new query_ranges_to_vnodes_generator interface
storage_proxy: convert range query path to new query_ranges_to_vnodes_generator interface
storage_proxy: introduce new query_ranges_to_vnode_generator interface
read_exactly(), when given a stream that does not contain the amount of data
requested, will loop endlessly, allocating more and more memory as it does, until
it fails with an exception (at which point it will release the memory).
Fix by returning an empty result, like input_stream::read_exactly() (which it
replaces). Add a test case that fails without a fix.
Affected callers are the native transport, commitlog replay, and internal
deserialization.
Fixes#4233.
Branches: master, branch-3.0
Tests: unit(dev)
Message-Id: <20190216150825.14841-1-avi@scylladb.com>
The included testcase used to crash because during database::stop() we
would try to update system.large_partition.
There doesn't seem to be an order we can stop the existing services in
cql_test_env that makes this possible.
This patch then adds another step when shutting down a database: first
stop updating system.large_partition.
This means that during shutdown any memtable flush, compaction or
sstable deletion will not be reflected in system.large_partition. This
is hopefully not too bad since the data in the table is TTLed.
This seems to impact only tests, since main.cc calls _exit directly.
Tests: unit (release,debug)
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190213194851.117692-1-espindola@scylladb.com>
In preparation for providing a default large_data_handler in
a test-standard way.
buffer_size parameter reordered and now has a default value
same as make_sstable()'s.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
For fixing issue #3362 we added in materialized views, in some cases,
"virtual columns" for columns which were not selected into the view.
Although these columns nominally exist in the view's schema, they must
not be visible to the user, and in commit
3f3a76aa8f we prevented a user from being
able to SELECT these columns.
In this patch we also prevent the user from being able to use these
column names (which shouldn't exist in the view) in WHERE restrictions.
Fixes#4216
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190212162014.18778-1-nyh@scylladb.com>
The bulk materialized-view building processes (when adding a materialized
view to a table with existing data) currently reads the base table in
batches of 128 (view_builder::batch_size) rows. This is clearly better
than reading entire partitions (which may be huge), but still, 128 rows
may grow pretty large when we have rows with large strings or blobs,
and there is no real reason to buffer 128 rows when they are large.
Instead, when the rows we read so far exceed some size threshold (in this
patch, 1MB), we can operate on them immediately instead of waiting for
128.
As a side-effect, this patch also solves another bug: At worst case, all
the base rows of one batch may be written into one output view partition,
in one mutation. But there is a hard limit on the size of one mutation
(commitlog_segment_size_in_mb, by default 32MB), so we cannot allow the
batch size to exceed this limit. By not batching further after 1MB,
we avoid reaching this limit when individual rows do not reach it but
128 of them did.
Fixes#4213.
This patch also includes a unit test reproducing #4213, and demonstrating
that it is now solved.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190214093424.7172-1-nyh@scylladb.com>
Previously it was the responsibility of the layer above (multishard
combining reader) to pause readers, which happened via an explicit
`pause()` call. This proved to be a very bad design as we kept finding
spots where the multishard reader should have paused the reader to avoid
potential deadlocks (due to starved reader concurrency semaphores), but
didn't.
This commit moves the responsibility of pausing the reader into the
shard reader. The reader is now kept in a paused state, except when it
is actually used (a `fill_buffer()` or `fast_forward_to()` call is
executing). This is fully transparent to the layer above.
As a side note, the shard reader now also hides when the reader is
created. This also used to be the responsibility of the multishard
reader, and although it caused no problems so far, it can be considered
a leak of internal details. The shard reader now automatically creates
the remote reader on the first time it is attempted to be used.
The code has been reorganized, such that there is now a clear separation
of responsibilities. The multishard combining reader handles the
combining of the output of the shard readers, as well as issuing
read-aheads. The shard reader handles read-ahead and creating the
remote reader when needed, as well as transferring the results of remote
reads to the "home" shard. The remote reader
(`shard_reader::remote_reader`, new in this patch) handles
pausing-resuming as well as recreating the reader after it was evicted.
Layers don't access each other's internals (like they used to).
After this commit, the reader passed to `destroy_reader()` will always
be in paused state.
Reader creation happens through the `reader_lifecycle_policy` interface,
which offers a `create_reader()` method. This method accepts a shard
parameter (among others) and returns a future. Its implementation is
expected to go to the specified shard and then return with the created
reader. The method is expected to be called from the shard where the
shard reader (and consequently the multishard reader) lives. This API,
while reasonable enough, has a serious flaw. It doesn't make batching
possible. For example, if the shard reader issues a call to the remote
shard to fill the remote reader's buffer, but finds that it was evicted
while paused, it has to come back to the local shard just to issue the
recreate call. This makes the code both convoluted and slow.
Change the reader creation API to be synchronous, that is, callable from
the shard where the reader has to be created, allowing for simple call
sites and batching.
This change requires that implementations of the lifecycle policy update
any per-reader data-structure they have from the remote shard. This is
not a problem however, as these data-structures are usually partitioned,
such that they can be accessed safely from a remote shard.
Another, very pleasant, consequence of this change is that now all
methods of the lifecycle interface are sync and thus calls to them
cannot overlap anymore.
This patch also removes the
`test_multishard_combining_reader_destroyed_with_pending_create_reader`
unit test, which is not useful anymore.
For now just emulate the old interface inside shard reader. We will
overhaul the shard reader after some further changes to minimize
noise.
The shard reader relies on the `reader_lifecycle_policy` for pausing and
resuming the remote reader. The lifecycle policy's API was designed to
be as general as possible, allowing for any implementation of
pause/resume. However, in practice, we have a single implementation of
pause/resume: registering/unregistering the reader with the relevant
`reader_concurrency_semaphore`, and we don't expect any new
implementations to appear in the future.
Thus, the generic API of the lifecycle policy, is needlessly abstract
making its implementations needlessly complex. We can instead make this
very concrete and have the lifecycle policy just return the relevant
semaphore, removing the need for every implementor of the lifecycle
policy interface to have a duplicate implementation of the very same
logic.
For now just emulate the old interface inside shard reader. We will
overhaul the shard reader after some further changes to minimize noise.
All tests that involve writing to a base table and then reading from the
view table must use the eventually() function to account for the fact that
the view update is asynchronous, and may be visible only some time after
writing the base table. Forgetting an eventually() can cause the test
to become flaky and sometimes fail because the expected data is not *yet*
in the view. Botond noticed these failures in practice in two subtests
(test_partition_key_filtering_with_slice and
test_clustering_key_in_restrictions).
This patch fixes both tests, and I also reviewed the entire source file
view_schem_test.cc and found additional places missing an eventually()
(and also places that unnecessarily used eventually() to read from the
base table), and fixed those as well.
Fixes#4212
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190212121140.14679-1-nyh@scylladb.com>
There is no value if having a default value for encoding_stats parameter
of write_components(). If anything it weakens the tests by encouraging
not using the real encoding stats which is not what the actual sstable
write path in Scylla does.
This patch removes the default value and makes most of the tests provide
real encoding statistics. The ones that do not are those that have no
easy way of obtaining those (and those stats are not that important for
the test itself) or there is a reason for not using those
(sstable_3_x_test::test_sstable_write_large_row uses row size thresholds
based on size with default-constructed encoding_stats).
Message-Id: <20190212124356.14878-1-pdziepak@scylladb.com>
"
The code reading counter cells form sstables verifies that there are no
unsupported local or remote shards. The latter are detected by checking
if all shards are present in the counter cell header (only remote shards
do not have entries there). However, the logic responsible for doing
that was incorrectly computing the total number of counter shards in a
cell if the header was larger than a single counter shard. This resulted
in incorrect complaints that remote shards are present.
Fixes#4206
Tests: unit(release)
"
* tag 'counter-header-fix/v1' of https://github.com/pdziepak/scylla:
tests/sstables: test counter cell header with large number of shards
sstables/counters: fix remote counter shard detection
The logic responsible for reading counters from sstables was getting
confused by large headers. The size of the header depends directly on
the number of shards. This tests checks that we can handle cells with
large number of counter shards properly.
tmpdir is a helper class representing a temporary directory.
Unfortunately, it suffers for some problems such as lack of proper
encapsulation and weak typing. This has caused bugs in the past when the
user code accidentally modified the member variable with the path to the
directory.
This patch modernises tmpdir and updates its users. The path is stored
in a std::filesystem::path and available read-only to the class users.
mkdtemp and boost are replaced by standard solution.
The users are update to use path more (when it didn't involve too many
changes to their code) and stop using lw_shared_ptr to store the tmpdir
when it wasn't necessary.
tmpdir intentionally doesn't provide any helpers for getting the path as
a string in order to discourage weak types.
Message-Id: <20190207145727.491-1-pdziepak@scylladb.com>
By default write_components() uses a safe default for encoding_stats
which indicates that all columns are present. This may hide so bugs, so
let's pass the real thing in the tests that this may matter.
This patch introduces some improvement to file comparison:
- exception flags are set so that any error triggers an exceptions and
guarantees that they are not silently ignored
- std::ios_base::binary flag is passed to open()
- istreambuf_iterator is used instead of istream_iterator. It is better
suited for comparing binary data.
The interface tmpdir::path isn't properly encapsulated and its users can
modify the path even though they really shouldn't. This can happen
accidentally, in cql_test_env a reference to tmpdir::path was created
and later assigned to in one of the code paths. This caused tmpdir
destructor to remove wrong directory at program exit.
This patch solves the problem by avoiding referencing tmpdir::path, a
copy is perfectly acceptable considering that this is tests-only code.
Message-Id: <20190206173046.26801-1-pdziepak@scylladb.com>