sprint() returns std::string(), but the new format() returns an sstring. Usually
an sstring is wanted but in this case an sstring will fail as it is added to
an std::string.
Fix the failure (after spring->format conversion) by converting to an std::string.
It achieves 2.0x speedup on intel E5 and 1.1x to 2.5x speedup on
various arm64 microarchitectures.
The algorithm cuts data into blocks of 1024 bytes and calculates crc
for each block, which is furthur divided into three subblocks of 336
bytes(42 uint64) each, and 16 remaining bytes(2 uint64).
For each iteration, three independent crc are caculated for one uint64
from each subgroup. It increases IPC(instructions per cycle) much.
After subblocks are done, three crc and remaining two uint64 are
combined using carry-less multiplication to reach the final result
for one block of 1024 bytes.
Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1541042759-24767-1-git-send-email-yibo.cai@arm.com>
"
In Cassandra, row columns are stored in a BTree that uses the following
ordering on them:
- all atomic columns go first, then all multi-cell ones
- columns of both types (atomic and multi-cell) are
lexicographically ordered by name regarding each other
Scylla needs to store columns and their respective indices using the
same ordering as well as when reading them back.
Fixes#3853
Tests: unit {release}
+
Checked that the following SSTables are dumped fine using Cassandra's
sstabledump:
cqlsh:sst3> CREATE TABLE atomic_and_collection3 ( pk int, ck int, rc1 text, rc2 list<text>, rc3 text, rc4 list<text>, rc5 text, rc6 list<text>, PRIMARY KEY (pk, ck)) WITH compression = {'sstable_compression': ''};
cqlsh:sst3> INSERT INTO atomic_and_collection3 (pk, ck, rc1, rc4, rc5) VALUES (0, 0, 'hello', ['beautiful','world'], 'here');
<< flush >>
sstabledump:
[
{
"partition" : {
"key" : [ "0" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 96,
"clustering" : [ 0 ],
"liveness_info" : { "tstamp" : "1540599270139464" },
"cells" : [
{ "name" : "rc1", "value" : "hello" },
{ "name" : "rc5", "value" : "here" },
{ "name" : "rc4", "deletion_info" : { "marked_deleted" : "1540599270139463", "local_delete_time" : "1540599270" } },
{ "name" : "rc4", "path" : [ "45e22cb0-d97d-11e8-9f07-000000000000" ], "value" : "beautiful" },
{ "name" : "rc4", "path" : [ "45e22cb1-d97d-11e8-9f07-000000000000" ], "value" : "world" }
]
}
]
}
]
"
* 'projects/sstables-30/columns-proper-order/v1' of https://github.com/argenet/scylla:
tests: Test interleaved atomic and multi-cell columns written to SSTables 3.x.
sstables: Re-order columns (atomic first, then collections) for SSTables 3.x.
sstables: Use a compound structure for storing information used for reading columns.
* seastar d152f2d...c1e0e5d (6):
> scripts: perftune.py: properly merge parameters from the command line and the configuration file
> fmt: update to 5.2.1
> io_queue: only increment statistics when request is admitted
> Adds `read_first_line.cc` and `read_first_line.hh` to CMake.
> fstream: remove default extent allocation hint
> core/semaphore: Change the access of semaphore_units main ctor
Due to a compile-time fight between fmt and boost::multiprecision, a
lexical_cast was added to mediate.
sprint("%s", var) no longer accepts numeric values, so some sprint()s were
converted to format() calls. Since more may be lurking we'll need to remove
all sprint() calls.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
"
This patchset adddresses two problems with shadowable deletions handling
in SSTables 3.x. ('mc' format).
Firstly, we previously did not set a flag indicating the presence of
extended flags byte with HAS_SHADOWABLE_DELETION bitmask on writing.
This would break subsequent reading and cause all types of failures up
to crash.
Secondly, when reading rows with this extended flag set, we need to
preserve that information and create a shadowable_tombstone for the row.
Tests: unit {release}
+
Verified manually with 'hexdump' and using modified 'sstabledump' that
second (shadowable) tombstone is written for MV tables by Scylla.
+
DTest (materialized_views_test.py:TestMaterializedViews.hundred_mv_concurrent_test)
that originally failed due to this issue has successfully passed locally.
"
* 'projects/sstables-30/shadowable-deletion/v4' of https://github.com/argenet/scylla:
tests: Add tests writing both regular and shadowable tombstones to SSTables 3.x.
tests: Add test covering writing and reading a shadowable tombstone with SSTables 3.x.
sstables: Support Scylla-specific extension for writing shadowable tombstones.
sstables: Introduce a feature for shadowable tombstones in Scylla.db.
memtable: Track regular and shadowable tombstones separately in encoding_stats_collector.
sstables: Error out when reading SSTables 3.x with Cassandra shadowable deletion.
sstables: Support checking row extension flags for Cassandra shadowable deletion.
After the new in-memory representation of cells was introduced there was
a regression in atomic_cell_or_collection::operator<< which stopped
printing the content of the cell. This makes debugging more incovenient
are time-consuming. This patch fixes the problem. Schema is propagated
to the atomic_cell_or_collection printer and the full content of the
cell is printed.
Fixes#3571.
Message-Id: <20181024095413.10736-1-pdziepak@scylladb.com>
Compaction mode fails if more than one shard is used because it doesn't
make sure sstables used as input for compaction only contain local keys.
Therefore, sstable generated by compaction has less keys than expected
because non-local keys are purged out.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181022225153.12029-1-raphaelsc@scylladb.com>
"
Hinted handoff should not overpower regular flows like READs, WRITEs or
background activities like memtable flushes or compactions.
In order to achieve this put its sending in the STEAMING CPU scheduling
group and its commitlog object into the STREAMING I/O scheduling group.
Fixes#3817
"
* 'hinted_handoff_scheduling_groups-v2' of https://github.com/vladzcloudius/scylla:
db::hints::manager: use "streaming" I/O scheduling class for reads
commitlog::read_log_file(): set the a read I/O priority class explicitly
db::hints::manager: add hints sender to the "streaming" CPU scheduling group
his series attempts to make fragments per second results reported by
perf_fast_forward more stable. That includes running each test case
multiple time and reporting median, median average deviation, maximum
and minimum value. That should allow to relatively easily assess how
repeatable the presented results are. Moreover, since perf_fast_forward
does IO operation it is important that they do not introduce any
excessive noise to the results. The location of the data directory
is made configurable so that the user can choose less noisy disk or a
ramdisk.
* github.com/pdziepak/scylla.git stabilise-perf_fast_forward/v3:
tests/perf_fast_forward: make fragments/s measurements more stable
tests/perf_fast_forward: make data directory location configurable
network_topology_strategy test creates a ring with hundreds of tokens (and one
token per node). Then, for each token, it calls get_primary_ranges(), which in
turn walks the token ring. However, because the each datacenter occupies a
disjoint token range, this walk practically has to walk the entire ring until
it collects enough endpoints for each datacenter. The whole thing takes 15 minutes.
Speed this up by randomizing the token<->dc relationship. This is more realistic,
and switches the algorithm to be O(token count), and now it completes in less
than a minute (still not great, but better).
Message-Id: <20181022154026.19618-1-avi@scylladb.com>
perf_fast_forward populates perf_fast_forward_output with some data and
then runs performance tests that read it. That makes the disk a
significant factor in the final result and may make the results less
repeatable. This patch adds a flag that allows setting the location
of the data directory so that the user can opt for a less noisy disk
or a ramdisk.
perf_fast_forward performs various operations, many of which involve
sstable reads and verifies the metrics that there weren't any
unnecessary IO operations. It also provides fragments per seconds
measurements for the tests it runs. However, since some of the tests are
very short and involve IO those values vary a lot what makes them not
very useful.
This commit attempts to stabilise those results. Each test case is run
multiple time (by default for a second, but at least 3 times) and shows
median, median absolute deviation, maximum and minimum value. This
should allow assessing whether the changes in the results are just noise
or a real regression or improvement.
Currently, restricting_mutation_reader::fill_buffer justs reads
lower-layer reader's fragments one by one without doing any further
transformations. This change just swaps the parent-child buffers in a
single step, as suggested in #3604, and, hence, removing any possible
per-fragment overhead.
I couldn't find any test that exercises restricting_mutation_reader as
a mutation source, so I added test_restricted_reader_as_mutation_source
in mutation_reader_test.
Tests: unit (release), though these 4 tests are failing regardless of
my changes (they fail on master for me as well): snitch_reset_test,
sstable_mutation_test, sstable_test, sstable_3_x_test.
Fixes: #3604
Signed-off-by: George Kollias <georgioskollias@gmail.com>
Message-Id: <1540052861-621-1-git-send-email-georgioskollias@gmail.com>
"
This patchset fixes#3803. When a select statement with filtering
is executed and the column that is needed for the filtering is not
present in the select clause, rows that should have been filtered out
according to this column will still be present in the result set.
Tests:
1. The testcase from the issue.
2. Unit tests (release) including the
newly added test from this patchset.
"
* 'issues/3803/v10' of https://github.com/eliransin/scylla:
unit test: add test for filtering queries without the filtered column
cql3 unit test: add assertion for the number of serialized columns
cql3: ensure retrieval of columns for filtering
cql3: refactor find_idx to be part of statement restrictions object
cql3: add prefix size common functionality to all clustering restrictions
cql3: rename selection metadata manipulation functions
Test the usecase where the column that the filtering operates on
is not a part of the select clause. The expected result is a set
containing the columns of the select clause with the additional
columns for filtering marked as non serializable.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
The result sets that the assertions are performed against
are result sets before serialization to the user and therefore
contain also columns that will not be serialized and sent as
the query's final result. The patch adds an assertion on the
number of columns that will be present in the final serialized
result.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Previously we were making assumptions about missing columns
(the size of its value, whether it's a collection or a counter) but
they didn't have to be always true. Now we're using column type
from serialization header to use the right values.
Fixes#3859
* seastar-dev.git haaawk/projects/sstables-30/handling-dropped-columns/v4:
sstables 3: Correctly handle dropped columns in column_translation
sstables 3: Add test for dropped columns handling
get_ranges() is supposed to return ranges in sorted order. However, a35136533d
broke this and returned the range that was supposed to be last in the second
position (e.g. [0, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9]). The broke cleanup, which
relied on the sort order to perform a binary search. Other users of the
get_ranges() family did not rely on the sort order.
Fixes#3872.
Message-Id: <20181019113613.1895-1-avi@scylladb.com>
Readers for SST3 return a bit more precise range tombstones
when reader is slicing. Namely, SST2 readers return whole
range tombstones that overlap with slicing range but SST3
trim those range tombstones to slicing range.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
There is a mismatch between row markers used in SSTables 2.x (ka/la) and
liveness_info used by SSTables 3.x (mc) in that a row marker can be
written as a deleted cell but liveness_info cannot.
To handle this, for a dead row marker the corresponding liveness_info is
written as expiring liveness_info with a fake TTL set to 1.
This approach is adapted from the solution for CASSANDRA-13395 that
exercised similar issue during SSTables upgrades.
* github.com/argenet/scylla.git projects/sstables-30/dead-row-marker/v7:
sstables: Introduce TTL limitation and special 'expired TTL' value.
sstables: Write dead row marker as expired liveness info.
tests: Add test covering dead row marker writing to SSTables 3.x.
The single-range overload, when used by
make_multishard_streaming_reader(), has to create a reader that is
forwardable. Otherwise the multishard streaming reader will not produce
any output as it cannot fast-forward its shard readers to the ranges
produced by the generator.
Also add a unit test, that is based on the real-life purpose the
multishard streaming reader was designed for - serving partition
from a shard, according to a sharding configuration that is different
than the local one. This is also the scenario that found the buf in the
first place.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <bf799961bfd535882ede6a54cd6c4b6f92e4e1c1.1539235034.git.bdenes@scylladb.com>
A simple case for SI paging is added to secondary_index_test suite.
This commit should be followed by more complex testing
and serves as an example on how to extract paging state and use it
across CQL queries.
Message-Id: <b22bdb5da1ef8df399849a66ac6a1f377e6a650a.1539090350.git.sarna@scylladb.com>
A materialized views can provide a filter so as to pick up only a subset
of the rows from the base table. Usually, the filter operates on columns
from the base table's primary key. If we use a filter on regular (non-key)
columns, things get hairy, and as issue #3430 showed, wrong: merely updating
this column in the base table may require us to delete, or resurrect, the
view row. But normally we need to do the above when the "new view key column"
was updated, when there is one. We use shadowable tombstones with one
timestamp to do this, so it cannot take into account the two timestamp from
those two columns (the filtered column and the new key column).
So in the current code, filtering by a non-key column does not work correctly.
In this patch we provide two test cases (one involving TTLs, and one involves
only normal updates), which demonstrate vividly that it does *not* work
correctly. With normal updates, trying to resurect a view row that has
previously disappeared, fails. With TTLs, things are even worse, and the view
row fails to disappear when the filtered column is TTLed.
In Cassandra, the same thing doesn't work correctly as well (see
CASSANDRA-13798 and CASSANDRA-13832) so they decided to refuse creating
a materialized view filtering a non-key column. In this patch we also
do this - fail the creation of such an unsupported view. For this reason,
the two tests mentioned above are commented out in a "#if", with, instead,
a trivial test verifying a failure to create such a view.
Note that as explained above, when the filtered column and new view key
column are *different* we have a problem. But when they are the *same* - namely
we filter by a non-key base column which actually *is* a key in the view -
we are actually fine. This patch includes additional test cases verifying
that this case is really fine and provides correct results. Accordingly,
this case is *not* forbidden in the view creation code.
Fixes#3430.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181008185633.24616-1-nyh@scylladb.com>
"
This patchset fixes a bug in SSTables 3.x reading when fast-forwarding
is enabled. It is possible that a mutation fragment, row or RT marker,
is read and then stored because it falls outside the current
fast-forwarding range.
If the reader is further fast-forwarded but the
row still falls outside of it, the reader would still continue reading
and get the next fragment, if any, that would clobber the currently
stored one. With this fix, the reader does not attempt to read on
after storing the current fragment.
Tests: unit {release}
"
* 'projects/sstables-30/row-skipped-on-double-ff/v2' of https://github.com/argenet/scylla:
tests: Add test for reading rows after multiple fast-forwarding with SSTables 3.x.
sstables: mp_row_consumer_m to notify reader on end of stream when storing a mutation fragment.
sstables: In mp_row_consumer_m::push_mutation_fragments(), return the called helper's value.
writetime() or ttl() selections of non-frozen collections can work, as they
are single cells. Relax the check to allow them, and only forbid non-frozen
collections.
Fixes#3825.
Tests: cql_query_test (release).
Message-Id: <20181008123920.27575-1-avi@scylladb.com>
Uncomment existing declare() calls and implement tests. Because the
data_value(bytes) constructor is explicit, we add explicit conversion to
data_value in impl_min_function_for<> and impl_max_function_for<>.
Fixes#3824.
Message-Id: <20181008084127.11062-1-avi@scylladb.com>
Test dropping of static column defined in CREATE TABLE, and
adding and dropping of a static column using ALTER TABLE.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We had two commented out tests based on Cassandra's MV unit tests, for
the case that the view's filter (the "SELECT" clause used to define the
view) filtered by a non-primary-key column. These tests used to fail
because of problems we had in the filtering code, but they now succeed,
so we can enable them. This patch also adds some comments about what
the tests do, and adds a few more cases to one of the tests.
Refs #3430.
However, note that the success of these tests does not really prove that
the non-PK-column filtering feature works fully correctly and that issue
forbidding it, as explained in
https://issues.apache.org/jira/browse/CASSANDRA-13798. We can probably
fix this feature with our "virtual cells" mechanism, but will need to add
a test to confirm the possible problem and its (probably needed fix).
We do not add such a test in this patch.
In the meantime, issue #3430 should remain open: we still *allow* users
to create MV with such a filter, and, as the tests in this patch show,
this "mostly" works correctly. We just need to prove and/or fix what happens
with the complex row liveness issues a la issue #3362.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181004213637.32330-1-nyh@scylladb.com>
Field 'e' was supposed to be read as blob but the test had a bug
and the read schema was treating that field as int. This patch
changes that and makes the test really check column type change.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
We allow tables to be created with the ID property, mostly for
advanced recovery cases. However, we need to validate that the ID
doesn't match an existing one. We currently do this in
database::add_column_family(), but this is already too late in the
normal workflow: if we allow the schema change to go through, then
it is applied to the system tables and loaded the next time the node
boots, regardless of us throwing from database::add_column_family().
To fix this, we perform this validation when announcing a new table.
Note that the check wasn't removed from database::add_column_family();
it's there since 2015 and there might have been other reasons to add
it that are not related to the ID property.
Refs #2059
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181001230142.46743-1-duarte@scylladb.com>