This covers five tests, including three for compressed tables:
- write_many_partitions_deflate
- write_many_partitions_lz4
- write_many_partitions_snappy
- write_many_live_partitions
- write_many_deleted_partitions
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
These tests check the correctness of resulting compacted SSTables based
on the files produced by compacting input files with Cassandra.
Note that output files are not identical to those generated by Cassandra
because Scylla compaction does not yet optimise delta-encoded values
using the serialization header.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <3fa05ce72352292d1026ce80ac87552889d10d96.1533667535.git.vladimir@scylladb.com>
Tests three cases:
- a row lying inside a range tombstone
- a row that has the same clustering key as range tombstone start
- a row that has the same clustering key as range tombstone end
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
These are two RTs where the end bound of one has the same clustering key
as the start bound of the other, but both bounds are exclusive.
In this case those bounds should not (and cannot) be merged into a
single RT boundary when writing RT markers.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
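A minimal sketch of the rule described above, assuming a bound is just a
clustering key plus an inclusiveness flag. The names Bound and
can_merge_into_boundary are hypothetical and are not taken from the
Scylla sources. A single boundary marker closes one RT and opens the next
at the same key, so exactly one of the two bounds has to cover that key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    """Hypothetical model of a range tombstone bound."""
    key: tuple        # clustering key of the bound
    inclusive: bool   # True if the key itself is covered


def can_merge_into_boundary(end: Bound, start: Bound) -> bool:
    """Return True if an RT end bound and the next RT start bound can be
    written as one boundary marker.

    The keys must match and exactly one side must be inclusive. Two
    exclusive bounds leave the key itself uncovered, so they have to be
    written as two separate markers.
    """
    return end.key == start.key and end.inclusive != start.inclusive
```

With this model, an exclusive end at key (1,) followed by an exclusive
start at key (1,) is not mergeable, while an exclusive end followed by an
inclusive start at the same key is.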
"
SSTables 3.x format ('m') stores the size of the previous row or RT marker
inside each row/marker. That potentially allows traversing rows/markers
in reverse order.
The previous code calculating those sizes appeared to produce invalid
values for all rows except the first one. The problem with detecting
this bug was that neither Cassandra itself nor the sstabledump tool uses
those values; they are simply skipped on reading.
From the UnfilteredSerializer.deserializeRowBody() method
(https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java#L562):

    if (header.isForSSTable())
    {
        in.readUnsignedVInt(); // Skip row size
        in.readUnsignedVInt(); // previous unfiltered size
    }

So while the previous test files were technically correct in that they
contained valid data readable by Cassandra/sstabledump, they didn't
follow the format specification.
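
To illustrate why a correct previous-size field matters, here is a toy
sketch of reverse traversal. It does not reproduce the real 'm' layout:
the record format (two fixed-width big-endian integers followed by the
body), the use of 0 as the "no previous record" value, and the function
names are all assumptions made for the example. The real format uses
vints and more fields.

```python
import struct

HEADER = ">II"  # toy header: previous record size, body length
HEADER_SIZE = struct.calcsize(HEADER)


def write_records(bodies):
    """Serialize bodies back to back; each record stores the full size of
    the record written just before it (0 for the first record)."""
    buf = bytearray()
    offsets = []
    prev_size = 0
    for body in bodies:
        offsets.append(len(buf))
        record = struct.pack(HEADER, prev_size, len(body)) + body
        buf += record
        prev_size = len(record)  # size of the record just written
    return bytes(buf), offsets


def walk_backwards(buf, last_offset):
    """Visit records from the last one to the first using prev_size."""
    pos = last_offset
    bodies = []
    while True:
        prev_size, body_len = struct.unpack_from(HEADER, buf, pos)
        start = pos + HEADER_SIZE
        bodies.append(buf[start:start + body_len])
        if prev_size == 0:
            break
        pos -= prev_size  # jump to the start of the previous record
    return bodies
```

If prev_size were wrong for any record after the first, walk_backwards
would land in the middle of a record, which is the kind of error a
forward-only reader never notices.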
This patchset fixes the code to produce correct values and replaces
incorrect data files with correct ones. The newly generated data files
have been validated to be identical to files generated with Cassandra
using the same data and timestamps as the unit tests.
Tests: Unit {release}
"
* 'projects/sstables-30/fix-prev-row_size/v1' of https://github.com/argenet/scylla:
tests: Fix test files to use correct previous row sizes.
sstables: Fix calculation of previous row size for SSTables 3.x
sstables: Factor out code building promoted index blocks into separate helpers.
"
This patchset contains two fixes to the clustering key prefixes
serialization logic for SSTables 3.x.
First, it fixes a vexing typo: a bitwise-and (&) has been used instead
of a remainder operator (%) for truncating the shift value.
This did not show up in existing tests because they all had non-empty
clustering column values.
Added tests to cover empty clustering column values.
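
A hedged sketch of how such a typo can stay hidden. It assumes a block
header that packs two bits per clustering column (one for "empty", one
for "null") for up to 32 columns. The bit assignment, the function name
and the exact shape of the buggy expression are assumptions for
illustration, not the code from this patchset.

```python
def make_block_header(values, offset=0, buggy=False):
    """Build a header for up to 32 clustering values starting at offset.

    Assumed layout: bit (shift) marks an empty value and bit (shift + 1)
    marks a null value, where shift = (column index % 32) * 2.
    """
    header = 0
    block = values[offset:offset + 32]
    for i, value in enumerate(block, start=offset):
        if buggy:
            shift = (i & 32) * 2   # typo: bitwise-and instead of remainder
        else:
            shift = (i % 32) * 2   # intended: position within the block
        if value is None:
            header |= 1 << (shift + 1)
        elif len(value) == 0:
            header |= 1 << shift
    return header
```

When every value is non-empty and non-null, no bit is ever set, so both
variants return 0 and the mistake is invisible. The variants diverge as
soon as an empty or null value appears at a column index other than 0.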
Second, it fixes the logic of serialization to write values up to the
prefix length, not the length of the clustering key as defined by
schema. This matches the way it is done by the Origin.
There is, however, a special case where the prefix size is smaller than
that of a clustering key but we still need to serialize up to the full
size. This is the case when a compact table is being used and some
rows in it are added using incomplete clustering keys (containing null
for trailing columns).
In Cassandra, these prefixes still have a full length and missing
columns are just set to 'null'. In our code those prefixes have their
real length, but since we need to serialize beyond it, we pass a flag to
indicate this.
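
A small sketch of the two behaviours described above, assuming a prefix
is a list of values and a missing trailing column is modelled as None.
The function name and the pad_to_full flag are hypothetical stand-ins
for the flag mentioned in the text.

```python
def values_to_serialize(prefix, schema_ck_size, pad_to_full=False):
    """Return the clustering values that would be written.

    By default only the values actually present in the prefix are
    written. With pad_to_full set (the compact table case), the prefix
    is extended with nulls up to the clustering key size from the schema.
    """
    values = list(prefix)
    if pad_to_full:
        values += [None] * (schema_ck_size - len(values))
    return values
```

For a schema with three clustering columns, a one-column prefix yields
one value by default and three values (the last two null) when the flag
is set.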
"
* 'projects/sstables-30/fix-clustering-blocks/v1' of https://github.com/argenet/scylla:
tests: Add test covering compact table with non-full clustering key.
sstables: Improve clustering blocks writing, use logical clustering prefix size.
tests: Add test covering large clustering keys (>32 columns) for SSTables 3.x
tests: Add unit test covering empty values in clustering key.
sstables: Fix typo in clustering blocks write helper.
"
Add handling for missing columns and tests for it.
There are 3 cases:
1. Number of columns in a table is smaller than 64
2. Number of columns in a table is 64 or greater
2a. and less than half of all possible columns are present in sstable
2b. and at least half of all possible columns are present in sstable
Case 1 is implemented using a bit mask; a column is present if mask & (1 << <column number>) == 0
Case 2a is implemented by storing a list of column numbers, one for each present column
Case 2b is implemented by storing a list of column numbers, one for each absent column
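
The three cases above can be sketched as follows. This is a simplified
model that works on already-decoded values rather than on-disk bytes.
The function names and the (kind, payload) return shape are assumptions
made for the example, and the thresholds follow the wording of the list
above.

```python
def encode_present_columns(present, total):
    """Describe which of `total` columns are present.

    Returns a (kind, payload) pair:
      ("mask", int)      fewer than 64 columns; a set bit means ABSENT
      ("present", list)  many columns, fewer than half present
      ("absent", list)   many columns, at least half present
    """
    present = set(present)
    if total < 64:
        mask = 0
        for column in range(total):
            if column not in present:
                mask |= 1 << column
        return "mask", mask
    if 2 * len(present) < total:
        return "present", sorted(present)
    return "absent", sorted(set(range(total)) - present)


def decode_present_columns(kind, payload, total):
    """Inverse of encode_present_columns; returns a sorted column list."""
    if kind == "mask":
        return [c for c in range(total) if payload & (1 << c) == 0]
    if kind == "present":
        return sorted(payload)
    if kind == "absent":
        absent = set(payload)
        return [c for c in range(total) if c not in absent]
    raise ValueError(f"unknown encoding kind: {kind}")
```

Storing whichever list is shorter keeps the encoding small for wide
tables, whether a row carries only a few columns or nearly all of them.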
"
* 'haaawk/sstables3/read-missing-columns-v3' of ssh://github.com/scylladb/seastar-dev:
sstables 3: add test for reading big dense subset of columns
sstables 3: support reading big dense subsets of columns
sstables 3: add test for reading big sparse subset of columns
sstables 3: support reading big sparse subsets of columns
sstables 3: add test for reading small subset of columns
sstables 3: support reading small subsets of columns
Since sstabledump and Cassandra do not use row size values, the new
files have been validated to be identical to files generated by
Cassandra with the same data inserted at the same timestamps.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
"
Add handling for static rows and tests for it.
"
* 'haaawk/sstables3/read-static-v1' of ssh://github.com/scylladb/seastar-dev:
sstable_3_x_test: Add test_uncompressed_compound_static_row_read
sstable_3_x_test: add test_uncompressed_static_row_read
flat_mutation_reader_assertions: improve static row assertions
data_consume_rows_context_m: Implement support for static rows
mp_row_consumer_m: Implement support for static rows
mp_row_consumer_m: Extract fill_cells
"
Add handling for clustering columns and tests for it.
"
* 'haaawk/sstables3/read-ck-v3' of ssh://github.com/scylladb/seastar-dev:
Add test_uncompressed_compound_ck_read for SSTables 3.x
Add test_uncompressed_simple_read for SSTables 3.x
Implement reading clustering key from SSTables 3.x
column_translation: cache fixed value lengths for ck
data_consume_rows_context_m: use cached fixed column value lengths
column_translation: store fix lengths of column values
consume_row_start: change type of clustering key
Rename ROW_BODY state to CLUSTERING_ROW
"
This patchset implements reading row columns from an SSTables 3.x format data file.
Tests: units (release)
"
* 'haaawk/sstables3/read-columns-v4' of ssh://github.com/scylladb/seastar-dev: (21 commits)
Add test for reading column values of different types.
Support all fixed size column types from SSTable 3.x
Add abstract_type::value_length_if_fixed
Add test for simple table with value
flat_reader_assertions: Add produces_row taking column values
Implement reading rows and columns in data_consume_rows_context_m
Introduce column_flags_m
Add column_translation to data_consume_rows_context_m
Pass schema to data_consume_context
Add column_translation.hh
consumer_m: Add consume methods for consuming rows and columns
Extract make_atomic_cell from mp_row_consumer_k_l
Rename NON_STATIC_ROW_* states to ROW_BODY_*
Add liveness_info and use it in reading sstables
Add helper methods for parsing simple types.
Add unfiltered_flags_m::has_all_columns
data_consume_context: use make_unique instead of new
Pass serialization_header to data_consume_rows_context*
Use disk_string_vint_size for bytes_array_vint_size
Introduce disk_string_vint_size type
...