15179 Commits

Author SHA1 Message Date
Avi Kivity
6154ea734d Merge "Support for writing SSTables 3.0 - rows only" from Vladimir
"
This patch series introduces initial support for writing SSTables in
'mc' format (aka SSTables 3.0).

Currently, the following components are written in 3.0 format:
  - Data.db
  - Index.db
  - Summary.db
(there were no changes to summary files format compared to ka/la)
Other SSTables components are written in the old format for now as they
still need to exist to satisfy post-flush processing.

For now, only rows are written to the data file and indexed. Range
tombstones are not supported.

Writing rows is supported in full with the only exception being counter
cells. All the other features (TTLed data, row/cell level tombstones,
collections, etc) are supported.

Unit tests rely on producing files and binary-comparing them with
'golden' copies that are produced using Cassandra 3.11. This is done to
not block until reading SSTables 3.0 format is implemented.

=======================================
Implementation notes
=======================================

Internally, sstable_writer has been refactored to support multiple
implementations that are instantiated in its constructor based on the
sstable version. Little to no code is shared among sstable_writer_v2 and
sstable_writer_v3 as we only intend to support sstable_writer_v2
alongside sstable_writer_v3 for a single release (to be able to do
rollback on rolling upgrade failure) and then plan to get rid of it
entirely and switch to always writing SSTables in the new format.

The design of sstable_writer_v3 mostly follows that of its precursors
sstable_writer(_v2) and components_writer. Some refactoring and further
code rearrangements are expected in the future but the main code is
there.
"

* 'projects/sstables-30/write-rows/v2' of https://github.com/argenet/scylla:
  Add tests for writing data and index files in SSTables 3.0 ('mc') format.
  Support for writing SSTables 3.0 ('mc') Data.db and Index.db files - rows only.
  Add missing enum values to bound_kind.
  Add building blocks for writing data in SSTables 3.0 format.
  Refactor sstable_writer to support various internal implementations.
  Add is_fixed_length() to data types.
  Add mutation_partition::apply_insert() overload that accepts TTL and expiry for row marker.
2018-04-27 17:10:31 +03:00
Piotr Jastrzebski
d839a945b4 Use goto instead of break in data_consume_rows_context_m::process_state
This way the branches in the code will be predicted better.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <271333caa723e8f3ed1db4fbe6b014ebde2b5d3a.1524818584.git.piotr@scylladb.com>
2018-04-27 11:56:13 +03:00
Vladimir Krivopalov
77fdfa3e7a Add tests for writing data and index files in SSTables 3.0 ('mc') format.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-04-26 14:34:20 -07:00
Vladimir Krivopalov
15ef4ca73c Support for writing SSTables 3.0 ('mc') Data.db and Index.db files - rows only.
This fix adds functionality for writing data in 'mc' format to Data.db
file according to the SSTables 3.0 data format as described at https://github.com/scylladb/scylla/wiki/SSTables-3.0-Data-File-Format
and Index.db file according to the specification at https://github.com/scylladb/scylla/wiki/SSTables-3.0-Index-File-Format

The following cases are not supported yet:
  - writing counter cells
  - range tombstones

In Index.db, end open markers are not written since range tombstones are not
supported for data files yet.

For #1969.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-04-26 14:34:20 -07:00
Vladimir Krivopalov
3ecc9e9ce4 Add missing enum values to bound_kind.
bound_kind::clustering, bound_kind::excl_end_incl_start and
bound_kind::incl_end_excl_start are used during SSTables 3.0 writing.

bound_kind::static_clustering is not used yet but added for completeness
and parity with the Origin.

For #1969.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-04-26 14:34:20 -07:00
Vladimir Krivopalov
a95664be08 Add building blocks for writing data in SSTables 3.0 format.
For #1969.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-04-26 14:34:20 -07:00
Vladimir Krivopalov
bb2bea928a Refactor sstable_writer to support various internal implementations.
This is preparatory work for supporting writing SSTables in multiple
formats.

For #1969.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-04-26 14:34:20 -07:00
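The refactoring described in this commit and in the merge message above (a writer that picks its internal implementation in the constructor, based on the sstable version) can be pictured roughly as follows. This is a minimal sketch, not the actual Scylla code: the names sstable_version, writer_impl, writer_v2, writer_v3 and sstable_writer_sketch are hypothetical, and the real writers take many more parameters.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the format versions mentioned in the log.
enum class sstable_version { ka, la, mc };

// Assumed internal interface hidden behind the public writer.
struct writer_impl {
    virtual ~writer_impl() = default;
    virtual std::string format_name() const = 0;
};

// Old-format implementation (illustrative only).
struct writer_v2 final : writer_impl {
    std::string format_name() const override { return "ka/la"; }
};

// New 'mc' implementation (illustrative only).
struct writer_v3 final : writer_impl {
    std::string format_name() const override { return "mc"; }
};

class sstable_writer_sketch {
    std::unique_ptr<writer_impl> _impl;

    // Choose the implementation once, from the requested version.
    static std::unique_ptr<writer_impl> make_impl(sstable_version v) {
        switch (v) {
        case sstable_version::ka:
        case sstable_version::la:
            return std::make_unique<writer_v2>();
        case sstable_version::mc:
            return std::make_unique<writer_v3>();
        }
        throw std::invalid_argument("unknown sstable version");
    }

public:
    explicit sstable_writer_sketch(sstable_version v) : _impl(make_impl(v)) {}

    // Callers see one type; the format-specific logic lives in _impl.
    std::string format_name() const { return _impl->format_name(); }
};
```

Keeping the two implementations separate, rather than sharing code between them, matches the stated plan to drop the old writer after a single release.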
Vladimir Krivopalov
54bd74fda0 Add is_fixed_length() to data types.
For any given CQL data type, this member returns whether its values are
of fixed or variable length. This is used by SSTables 3.0 format to only
store the length value for variable-length cells.

For #1969.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-04-26 14:34:20 -07:00
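A minimal sketch of how a writer might use is_fixed_length(), storing a length only for variable-length cells. The type_sketch struct and write_cell_value() are hypothetical, and the one-byte length prefix is an assumption made for brevity; it is not the encoding the real format uses for lengths.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a CQL data type descriptor.
struct type_sketch {
    bool fixed_length;
    bool is_fixed_length() const { return fixed_length; }
};

// Append a cell value to `out`. Only variable-length values get a
// length prefix (a single byte here, purely as a simplification).
void write_cell_value(std::vector<uint8_t>& out,
                      const type_sketch& type,
                      const std::string& value) {
    if (!type.is_fixed_length()) {
        if (value.size() > 255) {
            throw std::length_error("value too long for this sketch");
        }
        out.push_back(static_cast<uint8_t>(value.size()));
    }
    out.insert(out.end(), value.begin(), value.end());
}
```

With this shape, a 4-byte fixed-length value occupies exactly 4 bytes, while a 3-byte variable-length value occupies 4 bytes (1 length byte plus the payload).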
Vladimir Krivopalov
ed62b9a667 Add mutation_partition::apply_insert() overload that accepts TTL and expiry for row marker.
For #1969.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-04-26 13:27:42 -07:00
Piotr Jastrzebski
a8154e2825 Fix use-after-free in summary parsing
Buffer received from read_exactly is referenced by
a pointer used in do_until loop but is not kept around
and is destroyed.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <5edd6d08ec4466fe6abd0e83b4bfb24f1f5c80fa.1524747108.git.piotr@scylladb.com>
2018-04-26 15:54:41 +03:00
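The class of bug fixed here (a pointer into a buffer outliving the buffer while deferred work still uses it) can be illustrated without Seastar. The sketch below is a plain C++ analogy, not the actual fix: make_sum_task() is a hypothetical helper that keeps the buffer alive by letting the deferred closure own it.

```cpp
#include <functional>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Returns deferred work that reads `buf` through a raw pointer.
// The closure also captures a shared_ptr to the buffer, so the
// storage the pointer refers to stays valid until the work runs.
std::function<long()> make_sum_task(std::vector<int> buf) {
    auto kept = std::make_shared<std::vector<int>>(std::move(buf));
    const int* p = kept->data();
    return [kept, p] {
        // Safe: `kept` owns the storage that `p` points into.
        return std::accumulate(p, p + kept->size(), 0L);
    };
}
```

Capturing only `p` would compile just as well, but the vector would be destroyed when make_sum_task() returns and the later read would be a use-after-free.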
Avi Kivity
5119c1e9c1 Merge "Implement reading simple table from sstable 3.x" from Piotr
"
This patchset prepares everything for supporting both the 2.x and 3.x formats and implements reading a
very simple table with just partition keys from an sstable in 3.x format.

Tests: units (release)
"

* 'haaawk/sstables3/read_only_partitions_v4' of ssh://github.com/scylladb/seastar-dev: (22 commits)
  Test for reading sstable in MC format with no columns
  Use new mp_row_consumer_m and data_consume_rows_context_m
  Introduce mp_row_consumer_m
  Rename mp_row_consumer to mp_row_consumer_k_l
  Introduce consumer_m and data_consume_rows_context_m
  Use read_short_length_bytes in RANGE_TOMBSTONE
  Use read_short_length_bytes in ATOM_START
  Use read_short_length_bytes in ROW_START
  Add continuous_data_consumer::read_short_length_bytes
  Reduce duplication with continuous_data_consumer::read_partial_int
  Add test for a simple table with just partition key
  Add test for reading index
  Extract mp_row_consumer to separate header
  Make sstable_mutation_reader independent from mp_row_consumer
  Make sstable_mutation_reader a template
  Make data_consume_context a template
  Move data_consume_rows_context from row.cc to row.hh
  Decouple sstable.hh and row.hh
  Reduce visibility of sstable::data_consume_*
  Move data_consume_context to separate header
  ...
2018-04-26 14:35:42 +03:00
Botond Dénes
b2d71ed872 install_dependencies.sh: centos: add systemd-devel
This optional dependency is needed to properly integrate with systemd.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <bacd07958531e6541d5b1a4ea885f01491002a7b.1524740540.git.bdenes@scylladb.com>
2018-04-26 14:32:36 +03:00
Piotr Jastrzebski
5c223c13d6 Test for reading sstable in MC format with no columns
Just a simple table with only partition key.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:38 +02:00
Piotr Jastrzebski
6dd7ce2582 Use new mp_row_consumer_m and data_consume_rows_context_m
When the SSTable is in MC format, use these new classes
to be able to read it.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:38 +02:00
Piotr Jastrzebski
9ba64f65e1 Introduce mp_row_consumer_m
This is a version of mp_row_consumer that can
handle SSTables in MC format.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:38 +02:00
Piotr Jastrzebski
4aec023927 Rename mp_row_consumer to mp_row_consumer_k_l
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:38 +02:00
Piotr Jastrzebski
2ee3d8b87b Introduce consumer_m and data_consume_rows_context_m
Those classes can handle SSTables in MC format.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:38 +02:00
Piotr Jastrzebski
b343212073 Use read_short_length_bytes in RANGE_TOMBSTONE
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
90bb7802cc Use read_short_length_bytes in ATOM_START
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
6a81a755ee Use read_short_length_bytes in ROW_START
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
06ceea9c3e Add continuous_data_consumer::read_short_length_bytes
This is a common operation so it's better to have it
implemented in a single place.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
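A rough, synchronous sketch of the operation this helper centralizes, assuming "short length bytes" means a 16-bit big-endian length followed by that many bytes. The function name below is hypothetical, and unlike the real continuous_data_consumer it works on a complete in-memory buffer and does not handle data arriving in pieces.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Reads a 2-byte big-endian length and then that many bytes, starting at
// `pos`. On success advances `pos` past the value; on truncated input
// returns std::nullopt and leaves `pos` unchanged.
std::optional<std::string> read_short_length_bytes_sketch(
        const std::vector<uint8_t>& buf, std::size_t& pos) {
    if (pos > buf.size() || buf.size() - pos < 2) {
        return std::nullopt;
    }
    const std::size_t len = (static_cast<std::size_t>(buf[pos]) << 8) | buf[pos + 1];
    if (buf.size() - pos - 2 < len) {
        return std::nullopt;
    }
    const auto first = buf.begin() + static_cast<std::ptrdiff_t>(pos + 2);
    std::string value(first, first + static_cast<std::ptrdiff_t>(len));
    pos += 2 + len;
    return value;
}
```

Having one such routine is what lets the ROW_START, ATOM_START and RANGE_TOMBSTONE states share the same length-prefixed read instead of each repeating it.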
Piotr Jastrzebski
e664360730 Reduce duplication with continuous_data_consumer::read_partial_int
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
9a3f93a42b Add test for a simple table with just partition key
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
c6d4f49abb Add test for reading index
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
63f0b57365 Extract mp_row_consumer to separate header
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
e5145b87b0 Make sstable_mutation_reader independent from mp_row_consumer
Take consumer as template parameter instead.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
9c93f9f5f4 Make sstable_mutation_reader a template
Take DataConsumeRowsContext type as parameter.
This will allow us to implement different context
for reading 3.x files.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
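The idea of taking the context type as a template parameter can be sketched as below. All names here are hypothetical stand-ins; the real reader and contexts are considerably larger, and the static format() function exists only to make the dispatch visible.

```cpp
#include <string>

// Hypothetical context for the older ka/la formats.
struct context_k_l_sketch {
    static std::string format() { return "ka/la"; }
};

// Hypothetical context for the 3.x ('mc') format.
struct context_m_sketch {
    static std::string format() { return "mc"; }
};

// The reader is written once; the format-specific parsing is supplied
// by the DataConsumeRowsContext template parameter.
template <typename DataConsumeRowsContext>
class mutation_reader_sketch {
public:
    std::string describe() const {
        return "reading " + DataConsumeRowsContext::format();
    }
};
```

Because the context is a compile-time parameter, its definition has to be visible where the reader is instantiated, which is why the preceding commits move it into a header.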
Piotr Jastrzebski
9fad5831df Make data_consume_context a template
Parametrize it with the type of data consume rows context.

There will be different implementations used for different
sstable file formats.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
e2b393df13 Move data_consume_rows_context from row.cc to row.hh
It will be used as a template parameter for sstable_mutation_reader
once it's turned into a template. This means the definition has
to be accessible.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
0e405719e8 Decouple sstable.hh and row.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
bcf5717753 Reduce visibility of sstable::data_consume_*
They are used just in partition.cc, row.cc and sstables_test.cc
so it is useful to cut their scope by moving them
to data_consume_context.hh.

This will make it much easier to turn data_consume_context into
a template.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
578aa6826f Move data_consume_context to separate header
It's used only in row.cc, partition.cc and sstables_test.cc
so it's better to reduce the dependency just to those files.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
a55cec544e mp_row_consumer: stop depending on sstable_mutation_reader
Introduce mp_row_consumer_reader to cut
a cyclic dependency between them.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
0efcc6b33f Fix use-after-free in estimated_histogram parsing
A pointer to buf was used in do_until but buf wasn't
kept around and was destroyed.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:48:02 +02:00
Takuya ASADA
782ebcece4 dist/debian: add --jobs <njobs> option just like build_rpm.sh
On some build environments we may want to limit the number of parallel jobs:
ninja-build runs ncpus jobs by default, which may be too many since g++ uses
a lot of memory.
So support --jobs <njobs>, just like the rpm build script.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180425205439.30053-1-syuu@scylladb.com>
2018-04-26 12:44:06 +03:00
Duarte Nunes
6f9bc28edf Merge 'Collect statistics on updates to memtables' from Vladimir
"
This patchset brings in a statistics collector that tracks minimal
values for timestamps, TTLs and local deletion times for all the updates
made to a given memtable.

These statistics are later used when flushing memtables into SSTables
using 3.x ('mc') format to delta-encode corresponding values using
collected minimums as bases (that is why it is called encoding
statistics).

This patchset is sent out apart from other changes that introduce
writing SSTables 3.x to facilitate read path implementation that also
needs the encoding_stats structure.

The tests for write path implicitly cover this functionality as any rows
written to a SSTable 3.0 file make use of delta-encoding.
"

* 'projects/sstables-30/collect-encoding-statistics-v4' of https://github.com/argenet/scylla:
  Collect encoding statistics for memtable updates.
  Factor out min_tracker and max_tracker as common helpers.
  Always pass mutation_partitions to partition_entry::apply()
2018-04-26 00:39:15 +01:00
Vladimir Krivopalov
948c4d79d3 Collect encoding statistics for memtable updates.
We keep track of all updates and store the minimal values of timestamps,
TTLs and local deletion times across all the inserted data.
These values are written as a part of serialization_header for
Statistics.db and used for delta-encoding values when writing Data.db
file in SSTables 3.0 (mc) format.

For #1969.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-04-25 15:39:14 -07:00
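A minimal sketch of the delta-encoding idea described in this commit: track the minimum, then store each value as its distance from that minimum. The names min_tracker_sketch, delta_encoded and delta_encode_timestamps are hypothetical. In the real code the minimums are collected as updates reach the memtable and the deltas are written in the format's own integer encoding; neither is shown here.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Tracks the smallest value seen so far (illustrative helper).
template <typename T>
class min_tracker_sketch {
    T _value = std::numeric_limits<T>::max();
public:
    void update(T v) { _value = std::min(_value, v); }
    T get() const { return _value; }
};

struct delta_encoded {
    int64_t base;                  // the collected minimum
    std::vector<uint64_t> deltas;  // value - base, always non-negative
};

// Encode timestamps relative to their minimum. For an empty input the
// base is simply the tracker's initial value and there are no deltas.
delta_encoded delta_encode_timestamps(const std::vector<int64_t>& timestamps) {
    min_tracker_sketch<int64_t> min_ts;
    for (int64_t ts : timestamps) {
        min_ts.update(ts);
    }
    delta_encoded out{min_ts.get(), {}};
    out.deltas.reserve(timestamps.size());
    for (int64_t ts : timestamps) {
        // Unsigned subtraction avoids signed overflow for distant values.
        out.deltas.push_back(static_cast<uint64_t>(ts) - static_cast<uint64_t>(out.base));
    }
    return out;
}
```

Timestamps written close together in time produce small deltas, which is what makes using the minimum as the base worthwhile.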
Vladimir Krivopalov
f6f99919da Factor out min_tracker and max_tracker as common helpers.
They will be re-used for collecting encoding statistics which is needed
to write SSTables 3.0.

Part of #1969.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-04-25 14:58:47 -07:00
Vladimir Krivopalov
e1ee833861 Always pass mutation_partitions to partition_entry::apply()
Previously it was also possible to pass a frozen_mutation to it.
Now we de-serialize frozen mutations at the calling side.

This is a pre-requisite for collecting memtable statistics needed for
writing into the SSTables 3.0 format.

For #1969.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-04-25 14:58:47 -07:00
Moreno Garcia
8dde91d03c docker: Create data_dir if it does not exist
When provisioning a Scylla docker image with --developer-mode 0 (disabled)
scylla_raid_setup is not invoked. As a consequence the "data" directory is not
created and scylla_io_setup fails (steps to reproduce and error message provided
at the end).

This patch adds the same verifications present in scylla_io_setup to docker's
scyllasetup.py and creates the data directory in the case it is not present.

--

Steps to reproduce on AWS i3.2xlarge with Ubuntu 16.04:

sudo -s
apt update && apt upgrade -y && apt-get install docker.io -y

mdadm --create --verbose --force --run /dev/md0 --level=0 -c1024 --raid-devices=1 /dev/nvme0n1
mkfs.xfs /dev/md0 -f -K
mkdir /var/lib/scylla
mount -t xfs /dev/md0 /var/lib/scylla

docker run --name some-scylla \
  --volume /var/lib/scylla:/var/lib/scylla \
  -p 9042:9042 -p 7000:7000 -p 7001:7001 -p 7199:7199 \
  -p 9160:9160 -p 9180:9180 -p 10000:10000 \
  -d scylladb/scylla --overprovisioned 1 --developer-mode 0

docker logs some-scylla
  running: (['/usr/lib/scylla/scylla_dev_mode_setup', '--developer-mode', '0'],)
  running: (['/usr/lib/scylla/scylla_io_setup'],)
  terminate called after throwing an instance of 'std::system_error'
    what():  open: No such file or directory
  ERROR:root:/var/lib/scylla/data did not pass validation tests, it may not be on XFS and/or has limited disk space.
  This is a non-supported setup, and performance is expected to be very bad.
  For better performance, placing your data on XFS-formatted directories is required.
  To override this error, enable developer mode as follow:
  sudo /usr/lib/scylla/scylla_dev_mode_setup --developer-mode 1
  failed!
  Traceback (most recent call last):
    File "/docker-entrypoint.py", line 15, in <module>
      setup.io()
    File "/scyllasetup.py", line 34, in io
      self._run(['/usr/lib/scylla/scylla_io_setup'])
    File "/scyllasetup.py", line 23, in _run
      subprocess.check_call(*args, **kwargs)
    File "/usr/lib64/python3.4/subprocess.py", line 558, in check_call
      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '['/usr/lib/scylla/scylla_io_setup']' returned non-zero exit status 1

ls -latr /var/lib/scylla
  total 4
  drwxr-xr-x 44 root root 4096 Abr 24 13:02 ..
  drwxr-xr-x  2 root root    6 Abr 24 13:10 .

Signed-off-by: Moreno Garcia <moreno@scylladb.com>
Message-Id: <20180424173729.22151-1-moreno@scylladb.com>
2018-04-25 17:48:34 +03:00
Calle Wilund
b1edf75c8b types: Make seastar::inet_address the "native" type for CQL inet.
Fixes #3187

Requires seastar "inet_address: Add constructor and conversion function
from/to IPv4"

Implements IPv6 support for CQL inet data. The actual data stored will
now vary between 4 and 16 bytes. gms::inet_address has been augmented
to interop with seastar::inet_address, though of course actually trying
to use an IPv6 address there or in any of its tables will throw badly.

Tests assuming IPv4 were changed. Storing an ipv4_address should be
transparent, as it now "widens". However, since all ipv4 is
inet_address, but not vice versa, there is no implicit overloading on
the read paths. I.e. tests and system_keyspace (where we read ip
addresses from tables explicitly) are modified to use the proper type.
Message-Id: <20180424161817.26316-1-calle@scylladb.com>
2018-04-24 23:12:07 +01:00
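The statement that stored inet data now varies between 4 and 16 bytes can be sketched as follows. This is an illustration under assumed names (ipv4_bytes, ipv6_bytes, inet_sketch, serialize_inet), not the seastar::inet_address API; it requires C++17 for std::variant.

```cpp
#include <array>
#include <cstdint>
#include <variant>
#include <vector>

using ipv4_bytes = std::array<uint8_t, 4>;
using ipv6_bytes = std::array<uint8_t, 16>;

// Hypothetical address type that holds either family.
using inet_sketch = std::variant<ipv4_bytes, ipv6_bytes>;

// The serialized form is just the raw address bytes, so its size
// depends on the address family: 4 bytes for IPv4, 16 for IPv6.
std::vector<uint8_t> serialize_inet(const inet_sketch& addr) {
    return std::visit(
        [](const auto& bytes) {
            return std::vector<uint8_t>(bytes.begin(), bytes.end());
        },
        addr);
}
```

An IPv4 value converts to the wider type implicitly, while going the other way needs an explicit check of which alternative is held; this mirrors the "widens" remark in the commit message.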
Duarte Nunes
9111c6e49a Merge seastar upstream
* seastar 1bb44ac...70aecca (12):
  > Experimental CMake-based build system
  > inet_address: Add constructor and conversion function from/to IPv4
  > tls: Add missing includes and forward declarations to header
  > install_dependencies.sh: fix remaining centos issues
  > rpc: Add missing return when closing client socket
  > install-dependencies.sh: install g++7.3 for centos, instead of g++7.2
  > reactor: fix race between alien queue construction and start
  > Merge "enhance the I/O Scheduler with bandwidth and throughput limits" from Glauber
  > reactor: gracefully exit if exception happens during initialization
  > build: really add alien_test
  > Merge "reactor: add alien::submit_to()" from Kefu
  > queue: do not consume from aborted queue

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-04-24 23:07:13 +01:00
Duarte Nunes
f5eeafe1bf tests/secondary_index_test: Add test for dropping index-backing MV
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180424140745.7144-2-duarte@scylladb.com>
2018-04-24 17:02:59 +01:00
Duarte Nunes
9146de3118 service/migration_manager: Don't drop index-backing MV
Unless dropped by the index itself, forbid dropping an index-backing
MV using `drop materialized view`.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180424140745.7144-1-duarte@scylladb.com>
2018-04-24 17:01:59 +01:00
Nadav Har'El
d674b6f672 secondary index: fix bug in indexing case-sensitive column names
CQL normally folds identifiers such as column names to lowercase. However,
if the column name is quoted, case-sensitive column names and other strange
characters can be used. We had a bug where such columns could be indexed,
but then, when trying to use the index in a SELECT statement, it was not
found.

The existing code remembered the index's column after converting it to CQL
format (adding quotes). But such conversion was unnecessary, and wrong,
because the rest of the code works with bare strings and does not involve
actual CQL statements. So the fix avoids this mistaken conversion.

This patch also includes a test to reproduce this problem.

Fixes #3154.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180424154920.15924-1-nyh@scylladb.com>
2018-04-24 16:57:17 +01:00
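The folding rule this commit message relies on can be sketched as below: an unquoted identifier is folded to lowercase, while a quoted one keeps its case and loses only the surrounding quotes. The function name is hypothetical, and the sketch does not handle doubled quotes inside a quoted identifier.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Convert a CQL identifier, as written in a statement, to the bare
// column name used internally.
std::string cql_identifier_to_bare_name(const std::string& ident) {
    if (ident.size() >= 2 && ident.front() == '"' && ident.back() == '"') {
        // Quoted: case-sensitive, strip the surrounding quotes only.
        return ident.substr(1, ident.size() - 2);
    }
    // Unquoted: fold to lowercase.
    std::string bare = ident;
    std::transform(bare.begin(), bare.end(), bare.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return bare;
}
```

The bug described above amounts to storing the quoted form where the bare form was expected, so a later lookup by bare name could not find the index.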
Piotr Sarna
d323b5cddc tests: add missing case-sensitive JSON tests
This commit complements cql_query_test with case-sensitivity cases
for both SELECT JSON and INSERT JSON statements.
Message-Id: <20bc7df2ec644618727183e09f2352ca5546a9b9.1524576066.git.sarna@scylladb.com>
2018-04-24 16:30:56 +03:00
Piotr Sarna
000ce24306 cql3: solve JSON case-sensitivity issues
This commit fixes two closely related issues with handling
case-sensitive column names in JSON:
 * according to doc, case-sensitive names should be wrapped with
   additional pair of double quotes during JSON SELECT
 * logic error in parse_json() prevented INSERT JSON from working
   properly on case-sensitive column names

This commit is followed by updated cql_query_test, which checks
case-sensitive cases as well.
Message-Id: <82d9d5e193a656e99bc86b297c00662a6fb808a0.1524576066.git.sarna@scylladb.com>
2018-04-24 16:30:55 +03:00
Avi Kivity
13ea1a89b5 Merge "Implement loading sstables in 3.x format" from Piotr
"
Pass sstable version to parse, write and describe_type methods to make it possible to handle different versions.
For now serialization header from 3.x format is ignored.

Tests: units (release)
"

* 'haaawk/sstables3/loading_v4' of ssh://github.com/scylladb/seastar-dev:
  Add test for loading the whole sstable
  Add test for loading statistics
  Add support for 3_x stats metadata
  Pass sstable version to describe_type
  Pass sstable version to write methods
  metadata_type: add Serialization type
  Pass sstable_version_types to parse methods
  Add test for reading filter
  Add test for read_summary
  sstables 3.x: Add test for reading TOC
  sstable: Make component_map version dependent
  sstable::component_type: add operator<<
  Extract sstable::component_type to separate header
  Remove unused sstable::get_shared_components
  sstable_version_types: add mc version
2018-04-24 12:49:41 +03:00
Piotr Jastrzebski
6310fc5f1c Add test for loading the whole sstable
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 11:30:26 +02:00
Piotr Jastrzebski
9e78b6d4c6 Add test for loading statistics
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 11:30:26 +02:00