Commit Graph

15145 Commits

Author SHA1 Message Date
Piotr Jastrzebski
2ee3d8b87b Introduce consumer_m and data_consume_rows_context_m
Those classes can handle SSTables in MC format.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:38 +02:00
Piotr Jastrzebski
b343212073 Use read_short_length_bytes in RANGE_TOMBSTONE
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
90bb7802cc Use read_short_length_bytes in ATOM_START
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
6a81a755ee Use read_short_length_bytes in ROW_START
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
06ceea9c3e Add continuous_data_consumer::read_short_length_bytes
This is a common operation so it's better to have it
implemented in a single place.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
e664360730 Reduce duplication with continuous_data_consumer::read_partial_int
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
9a3f93a42b Add test for a simple table with just partition key
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
c6d4f49abb Add test for reading index
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
63f0b57365 Extract mp_row_consumer to separate header
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
e5145b87b0 Make sstable_mutation_reader independent from mp_row_consumer
Take consumer as template parameter instead.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
9c93f9f5f4 Make sstable_mutation_reader a template
Take DataConsumeRowsContext type as parameter.
This will allow us to implement different context
for reading 3.x files.
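
A minimal sketch of the idea, assuming nothing about the real class layout: the reader takes the parsing context as a template parameter, so a second context for 3.x files can be plugged in without touching the reader. The names `context_2x`, `context_3x` and `reader_sketch` are hypothetical and are not taken from the patch.

```cpp
#include <string>

// Hypothetical stand-ins for the per-format parsing contexts.
struct context_2x {
    static std::string format() { return "2.x"; }
};

struct context_3x {
    static std::string format() { return "3.x"; }
};

// Sketch of a reader that is generic over the context type.
// The real sstable_mutation_reader does far more than this.
template <typename DataConsumeRowsContext>
class reader_sketch {
public:
    std::string describe() const {
        return "reading " + DataConsumeRowsContext::format() + " data";
    }
};
```

With this shape, `reader_sketch<context_2x>` and `reader_sketch<context_3x>` are distinct types that share one reader implementation.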

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
9fad5831df Make data_consume_context a template
Parametrize it with the type of data consume rows context.

There will be different implementations used for different
sstable file formats.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
e2b393df13 Move data_consume_rows_context from row.cc to row.hh
It will be used as a template parameter for sstable_mutation_reader
once it's turned into a template. This means the definition has
to be accessible.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
0e405719e8 Decouple sstable.hh and row.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
bcf5717753 Reduce visibility of sstable::data_consume_*
They are used just in partition.cc, row.cc and sstables_test.cc
so it is useful to cut their scope by moving them
to data_consume_context.hh.

This will make it much easier to turn data_consume_context into
a template.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
578aa6826f Move data_consume_context to separate header
It's used only in row.cc, partition.cc and sstables_test.cc
so it's better to reduce the dependency just to those files.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
a55cec544e mp_row_consumer: stop depending on sstable_mutation_reader
Introduce mp_row_consumer_reader to cut
a cyclic dependency between them.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:49:37 +02:00
Piotr Jastrzebski
0efcc6b33f Fix use-after-free in estimated_histogram parsing
A pointer to buf was used in do_until but buf wasn't
kept around and was destroyed.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-26 12:48:02 +02:00
Piotr Jastrzebski
6310fc5f1c Add test for loading the whole sstable
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 11:30:26 +02:00
Piotr Jastrzebski
9e78b6d4c6 Add test for loading statistics
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 11:30:26 +02:00
Piotr Jastrzebski
df457166b0 Add support for 3_x stats metadata
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 11:30:26 +02:00
Piotr Jastrzebski
e1e23ec555 Pass sstable version to describe_type
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 11:30:26 +02:00
Piotr Jastrzebski
1cc1f9af5f Pass sstable version to write methods
This will allow writing different versions differently

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 11:30:26 +02:00
Piotr Jastrzebski
08da518dae metadata_type: add Serialization type
Ignore it while reading sstable 3_x and throw
if it's present when reading 2_x.
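
The rule described above could look roughly like the sketch below. It is an illustration only: the enums `sstable_version` and `metadata_kind` and the function `should_parse` are hypothetical names, not the identifiers used in the patch.

```cpp
#include <stdexcept>

// Hypothetical enums for illustration.
enum class sstable_version { v2_x, v3_x };
enum class metadata_kind { Validation, Compaction, Stats, Serialization };

// Returns true if the metadata component should be parsed and
// false if it should be skipped. Throws if the Serialization
// component shows up in a 2_x sstable, where it is not expected.
bool should_parse(sstable_version version, metadata_kind kind) {
    if (kind != metadata_kind::Serialization) {
        return true;
    }
    if (version == sstable_version::v3_x) {
        return false; // present in 3_x, ignored while reading
    }
    throw std::runtime_error("Serialization metadata found in a 2_x sstable");
}
```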

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 11:30:26 +02:00
Piotr Jastrzebski
cb84ca8abb Pass sstable_version_types to parse methods
Parsing will depend on the sstable version when
we have support for both 2_x and 3_x formats.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 11:30:26 +02:00
Piotr Jastrzebski
444b468d46 Add test for reading filter
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 11:30:26 +02:00
Piotr Jastrzebski
ff06d2153c Add test for read_summary
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 11:30:26 +02:00
Piotr Jastrzebski
10f9b06145 sstables 3.x: Add test for reading TOC
Make sure DigestCRC32 is handled correctly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 11:30:26 +02:00
Piotr Jastrzebski
561ca34ec2 sstable: Make component_map version dependent
Introduce sstable_version_constants that will be a proxy
serving correct constants depending on the format version.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 11:30:26 +02:00
Piotr Jastrzebski
7aef74c55f sstable::component_type: add operator<<
Make it possible to print out component_type.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 11:30:26 +02:00
Piotr Jastrzebski
d492e92b15 Extract sstable::component_type to separate header
It will be used in other places which won't depend on
sstable.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 11:29:57 +02:00
Piotr Jastrzebski
279b426ee8 Remove unused sstable::get_shared_components
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 09:45:55 +02:00
Piotr Jastrzebski
7248752698 sstable_version_types: add mc version
This is the latest version of 3.x SSTable format.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-04-24 09:45:55 +02:00
Raphael S. Carvalho
11940ca39e sstables: Fix bloom filter size after resharding by properly estimating partition count
We were feeding the total estimated partition count of an input shared
sstable to the output unshared ones.

So the sstable writer thinks, *from estimation*, that each sstable created
by resharding will have the same data amount as the shared sstable they
are being created from. That's a problem because the estimation is fed to
bloom filter creation, which directly influences its size.
So if we're resharding all sstables that belong to all shards, the
disk usage taken by filter components will be multiplied by the number
of shards. That becomes more of a problem with #3302.

Partition count estimation for a shard S will now be done as follows:
    //
    // TE, the total estimated partition count for a shard S, is defined as
    // TE = Sum(i = 0...N) { Ei / Si }.
    //
    // where i is an input sstable that belongs to shard S,
    //       Ei is the estimated partition count for sstable i,
    //       Si is the total number of shards that own sstable i.
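
The formula above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the patch: `input_sstable`, `estimated_partitions`, `owner_shards` and `estimate_partitions_for_shard` are hypothetical names, and integer division is assumed for Ei / Si.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one input sstable that belongs to shard S.
struct input_sstable {
    uint64_t estimated_partitions; // Ei
    unsigned owner_shards;         // Si, assumed to be at least 1
};

// TE = Sum(i = 0...N) { Ei / Si }
uint64_t estimate_partitions_for_shard(const std::vector<input_sstable>& inputs) {
    uint64_t total = 0;
    for (const auto& sst : inputs) {
        // Each owning shard is credited with an equal share of the estimate.
        total += sst.estimated_partitions / sst.owner_shards;
    }
    return total;
}
```

For example, an sstable with an estimate of 1000 partitions shared by 4 shards and another with 600 partitions shared by 2 shards contribute 250 and 300 to the shard's total.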

Fixes #2672.
Refs #3302.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180423151001.9995-1-raphaelsc@scylladb.com>
2018-04-23 18:11:20 +03:00
Avi Kivity
8a8f688dbf Merge "Materialized views: Fixes to update generation" from Duarte
"
Fixes to several issues around view update generation, pertaining to
timestamp and TTL management.

Fixes #3361
Fixes #3360
Fixes #3140
Refs #3362

Tests: unit(release, debug), dtest(materialized_views.py)
"

Reviewed-by: Nadav Har'El <nyh@scylladb.com>

* 'materialized-views/fixes-galore/v2' of http://github.com/duarten/scylla:
  mutation_partition: Clarify comment about emptiness
  tests: Add view_complex_test
  tests/view_schema_test: Complete test
  db/view: Move cells instead of copying in add_cells_to_view()
  db/view: Handle unselected base columns and corner cases
  mutation_partition: Regular base column in view determines row liveness
  db/view: Don't avoid read-before-write when view PK matches base
  db/view: Process base updates to column unselected by its views
  db/view: Consider partition tombstone when generating updates
  tests/view_schema_test: Remove unneeded test
  mutation_fragment: Allow querying if row is live
  view_info: Add view_column() overload
  view_info: Explicitly initialize base-dependent fields
  cql3/alter_table_statement: Forbid dropping columns of MV base tables
2018-04-23 16:49:29 +03:00
Nadav Har'El
1ec5688b0b Materialized Views: fix incorrect limitations on row filtering
This patch fixes several cases where it was disallowed to create
a materialized view with a filter ("where ..."), for no good reason.
After this patch, these cases will be allowed. Fixes #2367.

In ordinary SELECT queries, certain types of filtering which are known to
be deceptively inefficient are not allowed. For example, trying to query
a range of partition keys cannot be done without reading the entire
database (because the murmur3 tokenizer randomizes the order of partitions).
Restricting two partition key components also cannot be done without
reading an excessive amount of the entire partition. So Scylla, following
Cassandra, chooses to disallow such SELECT queries, and give an error
message.

However, the same SELECT statements *should* be allowed when defining a
materialized view. In this case, the filter is just used to check an
individual row - not to search for one - so there is no performance
concern.

Unfortunately the existing code did these validations while building the
SELECT statement's "restrictions", in code shared by both uses of SELECT
(query and MV definition). It was easy to move one of the validations
to later code which runs after the restriction has already been built (and
knows if it is working for query or MV), but because of the way the
"restrictions" objects (translated from Cassandra 2's code) hide what they
contain, many of the checks are harder to perform after having built the
restrictions object. So instead, we add in strategic places in the
restriction-handling code a new "allow_filtering" flag. If restrictions
are built with allow_filtering=true, the extra performance-oriented tests
on the filtering restrictions are not done. Materialized views set
allow_filtering=true.
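
As a rough sketch of the flag's effect, assuming a much simpler model than the real restrictions code: the performance-oriented check rejects a restriction only when filtering is not allowed. `restriction_sketch` and `validate_restriction` are hypothetical names used for illustration.

```cpp
#include <stdexcept>

// Hypothetical, simplified view of a restriction.
struct restriction_sketch {
    bool is_partition_key_range; // e.g. a range over partition keys
};

// With allow_filtering == false (ordinary SELECT), inefficient
// restrictions are rejected. With allow_filtering == true
// (materialized view definition), the check is skipped.
void validate_restriction(const restriction_sketch& r, bool allow_filtering) {
    if (!allow_filtering && r.is_partition_key_range) {
        throw std::invalid_argument("restriction would require filtering");
    }
}
```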

The allow_filtering flag will also be useful later when we want to support
the "ALLOW FILTERING" query option which is currently not supported properly
(we have several open issues on that). However note that this patch doesn't
complete that support: I left a FIXME in the spot where we set
allow_filtering in the Materialized Views case, but in the future we also need
to set it if the user specified "ALLOW FILTERING" in the query.

This patch also enables several unit tests written by Duarte which used to
fail because of this bug, and now pass. These tests verify that the
restrictions are now allowed and filter the view as desired; but I also
added test code to verify that the same restrictions are still forbidden,
as before, when used in ordinary SELECT queries.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Message-Id: <20180423124343.17591-1-nyh@scylladb.com>
2018-04-23 14:08:04 +01:00
Avi Kivity
ff055a291a Merge "Improve "out-of-the-box" build experience on centos" from Botond
"
Make sure install_dependencies.sh installs all the right dependencies
and that the example `configure.py` invocation can just be copy-pasted
into the terminal and will "just work".

Ref: #3208
"

* 'fix_centos_compile/v2' of https://github.com/denesb/scylla:
  install_dependencies.sh: update centos package list and example
  configure.py: add --with-ragel option
  configure.py: add --with-antlr3
  configure.py: check compiler version first
2018-04-23 15:49:27 +03:00
Botond Dénes
bfe741c03d install_dependencies.sh: update centos package list and example
Add missing packages to `yum install` list:
* scylla-boost163-static
* scylla-python34-pyparsing20

Update the configure.py example so that it just works:
* Change g++ to 7.3
* Add --with-antlr3 pointing to antlr3 installed from scylla 3rdparty
2018-04-23 15:46:43 +03:00
Botond Dénes
1efcf215b6 configure.py: add --with-ragel option
To allow the user to select the exact ragel executable they wish to
use.
2018-04-23 15:46:43 +03:00
Botond Dénes
784be9cc43 configure.py: add --with-antlr3
To allow the user to select the exact antlr3 executable they wish to
use.
2018-04-23 15:46:43 +03:00
Botond Dénes
ea8d8f9fbf configure.py: check compiler version first
Before checking anything else (presence of boost, its version, etc.)
check that the compiler is present and can compile and link a simple c++
program.
Previously, if the compiler was not set up correctly, configure.py would fail
at one of the other try_compile checks, whichever came first (usually
the one checking for boost). This led the user into chasing some
false-positive error when in fact the compiler wasn't working.
2018-04-23 15:46:43 +03:00
Takuya ASADA
7b92c3fd3f dist: Drop AmbientCapabilities from scylla-server.service for Debian 8
Debian 8 reports "Invalid argument" when AmbientCapabilities is used in the
systemd unit file, so drop the line when we build the .deb package for Debian 8.
For other distributions, keep using the feature.

Fixes #3344

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180423102041.2138-1-syuu@scylladb.com>
2018-04-23 13:27:14 +03:00
Avi Kivity
269207fdf6 Merge "Introducing INSERT JSON and fromJson to CQL3" from Piotr
"
This series complements JSON support with INSERT JSON and fromJson
cql function.

INSERT JSON implementation tries hard to interfere as little as possible
with regular INSERT path. So, after being parsed, insertJsonStatement
exists as a separate statement and is handled in a special way.
Overridden add_update_for_key extracts values from JSON map and applies
them to columns.

Converting from insert_json_statement to insert_statement uses auxiliary
from_json_object methods to convert JSON-encoded types to bytes.
Then, terms are matched to appropriate column names and cells are
updated.

fromJson CQL function uses the same from_json_object helper methods,
but applies them to single arguments, not whole rows.

Existing json handling functions from json.hh and libjsoncpp were used
where possible.

Things implemented:
 * expanding CQL grammar to accept INSERT JSON
 * converting JSON representation of cql values to cql terms
 * serving 'INSERT INTO xxx JSON yyy' clause
 * tests for INSERT JSON and fromJson()
"

* 'json_ops_2' of https://github.com/psarna/scylla:
  tests: add cql unit tests for INSERT JSON
  cql3: add fromJson() function
  cql3: add INSERT JSON parsing to CQL grammar
  cql3: add support for INSERT JSON clause
  cql3: decouple execute from term binding in setters
  cql3: change operation::make_* functions to static
  cql3: add from_json_object function to types
  cql3: Make literals::NULL_VALUE public
2018-04-23 13:19:54 +03:00
Piotr Sarna
97e89f2efb tests: add cql unit tests for INSERT JSON
This commit adds tests for INSERT JSON clause, which is expected
to accept JSON strings and insert appropriate values to columns
defined there.
The tests also cover fromJson function calls and inserting prepared
batch statements with INSERT JSON inside.

References #2058
2018-04-23 12:00:57 +02:00
Piotr Sarna
cd76a01747 cql3: add fromJson() function
This commit extends JSON support with the fromJson() function,
which can be used in UPDATE clause to transform JSON value
into a value with proper CQL type.

fromJson() accepts strings and may return any type, so its instances,
like toJson(), are generated during calls.

This commit also extends functions::get() with additional
'receiver' parameter. This parameter is used to extract receiver type
information needed to generate proper fromJson instance.
Receiver is known only during insert/update, so functions::get() also
accepts a nullptr if receiver is not known (e.g. during selection).

References #2058
2018-04-23 12:00:57 +02:00
Piotr Sarna
9dd34bf34d cql3: add INSERT JSON parsing to CQL grammar
This commit makes it possible to parse INSERT JSON statement
in CQL grammar, so it's available via cqlsh.

References #2058
2018-04-23 12:00:57 +02:00
Piotr Sarna
cdcbf654a8 cql3: add support for INSERT JSON clause
This commit adds the implementation of INSERT JSON clause
which accepts JSON object as parameter and inserts appropriate
values into appropriate columns, as defined in given JSON.

Example:
INSERT INTO testme JSON '{
  "id" : 77,
  "name" : "Jones",
  "ranking" : 8.5
}'

References #2058
2018-04-23 12:00:57 +02:00
Piotr Sarna
bfe3c20035 cql3: decouple execute from term binding in setters
This commit makes it possible to pass values to setters,
instead of having to pass cql3::term instances.
Thanks to that previously prepared terminals can be directly
used in a setter execution.

References #2058
2018-04-23 12:00:56 +02:00
Piotr Sarna
2b729a10bc cql3: change operation::make_* functions to static
This commit makes operation::make* functions static, because they
don't access any instance-specific data anyway. It is later needed
to decouple setter execution from binding a cql3::term.
2018-04-23 12:00:56 +02:00
Piotr Sarna
1d40d2186e cql3: add from_json_object function to types
This commit adds a 'from_json_object' method which will be used
for converting JSON representation of a value to raw bytes representing
the same value. This functionality will be needed by 'INSERT JSON'
clause implementation, which can turn these raw bytes into cql3::term.

References #2058
2018-04-23 12:00:56 +02:00