18610 Commits

Author SHA1 Message Date
Dejan Mircevski
f9b00a4318 cql: Fix mixed selection with GROUP BY
GROUP BY is currently supported by simple_selection, the class used
when all selectors are simple.  But when selectors are mixed, we use
selection_with_processing, which does not yet support GROUP BY.  This
patch fixes that.

It also adapts one testcase in filtering_test to the new behavior of
simple_selector.  The test currently expects the last value seen, but
simple_selector now outputs the first value seen.

(More details: the WHERE clause implicitly selects the columns it
references, and unit tests are forced to provide expected values for
these columns.  The user-visible result is unchanged in the test;
users never see the WHERE column values due to filtering in
cql::transport, outside unit tests.)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-14 12:50:39 -04:00
Dejan Mircevski
06e3b36164 cql: Allow mixing of aggregate and simple selectors
Scylla currently rejects SELECT statements with both simple and
aggregate selectors, but Cassandra allows them.  This patch brings
parity to Scylla.

Fixes #4447.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-14 10:34:02 -04:00
Glauber Costa
a23531ebd5 Support AWS i3en instances
AWS just released its new i3en instances.  The instance type is already
verified to work well with Scylla; the only adjustments we need are to
advertise that we support it and to pre-fill the disk information
according to the performance numbers obtained by running the instance.

Fixes #4486
Branches: 3.1

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190508170831.6003-1-glauber@scylladb.com>
2019-05-08 20:09:44 +03:00
Avi Kivity
a86fdeb02b Merge "Implement GROUP BY" from Dejan
"
Cassandra has supported GROUP BY in SELECT statements since 2016
(v3.10), while ScyllaDB currently treats it as a syntax error.  To
achieve parity with Cassandra in this important bit of functionality,
this patch adds full support for GROUP BY, from parsing to validation
to implementation to testing.
"

* 'groupby-implPP' of https://github.com/dekimir/scylla:
  Implement grouping in selection processing
  Propagate GROUP BY indices to result_set_builder
  Process GROUP BY columns into select_statement
  Parse GROUP BY clause, store column identifiers
2019-05-08 18:35:12 +03:00
Dejan Mircevski
d51e4a589d Implement grouping in selection processing
Make result_set_builder obey its _group_by_cell_indices by recognizing
group boundaries and resetting the selectors.

Also make simple_selectors work correctly when grouping.
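
As a rough illustration of recognizing group boundaries and resetting
selectors, here is a minimal Python sketch.  It is not the
result_set_builder code: the function name group_rows, the row layout,
and the use of a row count as the aggregate are assumptions made for
the example.  For the simple selector it keeps the first value seen in
each group, matching the behavior described in commit f9b00a4318.

```python
def group_rows(rows, group_by_indices, simple_index):
    """Return (first simple value, row count) for each run of equal group keys.

    Hypothetical helper: rows are tuples, and rows with equal group keys
    are assumed to be adjacent, as they would be in clustering order.
    """
    results = []
    current_key = None
    first_value = None
    count = 0
    for row in rows:
        key = tuple(row[i] for i in group_by_indices)
        if count > 0 and key != current_key:
            # Group boundary: emit the finished group and reset the selectors.
            results.append((first_value, count))
            count = 0
        if count == 0:
            current_key = key
            first_value = row[simple_index]  # first value seen in this group
        count += 1
    if count > 0:
        results.append((first_value, count))
    return results


rows = [("a", 1, "x"), ("a", 2, "y"), ("b", 3, "z")]
print(group_rows(rows, [0], 2))
```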

Fixes #2206.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-08 11:05:36 -04:00
Dejan Mircevski
c3929aee3a Propagate GROUP BY indices to result_set_builder
Ensure that the indices recorded in select_statement are passed to
result_set_builder when one is created for processing the cell values.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-08 10:10:10 -04:00
Dejan Mircevski
274a77f45e Process GROUP BY columns into select_statement
Validate raw GROUP BY identifiers and translate them into
a select_statement member.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-08 10:10:10 -04:00
Dejan Mircevski
e1fb414805 Parse GROUP BY clause, store column identifiers
Extend the grammar file with GROUP BY, collect the column identifiers,
and store them in raw::select_statement.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-08 10:09:22 -04:00
Avi Kivity
ab3f044daa Revert "Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz"
This reverts commit dcb263b36b, reversing
changes made to a6759dc6aa. schema_change_test
fails consistently on master with it.
2019-05-08 16:19:38 +03:00
JP-Reddy
56420dc650 scylla_io_setup: TypeError in iotune_args array from scylla_io_setup script
Whenever the iotune_args array uses "--smp", it needs cpudata.smp(),
which returns an integer instead of a string. So when iotune_args is
passed to subprocess.check_call(), it actually throws "TypeError:
expected str, bytes or os.PathLike object, not int", but
"%s did not pass validation tests, it may not be on XFS..." is shown as
the exception.

So even when the user passes correct arguments, the script may still
fail and mislead the user into thinking the arguments were wrong.

One simple fix is to use str(cpudata.smp()) instead of cpudata.smp().
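
A minimal sketch of the idea, assuming nothing about the real script
beyond what is described above: build_args is a hypothetical stand-in
for the code that builds iotune_args, and the child process is just the
Python interpreter echoing its arguments.  Every element of the argument
list is converted to a string before it reaches subprocess.

```python
import subprocess
import sys


def build_args(smp_count):
    # Hypothetical stand-in for iotune_args.  The integer is converted
    # with str(), because subprocess expects every argument to be str,
    # bytes or os.PathLike.
    return [
        sys.executable, "-c", "import sys; print(sys.argv[1:])",
        "--smp", str(smp_count),
    ]


args = build_args(4)
output = subprocess.check_output(args, text=True)
print(output.strip())
```

Passing smp_count without str() would make the subprocess call itself
raise the TypeError quoted above, before any child process runs.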

Signed-off-by: JP-Reddy <guthijp.reddy@gmail.com>
Message-Id: <20190406070118.48477-1-guthijp.reddy@gmail.com>
2019-05-07 20:13:54 +03:00
Paweł Dziepak
8a16cbc50d Merge "treewide: adjust for gcc 9" from Avi
"
gcc 9 complains a lot about pessimizing moves, narrowing conversions, and
has tighter deduction rules, plus other nice warnings. Fix problems found
by it, and make some non-problems compile without warnings.
"

* tag 'gcc9/v1' of https://github.com/avikivity/scylla:
  types: fix pessimizing moves
  thrift: fix pessimizing moves
  tests: fix pessimizing moves
  tests: cql_query_test: silence narrowing conversion warning
  test: cql_auth_syntax_test: fix ambiguity due to parser uninitialized<T>
  table: fix potentially wrong schema when reading from zero sstables
  storage_proxy: fix pessimizing moves
  memtable: fix pessimizing moves
  IDL: silence narrowing conversion in bool serializer
  compaction: fix pessimizing moves
  cache: fix pessimizing moves
  locator: fix pessimizing moves
  database: fix pessimizing moves
  cql: fix pessimizing moves
  cql parser: fix conversion from uninitialized<T> to optional<T> with gcc 9
2019-05-07 12:19:29 +01:00
Avi Kivity
43867fe618 types: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 10:01:36 +03:00
Avi Kivity
1b760297f5 thrift: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 10:01:15 +03:00
Avi Kivity
0ff6e48e77 tests: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 10:00:58 +03:00
Avi Kivity
b60d58d6bd tests: cql_query_test: silence narrowing conversion warning
Make it explicit to gcc 9 that the conversion to bool is intended.
2019-05-07 09:59:44 +03:00
Avi Kivity
5636b621a7 test: cql_auth_syntax_test: fix ambiguity due to parser uninitialized<T>
gcc 9 is unable to decide whether to call role_name's copy or move
constructor. Help it by casting.
2019-05-07 09:58:21 +03:00
Avi Kivity
add20eb9a6 table: fix potentially wrong schema when reading from zero sstables
We use the schema during creation of the mutation_source rather than
during the query itself. Likely they're the same, and since no rows
are returned from a zero-sstable query, harmless. But gcc 9 complains.

Fix by using the query's schema.
2019-05-07 09:56:30 +03:00
Avi Kivity
985a30a01c storage_proxy: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:56:09 +03:00
Avi Kivity
fd3c493961 memtable: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:55:53 +03:00
Avi Kivity
17c268cd55 IDL: silence narrowing conversion in bool serializer
bool serializers are now aliases to int8_t serializers, but gcc 9
complains about narrowing conversions, due to the path int8_t -> int -> bool.

A bad narrowing conversion here cannot happen in practice, but massage
the code a little to silence it.
2019-05-07 09:28:24 +03:00
Avi Kivity
d7cbd3dc61 compaction: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:28:12 +03:00
Avi Kivity
9c7eb95f78 cache: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:27:50 +03:00
Avi Kivity
c42d59d805 locator: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:27:27 +03:00
Avi Kivity
96a0073929 database: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:26:58 +03:00
Avi Kivity
03e9cdbfb0 cql: fix pessimizing moves
Remove pessimizing moves, as reported by gcc 9.
2019-05-07 09:26:20 +03:00
Avi Kivity
c26ec176dd cql parser: fix conversion from uninitialized<T> to optional<T> with gcc 9
We use uninitialized<T> (wrapping an optional<T>) to adjust to the
parser's way of laying out the code, but this fails with gcc 9
(presumably for the correct reasons) when converting from
uninitialized<T> back to optional<T>. Add a conversion operator
to make it build.
2019-05-07 09:21:22 +03:00
Dejan Mircevski
0ea6df2cd1 tests: Add predicates for checking exception messages
Many tests verify exception messages.  Currently, they do so via
verbose lambdas or inner functions that hide test-failure locations.
This patch adds utilities for quick creation of message-checking tests
and replaces existing ad-hoc methods with these new utilities.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <20190506210006.124645-1-dejan@scylladb.com>
2019-05-07 07:11:07 +03:00
Avi Kivity
dcb263b36b Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz
"
Commit d0f9e00 changed the representation of the gc_clock::duration
from int32_t to int64_t.

Mutation hashing uses appending_hash<gc_clock::time_point>, which by
default feeds duration::count() into the hasher. duration::rep changed
from int32_t to int64_t, which changes the value of the hash.

This affects schema digest and query digests, resulting in mismatches
between nodes during a rolling upgrade.

Fixes #4460.

Branches: 3.1
"

* tag 'fix-gc_clock-digest-v1' of github.com:tgrabiec/scylla:
  tests: Add test which verifies that schema digest stays the same
  tests: Add sstables for the schema digest test
  gc_clock: Fix hashing to be backwards-compatible
2019-05-07 07:04:40 +03:00
Tomasz Grabiec
8019634dba tests: Add test which verifies that schema digest stays the same
2019-05-06 18:43:43 +02:00
Tomasz Grabiec
1f2995c8c5 tests: Add sstables for the schema digest test
Generated by running test_schema_digest_does_not_change with
regenerate set to true.
2019-05-06 18:43:43 +02:00
Tomasz Grabiec
549d0eb2f3 gc_clock: Fix hashing to be backwards-compatible
Commit d0f9e00 changed the representation of the gc_clock::duration
from int32_t to int64_t.

Mutation hashing uses appending_hash<gc_clock::time_point>, which by
default feeds duration::count() into the hasher. duration::rep changed
from int32_t to int64_t, which changes the value of the hash.

This affects schema digest and query digests, resulting in mismatches
between nodes during a rolling upgrade.
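
The effect can be sketched in Python.  This is only an illustration:
MD5, the little-endian byte layout, and the helper names are
assumptions, not the hasher or encoding Scylla uses.  It shows that
feeding the same count as 8 bytes instead of 4 changes the digest, and
that narrowing back to the old width is one way (not necessarily the
one taken by this commit) to reproduce the old value.

```python
import hashlib
import struct


def digest_of(count, fmt):
    # fmt is a struct format: "<i" is a 32-bit int, "<q" a 64-bit int.
    return hashlib.md5(struct.pack(fmt, count)).hexdigest()


def compatible_digest(count):
    # Assumed approach: always feed the old 32-bit representation.
    return digest_of(count, "<i")


seconds = 1557000000          # illustrative value that fits in int32_t
old = digest_of(seconds, "<i")  # what nodes hashed before the change
new = digest_of(seconds, "<q")  # what the wider rep would hash by default
print(old != new, compatible_digest(seconds) == old)
```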

Fixes #4460.
2019-05-06 18:43:43 +02:00
Avi Kivity
a6759dc6aa Update seastar submodule
* seastar 4cdccae...f73690e (16):
  > sstring: silence technically correct but unhelpful warning in sstring move ctor
  > cmake: add a seastar_supports_flag function
  > future: Fix build with libc++'s non-trivially-constructible  std::tuple<>
  > Revert "Make sure all allocations are properly bytes aligned"
  > Merge "future: simplify future_state management" from Rafael
  > Make sure all allocations are properly bytes aligned
  > util/log: use correct clock type
  > core/reactor: don't assume system_clock::duration is in nanoseconds
  > Merge "Optimize the future_state move constructor" from Rafael
  > rpc: don't use boost/variant.hpp directly
  > core/memory: Omit [[gnu::leaf]] attribute on clang
  > Fix build with std::filesystem
  > Merge "Fix clang build and tests" from Rafael
  > cmake: Move ) out of quotes
  > Merge "Fix some bugs found by (or perhaps in) gcc 9" by Avi
  > Deduplicate Seastar dependencies management in CMake scripts
2019-05-06 19:17:37 +03:00
Gleb Natapov
1d851a3892 messaging: catch an error that sending of CLIENT_ID may return
Avoid a warning about unhandled exception.

Message-Id: <20190506122718.GL21208@scylladb.com>
2019-05-06 18:13:51 +03:00
Glauber Costa
79a5351651 scylla-housekeeping: timeout eventually
scylla-housekeeping always wants to run in the installation to check if
we are running the latest version. This happens regardless of whether or
not we said yes or no to the housekeeping scylla_setup question - as
that question only deals with whether or not we want to do this through
a timer.

It is fine to try to run scylla-housekeeping, as long as we time it out.
The current code doesn't.

The naive solution is to add a timeout parameter to urllib.request.urlopen.
However, that timeout is not respected, and in my tests I saw real
timeouts up to four times higher than the timeout we set. For a reasonable
5s timeout, this means a 20s real timeout, which can lead to a very bad
user experience. This seems to be a known problem with this module
according to a quick Google search.

This patch then takes a slightly more complex solution and uses
multiprocess to enforce a well-defined user-visible timeout.
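
A minimal sketch of enforcing a hard timeout with a child process,
assuming Python's standard multiprocessing module.  run_with_timeout is
a hypothetical helper, not the code in scylla-housekeeping, and the
"fork" start method is assumed to be available (as it is on Linux).

```python
import multiprocessing
import time


def run_with_timeout(target, args=(), timeout=5.0):
    """Run target(*args) in a child process; return True if it finished in time."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX host
    proc = ctx.Process(target=target, args=args)
    proc.start()
    proc.join(timeout)          # wait at most `timeout` seconds
    if proc.is_alive():
        proc.terminate()        # give up: the caller sees a bounded delay
        proc.join()
        return False
    return proc.exitcode == 0


if __name__ == "__main__":
    print(run_with_timeout(time.sleep, (0.01,), timeout=2.0))
    print(run_with_timeout(time.sleep, (30,), timeout=0.2))
```

The point of the design is that the deadline is enforced by the parent,
so it holds even if the network call inside the child ignores its own
timeout.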

Fixes #3980

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190506122335.5707-1-glauber@scylladb.com>
2019-05-06 17:37:59 +03:00
Gleb Natapov
b8188e1e2f storage_proxy: avoid copying of a topology and endpoint array in batchlog code
The batchlog code makes copies of the topology and the endpoint array in
its endpoint-choosing code. There is a remark that at least the endpoint
copy is deliberate because the Cassandra code has it. We do not have to
follow. Our endpoint calculation code is atomic, so we can use a reference.

Message-Id: <20190506115815.GK21208@scylladb.com>
2019-05-06 17:36:50 +03:00
Raphael S. Carvalho
ef5681486f compaction: do not unconditionally delete a new sstable in interrupted compaction
With incremental compaction, new sstables may have already replaced old
sstables at any point. This means that a new sstable may be in use by the
table, and an old sstable already deleted, while the compaction itself is
still UNFINISHED.
Therefore, we should *NEVER* delete a new sstable unconditionally for an
interrupted compaction, or data loss could happen.
To fix it, we'll only delete new sstables that didn't replace anything
in the table, meaning they are unused.

Found the problem while auditing the code.

Fixes #4479.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190506134723.16639-1-raphaelsc@scylladb.com>
2019-05-06 16:55:36 +03:00
Avi Kivity
1c65ba6e66 Use correct scylla_tables schema for removing version column
Mutations carry their schema, so use that instead of bringing in a global
schema, which may change as features are added.
Message-Id: <20190505132542.6472-1-avi@scylladb.com>
2019-05-06 13:51:08 +02:00
Paweł Dziepak
51e98e0e11 tests/perf_fast_forward: report average number of aio operations
perf_fast_forward is used to detect performance regressions. The two
main metrics used for this are fragments per second and the number of
IO operations. The former is the median of several runs, but the
latter is just the actual number of asynchronous IO operations performed
in the run that happened to be picked as the median frag/s-wise. There
is not always a direct correlation between frag/s and aio, and the aio
count can vary between runs, which makes it hard to compare.

In order to make this easier a new metric was introduced: "average aio"
which reports the average number of asynchronous IO operations performed
in a run. This should produce much more stable results and therefore
make the comparison more meaningful.
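
The difference between the two aio figures can be sketched as follows.
The numbers and field names are made up for illustration and do not
come from perf_fast_forward; with an even number of runs this sketch
simply picks the upper of the two middle runs.

```python
import statistics

# Hypothetical results of three runs of the same test case.
runs = [
    {"frags_per_sec": 1000.0, "aio": 120},
    {"frags_per_sec": 1010.0, "aio": 95},
    {"frags_per_sec": 990.0, "aio": 130},
]


def summarize(runs):
    ordered = sorted(runs, key=lambda r: r["frags_per_sec"])
    median_run = ordered[len(ordered) // 2]
    return {
        "median_frags_per_sec": median_run["frags_per_sec"],
        # aio of whichever run happened to be the median one: noisy
        "aio_of_median_run": median_run["aio"],
        # aio averaged over all runs: more stable
        "average_aio": statistics.mean(r["aio"] for r in runs),
    }


summary = summarize(runs)
print(summary)
```
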
Message-Id: <20190430134401.19238-1-pdziepak@scylladb.com>
2019-05-06 11:47:31 +02:00
Piotr Sarna
cf8d2a5141 Revert "view: cache is_index for view pointer"
This reverts commit dbe8491655.
Caching the value was not done in a correct manner, which resulted
in longevity tests failures.

Fixes #4478

Branches: 3.1

Message-Id: <762ca9db618ca2ed7702372fbafe8ecd193dcf4d.1557129652.git.sarna@scylladb.com>
2019-05-06 11:45:46 +03:00
Benny Halevy
d9136f96f3 commitlog: descriptor: skip leading path from filename
std::regex_match of the leading path may run out of stack
with long paths in debug build.

Use rfind instead to look up the last '/' in the pathname
and skip everything up to it if found.
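
The same idea in Python, as a small sketch: str.rfind plays the role of
the C++ rfind here, and skip_leading_path and the sample paths are
hypothetical, not taken from the commitlog code.

```python
def skip_leading_path(pathname):
    """Return the part of pathname after the last '/', or all of it if none."""
    pos = pathname.rfind("/")      # -1 when there is no '/'
    if pos == -1:
        return pathname
    return pathname[pos + 1:]


print(skip_leading_path("/some/long/dir/segment-1.log"))
print(skip_leading_path("segment-1.log"))
```

Unlike a regular expression over the whole path, this is a single
backwards scan and uses no recursion, so path length does not affect
stack usage.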

Fixes #4464

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190505144133.4333-1-bhalevy@scylladb.com>
2019-05-05 17:51:56 +03:00
Benny Halevy
3a2fa82d6e time_window_backlog_tracker: fix use after free
Fixes #4465

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190430094209.13958-1-bhalevy@scylladb.com>
2019-05-05 12:47:51 +03:00
Glauber Costa
47d04e49e8 scylla_setup: respect user's decision not to call housekeeping
The setup script asks the user whether or not housekeeping should
be called, and the first time the script is executed this decision
is respected.

However if the script is invoked again, that decision is not respected.

This is because the check has the form:

 if (housekeeping_cfg_file_exists) {
    version_check = ask_user();
 }
 if (version_check) { do_version_check() } else { dont_do_it() }

When it should have the form:

 if (housekeeping_cfg_file_exists) {
    version_check = ask_user();
    if (version_check) { do_version_check() } else { dont_do_it() }
 }

(Thanks python)

This is problematic in systems that are not connected to the internet, since
housekeeping will fail to run and crash the setup script.

Fixes #4462

Branches: master, branch-3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190502034211.18435-1-glauber@scylladb.com>
2019-05-02 18:46:41 +03:00
Glauber Costa
99c00547ad make scylla_util OS detection robust against empty lines
Newer versions of RHEL ship the os-release file with empty lines at the
end, which our script was not prepared to handle. As such, scylla_setup
would fail.

This patch makes our OS detection robust against that.
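
A minimal sketch of tolerant os-release parsing.  parse_os_release and
the sample text are hypothetical and are not the scylla_util code; the
sketch only relies on the file's KEY=value line format.

```python
def parse_os_release(text):
    """Parse os-release style text into a dict, ignoring blank lines."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip empty lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key] = value.strip('"')
    return info


sample = 'NAME="Example Linux"\nID=example\nVERSION_ID="8.0"\n\n\n'
print(parse_os_release(sample))
```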

Fixes #4473

Branches: master, branch-3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190502152224.31307-1-glauber@scylladb.com>
2019-05-02 18:33:35 +03:00
Paweł Dziepak
cf451f0e62 Merge "gdb: Fixes and improvements to memory analysis" from Tomasz
"
One of the fixes is for incorrect recognition of memory pages as belonging
or not belonging to small allocation pools in some cases.

Also compensates for https://github.com/scylladb/seastar/issues/608 in "scylla memory",
which improves the accuracy of the small allocation pool report.

Fixes "scylla task_histogram" to not look into pages which do not belong to live
small allocation pool spans.

Fixes #4367
Fixes #4368
"

* tag 'gdb-fix-span-qualification-v2' of github.com:tgrabiec/scylla:
  gdb: Print size of large allocations in 'scylla ptr'
  gdb: Fix 'scylla ptr' for free pages
  gdb: Set is_live and offset for large allocations properly in 'scylla ptr'
  gdb: Fix 'scylla ptr' misqualifying pointers
  gdb: Make 'scylla memory' show unused memory in small pools
  gdb: Fix small pool memory usage reporting in 'scylla memory'
  gdb: Switch 'scylla memory' to use the span_checker to find large spans
  gdb: Switch task_histogram to use the span_checker
  gdb: Introduce span_checker
2019-05-02 14:25:30 +01:00
Gleb Natapov
95c6d19f6c batchlog_manager: fix array out of bound access
endpoint_filter() function assumes that each bucket of
std::unordered_multimap contains elements with the same key only, so
its size can be used to know how many elements with a particular key
are there.  But this is not the case: elements with different keys may
share a bucket. Fix it by counting keys in another way.
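
Python has no direct equivalent of std::unordered_multimap buckets, so
the toy model below assigns keys to buckets with a modulo.  The bucket
count, keys, and values are assumptions chosen so that two different
keys collide.  It shows why a bucket's size overstates the number of
elements for a key, while counting per key gives the right answer.

```python
from collections import Counter

BUCKET_COUNT = 4  # assumed, small enough to force a collision

# (key, value) pairs as a multimap would store them; keys 1 and 5 collide.
entries = [(1, "a"), (5, "b"), (5, "c")]


def bucket_of(key):
    return key % BUCKET_COUNT


bucket_sizes = Counter(bucket_of(key) for key, _ in entries)
key_counts = Counter(key for key, _ in entries)

print("bucket holding key 1 has", bucket_sizes[bucket_of(1)], "elements")
print("key 1 has", key_counts[1], "element(s)")
```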

Fixes #3229

Message-Id: <20190501133127.GE21208@scylladb.com>
2019-05-01 17:30:11 +03:00
Nadav Har'El
2710f382de secondary index: expand test of secondary-index and UPDATE requests
The existing unit test test_secondary_index_contains_virtual_columns
reproduced a bug (issue #4144) with indexing of primary-key columns,
but we only actually tested clustering columns. In issue #4471 there
was a question whether we may still have a bug when indexing
*partition-key* columns. This patch adds a test that verifies that
we don't, and that this case works well too.

Refs #4144
Refs #4471

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190501113500.25900-1-nyh@scylladb.com>
2019-05-01 12:53:23 +01:00
Nadav Har'El
a45b6e41a0 materialized views and secondary index: sometimes allow dropping base columns
Until this patch, dropping columns from a table was completely forbidden
if this table has any materialized views or secondary indexes. However,
this is excessively harsh, and not compatible with Cassandra which does
allow dropping columns from a base table which has a secondary index on
*other* columns. This incompatibility was raised in the following
Stackoverflow question:
https://stackoverflow.com/questions/55757273/error-while-dropping-column-from-a-table-with-secondary-index-scylladb/55776490

In this patch, we allow dropping a base table column if none of its
materialized views *needs* this column. Columns selected by a view
(as regular or key columns) are needed by it, of course, but when
virtual columns are used (namely, there is a view with same key columns
as the base), *all* columns are needed by the view, so unfortunately none
of the columns may be dropped.

After this patch, when a base-table column cannot be dropped because one
of the materialized views needs it, the error message will look like:

   exceptions::invalid_request_exception: Cannot drop column a from base
   table ks.cf: a materialized view cf_a_idx_index needs this column.
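
The rule can be sketched in Python as follows.  The View class, its
fields, and check_can_drop are hypothetical names for illustration, and
ValueError stands in for the real exception type; only the rule itself
and the message wording come from the description above.

```python
from dataclasses import dataclass


@dataclass
class View:
    name: str
    selected_columns: frozenset
    has_virtual_columns: bool = False


def check_can_drop(table, column, views):
    """Raise if any view needs `column`; otherwise return None."""
    for view in views:
        # A view with virtual columns needs *all* base columns.
        if view.has_virtual_columns or column in view.selected_columns:
            raise ValueError(
                f"Cannot drop column {column} from base table {table}: "
                f"a materialized view {view.name} needs this column."
            )


views = [View("cf_a_idx_index", frozenset({"p", "a"}))]
check_can_drop("ks.cf", "b", views)        # allowed: no view selects b
try:
    check_can_drop("ks.cf", "a", views)    # rejected: the index needs a
except ValueError as error:
    print(error)
```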

This patch also includes extensive testing for the cases where dropping
columns is now allowed, and where it is not. The secondary-index tests
are especially interesting, because they demonstrate that now usually
(when a non-key column is being indexed) dropping columns will be allowed,
which is what originally bothered the Stackoverflow user.

Fixes #4448.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190429214805.2972-1-nyh@scylladb.com>
2019-04-30 12:13:10 +01:00
Nadav Har'El
92d5f61ba5 cql: support single-value IN restriction wherever EQ restriction is supported
There are several places where IN restrictions are not currently supported,
especially in queries involving a secondary index. However, when the IN
restriction has just a single value, it is nothing more than an equality
restriction and can be converted into one and be supported. So this patch
does exactly this.

Note that Cassandra has done this conversion since August 2016, and
therefore supports the special case of single-value IN even where general
IN is not supported. So it's important for Cassandra compatibility that we
do this conversion too.

This patch also includes a test with two queries involving a secondary
index that were previously disallowed because of the "IN" on the primary
key or the indexed column - and are now allowed when the IN restriction
has just a single value. A third query tested is not related to secondary
indexes, but confirms we don't break multi-column single-value IN queries.
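
The conversion can be sketched in Python.  The tuple representation of
a restriction and the name simplify_restriction are assumptions made
for the example and do not reflect Scylla's restriction classes.

```python
def simplify_restriction(restriction):
    """Turn a single-value IN into an equality; leave everything else alone.

    A restriction is modelled as (operator, column, operand), where the
    operand of "IN" is a list of values.
    """
    op, column, operand = restriction
    if op == "IN" and len(operand) == 1:
        return ("EQ", column, operand[0])
    return restriction


print(simplify_restriction(("IN", "p", [1])))
print(simplify_restriction(("IN", "p", [1, 2])))
```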

Fixes #4455.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190428160317.23328-1-nyh@scylladb.com>
2019-04-30 12:13:06 +01:00
Tomasz Grabiec
1adcb3637e Merge "multishard reader: fix handling of non strictly monotonous positions" from Botond
The shard readers of the multishard reader assumed that the positions in
the data stream are strictly monotonous. This assumption is invalid.
Range tombstones can have positions that they can share with other range
tombstones and/or a clustering row. The effect of this false assumption
was that when the shard reader was evicted such that the last seen
fragment was a range tombstone, when recreated it would skip any unseen
fragments that have the same position as that of the last seen range
tombstone.
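
A toy model of the problem in Python.  The fragment stream, the integer
positions, and both helper functions are assumptions for illustration.
resume_tracking is only one conceivable way to avoid the loss, and it
assumes fragments at the same position come back in the same order; it
is not claimed to be what this series does.

```python
# (position, fragment) pairs; three fragments share position 2.
stream = [(1, "row1"), (2, "rt_a"), (2, "rt_b"), (2, "row2"), (3, "row3")]


def resume_strict(stream, last_position):
    # Assumes strictly monotonic positions: skips everything at or
    # before the last seen position.
    return [f for f in stream if f[0] > last_position]


def resume_tracking(stream, last_position, seen_at_last):
    # Skips only the fragments already emitted at the last position.
    out = []
    skipped = 0
    for pos, name in stream:
        if pos < last_position:
            continue
        if pos == last_position and skipped < seen_at_last:
            skipped += 1
            continue
        out.append((pos, name))
    return out


# The reader was evicted right after emitting "rt_a" at position 2.
print(resume_strict(stream, 2))       # "rt_b" and "row2" are lost
print(resume_tracking(stream, 2, 1))  # nothing is lost
```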

Fixes: #4418

Branches: master, 3.0, 2019.1

Tests: unit(dev)

* https://github.com/denesb/scylla.git
multishard_reader_handle_non_strictly_monotonous_positions/v4:
  multishard_combining_reader: shard_reader::remote_reader extract
    fill-buffer logic into do_fill_buffer()
  multishard_combining_reader: reorder
    shard_reader::remote_reader::do_fill_buffer() code
  position_in_partition_view: add region() accessor
  multishard_combining_reader: fix handling of non-strictly monotonous
    positions
  flat_mutation_reader: add flat_mutation_reader_from_mutations()
    overload with range and slice
  flat_mutation_reader: add make_flat_mutation_reader_from_fragments()
    overload with range and slice
  tests: add unit test for multishard reader correctly handling
    non-strictly monotonous positions
2019-04-30 12:35:28 +02:00
Tomasz Grabiec
077c639e42 Merge "Simplify the result_set_row API" from Rafael
Currently null and missing values are treated differently. Missing
values throw no_such_column. Null values return nullptr, std::nullopt
or throw null_column_value.

The API is a bit confusing since a function returning a std::optional
either returns std::nullopt or throws, depending on why there is no
value.

With this patch series only get_nonnull throws and there is only one
exception type.
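
The resulting shape of the API can be sketched in Python.  The class
below is a hypothetical analogue, not the C++ result_set_row: None
stands in for std::nullopt, and MissingValueError is a placeholder name,
since the description does not say which exception type remains.

```python
class MissingValueError(Exception):
    """Placeholder for the single remaining exception type."""


class ResultSetRow:
    def __init__(self, cells):
        self._cells = dict(cells)

    def get(self, column):
        # Null and missing are treated the same: no value, no exception.
        return self._cells.get(column)

    def get_nonnull(self, column):
        # The only accessor that throws.
        value = self._cells.get(column)
        if value is None:
            raise MissingValueError(column)
        return value


row = ResultSetRow({"a": 1, "b": None})
print(row.get("a"), row.get("b"), row.get("missing"))
```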

* https://github.com/espindola/scylla.git espindola/merge-null-and-missing-v2:
  query-result-set: merge handling of null and missing values
  Remove result_set_row::has
  Return a reference from get_nonnull
2019-04-30 11:06:29 +02:00