18957 Commits

Author SHA1 Message Date
Rafael Ávila de Espíndola
4e7ffb80c0 cql: Fix use of UDT in reversed columns
We were missing calls to underlying_type in a few locations and so the
insert would think the given literal was invalid and the select would
refuse to fetch a UDT field.

Fixes #4672

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190708200516.59841-1-espindola@scylladb.com>
2019-07-12 19:21:26 +03:00
Rafael Ávila de Espíndola
281f3a69f8 mc writer: Fix exception safety when closing _index_writer
This fixes a possible cause of #4614.

From the backtrace in that issue, it looks like a file is being closed
twice. The first point in the backtrace where that seems likely is in
the MC writer.

My first idea was to add a writer::close and make it the responsibility
of the code using the writer to call it. That way we would move work
out of the destructor.

That is a bit hard since the writer is destroyed from
flat_mutation_reader::impl::~consumer_adapter and that would need to
get a close function too.

This patch instead just fixes an exception safety issue. If
_index_writer->close() throws, _index_writer is still valid and
~writer will try to close it again.

If the exception was thrown after _completed.set_value(), that would
explain the assert about _completed.set_value() being called twice.

With this patch the path outside of the destructor now moves the
writer to a local variable before trying to close it.

Fixes #4614
Message-Id: <20190710171747.27337-1-espindola@scylladb.com>
2019-07-10 19:27:19 +02:00
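The exception-safety fix described in 281f3a69f8 can be illustrated with a minimal sketch. It is not the actual MC writer code: `index_writer`, `writer`, `close_index()` and `close_calls_when_close_throws()` are hypothetical stand-ins, and the only point shown is that moving the member into a local variable before calling `close()` leaves the destructor with nothing to close a second time.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the index writer; close() may throw.
struct index_writer {
    int* close_calls; // counts how often close() was invoked
    bool fail;        // when true, close() throws
    void close() {
        ++*close_calls;
        if (fail) {
            throw std::runtime_error("close failed");
        }
    }
};

// Hypothetical stand-in for the sstable writer that owns the index writer.
class writer {
    std::unique_ptr<index_writer> _index_writer;
public:
    explicit writer(std::unique_ptr<index_writer> w) : _index_writer(std::move(w)) {
        assert(_index_writer != nullptr);
    }

    // Normal path: detach the member first, so that a throwing close()
    // leaves _index_writer empty and the destructor has nothing to retry.
    void close_index() {
        auto w = std::move(_index_writer);
        if (w) {
            w->close();
        }
    }

    // Destructor path: closes only if close_index() never ran.
    ~writer() {
        if (_index_writer) {
            try {
                _index_writer->close();
            } catch (...) {
                // Destructors must not throw; swallow the error.
            }
        }
    }
};

// Returns how many times close() ran when the first attempt throws.
int close_calls_when_close_throws() {
    int calls = 0;
    {
        writer w(std::make_unique<index_writer>(index_writer{&calls, true}));
        try {
            w.close_index();
        } catch (const std::runtime_error&) {
            // The failure is reported to the caller once.
        }
    } // ~writer runs here and finds _index_writer empty.
    return calls;
}
```

Without the `std::move` into a local, `_index_writer` would still be set after the exception and the destructor would call `close()` again.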
Paweł Dziepak
eb7d17e5c5 lsa: make sure align_up_for_asan() doesn't cause reads past end of segment
In debug mode the LSA needs objects to be 8-byte aligned in order to
maximise coverage from the AddressSanitizer.

Usually `close_active()` creates a dummy object that covers the end of
the segment being closed. However, if the last real object ends in the
last eight bytes of the segment then that dummy won't be created because
of the alignment requirements. This broke exit conditions on loops
trying to read all objects in the segment and caused them to attempt to
dereference an address at the end of the segment. This patch fixes that.

Fixes #4653.
2019-07-10 19:19:24 +02:00
Avi Kivity
e32bdb6b90 Merge "Warn user about using SimpleStrategy with Multi DC deployment" from Kamil
"
If the user creates a keyspace with the 'SimpleStrategy' replication class
in a multi-datacenter environment, they will receive a warning in the CQL shell
and in the server logs.

Resolves #4481 and #4651.
"

* 'multidc' of https://github.com/kbr-/scylla:
  Warn user about using SimpleStrategy with Multi DC deployment
  Add warning support to the CQL binary protocol implementation
2019-07-10 16:47:07 +03:00
Avi Kivity
138b28ae43 Merge "Fix command line parsing and add logging." from Kamil
"
Fixes #4203 and #4141.
"

* 'cmdline' of https://github.com/kbr-/scylla:
  Add logging of parsed command line options
  Fix command line argument parsing in main.
2019-07-10 16:40:57 +03:00
Avi Kivity
405fd517b0 Merge "IPv6 support" from Calle
"
Fixes #2027

Modifies inet address type in scylla to use seastar::net::inet_address,
and removes explicit use of ipv4_addr in various network code in favour
of socket_address. Scylla is thus capable of resolving and binding to ipv6
addresses.

Adds config option to enable/disable ipv6 (default enabled), so
upgrading cluster can continue to work while running mixed version
nodes (since gossip message address serialization becomes different).
"

* 'calle/ipv6' of https://github.com/elcallio/scylla:
  test-serialization: Add small roundtrip test for inet address (v4 + v6)
  inet_address/init: Make ipv6 default enabled
  db::config: Add enable ipv6 switch (default off)
  gms::inet_address: Make serialization ipv6 aware
  Remove usage of inet_address::raw_addr()
  Replace use of "ipv4_addr" with socket_address
  inet_address: Add optional family to lookup
  gms::inet_address: Change inet_address to wrap actual seastar::net::inet_address
  types: Add ipv6_address support
2019-07-10 15:07:56 +03:00
Benny Halevy
b4dc118639 tests: logalloc_test: scale down test_region_groups
Post commit b3adabda2d
(Reduce logalloc differences between debug and release)
logalloc_test's memory footprint has grown, in particular
in test_region_groups, and it triggers the oom killer on
our test automation machines.

This patch scales down this test case so it requires less memory.

Fixes #4669

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-07-10 12:06:10 +02:00
Pekka Enberg
bb53c109b4 test.py: Add option for repeating test execution
This adds a '--repeat N' command line option to test.py, which can be
used to execute the tests N times. This is useful for finding flaky
tests, for example.

Message-Id: <20190710092115.15960-1-penberg@scylladb.com>
2019-07-10 12:42:39 +03:00
Botond Dénes
ce647fac9f timestamp_based_splitting_writer: fix the handling of partition tombstone
Currently the handling of partition tombstones is broken in multiple
ways:
* The partition-tombstone is lost when the bucket is calculated for its
timestamp (due to a misplaced `std::exchange()`).
* When the `partition_start` fragment (containing the partition
tombstone) is actually written to the bucket we emit another
`partition_start` fragment before it because the bucket has not seen
that partition before and we fail to notice that we are actually writing
the partition header.

This bug was allowed to fly under the radar because the unit test was
accidentally not creating partition tombstones in the generated data
(due to a mistake). It was discovered while working on unit tests for
another test and fixing the data generation function to actually
generate partition tombstones.

This patch fixes both problems in the handling of partition tombstones
but it doesn't yet fix the test. That is deferred until the patch
series which uncovered this bug is merged to avoid merge conflicts.
The other series mentioned here is: [PATCH v6 00/15] compaction: allow
collecting purged data

Fixes: #4683

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190710092427.122623-1-bdenes@scylladb.com>
2019-07-10 12:36:57 +03:00
Pekka Enberg
e6cc90aa98 test: add 'eventually' block to index paging test (#4681)
Without 'eventually', the test is flaky because the index may not yet
be up to date when its conditions are checked.

Fixes #4670

Tests: unit(dev)
2019-07-10 11:46:03 +03:00
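The idea behind wrapping a check in an 'eventually' block (e6cc90aa98) can be sketched as a retry loop. This is an assumption about the general technique, not the helper used in Scylla's tests: the `eventually` function below, its parameters, the backoff policy and `attempts_until_index_ready()` are all hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <thread>

// Hypothetical helper: re-runs `check` until it stops throwing or the
// attempts are exhausted, doubling the sleep between attempts.
template <typename Func>
void eventually(Func check, int max_attempts = 10,
                std::chrono::milliseconds delay = std::chrono::milliseconds(1)) {
    assert(max_attempts > 0);
    for (int attempt = 1; ; ++attempt) {
        try {
            check();
            return; // the condition finally holds
        } catch (...) {
            if (attempt == max_attempts) {
                throw; // out of attempts: report the last failure
            }
        }
        std::this_thread::sleep_for(delay);
        delay *= 2;
    }
}

// Simulates an index that becomes up to date on the third check and
// returns how many checks were needed.
int attempts_until_index_ready() {
    int calls = 0;
    eventually([&] {
        if (++calls < 3) {
            throw std::runtime_error("index not up to date yet");
        }
    });
    return calls;
}
```

A check that would fail on its first run because an asynchronous update has not landed yet passes once the update becomes visible, which is the flakiness the commit addresses.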
Kamil Braun
d6736a304a Add metric for failed memtable flushes
Resolves #3316.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-10 11:30:10 +03:00
Amnon Heiman
2fbc5ea852 config_file.hh: get_value return a pointer to the value
The get_value method returns a pointer to the value that is used by the
value_to_json method.

The assumption is that the void pointer points to the actual value.

Fixes #4678

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-07-10 10:40:35 +03:00
Piotr Sarna
ebbe038d19 test: add 'eventually' block to index paging test
Without 'eventually', the test is flaky because the index may not yet
be up to date when its conditions are checked.

Fixes #4670
2019-07-09 17:07:16 +02:00
Asias He
39ca044dab repair: Allow repair when a replica is down
Since commit bb56653 (repair: Sync schema from follower nodes before
repair), the behaviour of handling a down node during repair has
changed.  That is, if a repair follower is down, syncing the schema
with it fails and the repair of the range is skipped. This means
a range cannot be repaired unless all the nodes for the replicas are up.

To fix this, we filter out the nodes that are down, mark the repair as
partial, and repair with the nodes that are still up.

Tests: repair_additional_test:RepairAdditionalTest.repair_with_down_nodes_2b_test
Fixes: #4616
Backports: 3.1

Message-Id: <621572af40335cf5ad222c149345281e669f7116.1562568434.git.asias@scylladb.com>
2019-07-09 10:07:36 +03:00
Calle Wilund
5dfc356380 test-serialization: Add small roundtrip test for inet address (v4 + v6)
Verify we get back what we put in.
2019-07-08 15:28:21 +00:00
Calle Wilund
3cfb79e0ff inet_address/init: Make ipv6 default enabled
Makes lookup find any (incl ipv6 numeric) address.
Init will look at enable_ipv6 and use an explicit ipv4 family lookup if
it is not enabled.
2019-07-08 14:13:10 +00:00
Calle Wilund
1f5e1d22bf db::config: Add enable ipv6 switch (default off)
Off by default to prevent problems during cluster migration when
needing to gossip with non-ipv6 aware nodes.
2019-07-08 14:13:09 +00:00
Calle Wilund
c540e36fe2 gms::inet_address: Make serialization ipv6 aware
Because inet_address was initially hardcoded to
ipv4, its wire format is not very forward compatible.
Since we potentially need to communicate with older version nodes, we
manually define the new serial format for inet_address to be:

ipv4: 4  bytes address
ipv6: 4  bytes marker 0xffffffff (invalid address)
      16 bytes data -> address
2019-07-08 14:13:09 +00:00
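The wire format described in c540e36fe2 can be sketched as follows. This is an illustration of the layout only, not the gms::inet_address serializer: the `address` struct and the `serialize`/`deserialize` functions are hypothetical names, and the raw bytes are assumed to be in network order.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical address type: either 4 (ipv4) or 16 (ipv6) raw bytes.
struct address {
    std::vector<std::uint8_t> bytes;
    bool is_ipv6() const { return bytes.size() == 16; }
};

// ipv4: 4 bytes address
// ipv6: 4 bytes marker 0xffffffff, then 16 bytes address
std::vector<std::uint8_t> serialize(const address& a) {
    assert(a.bytes.size() == 4 || a.bytes.size() == 16);
    std::vector<std::uint8_t> out;
    if (a.is_ipv6()) {
        out.assign(4, 0xff); // marker: not a valid ipv4 host address
    }
    out.insert(out.end(), a.bytes.begin(), a.bytes.end());
    return out;
}

// Returns std::nullopt if the buffer is too short for the format.
std::optional<address> deserialize(const std::vector<std::uint8_t>& in) {
    if (in.size() < 4) {
        return std::nullopt;
    }
    const bool has_marker =
        in[0] == 0xff && in[1] == 0xff && in[2] == 0xff && in[3] == 0xff;
    address a;
    if (!has_marker) {
        // Old-style payload: readers that only know ipv4 see the same 4 bytes.
        a.bytes.assign(in.begin(), in.begin() + 4);
        return a;
    }
    if (in.size() < 20) {
        return std::nullopt;
    }
    a.bytes.assign(in.begin() + 4, in.begin() + 20);
    return a;
}
```

An ipv4 address serializes to the same 4 bytes as before, so only ipv6 addresses produce a payload that older nodes cannot interpret.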
Calle Wilund
e9816efe06 Remove usage of inet_address::raw_addr() 2019-07-08 14:13:09 +00:00
Calle Wilund
4ef940169f Replace use of "ipv4_addr" with socket_address
Allows the various sockets to use ipv6 address binding if so configured.
2019-07-08 14:13:09 +00:00
Calle Wilund
5ba545f493 inet_address: Add optional family to lookup 2019-07-08 14:13:09 +00:00
Calle Wilund
5fd811ec8a gms::inet_address: Change inet_address to wrap actual seastar::net::inet_address
This handles all address types net::inet_address can handle, i.e. ipv6 as well.
2019-07-08 14:13:09 +00:00
Calle Wilund
482fd72ca2 types: Add ipv6_address support
As ipv4, just redirect to inet_address.
2019-07-08 14:09:25 +00:00
Benny Halevy
a0499bbd31 lister::guarantee_type: do not follow symlink
Similar to commit 9785754e0d
lister::guarantee_type needs to check the entry's type,
not the symlink it may point to.

Fixes #4606

The nodetool_refresh_with_wrong_upload_modes_test dtest creates a broken
symlink and following it fails, as it should, with the default follow_symlink::yes

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190626110734.4558-1-bhalevy@scylladb.com>
2019-07-07 15:29:28 +03:00
Avi Kivity
63edd46562 Merge "Expand big decimal with arithmetic operators" from Piotr
"
This miniseries expands big_decimal interface with convenience operators
(-=, +, -), provides test cases for it and makes one of the constructors
explicit.

Tests: unit(dev)
"

* 'expand_big_decimal_interface' of https://github.com/psarna/scylla:
  utils: make string-based big decimal constructor explicit
  tests: add more operators to big decimal tests
  utils: add operators to big_decimal
2019-07-06 12:26:08 +03:00
Avi Kivity
24caf0824d Merge "Complete the LIKE operator" from Dejan
"
Implement LIKE parsing, intermediate representation, and query processing. Add tests
for this implementation (leaving the LIKE functionality tests in
tests/like_matcher_test.cc).

Refs #4477.
"

* 'finish-like' of https://github.com/dekimir/scylla:
  cql3: Add LIKE operator to CQL grammar
  cql3: Ensure LIKE filtering for partition columns
  cql3: Add LIKE restriction
  cql3: Add LIKE relation
2019-07-06 12:26:08 +03:00
kbr-
8995945052 Implement tuple_type_impl::to_string_impl. (#4645)
Resolves #4633.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-06 12:26:08 +03:00
Avi Kivity
187859ad78 review-checklist: mention that the guidelines are not absolute rules and can be overridden 2019-07-06 12:26:08 +03:00
Kamil Braun
c0915c40eb Warn user about using SimpleStrategy with Multi DC deployment
If the user creates a keyspace with the 'SimpleStrategy' replication class
in a multi-datacenter environment, they will receive a warning in the CQL shell
and in the server logs.
Resolves #4481.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-05 09:25:03 +02:00
Kamil Braun
35dbe9371c Add warning support to the CQL binary protocol implementation
The CQL binary protocol v4 adds support for server-side warnings:
https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v4.spec
This adds a convenient API to add warnings to messages returned to the user.
Resolves #4651.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-05 09:24:56 +02:00
Kamil Braun
2f0f53ac72 Add logging of parsed command line options
The recognized command line options are now being printed when Scylla is run,
together with the whole command used.
Fixes #4203.
2019-07-05 09:00:28 +02:00
Piotr Sarna
eed2543bcc utils: make string-based big decimal constructor explicit
As a rule of thumb, single-parameter constructors should be explicit
in order to avoid unexpected implicit conversions.
2019-07-04 11:33:00 +02:00
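The rule of thumb from eed2543bcc can be checked at compile time. The class below is a hypothetical stand-in, not big_decimal itself; `decimal_from_text` and `parse_unscaled` are names invented for this sketch, and parsing is reduced to `std::stol` for brevity.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a decimal type constructed from text.
class decimal_from_text {
    long _unscaled;
public:
    // explicit: a std::string is never turned into a decimal silently.
    explicit decimal_from_text(const std::string& text) : _unscaled(std::stol(text)) {}
    long unscaled() const { return _unscaled; }
};

// Direct construction is still allowed...
static_assert(std::is_constructible_v<decimal_from_text, std::string>);
// ...but implicit conversion (e.g. when passing a string as an argument) is not.
static_assert(!std::is_convertible_v<std::string, decimal_from_text>);

// Callers now have to spell out the conversion.
long parse_unscaled(const std::string& text) {
    assert(!text.empty());
    return decimal_from_text(text).unscaled();
}
```

With a non-explicit constructor the second `static_assert` would fail, because a string argument would then convert implicitly wherever a decimal is expected.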
Piotr Sarna
7e722f8dd5 tests: add more operators to big decimal tests 2019-07-04 11:32:57 +02:00
Piotr Sarna
a5e41408ec utils: add operators to big_decimal
For convenience, operators -=, + and - are implemented on top of +=.
2019-07-04 11:32:53 +02:00
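Building -=, + and - on top of += (a5e41408ec) follows a common C++ pattern, sketched below under simplifying assumptions. `fixed_decimal` is a hypothetical type using a 64-bit unscaled value and a non-negative scale rather than big_decimal's arbitrary precision, and `negated()` is a helper invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified decimal: value = unscaled * 10^(-scale).
struct fixed_decimal {
    std::int64_t unscaled;
    int scale;

    static std::int64_t pow10(int n) {
        std::int64_t r = 1;
        while (n-- > 0) {
            r *= 10;
        }
        return r;
    }

    // The one operator that does real work: bring both operands to the
    // larger scale, then add the unscaled values.
    fixed_decimal& operator+=(const fixed_decimal& other) {
        assert(scale >= 0 && other.scale >= 0);
        const int common = scale > other.scale ? scale : other.scale;
        unscaled = unscaled * pow10(common - scale)
                 + other.unscaled * pow10(common - other.scale);
        scale = common;
        return *this;
    }

    // Helper for this sketch only.
    fixed_decimal negated() const { return fixed_decimal{-unscaled, scale}; }

    // Subtraction is addition of the negated operand.
    fixed_decimal& operator-=(const fixed_decimal& other) {
        return *this += other.negated();
    }
};

// Binary operators copy the left operand and reuse the compound forms.
inline fixed_decimal operator+(fixed_decimal lhs, const fixed_decimal& rhs) {
    lhs += rhs;
    return lhs;
}

inline fixed_decimal operator-(fixed_decimal lhs, const fixed_decimal& rhs) {
    lhs -= rhs;
    return lhs;
}
```

Keeping the arithmetic in a single operator means the scale-alignment logic exists in one place only.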
Dejan Mircevski
6727e8f073 cql3: Add LIKE operator to CQL grammar
Extend the grammar with LIKE and add CQL query tests for it.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 11:01:13 +02:00
Dejan Mircevski
1c583de8bb cql3: Ensure LIKE filtering for partition columns
Partition columns are implicitly filtered whenever possible, avoiding
expensive post-processing.  But there are exceptions, eg, when
partition key is only partially restricted, or for CONTAINS
expressions.  Here we add LIKE to this list of exceptions.

Also fix compute_bounds() to punt on LIKE restrictions, which cannot
be translated into meaningful bounds.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 10:59:13 +02:00
Dejan Mircevski
63cec653e5 cql3: Add LIKE restriction
This restriction leverages like_matcher to perform filtering.

Make single_column_relation::new_LIKE_restriction() return this new
restriction.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 10:58:56 +02:00
Dejan Mircevski
21d7722594 cql3: Add LIKE relation
Add a new type of relation with operator LIKE.  Handle it in
relation::to_restriction by introducing a new virtual method for it.
The temporary implementation of this method returns null; that will be
replaced in a subsequent patch.

Add abstract_type::is_string() to recognize string columns and
disallow LIKE operator on non-string columns.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 10:54:30 +02:00
Kamil Braun
f155a2d334 Fix command line argument parsing in main.
Command line arguments are parsed twice in Scylla: once in main and once in Seastar's app_template::run.
The first parse is there to check if the "--version" flag is present --- in this case the version is printed
and the program exits.  The second parsing is correct; however, most of the arguments were improperly treated
as positional arguments during the first parsing (e.g., "--network host" would treat "host" as a positional argument).
This happened because the arguments weren't known to the command line parser.
This commit fixes the issue by moving the parsing code to after the point where the arguments are registered.
Resolves #4141.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-03 14:11:34 +02:00
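Why an unregistered option turns its value into a positional argument (f155a2d334) can be shown with a toy parser. This is not the parser Scylla or Seastar uses; `parse_result`, `parse` and the option table are hypothetical, and the sketch only models options written as "--name value".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct parse_result {
    std::vector<std::string> options;      // registered "--name" tokens
    std::vector<std::string> unrecognized; // unknown "--name" tokens
    std::vector<std::string> positional;   // everything else
};

// Toy parser: `registered` maps an option name to whether it takes a value.
// A registered option consumes the following token as its value; an unknown
// option cannot, so its value falls through as a positional argument.
parse_result parse(const std::vector<std::string>& args,
                   const std::map<std::string, bool>& registered) {
    parse_result r;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& tok = args[i];
        if (tok.rfind("--", 0) != 0) {
            r.positional.push_back(tok);
            continue;
        }
        const auto it = registered.find(tok);
        if (it == registered.end()) {
            r.unrecognized.push_back(tok);
            continue;
        }
        r.options.push_back(tok);
        if (it->second && i + 1 < args.size()) {
            ++i; // skip the value that belongs to this option
        }
    }
    // Every token is classified at most once.
    assert(r.options.size() + r.unrecognized.size() + r.positional.size() <= args.size());
    return r;
}
```

Parsing `--network host` with only `--version` registered yields "host" as a positional argument, while parsing it after `--network` has been registered yields none, which mirrors the ordering change the commit describes.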
Avi Kivity
8a0c4d508a Merge "Repair switch to rpc stream" from Asias
"
The put_row_diff, get_row_diff and get_full_row_hashes verbs are switched
to use rpc stream instead of rpc verb. They are the verbs that could
send big rpc messages. The rpc stream sink and source are created per
repair follower for each of the above 3 verbs. The sink and source are
shared for multiple requests during the entire repair operation for a
given range, so there is no overhead to set up rpc stream.

The row buffer is now increased to 32MiB from 256KiB, giving better
bandwidth in high latency links. The downside of a bigger row buffer is
a reduced chance that all the rows inside a row buffer are identical.
This causes more full hashes to be exchanged. To address this issue, the
plan is to add a better set reconciliation algorithm in addition to the
current approach of sending full hashes.

I compared rebuild using regular stream plan with repair using rpc
stream. With 2 nodes, 1 smp, 8M rows, delete all data on one of the
nodes before repair or rebuild.

    1) repair using seastar rpc verb

Time to complete: 82.17s

    2) rebuild using regular streaming which uses seastar rpc stream

Time to complete: 63.87s

    3) repair using seastar rpc stream

Time to complete: 68.48s

For 1) and 3), the improvement is 16.6% (repair using rpc verb vs. repair using rpc stream)

For 2) and 3), the difference is 7.2% (repair vs. stream)

The result is promising for the future repair-based bootstrap/replace node operations.

NOTE: We do not actually enable rpc stream in row level repair for now. We
will enable it after we fix the stall issues caused by handling
bigger row buffers.

Fixes #4581
"

* 'repair_switch_to_rpc_stream_v9' of https://github.com/asias/scylla: (45 commits)
  docs: Add RPC stream doc for row level repair
  repair: Mark some of the helper functions static
  repair: Increase max row buf size
  repair: Hook rpc stream version of verbs in row level repair
  repair: Add use_rpc_stream to repair_meta
  repair: Add is_rpc_stream_supported
  repair: Add needs_all_rows flag to put_row_diff
  repair: Optimize get_row_diff
  repair: Register repair_get_full_row_hashes_with_rpc_strea
  repair: Register repair_put_row_diff_with_rpc_stream
  repair: Register repair_get_row_diff_with_rpc_stream
  repair: Add repair_get_full_row_hashes_with_rpc_stream_handler
  repair: Add repair_put_row_diff_with_rpc_stream_handler
  repair: Add repair_get_row_diff_with_rpc_stream_handler
  repair: Add repair_get_full_row_hashes_with_rpc_stream_process_op
  repair: Add repair_put_row_diff_with_rpc_stream_process_op
  repair: Add repair_get_row_diff_with_rpc_stream_process_op
  repair: Add put_row_diff_with_rpc_stream
  repair: Add put_row_diff_sink_op
  repair: Add put_row_diff_source_op
  ...
2019-07-03 10:08:55 +03:00
Asias He
f686f0b9d6 docs: Add RPC stream doc for row level repair
This documents RPC stream usage in row level repair.
2019-07-03 08:09:57 +08:00
Asias He
78ae5af203 repair: Mark some of the helper functions static
They are used only inside repair/row_level.cc. Make them static.
2019-07-03 08:09:57 +08:00
Asias He
e8c13444ba repair: Increase max row buf size
If the cluster supports row level repair with rpc stream interface, we
can use bigger row buf size to have better repair bandwidth in high
latency links.
2019-07-03 08:01:37 +08:00
Asias He
7d08a8d223 repair: Hook rpc stream version of verbs in row level repair
If rpc stream is supported, use the rpc stream version of the
get_row_diff, put_row_diff, get_full_row_hashes.
2019-07-03 08:01:37 +08:00
Asias He
fccaa0324f repair: Add use_rpc_stream to repair_meta
Determine if rpc stream should be used.
2019-07-03 08:01:37 +08:00
Asias He
7bf0c646be repair: Add is_rpc_stream_supported
Given a row_level_diff_detect_algorithm, return if this algo supports
rpc stream interface.
2019-07-03 08:01:04 +08:00
Asias He
1c92643f02 repair: Add needs_all_rows flag to put_row_diff
So we can avoid copying _working_row_buf in get_row_diff on the master node if
there is only one follower node and all repair rows are needed by
the follower node.
2019-07-03 07:56:22 +08:00
Asias He
6595417567 repair: Optimize get_row_diff
Move _working_row_buf instead of copying it if this is a follower node or
a master node with only one follower. In these cases, the
_working_row_buf will not be used after this function, so we can move
it.
2019-07-03 07:56:22 +08:00
Asias He
c4eb0ee361 repair: Register repair_get_full_row_hashes_with_rpc_strea
Register the get_full_row_hashes rpc stream verb.
2019-07-03 07:56:22 +08:00
Asias He
b56cced5b8 repair: Register repair_put_row_diff_with_rpc_stream
Register the put_row_diff rpc stream verb.
2019-07-03 07:56:22 +08:00