Commit Graph

18967 Commits

Author SHA1 Message Date
Botond Dénes
b26fe76fc1 tests: random_schema: futurize generate_random_mutations()
To avoid reactor stalls when generate many and/or large partitions.
2019-07-15 17:38:00 +03:00
Botond Dénes
cf135c6257 tests/random_schema: generate_random_mutations(): allow customizing generated data
Allow callers to specify the number of partitions generated, as well as
the number of clustering rows and range tombstones generated per
partition.
2019-07-15 17:38:00 +03:00
Botond Dénes
d2930ffa53 tests/random_schema: add assert to make_clustering_key()
Verify that the schema *does* indeed have clustering columns. Better an
assert than a cryptic "division by 0" exception deeper in the call stack.
2019-07-15 17:38:00 +03:00
Botond Dénes
d90ac6bd7b tests/random_schema: generate_random_mutations(): remove engine parameter
Use an internally create instance of random engine. Passing a readily
seeded engine from the outside is pointless now that we have a mechanism
to seed entire test suites with a command line algorithm: the internal
engine is seeded from tests::random, so the seed of the test suite
determines the internal seed as well.

Update the sole user of this method (mutation_writer_test.cc) to not
generate local seeds anymore.
2019-07-15 17:38:00 +03:00
Botond Dénes
fd2f53f292 tests: mutation_writer_test.cc/generate_mutations() -> random_schema.hh/generate_random_mutations()
We plan on allowing other tests to use this method. The first step is to
make it available in a header.
2019-07-15 17:38:00 +03:00
Botond Dénes
7a4a609e88 Introduce Garbage Collected Consumer to Mutation Compactor
Introduce consumer in mutation compactor that will only consume
data that is purged away from regular consumer. The goal is to
allow compaction implementation to do whatever it wants with
the garbage collected data, like saving it for preventing
data resurrection from ever happening, like described in
issue #4531.
noop_compacted_fragments_consumer is made available for users that
don't need this capability.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-07-15 17:38:00 +03:00
Botond Dénes
4c2781edaa row_marker: add garbage_collector
The new collector parameter is a pointer to a
`compaction_garbage_collector` implementation. This collector is passed
the row_marker when it expired and would be discarded.
The collector param is optional and defaults to nullptr.
2019-07-15 17:38:00 +03:00
Botond Dénes
7db2006162 row_marker: de-inline compact_and_expire() 2019-07-15 17:38:00 +03:00
Botond Dénes
4c7a7ffe8f row: add garbage_collector
The new collector parameter is a pointer to a
`compaction_garbage_collector` implementation. This collector is passed
all atoms that are expired and can would be discarded. The body of
`compact_and_expire()` was changed so that it checks cells' tombstone
coverage before it checks their expiry, so that cells that are both
covered by a tombstone and also expired are not passed to the collector.
The collector is forwarded to
`collection_type_impl::mutation::compact_and_expire()` as well.
The collector param is optional and defaults to nullptr
2019-07-15 17:38:00 +03:00
Botond Dénes
307b48794d collection_type_impl::mutation: compact_and_expire() add collector parameter
The new collector parameter is a pointer to a
`compaction_garbage_collector` implementation. This collector is passed
all atoms that are expired and would be discarded. The body of
`compact_and_expire()` was changed so that it checks cells' tombstone
coverage before it checks their expiry, so that cells that are both
covered by a tombstone and also expired are not passed to the collector.
The collector param is optional and defaults to nullptr. To accommodate
the collector, which needs to know the column id, a new `column_id`
parameter was added as well.
2019-07-15 17:37:55 +03:00
Botond Dénes
5002ebb73f Introduce compaction_garbage_collector interface
This interface can be used to implement a garbage collector that
collects atoms that are purged due to expiry during compaction.
The intended usage is collecting purged atoms for safekeeping until the
compaction process finishes safely, to be dropped only at the end when
the compaction is known to have finished successfully.
2019-07-15 15:30:43 +03:00
Rafael Ávila de Espíndola
281f3a69f8 mc writer: Fix exception safety when closing _index_writer
This fixes a possible cause of #4614.

From the backtrace in that issue, it looks like a file is being closed
twice. The first point in the backtrace where that seems likely is in
the MC writer.

My first idea was to add a writer::close and make it the responsibility
of the code using the writer to call it. That way we would move work
out of the destructor.

That is a bit hard since the writer is destroyed from
flat_mutation_reader::impl::~consumer_adapter and that would need to
get a close function too.

This patch instead just fixes an exception safety issue. If
_index_writer->close() throws, _index_writer is still valid and
~writer will try to close it again.

If the exception was thrown after _completed.set_value(), that would
explain the assert about _completed.set_value() being called twice.

With this patch the path outside of the destructor now moves the
writer to a local variable before trying to close it.

Fixes #4614
Message-Id: <20190710171747.27337-1-espindola@scylladb.com>
2019-07-10 19:27:19 +02:00
Paweł Dziepak
eb7d17e5c5 lsa: make sure align_up_for_asan() doesn't cause reads past end of segment
In debug mode the LSA needs objects to be 8-byte aligned in order to
maximise coverage from the AddressSanitizer.

Usually `close_active()` creates a dummy objects that covers the end of
the segment being closed. However, it the last real objects ends in the
last eight bytes of the segment then that dummy won't be created because
of the alignment requirements. This broke exit conditions on loops
trying to read all objects in the segment and caused them to attempt to
dereference address at the end of the segment. This patch fixes that.

Fixes #4653.
2019-07-10 19:19:24 +02:00
Avi Kivity
e32bdb6b90 Merge "Warn user about using SimpleStrategy with Multi DC deployment" from Kamil
"
If the user creates a keyspace with the 'SimpleStrategy' replication class
in a multi-datacenter environment, they will receive a warning in the CQL shell
and in the server logs.

Resolves #4481 and #4651.
"

* 'multidc' of https://github.com/kbr-/scylla:
  Warn user about using SimpleStrategy with Multi DC deployment
  Add warning support to the CQL binary protocol implementation
2019-07-10 16:47:07 +03:00
Avi Kivity
138b28ae43 Merge "Fix command line parsing and add logging." from Kamil
"
Fixes #4203 and #4141.
"

* 'cmdline' of https://github.com/kbr-/scylla:
  Add logging of parsed command line options
  Fix command line argument parsing in main.
2019-07-10 16:40:57 +03:00
Avi Kivity
405fd517b0 Merge "IPv6 support" from Calle
"
Fixes #2027

Modifies inet address type in scylla to use seastar::net::inet_address,
and removes explicit use of ipv4_addr in various network code in favour
of socket_address. Thus capable of resolving and binding to ipv6.

Adds config option to enable/disable ipv6 (default enabled), so
upgrading cluster can continue to work while running mixed version
nodes (since gossip message address serialization becomes different).
"

* 'calle/ipv6' of https://github.com/elcallio/scylla:
  test-serialization: Add small roundtrip test for inet address (v4 + v6)
  inet_address/init: Make ipv6 default enabled
  db::config: Add enable ipv6 switch (default off)
  gms::inet_address: Make serialization ipv6 aware
  Remove usage of inet_address::raw_addr()
  Replace use of "ipv4_addr" with socket_address
  inet_address: Add optional family to lookup
  gms::inet_address: Change inet_address to wrap actual seastar::net::inet_address
  types: Add ipv6_address support
2019-07-10 15:07:56 +03:00
Benny Halevy
b4dc118639 tests: logalloc_test: scale down test_region_groups
Post commit b3adabda2d
(Reduce logalloc differences between debug and release)
logalloc_test's memory footprint has grown, in particular
in test_region_groups, and it triggers the oom killer on
our test automation machines.

This patch scales down this test case so it requires less memory.

Fixes #4669

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-07-10 12:06:10 +02:00
Pekka Enberg
bb53c109b4 test.py: Add option for repeating test execution
This adds a '--repeat N' command line option to test.py, which can be
used to execute the tests N times. This is useful for finding flakey
tests, for example.

Message-Id: <20190710092115.15960-1-penberg@scylladb.com>
2019-07-10 12:42:39 +03:00
Botond Dénes
ce647fac9f timestamp_based_splitting_writer: fix the handling of partition tombstone
Currently the handling of partition tombstones is broken in multiple
ways:
* The partition-tombstone is lost when the bucket is calculated for its
timestamp (due to a misplaced `std::exchange()`).
* When the `partition_start` fragment (containing the partition
tombstone) is actually written to the bucket we emit another
`partition_start` fragment before it because the bucket has not seen
that partition before and we fail to notice that we are actually writing
the partition header.

This bug was allowed to fly under the radar because the unit test was
accidentally not creating partition tombstones in the generated data
(due to a mistake). It was discovered while working on unit tests for
another test and fixing the data generation function to actually
generate partition tombstones.

This patch fixes both problems in the handling of partition tombstones
but it doesn't yet fixes the test. That is deferred until the patch
series which uncovered this bug is merged to avoid merge conflicts.
The other series mentioned here is: [PATCH v6 00/15] compaction: allow
collecting purged data

Fixes: #4683

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190710092427.122623-1-bdenes@scylladb.com>
2019-07-10 12:36:57 +03:00
Pekka Enberg
e6cc90aa98 test: add 'eventually' block to index paging test (#4681)
Without 'eventually', the test is flaky because the index can still
be not up to date while checking its conditions.

Fixes #4670

Tests: unit(dev)
2019-07-10 11:46:03 +03:00
Kamil Braun
d6736a304a Add metric for failed memtable flushes
Resolves #3316.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-10 11:30:10 +03:00
Amnon Heiman
2fbc5ea852 config_file.hh: get_value return a pointer to the value
The get_value method returns a pointer to the value that is used by the
value_to_json method.

The assumption is that the void pointer points to the actual value.

Fixes #4678

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-07-10 10:40:35 +03:00
Piotr Sarna
ebbe038d19 test: add 'eventually' block to index paging test
Without 'eventually', the test is flaky because the index can still
be not up to date while checking its conditions.

Fixes #4670
2019-07-09 17:07:16 +02:00
Asias He
39ca044dab repair: Allow repair when a replica is down
Since commit bb56653 (repair: Sync schema from follower nodes before
repair), the behaviour of handling down node during repair has been
changed.  That is, if a repair follower is down, it will fail to sync
schema with it and the repair of the range will be skipped. This means
a range can not be repaired unless all the nodes for the replicas are up.

To fix, we filter out the nodes that is down and mark the repair is
partial and repair with the nodes that are still up.

Tests: repair_additional_test:RepairAdditionalTest.repair_with_down_nodes_2b_test
Fixes: #4616
Backports: 3.1

Message-Id: <621572af40335cf5ad222c149345281e669f7116.1562568434.git.asias@scylladb.com>
2019-07-09 10:07:36 +03:00
Calle Wilund
5dfc356380 test-serialization: Add small roundtrip test for inet address (v4 + v6)
Verify we get back what we put in.
2019-07-08 15:28:21 +00:00
Calle Wilund
3cfb79e0ff inet_address/init: Make ipv6 default enabled
Makes lookup find any (incl ipv6 numeric) address.
Init will look at enable_ipv6 and use explcit ipv4 family lookup if not
enabled.
2019-07-08 14:13:10 +00:00
Calle Wilund
1f5e1d22bf db::config: Add enable ipv6 switch (default off)
Off by default to prevent problems during cluster migration when
needing to gossip with non-ipv6 aware nodes.
2019-07-08 14:13:09 +00:00
Calle Wilund
c540e36fe2 gms::inet_address: Make serialization ipv6 aware
Because inet_address was initially hardcoded to
ipv4, its wire format is not very forward compatible.
Since we potentially need to communicate with older version nodes, we
manually define the new serial format for inet_address to be:

ipv4: 4  bytes address
ipv6: 4  bytes marker 0xffffffff (invalid address)
      16 bytes data -> address
2019-07-08 14:13:09 +00:00
Calle Wilund
e9816efe06 Remove usage of inet_address::raw_addr() 2019-07-08 14:13:09 +00:00
Calle Wilund
4ef940169f Replace use of "ipv4_addr" with socket_address
Allows the various sockets to use ipv6 address binding if so configured.
2019-07-08 14:13:09 +00:00
Calle Wilund
5ba545f493 inet_address: Add optional family to lookup 2019-07-08 14:13:09 +00:00
Calle Wilund
5fd811ec8a gms::inet_address: Change inet_address to wrap actual seastar::net::inet_address
Thusly handle all types net::inet_address can handle. I.e. ipv6.
2019-07-08 14:13:09 +00:00
Calle Wilund
482fd72ca2 types: Add ipv6_address support
As ipv4, just redirect to inet_address.
2019-07-08 14:09:25 +00:00
Benny Halevy
a0499bbd31 lister::guarantee_type: do not follow symlink
Simliar to commit 9785754e0d
lister::guarantee_type needs to check the entry's type,
not the symlink it may point to.

Fixes #4606

The nodetool_refresh_with_wrong_upload_modes_test dtest creates a broken
symlink and following it fails, as it should, with the default follow_symlink::yes

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190626110734.4558-1-bhalevy@scylladb.com>
2019-07-07 15:29:28 +03:00
Avi Kivity
63edd46562 Merge "Expand big decimal with arithmetic operators" from Piotr
"
This miniseries expands big_decimal interface with convenience operators
(-=, +, -), provides test cases for it and makes one of the constructors
explicit.

Tests: unit(dev)
"

* 'expand_big_decimal_interface' of https://github.com/psarna/scylla:
  utils: make string-based big decimal constructor explicit
  tests: add more operators to big decimal tests
  utils: add operators to big_decimal
2019-07-06 12:26:08 +03:00
Avi Kivity
24caf0824d Merge "Complete the LIKE operator" from Dejan
"
Implement LIKE parsing, intermediate representation, and query processing. Add tests
for this implementation (leaving the LIKE functionality tests in
tests/like_matcher_test.cc).

Refs #4477.
"

* 'finish-like' of https://github.com/dekimir/scylla:
  cql3: Add LIKE operator to CQL grammar
  cql3: Ensure LIKE filtering for partition columns
  cql3: Add LIKE restriction
  cql3: Add LIKE relation
2019-07-06 12:26:08 +03:00
kbr-
8995945052 Implement tuple_type_impl::to_string_impl. (#4645)
Resolves #4633.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-06 12:26:08 +03:00
Avi Kivity
187859ad78 review-checklist: mention that the guidelines are not absolute rules and can be overridden 2019-07-06 12:26:08 +03:00
Kamil Braun
c0915c40eb Warn user about using SimpleStrategy with Multi DC deployment
If the user creates a keyspace with the 'SimpleStrategy' replication class
in a multi-datacenter environment, they will receive a warning in the CQL shell
and in the server logs.
Resolves #4481.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-05 09:25:03 +02:00
Kamil Braun
35dbe9371c Add warning support to the CQL binary protocol implementation
The CQL binary protocol v4 adds support for server-side warnings:
https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v4.spec
This adds a convenient API to add warnings to messages returned to the user.
Resolves #4651.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-05 09:24:56 +02:00
Kamil Braun
2f0f53ac72 Add logging of parsed command line options
The recognized command line options are now being printed when Scylla is run,
together with the whole command used.
Fixes #4203.
2019-07-05 09:00:28 +02:00
Piotr Sarna
eed2543bcc utils: make string-based big decimal constructor explicit
As a rule of thumb, single-parameter constructors should be explicit
in order to avoid unexpected implicit conversions.
2019-07-04 11:33:00 +02:00
Piotr Sarna
7e722f8dd5 tests: add more operators to big decimal tests 2019-07-04 11:32:57 +02:00
Piotr Sarna
a5e41408ec utils: add operators to big_decimal
For convenience, operators -=, + and - are implemented on top of +=.
2019-07-04 11:32:53 +02:00
Dejan Mircevski
6727e8f073 cql3: Add LIKE operator to CQL grammar
Extend the grammar with LIKE and add CQL query tests for it.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 11:01:13 +02:00
Dejan Mircevski
1c583de8bb cql3: Ensure LIKE filtering for partition columns
Partition columns are implicitly filtered whenever possible, avoiding
expensive post-processing.  But there are exceptions, eg, when
partition key is only partially restricted, or for CONTAINS
expressions.  Here we add LIKE to this list of exceptions.

Also fix compute_bounds() to punt on LIKE restrictions, which cannot
be translated into meaningful bounds.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 10:59:13 +02:00
Dejan Mircevski
63cec653e5 cql3: Add LIKE restriction
This restriction leverages like_matcher to perform filtering.

Make single_column_relation::new_LIKE_restriction() return this new
restriction.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 10:58:56 +02:00
Dejan Mircevski
21d7722594 cql3: Add LIKE relation
Add a new type of relation with operator LIKE.  Handle it in
relation::to_restriction by introducing a new virtual method for it.
The temporary implementation of this method returns null; that will be
replaced in a subsequent patch.

Add abstract_type::is_string() to recognize string columns and
disallow LIKE operator on non-string columns.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-07-04 10:54:30 +02:00
Kamil Braun
f155a2d334 Fix command line argument parsing in main.
Command line arguments are parsed twice in Scylla: once in main and once in Seastar's app_template::run.
The first parse is there to check if the "--version" flag is present --- in this case the version is printed
and the program exists.  The second parsing is correct; however, most of the arguments were improperly treated
as positional arguments during the first parsing (e.g., "--network host" would treat "host" as a positional argument).
This happened because the arguments weren't known to the command line parser.
This commit fixes the issue by moving the parsing code until after the arguments are registered.
Resolves #4141.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-03 14:11:34 +02:00
Avi Kivity
8a0c4d508a Merge "Repair switch to rpc stream" from Asias
"
The put_row_diff, get_row_dif and get_full_row_hashes verbs are switched
to use rpc stream instead of rpc verb. They are the verbs that could
send big rpc messages. The rpc stream sink and source are created per
repair follower for each of the above 3 verbs. The sink and source are
shared for multiple requests during the entire repair operation for a
given range, so there is no overhead to setup rpc stream.

The row buffer is now increased to 32MiB from 256KiB, giving better
bandwidth in high latency links. The downside of bigger row buffer is
reduced possibility that all the rows inside a row buffer are identical.
This causes more full hashes to be exchanged. To address this issue, the
plan is to add better set reconciliation algorithm in addition to the
current send full hashes.

I compared rebuild using regular stream plan with repair using rpc
stream. With 2 nodes, 1 smp, 8M rows, delete all data on one of the
node before repair or rebuild.

    repair using seastar rpc verb

Time to complete: 82.17s

    rebuild using regular streaming which uses seastar rpc stream

Time to complete: 63.87s

    repair using seastar rpc stream

Time to complete: 68.48s

For 1) and 3), the improvement is 16.6% (repair using rpc verb v.s. repair using rpc stream)

For 2) and 3), the difference is 7.2% (repair v.s. stream)

The result is promising for the future repair-based bootstrap/replace node operations.

NOTE: We do not actually enable rpc stream in row level repair for now. We
will enable it after we fix the the stall issues caused by handling
bigger row buffers.

Fixes #4581
"

* 'repair_switch_to_rpc_stream_v9' of https://github.com/asias/scylla: (45 commits)
  docs: Add RPC stream doc for row level repair
  repair: Mark some of the helper functions static
  repair: Increase max row buf size
  repair: Hook rpc stream version of verbs in row level repair
  repair: Add use_rpc_stream to repair_meta
  repair: Add is_rpc_stream_supported
  repair: Add needs_all_rows flag to put_row_diff
  repair: Optimize get_row_diff
  repair: Register repair_get_full_row_hashes_with_rpc_strea
  repair: Register repair_put_row_diff_with_rpc_stream
  repair: Register repair_get_row_diff_with_rpc_stream
  repair: Add repair_get_full_row_hashes_with_rpc_stream_handler
  repair: Add repair_put_row_diff_with_rpc_stream_handler
  repair: Add repair_get_row_diff_with_rpc_stream_handler
  repair: Add repair_get_full_row_hashes_with_rpc_stream_process_op
  repair: Add repair_put_row_diff_with_rpc_stream_process_op
  repair: Add repair_get_row_diff_with_rpc_stream_process_op
  repair: Add put_row_diff_with_rpc_stream
  repair: Add put_row_diff_sink_op
  repair: Add put_row_diff_source_op
  ...
2019-07-03 10:08:55 +03:00