Commit Graph

15289 Commits

Author SHA1 Message Date
Botond Dénes
ddd70dc113 Use dht::token_range alias for last/preferred replicas
Use the pre-existing type alias instead of fully spelling out the type
everywhere.
2018-05-10 06:22:39 +03:00
Botond Dénes
52affa2a61 storage_proxy::coordinator_query_result: merge constructors into one w/ default params 2018-05-10 06:22:39 +03:00
Botond Dénes
3b6f4e4901 querier: check only the end bound of ranges when matching them
The querier provides a `matches(const nonwrapping_range&)` member to
allow for checking whether a range matches the one with which the querier
was originally created. The check for a match is more lax than a strict
equality check, as ranges are shrunk as the query progresses.
Because of this, the above member only checked that one of the bounds of
the examined ranges matches. This is adequate for this purpose
because, in the context of a single query, it is guaranteed that no
two read requests to the same replica will have overlapping ranges.
However, Avi pointed out in a recent, related review that this check can
be made a little more strict by requiring that the end bounds of the
two ranges *always* match, instead of allowing any of the bounds to
match.
2018-05-10 06:22:39 +03:00
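The end-bound check described in 3b6f4e4901 can be illustrated with a minimal sketch. The `bound` and `range` types and the `matches_end_bound` helper below are hypothetical simplifications for illustration only; they are not the querier's or `nonwrapping_range`'s actual interface.

```cpp
#include <optional>

// Hypothetical, simplified stand-in for a range bound.
struct bound {
    int value;
    bool inclusive;
    bool operator==(const bound& o) const {
        return value == o.value && inclusive == o.inclusive;
    }
};

// Hypothetical, simplified stand-in for a non-wrapping range.
struct range {
    std::optional<bound> start; // nullopt means unbounded
    std::optional<bound> end;   // nullopt means unbounded
};

// Lax match: the start bound may have moved as the query progressed,
// so only the end bounds are required to be equal.
inline bool matches_end_bound(const range& original, const range& current) {
    return original.end == current.end;
}
```

A range whose start has advanced still matches as long as its end bound is unchanged; a range with a different end bound does not.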
Botond Dénes
eba90d0208 querier: take range and slice by value
It needs to copy these anyway so give callers the opportunity to move
these in.
2018-05-10 06:22:39 +03:00
Botond Dénes
546a0e292e querier: remove const params from make_compaction_state() 2018-05-10 06:22:39 +03:00
Botond Dénes
bc01833cad querier: make _range and _slice const
Since we are storing them on the heap we can make them const and still
be movable. We get the cake and can eat it too.
2018-05-10 06:22:39 +03:00
Botond Dénes
f5b012c952 flat_multi_range_mutation_reader: optimize for non-plural range vectors
Don't create a flat_multi_range_mutation_reader when the range vector
has 0 or 1 elements. In the former case create an empty reader, and in the
latter just create a reader from the mutation source with the only range
in the vector.
2018-05-10 06:22:39 +03:00
Botond Dénes
16319c2036 range: clean the deduced transformed type
wrapping_range and nonwrapping_range offer a transform() member function
which allows creating a new range by applying a transformer function to
the bounds of the current range. The type of the bounds of the new range is
deduced from the return type of this transformer function. However, the
return type is used as-is, with any CV qualifiers or reference attached to it.
Since it doesn't make sense to create a range of references or of a type
with CV qualifiers, strip these off the deduced type.
2018-05-10 06:22:39 +03:00
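The type clean-up described in 16319c2036 can be sketched with standard type traits. `simple_range` below is a hypothetical, minimal range type used only for illustration; the actual change applies to wrapping_range and nonwrapping_range, and the exact traits used there are not shown in the commit message.

```cpp
#include <type_traits>

// Hypothetical minimal range holding two bounds of type T.
template <typename T>
struct simple_range {
    T start;
    T end;

    // Applies `f` to both bounds. The bound type U of the result is the
    // return type of `f` with any reference and cv-qualifiers stripped,
    // so a transformer returning `const T&` still yields simple_range<T>.
    template <typename F,
              typename U = std::remove_cv_t<
                  std::remove_reference_t<std::invoke_result_t<F, const T&>>>>
    simple_range<U> transform(F&& f) const {
        return simple_range<U>{f(start), f(end)};
    }
};
```

Without the two `remove_*` steps, a transformer returning `const int&` would make the deduced bound type `const int&`, which is not a usable element type for a range.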
Avi Kivity
911c2e7953 Merge "Support Bloom filter format for SSTables 3.x." from Vladimir
"
In SSTables 3.0, the base and increment fields have been swapped in
Bloom filters to reduce collisions (see CASSANDRA-8413). This affects
the resulting values written to Filter.db.

This patchset adds support for reading/writing Filter.db in the format
corresponding to the version of SSTables.

Tests: unit {release}

Filter.db files have been generated using Cassandra 3.11 with same data
as in unit tests and are validated to match those generated by Scylla.
"

* 'projects/sstables-30/write-filter/v1-2' of https://github.com/argenet/scylla:
  Fix mistakes and typos in comments (minor clean-up)
  Check Filter.db in SSTables 3.x write tests.
  Support Bloom filter format used in SSTables 3.0.
  Remove unused overload of i_filter::get_filter().
2018-05-09 11:16:09 +03:00
Vladimir Krivopalov
51c8ea74d6 sstables: generate non-empty summaries for m format
Add summary entries as needed. Also removes the duplicate line that
assigned summary byte cost.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <0d387c68523bae0c121cb15ad1e651ee9a8e4b4a.1525732404.git.vladimir@scylladb.com>
2018-05-09 11:15:02 +03:00
Vladimir Krivopalov
b59549cd16 Fix mistakes and typos in comments (minor clean-up)
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-08 15:28:43 -07:00
Vladimir Krivopalov
e739bb3280 Check Filter.db in SSTables 3.x write tests.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-08 15:28:35 -07:00
Vladimir Krivopalov
0f37c0e684 Support Bloom filter format used in SSTables 3.0.
The two hash values, base and increment, used to produce indices for
setting bits in the filter, have been swapped in SSTables 3.0.
See CASSANDRA-8413 for details.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-08 15:28:27 -07:00
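The effect of swapping base and increment (0f37c0e684) can be sketched as follows. This is only an illustration of double hashing: `filter_format`, `bloom_indexes` and the mapping of the two input hashes to base and increment in each format are assumptions, not the filter's actual code.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical format tag; it only distinguishes the two layouts.
enum class filter_format { pre_3_0, v3_0 };

// Double hashing: the i-th bit index is (base + i * inc) mod num_bits.
// In this sketch the two input hashes swap roles for the 3.0 format.
inline std::vector<uint64_t> bloom_indexes(uint64_t h1, uint64_t h2,
        unsigned hash_count, uint64_t num_bits, filter_format fmt) {
    uint64_t base = h1;
    uint64_t inc = h2;
    if (fmt == filter_format::v3_0) {
        std::swap(base, inc);
    }
    std::vector<uint64_t> indexes;
    indexes.reserve(hash_count);
    for (unsigned i = 0; i < hash_count; ++i) {
        indexes.push_back((base + i * inc) % num_bits);
    }
    return indexes;
}
```

Because the same pair of hashes produces different bit positions in the two formats, a filter written in one format cannot be queried correctly with the other, which is why Filter.db has to follow the SSTable version.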
Vladimir Krivopalov
fe2358e8bd Remove unused overload of i_filter::get_filter().
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-08 15:28:18 -07:00
Calle Wilund
b2b1a1f7e1 database: Fix assert in truncate
Fixes crash in cql_tests.StorageProxyCQLTester.table_test
"avoid race condition when deleting sstable on behalf..." changed
discard_sstables behaviour to only return replay positions (rp) for
sstables owned and submitted for deletion (not all those matching the
timestamp), which can in some cases cause no rp to be returned.
Message-Id: <20180508070003.1110-1-calle@scylladb.com>
2018-05-08 22:29:21 +01:00
Vlad Zolotarov
48c96d09d6 db::hints::manager: drain hints when the node is decommissioned/removed
When a node is decommissioned/removed it will drain all its hints, and all
remote nodes that have hints for it will drain their hints to this node.

What does "drain" mean? - The node that "drains" hints to a specific
destination will ignore failures and will continue sending hints till the end
of the current segment, erase it and move to the next one till there are
no more segments left.

After all hints are drained the corresponding hints directory is removed.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-05-08 22:29:21 +01:00
Vlad Zolotarov
ec76f8a27d db::hints::manager: add a few more trace messages
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-05-08 22:29:21 +01:00
Vlad Zolotarov
6ede32156f db::hints::manager::end_point_hints_manager::sender: add set_stopping()/stopping() methods
It's nicer to have access methods instead of working directly with enum_set methods and values.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-05-08 22:29:21 +01:00
Vlad Zolotarov
94da744f37 db::hints::manager::end_point_hints_manager::stop(): log the last exception instead of forwarding it
Returning a future with an exception from end_point_manager::stop()
is practically useless because the best the caller can do is to log
it and continue as if it didn't happen because it has other things
to shut down.

Therefore in order to simplify the caller we will log the exception
if it happens and will always return a non-exceptional future.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-05-08 22:29:21 +01:00
Vlad Zolotarov
8aedbf9d18 db::hints: manager.hh: cleanup: fix the comments
Fix the comments that went out of sync with the current implementation.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-05-08 22:29:21 +01:00
Vlad Zolotarov
5463b58faa db::hints::manager: rework end_point_hints_manager::stop() to use seastar::async()
This makes the code easier to read and extend.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-05-08 22:29:21 +01:00
Botond Dénes
6f7d919470 database: when dropping a table evict all relevant queriers
Queriers shouldn't outlive the table they read from as that could lead
to use-after-free problems when they are destroyed.

Fixes: #3414

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <3d7172cef79bb52b7097596e1d4ebba3a6ff757e.1525716986.git.bdenes@scylladb.com>
2018-05-07 21:20:25 +03:00
Duarte Nunes
c053275a48 db/view/row_locking: Add timeout when waiting for the lock
This ensures we respect the write timeout set by the client when
applying base writes, in case a write takes too long to acquire the
row lock for the read-before-write phase of a materialized view
update.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180507132755.8751-1-duarte@scylladb.com>
2018-05-07 18:22:39 +01:00
Duarte Nunes
113294074d Merge seastar upstream
* seastar ac02df7...840002c (20):
  > dpdk: protect against missing statistics
  > alien: make visible in documentation
  > Merge "rewrite iotune to conform to the new ioscheduler" from Glauber
  > app_template: Correct outdated comment
  > apps, tests: Catch polymorphic exceptions by reference
  > configure.py: Enhance detection for gcc -fvisibility=hidden bug
  > reactor: add rudimentary task histogram reporting
  > Revert "Merge "rewrite iotune to conform to the new ioscheduler" from Glauber"
  > Merge "rewrite iotune to conform to the new ioscheduler" from Glauber
  > build: Use the same warning name for Clang and GCC
  > core/rwlock: Add support for timeouts
  > fs qualification: protect against EINTR
  > Docker: Fix failing build due to missing GNU make
  > reactor: move optional to experimental so we compile with c++14
  > future: remove allocation from future::get() thread context switch
  > Merge "rpc streaming" from Gleb
  > reactor: put mountpoint_params in seastar namespace
  > Tutorial: in PDF version of tutorial, better backtick typesetting
  > tutorial: support, and start using, links to other sections
  > tutorial: improve second half of semaphores section

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-05-07 18:22:39 +01:00
Avi Kivity
368e15a8e2 Update scylla-ami submodule
* dist/ami/files/scylla-ami 8a6e4dd...e0b35dc (1):
  > change default roles for EBS / ephemeral
2018-05-07 12:34:04 +03:00
Duarte Nunes
4b3562c3f5 db/view: Limit number of pending view updates
This patch adds a simple and naive mechanism to ensure a base replica
doesn't overwhelm a potentially overloaded view replica by sending too
many concurrent view updates. We add a semaphore to limit to 100 the
number of outstanding view updates. We limit globally per shard, and
not per destination view replica. We also limit statically.

Refs #2538

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180426134457.21290-2-duarte@scylladb.com>
2018-05-07 11:25:27 +03:00
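The static, per-shard limit described in 4b3562c3f5 can be sketched with a small counting limiter. `update_limiter` and `max_pending_view_updates` are hypothetical names; unlike a real semaphore, this sketch refuses a request instead of waiting, and it is not the Seastar semaphore the patch presumably relies on.

```cpp
#include <cstddef>

// Hypothetical stand-in for a per-shard semaphore. It is not thread-safe,
// mirroring a one-thread-per-shard model, and it refuses instead of waiting.
class update_limiter {
    std::size_t _available;
public:
    explicit update_limiter(std::size_t max_outstanding)
        : _available(max_outstanding) {}

    // Takes one unit if any is left; returns false when the limit is reached.
    bool try_acquire() {
        if (_available == 0) {
            return false;
        }
        --_available;
        return true;
    }

    // Returns one unit once a view update has completed.
    void release() { ++_available; }

    std::size_t available() const { return _available; }
};

// Static limit taken from the commit message.
constexpr std::size_t max_pending_view_updates = 100;
```

One limiter per shard covers all destination view replicas at once, which matches the "globally per shard, and not per destination view replica" choice in the message.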
Duarte Nunes
2be75bdfc9 db/timeout_clock: Properly scope type names
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180426134457.21290-1-duarte@scylladb.com>
2018-05-07 11:24:41 +03:00
Nadav Har'El
c93b56034d tests: improve usability of cql_assertions.hh error messages
The functions in cql_assertions.hh are very convenient, but have one
frustrating drawback: When you have many of those assertions in one
test, it's very hard to know *which* of the similar assertions failed.

The problem is that an error often looks like this:

unknown location(0): fatal error: in "test_many_columns":
std::runtime_error: Expected 2 row(s) but got 0
tests/cql_assertions.cc(131): last checkpoint

Which of the many similar checks in "test_many_columns" failed? Note the
unhelpful "unknown location" and also the "last checkpoint" points to code
in cql_assertions.cc, not in the actual test, so it is useless.

The root cause of these problems is that the Boost macros use the C
preprocessor's __FILE__ and __LINE__, which in actual C++ functions like
is_rows() record the location of the function itself instead of the
caller's. Fixing this will
not be simple. But this patch has a much simpler solution - fixing the
"last checkpoint". What ruins the last checkpoint is the use of BOOST_REQUIRE
inside the cql_assertions.cc is_rows() - when that succeeds, it records
the location inside cql_assertions.cc (!) as the last success.

If we just replace BOOST_REQUIRE by our own test (just like in the rest of
the cql_assertions.cc code), this code will not override the last checkpoint.
The user can see the last real successful BOOST_REQUIRE, or use
BOOST_TEST_PASSPOINT() to set their own checkpoints between different parts of
the same test.

After this patch, and with adding BOOST_TEST_PASSPOINT() calls between
different parts of my test, the failure above now looks like:

unknown location(0): fatal error: in "test_many_columns":
std::runtime_error: Expected 2 row(s) but got 0
tests/secondary_index_test.cc(299): last checkpoint

The "last checkpoint" now shows me exactly where my failing check was.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180501152638.26238-1-nyh@scylladb.com>
2018-05-07 09:19:45 +01:00
Duarte Nunes
eabe471ce8 tests/secondary_index_test: Don't catch polymorphic exceptions by value
Don't slice exceptions by catching them by value. Rather than switching
to catching by reference, use assert_that_failed().

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180506153745.4512-1-duarte@scylladb.com>
2018-05-06 18:53:40 +03:00
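The slicing problem behind eabe471ce8 can be shown with a small sketch. `base_error` and `derived_error` are hypothetical exception types. The first helper copies into a base object, which is the same copy that `catch (base_error e)` performs; the second catches by const reference.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical polymorphic exception hierarchy.
struct base_error : std::runtime_error {
    explicit base_error(const std::string& msg) : std::runtime_error(msg) {}
    virtual std::string kind() const { return "base"; }
};

struct derived_error : base_error {
    explicit derived_error(const std::string& msg) : base_error(msg) {}
    std::string kind() const override { return "derived"; }
};

// Copying into a base_error object keeps only the base part (slicing),
// so the override in derived_error is no longer reachable.
inline std::string kind_after_copy_to_base() {
    derived_error d("boom");
    base_error sliced = d;
    return sliced.kind();
}

// Catching by const reference preserves the dynamic type.
inline std::string kind_caught_by_reference() {
    std::string kind;
    try {
        throw derived_error("boom");
    } catch (const base_error& e) {
        kind = e.kind();
    }
    return kind;
}
```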
Duarte Nunes
ab5a45b00c Merge 'Improve debuggability of result_message' from Avi
"This patchset adds ostream operators to result_message and uses them
in cql_assertions."

* tag 'result_message-print/v1.1' of https://github.com/avikivity/scylla:
  tests: cql_assersions: improve error message when a row is not found
  transport: add ostream support to result_message
  transport: const correctness for result_message::accept()
2018-05-06 14:52:56 +01:00
Avi Kivity
6d3fb69827 tests: cql_assersions: improve error message when a row is not found
Display the row and the result set.
2018-05-06 16:28:37 +03:00
Avi Kivity
07d69ebce2 transport: add ostream support to result_message
Allow printing result_message:s for debugging.
2018-05-06 16:28:35 +03:00
Avi Kivity
50d4d01cb7 tests: fix view_schema_test cql_assertion types
Use utf8_type where warranted.

Fixes view_schema_test failure where the rows did not match. I don't
understand exactly why the failure happened (using the wrong type
should not cause a failure here), but the change fixes the problem.

Tests: view_schema_test (release)
Message-Id: <20180506130015.7450-1-avi@scylladb.com>
2018-05-06 14:25:22 +01:00
Avi Kivity
31f2b3ce15 transport: const correctness for result_message::accept()
The visitor does not alter the result_message it is visiting (and
its signature indicates that) so accept() should be const-qualified
to indicate that and to allow visiting const result_message:s.
2018-05-06 15:51:48 +03:00
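The const-correctness point of 31f2b3ce15 can be sketched with a minimal visitor. All types below (`message`, `rows_message`, `void_message`, `name_visitor`) are hypothetical stand-ins and do not reproduce the real result_message hierarchy.

```cpp
#include <string>

struct rows_message;
struct void_message;

// The visitor takes const references: it promises not to modify what it visits.
struct visitor {
    virtual ~visitor() = default;
    virtual void visit(const rows_message&) = 0;
    virtual void visit(const void_message&) = 0;
};

struct message {
    virtual ~message() = default;
    // const-qualified, so it can be called on a const message.
    virtual void accept(visitor& v) const = 0;
};

struct rows_message : message {
    void accept(visitor& v) const override { v.visit(*this); }
};

struct void_message : message {
    void accept(visitor& v) const override { v.visit(*this); }
};

// Example visitor that records the name of the visited message type.
struct name_visitor : visitor {
    std::string name;
    void visit(const rows_message&) override { name = "rows"; }
    void visit(const void_message&) override { name = "void"; }
};

// Works only because accept() is const: `m` is a const reference.
inline std::string describe(const message& m) {
    name_visitor v;
    m.accept(v);
    return v.name;
}
```

If `accept()` were not const-qualified, `describe()` would fail to compile, even though the visitor never modifies the message.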
Avi Kivity
cc900c23a6 Merge "Write Statistics.db in SSTables 3.x format." from Vladimir
"
This patchset adds support for writing Statistics.db in the SSTables
'mc' (3.x) format. This file is essential for reading data stored in
Data.db as it contains base values used for delta encoding and types of
columns.

This patchset also fixes several bugs found in writing data and index
files as well as bugs in a statistics-related structure definition.

Tests: unit {debug, release}

All SSTables files for write unit tests are validated to be processed by
sstabledump and output is verified to show the expected data.
"

* 'projects/sstables-30/write-statistics/v1' of https://github.com/argenet/scylla:
  Add test covering the composite partition key case.
  Add Statistics.db files to write tests for SSTables 3.0.
  Do not check rows and cells for expiration when writing them to the data file.
  Fix promoted index serialization.
  Fix the order of items in stats_metadata.
  Fix timestamp_epoch value which was truncated on exceeding int32_t type limit.
  Write serialization header to Statistics.db for SSTables 3.x.
  Do not pass schema to metadata_collector::update(column_stats)
  Collect metadata statistics when writing SSTables 3.0.
  Call get_metadata_collector() instead of referencing sstable::_collector directly.
  Fix logic of writing TTLed cells in SSTable 3.0 format.
  Separate statistics for count of cells, columns and rows in column_stats.
  Deserialize collection in a way that doesn't incur shared_ptr counter increment and is generally shorter.
  Track both min & max values for timestamp, TTL and local deletion time in metadata_collector.
  Add class for tracking both extremum values (min and max) on updates.
2018-05-05 16:53:08 +03:00
Vladimir Krivopalov
4ecb3a5e2a Add test covering the composite partition key case.
Mainly to check that the composite type is properly serialized when
writing serialization header to Statistics.db.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-04 15:45:11 -07:00
Vladimir Krivopalov
1b3989adcd Add Statistics.db files to write tests for SSTables 3.0.
For these tests to work, all time-related values are now fixed as these
are stored in Statistics.db files.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-04 15:45:11 -07:00
Vladimir Krivopalov
293ee6ae3f Do not check rows and cells for expiration when writing them to the data file.
Although this logic may be seen as a useful optimization, it hinders
unit tests writing SSTables 3.0 as those need to have fixed time-related
values to produce Statistics.db files with the same content on each run.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-04 15:45:11 -07:00
Vladimir Krivopalov
44bc0f1493 Fix promoted index serialization.
There is a new field introduced in the SSTables 3.0 index file format
named 'partition_header_length' that can be used to skip over to the
first clustering row in a wide partition. It was not previously
written, which caused malformed indices.

Updated the corresponding test to include a static row and write
multiple wide partitions.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-04 15:45:10 -07:00
Vladimir Krivopalov
56ac941a2e Fix the order of items in stats_metadata.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-04 15:45:10 -07:00
Vladimir Krivopalov
926cdc6d70 Fix timestamp_epoch value which was truncated on exceeding int32_t type limit.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-04 15:45:10 -07:00
Vladimir Krivopalov
5db6002720 Write serialization header to Statistics.db for SSTables 3.x.
The serialization header is a new component in Statistics.db introduced in
the SSTables 3.0 ('ma') format. It is essential for reading the data file as it
contains the base values used for delta-encoded values (timestamps,
TTLs, local deletion times) and the description of column types.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-04 15:43:17 -07:00
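Why a reader needs the base values from the serialization header (5db6002720) can be seen in a minimal delta-encoding sketch. `encode_delta` and `decode_delta` are hypothetical helpers; the sketch deliberately ignores the actual on-disk integer encoding.

```cpp
#include <cstdint>

// Stores a value as its distance from a base value. The arithmetic is done
// on unsigned integers so that it is well defined for any pair of inputs.
inline uint64_t encode_delta(int64_t value, int64_t base) {
    return static_cast<uint64_t>(value) - static_cast<uint64_t>(base);
}

// Recovers the original value; this is impossible without the same base,
// which is why the base has to be stored alongside the data.
inline int64_t decode_delta(uint64_t delta, int64_t base) {
    return static_cast<int64_t>(static_cast<uint64_t>(base) + delta);
}
```

For example, with a base of 1525000000 the timestamp 1525732404 is stored as 732404, which is much smaller than the full value.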
Vladimir Krivopalov
6e4601d177 Do not pass schema to metadata_collector::update(column_stats)
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-04 15:22:32 -07:00
Vladimir Krivopalov
a10ad6b623 Collect metadata statistics when writing SSTables 3.0.
Track min/max timestamps, TTLs, local deletion times and count of cells,
columns and rows.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-04 15:22:30 -07:00
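Tracking both extremes of a value on every update, as a10ad6b623 does for timestamps, TTLs and local deletion times, can be sketched like this. `min_max_tracker` is a hypothetical name and interface, not the class added in the series.

```cpp
#include <algorithm>
#include <limits>

// Hypothetical tracker remembering the smallest and largest value seen.
template <typename T>
class min_max_tracker {
    // Start from the opposite extremes so the first update sets both.
    T _min = std::numeric_limits<T>::max();
    T _max = std::numeric_limits<T>::lowest();
public:
    void update(T value) {
        _min = std::min(_min, value);
        _max = std::max(_max, value);
    }
    T min() const { return _min; }
    T max() const { return _max; }
};
```

A single `update()` call per written cell is enough to keep both bounds current, so no second pass over the data is needed when the statistics are written out.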
Raphael S. Carvalho
abcfc19fe9 db: make compaction slightly faster by not using filtering reader on unshared sstable
After reboot, all existing sstables are considered shared. That's a safe default.
Reader used by compaction decides to use filtering reader (filters out data that
doesn't belong to this shard) if sstable is considered shared even though it may
actually be unshared.
By avoiding filtering reader we're avoiding an extra check for each key, and that
may be meaningful for compaction of tons of small partitions and even range
reads of such. We do so by fixing sstable::_shared, which is now set properly for
existing sstables at start.

quick check using microbenchmark which extends perf_sstable with compaction mode:
before: 69407.61 +- 37.03 partitions / sec (30 runs, 1 concurrent ops)
after: 70161.09 +- 40.35 partitions / sec (30 runs, 1 concurrent ops)

Fixes #3042.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180504182158.21130-1-raphaelsc@scylladb.com>
2018-05-04 19:34:09 +01:00
Raphael S. Carvalho
b65bc511fe sstables/compaction_manager: log user initiated compaction
Sometimes it's hard to figure out from the log whether the user ran a
major compaction.

Fixes #1303.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180504181047.20277-1-raphaelsc@scylladb.com>
2018-05-04 19:15:58 +01:00
Duarte Nunes
7916368df8 Merge "Introduce system.large_partitions table" from Piotr
"
This series introduces a system.large_partitions table,
used to gather information on the largest partitions in the cluster.

Schema below allows easy extraction of most offending keys and removal
by sstable name, which happens when a table is compacted away.

Schema: (
  keyspace_name text,
  table_name text,
  sstable_name text,
  partition_size bigint,
  key text,
  compaction_time timestamp,
  PRIMARY KEY((keyspace_name, table_name), sstable_name, partition_size, key)
) WITH CLUSTERING ORDER BY (partition_size DESC);
"

Closes #3292.

* 'large_partition_table_3' of https://github.com/psarna/scylla:
  database, sstables, tests: add large_partition_handler
  db: add large_partition_handler interface with implementations
  docs: init system_keyspace entry with system.large_partitions
  db: add system.large_partitions table
2018-05-04 18:18:50 +01:00
Piotr Sarna
bc019205b3 schema: fix typos in a comment
Message-Id: <2b2a169e8a511fa9e0e1556ac7559ce9bef896e1.1525431353.git.sarna@scylladb.com>
2018-05-04 15:26:51 +01:00
Piotr Sarna
fe02c3d0e2 database, sstables, tests: add large_partition_handler
This commit makes database, sstables and tests aware
of which large_partition_handler they use.
Proper large_partition_handler is retrievable from config information
and is based on existing compaction_large_partition_warning_threshold_mb
entry. Right now CQL TABLE variant of large_partition_handler is used
in the database.

Tests use a NOP version of large_partition_handler, which does not
depend on CQL queries at all.
2018-05-04 14:38:13 +02:00
Piotr Sarna
14b3c7e7e7 db: add large_partition_handler interface with implementations
This commit introduces large_partition_handler class, which can be used
to take additional action when large partitions are written.

It comes with two implementations:
 * NOP, used in tests, which does nothing on large partition
   update/delete
 * CQL TABLE, which inserts/deletes information on particular sstable
   to system.large_partitions table, in order to be retrievable from
   cqlsh later.

References #3292
2018-05-04 12:46:31 +02:00
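The interface-plus-two-implementations design from 14b3c7e7e7 can be sketched as below. All names and signatures are hypothetical: `recording_handler` merely stands in for the CQL TABLE variant by keeping entries in memory, and the threshold check is an assumption based on the warning-threshold configuration entry mentioned in fe02c3d0e2.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical base class: decides when a partition counts as large and
// delegates the actual action to the implementation.
class large_partition_handler {
    uint64_t _threshold_bytes;
public:
    explicit large_partition_handler(uint64_t threshold_bytes)
        : _threshold_bytes(threshold_bytes) {}
    virtual ~large_partition_handler() = default;

    // Calls the implementation hook only for partitions above the threshold.
    void maybe_update(const std::string& sstable, const std::string& key,
                      uint64_t size) {
        if (size > _threshold_bytes) {
            update(sstable, key, size);
        }
    }
protected:
    virtual void update(const std::string& sstable, const std::string& key,
                        uint64_t size) = 0;
};

// NOP variant, as used in tests: ignores every large partition.
class nop_handler : public large_partition_handler {
public:
    using large_partition_handler::large_partition_handler;
protected:
    void update(const std::string&, const std::string&, uint64_t) override {}
};

// Stand-in for the CQL TABLE variant: records entries in memory instead of
// inserting them into system.large_partitions.
class recording_handler : public large_partition_handler {
public:
    struct entry {
        std::string sstable;
        std::string key;
        uint64_t size;
    };
    std::vector<entry> entries;
    using large_partition_handler::large_partition_handler;
protected:
    void update(const std::string& sstable, const std::string& key,
                uint64_t size) override {
        entries.push_back(entry{sstable, key, size});
    }
};
```

Keeping the threshold check in the base class means tests can swap in the NOP variant without touching the call sites.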