Commit Graph

16994 Commits

Author SHA1 Message Date
Raphael S. Carvalho
fc92fb955d sstables/compaction_manager: release reference to exhausted sstable through callback
That's important for the reference to sstable to not be kept throughout
the compaction procedure, which would break the goal of releasing
space during compaction.

Manager passes a callback to compaction which calls it whenever
there's sstable replacement.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:16 -02:00
Raphael S. Carvalho
3f309ebba9 sstables/compaction: stop tracking exhausted input sstable in compaction_read_monitor
Motivation is that we want to release space for exhausted sstable and that
will only happen when all references to it are gone *and* that backlog
tracker takes the early replacement into account.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:13 -02:00
Raphael S. Carvalho
3433de3dc0 database: do not keep reference to sstable in selector when done selecting
When compacting, we'll create all readers at once and will not select
again from incremental selector, meaning the selector will keep all
respective sstables in current_sstables, preventing compaction from
releasing space as it goes on.

The change is about refreshing sstable set's selector such that it
will not hold a reference to an exhausted sstable whatsoever.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:12 -02:00
Raphael S. Carvalho
f6df949c1a compaction: share sstable set with incremental reader selector
By doing that, we'll be able to release exhausted sstable from both
simultaneously.
That's achieved by sharing set containing input sstables with the incremental
reader selector and removing exhausted sstables from shared set when the
time has come.

Step towards reducing disk requirement for compaction by making it delete
an sstable whose data is all in a sealed new sstable. For that to happen,
all references must be gone.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:10 -02:00
Raphael S. Carvalho
e5a0b05c15 sstables/compaction: release space earlier of exhausted input sstables
Currently, compaction only replaces input sstables at the end of compaction,
meaning compaction must be finished for all the space of those sstables
to be released.

What we can do instead is to delete some input sstables earlier under
some conditions:

1) SStable data should be committed to a new, sealed output sstable,
meaning it's exhausted.
2) Exhausted sstable mustn't overlap with a non-exhausted sstable
because a tombstone in the exhausted could have been purged and the
shadowed data in non-exhausted could be resurrected if system
crashes.
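
The two conditions above can be read as a simple predicate. The following is a minimal sketch only, not the code from this patch: `sstable_info`, `overlaps` and `can_release_early` are hypothetical names, and key ranges are simplified to inclusive integer intervals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of an input sstable: an inclusive key
// range plus a flag saying whether all of its data already lives in a
// sealed output sstable.
struct sstable_info {
    int64_t first_key;
    int64_t last_key;
    bool exhausted;
};

// Two inclusive ranges overlap when neither ends before the other starts.
inline bool overlaps(const sstable_info& a, const sstable_info& b) {
    return a.first_key <= b.last_key && b.first_key <= a.last_key;
}

// An input sstable may be released early only if (1) it is exhausted and
// (2) it overlaps no non-exhausted input sstable.
inline bool can_release_early(const std::vector<sstable_info>& inputs, std::size_t idx) {
    assert(idx < inputs.size());
    const sstable_info& candidate = inputs[idx];
    if (!candidate.exhausted) {
        return false;
    }
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        if (i != idx && !inputs[i].exhausted && overlaps(candidate, inputs[i])) {
            return false;
        }
    }
    return true;
}
```

The overlap check is what protects against the resurrection scenario: an exhausted sstable stays on disk as long as a non-exhausted one could still hold data that it shadows.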

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:07 -02:00
Raphael S. Carvalho
ace070c8fc sstables: make partitioned sstable set's incremental selector resilient to changes in the set
The motivation is that compaction may remove a sstable from the set while the
incremental selector is alive, and for that to work, we need to invalidate
the iterators stored by the selector. We could have added a method to notify
it, but there will be a case where the one keeping the set cannot forward
the notification to the selector. So it's better for the selector to take
care of itself. Change counter approach is used which allows the selector
to know when to invalidate the iterators.

After invalidation, selector will move the iterator back into its right
place by looking for lower bound for current pos.
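
A change-counter scheme of this kind might look roughly like the sketch below. It is an illustration under stated assumptions, not the patch itself: `tracked_set` and `incremental_selector` are hypothetical classes, and sstables are reduced to integer keys held in a `std::set`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <set>

// Hypothetical stand-in for the sstable set: an ordered set of keys plus
// a counter that is bumped on every structural change.
class tracked_set {
    std::set<int> _items;
    uint64_t _change_count = 0;
public:
    void insert(int v) { if (_items.insert(v).second) { ++_change_count; } }
    void erase(int v) { if (_items.erase(v) > 0) { ++_change_count; } }
    uint64_t change_count() const { return _change_count; }
    const std::set<int>& items() const { return _items; }
};

// Hypothetical selector that walks the set incrementally. It remembers
// the change counter it last saw; on a mismatch it drops its iterator
// and re-seeks with lower_bound() on the current position.
class incremental_selector {
    const tracked_set& _set;
    std::set<int>::const_iterator _it;
    uint64_t _seen_changes;
    int _pos = std::numeric_limits<int>::min();  // lowest key not yet returned
public:
    explicit incremental_selector(const tracked_set& s)
        : _set(s), _it(s.items().begin()), _seen_changes(s.change_count()) {}

    // Returns true and stores the next key in `out`, or false at the end.
    // Assumes keys are smaller than INT_MAX so that `out + 1` cannot overflow.
    bool next(int& out) {
        if (_seen_changes != _set.change_count()) {
            // The set changed: the stored iterator may be dangling, so
            // re-seek by value instead of dereferencing it.
            _it = _set.items().lower_bound(_pos);
            _seen_changes = _set.change_count();
        }
        if (_it == _set.items().end()) {
            return false;
        }
        out = *_it;
        ++_it;
        _pos = out + 1;
        return true;
    }
};
```

The owner of the set never has to notify the selector; comparing two counters on each call is enough for the selector to notice a change on its own.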

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:05 -02:00
Raphael S. Carvalho
8d11b0bbb4 database: do not store reference to sstable in incremental selector
Use sstable generation instead to keep track of read sstables.
The motivation is that we'll not keep references to sstables, thus allowing
their space on disk to be released as soon as they get exhausted.
Generation is used because it guarantees uniqueness of the sstable.
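
As a rough illustration of the idea (not the actual selector code), tracking by generation could be sketched as follows. `sstable` and `read_tracker` are hypothetical types invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_set>

// Hypothetical minimal sstable: only the generation matters here.
struct sstable {
    int64_t generation;
};

// Remembers which sstables were already read by storing their generation
// numbers, so the tracker itself never extends an sstable's lifetime.
class read_tracker {
    std::unordered_set<int64_t> _read_generations;
public:
    // Returns true the first time a given sstable is seen.
    bool mark_read(const std::shared_ptr<sstable>& sst) {
        assert(sst);
        return _read_generations.insert(sst->generation).second;
    }
    bool was_read(int64_t generation) const {
        return _read_generations.count(generation) > 0;
    }
};
```

Because only an integer is stored, the last owner of the `shared_ptr` can drop the sstable while the tracker still knows it was read.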

Reviewed-by: Botond Dénes <bdenes@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:04 -02:00
Raphael S. Carvalho
edc87014c1 tests/sstables: add run identifier correctness test
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:02 -02:00
Raphael S. Carvalho
a66b1954cc sstables: use a random uuid for sstables without run identifier
Older sstables must have an identifier for them to be associated
with their own run.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:01 -02:00
Raphael S. Carvalho
62025fa52c sstables: add run identifier to scylla metadata
It identifies a run which a particular sstable belongs to.
Existing sstables will have a random uuid associated with them
in memory.

UUID is the correct choice because it allows sstables to be
exported without having conflicts when using identifier generated
by different nodes.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:52:44 -02:00
Rafael Ávila de Espíndola
d18bbe9d45 Remove unreachable default cases.
These switches are fully covered. We can be sure they will stay this
way because of -Werror and gcc's -Wswitch warning.

We can also be sure that we never have an invalid enum value since the
state machine values are not read from disk.

The patch also removes a superfluous ';'.
Message-Id: <20181124020128.111083-1-espindola@scylladb.com>
2018-11-24 09:31:51 +00:00
Raphael S. Carvalho
d29482dce8 sstables: deprecate sstable metadata's ancestors
The reason for that is that it's not available in sstable format mc,
so we can no longer rely on it in common code for the currently
supported formats.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181121170057.20900-1-raphaelsc@scylladb.com>
2018-11-23 19:38:32 +01:00
Tomasz Grabiec
564b328b2e Merge 'Add tests for schema changes' from Paweł
This series adds a generic test for schema changes that generates
various schema and data before and after an ALTER TABLE operation. It is
then used to check correctness of mutation::upgrade() and sstable
readers and led to the discovery of #3924 and #3925.

Fixes #3925.

* https://github.com/pdziepak/scylla.git schema-change-test/v3.1
  schema_builder: make member function names less confusing
  converting_mutation_partition_applier: fix collection type changes
  converting_mutation_partition_applier: do not emit empty collections
  sstable: use format() instead of sprint()
  tests/random-utils: make functions and variables inline
  tests: add models for schemas and data
  tests: generate schema changes
  tests/mutation: add test for schema changes
  tests/sstable: add test for schema changes
2018-11-23 15:11:31 +01:00
Paweł Dziepak
09439cd809 tests/sstable: add test for schema changes
for_each_schema_change() is used for testing reading an sstable that was
written with a different schema. Because of #3924, for now the mc format
is not verified this way.
2018-11-23 12:14:06 +00:00
Paweł Dziepak
dc7f9fea5b tests/mutation: add test for schema changes
2018-11-23 12:14:06 +00:00
Paweł Dziepak
35f9f424e9 tests: generate schema changes
This patch adds for_each_schema_change() functions which generate
schemas and data before and after some modification to the schema (e.g.
adding a column, changing its type). It can be used to test schema
upgrades.
2018-11-23 12:14:06 +00:00
Paweł Dziepak
daee4bd3b8 tests: add models for schemas and data
This patch introduces a model of Scylla schemas and data, implemented
using simple standard library primitives. It can be used for testing the
actual schemas, mutation_partitions, etc. used by the schema by
comparing the results of various actions.

The initial use case for this model was to test schema changes, but
there is no reason why in the future it cannot be extended to test other
things as well.
2018-11-23 12:14:06 +00:00
Takuya ASADA
cf0d00b81a dist/ami: fix 'unknown configuration key: "enhanced_networking"' error while building AMI
packer 1.3.2 no longer supports the enhanced_networking directive, so we need
to use the new directives ("sriov_support" and "ena_support") to build with
the new version.
packer provides an automatic configuration file fixing tool, so the new
scylla.json is generated by the following command:
 ./packer/packer fix scylla.json

Fixes #3938

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181123053719.32451-1-syuu@scylladb.com>
2018-11-23 08:15:47 +02:00
Paweł Dziepak
91793c0a43 bytes_ostream: drop appending_hash specialisation
appending_hash is used for computing hashes that become part of the
binary interface. They cannot change between Scylla versions and the same
data needs to always result in the same hash.

At the moment, appending_hash<bytes_ostream> doesn't fulfil those
requirements since it leaks information about how the underlying buffer is
fragmented. Fortunately, it has no users so it doesn't cause any
compatibility issues.
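
To illustrate what "leaking fragmentation" can mean, here is a hypothetical sketch. It does not reproduce the removed specialisation; `fnv1a`, `hash_bytes_only` and `hash_with_fragment_sizes` are made-up names, and FNV-1a is used only because it is a short streaming hash.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a (64-bit), used here only as a simple streaming hash.
struct fnv1a {
    uint64_t state = 14695981039346656037ULL;
    void update(const std::string& bytes) {
        for (unsigned char c : bytes) {
            state ^= c;
            state *= 1099511628211ULL;
        }
    }
};

// Fragmentation-independent: only payload bytes reach the hasher, so the
// result is the same however the data is split into fragments.
inline uint64_t hash_bytes_only(const std::vector<std::string>& fragments) {
    fnv1a h;
    for (const auto& f : fragments) {
        h.update(f);
    }
    return h.state;
}

// Fragmentation-dependent: each fragment's length is mixed in as well, so
// the same logical data can hash differently depending on the buffer layout.
inline uint64_t hash_with_fragment_sizes(const std::vector<std::string>& fragments) {
    fnv1a h;
    for (const auto& f : fragments) {
        h.update(std::to_string(f.size()));
        h.update(f);
    }
    return h.state;
}
```

A hash that becomes part of a binary interface needs the first property: equal logical content must always produce an equal hash.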

Moreover, bytes_ostream is usually used as an output of some
serialisation routine (e.g. frozen_mutation_fragment or CQL response).
Those serialisation formats do not guarantee that there is a single
representation of a given data and therefore are not fit to be hashed by
appending_hash. Removing appending_hash<bytes_ostream> may help
preventing such incorrect uses.
Message-Id: <20181122163823.12759-1-pdziepak@scylladb.com>
2018-11-22 23:53:54 +00:00
Tomasz Grabiec
fb38f0e9f8 Update seastar submodule
* seastar b924495...1fbb633 (3):
  > rpc: Reduce code duplication
  > tests: perf: Make do_not_optimize() take the argument by const&
  > doc: Fix import paths in the tutorial
2018-11-22 23:53:54 +00:00
Paweł Dziepak
2a0e929830 tests/random-utils: make functions and variables inline
random-utils.hh is a header which may be included in multiple
translation units so all members should be non-static inline to avoid
any duplication.
2018-11-22 11:30:31 +00:00
Paweł Dziepak
edb5402a73 sstable: use format() instead of sprint()
The format message was using the new style formatting markers ("{}")
which are understood by format() but not by sprint() (the latter is
basically deprecated).
2018-11-22 11:30:31 +00:00
Paweł Dziepak
1fbe33791d converting_mutation_partition_applier: do not emit empty collections
This patch changes the behaviour of the schema upgrade code so that if
all cells and the tombstones of a collection are removed during the upgrade
the collection is not emitted (as opposed to emitting an empty one).
Both behaviours are valid, but the new one makes it more consistent with
how atomic cells are upgraded and how schema upgrades work for sstable
readers.
2018-11-22 11:30:31 +00:00
Paweł Dziepak
7b12aaa093 converting_mutation_partition_applier: fix collection type changes
ALTER TABLE allows changing the type of a collection to a compatible
one. This includes changes from a fixed-sized type to a variable-sized
one. If that happens the atomic_cells representing collection elements
need to be rewritten so that the value size is included. The logic for
rewriting atomic cells already exists (for those that are not
collection members) and is reused in this patch.

Fixes #3925.
2018-11-22 11:30:31 +00:00
Paweł Dziepak
43e0201ec6 schema_builder: make member function names less confusing
Right now, schema_builder member functions have names that very poorly
convey the actions that are performed for them. This is made even worse
by some overloads which drastically change the semantics. For example:

    schema_builder()
        .with_column("v1", /* ... */)
        .without_column("v1", removal_timestamp);

Creates a column "v1" and adds an information that there was a column
with that name that was removed at 'removal_timestamp'.

    schema_builder()
        .with_column("v1")
        .without_column(utf8_type->decompose("v1"));

This adds column "v1" and then immediately removes it.

In order to clean up this mess the names were changed so that:
 * with_/without_ functions only add information to the schema (e.g.
   info that a column was removed, but without removing a column of that
   name if one exists)
 * functions whose names start with a verb actually perform that action,
   e.g. the new remove_column() removes the column (and adds information
   that it used to exist) as in the second example.
2018-11-22 11:30:31 +00:00
Benny Halevy
dcd18e2b62 remove exec permission from top_k source files
This was introduced by 32525f2694

Cc: Rafi Einstein <rafie@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181121163352.13325-1-bhalevy@scylladb.com>
2018-11-21 18:38:50 +02:00
Gleb Natapov
b4a8802edc hints: make hints manager more resilient to unexpected directory content
Currently if hints directory contains unexpected directories Scylla fails to
start with unhandled std::invalid_argument exception. Make the manager
ignore malformed files instead and try to proceed anyway.
Message-Id: <20181121134618.29936-2-gleb@scylladb.com>
2018-11-21 14:53:03 +00:00
Gleb Natapov
9433d02624 hints: add auxiliary function for scanning high level hints directory
We scan hints directory in two places: to search for files to replay and
to search for directories to remove after resharding. The code that
translates directory name to a shard is duplicated. It is simple now, so
not a big issue, but in case it grows it is better to have it in one place.
Message-Id: <20181121134618.29936-1-gleb@scylladb.com>
2018-11-21 14:53:03 +00:00
Paweł Dziepak
4aa5d83590 Merge "Optimize sstable writing of the MC format" from Tomasz
"
Tested with perf_fast_forward from:

  github.com/tgrabiec/scylla.git perf_fast_forward-for-sst3-opt-write-v1

Using the following command line:

  build/release/tests/perf/perf_fast_forward_g --populate --sstable-format=mc \
     --data-directory /tmp/perf-mc --rows=10000000 -c1 -m4G \
     --datasets small-part

The average reported flush throughput was (stdev for the averages is around 4k):
  - for mc before the series: 367848 frag/s
  - for lc before the series: 463458 frag/s (= mc.before +25%)
  - for mc after the series: 429276 frag/s (= mc.before +16%)
  - for lc after the series: 466495 frag/s (= mc.before +26%)

Refs #3874.
"

* tag 'sst3-opt-write-v2' of github.com:tgrabiec/scylla:
  sstables: mc: Avoid serialization of promoted index when empty
  sstables: mc: Avoid double serialization of rows
  tests: sstable 3.x: Do not compare Statistics component
  utils: Introduce memory_data_sink
  schema: Optimize column count getters
  sstables: checksummed_file_data_sink_impl: Bypass output_stream
2018-11-21 13:11:40 +00:00
Tomasz Grabiec
049926bfb8 sstables: mc: Avoid serialization of promoted index when empty
calculate_write_size() adds some overhead, even if we're not going to
write anything.
2018-11-21 14:04:27 +01:00
Tomasz Grabiec
0a9f5b563a sstables: mc: Avoid double serialization of rows
The old code was serializing the row twice. Once to get the size of
its block on disk, which is needed to write the block length, and then
to actually write the block.

This patch avoids this by serializing once into a temporary buffer and
then appending that buffer to the data file writer.

I measured about 10% improvement in memtable flush throughput with
this for the small-part dataset in perf_fast_forward.
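
The serialize-once pattern described above can be sketched as follows. This is a simplified illustration, not the mc writer: `serialize_row` and `write_row_block` are hypothetical, the row is a plain string, and the length prefix is a fixed 4-byte big-endian integer chosen only to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using buffer = std::vector<uint8_t>;

// Hypothetical row encoding: just the raw bytes of the value.
inline void serialize_row(const std::string& row, buffer& out) {
    out.insert(out.end(), row.begin(), row.end());
}

// Writes a length-prefixed block. The row is serialized exactly once into
// a temporary buffer; its size gives the length prefix, and the same
// bytes are then appended to the output.
inline void write_row_block(const std::string& row, buffer& file) {
    buffer tmp;
    serialize_row(row, tmp);
    assert(tmp.size() <= UINT32_MAX);
    const uint32_t len = static_cast<uint32_t>(tmp.size());
    // Fixed 4-byte big-endian length prefix (an assumption of this sketch).
    for (int shift = 24; shift >= 0; shift -= 8) {
        file.push_back(static_cast<uint8_t>(len >> shift));
    }
    file.insert(file.end(), tmp.begin(), tmp.end());
}
```

The trade-off is one extra copy of the serialized bytes in exchange for running the serialization logic only once.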
2018-11-21 14:04:27 +01:00
Tomasz Grabiec
8f686af9af tests: sstable 3.x: Do not compare Statistics component
The Statistics component recorded in the test was generated using a
buggy version of Scylla, and is not correct. Exposed by fixing the bug
in the way statistics are generated.

Rather than comparing binary content, we should have explicit checks
for statistics.
2018-11-21 14:04:27 +01:00
Tomasz Grabiec
143fd6e1c2 utils: Introduce memory_data_sink
2018-11-21 14:04:27 +01:00
Tomasz Grabiec
789fac9884 schema: Optimize column count getters
2018-11-21 14:04:27 +01:00
Tomasz Grabiec
8e8b96c6ed sstables: checksummed_file_data_sink_impl: Bypass output_stream
We can avoid the data copying by switching from this:

  sink -> stream -> sink

to this:

  sink -> sink
2018-11-21 14:04:27 +01:00
Avi Kivity
bb85a21a8f Merge "compress: Restore lz4 as default compressor" from Duarte
"
Enables sstable compression with LZ4 by default, which was the
long-time behavior until a regression turned off compression by
default.

Fixes #3926
"

* 'restore-default-compression/v2' of https://github.com/duarten/scylla:
  tests/cql_query_test: Assert default compression options
  compress: Restore lz4 as default compressor
  tests: Be explicit about absence of compression
2018-11-21 14:20:39 +02:00
Benny Halevy
76b1c184b7 conf: clean up cassandra references in scylla.yaml
Indicate the default scylla directories, rather than Cassandra's.
Provide links to Scylla documentation where possible,
update links to Cassandra documentation otherwise.
Clean up a few typos.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181119141912.28830-1-bhalevy@scylladb.com>
2018-11-21 13:04:24 +02:00
Rafael Ávila de Espíndola
7fa7e9716d Mention scylla-tools-java and scylla-jmx in HACKING.md
I struggled a bit finding out why nodetool was not working, so it
might be a good idea to expand the documentation a bit.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181120233358.25859-1-espindola@scylladb.com>
2018-11-21 12:55:17 +02:00
Tomasz Grabiec
349c9f7a69 HACKING.md: Add a link to the slides about core dump debugging tools
Message-Id: <1542793207-1620-1-git-send-email-tgrabiec@scylladb.com>
2018-11-21 11:45:23 +02:00
Michael Munday
53fdde75f6 dht: use little endian byte order explicitly for token hash
This avoids a difference between little and big endian systems. We
now also calculate a full murmur hash for tokens with less than 8
bytes, however in practice the token size is always 8.
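
One common way to make a byte order explicit is to assemble the value with shifts instead of reinterpreting memory. The sketch below shows that general technique; `read_le64` is a hypothetical helper and is not taken from this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assembles a 64-bit value from 8 bytes in little-endian order using
// shifts, so the result does not depend on the host's byte order.
inline uint64_t read_le64(const uint8_t* p) {
    assert(p != nullptr);
    uint64_t v = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        v |= static_cast<uint64_t>(p[i]) << (8 * i);
    }
    return v;
}
```

A raw `memcpy` into a `uint64_t` would give the same result on little-endian hosts only, which is the kind of difference the patch avoids.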

Message-Id: <20181120214733.43800-1-mike.munday@ibm.com>
2018-11-21 11:44:29 +02:00
Michael Munday
360374cfde tests: fix compilation of partitioner_test with boost 1.68 on IBM Z
The boost multiprecision library that I am compiling against seems
to be missing an overload for the cast to a string. The easy
workaround seems to be to call str() directly instead.

This also fixes #3922.

Message-Id: <20181120215709.43939-1-mike.munday@ibm.com>
2018-11-21 11:43:42 +02:00
Duarte Nunes
9464fffc8c tests/cql_query_test: Assert default compression options
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-11-20 22:47:27 +00:00
Duarte Nunes
36dc9e3280 compress: Restore lz4 as default compressor
Fixes a regression introduced in
74758c87cd, where tables started to be
created without compression by default (before they were created with
lz4 by default).

Fixes #3926

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-11-20 22:47:27 +00:00
Duarte Nunes
5f64e34fcc tests: Be explicit about absence of compression
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-11-20 22:47:26 +00:00
Avi Kivity
775b7e41f4 Update seastar submodule
* seastar d59fcef...b924495 (2):
  > build: Fix protobuf generation rules
  > Merge "Restructure files" from Jesse

Includes fixup patch from Jesse:

"
Update Seastar `#include`s to reflect restructure

All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
2018-11-21 00:01:44 +02:00
Takuya ASADA
42baf6a6f7 dist/ami: update packer
Update packer to latest version, 1.3.2.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181031110441.16284-2-syuu@scylladb.com>
2018-11-20 21:29:57 +02:00
Takuya ASADA
b9a42e83ad dist/ami: enable AMI build log
To make it easier to debug AMI build errors, enable logging.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181031110441.16284-1-syuu@scylladb.com>
2018-11-20 21:29:57 +02:00
Takuya ASADA
72411f95cb reloc/build_reloc.sh: find ninja-build after executed install-dependencies.sh
The build environment may not have installed ninja-build before running
install-dependencies.sh, so do it after running the script.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181031110737.17755-1-syuu@scylladb.com>
2018-11-20 21:29:57 +02:00
Avi Kivity
183c2369f3 Update seastar submodule
* seastar a44cedf...d59fcef (10):
  > dns: Set tcp output stream buffer size to zero explicitly
  > tests: add libc-ares to travis dependencies
  > tests: add dns_test to test suite
  > build: drop bundled c-ares package
  > prometheus: replace the instance label with an optional one
  > build: Refactor C++ dialect detection
  > build: add libatomic to install-depenencies.sh
  > core: use std::underlying_type for open_flags
  > core: introduce open_flags::operator&
  > core: Fix build for `gnu++14`
2018-11-20 21:29:57 +02:00
Tomasz Grabiec
57e25fa0f8 utils: phased_barrier: Make advance_and_await() have strong exception guarantees
Currently, when advance_and_await() fails to allocate the new gate
object, it will throw bad_alloc and leave the phased_barrier object in
an invalid state. Calling advance_and_await() again on it will result
in undefined behavior (typically SIGSEGV) because _gate will be
disengaged.

One place affected by this is table::seal_active_memtable(), which
calls _flush_barrier.advance_and_await(). If this throws, subsequent
flush attempts will SIGSEGV.

This patch rearranges the code so that advance_and_await() has strong
exception guarantees.
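
The usual way to get a strong exception guarantee is to do everything that can throw before touching any member. The sketch below shows that ordering in a synchronous, simplified form. It is not the seastar-based implementation: `gate`, `barrier` and the injected `alloc` callable are hypothetical and exist only so that allocation failure can be simulated.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <utility>

// Hypothetical stand-in for the gate object.
struct gate {
    int phase;
};

class barrier {
    std::unique_ptr<gate> _gate;
    int _phase = 0;
public:
    barrier() : _gate(std::make_unique<gate>(gate{0})) {}

    // Installs a new gate and returns the old one.
    template <typename Alloc>
    std::unique_ptr<gate> advance(Alloc alloc) {
        // Allocate first: if this throws, no member has been modified yet.
        std::unique_ptr<gate> next = alloc(_phase + 1);
        // From here on nothing throws, so the state change is all-or-nothing.
        std::unique_ptr<gate> old = std::exchange(_gate, std::move(next));
        ++_phase;
        return old;
    }

    bool engaged() const { return static_cast<bool>(_gate); }
    int phase() const { return _phase; }
};
```

If the allocation throws, `_gate` is still engaged and a later call can simply retry, which is the property the commit message asks for.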
Message-Id: <1542645562-20932-1-git-send-email-tgrabiec@scylladb.com>
2018-11-20 16:15:12 +00:00