Commit Graph

1447 Commits

Author SHA1 Message Date
Avi Kivity
672de608bf tests: fix call to seastar::sleep()
It's not in the global namespace.
2017-06-22 18:16:13 +03:00
Raphael S. Carvalho
4bb27cbd6f lcs: actually prefer oldest sstables of L0 when it falls behind
Strategy prefers promoting oldest sstables in L0. Because sort
procedure is incorrectly sorting elements in descending order,
newest sstables will be promoted first *if and only if* L0 falls
behind (more than 32 sstables). If L0 doesn't fall behind, we'll
have all L0 sstables compacted with overlapping ones in L1.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-19 20:45:39 -03:00
Nadav Har'El
3018df11b5 Allow reading exactly desired byte ranges and fast_forward_to
In commit c63e88d556, support was added for
fast_forward_to() in data_consume_rows(). Because an input stream's end
cannot be changed after creation, that patch ignores the specified end
byte, and uses the end of file as the end position of the stream.

As result of this, even when we want to read a specific byte range (e.g.,
in the repair code to checksum the partitions in a given range), the code
reads an entire 128K buffer around the end byte, or significantly more, with
read-ahead enabled. This causes repair to do more than 10 times the amount
of I/O it really has to do in the checksumming phase (which in the current
implementation, reads small ranges of partitions at a time).

This patch has two levels:

1. In the lower level, sstable::data_consume_rows(), which reads all
   partitions in a given disk byte range, now gets another byte position,
   "last_end". That can be the range's end, the end of the file, or anything
   in between the two. It opens the disk stream until last_end, which means
   1. we will never read-ahead beyond last_end, and 2. fast_fordward_to() is
   not allowed beyond last_end.

2. In the upper level, we add to the various layers of sstable readers,
   mutation readers, etc., a boolean flag mutation_reader::forwarding, which
   says whether fast_forward_to() is allowed on the stream of mutations to
   move the stream to a different partition range.

   Note that this flag is separate from the existing boolean flag
   streamed_mutation::fowarding - that one talks about skipping inside a
   single partition, while the flag we are adding is about switching the
   partition range being read. Most of the functions that previously
   accepted streamed_mutation::forwarding now accept *also* the option
   mutation_reader::forwarding. The exception are functions which are known
   to read only a single partition, and not support fast_forward_to() a
   different partition range.

   We note that if mutation_reader::forwarding::no is requested, and
   fast_forward_to() is forbidden, there is no point in reading anything
   beyond the range's end, so data_consume_rows() is called with last_end as
   the range's end. But if forwarding::yes is requested, we use the end of the
   file as last_end, exactly like the code before this patch did.

Importantly, we note that the repair's partition reading code,
column_family::make_streaming_reader, uses mutation_reader::forwarding::no,
while the other existing reading code will use the default forwarding::yes.

In the future, we can further optimize the amount of bytes read from disk
by replacing forwarding::yes by an actual last partition that may ever be
read, and use its byte position as the last_end passed to data_consume_rows.
But we don't do this yet, and it's not a regression from the existing code,
which also opened the file input stream until the end of the file, and not
until the end of the range query. Moreover, such an improvement will not
improve of anything if the overall range is always very large, in which
case not over-reading at its end will not improve performance.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170619152629.11703-1-nyh@scylladb.com>
2017-06-19 18:31:32 +03:00
Avi Kivity
6e2c9ef9fb Revert "Allow reading exactly desired byte ranges and fast_forward_to"
This reverts commit 317d7fc253 (and also the
related 2c57ab84b2).  It causes crashes
during range scans, reported by Gleb:

"To reproduce I run SELECT * FROM keyspace1.standard1; on typical c-s
dataset and 3 node cluster.

Backtrace:
    at /home/gleb/work/seastar/seastar/core/apply.hh:36
    rvalue=<unknown type in /home/gleb/work/seastar/build/release/scylla, CU 0x54cf307, DIE 0x55ebf2a>) at /home/gleb/work/seastar/seastar/core/do_with.hh:57
    range=std::vector of length 6, capacity 8 = {...}) at /home/gleb/work/seastar/seastar/core/future-util.hh:142
    at ./seastar/core/future.hh:890
    at /home/gleb/work/seastar/seastar/core/future-util.hh:119
    at /home/gleb/work/seastar/seastar/core/future-util.hh:142
2017-06-18 16:10:21 +03:00
Calle Wilund
3464422051 commitlog_test: Fix reader test dropping rp handles
Test wants data in live segments to read from, so should
not just drop the handles returned from allocate.
Message-Id: <1497344532-2616-1-git-send-email-calle@scylladb.com>
2017-06-16 22:45:46 +01:00
Etienne Kruger
be0a947596 tests: perf_simple_query: Add delete perf test
Add a performance test for deletion in addition to the existing update
and query tests. The deletion performance test is executed using the
'--delete' argument to perf_simple_query.

Fixes #2417.

Signed-off-by: Etienne Kruger <el@loadavg.io>
Message-Id: <20170615232500.26987-1-el@loadavg.io>
2017-06-16 14:51:00 +01:00
Nadav Har'El
317d7fc253 Allow reading exactly desired byte ranges and fast_forward_to
In commit c63e88d556, support was added for
fast_forward_to() in data_consume_rows(). Because an input stream's end
cannot be changed after creation, that patch ignores the specified end
byte, and uses the end of file as the end position of the stream.

As result of this, even when we want to read a specific byte range (e.g.,
in the repair code to checksum the partitions in a given range), the code
reads an entire 128K buffer around the end byte, or significantly more, with
read-ahead enabled. This causes repair to do more than 10 times the amount
of I/O it really has to do in the checksumming phase (which in the current
implementation, reads small ranges of partitions at a time).

This patch has two levels:

1. In the lower level, sstable::data_consume_rows(), which reads all
   partitions in a given disk byte range, now gets another byte position,
   "last_end". That can be the range's end, the end of the file, or anything
   in between the two. It opens the disk stream until last_end, which means
   1. we will never read-ahead beyond last_end, and 2. fast_fordward_to() is
   not allowed beyond last_end.

2. In the upper level, we add to the various layers of sstable readers,
   mutation readers, etc., a boolean flag mutation_reader::forwarding, which
   says whether fast_forward_to() is allowed on the stream of mutations to
   move the stream to a different partition range.

   Note that this flag is separate from the existing boolean flag
   streamed_mutation::fowarding - that one talks about skipping inside a
   single partition, while the flag we are adding is about switching the
   partition range being read. Most of the functions that previously
   accepted streamed_mutation::forwarding now accept *also* the option
   mutation_reader::forwarding. The exception are functions which are known
   to read only a single partition, and not support fast_forward_to() a
   different partition range.

   We note that if mutation_reader::forwarding::no is requested, and
   fast_forward_to() is forbidden, there is no point in reading anything
   beyond the range's end, so data_consume_rows() is called with last_end as
   the range's end. But if forwarding::yes is requested, we use the end of the
   file as last_end, exactly like the code before this patch did.

Importantly, we note that the repair's partition reading code,
column_family::make_streaming_reader, uses mutation_reader::forwarding::no,
while the other existing reading code will use the default forwarding::yes.

In the future, we can further optimize the amount of bytes read from disk
by replacing forwarding::yes by an actual last partition that may ever be
read, and use its byte position as the last_end passed to data_consume_rows.
But we don't do this yet, and it's not a regression from the existing code,
which also opened the file input stream until the end of the file, and not
until the end of the range query. Moreover, such an improvement will not
improve of anything if the overall range is always very large, in which
case not over-reading at its end will not improve performance.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170614072122.13473-1-nyh@scylladb.com>
2017-06-15 13:22:46 +01:00
Duarte Nunes
5736468a71 mutation_partition_serializer: Assume range tombstone support
Range tombstones were introduced in version 1.3 and there exists
no direct upgrade from 1.2 to vnext, so we can retire the code
enforcing backwards compatibility.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170614211654.82501-1-duarte@scylladb.com>
2017-06-15 09:54:05 +03:00
Avi Kivity
e11f1c9cc3 tests: fix partitioner_test build on gcc 5 2017-06-14 17:22:01 +03:00
Avi Kivity
419ad9d6cb Merge "repair memory usage fix" from Asias
"This series switches repair to use more stream plans to stream the mismatched
sub ranges and use a range generator to produce sub ranges.

Test shows no huge memory is used for repair with large data set.

In addition, we now have a progress reporter in the log how many ranges are processed.

   Jun 06 14:18:22  [shard 0] repair - Repair 512 out of 529 ranges, id=1, keyspace=myks, cf=mytable, range=(8526136029525195375, 8549482295083869942]
   Jun 06 14:19:55  [shard 0] repair - Repair 513 out of 529 ranges, id=1, keyspace=myks, cf=mytable, range=(8526136029525195375, 8549482295083869942]

Fixes #2430."

* tag 'asias/fix-repair-2430-branch-master-v1' of github.com:cloudius-systems/seastar-dev:
  repair: Remove unused sub_ranges_max
  repair: Reduce parallelism in repair_ranges
  repair: Tweak the log a bit
  repair: Use more stream_plan
  repair: iterator over subranges instead of list
2017-06-08 14:19:08 +03:00
Calle Wilund
2913241df1 memtable/commitlog: Change bookkeep to track individul segments
Use per CF-id reference count instead, and use handles as result of 
add operations. These must either be explicitly released or stored
(rp_set), or they will release the corresponding replay_position
upon destruction. 

Note: this does _not_ remove the replay positioning ordering requirement
for mutations. It just removes it as a means to track segment liveness.
2017-06-07 12:07:01 +00:00
Calle Wilund
0c598e5645 commitlog_test: Fix test_commitlog_delete_when_over_disk_limit
Test should
a.) Wait for the flush semaphore
b.) Only compare segement sets between start and end, not start,
    end and inbetwen. I.e. the test sort of assumed we started
    with < 2 (or so) segments. Not always the case (timing)

Message-Id: <1496828317-14375-1-git-send-email-calle@scylladb.com>
2017-06-07 12:44:02 +03:00
Nadav Har'El
b3ff37e67f repair: iterator over subranges instead of list
When starting repair, we divided the large token ranges (vnodes) linto small
subranges of a desired length (around 100 partition), and built a huge list
of those subranges - to iterate over them later and compare checksums of
those chunks.

However, building this list up-front is completely unnecessary, and wastes
a lot of memory: In a test with 1 TB of data, as much as 3 gigabytes was
spent on this list. Instead, what we do in this patch is to find the next
chunk in a DFS-like splitting algorithm, using only the token range
midpoint() function (as before). The amount of memory needed for this is
O(logN), instead of O(N) in the previous implementation.

Refs #2430.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2017-06-07 08:50:56 +08:00
Vlad Zolotarov
0619c2cb71 utils::serialization: remove not used deserialization_xxx() functions
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1495556124-16672-1-git-send-email-vladz@scylladb.com>
2017-05-26 19:26:20 +03:00
Tomasz Grabiec
f3a6d94398 sstables: Introduce sstable::as_mutation_source()
Adaptors extracted from existing testing code.
Message-Id: <1495729508-30081-1-git-send-email-tgrabiec@scylladb.com>
2017-05-25 19:30:20 +03:00
Avi Kivity
fd0e1eb1e2 Merge "Fixes for mutation algebra" from Tomasz
"Enforces commutativity of addition:

 m1 + m2 == m2 + m1

and consistency of difference and addition with equality:

 m1 + (m2 - m1) == m1 + m2"

* tag 'tgrabiec/fix-range-tombstone-commutativity-v2' of github.com:cloudius-systems/seastar-dev:
  mutation: Make compare_*_for_merge() consistent with equals()
  tests: mutation: Improve assertion failure message
  tests: Use default equality in test_mutation_diff_with_random_generator
  mutation: Make counter cell difference consistent with apply
  tests: range_tombstone_list_test: Improve error message
  tests: range_tombstone_list: Check adjacent range merging
  range_tombstone_list: Merge adjacent range tombstones in apply()
  tests: mutation: Check commutativity of mutation addition
  range_tombstone_list: Avoid violating set invariant
  range_tombstone_list: Make tombstone merging commutative
  range_tombstone_list: Add erase() operation to the reverter
  range_tombstone_list: Make all undo operations ordered relative to each other
  utils: Extract to_boost_visitor() to a separate header
  allocating_strategy: Introduce alloc_strategy_unique_ptr<>
2017-05-23 15:20:38 +03:00
Tomasz Grabiec
804f46f684 mutation: Make compare_*_for_merge() consistent with equals()
equals() considers expiring cells to be different form non-expiring cells,
but compare_row_marker_for_merge() considers them equal. Fix the latter to
pick expiring cells. The choice was arbitrary.
2017-05-23 13:35:03 +02:00
Tomasz Grabiec
c1475a8eb2 tests: mutation: Improve assertion failure message 2017-05-23 13:16:03 +02:00
Tomasz Grabiec
d15880b3b7 tests: Use default equality in test_mutation_diff_with_random_generator 2017-05-23 13:16:03 +02:00
Tomasz Grabiec
951da421db tests: range_tombstone_list_test: Improve error message 2017-05-23 13:16:03 +02:00
Tomasz Grabiec
bee40b4628 tests: range_tombstone_list: Check adjacent range merging 2017-05-23 13:16:03 +02:00
Tomasz Grabiec
3c509308ab range_tombstone_list: Merge adjacent range tombstones in apply()
Needed for equivalence to work correctly with difference and addition:

  m1 + (m2 - m1) = m1 + m2

Fixes #2158.
2017-05-23 13:16:03 +02:00
Tomasz Grabiec
ef4c7c458c tests: mutation: Check commutativity of mutation addition 2017-05-23 12:11:12 +02:00
Tomasz Grabiec
5aeb9eb70c utils: Extract to_boost_visitor() to a separate header 2017-05-22 19:30:02 +02:00
Avi Kivity
ebaeefa02b Merge seatar upstream (seastar namespace)
- introcduced "seastarx.hh" header, which does a "using namespace seastar";
 - 'net' namespace conflicts with seastar::net, renamed to 'netw'.
 - 'transport' namespace conflicts with seastar::transport, renamed to
   cql_transport.
 - "logger" global variables now conflict with logger global type, renamed
   to xlogger.
 - other minor changes
2017-05-21 12:26:15 +03:00
Avi Kivity
c8cb3d6ff5 Merge "Materialized views: bug fixes and unit tests" from Duarte
"This series fixes bugs related to materialized views, most pertaining
to column filtering in the where clause."

* 'materialized-views/bug-fixes/v1' of https://github.com/duarten/scylla:
  tests/view_schema_test: Add more test cases
  tests/cql_assertions: Add assertion for row set equality
  single_column_relation: Correctly print IN relation
  statement_restrictions: Allow filtering regular columns for views
  statement_restrictions: Relax clustering restrictions for views
  statement_restrictions: Relax partition restrictions for views
  cql3/statements: Prevent setting default ttl on view
  cql3/restrictions: Complete implementation of is_satisfied_by()
  db/view: Re-implement clustering_prefix_matches()
  db/view: Re-implement partition_key_matches()
  db/view: Generate regular tombstone for base deletions
  db/view: Consider cell liveness when generating updates
  db/view: Don't generate view updates for static rows
2017-05-20 13:52:56 +03:00
Avi Kivity
ba31619594 tests: fix partitioner_test for g++ 5
It can't make the leap from dht::ring_position to
stdx::optional<range_bound<dht::ring_position>> for some reason.
2017-05-18 13:09:41 +03:00
Avi Kivity
2aa5b3e20c Merge "Improve perf_fast_forward test" from Tomasz
"Notably:
  - add validation of the results (e.g. fragment count, expectations about disk activity)
  - add cache-specific tests"

* 'tgrabiec/add-cache-tests-to-perf-fast-forward' of github.com:cloudius-systems/seastar-dev:
  tests: perf_fast_forward: Report cache stats
  row_cache: Keep counters in a struct
  tests: perf_fast_forward: Add cache-specific tests
  tests: perf_fast_forward: Extract test_reading_all()
  tests: perf_fast_forward: Add validation of the results
  tests: perf_fast_forward: Fix partition scans to read the expected amount of fragments
  tests: perf_fast_forward: Allow the test to be interrupted
  tests: perf_fast_forward: Allow testing with cache enabled
  row_cache: Implement mutation_reader::fast_forward_to() for cache scanner
2017-05-17 18:06:02 +03:00
Paweł Dziepak
3ecceaee48 Merge "Fix fast_forward_to() on sstable reader being ignored in some cases" from Tomasz
"When mutation reader enters the partition using index,
streamed_mutation object is returned to the user before the row start
fragment is processed. In that case, when we process the row start, we
should ignore it and not call setup_for_partition() again. That may
override user's fast_forward_to() request."

* 'tgrabiec/fix-initial-fast-forward-to-for-single-key-sstable-readers' of github.com:scylladb/seastar-dev:
  tests: mutation_source_test: Test forwarding in single-key readers
  sstables: Remove unused code
  sstables: mutation_reader: Fix setup_for_partition() being called twice in some cases
  sstables: Fix verify_end_state() to tolerate ATOM_START_2 state
2017-05-17 15:35:30 +01:00
Tomasz Grabiec
777ffa3a27 tests: perf_fast_forward: Report cache stats 2017-05-17 14:15:14 +02:00
Tomasz Grabiec
7a81f5e980 tests: perf_fast_forward: Add cache-specific tests 2017-05-17 14:15:14 +02:00
Tomasz Grabiec
1a7b03004a tests: perf_fast_forward: Extract test_reading_all() 2017-05-17 14:15:14 +02:00
Tomasz Grabiec
a38fd16f89 tests: perf_fast_forward: Add validation of the results 2017-05-17 14:15:14 +02:00
Tomasz Grabiec
3c3ea51657 tests: perf_fast_forward: Fix partition scans to read the expected amount of fragments
make_pkeys() needs to be invoked with n equal to the number of keys
which the table was populated with. Otherwise the extra keys, which
are missing in the table, may be placed anywhere in the vector due to
ring order sorting, and break the assumption that the table contains
all keys from the array up to index n. This resulted in the test
reading slighlty less fragments than it would follow from the desired
count.

Another problem is that we should not skip the fast_forward_to() call
for the inital range (workaround for a bug in sstable mutation
reader), otherwise we will read slightly less than expected as well.
2017-05-17 14:15:14 +02:00
Tomasz Grabiec
49a0bc3847 tests: perf_fast_forward: Allow the test to be interrupted 2017-05-17 14:15:14 +02:00
Tomasz Grabiec
5c7f5643a6 tests: perf_fast_forward: Allow testing with cache enabled 2017-05-17 14:15:14 +02:00
Avi Kivity
44a1a51987 tests: add tests for dht::split_range_to_single_shard() 2017-05-17 13:50:30 +03:00
Avi Kivity
6eb6f12909 tests: add test for ring_position_exponential_sharder 2017-05-17 13:18:52 +03:00
Avi Kivity
025c6b45b2 dht: extend i_partitioner::next_token_for_shard()
Right now, next_token_for_shard() only allows iterating linearly in shard
order.  Add the ability to select a specific shard to skip to (in case we're
only interested in a single shard), and to select larger ranges (so that
exponential increases are not implemented by iteration).
2017-05-17 12:30:03 +03:00
Duarte Nunes
ef252036ba tests/view_schema_test: Add more test cases
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-05-17 11:21:58 +02:00
Duarte Nunes
0861a66853 tests/cql_assertions: Add assertion for row set equality
For row set equality, the order of the actual rows doesn't need to
match the order of the expected rows.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-05-17 10:33:19 +02:00
Duarte Nunes
f365b7f1f7 tests: Add test case for nonwrapping_range::intersection()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-05-17 10:33:18 +02:00
Avi Kivity
f5dae826ce Merge "Migrate schema tables to v3 format" from Calle
"Defines origin v3-format for system/schema tables, and use them for
schema storage/retrival.

Includes a legacy_schema_migrator implementation/port from origin. Note
that since we don't support features like triggers, functions and
aggregates, it will bail if encountering such a feature used.

Note also that this patch set does not convert the "hints" and
"backlog" tables, even though these have changed in v3 as well.
That will be a separate patch set.

Tested against dtests. Note that patches for dtest + ccm
will follow."

* 'calle/systemtables' of github.com:cloudius-systems/seastar-dev: (36 commits)
  legacy_schema_migrator: Actually truncate legacy schema tables on finish
  database: Extract "remove" from "drop_columnfamily"
  v3 schema test fixes
  thrift: Update CQL mapping of static CFs
  schema_tables: Use v3 schema tables and formats
  type_parser: Origin expects empty string -> bytes_type
  cf_prop_defs: Add crc_check_chance as recognized (even if we don't use)
  types_test: v3 style schemas enforce explicit "frozen" in tupes/ut:s
  cql3_type: v3 to_string
  cql_types: Introduce cql3_type::empty and associate with empty data_type
  schema: rename column accessors to be in line with origin
  schema: Add "is_static_compact_table"
  schema_builder: Add helper to generate unique column names akin origin
  schema: Add utility functions for static columns
  schema: Use heterogeneous comparator for columns bounds
  cql3_type_parser: Resolve from cql3 names/expressions
  cql3_type: Add "prepare_interal" and "references_user_type"
  cql3::cql3_type: Add prepare_internal path using only "local" holders
  cql3_type: Add virtual destructor.
  database/main: encapsulate system CF dir touching
  ...
2017-05-17 11:25:52 +03:00
Vlad Zolotarov
494ea82a88 utils::UUID: align the UUID serialization API with the similar API of other classes in the project
The standard serialization API (e.g. in data_value) includes the following methods:

size_t serialized_size() const;
void serialize(bytes::iterator& it) const;
bytes serialize() const;

Align the utils::UUID API with the pattern above.

The only addition is that we are going to make an output iterator parameter of a second method above
a template so that we may serialize into different output sources.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-16 15:56:03 -04:00
Vlad Zolotarov
7706775a63 utils: serialization: unify the variety of serialize_XXX(...)
Use the same templated implementation for all different serialize_XXX(...).

The chosen implementation is based on the std::copy_n(char*, size, OutputIterator),
which is heavily optimized and will be using memcpy/memmove where possible.

This patch also removes the not needed specializations that accept signed integer
values since we were casting them to unsigned value anyway.

The std::ostream based specifications are also removed since they are not used
anywhere except for a test-serialization.cc and adjusting the ostream to the iterator
is a single-liner.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-16 15:56:03 -04:00
Tomasz Grabiec
bdf3c536aa tests: mutation_source_test: Test forwarding in single-key readers 2017-05-16 13:36:10 +02:00
Duarte Nunes
a69039df03 tests/batchlog_manager_test: Fix failure
Since a9f6e5f8da, metrics can't be
duplicated. This patch works around that by avoiding to create a new
batchlog_manager (one is already created by the cql_test_env).

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170510191047.6154-1-duarte@scylladb.com>
2017-05-11 08:28:08 +02:00
Calle Wilund
66991a7ccb v3 schema test fixes 2017-05-10 16:44:48 +00:00
Calle Wilund
3d90152dc5 types_test: v3 style schemas enforce explicit "frozen" in tupes/ut:s 2017-05-10 16:44:48 +00:00
Calle Wilund
0e6ae8dec2 schema: rename column accessors to be in line with origin
More pointedly: Expose columns as is (currently
all_columns_in_select_order), expose name->column mapping more
appropriately named. 

Renaming like this is not strictly neccesary, but there is a point to
trying to keep nomenclature similar-ish with origin, esp. when select
order column need to become filtered (spoiler alert).
2017-05-10 16:44:48 +00:00