Commit Graph

1215 Commits

Author SHA1 Message Date
Paweł Dziepak
b8d737ff0a tests/row_cache_test: verify that eviction follows lru
Refs #1847.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1479231555-28191-1-git-send-email-pdziepak@scylladb.com>
2016-11-15 18:57:54 +01:00
Duarte Nunes
e680587b8a sstable_test: Be explicit about uncompressed tables
After 7c28ed, the schemas defined in the test became compressed by
default. This patch changes the test so that it is explicit about
which schemas shouldn't define a compressor.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1478646530-5558-1-git-send-email-duarte@scylladb.com>
2016-11-09 11:21:59 +02:00
Piotr Jastrzebski
50b41f7d1d Fix row_cache_test
partition_range passed to row_cache::make_reader
has to be kept alive as long as the resulting reader
is used.

Otherwise weird things start to happen.

This used to work just because of a pure luck.
When I started changing the row_cache implementation
I run into very weird behaviors for this tests.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <2c9e337dbbcf35f4e1394cad043eda10b8c2bd4a.1478602876.git.piotr@scylladb.com>
2016-11-08 13:28:53 +01:00
Calle Wilund
11baf37ab5 commitlog: Prevent exceptions in stream::produce from being set twice
Fixes #1775
stream lacks a check "is_open", which is a bummer. We have to both
prevent exception propagation and add a flag of our own to make sure
exceptions in producer code reaches consumer, and does not simply
get lost in the reactor.
Message-Id: <1478508817-18854-1-git-send-email-calle@scylladb.com>
2016-11-07 11:41:33 +01:00
Paweł Dziepak
985d2f6d4a Merge "Remove quadratic behavior from atomic sstable deletion" from Avi
"The atomic sstable deletion provides exception safety at the cost of
quadratic behavior in the number of sstables awaiting deletion.  This
causes high cpu utilization during startup.

Change the code to avoid quadratic complexity, and add some unit tests.

See #1812."
2016-11-04 15:48:04 +00:00
Avi Kivity
f75aceabc5 sstables: add unit tests for atomic deletion
We simulate shards deleting sstables, but this is all happening on a single
core, and no sstables are harmed during test execution.
2016-11-04 15:48:43 +02:00
Avi Kivity
1d77e3a03a partitioner: add unit tests for token_for_next_shard()
i_partitioner::token_for_next_shard() is an inverse for
i_partitioner::shard_of(), test that this is so.
2016-11-03 19:10:20 +02:00
Avi Kivity
a35136533d Convert ring_position and token ranges to be nonwrapping
Wrapping ranges are a pain, so we are moving wrap handling to the edges.

Since cql can't generate wrapping ranges, this means thrift and the ring
maintenance code; also range->ring transformations need to merge the first
and last ranges.

Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>
2016-11-02 21:04:11 +02:00
Raphael S. Carvalho
53b7b7def3 sstables: handle unrecognized sstable component
As in C*, unrecognized sstable components should be ignored when
loading a sstable. At the moment, Scylla fails to do so and will
not boot as a result. In addition, unknown components should be
remembered when moving a sstable or changing its generation.

Fixes #1780.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <b7af0c28e5b574fd577a7a1d28fb006ac197aa0a.1478025930.git.raphaelsc@scylladb.com>
2016-11-02 12:44:53 +02:00
Avi Kivity
7faf2eed2f build: support for linking statically with boost
Remove assumptions in the build system about dynamically linked boost unit
tests.  Includes seastar update which would have otherwise broken the
build.
2016-10-26 08:51:21 +03:00
Piotr Jastrzebski
27726cecff Clean up position_in_partition.
Introduce position_in_partition_view and use it in
position() method in mutation_fragment, range_tombstone,
static_row and clustering_row.
Clean up comparators in position_in_partition.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <c65293c71a6aa23cf930ed317fb63df1fdc34fd1.1477399763.git.piotr@scylladb.com>
2016-10-25 15:13:20 +01:00
Paweł Dziepak
210a390892 tests: add missing sstable for partition skipping test
Commit 7dcd70124a "tests/sstables: add
test for fast forwarding reader" added a test for skipping parts of
sstable. Unfortunately, it did not include the sstables it was trying to
read.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 23:23:49 +01:00
Paweł Dziepak
0c24bbe639 tests/row_cache: add fast_forward_to() to throttled reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
69645455f3 tests/row_cache: count mutations read from _underlying
Originally, cache tests checked how many times a mutation reader was
created from the underlying mutation source to determine whether
continuity flag is working correctly.

This is not going to work with fast forwarding mutation readers so the
test is switched to count number of mutations (+ end of stream markers)
returned from underlying mutaiton readers which is much less fragile.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
6755a679f6 drop key readers
key_readers weren't used since introduction of continuity flag to cache
entries.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
5ac9babe97 tests/mutation_reader: test fast forwarding combined reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
7dcd70124a tests/sstables: add test for fast forwarding reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
5534dc2817 tests: add more helpers to mutation reader assertions
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
cf024975fe sstables: enable fast forwarding for range readers
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
f49a9e0d64 sstables: drop unused read_range_rows() overload
That overload was used only by unit test and violated guarantee that
partition range lives until mutation reader is done.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
25b91c51e2 ssables: add data_consume_rows_context::reset()
reset() is going to be used to restore valid state after fast forwarding
the reader.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Tomasz Grabiec
308434f891 tests: memtable: Add test for partition version list consistency after compaction 2016-10-18 11:57:14 +02:00
Tomasz Grabiec
d836e8f64b tests: memtable: Add tests for flushing reader
Message-Id: <1476454187-11462-1-git-send-email-tgrabiec@scylladb.com>
2016-10-14 15:11:06 +01:00
Avi Kivity
9ac441d3b5 range: adjust split_after to allow split_point outside input range
Make split_after() more generic by allowing split_point to be anywhere,
not just within the input range.  If the split_point is before, the entire
range is returned; and if it is after, stdx::nullopt is returned.

"before" and "after" are not well defined for wrap-around ranges, so
but we are phasing them out and soon there will not be
wrapping_range::split_after() users.

This is a prerequisite for converting partition_range and friends to
nonwrapping_range.
Message-Id: <1475765099-10657-1-git-send-email-avi@scylladb.com>
2016-10-06 17:54:44 +02:00
Avi Kivity
c94fb1bf12 build: reduce inclusions of messaging_service.hh
Remove inclusions from header files (primary offender is fb_utilities.hh)
and introduce new messaging_service_fwd.hh to reduce rebuilds when the
messaging service changes.

Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>
2016-10-05 11:46:49 +03:00
Avi Kivity
f8118d9fc2 Merge "Virtual dirty memory management" from Glauber
"Description:
============

Scylla currently suffers from a brick wall behavior of the request throttler.
Requests pile up until we reach the dirty memory limit, at which point we stop
serving them until we have freed enough memory to allow for more requests.

The problem is that freeing dirty memory means writing an SSTable to completion.
That can take a long time, even if we are blessed with great disks. Those long
waiting times can and will translate into timeouts. That is bad behavior.

What this patch does is introduce one form of virtual dirty memory accounting.
Instead of allowing 100 % of the dirty memory to be filled up until we stop
accepting requests, we will do that when we reach 50 % of memory. However,
instead of releasing requests only when an SSTable is fully written, we start
releasing them when some memory was written.

The practical effect of that, is that once we reach 50 % occupancy in our dirty
memory region, we will bring the system from CPU speed to disk speed, and will
start accepting requests only at the rate we are able to write memory back.

Results
=======

With this patchset running a load big enough to easily saturate the disk,
(commitlog disabled to highlight the effects of the memtable writer), I am able
to run scylla for many minutes, with timeouts occurring only when I run out of
disk space, whereas without this patch a swarm of timeouts would start merely 2
seconds after the load started - and would never get stable.

In V2, I have sent a set of graphs illustrating the performance of this solution.
This version does not have any significant differences in that front.

For details, please refer to
https://groups.google.com/d/msg/scylladb-dev/iCvD-3Z-QqY/EM8KUh_MAQAJ

Accuracy of the accounting:
---------------------------
It is important for us to be as accurate as possible when accounting freed
memory, since every byte we mark as freed may allow one or more requests to be
executed.  I have measured the accuracy of this approach (ignoring padding,
object size for the mutation fragments) to be 99.83 % of used memory in the
test workload I have ran (large, 65k mutations). Memtables under this circumnstance
tend to have a very high occupancy ratio because throttle breeds idle, and idle
breeds compact-on-idle.

Known Issues:
-------------

A lot of time can be elapsed between destroying the flush_reader and actually
releasing memory. The release of memory only happens when the SSTable is fully
sealed, and we have to flush the files, as well as finish writing all SSTable
components at this point. This happened in practice with a buggy kernel that
would result in flushes taking a long time.

After that is fixed, this is just a theoretical problem and in practice it
shouldn't matter given the time we expect those operations to take."

* 'virtual-dirty-v6' of github.com:glommer/scylla:
  database: allow virtual dirty memory management
  streamed_mutation: make _buffer private
  add accounting of memory read to partition_snapshot_reader
  move partition_snapshot_reader code to header file
  LSA: allow a group to query its own region group
  memtables: split scanning reader in two
  sstables: use special reader for writing a memtable
  LSA: export information about object memory footprint
  LSA: export information about size of the throttle queue
  database: export virtual dirty bytes region group
2016-10-04 20:57:52 +03:00
Glauber Costa
28e3f2f6ee LSA: export information about object memory footprint
We allocate objects of a certain size, but we use a bit more memory to hold
them.  To get a clerer picture about how much memory will an object cost us, we
need help from the allocator. This patch exports an interface that allow users
to query into a specific allocator to get that information.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-10-04 10:39:10 -04:00
Avi Kivity
a51804eca8 Merge "token_restriction: Deal with minimum tokens" from Duarte
"This patch set ensures we can correctly handle queries
where the minimum token is specified."

* 'min-token/v3' of github.com:duarten/scylla:
  cql_query_test: Add test case for min/max token bounds
  token_restriction: Deal with minimum tokens
  partitioner: Parse token from bytes
2016-10-02 12:32:40 +03:00
Raphael S. Carvalho
a8ab4b8f37 lcs: fix starvation at higher levels
When max sstable size is increased, higher levels are suffering from
starvation because we decide to compact a given level if the following
calculation results in a number greater than 1.001:
level_size(L) / max_size_for_level_l(L)

Fixes #1720.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-30 14:09:49 -03:00
Raphael S. Carvalho
a3bf7558f2 lcs: fix broken token range distribution at higher levels
Uniform token range distribution across sstables in a level > 1 was broken,
because we were only choosing sstable with lowest first key, when compacting
a level > 0. This resulted in performance problem because L1->L2 may have a
huge overlap over time, for example.
Last compacted key will now be stored for each level to ensure sort of
"round robin" selection of sstables for compactions at level >= 1.
That's also done by C*, and they were once affected by it as described in
https://issues.apache.org/jira/browse/CASSANDRA-6284.

Fixes #1719.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-30 14:09:16 -03:00
Duarte Nunes
72af476397 cql_query_test: Add test case for min/max token bounds
This patch adds a test case for specifying the minimum and maximum
tokens in a cql3 query.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-09-30 11:45:45 +00:00
Raphael S. Carvalho
0eaa0f46c9 sstables: store first and last decorated keys in sstable object
leveled strategy uses heavily first and last decorated keys of a
sstable to get overlapping sstables in a given level. By storing
first and last decorated keys in sstable object, it's expected
that performance of leveled strategy (not compaction) will be
improved.
We will set first and last keys in sstable when either loading
or sealing it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0abca819454ab4c088541bb49714f1f6a7dc4f42.1473959677.git.raphaelsc@scylladb.com>
2016-09-19 13:25:58 +02:00
Raphael S. Carvalho
dffb41f9d8 sstables: remove schema parameter from some sstable methods
schema can now be found in the sstable object itself.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0fa44fedbe784d924522d7eeca77c16294479c6e.1473959677.git.raphaelsc@scylladb.com>
2016-09-19 13:25:58 +02:00
Tomasz Grabiec
2282599394 tests: Add test for UUID type ordering
Message-Id: <1473956716-5209-2-git-send-email-tgrabiec@scylladb.com>
2016-09-16 11:07:14 +01:00
Gleb Natapov
2e8b255741 Merge seastar upstream
* seastar 0303e0c...e534401 (6):
  > Merge "enable rpc to work on non contiguous memory for receive" from Gleb
  > install-dependencies.sh: install python3 for Ubuntu/Debian, which requires for configure.py
  > fix tcp stuck when output_stream write more than 212992 bytes once.
  > scripts/posix_net_conf.sh: supress 'ls: cannot access /sys/class/net/<NIC>/device/msi_irqs/' error message
  > scripts/posix_net_conf.sh: fix 'command not found' error when specifies --cpu-mask
  > native_network_stack: Fix use after free/missing wait in dhcp

Includes: "Remove utils::fragmented_input_stream and utils::input_stream in favor of seastar version" from Gleb.
2016-09-15 12:12:16 +03:00
Raphael S. Carvalho
2a426ab248 tests: add test to check tombstone metadata
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 10:49:35 -03:00
Raphael S. Carvalho
026853fabb tests: add test to check composite validity
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 10:49:30 -03:00
Raphael S. Carvalho
1f31223f32 sstables: store schema in sstable object
That will be needed for optimization that will store decorated keys
in the sstable object, and also for a subsequent work that will
detect wrong metadata (min/max column names) by looking at columns
in the schema. As schema is stored in sstable, there's no longer
a need to store ks and cf names in it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 10:49:17 -03:00
Glauber Costa
dc5d8e33af Revert "row_cache: update sstable histograms on cache hits"
This reverts commit 1726b1d0cc.

Reverting this patch turns our SSTable access counter into a miss counter only.
The estimated histogram always starts its first bucket at 1, so by marking cache
accesses we will be wrongly feeding "1" into the buckets.

Notice that this is not yet ideal: nodetool is supposed to show a histogram of
all reads, and by doing this we are changing its meaning slightly. Workloads
that serve mostly from cache will be distorted towards their misses.

The real solution is to use a different histogram, but we will need to enforce
a newer version of nodetool for that: the current issue is that nodetool expects
an EstimatedHistogram in a specific format in the other side.

Conflicts:
	row_cache.hh

Message-Id: <a599fa9e949766e7c9697450ae34fc28e881e90a.1472742276.git.glauber@scy
lladb.com>
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-09-01 18:07:31 +03:00
Duarte Nunes
f4cf2f2aef tracing: Make trace_state_ptr argument required
This patch makes the optional trace_state_ptr arguments introduced in
previous patches mandatory where possible. Functions which are called
internally don't have a trace context, so for those we keep the
argument's default value for convenience.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-09-01 12:04:32 +02:00
Glauber Costa
1726b1d0cc row_cache: update sstable histograms on cache hits
If we have a cache hit, we still need to update our sstable histogram - notting
that we have touched 0 SSTables.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-08-31 15:14:22 -04:00
Paweł Dziepak
e981101fa9 Merge "Remove clustering_key_filtering_context" from Piotr
"clustering_key_filtering_context is no longer needed.
partition_slice can be used instead so this series removes
clustering_key_filtering_context and passes partition_slice down where
it's needed. Then a static get_ranges method is used to obtain
clustering key ranges for a given partition.

Fixes #1614."
2016-08-30 22:30:15 +01:00
Piotr Jastrzebski
3607d99269 Remove clustering_key_filtering_context.
Remove clustering_key_filter_factory and clustering_key_filtering_context.
Use partition_slice directly with a static get_ranges method.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-08-30 20:31:55 +02:00
Paweł Dziepak
6012a7e733 mutation_partition: fix iterator invalidation in trim_rows
Reversed iterators are adaptors for 'normal' iterators. These underlying
iterators point to different objects that the reversed iterators
themselves.

The consequence of this is that removing an element pointed to by a
reversed iterator may invalidate reversed iterator which point to a
completely different object.

This is what happens in trim_rows for reversed queries. Erasing a row
can invalidate end iterator and the loop would fail to stop.

The solution is to introduce
reversal_traits::erase_dispose_and_update_end() funcion which erases and
disposes object pointed to by a given iterator but takes also a
reference to and end iterator and updates it if necessary to make sure
that it stays valid.

Fixes #1609.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1472080609-11642-1-git-send-email-pdziepak@scylladb.com>
2016-08-25 16:52:35 +03:00
Paweł Dziepak
222bde7e6f bytes_ostream: introduce upper bound on chunk size
This patch makes append() and write() limit the maximum size of a single
allocation to bytes_ostream::max_chunk_size.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-08-22 09:31:33 +01:00
Paweł Dziepak
0ee98ea4c4 tests: add fragmented input stream test
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-08-22 09:31:33 +01:00
Paweł Dziepak
148e9c5608 streamed_mutation_from_mutation: fix destroying bi::sets
Once unlink_leftmost_without_rebalance() has been called on a bi::set no
other method can be used. This includes clear_and_disposed() used by the
mutation_partition destructor.

We like unlink_leftmost_without_rebalance() because it is efficient, so
the solution is to manually finish destroying clustering row and range
tombstone sets in the reader destructor using that function.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-08-17 11:03:59 +01:00
Piotr Jastrzebski
bb0c4c3c40 Fix compilation errors
query::range parameter in mutation_partiton::range
has to be changed to nonwrapping_range.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <36e444bfe90586f8d3b08ca36d8dc13d5898ef97.1471347402.git.piotr@scylladb.com>
2016-08-16 12:49:54 +01:00
Duarte Nunes
be4adf212a nonwrapping_range: Add unit tests
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 14:48:57 +00:00
Duarte Nunes
bb16e194bc range: Add nonwrapping_range class
This patch introduces the nonwrapping_range class. This class is
intended to be used by code that requires non wrapping ranges.
Internally, it uses a wrapping_range. Users are responsible for
ensuring the bounds are correct when creating a nonwrapping_range.

The path proposed here is to incrementally replace usages of
wrapping_range/range by nonwrapping_range, pushing usages of wrapping
ranges as further to the edges as possible.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-08-15 14:48:57 +00:00