Commit Graph

11716 Commits

Author SHA1 Message Date
Paweł Dziepak
210a390892 tests: add missing sstable for partition skipping test
Commit 7dcd70124a "tests/sstables: add
test for fast forwarding reader" added a test for skipping parts of
sstable. Unfortunately, it did not include the sstables it was trying to
read.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 23:23:49 +01:00
Glauber Costa
1578d7363a commitlog: rework blocking logic
The current incarnation of commitlog establishes a maximum amount of
writes that can be in-flight, and blocks new requests after that limit
is reached.

That is obviously something we must do, but the current approach to it
is problematic for two main reasons:

1) It forces the requests that trigger a write to wait on the current
   write to finish. That is excessive; ideally we would wait for one
   particular write to finish, not necessarily the current one. That
   is made worse by the fact that when a write is followed by a flush
   (happens when we move to a new segment), then we must wait for
   *all* writes in that segment to finish.

1) it casts concurrency in terms of writes instead of memory, which
   makes the aforementioned problem a lot worse: if we have very big
   buffers in flight and we must wait for them to finish, that can
   take a long time, often in the order of seconds, causing timeouts.

The approach taken by this patch is to replace the _write_semaphore
with a request_controller. This data structure will account the amount
of memory used by the buffers and set a limit on it. New allocations
will be held until we go below that limit, and will be released
as soon as this happens.

This guarantees that the latencies introduced by this mechanism are
spread out a lot better among requests and will keep higher percentile
latencies in check.

To test this, I have ran a workload that times out frequently. That
workload use 10 threads to write 100 partitions (to isolate from the
effects of the memtable introduced latencies) in a loop and each
partition is 2MB in size.

After 10 minutes running this load, we are left with the following
percentiles:

latency mean              : 51.9 [WRITE:51.9]
latency median            : 9.8 [WRITE:9.8]
latency 95th percentile   : 125.6 [WRITE:125.6]
latency 99th percentile   : 1184.0 [WRITE:1184.0]
latency 99.9th percentile : 1991.2 [WRITE:1991.2]
latency max               : 2338.2 [WRITE:2338.2]

After this patch:

latency mean              : 54.9 [WRITE:54.9]
latency median            : 43.5 [WRITE:43.5]
latency 95th percentile   : 126.9 [WRITE:126.9]
latency 99th percentile   : 253.9 [WRITE:253.9]
latency 99.9th percentile : 364.6 [WRITE:364.6]
latency max               : 471.4 [WRITE:471.4]

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-10-19 13:56:36 -04:00
Glauber Costa
aec724bbda commitlog: factor out code for checking mutation size
In a subsequent patch, I'll use this code in a different place. To
prepare for that, we move it out as a method. It also fits a lot better
inside the segment manager, so move it there.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-10-19 13:49:47 -04:00
Glauber Costa
a50996f376 commitlog: calculate segment-independent size of mutations
Goal is to calculate a size that is lesser or equal than the
segment-dependent size.

This was originally written by Tomasz, and featured in his submission
"commitlog: Handle overload more gracefully"

Extracted here so it sits clearly in a different patch.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-10-19 13:49:47 -04:00
Glauber Costa
0b7c9fa17f commitlog: remove _needed_size
It is mostly an optimization, and while it makes sense in this context,
it won't soon as we'll stop waiting for the current cycle specifically
to finish.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-10-19 13:49:47 -04:00
Glauber Costa
6214bdeb66 commitlog: move segment_manager constructor outside the class definition
We'll do that so we can, in following patches, use static members from
the segment. Those are not defined at this point.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-10-19 13:49:47 -04:00
Glauber Costa
299877f432 commitlog: add a counter for pending allocations
We track the amount of pending allocations but we don't really export
it. It will be crucial when we stop tracking pending writes.

This patch exports it through a method instead of the totals structure,
so we can easily change it. Current code probing pending_allocations
(the api code) is also converted to use the public method instead of the
totals struct.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-10-19 13:49:47 -04:00
Avi Kivity
07c995ab3d Merge "Fast forward mutation readers" from Paweł
"This patchset enables mutation readers to be fast forwarded to a different
partition range. The main reason for introducing such feature are range
queries served from cache. If the cache is partially populated in the
requested range the reader will end up with multiple subranges that have
to be read from the sstables. Originally, each of these subranges would
require a new reader to be created, but with fast forwarding we can have
just one sstable reader. This is better since there is a chance that buffers
kept by the reader may be still useful after fast forwarding it.

In this series there are also patches that clean up cache readers in order
to make integration with fast forwarding easier. Namely, continuity flag is
changed to store information about range before the entry which significantly
simplifies the logic.

Fixes #1299."

* 'pdziepak/fast-forward-mutation-readers/v5' of github.com:cloudius-systems/seastar-dev: (24 commits)
  sstables: keep separate stream history for single and range reads
  sstables: drop sstable::{lower, upper}_bound()
  row_cache: rework cache to use fast forwarding reader
  row_cache: put cache entry flags in a struct
  row_cache: add do_find_or_create_entry() to reduce code duplication
  mutation_reader: forward fast_forward_to() calls
  tests/row_cache: add fast_forward_to() to throttled reader
  tests/row_cache: count mutations read from _underlying
  memtable: add support for fast_forward_to()
  drop key readers
  tests/mutation_reader: test fast forwarding combined reader
  database: enable fast forwarding of range_sstable_reader
  combined_mutation_reader: implement fast_forward_to()
  mutation_reader: make combinded_reader public
  tests/sstables: add test for fast forwarding reader
  tests: add more helpers to mutation reader assertions
  sstables: enable fast forwarding for range readers
  mutation_reader: introduce fast_forward_to()
  sstables: implement mutation_reader::impl::fast_forward_to()
  sstables: introduce index_reader
  ...
2016-10-19 18:10:44 +03:00
Paweł Dziepak
ab0eeae82d sstables: keep separate stream history for single and range reads
Single partition and partition range reads are expected to behave
considerably different so it is worth to have them use separate file
stream history. This also makes reads use different history for each
sstable which is also a good thing.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
20bfa1fa52 sstables: drop sstable::{lower, upper}_bound()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
5ff699e09f row_cache: rework cache to use fast forwarding reader
This uncomfortably large patch overhauls cache range reader so that it
can take advantage of fast forwarding mutation readers.

A significant change in the cache itself is that the continuity flag now
is used to determine whether cache is contiguous between the previous
entry and the current one. This allows for a significant simplification
of the cache code and easier integration with reader fast forwarding.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
18acb0c0e6 row_cache: put cache entry flags in a struct
Flags are easier to manage if they are in a single structure.
Especially, default initialization and move contstructors are simpler
and less error prone.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
f248e23db5 row_cache: add do_find_or_create_entry() to reduce code duplication
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
bcd374c05d mutation_reader: forward fast_forward_to() calls
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
0c24bbe639 tests/row_cache: add fast_forward_to() to throttled reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
69645455f3 tests/row_cache: count mutations read from _underlying
Originally, cache tests checked how many times a mutation reader was
created from the underlying mutation source to determine whether
continuity flag is working correctly.

This is not going to work with fast forwarding mutation readers so the
test is switched to count number of mutations (+ end of stream markers)
returned from underlying mutaiton readers which is much less fragile.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
e14f8027d5 memtable: add support for fast_forward_to()
Fast forwarding of memtable readers is needed only for unit tests which
often use memtables as underlying data source for cache and the cache
readers.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
6755a679f6 drop key readers
key_readers weren't used since introduction of continuity flag to cache
entries.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
5ac9babe97 tests/mutation_reader: test fast forwarding combined reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
7bebfb851f database: enable fast forwarding of range_sstable_reader
When fast forwarding a reader that combines sstable reader we must also
remember that the set of sstables for the new range may be different
than for the previous one. The reader introduced in this patch makes
sure that we read from correct sstables.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
b7b7b2bd63 combined_mutation_reader: implement fast_forward_to()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
2c0cdd55fc mutation_reader: make combinded_reader public
We want to be able to fast forward sstable readers. However, just
implementing fast_forward_to() for combined_reader is not enough as the
sstables we are reading from may need to change.

Following patches are going to introduce a combined sstable reader that
derives from combined_reader. To make that possible we first need to
make combined_reader public.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
7dcd70124a tests/sstables: add test for fast forwarding reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
5534dc2817 tests: add more helpers to mutation reader assertions
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
cf024975fe sstables: enable fast forwarding for range readers
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
62c9492d33 mutation_reader: introduce fast_forward_to()
This patch introduces the interface for fast forwarding mutation
readers. The main user of this feature is going to be cache which, while
serving range query, may need to read multiple small ranges from the
sstables to populate itself with the missing entries.

Fast forwarding is an alternative to recreating a reader with different
range. Its main advantage is fact that it avoids dropping data that has
already been read.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
c63e88d556 sstables: implement mutation_reader::impl::fast_forward_to()
This patch allows sstable readers to be fast forwarded without making it
necessary to recreate the reader (and dropping all buffers in the
process). It is built on top of index_reader and ability of
data_consume_context to be fast forwarded.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
a530762277 sstables: introduce index_reader
index_reader is a helper that implements index lookups. Its goal is to
avoid dropping read buffers if they still may be needed (for example to
get end bound of the range or after fast forwarding the reader).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
f49a9e0d64 sstables: drop unused read_range_rows() overload
That overload was used only by unit test and violated guarantee that
partition range lives until mutation reader is done.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
0bc873ace5 sstables: add fast_forward_to() to continuous_data_consumer
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
25b91c51e2 ssables: add data_consume_rows_context::reset()
reset() is going to be used to restore valid state after fast forwarding
the reader.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
2124d08b88 sstables: add skip() to compressed_file_data_source
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
54069162f5 Merge "Add test for partition version list consistency after compaction" from Tomek 2016-10-18 11:03:25 +01:00
Tomasz Grabiec
308434f891 tests: memtable: Add test for partition version list consistency after compaction 2016-10-18 11:57:14 +02:00
Tomasz Grabiec
6548132423 lsa: Make logalloc::tracker::full_compaction() compact all reclaimable regions
is_compactible() will pass on very small regions. full_compaction() is
only used in tests to force objects to be moved due to compaction, so
we want all reclaimable regions to be compacted.
2016-10-18 11:16:08 +02:00
Tomasz Grabiec
ecf85cbffb mutation: Define + operation
It's more convenient to write m1 + m2 in tests than to do more
elaborate constructs with copy constructors and apply().
2016-10-18 11:16:08 +02:00
Tomasz Grabiec
fe387f8ba0 partition_version: Fix corruption of partition_version list
The move constructor of partition_version was not invoking move
constructor of anchorless_list_base_hook. As a result, when
partition_version objects were moved, e.g. during LSA compaction, they
were unlinked from their lists.

This can make readers return invalid data, because not all versions
will be reachable.

It also casues leaks of the versions which are not directly attached
to memtable entry. This will trigger assertion failure in LSA region
destructor. This assetion triggers with row cache disabled. With cache
enabled (default) all segments are merged into the cache region, which
currently is not destroyed on shutdown, so this problem would go
unnoticed. With cache disabled, memtable region is destroyed after
memtable is flushed and after all readers stop using that memtable.

Fixes #1753.
Message-Id: <1476778472-5711-1-git-send-email-tgrabiec@scylladb.com>
2016-10-18 09:25:38 +01:00
Duarte Nunes
1d45f19c78 create_view_statement: Use cf_properties
This patch uses cf_properties instead to add the missing attributes to
the create_view_statement class.

Fixes #1766

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-10-18 01:18:52 +00:00
Duarte Nunes
7c58b7e764 unimplemented: Add materialized views
This patch adds the VIEWS element to the cause enum so we can
mark failures due to incomplete support of materialized views.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-10-18 01:18:52 +00:00
Duarte Nunes
7c28ed3dfc schema: Extract default compressor
This patch extracts the definition of the default compressor into the
compression_parameters class, so that the table and view creation
statements don't have to explicitly deal with it.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-10-18 01:18:52 +00:00
Duarte Nunes
dc470e6a36 cql3: Extract cf_properties
This patch extracts the cf_properties class, which contains common
attributes of tables and materialized views.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-10-18 01:18:51 +00:00
Takuya ASADA
587d375e19 main: exit with 1 when verify_seastar_io_scheduler() failed
Since we are exiting Scylla process in engine().at_exit() using
::_exit(0), even verify_seastar_io_scheduler() throwing an exception,
scylla always exit with 0.

Systemd misunderstands scylla-server.service was shutdown successfully
because of this, so we need to pass correct exit code to ::_exit() here.

Fixes #1674

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1475065607-15486-1-git-send-email-syuu@scylladb.com>
2016-10-17 13:57:00 +03:00
Avi Kivity
163088c6af Merge seastar upstream
* seastar 207bf3d...ccd8649 (3):
  > Merge "Augment semaphore with non-blocking operations" from Glauber
  > Merge "More dynamic fstream patches" from Paweł
  > Merge "fstream: add dynamic adjustments based on stream history" from Paweł
2016-10-17 12:49:17 +03:00
Avi Kivity
65c27ccf21 bytes_ostream: make max_chunk_size() an inline function
Fixes debug build looking for a variable definition and not finding it.
2016-10-17 11:49:33 +03:00
Avi Kivity
c0a1ad0b77 bytes_ostream: use larger allocations
A 1MB response will require 2000 allocations with the current 512-byte
chunk size.  Increase it exponentially to reduce allocation count for
larger responses (still respecting the upper limit).
Message-Id: <1476369152-1245-1-git-send-email-avi@scylladb.com>
2016-10-16 10:05:48 +01:00
Tomasz Grabiec
d836e8f64b tests: memtable: Add tests for flushing reader
Message-Id: <1476454187-11462-1-git-send-email-tgrabiec@scylladb.com>
2016-10-14 15:11:06 +01:00
Tomasz Grabiec
63784fd921 db: Fix corruption of partition_entry
Memory accounting code was attaching partition_snapshot to
partition_entry in order to calculate the size of partition_version
object. However, it is only allowed if partition_entry doesn't have
any snapshot attached already. In this case it always has one, created
by the flushing reader.

Change the accounting code to reuse existing partition_snapshot reference.

Fixes #1746
Message-Id: <1476449160-9252-1-git-send-email-tgrabiec@scylladb.com>
2016-10-14 15:10:48 +01:00
Paweł Dziepak
d08cffd3c7 lsa: avoid exceptions during segment_zone creation
LSA tries to allocate zones as large as possible (while still leaving
enough free space for the standard allocator). It uses the amount of
free memory in order to guess how much it can get, but that obviously
doesn't account for fragmentation and the allocation attempt may fail.

This patch changes the LSA code so that it doesn't throw in case zone
couldn't be created but just returns a null pointer which should be
more performant if the LSA memory cannot grow any more.

Fixes #1394.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1476435031-5601-1-git-send-email-pdziepak@scylladb.com>
2016-10-14 11:08:24 +02:00
Amnon Heiman
7829da13b4 scylla_setup: Reorder questions and actions
The expected behaviour in the scylla_setup script is that a question
will be followed by the answer.

For example, after asking if the scylla should be run as a service the
relevant actions will be taken before the following question.

This patch address two such mis-orders:
1. the scylla-housekeeping depends on the scylla-server, but the
setup should first setup the scylla-server service and only then ask
(and install if needed) the scylla-housekeeping.
2. The node_exporter should be placed after the io_setup is done.

Fixes #1739

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1476370098-25617-1-git-send-email-amnon@scylladb.com>
2016-10-13 18:29:36 +03:00
Pekka Enberg
3b4e6cdc5e abstract_replication_strategy: Fix exception type if class not found
Change abstract_replication_strategy::create_replication_strategy() to
throw exceptions::configuration_error if replication strategy class
lookup to make sure the error is converted to the correct CQL response.

Fixes #1755

Message-Id: <1476361262-28723-1-git-send-email-penberg@scylladb.com>
2016-10-13 17:39:28 +03:00