Commit Graph

1300 Commits

Author SHA1 Message Date
Tomasz Grabiec
9da078a18a tests: logalloc_test: Print debugging info and abort on failure
The test started to fail sporadically on jenkins after
7a00dd6985 due to quiesce() timing out. It's not
clear though if this is a regression because before the series such timeouts
would not cause a test failure if the future eventually resolved; the timeout
was only logged.

I was not able to reproduce it either on my setup or on jenkins, so let's add
more debugging output and trigger a coredump the next time the test fails.

Message-Id: <1487089576-27147-1-git-send-email-tgrabiec@scylladb.com>
2017-02-15 14:41:49 +02:00
Tomasz Grabiec
7ec8c4cf54 tests: mutation_source_test: Test that slicing returns only relevant range tombstones 2017-02-13 20:52:50 +01:00
Tomasz Grabiec
2b8bd10dca tests: Pass all mutation source parameters 2017-02-13 20:52:49 +01:00
Tomasz Grabiec
25dffef6ae tests: mutation_source_tests: Ensure timestamps are strictly monotonic 2017-02-13 16:19:32 +01:00
Tomasz Grabiec
e6a95fd8cc tests: streamed_mutation_assertions: Add more expectation methods 2017-02-13 16:19:32 +01:00
Tomasz Grabiec
62843175ea tests: streamed_mutation_assertions: Make produces_end_of_stream() give better error messages 2017-02-13 16:19:32 +01:00
Paweł Dziepak
4ffe0401ee test/mutation_source: specify whether to generate counter mutations
Tests using the random mutation generator should be provided with both
counter and non-counter mutations to ensure that both cases are
sufficiently covered. However, mixed schemas (with both counter and
non-counter columns) are not allowed so the RMG has to be explicitly
told whether to use counter or non-counter schema.
2017-02-07 15:17:14 +00:00
Paweł Dziepak
294bf0bb7a tests/canonical_mutation: don't try to upgrade incompatible schemas
Test case test_reading_with_different_schemas uses randomly generated
pairs of mutations and tries to upgrade one to the schema of the other.

However, there are cases when one schema cannot be upgraded to another,
for example, counter and non-counter schemas.
2017-02-07 15:17:14 +00:00
Avi Kivity
b18e54307f tests: add --operations-per-shard option to perf_simple_query
This helps achieve more repeatable runs that can then be compared via the
Linux perf tool.  The option overrides duration-based testing and runs the
test for a specific number of iterations.
Message-Id: <20170204172937.8462-1-avi@scylladb.com>
2017-02-06 12:08:04 +01:00
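The option described above switches the benchmark from "run until a deadline" to "run exactly N operations". A minimal sketch of that control flow follows; `run_benchmark` and its parameters are hypothetical and are not perf_simple_query's actual code.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <optional>

// Hypothetical driver loop: if an operation count is given, it takes
// precedence over the duration, mirroring how --operations-per-shard
// overrides duration-based testing.
std::uint64_t run_benchmark(const std::function<void()>& op,
                            std::optional<std::uint64_t> operations_per_shard,
                            std::chrono::steady_clock::duration duration) {
    std::uint64_t done = 0;
    if (operations_per_shard) {
        // Fixed iteration count: the same work is done on every run.
        for (; done < *operations_per_shard; ++done) {
            op();
        }
        return done;
    }
    // Duration-based: the iteration count depends on machine speed and noise.
    auto deadline = std::chrono::steady_clock::now() + duration;
    while (std::chrono::steady_clock::now() < deadline) {
        op();
        ++done;
    }
    return done;
}
```

With a fixed count, two profiles recorded by `perf` cover the same amount of work, which is what makes them comparable.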
Avi Kivity
7a00dd6985 Merge "Avoid avalanche of tasks after memtable flush" from Tomasz
"Before, the logic for releasing writes blocked on dirty worked like this:

  1) When region group size changes and it is not under pressure and there
     are some requests blocked, then schedule request releasing task

  2) request releasing task, if no pressure, runs one request and if there are
     still blocked requests, schedules next request releasing task

If requests don't change the size of the region group, then either some request
executes or there is a request releasing task scheduled. The number of scheduled
tasks is at most 1; there is a single releasing thread.

However, if requests themselves would change the size of the group, then each
such change would schedule yet another request releasing thread, growing the task
queue size by one.

The group size can also change when memory is reclaimed from the groups (e.g.
when a region contains sparse segments). Compaction may start many request releasing
threads due to group size updates.

Such behavior is detrimental for performance and stability if there are a lot
of blocked requests. This can happen on 1.5 even with modest concurrency
because timed out requests stay in the queue. This is less likely on 1.6 where
they are dropped from the queue.

The releasing of tasks may start to dominate over other processes in the
system. When the number of scheduled tasks reaches 1000, polling stops and
server becomes unresponsive until all of the released requests are done, which
is either when they start to block on dirty memory again or run out of blocked
requests. It may take a while to reach pressure condition after memtable flush
if it brings virtual dirty much below the threshold, which is currently the
case for workloads with overwrites producing sparse regions.

I saw this happening in a write workload from issue #2021 where the number of
request releasing threads grew into thousands.

Fix by ensuring there is at most one request releasing thread at a time. There
will be one releasing fiber per region group which is woken up when pressure is
lifted. It executes blocked requests until pressure occurs."

* tag 'tgrabiec/lsa-single-threaded-releasing-v2' of github.com:cloudius-systems/seastar-dev:
  tests: lsa: Add test for reclaimer starting and stopping
  tests: lsa: Add request releasing stress test
  lsa: Avoid avalanche releasing of requests
  lsa: Move definitions to .cc
  lsa: Simplify hard pressure notification management
  lsa: Do not start or stop reclaiming on hard pressure
  tests: lsa: Adjust to take into account that reclaimers are run synchronously
  lsa: Document and annotate reclaimer notification callbacks
  tests: lsa: Use with_timeout() in quiesce()
2017-02-02 17:49:31 +02:00
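The fix described in this merge can be illustrated with a simplified, single-threaded model: a size change only wakes the releaser, and a flag guarantees that at most one releaser runs at a time, even when the released requests themselves change the group size. This is a sketch under assumptions: `region_group_model` and its members are hypothetical, and the real code uses a Seastar fiber per region group rather than a synchronous loop.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Simplified model (hypothetical names, not the lsa API): requests blocked
// on dirty memory are released by at most one releaser at a time.
class region_group_model {
    std::size_t _size = 0;
    std::size_t _limit;
    std::deque<std::function<void()>> _blocked;
    bool _releaser_running = false;
public:
    std::size_t releasers_started = 0;

    explicit region_group_model(std::size_t limit) : _limit(limit) {}

    bool under_pressure() const { return _size >= _limit; }

    void block(std::function<void()> req) { _blocked.push_back(std::move(req)); }

    // Called whenever the group size changes, including from inside a request.
    void update(long delta) {
        _size = static_cast<std::size_t>(static_cast<long>(_size) + delta);
        maybe_release();
    }
private:
    void maybe_release() {
        // A size change never starts a second releaser while one is running.
        if (_releaser_running || under_pressure() || _blocked.empty()) {
            return;
        }
        _releaser_running = true;
        ++releasers_started;
        // Run blocked requests until pressure returns or the queue drains.
        while (!under_pressure() && !_blocked.empty()) {
            auto req = std::move(_blocked.front());
            _blocked.pop_front();
            req();  // may call update(), which sees _releaser_running == true
        }
        _releaser_running = false;
    }
};
```

In the old scheme, every `update()` made by a released request would have scheduled another releasing task; here the ten size changes in the test below start only one releaser.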
Piotr Jastrzebski
36b2c4df19 row_cache_test: extend test_mvcc
Make the test execute with and without an active
reader to memtable that's flushed to cache.

This improves the test coverage of MVCC.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <007b6cd1ba7a84ea5675ea82e454bf1adf3b3330.1485954941.git.piotr@scylladb.com>
2017-02-02 13:51:32 +01:00
Paweł Dziepak
8671d8329d perf_simple_query: add counter tables tests 2017-02-02 10:35:14 +00:00
Paweł Dziepak
99b21fbb86 tests: random_mutation_generator: generate counter cells 2017-02-02 10:35:14 +00:00
Paweł Dziepak
de2acd47c9 tests/sstables: test reading and writing counters 2017-02-02 10:35:14 +00:00
Paweł Dziepak
5905729c4a sstables: read counter cells 2017-02-02 10:35:14 +00:00
Paweł Dziepak
de698105e4 tests/counter: test apply, difference and freeze 2017-02-02 10:35:14 +00:00
Paweł Dziepak
496b42fcc7 tests: add test for counters 2017-02-02 10:35:13 +00:00
Tomasz Grabiec
2fd339787b tests: lsa: Add test for reclaimer starting and stopping 2017-02-01 17:41:56 +01:00
Tomasz Grabiec
f943296da0 tests: lsa: Add request releasing stress test 2017-02-01 17:41:55 +01:00
Piotr Jastrzebski
c7e95af0b0 row_cache_test: fix test_mvcc
Currently the test does not wait for cache update
to finish before carrying on with the checks.

This makes the test nondeterministic and simply wrong,
because the checks expect the update to have finished.

This patch changes the test to wait for update to finish.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <2a99bba24b1628466d3495332b48ef3ccdb43c26.1485862389.git.piotr@scylladb.com>
2017-01-31 11:37:29 +00:00
Tomasz Grabiec
f053b48f7c tests: lsa: Adjust to take into account that reclaimers are run synchronously 2017-01-30 19:18:07 +01:00
Tomasz Grabiec
ed9ff19467 lsa: Document and annotate reclaimer notification callbacks
They are called from region_group::update(), so they must be alloc-free and
noexcept.
2017-01-30 19:18:07 +01:00
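The constraint above (callbacks invoked from the size-update path must not allocate or throw) can be expressed in the type system by declaring the virtual callbacks `noexcept`. The interface below is a hypothetical sketch; the class and method names are assumptions, not the actual lsa declarations.

```cpp
#include <cassert>

// Hypothetical interface: callbacks run inside the group's size-update path,
// so they are declared noexcept and implementations must not allocate.
class reclaimer_listener {
public:
    virtual ~reclaimer_listener() = default;
    virtual void start_reclaiming() noexcept = 0;
    virtual void stop_reclaiming() noexcept = 0;
};

// An implementation that only flips a flag: no allocation, cannot throw.
class flag_reclaimer final : public reclaimer_listener {
    bool _active = false;
public:
    void start_reclaiming() noexcept override { _active = true; }
    void stop_reclaiming() noexcept override { _active = false; }
    bool active() const noexcept { return _active; }
};
```

Heavier work, such as actually evicting data, would have to be deferred to a separate task that the flag merely signals.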
Tomasz Grabiec
2ec6fe415e tests: lsa: Use with_timeout() in quiesce()
The current construct doesn't interrupt the test; a timeout will only be
logged.
2017-01-30 19:18:07 +01:00
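The difference between logging a timeout and failing on it can be shown with a standard-library stand-in for Seastar's `with_timeout()`: waiting on the future with a deadline and throwing if it is not ready, so the test fails instead of continuing. `get_with_timeout` is a hypothetical helper built on `std::future::wait_for`; it is not the Seastar API.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <stdexcept>

// Standard-library stand-in for with_timeout(): instead of only logging
// when the deadline passes, fail the caller by throwing.
template <typename T>
T get_with_timeout(std::future<T>& f, std::chrono::milliseconds timeout) {
    if (f.wait_for(timeout) != std::future_status::ready) {
        throw std::runtime_error("timed out waiting for future");
    }
    return f.get();
}
```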
Pekka Enberg
be0351b49c cql3: Introduce raw_value and raw_value_view types
Currently, the code is using bytes_opt and bytes_view_opt to represent
CQL values, which can hold a value or null. In preparation for
supporting a third state, unset value introduced in CQL v4, introduce
new raw_value and raw_value_view types and use them instead.

The new types are based on boost::variant<> and are capable of holding
null, unset values, and blobs that represent a value.
2017-01-26 13:50:04 +02:00
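A three-state CQL value, as described above, maps naturally onto a sum type. The sketch below uses `std::variant` in place of `boost::variant`, and a `std::vector<std::int8_t>` stand-in for Scylla's `bytes`; the factory and accessor names are assumptions for illustration, not the actual `raw_value` interface.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <variant>
#include <vector>

using bytes = std::vector<std::int8_t>;  // stand-in for Scylla's bytes type

struct null_value {};
struct unset_value {};

// A CQL value is either null, unset (CQL v4), or a serialized blob.
class raw_value {
    std::variant<null_value, unset_value, bytes> _data;
    explicit raw_value(std::variant<null_value, unset_value, bytes> d)
        : _data(std::move(d)) {}
public:
    static raw_value make_null() { return raw_value(null_value{}); }
    static raw_value make_unset() { return raw_value(unset_value{}); }
    static raw_value make_value(bytes b) { return raw_value(std::move(b)); }

    bool is_null() const { return std::holds_alternative<null_value>(_data); }
    bool is_unset() const { return std::holds_alternative<unset_value>(_data); }
    bool has_value() const { return std::holds_alternative<bytes>(_data); }
    const bytes& value() const { return std::get<bytes>(_data); }
};
```

Unlike `bytes_opt`, which can only say "present" or "absent", the variant keeps null and unset distinct, which is what CQL v4 needs.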
Tomasz Grabiec
2c7902fb2b Revert "lsa: Reduce reclamation latency"
This reverts commit d61002cc33.

Introduced a regression in row_cache_alloc_stress.

The problem is that reclaim_from_evictable() evicts way too much after
the refactor due to the stop condition not taking into account how
much data was evicted so far and only looking at occupancy of the
minimal segment. This may lead to eviction of the whole region.
2017-01-26 10:43:18 +01:00
Duarte Nunes
54a464ae27 random_mutation_generator: Always generate range tombstones
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-01-23 19:02:23 +01:00
Duarte Nunes
a01aa91c82 range_tombstone_list: Add unit tests for difference()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-01-23 18:14:33 +01:00
Benoît Canet
bcc826cc34 mutation_reader: Short circuit the read path on empty range
Add a boolean to short circuit the read path on an empty range,
hoping for some speedup.

Tested with a cassandra-stress read/write workload using:

cl=QUORUM duration=1m -mode native cql3 -rate threads=700 -node localhost

Additional benchmarks will follow.

Fixes #1056

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <20170118194451.16836-1-benoit@scylladb.com>
2017-01-20 10:05:40 +00:00
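The idea behind the short circuit can be sketched as an early return before any underlying sources are consulted. The half-open `key_range` type, `read_range`, and the `source_lookups` counter (standing in for creating readers over memtables and sstables) are all hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical half-open key range [start, end).
struct key_range {
    int start;
    int end;
    bool is_empty() const { return start >= end; }
};

// Returns the keys in range. An empty range returns immediately without
// touching the (possibly expensive) underlying sources.
std::vector<int> read_range(const std::vector<int>& sorted_keys, key_range r,
                            int& source_lookups) {
    if (r.is_empty()) {
        return {};  // short circuit: no readers created, no I/O
    }
    ++source_lookups;  // stands in for creating readers over the sources
    std::vector<int> out;
    for (int k : sorted_keys) {
        if (k >= r.start && k < r.end) {
            out.push_back(k);
        }
    }
    return out;
}
```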
Tomasz Grabiec
d61002cc33 lsa: Reduce reclamation latency
Currently eviction is performed until occupancy of the whole region
drops below the 85% threshold. This may take a while if the region had
high occupancy and is large. We could improve the situation by only
evicting until occupancy of the sparsest segment drops below the
threshold, as is done by this change.

I tested this using a c-s read workload in which the condition
triggers in the cache region, with 1G per shard:

 lsa-timing - Reclamation cycle took 12.934 us.
 lsa-timing - Reclamation cycle took 47.771 us.
 lsa-timing - Reclamation cycle took 125.946 us.
 lsa-timing - Reclamation cycle took 144356 us.
 lsa-timing - Reclamation cycle took 655.765 us.
 lsa-timing - Reclamation cycle took 693.418 us.
 lsa-timing - Reclamation cycle took 509.869 us.
 lsa-timing - Reclamation cycle took 1139.15 us.

The 144ms pause is when large eviction is necessary.

The change improves worst case latency. Reclamation time statistics
over 30 second period after cache fills up, in microseconds:

Before:

  avg = 1524.283148
  stdev = 11021.021118
  min = 12.934000
  max = 144356.000000
  sum = 257603.852000
  samples = 169

After:

  avg = 1317.362414
  stdev = 1913.542802
  min = 263.935000
  max = 19244.600000
  sum = 175209.201000
  samples = 133

Refs #1634.

Message-Id: <1484730859-11969-1-git-send-email-tgrabiec@scylladb.com>
2017-01-19 17:35:36 +02:00
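The two eviction stop conditions discussed in this commit (and in its later revert, 2c7902fb2b) can be compared side by side. In the hypothetical model below, a region is a list of segments with used bytes and a capacity; the 85% threshold comes from the commit message, while the types and function names are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: each segment records used bytes out of its capacity.
struct segment {
    std::size_t used;
    std::size_t capacity;
};

double occupancy(const segment& s) { return double(s.used) / double(s.capacity); }

// Original stop condition: whole-region occupancy below the threshold.
bool region_below(const std::vector<segment>& segs, double threshold) {
    std::size_t used = 0, cap = 0;
    for (const auto& s : segs) {
        used += s.used;
        cap += s.capacity;
    }
    return cap == 0 || double(used) / double(cap) < threshold;
}

// Condition introduced by this commit: only the sparsest segment is checked.
bool sparsest_below(const std::vector<segment>& segs, double threshold) {
    if (segs.empty()) {
        return true;
    }
    auto it = std::min_element(segs.begin(), segs.end(),
        [](const segment& a, const segment& b) { return occupancy(a) < occupancy(b); });
    return occupancy(*it) < threshold;
}
```

For segments at 90%, 95% and 80% occupancy, the region as a whole is at about 88%, so the original condition keeps evicting, while the sparsest-segment condition can stop immediately. The second check also ignores how much has already been evicted, which is the weakness cited in the revert.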
Tomasz Grabiec
ddfee57c97 Replace iostream include with iosfwd in headers
Message-Id: <1484656119-8386-4-git-send-email-tgrabiec@scylladb.com>
2017-01-17 14:52:44 +02:00
Paweł Dziepak
e03868c226 tests: run with all features enabled
Since ce083308a1
"random_mutation_generator: Generate RTs by default" random mutation
generator produces range tombstones. However, so far the tests were run
with all features disabled (because of incomplete initialization of all
services) which meant that RANGE_TOMBSTONE feature was not enabled and
the code couldn't handle range tombstones that weren't just prefixes.

This patch solves the problem by forcing all features to be enabled when
tests are run.
Message-Id: <20170116103324.22956-1-pdziepak@scylladb.com>
2017-01-16 11:38:45 +01:00
Duarte Nunes
ce083308a1 random_mutation_generator: Generate RTs by default
This patch changes the random_mutation_generator so it generates range
tombstones by default. This fixes a bug where reversibly applying
range tombstones wasn't being tested.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170110164822.28747-1-duarte@scylladb.com>
2017-01-11 09:24:37 +00:00
Avi Kivity
0591303b72 Merge "avoid excessive memory usage during resharding" from Raphael
"Intended to reduce memory usage when resharding by sharing sstable
components among shards. File descriptors are also shared from now
on, meaning that a much smaller number of file descriptors will be
used during resharding.

Fixes #1951."

branch 'excessive_memory_usage_v4' of github.com:raphaelsc/scylla

* 'excessive_memory_usage_v4' of github.com:raphaelsc/scylla:
  db: avoid excessive memory usage during resharding
  checked_file_impl: add support to dup
  sstables: group sstable components that can be shared among shards
  sstables: rename sstable member
2017-01-09 20:43:50 +02:00
Raphael S. Carvalho
68dfcf5256 db: avoid excessive memory usage during resharding
After resharding, sstables may be owned by all shards, which
means that file descriptors and memory usage for metadata will
increase by a factor equal to the number of shards. That can easily
lead to OOM.

SSTable components are immutable, so they can be stored in one
shard and shared with others that need it. We use the following
formula to decide which shard will open the sstable and share
it with the others: (generation % smp::count), which is the
inverse of how we calculate generation for new sstables.
So if no resharding is performed, everything is shard-local.
With this approach, resource usage due to loaded sstables will
be evenly distributed among shards.

For this approach to work, we now only populate keyspaces from
shard 0, which is now solely responsible for iterating through
column family directories. In addition, most of the population
functions are now free functions that take the distributed database
object as a parameter.

Fixes #1951.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-01-09 15:24:36 -02:00
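The ownership rule above, and why it keeps non-resharded sstables shard-local, can be sketched as two small functions. `owner_shard` implements the stated formula `generation % smp::count`. `new_generation` is an assumption about the "inverse" allocation (each shard hands out generations congruent to its own id modulo the shard count); it is illustrative and not copied from Scylla.

```cpp
#include <cassert>
#include <cstdint>

// Shard that opens an sstable and shares its components with the others:
// generation % smp::count.
unsigned owner_shard(std::uint64_t generation, unsigned shard_count) {
    return static_cast<unsigned>(generation % shard_count);
}

// Assumed inverse: a shard allocates new generations congruent to its own
// id modulo the shard count (illustrative, not Scylla's actual code).
std::uint64_t new_generation(std::uint64_t counter, unsigned shard,
                             unsigned shard_count) {
    return counter * shard_count + shard;
}
```

Under this assumption, an sstable written by shard `s` always maps back to shard `s`, so without resharding nothing is shared across shards.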
Avi Kivity
77cb2b452f Merge "CQL 3.3.1 support" from Pekka
"This patch series adds support for CQL 3.3.1. The changes to CQL are listed
here:

  https://github.com/apache/cassandra/blob/cassandra-2.2/doc/cql3/CQL.textile#changes

The following CQL features are already supported by Scylla:

  - TRUNCATE TABLE alias
  - Double-dollar string literals
  - Aggregate functions: MIN, MAX, SUM, and AVG

This series adds the following CQL features:

  - New data types: tinyint, smallint, date, and time
  - CQL binary protocol v4 (required by the new data types)
  - Advertise Cassandra 2.2.8 version from Scylla so that drivers correctly
    detect the presence of CQL 3.3.1

The following CQL features are not supported by Scylla:

  - Role-based access control (issue #1941)
  - JSON data type
  - User-defined functions (UDFs)
  - User-defined aggregates (UDAs)

The following CQL binary protocol v4 changes are not implemented by this
series:

  - Read_failure and Write_failure error codes are not implemented.
    These error codes are not used by the smart drivers, but as they are
    propagated to application code, we eventually need to wire them up
    to our storage proxy implementation.
  - Function_failure error code is only used by user-defined functions
    and the fromJson function, which are not implemented by Scylla.

Fixes #1284."

* 'penberg/cql-3.3.1/v5' of github.com:cloudius-systems/seastar-dev:
  version: Bump Cassandra version to 2.2.8
  db/schema_tables: Add schema_functions and schema_aggregates tables
  tests/type_tests: TIME type test cases
  tests/cql_query_test: TIME type test cases
  cql3: TIME data type support
  tests/type_tests: DATE type test cases
  tests/cql_query_test: DATE type test cases
  cql3: DATE type support
  date.h: 64-bit year and days representation
  licenses: Add utils/date.h license
  utils/date.h: Import date and time library sources
  tests/type_tests: TINYINT and SMALLINT type test cases
  tests/cql_query_test: TINYINT and SMALLINT type test cases
  cql3: TINYINT and SMALLINT data type support
  types: Fix integer_type_impl::parse_int() for bytes
2017-01-09 11:54:45 +02:00
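For reference, the CQL binary protocol v4 specification defines `tinyint` and `smallint` as 1- and 2-byte two's complement integers, `date` as an unsigned 32-bit day count with the Unix epoch (1970-01-01) mapped to 2^31, and `time` as a signed 64-bit count of nanoseconds since midnight in the range 0 to 86399999999999. The helpers below are a hypothetical sketch of the `date` and `time` conversions, not Scylla's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// CQL v4 `date`: unsigned 32-bit day count, Unix epoch mapped to 2^31.
std::uint32_t encode_cql_date(std::int32_t days_since_unix_epoch) {
    return static_cast<std::uint32_t>(
        static_cast<std::int64_t>(days_since_unix_epoch) + (std::int64_t(1) << 31));
}

std::int32_t decode_cql_date(std::uint32_t raw) {
    return static_cast<std::int32_t>(
        static_cast<std::int64_t>(raw) - (std::int64_t(1) << 31));
}

// CQL v4 `time`: nanoseconds since midnight, valid range 0 to 86399999999999.
std::optional<std::int64_t> make_cql_time(std::int64_t nanos_since_midnight) {
    constexpr std::int64_t max_nanos = 86'399'999'999'999;
    if (nanos_since_midnight < 0 || nanos_since_midnight > max_nanos) {
        return std::nullopt;
    }
    return nanos_since_midnight;
}
```

Centering the epoch at 2^31 lets an unsigned field represent dates both before and after 1970.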
Pekka Enberg
10facd7db8 tests/type_tests: TIME type test cases 2017-01-09 10:42:21 +02:00
Pekka Enberg
a49ee9387e tests/cql_query_test: TIME type test cases 2017-01-09 10:42:20 +02:00
Pekka Enberg
9ceea7bbc4 tests/type_tests: DATE type test cases 2017-01-09 10:42:20 +02:00
Pekka Enberg
f0cbfb9e4f tests/cql_query_test: DATE type test cases 2017-01-09 10:42:20 +02:00
Raphael S. Carvalho
eed2a7d065 sstables: group sstable components that can be shared among shards
We intend to share immutable sstable components among shards to
reduce excessive memory usage when resharding shared sstables.

This change is about grouping those components into a structure,
and using a foreign_ptr to make sure that the structure will be
deleted by whichever shard created it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-01-06 15:16:19 -02:00
Raphael S. Carvalho
a492f8dfaf sstables: rename sstable member
Rename _components to _recognized_components because _components
will be used to name a field with shareable components.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-01-06 15:16:17 -02:00
Avi Kivity
be11b054e1 Merge "Reduce the size of mutation_partition" from Piotr
"Reduce the size of mutation_partition by implementing intrusive set using
bi::rbtree_algorithms directly and using tree nodes optimized for size.

This will reduce the size of mutation_partition by:
24 bytes + <number of cql rows> * 8 bytes

This should have a positive impact on performance because mutation_partitions
are stored both in memtable and cache.

Fixes #742."

* 'haaawk/742' of github.com:cloudius-systems/seastar-dev:
  intrusive_set: rename size() to calculate_size()
  Make intrusive_set_external_comparator::_value_traits static
  Implement intrusive set using rbtree_algorithms
  mutation_partition: make apply_reversibly_intrusive_set nongeneric
  mutation_partition: take schema in find_row and clustered_row
  mutation_partition: Extract intrusive set logic to a class.
  mutation_partition: Replace value_comp with key_comp calls
2017-01-05 17:34:10 +02:00
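Per the formula above, a partition with 1,000 CQL rows would shrink by 24 + 8 × 1000 = 8,024 bytes. One common way to save 8 bytes per red-black tree node on a 64-bit platform is to pack the node color into the low bit of the parent pointer; Boost.Intrusive offers this through the `optimize_size<true>` hook option. Whether this is exactly the saving the series achieves is an assumption; the node types below are hypothetical illustrations of the technique.

```cpp
#include <cassert>
#include <cstdint>

// Conventional red-black tree node: three links plus a separate color field.
struct plain_node {
    plain_node* parent;
    plain_node* left;
    plain_node* right;
    bool red;  // padded up to pointer alignment
};

// Size-optimized node: the color bit lives in the low bit of the parent
// pointer, which is always zero because nodes are at least 2-byte aligned.
class compact_node {
    std::uintptr_t _parent_and_color = 0;
public:
    compact_node* left = nullptr;
    compact_node* right = nullptr;

    compact_node* parent() const {
        return reinterpret_cast<compact_node*>(_parent_and_color & ~std::uintptr_t(1));
    }
    bool red() const { return _parent_and_color & 1; }
    void set_parent(compact_node* p) {
        _parent_and_color = reinterpret_cast<std::uintptr_t>(p) | (_parent_and_color & 1);
    }
    void set_red(bool r) {
        _parent_and_color = (_parent_and_color & ~std::uintptr_t(1)) | std::uintptr_t(r);
    }
};
```

On a typical 64-bit target, `plain_node` occupies 32 bytes and `compact_node` 24.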
Piotr Jastrzebski
b159e08764 intrusive_set: rename size() to calculate_size()
This will hopefully make it more apparent that
the time complexity of this method is O(N) not O(1).

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-01-05 12:21:43 +01:00
Piotr Jastrzebski
4bbe05dd47 mutation_partition: take schema in find_row and clustered_row
This will allow intrusive set implementation that does not
store schema.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-01-05 11:26:03 +01:00
Pekka Enberg
0ea5652354 tests/type_tests: TINYINT and SMALLINT type test cases 2017-01-05 10:57:35 +02:00
Pekka Enberg
41e3327ebc tests/cql_query_test: TINYINT and SMALLINT type test cases 2017-01-05 10:57:35 +02:00
Pekka Enberg
060841b756 tests/types_test: Fix int32 type string conversion boundary case
The test case is interested in the upper boundary of a 32-bit integer
because we already test the lower boundary in assertions below. The old
test passed, of course, but it wasn't very interesting.
Message-Id: <1483522773-6008-1-git-send-email-penberg@scylladb.com>
2017-01-04 11:57:02 +01:00
Avi Kivity
868b4d110c Merge "Fixes for intentional short reads" from Paweł
"This patchset contains fixes for the changes introduced in "Query result
size limiting". It also improves handling of short data reads.

In order to minimise the chance of a digest mismatch during data queries,
replicas that were asked only to return a digest also keep track of the
size of the data (in the IDL representation), so that they stop at the
same point as nodes doing full data queries. Moreover, data queries are
not affected by the per-shard memory limit, and the coordinator sends
individual result size limits to replicas in order not to depend on
hardcoded values.

It is still possible to get digest mismatches if the IDL changes (e.g. a
new field is added), but, hopefully, that won't be a serious problem."

* 'pdziepak/short-read-fixes/v4' of github.com:cloudius-systems/seastar-dev:
  query: introduce result_memory_accounter::foreign_state
  storage_proxy: fix short reads in parallel range queries
  storage_proxy: pass maximum result size to replicas
  mutation_partition: use result limiter for digest reads
  query: make result_memory_limiter constants available for linker
  result_memory_limiter: add accounter for digest reads
  idl: allow writers to use any output stream
  result_memory_limiter: split new_read() to new_{data, mutation}_read()
  idl: is_short_read() was added in 1.6
  mutation_partition: honour allowed_short_read for static rows
  storage_proxy: fix _is_short_read computation
  storage_proxy: disallow short reads if got no live rows
  storage_proxy: don't stop after result with no live rows
2016-12-26 10:42:49 +02:00
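The reason digest replicas track data size can be sketched as follows: if a data replica and a digest replica account the same serialized row sizes against the same limit, both stop after the same row and therefore hash the same prefix of the result. `result_size_accounter` and `rows_consumed` are hypothetical names; the real `result_memory_accounter` differs in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accounter: adds up the size each row would have in the
// serialized (IDL) result and reports when the limit is reached.
class result_size_accounter {
    std::size_t _used = 0;
    std::size_t _limit;
public:
    explicit result_size_accounter(std::size_t limit) : _limit(limit) {}
    // Returns true if the caller should stop (short read).
    bool account(std::size_t serialized_row_size) {
        _used += serialized_row_size;
        return _used >= _limit;
    }
};

// Number of rows consumed before stopping. A digest replica running this
// with the same sizes and limit stops after the same row as a data replica.
std::size_t rows_consumed(const std::vector<std::size_t>& row_sizes,
                          std::size_t limit) {
    result_size_accounter acc(limit);
    std::size_t n = 0;
    for (std::size_t sz : row_sizes) {
        ++n;
        if (acc.account(sz)) {
            break;
        }
    }
    return n;
}
```

As the merge message notes, this only holds while both sides compute the same serialized sizes; an IDL change that alters row size breaks the agreement.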
Avi Kivity
1d9ee358f1 Revert "Merge "Reduce the size of mutation_partition" from Piotr"
This reverts commit aa392810ff, reversing
changes made to a24ff47c637e6a5fd158099b8a65f1191fc2d023; it uses
boost::intrusive::detail directly, which it must not, and doesn't compile on
all boost versions as a consequence.
2016-12-25 16:07:48 +02:00
Avi Kivity
aa392810ff Merge "Reduce the size of mutation_partition" from Piotr
"Reduce the size of mutation_partition by implementing intrusive set using
bi::rbtree_algorithms directly and using tree nodes optimized for size.

This will reduce the size of mutation_partition by:
24 bytes + <number of cql rows> * 8 bytes

This should have a positive impact on performance because mutation_partitions
are stored both in memtable and cache.

Fixes #742."

* 'haaawk/742' of github.com:cloudius-systems/seastar-dev:
  intrusive_set: rename size() to calculate_size()
  Make intrusive_set_external_comparator::_value_traits static
  Implement intrusive set using rbtree_algorithms
  mutation_partition: make apply_reversibly_intrusive_set nongeneric
  mutation_partition: take schema in find_row and clustered_row
  mutation_partition: Extract intrusive set logic to a class.
  mutation_partition: Replace value_comp with key_comp calls
2016-12-25 12:56:10 +02:00