Commit Graph

11716 Commits

Author SHA1 Message Date
Tomasz Grabiec
0d40b86546 Merge "bail sooner from cache update if need_preempt()" from Glauber
An earlier patch of mine used should_yield() to do the same.  That
is a better direction, but should_yield() was demonstrably more
expensive, so for now we'll go with need_preempt(), since the current
behavior is hurting pretty much every latency-dependent workload.

I am also including the scripts that I have used to measure and
compare the various versions of this patch.
2017-01-31 09:51:34 +01:00
Tomasz Grabiec
f053b48f7c tests: lsa: Adjust to take into account that reclaimers are run synchronously 2017-01-30 19:18:07 +01:00
Tomasz Grabiec
ed9ff19467 lsa: Document and annotate reclaimer notification callbacks
They are called from region_group::update(), so must be alloc-free and
noexcept.
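
A minimal sketch of what such an annotation can look like. The `reclaimer_listener` interface, its method names and the `notify()` caller are hypothetical stand-ins, not the real LSA API. The callbacks are declared `noexcept` and only touch preallocated state, so invoking them from an update path can neither throw nor allocate, and a `static_assert` lets the compiler check the contract.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical listener interface; names are illustrative only.
struct reclaimer_listener {
    virtual ~reclaimer_listener() = default;
    // Called from an update path: must not allocate and must not throw.
    virtual void on_pressure() noexcept = 0;
    virtual void on_pressure_relieved() noexcept = 0;
};

// An implementation that only flips a preallocated flag and bumps a counter.
struct flag_listener final : reclaimer_listener {
    bool under_pressure = false;
    std::size_t notifications = 0;
    void on_pressure() noexcept override {
        under_pressure = true;
        ++notifications;
    }
    void on_pressure_relieved() noexcept override {
        under_pressure = false;
        ++notifications;
    }
};

// Hypothetical stand-in for the caller (region_group::update() in the commit).
void notify(reclaimer_listener& l, bool pressure) noexcept {
    if (pressure) {
        l.on_pressure();
    } else {
        l.on_pressure_relieved();
    }
}

// The noexcept contract is part of the type and can be checked at compile time.
static_assert(noexcept(std::declval<reclaimer_listener&>().on_pressure()),
              "reclaimer callbacks must be noexcept");
```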
2017-01-30 19:18:07 +01:00
Tomasz Grabiec
2ec6fe415e tests: lsa: Use with_timeout() in quiesce()
The current construct doesn't interrupt the test; the timeout failure is
only logged.
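
A sketch of the idea using `std::future` instead of Seastar futures. `wait_or_fail()` is a hypothetical helper loosely analogous to Seastar's `with_timeout()`: rather than merely logging a timeout, it throws, so a test waiting on it fails instead of carrying on.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <stdexcept>

// Hypothetical helper: waits for the future, but turns a timeout into an
// exception so the calling test is actually interrupted.
template <typename T>
T wait_or_fail(std::future<T> f, std::chrono::milliseconds timeout) {
    if (f.wait_for(timeout) != std::future_status::ready) {
        throw std::runtime_error("timed out waiting for quiesce");
    }
    return f.get();
}
```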
2017-01-30 19:18:07 +01:00
Pekka Enberg
a625aae489 cql3/values.hh: Fix to_bytes_opt(raw_value)
The data() method already returns a bytes_opt so there's no need to call to_bytes_opt() again.

Fixes compilation failure on CentOS:

  In file included from ./cql3/query_options.hh:51:0,
                   from ./cql3/cql_statement.hh:47,
                   from ./cql3/statements/raw/select_statement.hh:45,
                   from build/release/gen/cql3/CqlParser.hpp:65,
                   from build/release/gen/cql3/CqlParser.cpp:44:
  ./cql3/values.hh: In function 'bytes_opt to_bytes_opt(const cql3::raw_value&)':
  ./cql3/values.hh:184:37: error: no matching function for call to 'to_bytes_opt(bytes_opt)'
       return to_bytes_opt(value.data());
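
A simplified illustration of the fix. Here `bytes` and `bytes_opt` are modeled as `std::string` and `std::optional<std::string>`, and `raw_value` is reduced to a wrapper around an optional; these are assumptions, not the real definitions. Because `data()` already yields a `bytes_opt`, the function can return it directly instead of looking for a `to_bytes_opt(bytes_opt)` overload that does not exist.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Simplified stand-ins for the real types.
using bytes = std::string;
using bytes_opt = std::optional<bytes>;

class raw_value {
    bytes_opt _data;
public:
    explicit raw_value(bytes_opt d) : _data(std::move(d)) {}
    // Already returns an optional, so callers need no further conversion.
    const bytes_opt& data() const { return _data; }
};

// Before: return to_bytes_opt(value.data());  // no overload takes a bytes_opt
// After: return the optional directly.
inline bytes_opt to_bytes_opt(const raw_value& value) {
    return value.data();
}
```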

Message-Id: <1485761863-28236-1-git-send-email-penberg@scylladb.com>
2017-01-30 10:49:31 +02:00
Gleb Natapov
6e4817137e storage_proxy: report foreground reads instead of reads
The reason is the same as why foreground writes are reported instead of
total writes (049ae37d08): It is much easier to see what is going on
this way.

Also fixes a typo in a counter's description.

Fixes #1217

Message-Id: <20170129093412.GS11469@scylladb.com>
2017-01-29 12:40:56 +02:00
Avi Kivity
9fb2f31616 Merge "CQL binary protocol unset value support" from Pekka
This patch series adds support for "unset values" that were introduced
in CQL binary protocol v4. They allow bound statements to skip updates
to some or all of the bound variables.

Unset values are specified using the BoundStatement.unset() method in
the Java driver:

  http://docs.datastax.com/en/drivers/java/3.1/com/datastax/driver/core/BoundStatement.html#unset-int-

and using the UNSET_VALUE constant in the Python driver:

  https://datastax.github.io/python-driver/api/cassandra/query.html#cassandra.query.UNSET_VALUE

Fixes #2039.

* 'penberg/cql-unset-values/v2' of github.com:cloudius-systems/seastar-dev:
  transport/server: CQL unset value support
  cql3/statements/select_statement: Unset value support
  cql3/user_types: Unset value support
  cql3/tuples: Unset value support
  cql3/maps: Unset value support
  cql3/sets: Unset value support
  cql3/lists: Unset value support
  cql3/constants: UNSET_VALUE constant
  cql3/constants: Unset value support
  cql3/attributes: Unset value support
  types.hh: Add field_name_as_string() to user_type_impl type
  cql3: Introduce raw_value and raw_value_view types
2017-01-29 10:59:01 +02:00
Pekka Enberg
533c8d3949 transport/server: CQL unset value support
This patch implements support for CQL unset values at the protocol level.
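
At the protocol level, native protocol v4 encodes a bound [value] as a signed 32-bit big-endian length: n >= 0 means n bytes follow, -1 means null, and -2 means "not set" (unset). The sketch below decodes that encoding. It is a standalone illustration with hypothetical type names, not the transport/server code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <variant>

// The three possible states of a bound [value].
struct null_value {};
struct unset_value {};
using cql_value = std::variant<null_value, unset_value, std::string>;

// Reads a big-endian signed 32-bit [int] at pos.
inline std::int32_t read_be32(const std::string& buf, std::size_t pos) {
    if (pos + 4 > buf.size()) {
        throw std::out_of_range("truncated [int]");
    }
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        v = (v << 8) | static_cast<unsigned char>(buf[pos + i]);
    }
    return static_cast<std::int32_t>(v);
}

// Parses one [value] starting at pos and advances pos past it.
inline cql_value read_value(const std::string& buf, std::size_t& pos) {
    std::int32_t n = read_be32(buf, pos);
    pos += 4;
    if (n == -1) {
        return null_value{};
    }
    if (n == -2) {
        return unset_value{};
    }
    if (n < 0) {
        throw std::invalid_argument("invalid [value] length");
    }
    auto len = static_cast<std::size_t>(n);
    if (pos + len > buf.size()) {
        throw std::out_of_range("truncated [value]");
    }
    std::string bytes = buf.substr(pos, len);
    pos += len;
    return bytes;
}
```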

Fixes #2039
2017-01-27 09:24:36 +02:00
Pekka Enberg
2bd560118e cql3/statements/select_statement: Unset value support 2017-01-27 09:24:36 +02:00
Pekka Enberg
baaf1779c5 cql3/user_types: Unset value support 2017-01-27 09:24:36 +02:00
Pekka Enberg
99c7dabd2a cql3/tuples: Unset value support 2017-01-27 09:24:36 +02:00
Pekka Enberg
a0e6f6f371 cql3/maps: Unset value support 2017-01-27 09:24:36 +02:00
Pekka Enberg
f883e64d70 cql3/sets: Unset value support 2017-01-27 09:24:36 +02:00
Pekka Enberg
50ec81ee67 cql3/lists: Unset value support 2017-01-27 09:24:36 +02:00
Pekka Enberg
c4cd0a6541 cql3/constants: UNSET_VALUE constant 2017-01-27 09:24:36 +02:00
Pekka Enberg
063be3ed44 cql3/constants: Unset value support 2017-01-27 09:24:36 +02:00
Glauber Costa
b4ac2c1d60 debug: add systemtap script to measure interesting latencies during cache updates.
Example output:

Measuring Scylla row cache update times ^C
Total update time, (usec)
value |-------------------------------------------------- count
    2 |                                                   0
    4 |                                                   0
    8 |@@                                                 2
   16 |@@@                                                3
   32 |                                                   0
   64 |                                                   0
  128 |@@@@                                               4
  256 |@@                                                 2
  512 |                                                   0
 1024 |                                                   0

Time spent per partition batch (nsec)
 value |-------------------------------------------------- count
   128 |                                                       0
   256 |                                                       0
   512 |                                                      43
  1024 |                                                       2
  2048 |                                                       2
  4096 |                                                      45
  8192 |                                                     349
 16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  61494
 32768 |@@@@@@@@@@@@@@@@@                                  21497
 65536 |                                                       0
131072 |                                                       0

Partitions updated per batch:
value |-------------------------------------------------- count
    0 |                                                      57
    1 |                                                      46
    2 |                                                      76
    4 |                                                     134
    8 |                                                     324
   16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  82795
   32 |                                                       0
   64 |                                                       0

Total partitions updated: 2485000
Average time spent per partition batch (nsec): 28816
Average time per partition per partition (nsec): 967

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2017-01-26 22:15:16 -05:00
Glauber Costa
69dbb3e108 row_cache: yield if need_preempt(), even if there is quota left.
The quota check is quite old at the moment, and dates back to a time in
which the infrastructure in seastar threads was lacking a lot. It is a
bad check, since it takes into account neither the size of the
partitions nor the time it takes to merge them.

A better check would at least take need_preempt() into account, so that
we would respect the task quota. That check is now embedded into
should_yield(), so there would be no need to check anything else.

Although should_yield() does the job, it is still currently quite
expensive. And because we are in a seastar thread with a computationally
intensive loop, it can hurt latency a lot.

So as a temporary measure, let's at least check for need_preempt() - as
it is hurting real users at the moment - and soon work on making
should_yield() cheaper.
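
The shape of the loop can be sketched as follows. `task_budget::need_preempt()` is a hypothetical stand-in for Seastar's `need_preempt()` that simply compares the clock against a deadline. The loop stops when either the item quota is exhausted or preemption is requested, whichever comes first, and returns the position the caller would resume from after yielding.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using clock_type = std::chrono::steady_clock;

// Hypothetical stand-in for need_preempt(): true once the task's time budget is spent.
struct task_budget {
    clock_type::time_point deadline;
    bool need_preempt() const { return clock_type::now() >= deadline; }
};

// Processes items from pos until the quota is used up or preemption is requested.
template <typename Fn>
std::size_t process_batch(const std::vector<int>& items, std::size_t pos,
                          std::size_t quota, const task_budget& budget, Fn&& fn) {
    std::size_t done = 0;
    while (pos < items.size() && done < quota) {
        fn(items[pos]);
        ++pos;
        ++done;
        if (budget.need_preempt()) {
            break;  // yield early even though quota is left
        }
    }
    return pos;
}
```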

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2017-01-26 22:10:54 -05:00
Glauber Costa
0e1f64b163 row_cache: add systemtap markers for the update process
update is one of our biggest sources of performance issues as far as the
cache is concerned. systemtap can be useful in tracking some of them
down.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2017-01-26 21:56:32 -05:00
Duarte Nunes
937ed1bacb bound_view: Simplify copy ctor
By using the compiler-generated default.
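
A sketch of the pattern on a simplified, hypothetical `bound_view` that holds a non-owning pointer and a bound kind (the real class's members may differ). Since the view does not own what it refers to, a member-wise copy is exactly what is wanted, and `= default` lets the compiler generate it; the result stays trivially copyable.

```cpp
#include <cassert>
#include <type_traits>

enum class bound_kind { incl_start, excl_start, incl_end, excl_end };

// Simplified, hypothetical bound_view: refers to data it does not own.
struct bound_view {
    const int* prefix;
    bound_kind kind;

    bound_view(const int* p, bound_kind k) : prefix(p), kind(k) {}
    // Compiler-generated member-wise copy instead of a hand-written one.
    bound_view(const bound_view&) = default;
    bound_view& operator=(const bound_view&) = default;
};

static_assert(std::is_trivially_copyable<bound_view>::value,
              "defaulted copy operations keep the type trivially copyable");
```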

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Reviewed-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1485355007-1913-1-git-send-email-duarte@scylladb.com>
2017-01-26 19:29:29 +02:00
Avi Kivity
b91b9b351a Revert "Merge seastar upstream"
This reverts commit f301c678bfe5eb5df71f71fd20e08b422b1023bb; the rpc changes
don't compile due to rpc timeout type change.
2017-01-26 18:30:56 +02:00
Avi Kivity
f301c678bf Merge seastar upstream
* seastar 397685c...f5fa2e3 (3):
  > rpc: use lowres_clock instead of high resolution one
  > semaphore: make semaphore's clock configurable
  > rpc: detect timedout outgoing packets earlier
2017-01-26 18:16:14 +02:00
Pekka Enberg
3385144860 cql3/attributes: Unset value support 2017-01-26 13:50:04 +02:00
Pekka Enberg
630aba32ff types.hh: Add field_name_as_string() to user_type_impl type
This is needed to construct validation error messages when user types
encounter unset values.
2017-01-26 13:50:04 +02:00
Pekka Enberg
be0351b49c cql3: Introduce raw_value and raw_value_view types
Currently, the code is using bytes_opt and bytes_view_opt to represent
CQL values, which can hold a value or null. In preparation for
supporting a third state, the unset value introduced in CQL v4, introduce
new raw_value and raw_value_view types and use them instead.

The new types are based on boost::variant<> and are capable of holding
null, unset values, and blobs that represent a value.
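
A simplified sketch of such a three-state type. It swaps in `std::variant` for the `boost::variant<>` the commit uses, models blobs as `std::string`, and the factory and accessor names are assumptions. The hypothetical `apply_to_cell()` shows why the third state matters for bound statements: an unset value leaves the existing cell untouched, while null clears it.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <variant>

using bytes = std::string;

class raw_value {
    struct null_value {};
    struct unset_value {};
    std::variant<null_value, unset_value, bytes> _data;

    explicit raw_value(std::variant<null_value, unset_value, bytes> d) : _data(std::move(d)) {}
public:
    static raw_value make_null() { return raw_value(null_value{}); }
    static raw_value make_unset_value() { return raw_value(unset_value{}); }
    static raw_value make_value(bytes b) { return raw_value(std::move(b)); }

    bool is_null() const { return std::holds_alternative<null_value>(_data); }
    bool is_unset_value() const { return std::holds_alternative<unset_value>(_data); }
    const bytes& value() const { return std::get<bytes>(_data); }
};

// Hypothetical helper: how each state affects an existing cell.
inline void apply_to_cell(std::optional<bytes>& cell, const raw_value& v) {
    if (v.is_unset_value()) {
        return;            // unset: skip the update entirely
    }
    if (v.is_null()) {
        cell.reset();      // null: clear the cell
    } else {
        cell = v.value();  // regular value: overwrite
    }
}
```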
2017-01-26 13:50:04 +02:00
Gleb Natapov
64660397fc storage_proxy: move operation type information from counter's name to a label
Gives much more flexibility in viewing the data in various ways in Grafana.

Message-Id: <20170126102746.GL11469@scylladb.com>
2017-01-26 12:38:29 +02:00
Tomasz Grabiec
2c7902fb2b Revert "lsa: Reduce reclamation latency"
This reverts commit d61002cc33.

Introduced a regression in row_cache_alloc_stress.

The problem is that reclaim_from_evictable() evicts way too much after
the refactor due to the stop condition not taking into account how
much data was evicted so far and only looking at occupancy of the
minimal segment. This may lead to eviction of the whole region.
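
A deliberately simplified model of the stop condition in question; it is not the LSA code. A region is represented as a queue of entry sizes, and the hypothetical `evict_until_freed()` evicts oldest-first only while the amount freed so far is below the target, so the rest of the region survives instead of being evicted wholesale.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model: evict entries until at least bytes_to_free have been freed.
// The loop condition tracks the amount evicted so far.
std::size_t evict_until_freed(std::deque<std::size_t>& entry_sizes, std::size_t bytes_to_free) {
    std::size_t freed = 0;
    while (freed < bytes_to_free && !entry_sizes.empty()) {
        freed += entry_sizes.front();
        entry_sizes.pop_front();
    }
    return freed;
}
```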
2017-01-26 10:43:18 +01:00
Paweł Dziepak
8cdffd7c57 time_type_impl: value initialize result
parse_time() adds hours, minutes, etc. to a final value 'result'.
However, it is of type std::chrono::nanoseconds, which means it is not
zeroed at initialization unless it is explicitly value-initialized.
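
The difference is easy to show in isolation. `std::chrono::duration` has a defaulted default constructor, so a plain local `std::chrono::nanoseconds result;` holds an indeterminate count, while `result{}` value-initializes it to zero. The function below is a hypothetical stand-in for parse_time(), not its actual code.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical accumulation of a time of day into nanoseconds.
std::chrono::nanoseconds to_nanoseconds(long hours, long minutes, long seconds, long nanos) {
    std::chrono::nanoseconds result{};  // value-initialized: starts at zero
    result += std::chrono::hours(hours);
    result += std::chrono::minutes(minutes);
    result += std::chrono::seconds(seconds);
    result += std::chrono::nanoseconds(nanos);
    return result;
}
```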

Fixes debug mode failures in types_test and cql_query_test.

Message-Id: <20170125155239.1253-1-pdziepak@scylladb.com>
2017-01-25 17:56:31 +02:00
Paweł Dziepak
034d028329 Merge "range_tombstone_list: Properly implement difference()" from Duarte
"This patchset properly implements range_tombstone_list::difference(),
which was very broken. We add unit tests for the function and ensure
we always randomly generate range_tombstones in other unit tests so
other problems aren't hidden."
2017-01-25 12:08:19 +00:00
Duarte Nunes
8c65b98ea7 mutation_merger: Emit deferred tombstones
This patch ensures the mutation_merger emits any deferred tombstones
that it still may be holding before closing the stream.

Together with the range_tombstone_list: Properly implement
difference() patch set, this fixes breakage of streamed_mutation_test
and row_cache_test.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170123195643.9876-1-duarte@scylladb.com>
2017-01-25 12:02:03 +00:00
Takuya ASADA
bce0fb3fa2 dist: add lspci to dependencies, since it is used by dpdk-devbind.py
In a minimal setup environment, scylla_sysconfig_setup fails because the lspci command is not installed, so install it at package installation time.

Fixes #2035

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1485327435-20543-1-git-send-email-syuu@scylladb.com>
2017-01-25 10:22:57 +02:00
Avi Kivity
d2fc98270e Merge seastar upstream
* seastar 6d80c6a...397685c (4):
  > Merge "add label to the io_queue" from Amnon
  > rpc: Modify the shutdown code to wait and handle exceptions
  > tls.cc: Fix shutdown_input/output to conform with expected socket behaviour
  > core: Add counter for polls
2017-01-24 18:36:25 +02:00
Gleb Natapov
ccee01f352 storage_proxy: put datacenter name into a label instead of counter's name
Having datacenter name as a label makes it possible to create Prometheus board for the counters.

Message-Id: <20170124132051.GX11469@scylladb.com>
2017-01-24 15:27:34 +02:00
Duarte Nunes
54a464ae27 random_mutation_generator: Always generate range tombstones
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-01-23 19:02:23 +01:00
Duarte Nunes
a01aa91c82 range_tombstone_list: Add unit tests for difference()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-01-23 18:14:33 +01:00
Duarte Nunes
85315d1760 range_tombstone_list: Correctly implement difference()
The difference method wasn't properly implemented. The version in this
patch correctly computes the difference and returns a range tombstone
list containing those range tombstones in "this" but absent from the
other, specified range tombstone list.
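
A simplified analogue of the computation, not the actual algorithm: it ignores timestamps and clustering-key bounds and treats each tombstone as a half-open integer interval. Given two sorted, non-overlapping lists, the hypothetical `difference()` returns the parts of the first list that the second does not cover.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open interval [start, end); stands in for a range tombstone.
struct interval {
    int start;
    int end;
    bool operator==(const interval& o) const { return start == o.start && end == o.end; }
};

// Returns the parts of a not covered by b. Both inputs: sorted, non-overlapping.
std::vector<interval> difference(const std::vector<interval>& a, const std::vector<interval>& b) {
    std::vector<interval> out;
    std::size_t j = 0;
    for (const interval& x : a) {
        int cur = x.start;
        // Skip intervals of b that end before x begins.
        while (j < b.size() && b[j].end <= cur) {
            ++j;
        }
        // Emit the gaps between the intervals of b that overlap x.
        for (std::size_t k = j; k < b.size() && b[k].start < x.end; ++k) {
            if (b[k].start > cur) {
                out.push_back({cur, b[k].start});
            }
            cur = std::max(cur, b[k].end);
        }
        if (cur < x.end) {
            out.push_back({cur, x.end});
        }
    }
    return out;
}
```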

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-01-23 18:14:33 +01:00
Duarte Nunes
e7d20ea900 range_tombstone_list: Add apply() convenience overload
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-01-23 18:14:33 +01:00
Duarte Nunes
0847954d92 bound_view: Add copy ctor and assignment operator
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-01-23 18:14:33 +01:00
Avi Kivity
1758361640 Merge seastar upstream
* seastar 38aaa4a...6d80c6a (2):
  > DPDK: Change the metrics registration with label support
  > metric: Fix the error: could not convert {...} from <brace-enclosed initializer list> to struct metric_definition_impl
2017-01-23 11:55:21 +02:00
Takuya ASADA
f6d7a76223 dist: rename dist/ubuntu to dist/debian
Now we support both Ubuntu and Debian in dist/ubuntu, and Ubuntu is a
Debian variant, so dist/debian is a better name.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1485161896-21851-1-git-send-email-syuu@scylladb.com>
2017-01-23 10:59:52 +02:00
Avi Kivity
31c8e6885b build: improve support for custom builds
Add a counter field to RELEASE, just before the date, and fix it at zero.
This allows custom package builds to override it in a way that sorts before
the official packages.

Example:

  Official release:   1.6.0-0.20160120.<githash>
  Custom release 1:   1.6.0-1.avi.20160121.<githash>
  Custom release 2:   1.6.0-2.avi.20160122.<githash>

The counter (0/1/2) ensures that the build number dominates over the date
when sorting.
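
The effect can be sketched with a simplified ordering, assuming that release strings compare field by field on '.', numerically when both fields are numeric and lexically otherwise. This is an approximation for illustration, not rpm's or dpkg's actual comparison algorithm. Under it, the counter field decides the order before the date is ever looked at.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits a release string such as "0.20160120.abc" on '.'.
std::vector<std::string> split_release(const std::string& s) {
    std::vector<std::string> parts;
    std::string cur;
    for (char c : s) {
        if (c == '.') {
            parts.push_back(cur);
            cur.clear();
        } else {
            cur += c;
        }
    }
    parts.push_back(cur);
    return parts;
}

static bool is_number(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s) {
        if (!std::isdigit(c)) return false;
    }
    return true;
}

// Simplified ordering: true if release a sorts before release b.
bool release_less(const std::string& a, const std::string& b) {
    auto pa = split_release(a), pb = split_release(b);
    for (std::size_t i = 0; i < pa.size() && i < pb.size(); ++i) {
        if (pa[i] == pb[i]) continue;
        if (is_number(pa[i]) && is_number(pb[i])) {
            return std::stoull(pa[i]) < std::stoull(pb[i]);
        }
        return pa[i] < pb[i];
    }
    return pa.size() < pb.size();
}
```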

Message-Id: <20170122102814.19649-1-avi@scylladb.com>
2017-01-22 14:56:52 +02:00
Avi Kivity
1be9c232b6 Merge seastar upstream
* seastar ff098c8...38aaa4a (1):
  > metrics: equal operator should use ==
2017-01-22 14:41:59 +02:00
Tomasz Grabiec
834df74df0 Merge batch statement optimization from github.com/avikivity/scylla/1689/v2
From Avi:

In many cases, batch statements are used to mutate a single partition, or
a number of partitions that is smaller than the number of statements within
the batch.  We can detect this case and reduce the number of mutations
applied, and in some cases, convert a logged batch into an unlogged batch.

Ref #1689.
2017-01-20 13:44:05 +01:00
Tomasz Grabiec
6c75614d19 sstables: Fix input_stream not being closed by index_reader
Fixes #2022
Message-Id: <1484912679-5729-1-git-send-email-tgrabiec@scylladb.com>
2017-01-20 11:58:33 +00:00
Paweł Dziepak
19ad35610b sstables: do not discard future returned by fast_forward_to()
continuous_data_consumer::fast_forward_to() returns a future which was
later ignored by data_consume_context::fast_forward_to().

With the current implementation, the future in question is always ready
and that's why the problem didn't manifest itself in the form of crashes
or invalid results.
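
The bug class can be illustrated with `std::future` and `std::launch::deferred` standing in for Seastar futures; the struct names are hypothetical. A deferred task only runs when its future is waited on, so dropping the future means the work silently never happens. The fixed wrapper propagates the inner future to its caller.

```cpp
#include <cassert>
#include <future>

// Hypothetical consumer whose fast_forward_to() returns a future.
struct data_consumer {
    int position = 0;
    std::future<void> fast_forward_to(int pos) {
        return std::async(std::launch::deferred, [this, pos] { position = pos; });
    }
};

struct consume_context {
    data_consumer consumer;
    // Fixed: return the consumer's future instead of discarding it and
    // handing back an unrelated, already-ready future.
    std::future<void> fast_forward_to(int pos) {
        return consumer.fast_forward_to(pos);
    }
};
```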
Message-Id: <20170120105746.7300-1-pdziepak@scylladb.com>
2017-01-20 12:22:17 +01:00
Avi Kivity
a9403877e4 cql3: add more metrics for batch statements
- how many statements are in a batch
- different types of batches
- whether we were able to convert a logged batch to an unlogged batch
2017-01-20 13:19:00 +02:00
Avi Kivity
e3c003544d cql3: optimize batch_statement when the same partition is mutated multiple times
Batch statements are often used to insert multiple rows into the same
partition.  Recognize this case and merge mutations to the same partition.

If the result is a single mutation, there is an additional win (already
present in the code), where a logged batch can be converted into an unlogged
batch.
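
A simplified model of the merge step. Here a mutation is just a partition key plus a list of rows, and the types and `merge_batch()` are hypothetical. Mutations to the same partition are combined; if a single mutation remains, the batch is downgraded from logged to unlogged, matching the conversion the commit describes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct mutation {
    std::string partition_key;
    std::vector<std::string> rows;
};

enum class batch_type { logged, unlogged };

struct prepared_batch {
    std::vector<mutation> mutations;
    batch_type type;
};

// Merges mutations by partition key; a single resulting mutation makes the
// batch eligible to run unlogged.
prepared_batch merge_batch(const std::vector<mutation>& in, batch_type type) {
    std::map<std::string, mutation> by_key;  // ordered for deterministic output
    for (const mutation& m : in) {
        auto it = by_key.try_emplace(m.partition_key, mutation{m.partition_key, {}}).first;
        it->second.rows.insert(it->second.rows.end(), m.rows.begin(), m.rows.end());
    }
    prepared_batch out{{}, type};
    for (auto& entry : by_key) {
        out.mutations.push_back(std::move(entry.second));
    }
    if (out.mutations.size() == 1) {
        out.type = batch_type::unlogged;
    }
    return out;
}
```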

Ref #1689.
2017-01-20 13:18:56 +02:00
Benoît Canet
bcc826cc34 mutation_reader: Short circuit the read path on empty range
Add a boolean to short-circuit the read path on an empty range,
hoping for some speedup.
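
A minimal sketch of the short circuit on a hypothetical reader over sorted integer keys; `key_range` and `reader` are illustrative, not mutation_reader's API. When the requested range is empty, the reader returns immediately without touching the underlying source, which the lookup counter makes visible.

```cpp
#include <cassert>
#include <vector>

struct key_range {
    int start;  // inclusive
    int end;    // exclusive
    bool empty() const { return start >= end; }
};

// Hypothetical reader that counts how often it consults its source.
struct reader {
    const std::vector<int>& data;
    int source_lookups = 0;

    std::vector<int> read(const key_range& r) {
        if (r.empty()) {
            return {};  // short circuit: nothing can match
        }
        ++source_lookups;
        std::vector<int> out;
        for (int k : data) {
            if (k >= r.start && k < r.end) out.push_back(k);
        }
        return out;
    }
};
```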

Tested in a read/write workload with cs using:

cl=QUORUM duration=1m -mode native cql3 -rate threads=700 -node localhost

More benchmarks will follow.

Fixes #1056

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <20170118194451.16836-1-benoit@scylladb.com>
2017-01-20 10:05:40 +00:00
Avi Kivity
54b8acdd9f dht: add hashing and comparison helpers to dht::decorated_key
An std::hash specialization, and an equality comparator.
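
A sketch of what such helpers enable, on a simplified, hypothetical `decorated_key` holding an integer token and a string key. The `std::hash` specialization reuses the token, which is already a hash of the key, and the comparator checks both fields, so the pair can serve as the `Hash` and `KeyEqual` parameters of an unordered container.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>

namespace dht {
// Simplified, hypothetical decorated_key: token plus the key it was derived from.
struct decorated_key {
    std::int64_t token;
    std::string key;
};

// Equality comparator usable as KeyEqual in unordered containers.
struct decorated_key_equals_comparator {
    bool operator()(const decorated_key& a, const decorated_key& b) const {
        return a.token == b.token && a.key == b.key;
    }
};
}  // namespace dht

namespace std {
// The token already behaves like a hash of the key, so reuse it.
template <>
struct hash<dht::decorated_key> {
    size_t operator()(const dht::decorated_key& k) const noexcept {
        return hash<int64_t>{}(k.token);
    }
};
}  // namespace std
```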
2017-01-20 11:24:14 +02:00
Avi Kivity
141048e0e5 dht: improve token hash function
For a small token, we can just return it, since it already is a hash.
We hash large tokens using murmur3, which is supposedly a good hash.
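
A sketch under stated assumptions: tokens are modeled as raw bytes, "small" means fitting into a `size_t`, and `std::hash<std::string_view>` stands in for the murmur3 hash the commit actually uses. Small tokens are packed directly into the result; only larger ones go through the hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical token hash: small tokens are returned as-is (packed into a
// size_t), larger tokens are hashed.
std::size_t hash_token(std::string_view token_bytes) {
    if (token_bytes.size() <= sizeof(std::size_t)) {
        std::size_t v = 0;
        for (unsigned char c : token_bytes) {
            v = (v << 8) | c;
        }
        return v;
    }
    return std::hash<std::string_view>{}(token_bytes);  // stand-in for murmur3
}
```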
2017-01-20 11:24:14 +02:00