Commit Graph

11247 Commits

Author SHA1 Message Date
Paweł Dziepak
a16761dcb4 position_in_partition: add feed_hash() 2017-02-02 10:35:14 +00:00
Paweł Dziepak
f4fce93807 position_in_partition: add functions for querying object type 2017-02-02 10:35:14 +00:00
Paweł Dziepak
53d9a6f220 types: make counter_type_impl report its cql3_type 2017-02-02 10:35:14 +00:00
Paweł Dziepak
a805bea97a transport: encode counters as long_type
For the purposes of CQL, counters are long values (either a delta in case
of writes or the final value for reads).
2017-02-02 10:35:14 +00:00
Paweł Dziepak
b6564651e4 mutation_partition: make for_each_cell() accessible outside source file
for_each_cell() const can already be used from anywhere in the code;
allow the same for the non-const version.
2017-02-02 10:35:14 +00:00
Paweł Dziepak
bf60b7844b messaging_service: add COUNTER_MUTATION verb
This verb is going to be used for coordinator<->leader communication
during counter updates.
2017-02-02 10:35:14 +00:00
Paweł Dziepak
67ca6959bd storage_service: add COUNTERS feature 2017-02-02 10:35:14 +00:00
Paweł Dziepak
9989239c97 idl: add idl description of consistency level 2017-02-02 10:35:14 +00:00
Paweł Dziepak
4b3c0db5cc schema: make is_counter() return correct value 2017-02-02 10:35:14 +00:00
Paweł Dziepak
99b21fbb86 tests: random_mutation_generator: generate counter cells 2017-02-02 10:35:14 +00:00
Paweł Dziepak
de2acd47c9 tests/sstables: test reading and writing counters 2017-02-02 10:35:14 +00:00
Paweł Dziepak
83c6fc1114 sstables: write counter cells 2017-02-02 10:35:14 +00:00
Paweł Dziepak
5905729c4a sstables: read counter cells 2017-02-02 10:35:14 +00:00
Paweł Dziepak
de698105e4 tests/counter: test apply, difference and freeze 2017-02-02 10:35:14 +00:00
Paweł Dziepak
0c93d01232 atomic_cell: make sure upper level tombstones cover counters
Support for deletion of counters is limited in a way that once deleted
they cannot be used again (i.e. tombstone always wins, regardless of the
timestamp). Logic responsible for merging two counter cells already
makes sure that tombstones are handled properly, but it is also
necessary to ensure that higher level tombstones always cover counters.
2017-02-02 10:35:14 +00:00
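The covering rule described in this message can be sketched as follows. This is only an illustration: the tombstone and cell types and the is_covered() helper are hypothetical stand-ins, not Scylla's actual definitions, and the rule for regular cells (a tombstone covers cells with an equal or older timestamp) is an assumption made for contrast.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified types; not the real Scylla definitions.
struct tombstone {
    bool present = false;   // whether a deletion exists at all
    int64_t timestamp = 0;  // write time of the deletion
};

struct cell {
    bool is_counter = false;
    int64_t timestamp = 0;  // write time of the cell
};

// Returns true when the cell is shadowed by the (higher level) tombstone.
// Regular cells are compared by timestamp (assumed rule). Counter cells are
// covered by any existing tombstone, regardless of the timestamps.
inline bool is_covered(const tombstone& t, const cell& c) {
    if (!t.present) {
        return false;
    }
    if (c.is_counter) {
        return true;
    }
    return t.timestamp >= c.timestamp;
}
```

Under these assumptions, a counter cell written at timestamp 10 is still covered by a tombstone written at timestamp 5, while a regular cell with the same timestamps would survive.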
Paweł Dziepak
9f1ebd4f7c idl/mutation: add counter serialisation logic 2017-02-02 10:35:14 +00:00
Paweł Dziepak
47d14906e6 mutation_partition: support querying counter cells 2017-02-02 10:35:14 +00:00
Paweł Dziepak
63f25eb12c mutation_hasher: handle counter cells properly 2017-02-02 10:35:14 +00:00
Paweł Dziepak
25c8ed1c71 feed_hash: allow additional arguments 2017-02-02 10:35:14 +00:00
Paweł Dziepak
a57e86cc37 mutation_partition: compute counter difference 2017-02-02 10:35:13 +00:00
Paweł Dziepak
2725a4945d mutation_partition: apply counter cells properly 2017-02-02 10:35:13 +00:00
Paweł Dziepak
496b42fcc7 tests: add test for counters 2017-02-02 10:35:13 +00:00
Paweł Dziepak
7bb5b49799 add in memory representation of counters
Live counter cells are collections of shards, each one representing the
sum of all operations performed by a particular replica. This commit
introduces an in-memory representation of counters as well as basic
operations such as merge, difference and hashing.
2017-02-02 10:35:13 +00:00
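A minimal sketch of the shard model described above, under stated assumptions. The counter_shard and counter_cell types, the logical_clock field, and the "higher logical clock wins" merge rule are all hypothetical; the message only says that a live counter cell is a collection of per-replica shards with merge, difference and hashing operations.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified model; names and layout are assumptions.
struct counter_shard {
    int64_t value = 0;          // sum of all operations performed by this replica
    int64_t logical_clock = 0;  // assumed: increases with every update by the replica
};

// One shard per replica id (a plain string here, for illustration only).
using counter_cell = std::map<std::string, counter_shard>;

// Merge two versions of a counter cell. For a replica present in both,
// keep the shard with the higher logical clock (assumed rule).
inline counter_cell merge(const counter_cell& a, const counter_cell& b) {
    counter_cell result = a;
    for (const auto& entry : b) {
        auto it = result.find(entry.first);
        if (it == result.end()) {
            result.emplace(entry.first, entry.second);
        } else if (entry.second.logical_clock > it->second.logical_clock) {
            it->second = entry.second;
        }
    }
    return result;
}

// The value a reader sees is the sum over all shards.
inline int64_t total(const counter_cell& c) {
    int64_t sum = 0;
    for (const auto& entry : c) {
        sum += entry.second.value;
    }
    return sum;
}
```

Because each shard already holds the running sum for its replica, merging never adds two versions of the same shard together; it only picks the newer one, which keeps the operation idempotent in this sketch.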
Paweł Dziepak
c66db213d3 storage_service: allow getting local host id without futures<> 2017-02-02 10:35:13 +00:00
Paweł Dziepak
0a8f00c159 atomic_cell: add flag for recognizing counter updates
A counter cell may be either a collection of shards or just a delta. The
latter can only appear in certain places on the coordinator and leader.
2017-02-02 10:35:13 +00:00
Paweł Dziepak
ab344c5aa3 mutation_partition_view: extract atomic_cell variant 2017-02-02 10:35:13 +00:00
Paweł Dziepak
83f6018ea2 schema: keep counter information in column definition 2017-02-02 10:35:13 +00:00
Avi Kivity
aec419da13 Merge seastar upstream
* seastar c1dbd89...f07f8ed (3):
  > Merge "Introduce when_all_succeed()" from Paweł
  > tests: adjust collectd test for metric API change
  > Merge "DNS query support" from Calle
2017-02-02 12:30:10 +02:00
Piotr Jastrzebski
15cc8460bd mutation_partition: make rows_entry constructors explicit
All converting constructors should be explicit; otherwise they
can cause confusion. I ran into such a situation when a
clustering key was implicitly converted into a rows_entry when
I was not expecting it.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <c3f19719760f6dc7cf5e858b9c452506faedf521.1485950529.git.piotr@scylladb.com>
2017-02-01 17:57:50 +01:00
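The effect of marking a converting constructor explicit can be illustrated with a small sketch. The clustering_key and rows_entry definitions below are hypothetical stand-ins with the same names as the real types; only the shape of the constructor matters here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins; the real types are far richer.
struct clustering_key {
    std::string value;
};

struct rows_entry {
    // 'explicit' stops a clustering_key from silently becoming a rows_entry.
    explicit rows_entry(clustering_key k) : key(std::move(k)) {}
    clustering_key key;
};

// Direct construction still works...
static_assert(std::is_constructible<rows_entry, clustering_key>::value,
              "rows_entry can be built from a clustering_key on request");
// ...but implicit conversion is rejected at compile time.
static_assert(!std::is_convertible<clustering_key, rows_entry>::value,
              "a clustering_key must not convert implicitly");

// Example function taking an entry (hypothetical).
inline std::size_t key_length(const rows_entry& e) {
    return e.key.value.size();
}
```

With these definitions, key_length(rows_entry(clustering_key{"ck1"})) compiles, while key_length(clustering_key{"ck1"}) is a compile error instead of a silent conversion.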
Amnon Heiman
45b6070832 Merge seastar upstream
* seastar 397685c...c1dbd89 (13):
  > lowres_clock: drop cache-line alignment for _timer
  > net/packet: add missing include
  > Merge "Adding histogram and description support" from Amnon
  > reactor: Fix the error: cannot bind 'std::unique_ptr' lvalue to 'std::unique_ptr&&'
  > Set the option '--server' of tests/tcp_sctp_client to be required
  > core/memory: Remove superfluous assignment
  > core/memory: Remove dead code
  > core/reactor: Use logger instead of cerr
  > fix inverted logic in overprovision parameter
  > rpc: fix timeout checking condition
  > rpc: use lowres_clock instead of high resolution one
  > semaphore: make semaphore's clock configurable
  > rpc: detect timedout outgoing packets earlier

Includes treewide change to accommodate rpc changing its timeout clock
to lowres_clock.

Includes fixup from Amnon:

collectd api should use the metrics getters

In preparation for the change in the metrics layer, this changes
the way the collectd api uses the metrics value: it now calls the getters
instead of accessing the member directly.

This will be important when the internal implementation changes
from union to variant.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1485457657-17634-1-git-send-email-amnon@scylladb.com>
2017-02-01 14:39:08 +02:00
Glauber Costa
facb0aa6d9 row_cache: rewrite loop so that debug mode doesn't become a noop
need_preempt() is always true in debug mode. Because of that, this loop
will never be executed. Rewrite it as a do-while loop so we are sure
that it is executed at least once - or exactly once in debug mode.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <1485913079-1283-1-git-send-email-glauber@scylladb.com>
2017-02-01 10:02:13 +02:00
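The difference between the two loop shapes can be sketched as follows. The function names are hypothetical, and the preemption check is passed in as a callable so the sketch does not depend on Seastar; it is not the actual row_cache code.

```cpp
#include <cassert>
#include <functional>

// Check-first loop: when need_preempt() always returns true (as the message
// says it does in debug mode), the body never runs.
inline int update_with_while(int pending, const std::function<bool()>& need_preempt) {
    int processed = 0;
    while (pending > 0 && !need_preempt()) {
        --pending;
        ++processed;
    }
    return processed;
}

// do-while loop: the body runs before the check, so at least one item is
// processed whenever there is work to do.
inline int update_with_do_while(int pending, const std::function<bool()>& need_preempt) {
    int processed = 0;
    if (pending <= 0) {
        return processed;
    }
    do {
        --pending;
        ++processed;
    } while (pending > 0 && !need_preempt());
    return processed;
}
```

With a need_preempt that always returns true, the first version processes nothing and the second processes exactly one item per call; with one that always returns false, both process every pending item.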
Tomasz Grabiec
634761dbba commitlog: Fix default limit for size on disk
The per-node limit will be total memory divided by number of shards
instead of just total memory. For example, when Scylla is started with
-c16 -m16G, the commit log will induce flushes on a given shard when
unflushed data on that shard exceeds 62MB instead of 1GB.

Fixes #2046.

Message-Id: <1485874534-10939-1-git-send-email-tgrabiec@scylladb.com>
2017-01-31 17:12:59 +02:00
Piotr Jastrzebski
c7e95af0b0 row_cache_test: fix test_mvcc
Currently the test does not wait for cache update
to finish before carrying on with the checks.

This makes the test nondeterministic and simply wrong
because the checks expect the update to be finished.

This patch changes the test to wait for update to finish.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <2a99bba24b1628466d3495332b48ef3ccdb43c26.1485862389.git.piotr@scylladb.com>
2017-01-31 11:37:29 +00:00
Avi Kivity
aedb5e5cfa mutation_fragment: add std::ostream support
Helps poor debuggers.
Message-Id: <20170130163605.4858-1-avi@scylladb.com>
2017-01-31 10:37:42 +01:00
Tomasz Grabiec
0d40b86546 Merge "bail sooner from cache update if need_preempt()" from Glauber
An earlier patch of mine was using should_yield to do the same.  That
is a better direction, but should_yield() was demonstrably more
expensive so for now we'll go with need_preempt() - since this is
hurting pretty much every latency-dependent workload.

I am also including the scripts that I have used to measure and
compare the various versions of this patch.
2017-01-31 09:51:34 +01:00
Pekka Enberg
a625aae489 cql3/values.hh: Fix to_bytes_opt(raw_value)
The data() method already returns a bytes_opt so there's no need to call to_bytes_opt() again.

Fixes compilation failure on CentOS:

  In file included from ./cql3/query_options.hh:51:0,
                   from ./cql3/cql_statement.hh:47,
                   from ./cql3/statements/raw/select_statement.hh:45,
                   from build/release/gen/cql3/CqlParser.hpp:65,
                   from build/release/gen/cql3/CqlParser.cpp:44:
  ./cql3/values.hh: In function 'bytes_opt to_bytes_opt(const cql3::raw_value&)':
  ./cql3/values.hh:184:37: error: no matching function for call to 'to_bytes_opt(bytes_opt)'
       return to_bytes_opt(value.data());

Message-Id: <1485761863-28236-1-git-send-email-penberg@scylladb.com>
2017-01-30 10:49:31 +02:00
Gleb Natapov
6e4817137e storage_proxy: report foreground reads instead of reads
The reason is the same as why foreground writes are reported instead of
total writes (049ae37d08): It is much easier to see what is going on
this way.

Also fixes a typo in a counter's description.

Fixes #1217

Message-Id: <20170129093412.GS11469@scylladb.com>
2017-01-29 12:40:56 +02:00
Avi Kivity
9fb2f31616 Merge "CQL binary protocol unset value support" from Pekka
This patch series adds support for "unset values" that were introduced
in CQL binary protocol v4. They allow bound statements to skip updates
to some or all of the bound variables.

Unset values are specified using the BoundStatement.unset() method in
the Java driver:

  http://docs.datastax.com/en/drivers/java/3.1/com/datastax/driver/core/BoundStatement.html#unset-int-

and using the UNSET_VALUE constant in the Python driver:

  https://datastax.github.io/python-driver/api/cassandra/query.html#cassandra.query.UNSET_VALUE

Fixes #2039.

* 'penberg/cql-unset-values/v2' of github.com:cloudius-systems/seastar-dev:
  transport/server: CQL unset value support
  cql3/statements/select_statement: Unset value support
  cql3/user_types: Unset value support
  cql3/tuples: Unset value support
  cql3/maps: Unset value support
  cql3/sets: Unset value support
  cql3/lists: Unset value support
  cql3/constants: UNSET_VALUE constant
  cql3/constants: Unset value support
  cql3/attributes: Unset value support
  types.hh: Add field_name_as_string() to user_type_impl type
  cql3: Introduce raw_value and raw_value_view types
2017-01-29 10:59:01 +02:00
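How an unset value differs from null and from an ordinary value can be sketched with a three-state bound value. Everything below is a hypothetical simplification: the series introduces raw_value and raw_value_view types, but their definitions are not shown here, so value_kind, bound_value, row and apply_bound_value() are illustrative names only.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical three-state bound value; not the real raw_value type.
enum class value_kind { null_value, unset_value, value };

struct bound_value {
    value_kind kind;
    std::string bytes;  // only meaningful when kind == value_kind::value
};

// A row modelled as column name -> serialized value (illustration only).
using row = std::map<std::string, std::string>;

// Apply one bound variable of an update to a row.
inline void apply_bound_value(row& r, const std::string& column, const bound_value& v) {
    switch (v.kind) {
    case value_kind::unset_value:
        break;                // unset: skip the update, column stays as it is
    case value_kind::null_value:
        r.erase(column);      // null: modelled here as deleting the column
        break;
    case value_kind::value:
        r[column] = v.bytes;  // regular value: overwrite the column
        break;
    }
}
```

The point of the third state is that a prepared statement can be reused for partial updates without turning every omitted variable into a deletion.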
Pekka Enberg
533c8d3949 transport/server: CQL unset value support
This patch implements support for CQL unset values at the protocol level.

Fixes #2039
2017-01-27 09:24:36 +02:00
Pekka Enberg
2bd560118e cql3/statements/select_statement: Unset value support 2017-01-27 09:24:36 +02:00
Pekka Enberg
baaf1779c5 cql3/user_types: Unset value support 2017-01-27 09:24:36 +02:00
Pekka Enberg
99c7dabd2a cql3/tuples: Unset value support 2017-01-27 09:24:36 +02:00
Pekka Enberg
a0e6f6f371 cql3/maps: Unset value support 2017-01-27 09:24:36 +02:00
Pekka Enberg
f883e64d70 cql3/sets: Unset value support 2017-01-27 09:24:36 +02:00
Pekka Enberg
50ec81ee67 cql3/lists: Unset value support 2017-01-27 09:24:36 +02:00
Pekka Enberg
c4cd0a6541 cql3/constants: UNSET_VALUE constant 2017-01-27 09:24:36 +02:00
Pekka Enberg
063be3ed44 cql3/constants: Unset value support 2017-01-27 09:24:36 +02:00
Glauber Costa
b4ac2c1d60 debug: add systemtap script to measure interesting latencies during cache updates.
Example output:

Measuring Scylla row cache update times ^C
Total update time, (usec)
value |-------------------------------------------------- count
    2 |                                                   0
    4 |                                                   0
    8 |@@                                                 2
   16 |@@@                                                3
   32 |                                                   0
   64 |                                                   0
  128 |@@@@                                               4
  256 |@@                                                 2
  512 |                                                   0
 1024 |                                                   0

Time spent per partition batch (nsec)
 value |-------------------------------------------------- count
   128 |                                                       0
   256 |                                                       0
   512 |                                                      43
  1024 |                                                       2
  2048 |                                                       2
  4096 |                                                      45
  8192 |                                                     349
 16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  61494
 32768 |@@@@@@@@@@@@@@@@@                                  21497
 65536 |                                                       0
131072 |                                                       0

Partitions updated per batch:
value |-------------------------------------------------- count
    0 |                                                      57
    1 |                                                      46
    2 |                                                      76
    4 |                                                     134
    8 |                                                     324
   16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  82795
   32 |                                                       0
   64 |                                                       0

Total partitions updated: 2485000
Average time spent per partition batch (nsec): 28816
Average time per partition per partition (nsec): 967

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2017-01-26 22:15:16 -05:00
Glauber Costa
69dbb3e108 row_cache: yield if need_preempt(), even if there is quota left.
The quota check is quite old at the moment, and dates back to a time in
which the infrastructure in seastar threads was lacking a lot. It is a
bad check since it will not take into consideration the size of the
partition or the time it takes to merge them.

A better check would at least take need_preempt() into account, so that
we would respect the task quota. That check is now embedded into
should_yield(), so there would be no need to check anything else.

Although should_yield() does the job, it is still currently quite
expensive. And because we are in a seastar thread with a computationally
intensive loop, it can hurt latency a lot.

So as a temporary measure, let's at least check for need_preempt() - as
it is hurting real users at the moment - and soon work on making
should_yield() cheaper.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2017-01-26 22:10:54 -05:00
Glauber Costa
0e1f64b163 row_cache: add systemtap markers for the update process
Update is one of our biggest sources of performance issues as far as the
cache is concerned. systemtap can be useful in helping track some of
them down.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2017-01-26 21:56:32 -05:00