Commit Graph

15534 Commits

Author SHA1 Message Date
Tomasz Grabiec
5bc201df10 cache: Release dirty memory with row granularity 2018-05-30 14:41:41 +02:00
Tomasz Grabiec
70c72773be cache: Defer during partition merging 2018-05-30 14:41:41 +02:00
Tomasz Grabiec
051bb74583 mvcc: partition_snapshot_row_cursor: Introduce consume_row() 2018-05-30 14:41:41 +02:00
Tomasz Grabiec
518fd7083f mvcc: partition_snapshot_row_cursor: Introduce maybe_refresh_static()
A version of maybe_refresh() optimized for snapshots which are
no longer populated. Will be used to implement cache update from
memtable.
2018-05-30 14:41:40 +02:00
Tomasz Grabiec
c653137b2b mvcc: Make apply_to_incomplete() work with attached versions
Needed before making it preemptible. We cannot steal the entry since
we may need to resume merging later.
2018-05-30 14:41:40 +02:00
Tomasz Grabiec
1792be3697 cache: Propagate phase to apply_to_incomplete()
It will be needed to create snapshots with appropriate phase markers.
2018-05-30 14:41:40 +02:00
Tomasz Grabiec
494cb3f3da cache: Prepare for incremental apply_to_incomplete()
Incremental merging will be implemented by means of resumable
functions, which return stop_iteration::no when not yet
finished. We're not using futures, so that the caller can do work
around preemption points as well.
2018-05-30 14:41:40 +02:00
Tomasz Grabiec
a19c5cbc16 Introduce a coroutine wrapper
Represents a deferring operation which defers cooperatively with the caller.

The operation is started and resumed by calling run(), which returns
with stop_iteration::no whenever the operation defers and is not
completed yet. When the operation is finally complete, run() returns
with stop_iteration::yes.

This allows the caller to:

 1) execute some post-defer and pre-resume actions atomically

 2) have control over when the operation is resumed and in which context,
    in particular the caller can cancel the operation at deferring points.

It will be used to implement deferring partition_version::apply_to_incomplete().
2018-05-30 14:41:40 +02:00
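The run() protocol described in "Introduce a coroutine wrapper" can be sketched without Seastar. This is a minimal illustration, not the wrapper from the patch: `resumable_op`, `make_chunked_sum` and the local `stop_iteration` enum are hypothetical stand-ins, and the operation "defers" simply by returning early.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Stand-in for seastar::stop_iteration (assumption: a two-valued type).
enum class stop_iteration { no, yes };

// Hypothetical wrapper: owns a step function and remembers completion.
class resumable_op {
    std::function<stop_iteration()> _step;
    bool _done = false;
public:
    explicit resumable_op(std::function<stop_iteration()> step)
        : _step(std::move(step)) {}

    // Starts or resumes the operation. Returns stop_iteration::no when the
    // operation deferred before finishing, stop_iteration::yes once complete.
    stop_iteration run() {
        if (!_done && _step() == stop_iteration::yes) {
            _done = true;
        }
        return _done ? stop_iteration::yes : stop_iteration::no;
    }
};

// Example operation: adds the integers [0, n) to `out`, at most `budget`
// of them per run() call. `out` must outlive the returned object.
inline resumable_op make_chunked_sum(int n, int budget, long& out) {
    assert(budget > 0);
    int next = 0;
    return resumable_op([=, &out]() mutable {
        for (int i = 0; i < budget && next < n; ++i) {
            out += next++;
        }
        return next == n ? stop_iteration::yes : stop_iteration::no;
    });
}
```

The caller drives the loop itself, e.g. `while (op.run() == stop_iteration::no) { /* post-defer and pre-resume work */ }`, and can cancel the operation by simply not calling run() again.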
Tomasz Grabiec
6bd1a04c10 tests: mvcc: Encapsulate memory management details
Currently tests have a single LSA region lock around construction of
managed objects, their manipulation, and access. This way we avoid the
complexity of dealing with allocating sections. That will not be
possible once apply_to_incomplete() is changed to enter an allocating
section itself because this requires region to be unlocked at
entry. The tests will have to take more fine-grained locks. That is
somewhat tricky and would add a lot of noise to tests. This patch will
make things easier by abstracting LSA management, among other things,
inside mvcc_container and mvcc_partition classes.
2018-05-30 14:41:40 +02:00
Tomasz Grabiec
f6e21accc7 tests: cache: Take into account that update() may defer
The test incorrectly assumed that once update() is started the
cache will return only versions from last_generation. This will not
hold once we start to defer during partition merging.
2018-05-30 14:41:40 +02:00
Tomasz Grabiec
c10d9e1607 cache: real_dirty_memory_accounter: Allow construction without memtable 2018-05-30 14:41:40 +02:00
Tomasz Grabiec
6ecda1ccd7 cache: Extract real_dirty_memory_accounter 2018-05-30 14:41:40 +02:00
Tomasz Grabiec
3f19f76c67 mvcc: Destroy memtable partition versions gently
Now all snapshots will have a mutation_cleaner which they will use to
gently destroy freed partition_version objects.

Destruction of memtable entries during cache update is also using the
gentle cleaner now. We need to have a separate cleaner for memtable
objects even though they're owned by cache's region, because memtable
versions must be cleared without a cache_tracker.

Each memtable will have its own cleaner, which will be merged with the
cache's cleaner when memtable is merged into cache.

Fixes some sources of reactor stalls on cache update when there are
large partition entries in memtables.
2018-05-30 14:41:40 +02:00
Tomasz Grabiec
c2d702622e memtable: Destroy partitions incrementally from clear_gently()
Destroying large partitions may stall the reactor for a long
time. Avoid this by clearing incrementally.
2018-05-30 14:41:40 +02:00
Tomasz Grabiec
81d231f35b mvcc: Remove rows from tracker gently
Some partitions may have a lot of rows. Better to iterate over them
incrementally as part of clear_gently() to avoid stalls.
2018-05-30 14:41:40 +02:00
Tomasz Grabiec
f0c1edd672 cache: Destroy partition versions incrementally
Instead of destroying whole partition_versions at once, we will do that
gently using mutation_cleaner to avoid reactor stalls.

Large deletions could happen when a large partition gets invalidated,
upgraded to a new schema, or when it's abandoned by a detached snapshot.

Refs #3289.
2018-05-30 14:41:40 +02:00
Tomasz Grabiec
e0803ff71e Introduce mutation_cleaner
Used for collecting unused partition_version objects and freeing them
incrementally. Will be used for both cache and memtables.
2018-05-30 14:41:39 +02:00
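The idea behind "Introduce mutation_cleaner" (queue unused objects, free them a little at a time, and allow one cleaner to be merged into another) can be sketched as follows. This is an assumption-laden illustration rather than the real class: `gentle_cleaner`, `version`, `destroy_later()`, `merge()` and `clear_some()` are hypothetical names, and one whole object is freed per step, whereas the commits above aim for finer granularity.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <utility>
#include <vector>

enum class stop_iteration { no, yes };  // stand-in for seastar::stop_iteration

// Hypothetical stand-in for partition_version: just owns some rows.
struct version {
    std::vector<int> rows;
};

// Hypothetical cleaner: queues unused versions and frees one per step.
class gentle_cleaner {
    std::list<std::unique_ptr<version>> _garbage;
public:
    // Takes ownership of an unused object instead of destroying it at once.
    void destroy_later(std::unique_ptr<version> v) {
        assert(v);
        _garbage.push_back(std::move(v));
    }

    // Moves everything queued in `other` into this cleaner, in O(1).
    void merge(gentle_cleaner& other) noexcept {
        _garbage.splice(_garbage.end(), other._garbage);
    }

    std::size_t pending() const noexcept { return _garbage.size(); }

    // Frees at most one queued object; yes means nothing is left to free.
    stop_iteration clear_some() noexcept {
        if (!_garbage.empty()) {
            _garbage.pop_front();
        }
        return _garbage.empty() ? stop_iteration::yes : stop_iteration::no;
    }
};
```

In this sketch a per-memtable cleaner would be merged into the cache's cleaner with `cache_cleaner.merge(memtable_cleaner)`, after which only the cache's cleaner needs to be driven.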
Tomasz Grabiec
e5aa02efeb mvcc: Introduce partition_version_list 2018-05-30 12:18:56 +02:00
Tomasz Grabiec
ca1ee93577 mvcc: Fix move constructor of partition_version_ref() not preserving _unique_owner
We didn't rely on that yet, it seems, but will.

(cherry picked from commit 21a744337de01f699d5c5c340483ad23cabab2ee)
2018-05-30 12:18:56 +02:00
Tomasz Grabiec
40cc766cf2 database: Add API for incremental clearing of partition entries
Partitions can get very large. Destroying them all at once can stall
the reactor for a significant amount of time. We want to avoid that by
doing destruction incrementally, deferring in between. A new API is
added for that at various levels:

  stop_iteration clear_gently() noexcept;

It returns stop_iteration::yes when the object is fully cleared and
can now be destroyed quickly. So a deferring destruction can look like
this:

  return repeat([this] { return clear_gently(); });

The reason why clear_gently() doesn't return a future<> itself is that some
contexts cannot defer, like memory reclamation.
2018-05-30 12:18:56 +02:00
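A self-contained sketch of the clear_gently() contract from "database: Add API for incremental clearing of partition entries" follows. Only the signature `stop_iteration clear_gently() noexcept` comes from the commit message; `big_partition`, its batch size and `repeat_until_done()` are hypothetical, and the latter is a synchronous stand-in for Seastar's `repeat()`, which would yield to the reactor between calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class stop_iteration { no, yes };  // stand-in for seastar::stop_iteration

// Hypothetical container that releases at most `batch` rows per call.
class big_partition {
    std::vector<int> _rows;
    static constexpr std::size_t batch = 3;  // assumed per-call work limit
public:
    explicit big_partition(std::size_t n) : _rows(n) {}

    std::size_t size() const noexcept { return _rows.size(); }

    // Returns stop_iteration::yes once the object is fully cleared and can
    // be destroyed quickly, stop_iteration::no if more calls are needed.
    stop_iteration clear_gently() noexcept {
        for (std::size_t i = 0; i < batch && !_rows.empty(); ++i) {
            _rows.pop_back();
        }
        return _rows.empty() ? stop_iteration::yes : stop_iteration::no;
    }
};

// Calls f() until it reports completion and returns the number of calls.
// A real reactor would get a chance to run other tasks between calls.
template <typename Func>
std::size_t repeat_until_done(Func&& f) {
    std::size_t calls = 1;
    while (f() == stop_iteration::no) {
        ++calls;
    }
    return calls;
}
```

Because clear_gently() is a plain function rather than one returning a future, the same method can also be called in a tight loop from contexts that cannot defer, such as memory reclamation.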
Tomasz Grabiec
2f75212ca4 cache: Define trivial methods inline
They have users in a different compilation unit, in partition_version.cc
2018-05-30 12:18:56 +02:00
Tomasz Grabiec
25b3641d9e tests: Improve perf_row_cache_update
We now test more kinds of workloads:
 - small partitions with no clustering key
 - large partition with lots of small rows
 - large partition with lots of range tombstones

We also collect statistics about scheduling latency induced by cache
update.

Example output:

Small partitions, no overwrites:
update: 356.809113 [ms], stall: {ticks: 396, min: 0.006867 [ms], 50%: 1.131752 [ms], 90%: 1.131752 [ms], 99%: 1.131752 [ms], max: 1.358102 [ms]}, cache: 257/257 [MB] LSA: 257/257 [MB] std free: 83 [MB]
update: 337.542999 [ms], stall: {ticks: 373, min: 0.001598 [ms], 50%: 1.131752 [ms], 90%: 1.131752 [ms], 99%: 1.131752 [ms], max: 1.358102 [ms]}, cache: 514/514 [MB] LSA: 514/514 [MB] std free: 83 [MB]
update: 383.485291 [ms], stall: {ticks: 425, min: 0.001598 [ms], 50%: 1.131752 [ms], 90%: 1.131752 [ms], 99%: 1.131752 [ms], max: 1.131752 [ms]}, cache: 771/788 [MB] LSA: 771/788 [MB] std free: 83 [MB]
update: 574.968811 [ms], stall: {ticks: 634, min: 0.001598 [ms], 50%: 1.131752 [ms], 90%: 1.131752 [ms], 99%: 1.629722 [ms], max: 1.955666 [ms]}, cache: 879/917 [MB] LSA: 879/917 [MB] std free: 83 [MB]
update: 411.541138 [ms], stall: {ticks: 455, min: 0.001598 [ms], 50%: 1.131752 [ms], 90%: 1.131752 [ms], 99%: 1.131752 [ms], max: 1.358102 [ms]}, cache: 787/835 [MB] LSA: 787/835 [MB] std free: 83 [MB]
update: 368.491211 [ms], stall: {ticks: 408, min: 0.001332 [ms], 50%: 1.131752 [ms], 90%: 1.131752 [ms], 99%: 1.131752 [ms], max: 1.131752 [ms]}, cache: 750/790 [MB] LSA: 750/790 [MB] std free: 83 [MB]
update: 343.671967 [ms], stall: {ticks: 380, min: 0.001598 [ms], 50%: 1.131752 [ms], 90%: 1.131752 [ms], 99%: 1.131752 [ms], max: 1.131752 [ms]}, cache: 734/769 [MB] LSA: 734/769 [MB] std free: 83 [MB]
update: 320.277283 [ms], stall: {ticks: 357, min: 0.001598 [ms], 50%: 1.131752 [ms], 90%: 1.131752 [ms], 99%: 1.131752 [ms], max: 1.131752 [ms]}, cache: 724/753 [MB] LSA: 724/753 [MB] std free: 83 [MB]
update: 310.583282 [ms], stall: {ticks: 344, min: 0.001598 [ms], 50%: 1.131752 [ms], 90%: 1.131752 [ms], 99%: 1.131752 [ms], max: 1.131752 [ms]}, cache: 714/740 [MB] LSA: 714/740 [MB] std free: 83 [MB]
update: 303.627106 [ms], stall: {ticks: 338, min: 0.001598 [ms], 50%: 1.131752 [ms], 90%: 1.131752 [ms], 99%: 1.131752 [ms], max: 1.955666 [ms]}, cache: 707/731 [MB] LSA: 707/731 [MB] std free: 83 [MB]
update: 296.742523 [ms], stall: {ticks: 330, min: 0.001332 [ms], 50%: 1.131752 [ms], 90%: 1.131752 [ms], 99%: 1.131752 [ms], max: 1.131752 [ms]}, cache: 701/724 [MB] LSA: 701/724 [MB] std free: 83 [MB]
update: 286.598541 [ms], stall: {ticks: 319, min: 0.001598 [ms], 50%: 1.131752 [ms], 90%: 1.131752 [ms], 99%: 1.131752 [ms], max: 1.131752 [ms]}, cache: 697/719 [MB] LSA: 697/719 [MB] std free: 83 [MB]
update: 288.649323 [ms], stall: {ticks: 321, min: 0.001598 [ms], 50%: 1.131752 [ms], 90%: 1.131752 [ms], 99%: 1.131752 [ms], max: 1.131752 [ms]}, cache: 694/715 [MB] LSA: 694/715 [MB] std free: 83 [MB]
update: 282.069916 [ms], stall: {ticks: 314, min: 0.001598 [ms], 50%: 1.131752 [ms], 90%: 1.131752 [ms], 99%: 1.131752 [ms], max: 1.131752 [ms]}, cache: 692/712 [MB] LSA: 692/712 [MB] std free: 83 [MB]
update: 292.462036 [ms], stall: {ticks: 325, min: 0.001917 [ms], 50%: 1.131752 [ms], 90%: 1.131752 [ms], 99%: 1.131752 [ms], max: 1.131752 [ms]}, cache: 689/708 [MB] LSA: 689/708 [MB] std free: 83 [MB]
update: 274.390442 [ms], stall: {ticks: 305, min: 0.001332 [ms], 50%: 1.131752 [ms], 90%: 1.131752 [ms], 99%: 1.131752 [ms], max: 1.131752 [ms]}, cache: 687/705 [MB] LSA: 687/705 [MB] std free: 83 [MB]
invalidation: 172.617508 [ms]
Large partition, lots of small rows:
update: 262.132721 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.005722 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 268.650944 [ms]}, cache: 187/188 [MB] LSA: 187/188 [MB] std free: 82 [MB]
update: 281.359467 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.004768 [ms], 90%: 0.017084 [ms], 99%: 0.017084 [ms], max: 322.381152 [ms]}, cache: 375/376 [MB] LSA: 375/376 [MB] std free: 82 [MB]
update: 287.229065 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.004768 [ms], 90%: 0.017084 [ms], 99%: 0.017084 [ms], max: 322.381152 [ms]}, cache: 563/564 [MB] LSA: 563/564 [MB] std free: 82 [MB]
update: 1294.816284 [ms], stall: {ticks: 4, min: 0.001917 [ms], 50%: 0.005722 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 1386.179840 [ms]}, cache: 586/625 [MB] LSA: 586/625 [MB] std free: 82 [MB]
update: 845.022461 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.005722 [ms], 90%: 0.017084 [ms], 99%: 0.017084 [ms], max: 962.624896 [ms]}, cache: 439/475 [MB] LSA: 439/475 [MB] std free: 82 [MB]
update: 380.335938 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.004768 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 386.857376 [ms]}, cache: 599/600 [MB] LSA: 599/600 [MB] std free: 82 [MB]
update: 477.234680 [ms], stall: {ticks: 4, min: 0.002760 [ms], 50%: 0.004768 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 557.074624 [ms]}, cache: 599/600 [MB] LSA: 599/600 [MB] std free: 82 [MB]
update: 525.955017 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.004768 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 557.074624 [ms]}, cache: 599/600 [MB] LSA: 599/600 [MB] std free: 82 [MB]
update: 548.003784 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.006866 [ms], 90%: 0.017084 [ms], 99%: 0.017084 [ms], max: 557.074624 [ms]}, cache: 599/600 [MB] LSA: 599/600 [MB] std free: 82 [MB]
update: 528.697937 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.004768 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 557.074624 [ms]}, cache: 599/600 [MB] LSA: 599/600 [MB] std free: 82 [MB]
update: 609.292603 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.005722 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 668.489536 [ms]}, cache: 599/600 [MB] LSA: 599/600 [MB] std free: 82 [MB]
update: 575.762451 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.004768 [ms], 90%: 0.017084 [ms], 99%: 0.017084 [ms], max: 668.489536 [ms]}, cache: 599/600 [MB] LSA: 599/600 [MB] std free: 82 [MB]
update: 530.801392 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.004768 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 557.074624 [ms]}, cache: 599/600 [MB] LSA: 599/600 [MB] std free: 82 [MB]
update: 535.948364 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.004768 [ms], 90%: 0.017084 [ms], 99%: 0.017084 [ms], max: 557.074624 [ms]}, cache: 599/600 [MB] LSA: 599/600 [MB] std free: 82 [MB]
update: 527.143555 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.004768 [ms], 90%: 0.020501 [ms], 99%: 0.020501 [ms], max: 557.074624 [ms]}, cache: 599/600 [MB] LSA: 599/600 [MB] std free: 82 [MB]
update: 521.869202 [ms], stall: {ticks: 4, min: 0.002760 [ms], 50%: 0.004768 [ms], 90%: 0.017084 [ms], 99%: 0.017084 [ms], max: 557.074624 [ms]}, cache: 599/600 [MB] LSA: 599/600 [MB] std free: 82 [MB]
invalidation: 173.069733 [ms]
Large partition, lots of range tombstones:
update: 224.003220 [ms], stall: {ticks: 4, min: 0.001917 [ms], 50%: 0.004768 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 268.650944 [ms]}, cache: 52/52 [MB] LSA: 52/52 [MB] std free: 82 [MB]
update: 570.882874 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.004768 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 668.489536 [ms]}, cache: 105/105 [MB] LSA: 105/105 [MB] std free: 82 [MB]
update: 577.249878 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.004768 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 668.489536 [ms]}, cache: 158/158 [MB] LSA: 158/158 [MB] std free: 82 [MB]
update: 580.239624 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.004768 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 668.489536 [ms]}, cache: 211/211 [MB] LSA: 211/211 [MB] std free: 82 [MB]
update: 614.187134 [ms], stall: {ticks: 4, min: 0.001917 [ms], 50%: 0.004768 [ms], 90%: 0.011864 [ms], 99%: 0.011864 [ms], max: 668.489536 [ms]}, cache: 264/264 [MB] LSA: 264/264 [MB] std free: 82 [MB]
update: 618.709229 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.003973 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 668.489536 [ms]}, cache: 317/317 [MB] LSA: 317/317 [MB] std free: 82 [MB]
update: 626.943359 [ms], stall: {ticks: 4, min: 0.001598 [ms], 50%: 0.004768 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 668.489536 [ms]}, cache: 369/370 [MB] LSA: 369/370 [MB] std free: 82 [MB]
update: 602.873474 [ms], stall: {ticks: 4, min: 0.001917 [ms], 50%: 0.003973 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 668.489536 [ms]}, cache: 422/423 [MB] LSA: 422/423 [MB] std free: 82 [MB]
update: 617.522583 [ms], stall: {ticks: 4, min: 0.001598 [ms], 50%: 0.004768 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 668.489536 [ms]}, cache: 475/475 [MB] LSA: 475/475 [MB] std free: 82 [MB]
update: 627.291138 [ms], stall: {ticks: 4, min: 0.001598 [ms], 50%: 0.004768 [ms], 90%: 0.011864 [ms], 99%: 0.011864 [ms], max: 668.489536 [ms]}, cache: 528/528 [MB] LSA: 528/528 [MB] std free: 82 [MB]
update: 623.720886 [ms], stall: {ticks: 4, min: 0.001598 [ms], 50%: 0.003973 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 668.489536 [ms]}, cache: 581/581 [MB] LSA: 581/581 [MB] std free: 82 [MB]
update: 630.735596 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.004768 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 668.489536 [ms]}, cache: 634/634 [MB] LSA: 634/634 [MB] std free: 82 [MB]
update: 2776.525635 [ms], stall: {ticks: 4, min: 0.002300 [ms], 50%: 0.004768 [ms], 90%: 0.014237 [ms], 99%: 0.014237 [ms], max: 2874.382592 [ms]}, cache: 687/687 [MB] LSA: 687/687 [MB] std free: 82 [MB]
2018-05-30 12:18:56 +02:00
Tomasz Grabiec
bb96518cc5 mutation_reader: Make empty mutation source advertize no partitions
So that perf_row_cache_update will always populate cache.
2018-05-30 12:18:56 +02:00
Tomasz Grabiec
aefb5e0fbd Merge "Get rid of cql_statement::execute_internal" from Avi
execute_internal() duplicates several code paths, especially in
the select path, for no good reason.  It boils down to timeout and
consistency level selection which can be done based on
client_state::is_internal().

This patchset eliminates the duplication and execute_internal(),
simplifying the code.

* github.com:avikivity/scylla cql-no-execute_internal/v2:
  cql: schema_altering_statement: make execute() and execute_internal()
    equivalent
  cql: select_statement: make execute() and execute_internal()
    equivalent
  cql: query_processor: don't call cql_statement::execute_internal() any
    more
  cql: cql_statement: remove execute_internal()
2018-05-28 13:01:43 +02:00
Avi Kivity
8033785b36 Update scylla-ami submodule
* dist/ami/files/scylla-ami 025644d...1f5329f (1):
  > scylla_install_ami: Update CentOS to latest version
2018-05-28 13:59:57 +03:00
Avi Kivity
ff3e86888a tests: report tests as they are completed
As each test completes, report it. This prevents a long-running
test in the beginning of the list from stalling output.
Message-Id: <20180526173517.23078-1-avi@scylladb.com>
2018-05-28 13:58:01 +03:00
Avi Kivity
3a4d11d374 Merge "Introduce frozen_mutation_fragment" from Paweł
"
This series introduces frozen_mutation_fragment which can be used to
send mutation_fragments over the wire to a remote node. The main
intended user is going to be the new streaming implementation.

The first part of the series fixes some IDL issues related to empty
structures and variant being the first member of a structure. Both these
problems make the generated code fail to build and they do not, in any
way, affect the existing on-wire protocol.

Logic responsible for freezing and unfreezing of mutation_fragments is
heavily based on the existing code for freezing mutations and shares the
same drawbacks (for example, unnecessary copy during unfreezing). These
preexisting performance problems can be fixed incrementally.

Another performance problem (which affects frozen_mutations as well, but
to a lesser extent) is that since the batching is done at a different
layer each frozen mutation fragment is a separate bytes_ostream object
owning at least one memory buffer. If the mutation fragments are small
this will cause an excessive number of allocations. This could be solved
either by freezing fragments in batches (though it goes against the RPC
layer doing its own batching) or using bytes_ostream or an equivalent
object with a buffer allocation policy more suitable for such use cases.
This also is something that probably could be an incremental fix.

Tests: unit (release)
"

* tag 'frozen_mutation_fragment/v1-rebased' of https://github.com/pdziepak/scylla:
  idl: add idl description of frozen_mutation_fragments
  tests: add test for frozen_mutation_fragments
  frozen_mutation: introduce frozen_mutation_fragment
  tests/idl: test variant being the first member of a structure
  idl: create variant state in root node
  tests/idl: test serialising and deserialising empty structures
  idl-compiler: avoid unused variable in empty struct deserialisers
  tests/mutation_reader: disambiguate freeze() overload
2018-05-28 13:54:01 +03:00
Takuya ASADA
55d6be9254 Revert "dist/ami: update CentOS base image to latest version"
This reverts commit 69d226625a.
Since ami-4bf3d731 is a Marketplace AMI, it is not possible to publish a public AMI based on it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180523112414.27307-1-syuu@scylladb.com>
2018-05-28 13:52:34 +03:00
Avi Kivity
b70febe246 cql: cql_statement: remove execute_internal()
With no callers, it can be safely removed.
2018-05-27 12:40:27 +03:00
Avi Kivity
c8a66efb6a cql: query_processor: don't call cql_statement::execute_internal() any more
All cql_statement::execute_internal() overrides now either throw or
call execute().  Since we shouldn't be calling the throwing overrides
internally, we can safely call execute() instead.  This allows us to
get rid of execute_internal().
2018-05-27 12:37:37 +03:00
Avi Kivity
eb19798f99 cql: select_statement: make execute() and execute_internal() equivalent
execute_internal(), for some code paths, differs from execute by the
following:
 1. it uses CL_ONE unconditionally
 2. it has no query timeout
 3. it doesn't use execution stages

for other code paths, it just calls execute.

As preparation for getting rid of execute_internal(), unify the two
code paths.

Commit 4859b759b9 caused the consistency level and timeouts
to be provided by the caller, so using the caller provided parameters
instead of overriding them does not change behavior.
2018-05-27 12:36:02 +03:00
Avi Kivity
d998f06633 cql: schema_altering_statement: make execute() and execute_internal() equivalent
To get rid of execute_internal(), make the normal execute() equivalent and call
it instead of having two different paths.
2018-05-27 11:08:55 +03:00
Duarte Nunes
4859b759b9 Merge 'Make all timeouts explicit' from Avi
"
This patchset makes all users of query_processor specify their timeouts
explicitly, in preparation for the removal of
cql_statement::execute_internal() (whose main function was to override
timeouts).
"

* tag 'cql-explicit-timeouts/v1' of https://github.com/avikivity/scylla:
  query_processor: require clients to specify timeout configuration
  query_processor: un-default consistency level in make_internal_options
2018-05-26 16:10:58 +02:00
Avi Kivity
6e97609049 Merge "Improve support for data types handling in SSTables 3.x" from Vladimir
"
Firstly, this patchset removes the is_fixed_length() function of
abstract_type in favour of value_length_if_fixed().

Secondly, it fixes the byte_type to be compatible with Cassandra which
erroneously treats it as a variable-length data type.

Lastly, it adds a unit test covering all non-composite CQL data types
for writing.

Tests: unit {release}
"

* 'projects/sstables-30/different-data-types/v1' of https://github.com/argenet/scylla:
  tests: Add a unit test for writing different data types to SSTables 3.x format.
  types: Treat byte_type as a variable-length type for compatibility reasons.
  types: Remove is_value_fixed() and use value_length_if_fixed() instead.
2018-05-26 10:24:35 +03:00
Vladimir Krivopalov
0951153292 tests: Add a unit test for writing different data types to SSTables 3.x format.
This test covers all non-composite CQL data types.
The resulting files are dumped using sstabledump as follows:

[
  {
    "partition" : {
      "key" : [ "key" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 174,
        "liveness_info" : { "tstamp" : "1525385507816568" },
        "cells" : [
          { "name" : "asciival", "value" : "hello" },
          { "name" : "bigintval", "value" : 9223372036854775807 },
          { "name" : "blobval", "value" : "0x6772656174" },
          { "name" : "boolval", "value" : true },
          { "name" : "dateval", "value" : "2017-05-05" },
          { "name" : "decimalval", "value" : 5.45 },
          { "name" : "doubleval", "value" : 36.6 },
          { "name" : "durationval", "value" : 1h4m48s20ms },
          { "name" : "floatval", "value" : 7.62 },
          { "name" : "inetval", "value" : "192.168.0.110" },
          { "name" : "intval", "value" : -2147483648 },
          { "name" : "smallintval", "value" : 32767 },
          { "name" : "timeuuidval", "value" : "50554d6e-29bb-11e5-b345-feff819cdc9f" },
          { "name" : "timeval", "value" : "19:45:05.090000000" },
          { "name" : "tinyintval", "value" : 127 },
          { "name" : "tsval", "value" : "2015-05-01 09:30:54.234Z" },
          { "name" : "uuidval", "value" : "01234567-0123-0123-0123-0123456789ab" },
          { "name" : "varcharval", "value" : "привет" },
          { "name" : "varintval", "value" : 123 }
        ]
      }
    ]
  }
]

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-25 21:41:23 -07:00
Vladimir Krivopalov
3981dd6dd6 types: Treat byte_type as a variable-length type for compatibility reasons.
Although values of the byte_type that corresponds to CQL TINYINT type
always occupy only a single byte, Cassandra treats it as a
variable-length type for SSTables 3.0 reading and writing.

While it is clearly a mistake on Cassandra's side, we have to stay
compatible.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-25 21:41:23 -07:00
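The effect of "types: Treat byte_type as a variable-length type for compatibility reasons" can be illustrated with a small sketch. The name value_length_if_fixed() comes from the commit messages, but everything else here is an assumption made for illustration: the `cql_type` enum, the free-function form, the listed lengths, and above all the single-byte length prefix, which stands in for whatever length encoding the SSTables 3.x format really uses.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, heavily reduced set of CQL types.
enum class cql_type { tinyint, int32, bigint, text };

// Returns the serialized length for fixed-length types, or nullopt for
// types that are written with an explicit length.
inline std::optional<std::uint32_t> value_length_if_fixed(cql_type t) {
    switch (t) {
    case cql_type::int32:
        return 4;
    case cql_type::bigint:
        return 8;
    case cql_type::tinyint:  // always one byte, yet treated as variable-length
    case cql_type::text:
        return std::nullopt;
    }
    return std::nullopt;
}

// Appends a value to `out`. Variable-length values get a length prefix
// (one byte here, purely an assumption of this sketch).
inline void write_value(std::vector<std::uint8_t>& out, cql_type t,
                        const std::vector<std::uint8_t>& value) {
    if (auto len = value_length_if_fixed(t)) {
        assert(value.size() == *len);
    } else {
        assert(value.size() < 256);
        out.push_back(static_cast<std::uint8_t>(value.size()));
    }
    out.insert(out.end(), value.begin(), value.end());
}
```

With this sketch a TINYINT value occupies two bytes on disk (prefix plus payload) while an INT stays at four, which is the kind of difference a reader and writer must agree on to stay compatible.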
Vladimir Krivopalov
24cb062834 types: Remove is_value_fixed() and use value_length_if_fixed() instead.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-25 21:41:23 -07:00
Paweł Dziepak
ed12555192 idl: add idl description of frozen_mutation_fragments 2018-05-25 10:15:10 +01:00
Paweł Dziepak
0bac487426 tests: add test for frozen_mutation_fragments 2018-05-25 10:15:10 +01:00
Paweł Dziepak
aa4e589ace frozen_mutation: introduce frozen_mutation_fragment
This patch introduces IDL definition as well as serialisers and
deserialisers for freezing mutation_fragment so that they can be
transferred between nodes in a cluster.
2018-05-25 10:15:10 +01:00
Paweł Dziepak
b2e9491728 tests/idl: test variant being the first member of a structure 2018-05-25 10:15:10 +01:00
Paweł Dziepak
a5731ded98 idl: create variant state in root node
Each non-final IDL object is preceded by a frame containing its size.
In case of boost::variant there is a frame for the variant itself, an
integer determining the active alternative of the variant and a frame of
that active alternative.

However, if a variant was the first member of a writable stub object the
IDL would generate code that would not write the frame for the variant.
This is not a very severe issue: there are no such cases right now, as
the C++ type system would not allow such generated code to compile.
2018-05-25 10:15:10 +01:00
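The layout described in "idl: create variant state in root node" (a frame for the variant, an integer naming the active alternative, then a frame for that alternative) can be sketched like this. It is not the generated IDL code: `std::variant` stands in for boost::variant, the 4-byte little-endian fields are an assumption, and so is the choice that a frame's size counts its own size field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

using buffer = std::vector<std::uint8_t>;

// Appends a 32-bit value in little-endian order (assumed encoding).
inline void put_u32(buffer& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }
}

// Overwrites four bytes at `pos` with a little-endian 32-bit value.
inline void patch_u32(buffer& out, std::size_t pos, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out[pos + i] = static_cast<std::uint8_t>(v >> (8 * i));
    }
}

// Writes a frame: a size placeholder, the body, then the patched size.
// The size is assumed to include the 4-byte size field itself.
template <typename Body>
void write_frame(buffer& out, Body&& body) {
    const std::size_t start = out.size();
    put_u32(out, 0);
    body(out);
    patch_u32(out, start, static_cast<std::uint32_t>(out.size() - start));
}

inline void write_alternative(buffer& out, std::uint32_t v) { put_u32(out, v); }
inline void write_alternative(buffer& out, const std::string& s) {
    out.insert(out.end(), s.begin(), s.end());
}

// Variant frame { alternative index, alternative frame { payload } }.
inline void write_variant(buffer& out,
                          const std::variant<std::uint32_t, std::string>& v) {
    write_frame(out, [&v](buffer& o) {
        put_u32(o, static_cast<std::uint32_t>(v.index()));
        write_frame(o, [&v](buffer& inner) {
            std::visit([&inner](const auto& alt) { write_alternative(inner, alt); }, v);
        });
    });
}
```

The bug described in the commit corresponds to skipping the outer write_frame() call when the variant is the first member of its enclosing object.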
Paweł Dziepak
d731cf427d tests/idl: test serialising and deserialising empty structures 2018-05-25 10:15:10 +01:00
Paweł Dziepak
f719516be8 idl-compiler: avoid unused variable in empty struct deserialisers
Deserialisers generated by IDL compiler first create a substream
covering the deserialised structure and then skip and read appropriate
members. If there are no members the substream will be unused and prompt
the compiler to emit a warning.
2018-05-25 10:15:10 +01:00
Paweł Dziepak
fde9e1d55f tests/mutation_reader: disambiguate freeze() overload
freeze() is about to get overloaded so make sure we don't get any
ambiguities.
2018-05-25 10:15:10 +01:00
Duarte Nunes
4db0b4af58 Merge 'secondary index: Fixes for tables with multiple clustering columns' from Nadav
"
This patch series fixes #3405: secondary-index search only provided
correct results in certain cases, where entire partitions or contiguous
partition slices matched the query. When this was not the case, and
individual clustering rows match or do not match the query, the wrong
results were returned.

To fix this bug, we need to fix the two stages of secondary-index search:

1. In the first stage, we read from the index MV a list of row keys
   (i.e., primary keys) matching the query. We can no longer remember
   just the partition keys, and need to keep the list of full primary keys.

2. In the second stage, we have a list of rows (not partitions) and need
   to read their selected contents to return to the user. Since CQL queries
   do not have a syntax to select an arbitrary list of rows, we have to
   add new code to do such a selection.

Because we provide an ad-hoc, inefficient, implementation for the row
selection described in stage 2, these patches leave two paths in the code:
The old path, efficiently selecting entire partitions, and the new path,
selecting individual rows. The old path is still used when it is applicable,
which is when a partition key column or the first clustering key column
is searched.
"

* 'si-fix-v4' of http://github.com/nyh/scylla:
  secondary index: test multiple clustering column
  secondary index: fix wrong results returned in certain cases
  secondary index: method for fetching list of rows from base table
  secondary index: method for fetching list of rows from index
  select_statement.cc: refactor find_index_partition_ranges()
  select_statement.cc: fix variable lifetime errors
2018-05-24 21:36:18 +01:00
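The two-stage search described in the "secondary index" merge above, and why partition granularity returns wrong rows, can be shown on a toy model. All names here are hypothetical (`build_index`, `find_index_rows`, `read_rows`, `read_partitions`), keys are plain integers, and std::map and std::multimap stand in for the base table and the index materialized view.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using primary_key = std::pair<int, int>;  // (partition key, clustering key)
using base_table = std::map<primary_key, std::string>;  // row -> indexed value
using index_view = std::multimap<std::string, primary_key>;

// Builds the index: indexed value -> full primary key of the base row.
inline index_view build_index(const base_table& base) {
    index_view idx;
    for (const auto& row : base) {
        idx.emplace(row.second, row.first);
    }
    return idx;
}

// Stage 1: read the full primary keys of the matching rows from the index.
inline std::vector<primary_key> find_index_rows(const index_view& idx,
                                                const std::string& value) {
    std::vector<primary_key> keys;
    const auto range = idx.equal_range(value);
    for (auto it = range.first; it != range.second; ++it) {
        keys.push_back(it->second);
    }
    return keys;
}

// Stage 2, row granularity: fetch exactly the listed rows.
inline std::vector<primary_key> read_rows(const base_table& base,
                                          const std::vector<primary_key>& keys) {
    std::vector<primary_key> result;
    for (const auto& k : keys) {
        if (base.count(k) != 0) {
            result.push_back(k);
        }
    }
    return result;
}

// Stage 2, partition granularity: keeps only partition keys and therefore
// returns every row of each matching partition, matching or not.
inline std::vector<primary_key> read_partitions(const base_table& base,
                                                const std::vector<primary_key>& keys) {
    std::set<int> partitions;
    for (const auto& k : keys) {
        partitions.insert(k.first);
    }
    std::vector<primary_key> result;
    for (const auto& row : base) {
        if (partitions.count(row.first.first) != 0) {
            result.push_back(row.first);
        }
    }
    return result;
}
```

For a base table with rows (1,1)="a", (1,2)="b" and (2,1)="a", searching for "a" yields two rows at row granularity, while the partition-granularity path also returns (1,2), which does not match.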
Nadav Har'El
a6d9ea2fb5 secondary index: test multiple clustering column
This patch adds a test for secondary indexes on a table which has many
columns - two partition key columns, two clustering key columns, and two
regular columns. We add a bunch of data in various rows and partitions,
index all columns and search on this data and verify the results.

This test exposed various bugs in secondary index search, including
issue #3405. After we fixed those bugs, the test now passes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-05-24 15:56:57 +03:00
Nadav Har'El
1b29dd44f7 secondary index: fix wrong results returned in certain cases
The current secondary-index search code, in
indexed_table_select_statement::do_execute(), begins by fetching a list
of partitions, and then the content of these partitions from the base
table. However, in some cases, when the table has clustering columns and
the search is not on the first one of them, doing this work in partition
granularity is wrong, and yields wrong results as demonstrated in
issue #3405.

So in this patch, we recognize the cases where we need to work in
clustering row granularity, and in those cases use the new functions
introduced in the previous patches - find_index_clustering_rows() and
the execute() variant taking a list of primary-keys of rows.

Fixes #3405.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-05-24 15:56:03 +03:00
Nadav Har'El
adf6d742be secondary index: method for fetching list of rows from base table
We add a new variant of select_statement::execute() which allows selecting
an arbitrary list of clustering rows. The existing execute() variant can't
do that - it can only take a list of *partitions*, and read the same
clustering rows from all of them.

The new select variant is not needed for regular CQL queries (which do
not have a syntax allowing reading a list of rows with arbitrary primary
keys), but we will need it for secondary index search, for solving
issue #3405.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-05-24 15:54:36 +03:00
Nadav Har'El
a096a82adc secondary index: method for fetching list of rows from index
We already have a method find_index_partition_ranges(), to fetch a list
of partition keys from the secondary index. However, as we shall see in
the following patches (and see also issue #3405), getting a list of entire
partitions is not always enough - the secondary index actually holds a list
of primary keys, which includes clustering keys, and in some queries we
can't just ignore them.

So this patch provides a new method find_index_clustering_rows(), to
query the secondary index and get a list of matching clustering keys.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-05-24 15:53:29 +03:00