Commit Graph

14460 Commits

Author SHA1 Message Date
Calle Wilund
3e8cfbf2a0 cql3::statements::property_definitions: Use std::variant instead of any
Formalizing what stuff we actually keep in the props. And c++17.
2018-02-07 10:11:46 +00:00
Calle Wilund
0dcf287230 sstables: Add extension type for wrapping file io 2018-02-07 10:11:45 +00:00
Calle Wilund
3ab760b375 schema: Add opaque type to represent extensions
A virtual opaque object meant to represent the "extensions" mapping
in schema_tables::tables/views
2018-02-07 10:11:45 +00:00
Calle Wilund
74758c87cd sstables::compress/compress: Make compression a virtual object
Make a "compressor" an actual class, that can be implemented and
registered via class registry. 

For "common" compressors, the objects will be shared, but complex
implementors can be semi-stateful. 

sstable compression is split into two parts: The "static" config
which is shared across shards, and a "local" one, which holds 
a compressor pointer. The latter is encapsulated, along with 
actual compressed data writers, in sstables/compress.cc.

For compression (write), compression writer is instansiated 
with the settings active in table metadata. 

For decompression (read), compression reader is instansiated
with the settings stored in sstable metadata, which can 
differ from the currently active table metadata. 

v2:
* Structured patch sets differently (dependencies)
* Added more comments/api descs
* Added patch to move all sstable compression into compress.cc,
  effectively separating top-level virtual compressor object
  from sstable io knowledge
v3:
* Rebased
v4: 
* Moved all sstable compression logic/knowledge into  
  compress.cc (local compression). Merged the two patches 
  (separation just confuses reader).
2018-02-07 10:11:45 +00:00
Paweł Dziepak
6ccd317c38 Merge "Do not evict from memtable snapshots" from Tomasz
"When moving whole partition entries from memtable to cache, we move
snapshots as well. It is incorrect to evict from such snapshots
though, because associated readers would miss data.

Solution is to record evictability of partition version references (snapshots)
and avoiding eviction from non-evictable snapshots.

Could affect scanning reads, if the reader uses partition entry from
memtable, and the partition is too large to fit in reader's buffer,
and that entry gets moved to cache (was absent in cache), and then
gets evicted (memory pressure). The reader will not see the remainder
of that entry. Found during code review.

Introduced in ca8e3c4, so affects 2.1+

Fixes #3186.

Tests: unit (release)"

* 'tgrabiec/do-not-evict-memtable-snapshots' of github.com:tgrabiec/scylla:
  tests: mvcc: Add test for eviction with non-evictable snapshots
  mutation_partition: Define + operator on tombstones
  tests: mvcc: Check that partition is fully discontinuous after eviction
  tests: row_cache: Add test for memtable readers surviving flush and eviction
  memtable: Make printable
  mvcc: Take partition_entry by const ref in operator<<()
  mvcc: Do not evict from non-evictable snapshots
  mvcc: Drop unnecessary assignment to partition_snapshot::_version
  tests: Use partition_entry::make_evictable() where appropriate
  mvcc: Encapsulate construction of evictable entries
2018-02-06 14:46:24 +00:00
Tomasz Grabiec
3c51cc79d5 tests: mvcc: Add test for eviction with non-evictable snapshots 2018-02-06 14:24:19 +01:00
Tomasz Grabiec
d37131d320 mutation_partition: Define + operator on tombstones 2018-02-06 14:24:19 +01:00
Tomasz Grabiec
ec5fe5b207 tests: mvcc: Check that partition is fully discontinuous after eviction
evict() should remove everything, including range tombstones, so whole
clustering range should be marked as discontinuous.
2018-02-06 14:24:19 +01:00
Tomasz Grabiec
c1b82e60e3 tests: row_cache: Add test for memtable readers surviving flush and eviction
Reproduces https://github.com/scylladb/scylla/issues/3186
2018-02-06 14:24:19 +01:00
Tomasz Grabiec
d85d651e0f memtable: Make printable
Useful when debugging test failures.
2018-02-06 14:24:19 +01:00
Tomasz Grabiec
06b7b54c3d mvcc: Take partition_entry by const ref in operator<<()
Some users will only have const&.
2018-02-06 14:24:19 +01:00
Tomasz Grabiec
50f5bee12e mvcc: Do not evict from non-evictable snapshots
When moving whole partition entries from memtable to cache, we move
snapshots as well. It is incorrect to evict from such snapshots
though, because associated readers would miss data.

Solution is to record evictability of partition version references (snapshots)
and avoiding eviction from non-evictable snapshots.

Could affect scanning reads, if the reader uses partition entry from
memtable, and the partition is too large to fit in reader's buffer,
and that entry gets moved to cache (was absent in cache), and then
gets evicted (memory pressure). The reader will not see the remainder
of that entry.

Introduced in ca8e3c4, so affects 2.1+

Fixes #3186.
2018-02-06 14:24:19 +01:00
Tomasz Grabiec
c391bff1d2 mvcc: Drop unnecessary assignment to partition_snapshot::_version
merge_partition_versions() is responsible for merging versions
unpinned by the current snapshot. If that fails, we don't need to set
_version back since versions must be still referenced by someone else,
this snapshot is not a unique owner.

This change makes it easier to add tracking of evictability.
2018-02-06 14:24:18 +01:00
Tomasz Grabiec
439cbada2c tests: Use partition_entry::make_evictable() where appropriate 2018-02-06 14:24:18 +01:00
Raphael S. Carvalho
09f4ee808f sstables/compress: Fix race condition in segmented offset reading of shared sstable
Race condition was introduced by commit 028c7a0888, which introduces chunk offset
compression, because a reading state is kept in the compress structure which is
supposed to be immutable and can be shared among shards owning the same sstable.

So it may happen that shard A updates state while shard B relies on information
previously set which leads to incorrect decompression, which in turn leads to
read misbehaving.

We could serialize access to at() which would only lead to contention issues for
shared sstables, but that can be avoided by moving state out of compress structure
which is expected to be immutable after sstable is loaded and feeded to shards that
own it. Sequential accessor (wraps state and reference to segmented_offset) is
added to prevent at() and push_back() interfaces from being polluted.

Tests: release mode.

Fixes #3148.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180205192432.23405-1-raphaelsc@scylladb.com>
2018-02-06 12:10:10 +02:00
Tomasz Grabiec
d899ae0f02 mvcc: Encapsulate construction of evictable entries
Internal invariants of MVCC are better preserved by partition_entry
methods, so move construction of partition entries out of cache_entry
constructors.
2018-02-05 17:54:03 +01:00
Avi Kivity
a94564a637 Merge seastar upstream
* seastar 21badbd...6d02263 (4):
  > build: detect name of ninja executable
  > queue: pop_eventually/push_eventually should throw when called after abort
  > build: compile libfmt out-of-line
  > core/gate: Ensure with_gate leaves gate on exception
2018-02-05 14:42:07 +02:00
Tomasz Grabiec
d21fbc26c7 tests: range_tombstone_list: Do not depend on argument evaluation order
next_pos() calls could be reordered resulting in invalid tombstones being
generated.
Message-Id: <1517833688-20022-1-git-send-email-tgrabiec@scylladb.com>
2018-02-05 12:31:37 +00:00
Tomasz Grabiec
d2baa49313 tests: Do not produce invalid range tombstones
Upper bound should not be smaller than lower bound. Found by
asserting on valid bounds.
Message-Id: <1517833602-19732-1-git-send-email-tgrabiec@scylladb.com>
2018-02-05 12:29:03 +00:00
Takuya ASADA
6d134c0c2b dist/redhat: block installing Scylla on older kernel
We uses AmbientCapabilities directive on systemd unit, but it does not work
on older kernel, causes following error:
"systemd[5370]: Failed at step CAPABILITIES spawning /usr/bin/scylla: Invalid argument"

It only works on kernel-3.10.0-514 == CentOS7.3 or later, block installing rpm
to prevent the error.

Fixes #3176

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1517822764-2684-1-git-send-email-syuu@scylladb.com>
2018-02-05 12:57:17 +02:00
Duarte Nunes
46099e4f58 tests/role_manager_test: Stop role_manager
Not stopping them may cause the tests to fail due to an asynchronous
process being scheduled and accessing freed data.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180202221640.28609-1-duarte@scylladb.com>
2018-02-05 09:39:59 +00:00
Avi Kivity
6919c7434e Merge seastar upstream
* seastar 19efbd9...21badbd (4):
  > reactor: change adjustment method for tasks becoming active
  > Merge 'Update ARM port' from Avi
  > http: Do not wait for close connection on stop if listen did not completed
  > core/future-util: Don't allow rvalues in do_for_each()
2018-02-04 14:28:28 +02:00
Avi Kivity
2173e74212 tests: de-template cql_query_test
cql_query_test contains many continuations that are generic lambdas:

  foo().then([] (auto x) { ... })

These templates prevent Eclipse's indexer from inferring the type of x,
and so everything below that point is one big error as far as Eclipse is
concerned.

De-template these lambdas by specifying the real types.

Unfortunately, compile time decrease was not observed.

Tests: cql_query_test (release)
Message-Id: <20180204113503.23297-1-avi@scylladb.com>
2018-02-04 11:48:52 +00:00
Takuya ASADA
dc2b17b3da dist/redhat: link yaml-cpp statically
To avoid incompatibility between distribution provided libyaml-cpp, link it
statically.

Fixes #3173

Message-Id: <1517546935-15858-2-git-send-email-syuu@scylladb.com>
2018-02-03 16:34:36 +02:00
Takuya ASADA
82f217d62a configure.py: make --static-yaml-cpp works properly for Scylla
We are doing static linking of libyaml-cpp for libseatar well, but
mistakenly not for Scylla, need to fix.

Message-Id: <1517546935-15858-1-git-send-email-syuu@scylladb.com>
2018-02-03 16:34:32 +02:00
Amnon Heiman
836876d81a main: stop prometheus server when shutting down
This patch adds a enging().on_exit cleanup for the prometheus server,
similar to other components in the system.

It will stop the server when sutting down.

Fixes #2520
Message-Id: <20180201132647.17638-1-amnon@scylladb.com>
2018-02-02 11:03:51 +01:00
Tomasz Grabiec
582dd36303 Merge 'Fixes for exception safety in memtable range reads' from Paweł
These patches deal with the remaining exception safety issues in the
memtable partition range readers. That includes moving the assignment
to iterator_reader::_last outside of allocating section to avoid
problems caused by exception-unsafe assignment operator. Memory
accotuning code is also moved out of the retryable context to improve
the code robustness and avoid potential problems in the future.

Fixes #3172.

Tests: unit-test (release)

* https://github.com/pdziepak/scylla.git memtable-range-read-exception-safety/v1:
  memtable: do not update iterator_reader::_last in alloc section
  memtable: do not change accounting state in alloc section
  tests/memtable: add more reader exception safety tests
2018-02-02 11:00:58 +01:00
Paweł Dziepak
c2a5fd520f cql3/role-management: avoid static local shared_ptr
Even if shared_ptr is const it doesn't mean that its internal state is
immutable and it still cannot be freely shared across shards.

Fixes assertion failure in build/debug/tests/cql_roles_query_test.

Message-Id: <20180201125221.30531-1-pdziepak@scylladb.com>
2018-02-01 16:28:36 +02:00
Paweł Dziepak
ea50806172 tests/mutation_reader: avoid static local lw_shared_ptr
Shared pointer don't like being shared across shards.

Fixes assertion failure in build/debug/tests/mutation_reader_test.
Message-Id: <20180201125017.30259-1-pdziepak@scylladb.com>
2018-02-01 13:53:55 +01:00
Paweł Dziepak
20c460d8f0 tests/memtable: add more reader exception safety tests 2018-01-31 16:05:35 +00:00
Paweł Dziepak
c945bdc7f6 memtable: do not change accounting state in alloc section
Allocating sections can be retried so code that has side effects (like
updating flushed bytes accouting) has no place there.
2018-01-31 16:04:31 +00:00
Paweł Dziepak
d803370868 memtable: do not update iterator_reader::_last in alloc section
iterator_reader::_last is a part of the state that survives allocating
section retries, therefore, it should not be modified in the retryable
context.
2018-01-31 16:03:16 +00:00
Avi Kivity
4463e9071a Merge "Adding the API V2 Swagger definition file" from Amnon
"This series adds the base for the V2 Swagger definition file.
After the series, the definition file will be at:
http://localhost:10000/v2

It can be used with the swagger ui, by replacing the url in the search
path."

* 'amnon/swagger_20' of github.com:scylladb/seastar-dev:
  Register the API V2 swagger file
  Adding the header part of the swagger2.0 API
2018-01-31 14:47:50 +02:00
Duarte Nunes
cf6110d840 tests/cell_locker_test: Ensure timeout test finishes in useful time
Use saturating_substract to prevent a really long timeout and having
the test hang.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180130221336.1773-1-duarte@scylladb.com>
2018-01-31 11:34:08 +01:00
Duarte Nunes
01a8e5abb9 Merge 'Materialized views: add local locking' from Nadav
"Before this patch set, our Materialized Views implementation can produce
incorrect results when given concurrent updates of the same base-table
row. Such concurrent updates may result, in certain cases, with two
different rows in the view table, instead of just one with the latest
data. In this series we add locking which serializes the two conflicting
updates, and solves this problem.

I explain in more detail why such locking is needed, and what kinds of
locks are needed, in the third patch."

* 'master' of https://github.com/nyh/scylla:
  Materialized views: serialize read-modify-update of base table
  Materialized views: test row_locker class
  Materialized views: implement row and partition locking mechanism
2018-01-30 17:40:12 +00:00
Tomasz Grabiec
cdd31918d0 Merge 'Make memtable reads exception safe' from Paweł
These patches change the memtable reader implementation (in particular
partition_snapshot_reader) so that the existing exception safety
paroblems are fixed, but also in a way that, hopefully, would make it
easier to reason about the error handling and avoid future bugs in that
area.

The main difficulty related to exception safety is that when an
exception is thrown out of an allocating section that code is run again
with increased memory reserved. If the retryable code has side effects
it is very easy to get incorrect behaviour.

In addition to that, entering an allocating section is not exactly cheap
which encourages doing so rarely and having large sections.

The approach taken by this series is to, first, make entering allocating
sections cheaper and then reducing the amount of logic that runs inside
of them to a minimum.

This means that instead of entering a section once per a call to
flat_mutation_reader::fill_buffer() the allocation section is entered
once for each emitted row. The only state modified from within the
section are cached iterators to the current row, which are dropped on
retry. Hopefully, this would make the reader code easier to reason
about.

The optimisations to the allocating sections and managed_bytes
linearised context has successfully eliminated any penalty caused by
much more fine grained allocating sections.

Fixes #3123.
Fixes #3133.

Tests: unit-tests (release)

BEFORE
test                                      iterations      median         mad         min         max
memtable.one_partition_one_row               1155362   869.139ns     0.282ns   868.465ns   873.253ns
memtable.one_partition_many_rows              127252     7.871us    15.252ns     7.851us     7.886us
memtable.many_partitions_one_row               58715    17.109us     2.765ns    17.013us    17.112us
memtable.many_partitions_many_rows              4839   206.717us   212.385ns   206.505us   207.448us

AFTER
test                                      iterations      median         mad         min         max
memtable.one_partition_one_row               1194453   839.223ns     0.503ns   834.952ns   842.841ns
memtable.one_partition_many_rows              133785     7.477us     4.492ns     7.473us     7.507us
memtable.many_partitions_one_row               60267    16.680us    18.027ns    16.592us    16.700us
memtable.many_partitions_many_rows              4975   201.048us   144.929ns   200.822us   201.699us

        ./before_sq  ./after_sq  diff
 read     337373.86   353694.24  4.8%
 write    388759.99   394135.78  1.4%

* https://github.com/pdziepak/scylla.git memtable-exception-safety/v2:
  tests/perf: add microbenchmarks for memtable reader
  flat_mutation_reader: add allocation point in push_mutation_fragment
  linearization_context: remove non-trivial operations from fast path
  lsa: split alloc section into reserving and reclamation-disabled parts
  lsa: optimise disabling reclamation and invalidation counter
  mutation_fragment: allow creating clustering row in place
  paratition_snapshot_reader: minimise amount of retryable code
  memtable: drop memtable_entry::read()
  tests/memtable: add test for reader exception safety
2018-01-30 18:33:27 +01:00
Paweł Dziepak
1406ac5088 tests/memtable: add test for reader exception safety 2018-01-30 18:33:26 +01:00
Paweł Dziepak
ea7248056f memtable: drop memtable_entry::read() 2018-01-30 18:33:26 +01:00
Paweł Dziepak
0420ca48a5 paratition_snapshot_reader: minimise amount of retryable code
Retryable code that has side effects is a recipe for bugs. This patch
reworkds the snapshot reader so that the amount of logic run with
reclamation disabled is minimal and has a very limited side effects.
2018-01-30 18:33:26 +01:00
Paweł Dziepak
b1cb7d214e mutation_fragment: allow creating clustering row in place
Moving clustering_row is expensive due to amount of data stored
internally. Adding a mutation_fragment constructor that builds a
clustering_row in-place saves some of that moving.
2018-01-30 18:33:26 +01:00
Paweł Dziepak
dcd79af8ed lsa: optimise disabling reclamation and invalidation counter
Most of the lsa gory details are hidden in utils/logalloc.cc. That
includes the actual implementation of a lsa region: region_impl.

However, there is code in the hot path that often accesses the
_reclaiming_enabled member as well as its base class
allocation_strategy.

In order to optimise those accesses another class is introduced:
basic_region_impl that inherits from allocation_strategy and is a base
of region_impl. It is defined in utils/logalloc.hh so that it is
publicly visible and its member functions are inlineable from anywhere
in the code. This class is supposed to be as small as possible, but
contain all members and functions that are accessed from the fast path
and should be inlined.
2018-01-30 18:33:26 +01:00
Paweł Dziepak
d825ae37bf lsa: split alloc section into reserving and reclamation-disabled parts
Allocating sections reserves certain amount of memory, then disables
reclamation and attempts to perform given operation. If that fails due
to std::bad_alloc the reserve is increased and the operation is retried.

Reserving memory is expensive while just disabling reclamation isn't.
Moreover, the code that runs inside the section needs to be safely
retryable. This means that we want the amount of logic running with
reclamation disabled as small as possible, even if it means entering and
leaving the section multiple times.

In order to reduce the performance penalty of such solution the memory
reserving and reclamation disabling parts of the allocating sections are
separated.
2018-01-30 18:33:26 +01:00
Paweł Dziepak
eb2e88e925 linearization_context: remove non-trivial operations from fast path
Since linearization_context is thread_local every time it is accessed
the compiler needs to emit code that checks if it was already
constructed and does so if it wasn't. Moreover, upon leaving the context
from the outermost scope the map needs to be cleared.

All these operations impose some performance overhead and aren't really
necessary if no buffers were linearised (the expected case). This patch
rearranges the code so that lineatization_context is trivially
constructible and the map is cleared only if it was modified.
2018-01-30 18:33:25 +01:00
Paweł Dziepak
a1278b4d6a flat_mutation_reader: add allocation point in push_mutation_fragment
Exception safety tests inject a failure at every allocation and verify
whether the error is handled properly.

push_mutation_fragment() adds a mutation fragment to a circular_buffer,
in theory any call to that function can result in a memory allocation,
but in practice that depends on the implementation details. In order to
improve the effectiveness of the exception safety tests this patch adds
an explicit allocation point in push_mutation_fragment().
2018-01-30 18:33:25 +01:00
Paweł Dziepak
486e0d8740 tests/perf: add microbenchmarks for memtable reader 2018-01-30 18:33:25 +01:00
Avi Kivity
00d70080af Merge "Consume promoted index incrementally" from Vladimir
"This patchset makes index_reader consume promoted index incrementally
on demand as the reader advances through the current partition instead
of storing the entire promoted index which can be huge.

When the current page is parsed, data for promoted indices are turned
into input streams that are only read and parsed if a particular
position within a partition is seeked for. This avoids potentially large
allocations for big partitions."

* 'issues/2981/v10' of https://github.com/argenet/scylla:
  Use advance_past for single partition upper bound.
  Remove obsolete types and methods.
  Simplify continuous_data_consumer::consume_input() interface.
  Parse promoted index entries lazily upon request rather than immediately.
  Add helper input streams: buffer_input_stream and prepended_input_stream.
  Support skipping over bytes from input stream in parsers based on continuous_data_consumer
  Add performance tests for large partition slicing using clustering keys.
2018-01-30 18:22:28 +02:00
Nadav Har'El
2ea1922a4d Materialized views: serialize read-modify-update of base table
Before this patch, our Materialized Views implementation can produce
incorrect results when given concurrent updates of the same base-table
row. Such concurrent updates may result, in certain cases, in two
different rows added to the view table, instead of just one with the latest
data. In this patch we we add locking which serializes the two conflicting
updates, and solves this problem. The locking for a single base-table
column_family is implemented by the row_locker class introduced in a
previous patch.

A long comment in the code of this patch explains in more detail why
this locking is needed, when, and what types of locks are needed: We
sometimes need to lock a single clustering row, sometimes an entire
partition, sometimes an exclusive lock and sometimes a shared lock.

Fixes #3168

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-01-30 16:21:43 +02:00
Nadav Har'El
52e91623ce Materialized views: test row_locker class
This is a unit test for the row_locker facility. It tests various
combination of shared and exclusive locks on rows and on partitions,
some should succeed immediately and some should block.

This tests the row_locker's API only, it does not use or test anything
in Materialized Views.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-01-30 16:19:43 +02:00
Nadav Har'El
31d0a1dd0c Materialized views: implement row and partition locking mechanism
This patch adds a "row_locker" class providing locking (shard-locally) of
individual clustering rows or entire partitions, and both exclusive and
shared locks (a.k.a. reader/writer lock).

As we'll see in a following patch, we need this locking capability for
materialized views, to serialize the read-modify-update modifications
which involve the same rows or partitions.

The new row_locker is significantly different from the existing cell_locker.
The two main differences are that 1. row_locker also supports locking the
entire partition, not just individual rows (or cells in them), and that
2. row_locker supports also shared (reader) locks, not just exclusive locks.
For this reason we opted for a new implementation, instead of making large
modificiations to the existing cell_locker. And we put the source files
in the view/ directory, because row_locker's requirements are pretty
specific to the needs of materialized views.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-01-30 16:16:27 +02:00
Takuya ASADA
bec2b015e3 dist/debian: link yaml-cpp statically
To avoid incompatibility between distribution provided libyaml-cpp, link it
statically.

Fixes #3164

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1517313320-10712-1-git-send-email-syuu@scylladb.com>
2018-01-30 14:22:02 +02:00