scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 04:26:48 +00:00

Author	SHA1	Message	Date
Calle Wilund	3e8cfbf2a0	cql3::statements::property_definitions: Use std::variant instead of any. Formalizing what stuff we actually keep in the props. And C++17.	2018-02-07 10:11:46 +00:00
Calle Wilund	0dcf287230	sstables: Add extension type for wrapping file io	2018-02-07 10:11:45 +00:00
Calle Wilund	3ab760b375	schema: Add opaque type to represent extensions. A virtual opaque object meant to represent the "extensions" mapping in schema_tables::tables/views.	2018-02-07 10:11:45 +00:00
Calle Wilund	74758c87cd	sstables::compress/compress: Make compression a virtual object. Make a "compressor" an actual class that can be implemented and registered via the class registry. For "common" compressors, the objects will be shared, but complex implementors can be semi-stateful. sstable compression is split into two parts: the "static" config, which is shared across shards, and a "local" one, which holds a compressor pointer. The latter is encapsulated, along with the actual compressed data writers, in sstables/compress.cc. For compression (write), the compression writer is instantiated with the settings active in the table metadata. For decompression (read), the compression reader is instantiated with the settings stored in the sstable metadata, which can differ from the currently active table metadata. v2: * Structured patch sets differently (dependencies) * Added more comments/API descriptions * Added patch to move all sstable compression into compress.cc, effectively separating the top-level virtual compressor object from sstable I/O knowledge v3: * Rebased v4: * Moved all sstable compression logic/knowledge into compress.cc (local compression). Merged the two patches (separation just confuses the reader).	2018-02-07 10:11:45 +00:00
Paweł Dziepak	6ccd317c38	Merge "Do not evict from memtable snapshots" from Tomasz "When moving whole partition entries from memtable to cache, we move snapshots as well. It is incorrect to evict from such snapshots, though, because associated readers would miss data. The solution is to record the evictability of partition version references (snapshots) and to avoid eviction from non-evictable snapshots. This could affect scanning reads if the reader uses a partition entry from a memtable, the partition is too large to fit in the reader's buffer, that entry gets moved to cache (it was absent in cache), and it then gets evicted (memory pressure). The reader will not see the remainder of that entry. Found during code review. Introduced in `ca8e3c4`, so affects 2.1+. Fixes #3186. Tests: unit (release)" * 'tgrabiec/do-not-evict-memtable-snapshots' of github.com:tgrabiec/scylla: tests: mvcc: Add test for eviction with non-evictable snapshots mutation_partition: Define + operator on tombstones tests: mvcc: Check that partition is fully discontinuous after eviction tests: row_cache: Add test for memtable readers surviving flush and eviction memtable: Make printable mvcc: Take partition_entry by const ref in operator<<() mvcc: Do not evict from non-evictable snapshots mvcc: Drop unnecessary assignment to partition_snapshot::_version tests: Use partition_entry::make_evictable() where appropriate mvcc: Encapsulate construction of evictable entries	2018-02-06 14:46:24 +00:00
Tomasz Grabiec	3c51cc79d5	tests: mvcc: Add test for eviction with non-evictable snapshots	2018-02-06 14:24:19 +01:00
Tomasz Grabiec	d37131d320	mutation_partition: Define + operator on tombstones	2018-02-06 14:24:19 +01:00
Tomasz Grabiec	ec5fe5b207	tests: mvcc: Check that partition is fully discontinuous after eviction. evict() should remove everything, including range tombstones, so the whole clustering range should be marked as discontinuous.	2018-02-06 14:24:19 +01:00
Tomasz Grabiec	c1b82e60e3	tests: row_cache: Add test for memtable readers surviving flush and eviction. Reproduces https://github.com/scylladb/scylla/issues/3186	2018-02-06 14:24:19 +01:00
Tomasz Grabiec	d85d651e0f	memtable: Make printable. Useful when debugging test failures.	2018-02-06 14:24:19 +01:00
Tomasz Grabiec	06b7b54c3d	mvcc: Take partition_entry by const ref in operator<<(). Some users will only have const&.	2018-02-06 14:24:19 +01:00
Tomasz Grabiec	50f5bee12e	mvcc: Do not evict from non-evictable snapshots. When moving whole partition entries from memtable to cache, we move snapshots as well. It is incorrect to evict from such snapshots, though, because associated readers would miss data. The solution is to record the evictability of partition version references (snapshots) and to avoid eviction from non-evictable snapshots. This could affect scanning reads if the reader uses a partition entry from a memtable, the partition is too large to fit in the reader's buffer, that entry gets moved to cache (it was absent in cache), and it then gets evicted (memory pressure). The reader will not see the remainder of that entry. Introduced in `ca8e3c4`, so affects 2.1+. Fixes #3186.	2018-02-06 14:24:19 +01:00
Tomasz Grabiec	c391bff1d2	mvcc: Drop unnecessary assignment to partition_snapshot::_version. merge_partition_versions() is responsible for merging versions unpinned by the current snapshot. If that fails, we don't need to set _version back, since the versions must still be referenced by someone else; this snapshot is not their unique owner. This change makes it easier to add tracking of evictability.	2018-02-06 14:24:18 +01:00
Tomasz Grabiec	439cbada2c	tests: Use partition_entry::make_evictable() where appropriate	2018-02-06 14:24:18 +01:00
Raphael S. Carvalho	09f4ee808f	sstables/compress: Fix race condition in segmented offset reading of shared sstable. The race condition was introduced by commit `028c7a0888`, which introduced chunk offset compression, because reading state is kept in the compress structure, which is supposed to be immutable and can be shared among the shards owning the same sstable. So shard A may update the state while shard B relies on information previously set, which leads to incorrect decompression and, in turn, to misbehaving reads. We could serialize access to at(), but that would cause contention for shared sstables. It can be avoided by moving the state out of the compress structure, which is expected to be immutable after the sstable is loaded and fed to the shards that own it. A sequential accessor (wrapping the state and a reference to segmented_offset) is added to keep the at() and push_back() interfaces from being polluted. Tests: release mode. Fixes #3148. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180205192432.23405-1-raphaelsc@scylladb.com>	2018-02-06 12:10:10 +02:00
Tomasz Grabiec	d899ae0f02	mvcc: Encapsulate construction of evictable entries. Internal invariants of MVCC are better preserved by partition_entry methods, so move construction of partition entries out of cache_entry constructors.	2018-02-05 17:54:03 +01:00
Avi Kivity	a94564a637	Merge seastar upstream * seastar 21badbd...6d02263 (4): > build: detect name of ninja executable > queue: pop_eventually/push_eventually should throw when called after abort > build: compile libfmt out-of-line > core/gate: Ensure with_gate leaves gate on exception	2018-02-05 14:42:07 +02:00
Tomasz Grabiec	d21fbc26c7	tests: range_tombstone_list: Do not depend on argument evaluation order. next_pos() calls could be reordered, resulting in invalid tombstones being generated. Message-Id: <1517833688-20022-1-git-send-email-tgrabiec@scylladb.com>	2018-02-05 12:31:37 +00:00
Tomasz Grabiec	d2baa49313	tests: Do not produce invalid range tombstones. Upper bound should not be smaller than lower bound. Found by asserting on valid bounds. Message-Id: <1517833602-19732-1-git-send-email-tgrabiec@scylladb.com>	2018-02-05 12:29:03 +00:00
Takuya ASADA	6d134c0c2b	dist/redhat: block installing Scylla on older kernel. We use the AmbientCapabilities directive in the systemd unit, but it does not work on older kernels and causes the following error: "systemd[5370]: Failed at step CAPABILITIES spawning /usr/bin/scylla: Invalid argument" It only works on kernel-3.10.0-514 (CentOS 7.3) or later, so block installing the RPM on older kernels to prevent the error. Fixes #3176 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1517822764-2684-1-git-send-email-syuu@scylladb.com>	2018-02-05 12:57:17 +02:00
Duarte Nunes	46099e4f58	tests/role_manager_test: Stop role_manager. Not stopping the role managers may cause the tests to fail due to an asynchronous process being scheduled and accessing freed data. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180202221640.28609-1-duarte@scylladb.com>	2018-02-05 09:39:59 +00:00
Avi Kivity	6919c7434e	Merge seastar upstream * seastar 19efbd9...21badbd (4): > reactor: change adjustment method for tasks becoming active > Merge 'Update ARM port' from Avi > http: Do not wait for close connection on stop if listen did not completed > core/future-util: Don't allow rvalues in do_for_each()	2018-02-04 14:28:28 +02:00
Avi Kivity	2173e74212	tests: de-template cql_query_test. cql_query_test contains many continuations that are generic lambdas: foo().then([] (auto x) { ... }) These templates prevent Eclipse's indexer from inferring the type of x, so everything below that point is one big error as far as Eclipse is concerned. De-template these lambdas by specifying the real types. Unfortunately, no decrease in compile time was observed. Tests: cql_query_test (release) Message-Id: <20180204113503.23297-1-avi@scylladb.com>	2018-02-04 11:48:52 +00:00
Takuya ASADA	dc2b17b3da	dist/redhat: link yaml-cpp statically. To avoid incompatibility with the distribution-provided libyaml-cpp, link it statically. Fixes #3173 Message-Id: <1517546935-15858-2-git-send-email-syuu@scylladb.com>	2018-02-03 16:34:36 +02:00
Takuya ASADA	82f217d62a	configure.py: make --static-yaml-cpp work properly for Scylla. We statically link libyaml-cpp for libseastar, but mistakenly not for Scylla itself; this needs to be fixed. Message-Id: <1517546935-15858-1-git-send-email-syuu@scylladb.com>	2018-02-03 16:34:32 +02:00
Amnon Heiman	836876d81a	main: stop prometheus server when shutting down. This patch adds an engine().on_exit cleanup for the prometheus server, similar to other components in the system. It stops the server when shutting down. Fixes #2520 Message-Id: <20180201132647.17638-1-amnon@scylladb.com>	2018-02-02 11:03:51 +01:00
Tomasz Grabiec	582dd36303	Merge 'Fixes for exception safety in memtable range reads' from Paweł These patches deal with the remaining exception safety issues in the memtable partition range readers. That includes moving the assignment to iterator_reader::_last outside of the allocating section to avoid problems caused by an exception-unsafe assignment operator. Memory accounting code is also moved out of the retryable context to improve code robustness and avoid potential problems in the future. Fixes #3172. Tests: unit-test (release) * https://github.com/pdziepak/scylla.git memtable-range-read-exception-safety/v1: memtable: do not update iterator_reader::_last in alloc section memtable: do not change accounting state in alloc section tests/memtable: add more reader exception safety tests	2018-02-02 11:00:58 +01:00
Paweł Dziepak	c2a5fd520f	cql3/role-management: avoid static local shared_ptr. Even if a shared_ptr is const, its internal state is not immutable, and it still cannot be freely shared across shards. Fixes assertion failure in build/debug/tests/cql_roles_query_test. Message-Id: <20180201125221.30531-1-pdziepak@scylladb.com>	2018-02-01 16:28:36 +02:00
Paweł Dziepak	ea50806172	tests/mutation_reader: avoid static local lw_shared_ptr. Shared pointers don't like being shared across shards. Fixes assertion failure in build/debug/tests/mutation_reader_test. Message-Id: <20180201125017.30259-1-pdziepak@scylladb.com>	2018-02-01 13:53:55 +01:00
Paweł Dziepak	20c460d8f0	tests/memtable: add more reader exception safety tests	2018-01-31 16:05:35 +00:00
Paweł Dziepak	c945bdc7f6	memtable: do not change accounting state in alloc section. Allocating sections can be retried, so code that has side effects (like updating flushed bytes accounting) has no place there.	2018-01-31 16:04:31 +00:00
Paweł Dziepak	d803370868	memtable: do not update iterator_reader::_last in alloc section. iterator_reader::_last is a part of the state that survives allocating section retries; therefore, it should not be modified in the retryable context.	2018-01-31 16:03:16 +00:00
Avi Kivity	4463e9071a	Merge "Adding the API V2 Swagger definition file" from Amnon "This series adds the base for the V2 Swagger definition file. After the series, the definition file will be at: http://localhost:10000/v2 It can be used with the swagger ui, by replacing the url in the search path." * 'amnon/swagger_20' of github.com:scylladb/seastar-dev: Register the API V2 swagger file Adding the header part of the swagger2.0 API	2018-01-31 14:47:50 +02:00
Duarte Nunes	cf6110d840	tests/cell_locker_test: Ensure timeout test finishes in useful time. Use saturating_substract to prevent a really long timeout and having the test hang. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180130221336.1773-1-duarte@scylladb.com>	2018-01-31 11:34:08 +01:00
Duarte Nunes	01a8e5abb9	Merge 'Materialized views: add local locking' from Nadav "Before this patch set, our Materialized Views implementation can produce incorrect results when given concurrent updates of the same base-table row. Such concurrent updates may result, in certain cases, in two different rows in the view table, instead of just one with the latest data. In this series we add locking which serializes the two conflicting updates and solves this problem. I explain in more detail why such locking is needed, and what kinds of locks are needed, in the third patch." * 'master' of https://github.com/nyh/scylla: Materialized views: serialize read-modify-update of base table Materialized views: test row_locker class Materialized views: implement row and partition locking mechanism	2018-01-30 17:40:12 +00:00
Tomasz Grabiec	cdd31918d0	Merge 'Make memtable reads exception safe' from Paweł These patches change the memtable reader implementation (in particular partition_snapshot_reader) so that the existing exception safety problems are fixed, but also in a way that, hopefully, makes it easier to reason about error handling and avoid future bugs in that area. The main difficulty related to exception safety is that when an exception is thrown out of an allocating section, that code is run again with more memory reserved. If the retryable code has side effects, it is very easy to get incorrect behaviour. In addition, entering an allocating section is not exactly cheap, which encourages doing so rarely and having large sections. The approach taken by this series is to first make entering allocating sections cheaper and then reduce the amount of logic that runs inside them to a minimum. This means that instead of entering a section once per call to flat_mutation_reader::fill_buffer(), the allocating section is entered once for each emitted row. The only state modified from within the section is the cached iterators to the current row, which are dropped on retry. Hopefully, this makes the reader code easier to reason about. The optimisations to the allocating sections and the managed_bytes linearisation context have successfully eliminated any penalty caused by the much more fine-grained allocating sections. Fixes #3123. Fixes #3133.
Tests: unit-tests (release)

BEFORE
test                                iterations  median     mad        min        max
memtable.one_partition_one_row      1155362     869.139ns  0.282ns    868.465ns  873.253ns
memtable.one_partition_many_rows    127252      7.871us    15.252ns   7.851us    7.886us
memtable.many_partitions_one_row    58715       17.109us   2.765ns    17.013us   17.112us
memtable.many_partitions_many_rows  4839        206.717us  212.385ns  206.505us  207.448us

AFTER
test                                iterations  median     mad        min        max
memtable.one_partition_one_row      1194453     839.223ns  0.503ns    834.952ns  842.841ns
memtable.one_partition_many_rows    133785      7.477us    4.492ns    7.473us    7.507us
memtable.many_partitions_one_row    60267       16.680us   18.027ns   16.592us   16.700us
memtable.many_partitions_many_rows  4975        201.048us  144.929ns  200.822us  201.699us

       ./before_sq  ./after_sq  diff
read   337373.86    353694.24   4.8%
write  388759.99    394135.78   1.4%

* https://github.com/pdziepak/scylla.git memtable-exception-safety/v2: tests/perf: add microbenchmarks for memtable reader flat_mutation_reader: add allocation point in push_mutation_fragment linearization_context: remove non-trivial operations from fast path lsa: split alloc section into reserving and reclamation-disabled parts lsa: optimise disabling reclamation and invalidation counter mutation_fragment: allow creating clustering row in place paratition_snapshot_reader: minimise amount of retryable code memtable: drop memtable_entry::read() tests/memtable: add test for reader exception safety	2018-01-30 18:33:27 +01:00
Paweł Dziepak	1406ac5088	tests/memtable: add test for reader exception safety	2018-01-30 18:33:26 +01:00
Paweł Dziepak	ea7248056f	memtable: drop memtable_entry::read()	2018-01-30 18:33:26 +01:00
Paweł Dziepak	0420ca48a5	paratition_snapshot_reader: minimise amount of retryable code. Retryable code that has side effects is a recipe for bugs. This patch reworks the snapshot reader so that the amount of logic run with reclamation disabled is minimal and has very limited side effects.	2018-01-30 18:33:26 +01:00
Paweł Dziepak	b1cb7d214e	mutation_fragment: allow creating clustering row in place. Moving clustering_row is expensive due to the amount of data stored internally. Adding a mutation_fragment constructor that builds a clustering_row in place saves some of that moving.	2018-01-30 18:33:26 +01:00
Paweł Dziepak	dcd79af8ed	lsa: optimise disabling reclamation and invalidation counter. Most of the LSA gory details are hidden in utils/logalloc.cc. That includes the actual implementation of an LSA region: region_impl. However, there is code in the hot path that often accesses the _reclaiming_enabled member as well as its base class allocation_strategy. In order to optimise those accesses, another class is introduced: basic_region_impl, which inherits from allocation_strategy and is a base of region_impl. It is defined in utils/logalloc.hh so that it is publicly visible and its member functions can be inlined from anywhere in the code. This class is supposed to be as small as possible, but contain all members and functions that are accessed from the fast path and should be inlined.	2018-01-30 18:33:26 +01:00
Paweł Dziepak	d825ae37bf	lsa: split alloc section into reserving and reclamation-disabled parts. An allocating section reserves a certain amount of memory, then disables reclamation and attempts to perform the given operation. If that fails due to std::bad_alloc, the reserve is increased and the operation is retried. Reserving memory is expensive, while just disabling reclamation isn't. Moreover, the code that runs inside the section needs to be safely retryable. This means that we want the amount of logic running with reclamation disabled to be as small as possible, even if it means entering and leaving the section multiple times. In order to reduce the performance penalty of such a solution, the memory-reserving and reclamation-disabling parts of the allocating sections are separated.	2018-01-30 18:33:26 +01:00
Paweł Dziepak	eb2e88e925	linearization_context: remove non-trivial operations from fast path. Since linearization_context is thread_local, every time it is accessed the compiler needs to emit code that checks whether it was already constructed and constructs it if it wasn't. Moreover, upon leaving the context from the outermost scope, the map needs to be cleared. All these operations impose some performance overhead and aren't really necessary if no buffers were linearised (the expected case). This patch rearranges the code so that linearization_context is trivially constructible and the map is cleared only if it was modified.	2018-01-30 18:33:25 +01:00
Paweł Dziepak	a1278b4d6a	flat_mutation_reader: add allocation point in push_mutation_fragment. Exception safety tests inject a failure at every allocation and verify whether the error is handled properly. push_mutation_fragment() adds a mutation fragment to a circular_buffer; in theory any call to that function can result in a memory allocation, but in practice that depends on implementation details. In order to improve the effectiveness of the exception safety tests, this patch adds an explicit allocation point in push_mutation_fragment().	2018-01-30 18:33:25 +01:00
Paweł Dziepak	486e0d8740	tests/perf: add microbenchmarks for memtable reader	2018-01-30 18:33:25 +01:00
Avi Kivity	00d70080af	Merge "Consume promoted index incrementally" from Vladimir "This patchset makes index_reader consume promoted index incrementally on demand as the reader advances through the current partition instead of storing the entire promoted index which can be huge. When the current page is parsed, data for promoted indices are turned into input streams that are only read and parsed if a particular position within a partition is seeked for. This avoids potentially large allocations for big partitions." * 'issues/2981/v10' of https://github.com/argenet/scylla: Use advance_past for single partition upper bound. Remove obsolete types and methods. Simplify continuous_data_consumer::consume_input() interface. Parse promoted index entries lazily upon request rather than immediately. Add helper input streams: buffer_input_stream and prepended_input_stream. Support skipping over bytes from input stream in parsers based on continuous_data_consumer Add performance tests for large partition slicing using clustering keys.	2018-01-30 18:22:28 +02:00
Nadav Har'El	2ea1922a4d	Materialized views: serialize read-modify-update of base table. Before this patch, our Materialized Views implementation can produce incorrect results when given concurrent updates of the same base-table row. Such concurrent updates may result, in certain cases, in two different rows added to the view table, instead of just one with the latest data. In this patch we add locking which serializes the two conflicting updates and solves this problem. The locking for a single base-table column_family is implemented by the row_locker class introduced in a previous patch. A long comment in the code of this patch explains in more detail why this locking is needed, when, and what types of locks are needed: we sometimes need to lock a single clustering row, sometimes an entire partition, sometimes with an exclusive lock and sometimes with a shared lock. Fixes #3168 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-01-30 16:21:43 +02:00
Nadav Har'El	52e91623ce	Materialized views: test row_locker class. This is a unit test for the row_locker facility. It tests various combinations of shared and exclusive locks on rows and on partitions; some should succeed immediately and some should block. This tests the row_locker's API only; it does not use or test anything in Materialized Views. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-01-30 16:19:43 +02:00
Nadav Har'El	31d0a1dd0c	Materialized views: implement row and partition locking mechanism. This patch adds a "row_locker" class providing (shard-local) locking of individual clustering rows or entire partitions, with both exclusive and shared locks (a.k.a. reader/writer locks). As we'll see in a following patch, we need this locking capability for materialized views, to serialize the read-modify-update modifications which involve the same rows or partitions. The new row_locker is significantly different from the existing cell_locker. The two main differences are that 1. row_locker also supports locking the entire partition, not just individual rows (or cells in them), and 2. row_locker also supports shared (reader) locks, not just exclusive locks. For this reason we opted for a new implementation, instead of making large modifications to the existing cell_locker. We put the source files in the view/ directory, because row_locker's requirements are pretty specific to the needs of materialized views. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-01-30 16:16:27 +02:00
Takuya ASADA	bec2b015e3	dist/debian: link yaml-cpp statically. To avoid incompatibility with the distribution-provided libyaml-cpp, link it statically. Fixes #3164 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1517313320-10712-1-git-send-email-syuu@scylladb.com>	2018-01-30 14:22:02 +02:00

14460 Commits