untyped_result_set_row's cell data type is bytes_opt, and the
get_blob() accessor accesses the value assuming it's engaged
(relying on the caller to call has() first).
has_unsalted_hash() calls get_blob() without calling has() beforehand,
potentially triggering undefined behavior.
Fix by using get_or() instead, which also simplifies the caller.
I observed failures in Jenkins in this area. It's hard to be sure
this is the root cause, since the failures triggered an internal
consistency assertion in asan rather than an asan report. However,
the error is hard to reproduce and the fix makes sense even if it
doesn't prevent the error.
See #3480 for the asan error.
Fixes #3480 (hopefully).
Message-Id: <20180602181919.29204-1-avi@scylladb.com>
"
This is the first part of the first step of switching Scylla. It covers
converting cells to the new serialisation format. The actual structure
of the cells doesn't differ much from the original one with a notable
exception of the fact that large values are now fragmented and
linearisation needs to be explicit. Counters and collections still
partially rely on their old, custom serialisation code and their
handling is not optimal (although not significantly worse than it used
to be).
The new in-memory representation allows objects to be of varying size
and makes it possible to provide deserialisation context so that we
don't need to keep in each instance of an IMR type all the information
needed to interpret it. The structure of IMR types is described in C++
using some metaprogramming, with the hope of making it much easier to
modify the serialisation format than it would be in the case of
open-coded serialisation functions.
Moreover, IMR types can own memory thanks to a limited support for
destructors and movers (the latter are not exactly the same thing as C++
move constructors, hence the different name). This makes it (relatively)
easy to ensure that there is an upper bound on the size of all allocations.
For now the only things converted to the IMR are atomic_cells
and collections, which means that the reduction in the memory footprint
is not as big as it can be, but introducing the IMR is a big step on its
own and also paves the way towards complete elimination of unbounded
memory allocations.
The first part of this patchset contains miscellaneous preparatory
changes to various parts of the Scylla codebase. They are followed by
introduction of the IMR infrastructure. Then the structure of cells is
defined and all helper functions are implemented. Next are several
treewide patches that mostly deal with propagating type information to
the cell-related operations. Finally, atomic_cell and collections are
switched to use the new IMR-based cell implementation.
The IMR is described in much more detail in imr/IMR.md added in "imr:
add IMR documentation".
Refs #2031.
Refs #2409.
perf_simple_query -c4, medians of 30 results:
./perf_base ./perf_imr diff
read 308790.08 309775.35 0.3%
write 402127.32 417729.18 3.9%
The same with 1 byte values:
./perf_base1 ./perf_imr1 diff
read 314107.26 314648.96 0.2%
write 463801.40 433255.96 -6.6%
The memory footprint is reduced, but that is partially due to removal of
small buffer optimisation (whether it will be restored depends on the
exact measurements of the performance impact). Generally, this series was
not expected to make a huge difference as this would require converting
whole rows to the IMR.
Memory footprint:
Before:
mutation footprint:
- in cache: 1264
- in memtable: 986
After:
mutation footprint:
- in cache: 1104
- in memtable: 866
Tests: unit (release, debug)
"
* tag 'imr-cells/v3' of https://github.com/pdziepak/scylla: (37 commits)
tests/mutation: add test for changing column type
  atomic_cell: switch to new IMR-based cell representation
atomic_cell: explicitly state when atomic_cell is a collection member
treewide: require type for creating collection_mutation_view
treewide: require type for comparing cells
atomic_cell: introduce fragmented buffer value interface
treewide: require type to compute cell memory usage
treewide: require type to copy atomic_cell
treewide: require type info for copying atomic_cell_or_collection
treewide: require type for creating atomic_cell
atomic_cell: require column_definition for creating atomic_cell views
tests: test imr representation of cells
types: provide information for IMR
data: introduce cell
data: introduce type_info
imr/utils: add imr object holder
imr: introduce concepts
imr: add helper for allocating objects
imr: allow creating lsa migrators for IMR objects
imr: introduce placeholders
...
Scylla now exposes the Prometheus API by default. This patch changes
scyllatop to use the Prometheus API; the collectd API is still available.
The main changes in the patch:
* Move collectd specific logic inside collectd.
* Add support for help information.
* Add command line to configure prometheus end point and to enable
collectd.
* Add a prometheus class that collects information from prometheus.
Fixes: #1541
Message-Id: <20180531124156.26336-1-amnon@scylladb.com>
Only libjsoncpp >= 1.6.0 offers a safe name() method for value
iterators. For older versions, deprecated memberName() is used
instead. Note that memberName() was deprecated because of its
inability to deal with embedded null characters.
Fixes #3471
Message-Id: <e64a62bfc24ef06daee238d79d557fe6ec8979d3.1527758708.git.sarna@scylladb.com>
With the introduction of the new in-memory representation changing
column type has become a more complex operation since it needs to handle
switch from fixed-size to variable-size types. This commit adds an
explicit test for such cases.
This patch changes the implementation of atomic_cell and
atomic_cell_or_collection to use the data::cell implementation which is
based on the new in-memory representation infrastructure.
Collections are not going to be fully converted to the IMR just yet and
still use the old serialisation format. This means that they still don't
support fragmented values very well. This patch passes along the
information that an atomic_cell is being created as a member of a
collection so that later we can avoid fragmenting the value in such cases.
As preparation for the switch to the new cell representation this
patch changes the type returned by atomic_cell_view::value() to one that
requires explicit linearisation of the cell value. Even though the value
is still implicitly linearised (and only when managed by the LSA) the
new interface is the same as the target one so that no more changes to
its users will be needed.
This commit introduces cell serializers and views based on the in-memory
representation infrastructure. The code doesn't assume anything about
how the cells are stored; they can be either a part of another IMR
object (once the rows are converted to the IMR) or separate objects
(just like the current atomic_cell).
A view schema's view_info contains the id of the base regular column
that the view includes in its primary key. Since the column id of a
particular column can potentially change with a new schema version, we
need to refresh the stored column id. We weren't doing that when
unselected base columns are added, and this patch fixes it by
triggering an update of the view schema when base columns are added
and the view contains a base regular column in its PK.
Fixes#3443
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180530194536.51202-1-duarte@scylladb.com>
IMR objects may own memory. object_allocator takes care of allocating
memory for all owned objects during the serialisation of their owner.
In practice a writer of the parent object would accept a helper object
created by object_allocator. That helper object would be responsible
either for computing the size of the buffers that have to be allocated
or for performing the actual serialisation, in the same two-phase manner
as is done for the parent IMR object.
In some cases the actual value of an IMR object is not known at
serialisation time. If the type is fixed-size we can use a placeholder
to defer writing it to a more convenient moment.
This patch introduces destructors and movers for IMR objects, which
enables them to own memory. Custom destructors and movers can be
defined by specialising the appropriate classes.
This patch adds a new way of serialising bytes and sstring objects in the
IDL. Using write_fragmented_<field-name>() the caller can pass a range
of fragments that would be serialised without linearising the buffer.
tests/view_complex_test.cc contained an #ifdef'ed-out test claiming to
be a reproducer for issue #3362. Unfortunately, it is not - after
earlier commits the only reason this test still fails is a mistake in
the test, which expects 0 rows in a case where the real result is 1 row.
Issue #3362 does *not* have to be fixed to fix this test.
So this patch fixes the broken test, and enables it. It also adds comments
explaining what this test is supposed to do, and why it works the way it
does.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180530142214.29398-1-nyh@scylladb.com>
"
Add handling for static rows and tests for it.
"
* 'haaawk/sstables3/read-static-v1' of ssh://github.com/scylladb/seastar-dev:
sstable_3_x_test: Add test_uncompressed_compound_static_row_read
sstable_3_x_test: add test_uncompressed_static_row_read
flat_mutation_reader_assertions: improve static row assertions
data_consume_rows_context_m: Implement support for static rows
mp_row_consumer_m: Implement support for static rows
mp_row_consumer_m: Extract fill_cells
"
We currently suffer from reactor stalls caused by non-preemptible processing
of large partitions in the following places:
(1) dropping partition entries from cache or memtables does not defer
(2) dropping partition versions abandoned by detached snapshots does not defer
(3) merging of partition versions when snapshots go away does not defer
(4) cache update from memtable processes partition entries without deferring (#2578)
(5) partition entries are upgraded to new schema atomically
This series fixes problems (1), (2) and (4), but not (3) and (5).
(1) and (2) are fixed by introducing mutation_cleaner objects, which are
containers for garbage partition versions that delay the actual freeing.
Freeing happens from memory reclaimers and is incremental.
(3) and (5) are not solved yet.
(4) is solved by having partition merging process partitions with row
granularity and defer in the middle of a partition. In order to preserve
update atomicity at the partition level as perceived by reads, when an
update starts we create a snapshot of the current version of the
partition and process the memtable entry by inserting data into a
separate partition version. This way, if the upgrade defers in the
middle of a partition, reads can still go to the old version and not see
partial writes. Snapshots are marked with phase numbers, and reads will
use the previous phase until the whole partition is upgraded. When the
partition is finally merged, the snapshots go away and the new version
will eventually be merged into the old version. Due to (3), however,
this merging may still add latency to the upgrade path.
Remaining work:
- Solving problem (3). I think the approach to take here would be to
move the task of merging versions to the background, maybe into mutation_cleaner.
- Merging range tombstones incrementally.
Performance
===========
Performance improvements were evaluated using tests/perf_row_cache_update -c1 -m1G,
which measures the time it takes to update cache from memtable for various workloads
and schemas.
For large partition with lots of small rows we see a significant reduction of
scheduling latency from ~550ms to ~23ms. The cause of the remaining latency is
problem (3) stated above. The run time is reduced by 70%.
For small partition case without clustering columns we see no degradation.
For small partition case with clustering key, but only 3 small rows per partition,
we see a 30% degradation in run time.
For large partition with lots of range tombstones we see degradation of 15% in
run time and scheduling latency.
Below you can see full statistics for cache update run time:
=== Small partitions, no overwrites:
Before:
avg = 433.965155
stdev = 35.958024
min = 340.093201
max = 468.564514
After:
avg = 436.929447 (+1%)
stdev = 37.130237
min = 349.410339
max = 489.953400
=== Small partition with a few rows:
Before:
avg = 315.379316
stdev = 30.059120
min = 240.340561
max = 342.408295
After:
avg = 407.232691 (+30%)
stdev = 53.918717
min = 269.514648
max = 444.846649
=== Large partition, lots of small rows:
Before:
avg = 412.870689
stdev = 227.411317
min = 286.990631
max = 1263.417847
After:
avg = 124.351705 (-70%)
stdev = 4.705762
min = 110.063255
max = 129.643387
=== Large partition, lots of range tombstones:
Before:
avg = 601.172644
stdev = 121.376866
min = 223.502136
max = 874.111572
After:
avg = 695.627588 (+15%)
stdev = 135.057004
min = 337.173950
max = 784.838745
"
* tag 'tgrabiec/clear-gently-all-partitions-v3' of github.com:tgrabiec/scylla:
mvcc: Use small_vector<> in partition_snapshot_row_cursor
utils: Extract small_vector.hh
mvcc: Erase rows gradually in apply_to_incomplete()
mvcc: partition_snapshot_row_cursor: Avoid row copying in consume() when possible
cache: real_dirty_memory_accounter: Move unpinning out of the hot path
mvcc: partition_snapshot_row_cursor: Reduce lookups in ensure_entry_if_complete()
mutation_partition: Reduce row lookups in apply_monotonically()
cache: Release dirty memory with row granularity
cache: Defer during partition merging
mvcc: partition_snapshot_row_cursor: Introduce consume_row()
mvcc: partition_snapshot_row_cursor: Introduce maybe_refresh_static()
mvcc: Make apply_to_incomplete() work with attached versions
cache: Propagate phase to apply_to_incomplete()
cache: Prepare for incremental apply_to_incomplete()
Introduce a coroutine wrapper
tests: mvcc: Encapsulate memory management details
tests: cache: Take into account that update() may defer
cache: real_dirty_memory_accounter: Allow construction without memtable
cache: Extract real_dirty_memory_accounter
mvcc: Destroy memtable partition versions gently
memtable: Destroy partitions incrementally from clear_gently()
mvcc: Remove rows from tracker gently
cache: Destroy partition versions incrementally
Introduce mutation_cleaner
mvcc: Introduce partition_version_list
mvcc: Fix move constructor of partition_version_ref() not preserving _unique_owner
database: Add API for incremental clearing of partition entries
cache: Define trivial methods inline
tests: Improve perf_row_cache_update
mutation_reader: Make empty mutation source advertize no partitions