scylladb

Author	SHA1	Message	Date
Botond Dénes	5e97fb9fc4	row_cache: update reader implementations to v2 cache_flat_mutation_reader gets a native v2 implementation. The underlying mutation representation is not changed: range deletions are still stored as v1 range_tombstones in mutation_partition. These are converted to range tombstone changes during reading. This allows for separating the change of a native v2 reader implementation and a native v2 in-memory storage format, enabling the two to be done at separate times and incrementally.	2022-04-21 14:57:04 +03:00
Botond Dénes	b029bd3db7	tree: remove mutation_reader.hh include In most files it was unused. We should move these to the patch which moved out the last interesting reader from mutation_reader.hh (and added the corresponding new header include) but it's probably not worth the effort. Some other files still relied on mutation_reader.hh to provide reader concurrency semaphore and some other misc reader related definitions.	2022-03-30 15:42:51 +03:00
Mikołaj Sielużycki	1d84a254c0	flat_mutation_reader: Split readers by file and remove unnecessary includes. The flat_mutation_reader files were conflated and contained multiple readers, which were not strictly necessary. Splitting optimizes both iterative compilation times, as touching rarely used readers doesn't recompile large chunks of codebase. Total compilation times are also improved, as the size of flat_mutation_reader.hh and flat_mutation_reader_v2.hh have been reduced and those files are included by many files in the codebase. With changes real 29m14.051s user 168m39.071s sys 5m13.443s Without changes real 30m36.203s user 175m43.354s sys 5m26.376s Closes #10194	2022-03-14 13:20:25 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Tomasz Grabiec	5196d450bd	row_cache: Improve trace-level logging Print MVCC snapshot to help distinguish reads which use different snapshots. Also, print the whole cursor, not just its position. This helps in determining which MVCC version the iterator comes from.	2021-12-19 22:41:35 +01:00
Tomasz Grabiec	63351483f0	row_cache: Support reverse reads natively Some implementation notes below. When iterating in reverse, _last_row is after the current entry (_next_row) in table schema order, not before like in the forward mode. Since there is no dummy row before all entries, reverse iteration must be now prepared for the fact that advancing _next_row may land not pointing at any row. The partition_snapshot_row_cursor maintains continuity() correctly in this case, and positions the cursor before all rows, so most of the code works unchanged. The only exception is in move_to_next_entry(), which now cannot assume that failure to advance to an entry means it can end a read. maybe_drop_last_entry() is not implemented in reverse mode, which may expose reverse-only workload to the problem of accumulating dummy entries. ensure_population_lower_bound() was not updating _last_row after inserting the entry in latest version. This was not a problem for forward reads because they do not modify the row in the partition snapshot represented by _last_row. They only need the row to be there in the latest version after the call. It's different for reversed reads, which change the continuity of the entry represented by _last_row, hence _last_row needs to have the iterator updated to point to the entry from the latest version, otherwise we'd set the continuity of the previous version entry which would corrupt the continuity.	2021-12-19 22:41:35 +01:00
Tomasz Grabiec	b3618163f8	row_cache: Log produced range tombstones	2021-12-19 22:41:35 +01:00
Pavel Emelyanov	ee103636ac	row-cache: Handle exception (un)safety of rows_entry insertion The B-tree's insert_before() is a throwing operation, its caller must account for that. When the rows_entry's collection was switched on B-tree all the risky places were fixed by `ee9e1045`, but a few places went under the radar. In the cache_flat_mutation_reader there's a place where a C-pointer is inserted into the tree, thus potentially leaking the entry. In the partition_snapshot_row_cursor there are two places that not only leak the entry, but also leave it in the LRU list. The latter is quite nasty, because those entries can be evicted, eviction code tries to get rows_entry iterator from "this", but the hook happens to be unattached (because insertion threw) and fails the assert. fixes: #9728 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-12-10 12:35:12 +03:00
Avi Kivity	daf028210b	build: enable -Winconsistent-missing-override warning This warning can catch a virtual function that thinks it overrides another, but doesn't, because the two functions have different signatures. This isn't very likely since most of our virtual functions override pure virtuals, but it's still worth having. Enable the warning and fix numerous violations. Closes #9347	2021-09-15 12:55:54 +03:00
Benny Halevy	4476800493	flat_mutation_reader: get rid of timeout parameter Now that the timeout is taken from the reader_permit. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Michael Livshin	f364666d4a	row_cache: count read row tombstones Refs #7749. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2021-08-01 19:41:11 +03:00
Pavel Emelyanov	1bf643d4fd	mutation_partition: Pin mutable access to range tombstones Some callers of mutation_partition::row_tombstones() don't want (and shouldn't) modify the list itself, while they may want to modify the tombstones. This patch explicitly locates those that need to modify the collection, because the next patch will return immutable collection for the others. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-27 20:06:53 +03:00
Pavel Emelyanov	ad27bf40e6	mutation_partition: Pin mutable access to rows Some callers of mutation_partition::clustered_rows() don't want (and shouldn't) modify the tree of rows, while they may want to modify the rows themselves. This patch explicitly locates those that need to modify the collection, because the next patch will return immutable collection for the others. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-27 20:06:53 +03:00
Tomasz Grabiec	2d18360157	row_cache: Consume range tombstones incrementally Before the patch, all range tombstones up to the next row were copied into a vector, and then put into the buffer until it's full. This would get quadratic if there are many more range tombstones than fit in a buffer. The fix is to avoid the accumulation of all tombstones in the vector and invoke the callback instead, which stops the iteration as soon as the buffer is full. Fixes #2581.	2021-07-26 17:48:05 +02:00
Tomasz Grabiec	cf958b0ad0	row_cache: Emit range tombstone adjacent to upper bound of population range Cache populating reader was emitting the row entry which stands for the upper bound of the population range, but did not emit range tombstones for the clustering range corresponding to: [ before(key), after(key) ). This surfaces after sstable readers are changed to trim emitted range tombstones to the fast-forwarding range. Before, it didn't cause problems, because that range tombstone part would be emitted as part of the sstable read. The fix is to drop the optimization which pushes the row after population is done, and let the regular handling for copy_from_cache_to_buffer() take care of emitting the row and tombstones for the remaining range. A unit test is added which covers population from all sstable versions.	2021-06-16 00:23:49 +02:00
Michael Livshin	9ef2317248	row_cache: count range tombstones processed during read Refs #7749. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com> Message-Id: <20210602152210.17948-1-michael.livshin@scylladb.com>	2021-06-14 14:29:05 +02:00
Pavel Solodovnikov	76bea23174	treewide: reduce header interdependencies Use forward declarations wherever possible. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Closes #8813	2021-06-07 15:58:35 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Tomasz Grabiec	6863a5e43b	row_cache: Avoid generating overlapping range tombstones Row cache reader can produce overlapping range tombstones in the mutation fragment stream even if there is only a single range tombstone in sstables, due to #2581. For every range between two rows, the row cache reader queries for tombstones relevant for that range. The result of the query is trimmed to the current position of the reader (=position of the previous row) to satisfy key monotonicity. The end position of range tombstones is left unchanged. So cache reader will split a single range tombstone around rows. Those range tombstones are transient, they will be only materialized in the reader's stream, they are not persisted anywhere. That is not a problem in itself, but it interacts badly with mutation compactor due to #8625. The range_tombstone_accumulator which is used to compact the mutation fragment stream needs to accumulate all tombstones which are relevant for the current clustering position in the stream. Adding a new range tombstone is O(N) in the number of currently active tombstones. This means that producing N rows will be O(N^2). In a unit test, I saw reading 137'248 rows which overlap with a range tombstone take 245 seconds. Almost all of CPU time is in drop_unneeded_tombstones(). The solution is to make the cache reader trim range tombstone end to the currently emitted sub-range, so that it emits non-overlapping range tombstones. Fixes #8626.	2021-05-12 00:10:24 +02:00
Tomasz Grabiec	6c168ee0eb	row_cache: Always touch the partition on entry This fixes a potential cause for reactor stalls during memory reclamation. Applies only to schemas without clustering columns. Every partition in cache has a dummy row at the end of the clustering range (last dummy). That dummy must be evicted last, because MVCC logic needs it to be there at all times. If LRU picks it for eviction and it's not the last row, eviction does nothing and moves on. Eventually, all other rows in this partition will be evicted too and then the partition will go away with it. Mutation reader updates the position of rows in the LRU (aka touching) as it walks over them. However, it was not always touching the last dummy row. If the partition was fully cached, and schema had no clustering key, it would exit early before reaching the last dummy row, here: inline void cache_flat_mutation_reader::move_to_next_entry() { clogger.trace("csm {}: move_to_next_entry(), curr={}", fmt::ptr(this), _next_row.position()); if (no_clustering_row_between(*_schema, _next_row.position(), _upper_bound)) { move_to_next_range(); That's because no_clustering_row_between() is always true for any key in tables with no clustering columns, and the reader advances to end-of-stream without advancing _next_row to the last dummy. This is expected and desired, it means that the query range ends at the current row and there is no need to move further. We would not take this exit for tables with a non-singular clustering key domain and open-ended query range, since there would be a key gap before the last dummy row. Refs #2972. The effect of leaving the last dummy row not touched will be that such scans will segregate rows in the LRU, bring all regular rows to the front, and dummy rows at the tail. When eviction reaches the band of dummy rows, it will have to walk over it, because evicting them releases no memory. This can cause a reactor stall. 
An easy fix for the scenario would be to always touch the dummy entry when entering the partition. It's unlikely that the read will not proceed to the regular rows. It would be best to avoid linking such dummies in the LRU, but that's a much more complex change. Discovered in perf_row_cache_update, test_small_partitions(). I saw 200ms stalls with -m8G. Refs #8541. Tests: - row_cache_test (release) - perf_simple_query [no change] Message-Id: <20210427111619.296609-1-tgrabiec@scylladb.com>	2021-04-28 21:59:28 +03:00
Benny Halevy	13dfc41d8c	row_cache: cache_flat_mutation_reader: close underlying readers Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	0a2670c9ec	row_cache: hold read_context as unique_ptr Such that the holder, that is responsible for closing the read_context before destroying it, holds it uniquely. cache_flat_mutation_reader may be constructed either with a read_context&, where it knows that the read_context is owned externally, by the caller, or it could be constructed with a std::unique_ptr<read_context> in which case it assumes ownership of the read_context and it is now responsible for closing it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Pavel Emelyanov	2a7171110d	cache_flat_mutation_reader: Generalize range tombstones emission The range tombstone can be added-to-buffer from two places: when it was found in cache and when it was read from the underlying reader. Both adders can now be generalized. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-16 17:55:46 +03:00
Pavel Emelyanov	2e98cfbf1d	cache_flat_mutation_reader: Tune forward progress check When adding a range tombstone to the buffer the need to stop stuffing the already full one is only done if this particular range tombstone changes the lower_bound. This check can be tuned -- if the lower bound changed _at_ _all_ after a range tombstone was added, we may still abort the loop. This change will allow to generalize range tombstone emission by the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-16 17:55:46 +03:00
Pavel Emelyanov	a35de6ea3e	cache_flat_mutation_reader: Use rows insertion sugar When inserting a rows_entry via unique_ptr the ptr in question can be pushed as is, the intrusive btree code releases the pointer (to be exception safe) itself. This makes the code a bit shorter and simpler. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-16 17:55:46 +03:00
Pavel Emelyanov	df488dd8ac	cache_flat_mutation_reader: Move state field There are two alignment gaps in the middle of the c_f_m_r -- one after the state and another one after the set of bools. Keeping them together allows the compiler to pack the c_f_m_r better. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-16 17:55:46 +03:00
Pavel Emelyanov	bc3f910fc1	cache_flat_mutation_reader: Remove raiish comparator The instance of position_in_partition::tri_compare sits on the reader itself and just occupies memory. It can be created on demand all the more so it's only one place that needs it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-16 17:55:46 +03:00
Pavel Emelyanov	41352334ba	cache_flat_mutation_reader: Remove unused captured variable The captured timeout is not used in lambda. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-16 17:55:41 +03:00
Pavel Emelyanov	eb65f8ed6b	cache_flat_mutation_reader: Fix trace message text The entry inserted in this branch is not dummy, but an empty row. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-16 17:55:22 +03:00
Piotr Jastrzebski	1f644df09d	cache_flat_mutation_reader: fix do_fill_buffer Make sure that when a partition does not exist in underlying, do_fill_buffer does not try to fast forward within this nonexistent partition. Test: unit(dev) Fixes #8435 Fixes #8411 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2021-04-12 21:08:40 +02:00
Pavel Emelyanov	4558eb3afc	partition_snapshot_row_cursor: Move cells hash creation to reader Right now call to .row() method may create hash on row's cells. It's counterintuitive to see a const method that transparently changes something it points to. Since the only caller of a row() who knows whether the hash creation is required is the cache reader, it's better to move the call to prepare_hash() into it. Other than making the .row() less surprising this also helps to get rid of the whole method by the next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-09 12:18:29 +03:00
Tomasz Grabiec	cb0b8d1903	row_cache: Zap dummy entries when populating or reading a range This will prevent accumulation of unnecessary dummy entries. A single-partition populating scan with clustering key restrictions will insert dummy entries positioned at the boundaries of the clustering query range to mark the newly populated range as continuous. Those dummy entries may accumulate with time, increasing the cost of the scan, which needs to walk over them. In some workloads we could prevent this. If a populating query overlaps with dummy entries, we could erase the old dummy entry since it will not be needed, it will fall inside a broader continuous range. This will be the case for time series workloads which scan with a decreasing (newest) lower bound. Refs #8153. _last_row is now updated atomically with _next_row. Before, _last_row was moved first. If exception was thrown and the section was retried, this could cause the wrong entry to be removed (new next instead of old last) by the new algorithm. I don't think this was causing problems before this patch. The problem is not solved for all the cases. After this patch, we remove dummies only when there is a single MVCC version. We could patch apply_monotonically() to also do it, so that dummies which are inside continuous ranges are eventually removed, but this is left for later. perf_row_cache_reads output after that patch shows that the second scan touches no dummies: $ build/release/test/perf/perf_row_cache_reads_g -c1 -m200M Rows in cache: 0 Populating with dummy rows Rows in cache: 265320 Scanning read: 142.621613 [ms], preemption: {count: 639, 99%: 0.545791 [ms], max: 0.526929 [ms]}, cache: 0/0 [MB] read: 0.023197 [ms], preemption: {count: 1, 99%: 0.035425 [ms], max: 0.032736 [ms]}, cache: 0/0 [MB] Message-Id: <20210226172801.800264-1-tgrabiec@scylladb.com>	2021-03-01 20:34:35 +02:00
Avi Kivity	d980f550d1	Merge 'row_cache: Make fill_buffer() preemptable when cursor leads with dummy rows' from Tomasz Grabiec fill_buffer() will keep scanning until _lower_bound_changed is true, even if preemption is signaled, so that the reader makes forward progress. Before the patch, we did not update _lower_bound on touching a dummy entry. The read will not respect preemption until we hit a non-dummy row. If there is a lot of dummy rows, that can cause reactor stalls. Fix that by updating _lower_bound on dummy entries as well. Refs #8153. Tested with perf_row_cache_reads: ``` $ build/release/test/perf/perf_row_cache_reads -c1 -m200M Rows in cache: 0 Populating with dummy rows Rows in cache: 373929 Scanning read: 183.658966 [ms], preemption: {count: 848, 99%: 0.545791 [ms], max: 0.519343 [ms]}, cache: 99/100 [MB] read: 120.951515 [ms], preemption: {count: 257, 99%: 0.545791 [ms], max: 0.518795 [ms]}, cache: 99/100 [MB] ``` Notice that max preemption latency is low in the second "read:" line. Closes #8167 * github.com:scylladb/scylla: row_cache: Make fill_buffer() preemptable when cursor leads with dummy rows tests: perf: Introduce perf_row_cache_reads row_cache: Add metric for dummy row hits	2021-02-28 21:00:20 +02:00
Tomasz Grabiec	b9c3b6c10f	row_cache: Make fill_buffer() preemptable when cursor leads with dummy rows fill_buffer() will keep scanning until _lower_bound_changed is true, even if preemption is signalled, so that the reader makes forward progress. Before the patch, we did not update _lower_bound on touching a dummy entry. The read will not respect preemption until we hit a non-dummy row. If there is a lot of dummy rows, that can cause reactor stalls. Fix that by updating _lower_bound on dummy entries as well. Refs #8153. Tested with perf_row_cache_reads: $ build/release/test/perf/perf_row_cache_reads -c1 -m200M Rows in cache: 0 Populating with dummy rows Rows in cache: 373929 Scanning read: 183.658966 [ms], preemption: {count: 848, 99%: 0.545791 [ms], max: 0.519343 [ms]}, cache: 99/100 [MB] read: 120.951515 [ms], preemption: {count: 257, 99%: 0.545791 [ms], max: 0.518795 [ms]}, cache: 99/100 [MB] Notice that max preemption latency is low in the second "read:" line.	2021-02-26 01:20:38 +01:00
Tomasz Grabiec	f0a3272a5f	row_cache: Add metric for dummy row hits This will help to diagnose performance problems related to the read having to walk through a lot of dummy rows to fill the buffer. Refs #8153	2021-02-25 18:26:01 +01:00
Benny Halevy	35256d1b92	treewide: explicitly use flat_mutation_reader_opt Unlike flat_mutation_reader_opt that is defined using optimized_optional<flat_mutation_reader>, std::optional<T> does not evaluate to `false` after being moved, only after it is explicitly reset. Use flat_mutation_reader_opt rather than std::optional<flat_mutation_reader> to make it easier to check if it was closed before it's destroyed or being assigned-over. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210215101254.480228-6-bhalevy@scylladb.com>	2021-02-17 17:57:34 +02:00
Pavel Emelyanov	5c0f9a8180	mutation_partition: Switch cache of rows onto B-tree The switch is pretty straightforward, and consists of - change less-compare into tri-compare - rename insert/insert_check into insert_before_hint - use tree::key_grabber in mutation_partition::apply_monotonically to exception-safely transfer a row from one tree to another - explicitly erase the row from tree in rows_entry::on_evicted, there's a O(1) tree::iterator method for this - rewrite rows_entry -> cache_entry transformation in the on_evicted to fit the B-tree API - include the B-tree's external memory usage into stats That's it. The number of keys per node is set to 12 with linear search and linear extension of 20 because - experimenting with tree shows that numbers 8 through 10 keys with linear search show the best performance on stress tests for insert/find-s of keys that are memcmp-able arrays of bytes (which is an approximation of current clustering key compare). More keys work slower, but still better than any bigger value with any type of search up to 64 keys per node - having 12 keys per nodes is the threshold at which the memory footprint for B-tree becomes smaller than for boost::intrusive::set for partitions with 32+ keys - 20 keys for linear root eats the first-split peak and still performs well in linear search As a result the footprint for B tree is bigger than the one for BST only for trees filled with 21...32 keys by 0.1...0.7 bytes per key. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-02 09:30:30 +03:00
Benny Halevy	29002e3b48	flat_mutation_reader: return future from next_partition To allow it to asynchronously close underlying readers on next_partition(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-01-13 17:35:07 +02:00
Pavel Emelyanov	72c2482f73	mutation-partition: Construct rows_entry directly from clustering_row When a rows_entry is added to row_cache it's constructed from clustering_row by unpacking all its internals and putting them into the rows_entry's deletable_row. There's a shorter way -- the clustering_row already has the deletable_row onboard from which rows_entry can copy-construct its own. This lets keeping the rows_entry and deletable_row set of constructors a bit shorter. tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20201224161112.20394-1-xemul@scylladb.com>	2020-12-24 18:13:44 +02:00
Pavel Emelyanov	3da3d448c8	range_tombstone: Remove unused schema arg from .set_start Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-06 15:13:05 +03:00
Botond Dénes	6ca0464af5	mutation_fragment: add schema and permit We want to start tracking the memory consumption of mutation fragments. For this we need schema and permit during construction, and on each modification, so the memory consumption can be recalculated and pass to the permit. In this patch we just add the new parameters and go through the insane churn of updating all call sites. They will be used in the next patch.	2020-09-28 11:27:23 +03:00
Botond Dénes	3fab83b3a1	flat_mutation_reader: impl: add reader_permit parameter Not used yet, this patch does all the churn of propagating a permit to each impl. In the next patch we will use it to track the memory consumption of `_buffer`.	2020-09-28 10:53:48 +03:00
Pavel Emelyanov	812eed27fe	code: Force formatting of pointer in .debug and .trace ... and tests. Printing a pointer in logs is considered to be a bad practice, so the proposal is to keep this explicit (with fmt::ptr) and allow it for .debug and .trace cases. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-26 20:44:11 +03:00
Botond Dénes	5e9a7d2608	row_cache: remove unnecessary includes of partition_snapshot_reader.hh Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200820124447.2561477-1-bdenes@scylladb.com>	2020-08-20 15:19:42 +02:00
Botond Dénes	196dd5fa9b	treewide: throw std::bad_function_call with backtraces We typically use `std::bad_function_call` to throw from mandatory-to-implement virtual functions, that cannot have a meaningful implementation in the derived class. The problem with `std::bad_function_call` is that it carries absolutely no information w.r.t. where was it thrown from. I originally wanted to replace `std::bad_function_call` in our codebase with a custom exception type that would allow passing in the name of the function it is thrown from to be included in the exception message. However after I ended up also including a backtrace, Benny Halevy pointed out that I might as well just throw `std::bad_function_call` with a backtrace instead. So this is what this patch does. All users are various unimplemented methods of the `flat_mutation_reader::impl` interface. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200408075801.701416-1-bdenes@scylladb.com>	2020-04-08 13:54:06 +02:00
Tomasz Grabiec	c88a4e8f47	mvcc: Introduce partition_snapshot::touch()	2019-10-03 22:03:28 +02:00
Tomasz Grabiec	0675088818	row_cache: Use the correct schema version to populate the partition entry The sstable reader which populates the partition entry in the cache is using the schema of the partition entry snapshot, which will be the schema of the cache at the time the partition was entered. If there was a schema change after the cache reader entered the partition but before it created the sstable reader, the cache populating reader will interpret sstable fragments using the wrong schema version. That is more likely if partitions have many rows, and the front of the partition is populated. With single-row partitions that's unlikely to happen. That is undefined behavior in general, which may include: - read failures due to bad_alloc, if fixed-size cells are interpreted as variable-sized cells, and we misinterpret a value for a huge size - wrong read results - node crash This doesn't result in a permanent corruption, restarting the node should help. Fixes #5127.	2019-10-03 22:03:28 +02:00
Paweł Dziepak	637b9a7b3b	atomic_cell_or_collection: make operator<< show cell content After the new in-memory representation of cells was introduced there was a regression in atomic_cell_or_collection::operator<< which stopped printing the content of the cell. This makes debugging more inconvenient and time-consuming. This patch fixes the problem. Schema is propagated to the atomic_cell_or_collection printer and the full content of the cell is printed. Fixes #3571. Message-Id: <20181024095413.10736-1-pdziepak@scylladb.com>	2018-10-24 13:29:51 +03:00
Botond Dénes	eb357a385d	flat_mutation_reader: make timeout opt-out rather than opt-in Currently timeout is opt-in, that is, all methods that even have it default it to `db::no_timeout`. This means that ensuring timeout is used where it should be is completely up to the author and the reviewers of the code. As humans are notoriously prone to mistakes this has resulted in a very inconsistent usage of timeout, many clients of `flat_mutation_reader` passing the timeout only to some members and only on certain call sites. This is small wonder considering that some core operations like `operator()()` only recently received a timeout parameter and others like `peek()` didn't even have one until this patch. Both of these methods call `fill_buffer()` which potentially talks to the lower layers and is supposed to propagate the timeout. All this makes the `flat_mutation_reader`'s timeout effectively useless. To make order in this chaos make the timeout parameter a mandatory one on all `flat_mutation_reader` methods that need it. This ensures that humans now get a reminder from the compiler when they forget to pass the timeout. Clients can still opt-out from passing a timeout by passing `db::no_timeout` (the previous default value) but this will be now explicit and developers should think before typing it. There were surprisingly few core call sites to fix up. Where a timeout was available nearby I propagated it to be able to pass it to the reader, where I couldn't I passed `db::no_timeout`. Authors of the latter kind of code (view, streaming and repair are some of the notable examples) should maybe consider propagating down a timeout if needed. In the test code (the vast majority of the changes) I just used `db::no_timeout` everywhere. Tests: unit(release, debug) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1edc10802d5eb23de8af28c9f48b8d3be0f1a468.1536744563.git.bdenes@scylladb.com>	2018-09-20 11:31:24 +02:00
Tomasz Grabiec	477d7b439b	row_cache: Fix violation of continuity on concurrent eviction and population ensure_population_lower_bound() returned true if current clustering range covers all rows, which means that the populator has a right to set continuity flag to true on the row it inserts. This is correct only if the current population range actually starts since before all clustering rows. Otherwise we're populating since _last_row, and should consult it. The fix introduces a new flag, set when starting to populate, which indicates if we're populating from the beginning of the range or not. We cannot simply check if _last_row is set in ensure_population_lower_bound() because _last_row can be set and then become empty again. Fixes #3608	2018-07-17 16:43:21 +02:00
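
Note on commit 35256d1b92 above: the std::optional behaviour it describes can be reproduced in isolation. The sketch below is only an illustration and is not ScyllaDB code. toy_reader is a hypothetical stand-in for flat_mutation_reader, and std::unique_ptr is used purely as an analogy for a wrapper that reads as empty after being moved from, which is the property the commit message attributes to flat_mutation_reader_opt.

```cpp
#include <memory>
#include <optional>
#include <utility>

// Hypothetical stand-in for a reader; not ScyllaDB's flat_mutation_reader.
struct toy_reader {
    std::unique_ptr<int> impl = std::make_unique<int>(42);
};

// Returns true if `src` still reports a value after being moved from.
inline bool optional_engaged_after_move() {
    std::optional<toy_reader> src{toy_reader{}};
    std::optional<toy_reader> dst = std::move(src);
    // Moving from an engaged std::optional leaves it engaged: it now holds
    // a moved-from toy_reader, so has_value() is still true until reset().
    return src.has_value() && dst.has_value();
}

// Returns true if a unique_ptr-based holder reads as empty after the move.
// Assumption: this mirrors the "false after move" property only by analogy.
inline bool unique_ptr_empty_after_move() {
    auto src = std::make_unique<toy_reader>();
    auto dst = std::move(src);
    return !src && dst;
}
```

Both functions return true. This is why a check such as "was this reader closed or moved away before destruction" cannot rely on std::optional's boolean conversion alone.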

75 Commits