Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.
References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.
scylla-gdb.py is adjusted to look for both the new and old names.
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.
As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
Add flags indicating whether a memtable contains tombstones. They can
be used as a heuristic to determine whether a memtable should be
compacted on flush. This is an intermediate step until we can compact
while applying mutations to a memtable.
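As an illustration of the heuristic (all names here are hypothetical, not Scylla's actual API), the flags could be tracked and consulted like this:

```cpp
// Hypothetical sketch: a memtable remembers whether any applied mutation
// carried a tombstone, so the flush path can cheaply decide whether
// compacting on flush could actually drop any data.
struct mutation_summary {
    bool has_range_tombstones = false;
    bool has_row_tombstones = false;
};

class memtable_flags {
    bool _contains_range_tombstones = false;
    bool _contains_row_tombstones = false;
public:
    // Sticky flags: once a tombstone is seen, the memtable keeps the mark.
    void on_apply(const mutation_summary& m) {
        _contains_range_tombstones |= m.has_range_tombstones;
        _contains_row_tombstones |= m.has_row_tombstones;
    }
    // Heuristic: only pay for compaction-on-flush when it can drop data.
    bool should_compact_on_flush() const {
        return _contains_range_tombstones || _contains_row_tombstones;
    }
};
```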
This commit consists of changes that need to reside in a single
commit, so that the tests pass on each commit.
1. Remove do_make_flat_reader which disabled reverse reads by making the
slice a forward one. Remove call to get_ranges which would do
superfluous reversal of clustering ranges.
2. test: cql_query_test: remove expectation that the test_query_limit
fails for reversed queries, since reversed queries no longer require
linear memory wrt. the result size, when paginated.
Push down reversing to the mutation-sources proper, instead of doing it
on the querier level. This will allow us to test reverse reads on the
mutation source level.
The `max_size` parameter of `consume_page()` is now unused, but it is not
removed in this patch; it will be removed in a follow-up to reduce
churn.
Fixes #8733
If a memtable flush is still pending when we call table::clear(),
we can end up doing a "discard-all" call to commitlog, followed
by a per-segment-count discard (using rp_set) _later_. This will corrupt
our internal usage counts and quite probably cause assertion
failures.
Fixed by always doing an explicit per-memtable discard call. But to
ensure this works, since a memtable being flushed remains on the
memtable list for a while (why?), we must also ensure we clear
out the rp_set on discard.
v3:
* Fix table::clear to discard rp_sets before memtables
Closes #8894
Eliminate unused includes and replace some more includes
with forward declarations where appropriate.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Fixes #8733
If a memtable flush is still pending when we call table::clear(),
we can end up doing a "discard-all" call to commitlog, followed
by a per-segment-count discard (using rp_set) _later_. This will corrupt
our internal usage counts and quite probably cause assertion
failures.
Fixed by always doing an explicit per-memtable discard call. But to
ensure this works, since a memtable being flushed remains on the
memtable list for a while (why?), we must also ensure we clear
out the rp_set on discard.
Closes #8766
Tracking both the min and max timestamps will be required for memtable flush
to short-circuit the interposer consumer if needed.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
All of them can live with a forward declaration of the f._m._r. plus a
seastar header in the commitlog code.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The change is the same as with row-cache -- use a B+ tree with an int64_t
token as the key and an array of memtable_entry-s inside it.
The changes are:
Similar to those for row_cache:
- compare() goes away, new collection uses ring_position_comparator
- insertion and removal happens with the help of double_decker, most
of the places are about slightly changed semantics of it
- flags are added to memtable_entry, this makes its size larger than
it could be, but still smaller than it was before
Memtable-specific:
- when a new entry is inserted into the tree, iterators _might_ get
  invalidated by the double-decker inner array. It is easy to check
  when this happens, so the invalidation is avoided when possible
- the size_in_allocator_without_rows() is now not very precise. This
  is because after the patch memtable_entries are no longer allocated
  individually as they used to be. They can be squashed together with
  those having a token conflict, and asking the allocator for the occupied
  memory slot is not possible. As the closest (lower) estimate, the
  size of the enclosing B+ data node is used
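A rough sketch of the double-decker layout, with simplified standard-library types standing in for the intrusive B+ tree and the real entry types (illustrative only):

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Sketch: the outer collection is keyed by the int64_t token alone, and
// entries whose tokens collide live together in a small sorted inner array.
struct entry {
    std::string key;   // stands in for the partition key
    int value;
};

class double_decker {
    std::map<int64_t, std::vector<entry>> _tree;
public:
    void insert(int64_t token, entry e) {
        auto& bucket = _tree[token];
        auto it = std::lower_bound(bucket.begin(), bucket.end(), e.key,
            [](const entry& a, const std::string& k) { return a.key < k; });
        if (it != bucket.end() && it->key == e.key) {
            it->value = e.value;  // same token, same key: overwrite in place
        } else {
            // Growing the inner array may invalidate iterators into it,
            // which is the invalidation hazard mentioned above.
            bucket.insert(it, std::move(e));
        }
    }
    const entry* find(int64_t token, const std::string& key) const {
        auto it = _tree.find(token);
        if (it == _tree.end()) return nullptr;
        for (const auto& e : it->second) {
            if (e.key == key) return &e;
        }
        return nullptr;
    }
    size_t bucket_size(int64_t token) const {
        auto it = _tree.find(token);
        return it == _tree.end() ? 0 : it->second.size();
    }
};
```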
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The row cache (and memtable) code uses its own comparators, built on top
of the ring_position_comparator, for collections of partitions. These
collections will be switched from the key less-compare to the pair
of token less-compare + key tri-compare.
Prepare for the switch by generalizing the ring_position_comparator
and by patching all the non-collection usages of less-compare to use
it.
The memtable code doesn't use it outside of collections, but patch it
anyway as part of the preparations.
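The split into a token less-compare plus a key tri-compare can be sketched as follows (simplified illustrative types, not the actual ring_position or tri_compare implementation):

```cpp
#include <cstdint>
#include <string>

// Sketch: the outer ordering uses only the token with a plain
// less-compare; entries with equal tokens fall back to a tri-compare
// on the key.
struct ring_position {
    int64_t token;
    std::string key;
};

inline bool token_less(const ring_position& a, const ring_position& b) {
    return a.token < b.token;
}

// Tri-compare: negative / zero / positive result.
inline int key_tri_compare(const ring_position& a, const ring_position& b) {
    return a.key.compare(b.key);
}

inline int ring_position_tri_compare(const ring_position& a,
                                     const ring_position& b) {
    if (a.token != b.token) {
        return a.token < b.token ? -1 : 1;
    }
    return key_tri_compare(a, b);
}
```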
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All readers are soon going to require a valid permit, so make sure we
have a valid permit which we can pass to the delegate reader when
creating it. This means `memtable::make_flat_reader()` now also requires
a permit to be passed to it.
Internally the permit is stored in `scanning_reader`, which is used both
for flushes and normal reads. In the former case a permit is not
required.
The header sits in many other headers, but there's a handy
schema_fwd.hh that's tiny and contains needed declarations
for other headers. So replace schema.hh with schema_fwd.hh
in most of the headers (and remove completely from some).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200303102050.18462-1-xemul@scylladb.com>
Adds per-table metrics for counting partition and row reuse
in memtables. New metrics are as follows:
- memtable_partition_writes - number of write operations performed
on partitions in memtables,
- memtable_partition_hits - number of write operations performed
on partitions that previously existed in a memtable,
- memtable_row_writes - number of row write operations performed
in memtables,
- memtable_row_hits - number of row write operations that overwrote
rows previously present in a memtable.
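A toy model of how the four counters relate (hypothetical code, not the actual metrics implementation): every write bumps the `*_writes` counter, and the `*_hits` counter only when the target already existed:

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

struct memtable_metrics {
    uint64_t partition_writes = 0;
    uint64_t partition_hits = 0;
    uint64_t row_writes = 0;
    uint64_t row_hits = 0;
};

class toy_memtable {
    std::map<std::string, std::set<int>> _partitions; // pk -> clustering rows
    memtable_metrics _metrics;
public:
    void write_row(const std::string& pk, int row) {
        ++_metrics.partition_writes;
        auto [pit, p_inserted] = _partitions.try_emplace(pk);
        if (!p_inserted) {
            ++_metrics.partition_hits;   // partition already existed
        }
        ++_metrics.row_writes;
        auto [rit, r_inserted] = pit->second.insert(row);
        if (!r_inserted) {
            ++_metrics.row_hits;         // overwrote an existing row
        }
    }
    const memtable_metrics& metrics() const { return _metrics; }
};
```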
Tests: unit(release)
"
Cleanup various cases related to updating of metadata stats and encoding stats
in preparation for 64-bit gc_clock (#3353).
Fixes #4026
Fixes #4033
Fixes #4035
Fixes #4041
Refs #3353
"
* 'projects/encoding-stats-fixes/v6' of https://github.com/bhalevy/scylla:
sstables: remove duplicated code in data_consume_rows_context CELL_VALUE_BYTES
sstables: mc: use api::timestamp_type in write_liveness_info
sstables: mc: sstable_write encoding_stats are const
mp_row_consumer_k_l::consume_deleted_cell rename ttl param to local_deletion_time
memtable: don't use encoding_stats epochs as default
memtable: mc: update min_ttl encoding stats for dead row marker
memtable: mc: add comment regarding updating encoding stats of collection tombstones
sstables: metadata_collector: add update tombstone stats
sstables: assert that delete_time is not live when updating stats
sstables: move update_deletion_time_stats to metadata collector
sstables: metadata_collector: introduce update_local_deletion_time_and_tombstone_histogram
sstables: mc: write_liveness_info and write_collection should update tombstone_histogram
sstables: update_local_deletion_time for row marker deletion_time and expiration
Why default to an artificial minimum when you can do better
with zero effort? Track the actual minima in the memtable instead.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Update min ttl with expired_liveness_ttl (although its value of max int32
is not expected to affect the minimum).
Fixes #4041
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When the row flag has_complex_deletion is set, some collection columns may have
deletion tombstones and some may not. We don't strictly need to update stats
for the columns that have no tombstones, as they will not affect the
encoding_stats anyway.
Fixes #4035
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The worker is responsible for merging MVCC snapshots, which is similar
to merging sstables, but in memory. The new scheduling group will be
therefore called "memory compaction".
We should run it in a separate scheduling group instead of
main/memtables, so that it doesn't disrupt writes and other system
activities. It's also nice for monitoring how much CPU time we spend
on this.
If a memtable snapshot goes away after the memtable started merging into
the cache, it would enqueue the snapshots for cleaning on the memtable's
cleaner, which will have to clean without deferring when the memtable
is destroyed. That may stall the reactor. To avoid this, make merge()
cause the old instance of the cleaner to redirect to the new instance
(owned by cache), like we do for regions. This way the snapshots
mentioned earlier can be cleaned after memtable is destroyed,
gracefully.
Before this patch, maybe_merge_versions() had to be manually called
before a partition snapshot goes away. That is error-prone and makes
client code more complicated. Delegate that task to a new
partition_snapshot_ptr object, through which all snapshots are
published now.
As a preparation for the switch to the new cell representation this
patch changes the type returned by atomic_cell_view::value() to one that
requires explicit linearisation of the cell value. Even though the value
is still implicitly linearised (and only when managed by the LSA) the
new interface is the same as the target one so that no more changes to
its users will be needed.
Now all snapshots will have a mutation_cleaner which they will use to
gently destroy freed partition_version objects.
Destruction of memtable entries during cache update is also using the
gentle cleaner now. We need to have a separate cleaner for memtable
objects even though they're owned by cache's region, because memtable
versions must be cleared without a cache_tracker.
Each memtable will have its own cleaner, which will be merged with the
cache's cleaner when memtable is merged into cache.
Fixes some sources of reactor stalls on cache update when there are
large partition entries in memtables.
Partitions can get very large. Destroying them all at once can stall
the reactor for a significant amount of time. We want to avoid that by
doing destruction incrementally, deferring in between. A new API is
added for that at various levels:
stop_iteration clear_gently() noexcept;
It returns stop_iteration::yes when the object is fully cleared and
can be now destroyed quickly. So a deferring destruction can look like
this:
return repeat([this] { return clear_gently(); });
The reason why clear_gently() doesn't return a future<> itself is that some
contexts cannot defer, like memory reclamation.
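A minimal sketch of the idiom, assuming an illustrative container type (the real API spans many levels of the storage layer):

```cpp
#include <algorithm>
#include <deque>
#include <vector>

// Sketch: each clear_gently() call destroys a bounded amount of data and
// reports whether the object is now empty, so a caller can defer between
// calls instead of stalling the reactor on one huge destruction.
enum class stop_iteration { no, yes };

class partition_list {
    std::deque<std::vector<int>> _partitions; // stand-in for real partitions
    static constexpr size_t max_rows_per_call = 1024;
public:
    void add_partition(std::vector<int> rows) {
        _partitions.push_back(std::move(rows));
    }
    stop_iteration clear_gently() noexcept {
        size_t budget = max_rows_per_call;
        while (!_partitions.empty() && budget > 0) {
            auto& p = _partitions.front();
            size_t n = std::min(budget, p.size());
            p.erase(p.end() - n, p.end()); // drop up to `budget` rows
            budget -= n;
            if (p.empty()) {
                _partitions.pop_front();
            }
        }
        return _partitions.empty() ? stop_iteration::yes : stop_iteration::no;
    }
    bool empty() const { return _partitions.empty(); }
};
```

In a context that can defer, the caller would loop exactly as the commit message shows: repeat the call, yielding between iterations, until it returns `stop_iteration::yes`.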
We are now keeping track of the minimum timestamp in a memtable. Also
keep track of the max timestamp so we can know what it is before we
finish flushing the entire memtable to an SSTable. Will be used by
partially written SSTables undergoing TWCS.
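The bookkeeping itself is simple; a hedged sketch (simplified and detached from the real memtable stats):

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Sketch: every applied cell updates both bounds, so the memtable's
// timestamp range is known before the flush to an SSTable finishes.
class timestamp_tracker {
    int64_t _min = std::numeric_limits<int64_t>::max();
    int64_t _max = std::numeric_limits<int64_t>::min();
public:
    void update(int64_t ts) {
        _min = std::min(_min, ts);
        _max = std::max(_max, ts);
    }
    int64_t min_timestamp() const { return _min; }
    int64_t max_timestamp() const { return _max; }
};
```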
Signed-off-by: Glauber Costa <glauber@scylladb.com>
"Previously, partition tombstone was not written for partitions with no
rows causing corrupted data files.
This is now fixed and covered with tests.
In addition, we now track partition tombstones while collecting encoding
statistics."
* 'projects/sstables-30/fix-partition-tombstone/v3' of https://github.com/argenet/scylla:
tests: Don't use deprecated schema constructor.
tests: Add tests to cover partitions consisting only of partition keys.
sstables: Make sure partition level tombstone is written for partitions with no rows.
memtable: Collect statistics from partition-level tombstone.
The switch to the new in-memory representation will require larger
parts of the logic to be aware of the types of the values they are dealing
with. In most cases this is not a significant burden for the users.
We keep track of all updates and store the minimal values of timestamps,
TTLs and local deletion times across all the inserted data.
These values are written as a part of serialization_header for
Statistics.db and used for delta-encoding values when writing Data.db
file in SSTables 3.0 (mc) format.
For #1969.
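A sketch of why the minima matter for delta encoding (illustrative types; the real serialization_header and varint encoding are more involved):

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Sketch: the header stores the minima seen across all inserted data,
// and each value in Data.db is then written as an unsigned delta from
// the minimum, which varint-encodes into fewer bytes.
struct encoding_stats {
    int64_t min_timestamp = std::numeric_limits<int64_t>::max();
    int32_t min_ttl = std::numeric_limits<int32_t>::max();

    void update(int64_t ts, int32_t ttl) {
        min_timestamp = std::min(min_timestamp, ts);
        min_ttl = std::min(min_ttl, ttl);
    }
};

inline uint64_t delta_encode_timestamp(const encoding_stats& s, int64_t ts) {
    return static_cast<uint64_t>(ts - s.min_timestamp);
}
```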
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Retryable code that has side effects is a recipe for bugs. This patch
reworks the snapshot reader so that the amount of logic run with
reclamation disabled is minimal and has very limited side effects.
Once that is added, also add a method to a memtable entry to calculate
its entire size. Right now we only have a method that calculates the
size minus the rows.
Signed-off-by: Glauber Costa <glauber@scylladb.com>