scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 05:26:58 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	cec9b2d114	mutation_partition_v2: Implement compact() For convenience, will be used in unit tests.	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	4317999ca4	cache_tracker: Extract insert(mutation_partition_v2&)	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	c7f7377ea3	mvcc, mutation_partition: Document guarantees in case merging succeeds It's not obvious that invariants for partial merge do not hold for a completed merge. This is due to the fact that an empty source partition, which is always empty after merge, is always fully continuous.	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	8ae78ffebd	mutation_partition_v2: Accept arbitrary preemption source in apply_monotonically() Will be useful in testing to exhaustively test preemption scenarios.	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	8883ac30cf	mutation_partition_v2: Simplify get_continuity()	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	d9e27abe87	row_cache: Distinguish dummy insertion site in trace log	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	026f8cc1e7	db: Use mutation_partition_v2 in mvcc This patch switches memtable and cache to use mutation_partition_v2, and all affected algorithms accordingly. The memtable reader was changed to use the same cursor implementation which cache uses, for improved code reuse and reducing risk of bugs due to discrepancy of algorithms which deal with MVCC. Range tombstone eviction in cache has now fine granularity, like with rows. Fixes #2578 Fixes #3288 Fixes #10587	2023-01-27 21:56:28 +01:00
Tomasz Grabiec	ccf3a13648	range_tombstone_change_merger: Introduce peek() Returns the current tombstone without affecting state.	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	42f5a7189d	readers: Extract range_tombstone_change_merger	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	6b7473be53	mvcc: partition_snapshot_row_cursor: Handle non-evictable snapshots This is a prerequisite for using the cursor in memtable readers. Non-evictable snapshots are those which live in memtables. Unlike evictable snapshots, they don't have a dummy entry at position after all clustering rows. In evictable snapshots, lookup always finds an entry, not so with non-evictable snapshots. The cursor was not prepared for this case, this patch handles it.	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	091ad8f6ee	mvcc: partition_snapshot_row_cursor: Support digest calculation Prerequisite for using in memtable reader.	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	195b40315a	mutation_partition_v2: Store range tombstones together with rows This patch changes mutation_partition_v2 to store range tombstone information together with rows. This mainly affects the version merging algorithm, mutation_partition_v2::apply_monotonically(). Continuity setting no longer can drop dummy entry unconditionally since it may be a boundary of a range tombstone. Memtable/cache is not switched yet. Refs #10587 Refs #3288	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	7e6056b3cc	db: Introduce mutation_partition_v2 Intended to be used in memtable/cache, as opposed to the old mutation_partition which will be intended to be used as temporary object. The two will have different trade-offs regarding memory efficiency and algorithms. In this commit there is no change in logic, the class is mostly copied. Some methods which are not needed on the v2 model were removed from the interface. Logic changes will be introduced in later commits.	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	806f698272	doc: Introduce docs/dev/mvcc.md This extracts information which was there in row_cache.md, but is relevant to MVCC in general. It also makes adaptations and reflects the upcoming changes in this series related to switching to the new mutation_partition_v2 model: - continuity in evictable snapshots can now overlap. This is needed to represent range tombstone information, which is linked to continuity information. - description of range tombstone representation was added	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	27882ff19e	db: cache_tracker: Introduce insert() variant which positions before existing entry in the LRU	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	a574a1cc4e	db: Print range_tombstone bounds as position_in_partition It's the standard now which replaced bound_view. Will be consistent with how range tombstone bounds are represented in mutation_partition_v2 (as rows_entry::position()).	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	40719c600c	test: memtable_test: Relax test_segment_migration_during_flush Partition version merging can now insert sentinels, which may temporarily increase unspooled memory. It is no longer true that unspooled monotonically decreases, which the test verified. Relax it, and only verify that unspooled is smaller than real dirty.	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	31bcc3b861	test: cache_flat_mutation_reader: Avoid timestamp clash api::new_timestamp() is not monotonic. In test_single_row_and_tombstone_not_cached_single_row_range1, we generate a deletion and an insertion in the deleted range. If they get the same timestamp, the inserted row will be covered. This will surface after cache starts to compact rows with range tombstones.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	25683449e4	test: cache_flat_mutation_reader_test: Use monotonic timestamps when inserting rows When inserting range tombstones, the test uses api::new_timestamp(), but when inserting rows, it uses a fixed timestamp of 1. This will be problematic when rows get compacted with range tombstone, all rows would get compacted away, which is not expected by the test. To fix this, let's use the same timestamp source as range tombstones. This way rows will get a later timestamp.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	71057412ed	test: mvcc: Fix sporadic failures due to compact_for_compaction() compact_for_compaction() will perform cell expiration based on gc_clock::now(), which introduces sporadic mismatches due to expiry status of a row marker. Drop this, we can rely on compaction done by is_equal_to_compacted()	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	f908713290	test: lib: random_mutation_generator: Produce partition tombstone less often This tombstone has a high chance of obliterating all data, which will make tests which involve partition version merging not very interesting. The result will be an empty partition with a tombstone. Reduce its frequency, so that in MVCC there is a significant chance of having live data in the combined entry where individual versions come from the generator.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	3bf8052be4	test: lib: random_utils: Introduce with_probability()	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	c386874e18	test: lib: Improve error message in has_same_continuity()	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	08f68c5f20	test: mvcc: mvcc_container: Avoid UB in tracker() getter when there is no tracker	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	5aa8cb56a8	test: mvcc: Insert entries in the tracker evictable snapshots must have all entries added to the tracker. Partition version merging assumes this. Before this was benign, but will start to trigger asserts in mutation_partition_v2.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	9d38997971	test: mvcc_test: Do not set dummy::no on non-clustering rows This will trigger an assert in apply_monotonically() later in the series, where this row would be merged with a dummy at the same position. This row must not be marked as non-dummy, there is an assumption that non-clustering positions are all dummies. There can't be two entries with the same position and a different dummy status.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	f79072638d	mutation_partition: Print full position in error report in append_clustered_row() std::prev(i) can be dummy.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	6a305666a4	db: mutation_cleaner: Extract make_region_space_guard() Will be used in more places.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	833e2a8d30	position_in_partition: Optimize equality check We can avoid key comparison if bound weights don't match.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	95b509afcd	mvcc: Fix version merging state resetting Upon entry to merge_partition_versions() we skip over versions which are not referenced in order to start merging from the oldest unreferenced version, which is good for performance. Later, we reallocate version merging state if we detected such a move, so that we don't reuse state allocated for a different version pair than before. This check was using version_no, the counter of skipped versions to detect this. But this only makes sense if each merge_partition_versions() uses the same version pointer as a base. In fact it doesn't, if we skip, we advance _version, so the skip is persisted in the snapshot. It's enough to discard the version merging state when we do that. This shouldn't have effect on existing code base, since there is currently no way to trigger the version skipping logic.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	1c4b5b0b6b	mutation_partition: apply_resume: Mark operator bool() as explicit	2023-01-27 19:15:38 +01:00
Kamil Braun	5eadea301e	Merge 'pytest: start after ungraceful stop' from Alecco If a server is stopped suddenly (i.e. not graceful), schema tables might be in inconsistent state. Add a test case and enable Scylla configuration option (force_schema_commit_log) to handle this. Fixes #12218 Closes #12630 * github.com:scylladb/scylladb: pytest: test start after ungraceful stop test.py: enable force_schema_commit_log	2023-01-26 12:08:33 +01:00
Botond Dénes	ba26770376	tools/schema_loader: data_dictionary_impl:try_find_table(): also check ks name Although the number of keyspaces should mostly be 1 here, and thus the chance of two tables from different keyspaces colliding is minuscule, it is not zero. Better be safe than sorry, so match the keyspace name too when looking up a table. Closes #12627	2023-01-25 22:04:07 +02:00
Raphael S. Carvalho	87ee547120	table: Fix quadratic behavior when inserting sstables into tracker on schema change Each time backlog tracker is informed about a new or old sstable, it will recompute the static part of backlog which complexity is proportional to the total number of sstables. On schema change, we're calling backlog_tracker::replace_sstables() for each existing sstable, therefore it produces O(N ^ 2) complexity. Fixes #12499. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12593	2023-01-25 21:57:33 +02:00
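The cost pattern described in 87ee547120 can be shown with a toy model. Everything here is an assumption made for illustration: `ToyBacklogTracker`, its `replace_sstables()` signature, and the `work` counter are hypothetical, and the batched call is just one way to avoid repeated recomputation, not necessarily what the patch does.

```python
class ToyBacklogTracker:
    """Hypothetical model: every update recomputes over all tracked sstables."""

    def __init__(self):
        self.sstables = set()
        self.work = 0  # number of sstables visited across all recomputations

    def _recompute_static_backlog(self):
        # Cost proportional to the total number of tracked sstables.
        self.work += len(self.sstables)

    def replace_sstables(self, old, new):
        self.sstables.difference_update(old)
        self.sstables.update(new)
        self._recompute_static_backlog()


def per_sstable_updates(n):
    """One tracker call per sstable: 1 + 2 + ... + n units of work."""
    tracker = ToyBacklogTracker()
    for i in range(n):
        tracker.replace_sstables([], [i])
    return tracker.work


def batched_update(n):
    """A single tracker call for all sstables: n units of work."""
    tracker = ToyBacklogTracker()
    tracker.replace_sstables([], list(range(n)))
    return tracker.work
```

In this model, 100 sstables cost 5050 units when added one call at a time and 100 units when added in a single call.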
Botond Dénes	bdd4b25c61	scylla-gdb.py: scylla memory: remove 'sstable reads' from semaphore names This phrase is inaccurate and unnecessary. We know all lines in the printout are for reads and they are semaphores: no need to repeat this information on each line. Example: Read Concurrency Semaphores: read: 0/100, 0/ 41901096, queued: 0 streaming: 0/ 10, 0/ 41901096, queued: 0 system: 0/ 10, 0/ 41901096, queued: 0 Closes #12633	2023-01-25 21:55:27 +02:00
Nadav Har'El	f4f2d608d7	dbuild: fix path in example in README The dbuild README has an example how to enable ccache, and required modifying the PATH. Since recently, our docker image includes required commands (cxxbridge) in /usr/local/bin, so the build will fail if that directory isn't also in the path - so add it in the example. Also use the opportunity to fix the "/home/nyh" in one example to "$HOME". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12631	2023-01-25 21:54:44 +02:00
Kamil Braun	a0ff33e777	test/pylib: scylla_cluster: don't leak server if stopping it fails `ScyllaCluster.server_stop` had this piece of code: ``` server = self.running.pop(server_id) if gracefully: await server.stop_gracefully() else: await server.stop() self.stopped[server_id] = server ``` We observed `stop_gracefully()` failing due to a server hanging during shutdown. We then ended up in a state where neither `self.running` nor `self.stopped` had this server. Later, when releasing the cluster and its IPs, we would release that server's IP - but the server might have still been running (all servers in `self.running` are killed before releasing IPs, but this one wasn't in `self.running`). Fix this by popping the server from `self.running` only after `stop_gracefully`/`stop` finishes. Make an analogous fix in `server_start`: put `server` into `self.running` before we actually start it. If the start fails, the server will be considered "running" even though it isn't necessarily, but that is OK - if it isn't running, then trying to stop it later will simply do nothing; if it is actually running, we will kill it (which we should do) when clearing after the cluster; and we don't leak it. Closes #12613	2023-01-25 16:58:02 +02:00
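The ordering fix described in a0ff33e777 can be sketched as follows. This is a minimal illustration, not the actual `ScyllaCluster` code: `FakeServer`, `Cluster`, the `fail` flag, and `demo()` are hypothetical stand-ins; only the `running`/`stopped` dictionaries and the `stop_gracefully()`/`stop()` names come from the commit message.

```python
import asyncio


class FakeServer:
    """Hypothetical stand-in for a test server."""

    def __init__(self, fail=False):
        self.fail = fail

    async def stop_gracefully(self):
        if self.fail:
            raise RuntimeError("server hung during shutdown")

    async def stop(self):
        pass


class Cluster:
    """Hypothetical reduction of the bookkeeping described in the commit."""

    def __init__(self):
        self.running = {}
        self.stopped = {}

    async def server_stop(self, server_id, gracefully):
        # Look the server up without removing it, so that a failed stop
        # leaves it tracked in self.running.
        server = self.running[server_id]
        if gracefully:
            await server.stop_gracefully()
        else:
            await server.stop()
        # Move the server only after the stop call has finished.
        self.running.pop(server_id)
        self.stopped[server_id] = server


async def demo():
    cluster = Cluster()
    cluster.running[1] = FakeServer(fail=True)
    try:
        await cluster.server_stop(1, gracefully=True)
    except RuntimeError:
        pass  # the stop failed; check where the server ended up
    return 1 in cluster.running, 1 in cluster.stopped


still_running, marked_stopped = asyncio.run(demo())
```

After the failed stop the server is still listed in `running`, so a later cleanup pass that kills everything in `running` would still reach it.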
Alejo Sanchez	878cb45c24	pytest: test start after ungraceful stop Test case for a start of a server after it was stopped suddenly (instead of gracefully). This could cause commitlog flush issues. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-01-25 14:49:27 +01:00
Alejo Sanchez	ccbd89f0cd	test.py: enable force_schema_commit_log To handle start after ungraceful stop, enable separate schema commit log from server start. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-01-25 14:49:27 +01:00
Kamil Braun	5c886e59de	Merge 'Enable Raft by default in new clusters' from Kamil Braun New clusters that use a fresh conf/scylla.yaml will have `consistent_cluster_management: true`, which will enable Raft, unless the user explicitly turns it off before booting the cluster. People using existing yaml files will continue without Raft, unless consistent_cluster_management is explicitly requested during/after upgrade. Also update the docs: cluster creation and node addition procedures. Fixes #12572. Closes #12585 * github.com:scylladb/scylladb: docs: mention `consistent_cluster_management` for creating cluster and adding node procedures conf: enable `consistent_cluster_management` by default	2023-01-25 14:09:38 +01:00
Benny Halevy	82011fc489	dht: incremental_owned_ranges_checker: belongs_to_current_node: mark as const Its _it member keeps state about the current range. Although it's modified by the method, this is an implementation detail that is irrelevant to the caller, hence mark the belongs_to_current_node method as const (and noexcept while at it). This allows the caller, cleanup_compaction, to use it from inside a const method, without having to mark its respective member as mutable too. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12634	2023-01-25 14:52:21 +02:00
Alexey Novikov	ce96b472d3	prevent populating cache with expired rows from sstables change row purge condition for compacting_reader to remove all expired rows to avoid read performance problems when there are many expired tombstones in row cache Refs #2252 Closes #12565	2023-01-25 12:59:40 +01:00
Kamil Braun	5bc7f0732e	Merge 'test.py: manual cluster pool handling for Python suite' from Alecco From reviews of https://github.com/scylladb/scylladb/pull/12569, avoid using `async with` and access the `Pool` of clusters with `get()`/`put()`. Closes #12612 * github.com:scylladb/scylladb: test.py: manual cluster handling for PythonSuite test.py: stop cluster if PythonSuite fails to start test.py: minor fix for failed PythonSuite test	2023-01-24 17:37:55 +01:00
Nadav Har'El	b28818db06	Merge 'Make regexes in types.cc static and remove unnecessary tolower transform' from Marcin Maliszkiewicz - makes all regexes static If making regex compilation static for uuid_type_impl and timeuuid_type_impl helps then it should also help for timestamp_type and simple_date_type. - remove unnecessary tolower transform in simple_date_type_impl::from_sstring Following function uses only decimal and '-' characters (see date_re). They are not affected by tolower call in any way. Additionally std::strtoll supports "0x" prefixes but also accepts upper case version "0X" so it's also not affected by tolower call. get_simple_date_time only casts strings to integer types using boost::lexical_cast so also not affected by tolower. Finally, serialize only uses str to include it in an exception text so tolower doesn't affect it in a positive way. It's even better that input is displayed to the user as it was, not converted to lower case. Closes #12621 * github.com:scylladb/scylladb: types: remove unnecessary tolower transform in simple_date_type_impl::from_sstring types: make all regexes static	2023-01-24 16:13:59 +02:00
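The two points made in b28818db06 can be illustrated with a rough Python analogue. The pattern below is an assumption for illustration, not the actual `date_re` from types.cc, and `parse_simple_date()` is a hypothetical helper: the regex is compiled once rather than on every call, and lowercasing is skipped because digits and '-' are unaffected by it.

```python
import re

# Compiled once at import time, loosely analogous to a static regex in C++.
# The pattern itself is assumed for this sketch.
DATE_RE = re.compile(r"(-?[0-9]+)-([0-9]+)-([0-9]+)")


def parse_simple_date(text):
    """Return (year, month, day) as integers, or None if the text does not match."""
    # No text.lower() here: the pattern only accepts digits and '-',
    # which have no lowercase form.
    match = DATE_RE.fullmatch(text)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    return year, month, day
```

Leaving the input untouched also means an error message can show the string exactly as the caller supplied it.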
Pavel Emelyanov	f6e8b64334	snitch: Use set_my_dc_and_rack() on all shards Most of snitch drivers set _my_dc and _my_rack with direct assignment thus skipping the sanity checks for dc/rack being empty. On other shards they call set_my_dc_and_rack() helper which warns about the empty value and replaces it with some defaults. It's better to use the helper on all shards in order to have the same dc/rack values everywhere. refs: #12185 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12524	2023-01-24 14:17:06 +02:00
Nadav Har'El	55558e1bd7	test/alternator: check operation on invalid TableName Issue #12538 suggested that maybe Alternator shouldn't bother reporting an invalid table name in item operations like PutItem, and that it's enough to report that the table doesn't exist. But the test added in this patch shows that DynamoDB, like Alternator, reports the invalid table name in this case - not just that the table doesn't exist. That should make us think twice before acting on issue #12538. If we do what this issue recommended, this test will need to be fixed (e.g., to accept as correct both types of errors). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12608	2023-01-24 14:14:39 +02:00
Kefu Chai	4a0134a097	db: system_keyspace: take the reserved_memory into account before this change, we return the total memory managed by Seastar in the "total" field in system.memory. but this value only reflects the total memory managed by Seastar's allocator. if `reserve_additional_memory` is set when starting app_template, Seastar's memory subsystem just reserves a chunk of memory of this specified size for system, and takes the remaining memory. since `f05d612da8`, we set this value to 50MB for wasmtime runtime. hence the test of `TestRuntimeInfoTable.test_default_content` in dtest fails. the test expects the size passed via the option of `--memory` to be identical to the value reported by system.memory's "total" field. after this change, the "total" field takes the reserved memory for wasm udf into account. the "total" field should reflect the total size of memory used by Scylla, no matter how we use a certain portion of the allocated memory. Fixes #12522 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12573	2023-01-24 14:07:44 +02:00
Anna Stuchlik	3cbe657b24	doc: fixes https://github.com/scylladb/scylla-docs/issues/3706 , v2 of https://github.com/scylladb/scylladb/pull/11638 , add a note about performance penalty in non-frozen connections vs frozen connections and UDT, add a link to the blog post about performance Closes #12583	2023-01-24 13:16:58 +02:00
Alejo Sanchez	f236d518c6	test.py: manual cluster handling for PythonSuite Instead of complex async with logic, use manual cluster pool handling. Revert the discard() logic in Pool from a recent commit. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-01-24 11:38:17 +01:00
Alejo Sanchez	a6059e4bb7	test.py: stop cluster if PythonSuite fails to start If cluster fails to start, stop it. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-01-24 11:36:49 +01:00


34795 Commits