scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-22 09:30:45 +00:00

Author	SHA1	Message	Date
Avi Kivity	bc2fcf5187	dirty_memory_manager: unscramble terminology Before `95f31f37c1` ("Merge 'dirty_memory_manager: simplify region_group' from Avi Kivity"), we had two region_group objects, one _real_region_group and another _virtual_region_group, each with a set of "soft" and "hard" limits and related functions and members. In `95f31f37c1`, we merged _real_region_group into _virtual_region_group, but unfortunately the _real_region_group members received the "hard" prefix when they got merged. This overloads the meaning of "hard" - is it related to soft/hard limit or is it related to the real/virtual distinction? This patch applies some renaming to restore consistency. Anything that came from _virtual_region_group now has "virtual" in its name. Anything that came from _real_region_group now has "real" in its name. The terms are still pretty bad but at least they are consistent.	2022-10-04 13:56:28 +03:00
Botond Dénes	95f31f37c1	Merge 'dirty_memory_manager: simplify region_group' from Avi Kivity region_group evolved as a tree, each node of which contains some regions (memtables). Each node has some constraints on memory, and can start flushing and/or stop allocation into its memtables and those below it when those constraints are violated. Today, the tree has exactly two nodes, only one of which can hold memtables. However, all the complexity of the tree remains. This series applies some mechanical code transformations that remove the tree structure and all the excess functionality, leaving a much simpler structure behind. Before: - a tree of region_group objects - each with two parameters: soft limit and hard limit - but only two instances ever instantiated After: - a single region_group object - with three parameters - two from the bottom instance, one from the top instance Closes #11570 * github.com:scylladb/scylladb: dirty_memory_manager: move third memory threshold parameter of region_group constructor to reclaim_config dirty_memory_manager: simplify region_group::update() dirty_memory_manager: fold region_group::notify_hard_pressure_relieved into its callers dirty_memory_manager: clean up region_group::do_update_hard_and_check_relief() dirty_memory_manager: make do_update_hard_and_check_relief() a member of region_group dirty_memory_manager: remove accessors around region_group::_under_hard_pressure dirty_memory_manager: merge memory_hard_limit into region_group dirty_memory_manager: rename members in memory_hard_limit dirty_memory_manager: fold do_update() into region_group::update() dirty_memory_manager: simplify memory_hard_limit's do_update dirty_memory_manager: drop soft limit / soft pressure members in memory_hard_limit dirty_memory_manager: de-template do_update(region_group_or_memory_hard_limit) dirty_memory_manager: adjust soft_limit threshold check dirty_memory_manager: drop memory_hard_limit::_name dirty_memory_manager: simplify memory_hard_limit 
configuration dirty_memory_manager: fold region_group_reclaimer into {memory_hard_limit,region_group} dirty_memory_manager: stop inheriting from region_group_reclaimer dirty_memory_manager: test: unwrap region_group_reclaimer dirty_memory_manager: change region_group_reclaimer configuration to a struct dirty_memory_manager: convert region_group_reclaimer to callbacks dirty_memory_manager: consolidate region_group_reclaimer constructors dirty_memory_manager: rename {memory_hard_limit,region_group}::notify_relief dirty_memory_manager: drop unused parameter to memory_hard_limit constructor dirty_memory_manager: drop memory_hard_limit::shutdown() dirty_memory_manager: split region_group hierarchy into separate classes dirty_memory_manager: extract code block from region_group::update dirty_memory_manager: move more allocation_queue functions out of region_group dirty_memory_manager: move some allocation queue related function definitions outside class scope dirty_memory_manager: move region_group::allocating_function and related classes to new class allocation_queue dirty_memory_manager: remove support for multiple subgroups	2022-10-03 13:22:47 +03:00
Botond Dénes	5621cdd7f9	db/view/view_builder: don't drop partition and range tombstones when resuming The view builder builds the views from a given base table in view_builder::batch_size batches of rows. After processing this many rows, it suspends so the view builder can switch to building views for other base tables in the name of fairness. When resuming the build step for a given base table, it reuses the reader used previously (also serving the role of a snapshot, pinning sstables read from). The compactor however is created anew. As the reader can be in the middle of a partition, the view builder injects a partition start into the compactor to prime it for continuing the partition. This however only included the partition-key, crucially missing any active tombstones: partition tombstone or -- since the v2 transition -- active range tombstone. This can result in base rows covered by either of these being resurrected and the view builder generating view updates for them. This patch solves this by using the detach-state mechanism of the compactor which was explicitly developed for situations like this (in the range scan code) -- resuming a read with the readers kept but the compactor recreated. Also included are two test cases reproducing the problem, one with a range tombstone, the other with a partition tombstone. Fixes: #11668 Closes #11671	2022-10-03 11:28:22 +03:00
Nadav Har'El	b8f8eb8710	Merge 'Improve test.py logging' from Kamil Braun Include the unique test name (the unique name distinguishes between different test repeats) and the test case name where possible. Improve printing of clusters: include the cluster name and stopped servers. Fix some logging calls and add new ones. Examples: ``` ------ Starting test test_topology ------ ``` became this: ``` ------ Starting test test_topology.1::test_add_server_add_column ------ ``` This: ``` INFO> Leasing Scylla cluster {127.191.142.1, 127.191.142.2, 127.191.142.3} for test test_add_server_add_column ``` became this: ``` INFO> Leasing Scylla cluster ScyllaCluster(name: 02cdd180-40d1-11ed-8803-3c2c30d32d96, running: {127.144.164.1, 127.144.164.2, 127.144.164.3}, stopped: {}) for test test_topology.1::test_add_server_add_column ``` Closes #11677 * github.com:scylladb/scylladb: test/pylib: scylla_cluster: improve cluster printing test/pylib: don't pass test_case_name to after-test endpoint test/pylib: scylla_cluster: track current test case name and print it test.py: pass the unique test name (e.g. `test_topology.1`) to cluster manager test/pylib: scylla_cluster: pass the test case name to `before_test` test/pylib: use "test_case_name" variable name when talking about test cases	2022-10-02 20:48:50 +03:00
Avi Kivity	6a02bb7c2b	dirty_memory_manager: merge memory_hard_limit into region_group The two classes always have a 1:1 or 0:1 relationship, and so we can just move all the members of memory_hard_limit into region_group, with the functions that track the relationship (memory_hard_limit::{add,del}()) removed. The 0:1 relationship is maintained by initializing the hard limit parameter with std::numeric_limits<size_t>::max(). The _hard_total_memory variable is always checked if it is greater than this parameter in order to do anything, and with this default it can never be.	2022-09-30 21:59:38 +03:00
Avi Kivity	45ab24e43d	dirty_memory_manager: rename members in memory_hard_limit In preparation for merging memory_hard_limit into region_group, disambiguate similarly named members by adding the word "hard" in random places. memory_hard_limit and region_group are candidates for merging because they constantly reference each other, and memory_hard_limit does very little by itself.	2022-09-30 21:47:33 +03:00
Kamil Braun	b2cf610567	test/pylib: scylla_cluster: improve cluster printing Print the cluster name and stopped servers in addition to the running servers. Fix a logging call which tried to print a server in place of a cluster and even at that it failed (the server didn't have a hostname yet so it printed as an empty string). Add another logging call.	2022-09-30 17:00:05 +02:00
Kamil Braun	05ed3769dd	test/pylib: don't pass test_case_name to after-test endpoint It's redundant now, the manager tracks the current test case using before-test endpoint calls.	2022-09-30 16:41:45 +02:00
Kamil Braun	dc6f37b7f7	test/pylib: scylla_cluster: track current test case name and print it Use `_before_test` calls to track the current test case name. Concatenate it with the unique test name like this: `test_topology.1::test_add_server_add_column`, and print it instead of the test case name.	2022-09-30 16:38:35 +02:00
Kamil Braun	5be818d73b	test.py: pass the unique test name (e.g. `test_topology.1`) to cluster manager This helps us distinguish the different repeats of a test in logs. Rename the variable accordingly in `ScyllaClusterManager`.	2022-09-30 16:24:10 +02:00
Kamil Braun	fde4642472	test/pylib: scylla_cluster: pass the test case name to `before_test` We pass the test case name to `after_test` - so make it consistent. Arguably, the test case name is more useful (as it's more precise) than the test name.	2022-09-30 16:17:59 +02:00
Kamil Braun	43d8b4a214	test/pylib: use "test_case_name" variable name when talking about test cases Distinguish "test name" (e.g. `test_topology`) from "test case name" (e.g. `test_add_server_add_column` - a test case inside `test_topology`).	2022-09-30 16:15:48 +02:00
Botond Dénes	060dda8e00	Merge 'Reduce dependencies on large data handler header' from Benny Halevy Reduce the false dependencies on db/large_data_handler.hh by not including it from commonly used header files, and rather including it only in the source files that actually need it. This is in preparation for https://github.com/scylladb/scylladb/issues/11449 Closes #11654 * github.com:scylladb/scylladb: test: lib: do not include db/large_data_handler.hh in test_service.hh test: lib: move sstable test_env::impl ctor out of line sstables: do not include db/large_data_handler.hh in sstables.hh api/column_family: add include db/system_keyspace.hh	2022-09-30 13:27:38 +03:00
Tomasz Grabiec	5268f0f837	test: lib: random_mutation_generator: Don't generate mutations with marker uncompacted with shadowable tombstone The generator first set the marker, then applied tombstones. The marker was set like this: row.marker() = random_row_marker(); Later, when shadowable tombstones were applied, they were compacted with the marker as expected. However, the key for the row was chosen randomly in each iteration and there are multiple keys set, so there was a possibility of a key clash with an earlier row. This could override the marker without applying any tombstones, which is conditional on random choice. This could generate rows with markers uncompacted with shadowable tombstones. This broke row_cache_test::test_concurrent_reads_and_eviction on comparison between expected and read mutations. The latter was compacted because it went through an extra merge path, which compacts the row. Fix by making sure there are no key clashes. Closes #11663	2022-09-30 11:27:01 +03:00
Kamil Braun	1793d43b15	test/pylib: scylla_cluster: mark `server_remove` as not implemented The `server_remove` function did a very weird thing: it shut down a server and made the framework 'forget' about it. From the point of view of the Scylla cluster and the driver the server was still there. Replace the function's body with `raise NotImplementedError`. In the future it can be replaced with an implementation that calls `removenode` on the Scylla cluster. Remove `test_remove_server_add_column` from `test_topology`. It effectively does the same thing as `test_stop_server_add_column`, except that the framework also 'forgets' about the stopped server. This could lead to weird situations because the forgotten server's IP could be reused in another test that was running concurrently with this test. Closes #11657	2022-09-29 21:03:18 +03:00
Benny Halevy	776b009c0f	test: lib: do not include db/large_data_handler.hh in test_service.hh It was needed for defining and referencing nop_lp_handler and in sstable_3_x_test for testing the large_data_handler. Remove the include from the commonly used header file to reduce the false dependencies on large_data_handler.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-29 18:36:16 +03:00
Benny Halevy	678d88576b	test: lib: move sstable test_env::impl ctor out of line To prepare for removing the include of db/large_data_handler.hh from test/lib/test_services.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-29 18:35:12 +03:00
Botond Dénes	ad04f200d3	Merge 'database: automatically take snapshot of base table views' from Benny Halevy The logic to reject explicit snapshot of views/indexes was improved in `aa127a2dbb`. However, we never implemented auto-snapshot of view/indexes when taking a snapshot of the base table. This is implemented in this patch. The implementation is built on top of `ba42852b0e` so it would be hard to backport to 5.1 or earlier releases. Fixes #11612 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11616 * github.com:scylladb/scylladb: database: automatically take snapshot of base table views api: storage_service: reject snapshot of views in api layer	2022-09-29 13:33:31 +03:00
Avi Kivity	cf3830a249	Merge 'Add support for TRUNCATE USING TIMEOUT' from Benny Halevy Extend the cql3 truncate statement to accept attributes, similar to modification statements. To achieve that we define cql3::statements::raw::truncate_statement derived from raw::cf_statement, and implement its pure virtual prepare() method to make a prepared truncate_statement. The latter is no longer derived from raw::cf_statement, and just stores a schema_ptr to get to the keyspace and column_family. `test_truncate_using_timeout` cql-pytest was added to test the new USING TIMEOUT feature. Fixes #11408 Also, update docs/cql/ddl.rst truncate-statement section respectively. Closes #11409 * github.com:scylladb/scylladb: docs: cql-extensions: add TRUNCATE to USING TIMEOUT section. docs: cql: ddl: add support for TRUNCATE USING TIMEOUT cql3, storage_proxy: add support for TRUNCATE USING TIMEOUT cql3: selectStatement: restrict to USING TIMEOUT in grammar cql3: deleteStatement: restrict to USING TIMEOUT\|TIMESTAMP in grammar	2022-09-28 18:19:03 +03:00
Nadav Har'El	de1bc147bc	Merge 'test.py: cleanups in topology test suites' from Kamil Braun Fix the type of `create_server`, rename `topology_for_class` to `get_cluster_factory`, simplify the suite definitions and parameters passed to `get_cluster_factory` Closes #11590 * github.com:scylladb/scylladb: test.py: replace `topology` with `cluster_size` in Topology tests test.py: rename `topology_for_class` to `get_cluster_factory` test/pylib: ScyllaCluster: fix create_server parameter type	2022-09-28 15:19:54 +03:00
Kamil Braun	1bcc28b48b	test/topology_raft_disabled: reenable `test_raft_upgrade` The test was disabled due to a bug in the Python driver which caused the driver not to reconnect after a node was restarted (see scylladb/python-driver#170). Introduce a workaround for that bug: we simply create a new driver session after restarting the nodes. Reenable the test. Closes #11641	2022-09-28 15:13:42 +03:00
Mikołaj Grzebieluch	be8fcba8c1	raft: broadcast_tables: add support for bind variables Extended the queries language to support bind variables which are bound in the execution stage, before creating a raft command. Adjusted `test_broadcast_tables.py` to prepare statements at the beginning of the test. Fixed a small bug in `strongly_consistent_modification_statement::check_access`. Closes #11525	2022-09-28 09:54:59 +03:00
Alejo Sanchez	02933c9b82	test.py: close aiohttp session for topology tests Close the aiohttp ClientSession after pytest session finishes. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #11648	2022-09-27 18:09:08 +02:00
Kamil Braun	82481ae31b	Merge 'raft server, log size limit in bytes' from Gusev Petr Before this patch we could get an OOM if we received several big commands. The number of commands was small, but their total size in bytes was large. snapshot_trailing_size is needed to guarantee progress. Without this limit the fsm could get stuck if the size of the next item is greater than max_log_size - (size of trailing entries). Closes #11397 * github.com:scylladb/scylladb: raft replication_test, make backpressure test to do actual backpressure raft server, shrink_to_fit on log truncation raft server, release memory if add_entry throws raft server, log size limit in bytes	2022-09-27 14:25:08 +02:00
Kamil Braun	ed67f0e267	Merge 'test.py: fix topology init error handling' from Alecco When there are errors starting the first cluster(s), the server logs are needed. So move `.start()` to the `try` block in `test.py` (out of `asynccontextmanager`). While there, make `ScyllaClusterManager.start()` idempotent. Closes #11594 * github.com:scylladb/scylladb: test.py: fix ScyllaClusterManager start/stop test.py: fix topology init error handling	2022-09-27 11:36:07 +02:00
Petr Gusev	bc50b7407f	raft replication_test, make backpressure test to do actual backpressure Before this patch this test didn't actually experience any backpressure since all the commands were executed sequentially.	2022-09-27 12:04:14 +04:00
Petr Gusev	b34dfed307	raft server, release memory if add_entry throws We consume memory from semaphore in add_entry_on_leader, but never release it if add_entry throws.	2022-09-27 12:02:34 +04:00
Benny Halevy	64140ccf05	cql3, storage_proxy: add support for TRUNCATE USING TIMEOUT Extend the cql3 truncate statement to accept attributes, similar to modification statements. To achieve that we define cql3::statements::raw::truncate_statement derived from raw::cf_statement, and implement its pure virtual prepare() method to make a prepared truncate_statement. The latter, statements::truncate_statement, is no longer derived from raw::cf_statement, and just stores a schema_ptr to get to the keyspace and column_family names. `test_truncate_using_timeout` cql-pytest was added to test the new USING TIMEOUT feature. Fixes #11408 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 18:30:39 +03:00
Benny Halevy	27d3e48005	cql3: selectStatement: restrict to USING TIMEOUT in grammar It is preferred to reject USING TTL / TIMESTAMP at the grammar level rather than functionally validating the USING attributes. test_using_timeout was adjusted accordingly to expect the `SyntaxException` error rather than `InvalidRequest`. Note that cql3::statements::raw::select_statement validate_attrs now asserts that the ttl or the timestamp attributes aren't set. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 18:30:39 +03:00
Benny Halevy	0728d33d5f	cql3: deleteStatement: restrict to USING TIMEOUT\|TIMESTAMP in grammar It is preferred to reject USING TTL / TIMESTAMP at the grammar level rather than functionally validating the USING attributes. test_using_timeout was adjusted accordingly to expect the `SyntaxException` error rather than `InvalidRequest`. Note that now delete_statement ctor asserts that the ttl attribute is not set. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 18:30:39 +03:00
Kamil Braun	696bdb2de7	test.py: replace `topology` with `cluster_size` in Topology tests First, a reminder of a few basic concepts in Scylla: - "topology" is a mapping: for each node, its DC and Rack. - "replication strategy" is a method of calculating replica sets in a cluster. It is not a cluster-global property; each keyspace can have a different replication strategy. A cluster may have multiple keyspaces. - "cluster size" is the number of nodes in a cluster. Replication strategy is orthogonal to topology. Cluster size can be derived from topology and is also orthogonal to replication strategy. test.py was confusing the three concepts together. For some reason, Topology suites were specifying a "topology" parameter which contained replication strategy details - having nothing to do with topology. Also it's unclear why a test suite would specify anything to do with replication strategies - after all, a test may create keyspaces with different replication strategies, and a suite may contain multiple different tests. Get rid of the "topology" parameter, replace it with a simple "cluster_size". In the future we may re-introduce it when we actually implement the possibility to start clusters with custom topologies (which involves configuring the snitch etc.) Simplify the test.py code.	2022-09-26 15:17:50 +02:00
Kamil Braun	06cc4f9259	test/pylib: ScyllaCluster: fix create_server parameter type The only usage of `ScyllaCluster` constructor passed a `create_server` function which expected a `List[str]` for the second parameter, while the constructor specified that the function should expect an `Optional[List[str]]`. There was no reason for the latter, we can easily fix this type error. Also give a type hint for `create_cluster` function in `PythonTestSuite.topology_for_class`. This is actually what caught the type error.	2022-09-26 11:45:44 +02:00
Petr Gusev	27e60ecbf4	raft server, log size limit in bytes Before this patch we could get an OOM if we received several big commands. The number of commands was small, but their total size in bytes was large. snapshot_trailing_size is needed to guarantee progress. Without this limit the fsm could get stuck if the size of the next item is greater than max_log_size - (size of trailing entries).	2022-09-26 13:10:10 +04:00
Benny Halevy	d32c497cd9	database: automatically take snapshot of base table views The logic to reject explicit snapshot of views/indexes was improved in `aa127a2dbb`. However, we never implemented auto-snapshot of view/indexes when taking a snapshot of the base table. This is implemented in this patch. The implementation is built on top of `ba42852b0e` so it would be hard to backport to 5.1 or earlier releases. Fixes #11612 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 11:02:54 +03:00
Nadav Har'El	868a884b79	test/cql-pytest: add reproducer for ignored IS NOT NULL This test reproduces issue #10365: It shows that although "IS NOT NULL" is not allowed in regular SELECT filters, in a materialized view it is allowed, even for non-key columns - but then outright ignored and does not actually filter out anything - a fact which already surprised several users. The test also fails on Cassandra - it also wrongly allows IS NOT NULL on the non-key columns but then ignores this in the filter. So the test is marked with both xfail (known to fail on Scylla) and cassandra_bug (fails on Cassandra because of what we consider to be a Cassandra bug). Refs #10365 Refs #11606 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11615	2022-09-26 09:02:08 +03:00
Avi Kivity	2f907dc47d	dirty_memory_manager: fold region_group_reclaimer into {memory_hard_limit,region_group} region_group_reclaimer is used to initialize (by reference) instances of memory_hard_limit and region_group. Now that it is a final class, we can fold it into its users by pasting its contents into those users, and using the initializer (reclaim_config) to initialize the users. Note there is a 1:1 relationship between a region_group_reclaimer instance and a {memory_hard_limit,region_group} instance. It may seem like code duplication to paste the contents of one class into two, but the two classes use region_group_reclaimer differently, and most of the code is just used to glue different classes together, so the next patches will be able to get rid of much of it. Some notes: - no_reclaimer was replaced by a default reclaim_config, as that's how no_reclaimer was initialized - all members were added as private, except when a caller required one to be public - an under_presssure() member already existed, forwarding to the reclaimer; this was just removed.	2022-09-22 13:56:59 +03:00
Avi Kivity	d8f857e74b	dirty_memory_manager: stop inheriting from region_group_reclaimer This inheritance makes it harder to get rid of the class. Since there are no longer any virtual functions in the class (apart from the destructor), we can just convert it to a data member. In a few places, we need forwarding functions to make formerly-inherited functions visible to outside callers. The virtual destructor is removed and the class is marked final to verify it is no longer a base class anywhere.	2022-09-22 13:56:59 +03:00
Avi Kivity	26f3a123a5	dirty_memory_manager: test: unwrap region_group_reclaimer In one test, region_group_reclaimer is wrapped in another class just to toggle a bool, but with the new callbacks it's easy to just use a bool instead.	2022-09-22 13:56:59 +03:00
Avi Kivity	1d3508e02c	dirty_memory_manager: change region_group_reclaimer configuration to a struct It's just so much nicer. The "threshold" limit was renamed to "hard_limit" to contrast it with "soft_limit" (in fact threshold is a good name for soft_limit, since it's a point where the behavior begins to change, but that's too much of a change).	2022-09-22 13:56:59 +03:00
Avi Kivity	2c54c7d51e	dirty_memory_manager: convert region_group_reclaimer to callbacks region_group_reclaimer is partially policy (deciding when to reclaim) and partially mechanism (implementing reclaim via virtual functions). Move the mechanism to callbacks. This will make it easy to fold the policy part into region_group and memory_hard_limit. This folding is expected to simplify things since most of region_group_reclaimer is cross-class communication.	2022-09-22 13:56:59 +03:00
Avi Kivity	152136630c	dirty_memory_manager: split region_group hierarchy into separate classes Currently, region_group forms a hierarchy. Originally it was a tree, but previous work whittled it down to a parent-child relationship (with a single, possibly optional parent, and a single child). The actual behavior of the parent and child are very different, so it makes sense to split them. The main difference is that the parent does not contain any regions (memtables), but the child does. This patch mechanically splits the class. The parent is named memory_hard_limit (reflecting its role to prevent lsa allocation above the memtable configured hard limit). The child is still named region_group. Details of the transformation: - each function or data member in region_group is either moved to memory_hard_limit, duplicated in memory_hard_limit, or left in region_group. - the _regions and _blocked_requests members, which were always empty in the parent, were not duplicated. Any member that only accessed them was similarly left alone. - the "no_reclaimer" static member which was only used in the parent was moved there. Similarly the constructor which accepted it was moved. - _child was moved to the parent, and _parent was kept in the child (more or less the defining change of the split) Similarly add(region_group) and del(region_group) (which manage _child) were moved. - do_for_each_parent(), which iterated to the top of the tree, was removed and its callers manually unroll the loop. For the parent, this is just a single iteration (since we're iterating towards the root), for the child, this can be two iterations, but the second one is usually simpler since the parent has many members removed. - do_update(), introduced in the previous patch, was made a template that can act on either the parent or the child. It will be further simplified later. - some tests that check now-impossible topologies were removed. 
- the parent's shutdown() is trivial since it has no _blocked_requests, but it was kept to reduce churn in the callers.	2022-09-22 13:56:59 +03:00
Avi Kivity	d21d2cdb3e	dirty_memory_manager: remove support for multiple subgroups We only have one parent/child relationship in the region group hierarchy, so support for more is unneeded complexity. Replace the subgroup vector with a single pointer, and delete a test for the removed functionality.	2022-09-22 13:56:59 +03:00
Piotr Sarna	481240b8b4	Merge 'Alternator: Run more TTL tests by default (and add a test for metrics)' from Nadav Har'El We had quite a few tests for Alternator TTL in test/alternator, but most of them did not run as part of the usual Jenkins test suite, because they were considered "very slow" (and require a special "--runveryslow" flag to run). In this series we enable six tests which run quickly enough to run by default, without an additional flag. We also make them even quicker - the six tests now take around 2.5 seconds. I also noticed that we don't have a test for the Alternator TTL metrics - and added one. Fixes #11374. Refs https://github.com/scylladb/scylla-monitoring/issues/1783 Closes #11384 * github.com:scylladb/scylladb: test/alternator: insert test names into Scylla logs rest api: add a new /system/log operation alternator ttl: log warning if scan took too long. alternator,ttl: allow sub-second TTL scanning period, for tests test/alternator: skip fewer Alternator TTL tests test/alternator: test Alternator TTL metrics	2022-09-22 09:47:50 +02:00
Petr Gusev	210d9dd026	raft: fix snapshots leak applier_fiber could create multiple snapshots between io_fiber runs. The fsm_output.snp variable was overwritten by applier_fiber and io_fiber didn't drop the previous snapshot. In this patch we introduce the variable fsm_output.snps_to_drop, store in it the current snapshot id before applying a new one, and then sequentially drop them in io_fiber after storing the last snapshot_descriptor. _sm_events.signal() is added to fsm::apply_snapshot, since this method mutates the _output and thus gives a reason to run io_fiber. The new test test_frequent_snapshotting demonstrates the problem by causing frequent snapshots and setting the applier queue size to one. Closes #11530	2022-09-21 12:46:26 +02:00
Kamil Braun	3b096b71c1	test/topology_raft_disabled: disable `test_raft_upgrade` For some reason, the test is currently flaky on Jenkins. Apparently the Python driver does not reconnect to the cluster after the cluster restarts (well it does, but then it disconnects from one of the nodes and never reconnects again). This causes the test to hang on "waiting until driver reconnects to every server" until it times out. Disable it for now so it doesn't block next promotion.	2022-09-21 12:32:40 +02:00
Alejo Sanchez	510215d79a	test.py: fix ScyllaClusterManager start/stop Check existing is_running member to avoid re-starting. While there, set it to false after stopping. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-09-21 11:42:02 +02:00
Alejo Sanchez	933d93d052	test.py: fix topology init error handling Start ScyllaClusterManager within error handling so the ScyllaCluster logs are available in case of error starting up. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-09-21 09:15:25 +02:00
Avi Kivity	2cec417426	Merge 'tools: use the standard allocator' from Botond Dénes Tools want to be as little disruptive to the environment they run in as possible, because they might be run in a production environment, next to a running scylladb production server. As such, the usual behavior of seastar applications w.r.t. memory is an anti-pattern for tools: they don't want to reserve most of the system memory, in fact they don't want to reserve any amount, instead consuming as much as needed on-demand. To achieve this, tools want to use the standard allocator. To achieve this they need a seastar option to instruct seastar to not configure and use the seastar allocator and they need LSA to cooperate with the standard allocator. The former is provided by https://github.com/scylladb/seastar/pull/1211. The latter is solved by introducing the concept of a `segment_store_backend`, which abstracts away how the memory arena for segments is acquired and managed. We then refactor the existing segment store so that the seastar allocator specific parts are moved to an implementation of this backend concept, then we introduce another backend implementation appropriate to the standard allocator. Finally, tools configure seastar with the newly introduced option to use the standard allocator and similarly configure LSA to use the standard allocator appropriate backend. Refs: https://github.com/scylladb/scylladb/issues/9882 This is the last major code piece in scylla for making tools production ready. 
Closes #11510 * github.com:scylladb/scylladb: test/boost: add alternative variant of logalloc test tools: use standard allocator utils/logalloc: add use_standard_allocator_segment_pool_backend() utils/logalloc: introduce segment store backend for standard allocator utils/logalloc: rebase release segment-store on segment-store-backend utils/logalloc: introduce segment_store_backend utils/logalloc: push segment alloc/dealloc to segment_store test/boost/logalloc_test: make test_compaction_with_multiple_regions exception-safe	2022-09-20 12:59:34 +03:00
Nadav Har'El	4c93a694b7	cql: validate bloom_filter_fp_chance up-front Scylla's Bloom filter implementation has a minimal false-positive rate that it can support (6.71e-5). When setting bloom_filter_fp_chance any lower than that, the compute_bloom_spec() function, which writes the bloom filter, throws an exception. However, this is too late - it only happens while flushing the memtable to disk, and a failure at that point causes Scylla to crash. Instead, we should refuse the table creation with the unsupported bloom_filter_fp_chance. This is also what Cassandra did six years ago - see CASSANDRA-11920. This patch also includes a regression test, which crashes Scylla before this patch but passes after the patch (and also passes on Cassandra). Fixes #11524. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11576	2022-09-20 06:18:51 +03:00
Botond Dénes	60991358e8	Merge 'Improvements to test/lib/sstable_utils.hh' from Raphael "Raph" Carvalho Changes done to avoid pitfalls and fix issues of sstable-related unit tests Closes #11578 * github.com:scylladb/scylladb: test: Make fake sstables implicitly belong to current shard test: Make it clearer that sstables::test::set_values() modify data size	2022-09-20 06:14:07 +03:00
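
The naming scheme described in the test.py logging commits listed above (`5be818d73b`, `dc6f37b7f7`) can be sketched roughly as follows. This is only an illustration built from the example strings in those commit messages; the function names `unique_test_name` and `log_label` are hypothetical and are not taken from test.py or test/pylib.

```python
from typing import Optional


def unique_test_name(test_name: str, repeat: int) -> str:
    """Hypothetical helper: tell repeats of one test apart, e.g. 'test_topology.1'."""
    return f"{test_name}.{repeat}"


def log_label(unique_name: str, test_case_name: Optional[str] = None) -> str:
    """Hypothetical helper: join the unique test name and the test case name with '::'.

    Falls back to the unique test name alone when no test case is known yet.
    """
    if test_case_name is None:
        return unique_name
    return f"{unique_name}::{test_case_name}"


label = log_label(unique_test_name("test_topology", 1), "test_add_server_add_column")
print(f"------ Starting test {label} ------")
```

With the values from the commit message, the printed line has the same shape as the "became this" example quoted in `b8f8eb8710`.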


3695 Commits