scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-29 11:10:40 +00:00

Author	SHA1	Message	Date
Kamil Braun	4e35e62597	Merge 'Raft test topology part 3' from Alecco Test schema changes when there was an underlying topology change. - per test case checks of cluster health and cycling - helper class to do cluster manager API requests - tests can perform topology changes: stop/start/restart servers - modified clusters are marked dirty and discarded after the test case - cql connection is updated per topology change and per cluster change Closes #11266 * github.com:scylladb/scylladb: test.py: test topology and schema changes test.py: ClusterManager API mark cluster dirty test.py: call before/after_test for each test case test.py: handle driver connection in ManagerClient test.py: ClusterManager API and ManagerClient test.py: improve topology docstring	2022-08-16 11:00:26 +02:00
Avi Kivity	afa7960926	Merge 'database: evict all inactive reads for table when detaching table' from Botond Dénes Currently, when detaching the table from the database, we force-evict all queriers for said table. This series broadens the scope of this force-evict to include all inactive reads registered at the semaphore. This ensures that any regular inactive read "forgotten" for any reason in the semaphore, will not end up in said readers accessing a dangling table reference when destroyed later. Fixes: https://github.com/scylladb/scylladb/issues/11264 Closes #11273 * github.com:scylladb/scylladb: querier: querier_cache: remove now unused evict_all_for_table() database: detach_column_family(): use reader_concurrency_semaphore::evict_inactive_reads_for_table() reader_concurrency_semaphore: add evict_inactive_reads_for_table()	2022-08-15 19:05:59 +03:00
Botond Dénes	d56dcb842c	db/virtual_table: add virtual destructor to virtual_table It should have had one, derived instances are stored and destroyed via the base-class. The only reason this hasn't caused bugs yet is that derived instances happen to not have any non-trivial members yet. Closes #11293	2022-08-15 16:58:05 +03:00
Avi Kivity	73d4930815	Merge 'test/lib: various improvements to sstable test env' from Botond Dénes A mixed bag of improvements developed as part of another PR (https://github.com/scylladb/scylladb/pull/10736). Said PR was closed so I'm submitting these improvements separately. Closes #11294 * github.com:scylladb/scylladb: test/lib: move convenience table config factory to sstable_test_env test/lib/sstable_test_env: move members to impl struct test/lib/sstable_utils: use test_env::do_with_async()	2022-08-15 16:57:01 +03:00
Botond Dénes	92e5f438a4	querier: querier_cache: remove now unused evict_all_for_table()	2022-08-15 14:16:41 +03:00
Botond Dénes	2b1eb6e284	database: detach_column_family(): use reader_concurrency_semaphore::evict_inactive_reads_for_table() Instead of querier_cache::evict_all_for_table(). The new method covers all queriers and in addition any other inactive reads registered on the semaphore. In theory by the time we detach a table, no regular inactive reads should be in the semaphore anymore, but if there are any still, we better evict them before the table is destroyed, they might attempt to access it when destroyed later.	2022-08-15 14:16:41 +03:00
Botond Dénes	e55ccbde8f	reader_concurrency_semaphore: add evict_inactive_reads_for_table() Allowing for evicting all inactive reads that belong to a certain table.	2022-08-15 14:16:41 +03:00
Botond Dénes	c8ef356859	test/lib: move convenience table config factory to sstable_test_env All users of `column_family_test_config()`, get the semaphore parameter for it from `sstable_test_env`. It is clear that the latter serves as the storage space for stable objects required by the table config. This patch just enshrines this fact by moving the config factory method to `sstable_test_env`, so it can just get what it needs from members.	2022-08-15 11:23:59 +03:00
Botond Dénes	c0e017e0f7	test/lib/sstable_test_env: move members to impl struct All present members of sstable_test_env are std::unique_ptr<>:s because they require stable addresses. This makes their handling somewhat awkward. Move all of them into an internal `struct impl` and make that member a unique ptr.	2022-08-15 11:20:09 +03:00
Botond Dénes	a9f296ed47	test/lib/sstable_utils: use test_env::do_with_async() Instead of manually instantiating test_env.	2022-08-15 11:19:27 +03:00
Botond Dénes	a9573b84c5	Merge 'commitlog: Revert/modify `fac2bc4` - do footprint add in delete' from Calle Wilund Fixes #11184 Fixes #11237 In prev (broken) fix for https://github.com/scylladb/scylladb/issues/11184 we added the footprint for left-over files (replay candidates) to disk footprint on commitlog init. This effectively prevents us from creating segments iff we have tight limits. Since we nowadays do quite a bit of inserts _before_ commitlog replay (system.local, but...) we can end up in a situation where we deadlock start because we cannot get to the actual replay that will eventually free things. Another, not thought through, consequence is that we add a single footprint to _all_ commitlog shard instances - even though only shard 0 will get to actually replay + delete (i.e. drop footprint). So shards 1-X would all be either locked out or performance degraded. Simplest fix is to add the footprint in delete call instead. This will lock out segment creation until delete call is done, but this is fast. Also ensures that only replay shard is involved. To further emphasize this, don't store segments found on init scan in all shard instances, instead retrieve (based on low time-pos for current gen) when required. This changes very little, but we at last don't store pointless string lists in shards 1 to X, and also we can potentially ask for the list twice. More to the point, goes better hand-in-hand with the semantics of "delete_segments", where any file sent in is considered candidate for recycling, and included in footprint. Closes #11251 * github.com:scylladb/scylladb: commitlog: Make get_segments_to_replay on-demand commitlog: Revert/modify `fac2bc4` - do footprint add in delete	2022-08-15 09:10:32 +03:00
Botond Dénes	8f10413087	Merge 'doc: describe specifying workload attributes with service levels' from Anna Stuchlik Fix https://github.com/scylladb/scylladb/issues/11197 This PR adds a new page where specifying workload attributes with service levels is described and adds it to the menu. Also, I had to fix some links because of the warnings. Closes #11209 * github.com:scylladb/scylladb: doc: remove the reduntant space from index doc: update the syntax for defining service level attributes doc: rewording doc: update the links to fix the warnings doc: add the new page to the toctree doc: add the descrption of specifying workload attributes with service levels doc: add the definition of workloads to the glossary	2022-08-15 07:14:28 +03:00
Nadav Har'El	c8b5c3595e	Merge 'cql3: select_statement: coroutinize indexed_table_select_statement::do_execute_base_query()' from Avi Kivity Increase readability in preparation for managing topology with effective_replication_map (continuing `69aea59d9`). Closes #11290 * github.com:scylladb/scylladb: cql3: select_statement: improve loop termination condition in indexed_table_select_statement::do_execute_base_query() cql3: select_statement: reindent indexed_table_select_statement::do_execute_base_query() cql3: select_statement: coroutinize indexed_table_select_statement::do_execute_base_query() cql3: select_statement: de-result_wrap indexed_table_select_statement::do_execute_base_query()	2022-08-14 23:26:06 +03:00
Nadav Har'El	4a4231ea53	Merge 'storage_proxy: coroutinize some counter mutate functions' from Avi Kivity In preparation for effective_replication_map hygiene, convert some counter functions to coroutines to simplify the changes. Closes #11291 * github.com:scylladb/scylladb: storage_proxy: mutate_counters_on_leader: coroutinize storage_proxy: mutate_counters: coroutinize storage_proxy: mutate_counters: reorganize error handling	2022-08-14 23:16:42 +03:00
Avi Kivity	8070cdbbf9	storage_proxy: mutate_counters_on_leader: coroutinize Simplify ahead of refactoring for consistent effective_replication_map.	2022-08-14 17:36:58 +03:00
Avi Kivity	6e330d98d2	storage_proxy: mutate_counters: coroutinize Simplify ahead of refactoring for consistent effective_replication_map. This is probably a pessimization of the error case, but the error case will be terrible in any case unless we resultify it.	2022-08-14 17:28:46 +03:00
Avi Kivity	105b066ff7	storage_proxy: mutate_counters: reorganize error handling Move the error handling function where it's used so the code is more straightforward. Due to some std::move()s later, we must still capture the schema early.	2022-08-14 17:13:22 +03:00
Avi Kivity	fbaa280acd	cql3: select_statement: improve loop termination condition in indexed_table_select_statement::do_execute_base_query() Move the termination condition to the front of the loop so it's clear why we're looping and when we stop. It's less than perfectly clean since we widen the scope of some variables (from loop-internal to loop-carried), but IMO it's clearer.	2022-08-14 15:40:45 +03:00
Avi Kivity	60c7c11c96	cql3: select_statement: reindent indexed_table_select_statement::do_execute_base_query() Reindent after coroutinization. No functional changes.	2022-08-14 15:35:36 +03:00
Avi Kivity	492dc6879e	cql3: select_statement: coroutinize indexed_table_select_statement::do_execute_base_query() It's much easier to maintain this way. Since it uses ranges_to_vnodes, it interacts with topology and needs integration into effective_replication_map management. The patch leaves bad indentation and an infinite-looking loop in the interest of minimization, but that will be corrected later. Note, the test for `!r.has_value()` was eliminated since it was short-circuited by the test for `!rqr.has_value()` returning from the coroutine rather than propagating an error.	2022-08-14 15:31:45 +03:00
Avi Kivity	973034978c	cql3: select_statement: de-result_wrap indexed_table_select_statement::do_execute_base_query() We use result_wrap() in two places, but that makes coroutinizing the containing function a little harder, since it's composed of more lambdas. Remove the wrappers, gaining a bit of performance in the error case.	2022-08-14 15:22:18 +03:00
Kamil Braun	b4c5b79f5e	db: system_distributed_keyspace: don't call `on_internal_error` in `check_exists` The function `check_exists` checks whether a given table exists, giving an error otherwise. It previously used `on_internal_error`. `check_exists` is used in some old functions that insert CDC metadata to CDC tables. These tables are no longer used in newer Scylla versions (they were replaced with other tables with different schema), and this function is no longer called. The table definitions were removed and these tables are no longer created. They will only exist in clusters that were upgraded from old versions of Scylla (4.3) through a sequence of upgrades. If you tried to upgrade from a very old version of Scylla which had neither the old nor the new tables to a modern version, say from 4.2 to 5.0, you would get `on_internal_error` from this `check_exists` function. Fortunately: 1. we don't support such upgrade paths 2. `on_internal_error` in production clusters does not crash the system, only throws. The exception would be caught, printed, and the system would run (just without CDC - until you finished upgrade and called the proper nodetool command to fix the CDC module). Unfortunately, there is a dtest (`partitioner_tests.py`) which performs an unsupported upgrade scenario - it starts Scylla from Cassandra (!) work directories, which is like upgrading from a very old version of Scylla. This dtest was not failing due to another bug which masked the problem. When we try to fix the bug - see #11225 - the dtest starts hitting the assertion in `check_exists`. Because it's a test, we configure `on_internal_error` to crash the system. The point of this commit is to not crash the system in this rare scenario which happens only in some weird tests. We now throw `std::runtime_error` instead of calling `on_internal_error`. In the dtest, we already ignore the resulting CDC error appearing in the logs (see scylladb/scylla-dtest#2804). Together with this change, we'll be able to fix the #11225 bug and pass this test. Closes #11287	2022-08-14 13:12:03 +03:00
Piotr Sarna	fe617ed198	Merge 'db/system_keyspace: in system.local, use broadcast_rpc_address in rpc_address column' from Piotr Dulikowski Previously, the `system.local`'s `rpc_address` column kept local node's `rpc_address` from the scylla.yaml configuration. Although it sounds like it makes sense, there are a few reasons to change it to the value of scylla.yaml's `broadcast_rpc_address`: - The `broadcast_rpc_address` is the address that the drivers are supposed to connect to. `rpc_address` is the address that the node binds to - it can be set for example to 0.0.0.0 so that Scylla listens on all addresses, however this gives no useful information to the driver. - The `system.peers` table also has the `rpc_address` column and it already keeps other nodes' `broadcast_rpc_address`es. - Cassandra is going to do the same change in the upcoming version 4.1. Fixes: #11201 Closes #11204 * github.com:scylladb/scylladb: db/system_keyspace: fix indentation after previous patch db/system_keyspace: in system.local, use broadcast_rpc_address in rpc_address column	2022-08-12 16:24:28 +02:00
Piotr Sarna	1ab4c6aab3	Merge 'cql3: enable collections as UDA accumulators' from Wojciech Mitros Currently, the initial values of UDA accumulators are converted to strings using the to_string() method and from strings using the from_string() method. The from_string() method is not implemented for collections, and it can't be implemented without changing the string format, because in that format, we cannot differentiate whether a separator is a part of a value or is an actual separator between values. In particular, the separators are not escaped in the collection values. Instead of from_string()/to_string() the cql parser is used for creating a value from a string (the same , and to_parsable_string() is used to converting a value into a string. A test using a list as an accumulator is added to cql-pytest/test_uda.py. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com> Closes #11250 * github.com:scylladb/scylladb: cql3: enable collections as UDA accumulators cql3: extend implementation of to_bytes for raw_value	2022-08-12 12:51:17 +02:00
Botond Dénes	ceb1cdcb7a	Merge 'doc: fix the typo on the Fault Tolerance page' from Anna Stuchlik Fix https://github.com/scylladb/scylla-doc-issues/issues/438 In addition, I've replaced "Scylla" with "ScyllaDB" on that page. Closes #11281 * github.com:scylladb/scylladb: doc: replace Scylla with ScyllaDB on the Fault Tolerance page doc: fis the typo in the note	2022-08-12 06:58:39 +03:00
Nadav Har'El	c27f431580	test/alternator: fix a flaky test for full-table scan page size This patch fixes the test test_scan.py::test_scan_paging_missing_limit which failed in a Jenkins run once (that we know of). That test verifies that an Alternator Scan operation without an explicit "Limit" is nevertheless paged: DynamoDB (and also Scylla) wanted this page size to be 1 MB, but it turns out (see #10327) that because of the details of how Scylla's scan works, the page size can be larger than 1 MB. How much larger? I ran this test hundreds of times and never saw it exceed a 3 MB page - so the test asserted the page must be smaller than 4 MB. But now in one run - we got to this 4 MB and failed the test. So in this patch we increase the table to be scanned from 4 MB to 6 MB, and assert the page size isn't the full 6 MB. The chance that this size will eventually fail as well should be (famous last words...) very small for two reasons: First because 6 MB is even higher than the maximum I saw in practice, and second because empirically I noticed that adding more data to the table reduces the variance of the page size, so it should become closer to 1 MB and reduce the chance of it reaching 6 MB. Refs #10327 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11280	2022-08-12 06:57:45 +03:00
Botond Dénes	2a39d6518d	Merge 'doc: clarify the disclaimer about reusing deleted counter column values' from Anna Stuchlik Fix https://github.com/scylladb/scylla-doc-issues/issues/857 Closes #11253 * github.com:scylladb/scylladb: doc: language improvemens to the Counrers page doc: fix the external link doc: clarify the disclaimer about reusing deleted counter column values	2022-08-12 06:56:28 +03:00
Botond Dénes	10371441c9	Merge 'docs: add a disclaimer about not supporting local counters by SSTableLoader' from Anna Stuchlik Fix https://github.com/scylladb/scylla-doc-issues/issues/867 Plus some language, formatting, and organization improvements. Closes #11248 * github.com:scylladb/scylladb: doc: language, formatting, and organization improvements doc: add a disclaimer about not supporting local counters by SSTableLoader	2022-08-12 06:55:00 +03:00
Benny Halevy	d295d8e280	everywhere: define locator::host_id as a strong tagged_uuid type So it can be distinguished from other uuid-based identifiers in the system. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11276	2022-08-12 06:01:44 +03:00
Botond Dénes	69aea59d97	Merge 'storage_proxy: use consistent topology, prepare for fencing' from Avi Kivity Replication is a mix of several inputs: tokens and token->node mappings (topology), the replication strategy, replication strategy parameters. These are all captured in effective_replication_map. However, if we use effective_replication_map:s captured at different times in a single query, then different uses may see different inputs to effective_replication_map. This series protects against that by capturing an effective_replication_map just once in a query, and then using it. Furthermore, the captured effective_replication_map is held until the query completes, so topology code can know when a topology is no longer in use (although this isn't exploited in this series). Only the simple read and write paths are covered. Counters and paxos are left for later. I don't think the series fixes any bugs - as far as I could tell everything was happening in the same continuation. But this series ensures it. Closes #11259 * github.com:scylladb/scylladb: storage_proxy: use consistent topology storage_proxy: use consistent replication map on read path storage_proxy: use consistent replication map on write path storage_proxy: convert get_live{,_sorted}_endpoints() to accept an effective_replication_map consistency_level: accept effective_replication_map as parameter, rather than keyspace consistency_level: be more const when using replication_strategy	2022-08-12 06:00:30 +03:00
Alejo Sanchez	10baac1c84	test.py: test topology and schema changes Add support for topology changes: add/stop/remove/restart/replace node. Test simple schema changes when changing topology. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-11 23:39:13 +02:00
Alejo Sanchez	7f32fc0cc7	test.py: ClusterManager API mark cluster dirty Allow tests to manually mark current cluster dirty. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-11 23:39:13 +02:00
Alejo Sanchez	a585a82ad1	test.py: call before/after_test for each test case Preparing for topology tests with changing clusters, run before and after checks per test case. Change scope of pytest fixtures to function as we need them per test case. Add server and client API logic. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-11 23:39:13 +02:00
Alejo Sanchez	eedc866433	test.py: handle driver connection in ManagerClient Preparing for cluster cycling, handle driver connection in ManagerClient. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-11 23:39:13 +02:00
Alejo Sanchez	fe561a7dbd	test.py: ClusterManager API and ManagerClient Add an API via Unix socket to Manager so pytests can query information about the cluster. Requests are managed by ManagerClient helper class. The socket is placed inside a unique temporary directory for the Manager (as safe temporary socket filename is not possible in Python). Initial API services are manager up, cluster up, if cluster is dirty, cql port, configured replicas (RF), and list of host ids. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-11 23:39:13 +02:00
Alejo Sanchez	aad015d4e2	test.py: improve topology docstring Improve docstring of TopologyTestSuite to reflect its differences with other test suites. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-11 23:39:13 +02:00
Avi Kivity	a2c4f5aa1a	storage_proxy: use consistent topology Derive the topology from captured and stable effective_replication_map instead of getting a fresh topology from storage_proxy, since the fresh topology may be inconsistent with the running query. digest_read_resolver did not capture an effective_replication_map, so that is added.	2022-08-11 17:58:42 +03:00
Avi Kivity	883518697b	storage_proxy: use consistent replication map on read path Capture a replication map just once in abstract_read_executor::_effective_replication_map_ptr. Although it isn't used yet, it serves to keep a reference count on topology (for fencing), and some accesses to topology within reads still remain, which can be converted to use the member in a later patch.	2022-08-11 17:58:42 +03:00
Avi Kivity	01a614fb4d	storage_proxy: use consistent replication map on write path Capture a replication map just once in abstract_write_handler::_effective_replication_map_ptr and use it in all write handlers. A few accesses to get the topology still remain, they will be fixed up in a later patch.	2022-08-11 17:58:42 +03:00
Avi Kivity	f1b0e3d58e	storage_proxy: convert get_live{,_sorted}_endpoints() to accept an effective_replication_map Allow callers to use consistent effective_replication_map:s across calls by letting the caller select the object to use.	2022-08-11 17:58:42 +03:00
Avi Kivity	46bd0b1e62	consistency_level: accept effective_replication_map as parameter, rather than keyspace A keyspace is a mutable object that can change from time to time. An effective_replication_map captures the state of a keyspace at a point in time and can therefore be consistent (with care from the caller). Change consistency_level's functions to accept an effective_replication_map. This allows the caller to ensure that separate calls use the same information and are consistent with each other. Current callers are likely correct since they are called from one continuation, but it's better to be sure.	2022-08-11 17:58:42 +03:00
Avi Kivity	1078d1bfda	consistency_level: be more const when using replication_strategy We don't modify the replication_strategy here, so use const. This will help when the object we get is const itself, as it will be in the next patches.	2022-08-11 17:58:42 +03:00
Wojciech Mitros	48bd752971	cql3: enable collections as UDA accumulators Currently, the initial values of UDA accumulators are converted to strings using the to_string() method and from strings using the from_string() method. The from_string() method is not implemented for collections, and it can't be implemented without changing the string format, because in that format, we cannot differentiate whether a separator is a part of a value or is an actual separator between values. In particular, the separators are not escaped in the collection values. For example, a list with string elements: 'a, b', 'c' would be represented as a string 'a, b, c', while now it is represented as "['a, b', 'c']". Some types that were parsable are now represented in a different way. For example, a tuple ('a', null, 0) was represented as "a:\@:0", and now it is "('a', null, 0)". Instead of from_string()/to_string() the cql parser is used for creating a value from a string (the same , and to_parsable_string() is used to converting a value into a string. A test using a list as an accumulator is added to cql-pytest/test_uda.py. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2022-08-11 16:23:57 +02:00
Anna Stuchlik	f5a49688ae	doc: replace Scylla with ScyllaDB on the Fault Tolerance page	2022-08-11 16:14:33 +02:00
Anna Stuchlik	7218a977df	doc: fis the typo in the note	2022-08-11 16:09:49 +02:00
Botond Dénes	d407d3b480	Merge 'Calculate effective_replication_map: prevent stalls with everywhere_replication_strategy' from Benny Halevy For replication strategies like "everywhere" and "local" that return the same set of endpoints for all tokens, we can call rs->calculate_natural_endpoints only once and reuse the result for all tokens. Note that ideally the replication_map could contain only a single token range for this case, but that doesn't seem to work yet. Add `maybe_yield()` calls to the tight loop to prevent reactor stalls on large clusters when copying a long vector returned by everywhere_replication_strategy to potentially 1000's of tokens in the map. Nicholas Peshek wrote in https://github.com/scylladb/scylladb/issues/10337#issuecomment-1211152370 about similar patch by Geoffrey Beausire: `994c6ecf3c` > Yep. That dropped our startup from 3000+ seconds to about 40. Fixes #10337 Closes #11277 * github.com:scylladb/scylladb: abstract_replication_strategy: calculate_effective_replication_map: optimize for static replication strategies abstract_replication_strategy: add has_uniform_natural_endpoints	2022-08-11 15:19:47 +03:00
Anna Stuchlik	1603129275	doc: remove the reduntant space from index	2022-08-11 12:36:16 +02:00
Anna Stuchlik	ee258cb0af	doc: update the syntax for defining service level attributes	2022-08-11 12:32:38 +02:00
Petr Gusev	4bc6611829	raft read_barrier, retry over intermittent rpc failures If the leader was unavailable during read_barrier, closed_error occurs, which was not handled in any way and eventually reached the client. This patch adds retries in this case. Fix: scylladb#11262 Refs: #11278 Closes #11263	2022-08-11 13:31:19 +03:00
Amnon Heiman	5ac20ac861	Reduce the number of per-scheduling group metrics This patch reduces the number of metrics ScyllaDB generates. Motivation: The combination of per-shard with per-scheduling group generates a lot of metrics. When combined with histograms, which require many metrics, the problem becomes even bigger. The two tools we are going to use: 1. Replace per-shard histograms with summaries 2. Do not report unused metrics. The storage_proxy stats holds information for the API and the metrics layer. We replaced timed_rate_moving_average_and_histogram and time_estimated_histogram with the unified timed_rate_moving_average_summary_and_histogram which gives us an option to report per-shard summaries instead of histograms. All the counters, histograms, and summaries were marked as skip_when_empty. The API was modified to use timed_rate_moving_average_summary_and_histogram. Closes #11173	2022-08-11 13:31:19 +03:00


32669 Commits