Test schema changes when there was an underlying topology change.
- per test case checks of cluster health and cycling
- helper class to do cluster manager API requests
- tests can perform topology changes: stop/start/restart servers
- modified clusters are marked dirty and discarded after the test case
- cql connection is updated per topology change and per cluster change
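The dirty-cluster handling in the list above can be pictured with a minimal sketch. `Cluster`, `ClusterPool`, `get` and `put` are hypothetical names used only for illustration, not the test.py API; the sketch only shows that a cluster touched by a topology change is discarded instead of being handed to the next test case.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List

_ids = count(1)


@dataclass
class Cluster:
    """Hypothetical stand-in for a running test cluster."""
    cluster_id: int = field(default_factory=lambda: next(_ids))
    is_dirty: bool = False

    def stop_server(self) -> None:
        # Any topology change makes the cluster unsafe to share.
        self.is_dirty = True


class ClusterPool:
    """Hands out clusters and drops the ones a test case left dirty."""

    def __init__(self) -> None:
        self._free: List[Cluster] = []

    def get(self) -> Cluster:
        return self._free.pop() if self._free else Cluster()

    def put(self, cluster: Cluster) -> None:
        if not cluster.is_dirty:
            self._free.append(cluster)  # clean clusters are reused


pool = ClusterPool()
first = pool.get()
pool.put(first)        # clean: goes back to the pool
reused = pool.get()    # the same cluster is handed out again
reused.stop_server()   # a topology change marks it dirty
pool.put(reused)       # dirty: discarded
fresh = pool.get()     # a new cluster is created
```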
Closes #11266
* github.com:scylladb/scylladb:
test.py: test topology and schema changes
test.py: ClusterManager API mark cluster dirty
test.py: call before/after_test for each test case
test.py: handle driver connection in ManagerClient
test.py: ClusterManager API and ManagerClient
test.py: improve topology docstring
Currently, when detaching the table from the database, we force-evict all queriers for said table. This series broadens the scope of this force-evict to include all inactive reads registered at the semaphore. This ensures that any regular inactive read "forgotten" for any reason in the semaphore, will not end up in said readers accessing a dangling table reference when destroyed later.
Fixes: https://github.com/scylladb/scylladb/issues/11264
Closes #11273
* github.com:scylladb/scylladb:
querier: querier_cache: remove now unused evict_all_for_table()
database: detach_column_family(): use reader_concurrency_semaphore::evict_inactive_reads_for_table()
reader_concurrency_semaphore: add evict_inactive_reads_for_table()
It should have had one, derived instances are stored and destroyed via
the base-class. The only reason this hasn't caused bugs yet is that
derived instances happen to not have any non-trivial members yet.
Closes #11293
A mixed bag of improvements developed as part of another PR (https://github.com/scylladb/scylladb/pull/10736). Said PR was closed so I'm submitting these improvements separately.
Closes #11294
* github.com:scylladb/scylladb:
test/lib: move convenience table config factory to sstable_test_env
test/lib/sstable_test_env: move members to impl struct
test/lib/sstable_utils: use test_env::do_with_async()
Instead of querier_cache::evict_all_for_table(). The new method covers
all queriers and, in addition, any other inactive reads registered on the
semaphore. In theory, by the time we detach a table, no regular inactive
reads should be in the semaphore anymore, but if any remain, we had
better evict them before the table is destroyed, as they might attempt to
access it when destroyed later.
All users of `column_family_test_config()`, get the semaphore parameter
for it from `sstable_test_env`. It is clear that the latter serves as
the storage space for stable objects required by the table config. This
patch just enshrines this fact by moving the config factory method to
`sstable_test_env`, so it can just get what it needs from members.
All present members of sstable_test_env are std::unique_ptr<>:s because
they require stable addresses. This makes their handling somewhat
awkward. Move all of them into an internal `struct impl` and make that
member a unique ptr.
Fixes #11184
Fixes #11237
In the previous (broken) fix for https://github.com/scylladb/scylladb/issues/11184 we added the footprint for left-over
files (replay candidates) to disk footprint on commitlog init.
This effectively prevents us from creating segments iff we have tight limits. Since we nowadays do quite a bit of inserts _before_ commitlog replay (system.local, but...) we can end up in a situation where we deadlock start because we cannot get to the actual replay that will eventually free things.
Another, not thought through, consequence is that we add a single footprint to _all_ commitlog shard instances - even though only shard 0 will get to actually replay + delete (i.e. drop footprint).
So shards 1-X would all be either locked out or performance degraded.
Simplest fix is to add the footprint in delete call instead. This will lock out segment creation until delete call is done, but this is fast. Also ensures that only replay shard is involved.
To further emphasize this, don't store segments found on init scan in all shard instances,
instead retrieve (based on low time-pos for current gen) when required. This changes very little, but we at least don't store
pointless string lists in shards 1 to X, and also we can potentially ask for the list twice.
More to the point, goes better hand-in-hand with the semantics of "delete_segments", where any file sent in is
considered candidate for recycling, and included in footprint.
Closes #11251
* github.com:scylladb/scylladb:
commitlog: Make get_segments_to_replay on-demand
commitlog: Revert/modify fac2bc4 - do footprint add in delete
Fix https://github.com/scylladb/scylladb/issues/11197
This PR adds a new page where specifying workload attributes with service levels is described and adds it to the menu.
Also, I had to fix some links because of the warnings.
Closes #11209
* github.com:scylladb/scylladb:
doc: remove the reduntant space from index
doc: update the syntax for defining service level attributes
doc: rewording
doc: update the links to fix the warnings
doc: add the new page to the toctree
doc: add the descrption of specifying workload attributes with service levels
doc: add the definition of workloads to the glossary
In preparation for effective_replication_map hygiene, convert
some counter functions to coroutines to simplify the changes.
Closes #11291
* github.com:scylladb/scylladb:
storage_proxy: mutate_counters_on_leader: coroutinize
storage_proxy: mutate_counters: coroutinize
storage_proxy: mutate_counters: reorganize error handling
Simplify ahead of refactoring for consistent effective_replication_map.
This is probably a pessimization of the error case, but the error case
will be terrible in any case unless we resultify it.
Move the error handling function where it's used so the code
is more straightforward.
Due to some std::move()s later, we must still capture the schema early.
Move the termination condition to the front of the loop so it's
clear why we're looping and when we stop.
It's less than perfectly clean since we widen the scope of some variables
(from loop-internal to loop-carried), but IMO it's clearer.
It's much easier to maintain this way. Since it uses ranges_to_vnodes,
it interacts with topology and needs integration into
effective_replication_map management.
The patch leaves bad indentation and an infinite-looking loop in
the interest of minimization, but that will be corrected later.
Note, the test for `!r.has_value()` was eliminated since it was
short-circuited by the test for `!rqr.has_value()` returning from
the coroutine rather than propagating an error.
We use result_wrap() in two places, but that makes coroutinizing the
containing function a little harder, since it's composed of more lambdas.
Remove the wrappers, gaining a bit of performance in the error case.
The function `check_exists` checks whether a given table exists, giving
an error otherwise. It previously used `on_internal_error`.
`check_exists` is used in some old functions that insert CDC metadata to
CDC tables. These tables are no longer used in newer Scylla versions
(they were replaced with other tables with different schema), and this
function is no longer called. The table definitions were removed and
these tables are no longer created. They will only exist in clusters
that were upgraded from old versions of Scylla (4.3) through a sequence
of upgrades.
If you tried to upgrade from a very old version of Scylla which had
neither the old nor the new tables to a modern version, say from 4.2 to
5.0, you would get `on_internal_error` from this `check_exists`
function. Fortunately:
1. we don't support such upgrade paths
2. `on_internal_error` in production clusters does not crash the system,
only throws. The exception would be caught, printed, and the system
would run (just without CDC - until you finished upgrade and called
the proper nodetool command to fix the CDC module).
Unfortunately, there is a dtest (`partitioner_tests.py`) which performs
an unsupported upgrade scenario - it starts Scylla from Cassandra (!)
work directories, which is like upgrading from a very old version of
Scylla.
This dtest was not failing due to another bug which masked the problem.
When we try to fix the bug - see #11225 - the dtest starts hitting the
assertion in `check_exists`. Because it's a test, we configure
`on_internal_error` to crash the system.
The point of this commit is to not crash the system in this rare
scenario which happens only in some weird tests. We now throw
`std::runtime_error` instead of calling `on_internal_error`. In the
dtest, we already ignore the resulting CDC error appearing in the logs
(see scylladb/scylla-dtest#2804). Together with this change, we'll be
able to fix the #11225 bug and pass this test.
Closes #11287
Previously, the `system.local`'s `rpc_address` column kept local node's
`rpc_address` from the scylla.yaml configuration. Although it sounds
like it makes sense, there are a few reasons to change it to the value
of scylla.yaml's `broadcast_rpc_address`:
- The `broadcast_rpc_address` is the address that the drivers are
supposed to connect to. `rpc_address` is the address that the node
binds to - it can be set for example to 0.0.0.0 so that Scylla listens
on all addresses, however this gives no useful information to the
driver.
- The `system.peers` table also has the `rpc_address` column and it
already keeps other nodes' `broadcast_rpc_address`es.
- Cassandra is going to do the same change in the upcoming version 4.1.
Fixes: #11201
Closes #11204
* github.com:scylladb/scylladb:
db/system_keyspace: fix indentation after previous patch
db/system_keyspace: in system.local, use broadcast_rpc_address in rpc_address column
Currently, the initial values of UDA accumulators are converted
to strings using the to_string() method and from strings using the
from_string() method. The from_string() method is not implemented
for collections, and it can't be implemented without changing the
string format, because in that format, we cannot differentiate
whether a separator is a part of a value or is an actual separator
between values. In particular, the separators are not escaped
in the collection values.
Instead of from_string()/to_string(), the cql parser is used
for creating a value from a string, and to_parsable_string()
is used for converting a value into a string.
A test using a list as an accumulator is added to
cql-pytest/test_uda.py.
Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
Closes #11250
* github.com:scylladb/scylladb:
cql3: enable collections as UDA accumulators
cql3: extend implementation of to_bytes for raw_value
Fix https://github.com/scylladb/scylla-doc-issues/issues/438
In addition, I've replaced "Scylla" with "ScyllaDB" on that page.
Closes #11281
* github.com:scylladb/scylladb:
doc: replace Scylla with ScyllaDB on the Fault Tolerance page
doc: fis the typo in the note
This patch fixes the test test_scan.py::test_scan_paging_missing_limit
which failed in a Jenkins run once (that we know of).
That test verifies that an Alternator Scan operation *without* an explicit
"Limit" is nevertheless paged: DynamoDB (and also Scylla) wanted this page
size to be 1 MB, but it turns out (see #10327) that because of the details
of how Scylla's scan works, the page size can be larger than 1 MB. How much
larger? I ran this test hundreds of times and never saw it exceed a 3 MB
page - so the test asserted the page must be smaller than 4 MB. But now
in one run - we got to this 4 MB and failed the test.
So in this patch we increase the table to be scanned from 4 MB to 6 MB,
and assert the page size isn't the full 6 MB. The chance that this size will
eventually fail as well should be (famous last words...) very small for
two reasons: First because 6 MB is even higher than the maximum I saw
in practice, and second because empirically I noticed that adding more
data to the table reduces the variance of the page size, so it should
become closer to 1 MB and reduce the chance of it reaching 6 MB.
Refs #10327
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #11280
Fix https://github.com/scylladb/scylla-doc-issues/issues/857
Closes #11253
* github.com:scylladb/scylladb:
doc: language improvemens to the Counrers page
doc: fix the external link
doc: clarify the disclaimer about reusing deleted counter column values
Fix https://github.com/scylladb/scylla-doc-issues/issues/867
Plus some language, formatting, and organization improvements.
Closes #11248
* github.com:scylladb/scylladb:
doc: language, formatting, and organization improvements
doc: add a disclaimer about not supporting local counters by SSTableLoader
Replication is a mix of several inputs: tokens and token->node mappings (topology),
the replication strategy, replication strategy parameters. These are all captured
in effective_replication_map.
However, if we use effective_replication_map:s captured at different times in a single
query, then different uses may see different inputs to effective_replication_map.
This series protects against that by capturing an effective_replication_map just
once in a query, and then using it. Furthermore, the captured effective_replication_map
is held until the query completes, so topology code can know when a topology is no
longer in use (although this isn't exploited in this series).
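The capture-once idea above can be illustrated with a minimal sketch. `ReplicationMapSnapshot`, `Keyspace`, `get_snapshot`, `change_topology` and `run_query` are hypothetical names, not the storage_proxy code; the sketch only shows that a query holding one immutable snapshot sees the same inputs at every step, even if the topology changes in the middle.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class ReplicationMapSnapshot:
    """Hypothetical immutable token -> replicas view at one point in time."""
    version: int
    replicas: Tuple[Tuple[int, Tuple[str, ...]], ...]

    def endpoints_for(self, token: int) -> Tuple[str, ...]:
        return dict(self.replicas)[token]


class Keyspace:
    """Mutable owner; each topology change is visible only in new snapshots."""

    def __init__(self) -> None:
        self._version = 0
        self._map: Dict[int, Tuple[str, ...]] = {0: ("n1", "n2")}

    def get_snapshot(self) -> ReplicationMapSnapshot:
        return ReplicationMapSnapshot(self._version, tuple(self._map.items()))

    def change_topology(self, token: int, nodes: Sequence[str]) -> None:
        self._version += 1
        self._map[token] = tuple(nodes)


def run_query(ks: Keyspace, token: int, mid_query_hook: Callable[[], None]):
    erm = ks.get_snapshot()           # captured once, at query start
    targets = erm.endpoints_for(token)
    mid_query_hook()                  # the topology may change here
    live = erm.endpoints_for(token)   # still derived from the same inputs
    return targets, live


ks = Keyspace()
targets, live = run_query(ks, 0, lambda: ks.change_topology(0, ["n3", "n4"]))
after = ks.get_snapshot().endpoints_for(0)  # a later query sees the change
```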
Only the simple read and write paths are covered. Counters and paxos are left for
later.
I don't think the series fixes any bugs - as far as I could tell everything was happening
in the same continuation. But this series ensures it.
Closes #11259
* github.com:scylladb/scylladb:
storage_proxy: use consistent topology
storage_proxy: use consistent replication map on read path
storage_proxy: use consistent replication map on write path
storage_proxy: convert get_live{,_sorted}_endpoints() to accept an effective_replication_map
consistency_level: accept effective_replication_map as parameter, rather than keyspace
consistency_level: be more const when using replication_strategy
Add support for topology changes: add/stop/remove/restart/replace node.
Test simple schema changes when changing topology.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Preparing for topology tests with changing clusters, run before and
after checks per test case.
Change scope of pytest fixtures to function as we need them per test
case.
Add server and client API logic.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Add an API via Unix socket to Manager so pytests can query information
about the cluster. Requests are managed by ManagerClient helper class.
The socket is placed inside a unique temporary directory for the
Manager (as a safe temporary socket filename is not possible in Python).
Initial API services are manager up, cluster up, if cluster is dirty,
cql port, configured replicas (RF), and list of host ids.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Derive the topology from captured and stable effective_replication_map
instead of getting a fresh topology from storage_proxy, since the
fresh topology may be inconsistent with the running query.
digest_read_resolver did not capture an effective_replication_map, so
that is added.
Capture a replication map just once in
abstract_read_executor::_effective_replication_map_ptr. Although it isn't
used yet, it serves to keep a reference count on topology (for fencing),
and some accesses to topology within reads still remain, which can be
converted to use the member in a later patch.
Capture a replication map just once in
abstract_write_handler::_effective_replication_map_ptr and use it
in all write handlers. A few accesses to get the topology still remain,
they will be fixed up in a later patch.
A keyspace is a mutable object that can change from time to time. An
effective_replication_map captures the state of a keyspace at a point in
time and can therefore be consistent (with care from the caller).
Change consistency_level's functions to accept an effective_replication_map.
This allows the caller to ensure that separate calls use the same
information and are consistent with each other.
Current callers are likely correct since they are called from one
continuation, but it's better to be sure.
Currently, the initial values of UDA accumulators are converted
to strings using the to_string() method and from strings using the
from_string() method. The from_string() method is not implemented
for collections, and it can't be implemented without changing the
string format, because in that format, we cannot differentiate
whether a separator is a part of a value or is an actual separator
between values. In particular, the separators are not escaped
in the collection values. For example, a list with string elements:
'a, b', 'c' would be represented as a string 'a, b, c', while now
it is represented as "['a, b', 'c']".
Some types that were parsable are now represented in a different
way. For example, a tuple ('a', null, 0) was represented as
"a:\@:0", and now it is "('a', null, 0)".
Instead of from_string()/to_string(), the cql parser is used
for creating a value from a string, and to_parsable_string()
is used for converting a value into a string.
A test using a list as an accumulator is added to
cql-pytest/test_uda.py.
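The separator ambiguity described above can be reproduced in a few lines. This is only an illustration: Python's `repr()` and `ast.literal_eval()` stand in for a parsable literal format and its parser, and are not the CQL parser or `to_parsable_string()`.

```python
import ast

values = ["a, b", "c"]

# Old-style rendering: elements joined by an unescaped separator.
flat = ", ".join(values)
parsed_flat = flat.split(", ")      # the element boundaries are lost

# Literal-style rendering: quotes delimit every element.
literal = repr(values)
parsed_literal = ast.literal_eval(literal)

print(flat)            # a, b, c
print(parsed_flat)     # ['a', 'b', 'c']
print(literal)         # ['a, b', 'c']
print(parsed_literal)  # ['a, b', 'c']
```

The flat form parses back into three elements instead of two, while the literal form round-trips to the original list.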
Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
For replication strategies like "everywhere"
and "local" that return the same set of endpoints
for all tokens, we can call rs->calculate_natural_endpoints
only once and reuse the result for all tokens.
Note that ideally the replication_map could contain only
a single token range for this case, but that doesn't seem to work yet.
Add `maybe_yield()` calls to the tight loop
to prevent reactor stalls on large clusters when copying
a long vector returned by everywhere_replication_strategy
to potentially 1000's of tokens in the map.
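The optimization above can be sketched as follows. This is not the C++ implementation: `build_map` and `everywhere_endpoints` are hypothetical, and `await asyncio.sleep(0)` merely stands in for `maybe_yield()` by giving the event loop a chance to run other tasks between copies.

```python
import asyncio
from typing import Callable, Dict, List, Optional, Sequence

calls = 0  # counts how often the (potentially expensive) calculation runs


def everywhere_endpoints(token: int) -> List[str]:
    """Hypothetical strategy: the same endpoints for every token."""
    global calls
    calls += 1
    return ["n1", "n2", "n3"]


async def build_map(
    tokens: Sequence[int],
    calculate_natural_endpoints: Callable[[int], List[str]],
    has_uniform_natural_endpoints: bool,
) -> Dict[int, List[str]]:
    result: Dict[int, List[str]] = {}
    shared: Optional[List[str]] = None
    if has_uniform_natural_endpoints and tokens:
        shared = calculate_natural_endpoints(tokens[0])  # computed once
    for token in tokens:
        if shared is not None:
            result[token] = list(shared)  # copy the shared result
        else:
            result[token] = calculate_natural_endpoints(token)
        await asyncio.sleep(0)  # stand-in for maybe_yield()
    return result


replication_map = asyncio.run(
    build_map(list(range(1000)), everywhere_endpoints, True)
)
```

With a uniform strategy the calculation runs once regardless of the number of tokens; only the copy is repeated per token.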
Nicholas Peshek wrote in
https://github.com/scylladb/scylladb/issues/10337#issuecomment-1211152370
about similar patch by Geoffrey Beausire:
994c6ecf3c
> Yep. That dropped our startup from 3000+ seconds to about 40.
Fixes #10337
Closes #11277
* github.com:scylladb/scylladb:
abstract_replication_strategy: calculate_effective_replication_map: optimize for static replication strategies
abstract_replication_strategy: add has_uniform_natural_endpoints
If the leader was unavailable during read_barrier,
closed_error occurs, which was not handled in any way
and eventually reached the client. This patch adds retries in this case.
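A retry of this kind can be sketched as below. The bounded attempt count, the delay and the names `ClosedError` and `read_barrier_with_retry` are assumptions made for illustration; the patch itself only states that retries are added when the leader is unavailable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ClosedError(Exception):
    """Hypothetical stand-in for the closed_error seen during read_barrier."""


def read_barrier_with_retry(
    read_barrier: Callable[[], T],
    max_attempts: int = 5,   # assumption: the real retry policy may differ
    delay: float = 0.0,
) -> T:
    for attempt in range(1, max_attempts + 1):
        try:
            return read_barrier()
        except ClosedError:
            if attempt == max_attempts:
                raise            # give up and surface the error
            time.sleep(delay)    # wait before asking again
    raise AssertionError("unreachable")


attempts = 0


def flaky_barrier() -> str:
    """Fails twice, as if the leader were briefly unavailable."""
    global attempts
    attempts += 1
    if attempts < 3:
        raise ClosedError("leader unavailable")
    return "ok"


result = read_barrier_with_retry(flaky_barrier)
```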
Fix: scylladb#11262
Refs: #11278
Closes #11263
This patch reduces the number of metrics ScyllaDB generates.
Motivation: The combination of per-shard with per-scheduling group
generates a lot of metrics. When combined with histograms, which require
many metrics, the problem becomes even bigger.
The two tools we are going to use:
1. Replace per-shard histograms with summaries
2. Do not report unused metrics.
The storage_proxy stats hold information for the API and the metrics
layer. We replaced timed_rate_moving_average_and_histogram and
time_estimated_histogram with the unified
timed_rate_moving_average_summary_and_histogram, which gives us an option
to report per-shard summaries instead of histograms.
All the counters, histograms, and summaries were marked as
skip_when_empty.
The API was modified to use
timed_rate_moving_average_summary_and_histogram.
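The effect of marking metrics as skip_when_empty can be pictured with a small sketch. `Metric` and `report` are hypothetical and do not reflect the actual metrics layer; the sketch only shows that a metric that was never updated is left out of the report.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Metric:
    """Hypothetical metric with a per-metric skip_when_empty flag."""
    name: str
    value: int = 0
    skip_when_empty: bool = False


def report(metrics: Iterable[Metric]) -> Dict[str, int]:
    # Unused (zero-valued) metrics marked skip_when_empty are not reported.
    return {
        m.name: m.value
        for m in metrics
        if not (m.skip_when_empty and m.value == 0)
    }


metrics = [
    Metric("reads", 42, skip_when_empty=True),
    Metric("cas_writes", 0, skip_when_empty=True),  # never used: skipped
    Metric("always_reported", 0),                   # flag not set: kept
]
reported = report(metrics)
```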
Closes #11173