This code is now spread over main and differs in cql_test_env. The PR unifies both places and makes the manager start-stop look standard
refs: #2795Closes#15375
* github.com:scylladb/scylladb:
batchlog_manager: Remove start() method
batchlog_manager: Start replay loop in constructor
main, cql_test_env: Start-stop batchlog manager in one "block"
batchlog_manager: Move shard-0 check into batchlog_replay_loop()
batchlog_manager: Fix drain() reentrability
This reverts commit 628e6ffd33, reversing
changes made to 45ec76cfbf.
The test included with this PR is flaky and often breaks CI.
Revert while a fix is found.
Fixes: #15371
Currently starting and stopping of b.m. is spread over main(). Keep it
close to each other.
Another trickery here is that calling b.m.::start() can only be done
after joining the cluster, because this start() spawns replay loop
which, in turn calls token_metadata::count_normal_token_owners() and if
the latter returns zero, the b.m. code uses it as a fraction denominator
and crashes.
With the above in mind, cql_test_env should start batchlog manager after
it "joins the ring" too. For now it doesn't make any difference, but
next patch will make use of it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently minio starts with a bucket that has public anonymous access. Respectively, all tests use unsigned S3 requests. That was done for simplicity, and its better to apply some policy to the bucket and, consequentially, make tests sign their requests.
Other than the obvious benefit that we test requests signing in unit tests, another goal of this PR is to make it possible to simulate and test various error paths locally, e.g. #13745 and #13022Closes#14525
* github.com:scylladb/scylladb:
test/s3: Remove AWS_S3_EXTRA usage
test/s3: Run tests over non-anonymous bucket
test/minio: Create random temp user on start
code: Rename S3_PUBLIC_BUCKET_FOR_TEST
This is a workaround for the flakiness of the test where INSERT
statements following the rolling restart fail with "No host available"
exception. The hypothesis is that those INSERTS race with driver
reconnecting to the cluster and if INSERTs are attempted before
reconnection is finished, the driver will refuse to execute the
statements.
The real fix should be in the driver to join with reconnections but
before that is ready we want to fix CI flakiness.
Refs #14746Closes#15355
Currently, mutation query on replica side will not respond with a result which doesn't have at least one live row. This causes problems if there is a lot of dead rows or partitions before we reach a live row, which stem from the fact that resulting reconcilable_result will be large:
1. Large allocations. Serialization of reconcilable_result causes large allocations for storing result rows in std::deque
2. Reactor stalls. Serialization of reconcilable_result on the replica side and on the coordinator side causes reactor stalls. This impacts not only the query at hand. For 1M dead rows, freezing takes 130ms, unfreezing takes 500ms. Coordinator does multiple freezes and unfreezes. The reactor stall on the coordinator side is >5s
3. Too large repair mutations. If reconciliation works on large pages, repair may fail due to too large mutation size. 1M dead rows is already too much: Refs https://github.com/scylladb/scylladb/issues/9111.
This patch fixes all of the above by making mutation reads respect the memory accounter's limit for the page size, even for dead rows.
This patch also addresses the problem of client-side timeouts during paging. Reconciling queries processing long strings of tombstones will now properly page tombstones,like regular queries do.
My testing shows that this solution even increases efficiency. I tested with a cluster of 2 nodes, and a table of RF=2. The data layout was as follows (1 partition):
* Node1: 1 live row, 1M dead rows
* Node2: 1M dead rows, 1 live row
This was designed to trigger reconciliation right from the very start of the query.
Before:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 140.0633503ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 66.7195275ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 873.5400742ms, pages: 2, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```
After:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 136.9035122ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 69.5286021ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 162.6239498ms, pages: 100, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```
Non-reconciling queries have almost identical duration (1 few ms changes can be observed between runs). Note how in the after case, the reconciling read also produces 100 pages, vs. just 2 pages in the before case, leading to a much lower duration (less than 1/4 of the before).
Refs https://github.com/scylladb/scylladb/issues/7929
Refs https://github.com/scylladb/scylladb/issues/3672
Refs https://github.com/scylladb/scylladb/issues/7933
Fixes https://github.com/scylladb/scylladb/issues/9111Closes#14923
* github.com:scylladb/scylladb:
test/topology_custom: add test_read_repair.py
replica/mutation_dump: detect end-of-page in range-scans
tools/scylla-sstable: write: abort parser thread if writing fails
test/pylib: add REST methods to get node exe and workdir paths
test/pylib/rest_client: add load_new_sstables, keyspace_{flush,compaction}
service/storage_proxy: add trace points for the actual read executor type
service/storage_proxy: add trace points for read-repair
storage_proxy: Add more trace-level logging to read-repair
database: Fix accounting of small partitions in mutation query
database, storage_proxy: Reconcile pages with no live rows incrementally
When `nodetool disablebinary` command executes its handler aborts listening sockets, shuts down all client connections _and_ (!) then waits for the connections to stop existing. Effectively the command tries to make sure that no activity initiated by a CQL query continues, even though client would never see its result (client sockets are closed)
This makes the disablebinary command hang for long sometimes, which is not really nice. The proposal is to wait for the connections to terminate in the background. So once disablebinary command exists what's guaranteed is that all client connections are aborted and new connections are not admitted, but some activity started by them may still be running (e.g. up until `nodetool drain` is issued). Driver-side sockets won't get the queries' results anyway.
The behavior of `disablebinary` is not documented wrt whether it should wait for CQL processing to stop or not, so technically we're not breaking anything. However, it can happen that it's a disruptive change and some setups may behave differently after it.
refs: #14031
refs: #14711Closes#14743
* github.com:scylladb/scylladb:
test/cql-pytest: Add enable|disable-binary test case
test.py: Add suite option to auto-dirty cluster after test
test/pylib: Add nodetool enable|disable-binary commands
transport: Shutdown server on disablebinary
generic_server: Introduce shutdown()
generic_server: Decouple server stopped from connection stopped
transport/controller: Coroutinize do_stop_server()
transport/controller: Coroutinize stop_server()
The test checks that `nodetool disablebinary` makes subsequent queries
fail and `nodetool enablebinary` lets client to establish new
connections.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's better to pass a disengaged optional when
the caller doesn't have the information rather than
passing the default dc_rack location so the latter
will never implicitly override a known endpoint dc/rack location.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#15300
The current read-loop fails to detect end-of-page and if the query
result buider cuts the page, it will just proceed to the next
partition. This will result in distorted query results, as the result
builder will request for the consumption to stop after each clustering
row.
To fix, check if the page was cut before moving on to the next
partition.
A unit test reproducing the bug was also added.
The local node's dc:rack pair is cached on system keyspace on start. However, most of other code don't need it as they get dc:rack from topology or directly from snitch. There are few places left that still mess with sysks cache, but they are easy to patch. So after this patch all the core code uses two sources of dc:rack -- topology / snitch -- instead of three.
Closes#15280
* github.com:scylladb/scylladb:
system_keyspace: Don't require snitch argument on start
system_keyspace: Don't cache local dc:rack pair
system_keyspace: Save local info with explicit location
storage_service: Get endpoint location from snitch, not system keyspace
snitch: Introduce and use get_location() method
repair: Local location variables instead of system keyspace's one
repair: Use full endpoint location instead of datacenter part
A reviewer noted that test_update_expression_list_append_non_list_arguments
has too much code duplication - the same long API call to run
"SET a = list_append(...)" was repeated many times.
So in this patch we add a short inner function "try_list_append" to
avoid this duplication.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes: #15298
Find progress of repair tasks based on the number of ranges
that have been repaired.
Fixes: [#1156](https://github.com/scylladb/scylla-enterprise/issues/1156).
Closes#14698
* github.com:scylladb/scylladb:
test: repair tasks test
repair: add methods making repair progress more precise
tasks: make progress related methods virtual
repair: add get_progress method to shard_repair_task_impl
repair: add const noexcept qualifiers to shard_repair_task_impl::ranges_size()
repair: log a name of a particular table repair is working on
tasks: delete move and copy constructors from task_manager::task::impl
Currently, the topology coordinator has the
`topology::transition_state::publish_cdc_generation` state responsible
for publishing the already created CDC generations to the user-facing
description tables. This process cannot fail as it would cause some CDC
updates to be missed. On the other hand, we would like to abort the
`publish_cdc_generation` state when bootstrap aborts. Of course, we
could also wait until handling this state finishes, even in the case of
the bootstrap abort, but that would be inefficient. We don't want to
unnecessarily block topology operations by publishing CDC generations.
The solution proposed by this PR is to remove the
`publish_cdc_generation` state completely and introduce a new background
fiber of the topology coordinator -- `cdc_generation_publisher` -- that
continually publishes committed CDC generations.
Apart from introducing the CDC generation publisher, we add
`test_cdc_generation_publishing.py` that verifies its correctness and we
adapt other CDC tests to the new changes.
Fixes#15194Closes#15281
* github.com:scylladb/scylladb:
test: test_cdc: introduce wait_for_first_cdc_generation
test: move cdc_streams_check_and_repair check
test: add test_cdc_generation_publishing
docs: remove information about publish_cdc_generation
raft topology: introduce the CDC generation publisher
system_keyspace: load unpublished_cdc_generations to topology
raft topology: mark committed CDC generations as unpublished
raft topology: add unpublished_cdc_generations to system.topology
Add tests for gossiper/endpoint/live and gossiper/endpoint/down
which run only in release mode.
Enable test_remove_node_with_concurrent_ddl and fix types and
variables names used by it, so that they can be reused in gossiper
test.
Fixes: #15223.
Closes#15244
* github.com:scylladb/scylladb:
test: topology: add gossiper test
test: fix types and variable names in wait_for_host_down
After introducing the CDC generation publisher,
test_cdc_log_entries_use_cdc_streams could (at least in theory)
fail by accessing system_distributed.cdc_streams_descriptions_v2
before the first CDC generation has been published.
To avoid flakiness, we simply wait until the first CDC generation
is published in a new function -- wait_for_first_cdc_generation.
The part of test_topology_ops that tests the
cdc_streams_check_and_repair request could (at least in theory)
fail on
`assert(len(gen_timestamps) + 1 == len(new_gen_timestamps))`
after introducing the CDC generation publisher because we can
no longer assume that all previously committed CDC generations
have been published before sending the request.
To prevent flakiness, we move this part of the test to
test_cdc_generations_are_published. This test allows for ensuring
that all previous CDC generations have been published.
Additionally, checking cdc_streams_check_and_repair there is
simpler and arguably fits the test better.
We add two test cases that test the new CDC generation publisher
to detect potential bugs like incorrect order of publications or
not publishing some generations at all.
The purpose of the second test case --
test_multiple_unpublished_cdc_generations -- is to enforce and test
a scenario when there are multiple unpublished CDC generations at
the same time. We expect that this is a rare case. The main fiber
of the topology coordinator would have to make much more progress
(like finishing two bootstraps) than the CDC generation publisher
fiber. Since multiple unpublished CDC generations might never
appear in other tests but could be handled incorrectly, having
such a test is valuable.
Some tests use non-threaded do_with_cql_env() and wrap the inner lambda with seastar::async(). The cql env already provides a helper for that
Closes#15305
* github.com:scylladb/scylladb:
cql_query_test: Fix indentation after previous patch
cql_query_test: Use do_with_cql_env_thread() explicitly
Now when the keys and region can be configured with "standard"
environment variables, the old custom one can be removed. No automation
uses that it was purely a support for manual testing of a client against
AWS's S3 server
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently minio applies anonymous public policy for the test bucket and
all tests just use unsigned S3 requests. This patch generates a policy
for the temporary minio user and removes the anon public one. All tests
are updated respectively to use the provided key:secret pair.
The use-https bit is off by default as minio still starts with plain
http. That's OK for now, all tests are local and have no secret data
anyway
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The user is going to have rights to access the test bucket. For now just
create one and export the tests via environment
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The bucket is going to stop being public, rename the env variable in
advance to make the essential patch smaller
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The Alternator tests can run against HTTPS - namely when using
test/alternator/run with the "--https" option (local Alternator
configured with HTTPS) or "--aws" option (DynamoDB, using HTTPS).
In some cases we make these HTTPS requests with verify=False, to avoid
checking the SSL certificates. E.g., this is necessary for Alternator
with a self-signed certificate. Unfortunately, the urllib3 library adds
an ugly warning message when SSL certificate verification is disabled.
In the past we tried to disable these warnings, using the documented
urllib3.disable_warnings() function, but it didn't help. It turns out
that pytest has its own warning handling, so to disable warnings in
pytest we must say so in a special configuration parameter in pytest.ini.
So in this patch, we drop the disable_warnings call from conftest.py
(where it didn't help), and instead put a similar declaration in
pytest.ini. The disable_warnings call in the test/alternator/run
script needs to remain - it is run outside pytest, so pytest.ini
doesn't affect it.
After this patch, running test/alternator/run with --https or --aws
finishes without warnings, as desired.
Fixes#15287
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#15292
Some tests use non-threaded do_with_cql_env() and wrap the inner lambda
with seastar::async(). The cql env already provides a helper for that
Indentation is deliberately left broken until next patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Passing the gate_closed_exception to the task promise
ends up with abandoned exception since no-one is waiting
for it.
Instead, enter the gate when the task is made
so it will fail make_task if the gate is already closed.
Fixes scylladb/scylladb#15211
In addition, this series adds a private abort_source for each task_manager module
(chained to the main task_manager::abort_source) and abort is requested on task_manager::module::stop().
gate holding in compaction_manager is hardened
and makes sure to stop compaction_manager and task_manager in sstable_compaction_test cases.
Closes#15213
* github.com:scylladb/scylladb:
compaction_manager: stop: close compaction_state:s gates
compaction_manager: gracefully handle gate close
task_manager: task: start: fixup indentation
task_manager: module: make_task: enter gate when the task is created
task_manaer: module: stop: request abort
task_manager: task::impl: subscribe to module about_source
test: compaction_manager_stop_and_drain_race_test: stop compaction and task managers
test: simple_backlog_controller_test: stop compaction and task managers
Improved the coverage of the tests for the list_append() function
in UpdateExpression - test that if one of its arguments is not a list,
including a missing attribute or item, it is reported as an error as
expected.
The new tests pass on both Alternator and DynamoDB.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#15291
On boot system keyspace is kicked to insert local info into system.local
table. Among other things there's dc:rack pair which sys.ks. gets from
its cache which, in turn, should have been previously initialized from
snitch on sys.ks. start. This patch makes the local info updating method
get the dc:rack from caller via argument. Callers, in turn, call snitch
directly, because these are main and cql_test_env startup routines.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"experimental" option was marked "Unused" in 64bc8d2f7d. but we
chose to keep it in hope that the upgrade test does not fail.
despite that the upgrade tests per-se survived the "upgrade",
after the upgrade, the tests exercising the experimental features
are still failing hard. they have not been updated to set the
"experimental-features" option, and are still relying on
"experimental" to enable all the experimental features under
test.
so, in this change, let's just drop the option so that
scylla can fail early at seeing this "experimental" option.
this should help us to identify the tests relying on it
quicker. as the "experimental" features should only be used
in development environment, this change should have no impact
to production.
Refs #15214
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#15233
The s.service since d42685d0cb is having on-board query processor ref^w
pointer and can use it to join cluster
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#15236
Index caching was disabled by default because it caused performance regressions
for some small-partition workloads. See https://github.com/scylladb/scylladb/issues/11202.
However, it also means that there are workloads which could benefit from the
index cache, but (by default) don't.
As a compromise, we can set a default limit on the memory usage of index cache,
which should be small enough to avoid catastrophic regressions in
small-partition workloads, but big enough to accommodate workloads where
index cache is obviously beneficial.
This series adds such a configurable limit, sets it to to 0.2 of total cache memory by default,
and re-enables index caching by default.
Fixes#15118Closes#14994
* github.com:scylladb/scylladb:
test: boost/cache_algorithm_test: add cache_algorithm_test
sstables: partition_index_cache: deglobalize stats
utils: cached_file: deglobalize cached_file metrics
db: config: enable index caching by default
config: add index_cache_fraction
utils: lru: add move semantics to list links
Move partition_index_cache stats from a thread_local variable
to cache_tracker. After the change, partition_index_cache
receives a reference to the stats via constructor, instead of
referencing a global.
This is needed so that cache_tracker can know the memory usage
of index caches (for cache eviction purposes) without relying on
globals.
But it also makes sense even without that motive.
Move cached_file metrics from a thread_local variable
to cache_tracker.
This is needed so that cache_tracker can know
the memory usage of index caches (for purposes
of cache eviction) without relying on globals.
But it also makes sense even without that motive.
This PR collects followups described in #14972:
- The `system.topology` table is now flushed every time feature-related
columns are modified. This is done because of the feature check that
happens before the schema commitlog is replayed.
- The implementation now guarantees that, if all nodes support some
feature as described by the `supported_features` column, then support
for that feature will not be revoked by any node. Previously, in an
edge case where a node is the last one to add support for some feature
`X` in `supported_features` column, crashes before applying/persisting
it and then restarts without supporting `X`, it would be allowed to boot
anyway and would revoke support for the `X` in `system.topology`.
The existing behavior, although counterintuitive, was safe - the
topology coordinator is responsible for explicitly marking features as
enabled, and in order to enable a feature it needs to perform a special
kind of a global barrier (`barrier_after_feature_update`) which only
succeeds after the node has updated its features column - so there is no
risk of enabling an unsupported feature. In order to make the behavior
less confusing, the node now will perform a second check when it tries
to update its `supported_features` column in `system.topology`.
- The `barrier_after_feature_update` is removed and the regular global
`barrier` topology command is used instead. The `barrier` handler now
performs a feature check if the node did not have a chance to verify and
update its cluster features for the second time.
JOIN_NODE rpc will be sent separately as it is a big item on its own.
Fixes: #14972Closes#15168
* github.com:scylladb/scylladb:
test: topology{_experimental_raft}: don't stop gracefully in feature tests
storage_service: remove _topology_updated_with_local_metadata
topology_coordinator: remove barrier_after_feature_update
topology_coordinator: perform feature check during barrier
storage_service: repeat the feature check after read barrier
feature_service: introduce unsupported_feature_exception
feature_service: move startup feature check to a separate function
topology_coordinator: account for features to enable in should_preempt_balancing
group0_state_machine: flush system.topology when updating features columns
Scrub tests use a lot of temporary directories. This is suspected to
cause problems in some cases. To improve the situation, this patch:
* Creates a single root temporary directory for all scrub tests
* All further fixtures create their files/directories inside this root
dir.
* All scrub tests create their temporary directories within this root
dir.
* All temporary directories now use an appropriate "prefix", so we can
tell which temporary directory is part of the problem if a test fails.
Refs: #14309Closes#15117
The current cluster feature tests are stopping nodes in a graceful way.
Doing it gracefully isn't strictly necessary for the test scenarios and
we can switch `server_stop_gracefully` calls to `server_stop`. This only
became possible after a previous commit which causes `system.topology`
table to be flushed when cluster feature columns are modified, and will
server as a good test for it.
We want to stop supporting IPs for `--ignore-dead-nodes` in
`raft_removenode` and `--ignore-dead-nodes-for-replace` for
`raft_replace`. However, we shouldn't remove these features without the
deprecation period because the original `removenode` and `replace`
operations still support them. So, we add them for now.
The `IP -> Raft ID` translation is done through the new
`raft_address_map::find_by_addr` member function.
We update the documentation to inform about the deprecation of the IP
support for `--ignore-dead-nodes`.
Fixes#15126Closes#15156
* github.com:scylladb/scylladb:
docs: inform about deprecating IP support for --ignore-dead-nodes
raft topology: support IPs for --ignore-dead-nodes
raft_address_map: introduce find_by_addr