scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 04:56:58 +00:00

Author	SHA1	Message	Date
Artsiom Mishuta	4b975668f6	tiering (test.py): introduce tiering labels introduce tiering marks 1 “unstable” - for unstable tests that will continue running every night and generate up-to-date statistics with failures without failing the “Main” verification path (scylla-ci, Next) 2 “nightly” - for tests that are quite old, stable, and test functionality that should rather not be changed or affected by other features, are partially covered in other tests, verify non-critical functionality, have not found any issues or regressions, are too long to run on every PR, and can be popped out from the CI run. set 7 long tests (according to statistics in elastic) as nightly (these 8 tests took 20% of CI run, about 4 hours without parallelization) 1 test as unstable (as an example of marker usage) Closes scylladb/scylladb#24974	2025-08-04 15:38:16 +03:00
Nadav Har'El	2431f92967	alternator, test: add reproducer for issue about immediate LWT timeout This patch adds a reproducer for issue #16261, where it was reported that when Alternator read-modify-write (using LWT) operations to the same partition are sent to different nodes, sometimes the operation fails immediately, with an InternalServerError claiming to be a "timeout", although this happens almost immediately (after a few milliseconds), not after any real timeout. The test uses 3 nodes, and 3 threads which send RMW operations to different items in the same partition, and usually (though not with 100% certainty) it reaches the InternalServerError in around 100 writes by each thread. This InternalServerError looks like: Internal server error: exceptions::mutation_write_timeout_exception (Operation timed out for alternator_alternator_Test_1719157066704.alternator_Test_1719157066704 - received only 1 responses from 2 CL=LOCAL_SERIAL.) The test also prints how much time it took for the request to fail, for example: In incrementing 1,0 on node 1: error after 0.017074108123779297 This is 0.017 seconds - it's not the cas_contention_timeout_in_ms timeout (1 second) or any other timeout. If we enable trace logging, adding to topology_experimental_raft/suite.yaml extra_scylla_cmdline_options: ["--logger-log-level", "paxos=trace"] we get the following TRACE-level message in the log: paxos - CAS[0] accept_proposal: proposal is partially rejected This again shows the problem is "uncertainty" (partial rejection) and not a timeout. Refs #16261 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19445	2025-08-01 11:58:52 +03:00
Nadav Har'El	16c1365332	test,alternator: test server-side load balancing with zero-token node In issue #6527 it was suggested that a zero-token node (a.k.a. coordinator-only node, or data-less node) could serve as a topology-aware Alternator load balancer - requests could be sent to it and they will be forwarded to the right node. This feature was implemented, but we never tested that it actually works for Alternator requests. So this patch tests this by starting a 5-node cluster with 4 regular nodes and one zero-token node, and testing that requests to the zero-token node work as expected. It is important to know that this feature does indeed work as expected, and also to have a regression test for it so the feature doesn't break in the future. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23114	2025-06-25 11:13:15 +03:00
Nadav Har'El	3ce7e250cc	alternator: fix schema "concurrent modification" errors In ScyllaDB, schema modification operations use "optimistic locking": A schema operation reads the current schema, decides what it wants to do and prepares changes to the schema, and then attempts to commit those changes - but only if the schema hasn't changed since the first read. If the schema has already been changed by some other node - we need to try again. In a loop. In Alternator, there are six operations that perform schema modification: CreateTable, DeleteTable, UpdateTable, TagResource, UntagResource and UpdateTimeToLive. All of them were missing this loop. We knew about this - and even had FIXME in all places. So all these operations, when facing contention from concurrent schema modifications on different nodes, may fail with an error like: Internal server error: service::group0_concurrent_modification (Failed to apply group 0 change due to concurrent modification). This problem had a very minor effect, if any, on real users because the DynamoDB SDK automatically retries operations that fail with retryable errors - like this "Internal server error" - and most likely the schema operation will succeed upon retry. However, as shown in issue #13152 these failures were annoying in our CI, where tests - which disable request retries - failed on these errors. This patch fixes all six operations (the last three operations all use one common function, db::modify_tags(), so are fixed by one change) to add the missing loop. The patch also includes reproducing tests for all these operations - the new tests all fail before this patch, and pass with it. These new tests are much more reliable reproducers than the dtests we had that only sometimes - very rarely - reproduced the problem. Moreover, the new tests reproduce the bug separately for each of the six operations, so if we forget to fix one of the six operations, one of the tests would have continued to fail. Of course I checked this during development. The new tests are in the test/cluster framework, not test/alternator, because this problem can only be reproduced in a multi-node cluster: On a single node, it serializes its schema modifications on its own; the collisions only happen when more than one node attempts schema modifications at the same time. Fixes #13152 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23827	2025-05-05 09:59:08 +03:00
Artsiom Mishuta	d1198f8318	test.py: rename topology_custom folder to cluster rename topology_custom folder to cluster as it contains not only topology test cases	2025-03-04 10:32:44 +01:00

5 Commits
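
The "optimistic locking" retry loop described in commit 3ce7e250cc can be illustrated with a minimal Python sketch. This is not ScyllaDB's implementation (which is C++); the `SchemaStore` class, the `ConcurrentModification` exception, and `modify_schema_with_retry()` are hypothetical names invented here, and the version counter is an assumed stand-in for however the real code detects that the schema changed since it was read.

```python
class ConcurrentModification(Exception):
    """Hypothetical stand-in for service::group0_concurrent_modification."""


class SchemaStore:
    """Toy in-memory schema guarded by a version counter (illustrative only)."""

    def __init__(self):
        self.version = 0
        self.tables = set()

    def read(self):
        # Return a snapshot: the version seen and a copy of the schema.
        return self.version, set(self.tables)

    def commit(self, expected_version, new_tables):
        # The commit succeeds only if nobody changed the schema since the read.
        if expected_version != self.version:
            raise ConcurrentModification("schema changed since it was read")
        self.tables = set(new_tables)
        self.version += 1


def modify_schema_with_retry(store, change, max_attempts=10):
    """Read, prepare, try to commit; on a conflict, start over from the read.

    Returns the number of attempts it took to commit.
    """
    for attempt in range(1, max_attempts + 1):
        version, tables = store.read()
        new_tables = change(tables)  # prepare the change from the snapshot
        try:
            store.commit(version, new_tables)
            return attempt
        except ConcurrentModification:
            continue  # another writer won the race; re-read and retry
    raise ConcurrentModification(f"gave up after {max_attempts} attempts")
```

For example, `modify_schema_with_retry(store, lambda t: t | {"my_table"})` adds a table, re-reading the schema whenever a competing commit lands between the read and the commit. The `max_attempts` bound is a design choice of this sketch, not something the commit message specifies.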