Commit Graph

38359 Commits

Author SHA1 Message Date
Petr Gusev
1ddc76ffd1 test_fencing: add test_fence_hints
The test makes a write through the first node with
the third node down, this causes a hint to be stored on the
first node for the second. We increment the version
and fence_version on the third node, restart it,
and expect to see a hint delivery failure
because of versions mismatch. Then we update the versions
of the first node and expect hint to be successfully
delivered.
2023-08-22 15:48:40 +04:00
Petr Gusev
3ccd2abad4 test.py: output the skipped tests
pytest option -rs forces it to print
all the skipped tests along with
the reasons. Without this option we
can't tell why certain tests were skipped,
maybe some of them shouldn't already.
2023-08-22 15:48:40 +04:00
Petr Gusev
c434d26b36 test.py: add skip_mode decorator and fixture
Syntactic sugar for marking tests to be
skipped in a particular mode.

There is skip_in_debug/skip_in_release in suite.yaml,
but they can be applied only on the entire file,
which is unnatural and inconvenient. Also, they
don't allow to specify a reason why the test is skipped.

Separate dictionary skipped_funcs is needed since
we can't use pytest fixtures in decorators.
2023-08-22 15:48:40 +04:00
Petr Gusev
a639d161e6 test.py: add mode fixture
Sometimes a test wants to know what mode
it is running in so that e.g. it can skip
itself in some of them.
2023-08-22 15:48:40 +04:00
Petr Gusev
439c91851f hints: add debug log for dropped hints
Dropping data is rather important event,
let's log it at least at the debug level.
It'll help in debugging tests.
2023-08-22 15:48:40 +04:00
Petr Gusev
9fd3df13a2 hints: send_one_hint: extend the scope of file_send_gate holder
The problem was that the holder in with_gate
call was released too early. This happened
before the possible call to on_hint_send_failure
in then_wrapped. As a result, the effects of
on_hint_send_failure (segment_replay_failed flag)
were not visible in send_one_file after
ctx_ptr->file_send_gate.close(), so we could decide
that the segment was sent in full and delete
it even if sending of some hints led to errors.

Fixes #15110
2023-08-22 15:48:40 +04:00
Petr Gusev
0b7a90dff6 pylib: add ScyllaMetrics
This patch adds facilities to work
with Scylla metrics from test.py tests.
The new metrics property was added to
ManagerClient, its query method
sends a request to Scylla metrics
endpoint and returns and object
to conveniently access the result.

ScyllaMetrics is copy-pasted from
test_shedding.py. It's difficult
to reuse code between 'new' and 'old'
styles of tests, we can't just import
pylib in 'old' tests because of some
problems with python search directories.
A past commit of mine that attempted
to solve this problem was rejected on review.
2023-08-22 14:31:04 +04:00
Petr Gusev
1b7603af23 hints manager: add send_errors counter
There was no indication of problems
in the hints manager metrics before.
We need this counter for fencing tests
in the later commit, but it seems to be
useful on its own.
2023-08-22 14:31:04 +04:00
Petr Gusev
fa25e6d63e token_metadata: add debug logs
We log the new version when the new token
metadata is set.

Also, the log for fence_version is moved
in shared_token_metadata from storage_service
for uniformity.
2023-08-22 14:31:04 +04:00
Petr Gusev
360453fd87 fencing: add simple data plane test
The test starts a three node cluster
and manually decrements the version on
the last node. It then tries to write
some data through the last node and
expects to get 'stale topology' exception.
2023-08-22 14:31:01 +04:00
Petr Gusev
5361de76f9 random_tables.py: add counter column type
We'll need it for fencing test.
2023-08-11 17:37:09 +04:00
Petr Gusev
f5b41a8075 raft topology: don't increment version when transitioning to node_state::normal
This version increment in not accompanied by a
global_token_metadata_barrier, which means the new
version won't be reflected in fence_version
and basically will have no effect in terms of fencing.
2023-08-11 16:22:25 +04:00
Patryk Jędrzejczak
d1d1b6cf6e raft: remove a replaced node from group 0 earlier
The topology coordinator only marks a replaced node as LEFT
during the replace operation and actually removes it from the
group 0 config in cleanup_group0_config_if_needed. If this
function is called before raft has committed a replacing node as
a voter, it does not remove the replaced node from the group 0
config. Then, the coordinator can decide that it has no work to
do and starts sleeping, leaving us with an outdated config.

This behavior reduces group 0 availability and causes problems in
tests (see  #14885). Also, it makes the coordinator's logic
confusing - it claims that it has no work to do when it has some
work to do. Therefore, we modify the coordinator so that it
removes the replaced node earlier in handle_topology_transition.

Fixes #14885
Fixes #14975

Closes #15009
2023-08-11 01:32:24 +02:00
Botond Dénes
403ba9b055 Merge 'gossiper: lock_endpoint: fix review comments' from Benny Halevy
This series fixes a couple of review comments on #14845

Closes #14976

* github.com:scylladb/scylladb:
  gossiper: lock_endpoint: fix comment regarding permit_id mismatch
  gossiper: lock_endpoint: change assert to on_internal_error
2023-08-10 17:37:32 +03:00
Kamil Braun
8f658fb139 Merge 's3/client: check for available port before starting minio server' from Kefu Chai
there is chance that the default port of 9000 has been used on the host
running the test, in that case, we should try to use another available
port.

so, in this change, we try ports in the ranges of [9000, 9000+1000), and
use the first one which is not connectable.

Fixes #14985
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14997

* github.com:scylladb/scylladb:
  test: stop using HostRegistry in MinioServer
  s3/client: check for available port before starting minio server
2023-08-10 14:01:13 +02:00
Alejo Sanchez
e2122163f5 test/pylib: protect double call to cluster stop
test.py schedules calls to cluster .uninstall() and .stop() making
double calls to it running at the same time. Mark the cluster as not
running early on.

While there, do the same for .stop_gracefully() for consistency.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #14987
2023-08-10 13:37:49 +02:00
Kamil Braun
39330b9c11 Merge 'gossiper: convict: lock_endpoint' from Benny Halevy
Currently, `mark_dead` is called with null_permit_id
from `convict`, and in this case, by contract,
it should lock the endpoint, same as `mark_as_shutdown`.

This change somehow escaped #14845 so it amends it.

Fixes #14838

Closes #15001

* github.com:scylladb/scylladb:
  gossiper: verify permit_id in all private functions
  gossiper: convict: lock_endpoint
2023-08-10 13:09:05 +02:00
Benny Halevy
623ed1a249 gossiper: verify permit_id in all private functions
Instead of acquiring the permit is the permit_id arg is null,
like in mark_as_shutdown, just asssert that the permit_id is
non-null. The functions are both private and we control all callers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-10 08:17:04 +03:00
Benny Halevy
42c1c5ead8 gossiper: convict: lock_endpoint
Currently, `mark_dead` is called with null_permit_id
from `convict`, and in this case, by contract,
it should lock the endpoint, same as `mark_as_shutdown`.

This change somehow escaped #14845 so it amends it.

Fixes #14838

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-10 07:50:33 +03:00
Kefu Chai
0c0a59bf62 test: stop using HostRegistry in MinioServer
since MinioServer find a free port by itself, there is no need to
provide it an IP address for it anymore -- we can always use
127.0.0.1.

so, in this change, we just drop the HostRegistry parameter passed
to the constructor of MinioServer, and pass the host address in place
of it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-08-09 23:40:22 +08:00
Kamil Braun
59c410fb97 Merge 'migration_manager: announce: provide descriptions for all calls' from Patryk Jędrzejczak
The `system.group0_history` table provides useful descriptions for each
command committed to Raft group 0. One way of applying a command to
group 0 is by calling `migration_manager::announce`. This function has
the `description` parameter set to empty string by default. Some calls
to `announce` use this default value which causes `null` values in
`system.group0_history`. We want `system.group0_history` to have an
actual description for every command, so we change all default
descriptions to reasonable ones.

Going further, We remove the default value for the `description`
parameter of `migration_manager::announce` to avoid using it in the
future. Thanks to this, all commands in `system.group0_history` will
have a non-null description.

Fixes #13370

Closes #14979

* github.com:scylladb/scylladb:
  migration_manager: announce: remove the default value of description
  test: always pass empty description to migration_manager::announce
  migration_manager: announce: provide descriptions for all calls
2023-08-09 16:58:41 +02:00
Kefu Chai
29554b0fc6 s3/client: check for available port before starting minio server
there is chance that the default port of 9000 has been used on the
host running the test, in that case, we should try to use another
available port.

so, in this change, we try ports in the ranges of [9000, 9000+1000),
and use the first one which is not connectable.

Fixes #14985
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-08-09 17:33:42 +08:00
Botond Dénes
108e510a23 Merge 'Update sstable_requiring_cleanup on compaction completion' from Benny Halevy
Currently `sstable_requiring_cleanup` is updated using `compacting_sstable_registration`, but that mechanism is not used by offstrategy compaction, leading to #14304.

This series introduces `compaction_manager::on_compaction_completion` that intercepts the call
to the table::on_compaction_completion. This allows us to update `sstable_requiring_cleanup` right before the compacted sstables are deleted, making sure they are no leaked to `sstable_requiring_cleanup`, which would hold a reference to them until cleanup attempts to clean them up.

`cleanup_incremental_compaction_test` was adjusted to observe the sstables `on_delete` (by adding a new observer event) to detect the case where cleanup attempts to delete the leaked sstables and fails since they were already deleted from the file system by offstrategy compaction. The test fails with the fix and passes with it.

Fixes #14304

Closes #14858

* github.com:scylladb/scylladb:
  compaction_manager: on_compaction_completion: erase sstables from sstables_requiring_cleanup
  compaction/leveled_compaction_strategy: ideal_level_for_input: special case max_sstable_size==0
  sstable: add on_delete observer
  compaction_manager: add on_compaction_completion
  sstable_compaction_test: cleanup_incremental_compaction_test: verify sstables_requiring_cleanup is empty
2023-08-09 11:03:45 +03:00
Botond Dénes
69d6778daf Merge 'build: cmake: fixes for the release build' from Kefu Chai
before this change, we use generator expression to initialize CMAKE_CXX_FLAGS_RELEASE, this has two problems:

1. the generator expression is not expanded when setting a regular variable.
2. the ending ">" is missing in one of the generator expression.
3. the parameters are not separated with ";"

so address them, let's just

* use `add_compile_options()` directly, as the corresponding `mode.${build_mode}.cmake` is only included when the "${build_mode}" is used.
* add comma in between the command line options.
* add the missing closing ">"

Closes #14996

* github.com:scylladb/scylladb:
  build: cmake: pass --gc-sections to ld not ar
  build: cmake: use add_compile_options() in release build
2023-08-09 09:55:02 +03:00
Kefu Chai
47c9b25bac compaction_manager: correct comment on compaction_task_executor::state
when it comes to `regular_compaction_task_executor`, we repeat the
compaction until the compaction can not proceed, so after an iteration
of compaction completes successfully, the task can still continue with
yet another round of the compaction as it sees appropriate. so let's
update the comment to reflect this fact.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14891
2023-08-09 09:49:18 +03:00
Kefu Chai
6dc885a8e2 compaction: mark more member variables const
quite a few member variables serves as the configuration for
a given compaction, they are immutable in the life cycle of it,
so for better readability, let's mark them `const`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14981
2023-08-09 09:28:44 +03:00
Botond Dénes
5f65ac73ed Merge 'Remove qctx' from Pavel Emelyanov
The only place that still calls it is static force_blocking_flush method. If can be made non-static already. Also, while at it, coroutinize some system_keyspace methods and fix a FIXME regarding replica::database access in it.

Closes #14984

* github.com:scylladb/scylladb:
  code: Remove query-context.hh
  code: Remove qctx
  system_keyspace: Use system_keyspace's container() to flush
  system_keyspace: Make force_blocking_flush() non-static
  system_keyspace: Coroutinize update_tokens()
  system_keyspace: Coroutinize save_truncation_record()
2023-08-09 09:27:53 +03:00
Kefu Chai
782b1992b2 build: cmake: pass --gc-sections to ld not ar
ar is not able to tell which sections to be GC'ed, hence it does
not care about --gc-sections, but ld does. let's add this option
to CMAKE_EXE_LINKER_FLAGS.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-08-09 13:50:44 +08:00
Kefu Chai
f7377725c2 build: cmake: use add_compile_options() in release build
before this change, we use generator expression to initialize
CMAKE_CXX_FLAGS_RELEASE, this has two problems:

1. the generator expression is not expanded when setting
   a regular variable.
2. the ending ">" is missing in one of the generator
   expression.
3. the parameters are not separated with ";"

so address them, let's just

* use `add_compile_options()` directly, as the corresponding
  `mode.${build_mode}.cmake` is only included when the
  "${build_mode}" is used.
* add comma in between the command line options.
* add the missing closing ">"

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-08-09 12:56:06 +08:00
Pavel Emelyanov
f1515c610e code: Remove query-context.hh
The whole thing is unused now, so the header is no longer needed

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-08 11:11:07 +03:00
Pavel Emelyanov
413d81ac16 code: Remove qctx
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-08 11:10:56 +03:00
Pavel Emelyanov
d7f5d6dba8 system_keyspace: Use system_keyspace's container() to flush
In force_blocking_flush() there's an invoke-on-all invocation of
replica::database::flush() and a FIXME to get the replica database from
somewhere else rather than via query-processor -> data_dictionary.

Since now the force_blocking_flush() is non-static the invoke-on-all can
happen via system_keyspace's container and the database can be obtained
directly from the sys.ks. local instance

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-08 11:09:32 +03:00
Pavel Emelyanov
7a342ed5c0 system_keyspace: Make force_blocking_flush() non-static
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-08 11:09:20 +03:00
Pavel Emelyanov
6b8fe5ac43 system_keyspace: Coroutinize update_tokens()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-08 11:09:15 +03:00
Pavel Emelyanov
1700d79b60 system_keyspace: Coroutinize save_truncation_record()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-08 11:09:09 +03:00
Benny Halevy
7a7c8d0d23 compaction_manager: on_compaction_completion: erase sstables from sstables_requiring_cleanup
Erase retired sstable from compaction_state::sstables_requiring_cleanup
also on_compaction_completion (in addition to
compacting_sstable_registration::release_compacting
for offstrategy compaction with piggybacked cleanup
or any other compaction type that doesn't use
compacting_sstable_registration.

Add cleanup_during_offstrategy_incremental_compaction_test
that is modeled after cleanup_incremental_compaction_test to check
that cleanup doesn't attempt to cleanup already-deleted
sstables that were left over by offstrategy compaction
in sstables_requiring_cleanup.

Fixes #14304

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-08 08:16:46 +03:00
Benny Halevy
b1e164a241 compaction/leveled_compaction_strategy: ideal_level_for_input: special case max_sstable_size==0
Prevent div-by-zero byt returning const level 1
if max_sstable_size is zero, as configured by
cleanup_incremental_compaction_test, before it's
extended to cover also offstrategy compaction.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-08 08:16:46 +03:00
Benny Halevy
b08f2ac4c6 sstable: add on_delete observer
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-08 08:15:00 +03:00
Benny Halevy
df66895080 compaction_manager: add on_compaction_completion
Pass the call to the table on_compaction_completion
so we can manage the sstables requiring cleanup state
along the way.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-08 08:12:05 +03:00
Benny Halevy
ea64ae54f8 sstable_compaction_test: cleanup_incremental_compaction_test: verify sstables_requiring_cleanup is empty
Make sure that there are no sstables_requiring_cleanup
after cleanup compaction.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-08 08:12:01 +03:00
Patryk Jędrzejczak
356e131acd migration_manager: announce: remove the default value of description
We remove the default value for the description parameter of
migration_manager::announce to avoid using it in the future.
Thanks to this, all commands in system.group0_history will
have a non-null description.
2023-08-07 14:38:11 +02:00
Patryk Jędrzejczak
866c9a904d test: always pass empty description to migration_manager::announce
In the next commit, we remove the default value for the
description parameter of migration_manager::announce to avoid
using it in the future. However, many calls to announce in tests
use the default value. We have to change it, but we don't really
care about descriptions in the tests, so we pass the empty string
everywhere.
2023-08-07 14:38:11 +02:00
Patryk Jędrzejczak
27ddf78171 migration_manager: announce: provide descriptions for all calls
The system.group0_history table provides useful descriptions
for each command committed to Raft group 0. One way of applying
a command to group 0 is by calling migration_manager::announce.
This function has the description parameter set to empty string
by default. Some calls to announce use this default value which
causes null values in system.group0_history. We want
system.group0_history to have an actual description for every
command, so we change all default descriptions to reasonable ones.

We can't provide a reasonable description to announce in
query_processor::execute_thrift_schema_command because this
function is called in multiple situations. To solve this issue,
we add the description parameter to this function and to
handler::execute_schema_command that calls it.
2023-08-07 14:38:11 +02:00
Avi Kivity
4f7e83a4d0 cql3: select_statement: reject DISTINCT with GROUP BY on clustering keys
While in SQL DISTINCT applies to the result set, in CQL it applies
to the table being selected, and doesn't allow GROUP BY with clustering
keys. So reject the combination like Cassandra does.

While this is not an important issue to fix, it blocks un-xfailing
other issues, so I'm clearing it ahead of fixing those issues.

An issue is unmarked as xfail, and other xfails lose this issue
as a blocker.

Fixes #12479

Closes #14970
2023-08-07 15:35:59 +03:00
Benny Halevy
db7a4109dd gossiper: lock_endpoint: fix comment regarding permit_id mismatch
Fixes a code review comment.
See https://github.com/scylladb/scylladb/pull/14845#discussion_r1283572889

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-07 14:39:42 +03:00
Benny Halevy
4ebd2fa09d gossiper: lock_endpoint: change assert to on_internal_error
Fixes a code review comment.
See https://github.com/scylladb/scylladb/pull/14845#discussion_r1283060243

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-07 14:36:35 +03:00
Patryk Jędrzejczak
1772433ae2 raft_group0: log gaining and losing leadership on the INFO level
Knowing that a server gained or lost leadership in group 0 is
sometimes useful for the purpose of debugging, so we log
information about these events on the INFO level.

Gaining and losing leadership are relatively rare events, so
this change shouldn't flood the logs.

Closes #14877
2023-08-07 12:13:24 +02:00
Kamil Braun
9edc98f8e9 Merge 'raft: make a removed/decommissioning node a non-voter early' from Patryk Jędrzejczak
For `removenode`, we make a removed node a non-voter early. There is no
downside to it because the node is already dead. Moreover, it improves
availability in some situations.

For `decommission`, if we decommission a node when the number of nodes
is even, we make it a non-voter early to improve availability. All
majorities containing this node will remain majorities when we make this
node a non-voter and remove it from the set because the required size of
a majority decreases.

We don't change `decommission` when the number of nodes is odd since
this may reduce availability.

Fixes #13959

Closes #14911

* github.com:scylladb/scylladb:
  raft: make a decommissioning node a non-voter early
  raft: topology_coordinator: implement step_down_as_nonvoter
  raft: make a removed node a non-voter early
2023-08-07 10:14:33 +02:00
Botond Dénes
fa4aec90e9 Merge 'test: tasks: Fix task_manager/wait_task test ' from Aleksandra Martyniuk
Rewrite test that checks whether task_manager/wait_task works properly.
The old version didn't work. Delete functions used in old version.

Closes #14959

* github.com:scylladb/scylladb:
  test: rewrite wait_task test
  test: move ThreadWrapper to rest_util.py
2023-08-07 09:04:29 +03:00
Benny Halevy
6f037549ac sstables: delete_with_pending_deletion_log: batch sync_directory
When deleting multiple sstables with the same prefix
the deletion atomicity is ensured by the pending_delete_log file,
so if scylla crashes in the middle, deletions will be replyed on
restart.

Therefore, we don't have to ensure atomicity of each individual
`unlink`.  We just need to sync the directory once, before
removing the pending_delete_log file.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #14967
2023-08-06 18:52:13 +03:00