Commit Graph

38388 Commits

Author SHA1 Message Date
Pavel Emelyanov
fdfec474ae table: Make sstables with required state
By default it's created with normal state, but there are some places
that need to put it into staging. Do it with new state enum

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 15:28:54 +03:00
Pavel Emelyanov
855f2b4b86 test: Make sstables with upload state in some cases
As was mentioned in the previous patch, there are a few places in tests
that put sstables in upload/ subdir and they really mean it. Those need
to use sstables manager/directory API directly (already) and specify the
state explicitly (this patch)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 15:28:54 +03:00
Pavel Emelyanov
9e752ca6ab tools: Make sstables with normal state
Just like tests, tools open an sstable by its full path and don't make any
assumptions about sstable state

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:56:02 +03:00
Pavel Emelyanov
6628dc47c5 table: Open-code sstables making streaming helpers
There are two of those that call each other to end up calling plain
make_sstable() one. It's simpler to patch both if they just call the
latter directly.

While at it -- drop the unused default argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:56:02 +03:00
Pavel Emelyanov
734c0820df tests: Make sstables with normal state by default
It's assumed that tests are not very specific about which
subdirectory an sstable is in, so they can use normal state. Places that
need to move sstables between states will use sstable manager API
explicitly

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:56:02 +03:00
Pavel Emelyanov
e7bbdbcef0 sstable_directory: Make sstable with required state
The state is on sstable_directory, can switch to using the new manager
API. The full path is still there, will be dropped later

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:56:02 +03:00
Pavel Emelyanov
c0b922a8af sstable_directory: Construct with state
This is to replace full path sitting on this object eventually. For now
they have to co-exist, but state will be used to make_sstable()-s from
manager with its new API

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:56:01 +03:00
Pavel Emelyanov
6fc62c2d9f distributed_loader: Make sstable with desired state when populating
This still needs to convert state to directory name internally as
sstable_directory instances are hashed on populator by subdir string.
Also the full string path is printed in logs. All this is now internal
to populate method and will be fixed later

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:45:52 +03:00
Pavel Emelyanov
b0064f5c55 distributed_loader: Make sstable with upload state when uploading
Just make use of the new shiny sstables_manager API

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:45:52 +03:00
Pavel Emelyanov
249a6a4d27 sstable: Introduce state enum
There are several states between which an sstable can migrate. Nowadays
the state is encoded into sstable directory, which is not nice. Also,
S3-backed sstables don't support states, only keeping sstables in "normal".

This patch adds enum state in order to replace the path-encoded one
eventually. The new sstables_manager::make_sstable() method is added
that accepts table directory (without quarantine/ or staging/ component)
and the desired initial state (optional). Next patches will make use of
this maker and the existing one will be removed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:45:52 +03:00
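The idea behind 249a6a4d27 (an explicit state plus a table directory, instead of a state encoded in the path) can be modelled in a few lines. This is only an illustrative Python sketch, not the C++ code from the patch: the names `SstableState` and `state_to_dir` are hypothetical, and the assumption that each non-normal state maps to a subdirectory named after it is inferred from the `staging/`, `quarantine/` and `upload/` mentions in this series.

```python
from enum import Enum
from pathlib import Path


class SstableState(Enum):
    """Hypothetical model of the sstable states named in this series."""
    NORMAL = "normal"
    STAGING = "staging"
    QUARANTINE = "quarantine"
    UPLOAD = "upload"


def state_to_dir(table_dir: Path, state: SstableState = SstableState.NORMAL) -> Path:
    """Map (table directory, state) to an on-disk directory.

    Assumption: normal sstables live directly in the table directory and
    every other state uses a subdirectory named after the state.
    """
    if state is SstableState.NORMAL:
        return table_dir
    return table_dir / state.value
```

With a model like this, callers pass the table directory and the desired state separately, and only a filesystem-backed storage driver needs to know how a state translates into a path.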
Pavel Emelyanov
c257ad90e1 sstable_directory: Merge verify and g.c. calls
Name it .prepare() and remove the sstable_directory() public method

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:45:51 +03:00
Pavel Emelyanov
07d4672054 distributed_loader: Merge verify and gc invocations
Both are launched on shard-0, no need to invoke_on two times

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:41:48 +03:00
Pavel Emelyanov
e955186765 sstable/filesystem: Put underscores to dir members
They are private class fields, must be _-prefixed

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:00:50 +03:00
Pavel Emelyanov
84b318228a sstable/s3: Mark make_s3_object_name() const
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:00:50 +03:00
Pavel Emelyanov
f15246f5ef sstable: Remove filename(dir, ...) method
It's only used by fs storage driver that can do dir/file concatenation
on its own. Moreover, this method is not welcome to be used even
internally

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:00:50 +03:00
Avi Kivity
b120d35c58 Merge 'Relax cql_test_env services maintenance' from Pavel Emelyanov
To add a sharded service to the cql_test_env one needs to patch it in 5 or 6 places

- add cql_test_env reference
- add cql_test_env constructor argument
- initialize the reference in initializer list
- add service variable to do_with method
- pass the variable to cql_test_env constructor
- (optionally) export it via cql_test_env public method

Steps 1 through 5 are annoying; things get much simpler if they look like

- add cql_test_env variable
- (optionally) export it via cql_test_env public method

This is what this PR does

refs: #2795

Closes #15028

* github.com:scylladb/scylladb:
  cql_test_env: Drop local *this reference
  cql_test_env: Drop local references
  cql_test_env: Move most of the stuff in run_in_thread()
  cql_test_env: Open-code env start/stop and remove both
  cql_test_env: Keep other services as class variables
  cql_test_env: Keep services as class variables
  cql_test_env: Construct env early
  cql_test_env: De-static fdpinger variable
  cql_test_env: Define all services' variables early
  cql_test_env: Keep group0_client pointer
2023-08-13 20:24:52 +03:00
Benny Halevy
8fbcf1ab9f view: start: ignore also abort_requested_exception
We see the abort_requested_exception error from time
to time, instead of sleep_aborted that was expected
and quietly ignored (in debug log level).

Treat abort_requested_exception the same way since
the error is expected on shutdown and to reduce
test flakiness, as seen for example, in
https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/3033/artifact/logs-full.release.010/1691896356104_repair_additional_test.py%3A%3ATestRepairAdditional%3A%3Atest_repair_schema/node2.log
```
INFO  2023-08-13 03:12:29,151 [shard 0] compaction_manager - Asked to stop
WARN  2023-08-13 03:12:29,152 [shard 0] gossip - failure_detector_loop: Got error in the loop, live_nodes={}: seastar::sleep_aborted (Sleep is aborted)
INFO  2023-08-13 03:12:29,152 [shard 0] gossip - failure_detector_loop: Finished main loop
WARN  2023-08-13 03:12:29,152 [shard 0] cdc - Aborted update CDC description table with generation (2023/08/13 03:12:17, d74aad4b-6d30-4f22-947b-282a6e7c9892)
INFO  2023-08-13 03:12:29,152 [shard 1] compaction_manager - Asked to stop
INFO  2023-08-13 03:12:29,152 [shard 1] compaction_manager - Stopped
INFO  2023-08-13 03:12:29,153 [shard 0] init - Signal received; shutting down
INFO  2023-08-13 03:12:29,153 [shard 0] init - Shutting down view builder ops
INFO  2023-08-13 03:12:29,153 [shard 0] view - Draining view builder
INFO  2023-08-13 03:12:29,153 [shard 1] view - Draining view builder
INFO  2023-08-13 03:12:29,153 [shard 0] compaction_manager - Stopped
ERROR 2023-08-13 03:12:29,153 [shard 0] view - start failed: seastar::abort_requested_exception (abort requested)
ERROR 2023-08-13 03:12:29,153 [shard 1] view - start failed: seastar::abort_requested_exception (abort requested)
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #15029
2023-08-13 18:39:09 +03:00
Gleb Natapov
70b5360a73 cql3: Extend the scope of group0_guard during DDL statement execution
Currently we hold group0_guard only during DDL statement's execute()
function, but unfortunately some statements access underlying schema
state also during check_access() and validate() calls which are called
by the query_processor before it calls execute. We need to cover those
calls with group0_guard as well and also move the retry loop up. This patch
does it by introducing a new function, take_guard(), to the cql_statement class.
Schema altering statements return group0 guard while others do not
return any guard. Query processor takes this guard at the beginning of a
statement execution and retries if service::group0_concurrent_modification
is thrown. The guard is passed to the execute in query_state structure.

Fixes: #13942

Message-ID: <ZNSWF/cHuvcd+g1t@scylladb.com>
2023-08-13 14:19:39 +03:00
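The control flow described in 70b5360a73 (take the guard first, run the access check, validation and execution under it, retry on concurrent modification) might look roughly like the following. This is a Python sketch of the flow only; the real code is C++ and asynchronous. The class names, the `process` function and the `max_attempts` bound are hypothetical, and passing the guard as a plain argument stands in for passing it via the query_state structure.

```python
from typing import Optional


class ConcurrentModification(Exception):
    """Stand-in for service::group0_concurrent_modification."""


class Guard:
    """Hypothetical placeholder for a group 0 guard."""


class Statement:
    """Non-DDL statement: does not return any guard."""

    def take_guard(self) -> Optional[Guard]:
        return None

    def check_access(self, guard: Optional[Guard]) -> None:
        pass

    def validate(self, guard: Optional[Guard]) -> None:
        pass

    def execute(self, guard: Optional[Guard]) -> str:
        return "ok"


class SchemaAlteringStatement(Statement):
    """DDL statement: returns a guard that covers all three phases."""

    def take_guard(self) -> Optional[Guard]:
        return Guard()


def process(statement: Statement, max_attempts: int = 10) -> str:
    """Take the guard up front and retry the whole sequence on conflict.

    max_attempts is an assumption of this sketch, not taken from the patch.
    """
    for _ in range(max_attempts):
        guard = statement.take_guard()
        try:
            statement.check_access(guard)
            statement.validate(guard)
            return statement.execute(guard)
        except ConcurrentModification:
            continue  # a fresh guard is taken on the next iteration
    raise ConcurrentModification("gave up after repeated conflicts")
```

The point of moving the loop up is that a retry re-runs check_access() and validate() under a fresh guard, rather than only re-running execute().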
Patryk Jędrzejczak
2e2271f639 raft: make a replaced node a non-voter early
We make a replaced node a non-voter early, just as a removed node
in 377f87c91a.

Closes #15022
2023-08-12 22:03:46 +02:00
Avi Kivity
0cd0be6275 Merge 'Remove some stale sugar from cql_test_env' from Pavel Emelyanov
There are some asserting checks for keyspace and table existence on cql_test_env that perform some one-line work in a complex manner; tests can do better on their own. Removing them makes cql_test_env simpler

refs: #2795

Closes #15027

* github.com:scylladb/scylladb:
  test: Remove require_..._exists from cql_test_env
  test: Open-code ks.cf name parse into cdc_test
  test: Don't use require_table_exists() in test/lib/random_schema
  test: Use BOOST_REQUIRE(!db.has_schema())
  test: Use BOOST_REQUIRE(db.has_schema())
  test: Use BOOST_REQUIRE(db.has_keyspace())
  test: Threadify cql_query_test::test_compact_storage case
  test: Threadify some cql_query_test cases
2023-08-12 22:32:47 +03:00
Pavel Emelyanov
64ddc9e4b4 cql_test_env: Drop local *this reference
The auto& env = *this is also now excessive, so drop it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:30:34 +03:00
Pavel Emelyanov
de679d7c36 cql_test_env: Drop local references
The local auto& foo = env._foo references in run_in_thread() are no longer
needed, the code that uses foo can be switched to use _foo (this->_foo)
instead

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:29:42 +03:00
Pavel Emelyanov
487ecae517 cql_test_env: Move most of the stuff in run_in_thread()
The do_with() method is static and cannot just access cql_test_env
variable's fields, using local references instead. To simplify this,
most of the method's content is moved to non-static run_in_thread()
method

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:28:40 +03:00
Pavel Emelyanov
2c175660f2 cql_test_env: Open-code env start/stop and remove both
These two just make more churn in next patch, so drop both

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:28:03 +03:00
Pavel Emelyanov
10f9292fe8 cql_test_env: Keep other services as class variables
There are more services on do_with() stack that are not referenced from
the cql_test_env. Move them to be class variables too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:27:19 +03:00
Pavel Emelyanov
08a3be3b17 cql_test_env: Keep services as class variables
Now they are duplicated -- variables exist on do_with() stack and the
class references some of them. This patch makes it vice-versa -- all the
variables are on the cql_test_env and do_with() references them. The
latter will change soon

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:26:21 +03:00
Pavel Emelyanov
b31d2097b8 cql_test_env: Construct env early
Its constructor is _just_ assigning references and setting up rlimits.
Both can happen early

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:25:49 +03:00
Pavel Emelyanov
49d4760655 cql_test_env: De-static fdpinger variable
So that it could be moved onto cql_test_env as a class member

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:25:25 +03:00
Pavel Emelyanov
749c5baf21 cql_test_env: Define all services' variables early
Nowadays they are all scattered along the .do_with() function. Keeping
them in one early place makes it possible to relocate them onto the
cql_test_env later

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:23:54 +03:00
Pavel Emelyanov
d36737f094 cql_test_env: Keep group0_client pointer
It's now a reference, but some time later it won't be possible to
initialize it at construction time, so turn it into a pointer

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:23:16 +03:00
Pavel Emelyanov
da98355bc8 test: Remove require_..._exists from cql_test_env
Not used by any code anymore. This makes cql_test_env shorter and nicer

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 11:46:36 +03:00
Pavel Emelyanov
64c8a59e9b test: Open-code ks.cf name parse into cdc_test
The test uses qualified ks.cf name to find the schema, but it's the only
test case that does it. There's no point in maintaining a dedicated
helper on the cql_test_env just for that

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 11:46:36 +03:00
Pavel Emelyanov
6ead9a5255 test: Don't use require_table_exists() in test/lib/random_schema
This check is pointless. The subsequent call to find_column_family()
would call on_internal_error() in case schema is not found, and since
cql_test_env sets abort-on-internal-error to true, this would fail just
like that

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 11:46:36 +03:00
Pavel Emelyanov
b4c84f9174 test: Use BOOST_REQUIRE(!db.has_schema())
Surprisingly there's a dedicated helper for the check opposite to the
one fixed in the previous patch. Fix this one too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 11:46:36 +03:00
Pavel Emelyanov
063baabaee test: Use BOOST_REQUIRE(db.has_schema())
Same as in previous patch, the cql_test_env::require_table_exists()
helper is exactly the same, but returns future and asserts on failures
for no gain

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 11:46:32 +03:00
Pavel Emelyanov
a128108acd test: Use BOOST_REQUIRE(db.has_keyspace())
The cql_test_env::require_keyspace_exists() performs exactly the same
check, but is future-returning function for no reason and it assert()s
on failure, that's less informative (not that it ever failed) than
BOOST_REQUIRE

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 11:46:29 +03:00
Pavel Emelyanov
53c309063e test: Threadify cql_query_test::test_compact_storage case
It's like in previous patch, and for the same reason, but the change is
a bit more complicated because it uses resolved futures' results in few
places, so it likely deserves separate commit

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 11:39:40 +03:00
Pavel Emelyanov
59db35fba0 test: Threadify some cql_query_test cases
Those two use straight .then-s sequences, no point in keeping them that
long. Being threads makes next patches shorter and nicer

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 11:38:51 +03:00
Aleksandra Martyniuk
db932c7106 compaction: hold gate immediately after task executor is created
If make_task call in compaction_manager::perform_compaction yields,
compaction_task_executor::_compaction_state may be gone and gate
won't be held.

Hold gate immediately after compaction_task_executor is created.
Add comment not to call prepare_task without preparation.

Refs: #14971.
Fixes: #14977.

Closes #14999
2023-08-11 13:56:38 +02:00
Kefu Chai
17c1b15c81 create-relocatable-package.py: add version file with tempfile
before this change, we build multiple relocatable packages for
different builds in parallel using ninja. all these relocatable
packages are built using the same script,
`create-relocatable-package.py`. but this script always uses the
same directory and file for the `.relocatable_package_version`
file.

so there are chances that these jobs building the relocatable
package can race, writing / accessing the same file at the same
time. so, in this change, instead of using a fixed file path
for this temporary file, we use a NamedTemporaryFile for this purpose.
this should help us avoid build failures like

```
[2023-08-10T09:38:00.019Z] FAILED: build/debug/dist/tar/scylla-unstripped-5.4.0~dev-0.20230810.116c10a2b0c6.x86_64.tar.gz
[2023-08-10T09:38:00.019Z] scripts/create-relocatable-package.py --mode debug 'build/debug/dist/tar/scylla-unstripped-5.4.0~dev-0.20230810.116c10a2b0c6.x86_64.tar.gz'
[2023-08-10T09:38:00.019Z] Traceback (most recent call last):
[2023-08-10T09:38:00.019Z]   File "/jenkins/workspace/scylla-master/scylla-ci/scylla/scripts/create-relocatable-package.py", line 130, in <module>
[2023-08-10T09:38:00.019Z]     os.makedirs(f'build/{SCYLLA_DIR}')
[2023-08-10T09:38:00.019Z]   File "<frozen os>", line 225, in makedirs
[2023-08-10T09:38:00.019Z] FileExistsError: [Errno 17] File exists: 'build/scylla-package'
```

Fixes #15018
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15007
2023-08-11 12:57:28 +03:00
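The fix in 17c1b15c81 relies on each job getting its own uniquely named file. A minimal sketch of that technique follows; the helper name `write_version_file` and the file name prefix are hypothetical, and the actual script may use NamedTemporaryFile differently.

```python
import tempfile


def write_version_file(version: str) -> str:
    """Write the version string to a uniquely named temporary file.

    Every call gets a fresh path, so parallel jobs never share the file.
    delete=False keeps the file after the handle is closed; the caller is
    expected to remove it once it has been packaged.
    """
    with tempfile.NamedTemporaryFile(
        mode="w", prefix="relocatable_package_version.", delete=False
    ) as f:
        f.write(version)
        return f.name
```

Two concurrent invocations then receive two different paths, which removes the shared fixed path that the jobs raced on.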
Aleksandra Martyniuk
ae67f5d47e api: ignore future in task_manager_json::wait_task
Before returning task status, wait_task waits for it to finish with
done() method and calls get() on a resulting future.

If the requested task fails, an exception will be thrown and the user
will get an internal server error instead of the failed task status.

With this change, the result of the done() method is ignored.

Fixes: #14914.

Closes #14915
2023-08-11 08:18:51 +03:00
Patryk Jędrzejczak
d1d1b6cf6e raft: remove a replaced node from group 0 earlier
The topology coordinator only marks a replaced node as LEFT
during the replace operation and actually removes it from the
group 0 config in cleanup_group0_config_if_needed. If this
function is called before raft has committed a replacing node as
a voter, it does not remove the replaced node from the group 0
config. Then, the coordinator can decide that it has no work to
do and start sleeping, leaving us with an outdated config.

This behavior reduces group 0 availability and causes problems in
tests (see #14885). Also, it makes the coordinator's logic
confusing - it claims that it has no work to do when it has some
work to do. Therefore, we modify the coordinator so that it
removes the replaced node earlier in handle_topology_transition.

Fixes #14885
Fixes #14975

Closes #15009
2023-08-11 01:32:24 +02:00
Botond Dénes
403ba9b055 Merge 'gossiper: lock_endpoint: fix review comments' from Benny Halevy
This series fixes a couple of review comments on #14845

Closes #14976

* github.com:scylladb/scylladb:
  gossiper: lock_endpoint: fix comment regarding permit_id mismatch
  gossiper: lock_endpoint: change assert to on_internal_error
2023-08-10 17:37:32 +03:00
Kamil Braun
8f658fb139 Merge 's3/client: check for available port before starting minio server' from Kefu Chai
there is a chance that the default port of 9000 is already in use on the
host running the test; in that case, we should try to use another
available port.

so, in this change, we try ports in the range [9000, 9000+1000), and
use the first one which is not connectable.

Fixes #14985
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14997

* github.com:scylladb/scylladb:
  test: stop using HostRegistry in MinioServer
  s3/client: check for available port before starting minio server
2023-08-10 14:01:13 +02:00
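The port probing described in 8f658fb139 (walk the range and pick the first port nothing is listening on) can be sketched as below. The function names, the timeout and the default host are assumptions of this sketch, not the test library's actual API.

```python
import socket


def is_connectable(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return s.connect_ex((host, port)) == 0


def find_free_port(host: str = "127.0.0.1", start: int = 9000, count: int = 1000) -> int:
    """Return the first port in [start, start + count) that is not connectable."""
    for port in range(start, start + count):
        if not is_connectable(host, port):
            return port
    raise RuntimeError(f"no free port in [{start}, {start + count})")
```

A port that is not connectable at probe time can still be taken by another process before the server binds it, so this narrows the window for a clash rather than closing it.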
Alejo Sanchez
e2122163f5 test/pylib: protect double call to cluster stop
test.py schedules calls to cluster .uninstall() and .stop(), so two
calls to .stop() can end up running at the same time. Mark the cluster
as not running early on.

While there, do the same for .stop_gracefully() for consistency.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #14987
2023-08-10 13:37:49 +02:00
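The guard described in e2122163f5 works because the running flag is cleared before the first suspension point. A minimal asyncio sketch follows; the `Cluster` class, `is_running` flag and `stop_calls` counter here are hypothetical stand-ins, not the real test/pylib class.

```python
import asyncio


class Cluster:
    """Hypothetical cluster whose stop() tolerates concurrent callers."""

    def __init__(self) -> None:
        self.is_running = True
        self.stop_calls = 0  # counts how many times the real work ran

    async def stop(self) -> None:
        if not self.is_running:
            return
        # Clear the flag before the first await, so a second caller that
        # gets scheduled while we are suspended returns immediately.
        self.is_running = False
        await asyncio.sleep(0)  # stand-in for actually stopping servers
        self.stop_calls += 1
```

If the flag were cleared only after the await, both callers would pass the check and the shutdown work would run twice.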
Kamil Braun
39330b9c11 Merge 'gossiper: convict: lock_endpoint' from Benny Halevy
Currently, `mark_dead` is called with null_permit_id
from `convict`, and in this case, by contract,
it should lock the endpoint, same as `mark_as_shutdown`.

This change somehow escaped #14845 so it amends it.

Fixes #14838

Closes #15001

* github.com:scylladb/scylladb:
  gossiper: verify permit_id in all private functions
  gossiper: convict: lock_endpoint
2023-08-10 13:09:05 +02:00
Benny Halevy
623ed1a249 gossiper: verify permit_id in all private functions
Instead of acquiring the permit if the permit_id arg is null,
like in mark_as_shutdown, just assert that the permit_id is
non-null. The functions are both private and we control all callers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-10 08:17:04 +03:00
Benny Halevy
42c1c5ead8 gossiper: convict: lock_endpoint
Currently, `mark_dead` is called with null_permit_id
from `convict`, and in this case, by contract,
it should lock the endpoint, same as `mark_as_shutdown`.

This change somehow escaped #14845 so it amends it.

Fixes #14838

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-10 07:50:33 +03:00
Kefu Chai
0c0a59bf62 test: stop using HostRegistry in MinioServer
since MinioServer finds a free port by itself, there is no need to
provide it an IP address anymore -- we can always use
127.0.0.1.

so, in this change, we just drop the HostRegistry parameter passed
to the constructor of MinioServer, and pass the host address in place
of it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-08-09 23:40:22 +08:00
Kamil Braun
59c410fb97 Merge 'migration_manager: announce: provide descriptions for all calls' from Patryk Jędrzejczak
The `system.group0_history` table provides useful descriptions for each
command committed to Raft group 0. One way of applying a command to
group 0 is by calling `migration_manager::announce`. This function has
the `description` parameter set to empty string by default. Some calls
to `announce` use this default value which causes `null` values in
`system.group0_history`. We want `system.group0_history` to have an
actual description for every command, so we change all default
descriptions to reasonable ones.

Going further, we remove the default value for the `description`
parameter of `migration_manager::announce` to avoid using it in the
future. Thanks to this, all commands in `system.group0_history` will
have a non-null description.

Fixes #13370

Closes #14979

* github.com:scylladb/scylladb:
  migration_manager: announce: remove the default value of description
  test: always pass empty description to migration_manager::announce
  migration_manager: announce: provide descriptions for all calls
2023-08-09 16:58:41 +02:00
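The effect of removing the default value in 59c410fb97 is that a caller can no longer omit the description by accident. In Python terms this corresponds to a required keyword-only parameter; the sketch below is an analogy only, with a hypothetical `announce` function and an in-memory list standing in for `system.group0_history`.

```python
from typing import List

# Hypothetical in-memory stand-in for system.group0_history
group0_history: List[str] = []


def announce(mutations: list, *, description: str) -> None:
    """Record a command; description is required and has no default."""
    group0_history.append(description)
```

Calling `announce([])` without a description now fails at the call site, while a test that really wants an empty description has to say so explicitly with `description=""`.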