Commit Graph

41653 Commits

Author SHA1 Message Date
Kamil Braun
76fb902858 test: unflake test_topology_remove_garbage_group0
The test boots nodes and then immediately starts shutting down nodes
and removing them from the cluster. The shutdowns and removals may
happen before the driver manages to connect to all nodes in the
cluster. In particular, the driver may not yet have connected to the
last bootstrapped node. It can even happen that the driver has
connected, but the control connection is established to the first node,
and the driver fetched the topology from the first node before the
first node considered the last node to be normal. The driver then
decides to close its connection to the last node, like this:
```
22:34:03.159 DEBUG> [control connection] Removing host not found in
   peers metadata: <Host: 127.42.90.14:9042 datacenter1>
```

Eventually, at the end of the test, only the last node remains, all
other nodes have been removed or stopped. But the driver does not have a
connection to that last node.

Fix this problem by ensuring that:
- all nodes see each other as NORMAL,
- the driver has connected to all nodes
at the beginning of the test, before we start shutting down and removing
nodes.

Fixes scylladb/scylladb#16373

Closes scylladb/scylladb#17676
2024-03-08 10:08:09 +01:00
Mikołaj Grzebieluch
a0915115c3 maintenance_socket: change log message to differentiate from regular CQL ports
Scylla-ccm uses the function `wait_for_binary_interface`, which waits
for the scylla log to print "Starting listening for CQL clients". If
this message is printed long before the regular cql_controller is
initialized, scylla-ccm assumes too early that the node is initialized.
This can result in timeouts that throw errors, for example in the
function `watch_rest_for_alive`.

Closes scylladb/scylladb#17496
2024-03-08 10:08:09 +01:00
Nadav Har'El
ea53db379f Merge 'tools/scylla-nodetool: listsnapshot: make it compatible with origin' from Botond Dénes
The following incompatibilities were identified by `listsnapshots_test.py` in dtests:
* The command doesn't bail out when there are no snapshots; instead it prints a meaningless empty report
* Formatting is incompatible

Both are fixed in this mini-series.

Closes scylladb/scylladb#17541

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: listsnapshots: make the formatting compatible with origin's
  tools/scylla-nodetool: listsnapshots: bail out if there are no snapshots
2024-03-08 10:08:09 +01:00
Botond Dénes
b69ee6bc27 Merge 'Fix load-and-stream for tablets' from Raphael "Raph" Carvalho
Multiple tablets might cohabit the same shard, so we want load-and-stream to start a new streaming session for every tablet, so that the receiver has the data properly segregated. This is similar to the treatment we gave to repair. Today, load-and-stream fails because of sstables spanning more than 1 tablet in the receiver.

Synchronization with migration is done by taking the replication map, so migrations cannot advance while new data is being streamed. A bug was fixed too: data must also be streamed to pending replicas, to handle the case where a migration is ongoing and new data must reach both the old and the new replica sets. A test was added to stress this synchronization path.
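The pending-replica part of the fix can be sketched minimally as follows. The sketch assumes a simplified, hypothetical `tablet_info` with plain integer replica IDs. ScyllaDB's real tablet map types differ, and holding the replication map for the duration of streaming is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical replica identifier (ScyllaDB uses host IDs plus shards).
using replica_id = int;

// Hypothetical, simplified replication state of one tablet.
struct tablet_info {
    std::vector<replica_id> replicas;   // current replica set
    std::optional<replica_id> pending;  // set while a migration is in flight
};

// Replicas that newly loaded data must reach. During a migration the data
// must land on both the old and the new replica sets, so the pending
// replica is added to the streaming targets.
std::vector<replica_id> streaming_targets(const tablet_info& t) {
    std::vector<replica_id> targets = t.replicas;
    if (t.pending && std::find(targets.begin(), targets.end(), *t.pending) == targets.end()) {
        targets.push_back(*t.pending);
    }
    return targets;
}
```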

Another bug was fixed in sstable loading, which expected the sharder not to be invalidated throughout the operation; that assumption breaks during migrations.

Fixes #17315.

Closes scylladb/scylladb#17449

* github.com:scylladb/scylladb:
  test: test_tablets: Add load-and-stream test
  sstables_loader: Stream to pending tablet replica if needed
  sstables_loader: Implement tablet based load-and-stream
  sstables_loader: Virtualize sstable_streamer for tablet
  sstables_loader: Avoid reallocations in vector
  sstable_loader: Decouple sstable streaming from selection
  sstables_loader: Introduce sstable_streamer
  Fix online SSTable loading with concurrent tablet migration
2024-03-07 14:18:30 +02:00
Nadav Har'El
19bcea6216 materialized views: fix rare failure caused by empty update
This one-line patch fixes a failure in the dtest

        lwt_schema_modification_test.py::TestLWTSchemaModification
        ::test_table_alter_delete

Where an update sometimes failed due to an internal server error, and the
log had the mysterious warning message:

        "std::logic_error (Empty materialized view updated)"

We've also seen this log-message in the past in another user's log, and
never understood what it meant.

It turns out that the error message was generated (and warning printed)
while building view updates for a base-table mutation, and noticing that
the base mutation contains an *empty* row - a row with no cells or
tombstone or anything whatsoever. This case was deemed (8 years ago,
in d5a61a8c48) unexpected and nonsensical,
and we threw an exception. But this case actually *can* happen - here is
how it happened in test_table_alter_delete - which is a test involving
a strange combination of materialized views, LWT and schema changes:

 1. A table has a materialized view, and also a regular column "int_col".
 2. A background thread repeatedly drops and re-creates this column
    int_col.
 3. Another thread deletes rows with LWT ("IF EXISTS").
 4. These LWT operations each read the existing row, and because of
    repeated drop-and-recreate of the "int_col" column, sometimes this
    read notices that one node has a value for int_col and the other
    doesn't, and creates a read-repair mutation setting int_col (the
    only difference between the two reads is in this column).
 5. The node missing "int_col" receives this mutation which sets only
    int_col. It upgrade()s this mutation to its most recent schema,
    which doesn't have int_col, so it removes this column from the
    mutation row - and is left with a completely empty mutation row.
    This completely empty row is not useful, but upgrade() doesn't
    remove it.
 6. The view-update generation code sees this empty base-mutation row
    and fails it with this std::logic_error.
 7. The node which sent the read-repair mutation sees that the read
    repair failed, so it fails the read and therefore fails the LWT
    delete operation.
    It is this LWT operation which failed in the test, and caused
    the whole test to fail.

The fix is trivial: an empty base-table row mutation should simply be
*ignored* when generating view updates - it shouldn't cause any error.
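The idea can be sketched minimally with a hypothetical, simplified `base_row` type rather than ScyllaDB's mutation classes. A row with no cells and no tombstone produces no view update instead of an error.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a base-table row inside a mutation.
struct base_row {
    std::map<std::string, int> cells;  // column name -> value
    std::optional<long> tombstone;     // deletion timestamp, if any

    bool empty() const { return cells.empty() && !tombstone; }
};

// Hypothetical view update: records which base row produced it.
struct view_update {
    std::size_t base_row_index;
};

// Previously an empty row led to std::logic_error("Empty materialized view
// updated"); here it is simply skipped.
std::vector<view_update> generate_view_updates(const std::vector<base_row>& rows) {
    std::vector<view_update> updates;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (rows[i].empty()) {
            continue;  // e.g. a read-repair row whose only column was removed by upgrade()
        }
        updates.push_back(view_update{i});
    }
    return updates;
}
```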

Before this patch, test_table_alter_delete used to fail in roughly
20% of the runs on my laptop. After this patch, I ran it 100 times
without a single failure.

Fixes #15228
Fixes #17549

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17607
2024-03-07 12:00:43 +02:00
Botond Dénes
09068d20ea tools/scylla-nodetool: scrub: make keyspace parameter optional
When no keyspace is provided, request all keyspaces from the server,
then scrub all of them. This is what the legacy nodetool does; for some
reason this was missed when re-implementing scrub.

Closes scylladb/scylladb#17495
2024-03-07 11:15:46 +02:00
Tomasz Grabiec
ec6ed18b5c Merge 'Handle tablet migration failure in barrier stages' from Pavel Emelyanov
There are 4 barrier-only stages when migrating a tablet, and the test needs to fail the pending/leaving replica that handles them in order to validate how the coordinator handles a dead node. Failing the barrier is done by suspending it with injection code and stopping the node without waking it up. The main difficulty is telling one barrier RPC call from another, because the calls carry nothing that identifies the stage the barrier is run for. This PR suggests that the barrier injection code look directly into the system.tablets table for the transition stage; the stage is already there by the time the barrier is about to ack itself over RPC.

refs: #16527

Closes scylladb/scylladb#17450

* github.com:scylladb/scylladb:
  topology.tablets_migration: Handle failed use_new
  topology.tablets_migration: Handle failed write_both_read_new
  topology.tablets_migration: Handle failed write_both_read_old
  topology.tablets_migration: Handle failed allow_write_both_read_old
  test/tablets_migration: Add conditional break-point into barrier handler
  replica: Add helper to read tablet transition stage
  topology_coordinator: Add action_failed() helper
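The per-stage outcomes described in the individual commits of this series can be summarized in a small sketch. The `stage` enumerators mirror the stage names above. `next_step` and `continue_forward` are hypothetical names, and this is not the topology coordinator's actual code.

```cpp
#include <cassert>

// Barrier-only tablet transition stages named in this series.
enum class stage { allow_write_both_read_old, write_both_read_old, write_both_read_new, use_new };

// Hypothetical labels for where the coordinator goes after a failed barrier.
enum class next_step { revert_migration, cleanup_target, continue_forward };

// fewer_dead_in_old: whether the old replica set has fewer dead nodes than
// the new one; only consulted for write_both_read_new.
next_step on_barrier_failure(stage s, bool fewer_dead_in_old) {
    switch (s) {
    case stage::allow_write_both_read_old:
        return next_step::revert_migration;  // early stage: use the existing revert path
    case stage::write_both_read_old:
        return next_step::cleanup_target;    // the target may already have received writes
    case stage::write_both_read_new:
        return fewer_dead_in_old ? next_step::cleanup_target : next_step::continue_forward;
    case stage::use_new:
        return next_step::continue_forward;  // cannot revert; the barrier skips excluded nodes
    }
    return next_step::continue_forward;      // not reached for valid enumerators
}
```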
2024-03-07 09:56:13 +01:00
Botond Dénes
5dfaa69bde tools/scylla-nodetool: listsnapshots: make the formatting compatible with origin's
The author (me) tried to be clever and fix the formatting, but then he
realized this just means a lot of unnecessary fighting with tests. So
this patch makes the formatting compatible with that of the legacy
nodetool:
* Use compatible rounding and precision formatting
* Use incorrect unit (KB instead of KiB)
* Align numbers to the left
* Add trailing white-space to "Snapshot Details: "
2024-03-07 03:54:54 -05:00
Botond Dénes
80483ba732 tools/scylla-nodetool: listsnapshots: bail out if there are no snapshots
Print a message and exit, don't continue to output the snapshot table.
This is what the legacy nodetool does too.
2024-03-07 03:54:54 -05:00
Botond Dénes
ac15e4c109 tools/scylla-nodetool: repair: accept and ignore -full/--full and -j/--job-threads
These two parameters are not used by the native nodetool, because
ScyllaDB itself doesn't support them. They should just be ignored, and
indeed there was a unit test checking that this is the case. However,
due to a mistake in the unit test, this was not actually tested and
nodetool complained when seeing these params.
This patch fixes both the test and the native nodetool.
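"Accept and ignore" can be sketched hypothetically as an argument pre-filter. The native nodetool's real option parsing is not shown here and likely differs. The sketch also assumes that `-j`/`--job-threads` take their value as the following argument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pre-processing step: drop repair options that the legacy
// nodetool accepts but ScyllaDB does not support.
std::vector<std::string> drop_ignored_repair_options(const std::vector<std::string>& args) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const auto& a = args[i];
        if (a == "-full" || a == "--full") {
            continue;  // flag without a value
        }
        if (a == "-j" || a == "--job-threads") {
            ++i;       // also skip the thread-count value that follows
            continue;
        }
        out.push_back(a);
    }
    return out;
}
```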

Closes scylladb/scylladb#17477
2024-03-07 11:53:50 +03:00
Nadav Har'El
a36c8b28dd Merge 'scylla-gdb.py: fixes warnings raised by flake8' from Kefu Chai
this changeset addresses some warnings raised by flake8 in the hope of improving the readability of this script in general.

Closes scylladb/scylladb#17668

* github.com:scylladb/scylladb:
  scylla-gdb: s/if not foo is None/if foo is not None/
  scylla-gdb.py: add space after keyword
  scylla-gdb.py: remove extraneous spaces
  scylla-gdb.py: use 2 empty lines between top-level funcs/classes
  scylla-gdb.py: replace <tab> with 4 spaces
  scylla-gdb: fix the indent
2024-03-07 10:41:15 +02:00
Botond Dénes
28639e6a59 Merge 'docs: trigger the docs-pages workflow on release branches' from Beni Peled
Currently, the GitHub docs-pages workflow is triggered only when changes are merged to the master/enterprise branches. This means that for changes to a release branch, for example a fix to branch-5.4 or a branch-5.4>branch-2024.1 merge, the docs-pages workflow is not triggered, and therefore the documentation is not updated with the new change.

In this change, I added the `branch-**` pattern, so changes to release branches will trigger the workflow.

Closes scylladb/scylladb#17281

* github.com:scylladb/scylladb:
  docs: always build from the default branch
  docs: trigger the docs-pages workflow on release branches
2024-03-07 10:01:50 +02:00
Botond Dénes
75fe2f5c3a Merge 'test: rest_api: fix tests to work with tablets' from Aleksandra Martyniuk
Fix test_compaction_task.py, test_repair_task.py and
test_storage_service.py to work with tablets.

Fixes: #17338.

Closes scylladb/scylladb#17474

* github.com:scylladb/scylladb:
  test: rest_api: enable tablets by default
  test: fix indentation and delete unused this_dc param
  test: rest_api: fix test_storage_service.py
  test: rest_api: fix test_repair_task.py
  test: rest_api: fix test_compaction_task.py
  test: rest_api: use skip_without_tablets fixture
  test: rest_api: add some tablet related fixtures
2024-03-07 10:00:09 +02:00
Asias He
83a28342ea service: Drop unused table param from session_topology_guard
The table param is not used. Dropping it so it can be used in places
where the table object is not available.

Closes scylladb/scylladb#17628
2024-03-07 09:34:40 +02:00
Israel Fruchter
6eb0509ff9 Update tools/cqlsh submodule
* tools/cqlsh b8d86b76...e5f5eafd (2):
  > dist/debian: fix the trailer line format
  > `COPY TO STDOUT` shouldn't put None where a function is expected

Fixes: scylladb/scylladb#17451

Closes scylladb/scylladb#17447
2024-03-07 09:33:36 +02:00
Michał Chojnowski
f9e97fa632 sstables: fix a use-after-free in key_view::explode()
key_view::explode() contains a blatant use-after-free:
unless the input is already linearized, it returns a view to a local temporary buffer.

This is rare, because partition keys are usually not large enough to be fragmented.
But for a sufficiently large key, this bug causes a corrupted partition_key down
the line.
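This bug class and one safe alternative can be sketched with a hypothetical fragmented key (a vector of string fragments) instead of ScyllaDB's `key_view`. The sketch does not reproduce the patch itself. The dangling version appears only in comments, since calling it would be undefined behavior.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical fragmented key: large keys are split across several buffers.
using fragmented_key = std::vector<std::string>;

// BUGGY pattern (do not use):
//   std::string_view linearize(const fragmented_key& k) {
//       if (k.size() == 1) return k.front();  // fine: points into k
//       std::string tmp;                       // local temporary
//       for (const auto& f : k) tmp += f;
//       return tmp;                            // dangles once the function returns
//   }

// Safe variant: when the key must be linearized, the bytes are copied into
// caller-owned storage, which must outlive every use of the returned view.
std::string_view linearize(const fragmented_key& k, std::string& storage) {
    if (k.size() == 1) {
        return k.front();  // already contiguous: view into the key itself
    }
    storage.clear();
    for (const auto& frag : k) {
        storage += frag;
    }
    return storage;        // view into caller-owned memory
}
```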

Fixes #17625

Closes scylladb/scylladb#17626
2024-03-07 09:07:07 +02:00
Kefu Chai
7631605892 query-request: use default-generated operator==
instead of using the hand-crafted operator==, use the default-generated
one, which is equivalent to the former.

regarding the difference between global operator== and member operator==,
the default-generated operator in C++20 is now symmetric. so we don't
need to worry about the problem of `max_result_size` being lhs or rhs.
nor do we need to worry about implicit conversions, because
all constructors of `max_result_size` are marked explicit. so we don't
gain any advantage by making the operator== global instead of a member
operator.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17536
2024-03-07 09:02:42 +03:00
Kefu Chai
64e14d21db locator/tablets: add fmt::formatter for tablet_*
before this change, we relied on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* tablet_id
* tablet_replica
* tablet_metadata
* tablet_map

their operator<<:s are dropped

Refs scylladb/scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17504
2024-03-07 09:00:49 +03:00
Kefu Chai
6ef507e842 build: cmake: add table_check.cc to repair/CMakeLists.txt
in 5202bb9d, we introduced repair/table_check.cc, but we didn't
update repair/CMakeLists.txt accordingly, even though the symbols defined
by this compilation unit are referenced by other source files when
building scylla.

so, in this change, we add this table_check.cc to the "repair"
target.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17517
2024-03-07 08:59:02 +03:00
Pavel Emelyanov
52a1b2c413 Merge 'mutation: add fmt::formatter for mutation types' from Kefu Chai
before this change, we relied on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* position_range
* mutation_fragment
* range_tombstone_stream
* mutation_fragment_v2::printer

Refs #13245

Closes scylladb/scylladb#17521

* github.com:scylladb/scylladb:
  mutation: add fmt::formatter for position_range
  mutation: add fmt::formatter for mutation_fragment and range_tombstone_stream
  mutation: add fmt::formatter for mutation_fragment_v2::printer
2024-03-07 08:56:21 +03:00
Pavel Emelyanov
df6048adec topology.tablets_migration: Handle failed use_new
This stage doesn't need any special treatment, because we cannot revert
to old replicas and should proceed normally. The barrier itself won't
get stuck, because it already handles excluded/ignored nodes.

Just make the test validate it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
fb7428c560 topology.tablets_migration: Handle failed write_both_read_new
Two options here -- revert to the old replicas by jumping into the
cleanup_target stage, or proceed normally. The choice depends on which
replica set has fewer dead nodes.
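This choice can be sketched minimally, assuming plain integer node IDs and a known set of dead nodes. The function and enum names are hypothetical. Breaking ties toward proceeding is also an assumption, since the message does not say how ties are handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using node_id = int;  // hypothetical node identifier

enum class failure_decision { revert_to_old, proceed_with_new };

// Number of replicas in `replicas` that are known to be dead.
std::ptrdiff_t dead_count(const std::vector<node_id>& replicas, const std::set<node_id>& dead) {
    return std::count_if(replicas.begin(), replicas.end(),
                         [&](node_id n) { return dead.count(n) > 0; });
}

// Revert (via cleanup_target) if the old replica set has fewer dead nodes,
// otherwise proceed with the new one.
failure_decision on_write_both_read_new_failure(const std::vector<node_id>& old_replicas,
                                                const std::vector<node_id>& new_replicas,
                                                const std::set<node_id>& dead) {
    return dead_count(old_replicas, dead) < dead_count(new_replicas, dead)
               ? failure_decision::revert_to_old
               : failure_decision::proceed_with_new;
}
```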

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
324eaaf873 topology.tablets_migration: Handle failed write_both_read_old
At this stage the target replica may already have received some writes,
so its tablet needs to be cleaned up; jump to the cleanup_target stage.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
f81e0b2e88 topology.tablets_migration: Handle failed allow_write_both_read_old
This is an early stage; just proceed to the existing revert_migration stage.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
5bb1597a30 test/tablets_migration: Add conditional break-point into barrier handler
There are several transition stages that are executed by the topology
coordinator with the help of barrier-and-drain raft commands. For the
test to stop and remove a node while it handles such a stage, it must
inject a break-point into the barrier handler, wait for it to be hit,
stop the node without resuming the break-point, and then remove the
node from the cluster with removenode.

The break-point suspends barrier handling when a specific tablet is in
specific transition stage. Tablet ID and desired stage are configured
via injector parameters.

With today's error-injection facilities the way to suspend code
execution is with injecting a lambda that waits for a message from the
injection engine.
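The break-point's condition can be sketched hypothetically. The parameter struct and function name are made up for illustration, and the tablet ID is simplified to a `long`. The actual suspension, waiting for a message from the injection engine, is not modeled.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical break-point parameters, as an injector might receive them.
struct breakpoint_params {
    long tablet_id;
    std::string stage;  // e.g. "write_both_read_old"
};

// Whether the barrier handler should suspend. `current_stage` is assumed to
// have been read from system.tablets for `tablet_id` before the barrier acks.
bool should_suspend(const std::optional<breakpoint_params>& bp,
                    long tablet_id, const std::string& current_stage) {
    return bp && bp->tablet_id == tablet_id && bp->stage == current_stage;
}
```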

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
f5264dc501 replica: Add helper to read tablet transition stage
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:25 +03:00
Kefu Chai
4f8b618be7 scylla-gdb: s/if not foo is None/if foo is not None/
more readable this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Kefu Chai
643a6d5bda scylla-gdb.py: add space after keyword
it'd be more pythonic to just put an expression after `assert`,
instead of wrapping it in a pair of parentheses. and there is no need
to add `;` after `break`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Kefu Chai
8c65f92f1f scylla-gdb.py: remove extraneous spaces
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Kefu Chai
12c06c39c3 scylla-gdb.py: use 2 empty lines between top-level funcs/classes
and 1 empty line for nested functions/classes, to be more PEP8
compliant and more readable.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Kefu Chai
8e3b22c76a scylla-gdb.py: replace <tab> with 4 spaces
do not mix tabs and spaces for indentation.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Kefu Chai
c4b679fe3b scylla-gdb: fix the indent
indentation should be a multiple of 4 spaces.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Pavel Emelyanov
79b5a75ded topology_coordinator: Add action_failed() helper
It checks if the action holder holds a failed action.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:46:29 +03:00
Botond Dénes
8dd6fe75e7 Merge 'tools/scylla-nodetool: implement info ' from Kefu Chai
Refs #15588

Closes scylladb/scylladb#17498

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement info
  test/nodetool: move format_size into utils.py
2024-03-07 07:14:51 +02:00
Avi Kivity
c5f01349b1 Merge 'Add specialized tablet_sstable_set' from Benny Halevy
Make a specialized sstable_set for tablets
via tablet_storage_group_manager::make_sstable_set.

This sstable set takes a snapshot of the storage_groups
(compound) sstable_sets and maps the selected tokens
directly into the tablet compound_sstable_set.

This sstable_set provides much more efficient access
to the table's sstable sets as it takes advantage of the disjointness
of sstable sets between tablets/storage_groups, and making it is cheaper
than rebuilding a complete partitioned_sstable_set from all sstables in the table.
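The cheaper lookup can be sketched roughly, under the assumption that a table's tablets split the signed 64-bit token space into a power-of-two number of equal, contiguous ranges. The types and names are hypothetical, and sstables are reduced to names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Map a token to its tablet, assuming 2^log2_tablets equal token ranges.
std::size_t tablet_of(std::int64_t token, unsigned log2_tablets) {
    if (log2_tablets == 0) {
        return 0;  // single tablet; also avoids shifting a 64-bit value by 64
    }
    // Flip the sign bit so the smallest token maps to 0, then keep the top bits.
    std::uint64_t biased = static_cast<std::uint64_t>(token) ^ (std::uint64_t(1) << 63);
    return static_cast<std::size_t>(biased >> (64 - log2_tablets));
}

// Hypothetical per-tablet sstable sets; they are disjoint across tablets.
struct tablet_sstable_set {
    unsigned log2_tablets;
    std::vector<std::vector<std::string>> per_tablet;  // index = tablet id

    // Jump straight to the one tablet owning the token, instead of searching
    // a partitioned set rebuilt over every sstable in the table.
    const std::vector<std::string>& select(std::int64_t token) const {
        return per_tablet.at(tablet_of(token, log2_tablets));
    }
};
```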

Fixes #16876

Cassandra-stress setup:
```
$ sudo cpupower frequency-set -g userspace
$ build/release/scylla (developer-mode options) --smp=16 --memory=8G --experimental-features=consistent-topology-changes --experimental-features=tablets
cqlsh> CREATE KEYSPACE keyspace1 WITH replication={'class':'NetworkTopologyStrategy', 'replication_factor':1} AND tablets={'initial':2048};
$ ./tools/java/tools/bin/cassandra-stress write no-warmup n=10000000 -pop 'seq=1...10000000' -rate threads=128
$ scylla-api-client system drop_sstable_caches POST
$ ./tools/java/tools/bin/cassandra-stress read no-warmup duration=60s -pop 'dist=uniform(1..10000000)' -rate threads=128
$ scylla-api-client system drop_sstable_caches POST
$ ./tools/java/tools/bin/cassandra-stress mixed no-warmup duration=60s -pop 'dist=uniform(1..10000000)' -rate threads=128
```

Baseline (0a7854ea4d) vs. fix (0c2c00f01b)

Throughput (op/s):
workload | baseline | fix
---------|----------|----------
write | 76,806 | 100,787
read | 34,330 | 106,099
mixed | 32,195 | 79,246

Closes scylladb/scylladb#17149

* github.com:scylladb/scylladb:
  table: tablet_storage_group_manager: make tablet_sstable_set
  storage_group_manager: add make_sstable_set
  tablet_storage_group_manager: handle_tablet_split_completion: pre-calc new_tablet_count
  table: tablet_storage_group_manager: storage_group_of: do not validate in release build mode
  table: move compaction_group_list and storage_group_vector to storage_group_manager
  compaction_group::table_state: get_group_id: become self-sufficient
  compaction_group, table: make_compound_sstable_set: declare as const
  tablet_storage_group_manager: precalculate my_host_id and _tablet_map
  table: coroutinize update_effective_replication_map
2024-03-06 23:59:39 +02:00
Botond Dénes
557d851191 tools/toolchain/README.md: mention the need of credentials for publishing images
Without this, the push will fail, complaining about bad permissions.

Closes scylladb/scylladb#17652
2024-03-06 15:58:24 +02:00
Kefu Chai
3e91b1382b tools/scylla-nodetool: always use compile-time format string
instead of passing fmt string as a plain `const char*`, pass it as
a consteval type, so that `fmt::format()` can perform compile-time
format check against it and the formatted params.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17656
2024-03-06 14:55:10 +02:00
Avi Kivity
3ab2088119 Merge 'build: cmake: use scylla build mode for rust profile name ' from Kefu Chai
before this change, we used the lower-case CMake build configuration
name for the rust profile names. but this was wrong, because the
profiles are named with the scylla build mode.

in this change, we translate the `$<CONFIG>` to scylla build mode,
and use it for the profile name and for the output directory of
the built library.

Closes scylladb/scylladb#17648

* github.com:scylladb/scylladb:
  build: cmake: use scylla build mode for rust profile name
  build: cmake: define per-config build mode
2024-03-06 13:46:20 +02:00
Botond Dénes
65b9e10543 repair: resolve start-up deadlock
Repairs have to obtain a permit to the reader concurrency semaphore on
each shard they have a presence on. This is prone to deadlocks:

node1                              node2
repair1_master (takes permit)      repair1_follower (waits on permit)
repair2_master (waits for permit)  repair2_follower (takes permit)

In lieu of strong central coordination, we solved this by making permits
evictable: repair2 can evict repair1's permit so it can obtain one
and make progress. This is not efficient, as evicting a permit usually
means discarding work already done, but it prevents the deadlocks.
We recently discovered that there is a window when deadlocks can still
happen. The permit is made evictable when the disk reader is created.
This reader is an evictable one, which effectively makes the permit
evictable. But the permit is obtained when the repair control
structure -- repair meta -- is created. Between creating the repair meta
and reading the first row from disk, the deadlock is still possible, and
we know that what is possible will happen (and did happen). Fix by
making the permit evictable as soon as the repair meta is created. This
is very clunky and we should have a better API for this (refs #17644),
but for now we go with this simple patch, to make it easy to backport.
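A toy model can show the deadlock above and why evictable permits break it. It assumes a hypothetical single-slot semaphore per node. ScyllaDB's reader concurrency semaphore is far more involved.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical permit: who holds it and whether it may be evicted.
struct permit {
    std::string owner;
    bool evictable = false;
};

// Hypothetical single-slot semaphore standing in for one node's shard.
struct shard_semaphore {
    std::optional<permit> holder;

    // Admit `p` if the slot is free or its holder is evictable (evicting the
    // holder discards its work). Otherwise the caller must wait: return false.
    bool try_admit(const permit& p) {
        if (!holder || holder->evictable) {
            holder = p;
            return true;
        }
        return false;
    }
};
```

With non-evictable permits, both followers' requests return `false` and neither repair can progress. If the permits are evictable from the moment they are created, the waiting side evicts the holder and proceeds.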

Refs: #17644
Fixes: #17591

Closes scylladb/scylladb#17646
2024-03-06 11:38:07 +02:00
Kamil Braun
19b816bb68 Merge 'Migrate system_auth to raft group0' from Marcin Maliszkiewicz
This patch series makes all auth writes serialized via raft. Reads stay
eventually consistent for performance reasons. To make the transition to the new
code easier, data is stored in a newly created keyspace: system_auth_v2.

Internally the difference is that instead of executing CQL directly for
writes, we generate mutations and then announce them via raft group0. The
per-commit descriptions provide more implementation details.

Refs https://github.com/scylladb/scylladb/issues/16970
Fixes https://github.com/scylladb/scylladb/issues/11157

Closes scylladb/scylladb#16578

* github.com:scylladb/scylladb:
  test: extend auth-v2 migration test to catch stale static
  test: add auth-v2 migration test
  test: add auth-v2 snapshot transfer test
  test: auth: add tests for lost quorum and command splitting
  test: pylib: disconnect driver before re-connection
  test: adjust tests for auth-v2
  auth: implement auth-v2 migration
  auth: remove static from queries on auth-v2 path
  auth: coroutinize functions in password_authenticator
  auth: coroutinize functions in standard_role_manager
  auth: coroutinize functions in default_authorizer
  storage_service: add support for auth-v2 raft snapshots
  storage_service: extract getting mutations in raft snapshot to a common function
  auth: service: capture string_view by value
  alternator: add support for auth-v2
  auth: add auth-v2 write paths
  auth: add raft_group0_client as dependency
  cql3: auth: add a way to create mutations without executing
  cql3: run auth DML writes on shard 0 and with raft guard
  service: don't loose service_level_controller when bouncing client_state
  auth: put system_auth and users consts in legacy namespace
  cql3: parametrize keyspace name in auth related statements
  auth: parametrize keyspace name in roles metadata helpers
  auth: parametrize keyspace name in password_authenticator
  auth: parametrize keyspace name in standard_role_manager
  auth: remove redundant consts auth::meta::*::qualified_name
  auth: parametrize keyspace name in default_authorizer
  db: make all system_auth_v2 tables use schema commitlog
  db: add system_auth_v2 tables
  db: add system_auth_v2 keyspace
2024-03-06 10:11:33 +01:00
Botond Dénes
58265a7dc1 tools/utils: fix use-after-free when printing error message for unknown operation
When a tool application is invoked with an unknown operation, an error
message is printed, which includes all the known operations, with all
their aliases. This is collected in `std::vector<std::string_view>`. The
problem is that the vector containing alias names is returned by
value, so the code ends up creating views into temporaries.
Fix this by returning the alias vector by const&.
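The lifetime issue can be sketched minimally, with a hypothetical `operation` class standing in for the tool's operation descriptors.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical operation descriptor with a name and aliases.
class operation {
    std::string _name;
    std::vector<std::string> _aliases;
public:
    operation(std::string name, std::vector<std::string> aliases)
        : _name(std::move(name)), _aliases(std::move(aliases)) {}

    const std::string& name() const { return _name; }
    // If this returned std::vector<std::string> by value, the loop in
    // all_names() would iterate a temporary that is destroyed when the loop
    // ends, leaving the collected string_views dangling. Returning const&
    // makes them point into the operation itself.
    const std::vector<std::string>& aliases() const { return _aliases; }
};

// Collect every known name, e.g. for an "unknown operation" error message.
std::vector<std::string_view> all_names(const std::vector<operation>& ops) {
    std::vector<std::string_view> names;
    for (const auto& op : ops) {
        names.push_back(op.name());
        for (const auto& alias : op.aliases()) {
            names.push_back(alias);
        }
    }
    return names;
}
```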

Fixes: #17584

Closes scylladb/scylladb#17586
2024-03-06 10:42:02 +02:00
Pavel Emelyanov
ca8bfed8e6 topology_coordinator: Demote log level for advance_in_background() errors
The helper in question is supposed to spawn a background fiber with
a tablet migration stage action and repeat it in case the action fails (until
operator intervention, but that's another story). When the action fails,
a message with ERROR level is logged about the failure.

This error confuses some tests that scan scylla log messages for
ERROR-s at the end, treat most of them (if not all) as critical and
fail. But this particular message is not in fact an error -- topology
coordinator would re-execute this action anyway, so let's demote the
message to be WARN instead.

refs: #17027

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17568
2024-03-06 10:39:00 +02:00
Botond Dénes
88a76245ba Merge 'Get metrics description' from Amnon Heiman
This series adds a Python script that searches the code for metric definitions and their descriptions.
Because part of the code defines metrics in a nonstandard way, the script uses a configuration file to resolve parameter values.

The script supports code that uses string formatting and string concatenation with variables.

The documentation team will use the results to both document the existing metrics and to get the metrics changes between releases.

Replaces #16328

Closes scylladb/scylladb#17479

* github.com:scylladb/scylladb:
  Adding scripts/metrics-config.yml
  Adding scripts/get_description.py to fetch metrics description
2024-03-06 10:37:35 +02:00
Kefu Chai
e248ab48db tools/scylla-nodetool: correct tablestats filtering
before this change, we failed to apply the filtering of the tablestats
command correctly:

1. `table_filter` failed to check if the delimiter is npos before
   extracting the cf component from the specified table name.
2. the stats should not include keyspaces which are not
   included by the filter.
3. the total number of tables in the stats report should include
   all tables, whether they are filtered or not.

in this change, all the problems above are addressed. and the tests
are updated to cover these use cases.
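The corrected filtering logic for the first two points can be sketched hypothetically. The type and member names are made up. The third point, counting all tables in the total, would be computed separately over the unfiltered table list.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical filter built from "keyspace" or "keyspace.table" arguments.
class table_filter {
    struct entry {
        std::string ks;
        std::string cf;  // empty: the whole keyspace is selected
    };
    std::vector<entry> _entries;

public:
    explicit table_filter(const std::vector<std::string>& args) {
        for (const auto& arg : args) {
            auto dot = arg.find('.');
            if (dot == std::string::npos) {
                // No delimiter: the argument names a keyspace and has no cf part.
                _entries.push_back({arg, ""});
            } else {
                _entries.push_back({arg.substr(0, dot), arg.substr(dot + 1)});
            }
        }
    }

    // An empty filter selects every table.
    bool matches(std::string_view ks, std::string_view cf) const {
        if (_entries.empty()) {
            return true;
        }
        for (const auto& e : _entries) {
            if (e.ks == ks && (e.cf.empty() || e.cf == cf)) {
                return true;
            }
        }
        return false;
    }

    // Keyspaces not named by the filter are left out of the report.
    bool includes_keyspace(std::string_view ks) const {
        if (_entries.empty()) {
            return true;
        }
        for (const auto& e : _entries) {
            if (e.ks == ks) {
                return true;
            }
        }
        return false;
    }
};
```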

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17468
2024-03-06 10:36:20 +02:00
Benny Halevy
0c2c00f01b table: tablet_storage_group_manager: make tablet_sstable_set
Make a specialized sstable_set for tablets
via tablet_storage_group_manager::make_sstable_set.

This sstable set takes a snapshot of the storage_groups
(compound) sstable_sets and maps the selected tokens
directly into the tablet compound_sstable_set.

Refs #16876

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:35:36 +02:00
Benny Halevy
0745865914 storage_group_manager: add make_sstable_set
Move the responsibility for preparing the table_set
covering all sstables in the table to the storage_group_manager
so it can specialize the sstable_set.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:35:36 +02:00
Benny Halevy
3cee24c148 tablet_storage_group_manager: handle_tablet_split_completion: pre-calc new_tablet_count
Mini-cleanup of `new_tablet_count`, similar
to pre-calculating `old_tablet_count` once.

While at it, add some missing coding-style related spaces.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:35:36 +02:00
Benny Halevy
c65768dc24 table: tablet_storage_group_manager: storage_group_of: do not validate in release build mode
No validation is really required in release build.
Add `#ifndef SCYLLA_BUILD_MODE_RELEASE` before
adding another term to the logic in the next patch
that adds support for sparse allocation in a cloned
tablet_storage_group_manager.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:35:36 +02:00
Benny Halevy
7f203f0551 table: move compaction_group_list and storage_group_vector to storage_group_manager
So the storage_group_manager can be used later by table_sstable_set.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:35:33 +02:00
Tzach Livyatan
a245c0bb98 Docs: Remove 3rd party Rust Driver from the driver list
The 3rd party Rust https://github.com/AlexPikalov/cdrs is not maintained, and we have a better internal alternative.

Closes scylladb/scylladb#15815
2024-03-06 10:34:43 +02:00