Commit Graph

34956 Commits

Author SHA1 Message Date
Benny Halevy
6d7b2bc02f sstables: compressed_file_data_source_impl: get: throw malformed_sstable_exception on premature eof
Currently, the reader might dereference a null pointer
if the input stream reaches eof prematurely,
and read_exactly returns an empty temporary_buffer.

Detect this condition before dereferencing the buffer
and sstables::malformed_sstable_exception.

Fixes #13599

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13600

(cherry picked from commit 77b70dbdb7)
2023-12-15 13:54:42 +02:00
Wojciech Mitros
119c8279dd rust: update wasmtime dependency
The previous version of wasmtime had a vulnerability that possibly
allowed causing undefined behavior when calling UDFs.

We're directly updating to wasmtime 8.0.1, because the update only
requires a slight code modification and the Wasm UDF feature is
still experimental. As a result, we'll benefit from a number of
new optimizations.

Fixes #13807

Closes #13804

(cherry picked from commit 6bc16047ba)
2023-12-15 13:54:42 +02:00
Michał Chojnowski
3af6dfe4ac database: fix reads_memory_consumption for system semaphore
The metric shows the opposite of what its name suggests.
It shows available memory rather than consumed memory.
Fix that.

Fixes #13810

Closes #13811

(cherry picked from commit 0813fa1da0)
2023-12-15 13:54:42 +02:00
Eliran Sinvani
0230798db3 use_statement: Covert an exception to a future exception
The use statement execution code can throw if the keyspace is
doesn't exist, this can be a problem for code that will use
execute in a fiber since the exception will break the fiber even
if `then_wrapped` is used.

Fixes #14449

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>

Closes scylladb/scylladb#14394

(cherry picked from commit c5956957f3)
2023-12-15 13:54:42 +02:00
Botond Dénes
64503a7137 Merge 'mutation_query: properly send range tombstones in reverse queries' from Michał Chojnowski
reconcilable_result_builder passes range tombstone changes to _rt_assembler
using table schema, not query schema.
This means that a tombstone with bounds (a; b), where a < b in query schema
but a > b in table schema, will not be emitted from mutation_query.

This is a very serious bug, because it means that such tombstones in reverse
queries are not reconciled with data from other replicas.
If *any* queried replica has a row, but not the range tombstone which deleted
the row, the reconciled result will contain the deleted row.

In particular, range deletes performed while a replica is down will not
later be visible to reverse queries which select this replica, regardless of the
consistency level.

As far as I can see, this doesn't result in any persistent data loss.
Only in that some data might appear resurrected to reverse queries,
until the relevant range tombstone is fully repaired.

This series fixes the bug and adds a minimal reproducer test.

Fixes #10598

Closes scylladb/scylladb#16003

* github.com:scylladb/scylladb:
  mutation_query_test: test that range tombstones are sent in reverse queries
  mutation_query: properly send range tombstones in reverse queries

(cherry picked from commit 65e42e4166)
2023-12-14 12:53:07 +02:00
Yaron Kaikov
b013877629 build_docker.sh: Upgrade package during creation and remove sshd service
When scanning our latest docker image using `trivy` (command: `trivy
image docker.io/scylladb/scylla-nightly:latest`), it shows we have OS
packages which are out of date.

Also removing `openssh-server` and `openssh-client` since we don't use
it for our docker images

Fixes: https://github.com/scylladb/scylladb/issues/16222

Closes scylladb/scylladb#16224

(cherry picked from commit 7ce6962141)

Closes #16360
2023-12-11 10:57:16 +02:00
Botond Dénes
33d2da94ab reader_concurrency_semaphore: execution_loop(): trigger admission check when _ready_list is empty
The execution loop consumes permits from the _ready_list and executes
them. The _ready_list usually contains a single permit. When the
_ready_list is not empty, new permits are queued until it becomes empty.
The execution loops relies on admission checks triggered by the read
releasing resouces, to bring in any queued read into the _ready_list,
while it is executing the current read. But in some cases the current
read might not free any resorces and thus fail to trigger an admission
check and the currently queued permits will sit in the queue until
another source triggers an admission check.
I don't yet know how this situation can occur, if at all, but it is
reproducible with a simple unit test, so it is best to cover this
corner-case in the off-chance it happens in the wild.
Add an explicit admission check to the execution loop, after the
_ready_list is exhausted, to make sure any waiters that can be admitted
with an empty _ready_list are admitted immediately and execution
continues.

Fixes: #13540

Closes #13541

(cherry picked from commit b790f14456)
2023-12-07 16:04:55 +02:00
Paweł Zakrzewski
dac69be4a4 auth: fix error message when consistency level is not met
Propagate `exceptions::unavailable_exception` error message to the
client such as cqlsh.

Fixes #2339

(cherry picked from commit 400aa2e932)
2023-12-07 14:49:47 +02:00
Botond Dénes
763e583cf2 Merge 'row_cache: abort on exteral_updater::execute errors' from Benny Halevy
Currently the cache updaters aren't exception safe
yet they are intended to be.

Instead of allowing exceptions from
`external_updater::execute` escape `row_cache::update`,
abort using `on_fatal_internal_error`.

Future changes should harden all `execute` implementations
to effectively make them `noexcept`, then the pure virtual
definition can be made `noexcept` to cement that.

\Fixes scylladb/scylladb#15576

\Closes scylladb/scylladb#15577

* github.com:scylladb/scylladb:
  row_cache: abort on exteral_updater::execute errors
  row_cache: do_update: simplify _prev_snapshot_pos setup

(cherry picked from commit 4a0f16474f)

Closes scylladb/scylladb#16256
2023-12-07 09:16:42 +02:00
Nadav Har'El
b331b4a4bb Backport fixes for nodetool commands with Alternator GSI in the database
Fixes #16153

* java e716e1bd1d...80701efa8d (1):
  > NodeProbe: allow addressing table name with colon in it

/home/nyh/scylla/tools$ git submodule summary jmx | cat
* jmx bc4f8ea...f21550e (3):
  > ColumnFamilyStore: only quote table names if necessary
  > APIBuilder: allow quoted scope names
  > ColumnFamilyStore: don't fail if there is a table with ":" in its name

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #16296
2023-12-06 10:48:49 +02:00
Anna Stuchlik
d9448a298f doc: fix rollback in the 4.6-to-5.0 upgrade guide
This commit fixes the rollback procedure in
the 4.6-to-5.0 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
  is fixed.
- The "Gracefully shutdown ScyllaDB" command
  is fixed.

In addition, there are the following updates
to be in sync with the tests:

- The "Backup the configuration file" step is
  extended to include a command to backup
  the packages.
- The Rollback procedure is extended to restore
  the backup packages.
- The Reinstallation section is fixed for RHEL.

Refs https://github.com/scylladb/scylladb/issues/11907

This commit must be backported to branch-5.4, branch-5.2, and branch-5.1

Closes scylladb/scylladb#16155

(cherry picked from commit 1e80bdb440)
2023-12-05 15:10:21 +02:00
Anna Stuchlik
a82fd96b6a doc: fix rollback in the 5.0-to-5.1 upgrade guide
This commit fixes the rollback procedure in
the 5.0-to-5.1 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
  is fixed.
- The "Gracefully shutdown ScyllaDB" command
  is fixed.

In addition, there are the following updates
to be in sync with the tests:

- The "Backup the configuration file" step is
  extended to include a command to backup
  the packages.
- The Rollback procedure is extended to restore
  the backup packages.
- The Reinstallation section is fixed for RHEL.

Also, I've the section removed the rollback
section for images, as it's not correct or
relevant.

Refs https://github.com/scylladb/scylladb/issues/11907

This commit must be backported to branch-5.4, branch-5.2, and branch-5.1

Closes scylladb/scylladb#16154

(cherry picked from commit 7ad0b92559)
2023-12-05 15:08:25 +02:00
Anna Stuchlik
ae79fb9ce0 doc: fix rollback in the 5.1-to-5.2 upgrade guide
This commit fixes the rollback procedure in
the 5.1-to-5.2 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
  is fixed.
- The "Gracefully shutdown ScyllaDB" command
  is fixed.

In addition, there are the following updates
to be in sync with the tests:

- The "Backup the configuration file" step is
  extended to include a command to backup
  the packages.
- The Rollback procedure is extended to restore
  the backup packages.
- The Reinstallation section is fixed for RHEL.

Also, I've the section removed the rollback
section for images, as it's not correct or
relevant.

Refs https://github.com/scylladb/scylladb/issues/11907

This commit must be backported to branch-5.4 and branch-5.2.

Closes scylladb/scylladb#16152

(cherry picked from commit 91cddb606f)
2023-12-05 14:58:21 +02:00
Pavel Emelyanov
d83f4b9240 Update seastar submodule
* seastar eda297fc...43a1ce58 (2):
  > io_queue: Add iogroup label to metrics
  > io_queue: Remove ioshard metrics label

refs: scylladb/seastar#1591

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-05 10:46:07 +03:00
Raphael S. Carvalho
1b8c078cab test: Fix sporadic failures of database_test
database_test is failing sporadically and the cause was traced back
to commit e3e7c3c7e5.

The commit forces a subset of tests in database_test, to run once
for each of predefined x_log2_compaction_group settings.

That causes two problems:
1) test becomes 240% slower in dev mode.
2) queries on system.auth is timing out, and the reason is a small
table being spread across hundreds of compaction groups in each
shard. so to satisfy a range scan, there will be multiple hops,
making the overhead huge. additionally, the compaction group
aware sstable set is not merged yet. so even point queries will
unnecessarily scan through all the groups.

Fixes #13660.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13851

(cherry picked from commit a7ceb987f5)
2023-11-30 17:31:07 +02:00
Benny Halevy
1592a84b80 task_manager: module: make_task: enter gate when the task is created
Passing the gate_closed_exception to the task promise in start()
ends up with abandoned exception since no-one is waiting
for it.

Instead, enter the gate when the task is made
so it will fail make_task if the gate is already closed.

Fixes scylladb/scylladb#15211

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit f9a7635390)
2023-11-30 17:16:57 +02:00
Michał Chojnowski
bfeadae1bd position_in_partition: make operator= exception-safe
The copy assignment operator of _ck can throw
after _type and _bound_weight have already been changed.
This leaves position_in_partition in an inconsistent state,
potentially leading to various weird symptoms.

The problem was witnessed by test_exception_safety_of_reads.
Specifically: in cache_flat_mutation_reader::add_to_buffer,
which requires the assignment to _lower_bound to be exception-safe.

The easy fix is to perform the only potentially-throwing step first.

Fixes #15822

Closes scylladb/scylladb#15864

(cherry picked from commit 93ea3d41d8)
2023-11-30 15:01:22 +02:00
Avi Kivity
2c219a65f8 Update seastar submodule (spins on epoll)
* seastar 45f4102428...eda297fcb5 (1):
  > epoll: Avoid spinning on aborted connections

Fixes #12774
Fixes #7753
Fixes #13337
2023-11-30 14:09:22 +02:00
Piotr Grabowski
7054b1ab1e install-dependencies.sh: update node_exporter to 1.7.0
Update node_exporter to 1.7.0.

The previous version (1.6.1) was flagged by security scanners (such as
Trivy) with HIGH-severity CVE-2023-39325. 1.7.0 release fixed that
problem.

[Botond: regenerate frozen toolchain]

Fixes #16085

Closes scylladb/scylladb#16086

Closes scylladb/scylladb#16090

(cherry picked from commit 321459ec51)

[avi: regenerate frozen toolchain]
2023-11-27 18:17:38 +00:00
Anna Mikhlin
a65838ee9c re-spin: 5.2.11 scylla-5.2.11 2023-11-26 16:17:58 +02:00
Botond Dénes
68faf18ad9 Update ./tools/jmx and ./tools/java submodules
* tools/jmx 88d9bdc...bc4f8ea (1):
  > Merge "scylla-apiclient: update several Java dependencies" from Piotr Grabowski

* tools/java f8f556d802...e716e1bd1d (1):
  > Merge 'build: update several dependencies' from Piotr Grabowski

Update build dependencies which were flagged by security scanners.

Refs: scylladb/scylla-jmx#220
Refs: scylladb/scylla-tools-java#351

Closes #16150
2023-11-23 15:29:00 +02:00
Beni Peled
44d1b55253 release: prepare for 5.2.11 2023-11-22 14:22:13 +02:00
Tomasz Grabiec
bfd8401477 api, storage_service: Recalculate table digests on relocal_schema api call
Currently, the API call recalculates only per-node schema version. To
workaround issues like #4485 we want to recalculate per-table
digests. One way to do that is to restart the node, but that's slow
and has impact on availability.

Use like this:

  curl -X POST http://127.0.0.1:10000/storage_service/relocal_schema

Fixes #15380

Closes #15381

(cherry picked from commit c27d212f4b)
2023-11-21 01:29:28 +01:00
Botond Dénes
e31f2224f5 migration_manager: also reload schema on enabling digest_insensitive_to_expiry
Currently, when said feature is enabled, we recalcuate the schema
digest. But this feature also influences how table versions are
calculated, so it has to trigger a recalculation of all table versions,
so that we can guarantee correct versions.
Before, this used to happen by happy accident. Another feature --
table_digest_insensitive_to_expiry -- used to take care of this, by
triggering a table version recalulation. However this feature only takes
effect if digest_insensitive_to_expiry is also enabled. This used to be
the case incidently, by the time the reload triggered by
table_digest_insensitive_to_expiry ran, digest_insensitive_to_expiry was
already enabled. But this was not guaranteed whatsoever and as we've
recently seen, any change to the feature list, which changes the order
in which features are enabled, can cause this intricate balance to
break.
This patch makes digest_insensitive_to_expiry also kick off a schema
reload, to eliminate our dependence on (unguaranteed) feature order, and
to guarantee that table schemas have a correct version after all features
are enabled. In fact, all schema feature notification handlers now kick
off a full schema reload, to ensure bugs like this don't creep in, in
the future.

Fixes: #16004

Closes scylladb/scylladb#16013

(cherry picked from commit 22381441b0)
2023-11-21 01:29:28 +01:00
Kamil Braun
4101c8beab schema_tables: remove default value for reload in merge_schema
To avoid bugs like the one fixed in the previous commit.

(cherry picked from commit 4376854473)
2023-11-21 01:29:28 +01:00
Kamil Braun
c994ed2057 schema_tables: pass reload flag when calling merge_schema cross-shard
In 0c86abab4d `merge_schema` obtained a new flag, `reload`.

Unfortunately, the flag was assigned a default value, which I think is
almost always a bad idea, and indeed it was in this case. When
`merge_scehma` is called on shard different than 0, it recursively calls
itself on shard 0. That recursive call forgot to pass the `reload` flag.

Fix this.

(cherry picked from commit 48164e1d09)
2023-11-21 01:29:28 +01:00
Avi Kivity
40eed1f1c5 Merge 'schema_mutations, migration_manager: Ignore empty partitions in per-table digest' from Tomasz Grabiec
Schema digest is calculated by querying for mutations of all schema
tables, then compacting them so that all tombstones in them are
dropped. However, even if the mutation becomes empty after compaction,
we still feed its partition key. If the same mutations were compacted
prior to the query, because the tombstones expire, we won't get any
mutation at all and won't feed the partition key. So schema digest
will change once an empty partition of some schema table is compacted
away.

Tombstones expire 7 days after schema change which introduces them. If
one of the nodes is restarted after that, it will compute a different
table schema digest on boot. This may cause performance problems. When
sending a request from coordinator to replica, the replica needs
schema_ptr of exact schema version request by the coordinator. If it
doesn't know that version, it will request it from the coordinator and
perform a full schema merge. This adds latency to every such request.
Schema versions which are not referenced are currently kept in cache
for only 1 second, so if request flow has low-enough rate, this
situation results in perpetual schema pulls.

After ae8d2a550d (5.2.0), it is more liekly to
run into this situation, because table creation generates tombstones
for all schema tables relevant to the table, even the ones which
will be otherwise empty for the new table (e.g. computed_columns).

This change inroduces a cluster feature which when enabled will change
digest calculation to be insensitive to expiry by ignoring empty
partitions in digest calculation. When the feature is enabled,
schema_ptrs are reloaded so that the window of discrepancy during
transition is short and no rolling restart is required.

A similar problem was fixed for per-node digest calculation in
c2ba94dc39e4add9db213751295fb17b95e6b962. Per-table digest calculation
was not fixed at that time because we didn't persist enabled features
and they were not enabled early-enough on boot for us to depend on
them in digest calculation. Now they are enabled before non-system
tables are loaded so digest calculation can rely on cluster features.

Fixes #4485.

Manually tested using ccm on cluster upgrade scenarios and node restarts.

Closes #14441

* github.com:scylladb/scylladb:
  test: schema_change_test: Verify digests also with TABLE_DIGEST_INSENSITIVE_TO_EXPIRY enabled
  schema_mutations, migration_manager: Ignore empty partitions in per-table digest
  migration_manager, schema_tables: Implement migration_manager::reload_schema()
  schema_tables: Avoid crashing when table selector has only one kind of tables

(cherry picked from commit cf81eef370)
2023-11-21 01:29:28 +01:00
Gleb Natapov
f233c8a9e4 database: fix do_apply_many() to handle empty array of mutations
Currently the code will assert because cl pointer will be null and it
will be null because there is no mutations to initialize it from.
Message-Id: <20230212144837.2276080-3-gleb@scylladb.com>

(cherry picked from commit 941407b905)

Backport needed by #4485.
2023-11-21 01:29:17 +01:00
Botond Dénes
0f3e31975d api/storage_service: start/stop native transport in the statement sg
Currently, it is started/stopped in the streaming/maintenance sg, which
is what the API itself runs in.
Starting the native transport in the streaming sg, will lead to severely
degraded performance, as the streaming sg has significantly less
CPU/disk shares and reader concurrency semaphore resources.
Furthermore, it will lead to multi-paged reads possibly switching
between scheduling groups mid-way, triggering an internal error.

To fix, use `with_scheduling_group()` for both starting and stopping
native transport. Technically, it is only strictly necessary for
starting, but I added it for stop as well for consistency.

Also apply the same treatment to RPC (Thrift). Although no one uses it,
best to fix it, just to be on the safe side.

I think we need a more systematic approach for solving this once and for
all, like passing the scheduling group to the protocol server and have
it switch to it internally. This allows the server to always run on the
correct scheduling group, not depending on the caller to remember using
it. However, I think this is best done in a follow-up, to keep this
critical patch small and easily backportable.

Fixes: #15485

Closes scylladb/scylladb#16019

(cherry picked from commit dfd7981fa7)
2023-11-20 20:00:56 +02:00
Takuya ASADA
c98b22afce scylla_post_install.sh: detect RHEL correctly
$ID_LIKE = "rhel" works only on RHEL compatible OSes, not for RHEL
itself.
To detect RHEL correctly, we also need to check $ID = "rhel".

Fixes #16040

Closes scylladb/scylladb#16041

(cherry picked from commit 338a9492c9)
2023-11-20 19:36:22 +02:00
Marcin Maliszkiewicz
900754d377 db: view: run local materialized view mutations on a separate smp service group
When base write triggers mv write and it needs to be send to another
shard it used the same service group and we could end up with a
deadlock.

This fix affects also alternator's secondary indexes.

Testing was done using (yet) not committed framework for easy alternator
performance testing: https://github.com/scylladb/scylladb/pull/13121.
I've changed hardcoded max_nonlocal_requests config in scylla from 5000 to 500 and
then ran:

./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 \
--developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi \
--duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error true --concurrency 2000

Without the patch when scylla is overloaded (i.e. number of scheduled futures being close to max_nonlocal_requests) after couple seconds
scylla hangs, cpu usage drops to zero, no progress is made. We can confirm we're hitting this issue by seeing under gdb:

p seastar::get_smp_service_groups_semaphore(2,0)._count
$1 = 0

With the patch I wasn't able to observe the problem, even with 2x
concurrency. I was able to make the process hang with 10x concurrency
but I think it's hitting different limit as there wasn't any depleted
smp service group semaphore and it was happening also on non mv loads.

Fixes https://github.com/scylladb/scylladb/issues/15844

Closes scylladb/scylladb#15845

(cherry picked from commit 020a9c931b)
2023-11-19 18:54:46 +02:00
Botond Dénes
fbb356aa88 repair/repair.cc: do_repair_ranges(): prevent stalls when skipping ranges
We have observed do_repair_ranges() receiving tens of thousands of
ranges to repairs on occasion. do_repair_ranges() repairs all ranges in
parallel, with parallel_for_each(). This is normally fine, as the lambda
inside parallel_for_each() takes a semaphore and this will result in
limited concurrency.
However, in some instances, it is possible that most of these ranges are
skipped. In this case the lambda will become synchronous, only logging a
message. This can cause stalls beacuse there are no opportunities to
yield. Solve this by adding an explicit yield to prevent this.

Fixes: #14330

Closes scylladb/scylladb#15879

(cherry picked from commit 90a8489809)
2023-11-08 21:10:30 +02:00
Michał Jadwiszczak
e8871c02a1 cql3:statements:describe_statement: check pointer to UDF/UDA
While looking for specific UDF/UDA, result of
`functions::functions::find()` needs to be filtered out based on
function's type.

Fixes: #14360
(cherry picked from commit d498451cdf)
2023-11-08 20:16:41 +02:00
Pavel Emelyanov
f76ba217e7 Merge 'api: failure_detector: invoke on shard 0' from Kamil Braun
These APIs may return stale or simply incorrect data on shards
other than 0. Newer versions of Scylla are better at maintaining
cross-shard consistency, but we need a simple fix that can be easily and
without risk be backported to older versions; this is the fix.

Add a simple test to check that the `failure_detector/endpoints`
API returns nonzero generation.

Fixes: scylladb/scylladb#15816

Closes scylladb/scylladb#15970

* github.com:scylladb/scylladb:
  test: rest_api: test that generation is nonzero in `failure_detector/endpoints`
  api: failure_detector: fix indentation
  api: failure_detector: invoke on shard 0

(cherry picked from commit 9443253f3d)
2023-11-07 15:12:12 +01:00
Botond Dénes
17e4d535db test/cql-pytest/nodetool.py: no_autocompaction_context: use the correct API
This `with` context is supposed to disable, then re-enable
autocompaction for the given keyspaces, but it used the wrong API for
it, it used the column_family/autocompaction API, which operates on
column families, not keyspaces. This oversight led to a silent failure
because the code didn't check the result of the request.
Both are fixed in this patch:
* switch to use `storage_service/auto_compaction/{keyspace}` endpoint
* check the result of the API calls and report errors as exceptions

Fixes: #13553

Closes #13568

(cherry picked from commit 66ee73641e)
2023-11-07 13:59:01 +02:00
Aleksandra Martyniuk
75b792e260 repair: release resources of shard_repair_task_impl
Before integration with task manager the state of one shard repair
was kept in repair_info. repair_info object was destroyed immediately
after shard repair was finished.

In an integration process repair_info's fields were moved to
shard_repair_task_impl as the two served the similar purposes.
Though, shard_repair_task_impl isn't immediately destoyed, but is
kept in task manager for task_ttl seconds after it's complete.
Thus, some of repair_info's fields have their lifetime prolonged,
which makes the repair state change delayed.

Release shard_repair_task_impl resources immediately after shard
repair is finished.

Fixes: #15505.
(cherry picked from commit 0474e150a9)

Closes #15875
2023-11-07 09:40:05 +02:00
Tomasz Grabiec
573ef87245 Merge ' tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes
scylla-sstable currently has two ways to obtain the schema:

    * via a `schema.cql` file.
    * load schema definition from memory (only works for system tables).

This meant that for most cases it was necessary to export the schema into a CQL format and write it to a file. This is very flexible. The sstable can be inspected anywhere, it doesn't have to be on the same host where it originates form. Yet in many cases the sstable is inspected on the same host where it originates from. In this cases, the schema is readily available in the schema tables on disk and it is plain annoying to have to export it into a file, just to quickly inspect an sstable file.
This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable, but if that fails, the tool check the usual locations of the scylla data dir, the scylla confguration file and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a schema.cql is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override.
If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong.
A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes.

This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag, the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. I don't think this will break anybody's workflow as this tools is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change.

Example:

```
$ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db
{"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}}
```

As seen above, subdirectories like qurantine, staging etc are also supported.

Fixes: https://github.com/scylladb/scylladb/issues/10126

Closes #13448

* github.com:scylladb/scylladb:
  test/cql-pytest: test_tools.py: add tests for schema loading
  test/cql-pytest: add no_autocompaction_context
  docs: scylla-sstable.rst: remove accidentally added copy-pasta
  docs: scylla-sstable.rst: remove paragraph with schema limitations
  docs: scylla-sstable.rst: update schema section
  test/cql-pytest: nodetool.py: add flush_keyspace()
  tools/scylla-sstable: reform schema loading mechanism
  tools/schema_loader: add load_schema_from_schema_tables()
  db/schema_tables: expose types schema

(cherry picked from commit 952b455310)

Closes #15386
2023-11-02 17:25:18 +02:00
Beni Peled
454e5a7110 release: prepare for 5.2.10 scylla-5.2.10 2023-11-02 15:08:11 +00:00
Avi Kivity
9967c0bda4 Update tools/pythion3 submodule (tar file timestamps)
* tools/python3 cf7030a...6ad2e5a (1):
  > create-relocatable-package.py: fix timestamp of executable files

Fixes #13415.
2023-11-02 12:37:09 +01:00
Botond Dénes
48509c5c00 Merge '[Backport 5.2] properly update storage service after schema changes' from Benny Halevy
This is a backport of https://github.com/scylladb/scylladb/pull/14158 to branch 5.2

Closes #15872

* github.com:scylladb/scylladb:
  migration_notifier: get schema_ptr by value
  migration_manager: propagate listener notification exceptions
  storage_service: keyspace_changed: execute only on shard 0
  database: modify_keyspace_on_all_shards: execute func first on shard 0
  database: modify_keyspace_on_all_shards: call notifiers only after applying func on all shards
  database: add modify_keyspace_on_all_shards
  schema_tables: merge_keyspaces: extract_scylla_specific_keyspace_info for update_keyspace
  database: create_keyspace_on_all_shards
  database: update_keyspace_on_all_shards
  database: drop_keyspace_on_all_shards
2023-10-31 10:27:08 +02:00
Botond Dénes
d606e9bfa2 Merge '[branch-5.2] Enable incremental compaction on off-strategy' from Raphael "Raph" Carvalho
Off-strategy suffers with a 100% space overhead, as it adopted
a sort of all or nothing approach. Meaning all input sstables,
living in maintenance set, are kept alive until they're all
reshaped according to the strategy criteria.

Input sstables in off-strategy are very likely to be mostly disjoint,
so it can greatly benefit from incremental compaction.

The incremental compaction approach is not only good for
decreasing disk usage, but also memory usage (as metadata of
input and output live in memory), and file desc count, which
takes memory away from OS.

Turns out that this approach also greatly simplifies the
off-strategy impl in compaction manager, as it no longer have
to maintain new unused sstables and mark them for
deletion on failure, and also unlink intermediary sstables
used between reshape rounds.

Fixes https://github.com/scylladb/scylladb/issues/14992.

Backport notes: relatively easy to backport, had to include
**replica: Make compaction_group responsible for deleting off-strategy compaction input**
and
**compaction/leveled_compaction_strategy: ideal_level_for_input: special case max_sstable_size==0**

Closes #15793

* github.com:scylladb/scylladb:
  test: Verify that off-strategy can do incremental compaction
  compaction/leveled_compaction_strategy: ideal_level_for_input: special case max_sstable_size==0
  compaction: Clear pending_replacement list when tombstone GC is disabled
  compaction: Enable incremental compaction on off-strategy
  compaction: Extend reshape type to allow for incremental compaction
  compaction: Move reshape_compaction in the source
  compaction: Enable incremental compaction only if replacer callback is engaged
  replica: Make compaction_group responsible for deleting off-strategy compaction input
2023-10-30 12:00:54 +02:00
Benny Halevy
cd7abb3833 migration_notifier: get schema_ptr by value
To prevent use-after-free as seen in
https://github.com/scylladb/scylladb/issues/15097
where a temp schema_ptr retrieved from a global_schema_ptr
get destroyed when the notification function yielded.

Capturing the schema_ptr on the coroutine frame
is inexpensive since its a shared ptr and it makes sure
that the schema remains valid throughput the coroutine
life time.

\Fixes scylladb/scylladb#15097

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

\Closes #15098

(cherry picked from commit 0f54e24519)
2023-10-29 19:39:17 +02:00
Benny Halevy
8064fface9 migration_manager: propagate listener notification exceptions
1e29b07e40 claimed
to make event notification exception safe,
but swallawing the exceptions isn't safe at all,
as this might leave the node in an inconsistent state
if e.g. storage_service::keyspace_changed fails on any of the
shards.  Propagating the exception here will cause abort,
but it is better than leaving the node up, but in an
inconsistent state.

We keep notifying other listeners even if any of them failed
Based on 1e29b07e40:
```
If one of the listeners throws an exception, we must ensure that other
listeners are still notified.
```

The decision about swallowing exceptions can't be
made in such a generic layer.
Specific notification listeners that may ignore exceptions,
like in transport/evenet_notifier, may decide to swallow their
local exceptions on their own (as done in this patch).

Refs #3389

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 825d617a53)
2023-10-29 19:32:55 +02:00
Benny Halevy
0cf6891c6d storage_service: keyspace_changed: execute only on shard 0
Previously all shards called `update_topology_change_info`
which in turn calls `mutate_token_metadata`, ending up
in quadratic complexity.

Now that the notifications are called after
all database shards are updated, we can apply
the changes on token metadata / effective replication map
only on shard 0 and count on replicate_to_all_cores to
propagate those changes to all other shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit a690f0e81f)
2023-10-29 19:27:52 +02:00
Benny Halevy
16a594d564 database: modify_keyspace_on_all_shards: execute func first on shard 0
When creating or altering a keyspace, we create a new
effective_replication_map instance.

It is more efficient to do that first on shard 0
and then on all other shards, otherwise multiple
shards might need to calculate to new e_r_m (and reach
the same result).  When the new e_r_m is "seeded" on
shard 0, other shards will find it there and clone
a local copy of it - which is more efficient.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 13dd92e618)
2023-10-29 19:22:01 +02:00
Benny Halevy
096c312821 database: modify_keyspace_on_all_shards: call notifiers only after applying func on all shards
When creating, updating, or dropping keyspaces,
first execute the database internal function to
modify the database state, and only when all shards
are updated, run the listener notifications,
to make sure they would operate when the database
shards are consistent with each other.

\Fixes #13137

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit ba15786059)
2023-10-29 19:21:34 +02:00
Benny Halevy
5c27dacad5 database: add modify_keyspace_on_all_shards
Run all keyspace create/update/drop ops
via `modify_keyspace_on_all_shards` that
will standardize the execution on all shards
in the coming patches.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 3b8c913e61)
2023-10-29 19:16:56 +02:00
Benny Halevy
14113dc23e schema_tables: merge_keyspaces: extract_scylla_specific_keyspace_info for update_keyspace
Similar to create_keyspace_on_all_shards,
`extract_scylla_specific_keyspace_info` and
`create_keyspace_from_schema_partition` can be called
once in the upper layer, passing keyspace_metadata&
down to database::update_keyspace_on_all_shards
which now would only make the per-shard
keyspace_metadata from the reference it gets
from the schema_tables layer.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit dc9b0812e9)
2023-10-29 19:14:06 +02:00
Benny Halevy
4d5a99f3b8 database: create_keyspace_on_all_shards
Part of moving the responsibility for applying
and notifying keyspace schema changes from
schema_tables to the database so that the
database can control the order of applying the changes
across shards and when to notify its listeners.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 3520c786bd)
2023-10-29 19:13:55 +02:00
Benny Halevy
ffe28b3e3f database: update_keyspace_on_all_shards
Part of moving the responsibility for applying
and notifying keyspace schema changes from
schema_tables to the database so that the
database can control the order of applying the changes
across shards and when to notify its listeners.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 53a6ea8616)
2023-10-29 19:06:45 +02:00