32747 Commits

Author SHA1 Message Date
Anna Stuchlik
790a36155b doc: fix rollback in the 5.0-to-5.1 upgrade guide
This commit fixes the rollback procedure in
the 5.0-to-5.1 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
  is fixed.
- The "Gracefully shutdown ScyllaDB" command
  is fixed.

In addition, there are the following updates
to be in sync with the tests:

- The "Backup the configuration file" step is
  extended to include a command to backup
  the packages.
- The Rollback procedure is extended to restore
  the backup packages.
- The Reinstallation section is fixed for RHEL.

Also, I've the section removed the rollback
section for images, as it's not correct or
relevant.

Refs https://github.com/scylladb/scylladb/issues/11907

This commit must be backported to branch-5.4, branch-5.2, and branch-5.1

Closes scylladb/scylladb#16154

(cherry picked from commit 7ad0b92559)
2023-12-05 15:08:58 +02:00
Pavel Emelyanov
6be2ba8a0b Update seastar submodule
* seastar 0377812f...c2152bc0 (2):
  > io_queue: Add iogroup label to metrics
  > io_queue: Remove ioshard metrics label

refs: scylladb/seastar#1591

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-05 10:42:50 +03:00
Michał Chojnowski
f7d6364712 position_in_partition: make operator= exception-safe
The copy assignment operator of _ck can throw
after _type and _bound_weight have already been changed.
This leaves position_in_partition in an inconsistent state,
potentially leading to various weird symptoms.

The problem was witnessed by test_exception_safety_of_reads.
Specifically: in cache_flat_mutation_reader::add_to_buffer,
which requires the assignment to _lower_bound to be exception-safe.

The easy fix is to perform the only potentially-throwing step first.

Fixes #15822

Closes scylladb/scylladb#15864

(cherry picked from commit 93ea3d41d8)
2023-11-30 15:01:40 +02:00
Avi Kivity
b6d0949ab5 Update seastar submodule (spins on epoll)
* seastar 06bb987969...0377812f33 (1):
  > epoll: Avoid spinning on aborted connections

Fixes #12774
Fixes #7753
Fixes #13337
2023-11-30 14:14:56 +02:00
Piotr Grabowski
74ebd484ae install-dependencies.sh: update node_exporter to 1.7.0
Update node_exporter to 1.7.0.

The previous version (1.6.1) was flagged by security scanners (such as
Trivy) with HIGH-severity CVE-2023-39325. 1.7.0 release fixed that
problem.

[Botond: regenerate frozen toolchain]

Fixes #16085

Closes scylladb/scylladb#16086

Closes scylladb/scylladb#16090

(cherry picked from commit 321459ec51)

[avi: regenerate frozen toolchain]
[avi: update build script to work around https://users.rust-lang.org/t/cargo-uses-too-much-memory-being-run-in-qemu/76531]
2023-11-27 18:16:47 +00:00
Anna Mikhlin
0f2269afbf re-spin: 5.1.19 scylla-5.1.19 2023-11-26 17:09:36 +02:00
Botond Dénes
3a66260231 Update ./tools/jmx and ./tools/java submodules
* tools/jmx 06f2735...ed3cc6d (1):
  > Merge "scylla-apiclient: update several Java dependencies" from Piotr Grabowski

* tools/java be0aaf7597...7459a11815 (1):
  > Merge 'build: update several dependencies' from Piotr Grabowski

Update build dependencies which were flagged by security scanners.

Refs: scylladb/scylla-jmx#220
Refs: scylladb/scylla-tools-java#351

Closes #16151
2023-11-23 15:30:22 +02:00
Beni Peled
3ffd3e6636 release: prepare for 5.1.19 2023-11-22 14:37:39 +02:00
Tomasz Grabiec
370ffc80b0 api, storage_service: Recalculate table digests on relocal_schema api call
Currently, the API call recalculates only per-node schema version. To
workaround issues like #4485 we want to recalculate per-table
digests. One way to do that is to restart the node, but that's slow
and has impact on availability.

Use like this:

  curl -X POST http://127.0.0.1:10000/storage_service/relocal_schema

Fixes #15380

Closes #15381

(cherry picked from commit c27d212f4b)
(cherry picked from commit bfd8401477)
2023-11-22 00:12:51 +01:00
Botond Dénes
c9fa077c82 migration_manager: also reload schema on enabling digest_insensitive_to_expiry
Currently, when said feature is enabled, we recalcuate the schema
digest. But this feature also influences how table versions are
calculated, so it has to trigger a recalculation of all table versions,
so that we can guarantee correct versions.
Before, this used to happen by happy accident. Another feature --
table_digest_insensitive_to_expiry -- used to take care of this, by
triggering a table version recalulation. However this feature only takes
effect if digest_insensitive_to_expiry is also enabled. This used to be
the case incidently, by the time the reload triggered by
table_digest_insensitive_to_expiry ran, digest_insensitive_to_expiry was
already enabled. But this was not guaranteed whatsoever and as we've
recently seen, any change to the feature list, which changes the order
in which features are enabled, can cause this intricate balance to
break.
This patch makes digest_insensitive_to_expiry also kick off a schema
reload, to eliminate our dependence on (unguaranteed) feature order, and
to guarantee that table schemas have a correct version after all features
are enabled. In fact, all schema feature notification handlers now kick
off a full schema reload, to ensure bugs like this don't creep in, in
the future.

Fixes: #16004

Closes scylladb/scylladb#16013

(cherry picked from commit 22381441b0)
(cherry picked from commit e31f2224f5)
2023-11-21 21:42:19 +01:00
Kamil Braun
2ea211db69 schema_tables: remove default value for reload in merge_schema
To avoid bugs like the one fixed in the previous commit.

(cherry picked from commit 4376854473)
(cherry picked from commit 4101c8beab)
2023-11-21 21:42:19 +01:00
Kamil Braun
dc4be20609 schema_tables: pass reload flag when calling merge_schema cross-shard
In 0c86abab4d `merge_schema` obtained a new flag, `reload`.

Unfortunately, the flag was assigned a default value, which I think is
almost always a bad idea, and indeed it was in this case. When
`merge_scehma` is called on shard different than 0, it recursively calls
itself on shard 0. That recursive call forgot to pass the `reload` flag.

Fix this.

(cherry picked from commit 48164e1d09)
(cherry picked from commit c994ed2057)
2023-11-21 21:42:19 +01:00
Avi Kivity
7a2f9fb48f Merge 'schema_mutations, migration_manager: Ignore empty partitions in per-table digest' from Tomasz Grabiec
Schema digest is calculated by querying for mutations of all schema
tables, then compacting them so that all tombstones in them are
dropped. However, even if the mutation becomes empty after compaction,
we still feed its partition key. If the same mutations were compacted
prior to the query, because the tombstones expire, we won't get any
mutation at all and won't feed the partition key. So schema digest
will change once an empty partition of some schema table is compacted
away.

Tombstones expire 7 days after schema change which introduces them. If
one of the nodes is restarted after that, it will compute a different
table schema digest on boot. This may cause performance problems. When
sending a request from coordinator to replica, the replica needs
schema_ptr of exact schema version request by the coordinator. If it
doesn't know that version, it will request it from the coordinator and
perform a full schema merge. This adds latency to every such request.
Schema versions which are not referenced are currently kept in cache
for only 1 second, so if request flow has low-enough rate, this
situation results in perpetual schema pulls.

After ae8d2a550d (5.2.0), it is more liekly to
run into this situation, because table creation generates tombstones
for all schema tables relevant to the table, even the ones which
will be otherwise empty for the new table (e.g. computed_columns).

This change inroduces a cluster feature which when enabled will change
digest calculation to be insensitive to expiry by ignoring empty
partitions in digest calculation. When the feature is enabled,
schema_ptrs are reloaded so that the window of discrepancy during
transition is short and no rolling restart is required.

A similar problem was fixed for per-node digest calculation in
c2ba94dc39e4add9db213751295fb17b95e6b962. Per-table digest calculation
was not fixed at that time because we didn't persist enabled features
and they were not enabled early-enough on boot for us to depend on
them in digest calculation. Now they are enabled before non-system
tables are loaded so digest calculation can rely on cluster features.

Fixes #4485.

Manually tested using ccm on cluster upgrade scenarios and node restarts.

Closes #14441

* github.com:scylladb/scylladb:
  test: schema_change_test: Verify digests also with TABLE_DIGEST_INSENSITIVE_TO_EXPIRY enabled
  schema_mutations, migration_manager: Ignore empty partitions in per-table digest
  migration_manager, schema_tables: Implement migration_manager::reload_schema()
  schema_tables: Avoid crashing when table selector has only one kind of tables

(cherry picked from commit cf81eef370)
(cherry picked from commit 40eed1f1c5)
2023-11-21 21:42:19 +01:00
Gleb Natapov
801687f185 database: fix do_apply_many() to handle empty array of mutations
Currently the code will assert because cl pointer will be null and it
will be null because there is no mutations to initialize it from.
Message-Id: <20230212144837.2276080-3-gleb@scylladb.com>

(cherry picked from commit 941407b905)

Backport needed by #4485.

(cherry picked from commit f233c8a9e4)
2023-11-21 21:10:59 +01:00
Botond Dénes
e3de2187ef api/storage_service: start/stop native transport in the statement sg
Currently, it is started/stopped in the streaming/maintenance sg, which
is what the API itself runs in.
Starting the native transport in the streaming sg, will lead to severely
degraded performance, as the streaming sg has significantly less
CPU/disk shares and reader concurrency semaphore resources.
Furthermore, it will lead to multi-paged reads possibly switching
between scheduling groups mid-way, triggering an internal error.

To fix, use `with_scheduling_group()` for both starting and stopping
native transport. Technically, it is only strictly necessary for
starting, but I added it for stop as well for consistency.

Also apply the same treatment to RPC (Thrift). Although no one uses it,
best to fix it, just to be on the safe side.

I think we need a more systematic approach for solving this once and for
all, like passing the scheduling group to the protocol server and have
it switch to it internally. This allows the server to always run on the
correct scheduling group, not depending on the caller to remember using
it. However, I think this is best done in a follow-up, to keep this
critical patch small and easily backportable.

Fixes: #15485

Closes scylladb/scylladb#16019

(cherry picked from commit dfd7981fa7)
2023-11-20 20:01:56 +02:00
Takuya ASADA
c8fdd595e3 scylla_post_install.sh: detect RHEL correctly
$ID_LIKE = "rhel" works only on RHEL compatible OSes, not for RHEL
itself.
To detect RHEL correctly, we also need to check $ID = "rhel".

Fixes #16040

Closes scylladb/scylladb#16041

(cherry picked from commit 338a9492c9)
2023-11-20 19:36:40 +02:00
Marcin Maliszkiewicz
abf62e5b7f db: view: run local materialized view mutations on a separate smp service group
When base write triggers mv write and it needs to be send to another
shard it used the same service group and we could end up with a
deadlock.

This fix affects also alternator's secondary indexes.

Testing was done using (yet) not committed framework for easy alternator
performance testing: https://github.com/scylladb/scylladb/pull/13121.
I've changed hardcoded max_nonlocal_requests config in scylla from 5000 to 500 and
then ran:

./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 \
--developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi \
--duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error true --concurrency 2000

Without the patch when scylla is overloaded (i.e. number of scheduled futures being close to max_nonlocal_requests) after couple seconds
scylla hangs, cpu usage drops to zero, no progress is made. We can confirm we're hitting this issue by seeing under gdb:

p seastar::get_smp_service_groups_semaphore(2,0)._count
$1 = 0

With the patch I wasn't able to observe the problem, even with 2x
concurrency. I was able to make the process hang with 10x concurrency
but I think it's hitting different limit as there wasn't any depleted
smp service group semaphore and it was happening also on non mv loads.

Fixes https://github.com/scylladb/scylladb/issues/15844

Closes scylladb/scylladb#15845

(cherry picked from commit 020a9c931b)
2023-11-19 18:56:32 +02:00
Pavel Emelyanov
f886581bee Merge 'api: failure_detector: invoke on shard 0' from Kamil Braun
These APIs may return stale or simply incorrect data on shards
other than 0. Newer versions of Scylla are better at maintaining
cross-shard consistency, but we need a simple fix that can be easily and
without risk be backported to older versions; this is the fix.

Add a simple test to check that the `failure_detector/endpoints`
API returns nonzero generation.

Fixes: scylladb/scylladb#15816

Closes scylladb/scylladb#15970

* github.com:scylladb/scylladb:
  test: rest_api: test that generation is nonzero in `failure_detector/endpoints`
  api: failure_detector: fix indentation
  api: failure_detector: invoke on shard 0

(cherry picked from commit 9443253f3d)
2023-11-07 14:56:53 +01:00
Botond Dénes
2071b70394 Merge '[branch-5.1] Enable incremental compaction on off-strategy' from Raphael "Raph" Carvalho
Off-strategy suffers with a 100% space overhead, as it adopted
a sort of all or nothing approach. Meaning all input sstables,
living in maintenance set, are kept alive until they're all
reshaped according to the strategy criteria.

Input sstables in off-strategy are very likely to be mostly disjoint,
so it can greatly benefit from incremental compaction.

The incremental compaction approach is not only good for
decreasing disk usage, but also memory usage (as metadata of
input and output live in memory), and file desc count, which
takes memory away from OS.

Turns out that this approach also greatly simplifies the
off-strategy impl in compaction manager, as it no longer have
to maintain new unused sstables and mark them for
deletion on failure, and also unlink intermediary sstables
used between reshape rounds.

Fixes https://github.com/scylladb/scylladb/issues/14992.

Backport notes: relatively easy to backport, had to include
**replica: Make compaction_group responsible for deleting off-strategy compaction input**
and
**compaction/leveled_compaction_strategy: ideal_level_for_input: special case max_sstable_size==0**

Closes #15794

* github.com:scylladb/scylladb:
  test: Verify that off-strategy can do incremental compaction
  compaction/leveled_compaction_strategy: ideal_level_for_input: special case max_sstable_size==0
  compaction: Clear pending_replacement list when tombstone GC is disabled
  compaction: Enable incremental compaction on off-strategy
  compaction: Extend reshape type to allow for incremental compaction
  compaction: Move reshape_compaction in the source
  compaction: Enable incremental compaction only if replacer callback is engaged
  replica: Make compaction_group responsible for deleting off-strategy compaction input
2023-10-30 12:01:34 +02:00
Benny Halevy
83ca111398 docs: nodetool/removenode: fix host_id in examples
removenode host_id must specify the host ID as a UUID,
not an ip address.

Fixes #11839

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11840

(cherry picked from commit 44e1058f63)
2023-10-27 09:22:19 +03:00
Kefu Chai
58f1ecdddb sstables: writer: delegate flush() in checksummed_file_data_sink_impl
before this change, `checksummed_file_data_sink_impl` just inherits the
`data_sink_impl::flush()` from its parent class. but as a wrapper around
the underlying `_out` data_sink, this is not only an unusual design
decision in a layered design of an I/O system, but also could be
problematic. to be more specific, the typical user of `data_sink_impl`
is a `data_sink`, whose `flush()` member function is called when
the user of `data_sink` want to ensure that the data sent to the sink
is pushed to the underlying storage / channel.

this in general works, as the typical user of `data_sink` is in turn
`output_stream`, which calls `data_sink.flush()` before closing the
`data_sink` with `data_sink.close()`. and the operating system will
eventually flush the data after application closes the corresponding
fd. to be more specific, almost none of the popular local filesystem
implements the file_operations.op, hence, it's safe even if the
`output_stream` does not flush the underlying data_sink after writing
to it. this is the use case when we write to sstables stored on local
filesystem. but as explained above, if the data_sink is backed by a
network filesystem, a layered filesystem or a storage connected via
a buffered network device, then it is crucial to flush in a timely
manner, otherwise we could risk data lost if the application / machine /
network breaks when the data is considerered persisted but they are
_not_!

but the `data_sink` returned by `client::make_upload_jumbo_sink` is
a little bit different. multipart upload is used under the hood, and
we have to finalize the upload once all the parts are uploaded by
calling `close()`. but if the caller fails / chooses to close the
sink before flushing it, the upload is aborted, and the partially
uploaded parts are deleted.

the default-implemented `checksummed_file_data_sink_impl::flush()`
breaks `upload_jumbo_sink` which is the `_out` data_sink being
wrapped by `checksummed_file_data_sink_impl`. as the `flush()`
calls are shortcircuited by the wrapper, the `close()` call
always aborts the upload. that's why the data and index components
just fail to upload with the S3 backend.

in this change, we just delegate the `flush()` call to the
wrapped class.

Fixes #15079
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15134

(cherry picked from commit d2d1141188)
2023-10-26 16:48:34 +03:00
Avi Kivity
18014f1d9a cql3: grammar: reject intValue with no contents
The grammar mistakenly allows nothing to be parsed as an
intValue (itself accepted in LIMIT and similar clauses).

Easily fixed by removing the empty alternative. A unit test is
added.

Fixes #14705.

Closes #14707

(cherry picked from commit e00811caac)
2023-10-25 19:28:34 +03:00
Wojciech Mitros
2c50655835 build: set an older version for cxxbridge that works in the frozen toolchain
In this branch(5.1) the most recent available rustc version is 1.60,
despite that, the 'cargo install' command tries to install the most
recent version of a package by default, which may rely on newer rustc
versions. This patch specifies the version of the cxxbridge-cmd package
to one that works with rustc 1.60.

Closes scylladb/scylladb#15812

[avi: regenerated frozen toolchain]

Closes scylladb/scylladb#15828
2023-10-24 16:59:18 +03:00
Raphael S. Carvalho
6acb1916f0 test: Verify that off-strategy can do incremental compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-10-22 17:41:30 -03:00
Benny Halevy
72522849c1 compaction/leveled_compaction_strategy: ideal_level_for_input: special case max_sstable_size==0
Prevent div-by-zero byt returning const level 1
if max_sstable_size is zero, as configured by
cleanup_incremental_compaction_test, before it's
extended to cover also offstrategy compaction.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit b1e164a241)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-10-22 17:35:29 -03:00
Raphael S. Carvalho
e3bd23a429 compaction: Clear pending_replacement list when tombstone GC is disabled
pending_replacement list is used by incremental compaction to
communicate to other ongoing compactions about exhausted sstables
that must be replaced in the sstable set they keep for tombstone
GC purposes.

Reshape doesn't enable tombstone GC, so that list will not
be cleared, which prevents incremental compaction from releasing
sstables referenced by that list. It's not a problem until now
where we want reshape to do incremental compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-10-22 17:35:27 -03:00
Raphael S. Carvalho
f2f590d197 compaction: Enable incremental compaction on off-strategy
Off-strategy suffers with a 100% space overhead, as it adopted
a sort of all or nothing approach. Meaning all input sstables,
living in maintenance set, are kept alive until they're all
reshaped according to the strategy criteria.

Input sstables in off-strategy are very likely to mostly disjoint,
so it can greatly benefit from incremental compaction.

The incremental compaction approach is not only good for
decreasing disk usage, but also memory usage (as metadata of
input and output live in memory), and file desc count, which
takes memory away from OS.

Turns out that this approach also greatly simplifies the
off-strategy impl in compaction manager, as it no longer have
to maintain new unused sstables and mark them for
deletion on failure, and also unlink intermediary sstables
used between reshape rounds.

Fixes #14992.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 42050f13a0)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-10-22 17:35:22 -03:00
Raphael S. Carvalho
67d6bd49e6 compaction: Extend reshape type to allow for incremental compaction
That's done by inheriting regular_compaction, which implement
incremental compaction. But reshape still implements its own
methods for creating writer and reader. One reason is that
reshape is not driven by controller, as input sstables to it
live in maintenance set. Another reason is customization
of things like sstable origin, etc.
stop_sstable_writer() is extended because that's used by
regular_compaction to check for possibility of removing
exhausted sstables earlier whenever an output sstable
is sealed.
Also, incremental compaction will be unconditionally
enabled for ICS/LCS during off-strategy.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit db9ce9f35a)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-10-22 17:35:19 -03:00
Raphael S. Carvalho
d20989470e compaction: Move reshape_compaction in the source
That's in preparation to next change that will make reshape
inherit from regular compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-10-22 17:35:14 -03:00
Raphael S. Carvalho
3854de1656 compaction: Enable incremental compaction only if replacer callback is engaged
That's needed for enabling incremental compaction to operate, and
needed for subsequent work that enables incremental compaction
for off-strategy, which in turn uses reshape compaction type.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-10-22 17:35:11 -03:00
Raphael S. Carvalho
af66363067 replica: Make compaction_group responsible for deleting off-strategy compaction input
Compaction group is responsible for deleting SSTables of "in-strategy"
compactions, i.e. regular, major, cleanup, etc.

Both in-strategy and off-strategy compaction have their completion
handled using the same compaction group interface, which is
compaction_group::table_state::on_compaction_completion(...,
				sstables::offstrategy offstrategy)

So it's important to bring symmetry there, by moving the responsibility
of deleting off-strategy input, from manager to group.

Another important advantage is that off-strategy deletion is now throttled
and gated, allowing for better control, e.g. table waiting for deletion
on shutdown.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13432

(cherry picked from commit 457c772c9c)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-10-22 17:28:07 -03:00
Raphael S. Carvalho
387c567125 Resurrect optimization to avoid bloom filter checks during compaction
Commit 8c4b5e4 introduced an optimization which only
calculates max purgeable timestamp when a tombstone satisfy the
grace period.

Commit 'repair: Get rid of the gc_grace_seconds' inverted the order,
probably under the assumption that getting grace period can be
more expensive than calculating max purgeable, as repair-mode GC
will look up into history data in order to calculate gc_before.

This caused a significant regression on tombstone heavy compactions,
where most of tombstones are still newer than grace period.
A compaction which used to take 5s, now takes 35s. 7x slower.

The reason is simple, now calculation of max purgeable happens
for every single tombstone (once for each key), even the ones that
cannot be GC'ed yet. And each calculation has to iterate through
(i.e. check the bloom filter of) every single sstable that doesn't
participate in compaction.

Flame graph makes it very clear that bloom filter is a heavy path
without the optimization:
    45.64%    45.64%  sstable_compact  sstable_compaction_test_g
        [.] utils::filter::bloom_filter::is_present

With its resurrection, the problem is gone.

This scenario can easily happen, e.g. after a deletion burst, and
tombstones becoming only GC'able after they reach upper tiers in
the LSM tree.

Before this patch, a compaction can be estimated to have this # of
filter checks:
(# of keys containing *any* tombstone) * (# of uncompacting sstable
runs[1])

[1] It's # of *runs*, as each key tend to overlap with only one
fragment of each run.

After this patch, the estimation becomes:
(# of keys containing a GC'able tombstone) * (# of uncompacting
runs).

With repair mode for tombstone GC, the assumption, that retrieval
of gc_before is more expensive than calculating max purgeable,
is kept. We can revisit it later. But the default mode, which
is the "timeout" (i.e. gc_grace_seconds) one, we still benefit
from the optimization of deferring the calculation until
needed.

Cherry picked from commit 38b226f997

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Fixes #14091.

Closes #13908

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #15745
2023-10-20 09:35:51 +03:00
Botond Dénes
553658ef6f Merge '[5.1 backport] doc: remove recommended image upgrade with OS from previous releases' from Anna Stuchlik
This is a backport of PR  https://github.com/scylladb/scylladb/pull/15740.

This commit removes the information about the recommended way of upgrading ScyllaDB images - by updating ScyllaDB and OS packages in one step. This upgrade procedure is not supported (it was implemented, but then reverted).

The scope of this commit:

- Remove the information from the 5.0-to.-5.1 upgrade guide and replace with general info.
- Remove the information from the 4.6-to.-5.1 upgrade guide and replace with general info.
- Remove the information from the 5.x.y-to.-5.x.z upgrade guide and replace with general info.
- Remove the following files as no longer necessary (they were only created to incorporate the (invalid) information about image upgrade into the upgrade guides.
    /upgrade/_common/upgrade-image-opensource.rst
    /upgrade/_common/upgrade-guide-v5-patch-ubuntu-and-debian-p1.rst
    /upgrade/_common/upgrade-guide-v5-patch-ubuntu-and-debian-p2.rst
    /upgrade/_common/upgrade-guide-v5-patch-ubuntu-and-debian.rst

Closes #15769

* github.com:scylladb/scylladb:
  doc: remove wrong image upgrade info (5.x.y-to-5.x.y)
  doc: remove wrong image upgrade info (4.6-to-5.0)
  doc: remove wrong image upgrade info (5.0-to-5.1)
2023-10-19 13:34:30 +03:00
Anna Stuchlik
6628bee308 doc: remove wrong image upgrade info (5.x.y-to-5.x.y)
This commit removes the invalid information about
the recommended way of upgrading ScyllaDB
images (by updating ScyllaDB and OS packages
in one step) from the 5.x.y-to-5.x.y upgrade guide.
This upgrade procedure is not supported (it was
implemented, but then reverted).

Refs https://github.com/scylladb/scylladb/issues/15733

In addition, the following files are removed as no longer
necessary (they were only created to incorporate the (invalid)
information about image upgrade into the upgrade guides.

/upgrade/_common/upgrade-image-opensource.rst
/upgrade/_common/upgrade-guide-v5-patch-ubuntu-and-debian-p1.rst
/upgrade/_common/upgrade-guide-v5-patch-ubuntu-and-debian-p2.rst
/upgrade/_common/upgrade-guide-v5-patch-ubuntu-and-debian.rst

(cherry picked from commit dd1207cabb)
2023-10-19 09:08:35 +02:00
Anna Stuchlik
407585cd40 doc: remove wrong image upgrade info (4.6-to-5.0)
This commit removes the invalid information about
the recommended way of upgrading ScyllaDB
images (by updating ScyllaDB and OS packages
in one step) from the 4.6-to-5.0 upgrade guide.
This upgrade procedure is not supported (it was
implemented, but then reverted).

Refs https://github.com/scylladb/scylladb/issues/15733

(cherry picked from commit 526d543b95)
2023-10-19 09:07:36 +02:00
Anna Stuchlik
3d1218bacb doc: remove wrong image upgrade info (5.0-to-5.1)
This commit removes the invalid information about
the recommended way of upgrading ScyllaDB
images (by updating ScyllaDB and OS packages
in one step) from the 5.0-to-5.1 upgrade guide.
This upgrade procedure is not supported (it was
implemented, but then reverted).

Refs https://github.com/scylladb/scylladb/issues/15733

(cherry picked from commit 9852130c5b)
2023-10-19 09:07:20 +02:00
Asias He
23f9fdfbba repair: Use the updated estimated_partitions to create writer
The estimated_partitions is estimated after the repair_meta is created.

Currently, the default estimated_partitions was used to create the
write which is not correct.

To fix, use the updated estimated_partitions.

Reported by Petr Gusev

Closes #14179
Fixes #15748

(cherry picked from commit 4592bbe182)
2023-10-18 13:58:58 +03:00
Nadav Har'El
eaf93b3953 Cherry-pick Seastar patch
Backported Seastar commit 4f4e84bb2cec5f11b4742396da7fc40dbb3f162f:

* seastar 04a39f448...06bb98796 (1):
  > sstring: refactor to_sstring() using fmt::format_to()

Refs https://github.com/scylladb/scylladb/issues/15127

Closes #15664
2023-10-09 12:39:06 +03:00
Raphael S. Carvalho
5591bb15a3 reader_concurrency_semaphore: Fix stop() in face of evictable reads becoming inactive
Scylla can crash due to a complicated interaction of service level drop,
evictable readers, inactive read registration path.

1) service level drop invoke stop of reader concurrency semaphore, which will
wait for in flight requests

2) turns out it stops first the gate used for closing readers that will
become inactive.

3) proceeds to wait for in-flight reads by closing the reader permit gate.

4) one of evictable reads take the inactive read registration path, and
finds the gate for closing readers closed.

5) flat mutation reader is destroyed, but finds the underlying reader was
not closed gracefully and triggers the abort.

By closing permit gate first, evictable readers becoming inactive will
be able to properly close underlying reader, therefore avoiding the
crash.

Fixes #15534.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#15535

(cherry picked from commit 914cbc11cf)
2023-09-29 09:25:27 +03:00
Yaron Kaikov
3f3ecbe727 release: prepare for 5.1.18 scylla-5.1.18 2023-09-19 14:53:23 +03:00
Avi Kivity
fafc10e89e Merge "auth: do not grant permissions to creator without actually creating" from Wojciech Mitros
Currently, when creating the table, permissions may be mistakenly
granted to the user even if the table is already existing. This
can happen in two cases:

The query has a IF NOT EXISTS clause - as a result no exception
is thrown after encountering the existing table, and the permission
granting is not prevented.
The query is handled by a non-zero shard - as a result we accept
the query with a bounce_to_shard result_message, again without
preventing the granting of permissions.
These two cases are now avoided by checking the result_message
generated when handling the query - now we only grant permissions
when the query resulted in a schema_change message.

Additionally, a test is added that reproduces both of the mentioned
cases.

CVE-2023-33972

Fixes #15467.

* 'no-grant-on-no-create' of github.com:scylladb/scylladb-ghsa-ww5v-p45p-3vhq:
  auth: do not grant permissions to creator without actually creating
  transport: add is_schema_change() method to result_message

(cherry picked from commit ab6988c52f)
2023-09-19 02:19:52 +03:00
Raphael S. Carvalho
c6c05b8a40 compaction: base compaction throughput on amount of data read
Today, we base compaction throughput on the amount of data written,
but it should be based on the amount of input data compacted
instead, to show the amount of data compaction had to process
during its execution.

A good example is a compaction which expire 99% of data, and
today throughput would be calculated on the 1% written, which
will mislead the reader to think that compaction was terribly
slow.

Fixes #14533.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14615

(cherry picked from commit 3b1829f0d8)
2023-09-14 21:30:51 +03:00
Jan Ciolek
f5c542de13 cql.g: make the parser reject INSERT JSON without a JSON value
We allow inserting column values using a JSON value, eg:
```cql
INSERT INTO mytable JSON '{ "\"myKey\"": 0, "value": 0}';
```

When no JSON value is specified, the query should be rejected.

Scylla used to crash in such cases. A recent change fixed the crash
(https://github.com/scylladb/scylladb/pull/14706), it now fails
on unwrapping an uninitialized value, but really it should
be rejected at the parsing stage, so let's fix the grammar so that
it doesn't allow JSON queries without JSON values.

A unit test is added to prevent regressions.

Refs: https://github.com/scylladb/scylladb/pull/14707
Fixes: https://github.com/scylladb/scylladb/issues/14709

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

\Closes #14785

(cherry picked from commit cbc97b41d4)
2023-09-14 21:22:51 +03:00
Nadav Har'El
bc62963e61 test/alternator: fix flaky test test_ttl_expiration_gsi_lsi
The Alternator test test_ttl.py::test_ttl_expiration_gsi_lsi was flaky.
The test incorrectly assumes that when we write an already expired item,
it will be visible for a short time until being deleted by the TTL thread.
But this doesn't need to be true - if the test is slow enough, it may go
look or the item after it was already expired!

So we fix this test by splitting it into two parts - in the first part
we write a non-expiring item, and notice it eventually appears in the
GSI, LSI, and base-table. Then we write the same item again, with an
expiration time - and now it should eventually disappear from the GSI,
LSI and base-table.

This patch also fixes a small bug which prevented this test from running
on DynamoDB.

Fixes #14495

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14496

(cherry picked from commit 599636b307)
2023-09-14 20:44:40 +03:00
Pavel Emelyanov
d9134003d5 Update seastar submodule
* seastar c0d1e3d8...04a39f44 (3):
  > rpc: Abort server connection streams on stop
  > rpc: Do not register stream to dying parent
  > rpc: Fix client-side stream registration race

refs: #13100

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-06 12:31:16 +03:00
Beni Peled
ebc9aed74e release: prepare for 5.1.17 scylla-5.1.17 2023-08-31 15:48:23 +03:00
Michał Chojnowski
1c3751b251 reader_concurrency_semaphore: fix a deadlock between stop() and execution_loop()
Permits added to `_ready_list` remain there until
executed by `execution_loop()`.
But `execution_loop()` exits when `_stopped == true`,
even though nothing prevents new permits from being added
to `_ready_list` after `stop()` sets `_stopped = true`.

Thus, if there are reads concurrent with `stop()`,
it's possible for a permit to be added to `_ready_list`
after `execution_loop()` has already quit. Such a permit will
never be destroyed, and `stop()` will forever block on
`_permit_gate.close()`.

A natural solution is to dismiss `execution_loop()` only after
it's certain that `_ready_list` won't receive any new permits.
This is guaranteed by `_permit_gate.close()`. After this call completes,
it is certain that no permits *exist*.

After this patch, `execution_loop()` no longer looks at `_stopped`.
It only exits when `_ready_list_cv` breaks, and this is triggered
by `stop()` right after `_permit_gate.close()`.

Fixes #15198

Closes #15199

(cherry picked from commit 2000a09859)
2023-08-31 08:35:27 +03:00
Calle Wilund
5e876c6614 generic_server: Handle TLS error codes indicating broken pipe
Fixes  #14625

In broken pipe detection, handle also TLS error codes.

Requires https://github.com/scylladb/seastar/pull/1729

Closes #14626

(cherry picked from commit 890f1f4ad3)
2023-08-29 15:46:48 +03:00
Botond Dénes
ccbce78b1c Update seastar submodule
* seastar e541165e...c0d1e3d8 (1):
  > tls: Export error_category instance used by tls + some common error codes

Refs: #14625
2023-08-29 15:46:25 +03:00
Alejo Sanchez
690b5579ef gms, service: replicate live endpoints on shard 0
Call replicate_live_endpoints on shard 0 to copy from 0 to the rest of
the shards. And get the list of live members from shard 0.

Move lock to the callers.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13240

(cherry picked from commit da00052ad8)
2023-08-29 12:24:30 +02:00