Commit Graph

36826 Commits

Author SHA1 Message Date
Alejo Sanchez
2de6b8f49c test/topology: split raft upgrade tests
Split raft upgrade tests to run in parallel by default

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-05-19 01:07:41 +02:00
Anna Stuchlik
6f4a68175b doc: fix the links to the Enterprise docs
Fixes https://github.com/scylladb/scylladb/issues/13915

This commit fixes broken links to the Enterprise docs.
They are links to the enterprise branch, which is not
published. The links to the Enterprise docs should include
"stable" instead of the branch name.

This commit must be backported to branch-5.2, because
the broken links are present in the published 5.2 docs.

Closes #13917
2023-05-17 13:56:21 +03:00
Kefu Chai
6cd745fd8b build: cmake: add missing test
string_format_test was added in 1b5d5205c8,
so let's add it to CMake building system as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13912
2023-05-17 09:51:51 +03:00
Raphael S. Carvalho
5544d12f18 compaction: avoid excessive reallocation and during input list formatting
with off-strategy, input list size can be close to 1k, which will
lead to unneeded reallocations when formatting the list for
logging.

in the past, we faced stalls in this area, and excessive reallocation
(log2 ~1k = ~10) may have contributed to that.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13907
2023-05-17 09:40:06 +03:00
Benny Halevy
302a89488a test: sstable_3_x_test: add test_compression_premature_eof
Reproduces #13599 and verifies the fix.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13903
2023-05-17 09:00:44 +03:00
Gleb Natapov
605e53e617 do not report raft as enabled before group0 is configured
Currently we may start to receive requests before group0 is configured
during boot.  If that happens those requests may try to pull schema and
issue raft read barrier which will crash the system because group0 is
not yet available. Workaround it by pretending the raft is disabled in
this case and use non raft procedure. The proper fix should make sure
that storage proxy verbs are registered only after group0 is fully
functional.

Message-Id: <ZGOZkXC/MsiWtNGu@scylladb.com>
2023-05-17 01:06:42 +02:00
Michał Chojnowski
9b0679c140 range_tombstone_change_generator: fix an edge case in flush()
range_tombstone_change_generator::flush() mishandles the case when two range
tombstones are adjacent and flush(pos, end_of_range=true) is called with pos
equal to the end bound of the lesser-position range tombstone.

In such case, the start change of the greater-position rtc will be accidentally
emitted, and there won't be an end change, which breaks reader assumptions by
ending the stream with an unclosed range tombstone, triggering an assertion.

This is due to a non-strict inequality used in a place where strict inequality
should be used. The modified line was intended to close range tombstones
which end exactly on the flush position, but this is unnecessary because such
range tombstones are handled by the last `if` in the function anyway.
Instead, this line caused range tombstones beginning right after the flush
position to be emitted sometimes.

Fixes #12462

Closes #13906
2023-05-16 17:54:08 +02:00
Nadav Har'El
24c3cbcb0b Merge 'Improve verbosity of test/pylib/minio.py' from Pavel Emelyanov
CI once failed due to mc being unable to configure minio server. There's currently no glues why it could happen, let's increase the minio.py verbosity a bit

refs: #13896

Closes #13901

* github.com:scylladb/scylladb:
  test,minio: Run mc with --debug option
  test,minio: Log mc operations to log file
2023-05-16 18:04:36 +03:00
Nadav Har'El
52e4edfd5e Merge 'cql: update permissions when creating/altering a function/keyspace' from Wojciech Mitros
Currently, when a user creates a function or a keyspace, no
permissions on functions are update.
Instead, the user should gain all permissions on the function
that they created, or on all functions in the keyspace they have
created. This is also the behavior in Cassandra.

However, if the user is granted permissions on an function after
performing a CREATE OR REPLACE statement, they may
actually only alter the function but still gain permissions to it
as a result of the approach above, which requires another
workaround added to this series.

Lastly, as of right now, when a user is altering a function, they
need both CREATE and ALTER permissions, which is incompatible
with Cassandra - instead, only the ALTER permission should be
required.

This series fixes the mentioned issues, and the tests are already
present in the auth_roles_test dtest.

Fixes #13747

Closes #13814

* github.com:scylladb/scylladb:
  cql: adjust tests to the updated permissions on functions
  cql: fix authorization when altering a function
  cql: grant permissions on functions when creating a keyspace/function
  cql: pass a reference to query processor in grant_permissions_to_creator
  test_permissions: make tests pass on cassandra
2023-05-16 18:04:35 +03:00
Avi Kivity
d2d53fc1db Merge 'Do not yield while traversing the gossiper endpoint state map' from Benny Halevy
This series introduces a new gossiper method: get_endpoints that returns a vector of endpoints (by value) based on the endpoint state map.

get_endpoints is used here by gossiper and storage_service for iterations that may preempt
instead of iterating direction over the endpoint state map (`_endpoint_state_map` in gossiper or via `get_endpoint_states()`) so to prevent use-after-free that may potentially happen if the map is rehashed while the function yields causing invalidation of the loop iterators.

Fixes #13899

Closes #13900

* github.com:scylladb/scylladb:
  storage_service: do not preempt while traversing endpoint_state_map
  gossiper: do not preempt while traversing endpoint_state_map
2023-05-16 18:04:35 +03:00
Botond Dénes
3ea521d21b Update tools/jmx submodule
* tools/jmx f176bcd1...1fd23b60 (1):
  > select-java: query java version using -XshowSettings
2023-05-16 18:04:35 +03:00
Kamil Braun
5a8e2153a0 Merge 'Fix heart_beat_state::force_highest_possible_version_unsafe' from Benny Halevy
It turns out that numeric_limits defines an implicit implementation
for std::numeric_limits<utils::tagged_integer<Tag, ValueType>>
which apprently returns a default-constructed tagged_integer
for min() and max(), and this broke
`gms::heart_beat_state::force_highest_possible_version_unsafe()`
since [gms: heart_beat_state: use generation_type and version_type](4cdad8bc8b)
(merged in [Merge 'gms: define and use generation and version types'...](7f04d8231d))

Implementing min/max correctly
Fixes #13801

Closes #13880

* github.com:scylladb/scylladb:
  storage_service: handle_state_normal: on_internal_error on "owns no tokens"
  utils: tagged_integer: implement std::numeric_limits::{min,max}
  test: add tagged_integer_test
2023-05-16 13:59:41 +02:00
Wojciech Mitros
6bc16047ba rust: update wasmtime dependency
The previous version of wasmtime had a vulnerability that possibly
allowed causing undefined behavior when calling UDFs.

We're directly updating to wasmtime 8.0.1, because the update only
requires a slight code modification and the Wasm UDF feature is
still experimental. As a result, we'll benefit from a number of
new optimizations.

Fixes #13807

Closes #13804
2023-05-16 13:03:29 +03:00
Pavel Emelyanov
29fffaa160 schema_tables: Use sharded<database>& variable
The auto& db = proxy.local().get_db() is called few lines above this
patch, so the &db can be reused for invoke_on_all() call.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13896
2023-05-16 12:57:47 +03:00
Benny Halevy
1da0b0ff76 storage_service: do not preempt while traversing endpoint_state_map
The map iterators might be invalidated while yielding
on insert if the map is rehashed.
See https://en.cppreference.com/w/cpp/container/unordered_map/insert

Refs #13899

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-16 12:24:44 +03:00
Benny Halevy
ba13056eba gossiper: do not preempt while traversing endpoint_state_map
The map iterators might be invalidated while yielding
on insert if the map is rehashed.
See https://en.cppreference.com/w/cpp/container/unordered_map/insert

Refs #13899

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-16 12:24:42 +03:00
Pavel Emelyanov
01628ae8c1 test,minio: Run mc with --debug option
With that if mc fails we'll (hopefully) get some meaningful information
about why it happened.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-16 12:16:15 +03:00
Pavel Emelyanov
4041c2f30d test,minio: Log mc operations to log file
Currently everything minio.py does goes to test.py log, while mc (and
minio) output go to another log file. That's inconvenient, better to
keep minio.py's messages in minio log file.

Also, while at it, print a message if local alias drop fails (it's
benign failure, but it's good to have the note anyway).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-16 12:14:49 +03:00
Kefu Chai
67dae95f58 build: cmake: add Scylla_USE_LINKER option
this option allows user to use specified linker instead of the
default one. this is more flexible than adding more linker
candidates to the known linkers.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13874
2023-05-16 11:30:18 +03:00
Tzach Livyatan
a73fde6888 Update Azure recommended instances type from the Lsv2-series to the Lsv3-series
Closes #13835
2023-05-16 10:58:19 +03:00
Avi Kivity
3c54d5ec5e test: string_format_test: don't compare std::string with sstring
For unknown reasons, clang 16 rejects equality comparison
(operator==) where the left-hand-side is an std::string and the
right-hand-side is an sstring. gcc and older clang versions first
convert the left-hand-side to an sstring and then call the symmetric
equality operator.

I was able to hack sstring to support this assymetric comparison,
but the solution is quite convoluted, and it may be that it's clang
at fault here. So instead this patch eliminates the three cases where
it happened. With is applied, we can build with clang 16.

Closes #13893
2023-05-16 08:56:16 +03:00
Kefu Chai
b112a3b78a api: storage_service: use string for generation
in this change, the type of the "generation" field of "sstable" in the
return value of RESTful API entry point at
"/storage_service/sstable_info" is changed from "long" to "string".

this change depends on the corresponding change on tools/jmx submodule,
so we have to include the submodule change in this very commit.

this API is used by our JMX exporter, which in turn exposes the
SSTable information via the "StorageService.getSSTableInfo" mBean
operation, which returns the retrieved SSTable info as a list of
CompositeData. and "generation" is a field of an element in the
CompositeData. in general, the scylla JMX exporter is consumed
by the nodetool, which prints out returned SSTable info list with
a pretty formatted table, see
tools/java/src/java/org/apache/cassandra/tools/nodetool/SSTableInfo.java.
the nodetool's formatter is not aware of the schema or type of the
SSTables to be printed, neither does it enforce the type -- it just
tries it best to pretty print them as a tabular.

But the fields in CompositeData is typed, when the scylla JMX exporter
translates the returned SSTables from the RESTful API, it sets the
typed fields of every `SSTableInfo` when constructing `PerTableSSTableInfo`.
So, we should be consistent on the type of "generation" field on both
the JMX and the RESTful API sides. because we package the same version
of scylla-jmx and nodetool in the same precompiled tarball, and enforce
the dependencies on exactly same version when shipping deb and rpm
packages, we should be safe when it comes to interoperability of
scylla-jmx and scylla. also, as explained above, nodetool does not care
about the typing, so it is not a problem on nodetool's front.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13834
2023-05-15 20:33:48 +03:00
Botond Dénes
646396a879 mutation/mutation_partition: append_clustered_row(): use on_internal_error()
Instead of simply throwing an exception. With just the exception, it is
impossible to find out what went wrong, as this API is very generic and
is used in a variety of places. The backtrace printed by
`on_internal_error()` will help zero in on the problem.

Fixes: #13876

Closes #13883
2023-05-15 20:31:44 +03:00
Calle Wilund
469e710caa docs: Add initial doc on commitlog segment file format
Refs #12849

Just a few lines on the file format of segments.

Closes #13848
2023-05-15 16:22:44 +03:00
Benny Halevy
502b5522ca storage_service: handle_state_normal: on_internal_error on "owns no tokens"
Although this condition should not happen,
we suspect that certain timing conditions might
lead this state of node in handle_normal_state
(possibly when shutdown) has no tokens.

Currently we call on_internal_error_noexcept, so
if abort_on_internal_error is false, we will just
print an error and continue on with handle_state_normal.

Change that to `on_internal_error` so to throw an
exception in production in this unexpected state.

Refs #13801

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-15 12:49:17 +03:00
Anna Stuchlik
84ed95f86f doc: add OS support for version 2023.1
Fixes https://github.com/scylladb/scylladb/issues/13857

This commit adds the OS support for ScyllaDB Enterprise 2023.1.
The support is the same as for ScyllaDB Open Source 5.2, on which
2023.1 is based.

After this commit is merged, it must be backported to branch-5.2.
In this way, it will be merged to branch-2023.1 and available in
the docs for Enterprise 2023.1

Closes: #13858
2023-05-15 10:51:53 +03:00
Alejo Sanchez
19687b54f1 test/pytest: yaml configuration cluster section
Separate cluster_size into a cluster section and specify this value as
initial_size.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13440
2023-05-15 09:48:39 +02:00
Benny Halevy
a70b53b6e7 utils: tagged_integer: implement std::numeric_limits::{min,max}
Add add a respective unit test.

It turns out that numeric_limits defines an implicit implementation
for std::numeric_limits<utils::tagged_integer<Tag, ValueType>>
which apprently returns a default-constructed tagged_integer
for min() and max(), and this broke
`gms::heart_beat_state::force_highest_possible_version_unsafe()`
since 4cdad8bc8b
(merged in 7f04d8231d)

Implementing min/max correctly
Fixes #13801

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-15 10:19:39 +03:00
Botond Dénes
0cff0ffa08 Merge 'alternator,config: make alternator_timeout_in_ms live-updateable' from Kefu Chai
before this change, alternator_timeout_in_ms is not live-updatable,
as after setting executor's default timeout right before creating
sharded executor instances, they never get updated with this option
anymore. but many users would like to set the driver timers based on
server timers. we need to enable them to configure timeout even
when the server is still running.

in this change,

* `alternator_timeout_in_ms` is marked as live-updateable
* `executor::_s_default_timeout` is changed to a thread_local variable,
   so it can be updated by a per-shard updateable_value. and
   it is now a updateable_value, so its variable name is updated
   accordingly. this value is set in the ctor of executor, and
   it is disconnected from the corresponding named_value<> option
   in the dtor of executor.
* alternator_timeout_in_ms is passed to the constructor of
   executor via sharded_parameter, so `executor::_timeout_in_ms` can
   be initialized on per-shard basis
* `executor::set_default_timeout()` is dropped, as we already pass
   the option to executor in its ctor.

Fixes #12232

Closes #13300

* github.com:scylladb/scylladb:
  alternator: split the param list of executor ctor into multi lines
  alternator,config: make alternator_timeout_in_ms live-updateable
2023-05-15 10:16:29 +03:00
Botond Dénes
6c27297406 Merge 'test: sstable_*test: use generator to create new generations' from Kefu Chai
in this series, instead of hardwiring to integer, we switch to generation generator for creating new generations. this should helps us to migrate to a generation identifier which can also represented by UUID. and potentially can help to improve the testing coverage once we switch over to UUID-based generation identifier. will need to parameterize these tests by then, for sure.

Closes #13863

* github.com:scylladb/scylladb:
  test: sstable: use generator to generate generations
  test: sstable: pass generation_type in helper functions
  test: sstable: use generator to generate generations
2023-05-15 10:04:30 +03:00
Botond Dénes
3256afe263 Update tools/jmx submodule
* tools/jmx 5f988945...f176bcd1 (1):
  > sstableinfo: change the type of generation to string

Refs: #13834
2023-05-15 09:59:40 +03:00
Asias He
93c93c69f9 repair: Add per peer node error for get_sync_boundary and friends
It is useful to know which node has the error. For example, when a node
has a corrupted sstable, with this patch, repair master node can tell
which node has the corrupted sstable.

```
WARN  2023-05-15 10:54:50,213 [shard 0] repair -
repair[2df49b2c-219d-411d-87c6-2eae7073ba61]: get_combined_row_hash: got
error from node=127.0.0.2, keyspace=ks2a, table=tb,
range=(8992118519279586742,9031388867920791714],
error=seastar::rpc::remote_verb_error (some error)
```

Fixes #13881

Closes #13882
2023-05-15 09:52:27 +03:00
Pavel Emelyanov
07b7e9faf1 load-meter: Remove unused get_load_string
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13873
2023-05-15 09:21:08 +03:00
Piotr Dulikowski
760651b4ad error injection: allow enabling injections via config
Currently, error injections can be enabled either through HTTP or CQL.
While these mechanisms are effective for injecting errors after a node
has already started, it can't be reliably used to trigger failures
shortly after node start. In order to support this use case, this commit
adds possibility to enable some error injections via config.

A configuration option `error_injections_at_startup` is added. This
option uses our existing configuration framework, so it is possible to
supply it either via CLI or in the YAML configuration file.

- When passed in commandline, the option is parsed as a
  semicolon-separated list of error injection names that should be
  enabled. Those error injections are enabled in non-oneshot mode.

  The CLI option is marked as not used in release mode and does not
  appear in the option list.

  Example:

      --error-injections-at-startup failure_point1;failure_point2

- When provided in YAML config, the option is parsed as a list of items.
  Each item is either a string or a map or parameters. This method is
  more flexible as it allows to provide parameters for each injection
  point. At this time, the only benefit is that it allows enabling
  points in oneshot mode, but more parameters can be added in the future
  if needed.

  Explanatory example:

      error_injections_at_startup:
      - failure_point1 # enabled in non-oneshot mode
      - name: failure_point2 # enabled in oneshot mode
        one_shot: true       # due to one_shot optional parameter

The primary goal of this feature is to facilitate testing of raft-based
cluster features. An error injection will be used to enable an
additional feature to simulate node upgrade.

Tests: manual

Closes #13861
2023-05-15 09:14:07 +03:00
Botond Dénes
1b04fc1425 Merge 'Use member initializer list for trace_state and related helper classes' from Pavel Emelyanov
Constructors of trace_state class initialize most of the fields in constructor body with the help of non-inline helper method. It's possible and is better to initialize as much as possible with initializer lists.

Closes #13871

* github.com:scylladb/scylladb:
  tracing: List-initialize trace_state::_records
  tracing: List-initialize trace_state::_props
  tracing: List-initialize trace_state::_slow_query_threshold
  tracing: Reorder trace_state fields initialization
  tracing: Remove init_session_records()
  tracing: List-initialize one_session_records::ttl
  tracing: List-initialize one_session_records
  tracing: List-initialize session_record
2023-05-15 09:06:14 +03:00
Botond Dénes
20ff122a84 Merge 'Delete S3 sstables without the help of deletion log' from Pavel Emelyanov
There are two layers of stables deletion -- delete-atomically and wipe. The former is in fact the "API" method, it's called by table code when the specific sstable(s) are no longer needed. It's called "atomically" because it's expected to fail in the middle in a safe manner so that subsequent boot would pick the dangling parts and proceed. The latter is a low-level removal function that can fail in the middle, but it's not of _its_ care.

Currently the atomic deletion is implemented with the help of sstable_directory::delete_atomically() method that commits sstables files names into deletion log, then calls wipe (indirectly), then drops the deletion log. On boot all found deletion logs are replayed. The described functionality is used regardless of the sstable storage type, even for S3, though deletion log is an overkill for S3, it's better be implemented with the help of ownership table. In fact, S3 storage already implements atomic deletion in its wipe method thus being overly careful.

So this PR
- makes atomic deletion be storage-specific
- makes S3 wipe non-atomic

fixes: #13016
note: Replaying sstables deletion from ownership table on boot is not here, see #13024

Closes #13562

* github.com:scylladb/scylladb:
  sstables: Implement atomic deleter for s3 storage
  sstables: Get atomic deleter from underlying storage
  sstables: Move delete_atomically to manager and rename
2023-05-15 08:57:47 +03:00
Benny Halevy
1b5d5205c8 test: add tagged_integer_test
Add basic test for tagged+integer arithmetic operations.

Remove const qualifier from `tagged_integer::operator[+-]=`
as these are add/sub-assign operators that need to modify
the value in place.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-14 23:26:58 +03:00
Wojciech Mitros
96e912e1cf auth: disallow CREATE permission on a specific function
Similarly to how we handle Roles and Tables, we do not
allow permissions on non-existent objects, so the CREATE
permission on a specific function is meaningless, because
for the permission to be granted to someone, the function
must be already created.
This patch removes the CREATE permission from the set of
permissions applicable to a specific function.

Fixes #13822

Closes #13824
2023-05-14 18:40:34 +03:00
Wojciech Mitros
1e18731a69 cql-pytest: translate Cassandra's UFTypesTest
This is a translation of Cassandra's CQL unit test source file
validation/entities/UFTypesTest.java into our cql-pytest framework.

There are 7 tests, which reproduce one known bug:
Refs #13746: UDF can only be used in SELECT, and abort when used in WHERE, or in INSERT/UPDATE/DELETE commands

And uncovered two previously unknown bugs:

Refs #13855: UDF with a non-frozen collection parameter cannot be called on a frozen value
Refs #13860: A non-frozen collection returned by a UDF cannot be used as a frozen one

Additionally, we encountered an issue that can be treated as either a bug or a hole in documentation:

Refs #13866: Argument and return types in UDFs can be frozen

Closes #13867
2023-05-14 15:22:03 +03:00
Avi Kivity
31e820e5a1 Merge 'Allow tombstone GC in compaction to be disabled on user request' from Raphael "Raph" Carvalho
Adding new APIs /column_family/tombstone_gc and /storage_service/tombstone_gc, that will allow for disabling tombstone garbage collection (GC) in compaction.

Mimicks existing APIs /column_family/autocompaction and /storage_service/autocompaction.

column_family variant must specify a single table only, following existing convention.

whereas the storage_service one can specify an entire keyspace, or a subset of a tables in a keyspace.

column_family API usage
-----

```
    The table name must be in keyspace:name format

    Get status:
    curl -s -X GET "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"

    Enable GC
    curl -s -X POST "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"

    Disable GC
    curl -s -X DELETE "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"
```

storage_service API usage
-----

```
    Tables can be specified using a comma-separated list.

    Enable GC on keyspace
    curl -s -X POST "http://127.0.0.1:10000/storage_service/tombstone_gc/ks"

    Disable GC on keyspace
    curl -s -X DELETE "http://127.0.0.1:10000/storage_service/tombstone_gc/ks"

    Enable GC on a subset of tables
    curl -s -X POST
    "http://127.0.0.1:10000/storage_service/tombstone_gc/ks?cf=table1,table2"
```

Closes #13793

* github.com:scylladb/scylladb:
  test: Test new API for disabling tombstone GC
  test: rest_api: extract common testing code into generic functions
  Add API to disable tombstone GC in compaction
  api: storage_service: restore indentation
  api: storage_service: extract code to set attribute for a set of tables
  tests: Test new option for disabling tombstone GC in compaction
  compaction_strategy: bypass tombstone compaction if tombstone GC is disabled
  table: Allow tombstone GC in compaction to be disabled on user request
2023-05-14 14:16:16 +03:00
Tomasz Grabiec
a91e83fad6 Merge "issue raft read barrier before pulling schema" from Gleb
Schema pull may fail because the pull does not contain everything that
is needed to instantiate a schema pointer. For instance it does not
contain a keyspace. This series changes the code to issue raft read
barrier before the pull which will guaranty that the keyspace is created
before the actual schema pull is performed.
2023-05-14 14:14:24 +03:00
Raphael S. Carvalho
a7ceb987f5 test: Fix sporadic failures of database_test
database_test is failing sporadically and the cause was traced back
to commit e3e7c3c7e5.

The commit forces a subset of tests in database_test, to run once
for each of predefined x_log2_compaction_group settings.

That causes two problems:
1) test becomes 240% slower in dev mode.
2) queries on system.auth is timing out, and the reason is a small
table being spread across hundreds of compaction groups in each
shard. so to satisfy a range scan, there will be multiple hops,
making the overhead huge. additionally, the compaction group
aware sstable set is not merged yet. so even point queries will
unnecessarily scan through all the groups.

Fixes #13660.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13851
2023-05-14 14:14:24 +03:00
Avi Kivity
97694d26c4 Merge 'reader_permit: minor improvements to resource consume/release safety' from Botond Dénes
This PR contains some small improvements to the safety of consuming/releasing resources to/from the semaphore:
* reader_permit: make the low-level `consume()/signal()` API private, making the only user (an RAII class) friend.
* reader_resources: split `reset()` into `noexcept` and potentially throwing variant.
* reader_resources::reset_to(): try harder to avoid calling `consume()` (when the new resource amount is smaller then the previous one)

Closes #13678

* github.com:scylladb/scylladb:
  reader_permit: resource_units::reset_to(): try harder to avoid calling consume()
  reader_permit: split resource_units::reset()
  reader_permit: make consume()/signal() API private
2023-05-14 14:14:23 +03:00
Avi Kivity
5d6f31df8e Merge 'Coroutinize sstable::read_toc()' from Pavel Emelyanov
It consists of two parts -- call for do_read_simple() with lambda and handling of its results. PR coroutinizes it in two steps for review simplicity -- first the lambda, then the outer caller. Then restores indentation.

Closes #13862

* github.com:scylladb/scylladb:
  sstables: Restore indentation after previous patches
  sstables: Coroutinuze read_toc() outer part
  sstables: Coroutinuze read_toc() inner part
2023-05-14 14:14:23 +03:00
Avi Kivity
0a78995e2b Merge 'Share s3 clients between sstables' from Pavel Emelyanov
Currently s3::client is created for each sstable::storage. It's later shared between sstable's files and upload sink(s). Also foreign_sstable_open_info can produce a file from a handle making a new standalone client. Coupled with the seastar's http client spawning connections on demand, this makes it impossible to control the amount of opened connections to object storage server.

In order to put some policy on top of that (as well as apply workload prioritization) s3 clients should be collected in one place and then shared by users. Since s3::client uses seastar::http::client under the hood which, in turn, can generate many connections on demand, it's enough to produce a single s3::client per configured endpoint one each shard and then share it between all the sstables, files and sinks.

There's one difficulty however, solving which is most of what this PR does. The file handle, that's used to transfer sstable's file across shards, should keep aboard all it needs to re-create a file on another shard. Since there's a single s3::client per shard, creation of a file out of a handle should grab that shard's client somehow. The meaningful shard-local object that can help is the sstables_manager and there are three ways to make use of it. All deal with the fact that sstables_manager-s are not sharded<> services, but are owner by the database independently on each shard.

1. walk the client -> sst.manager -> database -> container -> database -> sst.manager -> client chain by keeping its first half on the handle and unrolling the second half to produce a file
2. keep sharded peering service referenced by the sstables_manager that's initialized in main and passed though the database constructor down to sstables_manager(s)
3. equip file_handle::to_file with the "context" argument and teach sstables foreign info opener to push sstables_manager down to s3 file ... somehow

This PR chooses the 2nd way and introduces the sstables::storage_manager main-local sharded peering service that maintains all the s3::clients. "While at it" the new manager gets the object_storage_config updating facilities from the database (it's overloaded even without it already). Later the manager will also be in charge of collecting and exporting S3 metrics. In order to limit the number of S3 connections it also needs a patch seastar http::client, there's PR already doing that, once (if) merged there'll come one more fix on top.

refs: #13458
refs: #13369
refs: scylladb/seastar#1652

Closes #13859

* github.com:scylladb/scylladb:
  s3: Pick client from manager via handle
  s3: Generalize s3 file handle
  s3: Live-update clients' configs
  sstables: Keep clients shared across sstables
  storage_manager: Rewrap config map
  sstables, database: Move object storage config maintenance onto storage_manager
  sstables: Introduce sharded<storage_manager>
2023-05-14 14:14:23 +03:00
Pavel Emelyanov
8bca54902c sstables: Implement atomic deleter for s3 storage
The existing storage::wipe() method of s3 is in fact atomic deleter --
it commits "deleting" status into ownership table, deletes the objects
from server, then removes the entry from ownership table. So the atomic
deleter does the same and the .wipe() just removes the objects, because
it's not supposed to be atomic.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-12 17:52:13 +03:00
Pavel Emelyanov
6a8139a4fe sstables: Get atomic deleter from underlying storage
While the driver isn't known without the sstable itself, we have a
vector of them can can get it from the front element. This is not very
generic, but fortunately all sstables here belong to the same table and,
respectively, to the same storage and even prefix. The latter is also
assert-checked by the sstable_directory atomic deleter code.

For now S3 storage returns the same directory-based deleter, but next
patch will change that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-12 17:52:13 +03:00
Pavel Emelyanov
5985f00da9 sstables: Move delete_atomically to manager and rename
This is to let manager decide which storage driver to call for atomic
sstables deletion in the next patch. While at it -- rename the
sstable_directory's method into something more descriptive (to make
compiler catch all callers of it).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-12 17:52:12 +03:00
Raphael S. Carvalho
107999c990 test: Test new API for disabling tombstone GC
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-12 10:34:38 -03:00
Raphael S. Carvalho
c396db2e4c test: rest_api: extract common testing code into generic functions
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-12 10:34:38 -03:00