Commit Graph

951 Commits

Author SHA1 Message Date
Botond Dénes
bd7a3e5871 Merge 'Sanitize sstables-making utils in tests' from Pavel Emelyanov
There are tons of wrappers that help test cases make sstables for their needs, and lots of code duplication in test cases that do parts of those helpers' work on their own. This set cleans up some of them.

Closes #14280

* github.com:scylladb/scylladb:
  test/utils: Generalize making memtable from vector<mutation>
  test/util: Generalize make_sstable_easy()-s
  test/sstable_mutation: Remove useless helper
  test/sstable_mutation: Make writer config in make_sstable_mutation_source()
  test/utils: De-duplicate make_sstable_containing-s
  test/sstable_compaction: Remove useless one-line local lambda
  test/sstable_compaction: Simplify sstable making
  test/sstables*: Make sstable from vector of mutations
  test/mutation_reader: Remove create_sstable() helper from test
2023-06-19 14:05:29 +03:00
Pavel Emelyanov
6bec03f96f test: Remove sstable_utils' storage_prefix() helper
It's excessive; the test case that needs it can get the storage prefix
without this fancy wrapper-helper.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #14273
2023-06-19 13:51:04 +03:00
Nadav Har'El
ac3d0d4460 Merge 'cql3: expr: support evaluate(column_mutation_attribute)' from Avi Kivity
In preparation for converting selectors to evaluate expressions,
add support for evaluating column_mutation_attribute (representing
the WRITETIME/TTL pseudo-functions).

A unit test is added.

Fixes #12906

Closes #14287

* github.com:scylladb/scylladb:
  test: expr: test evaluation of column_mutation_attribute
  test: lib: enhance make_evaluation_inputs() with support for ttls/timestamps
  cql3: expr: evaluate() column_mutation_attribute
2023-06-19 11:11:49 +03:00
Botond Dénes
562087beff Revert "Merge 'treewide: add uuid_sstable_identifier_enabled support' from Kefu Chai"
This reverts commit d1dc579062, reversing
changes made to 3a73048bc9.

Said commit caused regressions in dtests. We need to investigate and fix
those, but in the meanwhile let's revert this to reduce the disruption
to our workflows.

Refs: #14283
2023-06-19 08:49:27 +03:00
Avi Kivity
5e2fd0bbaf test: lib: enhance make_evaluation_inputs() with support for ttls/timestamps
While remaining backwards compatible, allow supplying custom timestamp/ttl
with each fake column value.

Note: I tried to use a formatter<> for the new data structure, but
got entangled in a template loop.
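
A minimal sketch of how such a backwards-compatible extension can look. `fake_column_value` and its default timestamp/TTL values are hypothetical, not the actual test-lib structure. Implicit converting constructors keep existing call sites that pass plain values compiling, while new call sites can attach a timestamp and TTL.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one fake column value in test inputs.
struct fake_column_value {
    std::string value;
    std::int64_t timestamp = 12345;  // illustrative default write timestamp
    std::int32_t ttl = -1;           // -1 means "no TTL" in this sketch

    // Implicit on purpose: existing call sites passing plain values keep compiling.
    fake_column_value(const char* v) : value(v) {}
    fake_column_value(std::string v) : value(std::move(v)) {}

    // New call sites can attach a custom timestamp and TTL.
    fake_column_value(std::string v, std::int64_t ts, std::int32_t ttl_seconds)
        : value(std::move(v)), timestamp(ts), ttl(ttl_seconds) {}
};
```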
2023-06-18 22:45:25 +03:00
Kamil Braun
028183c793 main, cql_test_env: simplify system_keyspace initialization
Initialization of `system_keyspace` is now all done at once instead of
being spread out through the entire procedure. This is doable because
`query_processor` is now available early. A couple of FIXMEs have been
resolved.
2023-06-18 13:39:27 +02:00
Kamil Braun
33c19baabc db: system_keyspace: take simpler service references in make
Take references to services which are initialized earlier. The
references to `gossiper`, `storage_service` and `raft_group0_registry`
are no longer needed.

This will allow us to move the `make` step right after starting
`system_keyspace`.
2023-06-18 13:39:27 +02:00
Kamil Braun
b34605d161 db: system_keyspace: call initialize_virtual_tables from main
`initialize_virtual_tables` was called from `system_keyspace::make`,
which caused this `make` function to take a bunch of references to
late-initialized services (`gossiper`, `storage_service`).

Call it from `main`/`cql_test_env` instead.

Note: `system_keyspace::make` is called from
`distributed_loader::init_system_keyspace`. The latter function contains
additional steps: populate the system keyspaces (with data from
sstables) and mark their tables ready for writes.

None of these steps apply to virtual tables.

There exists at least one writable virtual table, but writes into
virtual tables are special and the implementation of writes is
virtual-table specific. The existing writable virtual table
(`db_config_table`) only updates in-memory state when written to. If a
virtual table would like to create sstables, or populate itself with
sstable data on startup, it will have to handle this in its own
initialization function.

Separating `initialize_virtual_tables` like this will allow us to
simplify `system_keyspace` initialization, making it independent of
services used for distributed communication.
2023-06-18 13:39:27 +02:00
Pavel Emelyanov
15ac192cc2 test/utils: Generalize making memtable from vector<mutation>
Both make_sstable_easy() and make_sstable_containing() prepare a memtable
by allocating it and applying mutations from a vector. Make a local
helper. Many test cases could probably benefit from it too, but they
often do more work before applying mutations to the memtable, so this is
left for future patching.
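
A sketch of the shape of such a helper. It uses hypothetical, heavily simplified `mutation` and `memtable` stand-ins; the real test utilities take a schema and use Scylla's own types. `make_memtable()` is an illustrative name, not necessarily the helper's real one.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical simplified stand-ins for Scylla's mutation and memtable.
struct mutation {
    std::string key;
    std::string value;
};

class memtable {
    std::map<std::string, std::string> _rows;
public:
    // Applying a mutation for an existing key overwrites the older value.
    void apply(const mutation& m) { _rows[m.key] = m.value; }
    std::size_t partition_count() const { return _rows.size(); }
    const std::string& get(const std::string& key) const { return _rows.at(key); }
};

// The shared helper: allocate a memtable and apply every mutation to it.
std::shared_ptr<memtable> make_memtable(const std::vector<mutation>& muts) {
    auto mt = std::make_shared<memtable>();
    for (const auto& m : muts) {
        mt->apply(m);
    }
    return mt;
}
```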

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-16 21:24:24 +03:00
Pavel Emelyanov
2badad1b15 test/util: Generalize make_sstable_easy()-s
There are two of them, one making an sstable from a memtable and the other
one doing the same from a custom reader. The former can just call the
latter with the memtable's flat reader.
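
The delegation can be sketched as follows. `reader`, `memtable`, `sstable` and `make_reader()` are hypothetical stand-ins for the real test-lib types.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a reader yields rows, a memtable can produce a reader.
struct reader {
    std::vector<std::string> rows;
};

struct memtable {
    std::vector<std::string> rows;
    reader make_reader() const { return reader{rows}; }
};

struct sstable {
    std::vector<std::string> rows;
};

// The general overload: write whatever the reader produces.
sstable make_sstable_easy(reader rd) {
    sstable sst;
    for (auto& row : rd.rows) {
        sst.rows.push_back(std::move(row));
    }
    return sst;
}

// The memtable overload no longer duplicates the writing logic;
// it just hands the memtable's reader to the general overload.
sstable make_sstable_easy(const memtable& mt) {
    return make_sstable_easy(mt.make_reader());
}
```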

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-16 21:23:46 +03:00
Pavel Emelyanov
6fe7476ba9 test/utils: De-duplicate make_sstable_containing-s
The function that prepares a memtable from a mutations vector can call its
overload that writes this memtable into an sstable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-16 21:19:55 +03:00
Pavel Emelyanov
900c609269 Merge 'Initialize query_processor early, without messaging_service or gossiper' from Kamil Braun
In https://github.com/scylladb/scylladb/pull/14231 we split `storage_proxy` initialization into two phases: for local and remote parts. Here we do the same with `query_processor`. This allows performing queries for local tables early in the Scylla startup procedure, before we initialize services used for cluster communication such as `messaging_service` or `gossiper`.

Fixes: #14202

As a follow-up we will simplify `system_keyspace` initialization, making it available earlier as well.

Closes #14256

* github.com:scylladb/scylladb:
  main, cql_test_env: start `query_processor` early
  cql3: query_processor: split `remote` initialization step
  cql3: query_processor: move `migration_manager&`, `forwarder&`, `group0_client&` to a `remote` object
  cql3: query_processor: make `forwarder()` private
  cql3: query_processor: make `get_group0_client()` private
  cql3: strongly_consistent_modification_statement: fix indentation
  cql3: query_processor: make `get_migration_manager` private
  tracing: remove `qp.get_migration_manager()` calls
  table_helper: remove `qp.get_migration_manager()` calls
  thrift: handler: move implementation of `execute_schema_command` to `query_processor`
  data_dictionary: add `get_version`
  cql3: statements: schema_altering_statement: move `execute0` to `query_processor`
  cql3: statements: pass `migration_manager&` explicitly to `prepare_schema_mutations`
  main: add missing `supervisor::notify` message
2023-06-16 17:41:08 +03:00
Kamil Braun
9f9f4c224b main, cql_test_env: start query_processor early
Start it right after `storage_proxy`.

We also need to start `cql_config` earlier
because `query_processor` uses it.
2023-06-16 14:29:59 +02:00
Kamil Braun
c212370cf1 cql3: query_processor: split remote initialization step
Pass `migration_manager&`, `forward_service&` and `raft_group0_client&`
in the remote init step which happens after the constructor.

Add a corresponding uninit remote step.
Make sure that any use of the `remote` services is finished before we
destroy the `remote` object by using a gate.

Thanks to this in a later commit we'll be able to move the construction
of `query_processor` earlier in the Scylla initialization procedure.
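
A synchronous sketch of the split init/uninit pattern with a gate. All names here are hypothetical. `simple_gate` only mimics the idea of `seastar::gate`: the real gate's `close()` returns a future that resolves once all holders have left, whereas this sketch merely asserts that none remain.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

// Minimal single-threaded gate: entering fails once the gate is closed.
class simple_gate {
    int _count = 0;
    bool _closed = false;
public:
    void enter() {
        if (_closed) throw std::runtime_error("gate closed");
        ++_count;
    }
    void leave() { --_count; }
    void close() { _closed = true; }
    int count() const { return _count; }
};

// Hypothetical service needed only by the "remote" part.
struct migration_manager_stub { std::string name = "mm"; };

class processor {
    struct remote { migration_manager_stub& mm; };
    std::optional<remote> _remote;
    simple_gate _remote_gate;
public:
    // Local part: usable right after construction.
    std::string local_query() const { return "local ok"; }

    // Remote init step, run after the constructor.
    void start_remote(migration_manager_stub& mm) { _remote.emplace(remote{mm}); }

    // Every use of the remote services holds the gate for its duration.
    std::string remote_query() {
        _remote_gate.enter();  // throws once the gate is closed
        struct holder {
            simple_gate& g;
            ~holder() { g.leave(); }
        } hold{_remote_gate};
        if (!_remote) throw std::runtime_error("remote part not initialized");
        return _remote->mm.name;
    }

    // Remote uninit step: close the gate first, then destroy the remote object.
    void stop_remote() {
        _remote_gate.close();
        assert(_remote_gate.count() == 0);  // a real async gate would wait here
        _remote.reset();
    }
};
```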
2023-06-16 14:29:59 +02:00
Kefu Chai
939fa087cc sstables, replica: pass uuid_sstable_identifiers to generation generator
before this change, we assumed that the generation is always integer-based.
in order to enable the UUID-based generation identifier if the related
option is set, we need to pass this option down to the generation generator.

because we don't have access to the cluster features in some places where
a new generation is created, a new accessor exposing feature_service from
sstable manager is added.

Fixes #10459
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-06-15 17:54:59 +08:00
Kamil Braun
26cd3b9b78 data_dictionary: add get_version
The `replica::database` version simply calls `get_version`
on the real database.

The `schema_loader` version throws `bad_function_call`.
2023-06-15 09:48:54 +02:00
Kamil Braun
b23cc9b441 main, cql_test_env: initialize storage_proxy early
This is another part of splitting Scylla initialization into two phases:
local and remote parts. Performing queries is done with `storage_proxy`,
so for local queries we want to initialize it before we initialize
services specific to cluster communication such as `gossiper`,
`messaging_service`, `storage_service`.

`system_keyspace` should also be initialized after `storage_proxy` (and
is after this patch) so in the future we'll be able to merge the
multiple initialization steps of `system_keyspace` into one (it only
needs the local part to work).
2023-06-14 11:41:36 +02:00
Kamil Braun
a8f6afc2fd main, cql_test_env: initialize database early
We want to separate two phases of Scylla service initialization: first
we initialize the local part, which allows performing local queries,
then a remote part, which requires contacting other nodes in a cluster
and allows performing distributed queries.

The `database` object is crucial for both remote and local queries, but it
was created pretty late, after services such as `gossiper` or
`storage_service` which are used for distributed operations.

Fortunately we can easily move `database` initialization and all of its
prerequisites early in the init procedure.
2023-06-14 11:41:36 +02:00
Kamil Braun
f26e98c3be storage_proxy: don't pass gossiper& and messaging_service& during initialization
These services are now passed during `init_messaging_service`, and
that's when the `remote` object is constructed.

The `remote` object is then destroyed in `uninit_messaging_service`.

Also, `migration_manager*` became `migration_manager&` in
`init_messaging_service`.
2023-06-14 11:41:36 +02:00
Kamil Braun
2dbf6f32cd Merge 'Fix crash during restart of a single node with topology over raft' from Gleb
This is a regression introduced in f26179cd27.

Fixes: #14136

* 'gleb/set_group0' of github.com:scylladb/scylla-dev:
  test: restart first node to see if it can boot after restart
  service: move setting of group0 point in storage_service earlier
2023-06-07 10:21:17 +02:00
Pavel Emelyanov
66e43912d6 code: Switch to seastar API level 7
In that level no io_priority_class-es exist. Instead, all the IO happens
in the context of current sched-group. File API no longer accepts prio
class argument (and makes io_intent arg mandatory to impls).

So the change consists of:
- removing all usage of io_priority_class
- patching file_impl's subclasses to the updated API
- priority manager goes away altogether
- IO bandwidth update is performed on respective sched group
- tune-up scylla-gdb.py io_queues command

The first change is huge and was made semi-automatically by:
- grep io_priority_class | default_priority_class
- remove all calls, found methods' args and class' fields

Patching file_impl-s is smaller, but also mechanical:
- replace io_priority_class& argument with io_intent* one
- pass intent to lower file (if applicable)
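
A simplified model of the reworked file_impl shape. The types and the single `read()` method are hypothetical; the real Seastar file_impl API is asynchronous and has more methods. The sketch only shows two things: the priority-class parameter is gone, and a wrapper forwards the `io_intent*` to the lower file.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical intent object; the counter lets the sketch observe forwarding.
struct io_intent {
    int requests_seen = 0;
};

class file_impl {
public:
    virtual ~file_impl() = default;
    // No priority-class argument any more; an io_intent* (possibly null) instead.
    virtual std::string read(std::size_t pos, std::size_t len, io_intent* intent) = 0;
};

class memory_file : public file_impl {
    std::string _data;
public:
    explicit memory_file(std::string data) : _data(std::move(data)) {}
    std::string read(std::size_t pos, std::size_t len, io_intent* intent) override {
        if (intent) {
            ++intent->requests_seen;
        }
        return _data.substr(pos, len);
    }
};

// A wrapping file_impl passes the intent through to the lower file.
class wrapping_file : public file_impl {
    std::unique_ptr<file_impl> _lower;
public:
    explicit wrapping_file(std::unique_ptr<file_impl> lower) : _lower(std::move(lower)) {}
    std::string read(std::size_t pos, std::size_t len, io_intent* intent) override {
        return _lower->read(pos, len, intent);
    }
};
```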

Dropping the priority manager is:
- git-rm .cc and .hh
- sed out all the #include-s
- fix configure.py and cmakefile

The scylla-gdb.py update is a bit hairy -- it needs to use the task queues
list for IO class names and shares, but to detect whether it should, it
checks whether the "commitlog" group is present.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13963
2023-06-06 13:29:16 +03:00
Gleb Natapov
8598cebb11 service: move setting of group0 point in storage_service earlier
group0 pointer in storage_service should be set when group0 starts.
After f26179cd27 we start group0 earlier,
so we need to move setting of the group0 pointer as well.
2023-06-06 12:12:48 +03:00
Benny Halevy
bda3705974 test/lib: test_reader_conversions: always close reader
read_mutation_from_flat_mutation_reader might throw
so we need to close the reader returned from
ms.make_fragment_v1_stream also on the error
path to avoid the internal error abort when the
reader is destroyed while opened.
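
The fix's control flow, sketched synchronously with a hypothetical `reader` type. Seastar code would express this with future continuations instead. The reader is closed on both the success path and the error path before the error propagates.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical reader that must be closed before it is destroyed.
class reader {
    bool _closed = false;
    bool _fail;
public:
    explicit reader(bool fail) : _fail(fail) {}
    std::string read_one() {
        if (_fail) throw std::runtime_error("read failed");
        return "mutation";
    }
    void close() { _closed = true; }
    bool closed() const { return _closed; }
};

// Close the reader on both paths, then propagate the error (if any).
std::string read_and_close(reader& rd) {
    std::string result;
    try {
        result = rd.read_one();
    } catch (...) {
        rd.close();
        throw;
    }
    rd.close();
    return result;
}
```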

Fixes #14098

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #14099
2023-05-31 17:49:38 +02:00
Kefu Chai
82cac8e7cf treewide: s/std::source_location/seastar::compact::source_location/
CWG 2631 (https://cplusplus.github.io/CWG/issues/2631.html) reports
an issue on how the default argument is evaluated. this problem is
more obvious when it comes to how `std::source_location::current()`
is evaluated as a default argument. but not all compilers have the
same behavior, see https://godbolt.org/z/PK865KdG4.

notably, clang-15 evaluates the default argument at the callee
site. so we need to check the capability of the compiler and fall back
to the one defined by util/source_location-compat.hh if the compiler
suffers from CWG 2631. clang-16 implemented CWG2631 in
https://reviews.llvm.org/D136554, but unfortunately this change
was not backported to clang-15.

before switching over to clang-16, if we want to use
std::source_location::current() as the default parameter and get the
behavior defined by CWG2631, we have to use the compatibility layer
provided by Seastar. otherwise we always end up having the
source_location at the callee side, which is not interesting under most
circumstances.

so in this change, all places using the idiom of passing
std::source_location::current() as the default parameter are changed
to use seastar::compat::source_location::current(). although we
have `#include "seastarx.h"` for opening the seastar namespace, the
fully qualified name `seastar::compat::source_location::current()` is
used to disambiguate it from the "namespace compat" defined somewhere
in scylladb.

see also 09a3c63345, where we used
std::source_location as an alias of std::experimental::source_location
if it was available. but this does not apply to the settings of our
current toolchain, where we have GCC-12 and Clang-15.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14086
2023-05-30 15:10:12 +03:00
Pavel Emelyanov
44b811ce19 test: Don't create directory for system tables in cql_test_env
The distributed_loader::init_system_keyspaces() does it when called a few
lines above this place.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-26 17:58:46 +03:00
Botond Dénes
57758ec3e1 Merge 'Put streaming sched group onto stream manager' from Pavel Emelyanov
The manager is in charge of updating IO bandwidth on the respective prio class. Nowadays it uses the global priority manager, but the effort to unify sched classes will require it to use a non-global streaming sched group. After this patch the sched class field is unused, but it's a preparation for the huge (really huge) "switch to seastar API level 7" patch.

ref: #13963

Closes #13997

* github.com:scylladb/scylladb:
  stream_manager: Add streaming sched group copy
  cql_test_env: Move sched groups initialization up
2023-05-24 09:27:30 +03:00
Botond Dénes
313ae4ddac Merge 'Generalize some file accessing helpers in test/' from Pavel Emelyanov
Several test cases use common operations on files like existence checking, content comparing, etc. with the help of home-brew local helpers. The set makes use of some existing seastar:: ones and generalizes others into test/lib/. The primary intent here is `57 insertions(+), 135 deletions(-)`.

Closes #13936

* github.com:scylladb/scylladb:
  test: Generalize touch_file() into test_utils.*
  test/database: Generalize file/dir touch and exists checks
  test/sstables: Use seastar::file_exists() to check
  test/sstables: Remove sstdesc
  test/sstables: Use compare_files from utils/ in sstable_test
  test/sstables: Use compare_files() from utils/ in sstable_3_x_test
  test/util: Add compare_file() helpers
2023-05-24 08:43:41 +03:00
Avi Kivity
3956e01640 Merge 'Clean index_reader API' from Pavel Emelyanov
The way index_reader maintains io_priority_class can be relaxed a bit. The main intent is to shorten the #13963 final patch a bit, as a side effect index_reader gets its portion of API polishing.

ref: #13963

Closes #13992

* github.com:scylladb/scylladb:
  index_reader: Introduce and use default arguments to constructor
  index_reader: Use _pc field in get_file_input_stream_options() directly
  index_reader: Move index_reader::get_file_input_stream_options to private: block
2023-05-23 18:46:26 +03:00
Pavel Emelyanov
678f8fb1b7 stream_manager: Add streaming sched group copy
The manager in question is responsible for maintaining the streaming
class IO bandwidth update. Nowadays it does it via priority manager's
global streaming IO priority class field, but it will need to switch to
streaming sched group.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 14:31:23 +03:00
Pavel Emelyanov
ff9d65f6ad cql_test_env: Move sched groups initialization up
The streaming manager will need to keep its copy of
streaming/maintenance group, so groups should be created early.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 14:31:23 +03:00
Pavel Emelyanov
2bb024c948 index_reader: Introduce and use default arguments to constructor
Most creators of index_reader construct it with the default prio class,
a null trace pointer and use_caching::yes. Assigning implicit defaults to
constructor arguments keeps the code shorter and easier to read.
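
A sketch of the default-argument approach with hypothetical stand-in types. The parameter list is illustrative, not the real index_reader constructor.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins for the constructor's argument types.
enum class use_caching { no, yes };
struct trace_state {};
struct priority_class { std::string name = "default"; };

class index_reader {
    std::string _file;
    priority_class _pc;
    const trace_state* _trace;
    use_caching _caching;
public:
    // Trailing parameters default to what most call sites used to pass explicitly.
    explicit index_reader(std::string file,
                          priority_class pc = {},
                          const trace_state* trace = nullptr,
                          use_caching caching = use_caching::yes)
        : _file(std::move(file)), _pc(std::move(pc)), _trace(trace), _caching(caching) {}

    const std::string& priority() const { return _pc.name; }
    bool traced() const { return _trace != nullptr; }
    bool caching() const { return _caching == use_caching::yes; }
};
```

A typical call site shrinks to `index_reader rd("Index.db");`, while the rare callers that need something else still pass every argument.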

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 11:29:04 +03:00
Pavel Emelyanov
9bdc0d3f44 test: Generalize touch_file() into test_utils.*
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 10:40:55 +03:00
Pavel Emelyanov
1f4c3be50c test/util: Add compare_file() helpers
To be used later

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 10:37:08 +03:00
Jan Ciolek
d2ef55b12c test: use NetworkTopologyStrategy in all unit tests
As described in https://github.com/scylladb/scylladb/issues/8638,
we're moving away from `SimpleStrategy`; in the future
it will become deprecated.

We should remove all uses of it and replace them
with `NetworkTopologyStrategy`.

This change replaces `SimpleStrategy` with
`NetworkTopologyStrategy` in all unit tests,
or at least in the ones where it was reasonable to do so.
Some of the tests were written explicitly to test the
`SimpleStrategy` strategy, or changing the keyspace from
`SimpleStrategy` to `NetworkTopologyStrategy`.
These tests were left intact.
It's still a feature that is supported,
even if it's slowly getting deprecated.

The typical way to use `NetworkTopologyStrategy` is
to specify a replication factor for each datacenter.
This could be a bit cumbersome: we would have to fetch
the list of datacenters, set the repfactors, etc.

Luckily there is another way - we can just specify
a replication factor to use for each existing
datacenter, like this:
```cql
CREATE KEYSPACE {} WITH REPLICATION =
{'class' : 'NetworkTopologyStrategy', 'replication_factor' : 1};
```

This makes the change rather straightforward - just replace all
instances of `'SimpleStrategy'` with `'NetworkTopologyStrategy'`.

Refs: https://github.com/scylladb/scylladb/issues/8638

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #13990
2023-05-23 08:52:56 +03:00
Botond Dénes
3b424e391b Merge 'perform_cleanup: wait until all candidates are cleaned up' from Benny Halevy
cleanup_compaction should resolve only after all
sstables that require cleanup are cleaned up.

Since it is possible that some of them are in staging
and therefore cannot be cleaned up, retry once a second
until they become eligible.

Time out if there is no progress within 5 minutes
to prevent hanging due to a view building bug.
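
The retry policy above can be sketched as a loop with a no-progress deadline. The sketch is synchronous. It injects hypothetical `try_cleanup` and `sleep` callbacks so the loop can be exercised without real time passing; the real compaction manager waits asynchronously.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <stdexcept>

using namespace std::chrono_literals;

// `try_cleanup` (hypothetical) cleans whatever is currently eligible and returns
// how many candidates still need cleanup; `sleep` waits for the retry interval.
// The deadline restarts whenever the number of remaining candidates drops.
void wait_until_cleaned_up(const std::function<std::size_t()>& try_cleanup,
                           const std::function<void(std::chrono::seconds)>& sleep,
                           std::chrono::seconds retry_interval = 1s,
                           std::chrono::seconds no_progress_timeout = 5min) {
    std::size_t remaining = try_cleanup();
    std::chrono::seconds without_progress{0};
    while (remaining > 0) {
        if (without_progress >= no_progress_timeout) {
            throw std::runtime_error("cleanup made no progress, giving up");
        }
        sleep(retry_interval);
        without_progress += retry_interval;
        std::size_t now_remaining = try_cleanup();
        if (now_remaining < remaining) {
            without_progress = 0s;  // progress resets the deadline
        }
        remaining = now_remaining;
    }
}
```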

Fixes #9559

Closes #13812

* github.com:scylladb/scylladb:
  table: signal compaction_manager when staging sstables become eligible for cleanup
  compaction_manager: perform_cleanup: wait until all candidates are cleaned up
  compaction_manager: perform_cleanup: perform_offstrategy if needed
  compaction_manager: perform_cleanup: update_sstables_cleanup_state in advance
  sstable_set: add for_each_sstable_gently* helpers
2023-05-19 12:35:59 +03:00
Kamil Braun
13df85ea11 Merge 'Cut feature_service -> system_keyspace dependency' from Pavel Emelyanov
This implicit link is pretty bad, because the feature service is a low-level
one which lots of other services depend on. The system keyspace is the opposite
-- a high-level one that needs e.g. query processor and database to
operate. This inverse dependency is created by the feature service's need
to commit enabled features' names into the system keyspace on cluster join.
And it uses the qctx thing for that in a best-effort manner (not doing
anything if it's null).

The dependency can be cut. The only place where enabled features are
committed is when the gossiper enables features on join or upon receiving
state changes from other nodes. By that time the
sharded<system_keyspace> is up and running and can be used.

Although the gossiper already depends on the system keyspace, it's better not
to overload it with the need to mess with enabling and persisting
features. Instead, the feature_enabler instance is equipped with needed
dependencies and takes care of it. Eventually the enabler is also moved
to feature_service.cc where it naturally belongs.
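
A sketch of the resulting dependency direction, using hypothetical stand-ins for the real services. The enabler holds references to both services, so the low-level feature service never refers to the system keyspace.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical low-level service: only tracks which features are enabled.
class feature_service {
    std::set<std::string> _enabled;
public:
    void enable(const std::set<std::string>& names) { _enabled.insert(names.begin(), names.end()); }
    bool is_enabled(const std::string& name) const { return _enabled.count(name) > 0; }
};

// Hypothetical high-level store standing in for system_keyspace.
class system_keyspace_stub {
    std::set<std::string> _saved;
public:
    void save_enabled_features(const std::set<std::string>& names) { _saved.insert(names.begin(), names.end()); }
    const std::set<std::string>& saved() const { return _saved; }
};

// The enabler is wired with both dependencies from the outside.
class feature_enabler {
    feature_service& _features;
    system_keyspace_stub& _sys_ks;
public:
    feature_enabler(feature_service& fs, system_keyspace_stub& sys_ks) : _features(fs), _sys_ks(sys_ks) {}

    // Called when the cluster agrees on a set of features: persist and enable them.
    void on_features_agreed(const std::set<std::string>& names) {
        _sys_ks.save_enabled_features(names);
        _features.enable(names);
    }
};
```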

Fixes: #13837

Closes #13172

* github.com:scylladb/scylladb:
  gossiper: Remove features and sysks from gossiper
  system_keyspace: De-static save_local_supported_features()
  system_keyspace: De-static load_|save_local_enabled_features()
  system_keyspace: Move enable_features_on_startup to feature_service (cont)
  system_keyspace: Move enable_features_on_startup to feature_service
  feature_service: Open-code persist_enabled_feature_info() into enabler
  gms: Move feature enabler to feature_service.cc
  gms: Move gossiper::enable_features() to feature_service::enable_features_on_join()
  gms: Persist features explicitly in features enabler
  feature_service: Make persist_enabled_feature_info() return a future
  system_keyspace: De-static load_peer_features()
  gms: Move gossiper::do_enable_features to persistent_feature_enabler::enable_features()
  gossiper: Enable features and register enabler from outside
  gms: Add feature_service and system_keyspace to feature_enabler
2023-05-18 18:21:06 +02:00
Benny Halevy
bb59687116 table: signal compaction_manager when staging sstables become eligible for cleanup
perform_cleanup may be waiting for those sstables
to become eligible for cleanup so signal it
when table::move_sstables_from_staging detects an
sstable that requires cleanup.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-17 11:33:22 +03:00
Botond Dénes
0cff0ffa08 Merge 'alternator,config: make alternator_timeout_in_ms live-updateable' from Kefu Chai
before this change, alternator_timeout_in_ms was not live-updatable:
the executor's default timeout was set right before the sharded executor
instances were created, and it was never updated from this option
afterwards. but many users would like to set their driver timeouts based
on the server timeouts, so we need to allow them to configure the timeout
while the server is running.

in this change,

* `alternator_timeout_in_ms` is marked as live-updateable
* `executor::_s_default_timeout` is changed to a thread_local variable,
   so it can be updated by a per-shard updateable_value. and
   it is now an updateable_value, so its variable name is updated
   accordingly. this value is set in the ctor of executor, and
   it is disconnected from the corresponding named_value<> option
   in the dtor of executor.
* alternator_timeout_in_ms is passed to the constructor of
   executor via sharded_parameter, so `executor::_timeout_in_ms` can
   be initialized on per-shard basis
* `executor::set_default_timeout()` is dropped, as we already pass
   the option to executor in its ctor.
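
A simplified, single-threaded sketch of the live-update mechanism. `named_value` and `executor` here are hypothetical stand-ins. Scylla's `utils::updateable_value` is richer and per-shard, and the sketch keeps the value as a plain member rather than a thread_local. The executor subscribes to option changes in its ctor and disconnects in its dtor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical config option that notifies observers on every update.
template <typename T>
class named_value {
    T _value;
    int _next_id = 0;
    std::map<int, std::function<void(const T&)>> _observers;
public:
    explicit named_value(T v) : _value(std::move(v)) {}
    const T& get() const { return _value; }
    void set(T v) {
        _value = std::move(v);
        for (auto& [id, cb] : _observers) cb(_value);
    }
    int observe(std::function<void(const T&)> cb) {
        _observers.emplace(_next_id, std::move(cb));
        return _next_id++;
    }
    void unobserve(int id) { _observers.erase(id); }
    std::size_t observer_count() const { return _observers.size(); }
};

class executor {
    named_value<std::uint32_t>& _option;
    int _observer_id;
    std::uint32_t _timeout_ms;  // kept in sync with the option
public:
    explicit executor(named_value<std::uint32_t>& timeout_in_ms)
        : _option(timeout_in_ms)
        , _observer_id(_option.observe([this](const std::uint32_t& v) { _timeout_ms = v; }))
        , _timeout_ms(_option.get()) {}
    ~executor() { _option.unobserve(_observer_id); }  // disconnect in the dtor
    executor(const executor&) = delete;  // `this` is captured by the observer
    executor& operator=(const executor&) = delete;
    std::uint32_t timeout_ms() const { return _timeout_ms; }
};
```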

Fixes #12232

Closes #13300

* github.com:scylladb/scylladb:
  alternator: split the param list of executor ctor into multi lines
  alternator,config: make alternator_timeout_in_ms live-updateable
2023-05-15 10:16:29 +03:00
Avi Kivity
31e820e5a1 Merge 'Allow tombstone GC in compaction to be disabled on user request' from Raphael "Raph" Carvalho
Adding new APIs /column_family/tombstone_gc and /storage_service/tombstone_gc, that will allow for disabling tombstone garbage collection (GC) in compaction.

Mimics the existing APIs /column_family/autocompaction and /storage_service/autocompaction.

The column_family variant must specify a single table only, following the existing convention, whereas the storage_service one can specify an entire keyspace, or a subset of tables in a keyspace.

column_family API usage
-----

```
    The table name must be in keyspace:name format

    Get status:
    curl -s -X GET "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"

    Enable GC
    curl -s -X POST "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"

    Disable GC
    curl -s -X DELETE "http://127.0.0.1:10000/column_family/tombstone_gc/ks:cf"
```

storage_service API usage
-----

```
    Tables can be specified using a comma-separated list.

    Enable GC on keyspace
    curl -s -X POST "http://127.0.0.1:10000/storage_service/tombstone_gc/ks"

    Disable GC on keyspace
    curl -s -X DELETE "http://127.0.0.1:10000/storage_service/tombstone_gc/ks"

    Enable GC on a subset of tables
    curl -s -X POST
    "http://127.0.0.1:10000/storage_service/tombstone_gc/ks?cf=table1,table2"
```

Closes #13793

* github.com:scylladb/scylladb:
  test: Test new API for disabling tombstone GC
  test: rest_api: extract common testing code into generic functions
  Add API to disable tombstone GC in compaction
  api: storage_service: restore indentation
  api: storage_service: extract code to set attribute for a set of tables
  tests: Test new option for disabling tombstone GC in compaction
  compaction_strategy: bypass tombstone compaction if tombstone GC is disabled
  table: Allow tombstone GC in compaction to be disabled on user request
2023-05-14 14:16:16 +03:00
Avi Kivity
0a78995e2b Merge 'Share s3 clients between sstables' from Pavel Emelyanov
Currently an s3::client is created for each sstable::storage. It's later shared between the sstable's files and upload sink(s). Also, foreign_sstable_open_info can produce a file from a handle, making a new standalone client. Coupled with seastar's http client spawning connections on demand, this makes it impossible to control the number of open connections to the object storage server.

In order to put some policy on top of that (as well as apply workload prioritization), s3 clients should be collected in one place and then shared by users. Since s3::client uses seastar::http::client under the hood which, in turn, can generate many connections on demand, it's enough to produce a single s3::client per configured endpoint on each shard and then share it between all the sstables, files and sinks.

There's one difficulty, however, and solving it is most of what this PR does. The file handle that's used to transfer an sstable's file across shards should carry everything it needs to re-create the file on another shard. Since there's a single s3::client per shard, creating a file out of a handle should grab that shard's client somehow. The meaningful shard-local object that can help is the sstables_manager, and there are three ways to make use of it. All deal with the fact that sstables_manager-s are not sharded<> services, but are owned by the database independently on each shard.

1. walk the client -> sst.manager -> database -> container -> database -> sst.manager -> client chain by keeping its first half on the handle and unrolling the second half to produce a file
2. keep a sharded peering service referenced by the sstables_manager that's initialized in main and passed through the database constructor down to sstables_manager(s)
3. equip file_handle::to_file with the "context" argument and teach sstables foreign info opener to push sstables_manager down to s3 file ... somehow

This PR chooses the 2nd way and introduces the sstables::storage_manager main-local sharded peering service that maintains all the s3::clients. "While at it", the new manager takes over the object_storage_config updating facilities from the database (which is overloaded even without them). Later the manager will also be in charge of collecting and exporting S3 metrics. In order to limit the number of S3 connections, seastar's http::client also needs a patch; there's a PR already doing that, and once (if) it's merged, one more fix will come on top.
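
A sketch of the sharing policy, with hypothetical `s3_client`, `object_storage_config` and `storage_manager` stand-ins. One manager instance models one shard, and the port numbers are illustrative. Every caller asking for the same endpoint gets the same client, and config updates reach existing clients.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical per-endpoint configuration.
struct object_storage_config {
    unsigned port = 9000;
};

// Hypothetical client; the real one opens HTTP connections on demand.
class s3_client {
    std::string _endpoint;
    object_storage_config _cfg;
public:
    s3_client(std::string endpoint, object_storage_config cfg) : _endpoint(std::move(endpoint)), _cfg(cfg) {}
    void update_config(object_storage_config cfg) { _cfg = cfg; }
    unsigned port() const { return _cfg.port; }
};

class storage_manager {
    std::map<std::string, object_storage_config> _configs;
    std::map<std::string, std::shared_ptr<s3_client>> _clients;
public:
    explicit storage_manager(std::map<std::string, object_storage_config> configs) : _configs(std::move(configs)) {}

    // All sstables, files and sinks using the same endpoint share one client.
    std::shared_ptr<s3_client> get_client(const std::string& endpoint) {
        auto it = _clients.find(endpoint);
        if (it == _clients.end()) {
            it = _clients.emplace(endpoint, std::make_shared<s3_client>(endpoint, _configs.at(endpoint))).first;
        }
        return it->second;
    }

    // Live config updates are applied to the existing client, not a new one.
    void update_config(const std::string& endpoint, object_storage_config cfg) {
        _configs[endpoint] = cfg;
        if (auto it = _clients.find(endpoint); it != _clients.end()) {
            it->second->update_config(cfg);
        }
    }
};
```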

refs: #13458
refs: #13369
refs: scylladb/seastar#1652

Closes #13859

* github.com:scylladb/scylladb:
  s3: Pick client from manager via handle
  s3: Generalize s3 file handle
  s3: Live-update clients' configs
  sstables: Keep clients shared across sstables
  storage_manager: Rewrap config map
  sstables, database: Move object storage config maintenance onto storage_manager
  sstables: Introduce sharded<storage_manager>
2023-05-14 14:14:23 +03:00
Raphael S. Carvalho
6c32148751 tests: Test new option for disabling tombstone GC in compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-12 10:14:28 -03:00
Raphael S. Carvalho
3b28c26c77 table: Allow tombstone GC in compaction to be disabled on user request
If tombstone GC was disabled, compaction will ensure that fully expired
sstables won't be bypassed and that no expired tombstones will be
purged. Changing the value takes immediate effect even on ongoing
compactions.

Not wired into an API yet.
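
A sketch of how such a flag can gate purging, using hypothetical types and an illustrative `gc_before` cutoff. Because the flag is read on every purge decision, flipping it also affects compactions already in progress.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical per-table state holding the user-controlled switch.
struct table_state {
    std::atomic<bool> tombstone_gc_enabled{true};
};

struct tombstone {
    std::int64_t deletion_time;  // illustrative units (seconds)
};

// A tombstone may be purged only if GC is enabled and it is old enough.
bool can_purge(const table_state& t, const tombstone& ts, std::int64_t gc_before) {
    if (!t.tombstone_gc_enabled.load()) {
        return false;  // GC disabled: keep every tombstone, expired or not
    }
    return ts.deletion_time < gc_before;
}
```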

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-12 10:14:28 -03:00
Pavel Emelyanov
a59096aa70 sstables, database: Move object storage config maintenance onto storage_manager
Right now the map<endpoint, config> sits on the sstables manager and its
update is governed by database (because it's peering and can kick other
shards to update it as well).

Having the sharded<storage_manager> at hand lets us free the database from
the need to update configs and keeps sstables_manager a bit smaller.
Also this will allow keeping s3 clients shared between sstables via this
map by next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-11 19:39:00 +03:00
Pavel Emelyanov
2153751d45 sstables: Introduce sharded<storage_manager>
The manager in question keeps track of whatever sstables_manager needs
to work with the storage (spoiler: only the S3 one). It's a main-local sharded
peering service, so that the container() call can be used by the next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-11 19:36:01 +03:00
Botond Dénes
24cb351655 Merge 'test: sstable_*test: avoid using helper using generation_type::int_t ' from Kefu Chai
the series drops some of the callers using the SSTable generation as an integer. as the generation of an SSTable is but an identifier, we should not use it as an integer outside of generation_type's implementation.

Closes #13845

* github.com:scylladb/scylladb:
  test: drop unused helper functions
  test: sstable_mutation_test: avoid using helper using generation_type::int_t
  test: sstable_move_test: avoid using helper using generation_type::int_t
  test: sstable_*test: avoid using helper using generation_type::int_t
  test: sstable_3_x_test: do not use reuseable_sst() accepting integer
2023-05-11 10:17:02 +03:00
Kefu Chai
29284d64a5 test: drop unused helper functions
all users of these two helpers have switched to their alternatives,
so there is no need to keep them.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-11 12:32:37 +08:00
Kefu Chai
bfd6caffbb test: sstable_*test: avoid using helper using generation_type::int_t
this change is one of a series which drops most of the callers
using the SSTable generation as an integer. as the generation of an
SSTable is merely an identifier, we should not use it as an integer
outside of generation_type's implementation. so, in this change,
instead of using the helper accepting an int, we switch to the one
which accepts generation_type by offering a default parameter: a
generation created from 1. this preserves the existing behavior.

we will divert other callers of `reusable_sst(...,
generation_type::int)` in follow-up changes in different ways.
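
A minimal sketch of the idea, using hypothetical names. `generation_type` here is a simplified stand-in, not Scylla's class, and `reusable_sst_path()` is an illustrative helper, not the real `reusable_sst()`. The integer is hidden behind an opaque type, and the helper takes a `generation_type` whose default argument is a generation created from 1, so callers that omit it keep the old behavior.

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for an opaque SSTable generation identifier.
// The integer is an implementation detail: callers can compare and print a
// generation, but cannot do arithmetic on it.
class generation_type {
public:
    using int_t = std::int64_t;

    static generation_type from_int(int_t value) { return generation_type(value); }

    bool operator==(const generation_type& other) const { return _value == other._value; }
    std::string to_string() const { return std::to_string(_value); }

private:
    explicit generation_type(int_t value) : _value(value) {}
    int_t _value;
};

// Hypothetical test helper that accepts generation_type instead of an integer.
// The default argument is a generation created from 1, preserving the behavior
// of callers that implicitly relied on generation 1.
std::string reusable_sst_path(const std::string& dir,
                              generation_type gen = generation_type::from_int(1)) {
    return dir + "/gen-" + gen.to_string();
}
```

Because the default lives in the helper's signature, existing call sites compile unchanged while no longer passing a raw integer.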

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-11 12:32:22 +08:00
Nadav Har'El
e57252092c Merge 'cql3: result_set, selector: change value type to managed_bytes_opt' from Avi Kivity
CQL has evolved several expression evaluation mechanisms: the WHERE clause,
selectors (the SELECT clause), and the LWT IF clause are just some
examples. Most now use expressions, which use managed_bytes_opt
as the underlying value representation, but selectors still use bytes_opt.

This poses two problems:
1. bytes_opt generates large contiguous allocations when used with large blobs, impacting latency
2. trying to use expressions with bytes_opt will incur a copy, reducing performance

To solve these problems, we harmonize the data types to managed_bytes_opt
(#13216 notwithstanding). This is somewhat difficult since the values come
from views into a bytes_ostream. Luckily, however, bytes_ostream and
managed_bytes_view are mostly compatible, so with a little effort this can be done.
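
A toy model of why fragmentation helps, under stated assumptions: `fragmented_bytes` below is hypothetical and far simpler than `managed_bytes`, and its 4-byte fragment size is chosen only so short strings fragment. A large value is stored as bounded chunks, so no single large contiguous allocation is needed. Code that really needs a flat view pays for a copy only when the value is fragmented, in the spirit of `managed_bytes_view::with_linearized()`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

class fragmented_bytes {
public:
    // Tiny fragment size so the example fragments even short strings.
    static constexpr std::size_t max_fragment = 4;

    explicit fragmented_bytes(const std::string& src) {
        for (std::size_t pos = 0; pos < src.size(); pos += max_fragment) {
            _fragments.push_back(src.substr(pos, max_fragment));
        }
    }

    std::size_t size() const {
        std::size_t n = 0;
        for (const auto& f : _fragments) {
            n += f.size();
        }
        return n;
    }

    std::size_t fragment_count() const { return _fragments.size(); }

    // A value with at most one fragment is already contiguous.
    bool is_linearized() const { return _fragments.size() <= 1; }

    // Call func with a contiguous view; copy only if the value is fragmented.
    template <typename Func>
    auto with_linearized(Func&& func) const {
        if (is_linearized()) {
            static const std::string empty;
            return func(_fragments.empty() ? empty : _fragments.front());
        }
        std::string flat;
        flat.reserve(size());
        for (const auto& f : _fragments) {
            flat += f;
        }
        return func(static_cast<const std::string&>(flat));
    }

private:
    std::vector<std::string> _fragments;
};
```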

The series is neutral wrt performance:

before:
```
222118.61 tps ( 61.1 allocs/op,  12.1 tasks/op,   43092 insns/op,        0 errors)
224250.14 tps ( 61.1 allocs/op,  12.1 tasks/op,   43094 insns/op,        0 errors)
224115.66 tps ( 61.1 allocs/op,  12.1 tasks/op,   43092 insns/op,        0 errors)
223508.70 tps ( 61.1 allocs/op,  12.1 tasks/op,   43107 insns/op,        0 errors)
223498.04 tps ( 61.1 allocs/op,  12.1 tasks/op,   43087 insns/op,        0 errors)
```

after:
```
220708.37 tps ( 61.1 allocs/op,  12.1 tasks/op,   43118 insns/op,        0 errors)
225168.99 tps ( 61.1 allocs/op,  12.1 tasks/op,   43081 insns/op,        0 errors)
222406.00 tps ( 61.1 allocs/op,  12.1 tasks/op,   43088 insns/op,        0 errors)
224608.27 tps ( 61.1 allocs/op,  12.1 tasks/op,   43102 insns/op,        0 errors)
225458.32 tps ( 61.1 allocs/op,  12.1 tasks/op,   43098 insns/op,        0 errors)
```

That said, I expect that with some more effort we can eliminate some copies.
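
As a quick sanity check of the "neutral" claim, the sketch below averages the five runs quoted above and compares the change in the mean with the run-to-run spread. It uses only the numbers in this message; `mean()`, `spread()` and `relative_change()` are illustrative helpers.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Throughput (tps) of the five runs quoted above.
const std::vector<double> before_tps{222118.61, 224250.14, 224115.66, 223508.70, 223498.04};
const std::vector<double> after_tps{220708.37, 225168.99, 222406.00, 224608.27, 225458.32};

double mean(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
}

// Max minus min: a crude measure of run-to-run noise.
double spread(const std::vector<double>& v) {
    auto mm = std::minmax_element(v.begin(), v.end());
    return *mm.second - *mm.first;
}

// Relative change of the mean; 0.01 would mean +1%.
double relative_change(const std::vector<double>& base, const std::vector<double>& other) {
    return (mean(other) - mean(base)) / mean(base);
}
```

With these numbers the means come out to about 223498.23 tps before and 223669.99 tps after, a change of roughly +0.08%, which is much smaller than the spread within either set (2131.53 and 4749.95 tps).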

Closes #13637

* github.com:scylladb/scylladb:
  cql3: untyped_result_set: switch to managed_bytes_view as the cell type
  cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt
  cql3: untyped_result_set: always own data
  types: abstract_type: add mixed-type versions of compare() and equal()
  utils/managed_bytes, serializer: add conversion between buffer_view<bytes_ostream> and managed_bytes_view
  utils: managed_bytes: add bidirectional conversion between bytes_opt and managed_bytes_opt
  utils: managed_bytes: add managed_bytes_view::with_linearized()
  utils: managed_bytes: mark managed_bytes_view::is_linearized() const
2023-05-10 15:01:45 +03:00
Avi Kivity
42a1ced73b cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt
The expression system uses managed_bytes_opt for values, but result_set
uses bytes_opt. This means that processing values from the result set
in expressions requires a copy.

Of the two, managed_bytes_opt is the better choice, since it prevents
large contiguous allocations for large blobs. So we switch result_set
to use managed_bytes_opt. Users of the result_set API are adjusted.

The db::function interface is not modified to limit churn; instead we
convert the types on entry and exit. This will be adjusted in a following
patch.
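
A toy sketch of "convert on entry and exit", with hypothetical stand-ins. `bytes_opt` and `managed_bytes_opt` below are simplified aliases (a contiguous `std::string` versus a vector of bounded fragments), not the real Scylla types. `call_legacy()` is an illustrative adapter, not the actual `db::function` plumbing. The unchanged function keeps its contiguous interface while its callers work with fragmented values; the price is one copy in each direction.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Simplified, hypothetical stand-ins: contiguous vs. fragmented optional values.
using bytes_opt = std::optional<std::string>;
using managed_bytes_opt = std::optional<std::vector<std::string>>;

constexpr std::size_t fragment_size = 4;  // tiny, for illustration

// Split a contiguous value into bounded fragments (null stays null).
managed_bytes_opt to_managed(const bytes_opt& b) {
    if (!b) {
        return std::nullopt;
    }
    std::vector<std::string> fragments;
    for (std::size_t pos = 0; pos < b->size(); pos += fragment_size) {
        fragments.push_back(b->substr(pos, fragment_size));
    }
    return fragments;
}

// Join fragments back into one contiguous value (this copies).
bytes_opt to_bytes(const managed_bytes_opt& m) {
    if (!m) {
        return std::nullopt;
    }
    std::string flat;
    for (const auto& f : *m) {
        flat += f;
    }
    return flat;
}

// An unchanged interface that still speaks bytes_opt.
using legacy_function = bytes_opt (*)(const std::vector<bytes_opt>&);

// Convert the arguments on entry and the result on exit.
managed_bytes_opt call_legacy(legacy_function fn, const std::vector<managed_bytes_opt>& args) {
    std::vector<bytes_opt> converted;
    converted.reserve(args.size());
    for (const auto& a : args) {
        converted.push_back(to_bytes(a));
    }
    return to_managed(fn(converted));
}
```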
2023-05-07 17:17:36 +03:00
Kefu Chai
bd3e8d0460 test: drop a reusable_sst() variant which accepts int as generation
this is one of the changes to reduce the use of integer-based generations
in tests. in the future, we will need to expand the tests to exercise
UUID-based generations, or at least to make them neutral to the type of
the underlying generation identifier. removing the helpers which only
accept `generation_type::int_t` helps us make this happen.
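
A minimal sketch of what "neutral to the underlying identifier type" can mean for test code, assuming a hypothetical `generation_id` that holds either an integer or a UUID string (this is not Scylla's actual `generation_type`). Helpers written only in terms of equality and `to_string()` work unchanged for both kinds, whereas a helper that only accepts an integer would not. `component_name()` is likewise an illustrative helper.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <variant>

// Hypothetical generation identifier: integer-based or UUID-based.
class generation_id {
public:
    explicit generation_id(std::int64_t value) : _id(value) {}
    explicit generation_id(std::string uuid) : _id(std::move(uuid)) {}

    bool operator==(const generation_id& other) const { return _id == other._id; }
    bool is_uuid() const { return std::holds_alternative<std::string>(_id); }

    std::string to_string() const {
        if (const auto* n = std::get_if<std::int64_t>(&_id)) {
            return std::to_string(*n);
        }
        return std::get<std::string>(_id);
    }

private:
    std::variant<std::int64_t, std::string> _id;
};

// Identifier-neutral test helper: it never treats the generation as a number.
std::string component_name(const generation_id& gen, const std::string& component) {
    return "gen-" + gen.to_string() + "-" + component;
}
```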

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-06 18:24:48 +08:00