Commit Graph

669 Commits

Author SHA1 Message Date
Tomasz Grabiec
d8826acaa3 tablets: Fix stack smashing in tablet_map_to_mutation()
The code was incorrectly passing a data_value of type bytes due to
implicit conversion of the result of serialize() (bytes_opt) to a
data_value object of type bytes_type via:

   data_value(std::optional<NativeType>);

mutation::set_static_cell() accepts a data_value object, which is then
serialized using column's type in abstract_type::decompose(data_value&):

    bytes b(bytes::initialized_later(), serialized_size(*this, value._value));
    auto i = b.begin();
    value.serialize(i);

Notice that serialized_size() is taken from the column type, but
serialization is done using data_value's type. The two types may have
a compatible CQL binary representation, but may differ in native
types. serialized_size() may incorrectly interpret the native type and
come up with the wrong size. If the size is too smaller, we end up with
stack or heap corruption later after serialize().

For example, if the column type is utf8 but value holds bytes, the
size will be wrong because even though both use the basic_sstring
type, they have a different layout due to max_size (15 vs 31).

Fixes #13717

Closes #13787
2023-05-07 14:07:50 +03:00
Avi Kivity
f125a3e315 Merge 'tree: finish the reader_permit state renames' from Botond Dénes
In https://github.com/scylladb/scylladb/pull/13482 we renamed the reader permit states to more descriptive names. That PR however only covered only the states themselves and their usages, as well as the documentation in `docs/dev`.
This PR is a followup to said PR, completing the name changes: renaming all symbols, names, comments etc, so all is consistent and up-to-date.

Closes #13573

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: misc updates w.r.t. recent permit state name changes
  reader_concurrency_semaphore: update permit members w.r.t. recent permit state name changes
  reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes
  reader_concurrency_semaphore: update API w.r.t. recent permit state name changes
  reader_concurrency_semaphore: update stats w.r.t. recent permit state name changes
2023-05-04 18:29:04 +03:00
Avi Kivity
1d351dde06 Merge 'Make S3 client work with real S3' from Pavel Emelyanov
Current S3 client was tested over minio and it takes few more touches to work with amazon S3.

The main challenge here is to support singed requests. The AWS S3 server explicitly bans unsigned multipart-upload requests, which in turn is the essential part of the sstables S3 backend, so we do need signing. Signing a request has many options and requirements, one of them is -- request _body_ can be or can be not included into signature calculations. This is called "(un)signed payload". Requests sent over plain HTTP require payload signing (i.e. -- request body should be included into signature calculations), which can a bit troublesome, so instead the PR uses unsigned payload (i.e. -- doesn't include the request body into signature calculation, only necessary headers and query parameters), but thus also needs HTTPS.

So what this set does is makes the existing S3 client code sign requests. In order to sign the request the code needs to get AWS key and secret (and region) from somewhere and this somewhere is the conf/object_storage.yaml config file. The signature generating code was previously merged (moved from alternator code) and updated to suit S3 client needs.

In order to properly support HTTPS the PR adds special connection factory to be used with seastar http client. The factory makes DNS resolving of AWS endpoint names and configures gnutls systemtrust.

fixes: #13425

Closes #13493

* github.com:scylladb/scylladb:
  doc: Add a document describing how to configure S3 backend
  s3/test: Add ability to run boost test over real s3
  s3/client: Sign requests if configured
  s3/client: Add connection factory with DNS resolve and configurable HTTPS
  s3/client: Keep server port on config
  s3/client: Construct it with config
  s3/client: Construct it with sstring endpoint
  sstables: Make s3_storage with endpoint config
  sstables_manager: Keep object storage configs onboard
  code: Introduce conf/object_storage.yaml configuration file
2023-05-04 18:08:54 +03:00
Benny Halevy
e2023877f2 sstable_directory: parallel_for_each_restricted: do not move container
Commit ecbd112979
`distributed_loader: reshard: consider sstables for cleanup`
caused a regression in loading new sstables using the `upload`
directory, as seen in e.g. https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-release/230/testReport/migration_test/TestMigration/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split000___test_migrate_sstable_without_compression_3_0_md_/
```
        query = "SELECT COUNT(*) FROM cf"
        statement = SimpleStatement(query)
        s = self.patient_cql_connection(node, 'ks')
        result = list(s.execute(statement))
>       assert result[0].count == expected_number_of_rows, \
            "Expected {} rows. Got {}".format(expected_number_of_rows, list(s.execute("SELECT * FROM ks.cf")))
E       AssertionError: Expected 1 rows. Got []
E       assert 0 == 1
E         +0
E         -1
```

The reason for the regression is that the call to `do_for_each_sstable`
in `collect_all_shared_sstables` to search for sstables that need
cleanup caused the list of sstables in the sstable directory to be
moved and cleared.

parallel_for_each_restricted moves the container passed to it
into a `do_with` continuation.  This is required for
parallel_for_each_restricted.

However, moving the container is destructive and so,
the decision whether to move or not needs to be the
caller's, not the callee.

This patch changes the signature of parallel_for_each_restricted
to accept a lvalue reference to the container rather than a rvalue reference,
allowing the callers to decide whether to move or not.

Most callers are converted to move the container, as they effectively do
today, and a new method, `filter_sstables` was added for the
`collect_all_shared_sstables` us case, that allows the `func` that
processes each sstable to decide whether the sstable is kept
in `_unshared_local_sstables` or not.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-04 11:36:25 +03:00
Pavel Emelyanov
bd1e3c688f sstables_manager: Keep object storage configs onboard
The user sstables manager will need to provide endpoint config for
sstables' storage drivers. For that it needs to get it from db::config
and keep in-sync with its updates.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Botond Dénes
022465d673 Merge 'Tone down offstrategy log message' from Benny Halevy
In many cases we trigger offstrategy compaction opportunistically
also when there's nothing to do.  In this case we still print
to the log lots of info-level message and call
`run_offstrategy_compaction` that wastes more cpu cycles
on learning that it has nothing to do.

This change bails out early if the maintenance set is empty
and prints a "Skipping off-strategy compaction" message in debug
level instead.

Fixes #13466

Also, add an group_id class and return it from compaction_group and table_state.
Use that to identify the compaction_group / table_state by "ks_name.cf_name compaction_group=idx/total" in log messages.

Fixes #13467

Closes #13520

* github.com:scylladb/scylladb:
  compaction_manager: print compaction_group id
  compaction_group, table_state: add group_id member
  compaction_manager: offstrategy compaction: skip compaction if no candidates are found
2023-05-02 08:05:18 +03:00
Avi Kivity
7b7d9bcb14 Merge 'Do not access owned_ranges_ptr across shards in update_sstable_cleanup_state' from Benny Halevy
This series fixes a few issues caused by f1bbf705f9
(f1bbf705f9):

- table, compaction_manager: prevent cross shard access to owned_ranges_ptr
  - Fixes #13631
- distributed_loader: distribute_reshard_jobs: pick one of the sstable shard owners
- compaction: make_partition_filter: do not assert shard ownership
  - allow the filtering reader now used during resharding to process tokens owned by other shards

Closes #13635

* github.com:scylladb/scylladb:
  compaction: make_partition_filter: do not assert shard ownership
  distributed_loader: distribute_reshard_jobs: pick one of the sstable shard owners
  table, compaction_manager: prevent cross shard access to owned_ranges_ptr
2023-05-01 22:51:00 +03:00
Kefu Chai
56b99b7879 build: cmake: pick up tablets related changes
to sync with the changes in 5e89f2f5ba

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-28 11:13:41 +08:00
Kamil Braun
30cc07b40d Merge 'Introduce tablets' from Tomasz Grabiec
This PR introduces an experimental feature called "tablets". Tablets are
a way to distribute data in the cluster, which is an alternative to the
current vnode-based replication. Vnode-based replication strategy tries
to evenly distribute the global token space shared by all tables among
nodes and shards. With tablets, the aim is to start from a different
side. Divide resources of replica-shard into tablets, with a goal of
having a fixed target tablet size, and then assign those tablets to
serve fragments of tables (also called tablets). This will allow us to
balance the load in a more flexible manner, by moving individual tablets
around. Also, unlike with vnode ranges, tablet replicas live on a
particular shard on a given node, which will allow us to bind raft
groups to tablets. Those goals are not yet achieved with this PR, but it
lays the ground for this.

Things achieved in this PR:

  - You can start a cluster and create a keyspace whose tables will use
    tablet-based replication. This is done by setting `initial_tablets`
    option:

    ```
        CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy',
                        'replication_factor': 3,
                        'initial_tablets': 8};
    ```

    All tables created in such a keyspace will be tablet-based.

    Tablet-based replication is a trait, not a separate replication
    strategy. Tablets don't change the spirit of replication strategy, it
    just alters the way in which data ownership is managed. In theory, we
    could use it for other strategies as well like
    EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy
    is augmented to support tablets.

  - You can create and drop tablet-based tables (no DDL language changes)

  - DML / DQL work with tablet-based tables

    Replicas for tablet-based tables are chosen from tablet metadata
    instead of token metadata

Things which are not yet implemented:

  - handling of views, indexes, CDC created on tablet-based tables
  - sharding is done using the old method, it ignores the shard allocated in tablet metadata
  - node operations (topology changes, repair, rebuild) are not handling tablet-based tables
  - not integrated with compaction groups
  - tablet allocator piggy-backs on tokens to choose replicas.
    Eventually we want to allocate based on current load, not statically

Closes #13387

* github.com:scylladb/scylladb:
  test: topology: Introduce test_tablets.py
  raft: Introduce 'raft_server_force_snapshot' error injection
  locator: network_topology_strategy: Support tablet replication
  service: Introduce tablet_allocator
  locator: Introduce tablet_aware_replication_strategy
  locator: Extract maybe_remove_node_being_replaced()
  dht: token_metadata: Introduce get_my_id()
  migration_manager: Send tablet metadata as part of schema pull
  storage_service: Load tablet metadata when reloading topology state
  storage_service: Load tablet metadata on boot and from group0 changes
  db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata()
  migration_notifier: Introduce before_drop_keyspace()
  migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
  test: perf: Introduce perf-tablets
  test: Introduce tablets_test
  test: lib: Do not override table id in create_table()
  utils, tablets: Introduce external_memory_usage()
  db: tablets: Add printers
  db: tablets: Add persistence layer
  dht: Use last_token_of_compaction_group() in split_token_range_msb()
  locator: Introduce tablet_metadata
  dht: Introduce first_token()
  dht: Introduce next_token()
  storage_proxy: Improve trace-level logging
  locator: token_metadata: Fix confusing comment on ring_range()
  dht, storage_proxy: Abstract token space splitting
  Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries"
  db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms()
  db: Introduce get_non_local_vnode_based_strategy_keyspaces()
  service: storage_proxy: Avoid copying keyspace name in write handler
  locator: Introduce per-table replication strategy
  treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type
  locator: Introduce effective_replication_map
  locator: Rename effective_replication_map to vnode_effective_replication_map
  locator: effective_replication_map: Abstract get_pending_endpoints()
  db: Propagate feature_service to abstract_replication_strategy::validate_options()
  db: config: Introduce experimental "TABLETS" feature
  db: Log replication strategy for debugging purposes
  db: Log full exception on error in do_parse_schema_tables()
  db: keyspace: Remove non-const replication strategy getter
  config: Reformat
2023-04-27 09:40:18 +02:00
Raphael S. Carvalho
59904be5c3 table: Avoid reallocations in make_compaction_groups()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-25 11:14:33 -03:00
Raphael S. Carvalho
9f5e19224d table: Remove another outdated comment regarding sstable generation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-25 11:09:51 -03:00
Raphael S. Carvalho
2d45dd35c7 table: Remove outdated comment regarding automatic compaction
We already provide a way to disable automatic compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-25 11:09:45 -03:00
Tomasz Grabiec
9d786c1ebc db: tablets: Add persistence layer 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
94e1c7b859 db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms()
This allows update_pending_ranges(), invoked on keyspace creation, to
succeed in the presence of keyspaces with per-table replication
strategy. It will update only vnode-based erms, which is intended
behavior, since only those need pending ranges updated.

This change will also make node operations like bootstrap, repair,
etc. to work (not fail) in the presence of keyspaces with per-table
erms, they will just not be replicated using those algorithms.

Before, these would fail inside get_effective_replication_map(), which
is forbidden for keyspaces with per-table replication.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
dc04da15ec db: Introduce get_non_local_vnode_based_strategy_keyspaces()
It's meant to be used in places where currently
get_non_local_strategy_keyspaces() is used, but work only with
keyspaces which use vnode-based replication strategy.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
9b17ad3771 locator: Introduce per-table replication strategy
Will be used by tablet-based replication strategies, for which
effective replication map is different per table.

Also, this patch adapts existing users of effective replication map to
use the per-table effective replication map.

For simplicity, every table has an effective replication map, even if
the erm is per keyspace. This way the client code can be uniform and
doesn't have to check whether replication strategy is per table.

Not all users of per-keyspace get_effective_replication_map() are
adapted yet to work per-table. Those algorithms will throw an
exception when invoked on a keyspace which uses per-table replication
strategy.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
5d9bcb45de treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
d3c9ad4ed6 locator: Rename effective_replication_map to vnode_effective_replication_map
In preparation for introducing a more abstract
effective_replication_map which can describe replication maps which
are not based on vnodes.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
7b01fe8742 db: Propagate feature_service to abstract_replication_strategy::validate_options()
Some replication strategy options may be feature-dependent.
2023-04-24 10:49:36 +02:00
Tomasz Grabiec
a892e144cc db: Log replication strategy for debugging purposes 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
7543c75b62 db: Log full exception on error in do_parse_schema_tables() 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
c923bdd222 db: keyspace: Remove non-const replication strategy getter
Keyspace will store replication_ptr, which is a const pointer. No user
needs a mutable reference.
2023-04-24 10:49:36 +02:00
Benny Halevy
dabf46c37f compaction_group, table_state: add group_id member
To help identify the compaction group / table_state.

Ref #13467

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-24 10:06:04 +03:00
Botond Dénes
1750bb34b7 Merge 'sstables, replica: add generation generator' from Kefu Chai
this is the first step to the uuid-based generation identifier. the goal is to encapsulate the generation related logic in generator, so its consumers do not have to understand the difference between the int64_t based generation and UUID v1 based generation.

this commit should not change the behavior of existing scylla. it just allows us to derive from `generation_generator` so we can have another generator which generates UUID based generation identifier.

Closes #13073

* github.com:scylladb/scylladb:
  replica, test: create generation id using generator
  sstables: add generation_generator
  test: sstables: use generate_n for generating ids for testing
2023-04-24 09:31:08 +03:00
Botond Dénes
7f04d8231d Merge 'gms: define and use generation and version types' from Benny Halevy
This series cleans up the generation and value types used in gms / gossiper.
Currently we use a blend of int, int32_t, and int64_t around messaging.
This change defines gms::generation_type and gms::version_type as int32_t
and add check in non-release modes that the respective int64 value passed over messaging do not overflow 32 bits.

Closes #12966

* github.com:scylladb/scylladb:
  gossiper: version_generator: add {debug_,}validate_gossip_generation
  gms: gossip_digest: use generation_type and version_type
  gms: heart_beat_state: use generation_type and version_type
  gms: versioned_value: use version_type
  gms: version_generator: define version_type and generation_type strong types
  utils: move generation-number to gms
  utils: add tagged_integer
  gms: versioned_value: make members private
  scylla-gdb: add get_gms_versioned_value
  gms: versioned_value: delete unused compare_to function
  gms: gossip_digest: delete unused compare_to function
2023-04-24 08:44:48 +03:00
Pavel Emelyanov
5e201b9120 database: Remove compaction_manager.hh inclusion into database.hh
The only reason why it's there (right next to compaction_fwd.hh) is
because the database::table_truncate_state subclass needs the definition
of compaction_manager::compaction_reenabler subclass.

However, the former sub is not used outside of database.cc and can be
defined in .cc. Keeping it outside of the header allows dropping the
compaction_manager.hh from database.hh thus greatly reducing its fanout
over the code (from ~180 indirect inclusions down to ~20).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13622
2023-04-23 16:27:11 +03:00
Benny Halevy
c7d064b8b1 distributed_loader: distribute_reshard_jobs: pick one of the sstable shard owners
When distributing the resharding jobs, prefer one of
the sstable shard owners based on foreign_sstable_open_info.

This is particularly important for uploaded sstables
that are resharded since they require cleanup.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 15:13:16 +03:00
Benny Halevy
2f61de8f7b table, compaction_manager: prevent cross shard access to owned_ranges_ptr
Seen after f1bbf705f9 in debug mode

distributed_loader collect_all_shared_sstables copies
compaction::owned_ranges_ptr (lw_shared_ptr<const
dht::token_range_vector>)
across shards.

Since update_sstable_cleanup_state is synchronous, it can
be passed a const refrence to the token_range_vector instead.
It is ok to access the memory read-only across shards
and since this happens on start-up, there are no special
performance requirements.

Fixes #13631

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 15:12:13 +03:00
Benny Halevy
c5d819ce60 gms: versioned_value: make members private
and provide accessor functions to get them.

1. So they can't be modified by mistake, as the versioned value is
   immutable. A new value must have a higher version.
2. Before making the version a strong gms::version_type.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:37:32 +03:00
Kefu Chai
576adbdbc5 replica, test: create generation id using generator
reuse generation_generator for generating generation identifiers for
less repeatings. also, add allow update generator to update its
lastest known generation id.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-21 22:02:30 +08:00
Pavel Emelyanov
837fde84b1 view: Carry data_dictionary arg through standalone helpers
There's a bunch of functions in view.{hh|cc} that don't belong to any
class and perform view-related claculations for view updates. Lots of
them eventually call view_info::select_statement() which will later need
the dictionary.

By now all those methods' callers have data dictionary at hand and can
share it via argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 11:17:46 +03:00
Pavel Emelyanov
9d3d533561 view_update_builder: Construct with data dictionary
The caller is table with view-update-generator at hand (it calls
mutate_MV on). Builder here is used as a temporary object that destroys
once the caller coroutine co_return-s, so keeping the database obtained
from the view-update-generator is safe.

Later the v.u.b. object will propagate its data dictionary down the
callstacks.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 11:17:38 +03:00
Pavel Emelyanov
4a16ab3bd4 table: Push view_update_generator arg to affected_views()
Caller already has it to call mutate_MV() on. The method in question
will need the generator in one of the next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-20 10:42:31 +03:00
Botond Dénes
804403f618 reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes
They is still using the old terminology for permit state names, bring
them up to date with the recent state name changes.
2023-04-19 05:20:42 -04:00
Botond Dénes
289ff821c9 Merge 'Remove global proxy usage from view builder's value_getter' from Pavel Emelyanov
There's a legacy safety check in view code that needs to find a base table from its schema ID. To do it it calls for global storage proxy instance. The comment says that this code can be removed once computes_column feature is known by everyone. I'm not sure if that's the case, so here's more complicated yet less incompatible way to stop using global proxy instance.

Closes #13504

* github.com:scylladb/scylladb:
  view: Remove unused view_ptr reference
  view: Carry backing-secondary-index bit via view builder
  view: Keep backing-seconday-index bool on value_getter
  table: Add const index manager sgetter
2023-04-14 11:23:23 +03:00
Raphael S. Carvalho
a47bac931c Move TWCS option from table into TWCS itself
enable_optimized_twcs_queries is specific to TWCS, therefore it
belongs to TWCS, not replica::table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13489
2023-04-14 08:28:16 +03:00
Raphael S. Carvalho
fe6df3d270 sstable_loader: Discard SSTable bloom filter on load-and-stream
Load-and-stream reads the entire content from SSTables, therefore it can
afford to discard the bloom filter that might otherwise consume a significant
amount of memory. Bloom filters are only needed by compaction and other
replica::table operations that might want to check the presence of keys
in the SSTable files, like single-partition reads.

It's not uncommon to see Data:Filter ratio of less than 100:1, meaning
that for ~300G of data, filters will take ~3G.

In addition to saving memory footprint, it also reduces operation time
as load-and-stream no longer have to read, parse and build the filters
from disk into memory.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-13 11:34:22 -03:00
Pavel Emelyanov
0d9da46428 table: Add const index manager sgetter
To be used by next patch that will call this helper inside non-mutable
lambda

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-13 16:45:16 +03:00
Kefu Chai
59579d5876 utils: fragment_range: specialize fmt::formatter<FragmentedView>
this is a part of a series to migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print classes fulfill the requirement of `FragmentedView` concept
without the help of template function of `to_hex()`, this function is
dropped in this change, as all its callers are now using fmtlib
for formatting now. the helper of `fragment_to_hex()` is dropped
as well, its only caller is `to_hex()`.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13471
2023-04-11 16:09:38 +03:00
Botond Dénes
f1bbf705f9 Merge 'Cleanup sstables in resharding and other compaction types' from Benny Halevy
This series extends sstable cleanup to resharding and other (offstrategy, major, and regular) compaction types so to:
* cleanup uploaded sstables (#11933)
* cleanup staging sstables after they are moved back to the main directory and become eligible for compaction (#9559)

When perform_cleanup is called, all sstables are scanned, and those that require cleanup are marked as such, and are added for tracking to table_state::cleanup_sstable_set.  They are removed from that set once released by compaction.
Along with that sstables set, we keep the owned_ranges_ptr used by cleanup in the table_state to allow other compaction types (offstrategy, major, or regular) to cleanup those sstables that are marked as require_cleanup and that were skipped by cleanup compaction for either being in the maintenance set (requiring offstrategy compaction) or in staging.

Resharding is using a more straightforward mechanism of passing the owned token ranges when resharding uploaded sstables and using it to detect sstable that require cleanup, now done as piggybacked on resharding compaction.

Closes #12422

* github.com:scylladb/scylladb:
  table: discard_sstables: update_sstable_cleanup_state when deleting sstables
  compaction_manager: compact_sstables: retrieve owned ranges if required
  sstables: add a printer for shared_sstable
  compaction_manager: keep owned_ranges_ptr in compaction_state
  compaction_manager: perform_cleanup: keep sstables in compaction_state::sstables_requiring_cleanup
  compaction: refactor compaction_state out of compaction_manager
  compaction: refactor compaction_fwd.hh out of compaction_descriptor.hh
  compaction_manager: compacting_sstable_registration: keep a ref to the compaction_state
  compaction_manager: refactor get_candidates
  compaction_manager: get_candidates: mark as const
  table, compaction_manager: add requires_cleanup
  sstable_set: add for_each_sstable_until
  distributed_loader: reshard: update sstable cleanup state
  table, compaction_manager: add update_sstable_cleanup_state
  compaction_manager: needs_cleanup: delete unused schema param
  compaction_manager: perform_cleanup: disallow empty sorted_owened_ranges
  distributed_loader: reshard: consider sstables for cleanup
  distributed_loader: process_upload_dir: pass owned_ranges_ptr to reshard
  distributed_loader: reshard: add optional owned_ranges_ptr param
  distributed_loader: reshard: get a ref to table_state
  distributed_loader: reshard: capture creator by ref
  distributed_loader: reshard: reserve num_jobs buckets
  compaction: move owned ranges filtering to base class
  compaction: move owned_ranges into descriptor
2023-04-11 14:52:29 +03:00
Benny Halevy
96660b2ef7 table: discard_sstables: update_sstable_cleanup_state when deleting sstables
We need to remove the deleted sstables from
update_sstable_cleanup_state otherwise their data and index
files will remain opened and their storage space won't be reclaimed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:37:56 +03:00
Benny Halevy
73280c0a15 compaction: refactor compaction_fwd.hh out of compaction_descriptor.hh
So it can be used in the next patch that will refactor
compaction_state out of class compaction_manager.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:19:04 +03:00
Benny Halevy
6ebafe74b9 table, compaction_manager: add requires_cleanup
Returns true iff any of the sstables in the set
requries cleanup.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:14:36 +03:00
Benny Halevy
db7fa9f3be distributed_loader: reshard: update sstable cleanup state
Since the sstables are loaded from foreign open info
we should mark them for cleanup if needed (and owned_ranges_ptr is provided).

This will allow a later patch to enable filtering
for cleanup only for sstable sets containing
sstables that require cleanup.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:11:00 +03:00
Benny Halevy
d0690b64c1 table, compaction_manager: add update_sstable_cleanup_state
update_sstable_cleanup_state calls needs_cleanup and
inserts (or erases) the sstable into the respective
compaction_state.sstables_requiring_cleanup set.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:10:55 +03:00
Benny Halevy
1baca96de1 compaction_manager: needs_cleanup: delete unused schema param
It isn't needed.  The sstable already has a schema.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:03:53 +03:00
Benny Halevy
ecbd112979 distributed_loader: reshard: consider sstables for cleanup
When called from `process_upload_dir` we pass a list
of owned tokens to `reshard`.  When they are available,
run resharding, with implicit cleanup, also on unshared
sstables that need cleanup.

Fixes #11933

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:01:38 +03:00
Benny Halevy
3ccbb28f2a distributed_loader: process_upload_dir: pass owned_ranges_ptr to reshard
To facilitate implicit cleanup of sstables via resharding.

Refs #11933

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 22:59:38 +03:00
Benny Halevy
aa4b18f8fb distributed_loader: reshard: add optional owned_ranges_ptr param
For passing owned_ranges_ptr from
distributed_loader::process_upload_dir.

Refs #11933

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 22:57:41 +03:00
Benny Halevy
f540af930b distributed_loader: reshard: get a ref to table_state
We don't reference the table itself, only as_table_state.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 22:57:11 +03:00