144 Commits

Author SHA1 Message Date
Botond Dénes
05ef13a627 Merge 'Add support to split large partitions across SSTables' from Raphael "Raph" Carvalho
Introduces support to split large partitions during compaction. Today, compaction can only split input data at partition boundaries, so a large partition is stored in a single file. That can cause many problems, like memory pressure (e.g. https://github.com/scylladb/scylladb/issues/4217), and incremental compaction also cannot fulfill its promise, as the file storing the large partition can only be released once it is exhausted.

The first step was to add clustering range metadata for the first and last partition keys (retrieved from the promoted index), which is crucial to determine disjointness at the clustering level, and also the order in which the disjoint files should be opened for incremental reading.

The second step was to extend sstable_run to look at clustering dimension, so a set of files storing disjoint ranges for the same partition can live in the same sstable run.

The final step was to introduce the option for compaction to split a large partition being written once it exceeds the size threshold.

What's next? Following this series, a reader will be implemented for sstable_run that will incrementally open the readers. It can safely rely on the disjointness invariant established by the second step above.
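
A minimal sketch of the splitting rule and the clustering-level disjointness check described above. It assumes integer clustering keys, rows already sorted by key, and a byte-size threshold; `row`, `fragment`, `split_partition()` and `disjoint()` are hypothetical names for illustration, not ScyllaDB's compaction or sstable_run API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row: its clustering key (an int here) and serialized size in bytes.
struct row {
    int clustering_key;
    std::size_t size;
};

// One output file's slice of a single partition: the clustering range
// [first, last] it covers and the bytes written to it.
struct fragment {
    int first;
    int last;
    std::size_t bytes;
};

// Write rows (sorted by clustering key) and start a new output file whenever
// the current one has reached size_threshold bytes. Each file therefore
// covers a contiguous clustering range that does not overlap its neighbours.
std::vector<fragment> split_partition(const std::vector<row>& rows, std::size_t size_threshold) {
    std::vector<fragment> out;
    for (const row& r : rows) {
        if (out.empty() || out.back().bytes >= size_threshold) {
            out.push_back(fragment{r.clustering_key, r.clustering_key, 0});
        }
        out.back().last = r.clustering_key;
        out.back().bytes += r.size;
    }
    return out;
}

// Disjointness at the clustering level: fragments ordered by their first
// position must not overlap, so they can be opened one at a time, in order.
bool disjoint(const std::vector<fragment>& frags) {
    for (std::size_t i = 1; i < frags.size(); ++i) {
        if (frags[i].first <= frags[i - 1].last) {
            return false;
        }
    }
    return true;
}
```

With ten 10-byte rows and a 30-byte threshold, the partition is split into four files covering [0,2], [3,5], [6,8] and [9,9], which pass the disjointness check.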

Closes #11233

* github.com:scylladb/scylladb:
  test: Add test for large partition splitting on compaction
  compaction: Add support to split large partitions
  sstable: Extend sstable_run to allow disjointness on the clustering level
  sstables: simplify will_introduce_overlapping()
  test: move sstable_run_disjoint_invariant_test into sstable_datafile_test
  test: lib: Fix inefficient merging of mutations in make_sstable_containing()
  sstables: Keep track of first partition's first pos and last partition's last pos
  sstables: Rename min/max position_range to a descriptive name
  sstables_manager: Add sstable metadata reader concurrency semaphore
  sstables: Add ability to find first or last position in a partition
2022-09-15 16:08:56 +03:00
Raphael S. Carvalho
e099a9bf3b sstables_manager: Add sstable metadata reader concurrency semaphore
Let's introduce a reader_concurrency_semaphore for reading sstable
metadata, to avoid an OOM due to unlimited concurrency.
The concurrency on startup is not controlled, so it's important
to enforce a limit on the amount of memory used by the parallel
readers.
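
A simplified, single-threaded model of how such a semaphore bounds memory: each metadata reader must obtain both a concurrency slot and its memory estimate before it runs. `metadata_read_semaphore` and its methods are hypothetical; the real reader_concurrency_semaphore queues waiting readers rather than returning false, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical admission model: a reader is admitted only if a concurrency
// slot and its memory estimate are both available.
class metadata_read_semaphore {
    int _slots;
    std::size_t _memory;
public:
    metadata_read_semaphore(int slots, std::size_t memory) : _slots(slots), _memory(memory) {}

    // Returns false instead of waiting when resources are exhausted.
    bool try_admit(std::size_t memory_needed) {
        if (_slots == 0 || memory_needed > _memory) {
            return false;
        }
        --_slots;
        _memory -= memory_needed;
        return true;
    }

    // Called when a reader finishes, returning its resources.
    void release(std::size_t memory_used) {
        ++_slots;
        _memory += memory_used;
    }

    int available_slots() const { return _slots; }
    std::size_t available_memory() const { return _memory; }
};
```

However many sstables are loaded in parallel at startup, at most `slots` readers and `memory` bytes are in use at any time.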

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-14 13:09:51 -03:00
Michał Chojnowski
9b6fc553b4 db: commitlog: don't print INFO logs on shutdown
The intention was for these logs to be printed during the
database shutdown sequence, but it was overlooked that it's not
the only place where commitlog::shutdown is called.
Commitlogs are started and shut down periodically by hinted handoff.
When that happens, these messages spam the log.

Fix that by adding INFO commitlog shutdown logs to database::stop,
and change the level of the commitlog::shutdown log call to DEBUG.

Fixes #11508

Closes #11536
2022-09-14 11:30:53 +03:00
Benny Halevy
3b0147390b replica: database: get_tombstone_gc_state from compaction_manager
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-06 23:02:54 +03:00
Benny Halevy
5dd15aa3c8 tombstone_gc: introduce tombstone_gc_state
and use it to access the repair history maps.

In this introductory patch, we use a default-constructed
tombstone_gc_state to access the thread-local maps
temporarily; those use sites will be replaced
in following patches that gradually pass
the tombstone_gc_state down from the compaction_manager
to where it's used.
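
A sketch of the transitional pattern described above, assuming a map from table name to last repair time: a default-constructed accessor falls back to a thread-local map, while an explicitly constructed one uses whatever map its owner passes down. Everything except the name `tombstone_gc_state` is hypothetical, and this class is not the real one.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for repair history: table name -> last repair time.
using repair_history_map = std::map<std::string, long>;

// Transitional fallback: the thread-local map that call sites used to
// access directly.
inline repair_history_map& thread_local_repair_history() {
    static thread_local repair_history_map map;
    return map;
}

// Accessor that can later be handed down from the compaction manager.
// A default-constructed instance uses the thread-local map, so call sites
// can switch to it before the real instance is plumbed through.
class tombstone_gc_state {
    repair_history_map* _map;
public:
    tombstone_gc_state() : _map(&thread_local_repair_history()) {}
    explicit tombstone_gc_state(repair_history_map& m) : _map(&m) {}

    void update_repair_time(const std::string& table, long t) { (*_map)[table] = t; }

    // Returns the last repair time, or -1 if the table was never repaired.
    long last_repair_time(const std::string& table) const {
        auto it = _map->find(table);
        return it == _map->end() ? -1 : it->second;
    }
};
```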

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-06 23:02:54 +03:00
Benny Halevy
3d88fe9729 database: do not drop_repair_history_map_for_table in detach_column_family
drop_repair_history_map_for_table is called on each shard
when database::truncate is done, and the table is stopped.

Dropping it before the table is stopped is too early.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-06 22:43:08 +03:00
Botond Dénes
7d17d675af utils/logalloc: move global stat accessors to tracker
These are pretend free functions that access globals in the background.
Make them members of the tracker instead, which has everything needed
locally to compute them. Callers still have to access these stats
through the global tracker instance, but this can be changed to happen
through a local instance. Soon...
2022-08-23 10:38:58 +03:00
Botond Dénes
2b1eb6e284 database: detach_column_family(): use reader_concurrency_semaphore::evict_inactive_reads_for_table()
Instead of querier_cache::evict_all_for_table(). The new method covers
all queriers and, in addition, any other inactive reads registered on the
semaphore. In theory, by the time we detach a table, no regular inactive
reads should be in the semaphore anymore, but if there are still any, we
had better evict them before the table is destroyed, as they might attempt
to access it when they are destroyed later.
2022-08-15 14:16:41 +03:00
Avi Kivity
e9cbc9ee85 Merge 'Add support for empty replica pages' from Botond Dénes
Having many tombstones in a partition is a problem that has plagued queries since the inception of Scylla (and even before that, as they are a pain in Apache Cassandra too). Tombstones don't count towards the query's page limit, neither the size limit nor the row-count limit. Hence, large spans of tombstones (be they row or range tombstones) are problematic: the query can time out while processing such a span, as it waits for more live rows to fill the page. In the extreme case a partition becomes entirely unreadable, with all read attempts timing out, until compaction manages to purge the tombstones.
The solution proposed in this PR is to pass a tombstone limit down to replicas: when this limit is reached, the replica cuts the page and marks it as a short page, even if the page is currently empty. To make this work, we use the last-position infrastructure added recently by 3131cbea62, so that replicas can provide the position of the last processed item for the next page to continue from. Without this, no forward progress could be made in the case of an empty page: the query would continue from the same position on the next page, having to process the same span of tombstones again.
The limit can be configured with the newly added `query_tombstone_limit` configuration item, which defaults to 10000. The coordinator passes this to the newly added `tombstone_limit` field of `read_command` if the `replica_empty_pages` cluster feature is set.
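
A minimal model of the page-cutting rule, assuming items are either live rows or tombstones at integer positions, and that a limit of 0 means unlimited. `read_page()` and the `item`/`page` types are hypothetical. The key detail is the last processed position: even an empty short page reports where to resume, so the next page makes forward progress.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical partition item: a live row or a tombstone at a position.
struct item {
    int position;
    bool is_tombstone;
};

struct page {
    std::vector<int> live_rows;  // positions of live rows returned
    bool is_short;               // cut early because of the tombstone limit
    int last_position;           // last processed position (-1: none)
    std::size_t resume_index;    // index of the first unprocessed item
};

page read_page(const std::vector<item>& items, std::size_t start,
               std::size_t row_limit, std::uint64_t tombstone_limit) {
    page p{{}, false, -1, start};
    std::uint64_t tombstones = 0;
    for (std::size_t i = start; i < items.size(); ++i) {
        const item& it = items[i];
        p.last_position = it.position;
        p.resume_index = i + 1;
        if (it.is_tombstone) {
            // Tombstones do not count towards row_limit, only towards this limit.
            if (tombstone_limit != 0 && ++tombstones >= tombstone_limit) {
                p.is_short = true;  // cut the page, even if it has no live rows
                return p;
            }
        } else {
            p.live_rows.push_back(it.position);
            if (p.live_rows.size() == row_limit) {
                return p;
            }
        }
    }
    return p;
}
```

With five tombstones followed by two live rows and a tombstone limit of 3, the first page comes back empty and short with a last position of 2; the second page resumes at position 3 and returns both live rows.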

Upgrade sanity test was conducted as follows:
* Created a cluster of 3 nodes with RF=3 running the master version.
* Wrote a small dataset of 1000 rows.
* Deleted a prefix of 980 rows.
* Started read workload: `scylla-bench -mode=read -workload=uniform -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -duration=10m -rows-per-request=9000 -page-size=100`
* Also ran some manual queries via `cqlsh` with a smaller page size and tracing on.
* Stopped and upgraded each node one by one. New nodes were started with `--query-tombstone-page-limit=10`.
* Confirmed there are no errors or read-repairs.

Perf regression test:
```
build/release/test/perf/perf_simple_query_g -c1 -m2G --concurrency=1000 --task-quota-ms 10 --duration=60
```
Before:
```
median 133665.96 tps ( 62.0 allocs/op,  12.0 tasks/op,   43007 insns/op,        0 errors)
median absolute deviation: 973.40
maximum: 135511.63
minimum: 104978.74
```
After:
```
median 129984.90 tps ( 62.0 allocs/op,  12.0 tasks/op,   43181 insns/op,        0 errors)
median absolute deviation: 2979.13
maximum: 134538.13
minimum: 114688.07
```
Diff: +174 insns/op (43181 vs. 43007, roughly +200).

Fixes: https://github.com/scylladb/scylla/issues/7689
Fixes: https://github.com/scylladb/scylla/issues/3914
Fixes: https://github.com/scylladb/scylla/issues/7933
Refs: https://github.com/scylladb/scylla/issues/3672

Closes #11053

* github.com:scylladb/scylladb:
  test/cql-pytest: add test for query tombstone page limit
  query-result-writer: stop when tombstone-limit is reached
  service/pager: prepare for empty pages
  service/storage_proxy: set smallest continue pos as query's continue pos
  service/storage_proxy: propagate last position on digest reads
  query: result_merger::get() don't reset last-pos on short-reads and last pages
  query: add tombstone-limit to read-command
  service/storage_proxy: add get_tombstone_limit()
  query: add tombstone_limit type
  db/config: add config item for query tombstone limit
  gms: add cluster feature for empty replica pages
  tree: don't use query::read_command's IDL constructor
2022-08-10 13:38:06 +03:00
Avi Kivity
be44fd63f9 Merge 'Make get_range_addresses async and hold effective_replication_map_ptr around it' from Benny Halevy
This series converts the synchronous `effective_replication_map::get_range_addresses` to async
by calling the replication strategy's async entry point of the same name, as its callers are already async
or can easily be made so.

To allow it to yield and work on a coherent view of the token_metadata / topology / replication_map,
let the callers hold an effective_replication_map per keyspace and pass it down
to the (now asynchronous) functions that use it (making affected storage_service methods static where possible
if they no longer depend on the storage_service instance).

Also, the repeated calls to everywhere_replication_strategy::calculate_natural_endpoints
are optimized in this series by introducing a virtual abstract_replication_strategy::has_static_natural_endpoints predicate
that is true for local_strategy and everywhere_replication_strategy, and is false otherwise.
With it, functions that call calculate_natural_endpoints in a loop, once per token, call it only once, since it returns the same result every time anyway.
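
A sketch of the `has_static_natural_endpoints` short-circuit, using a simplified strategy interface and an instrumented everywhere-style strategy. Apart from `has_static_natural_endpoints` and `calculate_natural_endpoints`, the names are hypothetical, and the signatures are not ScyllaDB's.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using endpoint_set = std::vector<std::string>;

// Hypothetical, heavily simplified strategy interface.
class replication_strategy {
public:
    virtual ~replication_strategy() = default;
    // True when the result does not depend on the token (e.g. local or
    // everywhere replication), so one computation serves every token.
    virtual bool has_static_natural_endpoints() const { return false; }
    virtual endpoint_set calculate_natural_endpoints(long token) const = 0;
};

class everywhere_strategy : public replication_strategy {
    endpoint_set _nodes;
public:
    mutable int calls = 0;  // counts calculations, for the example only
    explicit everywhere_strategy(endpoint_set nodes) : _nodes(std::move(nodes)) {}
    bool has_static_natural_endpoints() const override { return true; }
    endpoint_set calculate_natural_endpoints(long) const override {
        ++calls;
        return _nodes;
    }
};

// Replicas for every token; the calculation runs once when it is static.
std::vector<endpoint_set> replicas_for(const replication_strategy& rs, const std::vector<long>& tokens) {
    std::vector<endpoint_set> result;
    if (rs.has_static_natural_endpoints() && !tokens.empty()) {
        result.assign(tokens.size(), rs.calculate_natural_endpoints(tokens.front()));
        return result;
    }
    for (long t : tokens) {
        result.push_back(rs.calculate_natural_endpoints(t));
    }
    return result;
}
```

For four tokens, the everywhere-style strategy computes its endpoint set once instead of four times.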

Refs #11005

This doesn't fix the issue, as the large allocation remains until we make dht::token_range_vector chunked (chunked_vector cannot be used as-is at the moment, since we also need the ability to push to the front when unwrapping).

Closes #11009

* github.com:scylladb/scylladb:
  effective_replication_map: make get_range_addresses asynchronous
  range_streamer: add_ranges and friends: get erm as param
  storage_service: get_new_source_ranges: get erm as param
  storage_service: get_changed_ranges_for_leaving: get erm as param
  storage_service: get_ranges_for_endpoint: get erm as param
  repair: use get_non_local_strategy_keyspaces_erms
  database: add get_non_local_strategy_keyspaces_erms
  database: add get_non_local_strategy_keyspaces
  storage_service: coroutinize update_pending_ranges
  effective_replication_map: add get_replication_strategy
  effective_replication_map: get_range_addresses: use the precalculated replication_map
  abstract_replication_strategy: get_pending_address_ranges: prevent extra vector copies
  abstract_replication_strategy: reindent
  utils: sequenced_set: expose set and `contains` method
  abstract_replication_strategy: calculate_natural_endpoints: return endpoint_set
  utils: sequenced_set: templatize VectorType
  utils: sanitize sequenced_set
  utils: sequenced_set: delete mutable get_vector method
2022-08-09 13:25:53 +03:00
Botond Dénes
1b669cefed service/storage_proxy: add get_tombstone_limit()
To be used by coordinator side code to determine the correct tombstone
limit to pass to read-command (tombstone limit field added in the next
commit). When this limit is non-zero, the replica will start cutting
pages after the tombstone limit is surpassed.
This getter works similarly to `get_max_result_size()`: if the cluster
feature for empty replica pages is set, it will return the value
configured via db::config::query_tombstone_limit. System queries always
use a limit of 0 (unlimited tombstones).
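
A sketch of the getter's decision logic as described above. The input struct is hypothetical, and the value returned while the cluster feature is not yet enabled is an assumption (0, i.e. unlimited), since the message does not state it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical inputs; per the message, these come from the cluster feature
// for empty replica pages and db::config::query_tombstone_limit.
struct tombstone_limit_inputs {
    bool empty_replica_pages_feature;
    bool is_system_query;
    std::uint64_t configured_limit;
};

// 0 means "unlimited tombstones". Returning 0 when the feature is disabled
// is an assumption of this sketch.
std::uint64_t get_tombstone_limit(const tombstone_limit_inputs& in) {
    if (in.is_system_query || !in.empty_replica_pages_feature) {
        return 0;
    }
    return in.configured_limit;
}
```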
2022-08-09 10:00:40 +03:00
Benny Halevy
db5c5ca59e database: add get_non_local_strategy_keyspaces_erms
To be used for getting a coherent set of all keyspaces
with non-local replication strategy and their respective
effective_replication_map.

As an example, use it in this patch in
storage_service::update_pending_ranges.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:01 +03:00
Benny Halevy
7ee6048255 database: add get_non_local_strategy_keyspaces
For node operations, we currently call get_non_system_keyspaces
but really want to work on all keyspaces that have a non-local
replication strategy, as they are replicated on other nodes.

Reflect that in the replica::database function name.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:01 +03:00
Benny Halevy
c71ef330b2 query-request, everywhere: define and use query_id as a strong type
Define query_id as a tagged_uuid
so it can be differentiated from other uuid-class types.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:13:28 +03:00
Benny Halevy
2b017ce285 schema, everywhere: define and use table_schema_version as a strong type
Define table_schema_version as a distinct tagged_uuid class,
so it can be differentiated from other uuid-class types,
in particular table_id.

Added reversed(table_schema_version) for convenience
and uniformity, since the same logic is currently open-coded
in several places.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:45 +03:00
Benny Halevy
257d74bb34 schema, everywhere: define and use table_id as a strong type
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.
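
A simplified illustration of the strong-type idea, assuming a plain two-word UUID: tagging the same wrapper with different empty tag types yields unrelated types. This `tagged_uuid` is a stand-in written for the example, not the `utils::tagged_uuid` header itself.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Simplified UUID: two 64-bit halves.
struct uuid {
    std::uint64_t msb;
    std::uint64_t lsb;
    bool operator==(const uuid& o) const { return msb == o.msb && lsb == o.lsb; }
};

// UUID wrapper parameterized by an empty tag type. Different tags give
// unrelated types, so a table_id cannot be passed where a
// table_schema_version is expected, although both hold a UUID.
template <typename Tag>
class tagged_uuid {
    uuid _value;
public:
    explicit tagged_uuid(uuid v) : _value(v) {}
    const uuid& get() const { return _value; }
    bool operator==(const tagged_uuid& o) const { return _value == o._value; }
};

using table_id = tagged_uuid<struct table_id_tag>;
using table_schema_version = tagged_uuid<struct table_schema_version_tag>;

static_assert(!std::is_convertible<table_id, table_schema_version>::value,
              "distinct tags must produce distinct, non-convertible types");
static_assert(!std::is_convertible<uuid, table_id>::value,
              "construction from a raw uuid must be explicit");
```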

Fixes #11207

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:41 +03:00
Benny Halevy
37b7a9cce2 utils: get rid of joinpoint
Now that it is no longer used.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
56f336d1aa database: get rid of timestamp_func
Pass an optional truncated_at time_point to
truncate_table_on_all_shards instead of the over-complicated
timestamp_func that returns the same time_point on all shards
anyhow and was only used for coordination across shards.

Since we now synchronize the internal execution phases in
truncate_table_on_all_shards, there is no longer a need
for this timestamp_func.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
b640c4fd17 database: truncate: snapshot table in all-shards layer
With that, the database layer no longer
needs to invoke the private table::snapshot function,
so it can be defriended from class table.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
af0c71aa12 database: truncate: flush table and views in all-shards layer
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
6e07e6b7ac database: truncate: stop and disable compaction in all-shards layer
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
e78dad1dfb database: truncate: move call to set_low_replay_position_mark to all-shards layer
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
a8bd3d97b6 database: truncate: enter per-shard table async_gate in all-shards layer
Start moving the per-shard state establishment logic
to truncate_table_on_all_shards, so that we would eventually
do only the truncate logic per-se in the per-shard truncate function.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
ff028316f2 database: truncate: move check for schema_tables keyspace to all-shards layer.
Now that the per-shard truncate function is called
only from truncate_table_on_all_shards, we can reject the schema_tables
keyspace in the upper layer.  There's no need to check that on each shard.

While at it, reuse `is_system_keyspace`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
fbe1fa1370 database: snapshot_table_on_all_shards: reindent
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
4d4ca40c38 table: add snapshot_on_all_shards
Called from the respective database entry points.

Will also be called from the database drop / truncate path
and will be used for central coordination of per-shard
table::snapshot, so we don't have to depend on the snapshot_manager
mechanism, which is fragile and currently causes an abort if we fail
to allocate it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
be56a73e78 database: add snapshot_table_on_all_shards
We need to snapshot a single table in several paths.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
d96b56fee2 database: rename {flush,snapshot}_on_all and make static
Follow the convention of drop_table_on_all_shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
a1eed1a6e9 database: drop_table_on_all_shards: truncate and stop table in upper layer
Truncate the table on all shards, then stop it on all shards,
in the upper layer rather than in the per-shard drop_column_family()
function, so we can further refactor truncate later, flushing
and taking a snapshot on all shards before truncating.

With that, rename drop_column_family to detach_column_family,
as now it only deregisters the column family from containers
that refer to it (even via its uuid), and its caller
is responsible for taking it from there.
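
A sketch of the resulting call order, with sequential loops standing in for invoking each step on every shard. `shard_table` and `drop_table_on_all_shards_sketch()` are hypothetical; the point is that every shard is detached before any is truncated, and every shard is truncated before any is stopped.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-shard table handle that records what happened to it.
struct shard_table {
    int shard;
    std::vector<std::string>* log;
    void detach() { log->push_back("detach@" + std::to_string(shard)); }
    void truncate() { log->push_back("truncate@" + std::to_string(shard)); }
    void stop() { log->push_back("stop@" + std::to_string(shard)); }
};

// Upper-layer coordination: each phase completes on all shards before the
// next phase starts, instead of running everything inside a per-shard
// drop function.
void drop_table_on_all_shards_sketch(std::vector<shard_table>& shards) {
    for (shard_table& t : shards) t.detach();    // deregister from containers
    for (shard_table& t : shards) t.truncate();  // later: flush and snapshot first
    for (shard_table& t : shards) t.stop();
}
```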

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
92cb7d448b database: drop_table_on_all_shards: get all table shards before drop_column_family on each
So the upper layer can flush, snapshot, and truncate
the table on all shards, step by step.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
0aaaefbb5c database: drop_column_family: define table& cf
To reduce the churn in the following patch
that will pass the table& as a parameter.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
bb1e5ffb8c database: drop_column_family: reuse uuid for evict_all_for_table
cf->schema()->id() is the same one returned
by find_uuid(ks_name, cf_name);

As a follow up, we should define a concrete
table_id type and rename schema::id() to schema::table_id()
to return it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
e800e1e720 database: drop_column_family: move log message up a layer
Print once on the "coordinator" shard.

And promote it to info level, as it's important to log
when we're dropping a table (and whether we're going to take a snapshot).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
ca78a63873 database: truncate: get rid of the unused ks param
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
46e2a7c83b database: add truncate_table_on_all_shards
As a first step to decouple truncate from flush
and snapshot.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
5e8c05f1a8 database: drop_table_on_all_shards: do not accept a truncated_at timestamp_func

In the drop_table case we want to discard ALL
sstables in the table, not only those with `max_data_age()`
up until the drop started.

Fixes #11232

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:52:51 +03:00
Benny Halevy
574909c78f database: truncate: get optional snapshot_name from caller
Before we change drop_table_on_all_shards to always
pass db_clock::time_point::max() in the next patch,
let it pass a unique snapshot name; otherwise
the snapshot name would always be based on the constant max
time_point.

Refs #11232

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:03:19 +03:00
Benny Halevy
474b2fdf37 database: truncate: fix assert about replay_position low_mark
This assert was tweaked several times:
Introduced in 83323e155e,
then fixed in b2b1a1f7e1 to account
for no rp from discard_sstables, then in
9620755c7f to account for
cases where we do not flush the table, then again in
71c5dc82df to make that more accurate.

But the assert wasn't correct in the first place,
in the sense that we first get `low_mark`, which
represents the highest replay_position at the time truncate
was called, but then we call discard_sstables with a time_point
of `truncated_at` that we get from the caller via the timestamp_func,
and that one could be in the past, before truncate was called.
Hence discard_sstables with that timestamp may very well
return a replay_position from older sstables, prior to the flush,
that can be smaller than the low_mark.

Fix this assert to account for that case.

The real fix to this issue is to have a truncate_tombstone
that will carry an authoritative api::timestamp (#11230).

Fixes #11231

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 09:18:06 +03:00
Benny Halevy
e4e92d44ae main: start compaction_manager as a sharded service
And pass a reference to it to the database rather
than having the database construct its own compaction_manager.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 07:50:15 +03:00
Benny Halevy
d55a2ac762 dirty_memory_manager: flush_when_needed: move error handling to flush_one/seal_active_memtable
Currently flush is retried both by dirty_memory_manager::flush_when_needed
and by table::seal_active_memtable, which may be called by other paths
like table::flush.

Unify the retry logic into seal_active_memtable so that
we have similar error handling semantics on all paths.

Refs #4174
Refs #10498

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-27 13:43:17 +03:00
Benny Halevy
863e9d9e6a dirty_memory_manager: flush_when_needed: target error handling at flush_one
Now that everything prior to flush_one is noexcept,
make table::seal_active_memtable and the paths that call it
noexcept, making sure that any errors are returned only
as exceptional futures, and handle them in flush_when_needed().

The original handle_exception had a broader scope than now needed,
so this change is mostly technical, to show that we can narrow down
the error handling to the continuation of flush_one - and verify that
the unit test is not broken.
A later patch moves this error handling logic away to seal_active_memtable.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-27 13:43:17 +03:00
Avi Kivity
fbe8ea7727 logalloc, dirty_memory_manager: move region_group and associated code
region_group is an abstraction that allows accounting for groups of
regions, but the cost/benefit ratio of maintaining the abstraction
is poor. Each time we need to change the decision algorithm of memtable
flushing (admittedly rarely), we need to distill that into an abstraction
for region_groups and then use it. An example is virtual region groups;
we wanted to account for partially flushed memtables and had to
invent region groups to stand in their place.

Rather than continuing to invest in the abstraction, break it now
and move it to the memtable dirty memory manager which is responsible
for making those decisions. The relevant code is moved to
dirty_memory_manager.hh and dirty_memory_manager.cc (new file), and
a new unit test file is added as well.

A downside of the change is that unit testing will be more difficult.
2022-07-26 11:12:10 +03:00
Igor Ribeiro Barbosa Duarte
3b19bcf1a1 memtable_flush: Make memtable_flush_static_shares liveupdateable
This patch makes memtable_flush_static_shares liveupdateable
to avoid having to restart the cluster after updating
this config.

Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
2022-07-19 10:10:46 -03:00
Igor Ribeiro Barbosa Duarte
8dd0f4672d compaction: Make compaction_static_shares liveupdateable
This patch makes compaction_static_shares liveupdateable
to avoid having to restart the cluster after updating
this config.

Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
2022-07-19 10:10:46 -03:00
Igor Ribeiro Barbosa Duarte
c2ee6492e6 backlog_controller: Unify backlog_controller constructors
This patch adds the _static_shares variable to the backlog_controller so that,
instead of having to use a separate constructor when the controller is disabled,
we can use a single constructor and periodically check in the adjust method
whether we should use the static shares or the controller. This will be useful in
the next patches to make compaction_static_shares and memtable_flush_static_shares
live-updateable.
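
A sketch of the single-constructor design, assuming that a positive static value overrides the controller and 0 lets the controller decide. The control law is a placeholder and the class is hypothetical, not the real backlog_controller. Because adjust() reads the live configuration value on every call, an update takes effect at the next adjustment without a restart.

```cpp
#include <cassert>

// Hypothetical simplification: one constructor whether or not static
// shares are configured; adjust() re-reads the live value each time.
class backlog_controller {
    const double* _static_shares;  // points at the live config value
    double _shares = 0;
public:
    explicit backlog_controller(const double* static_shares) : _static_shares(static_shares) {}

    // Assumption: a positive static value overrides the controller output.
    void adjust(double backlog) {
        if (*_static_shares > 0) {
            _shares = *_static_shares;
        } else {
            _shares = control_function(backlog);
        }
    }

    double shares() const { return _shares; }

private:
    // Placeholder control law: shares grow linearly with backlog, capped at 1000.
    static double control_function(double backlog) {
        double s = 100 * backlog;
        return s > 1000 ? 1000.0 : s;
    }
};
```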

Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
2022-07-19 10:06:12 -03:00
Botond Dénes
9afd2dc428 Merge 'Make compaction manager switch to table abstraction ' from Raphael "Raph" Carvalho
This work gets us a step closer to compaction groups.

Everything in the compaction layer except compaction_manager was converted to table_state.

After this work, we can start implementing compaction groups, as each group will be represented by its own table_state. User-triggered operations that span the entire table, not only a group, can be done by calling the manager operation on behalf of each group and then merging the results, if any.
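
A sketch of the table-wide pattern described above: run the operation on each group's state, then merge the results. The per-group "major compaction" here only merges sstable sizes, and `table_state` is a hypothetical stand-in rather than ScyllaDB's table_state interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-group state: each compaction group owns its own
// sstables, represented here only by their sizes in bytes.
struct table_state {
    std::vector<std::size_t> sstable_sizes;
};

struct compaction_result {
    std::size_t input_sstables = 0;
    std::size_t input_bytes = 0;
};

// Per-group operation: merge all of the group's sstables into one
// (ignoring expiry and deduplication for simplicity).
compaction_result major_compact(table_state& ts) {
    compaction_result r;
    std::size_t total = 0;
    for (std::size_t s : ts.sstable_sizes) total += s;
    r.input_sstables = ts.sstable_sizes.size();
    r.input_bytes = total;
    if (!ts.sstable_sizes.empty()) {
        ts.sstable_sizes.assign(1, total);
    }
    return r;
}

// Table-wide, user-triggered operation: call the per-group operation on
// behalf of each group, then merge the per-group results.
compaction_result major_compact_table(std::vector<table_state>& groups) {
    compaction_result merged;
    for (table_state& g : groups) {
        compaction_result r = major_compact(g);
        merged.input_sstables += r.input_sstables;
        merged.input_bytes += r.input_bytes;
    }
    return merged;
}
```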

Closes #11028

* github.com:scylladb/scylla:
  compaction: remove forward declaration of replica::table
  compaction_manager: make add() and remove() switch to table_state
  compaction_manager: make run_custom_job() switch to table_state
  compaction_manager: major: switch to table_state
  compaction_manager: scrub: switch to table_state
  compaction_manager: upgrade: switch to table_state
  compaction: table_state: add get_sstables_manager()
  compaction_manager: cleanup: switch to table_state
  compaction_manager: offstrategy: switch to table_state()
  compaction_manager: rewrite_sstables(): switch to table_state
  compaction_manager: make run_with_compaction_disabled() switch to table_state
  compaction_manager: compaction_reenabler: switch to table_state
  compaction_manager: make submit(T) switch to table_state
  compaction_manager: task: switch to table_state
  compaction: table_state: Add is_auto_compaction_disabled_by_user()
  compaction: table_state: Add on_compaction_completion()
  compaction: table_state: Add make_sstable()
  compaction_manager: make can_proceed switch to table_state
  compaction_manager: make stop compaction procedures switch to table_state
  compaction_manager: make get_compactions() switch to table_state
  compaction_manager: change task::update_history() to use table_state instead
  compaction_manager: make can_register_compaction() switch to table_state
  compaction_manager: make get_candidates() switch to table_state
  compaction_manager: make propagate_replacement() switch to table_state
  compaction: Move table::in_strategy_sstables() and switch to table_state
  compaction: table_state: Add maintenance sstable set
  compaction_manager: make has_table_ongoing_compaction() switch to table_state
  compaction_manager: make compaction_disabled() switch to table_state
  compaction_manager: switch to table_state for mapping of compaction_state
  compaction_manager: move task ctor into source
2022-07-18 15:18:29 +03:00
Benny Halevy
bbbbea65fb database: clear_snapshot: remove dropped table directory when it has no remaining snapshots
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-17 14:33:34 +03:00
Benny Halevy
c70a675d77 database: clear_snapshot: make it a coroutine and use thread
and use an async thread around `directory_lister`
rather than `lister::scan_dir` to simplify the implementation.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-17 14:33:34 +03:00
Benny Halevy
ae3b1b5a64 database_test: drop_table_with_snapshots: test auto_snapshot
Refactor test_drop_table_with_auto_snapshot out of
drop_table_with_snapshots, adding an auto_snapshot param
that controls how the cql_test_env db::config::auto_snapshot is configured,
so we can test both cases: auto_snapshot enabled and disabled.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-17 14:33:34 +03:00
Benny Halevy
2e37dcf62a database: drop_table_on_all_shards: remove table directory having no snapshots
If the table to remove has no snapshots, then
completely remove its directory on storage,
as the left-over directory slows down operations on the keyspace
and makes searching for live tables harder.
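
A sketch of the rule with std::filesystem, assuming the Cassandra-style layout in which snapshots live under `<table_dir>/snapshots/<tag>/`. `remove_table_dir_if_no_snapshots()` is hypothetical; the real code runs asynchronously on Seastar, which this blocking version does not model.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Removes the table directory only when it holds no snapshots.
// Returns true if the directory was removed.
bool remove_table_dir_if_no_snapshots(const fs::path& table_dir) {
    const fs::path snapshots = table_dir / "snapshots";
    if (fs::exists(snapshots) && !fs::is_empty(snapshots)) {
        return false;  // keep the directory: its snapshots must survive the drop
    }
    fs::remove_all(table_dir);
    return true;
}
```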

Fixes #10896

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-17 14:33:34 +03:00