Commit Graph

33038 Commits

Author SHA1 Message Date
Botond Dénes
e82ea2f3ad test/boost/logalloc_test: make test_compaction_with_multiple_regions exception-safe
Said test creates two vectors; the vector storage is allocated with
the default allocator, while the contents are allocated on LSA. If an
exception is thrown, however, both are freed via the default allocator,
triggering an assert in LSA code. Move the cleanup into a `defer()` so
the correct cleanup sequence is executed even on exceptions.
2022-09-16 12:16:57 +03:00
Nadav Har'El
77467bcbcd Merge 'test/pylib: APIs to read and modify configuration from tests' from Kamil Braun
We introduce `server_get_config` to fetch the entire configuration dict
and `update_config` to update a value under the given key.

Closes #11493

* github.com:scylladb/scylladb:
  test/pylib: APIs to read and modify configuration from tests
  test/pylib: ScyllaServer: extract _write_config_file function
  test/pylib: ScyllaCluster: extend ActionReturn with dict data
  test/pylib: ManagerClient: introduce _put_json
  test/pylib: ManagerClient: replace `_request` with `_get`, `_get_text`
  test: pylib: store server configuration in `ScyllaServer`
2022-09-14 18:49:55 +03:00
Kefu Chai
2a74a0086f docs: fix typos
* s/udpates/updates/
* s/opetarional/operational/

Signed-off-by: Kefu Chai <tchaikov@gmail.com>

Closes #11541
2022-09-14 17:04:05 +03:00
Kamil Braun
73bf781e17 test/pylib: APIs to read and modify configuration from tests
We introduce `server_get_config` to fetch the entire configuration dict
and `update_config` to update a value under the given key.
2022-09-14 12:46:41 +02:00
Kamil Braun
1f550428a9 test/pylib: ScyllaServer: extract _write_config_file function
For refreshing the on-disk config file with the config stored in dict
form in the `self.config` field.
2022-09-14 12:46:41 +02:00
Kamil Braun
52e52e8503 test/pylib: ScyllaCluster: extend ActionReturn with dict data
For returning types more complex than text. Also specify a default empty
string value for the `msg` field for non-text return values.
2022-09-14 12:46:41 +02:00
Kamil Braun
c9348ae8ea test/pylib: ManagerClient: introduce _put_json
For sending PUT requests to the Manager (such as updating
configuration).
2022-09-14 12:46:41 +02:00
Kamil Braun
d81c722476 test/pylib: ManagerClient: replace _request with _get, _get_text
`_request` performed a GET request and extracted a text body out of the
response.

Split it into `_get`, which only performs the request, and `_get_text`,
which calls `_get` and extracts the body as text.

Also extract a `_resource_uri` function which will be used for other
request types.
2022-09-14 12:46:41 +02:00
Kamil Braun
9d39e14518 test: pylib: store server configuration in ScyllaServer
In following commits we will make this configuration accessible from
tests through the Manager (for fetching and updating).
2022-09-14 12:46:41 +02:00
Nadav Har'El
cf30432715 Merge 'test: add a topology suite with Raft disabled' from Kamil Braun
Add a suite which is basically equivalent to `topology` except that it
doesn't start servers with Raft enabled.

The suite will be used to test the Raft upgrade procedure.

The suite contains a basic test just to check the suite itself can run;
the test will be removed when 'real' tests are added.

Closes #11487

* github.com:scylladb/scylladb:
  test.py: PythonTestSuite: sum default config params with user-provided ones
  test: add a topology suite with Raft disabled
  test: pylib: use Python dicts to manipulate `ScyllaServer` configuration
  test: pylib: store `config_options` in `ScyllaServer`
2022-09-14 13:37:44 +03:00
Pavel Emelyanov
43131976e9 updateable_value: Update comment about cross-shard copying
refs: #7316

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11538
2022-09-14 12:35:56 +02:00
Michał Chojnowski
9b6fc553b4 db: commitlog: don't print INFO logs on shutdown
The intention was for these logs to be printed during the
database shutdown sequence, but it was overlooked that it's not
the only place where commitlog::shutdown is called.
Commitlogs are started and shut down periodically by hinted handoff.
When that happens, these messages spam the log.

Fix that by adding INFO commitlog shutdown logs to database::stop,
and change the level of the commitlog::shutdown log call to DEBUG.

Fixes #11508

Closes #11536
2022-09-14 11:30:53 +03:00
Avi Kivity
a24a8fd595 Update seastar submodule
* seastar cbb0e888d8...601e0776c0 (1):
  > coroutine: explain and mitigate the lambda coroutine fiasco

Closes #11537
2022-09-13 22:37:29 +03:00
Alejo Sanchez
6799e766ca test.py: topology increment timeouts even more
Due to slow debug machines timing out, bump up all timeouts
significantly.

The cause was ExecutionProfile request_timeout. Also set a high
heartbeat timeout and bump already set timeouts to be safe, too.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #11516
2022-09-13 11:57:31 +02:00
Piotr Dulikowski
e69b44a60f exception: fix the error code used for rate_limit_exception
Per-partition rate limiting added a new error type which should be
returned when Scylla decides to reject an operation due to per-partition
rate limit being exceeded. The new error code requires drivers to
negotiate support for it, otherwise Scylla will report the error as
`Config_error`. The existing error code override logic works properly,
however due to a mistake Scylla will report the `Config_error` code even
if the driver correctly negotiated support for it.

This commit fixes the problem by specifying the correct error code in
`rate_limit_exception`'s constructor.

Tested manually with a modified version of the Rust driver which
negotiates support for the new error. Additionally, tested what happens
when the driver doesn't negotiate support (Scylla properly falls back to
`Config_error`).

Branches: 5.1
Fixes: #11517

Closes #11518
2022-09-13 11:46:15 +02:00
Nadav Har'El
8ece63c433 Merge 'Safemode - Introduce TimeWindowCompactionStrategy Guardrails'
This series introduces two configurable options when working with TWCS tables:

- `restrict_twcs_default_ttl` - a LiveUpdate-able tri_mode_restriction which defaults to WARN and will notify the user whenever a TWCS table is created without a `default_time_to_live` setting
- `twcs_max_window_count` - which forbids the user from creating TWCS tables whose window count (buckets) is past a certain threshold. We default to 50, which should be enough for most use cases; a setting of 0 effectively disables the check.

Refs: #6923
Fixes: #9029

Closes #11445

* github.com:scylladb/scylladb:
  tests: cql_query_test: add mixed tests for verifying TWCS guard rails
  tests: cql_query_test: add test for TWCS window size
  tests: cql_query_test: add test for TWCS tables with no TTL defined
  cql: add configurable restriction of default_time_to_live when for TimeWindowCompactionStrategy tables
  cql: add max window restriction for TimeWindowCompactionStrategy
  time_window_compaction_strategy: reject invalid window_sizes
  cql3 - create/alter_table_statement: Make check_restricted_table_properties accept a schema_ptr
2022-09-12 23:55:51 +03:00
Botond Dénes
045b053228 Update seastar submodule
* seastar 2b2f6c08...cbb0e888 (10):
  > memory: allow user to select allocator to be used at runtime
  > perftune.py: correct typos
  > Merge 'seastar-addr2line: support more flexible syslog-style backtraces' from Benny Halevy
  > Fix instruction count for start_measuring_time
  > build: s/c-ares::c-ares/c-ares::cares/
  > Merge 'shared_ptr_debug_helper: turn assert into on_internal_error_abort' from Benny Halevy
  > test: fix use after free in the loopback socket
  > doc/tutorial.md: fix docker command for starting hello-world_demo
  > httpd: add a ctor without addr parameter
  > dns: dns_resolver: sock_entry: move-construct tcp/udp entries in place

Closes #11526
2022-09-12 18:34:22 +03:00
Avi Kivity
62ac3432c9 Merge "Always notify dropped RPC connections" from Pavel E
"
This set makes messaging service notify connection drop listeners
when connection is dropped for _any_ reason and cleans things up
around it afterwards
"

* 'br-messaging-notify-connection-drop' of https://github.com/xemul/scylla:
  messaging_service: Relax connection drop on re-caching
  messaging_service: Simplify remove_rpc_client_one()
  messaging_service: Notify connection drop when connection is removed
2022-09-12 17:02:51 +03:00
Yaron Kaikov
27e326652b build_docker.sh:fix python2 dependency
Following the revert of b004da9d1b which solved https://github.com/scylladb/scylla-pkg/issues/3094

updating docker dependency to match `scylla-tools-java` requirements

Closes #11522
2022-09-12 13:33:06 +03:00
Kamil Braun
2fe3e67a47 gms: feature_service: don't distinguish between 'known' and 'supported' features
`feature_service` provided two sets of features: `known_feature_set` and
`supported_feature_set`. The purpose of both and the distinction between
them was unclear and undocumented.

The 'supported' features were gossiped by every node. Once a feature is
supported by every node in the cluster, it becomes 'enabled'. This means
that whatever piece of functionality is covered by the feature, it can
be used by the cluster from now on.

The 'known' set was used to perform feature checks on node start; if the
node saw that a feature is enabled in the cluster, but the node does not
'know' the feature, it would refuse to start. However, if the feature
was 'known', but wasn't 'supported', the node would not complain. This
means that we could in theory allow the following scenario:
1. all nodes support feature X.
2. X becomes enabled in the cluster.
3. the user changes the configuration of some node so feature X will
   become unsupported but still known.
4. The node restarts without error.

So now we have a feature X which is enabled in the cluster, but not
every node supports it. That does not make sense.
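
The scenario above can be replayed with a small Python sketch. The function name and the feature sets are made up for illustration and are not the `feature_service` API; the sketch only assumes that the startup check is a subset test of the cluster's enabled features against a local feature set.

```python
def can_start(enabled_in_cluster: set[str], local_features: set[str]) -> bool:
    """Hypothetical startup check: refuse to start if the cluster has
    enabled a feature that is missing from the local feature set."""
    return enabled_in_cluster <= local_features


# Step 3 of the scenario: X was made unsupported through configuration,
# but the node still "knows" it.
known = {"X", "Y"}
supported = {"Y"}
enabled = {"X", "Y"}

# Checking against the 'known' set lets the node start (step 4) ...
starts_with_known_check = can_start(enabled, known)
# ... while checking against the 'supported' set would refuse to start.
starts_with_supported_check = can_start(enabled, supported)
```

With a single set there is only one possible check, so this mismatch cannot arise.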

It is not clear whether it was accidental or purposeful that we used the
'known' set instead of the 'supported' set to perform the feature check.

What I think is clear, is that having two sets makes the entire thing
unnecessarily complicated and hard to think about.

Fortunately, at the base to which this patch is applied, the sets are
always the same. So we can easily get rid of one of them.

I decided that the name which should stay is 'supported', I think it's
more specific than 'known' and it matches the name of the corresponding
gossiper application state.

Closes #11512
2022-09-12 13:09:12 +03:00
Takuya ASADA
cd5320fe60 install.sh: add --without-systemd option
Since we fail to write files to $USER/.config on Jenkins jobs, we need
an option to skip installing systemd units.
Let's add --without-systemd to do that.

Also, to detect the option availability, we need to increment
relocatable package version.

See scylladb/scylla-dtest#2819

Closes #11345
2022-09-12 13:04:00 +03:00
Avi Kivity
521127a253 Update tools/jmx submodule
* tools/jmx 06f2735...88d9bdc (1):
  > install.sh: add --without-systemd option
2022-09-12 13:02:16 +03:00
Kamil Braun
ce7bb8b6d0 test.py: PythonTestSuite: sum default config params with user-provided ones
Previously, if the suite.yaml file provided
`extra_scylla_config_options` but didn't provide values for `authorizer`
or `authenticator` inside the config options, the harness wouldn't give
any defaults for these keys. It would only provide defaults for these
keys if suite.yaml didn't specify `extra_scylla_config_options` at all.

It makes sense to give the user the ability to provide extra options
while relying on harness defaults for `authenticator` and `authorizer`
if the user doesn't care about them.
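
A rough sketch of the difference, assuming the parsed suite.yaml is available as a plain dict. The function names and the default values below are placeholders for illustration, not the actual harness code.

```python
# Placeholder defaults; the real values are defined by the harness.
DEFAULTS = {
    "authenticator": "PasswordAuthenticator",
    "authorizer": "CassandraAuthorizer",
}


def options_before(suite_cfg: dict) -> dict:
    # Old behaviour: defaults apply only if the key is absent altogether.
    return suite_cfg.get("extra_scylla_config_options", DEFAULTS)


def options_after(suite_cfg: dict) -> dict:
    # New behaviour: defaults are summed with the user-provided options,
    # and the user-provided values win on conflict.
    return DEFAULTS | suite_cfg.get("extra_scylla_config_options", {})


suite_cfg = {"extra_scylla_config_options": {"authorizer": "AllowAllAuthorizer"}}
```

Here `options_before(suite_cfg)` has no `authenticator` key at all, while `options_after(suite_cfg)` keeps the default `authenticator` next to the overridden `authorizer`.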
2022-09-12 11:58:05 +02:00
Kamil Braun
1661fe9f37 test: add a topology suite with Raft disabled
Add a suite which is basically equivalent to `topology` except that it
doesn't start servers with Raft enabled.

The suite will be used to test the Raft upgrade procedure.

The suite contains a basic test just to check the suite itself can run;
the test will be removed when 'real' tests are added.
2022-09-12 11:58:05 +02:00
Kamil Braun
311806244d test: pylib: use Python dicts to manipulate ScyllaServer configuration
Previously we used a formattable string to represent the configuration;
values in the string were substituted by Python's formatting mechanism
and the resulting string was stored to obtain the config file.

This approach had some downsides, e.g. it required boilerplate work to
extend: to add a new config option, you would have to modify this
template string.

Instead we can represent the configuration as a Python dictionary. Dicts
are easy to manipulate, for example you can sum two dicts; if a key
appears in both, the second dict 'wins':
```
{1:1} | {1:2} == {1:2}
```

This makes the configuration easy to extend without having to write
boilerplate: if the user of `ScyllaServer` wants to add or override a
config option, they can simply add it to the `config_options` dict and
that's it - no need to modify any internal template strings in
`ScyllaServer` implementation like before. The `config_options` dict is
simply summed with the 'base' config dict of `ScyllaServer`
(`config_options` is the right summand so anything in there overrides
anything in the base dict).
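
A minimal illustration of that summation. `BASE_CONFIG` and `make_config` are hypothetical names with made-up values; they only show that the right operand wins and that new keys need no template changes.

```python
# Hypothetical base configuration; the real one lives in ScyllaServer.
BASE_CONFIG = {
    "cluster_name": "test_cluster",
    "workdir": "/tmp/scylla-1",
}


def make_config(config_options: dict) -> dict:
    # config_options is the right operand, so its values override the base.
    # The | operator builds a new dict and leaves BASE_CONFIG untouched.
    return BASE_CONFIG | config_options


config = make_config({
    "workdir": "/tmp/other",                   # overrides a base value
    "authenticator": "PasswordAuthenticator",  # adds a new option
})
```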

An example of this extensibility is the `authenticator` and `authorizer`
options which no longer appear in `scylla_cluster.py` module after this
change, they only appear in the suite.yaml file.

Also, use "workdir" option instead of specifying data dir, commitlog
dir etc. separately.
2022-09-12 11:57:58 +02:00
Kamil Braun
fd19825eaa test: pylib: store config_options in ScyllaServer
Previously the code extracted `authenticator` and `authorizer` keys from
the config options and stored them.

Store the entire dict instead. The new code is easier to extend if we
want to make more options configurable.
2022-09-12 11:57:18 +02:00
Pavel Emelyanov
5663b3eda5 messaging_service: Relax connection drop on re-caching
When messaging_service::get_rpc_client() picks up cached socket and
notices error on it, it drops the connection and creates a new one. The
method used to drop the connection is the one that re-lookups the verb
index again, which is excessive. Tune this up while at it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-12 12:05:02 +03:00
Botond Dénes
a0392bc1eb Merge 'doc: update the default SStable format' from Anna Stuchlik
The purpose of this PR is to update the information about the default SStable format.
It

Closes #11431

* github.com:scylladb/scylladb:
  doc: simplify the information about default formats in different versions
  doc: update the SSTables 3.0 Statistics File Format to add the UUID host_id option of the ME format
  doc: add the information regarding the ME format to the SSTables 3.0 Data File Format page
  doc: fix additional information regarding the ME format on the SStable 3.x page
  doc: add the ME format to the table
  add a comment to remove the information when the documentation is versioned (in 5.1)
  doc: replace Scylla with ScyllaDB
  doc: fix the formatting and language in the updated section
  doc: fix the default SStable format
2022-09-12 09:50:01 +03:00
Pavel Emelyanov
f3dfc9dbd4 system_keyspace: Don't load preferred IPs if not asked for
If snitch->prefer_local() is false, advertised (via gossiper)
INTERNAL_IPs are not suggested to messaging service to use. The same
should apply to boot-time when messaging service is loaded with those
IPs taken from the system.peers table.

fixes: #11353
tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/2172/

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220909144800.23122-1-xemul@scylladb.com>
2022-09-12 09:48:23 +03:00
Botond Dénes
9db940ff1b Merge "Make network_topology_strategy_test use topology" from Pavel Emelyanov
"
The test in question plays with snitches to simulate the topology
over which tokens are spread. This set replaces explicit snitch
usage with temporary topology object.

Some snitch traces are still left, but those are for token_metadata
internals, which still call the global snitch for DC/RACK.
"

* 'br-tests-use-topology-not-snitch' of https://github.com/xemul/scylla:
  network_topology_strategy_test: Use topology instead of snitch
  network_topology_strategy_test: Populate explicit topology
2022-09-12 09:40:17 +03:00
Avi Kivity
6c797587c7 dirty_memory_manager: region_group: remove sorting of subgroups
dirty_memory_manager tracks lsa regions (memtables) under region_group:s,
in order to be able to pick up the largest memtable as a candidate for
flushing.

Just as region_group:s contain regions, they can also contain other
region_group:s in a nested structure. It also tracks the nested region_group
that contains the largest region in a binomial heap.

This latter facility is no longer used. It saw use when we had the system
dirty_memory_manager nested under the user dirty_memory_manager, but
that proved too complicated so it was undone. We still nest a virtual
region_group under the real region_group, and in fact it is the
virtual region_group that holds the memtables, but it is accessed
directly to find the largest memtable (region_group::get_largest_region)
and so all the mechanism that sorts region_group:s is bypassed.

Start to dismantle this house of cards by removing the subgroup
sorting. Since the hierarchy has exactly one parent and one child,
it's clearly useless. This is seen by the fact that we can just remove
everything related.

We still need the _subgroups member to hold the virtual region_group;
it's replaced by a vector. I verified that the non-intrusive vector
is exception safe since push_back() happens at the very end; in any
case this is early during setup where we aren't under memory pressure.

A few tests that check the removed functionality are deleted.

Closes #11515
2022-09-12 09:29:08 +03:00
Botond Dénes
0e2d6cfd61 Merge 'Introduce Compaction Groups' from Raphael "Raph" Carvalho
A compaction group can be defined as a set of files that can be compacted together. Today, all sstables belonging to a table in a given shard belong to the same group, so there is one group per table per shard. As we eventually want to allow isolation of data that shouldn't be mixed, e.g. data from different vnodes, we want to have more than one group per table per shard. That's why compaction groups are being introduced here.

Today, all memtables and sstables are stored in a single structure per table. After compaction groups, there will be memtables and sstables for each group in the table.

As we're taking an incremental approach, table still supports a single group, but work was done on preparing table to support multiple groups. Completing that work is the next step. Also, a procedure for deriving the group from a token is introduced, but today it always returns the single group owned by the table. Once multiple groups are supported, that procedure should be implemented to map a token to a group.

No semantics was changed by this series.

Closes #11261

* github.com:scylladb/scylladb:
  replica: Move memtables to compaction_group
  replica: move compound SSTable set to compaction group
  replica: move maintenance SSTable set to compaction_group
  replica: move main SSTable set to compaction_group
  replica: Introduce compaction_group
  replica: convert table::stop() into coroutine
  compaction_manager: restore indentation
  compaction_manager: Make remove() and stop_ongoing_compactions() noexcept
  test: sstable_compaction_test: Don't reference main sstable set directly
  test: sstable_utils: Set data size fields for fake SSTable
  test: sstable_compaction_test: remove needless usage of column_family_test::add_sstable
2022-09-12 09:28:44 +03:00
Botond Dénes
5374f0edbf Merge 'Task manager' from Aleksandra Martyniuk
Task manager for observing and managing long-running, asynchronous tasks in Scylla
with the interface for the user. It will allow listing of tasks, getting detailed
task status and progression, waiting for their completion, and aborting them.
The task manager will be configured with a “task ttl” that determines how long
the task status is kept in memory after the task completes.

At first it will support repair and compaction tasks, and possibly more in the future.

Currently:
Sharded `task_manager` is started in `main.cc` where it is further passed
to `http_context` for the purpose of user interface.

Task manager's tasks are implemented in two layers: the abstract
and the implementation one. The latter is a pure virtual class which needs
to be overridden by each module. The abstract layer provides the methods that
are shared by all modules and the access to module-specific methods.

Each module can access task manager, create and manage its tasks through
`task_manager::module` object. This way data specific to a module can be
separated from the other modules.

User can access task manager rest api interface to track asynchronous tasks.
The available options consist of:
- getting a list of modules
- getting a list of basic stats of all tasks in the requested module
- getting the detailed status of the requested task
- aborting the requested task
- waiting for the requested task to finish
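
To make the listed operations and the "task ttl" concrete, here is a toy in-memory model in Python. It is not Scylla's `task_manager` or its REST API: the class, its methods and the injected clock are all invented for illustration, and waiting for completion is left out.

```python
import itertools


class TaskRegistry:
    """Toy stand-in for a task manager; every name here is hypothetical."""

    def __init__(self, task_ttl: float, clock):
        self._ttl = task_ttl        # seconds a finished task stays visible
        self._clock = clock         # callable returning the current time
        self._ids = itertools.count(1)
        self._tasks = {}            # task id -> status dict

    def start(self, module: str) -> int:
        task_id = next(self._ids)
        self._tasks[task_id] = {"module": module, "state": "running", "end": None}
        return task_id

    def finish(self, task_id: int, state: str = "done") -> None:
        self._tasks[task_id].update(state=state, end=self._clock())

    def abort(self, task_id: int) -> None:
        self.finish(task_id, state="aborted")

    def _expire(self) -> None:
        # Forget tasks that finished at least task_ttl seconds ago.
        now = self._clock()
        self._tasks = {
            tid: t for tid, t in self._tasks.items()
            if t["end"] is None or now - t["end"] < self._ttl
        }

    def modules(self) -> list:
        self._expire()
        return sorted({t["module"] for t in self._tasks.values()})

    def list_tasks(self, module: str) -> list:
        self._expire()
        return [tid for tid, t in self._tasks.items() if t["module"] == module]

    def status(self, task_id: int):
        self._expire()
        return self._tasks.get(task_id)


now = [0.0]  # fake clock so the example is deterministic
registry = TaskRegistry(task_ttl=10.0, clock=lambda: now[0])
repair = registry.start("repair")
registry.finish(repair)
```

Advancing the fake clock past the ttl makes `registry.status(repair)` return `None`, which mirrors the status of a completed task being dropped from memory.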

To enable testing of the provided api, test specific task implementation and module
are provided. Their lifetime can be simulated with the standalone test api.
These components are compiled and the tests are run in all but release build modes.

Fixes: #9809

Closes #11216

* github.com:scylladb/scylladb:
  test: task manager api test
  task_manager: test api layer implementation
  task_manager: add test specific classes
  task_manager: test api layer
  task_manager: api layer implementation
  task_manager: api layer
  task_manager: keep task_manager reference in http_context
  start sharded task manager
  task_manager: create task manager object
2022-09-12 09:26:46 +03:00
Felipe Mendes
6a3d8607b4 tests: cql_query_test: add mixed tests for verifying TWCS guard rails
This patch adds a set of 10 scenarios that were uncovered during additional testing.
In particular, most of the scenarios cover ALTER TABLE statements, which - if not handled -
may break the guardrails safe-mode. The situations covered are:

- STCS->TWCS with no TTL defined
- STCS->TWCS with small TTL
- STCS->TWCS with large TTL value
- TWCS table with small to large TTL
- No TTL TWCS to large TTL and then small TTL
- twcs_max_window_count LiveUpdate - Decrease TTL
- twcs_max_window_count LiveUpdate - Switch CompactionStrategy
- No TTL TWCS table to STCS
- Large TTL TWCS table, modify attribute other than compaction and default_time_to_live
- Large TTL STCS table, fail to switch to TWCS with no TTL explicitly defined
2022-09-11 17:57:14 -03:00
Felipe Mendes
a7a91e3216 tests: cql_query_test: add test for TWCS window size
This patch adds a test for checking the validity of tables using TimeWindowCompactionStrategy
with an incorrect number of compaction windows.

The twcs_max_window_count LiveUpdate-able parameter is also disabled during the execution of the
test in order to ensure that users can effectively disable the enforcement, should they want.
2022-09-11 17:38:25 -03:00
Felipe Mendes
1c5d46877e tests: cql_query_test: add test for TWCS tables with no TTL defined
This patch adds a testcase for TimeWindowCompactionStrategy tables created with no
default_time_to_live defined. It makes use of the LiveUpdate-able restrict_twcs_default_ttl
parameter in order to determine whether TWCS tables without TTL should be forbidden or not.

The test replays all 3 possible variations of the tri_mode_restriction and verifies tables
are correctly created/altered according to the current setting on the replica which receives
the request.
2022-09-11 16:55:46 -03:00
Felipe Mendes
7fec4fcaa6 cql: add configurable restriction of default_time_to_live when for TimeWindowCompactionStrategy tables
TimeWindowCompactionStrategy (TWCS) tables are known for being used explicitly for time-series workloads. In particular, most of the time users should specify a default_time_to_live during table creation to ensure data is expired as in a sliding window. Failure to do so may create unbounded windows which, depending on the compaction window chosen, may introduce severe latency and operational problems.

However, there may be some use cases which explicitly ingest data by using the `USING TTL` keyword, which effectively has the same effect. Therefore, we can not simply forbid table creations without a default_time_to_live explicitly set to any value other than 0.

The new restrict_twcs_without_default_ttl option has three values: "true", "false", and "warn":

We default to "warn", which will notify the user of the consequences when creating a TWCS table without a default_time_to_live value set. However, users are encouraged to switch it to "true", as, ideally, a default_time_to_live value should always be set to guard against applications that ingest data while omitting the `USING TTL` keyword.
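
The three modes can be summarized with a short Python sketch. The function and its return values are hypothetical and only model the decision described above; the assumption is that "true" rejects the table, "warn" allows it with a warning, and "false" disables the check.

```python
def check_twcs_default_ttl(mode: str, default_time_to_live: int) -> str:
    """Return 'ok', 'warn' or 'reject' for creating a TWCS table.

    mode is the value of the restriction option ("true", "false", "warn");
    default_time_to_live is in seconds, with 0 meaning "not set".
    """
    if mode not in ("true", "false", "warn"):
        raise ValueError(f"unknown restriction mode: {mode!r}")
    if default_time_to_live != 0 or mode == "false":
        return "ok"
    return "reject" if mode == "true" else "warn"
```

For example, `check_twcs_default_ttl("warn", 0)` yields `'warn'`, while any non-zero TTL yields `'ok'` in every mode.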
2022-09-11 16:50:42 -03:00
Felipe Mendes
a3356e866b cql: add max window restriction for TimeWindowCompactionStrategy
The number of potential compaction windows (or buckets) is defined by the default_time_to_live / sstable_window_size ratio. Every now and then we end up in a situation where users of TWCS underestimate the number of window buckets they will create. Unfortunately, scenarios in which one employs a default_time_to_live setting of 1 year but a window size of 30 minutes are not rare enough.

Such a configuration is known to only harm a workload: as more and more windows are created, the number of SSTables will grow at the same pace, and the situation will only get worse as the number of shards increases.

This commit introduces the twcs_max_window_count option, which defaults to 50 and forbids creating or altering tables which get past this threshold. A value of 0 will explicitly skip this check.

Note: this option does not forbid the creation of tables with a default_time_to_live=0 as - even though not recommended - it is perfectly possible for a TWCS table with default TTL=0 to have a bound window, provided any ingestion statements make use of 'USING TTL' within the CQL statement, in addition to it.
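
As a worked example of the ratio, the 1 year / 30 minutes case above comes to 17520 windows, far beyond the default limit of 50. The Python below is only a sketch of that arithmetic: the helper names are invented, and rounding the ratio up is an assumption rather than something taken from the patch.

```python
import math
from datetime import timedelta


def window_count(default_ttl: timedelta, window_size: timedelta) -> int:
    # default_time_to_live / window size; rounding up is an assumption.
    return math.ceil(default_ttl / window_size)


def exceeds_limit(default_ttl: timedelta, window_size: timedelta,
                  max_window_count: int = 50) -> bool:
    if max_window_count == 0:
        return False  # a value of 0 skips the check
    if default_ttl == timedelta(0):
        return False  # tables without a default TTL are not forbidden here
    return window_count(default_ttl, window_size) > max_window_count


one_year = timedelta(days=365)
half_hour = timedelta(minutes=30)
windows = window_count(one_year, half_hour)  # 365 * 24 * 2
```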
2022-09-11 16:50:22 -03:00
Felipe Mendes
f1ffb501f0 time_window_compaction_strategy: reject invalid window_sizes
Scylla mistakenly allows a user to configure an invalid TWCS window_size <= 0, which effectively breaks the notion of compaction windows.
Interestingly enough, a <= 0 window size should be considered undefined behavior, as either we would create a new window every 0 duration (?) or the table would behave as STCS; the reader is encouraged to figure out which one of these is true. :-)

Cassandra, on the other hand, will properly throw a ConfigurationException when receiving such invalid window sizes, and we now match Cassandra's behavior.

Refs: #2336
2022-09-11 16:40:03 -03:00
Raphael S. Carvalho
f5715d3f0b replica: Move memtables to compaction_group
Now memtables live in compaction_group. Also introduce a function
that selects the group based on the token, but today table always
returns the single group managed by it. Once multiple groups are
supported, the function should interpret the token content to select
the group.
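
In Python-flavoured pseudocode the current state looks roughly like this. The class and method names are invented for illustration and do not mirror the C++ code in `replica`.

```python
class CompactionGroup:
    """Holds the memtables and sstables of one group (illustrative only)."""

    def __init__(self):
        self.memtables = []
        self.sstables = []


class Table:
    def __init__(self):
        # A single group for now; multiple groups are future work.
        self._group = CompactionGroup()

    def compaction_group_for_token(self, token: int) -> CompactionGroup:
        # The token is ignored today. Once multiple groups exist, this is
        # where the token would be mapped to its group.
        return self._group


table = Table()
```

Every token maps to the same object, so callers can already be written against the per-token interface.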

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
f4579795e6 replica: move compound SSTable set to compaction group
The group is now responsible for providing the compound set.
table still has one compound set, which will span all groups for
the cases we want to ignore the group isolation.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
6717d96684 replica: move maintenance SSTable set to compaction_group
This commit is restricted to moving maintenance set into compaction_group.
Next, we'll introduce compound set into it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
ce8e5f354c replica: move main SSTable set to compaction_group
This commit is restricted to moving main set into compaction_group.
Next, we'll move maintenance set into it and finally the memtable.

A method is introduced to figure out which group a sstable belongs
to, but it's still unimplemented as table is still limited to
a single group.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
4871f1c97c replica: Introduce compaction_group
Compaction group is a new abstraction used to group SSTables
that are eligible to be compacted together. By this definition,
a table in a given shard has a single compaction group.
The problem with this approach is that data from different vnodes
is intermixed in the same sstable, making it hard to move data
in a given sstable around.
Therefore, we'll want to have multiple groups per table.

A group can be thought of as an isolated LSM tree whose memtable
and sstable files are isolated from other groups.

As for the implementation, the idea is to take a very incremental
approach.
In this commit, we're introducing a single compaction group to
table.
Next, we'll migrate sstable and maintenance set from table
into that single compaction group. And finally, the memtable.

Cache will be shared among the groups, for simplicity.
It works due to its ability to invalidate a subset of the
token range.

There will be a 1:1 relationship between compaction_group and
table_state.
We can later rename table_state to compaction_group_state.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
a6ecadf3de replica: convert table::stop() into coroutine
await_pending_ops() is today marked noexcept, so doesn't have to
be implemented with finally() semantics.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
44913ebbd0 compaction_manager: restore indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
888660fa44 compaction_manager: Make remove() and stop_ongoing_compactions() noexcept
stop_ongoing_compactions() is made noexcept too as it's called from
remove() and we want to make the latter noexcept, to allow compaction
group to qualify its stop function as noexcept too.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
65414e6756 test: sstable_compaction_test: Don't reference main sstable set directly
Preparatory change for main sstable set to be moved into compaction
group. After that, tests can no longer directly access the main
set.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
dfa7273127 test: sstable_utils: Set data size fields for fake SSTable
So methods that look at data size and require it to be higher than 0
will work on fake SSTables created using set_values().

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
4fa8159a13 test: sstable_compaction_test: remove needless usage of column_family_test::add_sstable
column_family_test::add_sstable will soon be changed to run in a thread,
and it's not needed in this procedure, so let's remove its usage.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00