Commit Graph

33025 Commits

Author SHA1 Message Date
Pavel Emelyanov
2c74062962 messaging_service: Fix gossiper verb group
When configuring tcp-nodelay unconditionally, messaging service thinks
gossiper uses group index 1, though it had changed some time ago and now
those verbs belong to group 0.

fixes: #11465

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-14 20:40:47 +03:00
Pavel Emelyanov
7bdad47de2 messaging_service: Mind the absence of topology data when creating sockets
When a socket is created to serve a verb there may be no topology
information regarding the target node. In this case current code
configures socket as if the peer node lived in "default" dc and rack of
the same name. If topology information appears later, the client is not
re-connected, even though it could providing more relevant configuration
(e.g. -- w/o encryption)

This patch checks if the topology info is needed (sometimes it's not)
and if missing it configures the socket in the most restrictive manner,
but notes that the socket ignored the topology on creation. When
topology info appears -- and this happens when a node joins the cluster
-- the messaging service is kicked to drop all sockets that ignored the
topology, so thay they reconnect later.

The mentioned "kick" comes from storage service on-join notification.
More correct fix would be if topology had on-change notification and
messaging service subscribed on it, but there are two cons:

- currently dc/rack do not change on the fly (though they can, e.g. if
  gossiping property file snitch is updated without restart) and
  topology update effectively comes from a single place

- updating topology on token-metadata is not like topology.update()
  call. Instead, a clone of token metadata is created, then update
  happens on the clone, then the clone is committed into t.m. Though
  it's possible to find out commit-time which nodes changed their
  topology, but since it only happens on join this complexity likely
  doesn't worth the effort (yet)

fixes: #11514
fixes: #11492
fixes: #11483

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-14 20:30:51 +03:00
Pavel Emelyanov
5ffc9d66ec messaging_service: Templatize and rename remove_rpc_client_one
It actually finds and removes a client and in its new form it also
applies filtering function it, so some better name is called for

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-14 20:30:07 +03:00
Avi Kivity
a24a8fd595 Update seastar submodule
* seastar cbb0e888d8...601e0776c0 (1):
  > coroutine: explain and mitigate the lambda coroutine fiasco

Closes #11537
2022-09-13 22:37:29 +03:00
Alejo Sanchez
6799e766ca test.py: topology increment timeouts even more
Due to slow debug machines timing out, bump up all timeouts
significantly.

The cause was ExecutionProfile request_timeout. Also set a high
heartbeat timeout and bump already set timeouts to be safe, too.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #11516
2022-09-13 11:57:31 +02:00
Piotr Dulikowski
e69b44a60f exception: fix the error code used for rate_limit_exception
Per-partition rate limiting added a new error type which should be
returned when Scylla decides to reject an operation due to per-partition
rate limit being exceeded. The new error code requires drivers to
negotiate support for it, otherwise Scylla will report the error as
`Config_error`. The existing error code override logic works properly,
however due to a mistake Scylla will report the `Config_error` code even
if the driver correctly negotiated support for it.

This commit fixes the problem by specifying the correct error code in
`rate_limit_exception`'s constructor.

Tested manually with a modified version of the Rust driver which
negotiates support for the new error. Additionally, tested what happens
when the driver doesn't negotiate support (Scylla properly falls back to
`Config_error`).

Branches: 5.1
Fixes: #11517

Closes #11518
2022-09-13 11:46:15 +02:00
Nadav Har'El
8ece63c433 Merge 'Safemode - Introduce TimeWindowCompactionStrategy Guardrails'
This series introduces two configurable options when working with TWCS tables:

- `restrict_twcs_default_ttl` - a LiveUpdate-able tri_mode_restriction which defaults to WARN and will notify the user whenever a TWCS table is created without a `default_time_to_live` setting
- `twcs_max_window_count` - Which forbids the user from creating TWCS tables whose window count (buckets) are past a certain threshold. We default to 50, which should be enough for most use cases, and a setting of 0 effectively disables the check.

Refs: #6923
Fixes: #9029

Closes #11445

* github.com:scylladb/scylladb:
  tests: cql_query_test: add mixed tests for verifying TWCS guard rails
  tests: cql_query_test: add test for TWCS window size
  tests: cql_query_test: add test for TWCS tables with no TTL defined
  cql: add configurable restriction of default_time_to_live when for TimeWindowCompactionStrategy tables
  cql: add max window restriction for TimeWindowCompactionStrategy
  time_window_compaction_strategy: reject invalid window_sizes
  cql3 - create/alter_table_statement: Make check_restricted_table_properties accept a schema_ptr
2022-09-12 23:55:51 +03:00
Botond Dénes
045b053228 Update seastar submodule
* seastar 2b2f6c08...cbb0e888 (10):
  > memory: allow user to select allocator to be used at runtime
  > perftune.py: correct typos
  > Merge 'seastar-addr2line: support more flexible syslog-style backtraces' from Benny Halevy
  > Fix instruction count for start_measuring_time
  > build: s/c-ares::c-ares/c-ares::cares/
  > Merge 'shared_ptr_debug_helper: turn assert into on_internal_error_abort' from Benny Halevy
  > test: fix use after free in the loopback socket
  > doc/tutorial.md: fix docker command for starting hello-world_demo
  > httpd: add a ctor without addr parameter
  > dns: dns_resolver: sock_entry: move-construct tcp/udp entries in place

Closes #11526
2022-09-12 18:34:22 +03:00
Avi Kivity
62ac3432c9 Merge "Always notify dropped RPC connections" from Pavel E
"
This set makes messaging service notify connection drop listeners
when connection is dropped for _any_ reason and cleans things up
around it afterwards
"

* 'br-messaging-notify-connection-drop' of https://github.com/xemul/scylla:
  messaging_service: Relax connection drop on re-caching
  messaging_service: Simplify remove_rpc_client_one()
  messaging_service: Notify connection drop when connection is removed
2022-09-12 17:02:51 +03:00
Yaron Kaikov
27e326652b build_docker.sh:fix python2 dependency
Following the revert of b004da9d1b which solved https://github.com/scylladb/scylla-pkg/issues/3094

updating docker dependency to match `scylla-tools-java` requirements

Closes #11522
2022-09-12 13:33:06 +03:00
Kamil Braun
2fe3e67a47 gms: feature_service: don't distinguish between 'known' and 'supported' features
`feature_service` provided two sets of features: `known_feature_set` and
`supported_feature_set`. The purpose of both and the distinction between
them was unclear and undocumented.

The 'supported' features were gossiped by every node. Once a feature is
supported by every node in the cluster, it becomes 'enabled'. This means
that whatever piece of functionality is covered by the feature, it can
by used by the cluster from now on.

The 'known' set was used to perform feature checks on node start; if the
node saw that a feature is enabled in the cluster, but the node does not
'know' the feature, it would refuse to start. However, if the feature
was 'known', but wasn't 'supported', the node would not complain. This
means that we could in theory allow the following scenario:
1. all nodes support feature X.
2. X becomes enabled in the cluster.
3. the user changes the configuration of some node so feature X will
   become unsupported but still known.
4. The node restarts without error.

So now we have a feature X which is enabled in the cluster, but not
every node supports it. That does not make sense.

It is not clear whether it was accidental or purposeful that we used the
'known' set instead of the 'supported' set to perform the feature check.

What I think is clear, is that having two sets makes the entire thing
unnecessarily complicated and hard to think about.

Fortunately, at the base to which this patch is applied, the sets are
always the same. So we can easily get rid of one of them.

I decided that the name which should stay is 'supported', I think it's
more specific than 'known' and it matches the name of the corresponding
gossiper application state.

Closes #11512
2022-09-12 13:09:12 +03:00
Takuya ASADA
cd5320fe60 install.sh: add --without-systemd option
Since we fail to write files to $USER/.config on Jenkins jobs, we need
an option to skip installing systemd units.
Let's add --without-systemd to do that.

Also, to detect the option availability, we need to increment
relocatable package version.

See scylladb/scylla-dtest#2819

Closes #11345
2022-09-12 13:04:00 +03:00
Avi Kivity
521127a253 Update tools/jmx submodule
* tools/jmx 06f2735...88d9bdc (1):
  > install.sh: add --without-systemd option
2022-09-12 13:02:16 +03:00
Pavel Emelyanov
5663b3eda5 messaging_service: Relax connection drop on re-caching
When messaging_service::get_rpc_client() picks up cached socket and
notices error on it, it drops the connection and creates a new one. The
method used to drop the connection is the one that re-lookups the verb
index again, which is excessive. Tune this up while at it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-12 12:05:02 +03:00
Botond Dénes
a0392bc1eb Merge 'doc: update the default SStable format' from Anna Stuchlik
The purpose of this PR is to update the information about the default SStable format.
It

Closes #11431

* github.com:scylladb/scylladb:
  doc: simplify the information about default formats in different versions
  doc: update the SSTables 3.0 Statistics File Format to add the UUID host_id option of the ME format
  doc: add the information regarding the ME format to the SSTables 3.0 Data File Format page
  doc: fix additional information regarding the ME format on the SStable 3.x page
  doc: add the ME format to the table
  add a comment to remove the information when the documentation is versioned (in 5.1)
  doc: replace Scylla with ScyllaDB
  doc: fix the formatting and language in the updated section
  doc: fix the default SStable format
2022-09-12 09:50:01 +03:00
Pavel Emelyanov
f3dfc9dbd4 system_keyspace: Don't load preferred IPs if not asked for
If snitch->prefer_local() is false, advertised (via gossiper)
INTERNAL_IPs are not suggested to messaging service to use. The same
should apply to boot-time when messaging service is loaded with those
IPs taken from the system.peers table.

fixes: #11353
tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/2172/

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220909144800.23122-1-xemul@scylladb.com>
2022-09-12 09:48:23 +03:00
Botond Dénes
9db940ff1b Merge "Make network_topology_strategy_test use topology" from Pavel Emelyanov
"
The test in question plays with snitches to simulate the topology
over which tokens are spread. This set replaces explicit snitch
usage with temporary topology object.

Some snitch traces are still left, but those are for token_metadata
internal which still call global snitch for DC/RACK.
"

* 'br-tests-use-topology-not-snitch' of https://github.com/xemul/scylla:
  network_topology_strategy_test: Use topology instead of snitch
  network_topology_strategy_test: Populate explicit topology
2022-09-12 09:40:17 +03:00
Avi Kivity
6c797587c7 dirty_memory_manager: region_group: remove sorting of subgroups
dirty_memory_manager tracks lsa regions (memtables) under region_group:s,
in order to be able to pick up the largest memtable as a candidate for
flushing.

Just as region_group:s contain regions, they can also contain other
region_group:s in a nested structure. It also tracks the nested region_group
that contains the largest region in a binomial heap.

This latter facility is no longer used. It saw use when we had the system
dirty_memory_manager nested under the user dirty_memory_manager, but
that proved too complicated so it was undone. We still nest a virtual
region_group under the real region_group, and in fact it is the
virtual region_group that holds the memtables, but it is accessed
directly to find the largest memtable (region_group::get_largest_region)
and so all the mechanism that sorts region_group:s is bypassed.

Start to dismantle this house of cards by removing the subgroup
sorting. Since the hierarchy has exactly one parent and one child,
it's clearly useless. This is seen by the fact that we can just remove
everything related.

We still need the _subgroups member to hold the virtual region_group;
it's replaced by a vector. I verified that the non-intrusive vector
is exception safe since push_back() happens at the very end; in any
case this is early during setup where we aren't under memory pressure.

A few tests that check the removed functionality are deleted.

Closes #11515
2022-09-12 09:29:08 +03:00
Botond Dénes
0e2d6cfd61 Merge 'Introduce Compaction Groups' from Raphael "Raph" Carvalho
Compaction group can be defined as a set of files that can be compacted together. Today, all sstables belonging to a table in a given shard belong to the same group. So we can say there's one group per table per shard. As we want to eventually allow isolation of data that shouldn't be mixed, e.g. data from different vnodes, then we want to have more than one group per table per shard. That's why compaction groups is being introduced here.

Today, all memtables and sstables are stored in a single structure per table. After compaction groups, there will be memtables and sstables for each group in the table.

As we're taking an incremental approach, table still supports a single group. But work was done on preparing table for supporting multiple groups. Completing that work is actually the next step. Also, a procedure for deriving the group from token is introduced, but today it always return the single group owned by the table. Once multiple groups are supported, then that procedure should be implemented to map a token to a group.

No semantics was changed by this series.

Closes #11261

* github.com:scylladb/scylladb:
  replica: Move memtables to compaction_group
  replica: move compound SSTable set to compaction group
  replica: move maintenance SSTable set to compaction_group
  replica: move main SSTable set to compaction_group
  replica: Introduce compaction_group
  replica: convert table::stop() into coroutine
  compaction_manager: restore indentation
  compaction_manager: Make remove() and stop_ongoing_compactions() noexcept
  test: sstable_compaction_test: Don't reference main sstable set directly
  test: sstable_utils: Set data size fields for fake SSTable
  test: sstable_compaction_test: remove needless usage of column_family_test::add_sstable
2022-09-12 09:28:44 +03:00
Botond Dénes
5374f0edbf Merge 'Task manager' from Aleksandra Martyniuk
Task manager for observing and managing long-running, asynchronous tasks in Scylla
with the interface for the user. It will allow listing of tasks, getting detailed
task status and progression, waiting for their completion, and aborting them.
The task manager will be configured with a “task ttl” that determines how long
the task status is kept in memory after the task completes.

At first it will support repair and compaction tasks, and possibly more in the future.

Currently:
Sharded `task_manager` is started in `main.cc` where it is further passed
to `http_context` for the purpose of user interface.

Task manager's tasks are implemented in two two layers: the abstract
and the implementation one. The latter is a pure virtual class which needs
to be overriden by each module. Abstract layer provides the methods that
are shared by all modules and the access to module-specific methods.

Each module can access task manager, create and manage its tasks through
`task_manager::module` object. This way data specific to a module can be
separated from the other modules.

User can access task manager rest api interface to track asynchronous tasks.
The available options consist of:
- getting a list of modules
- getting a list of basic stats of all tasks in the requested module
- getting the detailed status of the requested task
- aborting the requested task
- waiting for the requested task to finish

To enable testing of the provided api, test specific task implementation and module
are provided. Their lifetime can be simulated with the standalone test api.
These components are compiled and the tests are run in all but release build modes.

Fixes: #9809

Closes #11216

* github.com:scylladb/scylladb:
  test: task manager api test
  task_manager: test api layer implementation
  task_manager: add test specific classes
  task_manager: test api layer
  task_manager: api layer implementation
  task_manager: api layer
  task_manager: keep task_manager reference in http_context
  start sharded task manager
  task_manager: create task manager object
2022-09-12 09:26:46 +03:00
Felipe Mendes
6a3d8607b4 tests: cql_query_test: add mixed tests for verifying TWCS guard rails
This patch adds set of 10 cenarios that have been unveiled during additional testing.
In particular, most of the scenarios cover ALTER TABLE statements, which - if not handled -
may break the guardrails safe-mode. The situations covered are:

- STCS->TWCS with no TTL defined
- STCS->TWCS with small TTL
- STCS->TWCS with large TTL value
- TWCS table with small to large TTL
- No TTL TWCS to large TTL and then small TTL
- twcs_max_window_count LiveUpdate - Decrease TTL
- twcs_max_window_count LiveUpdate - Switch CompactionStrategy
- No TTL TWCS table to STCS
- Large TTL TWCS table, modify attribute other than compaction and default_time_to_live
- Large TTL STCS table, fail to switch to TWCS with no TTL explicitly defined
2022-09-11 17:57:14 -03:00
Felipe Mendes
a7a91e3216 tests: cql_query_test: add test for TWCS window size
This patch adds a test for checking the validity of tables using TimeWindowCompactionStrategy
with an incorrect number of compaction windows.

The twcs_max_window_count LiveUpdate-able parameter is also disabled during the execution of the
test in order to ensure that users can effectively disable the enforcement, should they want.
2022-09-11 17:38:25 -03:00
Felipe Mendes
1c5d46877e tests: cql_query_test: add test for TWCS tables with no TTL defined
This patch adds a testcase for TimeWindowCompactionStrategy tables created with no
default_time_to_live defined. It makes use of the LiveUpdate-able restrict_twcs_default_ttl
parameter in order to determine whether TWCS tables without TTL should be forbidden or not.

The test replays all 3 possible variations of the tri_mode_restriction and verifies tables
are correctly created/altered according to the current setting on the replica which receives
the request.
2022-09-11 16:55:46 -03:00
Felipe Mendes
7fec4fcaa6 cql: add configurable restriction of default_time_to_live when for TimeWindowCompactionStrategy tables
TimeWindowCompactionStrategy (TWCS) tables are known for being used explicitly for time-series workloads. In particular, most of the time users should specify a default_time_to_live during table creation to ensure data is expired such as in a sliding window. Failure to do so may create unbounded windows - which - depending on the compaction window chosen, may introduce severe latency and operational problems, due to unbounded window growth.

However, there may be some use cases which explicitly ingest data by using the `USING TTL` keyword, which effectively has the same effect. Therefore, we can not simply forbid table creations without a default_time_to_live explicitly set to any value other than 0.

The new restrict_twcs_without_default_ttl option has three values: "true", "false", and "warn":

We default to "warn", which will notify the user of the consequences when creating a TWCS table without a default_time_to_live value set. However, users are encouraged to switch it to "true", as - ideally - a default_time_to_live value should always be expected to prevent applications failing to ingest data against the database ommitting the `USING TTL` keyword.
2022-09-11 16:50:42 -03:00
Felipe Mendes
a3356e866b cql: add max window restriction for TimeWindowCompactionStrategy
The number of potential compaction windows (or buckets) is defined by the default_time_to_live / sstable_window_size ratio. Every now and then we end up in a situation on where users of TWCS end up underestimating their window buckets when using TWCS. Unfortunately, scenarios on which one employs a default_time_to_live setting of 1 year but a window size of 30 minutes are not rare enough.

Such configuration is known to only make harm to a workload: As more and more windows are created, the number of SSTables will grow in the same pace, and the situation will only get worse as the number of shards increase.

This commit introduces the twcs_max_window_count option, which defaults to 50, and will forbid the Creation or Alter of tables which get past this threshold. A value of 0 will explicitly skip this check.

Note: this option does not forbid the creation of tables with a default_time_to_live=0 as - even though not recommended - it is perfectly possible for a TWCS table with default TTL=0 to have a bound window, provided any ingestion statements make use of 'USING TTL' within the CQL statement, in addition to it.
2022-09-11 16:50:22 -03:00
Felipe Mendes
f1ffb501f0 time_window_compaction_strategy: reject invalid window_sizes
Scylla mistakenly allows an user to configure an invalid TWCS window_size <= 0, which effectively breaks the notion of compaction windows.
Interestingly enough, a <= 0 window size should be considered an undefined behavior as either we would create a new window every 0 duration (?) or the table would behave as STCS, the reader is encouraged to figure out which one of these is true. :-)

Cassandra, on the other hand, will properly throw a ConfigurationException when receiving such invalid window sizes and we now match the behavior to the same as Cassandra's.

Refs: #2336
2022-09-11 16:40:03 -03:00
Raphael S. Carvalho
f5715d3f0b replica: Move memtables to compaction_group
Now memtables live in compaction_group. Also introduced function
that selects group based on token, but today table always return
the single group managed by it. Once multiple groups are supported,
then the function should interpret token content to select the
group.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
f4579795e6 replica: move compound SSTable set to compaction group
The group is now responsible for providing the compound set.
table still has one compound set, which will span all groups for
the cases we want to ignore the group isolation.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
6717d96684 replica: move maintenance SSTable set to compaction_group
This commit is restricted to moving maintenance set into compaction_group.
Next, we'll introduce compound set into it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
ce8e5f354c replica: move main SSTable set to compaction_group
This commit is restricted to moving main set into compaction_group.
Next, we'll move maintenance set into it and finally the memtable.

A method is introduced to figure out which group a sstable belongs
to, but it's still unimplemented as table is still limited to
a single group.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
4871f1c97c replica: Introduce compaction_group
Compaction group is a new abstraction used to group SSTables
that are eligible to be compacted together. By this definition,
a table in a given shard has a single compaction group.
The problem with this approach is that data from different vnodes
is intermixed in the same sstable, making it hard to move data
in a given sstable around.
Therefore, we'll want to have multiple groups per table.

A group can be thought of an isolated LSM tree where its memtable
and sstable files are isolated from other groups.

As for the implementation, the idea is to take a very incremental
approach.
In this commit, we're introducing a single compaction group to
table.
Next, we'll migrate sstable and maintenance set from table
into that single compaction group. And finally, the memtable.

Cache will be shared among the groups, for simplicity.
It works due to its ability to invalidate a subset of the
token range.

There will be 1:1 relationship between compaction_group and
table_state.
We can later rename table_state to compaction_group_state.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
a6ecadf3de replica: convert table::stop() into coroutine
await_pending_ops() is today marked noexcept, so doesn't have to
be implemented with finally() semantics.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
44913ebbd0 compaction_manager: restore indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
888660fa44 compaction_manager: Make remove() and stop_ongoing_compactions() noexcept
stop_ongoing_compactions() is made noexcept too as it's called from
remove() and we want to make the latter noexcept, to allow compaction
group to qualify its stop function as noexcept too.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
65414e6756 test: sstable_compaction_test: Don't reference main sstable set directly
Preparatory change for main sstable set to be moved into compaction
group. After that, tests can no longer direct access the main
set.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
dfa7273127 test: sstable_utils: Set data size fields for fake SSTable
So methods that look at data size and require it to be higher than 0
will work on fake SSTables created using set_values().

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Raphael S. Carvalho
4fa8159a13 test: sstable_compaction_test: remove needless usage of column_family_test::add_sstable
column_family_test::add_sstable will soon be changed to run in a thread,
and it's not needed in this procedure, so let's remove its usage.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-11 14:26:59 -03:00
Jadw1
ba461aca8b cql-pytest: more neutral command in cql_test_connection fixture
I found 'use system` to not be neutral enough (e.g. in case of testing
describe statement). `BEGIN BATCH APPLY BATCH` sounds better.

Closes #11504
2022-09-11 18:49:06 +03:00
Nadav Har'El
d71098a3b8 Update tools/java submodule
* tools/java b7a0c5bd31...b004da9d1b (1):
  > Revert "dist/debian:add python3 as dependency"
2022-09-11 17:45:43 +03:00
Pavel Emelyanov
bbad3eac63 pylib: Cast port number config to int explicitly
Otherwise it crashes some python versions.

The cast was there before a2dd64f68f
explicitly dropped one while moving the code between files.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11511
2022-09-09 18:08:08 +02:00
Kamil Braun
be1ef9d2a7 gms: feature_service: remove the USES_RAFT feature
It was not and won't be used for anything.
Note that the feature was always disabled or masked so no node ever
announced it, thus it's safe to get rid of.

Closes #11505
2022-09-09 18:05:46 +02:00
Michał Chojnowski
47844689d8 token_metadata: make local_dc_filter a lambda, not a std::function
This std::function causes allocations, both on construction
and in other operations. This costs ~2200 instructions
for a DC-local query. Fix that.

Closes #11494
2022-09-09 18:05:46 +02:00
Michał Chojnowski
af7ace3926 utils: config_file: fix handling of workdir,W in the YAML file
Option names given in db/config.cc are handled for the command line by passing
them to boost::program_options, and by YAML by comparing them with YAML
keys.
boost::program_options has logic for understanding the
long_name,short_name syntax, so for a "workdir,W" option both --workdir and -W
worked, as intended. But our YAML config parsing doesn't have this logic
and expected "workdir,W" verbatim, which is obviously not intended. Fix that.

Fixes #7478
Fixes #9500
Fixes #11503

Closes #11506
2022-09-09 18:05:46 +02:00
Kamil Braun
dba595d347 Merge 'Minimal implementation of Broadcast Tables' from Mikołaj Grzebieluch
Broadcast tables are tables for which all statements are strongly
consistent (linearizable), replicated to every node in the cluster and
available as long as a majority of the cluster is available. If a user
wants to store a “small” volume of metadata that is not modified “too
often” but provides high resiliency against failures and strong
consistency of operations, they can use broadcast tables.

The main goal of the broadcast tables project is to solve problems which
need to be solved when we eventually implement general-purpose strongly
consistent tables: designing the data structure for the Raft command,
ensuring that the commands are idempotent, handling snapshots correctly,
and so on.

In this MVP (Minimum Viable Product), statements are limited to simple
SELECT and UPDATE operations on the built-in table. In the future, other
statements and data types will be available but with this PR we can
already work on features like idempotent commands or snapshotting.
Snapshotting is not handled yet which means that restarting a node or
performing too many operations (which would cause a snapshot to be
created) will give incorrect results.

In a follow-up, we plan to add end-to-end Jepsen tests
(https://jepsen.io/). With this PR we can already simulate operations on
lists and test linearizability in linear complexity. This can also test
Scylla's implementation of persistent storage, failure detector, RPC,
etc.

Design doc: https://docs.google.com/document/d/1m1IW320hXtsGulzSTSHXkfcBKaG5UlsxOpm6LN7vWOc/edit?usp=sharing

Closes #11164

* github.com:scylladb/scylladb:
  raft: broadcast_tables: add broadcast_kv_store test
  raft: broadcast_tables: add returning query result
  raft: broadcast_tables: add execution of intermediate language
  raft: broadcast_tables: add compilation of cql to intermediate language
  raft: broadcast_tables: add definition of intermediate language
  db: system_keyspace: add broadcast_kv_store table
  db: config: add BROADCAST_TABLES feature flag
2022-09-09 18:05:37 +02:00
Aleksandra Martyniuk
55cd8fe3bf test: task manager api test
Test of a task manager api.
2022-09-09 14:29:28 +02:00
Aleksandra Martyniuk
ec86410094 task_manager: test api layer implementation
The implementation of a test api that helps testing task manager
api. It provides methods to simulate the operations that can happen
on modules and theirs task. Through the api user can: register
and unregister the test module and the tasks belonging to the module,
and finish the tasks with success or custom error.
2022-09-09 14:29:28 +02:00
Aleksandra Martyniuk
b1fa6e49af task_manager: add test specific classes
Add test_module and test_task classes inheriting from respectively
task_manager::module and task_manager::task::impl that serve
task manager testing.
2022-09-09 14:29:28 +02:00
Aleksandra Martyniuk
42f36db55b task_manager: test api layer
The test api that helps testing task manager api. It can be used
to simulate the operations that can happen on modules and theirs
task. Through the api user can: register and unregister the test
module and the tasks belonging to the module, and finish the tasks
with success or custom error.
2022-09-09 14:29:28 +02:00
Aleksandra Martyniuk
c9637705a6 task_manager: api layer implementation
The implementation of a task manager api layer. It provides
methods to list the modules registered in task_manager, list
tasks belonging to the given module, abort, wait for or retrieve
a status of the given task.
2022-09-09 14:29:28 +02:00
Aleksandra Martyniuk
07043cee68 task_manager: api layer
The task manager api layer. It can be used to list the modules
registered in task_manager, list tasks belonging to the given
module, abort, wait for or retrieve a status of the given task.
2022-09-09 14:29:28 +02:00