Commit Graph

2024 Commits

Author SHA1 Message Date
Tomasz Grabiec
16eb4c6ce2 Merge "raft: system table backed persistency module" from Pavel Solodovnikov
This series contains an initial implementation of the raft persistency module
that uses the `raft` system table as the underlying storage model.

The "system.raft" table will be used as the backend storage for
implementing the raft persistence module in Scylla. It combines the
raft log, the persisted vote and term, and snapshot info.

The table is partitioned by group id, thus allowing multi-raft
operation. The rest of the table structure mirrors the fields of
corresponding core raft structures defined in `raft.hh`, such as
`raft::log_entry`.

The raft table stores only the latest snapshot id, while
the actual snapshot is available in a separate table
called `system.raft_snapshots`. The schema of `raft_snapshots`
mirrors the fields of the `raft::snapshot` structure.

IDL definitions are also added for every raft struct, so that we
automatically provide the serialization and deserialization facilities
needed both for the persistency module and for the future RPC implementation.

The first patch is a side change needed to provide complete
serialization/deserialization for `bytes_ostream`, which we
need when persisting the raft log in the table (since `data`
is a variant containing, among others, `raft::command`
(aka `bytes_ostream`)).
`bytes_ostream` was lacking a `deserialize` function, which is
added in the patch.

The second patch provides serializer for `lw_shared_ptr<T>`
which will be used for `raft::append_entries`, which has
a field with `std::vector<const lw_shared_ptr<raft::log_entry>>`
type.
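
The delegation idea can be sketched with a stand-in for the Scylla IDL serializer (using `std::shared_ptr` in place of `seastar::lw_shared_ptr` and a raw byte vector in place of the real streams; names and layout are illustrative, not the actual serializer code):

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <vector>

// Stand-in base serializer for trivially copyable types.
template <typename T>
struct serializer {
    static void write(std::vector<char>& out, const T& v) {
        const char* p = reinterpret_cast<const char*>(&v);
        out.insert(out.end(), p, p + sizeof(T));
    }
    static T read(const std::vector<char>& in, size_t& pos) {
        T v;
        std::memcpy(&v, in.data() + pos, sizeof(T));
        pos += sizeof(T);
        return v;
    }
};

// A smart-pointer specialization simply delegates to the pointee's
// serializer, so the wire format is that of T itself.
template <typename T>
struct serializer<std::shared_ptr<T>> {
    static void write(std::vector<char>& out, const std::shared_ptr<T>& p) {
        serializer<T>::write(out, *p);   // delegate to the pointee
    }
    static std::shared_ptr<T> read(const std::vector<char>& in, size_t& pos) {
        return std::make_shared<T>(serializer<T>::read(in, pos));
    }
};
```

The point is that the pointer wrapper is transparent on the wire: a vector of shared pointers to log entries serializes exactly like a vector of log entries.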

There is also a patch to extend `fragmented_temporary_buffer`
with a static function, `allocate_to_fit`, that allocates an
instance of the fragmented buffer of a specified size.
Individual fragment size is limited to 128 KB.
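
The fragment-sizing logic can be sketched in a self-contained way (this is not the actual Seastar `fragmented_temporary_buffer` code; the container and names are illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative stand-in for allocate_to_fit: split a requested size
// into fragments of at most 128 KB each.
constexpr size_t max_fragment_size = 128 * 1024;

std::vector<std::vector<char>> allocate_to_fit(size_t size_bytes) {
    std::vector<std::vector<char>> fragments;
    while (size_bytes > 0) {
        size_t frag = std::min(size_bytes, max_fragment_size);
        fragments.emplace_back(frag);   // allocate one fragment of `frag` bytes
        size_bytes -= frag;
    }
    return fragments;
}
```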

The patch-set also contains the test suite covering basic
functionality of the persistency module.

* manmanson/raft-api-impl-v11:
  raft/sys_table_storage: add basic tests for raft_sys_table_storage
  raft: introduce `raft_sys_table_storage` class
  utils: add `fragmented_temporary_buffer::allocate_to_fit`
  raft: add IDL definitions for raft types
  raft: create `system.raft` and `system.raft_snapshots` tables
  serializer: add `serializer<lw_shared_ptr<T>>` specialization
  serializer: add `deserialize` function overload for `bytes_ostream`
2021-01-29 11:40:39 +02:00
Pavel Solodovnikov
aebb1987b5 raft: introduce raft_sys_table_storage class
This is the implementation of the raft persistency module that
uses the `raft` system table as the underlying storage model.

The instance is supposed to be bound to a single raft group.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-01-29 02:00:12 +03:00
Pavel Solodovnikov
10b117aada raft: create dummy impl for schema changes state machine
This patch introduces the `schema_raft_state_machine` class,
which is currently just a dummy implementation throwing a
"not implemented" exception for every call.

Will be needed later to construct an instance of `raft::server`.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20210126193413.1520948-1-pa.solodovnikov@scylladb.com>
2021-01-27 12:33:27 +01:00
Asias He
c82250e0cf gossip: Allow deferring advertise of local node to be up
Currently the replacing node is in STATUS_UNKNOWN when it
starts the gossip service for the first time, before it sets the status to
HIBERNATE to start the replacing operation. This introduces the
following race:

1) Replacing node using the same IP address of the node to be replaced
starts gossip service without setting the gossip STATUS (will be seen as
STATUS_UNKNOWN by other nodes)

2) Replacing node waits for gossip to settle and learns status and
tokens of existing nodes

3) Replacing node announces the HIBERNATE STATUS.

After Step 1 and before Step 3, existing nodes will mark the replacing
node as UP, but haven't marked the replacing node as doing replacing
yet. As a result, the replacing node will not be excluded from the read
replicas and will be considered a target node to serve CQL reads.

To fix, we make the replacing node avoid responding to echo messages when it
is not ready.
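
The shape of the fix can be sketched as a tiny state machine (the names here are hypothetical, not the actual Scylla gossip API):

```cpp
#include <cassert>

// Sketch: peers mark a node UP when it answers their ECHO message, so
// the gossiper answers ECHO only once the node has advertised its real
// status (e.g. HIBERNATE for a replacing node).
struct gossiper {
    bool advertising = false;   // set once the real status is announced

    void advertise_to_nodes() { advertising = true; }

    // Returns true iff we respond to the echo; no response means peers
    // keep the node in a not-UP state and won't route reads to it.
    bool handle_echo() const { return advertising; }
};
```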

Fixes #7312

Closes #7714
2021-01-26 19:02:11 +01:00
Gleb Natapov
020da49c89 storage_proxy: remove no longer needed range_slice_read_executor
After support for the mixed-cluster compatibility feature
DIGEST_MULTIPARTITION_READ was dropped in 854a44ff9b,
range_slice_read_executor and never_speculating_read_executor became
identical, so remove the former for good.

Message-Id: <20210124122731.GA1122499@scylladb.com>
2021-01-24 14:45:22 +02:00
Benny Halevy
088f92e574 paxos_state: learn: fix injected error description
It was copy-pasted from another injection point.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201220091439.3604201-1-bhalevy@scylladb.com>
2021-01-24 11:51:23 +02:00
Avi Kivity
586f16bf79 Merge "Cut snitch -> storage service dependency" from Pavel E
"
Currently storage service and snitch implicitly depend on each
other. Storage service gossips snitch data on start, snitch
kicks the storage service when its configuration changes.

This interdependency is relaxed:

- snitch gossips all its state itself without using the
  storage service as a mediator
- storage service listens for snitch updates with the help
  of self-breaking subscription

Both changes make the snitch independent from the storage service,
remove yet another call to the global storage service from the
codebase, and make the storage service -> snitch reference
robust against dangling pointers/references.

tests: unit(dev), dtest.rebuild.TestRebuild.simple_rebuild(dev)
"

* 'br-snitch-gossip-2' of https://github.com/xemul/scylla:
  storage-service: Subscribe to snitch to update topology
  snitch: Introduce reconfiguration signal
  snitch: Always gossip snitch info itself
  snitch: Do gossip DC and RACK itself
  snitch: Add generic gossiping helper
2021-01-20 10:23:43 +02:00
Avi Kivity
df3ef800c2 Merge 'Introduce load and stream feature' from Asias He
storage_service: Introduce load_and_stream

=== Introduction ===

This feature extends the nodetool refresh to allow loading arbitrary sstables
that do not belong to a node into the cluster. It loads the sstables from disk
and calculates the owning nodes of the data and streams to the owners
automatically.

For example, say the old cluster has 6 nodes and the new cluster has 3 nodes.
We can copy the sstables from the old cluster to any of the new nodes and
trigger the load and stream process.

This can make restores and migrations much easier.

=== Performance ===

I managed to get 40MB/s per shard on my build machine.
CPU: AMD Ryzen 7 1800X Eight-Core Processor
DISK: Samsung SSD 970 PRO 512GB

Assuming 1 TB of sstables per node, each shard doing 40 MB/s, and 32 shards
per node, we can load and stream 1 TB of data in about 13 minutes on each
node:

1 TB / (40 MB/s per shard * 32 shards) ≈ 781 s ≈ 13 min
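
The back-of-the-envelope arithmetic can be checked directly (taking 1 TB as 10^6 MB):

```cpp
#include <cassert>

// 1 TB per node, 40 MB/s per shard, 32 shards per node.
constexpr double total_mb = 1e6;
constexpr double mb_per_s = 40.0 * 32;            // 1280 MB/s per node
constexpr double seconds  = total_mb / mb_per_s;  // ~781 s
constexpr double minutes  = seconds / 60.0;       // ~13 minutes
```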

=== Tests ===

backup_restore_tests.py:TestBackupRestore.load_and_stream_to_new_cluster_test
which creates a cluster with 4 nodes and inserts data, then uses
load_and_stream to restore to a 2-node cluster.

=== Usage ===

curl -X POST "http://{ip}:10000/storage_service/sstables/{keyspace}?cf={table}&load_and_stream=true"

=== Notes ===

Btw, with the old nodetool refresh, the node will not pick up data
that does not belong to it, but it will not delete it either. One
has to run nodetool cleanup to remove that data manually, which is a
surprise to me and probably to users as well. With load and stream, the
process deletes the sstables once it finishes streaming, so no nodetool
cleanup is needed.

The name of this feature, load and stream, follows load and store in the CPU world.

Fixes #7831

Closes #7846

* github.com:scylladb/scylla:
  storage_service: Introduce load_and_stream
  distributed_loader: Add get_sstables_from_upload_dir
  table: Add make_streaming_reader for given sstables set
2021-01-18 15:08:19 +02:00
Asias He
4d32d03172 storage_service: Introduce load_and_stream
=== Introduction ===

This feature extends the nodetool refresh to allow loading arbitrary sstables
that do not belong to a node into the cluster. It loads the sstables from disk
and calculates the owning nodes of the data and streams to the owners
automatically.

For example, say the old cluster has 6 nodes and the new cluster has 3 nodes.
We can copy the sstables from the old cluster to any of the new nodes and
trigger the load and stream process.

This can make restores and migrations much easier.

=== Performance ===

I managed to get 40MB/s per shard on my build machine.
CPU: AMD Ryzen 7 1800X Eight-Core Processor
DISK: Samsung SSD 970 PRO 512GB

Assuming 1 TB of sstables per node, each shard doing 40 MB/s, and 32 shards
per node, we can load and stream 1 TB of data in about 13 minutes on each
node:

1 TB / (40 MB/s per shard * 32 shards) ≈ 781 s ≈ 13 min

=== Tests ===

backup_restore_tests.py:TestBackupRestore.load_and_stream_to_new_cluster_test
which creates a cluster with 4 nodes and inserts data, then uses
load_and_stream to restore to a 2-node cluster.

=== Usage ===

curl -X POST "http://{ip}:10000/storage_service/sstables/{keyspace}?cf={table}&load_and_stream=true"

=== Notes ===

Btw, with the old nodetool refresh, the node will not pick up data
that does not belong to it, but it will not delete it either. One
has to run nodetool cleanup to remove that data manually, which is a
surprise to me and probably to users as well. With load and stream, the
process deletes the sstables once it finishes streaming, so no nodetool
cleanup is needed.

The name of this feature, load and stream, follows load and store in the CPU world.

Fixes #7831
2021-01-18 16:32:33 +08:00
Piotr Sarna
6ae94d31c1 treewide: remove shared pointer usage from the pager
The pager interface doesn't really need to be virtual,
so the next step could be to remove the need for pointers
entirely, but migrating from shared_ptr to unique_ptr
is low-hanging fruit.

Message-Id: <a5bdecb17ae58e914da020fb58a41f4574565c66.1610709560.git.sarna@scylladb.com>
2021-01-15 15:03:14 +02:00
Pavel Emelyanov
2b31be0daa client-state,cdc: Remove call for storage_service from permissions check
The client_state::check_access() calls the global storage service
to get the features from it and check whether the CDC feature is on.
The latter is needed to perform CDC-specific checks.

However, it was noticed that the feature check is excessive:
all the guarded if-s resolve to false when CDC is off, so
check_access effectively behaves the same without the
feature check.

With that observation, it's possible to ditch one more global storage
service reference.

tests: unit(dev), dtest(dev, auth)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210105063651.7081-1-xemul@scylladb.com>
2021-01-14 12:52:24 +02:00
Pavel Emelyanov
d3ee8774ad storage-service: Subscribe to snitch to update topology
Currently the snitch explicitly calls the storage service (if
it's initialized) to update topology on snitch data
change.

Instead, make the storage service subscribe to the
snitch reconfigure signal upon creation.

This finally makes snitch fully independent from storage
service.

In tests the snitch instance is not created, so check
for it before subscribing.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-01-13 16:41:34 +03:00
Pavel Emelyanov
ca336409d7 snitch: Always gossip snitch info itself
The gossiping_property_file_snitch updates the gossip RACK and DC
values upon config change. Right now this is done with the help
of storage service, but the needed code to gossip rack and dc is
already available in the snitch itself.

That said -- gossip the snitch info via the snitch's own helper and
remove storage_service's one. This is the 2nd step in decoupling the
snitch and the storage service.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-01-13 16:41:34 +03:00
Pavel Emelyanov
99e71bd1f6 snitch: Do gossip DC and RACK itself
This is the 2nd step in generalizing the snitch data gossiping
and, at the same time, the 1st step in decoupling the storage
service and the snitch.

During start, the storage service starts the gossiper, which notifies
the snitch with a .gossiper_starting() call; then the storage service
calls gossip_snitch_info.

This patch makes the snitch itself do that last step.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-01-13 16:41:34 +03:00
Gleb Natapov
d3aa17591c migration_manager: drop announce_locally flag
It looks like the history of the flag begins in Cassandra's
https://issues.apache.org/jira/browse/CASSANDRA-7327 where it was
introduced to speed up tests by not needing to start the gossiper.
The thing is, we always start the gossiper in our cql tests, so the flag only
introduces noise. And, of course, since we want to move schema to use raft,
being able to apply modifications only locally goes against the nature of
raft, so we'd better get rid of the capability ASAP.

Tests: units(dev, debug)
Message-Id: <20201230111101.4037543-2-gleb@scylladb.com>
2021-01-03 13:58:09 +02:00
Benny Halevy
322aa2f8b5 token_metadata: add clear_gently
clear_gently gently clears the token_metadata members.
It uses continuations to allow yielding if needed
to prevent reactor stalls.
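
The idea can be sketched without Seastar (the real code yields to the reactor via continuations; here a counter stands in for the yield, and the chunk size is an assumed illustrative value):

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Sketch: clear a large container in chunks, "yielding" every `chunk`
// erasures so a real reactor would not stall on a huge map.
constexpr size_t chunk = 100;

template <typename Map>
size_t clear_gently(Map& m) {
    size_t yields = 0, erased = 0;
    while (!m.empty()) {
        m.erase(m.begin());
        if (++erased % chunk == 0) {
            ++yields;   // here a coroutine would co_await a reactor yield
        }
    }
    return yields;      // how many times we gave the reactor a chance to run
}
```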

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-12-22 11:22:21 +02:00
Benny Halevy
e089c22ec1 token_metadata: futurize update_normal_tokens
The function's complexity is O(#tokens) in the worst case,
as for each endpoint token it traverses _token_to_endpoint_map
linearly to erase the endpoint mapping if it exists.

This change renames the current implementation of
update_normal_tokens to update_normal_tokens_sync
and clones the code as a coroutine that returns a future
and may yield if needed.

Eventually we should futurize the whole token_metadata
and abstract_replication_strategy interface and get rid
of the synchronous functions.  Until then, the sync
version is still required by call sites that
neither return a future nor run in a seastar thread.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-12-22 10:35:15 +02:00
Gleb Natapov
85cffd1aeb lwt: rewrite storage_proxy::cas using coroutines
Makes code much simpler to understand.

Message-Id: <20201201160213.GW1655743@scylladb.com>
2020-12-17 18:15:35 +01:00
Gleb Natapov
37368726c9 migration_manager: remove unused announce() variant
Message-Id: <20201216153150.GG3244976@scylladb.com>
2020-12-16 18:14:07 +02:00
Piotr Sarna
27fba35832 storage_proxy: start propagating local timeouts as timeouts
A local timeout was previously propagated to the client as WriteFailure,
while there exists a more concrete error type for that: WriteTimeout.
2020-12-14 07:50:40 +01:00
Avi Kivity
0f967f911d Merge "storage_service: get_token_metadata_ptr to hold on to token_metadata" from Benny
"
This series fixes use-after-free via token_metadata&

We may currently get a token_metadata& via get_token_metadata() and
use it across yield points in a couple of sites:
- do_decommission_removenode_with_repair
- get_new_source_ranges

To fix that, use get_token_metadata_ptr() and hold on to the
pointer across yields.

Fixes #7790

Dtest: update_cluster_layout_tests:TestUpdateClusterLayout.simple_removenode_2_test(debug)
Test: unit(dev)
"

* tag 'storage_service-token_metadata_ptr-v2' of github.com:bhalevy/scylla:
  storage_service: get_new_source_ranges: don't hold token_metadata& across yield point
  storage_service: get_changed_ranges_for_leaving: no need to maybe_yield for each token_range
  storage_service: get_changed_ranges_for_leaving: release token_metadata_ptr sooner
  storage_service: get_changed_ranges_for_leaving: don't hold token_metadata& across yield
2020-12-13 17:37:24 +02:00
Benny Halevy
1fbc831dae storage_service: get_new_source_ranges: don't hold token_metadata& across yield point
Have the caller provide the token_metadata& to get_new_source_ranges
and keep it valid throughout the call.

Note that there is no need to clone_only_token_map
since the token_metadata_ptr is immutable and can be
used just as well for calling strat.get_range_addresses.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-12-13 16:42:00 +02:00
Benny Halevy
f13913d251 storage_service: get_changed_ranges_for_leaving: no need to maybe_yield for each token_range
Now that we pass can_yield::yes to calculate_natural_endpoints
for each token_range.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-12-13 16:42:00 +02:00
Benny Halevy
89ed0705e8 storage_service: get_changed_ranges_for_leaving: release token_metadata_ptr sooner
No need to hold on to the shared token_metadata_ptr
after we got clone_after_all_left().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-12-13 16:42:00 +02:00
Benny Halevy
684c4143df storage_service: get_changed_ranges_for_leaving: don't hold token_metadata& across yield
When yielding in clone_only_token_map or clone_after_all_left,
the token_metadata obtained with get_token_metadata() may go away.

Use get_token_metadata_ptr() instead to hold on to it.

And with that, we don't need to clone_only_token_map.
`metadata` is not modified by calculate_natural_endpoints, so we
can just refer to the immutable copy retrieved with
get_token_metadata_ptr.

Fixes #7790

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-12-13 16:41:58 +02:00
Avi Kivity
9265b87610 Merge "Remove get_local_storage_proxy from validation" from Pavel E
"
The validate_column_family() helper uses the global proxy
reference to get the database from. Fortunately, all its
callers can provide one via an argument.

tests: unit(dev)
"

* 'br-no-proxy-in-validate' of https://github.com/xemul/scylla:
  validation: Remove get_local_storage_proxy call
  client_state: Call validate_column_family() with database arg
  client_state: Add database& arg to has_column_family_access
  storage_proxy: Add .local_db() getters
  validate: Mark database argument const
2020-12-13 13:12:57 +02:00
Pavel Emelyanov
89fd524c5a schema-tables: Add database argument to make_update_table_mutations
There are 3 callers of this helper (cdc, migration manager and tests)
and all of them already have the database object at hand.

The argument will be used by next patch to remove call for global
storage proxy instance from make_update_indices_mutations.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-12-11 21:21:22 +03:00
Pavel Emelyanov
12cc539835 client_state: Call validate_column_family() with database arg
The previous patch brought the database reference arg. And since
the currently called validate_column_family() overload _just_
gets the database from the global proxy, it's better to shortcut.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-12-11 18:50:49 +03:00
Pavel Emelyanov
b0c4a9087d client_state: Add database& arg to has_column_family_access
It is called from cql3/statements' check_access methods and from thrift
handlers. The former have a proxy argument from which they can get the
database. The latter already have the database itself on board.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-12-11 18:49:16 +03:00
Pavel Emelyanov
4c7bc8a3d1 storage_proxy: Add .local_db() getters
To facilitate the next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-12-11 18:48:02 +03:00
Asias He
829b4c1438 repair: Make removenode safe by default
Currently removenode works as follows:

- The coordinator node advertises the node to be removed in
  REMOVING_TOKEN status in gossip

- Existing nodes learn the node in REMOVING_TOKEN status

- Existing nodes sync data for the range it owns

- Existing nodes send notification to the coordinator

- The coordinator node waits for the notifications and announces the node in
  REMOVED_TOKEN

Current problems:

- Existing nodes do not tell the coordinator whether the data sync succeeded
  or failed.

- The coordinator cannot abort the removenode operation in case of error.

- A failed removenode operation will leave the node being removed in
  REMOVING_TOKEN forever.

- The removenode operation runs in best-effort mode, which may cause data
  consistency issues.

  That is, if a node that owns the range after the removenode
  operation is down during the operation, the removenode operation
  will continue to succeed without requiring that node to perform data
  syncing. This can cause data consistency issues.

  For example, with five nodes in the cluster and RF = 3: for a range, n1, n2,
  n3 are the old replicas and n2 is being removed; after the removenode
  operation, the new replicas are n1, n5, n3. If n3 is down during the
  removenode operation, only n1 will be used to sync data with the new
  owner n5. This will break QUORUM read consistency if n1 happens to
  miss some writes.

Improvements in this patch:

- This patch makes the removenode safe by default.

We require all nodes in the cluster to participate in the removenode operation and
sync data if needed. We fail the removenode operation if any of them is down or
fails.

If the user wants the removenode operation to succeed even if some of the nodes
are not available, the user has to explicitly pass a list of nodes that can be
skipped for the operation.

$ nodetool removenode --ignore-dead-nodes <list_of_dead_nodes_to_ignore> <host_id>

Example restful api:

$ curl -X POST "http://127.0.0.1:10000/storage_service/remove_node/?host_id=7bd303e9-4c7b-4915-84f6-343d0dbd9a49&ignore_nodes=127.0.0.3,127.0.0.5"

- The coordinator can abort data sync on existing nodes

For example, if one of the nodes fails to sync data, it makes no sense for
the other nodes to continue to sync data because the whole operation will
fail anyway.

- The coordinator can decide which nodes to ignore and pass the decision
  to other nodes

Previously, there was no way for the coordinator to tell existing nodes
to run in strict mode or best-effort mode. Users would have to modify the
config file or run a restful api cmd on all the nodes to select strict
or best-effort mode. With this patch, the cluster-wide configuration is
eliminated.

Fixes #7359

Closes #7626
2020-12-10 10:14:39 +02:00
Juliusz Stasiewicz
b150906d39 gossip: Added SNITCH_NAME to application_state
The snitch name needs to be exchanged within the cluster once, during the
shadow round, so that joining nodes cannot use a wrong snitch. The snitch
names are compared on bootstrap and on normal node start.

If the cluster already uses mixed snitches, the upgrade to this
version will fail. In this case the customer needs to add a node with
the correct snitch for every node with the wrong snitch, then take
down the nodes with the wrong snitch, and only then do the upgrade.

Fixes #6832

Closes #7739
2020-12-09 15:45:25 +02:00
Pavel Emelyanov
df0e26035f counters: Drop call to get_local_storage_service and related
The local host id is now passed by argument, so we don't
need the counter_id::local() and some other methods that
call or are called by it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-12-04 16:31:12 +03:00
Pavel Emelyanov
5a286ee8d4 storage_service: Keep local host id to database
The value in question is cached from db::system_keyspace
for places that want to have it without waiting for
futures. So far the only place is database counters code,
so keep the value on database itself. Next patches will
make use of it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-12-04 15:09:29 +03:00
Benny Halevy
157a964a63 locator: extract can_yield to utils/maybe_yield.hh
Move the definition of bool_class can_yield to a standalone
header file and define there a maybe_yield(can_yield) helper.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-24 12:23:56 +02:00
Nadav Har'El
5f37c1ef33 Merge 'Don't add delay to the timestamp of the first CDC generation' from Piotr Jastrzębski
After the concept of the seed nodes was removed we can distinguish
whether the node is the first node in the cluster or not.

Thanks to this we can avoid adding delay to the timestamp of the first
CDC generation.

The delay is added to the timestamp to make sure that all the nodes
in the cluster manage to learn about it before the timestamp is in the past.
It is safe to not add the delay for the first node because we know it's the only node
in the cluster and no one else has to learn about the timestamp.

Fixes #7645

Tests: unit(dev)

Closes #7654

* github.com:scylladb/scylla:
  cdc: Don't add delay to the timestamp of the first generation
  cdc: Change for_testing to add_delay in make_new_cdc_generation
2020-11-19 16:47:16 +02:00
Piotr Jastrzebski
93a7f7943c cdc: Don't add delay to the timestamp of the first generation
After the concept of the seed nodes was removed we can distinguish
whether the node is the first node in the cluster or not.

Thanks to this we can avoid adding delay to the timestamp of the first
CDC generation.

Fixes #7645

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-19 13:03:18 +01:00
Piotr Jastrzebski
3024795507 cdc: Change for_testing to add_delay in make_new_cdc_generation
The meaning of the parameter changes from defining whether the function
is called in testing environment to deciding whether a delay should be
added to a timestamp of a newly created CDC generation.

This is a preparation for an improvement in the following patch that adds
delay not to every node but only to non-first nodes.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-19 12:19:42 +01:00
Avi Kivity
d612ca78f3 Merge 'Allow changing hinted handoff configuration in runtime' from Piotr Dulikowski
This PR allows changing the hinted_handoff_enabled option in runtime, either by modifying and reloading YAML configuration, or through HTTP API.

This PR also introduces an important change in semantics of hinted_handoff_enabled:
- Previously, hinted_handoff_enabled controlled whether _both writing and sending_ hints is allowed at all, or to particular DCs,
- Now, hinted_handoff_enabled only controls whether _writing hints_ is enabled. Sending hints from disk is now always enabled.

Fixes: #5634
Tests:
- unit(dev) for each commit of the PR
- unit(debug) for the last commit of the PR

Closes #6916

* github.com:scylladb/scylla:
  api: allow changing hinted handoff configuration
  storage_proxy: fix wrong return type in swagger
  hints_manager: implement change_host_filter
  storage_proxy: always create hints manager
  config: plug in hints::host_filter object into configuration
  db/hints: introduce host_filter
  hints/resource_manager: allow registering managers after start
  hints: introduce db::hints::directory_initializer
  directories.cc: prepare for use outside main.cc
2020-11-18 13:41:02 +02:00
Avi Kivity
100ad4db38 Merge 'Allow ALTERing the properties of system_auth tables' from Dejan Mircevski
As requested in #7057, allow certain alterations of system_auth tables. Potentially destructive alterations are still rejected.

Tests: unit (dev)

Closes #7606

* github.com:scylladb/scylla:
  auth: Permit ALTER options on system_auth tables
  auth: Add command_desc
  auth: Add tests for resource protections
2020-11-17 12:15:20 +02:00
Piotr Dulikowski
0fd36e2579 api: allow changing hinted handoff configuration
This commit makes it possible to change hints manager's configuration at
runtime through HTTP API.

To preserve backwards compatibility, we keep the old behavior of not
creating and checking hint directories if hints are not enabled at
startup. Instead, hint directories are lazily initialized when hints are
enabled for the first time through the HTTP API.
2020-11-17 10:24:43 +01:00
Piotr Dulikowski
1302f1b5bf storage_proxy: always create hints manager
Now, the hints manager object for regular hints is always created, even
if hints are disabled in configuration. Please note that the behavior of
hints will be unchanged - no hints will be sent when they are disabled.
The intent of this change is to make enabling and disabling hints at
runtime easier to implement.
2020-11-17 10:24:43 +01:00
Piotr Dulikowski
cefe5214ff config: plug in hints::host_filter object into configuration
Uses db::hints::host_filter as the type of hinted_handoff_enabled
configuration option.

Previously, hinted_handoff_enabled used to be a string option, and it
was parsed later in a separate function during startup. The function
returned a std::optional<std::unordered_set<sstring>>, whose meaning in
the context of hints is rather enigmatic for an observer not familiar
with hints.

Now, hinted_handoff_enabled has type of db::hints::host_filter, and it
is plugged into the config parsing framework, so there is no need for
later post-processing.
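
A rough sketch of what a host_filter-style parse could look like (the names and accepted forms here are hypothetical simplifications; the real db::hints::host_filter is more involved):

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical host_filter: hints enabled for all DCs, for no DCs,
// or for an explicit comma-separated DC list.
struct host_filter {
    enum class kind { all, none, dc_list } k = kind::none;
    std::set<std::string> dcs;

    static host_filter parse(const std::string& s) {
        host_filter f;
        if (s == "true")  { f.k = kind::all;  return f; }
        if (s == "false") { f.k = kind::none; return f; }
        f.k = kind::dc_list;
        std::istringstream in(s);
        for (std::string dc; std::getline(in, dc, ',');) {
            f.dcs.insert(dc);
        }
        return f;
    }

    bool can_hint_to(const std::string& dc) const {
        switch (k) {
        case kind::all:  return true;
        case kind::none: return false;
        default:         return dcs.count(dc) != 0;
        }
    }
};
```

Plugging such a type directly into config parsing is what removes the later post-processing step the message describes.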
2020-11-17 10:24:42 +01:00
Piotr Dulikowski
a4f03d72b3 hints/resource_manager: allow registering managers after start
This change modifies db::hints::resource_manager so that it is now
possible to add hints::managers after it has been started.

This change will make it possible to register the regular hints manager
later in runtime, if it wasn't enabled at boot time.
2020-11-17 10:15:47 +01:00
Dejan Mircevski
1beb57ad9d auth: Permit ALTER options on system_auth tables
These alterations cannot break the database irreparably, so allow
them.

Expand command_desc as required.

Add a type (rather than command_desc) parameter to
has_column_family_access() to minimize code changes.

Fixes #7057

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-11-16 22:32:32 -05:00
Dejan Mircevski
9a6c1b4d50 auth: Add command_desc
Instead of passing various bits of the command around, pass one
command_desc object.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-11-16 20:23:52 -05:00
Piotr Sarna
fc8ffe08b9 storage_proxy: unify retiring view response handlers
Materialized view updates participate in a retirement program,
which makes sure that they are immediately taken down once their
target node is down, without having to wait for timeout (since
views are a background operation and it's wasteful to wait in the
background for minutes). However, this mechanism has very delicate
lifetime issues, and it already caused problems more than once,
most recently in #5459.
In order to make another bug in this area less likely, the two
implementations of the mechanism, in on_down() and drain_on_shutdown(),
are unified.

Possibly refs #7572

Closes #7624
2020-11-16 18:50:49 +02:00
Benny Halevy
51e4d6490b storage_service: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-15 15:18:48 +02:00
Benny Halevy
e861c352f8 storage_service: mutate_token_metadata: do replicate_to_all_cores
Replicate the mutated token_metadata to all cores on success.

This moves replication out of update_pending_ranges(mutable_token_metadata_ptr, sstring),
so add an explicit call to replicate_to_all_cores where it is called outside
of mutate_token_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-15 14:34:20 +02:00
Benny Halevy
25b5db0b72 storage_service: add mutate_token_metadata helper
Replace a repeating pattern of:
    with_token_metadata_lock([] {
        return get_mutable_token_metadata_ptr([] (mutable_token_metadata_ptr tmptr) {
            // mutate token_metadata via tmptr
        });
    });

With a call to mutate_token_metadata that does both
and calls the function with the mutable_token_metadata_ptr.

A following patch will also move the replication to all
cores into mutate_token_metadata.
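
The shape of the combined helper, in very reduced form (std::function and plain shared pointers stand in for the Seastar primitives, and publishing back is folded in for illustration; this is not the actual Scylla implementation):

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Reduced sketch of the mutate_token_metadata pattern: take the lock,
// clone a mutable copy, run the caller's mutator on it, then publish.
struct token_metadata { int version = 0; };
using mutable_ptr = std::shared_ptr<token_metadata>;

struct storage_service {
    std::shared_ptr<const token_metadata> shared = std::make_shared<token_metadata>();
    bool locked = false;

    void mutate_token_metadata(std::function<void(mutable_ptr)> func) {
        locked = true;                                           // with_token_metadata_lock
        auto tmptr = std::make_shared<token_metadata>(*shared);  // mutable clone
        func(tmptr);                                             // caller mutates via tmptr
        shared = tmptr;                                          // publish the new version
        locked = false;
    }
};
```

Callers then express only the mutation itself, instead of repeating the lock/clone boilerplate at every site.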

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-15 14:31:39 +02:00