Commit Graph

29806 Commits

Author SHA1 Message Date
Benny Halevy
5440739e1b snapshot_ctl: cleanup true_snapshots_size
Cleanup indentation and s/local_total/total/
as it is

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-19 07:50:53 +02:00
Benny Halevy
5db3cbe1e4 snapshot_ctl: true_snapshots_size: do not map_reduce across all shards
snapshot_ctl uses map_reduce over all database shards,
each counting the size of the snapshots directory,
which is shared, not per-shard.

So the total live size it returns is multiplied by the number of shards.

Add a unit test to test that.

Fixes #9897

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-19 07:50:53 +02:00
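The overcounting fixed by 5db3cbe1e4 can be illustrated with a minimal sketch in plain C++ (no Seastar). The names `shared_snapshot_dir_size`, `buggy_total` and `fixed_total` are hypothetical and do not come from the Scylla source; the shard count and directory size are made-up inputs.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in: the snapshots directory is shared, so every shard
// that measures it reports the same number of bytes.
inline std::uint64_t shared_snapshot_dir_size(std::uint64_t dir_bytes) {
    return dir_bytes;
}

// Buggy pattern: ask every shard for the directory size and sum the answers.
inline std::uint64_t buggy_total(unsigned shard_count, std::uint64_t dir_bytes) {
    std::vector<std::uint64_t> per_shard(shard_count, shared_snapshot_dir_size(dir_bytes));
    return std::accumulate(per_shard.begin(), per_shard.end(), std::uint64_t{0});
}

// Fixed pattern: the directory is not per-shard, so measure it once.
inline std::uint64_t fixed_total(unsigned /*shard_count*/, std::uint64_t dir_bytes) {
    return shared_snapshot_dir_size(dir_bytes);
}
```

With 8 shards and a 1000-byte directory, `buggy_total` reports 8000 bytes while `fixed_total` reports 1000.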
Gleb Natapov
dc886d96d1 idl-compiler: update the documentation with new features added recently
The series to move storage_proxy verbs to the IDL added new features to
the IDL compiler, but lacked documentation. This patch documents
those features.
2022-01-16 15:12:07 +02:00
Mikołaj Sielużycki
f6d9d6175f sstables: Harden bad_alloc handling during memtable flush.
dirty_memory_manager monitors memory and triggers memtable flushing if
there is too much pressure. If bad_alloc happens during the flush, it
may break the loop and flushes won't be triggered automatically, leading
to blocked writes as memory won't be automatically released.

The solution is to add exception handling to the loop, so that the inner
part always returns a non-exceptional future (meaning the loop will
break only on node shutdown).

try/catch is used around on_internal_error instead of
on_internal_error_noexcept, as the latter doesn't have a version that
accepts an exception pointer. To get the exception message from
std::exception_ptr a rethrow is needed anyway, so this was a simpler
approach.

Fixes: #4174

Message-Id: <20220114082452.89189-1-mikolaj.sieluzycki@scylladb.com>
2022-01-14 16:09:21 +02:00
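The loop hardening described in f6d9d6175f can be sketched in plain C++ as follows. This is not the Scylla code: `flush_stats`, `message_of`, `failing_flush` and `run_flush_loop` are hypothetical names, and a synchronous loop stands in for the future-based one. It only shows the two ideas from the message: the loop body never lets an exception escape, and reading the message from a `std::exception_ptr` needs a rethrow.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <new>
#include <string>
#include <vector>

// Hypothetical summary of what the loop did.
struct flush_stats {
    int attempted = 0;
    int failed = 0;
    std::vector<std::string> errors;
};

// Getting the message out of a std::exception_ptr requires a rethrow.
inline std::string message_of(std::exception_ptr ep) {
    try {
        std::rethrow_exception(ep);
    } catch (const std::exception& e) {
        return e.what();
    } catch (...) {
        return "unknown exception";
    }
}

// Simulated flush that runs out of memory.
inline void failing_flush() { throw std::bad_alloc(); }

// Each iteration catches everything, records the failure and moves on,
// so one failed flush cannot break the loop.
inline flush_stats run_flush_loop(const std::vector<std::function<void()>>& flushes) {
    flush_stats stats;
    for (const auto& flush : flushes) {
        ++stats.attempted;
        try {
            flush();
        } catch (...) {
            ++stats.failed;
            stats.errors.push_back(message_of(std::current_exception()));
        }
    }
    return stats;
}
```

Passing a list that contains `failing_flush` between two successful flushes still attempts all three.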
Botond Dénes
b6828e899a Merge "Postpone reshape of SSTables created by repair" from Raphael
"
SSTables created by repair will potentially not conform to the compaction
strategy layout goal. If node shuts down before off-strategy has a chance to
reshape those files, node will be forced to reshape them on restart. That
causes unexpected downtime. Turns out we can skip reshape of those files
on boot, and allow them to be reshaped after node becomes online, as if
the node never went down. Those files will go through same procedure as
files created by repair-based ops. They will be placed in maintenance set,
and be reshaped iteratively until ready for integration into the main set.
"

Fixes #9895.

tests: UNIT(dev).

* 'postpone_reshape_on_repair_originated_files' of https://github.com/raphaelsc/scylla:
  distributed_loader: postpone reshape of repair-originated sstables
  sstables: Introduce filter for sstable_directory::reshape
  table: add fast path when offstrategy is not needed
  sstables: add constant for repair origin
2022-01-14 14:05:09 +02:00
Botond Dénes
c727360eca db: convert data listeners to v2
To remove yet another back-and-forth conversion in
table::make_reader_v2().

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220114085551.565752-1-bdenes@scylladb.com>
2022-01-14 13:57:44 +02:00
Avi Kivity
4995179c6f Merge "Use data_dictionary in client_state and validation" from Pavel E
"
The main motivation for the set is to expel query_processor.proxy().local_db()
calls from cql3/statements code. The only places that still use q.p. like this
are those calling client_state::has_..._access() checkers. Those checks can
go with the data_dictionary which is already available on the query processor.
This is the continuation of the 9643f84d ("Eliminate direct storage_proxy usage
from cql3 statements") patch set.

As a side effect the validation/ code, that's called from has_..._access checks,
is also converted to use data_dictionary.

tests: unit(dev, debug)
"

* 'br-cql3-dictionary' of https://github.com/xemul/scylla:
  validation: Make validate_column_family use data_dictionary::database
  client_state: Make has_access use data_dictionary::database
  client_state: Make has_schema_access use data_dictionary::database
  client_state: Make has_column_family_access use data_dictionary::database
  client_state: Make has_keyspace_access use data_dictionary::database
2022-01-14 13:55:22 +02:00
Raphael S. Carvalho
ae3b589f12 table: Reduce off-strategy space requirement if multiple compaction rounds are required
Off-strategy compaction works by iteratively reshaping the maintenance set
until it's ready for integration into the main set. As repair-based ops
produce disjoint sstables only, off-strategy compaction can complete
the reshape in a single round.
But if reshape ends up requiring more than one round, space requirement
for off-strategy to succeed can be high. That's because we're only
deleting input SSTables on completion. SSTables from maintenance set
can be only deleted on completion as we can only merge maintenance
set into main one once we're done reshaping[1]. But an SSTable that was
created by a reshape and later used as an input in another reshape can
be deleted immediately as its existence is not needed anywhere.

[1] We don't update maintenance set after each reshape round, because that
would mess with its disjointness. We also don't iteratively merge
maintenance set into main set, as the data produced by a single round
is potentially not ready for integration into main set.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220111202950.111456-1-raphaelsc@scylladb.com>
2022-01-14 13:46:31 +02:00
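The space saving described in ae3b589f12 can be estimated with a toy model. The model assumes, purely for illustration, that every reshape round rewrites all the data and produces output of the same total size as the maintenance set; real rounds may differ. `peak_space` is a hypothetical helper, not a Scylla function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Returns the peak number of bytes on disk during off-strategy compaction.
// maintenance_bytes:   size of the maintenance set, kept until completion.
// rounds:              number of reshape rounds needed.
// delete_intermediate: if true, an output of an earlier round is deleted as
//                      soon as a later round has consumed it as input.
inline std::uint64_t peak_space(std::uint64_t maintenance_bytes, unsigned rounds,
                                bool delete_intermediate) {
    std::uint64_t on_disk = maintenance_bytes;
    std::uint64_t peak = on_disk;
    std::uint64_t previous_output = 0;  // output of the previous round, if any
    for (unsigned r = 0; r < rounds; ++r) {
        on_disk += maintenance_bytes;   // this round's output (assumed same size)
        peak = std::max(peak, on_disk);
        if (delete_intermediate) {
            on_disk -= previous_output; // consumed intermediate input is dropped
        }
        previous_output = maintenance_bytes;
    }
    return peak;
}
```

Under these assumptions, three rounds over a 100-byte maintenance set peak at 400 bytes when inputs are deleted only on completion, and at 300 bytes when intermediate outputs are deleted immediately. With a single round both variants peak at 200 bytes.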
Botond Dénes
3005b9b5f8 Merge "move raft verbs to the IDL" from Gleb Natapov
"
The series moves raft verbs to the IDL and also fixes some verbs to be one
way, as they were intended to be.
"

* 'gleb/raft-idl' of github.com:scylladb/scylla-dev:
  raft service: make one way raft messages truly one way
  raft: move raft verbs to the IDL
  raft: split idl to rpc and storage
  idl-compiler: always produce const variant of serializers
  raft: simplify raft idl definitions
2022-01-14 13:40:20 +02:00
Pavel Emelyanov
00de5f4876 validation: Make validate_column_family use data_dictionary::database
And instantly convert the validate_keyspace() as it's not called
from anywhere but the validate_column_family().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-01-14 13:00:53 +03:00
Pavel Emelyanov
71c3a7525b client_state: Make has_access use data_dictionary::database
This db argument is only needed to be pushed into
cdc::is_log_for_some_table() helper. All callers already have
the data_dictionary::database at hand and convert it with .real_database()
at call time, so this patch effectively generalizes those
.real_database() calls.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-01-14 12:59:35 +03:00
Pavel Emelyanov
f22eb22b8b client_state: Make has_schema_access use data_dictionary::database
It's now called with data_dictionary::database converted to .real_database()
right in the argument passing, so this change can be treated as
the generalization of that .real_database() call.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-01-14 12:55:53 +03:00
Pavel Emelyanov
b6bc7a9b29 client_state: Make has_column_family_access use data_dictionary::database
Straightforward replacement. Internals of the has_column_family_access()
temporarily get .real_database(), but it will be changed soon.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-01-14 12:55:15 +03:00
Pavel Emelyanov
1ed237120a client_state: Make has_keyspace_access use data_dictionary::database
Straightforward replacement. Internals of the has_keyspace_access()
temporarily get .real_database(), but it will be changed soon.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-01-14 12:54:01 +03:00
Kamil Braun
168c6f47f9 replica: database: allow disabling optimized TWCS queries through compaction strategy options
As requested from field engineering, add a way to disable
the optimized TWCS query algorithm (use regular query path)
just in case a bug or a performance regression shows up in
production.

To disable the optimized query path, add
'enable_optimized_twcs_queries': 'false' to compaction strategy options,
e.g.
```
alter table ks.t with compaction =
    {'class': 'TimeWindowCompactionStrategy',
     'enable_optimized_twcs_queries': 'false'};
```

Setting the `enable_optimized_twcs_queries` key to anything other than
`'false'` (note: a boolean `false` expands to a string `'false'`) or
skipping it (re)enables the optimized query path.

Note: the flag can be set in a cluster in the middle of an upgrade. Nodes
which do not understand it simply ignore it, but they do store it in
their schema tables (they store the entire `compaction` map). After
these nodes are upgraded, they will understand the flag and act
accordingly.

Note: in the situation above, some nodes may use the optimized path and
some may use the regular path. This may happen also in a fully upgraded
cluster when compaction options are changed concurrently to reads;
there is a short period of time where the schema change propagates and
some nodes got the flag but some didn't.

This should not be a problem since the optimization does not change the
returned read results (unless there is a bug).

Generally, the flag is not intended for normal use, but for field
engineers to disable it in case of a serious problem.

Ref #6418.

Closes #9900
2022-01-14 07:10:02 +02:00
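The rule stated in 168c6f47f9 (only the string `'false'` disables the optimized path; any other value, or a missing key, enables it) can be sketched as a small predicate. The function name `optimized_twcs_queries_enabled` and the use of `std::map` for the options are assumptions made for this example, not the actual Scylla interface.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: decides whether the optimized TWCS query path is
// enabled for a given compaction options map.
inline bool optimized_twcs_queries_enabled(
        const std::map<std::string, std::string>& compaction_options) {
    auto it = compaction_options.find("enable_optimized_twcs_queries");
    if (it == compaction_options.end()) {
        return true;               // key skipped: optimized path stays enabled
    }
    return it->second != "false";  // only the exact string "false" disables it
}
```

Note that under this rule a value such as `'no'` or `'0'` leaves the optimized path enabled.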
Kamil Braun
4c3fb9ac68 conf: update description of reversed_reads_auto_bypass_cache in scylla.yaml
Message-Id: <20220111123937.10750-1-kbraun@scylladb.com>
2022-01-13 23:49:01 +01:00
Kamil Braun
fe0366f6bc cdc: check_and_repair_cdc_streams: fix indentation 2022-01-13 23:10:18 +02:00
Juliusz Stasiewicz
ea46439858 cdc: check_and_repair_cdc_streams: regenerate if too many streams are present
If the number of streams exceeds the number of token ranges
it indicates that some spurious streams from decommissioned
nodes are present.

In such a situation - simply regenerate.

Fixes #9772

Closes #9780
2022-01-13 23:10:18 +02:00
Nadav Har'El
a0cad9585f merge: move tests to use new schema announcement API
Merged patch series from Gleb Natapov:

The series moves tests to use new schema announcement API and removes
the old one.

Gleb Natapov (7):
  test: convert database_test to new schema announcement api
  test: use new schema announcement api in cql_test_env.cc
  test: move cql_query_test.cc to new schema announcement api
  test: move memtable_test.cc to new schema announcement api
  test: move schema_change_test.cc to new schema announcement api
  migration_manager: drop unused announce_ functions
  migration_manager: assert that raft ops are done on shard 0

 service/migration_manager.hh     |  5 ---
 service/migration_manager.cc     | 52 ++++++++------------------------
 test/boost/cql_query_test.cc     |  3 +-
 test/boost/database_test.cc      |  5 +--
 test/boost/memtable_test.cc      |  2 +-
 test/boost/schema_change_test.cc | 18 ++++++-----
 test/lib/cql_test_env.cc         |  2 +-
 7 files changed, 31 insertions(+), 56 deletions(-)
2022-01-13 23:10:18 +02:00
Gleb Natapov
0169e4d7ed migration_manager: assert that raft ops are done on shard 0
Now that all consumers run on shard zero we can assert it.
2022-01-13 23:10:18 +02:00
Gleb Natapov
1ff85020b5 migration_manager: drop unused announce_ functions 2022-01-13 23:10:18 +02:00
Gleb Natapov
f0a41c102a test: move schema_change_test.cc to new schema announcement api 2022-01-13 23:10:18 +02:00
Gleb Natapov
512556914a test: move memtable_test.cc to new schema announcement api 2022-01-13 23:10:13 +02:00
Botond Dénes
d6efe27545 Merge 'db: config: add a flag to disable new reversed reads algorithm' from Kamil Braun
Just in case the new algorithm turns out to be buggy or to cause a
performance regression, add a flag to fall back to the old algorithm for
use in the field.

Closes #9908

* github.com:scylladb/scylla:
  db: config: add a flag to disable new reversed reads algorithm
  replica: table: remove obsolete comment about reversed reads
2022-01-13 23:09:02 +02:00
Gleb Natapov
be46109af6 test: move cql_query_test.cc to new schema announcement api 2022-01-13 23:09:02 +02:00
Avi Kivity
63d254a8d2 Merge 'gms, service: futurize and coroutinize gossiper-related code' from Pavel Solodovnikov
This series greatly reduces gossiper's dependence on `seastar::async` (though not completely).

`i_endpoint_state_change_subscriber` callbacks are converted to return futures (again, to get rid of `seastar::async` dependency), all users are adjusted appropriately (e.g. `storage_service`, `cdc::generation_service`, `streaming::stream_manager`, `view_update_backlog_broker` and `migration_manager`).
This includes futurizing and coroutinizing the whole function call chain up to the `i_endpoint_state_change_subscriber` callback functions.

To aid the conversion process, a non-`seastar::async` dependent variant of `utils::atomic_vector::for_each` is introduced (`for_each_futurized`). A different name is used to clearly distinguish converted and non-converted code, so that the last step (remove `seastar::async()` wrappers around callback-calling code in gossiper) is easier. This is left for a follow-up series, though.

Tests: unit(dev)

Closes #9844

* github.com:scylladb/scylla:
  service: storage_service: coroutinize `set_gossip_tokens`
  service: storage_service: coroutinize `leave_ring`
  service: storage_service: coroutinize `handle_state_left`
  service: storage_service: coroutinize `handle_state_leaving`
  service: storage_service: coroutinize `handle_state_removing`
  service: storage_service: coroutinize `do_drain`
  service: storage_service: coroutinize `shutdown_protocol_servers`
  service: storage_service: coroutinize `excise`
  service: storage_service: coroutinize `remove_endpoint`
  service: storage_service: coroutinize `handle_state_replacing`
  service: storage_service: coroutinize `handle_state_normal`
  service: storage_service: coroutinize `update_peer_info`
  service: storage_service: coroutinize `do_update_system_peers_table`
  service: storage_service: coroutinize `update_table`
  service: storage_service: coroutinize `handle_state_bootstrap`
  service: storage_service: futurize `notify_*` functions
  service: storage_service: coroutinize `handle_state_replacing_update_pending_ranges`
  repair: row_level_repair_gossip_helper: coroutinize `remove_row_level_repair`
  locator: reconnectable_snitch_helper: coroutinize `reconnect`
  gms: i_endpoint_state_change_subscriber: make callbacks to return futures
  utils: atomic_vector: introduce future-returning `for_each` function
  utils: atomic_vector: rename `for_each` to `thread_for_each`
  gms: gossiper: coroutinize `start_gossiping`
  gms: gossiper: coroutinize `force_remove_endpoint`
  gms: gossiper: coroutinize `do_status_check`
  gms: gossiper: coroutinize `remove_endpoint`
2022-01-13 23:09:02 +02:00
Gleb Natapov
100b44f5ff test: use new schema announcement api in cql_test_env.cc 2022-01-13 23:09:02 +02:00
Avi Kivity
230eac439e Update seastar submodule
* seastar ae8d1c28a2...5025cd44ea (2):
  > Merge "Lazy IO capacity replenishment" from Pavel E
Fixes #9893
  > configure.py: don't use deprecated mktemp()
2022-01-13 23:09:02 +02:00
Gleb Natapov
5dffc8ed3e test: convert database_test to new schema announcement api 2022-01-13 23:09:02 +02:00
Gleb Natapov
c500a90902 raft service: make one way raft messages truly one way
Raft core does not expect replies for most messages it sends, but they
are defined as two way by the IDL currently. Fix them to be one way.
2022-01-13 13:14:46 +02:00
Gleb Natapov
b1fea20d36 raft: move raft verbs to the IDL 2022-01-13 13:14:46 +02:00
Gleb Natapov
8a25b740df raft: split idl to rpc and storage
Storage uses only small part of the IDL, so it can include only the part
that is relevant to it.
2022-01-13 13:14:46 +02:00
Gleb Natapov
b0dee71b34 idl-compiler: always produce const variant of serializers
Currently const variant is produced only if a type and its const usage
are in the same idl file, but a type can be defined in one file and used
as const in another.
2022-01-13 13:14:46 +02:00
Gleb Natapov
c5474f9ac2 raft: simplify raft idl definitions
We may use high level types in the IDL.
2022-01-13 13:14:46 +02:00
Nadav Har'El
f842f65794 Merge 'thrift: switch replica::database uses to data_dictionary' from Avi Kivity
replica::database is (as its name indicates) a replica-side service, while thrift
is coordinator-side. Convert thrift's use of replica::database for data dictionary
lookups to the data_dictionary module. Since data_dictionary was missing a
get_keyspaces() operation, add that.

Thrift still uses replica::database to get the schema version. That should be
provided by migration_manager, but changing that is left for later.

Closes #9888

* github.com:scylladb/scylla:
  thrift: switch from replica module to data_dictionary module
  thrift: simplify execute_schema_command() calling convention
  data_dictionary: add get_keyspaces() method
2022-01-13 10:52:30 +02:00
Nadav Har'El
343c521e28 alternator: avoid large contiguous allocation in BatchGetItem
The BatchGetItem request can return a very large response - according to
DynamoDB documentation up to 16 MB, but presently in Alternator, we allow
even more (see #5944).

The problem is that the existing code prepares the entire response as
a large contiguous string, resulting in oversized allocation warnings -
and potentially allocation failures. So in this patch we estimate the size
of the BatchGetItem response, and if it is "big enough" (currently over
100 KB), we return it with the recently added streaming output support.

This streaming output doesn't avoid the extra memory copies unfortunately,
but it does avoid a *contiguous* allocation which is the goal of this
patch.

After this patch, one oversized allocation warning is gone from the test:

    test/alternator/run test_batch.py::test_batch_get_item_large

(a second oversized allocation is still present, but comes from the
unrelated BatchWriteItem issue #8183).

Fixes #8522

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220111170541.637176-1-nyh@scylladb.com>
2022-01-13 09:46:08 +01:00
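The approach in 343c521e28 (estimate the response size, and above a threshold emit it in pieces instead of one contiguous string) can be sketched as below. This is not Alternator's streaming output API: `should_stream` and `to_chunks` are hypothetical, the chunk size is arbitrary, and reading "100 KB" as 100 * 1024 bytes is an assumption. Like the patch, the sketch still copies the data; it only avoids one large contiguous buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Threshold from the commit message ("currently over 100 KB"); the exact
// byte value is an assumption.
constexpr std::size_t streaming_threshold = 100 * 1024;

inline bool should_stream(std::size_t estimated_response_size) {
    return estimated_response_size > streaming_threshold;
}

// Hypothetical chunked writer: splits the payload into pieces of at most
// chunk_size bytes, so no single allocation has to hold the whole response.
inline std::vector<std::string> to_chunks(std::string_view payload, std::size_t chunk_size) {
    std::vector<std::string> chunks;
    if (chunk_size == 0) {
        return chunks;  // a zero chunk size cannot make progress
    }
    for (std::size_t pos = 0; pos < payload.size(); pos += chunk_size) {
        chunks.emplace_back(payload.substr(pos, chunk_size));
    }
    return chunks;
}
```

A caller would check `should_stream(estimate)` first and fall back to the single-string path for small responses.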
Kamil Braun
e98711cfcb db: config: add a flag to disable new reversed reads algorithm
Just in case the new algorithm turns out to be buggy or to cause a
performance regression, add a flag to fall back to the old algorithm for
use in the field.
2022-01-12 18:59:19 +01:00
Avi Kivity
6205d40d5f thrift: switch from replica module to data_dictionary module
Thrift is a coordinator-side service and should not touch the replica
module. Switch it to data_dictionary.

The switch is straightforward with two exceptions:
 - client_state still receives replica::database parameters. After
   this change it will be easier to adapt client_state too.
 - calls to replica::database::get_version() remain. They should be
   rerouted to migration_manager instead, as that deals with schema
   management.
2022-01-12 19:54:38 +02:00
Kamil Braun
7fb7a406e7 replica: table: remove obsolete comment about reversed reads 2022-01-12 17:57:08 +01:00
Avi Kivity
85061b694b thrift: simplify execute_schema_command() calling convention
execute_schema_command is always called with the same first two
parameters, which are always defined from the thrift_handler
instance that contains its caller. Simplify it by making it a member
function.

This simplifies migration to data_dictionary in the next patch.
2022-01-12 18:56:47 +02:00
Avi Kivity
631a19884d data_dictionary: add get_keyspaces() method
Mirroring replica::database::get_keyspaces(), for Thrift's use.

We return a vector instead of a hash map. Random access is already
available via database::find_keyspace(). The name is available
via the keyspace metadata, and in fact Thrift ignores the map
name and uses the metadata name. Using a simpler type reduces
include dependencies for this heavily used module.

The function is plumbed to replica::database::get_keyspaces() so
it returns the same data.
2022-01-12 18:24:38 +02:00
Raphael S. Carvalho
a144d30162 distributed_loader: postpone reshape of repair-originated sstables
SSTables created by repair will potentially not conform to the compaction strategy
layout goal. If node shuts down before off-strategy has a chance to
reshape those files, node will be forced to reshape them on restart. That
causes unexpected downtime. Turns out we can skip reshape of those files
on boot, and allow them to be reshaped after node becomes online, as if
the node never went down. Those files will go through same procedure as
files created by repair-based ops. They will be placed in maintenance set,
and be reshaped iteratively until ready for integration into the main set.

Fixes #9895.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-01-12 13:14:31 -03:00
Nadav Har'El
8bcd23fa02 Merge: move rest of internal ddl users to use raft from Gleb
The patch series moves the rest of internal ddl users to do schema
change over raft (if enabled). After this series, only tests are left
using the old API.

* 'gleb/raft-schema-rest-v6' of github.com:scylladb/scylla-dev: (33 commits)
  migration_manager: drop no longer used functions
  system_distributed_keyspace: move schema creation code to use raft
  auth: move table creation code to use raft
  auth: move keyspace creation code to use raft
  table_helper: move schema creation code to use raft
  cql3: make query_processor inherit from peering_sharded_service
  table_helper: make setup_table() static
  table_helper: co-routinize setup_keyspace()
  redis: move schema creation code to go through raft
  thrift: move system_update_column_family() to raft
  thrift: authenticate a statement before verifying in system_update_column_family()
  thrift: co-routinize system_update_column_family()
  thrift: move system_update_keyspace() to raft
  thrift: authenticate a statement before verifying in system_update_keyspace()
  thrift: co-routinize system_update_keyspace()
  thrift: move system_drop_keyspace() to raft
  thrift: authenticate a statement before verifying in system_drop_keyspace()
  thrift: co-routinize system_drop_keyspace()
  thrift: move system_add_keyspace() to raft
  thrift: co-routinize system_add_keyspace()
  ...
2022-01-12 18:09:08 +02:00
Raphael S. Carvalho
f9e33f7046 sstables: Introduce filter for sstable_directory::reshape
This will be useful to allow sstable_directory user to filter out
sstables that should not be reshaped. The default filter is
implemented as including everything.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-01-12 11:54:17 -03:00
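The filter introduced in f9e33f7046, with its include-everything default, might look roughly like the sketch below. It is not the `sstable_directory::reshape` signature: `sstable_info`, `reshape_filter`, `include_all`, `not_from_repair` and `select_for_reshape` are hypothetical, and the origin string `"repair"` is an assumed value used only to show how a caller such as a144d30162 could skip repair-originated sstables.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an sstable: just a name and an origin.
struct sstable_info {
    std::string name;
    std::string origin;
};

using reshape_filter = std::function<bool(const sstable_info&)>;

// Default filter: include everything.
inline bool include_all(const sstable_info&) { return true; }

// Example filter: leave repair-originated sstables out of reshape.
// The literal "repair" is an assumption made for this sketch.
inline bool not_from_repair(const sstable_info& sst) { return sst.origin != "repair"; }

// Returns the sstables that pass the filter and should therefore be reshaped.
inline std::vector<sstable_info> select_for_reshape(const std::vector<sstable_info>& all,
                                                    const reshape_filter& filter = include_all) {
    std::vector<sstable_info> selected;
    for (const auto& sst : all) {
        if (filter(sst)) {
            selected.push_back(sst);
        }
    }
    return selected;
}
```

Calling `select_for_reshape(all)` keeps the old behaviour, while `select_for_reshape(all, not_from_repair)` leaves repair-originated files for later off-strategy handling.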
Gleb Natapov
2aec9009ef migration_manager: drop no longer used functions 2022-01-12 16:40:06 +02:00
Gleb Natapov
9ce62bcc33 system_distributed_keyspace: move schema creation code to use raft 2022-01-12 16:40:06 +02:00
Gleb Natapov
50b7806c57 auth: move table creation code to use raft 2022-01-12 16:40:06 +02:00
Gleb Natapov
4273a3308c auth: move keyspace creation code to use raft 2022-01-12 16:40:06 +02:00
Gleb Natapov
03184bd786 table_helper: move schema creation code to use raft 2022-01-12 16:40:06 +02:00
Gleb Natapov
eb62e81843 cql3: make query_processor inherit from peering_sharded_service
This is how we can get to a distributed object from a shard-local one.
2022-01-12 16:40:06 +02:00