scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 22:25:48 +00:00

Author	SHA1	Message	Date
Benny Halevy	5440739e1b	snapshot_ctl: cleanup true_snapshots_size Cleanup indentation and s/local_total/total/ as it is Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-01-19 07:50:53 +02:00
Benny Halevy	5db3cbe1e4	snpashot_ctl: true_snapshots_size: do not map_reduce across all shards snapshot_ctl uses map_reduce over all database shards, each counting the size of the snapshots directory, which is shared, not per-shard. So the total live size returned by it is multiplied by the number of shards. Add a unit test to test that. Fixes #9897 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-01-19 07:50:53 +02:00
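The double counting described in commit 5db3cbe1e4 can be illustrated with a minimal, Seastar-free sketch. All function names and the shard count here are hypothetical and are not taken from the Scylla source; the only point is that summing a shared quantity once per shard inflates the result by the shard count.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in: every shard sees the same shared snapshots
// directory, so each shard reports the full directory size.
std::uint64_t snapshots_dir_size_seen_by_shard(std::uint64_t shared_dir_size) {
    return shared_dir_size;
}

// Buggy pattern: reduce over all shards, which counts the shared
// directory once per shard.
std::uint64_t total_size_map_reduce(unsigned shard_count, std::uint64_t shared_dir_size) {
    assert(shard_count > 0);
    std::vector<std::uint64_t> per_shard(shard_count);
    for (auto& size : per_shard) {
        size = snapshots_dir_size_seen_by_shard(shared_dir_size);
    }
    return std::accumulate(per_shard.begin(), per_shard.end(), std::uint64_t{0});
}

// Fixed pattern: measure the shared directory exactly once.
std::uint64_t total_size_single_count(std::uint64_t shared_dir_size) {
    return snapshots_dir_size_seen_by_shard(shared_dir_size);
}
```

With 4 shards and a 100-byte directory, the first function reports 400 while the second reports 100.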
Gleb Natapov	dc886d96d1	idl-compiler: update the documentation with new features added recently The series to move storage_proxy verbs to the IDL added new features to the IDL compiler, but was lacking documentation. This patch documents the features.	2022-01-16 15:12:07 +02:00
Mikołaj Sielużycki	f6d9d6175f	sstables: Harden bad_alloc handling during memtable flush. dirty_memory_manager monitors memory and triggers memtable flushing if there is too much pressure. If bad_alloc happens during the flush, it may break the loop and flushes won't be triggered automatically, leading to blocked writes as memory won't be automatically released. The solution is to add exception handling to the loop, so that the inner part always returns a non-exceptional future (meaning the loop will break only on node shutdown). try/catch is used around on_internal_error instead of on_internal_error_noexcept, as the latter doesn't have a version that accepts an exception pointer. To get the exception message from std::exception_ptr a rethrow is needed anyway, so this was a simpler approach. Fixes: #4174 Message-Id: <20220114082452.89189-1-mikolaj.sieluzycki@scylladb.com>	2022-01-14 16:09:21 +02:00
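The idea in commit f6d9d6175f, catching failures inside each iteration so that the flush loop only ends on shutdown, might look roughly like the following sketch. It uses plain synchronous C++ rather than Seastar futures, and `run_flush_loop`, `flush_loop_result`, `flush_once`, and `should_stop` are hypothetical names chosen for illustration, not the actual dirty_memory_manager code.

```cpp
#include <cassert>
#include <functional>
#include <new>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record of what the loop did, for inspection by callers.
struct flush_loop_result {
    int attempts = 0;
    std::vector<std::string> errors;
};

// Calls flush_once repeatedly until should_stop() returns true.
// Exceptions (including std::bad_alloc) are caught inside the iteration,
// so a single failed flush does not terminate the loop.
flush_loop_result run_flush_loop(const std::function<void()>& flush_once,
                                 const std::function<bool()>& should_stop) {
    assert(flush_once && should_stop);
    flush_loop_result result;
    while (!should_stop()) {
        ++result.attempts;
        try {
            flush_once();
        } catch (const std::exception& e) {
            // Record the message and keep looping.
            result.errors.emplace_back(e.what());
        } catch (...) {
            result.errors.emplace_back("unknown error");
        }
    }
    return result;
}
```

In this sketch the only exit condition is `should_stop()`, which mirrors the commit's stated goal that the loop breaks only on node shutdown.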
Botond Dénes	b6828e899a	Merge "Postpone reshape of SSTables created by repair" from Raphael " SSTables created by repair will potentially not conform to the compaction strategy layout goal. If node shuts down before off-strategy has a chance to reshape those files, node will be forced to reshape them on restart. That causes unexpected downtime. Turns out we can skip reshape of those files on boot, and allow them to be reshaped after node becomes online, as if the node never went down. Those files will go through same procedure as files created by repair-based ops. They will be placed in maintenance set, and be reshaped iteratively until ready for integration into the main set. " Fixes #9895. tests: UNIT(dev). * 'postpone_reshape_on_repair_originated_files' of https://github.com/raphaelsc/scylla: distributed_loader: postpone reshape of repair-originated sstables sstables: Introduce filter for sstable_directory::reshape table: add fast path when offstrategy is not needed sstables: add constant for repair origin	2022-01-14 14:05:09 +02:00
Botond Dénes	c727360eca	db: convert data listeners to v2 To remove yet another back-and-forth conversion in table::make_reader_v2(). Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20220114085551.565752-1-bdenes@scylladb.com>	2022-01-14 13:57:44 +02:00
Avi Kivity	4995179c6f	Merge "Use data_dictionary in client_state and validation" from Pavel E " The main motivation for the set is to expel query_processor.proxy().local_db() calls from cql3/statements code. The only places that still use q.p. like this are those calling client_state::has_..._access() checkers. Those checks can go with the data_dictionary which is already available on the query processor. This is the continuation of the `9643f84d` ("Eliminate direct storage_proxy usage from cql3 statements") patch set. As a side effect the validation/ code, that's called from has_..._access checks, is also converted to use data_dictionary. tests: unit(dev, debug) " * 'br-cql3-dictionary' of https://github.com/xemul/scylla: validation: Make validate_column_family use data_dictionary::database client_state: Make has_access use data_dictionary::database client_state: Make has_schema_access use data_dictionary::database client_state: Make has_column_family_access use data_dictionary::database client_state: Make has_keyspace_access use data_dictionary::database	2022-01-14 13:55:22 +02:00
Raphael S. Carvalho	ae3b589f12	table: Reduce off-strategy space requirement if multiple compaction rounds are required Off-strategy compaction works by iteratively reshaping the maintenance set until it's ready for integration into the main set. As repair-based ops produce disjoint sstables only, off-strategy compaction can complete the reshape in a single round. But if reshape ends up requiring more than one round, space requirement for off-strategy to succeed can be high. That's because we're only deleting input SSTables on completion. SSTables from maintenance set can be only deleted on completion as we can only merge maintenance set into main one once we're done reshaping[1]. But a SSTable that was created by a reshape and later used as an input in another reshape can be deleted immediately as its existence is not needed anywhere. [1] We don't update maintenance set after each reshape round, because that would mess with its disjointness. We also don't iteratively merge maintenance set into main set, as the data produced by a single round is potentially not ready for integration into main set. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220111202950.111456-1-raphaelsc@scylladb.com>	2022-01-14 13:46:31 +02:00
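The deletion rule from commit ae3b589f12 can be sketched as a small partitioning step: inputs that belong to the original maintenance set wait until off-strategy completes, while inputs produced by an earlier reshape round can go right away. The names `cleanup_plan` and `plan_input_cleanup`, and the use of strings as SSTable identifiers, are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type: which inputs of a reshape round can be removed
// now, and which must wait until off-strategy compaction completes.
struct cleanup_plan {
    std::vector<std::string> delete_now;
    std::vector<std::string> delete_on_completion;
};

// Splits the inputs of one reshape round. SSTables still in the maintenance
// set are kept until completion; SSTables that were themselves produced by a
// previous reshape round are not needed anywhere else and can be deleted
// immediately.
cleanup_plan plan_input_cleanup(const std::vector<std::string>& round_inputs,
                                const std::set<std::string>& maintenance_set) {
    cleanup_plan plan;
    for (const auto& sstable : round_inputs) {
        if (maintenance_set.count(sstable) > 0) {
            plan.delete_on_completion.push_back(sstable);
        } else {
            plan.delete_now.push_back(sstable);
        }
    }
    return plan;
}
```

Deleting intermediate outputs early is what lowers the peak space requirement when more than one round is needed.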
Botond Dénes	3005b9b5f8	Merge "move raft verbs to the IDL" from Gleb Natapov " The series moves raft verbs to the IDL and also fixes some verbs to be one way like they were intended to be. " * 'gleb/raft-idl' of github.com:scylladb/scylla-dev: raft service: make one way raft messages truly one way raft: move raft verbs to the IDL raft: split idl to rpc and storage idl-compiler: always produce const variant of serializers raft: simplify raft idl definitions	2022-01-14 13:40:20 +02:00
Pavel Emelyanov	00de5f4876	validation: Make validate_column_family use data_dictionary::database And instantly convert the validate_keyspace() as it's not called from anywhere but the validate_column_family(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-01-14 13:00:53 +03:00
Pavel Emelyanov	71c3a7525b	client_state: Make has_access use data_dictionary::database This db argument is only needed to be pushed into cdc::is_log_for_some_table() helper. All callers already have the d._d.::database at hands and convert it into .real_database() call-time, so this patch effectively generalizes those calls to the .real_database(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-01-14 12:59:35 +03:00
Pavel Emelyanov	f22eb22b8b	client_state: Make has_schema_access use data_dictionary::database It's now called with d._d.::database converted to .real_database() right in the argument passing, so this change can be treated as the generalization of that .real_database() call. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-01-14 12:55:53 +03:00
Pavel Emelyanov	b6bc7a9b29	client_state: Make has_column_family_access use data_dictionary::database Straightforward replacement. Internals of the has_column_family_access() temporarily get .real_database(), but it will be changed soon. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-01-14 12:55:15 +03:00
Pavel Emelyanov	1ed237120a	client_state: Make has_keyspace_access use data_dictionary::database Straightforward replacement. Internals of the has_keyspace_access() temporarily get .real_database(), but it will be changed soon. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-01-14 12:54:01 +03:00
Kamil Braun	168c6f47f9	replica: database: allow disabling optimized TWCS queries through compaction strategy options As requested from field engineering, add a way to disable the optimized TWCS query algorithm (use regular query path) just in case a bug or a performance regression shows up in production. To disable the optimized query path, add 'enable_optimized_twcs_queries': 'false' to compaction strategy options, e.g. ``` alter table ks.t with compaction = {'class': 'TimeWindowCompactionStrategy', 'enable_optimized_twcs_queries': 'false'}; ``` Setting the `enable_optimized_twcs_queries` key to anything other than `'false'` (note: a boolean `false` expands to a string `'false'`) or skipping it (re)enables the optimized query path. Note: the flag can be set in a cluster in the middle of upgrade. Nodes which do not understand it simply ignore it, but they do store it in their schema tables (they store the entire `compaction` map). After these nodes are upgraded, they will understand the flag and act accordingly. Note: in the situation above, some nodes may use the optimized path and some may use the regular path. This may happen also in a fully upgraded cluster when compaction options are changed concurrently to reads; there is a short period of time where the schema change propagates and some nodes got the flag but some didn't. These should not be a problem since the optimization does not change the returned read results (unless there is a bug). Generally, the flag is not intended for normal use, but for field engineers to disable it in case of a serious problem. Ref #6418. Closes #9900	2022-01-14 07:10:02 +02:00
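The option semantics stated in commit 168c6f47f9 (only the string `'false'` disables the optimized path; any other value, or a missing key, enables it) could be expressed as in the sketch below. The helper name `optimized_twcs_queries_enabled` and the use of `std::map` for the compaction options are assumptions for illustration; an exact, case-sensitive comparison is also assumed, since the message does not say otherwise.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: decides whether the optimized TWCS query path is
// enabled, given the table's compaction strategy options.
// Assumption: the comparison against "false" is exact and case-sensitive.
bool optimized_twcs_queries_enabled(const std::map<std::string, std::string>& compaction_options) {
    auto it = compaction_options.find("enable_optimized_twcs_queries");
    // A missing key, or any value other than "false", keeps the path enabled.
    return it == compaction_options.end() || it->second != "false";
}
```

A node that does not understand the key would simply never call such a helper, which matches the note that older nodes store the option but ignore it.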
Kamil Braun	4c3fb9ac68	conf: update description of `reversed_reads_auto_bypass_cache` in scylla.yaml Message-Id: <20220111123937.10750-1-kbraun@scylladb.com>	2022-01-13 23:49:01 +01:00
Kamil Braun	fe0366f6bc	cdc: `check_and_repair_cdc_streams`: fix indentation	2022-01-13 23:10:18 +02:00
Juliusz Stasiewicz	ea46439858	cdc: `check_and_repair_cdc_streams`: regenerate if too many streams are present If the number of streams exceeds the number of token ranges it indicates that some spurious streams from decommissioned nodes are present. In such a situation - simply regenerate. Fixes #9772 Closes #9780	2022-01-13 23:10:18 +02:00
Nadav Har'El	a0cad9585f	merge: move tests to use new schema announcement API Merged patch series from Gleb Natapov: The series moves tests to use new schema announcement API and removes the old one. Gleb Natapov (7): test: convert database_test to new schema announcement api test use new schema announcement api in cql_test_env.cc test: move cql_query_test.cc to new schema announcement api test: move memtable_test.cc to new schema announcement api test: move schema_change_test.cc to new schema announcement api migration_manager: drop unused announce_ functions migration_manager: assert that raft ops are done on shard 0 service/migration_manager.hh \| 5 --- service/migration_manager.cc \| 52 ++++++++------------------------ test/boost/cql_query_test.cc \| 3 +- test/boost/database_test.cc \| 5 +-- test/boost/memtable_test.cc \| 2 +- test/boost/schema_change_test.cc \| 18 ++++++----- test/lib/cql_test_env.cc \| 2 +- 7 files changed, 31 insertions(+), 56 deletions(-)	2022-01-13 23:10:18 +02:00
Gleb Natapov	0169e4d7ed	migration_manager: assert that raft ops are done on shard 0 Now that all consumers run on shard zero we can assert it.	2022-01-13 23:10:18 +02:00
Gleb Natapov	1ff85020b5	migration_manager: drop unused announce_ functions	2022-01-13 23:10:18 +02:00
Gleb Natapov	f0a41c102a	test: move schema_change_test.cc to new schema announcement api	2022-01-13 23:10:18 +02:00
Gleb Natapov	512556914a	test: move memtable_test.cc to new schema announcement api	2022-01-13 23:10:13 +02:00
Botond Dénes	d6efe27545	Merge 'db: config: add a flag to disable new reversed reads algorithm' from Kamil Braun Just in case the new algorithm turns out to be buggy, or give a performance regression, add a flag to fall-back to the old algorithm for use in the field. Closes #9908 * github.com:scylladb/scylla: db: config: add a flag to disable new reversed reads algorithm replica: table: remove obsolete comment about reversed reads	2022-01-13 23:09:02 +02:00
Gleb Natapov	be46109af6	test: move cql_query_test.cc to new schema announcement api	2022-01-13 23:09:02 +02:00
Avi Kivity	63d254a8d2	Merge 'gms, service: futurize and coroutinize gossiper-related code' from Pavel Solodovnikov This series greatly reduces gossipers' dependence on `seastar::async` (yet, not completely). `i_endpoint_state_change_subscriber` callbacks are converted to return futures (again, to get rid of `seastar::async` dependency), all users are adjusted appropriately (e.g. `storage_service`, `cdc::generation_service`, `streaming::stream_manager`, `view_update_backlog_broker` and `migration_manager`). This includes futurizing and coroutinizing the whole function call chain up to the `i_endpoint_state_change_subscriber` callback functions. To aid the conversion process, a non-`seastar::async` dependent variant of `utils::atomic_vector::for_each` is introduced (`for_each_futurized`). A different name is used to clearly distinguish converted and non-converted code, so that the last step (remove `seastar::async()` wrappers around callback-calling code in gossiper) is easier. This is left for a follow-up series, though. 
Tests: unit(dev) Closes #9844 * github.com:scylladb/scylla: service: storage_service: coroutinize `set_gossip_tokens` service: storage_service: coroutinize `leave_ring` service: storage_service: coroutinize `handle_state_left` service: storage_service: coroutinize `handle_state_leaving` service: storage_service: coroutinize `handle_state_removing` service: storage_service: coroutinize `do_drain` service: storage_service: coroutinize `shutdown_protocol_servers` service: storage_service: coroutinize `excise` service: storage_service: coroutinize `remove_endpoint` service: storage_service: coroutinize `handle_state_replacing` service: storage_service: coroutinize `handle_state_normal` service: storage_service: coroutinize `update_peer_info` service: storage_service: coroutinize `do_update_system_peers_table` service: storage_service: coroutinize `update_table` service: storage_service: coroutinize `handle_state_bootstrap` service: storage_service: futurize `notify_*` functions service: storage_service: coroutinize `handle_state_replacing_update_pending_ranges` repair: row_level_repair_gossip_helper: coroutinize `remove_row_level_repair` locator: reconnectable_snitch_helper: coroutinize `reconnect` gms: i_endpoint_state_change_subscriber: make callbacks to return futures utils: atomic_vector: introduce future-returning `for_each` function utils: atomic_vector: rename `for_each` to `thread_for_each` gms: gossiper: coroutinize `start_gossiping` gms: gossiper: coroutinize `force_remove_endpoint` gms: gossiper: coroutinize `do_status_check` gms: gossiper: coroutinize `remove_endpoint`	2022-01-13 23:09:02 +02:00
Gleb Natapov	100b44f5ff	test use new schema announcement api in cql_test_env.cc	2022-01-13 23:09:02 +02:00
Avi Kivity	230eac439e	Update seastar submodule * seastar ae8d1c28a2...5025cd44ea (2): > Merge "Lazy IO capacity replenishment" from Pavel E Fixes #9893 > configure.py: don't use deprecated mktemp()	2022-01-13 23:09:02 +02:00
Gleb Natapov	5dffc8ed3e	test: convert database_test to new schema announcement api	2022-01-13 23:09:02 +02:00
Gleb Natapov	c500a90902	raft service: make one way raft messages truly one way Raft core does not expect replies for most messages it sends, but they are defined as two way by the IDL currently. Fix them to be one way.	2022-01-13 13:14:46 +02:00
Gleb Natapov	b1fea20d36	raft: move raft verbs to the IDL	2022-01-13 13:14:46 +02:00
Gleb Natapov	8a25b740df	raft: split idl to rpc and storage Storage uses only small part of the IDL, so it can include only the part that is relevant to it.	2022-01-13 13:14:46 +02:00
Gleb Natapov	b0dee71b34	idl-compiler: always produce const variant of serializers Currently const variant is produced only if a type and its const usage are in the same idl file, but a type can be defined in one file and used as const in another.	2022-01-13 13:14:46 +02:00
Gleb Natapov	c5474f9ac2	raft: simplify raft idl definitions We may use high level types in the IDL.	2022-01-13 13:14:46 +02:00
Nadav Har'El	f842f65794	Merge 'thrift: switch to replica::database uses to data_dictionary' from Avi Kivity replica::database is (as its name indicates) a replica-side service, while thrift is coordinator-side. Convert thrift's use of replica::database for data dictionary lookups to the data_dictionary module. Since data_dictionary was missing a get_keyspaces() operation, add that. Thrift still uses replica::database to get the schema version. That should be provided by migration_manager, but changing that is left for later. Closes #9888 * github.com:scylladb/scylla: thrift: switch from replica module to data_dictionary module thrift: simplify execute_schema_command() calling convention data_dictionary: add get_keyspaces() method	2022-01-13 10:52:30 +02:00
Nadav Har'El	343c521e28	alternator: avoid large contiguous allocation in BatchGetItem The BatchGetItem request can return a very large response - according to DynamoDB documentation up to 16 MB, but presently in Alternator, we allow even more (see #5944). The problem is that the existing code prepares the entire response as a large contiguous string, resulting in oversized allocation warnings - and potentially allocation failures. So in this patch we estimate the size of the BatchGetItem response, and if it is "big enough" (currently over 100 KB), we return it with the recently added streaming output support. This streaming output doesn't avoid the extra memory copies unfortunately, but it does avoid a contiguous allocation which is the goal of this patch. After this patch, one oversized allocation warning is gone from the test: test/alternator/run test_batch.py::test_batch_get_item_large (a second oversized allocation is still present, but comes from the unrelated BatchWriteItem issue #8183). Fixes #8522 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220111170541.637176-1-nyh@scylladb.com>	2022-01-13 09:46:08 +01:00
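The size-based decision described in commit 343c521e28 might be sketched as follows. The names `estimate_response_size`, `choose_response_mode`, and `response_mode` are hypothetical, the estimate is simplified to a sum of serialized item lengths, and reading "100 KB" as 100 * 1024 bytes is an assumption; the real Alternator code may estimate and compare differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumption: "over 100 KB" is interpreted here as more than 100 * 1024 bytes.
constexpr std::size_t streaming_threshold_bytes = 100 * 1024;

enum class response_mode { contiguous, streamed };

// Simplified estimate: the sum of the lengths of the serialized items.
std::size_t estimate_response_size(const std::vector<std::string>& serialized_items) {
    std::size_t total = 0;
    for (const auto& item : serialized_items) {
        total += item.size();
    }
    return total;
}

// Responses above the threshold are streamed, which avoids building one
// large contiguous string; smaller ones are returned as a single buffer.
response_mode choose_response_mode(std::size_t estimated_size) {
    return estimated_size > streaming_threshold_bytes ? response_mode::streamed
                                                      : response_mode::contiguous;
}
```

As the commit notes, streaming in this sense avoids the contiguous allocation but not the extra memory copies.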
Kamil Braun	e98711cfcb	db: config: add a flag to disable new reversed reads algorithm Just in case the new algorithm turns out to be buggy, or give a performance regression, add a flag to fall-back to the old algorithm for use in the field.	2022-01-12 18:59:19 +01:00
Avi Kivity	6205d40d5f	thrift: switch from replica module to data_dictionary module Thrift is a coordinator-side service and should not touch the replica module. Switch it to data_dictionary. The switch is straightforward with two exceptions: - client_state still receives replica::database parameters. After this change it will be easier to adapt client_state too. - calls to replica::database::get_version() remain. They should be rerouted to migration_manager instead, as that deals with schema management.	2022-01-12 19:54:38 +02:00
Kamil Braun	7fb7a406e7	replica: table: remove obsolete comment about reversed reads	2022-01-12 17:57:08 +01:00
Avi Kivity	85061b694b	thrift: simplify execute_schema_command() calling convention execute_schema_command is always called with the same first two parameters, which are always defined from the thrift_handler instance that contains its caller. Simplify it by making it a member function. This simplifies migration to data_dictionary in the next patch.	2022-01-12 18:56:47 +02:00
Avi Kivity	631a19884d	data_dictionary: add get_keyspaces() method Mirroring replica::database::get_keyspaces(), for Thrift's use. We return a vector instead of a hash map. Random access is already available via database::find_keyspace(). The name is available via the keyspace metadata, and in fact Thrift ignores the map name and uses the metadata name. Using a simpler type reduces include dependencies for this heavily used module. The function is plumbed to replica::database::get_keyspaces() so it returns the same data.	2022-01-12 18:24:38 +02:00
Raphael S. Carvalho	a144d30162	distributed_loader: postpone reshape of repair-originated sstables SSTables created by repair will potentially not conform to the compaction strategy layout goal. If node shuts down before off-strategy has a chance to reshape those files, node will be forced to reshape them on restart. That causes unexpected downtime. Turns out we can skip reshape of those files on boot, and allow them to be reshaped after node becomes online, as if the node never went down. Those files will go through same procedure as files created by repair-based ops. They will be placed in maintenance set, and be reshaped iteratively until ready for integration into the main set. Fixes #9895. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-01-12 13:14:31 -03:00
Nadav Har'El	8bcd23fa02	Merge: move rest of internal ddl users to use raft from Gleb The patch series moves the rest of internal ddl users to do schema change over raft (if enabled). After that series only tests are left using old API. * 'gleb/raft-schema-rest-v6' of github.com:scylladb/scylla-dev: (33 commits) migration_manager: drop no longer used functions system_distributed_keyspace: move schema creation code to use raft auth: move table creation code to use raft auth: move keyspace creation code to use raft table_helper: move schema creation code to use raft cql3: make query_processor inherit from peering_sharded_service table_helper: make setup_table() static table_helper: co-routinize setup_keyspace() redis: move schema creation code to go through raft thrift: move system_update_column_family() to raft thrift: authenticate a statement before verifying in system_update_column_family() thrift: co-routinize system_update_column_family() thrift: move system_update_keyspace() to raft thrift: authenticate a statement before verifying in system_update_keyspace() thrift: co-routinize system_update_keyspace() thrift: move system_drop_keyspace() to raft thrift: authenticate a statement before verifying in system_drop_keyspace() thrift: co-routinize system_drop_keyspace() thrift: move system_add_keyspace() to raft thrift: co-routinize system_add_keyspace() ...	2022-01-12 18:09:08 +02:00
Raphael S. Carvalho	f9e33f7046	sstables: Introduce filter for sstable_directory::reshape This will be useful to allow sstable_directory user to filter out sstables that should not be reshaped. The default filter is implemented as including everything. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-01-12 11:54:17 -03:00
Gleb Natapov	2aec9009ef	migration_manager: drop no longer used functions	2022-01-12 16:40:06 +02:00
Gleb Natapov	9ce62bcc33	system_distributed_keyspace: move schema creation code to use raft	2022-01-12 16:40:06 +02:00
Gleb Natapov	50b7806c57	auth: move table creation code to use raft	2022-01-12 16:40:06 +02:00
Gleb Natapov	4273a3308c	auth: move keyspace creation code to use raft	2022-01-12 16:40:06 +02:00
Gleb Natapov	03184bd786	table_helper: move schema creation code to use raft	2022-01-12 16:40:06 +02:00
Gleb Natapov	eb62e81843	cql3: make query_processor inherit from peering_sharded_service This is how we can get to a distributed object from a shard-local one.	2022-01-12 16:40:06 +02:00

29806 Commits