scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-06 23:13:15 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	562fcf0c19	locator: Keep optional initial_tablets on r.s. params Now all the callers have it at hands (spoiler: not yet initialized, but still) so the params can also have it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-25 16:02:41 +03:00
Pavel Emelyanov	a67c535539	keyspace_metadata: Carry optional<initial_tablets> on board The object in question fully describes the keyspace to be created and, among other things, contains replication strategy options. Next patches move the "initial_tablets" option out of those options and keep it separately, so the ks metadata should also carry this option separately. This patch is _just_ extending the metadata creation API, in fact the new field is unused (write-only) so all the places that need to provide this data keep it disengaged and are explicitly marked with FIXME comment. Next patches will fix that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-25 15:58:05 +03:00
Pavel Emelyanov	a943bd927b	locator: Call create_replication_strategy() with r.s. params Previous patch added params to r.s. classes' constructors, but callers don't construct those directly, instead they use the create_r.s.() wrapper. This patch adds params to the wrapper too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-25 15:54:59 +03:00
Nadav Har'El	79011eeb24	Merge 'virtual_tables, schema_registry: fix use after free related to schema registry' from Avi Kivity Both virtual tables and schema registry contain thread_local caches that are destroyed at thread exit. after a Seastar change[1], these destructions can happen after the reactor is destroyed, triggering a use-after-free. Fix by scoping the destruction so it takes place earlier. [1] `101b245ed7` Closes scylladb/scylladb#16510 * github.com:scylladb/scylladb: schema_registry, database: flush entries when no longer in use virtual_tables: scope virtual tables registry in system_keyspace	2023-12-21 17:10:25 +02:00
Avi Kivity	c00b376a3e	schema_registry, database: flush entries when no longer in use The schema registry disarms internal timers when it is destroyed. This accesses the Seastar reactor. However, after [1] we don't have ordering between the reactor destruction and the thread_local registry destruction. Fix this by flushing all entries when the database is destroyed. The database object is fundamental so it's unlikely we'll have anything using the registry after it's gone. [1] `101b245ed7`	2023-12-21 17:00:41 +02:00
Kefu Chai	6018e0fea7	database: log when done with truncating truncating is an unusual operation, and we write a logging message when the truncate op starts with INFO level, it would be great if we can have a matching logging messge indicating the end of truncate on the server side. this would help with investigation the TRUNCATE timeout spotted on the client. at least we can rule out the problem happening we server is performing truncate. Refs #15610 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16247	2023-12-21 13:59:09 +02:00
Raphael S. Carvalho	5e55954f27	replica: Make the storage snapshot survive concurrent compactions Consider this: 1) file streaming takes storage snapshot = list of sstables 2) concurrent compaction unlink some of those sstables from file system 3) file streaming tries to send unlinked sstables, but files other than data and index cannot be read as only data and index have file descriptors opened To fix it, the snapshot now returns a set of files, one per sstable component, for each sstable. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#16476	2023-12-21 12:50:28 +02:00
Raphael S. Carvalho	5fa69b8a67	replica: Fix indentation Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-12-18 10:23:22 -03:00
Raphael S. Carvalho	8a9784d29c	replica: Kill unused calculate_disk_space_used_for() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-12-18 10:22:19 -03:00
Raphael S. Carvalho	546b31846a	replica: Introduce storage group splitting This introduces the ability to split a storage group. The main compaction group is split into left and right groups. set_split() is used to set the storage group to splitting mode, which will create left and right compaction groups. Incoming writes will now be placed into memtable of either left or right groups. split() is used to complete the splitting of a group. It only returns when all preexisting data is split. That means main compaction group will be empty and all the data will be stored in either left or right group. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-12-17 12:02:01 -03:00
Raphael S. Carvalho	3c5b00ea04	replica: Add storage_group::memtable_count() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-12-17 11:40:09 -03:00
Raphael S. Carvalho	e5a9299696	replica: Add compaction_group::empty() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-12-17 11:40:09 -03:00
Raphael S. Carvalho	213b2f1382	replica: Rename compaction_group_manager to storage_group_manager That's to reflect the fact that the manager now works with storage groups instead. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-12-17 11:40:09 -03:00
Raphael S. Carvalho	15de1cdcbc	replica: Introduce concept of storage group Storage group is the storage of tablets. This new concept is helpful for tablet splitting, where the storage of tablet will be split in multiple compaction groups, where each can be compacted independently. The reason for not going with arena concept is that it added complexity, and it felt much more elegant to keep compaction group unchanged which at the end of the day abstracts the concept of a set of sstables that can be compacted and operated independently. When splitting, the storage group for a tablet may therefore own multiple compaction groups, left, right, and main, where main keeps the data that needs splitting. When splitting completes, only left and right compaction groups will be populated. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-12-17 11:40:09 -03:00
Raphael S. Carvalho	55bcfba4de	replica: Allow uncompacted SSTables to be moved into a new set With off-strategy, we allow sstables to be moved into a new sstable set even if they didn't undergo reshape compaction. That's done by specifying a sstable is present both in input and output, with the completion desc. We want to do the same with other compaction types. Think for example of split compaction: compaction manager may decide a sstable doesn't need splitting, yet it wants that sstable to be moved into a new sstable set. Theoretically, we could introduce new code to do this movement, but more code means increased maintenance burden and higher chances of bugs. It makes sense to reuse the compaction completion path, as we do today with off-strategy. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-12-17 11:40:09 -03:00
Avi Kivity	7fce057cda	database, reader_concurrency_sempaphore: deduplicate reader_concurrency_sempaphore metrics reader_concurrency_sempaphore are triplicated: each metrics is registered for streaming, user, and system classes. To fix, just move the metrics registration from database to reader_concurrency_sempaphore, so each reader_concurrency_sempaphore instantiated will register its metrics (if its creator asked for it). Adjust the names given to reader_concurrency_sempaphore so we don't change the labels. scylla-gdb is adjusted to support the new names.	2023-12-13 09:16:18 -05:00
Botond Dénes	e1b30f50be	reader_concurrency_semaphore: add register_metrics constructor parameter To be used in the next patch to control whether the semaphore registers and exports metrics or not. We want to move metric registration to the semaphore but we don't want all semaphores to export metrics. The decision on whether a semaphore should or shouldn't export metrics should be made on a case-by-case basis so this new parameter has no default value (except for the for_tests constructor).	2023-12-13 06:25:45 -05:00
Avi Kivity	814f3eb6b5	sstables: name sstables_manager Soon, the reader_concurrency_semaphore will require a unique and meaningful name in order to label its metrics. To prepare for that, name sstable_manager instances. This will be used to generate a name for sstable_manager's reader_concurrency_semaphore.	2023-12-13 04:40:33 -05:00
Benny Halevy	cddcf3ad0c	table: add table_holder and hold method A smart pointer that guards the table object while it's being used by async functions. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-12 08:43:49 +02:00
Benny Halevy	c8768f9102	table: stop: allow compactions to be stopped while closing async_gate To make sure a table object is kept valid throughout the lifetime of compaction a following patch will enter the table's _async_gate when the compaction task starts. This change defers awaiting the gate.close future till after stopping ongoing compaction so that closing the gate will prevent starting new compactions while ongoing compaction can be stopped and finally awaiting the close() future will wait for them to unwind and exit the gate after being stopped. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-12 08:31:50 +02:00
Nadav Har'El	12f0007ede	Merge 'Skip auto snapshot for non-local storages' from Pavel Emelyanov When a table is truncated or dropped it can be auto-snapshotted if the respective config option is set (by default it is). Non local storages don't implement snapshotting yet and emit on_internal_error() in that case aborting the whole process. It's better to skip snapshot with a warning instead. Closes scylladb/scylladb#16220 * github.com:scylladb/scylladb: database: Do not auto snapshot non-local storages' tables database: Simplify snapshot booleans in truncate_table_on_all_shards()	2023-12-11 12:13:48 +02:00
Avi Kivity	9c0f05efa1	Merge 'Track tablet streaming under global sessions to prevent side-effects of failed streaming' from Tomasz Grabiec Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later. This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted. The tracking and fencing is implemented by using global "sessions", created for streaming of a single tablet. Session is globally identified by UUID. The identifier is assigned by the topology change coordinator, and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained. The barrier is blocked only if there is some session with work which was left behind by unsuccessful streaming. In which case it should not be blocked for long, because streaming process checks often if the guard was left behind and stops if it was. This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas. Closes scylladb/scylladb#15847 * github.com:scylladb/scylladb: test: tablets: Add test for failed streaming being fenced away error_injection: Introduce poll_for_message() error_injection: Make is_enabled() public api: Add API to kill connection to a particular host range_streamer: Do not block topology change barriers around streaming range_streamer, tablets: Do not keep token metadata around streaming tablets: Fail gracefully when migrating tablet has no pending replica storage_service, api: Add API to disable tablet balancing storage_service, api: Add API to migrate a tablet storage_service, raft topology: Run streaming under session topology guard storage_service, tablets: Use session to guard tablet streaming tablets: Add per-tablet session id field to tablet metadata service: range_streamer: Propagate topology_guard to receivers streaming: Always close the rpc::sink storage_service: Introduce concept of a topology_guard storage_service: Introduce session concept tablets: Fix topology_metadata_guard holding on to the old erm docs: Document the topology_guard mechanism	2023-12-07 16:29:02 +02:00
Pavel Emelyanov	3eaadfcd4a	database: Do not auto snapshot non-local storages' tables Snapshotting is not yet supported for those (see #13025) and auto-snapshot would step on internal error. Skip it and print a warning into logs fixes #16078 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-07 13:47:12 +03:00
Pavel Emelyanov	44c076472c	database: Simplify snapshot booleans in truncate_table_on_all_shards() There are three of them in this function -- with_snapshot argument, auto_snapshot local copy of db::config option and the should_snapshot local variable that's && of the above two. The code can go with just one Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-07 13:06:28 +03:00
Asias He	67cfa12c7d	compaction_group_for_token: Handle minimum_token and maximum_token token The following error was seen: [shard 0] table - compaction_group_for_token: compaction_group idx=0 range=(minimum token,-6917529027641081857] does not contain token=minimum token Since minimum_token or maximum_token will not be inside a token range. Skip the in token range check.	2023-12-07 14:54:12 +08:00
Tomasz Grabiec	7a59acf248	tablets: Fail gracefully when migrating tablet has no pending replica Before the patch we SIGSEGV trying to access pending replica in this case. Fail early instead.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	5381792401	tablets: Add per-tablet session id field to tablet metadata range_streamer will pick it up when creating topology_guard. It's materialized in memory only for migrating tablets in tablet_transition_info.	2023-12-06 18:36:17 +01:00
Botond Dénes	d2a88cd8de	Merge 'Typos: fix typos in code' from Yaniv Kaul Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors. Refs: https://github.com/scylladb/scylladb/issues/16255 Closes scylladb/scylladb#16289 * github.com:scylladb/scylladb: Update unified/build_unified.sh Update main.cc Update dist/common/scripts/scylla-housekeeping Typos: fix typos in code	2023-12-06 07:36:41 +02:00
Avi Kivity	12f160045b	Merge 'Get rid of fb_utilities' from Benny Halevy utils::fb_utilities is a global in-memory registry for storing and retrieving broadcast_address and broadcat_rpc_address. As part of the effort to get rid of all global state, this series gets rid of fb_utilities. This will eventually allow e.g. cql_test_env to instantiate multiple scylla server nodes, each serving on its own address. Closes scylladb/scylladb#16250 * github.com:scylladb/scylladb: treewide: get rid of now unused fb_utilities tracing: use locator::topology rather than fb_utilities streaming: use locator::topology rather than fb_utilities raft: use locator::topology/messaging rather than fb_utilities storage_service: use locator::topology rather than fb_utilities storage_proxy: use locator::topology rather than fb_utilities service_level_controller: use locator::topology rather than fb_utilities misc_services: use locator::topology rather than fb_utilities migration_manager: use messaging rather than fb_utilities forward_service: use messaging rather than fb_utilities messaging_service: accept broadcast_addr in config rather than via fb_utilities messaging_service: move listen_address and port getters inline test: manual: modernize message test table: use gossiper rather than fb_utilities repair: use locator::topology rather than fb_utilities dht/range_streamer: use locator::topology rather than fb_utilities db/view: use locator::topology rather than fb_utilities database: use locator::topology rather than fb_utilities db/system_keyspace: use topology via db rather than fb_utilities db/system_keyspace: save_local_info: get broadcast addresses from caller db/hints/manager: use locator::topology rather than fb_utilities db/consistency_level: use locator::topology rather than fb_utilities api: use locator::topology rather than fb_utilities alternator: ttl: use locator::topology rather than fb_utilities gossiper: use locator::topology rather than fb_utilities gossiper: add get_this_endpoint_state_ptr test: lib: cql_test_env: pass broadcast_address in cql_test_config init: get_seeds_from_db_config: accept broadcast_address locator: replication strategies: use locator::topology rather than fb_utilities locator: topology: add helpers to retrieve this host_id and address snitch: pass broadcast_address in snitch_config snitch: add optional get_broadcast_address method locator: ec2_multi_region_snitch: keep local public address as member ec2_multi_region_snitch: reindent load_config ec2_multi_region_snitch: coroutinize load_config ec2_snitch: reindent load_config ec2_snitch: coroutinize load_config thrift: thrift_validation: use std::numeric_limits rather than fb_utilities	2023-12-05 19:40:14 +02:00
Yaniv Kaul	ae2ab6000a	Typos: fix typos in code Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors. Refs: https://github.com/scylladb/scylladb/issues/16255	2023-12-05 15:18:11 +02:00
Tomasz Grabiec	2d4cd9c574	tablets: Fix topology_metadata_guard holding on to the old erm Since abort callbacks are fired synchronously, we must change the table's erm before we do that so that the callbacks obtain the new erm. Otherwise, we will block barriers.	2023-12-05 14:09:34 +01:00
Pavel Emelyanov	9bbbe7a99f	discard_sstables: Atomically delete all sstables When collected sstables are deleted each is passed into sstables_manager.delete_atomically(). For on-disk sstables this creates a deletion log for each removed stable, which is quite an overkill. The atomic deletion callback already accepts vector of shared sstables, so it's simpler (and a bit faster) to remove them all in a batch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-05 11:14:23 +03:00
Pavel Emelyanov	96bc530a57	discard_sstables: Indentation and formatting fix after previous patch By "formatting" fix I mean -- remove the temporary on-stack references that were left for the ease of patching Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-05 11:13:40 +03:00
Pavel Emelyanov	6d135fea43	discard_sstable: Open-code local prune() lambda The lambda in question was the struct pruner method and was left there for the ease of patching. Now, when this lambda is only called once inside the function it is declared in, it can be open-coded into the place where it's called Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-05 11:13:40 +03:00
Pavel Emelyanov	68cb2e66fc	discard_sstables: Do not allocate pruner This allocation remained from the pre-coroutine times of the method. Now the contents of prumer -- refernce on table, vector and replay_position can reside on coroutine frame Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-05 11:13:40 +03:00
Benny Halevy	f9acc90926	table: use gossiper rather than fb_utilities Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 09:43:47 +02:00
Benny Halevy	f40bb7c583	database: use locator::topology rather than fb_utilities Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 08:42:49 +02:00
Patryk Jędrzejczak	c8ee7d4499	db: make schema commitlog feature mandatory Using consistent cluster management and not using schema commitlog ends with a bad configuration throw during bootstrap. Soon, we will make consistent cluster management mandatory. This forces us to also make schema commitlog mandatory, which we do in this patch. A booting node decides to use schema commitlog if at least one of the two statements below is true: - the node has `force_schema_commitlog=true` config, - the node knows that the cluster supports the `SCHEMA_COMMITLOG` cluster feature. The `SCHEMA_COMMITLOG` cluster feature has been added in version 5.1. This patch is supposed to be a part of version 6.0. We don't support a direct upgrade from 5.1 to 6.0 because it skips two versions - 5.2 and 5.4. So, in a supported upgrade we can assume that the version which we upgrade from has schema commitlog. This means that we don't need to check the `SCHEMA_COMMITLOG` feature during an upgrade. The reasoning above also applies to Scylla Enterprise. Version 2024.2 will be based on 6.0. Probably, we will only support an upgrade to 2024.2 from 2024.1, which is based on 5.4. But even if we support an upgrade from 2023.x, this patch won't break anything because 2023.1 is based on 5.2, which has schema commitlog. Upgrades from 2022.x definitely won't be supported. When we populate a new cluster, we can use the `force_schema_commitlog=true` config to use schema commitlog unconditionally. Then, the cluster feature check is irrelevant. This check could fail because we initiate schema commitlog before we learn about the features. The `force_schema_commitlog=true` config is especially useful when we want to use consistent cluster management. Failing feature checks would lead to crashes during initial bootstraps. Moreover, there is no point in creating a new cluster with `consistent_cluster_management=true` and `force_schema_commitlog=false`. It would just cause some initial bootstraps to fail, and after successful restarts, the result would be the same as if we used `force_schema_commitlog=true` from the start. In conclusion, we can unconditionally use schema commitlog without any checks in 6.0 because we can always safely upgrade a cluster and start a new cluster. Apart from making schema commitlog mandatory, this patch adds two changes that are its consequences: - making the unneeded `force_schema_commitlog` config unused, - deprecating the `SCHEMA_COMMITLOG` feature, which is always assumed to be true. Closes scylladb/scylladb#16254	2023-12-04 21:02:16 +02:00
Nadav Har'El	4505a86f46	tablets, mv: fix base-view pairing to consider base replication map In the view update code, the function get_view_natural_endpoint() determines which view replica this base replica should send an update to. It currently gets the view table's replication map (i.e., the map from view tokens to lists of replicas holding the token), but assumes that this is also the base table's replication map. This assumption was true with vnodes, but is no longer true with tablets - the base table's replication map can be completely different from the view table's. By looking at the wrong mapping, get_view_natural_endpoint() can believe that this node isn't really a base-replica and drop the view update. Alternatively, it can think it is a base replica - but use the wrong base-view pairing and create base-view inconsistencies. This patch solves this bug - get_view_natural_endpoint() now gets two separate replication maps - the base's and the view's. The callers need to remember what the base table was (in some cases they didn't care at the point of the call), and pass it to the function call. This patch also includes a simple test that reproduces the bug, and confirms it is fixed: The test has a 6-node cluster using tablets and a base table with RF=1, and writes one row to it. Before this patch, the code usually gets confused, thinking the base replica isn't a replica and loses the view update. With this patch, the view update works. Fixes #16227. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16228	2023-12-04 16:38:54 +02:00
Yaniv Kaul	c658bdb150	Typos: fix typos in comments Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2023-12-02 22:37:22 +02:00
Benny Halevy	66ba983fe0	compaction_manager: flush_all_tables before major compaction Major compaction already flushes each table to make sure it considers any mutations that are present in the memtable for the purpose of tombstone purging. See `64ec1c6ec6` However, tombstone purging may be inhibited by data in commitlog segments based on `gc_time_min` in the `tombstone_gc_state` (See `f42eb4d1ce`). Flushing all sstables in the database release all references to commitlog segments and there it maximizes the potential for tombstone purging, which is typically the reason for running major compaction. However, flushing all tables too frequently might result in tiny sstables. Since when flushing all keyspaces using `nodetool flush` the `force_keyspace_compaction` api is invoked for keyspace successively, we need a mechanism to prevent too frequent flushes by major compaction. Hence a `compaction_flush_all_tables_before_major_seconds` interval configuration option is added (defaults to 24 hours). In the case that not all tables are flushed prior to major compaction, we revert to the old behavior of flushing each table in the keyspace before major-compacting it. Fixes scylladb/scylladb#15777 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-28 16:37:42 +02:00
Benny Halevy	be763bea34	database: add flush_all_tables Flushes all tables after forcing force_new_active_segment of the commitlog to make sure all commitlog segments can get recycled. Otherwise, due to "false sharing", rarely-written tables might inhibit recycling of the commitlog segments they reference. After `f42eb4d1ce`, that won't allow compaction to purge some tombstones based on the min_gc_time. To be used in the next patch by major compaction. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-28 16:37:42 +02:00
Benny Halevy	1fd85bd37b	api: compaction: add flush_memtables option When flushing is done externally, e.g. by running `nodetool flush` prior to `nodetool compact`, flush_memtables=false can be passed to skip flushing of tables right before they are major-compacted. This is useful to prevent creation of small sstables due to excessive memtable flushing. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-28 16:37:42 +02:00
Botond Dénes	f46cdce9d3	Merge 'Make memtable flush tolerate misconfigured S3 storage' from Pavel Emelyanov Nowadays if memtable gets flushed into misconfigured S3 storage, the flush fails and aborts the whole scylla process. That's not very elegant. First, because upon restart garbage collecting non-sealed sstables would fail again. Second, because re-configuring an endpoint can be done runtime, scylla re-reads this config upon HUP signal. Flushing memtable restarts when seeing ENOSPC/EDQUOT errors from on-disk sstables. This PR extends this to handle misconfigured S3 endpoints as well. fixes: #13745 Closes scylladb/scylladb#15635 * github.com:scylladb/scylladb: test: Add object_store test to validate config reloading works test: Add config update facility to test cluster test: Make S3_Server export config file as pathlib.Path config: Make object storage config updateable_value_source memtable: Extend list of checking codes sstables/storage/s3: Fix missing TOC status check s3/client: Map http exceptions into storage_io_error exceptions: Extend storage_io_error construction options	2023-11-28 09:33:37 +02:00
Raphael S. Carvalho	157a5c4b1b	treewide: Avoid using namespace sstables in header to avoid conflicts That's needed for compaction_group.hh to be included in headers. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-11-23 17:36:57 +02:00
Tomasz Grabiec	b06a0078fb	Merge 'Support for sending tablet info to the drivers' from Sylwia Szunejko There is a need for sending tablet info to the drivers so they can be tablet aware. For the best performance we want to get this info lazily only when it is needed. The info is send when driver asks about the information that the specific tablet contains and it is directed to the wrong node/shard so it could use that information for every subsequent query. If we send the query to the wrong node/shard, we want to send the RESULT message with additional information about the tablet (replicas and token range) in custom_payload. Mechanism for sending custom_payload added. Sending custom_payload tested using three node cluster and cqlsh queries. I used RF=1 so choosing wrong node was testable. I also manually tested it with the python-driver and confirmed that the tablet info can be deserialized properly. Automatic tests added. Closes scylladb/scylladb#15410 * github.com:scylladb/scylladb: docs: add documentation about sending tablet info to protocol extensions Add tests for sending tablet info cql3: send tablet if wrong node/shard is used during modification statement cql3: send tablet if wrong node/shard is used during select statement locator: add function to check locality locator: add function to check if host is local transport: add function to add tablet info to the result_message transport: add support for setting custom payload	2023-11-22 17:44:07 +02:00
sylwiaszunejko	93420353f4	transport: add function to add tablet info to the result_message	2023-11-21 15:15:20 +01:00
Pavel Emelyanov	9eb96a03f0	memtable: Extend list of checking codes When flushing an sstable there can be errors that are not fatal and shouldn't cause the whole scylla to die. Currently only ENOSPC and EDQUOT are considered as such, but there's one more possibility -- access denied errors. Those can happen, for example, if datadir is chmod/chown-ed by mistake or intentionally while scylla is running (doing it pre-start time won't trigger the issue as distributed loader checks permissions of datadir on boot). Another option to step on "access denied" error is to flush memtable on S3 storage with broken configuration. Anyway, seeing the access denied error is also a good reason not to crash, but print a warning in logs and retry in a hope that the node administrator fixed things. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-21 16:47:50 +03:00
Nadav Har'El	0fd10690d4	Merge 'When creating S3-backed keyspace, check the endpoint instantly' from Pavel Emelyanov Currently CREATE KEYSPACE ... WITH STORAGE = { 'type' = 'S3' ... } will create keyspace even if the backend configuration is "invalid" in the sense that the requested endpoint is not known to scylla via object_storage.yaml config file. The first time after that when this misconfiguration will reveal itself is when flushing a memtable (see #15635), but it's good to know the endpoint is not configured earlier than that. fixes: #15074 Closes scylladb/scylladb#16038 * github.com:scylladb/scylladb: test: Add validation of misconfigured storage creation sstables: Throw early if endpoint for keyspace is not configured replica: Move storage options validation to sstables manager test/cql-pytest/test_keyspaces: Move DESCRIBE case to object store sstables: Add has_endpoint_client() helper to manager	2023-11-20 21:12:48 +02:00
Pavel Emelyanov	f2a99ad30a	replica: Move storage options validation to sstables manager Currently the cql statement .validate() callback is responsible for checking if the non-local storage options are allowed with the respective feature. Next patch will need to extend this check to also validate the details of the provided storage options, but doing it at cql level doesn't seem correct -- it's "too far" from query processor down to sstables manager. Good news is that there's a lower-level validation of the new keyspace, namely the database::validate_new_keyspace() call. Move the storage options validation into sstables manager, while at it, reimplement it as a visitor to facilitate further extentions and plug the new validation to the aforementioned database::validate_new_keyspace(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-20 15:24:59 +03:00

1 2 3 4 5 ...

989 Commits