scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 19:10:42 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	f6eae191ff	sstables/storage: Do storage init/destroy based on storage options It's only local storage type that needs directories touch/remove, S3 storage initialization is for now a no-op, maybe some day soon it will appear. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-08 20:23:16 +03:00
Pavel Emelyanov	11b704e8b8	replica/{ks|cf}: Move storage init/destroy to sstables manager It's the manager that knows about storages and it should init/destroy it. Also the "upload" and "staging" paths are about to be hidden in sstables/ code, this code move also facilitates that. The indentation in storage.cc is deliberately broken to make next patch look nicer (spoiler: it won't have to shift those lines right). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-08 20:23:16 +03:00
Botond Dénes	76ab66ca1f	Merge 'Support state change for S3-backed sstables' from Pavel Emelyanov The sstable currently can move between normal, staging and quarantine state runtime. For S3-backed sstables the state change means maintaining the state itself in the ownership table and updating it accordingly. There's also the upload facility that's implemented as state change too, but this PR doesn't support this part. fixes: #13017 Closes scylladb/scylladb#15829 * github.com:scylladb/scylladb: test: Make test_sstables_excluding_staging_correctness run over s3 too sstables,s3: Support state change (without generation change) system_keyspace: Add state field to system.sstables sstable_directory: Tune up sstables entries processing comment system_keyspace: Tune up status change trace message sstables: Add state string to state enum class convert	2023-11-07 10:45:41 +02:00
Benny Halevy	a1acf6854b	everywhere: reduce dependencies on i_partitioner.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-05 20:47:44 +02:00
Benny Halevy	aa70e3a536	dht: fold compatible_ring_position in ring_position.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-05 20:01:29 +02:00
Pavel Emelyanov	d827068d01	sstables,s3: Support state change (without generation change) Now when the system.sstables has the state field, it can be changed (UPDATEd). However, when changing the state AND generation, this still won't work, because generation is the clustering key of the table in question and cannot be just changed. This, nonetheless, is OK, as generation changes with state only when moving an sstable from upload dir into normal/staging and this is separate issue for S3 (#13018). For now changing state only is OK. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-24 19:12:37 +03:00
Pavel Emelyanov	ca5d3d217f	system_keyspace: Add state field to system.sstables The state is one of <empty>(normal)/staging/quarantine. Currently when sstable is moved to non-normal state the s3 backend state_change() call throws thus such sstables do not appear. Next patches are going to change that and the new field in the system.sstables is needed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-24 19:12:37 +03:00
Pavel Emelyanov	295936c1d3	sstable_directory: Tune up sstables entries processing comment In fact, this FIXME had been fixed by `2c9ec6bc` (sstable_directory: Garbage collect S3 sstables on reboot) and is no longer valid. However, it's still good to know if GC failed or misbehaved, so replace the comment with a warning. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-24 19:12:37 +03:00
Pavel Emelyanov	63758d19ce	sstables: Add state string to state enum class convert There's the backward converter already out there. Next code will need to convert string representation of the state back to the internal type. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-24 19:12:37 +03:00
Kefu Chai	b36cef6f1a	sstable: remove _remote_prefix from s3_storage since we use the sstable.generation() for the remote prefix of the key of the object for storing the sstable component, there is no need to set remote_prefix beforehand. since `s3_storage::ensure_remote_prefix()` and `system_keyspace::sstables_registry_lookup_entry()` are not used anymore, they are removed. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-10-23 10:08:22 +08:00
Kefu Chai	af8bc8ba63	sstable: switch to uuid identifier for naming S3 sstable objects before this change, we create a new UUID for a new sstable managed by the s3_storage, and we use the string representation of UUID defined by RFC4122 like "0aa490de-7a85-46e2-8f90-38b8f496d53b" for naming the objects stored on s3_storage. but this representation is not what we are using for storing sstables on local filesystem when the option of "uuid_sstable_identifiers_enabled" is enabled. instead, we are using a base36-based representation which is shorter. to be consistent with the naming of the sstables created for local filesystem, and more importantly, to simplify the interaction between the local copy of sstables and those stored on object storage, we should use the same string representation of the sstable identifier. so, in this change: 1. instead of creating a new UUID, just reuse the generation of the sstable for the object's key. 2. do not store the uuid in the sstable_registry system table. As we already have the generation of the sstable for the same purpose. 3. switch the sstable identifier representation from the one defined by the RFC4122 (implemented by fmt::formatter<utils::UUID>) to the base36-based one (implemented by fmt::formatter<sstables::generation_type>) 4. enable the `uuid_sstable_identifiers` cluster feature if it is enabled in the `test_env_config`, so that the sstable manager can enable the uuid-based uuid when creating a new uuid for sstable. 5. throw if the generation of sstable is not UUID-based when accessing / manipulating an sstable with S3 storage backend. as the S3 storage backend now relies on this option. as, otherwise we'd have sstables with key like s3://bucket/number/basename, which is just unable to serve as a unique id for sstable if the bucket is shared across multiple tables. Fixes #14175 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-10-23 10:08:22 +08:00
Raphael S. Carvalho	fded314e46	sstables: Fix update of tombstone GC settings to have immediate effect After "repair: Get rid of the gc_grace_seconds", the sstable's schema (mode, gc period if applicable, etc) is used to estimate the amount of droppable data (or determine full expiration = max_deletion_time < gc_before). It could happen that the user switched from timeout to repair mode, but sstables will still use the old mode, despite the user asked for a new one. Another example is when you play with value of grace period, to prevent data resurrection if repair won't be able to run in a timely manner. The problem persists until all sstables using old GC settings are recompacted or node is restarted. To fix this, we have to feed latest schema into sstable procedures used for expiration purposes. Fixes #15643. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#15746	2023-10-19 16:27:59 +03:00
Kefu Chai	c8cb70918b	sstable: drop unused parse() overload for deletion_time `deletion_time` is a part of the `partition_header`, which is in turn a part of `partition`. and `data_file` is a sequence of `partition`. `data_file` represents *-Data.db component of an SSTable. see docs/architecture/sstable3/sstables-3-data-file-format.rst. we always parse the data component via `flat_mutation_reader_v2`, which is in turn implemented with mx/reader.cc or kl/reader.cc depending on the version of SSTable to be read. in other words, we decode `deletion_time` in mx/reader.cc or kl/reader.cc, not in sstable.cc. so let's drop the overload parse() for deletion_time. it's not necessary and more importantly, confusing. Refs #15116 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15756	2023-10-18 18:41:56 +03:00
Avi Kivity	f3dc01c85e	Merge 'Enlight sstable_directory construction' from Pavel Emelyanov Currently distributed_loader starts sharded<sstable_directory> with four sharded parameters. That's quite bulky and can be made much shorter. Closes scylladb/scylladb#15653 * github.com:scylladb/scylladb: distributed_loader: Remove explicit sharded<erms> distributed_loader: Brush up start_subdir() sstable_directory: Add enlightened construction table: Add global_table_ptr::as_sharded_parameter()	2023-10-18 16:42:04 +03:00
Botond Dénes	f7e269ccb8	Merge 'Progress of compaction executors' from Aleksandra Martyniuk compaction_read_monitor_generator is an existing mechanism for monitoring progress of sstables reading during compaction. In this change information gathered by compaction_read_monitor_generator is utilized by task manager compaction tasks of the lowest level, i.e. compaction executors, to calculate task progress. compaction_read_monitor_generator has a flag, which decides whether monitored changes will be registered by compaction_backlog_tracker. This allows us to pass the generator to all compaction readers without impacting the backlog. Task executors have access to compaction_read_monitor_generator_wrapper, which protects the internals of compaction_read_monitor_generator and provides only the necessary functionality. Closes scylladb/scylladb#14878 * github.com:scylladb/scylladb: compaction: add get_progress method to compaction_task_impl compaction: find total compaction size compaction: sstables: monitor validation scrub with compaction_read_generator compaction: keep compaction_progress_monitor in compaction_task_executor compaction: use read monitor generator for all compactions compaction: add compaction_progress_monitor compaction: add flag to compaction_read_monitor_generator	2023-10-18 12:19:51 +03:00
Botond Dénes	7f81957437	Merge 'Initialize datadir for system and non-system keyspaces the same way' from Pavel Emelyanov When populating system keyspace the sstable_directory forgets to create upload/ subdir in the tables' datadir because of the way it's invoked from distributed loader. For non-system keyspaces directories are created in table::init_storage() which is self-contained and just creates the whole layout regardless of what. This PR makes system keyspace's tables use table::init_storage() as well so that the datadir layout is the same for all on-disk tables. Test included. fixes: #15708 closes: scylladb/scylla-manager#3603 Closes scylladb/scylladb#15723 * github.com:scylladb/scylladb: test: Add test for datadir/ layout sstable_directory: Indentation fix after previous patch db,sstables: Move storage init for system keyspace to table creation	2023-10-18 12:12:19 +03:00
Kefu Chai	203f41dc99	sstable: improve descriptions of capped.*deletion_time before this change, they reads > Was local deletion time capped at ... and > Was partition tombstone deletion time capped at ... the "Was" part is confusing. and the first description is not accurate enough. so let's improve them a little bit. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15108	2023-10-18 09:40:02 +03:00
Pavel Emelyanov	c3b3e5b107	sstable_directory: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-16 16:26:37 +03:00
Pavel Emelyanov	059d7c795e	db,sstables: Move storage init for system keyspace to table creation User and system keyspaces are created and populated slightly differently. System keyspace is created via system_keyspace::make() which eventually calls add_column_family(). Then it's populated via init_system_keyspace() which calls sstable_directory::prepare() which, in turn, optionally creates directories in datadir/ or checks the directory permissions if it exists. User keyspaces are created with the help of add_column_family_and_make_directory() call which calls the add_column_family() mentioned above _and_ calls table::init_storage() to create directories. When it's populated with init_non_system_keyspaces() it also calls sstable_directory::prepare() which notices that the directory exists and then checks the permissions. As a result, sstable_directory::prepare() initializes storage for system keyspace only and there's a BUG (#15708) that the upload/ subdir is not created. This patch makes the directories creation for _all_ keyspaces with the table::init_storage(). The change only touches system keyspace by moving the creation of directories from sstable_directory::prepare() into system_keyspace::make(). Indentation is deliberately left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-16 16:19:25 +03:00
Aleksandra Martyniuk	7b3e0ab1f2	compaction: sstables: monitor validation scrub with compaction_read_generator Validation scrub bypasses the usual compaction machinery, though it still needs to be tracked with compaction_progress_monitor so that we could reach its progress from compaction task executor. Track sstable scrub in validate mode with read monitors.	2023-10-12 17:03:46 +02:00
Pavel Emelyanov	795dcf2ead	sstable_directory: Add enlightened construction The existing constructor is pretty heavyweight for the distributed loader to use -- it needs to pass it 4 sharded parameters which looks pretty bulky in the text editor. However, 5 constructor arguments are obtained directly from the table, so the dist. loader code with global table pointer at hand can pass _it_ as sharded parameter and let the sstable directory extract what it needs. Sad news is that sstable_directory cannot be switched to just use table reference. Tools code doesn't have table at hand, but needs the facilities sstable_directory provides Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-06 15:54:51 +03:00
Botond Dénes	96787ec0a5	Merge 'Do not keep excessive info on sstables::entry_descriptor' from Pavel Emelyanov The descriptor in question is used to parse sstable's file path and return back the result. Parser, among "relevant" info, also parses sstable directory and keyspace+table names. However, there is (almost) no code that needs those strings. And the need to construct descriptor with those makes some places obscurely use empty strings. The PR removes sstable's directory, keyspace and table names from descriptor and, while at it, relaxes the sstable directory code that makes descriptor out of a real sstable object by (!) parsing its Data file path back. Closes scylladb/scylladb#15617 * github.com:scylladb/scylladb: sstables: Make descriptor from sstable without parsing sstables: Do not keep directory, keyspace and table names on descriptor sstables: Make tuple inside helper parser method sstables: Do not use ks.cf pair from descriptor sstables: Return tuple from parse_path() without ks.cf hints sstables: Rename make_descriptor() to parse_path()	2023-10-05 15:15:23 +03:00
Pavel Emelyanov	d112098c08	sstables: Make descriptor from sstable without parsing When loading unshared remote sstable, sstable_directory needs to make a descriptor out of a real sstable. For that it parses the sstable's Data component path which is pretty weird. It's simpler to make descriptor out of the sstable itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-05 12:21:01 +03:00
Pavel Emelyanov	96651e0ddb	sstables: Do not keep directory, keyspace and table names on descriptor Now no code uses those strings. Even worse -- there are some places that need to provide some strings but don't have real values at hand, so just hard-code the empty strings there (because they are really not used). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-05 12:21:01 +03:00
Pavel Emelyanov	6a601be1f3	sstables: Make tuple inside helper parser method This just moves the std::make_tuple() call into internal static path parsing helper to make the next patch smaller and nicer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-05 12:21:01 +03:00
Pavel Emelyanov	62d71d398f	sstables: Return tuple from parse_path() without ks.cf hints There are two path parsers. One of them accepts keyspace and table names and the other one doesn't. The latter is then supposed to parse the ks.cf pair from path and put it on the descriptor. This patch makes this method return ks.cf so that later it will be possible to remove these strings from the descriptor itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-05 12:21:00 +03:00
Pavel Emelyanov	d56f9db121	sstables: Rename make_descriptor() to parse_path() The method really parses provided path, so the existing name is pretty confusing. It's extra confusing in the table::get_snapshot_details() where it's just called and the return value is simply ignored. Named "parse_..." makes it clear what the method is for. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-05 11:04:07 +03:00
Piotr Jastrzebski	9edf6e4653	sstable_set: Remove unused _schema field Signed-off-by: Piotr Jastrzebski <haaawk@gmail.com>	2023-10-04 18:50:23 +02:00
Piotr Jastrzebski	ce2be977a6	sstable_set_impl: Return also schema from make_incremental_selector Define sstable_set_impl::selector_and_schema_t type as a tuple that contains both a newly created selector and a schema that the selector is using. This will allow removal of _schema field from sstable_set class as the only place it was used was make_incremental_selector. Signed-off-by: Piotr Jastrzebski <haaawk@gmail.com>	2023-10-04 18:40:05 +02:00
Piotr Jastrzebski	47917bcf22	filter: hash key once per sstable set not sstable Before this commit the primary key was hashed for bloom filter check for each sstable. This commit makes the key be hashed once per sstable set and reused for bloom filter lookups in all sstables in the set. I tested this change using perf_simple_query with the following modifications: 1. Create more than one sstable to have sstable set of more than one element 2. Try to prevent compactions (I wasn't 100% successful) 3. Use a key that's not present to avoid reading from disk ``` diff --git a/test/perf/perf_simple_query.cc b/test/perf/perf_simple_query.cc index 26dbf1e99..6bd460df2 100644 --- a/test/perf/perf_simple_query.cc +++ b/test/perf/perf_simple_query.cc @@ -105,6 +105,8 @@ std::ostream& operator<<(std::ostream& os, const test_config& cfg) { static void create_partitions(cql_test_env& env, test_config& cfg) { std::cout << "Creating " << cfg.partitions << " partitions..." << std::endl; + // Create 10 sstables each with all the data + for (unsigned count = 0; count < 10; ++count) { for (unsigned sequence = 0; sequence < cfg.partitions; ++sequence) { if (cfg.counters) { execute_counter_update_for_key(env, make_key(sequence)); @@ -117,6 +119,7 @@ static void create_partitions(cql_test_env& env, test_config& cfg) { std::cout << "Flushing partitions..." << std::endl; env.db().invoke_on_all(&replica::database::flush_all_memtables).get(); } + } } static int64_t make_random_seq(test_config& cfg) { @@ -137,8 +140,18 @@ static std::vector<perf_result> test_read(cql_test_env& env, test_config& cfg) { query += " using timeout " + cfg.timeout; } auto id = env.prepare(query).get0(); - return time_parallel([&env, &cfg, id] { - bytes key = make_random_key(cfg); + // Always use the same key that is not present + // to make sure we don't read from disk and make + // the benchmark CPU bounded. + int64_t key_value = 6; + bytes key(bytes::initialized_later(), 5 * sizeof(key_value)); + auto i = key.begin(); + write<uint64_t>(i, key_value); + write<uint64_t>(i, key_value); + write<uint64_t>(i, key_value); + write<uint64_t>(i, key_value); + write<uint64_t>(i, key_value); + return time_parallel([&env, id, key] { return env.execute_prepared(id, {{cql3::raw_value::make_value(std::move(key))}}).discard_result(); }, cfg.concurrency, cfg.duration_in_seconds, cfg.operations_per_shard, cfg.stop_on_error); } @@ -423,6 +436,10 @@ static std::vector<perf_result> do_cql_test(cql_test_env& env, test_config& cfg) .with_column("C2", bytes_type) .with_column("C3", bytes_type) .with_column("C4", bytes_type) + // Try to prevent compaction + // to keep the number of sstables high + .set_compaction_enabled(false) + .set_min_compaction_threshold(2000000000) .build(); }).get(); @@ -539,6 +556,11 @@ int scylla_simple_query_main(int argc, char** argv) { const auto enable_cache = app.configuration()["enable-cache"].as<bool>(); std::cout << "enable-cache=" << enable_cache << '\n'; db_cfg->enable_cache(enable_cache); + // Try to prevent compaction + // to keep the number of sstables high + db_cfg->concurrent_compactors(1); + db_cfg->compaction_enforce_min_threshold(true); + db_cfg->compaction_throughput_mb_per_sec(1); cql_test_config cfg(db_cfg); return do_with_cql_env_thread([&app] (auto&& env) { ``` The following command showed 2-3% improvement on my machine but this depends on the length of the key and the number of sstables in the set. ``` ./build/release/scylla perf-simple-query --bypass-cache --flush -c 1 --random-seed=2068087418 --enable-cache false ``` Signed-off-by: Piotr Jastrzebski <haaawk@gmail.com> Closes scylladb/scylladb#15538	2023-09-26 16:27:11 +03:00
Botond Dénes	d5f095d5a4	Merge 'Make interaction of compaction strategy with sstable runs more robust and efficient' from Raphael "Raph" Carvalho SSTable runs work hard to keep the disjointness invariant, therefore they're expensive to build from scratch. For every insertion, it keeps the elements sorted by their first key in order to reject insertion of element that would introduce overlapping. Additionally, a sstable run can grow to dozens of elements (or hundreds) therefore, we can also make interaction with compaction strategies more efficient by not copying them when building a list of candidates in compaction manager. And less fragile by filtering out any sstable runs that are not completely eligible for compaction. Previously, ICS had to give up on using runs managed by sstable set due to fragility of the interface (meaning runs are being built from scratch on every call to the strategy, which is very inefficient, but that had to be done for correctness), but now we can restore that. Closes scylladb/scylladb#15440 * github.com:scylladb/scylladb: compaction: Switch to strategy_control::candidates() for regular compaction tests: Prepare sstable_compaction_test for change in compaction_strategy interface compaction: Allow strategy to retrieve candidates either as sstables or runs compaction: Make get_candidates() work with frozen_sstable_run too sstables: add sstable_run::run_identifier() sstables: tag sstable_run::insert() with nodiscard sstables: Make all_sstable_runs() more efficient by exposing frozen shared runs sstables: Simplify sstable_set interface to retrieve runs	2023-09-26 14:56:05 +03:00
Raphael S. Carvalho	4b193c04dd	sstables: add sstable_run::run_identifier() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:21 -03:00
Raphael S. Carvalho	8235889b8a	sstables: tag sstable_run::insert() with nodiscard sstable_run may reject insertion of a sstable if it's going to break the disjoint invariant of the run, but it's important that the caller is aware of it, so it can act on it like generating a new run id for the sstable so it can be inserted in another run. the tag is important to avoid unknown problems in this area. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:21 -03:00
Raphael S. Carvalho	0fe2630d70	sstables: Make all_sstable_runs() more efficient by exposing frozen shared runs Users of all_sstable_runs() don't want to mutate the runs, but rather work with their content. So let's avoid copy and make the intention explicit with the new frozen_sstable_run used as return type for the interface. This will guarantee that ICS will be able to fetch uncompacting runs efficiently. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:20 -03:00
Raphael S. Carvalho	9f6c3369d2	sstables: Simplify sstable_set interface to retrieve runs This interface selects all runs that store at least one of the sstables in the vector. But that's very fragile, to the point that even ICS had to stop using it. A better interface is to return all runs managed by the set and allow compaction manager to do its filtering. We want to use it in ICS to avoid the overhead of rebuilding sstable runs which may be expensive as sorting is performed to guarantee the disjoint invariant. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:04:20 -03:00
Pavel Emelyanov	99cbb6b733	sstable_directory: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-25 20:34:52 +03:00
Pavel Emelyanov	7ab03e33a2	sstable_directory: Simplify filesystem prepare() When FS lister gets prepared it - checks if the directory exists - creates it if it doesn't or bails out if it's the quarantine one - goes and checks the directory's owner and mode The last step is excessive if the directory didn't exist on entry and was created. Indentation is deliberately left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-25 20:03:19 +03:00
Pavel Emelyanov	9c3e055d22	distributed_loader: Move directory touching to sstable_directory This is continuation of the previous patch -- when populating a table, creating directories should be (optionally) performed by the lister backend, not by the generic loader. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-25 20:01:53 +03:00
Pavel Emelyanov	2678cc2ae8	distributed_loader: Move directory existence checks to sstable_directory The loader code still "knows" that tables' sstables live in directories on datadir filesystem, but that's not always so. So whether or not the directory with sstables exists should be checked by sstable directory's component lister, not the loader. After this change potentially missing quarantine directory will be processed with the sstable directory with empty result, but that's OK, empty directories should be already handled correctly, so even if the directory lister doesn't produce any sstables because it found no files, or because it just skipped scanning doesn't make any difference. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-25 19:59:41 +03:00
Pavel Emelyanov	603f3ca042	sstable_directory: Move prepare() core to lister Current sstable_directory::prepare() code checks the sstable directory existence which only makes sense for filesystem-backed sstables. S3-backed don't (well -- won't) have any directories in datadir, so the check should be moved into component lister. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-25 19:58:53 +03:00
Avi Kivity	1da6a939fe	Merge 'Track memory usage of S3 object uploads' from Pavel Emelyanov The S3 uploading sink needs to collect buffers internally before sending them out, because the minimal upload-able part size is 5Mb. When the necessary amount of bytes is accumulated, the part uploading fiber starts in the background. On flush the sink waits for all the fibers to complete and handles failure of any. Uploading parallelism is nowadays limited by the means of the http client max-connections parameter. However, when a part uploading fiber waits for its connection it keeps the 5Mb+ buffers on the request's body, so even though the number of uploading parts is limited, the number of _waiting_ parts is effectively not. This PR adds a shard-wide limiter on the number of background buffers S3 clients (and their http clients) may use. Closes scylladb/scylladb#15497 * github.com:scylladb/scylladb: s3::client: Track memory in client uploads code: Configure s3 clients' memory usage s3::client: Construct client with shared semaphore sstables::storage_manager: Introduce config	2023-09-21 18:24:42 +03:00
Botond Dénes	f6575344df	Merge 'Collect dangling object-store sstables' from Pavel Emelyanov Sstables in transitional states are marked with the respective 'status' in the registry. Currently there are two of such -- 'creating' and 'removing'. And the 'sealed' status for sstables in use. On boot the distributed loader tries to garbage collect the dangling sstables. For filesystem storage it's done with the help of temporary sstables' dirs and pending deletion logs. For s3-backed sstables, the garbage collection means fetching all non-sealed entries and removing the corresponding objects from the storage. Test included (last patch) fixes #13024 Closes scylladb/scylladb#15318 * github.com:scylladb/scylladb: test: Extend object_store test to validate GC works sstable_directory: Garbage collect S3 sstables on reboot sstable_directory: Pass storage to garbage_collect() sstable_directory: Create storage instance too	2023-09-21 09:15:00 +03:00
Pavel Emelyanov	182a5348d4	code: Configure s3 clients' memory usage This sets the real limits on the memory semaphore. - scylla sets it to 1% of total memory, 10Mb min, 100Mb max - tests set it to 16Mb - perf test sets it to all available memory Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-20 17:50:29 +03:00
Pavel Emelyanov	b299757884	s3::client: Construct client with shared semaphore The semaphore will be used to cap memory consumption by client. This patch makes sure the reference to a semaphore exists as an argument to client's constructor, not more than that. In scylla binary, the semaphore sits on storage_manager. In tests the semaphore is some local object. For now the semaphore is unused and is initialized locked as this patch just pushes the needed argument all the way around, next patches will make use of it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-20 17:50:07 +03:00
Pavel Emelyanov	f40b4e3e84	sstables::storage_manager: Introduce config Just an empty config that's fed to storage_manager when constructed as a preparation for further heavier patching Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-20 17:42:59 +03:00
Botond Dénes	edb50c27ec	Merge 'Use sstable_state in sstables populator' from Pavel Emelyanov Some time ago populating of tables from sstables was reworked to use sstable states instead of full paths (#12707). Since then, a few places in the populator were left that still operate on the state-based subdirectory name. This PR collects most of those dangling ends. refs: #13020 Closes scylladb/scylladb#15421 * github.com:scylladb/scylladb: distributed_loader: Print sstable state explicitly distributed_loader: Move check for the missing dir upper distributed_loader: Use state as _sstable_directories key	2023-09-18 14:38:49 +03:00
Kefu Chai	a51b14d4c4	sstables/metadata_collector: drop unused functions column_stats::update_local_deletion_time() is not used anywhere, what is being used is `column_stats::update_local_deletion_time_and_tombstone_histogram(time_point)`. while `update_local_deletion_time_and_tombstone_histogram(int32_t)` is only used internally by a single caller. neither is `column_stats::update(const deletion_time&)` used. so let's drop them. and merge `update_local_deletion_time_and_tombstone_histogram(int32_t)` into its caller. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15189	2023-09-18 10:18:56 +03:00
Pavel Emelyanov	4370e6c8d0	distributed_loader: Print sstable state explicitly When populating from a particular directory, populator code converts state to subdir name, then prints the path. The conversion is pretty much artificial, it's better to provide printer for state and print state explicitly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-14 16:41:26 +03:00
Petr Gusev	b90011294d	config.cc: drop db::config::host_id In this refactoring commit we remove the db::config::host_id field, as it's hacky and duplicates token_metadata::get_my_id. Some tests want specific host_id, we add it to cql_test_config and use in cql_test_env. We can't pass host_id to sstables_manager by value since it's initialized in database constructor and host_id is not loaded yet. We also prefer not to make a dependency on shared_token_metadata since in this case we would have to create artificial shared_token_metadata in many tools and tests where sstables_manager is used. So we pass a function that returns host_id to sstables_manager constructor.	2023-09-13 23:00:15 +04:00
Pavel Emelyanov	2c9ec6bc93	sstable_directory: Garbage collect S3 sstables on reboot When booting there can be dangling entries in sstables registry as well as objects on the storage itself. This patch makes the S3 lister list those entries and then kick the s3_storage to remove the corresponding objects. At the end the dangling entries are removed from the registry Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-12 09:56:13 +03:00
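
Commit 182a5348d4 above describes the S3 client memory limit as 1% of total memory with a 10Mb minimum and a 100Mb maximum. The rule can be sketched as a simple clamp. This is only an illustration: the helper name `s3_memory_limit` is hypothetical and not taken from the ScyllaDB source, and it assumes "Mb" in the commit message means MiB.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not ScyllaDB code): derive the S3 upload memory
// limit from total memory as 1%, clamped to the range [10 MiB, 100 MiB].
// Assumption: "Mb" in the commit message is read as MiB.
constexpr std::uint64_t s3_memory_limit(std::uint64_t total_memory) {
    constexpr std::uint64_t MiB = 1024 * 1024;
    return std::clamp(total_memory / 100, 10 * MiB, 100 * MiB);
}
```

Under these assumptions, a node with less than 1000 MiB of memory gets the 10 MiB floor, and any node with 10000 MiB or more gets the 100 MiB ceiling.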

3281 Commits