scylladb

Author	SHA1	Message	Date
Kefu Chai	3835ebfcdc	utils/managed_bytes: add fmt::formatters for managed_bytes and friends before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for * managed_bytes * managed_bytes_view * managed_bytes_opt Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-23 11:32:41 +08:00
Kefu Chai	3d9054991b	utils/logalloc: add fmt::formatter for occupancy_stats before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `occupancy_stats`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-23 11:32:41 +08:00
Avi Kivity	51df8b9173	interval: rename nonwrapping_interval to interval Our interval template started life as `range`, and was supported wrapping to follow Cassandra's convention of wrapping around the maximum token. We later recognized that an interval type should usually be non-wrapping and split it into wrapping_range and nonwrapping_range, with `range` aliasing wrapping_range to preserve compatibility. Even later, we realized the name was already taken by C++ ranges and so renamed it to `interval`. Given that intervals are usually non-wrapping, the default `interval` type is non-wrapping. We can now simplify it further, recognizing that everyone assumes that an interval is non-wrapping and so doesn't need the nonwrapping_interval_designation. We just rename nonwrapping_interval to `interval` and remove the type alias.	2024-02-21 19:43:17 +02:00
Avi Kivity	605bf6e221	range.hh: retire range.hh was deprecated in `bd794629f9` (2020) since its names conflict with the C++ library concept of an iterator range. The name ::range also mapped to the dangerous wrapping_interval rather than nonwrapping_interval. Complete the deprecation by removing range.hh and replacing all the aliases by the names they point to from the interval library. Note this now exposes uses of wrapping intervals as they are now explicit. The unit tests are renamed and range.hh is deleted. Closes scylladb/scylladb#17428	2024-02-21 00:24:25 +02:00
Kefu Chai	a7a2cf64cc	utils/rjson: add templated streaming_writer::Write() so we can use it in a templated context. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-20 18:12:35 +08:00
Kefu Chai	4da9a62472	utils: managed_bytes: fix typo in comment s/assigments/assignments/ this misspelling was identified by codespell. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17333	2024-02-15 10:37:25 +02:00
Botond Dénes	120442231f	Merge 'row_cache: test cache consistency during multi-partition cache updates' from Michał Chojnowski Adds a test reproducing https://github.com/scylladb/scylladb/issues/16759, and the instrumentation needed for it. Closes scylladb/scylladb#17208 * github.com:scylladb/scylladb: row_cache_test: test cache consistency during memtable-to-cache merge row_cache: use preemption_source in update() utils: preempt: add preemption_source	2024-02-13 17:37:06 +02:00
Michał Chojnowski	5a3e4a1cc0	utils: managed_bytes: optimize memory usage for small buffers managed_bytes is implemented as chain of blob_storage objects. Each blob_storage contains 24 bytes of metadata. But in the most common case -- when there is only a single element in the chain -- 16 bytes of this metadata is trivial/unused. This is regrettable waste because managed_bytes is used for every database cell in the memtables and cache. It means that every value of size >= 7 bytes (smaller ones fit in the inline storage of managed_bytes) receives 16 bytes of useless overhead. To correct that, this patch adds to managed_bytes an alternative storage layout -- used for buffers small enough to fit in one contiguous fragment -- which only stores the necessary minimum of metadata. (That is: a pointer to the parent, to facilitate moving the storage during memory defragmentation).	2024-02-09 20:56:20 +01:00
Michał Chojnowski	277a31f0ae	utils: managed_bytes: rewrite managed_bytes methods in terms of managed_bytes_view Some methods of managed_bytes contain the logic needed to read/write the contents of managed_bytes, even though this logic is already present in managed_bytes_{,mutable}_view. Reimplementing those methods by using the views as intermediates allows us to remove some code and makes the responsibilities cleaner -- after the change, managed_bytes contains the logic of allocating and freeing the storage, while views provide read/write access to the storage. This change will simplify the next patch which changes the internals of managed_bytes.	2024-02-09 17:00:33 +01:00
Michał Chojnowski	fabab2f46f	utils: preempt: add preemption_source While `preemption_check` can be passed to functions to control their preemption points, there is no way to inspect the state of the system after the preemption results in a yield. `preemption_source` is a superset of `preemption_check`, which also allows for customizing the yield, not just the preemption check. An implementation passed by a test can hook the yield to put the tested function to sleep, run some code, and then wake the function up. We use the preprocessor to minimize the impact on release builds. Only dev-mode preemption_source is hookable. When it's used in other modes, it should compile to direct reactor calls, as if it wasn't used.	2024-02-07 18:31:28 +01:00
Pavel Emelyanov	ca261f8916	utils: Mark chunked_vector::max_chunk_capacity with constexpr It uses only compile-time constants to produce the value, so deserves this marking Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#17181	2024-02-07 09:22:23 +02:00
Avi Kivity	784c2f8ad2	Merge 'treewide: replace calls to future::get0() by calls to future::get()' from Kefu Chai get0() dates back from the days where Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it. Replace with seastar::future::get(), which does the same thing. Closes scylladb/scylladb#17130 * github.com:scylladb/scylladb: treewide: replace seastar::future::get0() with seastar::future::get() sstable: capture return value of get0() using auto utils: result_loop: define result_type with decayed type [avi: add another one that snuck in while this was cooking]	2024-02-04 15:23:33 +02:00
Pavel Emelyanov	75bc702ae8	utils: Remove unused operator<< for file_lock object The lock itself is only used by utils/directories code Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#17051	2024-02-02 15:20:40 +01:00
Avi Kivity	7cb1c10fed	treewide: replace seastar::future::get0() with seastar::future::get() get0() dates back from the days where Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it. Replace with seastar::future::get(), which does the same thing.	2024-02-02 22:12:57 +08:00
Kefu Chai	9fcca8f585	utils: result_loop: define result_type with decayed type this change prepares for replacing `seastar::future::get0()` with `seastar::future::get()`. the former's return type is a plain `T`, while the latter is `T&&`. in this case `T` is `boost::outcome::result<..>`. in order to extract its `error_type`, we need to get its decayed type. since `std::remove_reference_t<T>` also returns `T`, let's use it so it works with both `get0()` and `get()`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-02 22:12:18 +08:00
Kefu Chai	946d281d39	exceptions: s/#warn/#warning/ `#warning` is a preprocessor macro in C/C++, while `#warn` is not. the reason we haven't run into the build failure caused by this is likely that we are only building on amd64/aarch64 with libstdc++ at the time of writing. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17074	2024-02-01 14:50:17 +02:00
Botond Dénes	b9af2efcb1	Merge 'directories: prevent inode cache fragmentation by orderly verifying data directory contents' from Lakshmi Narayanan Sreethar During startup, the contents of the data directory are verified to ensure that they have the right owner and permissions. Verifying all the contents, which includes files that will be read and closed immediately, and files that will be held open for longer durations, together, can lead to memory fragementation in the dentry/inode cache. Mitigate this by updating the verification in a such way that these two set of files will be verified separately ensuring their separation in the dentry/inode cache. Fixes https://github.com/scylladb/scylladb/issues/14506 Closes scylladb/scylladb#16952 * github.com:scylladb/scylladb: directories: prevent inode cache fragmentation by orderly verifying data directory contents directories: skip verifying data directory contents during startup directories: co-routinize create_and_verify	2024-02-01 12:30:07 +02:00
Botond Dénes	2a4b991772	Merge 'Fix mintimeuuid() call that could crash Scylla' from Nadav Har'El This PR fixes the bug of certain calls to the `mintimeuuid()` CQL function which large negative timestamps could crash Scylla. It turns out we already had protections in place against very positive timestamps, but very negative timestamps could still cause bugs. The actual fix in this series is just a few lines, but the bigger effort was improving the test coverage in this area. I added tests for the "date" type (the original reproducer for this bug used totimestamp() which takes a date parameter), and also reproducers for this bug directly, without totimestamp() function, and one with that function. Finally this PR also replaces the assert() which made this molehill-of-a-bug into a mountain, by a throw. Fixes #17035 Closes scylladb/scylladb#17073 * github.com:scylladb/scylladb: utils: replace assert() by on_internal_error() utils: add on_internal_error with common logger utils: add a timeuuid minimum, like we had maximum test/cql-pytest: tests for "date" type	2024-02-01 10:48:48 +02:00
Asias He	2888c3086c	utils: Add uuid_xor_to_uint32 helper Convert the uuid to a uint32_t using xor. It is useful to get a uint32_t number from the uuid. Refs: #16927 Closes scylladb/scylladb#17049	2024-02-01 10:27:55 +02:00
Lakshmi Narayanan Sreethar	dbe758d309	directories: prevent inode cache fragmentation by orderly verifying data directory contents During startup, the contents of the data directory are verified to ensure that they have the right owner and permissions. Verifying all the contents, which includes files that will be read and closed immediately, and files that will be held open for longer durations, together, can lead to memory fragementation in the dentry/inode cache. Prevent this by updating the verification in a such way that these two set of files will be verified separately ensuring their separation in the dentry/inode cache. Fixes #14506 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-02-01 12:20:23 +05:30
Lakshmi Narayanan Sreethar	74a4085426	directories: skip verifying data directory contents during startup This is in preparation for a subsequent patch that will verify the contents of the data directory in a specific order. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-02-01 11:54:59 +05:30
Lakshmi Narayanan Sreethar	2e3d2498f4	directories: co-routinize create_and_verify Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-02-01 11:41:10 +05:30
Nadav Har'El	458fd0c2f7	utils: replace assert() by on_internal_error() In issue #17035 we had a situation where a certain input timestamp could result in the create_time() utility function getting called on a timestamp that cannot be represented as timeuuid, and this resulted in an assertion failure, and a crash. I guess we used an assertion because we believed that callers try to avoid calling this function on excessively large timestamps, but evidentally, they didn't tried hard enough and we got a crash. The code in UUID_gen.hh changed a lot over the years and has become very convoluted and it is almost impossible to understand all the code paths that could lead to this assertion failures. So it's better to replace this assertion by a on_internal_error, which by default is just an exception - and also logs the backtrace of the failure. Issue #17035 would have been much less serious if we had an exception instead of an assert. Refs #17035 Refs #7871, Refs #13970 (removes an assert) Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-01-31 16:45:28 +02:00
Nadav Har'El	259811b6ec	utils: add on_internal_error with common logger Seastar's on_internal_error() is a useful replacement for assert() but it's inconvenient that it requires each caller to supply a logger - which is often inconvenient, especially when the caller is a header file. So in this patch we introduce a utils::on_internal_error() function which is the same as seastar::on_internal_error() (the former calls the latter), except it uses a single logger instead of asking the caller to pass a logger. Refs #7871 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-01-31 16:45:09 +02:00
Pavel Emelyanov	7c5c89ba8d	Revert "Merge 'Use utils::directories instead of db::config to get dirs' from Patryk Wróbel" This reverts commit `370fbd346c`, reversing changes made to `0912d2a2c6`. This makes scylla-manager mis-interpret the data_file_directories somehow, issue #17078	2024-01-31 15:08:14 +03:00
Nadav Har'El	827c20467c	utils: add a timeuuid minimum, like we had maximum Our time-handling code in UUID_gen.hh is very fragile for very large timestamps, because the different types - such as Cassandra "timestamp" and Timeuuid use very different resolution and ranges. In issue #17035 we discovered a situation where a certain CQL "timestamp"-type value could cause an assertion-failure and a crash in the create_time() function that creates a timeuuid - because that timestamp didn't fit the place we have in timeuuid. We already added in the past a limit, UUID_UNIXTIME_MAX, beyond which we refuse timestamps, to avoid these assertions failure. However, we missed the possibility of negative timestamps (which are allowed in CQL), and indeed a negative timestamp (or a timestamp which was "wrapped" to a negative value) is what caused issue #17035. So this patch adds a second limit, UUID_UNIXTIME_MIN - limiting the most negative timestamp that we support to well below the area which causes problems, and adds tests that reproduce #17035 and that we didn't break anything else (e.g., negative timestamps are still allowed - just not extremely negative timestamps). Fixes #17035. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-01-31 11:32:26 +02:00
Pavel Emelyanov	84ddc37130	utils: Coroutinize disk_sanity() It's pretty hairy in its future-promises form, with coroutines it's much easier to read Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#17052	2024-01-31 09:20:21 +02:00
Kefu Chai	b931d93668	treewide: fix misspellings in code comments these misspellings are identified by codespell. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17004	2024-01-31 09:16:10 +02:00
Patryk Wrobel	781a6a5071	utils/directories: make utils::directories::set an internal type Previously, utils::directories::set could have been used by clients of utils::directories class to provide dirs for creation. Due to moving the responsibility for providing paths of dirs from db::config to utils::directories, such usage is no longer the case. This change: - defines utils::directories::set in utils/directories.cc to disallow its usage by the clients of utils::directories - makes utils::directories::create_and_verify() member function private; now it is used only by the internals of the class - introduces a new member function to utils::directories called create_and_verify_sharded_directory() to limit the functionality provided to clients Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:20:41 +01:00
Patryk Wrobel	dc8d5ffaf6	db::config: keep dir paths unchanged This change is intended to ensure, that db::config fields related to directories are not changed. To achieve that a member function called setup_directories() is removed. The responsibility for directories paths has been moved to utils::directories, which may generate default paths if the configuration does not provide a specific value. Fixes: scylladb#5626 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:20:41 +01:00
Patryk Wrobel	1cd676e438	Allow utils::directories to provide paths to dirs This change extends utils::directories class in the following way: - adds new member variables that correspond to fields from db::config that describe paths of directories - introduces a public interface to retrieve the values of the new members - allows construction of utils::directories object based on db::config to setup internal member variables related to paths to dirs The new members of utils::directories are overriden when the provided values are empty. The way of setting paths is taken from db::config. To ensure that the new logic works correctly `utils_directories_test` has been created. Refs: scylladb#5626 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:11:33 +01:00
Patryk Wrobel	1b0ccaf4f2	Clean-up of utils::directories This change is intended to clean-up files in which utils::directories class is defined to ease further extensions. The preparation consists of: - removal of `using namespace` from directories.hh to avoid namespace pollution in files, that include this header - explicit inclusion of headers, that were missing or were implicitly included to ensure that directories.hh is self-sufficient - defining directories::set class outside of its parent to improve readability Refs: scylladb#5626 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:11:33 +01:00
Botond Dénes	c6fd4dffbb	Merge 'Remove anonymous namespaces from headers' from Patryk Wróbel Anonymous namespace implies internal linkage for its members. When it is defined in a header, then each translation unit, which includes such header defines its own unique instance of members of the unnamed namespace that are ODR-used within that translation unit. This can lead to unexpected results including code bloat or undefined behavior due to ODR violations. This PR removes unnamed namespaces from header files. References: - [CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous) namespace in a header"](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#sf21-dont-use-an-unnamed-anonymous-namespace-in-a-header) - [SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace in a header file"](https://wiki.sei.cmu.edu/confluence/display/cplusplus/DCL59-CPP.+Do+not+define+an+unnamed+namespace+in+a+header+file) Closes scylladb/scylladb#16998 * github.com:scylladb/scylladb: utils/config_file_impl.hh: remove anonymous namespace from header mutation/mutation.hh: remove anonymous namespace from header	2024-01-26 13:20:17 +02:00
Patryk Wrobel	6faa178f10	utils/config_file_impl.hh: remove anonymous namespace from header Anonymous namespace implies internal linkage for its members. When it is defined in a header, then each translation unit, which includes such header defines its own unique instance of members of the unnamed namespace that are ODR-used within that translation unit. This can lead to unexpected results including code bloat or undefined behavior due to ODR violations. This change aligns the code with the following guidelines: - CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous) namespace in a header" - SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace in a header file" Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-26 08:44:44 +01:00
Kefu Chai	8a5689e7a7	error_injection: do not cast to milliseconds when injecting timeout before this change, we always cast the wait duration to millisecond, even if it could be using a higher resolution. actually `std::chrono::steady_clock` is using `nanosecond` for its duration, so if we inject a deadline using `steady_clock`, we could be awaken earlier due to the narrowing of the duration type caused by the duration_cast. in this change, we just use the duration as it is. this should allow the caller to use the resolution provided by Seastar without losing the precision. Fixes #15902 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-25 19:10:24 +08:00
Nadav Har'El	df6c9828ef	Merge 'Add protobuf and Native histogram support' from Amnon Heiman Native histograms (also known as sparse histograms) are an experimental Prometheus feature. They use protobuf as the reporting layer. Native histograms hold the benefits of high resolution at a lower resource cost. This series allows sending histograms in a native histogram format over protobuf. By default, protobuf support is disabled. To use protobuf with native histograms, the command line flag prometheus_allow_protobuf should be set to true, and the Prometheus server should send the accept header with protobuf. Fixes #12931 Closes scylladb/scylladb#16737 * github.com:scylladb/scylladb: main.cc: Add prometheus_allow_protobuf command line histogram_metrics_helper: support native histogram config: Add prometheus_allow_protobuf flag	2024-01-24 21:24:50 +02:00
Kefu Chai	207fe93b90	utils: add formatter for rjson::value before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for rjson::value, and drop its operator<<(). Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16956	2024-01-24 10:30:52 +02:00
Amnon Heiman	95d1146fea	histogram_metrics_helper: support native histogram approx_exponential_histogram uses similar logic to Prometheus native histogram, to allow Prometheus sending its data in a native histogram format it needs to report schema and min id (id of the first bucket). This patch update to_metrics_histogram to set those optional parameters, leaving it to the Prometheus to decide in what format the histogram will be reported. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2024-01-23 13:12:34 +02:00
Kefu Chai	ac473eca91	utils:: add formatter for enum_option before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for enum_option<>. since its operator<<() is still used by the homebrew generic formatter for formatting vector<>, operator<<() is preserved. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16917	2024-01-23 10:03:51 +02:00
Avi Kivity	9e8b65f587	chunked_vector: remove range constructor Standard containers don't have constructors that take ranges; instead people use boost::copy_range or C++23 std::ranges::to. Make the API more uniform by removing this special constructor. The only caller, in a test, is adjusted. Closes scylladb/scylladb#16905	2024-01-22 10:26:15 +02:00
Kefu Chai	a1dcddd300	utils: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16833	2024-01-18 12:50:06 +02:00
Kefu Chai	f11a53856d	utils/pretty_printers: add "I" specifier support this is to mimic the formatting of `human_readable_value`, and to prepare for consolidating these two formatters, so we don't have two pretty printers in the tree. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-11 14:33:47 +08:00
Kefu Chai	7d627b328f	utils/pretty_printers: use the formatting of to_hr_size() keep the precision of 4 digits, for instance, so that we format "8191" as "8191" instead of as "8 Ki". this is modeled after the behavior of `to_hr_size()`. for better user experience. and also prepares to consolidate these two formatters. tests are updated to exercise both IEC and SI notations. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-11 14:33:03 +08:00
Patryk Wrobel	a64eb92369	utils: specialize fmt::formatter for utils::tagged_integer This change introduces a specialization of fmt::formatter for utils::tagged_integer. This enables the usage of this type with FMTv10, which dropped the default generated formatter. Usage of utils::tagged_integer without the defined formatter resulted in compilation error when compiled with FMTv10. Refs: #13245 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#16715	2024-01-10 18:32:43 +03:00
Kefu Chai	c1beba1f7d	utils: config_file: throw bpo::invalid_option_value() when seeing invalid option before this change, `std::invalid_argument` is thrown by `bpo::notify(configuration)` in `app_template::run_deprecated()` when invalid option is passed in via command line. `utils::named_value` throws `std::invalid_argument` if the given value is not listed in `_allowed_values`. but we don't handle `std::invalid_argument` in `app_template::run_deprecated()`. so the application aborts with unhandled exception if the specified argument is not allowed. in this change, we convert the `std::invalid_argument` to a derived class of `bpo::error` in the customized notify handler, so that it can be handled in `app_template::run_deprecated()`. because `name_value::operator()` is also used otherwhere, we should not throw a bpo::error there. so its exception type is preserved. Fixes #16687 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16688	2024-01-09 11:49:06 +02:00
Kefu Chai	db9e314965	treewide: apply codespell to the comments in source code for less spelling errors in comment. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16408	2023-12-20 10:25:03 +02:00
Botond Dénes	a6200e99e6	Merge 'Handle S3 partial read overflows' from Pavel Emelyanov The test case that validates upload-sink works does this by getting several random ranges from the uploaded object and checks that the content is what it should be. The range boundaries are generated like this: ``` uint64_t len = random(1, chunk_size); uint64_t offset = random(file_size) - len; ``` The 2nd line is not correct, if random number happens less than the len the offset befomes "negative", i.e. -- very large 64-bit unsigned value. Next, this offset:len gets into s3 client's get_object_contiguous() helper which in turn converts them into http range header's bytes-specifier format which is "first_bytet-last_byte" one. The math here is ``` first_byte = offset; last_byte = offset + len - 1; ``` Here the overflow of the offset thing results in underflow of the last_byte -- it becomes less than the first_byte. According to RFC this range-specifier is invalid and (!) can be ignored by the server. This is what minio does -- it ignores invalid range and returns back full object. But that's not all. When returning object portion the http request status code is PartialContent, but when the range is ignored and full object is returned, the status is OK. This makes s3 client's request fail with unexpected_status_error in the middle of the test. Then the object is removed with deferred action and actual error is printed into logs. In the end of the day logs look as if deletion of an object failed with OK status %) fixes: #16133 Closes scylladb/scylladb#16324 * github.com:scylladb/scylladb: test/s3: Avoid object range overflow s3/client: Handle GET-with-Range overflows correctly	2023-12-18 10:00:32 +02:00
Kefu Chai	c485644303	utils: bit_cast: drop unused #includes Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-12-12 21:09:51 +08:00
Avi Kivity	9c0f05efa1	Merge 'Track tablet streaming under global sessions to prevent side-effects of failed streaming' from Tomasz Grabiec Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later. This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted. The tracking and fencing is implemented by using global "sessions", created for streaming of a single tablet. Session is globally identified by UUID. The identifier is assigned by the topology change coordinator, and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained. The barrier is blocked only if there is some session with work which was left behind by unsuccessful streaming. In which case it should not be blocked for long, because streaming process checks often if the guard was left behind and stops if it was. This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas. Closes scylladb/scylladb#15847 * github.com:scylladb/scylladb: test: tablets: Add test for failed streaming being fenced away error_injection: Introduce poll_for_message() error_injection: Make is_enabled() public api: Add API to kill connection to a particular host range_streamer: Do not block topology change barriers around streaming range_streamer, tablets: Do not keep token metadata around streaming tablets: Fail gracefully when migrating tablet has no pending replica storage_service, api: Add API to disable tablet balancing storage_service, api: Add API to migrate a tablet storage_service, raft topology: Run streaming under session topology guard storage_service, tablets: Use session to guard tablet streaming tablets: Add per-tablet session id field to tablet metadata service: range_streamer: Propagate topology_guard to receivers streaming: Always close the rpc::sink storage_service: Introduce concept of a topology_guard storage_service: Introduce session concept tablets: Fix topology_metadata_guard holding on to the old erm docs: Document the topology_guard mechanism	2023-12-07 16:29:02 +02:00
Pavel Emelyanov	3e9309caf4	s3/client: Handle GET-with-Range overflows correctly The get_object_contiguous() accepts optional range argument in a form of offset:lengh and then converts it into first_byte:last_byte pair to satisfy http's Range header range-specifier. If the lat_byte, which is offset + lenght - 1, overflows 64-bits the range specifier becomes invalid. According to RFC9110 servers may ignore invalid ranges if they want to and this is what minio does. The result is pretty interesting. Since the range is specified, client expect PartialContent response, but since the range is ignored by server the result is OK, as if the full object was requested. So instead of some sane "overflow" error, the get_object_contiguous() fails with status "success". The fix is in pre-checking provided ranges and failing early Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-07 10:50:55 +03:00

1 2 3 4 5 ...

1604 Commits