This reverts commit 370fbd346c, reversing
changes made to 0912d2a2c6.
This makes scylla-manager mis-interpret the data_file_directories
somehow, issue #17078
It's pretty hairy in its future-promises form, with coroutines it's
much easier to read
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#17052
Previously, utils::directories::set could have been used by
clients of utils::directories class to provide dirs for creation.
Due to moving the responsibility for providing paths of dirs from
db::config to utils::directories, such usage is no longer the case.
This change:
- defines utils::directories::set in utils/directories.cc to disallow
its usage by the clients of utils::directories
- makes utils::directories::create_and_verify() member function
private; now it is used only by the internals of the class
- introduces a new member function to utils::directories called
create_and_verify_sharded_directory() to limit the functionality
provided to clients
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change is intended to ensure, that
db::config fields related to directories
are not changed. To achieve that a member
function called setup_directories() is
removed.
The responsibility for directories paths
has been moved to utils::directories,
which may generate default paths if the
configuration does not provide a specific
value.
Fixes: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change extends utils::directories class in
the following way:
- adds new member variables that correspond to
fields from db::config that describe paths
of directories
- introduces a public interface to retrieve the
values of the new members
- allows construction of utils::directories
object based on db::config to setup internal
member variables related to paths to dirs
The new members of utils::directories are overriden
when the provided values are empty. The way of setting
paths is taken from db::config.
To ensure that the new logic works correctly
`utils_directories_test` has been created.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change is intended to clean-up files in which
utils::directories class is defined to ease further
extensions.
The preparation consists of:
- removal of `using namespace` from directories.hh to
avoid namespace pollution in files, that include this
header
- explicit inclusion of headers, that were missing or
were implicitly included to ensure that directories.hh
is self-sufficient
- defining directories::set class outside of its parent
to improve readability
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Anonymous namespace implies internal linkage for its members.
When it is defined in a header, then each translation unit,
which includes such header defines its own unique instance
of members of the unnamed namespace that are ODR-used within
that translation unit.
This can lead to unexpected results including code bloat
or undefined behavior due to ODR violations.
This PR removes unnamed namespaces from header files.
References:
- [CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous) namespace in a header"](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#sf21-dont-use-an-unnamed-anonymous-namespace-in-a-header)
- [SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace in a header file"](https://wiki.sei.cmu.edu/confluence/display/cplusplus/DCL59-CPP.+Do+not+define+an+unnamed+namespace+in+a+header+file)
Closesscylladb/scylladb#16998
* github.com:scylladb/scylladb:
utils/config_file_impl.hh: remove anonymous namespace from header
mutation/mutation.hh: remove anonymous namespace from header
Anonymous namespace implies internal linkage for its members.
When it is defined in a header, then each translation unit,
which includes such header defines its own unique instance
of members of the unnamed namespace that are ODR-used within
that translation unit.
This can lead to unexpected results including code bloat
or undefined behavior due to ODR violations.
This change aligns the code with the following guidelines:
- CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous)
namespace in a header"
- SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace
in a header file"
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
before this change, we always cast the wait duration to millisecond,
even if it could be using a higher resolution. actually
`std::chrono::steady_clock` is using `nanosecond` for its duration,
so if we inject a deadline using `steady_clock`, we could be awaken
earlier due to the narrowing of the duration type caused by the
duration_cast.
in this change, we just use the duration as it is. this should allow
the caller to use the resolution provided by Seastar without losing
the precision.
Fixes#15902
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Native histograms (also known as sparse histograms) are an experimental Prometheus feature.
They use protobuf as the reporting layer.
Native histograms hold the benefits of high resolution at a lower resource cost.
This series allows sending histograms in a native histogram format over protobuf.
By default, protobuf support is disabled. To use protobuf with native histograms, the command line flag prometheus_allow_protobuf should be set to true, and the Prometheus server should send the accept header with protobuf.
Fixes#12931Closesscylladb/scylladb#16737
* github.com:scylladb/scylladb:
main.cc: Add prometheus_allow_protobuf command line
histogram_metrics_helper: support native histogram
config: Add prometheus_allow_protobuf flag
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for rjson::value, and drop its
operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16956
approx_exponential_histogram uses similar logic to Prometheus native
histogram, to allow Prometheus sending its data in a native histogram
format it needs to report schema and min id (id of the first bucket).
This patch update to_metrics_histogram to set those optional parameters,
leaving it to the Prometheus to decide in what format the histogram will
be reported.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for enum_option<>. since its
operator<<() is still used by the homebrew generic formatter for
formatting vector<>, operator<<() is preserved.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16917
Standard containers don't have constructors that take ranges;
instead people use boost::copy_range or C++23 std::ranges::to.
Make the API more uniform by removing this special constructor.
The only caller, in a test, is adjusted.
Closesscylladb/scylladb#16905
this is to mimic the formatting of `human_readable_value`, and
to prepare for consolidating these two formatters, so we don't have
two pretty printers in the tree.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
keep the precision of 4 digits, for instance, so that we format
"8191" as "8191" instead of as "8 Ki". this is modeled after
the behavior of `to_hr_size()`. for better user experience.
and also prepares to consolidate these two formatters.
tests are updated to exercise both IEC and SI notations.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This change introduces a specialization of fmt::formatter
for utils::tagged_integer. This enables the usage of this
type with FMTv10, which dropped the default generated formatter.
Usage of utils::tagged_integer without the defined formatter
resulted in compilation error when compiled with FMTv10.
Refs: #13245
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#16715
before this change, `std::invalid_argument` is thrown by
`bpo::notify(configuration)` in `app_template::run_deprecated()` when
invalid option is passed in via command line. `utils::named_value`
throws `std::invalid_argument` if the given value is not listed in
`_allowed_values`. but we don't handle `std::invalid_argument` in
`app_template::run_deprecated()`. so the application aborts with
unhandled exception if the specified argument is not allowed.
in this change, we convert the `std::invalid_argument` to a
derived class of `bpo::error` in the customized notify handler,
so that it can be handled in `app_template::run_deprecated()`.
because `name_value::operator()` is also used otherwhere, we
should not throw a bpo::error there. so its exception type
is preserved.
Fixes#16687
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16688
The test case that validates upload-sink works does this by getting several random ranges from the uploaded object and checks that the content is what it should be. The range boundaries are generated like this:
```
uint64_t len = random(1, chunk_size);
uint64_t offset = random(file_size) - len;
```
The 2nd line is not correct, if random number happens less than the len the offset befomes "negative", i.e. -- very large 64-bit unsigned value.
Next, this offset:len gets into s3 client's get_object_contiguous() helper which in turn converts them into http range header's bytes-specifier format which is "first_bytet-last_byte" one. The math here is
```
first_byte = offset;
last_byte = offset + len - 1;
```
Here the overflow of the offset thing results in underflow of the last_byte -- it becomes less than the first_byte. According to RFC this range-specifier is invalid and (!) can be ignored by the server. This is what minio does -- it ignores invalid range and returns back full object.
But that's not all. When returning object portion the http request status code is PartialContent, but when the range is ignored and full object is returned, the status is OK. This makes s3 client's request fail with unexpected_status_error in the middle of the test. Then the object is removed with deferred action and actual error is printed into logs. In the end of the day logs look as if deletion of an object failed with OK status %)
fixes: #16133Closesscylladb/scylladb#16324
* github.com:scylladb/scylladb:
test/s3: Avoid object range overflow
s3/client: Handle GET-with-Range overflows correctly
Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later.
This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted.
The tracking and fencing is implemented by using global "sessions", created for streaming of a single tablet. Session is globally identified by UUID. The identifier is assigned by the topology change coordinator, and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained.
The barrier is blocked only if there is some session with work which was left behind by unsuccessful streaming. In which case it should not be blocked for long, because streaming process checks often if the guard was left behind and stops if it was.
This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas.
Closesscylladb/scylladb#15847
* github.com:scylladb/scylladb:
test: tablets: Add test for failed streaming being fenced away
error_injection: Introduce poll_for_message()
error_injection: Make is_enabled() public
api: Add API to kill connection to a particular host
range_streamer: Do not block topology change barriers around streaming
range_streamer, tablets: Do not keep token metadata around streaming
tablets: Fail gracefully when migrating tablet has no pending replica
storage_service, api: Add API to disable tablet balancing
storage_service, api: Add API to migrate a tablet
storage_service, raft topology: Run streaming under session topology guard
storage_service, tablets: Use session to guard tablet streaming
tablets: Add per-tablet session id field to tablet metadata
service: range_streamer: Propagate topology_guard to receivers
streaming: Always close the rpc::sink
storage_service: Introduce concept of a topology_guard
storage_service: Introduce session concept
tablets: Fix topology_metadata_guard holding on to the old erm
docs: Document the topology_guard mechanism
The get_object_contiguous() accepts optional range argument in a form of
offset:lengh and then converts it into first_byte:last_byte pair to
satisfy http's Range header range-specifier.
If the lat_byte, which is offset + lenght - 1, overflows 64-bits the
range specifier becomes invalid. According to RFC9110 servers may ignore
invalid ranges if they want to and this is what minio does.
The result is pretty interesting. Since the range is specified, client
expect PartialContent response, but since the range is ignored by server
the result is OK, as if the full object was requested. So instead of
some sane "overflow" error, the get_object_contiguous() fails with
status "success".
The fix is in pre-checking provided ranges and failing early
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors.
Refs: https://github.com/scylladb/scylladb/issues/16255Closesscylladb/scylladb#16289
* github.com:scylladb/scylladb:
Update unified/build_unified.sh
Update main.cc
Update dist/common/scripts/scylla-housekeeping
Typos: fix typos in code
Prototype implementation of format suggested/requested by @avikivity:
Divides segments into disk-write-alignment sized pages, each tagged with segment ID + CRC of data content.
When read, we both verify sector integrity (CRC) to detect corruption, as well as matching ID read with expected one.
If the latter mismatches we have a prematurely terminated segment (read truncation), which, depending on whether the CL is
written in batch or periodic mode, as well as explicit sync, can mean data loss.
Note: all-zero pages are treated as kosher, both to align with newly allocated segments, as well as fully terminated (zero-page) ones.
Note: This is a preview/RFC - the rest of the file format is not modified. At least parts of entry CRC could probably be removed, but I have not done so yet (needs some thinking).
Note: Some slight abstraction breaks in impl. and probably less than maximal efficiency.
v2:
* Removed entry CRC:s in file format.
* Added docs on format v3
* Added one more test for recycling-truncation
v3:
* Fixed typos in size calc and docs
* Changed sect metadata order
* Explicit iter type
Closesscylladb/scylladb#15494
* github.com:scylladb/scylladb:
commitlog_test: Add test for replaying large-ish mutation
commitlog_test: Add additional test for segmnent truncation
docs: Add docs on commitlog format 3
commitlog: Remove entry CRC from file format
commitlog: Implement new format using CRC:ed sectors
commitlog: Add iterator adaptor for doing buffer splitting into sub-page ranges
fragmented_temporary_buffer: Add const iterator access to underlying buffers
commitlog_replayer: differentiate between truncated file and corrupt entries
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Current mechanism to deprecate config options is implemented in a hacky
way in `main.cpp` and doesn't account for existing
`db::config/boost::po` API controlling lifetime of config options, hence
it's being replaced in this PR by adding yet another `value_status`
enumerator: `Deprecated`, so that deprecation of config options is
controlled in one place in `config.cc`,i.e. when specifying config options.
Motivation: https://docs.google.com/document/d/18urPG7qeb7z7WPpMYI2V_lCOkM5YGKsEU78SDJmt8bM/edit?usp=sharing
With this change, if a `Deprecated` config option is specified as
1. a command line parameter, scylla will run and log:
```
WARN 2023-11-25 23:37:22,623 [shard 0:main] init - background-writer-scheduling-quota option ignored (deprecated)
```
(Previously it was only a message printed to standard output, not a
scylla log of warn level).
2. an option in `scylla.yaml`, scylla will run and log:
```
WARN 2023-11-27 23:55:13,534 [shard 0:main] init - Option is deprecated : background_writer_scheduling_quota
```
Fixes#15887
Incorporates dropped https://github.com/scylladb/scylladb/pull/15928Closesscylladb/scylladb#16184
TOC file is read and parsed in several places in the code. All do it differently, and it's worth generalizing this place.
To make it happen also fix the S3 readable_file so that it could be used inside file_input_stream.
Closesscylladb/scylladb#16175
* github.com:scylladb/scylladb:
sstable: Generalize toc file read and parse
s3/client: Don't GET object contents on out-of-bound reads
s3/client: Cache stats on readable_file
If S3 readable file is used inside file input stream, the latter may
call its read methods with position that is above file size. In that
case server replies with generic http error and the fact that the range
was invalid is encoded into reply body's xml.
That's not great to catch this via wrong reply status exception and xml
parsing all the more so we can know that the read is out-of-bound in
advance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
S3-based sstables components are immutable, so every time stat is called
there's no need to ping server again.
But the main intention of this patch is to provide stats for read calls
in the next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now its plain updateable_value, but without the ..._source object the
updateable_value is just a no-op value holder. In order for the
observers to operate there must be the value source, updating it would
update the attached updateable values _and_ notify the observers.
In order for the config to be the u.v._source, config entries should be
comparable to each other, thus the <=> operator for it
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When http request resolves with excpetion it makes sense to translate
the network exception into storage exceptio to make upper layers think
that it was some sort of IO error, not SUDDENLY and http one.
The translation is, for now, pretty simple:
- 404 and 3xx -> ENOENT
- 403(forbidden) and 401(unauthorized) -> EACCESS
- anything else -> EIO
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
vla (variable length array) is an extension in GCC and Clang. and
it is not part of the C++ standard.
so let's avoid using it if possible, for better standard compliant.
it's also more consistent with other places where we calculate the size
of an array of T in the same source file.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16084
instead of linking against cryptopp, we should link against
crytopp::crytopp. the latter is the target exposed by
Findcryptopp.cmake, while the former is but a library name which
is not even exposed by any find_package() call.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16060
The jumbo sink is there to upload files that can be potentially larger
than 50Gb (10000*5Mb). For that the sink uploads a set of so called
"pieces" -- files up to 50Gb each -- then uses the copy-upload APi call
to squash the pieces together. After copying the piece is removed. In
case of a crash while uploading pieces remain in the bucket forever
which is not great.
This patch tags pieces with 'kind=piece' tag in order to tell pieces
from regular objects. This can be used, for example, by setting up the
lifecycle tag-based policy and collect dangling pieces eventually.
fixes: #13670
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#16023
to have feature parity with `configure.py`. we won't need this
once we migrate to C++20 modules. but before that day comes, we
need to stick with C++ headers.
we generate a rule for each .hh files to create a corresponding
.cc and then compile it, in order to verify the self-containness of
that header. so the number of rule is quite large, to avoid the
unnecessary overhead. the check-header target is enabled only if
`Scylla_CHECK_HEADERS` option is enabled.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15913
This PR implements the following new nodetool commands:
* snapshot
* drain
* flush
* disableautocompaction
* enableautocompaction
All commands come with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588Closesscylladb/scylladb#15939
* github.com:scylladb/scylladb:
test/nodetool: add README.md
tools/scylla-nodetool: implement enableautocompaction command
tools/scylla-nodetool: implement disableautocompaction command
tools/scylla-nodetool: implement the flush command
tools/scylla-nodetool: extract keyspace/table parsing
tools/scylla-nodetool: implement the drain command
tools/scylla-nodetool: implement the snapshot command
test/nodetool: add support for matching aproximate query parameters
utils/http: make dns_connection_factory::initialize() static
Said method can out-live the factory instance. This was not a problem
because the method takes care to keep all its need from `this` alive, by
copying them to the coroutine stack. However, this fact that this method
can out-live the instance is not obvious, and an unsuspecting developer
(me) added a new member (_logger) which was not kept alive.
This can cause a use-after-free in the factory. Fix by making
initialize() static, forcing the instance to pass all parameters
explicitely and add a comment explaining that this method can out-live
the instance.
on debian derivatives librapidxml-dev installs rapidxml.h as
rapixml/rapidxml.hpp, so let's use it as a fallback.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15814