mirror of
https://github.com/scylladb/scylladb.git
synced 2026-05-23 00:02:37 +00:00
Drop local formatter for seastar::http::reply, which should have been added to Seastar in the first place, and now conflicts. Also drop local formatters for types that are aliases for Seastar types which have gained formatters. Disable recently-gained TLS use of OpenSSL instead of gnutls. We don't need it, and it causes link errors with LTO. Fix incorrect skipping in encrypted_file_test, which computed the remaining stream length but did not account for already consumed size_to_compare. Change utils::gcp::storage::client::object_data_source::skip() to match new Seastar behavior (rejecting skip-past-eof with an exception). This is needed since30f1075544switched the test's data source to a Seastar implementation. It is also more correct - if we're asked to skip n bytes but the stream doesn't have n bytes, this is a protocol violation. Contains test fix from Pavel, exposed by [1]: test: Handle premature EOF in test_gcp_storage_skip_read The test intentionally uses file_size larger than the actual object to exercise EOF behavior. When input_stream::skip() is called after EOF, it throws std::runtime_error("premature end of stream"). Catch this specific exception from both streams, verify they agree, and exit the loop gracefully. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> [1]cbd1e17d2f, included in this Seastar submodule update * seastar 4d268e0e...485a62b2 (50): > reactor: open_directory(): honor bypass_fsync > http: Add formatters for http::request and http::reply > Merge 'Assorted set of io-tester cleanups' from Pavel Emelyanov io_tester: Remove unused and internal-only accessor io_tester: Move think-time machinery into thinker_state io_tester: Move _file to io_class_data io_tester: Replace class_data::_start member with a local variable io_tester: Move _alignment from class_data to io_class_data io_tester: Remove buffer allocation from top-level request issuing io_tester: Cleanup context::stop() invocation io_tester: Allocate write buffer once to fill a file io_tester: Declare quantiles arrays as static constexpr io_tester: Drop class_data::type_str() io_tester: Replace != "" comparisons with .empty() io_tester: Replace gen_class_data() if/else chain with a switch io_tester: Deduplicate vectorized I/O classes > io_tester: fix crash from missing metric during startup > net: tls: adjust openssl integration to new module support > http/client: Count and export integrated queue length > Merge 'Introduce pipe_data_source_impl and pipe_data_sink_impl' from Pavel Emelyanov fstream: add pipe_data_source_impl and pipe_data_sink_impl pollable_fd: add write_some/write_all backed by writev pollable_fd: rename write_some/write_all(iovec) to send_some/send_all > reactor: Make pollable_fd_state helper methods private > module: extend seastar.cppm with comprehensive public API exports > Merge 'Add exhaustive input_stream invariant test + fixes' from Pavel Emelyanov tests: add exhaustive input_stream read/skip invariant test iostream: make skip() reject premature end of stream with exception > Merge 'Allow runtime selectability of GnuTLS or OpenSSL' from Noah Watkins net/tls: avoid potential read-past-buffer net/tls: move credential methods to generic tls layer net/tls: rename credentials_impl::dh_params to set_dh_params test/tls: enable openssl tls unit test test/tls: fix CA cert generation to use v3_ca extensions github: disable parallel test execution in alpine workflow crypto: support compiling seastar without gnutls net/tcp: use crypto provider for md5 calculation tls: fix test_peer_certificate_chain_handling for OpenSSL net/tls: fix test for self-signed server cert opoenssl compat net/tls: disable priority strings test for openssl provider core/crypto: expose crypto backend name for introspection test/tls: remove gnutls version guard net/tls: add openssl tls backend http: use backend agnostic tls error code net/tls: make error codes configurable by each tls backend net/tls: move reloadable_credentials to generic tls layer net/tls: move build_certificate to generic tls layer net/tls: move apply_to() to generic tls layer net/tls: move credential methods to generic tls layer net/tls: add OpenSSL-specific methods to public API with no-op defaults net/tls: introduce dh_params and credentials abstraction layer net/tls: add credentials_impl abstract base class net/tls: dispatch tls::error_category() through crypto_provider net/tls: dispatch wrap_client/wrap_server through crypto_provider net/tls: add tls_backend interface to crypto_provider net/tls: move public tls API methods to generic tls layer net/tls: move formatting utilities to generic tls layer net/tls: move credentials_builder blob methods to generic tls layer net/tls: move dh_params::from_file to generic tls layer net/tls: move abstract_credentials file methods to generic tls layer net/tls: move tls_socket_impl to generic tls layer net/tls: move server_session to general tls layer net/tls: move tls_connected_socket_impl to generic tls layer net/tls: move net::get_impl to generic tls layer net/tls: move session_ref to generic tls layer net/tls: add session_impl abstract interface for tls pluggability net/tls: rename tls.cc to be gnutls specific crypto: introduce crypto provider abstraction http: remove unused include > tls: test_send_two_large > rpc: include exception type for remote errors > GHA: increase timeout to 60 minutes > apps/httpd: replace deprecated reply::done() with write_body() > missing header(s) > net: Fix missing throw for runtime_error in create_native_net_device > tests/io_queue: account for token bucket refill granularity in bandwidth checks > Merge 'iovec: fix iovec_trim_front infinite loop on zero-length iovecs' from Travis Downs tests: add regression tests for zero-length iovec handling iovec: fix iovec_trim_front infinite loop on zero-length iovecs > util/process: graduate process management API from experimental > cooking: don't register ready.txt as a build output > sstring: make make_sstring not static > Add SparkyLinux to debian list in install-dependencies.sh > http: allow control over default response headers > Merge 'chunked_fifo: make cached chunk retention configurable' from Brandon Allard tests/perf: add chunked_fifo microbenchmarks chunked_fifo: set the default free chunk retention to 0 chunked_fifo: make free chunk retention configurable > Merge 'reactor_backend: fix pollable_fd_state_completion reuse in io_uring' from Kefu Chai tests: add regression test for pollable_fd_state_completion reuse reactor_backend: use reset() in AIO and epoll poll paths reactor_backend: fix pollable_fd_state_completion reuse after co_await in io_uring > Merge 'coroutine: Generator cleanups' from Kefu Chai coroutine/generator: extract schedule_or_resume helper coroutine/generator: remove unused next_awaiter classes coroutine/generator: remove write-only _started field coroutine/generator: assert on unreachable path in buffered await_resume coroutine/generator: add elements_of tag and #include <ranges> coroutine/generator: add empty() to bounded_container concept > cmake: bump minimum Boost version to 1.79.0 > seastar_test: remove unnecessary headers > cmake: bump minimum GnuTLS version to 3.7.4 > Merge 'reactor: add get_all_io_queues() method' from Travis Downs tests: add unit test for reactor::get_all_io_queues() reactor: add get_all_io_queues() method reactor: move get_io_queue and try_get_io_queue to .cc file > http: deprecate reply::done(), remove _response_line dead field > core: Deprecate scattered_message > ci: add workflow dispatch to tests workflow > perf_tests: exit non-zero when -t pattern matches no tests > Replace duplicate SEGV_MAPERR check in sigsegv_action() with SEGV_ACCERR. > perf_tests: add total runtime to json output > Merge 'Relax large allocation error originating from json_list_template' from Robert Bindar implement move assignment operator for json_list_template json_list_template copy assignment operator reserves capacity upfront > perf_tests: add --no-perf-counters option > Merge 'Fix to_human_readable_value() ability to work with large values' from Pavel Emelyanov memory: Add compile-time test for value-to-human-readable conversion memory: Extend list of suffixes to have peta-s memory: Fix off-by-one in suffix calculation memory: Mark to_human_readable_value() and others constexpr > http: Improve writing of response_line() into the output > Merge 'websocket: add template parameter for text/binary frame mode and implement client-side WebSocket' from wangyuwei websocket: add template parameter for text/binary frame mode websocket: impl client side websocket function > file: Fix checks for file being read-only > reactor: Make do_dump_task_queue a task_queue method > Merge 'Implement fully mixed mode for output_stream-s' from Pavel Emelyanov tests/output_stream: sample type patterns in sanitizer builds tests/output_stream: extend invariant test to cover mixed write modes iostream: allow unrestricted mixing of buffered and zero-copy writes tests/output_stream: remove obsolete ad-hoc splitting tests tests/output_stream: add invariant-based splitting tests iostream: rename output_stream::_size to ::_buffer_size > reactor_backend: replace virtual bool methods with const bool_class members > resource: Avoid copying CPU vector to break it into groups > perf_tests: increase overhead column precision to 3 decimal places > Merge 'Move reactor::fdatasync() into posix_file_impl' from Pavel Emelyanov reactor: Deprecate fdatasync() method file: Do fdatasync() right in the posix_file_impl::flush() file: Propagate aio_fdatasync to posix_file_impl reactor: Move reactor::fdatasync() code to file.cc reactor,file: Make full use of file_open_options::durable bit file: Add file_open_options::durable boolean file: Account io_stats::fsyncs in posix_file_impl::flush() reactor: Move _fsyncs counter onto io_stats > http: Remove connection::write_body() Closes scylladb/scylladb#29553
1189 lines
47 KiB
C++
1189 lines
47 KiB
C++
/*
|
||
* Copyright (C) 2025-present ScyllaDB
|
||
*/
|
||
|
||
/*
|
||
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
|
||
*/
|
||
|
||
|
||
#include "object_storage.hh"
|
||
#include "gcp_credentials.hh"
|
||
#include "object_storage_retry_strategy.hh"
|
||
|
||
#include <algorithm>
|
||
#include <numeric>
|
||
#include <deque>
|
||
|
||
#include <boost/regex.hpp>
|
||
|
||
#include <seastar/core/align.hh>
|
||
#include <seastar/core/gate.hh>
|
||
#include <seastar/core/semaphore.hh>
|
||
#include <seastar/core/sleep.hh>
|
||
#include <seastar/core/units.hh>
|
||
#include <seastar/http/client.hh>
|
||
#include <seastar/util/short_streams.hh>
|
||
|
||
#include "utils/rest/client.hh"
|
||
#include "utils/exponential_backoff_retry.hh"
|
||
#include "utils/error_injection.hh"
|
||
#include "utils/exceptions.hh"
|
||
#include "utils/http.hh"
|
||
#include "utils/http_client_error_processing.hh"
|
||
#include "utils/overloaded_functor.hh"
|
||
|
||
static logger gcp_storage("gcp_storage");
|
||
|
||
static constexpr uint64_t min_gcp_storage_chunk_size = 256*1024;
|
||
static constexpr uint64_t default_gcp_storage_chunk_size = 8*1024*1024;
|
||
|
||
static constexpr char GCP_OBJECT_SCOPE_READ_ONLY[] = "https://www.googleapis.com/auth/devstorage.read_only";
|
||
static constexpr char GCP_OBJECT_SCOPE_READ_WRITE[] = "https://www.googleapis.com/auth/devstorage.read_write";
|
||
static constexpr char GCP_OBJECT_SCOPE_FULL_CONTROL[] = "https://www.googleapis.com/auth/devstorage.full_control";
|
||
|
||
static constexpr char STORAGE_APIS_URI[] = "https://storage.googleapis.com";
|
||
static constexpr char APPLICATION_JSON[] = "application/json";
|
||
static constexpr char LOCATION[] = "Location";
|
||
static constexpr char CONTENT_RANGE[] = "Content-Range";
|
||
static constexpr char RANGE[] = "Range";
|
||
|
||
using namespace std::string_literals;
|
||
using namespace utils::gcp;
|
||
|
||
static bool storage_scope_implies(const scopes_type& scopes, const scopes_type& check_for) {
|
||
if (default_scopes_implies_other_scope(scopes, check_for)) {
|
||
return true;
|
||
}
|
||
if (scopes_contains_scope(check_for, GCP_OBJECT_SCOPE_READ_ONLY)) {
|
||
return scopes_contains_scope(scopes, GCP_OBJECT_SCOPE_READ_WRITE)
|
||
|| scopes_contains_scope(scopes, GCP_OBJECT_SCOPE_FULL_CONTROL)
|
||
;
|
||
}
|
||
if (scopes_contains_scope(check_for, GCP_OBJECT_SCOPE_READ_WRITE)) {
|
||
return scopes_contains_scope(scopes, GCP_OBJECT_SCOPE_FULL_CONTROL);
|
||
}
|
||
return false;
|
||
}
|
||
|
||
static auto parse_rfc3339(const std::string& s) {
|
||
std::chrono::system_clock::time_point t;
|
||
std::istringstream is(s);
|
||
is >> std::chrono::parse("%FT%TZ", t);
|
||
return t;
|
||
}
|
||
|
||
class utils::gcp::storage::client::object_data_sink : public data_sink_impl {
|
||
shared_ptr<impl> _impl;
|
||
std::string _bucket;
|
||
std::string _object_name;
|
||
rjson::value _metadata;
|
||
std::string _session_path;
|
||
std::string _content_type;
|
||
std::deque<temporary_buffer<char>> _buffers;
|
||
uint64_t _accumulated = 0;
|
||
seastar::semaphore_units<> _mem_held;
|
||
bool _closed = false;
|
||
bool _completed = false;
|
||
seastar::named_gate _gate;
|
||
seastar::semaphore _semaphore;
|
||
std::exception_ptr _exception;
|
||
seastar::abort_source* _as;
|
||
public:
|
||
object_data_sink(shared_ptr<impl> i, std::string_view bucket, std::string_view object_name, rjson::value metadata, seastar::abort_source* as)
|
||
: _impl(i)
|
||
, _bucket(bucket)
|
||
, _object_name(object_name)
|
||
, _metadata(std::move(metadata))
|
||
, _semaphore(1)
|
||
, _as(as)
|
||
{}
|
||
future<> put(std::span<temporary_buffer<char>> bufs) override {
|
||
for (auto&& buf : bufs) {
|
||
_buffers.emplace_back(std::move(buf));
|
||
}
|
||
co_await maybe_do_upload(false);
|
||
}
|
||
future<> flush() override {
|
||
return maybe_do_upload(true);
|
||
}
|
||
future<> close() override {
|
||
if (!std::exchange(_closed, true)) {
|
||
co_await flush();
|
||
co_await _gate.close();
|
||
try {
|
||
if (!_exception && !_completed) {
|
||
co_await check_upload();
|
||
}
|
||
} catch (...) {
|
||
_exception = std::current_exception();
|
||
}
|
||
if (auto ex = std::exchange(_exception, {})) {
|
||
co_await remove_upload();
|
||
std::rethrow_exception(ex);
|
||
}
|
||
}
|
||
}
|
||
|
||
size_t buffer_size() const noexcept override {
|
||
return default_gcp_storage_chunk_size;
|
||
}
|
||
|
||
future<> acquire_session();
|
||
future<> do_single_upload(std::deque<temporary_buffer<char>>, size_t offset, size_t len, bool final);
|
||
future<> check_upload();
|
||
future<> remove_upload();
|
||
future<> adjust_memory_limit(size_t);
|
||
future<> maybe_do_upload(bool force) {
|
||
auto total = std::accumulate(_buffers.begin(), _buffers.end(), size_t{}, [](size_t s, auto& buf) {
|
||
return s + buf.size();
|
||
});
|
||
if (total == 0) {
|
||
co_return;
|
||
}
|
||
|
||
co_await adjust_memory_limit(total);
|
||
|
||
// GCP only allows upload of less than 256k on last chunk
|
||
if (total < min_gcp_storage_chunk_size && !_closed) {
|
||
co_return;
|
||
}
|
||
// avoid uploading unless we accumulate enough data. GCP docs says to
|
||
// try to keep uploads to 8MB or more.
|
||
if (force || total >= default_gcp_storage_chunk_size) {
|
||
auto bufs = std::exchange(_buffers, {});
|
||
auto start = _accumulated;
|
||
auto final = _closed;
|
||
|
||
if (!final) {
|
||
// can only write in multiples of 256.
|
||
auto rem = total % min_gcp_storage_chunk_size;
|
||
total -= rem;
|
||
while (rem) {
|
||
auto& buf = bufs.back();
|
||
if (buf.size() > rem) {
|
||
auto keep = buf.size() - rem;
|
||
_buffers.emplace(_buffers.begin(), buf.share(keep, rem));
|
||
buf.trim(keep);
|
||
break;
|
||
} else {
|
||
rem -= buf.size();
|
||
_buffers.emplace(_buffers.begin(), std::move(buf));
|
||
bufs.pop_back();
|
||
}
|
||
}
|
||
}
|
||
assert(std::accumulate(bufs.begin(), bufs.end(), size_t{}, [](size_t s, auto& buf) { return s + buf.size(); }) == total);
|
||
_accumulated += total;
|
||
// allow this in background
|
||
(void)do_single_upload(std::move(bufs), start, total, final);
|
||
}
|
||
}
|
||
};
|
||
|
||
class utils::gcp::storage::client::object_data_source : public seekable_data_source_impl {
|
||
shared_ptr<impl> _impl;
|
||
std::string _bucket;
|
||
std::string _object_name;
|
||
std::string _session_path;
|
||
uint64_t _generation = 0;
|
||
uint64_t _size = 0;
|
||
uint64_t _position = 0;
|
||
std::chrono::system_clock::time_point _timestamp;
|
||
seastar::semaphore_units<> _limits;
|
||
seastar::abort_source* _as;
|
||
std::deque<temporary_buffer<char>> _buffers;
|
||
|
||
size_t buffer_size() const {
|
||
return std::accumulate(_buffers.begin(), _buffers.end(), 0, [](size_t sum, auto& b) { return sum + b.size(); });
|
||
}
|
||
|
||
public:
|
||
object_data_source(shared_ptr<impl> i, std::string_view bucket, std::string_view object_name, seastar::abort_source* as)
|
||
: _impl(i)
|
||
, _bucket(bucket)
|
||
, _object_name(object_name)
|
||
, _as(as)
|
||
{}
|
||
future<temporary_buffer<char>> get() override;
|
||
future<temporary_buffer<char>> skip(uint64_t n) override;
|
||
future<> read_info();
|
||
void adjust_lease();
|
||
|
||
future<temporary_buffer<char>> get(size_t limit) override;
|
||
future<> seek(uint64_t pos) override;
|
||
future<uint64_t> size() override;
|
||
future<std::chrono::system_clock::time_point> timestamp() override;
|
||
};
|
||
|
||
using body_writer = std::function<future<>(output_stream<char>&&)>;
|
||
using writer_and_size = std::pair<body_writer, size_t>;
|
||
using body_variant = std::variant<std::string, writer_and_size>;
|
||
using handler_func_ex = rest::handler_func_ex;
|
||
using headers_type = std::vector<rest::key_value>;
|
||
|
||
using namespace rest;
|
||
|
||
class utils::gcp::storage::client::impl {
|
||
std::string _endpoint;
|
||
std::optional<google_credentials> _credentials;
|
||
seastar::semaphore _unlimited;
|
||
seastar::semaphore& _limits;
|
||
seastar::http::experimental::client _client;
|
||
shared_ptr<seastar::tls::certificate_credentials> _certs;
|
||
future<> authorize(request_wrapper& req, const std::string& scope);
|
||
public:
|
||
impl(const utils::http::url_info&, std::optional<google_credentials>, seastar::semaphore*, shared_ptr<seastar::tls::certificate_credentials> creds);
|
||
impl(std::string_view endpoint, std::optional<google_credentials>, seastar::semaphore*, shared_ptr<seastar::tls::certificate_credentials> creds);
|
||
|
||
future<> send_with_retry(const std::string& path, const std::string& scope, body_variant, std::string_view content_type, handler_func_ex, httpclient::method_type op, key_values headers = {}, seastar::abort_source* = nullptr);
|
||
future<> send_with_retry(const std::string& path, const std::string& scope, body_variant, std::string_view content_type, rest::httpclient::handler_func, httpclient::method_type op, key_values headers = {}, seastar::abort_source* = nullptr);
|
||
future<rest::httpclient::result_type> send_with_retry(const std::string& path, const std::string& scope, body_variant, std::string_view content_type, httpclient::method_type op, key_values headers = {}, seastar::abort_source* = nullptr);
|
||
|
||
auto get_units(size_t s) const {
|
||
return seastar::get_units(_limits, s);
|
||
}
|
||
auto try_get_units(size_t s) const {
|
||
return seastar::try_get_units(_limits, s);
|
||
}
|
||
future<> close();
|
||
};
|
||
|
||
future<> storage::client::impl::authorize(request_wrapper& req, const std::string& scope) {
|
||
if (_credentials) {
|
||
co_await _credentials->refresh(scope, &storage_scope_implies, _certs);
|
||
req.add_header(utils::gcp::AUTHORIZATION, format_bearer(_credentials->token));
|
||
}
|
||
}
|
||
|
||
utils::gcp::storage::client::impl::impl(const utils::http::url_info& url, std::optional<google_credentials> c, seastar::semaphore* memory, shared_ptr<seastar::tls::certificate_credentials> certs)
|
||
: _endpoint(url.host)
|
||
, _credentials(std::move(c))
|
||
, _unlimited(std::numeric_limits<ssize_t>::max())
|
||
, _limits(memory ? *memory : _unlimited)
|
||
, _client(std::make_unique<utils::http::dns_connection_factory>(url.host, url.port, url.is_https(), gcp_storage, certs), 100, seastar::http::experimental::client::retry_requests::yes)
|
||
{}
|
||
|
||
utils::gcp::storage::client::impl::impl(std::string_view endpoint, std::optional<google_credentials> c, seastar::semaphore* memory, shared_ptr<seastar::tls::certificate_credentials> certs)
|
||
: impl(utils::http::parse_simple_url(endpoint.empty() ? STORAGE_APIS_URI : endpoint), std::move(c), memory, std::move(certs))
|
||
{}
|
||
|
||
using status_type = seastar::http::reply::status_type;
|
||
|
||
static std::string get_gcp_error_message(std::string_view body) {
|
||
if (!body.empty()) {
|
||
try {
|
||
auto json = rjson::parse(body);
|
||
if (auto* error = rjson::find(json, "error")) {
|
||
if (auto msg = rjson::get_opt<std::string>(*error, "message")) {
|
||
return *msg;
|
||
}
|
||
}
|
||
} catch (...) {
|
||
}
|
||
}
|
||
return "no info";
|
||
}
|
||
|
||
static future<std::string> get_gcp_error_message(input_stream<char>& in) {
|
||
auto s = co_await util::read_entire_stream_contiguous(in);
|
||
co_return get_gcp_error_message(s);
|
||
}
|
||
|
||
utils::gcp::storage::storage_error::storage_error(const std::string& msg)
|
||
: std::runtime_error(msg)
|
||
, _status(-1)
|
||
{}
|
||
|
||
utils::gcp::storage::storage_error::storage_error(int status, const std::string& msg)
|
||
: std::runtime_error(fmt::format("{}: {}", status, msg))
|
||
, _status(status)
|
||
{}
|
||
|
||
using namespace seastar::http;
|
||
using namespace std::chrono_literals;
|
||
|
||
/**
|
||
* Performs a REST post/put/get with credential refresh/retry.
|
||
*/
|
||
future<>
|
||
utils::gcp::storage::client::impl::send_with_retry(const std::string& path, const std::string& scope, body_variant body, std::string_view content_type, handler_func_ex handler, httpclient::method_type op, key_values headers, seastar::abort_source* as) {
|
||
rest::request_wrapper req(_endpoint);
|
||
req.target(path);
|
||
req.method(op);
|
||
|
||
for (auto& [k,v] : headers) {
|
||
req.add_header(k, v);
|
||
}
|
||
|
||
std::visit(overloaded_functor {
|
||
[&](const std::string& s) { req.content(content_type, s); },
|
||
[&](const writer_and_size& ws) { req.content(content_type, ws.first, ws.second); }
|
||
}, body);
|
||
|
||
// GCP storage requires this even if content is empty
|
||
req.add_header("Content-Length", std::to_string(req.request().content_length));
|
||
|
||
gcp_storage.trace("Sending: {}", redacted_request_type {
|
||
req.request(),
|
||
bearer_filter()
|
||
});
|
||
|
||
try {
|
||
try {
|
||
co_await authorize(req, scope);
|
||
} catch (...) {
|
||
// just disregard the failure, we will retry below in the wrapped handler
|
||
}
|
||
auto wrapped_handler = [this, handler = std::move(handler), &req, scope](const reply& rep, input_stream<char>& in) -> future<> {
|
||
auto _in = std::move(in);
|
||
auto status_class = reply::classify_status(rep._status);
|
||
/*
|
||
* Surprisingly Google Cloud Storage (GCS) commonly returns HTTP 308 during resumable uploads, including when you use PUT. This is expected behavior and
|
||
* not an error. The 308 tells the client to continue the upload at the same URL without changing the method or body, which is exactly how GCS’s
|
||
* resumable upload protocol works.
|
||
*/
|
||
if (status_class != reply::status_class::informational && status_class != reply::status_class::success &&
|
||
rep._status != status_type::permanent_redirect) {
|
||
if (rep._status == status_type::unauthorized) {
|
||
gcp_storage.warn("Request to failed with status {}. Refreshing credentials.", rep._status);
|
||
co_await authorize(req, scope);
|
||
}
|
||
auto content = co_await util::read_entire_stream_contiguous(_in);
|
||
auto error_msg = get_gcp_error_message(std::string_view(content));
|
||
gcp_storage.debug("Got unexpected response status: {}, content: {}", rep._status, content);
|
||
co_await coroutine::return_exception_ptr(std::make_exception_ptr(httpd::unexpected_status_error(rep._status)));
|
||
}
|
||
|
||
std::exception_ptr eptr;
|
||
try {
|
||
// TODO: rename the fault injection point to something more generic
|
||
if (utils::get_local_injector().enter("s3_client_fail_authorization")) {
|
||
throw httpd::unexpected_status_error(status_type::unauthorized);
|
||
}
|
||
co_await handler(rep, _in);
|
||
} catch (...) {
|
||
eptr = std::current_exception();
|
||
}
|
||
if (eptr) {
|
||
co_await coroutine::return_exception_ptr(std::move(eptr));
|
||
}
|
||
};
|
||
object_storage_retry_strategy retry_strategy(10,10ms,10000ms, as);
|
||
co_return co_await rest::simple_send(_client, req, wrapped_handler, &retry_strategy, as);
|
||
} catch (...) {
|
||
try {
|
||
std::rethrow_exception(std::current_exception());
|
||
} catch (const httpd::unexpected_status_error& e) {
|
||
auto status = e.status();
|
||
|
||
if (reply::classify_status(status) == reply::status_class::redirection || status == reply::status_type::not_found) {
|
||
throw storage_io_error{ENOENT, format("GCP object doesn't exist ({})", status)};
|
||
}
|
||
if (status == reply::status_type::forbidden || status == reply::status_type::unauthorized) {
|
||
throw storage_io_error{EACCES, format("GCP access denied ({})", status)};
|
||
}
|
||
|
||
throw storage_io_error{EIO, format("GCP request failed with ({})", status)};
|
||
} catch (...) {
|
||
throw storage_io_error{EIO, format("GCP error ({})", std::current_exception())};
|
||
}
|
||
}
|
||
}
|
||
|
||
future<>
|
||
utils::gcp::storage::client::impl::send_with_retry(const std::string& path, const std::string& scope, body_variant body, std::string_view content_type, rest::httpclient::handler_func f, httpclient::method_type op, key_values headers, seastar::abort_source* as) {
|
||
co_await send_with_retry(path, scope, std::move(body), content_type, [f](const seastar::http::reply& rep, seastar::input_stream<char>& in) -> future<> {
|
||
// ensure these are on our coroutine frame.
|
||
auto& resp_handler = f;
|
||
auto result = co_await util::read_entire_stream_contiguous(in);
|
||
resp_handler(rep, result);
|
||
}, op, headers, as);
|
||
}
|
||
|
||
future<rest::httpclient::result_type>
|
||
utils::gcp::storage::client::impl::send_with_retry(const std::string& path, const std::string& scope, body_variant body, std::string_view content_type, httpclient::method_type op, key_values headers, seastar::abort_source* as) {
|
||
rest::httpclient::result_type res;
|
||
co_await send_with_retry(path, scope, std::move(body), content_type, [&res](const seastar::http::reply& r, std::string_view body) {
|
||
gcp_storage.trace("{}", body);
|
||
res.reply._status = r._status;
|
||
res.reply._content = sstring(body);
|
||
res.reply._headers = r._headers;
|
||
res.reply._version = r._version;
|
||
}, op, headers, as);
|
||
co_return res;
|
||
}
|
||
|
||
future<> utils::gcp::storage::client::impl::close() {
|
||
co_await _client.close();
|
||
}
|
||
|
||
// Get an upload session for the given object
|
||
// See https://cloud.google.com/storage/docs/resumable-uploads
|
||
// See https://cloud.google.com/storage/docs/performing-resumable-uploads
|
||
future<> utils::gcp::storage::client::object_data_sink::acquire_session() {
|
||
std::string body;
|
||
if (!_metadata.IsNull()) {
|
||
body = rjson::print(_metadata);
|
||
}
|
||
auto path = fmt::format("/upload/storage/v1/b/{}/o?uploadType=resumable&name={}"
|
||
, _bucket
|
||
, seastar::http::internal::url_encode(_object_name)
|
||
);
|
||
|
||
auto reply = co_await _impl->send_with_retry(path
|
||
, GCP_OBJECT_SCOPE_READ_WRITE
|
||
, std::move(body)
|
||
, APPLICATION_JSON
|
||
, httpclient::method_type::POST
|
||
, {}
|
||
, _as
|
||
);
|
||
|
||
if (reply.result() != status_type::ok) {
|
||
throw failed_operation(int(reply.result()), get_gcp_error_message(reply.body()));
|
||
}
|
||
std::string location = reply.reply._headers[LOCATION];
|
||
gcp_storage.debug("Upload {}/{} -> session uri {}", _bucket, _object_name, location);
|
||
_session_path = utils::http::parse_simple_url(location).path;
|
||
}
|
||
|
||
static const boost::regex range_ex("bytes=(\\d+)-(\\d+)");
|
||
|
||
static bool parse_response_range(const seastar::http::reply& r, uint64_t& first, uint64_t& last) {
|
||
auto& res_headers = r._headers;
|
||
auto i = res_headers.find(RANGE);
|
||
if (i == res_headers.end()) {
|
||
return false;
|
||
}
|
||
|
||
boost::smatch m;
|
||
std::string tmp(i->second);
|
||
if (!boost::regex_match(tmp, m, range_ex)) {
|
||
return false;
|
||
}
|
||
first = std::stoull(m[1].str());
|
||
last = std::stoull(m[2].str());
|
||
|
||
return true;
|
||
}
|
||
|
||
future<> utils::gcp::storage::client::object_data_sink::adjust_memory_limit(size_t total) {
|
||
auto held = _mem_held.count();
|
||
if (held < total) {
|
||
auto want = align_up(total, default_gcp_storage_chunk_size) - held;
|
||
|
||
if (held == 0) {
|
||
// first put into buffer queue. enforce.
|
||
_mem_held = co_await _impl->get_units(want);
|
||
} else {
|
||
// try to get units to cover the bulk of buffers
|
||
// but if we fail, we accept it and try to get by
|
||
// with the lease we have. If we get here we should
|
||
// have at least 8M in our lease, and will in fact do
|
||
// a write, so data should get released.
|
||
auto h = _impl->try_get_units(want);
|
||
if (h) {
|
||
_mem_held.adopt(std::move(*h));
|
||
}
|
||
}
|
||
}
|
||
}
|
||
|
||
// Write a chunk to the dest object
|
||
// See https://cloud.google.com/storage/docs/resumable-uploads
|
||
// See https://cloud.google.com/storage/docs/performing-resumable-uploads
|
||
future<> utils::gcp::storage::client::object_data_sink::do_single_upload(std::deque<temporary_buffer<char>> bufs, size_t offset, size_t len, bool final) {
|
||
// always take the whole memory lease. This might be more or less than what we actually release
|
||
// but we only ever leave sub-256k amount of data in queue, and we want the next
|
||
// put to enforce waiting for a full 8M lease...
|
||
auto mine_held = std::exchange(_mem_held, {});
|
||
|
||
// Ensure to block close from completing
|
||
auto h = _gate.hold();
|
||
// Enforce our concurrency constraints
|
||
auto sem_units = co_await seastar::get_units(_semaphore, 1);
|
||
|
||
// our file range. if the sink was closed, we can set the
|
||
// final size, otherwise, leave it open (*)
|
||
auto last = offset + std::max(len, size_t(1)) - 1; // inclusive.
|
||
auto end = offset + len;
|
||
|
||
for (;;) {
|
||
auto range = fmt::format("bytes {}-{}/{}"
|
||
, offset // first byte
|
||
, last // last byte
|
||
, final ? std::to_string(end) : "*"s
|
||
);
|
||
|
||
try {
|
||
if (_session_path.empty()) {
|
||
co_await acquire_session();
|
||
}
|
||
|
||
gcp_storage.debug("{}:{} write range {}-{}", _bucket, _object_name, offset, offset+len);
|
||
|
||
auto res = co_await _impl->send_with_retry(_session_path
|
||
, GCP_OBJECT_SCOPE_READ_WRITE
|
||
, std::make_pair([&](output_stream<char>&& os_in) -> future<> {
|
||
auto os = std::move(os_in);
|
||
for (auto& buf : bufs) {
|
||
co_await os.write(buf.share());
|
||
}
|
||
co_await os.flush();
|
||
co_await os.close();
|
||
}, len)
|
||
, ""s // no content type
|
||
, httpclient::method_type::PUT
|
||
, rest::key_values({ { CONTENT_RANGE, range } })
|
||
, _as
|
||
);
|
||
|
||
switch (res.result()) {
|
||
case status_type::ok:
|
||
case status_type::created:
|
||
_completed = true;
|
||
gcp_storage.debug("{}:{} completed ({} bytes)", _bucket, _object_name, offset+len);
|
||
co_return; // done and happy
|
||
default:
|
||
if (int(res.result()) == 308) {
|
||
uint64_t first = 0, new_last = 0;
|
||
if (parse_response_range(res.reply, first, new_last) && last != new_last) {
|
||
auto written = (new_last + 1) - offset;
|
||
|
||
gcp_storage.debug("{}:{} partial upload ({} bytes)", _bucket, _object_name, written);
|
||
|
||
if (!final && (len - written) < min_gcp_storage_chunk_size) {
|
||
written = len - std::min(min_gcp_storage_chunk_size, len);
|
||
}
|
||
|
||
auto to_remove = written;
|
||
while (to_remove) {
|
||
auto& buf = bufs.front();
|
||
auto size = std::min(to_remove, buf.size());
|
||
buf.trim_front(size);
|
||
if (buf.empty()) {
|
||
bufs.pop_front();
|
||
}
|
||
to_remove -= size;
|
||
}
|
||
offset += written;
|
||
len -= written;
|
||
auto total = std::accumulate(bufs.begin(), bufs.end(), size_t{}, [](size_t s, auto& buf) {
|
||
return s + buf.size();
|
||
});
|
||
assert(len == total);
|
||
continue;
|
||
}
|
||
// incomplete. ok for partial
|
||
gcp_storage.debug("{}:{} chunk {}:{} done", _bucket, _object_name, offset, offset+len);
|
||
co_return;
|
||
}
|
||
throw failed_upload_error(int(res.result()), get_gcp_error_message(res.body()));
|
||
}
|
||
} catch (...) {
|
||
_exception = std::current_exception();
|
||
gcp_storage.warn("Exception in upload of {}:{} ({}/{}): {}"
|
||
, _bucket
|
||
, _object_name
|
||
, offset
|
||
, len
|
||
, _exception
|
||
);
|
||
break;
|
||
}
|
||
}
|
||
}
|
||
|
||
// Check/close the final object.
|
||
future<> utils::gcp::storage::client::object_data_sink::check_upload() {
|
||
// Now we know the final size. Set it in range
|
||
auto range = fmt::format("bytes */{}", _accumulated);
|
||
|
||
auto res = co_await _impl->send_with_retry(_session_path
|
||
, GCP_OBJECT_SCOPE_READ_WRITE
|
||
, ""s
|
||
, APPLICATION_JSON
|
||
, httpclient::method_type::PUT
|
||
, rest::key_values({ { CONTENT_RANGE, range } })
|
||
, _as
|
||
);
|
||
|
||
switch (res.result()) {
|
||
case status_type::ok:
|
||
case status_type::created:
|
||
_completed = true;
|
||
gcp_storage.debug("{}:{} completed ({})", _bucket, _object_name, _accumulated);
|
||
co_return; // done and happy
|
||
default:
|
||
throw failed_upload_error(int(res.result()), fmt::format("{}:{} incomplete. ({}): {}"
|
||
, _bucket, _object_name, res.reply._headers[RANGE]
|
||
, get_gcp_error_message(res.body())
|
||
));
|
||
}
|
||
}
|
||
|
||
// https://cloud.google.com/storage/docs/performing-resumable-uploads#cancel-upload
|
||
future<> utils::gcp::storage::client::object_data_sink::remove_upload() {
|
||
if (_completed || _session_path.empty()) {
|
||
co_return;
|
||
}
|
||
|
||
gcp_storage.debug("Removing incomplete upload {}:{} ({})", _bucket, _object_name, _session_path);
|
||
|
||
auto res = co_await _impl->send_with_retry(_session_path
|
||
, GCP_OBJECT_SCOPE_READ_WRITE
|
||
, ""s
|
||
, APPLICATION_JSON
|
||
, httpclient::method_type::DELETE
|
||
, {}
|
||
, _as
|
||
);
|
||
|
||
switch (int(res.result())) {
|
||
case 499: // not in enum yet
|
||
gcp_storage.debug("Upload of {}:{} removed ({})", _bucket, _object_name, _session_path);
|
||
co_return; // done and happy
|
||
default: {
|
||
auto msg = get_gcp_error_message(res.body());
|
||
gcp_storage.warn("Failed to remove broken upload of {}:{} ({})", _bucket, _object_name, msg);
|
||
if (!_exception) {
|
||
throw failed_upload_error(int(res.result()), fmt::format("{}:{} incomplete. ({}): {}"
|
||
, _bucket, _object_name, res.reply._headers[RANGE]
|
||
, msg
|
||
));
|
||
}
|
||
}
|
||
}
|
||
}
|
||
|
||
// Read a single buffer from the source object
|
||
future<temporary_buffer<char>> utils::gcp::storage::client::object_data_source::get(size_t limit) {
|
||
// If we don't know the source size yet, get the info from server
|
||
if (_size == 0) {
|
||
co_await read_info();
|
||
}
|
||
|
||
// If we don't have buffers to give, try getting one from server
|
||
if (_buffers.empty()) {
|
||
auto to_read = std::min(_size - _position, limit);
|
||
|
||
// to_read == 0 -> eof
|
||
if (to_read != 0) {
|
||
gcp_storage.debug("Reading object {}:{} ({}-{}/{})", _bucket, _object_name, _position, _position+to_read, _size);
|
||
|
||
auto lease = _impl->try_get_units(to_read);
|
||
if (lease) {
|
||
if (_limits) {
|
||
_limits.adopt(std::move(*lease));
|
||
} else {
|
||
_limits = std::move(*lease);
|
||
}
|
||
} else {
|
||
// If we can't get a lease to cover this read, don't wait, as this
|
||
// could cause deadlock in higher layers, but instead adjust the
|
||
// size down to decrease memory pressure.
|
||
to_read = std::min(to_read, min_gcp_storage_chunk_size);
|
||
gcp_storage.debug("Reading object (adjusted) {}:{} ({}-{}/{})", _bucket, _object_name, _position, _position+to_read, _size);
|
||
}
|
||
|
||
// Ensure we read from the same generation as we queried in read_info. Note: mock server ignores this.
|
||
auto path = fmt::format("/storage/v1/b/{}/o/{}?ifGenerationMatch={}&alt=media"
|
||
, _bucket
|
||
, seastar::http::internal::url_encode(_object_name)
|
||
, _generation
|
||
);
|
||
auto range = fmt::format("bytes={}-{}", _position, _position+to_read-1); // inclusive range
|
||
|
||
co_await _impl->send_with_retry(path
|
||
, GCP_OBJECT_SCOPE_READ_ONLY
|
||
, ""s
|
||
, ""s
|
||
, [&](const seastar::http::reply& rep, seastar::input_stream<char>& in) -> future<> {
|
||
if (rep._status != status_type::ok && rep._status != status_type::partial_content) {
|
||
throw failed_operation(fmt::format("Could not read object {}: {} ({}/{} - {})", _bucket, _object_name, _position, _size, int(rep._status)));
|
||
}
|
||
auto old = _position;
|
||
// ensure these are on our coroutine frame.
|
||
auto bufs = co_await util::read_entire_stream(in);
|
||
for (auto&& buf : bufs) {
|
||
_position += buf.size();
|
||
_buffers.emplace_back(std::move(buf));
|
||
}
|
||
gcp_storage.debug("Read object {}:{} ({}-{}/{})", _bucket, _object_name, old, _position, _size);
|
||
}
|
||
, httpclient::method_type::GET
|
||
, rest::key_values({ { RANGE, range } })
|
||
, _as
|
||
);
|
||
}
|
||
}
|
||
|
||
temporary_buffer<char> res;
|
||
|
||
if (!_buffers.empty()) {
|
||
auto&& buf = _buffers.front();
|
||
if (buf.size() >= limit) {
|
||
res = buf.share(0, limit);
|
||
buf.trim_front(limit);
|
||
} else {
|
||
res = std::move(buf);
|
||
_buffers.pop_front();
|
||
}
|
||
}
|
||
|
||
adjust_lease();
|
||
|
||
co_return res;
|
||
}
|
||
|
||
future<temporary_buffer<char>> utils::gcp::storage::client::object_data_source::get() {
|
||
// If we don't have buffers to give, try getting one from server
|
||
co_return co_await get(default_gcp_storage_chunk_size);
|
||
}
|
||
|
||
future<> utils::gcp::storage::client::object_data_source::seek(uint64_t pos) {
|
||
if (_size == 0) {
|
||
co_await read_info();
|
||
}
|
||
auto buf_size = buffer_size();
|
||
assert(buf_size <= _position);
|
||
auto read_pos = _position - buf_size;
|
||
if (pos < read_pos || pos >= _position) {
|
||
_buffers.clear();
|
||
_position = std::min(pos, _size);
|
||
co_return;
|
||
}
|
||
auto n = pos - read_pos;
|
||
// Drop superfluous cache
|
||
while (n > 0 && !_buffers.empty()) {
|
||
auto m = std::min(n, _buffers.front().size());
|
||
_buffers.front().trim_front(m);
|
||
if (_buffers.front().empty()) {
|
||
_buffers.pop_front();
|
||
}
|
||
n -= m;
|
||
}
|
||
adjust_lease();
|
||
}
|
||
|
||
void utils::gcp::storage::client::object_data_source::adjust_lease() {
|
||
auto total = std::accumulate(_buffers.begin(), _buffers.end(), size_t{}, [](size_t s, auto& buf) {
|
||
return s + buf.size();
|
||
});
|
||
if (total < _limits.count()) {
|
||
_limits.return_units(_limits.count() - total);
|
||
}
|
||
}
|
||
|
||
future<uint64_t> utils::gcp::storage::client::object_data_source::size() {
|
||
if (_size == 0) {
|
||
co_await read_info();
|
||
}
|
||
co_return _size;
|
||
}
|
||
|
||
future<std::chrono::system_clock::time_point> utils::gcp::storage::client::object_data_source::timestamp() {
|
||
if (_timestamp.time_since_epoch().count() == 0) {
|
||
co_await read_info();
|
||
}
|
||
co_return _timestamp;
|
||
}
|
||
|
||
future<temporary_buffer<char>> utils::gcp::storage::client::object_data_source::skip(uint64_t n) {
|
||
auto buf_size = buffer_size();
|
||
assert(buf_size <= _position);
|
||
auto read_pos = _position - buf_size;
|
||
auto new_pos = read_pos + n;
|
||
co_await seek(new_pos);
|
||
// seek() clamps position to _size, so if the requested
|
||
// position exceeds _size we are skipping past EOF.
|
||
if (new_pos > _size) {
|
||
throw std::runtime_error("premature end of stream");
|
||
}
|
||
// And get the next buffer
|
||
co_return co_await get();
|
||
}
|
||
|
||
future<> utils::gcp::storage::client::object_data_source::read_info() {
|
||
gcp_storage.debug("Read info {}:{}", _bucket, _object_name);
|
||
|
||
auto path = fmt::format("/storage/v1/b/{}/o/{}", _bucket, seastar::http::internal::url_encode(_object_name));
|
||
|
||
auto res = co_await _impl->send_with_retry(path
|
||
, GCP_OBJECT_SCOPE_READ_ONLY
|
||
, ""s
|
||
, ""s
|
||
, httpclient::method_type::GET
|
||
, {}
|
||
, _as
|
||
);
|
||
|
||
if (res.result() != status_type::ok) {
|
||
throw failed_operation(fmt::format("Could not query object {}:{} {}", _bucket, _object_name, res.result()));
|
||
}
|
||
|
||
auto item = rjson::parse(std::move(res.body()));
|
||
// Ensure we got the info we asked for/expect
|
||
if (rjson::get<std::string>(item, "kind") != "storage#object"s) {
|
||
throw failed_operation("Malformed query object reply");
|
||
}
|
||
|
||
_size = std::stoull(rjson::get<std::string>(item, "size"));
|
||
_generation = std::stoull(rjson::get<std::string>(item, "generation"));
|
||
_timestamp = parse_rfc3339(rjson::get<std::string>(item, "updated"));
|
||
}
|
||
|
||
utils::gcp::storage::client::client(std::string_view endpoint, std::optional<google_credentials> c, shared_ptr<seastar::tls::certificate_credentials> certs)
|
||
: _impl(seastar::make_shared<impl>(endpoint, std::move(c), nullptr, std::move(certs)))
|
||
{}
|
||
|
||
utils::gcp::storage::client::client(std::string_view endpoint, std::optional<google_credentials> c, seastar::semaphore& memory, shared_ptr<seastar::tls::certificate_credentials> certs)
|
||
: _impl(seastar::make_shared<impl>(endpoint, std::move(c), &memory, std::move(certs)))
|
||
{}
|
||
|
||
utils::gcp::storage::client::~client() = default;
|
||
|
||
|
||
future<> utils::gcp::storage::client::create_bucket(std::string_view project, rjson::value meta) {
|
||
gcp_storage.debug("Create bucket {}:{}", project, rjson::get(meta, "name"));
|
||
|
||
auto path = fmt::format("/storage/v1/b?project={}", project);
|
||
auto body = rjson::print(meta);
|
||
|
||
auto res = co_await _impl->send_with_retry(path
|
||
, GCP_OBJECT_SCOPE_FULL_CONTROL
|
||
, body
|
||
, APPLICATION_JSON
|
||
, httpclient::method_type::POST
|
||
);
|
||
|
||
switch (res.result()) {
|
||
case status_type::ok:
|
||
case status_type::created:
|
||
co_return; // done and happy
|
||
default:
|
||
throw failed_operation(fmt::format("Could not create bucket {}: {}", rjson::get(meta, "name"), res.result()));
|
||
}
|
||
}
|
||
|
||
future<> utils::gcp::storage::client::create_bucket(std::string_view project, std::string_view bucket, std::string_view region, std::string_view storage_class) {
|
||
// Construct metadata. Could fmt::format, but this is somewhat safer.
|
||
rjson::value meta = rjson::empty_object();
|
||
rjson::add(meta, "name", std::string(bucket));
|
||
rjson::add(meta, "location", std::string(region.empty() ? "US" : region));
|
||
rjson::add(meta, "storageClass", std::string(storage_class.empty() ? "STANDARD" : storage_class));
|
||
|
||
rjson::value uniformBucketLevelAccess = rjson::empty_object();
|
||
rjson::add(uniformBucketLevelAccess, "enabled", true);
|
||
rjson::value iamConfiguration = rjson::empty_object();
|
||
rjson::add(iamConfiguration, "uniformBucketLevelAccess", std::move(uniformBucketLevelAccess));
|
||
rjson::add(meta, "iamConfiguration", std::move(iamConfiguration));
|
||
|
||
co_await create_bucket(project, std::move(meta));
|
||
}
|
||
|
||
future<> utils::gcp::storage::client::delete_bucket(std::string_view bucket_in) {
|
||
std::string bucket(bucket_in);
|
||
|
||
gcp_storage.debug("Delete bucket {}", bucket);
|
||
|
||
auto path = fmt::format("/storage/v1/b/{}", bucket);
|
||
|
||
auto res = co_await _impl->send_with_retry(path
|
||
, GCP_OBJECT_SCOPE_FULL_CONTROL
|
||
, ""s
|
||
, ""s
|
||
, httpclient::method_type::DELETE
|
||
);
|
||
|
||
switch (res.result()) {
|
||
case status_type::ok: // mock server sends wrong code, but seems acceptable
|
||
case status_type::no_content:
|
||
co_return; // done and happy
|
||
default:
|
||
throw failed_operation(fmt::format("Could not delete bucket {}: {}", bucket, res.result()));
|
||
}
|
||
}
|
||
|
||
static utils::gcp::storage::object_info create_info(const rjson::value& item) {
|
||
utils::gcp::storage::object_info info;
|
||
|
||
info.name = rjson::get<std::string>(item, "name");
|
||
info.content_type = rjson::get_opt<std::string>(item, "contentType").value_or(""s);
|
||
info.size = std::stoull(rjson::get<std::string>(item, "size"));
|
||
info.generation = std::stoull(rjson::get<std::string>(item, "generation"));
|
||
info.modified = parse_rfc3339(rjson::get<std::string>(item, "updated"));
|
||
|
||
return info;
|
||
}
|
||
|
||
// See https://cloud.google.com/storage/docs/listing-objects
|
||
// TODO: maybe make a generator? However, we don't have a streaming
|
||
// json parsing routine as such, so however we do this, we need to
|
||
// read all data from network, etc. Thus there is not all that much
|
||
// point in it. Return chunked_vector to avoid large alloc, but keep it
|
||
// in one object... for now...
|
||
future<utils::chunked_vector<utils::gcp::storage::object_info>> utils::gcp::storage::client::list_objects(std::string_view bucket_in, std::string_view prefix, bucket_paging& pager) {
|
||
utils::chunked_vector<utils::gcp::storage::object_info> result;
|
||
|
||
if (pager.done) {
|
||
co_return result;
|
||
}
|
||
|
||
std::string bucket(bucket_in);
|
||
|
||
gcp_storage.debug("List bucket {} (prefix={}, max_results={})", bucket, prefix, pager.max_results);
|
||
|
||
auto path = fmt::format("/storage/v1/b/{}/o", bucket);
|
||
auto psep = "?";
|
||
if (!prefix.empty()) {
|
||
path += fmt::format("{}prefix={}", psep, prefix);
|
||
psep = "&&";
|
||
}
|
||
if (pager.max_results != 0) {
|
||
path += fmt::format("{}maxResults={}", psep, pager.max_results);
|
||
psep = "&&";
|
||
}
|
||
if (!pager.token.empty()) {
|
||
path += fmt::format("{}pageToken={}", psep, pager.token);
|
||
psep = "&&";
|
||
}
|
||
|
||
co_await _impl->send_with_retry(path
|
||
, GCP_OBJECT_SCOPE_READ_ONLY
|
||
, ""s
|
||
, ""s
|
||
, [&](const seastar::http::reply& rep, seastar::input_stream<char>& in) -> future<> {
|
||
if (rep._status != status_type::ok) {
|
||
throw failed_operation(fmt::format("Could not list bucket {}: {} ({})", bucket, rep._status
|
||
, co_await get_gcp_error_message(in)
|
||
));
|
||
}
|
||
|
||
// ensure these are on our coroutine frame.
|
||
auto bufs = co_await util::read_entire_stream(in);
|
||
auto root = rjson::parse(std::move(bufs));
|
||
|
||
if (rjson::get<std::string>(root, "kind") != "storage#objects"s) {
|
||
throw failed_operation("Malformed list object reply");
|
||
}
|
||
|
||
auto items = rjson::find(root, "items");
|
||
if (!items) {
|
||
co_return;
|
||
}
|
||
if (!items->IsArray()) {
|
||
throw failed_operation("Malformed list object items");
|
||
}
|
||
|
||
pager.token = rjson::get_opt<std::string>(root, "nextPageToken").value_or(""s);
|
||
pager.done = pager.token.empty();
|
||
|
||
for (auto& item : items->GetArray()) {
|
||
object_info info = create_info(item);
|
||
result.emplace_back(std::move(info));
|
||
}
|
||
|
||
}
|
||
, httpclient::method_type::GET
|
||
);
|
||
|
||
co_return result;
|
||
}
|
||
|
||
future<utils::chunked_vector<utils::gcp::storage::object_info>> utils::gcp::storage::client::list_objects(std::string_view bucket, std::string_view prefix) {
|
||
bucket_paging dummy(0);
|
||
co_return co_await list_objects(bucket, prefix, dummy);
|
||
}
|
||
|
||
// See https://cloud.google.com/storage/docs/deleting-objects
|
||
future<> utils::gcp::storage::client::delete_object(std::string_view bucket_in, std::string_view object_name_in) {
|
||
std::string bucket(bucket_in), object_name(object_name_in);
|
||
|
||
gcp_storage.debug("Delete object {}:{}", bucket, object_name);
|
||
|
||
auto path = fmt::format("/storage/v1/b/{}/o/{}", bucket, seastar::http::internal::url_encode(object_name));
|
||
|
||
httpclient::result_type res;
|
||
try {
|
||
res = co_await _impl->send_with_retry(path, GCP_OBJECT_SCOPE_READ_WRITE, ""s, ""s, httpclient::method_type::DELETE);
|
||
} catch (const storage_io_error& ex) {
|
||
if (ex.code().value() == ENOENT) {
|
||
gcp_storage.debug("Could not delete {}:{} - no such object", bucket, object_name);
|
||
co_return; // ok...?
|
||
}
|
||
std::rethrow_exception(std::current_exception());
|
||
}
|
||
|
||
switch (res.result()) {
|
||
case status_type::ok:
|
||
case status_type::no_content:
|
||
gcp_storage.debug("Deleted {}:{}", bucket, object_name);
|
||
co_return; // done and happy
|
||
default:
|
||
throw failed_operation(fmt::format("Could not delete object {}:{}: {} ({})", bucket, object_name, res.result()
|
||
, get_gcp_error_message(res.body())
|
||
));
|
||
}
|
||
}
|
||
|
||
// See https://cloud.google.com/storage/docs/copying-renaming-moving-objects
|
||
// GCP does not support moveTo across buckets.
|
||
future<> utils::gcp::storage::client::rename_object(std::string_view bucket, std::string_view object_name, std::string_view new_bucket, std::string_view new_name) {
|
||
co_await copy_object(bucket, object_name, new_bucket, new_name);
|
||
co_await delete_object(bucket, object_name);
|
||
}
|
||
|
||
// See https://cloud.google.com/storage/docs/copying-renaming-moving-objects
|
||
future<> utils::gcp::storage::client::rename_object(std::string_view bucket_in, std::string_view object_name_in, std::string_view new_name_in) {
|
||
std::string bucket(bucket_in), object_name(object_name_in), new_name(new_name_in);
|
||
|
||
gcp_storage.debug("Move object {}:{} -> {}", bucket, object_name, new_name);
|
||
|
||
auto path = fmt::format("/storage/v1/b/{}/o/{}/moveTo/o/{}"
|
||
, bucket
|
||
, seastar::http::internal::url_encode(object_name)
|
||
, seastar::http::internal::url_encode(new_name)
|
||
);
|
||
auto res = co_await _impl->send_with_retry(path
|
||
, GCP_OBJECT_SCOPE_READ_WRITE
|
||
, ""s
|
||
, ""s
|
||
, httpclient::method_type::PUT
|
||
);
|
||
|
||
switch (res.result()) {
|
||
case status_type::ok:
|
||
case status_type::created:
|
||
gcp_storage.debug("Moved {}:{} to {}", bucket, object_name, new_name);
|
||
co_return; // done and happy
|
||
default:
|
||
throw failed_operation(fmt::format("Could not rename object {}:{}: {} ({})", bucket, object_name, res.result()
|
||
, get_gcp_error_message(res.body())
|
||
));
|
||
}
|
||
}
|
||
|
||
// See https://cloud.google.com/storage/docs/copying-renaming-moving-objects
|
||
// Copying an object in GCP can only process a certain amount of data in one call
|
||
// Must keep doing it until all data is copied, and check response.
|
||
future<> utils::gcp::storage::client::copy_object(std::string_view bucket_in, std::string_view object_name_in, std::string_view new_bucket_in, std::string_view to_name_in) {
|
||
std::string bucket(bucket_in), object_name(object_name_in), new_bucket(new_bucket_in), to_name(to_name_in);
|
||
|
||
auto path = fmt::format("/storage/v1/b/{}/o/{}/rewriteTo/b/{}/o/{}"
|
||
, bucket
|
||
, seastar::http::internal::url_encode(object_name)
|
||
, new_bucket
|
||
, seastar::http::internal::url_encode(to_name)
|
||
);
|
||
std::string body = "{}";
|
||
|
||
for (;;) {
|
||
auto res = co_await _impl->send_with_retry(path
|
||
, GCP_OBJECT_SCOPE_READ_WRITE
|
||
, body
|
||
, APPLICATION_JSON
|
||
, httpclient::method_type::POST
|
||
);
|
||
|
||
if (res.result() != status_type::ok) {
|
||
throw failed_operation(fmt::format("Could not copy object {}:{}: {} ({})", bucket, object_name, res.result()
|
||
, get_gcp_error_message(res.body())
|
||
));
|
||
}
|
||
|
||
auto resp = rjson::parse(res.body());
|
||
if (rjson::get<bool>(resp, "done")) {
|
||
gcp_storage.debug("Copied {}:{} to {}:{}", bucket, object_name, new_bucket, to_name);
|
||
co_return; // done and happy
|
||
}
|
||
|
||
auto token = rjson::get<std::string>(resp, "rewriteToken");
|
||
auto written = rjson::get<uint64_t>(resp, "totalBytesRewritten");
|
||
auto size = rjson::get<uint64_t>(resp, "objectSize");
|
||
|
||
// Call 2+ must include the rewriteToken
|
||
body = fmt::format("{{\"rewriteToken\": \"{}\"}}", token);
|
||
|
||
gcp_storage.debug("Partial copy of {}:{} to {}:{} ({}/{})", bucket, object_name, new_bucket, to_name, written, size);
|
||
}
|
||
}
|
||
|
||
future<utils::gcp::storage::object_info> utils::gcp::storage::client::merge_objects(std::string_view bucket_in, std::string_view dest_object_name, const std::vector<std::string>& source_object_names, rjson::value metadata, seastar::abort_source* as) {
|
||
rjson::value compose = rjson::empty_object();
|
||
rjson::value source_objects = rjson::empty_array();
|
||
|
||
if (source_object_names.size() > 32) {
|
||
throw std::invalid_argument(fmt::format("Can only merge up to 32 objects. {} requested.", source_object_names.size()));
|
||
}
|
||
|
||
for (auto& src : source_object_names) {
|
||
rjson::value obj = rjson::empty_object();
|
||
rjson::add(obj, "name", src);
|
||
rjson::push_back(source_objects, std::move(obj));
|
||
}
|
||
|
||
rjson::add(compose, "sourceObjects", std::move(source_objects));
|
||
rjson::add(compose, "destination", std::move(metadata));
|
||
|
||
std::string bucket(bucket_in), object_name(dest_object_name);
|
||
|
||
auto path = fmt::format("/storage/v1/b/{}/o/{}/compose", bucket, seastar::http::internal::url_encode(object_name));
|
||
auto body = rjson::print(compose);
|
||
|
||
auto res = co_await _impl->send_with_retry(path
|
||
, GCP_OBJECT_SCOPE_READ_WRITE
|
||
, body
|
||
, APPLICATION_JSON
|
||
, httpclient::method_type::POST
|
||
);
|
||
|
||
if (res.result() != status_type::ok) {
|
||
throw failed_operation(fmt::format("Could not merge to object {} -> {}:{}: {} ({})", source_object_names, bucket, object_name, res.result()
|
||
, get_gcp_error_message(res.body())
|
||
));
|
||
}
|
||
|
||
auto resp = rjson::parse(res.body());
|
||
|
||
co_return create_info(resp);
|
||
}
|
||
|
||
future<> utils::gcp::storage::client::copy_object(std::string_view bucket, std::string_view object_name, std::string_view to_name) {
|
||
co_await copy_object(bucket, object_name, bucket, to_name);
|
||
}
|
||
|
||
seastar::data_sink utils::gcp::storage::client::create_upload_sink(std::string_view bucket, std::string_view object_name, rjson::value metadata, seastar::abort_source* as) const {
|
||
return seastar::data_sink(std::make_unique<object_data_sink>(_impl, bucket, object_name, std::move(metadata), as));
|
||
}
|
||
|
||
seekable_data_source utils::gcp::storage::client::create_download_source(std::string_view bucket, std::string_view object_name, seastar::abort_source* as) const {
|
||
return seekable_data_source(std::make_unique<object_data_source>(_impl, bucket, object_name, as));
|
||
}
|
||
|
||
future<bool> storage::client::object_exists(std::string_view bucket, std::string_view object_name, seastar::abort_source* as) const {
|
||
gcp_storage.debug("Get object metadata {}:{}", bucket, object_name);
|
||
|
||
auto path = fmt::format("/storage/v1/b/{}/o/{}", bucket, seastar::http::internal::url_encode(object_name));
|
||
try {
|
||
auto res = co_await _impl->send_with_retry(path, GCP_OBJECT_SCOPE_READ_ONLY, ""s, ""s, httpclient::method_type::GET, {}, as);
|
||
if (res.result() != status_type::ok) {
|
||
throw failed_operation(
|
||
fmt::format("Could not retrieve object metadata {}:{}: {} ({})", bucket, object_name, res.result(), get_gcp_error_message(res.body())));
|
||
}
|
||
} catch (const storage_io_error& e) {
|
||
if (e.code().value() == ENOENT) {
|
||
co_return false;
|
||
}
|
||
throw;
|
||
}
|
||
co_return true;
|
||
}
|
||
|
||
future<> utils::gcp::storage::client::close() {
|
||
return _impl->close();
|
||
}
|
||
|
||
const std::string utils::gcp::storage::client::DEFAULT_ENDPOINT = "https://storage.googleapis.com";
|