Commit Graph

1532 Commits

Author SHA1 Message Date
Michał Chojnowski
c2e5d9e726 logalloc: add hold_reserve
mutation_partition_v2::apply_monotonically() needs to perform some allocations
in a destructor, to ensure that the invariants of the data structure are
restored before returning. But it is usually called with reclaiming disabled,
so the allocations might fail even in a perfectly healthy node with plenty of
reclaimable memory.

This patch adds a mechanism which allows to reserve some LSA memory (by
asking the allocator to keep it unused) and make it available for allocation
right when we need to guarantee allocation success.

(cherry picked from commit 7b3f55a65f)
2024-07-10 08:36:12 +00:00
Michał Chojnowski
80ff0688b1 logalloc: generalize refill_emergency_reserve()
In the next patch, we will want to do the thing as
refill_emergency_reserve() does, just with a quantity different
than _emergency_reserve_max. So we split off the shareable part
to a new function, and use it to implement refill_emergency_reserve().

(cherry picked from commit f784be6a7e)
2024-07-10 08:36:10 +00:00
Botond Dénes
43f77c71c7 [Backport 5.4] : Merge 'Fix usage of utils/chunked_vector::reserve_partial' from Lakshmi Narayanan Sreethar
utils/chunked_vector::reserve_partial: fix usage in callers

The method reserve_partial(), when used as documented, quits before the
intended capacity can be reserved fully. This can lead to overallocation
of memory in the last chunk when data is inserted to the chunked vector.
The method itself doesn't have any bug but the way it is being used by
the callers needs to be updated to get the desired behaviour.

Instead of calling it repeatedly with the value returned from the
previous call until it returns zero, it should be repeatedly called with
the intended size until the vector's capacity reaches that size.

This PR updates the method comment and all the callers to use the
right way.

Fixes #19254

Closes scylladb/scylladb#19279

* github.com:scylladb/scylladb:
  utils/large_bitset: remove unused includes identified by clangd
  utils/large_bitset: use thread::maybe_yield()
  test/boost/chunked_managed_vector_test: fix testcase tests_reserve_partial
  utils/lsa/chunked_managed_vector: fix reserve_partial()
  utils/chunked_vector: return void from reserve_partial and make_room
  test/boost/chunked_vector_test: fix testcase tests_reserve_partial
  utils/chunked_vector::reserve_partial: fix usage in callers

(cherry picked from commit b2ebc172d0)

Backported from #19308 to 5.4

Closes scylladb/scylladb#19355
2024-06-19 14:34:29 +03:00
Botond Dénes
98139a8716 Merge '[Backport 5.4] : Reload reclaimed bloom filters when memory is available' from Lakshmi Narayanan Sreethar
PR #17771 introduced a threshold for the total memory used by all bloom filters across SSTables. When the total usage surpasses the threshold, the largest bloom filter will be removed from memory, bringing the total usage back under the threshold. This PR adds support for reloading such reclaimed bloom filters back into memory when memory becomes available (i.e., within the 10% of available memory earmarked for the reclaimable components).

The SSTables manager now maintains a list of all SSTables whose bloom filter was removed from memory and attempts to reload them when an SSTable, whose bloom filter is still in memory, gets deleted. The manager reloads from the smallest to the largest bloom filter to maximize the number of filters being reloaded into memory.

Backported from https://github.com/scylladb/scylladb/pull/18186 to 5.4.

Closes scylladb/scylladb#18660

* github.com:scylladb/scylladb:
  sstable_datafile_test: add testcase to test reclaim during reload
  sstable_datafile_test: add test to verify auto reload of reclaimed components
  sstables_manager: reload previously reclaimed components when memory is available
  sstables_manager: start a fiber to reload components
  sstable_directory_test: fix generation in sstable_directory_test_table_scan_incomplete_sstables
  sstable_datafile_test: add test to verify reclaimed components reload
  sstables: support reloading reclaimed components
  sstables_manager: add new intrusive set to track the reclaimed sstables
  sstable: add link and comparator class to support new instrusive set
  sstable: renamed intrusive list link type
  sstable: track memory reclaimed from components per sstable
  sstable: rename local variable in sstable::total_reclaimable_memory_size
2024-05-30 11:09:51 +03:00
Benny Halevy
8e20379305 utils: chunked_vector: fill ctor: make exception safe
Currently, if the fill ctor throws an exception,
the destructor won't be called, as it object is not
fully constructed yet.

Call the default ctor first (which doesn't throw)
to make sure the destructor will be called on exception.

Fixes scylladb/scylladb#18635

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-21 11:05:38 +03:00
Lakshmi Narayanan Sreethar
e30a2af700 sstable_datafile_test: add testcase to test reclaim during reload
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit 4d22c4b68b)
2024-05-14 01:04:42 +05:30
Kefu Chai
d6c7a26419 utils/logalloc: do not allocate memory in reclaim_timer::report()
before this change, `reclaim_timer::report()` calls

```c++
fmt::format(", at {}", current_backtrace())
```

which allocates a `std::string` on heap, so it can fail and throw. in
that case, `std::terminate()` is called. but at that moment, the reason
why `reclaim_timer::report()` gets called is that we fail to reclaim
memory for the caller. so we are more likely to run into this issue. anyway,
we should not allocate memory in this path.

in this change, a dedicated printer is created so that we don't format
to a temporary `std::string`, and instead write directly to the buffer
of logger. this avoids the memory allocation.

Fixes #18099
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18100

(cherry picked from commit fcf7ca5675)
2024-04-02 15:13:49 +03:00
Kefu Chai
6fc3c62223 error_injection: do not cast to milliseconds when injecting timeout
before this change, we always cast the wait duration to millisecond,
even if it could be using a higher resolution. actually
`std::chrono::steady_clock` is using `nanosecond` for its duration,
so if we inject a deadline using `steady_clock`, we could be awaken
earlier due to the narrowing of the duration type caused by the
duration_cast.

in this change, we just use the duration as it is. this should allow
the caller to use the resolution provided by Seastar without losing
the precision.

Fixes #15902

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 8a5689e7a7)
2024-03-20 09:32:00 +00:00
Botond Dénes
62d8c7274a Merge 'Fix mintimeuuid() call that could crash Scylla' from Nadav Har'El
This PR fixes the bug of certain calls to the `mintimeuuid()` CQL function which large negative timestamps could crash Scylla. It turns out we already had protections in place against very positive timestamps, but very negative timestamps could still cause bugs.

The actual fix in this series is just a few lines, but the bigger effort was improving the test coverage in this area. I added tests for the "date" type (the original reproducer for this bug used totimestamp() which takes a date parameter), and also reproducers for this bug directly, without totimestamp() function, and one with that function.

Finally this PR also replaces the assert() which made this molehill-of-a-bug into a mountain, by a throw.

Fixes #17035

Closes scylladb/scylladb#17073

* github.com:scylladb/scylladb:
  utils: replace assert() by on_internal_error()
  utils: add on_internal_error with common logger
  utils: add a timeuuid minimum, like we had maximum
  test/cql-pytest: tests for "date" type

(cherry picked from commit 2a4b991772)
2024-02-07 13:47:55 +02:00
Michael Huang
75109e9519 cql3: Fix invalid JSON parsing for JSON objects with ASCII keys
For JSON objects represented as map<ascii, int>, don't treat ASCII keys
as a nested JSON string. We were doing that prior to the patch, which
led to parsing errors.

Included the error offset where JSON parsing failed for
rjson::parse related functions to help identify parsing errors
better.

Fixes: #7949

Signed-off-by: Michael Huang <michaelhly@gmail.com>

Closes scylladb/scylladb#15499
2023-10-05 22:26:08 +03:00
Avi Kivity
e600f35d1e Merge 'logalloc, reader_concurrency_semaphore: cooperate on OOM kills' from Botond Dénes
Consider the following code snippet:
```c++
future<> foo() {
    semaphore.consume(1024);
}

future<> bar() {
    return _allocating_section([&] {
        foo();
    });
}
```

If the consumed memory triggers the OOM kill limit, the semaphore will throw `std::bad_alloc`. The allocating section will catch this, bump std reserves and retry the lambda. Bumping the reserves will not do anything to prevent the next call to `consume()` from triggering the kill limit. So this cycle will repeat until std reserves are so large that ensuring the reserve fails. At this point LSA gives up and re-throws the `std::bad_alloc`. Beyond the useless time spent on code that is doomed to fail, this also results in expensive LSA compaction and eviction of the cache (while trying to ensure reserves).
Prevent this situation by throwing a distinct exception type which is derived from `std::bad_alloc`. Allocating section will not retry on seeing this exception.
A test reproducing the bug is also added.

Fixes: #15278

Closes scylladb/scylladb#15581

* github.com:scylladb/scylladb:
  test/boost/row_cache_test: add test_cache_reader_semaphore_oom_kill
  utils/logalloc: handle utils::memory_limit_reached in with_reclaiming_disabled()
  reader_concurrency_semaphore: use utils::memory_limit_reached exception
  utils: add memory_limit_reached exception
2023-10-05 19:47:21 +03:00
Pavel Emelyanov
c4f1929eea s3: Abort multipart upload if finalize request fails
It may happen that wrapping up multipart upload fails too. However,
before sending the request the driver clears the _upload_id field thus
marking the whole process as "all is OK". So in case the finalization
method fails and thrown, the upload context remains on the server side
forever.

Fix this by keeping the _upload_id set, so even if finalization throws,
closing the uploader notices this and calls abort.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15521
2023-10-03 09:47:33 +03:00
Botond Dénes
c0da6bcfb8 utils/logalloc: handle utils::memory_limit_reached in with_reclaiming_disabled()
Said method catches bad-allocs and retries the passed-in function after
raising the reserves. This does nothing to help the function succeed if
the bad alloc was throw from the semaphore, because the kill limit was
reached. In this case the read should be left to fail and terminate.
Now that the semaphore is throwing utils::memory_limit_reached in this
case, we can distinguish this case and just re-throw the exception.
2023-09-27 10:28:00 -04:00
Botond Dénes
721ffa319d utils: add memory_limit_reached exception
A distinct exception derived from std::bad_alloc, used in cases when
memory didn't really run out, but the process or task reached the memory
limit alloted for it. Using a distinct type for this case allows for LSA
to correctly react to this case.
2023-09-27 10:26:41 -04:00
Kefu Chai
ac3406e537 utils/s3/creds: rename aws_config member variables
- s/key/access_key_id/
- s/secret/secret_access_key/
- s/token/session_token/

so they are more aligned with the AWS document.
for instance, in
https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#ConstructingTheAuthenticationHeader
AWSAccessKeyId is used in the "Authorization" header.

this would help with the readability and maintainability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-23 14:28:07 +08:00
Avi Kivity
1da6a939fe Merge 'Track memory usage of S3 object uploads' from Pavel Emelyanov
The S3 uploading sink needs to collect buffers internally before sending them out, because the minimal upload-able part size is 5Mb. When the necessary amount of bytes is accumulated, the part uploading fibers starts in the background. On flush the sink waits for all the fibers to complete and handles failure of any.

Uploading parallelism is nowadays limited by the means of the http client max-connections parameter. However, when a part uploading fibers waits for it connection it keeps the 5Mb+ buffers on the request's body, so even though the number of uploading parts is limited, the number of _waiting_ parts is effectively not.

This PR adds a shard-wide limiter on the number of background buffers S3 clients (and theirs http clients) may use.

Closes scylladb/scylladb#15497

* github.com:scylladb/scylladb:
  s3::client: Track memory in client uploads
  code: Configure s3 clients' memory usage
  s3::client: Construct client with shared semaphore
  sstables::storage_manager: Introduce config
2023-09-21 18:24:42 +03:00
Botond Dénes
a0c5dee2aa utils/logalloc: introduce logalloc::bad_alloc
This new exception type inherits from std::bad_alloc and allows logalloc
code to add additional information about why the allocation failed. We
currently have 3 different throw sites for std::bad_alloc in logalloc.cc
and when investigating a coredump produced by --abort-on-lsa-bad-alloc,
it is impossible to determine, which throw-site activated last,
triggering the abort.
This patch fixes that by disambiguating the throw-sites and including it
in the error message printed, right before abort.

Refs: #15373

Closes scylladb/scylladb#15503
2023-09-21 17:43:53 +03:00
Kefu Chai
0819788207 utils/s3: use structured binding when appropriate
and use `sstring::starts_with()`, for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15487
2023-09-21 13:26:49 +03:00
Kefu Chai
c364efb998 utils/s3: auth using AWS_SESSION_TOKEN
when accessing AWS resources, uses are allowed to long-term security
credentials, they can also the temporary credentials. but if the latter
are used, we have to pass a session token along with the keys.
see also https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_use-resources.html
so, if we want to programatically get authenticated, we need to
set the "x-amz-security-token" header,
see
https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#UsingTemporarySecurityCredentials

so, in this change, we

1. add another member named `token` in `s3::endpoint_config::aws_config`
   for storing "AWS_SESSION_TOKEN".
2. populate the setting from "object_storage.yaml" and
  "$AWS_SESSION_TOKEN" environment variable.
3. set "x-amz-security-token" header if
   `s3::endpoint_config::aws_config::token` is not empty.

this should allow us to test s3 client and s3 object store backend
with S3 bucket, with the temporary credentials.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15486
2023-09-21 13:26:11 +03:00
Pavel Emelyanov
e6fe18ca55 s3: Handle piece flushing exception
When a piece is uploaded it's first flushed, then upload-copy is issued.
Both happen in the background and if piece flush calls resolves with
exception the exception remains unhandled. That's OK, since upload
finalization code checks that some pieces didn't complete (for whatever
reason) and fails the whole uploading, however, the ignored exception is
reported in logs. Not nice.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15491
2023-09-21 10:39:04 +03:00
Kefu Chai
fe4caeb77f utils/s3/client: do not allocate rapidxml::xml_document on stack
as the size of `rapidxml::xml_document` size quite large, let's
allocate it on the heap. otherwise GCC 13.2.1 warns us like:
```
utils/s3/client.cc: In function ‘seastar::sstring s3::parse_multipart_copy_upload_etag(seastar::sstring&)’:
utils/s3/client.cc:455:9: warning: stack usage is 66208 bytes [-Wstack-usage=]
  455 | sstring parse_multipart_copy_upload_etag(sstring& body) {
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15472
2023-09-21 08:51:08 +03:00
Pavel Emelyanov
fc5306c5e8 s3::client: Track memory in client uploads
When uploading an object part, client spawns a background fiber that
keeps the buffers with data on the http request's write_body() lambda
capture. This generates unbound usage of memory with uploaded buffers
which is not nice. Even though s3 client is limited with http's client
max-connections parallelism, waiting for the available connection still
happens with buffers held in memory.

This patch makes the client claim the background memory from the
provided semaphore (which, in turn, sits on the shard-wide storage
manager instance). Once body writing is complete, the claimed units are
returned back to the semaphore allowing for more background writes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-20 17:50:29 +03:00
Pavel Emelyanov
b299757884 s3::client: Construct client with shared semaphore
The semaphore will be used to cap memory consumption by client. This
patch makes sure the reference to a semaphore exists as an argument to
client's constructor, not more than that.

In scylla binary, the semaphore sits on storage_manager. In tests the
semaphore is some local object. For now the semaphore is unused and is
initialized locked as this patch just pushes the needed argument all the
way around, next patches will make use of it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-20 17:50:07 +03:00
Kefu Chai
4d285590f0 utils/config_file: document config_file::value_status
add doxygen style comment to document `value_status` members.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15277
2023-09-18 16:20:06 +03:00
Pavel Emelyanov
30959fc9b1 lsa, test: Extend memory footprint test with per-type total sizes
When memory footprint test is over it prints total size taken by row
cache, memtable and sstables as well as individual objects' sizes. It's
also nice to know the details on the row-cache's individual objects.
This patch extends the printing with total size of allocated object
types according to migrator_fn types.

Sample output:

    mutation footprint:
     - in cache:     11040928
     - in memtable:  9142424
     - in sstable:
       mc:   2160000
       md:   2160000
       me:   2160000
     - frozen:       540
     - canonical:    827
     - query result: 342

     sizeof(cache_entry) = 64
     sizeof(memtable_entry) = 64
     sizeof(bptree::node) = 288
     sizeof(bptree::data) = 72
     -- sizeof(decorated_key) = 32
     -- sizeof(mutation_partition) = 96
     -- -- sizeof(_static_row) = 8
     -- -- sizeof(_rows) = 24
     -- -- sizeof(_row_tombstones) = 40

     sizeof(rows_entry) = 144
     sizeof(evictable) = 24
     sizeof(deletable_row) = 72
     sizeof(row) = 16
     radix_tree::inner_node::node_sizes =  48 80 144 272 528 1040
     radix_tree::leaf_node::node_sizes =  120 216 416 816 3104
     sizeof(atomic_cell_or_collection) = 16
     btree::linear_node_size(1) = 24
     btree::inner_node_size = 216
     btree::leaf_node_size = 120
    LSA stats:
      N18compact_radix_tree4treeI13cell_and_hashjE9leaf_nodeE: 360
      N5bplus4dataIl15intrusive_arrayI11cache_entryEN3dht25raw_token_less_comparatorELm16ELNS_10key_searchE0ELNS_10with_debugE0EEE: 5040
      N5bplus4nodeIl15intrusive_arrayI11cache_entryEN3dht25raw_token_less_comparatorELm16ELNS_10key_searchE0ELNS_10with_debugE0EEE: 19296
      17partition_version: 952416
      N11intrusive_b4nodeI10rows_entryXadL_ZNS1_5_linkEEENS1_11tri_compareELm12ELm20ELNS_10key_searchE0ELNS_10with_debugE0EEE: 317472
      10rows_entry: 1429056
      12blob_storage: 254

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15434
2023-09-18 11:23:18 +02:00
Avi Kivity
d9a453e72e Merge 'Introduce a scylla-native nodetool' from Botond Dénes
This series introduces a scylla-native nodetool.  It is invokable via the main scylla executable as the other native tools we have. It uses the seastar's new `http::client` to connect to the specified node and execute the desired commands.
For now a single command is implemented: `nodetool compact`, invokable as `scylla nodetool compact`. Once all the boilerplate is added to create a new tool, implementing a single command is not too bad, in terms of code-bloat. Certainly not as clean as a python implementation would be, but good enough. The advantages of a C++ implementation is that all of us in the core team know C++ and that it is shipped right as part of the scylla executable..

Closes #14841

* github.com:scylladb/scylladb:
  test: add nodetool tests
  test.py: add ToolTestSuite and ToolTest
  tools/scylla-nodetool: implement compact operation
  tools/scylla-nodetool: implement basic scylla_rest_api_client
  tools: introduce scylla-nodetool
  utils: export dns_connection_factory from s3/client.cc to http.hh
  utils/s3/client: pass logger to dns_connection_factory in constructor
  tools/utils: tool_app_template::run_async(): also detect --help* as --help
2023-09-14 17:20:40 +03:00
Botond Dénes
bf2fad3c00 utils: export dns_connection_factory from s3/client.cc to http.hh
So others can use it too. Move headers only used by said class too.
2023-09-14 05:25:14 -04:00
Botond Dénes
17fd57390e utils/s3/client: pass logger to dns_connection_factory in constructor
We want to publish this class in a header so it can be used by others,
but it uses the s3 logger. We don't want future users to pollute the s3
logs, so allow users to pass their own loggers to the factory.
2023-09-14 05:25:14 -04:00
Botond Dénes
cc16502691 Merge 'Add metrics to S3 client' from Pavel Emelyanov
The added metrics include:

- http client metrics, which include the number of connections, the number of active connections and the number of new connections made so far
- IO metrics that mimic those for traditional IO -- total number of object read/write ops, total number of get/put/uploaded bytes and individual IO request delay (round-trip, including body transfer time)

fixes: #13369

Closes #14494

* github.com:scylladb/scylladb:
  s3/client: Add IO stats metrics
  s3/client: Add HTTP client metrics
  s3/client: Split make_request()
  s3/client: Wrap http client with struct group_client
  s3/client: Move client::stats to namespace scope
  s3/client: Keep part size local variable
2023-09-14 09:49:08 +03:00
Kefu Chai
87088b65b6 util: replace <tab> with spaces
to be aligned with seastar's coding-style.md: scylladb uses seastar's
coding-style.md. so let's adhere to it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15345
2023-09-11 14:38:46 +03:00
Kefu Chai
ce291f4385 s3/client: do not use deprecated tls::connect() overload
seastar has deprecated the overload which accepts `server_name`,
let's use the one which accepts `tls::tls_options`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15324
2023-09-08 18:44:45 +03:00
Pavel Emelyanov
308db51306 s3/client: Add IO stats metrics
These metrics mimic the existing IO ones -- total number of read
operation, total number of read bytes and total read delay. And the same
for writing.

This patch makes no difference between wrting object with plain PUT vs
putting it with multipart uploading. Instead, it "measures" individual
IO writes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Pavel Emelyanov
91235a84cd s3/client: Add HTTP client metrics
Currently an http client has several exported "numbers" regarding the
number of transport connections the client uses. This patch exports
those via S3 client's per-sched-group metrics and prepares the ground
for more metrics in next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Pavel Emelyanov
08a12cd4a6 s3/client: Split make_request()
There will appear another make_request() helper that'll do mostly the
same. This split will help to avoid code duplication

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Pavel Emelyanov
4b548dd240 s3/client: Wrap http client with struct group_client
The http-client is per-sched-group. Next patch will need to keep metrics
per-sched-group too and this sched-group -> http-client map is the good
place to put them on. Wrapping struct will allow extending it with
metrics

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Pavel Emelyanov
627c1932e4 s3/client: Move client::stats to namespace scope
The stats is stats about object, not about client, so it's better if it
lives in namespace scope. Also it will avoid conflicts with client stats
that will be reported as metrics (later patch)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Pavel Emelyanov
896b582850 s3/client: Keep part size local variable
This serves two purposes. First, it fixes potential use-after-move since
the bufs are moved on lambda and bufs.size() are called in the same
statement with no defined evaluation order.

Second, this makes 'size' varable alive up to the time request is
complete thus making it possible to update stats with it (later patch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Dawid Medrek
c7fe5d7f94 utils/lister: Limit the API of scan_dir() to fs::path
Right now, the function allows for passing the path to a file as a seastar::sstring,
which is then converted to std::filesystem::path -- implicitly to the caller.
However, the function performs I/O, and there is no reason to accept any other type
than std::filesystem::path, especially because the conversion is straightforward.
Callers can perform it on their own.

This commit introduces the more constrained API.

Closes #15266
2023-09-05 20:50:42 +03:00
Benny Halevy
eb51b70e6d utils: atomic_vector: mark for_each functions as const
They only need to access the _vec_lock rwlock
so mark it as mutable, but otherwise they provide a const
interface to the calls, as the called func receives
the entries by value and it cannot modify them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:14:38 +03:00
Michał Chojnowski
c7d9d35030 utils: cached_file: deglobalize cached_file metrics
Move cached_file metrics from a thread_local variable
to cache_tracker.

This is needed so that cache_tracker can know
the memory usage of index caches (for purposes
of cache eviction) without relying on globals.

But it also makes sense even without that motive.
2023-09-01 22:34:41 +02:00
Michał Chojnowski
50b429f255 config: add index_cache_fraction
Adds a configurable upper limit to memory usage by index caches.
See the source code comments added in this patch for more details.

This patch shouldn't change visible behaviour, because the limit is set to 1.0
by default, so it is never triggerred. We will change the default in a future
patch.
2023-09-01 22:34:23 +02:00
Michał Chojnowski
6a7ce6781e utils: lru: add move semantics to list links
Before the patch, fixing list links is done manually in the move constructor of
`evictable`. After the patch, it is done by the move constructors of the links
themselves.

This makes for slightly cleaner code, especially after we add more links in an
upcoming patch.
2023-09-01 22:34:23 +02:00
Piotr Smaroń
34c3688017 db: config: add live_updatable_config_params_changeable_via_cql option
If `live_updatable_config_params_changeable_via_cql` is set to true, configuration parameters defined with `liveness::LiveUpdate` option can be updated in the runtime with CQL, i.e. by updating `system.config` virtual table.
If we don't want any configuration parameter to be changed in the
runtime by updating `system.config` virtual table, this option should be
set to false. This option should be set to false for e.g. cloud users,
who can only perform CQL queries, and should not be able to change
scylla's configuration on the fly.

Current implemenatation is generic, but has a small drawback - messages
returned to the user can be not fully accurate, consider:
```
cqlsh> UPDATE system.config SET value='2' WHERE name='task_ttl_in_seconds';
WriteFailure: Error from server: code=1500 [Replica(s) failed to execute write] message="option is not live-updateable" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
```
where `task_ttl_in_seconds` has been defined with
`liveness::LiveUpdate`, but because `live_updatable_config_params_changeable_via_cql` is set to
`false` in `scylla.yaml,` `task_ttl_in_seconds` cannot be modified in the
runtime by updating `system.config` virtual table.

Fixes #14355

Closes #14382
2023-08-16 17:56:27 +03:00
Pavel Emelyanov
3c6686e181 bptree: Replace assert with static_assert
The one runs under checked constexpr value anyway

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #14951
2023-08-06 16:36:12 +03:00
Kamil Braun
39ca07c49b Merge 'Gossiper endpoint locking' from Benny Halevy
This series cleans up and hardens the endpoint locking design and
implementation in the gossiper and endpoint-state subscribers.

We make sure that all notifications (expect for `before_change`, that
apparently can be dropped) are called under lock_endpoint, as well as
all calls to gossiper::replicate, to serialize endpoint_state changes
across all shards.

An endpoint lock gets a unique permit_id that is passed to the
notifications and passed back by them if the notification functions call
the gossiper back for the same endpoint on paths that modify the
endpoint_state and may acquire the same endpoint lock - to prevent a
deadlock.

Fixes scylladb/scylladb#14838
Refs scylladb/scylladb#14471

Closes #14845

* github.com:scylladb/scylladb:
  gossiper: replicate: ensure non-null permit
  gossiper: add_saved_endpoint: lock_endpoint
  gossiper: mark_as_shutdown: lock_endpoint
  gossiper: real_mark_alive: lock_endpoint
  gossiper: advertise_token_removed: lock_endpoint
  gossiper: do_status_check: lock_endpoint
  gossiper: remove_endpoint: lock_endpoint if needed
  gossiper: force_remove_endpoint: lock_endpoint if needed
  storage_service: lock_endpoint when removing node
  gossiper: use permit_id to serialize state changes while preventing deadlocks
  gossiper: lock_endpoint: add debug messages
  utils: UUID: make default tagged_uuid ctor constexpr
  gossiper: lock_endpoint must be called on shard 0
  gossiper: replicate: simplify interface
  gossiper: mark_as_shutdown: make private
  gossiper: convict: make private
  gossiper: mark_as_shutdown: do not call convict
2023-08-02 13:50:08 +02:00
Benny Halevy
929d03b370 utils: UUID: make default tagged_uuid ctor constexpr
So it can be used for gms::null_permit_id in the next patch

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-07-31 19:29:18 +03:00
Benny Halevy
60862c63dd utils/directories: verify_owner_and_mode: add recursive flag
Allow the caller to verify only the top level directories
so that sub-directories can be verified selectively
(in particular, skip validation of snapshots).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-07-31 16:01:43 +03:00
Raphael S. Carvalho
050ce9ef1d cached_file: Evict unused pages that aren't linked to LRU yet
It was found that cached_file dtor can hit the following assert
after OOM

cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion _cache.empty()' failed.`

cached_file's dtor iterates through all entries and evict those
that are linked to LRU, under the assumption that all unused
entries were linked to LRU.

That's partially correct. get_page_ptr() may fetch more than 1
page due to read ahead, but it will only call cached_page::share()
on the first page, the one that will be consumed now.

share() is responsible for automatically placing the page into
LRU once refcount drops to zero.

If the read is aborted midway, before cached_file has a chance
to hit the 2nd page (read ahead) in cache, it will remain there
with refcount 0 and unlinked to LRU, in hope that a subsequent
read will bring it out of that state.

Our main user of cached_file is per-sstable index caching.
If the scenario above happens, and the sstable and its associated
cached_file is destroyed, before the 2nd page is hit, cached_file
will not be able to clear all the cache because some of the
pages are unused and not linked.

A page read ahead will be linked into LRU so it doesn't sit in
memory indefinitely. Also allowing for cached_file dtor to
clear all cache if some of those pages brought in advance
aren't fetched later.

A reproducer was added.

Fixes #14814.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14818
2023-07-27 00:01:46 +02:00
Kefu Chai
a8254111ef utils: drop operator<< for pretty printers
since all callers of these operators have switched to fmt formatters.
let's drop them. the tests are updated accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-17 14:02:13 +08:00
Kefu Chai
fc6b84ec1f utils: add fmt formatter for pretty printers
add fmt formatter for `utils::pretty_printed_data_size` and
`utils::pretty_printed_throughput`.

this is a part of a series to migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `utils::pretty_printed_data_size` and
`utils::pretty_printed_throughput` without the help of `operator<<`.

please note, despite that it's more popular to use the IEC prefixes
when presenting the size of storage, i.e., MiB for 1024**2 bytes instead
of MB for 1000**2 bytes, we are still using the SI binary prefixes as
the default binary prefix, in order to preserve the existing behavior.

also, we use the singular form of "byte" when formating "1". this is
more correct.

the tests are updated accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-17 14:02:13 +08:00