Commit Graph

3109 Commits

Author SHA1 Message Date
Botond Dénes
b704698ba5 Merge 'Close toc file in remove_by_toc_name()' from Pavel Emelyanov
The method in question suffers from scylladb/seastar#1298. The PR fixes it and makes a bit shorter along the way

Closes #13776

* github.com:scylladb/scylladb:
  sstable: Close file at the end
  sstables: Use read_entire_stream_cont() helper
2023-05-05 11:33:05 +03:00
Botond Dénes
0cccf9f1cc Merge 'Remove some file_writer public methods' from Pavel Emelyanov
One is unused, the other one is not really required in public

Closes #13771

* github.com:scylladb/scylladb:
  file_writer: Remove static make() helper
  sstable: Use toc_filename() to print TOC file path
2023-05-05 10:48:46 +03:00
Raphael S. Carvalho
1f69c46889 sstables: use version_types received from parser or writer
This is only a cosmetical change, no change in semantics

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13779
2023-05-05 10:32:14 +03:00
Pavel Emelyanov
75e7187e1a sstable: Close file at the end
The thing is than when closing file input stream the underlying file is
not .close()-d (see scylladb/seastar#1298). The remove_by_toc_name() is
buggy in this sense. Using with_closeable() fixes it and makes the code
shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-04 20:37:48 +03:00
Pavel Emelyanov
334383beb5 sstables: Use read_entire_stream_cont() helper
The remove_by_toc_name() wants to read the whole stream into a sstring.
There's a convenience helper to facilitate that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-04 20:37:09 +03:00
Avi Kivity
f125a3e315 Merge 'tree: finish the reader_permit state renames' from Botond Dénes
In https://github.com/scylladb/scylladb/pull/13482 we renamed the reader permit states to more descriptive names. That PR however only covered only the states themselves and their usages, as well as the documentation in `docs/dev`.
This PR is a followup to said PR, completing the name changes: renaming all symbols, names, comments etc, so all is consistent and up-to-date.

Closes #13573

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: misc updates w.r.t. recent permit state name changes
  reader_concurrency_semaphore: update permit members w.r.t. recent permit state name changes
  reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes
  reader_concurrency_semaphore: update API w.r.t. recent permit state name changes
  reader_concurrency_semaphore: update stats w.r.t. recent permit state name changes
2023-05-04 18:29:04 +03:00
Avi Kivity
1d351dde06 Merge 'Make S3 client work with real S3' from Pavel Emelyanov
Current S3 client was tested over minio and it takes few more touches to work with amazon S3.

The main challenge here is to support singed requests. The AWS S3 server explicitly bans unsigned multipart-upload requests, which in turn is the essential part of the sstables S3 backend, so we do need signing. Signing a request has many options and requirements, one of them is -- request _body_ can be or can be not included into signature calculations. This is called "(un)signed payload". Requests sent over plain HTTP require payload signing (i.e. -- request body should be included into signature calculations), which can a bit troublesome, so instead the PR uses unsigned payload (i.e. -- doesn't include the request body into signature calculation, only necessary headers and query parameters), but thus also needs HTTPS.

So what this set does is makes the existing S3 client code sign requests. In order to sign the request the code needs to get AWS key and secret (and region) from somewhere and this somewhere is the conf/object_storage.yaml config file. The signature generating code was previously merged (moved from alternator code) and updated to suit S3 client needs.

In order to properly support HTTPS the PR adds special connection factory to be used with seastar http client. The factory makes DNS resolving of AWS endpoint names and configures gnutls systemtrust.

fixes: #13425

Closes #13493

* github.com:scylladb/scylladb:
  doc: Add a document describing how to configure S3 backend
  s3/test: Add ability to run boost test over real s3
  s3/client: Sign requests if configured
  s3/client: Add connection factory with DNS resolve and configurable HTTPS
  s3/client: Keep server port on config
  s3/client: Construct it with config
  s3/client: Construct it with sstring endpoint
  sstables: Make s3_storage with endpoint config
  sstables_manager: Keep object storage configs onboard
  code: Introduce conf/object_storage.yaml configuration file
2023-05-04 18:08:54 +03:00
Pavel Emelyanov
c4394a059c file_writer: Remove static make() helper
It's simply unused

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-04 16:55:41 +03:00
Pavel Emelyanov
eaf534cc4b sstable: Use toc_filename() to print TOC file path
The sstable::write_toc() gets TOC filename from file writer, while it
can get it from itself. This makes the file_writer::get_filename()
private and actually improves logging, as the writer is not required
to have the filename onboard, while sstable always has it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-04 16:54:21 +03:00
Benny Halevy
205daf49fd sstable_directory: coroutinize parallel_for_each_restricted
Using a coroutine simplifies the function and reduced the
number of moves it performs.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-04 11:46:59 +03:00
Benny Halevy
e4acc44814 sstable_directory: parallel_for_each_restricted: use std::ranges for template definition
We'd like the container to be a std::ranges::range.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-04 11:44:24 +03:00
Benny Halevy
e2023877f2 sstable_directory: parallel_for_each_restricted: do not move container
Commit ecbd112979
`distributed_loader: reshard: consider sstables for cleanup`
caused a regression in loading new sstables using the `upload`
directory, as seen in e.g. https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-release/230/testReport/migration_test/TestMigration/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split000___test_migrate_sstable_without_compression_3_0_md_/
```
        query = "SELECT COUNT(*) FROM cf"
        statement = SimpleStatement(query)
        s = self.patient_cql_connection(node, 'ks')
        result = list(s.execute(statement))
>       assert result[0].count == expected_number_of_rows, \
            "Expected {} rows. Got {}".format(expected_number_of_rows, list(s.execute("SELECT * FROM ks.cf")))
E       AssertionError: Expected 1 rows. Got []
E       assert 0 == 1
E         +0
E         -1
```

The reason for the regression is that the call to `do_for_each_sstable`
in `collect_all_shared_sstables` to search for sstables that need
cleanup caused the list of sstables in the sstable directory to be
moved and cleared.

parallel_for_each_restricted moves the container passed to it
into a `do_with` continuation.  This is required for
parallel_for_each_restricted.

However, moving the container is destructive and so,
the decision whether to move or not needs to be the
caller's, not the callee.

This patch changes the signature of parallel_for_each_restricted
to accept a lvalue reference to the container rather than a rvalue reference,
allowing the callers to decide whether to move or not.

Most callers are converted to move the container, as they effectively do
today, and a new method, `filter_sstables` was added for the
`collect_all_shared_sstables` us case, that allows the `func` that
processes each sstable to decide whether the sstable is kept
in `_unshared_local_sstables` or not.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-05-04 11:36:25 +03:00
Pavel Emelyanov
85f06ca556 s3/client: Construct it with config
Similar to previous patch -- extent the s3::client constructor to get
the endpoint config value next to the endpoint string. For now the
configs are likely empty, but they are yet unused too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Pavel Emelyanov
caf9e357c8 s3/client: Construct it with sstring endpoint
Currently the client is constructed with socket_address which's prepared
by the caller from the endpoint string. That's not flexible engouh,
because s3 client needs to know the original endpoint string for two
reasons.

First, it needs to lookup endpoint config for potential AWS creds.
Second, it needs this exact value as Host: header in its http requests.

So this patch just relaxes the client constructor to accept the endpoint
string and hard-code the 9000 port. The latter is temporary, this is how
local tests' minio is started, but next patch will make it configurable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Pavel Emelyanov
711514096a sstables: Make s3_storage with endpoint config
Continuation of the previous patch. The sstables::s3_storage gets the
endpoint config instance upon creation.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Pavel Emelyanov
bd1e3c688f sstables_manager: Keep object storage configs onboard
The user sstables manager will need to provide endpoint config for
sstables' storage drivers. For that it needs to get it from db::config
and keep in-sync with its updates.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Botond Dénes
48b9f31a08 Merge 'db, sstable: use generation_type instead of its value when appropriate' from Kefu Chai
in this series, we try to use `generation_type` as a proxy to hide the consumers from its underlying type. this paves the road to the UUID based generation identifier. as by then, we cannot assume the type of the `value()` without asking `generation_type` first. better off leaving all the formatting and conversions to the `generation_type`. also, this series changes the "generation" column of sstable registry table to "uuid", and convert the value of it to the original generation_type when necessary, this paves the road to a world with UUID based generation id.

Closes #13652

* github.com:scylladb/scylladb:
  db: use uuid for the generation column in sstable registry table
  db, sstable: add operator data_value() for generation_type
  db, sstable: print generation instead of its value
2023-05-03 09:04:54 +03:00
Kefu Chai
74e9e6dd1a db: use uuid for the generation column in sstable registry table
* change the "generation" column of sstable registry table from
  bigint to uuid
* from helper to convert UUID back to the original generation

in the long run, we encourage user to use uuid based generation
identifier. but in the transition period, both bigint based and uuid
based identifiers are used for the generation. so to cater both
needs, we use a hackish way to store the integer into UUID. to
differentiate the was-integer UUID from the geniune UUID, we
check the UUID's most_significant_bits. because we only support
serialize UUID v1, so if the timestamp in the UUID is zero,
we assume the UUID was generated from an integer when converting it
back to a generation identififer.

also, please note, the only use case of using generation as a
column is the sstable_registry table, but since its schema is fixed,
we cannot store both a bigint and a UUID as the value of its
`generation` column, the simpler way forward is to use a single type
for the generation. to be more efficient and to preserve the type of
the generation, instead of using types like ascii string or bytes,
we will always store the generation as a UUID in this table, if the
generation's identifier is a int64_t, the value of the integer will
be used as the least significant bits of the UUID.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-02 19:23:22 +08:00
Pavel Emelyanov
81a1416ebf sstables: Add and call storage::destroy()
The s3_storage leaks client when sstable gets destoryed. So far this
came unnoticed, but debug-mode unit test ran over minio captured it. So
here's the fix.

When sstable is destroyed it also kicks the storage to do whatever
cleanup is needed. In case of s3 storage the cleanup is in closing the
on-boarded client. Until #13458 is fixed each sstable has its own
private version of the client and there's no other place where it can be
close()d in co_await-able mannter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-02 11:15:44 +03:00
Pavel Emelyanov
3e0c3346a8 sstables: Coroutinize sstable::destroy()
To simiplify patching by next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-02 11:15:11 +03:00
Benny Halevy
707bd17858 everywhere: optimize calls to make_flat_mutation_reader_from_mutations_v2 with single mutation
No point in going through the vector<mutation> entry-point
just to discover in run time that it was called
with a single-element vector, when we know that
in advance.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13733
2023-05-02 07:58:34 +03:00
Botond Dénes
f527b28174 Merge 'treewide: reenable -Wmissing-braces' from Kefu Chai
this change silences the warning of `-Wmissing-braces` from
clang. in general, we can initialize an object without constructor
with braces. this is called aggregate initialization.
but the standard does allow us to initialize each element using
either copy-initialization or direct-initialization. but in our case,
neither of them applies, so the clang warns like

```
suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
                options.elements.push_back({bytes(k.begin(), k.end()), bytes(v.begin(), v.end())});
                                            ^~~~~~~~~~~~~~~~~~~~~~~~~
                                            {                        }
```

in this change,

also, take the opportunity to use structured binding to simplify the
related code.

Closes #13705

* github.com:scylladb/scylladb:
  build: reenable -Wmissing-braces
  treewide: add braces around subobject
  cql3/stats: use zero-initialization
2023-04-28 16:00:14 +03:00
Kefu Chai
ba8402067f db, sstable: add operator data_value() for generation_type
so we can apply `execute_cql()` on `generation_type` directly without
extracting its value using `generation.value()`. this paves the road to
adding UUID based generation id to `generation_type`. as by then, we
will have both UUID based and integer based `generation_type`, so
`generation_type::value()` will not be able to represent its value
anymore. and this method will be replaced by `operator data_value()` in
this use case.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-28 20:39:12 +08:00
Kefu Chai
ae9aa9c4bd db, sstable: print generation instead of its value
this change prepares for the change to use `variant<UUID, int64_t>`
as the value of `generation_type`. as after this change, the "value"
of a generation would be a UUID or an integer, and we don't want to
expose the variant in generation's public interface. so the `value()`
method would be changed or removed by then.

this change takes advantage of the fact that the formatter of
`generation_type` always prints its value. also, it's better to
reuse `generation_type` formatter when appropriate.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-28 20:39:12 +08:00
Kefu Chai
eb7c41767b treewide: add braces around subobject
this change helps to silence the warning of `-Wmissing-braces` from
clang. in general, we can initialize an object without constructor
with braces. this is called aggregate initialization.
but the standard does allow us to initialize each element using
either copy-initialization or direct-initialization. but in our case,
neither of them applies, so the clang warns like

```
suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
                options.elements.push_back({bytes(k.begin(), k.end()), bytes(v.begin(), v.end())});
                                            ^~~~~~~~~~~~~~~~~~~~~~~~~
                                            {                        }
```

in this change,

also, take the opportunity to use structured binding to simplify the
related code.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-28 16:59:29 +08:00
Raphael S. Carvalho
2dbae856f8 sstable: Piggyback on sstable parser and writer to provide bytes_on_disk
bytes_on_disk is the sum of all sstable components.

As read_simple() fetches the file size before parsing the component,
bytes_on_disk can be added incrementally rather than an additional
step after all components were already parsed.

Likewise, write_simple() tracks the offset for each new component,
and therefore bytes_on_disk can also be added incrementally.

This simplifies s3 life as it no longer have to care about feeding
a bytes_on_disk, which is currently limited to data and index
sizes only.

Refs #13649.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:06:48 -03:00
Raphael S. Carvalho
4d02821094 sstable: restore indentation in read_digest() and read_checksum()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:06:48 -03:00
Raphael S. Carvalho
75dc7b799e sstable: make all parsing of simple components go through do_read_simple()
With all parsing of simple components going through do_read_simple(),
common infrastructure can be reused (exception handling, debug logging,
etc), and also statistics spanning all components can be easily added.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:06:48 -03:00
Raphael S. Carvalho
71cd8e6b51 sstable: Add missing pragma once to random_access_reader.hh
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:06:48 -03:00
Raphael S. Carvalho
b783bddbdf sstable: make all writing of simple components go through do_write_simple()
With all writing of simple components going through do_write_simple(),
common infrastructure can be reused (exception handling, debug logging,
etc), and also statistics spanning all components can be easily added.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:06:46 -03:00
Raphael S. Carvalho
dcee5c4fae sstable: Restore indentation in read_simple()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:04:52 -03:00
Raphael S. Carvalho
253d9e787b sstable: Coroutinize read_simple()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:04:52 -03:00
Raphael S. Carvalho
0dcdec6a55 sstable: Use filter memory footprint in filter_size()
For S3, filter size is currently set to zero, as we want to avoid
"fstat-ing" each file.

On-disk representation of bloom filter is similar to the in-memory
one, therefore let's use memory footprint in filter_size().

User of filter_size() is API implementing "nodetool cfstats" and
it cares about the size of bloom filter data (that's how it's
described).

This way, we provide the filter data size regardless of the
underlying storage type.

Refs #13649.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-27 12:04:52 -03:00
Kefu Chai
f5b05cf981 treewide: use defaulted operator!=() and operator==()
in C++20, compiler generate operator!=() if the corresponding
operator==() is already defined, the language now understands
that the comparison is symmetric in the new standard.

fortunately, our operator!=() is always equivalent to
`! operator==()`, this matches the behavior of the default
generated operator!=(). so, in this change, all `operator!=`
are removed.

in addition to the defaulted operator!=, C++20 also brings to us
the defaulted operator==() -- it is able to generated the
operator==() if the member-wise lexicographical comparison.
under some circumstances, this is exactly what we need. so,
in this change, if the operator==() is also implemented as
a lexicographical comparison of all memeber variables of the
class/struct in question, it is implemented using the default
generated one by removing its body and mark the function as
`default`. moreover, if the class happen to have other comparison
operators which are implemented using lexicographical comparison,
the default generated `operator<=>` is used in place of
the defaulted `operator==`.

sometimes, we fail to mark the operator== with the `const`
specifier, in this change, to fulfil the need of C++ standard,
and to be more correct, the `const` specifier is added.

also, to generate the defaulted operator==, the operand should
be `const class_name&`, but it is not always the case, in the
class of `version`, we use `version` as the parameter type, to
fulfill the need of the C++ standard, the parameter type is
changed to `const version&` instead. this does not change
the semantic of the comparison operator. and is a more idiomatic
way to pass non-trivial struct as function parameters.

please note, because in C++20, both operator= and operator<=> are
symmetric, some of the operators in `multiprecision` are removed.
they are the symmetric form of the another variant. if they were
not removed, compiler would, for instance, find ambiguous
overloaded operator '=='.

this change is a cleanup to modernize the code base with C++20
features.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13687
2023-04-27 10:24:46 +03:00
Pavel Emelyanov
4f93b440a5 sstables: Remove lost eptr variable from do_write_simple()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13684
2023-04-27 07:37:15 +03:00
Benny Halevy
87d9c4d7f8 sstables: filesystem_storage::change_state: simplify log message
When moving to the base directory, the printout currently looks broken:
```
INFO  2023-04-16 09:15:58,631 [shard 0] sstable - Moving sstable .../data/ks/cf-4c1bb670dc3711ed96733daf102e4aab/upload/md-1-big-Data.db to  in ".../data/ks/cf-4c1bb670dc3711ed96733daf102e4aab/"
```

Since `path` already contains `to`, the message can be just simplified
and `to` need not be printed explicitly.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13525
2023-04-24 13:43:48 +03:00
Botond Dénes
1750bb34b7 Merge 'sstables, replica: add generation generator' from Kefu Chai
this is the first step to the uuid-based generation identifier. the goal is to encapsulate the generation related logic in generator, so its consumers do not have to understand the difference between the int64_t based generation and UUID v1 based generation.

this commit should not change the behavior of existing scylla. it just allows us to derive from `generation_generator` so we can have another generator which generates UUID based generation identifier.

Closes #13073

* github.com:scylladb/scylladb:
  replica, test: create generation id using generator
  sstables: add generation_generator
  test: sstables: use generate_n for generating ids for testing
2023-04-24 09:31:08 +03:00
Kefu Chai
6e82aa42d5 sstables: add generation_generator
to prepare for the uuid-based generation identifier, where we
will generate uuid-based generation idenfier if corresponding
option is enabled, otherwise an integer based id. to reduce the
repeatings, generation_generator is extracted out so it can be reused.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-21 21:51:13 +08:00
Kefu Chai
a2aa133822 treewide: use std::lexicographical_compare_threeway
this the standard library offers
`std::lexicographical_compare_threeway()`, and we never uses the
last two addition parameters which are not provided by
`std::lexicographical_compare_threeway()`. there is no need to have
the homebrew version of trichotomic compare function.

in this change,

* all occurrences of `lexicographical_tri_compare()` are replaced
  with `std::lexicographical_compare_threeway()`.
* ``lexicographical_tri_compare()` is dropped.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13615
2023-04-21 14:28:18 +03:00
Kefu Chai
51fc0bc698 sstables: use default generated operator==
C++20 compiler is able to generate defaulted operator== and
operator!=. and the default generated operators behaves exactly
the same as the ones crafted by us. so let's it do its job.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13614
2023-04-21 14:25:39 +03:00
Kefu Chai
c5fa1ac9f7 sstable: specialize fmt::formatter<component_type>
this is a part of a series to migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `component_type` without the help of `operator<<`.

the corresponding `operator<<()` are dropped dropped in this change,
as all its callers are now using fmtlib for formatting now.

also, please note, to enable fmtlib to format `std::set<component_type>`
in `test/boost/sstable_3_x_test.cc` , we need to include
`<fmt/ranges.h>` in that source file.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13598
2023-04-21 09:49:24 +03:00
Benny Halevy
77b70dbdb7 sstables: compressed_file_data_source_impl: get: throw malformed_sstable_exception on premature eof
Currently, the reader might dereference a null pointer
if the input stream reaches eof prematurely,
and read_exactly returns an empty temporary_buffer.

Detect this condition before dereferencing the buffer
and sstables::malformed_sstable_exception.

Fixes #13599

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #13600
2023-04-21 07:56:58 +03:00
Botond Dénes
804403f618 reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes
They is still using the old terminology for permit state names, bring
them up to date with the recent state name changes.
2023-04-19 05:20:42 -04:00
Botond Dénes
4c37dc5507 Merge 'keys: specialize fmt::formatter<partition_key> and friends' from Kefu Chai
this is a part of a series to migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print following classes without the help of `operator<<`.

- partition_key_view
- partition_key
- partition_key::with_schema_wrapper
- key_with_schema
- clustering_key_prefix
- clustering_key_prefix::with_schema_wrapper

the corresponding `operator<<()` are dropped dropped in this change,
as all its callers are now using fmtlib for formatting now. the helper
of `print_key()` is removed, as its only caller is
`operator<<(std::ostream&, const
clustering_key_prefix::with_schema_wrapper&)`.

the reason why all these operators are replaced in one go is that
we have a template function of `key_to_str()` in `db/large_data_handler.cc`.
this template function is actually the caller of operator<< of
`partition_key::with_schema_wrapper` and
`clustering_key_prefix::with_schema_wrapper`.
so, in order to drop either of these two operator<<, we need to remove
both of them, so that we can switch over to `fmt::to_string()` in this
template function.

Refs scylladb#13245

Closes #13513

* github.com:scylladb/scylladb:
  keys: consolidate the formatter for partition_keys
  keys: specialize fmt::formatter<partition_key> and friends
2023-04-17 10:27:31 +03:00
Raphael S. Carvalho
a47bac931c Move TWCS option from table into TWCS itself
enable_optimized_twcs_queries is specific to TWCS, therefore it
belongs to TWCS, not replica::table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13489
2023-04-14 08:28:16 +03:00
Kefu Chai
3738fcbe05 keys: specialize fmt::formatter<partition_key> and friends
this is a part of a series to migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print following classes without the help of `operator<<`.

- partition_key_view
- partition_key
- partition_key::with_schema_wrapper
- key_with_schema
- clustering_key_prefix
- clustering_key_prefix::with_schema_wrapper

the corresponding `operator<<()` are dropped dropped in this change,
as all its callers are now using fmtlib for formatting now. the helper
of `print_key()` is removed, as its only caller is
`operator<<(std::ostream&, const
clustering_key_prefix::with_schema_wrapper&)`.

the reason why all these operators are replaced in one go is that
we have a template function of `key_to_str()` in `db/large_data_handler.cc`.
this template function is actually the caller of operator<< of
`partition_key::with_schema_wrapper` and
`clustering_key_prefix::with_schema_wrapper`.
so, in order to drop either of these two operator<<, we need to remove
both of them, so that we can switch over to `fmt::to_string()` in this
template function.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-14 13:21:30 +08:00
Raphael S. Carvalho
17261369ea sstables: Allow SSTable loading to discard bloom filter
If bloom filter is not loaded, it means that an always-present filter
is used, which translates into the SSTable being opened on every
single read.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-13 11:34:22 -03:00
Raphael S. Carvalho
1427a5ce98 sstables: Allow sstable_directory user to feed custom sstable open config
This will be used by load-and-stream to load SSTables in its own
customized way.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-13 11:34:16 -03:00
Raphael S. Carvalho
86516f4cef sstables: Move sstable_open_info into open_info.hh
So sstable_directory can access its definition without having to
include sstables.hh.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-13 11:31:14 -03:00
Botond Dénes
f1bbf705f9 Merge 'Cleanup sstables in resharding and other compaction types' from Benny Halevy
This series extends sstable cleanup to resharding and other (offstrategy, major, and regular) compaction types so to:
* cleanup uploaded sstables (#11933)
* cleanup staging sstables after they are moved back to the main directory and become eligible for compaction (#9559)

When perform_cleanup is called, all sstables are scanned, and those that require cleanup are marked as such, and are added for tracking to table_state::cleanup_sstable_set.  They are removed from that set once released by compaction.
Along with that sstables set, we keep the owned_ranges_ptr used by cleanup in the table_state to allow other compaction types (offstrategy, major, or regular) to cleanup those sstables that are marked as require_cleanup and that were skipped by cleanup compaction for either being in the maintenance set (requiring offstrategy compaction) or in staging.

Resharding is using a more straightforward mechanism of passing the owned token ranges when resharding uploaded sstables and using it to detect sstable that require cleanup, now done as piggybacked on resharding compaction.

Closes #12422

* github.com:scylladb/scylladb:
  table: discard_sstables: update_sstable_cleanup_state when deleting sstables
  compaction_manager: compact_sstables: retrieve owned ranges if required
  sstables: add a printer for shared_sstable
  compaction_manager: keep owned_ranges_ptr in compaction_state
  compaction_manager: perform_cleanup: keep sstables in compaction_state::sstables_requiring_cleanup
  compaction: refactor compaction_state out of compaction_manager
  compaction: refactor compaction_fwd.hh out of compaction_descriptor.hh
  compaction_manager: compacting_sstable_registration: keep a ref to the compaction_state
  compaction_manager: refactor get_candidates
  compaction_manager: get_candidates: mark as const
  table, compaction_manager: add requires_cleanup
  sstable_set: add for_each_sstable_until
  distributed_loader: reshard: update sstable cleanup state
  table, compaction_manager: add update_sstable_cleanup_state
  compaction_manager: needs_cleanup: delete unused schema param
  compaction_manager: perform_cleanup: disallow empty sorted_owened_ranges
  distributed_loader: reshard: consider sstables for cleanup
  distributed_loader: process_upload_dir: pass owned_ranges_ptr to reshard
  distributed_loader: reshard: add optional owned_ranges_ptr param
  distributed_loader: reshard: get a ref to table_state
  distributed_loader: reshard: capture creator by ref
  distributed_loader: reshard: reserve num_jobs buckets
  compaction: move owned ranges filtering to base class
  compaction: move owned_ranges into descriptor
2023-04-11 14:52:29 +03:00