595 Commits

Author SHA1 Message Date
Botond Dénes
7e7101c180 Revert "Merge 'database, storage_proxy: Reconcile pages with dead rows and partitions incrementally' from Botond Dénes"
This reverts commit 628e6ffd33, reversing
changes made to 45ec76cfbf.

The test included with this PR is flaky and often breaks CI.
Revert while a fix is found.

Fixes: #15371
2023-09-13 10:45:37 +03:00
Kefu Chai
571fab4179 build: cmake: build cqlsh as a submodule
since we also redistribute cqlsh, let's package it as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-12 18:18:31 +08:00
Kefu Chai
111d20958e build: cmake: build python3 dist tarball with arch postfix
now that `configure.py` always generates the python3 dist tarball with
the ${arch} postfix, let's mirror this behavior, as `build_unified.sh`
uses this naming convention.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-12 18:18:31 +08:00
Kefu Chai
34e3302c01 dbuild: use --userns option when using podman
instead of fabricating an `/etc/passwd` manually, we can just
leave it to podman to add an entry to `/etc/passwd` in the container,
as podman allows us to map the user's account to the same UID in the
container. see
https://docs.podman.io/en/stable/markdown/options/userns.container.html.

this is not only a cosmetic change, it also avoids the permission denied
failure when accessing `/etc/passwd` in the container when selinux is
enabled. without this change, we would otherwise need to add the
selinux label to the bind volume with the ':Z' option to address
failures like:

```
type=AVC msg=audit(1693449115.261:2599): avc:  denied  { open } for  pid=2298247 comm="bash" path="/etc/passwd" dev="tmpfs" ino=5931 scontext=system_u:system_r:container_t:s0:c252,c259 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
type=AVC msg=audit(1693449115.263:2600): avc:  denied  { open } for  pid=2298249 comm="id" path="/etc/passwd" dev="tmpfs" ino=5931 scontext=system_u:system_r:container_t:s0:c252,c259 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
```

found in `/var/log/audit/audit.log`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15230
2023-09-11 21:41:48 +03:00
Avi Kivity
b8a655f55e Update tools/python3 submodule
* tools/python3 45fbd05...3e833f1 (1):
  > install.sh: replace <tab> with spaces
2023-09-11 21:38:02 +03:00
Avi Kivity
628e6ffd33 Merge 'database, storage_proxy: Reconcile pages with dead rows and partitions incrementally' from Botond Dénes
Currently, a mutation query on the replica side will not respond with a result that doesn't have at least one live row. This causes problems if there are a lot of dead rows or partitions before we reach a live row, which stem from the fact that the resulting reconcilable_result will be large:

1. Large allocations. Serialization of reconcilable_result causes large allocations for storing result rows in std::deque.
2. Reactor stalls. Serialization of reconcilable_result on the replica side and on the coordinator side causes reactor stalls. This impacts not only the query at hand. For 1M dead rows, freezing takes 130ms, unfreezing takes 500ms. The coordinator does multiple freezes and unfreezes. The reactor stall on the coordinator side is >5s.
3. Too large repair mutations. If reconciliation works on large pages, repair may fail due to too large mutation size. 1M dead rows is already too much: Refs https://github.com/scylladb/scylladb/issues/9111.

This patch fixes all of the above by making mutation reads respect the memory accounter's limit for the page size, even for dead rows.
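
A minimal sketch of the idea, not the actual ScyllaDB code: a page builder that charges every row, live or dead, against a byte budget and closes the page once the budget is used up. The names `row`, `page` and `build_page` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row: a dead row (tombstone) still takes up space in the result.
struct row {
    bool live;
    std::size_t size_bytes;
};

struct page {
    std::vector<row> rows;
    std::size_t next = 0;        // index of the first row not in this page
    bool has_live_row = false;
};

// Fill one page starting at `start`. Every row, live or dead, is charged
// against `limit_bytes`, so a long run of tombstones ends the page too.
page build_page(const std::vector<row>& input, std::size_t start, std::size_t limit_bytes) {
    assert(start <= input.size());
    page p;
    std::size_t used = 0;
    std::size_t i = start;
    for (; i < input.size(); ++i) {
        // Always admit at least one row so that paging makes progress.
        if (!p.rows.empty() && used + input[i].size_bytes > limit_bytes) {
            break;
        }
        used += input[i].size_bytes;
        p.has_live_row = p.has_live_row || input[i].live;
        p.rows.push_back(input[i]);
    }
    p.next = i;
    return p;
}
```

With this shape, a page may legitimately contain no live row at all; the caller continues from `p.next` instead of accumulating an unbounded result.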

This patch also addresses the problem of client-side timeouts during paging. Reconciling queries processing long strings of tombstones will now properly page tombstones, like regular queries do.

My testing shows that this solution even increases efficiency. I tested with a cluster of 2 nodes, and a table of RF=2. The data layout was as follows (1 partition):
* Node1: 1 live row, 1M dead rows
* Node2: 1M dead rows, 1 live row

This was designed to trigger reconciliation right from the very start of the query.

Before:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 140.0633503ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 66.7195275ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 873.5400742ms, pages: 2, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```

After:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 136.9035122ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 69.5286021ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 162.6239498ms, pages: 100, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```

Non-reconciling queries have almost identical duration (changes of a few ms can be observed between runs). Note how in the after case, the reconciling read also produces 100 pages, vs. just 2 pages in the before case, leading to a much lower duration (less than 1/4 of the before).

Refs https://github.com/scylladb/scylladb/issues/7929
Refs https://github.com/scylladb/scylladb/issues/3672
Refs https://github.com/scylladb/scylladb/issues/7933
Fixes https://github.com/scylladb/scylladb/issues/9111

Closes #14923

* github.com:scylladb/scylladb:
  test/topology_custom: add test_read_repair.py
  replica/mutation_dump: detect end-of-page in range-scans
  tools/scylla-sstable: write: abort parser thread if writing fails
  test/pylib: add REST methods to get node exe and workdir paths
  test/pylib/rest_client: add load_new_sstables, keyspace_{flush,compaction}
  service/storage_proxy: add trace points for the actual read executor type
  service/storage_proxy: add trace points for read-repair
  storage_proxy: Add more trace-level logging to read-repair
  database: Fix accounting of small partitions in mutation query
  database, storage_proxy: Reconcile pages with no live rows incrementally
2023-09-11 19:20:19 +03:00
Botond Dénes
82f4563757 tools/scylla-sstable: write: abort parser thread if writing fails
Currently if writing the sstable fails, e.g. because the input data is
out-of-order, the json parser thread hangs because its output is no
longer consumed. This results in the entire application just freezing.
Fix this by aborting the parsing thread explicitly in the
json_mutation_stream_parser destructor. If the parser thread exited
successfully, this will be a no-op, but on the error-path, this will
ensure that the parser thread doesn't hang.
2023-09-11 07:02:14 -04:00
Botond Dénes
685486a20d Update tools/python3 submodule
* tools/python3 30b8fc21...45fbd056 (1):
  > build_reloc: do not run SCYLLA-VERSION-GEN twice
2023-09-11 10:59:56 +03:00
Avi Kivity
0656810c28 Update tools/java submodule
* tools/java 585b30fda6...9dddad27bf (1):
  > install-dependencies.sh: do not install weak dependencies

Frozen toolchain regenerated.

Closes #15322
2023-09-08 17:22:07 +03:00
Israel Fruchter
3d082acd29 Update tools/cqlsh submodule
* tools/cqlsh 2254e920...66ae7eac (5):
  > switch from `ssl_options` to `ssl_context`
  > cqlsh should use cql v4 by default when connecting #44
  > Revert "Skip pp38-macosx wheel builds"
  > update to newer cibuildwheel
  > Skip pp38-macosx wheel builds

Closes #15308
2023-09-07 22:48:37 +03:00
Kefu Chai
a29838f9e1 sstables: change make_descriptor() to accept fs::path
change another overload of `make_descriptor()` to accept `fs::path`,
in the same spirit of a previous change in this area. so we have
a more consistent API for creating sstable descriptor. and this
new API is simpler to use.

Refs #15187
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-01 07:44:06 +08:00
Kefu Chai
6656707164 sstables: change make_descriptor() to accept fs::path
to lower the programmer's cognitive load. a programmer might want
to pass the full path as the `fname` when calling
`make_descriptor(sstring sstdir, sstring fname)`, but this overload
only accepts the filename component as its second parameter. a
single `path` parameter would be easier to work with.
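
as an illustration only (this is not the actual `make_descriptor()` code), the split that a caller of the two-parameter overload has to get right can be derived from one `std::filesystem::path`. the helper name `split_sstable_path` is hypothetical:

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical helper: derive the (directory, filename) pair that a
// two-parameter overload expects from one full path.
std::pair<std::string, std::string> split_sstable_path(const fs::path& sst_path) {
    assert(sst_path.has_filename());
    return {sst_path.parent_path().string(), sst_path.filename().string()};
}
```

an API that takes the full path can perform this split internally, so the caller cannot pass a full path where only the filename component is expected.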

Refs #15187
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-01 07:44:06 +08:00
Botond Dénes
1609c76d62 tools/scylla-sstable: scrub: don't quarantine sstables after validate
Scylla sstable promises to *never* mutate its input sstables. This
promise was broken by `scylla sstable scrub --scrub-mode=validate`,
because validate moves invalid input sstables into quarantine. This is
unexpected and caused occasional failures in the scrub tests in
test_tools.py. Fix by propagating a flag down to
`scrub_sstables_validate_mode()` in `compaction.cc`, specifying whether
validate should quarantine invalid sstables, then set this flag to false
in `scylla-sstable.cc`. The existing test for validate-mode scrub is
amended to check that the sstable is not mutated. The test now fails
before the fix and passes afterwards.

Fixes: #14309

Closes #15139
2023-08-23 21:53:12 +03:00
Kefu Chai
adfc139a74 tools/scylla-sstable: path::parent_path() when appropriate
in load_sstables(), `sst_path` is already an instance of `std::filesystem::path`,
so there is no need to cast it to `std::filesystem::path`. also,
`path.remove_filename()` returns something like
"system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/", with the
trailing slash. when we get a component's path in `sstable::filename`,
we always add a "/" in between the `dir` and the filename, so this'd
end up with two slashes in the path like:

"/var/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f//mc-2-big-Data.db"

so, in order to remove the duplicated slash, let's just use
`path.parent_path()` here.
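
the difference between the two `std::filesystem::path` calls can be reproduced in isolation. the sketch below is not the scylla-sstable code; `component_filename` is a hypothetical stand-in for the "always add a slash" join described above:

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical stand-in: always put a "/" between the directory and the name.
std::string component_filename(const std::string& dir, const std::string& name) {
    return dir + "/" + name;
}

// remove_filename() modifies the path in place and keeps the trailing slash.
std::string dir_via_remove_filename(fs::path p) {
    return p.remove_filename().string();
}

// parent_path() returns the directory without a trailing slash.
std::string dir_via_parent_path(const fs::path& p) {
    return p.parent_path().string();
}
```

joining the result of `dir_via_remove_filename()` with a component name yields the doubled slash, while the `parent_path()` variant yields a single one.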

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15035
2023-08-21 09:28:03 +03:00
Tomasz Grabiec
bd8bb5d4b1 Merge 'Wire tablet into compaction group' from Raphael "Raph" Carvalho
Compaction group is the data plane for tablets, so this integration
allows each tablet to have its own storage (memtable + sstables).
A crucial step for dynamic tablets, where each tablet can be worked
on independently.

There are still some inefficiencies to be worked on, but as it is,
it already unlocks further development.

```
INFO  2023-07-27 22:43:38,331 [shard 0] init - loading tablet metadata
INFO  2023-07-27 22:43:38,333 [shard 0] init - loading non-system sstables
INFO  2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 0 present for ks.cf
INFO  2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 2 present for ks.cf
INFO  2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 4 present for ks.cf
INFO  2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 6 present for ks.cf
INFO  2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 1 present for ks.cf
INFO  2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 3 present for ks.cf
INFO  2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 5 present for ks.cf
INFO  2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 7 present for ks.cf
```

Closes #14863

* github.com:scylladb/scylladb:
  Kill scylla option to configure number of compaction groups
  replica: Wire tablet into compaction group
  token_metadata: Add this_host_id to topology config
  replica: Switch to chunked_vector for storing compaction groups
  replica: Generate group_id for compaction_group on demand
2023-08-18 15:17:17 +02:00
Avi Kivity
1901475598 Merge 'config: mark "experimental" option unused and cleanups' from Kefu Chai
in this series, the "experimental" option is marked `Unused`, as it has been deprecated for almost 2 years, since scylla 4.6. `experimental_features` is used instead to specify the enabled experimental features explicitly.

Closes #14948

* github.com:scylladb/scylladb:
  config: remove unused namespace alias
  config: use std::ranges when appropriate
  config: drop "experimental" option
  test: disable 'enable_user_defined_functions' if experimental_features does not include udf
  test: pylib: specify experimental_features explicitly
2023-08-17 20:42:02 +03:00
Avi Kivity
e8f3b073c3 Merge 'Maintain sstable state explicitly' from Pavel Emelyanov
An sstable can be in one of several states -- normal, quarantined, staging, uploading. Right now this "state" is hard-wired into sstable's path, e.g. quarantined sstable would sit in e.g. /var/lib/data/ks-cf-012345/quarantine/ directory. Respectively, there's a bunch of directory names constexprs in sstables.hh defining each "state". Other than being confusing, this approach doesn't work well with S3 backend. Additionally, there's snapshot subdir that adds to the confusion, because snapshot is not quite a state.

This PR converts "state" from constexpr char* directory names into an enum class and patches the sstable creation, opening and state-changing API to use that enum instead of parsing the path.
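
A rough sketch of the shape of such a change, assuming hypothetical names (`sstable_state`, `state_to_dir`) and illustrative directory strings; the real definitions live in the sstables code and may differ:

```cpp
#include <string_view>

// Hypothetical enum mirroring the states listed above.
enum class sstable_state { normal, staging, quarantine, upload };

// Only a filesystem-backed storage needs to map a state to a subdirectory.
// The directory names are illustrative assumptions.
constexpr std::string_view state_to_dir(sstable_state s) {
    switch (s) {
    case sstable_state::normal:     return "";
    case sstable_state::staging:    return "staging";
    case sstable_state::quarantine: return "quarantine";
    case sstable_state::upload:     return "upload";
    }
    return "";
}
```

Code that only cares about the state can then compare enum values, and a non-filesystem backend such as S3 never has to parse a path to find out which state an sstable is in.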

refs: #13017
refs: #12707

Closes #14152

* github.com:scylladb/scylladb:
  sstable/storage: Make filesystem storage with initial state
  sstable: Maintain state
  sstable: Make .change_state() accept state, not directory string
  sstable: Construct it with state
  sstables_manager: Remove state-less make_sstable()
  table: Make sstables with required state
  test: Make sstables with upload state in some cases
  tools: Make sstables with normal state
  table: Open-code sstables making streaming helpers
  tests: Make sstables with normal state by default
  sstable_directory: Make sstable with required state
  sstable_directory: Construct with state
  distributed_loader: Make sstable with desired state when populating
  distributed_loader: Make sstable with upload state when uploading
  sstable: Introduce state enum
  sstable_directory: Merge verify and g.c. calls
  distributed_loader: Merge verify and gc invocations
  sstable/filesystem: Put underscores to dir members
  sstable/s3: Mark make_s3_object_name() const
  sstable: Remove filename(dir, ...) method
2023-08-15 17:44:06 +03:00
Raphael S. Carvalho
2590eec352 replica: Generate group_id for compaction_group on demand
There are a few good reasons for this change.
1) compaction_group doesn't have to be aware of # of groups
2) thinking forward to dynamic tablets, # of groups cannot be
statically embedded in group id, otherwise it gets stale.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-08-15 09:04:05 -03:00
Kefu Chai
c82f1d2f57 tools/scylla-sstable: dump column_desc as an object
before this change, `scylla sstable dump-statistics` prints the
"regular_columns" as a list of strings, like:

```
        "regular_columns": [
          "name",
          "clustering_order",
          "type_name",
          "org.apache.cassandra.db.marshal.UTF8Type",
          "name",
          "column_name_bytes",
          "type_name",
          "org.apache.cassandra.db.marshal.BytesType",
          "name",
          "kind",
          "type_name",
          "org.apache.cassandra.db.marshal.UTF8Type",
          "name",
          "position",
          "type_name",
          "org.apache.cassandra.db.marshal.Int32Type",
          "name",
          "type",
          "type_name",
          "org.apache.cassandra.db.marshal.UTF8Type"
        ]
```

but according to
https://opensource.docs.scylladb.com/stable/operating-scylla/admin-tools/scylla-sstable.html#dump-statistics,

> $SERIALIZATION_HEADER_METADATA := {
>     "min_timestamp_base": Uint64,
>     "min_local_deletion_time_base": Uint64,
>     "min_ttl_base": Uint64",
>     "pk_type_name": String,
>     "clustering_key_types_names": [String, ...],
>     "static_columns": [$COLUMN_DESC, ...],
>     "regular_columns": [$COLUMN_DESC, ...],
> }
>
> $COLUMN_DESC := {
>     "name": String,
>     "type_name": String
> }

"regular_columns" is supposed to be a list of "$COLUMN_DESC".
the same applies to "static_columns". this schema makes sense,
as each column should be considered as a single object which
is composed of two properties. but we dump them as a flat list of strings.

so, in this change, we guard each visit() call of `json_dumper()`
with `StartObject()` and `EndObject()` pair, so that each column
is printed as an object.
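
the effect of wrapping each column in an object can be sketched with plain string building. this is not the `json_dumper()` code and does no JSON escaping; `column_desc`, `dump_flat` and `dump_as_objects` are hypothetical names:

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct column_desc {
    std::string name;
    std::string type_name;
};

// Sketch only: no escaping of special characters.
std::string quote(const std::string& s) {
    return "\"" + s + "\"";
}

// Before: keys and values are emitted as sibling strings in one list.
std::string dump_flat(const std::vector<column_desc>& cols) {
    std::string out = "[";
    for (std::size_t i = 0; i < cols.size(); ++i) {
        if (i != 0) out += ",";
        out += quote("name") + "," + quote(cols[i].name) + ","
             + quote("type_name") + "," + quote(cols[i].type_name);
    }
    return out + "]";
}

// After: each column is guarded by an object start and end.
std::string dump_as_objects(const std::vector<column_desc>& cols) {
    std::string out = "[";
    for (std::size_t i = 0; i < cols.size(); ++i) {
        if (i != 0) out += ",";
        out += "{" + quote("name") + ":" + quote(cols[i].name) + ","
             + quote("type_name") + ":" + quote(cols[i].type_name) + "}";
    }
    return out + "]";
}
```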

after the change, "regular_columns" are printed like:
```
        "regular_columns": [
          {
            "name": "clustering_order",
            "type_name": "org.apache.cassandra.db.marshal.UTF8Type"
          },
          {
            "name": "column_name_bytes",
            "type_name": "org.apache.cassandra.db.marshal.BytesType"
          },
          {
            "name": "kind",
            "type_name": "org.apache.cassandra.db.marshal.UTF8Type"
          },
          {
            "name": "position",
            "type_name": "org.apache.cassandra.db.marshal.Int32Type"
          },
          {
            "name": "type",
            "type_name": "org.apache.cassandra.db.marshal.UTF8Type"
          }
        ]
```

Fixes #15036
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15037
2023-08-15 08:22:51 +03:00
Pavel Emelyanov
9e752ca6ab tools: Make sstables with normal state
Just like tests, tools open an sstable by its full path and don't make any
assumptions about the sstable state

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:56:02 +03:00
Pavel Emelyanov
c0b922a8af sstable_directory: Construct with state
This is to replace full path sitting on this object eventually. For now
they have to co-exist, but state will be used to make_sstable()-s from
manager with its new API

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:56:01 +03:00
Kefu Chai
64bc8d2f7d config: drop "experimental" option
"experimental" was marked deprecated in 8b917f7c. this change
was included since Scylla 4.6. now that 5.3 has been branched,
this change will be included in 5.4. this should be long enough
for users to adapt if this option is ever used.

the dtests using this option have been audited and updated
accordingly. and the unit test testing this option is removed as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-08-09 10:17:34 +08:00
Kefu Chai
374bed8c3d tools: do not create bpo::value unless transfer it to an option_description
`boost::program_options::value()` creates a new typed_value<T> object,
without holding it with a shared_ptr. boost::program_options expects the
developer to construct a `bpo::option_description` right away from it.
and `boost::program_options::option_description` takes the ownership
of the `typed_value<T>*` raw pointer, and manages its life cycle with
a shared_ptr. but before passing it to a `bpo::option_description`,
the pointer created by `boost::program_options::value()` is still
a raw pointer.

before this change, we initialize positional options as global
variables using `boost::program_options::value()`. but unfortunately,
we don't always initialize a `bpo::option_description` from it --
we only do this on demand when the corresponding subcommand is
called.

so, if the corresponding subcommand is not called, the created
`typed_value<T>` objects are leaked. hence LeakSanitizer warns us.

after this change, we create the option vector as a static
local variable in a function so it is created on demand as well.
as an alternative, we could initialize the options vector as a local
variable where it is used. but to be more consistent with how
`global_option` is specified, and to colocate them in a single
place, let's keep the existing code layout.
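
the "created on demand" part can be sketched without boost. the example below only demonstrates that a function-local static is constructed on first use rather than at program start; `positional_option`, `positional_options()` and `construction_count` are hypothetical names, not the ones used in the tools:

```cpp
#include <string>
#include <vector>

struct positional_option {
    std::string name;
    std::string description;
};

// Counts how often the option list is built; for illustration only.
inline int construction_count = 0;

// The vector is built the first time this function is called, and never
// if no subcommand asks for it.
const std::vector<positional_option>& positional_options() {
    static const std::vector<positional_option> options = [] {
        ++construction_count;
        return std::vector<positional_option>{
            positional_option{"sstables", "sstable(s) to process"},
        };
    }();
    return options;
}
```

a global variable with the same initializer would run it unconditionally during startup, which is the situation the commit message describes for the unowned `typed_value<T>` objects.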

Fixes #14929
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14939
2023-08-04 08:03:11 +03:00
Botond Dénes
2d26613f28 tools: move operation-options to the operations themselves
Currently, operation-options are declared in a single global list, then
operations refer to the options they support via name. This system was
born at a time, when scylla-sstable had a lot of shared options between
its operations, so it was desirable to declare them centrally and only
add references to individual operations, to reduce duplication.
However, as the dust settled, only 2 options are shared by 2 operations
each. This is a very low benefit. Up to now the cost was also very low
-- shared options meant the same in all operations that used them.
However this is about to change and this system becomes very awkward to
use as soon as multiple operations want to have an option with the same
name, but slightly (or very) different meaning/semantics.
So this patch moves the options to the operations themselves.
Each will declare the list of options it supports, without having to
reference some common list.
This also removes an entire (although very uncommon) class of bugs:
option-name referring to a nonexistent option.

Closes #14898
2023-07-31 20:16:41 +03:00
Kefu Chai
1c525c02a3 tools/utils: use std::shift_left() when appropriate
instead of using a loop of std::swap(), let's use std::shift_left()
when appropriate. simpler and more readable this way.

moreover, the pattern of looking for a command and consuming it from
the command line resembles what we have in main(), so let's use
similar logic to handle both of them. probably we can consolidate
them in the future.
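
a sketch of the "find a command and consume it" pattern, assuming a hypothetical helper named `consume_argument`; it is not the code in tools/utils. `std::shift_left()` needs C++20, so the sketch falls back to the equivalent three-argument `std::move()` when the library does not provide it:

```cpp
#include <algorithm>
#include <string_view>

// Hypothetical helper: look for `command` in argv[1..argc), remove it by
// shifting the remaining arguments one slot to the left, and return the
// new argc. argv[0] (the program name) is never touched.
int consume_argument(int argc, const char** argv, std::string_view command) {
    if (argc < 2) {
        return argc;
    }
    const char** begin = argv + 1;
    const char** end = argv + argc;
    const char** it = std::find_if(begin, end, [&] (const char* arg) {
        return command == arg;
    });
    if (it == end) {
        return argc;
    }
#if defined(__cpp_lib_shift)
    std::shift_left(it, end, 1);
#else
    std::move(it + 1, end, it);   // same effect as shift_left(it, end, 1)
#endif
    return argc - 1;
}
```

compared with a loop of `std::swap()` calls, the single algorithm call states the intent directly: everything after the consumed argument moves up by one position.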

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14888
2023-07-31 09:46:52 +03:00
Kefu Chai
eab160e947 tools/scylla-sstable: mark const variable with constexpr
this change changes `const` to `constexpr`, because the string literal
defined here is not only immutable, but also initialized at
compile-time, and can be used by constexpr expressions and functions.

this change is introduced to reduce the size of the change when moving
to compile-time format string in future. so far, seastar::format() does
not use the compile-time format string, but we have patches pending on
review implementing this. and the author of this change has local
branches implementing the changes on scylla side to support compile-time
format string, which practically replaces most of the `format()` calls
with `seastar::format()`.

to reduce the size of the change and the pain of rebasing, some of the
less controversial changes are extracted and upstreamed. this one is one
of them.

this change also addresses following compilation failure:

```
/home/kefu/dev/scylladb/tools/scylla-sstable.cc:2836:44: error: call to consteval function 'fmt::basic_format_string<char, const char *const &, seastar::basic_sstring<char, unsigned int, 15>>::basic_format_string<const char *, 0>' is not a constant expression
 2836 |             .description = seastar::format(description_template, app_name, boost::algorithm::join(operations | boost::adaptors::transformed([] (const auto& op) {
      |                                            ^
/usr/include/fmt/core.h:3148:67: note: read of non-constexpr variable 'description_template' is not allowed in a constant expression
 3148 |   FMT_CONSTEVAL FMT_INLINE basic_format_string(const S& s) : str_(s) {
      |                                                                   ^
```
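
the rule behind this error can be shown without fmt or seastar. in the sketch below, the template text and the `count_placeholders` helper are made up for illustration; only the `const` versus `constexpr` distinction is taken from the commit:

```cpp
#include <cstddef>

// A `const char* const` would be immutable too, but a pointer variable is
// only usable in constant expressions when it is declared constexpr.
constexpr const char* description_template = "Usage: {} OPERATION [OPTIONS]";

// Counts "{}" placeholders; usable at compile time.
constexpr std::size_t count_placeholders(const char* s) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (s[0] == '{' && s[1] == '}') {
            ++n;
        }
    }
    return n;
}

// Evaluated by the compiler. With `const` instead of `constexpr` above,
// this check is expected to be rejected, much like the consteval
// format-string constructor in the error shown earlier.
static_assert(count_placeholders(description_template) == 1);
```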

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14887
2023-07-31 09:44:00 +03:00
Avi Kivity
accd6271bc Merge 'tools: introduce tool_app_template and migrate all tools to it' from Botond Dénes
The scaffolding required to have a working scylla tool app is considerable, leading to a large amount of boilerplate code in each such app. This logic is also very similar across the two tool apps we have and would presumably be very similar in any future app. This PR extracts this logic into `tools/utils.hh` and introduces `tool_app_template`, which is similar to `seastar::app_template` in that it centralizes all the option handling and more in a single class, that each tool has to just instantiate and then call `run()` to run the app.
This cuts down on the repetition and boilerplate in our current tool apps and makes prototyping new tool apps much easier.

Closes #14855

* github.com:scylladb/scylladb:
  tools/utils.hh: remove unused headers
  tools/utils: make get_selected_operation() and configure_tool_mode() private
  tools/utils.hh: de-template get_selected_operation()
  tools/scylla-types: migrate to tools_app_template
  tools/scylla-types: prepare for migration to tool_app_template
  tools/scylla-sstable.cc: fix indentation
  tools/scylla-sstables: migrate to tool_app_template
  tools/scylla-sstables: prepare for migration to tool_app_template
  tools: extract tool app skeleton to utils.hh
2023-07-30 18:31:10 +03:00
Avi Kivity
1c3d22b717 build: update frozen toolchain to Fedora 38
This refreshes clang to 16.0.6 and libstdc++ to 13.1.1.

compiler-rt, libasan, and libubsan are added to install-dependencies.sh
since they are no longer pulled in as dependencies.

Closes #13730
2023-07-30 03:08:48 +03:00
Botond Dénes
1eca60fe10 tools/utils.hh: remove unused headers
2023-07-28 08:41:34 -04:00
Botond Dénes
cbcb20f0f9 tools/utils: make get_selected_operation() and configure_tool_mode() private
Their only user is in tools/utils.cc, so move them there, into an
anonymous namespace.
2023-07-28 08:41:34 -04:00
Botond Dénes
fc0c87002c tools/utils.hh: de-template get_selected_operation()
It now has a single user, so it doesn't have to be a template.
For now, make the method inline, so it can stay in the header. It will
be moved to utils.cc in the next patch.
2023-07-28 08:41:16 -04:00
Botond Dénes
8caf258539 tools/scylla-types: migrate to tools_app_template
Discard the locally coded app skeleton and reuse the tool app template
instead. Reduces boilerplate greatly.
2023-07-28 08:30:53 -04:00
Botond Dénes
68a452be00 tools/scylla-types: prepare for migration to tool_app_template
Make options more declarative and create a local reference to
app.configuration() in the main lambda, to facilitate further patching.
2023-07-28 08:30:53 -04:00
Botond Dénes
7598c23359 tools/scylla-sstable.cc: fix indentation
Broken in the previous patch.
2023-07-28 08:30:53 -04:00
Botond Dénes
d082622ab9 tools/scylla-sstables: migrate to tool_app_template
Removing a great amount of boilerplate, streamlining the main method.
2023-07-28 08:30:53 -04:00
Botond Dénes
092650b20b tools/scylla-sstables: prepare for migration to tool_app_template
Make options more declarative. To facilitate further patching.
2023-07-28 08:30:53 -04:00
Botond Dénes
89d7d80fce tools: extract tool app skeleton to utils.hh
The skeleton of the two existing scylla-native tools (scylla-types and
scylla-sstable) is very similar. By skeleton, I mean all the boilerplate
around creating and configuring a seastar::app_template, representing
operations/command and their options, and presenting and selecting
these.
To facilitate code-sharing and quick development of any new tools,
extract this skeleton from scylla-sstable.cc into tools/utils.hh,
in the form of a new tool_app_template, which wraps a
seastar::app_template and centralizes all the boilerplate logic in a
single place. The extracted code is not a simple copy-paste, although
many elements are simply copied. The original code is not removed yet.
2023-07-28 08:30:53 -04:00
Avi Kivity
cf81eef370 Merge 'schema_mutations, migration_manager: Ignore empty partitions in per-table digest' from Tomasz Grabiec
Schema digest is calculated by querying for mutations of all schema
tables, then compacting them so that all tombstones in them are
dropped. However, even if the mutation becomes empty after compaction,
we still feed its partition key. If the same mutations were compacted
prior to the query, because the tombstones expire, we won't get any
mutation at all and won't feed the partition key. So schema digest
will change once an empty partition of some schema table is compacted
away.

Tombstones expire 7 days after schema change which introduces them. If
one of the nodes is restarted after that, it will compute a different
table schema digest on boot. This may cause performance problems. When
sending a request from coordinator to replica, the replica needs
schema_ptr of the exact schema version requested by the coordinator. If it
doesn't know that version, it will request it from the coordinator and
perform a full schema merge. This adds latency to every such request.
Schema versions which are not referenced are currently kept in cache
for only 1 second, so if request flow has low-enough rate, this
situation results in perpetual schema pulls.

After ae8d2a550d (5.2.0), it is more likely to
run into this situation, because table creation generates tombstones
for all schema tables relevant to the table, even the ones which
will be otherwise empty for the new table (e.g. computed_columns).

This change introduces a cluster feature which when enabled will change
digest calculation to be insensitive to expiry by ignoring empty
partitions in digest calculation. When the feature is enabled,
schema_ptrs are reloaded so that the window of discrepancy during
transition is short and no rolling restart is required.
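
A simplified sketch of why skipping empty partitions makes the digest insensitive to expiry. It is not the schema_mutations code: FNV-1a stands in for the real hash, and `partition`, `feed` and `digest` are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model: what is left of a schema-table partition after
// compaction has dropped its tombstones.
struct partition {
    std::string key;
    std::vector<std::string> live_cells;
};

// FNV-1a step, used here only as a stand-in hash.
void feed(std::uint64_t& h, const std::string& s) {
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;
    }
}

std::uint64_t digest(const std::vector<partition>& parts, bool ignore_empty) {
    std::uint64_t h = 14695981039346656037ull;
    for (const auto& p : parts) {
        if (ignore_empty && p.live_cells.empty()) {
            continue;   // an empty partition contributes nothing, not even its key
        }
        feed(h, p.key);
        for (const auto& c : p.live_cells) {
            feed(h, c);
        }
    }
    return h;
}
```

With `ignore_empty` set, a node that still holds an emptied partition and a node where it has already been compacted away feed the same bytes into the hash.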

A similar problem was fixed for per-node digest calculation in
c2ba94dc39e4add9db213751295fb17b95e6b962. Per-table digest calculation
was not fixed at that time because we didn't persist enabled features
and they were not enabled early-enough on boot for us to depend on
them in digest calculation. Now they are enabled before non-system
tables are loaded so digest calculation can rely on cluster features.

Fixes #4485.

Manually tested using ccm on cluster upgrade scenarios and node restarts.

Closes #14441

* github.com:scylladb/scylladb:
  test: schema_change_test: Verify digests also with TABLE_DIGEST_INSENSITIVE_TO_EXPIRY enabled
  schema_mutations, migration_manager: Ignore empty partitions in per-table digest
  migration_manager, schema_tables: Implement migration_manager::reload_schema()
  schema_tables: Avoid crashing when table selector has only one kind of tables
2023-07-28 00:01:33 +03:00
Kefu Chai
2943d3c1b0 tools/scylla-sstable: s/foo.find(bar) != foo.end()/foo.count(bar) != 0/
just for better readability.
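
for comparison, the two spellings side by side. both helper names are hypothetical, and a `std::map` is assumed here although the same applies to the other associative containers:

```cpp
#include <map>
#include <string>

// Before: membership test by comparing the iterator against end().
bool has_key_find(const std::map<std::string, int>& m, const std::string& key) {
    return m.find(key) != m.end();
}

// After: the same test expressed with count().
bool has_key_count(const std::map<std::string, int>& m, const std::string& key) {
    return m.count(key) != 0;
}
```

with C++20, `m.contains(key)` expresses the same check even more directly.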

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14816
2023-07-25 11:38:44 +03:00
Botond Dénes
bf6186ed7e Update tools/java submodule
* tools/java 9f63a96f...585b30fd (1):
  > cassandra-stress: add support for using RackAwareRoundRobinPolicy
2023-07-20 18:13:32 +03:00
Botond Dénes
8916aa311e Merge 'build: cmake: build: cmake: build submodules ' from Kefu Chai
this series enables CMake to build submodules. it helps developers to build, for instance, the java tools on demand.

Closes #14751

* github.com:scylladb/scylladb:
  build: cmake: build submodules
  build: cmake: generate version files with add_custom_command()
2023-07-19 12:04:29 +03:00
Botond Dénes
665f69b80d tools,mutation: extract the low-level json utilities into mutation/json.hh
Soon, we will want to convert mutation fragments into json inside the
scylla codebase, not just in tools. To avoid scylla-core code having to
include tools/ (and link against it), move the low-level json utilities
into mutation/.
2023-07-19 01:28:28 -04:00
Botond Dénes
36bca5a6af tools/json_writer: fold SstableKey() overloads into callers
These are very simple methods, and we want to make the low-level writers
not depend on knowing the sstable type.
2023-07-19 01:28:28 -04:00
Botond Dénes
043b0f316f tools/json_writer: allow writing metadata and value separately
The values of cells are potentially very large and thus, when presenting
row content as json in SELECT * FROM MUTATION_FRAGMENTS($table) queries,
we want to separate metadata and cell values into separate columns, so
users can opt out from the potentially big values being included too.
To support this use-case, write(row) and its downstream write methods
get a new `include_value` flag, which defaults to true. When set to
false, cell values will not be included in the json output. At the same
time, new methods are added to convert only cell values of a row to
json.
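
A sketch of what such a flag could look like, using plain string building instead of the real json writer. The `cell` struct and both function names are hypothetical, and no JSON escaping is done:

```cpp
#include <string>

struct cell {
    std::string name;
    long timestamp;
    std::string value;
};

// Writes the cell metadata and, unless include_value is false, the value.
std::string write_cell(const cell& c, bool include_value = true) {
    std::string out = "{\"name\":\"" + c.name + "\",\"timestamp\":" + std::to_string(c.timestamp);
    if (include_value) {
        out += ",\"value\":\"" + c.value + "\"";
    }
    out += "}";
    return out;
}

// Companion method that converts only the value to json.
std::string write_cell_value(const cell& c) {
    return "\"" + c.value + "\"";
}
```

Defaulting the flag to true keeps existing callers unchanged, while a caller that wants separate metadata and value columns calls `write_cell(c, false)` and `write_cell_value(c)`.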
2023-07-19 01:28:28 -04:00
Botond Dénes
1df004db8c tools/json_writer: split mutation_fragment_json_writer in two classes
1) mutation_partition_json_writer - containing all the low level
   utilities for converting sub-fragment level mutation components (such
   as rows, tombstones, etc.) and their components into json;
2) mutation_fragment_stream_json_writer - containing all the high level
   logic for converting mutation fragment streams to json;

The latter uses the former behind the scenes. The goal is to enable
reuse of converting mutation-fragments into json, without being forced
to work around differences in how the mutation fragments are represented
in json, on the higher level.
2023-07-19 01:28:28 -04:00
Botond Dénes
0a5b67d6d9 tools/json_writer: allow passing custom std::ostream to json_writer
To allow for use-cases where the user wants to write the json into a
string.
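
A rough sketch of the use-case, assuming a hypothetical
`simple_json_writer` class (not the real json_writer interface) that
takes the output stream as a constructor argument:

```cpp
#include <cassert>
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical minimal writer, for illustration only.
class simple_json_writer {
    std::ostream& _os;
public:
    // Defaults to stdout, but any std::ostream can be passed in.
    explicit simple_json_writer(std::ostream& os = std::cout) : _os(os) {}

    // Writes a single-member JSON object; no escaping, to keep it short.
    void key_value(const std::string& k, const std::string& v) {
        _os << "{\"" << k << "\":\"" << v << "\"}";
    }
};

// Captures the JSON in a string instead of printing it.
std::string to_json_string(const std::string& k, const std::string& v) {
    std::ostringstream os;
    simple_json_writer w(os);
    w.key_value(k, v);
    return os.str();
}
```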
2023-07-19 01:28:28 -04:00
Kefu Chai
959bfae665 build: cmake: build submodules
this mirrors what we have in the `build.ninja` generated by
`configure.py`. with this change, we can build, for instance,
`dist-tool-tar` from the `build.ninja` generated by CMake.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-19 13:08:35 +08:00
Avi Kivity
bfaac3a239 Merge 'Make replace sstables implementations exception safe' from Benny Halevy
This is the first phase of providing strong exception safety guarantees by the generic `compaction_backlog_tracker::replace_sstables`.

Strong exception safety here means that each compaction strategy backlog tracker's replace_sstables may throw an exception, but on error it must revert any intermediate changes it made, restoring the tracker to its pre-update state.

Once this series is merged and ICS replace_sstables is also made strongly exception safe (using infrastructure from size_tiered_backlog_tracker introduced here), `compaction_backlog_tracker::replace_sstables` may allow exceptions to propagate back to the caller rather than disabling the backlog tracker on errors.
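
A minimal sketch of what a strong exception safety guarantee looks like, assuming a hypothetical `toy_backlog_tracker` that only tracks sstable names. It uses the generic copy-then-swap idiom as a stand-in; the series may well achieve the same guarantee differently, and the empty-name check is only there to simulate a failure in the middle of an update.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical tracker, for illustration only.
class toy_backlog_tracker {
    std::set<std::string> _sstables;
public:
    void replace_sstables(const std::vector<std::string>& old_ssts,
                          const std::vector<std::string>& new_ssts) {
        // Do all work that may throw on a copy of the state.
        auto tmp = _sstables;
        for (const auto& s : old_ssts) {
            tmp.erase(s);
        }
        for (const auto& s : new_ssts) {
            if (s.empty()) {
                // Stand-in for any failure during the update.
                throw std::invalid_argument("empty sstable name");
            }
            tmp.insert(s);
        }
        // Commit step: swapping sets with the default allocator and
        // comparator does not throw, so the state is updated either
        // fully or not at all.
        _sstables.swap(tmp);
    }

    const std::set<std::string>& sstables() const { return _sstables; }
};
```

If `replace_sstables` throws, the tracker still holds exactly the set it had before the call, so the caller can let the exception propagate without disabling the tracker.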

Closes #14104

* github.com:scylladb/scylladb:
  leveled_compaction_backlog_tracker: replace_sstables: provide strong exception safety guarantees
  time_window_backlog_tracker: replace_sstables: provide strong exception safety guarantees
  size_tiered_backlog_tracker: replace_sstables: provide strong exception safety guarantees
  size_tiered_backlog_tracker: provide static calculate_sstables_backlog_contribution
  size_tiered_backlog_tracker: make log4 helper static
  size_tiered_backlog_tracker: define struct sstables_backlog_contribution
  size_tiered_backlog_tracker: update_sstables: update total_bytes only if set changed
  compaction_backlog_tracker: replace_sstables: pass old and new sstables vectors by ref
  compaction_backlog_tracker: replace_sstables: add FIXME comments about strong exception safety
2023-07-17 12:32:27 +03:00
Harsh Soni
78c8e92170 dbuild: fix ulimits hard value for docker on osx
Docker-on-osx cannot parse "unlimited" as the hard limit value of ulimit, so hardcode it to a fixed value.

Closes #14295
2023-07-17 10:30:39 +03:00
Tomasz Grabiec
f2ed9fcd7e schema_mutations, migration_manager: Ignore empty partitions in per-table digest
Schema digest is calculated by querying for mutations of all schema
tables, then compacting them so that all tombstones in them are
dropped. However, even if the mutation becomes empty after compaction,
we still feed its partition key. If the same mutations were compacted
prior to the query, because the tombstones expire, we won't get any
mutation at all and won't feed the partition key. So schema digest
will change once an empty partition of some schema table is compacted
away.

Tombstones expire 7 days after the schema change that introduces them. If
one of the nodes is restarted after that, it will compute a different
table schema digest on boot. This may cause performance problems. When
sending a request from coordinator to replica, the replica needs
schema_ptr of the exact schema version requested by the coordinator. If it
doesn't know that version, it will request it from the coordinator and
perform a full schema merge. This adds latency to every such request.
Schema versions which are not referenced are currently kept in cache
for only 1 second, so if the request rate is low enough, this
situation results in perpetual schema pulls.

After ae8d2a550d, it is more likely to
run into this situation, because table creation generates tombstones
for all schema tables relevant to the table, even the ones which
will be otherwise empty for the new table (e.g. computed_columns).

This change introduces a cluster feature which, when enabled, will change
digest calculation to be insensitive to expiry by ignoring empty
partitions in digest calculation. When the feature is enabled,
schema_ptrs are reloaded so that the window of discrepancy during
transition is short and no rolling restart is required.
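
A simplified sketch of the idea, assuming a hypothetical `partition`
type that holds what is left of a schema-table partition after
compaction. FNV-1a is used here only as a stand-in for the real digest
function, and `schema_digest()` and its `ignore_empty` parameter are
illustrative names for the behaviour the cluster feature switches on.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical compacted partition: its key plus the rows that survived
// compaction (all tombstones already dropped).
struct partition {
    std::string key;
    std::vector<std::string> rows;
};

// FNV-1a, used only as a stand-in for the real digest function.
inline void feed(std::uint64_t& h, const std::string& s) {
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
}

// With ignore_empty set, a partition that became empty after compaction
// contributes nothing, not even its key, so the digest no longer depends
// on whether the tombstones have already expired and been purged.
std::uint64_t schema_digest(const std::vector<partition>& partitions,
                            bool ignore_empty) {
    std::uint64_t h = 14695981039346656037ULL;
    for (const auto& p : partitions) {
        if (ignore_empty && p.rows.empty()) {
            continue;
        }
        feed(h, p.key);
        for (const auto& r : p.rows) {
            feed(h, r);
        }
    }
    return h;
}
```

With `ignore_empty` set, a node that still sees the empty partition and
a node where it was already compacted away compute the same digest.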

A similar problem was fixed for per-node digest calculation in
18f484cc753d17d1e3658bcb5c73ed8f319d32e8. Per-table digest calculation
was not fixed at that time because we didn't persist enabled features
and they were not enabled early-enough on boot for us to depend on
them in digest calculation. Now they are enabled before non-system
tables are loaded so digest calculation can rely on cluster features.

Fixes #4485.
2023-07-03 23:06:55 +02:00