37838 Commits

Author SHA1 Message Date
Patryk Jędrzejczak
68bd0424c2 service: storage_proxy: refactor encode_replica_exception_for_rpc
To properly handle abort_requested_exception thrown from
migration_manager::get_schema_for_read in storage_proxy::handle_read (which
we do in the next commit) we have to somehow encode and return it. The
encode_replica_exception_for_rpc function is not suitable for that because
it requires the SourceTuple type (of a value returned by do_query()) which
we don't know when calling get_schema_for_read.

We move the part of encode_replica_exception_for_rpc responsible for
handling exceptions to a new function and rewrite it in a way that doesn't
require the SourceTuple type. As this function fits the name
encode_replica_exception_for_rpc better, we name it this way and rename
the previous encode_replica_exception_for_rpc.
2023-07-17 12:27:33 +02:00
Patryk Jędrzejczak
a21c4abad7 replica: add abort_requested_exception to exception_variant
If migration_manager::get_schema_for_write is called after
migration_manager::drain, it throws abort_requested_exception.
This exception is not present in replica::exception_variant, which
means that RPC doesn't preserve information about its type. If it is
thrown on the replica side, it is deserialized as std::runtime_error
on the coordinator. Therefore, abstract_read_resolver::error logs
information about this exception, even though we don't want it (aborts
are triggered on shutdown and timeouts).

To solve this issue, we add abort_requested_exception to
replica::exception_variant and, in the next commits, refactor
storage_proxy::handle_read so that abort_requested_exception thrown in
migration_manager::get_schema_for_write is properly serialized. Thanks
to this change, unchanged abstract_read_resolver::error correctly
handles abort_requested_exception thrown on the replica side by not
reporting it.
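
The effect of a missing variant alternative can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `KNOWN_EXCEPTIONS`, `encode`, `decode`, and `should_log` are hypothetical names rather than Scylla APIs, and the real code is C++ built around `replica::exception_variant`.

```python
class AbortRequested(Exception):
    """Hypothetical Python stand-in for abort_requested_exception."""


# Hypothetical registry playing the role of the variant's alternatives.
KNOWN_EXCEPTIONS = {"abort_requested": AbortRequested}


def encode(exc):
    """Replica side: turn an exception into a (tag, message) pair for the wire."""
    for tag, cls in KNOWN_EXCEPTIONS.items():
        if isinstance(exc, cls):
            return tag, str(exc)
    return "unknown", str(exc)  # the concrete type is lost here


def decode(tag, message):
    """Coordinator side: rebuild the exception; unknown tags become RuntimeError."""
    return KNOWN_EXCEPTIONS.get(tag, RuntimeError)(message)


def should_log(exc):
    """Aborts are expected on shutdown and timeouts, so they are not reported."""
    return not isinstance(exc, AbortRequested)
```

With the type registered, an abort survives the round trip and is not reported; any unregistered exception degrades to a generic `RuntimeError` and is logged.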
2023-07-13 16:57:10 +02:00
Tomasz Grabiec
b7bc991aa1 Merge 'Fix test_node_isolation flakiness' from Kamil Braun
The test isolates a node and then connects to it through CQL.
The `connect()` step would often time out on ARM debug builds. This was
already dealt with in the past in the context of other tests: #11289.

The `ManagerClient.con_gen` function creates a connection in a way that
avoids the problem -- connection timeout settings are adjusted to
account for the slowness. Use it in this test to fix the flakiness.

At the same time, reduce the timeout used for the actual CQL request
(after the driver has already connected), because the test expects this
request to time out and waiting for 200 seconds here is just a waste of
time.

Closes #14663

* github.com:scylladb/scylladb:
  test: test_node_isolation: use `ManagerClient.con_gen` to create CQL connection
  test: manager_client: make `con_gen` for `ManagerClient.__init__` nonoptional
2023-07-12 16:36:54 +02:00
Calle Wilund
890f1f4ad3 generic_server: Handle TLS error codes indicating broken pipe
Fixes #14625

In broken pipe detection, handle also TLS error codes.

Requires https://github.com/scylladb/seastar/pull/1729

Closes #14626
2023-07-12 16:04:33 +03:00
Botond Dénes
6a63abcb9f Merge 'doc: fix broken links reported by the link checker' from Anna Stuchlik
This PR fixes or removes broken links reported by an online link checker.

Fixes https://github.com/scylladb/scylladb/issues/14488

Closes #14462

* github.com:scylladb/scylladb:
  doc: update the link to ABRT
  doc: fix broken links on the Scylla SStable page
2023-07-12 16:02:23 +03:00
Asias He
d3034e0fab view_update_generator: Increase the registration_queue_size
When repair writes a sstable to disk, we check if the sstable needs view
update processing. If yes, the sstable will be placed into the staging
dir for processing, with the _registration_sem semaphore to prevent too
many pending unprocessed sstables.

We have seen multiple cases in the field where view update processing is
inefficient and far too slow, which prevents the base table repair from
finishing on time.

This patch increases the registration_queue_size to a bigger number to
mitigate the problem that slow view update processing blocks repair.

It is better to have a consistent base table + inconsistent view table
than inconsistent base table + inconsistent view table.

Currently, sstables in the staging dir are not compacted, so we cannot
raise the _registration_sem limit too far without accumulating too many
sstables.

The view_build_test.cc is updated to make the test pass.

Closes #14241
2023-07-12 15:51:35 +03:00
Tomasz Grabiec
e8ee0a2f86 Merge 'group0_state_machine: use correct comparison for timeuuids in merger' from Kamil Braun
In d2a4079bbe, `merger` was modified so that when we merge a command, `last_group0_state_id` is taken to be the maximum of the merged command's state_id and the current `last_group0_state_id`. This is necessary for achieving the same behavior as if the commands were applied individually instead of being merged -- where we take the maximum state ID from `group0_history` table which was applied until now (because the table is sorted using the state IDs and we take the greatest row).

However, a subtle bug was introduced -- the `std::max` function uses the `utils::UUID` standard comparison operator which is unfortunately not the same as timeuuid comparison that Scylla performs when sorting the `group0_history` table. So in rare cases it could return the *smaller* of the two timeuuids w.r.t. the correct timeuuid ordering. This would then lead to commands being applied which should have been turned to no-ops due to the `prev_state_id` check -- and then, for example, permanent schema desync or worse.

Fix it by using the correct comparison method.

Fixes: #14600

Closes #14616

* github.com:scylladb/scylladb:
  utils/UUID: reference `timeuuid_tri_compare` in `UUID::operator<=>` comment
  group0_state_machine: use correct comparison for timeuuids in `merger`
  utils/UUID: introduce `timeuuid_tri_compare` for `const UUID&`
  utils/UUID: introduce `timeuuid_tri_compare` for `const int8_t*`
2023-07-12 14:48:18 +02:00
Botond Dénes
296837120d db: move virtual tables into virtual_tables.cc
The definitions of virtual tables make up approximately a quarter of the
huge system_keyspace.cc file (almost 4K lines), pulling in a lot of
headers only used by them.
Move them to a separate source file to make system_keyspace.cc easier
for humans and compilers to digest.
This patch also moves the `register_virtual_tables()`,
`install_virtual_readers()` as well as the `virtual_tables` global.

Closes #14308
2023-07-12 15:26:54 +03:00
Anna Stuchlik
a414ac8fde doc: update the link to ABRT 2023-07-12 14:13:42 +02:00
Kefu Chai
8f31f28446 build: cmake: add test/raft tests
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14656
2023-07-12 15:06:59 +03:00
Kamil Braun
820d7e9520 test: test_node_isolation: use ManagerClient.con_gen to create CQL connection
The test isolates a node and then connects to it through CQL.
The `connect()` step would often time out on ARM debug builds. This was
already dealt with in the past in the context of other tests: #11289.

The `ManagerClient.con_gen` function creates a connection in a way that
avoids the problem -- connection timeout settings are adjusted to
account for the slowness. Use it in this test to fix the flakiness.

At the same time, reduce the timeout used for the actual CQL request
(after the driver has already connected), because the test expects this
request to time out and waiting for 200 seconds here is just a waste of
time.
2023-07-12 12:34:02 +02:00
Kefu Chai
20c7b6057b test: silence the deprecation warning.
because `lw_shared_ptr::operator=(T&&)` was deprecated, we started to
see the following warning:

```
/home/kefu/dev/scylladb/test/boost/statement_restrictions_test.cc:394:41: warning: 'operator=' is deprecated: call make_lw_shared<> and assign the result instead [-Wdeprecated-declarations]
  394 |         definition.column_specification = std::move(specification);
      |                                         ^
/home/kefu/dev/scylladb/seastar/include/seastar/core/shared_ptr.hh:346:7: note: 'operator=' has been explicitly marked deprecated here
  346 |     [[deprecated("call make_lw_shared<> and assign the result instead")]]
      |       ^
1 warning generated.
```

so, in this change, we use the recommended way to update a lw_shared_ptr.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14648
2023-07-12 13:10:33 +03:00
Kamil Braun
3464877276 test: manager_client: make con_gen for ManagerClient.__init__ nonoptional
`ManagerClient` is given a function that is used to create CQL
connections to the Scylla cluster. For some reason it was typed as
`Optional` even though it was never passed `None`. Fix it.
2023-07-12 11:44:15 +02:00
Kefu Chai
5443bf69f7 storage_proxy: print the expected ex.what()
before this change, the format string contains two placeholders,
but only one extra argument is passed in. if we actually format
this logging message, fmtlib would throw.

after this change, we pass the exception's error message as yet
another argument.

this logging message is printed at the "trace" level; presumably that is
why fmtlib has not thrown this exception in practice.
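
The failure mode can be sketched with Python's `str.format` standing in for fmtlib. This is an analogy only: `log_trace` and the message text are made up for illustration, and the exception type differs from what fmtlib throws.

```python
def log_trace(enabled, fmt, *args):
    """Hypothetical logger: the message is formatted only when the level is enabled."""
    if not enabled:
        return None  # nothing is formatted, so a bad format string goes unnoticed
    return fmt.format(*args)


# Two placeholders, as in the bug described above (the wording is invented).
BROKEN = "got error in response to {}: {}"

# Level disabled: the missing argument is never detected.
log_trace(False, BROKEN, "mutation")

# Level enabled with both arguments supplied: formatting succeeds.
fixed = log_trace(True, BROKEN, "mutation", "timed out")
```

Calling `log_trace(True, BROKEN, "mutation")` with only one argument raises `IndexError` in this sketch, which mirrors why the bug only shows up once the level is actually enabled.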

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14628
2023-07-12 12:34:51 +03:00
Nadav Har'El
a4087f58df alternator: fix error path for size() function on constants
The DynamoDB documentation for the size() function claims that it only
works on paths (attribute names or references), but it actually works on
constants from the query (e.g., ":val") as well.

It turns out that Alternator supports this undocumented case already, but
gets the error path wrong: Usually, when size() is calculated on the data,
if the data has the wrong type for size() (e.g., an integer), the condition
simply doesn't match. But if the value comes from the query - it should
generate an error that the query is wrong - ValidationException.
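
The two error paths can be sketched as follows. This is a simplified illustration, not Alternator's code: `size_operand`, `SIZED_TYPES`, and the Python type mapping are assumptions made for the example.

```python
class ValidationException(Exception):
    """Stand-in for the error type named in the commit message."""


# Assumed Python analogues of the types size() accepts (strings, binaries,
# lists, maps, sets); numbers and booleans are deliberately absent.
SIZED_TYPES = (str, bytes, list, dict, set, frozenset)


def size_operand(value, from_query):
    """Return the size, or None when a stored value has an unsupported type.

    A constant supplied in the query with an unsupported type is a mistake
    in the query itself, so it raises instead of silently not matching.
    """
    if isinstance(value, SIZED_TYPES):
        return len(value)
    if from_query:
        raise ValidationException("size() applied to a constant of unsupported type")
    return None  # data of the wrong type: the condition simply does not match
```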

This patch fixes this case, and also adds tests for it that pass on both
DynamoDB and Alternator (after this patch).

Fixes #14592

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14593
2023-07-12 12:29:05 +03:00
Pavel Emelyanov
eb549234b0 scylla-gdb: Fix tables filtering
The tables command has a -k|--keyspace argument that is supposed to
filter tables belonging to a specific keyspace, but it doesn't work. Fix it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #14634
2023-07-12 12:26:25 +03:00
Avi Kivity
0fc067a54c build: add -Wimplicit-fallthrough to cmake
In 0cabf4eeb9 ("build: disable implicit fallthrough"), we added
-Wimplicit-fallthrough to configure.py, but forgot to add it to cmake.

Closes #14629
2023-07-12 12:24:22 +03:00
Nadav Har'El
f08bc83cb2 cql-pytest: translate Cassandra's tests for CAST operations
This is a translation of Cassandra's CQL unit test source file
functions/CastFctsTest.java into our cql-pytest framework.

There are 13 tests, 9 of them currently xfail.

The failures are caused by one recently-discovered issue:

Refs #14501: Cannot Cast Counter To Double

and by three previously unknown or undocumented issues:

Refs #14508: SELECT CAST column names should match Cassandra's
Refs #14518: CAST from timestamp to string not same as Cassandra on zero
             milliseconds
Refs #14522: Support CAST function not only in SELECT

Curiously, the careful translation of this test also caused me to
find a bug in Cassandra https://issues.apache.org/jira/browse/CASSANDRA-18647
which the test in Java missed because it made the same mistake as the
implementation.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14528
2023-07-12 11:42:04 +03:00
Nadav Har'El
599636b307 test/alternator: fix flaky test test_ttl_expiration_gsi_lsi
The Alternator test test_ttl.py::test_ttl_expiration_gsi_lsi was flaky.
The test incorrectly assumes that when we write an already expired item,
it will be visible for a short time until being deleted by the TTL thread.
But this doesn't need to be true - if the test is slow enough, it may go
look for the item after it has already expired!

So we fix this test by splitting it into two parts - in the first part
we write a non-expiring item, and notice it eventually appears in the
GSI, LSI, and base-table. Then we write the same item again, with an
expiration time - and now it should eventually disappear from the GSI,
LSI and base-table.

This patch also fixes a small bug which prevented this test from running
on DynamoDB.

Fixes #14495

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14496
2023-07-12 11:23:12 +03:00
Botond Dénes
968421a3e0 Merge 'Stop task manager compaction module properly' from Aleksandra Martyniuk
Due to wrong order of stopping of compaction services, shutdown needs
to wait until all compactions are complete, which may take really long.

Moreover, test version of compaction manager does not abort task manager,
which is strictly bound to it, but stops its compaction module. This results
in tests waiting for compaction task manager's tasks to be unregistered,
which never happens.

Stopping and aborting of compaction manager and task manager's compaction
module are performed in a proper order.

Closes #14461

* github.com:scylladb/scylladb:
  tasks: test: abort task manager when wrapped_compaction_manager is destructed
  compaction: swap compaction manager stopping order
  compaction: modify compaction_manager::stop()
2023-07-12 09:54:00 +03:00
Avi Kivity
118fa59ba8 tools: add cqlsh shortcut
Add bin/cqlsh as a shortcut to tools/cqlsh/bin/cqlsh, intended for
developers.

Closes #14362
2023-07-12 09:36:59 +03:00
Pavel Emelyanov
033e5348aa scylla-gdb: Print all clients from all idx's
The scylla netw command prints clients from [0] index only, but there
are more of them on messaging service. Print all

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #14633
2023-07-12 09:29:02 +03:00
Botond Dénes
c5cb23a825 Merge 'Add scylla table to scylla-gdb' from Pavel Emelyanov
The command is to print interesting and/or hard-to-get-by-hand info about individual tables

Closes #14635

* github.com:scylladb/scylladb:
  test: Add 'scylla table' cmd test
  scylla-gdb: Print table phased barriers
  scylla-gdb: Add 'table' command
2023-07-12 09:26:59 +03:00
Kamil Braun
dc6f6cb6b0 cql_test_env: load host ID from sstables after restart
Performance tests such as `perf-fast-forward` are executed in our CI
environments in two steps (two invocations of the `scylla` process):
first by populating data directories (with `--populate` option), then by
running the actual test.

These tests are using `cql_test_env`, which did not load the previously
saved (in the populate step) Host ID of this node, but generated a new
one randomly instead.

In b39ca97919 we enabled
`consistent_cluster_management` by default. This caused the perf tests
to hang in `setup_group0` at `read_barrier` step. That's because Raft
group 0 was initialized with old configuration -- the one created during
the populate step -- but the Raft server was started with a newly
generated Host ID (which is used as the server's Raft ID), so the server
considered itself as being outside the configuration.

Fix this by reloading the Host ID from disk, simulating more closely the
behavior of main.cc initialization.

Fixes #14599

Closes #14640
2023-07-11 23:30:44 +03:00
Avi Kivity
1545ae2d3b Merge 'Make SSTable cleanup more efficient by fast forwarding to next owned range' from Raphael "Raph" Carvalho
Today, SSTable cleanup skips to the next partition, one at a time, when it finds that the current partition is no longer owned by this node.

That's very inefficient because when a cluster is growing in size, existing nodes lose multiple sequential tokens in their owned ranges. Another inefficiency comes from fetching index pages spanning all unowned tokens, which was described in https://github.com/scylladb/scylladb/issues/14317.

To solve both problems, cleanup will now use multi range reader, to guarantee that it will only process the owned data and as a result skip unowned data. This results in cleanup scanning an owned range and then fast forwarding to the next one, until it's done with them all. This reduces significantly the amount of data in the index caching, as index will only be invoked at each range boundary instead.

Without further ado,

before:

`INFO  2023-07-01 07:10:26,281 [shard 0] compaction - [Cleanup keyspace2.standard1 701af580-17f7-11ee-8b85-a479a1a77573] Cleaned 1 sstables to [./tmp/1/keyspace2/standard1-b490ee20179f11ee9134afb16b3e10fd/me-3g7a_0s8o_06uww24drzrroaodpv-big-Data.db:level=0]. 2GB to 1GB (~50% of original) in 26248ms = 81MB/s. ~9443072 total partitions merged to 4750028.`

after:

`INFO  2023-07-01 07:07:52,354 [shard 0] compaction - [Cleanup keyspace2.standard1 199dff90-17f7-11ee-b592-b4f5d81717b9] Cleaned 1 sstables to [./tmp/1/keyspace2/standard1-b490ee20179f11ee9134afb16b3e10fd/me-3g7a_0s4m_5hehd2rejj8w15d2nt-big-Data.db:level=0]. 2GB to 1GB (~50% of original) in 17424ms = 123MB/s. ~9443072 total partitions merged to 4750028.`

Fixes #12998.
Fixes #14317.

Closes #14469

* github.com:scylladb/scylladb:
  test: Extend cleanup correctness test to cover more cases
  compaction: Make SSTable cleanup more efficient by fast forwarding to next owned range
  sstables: Close SSTable reader if index exhaustion is detected in fast forward call
  sstables: Simplify sstable reader initialization
  compaction: Extend make_sstable_reader() interface to work with mutation_source
  test: Extend sstable partition skipping test to cover fast forward using token
2023-07-11 23:28:15 +03:00
Avi Kivity
9cdae78d04 test: expr_test: add copyright/license
Closes #14613
2023-07-11 21:45:27 +03:00
Raphael S. Carvalho
60ba1d8b47 test: Extend cleanup correctness test to cover more cases
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-07-11 13:56:24 -03:00
Raphael S. Carvalho
8d58ff1be6 compaction: Make SSTable cleanup more efficient by fast forwarding to next owned range
Today, SSTable cleanup skips to the next partition, one at a time, when it finds
that the current partition is no longer owned by this node.

That's very inefficient because when a cluster is growing in size, existing
nodes lose multiple sequential tokens in their owned ranges. Another inefficiency
comes from fetching index pages spanning all unowned tokens, which was described
in #14317.

To solve both problems, cleanup will now use multi range reader, to guarantee
that it will only process the owned data and as a result skip unowned data.
This results in cleanup scanning an owned range and then fast forwarding to the
next one, until it's done with them all. This reduces significantly the amount
of data in the index caching, as index will only be invoked at each range
boundary instead.
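
The difference between the two strategies can be sketched over a sorted list of tokens. This is a toy model, not the reader code: the function names, the inclusive range bounds, and the use of `bisect` in place of an index lookup are all assumptions.

```python
from bisect import bisect_left, bisect_right


def cleanup_per_partition(tokens, owned_ranges):
    """Older approach: look at every partition and test whether it is owned."""
    kept, visited = [], 0
    for token in tokens:
        visited += 1
        if any(lo <= token <= hi for lo, hi in owned_ranges):
            kept.append(token)
    return kept, visited


def cleanup_fast_forward(tokens, owned_ranges):
    """Newer approach: jump to each owned range and read only that slice."""
    kept, visited = [], 0
    for lo, hi in sorted(owned_ranges):
        start = bisect_left(tokens, lo)   # fast forward to the range start
        end = bisect_right(tokens, hi)    # first position past the range end
        visited += end - start
        kept.extend(tokens[start:end])
    return kept, visited


tokens = list(range(100))        # sorted partition tokens (made-up data)
owned = [(10, 19), (50, 59)]     # disjoint, inclusive owned ranges
slow = cleanup_per_partition(tokens, owned)
fast = cleanup_fast_forward(tokens, owned)
```

Both calls keep the same 20 tokens, but the first touches all 100 partitions while the second touches only the 20 that are owned.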

Without further ado,

before:

... 2GB to 1GB (~50% of original) in 26248ms = 81MB/s. ~9443072 total partitions merged to 4750028.

after:

... 2GB to 1GB (~50% of original) in 17424ms = 123MB/s. ~9443072 total partitions merged to 4750028.

Fixes #12998.
Fixes #14317.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-07-11 13:56:24 -03:00
Raphael S. Carvalho
1fefe597e6 sstables: Close SSTable reader if index exhaustion is detected in fast forward call
When wiring multi range reader with cleanup, I found that cleanup
wouldn't be able to release disk space of input SSTables earlier.

The reason is that multi range reader fast forwards to the next range,
therefore it enables mutation_reader::forwarding, and as a result,
combined reader cannot release readers proactively as it cannot tell
for sure that the underlying reader is exhausted. It may have reached
EOS for the current range, but it may have data for the next one.

The concept of EOS actually only applies to the current range being
read. A reader that returned EOS will actually get out of this
state once the combined reader fast forwards to the next range.

Therefore, only the underlying reader, i.e. the sstable reader,
can for certain know that the data source is completely exhausted,
given that tokens are read in monotonically increasing order.

For reversed reads, that's not true but fast forward to range
is not actually supported yet for it.

Today, the SSTable reader already knows that the underlying SSTable
was exhausted in fast_forward_to(), after it calls index_reader's
advance_to(partition_range), therefore it disables subsequent
reads. We can take a step further and also check that the index
was exhausted, i.e. reached EOF.

So if the index is exhausted, and there's no partition to read
after the fast_forward_to() call, we know that there's nothing
left to do in this reader, and therefore the reader can be
closed proactively, allowing the disk space of SSTable to be
reclaimed if it was already deleted.

We can see that the combined reader, under multi range reader,
will incrementally find a set of disjoint SSTables exhausted
as it fast forwards to owned ranges:

1:
INFO  2023-07-05 10:51:09,570 [shard 0] mutation_reader - flat_multi_range_mutation_reader(): fast forwarding to range [{-4525396453480898112, start},{-4525396453480898112, end}]
INFO  2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-1-big-Data.db, start == *end, eof ? true
INFO  2023-07-05 10:51:09,570 [shard 0] sstable - closing reader 0x60100029d800 for /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-1-big-Data.db
INFO  2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-3-big-Data.db, start == *end, eof ? false
INFO  2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-4-big-Data.db, start == *end, eof ? false
INFO  2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-5-big-Data.db, start == *end, eof ? false
INFO  2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-6-big-Data.db, start == *end, eof ? false
INFO  2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-7-big-Data.db, start == *end, eof ? false
INFO  2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-8-big-Data.db, start == *end, eof ? false
INFO  2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-9-big-Data.db, start == *end, eof ? false
INFO  2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-10-big-Data.db, start == *end, eof ? false

2:
INFO  2023-07-05 10:51:09,572 [shard 0] mutation_reader - flat_multi_range_mutation_reader(): fast forwarding to range [{-2253424581619911583, start},{-2253424581619911583, end}]
INFO  2023-07-05 10:51:09,572 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-2-big-Data.db, start == *end, eof ? true
INFO  2023-07-05 10:51:09,572 [shard 0] sstable - closing reader 0x60100029d400 for /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-2-big-Data.db
INFO  2023-07-05 10:51:09,572 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-4-big-Data.db, start == *end, eof ? false
INFO  2023-07-05 10:51:09,572 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-5-big-Data.db, start == *end, eof ? false
INFO  2023-07-05 10:51:09,572 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-6-big-Data.db, start == *end, eof ? false
INFO  2023-07-05 10:51:09,572 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-7-big-Data.db, start == *end, eof ? false
INFO  2023-07-05 10:51:09,572 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-8-big-Data.db, start == *end, eof ? false
INFO  2023-07-05 10:51:09,572 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-9-big-Data.db, start == *end, eof ? false
INFO  2023-07-05 10:51:09,572 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-10-big-Data.db, start == *end, eof ? false

And so on.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-07-11 13:56:24 -03:00
Raphael S. Carvalho
f08a4eaacb sstables: Simplify sstable reader initialization
It's odd that we see things like:

    if (!is_initialized()) {
        return initialize().then([this] {
            if (!is_initialized()) {

    and

    return ensure_initialized().then([this, &pr] {
        if (!is_initialized()) {

One might think initialize will actually initialize the reader by
setting up context, and ensure_initialized() will even have stronger
guarantees, meaning that the reader must be initialized by it.

But none are true.

In the context of single-partition read, it can happen initialize()
will not set up context, meaning is_initialized() returns false,
which is why initialization must be checked even after we call
ensure_initialized().

Let's merge ensure_initialized() and initialize() into a
maybe_initialize() which returns a boolean saying if the reader
is initialized.

It makes the code initializing the reader easier to understand.
2023-07-11 13:56:23 -03:00
Michał Chojnowski
b511d57fc8 Revert "Merge 'Compaction resharding tasks' from Aleksandra Martyniuk"
This reverts commit 2a58b4a39a, reversing
changes made to dd63169077.

After patch 87c8d63b7a,
table_resharding_compaction_task_impl::run() performs the forbidden
action of copying a lw_shared_ptr (_owned_ranges_ptr) on a remote shard,
which is a data race that can cause a use-after-free, typically manifesting
as allocator corruption.

Note: before the bad patch, this was avoided by copying the _contents_ of the
lw_shared_ptr into a new, local lw_shared_ptr.

Fixes #14475
Fixes #14618

Closes #14641
2023-07-11 19:11:37 +03:00
Calle Wilund
e1a52af69e messaging_service: Do TLS init early
Fixes #14299

failure_detector can try sending messages to TLS endpoints before start_listen
has been called (why?). Need TLS initialized before this. So do on service creation.

Closes #14493
2023-07-11 18:19:01 +03:00
Kefu Chai
b4dc3f7cd9 scylla-gdb: add sstable::generation_type printer
to inspect the sstable generation after uuid-based generation
change. in this change:

* a pretty printer for sstable::generation_type is added
* now that the pretty printer for the generation_type is registered,
  we can just leverage it when printing the sstable name, so
  instead of checking if `_generation` member variable contains
  `_value`, we delegate it to `str()`, which is used by
  `str.format()`. as the behavior of `str()` is similar to that of
  the gdb `print` command, and calls `value.format_string()`, which
  in turn calls into `to_string()` if the "value" in question has
  a pretty printer.

after this change, the printer is able to print both the generations
before the uuid change and the ones after the change.

a typical gdb session looks like:

```
(gdb) p generation._value
$5 = f0770b40-1c7c-11ee-b136-bf28f8d18b88
(gdb) p generation
$10 = 3g7g_0bu7_0jpvk2p0mmtlsb8lu0
(gdb) p/x generation._value.least_sig_bits
$7 = 0xb136bf28f8d18b88
(gdb) p/x generation._value.most_sig_bits
$8 = 0xf0770b401c7c11ee
```

if we use `scripts/base36-uuid.py` to encode
the msb and lsb, we'd need to:
```console
scripts/base36-uuid.py -e 0xf0770b401c7c11ee 0xb136bf28f8d18b88
3g7g_0bu7_0jpvk2p0mmtlsb8lu0
```
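
The relationship between the two 64-bit halves and the UUID printed in the gdb session can be checked with Python's standard `uuid` module. This sketch only reassembles the UUID; it does not reproduce the base36 encoding done by `scripts/base36-uuid.py`, and `uuid_from_halves` is a name invented for the example.

```python
import uuid


def uuid_from_halves(most_sig_bits, least_sig_bits):
    """Combine the two 64-bit halves into a single 128-bit UUID."""
    return uuid.UUID(int=(most_sig_bits << 64) | least_sig_bits)


u = uuid_from_halves(0xF0770B401C7C11EE, 0xB136BF28F8D18B88)
print(u)          # f0770b40-1c7c-11ee-b136-bf28f8d18b88
print(u.version)  # 1, i.e. a time-based UUID
```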

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14561
2023-07-11 15:56:20 +03:00
Raphael S. Carvalho
3b1829f0d8 compaction: base compaction throughput on amount of data read
Today, we base compaction throughput on the amount of data written,
but it should be based on the amount of input data compacted
instead, to show the amount of data compaction had to process
during its execution.

A good example is a compaction which expires 99% of data. Today,
throughput would be calculated on the 1% written, which would
mislead the reader into thinking that compaction was terribly
slow.
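
A small worked example makes the gap concrete. The sizes and duration below are made-up numbers chosen for easy arithmetic, and `throughput_mb_per_s` is a hypothetical helper, not the function used by compaction.

```python
def throughput_mb_per_s(nbytes, duration_s):
    """Throughput in MB/s (1 MB = 1,000,000 bytes) for a given byte count."""
    return nbytes / duration_s / 1_000_000


input_bytes = 100_000_000_000   # hypothetical: 100 GB read by compaction
output_bytes = 1_000_000_000    # 99% expired, so only 1 GB is written
duration_s = 1000

written_based = throughput_mb_per_s(output_bytes, duration_s)  # 1.0 MB/s
read_based = throughput_mb_per_s(input_bytes, duration_s)      # 100.0 MB/s
```

The same compaction looks 100 times slower when measured by bytes written, even though it processed the full input.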

Fixes #14533.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14615
2023-07-11 15:48:05 +03:00
Kefu Chai
25f4a7c400 sstables: format using format string
instead of concatenating strings, let's format using the builtin
support of `log::debug()`. for two reasons:

1. better performance, after this change, we don't need to
   materialize the concatenated string, if the "debug" level logging
   is not enabled. seastar::log only formats when a certain log
   level is enabled.
2. better readability. with the format string, it is clear what
   is the fixed part, and which arguments are to be formatted.
   this also helps us to move to compile-time formatting check,
   as fmtlib requires the caller to be explicit when it wants
   to use runtime format string.
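
The first point can be illustrated with Python's standard `logging` module, which also defers formatting until the level is known to be enabled. This is an analogy to the seastar logger, not its implementation; the `Expensive` class and the logger name are invented for the sketch.

```python
import logging


class Expensive:
    """Counts how often it is converted to a string."""
    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return "expensive"


logger = logging.getLogger("sstable_sketch")
logger.setLevel(logging.INFO)   # DEBUG messages are disabled

obj = Expensive()

# Concatenation: the string is built before the logger can check the level.
logger.debug("component: " + str(obj))
eager_calls = Expensive.calls

# Format string: the argument is passed through and never formatted.
logger.debug("component: %s", obj)
lazy_calls = Expensive.calls - eager_calls
```

Here `eager_calls` ends up as 1 and `lazy_calls` as 0: only the concatenating call pays for the conversion.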

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14627
2023-07-11 15:31:20 +03:00
Pavel Emelyanov
5518502085 test: Add 'scylla table' cmd test
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-11 15:12:43 +03:00
Pavel Emelyanov
2c2ad09d3c scylla-gdb: Print table phased barriers
These barriers show if there's any operation in progress (read, write,
flush or stream). These are crucial to know if stopping fails, e.g. see
issue #13100

These barriers are summarized in the 'scylla memory' command, but they are
also good to know on a per-table basis

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-11 15:10:47 +03:00
Pavel Emelyanov
1948b8fa17 scylla-gdb: Add 'table' command
There's 'scylla tables' one that lists tables on the given/current
shard, but the list is unable to show lots of information. It prints the
table address so it can be explored by hand, but some data is more handy
to be parsed and printed with the script

The syntax is

  $ scylla table ks.cf

For now just print the schema version. To be extended in the future.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-11 15:08:55 +03:00
Botond Dénes
bc5174ced6 Merge 'doc: move the package installation instructions to the documentation' from Anna Stuchlik
Refs: https://github.com/scylladb/scylla-docs/issues/4091
Fixes https://github.com/scylladb/scylla-docs/issues/3419

This PR moves the installation instructions from the [website](https://www.scylladb.com/download/) to the documentation. Key changes:
- The instructions are mostly identical, so they were squeezed into one page with different tabs.
- I've merged the info for Ubuntu and Debian, as well as CentOS and RHEL.
- The page uses variables that should be updated each release (at least for now).
- The Java requirement was updated from Java 8 to Java 11 following [this issue](https://github.com/scylladb/scylla-docs/issues/3419).
- In addition, the title of the Unified Installer page has been updated to communicate better about its contents.

Closes #14504

* github.com:scylladb/scylladb:
  doc: update the prerequisites section
  doc: improve the title of Unified Installer page
  doc: move package install instructions to the docs
2023-07-11 14:30:11 +03:00
Kamil Braun
051728318d utils/UUID: reference timeuuid_tri_compare in UUID::operator<=> comment 2023-07-11 13:19:50 +02:00
Avi Kivity
f26e36f448 Update seastar submodule
* seastar 2b7a341210...bac344d584 (3):
  > tls: Export error_category instance used by tls + some common error codes
  > reactor: cast enum to int when formatting it
  > cooking: bump up zlib to 1.2.13
2023-07-11 13:24:32 +03:00
Kamil Braun
5779230d28 group0_state_machine: use correct comparison for timeuuids in merger
In d2a4079bbe, `merger` was modified so
that when we merge a command, `last_group0_state_id` is taken to be the
maximum of the merged command's state_id and the current
`last_group0_state_id`. This is necessary for achieving the same
behavior as if the commands were applied individually instead of being
merged -- where we take the maximum state ID from `group0_history` table
which was applied until now (because the table is sorted using the state
IDs and we take the greatest row).

However, a subtle bug was introduced -- the `std::max` function uses the
`utils::UUID` standard comparison operator which is unfortunately not
the same as timeuuid comparison that Scylla performs when sorting the
`group0_history` table. So in rare cases it could return the *smaller*
of the two timeuuids w.r.t. the correct timeuuid ordering. This would
then lead to commands being applied which should have been turned to
no-ops due to the `prev_state_id` check -- and then, for example,
permanent schema desync or worse.

Fix it by using the correct comparison method.
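
The mismatch described above can be illustrated with a minimal sketch. It assumes the RFC 4122 version 1 layout, where the most significant 64 bits hold `time_low`, `time_mid`, the version and `time_hi` in that order. `uuid_sketch`, `make_v1`, `field_order_less` and `timestamp_less` are hypothetical names for illustration only. They are not Scylla's `utils::UUID` or `timeuuid_tri_compare`, which may differ in signedness and tie-breaking.

```c++
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 128-bit UUID (not utils::UUID).
struct uuid_sketch {
    uint64_t msb; // time_low (32) | time_mid (16) | version (4) | time_hi (12)
    uint64_t lsb; // clock sequence and node, unused by this sketch
};

// Build a version 1 UUID from a 60-bit timestamp.
inline uuid_sketch make_v1(uint64_t timestamp, uint64_t lsb) {
    assert(timestamp < (1ULL << 60)); // v1 timestamps are 60 bits wide
    const uint64_t time_low = timestamp & 0xFFFFFFFFULL;
    const uint64_t time_mid = (timestamp >> 32) & 0xFFFFULL;
    const uint64_t time_hi = (timestamp >> 48) & 0x0FFFULL;
    return uuid_sketch{(time_low << 32) | (time_mid << 16) | (1ULL << 12) | time_hi, lsb};
}

// Reassemble the timestamp from the three scattered fields.
inline uint64_t v1_timestamp(const uuid_sketch& u) {
    const uint64_t time_low = u.msb >> 32;
    const uint64_t time_mid = (u.msb >> 16) & 0xFFFFULL;
    const uint64_t time_hi = u.msb & 0x0FFFULL;
    return (time_hi << 48) | (time_mid << 32) | time_low;
}

// Plain field-by-field ordering: compares time_low first, which is the
// least significant part of the timestamp.
inline bool field_order_less(const uuid_sketch& a, const uuid_sketch& b) {
    return a.msb != b.msb ? a.msb < b.msb : a.lsb < b.lsb;
}

// Ordering by timestamp only (tie-breaking on the remaining bits is omitted).
inline bool timestamp_less(const uuid_sketch& a, const uuid_sketch& b) {
    return v1_timestamp(a) < v1_timestamp(b);
}

inline uuid_sketch max_by_fields(const uuid_sketch& a, const uuid_sketch& b) {
    return std::max(a, b, field_order_less);
}

inline uuid_sketch max_by_timestamp(const uuid_sketch& a, const uuid_sketch& b) {
    return std::max(a, b, timestamp_less);
}
```

With timestamps `0x00000000FFFFFFFF` and `0x0000000100000000`, the later one has a smaller `time_low`, so `max_by_fields` picks the older UUID while `max_by_timestamp` picks the newer one.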

Fixes: #14600
2023-07-11 11:48:02 +02:00
Kamil Braun
5ce802676f utils/UUID: introduce timeuuid_tri_compare for const UUID&
The existing `timeuuid_tri_compare` operates on UUIDs serialized in byte
buffers. Introduce a version which operates directly on the
`utils::UUID` type.

To reuse existing comparison code, we serialize to a buffer before
comparing. But we avoid allocations by using `std::array`. Since the
serialized size needs to be known at compile time for `std::array`, mark
`UUID::serialized_size()` as `constexpr`.
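
As a rough sketch of why the `constexpr` size matters: a compile-time size lets the buffer live in a `std::array` on the stack, with no heap allocation. The type below is a hypothetical stand-in, and the big-endian layout (most significant half first) is an assumption of this sketch, not a statement about `utils::UUID`.

```c++
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 128-bit UUID (not utils::UUID).
struct uuid_sketch {
    uint64_t msb;
    uint64_t lsb;

    // constexpr, so the result can be used as a std::array size.
    static constexpr std::size_t serialized_size() { return 16; }

    // Assumed layout: msb then lsb, each written big-endian.
    void serialize(int8_t* out) const {
        for (int i = 0; i < 8; ++i) {
            out[i] = static_cast<int8_t>(msb >> (56 - 8 * i));
            out[8 + i] = static_cast<int8_t>(lsb >> (56 - 8 * i));
        }
    }
};

// Serialize into a fixed-size stack buffer; no allocation takes place.
inline std::array<int8_t, uuid_sketch::serialized_size()> to_stack_buffer(const uuid_sketch& u) {
    std::array<int8_t, uuid_sketch::serialized_size()> buf{};
    u.serialize(buf.data());
    return buf;
}
```

The resulting buffer could then be handed to a comparison routine that works on `const int8_t*`.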
2023-07-11 11:48:02 +02:00
Kamil Braun
668beedadc utils/UUID: introduce timeuuid_tri_compare for const int8_t*
`timeuuid_tri_compare` takes `bytes_view` parameters and converts them
to `const int8_t*` before comparing.

Extract the part that operates on `const int8_t*` to a separate function
which we will reuse in a later commit.
2023-07-11 11:48:02 +02:00
Kefu Chai
ef78b31b43 s3/client: add tagging ops
with tagging ops, we will be able to attach kv pairs to an object.
this will allow us to mark sstable components with tags, and
filter the components by those tags.

* test/pylib/minio_server.py: enable anonymous user to perform
  more actions. because the tagging related ops are not enabled by
  "mc anonymous set public", we have to enable them using "set-json"
  subcommand.
* utils/s3/client: add methods to manipulate taggings.
* test/boost/s3_test: add a simple test accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14486
2023-07-11 09:30:46 +03:00
Kefu Chai
3b6e37051b build: cmake: add more tests to CMake
to be in-sync with configure.py

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14479
2023-07-11 09:21:26 +03:00
Botond Dénes
37dd2503ff Merge 'replica,sstable: do not assign a value to a shared_ptr' from Kefu Chai
instead of using the operator=(T&&) to assign an instance of `T` to a
shared_ptr, assign a new instance of shared_ptr to it.

unlike std::shared_ptr, seastar::shared_ptr allows us to move a value
into the existing value pointed by shared_ptr with operator=(). the
corresponding change in seastar is
319ae0b530.
but this is a little bit confusing, as a shared_ptr should behave
like a pointer instead of like the value pointed to by it. and this
could be error-prone, because a user could write something like
```c++
p = std::string();
```
by accident, expecting that the value pointed to by `p` is cleared
and that all copies of this shared_ptr observe the change. what
they really want is:
```c++
*p = std::string();
```
both statements compile, but the outcome of the first one is that
`p` releases its old pointee and now points to a new instance of
string at a new address, while the other copies of this shared_ptr
still hold the old value.

this behavior is not expected, so before deprecating and removing
this operator, let's stop using it.
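
The difference between the two statements can be sketched with `std::shared_ptr`, which has no `operator=(T&&)`, so the rebinding is spelled out with `std::make_shared`. This is an analogy under that assumption, not seastar code, and `rebind` and `clear_pointee` are hypothetical helper names.

```c++
#include <memory>
#include <string>

// Rebinds `p` to a freshly allocated empty string.
// Other copies of the pointer keep the old pointee.
inline void rebind(std::shared_ptr<std::string>& p) {
    p = std::make_shared<std::string>();
}

// Writes through the pointer: the shared pointee itself is replaced,
// so every copy of the pointer observes the empty string.
inline void clear_pointee(std::shared_ptr<std::string>& p) {
    *p = std::string();
}
```

After `rebind(p)`, a copy `q` taken earlier still sees the original contents and no longer compares equal to `p`. After `clear_pointee(p)`, both pointers still share one, now empty, string.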

in this change, we update two call sites of
`lw_shared_ptr::operator=(T&&)`. instead of creating a new pointee
in-place through the pointer, a new instance of lw_shared_ptr is
created and assigned to the existing shared_ptr.

Closes #14470

* github.com:scylladb/scylladb:
  sstables: use try_emplace() when appropriate
  replica,sstable: do not assign a value to a shared_ptr
2023-07-11 09:19:48 +03:00
Kefu Chai
0dca0a7f27 build: cmake: include pretty_printers.cc in util
we added pretty_printers.cc back in
83c70ac04f, in which configure.py is
updated. so let's sync the CMake building system accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14442
2023-07-11 09:16:33 +03:00
Pavel Emelyanov
2eebb1312e scylla-gdb: Format IPs with network byte order
The scylla netw command prints connection IPs reversed:

(gdb) scylla netw
Dropped messages: {0, 0, 0, 1, 0 <repeats 15 times>, 1, 0 <repeats 41 times>}
Outgoing connections:
IP: 31.0.142.10, (netw::messaging_service::rpc_protocol_client_wrapper*) 0x600008d6d490:
  stats: {replied = 0, pending = 0, exception_received = 0, sent_messages = 1192, wait_reply = 0, timeout = 0}
  outstanding: 0

It should unpack the address as big-endian, so that the output looks like

(gdb) scylla netw
Dropped messages: {0, 0, 0, 1, 0 <repeats 15 times>, 1, 0 <repeats 41 times>}
Outgoing connections:
IP: 10.142.0.31, (netw::messaging_service::rpc_protocol_client_wrapper*) 0x600008d6d490:
  stats: {replied = 0, pending = 0, exception_received = 0, sent_messages = 1192, wait_reply = 0, timeout = 0}
  outstanding: 0
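
The effect of the byte order can be sketched as follows. The sketch assumes the address is available as a 32-bit integer whose most significant byte is the first octet. The actual fix lives in the scylla-gdb script and is not reproduced here, and both function names are hypothetical.

```c++
#include <cstdint>
#include <string>

// Formats the most significant byte first (big-endian reading of the value).
inline std::string format_ipv4_msb_first(uint32_t addr) {
    return std::to_string((addr >> 24) & 0xFF) + "." +
           std::to_string((addr >> 16) & 0xFF) + "." +
           std::to_string((addr >> 8) & 0xFF) + "." +
           std::to_string(addr & 0xFF);
}

// Formats the least significant byte first, which reverses the octets.
inline std::string format_ipv4_lsb_first(uint32_t addr) {
    return std::to_string(addr & 0xFF) + "." +
           std::to_string((addr >> 8) & 0xFF) + "." +
           std::to_string((addr >> 16) & 0xFF) + "." +
           std::to_string((addr >> 24) & 0xFF);
}
```

For the value `0x0A8E001F`, the first function yields `10.142.0.31` and the second yields `31.0.142.10`, matching the two outputs shown above.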

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #14611
2023-07-11 09:12:12 +03:00
Raphael S. Carvalho
bd50943270 compaction: Extend make_sstable_reader() interface to work with mutation_source
As the goal is to make compaction filter to the next owned range,
make_sstable_reader() should be extended to create a reader with
parameters forwarded from mutation_source interface, which will
be used when wiring cleanup with multi range reader.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-07-10 17:19:30 -03:00