Commit Graph

40037 Commits

Author SHA1 Message Date
Kefu Chai
3a8a3100af raft: add formatter for raft::logical_clock::time_point
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we

* define a formatter for logical_clock::time_point, as fmt does not
  provide formatter for this time_point, as it is not a part of the
  standard library
* remove operator<<() for logical_clock::time_point, as its sole
  purpose is to generate the corresponding fmt::formatter when
  FMT_DEPRECATED_OSTREAM is defined.
* remove operator<<() for logical_clock::duration, as fmt provides
  a default implementation for formatting
  std::chrono::nanoseconds already, which uses `int64_t` as its rep
  template parameter as well.
* include "fmt/chrono.h" so that the source files including this
  header can access the formatter without including it
  themselves. this preserves the existing behavior which we had
  before the removal of "operator<<()".

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16263
2023-12-04 18:32:03 +02:00
Nadav Har'El
4505a86f46 tablets, mv: fix base-view pairing to consider base replication map
In the view update code, the function get_view_natural_endpoint()
determines which view replica this base replica should send an update
to. It currently gets the *view* table's replication map (i.e., the map
from view tokens to lists of replicas holding the token), but assumes
that this is also the *base* table's replication map.

This assumption was true with vnodes, but is no longer true with
tablets - the base table's replication map can be completely different
from the view table's. By looking at the wrong mapping,
get_view_natural_endpoint() can believe that this node isn't really
a base-replica and drop the view update. Alternatively, it can think
it is a base replica - but use the wrong base-view pairing and create
base-view inconsistencies.

This patch solves this bug - get_view_natural_endpoint() now gets two
separate replication maps - the base's and the view's. The callers
need to remember what the base table was (in some cases they didn't
care at the point of the call), and pass it to the function call.
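
The difference between the two behaviours can be sketched as follows. This is a minimal illustration, not the actual get_view_natural_endpoint() code: the function name `view_natural_endpoint`, the node names, and the simplified rule "the i-th base replica pairs with the i-th view replica" are all assumptions made for the example.

```python
from typing import Optional, Sequence


def view_natural_endpoint(
    me: str,
    base_replicas: Sequence[str],
    view_replicas: Sequence[str],
) -> Optional[str]:
    """Return the view replica paired with base replica `me`, or None.

    Simplified pairing rule (an assumption for illustration): the i-th
    base replica is paired with the i-th view replica.
    """
    if me not in base_replicas:
        # This node does not own the base token, so it sends nothing.
        return None
    index = base_replicas.index(me)
    if index >= len(view_replicas):
        return None
    return view_replicas[index]


# Hypothetical tablet layout with RF=1: base and view rows live on different nodes.
base_map = ["node3"]
view_map = ["node5"]

# Old behaviour: the view's map is used for both arguments, so node3
# concludes it is not a base replica and the update is dropped.
dropped = view_natural_endpoint("node3", view_map, view_map)

# New behaviour: each table contributes its own replication map.
paired = view_natural_endpoint("node3", base_map, view_map)
```

With vnodes both arguments would be the same list, which is why the old code happened to work there.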

This patch also includes a simple test that reproduces the bug, and
confirms it is fixed: The test has a 6-node cluster using tablets
and a base table with RF=1, and writes one row to it. Before this
patch, the code usually gets confused, thinking the base replica
isn't a replica and loses the view update. With this patch, the
view update works.

Fixes #16227.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16228
2023-12-04 16:38:54 +02:00
Avi Kivity
60af2f3cb2 Merge 'New commitlog file format using tagged pages' from Calle Wilund
Prototype implementation of format suggested/requested by @avikivity:

Divides segments into disk-write-alignment sized pages, each tagged with segment ID + CRC of data content.
When read, we both verify sector integrity (CRC) to detect corruption, as well as matching ID read with expected one.

If the latter mismatches we have a prematurely terminated segment (read truncation), which, depending on whether the CL is
written in batch or periodic mode, as well as explicit sync, can mean data loss.

Note: all-zero pages are treated as kosher, both to align with newly allocated segments, as well as fully terminated (zero-page) ones.
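
A rough sketch of the page tagging idea, under stated assumptions: the 4096-byte page size, the tag layout (64-bit segment ID followed by a 32-bit CRC at the end of each page), and the function names are hypothetical and do not describe the real on-disk format.

```python
import struct
import zlib

PAGE_SIZE = 4096                    # assumed disk write alignment
TAG = struct.Struct("<QI")          # assumed tag: 64-bit segment ID + 32-bit CRC
PAYLOAD_SIZE = PAGE_SIZE - TAG.size


def write_pages(segment_id: int, data: bytes) -> bytes:
    """Split data into pages, each ending with a (segment ID, CRC) tag."""
    out = bytearray()
    for off in range(0, len(data), PAYLOAD_SIZE):
        payload = data[off:off + PAYLOAD_SIZE].ljust(PAYLOAD_SIZE, b"\0")
        out += payload + TAG.pack(segment_id, zlib.crc32(payload))
    return bytes(out)


def read_pages(segment_id: int, raw: bytes) -> bytes:
    """Return the payload up to the first all-zero or foreign page.

    `raw` is expected to be a whole number of pages.
    """
    out = bytearray()
    for off in range(0, len(raw), PAGE_SIZE):
        page = raw[off:off + PAGE_SIZE]
        if page == bytes(len(page)):
            break  # all-zero page: newly allocated or cleanly terminated
        payload, tag = page[:PAYLOAD_SIZE], page[PAYLOAD_SIZE:]
        page_id, crc = TAG.unpack(tag)
        if zlib.crc32(payload) != crc:
            raise ValueError(f"corrupt page at offset {off}")
        if page_id != segment_id:
            break  # page from an earlier use of the file: truncated segment
        out += payload
    return bytes(out)
```

Checking the CRC before the ID lets the reader tell corruption (an error) apart from truncation (an early but clean stop), which is the distinction the description draws.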

Note: This is a preview/RFC - the rest of the file format is not modified. At least parts of entry CRC could probably be removed, but I have not done so yet (needs some thinking).

Note: Some slight abstraction breaks in impl. and probably less than maximal efficiency.

v2:
* Removed entry CRC:s in file format.
* Added docs on format v3
* Added one more test for recycling-truncation

v3:
* Fixed typos in size calc and docs
* Changed sect metadata order
* Explicit iter type

Closes scylladb/scylladb#15494

* github.com:scylladb/scylladb:
  commitlog_test: Add test for replaying large-ish mutation
  commitlog_test: Add additional test for segmnent truncation
  docs: Add docs on commitlog format 3
  commitlog: Remove entry CRC from file format
  commitlog: Implement new format using CRC:ed sectors
  commitlog: Add iterator adaptor for doing buffer splitting into sub-page ranges
  fragmented_temporary_buffer: Add const iterator access to underlying buffers
  commitlog_replayer: differentiate between truncated file and corrupt entries
2023-12-04 13:31:13 +01:00
Avi Kivity
8fa2e3ad2a Merge 'Remove sstables::remove_by_toc_name()' from Pavel Emelyanov
The helper in question complicates the logic of sstable_directory::process() by making garbage collection work differently for sstables deleted "atomically" and those deleted "one-by-one". Also, the code that deletes sstables one-by-one and uses remove_by_toc_name() reads the TOC file needlessly, because there is an sstable object at hand and it has all_components() ready for use.

Surprisingly, there was no test for the deletion-log functionality. This PR adds one. The test passes before the g.c. and regular unlink fix, and (of course) continues passing after it.

Closes scylladb/scylladb#16240

* github.com:scylladb/scylladb:
  sstables: Drop remove_by_name()
  sstables/fs_storage: Wipe by recognized+unrecognized components
  sstable_directory: Enlight deletion log replay
  sstables: Split remove_by_toc_name()
  test: Add test case to validate deletion log work
  sstable_directory: Close dir on exception
  sstable_directory: Fix indentation after previous patch
  sstable_directory: Coroutinize delete_with_pending_deletion_log()
  test: Sstable on_delete() is not necessarily in a thread
  sstable_directory: Split delete_with_pending_deletion_log()
2023-12-03 17:29:34 +02:00
Wojciech Mitros
a8c9451fb2 commitlog: add max disk size api
Currently, the max size of commitlog is obtained either from the
config parameter commitlog_total_space_in_mb or, when the parameter
is -1, from the total memory allocated for Scylla.
To facilitate testing of the behavior of commitlog hard limit,
expose the value of commitlog max_disk_size in a dedicated API.

Closes scylladb/scylladb#16020
2023-12-03 17:16:58 +02:00
Kefu Chai
39b2ee9751 dist/redhat: avoid mixed use of spaces and tabs
rpmlint complains about "mixed-use-of-spaces-and-tabs", and it
does not look good in the editor. so let's replace tabs with spaces.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16246
2023-12-03 17:11:03 +02:00
Nadav Har'El
59ff27ea4a Merge 'Typos: fix typos in comments' from Yaniv Kaul
Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255

Closes scylladb/scylladb#16257

* github.com:scylladb/scylladb:
  Update service/topology_state_machine.hh
  Update raft/tracker.hh
  Update db/view/view.cc
  Typos: fix typos in comments
2023-12-03 11:23:51 +02:00
Yaniv Kaul
030d421931 Update service/topology_state_machine.hh 2023-12-03 10:08:11 +02:00
Yaniv Kaul
7c4b742583 Update raft/tracker.hh 2023-12-03 10:07:55 +02:00
Yaniv Kaul
2b73793a39 Update db/view/view.cc 2023-12-03 10:07:45 +02:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Kamil Braun
01e54f5b12 Merge 'test: delete topology_raft_disabled suite' from Patryk Jędrzejczak
This PR is a necessary step to fix #15854 -- making consistent
cluster management mandatory on master.

Before making consistent cluster management mandatory, we have
to get rid of all tests that depend on the
`consistent_cluster_management=false` config. These are the tests
in the `topology_raft_disabled` suite.

There's the internal Raft upgrade procedure, which is the bulk of the
upgrade logic. Then, there are two thin "layers" around it that
invoke it underneath: recovery procedure and
enable-raft-in-the-cluster procedure. We're getting rid of the
second one by making Raft always enabled, so we naturally have to
get rid of tests that depend on it. The idea is to replace every
necessary enable-raft-in-the-cluster procedure in these tests with
the recovery procedure. Then, we will still be testing the internal
Raft upgrade procedure in the in-tree tests. The
enable-raft-in-the-cluster procedure is already tested by QA tests,
so we don't need to worry about these changes.

Unfortunately, we cannot adapt `test_raft_upgrade_no_schema`.
After making consistent cluster management mandatory on master,
schema commitlog will also become mandatory because
`consistent_cluster_management: True`,
`force_schema_commit_log: False`
is considered a bad configuration. These changes will make
`test_raft_upgrade_no_schema` unimplementable in the Scylla repo.
Therefore, we remove this test. If we want to keep it, we must
rewrite it as an upgrade dtest.

After making all tests in `topology_raft_disabled` use consistent
cluster management, there is no point in keeping this suite.
Therefore, we delete it and move all the tests to `topology_custom`.

Closes scylladb/scylladb#16192

* github.com:scylladb/scylladb:
  test: delete topology_raft_disabled suite
  test: topology_raft_disabled: move tests to topology_custom suite
  test: topology_raft_disabled: move utils to topology suite
  test: topology_raft_disabled: use consistent cluster management
  test: topology_raft_disabled: add new util functions
  test: topology_raft_disabled: delete test_raft_upgrade_no_schema
2023-12-01 17:11:32 +01:00
Pavel Emelyanov
17fd558df8 sstables: Drop remove_by_name()
It was used by deletion log replay and by storage wipe, now it's unused

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 18:20:20 +03:00
Pavel Emelyanov
4405a625f6 sstables/fs_storage: Wipe by recognized+unrecognized components
Currently wiping fs-backed sstable happens via reading and parsing its
TOC file back. Then the three-step process goes:

- move TOC -> TOC.tmp
- remove components (obtained from TOC.tmp)
- remove TOC.tmp

However, wiping sstable happens in one of two cases -- the sstable was
loaded from the TOC file _or_ sstable had evaluated the needed
components and generated TOC file. With that, the 2nd step can be made
without reading the TOC file, just by looking at all components sitting
on the sstable
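
The three-step wipe driven by an in-memory component list might look like the sketch below. The helper name `wipe_sstable`, the `me-1-big-` file prefix, and the `TOC.txt` / `TOC.txt.tmp` names are assumptions made for illustration.

```python
import os
import tempfile
from typing import Iterable


def wipe_sstable(directory: str, prefix: str, components: Iterable[str]) -> None:
    """Remove an sstable using the component list already held in memory."""
    toc = os.path.join(directory, prefix + "TOC.txt")
    toc_tmp = toc + ".tmp"
    os.rename(toc, toc_tmp)                    # 1. move TOC -> TOC.tmp
    for name in components:                    # 2. remove components, no TOC parsing
        if name == "TOC.txt":
            continue
        path = os.path.join(directory, prefix + name)
        if os.path.exists(path):
            os.remove(path)
    os.remove(toc_tmp)                         # 3. remove TOC.tmp


with tempfile.TemporaryDirectory() as d:
    names = ["TOC.txt", "Data.db", "Index.db"]
    for n in names:
        open(os.path.join(d, "me-1-big-" + n), "w").close()
    wipe_sstable(d, "me-1-big-", names)
    leftover = os.listdir(d)   # empty: every file of the sstable is gone
```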

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 18:20:20 +03:00
Pavel Emelyanov
de931702ec sstable_directory: Enlight deletion log replay
Garbage collection of sstables is scattered between two stages -- g.c.
per-se and the regular processing.

The former stage collects deletion logs and for each log found goes
ahead and deletes the full sstable with the standard sequence:

- move TOC -> TOC.tmp
- remove components
- remove TOC.tmp

The latter stage picks up partially unlinked sstables that didn't go via
atomic deletion with the log. This comes as

- collect all components
  - keep TOC's and TOC.tmp's in separate lists
  - attach other components to TOC/TOC.tmp by generation value
- for all TOC.tmp's get all attached components and remove them
- continue loading TOC's with attached components

That said, replaying the deletion log can be as light as just the first step
out of the above sequence -- just move TOC to TOC.tmp. After that the
regular processing would pick the remaining components and clean them
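
A small sketch of how the regular processing stage could classify a directory listing once a deletion log has been replayed by renaming alone. The function `plan_processing` and the `<version>-<generation>-<format>-<component>` file naming are assumptions for the example, not the real sstable_directory code.

```python
from collections import defaultdict


def plan_processing(filenames):
    """Split a directory listing into sstables to load and files to remove."""
    attached = defaultdict(list)   # generation -> non-TOC component files
    tocs, tmp_tocs = {}, {}        # generation -> TOC / TOC.tmp file name
    for name in filenames:
        # Assumed naming: <version>-<generation>-<format>-<component>
        _version, generation, _format, component = name.split("-", 3)
        if component == "TOC.txt":
            tocs[generation] = name
        elif component == "TOC.txt.tmp":
            tmp_tocs[generation] = name
        else:
            attached[generation].append(name)
    to_remove = []
    for generation in sorted(tmp_tocs):
        to_remove += sorted(attached[generation]) + [tmp_tocs[generation]]
    to_load = {g: sorted(attached[g]) for g in sorted(tocs)}
    return to_load, to_remove


listing = ["me-1-big-TOC.txt", "me-1-big-Data.db",
           "me-2-big-TOC.txt", "me-2-big-Data.db", "me-2-big-Index.db"]
# Replaying a deletion log that names generation 2 only renames its TOC.
replayed = [n + ".tmp" if n == "me-2-big-TOC.txt" else n for n in listing]
to_load, to_remove = plan_processing(replayed)
```

After the rename, generation 2 is no longer a candidate for loading and all of its files end up on the removal list, so the replay itself needs no component handling.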

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 18:20:20 +03:00
Pavel Emelyanov
5ff5946520 sstables: Split remove_by_toc_name()
The helper consists of three phases:

- move TOC -> TOC.tmp
- remove components listed in TOC
- remove TOC.tmp

The first step is needed separately by the next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 18:20:20 +03:00
Pavel Emelyanov
b10ca96e07 test: Add test case to validate deletion log work
The test sequence is

- create several sstables
- create deletion log for a sub-set of them
- partially unlink smaller sub-sub-set
- make sstable directory do the processing with g.c.
- check that the sstables loaded do NOT include the deleted ones

The .throw_on_missing_toc bit set additionally validates that the
directory doesn't contain garbage not attached to any other TOCs

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 18:20:20 +03:00
Pavel Emelyanov
fcf080b63b sstable_directory: Close dir on exception
When committing the deletion log creation its containing directory is
sync-ed via opened file. This place is not exception safe and directory
can be left unclosed

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 15:00:38 +03:00
Pavel Emelyanov
bb167dcca5 sstable_directory: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 15:00:38 +03:00
Pavel Emelyanov
28b1289d4b sstable_directory: Coroutinize delete_with_pending_deletion_log()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 15:00:38 +03:00
Pavel Emelyanov
92f0aa04d0 test: Sstable on_delete() is not necessarily in a thread
One of the test cases injects an observer into sstable->unlink() method
via its _on_delete() callback. The test's callback assumes that it runs
in an async context, but it's a happy coincidence, because deletion via
the deletion log runs so. Next patch is changing it and the test case
will no longer work. But since it's a test case it can just directly
call a libc function for its needs

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 15:00:38 +03:00
Pavel Emelyanov
ed043e5762 sstable_directory: Split delete_with_pending_deletion_log()
The helper consists of three parts -- prepare the deletion log, unlink
sstables and drop the deletion log. For testing the first part is needed
as a separate step, so here's this split.

It renders two nested async contexts, but it will change soon.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 15:00:37 +03:00
Nadav Har'El
bae6f3387f CODEOWNERS: remove some entries
The ".github/CODEOWNERS" is used by github to recommend reviewers for
pull requests depending on the directories touched in the pull request.
Github ignores entries in that file for people who are not **maintainers**. Since
Jan is no longer a Scylla maintainer, I remove his entries in the list.

Additionally, I am removing *myself* from *some* of the directories.
For many years, it was an (unwritten) policy that experienced Scylla
developers are expected to help in reviewing pieces of the code they
are familiar with - even if they no longer work on that code today.
But as ScyllaDB the company grew, this is no longer true; The policy
is now that experienced developers are requested to review only code in
their own or their team's area of responsibility - experienced developers
should help review *designs* of other parts, but not the actual code.
For this reason I'm removing my name from various directories.
I can still help review such code if asked specifically - but I will no
longer be the "default" reviewer for such code.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16239
2023-11-30 20:29:05 +02:00
Tomasz Grabiec
c64ae7b733 scripts: Introduce tablet-mon.py
Closes scylladb/scylladb#15512
2023-11-30 19:15:36 +02:00
Nadav Har'El
49860952f9 Merge 'LIST EFFECTIVE SERVICE LEVEL statement' from Michał Jadwiszczak
Add `LIST EFFECTIVE SERVICE LEVEL` statement to display which service level each effective service level option comes from.

Example:
There are 2 roles: role1 and role2. Role1 is assigned with sl1 (timeout = 2s, workload_type = interactive) and role2 is assigned with sl2 (timeout = 10s, workload_type = batch).
Then, if we grant role1 to role2, the user with role2 will have 2s timeout (from sl1) and batch workload type (from sl2).

```
> LIST EFFECTIVE SERVICE LEVEL OF role2;

 service_level_option | effective_service_level | value
----------------------+-------------------------+-------------
        workload_type |                     sl2 |       batch
              timeout |                     sl1 |          2s
```
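
The example can be reproduced with a short sketch. The merge rules used here (the smallest timeout wins, and `batch` takes precedence over `interactive`) are inferred from the example output only and are an assumption, as are the `ServiceLevel` class and the `effective_service_level` function.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ServiceLevel:
    name: str
    timeout_s: Optional[float] = None
    workload_type: Optional[str] = None


# Assumed precedence, consistent with the example: batch beats interactive.
WORKLOAD_PRIORITY = {"interactive": 1, "batch": 2}


def effective_service_level(
    levels: Iterable[ServiceLevel],
) -> Dict[str, Tuple[str, object]]:
    """Map each option to (service level it comes from, effective value)."""
    result: Dict[str, Tuple[str, object]] = {}
    for sl in levels:
        if sl.timeout_s is not None:
            current = result.get("timeout")
            if current is None or sl.timeout_s < current[1]:
                result["timeout"] = (sl.name, sl.timeout_s)
        if sl.workload_type is not None:
            current = result.get("workload_type")
            if (current is None or
                    WORKLOAD_PRIORITY[sl.workload_type] > WORKLOAD_PRIORITY[current[1]]):
                result["workload_type"] = (sl.name, sl.workload_type)
    return result


sl1 = ServiceLevel("sl1", timeout_s=2, workload_type="interactive")
sl2 = ServiceLevel("sl2", timeout_s=10, workload_type="batch")
# role2 has sl2 directly and inherits sl1 through role1.
effective = effective_service_level([sl2, sl1])
```

Keeping the originating service level name next to each value is what allows a per-option report like the table above.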

Fixes: https://github.com/scylladb/scylladb/issues/15604

Closes scylladb/scylladb#14431

* github.com:scylladb/scylladb:
  cql-pytest: add `LIST EFFECTIVE SERVICE LEVEL OF` test
  docs: add `LIST EFFECTIVE SERVICE LEVEL` statement docs
  cql3:statements: add `LIST EFFECTIVE SERVICE LEVEL` statement
  service:qos: add option to include effective names to SLO
2023-11-30 18:12:52 +02:00
Patryk Jędrzejczak
77c4ee92e5 test: delete topology_raft_disabled suite
After moving all tests out of topology_raft_disabled, we can safely
remove this suite.
2023-11-30 15:50:22 +01:00
Patryk Jędrzejczak
ba990d90bb test: topology_raft_disabled: move tests to topology_custom suite
We move the remaining tests in topology_raft_disabled to
topology_custom. We choose topology_custom because these tests
cannot use consistent topology changes.

We need to modify these tests a bit because we cannot pass
RandomTables to a test case function if the initial cluster size
equals 0. RandomTables.__init__ requires manager.cql to be present.
2023-11-30 15:50:22 +01:00
Patryk Jędrzejczak
659ac9c7f5 test: topology_raft_disabled: move utils to topology suite
We move all used util functions from topology_raft_disabled to
topology before we remove topology_raft_disabled. After this
change, util.py in topology will be the single util file for all
topology tests.

Some util functions in topology_raft_disabled aren't used anymore.
We don't move such functions and remove them instead.
2023-11-30 15:50:22 +01:00
Patryk Jędrzejczak
684b070b20 test: topology_raft_disabled: use consistent cluster management
Soon, we will make consistent cluster management mandatory on
master. Before this, we have to change all tests in the
topology_raft_disabled suite so that they do not depend on the
consistent_cluster_management=false config.

Adapting test_raft_upgrade_majority_loss is simple. We only have
to get rid of the initial upgrade. This initial upgrade didn't
test anything. Every test in topology_raft_disabled had to do it
at the beginning because of consistent_cluster_management=false.

Adapting test_raft_upgrade_basic and test_raft_upgrade_stuck is
more difficult. It requires changing the initial upgrade to
clearing Raft data in RECOVERY mode on all servers and restarting
them. Then, the servers will run the same upgrade procedure as
before.

After changing the tests, we also update their names appropriately.

test_raft_upgrade_stuck becomes a bit slower, so we remove the
comment about running time. Also, one TODO was fixed in the process
of rewriting the test. This fix forced us to skip the test in the
release mode since we cannot update the list of error injections
through manager.server_update_config in this mode.
2023-11-30 15:50:22 +01:00
Patryk Jędrzejczak
1059fece19 test: topology_raft_disabled: add new util functions
They are shorter and more readable than long CQL queries. We use
them even more in the following commit.
2023-11-30 15:50:22 +01:00
Patryk Jędrzejczak
7e43ebf88e test: topology_raft_disabled: delete test_raft_upgrade_no_schema
After making consistent cluster management mandatory on master,
schema commitlog will also become mandatory because
consistent_cluster_management: True,
force_schema_commit_log: False
is considered a bad configuration. These changes will make
test_raft_upgrade_no_schema unimplementable in the Scylla repo, so
we remove it.

If we want to keep this test, we must rewrite it as an upgrade
dtest.
2023-11-30 15:50:21 +01:00
Kefu Chai
7a1fbb38f9 sstable: order uuid-based generation as timeuuid
under most circumstances, we don't care about the ordering of the sstable
identifiers, as they are just identifiers. so, as long as they can be
compared, we are good. but we have tests which expect that the sstables
can be ordered by the time they are created. for instance,
sstable_run_based_compaction_test has this expectation.

before this change, we compare two UUID-based generations by their
(MSB, LSB) lexicographically. but UUID v1 puts the lower bits of
the timestamp at the higher bits of MSB, so the ordering of the
"time" in timeuuid is not preserved when comparing the UUID-based
generations. this breaks the test of sstable_run_based_compaction_test,
which feeds the sstables to be compacted in a set, and the set is
ordered with the generation of the sstables.

after this change, we consider the UUID-based generation as
a timeuuid when comparing them.
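
The ordering problem is easy to reproduce with Python's standard `uuid` module. This is only an illustration of the UUID v1 bit layout; the helper names `make_v1`, `msb_lsb_key` and `timeuuid_key` are made up for the example and the node value is arbitrary.

```python
import uuid

LOW64 = (1 << 64) - 1


def make_v1(timestamp: int, node: int = 0x0123456789AB) -> uuid.UUID:
    """Build a version 1 UUID carrying the given 60-bit timestamp."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, node))


def msb_lsb_key(u: uuid.UUID):
    # Old comparison: the two 64-bit halves, compared lexicographically.
    return (u.int >> 64, u.int & LOW64)


def timeuuid_key(u: uuid.UUID):
    # New comparison: the embedded timestamp first, remaining bits as tie-breaker.
    return (u.time, u.int & LOW64)


older = make_v1(0xFFFFFFFF)    # all timestamp bits sit in time_low
newer = make_v1(0x100000000)   # one tick later: time_low wraps to 0

by_halves = sorted([older, newer], key=msb_lsb_key)
by_time = sorted([older, newer], key=timeuuid_key)
```

Because `time_low` occupies the most significant bits of the UUID, `by_halves` puts `newer` first, while `by_time` keeps creation order.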

Fixes #16215
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16238
2023-11-30 14:50:44 +02:00
Michał Jadwiszczak
e3515cfc1b cql-pytest: add LIST EFFECTIVE SERVICE LEVEL OF test 2023-11-30 13:07:20 +01:00
Michał Jadwiszczak
e1d86f9afb docs: add LIST EFFECTIVE SERVICE LEVEL statement docs 2023-11-30 13:07:20 +01:00
Michał Jadwiszczak
2438965b6a cql3:statements: add LIST EFFECTIVE SERVICE LEVEL statement
Add statement to print effective service level of a specified role.
2023-11-30 13:07:20 +01:00
Michał Jadwiszczak
1b08338fe7 service:qos: add option to include effective names to SLO
Allow including `slo_effective_names` in `service_level_options`
to be able to determine which service level a specific option value comes from.
2023-11-30 13:07:20 +01:00
Yaron Kaikov
7ce6962141 build_docker.sh: Upgrade package during creation and remove sshd service
When scanning our latest docker image using `trivy` (command: `trivy
image docker.io/scylladb/scylla-nightly:latest`), it shows we have OS
packages which are out of date.

Also removing `openssh-server` and `openssh-client` since we don't use
them in our docker images

Fixes: https://github.com/scylladb/scylladb/issues/16222

Closes scylladb/scylladb#16224
2023-11-30 14:00:15 +02:00
Botond Dénes
d6d9751dd8 tools/scylla-sstable: validate,validate-checksums: print JSON last
Said commands print errors as they validate the sstables. Currently this
intermingles with the regular JSON output of these commands, resulting
in ugly and confusing output.
This is not a problem for scripted use, as logs go to stderr while the
JSON goes to stdout, but it is a problem for human users.
Solve this by outputting the JSON into a std::stringstream and printing
it in one go at the very end. This means JSON is accumulated in a memory
buffer, but these commands don't output a lot of JSON, so this shouldn't
be a problem.
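
The buffering approach, sketched in Python rather than with std::stringstream. The `validate` function, its `{name: is_valid}` input and the JSON shape are hypothetical and only stand in for the real commands.

```python
import io
import json
import sys


def validate(sstables, out=None, err=None):
    """Report errors immediately, emit the JSON document once at the end."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    buffer = io.StringIO()          # JSON accumulates here instead of on stdout
    buffer.write("{")
    for i, (name, is_valid) in enumerate(sstables.items()):
        if not is_valid:
            print(f"error: {name} failed validation", file=err)
        if i:
            buffer.write(", ")
        buffer.write(f"{json.dumps(name)}: {json.dumps({'valid': is_valid})}")
    buffer.write("}")
    out.write(buffer.getvalue() + "\n")   # printed in one go at the very end


captured_out, captured_err = io.StringIO(), io.StringIO()
validate({"a-Data.db": True, "b-Data.db": False}, captured_out, captured_err)
```

On a terminal, where both streams are interleaved, all error lines now appear before a single contiguous JSON document.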

Closes scylladb/scylladb#16216
2023-11-30 09:53:47 +03:00
Piotr Smaroń
5fd30578d7 config: introduce value_status::Deprecated
Current mechanism to deprecate config options is implemented in a hacky
way in `main.cpp` and doesn't account for existing
`db::config/boost::po` API controlling lifetime of config options, hence
it's being replaced in this PR by adding yet another `value_status`
enumerator: `Deprecated`, so that deprecation of config options is
controlled in one place in `config.cc`, i.e. when specifying config options.
Motivation: https://docs.google.com/document/d/18urPG7qeb7z7WPpMYI2V_lCOkM5YGKsEU78SDJmt8bM/edit?usp=sharing

With this change, if a `Deprecated` config option is specified as
1. a command line parameter, scylla will run and log:
```
WARN  2023-11-25 23:37:22,623 [shard 0:main] init - background-writer-scheduling-quota option ignored (deprecated)
```
(Previously it was only a message printed to standard output, not a
scylla log of warn level).
2. an option in `scylla.yaml`, scylla will run and log:
```
WARN  2023-11-27 23:55:13,534 [shard 0:main] init - Option is deprecated : background_writer_scheduling_quota
```
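
The idea of keeping the deprecation status next to the option declaration can be sketched like this. The `ValueStatus` enumerators other than `Deprecated`, the `OPTIONS` registry and the `deprecation_warnings` helper are assumptions for illustration; only the two message formats are taken from the logs above.

```python
import enum


class ValueStatus(enum.Enum):
    USED = "Used"
    UNUSED = "Unused"
    INVALID = "Invalid"
    DEPRECATED = "Deprecated"


# Hypothetical registry standing in for the option declarations in config.cc.
OPTIONS = {
    "commitlog_total_space_in_mb": ValueStatus.USED,
    "background_writer_scheduling_quota": ValueStatus.DEPRECATED,
}


def deprecation_warnings(cli_options, yaml_options):
    """Return the warnings to log for deprecated options that were specified."""
    warnings = []
    for name in cli_options:
        # Command line options use dashes, YAML keys use underscores.
        if OPTIONS.get(name.replace("-", "_")) is ValueStatus.DEPRECATED:
            warnings.append(f"{name} option ignored (deprecated)")
    for name in yaml_options:
        if OPTIONS.get(name) is ValueStatus.DEPRECATED:
            warnings.append(f"Option is deprecated : {name}")
    return warnings


messages = deprecation_warnings(
    ["background-writer-scheduling-quota"],
    ["background_writer_scheduling_quota", "commitlog_total_space_in_mb"],
)
```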

Fixes #15887
Incorporates dropped https://github.com/scylladb/scylladb/pull/15928

Closes scylladb/scylladb#16184
2023-11-30 08:52:57 +03:00
Avi Kivity
8e9d3af431 Merge 'Commitlog: complete prerequisites and enforce hard limit by default' from Eliran Sinvani
This miniset completes the prerequisites for enabling the commitlog hard limit by default.
Namely, start flushing and evacuating segments halfway to the limit in order to never hit it under normal circumstances.
It is worth mentioning that hitting the limit is an exceptional condition whose root cause needs to be resolved; however,
once we do hit the limit, the performance impact inflicted by this enforcement is irrelevant.

Tests: unit tests.
          LWT write test (#9331)
Whitebox testing has been performed by @wmitros. The test aimed at putting as much pressure as possible on the commitlog segments by using a write pattern that rewrites the partitions in the memtable, keeping it at ~85% occupancy so that the dirty memory manager will not kick in. The test compared 3 configurations:
1. The default configuration
2. Hard limit on (without changing the flush threshold)
3. the changes in this PR applied.
The last exhibited the "best" behavior in terms of metrics: its graphs were the flattest and less jagged than the others.

Closes scylladb/scylladb#10974

* github.com:scylladb/scylladb:
  commitlog: enforce commitlog size hard limit by default
  commitlog: set flush threshold to half of the  limit size
  commitlog: unfold flush threshold assignment
2023-11-29 20:55:53 +02:00
Kamil Braun
8a14839a00 Merge 'handle more failures during topology operations' from Gleb
This series adds handling for more failures during a topology operation
(we already handle a failure during streaming). Here we add handling of
tablet draining errors by aborting the operation and handling of errors
after streaming where an operation cannot be aborted any longer. If the
error happens when rollback is no longer possible we wait for ring delay
and proceed to the next step. Each individual patch that adds the sleep
has an explanation what the consequences of the patch are.

* 'gleb/topology-coordinator-failures' of github.com:scylladb/scylla-dev:
  test: add test to check errro handling during tablet draining
  test: fix test_topology_streaming_failure test to not grep the whole file
  storage_service: add error injection into the tablet migration code
  storage_service: topology coordinator: rollback on handle_tablet_migration failure during tablet_draining stage
  storage_service: topology coordinator: do not retry the metadata barrier forever in write_both_read_new state
  storage_service: topology coordinator: do not retry the metadata barrier forever in left_token_ring state
  storage_service: topology coordinator: return a node that is being removed from get_excluded_nodes
  storage_service: topology_coordinator: use new rollback_to_normal state in the rollback procedure
  storage_service: topology coordinator: add rollback_to_normal node state
  storage_service: topology coordinator: put fence version into the raft state
  storage_service: topology coordinator: do fencing even if draining failed
2023-11-29 19:02:35 +01:00
Avi Kivity
cccd2e7fa7 Merge 'Generalize sstables TOC file reading' from Pavel Emelyanov
TOC file is read and parsed in several places in the code. All do it differently, and it's worth generalizing this place.
To make it happen also fix the S3 readable_file so that it could be used inside file_input_stream.

Closes scylladb/scylladb#16175

* github.com:scylladb/scylladb:
  sstable: Generalize toc file read and parse
  s3/client: Don't GET object contents on out-of-bound reads
  s3/client: Cache stats on readable_file
2023-11-29 19:18:31 +02:00
Nadav Har'El
62f89d49e5 tablets, mv: fix on_internal_error on write to base table
The situation before this patch is that when tablets are enabled for
a keyspace, we can create a materialized view but later any write to
the base table fails with an on_internal_error(), saying that:

     "Tried to obtain per-keyspace effective replication map of test
      but it's per-table."

Indeed, with tablets, the replication is different for each table - it's
not the same for the entire keyspace.

So this patch changes the view update code to take the replication
map from the specific base table, not the keyspace.

This is good enough to get materialized-views reads and writes working
in a simple single-node case, as the included test demonstrates (the
test fails with on_internal_error() before this patch, and passes
afterwards).

But this fix is not perfect - the base-view pairing code really needs
to consider not only the base table's replication map, but also the
view table's replication map - as those can be different. We'll fix
this remaining problem as a followup in a separate patch - it will
require a substantially more elaborate test to reproduce the need
for the different mapping and to verify that fix.

Fixes #16209.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16211
2023-11-29 15:29:17 +01:00
Avi Kivity
cd732b1364 Update seastar submodule
* seastar 830ce8673...55a821524 (34):
  > Revert "reactor/scheduling_group: Handle at_destroy queue special in init_new_scheduling_group_key etc"
  > epoll: Avoid spinning on aborted connections
Fixes #12774
Fixes #7753
Fixes #13337
  > Merge 'Sanitize test-only reactor facilities' from Pavel Emelyanov
  > test/unit: fix fmt version check
  > reactor/scheduling_group: Handle at_destroy queue special in init_new_scheduling_group_key etc
  > build: add spaces before () and after commands
  > reactor: use zero-initialization to initialize io_uring_params
  > Merge 'build: do not return a non-false condition if the option is off ' from Kefu Chai
  > memory: do not use variable length array
  > build: use tri_state_option() to link against Sanitizers
  > build: do not define SEASTAR_TYPE_ERASE_MORE on all builds
  > Revert "shared_future: make available() immediate after set_value()"
  > test_runner: do not throw when seastar.app fails to start
  > Merge 'Address issue where Seastar faults in toeplitz hash when reassembling fragment' from John Hester
  > defer, closeable: do not use [[nodiscard(str)]]
  > Merge 'build: generate config-specific rules using generator expressions' from Kefu Chai
  > treewide: use *_v and *_t for better readability
  > build: use different names for .pc files for each build mode
  > perftune.py: skip discovering IRQs for iSCSI disks
  > io-tester: explicit use uint64_t for boost::irange(...)
  > gate: correct the typo in doxygen comment
  > shared_future: make available() immediate after set_value()
  > smp: drop unused templates
  > include fmt/ostream.h to make headers self-sufficient
  > Support ccache in ./configure.py
  > rpc_tester: Disable -Wuninitialized when including boost.accumulators
  > file: construct directory_entry with aggregated ctor
  > file: s/ino64_t/ino_t/, s/off64_t/off_t/
  > sstring_test: include fmt/std.h only if fmtlib >= 10.0.0
  > file: do not include coroutine headers if coroutine is disabled
  > fair_queue::unregister_priority_class:fix assertion
  > Merge 'Generalize `net::udp_channel` into `net::datagram_channel`' from Michał Sala
  > Merge 'Add file::list_directory() that co_yields entries' from Pavel Emelyanov
  > http/file_handler: remove unnecessary cast

Closes scylladb/scylladb#16201
2023-11-29 14:34:30 +02:00
Kefu Chai
c40da20092 utils/pretty_printers: stop using undocumented fmt api
format_parse_context::on_error() is an undocumented API in fmt v9
and in fmt v10, see

- https://fmt.dev/9.1.0/api.html#_CPPv4I0EN3fmt16basic_format_argE
- https://fmt.dev/10.0.0/api.html#_CPPv4I0EN3fmt26basic_format_parse_contextE

although this API was once used in the fmt v10.0.0 documentation, see
https://fmt.dev/10.0.0/api.html#formatting-user-defined-types, it is
still, well, undocumented.

so, to have better compatibility, let's use the documented API in place
of the undocumented one. please note, `throw_format_error()` was
not a public API before 10.1.0, so we cannot use it yet during the
transitional period and have to throw `fmt::format_error` explicitly
instead.

because the class of `fmt::format_error` is defined in `fmt/format.h`,
we need to include this header for using it.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16212
2023-11-29 12:49:04 +02:00
Pavel Emelyanov
0da37d5fa6 sstable: Generalize toc file read and parse
There are several places where TOC file is parsed into a vector of
components -- sstable::read_toc(), remove_by_toc_name() and
remove_by_registry_entry(). All three deserve some generalization.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-29 12:09:52 +03:00
Pavel Emelyanov
c5d85bdf79 s3/client: Don't GET object contents on out-of-bound reads
If S3 readable file is used inside file input stream, the latter may
call its read methods with position that is above file size. In that
case server replies with generic http error and the fact that the range
was invalid is encoded into reply body's xml.

Catching this via a wrong reply status exception and xml parsing is not
great, all the more so because we can know in advance that the read is
out-of-bound.
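
A minimal sketch of the idea, assuming a file object that caches its size and consults it before issuing a request. `CachedStatFile`, `fetch_size` and `fetch_range` are hypothetical stand-ins for the S3 client calls, backed here by an in-memory blob.

```python
class CachedStatFile:
    """Read-only file over an immutable remote object."""

    def __init__(self, fetch_size, fetch_range):
        self._fetch_size = fetch_size      # stands in for a stat/HEAD request
        self._fetch_range = fetch_range    # stands in for a ranged GET request
        self._size = None

    def size(self) -> int:
        # The object is immutable, so one stat is enough for its lifetime.
        if self._size is None:
            self._size = self._fetch_size()
        return self._size

    def read(self, pos: int, length: int) -> bytes:
        size = self.size()
        if pos >= size:
            return b""                     # out-of-bound: answer without a GET
        return self._fetch_range(pos, min(pos + length, size))


blob = b"0123456789"
calls = {"stat": 0, "get": 0}


def fetch_size():
    calls["stat"] += 1
    return len(blob)


def fetch_range(start, end):
    calls["get"] += 1
    return blob[start:end]


f = CachedStatFile(fetch_size, fetch_range)
head = f.read(0, 4)      # served by a ranged GET
tail = f.read(8, 100)    # clamped to the object size
past = f.read(10, 4)     # past the end: no request is sent
```

Three reads cost one stat and two GETs here; the read past the end never reaches the server.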

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-29 12:09:52 +03:00
Pavel Emelyanov
339182287f s3/client: Cache stats on readable_file
S3-based sstables components are immutable, so every time stat is called
there's no need to ping server again.

But the main intention of this patch is to provide stats for read calls
in the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-29 12:06:54 +03:00
Calle Wilund
3b70fde3cd commitlog: Make named_files in delete_segments have updated size
Fixes #16207

commitlog::delete_segments deletes (or recycles) segments replayed.
The actual file size here is added to footprint so actual delete then
can determine iff things should be recycled or removed.
However, we build a pending delete list of named_files, and the files
we added did not have size set. Bad. Actual deletion then treated files
as zero-byte sized, i.e. footprint calculations borked.

The simple fix is to fill in the size of the objects when adding them.
Added unit test for the problem.

Closes scylladb/scylladb#16210
2023-11-29 09:58:47 +02:00
Yaron Kaikov
c3ee53f3be test.py: enable xml validation
Following https://github.com/scylladb/scylladb/issues/4774#issuecomment-1752089862

Adding back xml validation

Closes: https://github.com/scylladb/scylla-pkg/issues/3441

Closes scylladb/scylladb#16198
2023-11-29 09:02:36 +02:00