Commit Graph

39484 Commits

Author SHA1 Message Date
Kefu Chai
8819865c8d build: cmake: correct the variable names in mode.Dev.cmake
it was a copy-pasta error.

- s/CMAKE_CXX_FLAGS_RELEASE/CMAKE_CXX_FLAGS_DEV/
- s/Seastar_OptimizationLevel_RELEASE/Seastar_OptimizationLevel_DEV/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15849
2023-10-29 18:30:32 +02:00
Kamil Braun
1c0ae2e7ef Merge 'raft topology: assign tokens after join node response rpc' from Piotr Dulikowski
Currently, when the topology coordinator accepts a node, it moves it to bootstrap state and assigns tokens to it (either new ones during bootstrap, or the replaced node's tokens). Only then does it contact the joining node to tell it about the decision and let it perform a read barrier.

However, this means that the tokens are inserted too early. After inserting the tokens the cluster is free to route write requests to it, but it might not have learned about all of the schema yet.

Fix the issue by inserting the tokens later, after completing the join node response RPC which forces the receiving node to perform a read barrier.

Refs: scylladb/scylladb#15686
Fixes: scylladb/scylladb#15738

Closes scylladb/scylladb#15724

* github.com:scylladb/scylladb:
  test: test_topology_ops: continuously write during the test
  raft topology: assign tokens after join node response rpc
  storage_service: fix indentation after previous commit
  raft topology: loosen assumptions about transition nodes having tokens
2023-10-29 18:30:32 +02:00
Marcin Maliszkiewicz
020a9c931b db: view: run local materialized view mutations on a separate smp service group
When a base write triggers an mv write that needs to be sent to another
shard, it used the same service group, and we could end up with a
deadlock.

This fix affects also alternator's secondary indexes.

Testing was done using a not-yet-committed framework for easy alternator
performance testing: https://github.com/scylladb/scylladb/pull/13121.
I changed the hardcoded max_nonlocal_requests config in scylla from 5000 to 500 and
then ran:

./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 \
--developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi \
--duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error true --concurrency 2000

Without the patch, when scylla is overloaded (i.e. the number of scheduled futures is close to max_nonlocal_requests), after a couple of seconds
scylla hangs, cpu usage drops to zero, and no progress is made. We can confirm we're hitting this issue by checking under gdb:

p seastar::get_smp_service_groups_semaphore(2,0)._count
$1 = 0

With the patch I wasn't able to observe the problem, even with 2x
concurrency. I was able to make the process hang with 10x concurrency,
but I think it's hitting a different limit, as there wasn't any depleted
smp service group semaphore and it also happened on non-mv loads.

Fixes https://github.com/scylladb/scylladb/issues/15844

Closes scylladb/scylladb#15845
2023-10-29 18:30:32 +02:00
Patryk Jędrzejczak
a6236072ee raft topology: join_node_request_handler: wait until first node becomes normal
We need to wait until the first node becomes normal in
`join_node_request_handler` to ensure that joining nodes are not
handled as the first node in the cluster.

If we placed a join request before the first node becomes normal,
the topology coordinator would incorrectly skip the join node
handshake in `handle_node_transition` (`case node_state::none`).
It would happen because the topology coordinator decides whether
a node is the first in the cluster by checking if there are no
normal nodes. Therefore, we must ensure at least one normal node
when the topology coordinator handles a join request for a
non-first node.

We change the previous check because it can return true if there
are no normal nodes. `topology::is_empty` would also return false
if the first node was still new or in transition.
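
The difference between "the topology is not empty" and "there is at least one normal node" can be sketched with a simplified model. The `Topology` class and its field names below are illustrative assumptions, not the actual Scylla data structures:

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    """Simplified, hypothetical model of the cluster topology state."""
    new_nodes: set = field(default_factory=set)
    transition_nodes: set = field(default_factory=set)
    normal_nodes: set = field(default_factory=set)

    def is_empty(self) -> bool:
        # True only when no node is known in any state.
        return not (self.new_nodes or self.transition_nodes or self.normal_nodes)

    def has_normal_node(self) -> bool:
        # True once at least one node has reached the normal state.
        return bool(self.normal_nodes)


# The first node has started joining but is not normal yet.
t = Topology(transition_nodes={"node1"})
print(t.is_empty())         # False: the topology is not empty...
print(t.has_normal_node())  # False: ...yet no node is normal
```

In this state a check based on emptiness would let a second join request through, while a check for a normal node would make it wait.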

Additionally, calling `join_node_request_handler` before the first
node sets itself as normal is frequent during concurrent bootstrap,
so we remove "unlikely" from the comment.

Fixes: scylladb/scylladb#15807

Closes scylladb/scylladb#15775
2023-10-29 18:30:32 +02:00
Kefu Chai
227136ddf5 main.cc: specify shortname for scheduling groups
so, for instance, the logging message looks like:
```
INFO  2023-10-24 15:19:37,290 [shard 0:strm] storage_service - entering STARTING mode
```
instead of
```
INFO  2023-10-24 15:19:37,290 [shard 0:stre] storage_service - entering STARTING mode
```

Fixes #15267
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15821
2023-10-26 10:52:05 +03:00
Kefu Chai
d43afd576e cql3/restrictions/statement_restrictions: s/allow filtering/ALLOW FILTERING/
use the capitalized "ALLOW FILTERING" in the error message. because the
error message is a part of the user interface, it would be better to
keep it aligned with our document, where "ALLOW FILTERING" is used.

so, in this change, the lower-cased "allow filtering" error message is
changed to "ALLOW FILTERING", and the tests are updated accordingly.

see also a0ffbf3291

Refs #14321
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15718
2023-10-26 10:00:37 +03:00
Yaniv Kaul
600822379d Docs: small typo in cql extensions page
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#15840
2023-10-25 17:27:04 +03:00
Botond Dénes
5d1e9d8c46 Merge 'Sanitize API -> token_metadata dependency' from Pavel Emelyanov
This is the continuation for 19fc01be23

Registering API handlers for services needs to

* use only the required service (sharded<> one if needed)
* get the service to handle requests via argument, not from http context (http context, in turn, is going to not depend on anything)

There are several endpoints scattered over storage_service and snitch that use token metadata and topology. This PR makes those endpoints work the described way and drop the api::ctx -> token_metadata dependency.

Closes scylladb/scylladb#15831

* github.com:scylladb/scylladb:
  api: Remove http::context -> token_metadata dependency
  api: Pass shared_token_metadata instead of storage_service
  api: Move snitch endpoints that use token metadata only
  api: Move storage_service endpoints that use token metadata only
2023-10-25 17:19:39 +03:00
Anna Stuchlik
ad29ba4cad doc: add info about encrypted tables to Backup
This commit updates the introduction of the Backup Your Data page to include information about encryption.

Fixes https://github.com/scylladb/scylladb/issues/15573

Closes scylladb/scylladb#15612
2023-10-25 17:15:15 +03:00
Avi Kivity
782c6a208a Merge 'cql3: mutation_fragments_select_statement: keep erm alive for duration of the query' from Botond Dénes
Said statement keeps a reference to erm indirectly, via a topology node pointer, but doesn't keep erm alive. This can result in use-after-free. Furthermore, it allows for vnodes being pulled from under the query's feet, as it is running.
To prevent this, keep the erm alive for the duration of the query.
Also, use `host_id` instead of `node`, the node pointer is not needed really, as the statement only uses the host id from it.

Fixes: #15802

Closes scylladb/scylladb#15808

* github.com:scylladb/scylladb:
  cql3: mutation_fragments_select_statement: use host_id instead of node
  cql3: mutation_fragments_select_statement: pin erm reference
2023-10-25 15:03:07 +03:00
Piotr Dulikowski
a3ba4b3109 test: test_topology_ops: continuously write during the test
In order to detect issues where requests are routed incorrectly during
topology changes, modify the test_topology_ops test so that it runs a
background process that continuously writes while the test performs
topology changes in the cluster.

At the end of the test check whether:

- All writes were successful (we only require CL=LOCAL_ONE)
- Whether there are any errors from the replica side logic in the nodes'
  logs (which happen e.g. when node receives writes before learning
  about the schema)
2023-10-25 11:50:17 +02:00
Piotr Dulikowski
63aa9332aa raft topology: assign tokens after join node response rpc
Currently, when the topology coordinator accepts a node, it moves it to
bootstrap state and assigns tokens to it (either new ones during
bootstrap, or the replaced node's tokens). Only then does it contact the
joining node to tell it about the decision and let it perform a read
barrier.

However, this means that the tokens are inserted too early. After
inserting the tokens the cluster is free to route write requests to it,
but it might not have learned about all of the schema yet.

Fix the issue by inserting the tokens later, after completing the join
node response RPC which forces the receiving node to perform a read
barrier.
2023-10-25 11:50:17 +02:00
Piotr Dulikowski
46fce4cff3 storage_service: fix indentation after previous commit 2023-10-25 11:50:17 +02:00
Piotr Dulikowski
2d161676c7 raft topology: loosen assumptions about transition nodes having tokens
In later commits, tokens for a joining/replacing node will not be
inserted when the node enters `bootstrapping`/`replacing` state but at
some later step of the procedure. Loosen some of the assumptions in
`storage_service::topology_state_load` and
`system_keyspace::load_topology_state` appropriately.
2023-10-25 11:50:17 +02:00
Anna Stuchlik
e223624e2e doc: fix the Reference page layout
This commit fixes the layout of the Reference
page. Previously, the toctree level was "2",
which made the page hard to navigate.
This PR changes the level to "1".

In addition, the capitalization of page
titles is fixed.

This is a follow-up PR to the ones that
created and updated the Reference section.
It must be backported to branch-5.4.

Closes scylladb/scylladb#15830
2023-10-25 12:15:27 +03:00
Botond Dénes
ceb866fa2e Merge 'Make s3 upload sink PUT small objects' from Pavel Emelyanov
When upload-sink is flushed, it may notice that the upload had not yet been started and fall back to plain PUT in that case. This makes uploading small files much nicer, because a multipart upload would take 3 API calls (start, part, complete) in this case.

fixes: #13014

Closes scylladb/scylladb#15824

* github.com:scylladb/scylladb:
  test: Add s3_client test for upload PUT fallback
  s3/client: Add PUT fallback to upload sink
2023-10-25 10:03:46 +03:00
Pavel Emelyanov
8e1ff745fa api: Remove http::context -> token_metadata dependency
Now the token metadata usage is fine grained by the relevant endpoint
handlers only.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 17:49:05 +03:00
Pavel Emelyanov
be9ea0c647 api: Pass shared_token_metadata instead of storage_service
The token metadata endpoints need token metadata, not storage service

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 17:48:27 +03:00
Pavel Emelyanov
c23193bed0 api: Move snitch endpoints that use token metadata only
Snitch is now a service that speaks for the local node only. In order to
get dc/rack for peers in the cluster one needs to use topology which, in
turn, lives on token metadata. This patch moves the dc/rack getters to
api/token_metadata.cc next to other t.m. related endpoints.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 17:47:18 +03:00
Pavel Emelyanov
e4c0a4d34d api: Move storage_service endpoints that use token metadata only
There are a few of them that don't need the storage service for anything
but getting token metadata from it. Move them to their own .cc/.hh units.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 17:44:53 +03:00
Botond Dénes
6c90d166cc Merge 'build: cmake: avoid using large amount stack of when compiling parser ' from Kefu Chai
this mirrors what we have in `configure.py`, to build the CqlParser with `-O1`
and disable `-fsanitize-address-use-after-scope` when compiling CqlParser.cc
in order to prevent the compiler from emitting code which uses a large amount of stack
space at runtime.

Closes scylladb/scylladb#15819

* github.com:scylladb/scylladb:
  build: cmake: avoid using large amount stack of when compiling parser
  build: cmake: s/COMPILE_FLAGS/COMPILE_OPTIONS/
2023-10-24 16:19:51 +03:00
Nadav Har'El
4b80130b0b Merge 'reduce announcements of the automatic schema changes ' from Patryk Jędrzejczak
There are some schema modifications performed automatically (during bootstrap, upgrade etc.) by Scylla that are announced by multiple calls to `migration_manager::announce` even though they are logically one change. Precisely, they appear in:
- `system_distributed_keyspace::start`,
- `redis:create_keyspace_if_not_exists_impl`,
- `table_helper::setup_keyspace` (for the `system_traces` keyspace).

All these places contain a FIXME telling us to `announce` only once. There are a few reasons for this:
- calling `migration_manager::announce` with Raft is quite expensive -- taking a `read_barrier` is necessary, and that requires contacting a leader, which then must contact a quorum,
- we must implement a retrying mechanism for every automatic `announce` if `group0_concurrent_modification` occurs to enable support for concurrent bootstrap in Raft-based topology. Doing it before the FIXMEs mentioned above would be harder, and fixing the FIXMEs later would also be harder.

This PR fixes the first two FIXMEs and improves the situation with the last one by reducing the number of the `announce` calls to two. Unfortunately, reducing this number to one requires a big refactor. We can do it as a follow-up to a new, more specific issue. Also, we leave a new FIXME.

Fixing the first two FIXMEs required enabling the announcement of a keyspace together with its tables. Until now, the code responsible for preparing mutations for a new table could assume the existence of the keyspace. This assumption wasn't necessary, but removing it required some refactoring.

Fixes #15437

Closes scylladb/scylladb#15594

* github.com:scylladb/scylladb:
  table_helper: announce twice in setup_keyspace
  table_helper: refactor setup_table
  redis: create_keyspace_if_not_exists_impl: fix indentation
  redis: announce once in create_keyspace_if_not_exists_impl
  db: system_distributed_keyspace: fix indentation
  db: system_distributed_keyspace: announce once in start
  tablet_allocator: update on_before_create_column_family
  migration_listener: add parameter to on_before_create_column_family
  alternator: executor: use new prepare_new_column_family_announcement
  alternator: executor: introduce create_keyspace_metadata
  migration_manager: add new prepare_new_column_family_announcement
2023-10-24 15:42:48 +03:00
David Garcia
a5519c7c1f docs: update cofig params design
Closes scylladb/scylladb#15827
2023-10-24 15:41:56 +03:00
Kefu Chai
f8104b92f8 build: cmake: detect rapidxml
we use rapidxml for parsing XML, so let's detect it before using it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15813
2023-10-24 15:12:04 +03:00
Pavel Emelyanov
caa3e751f7 test: Add s3_client test for upload PUT fallback
The test case creates a non-jumbo upload sink and puts some bytes into it,
then flushes. In order to make sure the fallback took place, the
multipart memory tracker semaphore is broken in advance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 15:03:53 +03:00
Kamil Braun
2a21029ff5 Merge 'make topology_coordinator::run noexcept' from Gleb
Topology coordinator should handle failures internally as long as it
remains the coordinator. The raft state monitor is not in a better
position to handle any errors thrown by it; all it can do is restart
the coordinator. The series makes topology_coordinator::run handle all
the errors internally and mark the function as noexcept to not leak
error handling complexity into the raft state monitor.

* 'gleb/15728-fix' of github.com:scylladb/scylla-dev:
  storage_service: raft topology: mark topology_coordinator::run function as noexcept
  storage_service: raft topology: do not throw error from fence_previous_coordinator()
2023-10-24 12:16:36 +02:00
Kefu Chai
4abcec9296 test: add __repr__ for MinIoServer and S3_Server
it is printed when pytest passes it down as a fixture as part of
the logging message. it would help with debugging an object_store test.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15817
2023-10-24 12:35:49 +03:00
Pavel Emelyanov
63f2bdca01 s3/client: Add PUT fallback to upload sink
When the non-jumbo sink is flushed and notices that the real upload is
not started yet, it may just go ahead and PUT the buffers into the
object with the single request.

For jumbo sink the fallback is not implemented as it likely doesn't make
any sense -- jumbo sinks are unlikely to produce less than 5Mb of
data so it's going to be dead code anyway.
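
A minimal sketch of the fallback described above, assuming a hypothetical `RecordingClient` that only records which calls were made. The class and method names are illustrative and are not the s3/client API:

```python
class RecordingClient:
    """Hypothetical stand-in for an object-storage client; records calls."""

    def __init__(self):
        self.calls = []

    def put_object(self, key, data):
        self.calls.append("put_object")

    def start_multipart(self, key):
        self.calls.append("start_multipart")
        return "upload-id"

    def upload_part(self, key, upload_id, data):
        self.calls.append("upload_part")

    def complete_multipart(self, key, upload_id):
        self.calls.append("complete_multipart")


class UploadSink:
    """Buffers data; starts a multipart upload only once a full part exists."""

    def __init__(self, client, key, part_size):
        self.client = client
        self.key = key
        self.part_size = part_size
        self.buffer = bytearray()
        self.upload_id = None  # None until a multipart upload is started

    def put(self, data):
        self.buffer += data
        while len(self.buffer) >= self.part_size:
            if self.upload_id is None:
                self.upload_id = self.client.start_multipart(self.key)
            part = bytes(self.buffer[:self.part_size])
            del self.buffer[:self.part_size]
            self.client.upload_part(self.key, self.upload_id, part)

    def flush(self):
        if self.upload_id is None:
            # Upload never started: a single PUT is enough.
            self.client.put_object(self.key, bytes(self.buffer))
        else:
            if self.buffer:
                self.client.upload_part(self.key, self.upload_id, bytes(self.buffer))
            self.client.complete_multipart(self.key, self.upload_id)
        self.buffer.clear()


small = RecordingClient()
sink = UploadSink(small, "small-object", part_size=8)
sink.put(b"abc")
sink.flush()
print(small.calls)
```

With less than one part of data buffered, the sink in this sketch issues one request instead of the start, part, complete sequence.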

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 10:59:46 +03:00
Gleb Natapov
dcaaa74cd4 storage_service: raft topology: mark topology_coordinator::run function as noexcept
The function handles all exceptions internally. By making it noexcept we
make sure that the caller (raft_state_monitor_fiber) does not need to
handle any exceptions from the topology coordinator fiber.
2023-10-24 10:58:45 +03:00
Gleb Natapov
65bf5877e7 storage_service: raft topology: do not throw error from fence_previous_coordinator()
Throwing an error kills the topology coordinator monitor fiber. Instead we
retry the operation until it succeeds or the node loses its leadership.
This is fine because for the operation to succeed quorum is needed and if
the quorum is not available the node should relinquish its leadership.
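
The retry policy described above can be sketched as a loop that swallows errors while the node is still the leader. `retry_while_leader` and its parameters are hypothetical names used only for illustration:

```python
import time


def retry_while_leader(operation, is_leader, delay_s=0.0):
    """Retry `operation` until it succeeds or `is_leader()` turns false.

    Returns True on success, False if leadership was lost first.
    """
    while is_leader():
        try:
            operation()
            return True
        except Exception:
            # Swallow the error instead of propagating it to the caller.
            time.sleep(delay_s)
    return False


attempts = []


def flaky():
    # Fails twice, then succeeds, to mimic a quorum becoming available.
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("no quorum yet")


print(retry_while_leader(flaky, lambda: True))
```

The caller never sees an exception; it only learns whether the operation completed or leadership was lost.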

Fixes #15728
2023-10-24 10:57:48 +03:00
Botond Dénes
23898581d5 cql3: mutation_fragments_select_statement: use host_id instead of node
The statement only uses the node to get its host_id later. Simpler to
obtain and store only the host_id in the first place.
2023-10-24 03:12:58 -04:00
Botond Dénes
3cb1669340 cql3: mutation_fragments_select_statement: pin erm reference
This query bypasses the usual read-path in storage-proxy and therefore
also misses the erm pinning done by storage-proxy. To avoid a vnode
being pulled from under its feet, do the erm pinning in the statement
itself.
2023-10-24 03:12:36 -04:00
Botond Dénes
0cba973972 Update tools/java submodule
* tools/java 3c09ab97...86a200e3 (1):
  > cassandra-stress: add storage options
2023-10-24 09:41:36 +03:00
Kefu Chai
9347b61d3b build: cmake: avoid using large amount stack of when compiling parser
this mirrors what we have in `configure.py`, to build the CqlParser with -O1
and disable sanitize-address-use-after-scope when compiling CqlParser.cc
in order to prevent the compiler from emitting code which uses a large amount of stack
at runtime.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-24 12:40:20 +08:00
Kefu Chai
3da02e1bf4 build: cmake: s/COMPILE_FLAGS/COMPILE_OPTIONS/
according to
https://cmake.org/cmake/help/latest/prop_sf/COMPILE_FLAGS.html,
COMPILE_FLAGS has been superseded by COMPILE_OPTIONS. so let's
replace the former with the latter.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-24 12:40:20 +08:00
Pavel Emelyanov
7c580b4bd4 Merge 'sstable: switch to uuid identifier for naming S3 sstable objects' from Kefu Chai
before this change, we create a new UUID for a new sstable managed by the s3_storage, and we use the string representation of UUID defined by RFC4122 like "0aa490de-7a85-46e2-8f90-38b8f496d53b" for naming the objects stored on s3_storage. but this representation is not what we are using for storing sstables on local filesystem when the option of "uuid_sstable_identifiers_enabled" is enabled. instead, we are using a base36-based representation which is shorter.

to be consistent with the naming of the sstables created for local filesystem, and more importantly, to simplify the interaction between the local copy of sstables and those stored on object storage, we should use the same string representation of the sstable identifier.

so, in this change:

1. instead of creating a new UUID, just reuse the generation of the sstable for the object's key.
2. do not store the uuid in the sstable_registry system table. As we already have the generation of the sstable for the same purpose.
3. switch the sstable identifier representation from the one defined by the RFC4122 (implemented by fmt::formatter<utils::UUID>) to the base36-based one (implemented by fmt::formatter<sstables::generation_type>)
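
To illustrate why a base36 form is shorter than the RFC4122 text form, here is a generic sketch that encodes the 128-bit UUID value in base36. It is only an assumption-level illustration and does not reproduce the exact layout produced by `fmt::formatter<sstables::generation_type>`:

```python
import string
import uuid

ALPHABET = string.digits + string.ascii_lowercase  # 0-9 then a-z


def to_base36(value: int) -> str:
    """Encode a non-negative integer using digits 0-9 and a-z."""
    if value == 0:
        return "0"
    digits = []
    while value:
        value, rem = divmod(value, 36)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def from_base36(text: str) -> int:
    """Decode a base36 string back to an integer."""
    return int(text, 36)


u = uuid.UUID("0aa490de-7a85-46e2-8f90-38b8f496d53b")
canonical = str(u)          # RFC4122 text form, always 36 characters
compact = to_base36(u.int)  # plain base36 of the 128-bit value

print(len(canonical), canonical)
print(len(compact), compact)
```

Any 128-bit value fits in at most 25 base36 characters, compared with the fixed 36 characters of the hyphenated hex form.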

Fixes #14175
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#14406

* github.com:scylladb/scylladb:
  sstable: remove _remote_prefix from s3_storage
  sstable: switch to uuid identifier for naming S3 sstable objects
2023-10-23 21:05:13 +03:00
Pavel Emelyanov
d7031de538 Merge 'test/pylib: extract the env variable related functions out' from Kefu Chai
this series extracts the env variable related functions out and removes unused `import`s for better readability.

Closes scylladb/scylladb#15796

* github.com:scylladb/scylladb:
  test/pylib: remove duplicated imports
  test/pylib: extract the env variable printing into MinIoServer
  test/pylib: extract _set_environ() out
2023-10-23 21:03:03 +03:00
Aleksandra Martyniuk
0c6a3f568a compaction: delete default_compaction_progress_monitor
default_compaction_progress_monitor returns a reference to a static
object. So, it should be read-only, but its users need to modify it.

Delete default_compaction_progress_monitor and use one's own
compaction_progress_monitor instance where it's needed.

Closes scylladb/scylladb#15800
2023-10-23 16:03:34 +03:00
Anna Stuchlik
55ee999f89 doc: enable publishing docs for branch-5.4
This commit enables publishing documentation
from branch-5.4. The docs will be published
as UNSTABLE (the warning about version 5.4
being unstable will be displayed).

Closes scylladb/scylladb#15762
2023-10-23 15:47:01 +03:00
Avi Kivity
ee9cc450d4 logalloc: report increases of reserves
The log-structured allocator maintains memory reserves so that
operations using log-structured allocator memory can have some
working memory and can allocate. The reserves start small and are
increased if allocation failures are encountered. Before starting
an operation, the allocator first frees memory to satisfy the reserves.

One problem is that if the reserves are set to a high value and
we encounter a stall, then, first, we have no idea what value
the reserves are set to, and second, we have no idea what operation
caused the reserves to be increased.

We fix this problem by promoting the log reports of reserve increases
from DEBUG level to INFO level and by attaching a stack trace to
those reports. This isn't optimal since the messages are used
for debugging, not for informing the user about anything important
for the operation of the node, but I see no other way to obtain
the information.

Ref #13930.

Closes scylladb/scylladb#15153
2023-10-23 13:37:50 +02:00
Tomasz Grabiec
4af585ec0e Merge 'row_cache: make_reader_opt(): make make_context() reentrant ' from Botond Dénes
Said method is called in an allocating section, which will re-try the enclosed lambda on allocation failure. `read_context()` however moves the permit parameter so on the second and later calls, the permit will be in a moved-from state, triggering a `nullptr` dereference and therefore a segfault.

We already have a unit test (`test_exception_safety_of_reads` in `row_cache_test.cc`) which was supposed to cover this, but:
* It only tests range scans, not single partition reads, which is a separate path.
* Turns out allocation failure tests are again silently broken (no error is injected at all). This is because `test/lib/memtable_snapshot_source.hh` creates a critical alloc section which accidentally covers the entire duration of tests using it.

Fixes: #15578

Closes scylladb/scylladb#15614

* github.com:scylladb/scylladb:
  test/boost/row_cache_test: test_exception_safety_of_reads: also cover single-partition reads
  test/lib/memtable_snapshot_source: disable critical alloc section while waiting
  row_cache: make_reader_opt(): make make_context() reentrant
2023-10-23 11:22:13 +02:00
Raphael S. Carvalho
ea6c281b9f replica: Fix major compaction semantics by performing off-strategy first
Major compaction semantics is that all data of a table will be compacted
together, so user can expect e.g. a recently introduced tombstone to be
compacted with the data it shadows.
Today, it can happen that all data in maintenance set won't be included
for major, until they're promoted into main set by off-strategy.
So user might be left wondering why major is not having the expected
effect.
To fix this, let's perform off-strategy first, so data in maintenance
set will be made available by major. A similar approach is done for
data in memtable, so flush is performed before major starts.
The only exception will be data in staging, which cannot be compacted
until view building is done with it, to avoid inconsistency in view
replicas.
The serialization of reshape jobs in the compaction manager guarantees
correctness if there's an ongoing off-strategy on behalf of the
table.

Fixes #11915.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#15792
2023-10-23 11:32:03 +03:00
Nadav Har'El
e7dd0ec033 test/cql-pytest: reproduce incompatibility with same-name bind marks
This patch adds a reproducer for a minor incompatibility between Scylla's
and Cassandra's handling of a prepared statement when a bind marker with
the same name is used more than once, e.g.,
```
SELECT * FROM tbl WHERE p=:x AND c=:x
```
It turns out that Scylla tells the driver that there is only one bind
marker, :x, whereas Cassandra tells the driver that there are two bind
markers, both named :x. This makes no difference if the user passes
a map `{'x': 3}`, but if the user passes a tuple, Scylla accepts only
`(3,)` (assigning both bind markers the same value) and Cassandra
accepts only `(3,3)`.

The test added in this patch demonstrates this incompatibility.
It fails on Scylla, passes on Cassandra, and is marked "xfail".
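
The effect on a driver can be sketched without a server. The two helper functions below are hypothetical and only model how a client might pair values with the bind markers reported in the prepared-statement metadata:

```python
def bind_positional(reported_markers, values):
    """Pair positional values with the bind markers the server reported."""
    if len(values) != len(reported_markers):
        raise ValueError(
            f"expected {len(reported_markers)} values, got {len(values)}"
        )
    return list(zip(reported_markers, values))


def bind_named(reported_markers, values_by_name):
    """Look up each reported marker by name; duplicates share one value."""
    return [(name, values_by_name[name]) for name in reported_markers]


# Metadata for: SELECT * FROM tbl WHERE p=:x AND c=:x
scylla_markers = ["x"]          # one marker reported
cassandra_markers = ["x", "x"]  # two markers reported, same name

# A map works under both models.
print(bind_named(scylla_markers, {"x": 3}))
print(bind_named(cassandra_markers, {"x": 3}))

# A tuple must match the reported marker count.
print(bind_positional(scylla_markers, (3,)))
print(bind_positional(cassandra_markers, (3, 3)))
```

Passing `(3, 3)` against the one-marker metadata, or `(3,)` against the two-marker metadata, raises `ValueError` in this sketch, mirroring the mismatch described above.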

Refs #15559

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#15564
2023-10-23 11:19:15 +03:00
Aleksandra Martyniuk
a1271d2d5c repair: throw more detailed exception
Exception thrown from row_level_repair::run does not show the root
cause of a failure making it harder to debug.

Add the internal exception contents to runtime_error message.

After the change the log will mention the real cause (last line), e.g.:

repair - repair[92db0739-584b-4097-b6e2-e71a66e40325]: 33 out of 132 ranges failed,
keyspace=system_distributed, tables={cdc_streams_descriptions_v2, cdc_generation_timestamps,
view_build_status, service_levels}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=false,
failed_because=seastar::nested_exception: std::runtime_error (Failed to repair for keyspace=system_distributed,
cf=cdc_streams_descriptions_v2, range=(8720988750842579417,+inf))
(while cleaning up after seastar::abort_requested_exception (abort requested))

Closes scylladb/scylladb#15770
2023-10-23 11:15:25 +03:00
Botond Dénes
950a1ff22c Merge 'doc: improve the docs for handling failures' from Anna Stuchlik
This PR improves how handling failures is documented and makes that information easier for the user to find.
- The Handling Failures section is moved from Raft to Troubleshooting.
- Two new topics about failure are added to Troubleshooting with a link to the Handling Failures page (Failure to Add, Remove, or Replace a Node, Failure to Update the Schema).
- A note is added to the add/remove/replace node procedures to indicate that a quorum is required.

See individual commits for more details.

Fixes https://github.com/scylladb/scylladb/issues/13149

Closes scylladb/scylladb#15628

* github.com:scylladb/scylladb:
  doc: add a note about Raft
  doc: add the quorum requirement to procedures
  doc: add more failure info to Troubleshooting
  doc: move Handling Failures to Troubleshooting
2023-10-23 11:09:28 +03:00
Kefu Chai
5a17a02abb build: cmake: add -ffile-prefix-map option
this mirrors what we already have in configure.py.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15798
2023-10-23 10:26:21 +03:00
Botond Dénes
940c2d1138 Merge 'build: cmake: use add_compile_options() and add_link_options() when appropriate ' from Kefu Chai
instead of appending the options to the CMake variables, use the command to do this. simpler this way. and the bonus is that the options are de-duplicated.

Closes scylladb/scylladb#15797

* github.com:scylladb/scylladb:
  build: cmake: use add_link_options() when appropriate
  build: cmake: use add_compile_options() when appropriate
2023-10-23 09:58:10 +03:00
Botond Dénes
c960c2cdbf Merge 'build: extract code fragments into functions' from Kefu Chai
this series is one of the steps to remove global statements in `configure.py`.

not only is the script more structured this way, this also allows us to quickly identify the parts which should/can be reused when migrating to a CMake-based build system.

Refs #15379

Closes scylladb/scylladb#15780

* github.com:scylladb/scylladb:
  build: update modeval using a dict
  build: pass args.test_repeat and args.test_timeout explicitly
  build: pull in jsoncpp using "pkgs"
  build: build: extract code fragments into functions
2023-10-23 09:42:37 +03:00
Kefu Chai
0080b15939 build: cmake: use add_link_options() when appropriate
instead of appending to CMAKE_EXE_LINKER_FLAGS*, use
add_link_options() to add more options. as CMAKE_EXE_LINKER_FLAGS*
is a string, and typically set by user, let's use add_link_options()
instead.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 12:06:42 +08:00
Kefu Chai
686adec52e build: cmake: use add_compile_options() when appropriate
instead of appending to CMAKE_CXX_FLAGS, use add_compile_options()
to add more options. as CMAKE_CXX_FLAGS is a string, and typically
set by user, let's use add_compile_options() instead, the options
added by this command will be added before CMAKE_CXX_FLAGS, and
will have lower priority.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 12:06:42 +08:00