Commit Graph

34498 Commits

Author SHA1 Message Date
Botond Dénes
8e117501ac tools/scylla-sstable: extract sstable_consumer interface into own header
So it can be used in code outside scylla-sstable.cc. This source file is
quite large already, and as we have yet another large chunk of code to
add, we want to add it in a separate file.
2023-01-09 09:46:57 -05:00
Botond Dénes
9b1c486051 tools/json_writer: add accessor to underlying writer 2023-01-09 09:46:57 -05:00
Botond Dénes
cfb5afbe9b tools/scylla-sstable: fix indentation
Left broken by previous patches.
2023-01-09 09:46:57 -05:00
Botond Dénes
d42b0bb5d5 tools/scylla-sstable: export mutation_fragment_json_writer declaration
To json_writer.hh. Method definitions are left in scylla-sstable.cc.
Indentation is left broken, will be fixed by the next patch.
2023-01-09 09:46:57 -05:00
Botond Dénes
517135e155 tools/scylla-sstable: mutation_fragment_json_writer un-implement sstable_consumer
There is no point in the former implementing said interface. For one it
is a futurized interface, which is not needed for something writing to
the stdout. Rename the methods to follow the naming convention of rjson
writers more closely.
2023-01-09 09:46:57 -05:00
Botond Dénes
0ee1c6ca57 tools/scylla-sstable: extract json writing logic from json_dumper
We want to split this class into two parts: one with the actual logic
converting mutation fragments to json, and a wrapper over this one,
which implements the sstable_consumer interface.
As a first step we extract the class as is (no changes) and just forward
all calls from the now-empty wrapper to it.
2023-01-09 09:46:57 -05:00
Botond Dénes
55ef0ed421 tools/scylla-sstable: extract json_writer into its own header
Other source files will want to use it soon.
2023-01-09 09:46:57 -05:00
Botond Dénes
8623818a8d tools/scylla-sstable: use json_writer::DataKey() to write all keys
This method was renamed from its previous name of PartitionKey. Since in
json partition keys and clustering keys look alike, with the only
difference being that the former may also have a token, it makes sense to have
a single method to write them (with an optional token parameter). This
was the case at some point, json_dumper::write_key() taking this role.
However at a later point, json_writer::PartitionKey() was introduced and
now the code uses both. Standardize on the latter and give it a more
generic name.
2023-01-09 09:46:57 -05:00
Botond Dénes
602fca0a12 tools/scylla-types: fix use-after-free on main lambda captures
The main lambda of scylla-types, the one passed to app_template::run()
was recently made a coroutine. app_template::run() however doesn't keep
this lambda alive and hence after the first suspension point, accessing
the lambda's captures triggers use-after-free.
The simple fix is to convert the coroutine into a continuation chain.
2023-01-09 09:46:57 -05:00
Michał Chojnowski
08b3a9c786 configure: don't reduce parsers' optimization level to 1 in release
The line modified in this patch was supposed to increase the
optimization levels of parsers in debug mode to 1, because they
were too slow otherwise. But as a side effect, it also reduced the
optimization level in release mode to 1. This is not a problem
for the CQL frontend, because statement preparation is not
performance-sensitive, but it is a serious performance problem
for Alternator, where it lies in the hot path.

Fix this by only applying the -O1 to debug modes.

Fixes #12463

Closes #12460
2023-01-06 18:04:36 +02:00
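The fix described in the commit above can be sketched as follows. This is a minimal illustration, not the actual configure script: the function name `parser_flags`, the `DEBUG_MODES` set, and the example base flags are all assumptions made for the example. It relies on the fact that GCC and Clang honor the last `-O` option on the command line, so appending `-O1` unconditionally would also lower a release build's optimization level.

```python
from typing import List

# Assumption: which build modes count as "debug" is hypothetical here.
DEBUG_MODES = {"debug"}


def parser_flags(mode: str, base_flags: List[str]) -> List[str]:
    """Return compiler flags for generated parser sources.

    -O1 is appended only for debug modes. Since the last -O option wins,
    appending it for every mode would silently downgrade release builds.
    """
    flags = list(base_flags)
    if mode in DEBUG_MODES:
        flags.append("-O1")
    return flags


if __name__ == "__main__":
    # Illustrative base flags only; the real per-mode flags may differ.
    print(parser_flags("debug", ["-O0", "-g"]))
    print(parser_flags("release", ["-O3"]))
```

With this shape, release builds keep whatever optimization level their base flags specify, and only debug builds get the parser-specific bump.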
Avi Kivity
6868dcf30b tools: toolchain: drop s390x from prepare script architecture list
It's been a long while since we built ScyllaDB for s390x, and in
fact the last time I checked it was broken on the ragel parser
generator generating bad source files for the HTTP parser. So just
drop it from the list.

I kept s390x in the architecture mapping table since it's still valid.

Closes #12455
2023-01-06 09:08:01 +02:00
Botond Dénes
2612f98a6c Merge 'Abort repair tasks' from Aleksandra Martyniuk
Aborting of repair operation is fully managed by task manager.
Repair tasks are aborted:
- on shutdown; top level repair tasks subscribe to global abort source. On shutdown all tasks are aborted recursively
- through node operations (applies to data_sync_repair_task_impls and their descendants only); data_sync_repair_task_impl subscribes to node_ops_info abort source
- with task manager api (top level tasks are abortable)
- with storage_service api and on failure; these cases were modified to be aborted the same way as the ones from above are.

Closes #12085

* github.com:scylladb/scylladb:
  repair: make top level repair tasks abortable
  repair: unify a way of aborting repair operations
  repair: delete sharded abort source from node_ops_info
  repair: delete unused node_ops_info from data_sync_repair_task_impl
  repair: delete redundant abort subscription from shard_repair_task_impl
  repair: add abort subscription to data sync task
  tasks: abort tasks on system shutdown
2023-01-05 15:21:35 +01:00
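The recursive abort-on-shutdown behavior described in the merge above can be modeled roughly like this. It is a conceptual sketch only: the `AbortSource` and `Task` classes below are hypothetical stand-ins written for this example and do not mirror the actual task manager or Seastar interfaces.

```python
from typing import Callable, List


class AbortSource:
    """Hypothetical stand-in for a global abort source."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[], None]] = []

    def subscribe(self, callback: Callable[[], None]) -> None:
        self._subscribers.append(callback)

    def request_abort(self) -> None:
        for callback in self._subscribers:
            callback()


class Task:
    """Hypothetical task node; aborting a task aborts all its descendants."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List["Task"] = []
        self.aborted = False

    def add_child(self, child: "Task") -> "Task":
        self.children.append(child)
        return child

    def abort(self) -> None:
        if self.aborted:
            return  # already aborted, nothing left to propagate
        self.aborted = True
        for child in self.children:
            child.abort()


if __name__ == "__main__":
    shutdown = AbortSource()
    top = Task("repair")
    shard = top.add_child(Task("shard-0"))
    # Only the top-level task subscribes; children are reached recursively.
    shutdown.subscribe(top.abort)
    shutdown.request_abort()
    print(top.aborted, shard.aborted)
```

Subscribing only the top-level task keeps a single entry point for shutdown, while the recursion ensures no descendant is left running.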
Avi Kivity
cc6010b512 Merge 'Make restore_replica_count abortable' from Benny Halevy
Similar to the way we allow aborting streaming-based
removenode, subscribe to storage_service::_abort_source
to request abort locally and pass a shared_ptr<abort_source>
to `node_ops_info`, used to abort removenode_with_repair
on shutdown.

Fixes #12429

Closes #12430

* github.com:scylladb/scylladb:
  storage_service: restore_replica_count: demote status_checker related logging to debug level
  storage_service: restore_replica_count: allow aborting removenode_with_repair
  storage_service: coroutinize restore_replica_count
  storage_service: restore_replica_count: undefer stop_status_checker
  storage_service: restore_replica_count: handle exceptions from stream_async and send_replication_notification
  storage_service: restore_replica_count: coroutinize status_checker
2023-01-05 15:21:35 +01:00
Kamil Braun
09da661eeb Merge 'raft: replace experimental raft option with dedicated flag' from Gleb Natapov
Unlike other experimental features, we want raft to be opt-in even
after it leaves experimental mode. For that we need to have a separate
option to enable it. The patch adds the binary option "consistent-cluster-management"
for that.

* 'consistent-cluster-management-flag' of github.com:scylladb/scylla-dev:
  raft: replace experimental raft option with dedicated flag
  main: move supervisor notification about group registry start where it actually starts
2023-01-05 15:21:35 +01:00
Kamil Braun
df72536fc5 Merge 'docs: add the upgrade guide for Enterprise from 2022.1 to 2022.2' from Anna Stuchlik
Fixes https://github.com/scylladb/scylladb/issues/12314

This PR adds the upgrade guide for ScyllaDB Enterprise - from version
2022.1 to 2022.2.  Using this opportunity, I've replaced "Scylla" with
"ScyllaDB" in the upgrade-enterprise index file.

In previous releases, we added several upgrade guides - one per platform
(and version). In this PR, I've merged the information for different
platforms to create one generic upgrade guide. It is similar to what
@kbr- added for the Open Source upgrade guide from 5.0 to 5.1. See
https://docs.scylladb.com/stable/upgrade/upgrade-opensource/upgrade-guide-from-5.0-to-5.1/.

Closes #12339

* github.com:scylladb/scylladb:
  docs: add the info about minor release
  docs: add the new upgade guide 2022.1 to 2022.2 to the index and the toctree
  docs: add the index file for the new upgrage guide from 2022.1 to 2022.2
  docs: add the metrics update file to the upgrade guide 2022.1 to 2022.2
  docs: add the upgrade guide for ScyllaDB Enterprise from 2022.1 to 2022.2
2023-01-04 18:07:00 +01:00
Benny Halevy
086546f575 storage_service: restore_replica_count: demote status_checker related logging to debug level
the status_checker is not the main line of business
of restore_replica_count, starting and stopping it
do not seem to deserve info level logging, which
might have been useful in the past to debug issues
surrounding that.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-04 19:05:04 +02:00
Benny Halevy
3879ee1db8 storage_service: restore_replica_count: allow aborting removenode_with_repair
Similar to the way we allow aborting streaming-based
removenode, subscribe to storage_service::_abort_source
to request abort locally and pass a shared_ptr<abort_source>
to `node_ops_info`, used to abort removenode_with_repair
on shutdown.

Fixes #12429

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-04 19:05:04 +02:00
Benny Halevy
afece5bdc4 storage_service: coroutinize restore_replica_count
and unwrap the async thread started for streaming.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-04 19:05:04 +02:00
Benny Halevy
d1eadc39c1 storage_service: restore_replica_count: undefer stop_status_checker
Now that all exceptions in the rest of the function
are swallowed, just execute the stop_status_checker
deferred action serially before returning, on the
way to coroutinizing restore_replica_count (since
we can't co_await status_checker inside the deferred
action).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-04 19:05:04 +02:00
Benny Halevy
788ecb738d storage_service: restore_replica_count: handle exceptions from stream_async and send_replication_notification
On the way to coroutinizing restore_replica_count,
extract awaiting stream_async and send_replication_notification
into try/catch blocks so we can later undefer stop_status_checker.

The exception is still returned as an exceptional future
which is logged by the caller as warning.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-04 19:02:42 +02:00
Benny Halevy
b54d121dfd storage_service: restore_replica_count: coroutinize status_checker
There is no need to start a thread for the status_checker;
it can be implemented using a background coroutine.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-04 19:02:20 +02:00
Botond Dénes
1d273a98b9 readers/multishard: shard_reader::close() silence read-ahead timeouts
Timeouts are benign, especially on a read-ahead that turned out to be not
needed at all. They just introduce noise in the logs, so silence them.

Fixes: #12435

Closes #12441
2023-01-04 16:10:09 +02:00
Kamil Braun
4268b1bbc2 Merge 'raft: raft_group0, register RPC verbs on all shards' from Gusev Petr
raft_group0 used to register RPC verbs only on shard 0. This worked on
clusters with the same --smp setting on all nodes, since RPCs in this
case are processed on the same shard as the calling code, and
raft_group0 methods only run on shard 0.

A new test test_nodes_with_different_smp was added to identify the
problem. Since --smp can only be specified via the command line, a
corresponding parameter was added to the ManagerClient.server_add
method. It allows overriding the default parameters set by the
SCYLLA_CMDLINE_OPTIONS variable by changing, adding or deleting
individual items.

Fixes: #12252

Closes #12374

* github.com:scylladb/scylladb:
  raft: raft_group0, register RPC verbs on all shards
  raft: raft_append_entries, copy entries to the target shard
  test.py, allow to specify the node's command line in test
2023-01-04 11:11:21 +01:00
Marcin Maliszkiewicz
61a9816bad utils/rjson: enable inlining in rapidjson library
Because the NDEBUG macro was not defined, inlining was disabled. It's
important for parsing and printing performance.

Testing with perf_simple_query shows that the patch saves around
7000 insns/op, thus increasing median tps by 4.2% for the alternator frontend.

Because inlined functions are called for every character
in json this scales with request/response size. When
default write size is increased by around 7x (from ~180 to ~ 1255
bytes) then the median tps increased by 12%.

Running:
./build/release/test/perf/perf_simple_query_g --smp 1 \
                                --alternator forbid --default-log-level error \
                                --random-seed=1235000092 --duration=60 --write

Results before the patch:

median 46011.50 tps (197.1 allocs/op,  12.1 tasks/op,  170989 insns/op,        0 errors)
median absolute deviation: 296.05
maximum: 46548.07
minimum: 42955.49

Results after the patch:

median 47974.79 tps (197.1 allocs/op,  12.1 tasks/op,  163723 insns/op,        0 errors)
median absolute deviation: 303.06
maximum: 48517.53
minimum: 44083.74

The change affects both json parsing and printing.

Closes #12440
2023-01-04 10:27:35 +02:00
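As a quick check of the figures quoted in the commit above, the relative change can be recomputed from the two median lines. The helper name `relative_gain_pct` is introduced here purely for illustration; the input numbers are copied from the results before and after the patch.

```python
def relative_gain_pct(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


# Median values copied from the perf_simple_query output in the commit message.
tps_before, tps_after = 46011.50, 47974.79
insns_before, insns_after = 170989, 163723

tps_gain = relative_gain_pct(tps_before, tps_after)
insns_saved = insns_before - insns_after

if __name__ == "__main__":
    print(f"median tps gain: {tps_gain:.2f}%")
    print(f"instructions saved per op: {insns_saved}")
```

The results are in line with the "around 7000 insns/op" and roughly 4% throughput improvement stated in the message.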
Michał Jadwiszczak
83bb77b8bb test/boost/cql_query_test: enable parallelized_aggregation
Run tests for parallelized aggregation with
`enable_parallelized_aggregation` set always to true, so the tests work
even if the default value of the option is false.

Closes #12409
2023-01-04 10:11:25 +02:00
Anna Stuchlik
c4d779e447 doc: Fix https://github.com/scylladb/scylla-doc-issues/issues/854 - update the procedure to update topology strategy when nodes are on different racks
Closes #12439
2023-01-04 09:50:10 +02:00
Avi Kivity
f600ad5c1b Update seastar submodule
* seastar 3db15b5681...ca586cfb8d (28):
  > reactor: trim returned buffer to received number of bytes
  > util/process: include used header
  > build: drop unused target_include_directories()
  > build: use BUILD_IN_SOURCE instead chdir <SOURCE_DIR>
  > build: specify CMake policy CMP0135 to new
  > tests: only destroy allocated pending connections
  > build: silence the output when generating private keys
  > tests, httpd: Limit loopback connection factory sharding
  > lw_shared_ptr: Add nullptr_t comparing operators
  > noncopyable_function: Add concept for (Func func) constructor
  > reactor: add process::terminate() and process::kill()
  > Merge 'tests, include: include headers without ".." in path' from Kefu Chai
  > build: customize toolset for building Boost
  > build: use different toolset base on specified compiler
  > allocator: add an option to reserve additional memory for the OS
  > Merge 'build: pass cflags and ldflags to cooking.sh' from Kefu Chai
  > build: build static library of cryptopp
  > gate: add gate holders debugging
  > build: detect debug build of yaml-cpp also
  > build: do not use pkg_search_module(IMPORTED_TARGET) for finding yaml-cpp
  > build: bump yaml-cpp to 0.7.0 in cooking_recipe
  > build: bump cryptopp to 8.7.0 in cooking_recipe
  > build: bump boost to 1.81.0 in cooking_recipe
  > build: bump fmtlib to 9.1.0 in cooking_recipe
  > shared_ptr: add overloads for fmt::ptr()
  > chunked_fifo: const_iterator: use the base class ctor
  > build: s/URING_LIBARIES/URING_LIBRARIES/
  > build: export the full path of uring with URING_LIBRARIES

Closes #12434
2023-01-03 17:58:31 +02:00
Alejo Sanchez
889acf710c test/python: increase CQL connection timeout for test_ssl

In very slow debug builds the default driver timeouts are too low and
tests might fail. Bump up the values to a more reasonable time.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #12408
2023-01-03 17:10:46 +02:00
Nadav Har'El
1c96d2134f docs,alternator: link to issue about missing ACL feature
The alternator compatibility.md document mentions the missing ACL
(access control) feature, but unlike other missing features we
forgot to link to the open issue about this missing feature.
So let's add that link.

Refs #5047.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12399
2023-01-03 16:50:33 +02:00
Kamil Braun
fc57626afa Merge 'docs: remove auto_bootstrap option from the documentation' from Anna Stuchlik
Fixes https://github.com/scylladb/scylladb/issues/12318

This PR removes all occurrences of the `auto_bootstrap` option in the docs.
In most cases, I've simply removed the option name and its definition, but sometimes additional changes were necessary:
- In node-joined-without-any-data.rst, I removed the `auto_bootstrap` option as one of the causes of the problem.
- In rebuild-node.rst, I removed the first step in the procedure (enabling the `auto_bootstrap` option).
- In admin.rst, I removed the section about manual bootstrapping - it's based on setting `auto_bootstrap` to false, which is not possible now.

Closes #12419

* github.com:scylladb/scylladb:
  docs: remove the auto_bootstrap option from the admin procedures - involves removing the Manual Bootstraping section
  docs: remove the auto_bootstrap option from the procedure to replace a dead node
  docs: remove the auto_bootstrap option from the Troubleshooting article about a node joining with no data
  docs: remove the auto_bootstrap option from the procedure to rebuild a node after losing the data volume
  docs: remove the auto_bootstrap option from the procedures to create a cluster or add a DC
2023-01-03 15:44:00 +01:00
Petr Gusev
8417840647 raft: raft_group0, register RPC verbs on all shards
raft_group0 used to register RPC verbs only on shard 0.
This worked on clusters with the same --smp setting on
all nodes, since RPCs in this case are (usually)
processed on the same shard as the calling code,
and raft_group0 methods only run on shard 0.

A new test test_nodes_with_different_smp was added
to identify the problem.

Fixes: #12252
2023-01-03 17:04:07 +03:00
Anna Stuchlik
00ef20c3df docs: remove the auto_bootstrap option from the admin procedures - involves removing the Manual Bootstraping section 2023-01-03 14:48:01 +01:00
Anna Stuchlik
b7d62b2fc7 docs: remove the auto_bootstrap option from the procedure to replace a dead node 2023-01-03 14:47:55 +01:00
Anna Stuchlik
bc62e61df1 docs: remove the auto_bootstrap option from the Troubleshooting article about a node joining with no data 2023-01-03 14:46:38 +01:00
Anna Stuchlik
1602f27cd7 docs: remove the auto_bootstrap option from the procedure to rebuild a node after losing the data volume 2023-01-03 14:45:08 +01:00
Petr Gusev
7725e03a09 raft: raft_append_entries, copy entries to the target shard
If append_entries RPC was received on a non-zero shard, we may
need to pass it to a zero (or, potentially, some other) shard.
The problem is that raft::append_request contains entries in the form
of raft::log_entry_ptr == lw_shared_ptr<log_entry>, which doesn't
support cross-shard reference counting. In debug mode it contains
a special ref-counting facility debug_shared_ptr_counter_type,
which resorts to on_internal_error if it detects such a case.

To solve this, we just copy log entries to the target shard if it
isn't equal to the current one. In most cases, if --smp setting
is the same on all nodes, RPC will be handled on zero shard,
so there will be no overhead.
2023-01-03 15:25:00 +03:00
Petr Gusev
1c23390f12 test.py, allow to specify the node's command line in test
An optional parameter cmdline has been added to
the ManagerClient.server_add method.
It allows you to override the default parameters
set by the SCYLLA_CMDLINE_OPTIONS variable
by changing, adding or deleting individual
items. To change or add a parameter just specify
its name and value one after the other.
To remove a parameter use the special keyword
__remove__ as a value. To set a parameter
without a value (such as --overprovisioned)
use the special keyword __missing__ as the value.
2023-01-03 15:24:54 +03:00
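The override semantics described in the commit above can be sketched as follows. This is an assumption-laden illustration, not the code in test.py: the helper names `parse_options` and `merge_cmdline` are hypothetical, and the sketch assumes options are passed as a flat list of alternating names and values, with every option name starting with `--`.

```python
from typing import Dict, List, Optional

REMOVE = "__remove__"    # special value: drop the option entirely
MISSING = "__missing__"  # special value: option is a flag without a value


def parse_options(tokens: List[str]) -> Dict[str, Optional[str]]:
    """Parse ['--a', '1', '--flag'] into {'--a': '1', '--flag': None}."""
    options: Dict[str, Optional[str]] = {}
    i = 0
    while i < len(tokens):
        name = tokens[i]
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
        options[name] = tokens[i + 1] if has_value else None
        i += 2 if has_value else 1
    return options


def merge_cmdline(defaults: List[str], overrides: List[str]) -> List[str]:
    """Apply name/value override pairs on top of the default options."""
    if len(overrides) % 2 != 0:
        raise ValueError("overrides must be a list of name/value pairs")
    options = parse_options(defaults)
    for name, value in zip(overrides[::2], overrides[1::2]):
        if value == REMOVE:
            options.pop(name, None)
        elif value == MISSING:
            options[name] = None
        else:
            options[name] = value
    merged: List[str] = []
    for name, value in options.items():
        merged.append(name)
        if value is not None:
            merged.append(value)
    return merged


if __name__ == "__main__":
    # Illustrative defaults; the real SCYLLA_CMDLINE_OPTIONS may differ.
    defaults = ["--smp", "2", "--overprovisioned", "--default-log-level", "info"]
    overrides = ["--smp", "4", "--overprovisioned", REMOVE]
    print(merge_cmdline(defaults, overrides))
```

Representing a valueless flag as `None` keeps changing, adding, and removing options uniform: each is a single dictionary operation keyed by the option name.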
Nadav Har'El
eb85f136c8 cql-pytest: document how to write new cql-pytest tests
Add to test/cql-pytest/README.md an explanation of the philosophy
of the cql-pytest test suite, and some guidelines on how to write
good tests in that framework.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12400
2023-01-03 12:13:22 +02:00
Anna Stuchlik
994bc33147 docs: fix the command on the Manager-Monitoring Integration troubleshooting page
Closes #12375
2023-01-03 11:41:16 +02:00
Anna Stuchlik
9d17d812c0 docs: Fix https://github.com/scylladb/scylla-doc-issues/issues/870, update the nodetool rebuild command
Closes #12416
2023-01-03 11:40:40 +02:00
Gleb Natapov
1688163233 raft: replace experimental raft option with dedicated flag
Unlike other experimental features, we want raft to be optional even
after it leaves experimental mode. For that we need to have a separate
option to enable it. The patch adds the binary option "consistent-cluster-management"
for that.
2023-01-03 11:15:11 +02:00
Gleb Natapov
29060cc235 main: move supervisor notification about group registry start where it actually starts
99fe580068 moved the raft_group_registry::start call a bit later, but
forgot to move the supervisor notification call. Do it now.
2023-01-03 11:09:30 +02:00
Botond Dénes
2ef71e9c70 Merge 'Improve verbosity of task manager api' from Aleksandra Martyniuk
The PR introduces changes to task manager api:
- extends tasks' list returned with get_tasks with task type,
   keyspace, table, entity, and sequence number
- extends status returned with get_task_status and wait_task
   with a list of children's ids

Closes #12338

* github.com:scylladb/scylladb:
  api: extend status in task manager api
  api: extend get_tasks in task manager api
2023-01-03 10:39:41 +02:00
Botond Dénes
82101b786d Merge 'docs: document scylla-api-client' from Anna Stuchlik
Fixes https://github.com/scylladb/scylladb/issues/11999.

This PR adds a description of scylla-api-cli.

Closes #12392

* github.com:scylladb/scylladb:
  docs: fix the description of the system log POST example
  docs: uptate the curl tool name
  docs: describe how to use the scylla-api-client tool
  docs: fix the scylla-api-client tool name
  docs: document scylla-api-cli
2023-01-03 10:30:04 +02:00
Benny Halevy
63c2cdafe8 sstables: index_reader: close(index_bound&) reset current_list
When closing _lower_bound and *_upper_bound
in the final close() call, they are currently left with
an engaged current_list member.

If the index_reader uses a _local_index_cache,
it is evicted with evict_gently which will, rightfully,
see the respective pages as referenced, and they won't be
evicted gently (only later when the index_reader is destroyed).

Reset index_bound.current_list on close(index_bound&)
to free up the reference.

Ref #12271

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12370
2023-01-02 16:42:33 +01:00
Avi Kivity
767b7be8be Merge 'Get rid of handle_state_replacing' from Benny Halevy
Since [repair: Always use run_replace_ops](2ec1f719de), nodes no longer publish HIBERNATE state so we don't need to support handling it.

Replace is now always done using node operations (using repair or streaming),
so nodes are never expected to change status to HIBERNATE.

Therefore storage_service:handle_state_replacing is not needed anymore.

This series gets rid of it and updates the documentation related to STATUS:HIBERNATE accordingly.

Fixes #12330

Closes #12349

* github.com:scylladb/scylladb:
  docs: replace-dead-node: get rid of hibernate status
  storage_service: get rid of handle_state_replacing
2023-01-02 13:35:29 +02:00
Gleb Natapov
28952d32ff storage_service: move leave_ring outside of unbootstrap()
We want to reuse the latter without the call.

Message-Id: <20221228144944.3299711-17-gleb@scylladb.com>
2023-01-02 12:03:29 +02:00
Gleb Natapov
229cef136d raft: add trace logging to raft::server::start
Allows seeing the initial state of the server during start.

Message-Id: <20221228144944.3299711-15-gleb@scylladb.com>
2023-01-02 11:57:53 +02:00
Gleb Natapov
96453ff75f service: raft: improve group0_state_machine::apply logging
Trace how many entries are applied as well.

Message-Id: <20221228144944.3299711-14-gleb@scylladb.com>
2023-01-02 11:57:16 +02:00
Gleb Natapov
dbd5b97201 storage_service: improve logging in update_pending_ranges() function
We pass the reason for the change. Log it as well.

Message-Id: <20221228144944.3299711-11-gleb@scylladb.com>
2023-01-02 11:54:03 +02:00