So it can be used in code outside scylla-sstable.cc. This source file is
quite large already, and as we have yet another large chunk of code to
add, we want to add it in a separate file.
There is no point in the former implementing said interface. For one it
is a futurized interface, which is not needed for something writing to
the stdout. Rename the methods to follow the naming convention of rjson
writers more closely.
We want to split this class into two parts: one with the actual logic
converting mutation fragments to json, and a wrapper over this one,
which implements the sstable_consumer interface.
As a first step we extract the class as is (no changes) and just forward
all-calls from now empty wrapper to it.
This method was renamed from its previous name of PartitionKey. Since in
json partition keys and clustering keys look alike, with the only
difference being that the former may also have a token, it makes to have
a single method to write them (with an optional token parameter). This
was the case at some point, json_dumper::write_key() taking this role.
However at a later point, json_writer::PartitionKey() was introduced and
now the code uses both. Standardize on the latter and give it a more
generic name.
The main lambda of scylla-types, the one passed to app_template::run()
was recently made a coroytine. app_template::run() however doesn't keep
this lambda alive and hence after the first suspention point, accessing
the lambda's captures triggers use-after-free.
The simple fix is to convert the coroutine into continuation chain.
The line modified in this patch was supposed to increase the
optimization levels of parsers in debug mode to 1, because they
were too slow otherwise. But as a side effect, it also reduced the
optimization level in release mode to 1. This is not a problem
for the CQL frontend, because statement preparation is not
performance-sensitive, but it is a serious performance problem
for Alternator, where it lies in the hot path.
Fix this by only applying the -O1 to debug modes.
Fixes#12463Closes#12460
It's been a long while since we built ScyllaDB for s390x, and in
fact the last time I checked it was broken on the ragel parser
generator generating bad source files for the HTTP parser. So just
drop it from the list.
I kept s390x in the architecture mapping table since it's still valid.
Closes#12455
Aborting of repair operation is fully managed by task manager.
Repair tasks are aborted:
- on shutdown; top level repair tasks subscribe to global abort source. On shutdown all tasks are aborted recursively
- through node operations (applies to data_sync_repair_task_impls and their descendants only); data_sync_repair_task_impl subscribes to node_ops_info abort source
- with task manager api (top level tasks are abortable)
- with storage_service api and on failure; these cases were modified to be aborted the same way as the ones from above are.
Closes#12085
* github.com:scylladb/scylladb:
repair: make top level repair tasks abortable
repair: unify a way of aborting repair operations
repair: delete sharded abort source from node_ops_info
repair: delete unused node_ops_info from data_sync_repair_task_impl
repair: delete redundant abort subscription from shard_repair_task_impl
repair: add abort subscription to data sync task
tasks: abort tasks on system shutdown
Similar to the way we allow aborting streaming-based
removenode, subscribe to storage_service::_abort_source
to request abort locally and pass a shared_ptr<abort_source>
to `node_ops_info`, used to abort removenode_with_repair
on shutdown.
Fixes#12429Closes#12430
* github.com:scylladb/scylladb:
storage_service: restore_replica_count: demote status_checker related logging to debug level
storage_service: restore_replica_count: allow aborting removenode_with_repair
storage_service: coroutinize restore_replica_count
storage_service: restore_replica_count: undefer stop_status_checker
storage_service: restore_replica_count: handle exceptions from stream_async and send_replication_notification
storage_service: restore_replica_count: coroutinize status_checker
Unlike other experimental feature we want to raft to be opt in even
after it leaves experimental mode. For that we need to have a separate
option to enable it. The patch adds the binary option "consistent-cluster-management"
for that.
* 'consistent-cluster-management-flag' of github.com:scylladb/scylla-dev:
raft: replace experimental raft option with dedicated flag
main: move supervisor notification about group registry start where it actually starts
Fixes https://github.com/scylladb/scylladb/issues/12314
This PR adds the upgrade guide for ScyllaDB Enterprise - from version
2022.1 to 2022.2. Using this opportunity, I've replaced "Scylla" with
"ScyllaDB" in the upgrade-enterprise index file.
In previous releases, we added several upgrade guides - one per platform
(and version). In this PR, I've merged the information for different
platforms to create one generic upgrade guide. It is similar to what
@kbr- added for the Open Source upgrade guide from 5.0 to 5.1. See
https://docs.scylladb.com/stable/upgrade/upgrade-opensource/upgrade-guide-from-5.0-to-5.1/.
Closes#12339
* github.com:scylladb/scylladb:
docs: add the info about minor release
docs: add the new upgade guide 2022.1 to 2022.2 to the index and the toctree
docs: add the index file for the new upgrage guide from 2022.1 to 2022.2
docs: add the metrics update file to the upgrade guide 2022.1 to 2022.2
docs: add the upgrade guide for ScyllaDB Enterprise from 2022.1 to 2022.2
the status_checker is not the main line of business
of restore_replica_count, starting and stopping it
do nt seem to deserve info level logging, which
might have been useful in the past to debug issues
surrounding that.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Similar to the way we allow aborting streaming-based
removenode, subscribe to storage_service::_abort_source
to request abort locally and pass a shared_ptr<abort_source>
to `node_ops_info`, used to abort removenode_with_repair
on shutdown.
Fixes#12429
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now that all exceptions in the rest of the function
are swallowed, just execute the stop_status_checker
deferred action serially before returning, on the
wau to coroutinizing restore_replica_count (since
we can't co_await status_checker inside the deferred
action).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
On the way to coroutinizing restore_replica_count,
extract awaiting stream_async and send_replication_notification
into a try/catch blocks so we can later undefer stop_status_checker.
The exception is still returned as an exceptional future
which is logged by the caller as warning.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
There is no need to start a thread for the status_checker
and can be implemented using a background coroutine.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Timouts are benign, especially on a read-ahead that turned out to be not
needed at all. They just introduce noise in the logs, so silence them.
Fixes: #12435Closes#12441
raft_group0 used to register RPC verbs only on shard 0. This worked on
clusters with the same --smp setting on all nodes, since RPCs in this
case are processed on the same shard as the calling code, and
raft_group0 methods only run on shard 0.
A new test test_nodes_with_different_smp was added to identify the
problem. Since --smp can only be specified via the command line, a
corresponding parameter was added to the ManagerClient.server_add
method. It allows to override the default parameters set by the
SCYLLA_CMDLINE_OPTIONS variable by changing, adding or deleting
individual items.
Fixes: #12252Closes#12374
* github.com:scylladb/scylladb:
raft: raft_group0, register RPC verbs on all shards
raft: raft_append_entries, copy entries to the target shard
test.py, allow to specify the node's command line in test
Due to lack of NDEBUG macro inlining was disabled. It's
important for parsing and printing performance.
Testing with perf_simple_query shows that it reduced around
7000 insns/op, thus increasing median tps by 4.2% for the alternator frontend.
Because inlined functions are called for every character
in json this scales with request/response size. When
default write size is increased by around 7x (from ~180 to ~ 1255
bytes) then the median tps increased by 12%.
Running:
./build/release/test/perf/perf_simple_query_g --smp 1 \
--alternator forbid --default-log-level error \
--random-seed=1235000092 --duration=60 --write
Results before the patch:
median 46011.50 tps (197.1 allocs/op, 12.1 tasks/op, 170989 insns/op, 0 errors)
median absolute deviation: 296.05
maximum: 46548.07
minimum: 42955.49
Results after the patch:
median 47974.79 tps (197.1 allocs/op, 12.1 tasks/op, 163723 insns/op, 0 errors)
median absolute deviation: 303.06
maximum: 48517.53
minimum: 44083.74
The change affects both json parsing and printing.
Closes#12440
Run tests for parallelized aggregation with
`enable_parallelized_aggregation` set always to true, so the tests work
even if the default value of the option is false.
Closes#12409
* seastar 3db15b5681...ca586cfb8d (28):
> reactor: trim returned buffer to received number of bytes
> util/process: include used header
> build: drop unused target_include_directories()
> build: use BUILD_IN_SOURCE instead chdir <SOURCE_DIR>
> build: specify CMake policy CMP0135 to new
> tests: only destroy allocated pending connections
> build: silence the output when generating private keys
> tests, httpd: Limit loopback connection factory sharding
> lw_shared_ptr: Add nullptr_t comparing operators
> noncopyable_function: Add concept for (Func func) constructor
> reactor: add process::terminate() and process::kill()
> Merge 'tests, include: include headers without ".." in path' from Kefu Chai
> build: customize toolset for building Boost
> build: use different toolset base on specified compiler
> allocator: add an option to reserve additional memory for the OS
> Merge 'build: pass cflags and ldflags to cooking.sh' from Kefu Chai
> build: build static library of cryptopp
> gate: add gate holders debugging
> build: detect debug build of yaml-cpp also
> build: do not use pkg_search_module(IMPORTED_TARGET) for finding yaml-cpp
> build: bump yaml-cpp to 0.7.0 in cooking_recipe
> build: bump cryptopp to 8.7.0 in cooking_recipe
> build: bump boost to 1.81.0 in cooking_recipe
> build: bump fmtlib to 9.1.0 in cooking_recipe
> shared_ptr: add overloads for fmt::ptr()
> chunked_fifo: const_iterator: use the base class ctor
> build: s/URING_LIBARIES/URING_LIBRARIES/
> build: export the full path of uring with URING_LIBRARIES
Closes#12434
test_ssl
In very slow debug builds the default driver timeouts are too low and
tests might fail. Bump up the values to a more reasonable time.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes#12408
The alternator compatibility.md document mentions the missing ACL
(access control) feature, but unlike other missing features we
forgot to link to the open issue about this missing feature.
So let's add that link.
Refs #5047.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12399
Fixes https://github.com/scylladb/scylladb/issues/12318
This PR removes all occurrences of the `auto_bootstrap` option in the docs.
In most cases, I've simply removed the option name and its definition, but sometimes additional changes were necessary:
- In node-joined-without-any-data.rst, I removed the `auto_bootstrap `option as one of the causes of the problem.
- In rebuild-node.rst, I removed the first step in the procedure (enabling the `auto_bootstrap `option).
- In admin. rst, I removed the section about manual bootstrapping - it's based on setting `auto_bootstrap` to false, which is not possible now.
Closes#12419
* github.com:scylladb/scylladb:
docs: remove the auto_bootstrap option from the admin procedures - involves removing the Manual Bootstraping section
docs: remove the auto_bootstrap option from the procedure to replace a dead node
docs: remove the auto_bootstrap option from the Troubleshooting article about a node joining with no data
docs: remove the auto_bootstrap option from the procedure to rebuild a node after losing the data volume
docs: remove the auto_bootstrap option from the procedures to create a cluster or add a DC
raft_group0 used to register RPC verbs only on shard 0.
This worked on clusters with the same --smp setting on
all nodes, since RPCs in this case are (usually)
processed on the same shard as the calling code,
and raft_group0 methods only run on shard 0.
A new test test_nodes_with_different_smp was added
to identify the problem.
Fixes: #12252
If append_entries RPC was received on a non-zero shard, we may
need to pass it to a zero (or, potentially, some other) shard.
The problem is that raft::append_request contains entries in the form
of raft::log_entry_ptr == lw_shared_ptr<log_entry>, which doesn't
support cross-shard reference counting. In debug mode it contains
a special ref-counting facility debug_shared_ptr_counter_type,
which resorts to on_internal_error if it detects such a case.
To solve this, we just copy log entries to the target shard if it
isn't equal to the current one. In most cases, if --smp setting
is the same on all nodes, RPC will be handled on zero shard,
so there will be no overhead.
An optional parameter cmdline has been added to
the ManagerClient.server_add method.
It allows you to override the default parameters
set by the SCYLLA_CMDLINE_OPTIONS variable
by changing, adding or deleting individual
items. To change or add a parameter just specify
its name and value one after the other.
To remove parameter use the special keyword
__remove__ as a value. To set a parameter
without a value (such as --overprovisioned)
use the special keyword __missing__ as the value.
Add to test/cql-pytest/README.md an explanation of the philosophy
of the cql-pytest test suite, and some guideliness on how to write
good tests in that framework.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12400
Unlike other experimental feature we want to raft to be optional even
after it leaves experimental mode. For that we need to have a separate
option to enable it. The patch adds the binary option "consistent-cluster-management"
for that.
The PR introduces changes to task manager api:
- extends tasks' list returned with get_tasks with task type,
keyspace, table, entity, and sequence number
- extends status returned with get_task_status and wait_task
with a list of children's ids
Closes#12338
* github.com:scylladb/scylladb:
api: extend status in task manager api
api: extend get_tasks in task manager api
Fixes https://github.com/scylladb/scylladb/issues/11999.
This PR adds a description of scylla-api-cli.
Closes#12392
* github.com:scylladb/scylladb:
docs: fix the description of the system log POST example
docs: uptate the curl tool name
docs: describe how to use the scylla-api-client tool
docs: fix the scylla-api-client tool name
docs: document scylla-api-cli
When closing _lower_bound and *_upper_bound
in the final close() call, they are currently left with
an engaged current_list member.
If the index_reader uses a _local_index_cache,
it is evicted with evict_gently which will, rightfully,
see the respective pages as referenced, and they won't be
evicted gently (only later when the index_reader is destroyed).
Reset index_bound.current_list on close(index_bound&)
to free up the reference.
Ref #12271
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#12370
Since [repair: Always use run_replace_ops](2ec1f719de), nodes no longer publish HIBERNATE state so we don't need to support handling it.
Replace is now always done using node operations (using repair or streaming).
so nodes are never expected to change status to HIBERNATE.
Therefore storage_service:handle_state_replacing is not needed anymore.
This series gets rid of it and updates documentation related to STATUS:HIBERNATE respectively.
Fixes#12330Closes#12349
* github.com:scylladb/scylladb:
docs: replace-dead-node: get rid of hibernate status
storage_service: get rid of handle_state_replacing