According to seastar/doc/lambda-coroutine-fiasco.md lambda that
co_awaits once loses its capture frame. In distrobuted_loader
code there's at least one of that kind.
fixes: #12175
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#12170
The method became unused since 70e5252a (table: no longer accept online
loading of SSTable files in the main directory) and the whole concept of
reshuffling sstables was dropped later by 7351db7c (Reshape upload files
and reshard+reshape at boot).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#12165
In a recent commit 757d2a4, we removed the "xfail" mark from the test
test_manual_requests.py::test_too_large_request_content_length
because it started to pass on more modern versions of Python, with a
urllib3 bug fixed.
Unfortunately, the celebration was premature: It turns out that although
the test now *usually* passes, it sometimes fails. This is caused by
a Seastar bug scylladb/seastar#1325, which I opened #12166 to track
in this project. So unfortunately we need to add the "xfail" mark back
to this test.
Note that although the test will now be marked "xfail", it will actually
pass most of the time, so will appear as "xpass" to people run it.
I put a note in the xfail reason string as a reminder why this is
happening.
Fixes#12143
Refs #12166
Refs scylladb/seastar#1325
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12169
The field was not used for anything. We can keep decommissioned server
in `stopped` field.
In fact it caused us a problem: since recently, we're using
`ScyllaCluster.uninstall` to clean-up servers after test suite finishes
(previously we were using `ScyllaServer.uninstall` directly). But
`ScyllaCluster.uninstall` didn't look into the `decommissioned` field,
so if a server got decommissioned, we wouldn't uninstall it, and it left
us some unnecessary artifacts even for successful tests. This is now
fixed.
Closes#12163
Mainly this PR removes global db::config and feature service that are used by sstables::test_env as dependencies for embedded sstables_manager. Other than that -- drop unused methods, remove nested test_env-s and relax few cases that use two temp dirs at a time for no gain.
Closes#12155
* github.com:scylladb/scylladb:
test, utils: Use only one tempdir
sstable_compaction_test: Dont create nested envs
mutation_reader_test: Remove unused create_sstable() helper
tests, lib: Move globals onto sstables::test_env
tests: Use sstables::test_env.db_config() to access config
features: Mark feature_config_from_db_config const
sstable_3_x_test: Use env method to create sst
sstable_3_x_test: Indentation fix after previous patch
sstable_3_x_test: Use sstable::test_env
test: Add config to sstable::test_env creation
config: Add constexpr value for default murmur ignore bits
There's a do_with_cloned_tmp_directory that makes two temp dirs to toss
sstables between them. Make it go with just one, all the more so it
would resemble existing manipulations aroung staging/ subdir
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The "compact" test case runs in sstables::test_env and additionally
wraps it with another instance provided by do_with_tmp_directory helper.
It's simpler to create the temp dir by hand and use outter env.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a bunch of objects that are used by test_env as sstables_manager
dependencies. Now when no other code needs those globals they better sit
on the test_env next to the manager
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently some places use global test config, but it's going to be
removed soon, so switch to using config from environment
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's in fact such. Other than that, next patch will call it with const
config at hand and fail to compile without this fix
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are several cases there that construct sstables_manager by hand
with the help of a bunch of global dependencies. It's nicer to use
existing wrapper.
(indentation left broken until next patch)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
To make callers (tests) construct it with different options. In
particular, one test will soon want to construct it with custom large
data handler of its own.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
... and use in some places of sstable_compaction_test. This will allow
getting rid of global test_db_config thing later
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The PR introduces shard_repair_task_impl which represents a repair task
that spans over a single shard repair.
repair_info is replaced with shard_repair_task_impl, since both serve
similar purpose.
Closes#12066
* github.com:scylladb/scylladb:
repair: reindent
repair: replace repair_info with shard_repair_task_impl
repair: move repair_info methods to shard_repair_task_impl
repair: rename methods of repair_module
repair: change type of repair_module::_repairs
repair: keep a reference to shard_repair_task_impl in row_level_repair
repair: move repair_range method to shard_repair_task_impl
repair: make do_repair_ranges a method of shard_repair_task_impl
repair: copy repair_info methods to shard_repair_task_impl
repair: corutinize shard task creation
repair: define run for shard_repair_task_impl
repair: add shard_repair_task_impl
This is the core of dynamic IP address support in Raft, moving out the
IP address sourcing from Raft Group 0 configuration to gossip. At start
of Raft, the raft id <> IP address translation map is tuned into the
gossiper notifications and learns IP addresses of Raft hosts from them.
The series intentionally doesn't contain the part which speeds up the
initial cluster assembly by persisting the translation cache and using
more sources besides gossip (discovery, RPC) to show correctness of the
approach.
Closes#12035
* github.com:scylladb/scylladb:
raft: (rpc) do not throw in case of a missing IP address in RPC
raft: (address map) actively maintain ip <-> raft server id map
Since we switched scylla-machine-image locale to C.UTF-8 because
ubuntu-minimal image does not have en_US.UTF-8 by default, we should
do same on our docker image to reduce image size.
Verified #9570 does not occur on new image, since it is still UTF-8
locale.
Closes#12122
The site member is created in ScyllaCluster.start(), for startup failure
this might not be initialized, so check it's present before stop()ing
it. And delete it as it's not running and proper initialization should
call ScyllaCluster.start().
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes#11939
A question was raised on what fetch_size (the requested page size
in a paged scan) counts when there is a filter: does it count the
rows before filtering (as scanned from disk) or after filter (as
will be returned to the client)?
This patch adds a test which demonstrates that Cassandra and Scylla
behave differently in this respect: Cassandra counts post-filtering -
so fetch_size results are actually returned, while Scylla currently
counts pre-filtering.
It is arguable which behavior is the "correct" one - we discuss this in
issue #12102. But we have already had several users (such as #11340)
who complained about Scylla's behavior and expected Cassandra's behavior,
so if we decide to keep Scylla's behavior we should at least explain and
justify this decision in our documentation. Until then, let's have this
test which reminds us of this incompatibility. This test currently passes
on Cassandra and fails (xfail) on Scylla.
Refs #11340
Refs #12102
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12103
This patch adds a regression test for the old issue #65 which is about
a multi-column (tuple) clustering-column relation in a SELECT when one
these columns has reversed order. It turns out that we didn't notice,
but this issue was already solved - but we didn't have a regression test
for it. So this patch adds just a regression test. The test confirms that
Scylla now behaves like was desired when that issue was opened. The test
also passes on Cassandra, confirming that Scylla and Cassandra behave
the same for such requests.
Fixes#65
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12130
Better logging, less code, a minor fix.
Closes#12135
* github.com:scylladb/scylladb:
service/raft: raft_group0: less repetitive logging calls
service/raft: raft_group0: fix sleep_with_exponential_backoff
Remove raft_address_map::get_inet_address()
While at it, coroutinize some rpc mehtods.
To propagate up the event of missing IP address, use coroutine::exception(
with a proper type (raft::transport_error) and a proper error message.
This is a building block from removing
raft_address_map::get_inet_address() which is too generic, and shifting
the responsibility of handling missing addresses to the address map
clients. E.g. one-way RPC shouldn't throw if an address is missing, but
just drop the message.
PS An attempt to use a single template function rendered to be too
complex:
- some functions require a gate, some don't
- some return void, some future<> and some future<raft::data_type>
1) make address map API flexible
Before this patch:
- having a mapping without an actual IP address was an
internal error
- not having a mapping for an IP address was an internal
error
- re-mapping to a new IP address wasn't allowed
After this patch:
- the address map may contain a mapping
without an actual IP address, and the caller must be prepared for it:
find() will return a nullopt. This happens when we first add an entry
to Raft configuration and only later learn its IP address, e.g. via
gossip.
- it is allowed to re-map an existing entry to a new address;
2) subscribe to gossip notifications
Learning IP addresses from gossip allows us to adjust
the address map whenever a node IP address changes.
Gossiper is also the only valid source of re-mapping, other sources
(RPC) should not re-map, since otherwise a packet from a removed
server can remap the id to a wrong address and impact liveness of a Raft
cluster.
3) prompt address map state with app state
Initialize the raft address map with initial
gossip application state, specifically IPs of members
of the cluster. With this, we no longer need to store
these IPs in Raft configuration (and update them when they change).
The obvious drawback of this approach is that a node
may join Raft config before it propagates its IP address
to the cluster via gossip - so the boot process has to
wait until it happens.
Gossip also doesn't tell us which IPs are members of Raft configuration,
so we subscribe to Group0 configuration changes to mark the
members of Raft config "non-expiring" in the address translation
map.
Thanks to the changes above, Raft configuration no longer
stores IP addresses.
We still keep the 'server_info' column in the raft_config system table,
in case we change our mind or decide to store something else in there.
Some log messages in retry loops in the Raft upgrade procedure included
a sentence like "sleeping before retrying..."; but not all of them.
With the recently added `sleep_with_exponential_backoff` abstraction we
can put this "sleeping..." message in a single place, and it's also easy
to say how long we're going to sleep.
I also enjoy using this `source_location` thing.
The SELECT JSON statement, just like SELECT, allows the user to rename
selected columns using an "AS" specification. E.g., "SELECT JSON v AS foo".
This specification was not honored: We simply forgot to look at the
alias in SELECT JSON's implementation (we did it correctly in regular
SELECT). So this patch fixes this bug.
We had two tests in cassandra_tests/validation/entities/json_test.py
that reproduced this bug. The checks in those tests now pass, but these
two tests still continue to fail after this patch because of two other
unrelated bugs that were discovered by the same tests. So in this patch
I also add a new test just for this specific issue - to serve as a
regression test.
Fixes#8078
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12123
* seastar 4f4cc00660...3a5db04197 (16):
> tls: add missing include <map>
> Merge 'util/process: use then_unpack to help automatically unpack tuple.' from Jianyong Chen
> HTTP: define formatter for status_type to fix build.
> fsnotifier: move it into namespace experimental and add docs.
> Move fsnotify.hh to the 'include' directory for public use.
> Merge 'reactor: define make_pipe() and use make_pipe() in reactor::spawn()' from Kefu Chai
> Merge 'Fix: error when compiling http_client_demo' from Amossss
> util/process: using `data_sink_impl::put`
> Merge 'dns: serialize UDP sends.' from Calle Wilund
> build: use correct version when finding liburing
> Merge 'Add simple http client' from Pavel Emelyanov
> future: use invoke_result instead of nested requirements
> Merge 'reactor: use separate calls in reactor and reactor_backend for read/write/sendmsg/recvmsg' from Kefu Chai
> util, core: add spawn_process() helper
> parallel utils: add note about shard-local parallelism
> shared_mutex: return typed exceptional future in with_* error handlers
Closes#12131
Some of the tests in test/alternator/test_ttl.py need an expiration scan
pass to complete and expire items. In development builds on developer
machines, this usually takes less than a second (our scanning period is
set to half a second). However, in debug builds on Jenkins each scan
often takes up to 100 (!) seconds (this is the record we've seen so far).
This is why we set the tests' timeout to 120.
But recently we saw another test run failing. I think the problem is
that in some case, we need not one, but *two* scanning passes to
complete before the timeout: It is possible that the test writes an
item right after the current scan passed it, so it doesn't get expired,
and then we a second scan at a random position, possibly making that
item we mention one of the last items to be considered - so in total
we need to wait for two scanning periods, not one, for the item to
expire.
So this patch increases the timeout from 120 seconds to 240 seconds -
more than twice the highest scanning time we ever saw (100 seconds).
Note that this timeout is just a timeout, it's not the typical test
run time: The test can finish much more quickly, as little as one
second, if items expire quickly on a fast build and machine.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12106
Fix some issues found with gcc 12. Note we can't fully compile with gcc yet, due to [1].
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98056Closes#12121
* github.com:scylladb/scylladb:
utils: observer: qualify seastar::noncopyable_function
sstables: generation_type: forgo constexpr on hash of generation_type
logalloc: disambiguate types and non-type members
task_manager: disambiguate types and non-type members
direct_failure_detector: don't change meaning of endpoint_liveness
schema: abort on illegal per column computation kind
database: abort on illegal per partition rate limit operation
mutation_fragment: abort on illegal fragment type
per_partition_rate_limit_options: abort on illegal operation type
schema: drop unused lambda
mutation_partition: drop unused lambda
cql3: create_index_statement: remove unused lambda
transport: prevent signed and unsigned comparison
database: don't compare signed and unsigned types
raft: don't compare signed and unsigned types
compaction: don't compare signed and unsigned compaction counts
bytes_ostream: don't take reference to packed variable
Test identifiers are very unique, but this makes them less
useful in Jenkins Test Result Analyzer view. For example,
counter_test can be counter_test.432 in one run and counter_test.442
in another. Jenkins considers them different and so we don't see
a trend.
Limit the id uniqueness within a test case, so that we'll have
counter_test.{1, 2, 3} consistently. Those test will be grouped
together so we can see pass/fail trends.
Closes#11946
Fix https://github.com/scylladb/scylla-docs/issues/4126Closes#11122
* github.com:scylladb/scylladb:
doc: add info about the time-consuming step due to resharding
doc: add the new KB to the toctree
doc: doc: add a KB about updating the mode in perftune.yaml after upgrade
Our `null` expression, after the prepare stage, is redundant with a
`constant` expression containing the value NULL.
Remove it. Its role in the unprepared stage is taken over by
untyped_constant, which gains a new type_class enumeration to
represent it.
Some subtleties:
- Usually, handling of null and untyped_constant, or null and constant
was the same, so they are just folded into each other
- LWT "like" operator now has to discriminate between a literal
string and a literal NULL
- prepare and test_assignment were folded into the corresponing
untyped_constant functions. Some care had to be taken to preserve
error messages.
Closes#12118
Currently, each data sync repair task is started (and hence run) twice.
Thus, when two running operations happen within a time frame long
enough, the following situation may occur:
- the first run finishes
- after some time (ttl) the task is unregistered from the task manager
- the second run finishes and attempts to finish the task which does
not exist anymore
- memory access causes a segfault.
The second call to start is deleted. A check is added
to the start method to ensure that each task is started at most once.
Fixes: #12089Closes#12090
In ad3d2ee47d, we replaced `bool` as an expression element
(representing a boolean constant) with `constant`. But a comment
and a concept continue to mention it.
Remove the comment and the concept fragment.
Closes#12119
Recent changes in topology restricted the get_dc/get_rack calls. Older
code was trying to locate the endpoint in gossiper, then in system
keyspace cache and if the endpoint was not found in both -- returned
"default" location.
New code generates internal error in this case. This approach already
helped to spot several BUGs in code that had been eventually fixed, but
echoes of that change still pop up.
This patch relaxes the "missing endpoint" case by printing a warning in
logs and returning back the "default" location like old code did.
tests: update_cluster_layout_tests.py::*
hintedhandoff_additional_test.py::TestHintedHandoff::test_hintedhandoff_rebalance
bootstrap_test.py::TestBootstrap::test_decommissioned_wiped_node_can_join
bootstrap_test.py::TestBootstrap::test_failed_bootstap_wiped_node_can_join
materialized_views_test.py::TestMaterializedViews::test_decommission_node_during_mv_insert_4_nodes
refs: #11900
refs: #12054fixes: #11870
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#12067
std::hash isn't constexpr, so gcc refuses to make hash of generation_type
constexpr. It's pointless anyway since we never have a compile-time
sstable generation.
logalloc::tracker has some members with the same names as types from
namespace scope. gcc (rightfully) complains that this changes
the meaning of the name. Qualify the types to disambiguate.
task_manager has some members with the same names as types from
namespace scope. gcc (rightfully) complains that this changes
the meaning of the name. Qualify the types to disambiguate.