The option accepts a taskset-style cpulist and restricts the launched tests
to it. When specified, the default number of jobs is adjusted
accordingly; if --jobs is given, it overrides this default as expected.
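A minimal sketch of deriving the default job count from such a cpulist (the function name and the parsing shape are illustrative, not the actual test-runner code):

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Parse a taskset-style cpulist such as "0-3,7" into the list of CPU ids.
// The number of ids can then serve as the derived default for --jobs.
std::vector<int> parse_cpulist(const std::string& spec) {
    std::vector<int> cpus;
    std::istringstream in(spec);
    std::string range;
    while (std::getline(in, range, ',')) {
        auto dash = range.find('-');
        int first = std::stoi(range.substr(0, dash));
        int last = dash == std::string::npos ? first
                                             : std::stoi(range.substr(dash + 1));
        for (int cpu = first; cpu <= last; ++cpu) {
            cpus.push_back(cpu);
        }
    }
    return cpus;
}
```

With this, "--cpus 0-3,7" would yield a default of 5 jobs unless --jobs overrides it.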
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The next patch will need to know whether the --jobs option was specified or
the caller is OK with the default. One way to achieve this is to keep 0 as
the initial value and fill in the real default afterwards.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
shared_promise::get_shared_future() is marked noexcept, but can
allocate memory. It is invoked by the sstable partition index cache inside
an allocating section, which means the allocation can throw
bad_alloc even though there is memory to reclaim, i.e. under normal
conditions.
Fix by allocating the shared_promise in stable memory, in the
standard allocator via lw_shared_ptr<>, so that it can be accessed outside
the allocating section.
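A sketch of the fix, with std::shared_ptr and std::promise standing in for Seastar's lw_shared_ptr and shared_promise (names here are illustrative, not the real cache code):

```cpp
#include <cassert>
#include <future>
#include <memory>

// The promise is allocated in stable memory by the standard allocator, so
// the cache entry can hand out futures without allocating inside the
// cache's allocating section.
struct index_cache_entry {
    // Stable memory: lives outside the cache-managed region and survives
    // its memory reclamation.
    std::shared_ptr<std::promise<int>> loading =
        std::make_shared<std::promise<int>>();

    std::future<int> get_loading_future() {
        // Safe outside the allocating section: the promise is kept alive by
        // the shared pointer regardless of what the cache does.
        return loading->get_future();
    }
};
```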
Fixes #9666
Tests:
- build/dev/test/boost/sstable_partition_index_cache_test
Message-Id: <20211122165100.1606854-1-tgrabiec@scylladb.com>
Indexed queries use paging over the materialized view
table. Results of the view read are then used to issue reads of the
base table. If a base table read is a short read, the page is returned
to the user and the paging state is adjusted accordingly, so that when
paging is resumed it will query the view starting from the row
corresponding to the next base row which was not yet
returned. However, the paging state's "remaining" count was not reset, so
if the view read was exhausted, reading would stop even though the
base table read was short.
Fix by restoring the "remaining" count when adjusting the paging state
on a short read.
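A hypothetical condensation of the fix (the struct and function are illustrative, not the real paging-state API): when a short read cuts the page, the state is rewound to the first unreturned view row and the "remaining" count must be restored as well, or an exhausted view read would terminate paging while base rows are still pending.

```cpp
#include <cassert>
#include <cstdint>

struct paging_state {
    uint32_t remaining;  // rows the query may still return
};

void adjust_for_short_read(paging_state& ps, uint32_t remaining_before_page) {
    // Before the fix, "remaining" kept its post-read value (possibly 0),
    // so resumed paging could stop early; restoring the pre-page value
    // lets the next page continue the view query.
    ps.remaining = remaining_before_page;
}
```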
Tests:
- index_with_paging_test
- secondary_index_test
Fixes #9198
Message-Id: <20210818131840.1160267-1-tgrabiec@scylladb.com>
First, it doesn't test the gossiper, so
it's unclear why we have it at all.
And it doesn't test anything more than what we already test
using cql_test_env.
For testing gossip there is test/manual/gossip.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211122081305.789375-2-bhalevy@scylladb.com>
This series contains fixes for non-voting member handling in the stepdown
procedure and the stable leader check.
* scylla-dev/raft-stepdown-fixes-v2:
raft: handle non voting members correctly in stepdown procedure
raft: exclude non voting nodes from the stable leader check
raft: fix configuration::can_vote() to work correctly with a joint config
To avoid back-calling the system_keyspace from the messaging layer,
let the system_keyspace get the preferred IPs vector and pass it
down to the messaging_service.
This is part of the effort to deglobalize the system keyspace
and the query context.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211119143523.3424773-1-bhalevy@scylladb.com>
The patch also removes the usage of map_reduce(), because it is no longer needed
after 6191fd7701, which dropped futures from the view mutation building path.
The patch still preserves the yielding point that map_reduce() provided, by
calling coroutine::maybe_yield() explicitly.
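The resulting shape can be sketched like this (a plain-function stand-in: in the real code maybe_yield() is seastar::coroutine::maybe_yield() and is co_await'ed to yield to the reactor; the loop body here is illustrative):

```cpp
#include <cassert>
#include <vector>

// Counting stub standing in for seastar::coroutine::maybe_yield().
static int yield_points = 0;
void maybe_yield() { ++yield_points; }

// Replacing map_reduce() with a plain loop keeps the path future-free while
// still providing one explicit preemption point per element.
int build_view_updates(const std::vector<int>& mutations) {
    int built = 0;
    for (int m : mutations) {
        built += m;     // stand-in for building one view update
        maybe_yield();  // explicit yielding point, as in the patch
    }
    return built;
}
```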
Message-Id: <YZoV3GzJsxR9AZfl@scylladb.com>
"
After this series, compaction will finally stop including database.hh.
tests: unit(debug).
"
* 'stop_including_database_hh_for_compaction' of github.com:raphaelsc/scylla:
compaction: stop including database.hh
compaction: switch to table_state in get_fully_expired_sstables()
compaction: switch to table_state
compaction: table_state: Add missing methods required by compaction
Make the compaction procedure switch to table_state. The only function in
compaction.cc still directly using table is
get_fully_expired_sstables(T,...); subsequently we'll make it
switch to table_state too, and then we can finally stop including database.hh
in the compaction code.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
These are the only methods left for compaction to switch to
table_state, so compaction can finally stop including database.hh
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
"
Add a sharded locator::effective_replication_map_factory that holds
shared effective_replication_maps.
To search for e_r_m in the factory, we use a compound `factory_key`:
<replication_strategy type, replication_strategy options, token_metadata ring version>.
Start the sharded factory in main (plus cql_test_env and tools/schema_loader)
and pass a reference to it to storage_proxy and storage_server.
For each keyspace, use the registry to create the effective_replication_map.
When registered, effective_replication_map objects erase themselves
from the factory when destroyed. effective_replication_map then schedules
a background task to clear_gently its contents, protected by the e_r_m_f::stop()
function.
Note that for non-shard 0 instances, if the map
is not found in the registry, we construct it
by cloning the precalculated replication_map
from shard 0 to save the cpu cycles of re-calculating
it time and again on every shard.
Test: unit(dev), schema_loader_test(debug)
DTest: bootstrap_test.py:TestBootstrap.decommissioned_wiped_node_can_join_test update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_new_node_while_schema_changes_with_repair_test (dev)
"
* tag 'effective_replication_map_factory-v7' of https://github.com/bhalevy/scylla:
effective_replication_map: clear_gently when destroyed
database: shutdown keyspaces
test: cql_test_env: stop view_update_generator before database shuts down
effective_replication_map_factory: try cloning replication map from shard 0
tools: schema_loader: start a sharded erm_factory
storage_service: use erm_factory to create effective_replication_map
keyspace: use erm_factory to create effective_replication_map
effective_replication_map: erase from factory when destroyed
effective_replication_map_factory: add create_effective_replication_map
effective_replication_map: enable_lw_shared_from_this
effective_replication_map: define factory_key
keyspace: get a reference to the erm_factory
main: pass erm_factory to storage_service
main: pass erm_factory to storage_proxy
locator: add effective_replication_map_factory
It turns out most of the regular writer can be reused by the GC writer, so let's
merge the latter into the former. We gain a lot of simplification,
lots of duplication is removed, and additionally, the GC writer can now
be enabled with the interposer, as it can be created on demand by
each interposer consumer (will be done in a later patch).
Refs #6472.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211119120841.164317-1-raphaelsc@scylladb.com>
Prevent reactor stalls by gently clearing the replication_map
and token_metadata_ptr when the effective_replication_map is
destroyed.
This is done in the background, protected by the
effective_replication_map_factory::stop() method.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Release the keyspace effective_replication_map during
shutdown so that the effective_replication_map_factory
can be stopped cleanly with no outstanding e_r_m:s.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We can't have view updates happening after the database shuts down.
In particular, mutateMV depends on the keyspace effective_replication_map,
which is going to be released when all keyspaces shut down, in the next patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Calculating a new effective_replication_map on each shard
is expensive. To avoid that, use the factory key to
look up an e_r_m on shard 0 and, if found, use it to clone
its replication map and make the shard-local
e_r_m copy from that.
In the future, we may want to improve that in 2 ways:
- instead of always going to shard 0, use hash(key) % smp::count
to create the first copy.
- make full copies only on NUMA nodes and keep a shared pointer
on all other shards.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This is required for an upcoming change to create effective_replication_map
on all shards in storage_service::replication_to_all_cores.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Instead of calculating the effective_replication_map
in replicate_to_all_cores, use effective_replication_map_factory::
create_effective_replication_map.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The effective_replication_map_factory keeps naked pointers
to outstanding effective_replication_map:s.
These are kept valid using a shared effective_replication_map_ptr.
When the last shared ptr reference is dropped, the effective_replication_map
object is destroyed, therefore the raw pointer to it in the factory
must be erased.
This now happens in ~effective_replication_map when the object
is marked as registered.
Registration happens when the effective_replication_map_factory inserts
the newly created effective_replication_map into its _replication_maps
map and calls effective_replication_map::set_factory().
Note that an effective_replication_map may be created temporarily
and not inserted into the factory's map, therefore erase
is called only when required.
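A minimal sketch of this ownership scheme (all names are illustrative, not the real Scylla API): the factory holds raw pointers, liveness is governed by the shared pointers handed to users, and the destructor erases the entry only if the object was actually registered.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct erm;

struct erm_factory {
    std::map<std::string, erm*> replication_maps;  // naked pointers
};

struct erm {
    std::string key;
    erm_factory* factory = nullptr;  // set only on registration

    ~erm() {
        if (factory) {               // temporaries were never registered
            factory->replication_maps.erase(key);
        }
    }
};

std::shared_ptr<erm> create_registered(erm_factory& f, std::string key) {
    auto map = std::make_shared<erm>();
    map->key = key;
    map->factory = &f;               // mirrors set_factory()
    f.replication_maps[std::move(key)] = map.get();
    return map;
}
```

When the last shared_ptr reference goes away, the destructor removes the raw pointer from the factory, so the factory never observes a dangling entry.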
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Make a factory key using the replication_strategy type
and config options, plus the token_metadata ring version,
and use it to search for an already-registered effective_replication_map.
If not found, calculate a new effective_replication_map
and register it using the above key.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
So that an effective_replication_map_ptr can be generated
from a raw pointer by the effective_replication_map_factory.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To be used to locate the effective_replication_map
in the to-be-introduced effective_replication_map_factory.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To be used for creating the effective_replication_map
when token_metadata changes, and updating all
keyspaces with it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It will later be used to create shared copies
of effective_replication_map based on the replication_strategy
type and config options.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Similar to other timeout handling paths, there is no need to print an
ERROR for a timeout, as the error is not returned anyhow.
Eventually the error will be reported at the query level,
when the query times out or fails in any other way.
Also, similar to `storage_proxy::mutate_end`, traces were added
for the error cases as well.
FWIW, these extraneous timeout errors cause dtest failures,
e.g. alternator_tests:AlternatorTest.test_slow_query_logging.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211118153603.2975509-1-bhalevy@scylladb.com>
We're using a coarse resolution when rounding the clock time for sstables to
be evenly distributed across time buckets. Let's use a finer resolution,
to make sure sstables won't fall on the bucket edges.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211118172126.34545-1-raphaelsc@scylladb.com>
The test checks every 100 * smp::count milliseconds that a shard
has been able to make at least one step. Shards, in turn, take up
to 100 ms sleeping breaks between steps. It seems that on heavily
loaded nodes the checking period is too small and the test's
stuck-detector fires false positives.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20211118154932.25859-1-xemul@scylladb.com>
I intentionally store lambdas in variables and pass them to
with_scheduling_group using std::ref. Coroutines don't put variables
captured by lambdas on the coroutine frame. If the lambda containing them is
not stored, the captured variables will be lost, resulting in stack/heap
use-after-free errors. An alternative is to capture variables, then create
local variables inside the lambda bodies that contain a copy/moved version
of the captured ones. For example, if the post_flush lambda weren't
stored in a dedicated variable, it wouldn't be put on the coroutine
frame. At the first co_await inside of it, the lambda object, along with the
variables captured by it (old and &newtabs created inside the square
brackets), would go away. The underlying objects (e.g. newtabs created in
the outer scope) would still be valid, but the references to them would be
gone, causing most of the tests to fail.
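The pattern can be illustrated with plain functions (the real code uses Seastar coroutines and with_scheduling_group; run_in_group() here is a stand-in): store the lambda in a named local so the closure object, and thus its captures, stays alive for the whole call, and pass it via std::ref to avoid copying it.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Stand-in for with_scheduling_group(): invokes the callable it is given.
template <typename Func>
auto run_in_group(Func&& f) {
    return f();
}

int flush_and_count(std::vector<int>& newtabs) {
    // The closure is a named local, so its captures stay valid for the
    // whole call; in a coroutine, a temporary lambda argument would not be
    // kept on the coroutine frame across a co_await.
    auto post_flush = [&newtabs] {
        newtabs.push_back(1);
        return static_cast<int>(newtabs.size());
    };
    // std::ref passes a reference to the stored closure instead of a copy.
    return run_in_group(std::ref(post_flush));
}
```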
Message-Id: <20211118131441.215628-2-mikolaj.sieluzycki@scylladb.com>
The previous implementation based on `delivery_queue` had a serious
defect: if receiving a message (`rpc::receive`) blocked, other messages
in the queue had to wait. This would cause, for example, `vote_request`
messages to stop being handled by a server if the server was in the middle
of applying a snapshot.
Now `rpc::receive` returns `void`, not `future<>`. Thus we no longer
need `delivery_queue`: the network message delivery function can simply
call `rpc::receive` directly. Messages which require asynchronous work
to be performed (such as snapshot application) are handled in
`rpc::receive` by spawning a background task. The number of such
background tasks is limited separately for each message type; now if
we exceed that limit, we drop other messages of this type (previously
they would queue up indefinitely and block not only other messages
of this type but different types as well).
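The per-type limiting described above can be sketched as a small counter (illustrative names, not the real Raft RPC code): each message type has its own budget of in-flight background tasks, and once it is exhausted, further messages of that type are dropped rather than queued behind slow handlers.

```cpp
#include <cassert>
#include <cstddef>

// One instance per message type (e.g. one for snapshot application).
struct handler_slots {
    size_t in_flight = 0;
    size_t limit;

    // Returns true if the message may spawn a background task; false means
    // the message is dropped. Other message types are unaffected.
    bool try_acquire() {
        if (in_flight >= limit) {
            return false;
        }
        ++in_flight;
        return true;
    }

    // Called when the background task finishes.
    void release() { --in_flight; }
};
```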
Message-Id: <20211116163316.129970-1-kbraun@scylladb.com>
In early versions of the series which proposed protocol servers, the
interface had two methods answering pretty much the same question of
whether the server is running or not:
* listen_addresses(): empty list -> server not running
* is_server_running()
To reduce redundancy and to avoid possible inconsistencies between the
two methods, `is_server_running()` was scrapped, but re-added by a
follow-up patch because `listen_addresses()` proved to be unreliable as
a source for whether the server is running or not.
This patch restores the previous state of having only
`listen_addresses()` with two additional changes:
* rephrase the comment on `listen_addresses()` to make it clear that
implementations must return an empty list when the server is not running;
* those implementations that have a reliable source of whether the
server is running or not use it to force-return an empty list when
the server is not running.
Tests: dtest(nodetool_additional_test.py)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211117062539.16932-1-bdenes@scylladb.com>
For leader stepdown purposes, a non-voting member is no different
from a node outside of the config. The patch makes the relevant code paths
check for both conditions.
If a node is a non-voting member it cannot be a leader, so the stable
leader rule should not be applied to it. This patch aligns non-voting
node behaviour with that of a node that was removed from the cluster. Both of
them step down from the leader position if they happen to be the leader when
the state change occurs.
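A hypothetical condensation of the rule (not the actual raft code): a node can hold or retain leadership only if it is both present in the configuration and a voter, so stepdown treats a non-voting member exactly like a node outside the config.

```cpp
#include <cassert>

// Leadership eligibility: in the config AND a voting member.
bool can_lead(bool in_config, bool is_voter) {
    return in_config && is_voter;
}
```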
Make it more robust by tracking both partial and sealed sstables.
This way, maybe_r__e__s__by_sst() won't pick partial sstables as
part of incremental compaction. It works today because the interposer
consumer isn't enabled with incremental compaction, so there's
a single consumer, which will have sealed the sstable before
the function for early replacement is called, but the story is
different if both are enabled.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211117135817.16274-1-raphaelsc@scylladb.com>
fmt 8 checks format strings at compile time, and requires that
non-compile-time format strings be wrapped with fmt::runtime().
Do that, and to allow coexistence with fmt 7, supply our own
do-nothing version of fmt::runtime() if needed. Strictly speaking
we shouldn't be introducing names into the fmt namespace, but this
is transitional only.
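A standalone sketch of the transitional shim (in the real tree this would be guarded by a FMT_VERSION check, since fmt 8 already provides fmt::runtime and returns a dedicated wrapper type; the plain pass-through below is only an illustration):

```cpp
#include <cassert>
#include <string_view>

// Do-nothing stand-in for fmt::runtime() so call sites can be written
// uniformly: fmt 7 accepts plain runtime strings, so no wrapping is needed.
// Injecting names into namespace fmt is normally off-limits; this is
// transitional only, as the commit message notes.
namespace fmt {
inline std::string_view runtime(std::string_view fmt_string) {
    return fmt_string;  // pass the format string through unchanged
}
}
```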
Closes #9640