Commit Graph

29005 Commits

Author SHA1 Message Date
Benny Halevy
0a33762fb1 compaction_manager: add compaction_state when table is constructed
With that, it is always expected that _compaction_state[cf]
exists when compaction jobs are submitted.

Otherwise, throw std::out_of_range exception.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 09:40:06 +02:00
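
The lookup behaviour described in this commit can be sketched with a standard container. This is a minimal illustration, not the actual compaction_manager code: `table_id`, `manager`, `pending_jobs`, and `submit_throws` are hypothetical names, and `std::unordered_map::at` stands in for whatever lookup the real code uses, chosen because it throws `std::out_of_range` for a missing key.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the real types.
using table_id = std::string;

struct compaction_state {
    int pending_jobs = 0;
};

class manager {
    std::unordered_map<table_id, compaction_state> _compaction_state;
public:
    // Called when a table is constructed: the state exists from then on.
    void add(const table_id& t) {
        _compaction_state.try_emplace(t);
    }
    // Called when a table is removed.
    void remove(const table_id& t) {
        _compaction_state.erase(t);
    }
    // Submitting a job never creates state implicitly;
    // at() throws std::out_of_range for an unknown table.
    int submit(const table_id& t) {
        return ++_compaction_state.at(t).pending_jobs;
    }
};

// Returns true if submitting for the given table throws std::out_of_range.
inline bool submit_throws(manager& m, const table_id& t) {
    try {
        m.submit(t);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Using `at()` rather than `operator[]` is the point of the sketch: `operator[]` would silently create a fresh entry for a table that was never registered.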
Benny Halevy
29dd24ab46 compaction_manager: remove: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 09:40:06 +02:00
Benny Halevy
46ac139490 compaction_manager: remove: detach compaction_state before stopping ongoing compactions
So that the compaction_state won't be found from this point on,
while stopping the ongoing compaction.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 09:40:06 +02:00
Benny Halevy
75a2509b07 compaction_manager: remove: serialize stop_ongoing_compactions and gate.close
Now that compaction tasks enter the compaction_state gate there is
no point in stopping ongoing compaction in parallel to closing the gate.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 09:40:06 +02:00
Benny Halevy
3940ffb085 compaction_manager: task: keep a reference on compaction_state
And hold its gate to make sure the compaction_state outlives
the task and can be used to wait on all tasks and functions
using it.

With that, don't access _compaction_state[cf] to acquire
shared/exclusive locks but rather get to it via
task->compaction_state so it can be detached from
_compaction_state while task is running, if needed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 09:40:06 +02:00
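
A rough sketch of the ownership idea in this commit: the task holds its own reference to the state, so the map entry can be removed while the task is still running. This assumes `std::shared_ptr` as a stand-in for the project's pointer type and does not model the gate. `manager`, `detach`, and `lock_acquisitions` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the per-table state (locks, gate, ...).
struct compaction_state {
    int lock_acquisitions = 0;
};

// A task keeps its own reference, so the state outlives the map entry.
struct task {
    std::shared_ptr<compaction_state> state;
    // Reaches the state via the task, not via the manager's map.
    void acquire_lock() { ++state->lock_acquisitions; }
};

class manager {
    std::unordered_map<std::string, std::shared_ptr<compaction_state>> _compaction_state;
public:
    void add(const std::string& table) {
        _compaction_state.emplace(table, std::make_shared<compaction_state>());
    }
    task make_task(const std::string& table) {
        return task{_compaction_state.at(table)};
    }
    // Detach the state from the map; running tasks still hold it.
    void detach(const std::string& table) {
        _compaction_state.erase(table);
    }
    bool has(const std::string& table) const {
        return _compaction_state.count(table) != 0;
    }
};
```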
Benny Halevy
f482d8f377 test: sstable_compaction_test: incremental_compaction_data_resurrection_test: stop table before it's destroyed.
It must remove itself from the compaction_manager,
that will stop_ongoing_compactions.

Without that we're hitting
```
sstable_compaction_test: ./seastar/include/seastar/core/gate.hh:56: seastar::gate::~gate(): Assertion `!_count && "gate destroyed with outstanding requests"' failed.
```

when destroying the compaction_manager.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 09:40:06 +02:00
Benny Halevy
3955829286 test: sstable_utils: compact_sstables: deregister compaction also on error path
We need to call deregister_compaction(cdata) also if
compact_sstables failed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 09:39:10 +02:00
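
One common way to guarantee deregistration on both the success and the error path is a scope guard. The sketch below is an assumption about the general technique, not the code in this commit: `registry`, `deferred`, and `compact` are hypothetical names, and the real test helper may use a different mechanism.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical registry counting registered compactions.
struct registry {
    int registered = 0;
    void register_compaction() { ++registered; }
    void deregister_compaction() { --registered; }
};

// Minimal scope guard: runs the callback when the scope exits,
// whether normally or through an exception.
class deferred {
    std::function<void()> _f;
public:
    explicit deferred(std::function<void()> f) : _f(std::move(f)) {}
    ~deferred() { _f(); }
    deferred(const deferred&) = delete;
    deferred& operator=(const deferred&) = delete;
};

// Hypothetical compaction body; throws when `fail` is set.
inline void compact(registry& r, bool fail) {
    r.register_compaction();
    deferred dereg([&r] { r.deregister_compaction(); });
    if (fail) {
        throw std::runtime_error("compaction failed");
    }
}
```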
Benny Halevy
5fb66ecd03 test: sstable_compaction_test: partial_sstable_run_filtered_out_test: deregister_compaction also on error path
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-22 22:09:40 +02:00
Benny Halevy
8d7909de83 test: compaction_manager_test: add debug logging to register/deregister compaction
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-22 22:09:40 +02:00
Benny Halevy
ca97c919eb test: compaction_manager_test: deregister_compaction: erase by iterator
No need to search for the task again in the list.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-22 22:09:40 +02:00
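
Erasing by iterator can be illustrated with `std::list`, whose iterators stay valid until the element they point to is erased. This is a sketch only: `task_list`, `handle`, and the method names are hypothetical, and the container used by the real test helper is assumed rather than known.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical task record.
struct task {
    int id;
};

class task_list {
    std::list<task> _tasks;
public:
    using handle = std::list<task>::iterator;

    // Keep the iterator returned by insertion; it remains valid
    // while other elements are inserted or erased.
    handle register_task(int id) {
        return _tasks.insert(_tasks.end(), task{id});
    }
    // Constant-time erase, with no second search through the list.
    void deregister_task(handle h) {
        _tasks.erase(h);
    }
    std::size_t size() const { return _tasks.size(); }
};
```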
Benny Halevy
5d6ea651d7 test: compaction_manager_test: move methods out of line
No need for them to be inlined in the sstable_utils.hh.

While at it, mark constructor noexcept.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-22 22:09:40 +02:00
Benny Halevy
e7ab1f8581 compaction_manager: compaction_state: use counter for compaction_disabled
We'd like to use compaction_state::gate both for functions
running with compaction disabled and for tasks referring
to the compaction_state so that stop_ongoing_compactions
could wait on all functions referring to the state structure.

This is also cleaner with respect to not relying on
gate::use_count() when re-submitting regular compaction
when compaction is re-enabled.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-22 22:08:42 +02:00
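
The counter approach might look roughly like the sketch below. All names here (`compaction_disabled_counter`, `compaction_disabler`, `resubmissions`) are hypothetical, and the re-submission of regular compaction is reduced to incrementing a counter.

```cpp
#include <cassert>

// Hypothetical per-table state with an explicit "disabled" counter.
struct compaction_state {
    int compaction_disabled_counter = 0;
    int resubmissions = 0;  // stand-in for re-submitting regular compaction
    bool compaction_disabled() const { return compaction_disabled_counter > 0; }
};

// RAII guard: compaction is disabled while at least one guard is alive.
class compaction_disabler {
    compaction_state& _cs;
public:
    explicit compaction_disabler(compaction_state& cs) : _cs(cs) {
        ++_cs.compaction_disabled_counter;
    }
    ~compaction_disabler() {
        assert(_cs.compaction_disabled_counter > 0);
        // Re-submit only when the last disabler goes away.
        if (--_cs.compaction_disabled_counter == 0) {
            ++_cs.resubmissions;
        }
    }
    compaction_disabler(const compaction_disabler&) = delete;
    compaction_disabler& operator=(const compaction_disabler&) = delete;
};
```

With a dedicated counter, the "is compaction disabled?" question no longer depends on how many other holders a shared gate happens to have.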
Benny Halevy
3268c94e72 compaction_manager: task: delete move and copy constructors
We use a lw_shared_ptr<task> everywhere.
So prevent moving or copying task objects.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-22 22:00:18 +02:00
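
A minimal sketch of the pattern, using `std::shared_ptr` as a stand-in for `lw_shared_ptr`. The `task` fields and `make_task` are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical task type that is only ever handled through a shared pointer.
struct task {
    int id;
    explicit task(int i) : id(i) {}
    // Prevent accidental copies or moves of the task object itself.
    task(const task&) = delete;
    task(task&&) = delete;
};

static_assert(!std::is_copy_constructible<task>::value, "task must not be copyable");
static_assert(!std::is_move_constructible<task>::value, "task must not be movable");

// Copying the pointer shares the same task; the task is never duplicated.
inline std::shared_ptr<task> make_task(int id) {
    return std::make_shared<task>(id);
}
```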
Benny Halevy
0cc6060552 compaction_manager: add per-task debug log messages
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-22 22:00:18 +02:00
Benny Halevy
1d8d472028 compaction_manager: stop_ongoing_compactions: log number of tasks to stop
get_compactions().size() may return 0 while there are
non-zero tasks to stop.

Some tasks may not be marked as `compaction_running` since
they are either:
- postponed (due to compaction manager throttling of regular compaction)
- sleeping before retry.

In both cases we still want to stop them so the log message
should reflect both the number of ongoing compactions
and the actual number of tasks we're stopping.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-22 22:00:18 +02:00
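
The difference between the two numbers can be sketched as follows. The `task_status` values mirror the cases listed in this commit message, but the type names and the wording of the message string are hypothetical, not the actual log line.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class task_status { running, postponed, sleeping_before_retry };

struct task {
    task_status status;
};

struct stop_counts {
    std::size_t ongoing;  // tasks currently marked as running
    std::size_t to_stop;  // all tasks, including postponed and sleeping ones
};

inline stop_counts count_tasks(const std::vector<task>& tasks) {
    auto running = std::count_if(tasks.begin(), tasks.end(),
        [](const task& t) { return t.status == task_status::running; });
    return stop_counts{static_cast<std::size_t>(running), tasks.size()};
}

// Hypothetical message format reporting both numbers.
inline std::string stop_message(const stop_counts& c) {
    return "Stopping " + std::to_string(c.to_stop) + " tasks for "
         + std::to_string(c.ongoing) + " ongoing compactions";
}
```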
Gleb Natapov
e56022a8ba migration_manager: co-routinize announce_column_family_update
The patch also removes the usage of map_reduce() because it is no longer needed
after 6191fd7701 that drops futures from the view mutation building path.
The patch still preserves the yielding point that map_reduce() provided,
by calling coroutine::maybe_yield() explicitly.

Message-Id: <YZoV3GzJsxR9AZfl@scylladb.com>
2021-11-22 10:48:25 +02:00
Benny Halevy
599ed69023 repair_service: do_decommission_removenode_with_repair: maybe yield
everywhere_replication_strategy::calculate_natural_endpoints
is synchronous and doesn't yield, so add maybe_yield() calls
when looping over many token ranges.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211121090339.3955278-1-bhalevy@scylladb.com>
Message-Id: <20211121102606.76700-2-bhalevy@scylladb.com>
2021-11-22 10:48:25 +02:00
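
The idea of yielding inside a long synchronous loop can be simulated without Seastar. The sketch below does not use the real `maybe_yield()`: `cooperative_loop` is a hypothetical stand-in that invokes a callback after a fixed number of iterations, where a real reactor would check whether the task has used up its time slice.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reactor's preemption check.
class cooperative_loop {
    std::size_t _budget;            // iterations allowed between yields
    std::size_t _used = 0;
    std::function<void()> _yield;   // what "yielding" means in this simulation
public:
    cooperative_loop(std::size_t budget, std::function<void()> yield_fn)
        : _budget(budget), _yield(std::move(yield_fn)) {
        assert(budget > 0);
    }
    // Yield only when the budget is exhausted, so the check stays cheap.
    void maybe_yield() {
        if (++_used >= _budget) {
            _used = 0;
            _yield();
        }
    }
};

// Loop over many ranges, giving other work a chance to run in between.
inline long process_ranges(const std::vector<int>& ranges, cooperative_loop& loop) {
    long sum = 0;
    for (int r : ranges) {
        sum += r;            // stand-in for synchronous per-range work
        loop.maybe_yield();
    }
    return sum;
}
```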
Benny Halevy
9d2631daaf token_metadata: calculate_pending_ranges_for_leaving: maybe yield
We see long stalls as reported in
https://github.com/scylladb/scylla/issues/8030#issuecomment-974783526

everywhere_replication_strategy::calculate_natural_endpoints
is synchronous and doesn't yield, so add maybe_yield() calls
when looping over many token ranges.

Refs #8030

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211121090339.3955278-1-bhalevy@scylladb.com>
Message-Id: <20211121102606.76700-1-bhalevy@scylladb.com>
2021-11-22 10:48:25 +02:00
Benny Halevy
df5ccb8884 storage_service: get_changed_ranges_for_leaving: maybe yield
We see long stalls as reported in
https://github.com/scylladb/scylla/issues/8030#issuecomment-974647167

Even before the change to use erm->get_natural_endpoints,
everywhere_replication_strategy::calculate_natural_endpoints
is synchronous and doesn't yield, so add maybe_yield() calls
when looping over all token ranges.

Refs #8030

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211121090339.3955278-1-bhalevy@scylladb.com>
2021-11-21 11:31:56 +02:00
Raphael S. Carvalho
2b2f0eae05 compaction: STCS: kill needless include of database.hh
This is part of work for reducing compilation time and removing
layer violation in compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211120042727.114909-1-raphaelsc@scylladb.com>
2021-11-21 11:28:29 +02:00
Raphael S. Carvalho
8d9704c030 compaction: LCS: kill needless include of database.hh
This is part of work for reducing compilation time and removing
layer violation in compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211120042232.106651-1-raphaelsc@scylladb.com>
2021-11-20 18:28:55 +02:00
Avi Kivity
96e9c3951c Merge "Finally stop including database.hh in compaction.cc" from Raphael
"
After this series, compaction will finally stop including database.hh.

tests: unit(debug).
"

* 'stop_including_database_hh_for_compaction' of github.com:raphaelsc/scylla:
  compaction: stop including database.hh
  compaction: switch to table_state in get_fully_expired_sstables()
  compaction: switch to table_state
  compaction: table_state: Add missing methods required by compaction
2021-11-20 18:28:05 +02:00
Raphael S. Carvalho
06405729ce compaction: stop including database.hh
after switching to table_state, compaction code can finally stop
including database.hh

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-19 22:06:03 -03:00
Raphael S. Carvalho
69ab5c9dff compaction: switch to table_state in get_fully_expired_sstables()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-19 22:06:02 -03:00
Raphael S. Carvalho
d89edad9fb compaction: switch to table_state
Make the compaction procedure switch to table_state. The only function in
compaction.cc still directly using table is
get_fully_expired_sstables(T,...), but subsequently we'll make it
switch to table_state and then we can finally stop including database.hh
in the compaction code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-19 22:06:01 -03:00
Raphael S. Carvalho
12137bca73 compaction: table_state: Add missing methods required by compaction
These are the only methods left for compaction to switch to
table_state, so compaction can finally stop including database.hh

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-19 22:05:59 -03:00
Avi Kivity
f3d5b2b2b0 Merge "Add effective_replication_map factory" from Benny
"
Add a sharded locator::effective_replication_map_factory that holds
shared effective_replication_maps.

To search for e_r_m in the factory, we use a compound `factory_key`:
<replication_strategy type, replication_strategy options, token_metadata ring version>.

Start the sharded factory in main (plus cql_test_env and tools/schema_loader)
and pass a reference to it to storage_proxy and storage_server.

For each keyspace, use the registry to create the effective_replication_map.

When registered, effective_replication_map objects erase themselves
from the factory when destroyed. effective_replication_map then schedules
a background task to clear_gently its contents, protected by the e_r_m_f::stop()
function.

Note that for non-shard 0 instances, if the map
is not found in the registry, we construct it
by cloning the precalculated replication_map
from shard 0 to save the cpu cycles of re-calculating
it time and again on every shard.

Test: unit(dev), schema_loader_test(debug)
DTest: bootstrap_test.py:TestBootstrap.decommissioned_wiped_node_can_join_test update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_new_node_while_schema_changes_with_repair_test (dev)
"

* tag 'effective_replication_map_factory-v7' of https://github.com/bhalevy/scylla:
  effective_replication_map: clear_gently when destroyed
  database: shutdown keyspaces
  test: cql_test_env: stop view_update_generator before database shuts down
  effective_replication_map_factory: try cloning replication map from shard 0
  tools: schema_loader: start a sharded erm_factory
  storage_service: use erm_factory to create effective_replication_map
  keyspace: use erm_factory to create effective_replication_map
  effective_replication_map: erase from factory when destroyed
  effective_replication_map_factory: add create_effective_replication_map
  effective_replication_map: enable_lw_shared_from_this
  effective_replication_map: define factory_key
  keyspace: get a reference to the erm_factory
  main: pass erm_factory to storage_service
  main: pass erm_factory to storage_proxy
  locator: add effective_replication_map_factory
2021-11-19 18:19:38 +02:00
Raphael S. Carvalho
c94e6f8567 compaction: Merge GC writer into regular compaction writer
Turns out most of regular writer can be reused by GC writer, so let's
merge the latter into the former. We gain a lot of simplification,
lots of duplication is removed, and additionally, GC writer can now
be enabled with interposer as it can be created on demand by
each interposer consumer (will be done in a later patch).

Refs #6472.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211119120841.164317-1-raphaelsc@scylladb.com>
2021-11-19 14:19:50 +02:00
GavinJE
f8c91bdd1e Update debugging.md
Line 7 does not display correctly in reality.
"crashed" appears as "chrashed" on the website.
Bug needs to be fixed.

Closes #9652
2021-11-19 14:21:53 +03:00
GavinJE
22fa7ecf99 Update compaction_controller.md
Line 15.

"ee" changed to "they"

Closes #9651
2021-11-19 14:19:20 +03:00
Benny Halevy
eed3e95704 effective_replication_map: clear_gently when destroyed
Prevent reactor stalls by gently clearing the replication_map
and token_metadata_ptr when the effective_replication_map is
destroyed.

This is done in the background, protected by the
effective_replication_map_factory::stop() method.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:52:41 +02:00
Benny Halevy
cd0061dcb5 database: shutdown keyspaces
release the keyspace effective_replication_map during
shutdown so that effective_replication_map_factory
can be stopped cleanly with no outstanding e_r_m:s.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:52:41 +02:00
Benny Halevy
1e259665fe test: cql_test_env: stop view_update_generator before database shuts down
We can't have view updates happening after the database shuts down.
In particular, mutateMV depends on the keyspace effective_replication_map
and it is going to be released when all keyspaces shut down, in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:52:41 +02:00
Benny Halevy
866e1b8479 effective_replication_map_factory: try cloning replication map from shard 0
Calculating a new effective_replication_map on each shard
is expensive.  To try to save that, use the factory key to
look up an e_r_m on shard 0 and if found, use it to clone
its replication map and use that to make the shard-local
e_r_m copy.

In the future, we may want to improve that in 2 ways:
- instead of always going to shard 0, use hash(key) % smp::count
to create the first copy.
- make full copies only on NUMA nodes and keep a shared pointer
on all other shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:52:41 +02:00
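
The "calculate once, clone elsewhere" idea can be simulated in a single thread. This sketch makes several assumptions: shards are modeled as entries in a vector rather than real cores, the cross-shard lookup is a plain function call rather than an asynchronous one, and `sharded_factory`, `replication_map`, and the sample endpoints are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical replication map: token -> endpoints.
using replication_map = std::map<int, std::vector<std::string>>;

class sharded_factory {
    std::vector<std::map<std::string, replication_map>> _per_shard;
    int _calculations = 0;
    int _clones = 0;

    // Stand-in for the expensive calculation.
    replication_map calculate() {
        ++_calculations;
        return {{0, {"n1", "n2"}}, {100, {"n2", "n3"}}};
    }
public:
    explicit sharded_factory(std::size_t shards) : _per_shard(shards) {
        assert(shards > 0);
    }

    const replication_map& get(std::size_t shard, const std::string& key) {
        auto& local = _per_shard.at(shard);
        auto it = local.find(key);
        if (it != local.end()) {
            return it->second;              // already present on this shard
        }
        if (shard != 0) {
            auto& shard0 = _per_shard[0];
            auto src = shard0.find(key);
            if (src != shard0.end()) {
                ++_clones;                  // copy instead of recalculating
                return local.emplace(key, src->second).first->second;
            }
        }
        return local.emplace(key, calculate()).first->second;
    }

    int calculations() const { return _calculations; }
    int clones() const { return _clones; }
};
```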
Benny Halevy
0a3d66839a tools: schema_loader: start a sharded erm_factory
This is required for an upcoming change to create effective_replication_map
on all shards in storage_service::replicate_to_all_cores.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:52:41 +02:00
Benny Halevy
23e1344b72 storage_service: use erm_factory to create effective_replication_map
Instead of calculating the effective_replication_map
in replicate_to_all_cores, use effective_replication_map_factory::
create_effective_replication_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:52:41 +02:00
Benny Halevy
cb240ffbae keyspace: use erm_factory to create effective_replication_map
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:52:41 +02:00
Benny Halevy
6754e6ca2b effective_replication_map: erase from factory when destroyed
The effective_replication_map_factory keeps naked pointers
to outstanding effective_replication_map:s.
These are kept valid using a shared effective_replication_map_ptr.

When the last shared ptr reference is dropped the effective_replication_map
object is destroyed, therefore the raw pointer to it in the factory
must be erased.

This now happens in ~effective_replication_map when the object
is marked as registered.

Registration happens when effective_replication_map_factory inserts
the newly created effective_replication_map to its _replication_maps
map, and the factory calls effective_replication_map::set_factory.

Note that effective_replication_map may be created temporarily
and not be inserted to the factory's map, therefore erase
is called only when required.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:52:20 +02:00
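
The registration scheme described in this commit can be sketched with standard-library types. This is an illustration under assumptions, not the actual implementation: `std::shared_ptr` and `std::enable_shared_from_this` stand in for the project's pointer types, the key is a plain string, and only `set_factory` is a name taken from the message above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

class replication_map;

// Hypothetical factory holding non-owning (raw) pointers.
class factory {
    std::unordered_map<std::string, replication_map*> _maps;
public:
    std::shared_ptr<replication_map> create(const std::string& key);
    void erase(const std::string& key) { _maps.erase(key); }
    std::size_t size() const { return _maps.size(); }
};

class replication_map : public std::enable_shared_from_this<replication_map> {
    std::string _key;
    factory* _factory = nullptr;  // set only when registered
public:
    explicit replication_map(std::string key) : _key(std::move(key)) {}
    void set_factory(factory& f) { _factory = &f; }
    // Erase from the factory only if this object was registered.
    ~replication_map() {
        if (_factory) {
            _factory->erase(_key);
        }
    }
};

inline std::shared_ptr<replication_map> factory::create(const std::string& key) {
    auto it = _maps.find(key);
    if (it != _maps.end()) {
        // Turn the raw pointer back into a shared owner.
        return it->second->shared_from_this();
    }
    auto m = std::make_shared<replication_map>(key);
    _maps.emplace(key, m.get());
    m->set_factory(*this);
    return m;
}
```

The sketch assumes the factory outlives every map it has registered.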
Benny Halevy
8a6fbe800f effective_replication_map_factory: add create_effective_replication_map
Make a factory key using the replication_strategy type
and config options, plus the token_metadata ring version
and use it to search an already-registered effective_replication_map.

If not found, calculate a new effective_replication_map
and register it using the above key.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:46:51 +02:00
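
A compound key with find-or-create lookup might be sketched as below. The three key fields follow the description in this commit message. Everything else is an assumption made for illustration: the field types, the ordered `std::map`, owning `std::shared_ptr` entries, and the `get_or_create` name.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <tuple>

// Compound key: strategy type, strategy options, ring version.
struct factory_key {
    std::string strategy_type;
    std::map<std::string, std::string> strategy_options;
    std::uint64_t ring_version;

    // Lexicographic ordering over all three components.
    bool operator<(const factory_key& o) const {
        return std::tie(strategy_type, strategy_options, ring_version)
             < std::tie(o.strategy_type, o.strategy_options, o.ring_version);
    }
};

// Hypothetical payload; `id` records which calculation produced it.
struct replication_map {
    int id;
};

class factory {
    std::map<factory_key, std::shared_ptr<replication_map>> _maps;
    int _calculations = 0;
public:
    // Return the registered map for this key, or calculate and register one.
    std::shared_ptr<replication_map> get_or_create(const factory_key& key) {
        auto it = _maps.find(key);
        if (it != _maps.end()) {
            return it->second;
        }
        auto m = std::make_shared<replication_map>(replication_map{++_calculations});
        _maps.emplace(key, m);
        return m;
    }
    int calculations() const { return _calculations; }
};
```

Including the ring version in the key means that a change in token metadata produces a different key, so a stale map is never returned for the new ring.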
Benny Halevy
ecba37dbfd effective_replication_map: enable_lw_shared_from_this
So an effective_replication_map_ptr can be generated
using a raw pointer by effective_replication_map_factory.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:46:51 +02:00
Benny Halevy
f4f41e2908 effective_replication_map: define factory_key
To be used to locate the effective_replication_map
in the to-be-introduced effective_replication_map_factory.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:46:51 +02:00
Benny Halevy
5947de7674 keyspace: get a reference to the erm_factory
To be used for creating effective_replication_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:46:51 +02:00
Benny Halevy
1d7556d099 main: pass erm_factory to storage_service
To be used for creating effective_replication_map
when token_metadata changes, and update all
keyspaces with it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:46:51 +02:00
Benny Halevy
242043368e main: pass erm_factory to storage_proxy
To be used for creating the effective_replication_map per keyspace.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:46:51 +02:00
Benny Halevy
3fed73e7c2 locator: add effective_replication_map_factory
It will be used further to create shared copies
of effective_replication_map based on replication_strategy
type and config options.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:46:51 +02:00
Benny Halevy
3c0fec6b17 storage_proxy: paxos_response_handler::prune: demote write timeout error printout to debug level
Similar to other timeout handling paths, there is no need to print an
ERROR for timeout as the error is not returned anyhow.

Eventually the error will be reported at the query level
when the query times out or fails in any other way.

Also, similar to `storage_proxy::mutate_end`, traces were added
also for the error cases.

FWIW, these extraneous timeout errors cause dtest failures.
E.g. alternator_tests:AlternatorTest.test_slow_query_logging

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211118153603.2975509-1-bhalevy@scylladb.com>
2021-11-19 11:09:09 +03:00
Raphael S. Carvalho
5f7ee2e135 test: sstable_compaction_test: fix twcs_reshape_with_disjoint_set_test by using a non-coarse timestamp resolution
We're using a coarse resolution when rounding clock time for sstables to
be evenly distributed across time buckets. Let's use a better resolution,
to make sure sstables won't fall into the edges.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211118172126.34545-1-raphaelsc@scylladb.com>
2021-11-19 11:09:09 +03:00
Pavel Emelyanov
1dd08e367e test, cross-shard-barrier: Increase stall detector period
The test checks every 100 * smp::count milliseconds that a shard
had been able to make at least one step. Shards, in turn, take up
to 100 ms sleeping breaks between steps. It seems like on heavily
loaded nodes the checking period is too small and the test
stuck-detector shoots false-positives.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20211118154932.25859-1-xemul@scylladb.com>
2021-11-19 11:09:09 +03:00
Mikołaj Sielużycki
87a212fa56 memtable-sstable: Fix indentation in table::try_flush_memtable_to_sstable.
Message-Id: <20211118131441.215628-3-mikolaj.sieluzycki@scylladb.com>
2021-11-19 11:09:09 +03:00
Mikołaj Sielużycki
6df07f7ff7 memtable-sstable: Convert table::try_flush_memtable_to_sstable to coroutines.
I intentionally store lambdas in variables and pass them to
with_scheduling_group using std::ref. Coroutines don't put variables
captured by lambdas on stack frame. If the lambda containing them is not
stored, the captured variables will be lost, resulting in stack/heap use
after free errors. An alternative is to capture variables, then create
local variables inside lambda bodies that contain a copy/moved version
of the captured ones. For example, if the post_flush lambda wasn't
stored in a dedicated variable, then it wouldn't be put on the coroutine
frame. At the first co_await inside of it, the lambda object along with
variables captured by it (old and &newtabs created inside square
brackets) would go away. The underlying objects (e.g. newtabs created in
the outer scope) would still be valid, but the reference to it would be
gone, causing most of the tests to fail.

Message-Id: <20211118131441.215628-2-mikolaj.sieluzycki@scylladb.com>
2021-11-19 11:09:09 +03:00