When the task starts it needs the stream_manager to get messaging
service and database from. There's a session at hands and this
session is properly initialized thus it has the result-future.
Voila -- we have the manager!
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The helper function called send_mutation_fragments needs the manager
to update stats about stream_transfer_task as it goes on. Carrying the
manager over its stack is quite boring, but there's a helper send_info
object that lives there. Equip the guy with the updating function and
capture the manager by it early to kill one more usage of the global
stream_manager call.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When the stream_session initializes it's being equipped with
the shared-pointer on the stream_result_future very early. In
all the places where stream_session needs the manager this
pointer is alive and session get get manager from it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The stream_mutation_fragments handler need to access the manager. Since
the handler is registered by the manager itself, it can capture the
local manager reference and use container() where appropriate.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The manager is needed to get messaging service and database from.
Actually, the database can be pushed though arguments in all the
places, so effectively session only needs the messaging. However,
the stream-task's need the manager badly and there's no other
place to get it from other than the session.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The stream_result_future needs manager to register on it and to
unregister from it. Also the result-future is referenced from
stream_session that also needs the manager (see next patches).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The plan itself doesn't need it, but it creates some lower level
objects that do. Next patches will use this reference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Continuation of the previous patch -- some native stream_manager methods
can enjoy using container() call. One nit -- the [] access to the map
of statistics now runs in const context and cannot create elements, so
switch this place into .at() method.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Streaming manager registers itself in gossiper, so it needs an explicit
dependency reference. Also it forgets to unregister itself, so do it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In case of streaming this mostly means dropping the global
init/uninit calls and replacing them with sharded<stream_manager>
instance. It's still global, but it's being fixed atm.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The start/stop standard is becoming like
sharded<foo> foo;
foo.start();
defer([] { foo.stop() });
foo.invoke_on_all(&foo::start);
...
defer([] { foo.shutdown() });
wait_for_stop_signal();
/* quit making the above defers self-unroll */
where .shutdown() for a service would mean "do whatever is
appropriate to start stopping, the real synchronous .stop() will
come some time later".
According to that, rename .stop() as it's really the mentioned
preparation, not real stopping.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently streaming uses global pointers to save and get a
dependency. Now all the dependencies live on the manager,
this patch changes all the places in streaming/ to get the
needed dependencies from it, not from global pointer (next
patch will remove those globals).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The stream manager is going to become central point of control
for the streaming subsys. This patch makes its dependencies
explicit and prepares the gound for further patching.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
This series moves the timeout parameter, that is passed to most
f_m_r methods, into the reader_permit. This eliminates
the need to pass the timeout around, as it's taken
from the permit when needed.
The permit timeout is updated in certain cases
when the permit/reader is paused and retrieved
later on for reuse.
Following are perf_simple_query results showing ~1%
reduction in insns/op and corresponding increase in tps.
$ build/release/test/perf/perf_simple_query -c 1 --operations-per-shard 1000000 --task-quota-ms 10
Before:
102500.38 tps ( 75.1 allocs/op, 12.1 tasks/op, 45620 insns/op)
After:
103957.53 tps ( 75.1 allocs/op, 12.1 tasks/op, 45372 insns/op)
Test: unit(dev)
DTest:
repair_additional_test.py:RepairAdditionalTest.repair_abort_test (release)
materialized_views_test.py:TestMaterializedViews.remove_node_during_mv_insert_3_nodes_test (release)
materialized_views_test.py:InterruptBuildProcess.interrupt_build_process_with_resharding_half_to_max_test (release)
migration_test.py:TTLWithMigrate.big_table_with_ttls_test (release)
"
* tag 'reader_permit-timeout-v6' of github.com:bhalevy/scylla:
flat_mutation_reader: get rid of timeout parameter
reader_concurrency_semaphore: use permit timeout for admission
reader_concurrency_semaphore: adjust reactivated reader timeout
multishard_mutation_query: create_reader: validate saved reader permit
repair: row_level: read_mutation_fragment: set reader timeout
flat_mutation_reader: maybe_timed_out: use permit timeout
test: sstable_datafile_test: add sstable_reader_with_timeout
reader_permit: add timeout member
With data segregation on repair, thousands of sstables are potentially
added to maintenance set which causes high latency due to stalls.
That's because N*M sstables are created by a repair,
where N = # of ranges
and M = # of segregations
For TWCS, M = # of windows.
Assuming N = 768 and M = 20, ~15k sstables end up in sstable set
To fix this problem, let's avoid performing data segregation in repair,
as offstrategy will already perform the segregation anyway.
So from now on, only N non-overlapping sstables will be added to set.
Read amplification isn't affected because a query will only touch one
sstable in maintenance set.
When offstrategy starts, it will pick all sstables from set and
compact them in a single step while performing data segregation,
so data is properly laid out before integrated into the main set.
tests:
- sstable_compaction_test.twcs_reshape_with_disjoint_set_test
- mode(dev)
- manual test using repair-based bootstrap
Fixes#9199.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210824185043.76475-1-raphaelsc@scylladb.com>
As a preparation for up-front admission, add a permit parameter to
`make_streaming_reader()`, which will be the admitted permit once we
switch to up-front admission. For now it has to be a non-admitted
permit.
A nice side-effect of this patch is that now permits will have a
use-case specific description, instead of the generic "streaming" one.
Currently, if e.g. find_column_family throws an error,
as seen in #8776 when the table was dropped during repair,
the reader is not closed.
Use a coroutine to simplify error handling and
close the reader if an exception is caught.
Also, catch an error inside the lambda passed to make_interposer_consumer
when making the shared_sstable for streaming, and close the reader
their and return an exceptional future early, since
the reader will not be moved to sst->write_components, that assumes
ownership over it and closes it in all cases.
Fixes#8776
Test: unit(dev)
DTest: repair_additional_test.py:RepairAdditionalTest.repair_while_table_is_dropped_test (dev, debug) w/ https://github.com/scylladb/scylla/pull/8635#issuecomment-856661138
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#8782
* github.com:scylladb/scylla:
streaming: make_streaming_consumer: close reader on errors
streaming: make_streaming_consumer: coroutinize returned function
The off-strategy compaction is now enabled for repair based node
operations. It is not bound to repair based node operations though. It
makes sense to enable it for streaming based node operations too.
Fixes#8820Closes#8821
Currently, if e.g. find_column_family throws an error,
as seen in #8776 when the table was dropped during repair,
the reader is not closed.
Use a coroutine to simplify error handling and
close the reader if an exception is caught.
Also, catch an error inside the lambda passed to make_interposer_consumer
when making the shared_sstable for streaming, and close the reader
their and return an exceptional future early, since
the reader will not be moved to sst->write_components, that assumes
ownership over it and closes it in all cases.
Fixes#8776
Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Both streaming and repair call the distributed sstables writing with
equal lambdas each being ~30 lines of code. The only difference between
them is repair might request offstrategy compaction for new sstable.
Generalization of these two pieces save lines of codes and speeds the
release/repair/row_level.o compilation by half a minute (out of twelve).
tests: unit(dev)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210531133113.23003-1-xemul@scylladb.com>
"
This patchset adds future-returning close methods to all
flat_mutation_reader-s and makes sure that all readers
are explicitly closed and waited for.
The main motivation for doing so is for providing a path
for cancelling outstanding i/o requests via a the input_stream
close (See https://github.com/scylladb/seastar/issues/859)
and wait until they complete.
Also, this series also introduces a stop
method to reader_concurrency_semaphore to be used when
shutting down the database, instead of calling
clear_inactive_readers in the database destructor.
The series does not change microbenchmarks performance in a significant way.
It looks like the results are within the tests' jitter.
- perf_simple_query: (in transactions per second, more is better)
before: median 184701.83 tps (90 allocs/op, 20 tasks/op)
after: median 188970.69 tps (90 allocs/op, 20 tasks/op) (+2.3%)
- perf_mutation_readers: (in time per iteration, less is better)
combined.one_row 65.042ns -> 57.961ns (-10.9%)
combined.single_active 46.634us -> 46.216us ( -0.9%)
combined.many_overlapping 364.752us -> 371.507us ( +1.9%)
combined.disjoint_interleaved 43.634us -> 43.448us ( -0.4%)
combined.disjoint_ranges 43.011us -> 42.991us ( -0.0%)
combined.overlapping_partitions_disjoint_rows 57.609us -> 58.820us ( +2.1%)
clustering_combined.ranges_generic 93.464ns -> 96.236ns ( +3.0%)
clustering_combined.ranges_specialized 86.537ns -> 87.645ns ( +1.3%)
memtable.one_partition_one_row 903.546ns -> 957.639ns ( +6.0%)
memtable.one_partition_many_rows 6.474us -> 6.444us ( -0.5%)
memtable.one_large_partition 905.593us -> 878.271us ( -3.0%)
memtable.many_partitions_one_row 13.815us -> 14.718us ( +6.5%)
memtable.many_partitions_many_rows 161.250us -> 158.590us ( -1.6%)
memtable.many_large_partitions 24.237ms -> 23.348ms ( -3.7%)
average -0.02%
Fixes#1076
Refs #2927
Test: unit(release, debug)
Perf: perf_mutation_readers, perf_simple_query (release)
Dtest: next-gating(release),
materialized_views_test:TestMaterializedViews.interrupt_build_process_and_resharding_max_to_half_test repair_additional_test:RepairAdditionalTest.repair_disjoint_row_3nodes_diff_shard_count_test(debug)
"
* tag 'flat_mutation_reader-close-v7' of github.com:bhalevy/scylla: (94 commits)
mutation_reader: shard_reader: get rid of stop
mutation_reader: multishard_combining_reader: get rid of destructor
flat_mutation_reader: abort if not closed before destroyed
flat_mutation_reader: require close
repair: row_level_repair: run: close repair_meta when done
repair: repair_reader: close underlying reader on_end_of_stream
perf: everywhere: close flat_mutation_reader when done
test: everywhere: close flat_mutation_reader when done
mutation_partition: counter_write_query: close reader when done
index: built_indexes_reader: implement close
mutation_writer: multishard_writer: close readers when done
mutation_writer: feed_writer: close reader when done
table: for_all_partitions_slow: close iteration_step reader when done
view_builder: stop: close all build_step readers
stream_transfer_task: execute: close send_info reader when done
view_update_generator: start: close staging_sstable_reader when done
view: build_progress_virtual_reader: implement close method
view: generate_view_updates: close builder readers when done
view_builder: initialize_reader_at_current_token: close reader before reassigning it
view_builder: do_build_step: close build_step reader when done
...
These two helpers are now namespace-scoped methods, but both
need the migration manager instance inside. All their callers
are now patched to have the migration manager at hands, so
the helpers can be turned into methods.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It is dangerous to print a formatted string as is, like
sslog.warn(err.c_str()) since it might hold curly braces ('{}')
and those require respective runtime args.
Instead, it should be logged as e.g. sslog.warn("{}", err.c_str()).
This will prevent issues like #8436.
Refs #8436
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210408173048.124417-2-bhalevy@scylladb.com>
This just causes unneeded and slower recompliations. Instead replace
with forward declarations, or includes of smaller headers that were
incidentally brought in by the one removed. The .cc files that really
need it gain the include, but they are few.
Ref #1.
Closes#8403
Add a string describing where the sstables originated
from (e.g. memtable, repair, streaming, compaction, etc.)
If configure_writer is called with a nullptr, the origin
will be equal to an empty string.
Introduce test_env_sstables_manager that provides an overload
of configure_writer with no parmeters that calls the base-class'
configure_writer with "test" origin. This was to reduce the
code churn in this patch and to keep the tests simple.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Commit e5be3352cf ("database, streaming, messaging: drop
streaming memtables") removed streaming memtables; this removes
the mechanisms to synchronize them: _streaming_flush_gate and
_streaming_flush_phaser. The memory manager for streaming is removed,
and its 10% reserve is evenly distributed between memtables and
general use (e.g. cache).
Note that _streaming_flush_phaser and _streaming_flush_date are
no longer used to syncrhonize anything - the gate is only used
to protect the phaser, and the phaser isn't used for anything.
Closes#7454
Require a schema and an operation name to be given to each permit when
created. The schema is of the table the read is executed against, and
the operation name, which is some name identifying the operation the
permit is part of. Ideally this should be different for each site the
permit is created at, to be able to discern not only different kind of
reads, but different code paths the read took.
As not all read can be associated with one schema, the schema is allowed
to be null.
The name will be used for debugging purposes, both for coredump
debugging and runtime logging of permit-related diagnostics.
We want to start tracking the memory consumption of mutation fragments.
For this we need schema and permit during construction, and on each
modification, so the memory consumption can be recalculated and pass to
the permit.
In this patch we just add the new parameters and go through the insane
churn of updating all call sites. They will be used in the next patch.
Not used yet, this patch does all the churn of propagating a permit
to each impl.
In the next patch we will use it to track to track the memory
consumption of `_buffer`.