Commit Graph

29701 Commits

Author SHA1 Message Date
Botond Dénes
bdcbf3f71b Merge 'database: Add error message with mutation info on commit log apply failure' from Calle Wilund
Fixes #9408

While it is rare, some customer issues have shown that we can run into cases where commit log apply (writing mutations to it) fails badly. In the known cases, due to oversized mutations. While these should have been caught earlier in the call chain really, it would probably help both end users and us (trying to figure out how they got so big and how they got so far) iff we added info to the errors thrown (and printed), such as ks, cf, and mutation content.

Somewhat controversial, this makes the apply with CL decision path coroutinized, mainly to be able to do the error handling    for the more informative wrapper exception easier/less ugly. Could perhaps do with futurize_invoke + then_wrapper also. But future is coroutines...

This is as stated somewhat problematic, it adds an allocation to perf_simple_query::write path (because of crap clang cr frame folding?). However, tasks/op remain constant and actual tps (though unstable) remain more or less the same (on my crappy measurements).

Counter path is unaffected, as coroutine frame alloc replaces with(...)

dtest for the wrapped exception on separate pr.

Closes #9412

* github.com:scylladb/scylla:
  database: Add error message with mutation info on commit log apply failure
  database: coroutinize do_apply and apply_with_commitlog
2022-01-12 16:16:29 +02:00
Calle Wilund
a6202ae079 database: Add error message with mutation info on commit log apply failure
Fixes #9408

While it is rare, some customer issues have shown that we can run into cases
where commit log apply (writing mutations to it) fails badly. In the known
cases, due to oversized mutations. While these should have been caught earlier
in the call chain really, it would probably help both end users and us (trying
to figure out how they got so big and how they got so far) iff we added info
to the errors thrown (and printed), such as ks, cf, and mutation content.
2022-01-12 14:04:23 +00:00
Calle Wilund
63ea666ca0 database: coroutinize do_apply and apply_with_commitlog
Somewhat controversial. Making the apply with CL decision path
coroutinized, mainly to be able to in next patch make error handling
more informative (because we will have exceptions that are immediate
and/or futurized).

This is as stated somewhat problematic, it adds an allocation to
perf_simple_query::write path (because of crap clang cr frame folding?).
However, tasks/op remain constant and actual tps (though unstable)
remain more or less the same (on my crappy measurements).

Counter path is unaffected, as coroutine frame alloc replaces with(...)
alloc, and all is same and dandy.

I am hoping that the simpler error + verbose code will compensate for
the extra alloc.
2022-01-12 14:04:15 +00:00
Nadav Har'El
23e93a26b3 Merge 'Alternator: stream results + chunk results to remove large allocations' from Calle Wilund
Refs: #9555

When running the "Kraken" dynamodb streams test to provoke the issued observed by QA, I noticed on my setup mainly two things: Large allocation stalls (+ warnings) and timeouts on read semaphores in DB.

This tries to address the first issue, partly by making query_result_view serialization using chunked vector instead of linear one, and by introducing a streaming option for json return objects, avoiding linearizing to string before wire.
Note that the latter has some overhead issues of its own, mainly data copying, since we essentially will be triple buffering (local, wrapped http stream, and final output stream). Still, normal string output will typically do a lot of realloc which is potential extra copies as well, so...

This is not really performance tested, but with these tweaks I no longer get large alloc stalls at least, so that is a plus. :-)

Closes #9713

* github.com:scylladb/scylla:
  alternator::executor: Use streamed result for scan etc if large result
  alternator::streams: Use streamed result in get_records if large result
  executor/server: Add routine to make stream object return
  rjson: Add print to stream of rjson::value
  query_idl: Make qr_partition::rows/query_result::partitions chunked
2022-01-12 15:53:31 +02:00
Calle Wilund
f73ca9659b alternator::executor: Use streamed result for scan etc if large result
Avoids large allocations for larger scans.
Todo: determine threshold
2022-01-12 13:34:49 +00:00
Calle Wilund
0c1ff5c2f5 alternator::streams: Use streamed result in get_records if large result
If we have a resonable result set to send back to client, use direct
streaming of the object.

Todo: determine threshold.
2022-01-12 13:34:49 +00:00
Calle Wilund
4a8a7ef8b4 executor/server: Add routine to make stream object return
Simply retains result object and sets json::json_return_type to
streaming callback.
2022-01-12 13:34:49 +00:00
Calle Wilund
e2d7225df8 rjson: Add print to stream of rjson::value
Allows direct stream of object to seastar::stream. While not 100%
efficient, it has the advantage of avoiding large allocations
(long string) for huge result messages.
2022-01-12 13:34:49 +00:00
Avi Kivity
134601a15e Merge "Convert input side of mutation compactor to v2" from Botond
"
With this series the mutation compactor can now consume a v2 stream. On
the output side it still uses v1, so it can now act as an online
v2->v1 converter. This allows us to push out v2->v1 conversion to as far
as the compactor, usually the next to last component in a read pipeline,
just before the final consumer. For reads this is as far as we can go,
as the intra-node ABI and hence the result-sets built are v1. For
compaction we could go further and eliminate conversion altogether, but
this requires some further work on both the compactor and the sstable
writer and so it is left to be done later.
To summarize, this patchset enables a v2 input for the compactor and it
updates compaction and single partition reads to use it.
"

* 'mutation-compactor-consume-v2/v1' of https://github.com/denesb/scylla:
  table: add make_reader_v2()
  querier: convert querier_cache and {data,mutation}_querier to v2
  compaction: upgrade compaction::make_interposer_consumer() to v2
  mutation_reader: remove unecessary stable_flattened_mutations_consumer
  compaction/compaction_strategy: convert make_interposer_consumer() to v2
  mutation_writer: migrate timestamp_based_splitting_writer to v2
  mutation_writer: migrate shard_based_splitting_writer to v2
  mutation_writer: add v2 clone of feed_writer and bucket_writer
  flat_mutation_reader_v2: add reader_consumer_v2 typedef
  mutation_reader: add v2 clone of queue_reader
  compact_mutation: make start_new_page() independent of mutation_fragment version
  compact_mutation: add support for consuming a v2 stream
  compact_mutation: extract range tombstone consumption into own method
  range_tombstone_assembler: add get_range_tombstone_change()
  range_tombstone_assembler: add get_current_tombstone()
2022-01-12 14:37:19 +02:00
Avi Kivity
4118f2d8be treewide: replace deprecated seastar::later() with seastar::yield()
seastar::later() was recently deprecated and replaced with two
alternatives: a cheap seastar::yield() and an expensive (but more
powerful) seastar::check_for_io_immediately(), that corresponds to
the original later().

This patch replaces all later() calls with the weaker yield(). In
all cases except one, it's unambiguously correct. In one case
(test/perf scheduling_latency_measurer::stop()) it's not so ambiguous,
since check_for_io_immediately() will additionally force a poll and
so will cause more work to be done (but no additional tasks to be
executed). However, I think that any measurement that relies on
the measuring the work on the last tick to be inaccurate (you need
thousands of ticks to get any amount of confidence in the
measurement) that in the end it doesn't matter what we pick.

Tests: unit (dev)

Closes #9904
2022-01-12 12:19:19 +01:00
Avi Kivity
0e5d196499 Merge "move storage proxy verbs to the IDL" from Gleb
* 'gleb/sp-idl-v1' of github.com:scylladb/scylla-dev:
  storage_proxy: move all verbs to the IDL
  idl-compiler: allow const references in send() parameter list
  idl-compiler: support smart pointers in verb's return value
  idl-compiler: support multiple return value and optional in a return value
  idl-compiler: handle :: at the beginning of a type
  idl-compiler: sending one way message without timeout does not require ret value specialization as well
  storage_proxy: convert more address vectors to inet_address_vector_replica_set
2022-01-12 12:34:18 +02:00
Nadav Har'El
7a9f69ec38 Merge 'lister cleanup and test' from Benny Halevy
Split off of #9835.

The series removes extraneous includes of lister.hh from header files
and adds a unit test for lister::scan_dir to test throwing an exception
from the walker function passed to `scan_dir`.

Test: unit(dev)

Closes #9885

* github.com:scylladb/scylla:
  test: add lister_list
  lister: add more overloads of fs::path operator/ for std::string and string_view
  resource_manager: remove unnecessary include of lister.hh from header file
  sstables: sstable_directory: remove unncessary include of lister.hh from header file
2022-01-12 08:20:07 +01:00
Nadav Har'El
c5f29fe3ea configure.py: don't use deprecated mktemp()
configure.py uses the deprecated Python function tempfile.mktemp().
Because this function is labeled a "security risk" it is also a magnet
for automated security scanners... So let's replace it with the
recommended tempfile.mkstemp() and avoid future complaints.

The actual security implications of this mktemp() call is negligible to
non-existent: First it's just the build process (configure.py), not
the build product itself. Second, the worst that an attacker (which
needs to run in the build machine!) can do is to cause a compilation
test in configure.py to fail because it can't write to its output file.

Reported by @srikanthprathi

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220111121924.615173-1-nyh@scylladb.com>
2022-01-11 17:06:14 +02:00
Benny Halevy
1e6829e9f1 test: add lister_list
Test the lister class.

In particular the ability to abort the lister
when the walker function throws an exception.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-11 17:04:16 +02:00
Benny Halevy
8444e50e6a lister: add more overloads of fs::path operator/ for std::string and string_view
To make it easier to append a std::string to a filesystem::path.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-11 17:04:16 +02:00
Benny Halevy
f4cd535e3d resource_manager: remove unnecessary include of lister.hh from header file
But define namespace fs = std::filesystem in the header
since many use sites already depend on it
and it's a convention throught scylla's code.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-11 17:04:16 +02:00
Benny Halevy
b9c41dc0fd sstables: sstable_directory: remove unncessary include of lister.hh from header file
The source file depends on it, not the header.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-11 17:04:16 +02:00
Botond Dénes
97d74de8fc Merge "flat_mutation_reader: clone evictable_reader & convert some others" from Michael Livshin
"
The first patch introduces evictable_reader_v2, and the second one
further simplifies it.  We clone instead of converting because there
is at least one downstream (by way of multishard_combining_reader) use
that is not itself straightforward to convert at the moment
(multishard_mutation_query), and because evictable_reader instances
cannot be {up,down}graded (since users also access the undelying
buffers).  This also means that shard_reader, reader_lifecycle_policy
and multishard_combining_reader have to be cloned.
"

* tag 'clone-evictable-reader-to-v2/v3' of https://github.com/cmm/scylla:
  convert make_multishard_streaming_reader() to flat_mutation_reader_v2
  convert table::make_streaming_reader() to flat_mutation_reader_v2
  convert make_flat_multi_range_reader() to flat_mutation_reader_v2
  view_update_generator: remove unneeded call to downgrade_to_v1()
  introduce multishard_combining_reader_v2
  introduce shard_reader_v2
  introduce the reader_lifecycle_policy_v2 abstract base
  evictable_reader_v2: further code simplifications
  introduce evictable_reader_v2 & friends
2022-01-11 17:01:08 +02:00
Botond Dénes
d21803c5d0 Merge "Remove global storage proxy from pagers code" from Pavel Emelyanov
"
The fix is in keeping shared proxy pointer on query_pager.

tests: unit(dev)
"
* 'br-keep-proxy-on-pager-2' of https://github.com/xemul/scylla:
  pager: Use local proxy pointer
  pager: Keep shared pointer to proxy onboard
2022-01-11 17:01:08 +02:00
Nadav Har'El
9d0eaeb90a test/scylla-gdb: enable test for "scylla fiber"
After the rewrite of the test/scylla-gdb, the test for "scylla fiber"
was disabled - and this patch brings it back.

For the "scylla fiber" operation to do something interesting (and not just
print an error message and seem to succeed...) it needs a real task pointer.
The old code interrupted Scylla in a breakpoint and used get_local_tasks(),
but in the new test framework we attach to Scylla while it's idle, so
there are no ready tasks. So in this patch we use the find_vptrs()
function to find a continuation from http_server::do_accept_one() - it
has an interesting fiber of 5 continuations.

After this patch all 33 tests in test/scylla-gdb/test_misc.py pass.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220110211813.581807-1-nyh@scylladb.com>
2022-01-11 17:01:08 +02:00
Avi Kivity
861cc1d304 Update seastar submodule
* seastar 28fe4214e5...ae8d1c28a2 (3):
  > cross-tree: convert deprecated later() to yield()
  > future: deprecate later(), and add two alternatives
  > reactor: improve lowres_clock, lowres_system_clock granularity
2022-01-11 17:01:08 +02:00
Nadav Har'El
7f5ca5bf3f Merge 'replica: move distributed_loader to replica module' from Avi Kivity
distributed_loader is replica-side thing, so it belongs in the
replica module ("distributed" refers to its ability to load
sstables in their correct shards). So move it to the replica
module.

The change exposes a dependency on the construction
order of static variables (which isn't defined), so we remove
the dependency in the first two patches.

Closes #9891

* github.com:scylladb/scylla:
  replica: move distributed_loader into replica module
  tracing: make sure keyspace and table names are available to static constructors
  auth: make sure keyspace and table names are available to static constructors
2022-01-11 17:01:08 +02:00
Pavel Emelyanov
4dd1c15b7b Merge v3 of "Deglobalize repair tracker" from Benny
This series gets rid of the global repair_tracker
and thread-local node_ops_metrics instances.

It does so by first, make the repair_tracker sharded,
with an instance per repair_service shard.
The, exposing the repair_service::repair_tracker
and keeping a reference to the repair_service in repair_info.

Then the node_ops_metrics instances are moved from
thread-local global variables to class repair_service.

The motivation for this series is two fold:
1. There is a global effor the get rid of global services
   and instantiate all services on the stack of main() or cql_test_env.
2. As part of https://github.com/scylladb/scylla/issues/9809,
   we would like to eventually use a generci job tracer for both repair
   and compaction, so this would be one of the prelimanry steps to get there.

Refs #9809

Test: unit(release) (including scylla-gdb)
Dtest: repair_additional_test.py::TestRepairAdditional::{test_repair_disjoint_row_2nodes,test_repair_joint_row_3nodes_2_diff_shard_count} replace_address_test.py::TestReplaceAddress::test_serve_writes_during_bootstrap[rbo_enabled]
(Still seeing https://github.com/scylladb/scylla/issues/9785 but nothing worse)

* github.com:bhalevy/scylla.git deglobalize-repair-tracker-v4
  repair: repair_tracker: get rid of _the_tracker
  repair: repair_service: move free abort_repair_node_ops function to repair_service
  repair_service: deglobalize node_ops_metrics
  repair: node_ops_metrics: fixup indentation
  repair: node_ops_metrics: declare in header file
  repair: repair_info: add check_in_shutdown method
  repair: use repair_info to get to the repair tracker
  repair: move tracker-dependent free functions to repair_service
  repair: tracker: mark get function const
  repair_service: add repair_tracker getter
  repair: make repair_tracker sharded
  repair: repair_tracker: get rid of unused abort_all_abort_source
  repair: repair_tracker: get rid of unused shutdown abort source
2022-01-11 17:01:08 +02:00
Nadav Har'El
261c4b80b5 Update tools/java submodule
* tools/java 6249bfbe2f...b1e09c8b8f (1):
  > dist/debian:set either python (>=2.7) or python2
2022-01-11 17:01:08 +02:00
Calle Wilund
706f20442b query_idl: Make qr_partition::rows/query_result::partitions chunked
When doing potentially large (internal) queries, i.e. alternator streams,
we can cause large allocations here.
2022-01-11 13:52:40 +00:00
Michael Livshin
1f27e12dc6 convert make_multishard_streaming_reader() to flat_mutation_reader_v2
All changes are mechanical.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-01-11 10:49:26 +02:00
Michael Livshin
be5118a7c9 convert table::make_streaming_reader() to flat_mutation_reader_v2
All changes are mechanical.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-01-11 10:49:26 +02:00
Michael Livshin
221cd264db convert make_flat_multi_range_reader() to flat_mutation_reader_v2
Mechanical changes and a resulting downgrade in one caller (which is
itself converted later).

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-01-11 10:49:26 +02:00
Michael Livshin
91d38ef2a9 view_update_generator: remove unneeded call to downgrade_to_v1()
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-01-11 10:49:26 +02:00
Michael Livshin
7f0e228cbb introduce multishard_combining_reader_v2
All changes are mechanical.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-01-11 10:49:26 +02:00
Michael Livshin
4bc0deb7e9 introduce shard_reader_v2
Needed for multishard_combining_reader_v2 (see next commit), all
changes are mechanical.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-01-11 10:49:26 +02:00
Michael Livshin
6499361b6a introduce the reader_lifecycle_policy_v2 abstract base
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-01-11 10:49:26 +02:00
Michael Livshin
b053716e74 evictable_reader_v2: further code simplifications
Almost all mechanical: not passing a `reader` parameter around when we
know it's the `_reader` member, folding a short one-use method into
its caller.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-01-11 10:49:26 +02:00
Michael Livshin
402dbd2ca7 introduce evictable_reader_v2 & friends
Cloning instead of converting because there is at least one
downstream (via multishard_combining_reader) use that is not
straightforward to convert (multishard_mutation_query).

The clone is mostly mechanical and much simpler than the original,
because it does not have to deal with range tombstones when deciding
if it is safe to pause the wrapped reader, and also does not have to
trim any range tombstones.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-01-11 10:49:26 +02:00
Raphael S. Carvalho
49eeacff37 compaction_manager: make run_with_compaction_disabled() barrier out non-regular compactions
run_with_compaction_disabled() is used to temporarily disable compaction
for a table T. Not only regular compaction, but all types.
Turns out it's stopping all types but it's only preventing new regular
compactions from starting. So major for example can start even with
compaction temporarily disabled. This is fixed by not allowing
compaction of any type if disabled. This wasn't possible before as
scrub incorrectly ran entirely with compaction disabled, so it wouldn't
be able to start, but now it only disables compaction while retrieving
its candidate list.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220107154942.59800-1-raphaelsc@scylladb.com>
2022-01-10 18:57:16 +02:00
Raphael S. Carvalho
1c23d1099a Make population more resilient when reshape fails
Reshape isn't mandatory for correctness, unlike resharding.
So we can allow boot to continue even in face of reshape
failure. Without this, boot will fail right away due to
unhandled exception. This is intended to make population
more resilient as any exception, even "benign" ones,
may cause boot to fail. It's better to allow boot to
continue from where it left off, as if there's an exception
like io error, or OOM, population will be unable to
complete anyway.

This patch was written based on observation that dangling
errors in interposer consumer used by compaction can cause
a different exception to be triggered, like broken_promise,
when user asked reshape to stop. This can no longer happen
now, but better safe than sorry.
So regular compaction can now pick on backlog once node is
online.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220107130539.14899-1-raphaelsc@scylladb.com>
2022-01-10 18:57:16 +02:00
Avi Kivity
4392c20bd3 replica: move distributed_loader into replica module
distributed_loader is replica-side thing, so it belongs in the
replica module ("distributed" refers to its ability to load
sstables in their correct shards). So move it to the replica
module.
2022-01-10 15:25:28 +02:00
Avi Kivity
bfa4abaf6b tracing: make sure keyspace and table names are available to static constructors
Static constructors (specifically for the `system_keyspaces` global variable)
need their dependencies to be already constructed when their own
construction begins. Because tracing uses seastar::sstring, which is not
constexpr, we must change it to std::string_view (which is). Change
the type and perform the required adjustments. The definition is moved
to the header file for simplicity.
2022-01-10 15:24:57 +02:00
Gleb Natapov
1db151bd75 storage_proxy: move all verbs to the IDL
Define all verbs in the IDL instead of manually codding them.
2022-01-10 14:58:28 +02:00
Gleb Natapov
c998f77cd2 idl-compiler: allow const references in send() parameter list
Currently send function parameters and rpc handler's function parameters
have both to be values, but sometimes we want send function to receive a
const reference to a value to avoid copying, but a handler still needs
to get it by value obviously. Support that by introducing one more type
attribute [[ref]]. If present the code generator makes send function
argument to look like 'const type&' and handler's argument will be
'type'.
2022-01-10 14:44:20 +02:00
Gleb Natapov
f3d5507f86 idl-compiler: support smart pointers in verb's return value
A verb's handler may return a 'foreign_ptr<smart_ptr<type>>' value which
is received on a client side as a naked 'type'. Current verb generator
code can only support symmetric handler/send helper where return type pf
a handler matches return type of a send function. Fix that by adding two
new attributes that can annotate a return type: unique_ptr,
lw_shared_ptr. If unique_ptr attribute is present the return type of a
handler will be 'foreign_ptr<unique_ptr<type>>' and the return type of a
send function will be just 'type'.
2022-01-10 14:29:37 +02:00
Gleb Natapov
9329234941 idl-compiler: support multiple return value and optional in a return value
RPC verbs can be extended to return more then one value and new values
are returned as rpc::optional. When adding a return value to a verb
its return values becomes rpc::tuple<type1, type2, type3>. In addition
new return values may be marked as rpc::optional for backwards
compatibility. The patch allow to part return expression of the form:
  -> type1, type2 [[version 1.1.0]]
which will be translated into:
  rpc::tuple<type1, rpc::optional<type2>>
2022-01-10 14:23:51 +02:00
Gleb Natapov
9c88ea2303 idl-compiler: handle :: at the beginning of a type
Currently types starting from '::' like '::ns::type' cause parsing
errors. Fix it.
2022-01-10 14:22:48 +02:00
Gleb Natapov
cf8c42ee42 idl-compiler: sending one way message without timeout does not require ret value specialization as well 2022-01-10 14:16:20 +02:00
Gleb Natapov
ff6a0fffaf storage_proxy: convert more address vectors to inet_address_vector_replica_set 2022-01-10 13:48:20 +02:00
Benny Halevy
50a361c280 repair: repair_tracker: get rid of _the_tracker
the global _the_tracker pointer is no longer used,
remove it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-10 12:03:57 +02:00
Benny Halevy
ceb08b9302 repair: repair_service: move free abort_repair_node_ops function to repair_service
Do not depend on the_repair_tracker().

With that, the_repair_tracker() is no longer used
and should be deleted.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-10 11:59:22 +02:00
Benny Halevy
6bd78eb9a6 repair_service: deglobalize node_ops_metrics
Embed the node_ops_metrics instance in
a sharded repair_service member.

Test: curl -silent http://127.0.0.1:9180/metrics | grep node_ops | grep -v "^#"
      on a freshly started scylla instance.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-10 11:57:54 +02:00
Benny Halevy
a9c30f47fe repair: node_ops_metrics: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-10 11:52:58 +02:00
Benny Halevy
91cee22792 repair: node_ops_metrics: declare in header file
For de-globalizing its thread-local instance
by placing a node_ops_metrics member in repair_service.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-10 11:52:54 +02:00