29741 Commits

Author SHA1 Message Date
Botond Dénes
0e1bdca71b test/boost/sstable_compaction_test: migrate scrub tests to v2 2022-01-14 08:54:26 +02:00
Botond Dénes
da0c5adcc3 test/lib/simple_schema: add v2 of make_row() and make_static_row() 2022-01-14 08:54:26 +02:00
Botond Dénes
d57634ad46 compaction: use v2 version of mutation_writer::segregate_by_partition() 2022-01-14 08:54:26 +02:00
Botond Dénes
e772326b10 mutation_writer: add v2 version of segregate_by_partition()
Just a facade using converters behind the scenes. The actual segregator
is not worth migrating to v2 while mutation and the flushing readers
don't have v2 versions. Still, migrating all users to a v2 API allows
the conversion to happen at a single point where more work is necessary,
instead of scattered around all the users.
We leave the v1 version in place to aid incremental migration to the v2
one.
2022-01-14 08:54:26 +02:00
Botond Dénes
b315d17c2a compaction: migrate scrub and validate to v2
We add a v2 version of the external API but leave the old v1 in place to help
incremental migration. The implementation is migrated to v2.
2022-01-14 08:54:26 +02:00
Botond Dénes
f61fcfbada mutation_fragment_stream_validator: migrate validator to v2
Add support for validating v2 streams while still keeping the v1
support. Since the underlying logic is largely independent of the format
version, this is simple to do and will allow incremental migration of
users.
2022-01-14 08:54:26 +02:00
Nadav Har'El
8bcd23fa02 Merge: move rest of internal ddl users to use raft from Gleb
The patch series moves the rest of internal ddl users to do schema
change over raft (if enabled). After that series only tests are left
using old API.

* 'gleb/raft-schema-rest-v6' of github.com:scylladb/scylla-dev: (33 commits)
  migration_manager: drop no longer used functions
  system_distributed_keyspace: move schema creation code to use raft
  auth: move table creation code to use raft
  auth: move keyspace creation code to use raft
  table_helper: move schema creation code to use raft
  cql3: make query_processor inherit from peering_sharded_service
  table_helper: make setup_table() static
  table_helper: co-routinize setup_keyspace()
  redis: move schema creation code to go through raft
  thrift: move system_update_column_family() to raft
  thrift: authenticate a statement before verifying in system_update_column_family()
  thrift: co-routinize system_update_column_family()
  thrift: move system_update_keyspace() to raft
  thrift: authenticate a statement before verifying in system_update_keyspace()
  thrift: co-routinize system_update_keyspace()
  thrift: move system_drop_keyspace() to raft
  thrift: authenticate a statement before verifying in system_drop_keyspace()
  thrift: co-routinize system_drop_keyspace()
  thrift: move system_add_keyspace() to raft
  thrift: co-routinize system_add_keyspace()
  ...
2022-01-12 18:09:08 +02:00
Gleb Natapov
2aec9009ef migration_manager: drop no longer used functions 2022-01-12 16:40:06 +02:00
Gleb Natapov
9ce62bcc33 system_distributed_keyspace: move schema creation code to use raft 2022-01-12 16:40:06 +02:00
Gleb Natapov
50b7806c57 auth: move table creation code to use raft 2022-01-12 16:40:06 +02:00
Gleb Natapov
4273a3308c auth: move keyspace creation code to use raft 2022-01-12 16:40:06 +02:00
Gleb Natapov
03184bd786 table_helper: move schema creation code to use raft 2022-01-12 16:40:06 +02:00
Gleb Natapov
eb62e81843 cql3: make query_processor inherit from peering_sharded_service
This way we can get to a distributed object from a shard-local one.
2022-01-12 16:40:06 +02:00
Gleb Natapov
e2a29d9239 table_helper: make setup_table() static
It will make it easier to move schema creation to shard 0.
2022-01-12 16:40:06 +02:00
Gleb Natapov
3995f75b30 table_helper: co-routinize setup_keyspace()
Also replace open-coded loops with more modern C++ alternatives.
2022-01-12 16:40:05 +02:00
Gleb Natapov
5b4982d01f redis: move schema creation code to go through raft 2022-01-12 16:33:16 +02:00
Gleb Natapov
dd36150a7d thrift: move system_update_column_family() to raft 2022-01-12 16:33:16 +02:00
Gleb Natapov
bcfdcc51d6 thrift: authenticate a statement before verifying in system_update_column_family()
Otherwise it is possible to infer if a table exists without having proper
credentials.
2022-01-12 16:33:16 +02:00
Gleb Natapov
aec413d0f7 thrift: co-routinize system_update_column_family() 2022-01-12 16:33:16 +02:00
Gleb Natapov
d9c315891a thrift: move system_update_keyspace() to raft 2022-01-12 16:33:16 +02:00
Gleb Natapov
7ffbdde554 thrift: authenticate a statement before verifying in system_update_keyspace()
Otherwise it is possible to infer if a table exists without having proper
credentials.
2022-01-12 16:33:16 +02:00
Gleb Natapov
1b4538f5bd thrift: co-routinize system_update_keyspace() 2022-01-12 16:33:16 +02:00
Gleb Natapov
64b8f4fe50 thrift: move system_drop_keyspace() to raft 2022-01-12 16:33:16 +02:00
Gleb Natapov
52fc815f24 thrift: authenticate a statement before verifying in system_drop_keyspace()
Otherwise it is possible to infer if a table exists without having proper
credentials.
2022-01-12 16:33:16 +02:00
Gleb Natapov
45ff7e30a1 thrift: co-routinize system_drop_keyspace() 2022-01-12 16:33:16 +02:00
Gleb Natapov
a17f82c647 thrift: move system_add_keyspace() to raft 2022-01-12 16:33:16 +02:00
Gleb Natapov
3a3a3f693e thrift: co-routinize system_add_keyspace() 2022-01-12 16:33:16 +02:00
Gleb Natapov
845b617256 thrift: move system_drop_column_family() to raft 2022-01-12 16:33:16 +02:00
Gleb Natapov
9b6a9b104e thrift: co-routinize system_drop_column_family() 2022-01-12 16:33:16 +02:00
Gleb Natapov
7cfedb50bb thrift: move system_add_column_family() to raft 2022-01-12 16:33:16 +02:00
Gleb Natapov
e4ac3c2777 thrift: authenticate a statement before verifying in system_add_column_family()
Otherwise it is possible to infer if a table exists without having proper
credentials.
2022-01-12 16:33:16 +02:00
Gleb Natapov
d5f14306d0 thrift: co-routinize system_add_column_family() 2022-01-12 16:33:16 +02:00
Gleb Natapov
1491cc2906 alternator: move create_table() to raft 2022-01-12 16:33:16 +02:00
Gleb Natapov
0cd6d283ad alternator: move update_table() to raft 2022-01-12 16:33:15 +02:00
Gleb Natapov
7ee39ff94b alternator: move validation in update_table() to the beginning 2022-01-12 16:33:15 +02:00
Gleb Natapov
740b2181e1 alternator: move update_tags() to raft 2022-01-12 16:33:15 +02:00
Gleb Natapov
57be1b773e alternator: move delete_table() to raft 2022-01-12 16:33:15 +02:00
Gleb Natapov
0ac20b5494 alternator: make some functions static
Make add_stream_options, supplement_table_info, supplement_table_stream_info static. They only need a pointer
to storage_proxy, so pass it directly.
2022-01-12 16:33:15 +02:00
Gleb Natapov
2e4a8bdfaa alternator: co-routinize delete_table() 2022-01-12 16:33:15 +02:00
Gleb Natapov
459539e812 migration_manager: do not allow creating keyspace with arbitrary timestamp
This was needed to fix issue #2129, which only manifested itself with
auto_bootstrap set to false. The option is ignored now and we always
wait for schema to sync during boot.
2022-01-12 16:33:15 +02:00
Botond Dénes
bdcbf3f71b Merge 'database: Add error message with mutation info on commit log apply failure' from Calle Wilund
Fixes #9408

While it is rare, some customer issues have shown that we can run into cases where commit log apply (writing mutations to it) fails badly. In the known cases, due to oversized mutations. While these should have been caught earlier in the call chain really, it would probably help both end users and us (trying to figure out how they got so big and how they got so far) if we added info to the errors thrown (and printed), such as ks, cf, and mutation content.

Somewhat controversial, this makes the apply with CL decision path coroutinized, mainly to make the error handling for the more informative wrapper exception easier/less ugly. Could perhaps do with futurize_invoke + then_wrapper also. But future is coroutines...

This is as stated somewhat problematic, it adds an allocation to perf_simple_query::write path (because of crap clang cr frame folding?). However, tasks/op remain constant and actual tps (though unstable) remain more or less the same (on my crappy measurements).

Counter path is unaffected, as coroutine frame alloc replaces with(...) alloc.

dtest for the wrapped exception on separate pr.

Closes #9412

* github.com:scylladb/scylla:
  database: Add error message with mutation info on commit log apply failure
  database: coroutinize do_apply and apply_with_commitlog
2022-01-12 16:16:29 +02:00
Calle Wilund
a6202ae079 database: Add error message with mutation info on commit log apply failure
Fixes #9408

While it is rare, some customer issues have shown that we can run into cases
where commit log apply (writing mutations to it) fails badly. In the known
cases, due to oversized mutations. While these should have been caught earlier
in the call chain really, it would probably help both end users and us (trying
to figure out how they got so big and how they got so far) if we added info
to the errors thrown (and printed), such as ks, cf, and mutation content.
2022-01-12 14:04:23 +00:00
Calle Wilund
63ea666ca0 database: coroutinize do_apply and apply_with_commitlog
Somewhat controversial. Making the apply with CL decision path
coroutinized, mainly so that the next patch can make error handling
more informative (because we will have exceptions that are immediate
and/or futurized).

This is as stated somewhat problematic, it adds an allocation to
perf_simple_query::write path (because of crap clang cr frame folding?).
However, tasks/op remain constant and actual tps (though unstable)
remain more or less the same (on my crappy measurements).

Counter path is unaffected, as coroutine frame alloc replaces with(...)
alloc, and all is same and dandy.

I am hoping that the simpler error + verbose code will compensate for
the extra alloc.
2022-01-12 14:04:15 +00:00
Nadav Har'El
23e93a26b3 Merge 'Alternator: stream results + chunk results to remove large allocations' from Calle Wilund
Refs: #9555

When running the "Kraken" dynamodb streams test to provoke the issue observed by QA, I noticed on my setup mainly two things: Large allocation stalls (+ warnings) and timeouts on read semaphores in DB.

This tries to address the first issue, partly by making query_result_view serialization using chunked vector instead of linear one, and by introducing a streaming option for json return objects, avoiding linearizing to string before wire.
Note that the latter has some overhead issues of its own, mainly data copying, since we essentially will be triple buffering (local, wrapped http stream, and final output stream). Still, normal string output will typically do a lot of realloc which is potential extra copies as well, so...

This is not really performance tested, but with these tweaks I no longer get large alloc stalls at least, so that is a plus. :-)

Closes #9713

* github.com:scylladb/scylla:
  alternator::executor: Use streamed result for scan etc if large result
  alternator::streams: Use streamed result in get_records if large result
  executor/server: Add routine to make stream object return
  rjson: Add print to stream of rjson::value
  query_idl: Make qr_partition::rows/query_result::partitions chunked
2022-01-12 15:53:31 +02:00
Calle Wilund
f73ca9659b alternator::executor: Use streamed result for scan etc if large result
Avoids large allocations for larger scans.
Todo: determine threshold
2022-01-12 13:34:49 +00:00
Calle Wilund
0c1ff5c2f5 alternator::streams: Use streamed result in get_records if large result
If we have a reasonable result set to send back to client, use direct
streaming of the object.

Todo: determine threshold.
2022-01-12 13:34:49 +00:00
Calle Wilund
4a8a7ef8b4 executor/server: Add routine to make stream object return
Simply retains result object and sets json::json_return_type to
streaming callback.
2022-01-12 13:34:49 +00:00
Calle Wilund
e2d7225df8 rjson: Add print to stream of rjson::value
Allows direct stream of object to seastar::stream. While not 100%
efficient, it has the advantage of avoiding large allocations
(long string) for huge result messages.
2022-01-12 13:34:49 +00:00
Avi Kivity
134601a15e Merge "Convert input side of mutation compactor to v2" from Botond
"
With this series the mutation compactor can now consume a v2 stream. On
the output side it still uses v1, so it can now act as an online
v2->v1 converter. This allows us to push out v2->v1 conversion to as far
as the compactor, usually the next to last component in a read pipeline,
just before the final consumer. For reads this is as far as we can go,
as the intra-node ABI and hence the result-sets built are v1. For
compaction we could go further and eliminate conversion altogether, but
this requires some further work on both the compactor and the sstable
writer and so it is left to be done later.
To summarize, this patchset enables a v2 input for the compactor and it
updates compaction and single partition reads to use it.
"

* 'mutation-compactor-consume-v2/v1' of https://github.com/denesb/scylla:
  table: add make_reader_v2()
  querier: convert querier_cache and {data,mutation}_querier to v2
  compaction: upgrade compaction::make_interposer_consumer() to v2
  mutation_reader: remove unnecessary stable_flattened_mutations_consumer
  compaction/compaction_strategy: convert make_interposer_consumer() to v2
  mutation_writer: migrate timestamp_based_splitting_writer to v2
  mutation_writer: migrate shard_based_splitting_writer to v2
  mutation_writer: add v2 clone of feed_writer and bucket_writer
  flat_mutation_reader_v2: add reader_consumer_v2 typedef
  mutation_reader: add v2 clone of queue_reader
  compact_mutation: make start_new_page() independent of mutation_fragment version
  compact_mutation: add support for consuming a v2 stream
  compact_mutation: extract range tombstone consumption into own method
  range_tombstone_assembler: add get_range_tombstone_change()
  range_tombstone_assembler: add get_current_tombstone()
2022-01-12 14:37:19 +02:00
Avi Kivity
4118f2d8be treewide: replace deprecated seastar::later() with seastar::yield()
seastar::later() was recently deprecated and replaced with two
alternatives: a cheap seastar::yield() and an expensive (but more
powerful) seastar::check_for_io_immediately(), that corresponds to
the original later().

This patch replaces all later() calls with the weaker yield(). In
all cases except one, it's unambiguously correct. In one case
(test/perf scheduling_latency_measurer::stop()) it's not so unambiguous,
since check_for_io_immediately() will additionally force a poll and
so will cause more work to be done (but no additional tasks to be
executed). However, I think that any measurement that relies on
measuring the work on the last tick is so inaccurate (you need
thousands of ticks to get any amount of confidence in the
measurement) that in the end it doesn't matter what we pick.

Tests: unit (dev)

Closes #9904
2022-01-12 12:19:19 +01:00