Commit Graph

4972 Commits

Author SHA1 Message Date
Tomasz Grabiec
b044db863f Merge 'db/virtual_table: Streaming tables for large data + describe_ring example table' from Juliusz Stasiewicz
This is the 2nd PR in a series with the goal of finishing the hackathon project authored by @tgrabiec, @kostja, @amnonh and @mmatczuk (improved virtual tables + function call syntax in CQL). This one introduces a new implementation of virtual tables, the streaming tables, which are suitable for large amounts of data.

This PR was created by @jul-stas and @StarostaGit

Closes #8961

* github.com:scylladb/scylla:
  test/boost: run_mutation_source_tests on streaming virtual table
  system_keyspace: Introduce describe_ring table as virtual_table
  storage_service: Pass the reference down to system_keyspace
  endpoint_details: store `_host` as `gms::inet_address`
  queue_reader: implement next_partition()
  virtual_tables: Introduce streaming_virtual_table
  flat_mutation_reader: Add a new filtering reader factory method
2021-07-23 18:05:51 +02:00
Avi Kivity
d0d42891e9 Merge 'Harden batchlog_manager stop and call from main in deferred action' from Benny Halevy
This PR contains the parts relevant to batchlog_manager stop in #8998 without adding a gate to the storage_proxy for synchronization with on-going queries in storage_proxy::drain_on_shutdown.

As explained in #9009, we see that the batchlog_manager isn't stopped if scylla shuts down during startup, e.g. when waiting for gossip to settle, since currently the batchlog_manager is stopped only from `storage_service::do_drain`, while `storage_service::drain_on_shutdown` deferred shutdown is installed only later on:
222ef17305/main.cc (L1419-L1421)

Fixes #9009

Test: unit(dev)
DTest: compact_storage_tests.py:TestCompactStorage.wide_row_test paging_test:TestPagingDatasetChanges.test_cell_TTL_expiry_during_paging update_cluster_layout_tests:TestUpdateClusterLayout.simple_add_new_node_while_adding_info_{1,2}_test (dev)

Closes #9010

* github.com:scylladb/scylla:
  main: add deferred stop of batchlog_manager
  batchlog_manager: refactor drain out of stop
  batchlog_manager: stop: break _sem on shard 0
  batchlog_manager: stop: use abort_source to abort batchlog_replay_loop
  batchlog_manager: do_batch_log_replay: hold _gate
2021-07-22 15:47:29 +03:00
Benny Halevy
5165780d81 batchlog_manager: refactor drain out of stop
drain() aborts the replay loop fiber
and returns its future.

It's grabbing _gate so stop() will wait on it.

The intention is to call stop_replay_loop from
storage_service::decommission and do_drain rather
than stop, so we can stop the batchlog manager once,
using a deferred action in main.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-07-20 20:23:06 +03:00
Benny Halevy
c47fbda076 batchlog_manager: stop: break _sem on shard 0
Abort do_batch_log_replay if waiting on the semaphore.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-07-20 19:35:23 +03:00
Benny Halevy
deef1b4f59 batchlog_manager: stop: use abort_source to abort batchlog_replay_loop
Harden start/stop by using an abort_source to abort from
the replay loop.

Extract the loop into batchlog_replay_loop() coroutine,
with the _stop abort source as a stop condition,
plus use it for sleep_abortable to be able to promptly
stop while sleeping.

start() stores batchlog_replay_loop's future in a newly added
_started member, which is waited on in stop() to synchronize
with the start process at any stage.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-07-20 19:32:55 +03:00
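The stop pattern described in the commit above (a loop with a stop condition, an abortable sleep, and stop() waiting for the loop started by start()) can be illustrated outside Seastar. The following is a minimal sketch using only the C++ standard library, not the batchlog_manager code: a mutex-guarded flag stands in for the abort source, a condition-variable wait stands in for sleep_abortable, and a joined thread stands in for the stored future. The class `replay_loop_sketch` and all of its members are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a periodic background loop that must stop promptly.
class replay_loop_sketch {
    std::mutex _mtx;
    std::condition_variable _cv;
    bool _stopping = false;          // plays the role of the abort source
    std::thread _started;            // kept so stop() can wait for the loop to exit
    std::atomic<int> _rounds{0};

    void run(std::chrono::milliseconds interval) {
        std::unique_lock<std::mutex> lock(_mtx);
        while (!_stopping) {
            ++_rounds;               // placeholder for one replay round
            // Sleeps for `interval`, but wakes as soon as stop() sets the flag.
            _cv.wait_for(lock, interval, [this] { return _stopping; });
        }
    }

public:
    ~replay_loop_sketch() { stop(); }

    void start(std::chrono::milliseconds interval) {
        assert(!_started.joinable());
        _started = std::thread([this, interval] { run(interval); });
    }

    // Safe to call at any stage: before start(), while sleeping, or twice.
    void stop() {
        {
            std::lock_guard<std::mutex> guard(_mtx);
            _stopping = true;
        }
        _cv.notify_all();
        if (_started.joinable()) {
            _started.join();
        }
    }

    int rounds() const { return _rounds.load(); }
};
```

With a plain timed sleep, stop() would have to wait out the remainder of the interval; waiting on the condition variable lets it return as soon as the flag is set.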
Benny Halevy
976b517f55 batchlog_manager: do_batch_log_replay: hold _gate
So we can wait on do_batch_log_replay on stop().

Note that do_batch_log_replay is called both from
batchlog_replay_loop and from the storage_service.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-07-20 19:30:55 +03:00
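The idea behind holding a gate for the duration of an operation, so that stop() can wait for it, can be sketched with the standard library alone. This is an illustration of the general technique under simplifying assumptions, not Seastar's gate and not the batchlog_manager code: `gate_sketch`, `gate_holder` and `do_replay_sketch` are hypothetical names, and close() blocks the calling thread instead of returning a future.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <stdexcept>

// Counts in-flight operations; close() rejects new ones and waits for the rest.
class gate_sketch {
    mutable std::mutex _mtx;
    std::condition_variable _cv;
    unsigned _in_flight = 0;
    bool _closed = false;

public:
    void enter() {
        std::lock_guard<std::mutex> guard(_mtx);
        if (_closed) {
            throw std::runtime_error("gate closed");
        }
        ++_in_flight;
    }

    void leave() {
        std::lock_guard<std::mutex> guard(_mtx);
        assert(_in_flight > 0);
        if (--_in_flight == 0) {
            _cv.notify_all();
        }
    }

    // Blocks until every operation that entered has left.
    void close() {
        std::unique_lock<std::mutex> lock(_mtx);
        _closed = true;
        _cv.wait(lock, [this] { return _in_flight == 0; });
    }

    unsigned in_flight() const {
        std::lock_guard<std::mutex> guard(_mtx);
        return _in_flight;
    }
};

// RAII helper: enters on construction, leaves on destruction.
class gate_holder {
    gate_sketch& _gate;

public:
    explicit gate_holder(gate_sketch& g) : _gate(g) { _gate.enter(); }
    ~gate_holder() { _gate.leave(); }
    gate_holder(const gate_holder&) = delete;
    gate_holder& operator=(const gate_holder&) = delete;
};

// Hypothetical operation that holds the gate while it runs,
// regardless of which caller invoked it.
void do_replay_sketch(gate_sketch& gate) {
    gate_holder hold(gate);
    // ... the replay work would go here ...
}
```

Because the operation itself takes the holder, every caller is covered, which matches the note above that do_batch_log_replay has more than one call site.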
Juliusz Stasiewicz
65c87e2c74 system_keyspace: Introduce describe_ring table as virtual_table
This change adds "system.describe_ring" table using the new
streaming_virtual_table infrastructure.
2021-07-20 14:19:17 +02:00
Juliusz Stasiewicz
f8067d938d storage_service: Pass the reference down to system_keyspace
According to the policy of avoiding globals.
2021-07-20 14:18:24 +02:00
Piotr Wojtczak
9a77751c6b virtual_tables: Introduce streaming_virtual_table
This change adds another implementation of the virtual_table interface,
useful for cases where there are larger amounts of data.
2021-07-20 14:00:54 +02:00
Calle Wilund
4990ba2769 commitlog: Make allocate_when_possible a template
And call it by-value with the polymorphic writers. This
eliminates outer coroutine frame and ensures we use only one
for fast-case allocation.
2021-07-19 08:27:30 +00:00
Calle Wilund
69ead0e658 commitlog: break fast path alloc into non-fut/corout + outer loop
Removes 2 coroutine frames in fast path (as long as segment + space is
avail). Puts IPS back on track with master.
2021-07-19 08:27:30 +00:00
Calle Wilund
62acc84e58 commitlog: Drop stream/subscription from replayer
Change args to values so stays on coroutine frame.
Remove pointless subscription/stream usage, just iterate.
2021-07-19 08:27:30 +00:00
Calle Wilund
5e8af28da7 commitlog: coroutinize commitlog::read_log_file 2021-07-19 08:27:30 +00:00
Calle Wilund
b3c35f9ec0 commitlog: coroutinize commitlog::create_commitlog 2021-07-19 08:27:30 +00:00
Calle Wilund
ef471d0a93 commitlog: coroutinize commitlog::add_entries 2021-07-19 08:27:30 +00:00
Calle Wilund
96434b1b12 commitlog: coroutinize commitlog::add_entry 2021-07-19 08:27:30 +00:00
Calle Wilund
e16cff6952 commitlog: coroutinize commitlog::add 2021-07-19 08:27:30 +00:00
Calle Wilund
da360fb841 commitlog: change entry_writer usage to reference
Calling frames keep the object alive in all paths. Use references in
allocate()/allocate_when_possible()
2021-07-19 08:27:30 +00:00
Calle Wilund
42bfae513a commitlog: coroutinize segment_manager::clear 2021-07-19 08:27:30 +00:00
Calle Wilund
554a09baab commitlog: coroutinize segment_manager::do_pending_deletes 2021-07-19 08:27:30 +00:00
Calle Wilund
9e18cf3f5f commitlog: coroutinize segment_manager::delete_file 2021-07-19 08:27:30 +00:00
Calle Wilund
ca65387c53 commitlog: coroutinize segment_manager::shutdown 2021-07-19 08:27:30 +00:00
Calle Wilund
4678d1fbec commitlog: coroutinize segment_manager::shutdown_all_segments 2021-07-19 08:27:30 +00:00
Calle Wilund
2f048e658b commitlog: coroutinize segment_manager::sync_all_segments 2021-07-19 08:27:30 +00:00
Calle Wilund
ad4e4e9ee4 commitlog: coroutinize segment_manager::clear_reserve_segments 2021-07-19 08:27:30 +00:00
Calle Wilund
ec430807fc commitlog: coroutinize segment_manager::active_segment 2021-07-19 08:27:30 +00:00
Calle Wilund
13bba1ef39 commitlog: coroutinize segment_manager::new_segment 2021-07-19 08:27:30 +00:00
Calle Wilund
ccd34203dc commitlog: coroutinize segment_manager::allocate_segment 2021-07-19 08:27:30 +00:00
Calle Wilund
f5de830f0c commitlog: coroutinize segment_manager::rename_file 2021-07-19 08:27:30 +00:00
Calle Wilund
011bc68209 commitlog: coroutinize segment_manager::init 2021-07-19 08:27:30 +00:00
Calle Wilund
04c725b29c commitlog: coroutinize segment_manager::list_descriptors 2021-07-19 08:27:30 +00:00
Calle Wilund
d514fc5822 commitlog: coroutinize segment_manager::replenish_reserve 2021-07-19 08:27:30 +00:00
Calle Wilund
d4bd17d577 commitlog: coroutinize segment::shutdown 2021-07-19 08:17:33 +00:00
Calle Wilund
e9820827e3 commitlog: coroutinize segment::close 2021-07-19 08:17:33 +00:00
Calle Wilund
999701a8ee commitlog: coroutinize segment::batch_cycle 2021-07-19 08:17:33 +00:00
Calle Wilund
cef7ee2014 commitlog: coroutinize segment::do_flush 2021-07-19 08:17:33 +00:00
Calle Wilund
1a76d735f2 commitlog: coroutinize segment::flush 2021-07-19 08:17:33 +00:00
Calle Wilund
0b1e2084ce commitlog: coroutinize segment::cycle 2021-07-19 08:17:33 +00:00
Calle Wilund
79b9cb1e5c commitlog: coroutinize allocate_when_possible 2021-07-19 08:17:33 +00:00
Calle Wilund
e545b382bd commitlog: coroutinize segment::allocate 2021-07-19 08:17:33 +00:00
Nadav Har'El
5183e0cbe9 Merge 'Fix artificial view update size limit' from Piotr Sarna
The series which split the view update process into smaller parts
accidentally put an artificial 10MB limit on the generated mutation
size, which is wrong - this limit is configurable for users,
and, what's more important, this data was already validated when
it was inserted into the base table. Thus, the limit is lifted.

The series comes with a cql-pytest which failed before the fix and succeeds now. This bug is also covered by the `wide_rows_test.py:TestWideRows_with_LeveledCompactionStrategy.test_large_cell_in_materialized_view` dtest, but that test needs over a minute to run, as opposed to less than 1 second for the cql-pytest.

Fixes #9047

Tests: unit(release), dtest(wide_rows_test.py:TestWideRows_with_LeveledCompactionStrategy.test_large_cell_in_materialized_view)

Closes #9048

* github.com:scylladb/scylla:
  cql-pytest: add a materialized views suite with first cases
  db,view: drop the artificial limit on view update mutation size
2021-07-15 17:03:07 +03:00
Piotr Sarna
697e2fc66d db,view: drop the artificial limit on view update mutation size
The series which split the view update process into smaller parts
accidentally put an artificial 10MB limit on the generated mutation
size, which is wrong - this limit is configurable for users,
and, what's more important, this data was already validated when
it was inserted into the base table. Thus, the limit is lifted.

Tests: unit(release), dtest(wide_rows_test)
2021-07-15 14:09:37 +02:00
Botond Dénes
1b7eea0f52 reader_concurrency_semaphore: admission: flip the switch
This patch flips two "switches":
1) It switches admission to be up-front.
2) It changes the admission algorithm.

(1) by now all permits are obtained up-front, so this patch just yanks
out the restricted reader from all reader stacks and simultaneously
switches all `obtain_permit_nowait()` calls to `obtain_permit()`. By
doing this admission is now waited on when creating the permit.

(2) we switch to an admission algorithm that adds a new aspect to the
existing resource availability: the number of used/blocked reads. Namely
it only admits new reads if in addition to the necessary amount of
resources being available, all currently used readers are blocked. In
other words we only admit new reads if all currently admitted reads
require something other than CPU to progress. They are either waiting
on I/O, a remote shard, or attention from their consumers (not used
currently).

We flip these two switches at the same time because up-front admission
means cache reads now need to obtain a permit too. For cache reads the
optimal concurrency is 1. Anything above that just increases latency
(without increasing throughput). So we want to make sure that if a cache
reader hits it doesn't get any competition for CPU and it can run to
completion. We admit new reads only if the read misses and has to go to
disk.

Another change made to accommodate this switch is the replacement of the
replica-side read execution stages with the reader concurrency
semaphore as an execution stage. This replacement is needed because with
the introduction of up-front admission, reads are not independent of
each other anymore. One read executed can influence whether later reads
executed will be admitted or not, and execution stages require
independent operations to work well. By moving the execution stage into
the semaphore, we have an execution stage which is in control of both
admission and running the operations in batches, avoiding the bad
interaction between the two.
2021-07-14 17:19:02 +03:00
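The admission rule described in point (2) of the commit above can be written down as a small predicate. The following is a simplified sketch based only on that description, not the reader_concurrency_semaphore implementation: the type `admission_sketch`, the `resources` struct and every member name are hypothetical, and waiting, queuing and execution-stage batching are left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical resource bundle: a number of read slots plus memory in bytes.
struct resources {
    int count;
    long memory;
};

class admission_sketch {
    resources _available;
    std::size_t _used = 0;     // admitted reads that are currently in use
    std::size_t _blocked = 0;  // of those, reads waiting on something other than CPU

public:
    explicit admission_sketch(resources initial) : _available(initial) {}

    // A new read is admitted only if the resources are available
    // AND every currently used read is blocked.
    bool can_admit(resources need) const {
        const bool enough = _available.count >= need.count
                         && _available.memory >= need.memory;
        const bool all_blocked = (_used == _blocked);
        return enough && all_blocked;
    }

    bool try_admit(resources need) {
        if (!can_admit(need)) {
            return false;
        }
        _available.count -= need.count;
        _available.memory -= need.memory;
        ++_used;
        return true;
    }

    // Called when a used read starts or stops waiting on I/O, a remote shard, etc.
    void mark_blocked() {
        assert(_blocked < _used);
        ++_blocked;
    }
    void mark_unblocked() {
        assert(_blocked > 0);
        --_blocked;
    }

    // Returns the resources of a finished read; the read must not be marked blocked.
    void release(resources r) {
        assert(_used > _blocked);
        _available.count += r.count;
        _available.memory += r.memory;
        --_used;
    }
};
```

In this sketch a read served from cache never calls mark_blocked(), so no other read is admitted until it is released, which corresponds to the stated goal of a concurrency of 1 for cache reads. A read that misses and goes to disk marks itself blocked and thereby lets the next read in.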
Botond Dénes
7bfa40a2f1 treewide: use make_tracking_only_permit()
For all those reads that don't (won't or can't) pass through admission
currently.
2021-07-14 17:19:02 +03:00
Botond Dénes
f28b5018f2 view/view_update_generator: use obtain_reader_permit() 2021-07-14 16:48:43 +03:00
Botond Dénes
ea2345c944 db/size_estimates_virtual_reader: mark as blocked when obtaining local ranges 2021-07-14 16:48:43 +03:00
Nadav Har'El
1ff1c3735b Merge 'Remove the mutation-based restriction checks' from Piotr Sarna
This series unifies the interface for checking if CQL restrictions are satisfied. Previously, an additional mutation-based approach was added in the materialized views layer, but the decision was reached that it's better to have a single API based on partition slices. With that, the regular selection path gets simplified at the cost of more complicated view generation path, which is a good tradeoff.
Note that in order to unify the interface, the view layer performs ugly transformations in order to adjust the input for `is_satisfied_by`. Reviewers, please take a close look at this code (`matches_view_filter`, `clustering_prefix_matches`, `partition_key_matches`), because it looks error-prone and relies on dirty internals of our serialization layer. If somebody has a better suggestion on how to do the transformation, I'm all ears.

Tests: unit(release), manual(playing with materialized views with custom filters)
Fixes #7215

Closes #8979

* github.com:scylladb/scylla:
  db,view,table: drop unneeded time point parameter
  cql3,expr: unify get_value
  cql3,expr: purge mutation-based is_satisfied_by
  db,view: migrate key checks from the deprecated is_satisfied_by
  db,view: migrate checking view filter to new is_satisfied_by
  db,view: add a helper result builder class
  db,view: move make_partition_slice helper function up
2021-07-13 12:42:13 +03:00
Piotr Sarna
a1813c9b34 db,view,table: drop unneeded time point parameter
Now that restriction checking is translated to the partition-slice-style
interface, checking the partition/clustering key restrictions for views
can be performed without the time point parameter.
The parameter is dropped from all relevant call sites.
2021-07-13 10:40:08 +02:00
Piotr Sarna
37fc3f4b5b db,view: migrate key checks from the deprecated is_satisfied_by
Last two users of the mutation-based is_satisfied_by function
were in the partition/clustering key checks. These functions are now
translated to use the original API.
2021-07-13 10:40:07 +02:00
Piotr Sarna
d6b0a8338a db,view: migrate checking view filter to new is_satisfied_by
In order to unify the interfaces, the is_satisfied_by flavor
for mutations is getting deprecated. In order to be able to remove it,
one of its biggest users, the matches_view_filter() function,
is translated to the other variant.
2021-07-13 10:04:03 +02:00