mirror of
https://github.com/scylladb/scylladb.git
synced 2026-05-31 12:06:44 +00:00
4c3e82bd67f5bf8cbf44d5b1fdfd941ddfd5f9f8
"
This series introduces the view_builder class, a sharded service
responsible for building all defined materialized views. This process
entails walking over the existing data in a given base table, and using
it to calculate and insert the respective entries for one or more views.
The view_builder uses the migration_manager to subscribe to schema
change events, and update its bookkeeping accordingly. We prefer this
to having the database call into the view_builder, as that would
create a cyclic dependency.
We serialize changes to the views of a particular base table, such
that schema changes do not interfere with the view building process.
We employ a flat_mutation_reader for each base table for which we're
building views.
We consume from the reader associated with each base table until all
its views are built. If the reader reaches the end and there are
incomplete views, then a view was added while others were being built.
In such cases, we restart the reader to the beginning of the current
token, but not to the beginning of the token range, when the view is
added. Then, when we exhaust the reader, we simply create a new one
for the whole token range, and resume building the pending views.
We aim to be resource-conscious. On a given shard, at any given moment,
we consume at most from one reader. We also strive for fairness, in that
each build step inserts entries for the views of a different base. Each
build step reads and generates updates for batch_size rows. We lack a
controller, which could potentially allow us to go faster (to execute
multiple steps at the same time, or consume more rows per batch), and
also which would apply backpressure, so we could, for example, delay
executing a build step.
Interaction with the system tables:
- When we start building a view, we add an entry to the
scylla_views_builds_in_progress system table. If the node restarts
at this point, we'll consider these newly inserted views as having
made no progress, and we'll treat them as new views;
- When we finish a build step, we update the progress of the views
that we built during this step by writing the next token to the
scylla_views_builds_in_progress table. If the node restarts here,
we'll start building the views at the token in the next_token
column.
- When we finish building a view, we mark it as completed in the
built views system table, and remove it from the in-progress system
table. Under failure, the following can happen:
* When we fail to mark the view as built, we'll redo the last
step upon node reboot;
* When we fail to delete the in-progress record, upon reboot
we'll remove this record.
A view is marked as completed only when all shards have finished
their share of the work, that is, if a view is not built, then all
shards will still have an entry in the in-progress system table;
- A view that a shard finished building, but not all other shards,
remains in the in-progress system table, with first_token ==
next_token.
Interaction with the distributed system tables:
- When we start building a view, we mark the view build as being
in-progress;
- When we finish building a view, we mark the view as being built.
Upon failure, we ensure that if the view is in the in-progress
system table, then it may not have been written to this table. We
don't load the built views from this table when starting. When
starting, the following happens:
* If the view is in the system.built_views table and not the
in-progress system table, then it will be in this one;
* If the view is in the system.built_views table and not in
this one, it will still be in the in-progress system table -
we detect this and mark it as built in this table too,
keeping the invariant;
* If the view is in this table but not in system.built_views,
then it will also be in the in-progress system table - we
don't detect this and will redo the missing step, for
simplicity.
View building is necessarily a sharded process. That means that on
restart, if the number of shards has changed, we need to calculate
the most conservative token range that has been built, and build
the remainder.
When building view updates, we consider that everything is new and
nothing pre-existing is there (which means no tombstones will be sent
out to the paired view replicas).
Tests:
unit (debug)
dtest (materialized_view_test.py(smp=1, smp=2))
"
* 'view-building/v4' of https://github.com/duarten/scylla: (22 commits)
tests/view_build_test: Add tests for view building
tests/cql_test_env: Move eventually() to this file
tests/cql_assertions: Assert result set is not empty
tests/cql_test_env: Start the view_builder
db/view/view_builder: Allow synchronizing with the end of a build
db/view/view_builder: Actually build views
flat_mutation_reader: Make reader from mutation fragments
db/view/view_builder: React to schema changes
service/migration_listener: Add class for view notifications
db/view: Introduce view_builder
column_family: Add function to populate views
column_family: Allow synchronizing with in-progress writes
database: Compare view id instead of name in find_views()
database: Add get_views() function
db/view: Return a future when sending view updates
service/storage_service: Allow querying the view build status
db: Introduce system_distributed_keyspace
tests: Add unit test for build_progress_virtual_reader
db/system_keyspace: Add API for MV-related system tables
db/system_keyspace: Add virtual reader for MV in-progress build status
...
…
Scylla
Quick-start
$ git submodule update --init --recursive
$ sudo ./install-dependencies.sh
$ ./configure.py --mode=release
$ ninja-build -j4 # Assuming 4 system threads.
$ ./build/release/scylla
$ # Rejoice!
Please see HACKING.md for detailed information on building and developing Scylla.
Running Scylla
- Run Scylla
./build/release/scylla
- run Scylla with one CPU and ./tmp as data directory
./build/release/scylla --datadir tmp --commitlog-directory tmp --smp 1
- For more run options:
./build/release/scylla --help
Building Fedora RPM
As a pre-requisite, you need to install Mock on your machine:
# Install mock:
sudo yum install mock
# Add user to the "mock" group:
usermod -a -G mock $USER && newgrp mock
Then, to build an RPM, run:
./dist/redhat/build_rpm.sh
The built RPM is stored in /var/lib/mock/<configuration>/result directory.
For example, on Fedora 21 mock reports the following:
INFO: Done(scylla-server-0.00-1.fc21.src.rpm) Config(default) 20 minutes 7 seconds
INFO: Results and/or logs in: /var/lib/mock/fedora-21-x86_64/result
Building Fedora-based Docker image
Build a Docker image with:
cd dist/docker
docker build -t <image-name> .
Run the image with:
docker run -p $(hostname -i):9042:9042 -i -t <image name>
Contributing to Scylla
Description
Languages
C++
72.3%
Python
26.5%
CMake
0.3%
GAP
0.3%
Shell
0.3%