Raphael S. Carvalho 5aeeb0b3e8 compaction: add support to parallel compaction on the same column family
Small sstables were observed to accumulate for a column family because
Scylla was limited to two compactions per shard, and a column family could have
at most one compaction running on a given shard. As the number of sstables
grows rapidly, read performance degrades.

At the moment, our compaction manager works by running two compaction task
handlers in parallel with the rest of the system. Each task handler
wakes up when needed, takes a column family from the compaction manager queue,
runs compaction on it, and goes back to sleep. That is its whole cycle.
The compaction manager allows only one instance of a column family on its
queue, so a column family cannot be compacted in parallel: for a given
column family, one compaction starts only after the previous one finishes.

To solve the problem described, we want to concurrently run compaction jobs
of a column family that have different "size tiers" (or "weights").
For those unfamiliar, a compaction job contains a list of sstables that will be
compacted together.
The "size tier" of a compaction job is the log of the total size of the input
sstables. A compaction job only gets to run if its "size tier" differs from
that of every ongoing compaction. There is no point in compacting concurrently at
the same "size tier", because that slows down both compactions.
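
The "size tier" computation described above can be sketched as follows. This is only an illustration, not the code in the patch: the helper names `total_input_size` and `size_tier` are hypothetical, and the logarithm base (4 here) is an assumption, since the message does not state one. Integer division is used instead of a floating-point log so the result is exact.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: total size in bytes of the input sstables of one job.
inline std::uint64_t total_input_size(const std::vector<std::uint64_t>& sstable_sizes) {
    std::uint64_t total = 0;
    for (auto size : sstable_sizes) {
        total += size;
    }
    return total;
}

// Hypothetical helper: floor(log_base(total_bytes)), computed with integer
// division. The base of 4 is an assumption made for this sketch.
inline int size_tier(std::uint64_t total_bytes, std::uint64_t base = 4) {
    if (base < 2) {
        return 0; // guard against a loop that would never terminate
    }
    int tier = 0;
    while (total_bytes >= base) {
        total_bytes /= base;
        ++tier;
    }
    return tier;
}
```

Under these assumptions, a job with four 1 MiB sstables and a job with four 16 MiB sstables land in different tiers, so they could run concurrently.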

We will no longer queue column families in the compaction manager. Instead, we
create a new fiber to run compaction on demand.
This fiber runs asynchronously and does the following:
1) Get a compaction job from the compaction strategy.
2) Calculate the "size tier" of the compaction job.
3) Run the compaction job if its "size tier" differs from that of every ongoing
compaction for the given column family.
As before, it may decide to re-compact a column family based on a stat stored
in the column family object.
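
The check in step 3 could be modelled with a small bookkeeping structure like the one below. This is a minimal sketch under stated assumptions, not the compaction manager's actual implementation: the class `tier_registry`, its method names, and the use of a string to identify a column family are all hypothetical.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical bookkeeping: the size tiers currently being compacted,
// tracked per column family (identified here by a plain string).
class tier_registry {
    std::map<std::string, std::set<int>> _running;
public:
    // Registers the tier and returns true only if the given column family
    // has no ongoing compaction of the same tier.
    bool try_start(const std::string& cf, int tier) {
        return _running[cf].insert(tier).second;
    }

    // Releases the tier once the compaction job has completed.
    void finish(const std::string& cf, int tier) {
        auto it = _running.find(cf);
        if (it == _running.end()) {
            return;
        }
        it->second.erase(tier);
        if (it->second.empty()) {
            _running.erase(it);
        }
    }
};
```

In this sketch a fiber would call `try_start` before running a job and `finish` afterwards. A job whose tier is already registered for that column family is simply not started.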

Ran all compaction-related dtests.

Fixes #1216.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <d30952ff136192a522bde4351926130addec8852.1462311908.git.raphaelsc@scylladb.com>
2016-05-04 11:46:09 +03:00

# Scylla

## Building Scylla

In addition to the packages required by Seastar, Scylla requires the following packages.

Submodules

Scylla uses submodules, so make sure you pull the submodules first by doing:

git submodule update --init --recursive

Building and Running Scylla on Fedora

  • Install the required packages:
sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel numactl-devel hwloc-devel libpciaccess-devel libxml2-devel python3-pyparsing
  • Build Scylla:
./configure.py --mode=release --with=scylla --disable-xen
ninja-build build/release/scylla -j2 # use a higher -j value if you have plenty of RAM

  • Run Scylla:
./build/release/scylla

  • Run Scylla with one CPU and ./tmp as the data directory:
./build/release/scylla --datadir tmp --commitlog-directory tmp --smp 1
  • For more run options:
./build/release/scylla --help

Building Fedora RPM

As a prerequisite, you need to install Mock on your machine:

# Install mock:
sudo yum install mock

# Add user to the "mock" group:
sudo usermod -a -G mock $USER && newgrp mock

Then, to build an RPM, run:

./dist/redhat/build_rpm.sh

The built RPM is stored in the /var/lib/mock/<configuration>/result directory. For example, on Fedora 21, mock reports the following:

INFO: Done(scylla-server-0.00-1.fc21.src.rpm) Config(default) 20 minutes 7 seconds
INFO: Results and/or logs in: /var/lib/mock/fedora-21-x86_64/result

Building Fedora-based Docker image

Build a Docker image with:

cd dist/docker
docker build -t <image-name> .

Run the image with:

docker run -p $(hostname -i):9042:9042 -i -t <image-name>

Contributing to Scylla

Do not send pull requests.

Send patches to the mailing list address scylladb-dev@googlegroups.com. Be sure to subscribe.

In order for your patches to be merged, you must sign the Contributor's License Agreement, protecting your rights and ours. See http://www.scylladb.com/opensource/cla/.
