Prior to463d0ab, only one table could be cleaned up at a time on a given shard. Since then, all tables belonging to a given keyspace are cleaned up in parallel. Cleanup serialization on each shard was enforced with a semaphore, which was incorrectly removed by the patch aforementioned. So space requirement for cleanup to succeed can be up to the size of keyspace, increasing the chances of node running out of space. Node could also run out of memory if there are tons of tables in the keyspace. Memory requirement is at least #_of_tables * 128k (not taking into account write behind, etc). With 5k tables, it's ~0.64G per shard. Also all tables being cleaned up in parallel will compete for the same disk and cpu bandwidth, so making them all much slower, and consequently the operation time is significantly higher. This problem was detected with cleanup, but scrub and upgrade go through the same rewrite procedure, so they're affected by exact the same problem. Fixes #8247. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210312162223.149993-1-raphaelsc@scylladb.com> (cherry picked from commit7171244844)
Scylla
Quick-start
Scylla is fairly fussy about its build environment, requiring very recent versions of the C++20 compiler and of many libraries to build. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers offers a frozen toolchain, This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything in your build machine to meet Scylla's requirements - you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).
Building and running Scylla with the frozen toolchain is as easy as:
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla
$ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1
Running Scylla
- Run Scylla
./build/release/scylla
- run Scylla with one CPU and ./tmp as work directory
./build/release/scylla --workdir tmp --smp 1
- For more run options:
./build/release/scylla --help
Testing
See test.py manual.
Scylla APIs and compatibility
By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and Thrift. There is also experimental support for the API of Amazon DynamoDB, but being experimental it needs to be explicitly enabled to be used. For more information on how to enable the experimental DynamoDB compatibility in Scylla, and the current limitations of this feature, see Alternator and Getting started with Alternator.
Documentation
Documentation can be found in ./docs and on the wiki. There is currently no clear definition of what goes where, so when looking for something be sure to check both. Seastar documentation can be found here. User documentation can be found here.
Training
Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.
Building a CentOS-based Docker image
Build a Docker image with:
cd dist/docker/redhat
docker build -t <image-name> .
This build is based on executables downloaded from downloads.scylladb.com, not on the executables built in this source directory. See further instructions in dist/docker/redhat/README.md to build a docker image from your own executables.
Run the image with:
docker run -p $(hostname -i):9042:9042 -i -t <image name>