The chunk size used in sstable compression can be set when creating a
table, using the "chunk_length_in_kb" parameter. It can be any power-of-two
multiple of 1 KB. Very large compression chunks are not useful - they
offer diminishing returns on compression ratio, require very large
memory buffers, and force reading a very large amount of disk data just
to read a small row. In fact, small chunks are recommended - Scylla
defaults to 4 KB chunks, and Cassandra lowered its default from 64 KB
(in Cassandra 3) to 16 KB (in Cassandra 4).
Therefore, allowing arbitrarily large chunk sizes is just asking for
trouble. Today, a user can ask for a 1 GB chunk size, and crash or hang
Scylla when it runs out of memory. So in this patch we add a hard limit
of 128 KB for the chunk size - anything larger is refused.
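The rule described above can be sketched as a small validator. This is a minimal illustration in Python, not Scylla's actual (C++) code: the function name `validate_chunk_length_in_kb` and the constant `MAX_CHUNK_LENGTH_KB` are hypothetical, and the only facts taken from this commit message are the power-of-two requirement and the 128 KB ceiling.

```python
# Hypothetical constant: the hard upper limit this patch introduces (128 KB).
MAX_CHUNK_LENGTH_KB = 128


def validate_chunk_length_in_kb(value: int) -> int:
    """Return the chunk length in bytes, or raise ValueError if it is invalid.

    A sketch of the rule in the commit message: the value (in KB) must be a
    positive power of two, and no larger than MAX_CHUNK_LENGTH_KB.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"chunk_length_in_kb must be an integer, got {value!r}")
    # A positive integer is a power of two iff exactly one bit is set.
    if value <= 0 or (value & (value - 1)) != 0:
        raise ValueError(f"chunk_length_in_kb must be a power of two, got {value}")
    if value > MAX_CHUNK_LENGTH_KB:
        raise ValueError(
            f"chunk_length_in_kb must be at most {MAX_CHUNK_LENGTH_KB}, got {value}"
        )
    return value * 1024


if __name__ == "__main__":
    # 1 GB expressed in KB is 1024 * 1024, the kind of request that used to
    # exhaust memory; it is now refused up front.
    for kb in (4, 16, 128, 256, 1024 * 1024, 3):
        try:
            print(kb, "->", validate_chunk_length_in_kb(kb), "bytes")
        except ValueError as e:
            print(kb, "-> rejected:", e)
```

Checking the limit when the option is set, rather than when a chunk buffer is allocated, means a bad value fails the CREATE/ALTER statement instead of failing later during a flush.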
Fixes #9933
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #14267
This patch adds some minimal tests for the "with compression = {..}" table
configuration. These tests reproduce three known bugs:
Refs #6442: Always print all schema parameters (including default values)
Scylla doesn't return the default chunk_length_in_kb, but Cassandra
does.
Refs #8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
for compression settings by default
Long ago, Cassandra renamed the "sstable_compression" attribute to
"class". This can break Cassandra applications that create tables
(using the "class" parameter, which Scylla does not understand) as well
as applications that inquire about the configuration of existing tables.
This patch adds tests for both problems.
Refs #9933: ALTER TABLE with "chunk_length_kb" (compression) of 1MB caused a
core dump on all nodes
Our test for this issue hangs or crashes Scylla (depending on the test
environment configuration) when a huge allocation is attempted during
memtable flush. So this test is marked "skip" instead of "xfail".
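A client that has to cope with both issues above (the "class" versus "sstable_compression" naming of #8948, and the missing default chunk_length_in_kb of #6442) might normalize the compression options it reads back. The helper below is hypothetical and only illustrates that idea: filling in 4 KB as the default is an assumption based on Scylla's default mentioned earlier, and neither Scylla nor Cassandra is claimed to work this way.

```python
from typing import Dict, Mapping

# Assumed default used to fill a missing chunk_length_in_kb (Scylla's 4 KB).
DEFAULT_CHUNK_LENGTH_IN_KB = 4


def normalize_compression_options(options: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of a table's compression options in one canonical form.

    - The compressor name is stored under "class" (the newer Cassandra name),
      whether the input used "class" or the older "sstable_compression".
    - If a compressor is configured but "chunk_length_in_kb" is missing,
      the assumed default is filled in.
    """
    result = dict(options)
    if "sstable_compression" in result:
        legacy = result.pop("sstable_compression")
        # If both spellings are present and disagree, refuse to guess.
        if "class" in result and result["class"] != legacy:
            raise ValueError("conflicting 'class' and 'sstable_compression'")
        result["class"] = legacy
    if "class" in result:
        # Option values are kept as strings, as in a CQL map<text, text>.
        result.setdefault("chunk_length_in_kb", str(DEFAULT_CHUNK_LENGTH_IN_KB))
    return result


if __name__ == "__main__":
    print(normalize_compression_options({"sstable_compression": "LZ4Compressor"}))
    print(normalize_compression_options({"class": "LZ4Compressor", "chunk_length_in_kb": "16"}))
```

Raising on conflicting spellings, instead of silently preferring one, keeps a misconfigured table from being reported with a compressor it does not actually use.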
The tests included here also uncovered a new, minor bug: Scylla accepts
a floating-point number as chunk_length_in_kb and silently truncates it
to an integer, whereas Cassandra (like common sense) rejects such a
value.
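The difference between truncating and rejecting a fractional value can be shown on a string-valued option. Both function names below are hypothetical: the lenient one only models the truncation described above, not Scylla's actual parsing code, and the strict one is one possible way to refuse anything that is not a plain integer.

```python
import re


def parse_chunk_length_lenient(text: str) -> int:
    """Models the behaviour described above: "4.5" is truncated to 4."""
    return int(float(text))


def parse_chunk_length_strict(text: str) -> int:
    """Accept only a plain base-10 integer; reject "4.5", "-4", "" and so on."""
    stripped = text.strip()
    if re.fullmatch(r"[0-9]+", stripped) is None:
        raise ValueError(f"chunk_length_in_kb must be an integer, got {text!r}")
    return int(stripped)


if __name__ == "__main__":
    print(parse_chunk_length_lenient("4.5"))
    print(parse_chunk_length_strict("16"))
    try:
        parse_chunk_length_strict("4.5")
    except ValueError as e:
        print("rejected:", e)
```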
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #14261