Files
scylladb/conf
Avi Kivity 5687a4840d conf: pair sstable_format=ms with column_index_size_in_kb=1
One of the advantages of Trie indexes (with sstable_format=ms) is that
the index is more compact, and more suitable for paging from disk
(fewer pages required per search). We can exploit it by setting
column_index_size_in_kb to 1 rather than 64, increasing the index file
size (and requiring more index pages to be loaded and parsed) in return
for smaller data file reads.

To test this, I created a 1M row partition with 300-byte rows, compacted
it into a single sstable, and tested reads to a single row.

With column_index_size_in_kb=64:

Rows.db file size 60k
3 pages read from Rows.db (4k each)
2x 32k read from Data.db

With column_index_size_in_kb=1:

Rows.db file size 2MB (33X)
5 pages read from Rows.db (4k each, 1.7X)
1x 4107 bytes read from Data.db (0.5X IOPS, 0.06X bandwidth)

Given that Rows.db will be typically cached, or at least all but one of the
levels (its size is 157X smaller than Data.db), we win on both IOPS
and bandwidth.

I would have expected the the Data.db read to be closer to 1k, but this
is already an improvement.

Given that, set column_index_size_in_kb=1, but only for new clusters
where we also select sstable_format=ms.

Raw data (w1, w64 are working directories with different
column_index_size_in_kb):

```console
$ ls -l w*/data/bench/wide_partition-*/*{Rows,Data}.db
-rw-r--r-- 1 avi avi 314964958 Apr 19 16:17 w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Data.db
-rw-r--r-- 1 avi avi   2001227 Apr 19 16:17 w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Rows.db
-rw-r--r-- 1 avi avi 314963261 Apr 19 16:18 w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Data.db
-rw-r--r-- 1 avi avi     59989 Apr 19 16:18 w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Rows.db
```

column_index_size_in_kb=64 trace:

```
cqlsh> SELECT * FROM bench.wide_partition WHERE pk = 0 AND ck = 654321 BYPASS CACHE;

 pk | ck     | v
----+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0 | 654321 | 9OXdwmDHRapL2w5YruWLTOtiC3PKbyctSDdQ8YpuPKtWkSYBF10G7bKo2rdnxSAd52HLI21568YM7OwK05B6qAF7X2b6910qsJEA106QBEcFWQVybMCkxkpO4VDRcAVNLRgjB3vygcDBP17GBTb2s7l47UOloy3KtZ7J5YQgKcf7zlFSKGHa49vnRrzoXZCdYexOpix6jcSV2SiwRNqgv6XmYhx43ZwGa4zUtOe0eIKJj7KTxu5bzyWUWGW7US4NLFZRD8Vdb6EasIFkOfVKdiFp2LZHMXGRvtvdF93UTFUb

(1 rows)

Tracing session: 19219900-3bf3-11f1-bc43-c0a4e62b53d1

 activity                                                                                                                                                                                                                 | timestamp                        | source    | source_elapsed | client
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+-----------+----------------+-----------
                                                                                                                                                                                                       Execute CQL3 query |       2026-04-19 16:24:30.992000 | 127.0.0.1 |              0 | 127.0.0.1
                                                                                                                                                                                 Parsing a statement [shard 0/sl:default] | 2026-04-19 16:24:30.992643+00:00 | 127.0.0.1 |              1 | 127.0.0.1
                                                                                                                                            Processing a statement for authenticated user: anonymous [shard 0/sl:default] | 2026-04-19 16:24:30.992738+00:00 | 127.0.0.1 |             96 | 127.0.0.1
                                                                                                                                                               Executing read query (reversed false) [shard 0/sl:default] | 2026-04-19 16:24:30.992765+00:00 | 127.0.0.1 |            123 | 127.0.0.1
                        Creating read executor for token -3485513579396041028 with all: [cf134ebd-5f1b-4844-94e3-e5c7ad9421f0] targets: [cf134ebd-5f1b-4844-94e3-e5c7ad9421f0] repair decision: NONE [shard 0/sl:default] | 2026-04-19 16:24:30.992781+00:00 | 127.0.0.1 |            139 | 127.0.0.1
                                                                           Creating never_speculating_read_executor - speculative retry is disabled or there are no extra replicas to speculate with [shard 0/sl:default] | 2026-04-19 16:24:30.992782+00:00 | 127.0.0.1 |            140 | 127.0.0.1
                                                                                                                                                                         read_data: querying locally [shard 0/sl:default] | 2026-04-19 16:24:30.992795+00:00 | 127.0.0.1 |            153 | 127.0.0.1
                                                                                                                            Start querying singular range {{-3485513579396041028, pk{000400000000}}} [shard 0/sl:default] | 2026-04-19 16:24:30.992801+00:00 | 127.0.0.1 |            160 | 127.0.0.1
                                                                                                                                      [reader concurrency semaphore sl:default] admitted immediately [shard 0/sl:default] | 2026-04-19 16:24:30.992805+00:00 | 127.0.0.1 |            163 | 127.0.0.1
                                                                                                                                            [reader concurrency semaphore sl:default] executing read [shard 0/sl:default] | 2026-04-19 16:24:30.992814+00:00 | 127.0.0.1 |            172 | 127.0.0.1
                        Reading key {-3485513579396041028, pk{000400000000}} from sstable w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Data.db [shard 0/sl:default] | 2026-04-19 16:24:30.992837+00:00 | 127.0.0.1 |            195 | 127.0.0.1
                                         page cache miss: file=w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Partitions.db, page=0, readahead=1 [shard 0/sl:default] | 2026-04-19 16:24:30.992851+00:00 | 127.0.0.1 |            209 | 127.0.0.1
                                              page cache miss: file=w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Rows.db, page=14, readahead=1 [shard 0/sl:default] | 2026-04-19 16:24:30.995294+00:00 | 127.0.0.1 |           2653 | 127.0.0.1
                                                            page cache hit: file=w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Rows.db, page=14 [shard 0/sl:default] | 2026-04-19 16:24:30.995375+00:00 | 127.0.0.1 |           2733 | 127.0.0.1
                                               page cache miss: file=w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Rows.db, page=2, readahead=1 [shard 0/sl:default] | 2026-04-19 16:24:30.995376+00:00 | 127.0.0.1 |           2734 | 127.0.0.1
                                                            page cache hit: file=w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Rows.db, page=14 [shard 0/sl:default] | 2026-04-19 16:24:30.995463+00:00 | 127.0.0.1 |           2821 | 127.0.0.1
                                                             page cache hit: file=w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Rows.db, page=2 [shard 0/sl:default] | 2026-04-19 16:24:30.995463+00:00 | 127.0.0.1 |           2821 | 127.0.0.1
                              w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Data.db: scheduling bulk DMA read of size 32768 at offset 206057984 [shard 0/sl:default] | 2026-04-19 16:24:30.995471+00:00 | 127.0.0.1 |           2829 | 127.0.0.1
                              w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Data.db: scheduling bulk DMA read of size 32768 at offset 206090752 [shard 0/sl:default] | 2026-04-19 16:24:30.995475+00:00 | 127.0.0.1 |           2833 | 127.0.0.1
 w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Data.db: finished bulk DMA read of size 32768 at offset 206057984, successfully read 32768 bytes [shard 0/sl:default] | 2026-04-19 16:24:30.995586+00:00 | 127.0.0.1 |           2945 | 127.0.0.1
                            Page stats: 1 partition(s) (1 live, 0 dead), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead), 0 range tombstone(s) and 1 cell(s) (1 live, 0 dead) [shard 0/sl:default] | 2026-04-19 16:24:30.995637+00:00 | 127.0.0.1 |           2995 | 127.0.0.1
 w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Data.db: finished bulk DMA read of size 32768 at offset 206090752, successfully read 32768 bytes [shard 0/sl:default] | 2026-04-19 16:24:30.995645+00:00 | 127.0.0.1 |           3003 | 127.0.0.1
                                                                                                                                                                                    Querying is done [shard 0/sl:default] | 2026-04-19 16:24:30.995653+00:00 | 127.0.0.1 |           3012 | 127.0.0.1
                                                                                                                                                                Done processing - preparing a result [shard 0/sl:default] | 2026-04-19 16:24:30.995670+00:00 | 127.0.0.1 |           3028 | 127.0.0.1
                                                                                                                                                                                                         Request complete |       2026-04-19 16:24:30.995039 | 127.0.0.1 |           3039 | 127.0.0.1

                              w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Data.db: scheduling bulk DMA read of size 32768 at offset 206090752 [shard 0/sl:default] | 2026-04-19 16:22:43.107215+00:00 | 127.0.0.1 |           8685 | 127.0.0.1
```

column_index_size_in_kb=1 trace:

```
cqlsh> SELECT * FROM bench.wide_partition WHERE pk = 0 AND ck = 654321 BYPASS CACHE;

 pk | ck     | v
----+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0 | 654321 | FIA7X52ZqYwvDxEGlmWJUSy1I94WTuWZTdLwXr9HBQ90RJLqYKr5nInTADSI6hzofwawaXphAQK07YMoyzFfRaGeKPQPKUb35XpLEGvLJ4xu9r4es8wUEHPXaFBGdMcWUkyDJSTYCFzZAPCzUHEuPJHMXVrI6UExWrIR0Xujg4GZa9UciU9rbEvrSBwSzoPEfbXJ6qZSGiTD8gcXz5kdAblLxsAeWug8tZqslsTu04HMLKfZ8WopQvHbpR6YlGSnM99CiBgz30LMmllULV4VA4u9kMpzsRV2IE2tKmJOddEl

(1 rows)

Tracing session: 3953a1f0-3bf3-11f1-b976-4a3dc2a7a57f

 activity                                                                                                                                                                                                              | timestamp                        | source    | source_elapsed | client
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+-----------+----------------+-----------
                                                                                                                                                                                                    Execute CQL3 query |       2026-04-19 16:25:25.007000 | 127.0.0.1 |              0 | 127.0.0.1
                                                                                                                                                                              Parsing a statement [shard 0/sl:default] | 2026-04-19 16:25:25.007423+00:00 | 127.0.0.1 |              1 | 127.0.0.1
                                                                                                                                         Processing a statement for authenticated user: anonymous [shard 0/sl:default] | 2026-04-19 16:25:25.007511+00:00 | 127.0.0.1 |             89 | 127.0.0.1
                                                                                                                                                            Executing read query (reversed false) [shard 0/sl:default] | 2026-04-19 16:25:25.007536+00:00 | 127.0.0.1 |            114 | 127.0.0.1
                     Creating read executor for token -3485513579396041028 with all: [e7bd75e7-6d2a-46dc-9f66-430524f40e0d] targets: [e7bd75e7-6d2a-46dc-9f66-430524f40e0d] repair decision: NONE [shard 0/sl:default] | 2026-04-19 16:25:25.007551+00:00 | 127.0.0.1 |            129 | 127.0.0.1
                                                                        Creating never_speculating_read_executor - speculative retry is disabled or there are no extra replicas to speculate with [shard 0/sl:default] | 2026-04-19 16:25:25.007553+00:00 | 127.0.0.1 |            131 | 127.0.0.1
                                                                                                                                                                      read_data: querying locally [shard 0/sl:default] | 2026-04-19 16:25:25.007556+00:00 | 127.0.0.1 |            134 | 127.0.0.1
                                                                                                                         Start querying singular range {{-3485513579396041028, pk{000400000000}}} [shard 0/sl:default] | 2026-04-19 16:25:25.007562+00:00 | 127.0.0.1 |            139 | 127.0.0.1
                                                                                                                                   [reader concurrency semaphore sl:default] admitted immediately [shard 0/sl:default] | 2026-04-19 16:25:25.007564+00:00 | 127.0.0.1 |            142 | 127.0.0.1
                                                                                                                                         [reader concurrency semaphore sl:default] executing read [shard 0/sl:default] | 2026-04-19 16:25:25.007573+00:00 | 127.0.0.1 |            151 | 127.0.0.1
                      Reading key {-3485513579396041028, pk{000400000000}} from sstable w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Data.db [shard 0/sl:default] | 2026-04-19 16:25:25.007594+00:00 | 127.0.0.1 |            172 | 127.0.0.1
                                       page cache miss: file=w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Partitions.db, page=0, readahead=1 [shard 0/sl:default] | 2026-04-19 16:25:25.007607+00:00 | 127.0.0.1 |            184 | 127.0.0.1
                                           page cache miss: file=w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Rows.db, page=488, readahead=1 [shard 0/sl:default] | 2026-04-19 16:25:25.016029+00:00 | 127.0.0.1 |           8607 | 127.0.0.1
                                                         page cache hit: file=w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Rows.db, page=488 [shard 0/sl:default] | 2026-04-19 16:25:25.016109+00:00 | 127.0.0.1 |           8687 | 127.0.0.1
                                           page cache miss: file=w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Rows.db, page=486, readahead=1 [shard 0/sl:default] | 2026-04-19 16:25:25.016111+00:00 | 127.0.0.1 |           8688 | 127.0.0.1
                                           page cache miss: file=w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Rows.db, page=285, readahead=1 [shard 0/sl:default] | 2026-04-19 16:25:25.016176+00:00 | 127.0.0.1 |           8754 | 127.0.0.1
                                                         page cache hit: file=w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Rows.db, page=488 [shard 0/sl:default] | 2026-04-19 16:25:25.016260+00:00 | 127.0.0.1 |           8838 | 127.0.0.1
                                                         page cache hit: file=w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Rows.db, page=486 [shard 0/sl:default] | 2026-04-19 16:25:25.016261+00:00 | 127.0.0.1 |           8839 | 127.0.0.1
                                                         page cache hit: file=w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Rows.db, page=285 [shard 0/sl:default] | 2026-04-19 16:25:25.016261+00:00 | 127.0.0.1 |           8839 | 127.0.0.1
                             w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Data.db: scheduling bulk DMA read of size 4107 at offset 206086656 [shard 0/sl:default] | 2026-04-19 16:25:25.016268+00:00 | 127.0.0.1 |           8846 | 127.0.0.1
 w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Data.db: finished bulk DMA read of size 4107 at offset 206086656, successfully read 4608 bytes [shard 0/sl:default] | 2026-04-19 16:25:25.016340+00:00 | 127.0.0.1 |           8918 | 127.0.0.1
                         Page stats: 1 partition(s) (1 live, 0 dead), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead), 0 range tombstone(s) and 1 cell(s) (1 live, 0 dead) [shard 0/sl:default] | 2026-04-19 16:25:25.016367+00:00 | 127.0.0.1 |           8945 | 127.0.0.1
                                                                                                                                                                                 Querying is done [shard 0/sl:default] | 2026-04-19 16:25:25.016385+00:00 | 127.0.0.1 |           8963 | 127.0.0.1
                                                                                                                                                             Done processing - preparing a result [shard 0/sl:default] | 2026-04-19 16:25:25.016401+00:00 | 127.0.0.1 |           8979 | 127.0.0.1
                                                                                                                                                                                                      Request complete |       2026-04-19 16:25:25.015989 | 127.0.0.1 |           8989 | 127.0.0.1
```

Closes scylladb/scylladb#29552
2026-04-20 17:53:56 +03:00
..