32 Commits

Author SHA1 Message Date
Geoff Montee
0eb5603ebd Docs: describe the system tables
Fixes issue #12818 with the following docs changes:

docs/dev/system_keyspace.md: Added missing system tables, added table of contents (TOC), added categories

Closes scylladb/scylladb#27789
2026-03-04 08:55:43 +02:00
Botond Dénes
7e90ed657c Merge 'Fix client_options docs' from Karol Baryła
https://github.com/scylladb/scylladb/pull/25746 added a new column to `system.clients`: `client_options frozen<map<text, text>>`. This column stores all options sent by the client in the `STARTUP` message.
This PR also added `CLIENT_OPTIONS` to the list of values sent in `SUPPORTED` message, and documented that drivers can send their configuration (as JSON) in `STARTUP` under this key.

Documentation for the new column was not added to the description of `system.clients` table, and documentation about the new `STARTUP` key was added in `protocol-extensions.md`, but in the section about shard awareness extension.

This PR adds missing `system.clients` column description, moves the documentation of `CLIENT_OPTIONS` into its own section, and expands it a bit.

Backport: none, because this fixes internal documentation.

Closes scylladb/scylladb#28126

* github.com:scylladb/scylladb:
  protocol-extensions.md: Fix client_options docs
  system_keyspace.md: Add client_options column
  system_keyspace.md: Fix order in system.clients
2026-02-20 14:23:34 +02:00
Ferenc Szili
1136a3f398 docs: add effective_capacity to system keyspace docs
This adds the description of effective_capacity to the documentation
of the system keyspace.
2026-01-18 16:57:08 +01:00
Karol Baryła
30d4d3248d system_keyspace.md: Add client_options column
It was recently introduced, but the documentation was not updated.
2026-01-13 11:35:52 +01:00
Karol Baryła
a0a6140436 system_keyspace.md: Fix order in system.clients
The scheduling_group column is placed after protocol_version in the
current version.
2026-01-13 11:33:34 +01:00
Ferenc Szili
10eb364821 load_balancer: implement size-based load balancing
This change introduces tablet size based load balancing. It is an
extension of capacity based balancing with the addition of actual tablet
sizes.

It computes the difference between the most and least loaded nodes in
the DC and stops further balancing if this difference is below the
config option size_based_balance_threshold_percentage.

This config option does not apply to the absolute load, but instead to
the percentage of how much the most loaded node is more loaded than the
least loaded node:

delta = (most_loaded - least_loaded) / most_loaded

If this delta is smaller than the config threshold, the balancer will
consider the nodes balanced.
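
A minimal Python sketch of the check described above, not the actual balancer code. The function name and the convention that the threshold is given as a percentage (so delta is scaled by 100 before comparing) are assumptions made for illustration:

```python
def is_balanced(most_loaded: float, least_loaded: float,
                threshold_percentage: float) -> bool:
    """Return True if the relative load gap is under the threshold.

    Hypothetical helper: assumes loads are non-negative sizes and that
    the threshold is expressed in percent.
    """
    if most_loaded <= 0:
        # Nothing is loaded, so there is nothing to balance.
        return True
    # delta = (most_loaded - least_loaded) / most_loaded
    delta = (most_loaded - least_loaded) / most_loaded
    return delta * 100 < threshold_percentage
```

For example, with a threshold of 5, loads of 100 and 97 count as balanced, while loads of 100 and 80 do not.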
2025-12-27 11:20:20 +01:00
Ferenc Szili
621cb19045 load_sketch: use tablet sizes in load computation
This commit changes load_sketch so that it computes node and shard load
based on tablet sizes instead of tablet count.
2025-12-27 10:37:23 +01:00
Ferenc Szili
e96863be0c virtual_table: add tablet_sizes virtual table
This change adds the tablet_sizes virtual table. The contents of this
table are gathered from the current load_stats data structure.
2025-11-21 16:53:28 +01:00
Asias He
cb7db47ae1 repair: Add incremental_mode option for tablet repair
This patch introduces a new `incremental_mode` parameter to the tablet
repair REST API, providing more fine-grained control over the
incremental repair process.

Previously, incremental repair was on and could not be turned off. This
change allows users to select from three distinct modes:

- `regular`: This is the default mode. It performs a standard
  incremental repair, processing only unrepaired sstables and skipping
  those that are already repaired. The repair state (`repaired_at`,
  `sstables_repaired_at`) is updated.

- `full`: This mode forces the repair to process all sstables, including
  those that have been previously repaired. This is useful when a full
  data validation is needed without disabling the incremental repair
  feature. The repair state is updated.

- `disabled`: This mode completely disables the incremental repair logic
  for the current repair operation. It behaves like a classic
  (pre-incremental) repair, and it does not update any incremental
  repair state (`repaired_at` in sstables or `sstables_repaired_at` in
  the system.tablets table).
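
The three modes can be summarized as a small lookup table. This is an illustrative Python sketch, not the repair implementation: the `ModeBehavior` type, its field names, and `behavior_for` are hypothetical, and the claim that `disabled` processes all sstables is inferred from "behaves like a classic repair".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeBehavior:
    uses_incremental_logic: bool    # is the incremental repair logic active?
    processes_repaired: bool        # are already-repaired sstables processed?
    updates_repair_state: bool      # are repaired_at / sstables_repaired_at updated?


# Hypothetical summary of the modes described in the commit message.
MODES = {
    "regular": ModeBehavior(True, False, True),
    "full": ModeBehavior(True, True, True),
    # Assumption: a classic repair does not skip any sstables.
    "disabled": ModeBehavior(False, True, False),
}


def behavior_for(mode: str = "regular") -> ModeBehavior:
    """Look up a mode; 'regular' is the default, unknown names are rejected."""
    try:
        return MODES[mode]
    except KeyError:
        raise ValueError(f"unknown incremental_mode: {mode!r}") from None
```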

The implementation includes:

- Adding the `incremental_mode` parameter to the
  `/storage_service/repair/tablet` API endpoint.
- Updating the internal repair logic to handle the different modes.
- Adding a new test case to verify the behavior of each mode.
- Updating the API documentation and developer documentation.

Fixes #25605

Closes scylladb/scylladb#25693
2025-09-09 06:50:21 +03:00
Asias He
5377f87e5a tablet: Add sstables_repaired_at to system.tablets table
It is used to store the repaired_at for each tablet.
2025-08-11 10:10:07 +08:00
Pavel Emelyanov
5fcdf948d9 doc: Update system.clients schema with scheduling_group cell
It was added by 9319d65971 (db/virtual_tables: add scheduling group
column to system.clients) recently.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#25294
2025-08-05 10:16:20 +03:00
Michael Litvak
4777444024 tablets: add base_table column to system.tablets
Add a new column base_table to the system.tablets table.

It can be set to point to another table to indicate that the tablets of
this table are co-located with the tablets of the base table.

When it's set, we don't store other tablet information in system.tablets
and in the in-memory tablet map object for this table, and we need to
refer instead to the base table tablet information. The method
get_tablet_map always returns the base tablet map.
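
The lookup rule can be sketched in Python as follows. This is a toy model under stated assumptions, not the real in-memory tablet metadata: the class, `set_tablet_map` and `set_base_table` are hypothetical, and only the `get_tablet_map` name comes from the commit message.

```python
class TabletMetadata:
    """Toy registry illustrating base_table resolution."""

    def __init__(self):
        self._maps = {}     # table -> tablet map (base tables only)
        self._base_of = {}  # co-located table -> its base table

    def set_tablet_map(self, table, tablet_map):
        self._maps[table] = tablet_map

    def set_base_table(self, table, base_table):
        # A co-located table stores no tablet information of its own.
        self._base_of[table] = base_table
        self._maps.pop(table, None)

    def get_tablet_map(self, table):
        # Always resolve to the base table's map when one is set.
        return self._maps[self._base_of.get(table, table)]
```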
2025-07-01 10:29:59 +03:00
Michael Litvak
4e2742a30b docs: update system.tablets schema
The schema of system.tablets in the docs is outdated. Replace it with
the current schema.
2025-07-01 10:29:59 +03:00
Botond Dénes
92b5fe8983 db/system_keyspace: introduce the corrupt_data table
To serve as a place to store corrupt mutation fragments. These fragments
cannot be written to sstables, as they would be spread around by
compaction and/or repair. They even might make parsing the sstable
impossible. So they are stored in this special table instead, kept
around to be inspected later and possibly restored.
2025-06-24 11:05:30 +03:00
Tomasz Grabiec
0b9a75d7b6 virtual-tables: Introduce system.load_per_node
Can be used to query per-node stats about load as seen by the load
balancer.

In particular, a node's capacity will be used by tablet-mon.py to
scale tablet columns so that equal height is equal node utilization.
2025-04-09 20:21:51 +02:00
Aleksandra Martyniuk
4c75701756 docs: locator: update the docs and formatter of tablet_task_info
2025-02-14 09:13:11 +01:00
Pavel Emelyanov
81f7a6d97d doc: Update system.sstables table schema description
The partition key had been renamed and its type changed some time ago,
but the doc wasn't updated. Fix it.

refs: #20998

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#22683
2025-02-10 16:09:49 +02:00
Asias He
9d58a911f1 docs: Update system_keyspace.md for tablet repair related info
2024-11-20 09:42:41 +08:00
Kefu Chai
ad649be1bf treewide: drop thrift support
thrift support has been deprecated since ScyllaDB 5.2

> Thrift API - legacy ScyllaDB (and Apache Cassandra) API is
> deprecated and will be removed in followup release. Thrift has
> been disabled by default.

so let's drop it. in this change,

* thrift protocol support is dropped
* all references to thrift support in document are dropped
* the "thrift_version" column in system.local table is
  preserved for backward compatibility, as we could load
  from an existing system.local table which still contains
  this column, so we need to write this column as well.
* "/storage_service/rpc_server" is only preserved for
  backward compatibility with java-based nodetool.
* `rpc_port` and `start_rpc` options are preserved, but
  they are marked as "Unused", so that the new release
  of scylladb can consume existing scylla.yaml configurations
  which might contain these settings. by making them
  deprecated, users will get warned and can update
  their configurations before we actually remove them
  in the next major release.

Fixes #3811
Fixes #18416
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-07 06:44:59 +08:00
Ferenc Szili
8e9771d010 sstable: added docs for system.large_partitions.dead_rows
2024-05-07 15:44:33 +02:00
Ferenc Szili
c528597a84 sstables: add docs changes for system.large_partitions
This commit updates the documentation for the new column
range_tombstones in system.large_partitions
2024-04-22 15:25:41 +02:00
Raphael S. Carvalho
0d5ba1ee4b tablets: Add resize decision metadata to tablet metadata
The new metadata describes the ongoing resize operation (can be either
of merge, split or none) that spans tablets of a given table.
That's managed by group0, so down nodes will be able to see the
decision when they come back up and see the changes to the
metadata.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:06 -03:00
Tomasz Grabiec
4a06ffb43c tablets: Store transition kind per tablet
Will be used to distinguish regular migration from rebuild, repair and
RF change.
2024-01-23 01:12:57 +01:00
Yaniv Kaul
862909ee4f Typos: fix typos in documentation
Using codespell, went over the docs and fixed some typos.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#16275
2023-12-07 11:10:17 +02:00
Avi Kivity
23be6f0336 tablets: change persistent type of replica set from set to list
The system.tablets table stores replica sets as a CQL set type,
which is sorted. This means that if, in a tablet replica set
[n1, n2, n3] n2 is replaced with n4, then on reload we'll see
[n1, n3, n4], changing the relative position of n3 from the third
replica to the second.

The relative position of replicas in a replica set is important
for materialized views, as they use it to pair base replicas with
view replicas. To prepare for materialized views using tablets,
change the persistent data type to list, which preserves order.

The code that generates new replica sets already preserves order:
see locator::replace_replica().
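
The reordering is easy to reproduce. A small Python sketch, with hypothetical helper names (this is not `locator::replace_replica()` itself), contrasting a sorted set with an order-preserving list:

```python
def replace_in_sorted_set(replicas, old, new):
    """Model a CQL set: contents come back sorted when reloaded."""
    return sorted((set(replicas) - {old}) | {new})


def replace_in_list(replicas, old, new):
    """Model a CQL list: the new replica takes the old one's position."""
    return [new if r == old else r for r in replicas]


replicas = ["n1", "n2", "n3"]
as_set = replace_in_sorted_set(replicas, "n2", "n4")   # ['n1', 'n3', 'n4']
as_list = replace_in_list(replicas, "n2", "n4")        # ['n1', 'n4', 'n3']
```

With the set, n3 moves from the third position to the second; with the list, it stays third.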

While this changes the system schema, tablets are an experimental
feature so we don't need to worry about upgrades.

Closes #15111
2023-08-21 22:55:14 +02:00
Kefu Chai
b8c565875b docs/dev/system_keyspace: add raft table
it is one of the non-volatile tables. we need to add more of them.
but let's do this piecemeal.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-24 10:08:04 +08:00
Kefu Chai
eee0003312 docs/dev/system_keyspace: move sstables and tablets into another section
not all tables in system keyspace are volatile. among other things,
system.sstables and system.tablets are persisted using sstables like
regular user tables. so move them into the section where we have
the other regular tables.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-05-24 10:08:03 +08:00
Kefu Chai
1246568e3b docs/dev/system_keyspace: use timeuuid for sstables.generation
we changed the type of generation column in system.sstables
from bigint to timeuuid in 74e9e6dd1a
but that change failed to update the document accordingly. so let's
update the document to reflect the change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13994
2023-05-23 14:37:28 +03:00
Tomasz Grabiec
9d786c1ebc db: tablets: Add persistence layer
2023-04-24 10:49:37 +02:00
Pavel Emelyanov
08e9046d07 system_keyspace: Add ownership table
The schema is

CREATE TABLE system.sstables (
    location text,
    generation bigint,
    format text,
    status text,
    uuid uuid,
    version text,
    PRIMARY KEY (location, generation)
)

A sample entry looks like:

 location                                                            | generation | format | status | uuid                                 | version
---------------------------------------------------------------------+------------+--------+--------+--------------------------------------+---------
 /data/object_storage_ks/test_table-d096a1e0ad3811ed85b539b6b0998182 |          2 |    big | sealed | d0a743b0-ad38-11ed-85b5-39b6b0998182 |      me

The uuid field points to the "folder" on the storage where the sstable
components are. Like this:

s3
`- test_bucket
   `- f7548f00-a64d-11ed-865a-0c1fbc116bb3
      `- Data.db
       - Index.db
       - Filter.db
       - ...

It's not very nice that the whole /var/lib/... path is in fact used as
location, it needs the PR #12707 to fix this place.

Also, the "status" part is not yet fully functional, it only supports
three options:

- creating -- equivalent to a TemporaryTOC file existing on disk
- sealed -- default state
- deleting -- the analogy for the deletion log on disk

The latter needs support from the distributed_loader, which is not yet
there. In fact, distributed_loader also needs to be patched to actually
select entries from this table on load. Also it needs the mentioned
PR #12707 to support staging and quarantine sstables.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:44:28 +03:00
Benny Halevy
2f49eebb04 db/system_keyspace: add collection_elements column to system.large_cells
And bump the schema version offset since the new schema
should be distinguishable from the previous one.

Refs scylladb/scylladb#11660

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-04 08:42:08 +03:00
David Garcia
bb21c3c869 Move dev docs to docs/dev
2022-06-24 18:07:08 +01:00