This pull request introduces host ID in the Hinted Handoff module. Nodes are now identified by their host IDs instead of their IPs. The conversion occurs on the boundary between the module and `storage_proxy.hh`, but aside from that, IPs have been erased.
The changes take into considerations that there might still be old hints, still identified by IPs, on disk – at start-up, we map them to host IDs if it's possible so that they're not lost.
Refs scylladb/scylladb#6403Fixesscylladb/scylladb#12278Closesscylladb/scylladb#15567
* github.com:scylladb/scylladb:
docs: Update Hinted Handoff documentation
db/hints: Add endpoint_downtime_not_bigger_than()
db/hints: Migrate hinted handoff when cluster feature is enabled
db/hints: Handle arbitrary directories in resource manager
db/hints: Start using hint_directory_manager
db/hints: Enforce providing IP in get_ep_manager()
db/hints: Introduce hint_directory_manager
db/hints/resource_manager: Update function description
db/hints: Coroutinize space_watchdog::scan_one_ep_dir()
db/hints: Expose update lock of space watchdog
db/hints: Add function for migrating hint directories to host ID
db/hints: Take both IP and host ID when storing hints
db/hints: Prepare initializing endpoint managers for migrating from IP to host ID
db/hints: Migrate to locator::host_id
db/hints: Remove noexcept in do_send_one_mutation()
service: Add locator::host_id to on_leave_cluster
service: Fix indentation
db/hints: Fix indentation
Finalization of tablet split was only synchronizing with migrations, but
that's not enough as we want to make sure that all processes like repair
completes first as they might hold erm and therefore will be working
with a "stale" version of token metadata.
For synchronization to work properly, handling of tablet split finalize
will now take over the state machine, when possible, and execute a
global token metadata barrier to guarantee that update in topology by
split won't cause problems. Repair for example could be writing a
sstable with stale metadata, and therefore, could generate a sstable
that spans multiple tablets. We don't want that to happen, therefore
we need the barrier.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#18380
In order to correctly restore schema from `DESC SCHEMA WITH INTERNALS`, we need a way to drop a column with a timestamp in the past.
Example:
- table t(a int pk, b int)
- insert some data1
- drop column b
- add column b int
- insert some data2
If the sstables weren't compacted, after restoring the schema from description:
- we will loss column b in data2 if we simply do `ALTER TABLE t DROP b` and `ALTER TABLE t ADD b int`
- we will resurrect column b in data1 if we skip dropping and re-adding the column
Test for this: https://github.com/scylladb/scylla-dtest/pull/4122Fixes#16482Closesscylladb/scylladb#18115
* github.com:scylladb/scylladb:
docs/cql: update ALTER TABLE docs
test/cqlpytest: add test for prepared `ALTER TABLE ... DROP ... USING TIMESTAMP ?`
test/cql-pytest: remove `xfail` from alter table with timestamp tests
cql3/statements: extend `ALTER TABLE ... DROP` to allow specifying timestamp of column drop
cql3/statements: pass `query_options` to `prepare_schema_mutations()`
cql3/statements: add bound terms to alter table statement
cql3/statements: split alter_table_statement into raw and prepared
schema: allow to specify timestamp of dropped column
We move consistent cluster management out of experimental and
make it the default for new clusters in 6.0. In code, we make the
`consistent-topology-changes` flag unused and assumed to be true.
In 6.0, the topology upgrade procedure will be manual and
voluntary, so some clusters will still be using the gossip-based
topology even though they support the raft-based topology.
Therefore, we need to continue testing the gossip-based topology.
This is possible by using the `force-gossip-topology-changes` flag
introduced in scylladb/scylladb#18284.
Ref scylladb/scylladb#17802Closesscylladb/scylladb#18285
* github.com:scylladb/scylladb:
docs: raft.rst: update after removing consistent-topology-changes
treewide: fix indentation after the previous patch
db: config: make consistent-topology-changes unused
test: lib: single_node_cql_env: restart a node in noninitial run_in_thread calls
test: test_read_required_hosts: run with force-gossip-topology-changes
storage_service: join_cluster: replace force_gossip_based_join with force-gossip-topology-changes
storage_service: join_token_ring: fix finish_setup_after_join calls
dclocal_read_repair_chance and read_repair_chance have been removed in Cassandra 3.11 and 4.x, see
https://issues.apache.org/jira/browse/CASSANDRA-13910. if we expose these properties via DDL, Cassandra would fail to consume the CQL statement creating the table when performing migration from Scylla to Cassandra 4.x, as the latter does not understand these properties anymore.
currently the default values of `dc_local_read_repair_chance` and `read_repair_chance` are both "0". so they are practically disabled, unless user deliberately set them to a value greater than 0.
also, as a side effect, Cassandra 4.x has better support of Python3. the cqlsh shipped along with Cassandra 3.11.16 only supports python2.7, see
https://github.com/apache/cassandra/blob/cassandra-3.11.16/bin/cqlsh.py it errors out if the system only provides python3 with the error of
```
No appropriate python interpreter found.
```
but modern linux systems do not provide python2 anymore.
so, in this change, we deprecate these two options.
Fixes#3502
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18087
* github.com:scylladb/scylladb:
docs: drop documents related to {,dclocal_}read_repair_chance
treewide: remove {dclocal_,}read_repair_chance options
Repair may miss some tablets that migrated across nodes.
So if tombstones expire after some timeout, then we can
have data resurrection.
Set default tombstone_gc mode to "repair" for tables which
use tablets (if repair is required).
Fixes: #16627.
Closesscylladb/scylladb#18013
* github.com:scylladb/scylladb:
test: check default value of tombstone_gc
test: topology: move some functions to util.py
cql3: statements: change default tombstone_gc mode for tablets
since "read_repair_chance" and "dclocal_read_repair_chance" are
removed, and not supported anymore. let's stop documenting them.
Refs #3502
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
When issuing warnings about partitions with the number of rows above a configured threshold, the large partitions handler does not take into consideration the number of range tombstone markers in the total rows count. This fix adds the number of range tombstone markers to the total number of rows and saves this total in system.large_partitions.rows (if it is above the threshold). It also adds a new column range_tombstones to the system.large_partitions table which only contains the number of range tombstone markers for the given partition.
This PR fixes the first part of issue #13968
It does not cover distinguishing between live and dead rows. A subsequent PR will handle that.
Closesscylladb/scylladb#18346
* github.com:scylladb/scylladb:
sstables: add docs changes for system.large_partitions
sstable: large data handler needs to count range tombstones as rows
This commit adds the description and usage instructions of Scylla Doctor
to the "How to Report a ScyllaDB Problem" page.
Scylla Doctor replaces Health Check Report, so the description of
and references to the latter are removed with this commit.
Fixes https://github.com/scylladb/scylladb/issues/16276Closesscylladb/scylladb#17617
Currently, if tombstone_gc mode isn't specified for a table,
then "timeout" is used by default. With tablets, running
"nodetool repair -pr" may miss a tablet if it migrated across
the nodes. Then, if we expire tombstones for ranges that
weren't repaired, we may get data resurrection.
Set default tombstone_gc mode value for DDLs that don't
specify it. It's set to "repair" for tables which use tablets
unless they use local replication strategy or rf = 1.
Otherwise it's set to "timeout".
This commit excludes OSS-specific links and content
added in https://github.com/scylladb/scylladb/pull/17624
to separate files and adds the include directive `.. scylladb_include_flag::`
to include these files in the doc source files.
Reason: Adding the link to the Open Source upgrade guide
(/upgrade/upgrade-opensource/upgrade-guide-from-5.4-to-6.0/enable-consistent-topology)
breaks the Enterprise documentation because the Enterprise docs don't
contain that upgrade guide. We must add separate files for OSS and
Enterprise to prevent failing the Enterprise build and breaking the
links.
Closesscylladb/scylladb#18372
in order to reduce the confusion like:
> I cannot find foobar in the list, is it supported?
also, take this opportunity to use "console" instead of "shell" for
rendering the code block. it's a better fit in this case. since we
are using pygment for syntax highlighting,
see https://pygments.org/docs/lexers/#pygments.lexers.shell.BashSessionLexer
for details on the "console" lexer.
and add a prompt before the command line, so that "console" lexer
can render the command line and output better.
also, add a note explaining that user should refer the output of
`scylla` to see the list of logger classes.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18311
This commit includes updates related to replacing system_auth with system_auth_v2.
- The keyspace name system_auth is renamed to system_auth_v2.
- The procedures are updated to account for system_auth_v2.
- No longer required system_auth RF changes are removed from procedures.
- The information is added that if the consistent topology updates feature
was not enabled upon upgrade from 5.4, there are limitations or additional
steps to do (depending on the procedure).
The files with that kind of information are to be found in _common folders
and included as needed.
- The upgrade guide has been updated to reflect system_auth_v2 and related impacts.
Closesscylladb/scylladb#18077
Currently, Scylla logs a warning when it writes a cell, row or partition which are larger than certain configured sizes. These warnings contain the partition key and in case of rows and cells also the cluster key which allow the large row or partition to be identified. However, these keys can contain user-private, sensitive information. The information which identifies the partition/row/cell is also inserted into tables system.large_partitions, system.large_rows and system.large_cells respectivelly.
This change removes the partition and cluster keys from the log messages, but still inserts them into the system tables.
The logged data will look like this:
Large cells:
WARN 2024-04-02 16:49:48,602 [shard 3: mt] large_data - Writing large cell ks_name/tbl_name: cell_name (SIZE bytes) to sstable.db
Large rows:
WARN 2024-04-02 16:49:48,602 [shard 3: mt] large_data - Writing large row ks_name/tbl_name: (SIZE bytes) to sstable.db
Large partitions:
WARN 2024-04-02 16:49:48,602 [shard 3: mt] large_data - Writing large partition ks_name/tbl_name: (SIZE bytes) to sstable.db
Fixes#18041Closesscylladb/scylladb#18166
This commit adds image information for the latest patch release
to the GCP and Azure deployment page.
The information now replaces the reference to the Download Center
so that the user doesn't have to jump to another website.
Fixes https://github.com/scylladb/scylladb/issues/18144Closesscylladb/scylladb#18168
Maintainers are also allowed to commit their own backport PR. They are
allowed to backport their own code, opening a PR to get a CI run for a
backport doesn't change this.
Closesscylladb/scylladb#17727
This change adds the missing Cassandra compaction option unchecked_tombstone_compaction.
Setting this option to true causes the compaction to ignore tombstone_threshold, and decide whether to do a compaction only based on the value of tombstone_compaction_interval
Fixes#1487Closesscylladb/scylladb#17976
* github.com:scylladb/scylladb:
removed forward declaration of resharding_descriptor
compaction options and troubleshooting docs
cql-pytest/test_compaction_strategy_validation.py
test/boost/sstable_compaction_test.cc
compaction: implement unchecked_tombstone_compaction
in order to help the developers to understand the transitions
of `node_state` and the `transition_state` on each of the `node_state`,
in this change, the nested state machine diagram is added to the
node state diagram.
please note, instead of trying to merge similar states like
bootstrapping and replacing into a single state, we keep them as
separate ones, and replicate the nested state machine diagram in them
as well, to be more clear.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18025
Fixesscylladb/scylladb#17513
* 'gleb/raft-snitch-change-v3' of github.com:scylladb/scylla-dev:
doc: amend snitch changing procedure to work with raft
test: add test to check that snitch change takes effect.
raft topology: update rack/dc info in topology state on reboot if changed
To change snitch with raft all nodes need to be started simultaneously
since each node will try to update its state in the raft and for that
quorum is required.
Introduces relative link support for individual properties listed on the configuration properties page. For instance, to link to a property from a different document, use the syntax :ref:`memtable_flush_static_shares <confprop_memtable_flush_static_shares>`.
Additionally, it also adds support for linking groups. For example, :ref:`Ungrouped properties <confgroup_ungrouped_properties>`.
Closesscylladb/scylladb#17753
Document the manual upgrade procedure that is required to enable
consistent cluster management in clusters that were upgraded from an
older version to ScyllaDB Open Source 6.0. This instruction is placed in
previously placeholder "Enable Raft-based Topology" page which is a part
of the upgrade instructions to ScyllaDB Open Source 6.0.
Add references to the new description in the "Raft Consensus Algorithm
in ScyllaDB" document in relevant places.
Extend the "Handling Node Failures" document so that it mentions steps
required during recovery of a ScyllaDB cluster running version 6.0.
Fixes: scylladb/scylladb#17341Closesscylladb/scylladb#17624
This commit updates the Upgrade ScyllaDB Image page.
- It removes the incorrect information that updating underlying OS packages is mandatory.
- It adds information about the extended procedure for non-official images.
Closesscylladb/scylladb#17867
This PR fixes a problem with replacing a node with tablets when
RF=N. Currently, this will fail because tablet replica allocation for
rebuild will not be able to find a viable destination, as the replacing node
is not considered to be a candidate. It cannot be a candidate because
replace rolls back on failure and we cannot roll back after tablets
were migrated.
The solution taken here is to not drain tablet replicas from replaced
node during topology request but leave it to happen later after the
replaced node is in left state and replacing node is in normal state.
The replacing node waits for this draining to be complete on boot
before the node is considered booted.
Fixes https://github.com/scylladb/scylladb/issues/17025
Nodes in the left state will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
about those node's location (dc, rack) for two reasons:
1) algorithms which work with replica sets filter nodes based on their location. For example materialized views code which pairs base replicas with view replicas filters by datacenter first.
2) tablet scheduler needs to identify each node's location in order to make decisions about new replica placement.
It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.
Nodes in the left state are still not present in token ring, and not
considered to be members of the ring (datacanter endpoints excludes them).
In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet replica sets.
Currently left nodes are never removed from topology, so will
accumulate in memory. We could garbage-collect them from topology
coordinator if a left node is absent in any replica set. That means we
need a new state - left_for_real.
Closesscylladb/scylladb#17388
* github.com:scylladb/scylladb:
test: py: Add test for view replica pairing after replace
raft, api: Add RESTful API to query current leader of a raft group
test: test_tablets_removenode: Verify replacing when there is no spare node
doc: topology-on-raft: Document replace behavior with tablets
tablets, raft topology: Rebuild tablets after replacing node is normal
tablets: load_balancer: Access node attributes via node struct
tablets: load_balancer: Extract ensure_node()
mv: Switch to using host_id-based replica set
effective_replication_map: Introduce host_id-based get_replicas()
raft topology: Keep nodes in the left state to topology
tablets: Introduce read_required_hosts()
This PR fixes comments left from #17481 , namely
- adds case selection to boost suite
- describes the case selection in documentation
Closesscylladb/scylladb#17721
* github.com:scylladb/scylladb:
docs: Add info about the ability to run specific test case
test.py: Support case selection for boost tests
The test.py usage is documented, the ability to run a specific test by
its name is described in doc. Extend it with the new ability to run
specific test case as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
couple minor formatting fixes.
Closesscylladb/scylladb#17518
* github.com:scylladb/scylladb:
docs: remove leading space in table element
docs: remove space in words