RPCs from old nodes will still use old format so translation will be
used in this case. The change is backwards compatible thanks to RPC
extensibility.
Also remove forcing of the replacing node to be alive which is not
needed any more since gossiper no longer inhibits replacing nodes from
advertising themselves.
The parameter was needed when nodes were addressed by IP, so during
replace with the same IP a new node had to "hide" itself from the
cluster to not get accidentally confused with the old node. Now,
when nodes are addressed by host id the situation is impossible.
In this rather large path we mode to address nodes in storage proxy by
host ids instead of ips. Some subsystems storage proxy calls to are
not yet converted to host ids, so we translate back and forth when we
interact with them.
Currently the locator::topology object, when created, does not contain
local node, but it is started to be used to access local database. It
sort of work now because there are explicit checks in the code to handle
this special case like in topology::get_location for instance. We do not
want to hack around it and instead rely on an invariant that the local
node is always there. To do that we add local node during
locator::topology creation. There is a catch though. Unlike with IP host
ID is not known during startup. We actually need to read from the
database to know it, so the topology starts with host ID zero and then
it changes once to the real one. This is not a problem though. As long as
the (one node) topology is consistent (_cfg.this_host_id is equal to the
node's id) local access will work.
Local replication strategy returns zero host id in replica set instead
of the real one. It mostly works now because code that translates ids
to ips knows that zero host id is a special one. But we want to use host
ids directly and we need to return real one (or handle zero special case
everywhere).
What wait_for_ip is actually does is waiting for a node to appear in the
gossiper since this is when it is added to the raft address map. Drop
the usage of the address map and check the gossiper directly.
Now we have enough functionality in the gossiper and messaging service
to get rid of ip2id function in the topology coordinator. We can use
hos ids directly.
If an RPC client creation was triggered by send function that has host
id as a dst send it as part of CLIENT_ID RPC which is always the first
RPC on each connection. If receiver's host id does not match it will
drop the connection.
The function checks if a node with provided id is alive. If it fails to map id to ip
or there is no state for the ip found the node is considered to be dead.
The function looks up provided host id in gossip_address_map and throws
unknown_address if the mapping is not available. Otherwise it sends the
message by IP found.
We want to be able to address nodes by host ids. For that lets generate
send functions that gets host_id as a dst parameter.
Changes to raft_rpc are needed because otherwise the compiler cannot
select a correct overload.
Building upon commit 69b47694, this change addresses a subtle synchronization
weakness in node visibility checks during recovery mode testing.
Previous Approach:
- Waited only for the first node to see its peers
- Insufficient to guarantee full cluster consistency
Current Solution:
1. Implement comprehensive node visibility verification
2. Ensure all nodes mutually recognize each other
3. Prevent potential schema propagation race conditions
Key Improvements:
- Robust cluster state validation before keyspace creation
- Eliminate partial visibility scenarios
Fixesscylladb/scylladb#21724
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21726
Task status information from nodetool commands is not retained permanently:
- Status of completed tasks is only kept for `task_ttl_in_seconds`
- Status is removed after being queried, making it a one-time operation
This behavior is important for users to understand since subsequent
queries for the same completed task will not return any information.
Add documentation to make this clear to users.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21386
these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed.
please note, because `mutation/mutation.hh` does not include `seastar/coroutine/maybe_yield.hh` anymore, and quite a few source files were relying on this header to bring in the declaration of `maybe_yield()`, we have to include this header in the places where this symbol is used. the same applies to `seastar/core/when_all.hh`.
---
it's a cleanup, hence no need to backport.
Closesscylladb/scylladb#21727
* github.com:scylladb/scylladb:
.github: add "mutation" to CLEANER_DIR
mutation: remove unused "#include"s
in order to prevent future inclusion of unused headers, let's include
"mutation" subdirectory to CLEANER_DIR, so that this workflow can
identify the regressions in future.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.
please note, because `mutation/mutation.hh` does not include
`seastar/coroutine/maybe_yield.hh` anymore, and quite a few source
files were relying on this header to bring in the declaration of
`maybe_yield()`, we have to include this header in the places where
this symbol is used. the same applies to `seastar/core/when_all.hh`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Demote --scylla-data-dir and --scylla-yaml-file to schema source
helpers, rather than schema source in themselves. This practically means
that when these options are used, they won't define where the tool will
attempt to load the schema from, they will just be helpers to help locate
the schema, for whichever schema source the tool was instructed to use
(or left to choose).
--scylla-data-dir and --scylla-yaml-file being schema sources were
problematic with encryption at rest and for S3 support (not yet
implemented). With encryption, the tool needs access to the
configuration, so --scylla-yaml-file is often used to provide the path
to the configuration file, which contains encryption configuration,
needed for the tool to decrypt the sstable. Currently, using this option
implies forcing the tool to read the schema from the schema tables,
which is a problematic option for tests -- Scylla might be compacting a
schema sstable and this will make the tool fail to load the schema.
Demoting these options the schema helpers, allows providing them, while
at the same time having the option to use a different schema-source.
To allow the user to force the tool to load the schema from the schema
tables, a new --schema-tables option is added. Similarly, a
--sstable-schema option is introduced to force the tool to load the
schema from the sstable itself.
With this, each 4 schema source now has an option to force the use of
said schema source. There are various helper options to be used along
with these.
The documentation as well as the tests are updated with the changes.
The schema related documentation gets an rather extensive facelift
because it was a bit out-of-date and incomplete.
Fixes: scylladb/scylladb#20534Closesscylladb/scylladb#21678
This change improves dependency management by explicitly specifying
library linkage visibility in CMake targets.
Previously, some ScyllaDB targets used `target_link_libraries()`
without `PUBLIC` or `PRIVATE` keywords, which resulted in transitive
library dependencies by default. This unintentionally exposed
non-public dependencies to downstream targets.
Changes:
- Always use explicit `PRIVATE` or `PUBLIC` keywords with
`target_link_libraries()`
- Tighten build dependency tree
- Enforce a more modular linkage model
See: [CMake documentation on library dependencies](https://cmake.org/cmake/help/latest/command/target_link_libraries.html)
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21686
Once e.g. `ALTER KEYSPACE` is performed, all in-memory objects should be updated accordingly, but this is not entirely true for keyspace metadata object. The reason for that is that keyspace metadata are stored in 2 system tables: `system_schema.keyspaces` and `system_schema.scylla_keyspaces`. Up until now the in-memory keyspace metadata object has been updated only with entries from the first table, and missed updates when entries from the 2nd table changed. These entries were e.g. initial tablets or storage options.
This change fixes this oversight by considering both tables when checking if keyspace metadata need to be updated. From the implementation point of view, the change is simple: we're considering `system_schema.scylla_keyspaces` also in `merge_keyspaces()` and if old and new schemas have any differences, we include that when altering ks.
Fixes#20768
Backport: no need, I don't think the issue is severe, atm it seems like it can only influence the tablets number, which should not bring the cluster down nor result in returning bad data, it can mostly influence the speed of the db.
Closesscylladb/scylladb#20852
The checksummed file data source uses the chunk size to enforce that the
reads from the underlying file input stream will be aligned at the chunk
boundary. This is necessary so that we can validate the checksum of each
chunk.
However, a mismatch in the numeric types caused a bug where the
underlying file input stream would read a smaller portion of the data
file than expected.
The bug is located in the following lines:
```
auto start = _beg_pos & ~(chunk_size - 1);
auto end = (_end_pos & ~(chunk_size - 1)) + chunk_size;
```
`_beg_pos` and `_end_pos` are `uint64_t`, whereas `chunk_size` is
`uint32_t`. When executing the AND operation, the compiler converts the
right operand from `uint32_t` to `uint64_t`. Since the integer is
unsigned, the four most-significant bytes are filled with zeros, thus
erroneously truncating the corresponding bytes of the position.
Fix the bug by explicitly converting the chunk size to `uint64_t` before
any arithmetic operations. Also, replace the handwritten alignment
implementations with the `align_up()` and `align_down()` helpers.
Finally, restrict the file end position to not exceed the file length.
Since the last chunk can be smaller than the chunk size, it could happen
that the end position exceeds the file length after the round-up. This
is not a bug on its own since `make_file_input_stream()` can accept
lengths that go beyond end-of-file, but still it makes the code more
error prone and should be avoided.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Closesscylladb/scylladb#21665
Before these changes, we dereferenced `app_state` in
`manager::endpoint_downtime_not_bigger_than()` before checking that it's
not a null pointer. We fix that.
Fixesscylladb/scylladb#21699Closesscylladb/scylladb#21676