Previously, `nodetool getendpoints` expected the partition key as a single string with components separated by `:` (for example, `1:val:ue`). This caused errors when a key component contained a colon, because it was ambiguous whether the colon was a separator or part of the key.
This change adds a new API endpoint, `/storage_service/natural_endpoints/v2/{keyspace}`, which accepts composite partition keys as multiple key_component query parameters (e.g., ?key_component=1&key_component=val:ue). The `nodetool getendpoints` command was updated to support a new `--key-components` option, allowing users to pass key components as an array. The client and test infrastructure were extended to support multiple values for a query parameter, and tests were added to verify correct behavior with composite keys.
The previous method of passing partition keys as colon-separated strings is preserved for backward compatibility.
Backport is not required, since this change relies on recent Seastar updates.
Fixes #16596
Closes scylladb/scylladb#26169
* github.com:scylladb/scylladb:
docs: document --key-components option for getendpoints
test/nodetool/test_getendpoints: add coverage for --key-components param in getendpoints
nodetool: Introduce new option --key-components to specify compound partition keys as array
rest_api/test_storage_service: add v2 natural_endpoints test for composite key with multiple components
api/storage_service: add GET 'natural_endpoints' v2 to support composite keys with ':'
rest_api_mock: support duplicate query parameters
test/rest_api: support multiple query values per key in RestApiSession.send()
nodetool: add support for the new seastar query_parameters_type to scylla_rest_client
In test_two_tablets_concurrent_repair_and_migration_repair_writer_level,
safe_rolling_restart returns a ready CQL session. However, get_all_tablet_replicas
uses the CQL reference from the manager, which isn't ready yet. Wait for the CQL session.
Fixes: #26328
Closes scylladb/scylladb#26349
Adds a parameterized test to verify that multiple --key-components arguments
are handled correctly by nodetool's getendpoints command. Ensures the
constructed REST request includes all key_component values in the expected format.
Adds a test case for the `/storage_service/natural_endpoints/v2/{keyspace}` endpoint,
verifying that it correctly resolves natural endpoints for a composite partition key
passed as multiple `key_component` query parameters.
Previously, only the last value of a repeated query parameter was captured,
which could cause inaccurate request matching in tests. This update ensures
that all values are preserved by storing duplicates as lists in the `params` dict.
Previously, the send() method in RestApiSession only supported one value per query parameter key.
This patch updates it to support passing lists of values, allowing the same key to appear multiple
times in the query string (e.g. ?key=value1&key=value2).
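The multi-value behavior described above can be sketched with Python's standard `urllib`, which supports repeated query keys on both the encoding and parsing sides (a hedged illustration of the idea; the actual `RestApiSession` code may differ):

```python
from urllib.parse import urlencode, parse_qs

# Encode one key with several values; doseq=True expands the list into
# repeated key=value pairs, percent-escaping the embedded colon.
params = {"key_component": ["1", "val:ue"]}
query = urlencode(params, doseq=True)
print(query)   # key_component=1&key_component=val%3Aue

# parse_qs keeps every value of a repeated key as a list, so nothing is
# lost on the receiving side.
parsed = parse_qs(query)
print(parsed)  # {'key_component': ['1', 'val:ue']}
```

This is the same property the mock server and client rely on: a repeated key must survive a round trip with all its values intact.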
Applying lazy evaluation to the BTI encoding of clustering keys
was probably a bad default.
The possible benefits are dubious (because it's quite likely that the laziness
won't allow us to avoid that much work), but the overhead needed to
implement the laziness is large and immediate.
In this patch we get rid of the laziness.
We rewrite lazy_comparable_bytes_from_clustering_position
and lazy_comparable_bytes_from_ring_position
so that they perform the key translation eagerly,
writing all components to a single bytes_ostream in one synchronous call.
perf_bti_key_translation (microbenchmark added in this series, 1 iteration is 100 translations of a clustering key with 8 cells of int32_type):
```
Before:
test iterations median mad min max allocs tasks inst cycles
lcb_mismatch_test.lcb_mismatch 9233 109.930us 0.000ns 109.930us 109.930us 4356.000 0.000 2615394.3 614709.6
After:
test iterations median mad min max allocs tasks inst cycles
lcb_mismatch_test.lcb_mismatch 50952 19.487us 0.000ns 19.487us 19.487us 198.000 0.000 603120.1 109042.9
```
Enhancement, backport not required.
Closes scylladb/scylladb#26302
* github.com:scylladb/scylladb:
sstables/trie: BTI-translate the entire partition key at once
sstables/trie: avoid an unnecessary allocation of std::generator in last_block_offset()
sstables/trie: perform the BTI-encoding of position_in_partition eagerly
types/comparable_bytes: add comparable_bytes_from_compound
test/perf: add perf_bti_key_translation
Currently, replica::tablet_map_to_mutation generates a mutation with one row per tablet.
With enough tablets (tens of thousands) in a table, we observe reactor stalls when freezing/unfreezing such large mutations, as seen in https://github.com/scylladb/scylladb/pull/18095#issuecomment-2029246954, and I assume we would see similar stalls when converting those mutations into canonical_mutation and back, since canonical_mutation is similar to frozen_mutation but a bit more expensive, as it also saves the column mappings.
This series takes a different approach than allowing freeze to yield.
`tablet_map_to_mutations` replaces `tablet_map_to_mutation` and is able to generate multiple split mutations that, when squashed together, are equivalent to the previous large mutation. Those mutations are fed into a `process_mutation` callback function, provided by the caller, which may add those mutations to a vector for further processing, and/or process them inline by freezing them or making a canonical mutation.
In addition, splitting the large mutation also prevents hitting the commitlog maximum mutation size limit.
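As a rough sketch of the splitting scheme (all names here are illustrative stand-ins, not the actual ScyllaDB API), the idea is to cut the row set into bounded chunks and hand each chunk to the caller's callback as soon as it is produced:

```python
# Illustrative sketch: split a large set of rows into bounded mutations
# and feed each one to the caller's callback.
MIN_ROWS_IN_MUTATION = 1024  # plays the role of min_tablets_in_mutation

def tablet_map_to_mutations(rows, process_mutation):
    # Emit one bounded "mutation" (here, a plain list slice) per chunk,
    # so no single mutation is large enough to stall the reactor or hit
    # the commitlog maximum mutation size.
    for i in range(0, len(rows), MIN_ROWS_IN_MUTATION):
        process_mutation(rows[i:i + MIN_ROWS_IN_MUTATION])

chunks = []
tablet_map_to_mutations(list(range(2500)), chunks.append)
print([len(c) for c in chunks])  # [1024, 1024, 452]
```

The callback style lets the caller freeze each piece immediately instead of first materializing one huge mutation.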
Closes scylladb/scylladb#18162
* github.com:scylladb/scylladb:
schema_tables: convert_schema_to_mutations: simplify check for system keyspace
tablets: read_tablet_mutations: use unfreeze_and_split_gently
storage_service: merge_topology_snapshot: freeze snp.mutations gently
mutation: async_utils: add unfreeze_and_split_gently
mutation: add for_each_split_mutation
tablets: tablet_map_to_mutations: maybe split tablets mutation
tablets: tablet_map_to_mutations: accept process_func
perf-tablets: change default tables and tablets-per-table
perf-tablets: abort on unhandled exception
Split the tablets mutations by number of rows, based on
`min_tablets_in_mutation` (currently calibrated to 1024),
similar to the splitting done in
`storage_service::merge_topology_snapshot`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Prepare for generating several mutations for the
tablet_map by calling process_func for each generated mutation.
This allows the caller to directly freeze those mutations
one at a time into a vector of frozen mutations or similarly
convert them into canonical mutations.
Next patch will split large tablet mutations to prevent stalls.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
tablets-per-table must be a power of 2, so round up 10000 to 16K.
Also, reduce the number of tables to a total of about 100K
tablets; otherwise we hit the maximum commitlog mutation size
limit in save_tablet_metadata.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
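The power-of-2 rounding mentioned above can be sketched as follows (a minimal illustration, not the actual perf-tablets code):

```python
# Bump n up to the nearest power of two (10000 -> 16384, i.e. 16K).
def next_power_of_two(n: int) -> int:
    return 1 if n <= 1 else 1 << (n - 1).bit_length()

print(next_power_of_two(10000))  # 16384
```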
The directory utils/ is supposed to contain general-purpose utility
classes and functions, which are either already used across the project,
or are designed to be used across the project.
This patch moves 8 files out of utils/:
utils/advanced_rpc_compressor.hh
utils/advanced_rpc_compressor.cc
utils/advanced_rpc_compressor_protocol.hh
utils/stream_compressor.hh
utils/stream_compressor.cc
utils/dict_trainer.cc
utils/dict_trainer.hh
utils/shared_dict.hh
These 8 files together implement the compression feature of RPC.
None of them are used by any other Scylla component (e.g., sstables have
a different compression), or are ready to be used by another component,
so this patch moves all of them into message/, where RPC is implemented.
Theoretically, we may want in the future to use this cluster of classes
for some other component, but even then, we shouldn't just have these
files individually in utils/ - these are not useful stand-alone
utilities. One cannot use "shared_dict.hh" assuming it is some sort of
general-purpose shared hash table or something - it is completely
specific to compression and zstd, and specifically to its use in those
other classes.
Beyond moving these 8 files, this patch also contains changes to:
1. Fix includes to the 5 moved header files (.hh).
2. Fix configure.py, utils/CMakeLists.txt and message/CMakeLists.txt
for the three moved source files (.cc).
3. In the moved files, change from the "utils::" namespace, to the
"netw::" namespace used by RPC. Also needed to change a bunch
of callers for the new namespace. Also, had to add "utils::"
explicitly in several places which previously assumed the
current namespace is "utils::".
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#25149
This PR migrates limits tests from dtest to this repository.
One reason is that there is an ongoing effort to migrate tests from dtest to here.
Debug logs are enabled on `test_max_cells` for `lsa-timing` logger, to have more information about memory reclaim operation times and memory chunk sizes. This will allow analysis of their value distributions, which can be helpful with debugging if the issue reoccurs.
Also, scylladb keeps sql files with metrics which, with some modifications, can be used to track metrics over time for some tests. This would show whether there are pauses and spikes or whether the test performance is more or less consistent over time.
scylla-dtest PR that removes migrated tests:
[limits_test.py: remove tests already ported to scylladb repo #6232](https://github.com/scylladb/scylla-dtest/pull/6232)
Fixes #25097
This is a migration of existing tests to this repository. No need for backport.
Closes scylladb/scylladb#26077
* github.com:scylladb/scylladb:
test: dtest: limits_test.py: test_max_cells log level
test: dtest: limits_test.py: make the tests work
test: dtest: test_limits.py: remove tests that are not being migrated
test: dtest: copy unmodified limits_test.py
Generate view updates from a pending base replica if it's a reading
replica, i.e. it's in the last stage of transition write_both_read_new
before becoming the new base replica.
Previously we didn't generate view updates on a pending replica. The
problem with that is that when a base token is migrated from one replica
B1 to another B2, at one stage we generate view updates only from B1,
then at the next stage we generate view updates only from B2. During
this transition, it can happen that for some write neither B1 nor B2
generates a view update, because each one sees the other as the base
replica.
We fix this by generating view updates from both base replicas in the
phase before the transition. We can generate view updates on the pending
replica in this case, even if it requires read-before-write, because
it's in a stage where it contains all data and serves reads.
Fixes https://github.com/scylladb/scylladb/issues/24292
backport not needed - the issue mostly affects MVs with tablets, which are still experimental
Closes scylladb/scylladb#25904
* github.com:scylladb/scylladb:
test: mv: test view update during topology operations
mv: generate view updates on both shards in intranode migration
mv: generate view updates on pending replica
Until now, every PutItem operation appeared in the Alternator Streams as
two events - a REMOVE and a MODIFY. DynamoDB Streams emits only INSERT
or MODIFY, depending on whether a row was replaced, or created anew. A
related issue scylladb#6918 concerns distinguishing the mutation type properly.
This happened because each call to PutItem emitted two CDC rows, which
were then returned by GetRecords. Since this patch, we use a collection
tombstone for the `:attrs` column, and a separate tombstone for each
regular column in the table's schema. We don't expect new tables to have
any other regular columns besides `:attrs` and the keys, but we may
encounter them in upgraded tables which had old GSIs or LSIs.
Fixes: scylladb#6930.
Closes scylladb/scylladb#24991
We would like to have an additional service level
available for users of the Vector Store service,
which would allow us to de/prioritize vector
operations as needed. To allow that, we increase
the number of scheduling groups from 19 to 20
and adjust the related test accordingly.
Closes scylladb/scylladb#26316
We had very rudimentary tests for the "duration" CQL type in the cqlpy
framework - just for reproducing issue #8001. But we left two
alternative formats, and a lot of corner cases, untested.
So this patch aims to add the missing tests - to exhaustively cover
the "duration" literal formats and their capabilities.
Some of the examples tested in the new test are inspired by Cassandra's
unit test test/unit/org/apache/cassandra/cql3/DurationTest.java and the
corner cases that file covers. However, the new tests are not a direct
translation of that file, because DurationTest.java was not a CQL test -
it was a unit test of Cassandra's internal "Duration" type, so it could
not be directly translated into a CQL-based test.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#25092
The handler in question when called for tablets-enabled keyspace, returns ranges that are inconsistent with those from system.tablets. Like this:
system.tablets:
```
TabletReplicas(last_token=-4611686018427387905, replicas=[('e43ce450-2834-4137-92b7-379bb37684d1', 0), ('67c82fc2-8ef9-4dd9-8cf6-c7f9372ce207', 0)])
TabletReplicas(last_token=-1, replicas=[('22c84cba-d8d0-4d20-8d46-eb90865bb612', 0), ('67c82fc2-8ef9-4dd9-8cf6-c7f9372ce207', 1)])
TabletReplicas(last_token=4611686018427387903, replicas=[('22c84cba-d8d0-4d20-8d46-eb90865bb612', 1), ('67c82fc2-8ef9-4dd9-8cf6-c7f9372ce207', 1)])
TabletReplicas(last_token=9223372036854775807, replicas=[('e43ce450-2834-4137-92b7-379bb37684d1', 1), ('22c84cba-d8d0-4d20-8d46-eb90865bb612', 0)])
```
range_to_endpoint_map:
```
{'key': ['-9069053676502949657', '-8925522303269734226'], 'value': ['127.110.40.2', '127.110.40.3']}
{'key': ['-8925522303269734226', '-8868737574445419305'], 'value': ['127.110.40.2', '127.110.40.3']}
...
{'key': ['-337928553869203886', '-288500562444694340'], 'value': ['127.110.40.1', '127.110.40.3']}
{'key': ['-288500562444694340', '105026475358661740'], 'value': ['127.110.40.1', '127.110.40.3']}
{'key': ['105026475358661740', '611365860935890281'], 'value': ['127.110.40.1', '127.110.40.3']}
...
{'key': ['8307064440200319556', '9117218379311179096'], 'value': ['127.110.40.2', '127.110.40.1']}
{'key': ['9117218379311179096', '9125431458286674075'], 'value': ['127.110.40.2', '127.110.40.1']}
```
Not only does the number of ranges differ, but the separating tokens also do not match (e.g. tokens -2 and 0 belong to different tablets according to system.tablets, but fall into the same "range" in the API result).
The source of confusion is that although storage_service::get_range_to_address_map() is given the correct e.r.m. pointer from the table, it still uses token_metadata::sorted_tokens() to work with. The fix is: when the e.r.m. is per-table, the tokens should be taken from token_metadata's tablet_map (e.g. compare this to storage_service::effective_ownership(), which grabs tokens differently for the vnode/table cases).
This PR fixes the mentioned problem and adds a validation test. The test also checks the /storage_service/describe_ring endpoint, which happens to return the correct set of values.
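To illustrate the expected mapping (a simplified model, not the actual Scylla code): each tablet owns the token interval from just above the previous tablet's last_token up to its own last_token, so with the boundaries from the system.tablets output above, tokens -2 and 0 must land in different tablets:

```python
# Simplified model of tablet token ownership; boundaries taken from the
# system.tablets output above. Tokens are signed 64-bit integers.
MIN_TOKEN = -2**63

def tablet_ranges(last_tokens):
    # Each tablet owns the half-open interval (previous last_token, last_token].
    starts = [MIN_TOKEN] + last_tokens[:-1]
    return list(zip(starts, last_tokens))

ranges = tablet_ranges([-4611686018427387905, -1,
                        4611686018427387903, 9223372036854775807])

def owner(token):
    return next(i for i, (s, e) in enumerate(ranges) if s < token <= e)

print(owner(-2), owner(0))  # 1 2 - adjacent tokens, different tablets
```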
The API is very ancient, so the bug is present in all versions with tablets.
Fixes #26331
Closes scylladb/scylladb#26231
* github.com:scylladb/scylladb:
test: Add validation of data returned by /storage_service endpoints
test,lib: Add range_to_endpoint_map() method to rest client
api: Indentation fix after previous patches
storage_service: Get tablet tokens if e.r.m. is per-table
storage_service,api: Get e.r.m. inside get_range_to_address_map()
storage_service: Calculate tokens on stack
This is yet another part in the BTI index project.
Overarching issue: https://github.com/scylladb/scylladb/issues/19191
Previous part: https://github.com/scylladb/scylladb/pull/25626
Next parts: make `ms` the default. Then, general tweaks and improvements. Later, potentially a full `da` format implementation.
This patch series introduces a new, Scylla-only sstable format version `ms`, which is like `me`, but with the index components (Summary.db and Index.db) replaced with BTI index components (Partitions.db and Rows.db), as they are in Cassandra 5.0's `da` format version.
(Eventually we want to implement `da` itself, but there are several other changes (unrelated to the index files) between `me` and `da`. Adding `ms` as an intermediate step lets us adopt the new index formats without dragging all the other changes into the mix and raising the risk of regressions, which is already high.)
The high-level structure of the PR is:
1. Introduce new component types — `Partitions` and `Rows`.
2. Teach `class sstable` to open them when they exist.
3. Teach the sstable writer how to write index data to them.
4. Teach `class sstable` and unit tests how to deal with sstables that have no `Index` or `Summary` (but have `Partitions` and `Rows` instead).
5. Introduce the new sstable version `ms`, specify that it has `Partitions` and `Rows` instead of `Index` and `Summary`.
6. Prepare unit tests for the appearance of `ms`.
7. Enable `ms` in unit tests.
8. Allow enabling `ms` via db::config (with a silent fallback to `me` until the new `MS_SSTABLE_FORMAT` cluster feature is enabled).
9. Prepare integration tests for the appearance of `ms`.
10. Enable both `ms` and `me` in tests where we want both versions to be tested.
This series doesn't make `ms` the default yet, because that requires teaching Scylla Manager and a few dtests about the new format first. It can be enabled by setting `sstable_format: ms` in the config.
Per a review request, here is an example from `perf_fast_forward`, demonstrating some motivation for a new format. (Although not the main one: the main motivations are getting rid of restrictions on the RAM:disk ratio, and improving index read throughput for datasets with tiny partitions.) The dataset was populated with `build/release/scylla perf-fast-forward --smp=1 --sstable-format=$VERSION --data-directory=data.$VERSION --column-index-size-in-kb=1 --populate --random-seed=0`.
This test involves a partition with 1000000 clustering rows (with 32-bit keys and 100-byte values) and ~500 index blocks, and queries a few particular rows from the partition. Since the branching factor for the BIG promoted index is 2 (it's a binary search), the lookup involves ~11.2 sequential page reads per row. The BTI format has a more reasonable branching factor, so it involves ~2.3 page reads per row.
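As a back-of-the-envelope model (illustrative only; the measured figures quoted above also include additional reads beyond the row-index descent), the lookup cost grows roughly as log base branching-factor of the block count. The value 256 below is merely an example of a wide trie node, not a claim about BTI's exact fan-out:

```python
import math

# Rough cost model: looking up a row in an index of `blocks` blocks with
# branching factor `b` takes about log_b(blocks) sequential page reads.
def approx_page_reads(blocks, branching_factor):
    return math.log(blocks) / math.log(branching_factor)

binary = approx_page_reads(500, 2)    # binary search over ~500 blocks
wide = approx_page_reads(500, 256)    # a wide trie node, for contrast
print(f"{binary:.1f} vs {wide:.1f} page reads")
```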
`build/release/scylla perf-fast-forward --smp=1 --data-directory=perf_fast_forward_data/me --run-tests=large-partition-select-few-rows`:
```
offset stride rows iterations avg aio aio (KiB)
500000 1 1 70 18.0 18 128
500001 1 1 647 19.0 19 132
0 1000000 1 748 15.0 15 116
0 500000 2 372 29.0 29 284
0 250000 4 227 56.0 56 504
0 125000 8 116 106.0 106 928
0 62500 16 67 195.0 195 1732
```
`build/release/scylla perf-fast-forward --smp=1 --data-directory=perf_fast_forward_data/ms --run-tests=large-partition-select-few-rows`:
```
offset stride rows iterations avg aio aio (KiB)
500000 1 1 51 5.1 5 20
500001 1 1 64 5.3 5 20
0 1000000 1 679 4.0 4 16
0 500000 2 492 8.0 8 88
0 250000 4 804 16.0 16 232
0 125000 8 409 31.0 31 516
0 62500 16 97 54.0 54 1056
```
Index file size comparison for the default `perf_fast_forward` tables with `--random-seed=0`:
Large partition table (dominated by intra-partition index): 2.4 MB with `me`, 732 kB with `ms`.
For the small partitions table (dominated by inter-partition index): 11 MB with `me`, 8.4 MB with `ms`.
External tests:
I ran the SCT test `longevity-mv-si-4days-streaming-test` on 6 nodes with 30 shards each for 8 hours. No anomalies were observed.
New functionality, no backport needed.
Closes scylladb/scylladb#26215
* github.com:scylladb/scylladb:
test/boost/bloom_filter_test: add test_rebuild_from_temporary_hashes
test/cluster: add test_bti_index.py
test: prepare bypass_cache_test.py for `ms` sstables
sstables/trie/bti_index_reader: add a failure injection in advance_lower_and_check_if_present
test/cqlpy/test_sstable_validation.py: prepare the test for `ms` sstables
tools/scylla-sstable: add `--sstable-version=?` to `scylla sstable write`
db/config: expose "ms" format to the users via database config
test: in Python tests, prepare some sstable filename regexes for `ms`
sstables: add `ms` to `all_sstable_versions`
test/boost/sstable_3_x_test: add `ms` sstables to multi-version tests
test/lib/index_reader_assertions: skip some row index checks for BTI indexes
test/boost/sstable_inexact_index_test: explicitly use a `me` sstable
test/boost/sstable_datafile_test: skip test_broken_promoted_index_is_skipped for `ms` sstables
test/resource: add `ms` sample sstable files for relevant tests
test/boost/sstable_compaction_test: prepare for `ms` sstables.
test/boost/index_reader_test: prepare for `ms` sstables
test/boost/bloom_filter_tests: prepare for `ms` sstables
test/boost/sstable_datafile_test: prepare for `ms` sstables
test/boost/sstable_test: prepare for `ms` sstables.
sstables: introduce `ms` sstable format version
tools/scylla-sstable: default to "preferred" sstable version, not "highest"
sstables/mx/reader: use the same hashed_key for the bloom filter and the index reader
sstables/trie/bti_index_reader: allow the caller to pass a precalculated murmur hash
sstables/trie/bti_partition_index_writer: in add(), get the key hash from the caller
sstables/mx: make Index and Summary components optional
sstables: open Partitions.db early when it's needed to populate key range for sharding metadata
sstables: adapt sstable::set_first_and_last_keys to sstables without Summary
sstables: implement an alternative way to rebuild bloom filters for sstables without Index
utils/bloom_filter: add `add(const hashed_key&)`
sstables: adapt estimated_keys_for_range to sstables without Summary
sstables: make `sstable::estimated_keys_for_range` asynchronous
sstables/sstable: compute get_estimated_key_count() from Statistics instead of Summary
replica/database: add table::estimated_partitions_in_range()
sstables/mx: implement sstable::has_partition_key using a regular read
sstables: use BTI index for queries, when present and enabled
sstables/mx/writer: populate BTI index files
sstables: create and open BTI index files, when enabled
sstables: introduce Partitions and Rows component types
sstables/mx/writer: make `_pi_write_m.partition_tombstone` a `sstables::deletion_time`
`SELECT` commands with SERIAL consistency level are historically allowed for vnode-based views, even though they don't provide linearizability guarantees and in general don't make much sense. In this PR we prohibit LWTs for tablet-based views, but preserve old behavior for vnode-based views for compatibility. Similar logic is applied to CDC log tables.
We also add a general check that disallows colocating a table with another colocated table, since this is not needed for now.
Fixes https://github.com/scylladb/scylladb/issues/26258
backports: not needed (a new feature)
Closes scylladb/scylladb#26284
* github.com:scylladb/scylladb:
cql_test_env.cc: log exception when callback throws
lwt: prohibit for tablet-based views and cdc logs
tablets: disallow chains of colocated tables
database: get_base_table_for_tablet_colocation: extract table_id_by_name lambda
The test looks at metrics to confirm whether queries
hit the row cache, the index cache or the disk,
depending on various settings.
BIG index readers use a two level, read-through index cache,
where the higher layer stores parsed "index pages"
of Index.db, while the lower layer is a cache of
raw 4kiB file pages of Index.db.
Therefore, if we want to count index cache hits,
the appropriate metric to check in this case is
`scylla_sstables_index_page_hits`,
which counts hits in the higher layer.
This is done by the test.
However, BTI index readers don't have an equivalent of the higher cache
layer. Their cache only stores the raw 4 kiB pages, and the hits are
counted in `scylla_sstables_index_page_cache_hits`. (The same metric
is incremented by the lower layer of the BIG index cache).
Before this commit, the test would fail with `ms` sstables,
because their reads don't increment `scylla_sstables_index_page_hits`.
In this commit we adapt the test so that it instead checks
`scylla_sstables_index_page_cache_hits` for `ms` sstables.
BIG sstables and BTI sstables use different code paths for
validating the Data file against the index. So we want to test
both types of indexes, not just the default one.
This patch changes the test so that it explicitly tests both
`me` and `ms` instead of only testing the default format.
Note that we disable some tests for BTI indexes:
the tests which check that validation detects mismatches
between the row index ("promoted index") and the Data file.
This is because iteration over the row index in BTI isn't
implemented at the moment, so for BTI the validation behaves
as if there were no row indexes.
Add `ms` to tests which already test many format versions.
The tests check that sstable files in newer versions are
the same as in `mc`.
Arbitrarily, for `ms`, we only check the files common
between `mc` and `ms`.
If we want to extend this test more, so that it checks
that `Partitions.db` and `Rows.db` don't change over time,
we have to add `ms` versions of all the sstables under
`test/resources` which are used in this test. We won't do that
in this patch series. And I'm not sure if we want to do that at all.
Block monotonicity checks can't be implemented for BTI row indexes
because they don't store full clustering positions, only some encoded
prefixes.
The emptiness check could be implemented with some effort,
but we currently don't bother.
The two tests which use this `is_empty()` method aren't very
useful anyway. (They check that the promoted index is empty when
there are no clustering keys. That doesn't really need a dedicated
test).
The test currently implicitly uses the default sstable format.
But it assumes that the index reader type is `sstables::index_reader`,
and it wants some methods specific to that type (and absent from
the base `abstract_index_reader`).
If we switch the default format from `me` to `ms`,
without doing something about this,
this test will start failing on the downcast to `sstables::index_reader`.
We deal with this by explicitly specifying `me`.
`me` and `ms` data readers are identical.
And this is a test of the data reader, not the index reader.
So it's perfectly fine to just use `me`.
This is an old test for some workaround for incorrectly-generated
promoted indexes. It doesn't make sense to port this test to newer
sstable formats. So just skip it for the new sstable versions.
There are some tests which want sstables of all format versions
in `test/resource`. This patch adds `ms` files for those tests.
I didn't think much about this change; I just mechanically
generated the `ms` sstables from the existing `me` sstables in the same
directories (using `scylla sstable upgrade`) for the tests which were
complaining about the lack of `ms` files.
This PR adds metrics to the vector store client, as described in https://scylladb.atlassian.net/wiki/spaces/RND/pages/86245395/Vector+Store+Core+APIs#Metrics:
- number of DNS refreshes
We would like to track DNS refreshes to see whether the network client is working properly.
Here is the added metric:
```
# HELP scylla_vector_store_dns_refreshes Number of DNS refreshes
# TYPE scylla_vector_store_dns_refreshes gauge
scylla_vector_store_dns_refreshes{shard="0"} 1.000000
```
Fixes: VECTOR-68
Closes scylladb/scylladb#25288
* github.com:scylladb/scylladb:
metrics, test: added a test case for vs metrics
metrics, vector_search: add a dns refresh metric
vector_search: move the ann implementation to impl
When a test fails inside a do_with_cql_env callback, the logs don’t
make it clear where the failure happened. This is because cql_env
immediately begins shutting down services, which obscures the
original failure.
PR #26237 fixed linker errors by linking `cql3` to `vector_search`, but
this introduced a circular dependency between these two static
libraries, sometimes causing failures during compilation:
```
ninja: error: dependency cycle:
/home/user/Development/scylladb/build/debug/cql3/CqlParser.hpp ->
data_dictionary/libdata_dictionary.a ->
data_dictionary/CMakeFiles/data_dictionary.dir/data_dictionary.cc.o ->
/home/user/Development/scylladb/build/debug/cql3/CqlParser.hpp
```
So, instead of linking the `vector_search` library to the `cql3`
library, link it directly to the executable where the `cql3` library is
also to be linked. For the test cases, this means linking
`vector_search` to the `test-lib` library. Since both `vector_search`
and `cql3` are static libraries, the linker will resolve them correctly
regardless of the order in which they are linked.
Refs #26235
Refs #26237
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#26318
Add new test cases checking view consistency when writing to a table
with an MV and generating view updates while data is migrated.
One case has tablet migrations while writing to the table. The other
case does the equivalent for vnode keyspaces - it adds a new node.
The tests reproduce issue scylladb/scylladb#24292
Before this patch, the test `cluster/test_alternator::test_localnodes_joining_nodes` was one of the slowest tests in the test/cluster framework, taking over two minutes to run.
As comments in the test already acknowledged, there was no good reason why this test had to be so slow. The test needed to, intentionally, boot a server which took a long time (2 minutes) to fail its boot. But it didn't really need to wait for this failure - the right thing to do was to just kill the server at the end of the test. But we just didn't have the test-framework API to do it. So in this series, the first patch introduces the missing API, and the second patch uses it to fix test_localnodes_joining_nodes to kill the (unsuccessfully) booting server.
After this patch, the test takes just 7 seconds to run.
This is a test speedup only, so no real need to backport it - old releases anyway get fewer test runs, and the latency of those runs is less important.
Closes scylladb/scylladb#25312
* github.com:scylladb/scylladb:
test/cluster: greatly speed up test_localnodes_joining_nodes
test/pylib: add the ability to stop currently-starting servers
Partitions.db uses a piece of the murmur hash of the partition key
internally. The same hash is used to query the bloom filter.
So to avoid computing the hash twice (which involves converting the
key into a hashable linearized form) it would make sense to use
the same `hashed_key` for both purposes.
This is what we do in this patch. We extract the computation
of the `hashed_key` from `make_pk_filter` up to its parent
`sstable_set_impl::create_single_key_sstable_reader`,
and we pass this hash down both to `make_pk_filter` and
to the sstable reader. (And we add a pointer to the `hashed_key`
as a parameter to all functions along the way, to propagate it).
The number of parameters to `mx::make_reader` is getting uncomfortable.
Maybe they should be packed into some structs.
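A minimal sketch of the hash-reuse idea, with hashlib's md5 standing in for the murmur hash and all names hypothetical:

```python
import hashlib

# md5 stands in for the murmur hash that the real sstable code computes
# from the partition key; every name here is illustrative.
def hashed_key(pk: bytes) -> bytes:
    return hashlib.md5(pk).digest()

def bloom_filter_may_contain(h: bytes, bloom: set) -> bool:
    return h in bloom

def index_lookup(h: bytes, index: dict):
    return index.get(h)

def single_key_read(pk: bytes, bloom: set, index: dict):
    h = hashed_key(pk)                 # hash computed once...
    if not bloom_filter_may_contain(h, bloom):
        return None                    # definite miss, skip the index
    return index_lookup(h, index)      # ...and reused for the index probe

bloom = {hashed_key(b"pk1")}
index = {hashed_key(b"pk1"): "row-1"}
print(single_key_read(b"pk1", bloom, index))  # row-1
```

The point is simply that the expensive linearize-and-hash step happens once per read and the result flows to both consumers.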
Partitions.db internally uses a piece of the partition key murmur
hash (the same hash which is used to compute the token and the
relevant bits in the bloom filter). Before this patch,
the Partitions.db reader computes the hash internally from the
`sstables::partition_key`.
That's a waste, because this hash is usually also computed
for bloom filter purposes just before that.
So in this patch we let the caller pass that hash instead.
The old index interface, without the hash, is kept for convenience.
In this patch we only add a new interface, we don't switch the callers
to it yet. That will happen in the next commit.
Partitions.db internally uses a piece of the partition key murmur
hash (the same hash which is used to compute the token and the
relevant bits in the bloom filter). Before this patch,
the Partitions.db writer computes the hash internally from the
`sstables::partition_key`.
That's a waste, because this hash is also computed for bloom filter
purposes just before that, in the owning sstable writer.
So in this patch we let the caller pass that hash here instead.
`sstable::set_first_and_last_keys` currently takes the first and last
key from the Summary component. But if only BTI indexes are used,
this component will be nonexistent. In this case, we can use the first
and last keys written in the footer of Partitions.db.
Currently, `sstable::estimated_keys_for_range` works by
checking what fraction of the Summary is covered by the given
range, and multiplying this fraction by the total number of keys.
Since computing things on Summary doesn't involve I/O (because Summary
is always kept in RAM), this is synchronous.
In a later patch, we will modify `sstable::estimated_keys_for_range`
so that it can deal with sstables that don't have a Summary
(because they use BTI indexes instead of BIG indexes).
In that case, the function is going to compute the relevant fraction
by using the index instead of Summary. This will require making
the function asynchronous. This is what we do in this patch.
(The actual change to the logic of `sstable::estimated_keys_for_range`
will come in the next patch. In this one, we only make it asynchronous).
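The fraction-based estimate can be sketched like this, modeling the Summary as a sorted list of sampled tokens (names and structure are illustrative, not Scylla's actual types):

```python
import bisect

# Summary modeled as a sorted list of sampled tokens; `total_keys` is
# the sstable's key count.
def estimated_keys_for_range(summary_tokens, total_keys, start, end):
    lo = bisect.bisect_left(summary_tokens, start)
    hi = bisect.bisect_right(summary_tokens, end)
    fraction = (hi - lo) / len(summary_tokens)
    return int(total_keys * fraction)

summary = list(range(0, 1000, 10))  # 100 evenly sampled tokens
print(estimated_keys_for_range(summary, 100000, 0, 499))  # 50000
```

Since the Summary is resident in RAM, this whole computation is synchronous; replacing the Summary scan with an index probe is what forces the asynchronous signature.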