scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-22 07:42:16 +00:00

Author	SHA1	Message	Date
Szymon Malewski	15493872b2	vector_search: fix decimal/varint precision loss in filter value_to_json() value_to_json() converts CQL values to JSON for vector search filters. For decimal and varint types, it used rjson::parse() on the JSON string, which parses through a double and silently loses precision for values exceeding ~15 significant digits — producing wrong filter results. Additionally, for decimal type we need an exact string representation that preserves the original (unscaled, scale) pair, because partition keys use byte-level identity: different serialized representations of the same numeric value are distinct rows, so the filter must reproduce the exact representation stored in the key. Add big_decimal::to_string_canonical() which follows the Java BigDecimal toString() spec (JDK 8+), producing a bijective string representation that uses exponential notation for extreme scales instead of expanding trailing zeros (which could cause OOM). This could replace to_string(), but doing so has wider consequences (e.g. hash/equality contract for decimal_type) described in SCYLLADB-1574. Use it in value_to_json() for decimal_type, and use rjson::from_string() for varint_type, both bypassing the lossy double parse path. Tests cover the new to_string_canonical() and the filter fix, as well as existing decimal type behavior (key representation, clustering order, toJson) that we rely on and must not break. The CQL decimal type tests (test_type_decimal.py) also pass against Cassandra. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1583 Refs: https://scylladb.atlassian.net/browse/SCYLLADB-1574 Closes scylladb/scylladb#29505	2026-05-18 17:07:26 +03:00
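The precision loss described in this commit can be reproduced outside ScyllaDB. The sketch below is a Python analogue, not the `rjson` code path: it assumes only that parsing a JSON number through a binary double behaves the same way in any language. The helper names `parse_via_double` and `parse_exact` are hypothetical.

```python
import json
from decimal import Decimal


def parse_via_double(text: str) -> Decimal:
    """Parse a JSON number through a binary double (the lossy path)."""
    return Decimal(json.loads(text))


def parse_exact(text: str) -> Decimal:
    """Parse a JSON number straight from its text, keeping every digit."""
    return json.loads(text, parse_float=Decimal)


value = "12345678901234567890.123456789"
print(parse_via_double(value) == Decimal(value))  # False: digits were lost
print(parse_exact(value) == Decimal(value))       # True

# Numerically equal decimals can still differ in their (unscaled, scale)
# pair, which is why a byte-identical key needs the exact representation.
a, b = Decimal("1.0"), Decimal("1.00")
print(a == b, a.as_tuple() == b.as_tuple())  # True False
```

The last two lines illustrate the second half of the commit message: `1.0` and `1.00` compare equal as numbers but carry different scales, so a filter that normalizes the value would no longer match the stored key.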
Nadav Har'El	c56361a6d7	vector_store_client: read and return similarity_scores The vector store returns for every ANN search, in addition to the keys of the matching items, two additional vectors - "distances" and "similarity_scores". The "distances" are raw distance metrics - lower scores are better matches, while "similarity_scores" are modified such that higher scores are better matches. Traditionally, search scores in systems like Cassandra and OpenSearch use the "similarity scores" approach (higher is better, results are returned in decreasing similarity order), so this is the more interesting vector of the two. But before this patch, our vector_store_client::ann() inspected only "distances". But... then, it didn't return even that to the caller :-) So in this patch, we: 1. Ignore "distances" and instead look at "similarity scores", which is what users really want based on their experience with other vector and non-vector search engines. 2. Return the similarity score of each match together with the match. We already have this score (the vector store returns it) and we can add it to the existing primary_key structure of each result. So each result is a "struct primary_key" which has fields partition, clustering, and after this patch - similarity. Existing callers in CQL and Alternator vector search will ignore this "similarity" field in each result, and not notice it was added. But in the next patch, we'll allow Alternator's vector search to return this similarity in each result. The existing unit tests for vector_store_client.cc mocked vector-store responses with "distances", without "similarity_scores", so they no longer represent what we actually expect the vector store to do. So this patch also contains modifications for these tests, to mock and to test "similarity_scores" - not "distances". The more interesting tests, in the next patch, use the real vector store and check that we really do get a "similarity_scores" response from it. 
This patch also handles a small corner case for DOT_PRODUCT, which is the only unbounded similarity function. If the similarity overflows the 32-bit float, the vector store returns a JSON "null" instead of a JSON number (since JSON doesn't support infinite numbers). Our existing vector-store client code errored out when it saw this "null", which is wrong - the request should be allowed to proceed. So in this patch when we see a "null" JSON for similarity, we return +Inf. This is usually correct because the top results really have +Inf, not -Inf, but if we ask for all items we can reach those with similarity -Inf and incorrectly assign +Inf to them (we have a test for this case in the next patch). But this problem won't happen when Limit is low, and in any case it's better than aborting the request after it had already succeeded. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-13 14:19:17 +03:00
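The `null` handling described above can be sketched as follows. This is a Python illustration, not the C++ client code: the `similarity_scores` field name comes from the commit message, while the function name and the rest of the response shape are assumptions.

```python
import json
import math


def parse_similarity_scores(body: str) -> list[float]:
    """Read "similarity_scores" from a (hypothetical) ANN response body.

    A JSON null stands for a score that overflowed the 32-bit float; it is
    mapped to +Inf rather than treated as an error.
    """
    doc = json.loads(body)
    return [math.inf if s is None else float(s) for s in doc["similarity_scores"]]


print(parse_similarity_scores('{"similarity_scores": [null, 0.9, 0.1]}'))
# [inf, 0.9, 0.1]
```

As the commit message notes, the mapping is a compromise: a `null` that really stood for -Inf would also come back as +Inf.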
Nadav Har'El	34136d3bc2	Merge 'vector_search: test: migrate CQL tests for vector search from C++/Boost to pytest' from Karol Nowacki Migrate vector search (ANN ordered select query) CQL tests from C++/Boost suite to pytest. This migration includes: - New pytest tests in `test/cqlpy/test_vector_search_with_vector_store_mock.py` - VectorStoreMock server as pytest fixture to simulate vector store responses The benefits of this migration are: - Extended test coverage to verify CQL protocol serialization and driver - Reduced overall test time (no compilation required for pytest) Fixes SCYLLADB-695 No backport needed as this is a refactoring. Closes scylladb/scylladb#29593 * github.com:scylladb/scylladb: vector_search: test: migrate paging warnings tests to Python vector_search: test: migrate local_vector_index to Python vector_search: test: migrate vector_index_with_additional_filtering_column to Python vector_search: test: migrate cql_error_contains_http_error_description to Python vector_search: test: migrate pk in restriction test to Python	2026-05-10 22:09:17 +03:00
Karol Nowacki	20b953ef8c	vector_search: test: migrate paging warnings tests to Python Move the paging warning related tests from C++ vector_store_client_test to Python test_vector_search_with_vector_store_mock.	2026-05-05 18:23:30 +02:00
Karol Nowacki	84787ce6a5	vector_search: test: migrate local_vector_index to Python Move the local vector index test from C++ vector_store_client_test to Python test_vector_search_with_vector_store_mock. The test creates a local vector index on ((pk1, pk2), embedding) and verifies that SELECT with partition key restriction and ANN ordering works correctly.	2026-05-05 18:23:30 +02:00
Karol Nowacki	0bb7e47090	vector_search: test: migrate vector_index_with_additional_filtering_column to Python Move the SCYLLADB-635 regression test from C++ vector_store_client_test to Python test_vector_search_with_vector_store_mock. The test creates a vector index on (embedding, ck1) and verifies that SELECT with ANN ordering works correctly when additional filtering columns are included in the index definition.	2026-05-05 18:23:30 +02:00
Karol Nowacki	5a8af3c727	vector_search: test: migrate cql_error_contains_http_error_description to Python Move the test that verifies HTTP error descriptions from the vector store are propagated through CQL InvalidRequest messages from the C++ vector_store_client_test to the Python test_vector_search_with_vector_store_mock. The test configures the mock to return HTTP 404 with 'index does not exist' and asserts the CQL SELECT raises InvalidRequest containing '404'.	2026-05-05 18:23:30 +02:00
Karol Nowacki	b672972c5f	vector_search: test: migrate pk in restriction test to Python Move vector search (ANN ordered select query) with IN restrictions on partition key from C++/Boost test suite to pytest (cqlpy). Add VectorStoreMock server as pytest fixture to simulate vector store responses.	2026-05-05 18:23:30 +02:00
Karol Nowacki	207de967fb	vector_search: test: default timeout in test_dns_resolving_repeated Replace explicit 1-second timeouts in repeat_until() with the default STANDARD_WAIT (10s). The 1-second timeout could be too aggressive for loaded CI environments where lowres_clock granularity (~10ms) combined with OS scheduling delays and resource contention (-c2 -m2G) could cause the loop to expire before the DNS refresh task completes its cycle. This also unifies test timeouts across test cases.	2026-05-05 17:23:39 +02:00
Karol Nowacki	4722be1289	vector_search: test: fix flaky test_dns_resolving_repeated Move trigger_dns_resolver() inside the repeat_until loop instead of calling it once before the loop. The test was intermittently timing out on CI. The exact root cause is not fully understood, but the hypothesis is that a single trigger signal can be lost somewhere (not exactly known where). This is not an issue for the production code because refresh trigger will be called multiple times - in every query where all configured nodes will be unreachable. By triggering inside the loop, we ensure the signal is re-sent on each iteration until the resolver actually performs the refresh and picks up the new (failing) DNS resolution. This makes the test resilient to timing-dependent signal loss without changing production code. Fixes: SCYLLADB-1794	2026-05-05 17:23:39 +02:00
Avi Kivity	eec0b20dbc	cql3: statement_restrictions: prepare statement_restrictions for capturing `this` Prevent copying/moving, that can change the address, and instead enforce using shared_ptr. Most of the code is already using shared_ptr, so the changes aren't very large. To forbid non-shared_ptr construction, the constructors are annotated with a private_tag tag class.	2026-04-19 20:57:03 +03:00
Karol Nowacki	c643f321af	vector_search: decrease default connection timeout to 3s Decrease the default connection timeout to 3s to better align with the default CQL query timeout of 10s. The previous timeout allowed only one failover request in high availability scenario before hitting the CQL query timeout. By decreasing the timeout to 3s, we can perform up to three failover requests within the CQL query timeout, which significantly improves the chances of successfully completing the query in high availability scenarios. Fixes: SCYLLADB-95	2026-04-17 12:26:39 +03:00
Karol Nowacki	9269ca9cf7	vector_search: add unreachable node detection time config Add option `vector_store_unreachable_node_detection_time_in_ms` to control parameters related to detecting unreachable vector store nodes. This parameter is used to set the TCP connect timeout, keepalive parameters, and TCP_USER_TIMEOUT. By configuring these parameters, we can detect unreachable vector store nodes faster and trigger failover mechanisms in a timely manner.	2026-04-17 12:26:38 +03:00
Avi Kivity	0ae22a09d4	LICENSE: Update to version 1.1 Updated terms of non-commercial use (must be a never-customer).	2026-04-12 19:46:33 +03:00
Michał Hudobski	7d648961ed	vector_search: forward non-primary key restrictions to Vector Store service Include non-primary key restrictions (e.g. regular column filters) in the filter JSON sent to the Vector Store service. Previously only partition key and clustering column restrictions were forwarded, so filtering on regular columns was silently ignored. Add get_nonprimary_key_restrictions() getter to statement_restrictions. Add unit tests for non-primary key equality, range, and bind marker restrictions in filter_test. Fixes: SCYLLADB-970 Closes scylladb/scylladb#29019	2026-04-10 17:16:29 +02:00
Nadav Har'El	22e7ef46a7	Merge 'vector_search: fix SELECT on local vector index' from Karol Nowacki Queries against local vector indexes were failing with the error: ```ANN ordering by vector requires the column to be indexed using 'vector_index'``` This was a regression introduced by `15788c3734`, which incorrectly assumed the first column in the targets list is always the vector column. For local vector indexes, the first column is the partition key, causing the failure. Previously, serialization logic for the target index option was shared between vector and secondary indexes. This is no longer viable due to the introduction of local vector indexes and vector indexes with filtering columns, which have different target format. This commit introduces a dedicated JSON-based serialization format for vector index targets, identifying the target column (tc), filtering columns (fc), and partition key columns (pk). This ensures unambiguous serialization and deserialization for all vector index types. This change is backward compatible for regular vector indexes. However, it breaks compatibility for local vector indexes and vector indexes with filtering columns created in version 2026.1.0. To mitigate this, usage of these specific index types will be blocked in the 2026.1.0 release by failing ANN queries against them in vector-store service. Fixes: SCYLLADB-895 Backport to 2026.1 is required as this issue occurs also on this branch. Closes scylladb/scylladb#28862 * github.com:scylladb/scylladb: index: fix DESC INDEX for vector index vector_search: test: refactor boilerplate setup vector_search: fix SELECT on local vector index index: test: vector index target option serialization test index: test: secondary index target option serialization test	2026-04-07 17:43:35 +03:00
Avi Kivity	8c629d55b0	test: vector_search: check [[nodiscard]] return values of expected<> types Clang 22 verifies [[nodiscard]] for co_await, causing compilation failures where return values of expected<> were silently discarded. These call sites were discarding the return value of client::request() and vector_store_client::ann(), both of which return expected<> types marked [[nodiscard]]. Rather than suppressing the warning with (void) casts, properly check the return values using the established test patterns: BOOST_CHECK(result) where the call is expected to succeed, and BOOST_CHECK(!result) where the call is expected to fail. Closes scylladb/scylladb#29297	2026-04-07 15:25:08 +03:00
Karol Nowacki	a32e4bb9f4	vector_search: test: refactor boilerplate setup The test boilerplate setup for some vector store client tests has been extracted to a common function.	2026-03-30 16:46:48 +02:00
Karol Nowacki	6bc88e817f	vector_search: fix SELECT on local vector index Queries against local vector indexes were failing with the error: "ANN ordering by vector requires the column to be indexed using 'vector_index'" This was a regression introduced by `15788c3734`, which incorrectly assumed the first column in the targets list is always the vector column. For local vector indexes, the first column is the partition key, causing the failure. Previously, serialization logic for the target index option was shared between vector and secondary indexes. This is no longer viable due to the introduction of local vector indexes and vector indexes with filtering columns, which have different target format. This commit introduces a dedicated JSON-based serialization format for vector index targets, identifying the target column (tc), filtering columns (fc), and partition key columns (pk). This ensures unambiguous serialization and deserialization for all vector index types. This change is backward compatible for regular vector indexes. However, it breaks compatibility for local vector indexes and vector indexes with filtering columns created in version 2026.1.0. To mitigate this, usage of these specific index types will be blocked in the 2026.1.0 release by failing ANN queries against them in vector-store service. Fixes: SCYLLADB-895	2026-03-30 16:46:48 +02:00
Karol Nowacki	7659a5b878	vector_search: test: fix flaky test The test assumes that the sleep duration will be at least the value of the sleep parameter. However, the actual sleep time can be slightly less than requested (e.g., a 100ms sleep request might result in a 99ms sleep). This commit adjusts the test's time comparison to be more lenient, preventing test flakiness.	2026-03-13 16:28:22 +01:00
Karol Nowacki	5474cc6cc2	vector_search: fix race condition on connection timeout When a `with_connect` operation timed out, the underlying connection attempt continued to run in the reactor. This could lead to a crash if the connection was established/rejected after the client object had already been destroyed. This issue was observed during the teardown phase of an upcoming high-availability test case. This commit fixes the race condition by ensuring the connection attempt is properly canceled on timeout. Additionally, the explicit TLS handshake previously forced during the connection is now deferred to the first I/O operation, which is the default and preferred behavior. Fixes: SCYLLADB-832	2026-03-13 16:28:22 +01:00
Szymon Wasik	d27610f138	vector_store_client: Return HTTP error description, not just code This simple patch adds support for storing the HTTP error description that Vector Store client receives from vector store. Until now it was just printed to the log but it was not returned. For this reason it was not forwarded to the drivers which forced users to access ScyllaDB server logs to understand what is wrong with Vector Store. This patch also updates formatter to print the message next to the error code. Fixes: VECTOR-189	2026-03-10 17:22:30 +01:00
Piotr Dulikowski	23ed0d4df8	Merge 'vector_search: fix TLS server name with IP' from Karol Nowacki SNI works only with DNS hostnames. Adding an IP address causes warnings on the server side. This change adds SNI only if it is not an IP address. This change has no unit tests, as this behavior is not critical, since it causes a warning on the server side. The critical part, that the server name is verified, is already covered. This PR also adds warning logs to improve future troubleshooting of connections to the vector-store nodes. Fixes: VECTOR-528 Backports to 2025.04 and 2026.01 are required, as these branches are also affected. Closes scylladb/scylladb#28637 * github.com:scylladb/scylladb: vector_search: fix TLS server name with IP vector_search: add warn log for failed ann requests	2026-03-09 15:03:22 +01:00
Marcin Maliszkiewicz	f177259316	Merge 'vector_search: small improvements' from Karol Nowacki vector_search: small improvements This PR addresses several minor code quality issues and style inconsistencies within the vector_search module. No backport is needed as these improvements are not visible to the end user. Closes scylladb/scylladb#28718 * github.com:scylladb/scylladb: vector_search: fix names of private members vector_search: remove unused global variable	2026-03-09 11:42:35 +01:00
Karol Nowacki	45477d9c6b	vector_search: test: include ANN error in assertion When the test fails, the assertion message does not include the error from the ANN request. This change enhances the assertion to include the specific ANN error, making it easier to diagnose test failures.	2026-03-03 14:19:20 +01:00
Karol Nowacki	ab6c222fc4	vector_search: test: fix HTTPS client test flakiness The default 100ms timeout for client readiness in tests is too aggressive. In some test environments, this is not enough time for client creation, which involves address resolution and TLS certificate reading, leading to flaky tests. This commit increases the default client creation timeout to 10 seconds. This makes the tests more robust, especially in slower execution environments, and prevents similar flakiness in other test cases. Fixes: VECTOR-547, SCYLLADB-802	2026-03-03 14:19:20 +01:00
Karol Nowacki	30487e8854	index: fix vector index with filtering target column The secondary index mechanism is currently used to determine the target column. This mechanism works incorrectly for vector indexes with filtering because it returns the last specified column as the target (vectors) column. However, the syntax for a vector index requires the first column to be the target: ``` CREATE CUSTOM INDEX ON t(vectors, users) USING 'vector_index'; ``` This discrepancy eventually leads to the following exception when performing an ANN search on a vector index with filtering columns: ``` ANN ordering by vector requires the column to be indexed using 'vector_index' ``` This commit fixes the issue by introducing dedicated logic for vector indexes to correctly identify the target (vectors) column. Fixes: SCYLLADB-635 Closes scylladb/scylladb#28740	2026-03-02 18:47:58 +02:00
Karol Nowacki	647172d4b8	vector_search: fix names of private members According to coding style in Scylla, member variables are prefixed with underscore.	2026-03-02 14:08:16 +01:00
Marcin Maliszkiewicz	c5dc086baf	Merge 'vector_search: return NaN for similarity_cosine with all-zero vectors' from Dawid Pawlik The ANN vector queries with all-zero vectors are allowed even on vector indexes with similarity function set to cosine. When enabling the rescoring option, those queries would fail as the rescoring calls `similarity_cosine` function underneath, causing an `InvalidRequest` exception as all-zero vectors were not allowed matching Cassandra's behaviour. To eliminate the discrepancy we want the all-zero vector `similarity_cosine` calls to pass, but return the NaN as the cosine similarity for zero vectors is mathematically incorrect. We decided not to use arbitrary values contrary to USearch, for which the distance (not to be confused with similarity) is defined as cos(0, 0) = 0, cos(0, x) = 1 while supporting the range of values [0, 2]. If we wanted to convert that to similarity, that would mean sim_cos(0, x) = 0.5, which does not support mathematical reasoning why that would be more similar than for example vectors marking obtuse angles. It's safe to assume that all-zero vectors for cosine similarity shouldn't make any impact, therefore we return NaN and eliminate them from best results. Adjusted the tests accordingly to check both proper Cassandra and Scylla's behaviour. Fixes: SCYLLADB-456 Backport to 2026.1 needed, as it fixes the bug for ANN vector queries using rescoring introduced there. Closes scylladb/scylladb#28609 * github.com:scylladb/scylladb: test/vector_search: add reproducer for rescoring with zero vectors vector_search: return NaN for similarity_cosine with all-zero vectors	2026-02-23 13:10:44 +01:00
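A minimal sketch of the behaviour this merge describes, written in Python rather than ScyllaDB's C++. It assumes the similarity is scaled as (1 + cos) / 2, which matches the [0, 2] distance range and the sim_cos(0, x) = 0.5 conversion mentioned in the message. The function name mirrors the CQL function, but the body is illustrative only.

```python
import math


def similarity_cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity scaled to [0, 1]; NaN if either vector is all-zero."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle to a zero vector is undefined, so no score is invented.
        return math.nan
    return (1.0 + dot / (norm_a * norm_b)) / 2.0


print(similarity_cosine([1.0, 0.0], [0.0, 1.0]))  # 0.5
print(similarity_cosine([0.0, 0.0], [1.0, 2.0]))  # nan
```

Because NaN compares false against every number, a caller that ranks results by score has to drop or explicitly order such rows, which is how the all-zero vectors end up excluded from the best results.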
Karol Nowacki	ca7f9a8baf	vector_search: fix TLS server name with IP SNI works only with DNS hostnames. Adding an IP address causes warnings on the server side. This change adds SNI only if it is not an IP address. This change has no unit tests, as this behavior is not critical, since it causes a warning on the server side. The critical part, that the server name is verified, is already covered. Fixes: VECTOR-528	2026-02-19 13:00:03 +01:00
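The rule "add SNI only if the host is not an IP address" can be illustrated with Python's standard `ipaddress` module. The helper name `sni_server_name` is hypothetical, and the actual fix lives in the C++ client; this only shows the decision itself.

```python
import ipaddress
from typing import Optional


def sni_server_name(host: str) -> Optional[str]:
    """Return the name to send as SNI, or None when host is an IP literal."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return host  # not an IP literal, so treat it as a DNS hostname
    return None      # IPv4 or IPv6 literal: skip SNI


print(sni_server_name("vector-store.example.com"))  # vector-store.example.com
print(sni_server_name("10.0.0.1"))                  # None
```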
Karol Nowacki	aef5ff7491	vector_search: test: Fix flaky cert rewrite test The test is flaky most likely because when TLS certificate rewrite happens simultaneously with an ANN request, the handshake can hang for a long time (~60s). This leads to a timeout in the test case. This change introduces a checkpoint in the test so that it will wait for the certificate rewrite to happen before sending an ANN request, which should prevent the handshake from hanging and make the test more reliable. Fixes: #28012	2026-02-12 09:58:54 +01:00
Dawid Pawlik	4e32502bb3	test/vector_search: add reproducer for rescoring with zero vectors Add reproducer for the SCYLLADB-456 issue following exception on ANN vector queries with rescoring with similarity cosine.	2026-02-11 13:41:09 +01:00
Szymon Malewski	29d090845a	vector_index: rescoring: Fetch oversampled rows So far with oversampling the extended set of keys was returned from VS, but query to the base table was still limited by the query `limit`. Now for rescoring we want to fetch rows for all the keys returned from VS. However later we need to restore the command limit, to trim result_set accordingly. For non-rescoring scenarios we trim directly keys set returned from VS if it happens to exceed query limit. With this change rescoring validation tests (except `no_nulls_in_rescored_results`) pass fully. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-83	2026-01-22 15:38:44 +01:00
Szymon Malewski	57e7a4fa4f	select_statement: Modify `needs_post_query_ordering` condition Our plan for rescoring is to use the existing post-query ordering mechanism to sort (and trim) result_set by similarity column. For general SELECT case this ordering is permitted only for queries with IN on the partition key and an ORDER BY, which is checked in `needs_post_query_ordering`. Recently this check was overridden for ANN queries in https://github.com/scylladb/scylladb/pull/28109 to enable IN queries handled by VS without excessive post-processing. In this patch we revert that change - ANN case will be handled by general check. However we change the condition - we will enable post processing anytime `_ordering_comparator` is set. In current implementation `_ordering_comparator` is created only in `select_statement::prepare` with `get_ordering_comparator`, only for the same conditions as were checked in `needs_post_query_ordering`, so this change should be transparent for general SELECT. For ANN query it is also not set (yet), so it will not influence ANN filtering, but we confirm that this functionality still works by adding filtering test: `test/vector_search/filter_test.cc::vector_store_client_test_filtering_ann_cql`. Rescoring ordering for ANN queries will be enabled when we add `_ordering_comparator` in following patch.	2026-01-22 15:38:44 +01:00
Nadav Har'El	5c1e525618	Merge 'vector_search: cache restrictions JSON at prepare time ' from Dawid Pawlik Add `prepared_filter` class which handles the preparation, construction, and caching of Vector Search filtering compatible JSON object. If no bind markers found in primary key restrictions, the JSON object will be built once at prepare time and cached for use during execution calls. Additionally, this patch moves the filter functions from `cql3::restrictions` to `vector_search` namespace and does some renaming to make the purpose of those functions clear. Follow-up: https://github.com/scylladb/scylladb/pull/28109 Fixes: [SCYLLADB-299](https://scylladb.atlassian.net/browse/SCYLLADB-299) [SCYLLADB-299]: https://scylladb.atlassian.net/browse/SCYLLADB-299?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#28276 * github.com:scylladb/scylladb: test/vector_search: add filter tests with bind variables vector_search: cache restrictions JSON at prepare time refactor: vector_search: move filter logic to vector_search namespace	2026-01-21 19:03:35 +02:00
Szymon Malewski	d6226500f6	vector_search: Add more rescoring validation tests Adding tests for specific cases of rescoring processing: - wildcard selection - "SELECT * ..." is a case with slightly different path of rescoring processing. We want to confirm that it is handled correctly. - calculating similarity with other vectors in SELECT clause should not influence ANN ordering. - NULL handling - results that for any reason have NULL in a score should be filtered out. As rescoring is not implemented yet, the tests use boost::unit_test::expected_failures to indicate that the test reports errors.	2026-01-20 21:01:45 +01:00
Karol Nowacki	376c70be75	vector_search: Add rescoring validation test Verify that vector store results will be correctly rescored and reordered according to the rescoring algorithm. As rescoring is not implemented yet, the tests use `boost::unit_test::expected_failures` to indicate that they report errors. First test checks rescoring with a simple selection list. Second makes sure that rescoring is not triggered for quantization=f32 - full representation of vectors. Third repeats the first one, but adds to it returning of similarity score value.	2026-01-20 21:01:45 +01:00
Karol Nowacki	b268eda67e	vector_search: test: Add `rescoring` index option test Add tests to validate `rescoring` index options. It also improves tests for related `oversampling` option validation.	2026-01-20 21:01:45 +01:00
Dawid Pawlik	f27ef79d0d	test/vector_search: add filter tests with bind variables Add tests that check that preparation of the filter works and does not produce a cache when the restrictions consist of bind variables.	2026-01-20 18:17:46 +01:00
Dawid Pawlik	e62cb29b7d	vector_search: cache restrictions JSON at prepare time Add `prepared_filter` class which handles the preparation, construction and caching of Vector Search filtering compatible JSON object. If no bind markers found in SELECT statement, the JSON object will be built once at prepare time and cached for use during execution calls. Adjust tests accordingly to use prepared filters. Follow-up: #28109 Fixes: SCYLLADB-299	2026-01-20 17:15:52 +01:00
Dawid Pawlik	f54a4010c0	refactor: vector_search: move filter logic to vector_search namespace Move Vector Search filter functions from `cql3::restrictions` to `vector_search` namespace as it's a better place according to its purpose. The effective name has now changed from `cql3::restrictions::to_json` to `vector_search::to_json` which clearly mentions that the JSON object will be used for Vector Search. Rename the auxiliary functions to use `to_json` suffix instead of variety of verbs as those functions' logic focuses on building JSON object from different structures. The previous naming emphasized too much distinction between those functions, while they do pretty much the same thing. Follow-up: #28109	2026-01-20 13:13:43 +01:00
Karol Nowacki	bca17290f4	vector_search: test: Test oversampling Add test to verify that Scylla correctly oversamples the limit according to the oversampling option.	2026-01-19 10:28:46 +01:00
Karol Nowacki	e347f6d0d4	vector_search: test: Add rescoring index options test Add tests to validate quantization and oversampling index options.	2026-01-19 10:28:44 +01:00
Karol Nowacki	24b037e8e3	vector_search: test: Extract Configure utility to shared header Move Configure test utility to dedicated file for reuse across test suites.	2026-01-19 10:21:44 +01:00
Nadav Har'El	34d28475d9	Merge 'Implement Vector Search filtering API' from Dawid Pawlik Since Vector Store service filtering API has been implemented (scylladb/vector-store#334), there is a need for the implementation of Scylla side part. This patch should implement a `statement_restrictions` parsing into Vector Store filtering API compatible JSON objects. Those objects should be added to ANN query vector POST requests as `filter` object. After this patch, the subset of all operations ([Vector Search Filtering Milestone 1](https://scylladb.atlassian.net/wiki/spaces/RND/pages/156729450/Vector+Search+Filtering+Design+Document#Milestone-1)) happy path should be completed, allowing users to filter on primary key columns with single column `=` and `IN` or multiple column `()=()` and `() IN ()`. The restrictions for other operations should be implemented in a PR on Vector Store service side. --- This PR implements parsing the `statement_restrictions` into Vector Store filtering API compatible JSON objects. The JSON objects are created and used in ANN vector queries with filtering. It closes the Scylla side implementation of Vector Search filtering milestone 1. Unit tests for `statement_restrictions` parsing are added. Integration tests will be added on Vector Store service side PR. --- Fixes: SCYLLADB-249 New feature, should land into 2026.1 Closes scylladb/scylladb#28109 * github.com:scylladb/scylladb: docs: update documentation on filtering with vector queries test/vector_search: add test for filtered ANN with VS mock test/vector_search: add restriction to JSON conversion unit tests vector_search: cql: construct and use filter in ANN vector queries select_statement: do not require post query ordering for vector queries vector_search: add `statement_restrictions` to JSON parsing	2026-01-18 16:11:29 +02:00
Dawid Pawlik	67d3454d2b	test/vector_search: add test for filtered ANN with VS mock Implement a test using Vector Store mock to check if end-to-end integration works with filtered ANN query.	2026-01-16 11:18:23 +01:00
Dawid Pawlik	a54be82536	test/vector_search: add restriction to JSON conversion unit tests Add unit tests for conversion of CQL restrictions to Vector Store filtering API compatible JSON objects. The tests include: - empty restriction - `ALLOW FILTERING` in restriction - single column restrictions - `=`, `<`, `>`, `<=`, `>=`, `IN` - multiple column restrictions - `()=()`, `()<()`, `()>()`, `()<=()`, `()>=()`, `() IN ()` - multiple restrictions conjunction - `TEXT` and `BOOLEAN` column restrictions	2026-01-16 11:18:23 +01:00
Dawid Pawlik	2a38794b8e	vector_search: cql: construct and use filter in ANN vector queries Add `filter` option in `ann()` function to write the filter JSON object as the POST request in ANN vector queries. Adjust existing `vector_store_client_test` tests accordingly.	2026-01-16 11:18:23 +01:00
Avi Kivity	c6dfae5661	treewide: #include Seastar headers with angle brackets Seastar is an external library from the point of view of ScyllaDB, so should be included with angle brackets. Closes scylladb/scylladb#27947	2026-01-13 14:56:15 +02:00
Karol Nowacki	addac8b3f7	vector_search: test: Fix flaky DNS resolution test The `vector_store_client_test_dns_resolving_repeated` test had race conditions causing it to be flaky. Two main issues were identified: 1. Race between initial refresh and manual trigger: The test assumes a specific resolution sequence, but timing variations between the initial DNS refresh (on client creation) and the first manual trigger (in the test loop) can cause unexpected delayed scheduling. 2. Extra triggers from resolve_hostname fiber: During the client refresh phase, the background DNS fiber clears the client list. If resolve_hostname executes in the window after clearing but before the update completes, pending triggers are processed, incrementing the resolution count unexpectedly. At count 6, the mock resolver returns a valid address (count % 3 == 0), causing the test to fail. The fix relaxes test assertions to verify retry behavior and client clearing on DNS address loss, rather than enforcing exact resolution counts. Fixes: #27074 Closes scylladb/scylladb#27685	2025-12-21 20:02:16 +02:00

81 Commits