A Vector Store node is now considered down if it returns an HTTP 5xx status.
This can happen, for example, if the node fails to
connect to the database or has not completed its initial full scan.
The logic for marking a node as 'up' is also enhanced. A node is now
only considered up when its status is 'SERVING'.
Move the response_content_to_sstring utility function from
vector_store_client.cc to utils.hh to enable reuse across
multiple files.
This refactoring prepares for the upcoming `client.cc` implementation
that will also need this functionality.
Introduce dedicated unit tests for the client class to verify existing
functionality and serve as regression tests.
These tests ensure that invalid client requests do not cause nodes to
be marked as down.
The maximum backoff delay for status checking now depends on the
`read_request_timeout_in_ms` configuration option. The delay is set
to twice the value of this parameter.
This exception should only occur due to internal errors, not client or external issues.
If triggered, it indicates an internal problem. Therefore, we notify about this exception
using on_internal_error_noexcept.
Introduces logic to mark clients that fail to answer an ANN request as
"down". Down clients are omitted from further requests until they
successfully respond to a health check.
Health checks for down clients are performed in the background using the
`status` endpoint, with an exponential backoff retry policy ranging
from 100ms to 20s.
To unify error handling, the low-level client methods now return
`std::expected` instead of throwing exceptions. This allows for
consistent and explicit error propagation from the client up to the
caller.
The relevant error types have been moved to a new `vector_search/error.hh`
header to centralize their definitions.
This refactoring extracts low-level client logic into a new, dedicated
`client` class. The new class is responsible for connecting to the
server and serializing requests.
This change prepares for extending the `vector_store_client` to check
node status via the `api/v1/status` endpoint.
`/ann` Response deserialization remains in the `vector_store_client` as it
is schema-dependent.
This patch removes the dependence of vector search module
on the cql3 module by moving the contents of cql3/type_json.hh
to types/json_utils.hh and removing the usage of cql3 primary_key
object in vector_store_client. We also make the needed adjustments
to files that were previously using the afformentioned type_json.hh
file.
This fixes the circular dependency cql3 <-> vector_search.
Closesscylladb/scylladb#26482
The `vector_store_client_uri_update_to_invalid` test was flaky because
it performed real DNS lookups, making it dependent on the network
environment.
This commit replaces the live DNS queries with a mock to make the test
hermetic and prevent intermittent failures.
`vector_search_metrics_test` test did not call configure{vs},
as a consequence the test did real DNS queries, which made the test
flaky.
The refreshes counter increment has been moved before the call to the resolver.
In tests, the resolver is mocked leading to lack of increments in production code.
Without this change, there is no way to test DNS counter increments.
The change also simplifies the test making it more readable.
This commit adds a dns refresh counting metric
to the vector_store service. We would like to
track it to make sure that the networking is working
correctly.
The vector store client now supports a comma-separated list of URIs in
the `vector_store_primary_uri` configuration option.
It uses the vector store nodes from these URIs for load balancing and high
availability, querying the next node if the current one fails.
The `vector_store_client::port()` and `vector_store_client::host()` methods
were only used in the test code.
Moreover, these tests are no longer needed, as the proper parsing of the
URI is already tested in other tests that perform requests to the
vector store server mock.
This change introduces a load balancing mechanism for the vector store client.
The client can now distribute requests across multiple vector store nodes.
The distribution mechanism performs random selection of nodes for each request.
Rename `HTTP_REQUEST_RETRIES` to `ANN_RETRIES` in `vector_store_client`,
as it now applies to all vector store nodes, not just HTTP requests.
Also, remove an unused test setter function.
The DNS resolution logic now processes all IP addresses returned in a DNS
response, not just the primary one.
The client will iterate through the list of resolved IPs, attempting to
query the next one if a request fails. This improves high availability
by allowing the client to query other available nodes if one is down.
The DNS resolution logic and its background task are moved out of the
`vector_store_client` and into a new, dedicated class `vector_search::dns`.
This refactoring is the first step towards supporting DNS hostnames
that resolve to multiple IP addresses.
Signed-off-by: Karol Nowacki <karol.nowacki@scylladb.com>
Vector search related implementation moved to a new module vector_search.
As the vector search functionality is going to be extended, it is
better to keep it in a separate module.