Files
scylladb/index/vector_index.hh
Karol Nowacki bcd320a82a vector_search: fix SELECT on local vector index
Queries against local vector indexes were failing with the error:
"ANN ordering by vector requires the column to be indexed using 'vector_index'"

This was a regression introduced by 15788c3734, which incorrectly
assumed the first column in the targets list is always the vector column.
For local vector indexes, the first column is the partition key, causing
the failure.

Previously, serialization logic for the target index option was shared
between vector and secondary indexes. This is no longer viable due to
the introduction of local vector indexes and vector indexes with filtering
columns, which have  different target format.

This commit introduces a dedicated JSON-based serialization format for
vector index targets, identifying the target column (tc), filtering
columns (fc), and partition key columns (pk). This ensures unambiguous
serialization and deserialization for all vector index types.

This change is backward compatible for regular vector indexes. However,
it breaks compatibility for local vector indexes and vector indexes with
filtering columns created in version 2026.1.0. To mitigate this, usage
of these specific index types will be blocked in the 2026.1.0 release
by failing ANN queries against them in vector-store service.

Fixes: SCYLLADB-895
2026-04-20 12:33:13 +02:00

55 lines
2.3 KiB
C++

/*
* Copyright 2025-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
#include "schema/schema.hh"
#include "data_dictionary/data_dictionary.hh"
#include "cql3/statements/index_target.hh"
#include "index/secondary_index_manager.hh"
#include <vector>
namespace secondary_index {
class vector_index: public custom_index {
public:
// The minimal TTL for the CDC used by Vector Search.
// Required to ensure that the data is not deleted until the vector index is fully built.
static constexpr int VS_TTL_SECONDS = 86400; // 24 hours
vector_index() = default;
~vector_index() override = default;
std::optional<cql3::description> describe(const index_metadata& im, const schema& base_schema) const override;
bool view_should_exist() const override;
void validate(const schema &schema, const cql3::statements::index_specific_prop_defs &properties,
const std::vector<::shared_ptr<cql3::statements::index_target>> &targets, const gms::feature_service& fs,
const data_dictionary::database& db) const override;
table_schema_version index_version(const schema& schema) override;
static bool has_vector_index(const schema& s);
static bool has_vector_index_on_column(const schema& s, const sstring& target_name);
static bool is_vector_index_on_column(const index_metadata& im, const sstring& target_name);
static void check_cdc_options(const schema& schema);
static sstring serialize_targets(const std::vector<::shared_ptr<cql3::statements::index_target>>& targets);
static sstring get_target_column(const sstring& targets);
static bool is_rescoring_enabled(const index_options_map& properties);
static float get_oversampling(const index_options_map& properties);
static sstring get_cql_similarity_function_name(const index_options_map& properties);
private:
void check_uses_tablets(const schema& schema, const data_dictionary::database& db) const;
void check_cdc_not_explicitly_disabled(const schema& schema) const;
void check_target(const schema& schema, const std::vector<::shared_ptr<cql3::statements::index_target>>& targets) const;
void check_index_options(const cql3::statements::index_specific_prop_defs& properties) const;
};
std::unique_ptr<secondary_index::custom_index> vector_index_factory();
}