Problem
-------
Secondary indexes are implemented via materialized views under the
hood. The way an index behaves is determined by the configuration
of the view. Currently, it can be modified by performing the CQL
statement `ALTER MATERIALIZED VIEW` on it. However, that raises some
concerns.
Consider, for instance, the following scenario:
1. The user creates a secondary index on a table.
2. In parallel, the user performs writes to the base table.
3. The user modifies the underlying materialized view, e.g. by setting
the `synchronous_updates` to `true` [1].
Some of the writes that happened before step 3 used the default value
of the property (which is `false`). That had an actual consequence
on what happened later on: the view updates were performed
asynchronously. Only after step 3 had finished did it change.
Unfortunately, as of now, there is no way to avoid a situation like
that. Whenever the user wants to configure a secondary index they're
creating, they need to do it in another schema change. Since it's
not always possible to control how the database is manipulated in
the meantime, it leads to problems like the one described.
That's not all, though. The fact that it's not possible to configure
secondary indexes is inconsistent with other schema entities. When
it comes to tables or materialized views, the user always have a means
to set some or even all of the properties during their creation.
Solution
--------
The solution to this problem is extending the `CREATE INDEX` CQL
statement by view properties. The syntax is of form:
```
> CREATE INDEX <index name>
> .. ON <keyspace>.<table> (<columns>)
> .. WITH <properties>
```
where `<properties>` corresponds to both index-specific and view
properties [2, 3]. View properties can only be used with indexes
implemented with materialized views; for example, it will be impossible
to create a vector index when specifying any view property (see
examples below).
When a view property is provided, it will be applied when creating the
underlying materialized view. The behavior should be similar to how
other CQL statements responsible for creating schema entities work.
High-level implementation strategy
----------------------------------
1. Make auxiliary changes.
2. Introduce data structures representing the new set of index
properties: both index-specific and those corresponding to the
underlying view.
3. Extend `CREATE INDEX` to accept view properties.
4. Extend `DESCRIBE INDEX` and other `DESCRIBE` statements to include
view properties in their output.
User documentation is also updated at the steps to reflect the
corresponding changes.
Implementation considerations
-----------------------------
There are a number of schema properties that are now obsolete. They're
accepted by other CQL statements, but they have no effect. They
include:
* `index_interval`
* `replicate_on_write`
* `populate_io_cache_on_flush`
* `read_repair_chance`
* `dclocal_read_repair_chance`
If the user tries to create a secondary index specifying any of those
keywords, the statement will fail with an appropriate error (see
examples below).
Unlike materialized views, we forbid specifying the clustering order
when creating a secondary index [4]. This limitation may be lifted
later on, but it's a detail that may or may not prove troublesome. It's
better to postpone covering it to when we have a better perspective on
the consequences it would bring.
Examples
--------
Good examples
```
> CREATE INDEX idx ON ks.t (v);
> CREATE INDEX idx ON ks.t (v) WITH comment = 'ok view property';
> CREATE INDEX idx ON ks.t (v)
.. WITH comment = 'multiple view properties are ok'
.. AND synchronous_updates = true;
> CREATE INDEX idx ON ks.t (v)
.. WITH comment = 'default value ok'
.. AND synchronous_updates = false;
```
Bad examples
```
> CREATE INDEX idx ON ks.t (v) WITH replicate_on_write = true;
SyntaxException: Unknown property 'replicate_on_write'
> CREATE INDEX idx ON ks.t (v)
.. WITH OPTIONS = {'option1': 'value1'}
.. AND comment = 'some text';
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Cannot specify options for a non-CUSTOM index"
> CREATE CUSTOM INDEX idx ON ks.t (v)
.. WITH OPTIONS = {'option1': 'value1'}
.. AND comment = 'some text';
InvalidRequest: Error from server: code=2200 [Invalid query]
message="CUSTOM index requires specifying the index class"
> CREATE CUSTOM INDEX idx ON ks.t (v)
.. USING 'vector_index'
.. WITH OPTIONS = {'option1': 'value1'}
.. AND comment = 'some text';
InvalidRequest: Error from server: code=2200 [Invalid query]
message="You cannot use view properties with a vector index"
> CREATE INDEX idx ON ks.t (v) WITH CLUSTERING ORDER BY (v ASC);
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Indexes do not allow for specifying the clustering order"
```
and so on. For more examples, see the relevant tests.
References:
[1] https://docs.scylladb.com/manual/branch-2025.4/cql/cql-extensions.html#synchronous-materialized-views
[2] https://docs.scylladb.com/manual/branch-2025.4/cql/secondary-indexes.html#create-index
[3] https://docs.scylladb.com/manual/branch-2025.4/cql/mv.html#mv-options
[4] https://docs.scylladb.com/manual/branch-2025.4/cql/dml/select.html#ordering-clauseFixesscylladb/scylladb#16454
Backport: not needed. This is an enhancement.
Closesscylladb/scylladb#24977
* github.com:scylladb/scylladb:
cql3: Extend DESC INDEX by view properties
cql3: Forbid using CLUSTERING ORDER BY when creating index
cql3: Extend CREATE INDEX by MV properties
cql3/statements/create_index_statement: Allow for view options
cql3/statements/create_index_statement: Rename member
cql3/statements/index_prop_defs: Re-introduce index_prop_defs
cql3/statements/property_definitions: Add extract_property()
cql3/statements/index_prop_defs.cc: Add namespace
cql3/statements/index_prop_defs.hh: Rename type
cql3/statements/view_prop_defs.cc: Move validation logic into file
cql3/statements: Introduce view_prop_defs.{hh,cc}
cql3/statements/create_view_statement.cc: Move validation of ID
schema/schema.hh: Do not include index_prop_defs.hh
Allow creating materialized views and secondary indexes in a tablets keyspace only if it's RF-rack-valid, and enforce RF-rack-validity while the keyspace has views by restricting some operations:
* Altering a keyspace's RF if it would make the keyspace RF-rack-invalid
* Adding a node in a new rack
* Removing / Decommissioning the last node in a rack
Previously the config option `rf_rack_valid_keyspaces` was required for creating views. We now remove this restriction - it's not needed because we always maintain RF-rack-validity for keyspaces with views.
The restrictions are relevant only for keyspaces with numerical RF. Keyspace with rack-list-based RF are always RF-rack-valid.
Fixesscylladb/scylladb#23345
Fixes https://github.com/scylladb/scylladb/issues/26820
backport to relevant versions for materialized views with tablets since it depends on rf-rack validity
Closesscylladb/scylladb#26354
* github.com:scylladb/scylladb:
docs: update RF-rack restrictions
cql3: don't apply RF-rack restrictions on vector indexes
cql3: add warning when creating mv/index with tablets about rf-rack
service/tablet_allocator: always allow tablet merge of tables with views
locator: extend rf-rack validation for rack lists
test: test rf-rack validity when creating keyspace during node ops
locator: fix rf-rack validation during node join/remove
test: test topology restrictions for views with tablets
test: add test_topology_ops_with_rf_rack_valid
topology coordinator: restrict node join/remove to preserve RF-rack validity
topology coordinator: add validation to node remove
locator: extend rf-rack validation functions
view: change validate_view_keyspace to allow MVs if RF=Racks
db: enforce rf-rack-validity for keyspaces with views
replica/db: add enforce_rf_rack_validity_for_keyspace helper
db: remove enforce parameter from check_rf_rack_validity
test: adjust test to not break rf-rack validity
It should be possible to return the similarity of vectors in CQL statements following the [Cassandra compatible syntax](https://cassandra.apache.org/doc/latest/cassandra/getting-started/vector-search-quickstart.html#query-vector-data-with-cql):
```
SELECT comment, similarity_cosine(comment_vector, [0.1, 0.15, 0.3, 0.12, 0.05])
FROM cycling.comments_vs;
```
Although the calculations are slow, and we already have calculated results returned via Vector Store API,
we need the functionality as it allows us to calculate similarity of vectors not stored in vector indexes.
It will be needed for [quantization and rescoring](https://scylladb.atlassian.net/wiki/spaces/RND/pages/195985800/Quantization+and+Rescoring).
The feature is also a nice-to-have in testing as requested many times by testing and CX teams.
The optimized version utilizing already calculated distances from Vector Store without a need of rescoring will be coming soon after via https://github.com/scylladb/scylladb/pull/27991.
---
The patch adds functions:
- `similarity_cosine(<vector>, <vector>)`,
- `similarity_euclidean(<vector>, <vector>)`,
- `similarity_dot_product(<vector>, <vector>)`
Where `<vector>` is either a column of type `VECTOR<FLOAT, N>` or a vector of floats literal.
These functions can be called with every `SELECT` query, not only ANN vector queries as opposed to https://github.com/scylladb/scylladb/pull/25993.
The similarity calculations are implemented inspired by [USearch's implementation](
a2f1759910/include/usearch/index_plugins.hpp (L1304-L1385)) and made compatible with [Cassandra's documentation](https://cassandra.apache.org/doc/5.0/cassandra/developing/cql/functions.html#vector-similarity-functions).
That would guarantee the results in ScyllaDB are calculated using the exact same algorithms as used in Vector Store indexes.
---
Fixes: SCYLLADB-88
Fixes: SCYLLADB-89
New feature, should land into 2026.1
Closesscylladb/scylladb#27524
* github.com:scylladb/scylladb:
docs: add vector similarity functions documentation
test/cqlpy: add similarity functions correctness tests
test/cqlpy: add similarity functions invalid call tests
cql3: introduce similarity functions syntax
vector_similarity_fcts: introduce similarity functions
vector_similarity_fcts: retrieve similarity function argument types
vector_similarity_fcts: add calculating similarity between vectors
Add documentation in `functions.rst` as the CQL reference
for a vector similarity functions.
This includes the syntax, example usage, and prerequisites
for the parameters.
The doc about DDL statements claims that an `ALTER KEYSPACE` will fail
in the presence of an ongoing global topology operation.
This limitation was specifically referring to RF changes, which Scylla
implements as global topology requests (`keyspace_rf_change`), and it
was true when it was first introduced (1b913dd880) because there was
no global topology request queue at that time, so only one ongoing
global request was allowed in the cluster.
This limitation was lifted with the introduction of the global topology
request queue (6489308ebc), and it was re-introduced again very
recently (2e7ba1f8ce) in a slightly different form; it now applies only
to RF changes (not to any request type) and only those that affect the
same keyspace. None of these two changes were ever reflected in the doc.
Synchronize the doc with the current state.
Fixes#27776.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Closesscylladb/scylladb#27786
Update the documentation about restrictions to tablets keyspaces related
to RF-rack.
* MV/SI require the keyspace to be RF-rack-valid
* topology operations are restricted if a keyspace has views to preserve
RF-rack-validity
Currently some things are not supported for colocated tables: it's not
possible to repair a colocated table, and due to this it's also not
possible to use the tombstone_gc=repair mode on a colocated table.
Extend the documentation to explain what colocated tables are and
document these restrictions.
Fixesscylladb/scylladb#27261Closesscylladb/scylladb#27516
This is a temporary solution as handling this property may require
a bit more attention or at least a bit more focus. For now, let's
forbid using it so it's clear it won't get applied. A simple test
is provided to cover it.
We document the restriction.
After the previous patch that extended the grammar and provided
basic functionalities to accommodate properties of materialized views
in indexes, this commit takes another step and actually applies them
to the underlying view when it's being created.
We're providing validation tests for each property, with the single
exception of CLUSTERING ORDER BY. That one will be handled separately
in an upcoming commit.
We also update the user documentation.
The PRUNE MATERALIZED VIEW statement is performed as follows:
1. Perform a range scan of the view table from the view replicas based
on the ranges specified in the statement.
2. While reading the paged scan above, for each view row perform a read
from all base replicas at the corresponding primary key. If a discrepancy
is detected, delete the row in the view table.
When reading multiple rows, this is very slow because for each view row
we need to performe a single row query on multiple replicas.
In this patch we add an option to speed this up by performing many of the
single base row reads concurrently, at the concurrency specified in the
USING CONCURRENCY clause.
Aside from the unit test, I checked manually on a 3-node cluster with 10M rows, using vnodes. There were actually no ghost rows in the test, but we still had to iterate over all view rows and read the corresponding base rows. And actual ghost rows, if there are any, should be a tiny fraction of all rows. I compared concurrencies 1,2,10,100 and the results were:
* Pruning with concurrency 1 took total 1416 seconds
* Pruning with concurrency 2 took total 731 seconds
* Pruning with concurrency 10 took total 234 seconds
* Pruning with concurrency 100 took total 171 seconds
So after a concurrency of 10 or so we're hitting diminishing returns (at least in this setup). At that point we may be no longer bottlenecked by the reads, but by CPU on the shard that's handling the PRUNE
Fixes https://github.com/scylladb/scylladb/issues/27070Closesscylladb/scylladb#27097
* github.com:scylladb/scylladb:
mv: allow setting concurrency in PRUNE MATERIALIZED VIEW
cql: add CONCURRENCY to the USING clause
The PRUNE MATERALIZED VIEW statement is performed as follows:
1. Perform a range scan of the view table from the view replicas based
on the ranges specified in the statement.
2. While reading the paged scan above, for each view row perform a read
from all base replicas at the corresponding primary key. If a discrepancy
is detected, delete the row in the view table.
When reading multiple rows, this is very slow because for each view row
we need to performe a single row query on multiple replicas.
In this patch we add an option to speed this up by performing many of the
single base row reads concurrently, at the concurrency specified in the
USING CONCURRENCY clause.
Fixes https://github.com/scylladb/scylladb/issues/27070
This commit fixes the information about object storage:
- Object storage configuration is no longer marked as experimental.
- Redundant information has been removed from the description.
- Information related to object storage for SStabels has been removed
as the feature is not working.
Fixes https://github.com/scylladb/scylladb/issues/26985Closesscylladb/scylladb#26987
This patch adds the missing warning about the lack of possibility
to return the similarity distance. This will be added in the next
iteration.
Fixes#27086
It has to be backported to 2025.4 as this is the limitation in 2025.4.
Closesscylladb/scylladb#27096
This patch series re-enables support for speculative retry values `0` and `100`. These values have been supported some time ago, before [schema: fix issue 21825: add validation for PERCENTILE values in speculative_retry configuration. #21879
](https://github.com/scylladb/scylladb/pull/21879). When that PR prevented using invalid `101PERCENTILE` values, valid `100PERCENTILE` and `0PERCENTILE` value were prevented too.
Reproduction steps from [[Bug]: drop schema and all tables after apply speculative_retry = '99.99PERCENTILE' #26369](https://github.com/scylladb/scylladb/issues/26369) are unable to reproduce the issue after the fix. A test is added to make sure the inclusive border values `0` and `100` are supported.
Documentation is updated to give more information to the users. It now states that these border values are inclusive, and also that the precision, with automatic rounding, is 1 decimal digit.
Fixes#26369
This is a bug fix. If at any time a client tries to use value >= 99.5 and < 100, the raft error will happen. Backport is needed. The code which introduced inconsistency is introduced in 2025.2, so no backporting to 2025.1.
Closesscylladb/scylladb#26909
* github.com:scylladb/scylladb:
test: cqlpy: add test case for non-numeric PERCENTILE value
schema: speculative_retry: update exception type for sstring ops
docs: cql: ddl.rst: update speculative-retry-options
test: cqlpy: add test for valid speculative_retry values
schema: speculative_retry: allow 0 and 100 PERCENTILE values
Clarify how the value of `XPERCENTILE` is handled:
- Values 0 and 100 are supported
- The percentile value is rounded to the nearest 0.1 (1 decimal place)
Refs #26369
Auto-exands numeric RF in CREATE/ALTER KEYSPACE statements for
new DCs specified in the statement.
Doesn't auto-expand existing options, as the rack choice may not be in
line with current replica placement. This requires co-locating tablet
replicas, and tracking of co-location state, which is not implemented yet.
Signed-off-by: Tomasz Grabiec <tgrabiec@scylladb.com>
We want to add strongly consistent tables as an option. We will have
two kind of strongly consistent tables: globally consistent and locally
consistent. The former means that requests from all DCs will be globally
linearisable while the later - only requests to the same DCs will be
linearisable. To allow configuring all the possibilities the patch
adds new parameter to a keyspace definition "consistency" that can be
configured to be `eventual`, `global` or `local`. Non eventual setting
is supported for tablets enabled keyspaces only. Since we want to start
with implementing local consistency configuring global consistency will
result in an error for now.
This patch adds the missing documentation for the SELECT ... ANN
statement that allows performing vector queries. This is just the
basic explanation of the grammar and how to use it. More
comprehensive documentation about vector search will be added
separately in Scylla Cloud documentation and features description.
Links to this additional documentation will be added as part of
VECTOR-244.
Fixes: VECTOR-247.
This patch adds CQL documentation about creating vector search
indexes. It includes the syntax and description of parameters.
It does not cover VECTOR type that is already supported and
documented and it does not cover querying vectors which will be
covered by a separate PR.
Fixes: VECTOR-217
Closesscylladb/scylladb#26233
This patch updates the create keyspace statement docs. It explains how
the `replication` option in the create keyspace statement is now optional,
and behaves the same as if we specified an empty set as following:
`WITH replication = {}`.
An example with no `replication` option specified has also been added.
Refs #25145
Update create-keyspace-statement section of ddl.rst since replication factor is no longer mandatory.
Add an example for keyspace creation without specifying replication factor.
Add an example for keyspace creation without specifying both `class` and replication factor.
Refs: #16028
Update create-keyspace-statement section of ddl.rst since `class` is no longer mandatory.
Add an example for keyspace creation without specifying `class`.
Refs: #16029
ScyllaDB supports non-frozen UDTs since 3.2, no need to keep referencing
this limitation in the current docs. Replace the description of the
limitation with general description of frozen semantics for UDTs.
Fixes: #22929Closesscylladb/scylladb#24763
This commit removes the Non-Reserved CQL Keywords and Reserved CQL Keywords pages-keyword
as that content is already covered on the Appendices page.
Redirections are added to avoid 404s for the removed pages.
In addition, the Appendices page title is extended with "Reserved CQL Keywords and Types"
to help users understand what those appendices are about.
Fixes https://github.com/scylladb/scylladb/issues/24319Closesscylladb/scylladb#24320
This commit increases the maximum length of names for keyspaces, tables, materialized views, and indexes from 48 to 192 bytes.
The previous 48-bytes limit was inherited from Cassandra 3 for compatibility. However, this validation was removed in Cassandra 4 and 5 (see CASSANDRA-20389)
and some usage scenarios (such as some feature store workflows generating long table names) now depend on this relaxed constraint.
This change brings ScyllaDB's behavior in line with modern Cassandra versions and better supports these use cases.
The new limit of 192 bytes is derived from underlying filesystem limitations to prevent runtime errors when creating directories for table data.
When a new table is created, ScyllaDB generates a directory for its SSTables. The directory name is constructed from the table name, a dash, and a 32-character UUID.
For a CDC-enabled table, an associated log table is also created, which has the suffix `_scylla_cdc_log` appended to its name.
The directory name for this log table becomes the longest possible representation.
Additionally we reserve 15 bytes for future use, allowing for potential future extensions without breaking existing schemas.
To guarantee that directory creation never fails due to exceeding filesystem name limits, the maximum name length is calculated as follows:
255 bytes (common filesystem limit for a path component)
- 32 bytes (for the 32-character UUID string)
- 1 byte (for the '-' separator)
- 15 bytes (for the '_scylla_cdc_log' suffix)
- 15 bytes (reserved for future use)
----------
= 192 bytes (Maximum allowed name length)
This calculation is similar in principle to the one proposed for Cassandra to fix related directory creation failures (see apache/cassandra/pull/4038).
This patch also updates/adds all associated tests to validate the new 192-byte limit.
The documentation has been updated accordingly.
There is a difference how ScyllaDB and Cassandra handle conditional
batches with different IF statements (such as "IF EXISTS" and "IF NOT
EXISTS").
This commit explicitly documents the differences in the behavior.
Refs: #13011
Current protocol extension that sends tablet info to drivers only does
that if the driver selects a non-replica coordinator for a routable
request. It works well if some node on the replica list is replaced by
other node, or if some replicas are removed from the list. Driver will
at some point send a request to stale replica, and receive new list in
response.
The issue is with extending the list with new replicas. In that case old
replicas are all still correct, so driver will not select any wrong
replica, and will not receive the new list. As far as I know that only
scenario where this could happen is RF increase.
It could be to some degree worked around in the drivers, but it would
add significant complexity (definitely more than any other invalidations
we introduced) while still not being ideal solution. This scenario
should be rare enough, and the consequences of not handling it minor
enough (new replicas not being used as coordinators) that it does not
warrant driver-side solution. Instead this commit adds info about this
to documentation, advising users to restart applications after replica
lists are extended.
It is worth noting that if new tablet feedback protocol extension is
implemented then this problem goes away. See issue #21664.
Closesscylladb/scylladb#23447
We introduce a new term in the glossary: RF-rack-valid keyspace.
We also highlight in our user documentation that all keyspaces
must remain RF-rack-valid throughout their lifetime, and failing
to guarantee that may result in data inconsistencies or other
issues. We base that information on our experience with materialized
views in keyspaces using tablets, even though they remain
an experimental feature.
Along with the new term, we introduce a new configuration option
called `rf_rack_valid_keyspaces`, which, when enabled, will enforce
preserving all keyspaces RF-rack-valid. That functionality will be
implemented in upcoming commits. For now, we materialize the
restriction in form of a named requirement: a function verifying
that the passed keyspace is RF-rack-valid.
The option is disabled by default. That will change once we adjust
the existing tests to the new semantics. Once that is done, the option
will first be enabled by default, and then it will be removed.
Fixesscylladb/scylladb#20356
Before this patch we silently allowed and ignored PER PARTITION LIMIT.
While using aggregate functions in conjunction with PER PARTITION LIMIT
can make sense, we want to disable it until we can offer proper
implementation, see #9879 for discussion.
We want to match Cassandra, and for queries with aggregate functions it
behaves as follows:
- it silently ignores PER PARTITION LIMIT if GROUP BY is present, which
matches our previous implementation.
- rejects PER PARTITION LIMIT when GROUP BY is *not* present.
This patch adds rejection of the second group.
Fixes#9879Closesscylladb/scylladb#23086
This commit adds a link to the Limitations section on the Tablets page
to the CQL pag, the tablets option.
This is actually the place where the user will need the information:
when creating a keyspace.
In addition, I've reorganized the section for better readability
(otherwise, the section about limitations was easy to miss)
and moved the section up on the page.
Note that I've removed the updated content from the `_common` folder
(which I deleted) to the .rst page - we no longer split OSS and Enterprise,
so there's no need to keep using the `scylladb_include_flag` directive
to include OSS- and Ent-specific content.
Fixes https://github.com/scylladb/scylladb/issues/22892
Fixes https://github.com/scylladb/scylladb/issues/22940Closesscylladb/scylladb#22939
This series extends the table schema with per-table tablet options.
The options are used as hints for initial tablet allocation on table creation and later for resize (split or merge) decisions,
when the table size changes.
* New feature, no backport required
Closesscylladb/scylladb#22090
* github.com:scylladb/scylladb:
tablets: resize_decision: get rid of initial_decision
tablet_allocator: consider tablet options for resize decision
tablet_allocator: load_balancer: table_size_desc: keep target_tablet_size as member
network_topology_strategy: allocate_tablets_for_new_table: consider tablet options
network_topology_strategy: calculate_initial_tablets_from_topology: precalculate shards per dc using for_each_token_owner
network_topology_strategy: calculate_initial_tablets_from_topology: set default rf to 0
cql3: data_dictionary: format keyspace_metadata: print "enabled":true when initial_tablets=0
cql3/create_keyspace_statement: add deprecation warning for initial tablets
test: cqlpy: test_tablets: add tests for per-table tablet options
schema: add per-table tablet options
feature_service: add TABLET_OPTIONS cluster schema feature
This pull request is an implementation of vector data type similar to one used by Apache Cassandra.
The patch contains:
- implementation of vector_type_impl class
- necessary functionalities similar to other data types
- support for serialization and deserialization of vectors
- support for Lua and JSON format
- valid CQL syntax for `vector<>` type
- `type_parser` support for vectors
- expression adjustments such as:
- add `collection_constructor::style_type::vector`
- rename `collection_constructor::style_type::list` to `collection_constructor::style_type::list_or_vector`
- vector type encoding (for drivers)
- unit tests
- cassandra compatibility tests
- necessary documentation
Co-authored-by: @janpiotrlakomy
Fixes https://github.com/scylladb/scylladb/issues/19455Closesscylladb/scylladb#22488
* github.com:scylladb/scylladb:
docs: add vector type documentation
cassandra_tests: translate tests covering the vector type
type_codec: add vector type encoding
boost/expr_test: add vector expression tests
expression: adjust collection constructor list style
expression: add vector style type
test/boost: add vector type cql_env boost tests
test/boost: add vector type_parser tests
type_parser: support vector type
cql3: add vector type syntax
types: implement vector_type_impl
Use the keyspace initial_tablets for min_tablet_count, if the latter
isn't set, then take the maximum of the option-based tablet counts:
- min_tablet_count
- and expected_data_size_in_gb / target_tablet_size
- min_per_shard_tablet_count (via
calculate_initial_tablets_from_topology)
If none of the hints produce a positive tablet_count,
fall back to calculate_initial_tablets_from_topology * initial_scale.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Per-table hints should be used instead.
Note: the warning is produced by check_against_restricted_replication_strategies
which is called also from alter_keyspace_statement.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Unlike with vnodes, each tablet is served only by a single
shard, and it is associated with a memtable that, when
flushed, it creates sstables which token-range is confined
to the tablet owning them.
On one hand, this allows for far better agility and elasticity
since migration of tablets between nodes or shards does not
require rewriting most if not all of the sstables, as required
with vnodes (at the cleanup phase).
Having too few tablets might limit performance due not
being served by all shards or by imbalance between shards
caused by quantization. The number of tabelts per table has to be
a power of 2 with the current design, and when divided by the
number of shards, some shards will serve N tablets, while others
may serve N+1, and when N is small N+1/N may be significantly
larger than 1. For example, with N=1, some shards will serve
2 tablet replicas and some will serve only 1, causing an imbalance
of 100%.
Now, simply allocating a lot more tablets for each table may
theoretically address this problem, but practically:
a. Each tablet has memory overhead and having too many tablets
in the system with many tables and many tablets for each of them
may overwhelm the system's and cause out-of-memory errors.
b. Too-small tablets cause a proliferation of small sstables
that are less efficient to acces, have higher metadata overhead
(due to per-sstable overhead), and might exhaust the system's
open file-descriptors limitations.
The options introduced in this change can help the user tune
the system in two ways:
1. Sizing the table to prevent unnecessary tablet splits
and migrations. This can be done when the table is created,
or later on, using ALTER TABLE.
2. Controlling min_per_shard_tablet_count to improve
tablet balancing, for hot tables.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Add missing vector type documentation including: definition of vector,
adjustment of term definition, JSON encoding, Lua and cql3 type
mapping, vector dimension limit, and keyword specification.
Add a paragraph documenting the decision to deprecate
the COMPACT STORAGE feature, and instruct the user
how to enable the feature despite that.
Note that we don't have an official migration strategy
for users like `DROP COMPACT STORAGE`, which is not
implemented at this time (See #3882).
Fixes#16375
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
As part of #18750, we added a CQL statement CREATE ROLE WITH SALTED HASH that prevented hashing a password when creating a role, effectively leading to inserting a hash given by the user directly into the database. In #21350, we noticed that Cassandra had implemented a CQL statement of similar semantics but different syntax. We decided to rename Scylla's statement to be compatible with Cassandra. Unfortunately, we didn't notice one more difference between what we had in Scylla and what was part of Cassandra.
Scylla's statement was originally supposed to only be used when restoring the schema and the user needn't have to be aware of its existence at all: the database produced a sequence of CQL statements that the user saved to a file and when a need to restore the schema arose, they would execute the contents of the file. That's why that although we documented the feature, it was only done in the necessary places. Those that weren't related to the backup & restore procedure were deliberately skipped.
Cassandra, on the other hand, added the statement for a different purpose (for details, see the relevant issue) and it was supposed to be used by the user by design. The statement is also documented as such.
Since we want to preserve compatibility with Cassandra, we document the statement and its semantics in the user documentation, explicitly implying that it can be used by the user.
We also add a test verifying that logging in works correctly.
Fixesscylladb/scylladb#21691
Backport: not needed. The relevant code didn't make it to 6.2 or any previous version of OSS.
Closesscylladb/scylladb#21752
* github.com:scylladb/scylladb:
docs: Update documentation on CREATE ROLE WITH HASHED PASSWORD
test/boost: Add test for creating roles with hashed passwords
remove the "ScyllaDB Enterprise" labels in document. because
there is no need to differentiate ScyllaDB Enterprise from its OSS
variant, let's stop adding the "ScyllaDB Enterprise" labels to
enterprise-only features. this helps to reduce the confusion.
as we are still in the process of porting the enterprise features
to this repo, this change does not fixscylladb/scylladb#22175.
we will review the document again when completing the migration.
we also take this opportunity to stop referencing "Enterprise" in
the changed paragraph.
Refs scylladb/scylladb#22175
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22177