The STORAGE option is designed to hold a map of options
used for customizing storage for given keyspace.
The option is kept in a system_schema.scylla_keyspaces table.
The option is only available if the whole cluster is aware
of it - guarded by a cluster feature.
Example of the table contents:
```
cassandra@cqlsh> select * from system_schema.scylla_keyspaces;
keyspace_name | storage_options | storage_type
---------------+------------------------------------------------+--------------
ksx | {'bucket': '/tmp/xx', 'endpoint': 'localhost'} | S3
```
Makes final function and initial condition to be optional while
creating UDA. No final function means UDA returns final state
and defeult initial condition is `null`.
Fixes: #10324
The error message incorrectly stated that the timeout value cannot
be longer than 24h, but it can - the actual restriction is that the
value cannot be expressed in units like days or months, which was done
in order to significantly simplify the parsing routines (and the fact
that timeouts counted in days are not expected to be common).
Fixes#10286Closes#10294
"
By way of having an implementation of `data_dictionary` and using that.
The schema loader only needs a database to parse cql3 statements, which
are all coordinator-side objects and hence been largely migrated to use
data dictionary instead.
A few hard-dependencies on replica:: objects were found and resolved:
* index::secondary_index_manager
* tombstone_gc
The former was migrated to use `data_dictionary::table` instead of
`replica::table`. This in turn requires disentangling
`replica::data_dictionary_impl` from `replica::database`, as currently
the former can only really be used by the latter.
What all of this achieves us is that we no longer have to instantiate a
`replica::database` object in `tools::load_schema()`. We want to use the
standard allocator in tools, which means they cannot use LSA memory at
all. Database on the other hand creates memtable and row-cache instances
so it had to go.
Refs: #9882
Tests: unit(dev, schema_loader_test:debug,
cql-pytest/test_tools.py:debug)
"
* 'tools-schema-loader-database-impl/v2' of https://github.com/denesb/scylla:
tools/schema_loader: use own data dictionary impl
tombstone_gc: switch to using data dictionary
index/secondary_index_manager: switch to using data dictionary
replica/table: add as_data_dictionary()
replica: disentangle data_dictionary_impl from database
replica: move data_dictionary_impl into own header
But only on the surface, the only internal function needing the database
(`needs_repair_before_gc()`) still gets a real database because the
replication factor cannot be obtained from the data dictionary
currently. Although this might not look like an improvement, it is
enough to avoid a `real_database()` call for tables that don't have
tombstone gc mode set to repair.
Commit 1c99ed6ced added tracing logs
about the index chosen for the query, but aggregate queries have
a separate code path, which wasn't taken into account.
After this patch, tracing for aggregate queries also includes
this additional information.
Closes#10195
It seams that batch prepared statements always return false for
depends_on_keyspace and depends_on_column_family, this in turn
renders the removal criteria from the cache to always be false
which result by the queries not being evicted.
Here we change the functions to return the true state meaning,
they will return true if any of the sub queries is dependant upon
the keyspace or column family.
In this fix we first make the API more coherent and then use this new API to implement
the batch statement's dependency test.
Fixes#10129
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Closes#10132
* github.com:scylladb/scylla:
prepared_statements: Invalidate batch statement too
cql3 statements: Change dependency test API to express better it's purpose
Recently, coordinator_result was introduced as an alternative for
exceptions. It was placed in the main "exceptions/exceptions.hh" header,
which virtually every single source file in Scylla includes.
But unfortunately, it brings in some heavy header files and templates,
leading to a lot of wasted build time - ClangBuildAnalyzer measured that
we include exceptions.hh in 323 source files, taking almost two seconds
each on average.
In this patch, we split the coordinator_result feature into a separate
header file, "exceptions/coordinator_result", and only the few places
which need it include the header file. Unfortunately, some of these
few places are themselves header, so the new header file ends up being
included in 100 source files - but 100 is still much less than 323 and
perhaps we can reduce this number 100 later.
After this patch, the total Scylla object-file size is reduced by 6.5%
(the object size is a proxy for build time, which I didn't directly
measure). ClangBuildAnalyzer reports that now each of the 323 includes
of exceptions.hh only takes 80ms, coordinator_result.hh is only included
100 times, and virtually all the cost to include it comes from Boost's
result.hh (400ms per inclusion).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220228204323.1427012-1-nyh@scylladb.com>
It seams that batch prepared statements always return false for
depends_on, this in turn renders the removal criteria from the
prepared statements cache to always be false which result by the
queries not being evicted.
Here we change the function to return the true state meaning,
they will return true if one of the sub queries is dependant
upon the keyspace and/ or column family.
Fixes#10129
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
purpose
Cql statements used to have two API functions, depends_on_keyspace and
depends_on_column_family. The former, took as a parameter only a table
name, which makes no sense. There could be multiple tables with the same
name each in a different keyspace and it doesn't make sense to
generalize the test - i.e to ask "Does a statement depend on any table
named XXX?"
In this change we unify the two calls to one - depends on that takes a
keyspace name and optionally also a table name, that way every logical
dependency tests that makes sense is supported by a single API call.
Adjusts the indexed_table_select_statement so that it uses the
result-aware methods in storage_proxy and propagates failed results as
result_message::exception.
Modifies the remaining logic of execute_without... (apart from the
do_execute call) so that the result-aware versions of storage_proxy's
methods are called and failed results are converted to
result_message::exception.
The select_statement will be able to propagate coordinator failures
without throwing, so it's important to override the default
implementations of execute and excecute_without... so that the first
calls the latter and not the other way around.
Adds:
- Includes for result-related helper methods (to be used in later
commits),
- Alias for coordinator_result,
- The wrap_result_to_error_message function - a bit similar to
utils::result_wrap. Adapts a callable T -> shared_ptr<result_message>
to take result<T> -> shared_ptr<result_message>. If the result is
failed, it converts it into result_message::exception and returns.
Modifies the modification_statement code so that is converts failed
`result<>` into a `result_message::exception` without involving the C++
exception runtime.
Modifies the batch_statement code so that is converts failed `result<>`
into a `result_message::exception` without involving the C++ exception
runtime.
Changes the interface of `mutate_with_triggers` so that it returns
`future<result<>>` instead of `future<>`. No intermediate
`mutate_with_triggers_result` method is introduced because all call
sites will be changed in this PR so that they properly handle failed
`result<>`s with exceptions-as-values.
Detect whether a statement is a count(*) query in prepare time. If so,
instantiate a new `select_statement` subclass -
`parallelized_select_statement`. This subclass has a different execution
logic, that enables it to distribute count(*) queries across a cluster.
Also, a new counter was added - `select_parallelized` that counts the
number of parallelized aggregation SELECT query executions.
Schema changes on top of Raft do not allow concurrent changes.
If two changes are attempted concurrently, one of them gets
`group0_concurrent_modification` exception.
Catch the exception in CQL DDL statement execution function and retry.
In addition, the description of CQL DDL statements in group 0 history
table was improved.
The description parameter is used for the group 0 history mutation.
The default is empty, in which case the mutation will leave
the description column as `null`.
I filled the parameter in some easy places as an example and left the
rest for a follow-up.
This is how it looks now in a fresh cluster with a single statement
performed by the user:
cqlsh> select * from system.group0_history ;
key | state_id | description
---------+--------------------------------------+------------------------------------------------------
history | 9ec29cac-7547-11ec-cfd6-77bb9e31c952 | CQL DDL statement
history | 9beb2526-7547-11ec-7b3e-3b198c757ef2 | null
history | 9be937b6-7547-11ec-3b19-97e88bd1ca6f | null
history | 9be784ca-7547-11ec-f297-f40f0073038e | null
history | 9be52e14-7547-11ec-f7c5-af15a1a2de8c | null
history | 9be335dc-7547-11ec-0b6d-f9798d005fb0 | null
history | 9be160c2-7547-11ec-e0ea-29f4272345de | null
history | 9bdf300e-7547-11ec-3d3f-e577a2e31ffd | null
history | 9bdd2ea8-7547-11ec-c25d-8e297b77380e | null
history | 9bdb925a-7547-11ec-d754-aa2cc394a22c | null
history | 9bd8d830-7547-11ec-1550-5fd155e6cd86 | null
history | 9bd36666-7547-11ec-230c-8702bc785cb9 | Add new columns to system_distributed.service_levels
history | 9bd0a156-7547-11ec-a834-85eac94fd3b8 | Create system_distributed(_everywhere) tables
history | 9bcfef18-7547-11ec-76d9-c23dfa1b3e6a | Create system_distributed_everywhere keyspace
history | 9bcec89a-7547-11ec-e1b4-34e0010b4183 | Create system_distributed keyspace
`announce` now takes a `group0_guard` by value. `group0_guard` can only
be obtained through `migration_manager::start_group0_operation` and
moved, it cannot be constructed outside `migration_manager`.
The guard will be a method of ensuring linearizability for group 0
operations.
1. Generalize the name so it mentions group 0, which schema will be a
strict subset of.
2. Remove the fact that it performs a "read barrier" from the name. The
function will be used in general to ensure linearizability of group0
operations - both reads and writes. "Read barrier" is Raft-specific
terminology, so it can be thought of as an implementation detail.
The functions which prepare schema change mutations (such as
`prepare_new_column_family_announcement`) would use internally
generated timestamps for these mutations. When schema changes are
managed by group 0 we want to ensure that timestamps of mutations
applied through Raft are monotonic. We will generate these timestamps at
call sites and pass them into the `prepare_` functions. This commit
prepares the APIs.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Add the missing partition-key validation in INSERT JSON statements.
Scylla, following the lead of Cassandra, forbids an empty-string partition
key (please note that this is not the same as a null partition key, and
that null clustering keys *are* allowed).
Trying to INSERT, UPDATE or DELETE a partition with an empty string as
the partition key fails with a "Key may not be empty". However, we had a
loophole - you could insert such empty-string partition keys using an
"INSERT ... JSON" statement.
The problem was that the partition key validation was done in one place -
`modification_statement::build_partition_keys()`. The INSERT, UPDATE and
DELETE statements all inherited this same method and got the correct
validation. But the INSERT JSON statement - insert_prepared_json_statement
overrode the build_partition_keys() method and this override forgot to call
the validation function. So in this patch we add the missing validation.
Note that the validation function checks for more than just empty strings -
there is also a length limit for partition keys.
This patch also adds a cql-pytest reproducer for this bug. Before this
patch, the test passed on Cassandra but failed on Scylla.
Reported by @FortTell
Fixes#9853.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220116085216.21774-1-nyh@scylladb.com>
And instantly convert the validate_keyspace() as it's not called
from anywhere but the validate_column_family().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Straightforward replacement. Internals of the has_column_family_access()
temporarily get .real_database(), but it will be changed soon.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Straightforward replacement. Internals of the has_keyspace_access()
temporarily get .real_database(), but it will be changed soon.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This was needed to fix issue #2129 which was only manifest itself with
auto_bootstrap set to false. The option is ignored now and we always
wait for schema to synch during boot.
Pagers are created by alternator and select statement, both
have the proxy reference at hands. Next, the pager's unique_ptr
is put on the lambda of its fetch_page() continuation and thus
it survives the fetch_page execution and then gets destroyed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.
As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
The gc_grace_seconds is a very fragile and broken design inherited from
Cassandra. Deleted data can be resurrected if cluster wide repair is not
performed within gc_grace_seconds. This design pushes the job of making
the database consistency to the user. In practice, it is very hard to
guarantee repair is performed within gc_grace_seconds all the time. For
example, repair workload has the lowest priority in the system which can
be slowed down by the higher priority workload, so that there is no
guarantee when a repair can finish. A gc_grace_seconds value that is
used to work might not work after data volume grows in a cluster. Users
might want to avoid running repair during a specific period where
latency is the top priority for their business.
To solve this problem, an automatic mechanism to protect data
resurrection is proposed and implemented. The main idea is to remove the
tombstone only after the range that covers the tombstone is repaired.
In this patch, a new table option tombstone_gc is added. The option is
used to configure tombstone gc mode. For example:
1) GC a tombstone after gc_grace_seconds
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ;
This is the default mode. If no tombstone_gc option is specified by the
user. The old gc_grace_seconds based gc will be used.
2) Never GC a tombstone
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'};
3) GC a tombstone immediately
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'};
4) GC a tombstone after repair
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'};
In addition to the 'mode' option, another option 'propagation_delay_in_seconds'
is added. It defines the max time a write could possibly delay before it
eventually arrives at a node.
A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc
option can only be used after the whole cluster supports the new
feature. A mixed cluster works with no problem.
Tests: compaction_test.py, ninja test
Fixes#3560
[avi: resolve conflicts vs data_dictionary]
The main intention is actually to free the qp.proxy() from the
need to provide the get_stats() method.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
After previous patches there's a whole bunch of places that do
qp.proxy().data_dictionary()
while the data_dictionary is present on the query processor itself
and there's a public method to get one. So use it everywhere.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The prepare_schema_mutations is not sleeping method, so there's no
point in getting call-local shared pointer on proxy. Plain reference
is more than enough.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is the largest user of proxy argument. Fix them all and
their callers (all sit in the same .cc file).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This completes the batch_ and modification_statement rework.
Also touch the private batch_statement::read_command while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are some internal methods that use proxy argument. Replace
most of them with query_processor, next patch will fix the rest --
those that interact with batch statement.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are some proxy arguments left in the batch_statement internals.
Fix most of them to be query_processors. Few remainders will come
later as they rely on other statements to be fixed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>