Rename reclaim_timer::_reserve_segments to _segments_to_release
as it is clearer and more suitable for later patches
that will add reclaim_timers in more functions.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This PR introduces improvements to `expr::to_restriction` and prepares the validation part for restriction classes removal.
`expr::to_restriction` is currently used to take a restriction from the WHERE clause, prepare it, perform some validation checks and finally convert it to an instance of the restriction class.
Soon we will get rid of the restriction class.
In preparation for that `expr::to_restriction` is split into two independent parts:
* The part that prepares and validates a binary_operator
* The part that converts a binary_operator to restriction
Thanks to this split, getting rid of the restriction class will be painless: we will just stop using the second part.
`to_restriction.cc` is replaced by `restrictions.hh/cc`. In the future we can put all the restriction expressions code there to avoid clutter in `expression.hh/cc`.
This change made it much easier to fix #10631, so I did that as well.
Fixes: #10631
Closes #10979
* github.com:scylladb/scylla:
cql-pytest: Test that IS NOT only accepts NULL
cql-pytest: Enable testInvalidCollectionNonEQRelation
cql3: Move single element IN restrictions handling
cql3: Check for disallowed operators early
cql3: Simplify adding restrictions
cql3: Reorganize to_restriction code
cql3: Fix IS NOT NULL check in to_restriction
cql3: Swap order of arguments in error message
bytes_ostream is an incremental builder for a discontiguous byte container.
managed_bytes is a non-incremental (size must be known up front) byte
container, that is also compatible with LSA. So far, conversion between
them involves copying. This is unfortunate, since query_result is generated
as a bytes_ostream, but is later converted to managed_bytes (today, this
is done in cql3::expr::get_non_pk_values() and
compound_view_wrapper::explode()). If the two types could be made compatible,
we could use managed_bytes_view instead of creating new objects and avoid
a copy. It's also nicer to have one less vocabulary type.
This patch makes bytes_ostream use managed_bytes' internal representation
(blob_storage instead of bytes_ostream::chunk) and provides a conversion
to managed_bytes. All bytes_ostream users are left in place, but the goal
is to make bytes_ostream a write-only type with the only observer a conversion
to managed_bytes.
It turns out to be relatively simple. The internal representations were
already similar. I made blob_storage::ref_type self-initializing to
reduce churn (good practice anyway) and added a private constructor
to managed_bytes for the conversion.
Note that bytes_ostream can only be used to construct a non-LSA managed_bytes,
but LSA uses of managed_bytes are very strictly controlled (the entry
points to memtable and cache) so that's not a problem.
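The idea of sharing one internal representation can be sketched roughly as follows. This is only an illustration, not the actual Scylla code: `toy_bytes_ostream`, `toy_managed_bytes` and `fragment` are hypothetical names, std::vector stands in for blob_storage, and the converting constructor is public here for brevity. The point is that the conversion hands over the fragments instead of copying the bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the shared fragment type (the role blob_storage
// plays in the text above). None of these names exist in Scylla.
using fragment = std::vector<std::uint8_t>;

// Non-incremental container; stands in for managed_bytes.
class toy_managed_bytes {
    std::vector<fragment> _fragments;
public:
    explicit toy_managed_bytes(std::vector<fragment> fragments)
        : _fragments(std::move(fragments)) {}

    std::size_t size() const {
        std::size_t total = 0;
        for (const fragment& f : _fragments) {
            total += f.size();
        }
        return total;
    }

    const std::uint8_t* first_fragment_data() const {
        return _fragments.empty() ? nullptr : _fragments.front().data();
    }
};

// Incremental builder; stands in for bytes_ostream.
class toy_bytes_ostream {
    std::vector<fragment> _fragments;
    std::size_t _fragment_size; // assumed to be greater than zero
public:
    explicit toy_bytes_ostream(std::size_t fragment_size)
        : _fragment_size(fragment_size) {}

    // Appends bytes, opening a new fragment whenever the last one is full.
    void write(const std::uint8_t* data, std::size_t len) {
        while (len > 0) {
            if (_fragments.empty() || _fragments.back().size() == _fragment_size) {
                _fragments.emplace_back();
                _fragments.back().reserve(_fragment_size);
            }
            fragment& back = _fragments.back();
            const std::size_t n = std::min(len, _fragment_size - back.size());
            back.insert(back.end(), data, data + n);
            data += n;
            len -= n;
        }
    }

    const std::uint8_t* first_fragment_data() const {
        return _fragments.empty() ? nullptr : _fragments.front().data();
    }

    // Hands the fragments over to the read-only container without copying bytes.
    toy_managed_bytes to_managed_bytes() && {
        return toy_managed_bytes(std::move(_fragments));
    }
};
```

Because both toy types use the same fragment list, the first fragment's data pointer is the same before and after the conversion.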
A unit test is added.
Closes #10986
After acquiring the _compaction_state write lock,
select all sstables using get_candidates and register them
as compacting, then unlock the _compaction_state lock
to let regular compaction run in parallel.
Also, run major compaction in maintenance scheduling group.
We should separate the scheduling groups used for major compaction
from the regular compaction scheduling group so that
the latter can be affected by the backlog tracker in case
backlog accumulates during a long running major compaction.
Fixes #10961
Closes #10984
* github.com:scylladb/scylla:
compaction_manager: major_compaction_task: run in maintenance scheduling group
compaction_manager: allow regular compaction to run in parallel to major
The IS_NOT operator can only be used during materialized view creation
and it can only be used to express IS NOT NULL.
Trying to write something like IS NOT 42 should cause an error.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Restrictions like
col IN (1)
get converted to
col = 1
as an optimization/simplification.
This used to be done in prepare_binary_operator,
but it fits way better inside of
validate_and_prepare_new_restriction.
When it was being done in prepare_binary_operator
the conversion happened before validation checks
and the error messages would describe an equality
restriction despite the user making an IN restriction.
Now the conversion happens after all validation
is finished, which ensures that all checks are
being done on the original expression.
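The ordering described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the real cql3 code: `binary_operator` here is a hypothetical toy struct, and `validate`, `validate_and_simplify` and the `column_allows_in` flag are invented for the example.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins for the cql3 expression types.
enum class oper_t { EQ, IN };

struct binary_operator {
    std::string column;
    oper_t op;
    std::vector<int> values; // one value for EQ, a list for IN
};

// Validation runs on the expression exactly as the user wrote it,
// so the error message names the IN restriction, not an equality.
inline void validate(const binary_operator& binop, bool column_allows_in) {
    if (binop.op == oper_t::IN && !column_allows_in) {
        throw std::invalid_argument(
            "IN restriction is not supported on column " + binop.column);
    }
}

// The simplification `col IN (x)` -> `col = x` is applied only afterwards.
inline binary_operator validate_and_simplify(binary_operator binop,
                                             bool column_allows_in) {
    validate(binop, column_allows_in);
    if (binop.op == oper_t::IN && binop.values.size() == 1) {
        binop.op = oper_t::EQ;
    }
    return binop;
}
```

If the conversion ran first, `validate` would only ever see an EQ operator for `col IN (1)` and could not report the problem in terms of IN.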
Fixes: #10631
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Move checking for disallowed operators
earlier in the code flow.
This is needed to pass some tests that
expect one error message instead of the other.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
The code that adds restrictions in statement_restrictions.cc
is unnecessarily convoluted.
The code to handle IS NOT NULL is actually repeated twice,
once in the constructor and once in add_is_not_restriction.
I missed this when I originally modified this code.
There is no need to keep duplicate code, we can just
use the new add_is_not_restriction.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
expr::to_restriction is currently used to
take a restriction from the WHERE clause,
prepare it, perform some validation checks
and finally convert it to an instance of
the restriction class.
Soon we will get rid of the restriction class.
In preparation for that expr::to_restriction
is split into two independent parts:
* The part that prepares and validates a binary_operator
* The part that converts a binary_operator to restriction
Thanks to this split, getting rid of the restriction class
will be painless: we will just stop using the
second part.
This commit splits expr::to_restriction into two functions:
* validate_and_prepare_new_restriction
* convert_to_restriction
that handle each of those parts.
All helper validation methods in the anonymous namespace
are copied from the to_restriction.cc file.
to_restriction.cc isn't the best filename for the new functionality,
so it has been renamed to restrictions.hh/cc.
In the future all the code regarding restrictions could be
put there to reduce clutter in expression.hh/cc.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
expr::to_restriction performs a check to see if
the restriction is of form: `col IS NOT NULL`
There is a mistake in this check.
It uses is<null>(prepared_binop.rhs)
to determine if the right hand side of binary operator
is a null, but the binary operator is already prepared.
During preparation expr::null is converted to expr::constant
and that wouldn't be detected by this check.
The check has been changed to check for null constant instead
of expr::null.
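The mistake can be illustrated with a toy model. The types below are hypothetical and much simpler than the real expr:: classes: `null_literal` plays the role of expr::null, `constant` plays the role of expr::constant, and `prepare`, `is_null_old` and `is_null_fixed` are invented names.

```cpp
#include <cassert>
#include <optional>
#include <variant>

// Hypothetical simplified model: before preparation a NULL literal is its own
// node type; preparation turns it into a constant that holds no value.
struct null_literal {};
struct constant {
    std::optional<int> value; // std::nullopt models a NULL constant
    bool is_null() const { return !value.has_value(); }
};
using expression = std::variant<null_literal, constant>;

// Preparation, as described above: null_literal becomes a null constant.
inline expression prepare(const expression& e) {
    if (std::holds_alternative<null_literal>(e)) {
        return constant{std::nullopt};
    }
    return e;
}

// The old-style check: only matches the unprepared node type.
inline bool is_null_old(const expression& e) {
    return std::holds_alternative<null_literal>(e);
}

// The fixed check: looks for a constant that is null.
inline bool is_null_fixed(const expression& e) {
    const constant* c = std::get_if<constant>(&e);
    return c != nullptr && c->is_null();
}
```

On a prepared expression the old-style check never fires, while the fixed one recognizes the null constant.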
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
The error message displays two arguments in
a specific order, but the tests actually
expect them to be swapped.
Swap the arguments to match the expected
error messages in tests.
It wasn't detected earlier because the
check was never reached, but this will change
soon in the following commits.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
This PR migrates the ScyllaDB end-user documentation from the [scylla-docs](https://github.com/scylladb/scylla-docs/) repository, according to the [migration plan](https://docs.google.com/document/d/15yBf39j15hgUVvjeuGR4MCbYeArqZrO1ir-z_1Urc6A/edit?usp=sharing). All the files are added to the `docs` subfolder.
**This PR does not cover any content changes.**
How to test this PR:
1. Go to `scylla/docs`.
2. Run `make preview`. The docs should build without any warnings.
3. Open http://127.0.0.1:5500/ in your browser. You should see the documentation landing page:

Closes #10976
* github.com:scylladb/scylla:
doc: fix errors -fix the indent in the conf.py file
doc: fix the path to Alternator
doc: fix errors - add Alternator to the toctree
doc: fix errors- update the conf.py file
doc: fix errors - remove the CNAME file
doc: add the CNAME and robots files
doc: move index and README from scylla-docs repo
doc: move the documentation from the scylla-docs repo
doc: remove the old index file
This PR gets rid of exception throws/rethrows on the replica side for writes and single-partition reads. This goal is achieved without using `boost::outcome` but rather by replacing the parts of the code which throw with appropriate seastar idioms and by introducing two helper functions:
1. `try_catch` allows inspecting the type and value behind an `std::exception_ptr`. When libstdc++ is used, this function does not need to throw the exception and avoids the very costly unwind process. This is based on the "How to catch an exception_ptr without even try-ing" proposal mentioned in https://github.com/scylladb/scylla/issues/10260.
This function makes it possible to replace the current `try..catch` chains, which inspect the exception type and account for it in the metrics.
Example:
```c++
// Before
try {
    std::rethrow_exception(eptr);
} catch (std::runtime_error& ex) {
    // 1
} catch (...) {
    // 2
}
// After
if (auto* ex = try_catch<std::runtime_error>(eptr)) {
    // 1
} else {
    // 2
}
```
2. `make_nested_exception_ptr` is meant to be a replacement for `std::throw_with_nested`. Unlike the original function, it does not require an exception to be currently thrown and does not throw itself. Instead, it takes the nested exception as an `std::exception_ptr` and produces another `std::exception_ptr`.
Apart from the above, seastar idioms such as `make_exception_future`, `co_await as_future`, `co_return coroutine::exception()` are used to propagate exceptions without throwing. This brings the number of exception throws to zero for single partition reads and writes (tested with scylla-bench, --mode=read and --mode=write).
Results from `perf_simple_query`:
```
Before (719724e4df):
  Writes:
    Normal:
      127841.40 tps ( 56.2 allocs/op, 13.2 tasks/op, 50042 insns/op, 0 errors)
    Timeouts:
      94770.81 tps ( 53.1 allocs/op, 5.1 tasks/op, 78678 insns/op, 1000000 errors)
  Reads:
    Normal:
      138902.31 tps ( 65.1 allocs/op, 12.1 tasks/op, 43106 insns/op, 0 errors)
    Timeouts:
      62447.01 tps ( 49.7 allocs/op, 12.1 tasks/op, 135984 insns/op, 936846 errors)
After (d8ac4c02bfb7786dc9ed30d2db3b99df09bf448f):
  Writes:
    Normal:
      127359.12 tps ( 56.2 allocs/op, 13.2 tasks/op, 49782 insns/op, 0 errors)
    Timeouts:
      163068.38 tps ( 52.1 allocs/op, 5.1 tasks/op, 40615 insns/op, 1000000 errors)
  Reads:
    Normal:
      151221.15 tps ( 65.1 allocs/op, 12.1 tasks/op, 43028 insns/op, 0 errors)
    Timeouts:
      192094.11 tps ( 41.2 allocs/op, 12.1 tasks/op, 33403 insns/op, 960604 errors)
```
Closes #10368
* github.com:scylladb/scylla:
database: avoid rethrows when handling exceptions from commitlog
database: convert throw_commitlog_add_error to use make_nested_exception_ptr
utils: add make_nested_exception_ptr
storage_proxy: don't rethrow when inspecting replica exceptions on write path
database: don't rethrow rate_limit_exception
storage_proxy: don't rethrow the exception in abstract_read_resolver::error
utils/exceptions.cc: don't rethrow in is_timeout_exception
utils/exceptions: add try_catch
utils: add abi/eh_ia64.hh
storage_proxy: don't rethrow exceptions from replicas when accounting read stats
message: get rid of throws in send_message{,_timeout,_abortable}
database/{query,query_mutations}: don't rethrow read semaphore exceptions
Recently we noticed a regression where with certain versions of the fmt
library,
SELECT value FROM system.config WHERE name = 'experimental_features'
returns string numbers, like "5", instead of feature names like "raft".
It turns out that the fmt library keeps changing its overload resolution
order when there are several ways to print something. For enum_option<T> we
happen to have two conflicting ways to print it:
1. We have an explicit operator<<.
2. We have an *implicit* converter to the type held by T.
We were hoping that the operator<< always wins. But in fmt 8.1, there is
special logic that if the type is convertible to an int, this is used
before operator<<()! For experimental_features_t, the type held in it was
an old-style enum, so it is indeed convertible to int.
The solution I used in this patch is to replace the old-style enum
in experimental_features_t by the newer and more recommended "enum class",
which does not have an implicit conversion to int.
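The language-level difference between the two kinds of enum can be checked directly. The sketch below does not use fmt at all and its enum and function names are made up for illustration. It only shows that an old-style enum is implicitly convertible to int while an "enum class" is not, so for the latter the explicit operator<< is the only way to print it.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical enums for illustration; the enumerator names are made up.
enum old_style_feature { OLD_RAFT, OLD_UDF };
enum class scoped_feature { RAFT, UDF };

// An unscoped enum converts to int implicitly, which is the conversion
// fmt 8.1 is described above as preferring over operator<<.
static_assert(std::is_convertible_v<old_style_feature, int>);
// A scoped enum ("enum class") has no such implicit conversion.
static_assert(!std::is_convertible_v<scoped_feature, int>);

inline std::ostream& operator<<(std::ostream& os, scoped_feature f) {
    return os << (f == scoped_feature::RAFT ? "raft" : "udf");
}

inline std::string to_text(scoped_feature f) {
    std::ostringstream os;
    os << f; // the explicit operator<< is the only viable way to print this
    return os.str();
}
```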
I could have fixed it in other ways, but it wouldn't have been much
prettier. For example, dropping the implicit converter would require
us to change a bunch of switch() statements over enum_option (and
not just experimental_features_t, but other types of enum_option).
Going forward, all uses of enum_option should use "enum class", not
"enum". tri_mode_restriction_t was already using an enum class, and
now so does experimental_features_t. I changed the examples in the
comments to also use "enum class" instead of enum.
This patch also adds to the existing experimental_features test a
check that the feature names are words that are not numbers.
Fixes #11003.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #11004
Fix two cql-pytest tests that have been "XPASS"ing (unexpectedly passing)
by removing the "xfail" (expecting failure) mark from them:
One test was for an issue that has already been fixed (refs #10081).
The second test was a translated Cassandra test that should never
have failed because it doesn't trigger the issue that supposedly failed
it (that test sets a large value for a non-indexed column, so doesn't
trigger the problem we have with large values in an indexed column).
Closes #11006
When running test/cql-pytest, pytest prints one warning at the end:
/home/nyh/scylla/test/cql-pytest/test_secondary_index.py:82: DeprecationWarning: ResultSet indexing support will be removed in 4.0.
Consider using ResultSet.one() to get a single row.
assert any([index_name in event.description for event in cql.execute(query, trace=True).get_query_trace().events])
So in this patch I do exactly what the warning recommends - use one().
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #11002
Python has deprecated the distutils package. In several places in the
Alternator and Redis test suites, we used distutils.version to check if
the library is new enough for running the test (and skip the test if
it's too old). On new versions of Python, we started getting deprecation
warnings such as:
DeprecationWarning: The distutils package is deprecated and slated for
removal in Python 3.12. Use setuptools or check PEP 632 for potential
alternatives
PEP 632 recommends using packaging.version instead of distutils.version,
and indeed it works well. After applying this patch, Alternator and
Redis test runs no longer end in silly deprecation warnings.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #11007
This new test suite is expected to gather all kinds of permissions
tests - granting, revoking, authorizing, and so on.
Right now it contains a single minimal test which ensures that
the default superuser can be granted applicable permissions,
which they already have anyway.
The test suite added in this pull request will also be useful
when developing #10633 - permissions for UDF/UDA infrastructure.
Closes #10991
* github.com:scylladb/scylla:
cql-pytest: add initial permissions test suite
cql-pytest: enable CassandraAuthorizer for Scylla and Cassandra
There was a bug which caused incorrect results of limits()
for columns with reversed clustering order.
Such columns have reversed_type as their type and this
needs to be taken into account when comparing them.
It was introduced in 6d943e6cd0.
This commit replaced uses of get_value_comparator
with type_of. The difference between them is that
get_value_comparator applied ->without_reversed()
on the result type.
Because the type was reversed, comparisons like
1 < 2 evaluated to false.
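A toy model of the problem, assuming a hypothetical `int_type` with a `reversed` flag (the real type hierarchy is different, and `value_less` is an invented helper):

```cpp
#include <cassert>

// Hypothetical miniature of a column type that may be wrapped as "reversed".
struct int_type {
    bool reversed = false;

    // A reversed type orders values in the opposite direction.
    bool less(int a, int b) const {
        return reversed ? b < a : a < b;
    }

    // Mirrors the role of ->without_reversed(): strip the reversal.
    int_type without_reversed() const {
        return int_type{false};
    }
};

// Comparing restriction values should use the natural order of the type.
inline bool value_less(const int_type& column_type, int a, int b) {
    return column_type.without_reversed().less(a, b);
}
```

With the reversed type used directly, `less(1, 2)` is false; stripping the reversal first restores the expected result.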
This caused the test testIndexOnKeyWithReverseClustering
to fail, but sadly it wasn't caught by CI because
the CI itself has a bug that makes it skip some tests.
The test passes now, although it has to be run manually
to check that.
Fixes: #10918
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
Closes #10994
Scylla's coding standard requires that each header is self-sufficient,
i.e., it includes whatever other headers it needs - so it can be included
without having to include any other header before it.
We have a test for this, "ninja dev-headers", but it isn't run very
frequently, and it turns out our code deviated from this requirement
in a few places. This patch fixes those places, and after it
"ninja dev-headers" succeeds again.
Fixes #10995
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #10997
Currently, applying schema mutations involves flushing all schema
tables so that on restart commit log replay is performed on top of
latest schema (for correctness). The downside is that schema merge is
very sensitive to fdatasync latency. Flushing a single memtable
involves many syncs, and we flush several of them. It was observed to
take as long as 30 seconds on GCE disks under some conditions.
This patch changes the schema merge to rely on a separate commit log
to replay the mutations on restart. This way it doesn't have to wait
for memtables to be flushed. It has to wait for the commitlog to be
synced, but this cost is well amortized.
We put the mutations into a separate commit log so that schema can be
recovered before replaying user mutations. This is necessary because
regular writes have a dependency on schema version, and replaying on
top of latest schema satisfies all dependencies. Without this, we
could get loss of writes if we replay a write which depends on the
latest schema on top of old schema.
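The dependency on replay order can be shown with a deliberately tiny model. Everything here is hypothetical: a schema is reduced to a version number, `toy_node` and `logged_write` are invented, and a write that needs a newer schema than the node currently has is simply counted as lost.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model: each logged write records the schema version
// it was made against.
struct logged_write {
    int schema_version;
    std::string key;
    std::string value;
};

struct toy_node {
    int schema_version = 1;
    std::map<std::string, std::string> data;
    std::size_t lost_writes = 0;

    // Replaying the schema log brings the node to the latest schema.
    void replay_schema(const std::vector<int>& schema_log) {
        for (int v : schema_log) {
            if (v > schema_version) {
                schema_version = v;
            }
        }
    }

    // A write that depends on a newer schema than the current one is lost.
    void replay_writes(const std::vector<logged_write>& write_log) {
        for (const logged_write& w : write_log) {
            if (w.schema_version > schema_version) {
                ++lost_writes;
                continue;
            }
            data[w.key] = w.value;
        }
    }
};
```

Replaying the schema log before the write log loses nothing in this model, while the opposite order drops every write made against the newer schema.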
Also, if we have a separate commit log for schema we can delay schema
parsing for after the replay and avoid complexity of recognizing
schema transactions in the log and invoking the schema merge logic.
I reproduced bad behavior locally on my machine with a tired (high latency)
SSD disk, load driver remote. Under high load, I saw table alter (server-side part) taking
up to 10 seconds before. After the patch, it takes up to 200 ms (50:1 improvement).
Without load, it is 300ms vs 50ms.
Fixes #8272
Fixes #8309
Fixes #1459
Closes #10333
* github.com:scylladb/scylla:
config: Introduce force_schema_commit_log option
config: Introduce unsafe_ignore_truncation_record
db: Avoid memtable flush latency on schema merge
db: Allow splitting initialization of system tables
db: Flush system.scylla_local on change
migration_manager: Do not drop system.IndexInfo on keyspace drop
Introduce SCHEMA_COMMITLOG cluster feature
frozen_mutation: Introduce freeze/unfreeze helpers for vectors of mutations
db/commitlog: Improve error messages in case of unknown column mapping
db/commitlog: Fix error format string to print the version
db: Introduce multi-table atomic apply()
Convert most use sites from `co_return coroutine::make_exception`
to `co_await coroutine::return_exception{,_ptr}` where possible.
In cases this is done in a catch clause, convert to
`co_return coroutine::exception`, generating an exception_ptr
if needed.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes #10972
This new test suite is expected to gather all kinds of permissions
tests - granting, revoking, authorizing, and so on.
Right now it contains a single minimal test which ensures that
the default superuser can be granted applicable permissions,
which they already have anyway.
In order to be able to test permissions, an authorizer different
than AllowAllAuthorizer (default) must be set.
CassandraAuthorizer is thus enabled - it works on default user/password
pair, so it doesn't introduce any regressions to the test suite.
"
The option controls the IO bandwidth of the compaction sched class.
It's now set to be 16MB/s, but is unused. This set makes it 0 by
default (which means unlimited), live-updateable and plugs it to the
seastar sched group IO throttling.
branch: https://github.com/xemul/scylla/tree/br-compaction-throttling-3
tests: unit(dev),
v2: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1010/ ,
v2: manual config update
"
* 'br-compaction-throttling-3-a' of https://github.com/xemul/scylla:
compaction_manager: Add compaction throughput limit
updateable_value: Support dummy observing
serialized_action: Allow being observer for updateable_value
config: Tune the config option
The node now refuses to boot if schema tables were truncated.
This adds a config option to ignore truncation records as a
workaround if user truncated them manually.
Currently, applying schema mutations involves flushing all schema
tables so that on restart commit log replay is performed on top of
latest schema (for correctness). The downside is that schema merge is
very sensitive to fdatasync latency. Flushing a single memtable
involves many syncs, and we flush several of them. It was observed to
take as long as 30 seconds on GCE disks under some conditions.
This patch changes the schema merge to rely on a separate commit log
to replay the mutations on restart. This way it doesn't have to wait
for memtables to be flushed. It has to wait for the commitlog to be
synced, but this cost is well amortized.
We put the mutations into a separate commit log so that schema can be
recovered before replaying user mutations. This is necessary because
regular writes have a dependency on schema version, and replaying on
top of latest schema satisfies all dependencies. Without this, we
could get loss of writes if we replay a write which depends on the
latest schema on top of old schema.
Also, if we have a separate commit log for schema we can delay schema
parsing for after the replay and avoid complexity of recognizing
schema transactions in the log and invoking the schema merge logic.
One complication with this change is that replay_position markers are
commitlog-domain specific and cannot cross domains. They are recorded
in various places which survive node restart: sstables are annotated
with the maximum replay position, and they are present inside
truncation records. The former annotation is used by "truncate"
operation to drop sstables. To prevent old replay positions from being
interpreted in the context of the new schema commitlog domain, the
change refuses to boot if there are truncation records, and also
prohibits truncation of schema tables.
The boot sequence needs to know whether the cluster feature associated
with this change was enabled on all nodes. Features are stored in
system.scylla_local. Because we need to read it before initializing
schema tables, the initialization of tables now has to be split into
two phases. The first phase initializes all system tables except
schema tables, and later we initialize schema tables, after reading
stored cluster features.
The commitlog domain is switched only when all nodes are upgraded, and
only after new node is restarted. This is so that we don't have to add
risky code to deal with hot-switching of the commitlog domain. Cold
switching is safer. This means that after upgrade there is a need for
yet another rolling restart round.
Fixes #8272
Fixes #8309
Fixes #1459