Refs #20686
Refs #15607
In #15060 we added a forced new commitlog segment on user-initiated flush,
mainly so that tests can verify tombstone gc and other compaction related
things, without having to wait for "organic" segment deletion.
Schema commitlog was not included, mainly because we did not have tests
featuring compaction checks of schema related tables, but also because
it was assumed to have lower general throughput.
There is however no real reason to not include it, and it will make some
testing much quicker and more predictable.
Closes scylladb/scylladb#20691
* Also dump diagnostics when a read times out while active (not queued).
* Add the "Trigger permit" line, containing the details of the permit which caused the diagnostics dump (by e.g. timing out).
* Add the "Identified bottleneck(s)" line, containing the identified bottlenecks which lead to permits being queued. This line is missing if no such bottleneck can be identified.
* Document the new features, as well as the stat dump, which was added some time ago.
Example of the new dump format:
```
INFO 2024-09-12 08:09:48,046 [shard 0:main] reader_concurrency_semaphore - Semaphore reader_concurrency_semaphore_dump_reader_diganostics with 8/10 count and 106192275/32768 memory resources: timed out, dumping permit diagnostics:
Trigger permit: count=0, memory=0, table=ks.tbl0, operation=mutation-query, state=waiting_for_admission
Identified bottleneck(s): memory
permits count memory table/operation/state
3 2 26M *.*/push-view-updates-2/active
3 2 16M ks.tbl1/push-view-updates-1/active
1 1 15M ks.tbl2/push-view-updates-1/active
1 0 13M ks.tbl1/multishard-mutation-query/active
1 0 12M ks.tbl0/push-view-updates-1/active
1 1 10M ks.tbl3/push-view-updates-2/active
1 1 6060K ks.tbl3/multishard-mutation-query/active
2 1 1930K ks.tbl0/push-view-updates-2/active
1 0 1216K ks.tbl0/multishard-mutation-query/active
6 0 0B ks.tbl1/shard-reader/waiting_for_admission
3 0 0B *.*/data-query/waiting_for_admission
9 0 0B ks.tbl0/mutation-query/waiting_for_admission
2 0 0B ks.tbl2/shard-reader/waiting_for_admission
4 0 0B ks.tbl0/shard-reader/waiting_for_admission
9 0 0B ks.tbl0/data-query/waiting_for_admission
7 0 0B ks.tbl3/mutation-query/waiting_for_admission
5 0 0B ks.tbl1/mutation-query/waiting_for_admission
2 0 0B ks.tbl2/mutation-query/waiting_for_admission
8 0 0B ks.tbl1/data-query/waiting_for_admission
1 0 0B *.*/mutation-query/waiting_for_admission
26 0 0B permits omitted for brevity
96 8 101M total
Stats:
permit_based_evictions: 0
time_based_evictions: 0
inactive_reads: 0
total_successful_reads: 0
total_failed_reads: 0
total_reads_shed_due_to_overload: 0
total_reads_killed_due_to_kill_limit: 0
reads_admitted: 1
reads_enqueued_for_admission: 82
reads_enqueued_for_memory: 0
reads_admitted_immediately: 1
reads_queued_because_ready_list: 0
reads_queued_because_need_cpu_permits: 82
reads_queued_because_memory_resources: 0
reads_queued_because_count_resources: 0
reads_queued_with_eviction: 0
total_permits: 97
current_permits: 96
need_cpu_permits: 0
awaits_permits: 0
disk_reads: 0
sstables_read: 0
```
Fixes: https://github.com/scylladb/scylladb/issues/19535
Improvement, no backport needed.
Closes scylladb/scylladb#20545
* github.com:scylladb/scylladb:
docs/dev/reader-concurrency-semaphore.md: update the documentation on diagnostics dumps
test/boost/reader_concurrency_semaphore_test: test the new diagnostics functionality
reader_concurrency_semaphore: add bottleneck self-diagnosis to diagnosis dump
reader_concurrency_semaphore: include trigger permit in diagnostic dump
reader_concurrency_semaphore: propagate permit to do_dump_reader_permit_diagnostics()
reader_concurrency_semaphore: use consistent exception type for timeout
reader_concurrency_semaphore: dump diagnostics when non-waiting reader times out
So the table is not dropped while the query is ongoing.
query() already does this but using old-fashioned enter()+leave(),
convert it to use the new RAII helper.
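A minimal sketch of why an RAII holder is preferable to a manual enter()+leave() pair. The names `pending_ops`, `op_holder` and `run_query` are hypothetical and only illustrate the pattern; they are not the Scylla classes.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical counter of in-flight operations on a table. The table may
// only be dropped once the count is back to zero.
class pending_ops {
    int _count = 0;
public:
    void enter() { ++_count; }
    void leave() { assert(_count > 0); --_count; }
    int count() const { return _count; }
};

// RAII helper: enter() on construction, leave() on destruction, so the
// operation is released on every exit path, including exceptions.
class op_holder {
    pending_ops& _ops;
public:
    explicit op_holder(pending_ops& ops) : _ops(ops) { _ops.enter(); }
    ~op_holder() { _ops.leave(); }
    op_holder(const op_holder&) = delete;
    op_holder& operator=(const op_holder&) = delete;
};

// The holder keeps the operation registered for the whole query.
inline int run_query(pending_ops& ops, bool fail) {
    op_holder hold(ops);
    if (fail) {
        throw std::runtime_error("query failed");
    }
    return ops.count(); // number of operations in flight, including this one
}
```

With the manual pair, an early return or an exception between enter() and leave() would leave the counter elevated; the holder makes that impossible by construction.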
Closes scylladb/scylladb#20583
The main goal of this PR is to fix a bug (#20619) in the alternator_enforce_authorization=false setting - which didn't do its job (i.e., _don't_ check permissions) when authorization is configured in CQL but not wanted in Alternator.
The series also fixes a few smaller bugs in the code that were discovered while debugging the main issue:
1. A potential use-after-free (that didn't seem to hit us in practice) is fixed.
2. A confusing error message (that was also reported in #20619) is improved.
3. Make the alternator_enforce_authorization live-updatable. There was no reason why it shouldn't be, and as this series needs to make this flag available to more code, let's just do it properly and assume the flag is live-updatable.
Because the RBAC feature has not been backported to any open-source branches, neither should these fixes. But if some private branch received a backport of the RBAC feature, it should get these fixes too.
Fixes #20619.
Closes scylladb/scylladb#20640
* github.com:scylladb/scylladb:
alternator: make alternator_enforce_authorization live-updateable
alternator: fix alternator_enforce_authorization=false
alternator: improve error message when unauthenticated
alternator: avoid use-after-free in RBAC
* seastar ec5da7a6...69f88e2f (38):
> build: s/Sanitizers_COMPILER_OPTIONS/Sanitizers_COMPILE_OPTIONS
> test: Update httpd test with request/reply body writing sugar
> http: Add sugar to request and response body writers
> utils: Add util::write_to_stream() helper
> seastar-addr2line: adjust llvm termination regex
> README.md: add Crimson project
> rpc: conditionally use fmt::runtime() based on SEASTAR_LOGGER_COMPILE_TIME_FMT
> build: check the combination of Sanitizers
> tls: clear session ticket before releasing
> print: remove dead code
> doc/lambda-coroutine-fiasco: reword for better readability
> rpc: fix compilation error caused by fmt::runtime()
> tutorial: explain the use case of rethrow_exception and coroutine::exception
> reactor: print more informative error when io_submit fails
> README.md: note GitHub discussions
> prometheus: `fmt::print` to stringstream directly
> doc: add document for testing with seastar
> seastar/testing: only include used headers
> test: Add abortable http client test cases
> http/client: Add abortable make_request() API method
> http/client: Abort established connections
> http/client: Handle abort source in pool wait
> http/client: Add abort source to factory::make() method
> http/client: Pass abort_source here and there
> http/client: Idnentation fix after previous patch
> http/client: Merge some continuations explicitly
> signal: add seastar signal api
> httpd: remove unused prometheus structs
> print: use fmtlib's fmt::format_string in format()
> rpc: do not use seastar::format() in rpc logger
> treewide: s/format/seastar::format/
> prometheus: sanitize label value for text protocol
> tests: unit test prometheus wire format
> io-tester: Introduce batches to rate-based submission
> io-tester: Generalize issueing request and collecting its result
> io-tester: Cancel intent once
> io-tester: Dont carry rps/parallelism variables over lambdas
> io-tester: Simplify in-flight management
The breaking changes in the seastar submodule necessitate corresponding
modifications in our code. These changes must be implemented together in
a single commit so that each commit stays buildable.
The following changes are included in addition to the seastar submodule update:
* instead of passing a `const char*` for the format string, pass a
templated `fmt::format_string<...>`, this depends on the
`seastar::format()` change in seastar.
* explicitly call `fmt::runtime()` if the format string is not a
consteval expression. this depends on the `seastar::format()` change
in seastar. as `seastar::format()` does not accept a plain
`const char*` which is not constexpr anymore.
* pass abort_source to `dns_connection_factory::make()`. this depends on
the change in seastar, which added an `abort_source*` argument to
the pure virtual member function of `connection_factory::make()`.
* call {fmt,seastar}::format() explicitly. this is a follow-up of
3e84d43f, which takes care of all places where we should call
`fmt::format()` and `seastar::format()` explicitly to disambiguate the
`format()` call. but more `format()` calls made their way into the source
tree after 3e84d43f, so we need to fix them as well.
* include used header in tests
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Update seastar submodule
Closes scylladb/scylladb#20649
branch-6.2 is already available, adding support for it in mergify to
allow backport to this new branch.
in addition, since branch 5.4 reached EOL - removing it
Closes scylladb/scylladb#20669
Fixes #20633
Cannot assert on actual request_controller when releasing permit, as the
release, if we have waiters in queue, will subtract some units to hand to them.
Instead assert on permit size + waiter status (and if zero, also controller value)
* v2 - use SCYLLA_ASSERT
Closes scylladb/scylladb#20654
repair_put_row_diff_with_rpc_stream_process_op() always returns
stop_iteration::no (or throws). Moreover, the return value is ignored
by its only caller. Simplify by returning a plain future<>.
Closes scylladb/scylladb#20610
Most of the analysis of the WHERE clause is done in statement_restrictions. It determines
what parts to use for the primary or secondary index, and what parts to use for filtering.
The difficult part is that it has a very wide interface. After construction, the user must pick
the correct bits from many public functions. There are subtle interactions between them
that are hard to untangle.
This series simplifies the interface as it is used for selection filtering. In the end, only
two public functions are used, both returning expressions: one for the partition-level
filtering, one for the clustering row level filtering.
In the end, the WHERE clause is factored into three parts:
- one part goes into the read_command of the primary or secondary index
- another part (that references only partition key columns and static key columns) is used to filter entire partitions
- another part (that currently references only clustering key columns and regular columns, but one day may reference other columns) is used to filter clustering rows
Refactoring, no backport.
Closes scylladb/scylladb#20487
* github.com:scylladb/scylladb:
cql3: statement_restrictions: drop accessors for single-column key restrictions
cql3: selection: adjust indentation
cql3: selection: delete empty loop
cql3: statement_restrictions, selection: fold multi-column restrictions into row-level filter
cql3: statement_restrictions, selection: merge clustering key filter and regular columns filter
cql3: statement_restrictions, selection: merge partition key filter and static columns filter
cql3: selection: filter regular and static rows as a single expression each
cql3: statement_restrictions: collect regular column and static column filters into single expressions
cql3: selection: filter clustering key as a single expression
cql3: statement_restrictions: expose filter for clustering key
cql3: selection: filter partition key as a single expression
cql3: statement_restrictions: expose filter for partition key
cql3: statement_restrictions: remove relations used for indexing from filtering
cql3: statement_restrictions: bail out of find_idx if !_uses_secondary_index
cql3: statement_restrictions, modification_statement: pass correct value of check_indexes
cql3: statement_restrictions: correct mismatched clustering/partition restrictions references
cql3: statement_restrictions: precalculate get_column_defs_for_filtering()
cql3: selection: do_filter(): push static/regular row glue to higher level
In https://github.com/scylladb/scylladb/pull/18729, we introduced a new statement tenant `$maintenance`, but the change wasn't protected by any cluster feature.
This wasn't a problem for OSS, since an unknown isolation cookie just uses the default scheduling group. However, in enterprise that leads to creating a service level on not-yet-upgraded nodes, which may end up in an error if the user has already created the maximum number of service levels.
This patch adds a cluster feature to guard adding the new tenant. It is done in a way that handles two upgrade scenarios:
- version without `$maintenance` tenant -> version with `$maintenance` tenant guarded by a feature
- version with `$maintenance` tenant but not guarded by a feature -> version with `$maintenance` tenant guarded by a feature
The PR adds an `enabled` flag to statement tenants.
This way, when the tenant is disabled, it cannot be used to create a connection, but it can be used to accept an incoming connection.
The `$maintenance` tenant is added to the config as disabled and it gets enabled once the corresponding feature is enabled.
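A rough sketch of the asymmetry described above: a disabled tenant is never chosen for outgoing connections but is still recognized on incoming ones. `tenant`, `tenant_table` and its methods are hypothetical names, and the fallback to the default tenant at index 0 is an assumption of this sketch rather than a description of messaging_service.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a statement tenant.
struct tenant {
    std::string name;
    bool enabled;
};

class tenant_table {
    std::vector<tenant> _tenants; // index 0 is assumed to be the default tenant
public:
    explicit tenant_table(std::vector<tenant> tenants) : _tenants(std::move(tenants)) {
        assert(!_tenants.empty());
    }

    // Outgoing side: a disabled tenant is skipped, falling back to the default.
    std::size_t pick_for_outgoing(const std::string& name) const {
        for (std::size_t i = 0; i < _tenants.size(); ++i) {
            if (_tenants[i].name == name && _tenants[i].enabled) {
                return i;
            }
        }
        return 0;
    }

    // Incoming side: a known tenant is accepted even while it is disabled.
    std::optional<std::size_t> find_for_incoming(const std::string& name) const {
        for (std::size_t i = 0; i < _tenants.size(); ++i) {
            if (_tenants[i].name == name) {
                return i;
            }
        }
        return std::nullopt;
    }

    // Called once the guarding cluster feature becomes enabled.
    void enable(const std::string& name) {
        for (auto& t : _tenants) {
            if (t.name == name) {
                t.enabled = true;
            }
        }
    }
};
```

This covers both upgrade paths: an upgraded node does not initiate connections with the new tenant until every node supports it, yet it does not reject peers that already use it.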
Fixes scylladb/scylladb#20070
Refs scylladb/scylla-enterprise#4403
Closes scylladb/scylladb#19802
* github.com:scylladb/scylladb:
message/messaging_service: guard adding maintenance tenant under cluster feature
message/messaging_service: add feature_service dependency
message/messaging_service: add `enabled` flag to statement tenants
For no good reason, the "alternator_enforce_authorization" flag (which
chooses whether to enable authentication and authorization checks in
Alternator) was not live-updatable, so make it so.
Both "server" and "executor" objects use this configuration flag, the
former is fixed in this patch (to hold a live-updatable reference
instead of a copy of a boolean), the latter was already prepared for
this change and already held a live-updatable reference.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When the configuration has alternator_enforce_authorization=false,
Alternator should not do authentication (check which user signed each
request) nor authorization (check if that user has permissions to do
each operation).
Our implementation forgot to disable the authorization checks when
it's configured to false. The (incorrect) assumption was that when
alternator_enforce_authorization is configured to false, the CQL
'authenticator' and 'authorizer' configuration is also disabled -
so the authorization checks will be no-ops. But we can't assume
that: Users are free to configure 'authenticator' and 'authorizer'
for use in CQL, and then set alternator_enforce_authorization=false
just for Alternator.
So this patch adds a new test for this case - when we have
authenticator=PasswordAuthenticator, authorizer=CassandraAuthorizer
but alternator_enforce_authorization=false, and fixes it to work
correctly.
The heart of the fix is trivial: the `verify_*_permission()` functions
just need to check the alternator_enforce_authorization and return
immediately when false. The bigger part of this change is to get the
alternator_enforce_authorization into the "executor" object and then
to pass it into the verify calls.
Although alternator_enforce_authorization is not YET live updatable,
this code is prepared for the future that it may become live
updatable, so the executor object saves not the boolean value of
this flag, but a live-updatable reference to it.
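The shape of the fix can be sketched as follows. This is an illustration under stated assumptions, not the Alternator code: `live_bool` stands in for a live-updatable config value, and `executor` here is a simplified, hypothetical class whose authorizer is modeled as a plain callback.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for a live-updatable config value: holders of a reference
// always observe the latest value. (Hypothetical type for this sketch.)
using live_bool = std::atomic<bool>;

class executor {
    const live_bool& _enforce_authorization;                 // a reference, not a copy
    std::function<bool(const std::string&)> _has_permission; // stand-in for the CQL authorizer
public:
    executor(const live_bool& enforce, std::function<bool(const std::string&)> has_permission)
        : _enforce_authorization(enforce), _has_permission(std::move(has_permission)) {}

    // Returns immediately when enforcement is off, regardless of what the
    // CQL authorizer would have answered.
    void verify_permission(const std::string& role) const {
        if (!_enforce_authorization.load()) {
            return;
        }
        if (!_has_permission(role)) {
            throw std::runtime_error("permission denied for role " + role);
        }
    }
};
```

Because the executor keeps a reference rather than a boolean snapshot, flipping the flag at runtime changes the behavior of the next check without rebuilding the object.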
Fixes #20619
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When access-control checks report permission denied, we want to report
the name of the authenticated role (the role signing the request) which
didn't have the permission. When authentication was disabled, and there
is no authenticated role, we printed the fake name "anonymous", but this
can confuse users (it confused me!) to think there's an actual role
named "anonymous". So let's change that string to "<anonymous>" with
angle brackets - it makes it more obvious that this isn't a real role,
but actually an anonymous request.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
While auditing the code, I noticed that the current Alternator access
control checks have code like:
```
return client_state.check_has_permission(auth::command_desc(
permission_to_check,
auth::make_data_resource(schema->ks_name(), schema->cf_name()))).then(
```
There's a problem here - it turns out that, unfortunately, command_desc
holds a reference to the "resource" object - not a copy. So the temporary
object returned by make_data_resource may be freed and then used...
Curiously, we've not seen a bug caused by this in practice (not even in
debug build mode), but better safe than sorry, so this patch changes the
code in one of two ways:
1. Code using coroutines can keep the "resource" as a variable on the
stack.
2. Code using continuations needs to hold the "resource" with do_with(),
but since this already incurs the cost of an extra allocation
(even in the successful case), might as well just switch to using
coroutines and have less ugly code.
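The first option can be sketched with simplified, hypothetical stand-ins for the auth types (the real `command_desc` and `make_data_resource` have different signatures). The only property carried over from the description above is that `command_desc` stores a reference rather than a copy.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for auth::resource.
struct resource {
    std::string name;
};

// Holds a reference to the resource, not a copy, as described above.
class command_desc {
    const resource& _res;
public:
    explicit command_desc(const resource& res) : _res(res) {}
    const std::string& resource_name() const { return _res.name; }
};

inline resource make_data_resource(const std::string& ks, const std::string& cf) {
    return resource{ks + "." + cf};
}

// Risky pattern (shown only as a comment): the temporary returned by
// make_data_resource() is destroyed at the end of the full expression,
// so any later use through command_desc would read a dead object.
//   command_desc cmd(make_data_resource(ks, cf));
//   ... cmd.resource_name() used later ...

// Safe pattern: keep the resource in a named variable that outlives
// every use of the command_desc referring to it.
inline std::string checked_resource_name(const std::string& ks, const std::string& cf) {
    resource res = make_data_resource(ks, cf);
    command_desc cmd(res);
    return cmd.resource_name();
}
```

In a coroutine the named variable lives in the coroutine frame, which is what makes this option cheap compared with wrapping the resource in do_with().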
This patch does not change any functionality, and all the tests seem to
work before and after it the same.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This PR adds the possibility to gather resource consumption metrics. The collected metrics can be used to compare performance before and after specific changes aimed at increasing performance. Currently, this functionality works only in manual mode, and this is just raw data. Later on, these metrics can be used in Jupyter notebook to analyze and visualize how the resources are used and can provide the insight on how to improve it. This PR is a first insight after gathering these metrics.
Add the possibility to gather resource consumption for the test.py execution. SQLite DB will be created with different performance metrics that will allow comparing the resource consumption between changes.
The DB will be in the tmp directory, which defaults to testlog. The DB is not deleted between runs, so each new run just adds information to the existing DB.
Parameter --get-metrics was added to switch on or off the metrics gathering. By default, it's switched on.
Closes: scylladb/qa-tasks#1666
Closes: scylladb/qa-tasks#1707
Closes scylladb/scylladb#19881
Currently the function calls boost::partial_sort with a middle
iterator that might be out of bound and cause undefined behavior.
Check the vector size, and do a partial sort only if it's longer
than `max_sstables`, otherwise sort the whole vector.
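A minimal sketch of the guarded sort, using `std::partial_sort` in place of `boost::partial_sort` and a plain `std::vector<int>` in place of the sstable vector. The function name `sort_prefix` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Brings the `max_items` smallest elements to the front, in sorted order.
// When the vector has at most `max_items` elements, the whole vector is
// sorted instead, so the middle iterator never points past end().
inline void sort_prefix(std::vector<int>& v, std::size_t max_items) {
    if (v.size() > max_items) {
        auto middle = v.begin() + static_cast<std::ptrdiff_t>(max_items);
        std::partial_sort(v.begin(), middle, v.end());
    } else {
        std::sort(v.begin(), v.end());
    }
}
```

Without the size check, `v.begin() + max_items` on a shorter vector yields an iterator beyond `end()`, which is the undefined behavior described above.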
Fixes scylladb/scylladb#20608
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#20609
The `consume*()` variants just forward the call to the `_impl` method with the same name. The latter, being a member of `::impl`, will bypass the top level `fill_buffer()`, etc. methods and thus will never call `set_close_required()`. Do this in the top-level `consume*()` methods instead, to ensure a reader, on which only `consume*()` is called, and then is destroyed, will complain as it should (and abort).
Only one place was found in core code which didn't close the reader: `split_mutation()` in `mutation/mutation.cc`, and this reader is the "from-mutation" one which has no real close routine. All other places were in tests. All this is to say, there were no real bugs uncovered by this PR.
Fixes #16520
Improvement, no backport required.
Closes scylladb/scylladb#16522
* github.com:scylladb/scylladb:
readers/flat_mutation_reader_v2: call set_close_required() from consume*()
test/boost/sstable_compaction_test: close reader after use
test/boost/repair_test: close reader after use
mutation/mutation: split_mutation(): close reader after use
"crawling" is a little bit obscure in this context. so let's rename this class to reflect the fact that this reader only reads the entire content of the sstable.
both crawling reader for kl and mx formats are renamed. also, in order to be consistent, all "crawling reader" in variable names are updated as well.
---
it's a cleanup, hence no need to backport.
Closes scylladb/scylladb#20599
* github.com:scylladb/scylladb:
sstable: s/crawling_sstable_mutation_reader/sstable_full_scan_reader
sstable/mx/reader: add comment for mx_crawling_sstable_mutation_reader
Requests sent by S3 are retriable, so when request.write_body() is
called, it should keep everything intact in case http client will call
it again.
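A rough illustration of a body writer that survives being called again on retry. `body_writer` and `make_retriable_writer` are hypothetical names; the real seastar HTTP client writes to an output stream rather than a string.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical body-writer type: appends the request body to `out`.
using body_writer = std::function<void(std::string& out)>;

// The payload is captured by value and only read inside the writer,
// never moved from, so a second invocation emits the same bytes.
inline body_writer make_retriable_writer(std::string payload) {
    return [payload = std::move(payload)](std::string& out) {
        out += payload;
    };
}
```

A writer that moved its buffer into the stream on the first call would send an empty body on the retry, which is the failure mode this change avoids.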
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#20579
New sstables for a table are created by the table::make_sstable() method. The method then calls sstables_manager::make_sstable() and passes there a path to component files which, in turn, sits on table::config. Since some time ago having an on-disk path for an sstable had become optional, as sstables could be put on S3 storage without local paths involved. In that case the aforementioned "path" is ~~ab~~used as a key in the system.sstables registry, that references a record with information used to retrieve URLs of sstables' objects.
This PR removes the "path" argument from sstables_manager::make_sstable() and its sstable_directory peer. The details of sstables' location are moved onto storage_options and depend on storage type. For now in both storage types this location is still the good-old $datadir/$keyspace/$table-$uuid string. S3 storage needs to be patched more to use a more elegant "location" value.
Eventually the `table::config::{datadir|all_datadirs}` will be removed; this PR is a step towards it.
Closes: #12707
Closes scylladb/scylladb#20542
* github.com:scylladb/scylladb:
table: Use storage options to clean the storage
sstables/storage: Re-use ocally generated vector of paths
sstables/storage: Visit options once to initialize storage
sstables_manager: Return table storage options when initalizing storage
sstables/storage: Fix indentation after previous patch
table: Move datadirs initialization parallelism to storage level
sstables/storage: Split the visitor's overloaded functor
restore: Don't use table_dir to construct sstable_directory
sstable_directory: Remove table_dir field
sstable_directory: Use options details in lister
sstables_manager: Remove table_dir from make_sstable()
sstables: Remove table_dir from sstable constructor
sstables/storage: Remove sstring dir from make_storage()
sstables/storage: Use options to construct
tests: Properly initialize storage options with "dir"
distributed_loader: Create S3 options with prefix for restore
storage_options: Add special-purpose local options maker
storage_options: Keep local path / s3 prefix onboard
table: Get another options when initializing storage
Allow create_pending_deletion_log to delete a bunch of sstables
that potentially reside in different prefixes (e.g. in the base directory
and under staging/).
The motivation arises from table::cleanup_tablet that calls compaction_group::cleanup on all cg:s via cleanup_compaction_groups. Cleanup, in turn, calls delete_sstables_atomically on all sstables in the compaction_group, in all states, including the normal state as well as staging - hence the requirement to support deleting sstables in different sub-directories.
Also, apparently truncate calls delete_atomically for all sstables too, via table::discard_sstables, so if it happened to be executed during view update generation, i.e. when there are sstables in staging, it should hit the assertion failure reported in https://github.com/scylladb/scylladb/issues/18862 as well (I haven't seen it yet, but I see no reason why it wouldn't happen). So the issue has apparently been present since the initial implementation of the pending_delete_log. It's just that with tablet migration it is more likely to be hit.
Fixes scylladb/scylladb#18862
Needs backport to 6.0 since tablets require this capability
Closes scylladb/scylladb#19555
* github.com:scylladb/scylladb:
sstable_directory: create_pending_deletion_log: place pending_delete log under the base directory
sstables: storage: keep base directory in base class
sstables: storage: define opened_directory in header file
sstable_directory: use only dirlog
"crawling" is a little bit obscure in this context. so let's rename this
class to reflect the fact that this reader only reads the entire content
of the sstable.
both crawling reader for kl and mx formats are renamed. also, in order
to be consistent, all "crawling reader" in variable names are updated
as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The cleanup compaction task is a maintenance operation that runs after
topology changes. So, run it under the maintenance scheduling group to
avoid interference with regular compaction tasks. Also remove the share
allocations done by the cleanup task, as they are unnecessary when
running under the maintenance group.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#20582
Adding a new tenant needs to be done under cluster feature protection.
However, this wasn't the case for adding the `$maintenance` statement tenant,
and to fix it we need to support an upgrade from a node which doesn't
know about the maintenance tenant at all and from one which uses it without
any cluster feature protection.
This commit adds an `enabled` flag to statement tenants.
This way, when the tenant is disabled, it cannot be used to create
a connection, but it can be used to accept an incoming connection.
When filtering, we apply single-column and multi-column filters separately.
This is completely unnecessary. Find the multi-column filters during prepare
time and append them to the row-level filter.
This slightly changes the original: in the original, if we had a multi-column
filter, we applied all of the restrictions. But hopefully if we check
for multi-column filters, that's what we need.
The two filters are used in the same way: check the filter, return false if
it matches.
Unify the two filters into a clustering_row_level_filter.
Since one of the two filters wasn't std::optional, we take the liberty
of making the combined filter non-optional.
The two filters are used in the same way: check the filter, set a boolean
flag if it matches, return false. The two boolean flags are in turn checked
in the same way.
Unify the two filters into a partition_level_filter.
Since one of the two filters wasn't std::optional, we take the liberty
of making the combined filter non-optional.
Since 3c7af28725, the cqlsh submodule no longer contains a
bin/cqlsh shell script. This broke the supermodule's bin/cqlsh
shortcut.
Fix it by invoking cqlsh.py directly.
Closes scylladb/scylladb#20591
Cleanup of a deallocated tablet throws an exception.
Since failed cleanup is retried, we end up in an infinite loop.
Ignore cleanup of deallocated storage groups.
Fixes: #19752.
Needs to be backported to all branches with tablets (6.0 and later)
Closes scylladb/scylladb#20584
* github.com:scylladb/scylladb:
test: check if cleanup of deallocated sg is ignored
replica: ignore cleanup of deallocated storage group
To drop a semaphore it should not be held by anyone, so we need to
release our units before checking if a semaphore can be dropped.
Fixes: scylladb/scylladb#20602
Closes scylladb/scylladb#20607
as `_bucket` is an `unordered_map<bucket_id, timestamp_bucket_writer>`,
when writing to a given bucket, we try to create a writer with the
specified bucket id, so the returned iterator should point to a node
whose `first` element is always the bucket id.
so, there is no need to reference `it` for the bucket id, let's just
reference the parameter. simpler this way.
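A small sketch of the observation, assuming the writer is created with something like `try_emplace()`. `bucket_writer` and `writer_set` are hypothetical stand-ins for `timestamp_bucket_writer` and its owner.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using bucket_id = long;

// Hypothetical stand-in for timestamp_bucket_writer.
struct bucket_writer {
    std::string written;
};

struct writer_set {
    std::unordered_map<bucket_id, bucket_writer> buckets;

    void write(bucket_id id, const std::string& data) {
        // try_emplace() returns an iterator to the element whose key is
        // `id`, whether it was just inserted or was already present.
        auto [it, inserted] = buckets.try_emplace(id);
        (void)inserted;
        // Hence it->first always equals `id`, and `id` can be used directly.
        assert(it->first == id);
        it->second.written += data;
    }
};
```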
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20598
Instead of filtering regular and static columns column by column, call
is_satisfied_by() for an expression containing all the static columns
predicates, and one for all the regular columns.
We cannot have one expression, since the code sets
_current_static_row_does_not_match only for static columns.
Note the fix for #20485 is now implicit, since the evaluation machinery
will treat missing regular columns as NULL.
Similar to previous work with clustering and partition key, expose
static and regular column filters as single expressions.
Since we don't currently expose a boolean for whether those filters
exist, we expose them now as non-optionals. In any case evaluating
an empty conjunction is plenty fast.
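Why a non-optional filter is harmless can be sketched with a toy conjunction: with no predicates it is vacuously satisfied and costs only an empty loop. `row`, `predicate` and `conjunction` are hypothetical types, not the cql3 expression machinery.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical row with two columns.
struct row {
    int a;
    int b;
};

using predicate = std::function<bool(const row&)>;

// A conjunction is satisfied when every child predicate holds.
// With no children, std::all_of returns true without calling anything.
struct conjunction {
    std::vector<predicate> children;

    bool is_satisfied_by(const row& r) const {
        return std::all_of(children.begin(), children.end(),
                           [&r](const predicate& p) { return p(r); });
    }
};
```

So "no filter" and "empty filter" behave identically, and callers do not need a separate boolean or `std::optional` to tell them apart.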