The namespace replica is broken in the middle with sstable_list alias,
while the latter can be declared earlier
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18664
Currently all tables are printed in statements like `DESC TABLES`, `DESC KEYSPACE ks` or `DESC SCHEMA`.
But when we create a table with cdc enabled, additional table with `_scylla_cdc_log` suffix is created.
Those tables shouldn't be recreated manually but created automatically when the base table is created.
This patch hides tables with `_scylla_cdc_log` suffix in all describe statements.
To preserve properties values of those tables, `ALTER TABLE` statement with all properties and their current values for log cdc table is added to description of the base table.
Fixes#18459Closesscylladb/scylladb#18467
* github.com:scylladb/scylladb:
test/cql-pytest/test_describe: add test for hiding cdc tables
cql3/statements/describe_statement: hide cdc tables
schema: add a method to generate ALTER statement with all properties
schema: extract schema's properties generation
Currently, if the fill ctor throws an exception,
the destructor won't be called, as it object is not fully constructed yet.
Call the default ctor first (which doesn't throw)
to make sure the destructor will be called on exception.
Fixesscylladb/scylladb#18635
- [x] Although the fixes is for a rare bug, it has very low risk and so it's worth backporting to all live versions
Closesscylladb/scylladb#18636
* github.com:scylladb/scylladb:
chunked_vector_test: add more exception safety tests
chunked_vector_test: exception_safe_class: count also moved objects
utils: chunked_vector: fill ctor: make exception safe
We don't attempt to create an endpoint manager for a hint directory if there is no mapping host ID–IP corresponding to the directory's name, an IP address. That prevents a segmentation fault.
Fixesscylladb/scylladb#18649Closesscylladb/scylladb#18650
* github.com:scylladb/scylladb:
db/hints: Remove an unused header
db/hints: Remove migrating flag before initializing endpoint managers
db/hints: Prevent segmentation fault when initializing endpoint managers
This patch adds metrics that will be reported per-table per-node.
The added metrics (that are part of the per-table per-shard metrics)
are:
scylla_column_family_cache_hit_rate
scylla_column_family_read_latency
scylla_column_family_write_latency
scylla_column_family_live_disk_space
Fixes#18642
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Closesscylladb/scylladb#18645
incremental_reader_selector is the mechanism for incremental comsumption
of disjoint sstables on range reads.
tablet_sstable_set was implemented, such that selector is efficient with
tablets.
The problem is selector is vnode addicted and will only consider a given
set exhausted when maximum token is reached.
With tablets, that means a range read on first tablet of a given shard
will also consume other tablets living in the same shard. That results
in combined reader having to work with empty sstable readers of tablets
that don't intersect with the range of the read. It won't cause extra
I/O because the underlying sstables don't intersect with the range of
the read. It's only unnecessary CPU work, as it involves creating
readers (= allocation), feeding them into combined reader, which will
in turn invoke the sstable readers only to realize they don't have any
data for that range.
With 100k tablets (ranges), and 100 tablets per shard, and ~5 sstables
per tablet, there will be this amount of readers (empty or not):
(100k * ((100^2 + 100) / 2) * avg_sstable_per_tablet=5) = ~2.5 billions.
~5000 times more readers, it can be quite significant additional cpu
work, even though I/O dominates the most in scans. It's an inefficiency
that we rather get rid of.
The behavior can be observed from logs (there's 1 sstable for each of
4 tablets, but note how readers are created for every single one of
them when reading only 1 tablet range):
```
table - make_reader_v2 - range=(-inf, {-4611686018427387905, end}]
incremental_reader_selector - create_new_readers(null): selecting on pos {minimum token, w=-1}
sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._34qn42... that has range [{-9151620220812943033, start},{-4813568684827439727, end}]
incremental_reader_selector - create_new_readers(null): selecting on pos {-4611686018427387904, w=-1}
sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._368nk2... that has range [{-4599560452460784857, start},{-78043747517466964, end}]
incremental_reader_selector - create_new_readers(null): selecting on pos {0, w=-1}
sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._38lj42... that has range [{851021166589397842, start},{3516631334339266977, end}]
incremental_reader_selector - create_new_readers(null): selecting on pos {4611686018427387904, w=-1}
sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._3dba82... that has range [{5065088566032249228, start},{9215673076482556375, end}]
```
Fix is about making sure the tablet set won't select past the
supplied range of the read.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#18556
Currently, all documentation links that feature anywhere in the help output of scylla-nodetool, are hard-coded to point to the documentation of the latest stable release. As our documentation is version and product (open-source or enterprise) specific, this is not correct. This PR addresses this, by generating documentation links such that they point to the documentation appropriate for the product and version of the scylladb release.
Fixes: https://github.com/scylladb/scylladb/issues/18276
- [x] the native nodetool is a new feature, no backport needed
Closesscylladb/scylladb#18476
* github.com:scylladb/scylladb:
tools/scylla-nodetool: make doc link version-specific
release: introduce doc_link()
build: pass scylla product to release.cc
There are two metrics to help observe base-write throttling:
* current_throttled_base_writes
* last_mv_flow_control_delay
Both show a snapshot of what is happening right at the time of querying
these metrincs. This doesn't work well when one wants to investigate the
role throttling is playing in occasional write timeouts.s Prometheus
scrapes metrics in multi-second intervals, and the probability of that
instant catching the throttling at play is very small (almost zero).
Add two new metrics:
* throttled_base_writes_total
* mv_flow_control_delay_total
These accumulate all values, allowing graphana to derive the values and
extract information about throttle events that happened in the past
(but not necessarily at the instant of the scrape).
Note that dividing the two values, will yield the average delay for a
throttle, which is also useful.
Closesscylladb/scylladb#18435
Before these changes, if initializing endpoint
managers after the migration of hinted handoff
to host ID is done throws an exception, we
don't remove the flag indicating the migration
is still in progress. However, the migration
has, in practice, finished -- all of the
hint directories have been mapped to host IDs
and all of the nodes in the cluster are
host-ID-based. Because of that, it makes sense
to remove the flag early on.
If hinted handoff is still IP-based and there is
a hint directory representing an IP without
a corresponding mapping to a host ID in
`locator::token_metadata`, an attemp to initialize
its endpoint manager will result in a segmentation
fault. This commit prevents that.
We have to account for moved objects as well
as copied objects so they will be balanced with
the respective `del_live_object` calls called
by the destructor.
However, since chunked_vector requires the
value_type to be nothrow_move_constructible,
just count the additional live object, but
do not modify _countdown or, respectively, throw
an exception, as this should be considered only
for the default and copy constructors.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, if the fill ctor throws an exception,
the destructor won't be called, as it object is not
fully constructed yet.
Call the default ctor first (which doesn't throw)
to make sure the destructor will be called on exception.
Fixesscylladb/scylladb#18635
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Tables with `_scylla_cdc_log` suffix are internal tables used by cdc.
We want to hide those tables in all describe statements, as they
shouldn't be created by user but created by Scylla when user creates a
table with cdc enabled.
Instead, we include `ALTER TABLE <cdc log table> WITH <all table properties>`
to the description of cdc base table, so all changes to cdc log table's
properties are preserved in backup.
In the describe statement, we need to generate `ALTER TABLE` statement
with all schema's properties for some tables (cdc log tables).
The method prints valid CQL statement with current values of
the properties.
In commit 642f9a1966 (repair: Improve
estimated_partitions to reduce memory usage), a 10% hard coded
estimation ratio is used.
This patch introduces a new config option to specify the estimation
ratio of partitions written by repair out of the total partitions.
It is set to 0.1 by default.
Fixes#18615Closesscylladb/scylladb#18634
Closesscylladb/scylladb#18616
* github.com:scylladb/scylladb:
replica: Make it explicit table's sstable set is immutable
replica: avoid reallocations in tablet_sstable_set
replica: Avoid compound set if only one sstable set is filled
There's a loop that calculates the number of shard matches over a tablet
map. The check of the given shard against optional<shard> can be made
shorter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18592
As part of the unification process, alternator tests are migrated to the PythonTestSuite instead of using the RunTestSuite. The main idea is to have one suite, so there will be easier to maintain and introduce new features.
Introduce the prepare_sql option for suite.yaml to add possibility to run cql statements as precondition for the test suite.
Related: https://github.com/scylladb/scylladb/issues/18188Closesscylladb/scylladb#18442
The default limit of open file descriptors
per process may be too small for iotune on
certain machines with large number of cores.
In such case iotune reports failure due to
unability to create files or to set up seastar
framework.
This change configures the limit of open file
descriptors before running iotune to ensure
that the failure does not occur.
The limit is set via 'resource.setrlimit()' in
the parent process. The limit is then inherited
by the child process.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#18546
In b4e66ddf1d (4.0) we added a new batchlog_manager configuration
named delay, but forgot to initialize it in cql_test_env. This somehow
worked, but doesn't with clang 18.
Fix it by initializing to 0 (there isn't a good reason to delay it).
Also provide a default to make it safer.
Closesscylladb/scylladb#18572
* tools/cqlsh e5f5eafd...c8158555 (11):
> cqlshlib/sslhandling: fix logic of `ssl_check_hostname`
> cqlshlib/sslhandling.py: don't use empty userkey/usercert
> Dockerfile: noninteractive isn't enough for answering yet on apt-get
> fix cqlsh version print
> cqlshlib/sslhandling: change `check_hostname` deafult to False
> Introduce new ssl configuration for disableing check_hostname
> set the hostname in ssl_options.server_hostname when SSL is used
> issue-73 Fixed a bug where username and password from the credentials file were ignored.
> issue-73 Fixed a bug where username and password from the credentials file were ignored.
> issue-73
> github actions: update `cibuildwheel==v2.16.5`
Fixes: scylladb/scylladb#18590Closesscylladb/scylladb#18591
The code is based on similar idea as perf_simple_query. The main differences are:
- it starts full scylla process
- communicates with alternator via http (localhost)
- uses richer table schema with all dynamoDB types instead of only strings
Testing code runs in the same process as scylla so we can easily get various perf counters (tps, instr, allocation, etc).
Results on my machine (with 1 vCPU):
> ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null
...
median 23402.59616090321
median absolute deviation: 598.77
maximum: 24014.41
minimum: 19990.34
> ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null
...
median 16089.34211320635
median absolute deviation: 552.65
maximum: 16915.95
minimum: 14781.97
The above seem more realistic than results from perf_simple_query which are 96k and 49k tps (per core).
Related: https://github.com/scylladb/scylladb/issues/12518Closesscylladb/scylladb#13121
* github.com:scylladb/scylladb:
test: perf: alternator: add option to skip data pre-population
perf-alternator-workloads: add operations-per-shard option
test: perf: add global secondary indexes write workload for alternator
test: perf: add option to continue after failed request
test: perf: add read modify write workload for alternator (lwt)
test: perf: add scan workload for alternator
test: perf: add end-to-end benchmark for alternator
test: perf: extract result aggregation logic to a separate struct
in 906700d5, we accepted 0 as well as the return code of
"nodetool <command> --help", because we needed to be prepared for
the newer seastar submodule while be compatible with the older
seastar versions. now that in 305f1bd3, we bumped up the seastar
module, and this commit picked up the change to return 0 when
handling "--help" command line option in seastar, we are able to
drop the workaround.
so, in this change, we only use "0" as the expected return code.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18627
in the same spirit of d57a82c156, this change adds `dist-unified` as one of the default targets. so that it is built by default. the unified package is required to when redistributing the precompiled packages -- we publish the rpm, deb and tar balls to S3.
- [x] cmake related change, no need to backport
Closesscylladb/scylladb#18621
* github.com:scylladb/scylladb:
build: cmake: use paths to be compatible with CI
build: cmake build dist-unified by default
password_authenticator::create_default_if_missing() is a confusing mix of
coroutines and continuations, simplify it to a normal coroutine.
Closesscylladb/scylladb#18571
our CI workflow for publishing the packages expects the tar balls
to be located under `build/$buildMode/dist/tar`, where `$buildMode`
is "release" or "debug".
before this change, the CMake building system puts the tar balls
under "build/dist" when the multi-config generator is used. and
`configure.py` uses multi-config generator.
in this change, we put the tar balls for redistribution under
`build/$<CONFIG>/dist/tar`, where `$<CONFIG>` is "RelWithDebInfo"
or "Debug", this works better with the CI workflow -- we just need
to map "release" and "debug" to "RelWithDebInfo" and "Debug" respectively.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in the same spirit of d57a82c156, this change adds `dist-unified`
as one of the default targets. so that it is built by default.
the unified package is required to when redistributing the precompiled
packages -- we publish the rpm, deb and tar balls to S3.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Most of the time only main set is filled, so we can avoid one layer
of indirection (= compound set) when maintenance set is empty.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Currently empty storage_groups are allocated for tablets that are
not on this shard.
Allocate storage groups dynamically, i.e.:
- on table creation allocate only storage groups that are on this
shard;
- allocate a storage group for tablet that is moved to this shard;
- deallocate storage group for tablet that is cleaned up.
Stop compaction group before it's deallocated.
Add a flag to table::cleanup_tablet deciding whether to deallocate
sgs and use it in commitlog tests.
During compaction_group::cleanup sstables set is updated, but
row_cache::_underlaying still keeps a shared ptr to the old set.
Due to that descriptors to deleted sstables aren't closed.
Refresh snapshot in order to store new sstables set in _underlying
mutation source.
Add rwlock which prevents storage groups from being added/deleted
while some other layers itereates over them (or their compaction
groups).
Add methods to iterate over storage groups with the lock held.
In the following patches, storage groups (and so also sstables sets)
will be allocated only for tablets that are located on this shard.
Some layers may try to read non-existing sstable sets.
Handle this case as if the sstables set was empty instead of calling
on_internal_error.
If allow_write_both_read_old tablet transition stage fails, move
to cleanup_target stage before reverting migration.
It's a preparation for further patches which deallocate storage
group of a tablet during cleanup.
Pass compaction group id to
shard_reshaping_compaction_task_impl::reshape_compaction_group.
Modify table::as_table_state to return table_state of the given
compaction group.