When a table is dropped it should wait for all pending operations in the
table before the table is destroyed, because the operations may use the
table's resources.
With counter update operations, currently this is not the case. The
table may be destroyed while there is a counter update operation in
progress, causing an assert to be triggered due to a resource being
destroyed while it's in use.
The reason the operation is not waited for is a mistake in the lifetime
management of the object representing the write in progress. The commit
fixes it so the object lives for the duration of the entire counter
update operation, by moving it to the `do_with` list.
Fixesscylladb/scylla-enterprise#4475Closesscylladb/scylladb#20017
In the case of the user semaphore, this receives the new
reader_concurrency_semaphore_cpu_limit config item.
Not used yet.
(cherry picked from commit 59faa6d4ff)
This config item is propagated to the table object via table::config.
Although the field in table::config, used to propagate the value, was
utils::updateable_value<T>, it was assigned a constant and so the
live-update chain was broken.
This patch fixes this.
(cherry picked from commit dbccb61636)
Separate keyspace which also behaves as system brings
little benefit while creating some compatibility problems
like schema digest mismatch during rollback. So we decided
to move auth tables into system keyspace.
Fixes https://github.com/scylladb/scylladb/issues/18098Closesscylladb/scylladb#18769
(cherry picked from commit 2ab143fb40)
[avi: adjust test/alternator/suite.yaml to reflect new keyspace]
Currently, we do not explicitly set a scheduling group for the schema
commitlog which causes it to run in the default scheduling group (called
"main"). However:
- It is important and significant enough that it should run in a
scheduling group that is separate from the main one,
- It should not run in the existing "commitlog" group as user writes may
sometimes need to wait for schema commitlog writes (e.g. read barrier
done to learn the schema necessary to interpret the user write) and we
want to avoid priority inversion issues.
Therefore, introduce a new scheduling group dedicated to the schema
commitlog.
Fixes: scylladb/scylladb#15566Closesscylladb/scylladb#18715
User-defined types can depend on each other, creating directed acyclic graph.
In order to support restoring schema from `DESC SCHEMA`, UDTs should be
ordered topologically, not alphabetically as it was till now.
This patch changes the way UDTs are ordered in `DESC SCHEMA`/`DESC KEYSPACE <ks>` statements, so the output can be safely copy-pasted to restore the schema.
Fixes#18539Closesscylladb/scylladb#18302
* github.com:scylladb/scylladb:
test/cql-pytest/test_describe: add test for UDTs ordering
cql3/statements/describe_statement: UDTs topological sorting
cql3/statements/describe_statement: allow to skip alphabetical sorting
types: add a method to get all referenced user types
db/cql_type_parser: use generic topological sorting
db/cql_type_parses: futurize raw_builder::build()
test/boost: add test for topological sorting
utils: introduce generic topological sorting algorithm
PR https://github.com/scylladb/scylladb/pull/18186 introduced a fiber that reloads reclaimed bloom filters when memory becomes available. Use maintenance scheduling group to run that fiber instead of running it in the main scheduling group.
Fixes#18675Closesscylladb/scylladb#18721
* github.com:scylladb/scylladb:
sstables_manager: use maintenance scheduling group to run components reload fiber
sstables_manager: add member to store maintenance scheduling group
Store that maintenance scheduling group inside the sstables_manager. The
next patch will use this to run the components reloader fiber.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
When a tablet is migrated away, any inactive read which might be reading from said tablet, has to be dropped. Otherwise these inactive reads can prevent sstables from being removed and these sstables can potentially survive until the tablet is migrated back and resurrect data.
This series introduces the fix as well as a reproducer test.
Fixes: https://github.com/scylladb/scylladb/issues/18110Closesscylladb/scylladb#18179
* github.com:scylladb/scylladb:
test: add test for cleaning up cached querier on tablet migration
querier: allow injecting cache entry ttl by error injector
replica/table: cleanup_tablet(): clear inactive reads for the tablet
replica/database: introduce clear_inactive_reads_for_tablet()
replica/database: introduce foreach_reader_concurrency_semaphore
reader_concurrency_semaphore: add range param to evict_inactive_reads_for_table()
reader_concurrency_semaphore: allow storing a range with the inactive reader
reader_concurrency_semaphore: avoid detach() in inactive_read_handle::abandon()
Currently we have a single method -- detach_column_family() -- which
does something with each semaphore. Soon there will be another one.
Introduce a method to do something with all semaphores, to make this
smoother. Enterprise has a different set of semaphores, and this will
reduce friction.
The populate_views() and generate_and_propagate_view_updates() both naturally belong to view_update_generator -- they don't need anything special from table itself, but rather depend on some internals of the v.u.generator itself.
Moving them there lets removing the view concurrency semaphore from keyspace and table, thus reducing the cross-components dependencies.
Closesscylladb/scylladb#18421
* github.com:scylladb/scylladb:
replica: Do not carry view concurrency semaphore pointer around
view: Get concurrency semaphore via database, not table
view_update_generator: Mark mutate_MV() private
view: Move view_update_generator methods' code
view: Move table::generate_and_propagate_view_updates into view code
view: Move table::populate_views() into view_update_generator class
There are two places that get total:live stats for a table snapshot --
database::get_snapshot_details() and table::get_snapshot_details(). Both
do pretty similar thing -- walk the table/snapshots/ directory, then
each of the found sub-directory and accumulate the found files' sizes
into snapshot details structure.
Both try to tell total from live sizes by checking whether an sstable
component found in snapshots is present in the table datadir. The
database code does it in a more correct way -- not just checks the file
presense, but also compares if it's a hardlink on the snapshot file,
while the table code just checks if the file of the same name exists.
This patch does both -- makes both database and table call the same
helper method for a single snapshot details, and makes the generalized
version use more elaborated collision check, thus fixing the per-table
details getting behavior.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18347
There's a database::get_snapshot_details() method that returns collection of all snapshots for all ks.cf out there and there are several *snapshot_details* aux structures around it. This PR keeps only one "details" and cleans up the way it propagates from database up to the respective API calls.
Closesscylladb/scylladb#18317
* github.com:scylladb/scylladb:
snapshot_ctl: Brush up true_snapshots_size() internals
snapshot_ctl: Remove unused details struct
snapshot_ctl: No double recoding of details
database,snapshots: Move database::snapshot_details into snapshot_ctl
database,snapshots: Make database::get_snapshot_details() return map, not vector
table,snapshots: Move table::snapshot_details into snapshot_ctl
The semaphore in question is used to limit parallelism of manipulations with table's sstables. It's currently used in two places -- sstable_directory (mainly on boot) and by table::take_snapshot() to take snapshot. For the latter, there's also a database -> sharded<directory_semaphore> reference.
This PR sanitizes the semaphore usage. The results are
- directory_semaphore no longer needs to friend several classes that mess with its internals
- database no longer references directory_semaphore
Closesscylladb/scylladb#18281
* github.com:scylladb/scylladb:
database: Keep local directory_semaphore to initialize sstables managers
database: Don't reference directory_semaphore
table: Use directory semaphore from sstables manager
table: Indentation fix after previous patch
table: Use directory_semaphore for rate-limited snapshot taking
sstables: Move directory_semaphore::parallel_for_each() to header
sstables: Move parallel_for_each_restricted to directory_semaphore
table: Use smp::all_cpus() to iterate over all CPUs locally
So that it's in-sync with table::get_snapshot_details(). Next patches
will improve this place even further.
Also, there can be many snapshots and vector can grow large, but that's
less of an issue here.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
since we do not rely on FMT_DEPRECATED_OSTREAM to define the
fmt::formatter for us anymore, let's stop defining `FMT_DEPRECATED_OSTREAM`.
in this change,
* utils: drop the range formatters in to_string.hh and to_string.c, as
we don't use them anymore. and the tests for them in
test/boost/string_format_test.cc are removed accordingly.
* utils: use fmt to print chunk_vector and small_vector. as
we are not able to print the elements using operator<< anymore
after switching to {fmt} formatters.
* test/boost: specialize fmt::details::is_std_string_like<bytes>
due to a bug in {fmt} v9, {fmt} fails to format a range whose
element type is `basic_sstring<uint8_t>`, as it considers it
as a string-like type, but `basic_sstring<uint8_t>`'s char type
is signed char, not char. this issue does not exist in {fmt} v10,
so, in this change, we add a workaround to explicitly specialize
the type trait to assure that {fmt} format this type using its
`fmt::formatter` specialization instead of trying to format it
as a string. also, {fmt}'s generic ranges formatter calls the
pair formatter's `set_brackets()` and `set_separator()` methods
when printing the range, but operator<< based formatter does not
provide these method, we have to include this change in the change
switching to {fmt}, otherwise the change specializing
`fmt::details::is_std_string_like<bytes>` won't compile.
* test/boost: in tests, we use `BOOST_REQUIRE_EQUAL()` and its friends
for comparing values. but without the operator<< based formatters,
Boost.Test would not be able to print them. after removing
the homebrew formatters, we need to use the generic
`boost_test_print_type()` helper to do this job. so we are
including `test_utils.hh` in tests so that we can print
the formattable types.
* treewide: add "#include "utils/to_string.hh" where
`fmt::formatter<optional<>>` is used.
* configure.py: do not define FMT_DEPRECATED_OSTREAM
* cmake: do not define FMT_DEPRECATED_OSTREAM
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.
Refs scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Now database is constructed with sharded<directory_semaphore>, but it no
longer needs sharded, local is enough.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It was only used by table taking snapshot code. Now it uses sstables
manager's reference and database can stop carrying it around.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
sstables_manager now depends on system_keyspace for access to the
system.sstables table, needed by object storage. This violates
modularity, since sstables_manager is a relatively low-level leaf
module while system_keyspace integrates large parts of the system
(including, indirectly, sstables_manager).
One area where this is grating is sstables::test_env, which has
to include the much higher level cql_test_env to accommodate it.
Fix this by having sstables_manager expose its dependency on
system_keyspace as an interface, sstables_registry, and have
system_keyspace implement the glue logic in
system_keyspace_sstables_manager.
Closesscylladb/scylladb#17868
New keyspace is added similarly as system_schema keyspace,
it's being registred via system_keyspace::make which calls
all_tables to build its schema.
Dummy table 'roles' is added as keyspaces are being currently
registered by walking through their tables. Full table schemas
will be added in subsequent commits.
Change can be observed via cqlsh:
cassandra@cqlsh> describe keyspaces;
system_auth_v2 system_schema system system_distributed_everywhere
system_auth system_distributed system_traces
cassandra@cqlsh> describe keyspace system_auth_v2;
CREATE KEYSPACE system_auth_v2 WITH replication = {'class': 'LocalStrategy'} AND durable_writes = true;
CREATE TABLE system_auth_v2.roles (
role text PRIMARY KEY
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
AND comment = 'comment'
AND compaction = {'class': 'SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 604800
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
This patch changes get_unlimited_query_max_result_size():
* Also set the page-size field, not just the soft/hard limits
* Renames it to get_query_max_result_size()
* Update callers, specifically storage_proxy::get_max_result_size(),
which now has a much simpler common return path and has to drop the
page size on one rare return path.
This is a purely mechanical change, no behaviour is changed.
Add a new method database::maybe_get_keyspace_local_ranges that
optionally returns the owned ranges for the given keyspace if it has a
effective_replication_map for the entire keyspace.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
During shutdown, as all system tables are closed in parallel, there is a
possibility of a race condition between compaction stoppage and the
closure of the compaction_history table. So, quiesce all the compaction
tasks before attempting to close the tables.
Fixes#15721
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closesscylladb/scylladb#17218
This commit renames keyspace::get_effective_replication_map()
to keyspace::get_vnode_effective_replication_map(). This change
is required to ease the analysis of the usage of this function.
When tablets are enabled, then this function shall not be used.
Instead of per-keyspace, per-table replication map should be used.
The rename was performed to distinguish between those two calls.
The next step will be an audit of usages of
keyspace::get_vnode_effective_replication_map().
Refs: scylladb#16626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#17314
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `gc_clock::time_point`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17254
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.
Replace with seastar::future::get(), which does the same thing.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `db::write_type`, and drop
its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17093
This reverts commit 370fbd346c, reversing
changes made to 0912d2a2c6.
This makes scylla-manager mis-interpret the data_file_directories
somehow, issue #17078
This change removes usage of db::config to
get path of commitlog_directory. Instead, it
introduces a new parameter to directly pass
the path to db::commitlog::config::from_db_config().
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change replaces the usage of db::config with
usage of utils::directories to get dirs paths in
replica::database class.
Moreover, it adjusts tests that require construction
of replica::database - its constructor has been
changed to accept utils::directories object.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Loading schemas of views and indexes was not supported, with either `--schema-file`, or when loading schema from schema sstables.
This PR addresses both:
* When loading schema from CQL (file), `CREATE MATERIALIZED VIEW` and `CREATE INDEX` statements are now also processed correctly.
* When loading schema from schema tables, `system_schema.views` is also processed, when the table has no corresponding entry in `system_schema.tables`.
Tests are also added.
Fixes: #16492Closesscylladb/scylladb#16517
* github.com:scylladb/scylladb:
test/cql-pytest: test_tools.py: add schema-loading tests for MV/SI
test/cql-pytest: test_tools.py: extract some fixture logic to functions
test/cql-pytest: test_tools.py: extract common schema-loading facilities into base-class
tools/schema_loader: load_schema_from_schema_tables(): add support for MV/SI schemas
tools/schema_loader: load_one_schema_from_file(): add support for view/index schemas
test/boost/schema_loader_test: add test for mvs and indexes
tools/schema_loader: load_schemas(): implement parsing views/indexes from CQL
replica/database: extract existing_index_names and get_available_index_name
tools/schema_loader: make real_db.tables the only source of truth on existing tables
tools/schema_loader: table(): store const keyspace&
tools/schema_loader: make database,keyspace,table non-movable
cql3/statements/create_index_statement: build_index_schema(): include index metadata in returned value
cql3/statements/create_index_statement: make build_index_schema() public
cql3/statements/create_index_statement: relax some method's dependence on qp
cql3/statements/create_view_statement: make prepare_view() public
this change addresses the possible data resurrection after
"nodetool compact" and "nodetool flush" commands. and prepare for
the fix of a similar data resurrection issue after "nodetool cleanup".
active commitlog segments are recycled in the background once they are
discarded.
and there is a chance that we could have data resurrection even after
"nodetool cleanup", because the mutations in commitlog's active segments
could change the tables which are supposed to be removed by
"nodetool cleanup", so as a solution to address this problem in the
pre-tablets era, we force new active segments of commitlog, and flush the
involved memtables. since the active segments are discarded in the
background, the completion of the "nodetool cleanup" does not guarantee
that these mutation won't be applied to memtable when server restarts,
if it is killed right away.
the same applies to "force_flush", "force_compaction" and
"force_keyspace_compaction" API calls which are used by nodetool as
well. quote from Benny's comment
> If major comapction doesn't wait for the commitlog deletion it is
> also exposed to data resurrection since theoretically it could purge
> tombstones based on the assumption that commitlog would not resurrect
> data that they might shadow, BUT on a crash/restart scenario commitlog
> replay would happen since the commitlog segments weren't deleted -
> breaking the contract with compaction.
so to ensure that the active segments are reclaimed upon completion of
"nodetool cleanup", "nodetool compact" and "nodetool flush" commands,
let's wait for pending deletes in `database::flush_all_tables()`, so the
caller wait until the reclamation of deleted active segments completes.
Refs #4734
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16915
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for db::operation_type, and
remove their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16832
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we
* define a formatter for `db::consistency_level`
* drop its `operator<<`, as it is not used anymore
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16755
Store schema_ptr in reader permit instead of storing a const pointer to
schema to ensure that the schema doesn't get changed elsewhere when the
permit is holding on to it. Also update the constructors and all the
relevant callers to pass down schema_ptr instead of a raw pointer.
Fixes#16180
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closesscylladb/scylladb#16658
To standalone functions in index/secondary_index_manager.{hh,cc}. This
way, alternative data dictionary implementations (in
tools/schema_loader.cc), can also re-use this code without having to
instantiate a database or resorting to copy-paste.
The functions are slighly changed: there are some additional params
added to cover for things not internally available in the database
object. const sstring& is converted to std::string_view.
In 7d5e22b43b ("replica: memtable: don't forget memtable
memory allocation statistics") we taught memtable_list to remember
learned memory allocation reserves so a new memtable inherits these
statistics from an older memtable. Share it now further across tablets
that belong to the same table as well. This helps the statistics be more
accurate for tablets that are migrated in, as they can share existing
tablet's memory allocation history.
Closesscylladb/scylladb#16571
* github.com:scylladb/scylladb:
table, memtable: share log-structured allocator statistics across all memtables in a table
memtable: consolidate _read_section, _allocating_section in a struct
Those two members are passed from memtable_list to memtable. Since we
wish to pass them from table, it becomes awkward to pass them as two
separate variables as their contents are specific to memtable internals.
Wrap them in a name that indicates their role (being table-wide shared
data for memtables) and pass them as a unit.