Commit Graph

379 Commits

Author SHA1 Message Date
Lakshmi Narayanan Sreethar
8925a2c3cb replica/database: add an optional variant to get_keyspace_local_ranges
Add a new method database::maybe_get_keyspace_local_ranges that
optionally returns the owned ranges for the given keyspace if it has an
effective_replication_map for the entire keyspace.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-15 17:44:47 +05:30
Lakshmi Narayanan Sreethar
3b7b315f6a replica/database: quiesce compaction before closing system tables during shutdown
During shutdown, as all system tables are closed in parallel, there is a
possibility of a race condition between compaction stoppage and the
closure of the compaction_history table. So, quiesce all the compaction
tasks before attempting to close the tables.

Fixes #15721

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#17218
2024-02-15 09:44:16 +02:00
Patryk Wrobel
a3fb44cbca Rename keyspace::get_effective_replication_map()
This commit renames keyspace::get_effective_replication_map()
to keyspace::get_vnode_effective_replication_map(). This change
is required to ease the analysis of the usage of this function.

When tablets are enabled, this function must not be used.
The per-table replication map should be used instead of the per-keyspace one.
The rename was performed to distinguish between those two calls.
The next step will be an audit of usages of
keyspace::get_vnode_effective_replication_map().

Refs: scylladb#16626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17314
2024-02-13 20:22:02 +02:00
Kefu Chai
cfb2c2c758 db: add formatter for gc_clock::time_point
Before this change, we relied on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

In this change, we define a formatter for `gc_clock::time_point`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17254
2024-02-11 16:39:25 +02:00
Avi Kivity
7cb1c10fed treewide: replace seastar::future::get0() with seastar::future::get()
get0() dates back to the days when Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.
2024-02-02 22:12:57 +08:00
Kefu Chai
aad8035bed replica/database: use structured-bind when appropriate
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17104
2024-02-01 16:31:29 +02:00
Kefu Chai
7a8e8c2ced db: add formatter for db::write_type
Before this change, we relied on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

In this change, we define a formatter for `db::write_type`, and drop
its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17093
2024-02-01 10:22:45 +02:00
Pavel Emelyanov
7c5c89ba8d Revert "Merge 'Use utils::directories instead of db::config to get dirs' from Patryk Wróbel"
This reverts commit 370fbd346c, reversing
changes made to 0912d2a2c6.

The reverted change somehow makes scylla-manager misinterpret
data_file_directories; see issue #17078.
2024-01-31 15:08:14 +03:00
Patryk Wrobel
804afffb11 db/commitlog: do not use db::config to get dirs
This change removes the usage of db::config to
get the path of commitlog_directory. Instead, it
introduces a new parameter to pass the path
directly to db::commitlog::config::from_db_config().

Refs: scylladb#5626

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Patryk Wrobel
9483d149af Use utils::directories to get dirs paths in replica::database
This change replaces the usage of db::config with
the usage of utils::directories to get directory paths in
the replica::database class.

Moreover, it adjusts tests that require construction
of replica::database, because its constructor has been
changed to accept a utils::directories object.

Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Nadav Har'El
69a68e35dd Merge 'scylla-sstable: add support for loading schema of views and indexes' from Botond Dénes
Loading schemas of views and indexes was not supported, with either `--schema-file`, or when loading schema from schema sstables.
This PR addresses both:
* When loading schema from CQL (file), `CREATE MATERIALIZED VIEW` and `CREATE INDEX` statements are now also processed correctly.
* When loading schema from schema tables, `system_schema.views` is also processed, when the table has no corresponding entry in `system_schema.tables`.

Tests are also added.

Fixes: #16492

Closes scylladb/scylladb#16517

* github.com:scylladb/scylladb:
  test/cql-pytest: test_tools.py: add schema-loading tests for MV/SI
  test/cql-pytest: test_tools.py: extract some fixture logic to functions
  test/cql-pytest: test_tools.py: extract common schema-loading facilities into base-class
  tools/schema_loader: load_schema_from_schema_tables(): add support for MV/SI schemas
  tools/schema_loader: load_one_schema_from_file(): add support for view/index schemas
  test/boost/schema_loader_test: add test for mvs and indexes
  tools/schema_loader: load_schemas(): implement parsing views/indexes from CQL
  replica/database: extract existing_index_names and get_available_index_name
  tools/schema_loader: make real_db.tables the only source of truth on existing tables
  tools/schema_loader: table(): store const keyspace&
  tools/schema_loader: make database,keyspace,table non-movable
  cql3/statements/create_index_statement: build_index_schema(): include index metadata in returned value
  cql3/statements/create_index_statement: make build_index_schema() public
  cql3/statements/create_index_statement: relax some method's dependence on qp
  cql3/statements/create_view_statement: make prepare_view() public
2024-01-24 23:36:54 +02:00
Kefu Chai
33794eca19 database: wait until commitlog are reclaimed in flush_all_tables()
This change addresses possible data resurrection after the
"nodetool compact" and "nodetool flush" commands, and prepares for
the fix of a similar data resurrection issue after "nodetool cleanup".

Active commitlog segments are recycled in the background once they are
discarded.

So there is a chance of data resurrection even after
"nodetool cleanup", because the mutations in the commitlog's active
segments could modify the data that "nodetool cleanup" is supposed to
remove. As a solution to this problem in the pre-tablets era, we force
new active commitlog segments and flush the involved memtables. But
since the old active segments are discarded in the background, the
completion of "nodetool cleanup" does not guarantee that these mutations
won't be applied to the memtable when the server restarts, if it is
killed right away.

The same applies to the "force_flush", "force_compaction" and
"force_keyspace_compaction" API calls, which are also used by nodetool.
To quote Benny's comment:

> If major compaction doesn't wait for the commitlog deletion it is
> also exposed to data resurrection since theoretically it could purge
> tombstones based on the assumption that commitlog would not resurrect
> data that they might shadow, BUT on a crash/restart scenario commitlog
> replay would happen since the commitlog segments weren't deleted -
> breaking the contract with compaction.

So, to ensure that the active segments are reclaimed upon completion of
the "nodetool cleanup", "nodetool compact" and "nodetool flush" commands,
wait for pending deletes in `database::flush_all_tables()`, so that the
caller waits until the reclamation of the deleted active segments completes.

Refs #4734
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16915
2024-01-22 17:31:57 +02:00
Kefu Chai
5c0484cb02 db: add formatter for db::operation_type
Before this change, we relied on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

In this change, we define a formatter for db::operation_type, and
remove its operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16832
2024-01-19 10:16:41 +02:00
Kefu Chai
3f0fbdcd86 replica: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16810
2024-01-17 09:27:09 +02:00
Kefu Chai
344ea25ed8 db: add fmt::format for db::consistency_level
Before this change, we relied on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

In this change, we:

* define a formatter for `db::consistency_level`
* drop its `operator<<`, as it is not used anymore

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16755
2024-01-12 10:49:00 +02:00
Lakshmi Narayanan Sreethar
76f0d5e35b reader_permit: store schema_ptr instead of raw schema pointer
Store a schema_ptr in the reader permit instead of a const pointer to
the schema, to ensure that the schema doesn't get changed elsewhere while
the permit is holding on to it. Also update the constructors and all the
relevant callers to pass down a schema_ptr instead of a raw pointer.
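The difference between the two approaches can be sketched with `std::shared_ptr` standing in for Scylla's reference-counted `schema_ptr`. That substitution is an assumption, and `schema` and `reader_permit_sketch` are simplified hypothetical types. The point illustrated is that a counted reference keeps the schema object alive for as long as the permit holds it, which a raw `const schema*` cannot guarantee.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical minimal schema; the real one is far richer.
struct schema {
    std::string ks_name;
    std::string cf_name;
};
// Approximation of schema_ptr: a shared, read-only handle.
using schema_ptr = std::shared_ptr<const schema>;

// Sketch of a permit holding a counted reference instead of a raw pointer,
// so the schema cannot be destroyed while the permit is alive.
class reader_permit_sketch {
    schema_ptr _schema;
public:
    explicit reader_permit_sketch(schema_ptr s) : _schema(std::move(s)) {}
    const schema& get_schema() const { return *_schema; }
};
```

Even if every other owner drops its `schema_ptr`, the permit's copy keeps the object valid until the permit itself goes away.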

Fixes #16180

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#16658
2024-01-11 08:37:56 +02:00
Botond Dénes
94aac35169 replica/database: extract existing_index_names and get_available_index_name
Move them into standalone functions in index/secondary_index_manager.{hh,cc}.
This way, alternative data dictionary implementations (in
tools/schema_loader.cc) can also reuse this code without having to
instantiate a database or resort to copy-paste.

The functions are slightly changed: some additional params are
added to cover for things not internally available in the database
object, and const sstring& is converted to std::string_view.
2024-01-04 03:20:17 -05:00
Tomasz Grabiec
715e062d4a Merge 'table, memtable: share log structured allocator statistics across all tablets in a table' from Avi Kivity
In 7d5e22b43b ("replica: memtable: don't forget memtable
memory allocation statistics") we taught memtable_list to remember
learned memory allocation reserves, so a new memtable inherits these
statistics from an older memtable. Now share them further, across all
tablets that belong to the same table. This makes the statistics more
accurate for tablets that are migrated in, as they can share the existing
tablets' memory allocation history.

Closes scylladb/scylladb#16571

* github.com:scylladb/scylladb:
  table, memtable: share log-structured allocator statistics across all memtables in a table
  memtable: consolidate _read_section, _allocating_section in a struct
2024-01-03 14:03:40 +01:00
Benny Halevy
fadcef01f5 database: setup_scylla_memory_diagnostics_producer: replace infinity sign with unlimited string
The infinity Unicode sign, `∞`, used when dumping the read concurrency
semaphore state may be misrendered.
For example: https://jenkins.scylladb.com/job/scylla-master/job/dtest-release/451/artifact/logs-full.release.011/1703288463175_materialized_views_test.py%3A%3ATestMaterializedViews%3A%3Atest_add_dc_during_mv_insert/node1.log
```
  Read Concurrency Semaphores:
    user: 0/100, 1K/9M, queued: 0
    streaming: 0/10, 0B/9M, queued: 0
    system: 0/10, 0B/9M, queued: 0
    compaction: 0/∞, 0B/∞
```

Instead, just print the word `unlimited`.

The `∞` sign was introduced in 34c213f9bb.
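The following is a minimal sketch of the ASCII-only rendering. The helper `format_limit` is hypothetical. It also assumes that "no limit" is represented by the maximum value of the counter type, which may not match how the semaphore actually encodes an unlimited resource.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper: render a semaphore limit in plain ASCII.
// Assumption: the maximal value means "no limit".
std::string format_limit(std::uint64_t limit) {
    if (limit == std::numeric_limits<std::uint64_t>::max()) {
        return "unlimited";  // instead of U+221E, which some log viewers garble
    }
    return std::to_string(limit);
}
```

A line like `compaction: 0/∞` would then read `compaction: 0/unlimited`.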

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16534
2024-01-03 14:46:10 +02:00
Avi Kivity
02111d6754 memtable: consolidate _read_section, _allocating_section in a struct
Those two members are passed from memtable_list to memtable. Since we
wish to pass them from table, it becomes awkward to pass them as two
separate variables as their contents are specific to memtable internals.

Wrap them in a name that indicates their role (being table-wide shared
data for memtables) and pass them as a unit.
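The idea of one shared, table-wide bundle can be sketched as below. All names are hypothetical: `allocating_section_sketch` stands in for `logalloc::allocating_section` and models only a "largest reserve learned so far" counter. `std::shared_ptr` stands in for whatever ownership the real code uses. The sketch shows that a memtable created after sealing the previous one sees the reserves the old one learned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for logalloc::allocating_section: it only
// remembers the largest reserve it has had to learn.
struct allocating_section_sketch {
    std::size_t learned_reserve = 0;
    void note_allocation(std::size_t bytes) {
        learned_reserve = std::max(learned_reserve, bytes);
    }
};

// Hypothetical name for the table-wide shared data for memtables.
struct memtable_table_shared_data {
    allocating_section_sketch read_section;
    allocating_section_sketch allocating_section;
};

struct memtable_sketch {
    // Shared by all memtables of one table, so sealing a memtable and
    // creating a new one does not forget the learned reserves.
    std::shared_ptr<memtable_table_shared_data> shared;
    void apply_write(std::size_t bytes) {
        shared->allocating_section.note_allocation(bytes);
    }
};
```

Passing the two sections as a single unit also keeps their memtable-specific meaning out of the callers' signatures.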
2023-12-26 21:11:48 +02:00
Avi Kivity
a7efaca878 Merge 'Move initial_tablets to system_schema.scylla_keyspaces' from Pavel Emelyanov
Right now, initial_tablets is kept as a replication strategy option in the legacy system_schema.keyspaces table. However, r.s. options are all considered to be replication factors, not anything else. Besides being confusing, this also makes it impossible to extend keyspace configuration with non-integer tablets-related values.

This PR moves initial_tablets into the Scylla-specific part of the schema. This opens the way to more ~~ugly~~ flexible ways of configuring tablets for a keyspace; in particular, it should be possible to use a boolean on/off switch in CREATE KEYSPACE or some other trick we find appropriate.

Most of what this PR does is extend the arguments passed around keyspace_metadata and abstract_replication_strategy. The essence of the change is in the last patches:
* schema_tables: Relax extract_scylla_specific_ks_info() check
* locator,schema: Move initial tablets from r.s. options to params

refs: #16319
refs: #16364

Closes scylladb/scylladb#16555

* github.com:scylladb/scylladb:
  test: Add sanity tests for tablets initialization and altering
  locator,schema: Move initial tablets from r.s. options to params
  schema_tables: Relax extract_scylla_specific_ks_info() check
  locator: Keep optional initial_tablets on r.s. params
  ks_prop_defs: Add initial_tablets& arg to prepare_options()
  keyspace_metadata: Carry optional<initial_tablets> on board
  locator: Pass abstract_replication_strategy& into validate_tablet_options()
  locator: Carry r.s. params into process_tablet_options()
  locator: Call create_replication_strategy() with r.s. params
  locator: Wrap replication_strategy_config_options into replication_strategy_params
  locator: Use local members in ..._replication_strategy constructors
2023-12-25 17:44:10 +02:00
Pavel Emelyanov
562fcf0c19 locator: Keep optional initial_tablets on r.s. params
Now all the callers have it at hand (spoiler: not yet initialized, but
still), so the params can also have it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:02:41 +03:00
Pavel Emelyanov
a67c535539 keyspace_metadata: Carry optional<initial_tablets> on board
The object in question fully describes the keyspace to be created and,
among other things, contains replication strategy options. Next patches
move the "initial_tablets" option out of those options and keep it
separately, so the ks metadata should also carry this option separately.

This patch _just_ extends the metadata creation API. In fact, the
new field is unused (write-only), so all the places that need to provide
this data keep it disengaged and are explicitly marked with a FIXME
comment. Next patches will fix that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:58:05 +03:00
Pavel Emelyanov
a943bd927b locator: Call create_replication_strategy() with r.s. params
The previous patch added params to the r.s. classes' constructors, but callers
don't construct those directly; instead, they use the create_r.s.()
wrapper. This patch adds params to the wrapper too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:54:59 +03:00
Pavel Emelyanov
f621afa3ec database: Copy storage options too when updating keyspace metadata
When altering a keyspace, several keyspace_metadata objects are created
along the way. The last one, which is then kept on the keyspace_metadata
object, does not get its copy of the storage options and is thus silently
converted to the LOCAL type.

The bug surfaces when altering the replication strategy class for
S3-backed storage: the 2nd attempt fails because, after the 1st one,
the keyspace_metadata has LOCAL storage options, and changing storage
options is not allowed.
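The failure mode can be sketched with a heavily simplified, hypothetical model: `keyspace_metadata_sketch`, `storage_options` and `updated_metadata` are illustrations, not the real classes. Because a default-constructed `storage_options` means LOCAL, any rebuild step that forgets to copy the field silently changes the storage type. The marked line is the fix.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified model of keyspace metadata.
enum class storage_type { local, s3 };

struct storage_options {
    storage_type type = storage_type::local;  // default-constructed means LOCAL
    std::string bucket;
};

struct keyspace_metadata_sketch {
    std::string name;
    std::string strategy_class;
    storage_options storage;  // stays LOCAL unless explicitly copied
};

// Build the metadata kept after ALTER KEYSPACE changes the strategy class.
keyspace_metadata_sketch updated_metadata(const keyspace_metadata_sketch& old_md,
                                          const std::string& new_strategy) {
    keyspace_metadata_sketch md;
    md.name = old_md.name;
    md.strategy_class = new_strategy;
    md.storage = old_md.storage;  // the fix: carry the storage options over
    return md;
}
```

Without the marked line, a second ALTER would see LOCAL options and be rejected as a storage change, matching the symptom described above.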

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16524
2023-12-25 13:31:15 +02:00
Nadav Har'El
79011eeb24 Merge 'virtual_tables, schema_registry: fix use after free related to schema registry' from Avi Kivity
Both virtual tables and the schema registry contain thread_local caches that are destroyed
at thread exit. After a Seastar change [1], these destructions can happen after the reactor
is destroyed, triggering a use-after-free.

Fix by scoping the destruction so it takes place earlier.

[1] 101b245ed7

Closes scylladb/scylladb#16510

* github.com:scylladb/scylladb:
  schema_registry, database: flush entries when no longer in use
  virtual_tables: scope virtual tables registry in system_keyspace
2023-12-21 17:10:25 +02:00
Avi Kivity
c00b376a3e schema_registry, database: flush entries when no longer in use
The schema registry disarms internal timers when it is destroyed.
This accesses the Seastar reactor. However, after [1] we don't have ordering
between the reactor destruction and the thread_local registry destruction.

Fix this by flushing all entries when the database is destroyed. The
database object is fundamental so it's unlikely we'll have anything
using the registry after it's gone.

[1] 101b245ed7
2023-12-21 17:00:41 +02:00
Kefu Chai
6018e0fea7 database: log when done with truncating
Truncating is an unusual operation, and we already log an INFO-level
message when the truncate operation starts. It would be great to have
a matching log message indicating the end of the truncate on the server
side. This would help with investigating TRUNCATE timeouts spotted on
the client: at least we could rule out the problem happening while the
server is performing the truncate.

Refs #15610
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16247
2023-12-21 13:59:09 +02:00
Avi Kivity
7fce057cda database, reader_concurrency_sempaphore: deduplicate reader_concurrency_sempaphore metrics
reader_concurrency_semaphore metrics are triplicated: each metric is registered
for the streaming, user, and system classes.

To fix, just move the metrics registration from database to
reader_concurrency_semaphore, so each reader_concurrency_semaphore
instantiated will register its metrics (if its creator asked for it).

Adjust the names given to reader_concurrency_semaphore so we don't
change the labels.

scylla-gdb is adjusted to support the new names.
2023-12-13 09:16:18 -05:00
Botond Dénes
e1b30f50be reader_concurrency_semaphore: add register_metrics constructor parameter
To be used in the next patch to control whether the semaphore registers
and exports metrics or not. We want to move metric registration to the
semaphore but we don't want all semaphores to export metrics. The
decision on whether a semaphore should or shouldn't export metrics
should be made on a case-by-case basis so this new parameter has no
default value (except for the for_tests constructor).
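The following sketch shows per-instance registration controlled by a constructor flag with no default. Every name in it is hypothetical: `semaphore_sketch`, the global `metric_registry`, and the metric name `reads_admitted`. `register_metrics` is modeled as an `enum class` in the spirit of a strong boolean. Each semaphore registers its metrics once under its own name as the label, so nothing is registered three times.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical registry: "metric{label}" -> number of registrations.
std::map<std::string, int> metric_registry;

// Strong-boolean stand-in: call sites must spell out their choice.
enum class register_metrics { no, yes };

class semaphore_sketch {
public:
    // No default for `metrics`: each creator decides explicitly.
    semaphore_sketch(std::string name, register_metrics metrics) {
        if (metrics == register_metrics::yes) {
            // Hypothetical metric name, labelled with this semaphore's name.
            ++metric_registry["reads_admitted{class=" + name + "}"];
        }
    }
};
```

Requiring the argument forces a case-by-case decision at every construction site, which is the motivation given above for having no default value.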
2023-12-13 06:25:45 -05:00
Avi Kivity
814f3eb6b5 sstables: name sstables_manager
Soon, the reader_concurrency_semaphore will require a unique
and meaningful name in order to label its metrics. To prepare
for that, name sstables_manager instances. This will be used
to generate a name for sstables_manager's reader_concurrency_semaphore.
2023-12-13 04:40:33 -05:00
Pavel Emelyanov
3eaadfcd4a database: Do not auto snapshot non-local storages' tables
Snapshotting is not yet supported for those (see #13025), and
auto-snapshot would hit an internal error. Skip it and print a warning
to the log.

fixes #16078

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-07 13:47:12 +03:00
Pavel Emelyanov
44c076472c database: Simplify snapshot booleans in truncate_table_on_all_shards()
There are three of them in this function: the with_snapshot argument,
the auto_snapshot local copy of the db::config option, and the should_snapshot
local variable, which is the && of the other two. The code can make do with just one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-07 13:06:28 +03:00
Botond Dénes
d2a88cd8de Merge 'Typos: fix typos in code' from Yaniv Kaul
Fixes some more typos found by running codespell on the code. This commit fixes more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255

Closes scylladb/scylladb#16289

* github.com:scylladb/scylladb:
  Update unified/build_unified.sh
  Update main.cc
  Update dist/common/scripts/scylla-housekeeping
  Typos: fix typos in code
2023-12-06 07:36:41 +02:00
Yaniv Kaul
ae2ab6000a Typos: fix typos in code
Fixes some more typos found by running codespell on the code.
This commit fixes more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255
2023-12-05 15:18:11 +02:00
Benny Halevy
f40bb7c583 database: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Patryk Jędrzejczak
c8ee7d4499 db: make schema commitlog feature mandatory
Using consistent cluster management without schema commitlog
results in a bad configuration exception during bootstrap. Soon, we
will make consistent cluster management mandatory. This forces us
to also make schema commitlog mandatory, which we do in this patch.

A booting node decides to use schema commitlog if at least one of
the two statements below is true:
- the node has `force_schema_commitlog=true` config,
- the node knows that the cluster supports the `SCHEMA_COMMITLOG`
  cluster feature.

The `SCHEMA_COMMITLOG` cluster feature has been added in version
5.1. This patch is supposed to be a part of version 6.0. We don't
support a direct upgrade from 5.1 to 6.0 because it skips two
versions - 5.2 and 5.4. So, in a supported upgrade we can assume
that the version which we upgrade from has schema commitlog. This
means that we don't need to check the `SCHEMA_COMMITLOG` feature
during an upgrade.

The reasoning above also applies to Scylla Enterprise. Version
2024.2 will be based on 6.0. Probably, we will only support
an upgrade to 2024.2 from 2024.1, which is based on 5.4. But even
if we support an upgrade from 2023.x, this patch won't break
anything because 2023.1 is based on 5.2, which has schema
commitlog. Upgrades from 2022.x definitely won't be supported.

When we populate a new cluster, we can use the
`force_schema_commitlog=true` config to use schema commitlog
unconditionally. Then, the cluster feature check is irrelevant.
This check could fail because we initiate schema commitlog before
we learn about the features. The `force_schema_commitlog=true`
config is especially useful when we want to use consistent cluster
management. Failing feature checks would lead to crashes during
initial bootstraps. Moreover, there is no point in creating a new
cluster with `consistent_cluster_management=true` and
`force_schema_commitlog=false`. It would just cause some initial
bootstraps to fail, and after successful restarts, the result would
be the same as if we used `force_schema_commitlog=true` from the
start.

In conclusion, we can unconditionally use schema commitlog without
any checks in 6.0 because we can always safely upgrade a cluster
and start a new cluster.

Apart from making schema commitlog mandatory, this patch adds two
changes that are its consequences:
- making the unneeded `force_schema_commitlog` config unused,
- deprecating the `SCHEMA_COMMITLOG` feature, which is always
  assumed to be true.

Closes scylladb/scylladb#16254
2023-12-04 21:02:16 +02:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos found by running codespell on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Benny Halevy
66ba983fe0 compaction_manager: flush_all_tables before major compaction
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6ec6

However, tombstone purging may be inhibited by data
in commitlog segments based on `gc_time_min` in the
`tombstone_gc_state` (See f42eb4d1ce).

Flushing all tables in the database releases
all references to commitlog segments and thereby
maximizes the potential for tombstone purging,
which is typically the reason for running major compaction.

However, flushing all tables too frequently might
result in tiny sstables. Since, when flushing all
keyspaces using `nodetool flush`, the `force_keyspace_compaction`
API is invoked for each keyspace successively, we need a mechanism
to prevent too frequent flushes by major compaction.

Hence a `compaction_flush_all_tables_before_major_seconds` interval
configuration option is added (defaults to 24 hours).

In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.
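The interval check can be sketched as a small policy object. `major_flush_policy` and `should_flush_all` are hypothetical names. The sketch models only the rule described above: flush all tables if the last full flush is older than the configured interval, otherwise fall back to flushing just the compacted keyspace's tables. How the real option handles edge values such as 0 is not covered here.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

using namespace std::chrono_literals;
using clock_type = std::chrono::steady_clock;

// Hypothetical model of compaction_flush_all_tables_before_major_seconds.
class major_flush_policy {
    std::chrono::seconds _interval;
    std::optional<clock_type::time_point> _last_flush_all;
public:
    explicit major_flush_policy(std::chrono::seconds interval) : _interval(interval) {}

    // true: flush all tables now; false: flush only the keyspace's tables.
    bool should_flush_all(clock_type::time_point now) {
        if (!_last_flush_all || now - *_last_flush_all >= _interval) {
            _last_flush_all = now;
            return true;
        }
        return false;
    }
};
```

With the 24-hour default, compacting keyspaces one after another within a day triggers a single full flush, and each later keyspace only flushes its own tables.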

Fixes scylladb/scylladb#15777

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
be763bea34 database: add flush_all_tables
Flushes all tables after forcing a new active commitlog
segment (force_new_active_segment), to make sure all
commitlog segments can get recycled.

Otherwise, due to "false sharing", rarely-written tables
might inhibit recycling of the commitlog segments they reference.

Since f42eb4d1ce, such unrecycled segments prevent
compaction from purging some tombstones based on
the min_gc_time.

To be used in the next patch by major compaction.
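The "false sharing" effect can be illustrated with a toy model. `segment_sketch` and `commitlog_sketch` are hypothetical, and tables are plain integer ids. Each segment records which tables still have unflushed data in it and becomes recyclable only when that set is empty. A rarely-written table that shares a segment with a busy one therefore pins the segment until it, too, is flushed.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model: a segment lists the tables with unflushed data in it.
struct segment_sketch {
    std::set<int> dirty_tables;
    bool recyclable() const { return dirty_tables.empty(); }
};

struct commitlog_sketch {
    std::vector<segment_sketch> segments = std::vector<segment_sketch>(1);

    void write(int table) { segments.back().dirty_tables.insert(table); }
    // Start a new active segment so later writes stop landing in old ones.
    void force_new_active_segment() { segments.emplace_back(); }
    // Flushing a table's memtable drops its references to every segment.
    void flush(int table) {
        for (auto& s : segments) {
            s.dirty_tables.erase(table);
        }
    }
};
```

Forcing a new active segment first and then flushing every table empties all older segments, which is what allows them to be recycled.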

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Pavel Emelyanov
f2a99ad30a replica: Move storage options validation to sstables manager
Currently the cql statement .validate() callback is responsible for
checking if the non-local storage options are allowed with the
respective feature. Next patch will need to extend this check to also
validate the details of the provided storage options, but doing it at
cql level doesn't seem correct -- it's "too far" from query processor
down to sstables manager.

The good news is that there's a lower-level validation of the new keyspace,
namely the database::validate_new_keyspace() call. Move the storage
options validation into the sstables manager; while at it, reimplement it
as a visitor to facilitate further extensions, and plug the new
validation into the aforementioned database::validate_new_keyspace().
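Validation written as a visitor over storage option kinds might look like the following sketch. The types `local_options` and `s3_options`, the variant alias, and the specific checks are hypothetical, not the sstables manager's real definitions. With `std::visit` and one overload per alternative and no catch-all, adding a new storage kind fails to compile until a matching check is written. That property is one reason a visitor eases extension.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical storage option kinds.
struct local_options {};
struct s3_options {
    std::string endpoint;
    std::string bucket;
};
using storage_options = std::variant<local_options, s3_options>;

// One overload per storage kind; no generic fallback on purpose.
struct storage_options_validator {
    bool s3_feature_enabled;

    void operator()(const local_options&) const {}  // always allowed
    void operator()(const s3_options& o) const {
        if (!s3_feature_enabled) {
            throw std::invalid_argument("non-local storage is not enabled");
        }
        if (o.endpoint.empty() || o.bucket.empty()) {
            throw std::invalid_argument("S3 storage needs an endpoint and a bucket");
        }
    }
};

void validate_storage_options(const storage_options& so, bool s3_feature_enabled) {
    std::visit(storage_options_validator{s3_feature_enabled}, so);
}
```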

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 15:24:59 +03:00
Pavel Emelyanov
11b704e8b8 replica/{ks|cf}: Move storage init/destroy to sstables manager
It's the manager that knows about storages, and it should init/destroy
them. Also, the "upload" and "staging" paths are about to be hidden in
sstables/ code, and this code move also facilitates that.

The indentation in storage.cc is deliberately broken to make the next patch
look nicer (spoiler: it won't have to shift those lines right).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-08 20:23:16 +03:00
Pavel Emelyanov
68cf26587c database: Add get_sstables_manager(bool_class is_system) method
There's one place that does this selection, and soon another will
appear, so it's worth having a convenience helper getter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-08 20:23:16 +03:00
Kamil Braun
3976808b12 schema_tables: turn view schema fixing code into a sanity check
The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal
with legacy materialized view schemas used for secondary indexes,
schemas which were created before the notion of "computed columns" was
introduced. Back then, secondary index schemas would use a regular
"token" column. Later it became a computed column and old schemas would
be migrated during rolling upgrade.

The migration code was introduced in 2019
(db8d4a0cc6) and then fixed in 2020
(d473bc9b06).

The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming
that users don't try crazy things like upgrading from 2021.X to 2023.X
(which we do not support), all clusters will have already executed the
migration code once they upgrade to 2023.X, meaning we can get rid of
it.

The main motivation of this patch is to get rid of the
`db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft
mode this was the only call to `merge_schema` outside "group 0 code" and
in fact it is unsafe -- it uses locally generated mutations with locally
generated timestamp (`api::new_timestamp()`), so if we actually did it,
we would permanently diverge the group 0 state machine across nodes
(the schema pulling code is disabled in Raft mode). Fortunately, this
should be dead code by now, as explained in the previous paragraph.

The migration code is now turned into a sanity check: if users
try something crazy, they will get an error instead of silent data
corruption.
2023-10-24 13:33:35 +02:00
Botond Dénes
ad90bb8d87 replica/database: remove "streaming" from dirty memory metric description
We haven't had streaming memtables for a while now.

Closes scylladb/scylladb#15638
2023-10-20 13:09:57 +03:00
Patryk Jędrzejczak
fbcd667030 replica: keyspace::create_replication_strategy: remove a redundant parameter
The options parameter is redundant. We always use
`_metadata->strategy_options()` and
`keyspace::create_replication_strategy` already assumes that
`_metadata` is set by using its other fields.

Closes scylladb/scylladb#15776
2023-10-20 10:20:49 +03:00
Avi Kivity
7d5e22b43b replica: memtable: don't forget memtable memory allocation statistics
A memtable object contains two logalloc::allocating_section members
that track memory allocation requirements during reads and writes.
Because these are local to the memtable, each time we seal a memtable
and create a new one, these statistics are forgotten. As a result
we may have to re-learn the typical size of reads and writes, incurring
a small performance penalty.

The solution is to move the allocating_section object to the memtable_list
container. The workload is the same across all memtables of the same
table, so we don't lose discrimination here.

The performance penalty may increase later if we log changes to
memory reserve thresholds, including a backtrace, so this change also
reduces the odds of incurring such a penalty.

Closes scylladb/scylladb#15737
2023-10-18 17:43:33 +02:00
Kamil Braun
c1486fee40 Merge 'commitlog: drop truncation_records after replay' from Petr Gusev
This is a follow-up for #15279 and it fixes two problems.

First, we restore flushes on writes for the tables that were switched to the schema commitlog if the `SCHEMA_COMMITLOG` feature is not yet enabled. Otherwise, durability is not guaranteed.

Second, we address the problem with truncation records, which could refer to the old commitlog if any of the switched tables were truncated in the past. If the node crashes later, and we replay schema commitlog, we may skip some mutations since their `replay_position`s will be smaller than the `replay_position`s stored for the old commitlog in the `truncated` table.

It turned out that this problem exists even if we don't switch commitlogs for tables. If the node was rebooted, the segment ids will start from some small number: they use `steady_clock`, which is usually bound to boot time. This means that if the node crashes, we may skip mutations because their RPs will be smaller than the last truncation record RP.

To address this problem, we delete truncation records as soon as the commitlog is replayed. We also include a test that demonstrates the problem.
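The skip condition can be sketched as below. `replay_position` is simplified to a (segment id, offset) pair ordered lexicographically. `should_skip_on_replay` is a hypothetical helper that skips a mutation whose position is smaller than the table's truncation record, following the description above. The real replay filter may differ in detail.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <tuple>

// Simplified replay position: (segment id, offset), ordered lexicographically.
struct replay_position {
    std::uint64_t segment_id;
    std::uint32_t offset;
    bool operator<(const replay_position& o) const {
        return std::tie(segment_id, offset) < std::tie(o.segment_id, o.offset);
    }
};

// Hypothetical filter: a mutation below the truncation record is assumed
// to have been removed by the truncate, so replay skips it.
bool should_skip_on_replay(replay_position mutation_rp,
                           const std::optional<replay_position>& truncated_at) {
    return truncated_at && mutation_rp < *truncated_at;
}
```

After a reboot, a fresh write at segment 5 compares below a stale record at segment 1000 and would be wrongly skipped. Dropping the record once replay has finished removes that hazard.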

Fixes #15354

Closes scylladb/scylladb#15532

* github.com:scylladb/scylladb:
  add test_commitlog
  system.truncated: Remove replay_position data from truncated on start
  main.cc: flush only local memtables when replaying schema commitlog
  main.cc: drop redundant supervisor::notify
  system_keyspace: flush if schema commitlog is not available
2023-10-18 11:14:31 +02:00
Avi Kivity
f42eb4d1ce Merge 'Store and propagage GC timestamp markers from commitlog' from Calle Wilund
Fixes #14870

(Originally suggested by @avikivity.) Use the minimum GC clock positions stored in the commit log to narrow compaction GC bounds.
(This still requires augmented manual flushes with extensive CL clearing to pass various dtests, but this does not affect "real" execution.)

Records the lowest GC clock timestamp whenever a CF is added to a CL segment for the first time. Because the GC clock is wall
clock time and only connected to TTL (not cell/row timestamps), this gives a fairly accurate view of GC low bounds
per segment. This is then (in a rather ugly way) propagated to tombstone_gc_state to narrow the allowed GC bounds for
a CF, based on what is currently left in the CL.

Note: this is a rather unoptimized version, with no caching or anything. But even so, it should not be excessively expensive,
especially since various other code paths already cache the results.
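The narrowing step amounts to taking a minimum. In this sketch, `gc_time` is plain `std::chrono::seconds` standing in for `gc_clock::time_point`. `segment_gc_info` and `narrowed_gc_before` are hypothetical. Each live segment optionally carries the lowest GC clock value recorded when the CF first appeared in it, and the allowed bound becomes the minimum of the regular bound and all of those values.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <optional>
#include <vector>

using gc_time = std::chrono::seconds;  // stand-in for gc_clock::time_point

// Hypothetical per-segment info for one CF.
struct segment_gc_info {
    std::optional<gc_time> min_gc_time_for_cf;  // empty if the CF is absent
};

// The bound can only move down: any segment still holding the CF caps it.
gc_time narrowed_gc_before(gc_time regular_gc_before,
                           const std::vector<segment_gc_info>& live_segments) {
    gc_time bound = regular_gc_before;
    for (const auto& seg : live_segments) {
        if (seg.min_gc_time_for_cf) {
            bound = std::min(bound, *seg.min_gc_time_for_cf);
        }
    }
    return bound;
}
```

Once all segments referencing the CF are recycled, the bound reverts to the regular one. This is why the flush-and-recycle changes elsewhere in this log matter for tombstone purging.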

Closes scylladb/scylladb#15060

* github.com:scylladb/scylladb:
  main/cql_test_env: Augment compaction mgr tombstone_gc_state with CL GC info
  tombstone_gc_state: Add optional callback to augment GC bounds
  commitlog: Add keeping track of approximate lowest GC clock for CF entries
  database: Force new commitlog segment on user initiated flush
  commitlog: Add helper to force new active segment
2023-10-17 18:27:43 +03:00
Petr Gusev
a0aee54f2c add test_commitlog
Check that commitlog provides durability in case
of a node reboot:
* truncate table T, truncation_record RP=1000;
* clean shutdown node/reboot machine/restart node, now RP=~0
since segment ids count from boot time;
* write some data to T; crash/restart
* check data is retained
2023-10-17 18:16:50 +04:00