Commit Graph

50 Commits

Author SHA1 Message Date
Pavel Emelyanov
6c115c691f sstables_loader: Provide endpoint type for get_sstables_from_object_store()
Currently the method scans db::config to find one. It has some
drawbacks. First, it's not very nice. Second, it needs to handle the
case when the endpoint is missing, while it really never is. Third, the
type in config entry is not necessarily set.

It's nicer to get the type from storage manager.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-12-02 11:18:32 +03:00
Botond Dénes
86ed627fc4 compaction: move code to namespace compaction
The namespace usage in this directory is very inconsistent, with files
and classes scattered in:
* global namespace
* namespace compaction
* namespace sstables

With cases where all three are used in the same file. This code used to
live in sstables/ and some of it still retains namespace sstables as a
heritage of that time. The mismatch between the dir (future module) and
the namespace used is confusing, so finish the migration and move all
code in compaction/ to namespace compaction too.

This patch, although large, is mechanical and only the following kinds of
changes are made:
* replace namespace sstable {} with namespace compaction {}
* add namespace compaction {}
* drop/add sstables::
* drop/add compaction::
* move around forward-declarations so they are in the correct namespace
  context

This refactoring revealed some awkward leftover coupling between
sstables and compaction, in sstables/sstable_set.cc, where the
make_sstable_set() methods of compaction strategies are implemented.
2025-09-25 15:03:56 +03:00
Pavel Emelyanov
a1ea553fe1 code: Replace distributed<> with sharded<>
The latter is recommended in seastar, and the former was left as
compatibility alias. Latest seastar explicitly marks it as deprecated so
once the submodule is updated, compilation logs will explode.

Most of the patch is generated with

    for f in $(git grep -l '\<distributed<[A-Za-z0-9:_]*>') ; do sed -e 's/\<distributed<\([A-Za-z0-9:_]*\)>/sharded<\1>/g' -i $f; done
    for f in $(git grep -l distributed.hh); do sed -e 's/distributed.hh/sharded.hh/' -i $f ; done

and a small manual change in test/perf/perf.hh

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#26136
2025-09-19 12:22:51 +02:00
Michał Jadwiszczak
233f4dcee3 db/view/view_building_worker: register staging sstable to view building coordinator when needed
Change the return type of `check_needs_view_update_path()`. Previously it
returned a bool telling whether to use the staging directory (and register
to `view_update_generator`) or the normal directory.

Now the function returns an enum with possible values:
- `normal_directory` - use normal directory for the sstable
- `staging_directly_to_generator` - use staging directory and register
      to `view_update_generator`
- `staging_managed_by_vbc` - use staging directory but don't register it
      to `view_update_generator` but create view building tasks for
      later

The third option is new. It's used when the table has any view that is
currently being built. In this case, registering it to `view_update_generator`
prematurely may lead to base-view inconsistency
(for example when a replica is in a pending state).
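
As a rough illustration of the three-way result, here is a minimal C++ sketch. The enum name `sstable_destination`, the `table_view_state` struct and the decision rule in `choose_destination()` are assumptions made for this example. Only the three enumerator names and the "any view is being built" condition come from the message above; the real function in the tree may consider more inputs.

```cpp
// Hypothetical enum name; the three enumerators follow the commit message.
enum class sstable_destination {
    normal_directory,               // use the normal directory
    staging_directly_to_generator,  // staging + register to view_update_generator
    staging_managed_by_vbc,         // staging + view building tasks created for later
};

// Assumed inputs for the sketch: whether the table has any views,
// and whether at least one of them is still being built.
struct table_view_state {
    bool has_views = false;
    bool any_view_building = false;
};

// Simplified decision rule (an assumption, not the real implementation).
constexpr sstable_destination choose_destination(const table_view_state& s) {
    if (!s.has_views) {
        return sstable_destination::normal_directory;
    }
    if (s.any_view_building) {
        // Registering now could be premature, so defer to the coordinator.
        return sstable_destination::staging_managed_by_vbc;
    }
    return sstable_destination::staging_directly_to_generator;
}
```

Compared with a bool, the enum makes the new "staging, but do not register yet" case explicit at every call site.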
2025-08-27 10:23:03 +02:00
Benny Halevy
b01524c5a3 replica: distributed_loader: stop tracking highest_generation
It is not needed anymore as we always generate
uuid generations.

Move highest_generation_seen(sharded<sstables::sstable_directory>& directory)
to sstables/sstable_directory module.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-08 11:46:21 +03:00
Robert Bindar
ca1a9c8d01 Add support for nodetool refresh --skip-reshape
This patch adds the new option in nodetool, patches the
load_new_ss_tables REST request with a new parameter and
skips the reshape step in refresh if this flag is passed.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>

Closes scylladb/scylladb#24409
Fixes: #24365
2025-06-10 12:52:13 +03:00
Pavel Emelyanov
4ab049ac8d code: Push bool skip_cleanup flag around
Just put the boolean into the callstack between API and distributed
loader to reduce the churn in the next patches. No functional changes,
flag is false and unused.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-05-13 16:51:21 +03:00
Pavel Emelyanov
b5a124f60c sstable_directory: Move highest_generation_seen() to distributed_loader.cc
This method is only used by the loader code (and tests). Also, the
highest_version_seen() peer sits in the loader code too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23324
2025-04-01 09:15:14 +03:00
Kefu Chai
57b14220ce tree: remove unused "#include"s
these unused includes were identified by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.

in addition, `readers/mutation_reader.hh` used `using seastar::future`
instead of including `seastarx.hh` to bring `future` into the global
namespace. this made `readers/mutation_reader.hh` a header exposing
`future<>`, which is not a good practice because, unlike `seastarx.hh` or
`seastar/core/future.hh`, `readers/mutation_reader.hh` is not
responsible for exposing seastar declarations. so, we trade the
using statement for `#include "seastarx.hh"` in that file, so that source
files including it no longer depend on that statement.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22439
2025-01-28 14:12:06 +03:00
Pavel Emelyanov
bb094cc099 Merge 'Make restore task abortable' from Calle Wilund
Fixes #20717

Enables abortable interface and propagates abort_source to all s3 objects used for reading the restore data.

Note: because restore is done on each shard, we have to maintain a per-shard abort source proxy for each, and do a background per-shard abort on abort call. This is synced at the end of "run()".

Abort source is added as an optional parameter to s3 storage and the s3 path in distributed loader.

There is no attempt to "clean up" an aborted restore. As we read on a mutation level from remote sstables, we should not cause incomplete sstables as such, even though we might end up of course with partial data restored.

Closes scylladb/scylladb#21567

* github.com:scylladb/scylladb:
  test_backup: Add restore abort test case
  sstables_loader: Make restore task abortable
  distributed_loader: Add optional abort_source to get_sstables_from_object_store
  s3_storage: Add optional abort_source to params/object
  s3::client: Make "readable_file" abortable
2024-12-19 12:23:33 +03:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Calle Wilund
6a2a18a2fc distributed_loader: Add optional abort_source to get_sstables_from_object_store
2024-12-02 12:30:24 +00:00
Kefu Chai
787ea4b1d4 treewide: accept list of sstables in "restore" API
before this change, we enumerate the sstables tracked by the
system.sstables table, and restore them when serving
requests to "storage_service/restore" API. this works fine with
"storage_service/backup" API. but this "restore" API cannot be
used as a drop-in replacement of the rclone based API currently
used by scylla-manager.

in order to fill the gap, in this change:

* add the "prefix" parameter for specifying the shared prefix of
  sstables
* add the "sstables" parameter for specifying the list of TOC
  components of sstables
* remove the "snapshot" parameter, as we don't encode the prefix
  on scylla's end anymore.
* make the "table" parameter mandatory.

Fixes scylladb/scylladb#20461
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-10-01 23:24:56 +08:00
Pavel Emelyanov
11a04bfb66 code: Introduce restore API method
The method starts a task that uses sstables_loader load-and-stream
functionality to bring new sstables into the cluster. The existing
load-and-stream picks up sstables from upload/ directory, the newly
introduced task collects them from S3 bucket and given prefix (that
corresponds to the path where backup API method put them).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-28 15:42:49 +03:00
Pavel Emelyanov
6a006d2255 distributed_loader: Split get_sstables_from_upload_dir()
Next patches will need this method to initialize sstable_directory
differently and then do its regular processing. For that, split the
method into two, next patch will re-use the common part it needs.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:41 +03:00
Pavel Emelyanov
66d72e010c distributed_loader: Lock table via global table ptr
The lock_table() method needs database, ks and cf to find the table on
all shards. The same can be achieved with the help of global_table_ptr
thing that all the core callers already have at hand.

There's a test that doesn't have global table, but it can get one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20139
2024-08-14 20:53:21 +03:00
Calle Wilund
d6742e9bce distributed_loader: Remove load_prio_keyspaces
Fixes #13334

All required code paths (see enterprise) now use
extensions::is_extension_internal_keyspace.
The old mechanism can be removed. One less global var.

Closes scylladb/scylladb#20047
2024-08-08 12:10:27 +03:00
Pavel Emelyanov
b728857954 distributed_loader: Remove system_distributed_keyspace and view_update_generator
Now all the code is happy with view_builder and can be shortened

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:41:47 +03:00
Pavel Emelyanov
0d946a5fdf distributed_loader: Propagate view_builder& via process_upload_dir()
Preparation to next patches, they'll make use of this new argument

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:32:28 +03:00
Petr Gusev
b70bca71bc system_keyspace: move load_truncation_times into distributed_loader::populate_keyspace
load_truncation_times() now works only for
schema tables since the rest is not loaded
until distributed_loader::init_non_system_keyspaces.
An attempt to call cf.set_truncation_time
for non-system table just throws an exception,
which is caught and logged with debug level.
This means that the call cf.get_truncation_time in
paxos_state.cc has never worked as expected.

To fix that we move load_truncation_times()
closer to the point where the tables are loaded.
The function distributed_loader::populate_keyspace is
called for both system and non-system tables. Once
the tables are loaded, we use the 'truncated' table
to initialize _truncated_at field for them.

The truncation_time check for schema tables is also moved
into populate_keyspace since it seems like a more natural
place for it.
2023-10-05 15:19:52 +04:00
Benny Halevy
87d438b234 distributed_loader: populate_keyspace: iterate over datadirs in the inner loop
It is more efficient to iterate over multiple data directories
in the inner loop rather than the outer loop.

Following patch will make use of the datadir in
table_populator.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-23 08:50:24 +03:00
Pavel Emelyanov
bb4ddbb996 distributed_loader: Generalize datadir parallelizm loop
Population of keyspaces happens first for system keyspaces, then for
non-system ones. Both methods iterate over config datadirs to populate
from all configured directories. This patch generalizes this loop into
the populate_keyspace() method.

(indentation is deliberately left broken)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-15 17:49:53 +03:00
Pavel Emelyanov
0430ebf851 distributed_loader: Provide keyspace ref to populate_keyspace
The method in question tries to find keyspace reference on the database
by the given keyspace name. However, one of the callers already has the
keyspace reference at hand and can just pass it. The other callers can
find the keyspace on their own.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-15 17:49:03 +03:00
Petr Gusev
beb29f094b system_keyspace: drop load phases
We want to switch system.scylla_local table to the
schema commitlog, but load phases get in the way: schema
commitlog is initialized after phase1,
so a table which is using it should be moved to phase2,
but system.scylla_local contains features, and we need
them before schema commitlog initialization for
SCHEMA_COMMITLOG feature.

In this commit we are taking a different approach to
loading system tables. First, we load them all in
one pass in 'readonly' mode. In this mode, the table
cannot be written to and has not yet been assigned
a commit log. To achieve this we've added _readonly bool field
to the table class, it's initialized to true in table's
constructor. In addition, we changed the table constructor
to always assign nullptr to commitlog, and we trigger
an internal error if table.commitlog() property is accessed
while the table is in readonly mode. Then, after
triggering on_system_tables_loaded notifications on
feature_service and sstable_format_selector, we call
system_keyspace::mark_writable and eventually
table::mark_ready_for_writes which selects the
proper commitlog and marks the table as writable.

In sstable_compaction_test we drop several
mark_ready_for_writes calls since they are redundant,
the table has already been made writable in
env.make_table_for_tests call.

The table::commitlog function either returns the current
commitlog or causes an error if the table is readonly. This
didn't work for virtual tables, since they never called
mark_ready_for_writes. In this commit we add this
call to initialize_virtual_tables.
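
The readonly-then-writable lifecycle described above can be sketched as follows. This is a simplified stand-in, not the real `table` class: the `commit_log` placeholder type, the use of `std::logic_error` to model the internal error, and the helper functions are assumptions made for this example.

```cpp
#include <cassert>
#include <stdexcept>

// Placeholder for the real commitlog type; hypothetical and empty on purpose.
struct commit_log {};

// Simplified stand-in for the table class described in the message.
class table {
    bool _readonly = true;             // every table starts out readonly
    commit_log* _commitlog = nullptr;  // no commitlog is assigned at construction
public:
    bool readonly() const { return _readonly; }

    // Accessing the commitlog of a readonly table is an internal error.
    // The sketch models that with an exception.
    commit_log* commitlog() const {
        if (_readonly) {
            throw std::logic_error("commitlog accessed while table is readonly");
        }
        return _commitlog;
    }

    // Assigns the selected commitlog and makes the table writable.
    void mark_ready_for_writes(commit_log* cl) {
        _commitlog = cl;
        _readonly = false;
    }
};

// Returns true if reading the commitlog property fails for this table.
inline bool commitlog_access_throws(const table& t) {
    try {
        t.commitlog();
        return false;
    } catch (const std::logic_error&) {
        return true;
    }
}

// Walks the two stages: readonly load first, writable afterwards.
inline void load_then_enable_writes() {
    commit_log schema_commitlog;
    table t;
    assert(t.readonly() && commitlog_access_throws(t));
    t.mark_ready_for_writes(&schema_commitlog);
    assert(!t.readonly() && t.commitlog() == &schema_commitlog);
}
```

The point of the design is that a table which never goes through `mark_ready_for_writes()` fails loudly on commitlog access, which is the situation the message reports for virtual tables.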
2023-09-13 23:17:20 +04:00
Petr Gusev
c4787a160b system_keyspace: remove unused parameter
2023-09-13 23:00:15 +04:00
Kamil Braun
33c19baabc db: system_keyspace: take simpler service references in make
Take references to services which are initialized earlier. The
references to `gossiper`, `storage_service` and `raft_group0_registry`
are no longer needed.

This will allow us to move the `make` step right after starting
`system_keyspace`.
2023-06-18 13:39:27 +02:00
Pavel Emelyanov
66e43912d6 code: Switch to seastar API level 7
In that level no io_priority_class-es exist. Instead, all the IO happens
in the context of current sched-group. File API no longer accepts prio
class argument (and makes io_intent arg mandatory to impls).

So the change consists of
- removing all usage of io_priority_class
- patching file_impl's inheritants to updated API
- priority manager goes away altogether
- IO bandwidth update is performed on respective sched group
- tune-up scylla-gdb.py io_queues command

The first change is huge and was made semi-automatically by:
- grep io_priority_class | default_priority_class
- remove all calls, found methods' args and class' fields

Patching file_impl-s is smaller, but also mechanical:
- replace io_priority_class& argument with io_intent* one
- pass intent to lower file (if applicable)

Dropping the priority manager is:
- git-rm .cc and .hh
- sed out all the #include-s
- fix configure.py and cmakefile

The scylla-gdb.py update is a bit hairy -- it needs to use task queues
list for IO classes names and shares, but to detect whether it should, it checks
that the "commitlog" group is present.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13963
2023-06-06 13:29:16 +03:00
Pavel Emelyanov
3d7122d2fe distributed_loader: Move garbage collecting into sstable_directory
It's the directory that owns the components lister and can reason about
the way to pick up dangling bits, be it local directories or entries
from the ownership table.

First thing to do is to move the g.c. code into sstable_directory. While
at it -- convert sstring dir into fs::path dir and switch logger.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-17 15:16:23 +03:00
Raphael S. Carvalho
fe6df3d270 sstable_loader: Discard SSTable bloom filter on load-and-stream
Load-and-stream reads the entire content from SSTables, therefore it can
afford to discard the bloom filter that might otherwise consume a significant
amount of memory. Bloom filters are only needed by compaction and other
replica::table operations that might want to check the presence of keys
in the SSTable files, like single-partition reads.

It's not uncommon to see Data:Filter ratio of less than 100:1, meaning
that for ~300G of data, filters will take ~3G.

In addition to saving memory footprint, it also reduces operation time
as load-and-stream no longer has to read, parse and build the filters
from disk into memory.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-04-13 11:34:22 -03:00
Benny Halevy
aa4b18f8fb distributed_loader: reshard: add optional owned_ranges_ptr param
For passing owned_ranges_ptr from
distributed_loader::process_upload_dir.

Refs #11933

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 22:57:41 +03:00
Petr Gusev
5a5d664a5a init_system_keyspace: refactoring towards explicit load phases
We aim (#12642) to use the schema commit log
for raft tables. Now they are loaded at
the first call to init_system_keyspace in
main.cc, but the schema commitlog is only
initialized shortly before the second
call. This is important, since the schema
commitlog initialization
(database::before_schema_keyspace_init)
needs to access schema commitlog feature,
which is loaded from system.scylla_local
and therefore is only available after the
first init_system_keyspace call.

So the idea is to defer the loading of the raft tables
until the second call to init_system_keyspace,
just as it works for schema tables.
For this we need a tool to mark which tables
should be loaded in the first or second phase.

To do this, in this patch we introduce system_table_load_phase
enum. It's set in the schema_static_props for schema tables.
It replaces the system_keyspace::table_selector in the
signature of init_system_keyspace.

The call site for populate_keyspace in init_system_keyspace
was changed, table_selector.contains_keyspace was replaced with
db.local().has_keyspace. This check prevents calling
populate_keyspace(system_schema) on phase1, but allows for
populate_keyspace(system) on phase2 (to init raft tables).
On this second call some tables from system keyspace
(e.g. system.local) may have already been populated on phase1.
This check protects from double-populating them, since every
populated cf is marked as ready_for_writes.
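
A minimal sketch of the phase marking and the double-population guard follows. `system_table_load_phase` and `schema_static_props` are named in the message; the field name `load_phase`, the `system_table` struct, the `populate_phase()` helper and the table names in the example are assumptions for illustration only.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class system_table_load_phase { phase1, phase2 };

// Only the load phase is modelled here; the field name is an assumption.
struct schema_static_props {
    system_table_load_phase load_phase = system_table_load_phase::phase1;
};

// Hypothetical, heavily reduced view of a system table.
struct system_table {
    std::string name;
    schema_static_props props;
    bool ready_for_writes = false;  // set once the table has been populated
};

// Populates the tables that belong to the given phase and skips the ones
// already marked ready_for_writes. Returns the names it populated.
inline std::vector<std::string> populate_phase(std::vector<system_table>& tables,
                                               system_table_load_phase phase) {
    std::vector<std::string> populated;
    for (auto& t : tables) {
        if (t.props.load_phase != phase || t.ready_for_writes) {
            continue;
        }
        t.ready_for_writes = true;
        populated.push_back(t.name);
    }
    return populated;
}

// Two calls, one per phase; a repeated phase2 call populates nothing.
inline void two_phase_boot_example() {
    std::vector<system_table> tables = {
        {"local", {system_table_load_phase::phase1}},
        {"raft", {system_table_load_phase::phase2}},
    };
    assert(populate_phase(tables, system_table_load_phase::phase1).size() == 1);
    assert(populate_phase(tables, system_table_load_phase::phase2).size() == 1);
    assert(populate_phase(tables, system_table_load_phase::phase2).empty());
}
```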
2023-03-24 15:54:46 +04:00
Pavel Emelyanov
e67751ee92 distributed_loader: Let make_sstables_available choose target directory
When sstables are loaded from upload/ subdir, the final step is to move
them from this directory into base or staging one. The uploading code
evaluates the target directory, then pushes it down the stack towards
make_sstables_available() method.

This patch replaces the path argument with bool to_staging one. The
goal is to remove the knowledge of exact sstable location (nowadays --
its files' path) from the distributed loader and keep it in sstable
object itself. Next patches will make full use of this change.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-21 17:23:59 +03:00
Pavel Emelyanov
0c7efe38e1 distributed_loader: Rename table_population_metadata
It used to be just metadata by providing the meta for population, now it
does the population by itself, so rename it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-15 20:15:04 +03:00
Pavel Emelyanov
16fca3fa8a distributed_loader: Move populate_column_family() into population meta
This ownership change also requires the auto& = *this alias and extra
specification where to call reshard() and reshape() from.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-15 19:57:41 +03:00
Pavel Emelyanov
e6e65c87d5 sstable_directory: Add io-prio argument to .reshard()
Now it gets one from this-> but the method is becoming a static one in
distributed_loader which only has it as an argument. That's not a big deal
as the current IO class is going to be derived from current sched group,
so this extra arg will go away altogether some day.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-07 19:31:41 +03:00
Pavel Emelyanov
420fc8d4df sstable_directory: Add io-prio argument to .reshape()
Now it gets one from this-> but the method is becoming a static one in
distributed_loader which only has it as an argument. That's not a big deal
as the current IO class is going to be derived from current sched group,
so this extra arg will go away altogether some day.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-07 19:22:27 +03:00
Kamil Braun
a483915c62 db: system_keyspace: add a virtual table with raft configuration
Add a new virtual table `system.raft_state` that shows the currently
operating Raft configuration for each present group. The schema is the
same as `system.raft_snapshot_config` (the latter shows the config from
the last snapshot). In the future we plan to add more columns to this
table, showing more information (like the current leader and term),
hence the generic name.

Adding the table requires some plumbing of
`sharded<raft_group_registry>&` through function parameters to make it
accessible from `register_virtual_tables`, but it's mostly
straightforward.

Also added some APIs to `raft_group_registry` to list all groups and
find a given group (returning `nullptr` if one isn't found, not throwing
an exception).
2023-01-17 12:28:00 +01:00
Pavel Emelyanov
7ca5e143d7 sstable_directory: Convert sort-sstables argument to flags struct
The sstable_directory::process_sstable_dir() accepts a boolean to
control its behavior when collecting sstables. Turn this boolean into a
structure of flags. The intention is to extend this flags set in the
future (next patch).

This boolean is true almost everywhere, but one place sets it to false in a
"verbose" manner, like this:

        bool sort_sstables_according_to_owner = false;
        process_sstable_dir(directory, sort_sstables_according_to_owner).get();

the local variable is not used anymore. Using designated initializers
solves the verbosity in a nicer manner.
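
To show what the call site can look like with designated initializers (C++20), here is a small sketch. The struct name `process_flags`, the default value of `true` and the stand-in `process_sstable_dir()` function are assumptions for this example. Only the flag name `sort_sstables_according_to_owner` comes from the snippet above.

```cpp
// Hypothetical flags struct; the default of true is an assumption.
struct process_flags {
    bool sort_sstables_according_to_owner = true;
};

// Stand-in for the real method: it only reports whether sorting was requested.
constexpr bool process_sstable_dir(process_flags flags) {
    return flags.sort_sstables_according_to_owner;
}

// Default behaviour: no flags spelled out at the call site.
constexpr bool sorted_by_default = process_sstable_dir({});

// The call names the flag it overrides, so no local variable is needed.
constexpr bool sorted_when_disabled =
    process_sstable_dir({ .sort_sstables_according_to_owner = false });
```

New flags can later be added to the struct with defaults, without touching call sites that do not care about them.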

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-11-22 18:19:23 +03:00
Benny Halevy
119c0f3983 distributed_loader: pre-load all sstables metadata for table before populating it
We should scan all sstables in the table directory and its
subdirectories to determine the highest sstable version and generation
before using it for creating new sstables (via reshard or reshape).

Fixes scylladb/scylladb#11793

Note: table_population_metadata::start_subdir is called
in a seastar thread to facilitate backporting to old versions
that do not support coroutines yet.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-19 14:16:57 +03:00
Pavel Emelyanov
9f79525f8e distributed_loader: Pass sys_ks argument to init_system_keyspace()
Its final destination is the virtual tables registration code called from
init_system_keyspace() eventually

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-06 17:55:03 +03:00
Calle Wilund
d9c391e366 Revert "distributed_loader: Remove unused load-prio manipulations"
This reverts commit 7396de72b1.

In 7396de7 (and refactorings before it) the set of prioritized keyspaces (and processing thereof)
was removed, due to apparent non-usage (which is true for open-source version).

This functionality is however required for certain features of the enterprise version (ear).
As such it needs to be restored and reenabled. This reverts the actual commit, the patch after
ensures we use the prio set.
2022-08-23 10:34:05 +00:00
Benny Halevy
257d74bb34 schema, everywhere: define and use table_id as a strong type
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.
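
The idea of a tagged UUID can be sketched in a few lines of C++. This is a simplified illustration, not the `utils::tagged_uuid` from the tree: the two-word `uuid` struct and the tag type names are assumptions made for the example.

```cpp
#include <cstdint>
#include <type_traits>

// Simplified stand-in for a UUID value.
struct uuid {
    std::uint64_t hi = 0;
    std::uint64_t lo = 0;
    friend constexpr bool operator==(const uuid& a, const uuid& b) {
        return a.hi == b.hi && a.lo == b.lo;
    }
};

// The Tag parameter is never instantiated. It only makes each alias a distinct type.
template <typename Tag>
struct tagged_uuid {
    uuid id;
    friend constexpr bool operator==(const tagged_uuid& a, const tagged_uuid& b) {
        return a.id == b.id;
    }
};

using table_id = tagged_uuid<struct table_id_tag>;
using table_schema_version = tagged_uuid<struct table_schema_version_tag>;

// Both wrap the same kind of value, yet neither converts to the other,
// so passing one where the other is expected fails to compile.
static_assert(!std::is_same<table_id, table_schema_version>::value,
              "distinct types");
static_assert(!std::is_convertible<table_id, table_schema_version>::value,
              "no implicit conversion");
```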

Fixes #11207

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:41 +03:00
Tomasz Grabiec
c5ad05c819 db: Allow splitting initialization of system tables
We will need some system tables to be initialized earlier in the boot
so that system.scylla_local can be read before schema tables are
initialized.
2022-07-06 22:08:56 +02:00
Pavel Emelyanov' via ScyllaDB development
b0b29edcd7 distributed-loader: Remove ensure_system_table_directories
It looks like exactly the same code is called a few steps above via

distributed_loader::init_system_keyspace
 `- distributed_loader::populate_keyspace

While at it -- move the supervisor::notify("loading system sstables")
handing around in the more suitable location.

tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/981/

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220621165313.31284-1-xemul@scylladb.com>
2022-06-22 13:59:00 +03:00
Botond Dénes
c450508954 Merge "Introduce sharded<system_keyspace> instance" from Pavel Emelyanov
"
Making the system-keyspace into a standard sharded instance will
help to fix several dependency knots.

First, the global qctx and local-cache both will be moved onto the
sys-ks, all their users will be patched to depend on system-keyspace.
Now it's not quite so, but we're moving towards this state.

Second, snitch instance now sits in the middle of another dependency
loop. To untie it, the preferred ip and dc/rack info should be
moved onto system keyspace altogether (now it's scattered over several
places). The sys-ks thus needs to be a sharded service with some
state.

This set makes system-keyspace a sharded instance, equips it with all
the dependencies it needs and passes it as dependency into storage
service, migration manager and API. This helps eliminating a good
portion of global qctx/cache usage and prepares the ground for snitch
rework.

tests: unit(dev)
       v1: unit(debug), dtest.simple_boot_shutdown(dev)
"

* 'br-sharded-system-keyspace-instance-2' of https://github.com/xemul/scylla: (25 commits)
  system_keyspace: Make load_host_ids non-static
  system_keyspace: Make load_tokens non-static
  system_keyspace: Make remove_endpoint and update_tokens non-static
  system_keyspace: Coroutinize update_tokens
  system_keyspace: Coroutinize remove_endpoint
  system_keyspace: Make update_cached_values non-static
  system_keyspace: Coroutinuze update_peer_info
  system_keyspace: Make update_schema_version non-static
  schema_tables: Add sharded<system_keyspace> argument to update_schema_version_and_announce
  replica: Push sharded<system_keyspace> down to parse_system_tables
  api: Carry sharded<system_keyspace> reference along
  storage_service: Keep sharded<system_keyspace> reference
  migration_manager: Keep sharded<system_keyspace> reference
  system_keyspace: Remove temporary qp variable
  system_keyspace: Make get_preferred_ips non-static
  system_keyspace: Make cache_truncation_record non-static
  system_keyspace: Make check_health non-static
  system_keyspace: Make build_bootstrap_info non-static
  system_keyspace: Make build_dc_rack_info non-static
  system_keyspace: Make setup_version non-static
  ...
2022-03-17 08:16:29 +02:00
Benny Halevy
a1d0f089c8 replica: distributed_database: populate_column_family: trigger offstrategy compaction only for the base directory
In https://github.com/scylladb/scylla/issues/10218
we see off-strategy compaction happening on a table
during the initial phases of
`distributed_loader::populate_column_family`.

It is caused by triggering offstrategy compaction
too early, when sstables are populated from the staging
directory in a144d30162.

We need to trigger offstrategy compaction only of the base
table directory, never the staging or quarantine dirs.

Fixes #10218

Test: unit(dev)
DTest: materialized_views_test.py::TestInterruptBuildProcess

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220316152812.3344634-1-bhalevy@scylladb.com>
2022-03-16 18:57:00 +02:00
Pavel Emelyanov
009c449cc3 replica: Push sharded<system_keyspace> down to parse_system_tables
The method needs to call merge_schema() that will need system keyspace
instance at hand. The parse_s._t. method is boot-time one, pushing the
main-local instance through it is fine

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes were applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Raphael S. Carvalho
a144d30162 distributed_loader: postpone reshape of repair-originated sstables
SSTables created by repair will potentially not conform to the compaction strategy
layout goal. If node shuts down before off-strategy has a chance to
reshape those files, node will be forced to reshape them on restart. That
causes unexpected downtime. Turns out we can skip reshape of those files
on boot, and allow them to be reshaped after node becomes online, as if
the node never went down. Those files will go through same procedure as
files created by repair-based ops. They will be placed in maintenance set,
and be reshaped iteratively until ready for integration into the main set.

Fixes #9895.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-01-12 13:14:31 -03:00
Avi Kivity
4392c20bd3 replica: move distributed_loader into replica module
distributed_loader is a replica-side thing, so it belongs in the
replica module ("distributed" refers to its ability to load
sstables in their correct shards). So move it to the replica
module.
2022-01-10 15:25:28 +02:00