Commit Graph

15 Commits

Author SHA1 Message Date
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Raphael S. Carvalho
f9e33f7046 sstables: Introduce filter for sstable_directory::reshape
This will be useful to allow sstable_directory user to filter out
sstables that should not be reshaped. The default filter is
implemented as including everything.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-01-12 11:54:17 -03:00
Benny Halevy
b9c41dc0fd sstables: sstable_directory: remove unncessary include of lister.hh from header file
The source file depends on it, not the header.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-11 17:04:16 +02:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Raphael S. Carvalho
1924e8d2b6 treewide: Move compaction code into a new top-level compaction dir
Since compaction is layered on top of sstables, let's move all compaction code
into a new top-level directory.
This change will give me extra motivation to remove all layer violations, like
sstable calling compaction-specific code, and compaction entanglement with
other components like table and storage service.

Next steps:
- remove all layer violations
- move compaction code in sstables namespace into a new one for compaction.
- move compaction unit tests into its own file

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210707194058.87060-1-raphaelsc@scylladb.com>
2021-07-07 23:21:51 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Asias He
28007f13f8 distributed_loader: Add get_sstables_from_upload_dir
This function scans sstables under the upload directory and return a list of
sstables for each shard.

Refs #7831
2021-01-16 20:03:17 +08:00
Benny Halevy
d893cbd918 sstable_directory: support sstables with both TemporaryTOC and TOC
Keep descriptors in a map so it could be searched easily by generation.
and possibly delete the descriptor, if found, in the presence of
a temporary toc component.

A following patch will add support to create_links for moving
sstables between directories.  It is based on keeping a TemporaryTOC
file in the destination directory while linking all source components.
If scylla crashes here, the destination sstable will have both
its TemporaryTOC and TOC components and it needs to be removed
to roll the move backwards.

Then, create_links will atomically move the TemporaryTOC from
the destination back to the source directory, to toggle rolling
back to rolling forward by marking the source sstable for removal.
If scylla crashes here, the source sstable will have both
its TemporaryTOC and TOC components and it needs to be removed
to roll the move forward.

Add unit test for this case.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-09 19:57:40 +02:00
Raphael S. Carvalho
6f805bd123 sstable_directory: Fix 50% space requirement for resharding
This is a regression caused by aebd965f0.

After the sstable_directory changes, resharding now waits for all sstables
to be exhausted before releasing reference to them, which prevents their
resources like disk space and fd from being released. Let's restore the
old behavior of incrementally releasing resources, reducing the space
requirement significantly.

Fixes #7463.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20201020140939.118787-1-raphaelsc@scylladb.com>
2020-10-21 09:51:26 +02:00
Benny Halevy
57cc5f6ae1 sstable_directory: use a external load_semaphore
Although each sstable_directory limits concurrency using
max_concurrent_for_each, there could be a large number
of calls to do_for_each_sstable running in parallel
(e.g per keyspace X per table in the distributed_loader).

To cap parallelism across sstable_directory instances and
concurrent calls to do_for_each_sstable, start a sharded<semaphore>
and pass a shared semaphore& to the sstable_directory:s.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-08 11:57:06 +03:00
Benny Halevy
c26c784882 sstables: sstable_directory: use max_concurrent_for_each
Use max_concurrent_for_each instead of parallel_for_each in
sstable_directory::parallel_for_each_restricted to avoid
creating potentially thousands of continuations,
one for each sstable.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-07 14:45:20 +03:00
Benny Halevy
a3918bdc96 distributed_loader: reenable verify_owner_and_mode when loading new sstables
The call to `verify_owner_and_mode` from `flush_upload_dir`
fell between the cracks in b34c0c2ff6
(distributed_loader: rework uploading of SSTables).

It causes https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-release/528/testReport/nodetool_additional_test/TestNodetool/nodetool_refresh_with_wrong_upload_modes_test/
to fail like this:
```
/Directory cannot be accessed .* write/ not found in 'Nodetool command '/jenkins/workspace/scylla-master/dtest-release/scylla/.ccm/scylla-repository/7351db7cab7bbf907172940d0bbf8b90afde90ba/scylla-tools-java/bin/nodetool -h 127.0.87.1 -p 7187 refresh -- keyspace1 standard1' failed; exit status: 1; stdout: nodetool: Scylla API server HTTP POST to URL '/storage_service/sstables/keyspace1' failed: Failed to load new sstables: std::filesystem::__cxx11::filesystem_error (error system:13, filesystem error: remove failed: Permission denied [/jenkins/workspace/scylla-master/dtest-release/scylla/.dtest/dtest-rqzo7km7/test/node1/data/keyspace1/standard1-8a57a660b29611eabf0c000000000000/upload/mc-3-big-TOC.txt])
```

Reenable it in this patch makes the dtest pass again.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200621140439.85843-1-bhalevy@scylladb.com>
2020-06-22 14:03:13 +03:00
Glauber Costa
4d6aacb265 sstable_directory: add helper to reshape existing unshared sstables
Before moving SSTables to the main directory, we may need to reshape them
into in-strategy. This patch provides helper code that reshapes the SSTables
that are known to be unshared local in the sstable directory, and updates the
sstable directory with the result.

Rehaping can be made more or less aggressive by passing a reshape mode
(relaxed or strict), which will influence the amount of SSTables reshape
can tolerate to consider a particular slice of the SSTable set
offstrategy.

Because the compaction expects an std::vector everywhere, we changed
our chunked vector for the unshared sstables to a std::vector so we
can more easily pass it around without conversions.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-06-18 09:37:18 -04:00
Glauber Costa
aebd965f0e distributed_load: initial handling of off-strategy SSTables
Off-strategy SSTables are SSTables that do not conform to the invariants
that the compaction strategies define. Examples of offstrategy SSTables
are SSTables acquired over bootstrap, resharding when the cpu count
changes or imported from other databases through our upload directory.

This patch introduces a new class, sstable_directory, that will
handle SSTables that are present in a directory that is not one of the
directories where the table expects its SSTables.

There is much to be done to support off-strategy compactions fully. To
make sure we make incremental progress, this patch implements enough
code to handle resharding of SSTables in the upload directory. SSTables
are resharded in place, before we start accessing the files.

Later, we will take other steps before we finally move the SSTables into
the main directory. But for now, starting with resharding will not only
allow us to start small, but it will also allow us to start unleashing
much needed cleanups in many places. For instance, once we start
resharding on boot before making the SSTables available, we will be able
to expurge all places in Scylla where, during normal operations, we have
extra handler code for the fact that SSTables could be shared.

Tests: a new test is added and it passes in debug mode.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-06-08 16:06:00 -04:00
Glauber Costa
e48ad3dc23 remove manifest_file filter from table.
When we are scanning an sstable directory, we want to filter out the
manifest file in most situations. The table class has a filter for that,
but it is a static filter that doesn't depend on table for anything. We
are better off removing it and putting in another independent location.

While it seems wasteful to use a new header just for that, this header
will soon be populated with the sstable_directory class.

Tests: unit (dev)

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-06-08 16:06:00 -04:00