Commit Graph

1609 Commits

Author SHA1 Message Date
Botond Dénes
c3c71b3aa5 readers/delegating_v2: s/make_delegating_reader_v2/make_delegating_reader/
The argument type (v1 or v2 reader) is enough to disambiguate and
overloading the v1 method makes a transition to v2 more seamless.
2022-04-20 10:59:09 +03:00
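The rename in c3c71b3aa5 relies on ordinary C++ overload resolution. A minimal sketch of the idea, with hypothetical stand-in types (`toy_reader_v1`, `toy_reader_v2`) and string return values that are not part of the Scylla code base:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the v1 and v2 reader types.
struct toy_reader_v1 {};
struct toy_reader_v2 {};

// One function name, two overloads: the argument type alone selects the
// implementation, so call sites need no change when a reader moves to v2.
inline std::string make_delegating_reader(toy_reader_v1&) { return "v1"; }
inline std::string make_delegating_reader(toy_reader_v2&) { return "v2"; }
```

A caller that switches its variable from `toy_reader_v1` to `toy_reader_v2` keeps calling `make_delegating_reader(r)` unchanged.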
Botond Dénes
75786c42cb Merge 'Add repair unit tests/v1' from Mikołaj Sielużycki
This patch series splits up parts of repair pipeline to allow unit testing
various bits of code without having to run full dtest suite. The reason why
repair pipeline has no unit tests is that by definition repair requires multiple
nodes, while unit test environment works only for a single node.

However, it is possible to explicitly define interfaces between various parts of the
pipeline, inject dependencies and test them individually. This patch series is focused
on taking repair_rows_on_wire (frozen mutation representation of changes coming from
another node) and flushing them to an sstable.

The commits are split into the following parts:
- pulling out classes to separate headers so that they can be included (potentially indirectly) from the test,
- pulling out repair_meta::to_repair_rows_list and part of repair_meta::flush_rows_in_working_row_buf so that they can be tested,
- refactoring repair_writer so that the actual writing logic can be injected as dependency,
- creating the unit test.

tests: unit(dev), dtest(incremental_repair_test, read_repair_test, repair_additional_test, repair_test)

Closes #10345

* github.com:scylladb/scylla:
  repair: Add unit test for flushing repair_rows_on_wire to disk.
  repair: Extract mutation_fragment_queue and repair_writer::impl interfaces.
  repair: Make parts of repair_writer interface private.
  repair: Rename inputs to flush_rows.
  repair: Make repair_meta::flush_rows a free function.
  repair: Split flush_rows_in_working_row_buf to two functions and make one static.
  repair: Rename inputs to to_repair_rows_list.
  repair: Make to_repair_rows_list a free function.
  repair: Make repair_meta::to_repair_rows_list a static function
  repair: Fix indentation in repair_writer.
  repair: Move repair_writer to separate header.
  repair: Move repair_row to a separate header.
  repair: Move repair_sync_boundary to a separate header.
  repair: Move decorated_key_with_hash to separate header.
  repair: Move row_repair hashing logic to separate class and file.
2022-04-14 18:17:03 +03:00
Botond Dénes
737cc798ca Merge "Add flat_mutation_reader_from_mutation_v2" from Benny Halevy
"
Optimize consuming from a single partition.

This gives us significant improvement with single, small mutations,
as shown with perf_mutation_readers, compared to the vector-based
flat_mutation_reader_from_mutations_v2.

These are expected to be common on the write path,
and can be optimized for view building.

results from: perf_mutation_readers -c1 --random-seed=840478750
(userspace cpu-frequency governor, 2.2GHz)

test                                      iterations      median         mad         min         max

Before:
combined.one_row                              720118   825.668ns     1.020ns   824.648ns   827.750ns

After:
combined.one_mutation                         881482   751.157ns     0.397ns   750.211ns   751.912ns
combined.one_row                              843270   756.553ns     0.303ns   755.889ns   757.911ns

The grand plan is to follow up
with make_flat_mutation_reader_from_frozen_mutation_v2
so that we can read directly from either a mutation
or frozen_mutation without having to unfreeze it e.g. in
table::push_view_replica_updates.

Test: unit(dev)
Perf: perf_mutation_readers(release)
"

* tag 'flat_mutation_reader_from_mutation-v3' of https://github.com/bhalevy/scylla:
  perf: perf_mutation_readers: add one_mutation case
  test: mutation_query_test: make make_source static
  mutation readers: refactor make_flat_mutation_reader_from_mutation*_v2
  mutation readers: add make_flat_mutation_reader_from_mutation_v2
  readers: delete slice_mutation.hh
  test: flat_mutation_reader_test: mock_consumer: add debug logging
  test: flat_mutation_reader_test: mock_consumer: make depth counter signed
2022-04-14 17:23:21 +03:00
Botond Dénes
fa75d58cf0 Merge "Make snitch start/stop code look classical" from Pavel Emelyanov
"
There's a generic way to start-stop services in scylla, that includes
5 "actions" (some are optional and/or implicit though)

  service_config cfg = ...
  sharded<service>.start(cfg)
  service.invoke_on_all(&service::start)
  service.invoke_on_all(&service::shutdown)
  service.invoke_on_all(&service::stop)
  sharded<service>.stop()

and most of the services out there conform to that scheme. Not snitch
(spoiler: and not tracing), for which there's a couple of helpers that
do all that magic behind the scenes, "configuring" snitch is done with
the help of overloaded constructors. The latter is extra complicated
with the need to register snitch drivers in class-registry for each
constructor overload. Also there's an external shards synchronization
on stop.

This set brings snitch start/stop code to the described standard: the
create/stop helpers are removed, creation accepts the config structure,
per-shard start/stop (snitch has no drain for now) happens in the
simple invoke-on-all manner.

The intended side effect of this change is the ability to add explicit
dependencies to snitch (in the future, not in this set).

tests: unit(dev)
"

* 'br-snitch-config' of https://github.com/xemul/scylla:
  snitch: Remove create_snitch/stop_snitch
  snitch: Simplify stop (and pause_io)
  snitch: Move io_is_stopped to property-file driver
  snitch: Remove init_snitch_obj()
  snitch: Move instance creation into snitch_ptr constructor
  snitch: Make config-based construction of all drivers
  snitch: Declare snitch_ptr peering and rework container() method
  snitch: Introduce container() method
2022-04-14 16:56:32 +03:00
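The five-step start/stop sequence quoted in fa75d58cf0 can be sketched with a toy, single-threaded container. This is only an illustration of the call order: `toy_sharded`, `service_config::log` and `run_lifecycle` are hypothetical names, and the real Seastar `sharded<>` calls are asynchronous and run on separate shards.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical config: carries a shared log so the call order can be observed.
struct service_config {
    std::vector<std::string>* log;
};

// Hypothetical service that records each lifecycle step.
class service {
    service_config _cfg;
public:
    explicit service(const service_config& cfg) : _cfg(cfg) { _cfg.log->push_back("construct"); }
    void start() { _cfg.log->push_back("start"); }
    void shutdown() { _cfg.log->push_back("shutdown"); }
    void stop() { _cfg.log->push_back("stop"); }
};

// Toy replacement for a per-shard container: one instance per "shard",
// all driven synchronously from a single thread.
template <typename Service>
class toy_sharded {
    std::vector<std::unique_ptr<Service>> _instances;
public:
    template <typename Config>
    void start(std::size_t shards, const Config& cfg) {
        for (std::size_t i = 0; i < shards; ++i) {
            _instances.push_back(std::make_unique<Service>(cfg));
        }
    }
    void invoke_on_all(void (Service::*fn)()) {
        for (auto& s : _instances) {
            ((*s).*fn)();
        }
    }
    void stop() { _instances.clear(); }
};

// Runs the sequence from the commit message and returns the recorded steps.
inline std::vector<std::string> run_lifecycle(std::size_t shards) {
    std::vector<std::string> log;
    service_config cfg{&log};
    toy_sharded<service> svc;
    svc.start(shards, cfg);                 // sharded<service>.start(cfg)
    svc.invoke_on_all(&service::start);     // per-shard start
    svc.invoke_on_all(&service::shutdown);  // per-shard shutdown (drain)
    svc.invoke_on_all(&service::stop);      // per-shard stop
    svc.stop();                             // sharded<service>.stop()
    return log;
}
```

Because configuration travels in a single struct, adding a dependency later means adding a field to `service_config` rather than another constructor overload.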
Benny Halevy
a4b69fe7b6 test: mutation_query_test: make make_source static
No need for it to be public.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-04-14 11:15:19 +03:00
Benny Halevy
e85241d5b6 mutation readers: add make_flat_mutation_reader_from_mutation_v2
Optimize reading from a single partition.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-04-14 11:14:43 +03:00
Benny Halevy
ee2c7948f3 test: flat_mutation_reader_test: mock_consumer: add debug logging
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-04-14 08:41:31 +03:00
Benny Halevy
38cdfca824 test: flat_mutation_reader_test: mock_consumer: make depth counter signed
We want to return stop_iteration::yes once we have crossed
the initial depth threshold. With an unsigned depth counter,
it might wrap around and look > 1.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-04-14 08:41:31 +03:00
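The wraparound described in 38cdfca824 is easy to reproduce in isolation. The sketch below is not the actual mock_consumer code; `has_budget_unsigned` and `has_budget_signed` are hypothetical helpers that only contrast the two counter types.

```cpp
#include <cassert>
#include <cstddef>

// Unsigned counter: subtracting more than the remaining depth wraps around
// to a very large value, so the "> 1" check wrongly reports budget left.
inline bool has_budget_unsigned(std::size_t depth, std::size_t consumed) {
    depth -= consumed;  // wraps if consumed > depth
    return depth > 1;
}

// Signed counter: the same subtraction goes negative, and the check
// correctly reports that the threshold was crossed.
inline bool has_budget_signed(long depth, long consumed) {
    depth -= consumed;  // may become negative
    return depth > 1;
}
```

With a depth of 1 and 2 fragments consumed, the unsigned variant still claims there is budget, while the signed variant does not.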
Tomasz Grabiec
0c365818c3 utils/chunked_managed_vector: Fix sigsegv during reserve()
Fixes the case of make_room() invoked with last_chunk_capacity_deficit
but _size not in the last reserved chunk.

Found during code review, no user impact.

Fixes #10364.

Message-Id: <20220411224741.644113-1-tgrabiec@scylladb.com>
2022-04-12 16:37:11 +03:00
Tomasz Grabiec
01eeb33c6e utils/chunked_vector: Fix sigsegv during reserve()
Fixes the case of make_room() invoked with last_chunk_capacity_deficit
but _size not in the last reserved chunk.

Found during code review, no known user impact.

Fixes #10363.

Message-Id: <20220411222605.641614-1-tgrabiec@scylladb.com>
2022-04-12 16:35:17 +03:00
Mikołaj Sielużycki
b16e12f3a1 repair: Add unit test for flushing repair_rows_on_wire to disk.
The unit test executes a simplified repair scenario by:
- producing a random stream of mutation_fragments,
- converting them to repair_rows_on_wire,
- converting them to a list of repair_rows using the conversion logic
  extracted in previous commits from repair_meta,
- flushing the rows to an sstable using the logic extracted in previous
  commits from repair_meta,
- comparing the sstable contents with the originally produced mutation
  fragments.

The test checks only the flushing part and is not concerned with any
other piece of the repair pipeline.
2022-04-12 09:22:10 +02:00
Calle Wilund
d478896d46 commitlog: kill non-recycled segment management
It has been default for a while now. Makes no sense to not do it.
Even hints can use it (even if it makes no difference there)
2022-04-11 16:34:00 +00:00
Pavel Emelyanov
828a951886 snitch: Remove create_snitch/stop_snitch
After the previous patches, both create_snitch() and stop_snitch() now look
like the classic sharded service start/stop sequence. Finally, both
helpers can be removed and the remaining users can just call start/stop
on locally obtained sharded references.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-11 14:43:25 +03:00
Pavel Emelyanov
633746b87d snitch: Make config-based construction of all drivers
Currently snitch drivers register themselves in class-registry with all
sorts of construction options possible. All those different constructors
are in fact "config options".

When snitch later declares its dependencies (gossiper and system
keyspace), all these registrations will need patching, which is very
inconvenient.

This patch introduces the snitch_config struct and replaces all the
snitch constructors with the snitch_driver(snitch_config cfg) one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-11 14:38:34 +03:00
Avi Kivity
59d56a3fd7 Merge 'Add keyspace storage options' from Piotr Sarna
This series is part of the shared storage project.

The STORAGE option is designed to hold a map of options
used for customizing storage for given keyspace.
The option is kept in a system_schema.scylla_keyspaces table.

This option is guarded with a schema feature, because it's kept in a new schema table: `system_schema.scylla_keyspaces`.

Example of the contents of the new table:
```cql
cassandra@cqlsh> select * from system_schema.scylla_keyspaces;

 keyspace_name | storage_options                                | storage_type
---------------+------------------------------------------------+--------------
           ksx | {'bucket': '/tmp/xx', 'endpoint': 'localhost'} |           S3
```
Native storage options are not kept in the table, as this format doesn't hold any extra options and it would therefore just be a waste of storage.

Closes #10144

* github.com:scylladb/scylla:
  test: regenerate schema_change_test for storage options case
  test: improve output of schema_change_test regeneration
  docs: add a paragraph on keyspace storage options
  test: add test cases for keyspace storage options
  database,cql3: add STORAGE option to keyspaces
  db: add keyspace-storage-options experimental feature
  db,schema_tables: add scylla_keyspaces table
  db,gms: add SCYLLA_KEYSPACE schema feature
  db,gms: add KEYSPACE_STORAGE_OPTIONS feature
2022-04-10 17:23:56 +03:00
Raphael S. Carvalho
7b1589cb3d tests: chunked_managed_vector_test: Test correctness when crossing chunk boundary
While reviewing "utils/chunked_managed_vector: Fix corruption in case there is more
than one chunk", I was worried that there could be a correctness issue
when pop_back() pops off the first element of the last chunk, but it turns
out I made an off-by-one error in my theory. Anyway, I wrote a unit test
to verify my assumption and found it worth submitting upstream.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220408133555.12397-2-raphaelsc@scylladb.com>
2022-04-08 16:44:16 +02:00
Piotr Sarna
151d8f7c58 test: regenerate schema_change_test for storage options case
Keyspace storage options series adds a new schema table:
system_schema.scylla_keyspaces. The regenerated cases ensure
that this new table is taken into account when the schema feature
is available.
2022-04-08 09:17:01 +02:00
Piotr Sarna
4705a5fa42 test: improve output of schema_change_test regeneration
Schema change test operates on pre-generated sstables, and sometimes
this set of sstables needs to be regenerated. In order to make the
regeneration process more ergonomic, the output is now directly
copyable as valid C++ representation of UUIDs.
2022-04-08 09:17:01 +02:00
Piotr Sarna
3272b4826f db: add keyspace-storage-options experimental feature
Specifying non-standard keyspace options is experimental, so it's
going to be protected by a configuration flag.
2022-04-08 09:17:01 +02:00
Tomasz Grabiec
41fe01ecff utils/chunked_managed_vector: Fix corruption in case there is more than one chunk
If reserve() allocates more than one chunk, push_back() should not
work with the last chunk. This can result in items being pushed to the
wrong chunk, breaking internal invariants.

Also, pop_back() should not work with the last chunk. This breaks when
there is more than one chunk.

Currently, the container is only used in the sstable partition index
cache.

Manifests as crashes in sstable readers that touch sstables which have
partition index pages with more than 1638 partition entries.

Introduced in 78e5b9fd85 (4.6.0)

Fixes #10290

Message-Id: <20220407174023.527059-1-tgrabiec@scylladb.com>
2022-04-07 21:26:35 +03:00
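The invariant behind 41fe01ecff can be shown with a toy container. This is a sketch under stated assumptions, not the utils/chunked_managed_vector implementation: `toy_chunked_vector` and its fixed `ChunkCapacity` are hypothetical. The point is that push_back() and pop_back() must locate the chunk from the current size rather than assume the last allocated chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy chunked vector with a fixed, hypothetical chunk capacity.
// T must be default-constructible in this simplified version.
template <typename T, std::size_t ChunkCapacity = 4>
class toy_chunked_vector {
    std::vector<std::unique_ptr<T[]>> _chunks;
    std::size_t _size = 0;
public:
    // May allocate several chunks at once.
    void reserve(std::size_t n) {
        std::size_t needed = (n + ChunkCapacity - 1) / ChunkCapacity;
        while (_chunks.size() < needed) {
            _chunks.push_back(std::make_unique<T[]>(ChunkCapacity));
        }
    }
    void push_back(const T& value) {
        reserve(_size + 1);
        // Derive the target chunk from _size. Using _chunks.back() here would
        // write into the wrong chunk whenever reserve() ran ahead.
        _chunks[_size / ChunkCapacity][_size % ChunkCapacity] = value;
        ++_size;
    }
    // Precondition: size() > 0. The popped element is again located via _size.
    void pop_back() { --_size; }
    const T& operator[](std::size_t i) const {
        return _chunks[i / ChunkCapacity][i % ChunkCapacity];
    }
    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }
};
```

After `reserve(10)` the toy holds three chunks, yet the first four pushes still land in chunk 0 because the index is computed from `_size`.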
Pavel Emelyanov
9066224cf4 table: Don't export compaction manager reference
There's a public call on replica::table to get back the compaction
manager reference. It's not needed, actually. The users of the call are
distributed loader which already has database at hand, and a test that
creates its own instance of compaction manager for its testing tables
and thus also has it available.

tests: unit(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220406171351.3050-1-xemul@scylladb.com>
2022-04-07 09:27:45 +03:00
Michael Livshin
a90e02c302 skeleton_reader: inherit from flat_mutation_reader_v2::impl
(completely mechanical change)

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Message-Id: <20220406122912.2248111-1-michael.livshin@scylladb.com>
2022-04-06 16:55:54 +03:00
Michael Livshin
6001a0fef1 multi_partition_reader: inherit from flat_mutation_reader_v2::impl
(completely mechanical change)

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Message-Id: <20220406122122.2246058-1-michael.livshin@scylladb.com>
2022-04-06 16:55:07 +03:00
Benny Halevy
abbf5de68c frozen_mutation: introduce consume method
Allowing to consume the frozen_mutation directly
to a stream rather than unfreezing it first
and then consuming the unfrozen mutation.

Streaming directly from the frozen_mutation
saves both cpu and memory, and will make it
easier to be made async as a follow-up, to allow
yielding, e.g. between rows.

This is used today only in to_data_query_result
which is invoked on the read-repair path.

Refs #10038
Fixes #10021

Test: unit(release)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220405055807.1834494-1-bhalevy@scylladb.com>
2022-04-05 10:51:21 +03:00
Raphael S. Carvalho
840500fc4d compaction: Make cleanup for Leveled strategy bucket-aware
Bucket awareness in cleanup was introduced in a69d98c3d0.
STCS and TWCS already support it, and now LCS will receive it.

The goal of bucket awareness is to reduce writeamp in cleanup,
therefore reducing operation time. Additionally, garbage collection
becomes more efficient as shadowed data can now be potentially
compacted with the data that shadows it, assuming they're on
the same level.

The implementation for LCS is simple. It reuses the STCS procedure
for returning jobs in level 0, and returns one job for each
non-empty level > 0. What allows us to do this
is the incremental selection approach used in compaction,
which sets a limit on memory usage and disk space requirement.

Fixes #10097.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220331173417.211257-1-raphaelsc@scylladb.com>
2022-04-05 09:10:21 +03:00
Botond Dénes
c8ea0715e9 tests: move away from table::make_reader()
Use v2 equivalents instead.
2022-04-01 13:39:26 +03:00
Avi Kivity
af07519928 Merge "Remove reader from mutations v1" from Botond
"
First migrate all users to the v2 variant, all of which are tests.
However, to be able to properly migrate all tests off it, a v2 variant
of the restricted reader is also needed. All restricted reader users are
then migrated to the freshly introduced v2 variant and the v1 variant is
removed.
Users include:
* replica::table::make_reader_v2()
* streaming_virtual_table::as_mutation_source()
* sstables::make_reader()
* tests

This allows us to get rid of a bunch of conversions on the query path,
which was mostly v2 already.

With a few tests we did kick the can down the road by wrapping the v2
reader in `downgrade_to_v1()`, but this series is long enough already.

Tests: unit(dev), unit(boost/flat_mutation_reader_test:debug)
"

* 'remove-reader-from-mutations-v1/v3' of https://github.com/denesb/scylla:
  readers: remove now unused v1 reader from mutations
  test: move away from v1 reader from mutations
  test/boost/mutation_reader_test: use fragment_scatterer
  test/boost/mutation_fragment_test: extract fragment_scatterer into a separate hh
  test/boost: mutation_fragment_test: refactor fragment_scatterer
  readers: remove now unused v1 reversing reader
  test/boost/flat_mutation_reader_test: convert to v2
  frozen_mutation: fragment_and_freeze(): convert to v2
  frozen_mutation: coroutinize fragment_and_freeze()
  readers: migrate away from v1 reversing reader
  db/virtual_table: use v2 variant of reversing and forwardable readers
  replica/table: use v2 variant of reversing reader
  sstables/sstable: remove unused make_crawling_reader_v1()
  sstables/sstable: remove make_reader_v1()
  readers: add v2 variant of reversing reader
  readers/reversing: remove FIXME
  readers: reader from mutations: use mutation's own schema when slicing
2022-03-31 13:29:11 +03:00
Botond Dénes
fd69add579 test: move away from v1 reader from mutations
Use the v2 variant instead.
2022-03-31 10:36:23 +03:00
Botond Dénes
2e00ff314d test/boost/mutation_reader_test: use fragment_scatterer
Instead of the open-coded equivalent the test currently has.
2022-03-31 10:25:45 +03:00
Botond Dénes
feecc19d5b test/boost/mutation_fragment_test: extract fragment_scatterer into a separate hh
We want to use it in test/boost/mutation_reader_test.cc too.
2022-03-31 10:25:45 +03:00
Botond Dénes
226f01162e test/boost: mutation_fragment_test: refactor fragment_scatterer
Instead of taking an output parameter in the constructor, take just the
desired number of mutations to build and return the mutation list from
`consume_end_of_stream()`.
2022-03-31 10:25:45 +03:00
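The interface change in 226f01162e replaces an output parameter with a value returned at end of stream. A simplified sketch, assuming integer "fragments" and round-robin distribution (both are illustrative choices, and `toy_scatterer` is a hypothetical name, not the test helper itself):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical consumer: a "fragment" is an int, a "mutation" is a vector of ints.
class toy_scatterer {
    std::vector<std::vector<int>> _mutations;
    std::size_t _next = 0;
public:
    // Takes only the desired number of mutations (must be > 0);
    // there is no output parameter.
    explicit toy_scatterer(std::size_t n) : _mutations(n) {}

    // Distributes fragments round-robin (an assumption made for this sketch).
    void consume(int fragment) {
        _mutations[_next % _mutations.size()].push_back(fragment);
        ++_next;
    }

    // The built list is handed back to the caller when the stream ends.
    std::vector<std::vector<int>> consume_end_of_stream() {
        return std::move(_mutations);
    }
};
```

The caller no longer has to keep a result container alive for the lifetime of the consumer; it receives the list from `consume_end_of_stream()`.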
Botond Dénes
b8f0ab3b98 readers: remove now unused v1 reversing reader 2022-03-31 10:04:45 +03:00
Botond Dénes
56e3c6add6 test/boost/flat_mutation_reader_test: convert to v2 2022-03-31 10:04:29 +03:00
Botond Dénes
2e634883d9 frozen_mutation: fragment_and_freeze(): convert to v2 2022-03-31 09:57:48 +03:00
Botond Dénes
219cb881a4 sstables/sstable: remove make_reader_v1()
No external users, only used internally, by make_reader(), which delegates
cases currently unsupported by v2 to it. The code needed from
make_reader_v1() is inlined into make_reader() and the former is
removed.
2022-03-31 09:57:48 +03:00
Botond Dénes
470dc0d013 readers: add v2 variant of reversing reader
The v2 format allows for a much simpler reversing mechanism since
clustering fragments can simply be reversed as they are read. Fragments
are directly pushed in the reader's buffer eliminating a separate move
phase.
Existing reverse reader unit tests are converted to test the v2 one.
2022-03-31 09:57:48 +03:00
Botond Dénes
b029bd3db7 tree: remove mutation_reader.hh include
In most files it was unused. We should move these to the patch which
moved out the last interesting reader from mutation_reader.hh (and added
the corresponding new header include) but it's probably not worth the
effort.
Some other files still relied on mutation_reader.hh to provide reader
concurrency semaphore and some other misc reader related definitions.
2022-03-30 15:42:51 +03:00
Botond Dénes
b7954138ac mutation_reader: move compacting reader into readers/ 2022-03-30 15:42:51 +03:00
Botond Dénes
11c378a175 mutation_reader: move queue reader to readers/ 2022-03-30 15:42:51 +03:00
Botond Dénes
f24f2f726a mutation_reader: move filtering reader into readers/ 2022-03-30 15:42:51 +03:00
Botond Dénes
d0ea895671 readers: move multishard reader & friends to reader/multishard.cc
Since the multishard reader family weighs more than 1K SLOC, it gets
its own .cc file.
2022-03-30 15:42:51 +03:00
Botond Dénes
f8015d9c26 readers: move combined reader into readers/
Since the combined reader family weighs more than 1K SLOC, it gets its
own .cc file.
2022-03-30 15:42:51 +03:00
Botond Dénes
0c3d4091a4 Merge "Make TWCS' cleanup bucket aware" from Raphael S. Carvalho
"
Quoting patch 3/4:
"This continues the work in a69d98c3d0,
by implementing the cleanup method in TWCS to make it bucket aware.
Until now, the default implementation was used, which cleans up one file
at a time, starting from the smallest.

The cleanup strategy for TWCS is simple. It's simply calling the
size tiered cleanup method for each bucket, so there will be
one job for each tier in each window.

The next strategies to receive this improvement are LCS and ICS
(the latter one being only available in enterprise).

Refs #10097."

** Simply put, the goal is to reduce writeamp when performing cleanup
on a TWCS table, therefore reducing the operation time. **

tests: unit(dev).
"

* 'twcs_cleanup_bucket_aware/v1' of https://github.com/raphaelsc/scylla:
  tests: sstable_compaction_test: Add test for TWCS' bucket-aware cleanup
  compaction: TWCS: Implement cleanup method for bucket awareness
  compaction: TWCS: change get_buckets() signature to work with const qualified functions
  compaction_strategy: get_cleanup_compaction_jobs: accept candidates by value
2022-03-30 11:45:28 +03:00
Raphael S. Carvalho
a1fd9c1ee8 tests: sstable_compaction_test: Add test for TWCS' bucket-aware cleanup
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-03-29 09:52:11 -03:00
Botond Dénes
316ff9eb86 test/boost/mutation_test: add test for mutation_fragment_stream_validator 2022-03-29 13:19:05 +03:00
Botond Dénes
a69d98c3d0 Merge "Improve efficiency of cleanup compaction by making it bucket aware" from Raphael S. Carvalho
"
Cleanup compaction works by rewriting all sstables that need clean up, one at
a time.
This approach can cause bad write amplification because the output data is
being made incrementally available for regular compaction.
Cleanup is a long operation on large data sets, and while it's happening,
new data can be written to buckets, triggering regular compaction.
Cleanup fighting for resources with regular compaction is a known problem.
With cleanup adding one file at a time to buckets, regular may require multiple
rounds to compact the data in a given bucket B, producing bad writeamp.

To fix this problem, cleanup will be made bucket aware. As each compaction
strategy has its own definition of bucket, strategies will implement their
own method to retrieve cleanup jobs. The method will be implemented such that
all files in a bucket B will be cleaned up together, and on completion,
they'll be made available for regular at once.

For STCS / ICS, a bucket is a size tier.
For TWCS, a bucket is a window.
For LCS, a bucket is a level.

In this way, writeamp problem is fixed as regular won't have to perform
multiple rounds to compact the data in a given bucket. Additionally, cleanup
will now be able to deduplicate data and will become way more efficient at
garbage collecting expired data.

The space requirement shouldn't be an issue, as compacting an entire bucket
happens during regular compaction anyway.
With leveled strategy, compacting an entire level is also not a problem because
files in a level L don't overlap and therefore incremental compaction is
employed to limit the space requirement.

For the time being, only STCS cleanup was made bucket aware. The others will be
using a default method, where one file is cleaned up at a time. Making cleanup
of other strategies bucket aware is relatively easy now and will be done soon.

Refs #10097.
"

* 'cleanup-compaction-revamp/v3' of https://github.com/raphaelsc/scylla:
  test: sstable_compaction_test: Add test for strategy cleanup method
  compaction: STCS: Implement cleanup strategy
  compaction_manager: Wire cleanup task into the strategy cleanup method
  compaction_strategy: Allow strategies to define their own cleanup strategy
  compaction: Introduce compaction_descriptor::sstables_size
  compaction: Move decision of garbage collection from strategy to task type
2022-03-25 16:30:28 +02:00
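The difference between the default cleanup and the bucket-aware cleanup described in a69d98c3d0 can be sketched as two job builders. All names here (`toy_sstable`, `cleanup_job`, `per_file_jobs`, `bucket_aware_jobs`) are hypothetical, and the integer `bucket` field stands in for whatever a strategy defines as a bucket (size tier, window, or level).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sstable descriptor: a name plus the bucket it belongs to.
struct toy_sstable {
    std::string name;
    int bucket;
};

using cleanup_job = std::vector<std::string>;

// Default method: one file is cleaned up per job.
inline std::vector<cleanup_job> per_file_jobs(const std::vector<toy_sstable>& files) {
    std::vector<cleanup_job> jobs;
    for (const auto& f : files) {
        jobs.push_back(cleanup_job{f.name});
    }
    return jobs;
}

// Bucket-aware method: all files of a bucket are cleaned up together,
// so their output becomes available to regular compaction at once.
inline std::vector<cleanup_job> bucket_aware_jobs(const std::vector<toy_sstable>& files) {
    std::map<int, cleanup_job> by_bucket;
    for (const auto& f : files) {
        by_bucket[f.bucket].push_back(f.name);
    }
    std::vector<cleanup_job> jobs;
    for (auto& entry : by_bucket) {
        jobs.push_back(std::move(entry.second));
    }
    return jobs;
}
```

For three files spread over two buckets, the default method yields three jobs and the bucket-aware method yields two.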
Raphael S. Carvalho
5312526e5e test: sstable_compaction_test: Add test for strategy cleanup method
Stresses default and STCS implementations of cleanup method

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-03-25 11:23:29 -03:00
Avi Kivity
72c6859c25 Merge "readers: get rid of v1 mutation from fragments" from Botond
"
The only real user is view building, which is converted to v2 and then
the v1 version of the mutation from fragments reader is removed.

Tests: unit(dev, release)
"

* 'v2-only-from-fragments-mutations/v1' of https://github.com/denesb/scylla:
  readers: remove now unused v1 reader from fragments
  test/boost: flat_mutation_reader_test: remove reader from fragments test
  replica/table: migrate generate_and_propagate_view_updates() to v2
  replica/table: migrate populate_views() to v2
  db/view: convert view_update_builder interface to v2
  db/view: migrate view_update_builder to v2
2022-03-22 15:18:25 +02:00
Raphael S. Carvalho
c25d8f6770 compaction: Move decision of garbage collection from strategy to task type
For compaction to be able to purge expired data, like tombstones, a
sstable set snapshot is set in the compaction descriptor.

That's a decision that belongs to the task type. For example, all regular
compactions enable GC, whereas scrub doesn't, for safety
reasons.

The problem is that the decision is being made by every instantiation
of compaction_descriptor in the strategies, which is both unnecessary
and also adds lots of boilerplate to the code, making it hard to
understand and work with.

As sstable set snapshot is an implementation detail, a new method
is being added to compaction_descriptor to make the intention
clearer, making the interface easier to understand.

can_purge_tombstones, used previously by rewrite task only, is being
reused for communicating GC intention into task::compact_sstables().

The boilerplate was a pain when adding a new strategy method for
the ongoing work on cleanup, described by issue #10097.
Another benefit is that we'll now only create a set snapshot when
compaction will really run. Before, it could happen that the snapshot
would be discarded if the compaction attempt had to be postponed,
which is a waste of cpu cycles.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-03-21 12:14:04 -03:00
Nadav Har'El
f76f6dbccb secondary index: avoid special characters in default index names
In CQL, table names are limited to so-called word characters (letters,
numbers and underscores), but column names don't have such a limitation.
When we create a secondary index, its default name is constructed from
the column name - so can contain problematic characters. It can include
even the "/" character. The problem is that the index name is then used,
like a table name, to create a directory with that name.

The test included in this patch demonstrates that before this patch, this
can be misused to create subdirectories anywhere in the filesystem, or to
crash Scylla when it fails to create a directory (which it considers an
unrecoverable I/O error).

In this patch we do what Cassandra does - remove all non-word
characters from the indexed column name before constructing the default
index name. In the included test - which can run on both Scylla and
Cassandra - we verify that the constructed index name is the same as
in Cassandra, which is useful to know (e.g., because knowing the index
name is needed to DROP the index).

Also, this patch adds a second line of defense against the security problem
described above: It is now an error to create a schema with a slash or
null (the two characters not allowed in Unix filenames) in the keyspace
or table names. So if the first line of defense (CQL checking the validity
of its commands) fails, we'll have that second line of defense. I verified
that if I revert the default-index-name fix, the second line of defense
kicks in, and the index creation is aborted and cannot create files in
the wrong place to crash Scylla.

Fixes #3403

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220320162543.3091121-1-nyh@scylladb.com>
2022-03-20 18:33:48 +02:00
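The two lines of defense described in f76f6dbccb can be sketched as follows. This is an illustration, not the patch itself: the helper names are hypothetical, and the `<table>_<column>_idx` naming scheme is an assumption about the default index name format.

```cpp
#include <cassert>
#include <string>

// Word characters as described above: ASCII letters, digits and underscore.
inline bool is_word_char(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}

// First line of defense: drop every non-word character from the column name.
inline std::string strip_non_word(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (is_word_char(c)) {
            out.push_back(c);
        }
    }
    return out;
}

// Assumed default naming scheme: <table>_<stripped column>_idx.
inline std::string default_index_name(const std::string& table, const std::string& column) {
    return table + "_" + strip_non_word(column) + "_idx";
}

// Second line of defense: reject names containing '/' or NUL, the two
// characters not allowed in Unix filenames.
inline bool is_safe_schema_name(const std::string& name) {
    return name.find('/') == std::string::npos &&
           name.find('\0') == std::string::npos;
}
```

A column named `../../etc/x y` would then contribute only `etcxy` to the index name, so the resulting directory stays inside the table's own location.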