Commit Graph

23244 Commits

Author SHA1 Message Date
Avi Kivity
6f986df458 Merge "Fix TWCS compaction aggressiveness due to data segregation" from Raphael
"
After data segregation feature, anything that cause out-of-order writes,
like read repair, can result in small updates to past time windows.
This causes compaction to be very aggressive because whenever a past time
window is updated like that, that time window is recompacted into a
single SSTable.
Users expect that once a window is closed, it will no longer be written
to, but that has changed since the introduction of the data segregation
future. We didn't anticipate the write amplification issues that the
feature would cause. To fix this problem, let's perform size-tiered
compaction on the windows that are no longer active and were updated
because data was segregated. The current behavior where the last active
window is merged into one file is kept. But thereafter, that same
window will only be compacted using STCS.

Fixes #6928.
"

* 'fix_twcs_agressiveness_after_data_segregation_v2' of github.com:raphaelsc/scylla:
  compaction/twcs: improve further debug messages
  compaction/twcs: Improve debug log which shows all windows
  test: Check that TWCS properly performs size-tiered compaction on past windows
  compaction/twcs: Make task estimation take into account the size-tiered behavior
  compaction/stcs: Export static function that estimates pending tasks
  compaction/stcs: Make get_buckets() static
  compact/twcs: Perform size-tiered compaction on past time windows
  compaction/twcs: Make strategy easier to extend by removing duplicated knowledge
  compaction/twcs: Make newest_bucket() non-static
  compaction/twcs: Move TWCS implementation into source file
2020-08-19 17:19:01 +03:00
Avi Kivity
f6b66456fd Update seastar submodule
Contains patch from Rafael to fix up includes.

* seastar c872c3408c...7f7cf0f232 (9):
  > future: Consider result_unavailable invalid in future_state_base::ignore()
  > future: Consider result_unavailable invalid in future_state_base::valid()
  > Merge "future-util: split header" from Benny
  > docs: corrected some text and code-examples in streaming-rpc docs
  > future: Reduce nesting in future::then
  > demos: coroutines: include std-compat.hh
  > sstring: mark str() and methods using it as noexcept
  > tls: Add an assert
  > future: fix coroutine compilation
2020-08-19 17:18:57 +03:00
Rafael Ávila de Espíndola
56724d084d sstables: Move date_tiered_compaction_strategy_options::date_tiered_compaction_strategy_options out of line
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200812232915.442564-6-espindola@scylladb.com>
2020-08-19 11:34:13 +03:00
Rafael Ávila de Espíndola
07b3ead752 sstables: Move size_tiered_compaction_strategy_options::size_tiered_compaction_strategy_options out of line
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200812232915.442564-5-espindola@scylladb.com>
2020-08-19 11:34:13 +03:00
Rafael Ávila de Espíndola
7b3946fa0e sstables: Move compaction_strategy_impl::compaction_strategy_impl out of line
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200812232915.442564-4-espindola@scylladb.com>
2020-08-19 11:34:13 +03:00
Rafael Ávila de Espíndola
9ba765fe6f sstables: Move compaction_strategy_impl::get_value out of line
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200812232915.442564-3-espindola@scylladb.com>
2020-08-19 11:34:13 +03:00
Rafael Ávila de Espíndola
06b15aa7e3 sstables: Move time_window_compaction_strategy_options' constructors to a .cc
These are not trivial and not hot.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200812232915.442564-2-espindola@scylladb.com>
2020-08-19 11:34:13 +03:00
Raphael S. Carvalho
d601f78b4b compaction/twcs: improve further debug messages
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:09 -03:00
Raphael S. Carvalho
086f277584 compaction/twcs: Improve debug log which shows all windows
The current log prints one log entry for each window, it doesn't print
the # of SSTs in the bucket, and the now information is copied across
all the window entries.

previously, it looked like this:

[shard 0] compaction - Key 1597331160000000, now 1597331160000000
[shard 0] compaction - Key 1597331100000000, now 1597331160000000
[shard 0] compaction - Key 1597331040000000, now 1597331160000000
[shard 0] compaction - Key 1597330980000000, now 1597331160000000

this made it harder to group all windows which reflect the state of
the strategy in a given time.

now, it looks like as follow:

[shard 0] compaction - time_window_compaction_strategy::newest_bucket:
  now 1597331160000000
  buckets = {
    key=1597331160000000, size=1
    key=1597331100000000, size=2
    key=1597331040000000, size=1
    key=1597330980000000, size=1
  }

Also the level of this log is changed from debug to trace, given that
now it's compressed and only printed once.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:09 -03:00
Raphael S. Carvalho
3be1420083 test: Check that TWCS properly performs size-tiered compaction on past windows
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:09 -03:00
Raphael S. Carvalho
96436312be compaction/twcs: Make task estimation take into account the size-tiered behavior
The task estimation was not taking into account that TWCS does size-tiered
on the the windows, and it only added 1 to the estimation when there
could be more tasks than that depending on the amount of SSTables in
all the existing size tiers.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:09 -03:00
Raphael S. Carvalho
d287b1c198 compaction/stcs: Export static function that estimates pending tasks
That will be useful for allowing other compaction strategies that use
STCS to properly estimate the pending tasks.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:09 -03:00
Raphael S. Carvalho
b62737fd05 compaction/stcs: Make get_buckets() static
STCS will export a static function to estimate pending tasks, and
it relies on get_buckets() being static too.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:07 -03:00
Botond Dénes
48550eaae4 scylla-gdb.py: add find_vptrs_of_type() helper functions
One of the most common code I write when investigating coredumps is
finding all objects of a certain type and creating a human readable
report of certain properties of these objects. This usually involves
retrieving all objects with a vptr with `find_vptrs()` and matching
their type to some pattern. I found myself writing this boilerplate over
and over again, so in this patch I introduce a convenience method to
avoid repeating it in the future.
Message-Id: <20200818145247.2358116-1-bdenes@scylladb.com>
2020-08-18 17:20:06 +02:00
Botond Dénes
74ffafc8a7 scylla-gdb.py: scylla fiber: add actual return to early return
scylla_fiber._walk() has an early return condition on the passed-in
pointer actually being a task pointer. The problem is that the actual
return statement was missing and only an error was logged. This resulted
in execution continuing and further weird errors being printed due to
the code not knowing how to handle the bad pointer.
Message-Id: <20200818144902.2357289-1-bdenes@scylladb.com>
2020-08-18 17:17:25 +02:00
Botond Dénes
f3af6ff221 scylla-gdb.py: scylla fiber: add new FQ name of thread_wake_task
thread_wake_task was moved into an anonymous namespace, add this new
fully qualified name to the task name white-list. Leave the old name for
backward compatibility.

While at it, also add `seastar::thread_context` which is also a task
object, for better seastar thread support.
Message-Id: <20200818142206.2354921-1-bdenes@scylladb.com>
2020-08-18 16:48:01 +02:00
Botond Dénes
ece638fb3f scylla-gdb.py: collection_element(): add std::tuple support
Accessing the element of a tuple from the gdb command line is a
nightmare, add support to collection_element() retrieving one of its
elements to make this easier.
Message-Id: <20200818141123.2351892-1-bdenes@scylladb.com>
2020-08-18 16:48:01 +02:00
Botond Dénes
077dc7c021 scylla-gdb.py: boost_intrusive_list: add __len__() operator
Message-Id: <20200818141340.2352666-1-bdenes@scylladb.com>
2020-08-18 16:48:01 +02:00
Dejan Mircevski
fb6c011b52 everywhere: Insert space after switch
Quoth @avikivity: "switch is not a function, and we celebrate that by
putting a space after it like other control-flow keywords."

https://github.com/scylladb/scylla/pull/7052#discussion_r471932710

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-08-18 14:31:04 +03:00
Botond Dénes
78f94ba36a table: get_sstables_by_partition_key(): don't make a copy of selected sstables
Currently we assign the reference to the vector of selected sstables to
`auto sst`. This makes a copy and we pass this local variable to
`do_for_each()`, which will result in a use-after-free if the latter
defers.
Fix by not making a copy and instead just keep the reference.

Fixes: #7060

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200818091241.2341332-1-bdenes@scylladb.com>
2020-08-18 14:20:31 +03:00
Avi Kivity
ecb2bdad54 Merge 'Replace operator_type with an enum' from Dejan
"
operator_type is awkward because it's not copyable or assignable. Replace it with a new enum class.

Tests: unit(dev)
"

* dekimir-operator-type:
  cql3: Drop operator_type entirely
  cql3: Drop operator_type from the parser
  cql3/expr: Replace operator_type with an enum
2020-08-18 13:45:20 +03:00
Dejan Mircevski
1aa326c93b cql3: Drop operator_type entirely
Since no live code uses it anymore, it can be safely removed.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-08-18 12:27:01 +02:00
Dejan Mircevski
d97605f4f8 cql3: Drop operator_type from the parser
Replace operator_type with the nicer-behaved oper_t in CQL parser and,
consequently, in the relation hierarchy and column_condition.

After this, no references to operator_type remain in live code.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-08-18 12:27:00 +02:00
Dejan Mircevski
71c921111d cql3/expr: Replace operator_type with an enum
operator_type is awkward because it's not copyable or assignable.
Replace it in expression representation with a new enum class, oper_t.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-08-18 12:27:00 +02:00
Avi Kivity
b0ae9d0c7d Update tools/python3 submodule
* tools/python3 196be5a...f89ade5 (1):
  > reloc: cleanup deb builddir
2020-08-18 13:10:01 +03:00
Pekka Enberg
385ad5755b configure.py: Move tarballs to build/<mode>/dist/tar
As suggested by Avi, let's move the tarballs from
"build/dist/<mode>/tar" to "build/<mode>/dist/tar" to retain the
symmetry of different build modes, and make the tarballs easier to
discover. While at it, let's document the new tarball locations.

Message-Id: <20200818100427.1876968-1-penberg@scylladb.com>
2020-08-18 13:07:52 +03:00
Botond Dénes
22a6493716 view_update_generator: fix race between registering and processing sstables
fea83f6 introduced a race between processing (and hence removing)
sstables from `_sstables_with_tables` and registering new ones. This
manifested in sstables that were added concurrently with processing a
batch for the same sstables being dropped and the semaphore units
associated with them not returned. This resulted in repairs being
blocked indefinitely as the units of the semaphore were effectively
leaked.

This patch fixes this by moving the contents of `_sstables_with_tables`
to a local variable before starting the processing. A unit test
reproducing the problem is also added.

Fixes: #6892

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200817160913.2296444-1-bdenes@scylladb.com>
2020-08-18 10:22:35 +03:00
Takuya ASADA
352a136ae2 scylla-python3: move scylla-python3 to separated repository
Except scylla-python3, each scylla package has its own git repository, same package script filename, same build directory structure.
To put python3 thing on scylla repo, we created 'python3' directory on multiple locations, made '-python3' suffixed files, dig deeper build directory not to conflict scylla-server package build.
We should move all scylla-python3 related files to new repository, scylla-python3.

To keep compatibility with current Jenkins script, provide packages on
build/ directory for now.

Fixes #6751
2020-08-18 09:34:08 +03:00
Raphael S. Carvalho
f9f0be9ac8 compact/twcs: Perform size-tiered compaction on past time windows
After data segregation feature, anything that cause out-of-order writes,
like read repair, can result in small updates to past time windows.
This causes compaction to be very aggressive because whenever a past time
window is updated like that, that time window is recompacted into a
single SSTable.
Users expect that once a window is closed, it will no longer be written
to, but that has changed since the introduction of the data segregation
future. We didn't anticipate the write amplification issues that the
feature would cause. To fix this problem, let's perform size-tiered
compaction on the windows that are no longer active and were updated
because data was segregated. The current behavior where the last active
window is merged into one file is kept. But thereafter, that same
window will only be compacted using STCS.

Fixes #6928.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-17 12:29:34 -03:00
Raphael S. Carvalho
820b47e9a3 compaction/twcs: Make strategy easier to extend by removing duplicated knowledge
TWCS is hard to extend because its knowledge on what to do with a window
bucket is duplicated in two functions. Let's remove this duplication by
placing the knowledge into a single function.

This is important for the coming change that will perform size-tiered
instead of major on windows that are no longer active.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-17 12:29:34 -03:00
Raphael S. Carvalho
f2b588cfc4 compaction/twcs: Make newest_bucket() non-static
To fix #6928, newest_bucket() will have to access the class fields.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-17 12:29:34 -03:00
Raphael S. Carvalho
b95359314d compaction/twcs: Move TWCS implementation into source file
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-17 12:29:34 -03:00
Pavel Solodovnikov
9aa4712270 lwt: introduce paxos_grace_seconds per-table option to set paxos ttl
Previously system.paxos TTL was set as max(3h, gc_grace_seconds).

Introduce new per-table option named `paxos_grace_seconds` to set
the amount of seconds which are used to TTL data in paxos tables
when using LWT queries against the base table.

Default value is equal to `DEFAULT_GC_GRACE_SECONDS`,
which is 10 days.

This change allows to easily test various issues related to paxos TTL.

Fixes #6284

Tests: unit (dev, debug)

Co-authored-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200816223935.919081-1-pa.solodovnikov@scylladb.com>
2020-08-17 16:44:14 +02:00
Nadav Har'El
4c73d43153 Alternator: allow CreateTable with SSESpecification explicitly disabled
While Alternator doesn't yet support creating a table with a different
"server-side encryption" (a.k.a. encryption-at-rest) parameters, the
SSESpecification option with Enabled=false should still be allowed, as
it is just the default, and means exactly the same as would a missing
SSESpecification.

This patch also adds a test for this case, which failed on Alternator
before this patch.

Fixes #7031.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200812205853.173846-1-nyh@scylladb.com>
2020-08-17 13:48:52 +02:00
Nadav Har'El
159a966949 alternator test, streams: add test LSI key attributes in OldImage
This patch adds a test that attributes which serve as a key for a
secondary index still appear in the OldImage in an Alternator Stream.

This is a special case, because although usually Alternator attributes
are saved as map elements, not stand-alone Scylla columns, in the special
case of secondary-index keys they *are* saved as actual Scylla columns
in the base table. And it turns out we produce wrong results in this case:
CDC's "preimage" does not currently include these columns if they didn't
change, while DynamoDB requires that all columns, not just the changed ones,
appear in OldImage. So the test added in this patch xfails on Alternator
(and as usual, passes on DynamoDB).

Refs #7030.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200812144656.148315-1-nyh@scylladb.com>
2020-08-17 13:46:53 +02:00
Pekka Enberg
d6354cb507 dbuild: Use host $USER and $HOME in Podman container
The "user.home" system property in JVM does not use the "HOME"
environment variable. This breaks Ant and Maven builds with Podman,
which attempts to look up the local Maven repository in "/root/.m2" when
building tools, for example:

  build.xml:757: /root/.m2/repository does not exist.

To fix the issue, let's bind-mount an /etc/passwd file, which contains
host username for UID 0, which ensures that Podman container $USER and $HOME
are the same as on the host.

Message-Id: <20200817085720.1756807-1-penberg@scylladb.com>
2020-08-17 13:46:28 +03:00
Avi Kivity
f4dbe3e65e Update tools/jmx submodule
* tools/jmx c5ed831...be8f1ac (1):
  > dist/common/systemd: set WorkingDirectory to get heap dump correctly
2020-08-17 09:54:59 +03:00
Avi Kivity
3b1ff90a1a Merge "Get rid of seed concept in gossip" from Asias
"
gossip: Get rid of seed concept

The concept of seed and the different behaviour between seed nodes and
non seed nodes generate a lot of confusion, complication and error for
users. For example, how to add a seed node into into a cluster, how to
promote a non seed node to a seed node, how to choose seeds node in
multiple DC setup, edit config files for seeds, why seed node does not
bootstrap.

If we remove the concept of seed, it will get much easier for users.
After this series, seed config option is only used once when a new node
joins a cluster.

Major changes:

Seed nodes are only used as the initial contact point nodes.

Seed nodes now perform bootstrap. The only exception is the first node
in the cluster.

The unsafe auto_bootstrap option is now ignored.

Gossip shadow round now talks to all nodes instead of just seed nodes.

Refs: #6845
Tests: update_cluster_layout_tests.py + manual test
"

* 'gossip_no_seed_v2' of github.com:asias/scylla:
  gossip: Get rid of seed concept
  gossip: Introduce GOSSIP_GET_ENDPOINT_STATES verb
  gossip: Add do_apply_state_locally helper
  gossip: Do not talk to seed node explicitly
  gossip: Talk to live endpoints in a shuffled fashion
2020-08-17 09:50:51 +03:00
Avi Kivity
5356e8319d Merge 'Support building packages on non-x86 platform' from Takuya
"
Allow users to build unofficial packages for non-x86 platform.
"

* syuu1228-aarch64_packaging_fix:
  dist/debian: allow building non-amd64 .deb
  configure.py: disable DPDK by default on non-x86_64 platform
2020-08-17 08:26:17 +03:00
Takuya ASADA
c73e945cf6 dist/debian: allow building non-amd64 .deb
Allow building .deb on any architecture, not only amd64.
2020-08-17 14:16:24 +09:00
Takuya ASADA
06079f0656 configure.py: disable DPDK by default on non-x86_64 platform
Since configure.py without option fails on some non-x86 architecture
such as ARM64, we should disable it on such architectures.
2020-08-17 14:16:24 +09:00
Asias He
d0b3f3dfe8 gossip: Get rid of seed concept
The concept of seed and the different behaviour between seed nodes and
non seed nodes generate a lot of confusion, complication and error for
users. For example, how to add a seed node into into a cluster, how to
promote a non seed node to a seed node, how to choose seeds node in
multiple DC setup, edit config files for seeds, why seed node does not
bootstrap.

If we remove the concept of seed, it will get much easier for users.
After this series, seed config option is only used once when a new node
joins a cluster.

Major changes:

- Seed nodes are only used as the initial contact point nodes.

- Seed nodes now perform bootstrap. The only exception is the first node
  in the cluster.

- The unsafe auto_bootstrap option is now ignored.

- Gossip shadow round now attempts to talk to all nodes instead of just seed nodes.

Manual test:

- bootstrap n1, n2, n3  (n1 and n2 are listed as seed, check only n1
  will skip bootstrap, n2 and n3 will bootstrap)
- shtudown n1, n2, n3
- start n2 (check non seed node can boot)
- start n1 (check n1 talks to both n2 and n3)
- start n3 (check n3 talks to both n1 and n3)

Upgrade/Downgrade test:

- Initialize cluster
  Start 3 node with n1, n2, n3 using old version
  n1 and n2 are listed as seed

- Test upgrade starting from seed nodes
  Rolling restart n1 using new version
  Rolling restart n2 using new version
  Rolling restart n3 using new version

- Test downgrade to old version
  Rolling restart n1 using old version
  Rolling restart n2 using old version
  Rolling restart n3 using old version

- Test upgrade starting from non seed nodes
  Rolling restart n3 using new version
  Rolling restart n2 using new version
  Rolling restart n1 using new version

Notes on upgrade procedure:

There is no special procedure needed to upgrade to Scylla without seed
concept. Rolling upgrade node one by one is good enough.

Fixes: #6845
Tests: ./test.py + update_cluster_layout_tests.py + manual test
2020-08-17 10:35:16 +08:00
Takuya ASADA
75c2362c95 dist/debian: disable debuginfo compression on .deb
Since older binutils on some distribution does not able to handle
compressed debuginfo generated on Fedora, we need to disable it.
However, debian packager force debuginfo compression since debian/compat = 9,
we have to uncompress them after compressed automatically.

Fixes #6982
2020-08-16 18:13:29 +03:00
Avi Kivity
125795bda5 Merge " Build tarballs to build/dist/<mode>/tar directory" from Pekka
"
This patch series changes the build system to build all tarballs to
build/dist/<mode>/tar directory. For example, running:

  ./tools/toolchain/dbuild ./configure.py --mode=dev && ./tools/toolchain/dbuild ninja-build dist-tar

produces the following tarballs in build/dist/dev/tar:

  $ ls -1 build/dist/dev/tar/
  scylla-jmx-package.tar.gz
  scylla-package.tar.gz
  scylla-python3-package.tar.gz
  scylla-tools-package.tar.gz

This makes it easy to locate release tarballs for humans and scripts. To
preserve backward compatibility, the tarballs are also retained in their
original locations. Once release engineering infrastructure has been
adjusted to use the new locations, we can drop the duplicate copies.
"

* 'penberg/build-dist-tar/v1' of github.com:penberg/scylla:
  configure.py: Copy tarballs to build/dist/<mode>/tar directory
  configure.py: Add "dist-<component>-tar" targets
  reloc/python3: Add "--builddir" to build_deb.sh
  configure.py: Use copy-on-write copies when possible
2020-08-16 17:55:35 +03:00
Avi Kivity
061ec49a6c Merge "Improve error reporting on invalid internal schema access" from Tomasz
"
Contains several fixes which improve debuggability in situations where
too large column ids are passed to column definition loop methods.
"

* 'schema-range-check-fix' of github.com:tgrabiec/scylla:
  schema: Add table name and schema version to error messages
  schema: Use on_internal_error() for range check errors
  schema: Fix off-by-one in column range check
  schema: Make range checks for regular and static columns the same as for clustering columns
2020-08-16 17:48:48 +03:00
Raphael S. Carvalho
81ec49c82f sstables/sstable_set: rename method to retrieve sstable runs
select() is too generic for the method that retrieve sstable runs,
and it has a completely different meaning that the former select
method used to select sstables based on token range.
let's give it a more descriptive name.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200811193401.22749-1-raphaelsc@scylladb.com>
2020-08-16 17:41:16 +03:00
Raphael S. Carvalho
b07920dd1f sstables: Fix remove_by_toc_name() on temporary toc
regression caused by 55cf219c97.

remove_by_toc_name() must work both for a sealed sstable with toc,
and also a partial sstable with tmp toc.
so dirname() should be called conditionally on the condition of
the sstable.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200813160612.101117-1-raphaelsc@scylladb.com>
2020-08-16 17:35:55 +03:00
Raphael S. Carvalho
7d7f9e1c54 sstables/LCS: increase per-level overlapping tolerance in reshape
LCS can have its overlapping invariant broken after operations that can
proceed in parallel to regular compaction like cleanup. That's because
there could be two compactions in parallel placing data in overlapping
token ranges of a given level > 0.
After reshape, the whole table will be rewritten, on restart, if a
given level has more than (fan_out*2)=20 overlaps.
That may sound like enough, but that's not taking into account the
exponential growth in # of SSTables per level, so 20 overlaps may
sound like a lot for level 2 which can afford 100 sstables, but it's
only 2% of level 3, and 0.2% of level 4. So let's change the
overlapping tolerance from the constant of fan_out*2 to 10% of level
limit on # of SSTables, or fan_out, whichever is higher.

Refs #6938.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200810154510.32794-1-raphaelsc@scylladb.com>
2020-08-16 17:33:48 +03:00
Raphael S. Carvalho
11df96718a compaction: Prevent non-regular compaction from picking compacting SSTables
After 8014c7124, cleanup can potentially pick a compacting SSTable.
Upgrade and scrub can also pick a compacting SSTable.
The problem is that table::candidates_for_compaction() was badly named.
It misleads the user into thinking that the SSTables returned are perfect
candidates for compaction, but manager still need to filter out the
compacting SSTables from the returned set. So it's being renamed.

When the same SSTable is compacted in parallel, the strategy invariant
can be broken like overlapping being introduced in LCS, and also
some deletion failures as more than one compaction process would try
to delete the same files.

Let's fix scrub, cleanup and ugprade by calling the manager function
which gets the correct candidates for compaction.

Fixes #6938.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200811200135.25421-1-raphaelsc@scylladb.com>
2020-08-16 17:31:03 +03:00
Nadav Har'El
7e01ae089e cdc: avoid including cdc/cdc_options.hh everywhere
Before this patch, modifying cdc/cdc_options.hh required recompiling 264
source files. This is because this header file was included by a couple
other header files - most notably schema.hh, where a forward declaration
would have been enough. Only the handful of source files which really
need to access the CDC options should include "cdc/cdc_options.hh" directly.

After this patch, modifying cdc/cdc_options.hh requires only 6 source files
to be recompiled.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200813070631.180192-1-nyh@scylladb.com>
2020-08-16 14:41:47 +03:00