Commit Graph

23218 Commits

Author SHA1 Message Date
Raphael S. Carvalho
d601f78b4b compaction/twcs: improve further debug messages
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:09 -03:00
Raphael S. Carvalho
086f277584 compaction/twcs: Improve debug log which shows all windows
The current log prints one log entry for each window, it doesn't print
the # of SSTs in the bucket, and the now information is copied across
all the window entries.

previously, it looked like this:

[shard 0] compaction - Key 1597331160000000, now 1597331160000000
[shard 0] compaction - Key 1597331100000000, now 1597331160000000
[shard 0] compaction - Key 1597331040000000, now 1597331160000000
[shard 0] compaction - Key 1597330980000000, now 1597331160000000

this made it harder to group all windows which reflect the state of
the strategy in a given time.

now, it looks like as follow:

[shard 0] compaction - time_window_compaction_strategy::newest_bucket:
  now 1597331160000000
  buckets = {
    key=1597331160000000, size=1
    key=1597331100000000, size=2
    key=1597331040000000, size=1
    key=1597330980000000, size=1
  }

Also the level of this log is changed from debug to trace, given that
now it's compressed and only printed once.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:09 -03:00
Raphael S. Carvalho
3be1420083 test: Check that TWCS properly performs size-tiered compaction on past windows
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:09 -03:00
Raphael S. Carvalho
96436312be compaction/twcs: Make task estimation take into account the size-tiered behavior
The task estimation was not taking into account that TWCS does size-tiered
on the the windows, and it only added 1 to the estimation when there
could be more tasks than that depending on the amount of SSTables in
all the existing size tiers.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:09 -03:00
Raphael S. Carvalho
d287b1c198 compaction/stcs: Export static function that estimates pending tasks
That will be useful for allowing other compaction strategies that use
STCS to properly estimate the pending tasks.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:09 -03:00
Raphael S. Carvalho
b62737fd05 compaction/stcs: Make get_buckets() static
STCS will export a static function to estimate pending tasks, and
it relies on get_buckets() being static too.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:07 -03:00
Raphael S. Carvalho
f9f0be9ac8 compact/twcs: Perform size-tiered compaction on past time windows
After data segregation feature, anything that cause out-of-order writes,
like read repair, can result in small updates to past time windows.
This causes compaction to be very aggressive because whenever a past time
window is updated like that, that time window is recompacted into a
single SSTable.
Users expect that once a window is closed, it will no longer be written
to, but that has changed since the introduction of the data segregation
future. We didn't anticipate the write amplification issues that the
feature would cause. To fix this problem, let's perform size-tiered
compaction on the windows that are no longer active and were updated
because data was segregated. The current behavior where the last active
window is merged into one file is kept. But thereafter, that same
window will only be compacted using STCS.

Fixes #6928.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-17 12:29:34 -03:00
Raphael S. Carvalho
820b47e9a3 compaction/twcs: Make strategy easier to extend by removing duplicated knowledge
TWCS is hard to extend because its knowledge on what to do with a window
bucket is duplicated in two functions. Let's remove this duplication by
placing the knowledge into a single function.

This is important for the coming change that will perform size-tiered
instead of major on windows that are no longer active.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-17 12:29:34 -03:00
Raphael S. Carvalho
f2b588cfc4 compaction/twcs: Make newest_bucket() non-static
To fix #6928, newest_bucket() will have to access the class fields.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-17 12:29:34 -03:00
Raphael S. Carvalho
b95359314d compaction/twcs: Move TWCS implementation into source file
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-17 12:29:34 -03:00
Avi Kivity
f4dbe3e65e Update tools/jmx submodule
* tools/jmx c5ed831...be8f1ac (1):
  > dist/common/systemd: set WorkingDirectory to get heap dump correctly
2020-08-17 09:54:59 +03:00
Avi Kivity
3b1ff90a1a Merge "Get rid of seed concept in gossip" from Asias
"
gossip: Get rid of seed concept

The concept of seed and the different behaviour between seed nodes and
non seed nodes generate a lot of confusion, complication and error for
users. For example, how to add a seed node into into a cluster, how to
promote a non seed node to a seed node, how to choose seeds node in
multiple DC setup, edit config files for seeds, why seed node does not
bootstrap.

If we remove the concept of seed, it will get much easier for users.
After this series, seed config option is only used once when a new node
joins a cluster.

Major changes:

Seed nodes are only used as the initial contact point nodes.

Seed nodes now perform bootstrap. The only exception is the first node
in the cluster.

The unsafe auto_bootstrap option is now ignored.

Gossip shadow round now talks to all nodes instead of just seed nodes.

Refs: #6845
Tests: update_cluster_layout_tests.py + manual test
"

* 'gossip_no_seed_v2' of github.com:asias/scylla:
  gossip: Get rid of seed concept
  gossip: Introduce GOSSIP_GET_ENDPOINT_STATES verb
  gossip: Add do_apply_state_locally helper
  gossip: Do not talk to seed node explicitly
  gossip: Talk to live endpoints in a shuffled fashion
2020-08-17 09:50:51 +03:00
Avi Kivity
5356e8319d Merge 'Support building packages on non-x86 platform' from Takuya
"
Allow users to build unofficial packages for non-x86 platform.
"

* syuu1228-aarch64_packaging_fix:
  dist/debian: allow building non-amd64 .deb
  configure.py: disable DPDK by default on non-x86_64 platform
2020-08-17 08:26:17 +03:00
Takuya ASADA
c73e945cf6 dist/debian: allow building non-amd64 .deb
Allow building .deb on any architecture, not only amd64.
2020-08-17 14:16:24 +09:00
Takuya ASADA
06079f0656 configure.py: disable DPDK by default on non-x86_64 platform
Since configure.py without option fails on some non-x86 architecture
such as ARM64, we should disable it on such architectures.
2020-08-17 14:16:24 +09:00
Asias He
d0b3f3dfe8 gossip: Get rid of seed concept
The concept of seed and the different behaviour between seed nodes and
non seed nodes generate a lot of confusion, complication and error for
users. For example, how to add a seed node into into a cluster, how to
promote a non seed node to a seed node, how to choose seeds node in
multiple DC setup, edit config files for seeds, why seed node does not
bootstrap.

If we remove the concept of seed, it will get much easier for users.
After this series, seed config option is only used once when a new node
joins a cluster.

Major changes:

- Seed nodes are only used as the initial contact point nodes.

- Seed nodes now perform bootstrap. The only exception is the first node
  in the cluster.

- The unsafe auto_bootstrap option is now ignored.

- Gossip shadow round now attempts to talk to all nodes instead of just seed nodes.

Manual test:

- bootstrap n1, n2, n3  (n1 and n2 are listed as seed, check only n1
  will skip bootstrap, n2 and n3 will bootstrap)
- shtudown n1, n2, n3
- start n2 (check non seed node can boot)
- start n1 (check n1 talks to both n2 and n3)
- start n3 (check n3 talks to both n1 and n3)

Upgrade/Downgrade test:

- Initialize cluster
  Start 3 node with n1, n2, n3 using old version
  n1 and n2 are listed as seed

- Test upgrade starting from seed nodes
  Rolling restart n1 using new version
  Rolling restart n2 using new version
  Rolling restart n3 using new version

- Test downgrade to old version
  Rolling restart n1 using old version
  Rolling restart n2 using old version
  Rolling restart n3 using old version

- Test upgrade starting from non seed nodes
  Rolling restart n3 using new version
  Rolling restart n2 using new version
  Rolling restart n1 using new version

Notes on upgrade procedure:

There is no special procedure needed to upgrade to Scylla without seed
concept. Rolling upgrade node one by one is good enough.

Fixes: #6845
Tests: ./test.py + update_cluster_layout_tests.py + manual test
2020-08-17 10:35:16 +08:00
Takuya ASADA
75c2362c95 dist/debian: disable debuginfo compression on .deb
Since older binutils on some distribution does not able to handle
compressed debuginfo generated on Fedora, we need to disable it.
However, debian packager force debuginfo compression since debian/compat = 9,
we have to uncompress them after compressed automatically.

Fixes #6982
2020-08-16 18:13:29 +03:00
Avi Kivity
125795bda5 Merge " Build tarballs to build/dist/<mode>/tar directory" from Pekka
"
This patch series changes the build system to build all tarballs to
build/dist/<mode>/tar directory. For example, running:

  ./tools/toolchain/dbuild ./configure.py --mode=dev && ./tools/toolchain/dbuild ninja-build dist-tar

produces the following tarballs in build/dist/dev/tar:

  $ ls -1 build/dist/dev/tar/
  scylla-jmx-package.tar.gz
  scylla-package.tar.gz
  scylla-python3-package.tar.gz
  scylla-tools-package.tar.gz

This makes it easy to locate release tarballs for humans and scripts. To
preserve backward compatibility, the tarballs are also retained in their
original locations. Once release engineering infrastructure has been
adjusted to use the new locations, we can drop the duplicate copies.
"

* 'penberg/build-dist-tar/v1' of github.com:penberg/scylla:
  configure.py: Copy tarballs to build/dist/<mode>/tar directory
  configure.py: Add "dist-<component>-tar" targets
  reloc/python3: Add "--builddir" to build_deb.sh
  configure.py: Use copy-on-write copies when possible
2020-08-16 17:55:35 +03:00
Avi Kivity
061ec49a6c Merge "Improve error reporting on invalid internal schema access" from Tomasz
"
Contains several fixes which improve debuggability in situations where
too large column ids are passed to column definition loop methods.
"

* 'schema-range-check-fix' of github.com:tgrabiec/scylla:
  schema: Add table name and schema version to error messages
  schema: Use on_internal_error() for range check errors
  schema: Fix off-by-one in column range check
  schema: Make range checks for regular and static columns the same as for clustering columns
2020-08-16 17:48:48 +03:00
Raphael S. Carvalho
81ec49c82f sstables/sstable_set: rename method to retrieve sstable runs
select() is too generic for the method that retrieve sstable runs,
and it has a completely different meaning that the former select
method used to select sstables based on token range.
let's give it a more descriptive name.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200811193401.22749-1-raphaelsc@scylladb.com>
2020-08-16 17:41:16 +03:00
Raphael S. Carvalho
b07920dd1f sstables: Fix remove_by_toc_name() on temporary toc
regression caused by 55cf219c97.

remove_by_toc_name() must work both for a sealed sstable with toc,
and also a partial sstable with tmp toc.
so dirname() should be called conditionally on the condition of
the sstable.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200813160612.101117-1-raphaelsc@scylladb.com>
2020-08-16 17:35:55 +03:00
Raphael S. Carvalho
7d7f9e1c54 sstables/LCS: increase per-level overlapping tolerance in reshape
LCS can have its overlapping invariant broken after operations that can
proceed in parallel to regular compaction like cleanup. That's because
there could be two compactions in parallel placing data in overlapping
token ranges of a given level > 0.
After reshape, the whole table will be rewritten, on restart, if a
given level has more than (fan_out*2)=20 overlaps.
That may sound like enough, but that's not taking into account the
exponential growth in # of SSTables per level, so 20 overlaps may
sound like a lot for level 2 which can afford 100 sstables, but it's
only 2% of level 3, and 0.2% of level 4. So let's change the
overlapping tolerance from the constant of fan_out*2 to 10% of level
limit on # of SSTables, or fan_out, whichever is higher.

Refs #6938.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200810154510.32794-1-raphaelsc@scylladb.com>
2020-08-16 17:33:48 +03:00
Raphael S. Carvalho
11df96718a compaction: Prevent non-regular compaction from picking compacting SSTables
After 8014c7124, cleanup can potentially pick a compacting SSTable.
Upgrade and scrub can also pick a compacting SSTable.
The problem is that table::candidates_for_compaction() was badly named.
It misleads the user into thinking that the SSTables returned are perfect
candidates for compaction, but manager still need to filter out the
compacting SSTables from the returned set. So it's being renamed.

When the same SSTable is compacted in parallel, the strategy invariant
can be broken like overlapping being introduced in LCS, and also
some deletion failures as more than one compaction process would try
to delete the same files.

Let's fix scrub, cleanup and ugprade by calling the manager function
which gets the correct candidates for compaction.

Fixes #6938.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200811200135.25421-1-raphaelsc@scylladb.com>
2020-08-16 17:31:03 +03:00
Nadav Har'El
7e01ae089e cdc: avoid including cdc/cdc_options.hh everywhere
Before this patch, modifying cdc/cdc_options.hh required recompiling 264
source files. This is because this header file was included by a couple
other header files - most notably schema.hh, where a forward declaration
would have been enough. Only the handful of source files which really
need to access the CDC options should include "cdc/cdc_options.hh" directly.

After this patch, modifying cdc/cdc_options.hh requires only 6 source files
to be recompiled.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200813070631.180192-1-nyh@scylladb.com>
2020-08-16 14:41:47 +03:00
Piotr Jastrzebski
01ea159fde codebase wide: use try_emplace when appropriate
C++17 introduced try_emplace for maps to replace a pattern:
if(element not in a map) {
    map.emplace(...)
}

try_emplace is more efficient and results in a more concise code.

This commit introduces usage of try_emplace when it's appropriate.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <4970091ed770e233884633bf6d46111369e7d2dd.1597327358.git.piotr@scylladb.com>
2020-08-16 14:41:09 +03:00
Pekka Enberg
39400f58fb build_unified.sh: Compress generated tarball
Fixes #7039
Message-Id: <20200813124921.1648028-1-penberg@scylladb.com>
2020-08-16 14:41:01 +03:00
Dejan Mircevski
edf91e9e06 test: Restore a case in user_types_test
This testcase was temporarily commented out in 37ebe52, because it
relied on buggy (#6369) behaviour fixed by that commit.  Specifically,
it expected a NULL comparison to match a NULL cell value.  We now
bring it back, with corrected result expectation.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-08-16 13:49:55 +03:00
Pavel Emelyanov
319c9dda92 scylla-gdb: Fix netw command
The _clients is std::vector, it doesn't have _M_elems.
Luckily there's std_vector() class for it.

The seastar::rpc::server::_conns is unordered_map, not
unordered_set.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200814070858.32383-1-xemul@scylladb.com>
2020-08-16 11:41:02 +03:00
Asias He
c76296e97e scylla-gdb.py: Add boost_intrusive_list_printer
It is needed to print the boost::intrusive::list which is used
by repair_meta_for_masters in repair.

Fixes #7037

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Signed-off-by: Asias He <asias@scylladb.com>
2020-08-15 20:26:02 +03:00
Piotr Jastrzebski
c001374636 codebase wide: replace count with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
`count` function was often used in various ways.

`contains` does not only express the intend of the code better but also
does it in more unified way.

This commit replaces all the occurences of the `count` with the
`contains`.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>
2020-08-15 20:26:02 +03:00
Konstantin Osipov
6d7393b0df build: Make it possible to opt-out from building the packages.
Message-Id: <20200812162845.852515-1-kostja@scylladb.com>
2020-08-15 20:26:02 +03:00
Tomasz Grabiec
db1c8c439a schema: Add table name and schema version to error messages 2020-08-14 14:35:09 +02:00
Tomasz Grabiec
817c2e0508 schema: Use on_internal_error() for range check errors 2020-08-14 14:35:09 +02:00
Tomasz Grabiec
43d503102b schema: Fix off-by-one in column range check
We'd fail in std::vector::at() instead.

Let's catch all invalid accesses, as intended.
2020-08-14 14:34:51 +02:00
Tomasz Grabiec
b41f2c719b schema: Make range checks for regular and static columns the same as for clustering columns 2020-08-14 14:34:51 +02:00
Pekka Enberg
3c1db2fb87 configure.py: Copy tarballs to build/dist/<mode>/tar directory 2020-08-14 13:06:13 +03:00
Pekka Enberg
5a2d271df8 configure.py: Add "dist-<component>-tar" targets 2020-08-14 13:06:13 +03:00
Pekka Enberg
e4685020ba reloc/python3: Add "--builddir" to build_deb.sh
Add a "--builddir" command line option to build_deb.sh script of Python
3 so that we can use it to control artifact build location.
2020-08-14 13:06:13 +03:00
Pekka Enberg
7adae6b04a configure.py: Use copy-on-write copies when possible
Pass the "--reflink=auto" command line option to "cp" to use
copy-on-write copies whenever the filesystem supports it to reduce disk
space usage.
2020-08-14 13:06:13 +03:00
Asias He
e6ceec1685 gossip: Fix race between shutdown message handler and apply_state_locally
1. The node1 is shutdown
2. The node1 sends shutdown message to node2
3. The node2 receives gossip shutdown message but the handler yields
4. The node1 is restarted
5. The node1 sends new gossip endpoint_state to node2, node2 applies the state
   in apply_state_locally and calls gossiper::handle_major_state_change
   and then calls gossiper::mark_alive
6. The shutdown message handler in step 3 resumes and sets status of node1 to SHUTDOWN
7. The gossiper::mark_alive fiber in step 5 resumes and calls gossiper::real_mark_alive,
   node2 will skip to mark node1 as alive because the status of node1 is
   SHUTDOWN. As a result, node1 is alive but it is not marked as UP by node2.

To fix, we serialize the two operations.

Fixes #7032
2020-08-13 11:06:04 +03:00
Nadav Har'El
ee7291aa88 merge: CDC: allow "full" preimage in logs
Merged pull request https://github.com/scylladb/scylla/pull/7028
By Calle Wilund:

Changes the "preimage" option from binary true/false to on/off/full (accepting true/false, and using old style notation for normal to string - for upgrade reasons), where "full" will force us to include all columns in pre image log rows.

Adds small test (just adding the case to preimage test).
Uses the feature in alternator

Fixes #7030

  alternator: Set "preimage" to "full" for streams
  cdc_test: Do small test of "full"
  cdc: Make pre image optionally "full" (include all columns)
2020-08-12 23:19:46 +03:00
Calle Wilund
730c5ea283 alternator: Set "preimage" to "full" for streams
Fixes #7030

Dynamo/alternator streams old image data is supposed to
contain the full old value blob (all keys/values).

Setting preimage=full ensures we get even those properties
that have separate columns if they are not part of an actual
modification.
2020-08-12 16:05:00 +00:00
Calle Wilund
8cc5076033 cdc_test: Do small test of "full"
Not a huge test change, but at least verifies it works.
2020-08-12 16:04:52 +00:00
Calle Wilund
2eb4522fef cdc: Make pre image optionally "full" (include all columns)
Makes the "preimage" option for cdc non-binary, i.e. it can now
be "true"/"on", "false"/"off" or "full. The two former behaving like
previously, the latter obviously including all columns in pre image.
2020-08-12 16:03:06 +00:00
Avi Kivity
79851d6216 Update tools/java submodule
* tools/java f2c7cf8d8d...d6c0ad1e2e (3):
  > sstableloader: Preserve droppedColumns in column rename handling
  > Revert "reloc: Build relocatable package without Maven"
  > reloc: Build relocatable package without Maven
2020-08-12 16:58:45 +03:00
Takuya ASADA
7cccb018b8 aws: update enhanced networking supported instance list
Sync enhanced networking supported instance list to latest one.

Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html

Fixes #6991
2020-08-12 15:43:17 +03:00
Nadav Har'El
8135647906 merge: Add metrics to semaphores
Merged pull request https://github.com/scylladb/scylla/pull/7018
by Piotr Sarna:

This series addresses various issues with metrics and semaphores - it mainly adds missing metrics, which makes it possible to see the length of the queues attached to the semaphores. In case of view building and view update generation, metrics was not present in these services at all, so a first, basic implementation is added.

More precise semaphore metrics would ease the testing and development of load shedding and admission control.

	view_builder: add metrics
	db, view: add view update generator metrics
	hints: track resource_manager sending queue length
	hints: add drain queue length to metrics
	table: add metrics for sstable deletion semaphore
	database: remove unused semaphore
2020-08-12 12:39:59 +03:00
Botond Dénes
4cfab59eb1 scylla-gdb.py: find_db(): don't return current shard's database for shard=0
The `shard` parameter of `find_db()` is optional and is defaulted to
`None`. When missing, the current shard's database instance is returned.
The problem is that the if condition checking this uses `not shard`,
which also evaluates to `True` if `shard == 0`, resulting in returning
the current shard's database instance for shard 0. Change the condition
to `shard is None` to avoid this.

Fixes: #7016
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200812091546.1704016-1-bdenes@scylladb.com>
2020-08-12 12:22:46 +03:00
Avi Kivity
736863c385 Merge "repair: Add progress metrics for node ops" from Asias
"
This series adds progress metrics for the node operations. Metrics for bootstrap and rebuild progress are added as a starter. I will add more for the remaining operations after getting feedback.

With this the Scylla Monitor and Scylla Manager can know the progress of the bootstrap and other node operations. E.g.,

    scylla_node_ops_bootstrap_nr_ranges_finished{shard="0",type="derive"} 50
    scylla_node_ops_bootstrap_nr_ranges_total{shard="0",type="derive"} 1040

Fixes #1244, #6733
"

* 'repair_progress_metrics_v3' of github.com:asias/scylla:
  repair: Add progress metrics for repair ops
  repair: Add progress metrics for rebuild ops
  repair: Add progress metrics for bootstrap ops
2020-08-12 11:42:14 +03:00
Avi Kivity
8853eddaf6 Merge 'repair: Track repair_meta created on both repair follower and master' from Asias
"
It is pretty hard to find the repair_meta object when debugging a core.
This patch makes it is easier by putting repair_meta object created by
both repair follower and master into a map.

Fixes #7009
"

* asias-repair_make_debug_eaiser_track_all_repair_metas:
  repair: Add repair_meta_tracker to track repair_meta for followers and masters
  repair: Move thread local object _repair_metas out of the function
2020-08-12 11:01:32 +03:00