15432 Commits

Author SHA1 Message Date
Duarte Nunes
a3bbd52e2e Merge 'Add materialized view metrics' from Piotr
"
This series introduces materialized view statistics, as stated in issue #3385:
 - updates pushed
 - updates failed
 - row lock stats

It also addresses issue #3416 by decoupling user write stats from view
update stats.
"

* 'materialized_view_metrics_9' of https://github.com/psarna/scylla:
  view: adapt view_stats to act as write stats
  storage_proxy: decouple write_stats from stats
  db: add row locking metrics
  view: add view metrics
2018-05-22 18:41:51 +01:00
Avi Kivity
49892a06b9 Merge "exception safety and minimum work for compaction controller" from Glauber
"
This was sent before as two separate patchsets. It is now unified
because it has a lot of common infrastructure.

In this patchset I am aiming at two goals:

1) Provide a minimum amount of shares for user-initiated operations like
nodetool compact and nodetool cleanup

2) Be more robust with exceptions in the backlog tracker

For the first, the main difference is that I now made the compaction
controller a part of the compaction manager. It then becomes easy to
consult with the compaction controller for the correct amount of shares
those operations should have.

In compaction_strategy.cc, the major_compaction_strategy object was
actually already unused before. So instead of making use of it, which
would require some form of information flow downwards about the backlog
we need to export, I am creating a user-initiated backlog type inside
the compaction manager.

With the two changes described above everything is very well
self-contained within the compaction manager and the implementation
becomes trivial.

For the second, I am now handling exceptions in two places:

1) the backlog computation. Those are const functions, so if we just have
a transient exception when computing the backlog, all we need to do is
return some fixed amount of shares and try again in the next adjustment
window.

2) the process of adding / removing SSTables. Those are harder, since if
we fail to manipulate the list we'll be left in an inconsistent state.
The best approach is then to disable the backlog tracker and return a
fixed amount of shares globally.

Tests: unit (release)
"

* 'backlog-improvements-v3' of github.com:glommer/scylla:
  compaction_manager: disable backlog tracker if we see an exception
  backlog tracker: protect against exceptions in backlog calculation.
  STCS_backlog: protect against negative backlog
  STCS_backlog: remove unused attribute
  compaction strategy: move size tiered backlog to a header
  compaction_strategy: delete major_compaction_strategy class
  compaction: make sure that user-initiated compactions always have a minimum priority
  backlog_controller: add constants to represent a globally disabled controller
  backlog_controller: move compaction controller to the compaction manager
  backlog_controller: allow users to compute inverse function of shares
2018-05-22 18:35:42 +03:00
Piotr Sarna
3792bed3ed view: adapt view_stats to act as write stats
This commit adapts view_stats structure so it can be passed
to storage_proxy as write stats. Thanks to that, mv replica updates
will not interfere with user write metrics. As a side effect it also
provides more stats to replica view updates.

Closes #3385
Closes #3416
2018-05-22 16:52:58 +02:00
Piotr Sarna
1d590b3ca4 storage_proxy: decouple write_stats from stats
This commit extracts metrics related to writes from stats structure,
so it can be easily replaced later, e.g. for materialized view metrics.

References #3385
References #3416
2018-05-22 16:52:58 +02:00
Piotr Sarna
9246bb36bc db: add row locking metrics
This commit adds statistics to the row_locker class. Metrics are
independently counted for all lock types: row<->partition and
exclusive<->shared.

Metrics gathered:
 - total acquisitions
 - operations that wait on the lock
 - histogram of the time spent on waiting on this type of lock

References #3385
References #3416
2018-05-22 16:52:58 +02:00
Piotr Sarna
49bebcfa25 view: add view metrics
This commit introduces view statistics:
 - updates pushed to local/remote replicas
 - updates failed to be pushed to local/remote replicas

Metrics are kept on a per-table basis, i.e. updates_pushed_remote
shows the total number of updates (mutations) pushed to all paired
mv replicas that this particular table has.
Every single update is taken into consideration, so if a view update
requires removing a row from one view and adding a row to another,
it will be counted as 2 updates.

References #3385
References #3416
2018-05-22 16:52:58 +02:00
Tomasz Grabiec
e554a39fbb tests: memtable_snapshot_source: Fix compact()
Compactor collects all currently active memtables and later replaces
them with the merged result. The problem is that active memtable
belongs to the input set during compaction and as a result mutations
applied concurrently with compaction could be lost once compaction
replaces the memtables. The fix is to open a new active memtable when
compaction starts.

Caused sporadic failures of row_cache_test.cc:test_continuity_is_populated_when_read_overlaps_with_older_version()
Message-Id: <1526997724-13037-1-git-send-email-tgrabiec@scylladb.com>
2018-05-22 15:08:07 +01:00
Glauber Costa
d4e7783188 compaction_manager: disable backlog tracker if we see an exception
If we see an exception when adding or removing SSTables from the backlog
tracker, the backlog tracker can be inconsistent forever. It would be
best if we act before that happens and disable the backlog tracker. Once
the backlog tracker is disabled it will default to returning a fixed
number of shares.

We can either disable the backlog tracker or remove it. But if we remove
it we can end up with a backlog of zero if that's the only tracker with
a backlog. We then keep it registered but mark it as disabled. This also
leaves room for recovery in some situations: we can recover the backlog
by doing a schema change in the column family that had the backlog
disabled, for instance.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-05-22 09:36:32 -04:00
Glauber Costa
fde26ec633 backlog tracker: protect against exceptions in backlog calculation.
Backlog calculations should be exception free, but there are cases in
which I can see exceptions happening. One example is if some backlog tracker
that uses temporary objects fails an allocation.

Memory shortages can be especially pernicious: if we leave the
responsibility of catching those to the individual backlog tracker, we
will keep trying to make more allocations in the other backlog trackers
if we have many column families. By handling it here we can stop that.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-05-22 09:36:22 -04:00
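
Illustration for fde26ec633: a minimal sketch of catching a failure at the level that walks all trackers, so that one failed allocation stops further tracker calls in that adjustment window. `backlog_fn` and `try_total_backlog` are hypothetical names and std::function is only a stand-in; the patch's actual types are not shown in this log.

```cpp
#include <functional>
#include <new>
#include <vector>

// Hypothetical per-table tracker: returns that table's backlog, may throw.
using backlog_fn = std::function<float()>;

// Sum every tracker's backlog. On any exception, stop immediately and
// report failure so the caller can fall back to fixed shares for this
// window. `out` is only written on success.
inline bool try_total_backlog(const std::vector<backlog_fn>& trackers, float& out) {
    try {
        float sum = 0;
        for (const auto& tracker : trackers) {
            sum += tracker();   // a throw here skips all remaining trackers
        }
        out = sum;
        return true;
    } catch (...) {
        return false;
    }
}
```

Because the try block wraps the whole loop, trackers after the failing one are never called, which is the behaviour the message asks for under memory pressure.
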
Glauber Costa
3e08bd17f0 STCS_backlog: protect against negative backlog
A negative backlog can be interpreted as a very large backlog.
Part of that is because we keep the total_size as an unsigned type,
which is what we expect. But if there is an issue, such as an
exception that causes some SSTable not to be tracked, this size
can become negative. Returning a zero backlog is better than allowing
it to be interpreted as a giant number.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-05-22 09:36:22 -04:00
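
Illustration for 3e08bd17f0: a minimal sketch of why an unsigned total turns a "negative" size into a giant one, and of clamping that case to zero. `tracked_size` and `guarded_backlog` are hypothetical names, and the threshold used to detect a wrapped value is an assumption, not the check used in the patch.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the tracker's bookkeeping.
struct tracked_size {
    std::uint64_t total_size = 0;   // unsigned, as the commit message describes

    void add(std::uint64_t bytes) { total_size += bytes; }

    // Removing more than was added (for example because an SSTable was
    // never tracked) wraps around to a value close to 2^64.
    void remove(std::uint64_t bytes) { total_size -= bytes; }
};

// Treat any value that would be negative as a signed 64-bit integer as
// "wrapped" and report a zero backlog instead of a giant one.
inline double guarded_backlog(const tracked_size& t) {
    constexpr std::uint64_t max_plausible =
        static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
    return t.total_size > max_plausible ? 0.0 : static_cast<double>(t.total_size);
}
```
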
Glauber Costa
4b4e9f6c8c STCS_backlog: remove unused attribute
This attribute ended up being unused in the final version.
Spotted now while reading the code for other purposes.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-05-22 09:36:22 -04:00
Glauber Costa
10046593be compaction strategy: move size tiered backlog to a header
It's very common for other strategies to include a SizeTiered
step somehow inside their algorithms: LCS will do SizeTiered on
L0, TWCS will do SizeTiered within a window, etc.

To make it easier for those strategies to consume the SizeTiered
backlog tracker, we will move that to its own file.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-05-22 09:36:22 -04:00
Glauber Costa
36ccb1dd7c compaction_strategy: delete major_compaction_strategy class
It was already unused before this series. In an earlier version I
used it to provide an ad-hoc backlog for major compactions. But now that
this is done by the compaction manager, this class really isn't being
used.

And it is likely it won't be: major compaction is not a compaction
strategy a user can choose, unlike the others that need to be built
through make_compaction_strategy.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-05-22 09:33:59 -04:00
Glauber Costa
9320d6f17f compaction: make sure that user-initiated compactions always have a minimum priority
We have observed the following behavior with user initiated compactions,
like major compactions:

- if there are no writes, the backlog doesn't increase.
- as compaction progresses the backlog decreases.
- at some point, the backlog is so low that compaction barely makes any
  progress.

Going forward, we should allow one to read from the generated partial
SSTables, in which case this doesn't matter that much. But for
user-initiated compactions we would like to guarantee a minimum baseline.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-05-22 09:33:25 -04:00
Glauber Costa
c55ab93178 backlog_controller: add constants to represent a globally disabled controller
There are situations in which we want the controllers to stop working
altogether. Usually that's when we have an unimplemented controller or
some exception.

We want to return fixed shares in this case, but this is a very
different situation from when we want fixed shares for *one* backlog
tracker: we want to return fixed shares, yes, but if we disable 200
backlog trackers (because they all failed, for instance), we don't want
that fixed number x 200 to be our backlog.

So a mechanism to globally disable the controller is still warranted,
and infinity is a good way to represent that. It's a float that the
controller can easily test against. But actually using infinity in the
code is confusing. People reading it may interpret it as the opposite
of what it means, that is, as just "a very large backlog".

Let's turn that into a constant instead. It will help us convey meaning.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-05-22 09:25:23 -04:00
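
Illustration for c55ab93178: a minimal sketch of a named sentinel that disables the controller globally, so that 200 disabled trackers still yield one fixed share value rather than 200 times it. The names `disable_backlog`, `fixed_shares`, `total_backlog` and `shares_for`, the fallback value, and the linear backlog-to-shares mapping are all assumptions made for the example.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Named constant instead of a bare infinity, so readers do not mistake
// it for "a very large backlog". Hypothetical name and value.
constexpr float disable_backlog = std::numeric_limits<float>::infinity();
constexpr float fixed_shares = 100.0f;   // assumed fallback share count

// Sum per-tracker backlogs. One disabled tracker disables the whole
// controller: the sentinel is propagated instead of being added up.
inline float total_backlog(const std::vector<float>& per_tracker) {
    float sum = 0;
    for (float backlog : per_tracker) {
        if (backlog == disable_backlog) {
            return disable_backlog;
        }
        sum += backlog;
    }
    return sum;
}

// Placeholder mapping from backlog to shares, capped at 1000.
inline float shares_for(float backlog) {
    if (backlog == disable_backlog) {
        return fixed_shares;   // the same value no matter how many trackers failed
    }
    return std::fmin(1000.0f, backlog * 10.0f);
}
```
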
Glauber Costa
d758a416f8 backlog_controller: move compaction controller to the compaction manager
There was recently an attempt to add minimum shares to major compactions
which ended up being harder than it should be due to all the plumbing
necessary to call the compaction controller from inside the compaction
manager-- since it is currently a database object. We had this problem
again when trying to return fixed shares in case of an exception.

Taking a step back, all of those problems stem from the fact that the
compaction controller really shouldn't be a part of the database: as it
deals with compactions and its consequences it is a lot more natural to
have it inside the compaction manager to begin with.

Once we do that, all the aforementioned problems go away. So let's move
it there, where it belongs.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-05-22 09:24:19 -04:00
Calle Wilund
62c3b4c429 commitlog: Ensure file objects are closed before object free
Fixes #3446

Previously, only shutdown-synced objects were actually closed,
which is wrong.

This introduces yet another queue, processed together with the
deletion objects, which ensures we explicitly close all objects
that have been discarded.

Message-Id: <20180521140456.32100-1-calle@scylladb.com>
2018-05-22 14:52:06 +03:00
Duarte Nunes
4b2fd8d6f2 Merge 'Use hinted handoff to replay missed updates from base to view' from Piotr
"This series leverages hinted handoff for failed view replica
updates."

* 'materialized_view_updates_with_hh_5' of https://github.com/psarna/scylla:
  storage_proxy: enable hinted handoff for materialized views
  storage_proxy: make view updates use consistency_level::ANY
2018-05-22 11:24:37 +01:00
Paweł Dziepak
05c94bc98d mutation_partition: do not dereference null in find_cell()
row::find_cell() may be called for cells that do not exist in that row.
In such case nullptr shall be returned, this patch makes sure that
it is not dereferenced.
Message-Id: <20180522091726.24396-1-pdziepak@scylladb.com>
2018-05-22 10:31:09 +01:00
Glauber Costa
d3f985ef46 backlog_controller: allow users to compute inverse function of shares
There are some situations in which we want to force a specific amount of
shares and don't have a backlog. We can provide a function to get that
from the controller.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-05-21 19:35:07 -04:00
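
Illustration for d3f985ef46: a minimal sketch of computing the inverse of a backlog-to-shares function. It assumes the controller maps backlog to shares by linear interpolation between control points with strictly increasing shares; that model, and the names `control_point`, `shares_of_backlog` and `backlog_of_shares`, are assumptions for the example rather than details taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct control_point {
    float backlog;
    float shares;
};

// Linear interpolation of y at x on the segment (x0, y0) to (x1, y1).
inline float interpolate(float x, float x0, float y0, float x1, float y1) {
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0);
}

// Forward direction: backlog to shares, clamped at both ends.
inline float shares_of_backlog(const std::vector<control_point>& cps, float backlog) {
    assert(cps.size() >= 2);
    if (backlog <= cps.front().backlog) return cps.front().shares;
    for (std::size_t i = 1; i < cps.size(); ++i) {
        if (backlog <= cps[i].backlog) {
            return interpolate(backlog, cps[i - 1].backlog, cps[i - 1].shares,
                               cps[i].backlog, cps[i].shares);
        }
    }
    return cps.back().shares;
}

// Inverse direction: the backlog that would produce the requested shares.
inline float backlog_of_shares(const std::vector<control_point>& cps, float shares) {
    assert(cps.size() >= 2);
    if (shares <= cps.front().shares) return cps.front().backlog;
    for (std::size_t i = 1; i < cps.size(); ++i) {
        if (shares <= cps[i].shares) {
            return interpolate(shares, cps[i - 1].shares, cps[i - 1].backlog,
                               cps[i].shares, cps[i].backlog);
        }
    }
    return cps.back().backlog;
}
```

A caller that wants to force a given number of shares without having a real backlog could feed the result of `backlog_of_shares` into the same path that normally consumes measured backlogs.
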
Avi Kivity
51f5599c75 Merge seastar upstream
* seastar a6cb005...5da5d4e (6):
  > append_challenged_posix_file_impl: Ensure continuation uses non-stale object
  > utils: make make_visitor() public
  > tcp: Adjust receive window
  > tcp: Fix allowed sending size calculation in can_send
  > tcp: Fix assert in tcp::tcb::output_one
  > be more descriptive with failed syscalls for filesystem operations

Contains alternative fix for #3446 (will also be fixed directly).
2018-05-21 20:35:30 +03:00
Piotr Sarna
f5d6326ced storage_proxy: enable hinted handoff for materialized views
This commit initializes and enables hinted handoff for materialized
views, even if HH is not explicitly turned on in config.

User writes still use hinted handoff only if it is explicitly enabled,
while materialized views are allowed to use it unconditionally
in order to store failed replica updates somewhere.

Fixes #3383
2018-05-21 17:09:27 +02:00
Piotr Sarna
da0d458f5f storage_proxy: make view updates use consistency_level::ANY
This commit makes view replica updates internally use consistency
level ANY, so in case an update fails it will fall back to hinted
handoff.

References #3383
2018-05-21 17:09:27 +02:00
Piotr Sarna
ba9e8a4f2c tests: initialize hints directory for cql env
This commit initializes hints_directory config value for cql_test_env.
It's needed now because materialized views support force-enables
hinted handoff.

Message-Id: <2aadf35eee329c1f89977c4a55660f330bd9d591.1526914827.git.sarna@scylladb.com>
2018-05-21 18:06:01 +03:00
Botond Dénes
204f6fd478 test.py: print test args when listing failed tests
This can be very helpful when a test only fails when run with some
particular arguments.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <dac1f7e23afa904156e65c3bb3c8fd52b7e999ff.1526906955.git.bdenes@scylladb.com>
2018-05-21 17:28:18 +03:00
Avi Kivity
f9c2ff1f9c install: prepare /etc directory
install(1) creates missing directories on recent Fedora, but not
on CentOS 7. This causes the RPM build (which installs to a pristine
tree, without an existing /etc) to fail.

Fix by setting up /etc.

Tests: rpm (Fedora, CentOS)
Message-Id: <20180520124937.20466-1-avi@scylladb.com>
2018-05-21 09:51:46 +02:00
Asias He
db8c3a7059 streaming: Do not use dht::split_ranges_to_shards
There is no need to call dht::split_ranges_to_shards to split the token
range into <shard> : <a lot of small ranges> mapping and create a flat
mutation reader with a lot of small ranges.

Because:

1) The flat mutation reader on each shard only returns data that belongs
to this local shard, so there is no correctness issue if we pass the whole
range instead of only the sub-ranges that belong to this local shard.

2) With murmur3_partitioner_ignore_msb_bits = 12, it is almost certain
that given a token range, all the shards will have data for the range
anyway. Even if we ask all the shards to work on the token range and
some of the shards have no data for it, it is fine. We simply send no
data from this shard.

Tests: update_cluster_layout_tests.py

Message-Id: <ac00cd21d6156c47b74451dd415d627481e48212.1526864222.git.asias@scylladb.com>
2018-05-21 10:42:45 +03:00
Takuya ASADA
5407c34c73 dist/debian: depends to coreutils instead of realpath on Ubuntu 18.04
On Ubuntu 18.04 the realpath package is dropped; it becomes part of coreutils.

Fixes #3445

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180521031954.30815-1-syuu@scylladb.com>
2018-05-21 10:42:05 +03:00
Asias He
0c54c6e16f storage_service: Add node has left the cluster log
Removing a node from the cluster is a major operation and deserves a log
message. Add a log when a node is removed from the cluster by `nodetool
decommission` or `nodetool removenode`.

Message-Id: <b6adf34492c8138296911f2b37b39e9dd8ed10a2.1523347916.git.asias@scylladb.com>
2018-05-19 21:47:05 +03:00
Asias He
e20038eb84 streaming: Handle stream_mutation rpc handler on all shards
In streaming, the sender sends the mutations on all the local shards in
parallel, so it is possible that the receiver handles more than one such
connection on the same shard. It is determined by where the TCP
connection goes. Currently, rpc ignores the dest shard id when sending the
rpc message.

For instance, say node1 has 2 shards and node2 has 2 shards. Currently, we
can end up with this:

   Node 1 shard 0 -> Node 2 shard 1
   Node 1 shard 1 -> Node 2 shard 1

It is better if we do:

   Node 1 shard 0 -> Node 2 shard 0
   Node 1 shard 1 -> Node 2 shard 1

This patch solves this problem by letting the handler always run on
shard = src_cpu_id % smp::count.

If sender and receiver have the same shard config, the work is
distributed evenly.

If sender and receiver do not have the same shard config, it is
unavoidable that some of the shards will do more work than the others.

Tests: dtest update_cluster_layout_tests.py

Message-Id: <911827bcf67459a07ec92623a9ed4c4fbba195ca.1524622375.git.asias@scylladb.com>
2018-05-19 21:08:25 +03:00
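
Illustration for e20038eb84: a minimal sketch of the shard = src_cpu_id % smp::count rule and the load it produces for equal and unequal shard counts. `handler_shard` and `connections_per_shard` are hypothetical helpers, and `local_shard_count` stands in for smp::count.

```cpp
#include <cassert>
#include <vector>

// Pick the receiving shard for a stream from sender shard `src_cpu_id`.
inline unsigned handler_shard(unsigned src_cpu_id, unsigned local_shard_count) {
    assert(local_shard_count > 0);
    return src_cpu_id % local_shard_count;
}

// Count how many sender shards land on each receiver shard.
inline std::vector<unsigned> connections_per_shard(unsigned sender_shards,
                                                   unsigned receiver_shards) {
    std::vector<unsigned> load(receiver_shards, 0);
    for (unsigned src = 0; src < sender_shards; ++src) {
        ++load[handler_shard(src, receiver_shards)];
    }
    return load;
}
```

With 2 sender shards and 2 receiver shards each receiver shard handles one stream. With 4 sender shards and 3 receiver shards, receiver shard 0 handles two streams and the others one each, which is the unavoidable imbalance the message mentions.
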
Calle Wilund
f69a52c475 storage_service: Add more error info to "isolate_on_error" shutdown
Fixes #2793

Prints error handle class (commitlog or "other/disk") + exception
type and message. While not exhaustive, at least gives a correlation
point to (hopefully) other log printouts.

Message-Id: <20180509081040.7676-1-calle@scylladb.com>
2018-05-19 21:06:03 +03:00
Piotr Jastrzebski
1520ffe7f5 sstables: check buffer size when reading vints
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <6ecbedae818fbef1f67a4472aba4ce443b9df0ee.1525888830.git.piotr@scylladb.com>
2018-05-19 21:01:45 +03:00
Avi Kivity
46a0109608 Merge "Support compression when writing SSTables 3.x." from Vladimir
"
For compression, SSTables 3.x format uses CRC32 for checksumming
compressed chunks as well as for calculating the full file checksum.
Also, while for older formats "full checksum" of a compressed data file
means a combination of checksums of its compressed chunks, in SSTables
3.x this now reads literally and assumes the checksum of all bytes
written, including per-chunk digests.

Tests: unit {debug, release}
"

* 'projects/sstables-30/write-compression/v3' of https://github.com/argenet/scylla:
  tests: Add unit tests for writing compressed SSTables 3.x.
  tests: Validate Digest32.crc for SSTables 3.x write tests.
  tests: Fix invalid Digest file for write_counter_table test.
  sstables: Support writing compressed SSTables 3.0.
  sstables: Make compressed streams customizable on checksumming.
  sstables: Move checksum calculation logic to compressed_output_stream.
2018-05-19 20:52:08 +03:00
Vladimir Krivopalov
d588a7e743 tests: Add unit tests for writing compressed SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-19 20:52:08 +03:00
Vladimir Krivopalov
e5ab271863 tests: Validate Digest32.crc for SSTables 3.x write tests.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-19 20:52:08 +03:00
Vladimir Krivopalov
fcc7bad777 tests: Fix invalid Digest file for write_counter_table test.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-19 20:52:07 +03:00
Vladimir Krivopalov
dd00d90a05 sstables: Support writing compressed SSTables 3.0.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-19 20:52:07 +03:00
Vladimir Krivopalov
cc62ad3b69 sstables: Make compressed streams customizable on checksumming.
Use either Adler32 or CRC32 while writing to or reading from a
compressed stream.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-19 20:52:07 +03:00
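
Illustration for cc62ad3b69: a minimal sketch of a writer whose checksum algorithm is chosen by a template parameter. The policy and class names are hypothetical, and the checksums are written out by hand so the example is self-contained; the patch itself may rely on library routines and a different class layout.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Adler-32: two running sums modulo 65521.
struct adler32_policy {
    static std::uint32_t checksum(const std::string& data) {
        std::uint32_t a = 1, b = 0;
        for (unsigned char c : data) {
            a = (a + c) % 65521u;
            b = (b + a) % 65521u;
        }
        return (b << 16) | a;
    }
};

// CRC-32 (reflected polynomial 0xEDB88320), computed bit by bit.
struct crc32_policy {
    static std::uint32_t checksum(const std::string& data) {
        std::uint32_t crc = 0xFFFFFFFFu;
        for (unsigned char c : data) {
            crc ^= c;
            for (int bit = 0; bit < 8; ++bit) {
                crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
            }
        }
        return crc ^ 0xFFFFFFFFu;
    }
};

// Hypothetical writer: records one digest per chunk using the chosen policy.
template <typename ChecksumPolicy>
class checksummed_writer {
    std::vector<std::uint32_t> _chunk_digests;
public:
    void write_chunk(const std::string& chunk) {
        _chunk_digests.push_back(ChecksumPolicy::checksum(chunk));
    }
    const std::vector<std::uint32_t>& chunk_digests() const {
        return _chunk_digests;
    }
};
```

Under this sketch, a writer for the older formats would be `checksummed_writer<adler32_policy>` and one for SSTables 3.x would be `checksummed_writer<crc32_policy>`, with the rest of the writing code unchanged.
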
Vladimir Krivopalov
5183294676 sstables: Move checksum calculation logic to compressed_output_stream.
Previously, compressed_output_stream used to calculate checksum of the
supplied chunk and pass it to the 'compression' object to combine with
the full checksum calculated on prior writes.
Now, all the checksum calculation happens inside
compressed_output_stream and 'compression' only stores the result.

This is done to loosen ties between two classes and simplify
compressed_output_stream customisation with various checksum algorithms.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-19 20:52:07 +03:00
Glauber Costa
596a525950 commitlog: don't move pointer to segment
We are currently moving the pointer we acquired to the segment inside
the lambda in which we'll handle the cycle.

The problem is, we also use that same pointer inside the exception
handler. If an exception happens we'll access it and we'll crash.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180518125820.10726-1-glauber@scylladb.com>
2018-05-18 17:25:18 +02:00
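
Illustration for 596a525950: a minimal sketch of the bug shape, where a pointer is moved into a lambda and then used again in the exception handler. std::shared_ptr is used as a stand-in for the real segment pointer type, and `segment` and `run_cycle` are hypothetical. The handler here checks for null so the example can run; the message says the real code accessed the moved-from pointer and crashed.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

struct segment {
    std::string name;
};

// Runs a failing "cycle" and returns the segment name seen by the
// exception handler, or "<null>" if the outer pointer was left empty.
inline std::string run_cycle(std::shared_ptr<segment> seg, bool move_into_lambda) {
    std::function<void()> work;
    if (move_into_lambda) {
        // Buggy shape: after this line `seg` is empty.
        work = [s = std::move(seg)] {
            throw std::runtime_error("cycle failed for " + s->name);
        };
    } else {
        // Fixed shape: the lambda holds a copy, so `seg` stays valid.
        work = [s = seg] {
            throw std::runtime_error("cycle failed for " + s->name);
        };
    }
    try {
        work();
    } catch (const std::exception&) {
        // The handler uses the outer pointer, so it must still be set.
        return seg ? seg->name : std::string("<null>");
    }
    return "ok";
}
```
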
Avi Kivity
684bb2042d Merge "Fixes and improvements for gdb LSA commands" from Tomasz
* tag 'tgrabiec/fixes-and-improvements-for-gdb-scripts-v1' of github.com:tgrabiec/scylla:
  gdb: Print live object size from 'scylla lsa-segment'
  gdb: Extend 'scylla segment-descs' output with full occupancy info
  gdb: Print allocated object's type name instead of full LSA migrator
  gdb: Fix LSA migrator discovery
  gdb: Drop code related to LSA zones
  gdb: Fix uses of removed segment_desctriptor::_lsa_managed
  lsa: Add use for debug::static_migrators
2018-05-17 15:54:21 +03:00
Tomasz Grabiec
d4a2d22812 gdb: Print live object size from 'scylla lsa-segment'
2018-05-17 14:22:20 +02:00
Tomasz Grabiec
08026a64c5 gdb: Extend 'scylla segment-descs' output with full occupancy info
After:

 0x600007220000: lsa free=24800  used=106272  81.08% region=0x600000403210
 0x600007240000: lsa free=13     used=131059  99.99% region=0x600000403210
 0x600007260000: lsa free=23072  used=108000  82.40% region=0x600000403210
 0x600007280000: lsa free=16772  used=114300  87.20% region=0x600000403210
 0x6000072a0000: lsa free=23996  used=107076  81.69% region=0x600000401410
 0x6000072c0000: lsa free=15552  used=115520  88.13% region=0x600000403210
2018-05-17 14:22:20 +02:00
Tomasz Grabiec
abd667d924 gdb: Print allocated object's type name instead of full LSA migrator
Before:

  0x6000302604e0: live {_vptr.migrate_fn_type = 0x3797a00 <vtable for standard_migrator<cache_entry>+16>, _migrators = std::any containing seastar::lw_shared_ptr<(anonymous namespace)::migrators> = {[contained value] = {_p = 0x600000080a80}}, _align = 8, _index = 0} @ 0x6000302604e8

After:

  0x6000302604e0: live cache_entry @ 0x6000302604e8
2018-05-17 14:22:14 +02:00
Tomasz Grabiec
653fcc10bb gdb: Fix LSA migrator discovery
Fixes 'scylla lsa-segment' which broke after recent changes, probably
commit b3699f286d.
2018-05-17 14:22:14 +02:00
Tomasz Grabiec
bb8f82f43f gdb: Drop code related to LSA zones
LSA zones have been removed.
2018-05-17 14:22:14 +02:00
Tomasz Grabiec
84a7961c23 gdb: Fix uses of removed segment_desctriptor::_lsa_managed
2018-05-17 14:22:14 +02:00
Tomasz Grabiec
498a4132c5 lsa: Add use for debug::static_migrators
Otherwise GDB complains about it being optimized out, breaking our
debug scripts.
2018-05-17 14:22:14 +02:00
Avi Kivity
d9c80cac26 dist: move Red Hat installation from .spec %install to new install.sh
Move code to a traditional install.sh script (more traditional would be
a "make install", but this is close enough).

This allows testing installation independently of packaging. In addition,
non-Red Hat-packaging can share much of the code in install.sh.

Ref #3243.

Tests: build+install rpm
Message-Id: <20180517114147.30863-1-avi@scylladb.com>
2018-05-17 13:46:27 +02:00
Avi Kivity
98967da94f Merge seastar upstream
* seastar 0a1a327...a6cb005 (1):
  > Merge " misc fixes for iotune" from Glauber
2018-05-17 12:42:46 +03:00