12459 Commits

Author SHA1 Message Date
Tomasz Grabiec
1d6fec0755 row_cache: Drop not very useful prefixes from metric names
This drops "total_operations_" and "objects_" prefixes. There is no
convention of adding them in other parts of the system, and they don't
add much value.

Fixes scylladb/scylla-grafana-monitoring#169.

Message-Id: <1499160342-25865-1-git-send-email-tgrabiec@scylladb.com>
2017-07-04 13:37:12 +03:00
Nadav Har'El
d95f908586 Fix test to use non-wrapping range
The test put a wrapping range into a non-wrapping range variable.
This was harmless at the time this test was written, but newer code
may not be as forgiving so better use a non-wrapping range as intended.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170704103128.29689-1-nyh@scylladb.com>
2017-07-04 13:36:29 +03:00
Avi Kivity
07b8adce0e sstables: fix use-after-free in read_simple()
`r` is moved-from, and later captured in a different lambda. The compiler may
choose to move and perform the other capture later, resulting in a use-after-free.

Fix by copying `r` instead of moving it.

Discovered by sstable_test in debug mode.
Message-Id: <20170702082546.20570-1-avi@scylladb.com>
2017-07-04 10:24:07 +02:00
Raphael S. Carvalho
7b777fe2e3 sstables/lcs: choose sstable with highest droppable tombstone ratio
Currently, for tombstone compaction, lcs chooses the sstable with
the lowest ratio among those whose ratio is above the threshold
(0.2 by default).

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170703185633.6644-1-raphaelsc@scylladb.com>
2017-07-04 10:25:10 +03:00
Avi Kivity
bcf7867ac9 Merge "small fixes and cleanup for leveled strategy (part 2)" from Raphael
* 'lcs_improvements_part_2' of github.com:raphaelsc/scylla:
  lcs: Match estimated tasks arithmetic to score in LCS
  lcs: prevent leveled_compaction_strategy.hh from being included more than once
  lcs: use vector instead for storing a level of sstables
  compaction: keep only one variant of size_tiered_most_interesting_bucket
  lcs: get rid of unused code in leveled_manifest
2017-07-04 10:10:53 +03:00
Raphael S. Carvalho
7606ffd744 lcs: Match estimated tasks arithmetic to score in LCS
Contains fix for CASSANDRA-8904.

Added TARGET_SCORE to get rid of magic number for target score which
is now used more than once.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-04 03:35:02 -03:00
Raphael S. Carvalho
dfb5463478 lcs: prevent leveled_compaction_strategy.hh from being included more than once
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-04 03:35:00 -03:00
Raphael S. Carvalho
db98ab6aaf lcs: use vector instead for storing a level of sstables
list is no longer needed because lcs no longer moves a sstable breaking
invariant at its level to level 0. Now lcs incrementally restores invariant
by compacting together first set of overlapping tables.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-04 03:34:57 -03:00
Raphael S. Carvalho
b350352e6c compaction: keep only one variant of size_tiered_most_interesting_bucket
two variants of size_tiered_most_interesting_bucket existed to avoid copy,
but subsequent work will make lcs use vector for each level of sstables,
so let's only keep one variant.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-04 03:34:51 -03:00
Raphael S. Carvalho
5921600b95 lcs: get rid of unused code in leveled_manifest
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-04 03:34:34 -03:00
Nadav Har'El
d177ec05cb repair: further limit parallelism of checksum calculation
Repair today has a semaphore limiting the number of ongoing checksum
comparisons running in parallel (on one shard) to 100. We needed this
number to be fairly high, because a "checksum comparison" can involve
high latency operations - namely, sending an RPC request to another node
in a remote DC and waiting for it to calculate a checksum there, and while
waiting for a response we need to proceed calculating checksums in parallel.

But as a consequence, in the current code, we can end up with as many as
100 fibers all at the same stage of reading partitions to checksum from
sstables. This requires tons of memory, to hold at least 128K of buffer
(even more with read-ahead) for each of these fibers, plus partition data
for each. But doing 100 reads in parallel is pointless - one (or very few)
should be enough.

So this patch adds another semaphore to limit the number of checksum
*calculations* (including the read and checksum calculation) on each shard
to just 2. There may still be 100 ongoing checksum *comparisons*, in
other stages of the comparisons (sending the checksum requests to other
nodes and waiting for them to return), but only 2 will ever be in the stage of
reading from disk and checksumming them.

The limit of 2 checksum calculations (per shard) applies on the repair
slave, not just to the master: The slave may receive many checksum
requests in parallel, but will only actually work on 2 at a time.

Because the parallelism=100 now rate-limits operations which use very little
memory, in the future we can safely increase it even more, to support
situations where the disk is very fast but the link between nodes has
very high latency.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170703151329.25716-1-nyh@scylladb.com>
2017-07-03 18:14:57 +03:00
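
The two-level limit described in the repair commit above can be sketched as follows. This is a minimal illustration in Python with asyncio, not the Scylla implementation (which is C++ on Seastar); the function names, the stand-in checksum, and the sleep calls are all assumptions made for the example. Only the limits of 100 comparisons and 2 calculations come from the commit message.

```python
import asyncio

MAX_COMPARISONS = 100   # outer limit named in the commit message
MAX_CALCULATIONS = 2    # inner limit added by the patch


async def run_repair(num_ranges: int) -> dict:
    """Compare checksums for num_ranges ranges under both limits (hypothetical helper)."""
    comparisons = asyncio.Semaphore(MAX_COMPARISONS)
    calculations = asyncio.Semaphore(MAX_CALCULATIONS)
    stats = {"calculating": 0, "peak_calculating": 0, "matched": 0}

    async def local_checksum(range_id: int) -> int:
        # Only this stage holds large read buffers, so it gets the tight limit.
        async with calculations:
            stats["calculating"] += 1
            stats["peak_calculating"] = max(
                stats["peak_calculating"], stats["calculating"]
            )
            await asyncio.sleep(0)  # stand-in for reading partitions from disk
            stats["calculating"] -= 1
            return range_id * 31    # stand-in for a real checksum

    async def remote_checksum(range_id: int) -> int:
        await asyncio.sleep(0.001)  # stand-in for a high-latency RPC
        return range_id * 31

    async def compare(range_id: int) -> None:
        # Many comparisons may be in flight, most of them waiting on the network.
        async with comparisons:
            local, remote = await asyncio.gather(
                local_checksum(range_id), remote_checksum(range_id)
            )
            if local == remote:
                stats["matched"] += 1

    await asyncio.gather(*(compare(r) for r in range(num_ranges)))
    return stats
```

Calling `asyncio.run(run_repair(50))` returns the counters, and `peak_calculating` never exceeds `MAX_CALCULATIONS` no matter how many comparisons are in flight.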
Piotr Jastrzebski
80f08921c4 Make table_helper independent from trace_keyspace_helper
table_helper is a generic helper that can easily be used in other places.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <11e46dbc1c90d0273a41c8144e6f6013e21efcdb.1499077818.git.piotr@scylladb.com>
2017-07-03 15:55:00 +03:00
Raphael S. Carvalho
972a0237ef database: restore indentation for cleanup_sstables
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170630035324.19881-2-raphaelsc@scylladb.com>
2017-07-03 12:48:54 +03:00
Raphael S. Carvalho
b9d0645199 database: fix potential use-after-free in sstable cleanup
when do_for_each is in its last iteration and with_semaphore defers
because there's an ongoing cleanup, sstable object will be used after
freed because it was taken by ref and the container it lives in was
destroyed prematurely.

Let's fix it with a do_with, also making code nicer.

Fixes #2537.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170630035324.19881-1-raphaelsc@scylladb.com>
2017-07-03 12:48:53 +03:00
Avi Kivity
5883e85da3 Merge "improve maintainability of compaction strategies" from Raphael
"compaction_strategy.cc keeps the full implementation of size tiered,
major, and null strategies, and partial implementation of leveled
and date tiered strategies. It's a mess. In the future, we will also
need space for time window strategy. The file is hard to read and
maintain.
My goal here is to improve maintainability of the strategies by
putting each of them into its own header.

NOTE: No semantic change is introduced here."

* 'improve_compaction_strategy_maintainability' of github.com:raphaelsc/scylla:
  compaction_strategy: move dtcs to its existing header
  compaction_strategy: move lcs implementation to its own header
  compaction_strategy: move stcs implementation to its own header
  compaction_strategy: move compaction_strategy_impl to its own header
2017-07-03 11:39:30 +03:00
Takuya ASADA
0c81974bc4 dist/common/systemd: move scylla-server.service to be after network-online.target instead of network.target
To make sure Scylla starts after the network is up, we need to move from
network.target to network-online.target.

Fixes #2337

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1493661832-9545-1-git-send-email-syuu@scylladb.com>
2017-07-03 10:01:21 +03:00
Asias He
b2a2fbcf73 repair: Do not store the failed ranges
The number of failed ranges can be large, so it can consume a lot of memory.
We already log the failed ranges in the log. No need to store them
in memory.

Message-Id: <7a70c4732667c5c3a69211785e8efff0c222fc28.1498809367.git.asias@scylladb.com>
2017-07-03 10:00:25 +03:00
Takuya ASADA
1c35549932 dist/common/scripts/scylla_cpuscaling_setup: skip configuration when cpufreq driver isn't loaded
Configuring the cpufreq service on VMs/IaaS causes an error because they don't support cpufreq.
To prevent the error, skip the whole configuration when the driver is not loaded.

Fixes #2051

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1498809504-27029-1-git-send-email-syuu@scylladb.com>
2017-07-03 09:59:56 +03:00
Takuya ASADA
e645b0fb13 dist/common/scripts: move EC2 configuration verification to 'scylla_ec2_check'
Currently we only have EC2 configuration verification on AMI, so move it to
/usr/lib/scylla and run it from scylla_setup, to make it usable for
non-AMI users.

Fixes #1997

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1498811107-29135-1-git-send-email-syuu@scylladb.com>
2017-07-03 09:59:28 +03:00
Avi Kivity
6895f6e603 sstable_datafile_test: fix sstable_expired_data_ratio failure
A comment states that we want the file to be old enough, but sets
a timestamp of max(), which is in the future. This may have passed
because the conversion from numeric_limits<time_t>::max() to
db_clock::time_point is not well defined (their dynamic range is
different), so truncation may have converted the large number to a
low one.
Message-Id: <20170702082903.20879-1-avi@scylladb.com>
2017-07-02 20:22:51 +02:00
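
The kind of truncation this commit message suspects can be illustrated with a short calculation. The sketch below assumes a signed 64-bit `time_t` counting seconds and a clock counting milliseconds in a signed 64-bit integer; neither is stated in the commit message. Signed overflow is undefined in C++, so the two's-complement wrap modelled here is only one possible outcome.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def wrap_int64(value: int) -> int:
    """Reduce value to the signed 64-bit range, as two's-complement wrapping would."""
    return (value - INT64_MIN) % 2**64 + INT64_MIN


def seconds_to_millis_int64(seconds: int) -> int:
    """Convert seconds to milliseconds in a wrapping 64-bit counter (hypothetical helper)."""
    return wrap_int64(seconds * 1000)


# An ordinary timestamp survives the conversion unchanged.
print(seconds_to_millis_int64(1_499_000_000))
# The maximum time_t value wraps around to a small negative number.
print(seconds_to_millis_int64(INT64_MAX))
```

Under these assumptions the largest seconds value lands just before the epoch rather than far in the future, which would explain a test that expects an "old enough" file passing by accident.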
Avi Kivity
51b6066212 cql3: operation: correctly format error messages
Error messages incorrectly used the debug representation of the receiver,
rather than the text representation of the operation itself.

Fixes #113.
Message-Id: <20170701101325.3163-1-avi@scylladb.com>
2017-07-02 20:06:50 +02:00
Duarte Nunes
d157e4558a utils/log_histogram: Remove largest() function
It should never have existed in the first place, as there are no
legitimate callers and it can be misused.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170630095939.2429-1-duarte@scylladb.com>
2017-07-02 14:29:17 +03:00
Gleb Natapov
d23111312f main: wait for wait_for_gossip_to_settle() to complete during boot
Boot should not continue until a future returned by
wait_for_gossip_to_settle() is resolved.  Commit 991ec4a16 mistakenly
broke that, so restore it back. Also fix calls for supervisor::notify()
to be in the right places.

Message-Id: <20170702082355.GQ14563@scylladb.com>
2017-07-02 11:32:36 +03:00
Avi Kivity
5bc13e4454 Revert "Make table_helper independent from trace_keyspace_helper"
This reverts commit db5bf363d0. Causes
errors of the sort

    Exiting on unhandled exception: exceptions::invalid_request_exception
    (Keyspace 'system_traces' does not exist)
2017-07-02 11:30:51 +03:00
Avi Kivity
7c809917b6 compaction_manager: fix debug mode build (periodic_compaction_submission_interval)
Turn static constexpr variable into a function.
2017-07-01 19:34:46 +03:00
Avi Kivity
c2c69e003f compaction: fix build on debug mode (DEFAULT_TOMBSTONE_COMPACTION_INTERVAL)
Debug mode wants to allocate storage for a constexpr variable for some
reason. Turn it into a function.
2017-07-01 19:26:22 +03:00
Avi Kivity
59f649e2bc Revert "cql_server::do_accepts: modernize loop"
This reverts commit 37af493f6e. Connections
are not accepted and ^C does not work anymore.
2017-07-01 12:54:23 +03:00
Jesse Haber-Kucharsky
1100bb8a5b cql: Eagerly throw lexing and parsing exceptions
Previously, lexing and parsing errors were aggregated while CQL queries were
evaluated. Afterwards, the first collected error (if present) would be thrown as
an exception.

The problem was that when parsing and lexing errors were aggregated this way,
the parser would continue even in spite of errors like "no viable alternative".
Semantic actions attached to grammar rules would still execute, though with
variables that had not yet been initialized. This would crash Scylla.

This change modifies the error-handling strategy of CQL parsing. Rather than
aggregate errors, we throw an exception on the first error we encounter. This
ensures that grammar actions never execute unless there is a precise match.

One possible issue with this approach is that the generated C++ code from the
ANTLR grammar may not be exception-safe. I compiled Scylla in debug-mode with
ASan support and executed several erroneous CQL queries with `cqlsh`. No memory
leaks were reported.

Fixes #2466.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <db1f650a2bbb615b506d9015486eece45375a440.1498836703.git.jhaberku@scylladb.com>
2017-07-01 12:13:44 +03:00
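
The difference between aggregating errors and throwing eagerly can be shown with a toy parser. Everything in this sketch is hypothetical (the listener classes, the four-token grammar, the action tuple); it does not reflect the ANTLR-generated code in Scylla, only the control-flow problem the commit message describes.

```python
class CollectingListener:
    """Old strategy: remember errors and let the parser keep going."""

    def __init__(self):
        self.errors = []

    def on_error(self, message: str) -> None:
        self.errors.append(message)


class EagerListener:
    """New strategy: abort parsing at the first error."""

    def on_error(self, message: str) -> None:
        raise ValueError(message)


def parse_select(tokens, listener):
    """Toy parser for ['SELECT', <column>, 'FROM', <table>]."""
    column = None  # stands in for an uninitialized variable in generated code
    if len(tokens) != 4 or tokens[0] != "SELECT" or tokens[2] != "FROM":
        listener.on_error("no viable alternative")
    else:
        column = tokens[1]
    # Semantic action: with a collecting listener this still runs after an
    # error, using a value that was never set.
    return [("select", column)]
```

With `CollectingListener`, a malformed query still reaches the semantic action with `column` unset; with `EagerListener`, the exception prevents the action from running at all.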
Raphael S. Carvalho
69a9ad468c compaction_strategy: move dtcs to its existing header
Goal is to improve maintainability.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-30 03:50:09 -03:00
Raphael S. Carvalho
4d387475fe compaction_strategy: move lcs implementation to its own header
Goal is to improve maintainability.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-30 03:50:07 -03:00
Raphael S. Carvalho
4b46d286fd compaction_strategy: move stcs implementation to its own header
Goal is to improve maintainability.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-30 03:50:06 -03:00
Raphael S. Carvalho
0d9bb0da39 compaction_strategy: move compaction_strategy_impl to its own header
compaction_strategy.cc keeps the full implementation of size tiered,
major, and null strategies, and partial implementation of leveled
and date tiered strategies. It's a mess. In the future, we will also
need space for time window strategy. The file is hard to read and
maintain.
My goal here is to eventually improve maintainability of the
strategies by putting each of them into its own header.
This is the first step towards that goal.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-30 03:50:04 -03:00
Raphael S. Carvalho
9fa855e105 compaction_strategy: use duration type for default tombstone compaction interval
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170630041838.20604-1-raphaelsc@scylladb.com>
2017-06-30 08:56:22 +03:00
Piotr Jastrzebski
db5bf363d0 Make table_helper independent from trace_keyspace_helper
table_helper is a generic helper that can easily be used in other places.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <3e360a963d4a53de6d758ba8bada78fc572f001a.1498745600.git.piotr@scylladb.com>
2017-06-29 17:20:07 +03:00
Tomasz Grabiec
97005825bf row_cache: Fix compilation errors with gcc 5
Message-Id: <1498741526-27055-1-git-send-email-tgrabiec@scylladb.com>
2017-06-29 16:34:46 +03:00
Avi Kivity
6da9b6eb81 cql3: error_listener: add virtual destructor
Found by Eclipse.
Message-Id: <20170629063324.31309-1-avi@scylladb.com>
2017-06-29 10:51:20 +02:00
Avi Kivity
9298fea27b Merge seastar upstream
* seastar 0ab7ae5...c848486 (2):
  > build: export full cflags in pkgconfig file (Fixes #2439)
  > configure: Avoid putting tmp file on /tmp
2017-06-29 11:35:24 +03:00
Avi Kivity
fc966c0c4c Merge "tombstone removal compaction" from Raphael
"This feature is intended to make compaction more efficient at getting rid of
droppable tombstone and expired data wasting disk space. So far, people have
been dealing with it manually through major compaction.

With strategies other than date tiered, large sstables will be left untouched
for a long time even though it's all expired. Date tiered suffers from it when
mixing data with different TTL because it only includes for compaction sstable
that is fully expired.

sstables keep a histogram as metadata, which allows us to easily estimate the
droppable data ratio from gc_before. sstables whose droppable data ratio is
above 20% (default value for tombstone_threshold option) will be considered
candidates for the operation.

Like in C*, we will only do tombstone removal compaction when there's nothing
to compact in standard way. It would be interesting to trigger it too when
disk usage is above a given threshold, but I decided to leave this for later.

Fixes #2306."

* 'tombstone_removal_compaction_v4' of github.com:raphaelsc/scylla:
  tests: more testing for tombstone compaction options
  tests: basic tombstone compaction test for date tiered
  compaction/dtcs: add support for tombstone compaction
  tests: basic test of tombstone compaction with lcs
  compaction/lcs: add support for tombstone compaction
  tests: basic tombstone compaction test for size tiered
  compaction/stcs: add support for tombstone compaction
  tests: add test for estimation of droppable tombstone ratio
  sstables: introduce function to estimate droppable tombstone ratio
  compaction_manager: periodically submit cfs for compaction
  streaming_histogram: fix coding style
  tests: add streaming_histogram_test
  streaming_histogram: implement sum
  tests: add test for sstable with bad tombstone histogram
  sstables: discard bad streaming histogram for future use
  tests: add sstable tombstone histogram test
  streaming_histogram: fix update
  streaming_histogram: move it to utils
  streaming_histogram: do not limit it to be used by sstables
  sstables: update tombstone_histogram for cells with expiration time
2017-06-29 10:19:59 +03:00
Avi Kivity
1317c4a03e Update ami submodule
* dist/ami/files/scylla-ami f10db69...5dfe42f (1):
  > don't fetch perf from amazon repo
2017-06-29 09:38:48 +03:00
Raphael S. Carvalho
ab335c8085 tests: more testing for tombstone compaction options
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
ce4dc15a20 tests: basic tombstone compaction test for date tiered
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
f76ece5349 compaction/dtcs: add support for tombstone compaction
Unlike other strategies, dtcs has tombstone compaction disabled by
default due to:
- deletion shouldn't be used with DTCS; rather data is deleted through TTL.
- with time series workloads, it's usually better to wait for whole sstable
to be expired rather than compacting a single sstable when it's more than
20% (default value) expired.
See CASSANDRA-9234 for more details.

For tombstone compaction, unworthy sstables are filtered out and the oldest
one is chosen because it's the one less likely to shadow data and it's also
relatively big.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
c400bf97b9 tests: basic test of tombstone compaction with lcs
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
70e54cfe6e compaction/lcs: add support for tombstone compaction
LCS will choose its candidate by starting from highest level and
getting sstable which has highest droppable tombstone ratio.
Unlike STCS which needs to choose oldest sstable from biggest tier,
LCS can choose the one with highest droppable tombstone ratio because sstables in
a given level don't overlap.
Sstable picked up for tombstone removal compaction won't be demoted
or promoted.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
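
The selection rule from the LCS commit above (start from the highest level, take the sstable with the highest droppable tombstone ratio) can be sketched like this. The `SSTable` record and `pick_tombstone_candidate` are hypothetical names for illustration; the strict "greater than threshold" comparison and the 0.2 default follow the wording of the related commit messages, and the real C++ code may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SSTable:
    name: str
    level: int
    droppable_ratio: float  # estimated droppable tombstone ratio


def pick_tombstone_candidate(
    sstables: List[SSTable], threshold: float = 0.2
) -> Optional[SSTable]:
    """Return the sstable to compact for tombstone removal, or None."""
    # Walk levels from highest to lowest.
    for level in sorted({s.level for s in sstables}, reverse=True):
        eligible = [
            s for s in sstables
            if s.level == level and s.droppable_ratio > threshold
        ]
        if eligible:
            # Sstables in one level do not overlap, so the highest ratio wins.
            return max(eligible, key=lambda s: s.droppable_ratio)
    return None
```

For example, given a level-2 sstable at ratio 0.25 and a level-1 sstable at ratio 0.9, the sketch returns the level-2 one, because the higher level is searched first.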
Raphael S. Carvalho
138fda468f tests: basic tombstone compaction test for size tiered
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
8fd80ac22c compaction/stcs: add support for tombstone compaction
It is hard to find peers for larger sstables, so they are left
uncompacted for a long time. Expired data and tombstones which
can be purged waste disk space in the meantime.

An sstable tracks droppable tombstones, from which a ratio can be calculated.
If the ratio is greater than the threshold (0.2 by default), the sstable will
be eligible for compaction. The oldest sstables from the biggest tiers are
preferable because droppable data in them is more likely to satisfy
the conditions for purge, like not shadowing data in another sstable.

Subsequent patches will add support in leveled and date tiered
strategies.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
ad24470972 tests: add test for estimation of droppable tombstone ratio
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Raphael S. Carvalho
eb6d17b748 sstables: introduce function to estimate droppable tombstone ratio
Function used to estimate ratio of droppable tombstone.
A tombstone is considered droppable for cells expired before
gc_before and regular tombstones older than gc_before.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
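
A rough sketch of estimating the droppable ratio from a histogram of drop times follows. The exact-bucket dictionary, the `total_cells` denominator, and the function name are assumptions for illustration; the real sstable metadata uses an approximate streaming histogram, and the commit message does not say what the count is divided by.

```python
from typing import Dict


def estimate_droppable_ratio(
    drop_time_histogram: Dict[int, int], gc_before: int, total_cells: int
) -> float:
    """Estimate the fraction of cells that could be dropped (hypothetical helper).

    drop_time_histogram maps a drop time (tombstone deletion time or cell
    expiry time) to the number of cells with that time.
    """
    if total_cells <= 0:
        return 0.0
    # Anything that expired or was deleted before gc_before counts as droppable.
    droppable = sum(
        count for drop_time, count in drop_time_histogram.items()
        if drop_time < gc_before
    )
    return droppable / total_cells
```

With a histogram of `{100: 30, 200: 20, 300: 50}`, `gc_before=250` and 200 cells in total, 50 of the 200 cells count as droppable, which would put the sstable above the default 0.2 threshold.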
Raphael S. Carvalho
0d21129cc7 compaction_manager: periodically submit cfs for compaction
This is useful for a column family which isn't generating new content
and will have lots of expired data later on that can be purged.
Compaction submission is a no-op if there's nothing to do, so I think
it's reasonable to do it at an interval of 1 hour.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:03 -03:00
Raphael S. Carvalho
719dbf547d streaming_histogram: fix coding style
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:08:12 -03:00