
891 Commits

Author SHA1 Message Date
Raphael S. Carvalho
ef18b1162b sstables/compaction_manager: rename and better explain reshard function
The name submit doesn't properly describe the function. Also improve the explanation
of the relationship between the function itself and its job parameter.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170912032034.23043-1-raphaelsc@scylladb.com>
2017-09-12 12:25:17 +03:00
Tomasz Grabiec
3f527e028d Merge "Reduce dependencies on sstables.hh" from Avi
This patchset reduces includes of sstables.hh, reducing compile time
by both reducing the amount of code compiled, and the amount of
needless recompiles caused by false dependencies.  It does so by
replacing lw_shared_ptr<sstable>, which requires a complete class,
with a new custom type shared_sstable, which allows an incomplete
sstable class definition.

* https://github.com/avikivity/scylla deps2/v2.1
  database: change truncate() to flush while compaction is disabled
  database: make run_with_compaction_disabled() a non-template
  database: add indirection to compaction_manager instance
  database: remove dependency on compaction.hh and compaction_manager.hh
  size_estimates_virtual_reader.hh: add missing include
  system_keyspace: add missing include
  main: add missing include
  storage_service: add missing include
  repair: add missing include
  compaction.hh: add missing include and forward declaration
  compaction_manager: add missing include
  shared_index_lists.hh: add missing include
  perf_fast_forward: add missing include
  sstable_mutation_test: add missing include
  sstables: extract version and format enum into a separate header file
  database.hh: add missing forward declaration for
    foreign_sstable_open_info
  cql_test_env: add forward declaration
  database: make column_family::disable_sstable_write() out-of-line
  sstables: introduce make_sstable() as a shortcut for
    make_lw_shared<sstable>
  treewide: use shared_sstable, make_sstable in place of
    lw_shared_ptr<sstable>
  sstables: use support for lw_shared_ptr with incomplete type for
    shared_sstable
  sstables: reduce dependencies
  streaming: remove unneeded includes
2017-09-12 09:56:46 +02:00
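To illustrate why a pointer type that tolerates an incomplete class reduces header dependencies, here is a minimal sketch. It uses std::shared_ptr as a stand-in because std::shared_ptr can be declared, copied and destroyed while sstable is only forward-declared. It is not Scylla's shared_sstable, and the sstable class, generation(), and this make_sstable() signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// --- What a lightweight header could expose: no sstable definition needed ---
class sstable;                                    // forward declaration only
using shared_sstable = std::shared_ptr<sstable>;  // fine with an incomplete type

// Code that only stores or passes handles compiles without the full class.
struct sstable_list_holder {
    std::vector<shared_sstable> ssts;
    void add(shared_sstable s) { ssts.push_back(std::move(s)); }
    std::size_t size() const { return ssts.size(); }
};

// --- The heavy part, normally in the full header or a .cc file ---
class sstable {
public:
    explicit sstable(long generation) : _generation(generation) {}
    long generation() const { return _generation; }
private:
    long _generation;
};

// Factory in the spirit of make_sstable(); this signature is hypothetical.
shared_sstable make_sstable(long generation) {
    return std::make_shared<sstable>(generation);
}
```

Only translation units that construct or dereference an sstable need the full definition. Code that only stores or passes handles can rely on the forward declaration alone.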
Tomasz Grabiec
ee1e7732a6 database: Create tables with continuous cache
When a table is created, it doesn't contain any data, so we can mark the whole
data range as continuous in cache. This way reads will immediately hit, and
flushes will populate the cache. If sstables are later attached, the attaching
process is supposed to invalidate the affected ranges (and it does).

Fixes #2536.

Message-Id: <1505200269-4031-1-git-send-email-tgrabiec@scylladb.com>
2017-09-12 10:53:07 +03:00
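The following toy model illustrates the idea. It assumes continuity can be represented as a list of half-open integer token intervals, which is a simplification. The names are hypothetical and are not the row_cache API.

```cpp
#include <cassert>
#include <climits>
#include <utility>
#include <vector>

// Toy model (hypothetical): half-open token intervals [first, second) for
// which the cache knows it holds all of the data.
struct toy_cache_continuity {
    std::vector<std::pair<long, long>> continuous;

    // A freshly created table is empty, so the whole range is trivially
    // complete. (Toy simplification: LONG_MAX itself is excluded.)
    static toy_cache_continuity for_new_table() {
        toy_cache_continuity c;
        c.continuous.push_back({LONG_MIN, LONG_MAX});
        return c;
    }

    // True if a read at this token can be served from cache alone.
    bool is_continuous(long token) const {
        for (const auto& [first, last] : continuous) {
            if (first <= token && token < last) {
                return true;
            }
        }
        return false;
    }

    // Attaching sstables that cover [first, last) must drop continuity there.
    void invalidate(long first, long last) {
        std::vector<std::pair<long, long>> kept;
        for (const auto& [a, b] : continuous) {
            if (b <= first || last <= a) {      // no overlap: keep as-is
                kept.push_back({a, b});
                continue;
            }
            if (a < first) kept.push_back({a, first});  // left remainder
            if (last < b) kept.push_back({last, b});    // right remainder
        }
        continuous = std::move(kept);
    }
};
```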
Avi Kivity
f7023501d6 treewide: use shared_sstable, make_sstable in place of lw_shared_ptr<sstable>
Since shared_sstable is going to be its own type soon, we can't use the old alias.
2017-09-12 10:43:05 +03:00
Avi Kivity
88b91c84a1 database: make column_family::disable_sstable_write() out-of-line
Reduces dependencies.
2017-09-12 10:43:05 +03:00
Avi Kivity
9b540eccb0 database: remove dependency on compaction.hh and compaction_manager.hh
2017-09-11 20:09:45 +03:00
Avi Kivity
f9c8c1ddc2 database: add indirection to compaction_manager instance
Allows making it forward-declared later on, reducing dependencies.
2017-09-11 20:09:45 +03:00
Avi Kivity
9d0aaa941a database: make run_with_compaction_disabled() a non-template
Allows reducing dependencies down the line, and un-templating
non-performance-critical functions is a good thing.
2017-09-11 20:09:45 +03:00
Avi Kivity
6b5514a3df database: change truncate() to flush while compaction is disabled
In preparation for making run_with_compaction_disabled() a non-template,
we want to remove any non-copyable captures (so the function can be
a std::function, which requires copyability). Move the flush into
the compaction-disabled region. This changes the behavior, but it shouldn't
matter.
2017-09-11 20:09:45 +03:00
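The copyability constraint can be illustrated with a heavily simplified, hypothetical toy_column_family (not Scylla's class). It has a non-template run_with_compaction_disabled() that takes a std::function, and its caller shares state through a copyable std::shared_ptr.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical, simplified column family: only the compaction-disabled counter.
struct toy_column_family {
    int compaction_disabled = 0;

    // Non-template: the body could live out of line in a .cc file, so callers
    // would not need the headers the implementation depends on.
    int run_with_compaction_disabled(std::function<int()> fn) {
        ++compaction_disabled;
        // Re-enable compaction on every exit path, including exceptions.
        struct reenable_guard {
            int& counter;
            ~reenable_guard() { --counter; }
        } guard{compaction_disabled};
        return fn();
    }
};

// The flush runs inside the disabled region. The state the lambda needs is
// held by std::shared_ptr, which is copyable. Capturing a std::unique_ptr
// instead would not compile, because std::function requires a
// CopyConstructible callable.
int truncate_with_flush(toy_column_family& cf, std::shared_ptr<int> flushed) {
    return cf.run_with_compaction_disabled([&cf, flushed] {
        *flushed += 1;                  // stands in for the memtable flush
        return cf.compaction_disabled;  // observe state inside the region
    });
}
```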
Paweł Dziepak
e401d2d50b db: reject non-Scylla counter sstables in flush_upload_dir
Scylla already refuses to load counter sstables that do not have a Scylla
component. However, if such sstables are loaded via the 'nodetool refresh'
command, the existing protection triggers only after the sstables have been
moved to the data directory. This is too late, so an additional check
is added when the upload directory is scanned.
2017-09-06 12:04:26 +01:00
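The following sketch shows the "check before moving" ordering. It uses a hypothetical upload_candidate description in place of real sstable metadata.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of an sstable found in the upload directory.
struct upload_candidate {
    std::string name;
    bool is_counter_table;
    bool has_scylla_component;
};

// Validate every candidate before moving any of them, so a rejected
// counter sstable never reaches the data directory.
void toy_flush_upload_dir(const std::vector<upload_candidate>& upload,
                          std::vector<std::string>& data_dir) {
    for (const auto& c : upload) {
        if (c.is_counter_table && !c.has_scylla_component) {
            throw std::runtime_error("non-Scylla counter sstable: " + c.name);
        }
    }
    for (const auto& c : upload) {
        data_dir.push_back(c.name);  // stands in for the actual move
    }
}
```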
Paweł Dziepak
6a5e8bace1 db: disallow loading non-Scylla counter sstables
Scylla does not support local and remote counter shards. This means that
it is unsafe to directly load sstables that may contain them.
2017-09-06 12:03:58 +01:00
Tomasz Grabiec
d22fdf4261 row_cache: Improve safety of cache updates
Cache imposes requirements on how updates to the on-disk mutation source
are made:
  1) each change to the on-disk mutation source must be followed
     by cache synchronization reflecting that change
  2) the two must be serialized with other synchronizations
  3) the combined operation must have strong failure guarantees (atomicity)

Because of that, sstable list update and cache synchronization must be
done under a lock, and cache synchronization cannot fail to synchronize.

Normally, cache synchronization achieves the no-failure guarantee by wiping
the cache (which is noexcept) when a failure is detected. There are, however,
some setup steps which cannot be skipped, e.g. taking a lock followed by
switching the cache to use the new snapshot. That truly cannot fail. The
lock inside the cache synchronizers is redundant, since the user needs to
take it anyway around the combined operation.

To make strong exception guarantees easier to ensure, and the cache
interface easier to use correctly, this patch moves control of the
combined update into the cache. This is done by having cache::update()
et al. accept a callback (external_updater), which is supposed to perform
the modification of the underlying mutation source when invoked.

This is in line with the layering. The cache is layered on top of the
on-disk mutation source (it wraps it), and reading has to go through
the cache. After the patch, modification also goes through the cache. This
way more of the cache's requirements can be confined to its implementation.

The failure semantics of update() and the other synchronizers needed to
change to provide strong exception guarantees. Now, if update() fails, the
update was not performed, neither to the cache nor to the underlying
mutation source.

The database::_cache_update_sem goes away; serialization is done
internally by the cache.

The external_updater needs to have strong exception guarantees. This
requirement is not new. It is however currently violated in some
places. This patch marks those callbacks as noexcept and leaves a
FIXME. Those should be fixed, but that's not in the scope of this
patch. Aborting is still better than corrupting the state.

Fixes #2754.

Also fixes the following test failure:

  tests/row_cache_test.cc(949): fatal error: in "test_update_failure": critical check it->second.equal(*s, mopt->partition()) has failed

which started to trigger after commit 318423d50b. Thread stack
allocation may fail, in which case we did not do the necessary
invalidation.
2017-09-04 10:04:29 +02:00
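The ordering behind the new interface can be sketched as follows. This is a single-threaded toy with hypothetical names, so it omits the internal serialization the commit mentions. It shows only the ordering that gives the strong guarantee. First run external_updater. Once it has succeeded, make the cache consistent by populating it, or by wiping it if populating fails.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Toy cache (hypothetical names): update() owns the whole combined operation.
struct toy_row_cache {
    std::map<int, std::string> cached;

    // external_updater modifies the underlying (on-disk) source. It must have
    // strong exception guarantees: either it fully applies, or it throws with
    // no effect. If it throws, the cache is not touched either.
    template <typename ExternalUpdater>
    void update(ExternalUpdater&& external_updater,
                const std::map<int, std::string>& new_data) {
        external_updater();
        // The source has changed, so the cache must now reflect it no matter
        // what. If populating fails part-way, fall back to wiping the cache,
        // which cannot fail (std::map::clear() is noexcept).
        try {
            for (const auto& [key, value] : new_data) {
                cached[key] = value;
            }
        } catch (...) {
            cached.clear();
        }
    }
};
```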
Tomasz Grabiec
bf75b882ae database: Add non-throwing try_trigger_compaction()
2017-09-04 10:04:29 +02:00
Tomasz Grabiec
116d4ae02b database: Make add_sstable() have strong exception guarantees
If insert() failed, we left the database with its stats updated but
the sstable not attached.
2017-09-04 10:04:29 +02:00
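The fix comes down to ordering: perform the operation that may throw before modifying any state that cannot be rolled back. Here is a sketch with a hypothetical toy_table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified table: an sstable list plus a stats counter.
struct toy_table {
    std::vector<std::string> sstables;
    std::size_t live_disk_space = 0;

    void add_sstable(std::string name, std::size_t size) {
        // Step that may throw goes first. vector::push_back leaves the vector
        // unchanged if it throws (std::string's move constructor is noexcept).
        sstables.push_back(std::move(name));
        // Only once the sstable is attached, apply the non-throwing stats update.
        live_disk_space += size;
    }
};
```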
Tomasz Grabiec
56e3ce05db row_cache: Don't require presence checker to be supplied externally
The API is simpler and safer this way.
2017-09-04 10:04:29 +02:00
Tomasz Grabiec
df787afe6a database: Supply presence checker in sstable snapshots
2017-09-04 10:04:29 +02:00
Tomasz Grabiec
ab8632b225 database: Add missing serialization of sstable set update and cache invalidation
Commit e3ad676433 missed a few places.

It is required to serialize sstable list update and cache synchronization
in order to preserve partition update isolation.

Fixes #2746.
2017-09-04 10:04:29 +02:00
Glauber Costa
e642aee3f7 database: wait for asynchronous operations to end before closing CF
This was part of "add gate for generic async operations to column family" but
somehow didn't make it into the final patch.

Add the missing piece.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20170830164205.4497-1-glauber@scylladb.com>
2017-08-31 11:16:30 +03:00
Tzach Livyatan
12fb975282 Fix typos in metrics description
Fixes #2658

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
Message-Id: <20170803121732.19640-1-tzach@scylladb.com>
2017-08-28 10:48:28 +03:00
Glauber Costa
83323e155e database: add gate for generic async operations to column family
run_with_compaction_disabled(), which is called by truncate, has a
pretty large defer point in remove(). When the code gets to finally
execute, we can't guarantee that the column family will still be alive.

That is true in particular if we issued a drop table command following
truncate: by the time truncate gets to resume, the CF will be gone.
Before the column family is dropped, it will always call its stop()
method, which means we have an opportunity to do some waiting there. We
already wait for flushes and current compactions to end.

Traditionally, we have been solving similar problems by adding a gate
that will catch asynchronous operations and making sure that potentially
asynchronous operations will enter the gate before executing. Let's do
the same thing here. We will close() the gate during stop().

Fixes #2726

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2017-08-24 13:12:57 -04:00
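Seastar provides seastar::gate for this pattern. The sketch below is not the Seastar class but a synchronous, hypothetical toy_gate; in Seastar, close() returns a future rather than taking a callback. It shows the semantics relied on here: entering a closed gate fails, and closing completes only after every operation that entered has left.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical synchronous gate: tracks in-flight operations and notifies
// the owner (via a callback) once all of them have finished after close().
class toy_gate {
public:
    // Called by an asynchronous operation before it starts.
    void enter() {
        if (_closed) {
            throw std::runtime_error("gate closed");
        }
        ++_count;
    }

    // Called by the operation when it finishes.
    void leave() {
        --_count;
        if (_count == 0 && _closed && _on_drained) {
            _on_drained();
        }
    }

    // Called from stop(): refuse new operations, and run on_drained once all
    // operations that already entered have left.
    void close(std::function<void()> on_drained) {
        _closed = true;
        _on_drained = std::move(on_drained);
        if (_count == 0 && _on_drained) {
            _on_drained();
        }
    }

private:
    std::size_t _count = 0;
    bool _closed = false;
    std::function<void()> _on_drained;
};
```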
Glauber Costa
d090e7be35 database: make sure that column family is always stopped when dropped
truncate can throw exceptions. If it does, cf->stop() will never be
called, because it is contained in a .then() clause instead of .finally().

One of the things that truncate does, in a finally block of its own,
is initiate a final compaction. If truncate fails, nobody will
wait for that compaction to finish (since cf->stop() is the one doing
that) and we'll crash.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2017-08-24 13:01:47 -04:00
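The difference between the two continuation styles can be shown with synchronous stand-ins. These are hypothetical helpers, not Seastar's future::then() and future::finally().

```cpp
#include <cassert>
#include <stdexcept>

// then-style: the next step runs only if the first one succeeded.
template <typename First, typename Next>
void run_then(First&& first, Next&& next) {
    first();
    next();
}

// finally-style: the cleanup step runs whether or not the first one threw,
// and the original exception is still propagated to the caller.
template <typename First, typename Cleanup>
void run_finally(First&& first, Cleanup&& cleanup) {
    try {
        first();
    } catch (...) {
        cleanup();
        throw;
    }
    cleanup();
}
```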
Amnon Heiman
abbd78367c Add configuration to disable per keyspace and column family metrics
The number of keyspace and column family metrics reported is
proportional to the number of shards times the number of keyspaces/column
families.

This can cause a performance issue both on the reporting system and on
the collecting system.

This patch adds a configuration flag (set to false by default) to enable
or disable those metrics.

Fixes #2701

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20170821113843.1036-1-amnon@scylladb.com>
2017-08-22 19:19:54 +03:00
Raphael S. Carvalho
10eaa2339e compaction: Make resharding go through compaction manager
Two reasons for this change:
1) every compaction should be multiplexed through the manager, which in
turn decides when to schedule it. Improvements to the manager will then
immediately benefit every existing compaction type.
2) the active tasks metric will now track ongoing reshard jobs.

Fixes #2671.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170817224334.6402-1-raphaelsc@scylladb.com>
2017-08-20 11:35:14 +03:00
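The following synchronous sketch shows the multiplexing idea with a hypothetical toy_compaction_manager. Because every job goes through submit(), including resharding, the active-task count covers all of them.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical, synchronous stand-in for a compaction manager: every kind of
// job (regular, cleanup, reshard, ...) enters through submit(), so scheduling
// policy and metrics live in one place.
class toy_compaction_manager {
public:
    int active_tasks() const { return _active_tasks; }
    int completed_tasks() const { return _completed_tasks; }

    void submit(const std::string& kind, const std::function<void()>& job) {
        (void)kind;       // a real manager could prioritize by kind
        ++_active_tasks;  // visible to metrics while the job runs
        try {
            job();
        } catch (...) {
            --_active_tasks;
            throw;
        }
        --_active_tasks;
        ++_completed_tasks;
    }

private:
    int _active_tasks = 0;
    int _completed_tasks = 0;
};
```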
Botond Dénes
e70cfc8f36 incremental_reader_selector: account for possibly disengaged lower bound
In addition to the constructor (fixed previously) the check for no
sstables on the first call to select() also has to be prepared for the
lower bound of the range being disengaged.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4ab1296c71814fcd492996fa36fd00fd7bbbbc7f.1502949875.git.bdenes@scylladb.com>
2017-08-17 10:07:26 +03:00
Botond Dénes
af83b7f57b incremental_reader_selector: use lazy_deref instead of ternary operator
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4f4b884c6a1f517bd654f3b27608d854b17a66e1.1502948635.git.bdenes@scylladb.com>
2017-08-17 08:45:46 +03:00
Avi Kivity
8df6dd1fa0 database: make incremental_reader_selector robust vs. full-range partition_range
incremental_reader_selector assumes the partition_range it receives has a lower
bound, but mutation_test showed that this is not always the case.

Fix by checking whether the bound exists or not.
Message-Id: <20170815095852.14149-1-avi@scylladb.com>
2017-08-15 11:03:22 +01:00
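The fix comes down to not dereferencing an optional lower bound unconditionally. Here is a sketch that assumes a hypothetical toy_partition_range over integer tokens instead of ring positions.

```cpp
#include <cassert>
#include <climits>
#include <optional>

// Hypothetical, simplified range over integer tokens. A full-range query has
// neither bound, so start must not be dereferenced unconditionally.
struct toy_partition_range {
    std::optional<long> start;
    std::optional<long> end;
};

// Position at which a selector would begin: the lower bound if present,
// otherwise the minimum token.
long starting_token(const toy_partition_range& pr) {
    return pr.start.value_or(LONG_MIN);
}
```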
Duarte Nunes
7fb6a74302 combined_mutation_reader: Drop exhausted readers if not in FF mode
Exhausted readers can be fast forwarded, so we have to keep them
around. However, if the current reader is not fast forwardable, then
we can drop those readers and their buffers.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-08-14 14:37:27 +02:00
Botond Dénes
9ee9988097 Add combined_mutation_reader_test unit test
2017-08-10 12:38:10 +03:00
Botond Dénes
3e97a5cd6b Remove range_sstable_reader
range_sstable_reader is replaced with combined_mutation_reader, using
the incremental_reader_selector.
2017-08-10 12:38:10 +03:00
Botond Dénes
bfc74f1312 Add incremental_reader_selector
incremental_reader_selector is a specialization of reader_selector for
the case when sstables have narrow and/or disjoint token ranges. To
exploit this, it creates new readers on demand, when their sstable's
token range intersects with the current ring position.
2017-08-10 12:38:02 +03:00
Botond Dénes
94fc550e68 sstable_set::incremental_selector: select() now returns a selection
A selection contains, in addition to the list of sstables, a next_token,
which is a hint about the next best token to call select() with.
This should be the smallest token such that the next call to
select() returns the least number of new sstables, without
skipping any.
2017-08-09 16:27:33 +03:00
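Here is a simplified sketch of incremental selection with a next_token hint. It assumes integer tokens, closed [first, last] sstable ranges, and select() positions that only increase. toy_incremental_selector and its members are hypothetical and are not the sstable_set::incremental_selector API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct toy_sstable {
    std::string name;
    long first;  // smallest token in the sstable
    long last;   // largest token in the sstable
};

struct toy_selection {
    std::vector<std::string> sstables;  // newly selected at this position
    std::optional<long> next_token;     // where new sstables start appearing
};

class toy_incremental_selector {
public:
    explicit toy_incremental_selector(std::vector<toy_sstable> ssts)
        : _ssts(std::move(ssts)) {
        std::sort(_ssts.begin(), _ssts.end(),
                  [](const toy_sstable& a, const toy_sstable& b) { return a.first < b.first; });
    }

    // Return sstables covering token t that were not returned before, plus
    // the first token of the next not-yet-returned sstable as a hint.
    toy_selection select(long t) {
        toy_selection sel;
        while (_next < _ssts.size() && _ssts[_next].first <= t) {
            if (_ssts[_next].last >= t) {  // ranges ending before t are already past
                sel.sstables.push_back(_ssts[_next].name);
            }
            ++_next;
        }
        if (_next < _ssts.size()) {
            sel.next_token = _ssts[_next].first;
        }
        return sel;
    }

private:
    std::vector<toy_sstable> _ssts;  // sorted by first token
    std::size_t _next = 0;           // index of the first unreturned sstable
};
```

Positions only move forward, so an sstable whose range ends before the current position can never contribute again, which is why the sketch skips it. A combined reader built on this would open readers for the returned sstables and would presumably call select() again once it reaches next_token.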
Glauber Costa
4a911879a3 add active streaming reads metric
In commit f38e4ff3f, we have separated streaming reads from normal reads
for the purpose of determining the maximum number of reads going on.
However, as a result we no longer know how many reads are happening
on behalf of streaming, and that can be important information when
debugging issues.

This patch adds this metric so we don't fly blind.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <1501909973-32519-1-git-send-email-glauber@scylladb.com>
2017-08-05 11:06:37 +03:00
Avi Kivity
f38e4ff3f9 database: prevent streaming reads from blocking normal reads
Streaming reads and normal reads share a semaphore, so if a bunch of
streaming reads use all available slots, no normal reads can proceed.

Fix by assigning streaming reads their own semaphore; they will compete
with normal reads once issued, and the I/O scheduler will determine the
winner.

Fixes #2663.
Message-Id: <20170802153107.939-1-avi@scylladb.com>
2017-08-03 10:23:01 +01:00
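Here is a sketch of the separate-semaphore arrangement. It uses a hypothetical non-blocking toy_semaphore, and the unit counts are arbitrary.

```cpp
#include <cassert>

// Hypothetical counting semaphore with non-blocking acquire only.
class toy_semaphore {
public:
    explicit toy_semaphore(int units) : _available(units) {}
    bool try_acquire() {
        if (_available == 0) return false;
        --_available;
        return true;
    }
    void release() { ++_available; }
private:
    int _available;
};

// Each read class gets its own admission semaphore, so exhausting one cannot
// starve the other. Once admitted, both kinds of reads compete for the disk,
// which is arbitrated elsewhere (by the I/O scheduler).
struct toy_read_admission {
    toy_semaphore normal_reads{100};
    toy_semaphore streaming_reads{10};
};
```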
Avi Kivity
911536960a database: remove streaming read queue length limit
If we fail a streaming read due to queue overload, we will fail the entire repair.
Remove the limit for streaming, and trust the caller (repair) to have bounded
concurrency.

Fixes #2659.
Message-Id: <20170802143448.28311-1-avi@scylladb.com>
2017-08-03 10:21:07 +01:00
Duarte Nunes
a85232dd82 Fix compilation errors on GCC 6
GCC 6 inconsistently requires explicitly calling a member function
through "this->" for lambda functions capturing "this".

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170731143755.21970-1-duarte@scylladb.com>
2017-07-31 17:40:44 +03:00
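The following is a minimal illustration of the workaround style, using a hypothetical toy_flusher. The commit does not show which constructs triggered the GCC 6 errors, so this only demonstrates the explicit this-> form, which is valid standard C++ on any compiler.

```cpp
#include <cassert>

struct toy_flusher {
    int pending = 3;

    int flush_one() { return --pending; }

    // Inside a lambda that captures `this`, this->flush_one() is equivalent
    // to flush_one() in standard C++; spelling it out is the workaround.
    int flush_all() {
        auto step = [this] { return this->flush_one(); };
        int remaining = pending;
        while (remaining > 0) {
            remaining = step();
        }
        return remaining;
    }
};
```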
Avi Kivity
3fe6731436 Merge "Reduce the effect of the latency metrics" from Amnon
"This series reduces that effect in two ways:
1. Remove the latency counters from the system keyspaces
2. Reduce the histogram size by limiting the maximum number of buckets and
   stop the last bucket."

Fixes #2650.

* 'amnon/remove_cf_latency_v2' of github.com:cloudius-systems/seastar-dev:
  database: remove latency from the system table
  estimated histogram: return a smaller histogram
2017-07-31 15:58:30 +03:00
Duarte Nunes
c81431ad16 column_family: Re-acquire flush permit in case of error
If we fail to flush an sstable after creating the flush_reader, we will
already have released the flush permit by the time we retry the flush.
Ensure that we re-acquire the flush permit when retrying.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-31 12:40:19 +02:00
Duarte Nunes
9162e016da column_family: Don't hold sstable read lock when retrying flush
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-31 12:40:19 +02:00
Duarte Nunes
1a33cc6847 sstables: Release the flush permit before fsyncing
This allows a queued flush to start while we fsync the current
sstable, which helps reduce the overall time new writes are blocked on
dirty memory.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-31 12:40:19 +02:00
Duarte Nunes
a2b732c156 dirty_memory_manager: Refactor flush permit lifetime management
This patch refactors how the flush permit lifetime is managed,
dropping the current hash table in favour of a RAII approach.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-31 12:40:19 +02:00
Duarte Nunes
f647f5b14a dirty_memory_manager: Invert permit acquisition order
For an upcoming fix it is required to invert the permit acquisition
order: first we acquire the background work permit and then the single
flush permit.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-31 12:40:19 +02:00
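The RAII approach and the new acquisition order from the series above can be sketched together. toy_semaphore, toy_permit, and try_get_flush_permit() are hypothetical. The sketch shows that a destructor releases each held unit on every path. A holder can also return one unit early with reset(), as when the flush permit is released before fsyncing.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical counting semaphore with non-blocking acquire only.
class toy_semaphore {
public:
    explicit toy_semaphore(int units) : _available(units) {}
    bool try_acquire() {
        if (_available == 0) return false;
        --_available;
        return true;
    }
    void release() { ++_available; }
    int available() const { return _available; }
private:
    int _available;
};

// RAII unit: returns its semaphore unit when destroyed, or earlier via reset().
class toy_permit {
public:
    toy_permit() = default;
    explicit toy_permit(toy_semaphore& sem) : _sem(&sem) {}  // unit already acquired
    toy_permit(toy_permit&& o) noexcept : _sem(std::exchange(o._sem, nullptr)) {}
    toy_permit& operator=(toy_permit&& o) noexcept {
        if (this != &o) {
            reset();
            _sem = std::exchange(o._sem, nullptr);
        }
        return *this;
    }
    ~toy_permit() { reset(); }
    void reset() {
        if (_sem) {
            _sem->release();
            _sem = nullptr;
        }
    }
private:
    toy_semaphore* _sem = nullptr;
};

// A flush holds both units; the background-work unit is taken first.
struct toy_flush_permit {
    toy_permit background_work;
    toy_permit single_flush;
};

std::optional<toy_flush_permit> try_get_flush_permit(toy_semaphore& background,
                                                     toy_semaphore& single) {
    if (!background.try_acquire()) return std::nullopt;
    toy_permit bg(background);
    if (!single.try_acquire()) return std::nullopt;  // ~bg gives the unit back
    return toy_flush_permit{std::move(bg), toy_permit(single)};
}
```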
Duarte Nunes
e371accac8 memtable_list: Register different seal functions for each behaviour
Instead of passing a flush_behaviour to the seal function, use two
different functions for each of the behaviours.

This will be important in the forthcoming patches, which will require
the signatures of those functions to differ.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-31 12:40:19 +02:00
Avi Kivity
e855a28fae Revert "Merge "memtable flush: Fixes and improvements" from Duarte"
This reverts commit 733a64a1df, reversing
changes made to e11e66723a.

Breaks sstable_test and perf_fast_forward.
2017-07-31 12:44:28 +03:00
Duarte Nunes
0f1bd81523 column_family: Re-acquire flush permit in case of error
If we fail to flush an sstable after creating the flush_reader, we will
already have released the flush permit by the time we retry the flush.
Ensure that we re-acquire the flush permit when retrying.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-27 21:09:18 +02:00
Duarte Nunes
2f4cffc7f6 column_family: Don't hold sstable read lock when retrying flush
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-27 21:09:18 +02:00
Duarte Nunes
5e64839e85 sstables: Release the flush permit before fsyncing
This allows a queued flush to start while we fsync the current
sstable, which helps reduce the overall time new writes are blocked on
dirty memory.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-27 21:09:18 +02:00
Duarte Nunes
ef1275e9dd dirty_memory_manager: Refactor flush permit lifetime management
This patch refactors how the flush permit lifetime is managed,
dropping the current hash table in favour of a RAII approach.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-27 21:09:18 +02:00
Duarte Nunes
cfc8fae33f dirty_memory_manager: Invert permit acquisition order
For an upcoming fix it is required to invert the permit acquisition
order: first we acquire the background work permit and then the single
flush permit.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-27 21:09:18 +02:00
Duarte Nunes
7e68e4677d memtable_list: Register different seal functions for each behaviour
Instead of passing a flush_behaviour to the seal function, use two
different functions for each of the behaviours.

This will be important in the forthcoming patches, which will require
the signatures of those functions to differ.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-07-27 21:09:18 +02:00
Amnon Heiman
a71b9e498a database: remove latency from the system table
This patch removes the latency histograms from the system table. It also
extends the already existing exclusion to all system keyspaces.

It also uses the new get_histogram API to set a minimal bucket size of
100 microseconds.
2017-07-27 11:41:15 +03:00