Commit Graph

14781 Commits

Author SHA1 Message Date
Avi Kivity
5f2600a71d migration_manager: remove dependency on messaging_service.hh in header
Use the new msg_addr.hh header to remove a dependency on
messaging_service.hh.
2018-03-12 20:05:23 +02:00
Avi Kivity
dd12214628 messaging_service: move msg_addr into its own header file
Make it possible to use msg_addr without depending on messaging_service.hh.
2018-03-12 20:05:23 +02:00
Avi Kivity
af383228fb locator: remove empty file locator.cc
Empty but for compiler-time-consuming includes.
Message-Id: <20180312073018.21646-1-avi@scylladb.com>
2018-03-12 10:32:26 +01:00
Avi Kivity
29d0a46220 locator: add copyright and license statements to production_snitch_base.cc
Message-Id: <20180312073104.21840-1-avi@scylladb.com>
2018-03-12 10:30:48 +01:00
Asias He
8624467e26 utils: Remove utils/utils.cc
It was used to make sure the header compiled in the early days.
Message-Id: <531fc6570805bd163afedd53f5d71e1b79a477d1.1520840644.git.asias@scylladb.com>
2018-03-12 09:47:40 +02:00
Duarte Nunes
0ccf1c581a Merge 'Reduce gratuitous inclusions of system_keyspace.hh' from Avi
Try to avoid recompilations by reducing inclusions of system_keyspace.hh
in other header files.

Tests: unit (release)

* tag 'system_keyspace.hh/v1' of https://github.com/avikivity/scylla:
  storage_service: remove system_keyspace.hh include
  locator: de-inline reconnectable_snitch_helper
  locator: de-inline production_snitch_base
  cql3: remove #include of system_keyspace.hh
2018-03-11 22:56:20 +00:00
Avi Kivity
cd668061fc storage_service: remove system_keyspace.hh include
Re-distribute include among the files that really need it.
2018-03-11 18:53:49 +02:00
Avi Kivity
b946f8b308 locator: de-inline reconnectable_snitch_helper
Reduce dependencies by de-inlining reconnectable_snitch_helper. A
new home is found in production_snitch_base.cc, which is somewhat
related.
2018-03-11 18:31:05 +02:00
Avi Kivity
84004a2574 locator: de-inline production_snitch_base
De-inlining allows us to remove some dependencies, and those functions
are too complex to inline anyway.

A few always-throwing functions get the [[noreturn]] attribute to
avoid damaging code generation.
2018-03-11 18:22:49 +02:00
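A minimal sketch of the [[noreturn]] point above, not the actual snitch code. The helper name throw_unsupported and the function parse_dc_index are hypothetical. The idea is that an always-throwing function defined out of line can be marked [[noreturn]], so the compiler still knows that control never comes back to the caller.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical always-throwing helper. In a de-inlined layout the declaration
// would live in the header and the definition in a .cc file.
[[noreturn]] void throw_unsupported(const std::string& what);

// Caller: since the helper is [[noreturn]], the compiler can treat the
// error branch as a dead end instead of generating a fall-through path.
int parse_dc_index(const std::string& s) {
    if (s.empty()) {
        throw_unsupported("empty datacenter name");
    }
    return static_cast<int>(s.size());
}

// Out-of-line definition of the helper.
void throw_unsupported(const std::string& what) {
    throw std::runtime_error("unsupported: " + what);
}
```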
Avi Kivity
4f6b892aa1 cql3: remove #include of system_keyspace.hh
We include system_keyspace for just the string "system" (and a related
is_system_keyspace() function). Replace it with forward-declared functions.
2018-03-11 18:02:23 +02:00
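The technique described above can be sketched as follows, with everything collapsed into one file for illustration. Only is_system_keyspace() is named in the commit message; its signature, the db namespace, system_keyspace_name() and may_alter() are assumptions made for this example.

```cpp
#include <string>

// What a dependent header would contain: declarations only, so it no longer
// needs to include the heavyweight system_keyspace.hh. (Names and signatures
// here are hypothetical.)
namespace db {
bool is_system_keyspace(const std::string& name);
const std::string& system_keyspace_name();
}

// Code in the dependent header can use the declarations right away.
inline bool may_alter(const std::string& keyspace) {
    return !db::is_system_keyspace(keyspace);
}

// What a single source file would provide: the definitions.
namespace db {
const std::string& system_keyspace_name() {
    static const std::string name = "system";
    return name;
}

bool is_system_keyspace(const std::string& name) {
    return name == system_keyspace_name();
}
}
```

Files that only need the name check then recompile only when the declarations change, not whenever the full header does.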
Avi Kivity
7441c7153f Merge seastar upstream
* seastar 08e02dc...42159d4 (9):
  > memory: avoid unconditional calls to __tls_init
  > io_tester: bring back information about think time
  > Merge "Avoid continuations in I/O Scheduler path" from Glauber
  > Merge "Extend io_tester to support CPU loads" from Glauber
  > tutorial: fix undue complication in semaphore get_units() example
  > Tutorial: in HTML target, inline code snippets shouldn't be gray
  > tutorial: add build target for split HTML file
  > tutorial: mention seastar::thread as option for object lifetime management
  > tutorial: document new seastar::future::wait()
2018-03-11 15:45:42 +02:00
Avi Kivity
9569ba5e38 Update scylla-ami submodule
* dist/ami/files/scylla-ami 3aa87a7...5170011 (3):
  > scylla_install_ami: install enhanced networking NIC drivers
  > scylla_install_ami: set kernel-ml as default kernel
  > scylla_install_ami: fix NIC down with enhanced networking on new base AMI
2018-03-11 15:45:05 +02:00
Raphael S. Carvalho
fb8ce14a36 sstables: don't set clustering components twice when loading sstable
already called in update_info_for_opened_data() which is called by
open_data(); no need for clustering components to be set early
either.

found it when auditing the code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180310225213.26017-1-raphaelsc@scylladb.com>
2018-03-11 10:10:35 +02:00
Tomasz Grabiec
3937352a9a doc: Fix row_cache.md
Dropped unfinished sentence and added missing "after".
Message-Id: <1520615404-18458-1-git-send-email-tgrabiec@scylladb.com>
2018-03-10 16:27:04 +02:00
Raphael S. Carvalho
87035bd8d1 sstables: fix min and max timestamp when negative timestamp is specified
unsigned type was incorrectly used for keeping track of min and max
timestamp, so a negative number would be treated as a very high
number that would *incorrectly* end up as max timestamp in sstable
metadata.

Fixes #3000.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180308162217.18963-1-raphaelsc@scylladb.com>
2018-03-08 18:31:30 +02:00
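The bug class described above can be reproduced in a few lines. This is an illustrative sketch, not the sstables code: min_max_tracker and both helper functions are hypothetical names. Storing a signed timestamp in an unsigned tracker turns a negative value into a very large one, which then wins the max comparison.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Generic min/max tracker; T is the type used to store the timestamps.
template <typename T>
struct min_max_tracker {
    T min = std::numeric_limits<T>::max();
    T max = std::numeric_limits<T>::lowest();

    void update(T v) {
        min = std::min(min, v);
        max = std::max(max, v);
    }
};

// Buggy variant: timestamps are stored as unsigned, so a negative
// timestamp wraps around to a huge value and is reported as the maximum.
inline std::uint64_t max_with_unsigned_storage(std::int64_t a, std::int64_t b) {
    min_max_tracker<std::uint64_t> t;
    t.update(static_cast<std::uint64_t>(a));
    t.update(static_cast<std::uint64_t>(b));
    return t.max;
}

// Fixed variant: timestamps keep their signed type end to end.
inline std::int64_t max_with_signed_storage(std::int64_t a, std::int64_t b) {
    min_max_tracker<std::int64_t> t;
    t.update(a);
    t.update(b);
    return t.max;
}
```

With timestamps -5 and 10, the signed tracker reports 10 as the maximum, while the unsigned tracker reports the wrapped-around representation of -5.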
Avi Kivity
596a9d0fb3 Merge "Make reader concurrency dual-restricted by count and memory" from Botond
"
Refs #2692
Fixes #3246

The current restricting algorithm [1] restricts the active-reader queue
based on the memory consumption of the existing active readers. When
this memory consumption is above the limit new readers are not admitted.
The inactive reader queue on the other hand has a fixed length.
This caused performance regressions on two workloads:
* read-only: since the inactive-reader queue length is severely limited
  (compared to the previous situation) reads will time out at loads
  comfortably handled before.
* mixed: since the memory limit is checked only at admission time
  (already created active readers are not limited) memory consumption
  grew significantly, causing problems when compactions kicked in.

The solution is to reintroduce the old limit of 100 active concurrent
user-reads while still keeping the memory-based limit as well. For
workloads that don't consume a lot of memory or on large boxes with lots
of memory the count-based limit will be reached which is reverting to the
old well-known behaviour. For memory-hungry workloads or on small boxes
with little memory the memory-based limit will kick in sooner, avoiding
memory overconsumption.

[1] introduced by bdbbfe9390
"

* 'restricted-reader-dual-limit/v3' of https://github.com/denesb/scylla:
  Modify unit tests so that they test the dual-limits
  Use the reader_concurrency_semaphore to limit reader concurrency
  Add reader_concurrency_semaphore
  Add reader_resource_tracker param to mutation_source
  mv reader_resource_tracker.hh -> reader_concurrency_semaphore.hh
2018-03-08 14:36:05 +02:00
Botond Dénes
341ddd096a Modify unit tests so that they test the dual-limits
2018-03-08 14:12:12 +02:00
Botond Dénes
1259031af3 Use the reader_concurrency_semaphore to limit reader concurrency
2018-03-08 14:12:12 +02:00
Botond Dénes
dfa04c3fea Add reader_concurrency_semaphore
This semaphore implements the new dual, count and memory based active
reader limiting. As purely memory-based limiting proved to cause
problems on big boxes admitting a large number of readers (more than any
disk could handle) the previous count-based limit is reintroduced in
addition to the existing memory-based limit.
When creating new readers, the count-based limit is checked first. If
that clears, the memory limit is checked before admitting the reader.
reader_concurrency_semaphore wraps the two semaphores that implement
these limits and enforces the correct order of limit checking.
This class also completely replaces the restricted_reader_config struct;
it encapsulates all data and related functionality of the latter, making
client code simpler.
2018-03-08 14:12:12 +02:00
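The check order described above (count first, then memory) can be sketched with a simplified, synchronous admission class. This is an assumption-laden illustration: the class name dual_limit_admission and its methods are hypothetical, and the real reader_concurrency_semaphore wraps two semaphores and queues waiting readers rather than rejecting them.

```cpp
#include <cstddef>

// Hypothetical, non-blocking stand-in for a dual-limit admission check.
class dual_limit_admission {
    int _count_left;              // remaining reader slots
    std::ptrdiff_t _memory_left;  // remaining memory budget, in bytes
public:
    dual_limit_admission(int max_count, std::ptrdiff_t max_memory)
        : _count_left(max_count), _memory_left(max_memory) {}

    // Returns true and consumes resources if the reader is admitted.
    bool try_admit(std::ptrdiff_t memory) {
        // 1. Count-based limit is checked first.
        if (_count_left <= 0) {
            return false;
        }
        // 2. Only if the count limit clears is the memory limit checked.
        if (_memory_left < memory) {
            return false;
        }
        --_count_left;
        _memory_left -= memory;
        return true;
    }

    // Returns the resources of a finished reader.
    void release(std::ptrdiff_t memory) {
        ++_count_left;
        _memory_left += memory;
    }
};
```

On a box with a large memory budget the count check is the one that rejects first; with a small budget the memory check does.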
Botond Dénes
872fd369ba Add reader_resource_tracker param to mutation_source
Soon, reader_resource_tracker will only be constructible after the
reader has been admitted. This means that the resource tracker cannot be
preconstructed and just captured by the lambda stored in the mutation
source and instead has to be passed in along the other parameters.
2018-03-08 14:12:09 +02:00
Botond Dénes
d5bb8a47fc mv reader_resource_tracker.hh -> reader_concurrency_semaphore.hh
In preparation to reader_concurrency_semaphore being added to the file.
The reader_resource_tracker is really only a helper class for
reader_concurrency_semaphore so the latter is better suited to provide
the name of the file.
2018-03-08 10:29:16 +02:00
Avi Kivity
0ebfe448e3 Merge "Row-level eviction" from Tomasz
"
This series switches granularity of memory-pressure-induced eviction in cache
from a partition to a row.

Since 9b21a9b cache can store partial partitions with row granularity but they
were still evicted as a unit. This is problematic for the following reasons:

 - more is evicted than necessary, which decreases cache efficiency. In the
   worst case, whole cache gets evicted at once

 - evicting large amounts of memory (large partitions) at once may impact
   latency badly

Fixes #2576.

See the documentation added in patch titled "doc: Document row cache eviction"
for details on how eviction works.

Open issues to be fixed incrementally:

  - range tombstones are not evictable

  - cache update still has partition granularity, which
    causes bad latency on memtable flush with large partitions
"

* tag 'tgrabiec/row-level-eviction-v3' of github.com:scylladb/seastar-dev: (43 commits)
  doc: Document row cache eviction
  tests: cache: Add tests for row-level eviction
  tests: cache: Check that data is evictable after schema change
  tests: cache: Move definitions to the top
  tests: perf_cache_eviction: Switch eviction counter to row granularity
  tests: row_cache_alloc_stress: Avoid quadratic behavior
  cache: Introduce unlink_from_lru()
  cache: Add row-level stats about cache update from memtable
  mvcc: Propagate information if insertion happened from ensure_entry_if_complete()
  cache: Track number of rows and row invalidations
  cache: Evict with row granularity
  cache: Track static row insertions separately from regular rows
  tests: mvcc: Use apply_to_incomplete() to create versions
  tests: mvcc: Fix test_apply_to_incomplete()
  tests: cache: Do not depend on particular granularity of eviction
  tests: cache: Make sure readers touch rows in test_eviction()
  mvcc: Store complete rows in each version in evictable entries
  mvcc: Introduce partition_snapshot_row_cursor::ensure_entry_in_latest()
  tests: cache: Invoke partial eviction in test_concurrent_reads_and_eviction
  cache: Ensure all evictable partition_versions have a dummy after all rows
  ...
2018-03-07 17:57:07 +02:00
Tomasz Grabiec
4caeed7e40 doc: Document row cache eviction
2018-03-07 16:52:59 +01:00
Tomasz Grabiec
180a877db3 tests: cache: Add tests for row-level eviction
2018-03-07 16:52:59 +01:00
Tomasz Grabiec
9fab5068c6 tests: cache: Check that data is evictable after schema change
2018-03-07 16:52:59 +01:00
Tomasz Grabiec
f0e0c79a70 tests: cache: Move definitions to the top
2018-03-07 16:52:59 +01:00
Tomasz Grabiec
1e4f9eb2c1 tests: perf_cache_eviction: Switch eviction counter to row granularity
2018-03-07 16:52:59 +01:00
Tomasz Grabiec
48f91b4605 tests: row_cache_alloc_stress: Avoid quadratic behavior
Partitions corresponding to keys have 40k rows. With row-level
eviction touching them inside the loop became a serious performance
issue, because touch() now needs to walk over all rows.
2018-03-07 16:52:59 +01:00
Tomasz Grabiec
641bcd0b35 cache: Introduce unlink_from_lru()
Will be used in row_cache_alloc_stress to unlink partitions which we
don't want to get evicted, instead of repeatedly calling touch() on
them after each subsequent population. After switching to row-level
LRU, doing so greatly increases run time of the test due to quadratic
behavior.
2018-03-07 16:52:59 +01:00
Tomasz Grabiec
b9d22584bb cache: Add row-level stats about cache update from memtable
2018-03-07 16:52:58 +01:00
Tomasz Grabiec
7c34cd04e2 mvcc: Propagate information if insertion happened from ensure_entry_if_complete()
It's needed by users to update statistics, different ones depending on
if the row already existed or not.
2018-03-07 16:50:55 +01:00
Raphael S. Carvalho
aa75684ee7 sstables: Warn when an extra-large partition is written
Based on https://issues.apache.org/jira/browse/CASSANDRA-9643

With the compaction_large_partition_warning_threshold_mb option set to 1,
example output follows:

WARN  2018-02-22 19:52:11,029 [shard 0] sstable - Writing large
row system/local:{key: pk{00056c6f63616c}, token:-7564491331177403445}
(1276758 bytes)

Fixes #2209.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180306175912.19259-1-raphaelsc@scylladb.com>
2018-03-07 15:49:46 +00:00
Duarte Nunes
9254a9a6fe db/system_keyspace: Move dependency on db/schema_tables to source file
And add missing dependencies to header file.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180307111304.2914-1-duarte@scylladb.com>
2018-03-07 14:45:36 +02:00
Asias He
73d8e2743f dht: Fix log in range_streamer
The address and keyspace should be swapped.

Before:
  range_streamer - Bootstrap with ks3 for keyspace=127.0.0.1 succeeded,
  took 56 seconds

After:
  range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks3 succeeded,
  took 56 seconds

Message-Id: <5c49646f1fbe45e3a1e7545b8470e04b166922c4.1520416042.git.asias@scylladb.com>
2018-03-07 11:49:58 +02:00
Tomasz Grabiec
6ba272a610 debug: scylla_row_cache_report: Remove duplicated phrase from printout
Message-Id: <1520412164-10746-1-git-send-email-tgrabiec@scylladb.com>
2018-03-07 11:15:57 +02:00
Tomasz Grabiec
ad7e2f7460 cache: Add back partition count argument to row_cache_update_one_batch_end probe
debug/scylla_row_cache_report.stp expects it.

Removed in c4974392b7.
Message-Id: <1520412152-10680-1-git-send-email-tgrabiec@scylladb.com>
2018-03-07 11:15:56 +02:00
Tomasz Grabiec
da901b93fc cache: Track number of rows and row invalidations
2018-03-06 11:50:29 +01:00
Tomasz Grabiec
381bf02f55 cache: Evict with row granularity
Evict whole rows instead of whole partitions.

As part of this, invalidation of partition entries was changed to not
evict from snapshots right away, but unlink them and let them be
evicted by the reclaimer.
2018-03-06 11:50:29 +01:00
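A toy illustration of what row granularity means for an LRU, assuming a much simpler structure than the real cache. The class row_lru_cache and its methods are hypothetical, and there is no MVCC, snapshot handling or reclaimer here. Each row is its own LRU entry, so eviction removes the single least recently used row and leaves the rest of its partition in place.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <map>
#include <string>
#include <utility>

// Hypothetical row-granularity LRU: entries are (partition key, row key).
class row_lru_cache {
    using row_id = std::pair<std::string, int>;
    std::list<row_id> _lru;  // front = least recently used
    std::map<row_id, std::list<row_id>::iterator> _rows;
public:
    // Insert the row if missing and mark it as most recently used.
    void touch(const std::string& pk, int ck) {
        row_id id{pk, ck};
        auto it = _rows.find(id);
        if (it != _rows.end()) {
            _lru.erase(it->second);
        }
        _lru.push_back(id);
        _rows[id] = std::prev(_lru.end());
    }

    // Evict exactly one row: the least recently used one.
    bool evict_one() {
        if (_lru.empty()) {
            return false;
        }
        _rows.erase(_lru.front());
        _lru.pop_front();
        return true;
    }

    bool contains(const std::string& pk, int ck) const {
        return _rows.count(row_id{pk, ck}) != 0;
    }

    std::size_t size() const { return _rows.size(); }
};
```

With partition-level eviction, reclaiming memory would drop every row of the chosen partition at once; here a recently touched row survives even when a colder row of the same partition is evicted.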
Tomasz Grabiec
dce9185fc9 cache: Track static row insertions separately from regular rows
So that row eviction counter, which doesn't look at the static row,
can be in sync with row insertion counter.
2018-03-06 11:50:28 +01:00
Tomasz Grabiec
19951ede7d tests: mvcc: Use apply_to_incomplete() to create versions
So that the test doesn't depend on internal invariants.
2018-03-06 11:50:28 +01:00
Tomasz Grabiec
ed6271fc87 tests: mvcc: Fix test_apply_to_incomplete()
It should use evictable entries instead of non-evictable ones,
because they are required by apply_to_incomplete().
2018-03-06 11:50:28 +01:00
Tomasz Grabiec
f2bdac2874 tests: cache: Do not depend on particular granularity of eviction
2018-03-06 11:50:28 +01:00
Tomasz Grabiec
c306c1050e tests: cache: Make sure readers touch rows in test_eviction()
With row-level eviction just creating a reader won't necessarily
update the LRU.
2018-03-06 11:50:28 +01:00
Tomasz Grabiec
ab407d99cc mvcc: Store complete rows in each version in evictable entries
For row-level eviction we need to ensure that each version has
complete rows so that eviction from older versions doesn't affect the
value of the row in newer snapshots.

This is achieved by copying the row from an older version before
applying the increment in the new version.

Only affects evictable entries, memtables are not affected.
2018-03-06 11:50:28 +01:00
Tomasz Grabiec
29d167bf01 mvcc: Introduce partition_snapshot_row_cursor::ensure_entry_in_latest()
To avoid duplication of logic between cache reader and
ensure_entry_if_complete().
2018-03-06 11:50:28 +01:00
Tomasz Grabiec
fb2107416b tests: cache: Invoke partial eviction in test_concurrent_reads_and_eviction
In hope of catching more issues.
2018-03-06 11:50:27 +01:00
Tomasz Grabiec
bee875fa7d cache: Ensure all evictable partition_versions have a dummy after all rows
Every evictable version will have a dummy entry at the end so that it can be
tracked in the LRU.

It is also needed to allow old versions to stay around (with
tombstones and static rows) after all rows are evicted. Such versions
must be fully discontinuous, and we need some entry to mark that.
2018-03-06 11:50:27 +01:00
Tomasz Grabiec
5320705300 cache: Propagate cache_tracker to places manipulating evictable entries
cache_tracker reference will be needed to link/unlink row entries.

No change of behavior in this patch.
2018-03-06 11:50:27 +01:00
Tomasz Grabiec
30df3ddd7d cache: Do not evict from cache_entry destructor
We will need to propagate a cache_tracker reference to evict(). Instead
of evicting from destructor, do so before cache_entry gets unlinked
from the tree. Entries which are not linked, don't need to be
explicitly evicted.
2018-03-06 11:50:27 +01:00
Tomasz Grabiec
4efab6f6a6 cache: Use on_evicted() in cache_tracker::clear()
In preparation for switching LRU to row level.
2018-03-06 11:50:27 +01:00