Commit Graph

12395 Commits

Author SHA1 Message Date
Raphael S. Carvalho
e224653d70 sstables: update tombstone_histogram for cells with expiration time
That tombstone_histogram is used to determine droppable data ratio
for a sstable, and unlike C*, we were only updating it for
tombstones. We need to update it with expiration time of cells too,
if any. Creation time (expiration - ttl) cannot be used because if
ttl > gc_grace_seconds, the resulting sstable could be considered
worth dropping by tomstone compaction before any data is actually
expired.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-27 16:50:38 -03:00
Avi Kivity
08488a75e0 dist: tolerate sysctl failures
sysctl may fail in a container environment if /proc is not virtualized
properly.

Fixes #1990
Message-Id: <20170625145930.31619-1-avi@scylladb.com>
2017-06-27 16:11:48 +02:00
Avi Kivity
ff7be8241f Merge "Fix compilation issues in older environments" from Tomasz
* 'tgrabiec/fix-compilation-issues' of github.com:cloudius-systems/seastar-dev:
  tests: streamed_mutation_test: Avoid using boost::size() on row ranges
  tests: row_cache: Remove unused method
2017-06-27 16:30:54 +03:00
Tomasz Grabiec
eb844a10e9 tests: streamed_mutation_test: Avoid using boost::size() on row ranges
Fails to compile with libboost 1.55.
2017-06-27 15:27:13 +02:00
Tomasz Grabiec
e68925595c tests: row_cache: Remove unused method 2017-06-27 14:10:37 +02:00
Vlad Zolotarov
6839a50677 db::commitlog: entry_writer add a virtual destructor
Add a virtual destructor for a base class commitlog::entry_writer.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1498511180-18391-1-git-send-email-vladz@scylladb.com>
2017-06-27 10:17:10 +03:00
Takuya ASADA
1e86196ed5 dist/debian: unofficial support of Ubuntu non-LTS versions / Debian non-stable versions
Currently our build script only supports Ubuntu 14.04/16.04 and Debian 8, this
change extends support to Ubuntu non-LTS versions / Debian non-stable versions.
Note that this is unofficial support, users should build the package for these
distributions theirselves.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1498491473-28691-1-git-send-email-syuu@scylladb.com>
2017-06-26 18:55:55 +03:00
Asias He
cc02a62756 repair: Prefer nodes in local dc when streaming
When peer nodes have the same partition data, i.e., with the same
checksum, we currently choose to stream from any of them randomly.
To improve streaming performance, select the peer within the same DC.
This patch is supposed to improve repair perforamnce with multiple DC.

Message-Id: <c6a345b6e8ed2b59f485e53c865241e463b44507.1498490831.git.asias@scylladb.com>
2017-06-26 18:34:21 +03:00
Avi Kivity
1170f56447 Merge "Speed up gossip dissemination in large cluster" from Asias
Fixes #2528.

* tag 'asias/gossip_talk_to_more_nodes/v3' of github.com:cloudius-systems/seastar-dev:
  gossip: Use vector for _live_endpoints
  gossip: Talk to more live nodes in each gossip round
2017-06-26 17:59:43 +03:00
Asias He
e31d4a3940 gossip: Use vector for _live_endpoints
To speed up the random access in get_random_node. Switch to use vector
instead of set.
2017-06-26 22:49:59 +08:00
Asias He
437899909d gossip: Talk to more live nodes in each gossip round
In large clusters with multiple DC deployment, it is observed that it
takes long delay for gossip update to disseminate in the cluster.

To speed up, talk to more live nodes in each gossip round.

Fixes #2528
2017-06-26 22:49:59 +08:00
Nadav Har'El
6cf44f6817 Optimize column_family::make_sstable_reader() for one partition
This patch does the same thing to column_family::make_sstable_reader() as
commit 186f031 did to sstable::as_mutation_source().

Although usually one can fast_forward_to() on the result of a
column_family::make_sstable_reader(), earlier we had an optimization
where if a single partition was specified, it was read exactly,
and fast_forward_to() was *NOT* allowed.

With the mutation_reader::forwarding flag patch, when this flag
was on - requesting fast_forward_to() - we disabled this optimization.
This makes sense, but is not backward compatible with the code which
previously assumes this optimization exists. In particular,
column_family::data_query() does a single partition read but does not
specify forwarding::no explicitly.
So this patch returns this optimization, despite this meaning that we
blatently ignore the fwd_mr flag in that case.

Fixes #2524.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170626141121.30322-1-nyh@scylladb.com>
2017-06-26 17:13:03 +03:00
Avi Kivity
9b21a9bfb6 Merge "Implement partial cache" from Tomasz and Piotr
"This series enables cache to keep partial partitions.
Reads no longer have to read whole partition from sstables
in order to cache the result.

The 10MB threshold for partition size in cache is lifted.

Known issues:

 - There is no partial eviction yet, whole partitions are still evicted,
   and partition snapshots held by active reads are not evictable at all
 - Information about range continuity is not recorded if that
   would require inserting a dummy entry, or if previous entry
   doesn't belong to the latest snapshot
 - Cache update after memtable flush happening concurrently with reads
   may inhibit that reads' ability to populate cache (new issue)
 - Cache update from flushed memtables has partition granularity,
   so may cause latency problems with large partition
 - Schema is still tracked per-partition, so after schema changes
   reads may induce high latency due to whole partition needing
   to be converted atomically
 - Range tombstones are repeated in the stream for every range between
   cache entries they cover (new issue)
 - Populating scans for both small and large partitions (perf_fast_forward)
   experienced a 40% reduction of throughput, CPU bound

How was this tested:

 - test.py --mode release
 - row_cache_stress_test -c1 -m1G
 - perf_fast_forward, passes except for the test case checking range continuity population
   which would require inserting a dummy entry (mentioned above)
 - perf_simple_query (-c1 -m1G --duration 32):
     before: 90k [ops/s] stdev: 4k [ops/s]
     after:  94k [ops/s] stdev: 2k [ops/s]"

* tag 'tgrabiec/introduce-partial-cache-v8' of github.com:cloudius-systems/seastar-dev: (130 commits)
  tests: row_cache: Add test_tombstone_merging_in_partial_partition test case
  tests: Introduce row_cache_stress_test
  utils: Add helpers for dealing with nonwrapping_range<int>
  tests: simple_schema: Allow passing the tombstone to make_range_tombstone()
  tests: simple_schema: Accept value by reference
  tests: simple_schema: Make add_row() accept optional timestamp
  tests: simple_schema: Make new_timestamp() public
  tests: simple_schema: Introduce make_ckeys()
  tests: simple_schema: Introduce get_value(const clustered_row&) helper
  tests: simple_schema: Fix comment
  tests: simple_schema: Add missing include
  row_cache: Introduce evict()
  tests: Add cache_streamed_mutation_test
  tests: mutation_assertions: Allow expecting fragments
  mutation_fragment: Implement equality check
  tests: row_cache: Add test for population of random partitions
  tests: row_cache: Add test for partition tombstone population
  tests: row_cache: Test reading randomly populated partition
  tests: row_cache: Add test_single_partition_update()
  tests: row_cache: Add test_scan_with_partial_partitions
  ...
2017-06-26 14:54:37 +03:00
Avi Kivity
555621b537 Disentable memtables from sstables
Remove sstable::write_components(memtable), replacing it with a helper.

Fixes #2354
Message-Id: <20170624142639.16662-1-avi@scylladb.com>
2017-06-26 09:37:11 +02:00
Avi Kivity
236a8370e4 Remove use of std::random_shuffle()
It was removed in C++17. Replace with std::shuffle().
Message-Id: <20170626063809.7563-1-avi@scylladb.com>
2017-06-26 09:36:38 +02:00
Avi Kivity
c4ae2206c7 messaging: respect inter_dc_tcp_nodelay configuration parameter
We respect it partially (client side only) for now.

Fixes #6.
Message-Id: <20170623172048.23103-1-avi@scylladb.com>
2017-06-24 21:49:27 +02:00
Duarte Nunes
2dfd7040eb CMakeLists.txt: Add boost support
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170623172236.15507-1-duarte@scylladb.com>
2017-06-24 21:49:27 +02:00
Avi Kivity
801b5220d6 Merge seastar upstream
* seastar 9e2b7ec...0ab7ae5 (4):
  > Update fmt submodule
  > rpc: add options to control tcp_nodelay
  > core: Fix compilation for older versions of Boost
  > tests/lowres_clock_test: Fix compilation issues
2017-06-24 20:47:52 +03:00
Tomasz Grabiec
b0bcf2be53 tests: row_cache: Add test_tombstone_merging_in_partial_partition test case 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
23c6f517cb tests: Introduce row_cache_stress_test
Runs readers, updates and eviction concurrently and verifies the
following property of reads:

  - reads see all past writes

  - reads see no partial writes within a single partition
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
4b4aef789e utils: Add helpers for dealing with nonwrapping_range<int> 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
5c9f87fb27 tests: simple_schema: Allow passing the tombstone to make_range_tombstone() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
edf4a3494c tests: simple_schema: Accept value by reference 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
5f70df472f tests: simple_schema: Make add_row() accept optional timestamp 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
53867c4328 tests: simple_schema: Make new_timestamp() public 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
51b5814ec2 tests: simple_schema: Introduce make_ckeys() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
074c67fe4d tests: simple_schema: Introduce get_value(const clustered_row&) helper 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
8ffc776e06 tests: simple_schema: Fix comment 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
ecacd2e84a tests: simple_schema: Add missing include 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
b56232b216 row_cache: Introduce evict() 2017-06-24 18:06:11 +02:00
Piotr Jastrzebski
c4e8effffa tests: Add cache_streamed_mutation_test
[tgrabiec:
  - extracted from a larger commit
  - removed coupling with how cache_streamed_mutation is created (the
    code went out of sync), used more stable make_reader(). it's simpler too.
  - replaced false/true literals with is_continuous/is_dummy where appropraite
  - dropped tests for cache::underlying (class is gone)
  - reused streamed_mutation_assertions, it has better error messages
  - fixed the tests to not create tombstones with missing timestamps
  - relaxed range tombstone assertions to only check information relevant for the query range
  - print cache on failure for improved debuggability
]
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
44fdee3f2e tests: mutation_assertions: Allow expecting fragments 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
1f23130b07 mutation_fragment: Implement equality check 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
116bcb8b30 tests: row_cache: Add test for population of random partitions 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
930a1415fe tests: row_cache: Add test for partition tombstone population 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
9bfece6f82 tests: row_cache: Test reading randomly populated partition 2017-06-24 18:06:11 +02:00
Piotr Jastrzebski
0358334579 tests: row_cache: Add test_single_partition_update()
[tgrabiec: Extracted from "row_cache: Introduce cache_streamed_mutation"]
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
8bb76e2f12 tests: row_cache: Add test_scan_with_partial_partitions 2017-06-24 18:06:11 +02:00
Piotr Jastrzebski
896bf2e5de Remove unused methods from MVCC
Some apply methods where replaced by apply_to_incomplete().

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
6f6575f456 row_cache: Enable partial partition population 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
5a0ae55f6d Introduce schema_upgrader 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
1828e28bbb database: Invalidate cache atomically with attaching streaming sstables
Not doing so may cause reads to see partial writes, if another
update+read happens in between.
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
896196b841 database: Invalidate cache from seal_active_streaming_memtable_immediate()
Cache must be synchronized atomically with changing the underlying
mutation source, otherwise write atomicity may not hold.
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
7ae40d7045 tests: Add test for update_invalidating() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
e792220c3a row_cache: Introduce update_invalidating() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
c29878f49f row_cache: Extract memtable walking logic from update() into do_update()
So that it can be reused in update_invalidating().
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
6ebfb730ee partition_entry: Introduce partition_tombstone() getter 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
fb62dfab02 tests: mvcc: Introduce test_schema_upgrade_preserves_continuity 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
164989a574 tests: mvcc: Add test for partition_entry::apply_to_incomplete() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
e433e68610 partition_entry: Make squashed() and upgrade() work with not fully continuous versions
Those methods first create a neutral mutation_partition, and left-fold
it with the versions. The problem is that there is no neutral element
for static row continuity, the flag from the first addend always
wins. We have to copy the flag from the first version to preserve
the logical value.
2017-06-24 18:06:11 +02:00