13209 Commits

Author SHA1 Message Date
Botond Dénes
3280fbc4d4 Add restricted_reader_test unit test 2017-10-03 12:44:17 +03:00
Botond Dénes
47e07b787e restricted_mutation_reader: restrict based-on memory consumption
Restrict readers based on their memory consumption, instead of the count
of the top-level readers. To do this an interposer is installed at the
input_stream level which tracks buffers emitted by the stream. This way
we can have an accurate picture of the readers' actual memory
consumption.
New readers will consume 16k units from the semaphore up-front. This is
to account for their own memory consumption, apart from the buffers they
will allocate. Creating the reader will be deferred to when there are
enough resources to create it. As before only new readers will be
blocked on an exhausted semaphore, existing readers can continue to
work.
2017-10-03 12:44:12 +03:00
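The admission rule described in commit 47e07b787e can be illustrated with a minimal, single-threaded sketch. It is not the actual Scylla implementation, which uses a Seastar semaphore and an interposer at the input_stream level; the class name reader_memory_budget and its member functions are hypothetical and exist only for this illustration. The sketch shows the two properties stated in the message: a new reader needs 16k units up-front and is refused while the budget is too low, and already admitted readers are never blocked when they account for buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-in for the restricting semaphore.
// Units are bytes. The counter may go negative because readers that
// were already admitted are never blocked when they allocate buffers.
class reader_memory_budget {
    std::int64_t _available;
public:
    // Up-front cost of a new reader, as stated in the commit message.
    static constexpr std::int64_t new_reader_cost = 16 * 1024;

    explicit reader_memory_budget(std::int64_t total_bytes)
        : _available(total_bytes) {
        assert(total_bytes >= 0);
    }

    // Admission check for a new reader. Returns false when the reader
    // would have to wait for resources (the real code defers creation).
    bool try_admit_new_reader() {
        if (_available < new_reader_cost) {
            return false;
        }
        _available -= new_reader_cost;
        return true;
    }

    // Called for every buffer an admitted reader keeps around.
    void on_buffer_allocated(std::size_t bytes) {
        _available -= static_cast<std::int64_t>(bytes);
    }

    // Called when such a buffer is destroyed.
    void on_buffer_released(std::size_t bytes) {
        _available += static_cast<std::int64_t>(bytes);
    }

    // Returns the up-front units of a reader that has finished.
    void on_reader_destroyed() {
        _available += new_reader_cost;
    }

    std::int64_t available() const { return _available; }
};
```

With a 32k budget, two readers are admitted and a third is refused until one of the first two is destroyed, while buffer allocations by the admitted readers always go through.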
Botond Dénes
0a07e9e7c7 mutation_reader.hh: Move restricted_reader related code
In preparation for make_restricted_reader taking a mutation_source as
its argument.
2017-10-03 12:39:22 +03:00
Avi Kivity
78eae8bf48 Revert "Merge "Make restricting_mutation_reader more accurate" from Botond"
This reverts commit c6e5dcc556, reversing
changes made to 19b21a0ab2. Fails to build,
plus author has more changes.
2017-10-03 11:58:59 +03:00
Pekka Enberg
641f28da02 cql3/statements: Clean up select_statement class definition
We have some historical #ifdef'd code that really ought to be removed by now...

Message-Id: <1507015932-8165-1-git-send-email-penberg@scylladb.com>
2017-10-03 11:17:32 +03:00
Avi Kivity
c6e5dcc556 Merge "Make restricting_mutation_reader more accurate" from Botond
"Currently restricting_mutation_reader restricts mutation_readers on a
count basis. This is inaccurate on multiple levels. The reader might be
a combined_mutation_reader, which might be composed of multiple
individual readers, whose number might change during the lifetime of the
reader. The memory consumption of the readers can vary and may change
during the lifetime of the reader as well.
To remedy this, make the restriction memory-consumption based. The
restricting semaphore is now configured with the amount of memory
(bytes) that its readers are allowed to consume in total. New readers
consume 128k units up-front to account for read-ahead buffers, and then
consume additional units for any buffer (returned
from input_stream<>::read()) they keep around.
Like before, readers already allowed to read will not be blocked,
instead new readers will be blocked on their first read if all the units
are consumed."

Fixes #2692.

* 'bdenes/restricting_mutation_reader-v4' of https://github.com/denesb/scylla:
  Update reader restriction related metrics
  Add restricted_reader_test unit test
  restricted_mutation_reader: restrict based-on memory consumption
  mutation_reader.hh: Move restricted_reader related code
2017-10-03 11:15:34 +03:00
Daniel Fiala
19b21a0ab2 types: Allow 'T' as a date-time separator in timestamps.
* Letter 'T' is specified in ISO 8601 and also in Cassandra
  documentation.

Signed-off-by: Daniel Fiala <daniel@scylladb.com>
Message-Id: <20171003073558.19257-1-daniel@scylladb.com>
2017-10-03 11:10:11 +03:00
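The effect of commit 19b21a0ab2 can be sketched as follows: a timestamp literal is split into its date and time parts at either a space or the ISO 8601 letter 'T'. This is only an illustration under the assumption that the date part itself contains neither character; the function name split_date_time is hypothetical and is not taken from the Scylla sources.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: splits "2017-10-03T11:10:11" or
// "2017-10-03 11:10:11" into {"2017-10-03", "11:10:11"}.
// If no separator is present, the time part is empty.
std::pair<std::string, std::string> split_date_time(const std::string& ts) {
    // Accept both the space and the ISO 8601 'T' as the separator.
    const std::string::size_type pos = ts.find_first_of(" T");
    if (pos == std::string::npos) {
        return std::make_pair(ts, std::string());
    }
    return std::make_pair(ts.substr(0, pos), ts.substr(pos + 1));
}
```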
Avi Kivity
3cc1c2c387 Merge seastar upstream
* seastar 899fc4e...c62bbf9 (6):
  > Merge "CPU Scheduler for seastar" from Avi
  > reactor: set SCHED_FIFO policy for timer thread
  > future: mark future::wait() as noexcept
  > shared_promise: Make get_shared_future() const-qualified
  > Remove pessimizing and redundant std::move()-s reported by Clang-tidy utility
  > Work around GCC 5 bug: scylladb/seastar#338, scylladb/seastar#339
2017-10-02 20:47:32 +03:00
Avi Kivity
dd5ab75d04 range: add missing include
Message-Id: <20171002144608.5032-1-avi@scylladb.com>
2017-10-02 16:49:24 +02:00
Avi Kivity
5ed6d1b176 dist: enable CAP_SYS_NICE
Allow scylla to use SCHED_FIFO for the timer thread for more accurate
scheduling.
Message-Id: <20171001121500.28318-1-avi@scylladb.com>
2017-10-02 16:32:00 +02:00
Avi Kivity
dbce5158a3 Update ami submodule
* dist/ami/files/scylla-ami 5ffa449...be90a3f (1):
  > amazon kernel: enable updates
2017-10-02 17:07:09 +03:00
Piotr Jastrzebski
83fd22face Add test to reproduce #2854
When memtable gets flushed, existing mutation_readers created
for it stop handling fast_forward_to correctly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <f580ac59f3fcec53e7c78ad7a8b6374eb36958c6.1506690042.git.piotr@scylladb.com>
2017-09-29 15:17:53 +02:00
Piotr Jastrzebski
2583207d9d Fix memtable scanning_reader::fast_forward_to
If memtable is flushed then call fast_forward_to on _delegate.
Otherwise call iterator_reader::fast_forward_to.

Fixes #2854

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <6bf1c8bafce845ef945698ce4d722c3c8606e632.1506690042.git.piotr@scylladb.com>
2017-09-29 15:17:39 +02:00
Asias He
c0b965ee56 gossip: Better check for gossip stabilization on startup
This is a backport of Apache CASSANDRA-9401
(2b1e6aba405002ce86d5badf4223de9751bf867d)

To detect gossip stabilization, it is better to check that the number
of nodes in the endpoint_state_map is no longer changing.

Fixes #2853
Message-Id: <e9f901ac9cadf5935c9c473433dd93e9d02cb748.1506666004.git.asias@scylladb.com>
2017-09-29 08:57:25 +02:00
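The stabilization check described in commit c0b965ee56 can be sketched as a small detector that is fed the current size of the endpoint_state_map on every poll and reports "settled" once the size has stayed the same for a number of consecutive polls. The class name gossip_settle_detector and the configurable poll count are assumptions made for this sketch; the actual thresholds and polling intervals are defined in the gossiper code and are not stated in the commit message.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical detector: gossip is considered settled once the number
// of known endpoints has not changed for `required` consecutive polls.
class gossip_settle_detector {
    std::size_t _required;
    std::size_t _last_count = 0;
    std::size_t _stable_polls = 0;
    bool _has_sample = false;
public:
    explicit gossip_settle_detector(std::size_t required)
        : _required(required) {
        assert(required > 0);
    }

    // Feed the current endpoint count; returns true when settled.
    bool observe(std::size_t node_count) {
        if (_has_sample && node_count == _last_count) {
            ++_stable_polls;
        } else {
            // First sample or the count changed: start over.
            _stable_polls = 0;
            _last_count = node_count;
            _has_sample = true;
        }
        return _stable_polls >= _required;
    }
};
```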
Tomasz Grabiec
d75f243a8b Update seastar submodule
Fixes #2770.
Fixes #2819.

* seastar 92fdce2...899fc4e (14):
  > scollectd: increment the metadata iterator with the values
  > Enable Travis CI builds for Seastar.
  > tests: Fix httpd test compilation error caused by unconditionally explicit tuple constructor in GCC5: scylladb/seastar#326
  > core::shared_future: add available() and failed() methods
  > rpc: make sure that _write_buf stream is always properly closed
  > log: Fail on attempt to register logger with the same name twice
  > Merge "Make backtraces useful on ASLR-enabled machines as well" from Botond
  > reactor: add option to bypass fsync
  > future-util: modernize do_until() implementation
  > future-util: fix do_until() API to not have forwarding references
  > input_stream: add rvalue variant of input_stream::consume()
  > logger: remove extra spaces after timestamp
  > tutorial: lifetime management
  > Fix broken link for fsqual failure message
2017-09-28 15:27:34 +02:00
Piotr Jastrzebski
6069bab755 Cache single queries to non-existing partitions
This way we don't need to query sstables again
when the query is repeated.

Fixes #1533

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <8f8559ff19c534dbbb7c9ef6c28271cec607ba20.1506521461.git.piotr@scylladb.com>
2017-09-27 16:15:18 +02:00
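The idea behind commit 6069bab755 is negative caching: the cache remembers that a partition does not exist, so a repeated query for it does not reach the sstables again. The sketch below illustrates this with plain standard containers. The class partition_cache, its read() signature and the read counter are hypothetical; the real row cache is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache in front of a backing store ("sstables").
// An entry with exists == false records a known-missing partition.
class partition_cache {
    struct entry {
        bool exists;
        std::string data;
    };
    std::map<std::string, entry> _entries;
    std::map<std::string, std::string> _sstables;
    std::size_t _sstable_reads = 0;
public:
    explicit partition_cache(std::map<std::string, std::string> sstables)
        : _sstables(std::move(sstables)) {}

    // Returns true and fills `out` if the partition exists.
    bool read(const std::string& key, std::string& out) {
        auto it = _entries.find(key);
        if (it == _entries.end()) {
            // Cache miss: consult the backing store exactly once and
            // remember the outcome, including "not found".
            ++_sstable_reads;
            auto src = _sstables.find(key);
            entry e = (src == _sstables.end())
                ? entry{false, std::string()}
                : entry{true, src->second};
            it = _entries.emplace(key, std::move(e)).first;
        }
        if (!it->second.exists) {
            return false;
        }
        out = it->second.data;
        return true;
    }

    std::size_t sstable_reads() const { return _sstable_reads; }
};
```

Reading a missing key twice increases the sstable read counter only once, which is the behavior the commit message describes.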
Tomasz Grabiec
b704710954 migration_manager: Make sure schema pulls eventually happen when schema_tables_v3 is enabled
We don't pull schema during rolling upgrade, that is until
schema_tables_v3 feature is enabled on all nodes.

Because features are enabled from gossiper timer, there is a race
between feature enablement and processing of endpoint states which may
trigger schema pull.  It can happen that we first try to pull, but
only later enable the feature. In that case the schema pull will not
happen until the next schema change.

The fix is to ensure that pulls abandoned due to feature not being enabled
will be retried when it is enabled.

Fixes sporadic failure in dtest:

  repair_additional_test.py:RepairAdditionalTest.repair_schema_test
Message-Id: <1506428715-8182-2-git-send-email-tgrabiec@scylladb.com>
2017-09-27 12:00:07 +01:00
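The fix in commit b704710954 follows a "defer and retry" pattern: a pull requested before the feature is enabled is remembered instead of dropped, and is replayed once the feature becomes enabled. A minimal sketch of that pattern is shown below. The class schema_puller and its member functions are hypothetical names for this illustration; the real code lives in migration_manager and waits on the gossiper feature asynchronously.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the race fix: pulls requested while the
// feature is disabled are deferred and retried on enablement.
class schema_puller {
    bool _feature_enabled = false;
    std::set<std::string> _deferred;   // endpoints whose pull was abandoned
    std::vector<std::string> _pulled;  // endpoints actually pulled from
public:
    void maybe_pull(const std::string& endpoint) {
        if (!_feature_enabled) {
            // Before the fix this request would simply be lost until
            // the next schema change.
            _deferred.insert(endpoint);
            return;
        }
        _pulled.push_back(endpoint);
    }

    void on_feature_enabled() {
        _feature_enabled = true;
        for (const auto& endpoint : _deferred) {
            _pulled.push_back(endpoint);
        }
        _deferred.clear();
    }

    const std::vector<std::string>& pulled() const { return _pulled; }
};
```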
Tomasz Grabiec
7a58fb5767 gossiper: Allow waiting for feature to be enabled
Message-Id: <1506428715-8182-1-git-send-email-tgrabiec@scylladb.com>
2017-09-27 11:57:06 +01:00
Raphael S. Carvalho
63eb9f61c0 db: use correct dirty memory manager for system column families
Dirty memory manager for non-system column families was being used
when applying mutations to system cfs.
That previously led to a deadlock when updating history. Basically,
write disable waits on compaction, and compaction waits on a write
that would release dirty memory for updating compaction history.

Using the correct dirty memory manager alone wouldn't solve this
problem if writes are disabled for the system cf, but together with
the previous change, which updates history outside the sstable lock,
the problem is completely solved.

Refs #2769.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170918215238.9810-3-raphaelsc@scylladb.com>
2017-09-26 19:51:31 +02:00
Raphael S. Carvalho
e34c1db642 db: update compaction history outside the sstable write lock
This is done because compaction can deadlock if refresh disables
writes, which waits for compaction, and compaction in turn waits for
dirty memory[1] that would be released by a memtable write.

Dirty memory manager for non-system cfs was being used for system cfs,
which was useful for exposing this problem.

[1]: when updating compaction history.

Fixes #2769.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170918215238.9810-2-raphaelsc@scylladb.com>
2017-09-26 19:51:12 +02:00
Asias He
4b1034b9cd storage_service: Remove the stream_hints
Our hinted handoff implementation will not use the
db::system_keyspace::HINTS system table to store hints.
No need to stream them.

Acked-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <3b9190e250b54321ceb87767f4722c7458d41797.1506391500.git.asias@scylladb.com>
2017-09-26 19:05:21 +03:00
Paweł Dziepak
af1976bc30 Merge "Fix cache reader skipping rows in some cases" from Tomasz
"Fixes the problem of concurrent populations of clustering row ranges
leading to some readers skipping over some of the rows.
Spotted during code review.

Fixes #2834."

* tag 'tgrabiec/fix-cache-reader-skipping-rows-v2' of github.com:scylladb/seastar-dev:
  tests: mvcc: Add test for partition_snapshot_row_cursor
  tests: row_cache: Add test for concurrent population
  tests: row_cache: Make populate_range() accept partition_range
  tests: Add simple_schema::make_ckey_range()
  cache_streamed_mutation: Add missing _next_row.maybe_refresh() call
  mvcc: partition_snapshot_row_cursor: Fix cursor skipping over rows added after its position
  mvcc: partition_snapshot_row_cursor: Rename up_to_date() to iterators_valid()
  mvcc: Keep track of all iterators in partition_snapshot_row_cursor
  mvcc: Make partition_snapshot_row_cursor printable
2017-09-26 15:09:58 +01:00
Tomasz Grabiec
3eb251e3a4 tests: perf_fast_forward: Fail if ran with more than one shard
The test reads only from the local shard; if run with more shards,
the current shard will miss some of the data.

Message-Id: <1506081609-12811-1-git-send-email-tgrabiec@scylladb.com>
2017-09-26 15:23:10 +03:00
Calle Wilund
dd2b8821a4 everywhere_strategy: Make get_natural_endpoints handle non-init state
Make get_natural_endpoints return the local address iff token metadata
is not yet set up (since that is the one address we already know of).

If a request has a consistency level requiring more endpoints, it
will still fail, but calls with, for example, CL=ONE will succeed at
startup, acting more or less like the local strategy, while data is
still distributed as desired further down the line.

Acked-by: Gleb Natapov <gleb@scylladb.com>
Message-Id: <20170926113512.15707-1-calle@scylladb.com>
2017-09-26 15:21:30 +03:00
Asias He
98e9049820 gossip: Print SCHEMA_TABLES_VERSION correctly
Found this when debugging gossip with debug print. The application state
SCHEMA_TABLES_VERSION was printed as UNKNOWN.
Message-Id: <d7616920d2e6516b5470a758bcf9c88f3d857381.1506391495.git.asias@scylladb.com>
2017-09-26 08:38:28 +02:00
Tomasz Grabiec
e5e9886014 tests: mvcc: Add test for partition_snapshot_row_cursor 2017-09-25 11:21:58 +02:00
Tomasz Grabiec
e4adc9c600 tests: row_cache: Add test for concurrent population 2017-09-25 11:21:58 +02:00
Tomasz Grabiec
a3fb7ce660 tests: row_cache: Make populate_range() accept partition_range 2017-09-25 11:21:58 +02:00
Tomasz Grabiec
dd7af02251 tests: Add simple_schema::make_ckey_range() 2017-09-25 11:21:58 +02:00
Tomasz Grabiec
e83cd508f6 cache_streamed_mutation: Add missing _next_row.maybe_refresh() call
We were checking if the cursor is up_to_date(), but this is not enough
to guarantee that the cursor is valid, merely that its iterators are
valid. The cursor may be invalidated even if its iterators are valid
if there was an insertion after cursor's position.

Fixes #2834.
2017-09-25 11:21:58 +02:00
Tomasz Grabiec
2f8d91043d mvcc: partition_snapshot_row_cursor: Fix cursor skipping over rows added after its position
The cursor maintains a heap of iterators in all versions. If rows were
inserted before the latest version's iterator, cursor would not see
them. Fix by redoing the lookup for iterators not in the current row
in maybe_refresh().

Refs #2834.
2017-09-25 11:21:58 +02:00
Tomasz Grabiec
09d99b0358 mvcc: partition_snapshot_row_cursor: Rename up_to_date() to iterators_valid() 2017-09-25 11:21:58 +02:00
Tomasz Grabiec
4ee11641c0 mvcc: Keep track of all iterators in partition_snapshot_row_cursor
Will be needed when updating the iterator for latest version. Before
this change, such iterator could be neither in _current_row nor in
_heap.

Besides that, this will allow the user to always access the iterator of
the latest version, which enables future optimizations that avoid
unnecessary lookups. get_iterator_in_latest_version() is now
always valid.
2017-09-25 11:21:58 +02:00
Tomasz Grabiec
a8cbd34dde mvcc: Make partition_snapshot_row_cursor printable 2017-09-25 11:21:58 +02:00
Tomasz Grabiec
8e46d15f91 storage_service: Register features before joining
Since commit 8378fe190, we disable schema sync in a mixed cluster.
The detection is done using gossiper features. We need to make sure
the features are registered, and thus can be enabled, before the
bootstrapping of a non-seed node happens. Otherwise the bootstrap will
hang waiting on schema sync which will not happen.
Message-Id: <1505893837-27876-2-git-send-email-tgrabiec@scylladb.com>
2017-09-25 09:13:02 +01:00
Tomasz Grabiec
b92dcb0284 storage_service: Extract register_features()
Message-Id: <1505893837-27876-1-git-send-email-tgrabiec@scylladb.com>
2017-09-25 09:12:46 +01:00
Tomasz Grabiec
d11d696072 tests: mutation_source_tests: Fix use-after-scope on partition range
Message-Id: <1506096881-3076-1-git-send-email-tgrabiec@scylladb.com>
2017-09-22 19:13:47 +02:00
Botond Dénes
015ac042a8 combined_mutation_reader_test: remove unneeded includes
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <a388efa6fc93049f4d69c049764cc9225a04bce4.1506098363.git.bdenes@scylladb.com>
2017-09-22 18:45:04 +02:00
Botond Dénes
a7984a9908 combined_mutation_reader_test: remove leftover debug logging
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <96e61fcd2543ec84921f1b2188d7248e55e7efe0.1506097635.git.bdenes@scylladb.com>
2017-09-22 18:44:47 +02:00
Tomasz Grabiec
5def901a92 sstables: Don't register logger with the same name twice
There can be only one logger with a given name. This was causing
--logger-log-level sstable=trace to not work for the majority of log
points.

Message-Id: <1505902259-4561-1-git-send-email-tgrabiec@scylladb.com>
2017-09-20 16:40:06 +03:00
Tomasz Grabiec
02d41864af Merge "Fix miss opportunity to update gossiper features" from Asias
The gossiper checks if features should be enabled from its timer
callback when it detects that endpoint_state_map changed, that is,
when it differs from shadow_endpoint_state_map.

shadow_endpoint_state_map is also assigned from endpoint_state_map in
storage_service::replicate_tm_and_ep_map(), called from
storage_service::on_change()

Call gossiper::maybe_enable_features() in replicate_tm_and_ep_map so
that we won't miss a gossip feature update.

Fixes #2824

* git@github.com:scylladb/seastar-dev asias/gossip_miss_feature_update_v1:
  gossip: Move the _features_condvar signal code to
    maybe_enable_features
  gossip: Make maybe_enable_features public
  storage_service: Check gossip feature update in
    replicate_tm_and_ep_map
2017-09-20 11:16:37 +02:00
Asias He
ebc3bada12 storage_service: Check gossip feature update in replicate_tm_and_ep_map
This is another place we can update endpoint_state_map in addition to
gossiper::run().

Call gossiper::maybe_enable_features() so that we won't miss a gossip
feature update.
2017-09-20 16:58:33 +08:00
Asias He
6022b7423a gossip: Make maybe_enable_features public
It will be needed by storage_service.
2017-09-20 16:58:33 +08:00
Asias He
68c7a391b5 gossip: Move the _features_condvar signal code to maybe_enable_features
This makes it easier to call the feature update logic from outside the gossiper.
2017-09-20 16:58:32 +08:00
Asias He
173cba67ba storage_service: Remove rpc client on all shards in on_dead
We should close connections to nodes that are down on all shards instead
of the shard which runs the on_dead gossip callback.

Found by Gleb.
Message-Id: <527a14105a07218066e9f1da943693d9de6993e5.1505894260.git.asias@scylladb.com>
2017-09-20 10:23:31 +02:00
Botond Dénes
43dba8f173 Update reader restriction related metrics
Update description of existing reader count metrics, add memory
consumption metrics.
2017-09-20 11:16:21 +03:00
Botond Dénes
b2db29dc65 Add restricted_reader_test unit test 2017-09-20 11:15:45 +03:00
Botond Dénes
33e97e7457 restricted_mutation_reader: restrict based-on memory consumption
Restrict readers based on their memory consumption, instead of the count
of the top-level readers. To do this an interposer is installed at the
input_stream level which tracks buffers emitted by the stream. This way
we can have an accurate picture of the readers' actual memory
consumption.
New readers will consume 16k units from the semaphore up-front. This is
to account for their own memory consumption, apart from the buffers they
will allocate. Creating the reader will be deferred to when there are
enough resources to create it. As before only new readers will be
blocked on an exhausted semaphore, existing readers can continue to
work.
2017-09-20 11:14:35 +03:00
Botond Dénes
e4a9e55e0d mutation_reader.hh: Move restricted_reader related code
In preparation for make_restricted_reader taking a mutation_source as
its argument.
2017-09-20 11:12:57 +03:00
Tomasz Grabiec
741ec61269 streaming: Fix streaming not streaming all ranges
It skipped one sub-range in each batch of 10 ranges, and
tried to access the range vector using the end() iterator.

Fixes sporadic failures of
update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_node_1_test.

Message-Id: <1505848902-16734-1-git-send-email-tgrabiec@scylladb.com>
2017-09-20 10:33:59 +03:00
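The class of bug fixed in commit 741ec61269 (dropping an element at every batch boundary and touching end()) can be avoided by computing half-open index ranges and clamping the upper bound to the vector size. The helper below is a generic sketch of such batching, not the streaming code itself; the name split_into_batches is hypothetical, and the batch size of 10 mentioned in the commit message is passed as a parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: splits `ranges` into consecutive batches of at
// most `batch_size` elements. Every element lands in exactly one batch
// and no iterator at or past end() is ever dereferenced.
template <typename T>
std::vector<std::vector<T>> split_into_batches(const std::vector<T>& ranges,
                                               std::size_t batch_size) {
    std::vector<std::vector<T>> batches;
    if (batch_size == 0) {
        return batches;
    }
    for (std::size_t begin = 0; begin < ranges.size(); begin += batch_size) {
        // Half-open interval [begin, end), clamped to the vector size.
        const std::size_t end = std::min(begin + batch_size, ranges.size());
        batches.emplace_back(ranges.begin() + begin, ranges.begin() + end);
    }
    return batches;
}
```

For 25 ranges and a batch size of 10 this yields batches of 10, 10 and 5 elements, with the second batch starting at the element that directly follows the end of the first.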