Commit Graph

15678 Commits

Author SHA1 Message Date
Avi Kivity
95b00aae33 Revert scylla-ami update in "scylla_setup: fix conditional statement of silent mode"
This reverts part of commit 364c2551c8. I mistakenly
changed the scylla-ami submodule in addition to applying the patch. The revert
keeps the intended part of the patch and undoes the scylla-ami change.
2018-06-10 14:53:40 +03:00
Asias He
d23dafa7ac dht: Remove column_families parameter in add_rx_ranges and add_tx_ranges
In 4b1034b (storage_service: Remove the stream_hints), we removed the
only user of the api with the column_families parameter.

std::vector column_families = { db::system_keyspace::HINTS };
streamer->add_tx_ranges(keyspace, std::move(ranges_per_endpoint),
column_families);

We can simplify the code range_streamer a bit by removing it.

Fixes #3476

Tests: dtest update_cluster_layout_tests.py
Message-Id: <c81d79c5e6dbc8dd78c1242837de892e39d6abd2.1528356342.git.asias@scylladb.com>
2018-06-10 14:53:40 +03:00
Avi Kivity
f9d66f88bb transport: advertise the shard serving a connection
It is useful for the client driver to know which shard is serving a
particular connection, so it can only send requests through that connection
which will be served by the same shard, eliminating a hop.

Support that by advertising a "SCYLLA_SHARD" option, with a value
corresponding to the shard number.

Acked-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180606203437.1198-1-avi@scylladb.com>
2018-06-07 10:43:16 +03:00
Avi Kivity
4a90eeb326 Update seastar submodule
* seastar 12cffef...e7275e4 (9):
  > tests: execution_stage_test: capture sg by value
  > Merge "Add in-path parameter suport to the code generation" from Amnon
  > Merge "Add scheduling_group inheritance to execution_stage" from Avi
  > tutorial: explain how to find origin of exception
  > tls: Ensure handshake always drains output before return/throw
  > build: cmake: correct stdc++fs library name once more
  > perftune.py: make sure config file existing before write
  > Update travis-ci integration
  > build: fix compilation issues on cmake. missing stdc++-fs
2018-06-06 19:07:16 +03:00
Avi Kivity
6f23403137 Merge "Virtualize IndexInfo system table" from Duarte
"
The IndexInfo table tracks the secondary indexes that have already
been populated. Since our secondary index implementation is backed by
materialized views, we can virtualize that table so queries are
actually answered by built_views.

Fixes #3483
"

* 'built-indexes-virtual-reader/v2' of github.com:duarten/scylla:
  tests/virtual_reader_test: Add test for built indexes virtual reader
  db/system_keysace: Add virtual reader for IndexInfo table
  db/system_keyspace: Explain that table_name is the keyspace in IndexInfo
  index/secondary_index_manager: Expose index_table_name()
  db/legacy_schema_migrator: Don't migrate indexes
2018-06-06 17:35:51 +03:00
Piotr Sarna
0818eb42ae cql3: remove additional IN relation check
Commit 80fc1b1408 introduced additional checks to ensure
that IN relation in WHERE clause can only occur on last restricted
column. This check is not present in current Cassandra code,
the restriction isn't mentioned anywhere in 'IN relation' documentation
and removing it fixes issue 2865.
Running cql_tests dtest suite doesn't show any regression after removing
this check.

Also at: https://github.com/psarna/scylla/tree/remove_additional_in_relation_check

Tests: dtest (cql_tests), unit (release)

Fixes #2865

Message-Id: <aa8c0b33618dd58cd153e83589ac016bc63f4343.1528288388.git.sarna@scylladb.com>
2018-06-06 16:01:54 +03:00
Tomasz Grabiec
9975135110 row_cache: Make sure reader makes forward progress after each fill_buffer()
If reader's buffer is small enough, or preemption happens often
enough, fill_buffer() may not make enough progress to advance
_lower_bound. If also iteartors are constantly invalidated across
fill_buffer() calls, the reader will not be able to make progress.

See row_cache_test.cc::test_reading_progress_with_small_buffer_and_invalidation()
for an examplary scenario.

Also reproduced in debug-mode row_cache_test.cc::test_concurrent_reads_and_eviction

Message-Id: <1528283957-16696-1-git-send-email-tgrabiec@scylladb.com>
2018-06-06 16:01:52 +03:00
Vlad Zolotarov
12e3e4fb2a service::client_state::has_access(): make readable_system_resources an std::unordered_set
There is not reason to use an std::set for it since we don't care about
the ordering - only about the existance of a particular entry.
Hash table will be more efficient for this use case.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1528220892-5784-2-git-send-email-vladz@scylladb.com>
2018-06-06 15:29:29 +03:00
Duarte Nunes
833d34e88a Merge 'Make rows in a secondary index ordered by token' from Piotr
"
As in #3423, ensuring token order on secondary index queries can be done
by adding an additional column to views that back secondary indexes.
This column is a first clustering column and contains token value,
computed on updates.
This series also updates tests and comments refering to issue 3423.

Tests: unit (release, debug)
"

* 'order_by_token_in_si_5' of https://github.com/psarna/scylla:
  cql3: update token order comments
  index, tests: add token column to secondary index schema
  view: add handling of a token column for secondary indexes
  view: add is_index method
2018-06-06 10:07:43 +01:00
Vlad Zolotarov
2dde372ae6 locator::ec2_multi_region_snitch: don't call for ec2_snitch::gossiper_starting()
ec2_snitch::gossiper_starting() calls for the base class (default) method
that sets _gossip_started to TRUE and thereby prevents to following
reconnectable_snitch_helper registration.

Fixes #3454

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1528208520-28046-1-git-send-email-vladz@scylladb.com>
2018-06-06 12:00:17 +03:00
Asias He
6496cdf0fb db: Get rid of the streaming memtable delayed flush
In 455d5a5 (streaming memtables: coalesce incoming writes), we
introduced the delayed flush to coalesce incoming streaming mutations
from different stream_plan.

However, most of the time there will be one stream plan at a time, the
next stream plan won't start until the previous one is finished. So, the
current coalescing does not really work.

The delayed flush adds 2s of dealy for each stream session. If we have lots
of table to stream, we will waste a lot of time.

We stream a keyspace in around 10 stream plans, i.e., 10% of ranges a
time. If we have 5000 tables, even if the tables are almost empty, the
delay will waste 5000 * 10 * 2 = 27 hours.

To stream a keyspace with 4 tables, each table has 1000 rows.

Before:

 [shard 0] stream_session - [Stream #944373d0-5d9c-11e8-9cdb-000000000000] Executing streaming plan for Bootstrap-ks-index-0 with peers={127.0.0.1}, master
 [shard 0] stream_session - [Stream #944373d0-5d9c-11e8-9cdb-000000000000] Streaming plan for Bootstrap-ks-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=1030 KiB, 125.21 KiB/s
 [shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks succeeded, took 8.233 seconds

After:

 [shard 0] stream_session - [Stream #e00bf6a0-5d99-11e8-a7b8-000000000000] Executing streaming plan for Bootstrap-ks-index-0 with peers={127.0.0.1}, master
 [shard 0] stream_session - [Stream #e00bf6a0-5d99-11e8-a7b8-000000000000] Streaming plan for Bootstrap-ks-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=1030 KiB, 4772.32 KiB/s
 [shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks succeeded, took 0.216 seconds

Fixes #3436

Message-Id: <cb2dde263782d2a2915ddfe678c74f9637ffd65b.1526979175.git.asias@scylladb.com>
2018-06-06 10:16:02 +03:00
Piotr Sarna
70ba8c8317 cql3: update token order comments
Comments about token order were outdated with token column patches
and they are now up to date.

Fixes #3423
2018-06-06 09:02:37 +02:00
Piotr Sarna
4a9bf7ed5b index, tests: add token column to secondary index schema
Additional token column is now present in every view schema
that backs a secondary index. This column is always a first part
of the clustering key, so it forces token order on queries.
Column's name is ideally idx_token, but can be postfixed
with a number to ensure its uniqueness.

It also updates tests to make them acknowledge the new token order.

Fixes #3423
2018-06-06 09:02:33 +02:00
Piotr Sarna
d5e7b5507b view: add handling of a token column for secondary indexes
In order to ensure token order on secondary index queries,
first clustering column for each view that backs a secondary index
is going to store a token computed from base's partition keys.
After this commit, if there exists a column that is not present
in base schema, it will be filled with computed token.
2018-06-05 18:59:25 +02:00
Tomasz Grabiec
f775fc2e4c mvcc: Fix partition_entry::open_version()
After 70c72773be it's possible that
open_version() is called with a phase which is smaller than the phase
of the latest version, because latest version belongs to the
in-progress cache update. In such case we must return the existing
non-latest snapshot and not create a new version on top of the
in-progress update. Not doing this violates several invariants, and
may lead to inconsistencies, including violation of write atomicity or
temporary loss of writes.

partition_entry::read() was already adjusted by the aforementioned
commit. Do a similar adjustement for open_version().

Fixes sporadic failures of row_cache_test.cc::test_concurrent_reads_and_eviction
Message-Id: <1528211847-22825-1-git-send-email-tgrabiec@scylladb.com>
2018-06-05 18:22:38 +03:00
Takuya ASADA
60844ae67b dist/common/scripts/scylla_coredump_setup: don't run sysctl on Ubuntu 18.04
Since 99-scylla.conf is not included on Ubuntu 18.04, skip running it.

Fixes #3494

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180605093619.9197-1-syuu@scylladb.com>
2018-06-05 12:47:46 +03:00
Takuya ASADA
222b8588ee dist/common/systemd/scylla-server.service.in: add local-fs.target as dependency
We mistakenly only added network-online.target is doens't promises to
wait /var/lib/scylla mount.
To do this we need local-fs.target.

Fixes #3441

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180521083349.8970-1-syuu@scylladb.com>
2018-06-05 12:26:21 +03:00
Piotr Sarna
06eee0f525 view: add is_index method
is_index method returns true if view that owns it
is backing a secondary index.
2018-06-05 11:10:24 +02:00
Piotr Sarna
6130a00597 dist: add scylla/hints directory to scripts
/var/lib/scylla/hints directory was missing from dist-specific
scripts, which may cause package installations to fail.
Package building scripts and descriptions are updated/

Fixes #3495

Message-Id: <0f5596cb49500416820ece023b7f76a4e2427799.1528184949.git.sarna@scylladb.com>
2018-06-05 11:33:29 +03:00
Avi Kivity
4aaf7bbc1d Merge "Add test for compression" from Piotr
"
It turns out that compression just works for SSTables 3.x.
Thanks to the previous work done on the write path.
This series cleans up tests a bit and introduces test for compression
on the read path.
"

* 'haaawk/sstables3/read-compression-v1' of ssh://github.com/scylladb/seastar-dev:
  Add test for compression in sstables 3.x
  Extract test_partition_key_with_values_of_different_types_read
  sstable_3_x_test: use SEASTAR_THREAD_TEST_CASE
  Drop UNCOMPRESSD_ when code will be used for compressed too
2018-06-04 20:33:50 +03:00
Piotr Jastrzebski
25a7f03f7f Add test for compression in sstables 3.x
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-06-04 18:41:10 +02:00
Piotr Jastrzebski
be9c7391aa Extract test_partition_key_with_values_of_different_types_read
It will be used also for testing compression.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-06-04 18:41:10 +02:00
Piotr Jastrzebski
1f324b7fc8 sstable_3_x_test: use SEASTAR_THREAD_TEST_CASE
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-06-04 18:40:52 +02:00
Piotr Jastrzebski
3e3ccdb323 Drop UNCOMPRESSD_ when code will be used for compressed too
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-06-04 18:29:02 +02:00
Avi Kivity
6d6c355dc0 Merge "augment system.local with sharding information" from Glauber
"
This patch adds nr_shards, msb_ignore, and the actual sharding algorithm to the
system.local table. Drivers and other tools can then make use of this
information to talk to scylla in an optimal way
"

* 'system_tables-v3' of github.com:glommer/scylla:
  system_keyspace: add sharding information to local table
  partitioner: export the name of the algorithm used to do intra-node sharding
2018-06-04 18:50:28 +03:00
Glauber Costa
bdce561ada system_keyspace: add sharding information to local table
We would like the clients to be able to route work directly to the right
shards. To do that, they need to know the sharding algorithm and its
parameters.

The algorithm can be copied into the client, but the parameters need to
be exported somewhere. Let's use the local table for that.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
---
v2: force msb to zero on non-murmur
2018-06-04 11:25:58 -04:00
Glauber Costa
250d9332dc partitioner: export the name of the algorithm used to do intra-node sharding
We will export this on system tables. To avoid hard-coding it in the system
table level, keep it at least in the dht layer where it belongs.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-06-04 11:25:58 -04:00
Takuya ASADA
ad4ca1e166 dist: simplified build script templates
Currently, build_deb.sh looks very complicated because each of distribution
requires different parameter, and we are applying them by sed command one-by-one.

This patch will replace them by Mustache, it's simple and easy syntax
template language.
Both .rpm distributions and .deb distributions have pystache (a Python
implimentation of Mustache), we will use it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180604104026.22765-1-syuu@scylladb.com>
2018-06-04 14:38:52 +03:00
Paweł Dziepak
24764712b6 sstable: fix capture by reference of stack variable in continuation
Message-Id: <20180604102542.21799-1-pdziepak@scylladb.com>
2018-06-04 14:35:49 +03:00
Duarte Nunes
dfa779ebe7 Merge 'Separate hinted handoff manager for materialized views' from Piotr
"
This series introduces a separate hinted handoff manager for materialized views.

Steps:
 * decouple resource limits from hinted handoff, so multiple instances can share space
   and throughput limits in order to avoid internal fragmentation for every instance's
   reservations
 * add a subdirectory to data/, responsible for storing materialized view hints
 * decouple registering global metrics from hinted handoff constructor, now that there
   can be more than one instance - otherwise 'registering metrics twice' errors are going to occur
 * add a hints_for_views_manager to storage proxy and route failed view updates to use it
   instead of the original hints_manager
 * restore previous semantics for enabling/disabling hinted handoff - regular hinted handoff
   can be disabled or enabled just for specific datacenters without influencing materialized
   views flow
"

* 'separate_hh_for_mv_4' of https://github.com/psarna/scylla:
  storage_proxy: restore optional hinted handoff
  storage_proxy: add hints manager for views
  hints: decouple hints manager metrics from constructor
  db, config: add view_pending_updates directory
  hints: move space_watchdog to resource manager
  hints: move send limiter to resource manager
  hints: move constants to resource_manager
2018-06-04 12:03:59 +01:00
Duarte Nunes
01676a2cda tests/virtual_reader_test: Add test for built indexes virtual reader
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-06-04 11:31:29 +01:00
Duarte Nunes
3e39985c7a db/system_keysace: Add virtual reader for IndexInfo table
The IndexInfo table tracks the secondary indexes that have already
been populated. Since our secondary index implementation is backed by
materialized views, we can virtualize that table so queries are
actually answered by built_views.

Fixes #3483

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-06-04 11:14:17 +01:00
Duarte Nunes
65c4205334 db/system_keyspace: Explain that table_name is the keyspace in IndexInfo
This patch adds the same comment that exists in Apache Cassandra,
explaining that the table_name column in the IndexInfo system table
actually refers to the keyspace name. Don't be fooled.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-06-04 11:14:17 +01:00
Duarte Nunes
bc4db67524 index/secondary_index_manager: Expose index_table_name()
Expose secondary_index::index_table_name() so knowledge on how to
built an index name can remain centralized.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-06-04 11:14:17 +01:00
Duarte Nunes
7187963bda db/legacy_schema_migrator: Don't migrate indexes
Previous versions contained no indexes, and Apache Cassandra indexes
cannot be migrated to Scylla.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-06-04 11:14:17 +01:00
Vlad Zolotarov
e759803f48 cql3::authorized_prepared_statements_cache: properly set the expiration timeout
Because authorized_prepared_statements_cache caches the information that comes from
the permissions cache and from the prepared statements cache it should has the entries
expiration period set to the minimum of expiration periods of these caches.

The same goes to the entry refresh period but since prepared statements cache does have a
refresh period authorized_prepared_statements_cache's entries refresh period
is simply equal to the one of the permissions cache.

Fixes #3473

Tests: dtest{release} auth_test.py

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1527789716-6206-1-git-send-email-vladz@scylladb.com>
2018-06-04 10:34:54 +02:00
Piotr Jastrzebski
0b72594c1f data_consume_rows_context_m: Use find_first and find_next
Those methods of boost::dynamic_bitset allow much more
efficient implementation of skip_absent_columns and
move_to_next_column.

Also fix some indentation and variable naming.

Test: unit {release}

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <8a4dea51060c5a02bb774eac43e9eb67d316049a.1528100153.git.piotr@scylladb.com>
2018-06-04 11:18:03 +03:00
Piotr Sarna
f12fdcffdb storage_proxy: restore optional hinted handoff
Since hinted handoff for materialized views is now a separate entity,
regular hinted handoff can go back to being optional.
2018-06-04 09:46:06 +02:00
Piotr Sarna
a6aae369da storage_proxy: add hints manager for views
This commit adds a separate hints manager that serves
only failed materialized view updates.
2018-06-04 09:46:06 +02:00
Piotr Sarna
204bc17bd7 hints: decouple hints manager metrics from constructor
Now that more than one instance of hints manager can be present
at the same time, registering metrics is moved out of the constructor
to prevent 'registering metrics twice' errors.
2018-06-04 09:46:06 +02:00
Piotr Sarna
a791dce0ae db, config: add view_pending_updates directory
Hints for materialized view updates need to be kept somewhere,
because their dedicated hints manager has to have a root directory.
view_pending_updates directory resides in /data and is used
for that purpose.
2018-06-04 09:46:06 +02:00
Piotr Sarna
f345efc79a hints: move space_watchdog to resource manager
Space watchdog is decoupled from hints manager and moved to resource
manager, so it can be shared among different hints manager instances.
2018-06-04 09:46:01 +02:00
Piotr Sarna
ef40f7e628 hints: move send limiter to resource manager
Send limiting semaphore is moved from hints manager to resource manager.
In consequence, hints manager now keeps a reference to its resource
manager.
2018-06-04 09:35:58 +02:00
Piotr Sarna
2315937854 hints: move constants to resource_manager
Constants related to managing resources are moved to newly created
resource_manager class. Later, this class will be used to manage
(potentially shared) resources of hints managers.
2018-06-04 09:35:58 +02:00
Avi Kivity
9b21fbc055 Merge "LCS: enable compaction controller" from Glauber
"

In preparation, we change LCS so that it tries harder to push data
to the last level, where the backlog is supposed to be zero.

The backlog is defined as:

backlog_of_stcs_in_l0 + Sum(L in level) sizeof(L) * (max_level - L) * fan_out

where:
 * the fan_out is the amount of SSTables we usually compact with the
   next level (usually 10).
 * max_levels is the number of levels currently populated
 * sizeof(L) is the total amount of data in a particular level.

Tests: unit (release)
"

* 'lcs-backlog-v2' of github.com:glommer/scylla:
  LCS: implement backlog tracker for compaction controller
  LCS: don't construct property in the body of constructor
  LCS: try harder to move SSTables to highest levels.
  leveled manifest: turn 10 into a constant
  backlog: add level to write progress monitor
2018-06-04 10:29:56 +03:00
Amos Kong
364c2551c8 scylla_setup: fix conditional statement of silent mode
Commit 300af65555 introdued a problem in
conditional statement, script will always abort in silent mode, it doesn't
care about the return value.

Fixes #3485

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <1c12ab04651352964a176368f8ee28f19ae43c68.1528077114.git.amos@scylladb.com>
2018-06-04 10:14:06 +03:00
Glauber Costa
6317bd45d7 LCS: implement backlog tracker for compaction controller
This is the last missing tracker among the major strategies. After
this, only DTCS is left.

To calculate the backlog, we will define the point of zero-backlog
as having all data in the last level. The backlog is then:

Sum(L in levels) sizeof(L) * (max_levels - L) * fan_out,

where:
 * the fan_out is the amount of SSTables we usually compact with the
   next level (usually 10).
 * max_levels is the number of levels currently populated
 * sizeof(L) is the total amount of data in a particular level.

Care is taken for the backlog not to jump when a new level has been just
recently created.

Aside from that, SSTables that accumulate in L0 can be subject to STCS.
We will then add a STCS backlog in those SSTables to represent that.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-06-03 18:14:09 -04:00
Glauber Costa
04546df55c LCS: don't construct property in the body of constructor
Right now we are constructing the _max_sstable_size_in_mb property in
the body of the constructor, which it makes it hard for us to use from
other properties.

We are doing that because we'd like to test for bounds of that value. So
a cleaner way is to have a helper function for that.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-06-03 18:14:09 -04:00
Glauber Costa
28382cb25c LCS: try harder to move SSTables to highest levels.
Our current implementation of LCS can end up with situations in which
just a bit of data is in the highest levels, with the majority in the
lowest levels. That happens because we will only promote things to
highest levels if the amount of data in the current level is higher than
the maximum.

This is a pre-existing problem in itself, but became even clearer when
we started trying to define what is the backlog for LCS.

We have discussed ways to fix this it by redefining the criteria on when
to move data to the next levels. That would require us to change the way
things are today considerably, allowing parallel compactions, etc. There
is significant risk that we'll increase write amplication and we would
need to carefully validate that.

For now I will propose a simpler change, that essentially solves the
"inverted pyramid" problem of current LCS without major disruption:
keep selecting compaction candidates with the same criteria that we do
today, we should help make sure we are not compacting high levels for no
reason; but if there is nothing to do, use the idle time to push data to
higher levels. As an added benefit, old data that is in the higher level
can also be compacted away faster.

With this patch we see that in an idle, post-load system all data is
eventually pushed to the last level. Systems under constant writes keep
behaving the same way they did before.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-06-03 18:12:19 -04:00
Glauber Costa
e64b471e3d leveled manifest: turn 10 into a constant
We increase levels in powers of 10 but that is a parameter
of the algorithm. At least make it into a constant so that we can
reuse it somewhere else.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-06-03 16:55:58 -04:00