This reverts part of commit 364c2551c8. I mistakenly
changed the scylla-ami submodule in addition to applying the patch. The revert
keeps the intended part of the patch and undoes the scylla-ami change.
In 4b1034b (storage_service: Remove the stream_hints), we removed the
only user of the API variant that takes the column_families parameter:
std::vector<sstring> column_families = { db::system_keyspace::HINTS };
streamer->add_tx_ranges(keyspace, std::move(ranges_per_endpoint),
column_families);
We can simplify the range_streamer code a bit by removing it.
Fixes #3476
Tests: dtest update_cluster_layout_tests.py
Message-Id: <c81d79c5e6dbc8dd78c1242837de892e39d6abd2.1528356342.git.asias@scylladb.com>
It is useful for the client driver to know which shard is serving a
particular connection, so it can send through that connection only
requests that will be served by the same shard, eliminating a hop.
Support that by advertising a "SCYLLA_SHARD" option, with a value
corresponding to the shard number.
Acked-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180606203437.1198-1-avi@scylladb.com>
* seastar 12cffef...e7275e4 (9):
> tests: execution_stage_test: capture sg by value
> Merge "Add in-path parameter suport to the code generation" from Amnon
> Merge "Add scheduling_group inheritance to execution_stage" from Avi
> tutorial: explain how to find origin of exception
> tls: Ensure handshake always drains output before return/throw
> build: cmake: correct stdc++fs library name once more
> perftune.py: make sure config file existing before write
> Update travis-ci integration
> build: fix compilation issues on cmake. missing stdc++-fs
"
The IndexInfo table tracks the secondary indexes that have already
been populated. Since our secondary index implementation is backed by
materialized views, we can virtualize that table so queries are
actually answered by built_views.
Fixes #3483
"
* 'built-indexes-virtual-reader/v2' of github.com:duarten/scylla:
tests/virtual_reader_test: Add test for built indexes virtual reader
db/system_keysace: Add virtual reader for IndexInfo table
db/system_keyspace: Explain that table_name is the keyspace in IndexInfo
index/secondary_index_manager: Expose index_table_name()
db/legacy_schema_migrator: Don't migrate indexes
If the reader's buffer is small enough, or preemption happens often
enough, fill_buffer() may not make enough progress to advance
_lower_bound. If, in addition, iterators are constantly invalidated
across fill_buffer() calls, the reader will not be able to make progress.
See row_cache_test.cc::test_reading_progress_with_small_buffer_and_invalidation()
for an example scenario.
Also reproduced in debug-mode row_cache_test.cc::test_concurrent_reads_and_eviction
Message-Id: <1528283957-16696-1-git-send-email-tgrabiec@scylladb.com>
There is no reason to use an std::set for it since we don't care about
the ordering - only about the existence of a particular entry.
A hash table will be more efficient for this use case.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1528220892-5784-2-git-send-email-vladz@scylladb.com>
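A minimal sketch of the data-structure change described above (the set only needs existence checks; the names are illustrative, not the actual patched code):

```cpp
#include <string>
#include <unordered_set>

// Membership is all we need, so a hash set replaces the ordered std::set:
// average O(1) lookup/insert instead of O(log n) with per-node comparisons.
std::unordered_set<std::string> seen_endpoints;

bool already_seen(const std::string& endpoint) {
    return seen_endpoints.count(endpoint) > 0;
}
```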
"
As in #3423, ensuring token order on secondary index queries can be done
by adding an additional column to views that back secondary indexes.
This column is the first clustering column and contains the token value,
computed on updates.
This series also updates tests and comments referring to issue 3423.
Tests: unit (release, debug)
"
* 'order_by_token_in_si_5' of https://github.com/psarna/scylla:
cql3: update token order comments
index, tests: add token column to secondary index schema
view: add handling of a token column for secondary indexes
view: add is_index method
ec2_snitch::gossiper_starting() calls the base class (default) method,
which sets _gossip_started to TRUE and thereby prevents the following
reconnectable_snitch_helper registration.
Fixes #3454
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1528208520-28046-1-git-send-email-vladz@scylladb.com>
In 455d5a5 (streaming memtables: coalesce incoming writes), we
introduced the delayed flush to coalesce incoming streaming mutations
from different stream_plan.
However, most of the time there is only one stream plan at a time; the
next stream plan won't start until the previous one is finished. So the
current coalescing does not really work.
The delayed flush adds 2s of delay for each stream session. If we have
lots of tables to stream, we will waste a lot of time.
We stream a keyspace in around 10 stream plans, i.e., 10% of the ranges
at a time. If we have 5000 tables, even if the tables are almost empty,
the delay will waste 5000 * 10 * 2 = 100,000 seconds, i.e., around 27
hours.
To stream a keyspace with 4 tables, each with 1000 rows:
Before:
[shard 0] stream_session - [Stream #944373d0-5d9c-11e8-9cdb-000000000000] Executing streaming plan for Bootstrap-ks-index-0 with peers={127.0.0.1}, master
[shard 0] stream_session - [Stream #944373d0-5d9c-11e8-9cdb-000000000000] Streaming plan for Bootstrap-ks-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=1030 KiB, 125.21 KiB/s
[shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks succeeded, took 8.233 seconds
After:
[shard 0] stream_session - [Stream #e00bf6a0-5d99-11e8-a7b8-000000000000] Executing streaming plan for Bootstrap-ks-index-0 with peers={127.0.0.1}, master
[shard 0] stream_session - [Stream #e00bf6a0-5d99-11e8-a7b8-000000000000] Streaming plan for Bootstrap-ks-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=1030 KiB, 4772.32 KiB/s
[shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks succeeded, took 0.216 seconds
Fixes #3436
Message-Id: <cb2dde263782d2a2915ddfe678c74f9637ffd65b.1526979175.git.asias@scylladb.com>
An additional token column is now present in every view schema that
backs a secondary index. This column is always the first part of the
clustering key, so it forces token order on queries.
The column's name is ideally idx_token, but it can be suffixed with a
number to ensure its uniqueness.
This series also updates tests to make them acknowledge the new token
order.
Fixes #3423
In order to ensure token order on secondary index queries, the first
clustering column of each view that backs a secondary index is going to
store a token computed from the base table's partition key.
After this commit, if there is a view column that is not present in the
base schema, it will be filled with the computed token.
After 70c72773be it's possible that
open_version() is called with a phase which is smaller than the phase
of the latest version, because the latest version belongs to the
in-progress cache update. In such a case we must return the existing
non-latest snapshot and not create a new version on top of the
in-progress update. Not doing this violates several invariants, and
may lead to inconsistencies, including violation of write atomicity or
temporary loss of writes.
partition_entry::read() was already adjusted by the aforementioned
commit. Do a similar adjustment for open_version().
Fixes sporadic failures of row_cache_test.cc::test_concurrent_reads_and_eviction
Message-Id: <1528211847-22825-1-git-send-email-tgrabiec@scylladb.com>
We mistakenly added only network-online.target, which doesn't promise
to wait for the /var/lib/scylla mount.
To do this we need local-fs.target.
Fixes #3441
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180521083349.8970-1-syuu@scylladb.com>
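The described ordering dependency can be sketched as a systemd unit fragment (a hedged illustration; the actual scylla-server.service contents may differ):

```ini
[Unit]
# local-fs.target guarantees local mounts such as /var/lib/scylla are
# finished; network-online.target alone does not promise that.
After=local-fs.target network-online.target
Wants=network-online.target
```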
"
It turns out that compression just works for SSTables 3.x, thanks to
the previous work done on the write path.
This series cleans up the tests a bit and introduces a test for
compression on the read path.
"
* 'haaawk/sstables3/read-compression-v1' of ssh://github.com/scylladb/seastar-dev:
Add test for compression in sstables 3.x
Extract test_partition_key_with_values_of_different_types_read
sstable_3_x_test: use SEASTAR_THREAD_TEST_CASE
Drop UNCOMPRESSD_ when code will be used for compressed too
"
This patch adds nr_shards, msb_ignore, and the actual sharding algorithm to the
system.local table. Drivers and other tools can then make use of this
information to talk to scylla in an optimal way.
"
* 'system_tables-v3' of github.com:glommer/scylla:
system_keyspace: add sharding information to local table
partitioner: export the name of the algorithm used to do intra-node sharding
We would like the clients to be able to route work directly to the right
shards. To do that, they need to know the sharding algorithm and its
parameters.
The algorithm can be copied into the client, but the parameters need to
be exported somewhere. Let's use the local table for that.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
---
v2: force msb to zero on non-murmur
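The multiply-shift shard-selection scheme these parameters describe can be sketched as follows (a hedged reconstruction; the function name and exact bias handling are assumptions, not the verbatim partitioner code):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of shard selection from a murmur3 token, given the parameters
// exported in system.local (nr_shards, msb_ignore). Assumed scheme, not
// the verbatim Scylla implementation.
uint32_t shard_of(int64_t token, uint32_t nr_shards, uint32_t ignore_msb) {
    // Bias the signed token into an unsigned 64-bit space.
    uint64_t biased = static_cast<uint64_t>(token) + (1ull << 63);
    // Drop the most significant bits the partitioner ignores for sharding.
    biased <<= ignore_msb;
    // Multiply-shift maps the 64-bit space evenly onto [0, nr_shards).
    return static_cast<uint32_t>(
        (static_cast<unsigned __int128>(biased) * nr_shards) >> 64);
}
```

A client that knows these parameters can compute the owning shard for a partition's token and pick the matching connection, which is exactly the routing the series enables.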
We will export this on system tables. To avoid hard-coding it in the system
table level, keep it at least in the dht layer where it belongs.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Currently, build_deb.sh looks very complicated because each distribution
requires different parameters, and we are applying them with sed
commands one by one.
This patch replaces them with Mustache, a template language with a
simple and easy syntax.
Both the .rpm distributions and the .deb distributions have pystache (a
Python implementation of Mustache), so we will use it.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180604104026.22765-1-syuu@scylladb.com>
"
This series introduces a separate hinted handoff manager for materialized views.
Steps:
* decouple resource limits from hinted handoff, so multiple instances can share space
and throughput limits in order to avoid internal fragmentation for every instance's
reservations
* add a subdirectory to data/, responsible for storing materialized view hints
* decouple registering global metrics from the hinted handoff constructor, now that there
can be more than one instance - otherwise 'registering metrics twice' errors would occur
* add a hints_for_views_manager to storage proxy and route failed view updates to use it
instead of the original hints_manager
* restore previous semantics for enabling/disabling hinted handoff - regular hinted handoff
can be disabled or enabled just for specific datacenters without influencing materialized
views flow
"
* 'separate_hh_for_mv_4' of https://github.com/psarna/scylla:
storage_proxy: restore optional hinted handoff
storage_proxy: add hints manager for views
hints: decouple hints manager metrics from constructor
db, config: add view_pending_updates directory
hints: move space_watchdog to resource manager
hints: move send limiter to resource manager
hints: move constants to resource_manager
The IndexInfo table tracks the secondary indexes that have already
been populated. Since our secondary index implementation is backed by
materialized views, we can virtualize that table so queries are
actually answered by built_views.
Fixes #3483
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds the same comment that exists in Apache Cassandra,
explaining that the table_name column in the IndexInfo system table
actually refers to the keyspace name. Don't be fooled.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Expose secondary_index::index_table_name() so knowledge of how to
build an index name can remain centralized.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Because the authorized_prepared_statements_cache caches information that
comes from the permissions cache and from the prepared statements cache,
it should have its entry expiration period set to the minimum of the
expiration periods of those caches.
The same goes for the entry refresh period, but since the prepared
statements cache doesn't have a refresh period, the
authorized_prepared_statements_cache's entry refresh period is simply
equal to that of the permissions cache.
Fixes #3473
Tests: dtest{release} auth_test.py
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1527789716-6206-1-git-send-email-vladz@scylladb.com>
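The expiry and refresh rules above can be sketched as follows (illustrative helpers, not the actual cache code):

```cpp
#include <algorithm>
#include <chrono>
#include <optional>

using std::chrono::seconds;

// Sketch: the combined cache expires entries after the minimum of its
// sources' expiration periods; the refresh period falls back to the
// permissions cache's, since the prepared statements cache has none.
seconds combined_expiry(seconds permissions_expiry, seconds prepared_expiry) {
    // An entry is only as fresh as the least-fresh source it was built from.
    return std::min(permissions_expiry, prepared_expiry);
}

seconds combined_refresh(seconds permissions_refresh,
                         std::optional<seconds> prepared_refresh) {
    return prepared_refresh ? std::min(permissions_refresh, *prepared_refresh)
                            : permissions_refresh;
}
```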
Now that more than one instance of hints manager can be present
at the same time, registering metrics is moved out of the constructor
to prevent 'registering metrics twice' errors.
Hints for materialized view updates need to be kept somewhere,
because their dedicated hints manager has to have a root directory.
The view_pending_updates directory resides in /data and is used
for that purpose.
Constants related to managing resources are moved to newly created
resource_manager class. Later, this class will be used to manage
(potentially shared) resources of hints managers.
"
In preparation, we change LCS so that it tries harder to push data
to the last level, where the backlog is supposed to be zero.
The backlog is defined as:
backlog_of_stcs_in_l0 + Sum(L in levels) sizeof(L) * (max_levels - L) * fan_out
where:
* the fan_out is the amount of SSTables we usually compact with the
next level (usually 10).
* max_levels is the number of levels currently populated
* sizeof(L) is the total amount of data in a particular level.
Tests: unit (release)
"
* 'lcs-backlog-v2' of github.com:glommer/scylla:
LCS: implement backlog tracker for compaction controller
LCS: don't construct property in the body of constructor
LCS: try harder to move SSTables to highest levels.
leveled manifest: turn 10 into a constant
backlog: add level to write progress monitor
This is the last missing tracker among the major strategies. After
this, only DTCS is left.
To calculate the backlog, we will define the point of zero-backlog
as having all data in the last level. The backlog is then:
Sum(L in levels) sizeof(L) * (max_levels - L) * fan_out,
where:
* the fan_out is the amount of SSTables we usually compact with the
next level (usually 10).
* max_levels is the number of levels currently populated
* sizeof(L) is the total amount of data in a particular level.
Care is taken for the backlog not to jump when a new level has just
recently been created.
Aside from that, SSTables that accumulate in L0 can be subject to STCS.
We will then add a STCS backlog in those SSTables to represent that.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
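The backlog formula above can be sketched as a worked computation (illustrative names, not the actual backlog tracker code):

```cpp
#include <cstddef>
#include <vector>

// Sketch of the described LCS backlog formula. level_sizes[L] holds the
// bytes in level L; the last index is the highest populated level, where
// the backlog is defined to be zero.
double lcs_backlog(double stcs_backlog_in_l0,
                   const std::vector<double>& level_sizes,
                   double fan_out) {
    double backlog = stcs_backlog_in_l0;
    const std::size_t max_level = level_sizes.size() - 1;
    for (std::size_t l = 0; l < max_level; ++l) {
        // Each level contributes its size, weighted by how many levels it
        // still has to travel to reach the last one, times the fan-out.
        backlog += level_sizes[l] * static_cast<double>(max_level - l) * fan_out;
    }
    return backlog;
}
```

With all data in the last level the sum vanishes, matching the stated zero-backlog point; data stuck in low levels contributes proportionally to how far it still has to move.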
Right now we are constructing the _max_sstable_size_in_mb property in
the body of the constructor, which makes it hard for us to use from
other properties.
We are doing that because we'd like to test for bounds of that value. So
a cleaner way is to have a helper function for that.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Our current implementation of LCS can end up with situations in which
just a bit of data is in the highest levels, with the majority in the
lowest levels. That happens because we will only promote things to
highest levels if the amount of data in the current level is higher than
the maximum.
This is a pre-existing problem in itself, but became even clearer when
we started trying to define what is the backlog for LCS.
We have discussed ways to fix this by redefining the criteria for when
to move data to the next levels. That would require us to change the way
things are today considerably, allowing parallel compactions, etc. There
is significant risk that we'll increase write amplification and we would
need to carefully validate that.
For now I propose a simpler change that essentially solves the
"inverted pyramid" problem of current LCS without major disruption:
keep selecting compaction candidates with the same criteria we use
today, which should help make sure we are not compacting high levels for
no reason; but if there is nothing to do, use the idle time to push data
to higher levels. As an added benefit, old data that is in the higher
levels can also be compacted away faster.
With this patch we see that in an idle, post-load system all data is
eventually pushed to the last level. Systems under constant writes keep
behaving the same way they did before.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We increase levels in powers of 10 but that is a parameter
of the algorithm. At least make it into a constant so that we can
reuse it somewhere else.
Signed-off-by: Glauber Costa <glauber@scylladb.com>