Previously, we weren't completing a query as early as possible when it
reached the partition limit; instead, we had to wait until reaching the
end of the specified partition ranges. This patch fixes that by
including a check of the partition limit in the termination condition.
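As a rough illustration of such a termination condition, the sketch below
walks a list of partition ranges and stops as soon as the partition limit is
used up, instead of visiting every range. The function name and the
per-range partition counts are assumptions made for this example; this is
not the actual query code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model: each entry is the number of partitions a range yields.
// Returns how many ranges were visited before the query terminated.
std::size_t ranges_visited(const std::vector<std::size_t>& partitions_per_range,
                           std::size_t partition_limit) {
    std::size_t remaining = partition_limit;
    std::size_t visited = 0;
    // Stop on either condition: ranges exhausted OR partition limit reached.
    while (visited < partitions_per_range.size() && remaining > 0) {
        remaining -= std::min(remaining, partitions_per_range[visited]);
        ++visited;
    }
    return visited;
}
```

Without the `remaining > 0` check, the loop would only end once every range
had been visited, which is the behaviour described above.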
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20161213114559.26438-1-duarte@scylladb.com>
column_mapping is not safe to access across shards, because data_type
is not safe to access. One manifestation of this is that
abstract_type::is_value_compatible_with() always fails if the two
types belong to different shards.
During replay, column_mapping lives on the replaying shard, and is
used by converting_mutation_partition_applier against the schema on
the target shard. Since types in the mapping will be considered
incompatible with types in the schema, all cells will be dropped.
Fix by using column_mapping in a safe way, by copying it to the target
shard if necessary. Each shard maintains its own cache of column
mappings.
Fixes #1924.
Message-Id: <1481310463-13868-1-git-send-email-tgrabiec@scylladb.com>
* seastar 0773e98...6fbd792 (2):
> tls: Only run our "verify" function in client session
> Merge "Clean the metric definition" from Amnon
Includes patch from Amnon adjusting the metrics registration due to seastar
API changes.
* seastar 0a74317...0773e98 (6):
> tls: Add support for client cetrificate verification & priority strings
> semaphore: add consume_units
> semaphore: add available_units()
> thread: check need_preempt for threads in a scheduling group as well
> tutorial: fix semaphore example, and text
> stop_iteration: add && and || operators
"This series:
- We can make reader with ranges
- Fix possible use after free of 'si'
- Streaming ranges now are sorted and merged
- Fix shard_begin shard_end end loop in both streaming and repair"
A range now alternates between different shards: the first part of the
range goes to shard X, the next to shard X+1, but after a while we go
back to shard X. So we can't do a simple loop between shard_begin and
shard_end.
Fix by using the newly introduced dht::split_range_to_shards
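
To show why a simple loop between shard_begin and shard_end is not enough,
here is a toy model of splitting a range into per-shard sub-ranges. It
assumes integer tokens and a round-robin mapping of fixed-size token chunks
to shards; the real dht::split_range_to_shards works on the partitioner's
tokens, and every name below is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using token = std::uint64_t;
using token_range = std::pair<token, token>; // half-open: [first, second)

// Toy sharding function: consecutive chunks of tokens rotate over the shards.
inline unsigned toy_shard_of(token t, token chunk, unsigned shards) {
    return static_cast<unsigned>((t / chunk) % shards);
}

// Split a range into the sub-ranges owned by each shard.
// Assumes chunk > 0, shards > 0 and no overflow near the maximum token.
std::map<unsigned, std::vector<token_range>>
toy_split_range_to_shards(token_range r, token chunk, unsigned shards) {
    std::map<unsigned, std::vector<token_range>> out;
    token pos = r.first;
    while (pos < r.second) {
        token chunk_end = (pos / chunk + 1) * chunk;
        token next = std::min(chunk_end, r.second);
        out[toy_shard_of(pos, chunk, shards)].emplace_back(pos, next);
        pos = next;
    }
    return out;
}
```

With two shards and a chunk size of 10, the range [5, 35) comes back to
shard 0 after visiting shard 1, so each shard ends up with two
non-contiguous sub-ranges.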
Use the cf.make_streaming_reader with ranges to simplify the code a bit.
Now that we have the new interface to make readers with ranges, we can
simplify the code a lot.
1) Fewer readers are needed
before: number of ranges of readers
after: smp::count readers at most
2) No foreign_ptr is needed
There is no need to forward to a shard to make the foreign_ptr for
send_info in the first phase and forward to that shard to execute the
send_info in the second phase.
3) No do_with is needed in send_mutations since si now is a
lw_shared_ptr
4) Fix possible use after free of 'si' in do_send_mutations
We need to take a reference of 'si' when sending the mutation with
send_stream_mutation rpc call, otherwise:
msg1 got exception
si->mutations_done.broken()
si is freed
msg2 got exception
si is used again
The issue is introduced in dc50ce0ce5 (streaming: Make the mutation
readers when streaming starts) which is master only, branch 1.5 is not
affected.
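
The lifetime issue in 4) can be sketched with standard C++ only. This
example uses std::shared_ptr in place of seastar's lw_shared_ptr and plain
callbacks in place of rpc continuations; the type and function names are
hypothetical. Each handler captures its own reference, so the object stays
alive until the last handler is gone.

```cpp
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for the shared send state.
struct send_info_sketch {
    int errors = 0;
};

// Build one error handler per in-flight message. Capturing `si` by value
// gives every handler its own reference to the shared state.
std::vector<std::function<void()>>
make_error_handlers(std::shared_ptr<send_info_sketch> si, int messages) {
    std::vector<std::function<void()>> handlers;
    for (int i = 0; i < messages; ++i) {
        handlers.push_back([si] { ++si->errors; });
    }
    return handlers;
}
```

Even if the caller drops its own pointer after the first failure, the second
handler still sees a valid object.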
Allow making a streaming reader with a vector of ranges in addition to
a single range. This will be used soon in a following streaming patch.
We can make the reader more efficient later.
This patch fixes a typo in i_partitioner::tri_compare() where we were
using std::max instead of std::min. Using std::min avoids accessing random
memory and getting random results.
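A minimal sketch of why the bound matters, assuming the comparison walks two
byte sequences of possibly different lengths (the function below is
hypothetical and is not the actual tri_compare() code): only the common
prefix, std::min of the two sizes, may be read from both inputs.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string_view>

// Three-way comparison of two byte strings: returns -1, 0 or 1.
int tri_compare_bytes(std::string_view a, std::string_view b) {
    // std::min, not std::max: reading past the shorter input is out of bounds.
    const std::size_t common = std::min(a.size(), b.size());
    const int r = common ? std::memcmp(a.data(), b.data(), common) : 0;
    if (r != 0) {
        return r < 0 ? -1 : 1;
    }
    if (a.size() == b.size()) {
        return 0;
    }
    // Equal common prefix: the shorter input sorts first.
    return a.size() < b.size() ? -1 : 1;
}
```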
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20161211165043.17816-1-duarte@scylladb.com>
This patch adds uuid file support for Ubuntu systems. It also splits the
behaviour between restart and daily checks: the first runs in r mode and
the second in d mode.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Allows scylla-housekeeping to get the uuid from a file instead of the
command line.
If the file is missing, no uuid will be used.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
"Function to calculate maximum purgeable timestamp is made 10 times faster when
compacting sstables overlap with 10% of all sstables.
That's possible with an incremental selector that will incrementally select
sstables based on key being compacted.
Currently, we iterate through all non-compacting sstables and consult their
bloom filter to determine max purgeable timestamp, and that will be very
expensive for compactions that are frequently deciding whether or not to purge
tombstones."
* 'filter_overhead_fix_v4' of github.com:raphaelsc/scylla:
compaction: reduce bloom filter overhead with incremental selector
tests: add test for sstable set's incremental selector
sstable_set: introduce incremental selector
compatible_ring_position: add function to return token
The procedure to calculate the max purgeable timestamp is optimized
by only visiting sstables that overlap with the key currently being
compacted. That's done using the incremental sstable selector.
Function to calculate maximum purgeable timestamp is made 10 times
faster when compacting sstables overlap with 10% of all sstables.
Fixes #1322.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Incrementally select sstables from an sstable set using tokens
in ascending order.
For leveled strategy, it returns all sstables that belong
to the current interval. For other strategies, it just returns
all sstables from the set.
Useful for compaction, which needs all sstables that overlap
with the key currently being compacted to calculate the maximum
purgeable timestamp.
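The idea can be sketched as a cursor over sorted, non-overlapping token
intervals, roughly what a single level looks like under leveled strategy.
Integer tokens, the class name and the index-based result are assumptions
made for this example; the real selector returns sstables.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical token interval covered by one sstable; both ends inclusive.
struct interval_sketch {
    std::int64_t first;
    std::int64_t last;
};

// Selects the interval containing a token. Tokens must be passed in ascending
// order, so the cursor only moves forward and earlier intervals are never
// revisited.
class incremental_selector_sketch {
    const std::vector<interval_sketch>& _intervals; // sorted and disjoint
    std::size_t _pos = 0;
public:
    explicit incremental_selector_sketch(const std::vector<interval_sketch>& v)
        : _intervals(v) {}

    std::optional<std::size_t> select(std::int64_t token) {
        // Skip intervals that end before the token.
        while (_pos < _intervals.size() && _intervals[_pos].last < token) {
            ++_pos;
        }
        if (_pos < _intervals.size() && _intervals[_pos].first <= token) {
            return _pos;
        }
        return std::nullopt; // token falls in a gap or past the last interval
    }
};
```

Because the cursor never moves backwards, a whole compaction pass advances
through the intervals once in total, rather than consulting every sstable
for every key.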
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The semaphore future may be unavailable for many reasons. Specifically,
if the task quota is depleted right between sem.wait() and the .then()
clause in get_units() the resulting future won't be available.
That is particularly visible if we decrease the task quota, since those
events will be more frequent: we can in those cases clearly see this
counter going up, even though there aren't more requests pending than
usual.
This patch improves the situation by replacing that check. We now verify
whether or not there are waiters in the semaphore.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <113c0d6b43cd6653ce972541baf6920e5765546b.1481222621.git.glauber@scylladb.com>
When row cache is disabled, update_cache() will do nothing to the
memtable. Active readers may keep the memtable alive for an unbounded
amount of time, preventing it from going away. This doesn't play well
with virtual dirty accounting. Shortly before calling update_cache(), the
memory which was subtracted during flush is added back to the amount
of virtual dirty memory. If there was write pressure all along, we
will be at the dirty memory limit. When we give back subtracted memory
this will put virtual dirty way above the limit. This will stall all
writes until another memtable flush drags virtual dirty down or
readers finally release the memtable. We want to prevent upward
jumps of virtual dirty.
The first part of the fix is to ensure that as long as the memtable's
region is in the dirty group, we will not revert flushed memory. This
must happen synchronously with the region's memory being removed from the
group in order to prevent upward virtual dirty jumps. To make this
easier, tracking of flushed memory was moved to the memtable object.
Another part of the fix is to gradually clear the memtable when cache
is disabled in a similar fashion as when it's moved to cache. This
ensures that the actual memory held by memtable's region is released
sooner than it dies.
Refs #1879
"Due to my misreading of Cassandra code, I thought it would ignore new
components in the Statistics component; however, it doesn't, and the change
(introduced in bdd11648ac ("sstables: add
intra-node sharding metadata")) breaks sstable2json and likely any
Cassandra code that touches sstables.
To fix, move the sharding data into a new component ("Scylla.db"), which
Cassandra does ignore. The new component is designed to be extensible so
we don't experience the same issue later on."
* tag 'asias/repair/subranges/refactor_fix/v1' of github.com:cloudius-systems/seastar-dev:
repair: Limit the number of sub ranges
repair: Use estimated_keys_for_range in repair_cf_range
repair: Extract the target_partitions into repair_info class
repair: Put request_transfer_ranges into repair_info class
repair: Introduce check_failed_ranges helper
repair: Introduce do_streaming helper
repair: Make the neighbors const reference
repair: Introduce repair_info
repair: Attach the repair id in the stream plan name
The problem is that replay will unlink any segments which were on disk
at the time the replay starts. However, some of those segments may
have been created by the current node since boot. If a segment is part
of the reserve, for example, it will be unlinked by replay, but we will
still use that segment to log mutations. Those mutations will not be
visible to replay after a crash, though.
The fix is to record preexisting segments before any new segments
have a chance to be created and use that as the replay list.
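The ordering can be sketched with a toy in-memory directory. The class,
function and segment names below are hypothetical and this is not the
commitlog code: the point is only that the replay list is a snapshot taken
before any new segment exists, so replay never unlinks a segment created
after boot.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the commitlog directory.
class commitlog_dir_sketch {
    std::set<std::string> _on_disk;
public:
    explicit commitlog_dir_sketch(std::set<std::string> existing)
        : _on_disk(std::move(existing)) {}

    // Snapshot of the segments currently on disk.
    std::vector<std::string> list() const {
        return std::vector<std::string>(_on_disk.begin(), _on_disk.end());
    }
    void create(const std::string& name) { _on_disk.insert(name); }
    void unlink(const std::string& name) { _on_disk.erase(name); }
    bool exists(const std::string& name) const { return _on_disk.count(name) != 0; }
};

// Replay (elided here) and then unlink exactly the segments in the replay list.
void replay_and_unlink(commitlog_dir_sketch& dir,
                       const std::vector<std::string>& replay_list) {
    for (const auto& segment : replay_list) {
        dir.unlink(segment);
    }
}
```

Taking the snapshot with list() first, then creating the reserve segment,
then replaying the snapshot leaves the reserve segment in place. Listing the
directory at replay time instead would unlink it.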
Introduced in abe7358767.
dtest failure:
commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup
Message-Id: <1481117436-6243-1-git-send-email-tgrabiec@scylladb.com>
- Add "coordinator" and "replica" categories
- Use a new seastar/metrics_registration framework
* 'rearrange-storage-proxy-stats-v4' of github.com:cloudius-systems/seastar-dev:
service::storage_proxy: rework the collectd counters registration
service/storage_proxy: regroup collectd statistics
The Cassandra derived sstable tools (and likely Cassandra itself) object to
a new sub-component in the Statistics component; create a new Scylla
component instead to host this data.
Allow declaring discriminated unions (with an enum type as the
discriminant and any sstable serializable type as a value) and sets
of these unions, with the discriminant as the key. Parsers and writers
are auto-generated.
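For readers unfamiliar with the shape of such a structure, here is a rough
standard C++ analogue using std::variant and std::map. It does not use the
sstable serialization helpers, does no parsing or writing, and all type and
member names are hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical discriminant and payload types.
enum class component_type_sketch : std::uint32_t { sharding = 1, features = 2 };

struct sharding_sketch {
    std::vector<std::int64_t> token_boundaries;
};
struct features_sketch {
    std::uint64_t enabled_mask;
};

// The union: exactly one of the payload types at a time.
using component_sketch = std::variant<sharding_sketch, features_sketch>;
// A set of unions keyed by discriminant: at most one entry per enum value.
using component_set_sketch = std::map<component_type_sketch, component_sketch>;

component_type_sketch discriminant_of(const component_sketch& c) {
    return std::holds_alternative<sharding_sketch>(c)
        ? component_type_sketch::sharding
        : component_type_sketch::features;
}

// Insert or replace the entry for the component's own discriminant.
void insert_component(component_set_sketch& set, component_sketch c) {
    const auto key = discriminant_of(c); // computed before c is moved from
    set.insert_or_assign(key, std::move(c));
}
```

Keying the set by discriminant is what allows new entry types to be added
later without disturbing the existing ones.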
Currently the housekeeping timer won't be reset when we restart scylla-server.
We expect the service to run at each start, which is consistent with
the upstart script in Ubuntu 14.04.
When we restart scylla-server, the housekeeping timer is also restarted,
so let's replace "OnBootSec" with "OnActiveSec".
Fixes: #1601
Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <a22943cc11a3de23db266c52fd476c08014098c4.1480607401.git.amos@scylladb.com>
To reduce duplicated code and simplify scripts, introduce scylla_lib.sh
for shell scripts, which provides functions to classify distributions
and load all sysconfig files.
This also fixes script bugs that misdetected Debian and RHEL.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1480667672-9453-2-git-send-email-syuu@scylladb.com>
Since Ubuntu 15.10/16.04 still uses Upstart to manage the GUI session (not as init), when we launch Scylla directly from Ubuntu's GUI terminal (not using systemctl or initctl), raise(SIGSTOP) is mistakenly called. This happens because the GUI session sets the "UPSTART_JOB" environment variable; it won't happen when running Scylla as a systemd service.
To avoid this, we need to verify UPSTART_JOB == "scylla-server".
If Scylla is part of a GUI session, UPSTART_JOB will be "unity7", and we need to avoid raise(SIGSTOP) in that case.
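A minimal sketch of the check, with hypothetical function names (this is not
the actual startup code): the signal is raised only when the variable is set
and names the scylla-server job.

```cpp
#include <csignal>
#include <cstdlib>
#include <cstring>

// True only when Upstart started us as the "scylla-server" job.
// A null pointer means UPSTART_JOB is not set at all.
bool started_as_upstart_job(const char* upstart_job) {
    return upstart_job != nullptr
        && std::strcmp(upstart_job, "scylla-server") == 0;
}

// Hypothetical wrapper: stop ourselves so that Upstart can detect readiness,
// but only when we really run as the scylla-server job.
void notify_upstart_if_needed() {
    if (started_as_upstart_job(std::getenv("UPSTART_JOB"))) {
        std::raise(SIGSTOP);
    }
}
```

With UPSTART_JOB set to "unity7" by the GUI session, the predicate is false
and the process is not stopped.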
Fixes #1199
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1480620421-28967-1-git-send-email-syuu@scylladb.com>