This reverts commit 653e250d04.
Compiletion is broken with this patch:
[155/264] CXX build/release/db/config.o
FAILED: g++ -MMD -MT build/release/db/config.o -MF build/release/db/config.o.d -std=gnu++1y -g -Wall -Werror -fvisibility=hidden -pthread -I/home/shlomi/scylla/seastar -I/home/shlomi/scylla/seastar/build/release/gen -march=nehalem -Wno-overloaded-virtual -DHAVE_HWLOC -DHAVE_NUMA -O2 -I/usr/include/jsoncpp/ -Wno-maybe-uninitialized -DHAVE_LIBSYSTEMD=1 -I. -I build/release/gen -I seastar -I seastar/build/release/gen -c -o build/release/db/config.o db/config.cc
db/config.cc:57:13: error: ‘void db::validate(boost::any&, const std::vector<std::__cxx11::basic_string<char> >&, db::string_map*, int)’ defined but not used [-Werror=unused-function]
static void validate(boost::any& out, const std::vector<std::string>& in,
^
cc1plus: all warnings being treated as errors
This branch doesn't have commits which introduce the problem which
this patch fixes, so let's just revert it.
The rate_moving_average is used by timed_rate_moving_average to return
its internal values.
If there are no timed event, the mean_rate is not propertly initilized.
To solve that the mean_rate is now initilized to 0 in the structure
definition.
Refs #1306
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1465231006-7081-1-git-send-email-amnon@scylladb.com>
(cherry picked from commit 2cf882c365)
Currently we only shut down on EIO. Expand this to shut down on any
system_error.
This may cause us to shut down prematurely due to a transient error,
but this is better than not shutting down due to a permanent error
(such as ENOSPC or EPERM). We may whitelist certain errors in the future
to improve the behavior.
Fixes#1311.
Message-Id: <1465136956-1352-1-git-send-email-avi@scylladb.com>
(cherry picked from commit 961e80ab74)
It was discussed that leveled strategy may not benefit from parallel
compaction feature because almost all compaction jobs will have similar
size. It was also found that leveled strategy wasn't working correctly
with it because two overlapping sstable (targetting the same level)
could be created in parallel by two ongoing compaction.
Fixes#1293.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <60fe165d611c0283ca203c6d3aa2662ab091e363.1464883077.git.raphaelsc@scylladb.com>
(cherry picked from commit 588ce915d6)
* dist/ami/files/scylla-ami 72ae258...863cc45 (3):
> Move --cpuset/--smp parameter settings from scylla_sysconfig_setup to scylla_ami_setup
> convert scylla_install_ami to bash script
> 'sh -x -e' is not valid since all scripts converted to bash script, so remove them
Limit disk bandwidth to 5MB/s to emulate a slow disk:
echo "8:0 5000000" >
/cgroup/blkio/limit/blkio.throttle.write_bps_device
echo "8:0 5000000" >
/cgroup/blkio/limit/blkio.throttle.read_bps_device
Start scylla node 1 with low memory:
scylla -c 1 -m 128M --auto-bootstrap false
Run c-s:
taskset -c 7 cassandra-stress write duration=5m cl=ONE -schema
'replication(factor=1)' -pop seq=1..100000 -rate threads=20
limit=2000/s -node 127.0.0.1
Start scylla node 2 with low memory:
scylla -c 1 -m 128M --auto-bootstrap true
Without this patch, I saw std::bad_alloc during streaming
ERROR 2016-06-01 14:31:00,196 [shard 0] storage_proxy - exception during
mutation write to 127.0.0.1: std::bad_alloc (std::bad_alloc)
...
ERROR 2016-06-01 14:31:10,172 [shard 0] database - failed to move
memtable to cache: std::bad_alloc (std::bad_alloc)
...
To fix:
1. Apply the streaming mutation limiter before we read the mutation into
memory to avoid wasting memory holding the mutation which we can not
send.
2. Reduce the parallelism of sending streaming mutations. Before we send each
range in parallel, after we send each range one by one.
before: nr_vnode * nr_shard * (send_info + cf.make_reader memory usage)
after: nr_shard * (send_info + cf.make_reader memory usage)
We can at least save memory usage by the factor of nr_vnode, 256 by
default.
In my setup, fix 1) alone is not enough, with both fix 1) and 2), I saw
no std::bad_alloc. Also, I did not see streaming bandwidth dropped due
to 2).
In addition, I tested grow_cluster_test.py:GrowClusterTest.test_grow_3_to_4,
as described:
https://github.com/scylladb/scylla/issues/1270#issuecomment-222585375
With this patch, I saw no std::bad_alloc any more.
Fixes: #1270
Message-Id: <7703cf7a9db40e53a87f0f7b5acbb03fff2daf43.1464785542.git.asias@scylladb.com>
(cherry picked from commit 206955e47c)
This reverts commit b3ed55be1d.
The issue is in the failing dtest, not this commit. Gleb writes:
"The bug is in the test, not the patch. Test waits for repair session
to end one way or the other when node is killed, but for nodetool to
know if repair is completed it needs to poll for it. If node dies
before nodetool managed to see repair completion it will stuck
forever since jmx is alive, but does not provide answers any more.
The patch changes timing, repair is completed much close to exit now,
so problem appears, but it may happen even without the patch.
The fix is for dtest to kill jmx as part of killing a node
operation."
Now that Lucas fixed the problem in scylla-ccm, revert the revert.
(cherry picked from commit 0255318bf3)
Scylla-jmx and collectd can preempt scylla and induce long latencies. Tune
the scheduler to provide lower latencies.
Since when the support processes are not running we normally do not context
switch (one thread per core, remember?), there should be no effect on
throughput.
The tunings are provided in a separate package, which can be uninstalled
if the server is shared with other applications which are negatively
affected by the tuning.
Fixes#1218.
Message-Id: <1464529625-12825-1-git-send-email-avi@scylladb.com>
compact_on_idle will lead users to thinking we're talking about sstable
compaction, not log-structured-allocator compaction.
Rename the variable to reduce the probability of confusion.
Message-Id: <1464261650-14136-1-git-send-email-avi@scylladb.com>
When read/write to a partition happens in parallel reader may detect
digest mismatch that may potentially cause cross DC read repair attempt,
but the repair is not really needed, so added latency is not justified.
This patch tries to prevent such parallel access from causing heavy
cross DC repair operation buy checking a timestamp of most resent
modification. If the modification happens less then "write timeout"
seconds ago the patch assumes that the read operation raced with write
one and cancel cross DC repair, but only if CL is LOCAL_*.
The space calculation counters in column family had two problem:
1. The total bytes is an ever growing counter, which is meaningless for
the API.
2. Trying to simply sum the size on all shards, ignores the fact that the
same sstable file can be referenced by multiple shards, this is
especially noticeable during migration time.
To solve this, the implementation was modified so instead of
collecting the sizes, the API would collect a map of file name to size
and then would do the summing.
This removes the duplications and fixes the total bytes calculation
Calling cfstats before the change with load after a compaction happend:
$ nodetool cfstats keyspace1
Keyspace: keyspace1
Verify write latency 1068253.0 76435
Read Count: 75915
Read Latency: 0.5953986037015082 ms.
Write Count: 76435
Write Latency: 0.013975966507490025 ms.
Pending Flushes: 0
Table: standard1
SSTable count: 5
Space used (live): 44261215
Space used (total): 219724478
After the fix:
$ nodetool cfstats keyspace1
Keyspace: keyspace1
Verify write latency 1863206.0 124219
Read Count: 125401
Read Latency: 0.9381053978835895 ms.
Write Count: 124219
Write Latency: 0.01499936402643718 ms.
Pending Flushes: 0
Table: standard1
SSTable count: 6
Space used (live): 50402904
Space used (total): 50402904
Space used by snapshots (total): 0
Fixes: #1042
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1464518757-14666-2-git-send-email-amnon@scylladb.com>
We have recently commited a fix to a broken streaming bug that involved
reverting column_family::stop() back to calling the custom seal functions
explicitly for both memtables and streaming memtables.
We here add a comment to explain why that had to be done.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <fe94b5883e9c29adc7fc9ee9f498894c057e7b64.1464293167.git.glauber@scylladb.com>
"This patch changes the way we wait for supported features. We no longer
sleep periodically, waking up to check if the wanted features are now
avaiable. Instead, we register waiters in a condition variable that is
signaled whenever new endpoint information is received.
We also add a new poll interface based on the feature class, which
encapsulates the availability of a cluster feature."
This class encapsulates the waiting for a cluster feature. A feature
object is registered with the gossiper, which is responsible for later
marking it as enabled.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch changes the sleep-based mechanism of detecting new features
by instead registering waiters with a condition variable that is
signaled whenever a new endpoint information is received.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch removes the timeout when waiting for features,
since future patches will make this argument unnecessary.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch fixes an inadvertent change to the shadow endpoint state
map in gossiper::run, done by calling get_heart_beat_state() which
also updates the endpoint state's timestamp. This did not happen for
the normal map, but did happen for the shadow map. As a result, every
time gossiper::run() was scheduled, endpoint_map_changed would always
be true and all the shards would make superfluous copies of the
endpoint state maps.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1464309023-3254-2-git-send-email-duarte@scylladb.com>
"This patchset provides a way to enable SET_NIC(posix_net_conf.sh) on
non-AMI environment.
Also support -mq option of the script.
This also contains number of bug fixes of scripts.
Fixes#1192"
NOTE: scyllatop now requires the urwid library
previously, if there were more metrics that lines in the terminal
window, the user could not see some of the metrics. Now the user can
scroll.
As an added bonus, the program will not crash when the window size
changes.
Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1464098832-5755-1-git-send-email-yoav@scylladb.com>
Vlad reported a strange user configuration:
SCYLLA_ARGS="--log-to-syslog 1 --log-to-stdout 0 --default-log-level
info --collectd-address=127.0.0.1:25826 --collectd=1
--collectd-poll-period 60000 --network-stack posix --num-io-queues 32
--max-io-requests 128 --replace-address 10.0.4.131"
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
parameters:
- seeds: "10.0.4.131"
In the mean while, 10.0.4.131 is the IP address of the node itself.
When the node was started, the following message were reported.
Apr 13 06:31:12 n0 scylla[19681]: [shard 0] gossip - Connect seeds again
... (20 seconds passed)
Apr 13 06:31:13 n0 scylla[19681]: [shard 0] gossip - Connect seeds again
... (21 seconds passed)
Apr 13 06:31:14 n0 scylla[19681]: [shard 0] gossip - Connect seeds again
... (22 seconds passed)
Apr 13 06:31:15 n0 scylla[19681]: [shard 0] gossip - Connect seeds again
... (23 seconds passed)
The configruation is invalid, becasue for --replace-address to
work, at least one working seed node should be alive. Catch the
configuration error and fail it with an appropriate error message.
Fixes#1183
Message-Id: <a94a082d896313e7a668915ae21fe2c03719da3a.1464164058.git.asias@scylladb.com>
_live_endpoints_just_added tracks the peer node which just becomes live.
When a down node gets back, the peer nodes can receive multiple messages
which would mark the node up, e.g., the message piled up in the sender's
tcp stack, after a node was blocked with gdb and released. Each such
message will trigger a echo message and when the reply of the echo
message is received (real_mark_alive), the same node will be added to
_live_endpoints_just_added.push_back more than once. Thus, we see the
same node be favored more than once:
INFO 2016-04-12 12:09:57,399 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO 2016-04-12 12:09:58,412 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO 2016-04-12 12:09:59,429 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO 2016-04-12 12:10:00,429 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO 2016-04-12 12:10:01,430 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO 2016-04-12 12:10:02,442 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO 2016-04-12 12:10:03,454 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
To fix, do not insert the node if it is already in
_live_endpoints_just_added.
Fixes#1178
Message-Id: <6bcfad4430fbc63b4a8c40ec86a2744bdfafb40f.1464161975.git.asias@scylladb.com>
In commit 4981362f57, I have introduced a regression that was thankfully
caught by our dtest infrastructure.
That patch is a preparation patch for the active reclaim patchset that is to
come, and it consolidated all the flushes using the memtable_list's seal_fn
function instead of calling the seal function explicitly.
The problem here is that the streaming memtables have the delayed mechanism,
about which the memtable_list is unaware. Calling memtable_list's
seal_active_memtable() for the streaming memtables calls the delayed version,
that does not guarantee flush. If we're lucky, we will indeed flush after the
timer expires, but if we're not we'll just stop the CF with data not flushed.
There are two options to fix this: the first is to teach the memtable_list about
the delayed/forced mechanism, and the second is to just call the correct
function explicitly during shutdown, and then when the time comes to add
continuations to the result of the seal, add them here as well.
Although the second option involves a bit more work and duplication, I think it
is better in the sense that the delayed / forced mechanism really is something
that belong to the streaming only. Being this the only user, I don't think it
justifies complicating the memtable_list with this concept.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <b26017c825ccf585f39f58c4ab3787d78e551f5f.1464126884.git.glauber@scylladb.com>
"This change is intended to make migration process safer and easier.
All column families will now have a directory called upload.
With this feature, users may choose to copy migrated sstables to upload
directory of respective column families, and run 'nodetool refresh'.
That's supposed to be the preferred option from now on."
The default CQL frame compression algorithm in Cassandra is LZ4. Add
support for decompressing incoming frames and compressing outgoing
frames with LZ4 if the CQL driver asks for that.
Fixes#416
Message-Id: <1464086807-11325-1-git-send-email-penberg@scylladb.com>