For unknown reasons, I saw gossip SYN messages get RPC timeout errors when
the cluster is under heavy cassandra-stress load.
Using a standalone tcp connection seems to fix the issue.
Restrict the impact of flushing a memtable to row_cache to 20% of the
cpu. This is accomplished by converting the code to a thread (leaving
the indentation untouched to keep the patch readable) and using a thread
scheduling group.
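A minimal sketch of the duty-cycle idea behind such throttling (the helper name and formula are illustrative; seastar's scheduling group does the time accounting itself): after running a chunk of flush work, yield long enough that the work amounts to at most the given CPU quota.

```cpp
#include <chrono>
#include <cmath>

// Hypothetical helper: given how long a flush chunk ran, compute how long
// to yield so the flush consumes at most `quota` (e.g. 0.2) of one cpu.
std::chrono::microseconds throttle_delay(std::chrono::microseconds work,
                                         double quota) {
    // work / (work + delay) == quota  =>  delay = work * (1 - quota) / quota
    return std::chrono::microseconds(
        std::lround(work.count() * (1.0 - quota) / quota));
}
```

For a 20% quota, 100µs of work is followed by roughly 400µs of yielding.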
With the base fedora22 image there are many pending updates. For an
unknown reason, after doing all the rpm installs we get:
amazon-ebs:
amazon-ebs: Complete!
amazon-ebs: Failed to execute operation: Access denied
==> amazon-ebs: Terminating the source AWS instance...
==> amazon-ebs: No AMIs to cleanup
==> amazon-ebs: Deleting temporary keypair...
Build 'amazon-ebs' errored: Script exited with non-zero exit status: 1
The workaround is to create a fedora22 image that has already pulled in
the updates.
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
- no need to build the binary rpm twice - we are using the mock version
- this was causing issues on jenkins, as we build rpms there only via mock
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
We remove the poller too early; after _ready_to_respond becomes ready,
the poller is likely to have been inserted again.
Fix by moving the removal to after _ready_to_respond.
Refs #356
Pre-allocates N segments from the timer task. N is "adaptive" in that it is
increased (up to a maximum) every time segment acquisition is forced to
allocate a new segment instead of picking one from the pre-allocated
(reserve) list. The idea is that it is easier to adapt how many segments we
consume per timer quantum than the timer quantum itself.
Also does the disk pressure check and flush from the timer task now. Note
that the check is still done at most once per new segment.
Also some logging cleanup/improvement to make the behaviour easier to trace.
Reserve segments start out at zero length, and are still deleted when finished.
This is because otherwise we'd still have to clear the file to be able to
properly parse it later (given that it can be a "half" file due to power
failure etc). This might need revisiting as well.
With this patch, there should be no case (except flush starvation) where
"add_mutation" actually waits for a (potentially) blocking op (disk).
Note that since the amount of reserve is increased as needed, there will
be occasional cases where a new segment is created in the alloc path
until the system finds equilibrium. But this should only happen during a
brief warmup.
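The adaptive reserve described above can be sketched roughly like this (all names are hypothetical, not the actual commitlog types): a timer task keeps the reserve topped up, and a reserve miss in the allocation path grows the target up to a cap.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// Illustrative sketch of the adaptive reserve (names are hypothetical).
struct segment_reserve {
    std::deque<int> free;              // stand-in for pre-allocated segments
    std::size_t target = 2;            // how many to keep pre-allocated
    static constexpr std::size_t max_target = 16;
    int next_id = 0;

    // Called from the timer task: top the reserve up to `target`.
    void replenish() {
        while (free.size() < target) {
            free.push_back(next_id++);
        }
    }

    // Called from the allocation path.
    int acquire() {
        if (!free.empty()) {
            int s = free.front();
            free.pop_front();
            return s;
        }
        // Reserve miss: allocate inline and adapt the target upward.
        if (target < max_target) {
            ++target;
        }
        return next_id++;
    }
};
```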
This addresses issue #154.
Flush command should wait for the command to complete before returning.
This change replaces the for loop with a parallel_for_each; it now
waits for all the flushes to complete before returning.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
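The wait-for-all shape of parallel_for_each can be sketched with std::async as a stand-in (flush_one and flush_all are illustrative, not the actual code): launch one flush per item, then return only after every one completes.

```cpp
#include <atomic>
#include <future>
#include <vector>

std::atomic<int> flushed{0};

// Stand-in for flushing a single column family.
void flush_one(int /*cf*/) { flushed.fetch_add(1); }

// Launch all flushes, then block until every one has completed,
// mirroring what parallel_for_each does with futures.
void flush_all(const std::vector<int>& cfs) {
    std::vector<std::future<void>> futs;
    for (int cf : cfs) {
        futs.push_back(std::async(std::launch::async, flush_one, cf));
    }
    for (auto& f : futs) {
        f.get();   // returning only after every flush has completed
    }
}
```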
The tar file prefix needs to be only the version, without the release.
Without this fix I get:
.
.
.
Finish: build setup for scylla-server-0.8-20150917.2d99476.fc21.src.rpm
Start: rpmbuild scylla-server-0.8-20150917.2d99476.fc21.src.rpm
Building target platforms: x86_64
Building for target x86_64
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.fA7nBm
+ umask 022
+ cd /builddir/build/BUILD
+ cd /builddir/build/BUILD
+ rm -rf scylla-server-0.8
+ /usr/bin/tar -xf
/builddir/build/SOURCES/scylla-server-0.8-20150917.2d99476.tar
+ cd scylla-server-0.8
/var/tmp/rpm-tmp.fA7nBm: line 33: cd: scylla-server-0.8: No such file or
directory
RPM build errors:
error: Bad exit status from /var/tmp/rpm-tmp.fA7nBm (%prep)
Bad exit status from /var/tmp/rpm-tmp.fA7nBm (%prep)
ERROR:
Exception(build/rpmbuild/SRPMS/scylla-server-0.8-20150917.2d99476.fc21.src.rpm)
Config(fedora-21-x86_64) 4 minutes 17 seconds
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
When the cluster is under heavy load, exchanging a gossip message might
take longer than 1s. Let's make the timeout longer for now, until we can
solve the large gossip message delay issue.
Patch "Fix some timing/latency issues with sync" changed new_segment to
_not_ wait for the flush to finish. This means that checking the actual
files on disk in the test case might race.
Luckily, we can more or less just check the segment list instead
(added recently).
Refs #356
* Move the sync time setting to sync initiation to help prevent double syncs
* Change add_mutation to only do an explicit sync with wait if the time
  elapsed since the last sync is 2x the sync window
* Do not wait for sync when moving to a new segment in the alloc path
* Initialize _sync_time properly
* Add some tracing log messages to help debugging
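The 2x-sync-window rule in the second bullet might look roughly like this (hypothetical helper, not the actual code):

```cpp
#include <chrono>

// Only do an explicit sync (with wait) if more than two sync windows
// have elapsed since the last sync was initiated.
bool should_explicit_sync(std::chrono::steady_clock::time_point last_sync,
                          std::chrono::steady_clock::time_point now,
                          std::chrono::milliseconds sync_window) {
    return (now - last_sync) > 2 * sync_window;
}
```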
A race condition happens when two or more shards try to delete
the same partial sstable, so the problem doesn't affect scylla
when it boots with a single shard.
To fix this problem, shard 0 is made responsible for deleting a
partial sstable.
Fixes #359.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
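The single-owner rule can be sketched like this (plain ints stand in for shards here; the real code routes work between shards rather than calling a function directly): every shard may ask, but only shard 0 performs the deletion, so the partial sstable is removed exactly once.

```cpp
int deletions = 0;   // stand-in for the actual file removal

// Only shard 0 acts; all other shards return without doing anything.
void maybe_delete_partial_sstable(unsigned shard_id) {
    if (shard_id != 0) {
        return;
    }
    ++deletions;
}
```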
From Avi:
We currently send out each cql transport response in its own packet, which
is very inefficient.
Use a poller to schedule responses to be flushed out, which allows multiple
responses to be sent out in one packet, reducing tcp stack overhead.
I see ~50% improvement with this on my desktop (single core).
* Removes the previous, accidental fix that got committed.
* Instead, just do not give RPs to replayed mutations. This is the same
  as in Origin, and just as (or more) correct, since we intend to flush
  the data to sstables asap anyway.
Instead of flushing responses immediately, ask a reactor poller to flush
them for us. This lets several responses be flushed out together in
one packet.
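A rough sketch of the batching idea (hypothetical types; the real code uses a seastar poller and socket output streams): queuing a response does no I/O, and a later poll concatenates everything pending into one packet.

```cpp
#include <string>
#include <vector>

// Illustrative batcher: responses accumulate until the poller flushes them.
struct response_batcher {
    std::vector<std::string> pending;
    std::vector<std::string> packets;   // what actually hit the wire

    // Fast path: no I/O, just remember the response.
    void queue(std::string resp) {
        pending.push_back(std::move(resp));
    }

    // Invoked by the reactor poller: send all pending responses as one packet.
    void poll_flush() {
        if (pending.empty()) {
            return;
        }
        std::string packet;
        for (auto& r : pending) {
            packet += r;
        }
        packets.push_back(std::move(packet));
        pending.clear();
    }
};
```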
We should ignore shard_id in the equality and less-than operators as well.
Within a 3-node cluster, each node has 4 cpus. On the first node:
Before:
[fedora@ip-172-30-0-99 ~]$ netstat -nt|grep 100\:7000
tcp 0 0 172.30.0.99:36998 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:36772 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:40125 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:60182 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:38013 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:51997 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:56532 172.30.0.100:7000 ESTABLISHED
After:
[fedora@ip-172-30-0-99 ~]$ netstat -nt|grep 100\:7000
tcp 0 0 172.30.0.99:45661 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:57395 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:37807 172.30.0.100:7000 ESTABLISHED
tcp 0 36 172.30.0.99:50567 172.30.0.100:7000 ESTABLISHED
Each shard of a node is supposed to have 1 connection to a peer node,
thus each node will have #cpu connections to a peer node.
With this patch, the cluster is much more stable than before on AWS. So
far, I see no timeout in the gossip syn message exchange.
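The shard_id comparison rule mentioned above can be sketched with a hypothetical address type (not scylla's actual types): shard_id is excluded from both equality and ordering, so two addresses naming the same node compare equal regardless of shard.

```cpp
#include <cstdint>

struct node_addr {
    std::uint32_t ip;
    std::uint32_t shard_id;   // deliberately ignored by the operators below
};

inline bool operator==(const node_addr& a, const node_addr& b) {
    return a.ip == b.ip;
}
inline bool operator<(const node_addr& a, const node_addr& b) {
    return a.ip < b.ip;
}
```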