6520 Commits

Author SHA1 Message Date
Avi Kivity
987294a412 Add missing copyrights 2015-09-20 10:16:11 +03:00
Asias He
eead846712 messaging_service: Make gossip use standalone tcp connection
For unknown reasons, I saw gossip syn messages get rpc timeout errors when
the cluster is under heavy cassandra-stress load.

Using a standalone tcp connection seems to fix the issue.
2015-09-19 10:17:42 +03:00
Shlomi Livne
4ba3580fa7 dist: aws ami install scylla-tools
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-09-19 10:16:33 +03:00
Avi Kivity
8f4eb7cc51 Merge "Fixes for building aws ami" from Shlomi 2015-09-19 10:15:42 +03:00
Raphael S. Carvalho
4d31e08299 conf: reenable partitioner in scylla.yaml
It's needed for compaction_delete_test dtest.
Otherwise, it will fail with:

Missing directive: partitioner
Fatal configuration error; unable to start. See log for stacktrace.

FAIL

======================================================================
FAIL: compaction_delete_test (compaction_test.TestCompaction_with_SizeTieredCompactionStrategy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/urchin_world/urchin-dtest/compaction_test.py", line 50, in compaction_delete_test
    self.assertEqual(numfound, 10)
AssertionError: 0 != 10

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-09-19 10:04:05 +03:00
Avi Kivity
4c890fb207 Merge seastar upstream
* seastar 5ab7662...86ffe72 (1):
  > reactor: temporarily disable SO_REUSEPORT.
2015-09-19 09:28:15 +03:00
Avi Kivity
dcdc925b86 Revert "Commitlog: Pre-allocate "reserve" segments"
This reverts commit cbf3b63853, due to
reports of increased latency (instead of the opposite).
2015-09-19 09:26:39 +03:00
Avi Kivity
9dbe8ca1b5 row_cache: reduce cpu impact of memtable flush
Restrict the impact of flushing a memtable to row_cache to 20% of the
cpu.  This is accomplished by converting the code to a thread (with
bad indentation to improve patch readability) and using a thread
scheduling group.
2015-09-19 09:22:52 +03:00
Avi Kivity
93871e4392 transport: more straightforward poller removal during connection close
Instead of calling do_flush(), just remove the connection from the poll
list directly.
2015-09-19 09:22:32 +03:00
Pekka Enberg
87d6ea940d transport/server: Improve "truncated frame" error message
Include expected size as well as frame length to improve debuggability.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-09-19 08:46:37 +03:00
Shlomi Livne
b820fa1e58 dist: aws ami workaround for packer throwing an error at build time
The base fedora22 image has many pending updates. For an unknown
reason, after doing all the rpm installs we are getting:

    amazon-ebs:
    amazon-ebs: Complete!
    amazon-ebs: Failed to execute operation: Access denied
==> amazon-ebs: Terminating the source AWS instance...
==> amazon-ebs: No AMIs to cleanup
==> amazon-ebs: Deleting temporary keypair...
Build 'amazon-ebs' errored: Script exited with non-zero exit status: 1

The workaround is to create a fedora22 image that has already pulled the
updates.

Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-09-19 01:54:02 +03:00
Shlomi Livne
55dc4c2d83 dist: rpmbuild builds source rpm, mock builds binary rpm
- no need to create the binary rpm twice - we are using the mock version
- this is causing issues on jenkins, as we build rpms on it only via mock

Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-09-19 01:53:59 +03:00
Shlomi Livne
18215e7a99 dist: fix a bug in ami build script
Need to copy the scylla-jmx.rpm into the ami build directory

Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-09-19 01:48:53 +03:00
Avi Kivity
5f32c00f5f transport: fix removal of response batch poller when the connection terminates
We remove the poller too early; after _ready_to_respond becomes ready,
it is likely to have been inserted again.

Fix by moving it after _ready_to_respond.
2015-09-17 20:08:07 +02:00
Calle Wilund
cbf3b63853 Commitlog: Pre-allocate "reserve" segments
Refs #356

Pre-allocates N segments from timer task. N is "adaptive" in that it is
increased (to a max) every time segment acquisition is forced to allocate
a new one instead of picking from the pre-alloc (reserve) list. The idea is that it is
easier to adapt how many segments we consume per timer quanta than the timer
quanta itself.

Also does disk pressure check and flush from timer task now. Note that the
check is still only done max once every new segment.

Some logging cleanup/betterment also to make behaviour easier to trace.

Reserve segments start out at zero length, and are still deleted when finished.
This is because otherwise we'd still have to clear the file to be able to
properly parse it later (given that it can be a "half" file due to power fail
etc). This might need revisiting as well.

With this patch, there should be no case (except flush starvation) where
"add_mutation" actually waits for a (potentially) blocking op (disk).
Note that since the amount of reserve is increased as needed, there will
be occasional cases where a new segment is created in the alloc path
until the system finds equilibrium. But this should only be during a brief
warmup.
2015-09-17 19:54:28 +03:00
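
The adaptive reserve described in cbf3b63853 can be illustrated with a small model. This is only a sketch of the idea, not the commitlog code: the class name, the starting value of N (1), the cap, and the integer "segments" are all assumptions made for the example.

```python
from collections import deque


class ReservePool:
    """Toy model of an adaptive segment reserve (all names hypothetical)."""

    def __init__(self, max_reserve=8):
        self.target = 1                  # N: segments the timer keeps ready (assumed start)
        self.max_reserve = max_reserve   # assumed cap on N
        self.reserve = deque()
        self.next_id = 0
        self.slow_path_allocations = 0

    def _allocate(self):
        # Stands in for the potentially blocking disk operation.
        self.next_id += 1
        return self.next_id

    def on_timer(self):
        # Timer task: top the reserve up to the current target N.
        while len(self.reserve) < self.target:
            self.reserve.append(self._allocate())

    def acquire(self):
        if self.reserve:
            return self.reserve.popleft()
        # Reserve was empty: allocate inline and raise N, up to the cap.
        self.slow_path_allocations += 1
        self.target = min(self.target + 1, self.max_reserve)
        return self._allocate()
```

In this model the write path only pays for an allocation while N is still too small for the demand between two timer ticks, which corresponds to the warmup period the commit message mentions.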
Shlomi Livne
536f557c22 dist: ami script will build jmx rpm if not available
v2
- add an error message if scylla-jmx is not checked out

Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-09-17 18:21:15 +03:00
Avi Kivity
1719a8a43a Merge seastar upstream
* seastar 680f37d...5ab7662 (4):
  > reactor: replace "idle" metric by "load"
  > net::dpdk: workaround for a lack of RSS bits information.
  > rpc: wait for write buffer close during client destruction
  > rpc: remove unused future/promise
2015-09-17 18:19:44 +03:00
Avi Kivity
cc857c0e81 Merge "API: Adding functionality to column family" from Amnon
"This series is one of several that add functionality to the column family
commands and statistics."
2015-09-17 14:50:07 +03:00
Amnon Heiman
b91013957e API: Flush should wait before returning
This addresses issue #154

The flush command should wait for the command's completion before returning.

This change replaces the for loop with a parallel_for_each, so it now
waits for all the flushes to complete before returning.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-09-17 14:48:51 +03:00
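
The behaviour described in b91013957e (start every flush, return only once all have completed) can be sketched outside Seastar. The example below uses Python's asyncio.gather as a stand-in for parallel_for_each; the function names and the fake flush body are assumptions for illustration only.

```python
import asyncio


async def flush(name, done):
    # Stands in for flushing one column family; the real work is disk I/O.
    await asyncio.sleep(0)
    done.append(name)


async def flush_all(names):
    done = []
    # gather() starts all flushes and resolves only after every one of them
    # has completed, so the caller cannot observe a half-finished flush.
    await asyncio.gather(*(flush(n, done) for n in names))
    return done
```

Calling `asyncio.run(flush_all(["cf1", "cf2"]))` returns after both flushes have finished, whereas a plain loop that merely starts each flush could return while some are still in flight.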
Avi Kivity
bad268c25e Update scylla-swagger-ui submodule name 2015-09-17 13:34:53 +03:00
Shlomi Livne
30d216e77e dist: fix generating archive with wrong file name for rpm
The tar file prefix needs to be only the version, without the release.
Without this fix I get
.
.
.

Finish: build setup for scylla-server-0.8-20150917.2d99476.fc21.src.rpm
Start: rpmbuild scylla-server-0.8-20150917.2d99476.fc21.src.rpm
Building target platforms: x86_64
Building for target x86_64
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.fA7nBm
+ umask 022
+ cd /builddir/build/BUILD
+ cd /builddir/build/BUILD
+ rm -rf scylla-server-0.8
+ /usr/bin/tar -xf /builddir/build/SOURCES/scylla-server-0.8-20150917.2d99476.tar
+ cd scylla-server-0.8
/var/tmp/rpm-tmp.fA7nBm: line 33: cd: scylla-server-0.8: No such file or directory

RPM build errors:
error: Bad exit status from /var/tmp/rpm-tmp.fA7nBm (%prep)
    Bad exit status from /var/tmp/rpm-tmp.fA7nBm (%prep)
ERROR: Exception(build/rpmbuild/SRPMS/scylla-server-0.8-20150917.2d99476.fc21.src.rpm) Config(fedora-21-x86_64) 4 minutes 17 seconds

Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-09-17 12:50:47 +03:00
Avi Kivity
c7a66b7ab2 Merge seastar upstream
* seastar dc5f2f5...680f37d (2):
  > reactor: reset inline continuation counter when starting a scheduler run
  > future: force inline more functions
2015-09-17 11:59:16 +03:00
Pekka Enberg
246df4e325 dist/redhat: Fix RPM package home page URL
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-09-17 11:51:29 +03:00
Asias He
0f5df4476c gossip: Make the timeout longer for gossip syn and echo message
When the cluster is under heavy load, exchanging a gossip message
might take longer than 1s. Let's make the timeout longer for now,
until we can solve the underlying large delay of gossip messages.
2015-09-17 11:35:31 +03:00
Asias He
2d99476bb1 storage_service: Fix schedule_schema_pull
It might block for a very long time. Don't wait for it; otherwise it will
block the whole gossip round.
2015-09-17 09:13:20 +03:00
Calle Wilund
ca0dac72b1 commitlog_test: fix test sync in test_commitlog_delete_when_over_disk_limit
Patch "Fix some timing/latency issues with sync" changed new_segment to
_not_ wait for flush to finish. This means that checking actual files on
disk in the test case might race.
Luckily, we can more or less just check the segment list instead
(added recently-ish).
2015-09-16 20:38:59 +03:00
Calle Wilund
b512192b3b Commitlog: Fix some timing/latency issues with sync
Refs #356

* Move sync time setting to sync initiate to help prevent double syncs
* Change add_mutation to only do explicit sync with wait if time elapsed
  since last is 2x sync window
* Do not wait for sync when moving to new segment in alloc path
* Initiate _sync_time properly.
* Add some tracing log messages to help debug
2015-09-16 20:07:25 +03:00
Raphael S. Carvalho
461ecc55e3 sstable: fix race condition when deleting a partial sstable
A race condition happens when two or more shards try to delete
the same partial sstable, so the problem doesn't affect scylla
when it boots with a single shard.
To fix this problem, shard 0 is made responsible for
deleting a partial sstable.

fixes #359.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-09-16 19:58:44 +03:00
Tomasz Grabiec
6fa49e8fbc Merge tag 'avi/cql-batching/v2' from seastar-dev.git
From Avi:

We currently send out each cql transport response in its own packet, which
is very inefficient.

Use a poller to schedule responses to be flushed out, which allows multiple
responses to be sent out in one packet, reducing tcp stack overhead.

I see ~50% improvement with this on my desktop (single core).
2015-09-16 16:56:47 +02:00
Raphael S. Carvalho
8b6319702e compaction_manager: recreate gate when task is stopped
Otherwise, a gate_closed_exception would be triggered when
resuming the task.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-09-16 17:54:43 +03:00
Avi Kivity
e3987eecd9 Merge seastar upstream
* seastar 8f8cfe9...dc5f2f5 (1):
  > core/reactor: Avoid idle time over-estimation
2015-09-16 17:40:33 +03:00
Gleb Natapov
ab5f52fde3 storage_proxy: lazily calculate digest from data results during query
Do not calculate digest from data on arrival, do it during digest
matching check, also skip it entirely if there is only one digest
to match.
2015-09-16 17:40:22 +03:00
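
The lazy digest calculation described in ab5f52fde3 can be sketched as follows. This is an illustration of the technique only: the class and function names are hypothetical, and SHA-256 is used as a placeholder hash rather than whatever digest the storage proxy actually computes.

```python
import hashlib


class DataResult:
    """A data response whose digest is computed on first use (hypothetical type)."""

    def __init__(self, payload: bytes):
        self.payload = payload
        self._digest = None
        self.digest_computations = 0   # instrumentation for the example only

    def digest(self):
        # Nothing is hashed on arrival; the digest is computed once, on demand.
        if self._digest is None:
            self.digest_computations += 1
            self._digest = hashlib.sha256(self.payload).digest()
        return self._digest


def digests_match(results):
    # With a single response there is nothing to compare, so no hash is computed.
    if len(results) <= 1:
        return True
    first = results[0].digest()
    return all(r.digest() == first for r in results[1:])
```

With one response the hash is never computed; with several, each response is hashed at most once, and only when the matching check actually runs.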
Calle Wilund
e001df0a35 Main: Resolve scylla.conf based on ENV vars + do more explicit error logging
Refs #135
2015-09-16 15:44:34 +03:00
Calle Wilund
d42ff89e83 Config: Promote logging of unhandled options to warning
Fixes #222
2015-09-16 15:43:53 +03:00
Calle Wilund
bf727b2272 config.cc : add logging of unset attributes
Helps checking for missing stuff in scylla.yaml
2015-09-16 15:43:35 +03:00
Calle Wilund
8172717ba0 config.hh : update some default values to match scylla.conf 2015-09-16 15:43:35 +03:00
Calle Wilund
ac6ebc0c14 scylla.yaml: Move supported options to supported and comment out rest
Fixes #350
2015-09-16 15:43:33 +03:00
Calle Wilund
bd14d40a35 main: configure logging before reading yaml as well as after
So that command line --log* options can affect config logging.
2015-09-16 15:43:32 +03:00
Calle Wilund
04562b23b4 commitlog_replayer: More correct fix for reordering issue in replay
* Removes previous, accidental fix that got committed.
* Instead just do not give RP:s to replay mutations. This is same as in Origin,
  and just as/more correct, since we intend to flush the data to sstables
  asap anyway
2015-09-16 15:41:17 +03:00
Takuya ASADA
74dafdf8eb dist: add scylla-jmx for AMI
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-09-16 15:40:12 +03:00
Avi Kivity
476af33ee7 Merge "server::process_batch" from Calle
"Fixes #332

Implementation of "native protocol message of type BATCH", i.e. Origin
BatchMessage."
2015-09-16 15:38:20 +03:00
Gleb Natapov
5f76cacc90 storage_proxy: handle some more exceptions during write 2015-09-16 14:08:11 +02:00
Avi Kivity
675c0bbdd4 transport: batch responses
Instead of flushing responses immediately, ask a reactor poller to flush
them for us.  This lets several responses be flushed out together in
one packet.
2015-09-16 15:04:19 +03:00
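
The batching approach in 675c0bbdd4 can be modelled with a toy poller. The sketch below is not the transport code: `Connection`, `Poller`, and the idea that one flush equals one packet are simplifying assumptions made to show why deferring the flush reduces the packet count.

```python
class Connection:
    """Toy connection: each flush emits one 'packet' (an assumption of the model)."""

    def __init__(self):
        self.pending = []   # responses queued since the last flush
        self.packets = []   # each entry models one packet on the wire

    def flush(self):
        if self.pending:
            self.packets.append(b"".join(self.pending))
            self.pending.clear()


class Poller:
    """Collects connections with unsent responses and flushes them when polled."""

    def __init__(self):
        self.to_flush = []

    def respond(self, conn, response):
        # Queue the response instead of flushing it immediately.
        conn.pending.append(response)
        if conn not in self.to_flush:
            self.to_flush.append(conn)

    def poll(self):
        # Called once per loop iteration: flush everything queued so far.
        batch, self.to_flush = self.to_flush, []
        for conn in batch:
            conn.flush()
```

Three responses queued between two polls leave the model as a single packet; flushing after each response would have produced three.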
Avi Kivity
518720fcbb transport: disable zero-copy
Zero-copy requires pushing a packet to the output stream, which defeats any
attempt at batching.  Disable it for now; we will revisit it later.
2015-09-16 14:19:23 +03:00
Asias He
0091d2fc43 rpm: Fix duplicated log message in syslog
Sep 09 02:43:34  scylla[7097]: [shard 0] gossip - Got GossipDigestSyn Reply
Sep 09 02:43:34  scylla[7097]: [shard 0] gossip - local heartbeat version 761 greater than 760 for 172.31.14.220
Sep 09 02:43:34  scylla[7097]: [shard 0] gossip - Sending a GossipDigestACK2 to 172.31.15.223:0
Sep 09 02:43:34  scylla[7097]: [shard 0] gossip - Got GossipDigestACK2 Reply
Sep 09 02:43:34  scylla[7097]: [shard 0] gossip - Performing status check ...
Sep 09 02:43:34  scylla[7097]: [shard 0] gossip - failure_detector: now=3138723072797, tlast=3137731235729, t=991837068, mean=1.00929e+09, phi=0.982712
Sep 09 02:43:34  scylla[7097]: [shard 0] gossip - failure_detector: PHI for 172.31.15.223 : 0.982712
Sep 09 02:43:34  scylla_run[7088]: TRACE   [shard 0] gossip - Got GossipDigestSyn Reply
Sep 09 02:43:34  scylla_run[7088]: TRACE   [shard 0] gossip - local heartbeat version 761 greater than 760 for 172.31.14.220
Sep 09 02:43:34  scylla_run[7088]: TRACE   [shard 0] gossip - Sending a GossipDigestACK2 to 172.31.15.223:0
Sep 09 02:43:34  scylla_run[7088]: TRACE   [shard 0] gossip - Got GossipDigestACK2 Reply
Sep 09 02:43:34  scylla_run[7088]: TRACE   [shard 0] gossip - Performing status check ...
Sep 09 02:43:34  scylla_run[7088]: DEBUG   [shard 0] gossip - failure_detector: now=3138723072797, tlast=3137731235729, t=991837068, mean=1.00929e+09, phi=0.982712
Sep 09 02:43:34  scylla_run[7088]: TRACE   [shard 0] gossip - failure_detector: PHI for 172.31.15.223 : 0.982712

Fixes #321
2015-09-16 11:51:44 +03:00
Avi Kivity
eb61a60434 Merge seastar upstream
* seastar ac54520...8f8cfe9 (2):
  > future: abort on scheduling failure
  > reactor: run any remaining tasks during stop
2015-09-16 11:49:33 +03:00
Asias He
1e7d883ae1 messaging_service: Fix shard_id
We should ignore equal and less than operators for shard_id as well.

In a 3-node cluster where each node has 4 cpus, on the first node:

Before:
[fedora@ip-172-30-0-99 ~]$ netstat -nt|grep 100\:7000
tcp        0      0 172.30.0.99:36998       172.30.0.100:7000 ESTABLISHED
tcp        0      0 172.30.0.99:36772       172.30.0.100:7000 ESTABLISHED
tcp        0      0 172.30.0.99:40125       172.30.0.100:7000 ESTABLISHED
tcp        0      0 172.30.0.99:60182       172.30.0.100:7000 ESTABLISHED
tcp        0      0 172.30.0.99:38013       172.30.0.100:7000 ESTABLISHED
tcp        0      0 172.30.0.99:51997       172.30.0.100:7000 ESTABLISHED
tcp        0      0 172.30.0.99:56532       172.30.0.100:7000 ESTABLISHED

After:
[fedora@ip-172-30-0-99 ~]$ netstat -nt|grep 100\:7000
tcp        0      0 172.30.0.99:45661       172.30.0.100:7000 ESTABLISHED
tcp        0      0 172.30.0.99:57395       172.30.0.100:7000 ESTABLISHED
tcp        0      0 172.30.0.99:37807       172.30.0.100:7000 ESTABLISHED
tcp        0     36 172.30.0.99:50567       172.30.0.100:7000 ESTABLISHED

Each shard of a node is supposed to have 1 connection to a peer node,
thus each node will have #cpu connections to a peer node.

With this patch, the cluster is much more stable than before on AWS. So
far, I see no timeout in the gossip syn message exchange.
2015-09-16 08:44:47 +02:00
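
One reading of 1e7d883ae1 is that the key used to look up a peer connection should compare (and order) on the peer address alone, so that a shard never opens a second connection to the same peer. The sketch below illustrates that idea with a dataclass whose second field is excluded from comparison; the field names `addr` and `cpu_id` and the per-shard dictionary are assumptions, not taken from the commit.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class ShardId:
    addr: str
    # Excluded from ==, <, and hash, so it cannot create distinct keys.
    cpu_id: int = field(default=0, compare=False)


def get_connection(connections, peer):
    # One dictionary per local shard; object() stands in for a real connection.
    return connections.setdefault(peer, object())
```

Looking up `ShardId("172.30.0.100", 0)` and `ShardId("172.30.0.100", 3)` in the same dictionary yields the same connection object, which matches the "1 connection per shard to a peer node" expectation stated in the commit message.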
Paweł Dziepak
33e395d677 cql3: align error message for null partition key with origin
Fixes #328.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-09-16 08:31:20 +02:00
Avi Kivity
2a09788022 Merge seastar upstream
* seastar d71f1b0...ac54520 (1):
  > future: allow repeat() at least one execution
2015-09-15 19:15:54 +03:00
Avi Kivity
4bf074d944 Merge seastar upstream
* seastar 84cf099...d71f1b0 (2):
  > reactor: switch to time-based task quotas
  > reactor: fold ::task_quota into future_avail_count
2015-09-15 16:01:40 +03:00