53948 Commits

Author SHA1 Message Date
Pekka Enberg
246df4e325 dist/redhat: Fix RPM package home page URL
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-09-17 11:51:29 +03:00
Asias He
0f5df4476c gossip: Make the timeout longer for gossip syn and echo message
When the cluster is under heavy load, the time to exchange a gossip
message might take longer than 1s. Let's make the timeout longer for now
before we can solve the large delay of gossip message issue.
2015-09-17 11:35:31 +03:00
Asias He
2d99476bb1 storage_service: Fix schedule_schema_pull
It might block for a very long time. Don't wait for it; otherwise it will
block the whole gossip round.
2015-09-17 09:13:20 +03:00
Calle Wilund
ca0dac72b1 commitlog_test: fix test sync in test_commitlog_delete_when_over_disk_limit
Patch "Fix some timing/latency issues with sync" changed new_segment to
_not_ wait for flush to finish. This means that checking actual files on
disk in the test case might race.
Luckily, we can more or less just check the segment list instead
(added recently-ish)
2015-09-16 20:38:59 +03:00
Calle Wilund
b512192b3b Commitlog: Fix some timing/latency issues with sync
Refs #356

* Move sync time setting to sync initiate to help prevent double syncs
* Change add_mutation to only do explicit sync with wait if time elapsed
  since last is 2x sync window
* Do not wait for sync when moving to new segment in alloc path
* Initiate _sync_time properly.
* Add some tracing log messages to help debug
2015-09-16 20:07:25 +03:00
Raphael S. Carvalho
461ecc55e3 sstable: fix race condition when deleting a partial sstable
Race condition happens when two or more shards try to delete
the same partial sstable, so the problem doesn't affect scylla
when it boots with a single shard.
To fix this problem, shard 0 is made responsible for
deleting a partial sstable.

fixes #359.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-09-16 19:58:44 +03:00
Tomasz Grabiec
6fa49e8fbc Merge tag 'avi/cql-batching/v2' from seastar-dev.git
From Avi:

We currently send out each cql transport response in its own packet, which
is very inefficient.

Use a poller to schedule responses to be flushed out, which allows multiple
responses to be sent out in one packet, reducing tcp stack overhead.

I see ~50% improvement with this on my desktop (single core).
2015-09-16 16:56:47 +02:00
Raphael S. Carvalho
8b6319702e compaction_manager: recreate gate when task is stopped
Otherwise, a gate_closed_exception would be triggered when
resuming the task.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-09-16 17:54:43 +03:00
Avi Kivity
e3987eecd9 Merge seastar upstream
* seastar 8f8cfe9...dc5f2f5 (1):
  > core/reactor: Avoid idle time over-estimation
2015-09-16 17:40:33 +03:00
Gleb Natapov
ab5f52fde3 storage_proxy: lazily calculate digest from data results during query
Do not calculate digest from data on arrival, do it during digest
matching check, also skip it entirely if there is only one digest
to match.
2015-09-16 17:40:22 +03:00
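The lazy digest calculation described in commit ab5f52fde3 can be sketched roughly as below. This is a minimal Python illustration and not the storage_proxy code: the names `DataResult` and `digests_match`, the use of SHA-256, and the reading of "only one digest to match" as "no other replica digests to compare against" are all assumptions made for the example.

```python
import hashlib
from typing import List, Optional


class DataResult:
    """A data response whose digest is computed only on first use."""

    def __init__(self, payload: bytes) -> None:
        self.payload = payload
        self._digest: Optional[bytes] = None
        self.digest_computations = 0  # instrumentation for this example only

    def digest(self) -> bytes:
        # Nothing is hashed on arrival; the work happens on the first call.
        if self._digest is None:
            self.digest_computations += 1
            # Hash function is an assumption; the real one may differ.
            self._digest = hashlib.sha256(self.payload).digest()
        return self._digest


def digests_match(data: DataResult, replica_digests: List[bytes]) -> bool:
    """Compare the data result against digests received from other replicas."""
    if not replica_digests:
        # Only one digest in play: nothing to compare, so skip hashing entirely.
        return True
    return all(d == data.digest() for d in replica_digests)
```

With this shape, a query answered by a single replica never pays for hashing, and a query with several digests hashes the data at most once.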
Calle Wilund
e001df0a35 Main: Resolve scylla.conf based on ENV vars + do more explicit error logging
Refs #135
2015-09-16 15:44:34 +03:00
Calle Wilund
d42ff89e83 Config: Promote logging of unhandled options to warning
Fixes #222
2015-09-16 15:43:53 +03:00
Calle Wilund
bf727b2272 config.cc : add logging of unset attributes
Helps checking for missing stuff in scylla.yaml
2015-09-16 15:43:35 +03:00
Calle Wilund
8172717ba0 config.hh : update some default values to match scylla.conf
2015-09-16 15:43:35 +03:00
Calle Wilund
ac6ebc0c14 scylla.yaml: Move supported options to supported and comment out rest
Fixes #350
2015-09-16 15:43:33 +03:00
Calle Wilund
bd14d40a35 main: configure logging before reading yaml as well as after
So that command line --log* options can affect config logging.
2015-09-16 15:43:32 +03:00
Calle Wilund
04562b23b4 commitlog_replayer: More correct fix for reordering issue in replay
* Removes previous, accidental fix that got committed.
* Instead just do not give RP:s to replay mutations. This is same as in Origin,
  and just as/more correct, since we intend to flush the data to sstables
  asap anyway
2015-09-16 15:41:17 +03:00
Takuya ASADA
74dafdf8eb dist: add scylla-jmx for AMI
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-09-16 15:40:12 +03:00
Avi Kivity
476af33ee7 Merge "server::process_batch" from Calle
"Fixes #332

Implementation of "native protocol message of type BATCH", i.e. Origin
BatchMessage."
2015-09-16 15:38:20 +03:00
Gleb Natapov
5f76cacc90 storage_proxy: handle some more exceptions during write
2015-09-16 14:08:11 +02:00
Avi Kivity
675c0bbdd4 transport: batch responses
Instead of flushing responses immediately, ask a reactor poller to flush
them for us.  This lets several responses be flushed out together in
one packet.
2015-09-16 15:04:19 +03:00
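The response batching described in commit 675c0bbdd4 can be illustrated with a small sketch. It is written in Python rather than against the Seastar reactor API, and the names `ResponseBatcher`, `send`, `poll`, and `write_packet` are hypothetical; the point is only that responses accumulate until a poller flushes them as one write.

```python
from typing import Callable, List


class ResponseBatcher:
    """Queue responses and flush them together when the poller runs."""

    def __init__(self, write_packet: Callable[[bytes], None]) -> None:
        self._write_packet = write_packet  # stand-in for the output stream
        self._pending: List[bytes] = []

    def send(self, response: bytes) -> None:
        # Do not flush here; just remember the response.
        self._pending.append(response)

    def poll(self) -> bool:
        """Called once per event-loop iteration; returns True if work was done."""
        if not self._pending:
            return False
        packet = b"".join(self._pending)
        self._pending.clear()
        self._write_packet(packet)  # one write for all queued responses
        return True


if __name__ == "__main__":
    packets: List[bytes] = []
    batcher = ResponseBatcher(packets.append)
    batcher.send(b"resp1;")
    batcher.send(b"resp2;")
    batcher.poll()
    print(packets)
```

The trade-off is a small added delay (up to one poll interval) in exchange for fewer packets.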
Avi Kivity
518720fcbb transport: disable zero-copy
Zero-copy requires pushing a packet to the output stream, which defeats any
attempt at batching.  Disable it for now; we will revisit it later.
2015-09-16 14:19:23 +03:00
Asias He
0091d2fc43 rpm: Fix duplicated log message in syslog
Sep 09 02:43:34  scylla[7097]: [shard 0] gossip - Got GossipDigestSyn Reply
Sep 09 02:43:34  scylla[7097]: [shard 0] gossip - local heartbeat version 761 greater than 760 for 172.31.14.220
Sep 09 02:43:34  scylla[7097]: [shard 0] gossip - Sending a GossipDigestACK2 to 172.31.15.223:0
Sep 09 02:43:34  scylla[7097]: [shard 0] gossip - Got GossipDigestACK2 Reply
Sep 09 02:43:34  scylla[7097]: [shard 0] gossip - Performing status check ...
Sep 09 02:43:34  scylla[7097]: [shard 0] gossip - failure_detector: now=3138723072797, tlast=3137731235729, t=991837068, mean=1.00929e+09, phi=0.982712
Sep 09 02:43:34  scylla[7097]: [shard 0] gossip - failure_detector: PHI for 172.31.15.223 : 0.982712
Sep 09 02:43:34  scylla_run[7088]: TRACE   [shard 0] gossip - Got GossipDigestSyn Reply
Sep 09 02:43:34  scylla_run[7088]: TRACE   [shard 0] gossip - local heartbeat version 761 greater than 760 for 172.31.14.220
Sep 09 02:43:34  scylla_run[7088]: TRACE   [shard 0] gossip - Sending a GossipDigestACK2 to 172.31.15.223:0
Sep 09 02:43:34  scylla_run[7088]: TRACE   [shard 0] gossip - Got GossipDigestACK2 Reply
Sep 09 02:43:34  scylla_run[7088]: TRACE   [shard 0] gossip - Performing status check ...
Sep 09 02:43:34  scylla_run[7088]: DEBUG   [shard 0] gossip - failure_detector: now=3138723072797, tlast=3137731235729, t=991837068, mean=1.00929e+09, phi=0.982712
Sep 09 02:43:34  scylla_run[7088]: TRACE   [shard 0] gossip - failure_detector: PHI for 172.31.15.223 : 0.982712

Fixes #321
2015-09-16 11:51:44 +03:00
Avi Kivity
eb61a60434 Merge seastar upstream
* seastar ac54520...8f8cfe9 (2):
  > future: abort on scheduling failure
  > reactor: run any remaining tasks during stop
2015-09-16 11:49:33 +03:00
Asias He
1e7d883ae1 messaging_service: Fix shard_id
We should ignore equal and less than operators for shard_id as well.

In a 3-node cluster where each node has 4 cpus, on the first node:

Before:
[fedora@ip-172-30-0-99 ~]$ netstat -nt|grep 100\:7000
tcp        0      0 172.30.0.99:36998       172.30.0.100:7000 ESTABLISHED
tcp        0      0 172.30.0.99:36772       172.30.0.100:7000 ESTABLISHED
tcp        0      0 172.30.0.99:40125       172.30.0.100:7000 ESTABLISHED
tcp        0      0 172.30.0.99:60182       172.30.0.100:7000 ESTABLISHED
tcp        0      0 172.30.0.99:38013       172.30.0.100:7000 ESTABLISHED
tcp        0      0 172.30.0.99:51997       172.30.0.100:7000 ESTABLISHED
tcp        0      0 172.30.0.99:56532       172.30.0.100:7000 ESTABLISHED

After:
[fedora@ip-172-30-0-99 ~]$ netstat -nt|grep 100\:7000
tcp        0      0 172.30.0.99:45661       172.30.0.100:7000 ESTABLISHED
tcp        0      0 172.30.0.99:57395       172.30.0.100:7000 ESTABLISHED
tcp        0      0 172.30.0.99:37807       172.30.0.100:7000 ESTABLISHED
tcp        0     36 172.30.0.99:50567       172.30.0.100:7000 ESTABLISHED

Each shard of a node is supposed to have 1 connection to a peer node,
thus each node will have #cpu connections to a peer node.

With this patch, the cluster is much more stable than before on AWS. So
far, I see no timeout in the gossip syn message exchange.
2015-09-16 08:44:47 +02:00
Paweł Dziepak
33e395d677 cql3: align error message for null partition key with origin
Fixes #328.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-09-16 08:31:20 +02:00
Avi Kivity
2a09788022 Merge seastar upstream
* seastar d71f1b0...ac54520 (1):
  > future: allow repeat() at least one execution
2015-09-15 19:15:54 +03:00
Avi Kivity
4bf074d944 Merge seastar upstream
* seastar 84cf099...d71f1b0 (2):
  > reactor: switch to time-based task quotas
  > reactor: fold ::task_quota into future_avail_count
2015-09-15 16:01:40 +03:00
Calle Wilund
7d502e2301 transport/server.cc: Implement process_batch
Equivalent of Origin's BatchMessage.
2015-09-15 11:20:16 +02:00
Calle Wilund
dc7a8faa0f query_processor: Add process_batch
More or less identical to Origin's version.
2015-09-15 11:20:14 +02:00
Calle Wilund
7f3ce3935e query_options: Add constructor for batch mode options (multi-level)
Added explicit move constructors as well as prohibit copy to help
disambiguate the constructor delegation
2015-09-15 11:20:13 +02:00
Calle Wilund
27421d55bf stream_session: Fix use of query_options::DEFAULT
Make (apparently dead?) test routine (not in test class)stream_session::test
use query_options::DEFAULT the way it is intended. Not copy it (semantically
prohibited, but accidentally possible in code)
2015-09-15 11:19:47 +02:00
Avi Kivity
58a76ae04c Merge seastar upstream
* seastar aa18f5c...84cf099 (1):
  > rpc: do not wait for data to be sent
2015-09-14 19:57:07 +03:00
Pekka Enberg
c4323c306f configure.py: Fix CXXFLAGS when extra flags are specified
We need to specify all flags, not just the extra ones.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-09-14 14:02:16 +03:00
Avi Kivity
1bb9fbc85a Merge "Version numbering" from Pekka
"This series implements version numbering for the "scylla" executable as
well as the release RPM. Fixes #306."
2015-09-14 12:49:35 +03:00
Pekka Enberg
9790cafe49 dist/redhat: Use generated version number in spec file
Fix the hard-coded version number from RPM spec file by using the
SCYLLA-VERSION-GEN script.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-09-14 11:35:32 +03:00
Pekka Enberg
eab6094124 main: Print version number at startup
Now that we have a version number, let's tell the world about it!

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-09-14 11:35:32 +03:00
Pekka Enberg
5ef77a8a56 build: Add version number generation
This adds version number generation in the build system. Version numbers
follow the format:

  <version>-<release>

where release consists of:

  <date>-<git-hash>

The version and release numbers are generated by the SCYLLA-VERSION-GEN
script and they are stored in SCYLLA-VERSION-FILE and
SCYLLA-RELEASE-FILE files so that other parts of the build system can
easily pick them up.

For builds that happen from release tarballs, for example,
SCYLLA-VERSION-GEN looks for a "version" file in the tree and just uses
that.

Basically, we're doing pretty much the same as Git is doing in its build
system.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-09-14 11:23:31 +03:00
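The version format described in commit 5ef77a8a56 can be sketched as follows. This is a minimal Python illustration, not the SCYLLA-VERSION-GEN script itself: the function names, the YYYYMMDD date layout, the 7-character hash abbreviation, and the sample version "0.8" are assumptions made for the example.

```python
from datetime import date


def make_release(build_date: date, git_hash: str) -> str:
    """Compose <date>-<git-hash>. Date layout and hash length are assumed."""
    return f"{build_date.strftime('%Y%m%d')}-{git_hash[:7]}"


def make_full_version(version: str, build_date: date, git_hash: str) -> str:
    """Compose <version>-<release>, where release is <date>-<git-hash>."""
    return f"{version}-{make_release(build_date, git_hash)}"


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(make_full_version("0.8", date(2015, 9, 14), "5ef77a8a56"))
```

Keeping the version and release parts separate mirrors the commit's split into SCYLLA-VERSION-FILE and SCYLLA-RELEASE-FILE, so that packaging can consume each part on its own.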
Avi Kivity
7e1d03d098 db: delete ignored sstables
If an sstable is irrelevant for a shard, delete it.  The deletion will
only complete when all shards agree (either ignore the sstable or
delete it after compaction).
2015-09-14 10:14:00 +02:00
Amnon Heiman
e2501aa64c API: Compaction manager to return 0 for number of compaction
Until there is an API for the compaction manager, the API returns 0
for the total number of compactions.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-09-13 13:28:31 +03:00
Amnon Heiman
c06eb6b8c8 API: Adding stub and functionality to column family
The following functions were added to column family:

is_auto_compaction_disabled
get_built_indexes
get_compression_metadata_off_heap_memory_used
get_compression_parameters
get_compression_ratio
get_read_latency_estimated_histogram
get_write_latency_estimated_histogram

And the get and set compaction strategy methods and a stub
implementation for the compression parameter, crc chec and sstable
count.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-09-13 13:28:14 +03:00
Amnon Heiman
520e96c634 compaction strategy: Return the compaction type
The compaction strategy was modified to return its compaction type.
The type method calls the virtual impl type method. Each of the
implementations returns its type.

A name method was added to the compaction strategy that returns the name
according to the strategy type.

And the static type method was modified to receive a const reference to
the string.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-09-13 13:19:52 +03:00
Amnon Heiman
497e403387 API: Workaround for load_map
The get_load_map method should return a map between node addresses and
their load. In origin the implementation is based on the load
broadcaster that we currently do not have.

This workaround returns a map with a single entry of the current node
address and its load.
2015-09-13 12:50:07 +03:00
Avi Kivity
cab2148141 Merge "partial sstable handling" from Raphael
closes issue #75.
2015-09-13 12:03:50 +03:00
Raphael S. Carvalho
e65c91f324 db: avoid possible underflow on stats pending_compactions
In event of a compaction failure, run_compaction would be called
more than one time for a request, which could result in an
underflow in the stats pending_compactions.
Let's fix that by only decreasing it if compaction succeeded.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-09-13 11:59:34 +03:00
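The fix described in commit e65c91f324 amounts to decrementing the counter only on success. A minimal Python sketch of that idea follows; `CompactionStats`, `submit`, and the `compact` callable are hypothetical names for illustration and do not reflect the actual C++ signatures.

```python
from typing import Callable


class CompactionStats:
    def __init__(self) -> None:
        self.pending_compactions = 0


def submit(stats: CompactionStats) -> None:
    """Register one compaction request."""
    stats.pending_compactions += 1


def run_compaction(stats: CompactionStats, compact: Callable[[], None]) -> bool:
    """Run one attempt; the counter is decreased only if the attempt succeeds."""
    try:
        compact()
    except Exception:
        # The request stays pending and may be retried, so a retry cannot
        # decrement the counter a second time for the same request.
        return False
    stats.pending_compactions -= 1
    return True
```

If the counter were decreased on every attempt, one request that failed once and then succeeded would subtract two for a single increment, which is the underflow the commit message describes.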
Gleb Natapov
17e54d0604 add logger for consistency level calculation
2015-09-13 11:59:17 +03:00
Avi Kivity
440cf4c94e Merge seastar upstream
* seastar 49989ca...aa18f5c (11):
  > stream: workaround native network stack drops
  > build: fix sanitize=vptr auto-disable
  > tests: test thread scheduling groups
  > thread: scheduling groups
  > thread: introduce thread_attributes
  > reactor: make later() more fair
  > reactor: introduce force_poll()
  > core: move later() out of line
  > test futurize
  > fix futurize<void> for the case in which Func returns a future
  > futures_test: silence exceptional future ignored messages

Fixes #187.
2015-09-13 11:43:08 +03:00
Raphael S. Carvalho
cdb31a0b4a sstable: kill temporary_filename
We no longer need this functionality after TemporaryTOC.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-09-13 03:32:21 -03:00
Raphael S. Carvalho
538611ab93 sstable: delete sstable generation with temporary toc file
When populating a column family, we will now delete all components
of a sstable with a temporary toc file. A sstable with a temporary
TOC file means that it was partially written, and can be safely
deleted because the respective data is either saved in the commit
log, or in the compacted sstables in case of the partial sstable
being the result of a compaction.
Deletion procedure is guarded against power failure by only deleting
the temporary TOC file after all other components were deleted.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-09-13 03:17:58 -03:00
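The deletion order described in commit 538611ab93 can be sketched as below. This is a Python illustration under stated assumptions: the `TOC.txt.tmp` suffix, the `ks-cf-ka-1-` file prefix, and the function name are hypothetical, and the sketch omits the directory syncs a real implementation would need for durability.

```python
import os
import tempfile
from typing import List

# Hypothetical marker for the temporary TOC component; real naming may differ.
TEMP_TOC_SUFFIX = "TOC.txt.tmp"


def delete_partial_sstable(directory: str, prefix: str) -> List[str]:
    """Delete every component of one sstable generation, temporary TOC last.

    Returns the file names in the order they were removed.
    """
    names = sorted(n for n in os.listdir(directory) if n.startswith(prefix))
    toc = [n for n in names if n.endswith(TEMP_TOC_SUFFIX)]
    others = [n for n in names if not n.endswith(TEMP_TOC_SUFFIX)]
    removed = []
    # If we crash inside this loop, the temporary TOC still exists, so the
    # next boot again sees a partial sstable and can retry the deletion.
    for name in others + toc:
        os.remove(os.path.join(directory, name))
        removed.append(name)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for comp in ("Data.db", "Index.db", TEMP_TOC_SUFFIX):
            open(os.path.join(d, "ks-cf-ka-1-" + comp), "w").close()
        print(delete_partial_sstable(d, "ks-cf-ka-1-"))
```

Removing the marker last means the marker's presence always implies "cleanup still required", which is what makes the procedure safe to restart.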
Raphael S. Carvalho
7677202700 db: handle temporary TOC file when populating cf
When populating a cf, we should also check for a sstable with
temporary TOC file, and act accordingly. For the time being,
we will only refuse to boot. Subsequent work is to gather all
files of a sstable with a temporary TOC file and delete them.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-09-13 03:03:30 -03:00