Commit Graph

53948 Commits

Author SHA1 Message Date
Asias He
ce927105d8 db/system_keyspace: Implement update_local_tokens 2015-08-12 07:50:26 +08:00
Asias He
b3f7507e0a storage_service: Enable gossiper.replacement_quarantine in handle_state_normal 2015-08-12 07:50:26 +08:00
Asias He
4d3f333ec0 storage_service: Enable call to remove_endpoint in on_remove 2015-08-12 07:50:26 +08:00
Asias He
95dd307597 db/system_keyspace: Remove duplicated commented out code
I'm not sure what happened. We have the same commented-out code in both .hh
and .cc. It is very confusing when enabling some of the code. Let's
remove the duplicated code in .cc and leave it in .hh only.
2015-08-12 07:50:26 +08:00
Asias He
96fe749141 db/system_keyspace: Stub get_bootstrap_state and friends 2015-08-12 07:50:26 +08:00
Asias He
951c0d192b storage_service: Enable _is_survey_mode logic in join_token_ring 2015-08-12 07:50:26 +08:00
Asias He
6874663c9d storage_service: Enable current in prepare_to_join 2015-08-12 07:50:26 +08:00
Asias He
645700d261 storage_service: Implement join_ring
Join the ring by operator request.
2015-08-12 07:50:26 +08:00
Asias He
3ea91504ba storage_service: Enable get_saved_tokens and get_initial_tokens 2015-08-12 07:50:26 +08:00
Asias He
5cb5050ca1 system_keyspace: Stub get_saved_tokens 2015-08-12 07:50:26 +08:00
Raphael S. Carvalho
ce2fea2976 tests: add basic test to compaction manager functionality
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-11 20:14:48 +03:00
Gleb Natapov
36c7c2ac5f Provide correct data_present for read_timeout_exception
Fix fixmes.
2015-08-11 19:45:59 +03:00
Gleb Natapov
6046316352 Untemplatize continuation in storage_proxy::query
Nothing wrong with it besides that it crashes my Eclipse indexer for some
reason.
2015-08-11 19:45:59 +03:00
Avi Kivity
09ae029f87 Merge seastar upstream
* seastar 8f97b50...7e7cef2 (1):
  > future: make futurize::apply() more flexible
2015-08-11 18:44:28 +03:00
Avi Kivity
ca6ab0a6d1 Merge "Implement "atomic" batch statement processing" from Calle
"I.e. implement storage_proxy::mutate_atomically, which in turn means
roughly the same as mutate, with write/remove from the batchlog table
intermixed.

This patch restructures some stuff in storage_proxy to avoid too much code
duplication, with the assumption (amongst others) that dead nodes will be few
etc."
2015-08-11 18:33:34 +03:00
Avi Kivity
8d449190d1 thrift: handle exceptions thrown in non-continuation part of thrift handler
Our thrift code performs an elaborate dance to convert a result/exception
reported in a future<> to the cob/exn_cob flow required by the thrift
library.  However, if the exception is thrown before the first continuation,
no one will catch it and it will be leaked, eventually resulting in a crash.

Fix by replacing the complete() infrastructure, which took a future as a
parameter, with a with_cob() helper that instead takes a function to
execute.  This allows it to catch both exceptions thrown directly and
exceptions reported via the future.

Fixes #133.
2015-08-11 18:29:58 +03:00
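The idea described in 8d449190d1 (take a function rather than a future, so that a throw before the first continuation is caught too) can be illustrated without Seastar or Thrift. The following is only a sketch under stated assumptions: toy_future, ready, failed and with_cob_sketch are hypothetical names, and the real with_cob() helper may look quite different.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a future<T> that has already resolved:
// it holds either a value or a stored exception.
template <typename T>
struct toy_future {
    std::shared_ptr<T> value;
    std::exception_ptr error;

    T get() const {
        if (error) {
            std::rethrow_exception(error);
        }
        return *value;
    }
};

template <typename T>
toy_future<T> ready(T v) {
    toy_future<T> f;
    f.value = std::make_shared<T>(std::move(v));
    return f;
}

template <typename T>
toy_future<T> failed(std::exception_ptr e) {
    toy_future<T> f;
    f.error = e;
    return f;
}

// Runs func inside the try block, so an exception thrown directly by func
// and an exception carried by the returned future both reach exn_cob.
// The success callback runs outside the try block, so a throwing cob is
// not misreported as a handler failure.
template <typename T>
void with_cob_sketch(std::function<void(const T&)> cob,
                     std::function<void(std::exception_ptr)> exn_cob,
                     std::function<toy_future<T>()> func) {
    std::unique_ptr<T> result;
    try {
        toy_future<T> f = func();       // may throw before any future exists
        result.reset(new T(f.get()));   // may rethrow a stored exception
    } catch (...) {
        exn_cob(std::current_exception());
        return;
    }
    cob(*result);
}
```

A helper that received an already-built future could only ever see the second kind of failure, which is the gap the commit message describes.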
Calle Wilund
81f2f80963 Add batchlog_manager to cqlenv + add batch statement test to cql test 2015-08-11 17:10:18 +02:00
Calle Wilund
b7c7c97295 StorageProxy: implement mutate_atomically
Atomically == add to batch log before doing actual mutate
2015-08-11 17:10:17 +02:00
Calle Wilund
7cacf6382f main: initialize global batchlog_manager in startup code 2015-08-11 17:10:17 +02:00
Calle Wilund
9a52ad84b1 BatchlogManager: make blm globally reachable distributed like other objects 2015-08-11 17:10:17 +02:00
Calle Wilund
0ded44eeee BatchlogManager: make endpoint_filter method + implement 2015-08-11 17:10:16 +02:00
Glauber Costa
799a6b5962 sstables: change summary_la to summary_ka
What we implement is ka, not la. Since the summary is the one element that
actually changed in the 2.2 implementation, it is particularly important that
we get this one right. I have previously missed this.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-11 17:47:48 +03:00
Raphael S. Carvalho
9823164c89 db: introduce compaction manager
Currently, each column family creates a fiber to handle compaction requests
in parallel to the system. If there are N column families, N compactions
could be running in parallel, which is definitely horrible.

To solve that problem, a per-database compaction manager is introduced here.

Compaction manager is a feature used to service compaction requests from N
column families. Parallelism is made available by creating more than one
fiber to service the requests. That being said, N compaction requests will
be served by M fibers.

A compaction request being submitted will go to a job queue shared between
all fibers, and the fiber with the lowest amount of pending jobs will be
signalled.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-11 17:25:46 +03:00
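The scheme in 9823164c89 (N column families, M fibers, one shared job queue, signal the least-loaded fiber) can be modelled in a few lines. This is a single-threaded toy, not the Seastar-based implementation: toy_compaction_manager, submit and run_one are hypothetical names, "signalling" is reduced to bumping a counter, and at least one worker is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Toy model: M workers ("fibers") serve requests from N column families
// through one shared FIFO queue. Assumes M >= 1.
struct toy_compaction_manager {
    struct worker {
        std::size_t pending = 0;  // jobs this worker has been signalled for
    };

    std::vector<worker> workers;
    std::deque<std::string> queue;  // shared queue of column family names

    explicit toy_compaction_manager(std::size_t m) : workers(m) {}

    // Enqueue a request and "signal" the worker with the fewest pending
    // jobs. Returns the index of the signalled worker.
    std::size_t submit(std::string column_family) {
        queue.push_back(std::move(column_family));
        auto it = std::min_element(
            workers.begin(), workers.end(),
            [](const worker& a, const worker& b) { return a.pending < b.pending; });
        ++it->pending;
        return static_cast<std::size_t>(it - workers.begin());
    }

    // A signalled worker takes the next job from the shared queue.
    // Returns false if this worker has nothing to do.
    bool run_one(std::size_t idx, std::string& out) {
        if (queue.empty() || workers[idx].pending == 0) {
            return false;
        }
        out = std::move(queue.front());
        queue.pop_front();
        --workers[idx].pending;
        return true;
    }
};
```

Because the queue is shared, the worker that is signalled for a request is not necessarily the one that ends up running it; the pending count only balances wake-ups.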
Avi Kivity
3c5f12bb97 Merge seastar upstream
* seastar a1933df...8f97b50 (1):
  > sharded: fix move assignment operator
2015-08-11 17:03:20 +03:00
Avi Kivity
e00deca209 Merge "CQL protocol fixes" from Pekka
"Various fixes to CQL protocol error handling."
2015-08-11 14:49:21 +03:00
Pekka Enberg
d461443bf2 transport/server: Fix error handling in parse_frame()
Throw a protocol exception instead of killing the process with abort().

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-11 14:33:30 +03:00
Pekka Enberg
48abeefdda transport/server: Fix connection version mismatch validation
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-11 14:33:30 +03:00
Pekka Enberg
bb067782d7 transport/server: Fix CQL binary protocol version validation
We only support up to CQL protocol v3. Fixes #117.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-11 14:33:30 +03:00
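A minimal sketch of the kind of check bb067782d7 and d461443bf2 describe: reject unsupported protocol versions by throwing an exception instead of aborting. In the CQL binary protocol the first header byte carries the version in its low 7 bits and the direction in the top bit. The names protocol_error_sketch and checked_version are hypothetical, and treating version 0 as invalid is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical exception type standing in for a CQL protocol exception.
struct protocol_error_sketch : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// "We only support up to CQL protocol v3."
constexpr std::uint8_t max_supported_version = 3;

// Extracts the version from the first frame header byte and validates it.
// Throws instead of calling abort(), so the caller can turn the failure
// into a protocol-level error reply.
inline std::uint8_t checked_version(std::uint8_t first_header_byte) {
    const std::uint8_t version = first_header_byte & 0x7f;  // drop direction bit
    if (version == 0 || version > max_supported_version) {
        throw protocol_error_sketch("Invalid or unsupported protocol version: "
                                    + std::to_string(version));
    }
    return version;
}
```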
Pekka Enberg
ad041207a4 transport/server: Catch all request processing errors
We need to also catch exceptions in top-level connection::process() so
that they are converted to proper CQL protocol errors.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-11 14:33:21 +03:00
Calle Wilund
b7cdd189e7 BatchlogManager: make constructible from distributed<db> (to fit main init) 2015-08-11 09:46:59 +02:00
Amnon Heiman
dab068dde9 API: modify column family API to use the histogram
With the change in column_family stats, the API needs to get the counter
from the read and write histogram.

It also adds the implementation for the read and write latency histogram.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-11 10:21:22 +03:00
Amnon Heiman
17ebebf268 API: When combining histogram, return zeroed histogram on empty
This change makes sure that when there are no results (i.e. all the
histograms that are summed are empty) the returned result will be a zeroed
histogram and not an empty object.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-11 10:21:22 +03:00
Amnon Heiman
4329377556 column_family to use histogram for read and write latency
With the use of sparse histogram, the read and write counters in the
column_family stats can be used.

The total impact on performance should be negligible.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-11 10:21:22 +03:00
Amnon Heiman
3ef36681cc API: Adding read, write latency histogram to column_family
This adds the latency histogram to the column_family swagger
definitions.
The definitions are based on the ColumnFamilyMetrics.
It adds the following commands:

get_read_latency_histogram
get_all_read_latency_histogram
get_write_latency_histogram
get_all_write_latency_histogram

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-11 10:21:22 +03:00
Amnon Heiman
bd9a758b80 Utils: Support sample based histogram
The histogram object is used both as a general counter for the number of
events and for statistics and sampling.

This changes the histogram implementation so that it supports sparse
sampling while keeping the total number of events accurate.

The implementation includes the following:
Remove the template nature of the histogram, as it is used only for
timers, and use the name ihistogram instead.

If in the future we need a histogram for other types, we can use the
histogram name for it.

A total counter was added that counts the number of events that are part
of the statistics calculation.

Helper methods were added to the ihistogram to handle the latency
counter object.

According to the sample mask, it marks the latency object as started
if the counter and the mask are non zero, and it accepts the latency
object in its mark method: if the latency was not started, it
is not added and only the 'count' counter that counts the total
number of events is incremented.

This should reduce the impact of latency calculation to a negligible
effect.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-11 10:00:53 +03:00
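The sampling scheme described in bd9a758b80 might be sketched as follows. Everything here is illustrative: toy_latency_counter and toy_ihistogram are hypothetical names, and the sampling condition is one reading of "the counter and the mask are non zero", taken here as (count & sample_mask) != 0. The actual ihistogram may differ.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical latency object: only measures time if it was started.
struct toy_latency_counter {
    using clock = std::chrono::steady_clock;
    clock::time_point start_time{};
    bool started = false;

    void start() {
        start_time = clock::now();
        started = true;
    }
    bool is_start() const { return started; }
    std::int64_t elapsed_ns() const {
        return std::chrono::duration_cast<std::chrono::nanoseconds>(
                   clock::now() - start_time).count();
    }
};

// Hypothetical sparse-sampling histogram: `count` sees every event,
// `total` and `sum_ns` only see the sampled ones.
struct toy_ihistogram {
    std::uint64_t count = 0;   // all events
    std::uint64_t total = 0;   // events included in the statistics
    std::int64_t sum_ns = 0;   // accumulated latency of sampled events
    std::uint64_t sample_mask;

    explicit toy_ihistogram(std::uint64_t mask) : sample_mask(mask) {}

    // Assumed reading of the sampling rule; see the note above.
    bool should_sample() const { return (count & sample_mask) != 0; }

    // Start the latency object only for sampled events.
    void set_latency(toy_latency_counter& lc) const {
        if (should_sample()) {
            lc.start();
        }
    }

    // Always count the event; record latency only if it was started.
    void mark(const toy_latency_counter& lc) {
        ++count;
        if (lc.is_start()) {
            ++total;
            sum_ns += lc.elapsed_ns();
        }
    }
};
```

With a mask of 1, every second event reads the clock, while count still reflects all events.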
Amnon Heiman
af2ec7c7e8 Utils add an is start method to latency_counter
When doing a sparse latency check, it is required to know if a latency
object was started.

This returns true if the start timer was set.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-11 09:05:11 +03:00
Calle Wilund
6ac6d644be Commitlog: add logging
Note: pretty lame logging, but modeled after origin.
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2015-08-10 18:42:41 +03:00
Shlomi Livne
cd57f2e8c4 Enable num_tokens in config
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-08-10 18:04:40 +03:00
Avi Kivity
925b44250d tests: fix cql_test_env fight with storage_service
storage_service is a singleton, and wants a database for initialization.
On the other hand, database is a proper object that is created and
destroyed for each test.  As a result storage_service ends up using
a destroyed object.

Work around this by:
  - leaking the database object so that storage_service has something
    to play with
  - doing the second phase of storage_service initialization only once
2015-08-10 15:48:38 +03:00
Pekka Enberg
3bac70cb59 db/schema_tables: Fix use-after-free in create_table_from_table_partition()
Fixes #119 and fixes #120.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-10 14:44:50 +03:00
Avi Kivity
875f910843 Merge seastar upstream
* seastar 887f72d...a1933df (1):
  > semaphore: switch from list to circular_buffer
2015-08-10 13:21:32 +03:00
Raphael S. Carvalho
18c792c174 compaction: fix throughput calculation
(endsize / (1024*1024)) is an integer calculation, so if endsize is
lower than 1024^2, the result would be 0.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-10 13:18:11 +03:00
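The truncation that 18c792c174 describes is easy to reproduce in isolation. The helper names below are hypothetical and are not taken from the compaction code; the sketch only contrasts the integer division with a floating-point one.

```cpp
#include <cassert>
#include <cstdint>

// Integer division: any size below 1024*1024 bytes truncates to 0.
inline std::uint64_t megabytes_truncated(std::uint64_t bytes) {
    return bytes / (1024 * 1024);
}

// Floating-point division keeps the fractional part.
inline double megabytes_exact(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// Throughput in MB/s, computed from the non-truncated size.
inline double throughput_mb_per_sec(std::uint64_t bytes, double seconds) {
    return seconds > 0 ? megabytes_exact(bytes) / seconds : 0.0;
}
```

For a 512 KiB output, the first helper reports 0 MB while the second reports 0.5 MB, which is the difference between a throughput of zero and a meaningful one.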
Avi Kivity
5c73819a2e Merge "gossip: futurize callback" from Asias
"We run a gossip round inside a seastar thread which runs every second, so we
can wait for the operations inside the callback to complete."
2015-08-10 12:52:30 +03:00
Avi Kivity
e6f0e459fd Merge "Use num_tokens from config file" from Glauber
"Without this, we cannot use our sstables in Cassandra 2.1.8 without
editing their conf file first, to expect 3 tokens - a number we hard code."
2015-08-10 12:50:40 +03:00
Avi Kivity
b6e228c39f Merge "Improved cache preload"
I made the mistake of running scylla on a spinning disk.  Since a disk
can serve about 100 reads/second, that set the tone for the whole benchmark.

Fix by improving cache preload when flushing a memtable.  If we can detect
that a mutation is not part of any sstable (other than the one we just wrote),
we can insert it into the cache.

After this, running a mixed cassandra-stress returns the expected results,
even on a spinning disk.
2015-08-10 12:40:35 +03:00
Avi Kivity
b09c1b8c01 Merge "streaming: error handling and cleanup" from Asias 2015-08-10 12:40:01 +03:00
Raphael S. Carvalho
1e335006e7 api: add missing stats to column family api
addresses issue #84

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-10 12:31:38 +03:00
Takuya ASADA
cdde99fd76 dist: add gcc-c++ on BuildRequires for scylla-server.spec
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-08-10 12:18:51 +03:00
Avi Kivity
0f500f60e5 Merge repair updates from Nadav 2015-08-10 12:17:09 +03:00
Nadav Har'El
a5ce8108f2 repair: add FIXME
Add a FIXME about something I'm unsure about - does repair only need to
repair this node, or should it also make an effort to repair the other nodes
(or more accurately, their specific token-ranges being repaired) if we're
already communicating with them?

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-10 12:16:56 +03:00