Commit Graph

5630 Commits

Author SHA1 Message Date
Gleb Natapov
6046316352 Untemplatize continuation in storage_proxy::query
Nothing wrong with it, except that it crashes my Eclipse indexer for some
reason.
2015-08-11 19:45:59 +03:00
Avi Kivity
09ae029f87 Merge seastar upstream
* seastar 8f97b50...7e7cef2 (1):
  > future: make futurize::apply() more flexible
2015-08-11 18:44:28 +03:00
Avi Kivity
ca6ab0a6d1 Merge "Implement "atomic" batch statement processing" from Calle
"I.e. implement storage_proxy::mutate_atomically, which in turn means
roughly the same as mutate, with write/remove from the batchlog table
intermixed.

This patch restructures some stuff in storage_proxy to avoid too much code
duplication, with the assumption (amongst others) that dead nodes will be few
etc."
2015-08-11 18:33:34 +03:00
Avi Kivity
8d449190d1 thrift: handle exceptions thrown in non-continuation part of thrift handler
Our thrift code performs an elaborate dance to convert a result/exception
reported in a future<> to the cob/exn_cob flow required by the thrift
library.  However, if the exception is thrown before the first continuation,
no one will catch it and it will be leaked, eventually resulting in a crash.

Fix by replacing the complete() infrastructure, which took a future as a
parameter, with a with_cob() helper that instead takes a function to
execute.  This allows it to catch both exceptions thrown directly and
exceptions reported via the future.

Fixes #133.
2015-08-11 18:29:58 +03:00
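The with_cob() idea from the thrift fix above (8d449190d1) can be illustrated with a minimal sketch. It uses std::future from the standard library rather than Seastar's future<>, and the signature shown here is an assumption made for illustration, not the helper's real interface. Because the helper receives a function instead of an already-built future, a single try block covers both an exception thrown before any future exists and an exception stored in the future:

```cpp
#include <exception>
#include <functional>
#include <future>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for the with_cob() helper described in the commit.
// cob receives the result, exn_cob receives any failure.
template <typename T>
void with_cob(std::function<void(const T&)> cob,
              std::function<void(std::exception_ptr)> exn_cob,
              std::function<std::future<T>()> func) {
    std::optional<T> result;
    try {
        // func() may throw directly; get() rethrows an exception
        // that was reported through the future.
        result.emplace(func().get());
    } catch (...) {
        exn_cob(std::current_exception());
        return;
    }
    // Called outside the try block so that a throwing cob is not
    // misreported as a handler failure.
    cob(*result);
}
```

A caller passes the handler body as a lambda, for example `with_cob<int>(on_result, on_error, [] { return compute_async(); })`, where all three names are placeholders.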
Calle Wilund
81f2f80963 Add batchlog_manager to cqlenv + add batch statement test to cql test 2015-08-11 17:10:18 +02:00
Calle Wilund
b7c7c97295 StorageProxy: implement mutate_atomically
Atomically == add to batch log before doing actual mutate
2015-08-11 17:10:17 +02:00
Calle Wilund
7cacf6382f main: initialize global batchlog_manager in startup code 2015-08-11 17:10:17 +02:00
Calle Wilund
9a52ad84b1 BatchlogManager: make blm globally reachable distributed like other objects 2015-08-11 17:10:17 +02:00
Calle Wilund
0ded44eeee BatchlogManager: make endpoint_filter method + implement 2015-08-11 17:10:16 +02:00
Glauber Costa
799a6b5962 sstables: change summary_la to summary_ka
What we implement is ka, not la. Since the summary is the one element that
actually changed in the 2.2 implementation, it is particularly important that
we get this one right. I have previously missed this.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-11 17:47:48 +03:00
Raphael S. Carvalho
9823164c89 db: introduce compaction manager
Currently, each column family creates a fiber to handle compaction requests
in parallel to the system. If there are N column families, N compactions
could be running in parallel, which is definitely horrible.

To solve that problem, a per-database compaction manager is introduced here.

Compaction manager is a feature used to service compaction requests from N
column families. Parallelism is made available by creating more than one
fiber to service the requests. That being said, N compaction requests will
be served by M fibers.

A compaction request being submitted will go to a job queue shared between
all fibers, and the fiber with the lowest amount of pending jobs will be
signalled.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-11 17:25:46 +03:00
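The dispatch rule described in the compaction manager commit above (9823164c89) can be sketched roughly as follows. This is a plain single-threaded model, not the Seastar-based implementation: the type name compaction_dispatcher, its members, and the use of a string as a job are all assumptions made for illustration. A submitted request goes onto one shared queue, and the fiber with the fewest pending jobs is the one chosen to be signalled:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: M fibers serving requests from N column families.
struct compaction_dispatcher {
    std::deque<std::string> jobs;      // job queue shared between all fibers
    std::vector<std::size_t> pending;  // pending job count per fiber

    explicit compaction_dispatcher(std::size_t fibers) : pending(fibers, 0) {
        assert(fibers > 0);
    }

    // Queues a request and returns the index of the fiber to signal:
    // the one with the lowest number of pending jobs (first one on ties).
    std::size_t submit(std::string column_family) {
        jobs.push_back(std::move(column_family));
        std::size_t best = 0;
        for (std::size_t i = 1; i < pending.size(); ++i) {
            if (pending[i] < pending[best]) {
                best = i;
            }
        }
        ++pending[best];
        return best;
    }
};
```

With two fibers, three consecutive submissions would signal fibers 0, 1 and 0 in this model.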
Avi Kivity
3c5f12bb97 Merge seastar upstream
* seastar a1933df...8f97b50 (1):
  > sharded: fix move assignment operator
2015-08-11 17:03:20 +03:00
Avi Kivity
e00deca209 Merge "CQL protocol fixes" from Pekka
"Various fixes to CQL protocol error handling."
2015-08-11 14:49:21 +03:00
Pekka Enberg
d461443bf2 transport/server: Fix error handling in parse_frame()
Throw a protocol exception instead of killing the process with abort().

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-11 14:33:30 +03:00
Pekka Enberg
48abeefdda transport/server: Fix connection version mismatch validation
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-11 14:33:30 +03:00
Pekka Enberg
bb067782d7 transport/server: Fix CQL binary protocol version validation
We only support up to CQL protocol v3. Fixes #117.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-11 14:33:30 +03:00
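A version check along the lines of the commit above (bb067782d7) might look like the sketch below. The names protocol_exception, max_supported_version and validate_version are hypothetical and not taken from the source. The sketch relies on the CQL binary protocol convention that the most significant bit of the version byte marks the message direction, and it reports a bad version by throwing rather than aborting the process, in the spirit of the parse_frame() fix in the same series:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical exception type standing in for a CQL protocol error.
struct protocol_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Assumption for this sketch: the highest protocol version accepted is v3.
constexpr std::uint8_t max_supported_version = 3;

// Strips the direction bit and returns the protocol version,
// throwing if the version is zero or newer than what is supported.
std::uint8_t validate_version(std::uint8_t version_byte) {
    const std::uint8_t version = version_byte & 0x7f;
    if (version == 0 || version > max_supported_version) {
        throw protocol_exception("Invalid or unsupported protocol version: "
                                 + std::to_string(version));
    }
    return version;
}
```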
Pekka Enberg
ad041207a4 transport/server: Catch all request processing errors
We need to also catch exceptions in top-level connection::process() so
that they are converted to proper CQL protocol errors.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-11 14:33:21 +03:00
Calle Wilund
b7cdd189e7 BatchlogManager: make constructible from distributed<db> (to fit main init) 2015-08-11 09:46:59 +02:00
Calle Wilund
6ac6d644be Commitlog: add logging
Note: pretty lame logging, but modeled after origin.
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2015-08-10 18:42:41 +03:00
Shlomi Livne
cd57f2e8c4 Enable num_tokens in config
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-08-10 18:04:40 +03:00
Avi Kivity
925b44250d tests: fix cql_test_env fight with storage_service
storage_service is a singleton, and wants a database for initialization.
On the other hand, database is a proper object that is created and
destroyed for each test.  As a result storage_service ends up using
a destroyed object.

Work around this by:
  - leaking the database object so that storage_service has something
    to play with
  - doing the second phase of storage_service initialization only once
2015-08-10 15:48:38 +03:00
Pekka Enberg
3bac70cb59 db/schema_tables: Fix use-after-free in create_table_from_table_partition()
Fixes #119 and fixes #120.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-10 14:44:50 +03:00
Avi Kivity
875f910843 Merge seastar upstream
* seastar 887f72d...a1933df (1):
  > semaphore: switch from list to circular_buffer
2015-08-10 13:21:32 +03:00
Raphael S. Carvalho
18c792c174 compaction: fix throughput calculation
(endsize / (1024*1024)) is an integer calculation, so if endsize is
lower than 1024^2, the result would be 0.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-10 13:18:11 +03:00
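The truncation described in the throughput fix above (18c792c174) is easy to reproduce in isolation. The two functions below are hypothetical helpers written for this illustration, not the project's code. The first divides in integer arithmetic before converting to floating point, the second converts first:

```cpp
#include <cassert>
#include <cstdint>

// Truncating form: endsize / (1024 * 1024) is an integer division,
// so any endsize below 1 MiB yields 0 before seconds is applied.
double throughput_mb_per_s_truncating(std::uint64_t endsize, double seconds) {
    assert(seconds > 0);
    return (endsize / (1024 * 1024)) / seconds;
}

// Corrected form: convert to double before dividing.
double throughput_mb_per_s(std::uint64_t endsize, double seconds) {
    assert(seconds > 0);
    return (static_cast<double>(endsize) / (1024 * 1024)) / seconds;
}
```

For an endsize of 512 KiB written in one second, the truncating form returns 0 while the corrected form returns 0.5.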
Avi Kivity
5c73819a2e Merge "gossip: futurize callback" from Asias
"We run a gossip round inside a seastar thread which runs every second, so we
can wait for the operations inside the callback to complete."
2015-08-10 12:52:30 +03:00
Avi Kivity
e6f0e459fd Merge "Use num_tokens from config file" from Glauber
"Without this, we cannot use our sstables in Cassandra 2.1.8 without
editing their conf file first, to expect 3 tokens - a number we hard code."
2015-08-10 12:50:40 +03:00
Avi Kivity
b6e228c39f Merge "Improved cache preload"
I made the mistake of running scylla on a spinning disk.  Since a disk
can serve about 100 reads/second, that set the tone for the whole benchmark.

Fix by improving cache preload when flushing a memtable.  If we can detect
that a mutation is not part of any sstable (other than the one we just wrote),
we can insert it into the cache.

After this, running a mixed cassandra-stress returns the expected results,
even on a spinning disk.
2015-08-10 12:40:35 +03:00
Avi Kivity
b09c1b8c01 Merge "streaming: error handling and cleanup" from Asias 2015-08-10 12:40:01 +03:00
Raphael S. Carvalho
1e335006e7 api: add missing stats to column family api
addresses issue #84

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-10 12:31:38 +03:00
Takuya ASADA
cdde99fd76 dist: add gcc-c++ on BuildRequires for scylla-server.spec
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-08-10 12:18:51 +03:00
Avi Kivity
0f500f60e5 Merge repair updates from Nadav 2015-08-10 12:17:09 +03:00
Nadav Har'El
a5ce8108f2 repair: add FIXME
Add a FIXME about something I'm unsure about - does repair only need to
repair this node, or should it also make an effort to repair the other nodes
(or more accurately, their specific token-ranges being repaired) if we're
already communicating with them?

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-10 12:16:56 +03:00
Nadav Har'El
7a8ed228c7 repair: better error message
If a stream failed, print a clear error message that repair failed, instead
of ignoring it and letting Seastar's generic "warning, exception was ignored"
be the only thing the user will see.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-10 12:16:56 +03:00
Nadav Har'El
71a3a0c026 repair: repair each local range separately
The previous repair code exchanged data with the other nodes which have
one arbitrary token. This will only work correctly when all the nodes
replicate all the data. In a more realistic scenario, the node being
repaired holds copies of several token ranges, and each of these ranges
has a different set of replicas we need to perform the repair with.

So this patch does the right thing - we perform a separate repair_range()
for each of the local ranges, and each of those will find a (possibly)
different set of nodes to communicate with.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-10 12:16:55 +03:00
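The per-range approach described in the repair commit above (71a3a0c026) can be sketched as a loop that looks up the replica set separately for every local range. All types and names here (token_range, endpoint, replica_lookup, plan_repair) are simplified stand-ins invented for this illustration, and the sketch only builds a plan instead of streaming any data:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified types.
struct token_range { long start; long end; };
using endpoint = std::string;
using replica_lookup = std::function<std::vector<endpoint>(const token_range&)>;

// For each range held by this node, find the (possibly different)
// set of nodes that the range has to be repaired with.
std::vector<std::pair<token_range, std::vector<endpoint>>>
plan_repair(const std::vector<token_range>& local_ranges,
            const replica_lookup& replicas_for) {
    std::vector<std::pair<token_range, std::vector<endpoint>>> plan;
    for (const auto& range : local_ranges) {
        plan.emplace_back(range, replicas_for(range));
    }
    return plan;
}
```

Each entry of the returned plan would then drive one repair_range() call in the scheme the commit describes.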
Nadav Har'El
f74eedce7d replication: add get_ranges() function
This patch adds a method get_ranges() to replication-strategy.
It returns the list of token ranges held by the given endpoint.

It will be used by the replication code, which needs to know
in particular which token ranges are held by *this* node.

This function is the analogue of Origin's getAddressRanges().get(endpoint).
As in Origin, also here the implementation is not meant to be efficient,
and will not be used in the fast path.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-10 12:16:55 +03:00
Asias He
0e2f9beec4 streaming: Wait after create keyspace and create table
Give it some time to propagate the schema to other nodes.
2015-08-10 15:53:42 +08:00
Asias He
d724fd449c streaming: Avoid storing partition_range in stream_detail
Now, make_local_reader does not need partition_range to be alive when we
read the mutation reader. No need to store it in stream_detail for its
lifetime.
2015-08-10 15:51:13 +08:00
Asias He
62394cc9d0 streaming: Add error handling for PREPARE_MESSAGE 2015-08-10 15:05:10 +08:00
Asias He
9f83588e66 streaming: Add error handling for STREAM_INIT_MESSAGE 2015-08-10 15:01:29 +08:00
Asias He
e13d93b2ff streaming: Improve error handling in stream_transfer_task::complete 2015-08-10 14:49:34 +08:00
Asias He
c7c33a9f44 streaming: Add error handling for STREAM_MUTATION sending 2015-08-10 14:44:25 +08:00
Asias He
be4d9c63b1 streaming: Drop do_with in stream_transfer_task::start
We can copy id instead, it is cheap.
2015-08-10 14:13:15 +08:00
Asias He
7fcaca56bd storage_service: Wait for schedule_schema_pull
It returns a future, we should not ignore it.
2015-08-10 10:26:27 +08:00
Asias He
1291344e68 storage_service: Wait for operations to complete in gossip callback
Since all the gossip callbacks (e.g., on_change) are executed inside a
seastar::async context, we can wait for operations such as updating the
system table to complete.
2015-08-10 10:21:57 +08:00
Asias He
0b475a5173 gossip: Dump endpoint_state_map in debug mode
This is very useful for debugging.
2015-08-10 09:48:32 +08:00
Asias He
5f7628da12 gossip: Run real_mark_alive under seastar::async context
on_dead is now under seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
d15c8289a2 gossip: Run remove_endpoint inside seastar::async context
on_remove is now under seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
56615a8a29 gossip: Make real_mark_alive run inside seastar::async context
on_alive callbacks are now under seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
4eedd417b1 gossip: Run code inside seastar::async context for add_local_application_state
So that do_before_change_notifications and do_on_change_notifications
are under seastar::async.

Now, before_change callbacks are inside seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
825f6d141d gossip: Run code inside seastar::async context for apply_state_locally
It is easier to futurize apply_new_states and handle_major_state_change.

Now, on_change, on_join and on_restart callbacks are inside
seastar::async context.
2015-08-10 09:48:32 +08:00