Commit Graph

5864 Commits

Author SHA1 Message Date
Avi Kivity
a8cb2f92ac Merge "the storage service metrics" from Amnon
"This series adds the definition and stub implementation of the storage service
metrics.  It is based on the StorageMetrics class."
2015-08-23 15:43:44 +03:00
Gleb Natapov
6723748aff Implement speculating read and always speculating read executors 2015-08-23 15:26:49 +03:00
Gleb Natapov
54e1628928 Get configured speculative retry type for read 2015-08-23 15:26:48 +03:00
Gleb Natapov
cf10416786 Implement new_read_repair_decision() function. 2015-08-23 15:26:48 +03:00
Gleb Natapov
5de6759f40 Do not check targets size before calling make_digest_requests()
If there are not enough targets, make_digest_requests() will return a ready
future immediately.
2015-08-23 15:26:48 +03:00
Avi Kivity
9724971fc3 Merge seastar upstream 2015-08-23 15:25:55 +03:00
Raphael S. Carvalho
c6ea25c5fb compaction_manager: fix compaction_manager::stop
To stop a compaction manager task, we first close the gate
used by compaction, then break the semaphore via semaphore::broken().

The problem is that semaphore::broken() only signals existing waiters, so
subsequent semaphore::wait() calls would succeed and the task would
remain alive forever.
The fix is to signal the semaphore, forcing the task to exit via a gate
exception, so we no longer rely on semaphore::broken() to
finish the task. That's possible because we try to access the
gate right after waiting on the semaphore.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-22 20:38:12 +03:00
Nadav Har'El
e7ef6149d7 abstract_replication_strategy: add get_primary_ranges() method
Add an abstract_replication_strategy::get_primary_ranges() method, which is
very similar to the existing get_ranges(), except that only the "primary"
owner of each range will return it in its list.

This is needed for the "primary range" repair option, which asks to repair
only the primary range. This option is useful when the user plans to start
a repair on *all* nodes: we shouldn't repair the same token range multiple
times, so each range should be repaired by only one of the nodes.

abstract_replication_strategy::get_primary_ranges() is similar to Origin's
StorageService.getPrimaryRangesForEndpoint().

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-22 20:05:32 +03:00
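
The "primary owner" idea in the commit above can be sketched with a toy token ring. This is only an illustration, not the code from the patch: the integer `token` type, the `token_range` pair, and the `primary_ranges()` helper are hypothetical names, and the sketch assumes that the primary range of a token is the half-open interval from its predecessor on the ring up to the token itself.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model: a token is an integer, a node is a name,
// and each token on the ring is owned by exactly one node.
using token = long;
using token_range = std::pair<token, token>;  // (start, end], may wrap around

// Returns the ranges for which `node` is the primary owner: for each token t
// owned by `node`, the range (predecessor(t), t].
std::vector<token_range> primary_ranges(const std::map<token, std::string>& ring,
                                        const std::string& node) {
    std::vector<token_range> result;
    if (ring.empty()) {
        return result;
    }
    // The predecessor of the first token is the last token (the ring wraps).
    token prev = ring.rbegin()->first;
    for (const auto& entry : ring) {
        if (entry.second == node) {
            result.emplace_back(prev, entry.first);
        }
        prev = entry.first;
    }
    return result;
}
```

Because every token has exactly one owner in this model, the primary ranges of all nodes together cover the ring once, which is the property the "primary range" repair option relies on.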
Gleb Natapov
788fc66e29 messaging: keep shared reference to rpc client while send is underway
There can be multiple sends underway when the first one detects an error and
destroys the rpc client, while the rpc client is still in use by other sends.
Fix this by making the rpc client pointer shared and holding a reference to it
for each send operation.
2015-08-20 19:22:08 +03:00
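
The ownership pattern described in the commit above can be sketched with `std::shared_ptr`. The `rpc_client` struct and the `messaging` class below are hypothetical stand-ins, not the project's actual types; the point is only that each in-flight send keeps its own reference, so removing the client from the map does not destroy it under another send.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the real rpc client.
struct rpc_client {
    bool connected = true;
};

// Hypothetical registry of clients, keyed by peer address.
class messaging {
    std::unordered_map<std::string, std::shared_ptr<rpc_client>> _clients;
public:
    // Each caller receives its own reference and holds it while sending.
    std::shared_ptr<rpc_client> get_client(const std::string& peer) {
        auto& c = _clients[peer];
        if (!c) {
            c = std::make_shared<rpc_client>();
        }
        return c;
    }
    // Drops only the registry's reference; in-flight sends keep theirs.
    void remove_client(const std::string& peer) {
        _clients.erase(peer);
    }
};
```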
Paweł Dziepak
a64e9c5029 cql3: forbid empty non-composite clustering key
Origin forbids empty values in a clustering key only if that clustering
key is non-composite (i.e. there is only one column).

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-20 16:28:51 +03:00
Avi Kivity
1579f86503 memtable: keep the lsa region alive while partitions are being destroyed
Or we get a use-after-free.  Reported by Pekka.
2015-08-20 15:32:30 +03:00
Avi Kivity
f531f36a44 lsa: fix types in logs 2015-08-20 15:29:08 +03:00
Avi Kivity
dc13ccdeeb Merge "Improve memory reclaim robustness" 2015-08-20 12:11:58 +03:00
Avi Kivity
f996ea202a Merge seastar upstream
* seastar d96bfcd...696ab29 (5):
  > Merge "dpdk: rework xmit" from Vlad
  > README.md: add libxml2-devel to Fedora's packages list
  > Merge
  > memory: reclaim more aggressively
  > reactor: stop repeat() loops if we wish reclaim memory
2015-08-20 12:11:23 +03:00
Avi Kivity
9012f991bf logalloc: really allow dipping into the emergency pool during reclaim
The RAII wrapper for the emergency pool was invoked without an object,
and so had no effect.
2015-08-20 12:10:03 +03:00
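
The class of bug described in the commit above is easy to reproduce in isolation. In this sketch the `allow_emergency_pool` guard and the `emergency_depth` counter are hypothetical names chosen for illustration; the real wrapper is not shown in the log. A guard constructed without a name is a temporary that is destroyed at the end of the statement, so it protects nothing.

```cpp
#include <cassert>

// Hypothetical counter standing in for "the emergency pool may be used".
static int emergency_depth = 0;

// Hypothetical RAII guard: active for as long as the object lives.
struct allow_emergency_pool {
    allow_emergency_pool() { ++emergency_depth; }
    ~allow_emergency_pool() { --emergency_depth; }
    allow_emergency_pool(const allow_emergency_pool&) = delete;
    allow_emergency_pool& operator=(const allow_emergency_pool&) = delete;
};

bool reclaim_buggy() {
    allow_emergency_pool();       // temporary: destroyed at the end of this statement
    return emergency_depth > 0;   // the guard is already gone here
}

bool reclaim_fixed() {
    allow_emergency_pool guard;   // named object: lives until the end of the scope
    return emergency_depth > 0;   // the guard is still active here
}
```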
Nadav Har'El
07480c75e6 repair: use parallel_for_each instead of semaphore
Requested by Avi. The added benefit is that the code for repairing
all the ranges in parallel is now identical to the code of repairing
the ranges one by one - just replace do_for_each with parallel_for_each,
and no need for a different implementation using semaphores like I had
before this patch.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-20 10:51:57 +03:00
Nadav Har'El
4e3dbef512 repair: conform to coding style
Use "_" prefix on class member "status".

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-20 10:51:56 +03:00
Vlad Zolotarov
288a96bcc4 README.md: add a missing antlr3-C++-devel package to Fedora packages list
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-08-20 10:47:48 +03:00
Avi Kivity
bcff75003e row_cache: yield while moving data to cache
If we don't yield, we can run out of memory while moving a memtable into
cache.

This reduces the chance that writing an sstable will fail because we could
not transfer the memtable into the cache.
2015-08-19 19:36:41 +03:00
Avi Kivity
c01bc16f58 db: don't give up flushing a memtable on error
We must try again, or the memtable's memory will never be reclaimed.
2015-08-19 19:36:41 +03:00
Avi Kivity
6846909533 db: extract sstable flushing code to a function 2015-08-19 19:36:41 +03:00
Avi Kivity
5bf5476beb db: add collectd counter for dirty memory 2015-08-19 19:36:41 +03:00
Avi Kivity
c175025bb6 db: place all memtables into a single region_group
We can use this to track the amount of unevictable memory in the
system.
2015-08-19 19:36:41 +03:00
Avi Kivity
9ed2bbb25c lsa: introduce region_group
A region_group is a nestable group of regions, for cumulative statistics
purposes.
2015-08-19 19:36:40 +03:00
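
A nestable group that keeps cumulative statistics can be sketched as follows. The `usage_group` class and its members are hypothetical and much simpler than whatever the patch actually implements; the sketch only assumes that each update is propagated to the group and to every ancestor.

```cpp
#include <cassert>

// Hypothetical sketch: a group records usage for itself and all of its ancestors.
class usage_group {
    usage_group* _parent;
    long long _total = 0;
public:
    explicit usage_group(usage_group* parent = nullptr) : _parent(parent) {}

    // Applies a positive or negative change in bytes to this group and its ancestors.
    void update(long long delta) {
        for (usage_group* g = this; g != nullptr; g = g->_parent) {
            g->_total += delta;
        }
    }

    long long memory_used() const { return _total; }
};
```

With one top-level group as the parent of per-subsystem groups, the parent's total is the sum across all children without any separate bookkeeping.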
Raphael S. Carvalho
32ce27f00d tests: fix possible failure on compaction manager test
If sleep time isn't enough for compaction manager to select the
submitted cf for compaction, then the test will fail because the
compaction will not take place and subsequent checks will fail.
A solution is to sleep until the required condition becomes true.

Problem and solution found by Shlomi.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-19 18:26:54 +03:00
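
"Sleep until the required condition becomes true" can be sketched as a small polling helper. This is not the test code from the patch: `wait_until()` is a hypothetical helper, it uses a blocking `std::this_thread::sleep_for` rather than an asynchronous sleep, and the timeout is an added assumption so that a broken test fails instead of hanging.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Polls `condition` every `interval` until it returns true or `timeout`
// elapses. Returns whether the condition became true.
bool wait_until(const std::function<bool()>& condition,
                std::chrono::milliseconds timeout,
                std::chrono::milliseconds interval = std::chrono::milliseconds(1)) {
    auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!condition()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;
        }
        std::this_thread::sleep_for(interval);
    }
    return true;
}
```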
Avi Kivity
dd4d76980d Merge seastar upstream
* seastar 4ff14c4...d96bfcd (1):
  > rpc: keep reference to server::connection while reply is sent
2015-08-19 15:11:28 +03:00
Avi Kivity
71aad57ca8 lsa: make region::impl a top-level class
Makes using forward declarations possible.
2015-08-19 14:43:17 +03:00
Avi Kivity
7b67b04822 db: wire up max memtable size configuration 2015-08-19 13:17:27 +03:00
Avi Kivity
c317391f62 db: trigger memtable flush based on actual memory usage
Rather than using _mutation_count as a poor proxy.
2015-08-19 12:59:52 +03:00
Avi Kivity
e7272d27cc tests: perf_mutation: convert to app_template
Won't work with lsa otherwise, because the default memory size is too small.
2015-08-19 11:18:07 +03:00
Avi Kivity
5252d5ec9b managed_bytes: fix self-assignment 2015-08-19 11:18:07 +03:00
Avi Kivity
00f39c4e1a managed_bytes: add small string optimization 2015-08-19 11:18:07 +03:00
Avi Kivity
0db6962a89 Merge "gossip fix" from Asias
Fixes #162, #144, #134.
2015-08-19 11:11:56 +03:00
Asias He
8415218dba gossip: Save one seastar thread inside remove_endpoint
Now all the callers are inside a seastar thread. Kill one thread inside
remove_endpoint.
2015-08-19 14:46:44 +08:00
Avi Kivity
176ab06f77 db: demote commitlog reordering detected log message to debug
It's less rare than we thought and also less interesting.
2015-08-19 09:26:23 +03:00
Asias He
b560a50ea7 gossip: Fix a name typo
evict_from_membershipg -> evict_from_membership
2015-08-19 14:22:54 +08:00
Asias He
54a42b4549 gossip: Fix an iterate-and-delete issue in do_status_check
evict_from_membership might delete an element inside endpoint_state_map
while we are iterating over it.

Fixes #162
2015-08-19 14:21:11 +08:00
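
The log does not show how the patch fixes the problem, so the following is only one common way to erase from a map while iterating over it. The `endpoint_map` alias and the `evict_dead()` function are hypothetical; the sketch relies on `std::map::erase(iterator)` returning the iterator that follows the erased element.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for endpoint_state_map: endpoint -> "is dead" flag.
using endpoint_map = std::map<std::string, bool>;

// Erases dead endpoints while iterating. The loop only advances through
// iterators that are still valid: either the one returned by erase() or
// the result of incrementing an element that was kept.
void evict_dead(endpoint_map& endpoints) {
    for (auto it = endpoints.begin(); it != endpoints.end(); ) {
        if (it->second) {
            it = endpoints.erase(it);
        } else {
            ++it;
        }
    }
}
```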
Asias He
016dfdc8e1 gossip: Gossip error messages
We are printing out error messages when a remote connection is closed:

   ERROR   [shard 0] gossip - Fail to send GossipDigestACK2 to 127.0.0.2:0: rpc::closed_error (connection is closed)
   ERROR   [shard 0] gossip - Fail to handle GOSSIP_DIGEST_ACK: rpc::closed_error (connection is closed)
   WARN    [shard 0] unimplemented

This causes issues with DTEST, which validates after finishing a run
that there are no ERRORs in the log.

The rule is:
   We can handle the error correctly when it occurs -> log warn
   We cannot handle the error correctly when it occurs -> log error

Fixes #144
2015-08-19 14:21:05 +08:00
Asias He
009f9e7f21 gossip: Add timeout for send_gossip_digest_syn in do_shadow_round
Fixes #134
2015-08-19 14:20:52 +08:00
Avi Kivity
ddee5e817a Workaround boost::any_cast bug
any_cast<X> is supposed to return X, but boost 1.55's any_cast<X> returns
X&&.  This means the lifetime-extending construct

   auto&& x = boost::any_cast<X>(...);

will not work, because the result of the expression is an rvalue reference,
not a true temporary.

Fix by using a temporary, not a lifetime-extending reference.

Fixes #163.
2015-08-19 09:15:31 +03:00
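
The difference the commit above relies on can be checked at compile time without Boost. In this sketch `by_value()` and `by_rvalue_ref()` are hypothetical functions: the second one mimics the behaviour the commit attributes to boost 1.55's `any_cast<X>`, namely returning an rvalue reference instead of a value.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Returns a true temporary (a prvalue).
std::string by_value() { return "temporary"; }

// Returns an rvalue reference into storage owned by the caller.
std::string&& by_rvalue_ref(std::string& storage) { return std::move(storage); }

// Binding a prvalue to auto&& extends the temporary's lifetime.
static_assert(std::is_same<decltype(by_value()), std::string>::value, "");
// An X&& result is only a reference, so binding it to auto&& extends nothing.
static_assert(std::is_same<decltype(by_rvalue_ref(std::declval<std::string&>())),
                           std::string&&>::value, "");

std::string copy_out(std::string& storage) {
    // The workaround: construct a real local object instead of a reference.
    std::string x = by_rvalue_ref(storage);
    return x;
}
```

Constructing a local object is safe in both cases, which is why it works as a workaround regardless of which signature the library provides.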
Avi Kivity
2b4eaf83ab transport: set TCP_NODELAY
Reduces latency.
2015-08-18 15:48:36 +03:00
Avi Kivity
2e92759943 thrift: set TCP_NODELAY
Reduces latency.
2015-08-18 15:48:36 +03:00
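
The two commits above set the option through the project's own networking layer, which is not shown in the log. As a rough illustration only, this is what disabling Nagle's algorithm looks like on a plain POSIX socket; `set_nodelay()` and `get_nodelay()` are hypothetical helper names.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Enables or disables TCP_NODELAY on a TCP socket. Returns true on success.
bool set_nodelay(int fd, bool on) {
    int value = on ? 1 : 0;
    return ::setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, sizeof(value)) == 0;
}

// Reads the current TCP_NODELAY setting. Returns false if the query fails.
bool get_nodelay(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (::getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0) {
        return false;
    }
    return value != 0;
}
```

With the option set, small writes are sent immediately instead of being coalesced, which trades some bandwidth efficiency for lower latency.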
Avi Kivity
c98a738656 Merge seastar upstream
* seastar 2afc6c8...39eeca5 (5):
  > rpc: set TCP_NODELAY on rpc sockets
  > net: implement accessors for Nagle's algorithm (TCP_NODELAY)
  > posix: improve getsockopt() interface
  > future: slightly optimize then() signature
  > net: add missing tcp_nodelay thunks
2015-08-18 15:48:16 +03:00
Avi Kivity
2354611920 Merge "storage_service update" from Asias 2015-08-18 12:34:14 +03:00
Avi Kivity
8047e5ed8a Merge seastar upstream
* seastar 69edf16...2afc6c8 (3):
  > Rebase dpdk to v2.1.0
  > future: don't use get() in future_state::forward_to()
  > future: add get_value(), and use it in then()
2015-08-18 12:32:43 +03:00
Avi Kivity
0b01b74444 build: disable seastar Xen support
Not needed, and conflicts with dpdk.
2015-08-18 12:31:26 +03:00
Avi Kivity
e9a46215ef build: change project name
The configure script originated from seastar, so the project name needs to change.
2015-08-18 12:29:05 +03:00
Asias He
1b97cd988d storage_service: Fix indentation in init_server 2015-08-18 17:06:03 +08:00
Asias He
eda11a35e6 storage_service: Partially implement handle_state_removing 2015-08-18 17:06:03 +08:00
Asias He
0f2e4003ce storage_service: Implement handle_state_moving 2015-08-18 17:06:03 +08:00