Commit Graph

53948 Commits

Author SHA1 Message Date
Pekka Enberg
9476e3f19f db/index: Kill Java code
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 11:51:50 +03:00
Pekka Enberg
544c7936d8 db/commitlog: Kill Java code
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 11:51:49 +03:00
Pekka Enberg
8307fe7c85 cql3: Kill Java code
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 11:15:41 +03:00
Calle Wilund
ac74dd6159 Commitlog: Make "position" type 32-bit to align replay_position with Origin
2015-08-24 10:05:44 +02:00
Calle Wilund
d50986ef31 Commitlog: do not eagerly create first segment on init
Deferring makes it easier to separate old segments from new, which in turn
helps replay logic.
2015-08-24 10:05:44 +02:00
Avi Kivity
83d5c7e7c8 Merge
2015-08-24 10:58:39 +03:00
Pekka Enberg
5dbf1baed4 db/composites: Kill Java code
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 10:56:30 +03:00
Avi Kivity
855ef838a9 db: fix use-after-free with region_group
_dirty_memory_region_group is used by the column_family's memtables, but
is destroyed before them.

Fix by changing the destruction order.

Fixes #175.
2015-08-24 10:51:03 +03:00
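Note on 855ef838a9: the fix relies on the C++ rule that non-static data members are destroyed in reverse order of declaration. A minimal sketch of that rule follows; `tracked` and `column_family_sketch` are hypothetical stand-ins for illustration, not the project's real classes.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in: the destructor records the member's name in a log.
struct tracked {
    std::vector<std::string>& log;
    std::string name;
    ~tracked() { log.push_back(name); }
};

// Members are destroyed in reverse declaration order, so a member that
// others depend on must be declared before its users.
struct column_family_sketch {
    tracked region_group;  // declared first, destroyed last
    tracked memtables;     // declared last, destroyed first
    explicit column_family_sketch(std::vector<std::string>& log)
        : region_group{log, "region_group"}, memtables{log, "memtables"} {}
};

// Returns the names of the members in the order they were destroyed.
std::vector<std::string> destruction_order() {
    std::vector<std::string> log;
    { column_family_sketch cf(log); }  // cf is destroyed at the closing brace
    return log;
}
```

With this declaration order the memtables are torn down while the region group still exists, which is the property the commit message describes.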
Avi Kivity
43474a4d5a Merge
2015-08-24 10:37:08 +03:00
Avi Kivity
a9a3b47781 Merge "Optimize CQL query parameters" from Pekka
"This series optimizes CQL query parameter handling by avoiding memory
allocation and copies where possible. I have only tested with the POSIX
stack and have not seen performance difference in cassandra-stress
because Linux networking dominates the profiles. The optimizations
should improve things with DPDK, though, because the
cql_server::read_query_options() hotspot is effectively eliminated."
2015-08-24 09:19:56 +03:00
Pekka Enberg
aca4c0d2bb mutation: Avoid a copy in set_cell() and others
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 09:06:13 +03:00
Pekka Enberg
10c6eee221 transport/server: Use sstring_view for query option names
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 09:06:13 +03:00
Pekka Enberg
5263b712fb transport/server: read_string_view() helper
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 09:06:13 +03:00
Pekka Enberg
9f2bcc6a77 cql3: Change bind_and_get() return type to bytes_view_opt
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 09:06:13 +03:00
Pekka Enberg
f3118755f8 cql3: Use "auto" for bind_and_get() return value assignment
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 09:06:13 +03:00
Pekka Enberg
23e9bf7162 cql3/query_options: make_temporary() helper
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 09:06:13 +03:00
Pekka Enberg
6dee204db2 cql3/query_options: Store values as bytes view
Store values as bytes view when possible. This improves the CQL protocol
option parsing path by avoiding allocating memory and copying individual
values as "bytes" objects.

Please note that we retain the non-view version for internal queries
where performance is not as important.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 09:06:13 +03:00
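Note on 6dee204db2: the idea of storing views instead of owned copies can be sketched as below. This is an illustration only; it uses `std::string_view` as a stand-in for the project's bytes view type, and the one-byte length prefix and the `parse_values` name are assumptions, not the CQL wire format.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Splits a buffer of length-prefixed values (one length byte each, an
// assumed format) into views. No value is copied: every view points into
// `buffer`, so `buffer` must outlive the returned views.
std::vector<std::string_view> parse_values(std::string_view buffer) {
    std::vector<std::string_view> values;
    std::size_t pos = 0;
    while (pos < buffer.size()) {
        auto len = static_cast<unsigned char>(buffer[pos]);
        ++pos;
        if (len > buffer.size() - pos) {
            break;  // truncated input: stop rather than read out of bounds
        }
        values.push_back(buffer.substr(pos, len));
        pos += len;
    }
    return values;
}
```

The trade-off is lifetime management: a view is only valid while the underlying request buffer is alive, which may be why an owning variant is still useful on paths where performance matters less.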
Pekka Enberg
6d6c97f1a8 types.hh: as_bytes_view_opt() helper
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 09:06:13 +03:00
Pekka Enberg
cbb6a24911 types.hh: to_bytes_opt() helper
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 09:06:13 +03:00
Avi Kivity
6f11322220 db: move annoying log on non-durable cf to quieter place
Fixes #174.
2015-08-23 23:12:07 +03:00
Avi Kivity
f2e612ebd8 transport: log another request write processing error
An assertion failure (abandoned promise) led me to believe an error is
generated and ignored here.  Let's try to trap it.
2015-08-23 19:36:57 +03:00
Avi Kivity
8377236273 transport: protect against race in using cql connection gate
While we have a with_gate() call when using the connection, it is
somewhere deep in the bowels of process_request_one().  If that
continuation is deferred, the gate can be closed without anyone
noticing that a request is still working on it; when we do hit the
with_gate, we'll see a use-after-free.
2015-08-23 19:34:53 +03:00
Avi Kivity
e3d16e6da9 Merge "Reduce large allocations"
Large allocations can fail, and since we can't retry them after compaction
(yet), these failures are propagated to the client.

This patchset removes the largest offenders.
2015-08-23 17:07:52 +03:00
Avi Kivity
77b3212c88 lsa: provide a fallback during normal allocation
Instead of failing normal allocations when the seastar allocator cannot
allocate a segment, provide a generous reserve.  An allocation failure
will now be satisfied from the reserve, but it will still trigger a
reclaim.  This allows hiding low-memory conditions from the user.
2015-08-23 16:38:04 +03:00
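Note on 77b3212c88: the fallback behaviour can be sketched roughly as follows. `segment_pool_sketch` and its counters are hypothetical; the sketch simulates the underlying allocator with a counter and does not reflect the real LSA code.

```cpp
#include <cstddef>

// Hypothetical model: `available` simulates what the underlying allocator
// can still hand out, `reserve` is the emergency stock of segments.
class segment_pool_sketch {
    std::size_t _available;
    std::size_t _reserve;
    bool _reclaim_requested = false;
public:
    segment_pool_sketch(std::size_t available, std::size_t reserve)
        : _available(available), _reserve(reserve) {}

    // Returns true if a segment was obtained.
    bool allocate_segment() {
        if (_available > 0) {
            --_available;
            return true;
        }
        // Normal allocation failed: still ask for a reclaim ...
        _reclaim_requested = true;
        // ... but satisfy the request from the reserve if possible.
        if (_reserve > 0) {
            --_reserve;
            return true;
        }
        return false;
    }

    bool reclaim_requested() const { return _reclaim_requested; }
};
```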
Avi Kivity
0afbdf4aa7 Merge "Add row related methods to the cache_service API" from Amnon
"This series exposes statistics from the row_cache in the cache_service API.
After this series, the following methods will be available:
get_row_hits
get_row_requests
get_row_hit_rate
get_row_size
get_row_entries"
2015-08-23 15:46:07 +03:00
Avi Kivity
a8cb2f92ac Merge "the storage service metrics" from Amnon
"This series adds the definition and stub implementation of the storage service
metrics.  It is based on the StorageMetrics class."
2015-08-23 15:43:44 +03:00
Gleb Natapov
6723748aff Implement speculating read and always speculating read executors
2015-08-23 15:26:49 +03:00
Gleb Natapov
54e1628928 Get configured speculative retry type for read
2015-08-23 15:26:48 +03:00
Gleb Natapov
cf10416786 Implement new_read_repair_decision() function.
2015-08-23 15:26:48 +03:00
Gleb Natapov
5de6759f40 Do not check targets size before calling make_digest_requests()
If there are not enough targets, make_digest_requests() will return a ready
future immediately.
2015-08-23 15:26:48 +03:00
Avi Kivity
9724971fc3 Merge seastar upstream
2015-08-23 15:25:55 +03:00
Avi Kivity
c51292e792 sstables: switch from vector<> to deque<>
Large vectors require contiguous storage, which may not be available (or may
be expensive to obtain).  Switch to deque<> instead, which allocates
discontiguous storage.

Allocation problems were observed with the summary and with the bloom
filter bitmaps.
2015-08-23 12:22:49 +03:00
Avi Kivity
1bb840bb72 sstables: use large_bitset in bloom filter
Avoids allocation failures due to multi-megabyte filters.
2015-08-23 12:22:49 +03:00
Avi Kivity
e928bcaf19 utils: introduce large_bitset
Like boost::dynamic_bitset, but less capable.  On the other hand it avoids
very large allocations, which are incurred by the bloom filter's bitset
on even moderately sized sstables.
2015-08-23 12:22:49 +03:00
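Note on e928bcaf19: one way to avoid a single very large allocation is to split the bits across fixed-size chunks. The sketch below illustrates that idea only; `chunked_bitset` and the 128 KiB chunk size are assumptions and not the actual `large_bitset` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Stores bits in fixed-size chunks, so the largest data allocation is one
// chunk regardless of the total bit count.
class chunked_bitset {
    static constexpr std::size_t bits_per_chunk = 128 * 1024 * 8;  // assumed: 128 KiB per chunk
    static constexpr std::size_t words_per_chunk = bits_per_chunk / 64;
    std::size_t _nr_bits;
    std::vector<std::unique_ptr<std::uint64_t[]>> _chunks;
public:
    explicit chunked_bitset(std::size_t nr_bits) : _nr_bits(nr_bits) {
        std::size_t nr_chunks = (nr_bits + bits_per_chunk - 1) / bits_per_chunk;
        for (std::size_t i = 0; i < nr_chunks; ++i) {
            // make_unique<T[]> value-initializes, so every bit starts cleared.
            _chunks.push_back(std::make_unique<std::uint64_t[]>(words_per_chunk));
        }
    }
    std::size_t size() const { return _nr_bits; }
    bool test(std::size_t idx) const {
        assert(idx < _nr_bits);
        std::size_t bit = idx % bits_per_chunk;
        return (_chunks[idx / bits_per_chunk][bit / 64] >> (bit % 64)) & 1u;
    }
    void set(std::size_t idx) {
        assert(idx < _nr_bits);
        std::size_t bit = idx % bits_per_chunk;
        _chunks[idx / bits_per_chunk][bit / 64] |= std::uint64_t(1) << (bit % 64);
    }
    void clear(std::size_t idx) {
        assert(idx < _nr_bits);
        std::size_t bit = idx % bits_per_chunk;
        _chunks[idx / bits_per_chunk][bit / 64] &= ~(std::uint64_t(1) << (bit % 64));
    }
};
```

Compared with a contiguous bitset, each access pays one extra indirection in exchange for never requesting a multi-megabyte contiguous block.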
Raphael S. Carvalho
c6ea25c5fb compaction_manager: fix compaction_manager::stop
For stopping a task of compaction manager, we first close the gate
used by compaction then bust semaphore via semaphore::broken().

The problem is that semaphore::broken() only signals waiters, and so
subsequent semaphore::wait() calls would succeed and the task would
remain alive forever.
The fix is to signal semaphore, forcing the task to exit via gate
exception, so we will no longer rely on semaphore::broken() for
finishing the task. That's possible because we try to access the
gate right after we waited on semaphore.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-22 20:38:12 +03:00
Nadav Har'El
e7ef6149d7 abstract_replication_strategy: add get_primary_ranges() method
Add an abstract_replication_strategy::get_primary_ranges() method, which is
very similar to the existing get_ranges(), except that only the "primary"
owner of each range will return it in its list.

This is needed for the "primary range" repair option, which asks to repair
only the primary range. This option is useful when the user plans to start
a repair on *all* nodes: we shouldn't repair the same token range multiple
times, so each range should be repaired by only one of the nodes.

abstract_replication_strategy::get_primary_ranges() is similar to Origin's
StorageService.getPrimaryRangesForEndpoint().

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-22 20:05:32 +03:00
Gleb Natapov
788fc66e29 messaging: keep shared reference to rpc client while send is underway
There can be multiple sends underway when first one detects an error and
destroys rpc client, but the rpc client is still in use by other sends.
Fix this by making rpc client pointer shared and hold on it for each
send operation.
2015-08-20 19:22:08 +03:00
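Note on 788fc66e29: the keep-alive pattern described above can be sketched with `std::shared_ptr`. `rpc_client_sketch` and `client_cache` are hypothetical names for illustration; the real messaging code is asynchronous and differs in detail.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for an RPC client.
struct rpc_client_sketch {
    bool usable = true;
};

class client_cache {
    std::map<std::string, std::shared_ptr<rpc_client_sketch>> _clients;
public:
    // Each send takes its own shared reference and holds it for the
    // duration of the operation.
    std::shared_ptr<rpc_client_sketch> get(const std::string& addr) {
        auto& slot = _clients[addr];
        if (!slot) {
            slot = std::make_shared<rpc_client_sketch>();
        }
        return slot;
    }

    // Called by the send that detects an error. Only the cache's reference
    // is dropped; other in-flight sends keep the client alive.
    void remove(const std::string& addr) { _clients.erase(addr); }
};
```

The client object is freed only when the last in-flight send releases its reference, so removing it from the cache cannot pull it out from under another send.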
Paweł Dziepak
a64e9c5029 cql3: forbid empty non-composite clustering key
Origin forbids empty values in a clustering key only if that clustering
key is non-composite (i.e. there is only one column).

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-20 16:28:51 +03:00
Avi Kivity
1579f86503 memtable: keep the lsa region alive while partitions are being destroyed
Or we get a use-after-free.  Reported by Pekka.
2015-08-20 15:32:30 +03:00
Avi Kivity
f531f36a44 lsa: fix types in logs
2015-08-20 15:29:08 +03:00
Avi Kivity
dc13ccdeeb Merge "Improve memory reclaim robustness"
2015-08-20 12:11:58 +03:00
Avi Kivity
f996ea202a Merge seastar upstream
* seastar d96bfcd...696ab29 (5):
  > Merge "dpdk: rework xmit" from Vlad
  > README.md: add libxml2-devel to Fedora's packages list
  > Merge
  > memory: reclaim more aggressively
  > reactor: stop repeat() loops if we wish reclaim memory
2015-08-20 12:11:23 +03:00
Avi Kivity
9012f991bf logalloc: really allow dipping into the emergency pool during reclaim
The RAII wrapper for the emergency pool was invoked without an object,
and so had no effect.
2015-08-20 12:10:03 +03:00
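Note on 9012f991bf: a common form of this bug, assumed here since the commit message does not show the code, is constructing an RAII guard as an unnamed temporary, which is destroyed at the end of the same statement. `pool_sketch` and `allow_emergency_pool` are hypothetical names.

```cpp
// Hypothetical pool state: depth > 0 means the emergency pool may be used.
struct pool_sketch {
    int depth = 0;
};

// RAII wrapper: enables the pool on construction, disables it on destruction.
class allow_emergency_pool {
    pool_sketch& _p;
public:
    explicit allow_emergency_pool(pool_sketch& p) : _p(p) { ++_p.depth; }
    ~allow_emergency_pool() { --_p.depth; }
    allow_emergency_pool(const allow_emergency_pool&) = delete;
    allow_emergency_pool& operator=(const allow_emergency_pool&) = delete;
};

// Buggy form: the guard is a temporary, so it is already gone on the next line.
bool reclaim_without_object(pool_sketch& p) {
    allow_emergency_pool{p};
    return p.depth > 0;  // the pool is no longer enabled here
}

// Fixed form: a named object lives until the end of the enclosing scope.
bool reclaim_with_object(pool_sketch& p) {
    allow_emergency_pool guard{p};
    return p.depth > 0;  // the pool is enabled here
}
```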
Nadav Har'El
07480c75e6 repair: use parallel_for_each instead of semaphore
Requested by Avi. The added benefit is that the code for repairing
all the ranges in parallel is now identical to the code of repairing
the ranges one by one - just replace do_for_each with parallel_for_each,
and no need for a different implementation using semaphores like I had
before this patch.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-20 10:51:57 +03:00
Nadav Har'El
4e3dbef512 repair: conform to coding style
Use "_" prefix on class member "status".

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-20 10:51:56 +03:00
Vlad Zolotarov
288a96bcc4 README.md: add a missing antlr3-C++-devel package to Fedora packages list
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-08-20 10:47:48 +03:00
Avi Kivity
bcff75003e row_cache: yield while moving data to cache
If we don't yield, we can run out of memory while moving a memtable into
cache.

This reduces the chance that writing an sstable will fail because we could
not transfer the memtable into the cache.
2015-08-19 19:36:41 +03:00
Avi Kivity
c01bc16f58 db: don't give up flushing a memtable on error
We must try again, or the memtable's memory will never be reclaimed.
2015-08-19 19:36:41 +03:00
Avi Kivity
6846909533 db: extract sstable flushing code to a function
2015-08-19 19:36:41 +03:00
Avi Kivity
5bf5476beb db: add collectd counter for dirty memory
2015-08-19 19:36:41 +03:00