Commit Graph

6315 Commits

Author SHA1 Message Date
Glauber Costa
2a7aa1f0d8 sstables: avoid asserts
It's great to have statistics, but assert is too big of a hammer. We don't need
to crash due to the lack of it, and can try our best to continue.

We currently have a problem (described in 265), in which we, for some reason,
fail to read the Statistics file. Throwing an exception will still cause us to
fail to boot, but at least it will be more informative.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-09-08 10:06:05 +03:00
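A minimal sketch of the idea behind this change, not the actual patch: a missing Statistics component is reported through an exception instead of an assert. The names `statistics` and `require_statistics` are hypothetical and exist only for illustration.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a parsed Statistics component.
struct statistics {
    long estimated_rows = 0;
};

// An assert(stats) here would abort the whole process. Throwing instead
// lets the caller log a descriptive error, even if boot still fails.
inline const statistics& require_statistics(const std::optional<statistics>& stats,
                                            const std::string& sstable_name) {
    if (!stats) {
        throw std::runtime_error("failed to read Statistics file for " + sstable_name);
    }
    return *stats;
}
```

The caller decides whether the failure is fatal; the error message names the affected sstable either way.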
Avi Kivity
e3e13878d1 Merge "Fix storage_service and gossip API" from Asias 2015-09-08 10:05:16 +03:00
Avi Kivity
6d0a2b5075 logalloc: don't invalidate merged region
A region being merged can still be in use; but after merging, compaction_lock
and the reclaim counter will no longer work.  This can lead to
use-after-compact-without-re-lookup errors.

Fix by making the source region be the same as the target region; they
will share compaction locks and reclaim counters, so lookup avoidance
will still work correctly.

Fixes #286.
2015-09-08 08:55:44 +02:00
Asias He
89f2959536 gossip: Rework stop() and shutdown()
Consolidate stop() and shutdown() into one function.

Fix crash:

scylla: urchin/seastar/core/future.hh:315: void
future_state<>::set(): Assertion `_u.st == state::future' failed.

=== stop gossip
$ curl -X DELETE --header "Accept: application/json"
"http://127.0.0.1:10000/storage_service/gossiping"

=== start gossip
$ curl -X POST --header "Content-Type: application/json" --header
"Accept: application/json"
"http://127.0.0.1:10000/storage_service/gossiping"
2015-09-08 12:20:53 +08:00
Asias He
247e9109d9 gossip: Introduce uninit_messaging_service_handler
It is useful in gossip shutdown process.
2015-09-08 12:19:06 +08:00
Asias He
312daed342 storage_service: Fix is_starting API
Query _operation_mode on CPU 0.

$ curl -X GET --header "Accept: application/json"
"http://127.0.0.1:10000/storage_service/is_starting"
2015-09-08 11:07:13 +08:00
Asias He
5e3d8a56b2 storage_service: Fix get_operation_mode API
Route request to CPU 0. _operation_mode is not replicated to other CPUs.

Without this:

$ curl -X GET --header "Accept: application/json"
"http://127.0.0.1:10000/storage_service/operation_mode"

returns "NORMAL" and "STARTING" randomly.
2015-09-08 10:55:50 +08:00
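The symptom described above (a random mix of "NORMAL" and "STARTING") can be illustrated with a plain C++ simulation. This is a sketch under assumptions: `service_instance` and both functions are hypothetical, and a vector stands in for the per-CPU instances; it does not use the real sharding API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-CPU service instance; only CPU 0 is kept up to date.
struct service_instance {
    std::string operation_mode = "STARTING";
};

// Buggy variant: answers from whichever CPU happened to receive the request.
inline std::string mode_from_local_cpu(const std::vector<service_instance>& cpus,
                                       std::size_t receiving_cpu) {
    return cpus.at(receiving_cpu).operation_mode;
}

// Fixed variant: always reads the instance on CPU 0, whoever received the request.
inline std::string mode_from_cpu_zero(const std::vector<service_instance>& cpus,
                                      std::size_t /*receiving_cpu*/) {
    return cpus.at(0).operation_mode;
}
```

With CPU 0 set to "NORMAL" and the others left at their default, the first function returns different answers depending on where the request lands, while the second is consistent.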
Asias He
0d88570286 storage_service: Fix is_gossip_running API and friends
Only the CPU 0 instance of gossip has the correct information, so route the
request to CPU 0.

Fix a bug where

$ curl -X GET --header "Accept: application/json"
 "http://172.31.5.77:10000/storage_service/gossiping"

returns true and false randomly.
2015-09-08 10:45:25 +08:00
Calle Wilund
d614143f5e Commitlog/database: Fixup series "Commit log flush request on disk overflow"
Also at seastar-dev: calle/commitlog_flush_v3
(And, yes, this time I _did_ update the remote!)

Refs #262

Commit of original series was done on stale version (v2) due to the author's
inability to multitask and update git repos.

v3:
* Removed future<> return value from callbacks. I.e. flush callback is now
  only fully synchronous over actual call
2015-09-07 21:29:19 +03:00
Gleb Natapov
0149a22f69 storage_proxy: use parallel_for_each in mutate() instead of semaphore
If several mutations in a batch throw exceptions, have_cl.broken() will be
called more than once. Fix this by dropping the ad hoc have_cl and using
parallel_for_each(), which does the same thing the current code is doing.

Fixes #297
2015-09-07 19:29:34 +03:00
Tomasz Grabiec
52828c2e84 test.py: Do not run release-mode only tests if release mode not selected 2015-09-07 19:27:33 +03:00
Avi Kivity
dee9060b12 Merge "Commit log flush request on disk overflow" from Calle
"Fixes #262

Handles CL disk size exceeding configured max size by calling flush handlers
for each dirty CF id / high replay_position mark. (Instead of uncontrolled
delete as previously).

* Increased default max disk size to 8GB. Same as Origin/scylla.yaml (so no
   real change, but synced).
* Divide the max disk size by cpus (so sum of all shards == max)
* Abstract flush callbacks in CL
* Handler in DB that initiates memtable->sstable writes when called.

Note that the flush request is done "synchronously" in new_segment() (i.e.
when getting a new segment and crossing threshold). This is however more or
less congruent with Origin, which will do a request-sync in the corresponding
case.
Actual dealing with the request should at least in production code however be
done async, and in DB it is, i.e. we initiate sstable writes. Hopefully
they finish soon, and CL segments will be released (before next segment is
allocated).

If the flush request does _not_ eventually result in any CF:s becoming
clean and segments released we could potentially be issuing flushes
repeatedly, but never more often than on every new segment."
2015-09-07 18:46:48 +03:00
Tomasz Grabiec
fecc87e601 lsa: stub allocation_section with default allocator
memory::stats() always returns 0 as free memory which confuses
guard::enter().
2015-09-07 17:23:02 +02:00
Gleb Natapov
da242146b6 do not pass storage_proxy reference across cpus
storage_proxy instances are per cpu, so they cannot be passed around to
other cpus.
2015-09-07 17:16:29 +02:00
Paweł Dziepak
03f5827570 logalloc: add missing methods to DEFAULT_ALLOCATOR version
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-09-07 16:59:27 +02:00
Paweł Dziepak
ac602b13b5 tests: fix signed/unsigned comparison
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-09-07 16:41:00 +02:00
Avi Kivity
e37dfab853 Merge "Stability improvements" from Tomasz
"Fixes #259 and other problems found along the way."
2015-09-07 16:45:44 +03:00
Gleb Natapov
327e27b67b storage_proxy: stop query timeout timer when all replies are received
Fixes #285
2015-09-07 15:21:51 +02:00
Gleb Natapov
41f16159b3 storage_proxy: track reference to storage_proxy during mutate/query operations
This patch makes sure that storage_proxy cannot be deleted while
mutate/query operation is in progress.
2015-09-07 14:46:13 +02:00
Gleb Natapov
f51f5c819e messaging: Add unregister function for verbs used by storage proxy 2015-09-07 14:46:13 +02:00
Gleb Natapov
b884aba147 storage_proxy: drop superfluous captures 2015-09-07 14:46:13 +02:00
Gleb Natapov
5af2d18b6f storage_proxy: change storage_proxy::mutate_locally to use do_with 2015-09-07 14:46:13 +02:00
Takuya ASADA
8fa868d4e9 dist: use mock rpm instead of rpmbuild
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-09-07 15:41:13 +03:00
Takuya ASADA
45502b7110 dist: Dynamically configure scylla.yaml on EC2 instance
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-09-07 15:41:13 +03:00
Takuya ASADA
a5dcc39494 dist: fix scylla_run to specify '--network-mode posix' when NETWORK_MODE is posix
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-09-07 15:40:08 +03:00
Tomasz Grabiec
1f3d7aa78c Bump up seastar submodule head
Changes:

    tests: improve rpc timeout test
    rpc: add unregister_handler
    future: Fix assertion failure in case schedule() throws
    rpc: fix rpc timeout
    build: disable -fsanitize=vptr if it is available and broken
2015-09-07 14:04:53 +02:00
Calle Wilund
380649eb66 Database: Add commitlog flush handler to switch memtables to disk
Initiates flushing of CF:s to sstable on CL disk overflow (flush req)
2015-09-07 13:21:46 +02:00
Calle Wilund
fdb921afb2 Commitlog: Add flushing of segment CF:s on disk overflow
* Do not throw away commitlog segments on disk size overflow.
  Issue a flush request (i.e. calculate the RP we want to free up to,
  and for all dirty CF:s, do a request).
  "Abstracted" as a registerable callback, i.e. it is the DB's responsibility
  to actually do something with it.
2015-09-07 13:21:43 +02:00
Calle Wilund
31f2dcb342 Config: change commitlog max size on disk to be in sync with scylla.yaml 2015-09-07 13:13:51 +02:00
Calle Wilund
841dd32a8a Commitlog: divide max on-disk-size by num cpus
To try to keep the resulting limit as configured
2015-09-07 13:13:46 +02:00
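The arithmetic behind this change is a simple split of the configured limit across CPUs. A minimal sketch, assuming a hypothetical helper name and treating the configured 8GB as 8 GiB for the example:

```cpp
#include <cstdint>

// Splits the configured on-disk limit evenly across CPUs so that the sum of
// all per-CPU limits does not exceed the configured total. Integer division
// truncates, so the sum can be slightly below the total.
inline std::uint64_t per_cpu_commitlog_limit(std::uint64_t total_max_bytes,
                                             unsigned cpu_count) {
    // Guard against division by zero; a zero count keeps the full limit.
    return cpu_count == 0 ? total_max_bytes : total_max_bytes / cpu_count;
}
```

For example, an 8 GiB total on 8 CPUs gives each CPU a 1 GiB limit.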
Asias He
f89a25562c storage_service: Fix is_auto_bootstrap
Get the value from cfg option.
2015-09-07 12:53:58 +03:00
Raphael S. Carvalho
1157e6f119 sstable: use desired buffer size in write_simple
Currently, we use a 128k buffer for creation of data and index
files, but for other components we use a 4k buffer size.
Let's also use a 128k buffer for the other components.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-09-07 12:52:26 +03:00
Tomasz Grabiec
433a298f60 row_cache: Extract comparator construction before the loop 2015-09-07 09:41:36 +02:00
Tomasz Grabiec
bf6062493e tests: Introduce tests/perf_row_cache_update 2015-09-07 09:41:36 +02:00
Tomasz Grabiec
10453c71d2 tests: perf: Make iterations between clock readings in time_it() configurable 2015-09-07 09:41:36 +02:00
Tomasz Grabiec
74603425ac mutation_partition: Introduce r-value version of apply() 2015-09-07 09:41:36 +02:00
Asias He
7cc768a864 gossip: Fix wrong cluster name and partitioner name
Right now, gossip returns hard-coded cluster and partitioner names.

  sstring get_cluster_name() {
      // FIXME: DatabaseDescriptor.getClusterName()
      return "my_cluster_name";
  }
  sstring get_partitioner_name() {
      // FIXME: DatabaseDescriptor.getPartitionerName()
      return "my_partitioner_name";
  }

Fix it by setting the correct names from the configuration options.

With this

   cqlsh 127.0.0.$i -e "SELECT * from system.local;"

returns correct cluster_name.

Fixes #291
2015-09-07 09:21:18 +03:00
Tomasz Grabiec
8a140a9ba9 mutation_partition: row: Move implementation to source file 2015-09-06 21:25:44 +02:00
Tomasz Grabiec
bf2b64d3f7 mutation_partition: row: Fix operator==()
Spotted by clion inspections.
2015-09-06 21:25:44 +02:00
Tomasz Grabiec
ba35788817 mutation_partition: De-templatize methods
Instead of accepting a column resolver callable, accept a schema and
column_kind or column_selector. Makes the interface easier to use and
enables us to move implementation to .cc file.
2015-09-06 21:25:44 +02:00
Tomasz Grabiec
49bf844418 tests: Introduce row_cache_alloc_stress
Tests stability of row_cache operations under low/fragmented memory.
2015-09-06 21:25:44 +02:00
Tomasz Grabiec
3b441416fa lsa: Make segment size publicly accessible
Some tests depend on segment size.
2015-09-06 21:25:44 +02:00
Tomasz Grabiec
49f094ad5f tests: Add test for row_cache::update() 2015-09-06 21:25:44 +02:00
Tomasz Grabiec
122bd8ea46 row_cache: Restore indentation 2015-09-06 21:25:44 +02:00
Tomasz Grabiec
d1f89b4eab row_cache: Use allocation_section
See #259.

When transferring mutations between memtable and cache, lsa sometimes
runs out of memory. This solves the first two points, keeping reserve
filled up and adjusting the amount of reserve based on execution
history.
2015-09-06 21:25:44 +02:00
Tomasz Grabiec
7efcde12aa row_cache: Introduce row_cache::touch()
Useful in tests for ensuring that certain entries survive eviction.
2015-09-06 21:25:44 +02:00
Tomasz Grabiec
24a5221280 row_cache: Avoid leaking of partitions when exception is thrown inside update() 2015-09-06 21:25:44 +02:00
Tomasz Grabiec
1d182903cd mutation_partition: Document exception guarantees of apply() 2015-09-06 21:25:44 +02:00
Tomasz Grabiec
c82325a76c lsa: Make region evictor signal forward progress
In some cases a region may be in a state where it is not empty, yet
nothing can be evicted from it. For example, when creating the first
entry, the reclaimer may get invoked during creation before the entry
gets linked. We therefore can't rely on emptiness as a stop condition
for reclamation; the eviction function shall signal whether it made
forward progress.
2015-09-06 21:25:44 +02:00
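The stop condition described in this commit can be sketched as a loop driven by the evictor's return value rather than by region emptiness. This is an illustration under assumptions: `reclaim` and the `evict_one` callback are hypothetical and do not reflect the actual lsa interface.

```cpp
#include <cstddef>
#include <functional>

// Calls `evict_one` until `target` entries are freed or the evictor reports
// that it could not make forward progress. Returns the number freed.
// The loop never inspects whether the region is empty, so a non-empty region
// with nothing evictable cannot cause it to spin forever.
inline std::size_t reclaim(std::size_t target, const std::function<bool()>& evict_one) {
    std::size_t freed = 0;
    while (freed < target && evict_one()) {
        ++freed;
    }
    return freed;
}
```

An evictor that returns false on its first call, as in the unlinked-first-entry case, makes the loop exit immediately with zero entries freed.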
Tomasz Grabiec
94f0db933f lsa: Fix typo in the word 'emergency' 2015-09-06 21:24:59 +02:00