Commit Graph

820 Commits

Author SHA1 Message Date
Avi Kivity
987294a412 Add missing copyrights 2015-09-20 10:16:11 +03:00
Avi Kivity
dcdc925b86 Revert "Commitlog: Pre-allocate "reserve" segments"
This reverts commit cbf3b63853, due to
reports of increased latency (instead of the opposite).
2015-09-19 09:26:39 +03:00
Calle Wilund
cbf3b63853 Commitlog: Pre-allocate "reserve" segments
Refs #356

Pre-allocates N segments from timer task. N is "adaptive" in that it is
increased (to a max) every time segment acquisition is forced to allocate
a new one instead of picking from the pre-alloc (reserve) list. The idea is that it is
easier to adapt how many segments we consume per timer quantum than the timer
quantum itself.

Also does disk pressure check and flush from timer task now. Note that the
check is still only done max once every new segment.

Some logging cleanup/betterment also to make behaviour easier to trace.

Reserve segments start out at zero length, and are still deleted when finished.
This is because otherwise we'd still have to clear the file to be able to
properly parse it later (given that it can be a "half" file due to power fail
etc). This might need revisiting as well.

With this patch, there should be no case (except flush starvation) where
"add_mutation" actually waits for a (potentially) blocking op (disk).
Note that since the amount of reserve is increased as needed, there will
be occasional cases where a new segment is created in the alloc path
until the system finds equilibrium. But this should only be during a brief
warmup.
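
The adaptive reserve described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the names `segment`, `segment_reserve`, `on_timer()` and `acquire()` are hypothetical, and the real commitlog creates segments asynchronously on disk rather than in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a commitlog segment (assumption, not the real type).
struct segment {
    int id;
};

// Sketch of an adaptive reserve: the timer tops the reserve up to a target N,
// and N grows (up to a max) whenever acquisition finds the reserve empty.
class segment_reserve {
    std::deque<segment> _reserve;
    std::size_t _target;                  // "N": segments the timer keeps ready
    std::size_t _max_target;              // upper bound for N
    std::size_t _forced_allocations = 0;  // allocations done in the acquire path
    int _next_id = 0;

public:
    segment_reserve(std::size_t initial_target, std::size_t max_target)
        : _target(std::min(initial_target, max_target)), _max_target(max_target) {
        assert(max_target > 0);
    }

    // Called from the (hypothetical) timer task: refill up to the current target.
    void on_timer() {
        while (_reserve.size() < _target) {
            _reserve.push_back(segment{_next_id++});
        }
    }

    // Called from the allocation path: prefer a reserved segment; otherwise
    // allocate directly and raise the target so the next timer tick keeps more.
    segment acquire() {
        if (!_reserve.empty()) {
            segment s = _reserve.front();
            _reserve.pop_front();
            return s;
        }
        ++_forced_allocations;
        _target = std::min(_target + 1, _max_target);
        return segment{_next_id++};
    }

    std::size_t available() const { return _reserve.size(); }
    std::size_t target() const { return _target; }
    std::size_t forced_allocations() const { return _forced_allocations; }
};
```

In this sketch the forced allocations only happen while the target is still catching up with demand, which corresponds to the warmup period mentioned above.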
2015-09-17 19:54:28 +03:00
Calle Wilund
ca0dac72b1 commitlog_test: fix test sync in test_commitlog_delete_when_over_disk_limit
Patch "Fix some timing/latency issues with sync" changed new_segment to
_not_ wait for flush to finish. This means that checking actual files on
disk in the test case might race.
Luckily, we can more or less just check the segment list instead
(added recently-ish)
2015-09-16 20:38:59 +03:00
Raphael S. Carvalho
1bd3a2d4bc sstable: create temporary TOC at an early stage
Currently, we create a temporary TOC file after we are done writing
all the other components. However, we want to create a temporary
TOC before starting to write any other component.
So if there is a missing TOC, there is likely to be a corruption,
so we should refuse to boot and provide the sysadmin with a
detailed message. If there is a temporary TOC, it means that there
was a sudden shutdown while the sstable was being written.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-09-13 03:02:17 -03:00
Avi Kivity
13f86823f9 Merge "Enable persistence in tests using cql_test_env" from Tomasz
"The motivation is to exercise more code during tests, and possibly also avoid
some special casing just for tests in the future. Sstables will be persisted
in a unique temporary directory which is auto-removed when environment is
torn down."
2015-09-10 17:38:26 +03:00
Paweł Dziepak
6a0d4e3ade client_state: verify that keyspace exists
Fixes #323.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-09-10 13:58:48 +03:00
Tomasz Grabiec
ac57680b08 tests: cql_test_env: Enable persistence in tests
Gives more coverage.
2015-09-09 12:58:28 +02:00
Tomasz Grabiec
a5c46c26ce tests: cql_test_env: Move initialization to start()
It's easier to set members directly rather than pass them to the
constructor of in_memory_cql_env. Plus, stop() now matches start() and
not an external function.
2015-09-09 12:58:09 +02:00
Tomasz Grabiec
7fb0806ba2 tests: Add missing include to sstable_test.hh
Broken by 320ff132f8.
2015-09-09 12:36:00 +02:00
Tomasz Grabiec
db2a82a693 tests: cql_test_env: Move initialization helpers to the top 2015-09-09 12:28:54 +02:00
Tomasz Grabiec
b2c2eb6cd2 tests: Add test exploiting flush while scanning issue 2015-09-09 10:38:43 +02:00
Tomasz Grabiec
b5845e96e5 tests: Fix liveness issue in mutation_test 2015-09-09 10:38:43 +02:00
Calle Wilund
ee2a479731 CQL Test Env: Fixup for test shutdown errors caused by shutdown patch
Refs #293

Even more horrible than the shutdown patch. Tests using cql_test_env
are dependent on init.cc functions, but when scylla stopped being shut down
properly, those tests did too -> assert in sharded.hh

Yet another temp patch, simply duplicating the init.cc code for cql_test_env
to ensure we get what we think.
2015-09-09 10:15:11 +03:00
Avi Kivity
8405aa1c95 Merge "Add decimal type" from Paweł
"These patches add support for decimal type.

Fixes #146."
2015-09-08 19:03:37 +03:00
Paweł Dziepak
e1d4acdcf6 tests/cql: add decimal type to all types test
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-09-08 16:37:24 +02:00
Paweł Dziepak
8cae940d07 tests/types: add tests for decimal type
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-09-08 16:04:48 +02:00
Tomasz Grabiec
b4835756fd tests: Fix compilation error
Introduced in 920fe4278a
2015-09-08 12:52:30 +02:00
Tomasz Grabiec
920fe4278a Cleanup leftovers after compaction_counter to reclaim_counter rename 2015-09-08 10:19:19 +02:00
Tomasz Grabiec
15ae1a92cb Merge branch 'pdziepak/compaction-remove-items/v4' from seastar-dev.git
From Pawel:

This series makes compaction remove items that are no longer needed:
 - expired cells are changed into tombstones
 - items covered by higher level tombstones are removed
 - expired tombstones are removed if possible

Fixes #70.
Fixes #71.
2015-09-08 09:23:00 +02:00
Paweł Dziepak
b17f5c442f tests/sstable: uncomment part of compaction test
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-09-07 21:21:38 +02:00
Paweł Dziepak
969fe6b878 sstables: make compact_sstables() take ref to column_family
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-09-07 21:20:32 +02:00
Paweł Dziepak
5fa42d6b5f tests/sstables: construct schema using schema_builder
schema_builder is necessary to set gc_grace_period.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-09-07 21:20:32 +02:00
Calle Wilund
d614143f5e Commitlog/database: Fixup series "Commit log flush request on disk overflow"
Also at seastar-dev: calle/commitlog_flush_v3
(And, yes, this time I _did_ update the remote!)

Refs #262

Commit of original series was done on stale version (v2) due to author's
inability to multitask and update git repos.

v3:
* Removed future<> return value from callbacks. I.e. flush callback is now
  only fully synchronous over actual call
2015-09-07 21:29:19 +03:00
Avi Kivity
dee9060b12 Merge "Commit log flush request on disk overflow" from Calle
"Fixes #262

Handles CL disk size exceeding configured max size by calling flush handlers
for each dirty CF id / high replay_position mark. (Instead of uncontrolled
delete as previously).

* Increased default max disk size to 8GB. Same as Origin/scylla.yaml (so no
   real change, but synced).
* Divide the max disk size by cpus (so sum of all shards == max)
* Abstract flush callbacks in CL
* Handler in DB that initiates memtable->sstable writes when called.

Note that the flush request is done "synchronously" in new_segment() (i.e.
when getting a new segment and crossing threshold). This is however more or
less congruent with Origin, which will do a request-sync in the corresponding
case.
Actual dealing with the request should at least in production code however be
done async, and in DB it is, i.e. we initiate sstable writes. Hopefully
they finish soon, and CL segments will be released (before next segment is
allocated).

If the flush request does _not_ eventually result in any CF:s becoming
clean and segments released we could potentially be issuing flushes
repeatedly, but never more often than on every new segment."
2015-09-07 18:46:48 +03:00
Paweł Dziepak
ac602b13b5 tests: fix signed/unsigned comparison
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-09-07 16:41:00 +02:00
Avi Kivity
e37dfab853 Merge "Stability improvements" from Tomasz
"Fixes #259 and other problems found along the way."
2015-09-07 16:45:44 +03:00
Calle Wilund
fdb921afb2 Commitlog: Add flushing of segment CF:s on disk overflow
* Do not throw away commitlog segments on disk size overflow.
  Issue a flush request (i.e. calculate RP we want to free up to,
  and for all dirty CF:s, do a request).
  "Abstracted" as registerable callback. I.e. DB:s responsibility
  to actually do something with it.
2015-09-07 13:21:43 +02:00
Tomasz Grabiec
bf6062493e tests: Introduce tests/perf_row_cache_update 2015-09-07 09:41:36 +02:00
Tomasz Grabiec
10453c71d2 tests: perf: Make iterations between clock readings in time_it() configurable 2015-09-07 09:41:36 +02:00
Asias He
7cc768a864 gossip: Fix wrong cluster name and partitioner name
Right now, gossip returns hard-coded cluster and partitioner names.

  sstring get_cluster_name() {
      // FIXME: DatabaseDescriptor.getClusterName()
      return "my_cluster_name";
  }
  sstring get_partitioner_name() {
      // FIXME: DatabaseDescriptor.getPartitionerName()
      return "my_partitioner_name";
  }

Fix it by setting the correct names from the configuration options.

With this

   cqlsh 127.0.0.$i -e "SELECT * from system.local;"

returns correct cluster_name.

Fixes #291
2015-09-07 09:21:18 +03:00
Tomasz Grabiec
49bf844418 tests: Introduce row_cache_alloc_stress
Tests stability of row_cache operations under low/fragmented memory.
2015-09-06 21:25:44 +02:00
Tomasz Grabiec
49f094ad5f tests: Add test for row_cache::update() 2015-09-06 21:25:44 +02:00
Tomasz Grabiec
c82325a76c lsa: Make region evictor signal forward progress
In some cases region may be in a state where it is not empty and
nothing could be evicted from it. For example when creating the first
entry, reclaimer may get invoked during creation before it gets
linked. We therefore can't rely on emptiness as a stop condition for
reclamation, the eviction function shall signal us if it made forward
progress.
2015-09-06 21:25:44 +02:00
Tomasz Grabiec
704cfc13d8 tests: cql_query_test: Init local cache only once
It's a singleton, so we can't attempt to init it more than once.

Fixes cql_query_test failure:

/home/tgrabiec/src/urchin2/seastar/core/future.hh:315: void future_state<>::set(): Assertion `_u.st == state::future' failed.
unknown location(0): fatal error in "test_create_table_statement": signal: SIGABRT (application abort requested)
seastar/tests/test-utils.cc(31): last checkpoint
2015-09-04 20:01:55 +02:00
Glauber Costa
b1c59ab995 sstable_mutation_test: test condition related to #188
This patch tests that collections within a mutation behave properly.
That is what led to #188.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-09-02 06:01:39 +03:00
Tomasz Grabiec
870e9e5729 lsa: Replace compaction_lock with broader reclaim_lock
Disabling compaction of a region is currently done in order to keep
the references valid. But disabling only compaction is not enough, we
also need to disable eviction, as it also invalidates
references. Rather than introducing another type of lock, compaction
and eviction are controlled together, generalized as "reclaiming"
(hence the reclaim_lock).
2015-09-01 17:29:04 +03:00
Tomasz Grabiec
3115a1aaa0 tests: logalloc_test: Disable test_compaction_lock with default allocator
It relies on the fact that the process has a fixed amount of memory
assigned and std::bad_alloc is thrown in a timely manner when it fills
up, which is the case for seastar's allocator, but not with the
default allocator. With the latter the OOM killer kills the process.
2015-09-01 15:17:43 +03:00
Tomasz Grabiec
66fcff8ff9 tests: Introduce tests for lsa eviction 2015-08-31 21:57:23 +02:00
Tomasz Grabiec
2d6d15308e tests: logalloc_test: Add test for compaction_lock 2015-08-31 21:50:17 +02:00
Tomasz Grabiec
29e33dee4a tests: mutation_test: Restore indentation 2015-08-31 21:50:17 +02:00
Tomasz Grabiec
ff8c81b25f memtable: Encapsulate unsafe accessors 2015-08-31 21:50:17 +02:00
Calle Wilund
9ba84e458a Commitlog: Handle partial writes in segment::cycle
* Fixes #247
* Re-introduce test_allocation_failure, but allow for the "failure" to not
  happen. I.e. if run with low memory settings, the test will check that
  allocation failure is graceful. With lots of memory it will check partial
  write.
2015-08-31 20:02:05 +03:00
Paweł Dziepak
78eb61b38e tests: add test for managed_vector
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-31 17:29:16 +02:00
Paweł Dziepak
f1167a594a tests/cql_env: make sure that value views are correct
Query options need to have correct _value_views in order for
get_value_at() to work. With this patch we switch to constructor that
generates value views from the passed values and sets remaining options
to their default values.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-31 17:25:36 +02:00
Paweł Dziepak
4b9791230a tests/perf/simple_query: fix write mode
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-31 17:25:32 +02:00
Avi Kivity
f2a79aa7f6 Merge Prepare for closing sstables, part 1
Read-ahead will require that we close input_streams.  As part of that
we have to close sstables, and mutation_readers (which encapsulate
input_streams).  This is part 1 of a patchset series to do that.

(The overarching goal is to enable read-ahead for sstables, see #244)

Conflicts:
	sstables/compaction.cc
2015-08-31 16:15:18 +03:00
Avi Kivity
702de43ce3 Merge "Commit log replay" from Calle
"Initial implementation/transposition of commit log replay.

* Changes replay position to be shard aware
* Commit log segment ID:s now follow basically the same scheme as origin;
  max(previous ID, wall clock time in ms) + shard info (for us)
* SStables now use the DB definition of replay_position.
* Stores and propagates (compaction) flush replay positions in sstables
* If CL segments are left over from a previous run, they, and existing
  sstables are inspected for high water mark, and then replayed from
  those marks to amend mutations potentially lost in a crash
* Note that CPU count change is "handled" insofar as shard matching is
  per the _previous_ run's shards, not the current ones.
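
The segment ID scheme in the list above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the message does not say how shard info is encoded, so this example packs the shard number into the top 16 bits of a 64-bit ID, and it uses `previous + 1` rather than the bare previous ID so that IDs stay strictly increasing. The helper names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

using segment_id = std::uint64_t;

// Assumption: the low 48 bits hold the time-based base, the high 16 bits the shard.
constexpr unsigned shard_shift = 48;
constexpr segment_id base_mask = (segment_id(1) << shard_shift) - 1;

// Next base ID: at least the wall clock (ms), but never at or below the previous base.
inline segment_id next_base(segment_id previous_base, std::uint64_t wall_clock_ms) {
    return std::max<segment_id>(previous_base + 1, wall_clock_ms);
}

// Combine a base ID with the owning shard (hypothetical encoding).
inline segment_id make_segment_id(segment_id base, unsigned shard) {
    return (segment_id(shard) << shard_shift) | (base & base_mask);
}

// Recover the shard a segment ID was created on.
inline unsigned shard_of(segment_id id) {
    return static_cast<unsigned>(id >> shard_shift);
}

// Recover the time-based base part of a segment ID.
inline segment_id base_of(segment_id id) {
    return id & base_mask;
}
```

With an encoding along these lines, a replayer can read the originating shard back out of a leftover segment's ID, which is what matching against the previous run's shards requires.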

Known limitations:
* Mutations deserialized from old CL segments are _not_ fully validated
  against existing schemas.
* System::truncated_at (not currently used) does not handle sharding afaik,
  so watermark ID:s coming from there are dubious.
* Mutations that fail to apply (invalid, broken) are not placed in blob files
  like origin. Partly because I am lazy, but also partly because our serial
  format differs, and we currently have no tools to do anything useful with it
* No replay filtering (Origin allows a system property to designate a filter
  file, detailing which keyspace/cf:s to replay). Partly because we have no
  system properties.

There is no unit test for the commit log replayer (yet).
Because I could not really come up with a good one given the test
infrastructure that exists (tricky to kill stuff just "right").
The functionality is verified by manual testing, i.e. running scylla,
building up data (cassandra-stress), kill -9 + restart.
This of course does not really fully validate whether the resulting DB is
100% valid compared to the one at kill -9, but at least it verifies that replay
took place, and mutations were applied.
(Note that origin also lacks validity testing)"

Fixes #98.
2015-08-31 15:58:12 +03:00
Avi Kivity
7090dffe91 mutation_reader: switch to a class based implementation
Using a lambda for implementing a mutation_reader is nifty, but does not
allow us to add methods.

Switch to a class-based implementation in anticipation of adding a close()
method.
2015-08-31 15:53:53 +03:00
Calle Wilund
e068ffb5a5 Commitlog: Make file reader provide replay_position for entries 2015-08-31 14:29:47 +02:00