Commit Graph

372 Commits

Author SHA1 Message Date
Calle Wilund
3f1a91b89c Commitlog: do not eagerly create first segment on init
Deferring makes it easier to separate old segments from new, which in turn
helps replay logic.
2015-08-31 13:11:44 +02:00
Avi Kivity
5f62f7a288 Revert "Merge "Commit log replay" from Calle"
Due to test breakage.

This reverts commit 43a4491043, reversing
changes made to 5dcf1ab71a.
2015-08-27 12:39:08 +03:00
Asias He
80c996a315 db/system_keyspace: Fix get_local_host_id
Before:
host_id in system.local is empty

After:
host_id in system.local is inserted correctly

This fixes a hasty problem that we always get a new host_id when
booting up a node with data.
2015-08-27 11:01:07 +03:00
Avi Kivity
43a4491043 Merge "Commit log replay" from Calle
"Initial implementation/transposition of commit log replay.

* Changes replay position to be shard aware
* Commit log segment ID:s now follow basically the same scheme as origin;
  max(previous ID, wall clock time in ms) + shard info (for us)
* SStables now use the DB definition of replay_position.
* Stores and propagates (compaction) flush replay positions in sstables
* If CL segments are left over from a previous run, they, and existing
  sstables are inspected for high water mark, and then replayed from
  those marks to amend mutations potentially lost in a crash
* Note that CPU count change is "handled" in so much that shard matching is
  per _previous_ runs shards, not current.

Known limitations:
* Mutations deserialized from old CL segments are _not_ fully validated
  against existing schemas.
* System::truncated_at (not currently used) does not handle sharding afaik,
  so watermark ID:s coming from there are dubious.
* Mutations that fail to apply (invalid, broken) are not placed in blob files
  like origin. Partly because I am lazy, but also partly because our serial
  format differs, and we currently have no tools to do anything useful with it
* No replay filtering (Origin allows a system property to designate a filter
  file, detailing which keyspace/cf:s to replay). Partly because we have no
  system properties.

There is no unit test for the commit log replayer (yet).
Because I could not really come up with a good one given the test
infrastructure that exists (tricky to kill stuff just "right").
The functionality is verified by manual testing, i.e. running scylla,
building up data (cassandra-stress), kill -9 + restart.
This of course does not really fully validate whether the resulting DB is
100% valid compared to the one at k-9, but at least it verified that replay
took place, and mutations where applied.
(Note that origin also lacks validity testing)"
2015-08-27 10:53:36 +03:00
Glauber Costa
391eea564e system_tables: implement load_host_id
A simple translation from the original code.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-25 19:16:30 -05:00
Glauber Costa
0fd2861293 system_tables: implement load_tokens
A simple translation from the original code

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-25 19:16:30 -05:00
Calle Wilund
2a1c7d2587 CommitLogReplayer: Java -> C++
Initial implementation
2015-08-25 09:41:56 +02:00
Calle Wilund
86a97fea4c Commitlog: Allow skipping X bytes in commit log reader
Also refactor reader into named methods for debugging sanity.
2015-08-25 09:41:55 +02:00
Calle Wilund
37cfc09e91 Commitlog: Handle full paths in descriptor file name parse. 2015-08-25 09:41:55 +02:00
Calle Wilund
4364d72ca3 Commitlog: Expose convinience method "list_existing_segments" 2015-08-25 09:41:54 +02:00
Calle Wilund
a3a02968ab Commitlog: Expose list_existing_descriptors 2015-08-25 09:41:54 +02:00
Calle Wilund
fcb87471b9 Commitlog: Make file reader provide replay_position for entries 2015-08-25 09:40:53 +02:00
Calle Wilund
db6370ad87 Commitlog: Make descriptor type visible/usable from outside 2015-08-25 09:40:53 +02:00
Calle Wilund
4f24b9795e Commitlog: change the ID generation scheme
* Make it more like origin, i.e. based on wall clock time of app start
* Encode shard ID in the, RP segement ID, to ensure RP:s and segement names
  are unique per shard
2015-08-25 09:40:52 +02:00
Asias He
22ee468428 db/system_keyspace: Fix set_bootstrap_state
We set status to COMPLETED in join_token_ring

   set_bootstrap_state(db::system_keyspace::bootstrap_state::COMPLETED)

but

   cqlsh 127.0.0.$i -e "SELECT * from system.local;"

shows

    bootstrapped -> IN_PROGRESS

The static sstring state_name is the bad boy.
2015-08-24 18:54:42 +08:00
Avi Kivity
8a4648761c tests: make test cql environment use volatile system keyspace
Prevents hangs due to the database not being able to persist a memtable.

Tested-by: Asias He <asias@cloudius-systems.com>
2015-08-24 13:50:22 +03:00
Pekka Enberg
9476e3f19f db/index: Kill Java code
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 11:51:50 +03:00
Pekka Enberg
544c7936d8 db/commitlog: Kill Java code
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 11:51:49 +03:00
Calle Wilund
ac74dd6159 Commitlog: Make "position" type 32-bit to align replay_position with Origin 2015-08-24 10:05:44 +02:00
Calle Wilund
d50986ef31 Commitlog: do not eagerly create first segment on init
Deferring makes it easier to separate old segments from new, which in turn
helps replay logic.
2015-08-24 10:05:44 +02:00
Pekka Enberg
5dbf1baed4 db/composites: Kill Java code
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-24 10:56:30 +03:00
Avi Kivity
7b67b04822 db: wire up max memtable size configuration 2015-08-19 13:17:27 +03:00
Asias He
67953a65b6 db/system_keyspace: Stub load_host_ids 2015-08-18 17:06:03 +08:00
Asias He
ab40ab6c19 db/system_keyspace: Stub load_tokens 2015-08-18 17:06:02 +08:00
Asias He
7f98a89968 db/system_keyspace: Introduce init_local_cache 2015-08-18 17:06:02 +08:00
Glauber Costa
0177c7fed1 system keyspace: implement get_bootstrap_state
To avoid spreading the futures all over, we will resort to a cache with this,
the same way we did for the dc/rack information.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-17 11:03:37 -07:00
Glauber Costa
20590db87f system keyspace: implement set_bootstrap_state
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-17 11:03:37 -07:00
Glauber Costa
8a50534119 system keyspace: implement get_saved_tokens
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-17 11:03:37 -07:00
Glauber Costa
6a682d0e49 storage_service: futurize get_tokens
Because all its users are already futurized, this is actually an easy one.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-17 11:03:37 -07:00
Glauber Costa
bebb2abe4b system keyspace: factor out local_cache start code
It will now be used for other values as well.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-17 11:03:36 -07:00
Calle Wilund
8f0f4e7945 Commitlog: do more extensive dir entry probes to determine type
Since directory_entry "type" might not be set.
Ensuring that code does not remain future free or easy to read.

Fixes #157.
2015-08-17 16:56:31 +03:00
Vlad Zolotarov
4e55033dc9 db::config: improve a help output for --endpoint_snitch parameter
- Improve the output formating.
   - Comment out not supported snitches.

Fixes issue #124

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-08-16 12:55:34 +03:00
Avi Kivity
11bf4efc72 Merge "Some changes to deal with allocation failures in CL" from Calle
"Related to 108
Does not fix the problem (fully at least), but at least:
* Throws exceptions instead of crashing
* Tries to back off slighly (allocate less) if possible
* Logs it

Also recycles segments to keep them from being fragmented by mem system"
2015-08-12 17:47:25 +03:00
Calle Wilund
2db7791c6a Commitlog: Attempt to reduce allocation size for segment if alloc fails 2015-08-12 16:20:12 +02:00
Calle Wilund
4fe98d3acf Commitlog: Throw bad_alloc on memalign fail (avoid sigsegv later) 2015-08-12 16:20:11 +02:00
Calle Wilund
7191a130bb Commitlog: recycle buffers to reduce fragmentation. 2015-08-12 16:20:11 +02:00
Gleb Natapov
0b3d2de2f1 Fix mutation write timeout exception reporting
Make it compatible with CQL specification
2015-08-12 14:58:48 +03:00
Asias He
ce927105d8 db/system_keyspace: Implement update_local_tokens 2015-08-12 07:50:26 +08:00
Asias He
95dd307597 db/system_keyspace: Remove duplicated commented out code
I'm not sure what happened. We have the same commented code in both .hh
and .cc. It is very confusing when enabling some of the code. Let's
remove the duplicated code in .cc and leave the in .hh only.
2015-08-12 07:50:26 +08:00
Asias He
96fe749141 db/system_keyspace: Stub get_bootstrap_state and friends 2015-08-12 07:50:26 +08:00
Asias He
5cb5050ca1 system_keyspace: Stub get_saved_tokens 2015-08-12 07:50:26 +08:00
Calle Wilund
9a52ad84b1 BatchlogManager: make blm globally reachable distributed like other objects 2015-08-11 17:10:17 +02:00
Calle Wilund
0ded44eeee BatchlogManager: make endpoint_filter method + implement 2015-08-11 17:10:16 +02:00
Calle Wilund
b7cdd189e7 BatchlogManager: make constructible from distributed<db> (to fit main init) 2015-08-11 09:46:59 +02:00
Calle Wilund
6ac6d644be Commitlog: add logging
Note: pretty lame logging, but modeled after origin.
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2015-08-10 18:42:41 +03:00
Shlomi Livne
cd57f2e8c4 Enable num_tokens in config
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-08-10 18:04:40 +03:00
Pekka Enberg
3bac70cb59 db/schema_tables: Fix use-after-free in create_table_from_table_partition()
Fixes #119 and fixes #120.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-10 14:44:50 +03:00
Avi Kivity
a8ff8ea442 commitlog: switch to faster crc32 implementation 2015-08-09 00:05:36 +03:00
Glauber Costa
92031be642 index_interval: another field for schema_columnfamilies
There is another field I missed, index_interval. It is not actually used for
2.1.8 - so that's why it is easy to stop, but it at least exists.

2.1.8 already has "min_index_interval" and "max_index_interval". If we see a
table that contains index_interval, that will become "min_index_interval".

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 09:30:54 -05:00
Glauber Costa
0237a73e05 system_keyspace: make collections multi-cell
They are multi-cell in Origin. This has nothing to do with 2.2 vs 2.1,
and it is just a plain bug.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 09:30:54 -05:00