Commit Graph

6677 Commits

Author SHA1 Message Date
Avi Kivity
19f36cd3cc Merge "Batchlog manager - run loop on only one shard" from Calle
"* Runs the batchlog loop on only the main cpu, but distributes the actual work
   to each available shard in round-robin fashion.
 * Use gate to guard work loop instead of semaphore (better shutdown,
   eventually)
 * Actually _start_ the batch loop (not done previously)
 * Rename logger + add cpu# hint"

Fixes #424
2015-10-07 16:52:10 +03:00
Calle Wilund
a4c14d3d1d batchlog_manager: Add hint of which cpu timer callback is running on 2015-10-07 14:57:55 +02:00
Calle Wilund
6416c62d39 main: Actually start the batchlog_manager service loop
Was not invoked previously.
2015-10-07 14:30:09 +02:00
Calle Wilund
b46496da34 batchlog_manager: Rename logger
* More useful/referrable on command line (--log*)
* Matches class name (though not origin)
2015-10-07 14:30:09 +02:00
Calle Wilund
6f94a3bdad batchlog_manager: Use gate instead of semaphore
Since that exists now.
2015-10-07 14:30:09 +02:00
Calle Wilund
874da0eb67 batchlog_manager: Run timer loop on only one shard
Since replay is a "node global" operation, we should not attempt to
do it in parallel on each shard. It will just overlap/interfere.
We could just run this on cpu 0, but since this _could_ be a
lengthy operation, each timer callback is round-robined across shards, just in case...
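A minimal sketch of the round-robin hand-off described above, assuming a fixed shard count known up front. The class `round_robin_dispatcher` and its `next_shard()` method are hypothetical names for illustration, not the actual batchlog_manager code:

```cpp
#include <cstddef>

// Hypothetical model: the timer loop lives on a single shard and hands each
// replay run to the next shard in turn, wrapping around at shard_count.
class round_robin_dispatcher {
public:
    explicit round_robin_dispatcher(std::size_t shard_count)
        : _shard_count(shard_count) {}

    // Called from the single timer callback: returns the shard that should
    // perform this run, then advances the cursor for the next run.
    std::size_t next_shard() {
        std::size_t shard = _next;
        _next = (_next + 1) % _shard_count;
        return shard;
    }

private:
    std::size_t _shard_count;
    std::size_t _next = 0;
};
```

In a Seastar application the chosen shard would then presumably receive the work through something like `smp::submit_to(shard, ...)`; the sketch only covers shard selection, so only one shard ever runs the loop while the work itself still rotates.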
2015-10-07 14:30:09 +02:00
Avi Kivity
a151268bfe Merge 2015-10-07 14:35:02 +03:00
Calle Wilund
246e8e24f2 replay_position: Make <= comparator simpler and cleaner 2015-10-07 14:34:22 +03:00
Avi Kivity
eccbf85e9d Merge "Truncation records per shard"
Fixes #423

"Changes the "truncated_at" blob contents of system.local table. It now stores
N replay_positions, where N == # shards.

The system.local table schema remains unchanged, and older truncation data
is accepted, though it will for obvious reasons still be insufficient.

Since the data is opaque to the running instance, blob compatibility with
origin should be irrelevant (and we're not really that now anyway).

Note that technically, changing the shard count in between runs could make us hold
on to RP data "longer than required", but this is
a.) Insignificant data sizes
b.) Data that is valid exactly once: When restarting a failed node and
    replaying. The "shards" only refer to "last run", and after that we don't
    care. At worst, we can get less than fresh data (not all shards manage
    to save truncation records before crash).

It is worth noting (and I've done so in the code) that the system.local table
+ sharding cause some rather silly inefficiencies, since for this (and others)
we store a value for each shard, and each save causes a global flush of the
systable, in turn delegated to all cores. So the op is N^2 in "db complexity".
At some point we should maybe consider whether operations like "drop table" and
"truncate" should be done not on shard level but on machine level, so they can
coordinate themselves. But on the other hand, they are rare and not _very_ expensive either."
2015-10-07 14:33:22 +03:00
Avi Kivity
c48a826c65 db: fix string type incorrectly unvalidated
We call the conversion function that expects a NUL-terminated string,
but provide a string view, which is not NUL-terminated.

Fix by using the begin/end variant, which doesn't require a NUL terminator.
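The commit does not name the conversion function, so the sketch below uses a hypothetical ASCII check, `is_valid_ascii()`, as a stand-in. It shows why a `[begin, end)` variant is safe for a `std::string_view` while a NUL-terminated variant is not:

```cpp
#include <algorithm>
#include <string_view>

// Hypothetical validator over an explicit [begin, end) range. It never reads
// past `end`, so the input does not need a NUL terminator.
bool is_valid_ascii(const char* begin, const char* end) {
    return std::all_of(begin, end, [](char c) {
        return static_cast<unsigned char>(c) < 0x80;
    });
}

// A string_view is just (pointer, length); data() is not guaranteed to be
// followed by '\0'. A NUL-terminated API handed v.data() would keep scanning
// beyond v.size(). Passing both ends keeps validation inside the view.
bool validate(std::string_view v) {
    return is_valid_ascii(v.data(), v.data() + v.size());
}
```

For a view over the first two bytes of `{'o', 'k', '\xff'}`, the range-based check accepts the view, whereas a NUL-terminated scan would also read the trailing `'\xff'` (and beyond) and give a wrong answer.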

Fixes #437.
2015-10-07 12:22:01 +02:00
Calle Wilund
a66c22f1ec commitlog_replayer: Acquire truncation RP:s per replayed shard
I.e. get them in bulk and fill in for all shards
2015-10-07 09:00:22 +02:00
Calle Wilund
17bd18b59c commitlog_replayer: Add logging message for exceptions in multi-file recover 2015-10-07 08:59:54 +02:00
Calle Wilund
3f1fa77979 commitlog_replayer: Fix broken comparison
A commitlog entry should be ignored if its position is <= highest recorded
position, not <.
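To illustrate why `<` versus `<=` matters, here is a sketch of a replay position ordered by segment id first and then by offset. The field names `id` and `pos` and the helper `should_skip()` are assumptions for illustration:

```cpp
#include <cstdint>
#include <tuple>

// Assumed shape: a commitlog segment id plus an offset within that segment,
// compared lexicographically (segment first, then offset).
struct replay_position {
    uint64_t id = 0;
    uint32_t pos = 0;

    bool operator<(const replay_position& o) const {
        return std::tie(id, pos) < std::tie(o.id, o.pos);
    }
    // a <= b expressed as !(b < a).
    bool operator<=(const replay_position& o) const {
        return !(o < *this);
    }
};

// An entry at or below the recorded position is already covered by it (for
// example by a truncation record), so it must not be replayed. With a plain
// `<`, the entry exactly at the recorded position would be replayed again.
bool should_skip(const replay_position& entry, const replay_position& recorded) {
    return entry <= recorded;
}
```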
2015-10-07 08:59:53 +02:00
Calle Wilund
271eb3ba02 replay_position: Add <= comparator 2015-10-07 08:59:53 +02:00
Calle Wilund
6b0ab79ecb system_keyspace: Keep per-shard truncation records
Fixes #423
* CF ID now maps to a truncation record consisting of a set of
  per-shard RP:s and a high-mark timestamp
* RP:s are retrieved in "bulk"
* Truncation time is calculated as the max of all shards.

This version of the patch will accept "old" truncation data, though the
result of applying it will most likely not be correct (just one shard).

Record is still kept as a blob; the "new" format is indicated by
record size.
2015-10-07 08:59:52 +02:00
Calle Wilund
199b72c6f3 commitlog: fix broken reader "offset" handling + ensure exceptions propagate
We must still find a chunk/entry boundary even when run
with a start offset, since file navigation is chunk based.
Was not observed as broken previously because
1.) We did not run with offsets
2.) The exception never reached caller.

Also make the reader silently ignore empty files.
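A simplified, in-memory sketch of reading from a start offset in a chunk-based file. The `entry`/`chunk` structs and `read_from()` are hypothetical; the real commitlog on-disk layout differs. The point is that the reader walks chunk and entry boundaries and skips what lies before the offset, instead of seeking straight to it:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model: each chunk covers [start, end) and holds entries that
// begin at known positions inside it.
struct entry { uint64_t pos; std::string data; };
struct chunk { uint64_t start; uint64_t end; std::vector<entry> entries; };

// Return the payloads of all entries starting at or after `offset`. Seeking
// directly to `offset` could land in the middle of an entry, so boundaries
// are followed from the beginning and earlier entries are skipped.
// An empty segment yields an empty result rather than an error.
std::vector<std::string> read_from(const std::vector<chunk>& segment, uint64_t offset) {
    std::vector<std::string> out;
    for (const chunk& c : segment) {
        if (c.end <= offset) {
            continue;  // the whole chunk precedes the start offset
        }
        for (const entry& e : c.entries) {
            if (e.pos >= offset) {
                out.push_back(e.data);
            }
        }
    }
    return out;
}
```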
2015-10-07 08:54:49 +02:00
Calle Wilund
024041c752 commitlog: make log message slightly more informative/correct 2015-10-07 08:54:49 +02:00
Calle Wilund
f7151cac61 cql3::untyped_result_set: Allow "get_map" to be explicit about result type

Allow providing both hash/equal etc for resulting map, as well
as explicit data_types for the deserialization.
Also allow direct extraction of kv-pairs to iterator, for more advanced
unpacking.
2015-10-07 08:54:49 +02:00
Avi Kivity
29106ab802 Merge seastar upstream
* seastar 4a3071e...fba8ac6 (3):
  > stream.hh: Fix broken "set_exception".
  > configure.py: fix use of "echo -e"
  > deannoyify touch_directory
2015-10-07 09:44:58 +03:00
Gleb Natapov
358d93112f replace ad-hoc cql connection polling with new batch_flush() output stream API 2015-10-06 19:22:23 +03:00
Pekka Enberg
b40999b504 database: Fix drop_column_family() UUID lookup race
Remove the about-to-be-dropped CF from the UUID lookup table before
truncating and stopping it. This closes a race window where new
operations based on the UUID might be initiated after truncate
completes.
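A synchronous sketch of the ordering this fix relies on: unpublish the CF from the id lookup table first, then truncate through a reference that was kept. The `database` and `column_family` types here are hypothetical simplifications; the real code is asynchronous (futures), and the table id is modelled as a string. The same idea motivates the truncate() variant that takes keyspace and column_family references, further down in this log:

```cpp
#include <memory>
#include <string>
#include <unordered_map>

struct column_family {
    bool truncated = false;
    void truncate() { truncated = true; }
};

class database {
public:
    void add(const std::string& id) {
        _by_id[id] = std::make_shared<column_family>();
    }

    std::shared_ptr<column_family> find(const std::string& id) const {
        auto it = _by_id.find(id);
        if (it == _by_id.end()) {
            return nullptr;
        }
        return it->second;
    }

    // Remove from the lookup table first: once the id is gone, no new
    // operation can find this CF while truncation is still running.
    std::shared_ptr<column_family> drop(const std::string& id) {
        auto cf = find(id);
        if (!cf) {
            return nullptr;
        }
        _by_id.erase(id);
        cf->truncate();  // uses the kept reference, not a lookup by id
        return cf;
    }

private:
    std::unordered_map<std::string, std::shared_ptr<column_family>> _by_id;
};
```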

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 17:10:17 +02:00
Pekka Enberg
5878f62b18 db/schema_tables: Clean up indentation
Almost the whole file is (accidentally) indented four spaces to the
right for no reason. Fix that up because it's annoying as hell.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 17:09:27 +02:00
Pekka Enberg
1f9e769dd3 db/schema_tables: Remove obsolete ifdef'd code
Remove ifdef'd code that we won't be converting to C++ because of design
differences.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 17:09:27 +02:00
Avi Kivity
75dd123d01 Merge "CQL DROP KEYSPACE support" from Pekka
"This patch series implements support for CQL DROP KEYSPACE and makes the
test_keyspace CQL test in dtest pass:

  [penberg@nero urchin-dtest]$ nosetests -v cql_tests.py:TestCQL.keyspace_test
  keyspace_test (cql_tests.TestCQL) ... ok

  ----------------------------------------------------------------------
  Ran 1 test in 12.166s

  OK

  [penberg@nero urchin-dtest]$ nosetests -v cql_tests.py:TestCQL.table_test
  table_test (cql_tests.TestCQL) ... ok

  ----------------------------------------------------------------------
  Ran 1 test in 23.841s

  OK"
2015-10-06 15:19:33 +03:00
Pekka Enberg
da7b741f64 service/migration_manager: Implement announce_keyspace_drop()
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 14:53:35 +03:00
Pekka Enberg
6e304cd58c db/schema_tables: Fix merge_keyspaces() to actually drop keyspaces
When we query schema keyspaces after we have applied a delete mutation,
the dropped keyspace does not exist in the "after" result set. Fix the
merge_keyspaces() algorithm to take that into account.

Makes merge_keyspaces() actually call database::drop_keyspace() when a
keyspace is dropped.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 14:53:35 +03:00
Pekka Enberg
5d9d1e28cb db/schema_tables: Implement make_drop_keyspace_mutations()
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 14:53:35 +03:00
Pekka Enberg
9576b0ef23 database: Implement drop_keyspace()
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 14:53:35 +03:00
Pekka Enberg
b66154e43a cql3: Fix capture-by-reference in drop_keyspace_statement
We need to capture the "is_local_only" boolean by value because it's an
argument to the function. Fixes an annoying bug where we failed to update
schema version because we accidentally passed "true".
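A minimal sketch of the capture bug and its fix. `make_announce()` is a hypothetical stand-in for a statement that schedules work to run after it returns (a future continuation in the real code); the same pattern applies to the drop_table_statement fix further down in this log:

```cpp
#include <functional>

// The returned lambda outlives the call, so it must not refer to the
// function's parameters by reference.
std::function<bool()> make_announce(bool is_local_only) {
    // Buggy version: [&is_local_only] would leave a dangling reference once
    // this function returns, and the later read is undefined behaviour
    // (it may well read back "true").
    // Fixed version: copy the flag into the closure.
    return [is_local_only] { return is_local_only; };
}
```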

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 14:53:35 +03:00
Tomasz Grabiec
bc1d159c1b Merge branch 'penberg/cql-drop-table/v3' from seastar-dev.git
From Pekka:

This patch series implements support for CQL DROP TABLE. It uses the newly
added truncate infrastructure under the hood. After this series, the
test_table CQL test in dtest passes:

  [penberg@nero urchin-dtest]$ nosetests -v cql_tests.py:TestCQL.table_test
  table_test (cql_tests.TestCQL) ... ok

  ----------------------------------------------------------------------
  Ran 1 test in 23.841s

  OK
2015-10-06 13:39:25 +02:00
Shlomi Livne
f347a024a1 update boost testsuite output
We are generating huge output XML files with the --jenkins flag. Change
the log level from all to test_suite to reduce the size while still
including the info we need.

Error messages / failed assertions are still printed.

Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-10-06 14:27:19 +03:00
Pekka Enberg
042e9252d5 service/migration_manager: Implement announce_column_family_drop()
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 11:28:55 +03:00
Pekka Enberg
633279415d db/schema_tables: Fix merge_tables() to actually drop tables
When we query schema tables after we have applied a delete mutation, the
dropped table does not exist in the "after" result set. Fix the
merge_tables() algorithm to take that into account.

Makes merge_tables() actually call database::drop_column_family() when
a table is dropped.
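A sketch of the before/after comparison described above, applicable equally to merge_keyspaces() earlier in this log. The `schema_diff` struct and `diff()` function are hypothetical, and schema rows are modelled as a name-to-definition map. Because a dropped table has no row at all in the "after" set, it can only be found by looking for names present in "before" but missing from "after":

```cpp
#include <map>
#include <set>
#include <string>

struct schema_diff {
    std::set<std::string> created, altered, dropped;
};

schema_diff diff(const std::map<std::string, std::string>& before,
                 const std::map<std::string, std::string>& after) {
    schema_diff d;
    for (const auto& row : before) {
        auto it = after.find(row.first);
        if (it == after.end()) {
            d.dropped.insert(row.first);   // no row in "after": dropped
        } else if (it->second != row.second) {
            d.altered.insert(row.first);   // in both, definition changed
        }
    }
    for (const auto& row : after) {
        if (before.count(row.first) == 0) {
            d.created.insert(row.first);   // only in "after": created
        }
    }
    return d;
}
```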

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 11:28:55 +03:00
Pekka Enberg
82d20dba65 db/schema_tables: Implement make_drop_table_mutations()
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 11:28:55 +03:00
Pekka Enberg
b89b70daa8 db/schema_tables: Wire up drop column notifications
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 11:28:55 +03:00
Pekka Enberg
b1e6ab144a database: Implement drop_column_family()
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 11:28:55 +03:00
Pekka Enberg
afbb2f865d database: Add keyspace_metadata::remove_column_family() helper
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 11:28:55 +03:00
Pekka Enberg
0651ab6901 database: Futurize drop_column_family() function
Futurize drop_column_family() so that we can call truncate() from it.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 11:28:55 +03:00
Pekka Enberg
85ffaa5330 database: Add truncate() variant that does not look up CF by name
For drop_column_family(), we want to first remove the column_family from
lookup tables and truncate after that to avoid races. Introduce a
truncate() variant that takes keyspace and column_family references.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 11:28:54 +03:00
Pekka Enberg
baff913d91 cql3: Fix capture-by-reference in drop_table_statement
We need to capture the "is_local_only" boolean by value because it's an
argument to the function. Fixes an annoying bug where we failed to update
schema version because we accidentally passed "true". Spotted by ASan.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 11:28:54 +03:00
Avi Kivity
2f56f72466 Merge seastar upstream
* seastar 0c402e1...4a3071e (3):
  > output stream flush batching
  > Update README with compilation issues - OOM
  > resource: fix memory leak in resource::allocate()
2015-10-06 11:17:29 +03:00
Avi Kivity
e342914265 Merge "Fixes for incremental backup" from Glauber
"The control over backups is now moved to the CF itself, from the storage
service. That allows us to simplify the code (while making it correct) for cases
in which the storage service is not available.

With this change, we no longer need the database config passed down to the
storage_service object. So that patch is reverted."
2015-10-05 14:36:26 +03:00
Glauber Costa
651937becf Revert "pass db::config to storage service as well"
This reverts commit c2b981cd82.
2015-10-05 13:21:33 +02:00
Glauber Costa
639ba2b99d incremental backups: move control to the CF level
Currently, we control incremental backups behavior from the storage service.
This creates some very concrete problems, since the storage service is not
always available and initialized.

The solution is to move it to the column family (and to the keyspace so we can
properly propagate the conf file value). When we change this from the API, we will
have to iterate over all of them, changing the value accordingly.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-05 13:16:11 +02:00
Glauber Costa
b619d244e8 storage_service: public access to the database object
Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-05 13:15:27 +02:00
Glauber Costa
69d1358627 database: non const versions of get_keyspaces/column_families
We will need to change some properties of the keyspace / cf. We need an accessor
that is not marked as const.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-05 13:13:37 +02:00
Pekka Enberg
b74a9d99d5 db/schema_tables: Fix UTF-8 serialization
Use the utf8_type to serialize strings instead of using to_bytes().

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-05 09:26:15 +02:00
Avi Kivity
21bb5ea5c7 Add .gitattributes file to classify C++ source
With this, diffs become more pleasant to read, as access specifiers
no longer find their way into the hunk header.
2015-10-05 08:51:51 +02:00
Avi Kivity
7c23ec49ae Merge "Support incremental backups" from Glauber
"Generate backups when the configuration file indicates we should;
toggle behavior on/off through the API."
2015-10-04 13:49:20 +03:00
Avi Kivity
4ca4efbc9c Merge "Add cfstats support" from Amnon
"This series adds the functionality that is required for nodetool cfstats
to work.

It completes the histogram support for read and write latency and adds stubs for
functionality that is needed but not supported yet."
2015-10-04 13:38:30 +03:00