
6701 Commits

Author SHA1 Message Date
Takuya ASADA
eb1924a4e4 dist: fix file not found error on centos_dep/build_dependency.sh
We don't have boost.diff and don't need it, so return to rpmbuild --rebuild.

Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-14 14:12:46 +03:00
Pekka Enberg
2ed34b0e96 Merge seastar upstream
* seastar 1995676...78e3924 (5):
  > fix output stream batching
  > rpc: server connection shutdown fix
  > doc: add Seastar tutorial
  > resource: increase default reserve memory
  > http client: moved http_response_parser.rl from apps/seawreck into http directory

Adjust transport/server.cc for the demise of output_stream::batch_flush()
scylla-0.10
2015-10-12 16:12:35 +03:00
Glauber Costa
12ac9a1fbd do not calculate truncation time independently
Currently, we are calculating truncated_at during truncate() independently for
each shard. It will work if we're lucky, but it is fairly easy to trigger cases
in which each shard will end up with a slightly different time.

The main problem here is that this time is used as the snapshot name when auto
snapshots are enabled. Before my last fixes, this would just generate two
separate directories in this case, which is wrong but not severe.

But after the fix, this means that both shards will wait for one another to
synchronize and this will hang the database.

Fix this by making sure that the truncation time is calculated before
invoke_on_all in all needed places.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-09 17:39:47 +03:00
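To illustrate the idea behind 12ac9a1fbd, here is a minimal Python sketch, not Scylla's C++ code. The names `Shard` and `truncate_all` and the millisecond timestamp are hypothetical, chosen only for this example. The point is that the clock is read once, before fanning out, so every shard derives the same snapshot name.

```python
import time


class Shard:
    """Hypothetical stand-in for one per-core database shard."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.snapshot_name = None

    def truncate(self, truncated_at):
        # The snapshot name comes from the timestamp handed in by the
        # caller, never from a clock read on this shard.
        self.snapshot_name = str(truncated_at)


def truncate_all(shards, now=time.time):
    # Read the clock exactly once, before fanning out to the shards.
    truncated_at = int(now() * 1000)
    for shard in shards:
        shard.truncate(truncated_at)
    return truncated_at
```

If each shard called `now()` itself, two shards could end up with different names and then wait forever for a peer that synchronizes under another name.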
Glauber Costa
4460f243a3 snapshots: fix json type
We are generating a general object ({}), whereas Cassandra 2.1.x generates an
array ([]). Let's do that as well to avoid surprising parsers.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-08 19:06:38 +03:00
Glauber Costa
55a5877d82 snapshots: handle jsondir creation for empty files case
We still need to write a manifest when there are no files in the snapshot.
But because we have never reached the touch_directory part in the sstables
loop for that case, nobody would have created jsondir in that case.

Since now all the file handling is done in the seal_snapshot phase, we should
just make sure the directory exists before initiating any other disk activity.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-08 19:06:33 +03:00
Glauber Costa
b03a474ca6 snapshots: get rid of empty tables optimization
We currently have one optimization that returns early when there are no tables
to be snapshotted.

However, because of the way we are writing the manifest now, this will cause
the shard that happens to have tables to be waiting forever. So we should get
rid of it. All shards need to pass through the synchronization point.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-08 19:06:28 +03:00
Glauber Costa
9ec7b9a213 snapshots: don't hash pending snapshots by snapshot name
If we are hashing more than one CF, the snapshots themselves will all have the same name.
This will cause the files from one of them to spill towards the other when writing the manifest.

The proper hash is the jsondir: that one is unique per manifest file.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-08 19:06:22 +03:00
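The keying problem described in 9ec7b9a213 can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's code: the `pending` table, the `register` helper, and the directory paths are all hypothetical.

```python
from collections import defaultdict

# Pending file lists keyed by the manifest directory (jsondir), which is
# unique per manifest file. Keying by snapshot name would merge both
# column families below into a single entry.
pending = defaultdict(list)


def register(jsondir, files):
    pending[jsondir].extend(files)


# Two column families snapshotted under the same (hypothetical) name.
register("/data/ks/cf_a/snapshots/1444323982", ["a-1-Data.db"])
register("/data/ks/cf_b/snapshots/1444323982", ["b-1-Data.db"])
```

Each manifest now sees only the files of its own column family.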
Pekka Enberg
fc4e167ffd release: prepare for 0.10
2015-10-08 14:44:36 +03:00
Pekka Enberg
c7c6ebb813 Merge "Switch to gcc-5 on CentOS rpm, with some related fixes" from Takuya
2015-10-08 14:43:29 +03:00
Pekka Enberg
95012793e5 db/schema_tables: Wire up drop keyspace notifications
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-08 13:10:48 +02:00
Pekka Enberg
87d45cc58a service/migration_manager: Simplify notify_drop_keyspace()
There's no need to pass keyspace_metadata to notify_drop_keyspace()
because all we are interested in is the name. The keyspace has been
dropped so there's not much we could do with its metadata either.

Simplifies the next patch that wires up drop keyspace notification.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-08 13:10:48 +02:00
Avi Kivity
e5dca96af3 Merge "snapshots: fix global generation of the manifest file" from Glauber
"snapshotting the files themselves is easy: if more than one CF happens to link
an SSTable twice, all but one will fail, and we will end up with one copy.

The problem for us is that the snapshot procedure is supposed to leave a
manifest file inside its directory. So if we just call snapshot() from
multiple shards, only the last one will succeed, writing its own SSTables to
the manifest and leaving all other shards' SSTables unaccounted for.

Moreover, for things like drop table, the operation should only proceed when
the snapshot is complete. That includes the manifest file being correctly
written, and for this reason we need to wait for all shards to finish their
snapshotting before we can move on."
2015-10-08 13:08:31 +03:00
Glauber Costa
725ae03772 snapshots: write the manifest file from a single shard
Currently, the snapshot code has all shards writing the manifest file. This is
wrong, because every write except the last will be overwritten. This patch
fixes it by synchronizing all writes and leaving just one of the shards with the
task of closing the manifest.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-08 11:36:36 +02:00
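One way to picture the synchronization described in 725ae03772 is the following Python sketch. It is an assumption-laden illustration, not the actual implementation: `PendingSnapshot`, `shard_done`, and the `{"files": [...]}` manifest layout are hypothetical. Every shard reports in, even with an empty file list, and only the last one to arrive writes the manifest.

```python
import json


class PendingSnapshot:
    """Hypothetical per-manifest state shared by all shards."""

    def __init__(self, shard_count):
        self.remaining = shard_count
        self.files = []
        self.manifest = None  # set exactly once, by the last shard

    def shard_done(self, shard_files):
        # Every shard must pass through here, including shards that
        # have no files, otherwise the counter never reaches zero.
        self.files.extend(shard_files)
        self.remaining -= 1
        if self.remaining == 0:
            self.manifest = json.dumps({"files": sorted(self.files)})
        return self.manifest
```

This also shows why an early return for shards without tables (removed in b03a474ca6) would leave the other shards waiting.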
Glauber Costa
25d24222fe snapshots: separate manifest creation
The way manifest creation is currently done is wrong: instead of a final
manifest containing all files from all shards, the current code writes a
manifest containing just the files from the shard that happens to be the
unlucky loser of the writing race.

In preparation to fix that, separate the manifest creation code from the rest.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-08 11:36:36 +02:00
Glauber Costa
abc63e4669 snapshots: clarify and fix sync behavior
We do need to sync jsondir after we write the manifest file (previously done,
but with a question), and before we start it (not previously done) to guarantee
that the manifest file won't reference any file that is not visible yet.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-08 11:36:36 +02:00
Glauber Costa
ca4babdb57 snapshots: close file after flush
We are currently flushing it, but not closing it.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-08 11:36:36 +02:00
Avi Kivity
bd7bf3ea84 Merge seastar upstream
* seastar 6664a83...1995676 (1):
  > introduce sync_directory
2015-10-08 12:29:17 +03:00
Takuya ASADA
3a77188d47 dist: move yum install first
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-08 06:29:06 +09:00
Takuya ASADA
10dd1781be dist: Stop specify required libraries manually, use AutoReqProv
We don't need to specify dynamically linked libraries here; AutoReqProv detects them.

Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-08 06:15:46 +09:00
Takuya ASADA
137fe19ea9 dist: support glob pattern on do_install()
Currently do_install() does not function correctly when it is passed a glob pattern and the packages are already installed.

Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-08 06:15:46 +09:00
Takuya ASADA
9cb2776606 dist: switch CentOS gcc to 5.1.1-4
Since we don't want to make users upgrade libstdc++, we will link libstdc++ statically, using ./configure.py --static-stdc++

Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-08 06:15:46 +09:00
Takuya ASADA
0e13757d92 configure.py: add --static-stdc++ to link libstdc++ statically
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-08 06:15:46 +09:00
Avi Kivity
bffcbc592f Merge seastar upstream
* seastar fba8ac6...6664a83 (2):
  > do not add failed stream to output stream poller.
  > rpc: wait for all data to be sent before closing
2015-10-07 18:33:04 +03:00
Calle Wilund
42c086a5cd batchlog_manager: Fixup includes + exception handling
* Fix exception handling in batch loop (report + still re-arm)
* Cleanup seastar include reference style
2015-10-07 17:06:34 +03:00
Avi Kivity
19f36cd3cc Merge "Batchlog manager - run loop on only one shard" from Calle
"* Runs the batchlog loop on the main cpu only, but round-robins the actual
   work across the available shards.
 * Use gate to guard work loop instead of semaphore (better shutdown,
   eventually)
 * Actually _start_ the batch loop (not done previously)
 * Rename logger + add cpu# hint"

Fixes #424
2015-10-07 16:52:10 +03:00
Calle Wilund
a4c14d3d1d batchlog_manager: Add hint of which cpu timer callback is running on
2015-10-07 14:57:55 +02:00
Calle Wilund
6416c62d39 main: Actually start the batchlog_manager service loop
Was not invoked previously.
2015-10-07 14:30:09 +02:00
Calle Wilund
b46496da34 batchlog_manager: Rename logger
* More useful/referrable on command line (--log*)
* Matches class name (though not origin)
2015-10-07 14:30:09 +02:00
Calle Wilund
6f94a3bdad batchlog_manager: Use gate instead of semaphore
Since that exists now.
2015-10-07 14:30:09 +02:00
Calle Wilund
874da0eb67 batchlog_manager: Run timer loop on only one shard
Since replay is a "node global" operation, we should not attempt to
do it in parallel on each shard. It will just overlap/interfere.
Could just run this on cpu 0, but since this _could_ be a
lengthy operation, each timer callback is round-robined across shards just in case...
2015-10-07 14:30:09 +02:00
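The scheduling described in 874da0eb67 can be sketched as follows. This is a minimal Python illustration, not the batchlog_manager code: `make_timer_callback` and `replay_on` are hypothetical names. A single timer owner picks the next shard in turn for each tick, so replays never run in parallel but the work still spreads across cores.

```python
import itertools


def make_timer_callback(shard_count, replay_on):
    """Build the callback for the single timer that drives replay.

    replay_on(shard) is a hypothetical hook that runs one replay pass
    on the given shard.
    """
    next_shard = itertools.cycle(range(shard_count))

    def on_timer():
        shard = next(next_shard)  # one shard per tick, in rotation
        replay_on(shard)
        return shard

    return on_timer
```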
Avi Kivity
a151268bfe Merge
2015-10-07 14:35:02 +03:00
Calle Wilund
246e8e24f2 replay_position: Make <= comparator simpler and cleaner
2015-10-07 14:34:22 +03:00
Avi Kivity
eccbf85e9d Merge "Truncation records per shard"
Fixes  #423

"Changes the "truncated_at" blob contents of system.local table. It now stores
N replay_positions, where N == # shards.

The system.local table schema remains unchanged, and older truncation data
is accepted, though it will for obvious reasons still be insufficient.

Since the data is opaque to the running instance, blob compatibility with
origin should be irrelevant (and we're not really that now anyway).

Note that technically, changing shard count in between runs could make us hold
on to RP data "longer than required", but this is
a.) Insignificant data sizes
b.) Data that is valid exactly once: When restarting a failed node and
    replaying. The "shards" only refer to "last run", and after that we don't
    care. At worst, we can get less than fresh data (not all shards manage
    to save truncation records before crash).

It is worth noting (and I've done so in the code) that the system.local table
+ sharding cause some rather silly inefficiencies, since for this (and others)
we store a value for each shard, and each save causes a global flush of the
systable, in turn delegated on all cores. So the op is N^2 in "db complexity".
At some point we should maybe consider if operations like "drop table" and
"truncate" should not be done on shard level, but on machine level, so it can
coordinate itself. But otoh, it is rare and not _very_ expensive either."
2015-10-07 14:33:22 +03:00
Avi Kivity
c48a826c65 db: fix string type incorrectly unvalidated
We call the conversion function that expects a NUL terminated string,
but provide a string view, which is not.

Fix by using the begin/end variant, which doesn't require a NUL terminator.

Fixes #437.
2015-10-07 12:22:01 +02:00
Calle Wilund
a66c22f1ec commitlog_replayer: Acquire truncation RP:s per replayed shard
I.e. get them in bulk and fill in for all shards
2015-10-07 09:00:22 +02:00
Calle Wilund
17bd18b59c commitlog_replayer: Add logging message for exceptions in multi-file recover
2015-10-07 08:59:54 +02:00
Calle Wilund
3f1fa77979 commitlog_replayer: Fix broken comparison
A commitlog entry should be ignored if its position is <= highest recorded
position, not <.
2015-10-07 08:59:53 +02:00
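The boundary condition fixed in 3f1fa77979 is easy to check with a small Python sketch. It assumes, for illustration only, that a replay position can be modelled as a `(segment_id, offset)` tuple compared lexicographically; `should_replay` is a hypothetical helper, not a function from the codebase.

```python
def should_replay(entry_pos, highest_recorded):
    """Return True if a commitlog entry still needs to be replayed.

    Positions are modelled as (segment_id, offset) tuples, which is an
    assumption made for this example. An entry is skipped when its
    position is <= the highest recorded position, so the entry sitting
    exactly at that position is not replayed again.
    """
    return not (entry_pos <= highest_recorded)
```

With a strict `<` instead, the entry exactly at the recorded position would be applied a second time.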
Calle Wilund
271eb3ba02 replay_position: Add <= comparator
2015-10-07 08:59:53 +02:00
Calle Wilund
6b0ab79ecb system_keyspace: Keep per-shard truncation records
Fixes  #423
* CF ID now maps to a truncation record comprised of a set of 
  per-shard RP:s and a high-mark timestamp
* Retrieving RP:s is done in "bulk"
* Truncation time is calculated as max of all shards.

This version of the patch will accept "old" truncation data, though the 
result of applying it will most likely not be correct (just one shard)

Record is still kept as a blob, "new" format is indicated by 
record size.
2015-10-07 08:59:52 +02:00
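The record shape described in 6b0ab79ecb can be sketched like this. It is a hedged Python illustration, not the on-disk blob format: `TruncationRecord`, `merge_shard_records`, and the tuple layout of the inputs are hypothetical. It shows only the two stated properties, one replay position per shard and a truncation time taken as the maximum over all shards.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TruncationRecord:
    positions: List[Tuple[int, int]]  # one replay position per shard
    time_stamp: int                   # high-mark timestamp


def merge_shard_records(per_shard):
    """per_shard: list of (replay_position, truncated_at), one per shard."""
    positions = [rp for rp, _ in per_shard]
    time_stamp = max(ts for _, ts in per_shard)
    return TruncationRecord(positions, time_stamp)
```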
Calle Wilund
199b72c6f3 commitlog: fix reader "offset" handling broken + ensure exceptions propagates
Must ensure we find a chunk/entry boundary still even when run
with a start offset, since file navigation is chunk based.
Was not observed as broken previously because
1.) We did not run with offsets
2.) The exception never reached caller.

Also make the reader silently ignore empty files.
2015-10-07 08:54:49 +02:00
Calle Wilund
024041c752 commitlog: make log message slightly more informative/correct
2015-10-07 08:54:49 +02:00
Calle Wilund
f7151cac61 cql3::untyped_result_set: Allow "get_map" to be explicit about result type

Allow providing both hash/equal etc for resulting map, as well
as explicit data_types for the deserialization.
Also allow direct extraction of kv-pairs to iterator, for more advanced
unpacking.
2015-10-07 08:54:49 +02:00
Avi Kivity
29106ab802 Merge seastar upstream
* seastar 4a3071e...fba8ac6 (3):
  > stream.hh: Fix broken "set_exception".
  > configure.py: fix use of "echo -e"
  > deannoyify touch_directory
2015-10-07 09:44:58 +03:00
Gleb Natapov
358d93112f replace ad-hoc cql connection polling with new batch_flush() output stream API
2015-10-06 19:22:23 +03:00
Pekka Enberg
b40999b504 database: Fix drop_column_family() UUID lookup race
Remove the about to be dropped CF from the UUID lookup table before
truncating and stopping it. This closes a race window where new
operations based on the UUID might be initiated after truncate
completes.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 17:10:17 +02:00
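The ordering change in b40999b504 can be illustrated with a short Python sketch. The `Database` and `ColumnFamily` classes here are hypothetical stand-ins, not the real C++ types. The CF is removed from the UUID lookup table first, so no new operation can find it while truncate and stop are in progress.

```python
class ColumnFamily:
    """Hypothetical stand-in that records what happens to it."""

    def __init__(self, name, events):
        self.name = name
        self.events = events

    def truncate(self):
        self.events.append("truncate")

    def stop(self):
        self.events.append("stop")


class Database:
    def __init__(self):
        self.by_uuid = {}
        self.events = []

    def add(self, uuid, name):
        self.by_uuid[uuid] = ColumnFamily(name, self.events)

    def find(self, uuid):
        return self.by_uuid.get(uuid)

    def drop_column_family(self, uuid):
        # Unpublish first: lookups by UUID fail from this point on.
        cf = self.by_uuid.pop(uuid)
        self.events.append("unpublished")
        cf.truncate()
        cf.stop()
```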
Pekka Enberg
5878f62b18 db/schema_tables: Clean up indentation
Almost the whole file is (accidentally) indented four spaces to the
right for no reason. Fix that up because it's annoying as hell.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 17:09:27 +02:00
Pekka Enberg
1f9e769dd3 db/schema_tables: Remove obsolete ifdef'd code
Remove ifdef'd code that we won't be converting to C++ because of design
differences.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 17:09:27 +02:00
Avi Kivity
75dd123d01 Merge "CQL DROP KEYSPACE support" from Pekka
"This patch series implements support for CQL DROP KEYSPACE and makes the
test_keyspace CQL test in dtest pass:

  [penberg@nero urchin-dtest]$ nosetests -v cql_tests.py:TestCQL.keyspace_test
  keyspace_test (cql_tests.TestCQL) ... ok

  ----------------------------------------------------------------------
  Ran 1 test in 12.166s

  OK

  [penberg@nero urchin-dtest]$ nosetests -v cql_tests.py:TestCQL.table_test
  table_test (cql_tests.TestCQL) ... ok

  ----------------------------------------------------------------------
  Ran 1 test in 23.841s

  OK"
2015-10-06 15:19:33 +03:00
Pekka Enberg
da7b741f64 service/migration_manager: Implement announce_keyspace_drop()
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 14:53:35 +03:00
Pekka Enberg
6e304cd58c db/schema_tables: Fix merge_keyspaces() to actually drop keyspaces
When we query schema keyspaces after we have applied a delete mutation,
the dropped keyspace does not exist in the "after" result set. Fix the
merge_keyspaces() algorithm to take that into account.

Makes merge_keyspaces() really call to database::drop_keyspace() when a
keyspace is dropped.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 14:53:35 +03:00
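The before/after comparison that 6e304cd58c describes can be sketched in Python. This is an illustration under assumptions, not the schema_tables code: the dictionaries keyed by keyspace name and the returned triple are hypothetical. A keyspace that appears in "before" but not in "after" counts as dropped.

```python
def merge_keyspaces(before, after):
    """Diff two {keyspace_name: metadata} snapshots of the schema.

    Returns (created, altered, dropped) as sorted lists of names.
    """
    created = sorted(after.keys() - before.keys())
    # A dropped keyspace is simply absent from the "after" result set.
    dropped = sorted(before.keys() - after.keys())
    altered = sorted(
        name for name in before.keys() & after.keys()
        if before[name] != after[name]
    )
    return created, altered, dropped
```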