Commit Graph

275 Commits

Author SHA1 Message Date
Nadav Har'El
b6fd3dd623 storage_proxy::make_local_reader: support wrap-around ranges
If the requested range wraps around the end of the ring, make_local_reader()
ends up trying to create an sstable reader with that wrapping range, which
is not supported. Also our shard-looping code is wrong in a wrapping range.

The solution is simple: split the wrap-around range into two (from the start
of the range until the end of the ring, and then from the beginning of the
ring until the end of the range), and read each of these subranges normally.

This feature is needed to allow streaming a range that wraps around, because
streaming currently uses make_local_reader().

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-07 11:38:03 +03:00
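The split described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `token`, `ring_range`, `ring_min`, `ring_max` and `unwrap()` are hypothetical stand-ins, tokens are modelled as plain integers, and the inclusivity of the range bounds is glossed over.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for a ring position; the real token type is not shown here.
using token = std::int64_t;
constexpr token ring_min = std::numeric_limits<token>::min();
constexpr token ring_max = std::numeric_limits<token>::max();

// Hypothetical simplified range; it wraps around when start lies after end.
struct ring_range {
    token start;
    token end;
    bool wraps_around() const { return start > end; }
};

// Returns the range unchanged if it does not wrap, otherwise two
// non-wrapping subranges that together cover the same part of the ring.
std::vector<ring_range> unwrap(const ring_range& r) {
    if (!r.wraps_around()) {
        return {r};
    }
    ring_range to_ring_end{r.start, ring_max};   // start of range .. end of ring
    ring_range from_ring_start{ring_min, r.end}; // beginning of ring .. end of range
    assert(!to_ring_end.wraps_around() && !from_ring_start.wraps_around());
    return {to_ring_end, from_ring_start};
}
```

Each returned subrange could then be read with the ordinary, non-wrapping code path.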
Nadav Har'El
6af8fb1d4a storage_proxy::make_local_reader: simplify lifetime of range parameter
make_local_reader takes a partition-range parameter, passed by *reference*.
The meaning of this passing-by-reference is not documented: when
make_local_reader returns, which of the following two statements holds?

  1. make_local_reader still holds this reference, so the caller is
     forced to keep the range object alive. For how long?
or
  2. make_local_reader reads the range before returning, and when it
     returns, the caller continues to "own" the range object, and is
     allowed to do anything, including delete it.

The principle of least surprise suggests that #2 is better, unless
there is a very good reason to choose #1, and unless this requirement to
keep the argument alive (and for how long) was clearly documented.
But neither is the case here - the overhead of copying the argument is
negligible compared to the other overheads of make_local_reader, and
nothing was documented.

But it turns out the current code did #1 - the address of the range was
passed around *after* make_local_reader returns - the different readers
on different cpus continue to use it to eventually find the right sstable
byte range to iterate over. It is very easy to forget this and call
make_local_reader on an on-stack range variable, and the result is a
hard-to-debug use-after-free mess.

This patch switches us to situation #2: Before make_local_reader returns,
the range is copied to whoever needs to hold it after the return, namely
each individual shard_reader. The patch to do this is trivial (one
character removed).

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-07 11:38:03 +03:00
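The difference between the two situations discussed in the commit above can be illustrated with a small sketch. It is not the patched code: `partition_range` is reduced to two integers and `make_length_reader()` is a hypothetical function whose returned closure outlives the call, much as the per-shard readers outlive make_local_reader().

```cpp
#include <cassert>
#include <functional>

// Hypothetical, heavily simplified range type.
struct partition_range {
    int start;
    int end;
};

// Situation #2: the range is copied into whatever outlives this call.
// Capturing `pr` by value (no '&' in the capture list) gives the closure
// its own copy, so the caller may destroy its range right after the call.
// Capturing by reference instead would reproduce situation #1 and leave a
// dangling reference once the caller's range goes out of scope.
std::function<int()> make_length_reader(const partition_range& pr) {
    assert(pr.start <= pr.end);
    return [pr] { return pr.end - pr.start; };
}
```

With this shape, calling `make_length_reader()` on an on-stack range and invoking the result after the range has gone out of scope is safe.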
Pekka Enberg
921d9386cc service/storage_service: Endpoint lifecycle subscriber hooks
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-06 15:26:40 +03:00
Pekka Enberg
8d0d60168e service: Convert IEndpointLifecycleSubscriber to C++
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-06 15:26:40 +03:00
Pekka Enberg
bbe2c52d9b service: Import IEndpointLifecycleSubscriber.java
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-06 15:26:40 +03:00
Nadav Har'El
91d804de9b storage_service: fix compilation on gcc 4.9
gcc 4.9 fails to compile the following legal code (which compiles fine on
gcc 5.1):

	#include <unordered_set>
	std::unordered_set<int> hi() {
		return {};
	}

Work around this problem to make Scylla compile on gcc 4.9 again.

Note that this bug is specific to std::unordered_set - we have other places
in the code which "return {}" for std::set, and those work fine.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-06 13:01:31 +03:00
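The commit message above does not show the workaround itself. One common way to sidestep this kind of failure, offered here only as an assumption about what such a fix might look like, is to name the return type explicitly instead of relying on `return {}`:

```cpp
#include <unordered_set>

// Same function as in the commit message, but constructing the empty set
// explicitly rather than through an empty braced initializer list.
std::unordered_set<int> hi() {
    return std::unordered_set<int>();
}
```

This form compiles on both older and newer compilers and returns the same empty set.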
Asias He
c21fd58189 storage_service: Sleep less if we get schema version
This drops the second node boot up time from ~12 to ~7 seconds.
2015-08-06 15:23:51 +08:00
Asias He
b27201bd56 storage_service: Pass db into storage_service
It is needed for db.get_version(). I really hate passing &db everywhere.
If we had a global helper function like get_local_db(), life would be much
easier.
2015-08-06 15:23:51 +08:00
Asias He
470e9972a3 storage_service: Use set_mode
Now that set_mode is enabled, use it.
2015-08-06 15:23:51 +08:00
Asias He
5764e98b49 storage_service: Enable set_mode 2015-08-06 15:23:51 +08:00
Asias He
006554821c storage_service: Improve prepare_to_join
Use seastar::async to simplify the code. This code is only executed
during boot up, so it is fine to use seastar::async.

More commented-out code is enabled.
2015-08-06 15:23:51 +08:00
Asias He
efe067fd74 storage_service: Enable more code in prepare_to_join 2015-08-06 15:23:51 +08:00
Asias He
f7b3b9646f storage_service: Stub prepare_replacement_info() 2015-08-06 15:23:51 +08:00
Asias He
93de64a061 storage_service: Add helper to get property and friends
They are used but we don't support them yet. Add stub helpers for now.
2015-08-06 15:23:51 +08:00
Pekka Enberg
99a80050e3 db: Rename legacy_schema_tables to schema_tables
There's nothing legacy about it so rename legacy_schema_tables to
schema_tables. The naming comes from a Cassandra 3.x development branch
which is not relevant for us in the near future.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 13:56:47 +03:00
Pekka Enberg
d743f6df50 service/migration_manager: Fix error handling in announce()
Propagate exceptions from migration_manager::announce() to the callers.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 13:31:46 +03:00
Pekka Enberg
0793a7849f service/migration_manager: Fix logger name
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 13:31:46 +03:00
Pekka Enberg
c281e6f1c3 service/migration_manager: Use get_local_gossiper()
Use the get_local_gossiper() helper instead of open-coding it.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 13:31:46 +03:00
Pekka Enberg
d70091e1fe service/migration_manager: Remove isReadyForBootstrap()
Remove the commented-out isReadyForBootstrap(). We don't have a
StageManager, nor will we, so drop the function.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 13:31:46 +03:00
Pekka Enberg
dd9d178502 service/migration_manager: Rename MIGRATION_DELAY_IN_MSEC
Rename "MIGRATION_DELAY_IN_MSEC" to "migration_delay" as the unit of
time is already clear from the type.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 13:31:46 +03:00
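The point of the rename above, that the unit lives in the type rather than in the name, can be shown with a short sketch. The value 60000 and the helper `migration_delay_in_seconds()` are illustrative assumptions; the commit message states neither the actual delay nor the exact type used.

```cpp
#include <chrono>

// Illustrative value only; the real delay is not stated in the commit message.
constexpr std::chrono::milliseconds migration_delay{60000};

// Hypothetical helper: a caller that needs another unit converts explicitly,
// so the compiler tracks the unit instead of a suffix in the name.
inline std::chrono::seconds migration_delay_in_seconds() {
    return std::chrono::duration_cast<std::chrono::seconds>(migration_delay);
}
```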
Pekka Enberg
feb6b7d316 service/migration_manager: Remove storage proxy arguments
Use get_storage_proxy() and get_local_storage_proxy() helpers under the
hood to simplify migration manager API users.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 13:31:46 +03:00
Pekka Enberg
a49e16a762 service/migration_manager: Remove ifdef'd code
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 13:16:45 +03:00
Avi Kivity
55ca295154 Merge "Initial CQL event support" from Pekka
"This series implements initial support for CQL events. We introduce
migration_listener hook in migration manager as well as event notifier
in the CQL server that's built on top of it to send out the events via
CQL binary protocol. We also wire up create keyspace events to the
system so subscribed clients are notified when a new keyspace is
created.

There's still more work to be done to support all the events. That
requires some work to restructure existing code so it's better to merge
this initial series now and avoid future code conflicts."
2015-08-05 12:56:37 +03:00
Pekka Enberg
12d99bd282 service/migration_manager: Migration listener hooks
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 11:50:51 +03:00
Asias He
cb6fbee68b storage_service: Wire up remove_endpoint 2015-08-05 15:45:33 +08:00
Asias He
65e19203b0 storage_service: Enable more debug print 2015-08-05 15:32:38 +08:00
Asias He
f7be7c9cee storage_service: Wire up update_tokens in handle_state_normal 2015-08-05 15:29:32 +08:00
Asias He
8ccf7665e9 storage_service: Enable call to _token_metadata.remove_from_moving
This function is available now.
2015-08-05 15:29:32 +08:00
Asias He
3cb68f05f5 storage_service: Use update_normal_tokens to update tokens
Assume we have 3 tokens,

  {ee 36 d0 3e e8 6c 35 b1 , c5 5b 00 4a 1d 77 4e 50 , b9 b2 a1 0a 16 0d 76 8e }

With this

   for (auto t : tokens) {
       _token_metadata.update_normal_token(t, get_broadcast_address());
   }

Only the last token is inserted.

With this

   _token_metadata.update_normal_tokens(tokens, get_broadcast_address());

All 3 tokens are inserted correctly.
2015-08-05 15:29:32 +08:00
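The behaviour described in the commit above is what one would see if updating an endpoint's tokens replaces everything that endpoint previously owned. The toy model below assumes exactly that; `toy_token_metadata` and `token_count()` are hypothetical, tokens and endpoints are plain strings, and the real token_metadata implementation is not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using token = std::string;    // hypothetical stand-in for a ring token
using endpoint = std::string; // hypothetical stand-in for a node address

class toy_token_metadata {
    std::map<token, endpoint> _token_to_endpoint;
public:
    // Assumed semantics: the given tokens REPLACE all tokens owned by `ep`.
    void update_normal_tokens(const std::vector<token>& tokens, const endpoint& ep) {
        assert(!tokens.empty());
        for (auto it = _token_to_endpoint.begin(); it != _token_to_endpoint.end();) {
            if (it->second == ep) {
                it = _token_to_endpoint.erase(it);
            } else {
                ++it;
            }
        }
        for (const auto& t : tokens) {
            _token_to_endpoint[t] = ep;
        }
    }

    // Single-token variant expressed through the batch one, so every call
    // discards the tokens inserted by the previous call.
    void update_normal_token(const token& t, const endpoint& ep) {
        update_normal_tokens({t}, ep);
    }

    std::size_t token_count(const endpoint& ep) const {
        std::size_t n = 0;
        for (const auto& entry : _token_to_endpoint) {
            if (entry.second == ep) {
                ++n;
            }
        }
        return n;
    }
};
```

Under these assumed semantics, looping over three tokens with `update_normal_token()` leaves only the last one, while a single `update_normal_tokens()` call keeps all three.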
Avi Kivity
52dea3ac02 Merge "storage_service update" from Asias
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-04 18:13:54 +03:00
Avi Kivity
8d050b679a db: improve make_local_reader()
Instead of merging shard data using make_combined_reader(), take advantage
of the fact that shard data is disjoint, and use make_joining_reader().
This removes the need to sort the partitions as they are being read.
2015-08-04 17:11:39 +03:00
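Why disjoint shard data removes the need for sorting can be sketched as follows. This is an illustration of the idea only, not make_joining_reader() itself: `partition_key` is modelled as an integer and `join_disjoint()` is a hypothetical helper that assumes every key in one shard precedes every key in the next.

```cpp
#include <cassert>
#include <vector>

using partition_key = int; // hypothetical stand-in for a real partition key

// Joins per-shard results that are each sorted and mutually disjoint, with
// shards given in key order. Because no two shards overlap, appending them
// one after another already yields sorted output; no merge step comparing
// the heads of all shards is needed.
std::vector<partition_key> join_disjoint(
        const std::vector<std::vector<partition_key>>& shards) {
    std::vector<partition_key> out;
    for (const auto& shard : shards) {
        // Precondition of this sketch: shards do not overlap and are ordered.
        assert(out.empty() || shard.empty() || out.back() < shard.front());
        out.insert(out.end(), shard.begin(), shard.end());
    }
    return out;
}
```

A combining (merging) reader would instead have to compare the next key of every shard at each step.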
Asias He
2aec155f50 storage_service: Add debug print to dump token to endpoint map 2015-08-04 20:39:33 +08:00
Asias He
2250123654 storage_service: Enable code for remove_endpoint in handle_state_normal 2015-08-04 20:39:33 +08:00
Asias He
ba1a8c5ad7 storage_service: Enable debug print for tokens 2015-08-04 20:39:32 +08:00
Asias He
a7b9a8faed storage_service: Remove debug print for tokens in on_join 2015-08-04 20:26:34 +08:00
Asias He
95775917c9 storage_service: Rename camel case to snake case
They are leftovers.
2015-08-04 20:26:34 +08:00
Asias He
31eb9cea3d storage_service: Remove storage_service header in debug print 2015-08-04 20:26:34 +08:00
Asias He
c9ba5fa847 storage_service: Add more debug info for state change 2015-08-04 20:26:34 +08:00
Asias He
a5cd9ee315 storage_service: Remove the SS header in debug print
Since the logger will print logger name, no need to print it twice.
2015-08-04 20:26:34 +08:00
Asias He
28b2e6e95f storage_service: Print start gossiper service earlier
Otherwise, you see gossip log before gossiper service is started.
2015-08-04 20:26:34 +08:00
Pekka Enberg
5b3f7f8091 service: Convert MigrationListener to C++
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-04 15:08:24 +03:00
Pekka Enberg
44b53be4a1 service: Import IMigrationListener.java
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-04 13:17:05 +03:00
Shlomi Livne
b8bcf6d6e7 Fix a bug in mutation response processing in smp
Current code passes the storage_proxy instance of the shard on which the
message was received instead of using the storage_proxy instance of the
shard that sent the request.

Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-08-04 11:23:49 +03:00
Pekka Enberg
a3c95235e6 migration_manager: Make stateful with sharded<>
In preparation for adding listener state to migration manager, use
sharded<> for migration manager.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-04 11:23:23 +03:00
Tomasz Grabiec
51b3bf38bb storage_proxy: Introduce make_local_reader() 2015-08-03 15:21:40 +02:00
Tomasz Grabiec
19851fb8db storage_proxy: Move stop() implementation to source file 2015-08-03 11:52:23 +02:00
Glauber Costa
438a3f8619 storage_proxy: move speculative_retry type to schema
It will be stored in the schema, so move it there, where it belongs.  We'll need
to do more than just store a type, so provide a class that encapsulates it.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-29 18:44:59 -04:00
Asias He
04e1cbfac7 storage_service: Enable check_for_endpoint_collision
We do a shadow gossip round to check if the ip address is used by
another node.
2015-07-29 15:41:52 +08:00
Avi Kivity
34131b22dd Merge "Debuggability improvements" from Tomasz 2015-07-28 12:47:33 +03:00
Tomasz Grabiec
c2664c0d46 storage_proxy: Add query logging on log_level::trace 2015-07-28 11:31:08 +02:00