Commit Graph

3684 Commits

Author SHA1 Message Date
Avi Kivity
d3cd8ec24a Merge "token bootstrap"
From Asias:

"With this series, simple replication strategy is supposed to work.

Start two nodes, after bootstrap, uuid and token are spread correctly
through gossip:

ep=127.0.0.2, eps=EndpointState: HeartBeatState = generation = 1433298647, version = 156, AppStateMap =
{ 0 : Value(NORMAL,TOKENS,30) }  { 5 : Value(urchin_1_0,4) }  { 8 : Value(,3) }  { 11 : Value(ms_1_0,1) }
{ 12 : Value(6127dd57-fb40-4aca-9046-c3509eca4d1e,2) }  { 13 : Value(acd66a3f2e5bb1af;d0046f49663e2d9b;43dcb3bd8dc7397b,29) }
ep=127.0.0.1, eps=EndpointState: HeartBeatState = generation = 1433298640, version = 161, AppStateMap =
{ 0 : Value(NORMAL,TOKENS,11) }  { 5 : Value(urchin_1_0,4) }  { 8 : Value(,3) }  { 11 : Value(ms_1_0,1) }
{ 12 : Value(a6d2ac36-2f0e-492f-8676-198bbbc42dd1,2) }  { 13 : Value(71e4bd2878ec2446;2170ecc473cd6240;e0f2574988bb909e,9) }

Endpoint -> Token
inet_address=127.0.0.2, token=ac d6 6a 3f 2e 5b b1 af
inet_address=127.0.0.2, token=d0 04 6f 49 66 3e 2d 9b
inet_address=127.0.0.1, token=e0 f2 57 49 88 bb 90 9e
inet_address=127.0.0.1, token=21 70 ec c4 73 cd 62 40
inet_address=127.0.0.2, token=43 dc b3 bd 8d c7 39 7b
inet_address=127.0.0.1, token=71 e4 bd 28 78 ec 24 46

Endpoint -> UUID
inet_address=127.0.0.2, uuid=6127dd57-fb40-4aca-9046-c3509eca4d1e
inet_address=127.0.0.1, uuid=a6d2ac36-2f0e-492f-8676-198bbbc42dd1

Sorted Token
token=ac d6 6a 3f 2e 5b b1 af
token=d0 04 6f 49 66 3e 2d 9b
token=e0 f2 57 49 88 bb 90 9e
token=21 70 ec c4 73 cd 62 40
token=43 dc b3 bd 8d c7 39 7b
token=71 e4 bd 28 78 ec 24 46"
2015-06-04 12:46:08 +03:00
Asias He
c95364fe31 failure_detector: Start on all cpus
Code calls failure_detector::is_alive on all cpus, so we start
failure_detector on all cpus. However, the internal data of failure_detector
is modified on cpu zero and it is not replicated to non-zero cpus.
This is fine since the user of failure_detector (the gossiper) accesses
it on cpu0 only.
2015-06-04 17:25:20 +08:00
Asias He
26cd039005 gossip: Add is_alive helper
failure_detector::is_alive asks gossiper if a node is up or down.
2015-06-04 17:16:58 +08:00
Asias He
77e8f361bb storage_service: Reduce time for non-seed node to join the ring
Waiting for 30 seconds is way too long for testing. Reduce it to 5
seconds.

Once we have a proper config system, this can be specified on the command line.
2015-06-04 17:16:50 +08:00
Asias He
a19d2171eb storage_service: Remove ad-hoc token_metadata creation
Use token_metadata from storage_service when creating a
replication_strategy in keyspace::create_replication_strategy.
2015-06-04 17:16:50 +08:00
Asias He
f1ed0cdc7e storage_service: Start on all cpus and replicate _token_metadata
_token_metadata is needed by replication strategy code on all cpus.
Changes to _token_metadata are done on cpu 0. Replicate it to all cpus.

We could copy only when _token_metadata actually changes. As a first step, we
always copy in the gossip modification callbacks.
2015-06-04 17:16:50 +08:00
Asias He
8467db11aa token_metadata: Print _sorted_tokens in debug
_sorted_tokens is used by replication code.
2015-06-04 17:12:10 +08:00
Asias He
cae9d65e9d storage_service: Move more code to source file 2015-06-04 17:12:10 +08:00
Asias He
4311662828 storage_service: Implement update_peer_info 2015-06-04 17:12:10 +08:00
Asias He
a85cee6afe storage_service: Rename isSurveyMode to _is_survey_mode 2015-06-04 17:12:10 +08:00
Asias He
db527c1a81 storage_service: Move joinRing to source file 2015-06-04 17:12:10 +08:00
Asias He
4dc4e54e50 storage_service: Add is_joined 2015-06-04 17:12:10 +08:00
Asias He
ca2e151c03 storage_service: Rename initialized to _initialized 2015-06-04 17:12:10 +08:00
Asias He
e5c653939b storage_service: Add is_bootstrap_mode and finish_bootstrapping 2015-06-04 17:12:10 +08:00
Asias He
c87f950aff storage_service: Implement handle_state_normal
Start two nodes, after bootstrap, uuid and token are spread correctly
through gossip:

----------- endpoint_state_map:  -----------
ep=127.0.0.1, eps=EndpointState: HeartBeatState = generation =
1433172216, version = 66, AppStateMap =  { 0 : Value(NORMAL,TOKENS,11) }
{ 5 : Value(urchin_1_0,4) }  { 8 : Value(,3) }  { 11 : Value(ms_1_0,1) }
{ 12 : Value(06eb49d2-a092-483a-a89a-f774cff2c3e5,2) }  { 13 :
Value(0b20137e213f697b;c39a029ad9dd2948;0003be0eeb569d5a,9) }

ep=127.0.0.2, eps=EndpointState: HeartBeatState = generation =
1433172229, version = 56, AppStateMap =  { 0 : Value(NORMAL,TOKENS,51) }
{ 5 : Value(urchin_1_0,4) }  { 8 : Value(,3) }  { 11 : Value(ms_1_0,1) }
{ 12 : Value(adc8eb9f-7c1f-4695-905c-c1c4fdeea4d8,2) }  { 13 :
Value(6f5607a9b4cbadf0;eb7d976656cafad1;a225d312b9f42e5b,50) }

----------- token_metadata:  -----------
Endpoint -> Token
inet_address=127.0.0.2, token=a2 25 d3 12 b9 f4 2e 5b
inet_address=127.0.0.1, token=c3 9a 02 9a d9 dd 29 48
inet_address=127.0.0.2, token=eb 7d 97 66 56 ca fa d1
inet_address=127.0.0.1, token=00 03 be 0e eb 56 9d 5a
inet_address=127.0.0.1, token=0b 20 13 7e 21 3f 69 7b
inet_address=127.0.0.2, token=6f 56 07 a9 b4 cb ad f0
Endpoint -> UUID
inet_address=127.0.0.1, uuid=06eb49d2-a092-483a-a89a-f774cff2c3e5
inet_address=127.0.0.2, uuid=adc8eb9f-7c1f-4695-905c-c1c4fdeea4d8
2015-06-04 17:12:10 +08:00
Asias He
9649414011 dht: Align token print
Before:
token=df 96 79 87 21 b2 ed 80
token=5c 98 e a0 4f 5e 28 6b

After:
token=df 96 79 87 21 b2 ed 80
token=5c 98 0e a0 4f 5e 28 6b
2015-06-04 17:12:10 +08:00
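The alignment fix above amounts to printing every byte as exactly two hex digits. A minimal sketch of that idea follows; `format_token_bytes` is a hypothetical name used for illustration, not the function changed by the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not the project's actual printer): formats each byte
// of a token as exactly two lowercase hex digits, separated by spaces.
std::string format_token_bytes(const std::vector<std::uint8_t>& bytes) {
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i != 0) {
            out << ' ';
        }
        // setw applies only to the next insertion, so it is set per byte.
        // The cast makes the byte print as a number, not as a character.
        out << std::setw(2) << static_cast<int>(bytes[i]);
    }
    return out.str();
}
```

Without the fill character and width, a byte such as 0x0e would print as a single digit, which is the misalignment shown in the "Before" output.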
Asias He
abad1520ad gossip: Fix get_host_id
Return a real UUID.
2015-06-04 17:12:10 +08:00
Asias He
8a578a1364 utils: Add UUID(const sstring& uuid_string) constructor
Construct a UUID from a UUID string.
2015-06-04 17:12:10 +08:00
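Constructing a UUID from its string form can be sketched as below. This is an illustration only: `simple_uuid` and `parse_uuid` are hypothetical names, and the layout of two 64-bit halves is an assumption rather than the project's actual UUID class.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a UUID value type: two 64-bit halves.
struct simple_uuid {
    std::uint64_t most_sig_bits;
    std::uint64_t least_sig_bits;
};

// Parses the canonical 8-4-4-4-12 hex form, e.g.
// "6127dd57-fb40-4aca-9046-c3509eca4d1e". Throws on malformed input.
simple_uuid parse_uuid(const std::string& s) {
    if (s.size() != 36) {
        throw std::invalid_argument("UUID string must be 36 characters");
    }
    std::string hex;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool dash_position = (i == 8 || i == 13 || i == 18 || i == 23);
        if (dash_position) {
            if (s[i] != '-') {
                throw std::invalid_argument("missing '-' separator");
            }
        } else {
            if (!std::isxdigit(static_cast<unsigned char>(s[i]))) {
                throw std::invalid_argument("non-hex character in UUID");
            }
            hex.push_back(s[i]);
        }
    }
    // 32 hex digits remain: the first 16 form the high half, the rest the low half.
    simple_uuid result;
    result.most_sig_bits = static_cast<std::uint64_t>(std::stoull(hex.substr(0, 16), nullptr, 16));
    result.least_sig_bits = static_cast<std::uint64_t>(std::stoull(hex.substr(16, 16), nullptr, 16));
    return result;
}
```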
Asias He
1ed5d01cd2 storage_service: Fix STATUS in set_tokens
Here, we should set STATUS to NORMAL.
2015-06-04 17:12:10 +08:00
Asias He
68f671a8b7 storage_service: Move gossip callback to source file 2015-06-04 17:12:09 +08:00
Asias He
6917a904c3 storage_service: Implement handle_state_bootstrap 2015-06-04 17:12:09 +08:00
Asias He
0c98a4413f token_metadata: Add _leaving_endpoints and _moving_endpoints 2015-06-04 17:12:09 +08:00
Asias He
34b3d679ab dht: Do move in token constructor 2015-06-04 17:12:09 +08:00
Asias He
9dc7a60b4a storage_service: Move handle_state_bootstrap and friends to source file 2015-06-04 17:12:09 +08:00
Asias He
42a2f24c77 token_metadata: Add add_bootstrap_tokens and remove_bootstrap_tokens 2015-06-04 17:12:09 +08:00
Asias He
9c5cd2bca8 storage_service: Switch to use unordered_set for tokens
We do not care about the order of the tokens.

Also, in token_metadata, we use unordered_set for tokens as well, e.g.
update_normal_tokens. Unify the usage.
2015-06-04 17:12:09 +08:00
Asias He
6b263e46eb token_metadata: Add is_member 2015-06-04 17:12:09 +08:00
Asias He
06a792d6be token_metadata: Add get_host_id and friends 2015-06-04 17:12:09 +08:00
Asias He
edee90550c database: Fix boost::find compile error
A call to boost::find fails to compile when both <boost/algorithm/string/find.hpp> and
<boost/range/algorithm/find.hpp> are included.
2015-06-04 17:12:09 +08:00
Avi Kivity
7fa17d9880 Merge "range query read path"
Conflicts:
	database.cc
2015-06-04 10:21:48 +03:00
Pekka Enberg
fcd6f147fc db/legacy_schema_tables.cc: Use schema_result::value_type instead of std::pair
Switch to schema_result::value_type instead of the open-coded std::pair
so that the actual types are defined in one place.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-06-03 19:14:39 +02:00
Avi Kivity
32a7e3b21f Merge "failure detector REST API"
From Amnon:

"This series adds the failure detector API. The API definition is based on the
FailureDetectorMBean API."
2015-06-03 19:23:07 +03:00
Amnon Heiman
ba8365d95a Adding the Failure detector API implementation
This series adds the implementation for the failure detector API.
After this patch the following APIs will be supported:
/failure_detector/endpoints
/failure_detector/count/endpoint/up
/failure_detector/count/endpoint/down
/failure_detector/phi
POST:/failure_detector/phi
/failure_detector/simple_states
/failure_detector/endpoints/states

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-06-03 19:13:03 +03:00
Amnon Heiman
a75376e8e3 API: Add a helper function from map to key value list
When a swagger definition file is used, a returned map must be expressed as a
list of key/value pairs. To handle this common case in the API, a helper
function was added that takes an unordered_map and returns a vector of
key/value mappings.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-06-03 19:13:03 +03:00
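A helper of the kind described above might look like the following sketch. The names `key_value` and `map_to_key_value` are hypothetical; the real API presumably uses types generated from the swagger definition, which this log does not show.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical element type for illustration only.
struct key_value {
    std::string key;
    std::string value;
};

// Flattens any string-to-string map into a list of key/value entries.
// The output order follows the map's iteration order, which is
// unspecified for unordered_map.
template <typename Map>
std::vector<key_value> map_to_key_value(const Map& m) {
    std::vector<key_value> out;
    out.reserve(m.size());
    for (const auto& entry : m) {
        out.push_back(key_value{entry.first, entry.second});
    }
    return out;
}
```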
Amnon Heiman
711fe64208 Expose the failure_detector functionality
The failure detector runs on CPU 0; for external users, this is an
implementation detail that is irrelevant.

This adds wrapper functions for the functions defined in
FailureDetectorMBean, which map each request to the correct CPU.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-06-03 19:13:03 +03:00
Amnon Heiman
71bfd07d69 API Adding the failure detector swagger definition
This adds the failure detector definition, which is based on the
FailureDetectorMBean.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-06-03 19:12:57 +03:00
Amnon Heiman
57a2777da9 api: fix string containing space causing boost exception
When the container_to_vec helper function is given a string that contains a
space, a boost exception is thrown.

This fixes it by using std::string for the conversion, which boost
recognizes as a string type.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-06-03 19:00:32 +03:00
Avi Kivity
9765eda012 db: drop memtables that were successfully flushed 2015-06-03 16:39:53 +03:00
Avi Kivity
a71c287c10 db: add sstables to the range scan read path 2015-06-03 16:39:46 +03:00
Pekka Enberg
0993142d8d sstable: Fix write buffer size
The current 4K write buffer is ridiculously small and forces Urchin to
issue small I/O batches. Increase the buffer size to 64K.

Before:

  Results:
  op rate                   : 27265
  partition rate            : 27265
  row rate                  : 27265
  latency mean              : 1.2
  latency median            : 0.9
  latency 95th percentile   : 2.4
  latency 99th percentile   : 10.6
  latency 99.9th percentile : 14.3
  latency max               : 44.7
  Total operation time      : 00:00:30
  END

After:

  Results:
  op rate                   : 35365
  partition rate            : 35365
  row rate                  : 35365
  latency mean              : 0.9
  latency median            : 0.8
  latency 95th percentile   : 1.8
  latency 99th percentile   : 8.8
  latency 99.9th percentile : 21.8
  latency max               : 272.2
  Total operation time      : 00:00:34
  END

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-06-03 16:32:55 +03:00
Tomasz Grabiec
3d7049f0de tests: Fix commitlog_test.cc
"file" should not go out of scope until async stat() completes.
2015-06-03 14:12:06 +02:00
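The lifetime problem named in this fix can be illustrated without the real async framework. In the sketch below a plain vector of callbacks stands in for the event loop, and all names (`pending`, `fake_file`, `stat_later`, `run_pending`) are hypothetical. The point is only that a deferred callback must own whatever it touches.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Stand-in for an event loop: callbacks queued here run later, after the
// function that queued them has already returned.
std::vector<std::function<void()>> pending;

struct fake_file {
    std::string name;
};

// Queues work that runs after this function returns. The lambda captures
// both shared_ptrs by value, so the fake_file stays alive until the
// callback has run, even though the local handle goes out of scope here.
std::shared_ptr<std::string> stat_later() {
    auto file = std::make_shared<fake_file>(fake_file{"segment-1.log"});
    auto result = std::make_shared<std::string>();
    pending.push_back([file, result] { *result = file->name; });
    return result;
}

// Runs and discards every queued callback.
void run_pending() {
    for (auto& task : pending) {
        task();
    }
    pending.clear();
}
```

Had the lambda captured a stack-allocated `fake_file` by reference instead, the callback would read a destroyed object once `stat_later` returned.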
Pekka Enberg
2eeabfcebc db/legacy_schema_tables: Fix merge_keyspaces()
Keys that are in "entries_only_on_right" need to be looked up from
"after". Fixes a regression introduced in commit 5418e32
("map_difference: Simplify difference value").

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-06-03 13:43:48 +02:00
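The rule in this fix, that keys present only on the right side must be read from "after", can be sketched as follows. The `key_difference` struct and both functions are simplified, hypothetical stand-ins; the project's actual map_difference is not shown in this log.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using string_map = std::map<std::string, std::string>;

// Simplified, hypothetical difference result: only the key sets.
struct key_difference {
    std::set<std::string> entries_only_on_left;   // keys only in "before"
    std::set<std::string> entries_only_on_right;  // keys only in "after"
};

key_difference difference(const string_map& before, const string_map& after) {
    key_difference d;
    for (const auto& e : before) {
        if (after.count(e.first) == 0) {
            d.entries_only_on_left.insert(e.first);
        }
    }
    for (const auto& e : after) {
        if (before.count(e.first) == 0) {
            d.entries_only_on_right.insert(e.first);
        }
    }
    return d;
}

// Values of newly created entries. Right-only keys do not exist in
// "before", so they have to be looked up in "after".
std::vector<std::string> created_values(const string_map& before, const string_map& after) {
    std::vector<std::string> out;
    for (const auto& key : difference(before, after).entries_only_on_right) {
        out.push_back(after.at(key));
    }
    return out;
}
```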
Calle Wilund
5418673659 Column family seal_active_memtable fix: don't use local by ref in cont.
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2015-06-03 13:43:47 +02:00
Tomasz Grabiec
b2549a7b14 Merge branch 'calle/secondary_index' from seastar-dev.git 2015-06-03 13:22:01 +02:00
Avi Kivity
062eaab809 db: load sstable before adding it to column_family read set
A just-written sstable is not ready for reading without reopening the
files in ro mode.
2015-06-03 12:57:56 +03:00
Avi Kivity
61d62dffe8 db: futurize column_family::for_all_partitions() internal loop
Adapt for_all_partitions() to use futures instead of iterators,
as that will be the interface to sstables.  We drop use of nway_merger as
that is not able to use futures and instead open-code the heap
functionality.
2015-06-03 12:57:28 +03:00
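The open-coded heap mentioned above is, at its core, a k-way merge. The sketch below shows only that part, synchronously and over plain integers; the commit's version works with futures and partitions, which this illustration does not attempt to reproduce. `merge_sorted` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Merges several individually sorted sequences into one sorted sequence,
// using a min-heap of (value, source index) pairs.
std::vector<int> merge_sorted(const std::vector<std::vector<int>>& sources) {
    using item = std::pair<int, std::size_t>;  // value, source index
    std::priority_queue<item, std::vector<item>, std::greater<item>> heap;
    std::vector<std::size_t> next(sources.size(), 0);  // next unread position per source

    // Seed the heap with the first element of every non-empty source.
    for (std::size_t i = 0; i < sources.size(); ++i) {
        if (!sources[i].empty()) {
            heap.push(item(sources[i][0], i));
            next[i] = 1;
        }
    }

    std::vector<int> merged;
    while (!heap.empty()) {
        const item top = heap.top();
        heap.pop();
        merged.push_back(top.first);
        // Refill from the source that just yielded the smallest value.
        const std::size_t src = top.second;
        if (next[src] < sources[src].size()) {
            heap.push(item(sources[src][next[src]], src));
            ++next[src];
        }
    }
    return merged;
}
```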
Calle Wilund
293dbf66e3 Forward and use replay_position when applying mutation
* Forward commitlog replay_position to column_family.memtable, updating
  highest RP if needed
* When flushing memtable, signal back to commitlog that RP has been dealt with
  to potentially remove finished segment(s)

Note: since memtable flushing right now is _not_ explicitly ordered,
this does not actually work, since we need to guarantee ordering with
regards to RP. I.e. if we flush N blocks, we must guarantee that:
a.) We report "flushed RP" in RP order
b.) For a given RP1, all RP* lower than RP1 must also have been flushed.
(The latter means it is fine to flush X tables at the same time, as long as
we report a single RP that is the highest, and no lower RPs exist in
non-flushed tables.)

I am however letting someone else deal with ensuring MT->sstable flush order.

Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2015-06-03 12:38:13 +03:00
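Guarantees a.) and b.) from the note above can be sketched with a small tracker. This is not the project's code: `flush_tracker` is a hypothetical class, and a replay position is reduced to a single integer for illustration. It reports an RP only once every memtable with a lower RP has also been flushed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical tracker. Each memtable is identified by its highest RP.
class flush_tracker {
    std::map<std::uint64_t, bool> _flushed;  // highest RP of a memtable -> flushed?
public:
    void add_memtable(std::uint64_t highest_rp) {
        _flushed.emplace(highest_rp, false);
    }

    // Marks one memtable as flushed. Returns true and sets "reportable"
    // when the highest RP that is safe to report has advanced.
    bool mark_flushed(std::uint64_t highest_rp, std::uint64_t& reportable) {
        auto it = _flushed.find(highest_rp);
        if (it == _flushed.end()) {
            return false;
        }
        it->second = true;
        bool advanced = false;
        // Consume the flushed prefix in RP order and stop at the first
        // memtable that is still unflushed.
        while (!_flushed.empty() && _flushed.begin()->second) {
            reportable = _flushed.begin()->first;
            _flushed.erase(_flushed.begin());
            advanced = true;
        }
        return advanced;
    }
};
```

With memtables at RPs 10, 20 and 30, flushing 20 first reports nothing; flushing 10 afterwards reports 20 in a single step.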
Avi Kivity
33ac1922f3 db: adjust mutation print function
Show the decorated key instead of the partition key, as it is a superset.
2015-06-03 12:35:13 +03:00
Avi Kivity
a2fa63e09b db: add another mutation constructor 2015-06-03 12:35:13 +03:00
Avi Kivity
c9a4289244 tests: fix mutation_test capturing stack variable by value
Worked until now because column_family::for_all_partitions() did not defer.
2015-06-03 12:35:07 +03:00