Commit Graph

53948 Commits

Author SHA1 Message Date
Nadav Har'El
7a8ed228c7 repair: better error message
If a stream failed, print a clear error message that repair failed, instead
of ignoring it and letting Seastar's generic "warning, exception was ignored"
be the only thing the user will see.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-10 12:16:56 +03:00
Nadav Har'El
71a3a0c026 repair: repair each local range separately
The previous repair code exchanged data with the other nodes which have
one arbitrary token. This will only work correctly when all the nodes
replicate all the data. In a more realistic scenario, the node being
repaired holds copies of several token ranges, and each of these ranges
has a different set of replicas we need to perform the repair with.

So this patch does the right thing - we perform a separate repair_range()
for each of the local ranges, and each of those will find a (possibly)
different set of nodes to communicate with.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-10 12:16:55 +03:00
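
The per-range repair described in 71a3a0c026 can be sketched roughly as follows. This is only an illustration under stated assumptions: `token_range`, `endpoint`, `range_repair_plan` and `plan_local_repairs()` are hypothetical stand-ins, not the types or functions used by the patch, and the replica map is assumed to be given.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real token, range and endpoint types differ.
using token_range = std::pair<long, long>;
using endpoint = std::string;

struct range_repair_plan {
    token_range range;
    std::set<endpoint> neighbors;  // replicas of this range other than the local node
};

// For every range the local node replicates, work out which other nodes
// hold the same range. Each resulting plan is one separate repair.
std::vector<range_repair_plan> plan_local_repairs(
        const std::map<token_range, std::set<endpoint>>& replicas_by_range,
        const endpoint& local) {
    std::vector<range_repair_plan> plans;
    for (const auto& entry : replicas_by_range) {
        const std::set<endpoint>& replicas = entry.second;
        if (replicas.count(local) == 0) {
            continue;  // not a local range, nothing to repair here
        }
        std::set<endpoint> neighbors = replicas;
        neighbors.erase(local);
        assert(neighbors.count(local) == 0);
        plans.push_back({entry.first, std::move(neighbors)});
    }
    return plans;
}
```

The point of the sketch is that the neighbor set is computed per range rather than once for the whole node.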
Nadav Har'El
f74eedce7d replication: add get_ranges() function
This patch adds a method get_ranges() to replication-strategy.
It returns the list of token ranges held by the given endpoint.

It will be used by the replication code, which needs to know
in particular which token ranges are held by *this* node.

This function is the analogue of Origin's getAddressRanges().get(endpoint).
As in Origin, the implementation here is not meant to be efficient,
and will not be used in the fast path.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-10 12:16:55 +03:00
Asias He
0e2f9beec4 streaming: Wait after create keyspace and create table
Give it some time to propagate the schema to other nodes.
2015-08-10 15:53:42 +08:00
Asias He
d724fd449c streaming: Avoid storing partition_range in stream_detail
Now, make_local_reader does not need partition_range to be alive when we
read the mutation reader. No need to store it in stream_detail for its
lifetime.
2015-08-10 15:51:13 +08:00
Asias He
62394cc9d0 streaming: Add error handling for PREPARE_MESSAGE 2015-08-10 15:05:10 +08:00
Asias He
9f83588e66 streaming: Add error handling for STREAM_INIT_MESSAGE 2015-08-10 15:01:29 +08:00
Asias He
e13d93b2ff streaming: Improve error handling in stream_transfer_task::complete 2015-08-10 14:49:34 +08:00
Asias He
c7c33a9f44 streaming: Add error handling for STREAM_MUTATION sending 2015-08-10 14:44:25 +08:00
Asias He
be4d9c63b1 streaming: Drop do_with in stream_transfer_task::start
We can copy id instead, it is cheap.
2015-08-10 14:13:15 +08:00
Asias He
7fcaca56bd storage_service: Wait for schedule_schema_pull
It returns a future, we should not ignore it.
2015-08-10 10:26:27 +08:00
Asias He
1291344e68 storage_service: Wait for operations to complete in gossip callback
Since all the gossip callbacks (e.g., on_change) are executed inside a
seastar::async context, we can wait for operations such as updating the
system table to complete.
2015-08-10 10:21:57 +08:00
Asias He
0b475a5173 gossip: Dump endpoint_state_map in debug mode
This is very useful for debugging.
2015-08-10 09:48:32 +08:00
Asias He
5f7628da12 gossip: Run real_mark_alive under seastar::async context
on_dead is now under seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
d15c8289a2 gossip: Run remove_endpoint inside seastar::async context
on_remove is now under seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
56615a8a29 gossip: Make real_mark_alive run inside seastar::async context
on_alive callbacks are now under seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
4eedd417b1 gossip: Run code inside seastar::async context for add_local_application_state
So that do_before_change_notifications and do_on_change_notifications
are under seastar::async.

Now, before_change callbacks are inside seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
825f6d141d gossip: Run code inside seastar::async context for apply_state_locally
It is easier to futurize apply_new_states and handle_major_state_change.

Now, on_change, on_join and on_restart callbacks are inside
seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
c0aae33991 gossip: Futurize apply_state_locally 2015-08-10 09:48:32 +08:00
Asias He
802b3fdf19 gossip: Add timeout to send_gossip
Otherwise, when a node tries to send to a just-killed node, it will
block for a long time, and the gossip round will be blocked with it.
2015-08-10 09:48:32 +08:00
Asias He
6ee2b138a4 gossip: Futurize handle_ack_msg 2015-08-10 09:48:32 +08:00
Asias He
c6509dad42 gossip: Make send_gossip and friends return future 2015-08-10 09:48:32 +08:00
Asias He
3b064c528e gossip: Make gossiper::run execute in a seastar::thread
Prepare to futurize gossiper.
2015-08-10 09:48:32 +08:00
Asias He
baec9e3449 gossip: Fix is_enabled
It is not correct to use _scheduled_gossip_task.armed() to tell if
gossip is enabled or not, since the timer sets _armed = false before calling
the timer callback.

It was working correctly because we did not actually check the is_enabled()
flag inside the timer callback but inside send_gossip_digest_syn()'s
continuation, and by that time the timer is armed again.

Use a standalone flag to do so.
2015-08-10 09:48:32 +08:00
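
The problem fixed by baec9e3449 can be reproduced with a toy timer. The sketch below is not Seastar code: `toy_timer` and `toy_gossiper` are hypothetical classes that only mimic the behaviour the commit message describes, namely that the armed flag is cleared before the callback runs, so a separate flag is needed to answer "is gossip enabled?".

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Toy one-shot timer: like the timer described in the commit message,
// it clears its armed flag *before* invoking the callback.
class toy_timer {
    bool _armed = false;
    std::function<void()> _callback;
public:
    void set_callback(std::function<void()> cb) { _callback = std::move(cb); }
    void arm() { _armed = true; }
    bool armed() const { return _armed; }
    void fire() {  // simulates timer expiry
        assert(_callback && "callback must be set before firing");
        _armed = false;
        _callback();
    }
};

// Hypothetical gossiper skeleton: "enabled" is tracked by a standalone flag.
class toy_gossiper {
    toy_timer _scheduled_gossip_task;
    bool _enabled = false;
public:
    bool seen_armed_in_callback = true;
    bool seen_enabled_in_callback = false;

    toy_gossiper() {
        _scheduled_gossip_task.set_callback([this] {
            // armed() is already false here, so it cannot stand in for is_enabled().
            seen_armed_in_callback = _scheduled_gossip_task.armed();
            seen_enabled_in_callback = is_enabled();
            if (is_enabled()) {
                _scheduled_gossip_task.arm();  // schedule the next round
            }
        });
    }
    toy_gossiper(const toy_gossiper&) = delete;  // the callback captures `this`
    toy_gossiper& operator=(const toy_gossiper&) = delete;

    void start() { _enabled = true; _scheduled_gossip_task.arm(); }
    void stop() { _enabled = false; }
    bool is_enabled() const { return _enabled; }
    void tick() { _scheduled_gossip_task.fire(); }
};
```

Inside the callback the timer reports itself as not armed even though gossip is running, which is why the standalone flag is the reliable source.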
Avi Kivity
1016b21089 cache: improve preloading of flushed memtable mutations
If a mutation definitely doesn't exist in all sstables, then we can
certainly load it into the cache.
2015-08-09 22:46:08 +03:00
Avi Kivity
fee3a9513b sstables: add yet another variant of filter_has_key()
This time public, for use when preloading the cache.
2015-08-09 22:03:01 +03:00
Avi Kivity
29ce425862 db: introduce negative_mutation_reader concept
Similar to a mutation_reader, but limited: it only returns whether a key
is sure not to exist in some mutation source.  Non-blocking and expected
to execute fast.  Corresponds to an sstable bloom filter.

To avoid ambiguity, it doesn't return a bool but a longer, less
ambiguous "definitely_doesnt_exists" or "maybe_exists".
2015-08-09 22:00:44 +03:00
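
A minimal sketch of the idea in 29ce425862, and of how 1016b21089 might use it, is given below. Only the two result names come from the commit message; `negative_result`, `negative_reader`, `make_set_backed_reader()` and `definitely_absent_everywhere()` are hypothetical, and an exact `std::set` stands in for a bloom filter.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Two-valued answer instead of a bool, so call sites cannot mix up the polarity.
enum class negative_result { definitely_doesnt_exists, maybe_exists };

// Hypothetical shape of a negative reader: a fast, non-blocking key probe.
using negative_reader = std::function<negative_result(const std::string& key)>;

// Stand-in for a bloom-filter-backed source. A real filter may also answer
// maybe_exists for keys that are absent; callers must allow for that.
negative_reader make_set_backed_reader(std::set<std::string> keys) {
    return [keys = std::move(keys)](const std::string& key) {
        return keys.count(key) ? negative_result::maybe_exists
                               : negative_result::definitely_doesnt_exists;
    };
}

// A key is known to be absent only if every source rules it out.
bool definitely_absent_everywhere(const std::vector<negative_reader>& readers,
                                  const std::string& key) {
    for (const auto& reader : readers) {
        assert(reader && "empty reader");
        if (reader(key) == negative_result::maybe_exists) {
            return false;
        }
    }
    return true;
}
```

Under these assumptions, a flushed memtable mutation would be preloaded into the cache only when `definitely_absent_everywhere()` returns true for its key.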
Avi Kivity
f3107c7869 Merge seastar upstream
* seastar 6f1dd3c...887f72d (8):
  > finally(): don't discard any exception
  > dpdk: check the resulting cluster for non-i40e NICs
  > reactor: avoid SIGPIPE when writing to a socket
  > memory: Don't run reclaimers if free memory is above the threshold
  > core: Add missing include to transfer.hh
  > dhcp: print the "sending discover" message only once
  > reactor: count io_threaded_fallback statistic
  > future: finally(): don't let the exceptional future to be ignored
2015-08-09 19:04:24 +03:00
Avi Kivity
061d86c91d Merge "Faster CRC implementation"
"Use the x86 CRC32 instruction to compute CRC."
2015-08-09 09:55:26 +03:00
Avi Kivity
a8ff8ea442 commitlog: switch to faster crc32 implementation 2015-08-09 00:05:36 +03:00
Avi Kivity
d6351ecca7 utils: add crc32 class
C++ interface to the crc32 x86 instruction.
2015-08-09 00:05:33 +03:00
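
For reference, the x86 crc32 instruction computes CRC-32C (the Castagnoli polynomial). The portable sketch below computes the same checksum bit by bit, which can be handy for cross-checking a hardware-accelerated version. It is not the class added by d6351ecca7: the name `crc32c_sketch`, its interface, and the initial value and final inversion (the usual CRC-32C convention) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise software CRC-32C, reflected polynomial 0x82F63B78.
class crc32c_sketch {
    uint32_t _state = 0xFFFFFFFFu;  // conventional initial value (assumed)
public:
    // Feed more bytes; may be called repeatedly to checksum data in chunks.
    void process(const void* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        const auto* p = static_cast<const unsigned char*>(data);
        for (std::size_t i = 0; i < size; ++i) {
            _state ^= p[i];
            for (int bit = 0; bit < 8; ++bit) {
                // Shift right; fold in the polynomial if the shifted-out bit was 1.
                _state = (_state >> 1) ^ (0x82F63B78u & (0u - (_state & 1u)));
            }
        }
    }
    // Final inversion, again following the usual CRC-32C convention.
    uint32_t get() const { return _state ^ 0xFFFFFFFFu; }
};
```

With these conventions, the standard check string "123456789" yields 0xE3069283.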
Avi Kivity
70618762c3 build: require at least a Nehalem-class cpu
We want to use the crc32 instruction, which was made available
on Nehalem, so let's require it.  It's old enough to be present
everywhere.
2015-08-08 23:28:32 +03:00
Avi Kivity
4a5845ae60 Merge "Incremental eviction" from Tomasz
"This series enables incremental eviction of data from cache. The eviction is
controlled by the LSA tracker, which considers evictable regions as part of
its reclaim() method."
2015-08-08 14:39:13 +03:00
Tomasz Grabiec
5e677f4331 tests: Add row_cache eviction test 2015-08-08 09:59:24 +02:00
Tomasz Grabiec
ef549ae5a5 lsa: Reclaim space from evictable regions incrementally
When LSA reclaimer cannot reclaim more space by compaction, it
will reclaim data by evicting from evictable regions.

Currently the only evictable region is the one owned by the row cache.
2015-08-08 09:59:24 +02:00
Tomasz Grabiec
7a8f1ef6c3 row_cache: Replace _lru_len counter with region occupancy
_lru_len may get stale when row_cache instance goes out of scope
purging all its partitions from cache. I'm assuming we're not really
interested in the number of partitions here, but rather a measure of
occupancy, so I applied a simple fix of using LSA region occupancy
instead.
2015-08-08 09:59:24 +02:00
Tomasz Grabiec
bceeb301b7 tests: lsa: Add test for region merging 2015-08-08 09:59:24 +02:00
Tomasz Grabiec
a095b39091 lsa: Don't leak empty _active segment in merge() 2015-08-08 09:59:24 +02:00
Tomasz Grabiec
5b5c0038e6 lsa: Don't allocate aligned segments
Requiring alignment means that there must be 64K of contiguous space
to allocate each 32K segment. When memory is fragmented, we may fail
to allocate such a segment, even though there's plenty of free space.

This especially hurts forward progress of compaction, which frees
segments randomly and relies on the fact that freeing a segment will
make it available to the next segment request.
2015-08-07 22:13:17 +02:00
Tomasz Grabiec
64bd4bee94 lsa: Log segment closing and releasing on trace level 2015-08-07 22:06:15 +02:00
Tomasz Grabiec
02ff31b815 lsa: Reduce amount of calls to descriptor() in free() 2015-08-07 22:05:53 +02:00
Tomasz Grabiec
e3592a4a04 api: lsa: Invoke compaction on all shards 2015-08-07 22:05:53 +02:00
Avi Kivity
416d8f7799 sstables: don't pass temporary string to regex
Since the regex match returns views into that string, it must not be
a temporary. gcc 5.1's libstdc++ won't accept it, either.
2015-08-07 21:46:55 +03:00
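
The lifetime issue behind 416d8f7799 can be illustrated with standard C++ alone. In the sketch below, the function name and the file-name pattern are hypothetical and are not taken from the sstables code; the point is only that the matched string is a named variable that outlives the match results.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the numeric component of a name such as "la-42-Data.db",
// or an empty string if the name does not match.
std::string extract_generation(const std::string& prefix, const std::string& suffix) {
    static const std::regex pattern("[a-z]+-([0-9]+)-Data\\.db");
    std::smatch match;
    // Writing std::regex_match(prefix + suffix, match, pattern) would pass a
    // temporary; the sub-matches would then point into a destroyed string.
    // The standard library deletes that overload (since C++14), so it does
    // not compile with a conforming implementation.
    const std::string name = prefix + suffix;
    if (!std::regex_match(name, match, pattern)) {
        return {};
    }
    assert(match.size() == 2);  // whole match plus one capture group
    return match[1].str();      // copied out while `name` is still alive
}
```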
Avi Kivity
a1543dc4f9 tests: mark fake variable as unused in logalloc_test
So that gcc 5.1 doesn't complain.
2015-08-07 21:32:09 +03:00
Glauber Costa
ae2ce78ee6 version: change all fields to uint16_t
Ok, shame on me: the version string was so obviously correct that I only
verified that the comparisons were working as expected.

Turns out it isn't: http://lists.boost.org/boost-users/2006/12/24194.php

boost::format will treat uint8_t arguments as char, and therefore we will end
up with the version string misprinted.

We can just cast it to uint16_t before we print, but since this is not exactly
a struct that we will be using all the time, let's favor readability over
saving a few bytes, and change all fields to uint16_t.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 20:25:20 +03:00
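
The misprint described in ae2ce78ee6 can be reproduced without boost, since plain stream insertion treats uint8_t the same way on platforms where it is an alias for unsigned char. The helper `format_pair()` below is hypothetical and uses std::ostringstream as a stand-in for boost::format.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Formats two fields as "a.b" using ordinary stream insertion.
template <typename Field>
std::string format_pair(Field first, Field second) {
    std::ostringstream out;
    out << first << '.' << second;
    return out.str();
}

// Usage: the same values print differently depending on the field width.
inline void show_field_width_effect() {
    // uint8_t is inserted as a character, so 65 and 66 come out as 'A' and 'B'.
    assert(format_pair<uint8_t>(65, 66) == "A.B");
    // uint16_t is inserted as a number, which is what a version string needs.
    assert(format_pair<uint16_t>(65, 66) == "65.66");
}
```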
Glauber Costa
28f3b4a084 main: add default value for configuration file
We should have a better default here, that is prefixed by some directory.  But
this will do for now, so we can start using the file without passing it as a
parameter all the time.

Also fix help string so it says scylla, and not cassandra.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 11:10:56 -05:00
Glauber Costa
3426b3ecc1 bootstrap tokens: get tokens from config file
Aside from being the obviously correct thing to do, not having this will force us
to manually adjust num_tokens when running our sstables into Cassandra.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 11:10:56 -05:00
Glauber Costa
2678b0e606 dht: change get_bootstrap_tokens()'s signature
It needs to access the non-existent "DatabaseDescriptor". Do as we have been doing,
and just pass the database object instead.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 11:10:56 -05:00
Tomasz Grabiec
5dc58a7cd4 allocation_strategy: Leak the standard strategy
Some code may attempt to use it during finalization after "instance"
was destroyed.

Reported by Pekka:

/usr/include/c++/4.9.2/bits/unique_ptr.h:291:14: runtime error:
reference binding to null pointer of type 'struct
standard_allocation_strategy'
./utils/allocation_strategy.hh:105:13: runtime error: reference
binding to null pointer of type 'struct standard_allocation_strategy'
./utils/allocation_strategy.hh:118:35: runtime error: reference
binding to null pointer of type 'struct allocation_strategy'
./utils/managed_bytes.hh:59:45: runtime error: member call on null
pointer of type 'struct allocation_strategy'
./utils/allocation_strategy.hh:82:9: runtime error: member access
within null pointer of type 'struct allocation_strategy'
2015-08-07 18:35:20 +03:00
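
The "leak it on purpose" approach of 5dc58a7cd4 can be sketched as below. The names `allocation_strategy_sketch` and `standard_strategy_sketch()` are hypothetical and the real class is richer; the sketch only shows the pattern of an instance that is allocated once and never destroyed, so code running during static finalization can still use it.

```cpp
#include <cassert>

// Hypothetical, minimal stand-in for the standard allocation strategy.
struct allocation_strategy_sketch {
    int frees = 0;
    void note_free() { ++frees; }
};

// The instance is created on first use and intentionally never deleted.
// An owning smart pointer or a plain static object would be destroyed at
// exit, leaving late callers with a dead object, which is the failure the
// sanitizer report in the commit message shows.
inline allocation_strategy_sketch& standard_strategy_sketch() {
    static allocation_strategy_sketch* instance = new allocation_strategy_sketch();
    assert(instance != nullptr);
    return *instance;
}
```

The cost is one small object that is never freed, which is usually acceptable for a process-wide singleton.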
Glauber Costa
5d3c7165d2 version: use a tuple internally.
As Avi suggested, we can use a tuple to make some comparisons more natural.
However, instead of doing a make_tuple on the comparison only, we can go
further and store the tuple internally.

I am still keeping the outer type, so it can host convenience functions like
to_sstring() and current().

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 18:24:54 +03:00
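
A rough sketch of the tuple-backed version type from 5d3c7165d2 follows. The class name, the three-field layout and `to_string()` (standing in for the to_sstring() mentioned above) are assumptions; the fields are uint16_t in line with the later ae2ce78ee6.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>

class version_sketch {
    // Storing a tuple makes comparisons lexicographic with no hand-written logic.
    std::tuple<uint16_t, uint16_t, uint16_t> _v;
public:
    version_sketch(uint16_t x, uint16_t y, uint16_t z) : _v(x, y, z) {}

    // The outer type remains, so it can host convenience functions.
    std::string to_string() const {
        return std::to_string(std::get<0>(_v)) + "." +
               std::to_string(std::get<1>(_v)) + "." +
               std::to_string(std::get<2>(_v));
    }

    bool operator==(const version_sketch& o) const { return _v == o._v; }
    bool operator!=(const version_sketch& o) const { return _v != o._v; }
    bool operator<(const version_sketch& o) const { return _v < o._v; }
    bool operator>=(const version_sketch& o) const { return _v >= o._v; }
};

// Usage: comparison is numeric per field, not textual.
inline void show_version_ordering() {
    assert(version_sketch(1, 2, 3) < version_sketch(1, 10, 0));
    assert(version_sketch(1, 2, 3).to_string() == "1.2.3");
}
```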