Commit Graph

7611 Commits

Author SHA1 Message Date
Avi Kivity
b3cd672d97 Merge seastar upstream
* seastar ad07a2e...5b9e3da (2):
  > Merge "rpc cleanups and improvements" from Gleb
  > shared_future: Add missing include
2015-12-10 18:11:59 +02:00
Paweł Dziepak
9d482532f4 tests/lsa: reduce the size of large allocation
Originally, the large allocation test case attempted to allocate an object
as big as half of the space used by the lsa. That failed when the test
was executed with less memory available, mainly due to the
memory fragmentation caused by previous test cases.

This patch reduces the size of the large allocation to 3/8 of the
total space used by the lsa, which is still a lot but seems to make the
test pass even with as little memory as 64MB per shard.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 13:16:43 +01:00
Avi Kivity
d425aacaeb release: copy version string into heap
If we get a core dump from a user, it is important to be able to
identify its version.  Copy the release string into the heap (which is
copied into the core dump), so we can search for it using the "strings"
or "ident" commands.
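
A minimal sketch of the idea, not the actual patch: the helper name
copy_version_to_heap and the example version string are hypothetical. It
only shows a string being duplicated into heap memory, which is the part
of the process image that ends up in the dump.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Hypothetical helper (not the actual Scylla code): duplicates a version
// string into heap memory so that it becomes part of the heap contents.
std::unique_ptr<char[]> copy_version_to_heap(const std::string& version) {
    // +1 keeps the terminating NUL so the copy is a clean C string.
    std::unique_ptr<char[]> buf(new char[version.size() + 1]);
    std::memcpy(buf.get(), version.c_str(), version.size() + 1);
    return buf;
}
```

The caller would have to keep the returned buffer alive for the lifetime
of the process, otherwise the copy may be freed and overwritten before a
dump is taken.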

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
2015-12-10 13:12:40 +02:00
Lucas Meneghel Rodrigues
2167173251 utils/logalloc.cc - Declare member minimum_size from segment_zone struct
This fixes compile error:

In function `logalloc::segment_zone::segment_zone()':
/home/lmr/Code/scylla/utils/logalloc.cc:412: undefined reference to `logalloc::segment_zone::minimum_size'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
2015-12-10 12:54:34 +02:00
Asias He
b7d10b710e streaming: Propagate fail to send PREPARE_DONE_MESSAGE exception
Otherwise the stream_plan will not be marked as failed state.
2015-12-10 12:38:00 +02:00
Paweł Dziepak
ec453c5037 managed_bytes: fix potentially unaligned accesses
blob_storage is defined with attribute packed, which makes its alignment
requirement equal to 1. This means that its members may be unaligned.
GCC is aware of that and will generate appropriate code
(and not generate ubsan checks). However, there are a few places where
members of blob_storage are accessed via pointers; these have to be
wrapped by unaligned_cast<> to let the compiler know that the location
pointed to may not be properly aligned.
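
The problem can be illustrated with a small sketch. It does not use
unaligned_cast<>; it swaps in the portable std::memcpy idiom instead, and
packed_header_sketch, read_unaligned_u32 and write_unaligned_u32 are
hypothetical names. The packed attribute is GCC/Clang specific.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical packed layout: alignment becomes 1, so `size` sits at
// offset 1 and may live at an address that is not a multiple of 4.
struct packed_header_sketch {
    std::uint8_t tag;
    std::uint32_t size;
} __attribute__((packed));

static_assert(sizeof(packed_header_sketch) == 5, "packed struct has no padding");
static_assert(alignof(packed_header_sketch) == 1, "packed struct has alignment 1");

// Reads a 32-bit value from a location that may be misaligned.
// memcpy makes no alignment assumption, unlike dereferencing a uint32_t*.
inline std::uint32_t read_unaligned_u32(const void* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}

// Writes a 32-bit value to a location that may be misaligned.
inline void write_unaligned_u32(void* p, std::uint32_t v) {
    std::memcpy(p, &v, sizeof(v));
}
```

Accessing h.size directly on a packed_header_sketch object is fine because
the compiler knows the type is packed; the helpers matter only once the
access goes through a plain pointer.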

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 11:59:54 +02:00
Tomasz Grabiec
43498b3158 Merge branch 'pdziepak/fix-partial-clustering-keys/v1' from seastar-dev.git
From Paweł:

This series fixes support for clustering keys whose trailing components
are null. The solution is to use clustering_key_prefix instead of
clustering_key everywhere.

Fixes #515.
2015-12-10 10:43:12 +01:00
Paweł Dziepak
66ff1421f0 tests/cql: add test for clustering keys with empty components
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:47:07 +01:00
Paweł Dziepak
64f50a4f40 db: make clustering_key a prefix
Schemas using compact storage can have clustering keys with the trailing
components not set, which makes them effectively clustering key prefixes
instead of full clustering keys.
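
A rough sketch of what comparing such prefixes can look like, under the
assumption that only the components present in both keys are compared.
key_prefix_sketch, prefix_less and prefix_equal are hypothetical names,
and plain strings stand in for the real component types.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical representation: a key is the list of components that are
// set; trailing components may simply be missing.
using key_prefix_sketch = std::vector<std::string>;

// Orders two prefixes by their shared leading components only.
bool prefix_less(const key_prefix_sketch& a, const key_prefix_sketch& b) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] < b[i]) { return true; }
        if (b[i] < a[i]) { return false; }
    }
    return false;  // equal on every shared component
}

// Two prefixes are treated as equal when neither orders before the other.
bool prefix_equal(const key_prefix_sketch& a, const key_prefix_sketch& b) {
    return !prefix_less(a, b) && !prefix_less(b, a);
}
```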

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:47 +01:00
Paweł Dziepak
77c7ed6cc5 keys: add prefix_equality_less_compare for prefixes
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Paweł Dziepak
220a3b23c0 keys: allow creating partial views of prefixes
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Paweł Dziepak
3c16ab080a sstables: do not assume clustering_key has the proper format
In case of non-compound dense tables the column name is just the value
of the clustering key (which has only one component). Current code just
casts clustering_key to bytes_view which works because there is no
additional metadata in single element clustering keys.
However, that may change when the internal representation of clustering
key is changed so explicitly extract the proper component.

This change will become necessary when clustering_key is replaced by
clustering_key_prefix.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Paweł Dziepak
5f1e9fd88f mutation_partition: remove unused find_entry()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Paweł Dziepak
3287022000 cql3: do not assume that clustering key is full
In case of schemas that use compact storage it is possible that trailing
components of clustering keys are not set.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Avi Kivity
167addbfe1 main: remove issue #417 (poll mode) warning
Fixed.
2015-12-09 19:00:32 +02:00
Avi Kivity
a352d63bf9 Merge seastar upstream
* seastar c5e595b...ad07a2e (1):
  > reactor: add command line option to disable sleep mode

Fixes #417
2015-12-09 19:00:20 +02:00
Glauber Costa
3c988e8240 perf_sstable: use current scylla default directory
When this tool was written, we were still using /var/lib/cassandra as a default
location. We should update it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2015-12-09 17:46:31 +02:00
Avi Kivity
01c3670def Merge seastar upstream
* seastar 5dc22fa...c5e595b (3):
  > memory: be less strict about NUMA bindings
  > reactor: let the resource code specify the default memory reserve
  > resource: reserve even more memory when hwloc is compiled in

Fixes #642
2015-12-09 16:47:47 +02:00
Asias He
66938ac129 streaming: Add retransmit logic for streaming verbs
Retransmit streaming related verbs and give up in 5 minutes.
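
The retry-until-deadline pattern can be sketched as follows. This is a
blocking simplification using the standard library, not the asynchronous
code in the patch; retry_until_deadline and its parameters are
hypothetical.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Calls send_once() until it succeeds or the deadline would be exceeded.
// Returns true on success, false when giving up.
bool retry_until_deadline(const std::function<bool()>& send_once,
                          std::chrono::milliseconds give_up_after,
                          std::chrono::milliseconds pause) {
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + give_up_after;
    for (;;) {
        if (send_once()) {
            return true;
        }
        // Stop if the next attempt would start at or past the deadline.
        if (clock::now() + pause >= deadline) {
            return false;
        }
        std::this_thread::sleep_for(pause);
    }
}
```

With give_up_after set to five minutes this mirrors the limit named in the
commit message; the pause between attempts is an assumption.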

Tested with:

  lein test :only cassandra.batch-test/batch-halves-decommission

Fixes #568.
2015-12-09 15:12:36 +02:00
Avi Kivity
14794af260 Merge seastar upstream
* seastar 9f9182e...5dc22fa (1):
  > future: add repeat_until_value(): repeat an action until it returns a value
2015-12-09 15:11:59 +02:00
Avi Kivity
213700e42f Merge seastar upstream
* seastar d40453b...9f9182e (5):
  > Merge "Sleep mode support"
  > future: add futurize<T>::from_tuple(tuple<T>)
  > tls: Add missing destructor for dh_params::impl, fixes ASAN error
  > tls/socket fix: Add missing noexcept to constructor/move
  > Merge "Initial SSL/TLS socket support" from Calle
2015-12-09 11:01:13 +02:00
Avi Kivity
204610ac61 Merge "Make LSA more large-allocation-friendly" from Paweł
"This series attempts to make LSA more friendly for large (i.e. bigger
than LSA segment) allocations. It is achieved by introducing segment
zones – large, contiguous areas of segments and using them to allocate
segments instead of calling malloc() directly.
Zones can be shrunk when needed to reclaim memory and segments can be
migrated either to reduce the number of zones or to defragment one in order
to be able to shrink it. LSA tries to keep all segments at the lower
addresses and reclaims memory starting from the zones in the highest
parts of the address space."
2015-12-09 10:49:23 +02:00
Avi Kivity
883074e936 Merge "Fix replace_node support" from Asias
Also:

[PATCH scylla v1 0/7] gossip mark node down fix + cleanup
[PATCH scylla v1 0/2] Refuse decommissioned node to rejoin
[PATCH scylla] storage_service: Fix added node not showing up in nodetool in status joining
2015-12-09 10:42:52 +02:00
Paweł Dziepak
8ba66bb75d managed_bytes: fix copy size in move constructor
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-09 10:38:28 +02:00
Asias He
b63d49c773 storage_service: Log removing replaced endpoint from system.peers
This info is important when replacing a node. Useful for debugging.
2015-12-09 12:30:52 +08:00
Asias He
d26c7e671d storage_service: Enable commented out code in handle_state_normal
Add current_owner to endpoints_to_remove if endpoint and current_owner
have the same token and endpoint is newer than current_owner.
2015-12-09 12:30:52 +08:00
Asias He
3793bb7be1 token_metadata: Add get_endpoint_to_token_map_for_reading
2015-12-09 12:30:52 +08:00
Asias He
1cc7887ffb token_metadata: Do nothing if tokens is empty.
When replacing a node, we might ignore the tokens so that the token set is
empty. In this case, we will have

   std::unordered_map<inet_address, std::unordered_set<token>> = {ip, {}}

passed to token_metadata::update_normal_tokens(std::unordered_map<inet_address,
std::unordered_set<token>>& endpoint_tokens)

and hit the assert

   assert(!tokens.empty());
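
A sketch of the guard, assuming the fix amounts to skipping endpoints
whose token set is empty. token_owner_map_sketch and the simplified token
and endpoint types are hypothetical stand-ins for the real classes.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

using token_sketch = long;            // stand-in for the real token type
using endpoint_sketch = std::string;  // stand-in for inet_address

struct token_owner_map_sketch {
    std::unordered_map<token_sketch, endpoint_sketch> owner_of;

    void update_normal_tokens(
            const std::unordered_map<endpoint_sketch,
                                     std::unordered_set<token_sketch>>& endpoint_tokens) {
        for (const auto& entry : endpoint_tokens) {
            // An empty token set is a no-op rather than an assertion failure.
            if (entry.second.empty()) {
                continue;
            }
            for (token_sketch t : entry.second) {
                owner_of[t] = entry.first;
            }
        }
    }
};
```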
2015-12-09 12:30:52 +08:00
Asias He
e79c85964f system_keyspace: Flush system.peers in remove_endpoint
1) Start node 1, node 2, node 3
2) Stop  node 3
3) Start node 4 to replace node 3
4) Kill  node 4 (removal of node 3 in system.peers is not flushed to disk)
5) Start node 4 (will load node 3's token and host_id info in bootup)

This causes the

   "Token .* changing ownership from 127.0.0.3 to 127.0.0.4"

messages to be printed again in step 5), which is not expected and fails the dtest

   FAIL: replace_first_boot_test (replace_address_test.TestReplaceAddress)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "scylla-dtest/replace_address_test.py",
   line 220, in replace_first_boot_test
       self.assertEqual(len(movedTokensList), numNodes)
   AssertionError: 512 != 256
2015-12-09 12:30:52 +08:00
Asias He
110a18987e token_metadata: Print Token changing ownership from
Needed by test.
2015-12-09 12:30:52 +08:00
Asias He
906f670a86 gossip: Print node status in handle_major_state_change
It is useful to know the STATUS value when debugging.
2015-12-09 12:29:15 +08:00
Asias He
a0325a5528 gossip: Simplify is_shutdown and friends.
Use the newly added helper get_gossip_status.
2015-12-09 12:29:15 +08:00
Asias He
9d4382c626 gossip: Introduce get_gossip_status
Get value of application_state::STATUS.
2015-12-09 12:29:15 +08:00
Asias He
5a65d8bcdd gossip: Fix endless marking a node down
Commit 56df32ba56 (gossip: Mark node as dead even if already left)
left out a node liveness check.

Fix it up.

Before: (mark a node down multiple times)

[Tue Dec  8 12:16:33 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:16:33 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead
[Tue Dec  8 12:16:34 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:16:34 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead
[Tue Dec  8 12:16:35 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:16:35 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead
[Tue Dec  8 12:16:36 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:16:36 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead

After: (mark a node down only one time)

[Tue Dec  8 12:28:36 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:28:36 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead
2015-12-09 12:29:15 +08:00
Asias He
fa3c84db10 gossip: Kill default constructor for versioned_value
The only reason we needed it is to make
   _application_state[key] = value
work.

With the current default constructor, we increase the version number
needlessly. To fix and to be safe, remove the default constructor
completely.
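
Why operator[] needs the default constructor, and one way to do without
it, can be sketched like this. versioned_value_sketch and set_state are
hypothetical, and std::map with an int key stands in for the real
container.

```cpp
#include <map>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for versioned_value: no default constructor, so a
// version number is never consumed just to create a placeholder object.
struct versioned_value_sketch {
    std::string value;
    int version;
    versioned_value_sketch(std::string v, int ver)
        : value(std::move(v)), version(ver) {}
};

static_assert(!std::is_default_constructible<versioned_value_sketch>::value,
              "operator[] on a map of this type would not compile");

// Replacement for `states[key] = v`: insert when missing, assign otherwise.
void set_state(std::map<int, versioned_value_sketch>& states,
               int key, versioned_value_sketch v) {
    auto it = states.find(key);
    if (it == states.end()) {
        states.emplace(key, std::move(v));
    } else {
        it->second = std::move(v);
    }
}
```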
2015-12-09 12:29:15 +08:00
Asias He
52a5e954f9 gossip: Pass const ref for versioned_value in on_change and before_change
2015-12-09 12:29:15 +08:00
Asias He
3308430343 storage_service: Make before_change and on_change log print more informative
- Make before_change and on_change print the versioned_value
- Print endpoint address first in handle_state_* and
  on_change and friends.
2015-12-09 12:29:15 +08:00
Asias He
ccbd801f40 storage_service: Fix decommissioned nodes are willing to rejoin the cluster if restarted
Backport: CASSANDRA-8801

a53a6ce Decommissioned nodes will not rejoin the cluster.

Tested with:
topology_test.py:TestTopology.decommissioned_node_cant_rejoin_test
2015-12-09 10:43:51 +08:00
Asias He
b3dd2d976a storage_service: Simplify prepare_to_join with seastar thread
2015-12-09 10:43:51 +08:00
Asias He
e9a4d93d1b storage_service: Fix added node not showing up in nodetool in status joining
The get_token_endpoint API should return a map of tokens to endpoints,
including the bootstrapping ones.

Use get_local_storage_service().get_token_to_endpoint_map() for it.

$ nodetool -p 7100 status

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns    Host ID Rack
UN  127.0.0.1  12645      256     ?  eac5b6cf-5fda-4447-8104-a7bf3b773aba  rack1
UN  127.0.0.2  12635      256     ?  2ad1b7df-c8ad-4cbc-b1f1-059121d2f0c7  rack1
UN  127.0.0.3  12624      256     ?  61f82ea7-637d-4083-acc9-567e0c01b490  rack1
UJ  127.0.0.4  ?          256     ?  ced2725e-a5a4-4ac3-86de-e1c66cecfb8d  rack1

Fixes #617
2015-12-09 10:43:51 +08:00
Paweł Dziepak
63bdf52803 tests/lsa: add large allocations test
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 23:56:46 +01:00
Tomasz Grabiec
d68a8b5349 Merge branch 'dev/amnon/index_summary_size_v2' from seastar-dev.git
API for getting sstable index summary memory footprint from Amnon
2015-12-08 20:03:39 +01:00
Paweł Dziepak
73a1213160 scylla-gdb.py: print lsa zones
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
0d66300d43 lsa: add more counters
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
83b004b2fb lsa: avoid fragmenting memory
Originally, lsa allocated each segment independently, which could result
in high memory fragmentation. As a result, many compaction and eviction
passes could be needed to release a sufficiently big contiguous memory
block.

These problems are solved by introduction of segment zones, contiguous
groups of segments. All segments are allocated from zones and the
algorithm tries to keep the number of zones to a minimum. Moreover,
segments can be migrated between zones or inside a zone in order to deal
with fragmentation inside a zone.

Segment zones can be shrunk but cannot grow. Segment pool keeps a tree
containing all zones ordered by their base addresses. This tree is used
only by the memory reclaimer. There is also a list of zones that have
at least one free segment, which is used during allocation.

Segment allocation has no preference as to which segment (and zone)
to choose. Each zone contains a free list of unused segments. If there
are no zones with free segments, a new one is created.

Segment reclamation migrates segments from the zones higher in memory
to the ones at lower addresses. The remaining zones are shrunk until the
requested number of segments is reclaimed.
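
A heavily simplified sketch of the zone idea follows. It is not the lsa
implementation: segment_pool_sketch is a hypothetical class, segment
indices stand in for addresses, segment migration is left out, and
allocation scans zones from the lowest base, which is a choice made here
for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

class segment_pool_sketch {
    // Zones ordered by the index of their first segment (stand-in for the
    // base address). Each flag says whether that segment is in use.
    std::map<std::size_t, std::vector<bool>> _zones;
    std::size_t _zone_size;
    std::size_t _next_base = 0;
public:
    explicit segment_pool_sketch(std::size_t zone_size) : _zone_size(zone_size) {
        assert(zone_size > 0);
    }

    std::size_t zone_count() const { return _zones.size(); }

    // Takes a free segment from an existing zone, or creates a new zone.
    std::size_t allocate() {
        for (auto& z : _zones) {
            for (std::size_t i = 0; i < z.second.size(); ++i) {
                if (!z.second[i]) { z.second[i] = true; return z.first + i; }
            }
        }
        const std::size_t base = _next_base;
        _next_base += _zone_size;
        auto& used = _zones[base];
        used.assign(_zone_size, false);
        used[0] = true;
        return base;
    }

    // Marks a segment as free again; unknown segments are ignored.
    void release(std::size_t segment) {
        auto it = _zones.upper_bound(segment);
        if (it == _zones.begin()) { return; }
        --it;
        const std::size_t i = segment - it->first;
        if (i < it->second.size()) { it->second[i] = false; }
    }

    // Shrinks zones starting from the highest base by dropping free
    // segments at their ends. Returns the number of segments reclaimed.
    std::size_t reclaim(std::size_t wanted) {
        std::size_t reclaimed = 0;
        for (auto it = _zones.rbegin(); it != _zones.rend() && reclaimed < wanted; ++it) {
            auto& used = it->second;
            while (!used.empty() && !used.back() && reclaimed < wanted) {
                used.pop_back();
                ++reclaimed;
            }
        }
        for (auto it = _zones.begin(); it != _zones.end();) {
            if (it->second.empty()) { it = _zones.erase(it); } else { ++it; }
        }
        return reclaimed;
    }
};
```

Without migration, a zone whose last segment is in use cannot shrink at
all, which is the situation the migration step in the commit message is
meant to resolve.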

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
6c4a54fb0b tests: add tests for utils::dynamic_bitset
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
2fb14a10b6 utils: add dynamic_bitset
A dynamic bitset implementation that provides functions to search for
both set and cleared bits in both directions.
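
The interface described above might look roughly like the sketch below.
It uses a linear scan over std::vector<bool>, which is an assumption made
for brevity and says nothing about how utils::dynamic_bitset is
implemented; all names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returned when no matching bit exists.
constexpr std::size_t npos_sketch = static_cast<std::size_t>(-1);

class bitset_sketch {
    std::vector<bool> _bits;
public:
    explicit bitset_sketch(std::size_t n) : _bits(n, false) {}

    void set(std::size_t i) { _bits[i] = true; }
    void clear(std::size_t i) { _bits[i] = false; }
    bool test(std::size_t i) const { return _bits[i]; }

    // First index >= from whose bit equals `value`, or npos_sketch.
    std::size_t find_forward(std::size_t from, bool value) const {
        for (std::size_t i = from; i < _bits.size(); ++i) {
            if (_bits[i] == value) { return i; }
        }
        return npos_sketch;
    }

    // Last index <= from whose bit equals `value`, or npos_sketch.
    std::size_t find_backward(std::size_t from, bool value) const {
        if (_bits.empty()) { return npos_sketch; }
        std::size_t i = std::min(from, _bits.size() - 1);
        for (;;) {
            if (_bits[i] == value) { return i; }
            if (i == 0) { return npos_sketch; }
            --i;
        }
    }
};
```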

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
40dda261f2 lsa: maintain segment to region mapping
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
c4e71bac7f tests/row_cache_alloc_stress: make sure that allocation fails
Currently test case "Testing reading when memory can't be reclaimed."
assumes that the allocation section used by row cache upon entering
will require more free memory than there is available (including evictable memory).
However, the reserves used by allocation section are adjusted
dynamically and depend solely on previous events. In other words there
is no guarantee that the reserve would be increased so much that the
allocation will fail.

The problem is solved by adding another allocation that is guaranteed
to be bigger than all evictable and free memory.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
2e94086a2c lsa: use bi::list to implement segment_stack
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00