Commit Graph

53948 Commits

Author SHA1 Message Date
Paweł Dziepak
273b8daeeb lsa: add no-op default constructor for segment
Zero initialization of segment::data when segment is value initialized
is undesirable.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 16:37:37 +01:00
Paweł Dziepak
e6cf3e915f lsa: add counters for memory used by large objects
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 16:36:27 +01:00
Paweł Dziepak
9396956955 scylla-gdb.py: show lsa statistics and regions
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 16:36:20 +01:00
Paweł Dziepak
aaecf5424c scylla-gdb.py: show free, used and total memory
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 16:36:16 +01:00
Paweł Dziepak
6b113a9a7a lsa: fix eviction of large blobs
LSA memory reclaimer logic assumes that the amount of memory used by LSA
equals: segments_in_use * segment_size. However, LSA is also responsible
for eviction of large objects which do not affect the used segment count,
e.g. a region with no used segments may still use a lot of memory for
large objects. The solution is to switch from measuring memory in used
segments to a used-bytes count that also includes large objects.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 16:29:09 +01:00
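
The accounting change described in 6b113a9a7a can be illustrated with a small Python model. This is only a sketch, not the LSA code: `Region`, `SEGMENT_SIZE`, and `pick_eviction_victim` are hypothetical names, the segment size is an arbitrary assumption, and used bytes are approximated as whole segments plus large-object bytes.

```python
from dataclasses import dataclass

SEGMENT_SIZE = 256 * 1024  # assumed value, for illustration only


@dataclass
class Region:
    segments_in_use: int = 0
    large_object_bytes: int = 0  # memory held by objects stored outside segments

    def used_by_segments(self) -> int:
        # Old accounting: only whole segments are counted.
        return self.segments_in_use * SEGMENT_SIZE

    def used_bytes(self) -> int:
        # New accounting: segments plus large objects.
        return self.used_by_segments() + self.large_object_bytes


def pick_eviction_victim(regions, key):
    """Return the region that `key` reports as using the most memory, or None."""
    candidates = [r for r in regions if key(r) > 0]
    return max(candidates, key=key, default=None)
```

With the segment-based measure, a region that holds only large objects reports zero usage and is never chosen; with the byte-based measure it becomes the first candidate.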
Takuya ASADA
4a8c79ca0e dist: re-initialize RAID on ephemeral disk when stop/restart AMI instance
Since this does not check disk types, it may also re-initialize RAID on EBS when the first block was lost.
But in that situation, re-initializing RAID is probably the only option we have, so this is fine.
Fixes #364.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-24 10:46:10 +02:00
Asias He
3a9200db03 config: Add more documentation for options
For consistent_rangemovement, join_ring, load_ring_state, etc.
2015-11-24 10:07:31 +08:00
Asias He
7ddf8963f5 config: Enable broadcast_rpc_address option
With this patch, start two nodes

node 1:
scylla --rpc-address 127.0.0.1 --broadcast-rpc-address 127.0.0.11

node 2:
scylla --rpc-address 127.0.0.2 --broadcast-rpc-address 127.0.0.12

On node 1:
cqlsh> SELECT rpc_address from system.peers;

 rpc_address
-------------
  127.0.0.12

which means clients should use this address to connect to node 2 for the
CQL and Thrift protocols.
2015-11-24 10:07:31 +08:00
Asias He
33ef58c5c9 utils: Add get_broadcast_rpc_address and set_broadcast_rpc_address helper
2015-11-24 10:07:31 +08:00
Asias He
1e55aa38c1 storage_service: Implement is_replacing
2015-11-24 10:07:29 +08:00
Asias He
644c226d58 config: Enable replace_address and replace_address_first_boot option
It is the same as

   -Dcassandra.replace_address
   -Dcassandra.replace_address_first_boot

in Cassandra.
2015-11-24 10:07:24 +08:00
Asias He
bfe26ea208 config: Enable replace_token option
It is the same as

   -Dcassandra.replace_token

in Cassandra.

Use it as:

   $ scylla --replace-token $token1,$token2,$token3
2015-11-24 10:07:20 +08:00
Asias He
730abbc421 config: Enable replace_node option
It is the same as

   -Dcassandra.replace_node

in Cassandra.

Use it as:

   $ scylla --replace-node $node_uuid
2015-11-24 10:07:16 +08:00
Asias He
2513d6ddbe config: Enable load_ring_state option
It is the same as

   -Dcassandra.load_ring_state

in Cassandra.

Use it as:

   $ scylla --load-ring-state 0

or

   $ scylla --load-ring-state 1
2015-11-24 10:07:12 +08:00
Asias He
6e72e78e0d config: Enable join_ring option
It is the same as

   -Dcassandra.join_ring

in Cassandra.

Use it as:

   $ scylla --join-ring 0

or

   $ scylla --join-ring 1
2015-11-24 10:07:07 +08:00
Asias He
505b3e4936 config: Enable consistent_rangemovement option
It is the same as

  -Dcassandra.consistent.rangemovement

in Cassandra.

Use it as:

  $ scylla --consistent-rangemovement 0

or

  $ scylla --consistent-rangemovement 1
2015-11-24 10:06:54 +08:00
Gleb Natapov
33e5097090 messaging: do not kill live connection needlessly
The messaging service closes the connection in the rpc call continuation on
closed_error, but that code runs for each outstanding rpc call on the
connection. The first continuation may destroy a genuinely closed connection;
the connection is then reopened, and the next continuation, handling the
previous error, kills the now perfectly healthy connection. Fix this by
closing the connection only if it is in an error state.
2015-11-23 20:16:28 +02:00
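
The race fixed in 33e5097090 can be modelled in a few lines of Python. This is a hedged sketch rather than the messaging service code: `Connection`, `MessagingService`, and `on_closed_error` are hypothetical names, and the `error` flag stands in for whatever the real connection exposes as its error state.

```python
class Connection:
    def __init__(self):
        self.error = False   # set when the transport reports a failure
        self.closed = False

    def close(self):
        self.closed = True


class MessagingService:
    def __init__(self):
        self.connection = Connection()

    def get_connection(self):
        # Reopen lazily if the previous connection was closed.
        if self.connection.closed:
            self.connection = Connection()
        return self.connection

    def on_closed_error(self):
        # Runs once per outstanding call that failed with closed_error.
        # Only a connection that is actually in an error state is closed,
        # so a late continuation cannot kill a freshly reopened one.
        if self.connection.error:
            self.connection.close()
```

Without the `error` check, the second invocation of `on_closed_error` would close the replacement connection even though nothing is wrong with it.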
Tomasz Grabiec
cb0b56f75f Merge tag 'empty/v3' from https://github.com/avikivity/scylla
From Avi:

Origin supports a notion of empty values for non-container types; these
are serialized as zero-length blobs.  They are mostly useless and only
retained for compatibility.

The implementation here introduces a wrapper maybe_empty<T>, similar to
optional<T> but oriented towards usually-nonempty usage with implicit
conversion.

There is more work needed for full empty support: fixing up deserializers to
create empty values instead of nulls, and splitting up data_value into
data_value and a data_value_nonnull for the cases that require it.

(I chose maybe_empty<> rather than using optional<data_value> for nullable
data_value both because it requires fewer changes, and because
optional<data_value> introduces a lot of control flow when moving or copying,
which would be mostly useless in most cases).
2015-11-23 16:12:06 +01:00
Calle Wilund
b1a0c4b451 commitlog_tests: Add segment corruption tests
Test entry and chunk corruption.
2015-11-23 15:43:33 +01:00
Calle Wilund
d65adef10c commitlog_tests: test cleanup
This cleanup patch got lost in git-space some time ago. It is however sorely
needed...

* Use cleaner wrapper for creating temp dir + commit log, avoiding
  having to clear and clean in every test, etc.
* Remove assertions based on file system checks, since these are not
  valid due to both the async nature of the CL, and more to the point,
  because of pre-allocation of files and file blocks. Use CL
  counters/methods instead
* Fix some race conditions to ensure tests are safe(r)
* Speed up some tests
2015-11-23 15:42:45 +01:00
Calle Wilund
262f44948d commitlog: Add get_flush_count method (for testing)
2015-11-23 15:42:45 +01:00
Calle Wilund
76b43fbf74 commitlog_replayer: Handle replay data errors as non-fatal
Discern fatal and non-fatal exceptions, and handle data corruption
by adding to stats and reporting it, but continue processing.

Note that "invalid_argument", i.e. attempting to replay origin/old
segments, is still considered fatal, as it is probably better to
signal this strongly to the user/admin.
2015-11-23 15:42:45 +01:00
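
The fatal/non-fatal split described in 76b43fbf74 might look roughly like the following Python model. It is a sketch under stated assumptions: `DataCorruption` and `replay_all` are hypothetical names, and Python's `ValueError` stands in for the `invalid_argument` case mentioned in the commit message.

```python
import logging

log = logging.getLogger("commitlog_replayer")


class DataCorruption(Exception):
    """Hypothetical error raised when a segment contained corrupted data."""

    def __init__(self, skipped_bytes):
        super().__init__(f"{skipped_bytes} bytes skipped")
        self.skipped_bytes = skipped_bytes


def replay_all(segments, replay_one):
    """Replay every segment; return the total number of corrupted bytes seen."""
    corrupt_bytes = 0
    for segment in segments:
        try:
            replay_one(segment)
        except DataCorruption as e:
            # Non-fatal: add to the stats, report it, move on to the next segment.
            corrupt_bytes += e.skipped_bytes
            log.warning("corrupted data in %s: %s", segment, e)
        # ValueError (standing in for invalid_argument) is deliberately not
        # caught: an incompatible segment aborts the whole replay.
    return corrupt_bytes
```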
Calle Wilund
2fe2320490 commitlog: Make reading segments with crc/data errors non-fatal
The parser object now attempts to skip past, or terminate parsing on, corrupted
entries/chunks (as detected by invalid sizes/CRCs). The amount of data
skipped is tracked (as well as we can estimate; pre-allocation
makes it tricky), and at the end of parsing/reporting, if and only if errors
occurred, an exception detailing the failures is thrown (since
subscription has little mechanism to deal with this otherwise).

Thus a caller can decide how to deal with data corruption, but will be
given as many entries as possible.
2015-11-23 15:42:45 +01:00
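
The skip-and-report behaviour described in 2fe2320490 can be sketched as follows. This is not the commitlog parser: the entry layout (a little-endian size and CRC32 header followed by the payload) is an assumption made for the example, and `encode_entry`, `read_segment`, and `SegmentDataCorruption` are hypothetical names.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # assumed layout: payload size, CRC32 of payload


class SegmentDataCorruption(Exception):
    def __init__(self, skipped_bytes):
        super().__init__(f"skipped {skipped_bytes} corrupted bytes")
        self.skipped_bytes = skipped_bytes


def encode_entry(payload: bytes) -> bytes:
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_segment(buf: bytes, on_entry) -> None:
    """Deliver every valid entry to on_entry, then raise if anything was skipped."""
    pos = 0
    skipped = 0
    while pos + HEADER.size <= len(buf):
        size, crc = HEADER.unpack_from(buf, pos)
        start = pos + HEADER.size
        end = start + size
        if end > len(buf):
            # The size field is not credible: stop and count the rest as lost.
            skipped += len(buf) - pos
            break
        payload = buf[start:end]
        if zlib.crc32(payload) != crc:
            skipped += end - pos      # skip past the corrupted entry
        else:
            on_entry(payload)
        pos = end
    if skipped:
        # Raised only after all readable entries have been delivered.
        raise SegmentDataCorruption(skipped)
```

Raising at the end, rather than at the first bad entry, is what lets the caller receive as many entries as possible and still decide how to treat the corruption.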
Avi Kivity
23895ac7f5 types: fix up confusion around empty serialized representation
An empty serialized representation means an empty value, not NULL.

Fix up the confusion by converting incorrect make_null() calls to a new
make_empty(), and removing make_null() in empty-capable types like
bytes_type.

Collections don't support empty serialized representations, so remove
the call there.
2015-11-22 12:20:24 +02:00
Tomasz Grabiec
ae9e0c3d41 storage_proxy: Avoid potential use after move on schema_ptr
Parameter evaluation order is unspecified, so it's possible that the
move of 'schema' into lambda captures would happen before construction of
mutation.
2015-11-22 12:15:04 +02:00
Avi Kivity
0799251a9f Merge "optimize the sstable loading step of boot" from Raphael
"To speed up boot, parallelism was introduced to our code that loads
sstables from a column family, a function was implemented to read
the minimum from an sstable needed to determine whether it belongs to the
current shard, and the buffer size in read_simple is now chosen dynamically
based on the size of the file and the DMA alignment.
The latter is important because the filter file can be considerably
large when the respective sstable (data file) is very large.
Before this patchset, scylla took about 5 minutes to boot with a
data directory of 660GB. After this patchset, scylla took about 20
seconds to boot with the same data directory."
2015-11-22 11:27:34 +02:00
Asias He
23723991ed gossip: Fix STATUS field in nodetool gossipinfo
Before:
   === with c* cluster ===
   $ nodetool -p 7100 gossipinfo

   STATUS:NORMAL,-1139428872328849340

   === with scylla ===
   $ nodetool -p 7100 gossipinfo

   0:NORMAL,8251763528961471825;-9147358554612963965;5334343410266177046

After:
   === with scylla ===
   $ nodetool -p 7100 gossipinfo

   0:NORMAL,8251763528961471825

To align with c*, print one token in the STATUS field.

Refs #508.
2015-11-20 10:57:49 +02:00
Raphael S. Carvalho
a5842642fa sstables: change buf size in read_simple to 128k
Avi says:
"A small buffer size will hurt if we read a large file, but
a large buffer size won't hurt if we read a small file, since
we close it immediately."

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:35:25 -02:00
Raphael S. Carvalho
0f3ccc1143 db: optimize the sstable loading process
Currently, we only determine whether an sstable belongs to the current shard
after loading some of its components into memory. For example, the
filter may be considerably big, and its content is irrelevant to
deciding whether an sstable should be included in a given shard.
Start using the functions previously introduced to optimize the
sstable loading process. add_sstable no longer checks whether an sstable
is relevant to the current shard.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:34:25 -02:00
Raphael S. Carvalho
0053394ec0 sstables: introduce mark_sstable_for_deletion
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:34:24 -02:00
Raphael S. Carvalho
0ce2b7bc8d db: introduce belongs_to_current_shard
Returns true if key range belongs to current shard.
False otherwise.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:34:21 -02:00
Raphael S. Carvalho
f06b72eb18 sstables: introduce function to return sstable key range
Provides a function that will return sstable key range reading only
the summary component.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:34:19 -02:00
Raphael S. Carvalho
966e8c7144 db: introduce parallelism to sstable loading
Boot may be slow because the function that loads sstables does so
serially instead of in parallel. In the callback supplied to
lister::scan_dir, let's push the future returned by probe_file
(the function that loads an sstable) into a vector of futures and wait
for all of them at the end.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:34:11 -02:00
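
The pattern in 966e8c7144, collecting one future per file and waiting for all of them at the end, can be shown with Python's standard library. This is an analogy only: Seastar futures are not thread-based, and `load_directory` and its `max_workers` parameter are hypothetical names. `probe_file` is passed in as a callable, so nothing is assumed about the real function.

```python
from concurrent.futures import ThreadPoolExecutor


def load_directory(paths, probe_file, max_workers=8):
    """Start probe_file for every path, then wait for all results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submitting does not block, so all loads are in flight together.
        futures = [pool.submit(probe_file, path) for path in paths]
        # Waiting happens only after everything has been submitted.
        return [future.result() for future in futures]
```

A serial loop would instead wait for each load before starting the next, which is the behaviour the commit removes.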
Takuya ASADA
83c8b3e433 dist: support Ubuntu 15.10
We cannot share some dependency package names between 14.04 and 15.10, so we need to add ifdefs.
Not tested on other versions of Ubuntu.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-19 10:57:25 +02:00
Tomasz Grabiec
53e842aaf7 scylla-gdb.py: Fix scylla column_families command
2015-11-19 10:44:00 +02:00
Avi Kivity
0b91b643ba types: empty value support for non-container types
Origin supports (https://issues.apache.org/jira/browse/CASSANDRA-5648) "empty"
values even for non-container types such as int.  Use maybe_empty<> to
encapsulate abstract_type::native_type, adding an empty flag if needed.
2015-11-18 18:38:38 +02:00
Avi Kivity
7257f72fbf types: introduce maybe_empty<T> type alias
- T for container types (that can naturally be empty)
- emptyable<T> otherwise (adding that property artificially)
2015-11-18 15:25:24 +02:00
Avi Kivity
58d3a3e138 types: introduce emptyable<> template
Similar to optional<>, with the following differences:
 - decays back to the encapsulated type, with an emptiness check;
   this reflects the expectation that the value will rarely be empty
 - avoids conditionals during copy/move (and requires a default constructor),
   again with the same expectation.
2015-11-18 15:25:22 +02:00
Gleb Natapov
0870caaea1 cql transport: catch all exceptions
Not all exceptions inherit from std::exception
(std::nested_exception, for instance), so catch and log all of them.
2015-11-18 15:17:43 +02:00
Asias He
242e5ea291 streaming: Ignore remote no_such_column_family for stream_transfer_task
When we start sending mutations for cf_id to a remote node, the remote node
might no longer have the cf_id, for instance because the cf was
dropped.

We should not fail the streaming if this happens: since the cf no longer
exists, there is no point in streaming it.

Fixes #566
2015-11-18 15:12:23 +02:00
Asias He
3816e35d11 storage_service: Detect other bootstrapping/leaving/moving nodes during bootstrap
2015-11-18 15:11:56 +02:00
Avi Kivity
6390bc3121 README: instructions for contributing
2015-11-18 15:10:37 +02:00
Asias He
3b52033371 gossip: Favor newly added node in do_gossip_to_live_member
When a new node joins a cluster, it starts a gossip round with the seed
node. However, within this round, the seed node will not tell the new
node anything it knows about other nodes in the cluster, because the
digest in the gossip SYN message contains only the new node itself and
no other nodes. The seed node picks randomly from the live nodes,
including the newly added node, in do_gossip_to_live_member to start a
gossip round. If the new node is "lucky", the seed node talks to it very
soon and tells it everything it knows about the cluster, so the
new node marks the seed node alive and considers the seed node seen.
If there is a considerably large number of live nodes, it might take a
long time before the seed node picks the new node and talks to it.

In the bootstrap code, storage_service::bootstrap checks whether we have seen
any nodes after sleeping for RING_DELAY milliseconds and throws "Unable to
contact any seeds!" if not, so the node fails to bootstrap.

To help the seed node talk to the new node sooner, we favor the new node in
do_gossip_to_live_member.
2015-11-18 15:00:37 +02:00
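
One way to favor a newly added node when choosing a gossip target is sketched below. The commit message does not say how 3b52033371 implements this, so the queue of just-added nodes is purely an assumption, and `Gossiper`, `add_live`, and `pick_live_member` are hypothetical names.

```python
import random
from collections import deque


class Gossiper:
    def __init__(self, rng=None):
        self.live = set()
        self.just_added = deque()  # live nodes we have not gossiped to yet
        self.rng = rng or random.Random()

    def add_live(self, node):
        if node not in self.live:
            self.live.add(node)
            self.just_added.append(node)

    def pick_live_member(self):
        # Favor a newly added node so that it learns about the cluster quickly.
        while self.just_added:
            node = self.just_added.popleft()
            if node in self.live:
                return node
        if not self.live:
            return None
        # Otherwise fall back to a uniformly random live node.
        return self.rng.choice(sorted(self.live))
```

Under this assumption the new node is contacted in the very next round, however many live nodes there are, instead of waiting to be drawn at random.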
Amnon Heiman
374414ffd0 API: failure_detector modify the get_all_endpoint_states
In origin, get_all_endpoint_states performs all the information
formatting and returns a string.

This is not a good API approach. This patch replaces the implementation
so that the API returns an array of values and the JMX does the
formatting.

This is a better API and will make it simpler to stay in
sync with origin output in the future.

This patch is part of #508

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-18 14:59:09 +02:00
Avi Kivity
17f6dc3671 Merge seastar upstream
* seastar 95ddb8e...84cb6df (2):
  > rpc: do not convert EOF into exception
  > reactor: remove debug output in command line option validation
2015-11-18 11:20:27 +02:00
Takuya ASADA
16cd5892f7 dist: setup rps on scylla_prepare, not on scylla_run
All preparation for running scylla should be done in scylla_prepare.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-18 11:20:17 +02:00
Asias He
269ea7f81b storage_service: Enable is_ready_for_bootstrap in join_token_ring
The goal is to make sure our schema matches with other nodes in the
cluster.
2015-11-18 10:46:40 +02:00
Asias He
bb1470f0d4 migration_manager: Introduce is_ready_for_bootstrap
This compares the local schema version with those of the other nodes in the
cluster. Returns true if all of them match.
2015-11-18 10:46:06 +02:00
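
The check described in bb1470f0d4 reduces to comparing one value against all the others. A minimal Python sketch follows, assuming the peers' schema versions are available as a mapping from node to version; that parameter shape is hypothetical, and only the function name comes from the commit title.

```python
def is_ready_for_bootstrap(local_version, peer_versions):
    """Return True if every peer reports the same schema version as we hold.

    peer_versions maps a node identifier to its schema version, or to None
    when the version is not known yet (which counts as a mismatch).
    """
    return all(version == local_version for version in peer_versions.values())
```

For example, `is_ready_for_bootstrap("v1", {"10.0.0.2": "v1", "10.0.0.3": "v2"})` is false because one peer is still on a different schema version.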
Avi Kivity
ba859acb3b big_decimal: add default constructor
Arithmetic types should have a default constructor, and anyway the
following patch wants it.
2015-11-18 10:36:03 +02:00
Takuya ASADA
f0a6c33b6d dist: use /var/lib/scylla instead of /data on ami
Fixes #551.
Change the mountpoint to /var/lib/scylla and copy conf/ onto it.
Note: conf/ needs to be replaced with a symlink to /etc/scylla once the new rpm is uploaded to the yum repository.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Signed-off-by: Pekka Enberg <penberg@iki.fi>
2015-11-18 10:10:48 +02:00