11716 Commits

Author SHA1 Message Date
Amnon Heiman
9ea3ffe527 idl-compiler: Add optional support
This patch adds optional writer support: an optional field can be either
skipped or set.

For a vector of optionals, a write_empty method adds 1 to the vector
count and marks the optional as false.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-02-26 12:25:08 +01:00
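The write_empty behaviour described above can be sketched roughly as follows. This is a minimal Python illustration, not the generated C++ writer: the class name, the 32-bit element type, and the byte layout (a count prefix, then one presence flag per element) are all assumptions made for the example.

```python
import struct


class OptionalVectorWriter:
    """Hypothetical writer for a vector of optional 32-bit integers."""

    def __init__(self):
        self.count = 0           # number of elements written so far
        self.body = bytearray()  # serialized elements

    def write(self, value):
        # Engaged optional: presence flag (true) followed by the value.
        self.body += struct.pack("<?i", True, value)
        self.count += 1

    def write_empty(self):
        # Disengaged optional: only the presence flag (false) is written,
        # but the element still adds 1 to the vector count.
        self.body += struct.pack("<?", False)
        self.count += 1

    def finish(self):
        # Prefix the serialized elements with the element count.
        return struct.pack("<I", self.count) + bytes(self.body)
```

A reader of this assumed layout would see two elements after write(7) followed by write_empty(): one set and one marked as absent.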
Asias He
fd5f3cff47 streaming: Fix stream_manager progress api
For each stream_session, we pretend we are sending/receiving one file,
to make it compatible with nodetool. For receiving_files, the file name
is "rxnofile". For sending_files, the file name is "txnofile".

stream_manager::update_all_progress_info is introduced to update the
progress info of all the stream_sessions in the node. We need this
because streaming mutations are received on all the cores, but the
stream_session object is only on one of the cores. It adds overhead if
we update progress info in stream_session object whenever we receive a
streaming mutation. So, what we do now is when we really need the
progress info, we update the progress info in stream_session object.

With http://127.0.0.$i:10000/stream_manager/, the output looks like the
following when decommissioning node 3 in a 3-node cluster.

=========== GET NODE 1
[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"receiving_files": [{"value": {"direction":
"IN", "file_name": "rxnofile", "session_index": 0, "total_bytes":
16876296, "peer": "127.0.0.3", "current_bytes": 16876296}, "key":
"rxnofile"}], "receiving_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.3", "peer": "127.0.0.3"}]}]

=========== GET NODE 2

[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"receiving_files": [{"value": {"direction":
"IN", "file_name": "rxnofile", "session_index": 0, "total_bytes":
16755552, "peer": "127.0.0.3", "current_bytes": 16755552}, "key":
"rxnofile"}], "receiving_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.3", "peer": "127.0.0.3"}]}]

=========== GET NODE 3
[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"sending_files": [{"value": {"direction":
"OUT", "file_name": "txnofile", "session_index": 0, "total_bytes":
16876296, "peer": "127.0.0.1", "current_bytes": 16876296}, "key":
"txnofile"}], "sending_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.1", "peer":
"127.0.0.1"},{"sending_files": [{"value": {"direction": "OUT",
"file_name": "txnofile", "session_index": 0, "total_bytes": 16755552,
"peer": "127.0.0.2", "current_bytes": 16755552}, "key": "txnofile"}],
"sending_summaries": [{"files": 1, "total_size": 0, "cf_id":
"869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0, "state":
"PREPARING", "connecting": "127.0.0.2", "peer": "127.0.0.2"}]}]
2016-02-26 17:38:37 +08:00
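The update-on-demand idea behind stream_manager::update_all_progress_info can be sketched as below. This is a simplified Python model, not the Scylla implementation: shards are modelled as list slots, and the class and method names are hypothetical.

```python
class StreamSession:
    """Hypothetical model of a session whose data arrives on several shards."""

    def __init__(self, shard_count):
        # Each shard only touches its own slot, so receiving stays cheap.
        self.per_shard_bytes = [0] * shard_count
        # Aggregated view; stale until update_progress_info() is called.
        self.current_bytes = 0

    def on_mutation_received(self, shard, size):
        # Hot path: no cross-shard update of the session object.
        self.per_shard_bytes[shard] += size

    def update_progress_info(self):
        # Cold path: fold the per-shard counters into the session
        # only when the progress info is actually requested.
        self.current_bytes = sum(self.per_shard_bytes)
        return self.current_bytes
```

The trade-off is that current_bytes is only as fresh as the last explicit update, which is acceptable when the value is read rarely (for example by an API request).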
Asias He
37f52d632f streaming: Remove unused progress() function 2016-02-26 17:38:37 +08:00
Asias He
8060b97d67 streaming: Log number of bytes sent and recevied when stream_plan completes
It is useful for test code to verify number of bytes sent/received.

It looks like this in the log:

/tmp/out1:INFO  [shard 0] stream_session - \
[Stream #1f3e23f0-db9e-11e5-9cfb-000000000000] bytes_sent = 0, bytes_received = 15760704

/tmp/out2:INFO  [shard 0] stream_session - \
[Stream #1f3e23f0-db9e-11e5-9cfb-000000000000] bytes_sent = 0, bytes_received = 18203964

/tmp/out3:INFO  [shard 0] stream_session - \
[Stream #1f3e23f0-db9e-11e5-9cfb-000000000000] bytes_sent = 33964668, bytes_received = 0
2016-02-26 17:38:37 +08:00
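A test could check these lines along the following lines. This is only a sketch: the regular expression and the function name are assumptions, and the sample lines are the three log lines quoted above with the file prefix and line continuation removed.

```python
import re

# Matches the "[Stream #<plan id>] bytes_sent = N, bytes_received = M" part.
LINE_RE = re.compile(
    r"\[Stream #(?P<plan_id>[0-9a-f-]+)\] "
    r"bytes_sent = (?P<sent>\d+), bytes_received = (?P<received>\d+)"
)


def parse_stream_totals(lines):
    """Return (plan_id, bytes_sent, bytes_received) for every matching line."""
    totals = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            totals.append(
                (m.group("plan_id"), int(m.group("sent")), int(m.group("received")))
            )
    return totals


PLAN = "1f3e23f0-db9e-11e5-9cfb-000000000000"
LOG = [
    f"INFO  [shard 0] stream_session - [Stream #{PLAN}] bytes_sent = 0, bytes_received = 15760704",
    f"INFO  [shard 0] stream_session - [Stream #{PLAN}] bytes_sent = 0, bytes_received = 18203964",
    f"INFO  [shard 0] stream_session - [Stream #{PLAN}] bytes_sent = 33964668, bytes_received = 0",
]
```

In the sample above, the bytes sent by node 3 equal the sum of the bytes received by nodes 1 and 2, which is the kind of invariant such a test can assert.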
Asias He
9dede89e07 streaming: Add get_progress_on_all_shards for plan_id
Get stream_bytes for a specific plan_id.
2016-02-26 17:38:37 +08:00
Tomasz Grabiec
97558b2cfe idl-compiler: Put serializers inside template class specializations
The problem is that generic functions (e.g. skip()) which call
deserialize() overloads based on their template parameter only see
deserialize() overloads which were declared at the time skip() was
declared and not those which are available at the time of
instantiation. This forces all serializers to be declared before
serialization_visitors.hh is first included. Serializers included
later will fail to compile. This is hard to guarantee when
serializers are included from headers.

Template class specialization lookup doesn't suffer from this
limitation. We can use that to solve the problem. The IDL compiler
will now generate template class specializations with read/write
static methods. In addition to that, the default serialize() and
deserialize() implementations delegate to the serializer<>
specialization so that the API and existing code don't have to change.

Message-Id: <1456423066-6979-1-git-send-email-tgrabiec@scylladb.com>
2016-02-25 20:00:49 +02:00
Takuya ASADA
aa3f6ad462 dist: add scyllatop on .rpm/.deb
Fixes #933

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456420768-15921-1-git-send-email-syuu@scylladb.com>
2016-02-25 19:24:11 +02:00
Avi Kivity
a74f68eeb2 Merge "Properly tag readers" from Glauber
"Gleb has recently noted that our query reads are not even being registered
with the I/O queue.

Investigating what is happening, I found out that the priority that
make_reader receives was not being properly passed downwards to the SSTable
reader. The reader code is also used by the compaction class, and that one is fine.
But the CQL reads are not.

On top of that, there are also some other places where the tag was not properly
propagated, and those are patched."
2016-02-25 18:35:58 +02:00
Raphael S. Carvalho
fc4cbcde72 Revert "Revert "database: Fix use and assumptions about pending compations""
This reverts commit a4d92750eb.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <8a405e7c1daf94c4d70d8084f59ce7205d56fe52.1456415398.git.raphaelsc@scylladb.com>
2016-02-25 18:02:01 +02:00
Raphael S. Carvalho
7f0371129c tests: sstable_test: submit compaction request through column family
That's needed for reverted commit 9586793c to work. It's also the
correct thing to do, i.e. column family submits itself to manager.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <2a1d141ad929c1957933f57412083dd52af0390b.1456415398.git.raphaelsc@scylladb.com>
2016-02-25 18:02:00 +02:00
Avi Kivity
c269527f42 Merge "Get rid of assert in gossip and storage_service" from Asias
"Make the error handling more robust."
2016-02-25 17:38:21 +02:00
Pekka Enberg
a4d92750eb Revert "database: Fix use and assumptions about pending compations"
This reverts commit 9586793c70. It breaks
sstable_test as follows:

  [penberg@nero scylla]$ build/release/tests/sstable_test --smp 1
  Running 81 test cases...
  INFO  [shard 0] compaction_manager - Asked to stop
  INFO  [shard 0] compaction_manager - Stopped
  sstable_test: database.cc:878: future<> column_family::run_compaction(sstables::compaction_descriptor): Assertion `_stats.pending_compactions > 0' failed.
  unknown location(0): fatal error in "compaction_manager_test": signal: SIGABRT (application abort requested)
  tests/sstable_datafile_test.cc(1023): last checkpoint
2016-02-25 15:28:06 +02:00
Asias He
32eaaecf36 gossip: Get rid of assert
Log the error and throw the exception, instead of aborting the whole
process. This makes the code more robust.
2016-02-25 21:19:52 +08:00
Asias He
699fd25467 storage_service: Get rid of assert
We can recover from most of the errors. Log the error and throw the
exception, instead of aborting the whole process. This makes the code
more robust.
2016-02-25 21:19:52 +08:00
Asias He
59564591d5 storage_service: Use get_gossip_status to get status
The helper was introduced recently; use it instead of open-coding it.
2016-02-25 21:19:52 +08:00
Pekka Enberg
8e2c924de3 cql3: Fix quadratic behavior in update_statement::parsed_insert::prepare_internal()
This fixes a quadratic search for duplicate columns in prepare_internal().

Refs #822.

Message-Id: <1456405104-16482-1-git-send-email-penberg@scylladb.com>
2016-02-25 15:06:56 +02:00
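The commit message does not say how the quadratic search was removed. One common approach, shown here purely as an assumption, is to track already-seen column names in a hash set. The sketch below contrasts the two shapes in Python; the function names are hypothetical.

```python
def find_duplicate_quadratic(names):
    """O(n^2): each name is compared against all names before it."""
    for i, name in enumerate(names):
        if name in names[:i]:
            return name
    return None


def find_duplicate_linear(names):
    """O(n) on average: a set remembers the names seen so far."""
    seen = set()
    for name in names:
        if name in seen:
            return name
        seen.add(name)
    return None
```

Both functions report the first name whose repeat is encountered, so the set-based version can replace the nested search without changing which duplicate is reported.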
Yoav Kleinberger
872079d999 tools/scyllatop: correct mistake in help text
Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <01844d90f2d942a051d128b03ae12578ac0bb69c.1456324697.git.yoav@scylladb.com>
2016-02-25 12:49:48 +02:00
Asias He
94cb7f22d4 gossip: Make add_local_application_state safe to call on any cpu
add_local_application_state is used in various places. Before this
patch, it could only be called on cpu zero. To make it safer to use, use
invoke_on() to forward the code to run on cpu zero, so that the caller can
call it on any cpu.

Refs: #795
Message-Id: <d69b81c5561622078dbe887d87209c4ea2e3bf46.1456315043.git.asias@scylladb.com>
2016-02-25 12:45:54 +02:00
Asias He
4e931c2453 gossip: Log the error when fails to add local application state
Gleb saw once:

scylla: gms/gossiper.cc:1393:
gms::gossiper::add_local_application_state(gms::application_state,
gms::versioned_value):: mutable: Assertion
`endpoint_state_map.count(ep_addr)' failed.

The assert fires because we cannot find the node's own entry in
endpoint_state_map. I cannot find any place where we could call
add_local_application_state before we call gossiper::start_gossiping(),
which inserts the broadcast address into endpoint_state_map.

I cannot reproduce the issue, so let's log the error so we can narrow down
which application state triggered the assert.

Refs: #795
Message-Id: <f4433be0a0d4f23470a5e24e528afdb67b74c7ef.1456315043.git.asias@scylladb.com>
2016-02-25 12:45:17 +02:00
Takuya ASADA
b250a3b116 dist: Add collectd configuration support on .rpm/.deb
Depends on collectd, add /etc/collectd.d/scylla.conf on scylla-server package installation.
Fixes #946

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456336200-11876-1-git-send-email-syuu@scylladb.com>
2016-02-25 10:35:47 +02:00
Takuya ASADA
28dd202613 scyllatop: add --logfile argument to specify path to log file
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456333116-7389-2-git-send-email-syuu@scylladb.com>
2016-02-25 10:33:41 +02:00
Takuya ASADA
af3a8ead21 scyllatop: output error message both on log file and stdout
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456333116-7389-1-git-send-email-syuu@scylladb.com>
2016-02-25 10:33:40 +02:00
Calle Wilund
9586793c70 database: Fix use and assumptions about pending compations
Fixes #934 - faulty assert in discard_sstables

run_with_compaction_disabled clears out a CF from the compaction
manager queue. discard_sstables wants to assert on this, but looks
at the wrong counters.

pending_compactions is an indicator on how much interested parties
want a CF compacted (again and again). It should not be considered
an indicator of compactions actually being done.

This modifies the usage slightly so that:
1.) The counter is always incremented, even if compaction is disallowed.
    The counter's value at the end of run_with_compaction_disabled is then
    instead used as an indicator as to whether a compaction should be
    re-triggered. (If compactions finished, it will be zero)
2.) Document the use and purpose of the pending counter, and add
    method to re-add CF to compaction for r_w_c_d above.
3.) discard_sstables now asserts on the right things.

Message-Id: <1456332824-23349-1-git-send-email-calle@scylladb.com>
2016-02-25 08:57:04 +02:00
Glauber Costa
6f1d0dce00 mutation_query: attach the query priority read when reading mutations
We call a mutation source during the query path without any consideration
for attaching a priority. This is incorrect, and queries called through this
facility will end up in the default class.

Fix this by attaching the query priority class here.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-02-24 18:00:34 -05:00
Glauber Costa
336babfcb8 database: add a priority class to a few SSTable readers
Not all SSTable readers will end up getting the right tag for a priority
class. In particular, the range reader, also used for the memtables,
completely ignores any priority class.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-02-24 18:00:34 -05:00
Glauber Costa
2816bc6fed database: use a reference instead of a pointer to store the priority classes
We will always initialize it, so don't use a pointer.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-02-24 18:00:34 -05:00
Glauber Costa
80ab41a715 memtable reader: also include a priority class
There are situations when a memtable is already flushed but the memtable
reader will continue to be in place, relaying reads to the underlying
table.

For that reason, the "memtables don't need a priority class" argument
gets obviously broken. We need to pass a priority class for its reader
as well.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-02-24 18:00:34 -05:00
Calle Wilund
590ec1674b truncate: Require timestamp join-function to ensure equal values
Fixes #937

In fixing #884, truncation not truncating memtables properly,
time stamping in truncate was made shard-local. This however
breaks the snapshot logic, since for all shards in a truncate,
the sstables should snapshot to the same location.

This patch adds a required function argument to truncate (and
by extension drop_column_family) that produces a time stamp in
a "join" fashion (i.e. same on all shards), and utilizes the
joinpoint type in caller to do so.

Message-Id: <1456332856-23395-2-git-send-email-calle@scylladb.com>
2016-02-24 18:59:31 +02:00
Calle Wilund
43ea1f5945 utils::jointpoint: Helper type to generate a singular value for all shards
Lets operations working on all shards "join" and acquire
the same value of something, with that value being based on
whenever all shards reach the join.

Obvious use case: time stamp after one set of per-shard ops, but
before final ones.

The generation of the value is guaranteed to happen on the shards
that created the join point.

Based on the join-ops in CF::snapshot, but abstracted and made
caller responsibility. Primary use case is to help deal with
the join-problem of truncation.

Message-Id: <1456332856-23395-1-git-send-email-calle@scylladb.com>
2016-02-24 18:59:25 +02:00
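The join idea can be illustrated with threads standing in for shards. This is only a Python analogue built on threading.Barrier, not the utils::joinpoint type itself: the class and function names are hypothetical, and unlike the real helper the value here is generated on whichever thread the barrier picks rather than on the shard that created the join point.

```python
import itertools
import threading


class JoinPoint:
    """Hypothetical join point: every participant gets the same value."""

    def __init__(self, shard_count, make_value):
        self._value = None
        self._make_value = make_value
        # The barrier action runs once, after the last participant has
        # arrived and before any of them is released.
        self._barrier = threading.Barrier(shard_count, action=self._generate)

    def _generate(self):
        self._value = self._make_value()

    def value(self):
        # Block until all participants reach the join, then share the value.
        self._barrier.wait()
        return self._value


def run_shards(shard_count):
    """Run shard_count threads through one join and collect what they saw."""
    counter = itertools.count(1)
    join = JoinPoint(shard_count, lambda: next(counter))
    results = []

    def shard_work():
        # Per-shard work before the join would go here.
        results.append(join.value())

    threads = [threading.Thread(target=shard_work) for _ in range(shard_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In the truncate use case the generator would produce a time stamp instead of a counter value, so that all shards snapshot to the same location.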
Yoav Kleinberger
c3ce9e53cb tools/scyllatop: support glob patterns to specifiy metrics
Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <42f84cdeeb75c3719230028a13a1dd8499673d4c.1456319441.git.yoav@scylladb.com>
2016-02-24 15:35:45 +02:00
Raphael S. Carvalho
bb48f1b06c sstables: use system clock's epoch for timestamp in compaction history
As pointed out by Tomek, the type of column used is timestamp, therefore
system's clock epoch (db_clock) should be used instead.

Fixes #817.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <f80f9f411d673cf2d653e193ccb8ebaa36bc891b.1456317766.git.raphaelsc@scylladb.com>
2016-02-24 14:49:21 +02:00
Pekka Enberg
dfcc48d82a transport: Add result metadata to PREPARED message
The gocql driver assumes that there's a result metadata section in the
PREPARED message. Technically, Scylla is not at fault here as the CQL
specification explicitly states in Section 4.2.5.4. ("Prepared") that the
section may be empty:

   - <result_metadata> is defined exactly as <metadata> but correspond to the
      metadata for the resultSet that execute this query will yield. Note that
      <result_metadata> may be empty (have the No_metadata flag and 0 columns, See
      section 4.2.5.2) and will be for any query that is not a Select. There is
      in fact never a guarantee that this will non-empty so client should protect
      themselves accordingly. The presence of this information is an

However, Cassandra always populates the section, so let's do that as well.

Fixes #912.

Message-Id: <1456317082-31688-1-git-send-email-penberg@scylladb.com>
2016-02-24 14:43:24 +02:00
Avi Kivity
fedba9d6cd Merge "reduce gossip round latency" from Asias
"This series makes gossip message handling async to reduce gossip round
latency. The commit log of patch 3 explains the issue in detail.

Refs: #900"
2016-02-24 13:44:06 +02:00
Avi Kivity
b42a32efc7 Update scylla-ami submodule
* dist/ami/files/scylla-ami 398b1aa...d4a0e18 (3):
  > Sort service running order (scylla-ami-setup.service -> scylla-io-setup.service -> scylla-server.service)
  > Drop --ami and --disk-count parameters
  > dist: pass the number of disks to set io params
2016-02-24 13:38:05 +02:00
Avi Kivity
cda29c0324 Merge seastar upstream
* seastar 8c560f2...769cb8b (4):
  > temporary_buffer: make operator bool explicit (and const)
  > iotune: use SEASTAR_IO instead of SCYLLA_IO
  > iotune: add --format option, to use EnvironmentFile on systemd
  > sstring: add data() methods
2016-02-24 13:38:05 +02:00
Avi Kivity
efabb1a1d8 commitlog: fix buffer size calculation
We were adding bool(buffer), instead of buffer.size(); exposed by making
temporary_buffer::operator bool explicit.
2016-02-24 13:38:05 +02:00
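The same class of bug can be reproduced in Python, where bool is an integer subtype and silently takes part in arithmetic. The sketch below is an analogue with hypothetical function names, not the commitlog code.

```python
def total_size_buggy(buffers):
    # bool(buf) is True (1) for a non-empty buffer and False (0) otherwise,
    # so this counts non-empty buffers instead of summing their sizes.
    return sum(bool(buf) for buf in buffers)


def total_size_fixed(buffers):
    # Sum the actual number of bytes in each buffer.
    return sum(len(buf) for buf in buffers)
```

For the buffers b"abcd", b"" and b"xy", the buggy version yields 2 while the fixed version yields 6. Making the conversion explicit, as the seastar change did for temporary_buffer, turns this kind of mistake into a compile error in C++.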
Asias He
697b16414a gossip: Make gossip message handling async
In each gossip round, i.e., gossiper::run(), we do:

1) send syn message
2)                           peer node: receive syn message, send back ack message
3) process ack message in handle_ack_msg
   apply_state_locally
     mark_alive
       send_gossip_echo
     handle_major_state_change
       on_restart
       mark_alive
         send_gossip_echo
       mark_dead
         on_dead
       on_join
     apply_new_states
       do_on_change_notifications
          on_change
4) send back ack2 message
5)                           peer node: process ack2 message
                             apply_state_locally

At the moment, syn is a "wait" message that times out in 3 seconds. In
step 3, all the registered gossip callbacks are called, which might take
a significant amount of time to complete.

In order to reduce the gossip round latency, we make syn "no-wait" and
do not run handle_ack_msg inside gossiper::run(). As a result, we
will not get an ack message as the return value of a syn message any
more, so a GOSSIP_DIGEST_ACK message verb is introduced.

With this patch, the gossip message exchange is now async. It is useful
when some nodes are down in the cluster. The gossip round, which is
supposed to run every second, is no longer delayed by 3*n seconds
(n = 1-3, since it talks to 1-3 peer nodes in each gossip round) or even
longer (considering the time to run gossip callbacks).

Later, we can talk to the 1-3 peer nodes in parallel to reduce
latency even more.

Refs: #900
2016-02-24 19:33:39 +08:00
Asias He
63df54b368 messaging_service: Add GOSSIP_DIGEST_ACK
We will soon switch to use no-wait message for gossip. GOSSIP_DIGEST_SYN
will no longer return GOSSIP_DIGEST_ACK message. So we need a standalone
verb for GOSSIP_DIGEST_ACK.
2016-02-24 19:31:14 +08:00
Asias He
022c7e50a1 failure_detector: Fix false alarm of "Not marking nodes down due to local pause of"
The problem is that we initialize _last_interpret when the
failure_detector object is constructed. When interpret() runs for the
first time, the _last_interpret value is not the last time we ran
interpret() but the time we initialized the failure_detector object.

Fix by initializing _last_interpret inside interpret().

[Thu Feb 18 02:40:04 2016] INFO  [shard 0] storage_service - Node 127.0.0.1 state jump to normal
[Thu Feb 18 02:40:04 2016] INFO  [shard 0] storage_service - NORMAL: node is now in normal status
[Thu Feb 18 02:40:04 2016] INFO  [shard 0] gossip - Waiting for gossip to settle before accepting client requests...
[Thu Feb 18 02:40:12 2016] INFO  [shard 0] gossip - No gossip backlog; proceeding
Starting listening for CQL clients on 127.0.0.1:9042...
[Thu Feb 18 02:40:12 2016] INFO  [shard 0] gossip - Node 127.0.0.2 is now part of the cluster
[Thu Feb 18 02:40:12 2016] INFO  [shard 0] gossip - InetAddress 127.0.0.2 is now UP
[Thu Feb 18 02:40:13 2016] INFO  [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2
[Thu Feb 18 02:40:13 2016] WARN  [shard 0] failure_detector - Not marking nodes down due to local pause of 9091 > 5000 (milliseconds)
2016-02-24 19:31:14 +08:00
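The false alarm and the fix can be modelled as follows. This is a minimal Python sketch with explicit millisecond timestamps: the class shape and the constructor flag are hypothetical, and the 5000 ms threshold is taken from the warning in the log above.

```python
class FailureDetector:
    """Hypothetical model of the local-pause check in interpret()."""

    MAX_LOCAL_PAUSE_MS = 5000  # threshold shown in the log message

    def __init__(self, now_ms, init_at_construction):
        # Buggy variant: construction time is recorded as if it were the
        # time of the last interpret() call.
        self._last_interpret = now_ms if init_at_construction else None

    def interpret(self, now_ms):
        """Return True if a local pause is (rightly or wrongly) detected."""
        if self._last_interpret is None:
            # Fixed variant: the first call initializes the timestamp.
            self._last_interpret = now_ms
        pause = now_ms - self._last_interpret
        self._last_interpret = now_ms
        return pause > self.MAX_LOCAL_PAUSE_MS
```

With the buggy variant, a first interpret() call 9091 ms after construction reports a pause although none occurred. The fixed variant reports nothing on the first call and still detects a real gap between later calls.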
Avi Kivity
e993102cb5 Merge "introduce scylla-io-setup.service" from Takuya
"Add scylla-io-setup.service to configure max-io-requests and num-io-queues on first boot.
Moved the SCYLLA_IO configuration code from scylla_sysconfig_setup to scylla-io-setup.service and reverted the commits related to it.
In scylla-io-setup.service, autodetect Amazon EC2 instead of using the AMI variable in sysconfig."
2016-02-24 10:13:23 +02:00
Takuya ASADA
c4035a0a13 dist: add comment about /etc/scylla.d/io.conf on sysconfig 2016-02-24 04:00:52 +09:00
Takuya ASADA
0f20abb365 Revert "dist: introduce SCYLLA_IO"
This reverts commit 5cae2560a3.

Conflicts:
	dist/common/sysconfig/scylla-server
2016-02-24 03:46:14 +09:00
Takuya ASADA
b79a1a77da Revert "dist: update SCYLLA_IO with params for AMI"
This reverts commit 5494135ddd.

Conflicts:
	dist/common/scripts/scylla_sysconfig_setup
2016-02-24 03:45:11 +09:00
Takuya ASADA
643beefc8c Revert "Revert "dist: remove AMI entry from sysconfig, since there is no script refering it""
This reverts commit 21e6720988.
2016-02-24 03:33:50 +09:00
Takuya ASADA
66c5feb9e9 Revert "dist: align ami option with others (-a --> --ami)"
This reverts commit 312f1c9d98.
2016-02-24 03:33:41 +09:00
Takuya ASADA
a9926f1cea dist: introduce scylla-io-setup.service to setup io parameters on first startup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-24 03:33:03 +09:00
Tomasz Grabiec
79bcb5a616 tests: Fix build of memory_footprint 2016-02-23 19:12:54 +01:00
Amnon Heiman
f461ebc411 idl-compiler: Add pos and rollback to serialize vector
This adds the ability to store a position of a serialized vector and to
rollback to that stored position afterwards.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1456041750-1505-3-git-send-email-amnon@scylladb.com>
2016-02-23 17:49:51 +01:00
Amnon Heiman
ea97e07ed7 serialization_visitors: Adding vector_position struct
While serializing a vector, it is sometimes necessary to roll back some
of the serialized elements.

vector_position is the equivalent of the bytes_ostream position struct.
It holds information about the current position in a serialized vector:
the position in the buffer and the current number of elements
serialized.

It allows rolling back to that point.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1456041750-1505-2-git-send-email-amnon@scylladb.com>
2016-02-23 17:49:51 +01:00
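The position and rollback mechanism can be sketched in a few lines. This is an illustrative Python model, not the generated C++ code: the class names, fields, and method names are assumptions based only on the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorPosition:
    """Hypothetical snapshot of a vector writer's state."""
    buffer_pos: int  # position in the output buffer
    count: int       # number of elements serialized so far


class VectorWriter:
    def __init__(self):
        self.buf = bytearray()
        self.count = 0

    def add(self, element: bytes):
        self.buf += element
        self.count += 1

    def pos(self):
        # Remember both the buffer offset and the element count.
        return VectorPosition(len(self.buf), self.count)

    def rollback(self, position):
        # Drop every byte and element written after the saved position.
        del self.buf[position.buffer_pos:]
        self.count = position.count
```

Both fields are needed: truncating the buffer alone would leave the element count too high, and the serialized vector size would no longer match its contents.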
Tomasz Grabiec
f72fd9eefd Merge branch 'pdziepak/canonical-mutation-idl/v1' from sesastar-dev.git 2016-02-23 17:02:43 +01:00