Commit Graph

8635 Commits

Author SHA1 Message Date
Pekka Enberg
dfcc48d82a transport: Add result metadata to PREPARED message
The gocql driver assumes that there's a result metadata section in the
PREPARED message. Technically, Scylla is not at fault here as the CQL
specification explicitly states in Section 4.2.5.4. ("Prepared") that the
section may be empty:

   - <result_metadata> is defined exactly as <metadata> but correspond to the
      metadata for the resultSet that execute this query will yield. Note that
      <result_metadata> may be empty (have the No_metadata flag and 0 columns, See
      section 4.2.5.2) and will be for any query that is not a Select. There is
      in fact never a guarantee that this will non-empty so client should protect
      themselves accordingly. The presence of this information is an

However, Cassandra always populates the section so lets do that as well.

Fixes #912.

Message-Id: <1456317082-31688-1-git-send-email-penberg@scylladb.com>
2016-02-24 14:43:24 +02:00
Avi Kivity
fedba9d6cd Merge "reduce gossip round latency" from Asias
"This series makes gossip message handling to be async to reduce gossip round
latency. Commit log of patch 3 explains the issue in detail.

Refs: #900"
2016-02-24 13:44:06 +02:00
Avi Kivity
b42a32efc7 Update scylla-ami submodule
* dist/ami/files/scylla-ami 398b1aa...d4a0e18 (3):
  > Sort service running order (scylla-ami-setup.service -> scylla-io-setup.service -> scylla-server.service)
  > Drop --ami and --disk-count parameters
  > dist: pass the number of disks to set io params
2016-02-24 13:38:05 +02:00
Avi Kivity
cda29c0324 Merge seastar upstream
* seastar 8c560f2...769cb8b (4):
  > temporary_buffer: make operator bool explicit (and const)
  > iotune: use SEASTAR_IO instead of SCYLLA_IO
  > iotune: add --format option, to use EnvironmentFile on systemd
  > sstring: add data() methods
2016-02-24 13:38:05 +02:00
Avi Kivity
efabb1a1d8 commitlog: fix buffer size calculation
We were adding bool(buffer), instead of buffer.size(); exposed by making
temporary_buffer::operator bool explicit.
2016-02-24 13:38:05 +02:00
Asias He
697b16414a gossip: Make gossip message handling async
In each gossip round, i.e., gossiper::run(), we do:

1) send syn message
2)                           peer node: receive syn message, send back ack message
3) process ack message in handle_ack_msg
   apply_state_locally
     mark_alive
       send_gossip_echo
     handle_major_state_change
       on_restart
       mark_alive
         send_gossip_echo
       mark_dead
         on_dead
       on_join
     apply_new_states
       do_on_change_notifications
          on_change
4) send back ack2 message
5)                            peer node: process ack2 message
   			      apply_state_locally

At the moment, syn is "wait" message, it times out in 3 seconds. In step
3, all the registered gossip callbacks are called which might take
significant amount of time to complete.

In order to reduce the gossip round latency, we make syn "no-wait" and
do not run the handle_ack_msg insdie the gossip::run(). As a result, we
will not get a ack message as the return value of a syn message any
more, so a GOSSIP_DIGEST_ACK message verb is introduced.

With this patch, the gossip message exchange is now async. It is useful
when some nodes are down in the cluster. We will not delay the gossip
round, which is supposed to run every second, 3*n seconds (n = 1-3,
since it talks to 1-3 peer nodes in each gossip round) or even
longer (considering the time to run gossip callbacks).

Later, we can make talking to the 1-3 peer nodes in parallel to reduce
latency even more.

Refs: #900
2016-02-24 19:33:39 +08:00
Asias He
63df54b368 messaging_service: Add GOSSIP_DIGEST_ACK
We will soon switch to use no-wait message for gossip. GOSSIP_DIGEST_SYN
will no longer return GOSSIP_DIGEST_ACK message. So we need a standalone
verb for GOSSIP_DIGEST_ACK.
2016-02-24 19:31:14 +08:00
Asias He
022c7e50a1 failure_detector: Fix false alarm of "Not marking nodes down due to local pause of"
The problem is we initialize _last_interpret when failure_detector
object is constructed. When interpret() runs for the first time, the
_last_interpret value is not the last time we run interpret() but the
time we initialize failure_detector object.

Fix by initializing _last_interpret inside interpret().

[Thu Feb 18 02:40:04 2016] INFO  [shard 0] storage_service - Node 127.0.0.1 state jump to normal
[Thu Feb 18 02:40:04 2016] INFO  [shard 0] storage_service - NORMAL: node is now in normal status
[Thu Feb 18 02:40:04 2016] INFO  [shard 0] gossip - Waiting for gossip to settle before accepting client requests...
[Thu Feb 18 02:40:12 2016] INFO  [shard 0] gossip - No gossip backlog; proceeding
Starting listening for CQL clients on 127.0.0.1:9042...
[Thu Feb 18 02:40:12 2016] INFO  [shard 0] gossip - Node 127.0.0.2 is now part of the cluster
[Thu Feb 18 02:40:12 2016] INFO  [shard 0] gossip - InetAddress 127.0.0.2 is now UP
[Thu Feb 18 02:40:13 2016] INFO  [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2
[Thu Feb 18 02:40:13 2016] WARN  [shard 0] failure_detector - Not marking nodes down due to local pause of 9091 > 5000 (milliseconds)
2016-02-24 19:31:14 +08:00
Avi Kivity
e993102cb5 Merge "introduce scylla-io-setup.service" from Takuya
"Add scylla-io-setup.service to configure max-io-requests and num-io-queues on first boot.
Moved SCYLLA_IO configuration code from scylla_sysconfig_setup to scylla-io-setup.service, revert commits related it.
On scylla-io-setup.service, autodetect Amazon EC2 instead of using AMI variable on sysconfig."
2016-02-24 10:13:23 +02:00
Takuya ASADA
c4035a0a13 dist: add comment about /etc/scylla.d/io.conf on sysconfig 2016-02-24 04:00:52 +09:00
Takuya ASADA
0f20abb365 Revert "dist: introduce SCYLLA_IO"
This reverts commit 5cae2560a3.

Conflicts:
	dist/common/sysconfig/scylla-server
2016-02-24 03:46:14 +09:00
Takuya ASADA
b79a1a77da Revert "dist: update SCYLLA_IO with params for AMI"
This reverts commit 5494135ddd.

Conflicts:
	dist/common/scripts/scylla_sysconfig_setup
2016-02-24 03:45:11 +09:00
Takuya ASADA
643beefc8c Revert "Revert "dist: remove AMI entry from sysconfig, since there is no script refering it""
This reverts commit 21e6720988.
2016-02-24 03:33:50 +09:00
Takuya ASADA
66c5feb9e9 Revert "dist: align ami option with others (-a --> --ami)"
This reverts commit 312f1c9d98.
2016-02-24 03:33:41 +09:00
Takuya ASADA
a9926f1cea dist: introduce scylla-io-setup.service to setup io parameters on first startup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-24 03:33:03 +09:00
Tomasz Grabiec
79bcb5a616 tests: Fix build of memory_footprint 2016-02-23 19:12:54 +01:00
Amnon Heiman
f461ebc411 idl-compiler: Add pos and rollback to serialize vector
This adds the ability to store a position of a serialized vector and to
rollback to that stored position afterwards.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1456041750-1505-3-git-send-email-amnon@scylladb.com>
2016-02-23 17:49:51 +01:00
Amnon Heiman
ea97e07ed7 serialization_visitors: Adding vector_position struct
While serialization vector it is sometimes required to rollback some of
the serialized elements.

vector_position is the equivalent to the bytes_ostream position struct.
It holds information about the current position in a serialized vector,
the position in the bufffer and the current number of elements
serialized.

It will allow to rollback to the current point.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1456041750-1505-2-git-send-email-amnon@scylladb.com>
2016-02-23 17:49:51 +01:00
Tomasz Grabiec
f72fd9eefd Merge branch 'pdziepak/canonical-mutation-idl/v1' from sesastar-dev.git 2016-02-23 17:02:43 +01:00
Tomasz Grabiec
995b638d96 mutation_partition_visitor: Fix crash for large blobs
Fixes #927.

The new visiting code builds cell instances using
atomic_cell::make_*() factory methods, which won't work in LSA context
because they depend on managed_bytes storage to be linearized. It may
not be since large blob support. This worked before because we created
cells from views before which works in all contexts.

Fix by constructing them in standard allocator context.

Message-Id: <1456234064-13608-2-git-send-email-tgrabiec@scylladb.com>
2016-02-23 16:41:39 +02:00
Tomasz Grabiec
33cf65c2aa mutation_partition_view: Fix use-after-move on visitor instance
The line:

  boost::apply_visitor(atomic_cell_or_collection_visitor(std::move(visitor), id, col), cell);

is executed in a loop, so visitor could be used after being
moved-from. This may not always be allowed for some visitors. Also,
vistors may keep state, which should be preserved for the whole
visitation.

This doesn't fix any issue right now.

Message-Id: <1456234064-13608-1-git-send-email-tgrabiec@scylladb.com>
2016-02-23 16:41:39 +02:00
Yoav Kleinberger
f822359d96 bugfix: fixed broken --print-config option
Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <57b452106cdcd9ceb09da4c63781650cefe48040.1456234464.git.yoav@scylladb.com>
2016-02-23 15:35:44 +02:00
Asias He
f7fccc6efb locator: Fix get token from a range<token>
With a range{t1, t2}, if t2 == {}, the range.end() will contain no
value. Fix getting t2 in this case.

Fixes #911.
Message-Id: <4462e499d706d275c03b116c4645e8aaee7821e1.1456128310.git.asias@scylladb.com>
2016-02-23 14:29:26 +01:00
Pekka Enberg
4a4074ad21 tools/scyllatop: Sort metrics by name
This makes the output much easier to read, especially if you have tons
of metrics specified.

Message-Id: <1456230377-3149-1-git-send-email-penberg@scylladb.com>
2016-02-23 14:35:57 +02:00
Takuya ASADA
0f87922aa6 main: notify service start completion ealier, to reduce systemd unit startup time
Fixes #910

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455830245-11782-1-git-send-email-syuu@scylladb.com>
2016-02-23 14:33:16 +02:00
Pekka Enberg
1f6cac8839 tools/scyllatop: Use 'erase' to clear the screen
The 'clear' function explicitly clears the screen and repaints it which
causes really annoying flicker. Use 'erase' to make scyllatop more
pleasant on the eyes.

Message-Id: <1456229348-2194-1-git-send-email-penberg@scylladb.com>
2016-02-23 14:12:48 +02:00
Tomasz Grabiec
2b5253927f test.py: Print output on timeout as well
It is often the case that the there is useful debugging information
printed by the test before it hangs. It is annoying to see just "TIMED
OUT" in jenkins. Print the output always when it is available.

In addition to that, we should not interpret all exceptions thrown
from communicate() as timeouts. For example, currently ^C sent to the
script misleadingly results in "TIMED OUT" to be printed.
Message-Id: <1456174992-21909-1-git-send-email-tgrabiec@scylladb.com>
2016-02-23 13:41:11 +02:00
Pekka Enberg
78c6fdf429 cql3/functions: Fix is_pure() for native scalar functions
Every native scalar function is already tagged whether they're pure or
not but because we don't implement the is_pure() function, all functions
end up being advertised as pure. This means that functions like now()
that are *not* pure, end up being evaluated only once.

Fixes #571.
Message-Id: <1456227171-461-1-git-send-email-penberg@scylladb.com>
2016-02-23 12:37:32 +01:00
Yoav Kleinberger
74fbc62129 ScyllaTop: top-like tool to see live scylla metrics
requires a local collectd configured with the unix-sock plugin,
use the --help option for more. Run it with:

        $ scyllatop.py --help

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <bd3f8c7e120996fc464f41f60130c82e3fb55ac6.1456164703.git.yoav@scylladb.com>
2016-02-23 12:32:47 +02:00
Avi Kivity
8ba474f1c9 Merge "Drop empty partitions from mutation query results" from Tomasz 2016-02-23 11:18:47 +02:00
Tomasz Grabiec
c591157755 tests: mutation_query: Add test for dropping partitions with expired tombstones 2016-02-22 20:23:29 +01:00
Tomasz Grabiec
41d475d9c0 schema_builder: Fluentize property setters 2016-02-22 20:23:29 +01:00
Tomasz Grabiec
6fdaf110d6 mutation_query: Don't include empty partitions
In same cases we may have a lot of empty partitions whose tombstones
expired, and there is no point in including them in the results.

This was found to cause performance issues for workloads using batch
updates. system.batchlog table would accumulate a lot of deletes over
time. It has gc_grace_seconds set to 0 so most of the tombstones would
be expired. mutation queries done by batchlog manager were still
returning all partitions present in memtables which caused mutation
queries result to be inflated. This in turn was causing
mutation_result_merger to take a long time to process them.
2016-02-22 20:21:23 +01:00
Pekka Enberg
4ff1692248 cql3: Make 'CREATE TYPE' error message human readable
We don't support the 'CREATE TYPE' statement for now. The user-visible
error message, however, is unreadable because our CQL parser doesn't
even recognize the statement.

  cqlsh:ks1> CREATE TYPE config (url text);
  SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] message=" : cannot match to any predicted input...

Implement just enough of 'CREATE TYPE' parsing to be able to report a
human readable error message if someone tries to execute such
statements:

  cqlsh:ks1> CREATE TYPE config (url text);
  ServerError: <ErrorMessage code=0000 [Server error] message="User-defined types are not supported yet">
Message-Id: <1456148719-9473-2-git-send-email-penberg@scylladb.com>
2016-02-22 14:50:25 +01:00
Pekka Enberg
d1bbd0271a cql3: Return const reference from ut_name::get_keyspace()
There's no need to copy the string but it does make it more difficult to
use get_keyspace() from other places that already return a const
reference.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
Message-Id: <1456148719-9473-1-git-send-email-penberg@scylladb.com>
2016-02-22 14:50:25 +01:00
Pekka Enberg
a15cbf0968 transport: Remove read_unsigned_short() variant
As explained in commit 0ff0c55 ("transport: server: 'short' should be
unsigned"), "short" type is always unsigned in the CQL binary protocol.
Therefore, drop the read_unsigned_short() variant altogether and just
use read_short() everywhere.

Message-Id: <1456133171-1433-1-git-send-email-penberg@scylladb.com>
2016-02-22 11:39:33 +02:00
Tomasz Grabiec
fb3344eba1 sstables: Do not write corrupted sstables when column names are too large
This may result in errors during reading like the following one:

  runtime error: Unexpected marker. Found k, expected \x01\n)'

The error above happened when executing limits.py:max_key_length_test dtest.

After change the exception will happen during writing and will be clearer.

Refs #807.

This patch doesn't deal with the problem of ensuring that we will
never hit those errors, which is very desirable. We shouldn't ack a
write if we can't persist it to sstables.

Message-Id: <1456130045-2364-1-git-send-email-tgrabiec@scylladb.com>
2016-02-22 11:03:16 +02:00
Vlad Zolotarov
f2c6f16a50 locator: everywhere_replication_strategy: change the class_registrator name to "EverywhereStrategy"
Change the name used with class_registrator from "EverywhereReplicationStrategy"
(used in the initial patch from CASSANDRA-826 JIRA) to "EverywhereStrategy"
as it is in the current DCE code.

With this change one will be able to create an instance of
everywhere_replication_strategy class by giving either
an "org.apache.cassandra.locator.EverywhereStrategy" (full name) or
an "EverywhereStrategy" (short name) as a replication strategy name.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1456081258-937-1-git-send-email-vladz@cloudius-systems.com>
2016-02-22 09:18:47 +02:00
Vlad Zolotarov
cc30956c56 locator: added EverywhereReplicationStrategy
This strategy would ignore an RF configuration and would
always try to replicate on all cluster nodes.

This means that its get_replication_factor()  would return a
number of currently "known" nodes in the cluster and
if a cluster is currently bootstrapping this value obviously may
change in time for the same key. Therefore using this strategy
should be done with caution.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1456074333-15014-3-git-send-email-vladz@cloudius-systems.com>
2016-02-21 19:29:29 +02:00
Vlad Zolotarov
ec14fb2a70 locator: token_metadata: add get_all_endpoints_count()
Return a number of currently known endpoints when
it's needed in a fast path flow.

Calling a get_all_endpoints().size() for that matter
would not be fast enough because of the unordered_set->vector transformation
we don't need.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1456074333-15014-2-git-send-email-vladz@cloudius-systems.com>
2016-02-21 19:29:28 +02:00
Avi Kivity
63841b425d Merge seastar upstream
* seastar c829b69...8c560f2 (2):
  > iotune: add missing static variable definitions
  > prevent futures ignored by parallel_for_each from generating warnings
2016-02-21 18:39:28 +02:00
Shlomi Livne
312f1c9d98 dist: align ami option with others (-a --> --ami)
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <c159aac7f0478aba34d4398a2eb8ea71285ede21.1456052976.git.shlomi@scylladb.com>
2016-02-21 15:06:20 +02:00
Shlomi Livne
21e6720988 Revert "dist: remove AMI entry from sysconfig, since there is no script refering it"
This reverts commit 54f9e59006.

AMI is needed to setting up io params

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <4154a000f019059f740319cfa2fbf875568770b7.1456052976.git.shlomi@scylladb.com>
2016-02-21 15:06:20 +02:00
Tomasz Grabiec
d376167bd4 cql: create_table_statement: Optimize duplicate column names detection
Current algorithm is O(N^2) where N is the column count. This causes
limits.py:TestLimits.max_columns_and_query_parameters_test to timeout
because CREATE TABLE statement takes too long.

This change replaces it with an algorithm of O(N)
complexity. _defined_names are already sorted so if any duplicates
exist, they must be next to each other.

Message-Id: <1456058447-5080-1-git-send-email-tgrabiec@scylladb.com>
2016-02-21 14:55:03 +02:00
Tomasz Grabiec
d3b7e143dc db: Fix error handling in populate_keyspace()
When find_uuid() fails Scylla would terminate with:

  Exiting on unhandled exception of type 'std::out_of_range': _Map_base::at

But we are supposed to ignore directories for unknown column
families. The try {} catch block is doing just that when
no_such_column_family is thrown from the find_column_family() call
which follows find_uuid(). Fix by converting std::out_of_range to
no_such_column_family.

Message-Id: <1456056280-3933-1-git-send-email-tgrabiec@scylladb.com>
2016-02-21 14:19:31 +02:00
Tomasz Grabiec
0c8db777b1 bytes_ostream: Avoid recursion when freeing chunks
When there is a lot of chunks we may get stack overflow.

This seems to fix issue #906, a memory corruption during schema
merge. I suspect that what causes corruption there is overflowing of
the stack allocated for the seastar thread. Those stacks don't have
red zones which would catch overflow.

Message-Id: <1456056288-3983-1-git-send-email-tgrabiec@scylladb.com>
2016-02-21 14:18:49 +02:00
Raphael S. Carvalho
b1cc0490f5 sstables: make compaction manager shutdown less verbose
before:

^CINFO  [shard 0] compaction_manager - Asked to stop
INFO  [shard 0] compaction_manager - compaction task handler stopped due to shutdown
INFO  [shard 0] compaction_manager - compaction task handler stopped due to shutdown
INFO  [shard 1] compaction_manager - Asked to stop
INFO  [shard 2] compaction_manager - Asked to stop
INFO  [shard 1] compaction_manager - compaction task handler stopped due to shutdown
INFO  [shard 2] compaction_manager - compaction task handler stopped due to shutdown
INFO  [shard 3] compaction_manager - Asked to stop
INFO  [shard 1] compaction_manager - compaction task handler stopped due to shutdown
INFO  [shard 2] compaction_manager - compaction task handler stopped due to shutdown
INFO  [shard 3] compaction_manager - compaction task handler stopped due to shutdown
INFO  [shard 3] compaction_manager - compaction task handler stopped due to shutdown

after:

^CINFO  [shard 0] compaction_manager - Asked to stop
INFO  [shard 0] compaction_manager - Stopped
INFO  [shard 1] compaction_manager - Asked to stop
INFO  [shard 2] compaction_manager - Asked to stop
INFO  [shard 3] compaction_manager - Asked to stop
INFO  [shard 1] compaction_manager - Stopped
INFO  [shard 2] compaction_manager - Stopped
INFO  [shard 3] compaction_manager - Stopped

`compaction_manager - compaction task handler stopped due to shutdown` is still printed
in debug level

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <535d5ad40102571a3d5d36257342827989e8f0f4.1455835407.git.raphaelsc@scylladb.com>
2016-02-21 11:55:17 +02:00
Raphael S. Carvalho
55be1830ff database: make column_family::rebuild_sstable_list safer
If any of the allocation in rebuild_sstable_list fail, the system
may be left with an incorrect set of sstables.
It's probably safer to assign the new set of sstables as a last
step.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <52b188262dcc06730dc9220b54ff6810d7dca1ae.1455835030.git.raphaelsc@scylladb.com>
2016-02-21 11:55:15 +02:00
Raphael S. Carvalho
9cb8a43684 start using notation ks.cf everywhere
Some places were using the notation ks/cf to represent a keyspace
and column family pair. ks.cf is the notation used by C*, so we
should use it everywhere.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <939449af92565b79d1823890784dc4d1dc3cdc84.1455830989.git.raphaelsc@scylladb.com>
2016-02-21 11:15:09 +02:00
Avi Kivity
69ac1a3229 Merge seastar upstream
* seastar cf1716f...c829b69 (1):
  > iotune: limit generate() concurrency to 128

Fixes #922.
2016-02-21 11:12:10 +02:00