8453 Commits

Author SHA1 Message Date
Raphael S. Carvalho
ed61fe5831 sstables: make compaction stop report user-friendly
When scylla stopped an ongoing compaction, the event was reported
as an error. This patch introduces a specialized exception for
compaction stop so that the event can be handled appropriately.

Before:
ERROR [shard 0] compaction_manager - compaction failed: read exception:
std::runtime_error (Compaction for keyspace1/standard1 was deliberately
stopped.)

After:
INFO  [shard 0] compaction_manager - compaction info: Compaction for
keyspace1/standard1 was stopped due to shutdown.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1f85d4e5c24d23a1b4e7e0370a2cffc97cbc6d44.1455034236.git.raphaelsc@scylladb.com>
2016-02-11 12:16:53 +02:00
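
The commit above replaces a generic error with a dedicated exception so that a deliberate stop can be logged at INFO level. A minimal sketch of that pattern follows. The names `compaction_stop_exception` and `run_and_report` are hypothetical and are not taken from the Scylla source; the log strings only mimic the Before/After output quoted in the message.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical exception type: signals a deliberate stop rather than a failure.
class compaction_stop_exception : public std::runtime_error {
public:
    compaction_stop_exception(const std::string& ks, const std::string& cf,
                              const std::string& reason)
        : std::runtime_error("Compaction for " + ks + "/" + cf +
                             " was stopped due to " + reason + ".") {}
};

// Runs a task and returns the log line a caller might emit for its outcome.
// The specific handler comes first because the stop exception derives from
// std::runtime_error and would otherwise be caught by the generic handler.
std::string run_and_report(const std::function<void()>& task) {
    try {
        task();
        return "INFO compaction completed";
    } catch (const compaction_stop_exception& e) {
        return std::string("INFO compaction info: ") + e.what();
    } catch (const std::exception& e) {
        return std::string("ERROR compaction failed: ") + e.what();
    }
}
```

Only the handler order separates the two outcomes, so existing callers that catch `std::exception` keep working unchanged.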
Takuya ASADA
8d8130f9c9 dist: fix typo on build_ami.sh
We should always run scylla_setup, not just for a locally built rpm.

Fixes #897

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455103519-13780-1-git-send-email-syuu@scylladb.com>
2016-02-11 11:56:11 +02:00
Shlomi Livne
64f8d5a50e dist: update packer location
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <3c33ea073f702e00b789930fce9befef03ad9e88.1455178900.git.shlomi@scylladb.com>
2016-02-11 11:52:56 +02:00
Avi Kivity
bfbf89ee31 Merge "Serialize keys in a form independent of in-memory representation" from Tomasz
"This series changes the on-wire definitions of keys to be of the following form:

  class partition_key {
     std::vector<bytes> exploded();
  };

Keys are therefore collections of components. The components are serialized according
to the format specified in the CQL binary protocol. No bit now depends on how we store keys in memory.

Constructing keys from components currently requires a schema reference,
which makes it impossible for RPC to deserialize or serialize the keys
automatically. To avoid those complications, compound_type was changed so that
it can be constructed and its components can be iterated over without a schema.
Because of this, partition_key size increased by 2 bytes."
2016-02-10 17:54:42 +02:00
Tomasz Grabiec
b74301302c tests: Add test for key serialization 2016-02-10 15:22:56 +01:00
Tomasz Grabiec
3e2c1840d8 idl: Make key definitions independent of in-memory representation 2016-02-10 15:22:56 +01:00
Tomasz Grabiec
428fce3828 compound: Optimize serialize_single() 2016-02-10 15:22:56 +01:00
Tomasz Grabiec
0cc2832a76 keys: Allow constructing from a range 2016-02-10 15:22:56 +01:00
Tomasz Grabiec
3ffcb998fb keys: Enable serialization from a range not just a vector 2016-02-10 14:35:14 +01:00
Tomasz Grabiec
095efd01d6 keys: Make from_exploded() and components() work without schema
For simplicity, we want to have keys serializable and deserializable
without schema for now. We will serialize keys in a generic form of a
vector of components where the format of components is specified by
CQL binary protocol. So conversion between keys and vector of
components needs to be possible to do without schema.

We may want to make keys schema-dependent back in the future to apply
space optimizations specific to column types. Existing code should
still pass schema& to construct and access the key when possible.

One optimization had to be reverted in this change - avoidance of
storing key length (2 bytes) for single-component partition keys. One
consequence of this, in addition to a bit larger keys, is that we can
no longer avoid copy when constructing single-component partition keys
from a ready "bytes" object.

I haven't noticed any significant performance difference in:

  tests/perf/perf_simple_query -c1 --write

It does ~130K tps on my machine.
2016-02-10 14:35:13 +01:00
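
The schema-free form described above can be illustrated with a small round trip. This is a sketch under stated assumptions, not the Scylla implementation: `bytes` is modelled as `std::string`, each component is assumed to carry a 2-byte big-endian length prefix, and the function names are hypothetical. It shows why a single-component key grows by 2 bytes once the length is always stored.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using bytes = std::string;  // stand-in for Scylla's bytes type (assumption)

// Writes every component as a 2-byte big-endian length followed by its data.
bytes serialize_components(const std::vector<bytes>& components) {
    bytes out;
    for (const auto& c : components) {
        if (c.size() > UINT16_MAX) {
            throw std::length_error("component too large for a 2-byte length");
        }
        out.push_back(static_cast<char>((c.size() >> 8) & 0xff));
        out.push_back(static_cast<char>(c.size() & 0xff));
        out += c;
    }
    return out;
}

// Splits the buffer back into components. No schema is consulted.
std::vector<bytes> explode(const bytes& in) {
    std::vector<bytes> components;
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in.size() - pos < 2) {
            throw std::runtime_error("truncated length");
        }
        const std::size_t len =
            (static_cast<std::size_t>(static_cast<unsigned char>(in[pos])) << 8) |
            static_cast<unsigned char>(in[pos + 1]);
        pos += 2;
        if (in.size() - pos < len) {
            throw std::runtime_error("truncated component");
        }
        components.push_back(in.substr(pos, len));
        pos += len;
    }
    return components;
}
```

Under this layout a one-byte single-component key occupies three bytes, which corresponds to the 2-byte overhead the message mentions.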
Tomasz Grabiec
31312722d1 compound: Reduce duplication 2016-02-10 14:35:13 +01:00
Tomasz Grabiec
085d148d6f compound: Remove unused methods 2016-02-10 14:35:13 +01:00
Tomasz Grabiec
b777cc9565 tests: Fix tests to not rely on key representation 2016-02-10 14:35:13 +01:00
Asias He
6d0407503b locator: Do not generate wrap-around ranges
Like we did in commit d54c77d5d0,
make the remaining functions in abstract_replication_strategy return
non-wrap-around ranges.

This fixes:

ERROR [shard 0] stream_session - [Stream #f0b7fda0-cf3e-11e5-b6c4-000000000000]
stream_transfer_task: Fail to send to 127.0.0.4:0: std::runtime_error (Not implemented: WRAP_AROUND)

in streaming.
Message-Id: <514d2a9a1d3b868d213464c8858ac5162c0338d8.1455093643.git.asias@scylladb.com>
2016-02-10 10:03:31 +01:00
Avi Kivity
9f3061ade8 Revert "streaming: Send mutations on all shards"
This reverts commit 31d439213c.

Fixes #894.

Conflicts:
    streaming/stream_manager.cc

(may have undone part of 63a5aa6122)
2016-02-09 18:26:14 +02:00
Calle Wilund
873f87430d database: Check sstable dir name UUID part when populating CF
Fixes #870
Only load sstables from CF directories that match the current
CF uuid.
Message-Id: <1454938450-4338-1-git-send-email-calle@scylladb.com>
2016-02-08 14:48:19 +01:00
Calle Wilund
2ffd7d7b99 stream_manager: Change construction to make gcc 4.9 happy
gcc 4.9 complains about the type{ val, val } construction of
type with implicit default constructor, i.e. member = initial
declarations. gcc 5 does not (and possibly rightly so).
However, we still (implicitly) claim to support gcc 4.9 so
why not just change this particular instance.

Message-Id: <1454921328-1106-1-git-send-email-calle@scylladb.com>
2016-02-08 10:54:48 +02:00
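
The commit does not say which language rule gcc 4.9 applies. One likely explanation, stated here as an assumption: C++11 does not treat a class with default member initializers as an aggregate, while C++14 does, so `type{val, val}` on such a class only builds everywhere if a matching constructor exists. The sketch below uses a hypothetical `progress_info` type to show the portable form.

```cpp
// Hypothetical type with default member initializers ("member = initial").
struct progress_info {
    long bytes_sent = 0;
    long bytes_received = 0;

    progress_info() = default;
    // The explicit constructor makes progress_info{a, b} valid regardless of
    // whether the compiler considers the class an aggregate.
    progress_info(long sent, long received)
        : bytes_sent(sent), bytes_received(received) {}
};

inline progress_info make_progress(long sent, long received) {
    return progress_info{sent, received};
}
```

With the constructor in place, the brace syntax resolves to a constructor call rather than aggregate initialization.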
Paweł Dziepak
c90ec731c8 transport: do not close gate at connection shutdown
connection::_pending_requests_gate is responsible for keeping connection
objects alive as long as there are outstanding requests and is closed
in connection::proccess() when needed. Closing it in connection::shutdown()
as well may cause the gate to be closed twice, which is a bug.

Fixes #690.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1454596390-23239-1-git-send-email-pdziepak@scylladb.com>
2016-02-07 20:07:23 +02:00
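
The double-close hazard can be shown with a toy gate. This is a single-threaded illustration only: `simple_gate` and `connection_sketch` are hypothetical and do not reproduce seastar's gate API or Scylla's connection class. The point is that exactly one code path owns the close.

```cpp
#include <stdexcept>

// Toy gate: counts outstanding requests and may be closed exactly once.
class simple_gate {
    int _pending = 0;
    bool _closed = false;
public:
    void enter() {
        if (_closed) {
            throw std::runtime_error("gate closed");
        }
        ++_pending;
    }
    void leave() { --_pending; }
    void close() {
        if (_closed) {
            throw std::logic_error("gate closed twice");
        }
        _closed = true;
    }
    bool closed() const { return _closed; }
    int pending() const { return _pending; }
};

// Hypothetical connection: only the processing path closes the gate.
class connection_sketch {
    simple_gate _pending_requests_gate;
public:
    void finish_processing() { _pending_requests_gate.close(); }
    void shutdown() { /* stop I/O here, but leave the gate alone */ }
    bool gate_closed() const { return _pending_requests_gate.closed(); }
};
```

Calling `shutdown()` before or after `finish_processing()` is then harmless, because `shutdown()` never touches the gate.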
Avi Kivity
8b0a26f06d build: support for alternative versions of libsystemd pkgconfig
While pkgconfig is supposed to be a distribution and version neutral way
of detecting packages, it doesn't always work this way.  The sd_notify()
manual page documents that sd_notify is available via the libsystemd
package, but on centos 7.0 it is only available via the libsystemd-daemon
package (on centos 7.1+ it works as expected).

Fix by allowing alternative package names, testing each one
until a match is found.

Fixes #879.

Message-Id: <1454858862-5239-1-git-send-email-avi@scylladb.com>
2016-02-07 17:36:57 +02:00
Avi Kivity
ad58663c96 row_cache: reindent 2016-02-07 13:25:29 +02:00
Asias He
31d439213c streaming: Send mutations on all shards
Currently, only the shard on which the stream_plan is created sends
streaming mutations. To utilize all the available cores, we can make each
shard send the mutations it is responsible for. On the receiver side,
we do not forward the mutations to the shard where the stream_session is
created, so that we can avoid unnecessary forwarding.

Note: the downside is that it is now harder to:

1) track the number of bytes sent and received
2) update the keep alive timer upon receipt of the STREAM_MUTATION

To fix, we now store the sent/received bytes info on all shards. When
the keep alive timer expires, we check if any progress has been made.

Hopefully, this patch will make the streaming much faster and in turn
make the repair/decommission/adding a node faster.

Refs: https://github.com/scylladb/scylla/issues/849

Tested with decommission/repair dtest.

Message-Id: <96b419ab11b736a297edd54a0b455ffdc2511ac5.1454645370.git.asias@scylladb.com>
2016-02-07 10:57:51 +02:00
Gleb Natapov
63a5aa6122 prevent superfluous frozen_mutation copying
Sometimes frozen_mutation is copied while it can be moved instead. Fix
those cases.

Message-Id: <20160204165708.GI6705@scylladb.com>
2016-02-07 10:54:16 +02:00
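
The difference between copying and moving can be made visible by counting copies. The sketch below uses a hypothetical `frozen_blob` as a stand-in for frozen_mutation; the copy counter exists only for the illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for frozen_mutation that counts its copies.
struct frozen_blob {
    std::string data;
    static int copies;

    frozen_blob(std::string d) : data(std::move(d)) {}
    frozen_blob(const frozen_blob& o) : data(o.data) { ++copies; }
    frozen_blob(frozen_blob&&) noexcept = default;
};
int frozen_blob::copies = 0;

// Takes a reference and copies the payload into the queue.
void queue_by_copy(std::vector<frozen_blob>& q, const frozen_blob& m) {
    q.push_back(m);
}

// Takes the value and moves it along; the payload is never duplicated
// when the caller passes an rvalue.
void queue_by_move(std::vector<frozen_blob>& q, frozen_blob m) {
    q.push_back(std::move(m));
}
```

A caller that no longer needs its object passes `std::move(obj)` to `queue_by_move`, and the string buffer is handed over instead of duplicated.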
Erich Keane
4197ceeedb raw_statement::is_reversed rewrite to avoid VLA
The is_reversed function uses a variable length array, which isn't
spec-abiding C++.  Additionally, the Clang compiler doesn't allow them
with non-POD types, so this function wouldn't compile.

After reading through the function it seems that the array wasn't
necessary as the check could be calculated inline rather than
separately.  This version should be more performant (since it no longer
requires the VLA lookup performance hit) while taking up less memory in
all but the smallest of edge-cases (when the clustering_key_size *
sizeof(optional<bool>) < sizeof(size_type) - sizeof(uint32_t) +
sizeof(bool)).

This patch uses relation_order_unsupported to ensure that the exception
order is consistent with the previous version.  The throw would
otherwise be moved into the initial for-loop.

There are two deviations in behavior:
The first is the initial assert.  It however should not change the apparent
behavior besides causing orderings() to be looked up 2x in debug
situations.

The second is the conversion of is_reversed_ from an optional to a bool.
The result is that the final return value is now well-defined to be
false in the release-condition where orderings().size() == 0, rather
than be the ill-defined *is_reversed_ that was there previously.

Signed-off-by: Erich Keane <erich.keane@verizon.net>
Message-Id: <1454546285-16076-4-git-send-email-erich.keane@verizon.net>
2016-02-07 10:38:17 +02:00
Erich Keane
49842aacd9 managed_vector: maybe_constructed ctor to non-constexpr
Clang enforces that a union's constexpr CTOR must initialize
one of the members.  The spec is seemingly silent as to what
the rule on this is, however, making this non-constexpr results in clang
accepting the constructor.

Signed-off-by: Erich Keane <erich.keane@verizon.net>
Message-Id: <1454604300-1673-1-git-send-email-erich.keane@verizon.net>
2016-02-07 10:30:45 +02:00
Erich Keane
e87019843f Fix PHI_FACTOR definition to be spec compliant
PHI_FACTOR is a constexpr variable that is defined using std::log.
Though G++ has a constexpr version of std::log, this itself is not spec
compliant (in fact, Clang enforces this).  See C++ Spec 26.8 for the
definition of std::log and 17.6.5.6 for the rule regarding adding
constexpr where it isn't specified.

This patch replaces the std::log statement with a version from math.h
that contains the exact value (M_LOG10El).

Signed-off-by: Erich Keane <erich.keane@verizon.net>
Message-Id: <1454603285-32677-1-git-send-email-erich.keane@verizon.net>
2016-02-04 18:33:44 +02:00
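
The portable form can be sketched as follows. The commit uses the math.h macro M_LOG10El; this sketch writes the same value, log10(e) = 1/ln(10), as a literal so that it does not depend on that macro being available. The commented-out line is only a guess at the shape of the old definition, which the message does not quote.

```cpp
#include <cmath>

// log10(e) == 1 / ln(10), written as a literal so that it is a constant
// expression on every compiler.
constexpr double PHI_FACTOR = 0.43429448190325182765;
static_assert(PHI_FACTOR > 0.434 && PHI_FACTOR < 0.435,
              "PHI_FACTOR is usable at compile time");

// Non-portable shape (assumed): relies on std::log being constexpr, which is
// a g++ extension.
// constexpr double PHI_FACTOR = 1.0 / std::log(10.0);

// Runtime reference value, used only to cross-check the literal.
inline double phi_factor_at_runtime() {
    return 1.0 / std::log(10.0);
}
```

The `static_assert` confirms the constant is evaluated at compile time without calling any library function.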
Avi Kivity
c85f6c4df1 Merge seastar upstream
* seastar 661ccd9...14c9991 (1):
  > reactor: use correct open_flags when opening a file without DMA support

Fixes #871.
2016-02-04 18:17:04 +02:00
Gleb Natapov
77d47c0c4b optimize serialization of array/vector of integral types
An array of integral types on a little-endian machine can be memcpyed into/out
of a buffer instead of being serialized/deserialized element by element.

Message-Id: <20160204155425.GC6705@scylladb.com>
2016-02-04 18:01:14 +02:00
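
A minimal sketch of that fast path, assuming a little-endian wire format: on a little-endian host the in-memory layout already matches the wire layout, so one memcpy replaces the per-element loop. The function names and the byte-vector buffer are hypothetical and are not the Scylla serializer's interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Detects host byte order at run time.
inline bool is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Serializes integral elements to little-endian bytes.
template <typename T>
std::vector<unsigned char> serialize(const std::vector<T>& v) {
    static_assert(std::is_integral<T>::value && !std::is_same<T, bool>::value,
                  "integral, non-bool element types only");
    using U = typename std::make_unsigned<T>::type;
    std::vector<unsigned char> out(v.size() * sizeof(T));
    if (v.empty()) {
        return out;
    }
    if (is_little_endian()) {
        std::memcpy(out.data(), v.data(), out.size());  // fast path
    } else {
        for (std::size_t i = 0; i < v.size(); ++i) {    // element by element
            for (std::size_t b = 0; b < sizeof(T); ++b) {
                out[i * sizeof(T) + b] = static_cast<unsigned char>(
                    (static_cast<U>(v[i]) >> (8 * b)) & 0xff);
            }
        }
    }
    return out;
}

// Reads little-endian bytes back into integral elements.
template <typename T>
std::vector<T> deserialize(const std::vector<unsigned char>& in) {
    using U = typename std::make_unsigned<T>::type;
    std::vector<T> v(in.size() / sizeof(T));
    if (v.empty()) {
        return v;
    }
    if (is_little_endian()) {
        std::memcpy(v.data(), in.data(), v.size() * sizeof(T));
    } else {
        for (std::size_t i = 0; i < v.size(); ++i) {
            U x = 0;
            for (std::size_t b = 0; b < sizeof(T); ++b) {
                x = static_cast<U>(x | (static_cast<U>(in[i * sizeof(T) + b]) << (8 * b)));
            }
            v[i] = static_cast<T>(x);
        }
    }
    return v;
}
```

Both branches produce the same bytes, so the fast path is purely an optimization and the format stays portable.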
Avi Kivity
91fbb81477 Merge seastar upstream
* seastar f8beab9...661ccd9 (1):
  > Merge "Use swapcontext() with AddressSanitizer" from Paweł
2016-02-04 17:30:15 +02:00
Paweł Dziepak
ababdfc9e2 tests/batchlog: use proper batchlog version
Since 42e3999a00 "Check batchlog version
before replaying" there is a version check in batchlog replay.
However, the test wasn't updated and still used some arbitrary version
number which caused it to fail.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1454595368-21670-1-git-send-email-pdziepak@scylladb.com>
2016-02-04 16:50:45 +02:00
Gleb Natapov
049ae37d08 storage_proxy: change collectd to show foreground mutation instead of overall mutation count
It is much easier to see what is going on this way; otherwise the graphs for
bg mutations and overall mutations are very close at the usual scaling for
many workloads.

Message-Id: <20160204083452.GH6705@scylladb.com>
2016-02-04 14:58:56 +02:00
Gleb Natapov
a9e4afd8d2 Drop query-result.hh from database.hh
It is not needed there but causes a lot of recompilation when changed.

Message-Id: <1454496142-14537-3-git-send-email-gleb@scylladb.com>
2016-02-04 13:22:27 +02:00
Gleb Natapov
2ae1ae2d18 Cleanup messaging_service.hh includes a bit.
Forward declare some classes instead.

Message-Id: <1454496142-14537-2-git-send-email-gleb@scylladb.com>
2016-02-04 13:22:24 +02:00
Avi Kivity
f3ca597a01 Merge "Sstable cleanup fixes" from Tomasz
"  - Added waiting for async cleanup on clean shutdown

  - Crash in the middle of sstable removal doesn't leave system in a non-bootable state"
2016-02-04 12:36:13 +02:00
Tomasz Grabiec
c7ef3703cc sstable: Make sstable deletion never leave sstable set in a non-bootable state
Refs #860
Refs #802

An sstable file set with any component missing is interpreted as a
critical error during boot. Currently sstable removal procedure could
leave the files in a non-bootable state if the process crashed after
TOC was removed but before all components were removed as well.

To solve this problem, start the removal by renaming the TOC file to a
so-called "temporary TOC". Upon boot, such a TOC file is
interpreted as an sstable which is safe to remove. This kind of TOC
was added before to deal with a similar scenario but in the opposite
direction - when writing a new sstable.
2016-02-03 17:36:17 +01:00
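
The removal order described above can be sketched with plain file operations. The file names ("TOC.txt", "TOC.txt.tmp") and function names are assumptions made for the illustration, and the sketch leaves out what a real implementation also needs, such as flushing the directory between steps.

```cpp
#include <cstdio>
#include <string>
#include <vector>

inline bool file_exists(const std::string& path) {
    std::FILE* f = std::fopen(path.c_str(), "r");
    if (f) {
        std::fclose(f);
    }
    return f != nullptr;
}

// Removes an sstable file set so that a crash at any point leaves either an
// intact set or a set marked by a temporary TOC.
bool remove_sstable(const std::string& prefix,
                    const std::vector<std::string>& components) {
    const std::string toc = prefix + "TOC.txt";
    const std::string tmp_toc = toc + ".tmp";
    // Step 1: mark the set as "being removed" with a single rename.
    if (std::rename(toc.c_str(), tmp_toc.c_str()) != 0) {
        return false;
    }
    // Step 2: remove the remaining components. A crash here leaves the
    // temporary TOC behind, which boot treats as safe to clean up.
    for (const auto& c : components) {
        std::remove((prefix + c).c_str());
    }
    // Step 3: remove the marker last.
    return std::remove(tmp_toc.c_str()) == 0;
}

// Boot-time check: a temporary TOC means the set can simply be deleted.
bool is_pending_removal(const std::string& prefix) {
    return file_exists(prefix + "TOC.txt.tmp");
}
```

The rename is the commit point: before it the set is complete, and after it the set is explicitly disposable.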
Tomasz Grabiec
c8a98b487c sstables: Remove coupling-hiding duplication 2016-02-03 17:36:17 +01:00
Tomasz Grabiec
355874281a sstables: Do not register exit hooks from static initializer
Fixes #868.

Registering exit hooks while reactor is already iterating over exit
hooks is not allowed and currently leads to undefined behavior
observed in #868. While we should make the failure more user friendly,
registering exit hooks concurrently with shutdown will not be allowed.

We don't expect exit hooks to be registered after exit starts because
this would violate the guarantee which says that exit hooks are
executed in reverse order of registration. Starting exit sequence in
the middle of initialization sequence would result in use after free
errors. Btw, I'm not sure if currently there's anything which prevents
this.

To solve this problem, move the exit hook to the initialization
sequence. In case of tests, the cleanup has to be called explicitly.
2016-02-03 17:35:50 +01:00
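
The two rules stated above, reverse-order execution and no registration once exit has started, can be modelled in a few lines. `exit_hooks` is a toy class for illustration and is not the seastar reactor's interface; it also shows the "more user friendly" failure as an exception, which is an assumption about what such a check could look like.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy registry: hooks run in reverse order of registration, and registering
// after shutdown has begun is rejected instead of being undefined behavior.
class exit_hooks {
    std::vector<std::function<void()>> _hooks;
    bool _exiting = false;
public:
    void at_exit(std::function<void()> hook) {
        if (_exiting) {
            throw std::logic_error("cannot register an exit hook during shutdown");
        }
        _hooks.push_back(std::move(hook));
    }
    void run() {
        _exiting = true;
        for (auto it = _hooks.rbegin(); it != _hooks.rend(); ++it) {
            (*it)();
        }
    }
};
```

Rejecting late registration before touching the container keeps the iteration in `run()` valid, which is exactly what unchecked registration from a static initializer could break.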
Tomasz Grabiec
136c9d9247 sstables: Improve error message in case of generation duplication
Refs #870.
2016-02-03 17:35:50 +01:00
Calle Wilund
a00ff015f4 transport::server: read cqlv2 batch options correctly
Fixes #563.
Refs #584

CQLv2 encodes batch query_options in v1 format, not v2+.
CQLv1 otoh has no batch support at all.
Make read_options use explicit version format if needed.

v2: Ensure we preserve cql protocol version in query_opts
Message-Id: <1454514510-21706-1-git-send-email-calle@scylladb.com>
2016-02-03 16:55:07 +01:00
Gleb Natapov
b4b560e0fc change result_digest to hold std::array instead of a std::vector
Digest size is fixed, so no need to use std::vector to hold it.

Message-Id: <20160203102530.GU6705@scylladb.com>
2016-02-03 12:27:39 +02:00
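
A sketch of the idea: when the size is known at compile time, std::array stores the bytes inline and avoids the heap allocation a std::vector would make. The class name and the 16-byte size are assumptions chosen for the illustration, not values taken from the commit.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t digest_size = 16;  // assumed size for the illustration

// Hypothetical digest holder with inline, fixed-size storage.
class result_digest_sketch {
public:
    using type = std::array<std::uint8_t, digest_size>;

    explicit result_digest_sketch(const type& d) : _digest(d) {}
    const type& get() const { return _digest; }
    bool operator==(const result_digest_sketch& o) const {
        return _digest == o._digest;
    }
    bool operator!=(const result_digest_sketch& o) const {
        return !(*this == o);
    }
private:
    type _digest;
};
```

Copying such an object copies a fixed number of bytes, and comparing two digests never has to check lengths first.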
Raphael S. Carvalho
4041f8cffc compaction: stop all ongoing compaction during shutdown
Currently, we wait for ongoing compaction during shutdown, but
that may take 'forever' if compacting huge sstables with a slow
disk. Compaction of huge sstables will take a considerable amount
of time even with fast disks. Therefore, all ongoing compaction
should be stopped during shutdown.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <3370f17ce4274df417ea60651f33fc5d4de91199.1454441286.git.raphaelsc@scylladb.com>
2016-02-03 10:18:51 +02:00
Raphael S. Carvalho
cf22c827f9 compaction_manager: fix assertion when stopping task
Task is stopped by closing gate and forcing it to exit via gate
exception. The problem is that task->compacting_cf may be set to
the column family being compacted, and compaction_manager::remove
would see it and try to stop the same task again, which would
lead to problems. The fix is to clean task->compacting_cf when
stopping task.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <3473e93c1a107a619322769d65fa020529b5501b.1454441286.git.raphaelsc@scylladb.com>
2016-02-03 10:18:15 +02:00
Asias He
c67538009c streaming: Fix assert in update_progress
The problem is that on the follower side, we set up _session_info too
late, after receiving the PREPARE_DONE_MESSAGE message. The initiator can
send STREAM_MUTATION before sending the PREPARE_DONE_MESSAGE message.

To fix, we set up _session_info after we received the prepare_message on
both initiator and follower.

Fixes #869

scylla: streaming/session_info.cc:44: void
streaming::session_info::update_progress(streaming::progress_info):
Assertion `peer == new_progress.peer' failed.
Message-Id: <6d945ba1e8c4fc0949c3f0a72800c9448ba27761.1454476876.git.asias@scylladb.com>
2016-02-03 10:15:45 +02:00
Asias He
46c392eb17 messaging_service: Stop retrying if messaging_service is being shutdown
If we are shutting down the messaging_service, we should not retry the
message again.

Refs #862

Message-Id: <7c3afb646ba8254eca69096d80dd5ea007e416a7.1454418053.git.asias@scylladb.com>
2016-02-02 19:50:54 +02:00
Gleb Natapov
c509e48674 Parallelize batchlog replay
Current code is serialized by get_truncated_at(). Use map_reduce to make
it run in parallel.
Message-Id: <1454421603-13080-4-git-send-email-gleb@scylladb.com>
2016-02-02 17:08:54 +01:00
Gleb Natapov
42e3999a00 Check batchlog version before replaying
In case the batchlog serialization format changes, check the version before
trying to interpret raw data.
Message-Id: <1454421603-13080-3-git-send-email-gleb@scylladb.com>
2016-02-02 17:08:54 +01:00
Gleb Natapov
116ad5a603 Use net::messaging_service::current_version for serialization format versioning
Message-Id: <1454421603-13080-2-git-send-email-gleb@scylladb.com>
2016-02-02 17:08:53 +01:00
Avi Kivity
b14d39bfb1 Merge "Move last bits to IDL serializer and get rid of old one" from Gleb 2016-02-02 12:33:18 +02:00
Gleb Natapov
19067db642 remove old serializer 2016-02-02 12:15:50 +02:00
Gleb Natapov
4e440ebf8e Remove old inet_address and uuid serializers 2016-02-02 12:15:50 +02:00
Gleb Natapov
31bb194c21 Remove old result_digest serializer 2016-02-02 12:15:50 +02:00