Compare commits

...

2627 Commits

Author SHA1 Message Date
Tomasz Grabiec
1c40fe36c6 Revert "dist: reduce kernel's inclincation to OOM-kill Scylla"
This reverts commit 1ea0af8acb.

Backported by mistake.
2016-07-25 10:01:55 +02:00
Avi Kivity
1ea0af8acb dist: reduce kernel's inclincation to OOM-kill Scylla
Scylla is likely the most important process on the machine, make the kernel
less inclined to kill it, on systemd-enabled hosts.

See #1393.
Message-Id: <1468847986-9429-1-git-send-email-avi@scylladb.com>
2016-07-25 09:58:56 +02:00
Tomasz Grabiec
f32437eb28 schema_tables: Fix hang during keyspace drop
Fixes #1484.

We drop tables as part of keyspace drop. Table drop starts with
creating a snapshot on all shards. All shards must use the same
snapshot timestamp which, among other things, is part of the snapshot
name. The timestamp is generated using supplied timestamp generating
function (joinpoint object). The joinpoint object will wait for all
shards to arrive and then generate and return the timestamp.

However, we drop tables in parallel, using the same joinpoint
instance. So joinpoint may be contacted by snapshotting shards of
tables A and B concurrently, generating timestamp t1 for some shards
of table A and some shards of table B. Later the remaining shards of
table A will get a different timestamp. As a result, different shards
may use different snapshot names for the same table. The snapshot
creation will never complete because the sealing fiber waits for all
shards to signal it, on the same name.

The fix is to give each table a separate joinpoint instance.

Message-Id: <1469117228-17879-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 5e8f0efc85)
2016-07-22 17:14:31 +02:00
Avi Kivity
5216764245 auth: fix performance problem when looking up permissions
data_resource lookup uses data_resource::name(), which uses sprint(), which
uses (indirectly) locale, which takes a global lock.  This is a bottleneck
on large machines.

Fix by not using name() during lookup.

Fixes #1419
Message-Id: <1467616296-17645-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 76cc0c0cf9)
2016-07-04 17:56:51 +03:00
Avi Kivity
05afbb38fb mutation_reader: make restricting_mutation_reader even more restricting
While limiting the number of concurrently executing sstable readers reduces
our memory load, the queued readers, although consuming a small amount of
memory, can still grow without bounds.

To limit the damage, add two limits on the queue:
 - a timeout, which is equal to the read timeout
 - a queue length limit, which is equal to 2% of the shard memory divided
   by an estimate of the queued request size (1kb)

Together, these limits bound the amount of memory needed by queued disk
requests in case the disk can't keep up.
Message-Id: <1467206055-30769-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 9ac730dcc9)
2016-06-29 17:37:14 +03:00
Avi Kivity
e8fcf0a6be Fix backport of restricting_mutation_reader 2016-06-27 19:56:59 +03:00
Avi Kivity
34a712ddb9 db: add statistics about queued reads
Fixes #1398.

(cherry picked from commit f03cd6e913)
2016-06-27 19:50:38 +03:00
Avi Kivity
71265a3691 db: restrict replica read concurrency
Since reading mutations can consume a large amount of memory, which, moreover,
is not predicatable at the time the read is initiated, restrict the number
of reads to 100 per shard.  This is more than enough to saturate the disk,
and hopefully enough to prevent allocation failures.

Restriction is applied in column_family::make_sstable_reader(), which is
called either on a cache miss or if the cache is disabled.  This allows
cached reads to proceed without restriction, since their memory usage is
supposedly low.

Reads from the system keyspace use a separate semaphore, to prevent
user reads from blocking system reads.  Perhaps we should select the
semaphore based on the source of the read rather than the keyspace,
but for now using the keyspace is sufficient.

Fixes #1398.

(cherry picked from commit edeef03b34)
2016-06-27 19:50:30 +03:00
Avi Kivity
deeabaea9c mutation_reader: introduce restricting_reader
A restricting_reader wraps a mutation_reader, and restricts it concurrency
using a provided semaphore; this allows controlling read concurrency, which
is important since reads can consume a lot of resources ((number of
participating sstables) * 128k after we have streaming mutations, and a lot
more before).

Fixes #1398.

(cherry picked from commit bea7d7ee94)
2016-06-27 19:50:28 +03:00
Avi Kivity
d2bf1535ea Update seastar submodule
* seastar b80564d...010c27c (1):
  > resource: don't abort on too-high io queue count

Fixes #1400.
2016-06-27 19:37:13 +03:00
Avi Kivity
077252956f release: prepare for 1.1.4 2016-06-27 19:34:59 +03:00
Nadav Har'El
3bbd260a3e Rewriting shared sstables only after all shards loaded sstables
After commit faa4581, each shard only starts splitting its shared sstables
after opening all sstables. This was important because compaction needs to
be aware of all sstables.

However, another bug remained: If one shard finishes loading its sstables
and starts the splitting compactions, and in parallel a different shard is
still opening sstables - the second shard might find a half-written sstable
being written by the first shard, and abort on a malformed sstable.

So in this patch we start the shared sstable rewrites - on all shards -
only after all shards finished loading their sstables. Doing this is easy,
because main.cc already contains a list of sequential steps where each
uses invoke_on_all() to make sure the step completes on all shards before
continuing to the next step.

Fixes #1371

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1466426641-3972-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit 3372052d48)
2016-06-20 18:20:23 +03:00
Nadav Har'El
4905f8a49d Rewrite shared sstables only after entire CF is read
Starting in commit 721f7d1d4f, we start "rewriting" a shared sstable (i.e.,
splitting it into individual shards) as soon as it is loaded in each shard.

However as discovered in issue #1366, this is too soon: Our compaction
process relies in several places that compaction is only done after all
the sstables of the same CF have been loaded. One example is that we
need to know the content of the other sstables to decide which tombstones
we can expire (this is issue #1366). Another example is that we use the
last generation number we are aware of to decide the number of the next
compaction output - and this is wrong before we saw all sstables.

So with this patch, while loading sstables we only make a list of shared
sstables which need to be rewritten - and the actual rewrite is only started
when we finish reading all the sstables for this CF. We need to do this in
two cases: reboot (when we load all the existing sstables we find on disk),
and nodetool referesh (when we import a set of new sstables).

Fixes #1366.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1466344078-31290-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit faa45812b2)
2016-06-19 17:19:45 +03:00
Asias He
ad2b7d1e8c repair: Switch log level to warn instead of error
dtest takes error level log as serious error. It is not a serious error
for streaming to fail to send a verb and fail a streaming session which
triggers a repair failure, for example, the peer node is gone or
stopped. Switch to use log level warn instead of level error.

Fixes repair_additional_test.py:RepairAdditionalTest.repair_kill_3_test

Fixes: #1335
Message-Id: <406fb0c4a45b81bd9c0aea2a898d7ca0787b23e9.1465979288.git.asias@scylladb.com>
(cherry picked from commit de0fd98349)
2016-06-18 11:43:39 +03:00
Asias He
80e4e2da38 streaming: Switch log level to warn instead of error
dtest takes error level log as serious error. It is not a serious error
for streaming to fail to send a verb and fail a streaming session, for
example, the peer node is gone or stopped. Switch to use log level warn
instead of level error.

Fixes repair_additional_test.py:RepairAdditionalTest.repair_kill_3_test

Fixes: #1335
Message-Id: <0149d30044e6e4d80732f1a20cd20593de489fc8.1465979288.git.asias@scylladb.com>
(cherry picked from commit 94c9211b0e)
2016-06-18 11:43:28 +03:00
Asias He
29da6fa5b4 streaming: Fix indention in do_send_mutations
Message-Id: <bc8cfa7c7b29f08e70c0af6d2fb835124d0831ac.1464857352.git.asias@scylladb.com>
(cherry picked from commit 96463cc17c)
2016-06-18 11:43:11 +03:00
Pekka Enberg
ad1af17aad release: prepare for 1.1.3 2016-06-16 13:39:45 +03:00
Pekka Enberg
7e052a4e91 service/storage_service: Make do_isolate_on_error() more robust
Currently, we only stop the CQL transport server. Extract a
stop_transport() function from drain_on_shutdown() and call it from
do_isolate_on_error() to also shut down the inter-node RPC transport,
Thrift, and other communications services.

Fixes #1353

(cherry picked from commit d72c608868)

Conflicts:
	service/storage_service.cc
2016-06-16 13:37:06 +03:00
Pekka Enberg
787d8f88d3 release: prepare for 1.1.3.rc1 2016-06-15 10:19:21 +03:00
Pekka Enberg
d4e1a25858 Merge "Rebase "Rewrite shared sstables soon after startup" to 1.1" from Glauber
"This is a rebase of Nadav's "Rewrite shared sstables soon after startup" to 1.1.
Aside from minor issues, there are two major conflicts with 1.1:

 1) Parallel compactions within the same CF,
 2) Cache invalidation fixes.

Because 2) is fixing an actual bug that would bite us in 1.1 as well, I have
decided to, exercising my discretion, backport that as well. It has the nice
side-effect of getting the database.cc-side of the conflicts disappear and
apply cleanly. The patches are also fairly simple.

On the other hand, 1) has no place in 1.1, and I have manually rebased it.

I have executed the following dtests in the updated tree, which are all passing:

compaction_test.py:TestCompaction_with_LeveledCompactionStrategy.compaction_delete_test
compaction_test.py:TestCompaction_with_SizeTieredCompactionStrategy.compaction_delete_test
compaction_test.py:TestCompaction_with_LeveledCompactionStrategy.compaction_delete_2_test
compaction_test.py:TestCompaction_with_SizeTieredCompactionStrategy.compaction_delete_2_test
compaction_additional_test.py"
2016-06-15 09:44:28 +03:00
Nadav Har'El
5243c77fcb Rewrite shared sstables soon after startup
Several shards may share the same sstable - e.g., when re-starting scylla
with a different number of shards, or when importing sstables from an
external source. Sharing an sstable is fine, but it can result in excessive
disk space use because the shared sstable cannot be deleted until all
the shards using it have finished compacting it. Normally, we have no idea
when the shards will decide to compact these sstables - e.g., with size-
tiered-compaction a large sstable will take a long time until we decide
to compact it. So what this patch does is to initiate compaction of the
shared sstables - on each shard using it - so that a soon as possible after
the restart, we will have the original sstable is split into separate
sstables per shard, and the original sstable can be deleted. If several
sstables are shared, we serialize this compaction process so that each
shard only rewrites one sstable at a time. Regular compactions may happen
in parallel, but they will not not be able to choose any of the shared
sstables because those are already marked as being compacted.

Commit 3f2286d0 increased the need for this patch, because since that
commit, if we don't delete the shared sstable, we also cannot delete
additional sstables which the different shards compacted with it. For one
scylla user, this resulted in so much excessive disk space use, that it
literally filled the whole disk.

After this patch commit 3f2286d0, or the discussion in issue #1318 on how
to improve it, is no longer necessary, because we will never compact a shared
sstable together with any other sstable - as explained above, the shared
sstables are marked as "being compacted" so the regular compactions will
avoid them.

Fixes #1314.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1465406235-15378-1-git-send-email-nyh@scylladb.com>
Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

1.1 Rebase: Glauber Costa <glauber@scylladb.com>

Conflicts:
	sstables/compaction_manager.cc

Rebase notes: this patch conflicts heavily with a previous series that allows
parallel compactions on the same CF. That machinery is not 1.1 material, so I
had to work around it. Most relevant changes:
 - changed tasks vector to a list (for consistency with parallel compaction)
 - each shard will compact task_nr + 1 tasks, the extra one being the rewrite
   task.
2016-06-14 13:02:42 -04:00
Tomasz Grabiec
d8768257fe row_cache: Make stronger guarantees in clear/invalidate
Correctness of current uses of clear() and invalidate() relies on fact
that cache is not populated using readers created before
invalidation. Sstables are first modified and then cache is
invalidated. This is not guaranteed by current implementation
though. As pointed out by Avi, a populating read may race with the
call to clear(). If that read started before clear() and completed
after it, the cache may be populated with data which does not
correspond to the new sstable set.

To provide such guarantee, invalidate() variants were adjusted to
synchronize using _populate_phaser, similarly like row_cache::update()
does.

Conflicts:
    database.cc

1.1 Rebase: Glauber Costa <glauber@scylladb.com>

Rebase notes: the conflicts are due to unrelated changes in how we
invoke the flush functions for streaming memtables. Those changes
are definitely not needed by 1.1
2016-06-14 12:03:00 -04:00
Tomasz Grabiec
c2c4e842fc row_cache: Implement clear() using invalidate()
Reduces code duplication.

1.1 Rebase: Glauber Costa <glauber@scylladb.com> - no changes needed.
2016-06-14 12:03:00 -04:00
Raphael S. Carvalho
61d4bf3800 db: fix read consistency after refresh
If sstable loaded by refresh covers a row that is cached by the
column family, read query may fail to return consistent data.
What we should do is to clear cache for the column family being
loaded with new sstables.

Fixes #1212.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <a08c9885a5ceb0b2991e40337acf5b7679580a66.1464072720.git.raphaelsc@scylladb.com>

1.1 Rebase: Glauber Costa <glauber@scylladb.com> - no changes needed
2016-06-14 12:02:27 -04:00
Pekka Enberg
1df0e435e8 utils/exceptions: Whitelist EEXIST and ENOENT in should_stop_on_system_error()
There are various call-sites that explicitly check for EEXIST and
ENOENT:

  $ git grep "std::error_code(E"
  database.cc:                            if (e.code() != std::error_code(EEXIST, std::system_category())) {
  database.cc:            if (e.code() != std::error_code(ENOENT, std::system_category())) {
  database.cc:        if (e.code() != std::error_code(ENOENT, std::system_category())) {
  database.cc:                            if (e.code() != std::error_code(ENOENT, std::system_category())) {
  sstables/sstables.cc:            if (e.code() == std::error_code(ENOENT, std::system_category())) {
  sstables/sstables.cc:            if (e.code() == std::error_code(ENOENT, std::system_category())) {

Commit 961e80a ("Be more conservative when deciding when to shut down
due to disk errors") turned these errors into a storage_io_exception
that is not expected by the callers, which causes 'nodetool snapshot'
functionality to break, for example.

Whitelist the two error codes to revert back to the old behavior of
io_check().
Message-Id: <1465454446-17954-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit 8df5aa7b0c)
2016-06-14 15:27:22 +03:00
Avi Kivity
4e6f09ad95 Be more conservative when deciding when to shut down due to disk errors
Currently we only shut down on EIO.  Expand this to shut down on any
system_error.

This may cause us to shut down prematurely due to a transient error,
but this is better than not shutting down due to a permanent error
(such as ENOSPC or EPERM).  We may whitelist certain errors in the future
to improve the behavior.

Fixes #1311.
Message-Id: <1465136956-1352-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 961e80ab74)
2016-06-14 15:27:18 +03:00
Pekka Enberg
426316a4b7 release: prepare for 1.1.2 2016-06-06 14:47:32 +03:00
Pekka Enberg
3289010910 Revert "dist/common/scripts: update SET_NIC when --setup-nic passed to scylla_sysconfig_setup"
This reverts commit 73fa36b416.

Fixes #1301.
2016-06-06 14:46:32 +03:00
Asias He
52c9723e04 streaming: Reduce memory usage when sending mutations
Limit disk bandwidth to 5MB/s to emulate a slow disk:
echo "8:0 5000000" >
/cgroup/blkio/limit/blkio.throttle.write_bps_device
echo "8:0 5000000" >
/cgroup/blkio/limit/blkio.throttle.read_bps_device

Start scylla node 1 with low memory:
scylla -c 1 -m 128M --auto-bootstrap false

Run c-s:
taskset -c 7 cassandra-stress write duration=5m cl=ONE -schema
'replication(factor=1)' -pop seq=1..100000  -rate threads=20
limit=2000/s -node 127.0.0.1

Start scylla node 2 with low memory:
scylla -c 1 -m 128M --auto-bootstrap true

Without this patch, I saw std::bad_alloc during streaming

ERROR 2016-06-01 14:31:00,196 [shard 0] storage_proxy - exception during
mutation write to 127.0.0.1: std::bad_alloc (std::bad_alloc)
...
ERROR 2016-06-01 14:31:10,172 [shard 0] database - failed to move
memtable to cache: std::bad_alloc (std::bad_alloc)
...

To fix:

1. Apply the streaming mutation limiter before we read the mutation into
memory to avoid wasting memory holding the mutation which we can not
send.

2. Reduce the parallelism of sending streaming mutations. Before we send each
range in parallel, after we send each range one by one.

   before: nr_vnode * nr_shard * (send_info + cf.make_reader memory usage)

   after: nr_shard * (send_info + cf.make_reader memory usage)

We can at least save memory usage by the factor of nr_vnode, 256 by
default.

In my setup, fix 1) alone is not enough, with both fix 1) and 2), I saw
no std::bad_alloc. Also, I did not see streaming bandwidth dropped due
to 2).

In addition, I tested grow_cluster_test.py:GrowClusterTest.test_grow_3_to_4,
as described:

https://github.com/scylladb/scylla/issues/1270#issuecomment-222585375

With this patch, I saw no std::bad_alloc any more.

Fixes: #1270

Message-Id: <7703cf7a9db40e53a87f0f7b5acbb03fff2daf43.1464785542.git.asias@scylladb.com>
(cherry picked from commit 206955e47c)

Conflicts:
	streaming/stream_transfer_task.cc
2016-06-02 11:08:19 +03:00
Pekka Enberg
3cc91eeb84 release: prepare for 1.1.1 2016-05-26 12:58:39 +03:00
Pekka Enberg
d67ee37bbc Update seastar submodule
* seastar 85bdfb7...b80564d (1):
  > reactor: advertise the logging_failures metric as a DERIVE counter
2016-05-26 12:57:54 +03:00
Raphael S. Carvalho
fcbe43cc87 sstables: optimize leveled compaction strategy
Leveled compaction strategy is doing a lot of work whenever it's asked to get
a list of sstables to be compacted. It's checking if a sstable overlaps with
another sstable in the same level twice. First, when adding a sstable to a
list with sstables at the same level. Second, after adding all sstables to
their respective lists.

It's enough to check that a sstable creates an overlap in its level only once.
So I am changing the code to unconditionally insert a sstable to its respective
list, and after that, it will call repair_overlapping_sstables() that will send
any sstable that creates an overlap in its level to L0 list.

By the way, the optimization isn't in the compaction itself, instead in the
strategy code that gets a set of sstables to be compacted.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <8c8526737277cb47987a3a5dbd5ff3bb81a6d038.1461965074.git.raphaelsc@scylladb.com>
(cherry picked from commit ae95ce1bd7)
2016-05-24 15:56:10 +03:00
Pekka Enberg
ef79310b3c dist/docker: Use Scylla 1.1 RPM repository 2016-05-23 10:16:13 +03:00
Pekka Enberg
f7e81c7b7d dist/docker: Fetch RPM repository from Scylla web site
Fix the hard-coded Scylla RPM repository by downloading it from Scylla
web site. This makes it easier to switch between different versions.

Message-Id: <1463981271-25231-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 8a7197e390)
2016-05-23 10:15:17 +03:00
Raphael S. Carvalho
54224dfaa0 tests: check that overlapping sstable has its level changed to 0
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit cbc2e96a58)
2016-05-20 13:42:32 +03:00
Raphael S. Carvalho
e30199119c db: fix migration of sstables with level greater than 0
Refresh will rewrite statistics of any migrated sstable with level
> 0. However, this operation is currently not working because O_EXCL
flag is used, meaning that create will fail.

It turns out that we don't actually need to change on-disk level of
a sstable by overwriting statistics file.
We can only set in-memory level of a sstable to 0. If Scylla reboots
before all migrated sstables are compacted, leveled strategy is smart
enough to detect sstables that overlap, and set their in-memory level
to 0.

Fixes #1124.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit ee0f66eef6)
2016-05-20 13:42:27 +03:00
Raphael S. Carvalho
07ce4ec032 main: stop compaction manager earlier
Avi says:
"During shutdown, we prevent new compactions, but perhaps too late.
Memtables are flushed and these can trigger compaction."

To solve that, let's stop compaction manager at a very early step
of shutdown. We will still try to stop compaction manager in
database::stop() because user may ask for a shutdown before scylla
was fully started. It's fine to stop compaction manager twice.
Only the first call will actually stop the manager.

Fixes #1238.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <c64ab11f3c91129c424259d317e48abc5bde6ff3.1462496694.git.raphaelsc@scylladb.com>
(cherry picked from commit bf18025937)
2016-05-20 13:41:11 +03:00
Asias He
c7c18d9c0c gms: Optimize gossiper::is_alive
In perf-flame, I saw in

service::storage_proxy::create_write_response_handler (2.66% cpu)

  gossiper::is_alive takes 0.72% cpu
  locator::token_metadata::pending_endpoints_for takes 1.2% cpu

After this patch:

service::storage_proxy::create_write_response_handler (2.17% cpu)

  gossiper::is_alive does not show up at all
  locator::token_metadata::pending_endpoints_for takes 1.3% cpu

There is no need to copy the endpoint_state from the endpoint_state_map
to check if a node is alive. Optimize it since gossiper::is_alive is
called in the fast path.

Message-Id: <2144310aef8d170cab34a2c96cb67cabca761ca8.1463540290.git.asias@scylladb.com>
(cherry picked from commit eb9ac9ab91)
2016-05-20 13:03:44 +03:00
Asias He
2a4582ab9f token_metadata: Speed up pending_endpoints_for
pending_endpoints_for is called frequently by
storage_proxy::create_write_response_handler when doing cql query.

Before this patch, each call to pending_endpoints_for involves
converting a multimap (std::unordered_multimap<range<token>,
inet_address>>) to map (std::unordered_map<range<token>,
std::unordered_set<inet_address>>).

To speed up the token to pending endpoint mapping search, a interval map
is introduced. It is faster than searching the map linearly and can
avoid caching the token/pending endpoint mapping.

With this patch, the operations per second drop during adding node
period gets much better.

Before:
45K to 10K

After:
45k to 38K

(The number is measured with the streaming code skipping to send data to
rule out the streaming factor.)

Refs: #1223
(cherry picked from commit 089734474b)
2016-05-20 13:03:31 +03:00
Asias He
ab5e23f6e7 dht: Add default constructor for token
It is needed to put token in to a boost interval_map in the following
patch.

(cherry picked from commit ee0585cee9)
2016-05-20 13:03:24 +03:00
Calle Wilund
fecea15a25 cql3::statements::cf_prop_defs: Fix compation min/max not handled
Property parsing code was looking at wrong property level
for initial guard statement.

Fixes #1257

Message-Id: <1462967584-2875-1-git-send-email-calle@scylladb.com>
(cherry picked from commit 5604fb8aa3)
2016-05-18 14:19:27 +03:00
Tomasz Grabiec
dce549f44f tests: Add unit tests for schema_registry
(cherry picked from commit 90c31701e3)
2016-05-16 10:52:02 +03:00
Tomasz Grabiec
26a3302957 schema_registry: Fix possible hang in maybe_sync() if syncer doesn't defer
Spotted during code review.

If it doesn't defer, we may execute then_wrapped() body before we
change the state. Fix by moving then_wrapped() body after state changes.

(cherry picked from commit 443e5aef5a)
2016-05-16 10:51:54 +03:00
Tomasz Grabiec
f796d8081b migration_manager: Fix schema syncing with older version
The problem was that "s" would not be marked as synced-with if it came from
shard != 0.

As a result, mutation using that schema would fail to apply with an exception:

  "attempted to mutate using not synced schema of ..."

The problem could surface when altering schema without changing
columns and restarting one of the nodes so that it forgets past
versions.

Fixes #1258.

Will be covered by dtest:

  SchemaManagementTest.test_prepared_statements_work_after_node_restart_after_altering_schema_without_changing_columns

(cherry picked from commit 8703136a4f)
2016-05-16 10:51:48 +03:00
Pekka Enberg
b850cb991c release: prepare for 1.1.0 2016-05-16 09:33:26 +03:00
Tomasz Grabiec
734cfa949a migration_manager: Invalidate prepared statements on every schema change
Currently we only do that when column set changes. When prepared
statements are executed, paramaters like read repair chance are read
from schema version stored in the statement. Not invalidating prepared
statements on changes of such parameters will appear as if alter took
no effect.

Fixes #1255.
Message-Id: <1462985495-9767-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit 13d8cd0ae9)
2016-05-12 09:18:00 +03:00
Calle Wilund
3606e3ab29 transport::server: Do not treat accept exception as fatal
1.) It most likely is not, i.e. either tcp or more likely, ssl
    negotiation failure. In any case, we can still try next
    connection.
2.) Not retrying will cause us to "leak" the accept, and then hang
    on shutdown.

Also, promote logging message on accept exception to "warn", since
dtest(s?) depend on seeing log output.

Message-Id: <1462283265-27051-4-git-send-email-calle@scylladb.com>
(cherry picked from commit 917bf850fa)
2016-05-10 19:26:29 +03:00
Calle Wilund
014284de00 cql_server: Use credentials_builder to init tls
Slightly cleaner, and shard-safe tls init.

Message-Id: <1462283265-27051-3-git-send-email-calle@scylladb.com>
(cherry picked from commit 437ebe7128)
2016-05-10 19:26:23 +03:00
Calle Wilund
f17764e74a messaging_service: Change tls init to use credentials_builder
To simplify init of msg service, use credendials_builder
to encapsulate tls options so actual credentials can be
more easily created in each shard.

Message-Id: <1462283265-27051-2-git-send-email-calle@scylladb.com>
(cherry picked from commit 58f7edb04f)
2016-05-10 19:26:18 +03:00
Avi Kivity
8643028d0c Update seastar submodule
* seastar 73d5583...85bdfb7 (4):
  > tests/mkcert.gmk: Fix makefile bug in snakeoil cert generator
  > tls_test: Add case to do a little checking of credentials_builder
  > tls: Add credentials_builder - copyable credentials "factory"
  > tls_test: Add test for large-ish buffer send/recieve
2016-05-10 19:24:49 +03:00
Avi Kivity
a35f1d765a Backport seastar iotune fixes
* seastar dab58e4...73d5583 (2):
  > iotune: don't coredump when directory fails to be created
  > iotune: improve recommendation in case we timeout

Fixes #1243.
2016-05-09 10:49:15 +03:00
Avi Kivity
3116a92b0e Point seastar submodule at scylla-seastar repository
Allows us to backport seastar fixes to branch-1.1.
2016-05-08 14:49:02 +03:00
Gleb Natapov
dad312ce0a tests: test for result row counting
Message-Id: <1462377579-2419-2-git-send-email-gleb@scylladb.com>
(cherry picked from commit f1cd52ff3f)
2016-05-06 13:32:36 +03:00
Gleb Natapov
3cae56f3e3 query: fix result row counting for results with multiple partitions
Message-Id: <1462377579-2419-1-git-send-email-gleb@scylladb.com>
(cherry picked from commit b75475de80)
2016-05-06 13:32:29 +03:00
Calle Wilund
656a10c4b8 storage_service: Add logging to match origin
Pointing out if CQL server is listing in SSL mode.
Message-Id: <1462368016-32394-2-git-send-email-calle@scylladb.com>

(cherry picked from commit 709dd82d59)
2016-05-06 13:30:29 +03:00
Calle Wilund
c04b3de564 messaging_service: Add logging to match origin
To announce rpc port + ssl if on.

Message-Id: <1462368016-32394-1-git-send-email-calle@scylladb.com>
(cherry picked from commit d8ea85cd90)
2016-05-06 13:26:01 +03:00
Gleb Natapov
4964fe4cf0 storage_proxy: stop range query with limit after the limit is reached
(cherry picked from commit 3039e4c7de)
2016-05-06 12:50:51 +03:00
Gleb Natapov
e3ad3cf7d9 query: put live row count into query::result
The patch calculates row count during result building and while merging.
If one of results that are being merged does not have row count the
merged result will not have one either.

(cherry picked from commit db322d8f74)
2016-05-06 12:50:47 +03:00
Gleb Natapov
cef40627a7 storage_proxy: fix calculation of concurrency queried ranges
(cherry picked from commit 41c586313a)
2016-05-06 12:50:37 +03:00
Gleb Natapov
995820c08a storage_proxy: add logging for range query row count estimation
(cherry picked from commit c364ab9121)
2016-05-06 12:50:32 +03:00
Calle Wilund
b78abd7649 auth: Make auth.* schemas use deterministic UUIDs
In initial implementation I figured this was not required, but
we get issues communicating across nodes if system tables
don't have the same UUID, since creation is forcefully local, yet
shared.

Just do a manual re-create of the scema with a name UUID, and
use migration manager directly.
Message-Id: <1462194588-11964-1-git-send-email-calle@scylladb.com>

(cherry picked from commit 6d2caedafd)
2016-05-03 10:49:16 +03:00
Calle Wilund
d47c62b51c messaging_service: Change init to use per-shard tls credentials
Fixes: #1220

While the server_credentials object is technically immutable
(esp with last change in seastar), the ::shared_ptr holding them
is not safe to share across shards.

Pre-create cpu x credentials and then move-hand them out in service
start-up instead.

Fixes assertion error in debug builds. And just maybe real memory
corruption in release.

Requires seastar tls change:
"Change server_credentials to copy dh_params input"

Message-Id: <1462187704-2056-1-git-send-email-calle@scylladb.com>
(cherry picked from commit 751ba2f0bf)
2016-05-02 15:37:43 +03:00
Pekka Enberg
bef19e7f9e release: prepare for 1.1.rc1 2016-04-29 08:49:10 +03:00
Takuya ASADA
3ec47fbcf0 dist/ubuntu: unofficial support Debian 8.4
Unofficial support for Debian 8.4.
Now we supported both ubuntu and debian, but keep directory name as 'dist/ubuntu' for now.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461006868-28273-1-git-send-email-syuu@scylladb.com>
2016-04-27 15:39:20 +03:00
Pekka Enberg
31090f3116 Merge "Fix for systemd support on Ubuntu, add Ubuntu 16.04 support" from Takuya
"This is bug fix for systemd support on Ubuntu, and add Ubuntu 16.04 support."
2016-04-27 15:37:25 +03:00
Takuya ASADA
1cfde50102 dist/ubuntu: support 16.04
Drop 'unsupported release' message on 16.04.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-04-27 18:06:59 +09:00
Takuya ASADA
988b7bcd3d dist/ubuntu: don't use ubuntu-toolchain-r/test ppa repo on recent versions of Ubuntu, since it has newer g++
On Ubuntu 15.04 and newer, official g++ package is >= g++-4.9.
So we don't need to use development repository, just use official package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-04-27 18:06:59 +09:00
Takuya ASADA
fa0b90b727 dist/ubuntu: add dependency for libsystemd-dev to handle startup correctly on recent versions of Ubuntu
To handle scylla startup correctly on systemd versions of Ubuntu, scylla requires to build with libsystemd-dev.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-04-27 18:06:59 +09:00
Takuya ASADA
eae881ff70 dist/ubuntu: skip dh_installinit --upstart-only on recent versions of Ubuntu
Since 16.04LTS does not support this argument anymore, drop it on recent version of Ubuntu which does not uses Upstart.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-04-27 18:06:59 +09:00
Takuya ASADA
d5efa02eab dist/ubuntu/dep: Drop python-support on Ubuntu 16.04
Ubuntu 16.04 seems dropped python-support, so remove it from thrift package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-04-27 18:06:59 +09:00
Takuya ASADA
e733c2aae8 dist/ubuntu/dep: use distribution's thrift-compiler-0.9.1 on newer versions of Ubuntu
Use distribution's thrift if version > 14.04LTS.
14.04LTS doesn't have thrift-compiler-0.9.1, use our version.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-04-27 18:06:59 +09:00
Avi Kivity
ad9e75a3fa Merge seastar upstream
* seastar 15a92cf...dab58e4 (6):
  > tls: Fix tls sink::put so it deals with larger packets
  > tls: Change server_credentials to copy dh_params input
  > seastar thread: allow the thread_scheduling_group's usage fraction to change
  > seastar::async allow passing an attribute
  > thread: document undocumented classes
  > fair_queue: fix inconsistency during renormalization
2016-04-27 10:39:09 +03:00
Avi Kivity
454512a272 dist/redhat: package scylla_kernel_check
Can't build rpm without this.
Message-Id: <1461683947-30356-1-git-send-email-avi@scylladb.com>
2016-04-27 08:38:48 +03:00
Tomasz Grabiec
61435108a5 query: Do not take arguments via ... in the visitor
Amnon reports that current code fails to compile on gcc 4.9:

distcc[9700] ERROR: compile /home/amnon/.ccache/tmp/query.tmp.localhost.localdomain.9673.ii on localhost failed
In file included from query.cc:30:0:
query-result-reader.hh: In instantiation of ‘void query::result_view::consume(const query::partition_slice&, ResultVisitor&&) [with ResultVisitor = query::result::calculate_row_count(const query::partition_slice&)::<anonymous struct>&]’:
query.cc:196:32:   required from here
query-result-reader.hh:184:21: error: cannot pass objects of non-trivially-copyable type ‘class clustering_key_prefix’ through ‘...’
                     visitor.accept_new_row(*row.key(), static_row, view);
                     ^
query-result-reader.hh:184:21: error: cannot pass objects of non-trivially-copyable type ‘class query::result_row_view’ through ‘...’
query-result-reader.hh:184:21: error: cannot pass objects of non-trivially-copyable type ‘class query::result_row_view’ through ‘...’
query-result-reader.hh:186:21: error: cannot pass objects of non-trivially-copyable type ‘class query::result_row_view’ through ‘...’
                     visitor.accept_new_row(static_row, view);
                     ^
query-result-reader.hh:186:21: error: cannot pass objects of non-trivially-copyable type ‘class query::result_row_view’ through ‘...’

Work around the problem by not using '...'.
Message-Id: <1460964042-2867-1-git-send-email-tgrabiec@scylladb.com>
2016-04-26 14:50:35 +03:00
Takuya ASADA
eb9bd3ee21 dist/common/scripts: show knowledge base URL when kernel is too old
To explain why this kernel is not supported, we need to show kb URL here.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461644708-32078-1-git-send-email-syuu@scylladb.com>
2016-04-26 14:43:10 +03:00
Takuya ASADA
05ac4bb99d dist/common/scripts: notice restart required after changing bootparameters
Fixes #1115

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459330851-32470-1-git-send-email-syuu@scylladb.com>
2016-04-26 14:41:49 +03:00
Tomasz Grabiec
88bb5fcb53 api: Fix error message
Keyspace and table names are separated by a single colon.
Message-Id: <1461600269-4070-1-git-send-email-tgrabiec@scylladb.com>
2016-04-26 08:40:28 +03:00
Takuya ASADA
e7f438eeae dist/ubuntu: Drop dependency to libthrift0, link it statically
Drop dependency to libthrift0 on installation time, link libthrift statically.
With this fix, we don't need to distribute libthrift0 deb package anymore to install scylla-server binary package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461594460-2403-2-git-send-email-syuu@scylladb.com>
2016-04-25 17:44:46 +03:00
Takuya ASADA
ec2ef467c8 configure.py: configure.py: add --static-thrift option to link libthrift statically
This is needed for Ubuntu packaging, to drop dependency to libthrift0 on installation time.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461594460-2403-1-git-send-email-syuu@scylladb.com>
2016-04-25 17:44:44 +03:00
Avi Kivity
af803a9149 Merge seastar upstream
* seastar 2b3c363...15a92cf (2):
  > smp: allow more than 128 in-flight operations on core-to-core queue
  > future: balance constructors and destructors in future_state<>

Fixes #1205.
2016-04-25 13:34:27 +03:00
Calle Wilund
cdd0f00de5 client_state: Remove unwarranted keyspace check
"has_keyspace_access" is not supposed to (according to origin)
verify that a keyspace exists. Remove.
It (and all others) are however supposed to check "ks" (name)
not empty. Add this.
Message-Id: <1461578072-24113-1-git-send-email-calle@scylladb.com>
2016-04-25 13:16:36 +03:00
Calle Wilund
49d3d79dfe sstables: Fix compilation error on boost 1.55
Message-Id: <1461067254-526-2-git-send-email-calle@scylladb.com>
2016-04-25 12:54:44 +03:00
Calle Wilund
9130b0de16 database.cc: Fix compilation error with boost 1.55
Message-Id: <1461067254-526-1-git-send-email-calle@scylladb.com>
2016-04-25 12:54:43 +03:00
Takuya ASADA
c657a431dc dist/common/scripts: Fix incorrect order to run scylla_sysconfig_setup on scylla_setup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461174517-12441-2-git-send-email-syuu@scylladb.com>
2016-04-25 11:09:49 +03:00
Takuya ASADA
9a99231f6b dist/common/scripts: On scylla_setup, skip showing 'lo' interface on sysconfig prompt
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461174517-12441-1-git-send-email-syuu@scylladb.com>
2016-04-25 11:09:48 +03:00
Takuya ASADA
611b0a3400 dist/common/scripts: Add kernel version check
Check kernel version at beginning of scylla_setup, show error when kernel is too old.
Use iotune --fs-check to check kernel.

Fixes #1116

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459886738-10882-1-git-send-email-syuu@scylladb.com>
2016-04-24 17:47:13 +03:00
Vlad Zolotarov
813ad4024f query_processor: account unprepared statements executions
Add the statistics counter for a number of unprepared statements
executions and expose it with collectd.

Since in our implementation a number of unprepared statements executions
equals to a number of executions of prepare() function we may simply
increment the new statistics counter every time query_processor::get_statement()
is called.

Fixes #1068

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1461503492-32228-1-git-send-email-vladz@cloudius-systems.com>
2016-04-24 16:55:15 +03:00
Avi Kivity
c6b5890eb2 Merge 2016-04-24 16:17:00 +03:00
Pekka Enberg
f6da9bc92b Merge "Additional mutations/queries related collectd metrics" from Vlad
"This series introduces some additional metrics (mostly) in a storage_proxy and
a database level that are meant to create a better picture of how data flows
in the cluster.

First of all where possible counters of each category (e.g. total writes in the storage
proxy level) are split into the following categories:
   - operations performed on a local Node
   - operations performed on remote Nodes aggregated per DC

In a storage_proxy level there are the following metrics that have this "split"
nature (all on a sending side):
   - total writes (attempts/errors)
   - writes performed as a result of a Read Repair logic
   - total data reads (attempts/completed/errors)
   - total digest reads (attempts/completed/errors)
   - total mutations data reads (attempts/completed/errors)

In a batchlog_manager:
   - writes performed as a result of a batchlog replay logic

Thereby if for instance somebody wants to get an idea of how many writes
the current Node performs due to user requested mutations only he/she has
to take a counter of total writes and subtract the writes resulted by Read
Repairs and batchlog replays.

On a receiving side of a storage_proxy we add the two following counters:
   - total number of received mutations
   - total number of forwarded mutations (attempts/errors)

In order to get a better picture of what is going on on a local Node
we are adding two counters on a database level:
   - total number of writes
   - total number of reads

Comparing these to total writes/reads in a storage_proxy may give a good
idea if there is an excessive access to a local DB for example."
2016-04-21 15:58:45 +03:00
Takuya ASADA
2bfc8e8c12 main: add tcp_syncookies sanity check
Check net.ipv4.tcp_syncookies, show error message when it set to 0.
Fixes #1118

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1460738415-3798-1-git-send-email-syuu@scylladb.com>
2016-04-21 14:55:26 +03:00
Pekka Enberg
3f1fcca3bc cql3: Fix DROP KEYSPACE error message when keyspace does not exist
Commit d3fe0c5 ("Refactor db/keyspace/column_family toplogy") changed
database::find_keyspace() to throw a std::nested_exception so the catch
block in migration_manager::announce_keyspace_drop() no longer catches
the exception. Fix the issue by explicitly checking if the keyspace
exists and throwing the correct exception type if it doesn't.

Fixes TestCQL.keyspace_test.
Message-Id: <1461218910-26691-1-git-send-email-penberg@scylladb.com>
2016-04-21 12:42:45 +02:00
Vlad Zolotarov
4ef5b11e9b batchlog_manager: add a counter for a total number of write attempts
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-04-21 11:29:21 +03:00
Vlad Zolotarov
97e5bfa815 database: add metrics for total writes and reads
This patch adds a counter of total writes and reads
for each shard.

It seems that nothing ensures that all database queries are
ready before database object is destroyed.
Make _stats lw_shared_ptr in order to ensure that the object is
alive when lambda gets to incrementing it.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-04-21 11:28:53 +03:00
Vlad Zolotarov
9bf8253412 storage_proxy: add read requests split counters
Add split (local Nodes, external Nodes aggregated per Nodes' DCs) counters
for the following read categories:
   - data reads
   - digest reads
   - mutation data reads

Each category is added attempts, completions and errors metrics.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-04-21 11:28:19 +03:00
Vlad Zolotarov
cbcbdc3b4a storage_proxy: add split counters for writes
Added split metrics for operations on a local Node and on external
Nodes aggregated per Nodes' DCs.

Added separate split counters for:
    - total writes attempts/errors
    - read repair write attempts (there is no easy way to separate errors
      at the moment)

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-04-21 11:28:15 +03:00
Vlad Zolotarov
c92654b281 storage_proxy: add counters for received and forwarded mutations
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-04-21 11:27:29 +03:00
Piotr Jastrzebski
8231385e0c sstables: Remove unused code from mp_row_consumer
_mutation_to_subscription is not used anywhere so
it should probably be removed.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <90ef62daee0c183b29dcb86d08843145d657ea38.1461179970.git.piotr@scylladb.com>
2016-04-20 23:10:43 +03:00
Raphael S. Carvalho
eb51c93a5a tests: fix use-after-free in sstable test
After commit a843aea547, a gate was introduced to make sure that
an asynchronous operation is finished before column family is
destroyed. A sstable testcase was not stopping column family,
instead it just removed column family from compaction manager.
That could cause an user-after-free if column family is destroyed
while the asynchronous operation is running. Let's fix it by
stopping column family in the test.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <ed910ec459c1752148099e6dc503e7f3adee54da.1461177411.git.raphaelsc@scylladb.com>
2016-04-20 22:08:08 +03:00
Pekka Enberg
7af9ac2880 Merge "Add support for User Defined Types" from Duarte
"This patchset enables support for user defined types,
 completing the functionality that was already in place.

 Fixes #426"
2016-04-20 21:26:03 +03:00
Yoav Kleinberger
1543253bfd scyllatop: differentiate metrics coming from different hosts
Fix issue #1173.
Previously scyllatop aggregated metrics coming from a cluster with many
hosts so that individual contributions could not be recognized. This is
now changed so that aggregation is also by hostname.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <8a4d8b82216d8c1aa855026ff31bcfd8bfac7e47.1461150261.git.yoav@scylladb.com>
2016-04-20 20:20:09 +02:00
Duarte Nunes
c04f8c239e udt: Enable user type query test case
This patch enables the test case for user defined types in
cql_query_test.

Fixes #426

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:07 +02:00
Duarte Nunes
bc90d6a730 udt: type_parser handles user defined types
This patch ensures type_parser can handle user defined types. It also
prefixes user_type_impl::make_name() with
org.apache.cassandra.db.marshal.UserType.

Fixes #631

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:07 +02:00
Duarte Nunes
b5a87f8bdc udt: Add unit test for user type schema changes
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:07 +02:00
Duarte Nunes
7911438de0 udt: Add grammar for altering user types
This patch adds support in Cql.g for the alter user type statement.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:07 +02:00
Duarte Nunes
fbf70e9bed udt: Add alter type statement
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:07 +02:00
Duarte Nunes
3e663cfa9a udt: Add capability to replace a user_type
This patch adds a function to abstract_type that locates the usage of
a given user_type and recursively returns an updated version of the
containing type containing the updated user type.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:06 +02:00
Duarte Nunes
6cb57a567f udt: Add grammar for dropping user types
This patch adds support in Cql.g for the drop user type statement.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:06 +02:00
Duarte Nunes
809b45e160 udt: Add drop type statement
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:02 +02:00
Calle Wilund
7f85373e15 cql3/drop_table_statement: Fix exception handling in access check
Tried to handle possibly benign exception in continuation, but
this is always thrown synchronously.
Fixes ttl_test dtest failures.

Message-Id: <1461154499-10674-1-git-send-email-calle@scylladb.com>
2016-04-20 15:49:04 +03:00
Duarte Nunes
66c60f03fe udt: Add references_user_type to abstract_type
This patch adds a virtual function to the abstract_type hierarchy to
tell whether a given type references the specified type. Needed to
implement the drop and alter type statements.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:07 +02:00
Duarte Nunes
6732da67ab udt: Add is_user_type function to abstract_type
This patch adds a function to identify a given abstract_type as a
user_type_impl.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:07 +02:00
Duarte Nunes
ddb4a4b29b udt: Implement as_cql3_type for user_type_impl
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
35a88b5d49 udt: Complete create_type_statement
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
d1f215b743 udt: Merge user defined type mutations
This patch implements the merge_types() function,
allowing mutations to user defined types to be applied.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
fdddcfb3ea udt: Fix user type compatibility check
A new user type is checked for compatibility against the previous
version of that type, so as to ensure that an updated field type
is compatible with the previous field type (e.g., altering a field
type from text to blob is allowed, but not the other way around).

However, it is also possible to add new fields to a user type. So,
when comparing a user type against its previous version, we should
also allow the current, new type to be longer than the previous one.
The current code instead allows for the previous type to be longer,
which this patch fixes.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
eae7f10906 map_difference: Allow on unordered_map
This patch changes the map_difference interface so difference()
can be called on on unordered_maps.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
7dc895e63d types: Add operator== for abstract_types
This patch allows abstract_types to be compared for equality. In
particular, it enables the indirect_equal_to<abstract_type> idiom.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
0aeb4dcaaf udt: Implement equals() for user_type_impl
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
d6d29f7c52 schema: Replace ad hoc func with indirect_equal_to
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
08a7bba4ed udt: Announce UDT migrations
This patch defines the member functions responsible for announce
create, update and drop user defined types migration.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
dd75fe8ec0 udt: Add mutations for user defined types
This patch implements mutations for user defined types.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
37a1547971 udt: Add migration notifications
This patch adds migration notifications for user defined types.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
c2e3e918e8 udt: Take name by ref when querying for an UDT
..so as not to incur in a copy.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
2c15778fe0 udt: Remove user_types field from keyspace
This field is superfluous and adds confusion regarding the user_types
field in the keyspace metadata.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
c7b3a4b144 udt: Parse user types system table
This patch loads and parses the user types system table during
bootstrap.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
f8d8dbdeb7 types: Don't wrap tombstone in an std::optional
All the callers of do_serialize_mutation_form pass a valid tombstone
that is converted into a non-empty optional. This happens even if the
tombstone is empty (tombstone::timestamp == api::missing_timestamp).

This patch fixes this by passing in a reference to the tombstone which
is convertible to bool, based on whether it is empty or not.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1460620528-3628-1-git-send-email-duarte@scylladb.com>
2016-04-20 09:22:01 +02:00
Duarte Nunes
40c1b29701 cql3: Implement contains relation
Although it doesn't work in the absence of secondary indexes,
now we provide the same error messages as origin when trying to use
the contains relation.

Fixes #1158

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1461088626-26958-1-git-send-email-duarte@scylladb.com>
2016-04-20 09:22:25 +03:00
Pekka Enberg
4ed702f0da Merge "Authorizer support" from Calle
"Conversion/implementation of "authorizer" code from origin, handling
 permissions management for users/resources.

 Default implementation keeps mapping of <user.resource>->{permissions}
 in a table, contents of which is cached for slightly quicker checks.

 Adds access control to all (existing) cql statements.
 Adds access management support to the CQL impl. (GRANT/REVOKE/LIST)

 Verified manually and with dtest auth_test.py. Note that several of these
 still fail due to (unrelated) unimplemented features, like index, types
 etc.

 Fixes #1138"
2016-04-19 15:00:38 +03:00
Calle Wilund
4c246b5cc3 scylla.yaml: Move authorizer/authenticator options to supported section 2016-04-19 11:49:06 +00:00
Calle Wilund
9ed25a970e Cql.g: Permission statements parsing 2016-04-19 11:49:06 +00:00
Calle Wilund
3b101c6e19 cql3::statements::drop_user_statement: Drop all permissions for user 2016-04-19 11:49:06 +00:00
Calle Wilund
14cc47d8b9 cql3::statements::revoke_statement: Initial conversion 2016-04-19 11:49:06 +00:00
Calle Wilund
4e1ef3c1bc cql3::statements::grant_statement: Initial conversion 2016-04-19 11:49:05 +00:00
Calle Wilund
04c37def3a cql3::statements::list_permissions_statement: Initial conversion 2016-04-19 11:49:05 +00:00
Calle Wilund
fe23447f6f cql3::statements::permission_altering_statement: Inital conversion
Alter permission base typ
2016-04-19 11:49:05 +00:00
Calle Wilund
add2111c0a cql3::statements::authorizarion_statement: Initial conversion
Auth cql base type
2016-04-19 11:49:05 +00:00
Calle Wilund
3906dc9f0d cql3::statements: Change check_access to future<> + implement 2016-04-19 11:49:05 +00:00
Calle Wilund
dac6cf69eb service::client_state: Add authorization checkers 2016-04-19 11:49:05 +00:00
Calle Wilund
072acc68da validation: Add KS validation + convinence methods
Looking up local db.
2016-04-19 11:49:05 +00:00
Calle Wilund
a7e1af1c06 db::config: Add permissions cache entries/mark auth/perm as used 2016-04-19 11:49:05 +00:00
Calle Wilund
36bb40c205 auth::auth: Add authorizer initialization + permissions getter
Create and init authorizer object on start. Create thread local
permissions cache to front end the actual authorizer.
2016-04-19 11:49:05 +00:00
Calle Wilund
03568d0325 tests::cql_test_env: Fake logged in user in case test requires is. 2016-04-19 11:49:05 +00:00
Calle Wilund
ead1c882f8 utils::loading_cache: Version of the LoadingCache type used in origin
Simple, expiring, cache of potentially limited number of entries.
2016-04-19 11:49:05 +00:00
Calle Wilund
956ee87e12 auth::authenticator: Change "protected_resources" to return reference
It it an immutable static value anyway.
2016-04-19 11:49:05 +00:00
Calle Wilund
1f0bbf2d9a auth::authorizer: Initial conversion
Main authorization endpoint. Default (and only) real authorizer
keeps a mapping resource -> permission sets in system table
2016-04-19 11:49:04 +00:00
Benoît Canet
e17795d2dd scylla_dev_mode_setup: Unify --developer-mode prompt and parsing
Fixes: #1194

Signed-of-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1461002978-5379-2-git-send-email-benoit@scylladb.com>
2016-04-19 09:38:03 +03:00
Takuya ASADA
f6252be0c1 utils: fix compilation error on utils/exceptions.hh
It doesn't able to find std::system_error due to missing header.

Fixes #1202

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461006884-28316-1-git-send-email-syuu@scylladb.com>
2016-04-19 09:37:31 +03:00
Raphael S. Carvalho
bf03cd1ea6 sstables: kill unused code from size tiered strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <485b1e49419cb052218ab4558f27270ce3bd03b4.1460761821.git.raphaelsc@scylladb.com>
2016-04-19 08:46:06 +03:00
Raphael S. Carvalho
29db5f5e1f sstables: move compaction strategy code to a new source file
Moving compaction strategy code from sstables/compaction.cc to
sstables/compaction_strategy.cc
That improves readability. Strategy code should be separated
from the generic compaction code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <5af6fc8f7321351a071fc0ce03c80ffea21f8396.1460761821.git.raphaelsc@scylladb.com>
2016-04-19 08:45:43 +03:00
Pekka Enberg
a3a0404d33 Merge seastar upstream
* seastar 2185f37...2b3c363 (1):
  > net/tls: Fix compilation with older GnuTLS versions
2016-04-19 08:43:36 +03:00
Calle Wilund
c446fe50e6 tuple_hash: Add convinence operator for two arguments (non-pair) 2016-04-18 13:51:15 +00:00
Calle Wilund
f0d2efd206 data_value: Add constructor from unordered_set<> 2016-04-18 13:51:15 +00:00
Calle Wilund
690c7207fe cql3::untyped_result_set: Add get_set<> method
Gets a value as a, you guessed it, set.
2016-04-18 13:51:15 +00:00
Calle Wilund
443af44f24 log: Add output operator for std::exception&/std::system_error& 2016-04-18 13:51:15 +00:00
Calle Wilund
ca7d339110 auth::authenticated_user: Add copy/move constructors 2016-04-18 13:51:15 +00:00
Calle Wilund
d3a9650646 auth::permission_set: Add < operator 2016-04-18 13:51:15 +00:00
Calle Wilund
c93d114949 auth::permission: Add stringizers + move sets into namespace 2016-04-18 13:51:15 +00:00
Calle Wilund
6e09920f93 auth::data_resource: Fix to_string to match origin 2016-04-18 13:51:15 +00:00
Calle Wilund
bb96e5bd66 auth::data_resource: Move declaration of "resource_ids" 2016-04-18 13:51:15 +00:00
Takuya ASADA
2eb91421eb dist/ami: Show correct login message when scylla-ami-setup.service is still running
While scylla-ami-setup.service is running, login message says "run systemctl status scylla-server" to see status, but it actually never launched yet.

This patch fixes the message to notice RAID construction is running, and 'systemctl status scylla-ami-setup' is the correct way to see status.

Fixes #1035

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1460660628-10103-2-git-send-email-syuu@scylladb.com>
2016-04-18 15:22:02 +03:00
Takuya ASADA
07a6057c03 dist/ami: fix incorrect service name on .bash_profile
Ubuntu's service name on .bash_profile is incorrect, fix it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1460660628-10103-1-git-send-email-syuu@scylladb.com>
2016-04-18 15:21:48 +03:00
Tomasz Grabiec
45527fcffa Merge branch 'glommer/issue-1144-v5'
From Glauber:

There are current some outstanding issues with the throttling code. It's
easier to see them with the streaming code, but at least one of them is general.

One of them is related to situations in which the amount of memory available
leaves only one memtable fitting in memory. That would only happen with the
general code if we set the memtable cleanup threshold to 100 % - and I don't
even know if it is valid - but will happen quite often with the streaming code.
If that happens, we'll start throttling when that memtable is being written,
but won't be able to put anything else in its place - leading to unnecessary
throttling.

The second, and more serious, happens when we start throttling and the amount
of available memory is not at least 1MB. This can deadlock the database in
the sense that it will prevent any request from continuing, and in turn causing
a flush due to memtable size. It is a good practice anyway to always guarantee
progress.

Fixes #1144
2016-04-18 12:20:13 +02:00
Gleb Natapov
f3b515052b udt: fix error generation if accessed type is not udt
Fixes #1198
Message-Id: <1460884314-3717-2-git-send-email-gleb@scylladb.com>
2016-04-18 12:45:03 +03:00
Duarte Nunes
ece89069dd udt: Implement to_string() for selectable
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1460884314-3717-1-git-send-email-gleb@scylladb.com>
2016-04-18 12:44:48 +03:00
Pekka Enberg
edf7f098e2 Merge "Fix query of collection cell with all items deleted" from Tomek 2016-04-18 11:01:24 +03:00
Tomasz Grabiec
2e08d0f698 Merge branch 'dev/gleb/logging'
Logging improvements from Gleb.
2016-04-15 19:03:44 +02:00
Tomasz Grabiec
89bc32b020 tests: Add test for query of collection with deleted item 2016-04-15 18:14:05 +02:00
Tomasz Grabiec
c69d0a8e87 mutation_partition: Fix collection emptiness check
Broken by f15c380a4f.

This resulted in empty collection being returned in the results
instead of no collection.

Fixes org.apache.cassandra.cql3.validation.entities.CollectionsTest
from cassandra-unit-tests.
2016-04-15 18:14:05 +02:00
Tomasz Grabiec
b0d4782016 types: Add default argument values to is_any_live() 2016-04-15 18:14:05 +02:00
Avi Kivity
0de32ab120 Merge seastar upstream
* seastar 2aeb9dd...2185f37 (15):
  > reactor: avoid issuing systemwide memory barriers in parallel
  > Revert "Use sys_membarrier() when available"
  > Merge "Various exception-safety fixes" from Tomasz
  > future-util: make map reduce exception safe
  > collectd: do not give up after a failure
  > future-util: make repeat_until_value exception safe
  > rpc: do not block connection when unknown verbs is received
  > rpc: do not wait for a reply after timeout
  > rpc: move connection stats to base class
  > core/reactor: Handle io_submit failures inside flush_pending_aio
  > apps/iotune: add --fs-check option to use iotune for kernel version check
  > Merge "Some exception safety patches" from Paweł
  > tls: Fix conversion of dh_params::level to gnutls_sec_param_t
  > core: posix_thread: Mark start_routine as noexcept
  > fair_queue: better overflow protection
2016-04-15 16:06:53 +03:00
Pekka Enberg
3f2286d02e Merge "Delete compacted sstables atomically" from Avi
"If we compact sstables A, B into a new sstable C we must either delete both
A and B, or none of them.  This is because a tombstone in B may delete data
in A, and during compaction, both the tombstone and the data are removed.
If only B is deleted, then the data gets resurrected.

Non-atomic deletion occurs because the filesystem does not support atomic
deletion of multiple files; but the window for that is small and is not
addressed in this patchset.  Another case is when A is shared across
multiple shards (as is the case when changing shard count, or migrating
from existing Cassandra sstables).  This case is covered by this patchset.

Fixes #1181."
2016-04-14 22:04:15 +03:00
Glauber Costa
9c87ae3496 throttle: always release at least one request if we are below the limit
Our current throttling code releases one requests per 1MB of memory available
that we have. If we are below the memory limit, but not by 1MB or more, then
we will keep getting to unthrottle, but never really do anything.

If another memtable is close to the flushing point, those requests may be
exactly the ones that would make it flush. Without them, we'll freeze the
database.

In general, we need to always release at least one request to make sure that
progress is always achieved.

This fixes #1144

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-14 13:13:15 -04:00
Gleb Natapov
9801d69d53 storage_proxy: add query result row count to brief format
Report number of rows in brief reporting format, but only if
we can count them without linearizing result's buffer.
2016-04-14 19:26:00 +03:00
Gleb Natapov
53993527ed storage_proxy: move verbose query result printing into separate logger
If query result is large tracing cannot be done since printing the result takes
too much time and space.
2016-04-14 19:26:00 +03:00
Gleb Natapov
46e5d05220 storage_proxy: cleanup query logging.
Since commit c1cffd06 logger catch errors internally, so no need to
catch most of them at the top level. Only those that can happen during
parameter evaluation can reach here. Change parameters to not throw
too.
2016-04-14 19:26:00 +03:00
Gleb Natapov
15ebe5e4e5 query: add calculate_row_count function to query::result 2016-04-14 19:26:00 +03:00
Gleb Natapov
f47b2dad18 query: add lazy printer to query::result
query::result transformation to printable form is very heavy operation
that allocates memory and thus can fail. Add a class to query::result that
can be used with logger to push to string conversion when output is
performed.
2016-04-14 19:26:00 +03:00
Glauber Costa
2c5dfe08c1 memtable_list: make sure at least two memtables are available
This is usually not a problem for the main memtable list - although it can be,
depending on settings, but shows up easily for the streaming memtables list.

We would like to have at least two memtables, even if we have to cut it short.
If we don't do that, one memtable will have use all available memory and we'll
force throttling until the memtable gets totally flushed.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-14 12:12:50 -04:00
Glauber Costa
1daede7396 unnest throttle_state
throttle_state is currently a nested member of database, but there is no
particular reason - aside from the fact that it is currently only ever
referenced by the database for us to do so.

We'll soon want to have some interaction between this and the column family, to
allow us to flush during throttle. To make that easier, let's unnest it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-14 12:12:50 -04:00
Glauber Costa
39def369ce move information about memtables' region group inside memtable list
This is a preparation patch so we can move the throttling infrastructure inside
the memtable_list. To do that, the region group will have to be passed to the
throttler so let's just go ahead and store it.

In consequence of that, all that the CF has to tell us is what is the current
schema - no longer how to create a new memtable.

Also, with a new parameter to be passed to the memtable_list the creation code
gets quite big and hard to follow. So let's move the creation functions to a
helper.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-14 12:12:50 -04:00
Avi Kivity
a843aea547 db: delete compacted sstables atomically
If sstables A, B are compacted, A and B must be deleted atomically.
Otherwise, if A has data that is covered by a tombstone in B, and that
tombstone is deleted, and if B is deleted while A is not, then the data
in A is resurrected.

Fixes #1181.
2016-04-14 17:14:26 +03:00
Avi Kivity
3798d04ae8 sstables: convert sstable::mark_for_deletion() to atomic deletion infrastructure
All deletions must go through the same data structure, or some atomic
deletions will never be satisified.
2016-04-14 17:14:26 +03:00
Avi Kivity
e43dbac836 main: cancel pending atomic deletions on shutdown
A shared sstable must be compacted by all shards before it can be deleted.
Since we're stoping, that's not going to happen.  Cancel those pending
deletions to let anyone waiting on them to continue.
2016-04-14 17:14:26 +03:00
Avi Kivity
2ba584db8d sstables: add delete_atomically(), for atomically deleting multiple sstables
When we compact a set of sstables, we have to remove the set atomically,
otherwise we can resurrect data if the following happens:

 insert data to sstable A
 insert tombstone to sstable B
 compact A+B -> C (removing both data and tombstone)
 delete B only
 read data from A

Since an sstable may be shared by multiple shard, and each shard performs
compaction at a different time, we need to defer deletion of an sstable
set until all shards agree that the set can be deleted.

An additional atomicity issue exists because posix does not provide a way
to atomically delete multiple files.  This issue is not addressed by this
patch.
2016-04-14 17:14:26 +03:00
Pekka Enberg
a1a9294d8c Merge "Support nodetool removenode force and status" from Asias
"With this series, we support all the 3 nodetool removenode commands, e.g.,

$ nodetool removenode 778948bf-6709-4eb5-80fe-bee911e9c3bf

$ nodetool removenode status
RemovalStatus: Removing token (-8969872965815280276). Waiting for
replication confirmation from [127.0.0.3,127.0.0.1].

$ nodetool removenode force
RemovalStatus: No token removals in process.

Tested with:

1)
- start 3 nodes
- inject data with
  cassandra-stress write no-warmup cl=TWO n=2000000 -schema 'replication(factor=2)'
- kill -9 node2
- wait for node2 to be in DOWN state
- run nodetool removenode host2_host_id on node1

2)
- start 3 nodes
- inject data with
  cassandra-stress write no-warmup cl=TWO n=2000000 -schema 'replication(factor=2)'
- kill -9 node2
- wait for node2 to be in DOWN state
- run nodetool removenode host2_host_id on node1
- kill -9 node3
- nodetool removenode will wait forever since node3 is gonne, node3
  will never send the replication confirmation to node1
- run nodetool removenode force on node1
  nodetool removenode completes with the following error:
    $ nodetool removenode 31690b82-ebb0-4594-8bcf-1ce82b6e0f6e
    nodetool: Scylla API server HTTP POST to URL
    '/storage_service/remove_node' failed: nodetool removenode force is called by user
  nodetool removenode force completes sucessfully
    $ nodetool removenode force
    RemovalStatus: Removing token (-9171569494049085776). Waiting for
    replication confirmation from [127.0.0.3,127.0.0.1].

Fixes #1135."
2016-04-14 15:44:33 +03:00
Pekka Enberg
144d1e3216 dist/docker/redhat: Start up JMX proxy and include tools
Make the Docker image more user-friendly by starting up JMX proxy in the
background and install Scylla tools in the image. Also add a welcome
banner like we have with our AMI so that users have pointers to nodetool
and cqlsh, as well as our documentation.
Message-Id: <1460376059-3678-1-git-send-email-penberg@scylladb.com>
2016-04-14 15:41:21 +03:00
Pekka Enberg
355c3ea331 dist/docker/redhat: Make sure image builds against latest Scylla
Use "yum clean expire-cache" to make sure we build against the latest
Scylla release.
Message-Id: <1460374418-27315-1-git-send-email-penberg@scylladb.com>
2016-04-14 15:41:10 +03:00
Gleb Natapov
6f13715f8c storage_proxy: add logging to read executor creation path
Message-Id: <1460549369-29523-4-git-send-email-gleb@scylladb.com>
2016-04-14 14:58:02 +03:00
Gleb Natapov
14ecadb247 storage_proxy: add logging for mutation write path
Message-Id: <1460549369-29523-3-git-send-email-gleb@scylladb.com>
2016-04-14 14:57:29 +03:00
Gleb Natapov
dbb1217896 cl: enable logging for insufficient LOCAL_QUORUM consistency
Message-Id: <1460549369-29523-2-git-send-email-gleb@scylladb.com>
2016-04-14 14:56:58 +03:00
Gleb Natapov
dfdbb1e703 storage_proxy: move hack to make coordinator most preferable node for read into sorting function
This is kind of sorting, so it belongs there, but it also fixes a bug in
storage_proxy::get_read_executor() that assumes filter_for_query() do
not change order of nodes in all_nodes when extra replica is chosen.
Otherwise if coordinator ip happens to be last in all_nodes then it will
be chosen as extra replica and will be quired twice.
Message-Id: <1460549369-29523-1-git-send-email-gleb@scylladb.com>
2016-04-14 14:56:21 +03:00
Duarte Nunes
73e3b5ac5d udt: Fix user type compatibility check
A new user type is checked for compatibility against the previous
version of that type, so as to ensure that an updated field type
is compatible with the previous field type (e.g., altering a field
type from text to blob is allowed, but not the other way around).

However, it is also possible to add new fields to a user type. So,
when comparing a user type against its previous version, we should
also allow the current, new type to be longer than the previous one.
The current code instead allows for the previous type to be longer,
which this patch fixes.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1460627939-11376-12-git-send-email-duarte@scylladb.com>
2016-04-14 13:30:37 +03:00
Takuya ASADA
f98997120a dist: #!/bin/bash for all scripts
We choosed #!/bin/sh for shebang when we started to implement installer scripts, not bash.
After we started to work on Ubuntu, we found that we mistakenly used bash syntax on AMI script, it caused error since /bin/sh is dash on Ubuntu.
So we changed shebang to /bin/bash for the script, from that time we have both sh scripts and bash scripts.
(2f39e2e269)
If we use bash syntax on sh scripts, it won't work on Ubuntu but works on Fedora/CentOS, could be very easy to confusing.
So switch all scripts to #!/bin/bash. It will much safer.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1460594643-30666-1-git-send-email-syuu@scylladb.com>
2016-04-14 12:01:28 +03:00
Pekka Enberg
60352f810a Merge "Fixes for the reading of missing Summary" from Glauber
"This patchset contains some fixes spotted during post-merged review
by {Nad,}av{,i}. I don't consider any of them a must for backport to 1.0,
but since we haven't yet even backported the main series, might as well backport
everything.

It also includes some unit tests to make sure that they will be kept working
in the future."
2016-04-13 11:32:05 +03:00
Raphael S. Carvalho
beaacbda2e tests: test that leveled strategy was fixed
L1 wasn't being compacted into L2.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1a357896a448eafa7da4d28bc56fa02b89d4193e.1460508373.git.raphaelsc@scylladb.com>
2016-04-13 11:14:28 +03:00
Raphael S. Carvalho
c7b728e716 sstables: Fix leveled compaction strategy
There is a problem in the implementation of leveled compaction strategy that
prevents level 1 from being compacted into level 2, and so forth. As a result,
all sstables will only belong to either level 0 or 1. One of the consequences
is level 1 being overwhelmed by a huge amount of sstables.

The root of the problem is a conditional statement in the code that prevents a
single sstable, with level > 0, from being compacted into a subsequent level
that is empty or has no overlapping sstables.

Fixes #1180.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <9a4bffdb0368dea77b49c23687015ff5832299ab.1460508373.git.raphaelsc@scylladb.com>
2016-04-13 11:14:14 +03:00
Asias He
1e84699a64 api: Wire up storage_service removal_status and force_remove_completion
They are used by nodetool removenode:

$ nodetool removenode force
$ nodetool removenode status

For example:

$ nodetool removenode status
RemovalStatus: Removing token (-8969872965815280276). Waiting for
replication confirmation from [127.0.0.3,127.0.0.1].

$ nodetool removenode force
RemovalStatus: No token removals in process.

Tested with:

1)
- start 3 nodes
- inject data with
  cassandra-stress write no-warmup cl=TWO n=2000000 -schema 'replication(factor=2)'
- kill -9 node2
- wait for node2 to be in DOWN state
- run nodetool removenode host2_host_id on node1

2)
- start 3 nodes
- inject data with
  cassandra-stress write no-warmup cl=TWO n=2000000 -schema 'replication(factor=2)'
- kill -9 node2
- wait for node2 to be in DOWN state
- run nodetool removenode host2_host_id on node1
- kill -9 node3
- nodetool removenode will wait forever since node3 is gonne, node3
  will never send the replication confirmation to node1
- run nodetool removenode force on node1
  nodetool removenode completes with the following error:
    $ nodetool removenode 31690b82-ebb0-4594-8bcf-1ce82b6e0f6e
    nodetool: Scylla API server HTTP POST to URL
    '/storage_service/remove_node' failed: nodetool removenode force is called by user
  nodetool removenode force completes sucessfully
    $ nodetool removenode force
    RemovalStatus: Removing token (-9171569494049085776). Waiting for
    replication confirmation from [127.0.0.3,127.0.0.1].

Fixes 1135.
2016-04-13 14:53:28 +08:00
Asias He
891e947314 storage_service: Rename remove_node to removenode
nodetool uses removenode command to remove a node. Rename the
implementation in storage_service to match the command.
2016-04-13 14:53:28 +08:00
Asias He
9ffb95216d storage_service: Add force_remove_completion
It is needed by the

$ nodetool removenode force

command.
2016-04-13 14:53:28 +08:00
Asias He
7c7e5967f6 storage_service: Add get_removal_status
It is needed by the

$ nodetool removenode status

command.
2016-04-13 14:53:28 +08:00
Asias He
8d7cd07d6c storage_service: Add print info in confirm_replication
The message is rare but it is very useful to debug removenode operation.
2016-04-13 14:53:28 +08:00
Asias He
ffe91b5755 token_metadata: Do not assert in get_host_id
Throw an exception instead of assert.
2016-04-13 14:53:27 +08:00
Raphael S. Carvalho
c28d168619 sstables: allow user to specify max sstable size with leveled strategy
This change will allow user to specify the maximum size of a new sstable
created as a result of leveled compaction.

Example of using this setting:
ALTER TABLE ks.test5 with compaction = {'sstable_size_in_mb': '1000',
'class': 'LeveledCompactionStrategy'}

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <ebb9844401af74388bda12586c2435283f6d8db8.1460486043.git.raphaelsc@scylladb.com>
2016-04-13 09:13:33 +03:00
Raphael S. Carvalho
15246f31f7 sstables: fix incorrect sstable size when compression is enabled
Size of uncompressed sstable was being unconditionally used to determine
when to stop writing a table. When compression is enabled, compressed
size should be used instead. Problem affected Scylla when compression
and leveled strategy were used.

Fixes #1177.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <d9bf26def41fb33ca297f4127ce042b7f67adf96.1460484529.git.raphaelsc@scylladb.com>
2016-04-13 09:01:01 +03:00
Glauber Costa
60ab3b3f50 sstable_tests: make sure the generation of the Summary is sane
When we recreate the summary from a missing Summary, we should make
sure it is generated sanely, and that it resembles the Summary that
would have otherwise been there.

In this tests we'll grab one of the Summary tests we've been doing,
and just apply them to the non-existent Summary file. We expect
the same results on those cases. Plus, a new test is added with some
sanity checking.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-12 11:55:01 -04:00
Glauber Costa
114ba5e3a8 be robust against broken summary files
Now that we can boot without a Summary file, we can just as easily boot
with a broken one.

Suggested by Nadav, and it is actually very easy to do, so do it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-12 11:55:01 -04:00
Glauber Costa
72dc45999d review fixes for generate_summary
Spotted by Avi post-merge
1) Need to close the file
2) Should be using the parameter pc instead of the default_class

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-12 11:55:01 -04:00
Glauber Costa
f78f43850d clear components if reading toc fail
This shouldn't be a problem in practice, because if read_toc() fails,
the users will just tend to discard the sstable object altogether, and
not insist on using it.

However, if somebody does try to keep using it, a subsequent read_toc() could
theoretically have some components filled up leading the new reader to believe
the toc was populated successfully.

It is easier to just clear the _components set and never worry about it, than
trying to reason about whether or not that could happen.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-12 11:55:01 -04:00
Glauber Costa
0f41ef1b84 index_reader: avoid misleading parent name
Also add comments about the expected signature of IndexConsumer

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-12 11:15:11 -04:00
Takuya ASADA
1eebe8bce1 dist: Support systemd for Ubuntu 15.10
To share systemd unit file between Fedora/CentOS and Ubuntu, generate
systemd unit file on building time since Fedora/CentOS and Ubuntu has
sysconfdir on different place.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459779957-11007-1-git-send-email-syuu@scylladb.com>
2016-04-12 14:39:26 +03:00
Avi Kivity
715794cce6 sstables: filter sstables single-row read using first_key/last_key
Using leveled compaction strategy, only a few sstables will contain a
given key, so we need to filter out the rest.  Using the summary entries
to filter keys works if the key is before the first summary entry,
but does not work if it is after the last summary entry, because the last
summary entry does not represent the last key; so sstables that are
are towards the beginning of the ring are read even if they do not contain
the key, greatly reducing read performance.

Fix by consulting the summary's first_key/last_key entries before consulting
the summary entry array.
2016-04-12 10:33:17 +03:00
Pekka Enberg
64c9ebb962 Merge "More exception safety fixes" from Paweł
"This is the second part of exception safety fixes for issues
discovered using memory allocation failure injector."
2016-04-12 08:08:00 +03:00
Paweł Dziepak
d53354947c storage_proxy: mark hint_to_dead_endpoints() noexcept
Hints are currently unimplemented but there is code depending on the
fact that hint_to_dead_endpoints() doesn't throw.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-12 00:06:10 +01:00
Paweł Dziepak
209b373412 exceptions: make exception constructors noexcept
Some of the exceptions are not thrown but constructed and set to some
future. In such case if there is another exception thrown in the
constructor it won't be propagated properly as it will casue stack to be
unwind in the place where the future is set, not in the continuation
chain waiting for it.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-12 00:06:02 +01:00
Paweł Dziepak
b00a3a76cc transport: ignore errors during connection shutdown
If the other end of the connection has already disconnected the shutdown
will fail with ENOTCONN. The resulting exception is going to propagate
through the continuation chain that is supposed to shut the cql server
down preventing it from properly waiting for all outstanding
continuations.

The solution is to just ignore any errors that shutdown() may return.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-11 23:54:47 +01:00
Paweł Dziepak
0d3d0a3c08 gossiper: handle failures in gossiper thread creation
seastar::async() creates a seastar thread and to do that allocates
memory. That allocation, obviously, may fail so the error handling code
needs to be moved so that it also catches errors from thread creation.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-11 23:54:47 +01:00
Paweł Dziepak
c1cffd0639 log: try to report logger failure
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-11 23:54:47 +01:00
Paweł Dziepak
b75c4098f2 storage_proxy: catch all errors in abstract_read_executor::execute()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-11 23:52:13 +01:00
Paweł Dziepak
9cd3da496e transport: retry do_accept() in case of bad_alloc
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-11 23:52:13 +01:00
Paweł Dziepak
2db70cf912 database: remove throw() specifiers
Most of them are missing std::bad_alloc (which leads to aborts) and they
force the compiler to add unnecessary runtime checks.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-11 23:52:13 +01:00
Gleb Natapov
3734dcbace storage_proxy: cleanup data_read_resolver::resolve()
live_row_count is summed several times in the same function. Do it only
once.

--
v1->v2:
  - call get() on std::reference_wrapper<std::vector<partition>> to get
    to reference for moving out of it.

Message-Id: <20160411123829.GE21479@scylladb.com>
2016-04-11 17:13:48 +02:00
Pekka Enberg
7af46e41e5 Merge "CQL authentication implementation" from Calle
"Adds support for CQL commands to create, alter, drop and list users.
Verified manually and by relevant dtests.

With this patch set, scylla supports adding super/regular users and
run sessions logged in as these. Note however that since actual
authorization is still not implemented, no CF/KS is really protected
by the authentication beyond initial login.

Some fixes for lingering bugs in user management in the existing code
as well.

Fixes #1121"
2016-04-11 12:57:00 +03:00
Pekka Enberg
4e04805352 cql3: Make lexer and parser error messages compatible with Cassandra
The default recognition error messages in antlr C++ backend are
different from Java backend which makes Scylla's CQL error messages
incompatible with Cassandra. This makes it very hard to write CQL level
test cases which are portable between Scylla and Cassandra.

To fix the issue, override the most common lexer and parser error
messages to follow the convention set by the antlr Java backend. This
unlocks various test cases in AlterTest, for example.
Message-Id: <1460032883-14422-1-git-send-email-penberg@scylladb.com>
2016-04-11 12:35:53 +03:00
Calle Wilund
ceac4df164 Cql.g: Add create/drop/alter/list user parsing 2016-04-11 09:10:41 +00:00
Calle Wilund
b8bd77e621 cql3::list_users_statement: Initial conversion 2016-04-11 09:10:41 +00:00
Calle Wilund
adaf21403b cql3::drop_user_statement: Initial conversion 2016-04-11 09:10:41 +00:00
Calle Wilund
8732b3eed7 cql3::alter_user_statement: Initial conversion 2016-04-11 09:10:41 +00:00
Calle Wilund
da89189308 cql3::create_user_statement: Initial conversion 2016-04-11 09:10:41 +00:00
Calle Wilund
57f5bb854f cql3::authentication_statement: cql auth base class 2016-04-11 09:10:41 +00:00
Calle Wilund
cef52d1653 cql3::user_options: Add options wrapper type 2016-04-11 09:10:41 +00:00
Calle Wilund
7ebac35779 client_state: break up setting login/validation
transport::server uses client_state in a move-temporary-around
fashion. Having a setter that does continuation-bound validation
makes this messier. Break them up to separate "this" placement
from the actual validation continuation logic
2016-04-11 09:10:41 +00:00
Calle Wilund
83e2604bc6 client_state: Propagate login user in merge 2016-04-11 09:10:41 +00:00
Calle Wilund
3daf768a82 client_state : Add ensure_not_anonymous method 2016-04-11 09:10:41 +00:00
Calle Wilund
1d7930c4bd authenticated_user: implement "is_super"
Which also, unfortunately, must be a continuation. (Queries tables)
2016-04-11 09:10:41 +00:00
Calle Wilund
d9b176307f auth::authenticator: option<->string 2016-04-11 09:10:41 +00:00
Raphael S. Carvalho
8fe7524e46 sstables: enable leveled strategy feature to prevent L0 from falling behind
If level 0 falls behind, size tiered strategy is used on it to reduce overhead
until we can catch up on the higher levels.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <17bf15b7d12cd5dc652cc92939c0c68f921662a2.1459976469.git.raphaelsc@scylladb.com>
2016-04-11 11:52:00 +03:00
Nadav Har'El
92ef11ffaa stables_mutation_test: more compare keys not representations
Commit 0fc4c36952 missed one place where
keys were compared using their byte representation. Fix that.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459778074-10759-4-git-send-email-nyh@scylladb.com>
2016-04-11 11:36:17 +03:00
Nadav Har'El
9f9353ae5b sstable_mutation_test: another test for range tombstone merging
This is even a more elaborate tombstone merging unit test, with
3 levels of nesting, which did not pass with older range-tombstone
merging algorithms, and works with the current one.

I started with deletion of three nested levels of row -
aaa, aaa:bbb, and aaa:bbb::ccc. I then complicated the sstable
even further by adding additional middle-points with the same
timestamps (which we saw happening in some real-life sstables),
resulting in:

[
{"key": "pk",
 "cells": [["aaa:_","aaa:bba:_",1459438519943668,"t",1459438519],
           ["aaa:bba:_","aaa:bbb:_",1459438519943668,"t",1459438519],
           ["aaa:bbb:_","aaa:bbb:ccb:_",1459438519950348,"t",1459438519],
           ["aaa:bbb:ccb:_","aaa:bbb:ccc:_",1459438519950348,"t",1459438519],
           ["aaa:bbb:ccc:_","aaa:bbb:ccc:!",1459438519958850,"t",1459438519],
           ["aaa:bbb:ccc:!","aaa:bbb:ddd:!",1459438519950348,"t",1459438519],
           ["aaa:bbb:ddd:!","aaa:bbb:!",1459438519950348,"t",1459438519],
           ["aaa:bbb:!","aaa:!",1459438519943668,"t",1459438519]]}
]

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459778074-10759-3-git-send-email-nyh@scylladb.com>
2016-04-11 11:35:59 +03:00
Nadav Har'El
77a793048e sstable_mutation_test: strengthen tombstone_merging test
In the tombstone_merging test, we expected one row tombstone. But we did
not verify that in addition to that row tombstone, there is no other rows
(deleted or otherwise). It turns out that in the onld merging algorithm,
we did produce additional deleted rows which shouldn't have been there.

So this patch adds a test that there are no such additional deleted rows
beyond the one row tombstone we expect. The test passes with the new
range tombstone merging algorithm.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459778074-10759-2-git-send-email-nyh@scylladb.com>
2016-04-11 11:35:46 +03:00
Nadav Har'El
818f14f444 stable: overhaul (again) range tombstone merging
In commit 99ecda3c96, we overhauled the
way we read Cassandra's disjoint range tombstones, and convert them to
the overlapping whole-prefix tombstones which we support.

Unfortunately, while this algorithm worked correctly for a couple of
test cases, it did not for additional test cases. While the previous
algorithm could not generate "wrong" tombstones (it didn't generate things
it didn't see), it could generate redundant overlapping tombstones, and
missed some sanity checks about the correctness of the merge process.

In this patch, a new algorithm makes sure to not generate redundant
tombstones, and includes additional tests to ensure that we do not
mistakenly merge range tombstones which cannot actually be merged.

The following patches will include tests which failed with the previous
algorithm, and succeeds with this one.

I described the new algorithm on the ScyllaDB mailing list this way:

1. Have a stack of open ranges, start & timestamp for each (no end for
   each), and just one "end of last contiguous deletion"
Processing each range tombstone:
2. If the start of a range tombstone is not adjacent to the "end of last
   deletion", assert we have no open range on the stack (because we can
   never close those). In any case, set the "end of of last deletion" to
   the end of this tombstone.
3. If the current tombstone's timestamp is STRICTLY HIGHER than that on the
   top of the stack, push the new tombstone's start+timestamp to the stack.
   Note: If it was STRICTLY LOWER, throw error (it means the open range will
   never be closed).
4. If the current tombstone's end matches (i.e., closes row) of the start on
   the top of the stack, emit this tombstone and pop the stack.
When the row ends:
5. Assert the stack is empty.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459778074-10759-1-git-send-email-nyh@scylladb.com>
2016-04-11 11:35:23 +03:00
Avi Kivity
0c7f9917dc Merge seastar upstream
* seastar aa281bd...2aeb9dd (20):
  > memory: avoid exercising the reclaimers for oversized requests
  > tests: test cross-cpu free not underflowing live object counter
  > memory: fix live objects counter underflow due to cross-cpu free
  > core/reactor: Don't abort in allocate_aligned_buffer() on allocation failure
  > build: add --tests-debuginfo, to avoid stripping tests
  > connected_socket: Add buffer size arg to output()
  > scripts/posix_net_conf.sh: added a support for bonding interfaces
  > scripts/posix_net_conf.sh: move the NIC configuration code into a separate function
  > scripts/posix_net_conf.sh: implement the logic for selecting default MQ mode
  > scripts/posix_net_conf.sh: forward the interface name as a parameter
  > http/routes: Remove request failure logging to stderr
  > lowres_clock: Initialize _now when the clock is created
  > apps/iotune: fix broken URL
  > tutorial: expand and improve semaphore section
  > DPDK: support set RSS key to port_conf when hash_key_size is unknown
  > dpdk: aware of vmxnet3 max xmit frags and do linearizing
  > packet_util: insert out of order packet when map is empty
  > core: Fix use-after-free of scollectd::impl
  > futures: Optimize finally()
  > futures: Factor out exceptional path of finally()
2016-04-10 18:08:51 +03:00
Pekka Enberg
9b98278436 Merge "Be able to boot without a Summary" from Glauber
"Summary files are a relatively recent addition to Cassandra. I thought
that every SSTable converted to 2.1 would have them, but that does not
seem to be true. It's easy to generate a stream of files that will boot
in Cassandra 2.1 just fine, but not in Scylla as they will be missing
the Summary.

Cassandra can boot those files because they are robust against the Summary
not existing, and we should do the same.

Since we keep the Summary in memory, in case one does not exist we create a
memory copy of it from the Index - the filesystem is not touched. Hopefully,
compaction will run soon and the next time we boot we won't have to do such
thing.

Fixes #1170"
2016-04-09 20:38:57 +03:00
Pekka Enberg
992dab3fcb Merge "Fixes for mutation querying" from Tomek
"Fixes dtest failure of paging_test.py:TestPagingData.static_columns_paging_test"
2016-04-09 09:07:36 +03:00
Glauber Costa
8a50b027aa summary: generate one if it is not present
There are cases in which a Summary file will not be present, and imported
SSTables will have just the Index and Data files. In earlier versions of
Cassandra, a Summary didn't exist, so one may not be generated when migrating.

In Issue #1170, we can see an example of tables generated by CQLSSTableWriter,
and they lack a Summary. Cassandra is robust against this and can cope
perfectly with the Summary not existing. I will argue that we should do the
same.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-08 17:14:29 -04:00
Glauber Costa
4de26fdec8 sstables: allow read_toc to be called more than once
We do that by bailing immediately if we detect that the components
map is already populated. This allow us to call read_toc() earlier
if we need to - for instance, to inquire about the existence of the
Summary - without the need to re-read the components again later.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-08 17:14:29 -04:00
Glauber Costa
736e21222e sstables: avoid passing schema unnecessarily
for prepare_summary we can just pass the min interval as a parameter and
avoid having the schema do yet another hop. For sealing the summary, it
is completely unused and we can do away with it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-08 17:14:29 -04:00
Glauber Costa
0de3a32147 index reader: make index_consumer a template parameter
This is done so we can use other consumers. An example of that, is regeneration
of the Summary from an existing Index.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-08 17:14:29 -04:00
Glauber Costa
8453ff7788 make get_sstable_key_range an instance method
Because just creating an SSTable object does not generate any I/O,
get_sstable_key_range should be an instance method. The main advantage
of doing that is that we won't have to read the summary twice. The way
we're doing it currently, if happens to be a shard-relevant table we'll
call load() - which reads the summary again.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-08 17:14:29 -04:00
Glauber Costa
6ae601a025 do not re-read the summary
There are times in which we read the Summary file twice. That actually happens
every time during normal boot (it doesn't during refresh). First during
get_sstable_key_range and then again during load().

Every summary will have at least one entry, so we can easily test for whether
or not this is properly initialized.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-04-08 17:14:29 -04:00
Tomasz Grabiec
3e0c24934b tests: cql_query_test: Add test for slicing in reverse 2016-04-08 20:53:33 +02:00
Tomasz Grabiec
c2b955d40b mutation_partition: Fix static row being returned when paginating
Reproduced by dtest paging_test.py:TestPagingData.static_columns_paging_test.

Broken by f15c380a4f, where the
calcualtion of has_ck_selector got broken, in such a way that present
clustering restrictions were treated as if not present, which resulted
in static row being returned when it shouldn't.

While at it, unify the check between query_compacted() and
do_compact() by extracting it to a function.
2016-04-08 20:53:33 +02:00
Tomasz Grabiec
a1539fed95 mutation_partition: Fix reversed trim_rows()
The first erase_and_dispose(), which removes rows between last
position and beginning of the next range, can invalidate end()
iterator of the range. Fix by looking up end after erasing.

mutation_partition::range() was split into lower_bound() and
upper_bound() to allow for that.

This affects for example queries with descending order where the
selected clustering range is empty and falls before all rows.

Exposed by f15c380a4f, which is now
calling do_compact() during query.

Reproduced by dtest paging_test.py:TestPagingData.static_columns_paging_test
2016-04-08 20:53:33 +02:00
Avi Kivity
db03295c8a Merge "Fix query digest mismatch" from Tomasz
"Currently data query digest includes cells and tombstones which may have
expired or be covered by higher-level tombstones. This causes digest
mismatch between replicas if some elements are compacted on one of the
nodes and not on others. This mismatch triggers read-repair which doesn't
resolve because mutations received by mutation queries are not differing,
they are compacted already.

The fix adds compacting step before writing and digesting query results by
reusing the algorithm used by mutation query. This is not the most optimal
way to fix this. The compaction step could be folded with the query writing,
there is redundancy in both steps. However such change carries more risk,
and thus was postponed.

perf_simple_query test (cassandra-stress-like partitions) shows regression
from 83k to 77k (7%) ops/s.

Fixes #1165."
2016-04-08 12:13:29 +03:00
Pekka Enberg
47a904c0f6 Merge "gossip: Introduce SUPPORTED_FEATURES" from Asias
"There is a need to have an ability to detect whether a feature is
supported by entire cluster. The way to do it is to advertise feature
availability over gossip and then each node will be able to check if all
other nodes have a feature in question.

The idea is to have new application state SUPPORTED_FEATURES that will contain
set of strings, each string holding feature name.

This series adds API to do so.

The following patch on top of this series demostreates how to wait for features
during boot up. FEATURE1 and FEATURE2 are introduced. We use
wait_for_feature_on_all_node to wait for FEATURE1 and FEATURE2 successfully.
Since FEATURE3 is not supported, the wait will not succeed, the wait will timeout.

   --- a/service/storage_service.cc
   +++ b/service/storage_service.cc
   @@ -95,7 +95,7 @@ sstring storage_service::get_config_supported_features() {
        // Add features supported by this local node. When a new feature is
        // introduced in scylla, update it here, e.g.,
        // return sstring("FEATURE1,FEATURE2")
   -    return sstring("");
   +    return sstring("FEATURE1,FEATURE2");
    }

    std::set<inet_address> get_seeds() {
   @@ -212,6 +212,11 @@ void storage_service::prepare_to_join() {
        // gossip snitch infos (local DC and rack)
        gossip_snitch_info().get();

   +    gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE1"), sstring("FEATURE2")}, std::chrono::seconds(30)).get();
   +    logger.info("Wait for FEATURE1 and FEATURE2 done");
   +    gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE3")}).get();
   +    logger.info("Wait for FEATURE3 done");
   +

We can query the supported_features:

    cqlsh> SELECT supported_features from system.peers;

     supported_features
    --------------------
      FEATURE1,FEATURE2
      FEATURE1,FEATURE2

    (2 rows)
    cqlsh> SELECT supported_features from system.local;

     supported_features
    --------------------
      FEATURE1,FEATURE2

    (1 rows)"
2016-04-08 09:22:50 +03:00
Benoît Canet
7c99ecf16f scylla_setup: Check if scylla-jmx is installed
Signed-of-by: Benoît Canet <benoit@scylladb.com>

Fixes #1107
Message-Id: <1460045692-815-1-git-send-email-benoit@scylladb.com>
2016-04-08 09:03:38 +03:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Tomasz Grabiec
474a35ba6b tests: Add test for query digest calculation 2016-04-07 19:57:19 +02:00
Tomasz Grabiec
4418da77e6 tests: mutation_source: Include random mutations in generate_mutation_sets() result
Probably increases coverage.
2016-04-07 19:57:19 +02:00
Tomasz Grabiec
5d768d0681 tests: mutation_test: Move mutation generator to mutation_source_test.hh
So that it can be reused.
2016-04-07 19:57:19 +02:00
Tomasz Grabiec
30d25bc47a tests: mutation_test: Add test case for querying of expired cells 2016-04-07 19:57:19 +02:00
Tomasz Grabiec
58bbd4203f partition_slice_builder: Add new setters 2016-04-07 19:57:19 +02:00
Tomasz Grabiec
7cd8e61429 tests: result_set_assertions: Add and_only_that() 2016-04-07 19:57:19 +02:00
Tomasz Grabiec
f15c380a4f database: Compact mutations when executing data queries
Currently data query digest includes cells and tombstones which may have
expired or be covered by higher-level tombstones. This causes digest
mismatch between replicas if some elements are compacted on one of the
nodes and not on others. This mismatch triggers read-repair which doesn't
resolve because mutations received by mutation queries are not differing,
they are compacted already.

The fix adds compacting step before writing and digesting query results by
reusing the algorithm used by mutation query. This is not the most optimal
way to fix this. The compaction step could be folded with the query writing,
there is redundancy in both steps. However such change carries more risk,
and thus was postponed.

perf_simple_query test (cassandra-stress-like partitions) shows regression
from 83k to 77k (7%) ops/s.

Fixes #1165.
2016-04-07 19:56:58 +02:00
Tomasz Grabiec
e4e8acc946 mutation_query: Extract main part of mutation_query() into more generic querying_reader
So that it can be reused in query()
2016-04-07 19:03:04 +02:00
Takuya ASADA
ed7a3beed2 dist/ubuntu: drop unused scripts
This was used when we didn't shared scripts between CentOS/Fedora and Ubuntu, but used anymore so drop them.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459408583-13497-1-git-send-email-syuu@scylladb.com>
2016-04-06 08:21:09 +03:00
Asias He
d5dce8016b storage_service: Advertise supported_features into cluster
Advertise features supported by this node, so that other nodes can know
this info. For example, on a 3 node cluster with supported_features ==
FEATURE1 and FEATURE2, it looks like:

cqlsh> SELECT supported_features from system.peers;

 supported_features
--------------------
  FEATURE1,FEATURE2
  FEATURE1,FEATURE2

(2 rows)
cqlsh> SELECT supported_features from system.local;

 supported_features
--------------------
  FEATURE1,FEATURE2

(1 rows)
2016-04-06 07:12:34 +08:00
Asias He
0e1738943d storage_service: Add supported_features into system.peers table 2016-04-06 07:12:34 +08:00
Asias He
50bcfe569a system_keyspace: Add supported_features into system.local table 2016-04-06 07:12:34 +08:00
Asias He
b710a5f9ee storage_service: Introduce get_config_supported_features
It tells features supported by this local node. When new feature is
introduced in scylla, update features returned by
get_config_supported_features, e.g.,

   return sstring("FEATURE1,FEATURE2")
2016-04-06 07:12:34 +08:00
Asias He
e0a82a1107 gossip: Add supported_features helper in versioned_value
Give a supported features sstring, return a versioned_value for it.
2016-04-06 07:12:34 +08:00
Asias He
214c0f72b2 db: Add supported_features column in system.local and system.peers table 2016-04-06 07:12:34 +08:00
Asias He
04e8727793 gossip: Introduce wait_for_feature_on_{all}_node
API to wait for features are available on a node or all the nodes in the
cluster.

$timeout specifies how long we want to wait. If the features are not
availabe yet, sleep 2 seconds and retry.
2016-04-06 07:12:34 +08:00
Asias He
1e437e925c gossip: Introduce get_supported_features
- Get features supported by this particular node

  std::set<sstring> get_supported_features(inet_address endpoint) const;

- Get features supported by all the nodes this node knows about

  std::set<sstring> get_supported_features() const;
2016-04-06 07:12:34 +08:00
Asias He
a6080773b3 gossip: Add SUPPORTED_FEATURES application_state
It is used to negotiate cluster wide features.
2016-04-06 07:12:34 +08:00
Piotr Jastrzebski
d3f91eec61 Implement tuple_type_impl::from_string
This is a fix for:
https://github.com/scylladb/scylla/issues/574

It mirrors the behavior of:
org.apache.cassandra.db.marshal.TupleType.java#fromString

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <24a7d6253727d0faebb1df117c2f52410523d42f.1459843091.git.piotr@scylladb.com>
2016-04-05 16:00:18 +03:00
Vlad Zolotarov
2daaa00c4f conf: resurrect the important text related to endpoint_snitch configuration
commit d1b44cef1b removed an
important part of a comment related to an 'endpoint_snitch'
configuration. This patch puts it back.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1459858934-12005-1-git-send-email-vladz@cloudius-systems.com>
2016-04-05 15:23:13 +03:00
Raphael S. Carvalho
e15ce5eb4d api: Add support to get column family compression ratio
After this change, user can query compression ratio on a per column
family basis with 'nodetool cfstats'.

look at 'nodetool cfstats' output:
./bin/nodetool cfstats ks.test5
Keyspace: ks
	Read Count: 0
	Read Latency: NaN ms.
	Write Count: 0
	Write Latency: NaN ms.
	Pending Flushes: 0
		Table: test5
		SSTable count: 1
		Space used (live): 4774
		Space used (total): 4774
		Space used by snapshots (total): 0
		Off heap memory used (total): 131384
		SSTable Compression Ratio: 0.833333
	...

Fixes #636.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <a1bee5a23fe63787df3e387a88f2d216ba4a4134.1459802771.git.raphaelsc@scylladb.com>
2016-04-05 12:46:40 +03:00
Asias He
d1b44cef1b conf: Drop duplicated section for endpoint_snitch
endpoint_snitch is supported and it is the "Supported Parameters".
Remove the duplicated section in "Unsupported parameters".
Message-Id: <f8260b72558305f9186c011b8f8f452b3b91339b.1459325982.git.asias@scylladb.com>
2016-04-05 08:48:48 +03:00
Pekka Enberg
32471fcb96 Merge "Do batch log replay in decommission" from Asias
"batchlog_manager is modified to allow the storage_service to initate a bachlog
 replay operation.

 Refs #1085.

 Tested with tests/batchlog_manager_test and batch_test.py"
2016-04-05 08:42:47 +03:00
Gleb Natapov
70575699e4 commitlog, sstables: enlarge XFS extent allocation for large files
With big rows I see contention in XFS allocations which cause reactor
thread to sleep. Commitlog is a main offender, so enlarge extent to
commitlog segment size for big files (commitlog and sstable Data files).

Message-Id: <20160404110952.GP20957@scylladb.com>
2016-04-04 14:15:00 +03:00
Amnon Heiman
725231a7a0 api: set the api_doc before registering any api
This is a left over from the re ordering of the API init.  The api_doc
should be set first, so later API registration will enable their
relevent swagger doc.

Currently, the swagger documentation of the system API is not available.

Fixes #1160

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1459750490-15996-1-git-send-email-amnon@scylladb.com>
2016-04-04 11:37:59 +03:00
Avi Kivity
6a3cf4ac41 cql: unlock ALTER TABLE syntax
It was marked experimental for 1.0, but will be fully supported in the
next release.
Message-Id: <1459707946-5860-1-git-send-email-avi@scylladb.com>
2016-04-04 11:36:11 +03:00
Piotr Jastrzebski
613e7d8618 Add more info to wrong RPC address error
If listening on RPC address failed then report
IP address and port in the error message.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <c4db3527df2ce6dccb3b619584ee3fcb1e70ffd1.1459512258.git.piotr@scylladb.com>
2016-04-03 12:57:19 +03:00
Takuya ASADA
cad5edc53b dist: fix build error at copy symlinks
Both build_rpm.sh and build_deb.sh will fail with "cannot stat 'xxx': No such file or directory" when scylla-server package is not installed, need to prevent it by --no-dereference option of cp.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459523585-9108-1-git-send-email-syuu@scylladb.com>
2016-04-03 12:49:55 +03:00
Tomasz Grabiec
0fc4c36952 tests: sstable_mutation_test: Compare keys not representations
Representation is opaque at this level of abstraction.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459508193-7086-1-git-send-email-tgrabiec@scylladb.com>
2016-04-03 11:39:03 +03:00
Nadav Har'El
6c4ee49bd3 sstables: another test for range tombstone merging
This is another unit test for range tombstone merging, introduced in commit
0fc9a5ee4d and rewritten in commit
99ecda3c96.

In this test, a single large deletion was broken up into several smaller
ranges, all with the same time stamps, so we should recombine them into
one row tombstone, instead of failing the read.

The sstable in this test case was artificially created using json2sstable.
We don't know how yet to produce such a case using Cassandra 2, but we
have seen a similar occurance in the wild, in a real SSTable.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459429243-15821-1-git-send-email-nyh@scylladb.com>
2016-04-01 11:55:14 +02:00
Takuya ASADA
d59c1c7648 dist/redhat: drop very old %pre script
These lines are needed for very old version of scylla, not for 1.0.
Can be removed now.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459177601-20269-1-git-send-email-syuu@scylladb.com>
2016-04-01 09:41:18 +03:00
Pekka Enberg
b9a1aef670 Merge "Random exception safety fixes" from Paweł
"These patches fix some of the problems found by randomly injecting
 memory allocation failures."
2016-04-01 08:58:00 +03:00
Paweł Dziepak
8f78b8e190 log: ignore logging exceptions
Logging is used in many places including those that shouldn't really
throw any exceptions (destructors, noexcept functions).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-31 16:43:32 +01:00
Paweł Dziepak
c8159eca52 commitlog: make sure that segment destructor doesn't throw
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-31 16:42:56 +01:00
Paweł Dziepak
3e0555809e storage_proxy: catch all exceptions in read executor
abstract_read_executor::reconcile() is supposed to make sure that
_result_promise is eventually set to either a result or an exception.
That may not happen however if reconciliation throws any exception
since only read timeouts are being caught. When that happends the
continuation chain becomes stuck.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-31 16:38:41 +01:00
Paweł Dziepak
3c107c4b05 sstables: remove HyperLogLog throw() specifier
HyperLogLog constructor promises that it only throws instances of
std::invalid_argument. That's a lie since it also adds elements to a
vector (and doesn't catch potential bad_allocs).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-31 16:36:53 +01:00
Avi Kivity
417bcb122d commitlog: ignore commitlog segments generated by Cassandra-derived tools
Cassandra-derived tools (such as sstable2json) may write commitlog segments,
that Scylla cannot recognize.  Since we now write them with a distinct name,
we can recognize the name and ignore these segments, as we know the data they
contain is not interesting.

Fixes #1112.
Message-Id: <1459356904-20699-1-git-send-email-avi@scylladb.com>
2016-03-31 16:01:08 +03:00
Nadav Har'El
78c9f49585 sstables: Move check_marker() to source file
The check_marker() function is use as a sanity-check of data we read
from sstable, so instead of the header file key.hh, let's move it to
the sstable-parsing source file partition.cc.

In addition to having less code in header files, another benefit is
that the function can now throw a more specific exception (malformed
sstable exception).

Also fixed the exception's message (which had a second "%d" but only
one parameter).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459420430-5968-1-git-send-email-nyh@scylladb.com>
2016-03-31 14:22:51 +03:00
Nadav Har'El
99ecda3c96 sstables: overhaul range tombstone reading
Until recently, we believed that range tombstones we read from sstables will
always be for entire rows (or more generalized clustering-key prefixes),
not for arbitrary ranges. But as we found out, because Cassandra insists
that range tombstones do not overlap, it may take two overlapping row
tombstones and convert them into three range tombstones which look like
general ranges (see the patch for a more detailed example).

Not only do we need to accept such "split" range tombstones, we also need
to convert them back to our internal representation which, in the above
example, involves two overlapping tombstones. This is what this patch does.

This patch also contains a test for this case: We created in Cassandra
an sstable with two overlapping deletions, and verify that when we read
it to Scylla, we get these two overlapping deletions - despite the
sstable file actually having contained three non-overlapping tombstones.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <b7c07466074bf0db6457323af8622bb5210bb86a.1459399004.git.glauber@scylladb.com>
2016-03-31 12:49:50 +03:00
Pekka Enberg
2629389d5d dist/docker/ubuntu: Use bash in start-scylla script
The default shell in Ubuntu is "dash" which causes the following error
when "scylla-start" script is executed:

  /start-scylla: 8: /start-scylla: source: not found

Message-Id: <1459406561-20141-1-git-send-email-penberg@scylladb.com>
2016-03-31 11:21:36 +03:00
Duarte Nunes
26a3461908 cql: Fix antlr3 missing token leak
This patch overrides the antlr3 function that allocates the missing
tokens that would eventually leak. The override stores these tokens in
a vector, ensuring memory is freed whenever the parser is destroyed.

Fixes #1147

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1459355146-17402-1-git-send-email-duarte@scylladb.com>
2016-03-31 08:44:45 +03:00
yan cui
6fc29843cd dist/docker: refine docker file for ubuntu 2016-03-30 18:54:14 +03:00
Duarte Nunes
f7a12adb6f cql3: Disable pg-style string format test
antlr3 leaks the token itself creates when recovering from a mismatch in
the case the missing token can be determined. Until this bug is fixed
or circumvented, the test should remain disabled.

Ref #1147

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1459345403-8243-1-git-send-email-duarte@scylladb.com>
2016-03-30 16:44:47 +03:00
Asias He
bc1889b7ab storage_service: Shutdown batchlog_manager after decommission
On the node which was decommissioned, I saw

2016-03-29 09:35:52,097 [shard 0] storage_service - DECOMMISSIONED:
2016-03-29 09:35:52,097 [shard 0] storage_service - DECOMMISSIONING: done
2016-03-29 09:36:28,814 [shard 0] batchlog_manager - Batchlog replay on shard 0: starts
2016-03-29 09:36:28,814 [shard 0] batchlog_manager - Batchlog replay on shard 0: done
2016-03-29 09:37:28,819 [shard 0] batchlog_manager - Batchlog replay on shard 1: starts
2016-03-29 09:37:28,820 [shard 0] batchlog_manager - Batchlog replay on shard 1: done
2016-03-29 09:38:28,830 [shard 0] batchlog_manager - Batchlog replay on shard 0: starts
2016-03-29 09:38:28,830 [shard 0] batchlog_manager - Batchlog replay on shard 0: done
2016-03-29 09:39:28,844 [shard 0] batchlog_manager - Batchlog replay on shard 1: starts
2016-03-29 09:39:28,844 [shard 0] batchlog_manager - Batchlog replay on shard 1: done

We should stop the batchlog_manager to avoid initiating only future
batchlog replay operation.
2016-03-30 20:54:30 +08:00
Asias He
5d1140b1eb storage_service: Do batch log replay in decommission
Replay the batch log during decommission. Kill one FIXME.

Refs #1085
2016-03-30 20:54:30 +08:00
Asias He
5550aeba1d batchlog_manager: Avoid stopping batchlog_manager more than once
We can stop batchlog_manager in decommission and drain.
Avoid stopping it more than once.

Fix the following error:

$ nodetool decommission
$ nodetool drain

storage_service - DECOMMISSIONING: stop_gossiping done
storage_service - messaging_service stopped
storage_service - DECOMMISSIONING: stop messaging_service done
storage_service - DECOMMISSIONING: set_bootstrap_state done
storage_service - DECOMMISSIONED:
storage_service - DECOMMISSIONING: done
storage_service - DRAINING: starting drain process
gossip - gossip is already stopped
scylla: ./seastar/core/gate.hh:93: future<> seastar::gate::close():
Assertion `!_stopped && "seastar::gate::close() cannot be called
more than once"' failed.
2016-03-30 20:54:30 +08:00
Asias He
cdb43c5586 batchlog_manager: Allow user initiated bachlog replay operation
During decommission, the storage_service::unbootstrap() needs to
initiate a batchlog replay operation. To sync the replay operation
initiated by the timer in batchlog_manager and storage_service, a
semaphore is introduced.  To simplify the semaphore locking, the
management code now always runs on shard zero, but the real work is
distruted to all shards.
2016-03-30 20:54:30 +08:00
Nadav Har'El
0fc9a5ee4d sstables: merge range tombstones if possible
This is a rewrite of Glauber's earlier patch to do the same thing, taking
into account Avi's comments (do not use a class, do not throw from the
constructor, etc.). I also verified that the actual use case which was
broken in #1136 was fixed by this patch.

Currently, we have no support for range tombstones because CQL will not
generate them as of version 2.x. Thrift will, but we can safely leave this for
the future.

However, we have seen cases during a real migration in which a pure-CQL
Cassandra would generate range tombstones in its SSTables.

Although we are not sure how and why, those range tombstones were of a special
kind: their end and next's start range were adjacent, which means that in
reality, they could very well have been written as a single range tombstone for
an entire clustering key - which we support just fine.

This code will attempt to fix this problem temporarily by merging such ranges
if possible. Care must be taken so that we don't end up accepting a true
generic range tombstone by accident.

Fixes #1136

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459333972-20345-1-git-send-email-nyh@scylladb.com>
2016-03-30 13:40:10 +03:00
Calle Wilund
0f5ca342b8 lists.cc: setter_by_uuid does not require read before execute
Fixes #1082

Setting by UUID does not need existing data in list,
so need no read before execute

Message-Id: <1459325931-16387-1-git-send-email-calle@scylladb.com>
2016-03-30 11:24:20 +03:00
Takuya ASADA
73fa36b416 dist/common/scripts: update SET_NIC when --setup-nic passed to scylla_sysconfig_setup
scylla_sysconfig_setup mistakenly ignores --setup-nic argument.
Fixes #1132

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459285500-22185-1-git-send-email-syuu@scylladb.com>
2016-03-30 11:07:33 +03:00
Takuya ASADA
58fb7000b1 dist: add setup scripts symlink to /usr/sbin
Instead of moving script to /usr/sbin, create symlink from /usr/lib/scylla/scylla_*setup to /usr/sbin/

Fixes #1092

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459324684-31364-1-git-send-email-syuu@scylladb.com>
2016-03-30 11:04:41 +03:00
Glauber Costa
23808ba184 sstables: fix exception printouts in check_marker
As Nadav noticed in his bug report, check_marker is creating its error messages
using characters instead of numbers - which is what we intended here in the
first place.

That happens because sprint(), when faced with an 8-byte type, interprets this
as a character.  To avoid that we'll use uint16_t types, taking care not to
sign-extend them.

The bug also noted that one of the error messages is missing a parameter, and
that is also fixed.

Fixes #1122

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <74f825bbff8488ffeb1911e626db51eed88629b1.1459266115.git.glauber@scylladb.com>
2016-03-29 19:23:28 +03:00
Takuya ASADA
c1277bacb4 dist/common/scripts: prevent misinterpret blank input as '/dev/', show error when inputted device path is not found
Fixes #1110

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459267786-19123-1-git-send-email-syuu@scylladb.com>
2016-03-29 19:18:51 +03:00
Glauber Costa
d5c1366e85 compaction: be verbose about which table is causing an exception
When we, for some reason, fail to compact an SSTable, we do not log the file
name leaving us with cryptic messages that tell us what happened, but not where
it happened.

This patch adds logging in compaction so that we'll know what's going on.
Please note that readers are more of a concern, because the SSTable being
written technically do not exist yet. Still, better safe than sorry: if
open_data fails, or we leave an unfinished SSTable, it is still good to know
which one was the culprit.

Some argument can be made about whether we should log this at the lower SSTable
level, or at the compaction level.

The reason I am logging this at the compaction level, is that we don't really
know which exception will trigger, and where: it may be the case that we're
seeing exceptions that are not SSTable specific, and may not have the chance to
log it properly.

In particular, if the exception happens inside the reader: read_rows() and
friends only return a mutation reader, which doesn't really do anything until
we call read(). But at that time, we don't hold any pointers to the SSTable
anymore.

In Summary, logging at the compaction level guarantees that we always do it no
matter what. Exceptions that are part of the main SSTable path can log the file
name as well if they want: if that's the case, we'll be left with the name
appearing twice. That's totally harmless, and better than none.

Fixes #1123

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <c5c969fb6aeb788a037bd7a4ea69979c1042cb34.1459263847.git.glauber@scylladb.com>
2016-03-29 18:15:56 +03:00
Glauber Costa
d536846433 commitlog: initialize sync period with actual sync period
commitlog's sync period is initialized as the batch period, and not as the
sync period itself as it should be.

I've found this by code inspection, but unless I am missing something
really fundamental, this seems to be completely wrong. It's been working
fine because in our defaults, I have checked that both variables default to
the same value. But it seems to me that as long as anyone would change one
of them, the behavior wouldn't be as expected.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <2e7c565242fe5d4481a3ee8b0ba425ef14f5e42a.1459252783.git.glauber@scylladb.com>
2016-03-29 15:21:02 +03:00
Takuya ASADA
a5bb6c4b1b dist/ubuntu: drop classical sysv init script, only support Upstart for Ubuntu 14.04LTS
Sysv init script was added just for prevent warning message on lintian,
never really used by Ubuntu users. Result of that, we often break this
script since upstart/systemd unit file frequently changed. It may
confuse users, it's better to use Upstart only, just like Fedora/CentOS.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459177601-20269-2-git-send-email-syuu@scylladb.com>
2016-03-29 11:48:18 +03:00
Takuya ASADA
42ce77a3b7 dist/redhat: prevent 'yum: command not found' on some Fedora environment
On some Fedora environments such as Fedora official AMI, dnf-yum package
is not installed by default, causes command not found error when we run
our setup scripts.

To prevent this, we need to add dnf-yum to scylla-server package
dependency.

Fixes #1106

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459099744-23068-1-git-send-email-syuu@scylladb.com>
2016-03-29 11:29:09 +03:00
Avi Kivity
adffb1c061 dist/ubuntu: improve handling of bad command line options
On a bad command line, Scylla will exit with an exit code of 2.  Mark it
as a "normal" exit, to prevent a respawn.

Fixes #1087
Message-Id: <1458827221-12833-1-git-send-email-avi@scylladb.com>
2016-03-29 11:14:45 +03:00
Avi Kivity
c1d8fb56f7 dist/ubuntu: specify kill timeout
Allow more time for commitlog flushing
Message-Id: <1458827216-12778-1-git-send-email-avi@scylladb.com>
2016-03-29 11:14:27 +03:00
Raphael Carvalho
d515a7fd85 sstables: fix deletion of sstable with temporary TOC
After 4e52b41a4, remove_by_toc_name() became aware of temporary TOC
files, however, it doesn't consider that some components may be
missing if temporary TOC is present.
When creating a new sstable, the first thing we do is to write all
components into temporary TOC, so content of a temporary TOC isn't
reliable until it is renamed.

Solution is about implementing the following flow (described by Avi):
"Flow should be:

  - remove all components in parallel
  - forgive ENOENT, since the compoent may not have been written;
otherwise deletion error should be raised
  - fsync the directory
  - delete the temporary TOC
"

This problem can be reproduced by running compaction without disk
space, so compaction would fail and leave a partial sstable that would
be marked for deletion. Afterwards, remove_by_toc_name() would try to
delete a component that doesn't exist because it looked at the content
of temporary TOC.

Fixes #1095.

Signed-off-by: Raphael Carvalho <raphaelsc@scylladb.com>
Message-Id: <0cfcaacb43cc5bad3a8a7ea6c1fa6f325c5de97d.1459194263.git.raphaelsc@scylladb.com>
2016-03-29 10:38:01 +03:00
Tomasz Grabiec
d1db23e353 storage_service: Fix typos
Message-Id: <1458837390-26634-1-git-send-email-tgrabiec@scylladb.com>
2016-03-29 10:29:04 +03:00
Pekka Enberg
994390769f Update scylla-ami submodule
* dist/ami/files/scylla-ami 89e7436...7019088 (1):
  > Re-enable clocksource=tsc on AMI
2016-03-29 10:18:06 +03:00
Takuya ASADA
201b0c6ab3 dist: re-enable clocksource=tsc on AMI
clocksource=tsc on boot parameter mistakenly dropped on b3c85aea89, need to re-enable.

[ penberg: Manual backport of commit 050fb911d5 to 1.0. ]
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459180643-4389-1-git-send-email-syuu@scylladb.com>

(cherry picked from commit 80242ff443)
2016-03-29 10:17:41 +03:00
Pekka Enberg
227daecba6 Revert "dist: move setup scripts to /usr/sbin"
This reverts commit 989357189a because it
broke our Jenkins packaging jobs.
2016-03-29 10:17:05 +03:00
Pekka Enberg
d1ec97e76f Revert "dist: re-enable clocksource=tsc on AMI"
This reverts commit 050fb911d5 in
preparation for reverting 989357189a.
2016-03-29 10:16:48 +03:00
Takuya ASADA
050fb911d5 dist: re-enable clocksource=tsc on AMI
clocksource=tsc on boot parameter mistakenly dropped on b3c85aea89, need to re-enable.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1459180643-4389-1-git-send-email-syuu@scylladb.com>
2016-03-29 09:53:23 +03:00
Asias He
62d443a07d streaming: Fix log of plan_id and session address in stream_session
They are get swapped. Fix it up. Spotted by looking at the log.
Message-Id: <d163d71e9a96d1a45c3a4c529519790eeff7c486.1459172778.git.asias@scylladb.com>
2016-03-29 09:01:06 +03:00
Nadav Har'El
a05577ca41 sstable: fix read failure of certain sstables
We had a problem reading certain existing Cassandra sstables into
Scylla.

Our consume_range_tombstone() function assumes that the start and end
columns have a certain "end of component" markers, and want to verify
that assumption. But because of bugs in older versions of Cassandra,
see https://issues.apache.org/jira/browse/CASSANDRA-7593, sometimes the
"end of component" was missing (set to 0). CASSANDRA-7593 suggested
this problem might exist on the start column, so we allowed for that,
but now we discovered a case where also the end column is set to 0 -
causing the test in consume_range_tombstone() to fail and the sstable
read to fail - causing Scylla to no be able to import that sstable from
Cassandra. Allowing for an 0 also on the end column made it possible
to read that sstable, compact it, and so on.

Fixes #1125.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1459173964-23242-1-git-send-email-nyh@scylladb.com>
2016-03-28 17:09:37 +03:00
Duarte Nunes
db881fdc8f cql: Add support for pg-style string literal
This patch adds support for pg-style string literals to the CQL
grammar.

Fixes #1078

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1459093238-2529-1-git-send-email-duarte@scylladb.com>
2016-03-28 17:06:03 +03:00
yan cui
e5d1c031ac dist: add ubuntu docker file 2016-03-28 10:14:12 +03:00
Avi Kivity
a919113fdb schema_tables: fix deadlock in cross-node communications
Seastar wrongly limits the number of concurrent submit_to()s to a single
remote shard.  This can cause an ABBA deadlock:

  fiberA                fiberB (x127)
  submit_to(0)                         # lock schema
  <- returns
                        submit_to(0)   # lock schema (waits)
  submit_to(0)                         # do work (waits)

The fiberBs wait for fiberA, which in turn waits for a fiberB to return.

While the correct fix is to remote the client-side limit and replace it
with a server-side per-verb limit, we start with a simpler fix that
replaces the blocking lock call with a non-blocking call, removing the
deadlock.

Fixes #1088.

Message-Id: <1459095357-28950-1-git-send-email-avi@scylladb.com>
2016-03-28 10:12:10 +03:00
Raphael Carvalho
e6e5999282 Fix corner-case in refresh
Problem found by dtest which loads sstables with generation 1 and 2 into an
empty column family. The root of the problem is that reshuffle procedure
changes new sstables to start from generation 2 at least. So reshuffle could
try to set generation 1 to 2 when generation 2 exists.
This problem can be fixed by starting from generation 1 instead, so reshuffle
would handle this case properly.

Fixes #1099.

Signed-off-by: Raphael Carvalho <raphaelsc@scylladb.com>
Message-Id: <88c51fbda9557a506ad99395aeb0a91cd550ede4.1458917237.git.raphaelsc@scylladb.com>
2016-03-27 10:03:32 +03:00
Avi Kivity
077c0d1022 dist: ami: fix AMI_OPT receiving no value
We assign AMI=0 and AMI_OPT=1, so in the true case, AMI_OPT has no value,
and a later compare fails.
2016-03-26 21:16:28 +03:00
Takuya ASADA
989357189a dist: move setup scripts to /usr/sbin
Since these scripts are user command, should be on $PATH.

Fixes #1092

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458860407-25269-1-git-send-email-syuu@scylladb.com>
2016-03-25 11:50:13 +03:00
Takuya ASADA
2582dbe4a0 dist/ami: use tilde for release candidate builds
Sync with ubuntu package versioning rule

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458882718-29317-1-git-send-email-syuu@scylladb.com>
2016-03-25 11:34:28 +03:00
Glauber Costa
e750a94300 sanity check Seastar's I/O queue configuration
While Seastar in general can accept any parameter for its I/O queues, Scylla
in particular shouldn't run with them disabled. Such will be the status when
the max-io-requests parameter is not enabled.

On top of that, we would like to have enough depth per I/O queue not to allow
for shard-local parallelism. Therefore, we will require a minimum per-queue
capacity of 4. In machines where the disk iodepth is not enough to allow for 4
concurrent requests per shard, one should reduce the number of I/O queues.

For --max-io-requests, we will check the parameter itself. However, the
--num-io-queues parameter is not mandatory, and given enough concurrent
requests, Seastar's default configuration can very well just be doing the right
thing. So for that, we will check the final result of each I/O queue.

As it is the case with other checks of the sorts, this can be overridden by
the --developer-mode switch.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <63bf7e91ac10c95810351815bb8f5e94d75592a5.1458836000.git.glauber@scylladb.com>
2016-03-25 11:33:57 +03:00
Tomasz Grabiec
53bbcf4a1e schema_tables: Wait for notifications to be processed.
Listeners may defer since:

 93015bcc54 "migration_manager: Make the migration callbacks runs inside seastar thread"

Not all places were adjusted to wait for them. Fix that.

Message-Id: <1458837613-27616-1-git-send-email-tgrabiec@scylladb.com>
2016-03-24 19:04:12 +02:00
Avi Kivity
12744217b8 Initial github issue template
Message-Id: <1458817106-1513-1-git-send-email-avi@scylladb.com>
2016-03-24 15:37:00 +02:00
Benoît Canet
4ac1126677 collectd: Write to the network to get rid of spurious log messages
Closes #1018

Suggested-by: Avi Kivity <avi@scylladb.com>
Signed-of-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458759378-4935-1-git-send-email-benoit@scylladb.com>
2016-03-24 12:34:14 +02:00
Calle Wilund
ff5df306e3 database: Use disk-marking delete function in discard_sstables
Fixes #797

To make sure an inopportune crash after truncate does not leave
sstables on disk to be considered live, and thus resurrect data,
after a truncate, use delete function that renames the TOC file to
make sure we've marked sstables as dead on disk when we finish
this discard call.
Message-Id: <1458575440-505-2-git-send-email-calle@scylladb.com>
2016-03-24 12:02:08 +02:00
Calle Wilund
4e52b41a46 sstables: Add delete func to rename TOC ensuring table is marked dead
Note: "normal" remove_by_toc_name must now be prepared for and check
if the TOC of the sstable is already moved to temp file when we
get to the juicy delete parts.
Message-Id: <1458575440-505-1-git-send-email-calle@scylladb.com>
2016-03-24 12:01:53 +02:00
Asias He
6fd6e57e80 streaming: Harden keep alive timer
- Do nothing in case the session is closed, to prevent we fire up the
  timer again

- Print log info when no progress has been made if the time expires, it
  is very useful to debug a idle session

- Grab a reference when the keep alive timer is running

Message-Id: <9f2cc3164696905a6a39c0d072a980765d598dfd.1458782956.git.asias@scylladb.com>
2016-03-24 11:58:54 +02:00
Avi Kivity
112a930f92 Merge "Bring back simplify session completion logic" from Asias
"The following patches are reverted becasue they were thought they break
Glauber's "Make sure repairs do not cripple incoming load" series. It turns out
these two patches just made another bug more visisble. The bug is fixed in
c2eff7e824 (streaming: Complete receive task
after the flush). We can bring the two patches back now.

Passed repair_additional_test.py and update_cluster_layout_tests.py with smp 2."
2016-03-24 11:57:20 +02:00
Tomasz Grabiec
341b509f68 cql_test_env: Make initialization exception-safe
Currently start() is not prepared to handle exceptions thrown from
service initialization. It's easy to trigger such exceprion by
starting two tests at the same time, which will result in socket bind
error.

Exception thrown from start() typically results in assertion failures
like this one:

  seastar::sharded<Service>::~sharded() [with Service = database]: Assertion `_instances.empty()' failed.

This patch fixes the problem by combining start() and stop() in a
single do_with() and using RAII for stopping services.

Now exceptions thrown from service initialization should stop services
in proper order and let the original exception to pass
through. Example result:

  fatal error in "test_new_schema_with_no_structural_change_is_propagated": std::runtime_error: bind: Address already in use
Message-Id: <1458768018-27662-1-git-send-email-tgrabiec@scylladb.com>
2016-03-24 11:20:01 +02:00
Shlomi Livne
d3a91e737b fix a collision betwen --ami command line param and env
sysconfig scylla-server includes an AMI, the script also used an AMI
variable fix this by renaming the script variable

6a18634f9f introduced this issue since it
started imported the sysconfig scylla-server

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <0bc472bb885db2f43702907e3e40d871f1385972.1458767984.git.shlomi@scylladb.com>
2016-03-24 08:14:41 +02:00
Asias He
fe263e5436 Revert "Revert "streaming: Start to send mutations after PREPARE_DONE_MESSAGE""
This reverts commit 1f29a698d5.
2016-03-24 08:43:17 +08:00
Asias He
a6dd6e6d55 Revert "Revert "streaming: Simplify session completion logic""
This reverts commit 354fca9d56.
2016-03-24 07:48:27 +08:00
Gleb Natapov
0afd1c6f0a config: enable truncate_request_timeout_in_ms option
Option truncate_request_timeout_in_ms is used by truncate. Mark it as
used.

Message-Id: <20160323162649.GH2282@scylladb.com>
2016-03-23 18:50:24 +02:00
Yoav Kleinberger
91269d0c15 tools/scyllatop: add sums to aggregate view
the aggregate view now supports both sums and means.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1328af8efb113a786d7402b0704220108bfb28db.1458749600.git.yoav@scylladb.com>
2016-03-23 18:49:57 +02:00
Shlomi Livne
6a18634f9f scylla_io_setup import scylla-server env args
scylla_io_seup requires the scylla-server env to be setup to run
correctly. previously scylla_io_setup was encapsulated in
scylla-io.service that assured this.

extracting CPUSET,SMP from SCYLLA_ARGS as CPUSET is needed for invoking
io_tune

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <d49af9cb54ae327c38e451ff76fe0322e64a5f00.1458747527.git.shlomi@scylladb.com>
2016-03-23 17:54:06 +02:00
Pekka Enberg
8bf3d4f550 Merge "Make sure repairs do not cripple incoming load" from Glauber
"This series makes sure that the influence of repairs on the ongoing loads is
limited. This patch does not fix the situation completely, but it will be the
best we can do for 1.0

Here's a brief explanation about some potentially contentions points, and future work:

1) With the old parallelism semaphore in tree, we could never really drop parallelism
below 256, since even with (local) parallelism = 1, we would still have 256 vnodes. So while
the number 100 is totally empirical, we know for a fact that around 200-something, we
start having real trouble. (total) parallelism = 100 is enough to allow us to survive
a load as much as 3 times heavier than the load described in Issue944. So while it is
empirical, at least it is based on something

2) I totally support changing the checksumming algorithm. However, I would rather focus
my efforts on testing this to exhaustion than doing this at the moment. But if anybody
wants to do it, I think it is a great thing to have before 1.0. Specially because
we'll probably need a new verb for that, so we would be better off having it from the start

3) This problem was made harder due to the fact that there are three conditions really that
can affect the ongoing load. Only one of them needs to trigger for us to see degradation, so
fixing them individually will usually buy us nothing.

Those are:
    a) The disk bandwidth. Since the mutations are all together in the same memtable/commitlog
       as normal memtables, we can differentiate between them from the I/O Scheduler perspective.
       This is not an issue of course if the incoming mutations are not enough for us to saturate
       the disk, but specially given the highly parallel nature of repair, we usually will. If
       the commitlog queue starts getting too big, for instance, new requests will start being
       put to wait. The effect of this part of the series is to *completely* shift the high waiting
       times from those classes to the streaming ones (unfortunately compaction is still affected,
       but that's fine IMHO). With the new streaming classes, the waiting time of a memtable / commitlog
       requests is still kept in the microseconds range. The streaming classes, on the other hand,
       will be in the hundreds of milliseconds range, or even seconds.

    b) The memory consumption: since the whole problem that leads to a) is the fact that due to high
       disk activity some requests will have to wait, we will end up with a lot of streaming memtables
       not yet flushed. Because of that, we will start throttling new incoming CQL requests and all
       the isolation efforts are rendered useless. Once again, due to the highly parallel nature of
       repair, this turned out to be a very easy condition to trigger. The solution proposed here
       is to limit a maximum amount of dirty memory for the repair job (in here, 25 %). This way,
       we can endure even slightly heavier loads without sweating too much.

    c) The task scheduler: repair generates a ton of requests for range checksums, and we actually
       want to keep it that way - so that the ranges checksummed are small enough so we don't have
       to resend a lot of mutations for no reason. However, if we pile up thousands of continuations
       in the task scheduler, seastar has absolutely no mechanism (right now) to prioritize between
       different kinds of requests. That means that the continuations that are supposed to be handling
       user requests will simply not for a long time. Even if the Seastar load is less than 100 %
       that is still a problem, since that is just adding hundreds of milliseconds worth of latencies to
       any request processing.

Fixes #944 and fixes #1033."
2016-03-23 16:07:06 +02:00
Yoav Kleinberger
d2cfb86dc8 tools/scyllatop: defend against unexpected strings from collectd
Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <cd7ecf6b3b82bd2027179cbec4e689a946469e9a.1458740337.git.yoav@scylladb.com>
2016-03-23 16:05:59 +02:00
Asias He
c2eff7e824 streaming: Complete receive task after the flush
A STREAM_MUTATION_DONE message will signal the receiver that the sender
has completed the sending of streams mutations. When the receiver finds
it has zero task to send and zero task to receive, it will finish the
stream_session, and in turn finish the stream_plan if all the
stream_sessions are finished. We should call receive_task_completed only
after the flush finishes so that when stream_plan is finshed all the
data is on disk.

Fixes repair_disjoint_data_test issue with Glauber's "[PATCH v4 0/9] Make
sure repairs do not cripple incoming load" serries

======================================================================
FAIL: repair_disjoint_data_test
(repair_additional_test.RepairAdditionalTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "scylla-dtest/repair_additional_test.py",
line 102, in repair_disjoint_data_test
    self.check_rows_on_node(node1, 3000)
  File "scylla-dtest/repair_additional_test.py",
line 33, in check_rows_on_node
    self.assertEqual(len(result), rows, len(result))
AssertionError: 2461
2016-03-23 09:40:49 -04:00
Glauber Costa
f49e965d78 repair: rework repair code so we can limit parallelism
The repair code as it is right now is a bit convoluted: it resorts to detached
continuations + do_for_each when calling sync_ranges, and deals with the
problem of excessive parallelism by employing a semaphore inside that range.

Still, even by doing that, we still generate a great number of
checksum requests because the ranges themselves are processed in parallel.

It would be better to have a single-semaphore to limit the overall parallelism
for all requests.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:40:49 -04:00
Glauber Costa
34a9fc106f database: keep streaming memtables in their own region group
Theoretically, because we can have a lot of pending streaming memtables, we can
have the database start throttling and incoming connections slowing down during
streaming.

Turns out this is actually a very easy condition to trigger. That is basically
because the other side of the wire in this case is quite efficient in sending
us work. This situation is alleviated a bit by reducing parallelism, but not
only it does't go away completely, once we have the tools to start increasing
parallelism again it will become common place.

The solution for this is to limit the streaming memtables to a fraction of the
total allowed dirty memory. Using the nesting capability built in in the LSA
regions, we will make the streaming region group a child of the main region
group.  With that, we can throttle streaming requests separately, while at the
same time being able to control the total amount of dirty memory as well.

Because of the property, it can still be the case that incoming requests will
throttle earlier due to streaming - unless we allow for more dirty memory to be
used during repairs - but at least that effect will be limited.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:40:47 -04:00
Glauber Costa
455d5a57d2 streaming memtables: coalesce incoming writes
The repair process will potentially send ranges containing few mutations,
definitely not enough to fill a memtable. It wants to know whether or not each
of those ranges individually succeeded or failed, so we need a future for each.

Small memtables being flushed are bad, and we would like to write bigger
memtables so we can better utilize our disks.

One of the ways to fix that, is changing the repair itself to send more
mutations at a single batch. But relying on that is a bad idea for two reasons:

First, the goals of the SSTable writer and the repair sender are at odds. The
SSTable writer wants to write as few SSTables as possible, while the repair
sender wants to break down the range in pieces as small as it can and checksum
them individually, so it doesn't have to send a lot of mutations for no reason.

Second, even if the repair process wants to process larger ranges at once, some
ranges themselves may be small. So while most ranges would be large, we would
still have potentially some fairly small SSTables lying around.

The best course of action in this case is to coalesce the incoming streams
write-side.  repair can now choose whatever strategy - small or big ranges - it
wants, resting assure that the incoming memtables will be coalesced together.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:38:22 -04:00
Glauber Costa
5fa866223d streaming: add incoming streaming mutations to a different sstable
Keeping the mutations coming from the streaming process as mutations like any
other have a number of advantages - and that's why we do it.

However, this makes it impossible for Seastar's I/O scheduler to differentiate
between incoming requests from clients, and those who are arriving from peers
in the streaming process.

As a result, if the streaming mutations consume a significant fraction of the
total mutations, and we happen to be using the disk at its limits, we are in no
position to provide any guarantees - defeating the whole purpose of the
scheduler.

To implement that, we'll keep a separate set of memtables that will contain
only streaming mutations. We don't have to do it this way, but doing so
makes life a lot easier. In particular, to write an SSTable, our API requires
(because the filter requires), that a good estimate on the number of partitions
is informed in advance. The partitions also need to be sorted.

We could write mutations directly to disk, but the above conditions couldn't be
met without significant effort. In particular, because mutations can be
arriving from multiple peer nodes, we can't really sort them without keeping a
staging area anyway.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:13:00 -04:00
Glauber Costa
10c8ca6ace priority manager: separate streaming reads from writes
Streaming has currently one class, that can be used to contain the read
operations being generated by the streaming process. Those reads come from two
places:

- checksums (if doing repair)
- reading mutations to be sent over the wire.

Depending on the amount of data we're dealing with, that can generate a
significant chunk of data, with seconds worth of backlog, and if we need to
have the incoming writes intertwined with those reads, those can take a long
time.

Even if one node is only acting as a receiver, it may still read a lot for the
checksums - if we're talking about repairs, those are coming from the
checksums.

However, in more complicated failure scenarios, it is not hard to imagine a
node that will be both sending and receiving a lot of data.

The best way to guarantee progress on both fronts, is to put both kinds of
operations into different classes.

This patch introduces a new write class, and rename the old read class so it
can have a more meaningful name.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:12:59 -04:00
Glauber Costa
78189de57f database: make seal_on_overflow a method of the memtable_list
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:12:59 -04:00
Glauber Costa
635bb942b2 database: move add_memtable as a method of the memtable_list
The column family still has to teach the memtable list how to allocate a new memtable,
since it uses CF parameters to do so.

After that, the memtable_list's constructor takes a seal and a create function and is complete.
The copy constructor can now go, since there are no users left.
The behavior of keeping a reference to the underlying memtables can also go, since we can now
guarantee that nobody is keeping references to it (it is not even a shared pointer anymore).
Individual memtables are, and users may be keeping references to them individually.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:12:59 -04:00
Glauber Costa
6ba95d450f database: move active_memtable to memtable_list
Each list can have a different active memtable. The column family method keeps
existing, since the two separate sets of memtable are just an implementation
detail to deal with the problem of streaming QoS: *the* active memtable keeps
being the one from the main list.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:12:59 -04:00
Glauber Costa
af6c7a5192 database: create a class for memtable_list
memtable_list is currently just an alias for a vector of memtables.  Let's move
them to a class on its own, exporting the relevant methods to keep user code
unchanged as much as possible.

This will help us keeping separate lists of memtables.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:12:59 -04:00
Avi Kivity
8ed95754c0 Merge seastar upstream
* seastar 9f2b868...aa281bd (7):
  > shared_promise: Add move assignment operator
  > lowres_clock: Fix stretched time
  > scripts: Delete tap with ip instead of tunctl
  > vla: Actually be exception-safe
  > vla: Ensure memory is freed if ctor throws
  > vla: Ensure memory is correctly freed
  > net: Improve error message when parsing invalid ipv4 address
2016-03-23 14:39:31 +02:00
Takuya ASADA
50db64de33 dist: drop -j2 option on .spec, make build_rpm.sh able to specify -j option
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458678665-30273-1-git-send-email-syuu@scylladb.com>
2016-03-23 13:32:14 +02:00
Gleb Natapov
48c83163b9 init: make more initialization threaded
Since initialization now runs in a thread storage, messaging and
gossiper services initialization code may take advantage of it too.

Message-Id: <20160323094732.GF2282@scylladb.com>
2016-03-23 11:53:11 +02:00
Shlomi Livne
4ecc37111f dist/ami: Use the actual number of disks instead of AWS meta service
We have seen in some cases that when using the boto api to start
instances the aws metadata service
http://169.254.169.254/latest/meta-data/block-device-mapping/ returns
incorrect number of disks - workaround that by checking the actual
number of disks using lsblk

Adding a validation at the end verifying that after all computations the
NR_IO_QUEUES will not be greater then the number of shards (we had an
issue with i2.8x)

Fixes: #1062

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <54c51cd94dd30577a3fe23aef3ce916c01e05504.1458721659.git.shlomi@scylladb.com>
2016-03-23 10:47:08 +02:00
Raphael Carvalho
370b1336fe service: fix refresh
Vlad and I were working on finding the root of the problems with
refresh. We found that refresh was deleting existing sstable files
because of a bug in a function that was supposed to return the maximum
generation of a column family.
The intention of this function is to get generation from last element
of column_family::_sstables, which is of type std::map.
However, we were incorrectly using std::map::end() to get last element,
so garbage was being read instead of maximum generation.
If the garbage value is lower than the minimum generation of a column
family, then reshuffle_sstables() would set generation of all existing
sstables to a lower value. That would confuse our mechanism used to
delete sstables because sstables loaded at boot stage were touched.
Solution to this problem is about using rbegin() instead of end() to
get last element from column_family::_sstables.

The other problem is that refresh will only load generations that are
larger than or equal to X, so new sstables with lower generation will
not be loaded. Solution is about creating a set with generation of
live SSTables from all shards, and using this set to determine whether
a generation is new or not.

The last change was about providing an unused generation to reshuffle
procedure by adding one to the maximum generation. That's important to
prevent reshuffle from touching an existing SSTable.

Tested 'refresh' under the following scenarios:
1) Existing generations: 1, 2, 3, 4. New ones: 5, 6.
2) Existing generations: 3, 4, 5, 6. New ones: 1, 2.
3) Existing generations: 1, 2, 3, 4. New ones: 7, 8.
4) No existing generation. No new generation.
5) No existing generation. New ones: 1, 2.
I also had to adapt existing testcase for reshuffle procedure.

Fixes #1073.

Signed-off-by: Raphael Carvalho <raphaelsc@scylladb.com>
Message-Id: <1c7b8b7f94163d5cd00d90247598dd7d26442e70.1458694985.git.raphaelsc@scylladb.com>
2016-03-23 10:21:58 +02:00
Benoît Canet
1594bdd5bb dist/ubuntu: Fix the init script variable sourcing
The variable sourcing was crashing the init script on ubuntu.
Fix it with the suggestion from Avi.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458685099-1160-1-git-send-email-benoit@scylladb.com>
2016-03-23 09:03:17 +02:00
Tomasz Grabiec
5f44afa311 cql3: batch_statement: Execute statements sequentially
Currently we execute all statements in parallel, but some statements
depend on order, in particular list append/prepend. Fix by executing
sequentially.

Fixes cql_additional_tests.py:TestCQL.batch_and_list_test dtest.

Fixes #1075.

Message-Id: <1458672874-4749-1-git-send-email-tgrabiec@scylladb.com>
2016-03-22 20:59:40 +02:00
Pekka Enberg
354fca9d56 Revert "streaming: Simplify session completion logic"
This reverts commit 208b7fa7ba. It breaks
Glauber's upcoming repair series.
2016-03-22 20:37:50 +02:00
Pekka Enberg
1f29a698d5 Revert "streaming: Start to send mutations after PREPARE_DONE_MESSAGE"
This reverts commit 4c06221766. It breaks
Glauber's upcoming repair series.
2016-03-22 20:37:22 +02:00
Avi Kivity
7df21768d6 Merge "Fix row_cache_alloc_stress test" from Tomasz
"The test predates LSA zones and was not anticipating that LSA would
take much more free memory from the system than it needs in its assertions.
Fix by accounting for the fact properly."
2016-03-22 18:46:31 +02:00
Avi Kivity
b8f80bb2be Update scylla-ami submodule
* dist/ami/files/scylla-ami 56f1ab7...89e7436 (1):
  > Merge "iotune packaging fix for scylla-ami" from Takuya
2016-03-22 17:55:00 +02:00
Takuya ASADA
dac2bc3055 dist: on scylla_io_setup, SMP and CPUSET should be empty when the parameter not present
Fixes #1060

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458659928-2050-1-git-send-email-syuu@scylladb.com>
2016-03-22 17:49:06 +02:00
Avi Kivity
8cf785e53a Merge "Merge "iotune packaging fix" from Takuya
"This implements #1065
 - iotune will NOT be a part of scylla service - remove the scylla.io.service
 - User will have to run it manually - using a script call scylla_io_tune_setup (that will do the exact same thing the service does today.
 - if they wont, and do not use --developer-mode, scylla init will fail will a proper error - scylla will not start (in the same manner it does not start if you run scylla on non XFS FS)
 - For c3,m3,i2 we will use the evaluation formula we have (that takes the number of disks , cores etc.)
 - For other instances we will set --developer-mode. if the user logins into the instance - he will get a developer-mode warning
 - No iotune on AWS"

Fixes #1065.
2016-03-22 17:46:32 +02:00
Takuya ASADA
9889712d43 dist: remove scylla-io-setup.service and make it standalone script 2016-03-22 17:45:58 +02:00
Takuya ASADA
2cedab07f2 dist: on scylla_io_setup print out message both for stdout and syslog 2016-03-22 17:45:58 +02:00
Takuya ASADA
83112551bb dist: introduce dev-mode.conf and scylla_dev_mode_setup 2016-03-22 17:45:58 +02:00
Tomasz Grabiec
a4e3adfbec Fix assertion in row_cache_alloc_stress
Fixes the following assertion failure:

  row_cache_alloc_stress: tests/row_cache_alloc_stress.cc:120: main(int, char**)::<lambda()>::<lambda()>: Assertion `mt->occupancy().used_space() < memory::stats().free_memory()' failed.

memory::stats()::free_memory() may be much lower than the actual
amount of reclaimable memory in the system since LSA zones will try to
keep a lot of free segments to themselves. Fix by using actual amount
of reclaimable memory in the check.
2016-03-22 16:31:04 +01:00
Tomasz Grabiec
a0cba3c86f logalloc: Introduce tracker::occupancy()
Returns occupancy information for all memory allocated by LSA, including
segment pools / zones.
2016-03-22 16:28:10 +01:00
Yoav Kleinberger
97bb7a35d9 tools/scyllatop: some sensible default metrics
Previosly if the user did not specify any metrics, scyllatop use
whatever it could find. Now we have some preset defaults which are
probably more interesting.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1458658804-377-1-git-send-email-yoav@scylladb.com>
2016-03-22 17:04:13 +02:00
Tomasz Grabiec
529c8b8858 logalloc: Rename tracker::occupancy() to region_occupancy() 2016-03-22 14:56:44 +01:00
Pekka Enberg
5019b709ba service/migration_manager: Simplify verb unregistration
You can safely unregister verbs even if they're not registered yet.
Simplify code in migration manager by dropping the redundant checks.
Message-Id: <1458027669-6517-1-git-send-email-penberg@scylladb.com>
2016-03-22 15:24:55 +02:00
Pekka Enberg
3e1a660839 Merge seastar upstream
* seastar c193821...9f2b868 (4):
  > memory: set free memory to non-zero value in debug mode
  > Merge "Increase IOTune's robustness by including a timeout" from Glauber
  > shared_future: add companion class, shared_promise
  > rpc: fix client connection stopping
2016-03-22 15:16:21 +02:00
Asias He
4c06221766 streaming: Start to send mutations after PREPARE_DONE_MESSAGE
Below are 3 possible cases in a stream session, after commit
208b7fa7ba (streaming: Simplify session completion logic) We might
close the session before the exchange of the PREPARE_DONE_MESSAGE
message in case 1). To fix, we defer the sending of mutations after
PREPARE_DONE_MESSAGE is sent at the initiator node.

1)
Initiator         Follower
tx rx              tx rx
1  0               0  1
send prepare
                   send back prepare
recev prepare
send mutations (close the session before prepare_done msg is sent)
                  recv mutations (close session before prepare_done msg is received)
send prepare_done
                   recv prepare_done and send no mutations
2)
Initiator         Follower
tx rx              tx rx
0  1               1  0
send prepare
                    send back prepare
recv prepare
nothing to send
send prepare_done
                    recv prepare_done and send mutations (close session)
recv mutations (close session)

3)
Initiator         Follower
tx rx              tx rx
1  1               1  1
send prepare
                    send back prepare
recv prepare
send mutations
                    recv mutations, can not close session since we have mutations to send
send prepare_done
                    recv prepare_done and send mutations (close session)
recv mutations (close session)
Message-Id: <d6510b558565db23202164fa491b883ef3796e58.1458634037.git.asias@scylladb.com>
2016-03-22 15:05:57 +02:00
Takuya ASADA
6b2a8a2f70 dist: enable collectd on scylla_setup by default, to make scyllatop usable
Fixes #1037

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458324769-9152-1-git-send-email-syuu@scylladb.com>
2016-03-22 15:02:18 +02:00
Tomasz Grabiec
ca08db504b managed_bytes: Make operator[] work for large blobs as well
Fixes assertion in mutation_test:

mutation_test: ./utils/managed_bytes.hh:349: blob_storage::char_type* managed_bytes::data(): Assertion `!_u.ptr->next'

Introduced in ea7c2dd085

Message-Id: <1458648786-9127-1-git-send-email-tgrabiec@scylladb.com>
2016-03-22 14:43:52 +02:00
Gleb Natapov
1e6352e398 messaging: do not admit new requests during messaging service shutdown.
Sending a message may open new client connection which will never be
closed in case messaging service is shutting down already.

Fixes #1059

Message-Id: <1458639452-29388-3-git-send-email-gleb@scylladb.com>
2016-03-22 13:00:18 +02:00
Gleb Natapov
357c91a076 messaging: do not delete client during messaging service shutdown
Messaging service stop() method calls stop() on all clients. If
remove_rpc_client_one() is called while those stops are running
client::stop() will be called twice which not suppose to happen. Fix it
by ignoring client remove request during messaging service shutdown.

Fixes #1059

Message-Id: <1458639452-29388-2-git-send-email-gleb@scylladb.com>
2016-03-22 13:00:18 +02:00
Asias He
b8abd88841 messaging_service: Take reference of ms in send_message_timeout_and_retry
Take a reference of messaging_service object inside
send_message_timeout_and_retry to make sure it is not freed during the
life time of send_message_timeout_and_retry operation.
2016-03-22 12:32:19 +02:00
Pekka Enberg
ae33e9fe76 dist/ubuntu: Use tilde for release candidate builds
The version number ordering rules are different for rpm and deb. Use
tilde ('~') for the latter to ensure a release candidate is ordered
_before_ a final version.

Message-Id: <1458627524-23030-1-git-send-email-penberg@scylladb.com>
2016-03-22 11:52:05 +02:00
Avi Kivity
5a20a70728 Merge "CQL syntax extension to handle sstable loader lists" from Calle
"Adds an extension function SCYLLA_TIMEUUID_LIST_INDEX to CQL syntax
for collection element indexing, which, if the target is a list,
will attempt to directly index the list (which is really a map)
by the ordering time uuid (as index parameter)."
2016-03-22 11:42:47 +02:00
Duarte Nunes
36571a2018 init: Trim spaces in seeds list
This patch ensures we are resilient against spaces before or after IP
addresses in the seeds list.

Fixes #958

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1458637617-5761-1-git-send-email-duarte@scylladb.com>
2016-03-22 11:10:29 +02:00
Avi Kivity
1798889e85 Merge "Make apply() exception-safe" from Tomasz
"We cannot leave partially applied mutation behind when the write
fails. It may fail if memory allocation fails in the middle of
apply(). This for example would violate write atomicity, readers
should either see the whole write or none at all.

This fix makes apply() revert partially applied data upon failure, by
the means of ReversiblyMergeable concept. In a nut shell the idea is
to store old state in the source mutation as we apply it and swap back
in case of exception. At cell level this swapping is inexpensive, just
rewiring pointers. For this to work, the source mutation needs to be
brought into mutable form, so frozen mutations need to be unfrozen. In
practice this doesn't increase amount of cell allocations in the
memtable apply path because incoming data will usually be newer and we
will have to copy it into LSA anyway. There are extra allocations
though for the data structures which holds cells.

I didn't see significant change in performance of:

    build/release/tests/perf/perf_simple_query -c1 -m1G --write --duration 13

The score fluctuates around ~77k ops/s.

The change was tested with a unit test (patch to mutation_test) which generates
random mutations and injects allocation failures at every possible allocation
site in the apply path. This also uncovered other preexisting bugs."
2016-03-22 10:43:41 +02:00
Gleb Natapov
ea92064d38 avoid invoke_on_all during developer-mode application if possible
Message-Id: <20160315145327.GW6117@scylladb.com>
2016-03-22 10:40:30 +02:00
Nadav Har'El
2eb0627665 sstable: fix use-after-free of temporary ioclass copy
Commit 6a3872b355 fixed some use-after-free
bugs but introduced a new one because of a typo:

Instead of capturing a reference to the long-living io-class object, as
all the code does, one place in the code accidentally captured a *copy*
of this object. This copy had a very temporary life, and when a reference
to that *copy* was passed to sstable reading code which assumed that it
lives at least as long as the read call, a use-after-free resulted.

Fixes #1072

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1458595629-9314-1-git-send-email-nyh@scylladb.com>
2016-03-21 22:28:05 +01:00
Tomasz Grabiec
6e73c3f3dc perf_simple_query: Make duration configurable 2016-03-21 21:49:53 +01:00
Tomasz Grabiec
2fbb55929d mutation_test: Add allocation failure stress test for apply()
The test injects allocation failures at every allocation site during
apply(). Only allocations throug allocation_strategy are instrumented,
but currently those should include all allocations in the apply() path.

The target and source mutations are randomized.
2016-03-21 21:49:53 +01:00
Tomasz Grabiec
8ede27f9c6 mutation_test: Add more apply() tests 2016-03-21 21:49:53 +01:00
Tomasz Grabiec
36575d9f01 mutation_test: Hoist make_blob() to a function 2016-03-21 21:49:53 +01:00
Tomasz Grabiec
4c85d06df7 mutation_test: Make make_blob() return different blob each time
random_bytes was constructed with the same seed each time.
2016-03-21 21:49:53 +01:00
Tomasz Grabiec
19b3df9f0f mutation_test: Fix use-after-free
The problem was that verify_row() was returning a future which was not
waited on. Fix by running the code in a thread.
2016-03-21 21:49:53 +01:00
Tomasz Grabiec
a7966e9b71 mutation_partition: Fix friend declarations
Missing "class" confuses CLion IDE.
2016-03-21 21:49:53 +01:00
Tomasz Grabiec
dc290f0af7 mutation_partition: Make apply() atomic even in case of exception
We cannot leave partially applied mutation behind when the write
fails. It may fail if memory allocation fails in the middle of
apply(). This for example would violate write atomicity, readers
should either see the whole write or none at all.

This fix makes apply() revert partially applied data upon failure, by
the means of ReversiblyMergeable concept. In a nut shell the idea is
to store old state in the source mutation as we apply it and swap back
in case of exception. At cell level this swapping is inexpensive, just
rewiring pointers. For this to work, the source mutation needs to be
brought into mutable form, so frozen mutations need to be unfrozen. In
practice this doesn't increase amount of cell allocations in the
memtable apply path because incoming data will usually be newer and we
will have to copy it into LSA anyway. There are extra allocations
though for the data structures which holds cells.

I didn't see significant change in performance of:

  build/release/tests/perf/perf_simple_query -c1 -m1G --write --duration 13

The score fluctuates around ~77k ops/s.

Fixes #283.
2016-03-21 21:49:52 +01:00
Tomasz Grabiec
e09d186c7c mutation_partition: Make intrusive sets ReversiblyMergeable 2016-03-21 21:49:52 +01:00
Tomasz Grabiec
f1a4feb1fc mutation_partition: Make row_tombstones_entry ReversiblyMergeable 2016-03-21 19:26:24 +01:00
Tomasz Grabiec
e4a576a90f mutation_partition: Make rows_entry ReversiblyMergeable 2016-03-21 19:26:24 +01:00
Tomasz Grabiec
aadcd75d89 mutation_partition: Make row_marker ReversiblyMergeable 2016-03-21 19:26:24 +01:00
Tomasz Grabiec
ea7c2dd085 mutation_partition: Make row ReversiblyMergeable 2016-03-21 19:26:24 +01:00
Tomasz Grabiec
c9d4f5a49c atomic_cell_or_collection: Introduce as_atomic_cell_ref()
Needed for setting the REVERT flag on existing cell.
2016-03-21 19:25:54 +01:00
Tomasz Grabiec
1ffe06165d atomic_cell_hash: Specialize appending_hash<> for atomic_cell and collection_mutation 2016-03-21 18:41:27 +01:00
Tomasz Grabiec
bfc6413414 atomic_cell: Add REVERT flag
Needed to make atomic cells ReversiblyMergeable.
2016-03-21 18:41:27 +01:00
Tomasz Grabiec
7fcfa97916 tombstone: Make ReversiblyMergeable 2016-03-21 18:41:27 +01:00
Tomasz Grabiec
1407173186 Introduce the concept of ReversiblyMergeable 2016-03-21 18:41:27 +01:00
Tomasz Grabiec
9fc7f8a5ed mutation_partition: row: Add empty() 2016-03-21 18:41:27 +01:00
Tomasz Grabiec
d5e66a5b0d mutation_partition: row: Allow storing empty cells internally
Currently only "set" storage could store empty cells, but not the
"vector" one because there empty cell has the meaning of being
missing. To implement rolback, we need to be able to distinguish empty
cells from missing ones. Solve by making vector storage use a bitmap
for presence checking instead of emptiness. This adds 4 bytes to
vector storage.
2016-03-21 18:41:27 +01:00
Tomasz Grabiec
ed1e6515db mutation_partition: Make row::merge() tolerate empty row
The row may be empty and still have a set storage, in which case
rbegin() dereference is undefined behavior.
2016-03-21 18:41:27 +01:00
Tomasz Grabiec
184e2831e7 managed_bytes: Mark move-assignment noexcept 2016-03-21 18:41:27 +01:00
Tomasz Grabiec
92d4cfc3ab managed_bytes: Make copy assignment exception-safe 2016-03-21 18:41:27 +01:00
Tomasz Grabiec
22d193ba9f managed_bytes: Make linearization_context::forget() noexcept
It is needed for noexcept destruction, which we need for exception
safety in higher layers.

According to [1], erase() only throws if key comparison throws, and in
our case it doesn't.

[1] http://en.cppreference.com/w/cpp/container/unordered_map/erase
2016-03-21 18:41:27 +01:00
Tomasz Grabiec
87d7279267 mutation: Add copy assignment operator
We already have a copy constructor, so can have copy assignment as
well.
2016-03-21 18:41:27 +01:00
Shlomi Livne
b7e338275b fix centos local ami creation (revert some changes)
in centos we do not have a version file created - revert this changes
introduced when adding ubuntu ami creation

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <69c80dcfa7afe4f5db66dde2893d9253a86ac430.1458578004.git.shlomi@scylladb.com>
2016-03-21 18:41:40 +02:00
Asias He
208b7fa7ba streaming: Simplify session completion logic
Both the initiator and follower of a stream session knows how many
transfer task and receive task the stream session contains in the
preparation phase. They use the _transfers and _receivers map to track
the tasks, like below:

       std::map<UUID, stream_transfer_task> _transfers;
       std::map<UUID, stream_receive_task> _receivers;

A stream_transfer_task will send STREAM_MUTATION verb to transfer data
with frozen_mutation, when all the STREAM_MUTATIONs are sent, it will
send STREAM_MUTATION_DONE to tell the peer the stream_transfer_task is
completed and remove the stream_transfer_task from _transfers map.  The
peer will remove the corresponding stream_receive_task in _receivers.

We do not really need the COMPLETE_MESSAGE verb to notify the peer we
have completed sending. It makes the session completion logic much
simpler and cleaner if we do not depend on COMPLETE_MESSAGE verb.

However, to be compatible with older version, we always send a
COMPLETE_MESSAGE message and do nothing in the COMPLETE_MESSAGE handler
and replies a ready future even if the stream_session is closed already.
This way, node with older version will get a COMPLETE_MESSAGE message
and manage to send a COMPLETE_MESSAGE message to new node as before.

Message-Id: <1458540564-34277-2-git-send-email-asias@scylladb.com>
2016-03-21 16:58:03 +02:00
Pekka Enberg
4892a6ded9 build: Invoke Seastar build only once
Make sure we invoke the Seastar ninja build only once from our own build
process so that we don't have multiple ninjas racing with each other.

Refs #1061.

Message-Id: <1458563076-29502-1-git-send-email-penberg@scylladb.com>
2016-03-21 16:22:11 +02:00
Takuya ASADA
6edd909b00 dist: stop using '-p' option on lsblk since Ubuntu doesn't supported it
On scylla_setup interactive mode we are using lsblk to list up candidate
block devices for RAID, and -p option is to print full device paths.

Since Ubuntu 14.04LTS version of lsblk doesn't supported this option, we
need to use non-full path name and complete paths before passes it to
scylla_raid_setup.

Fixes #1030

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458325411-9870-1-git-send-email-syuu@scylladb.com>
2016-03-21 14:54:36 +02:00
Calle Wilund
5982c0ee10 Cql.g: Add extension function SCYLLA_TIMEUUID_LIST_INDEX
Allows scylla sstable loader (cql) to do by-uuid updates to
non-frozen lists.
2016-03-21 12:28:37 +00:00
Calle Wilund
5b570c417b cql3::operation: Allow set_element to be "by uuid" (for lists)
Just add an instantiation flag to keep track. Then choose actual
opertation to perform in prepare.
2016-03-21 12:28:37 +00:00
Calle Wilund
71170f51a8 cql3::lists: Add setter_by_uuid operation
Allows direct setting of list element by UUID key
2016-03-21 12:28:36 +00:00
Asias He
39992dd559 gossip: Sync gossip_digest.idl.hh and application_state.hh
We did the clean up in idl/gossip_digest.idl.hh, but the patch to clean
up gms/application_state.hh was never merged.

To maintain compatibility with previous version of scylla, we can not
change application_state.hh, instead change idl to be sync with
application_state.hh.

Message-Id: <3a78b159d5cb60bc65b354d323d163ce8528b36d.1458557948.git.asias@scylladb.com>
2016-03-21 13:07:22 +02:00
Pekka Enberg
bcdd034512 dist/ubuntu: Install wget package if it's not available
The build scripts use wget so make sure it's actually installed on the
machine.

Message-Id: <1458554706-14558-1-git-send-email-penberg@scylladb.com>
2016-03-21 12:36:52 +02:00
Asias He
7acc9816d2 gossip: Handle unknown application_state when printing
In case an unknown application_state is received, we should be able to
handle it when printting.

Message-Id: <98d2307359292e90c8925f38f67a74b69e45bebe.1458553057.git.asias@scylladb.com>
2016-03-21 11:59:04 +02:00
Asias He
28ccd866e2 streaming: Move ranges in stream_plan
The ranges are not used afterwards. We can move instead of copy.
Message-Id: <1458540564-34277-1-git-send-email-asias@scylladb.com>
2016-03-21 10:10:09 +01:00
Avi Kivity
e1e4766cc6 Merge "Ubuntu based AMI support" from Takuya
"This provides Ubuntu based AMI support.
With this patchset, you will able to run build_ami.sh on Ubuntu 14.04LTS."
2016-03-20 20:40:21 +02:00
Raphael Carvalho
de4b4e593d db: better handling of failure in column_family::populate
Improve handling of failure by saving first exception and ignoring
the remaining futures. At the moment, code only throws first
exception and doesn't care about any possible remaining future.

Signed-off-by: Raphael Carvalho <raphaelsc@scylladb.com>
Message-Id: <383dc4445db09dd2fbce093d4609a0a0bc38a405.1458240398.git.raphaelsc@scylladb.com>
2016-03-20 17:33:20 +02:00
Avi Kivity
7869a48c31 Update scylla-ami submodule
* dist/ami/files/scylla-ami 84bcd0d...56f1ab7 (2):
  > Ubuntu AMI support on scylla_install_ami
  > scylla_ami_setup is not POSIX sh compatible, change shebang to /bin/bash
2016-03-20 17:26:03 +02:00
Takuya ASADA
769204d41e dist: allow more requests for i2 instances
i2 instances has better performance than others, so allow more requests.
Fixes #921

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458251067-1533-1-git-send-email-syuu@scylladb.com>
2016-03-20 17:24:52 +02:00
Tomasz Grabiec
c518e852ee modificiation_statement: Use result_view::do_with()
Reduces code duplication.

Message-Id: <1458336592-22065-1-git-send-email-tgrabiec@scylladb.com>
2016-03-20 15:14:28 +02:00
Avi Kivity
6d031b4c6b Merge seastar upstream
* seastar 6a207e1...c193821 (6):
  > semaphore: allow wait() and signal() after broken()
  > run reactor::stop() only once
  > sharded: fix start with reference parameter
  > core: add asserts to rwlock
  > util/defer: Fix cancel() not being respected
  > tcp: Do not return accept until the connection is connected
2016-03-20 13:32:18 +02:00
Tomasz Grabiec
8134992024 mutation_partition: Add cell_entry constructor which makes an empty cell 2016-03-18 22:30:04 +01:00
Tomasz Grabiec
518e956736 mutation_partition: Make row::vector_to_set() exception-safe
Currently allocation failure can leave the old row in a
half-moved-from state and leak cell_entry objects.
2016-03-18 22:30:04 +01:00
Tomasz Grabiec
c91eefa183 mutation_partition: Unmark cell_entry's copy constructor as noexcept
It was a mistake, it certainly may throw because it copies cells.
2016-03-18 22:30:04 +01:00
Glauber Costa
e52b869b25 fix small typo
will sent -> will send

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20eaf0cea6fe14b03332547b7c4a3b85e9b619e7.1458325926.git.glauber@scylladb.com>
2016-03-18 20:34:22 +02:00
Takuya ASADA
a6cd085c38 dist: allow to run 'sudo scylla_ami_setup' for Ubuntu AMI
Allows to run scylla_ami_setup from scylla-server.conf

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-18 05:57:50 +09:00
Takuya ASADA
7828023599 dist: launch scylla_ami_setup on Ubuntu AMI
Since upstart does not have same behavior as systemd, we need to run scylla_io_setup and scylla_ami_setup in scylla-server.conf's pre-start stanza.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-18 05:57:50 +09:00
Takuya ASADA
93bf7bff8e dist: fix broken scylla_install_pkg --local-pkg and --unstable on Ubuntu
--local-pkg and --unstable arguments didn't handled on Ubuntu, support it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-18 05:57:50 +09:00
Takuya ASADA
0c83b34d0c dist: prevent to show up dialog on apt-get in scylla_raid_setup
"apt-get -y install mdadm" shows up a dialog to select install mode of postfix, this will block scylla-ami-setup.service forever since it is running as background task, we need to prevent it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-18 05:57:50 +09:00
Takuya ASADA
b097ed6d75 dist: Ubuntu based AMI support
This introduces Ubuntu AMI.
Both CentOS AMI and Ubuntu AMI are need to build on same distribution, so build_ami.sh script automatically detect current distribution, and selects base AMI image.

Fixes #998

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-18 05:57:40 +09:00
Takuya ASADA
4cc589872d dist: follow sysconfig setting when counting number of cpus on scylla_io_setup
When NR_CPU >= 8, we disabled cpu0 for AMI on scylla_sysconfig_setup.
But scylla_io_setup doesn't know that, try to assign NR_CPU queues, then scylla fails to start because queues > cpus.
So on this fix scylla_io_setup checks sysconfig settings, if '--smp <n>' specified on SCYLLA_ARGS, use n to limit queue size.
Also, when instance type is not supported pre-configured parameters, we need to passes --cpuset parameters to iotune. Otherwise iotune will run on a different set of CPUs, which may have different performance characteristics.

Fixes #996, #1043, #1046

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458221762-10595-2-git-send-email-syuu@scylladb.com>
2016-03-17 16:44:46 +02:00
Takuya ASADA
6f71173827 dist: On scylla_sysconfig_setup, don't disable cpu0 on non-AMI environments
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458221762-10595-1-git-send-email-syuu@scylladb.com>
2016-03-17 16:44:45 +02:00
Benoît Canet
3b1d3d977d exceptions: Shutdown communications on non file I/O errors
Apply the same treatment to non file filesystem I/O errors.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458154098-9977-2-git-send-email-benoit@scylladb.com>
2016-03-17 15:02:54 +02:00
Benoît Canet
1fb9a48ac5 exception: Optionally shutdown communication on I/O errors.
I/O errors cannot be fixed by Scylla the only solution
is to shutdown the database communications.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>
2016-03-17 15:02:52 +02:00
Pekka Enberg
69dacf9063 main: Fix broadcast_address and listen_address validation errors
Fix the validation error message to look like this:

  Scylla version 666.development-20160316.49af399 starting ...
  WARN  2016-03-17 12:24:15,137 [shard 0] config - Option partitioner is not (yet) used.
  WARN  2016-03-17 12:24:15,138 [shard 0] init - NOFILE rlimit too low (recommended setting 200000, minimum setting 10000; you may run out of file descriptors.
  ERROR 2016-03-17 12:24:15,138 [shard 0] init - Bad configuration: invalid 'listen_address': eth0: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> > (Invalid argument)
  Exiting on unhandled exception of type 'bad_configuration_error': std::exception

Instead of:

  Exiting on unhandled exception of type 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >': Invalid argument

Fixes #1051.

Message-Id: <1458210329-4488-1-git-send-email-penberg@scylladb.com>
2016-03-17 14:59:00 +02:00
Tomasz Grabiec
b9af32c9d5 Merge branch 'pdziepak/fix-lsa-memory-accounting/v1' from seastar-dev.git
Memory accounting fix from Paweł.
2016-03-17 12:55:21 +01:00
Paweł Dziepak
13849fd129 tests/lsa: add test for region groups
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-17 11:20:22 +00:00
Paweł Dziepak
ed53784cb6 tests/lsa: do not leak memory in large allocation test
Large allocations test, unsurprisingly, allocates a lot of memory. Do
not leak it so that any tests that are going to be run afterwards have
still some memory left.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-17 11:19:13 +00:00
Paweł Dziepak
338fd34770 lsa: update _closed_occupancy after freeing all segments
_closed_occupancy will be used when a region is removed from its region
group, make sure that it is accurate.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-17 11:12:05 +00:00
Pekka Enberg
0434bc3d33 dist: Fix '--developer-mode' parsing in scylla_io_setup
We need to support the following variations:

   --developer-mode true
   --developer-mode 1
   --developer-mode=true
   --developer-mode=1

Fixes #1026.
Message-Id: <1458203393-26658-1-git-send-email-penberg@scylladb.com>
2016-03-17 09:58:34 +01:00
Pekka Enberg
972fc6e014 main: Defer API server hooks until commitlog replay
Defer registering services to the API server until commitlog has been
replayed to ensure that nobody is able to trigger sstable operations via
'nodetool' before we are ready for them.
Message-Id: <1458116227-4671-1-git-send-email-penberg@scylladb.com>
2016-03-17 10:04:35 +02:00
Takuya ASADA
95161d5db7 dist: add scylla-gdb.py on Ubuntu dbg package
Fixes #969

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458150248-10632-1-git-send-email-syuu@scylladb.com>
2016-03-17 09:03:00 +02:00
Pekka Enberg
303dd76205 Merge "Fix debug messages for streaming session" from Glauber
"One of the messages is printed twice, and one of the verbs is missing
 a message. That makes it hard to debug the session."
2016-03-17 08:11:50 +02:00
Glauber Costa
a3ebf640c6 stream_session: print debug message for STREAM_MUTATION
For this verb(), we don't call get_session - and it doesn't look like we will.
We currently have no debug message for this one, which makes it harder to debug
the stream of messages. Print it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-16 22:09:46 -04:00
Glauber Costa
0ab4275893 stream_session: remove duplicated debug message
Whenever we call get_session, that will print a debug message about the arrival
of this new verb. Because we also print that explicitly in PREPARE_DONE, that
message gets duplicated.

That confuses poor developers who are, for a while, left wondering why is it that
the sender is sender the message twice.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-16 22:04:25 -04:00
Glauber Costa
6a3872b355 sstables: do not assume mutation_reader will be kept alive
Our sstables::mutation_reader has a specialization in which start and end
ranges are passed as futures. That is needed because we may have to read the
index file for those.

This works well under the assumption that every time a mutation_reader will be
created it will be used, since whoever is using it will surely keep the state
of the reader alive.

However, that assumption is no longer true - for a while. We use a reader
interface for reading everything from mutations and sstables to cache entries,
and when we create an sstable mutation_reader, that does not mean we'll use it.
In fact we won't, if the read can be serviced first by a higher level entity.

If that happens to be the case, the reader will be destructed. However, since
it may take more time than that for the start and end futures to resolve, by
the time they are resolved the state of the mutation reader will no longer be
valid.

The proposed fix for that is to only resolve the future inside
mutation_reader's read() function. If that function is called,  we can have a
reasonable expectation that the caller object is being kept alive.

A second way to fix this would be to force the mutation reader to be kept alive
by transforming it into a shared pointer and acquiring a reference to itself.
However, because the reader may turn out not to be used, the delayed read
actually has the advantage of not even reading anything from the disk if there
is no need for it.

Also, because sstables can be compacted, we can't guarantee that the sst object
itself , used in the resolution of start and end can be alive and that has the
same problem. If we delay the calling of those, we will also solve a similar
problem.  We assume here that the outter reader is keeping the SSTable object
alive.

I must note that I have not reproduced this problem. What goes above is the
result of the analysis we have made in #1036. That being the case, a thorough
review is appreciated.

Fixes #1036

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <a7e4e722f76774d0b1f263d86c973061fb7fe2f2.1458135770.git.glauber@scylladb.com>
2016-03-16 17:51:02 +02:00
Nadav Har'El
02ba8ffbe8 Allow uncompression at end of file
Asking to read from byte 100 when a file has 50 bytes is an obvious error.
But what if we ask to read from byte 50? What if we ask to read 0 bytes at
byte 50? :-)

Before this patch, code which asked to read from the EOF position would
get an exception. After this patch, it would simply read nothing, without
error. This allows, for example, reading 0 bytes from position 0 on a file
with 0 bytes, which apparently happened in issue #1039...

A read which starts at a position higher than the EOF position still
generates an exception.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1458137867-10998-1-git-send-email-nyh@scylladb.com>
2016-03-16 17:50:23 +02:00
Nadav Har'El
73297c7872 Fix out-of-range exception when uncompressing 0 bytes
The uncompression code reads the compressed chunks containing the bytes
pos through pos + len - 1. This, however, is not correct when len==0,
and pos + len - 1 may even be -1, causing an out-of-range exception when
calling locate() to find the chunks containing this byte position.

So we need to treat len==0 specially, and in this case we don't read
anything, and don't need to locate() the chunks to read.

Refs #1039.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1458135987-10200-1-git-send-email-nyh@scylladb.com>
2016-03-16 15:54:48 +02:00
Takuya ASADA
f1d18e9980 dist: do not auto-start scylla-server job on Ubuntu package install time
Fixes #1017

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1458122424-22889-1-git-send-email-syuu@scylladb.com>
2016-03-16 13:55:12 +02:00
Pekka Enberg
2f519b9b34 tests/gossip_test: Fix messaging service stop
This fixes gossip test shutdown similar to what commit 13ce48e ("tests:
Fix stop of storage_service in cql_test_env") did for CQL tests:

  gossip_test: /home/penberg/scylla/seastar/core/sharded.hh:439: Service& seastar::sharded<Service>::local() [with Service = net::messaging_service]: Assertion `local_is_initialized()' failed.
  Running 1 test case...

  [snip]

  unknown location(0): fatal error in "test_boot_shutdown": signal: SIGABRT (application abort requested)
  seastar/tests/test-utils.cc(32): last checkpoint
Message-Id: <1458126520-20025-1-git-send-email-penberg@scylladb.com>
2016-03-16 13:15:18 +02:00
Asias He
2d50c71ca3 streaming: Handle cf is deleted after the deletion check
The cf can be deleted after the cf deletion check. Handle this case as
well.

Use "warn" level to log if cf is missing. Although we can handle the
case, but it is good to distingush where the receiver of streaming
applied all the stream mutations or not. We believe that the cf is
missing because it was dropped, but it could be missing because of a bug
or something we didn't anticipated here.

Related patch: "streaming: Handle cf is deleted when sending
STREAM_MUTATION_DONE"

Fixes simple_add_new_node_while_schema_changes_test failure.
Message-Id: <c4497e0500f50e0a3422efb37e73130765c88c57.1458090598.git.asias@scylladb.com>
2016-03-16 09:46:41 +01:00
Asias He
13ce48e775 tests: Fix stop of storage_service in cql_test_env
In stop() of storage_service, it unregisters the verb handler. In the
test, we stop messaging_service before storage_service. Fix it by
deferring stop of messaging_service.
Message-Id: <c71f7b5b46e475efe2fac4c1588460406f890176.1458086329.git.asias@scylladb.com>
2016-03-16 08:32:01 +02:00
Asias He
83ffae1568 storage_service: Drop block_until_update_pending_ranges_finished
It is a legacy API from c*. Since we can wait for the
update_pending_ranges to complete, we can wait for it directly instead
of calling block_until_update_pending_ranges_finished to do so.

Also, change do_update_pending_ranges to be private.

Message-Id: <ac79b2879ec08fdcd3b2278ff68962cc71492f12.1458040608.git.asias@scylladb.com>
2016-03-15 15:18:45 +02:00
Avi Kivity
cc3e49e16f Merge seastar upstream
* seastar 0739576...6a207e1 (3):
  > file: allow custom file_impl implementations
  > Dockerfile update
  > tcp: Fix a typo in input_handle_other_state
2016-03-15 15:06:35 +02:00
Gleb Natapov
c6157dd99e enable rpc_keepalive parameter
Fixes #1044

Message-Id: <20160315104609.GV6117@scylladb.com>
2016-03-15 12:51:12 +02:00
Paweł Dziepak
9f3893980a move SCHEMA_CHECK registration to migration_manager
The verb is just for reporting and debugging purposes, but it is better
not to register it until it can return a meaningful value. Besides, it
really belongs to the migration manager subsystem anyway.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1458037053-14836-1-git-send-email-pdziepak@scylladb.com>
2016-03-15 12:24:37 +02:00
Asias He
d79dbfd4e8 main: Defer initalization of streaming
Streaming is used by bootstrap and repair. Streaming uses storage_proxy
class to apply the frozen_mutation and db/column_family class to
invalidate row cache. Defer the initalization just before repair and
bootstrap init.
Message-Id: <8e99cf443239dd8e17e6b6284dab171f7a12365c.1458034320.git.asias@scylladb.com>
2016-03-15 11:56:34 +02:00
Pekka Enberg
eb13f65949 main: Defer REPAIR_CHECKSUM_RANGE RPC verb registration after commitlog replay
Register the REPAIR_CHECKSUM_RANGE messaging service verb handler after
we have replayed the commitlog to avoid responding with bogus checksums.
Message-Id: <1458027934-8546-1-git-send-email-penberg@scylladb.com>
2016-03-15 11:56:18 +02:00
Pekka Enberg
917ed4adbe Merge "verb init/handler for gosisp and storage_service" from Asias
"- ignore ack2 msg if gossip is not enabled
 - move REPLICATION_FINISHED to where it belongs to
 - add comments for gossip runtime dependency"
2016-03-15 11:12:10 +02:00
Avi Kivity
ad26e81444 Merge "Update pending ranges when ks is changed" from Asias
"At the momment, the migration_listener callbacks returns void, it is impossible
to wait for the callbacks to complete. Make the callbacks runs inside seastar
thread, so if we need to wait for the callback, we can make it call
foo_operation().get() in the callback. It is easier than making the callbacks
return future<>.

Fixes #1000."
2016-03-15 10:50:07 +02:00
Asias He
883d8cb8fd storage_service: Move REPLICATION_FINISHED verb to storage_service
It belongs to storage_service not storage_proxy.
2016-03-15 16:13:22 +08:00
Asias He
fb4d292d5c storage_service: Drop unused debug code 2016-03-15 16:13:21 +08:00
Asias He
16af12ca47 gossip: Add comments on external runtime dependency needed by gossip 2016-03-15 16:13:13 +08:00
Asias He
1034dd0aff gossip: Ignore ack2 message if gosisp is not enabled yet 2016-03-15 16:09:43 +08:00
Asias He
1bf0412e7a gossip: Introduce handle_shutdown_msg helper 2016-03-15 16:09:43 +08:00
Asias He
54d8ac16b5 gossip: Introduce handle_echo_msg helper 2016-03-15 16:09:42 +08:00
Asias He
1f64f4bfcb gossip: Introdcue handle_ack2_msg helper 2016-03-15 16:09:42 +08:00
Asias He
d63281b256 storage_service: Update pending ranges when keyspace is changed
If a keyspace is created after we calcuate the pending ranges during
bootstrap. We will ignore the keyspace in pending ranges when handling
write request for that keyspace which will casue data lose if rf = 1.

Fixes #1000
2016-03-15 15:41:23 +08:00
Asias He
93015bcc54 migration_manager: Make the migration callbacks runs inside seastar thread
At the momment, the callbacks returns void, it is impossible to wait for
the callbacks to complete. Make the callbacks runs inside seastar
thread, so if we need to wait for the callback, we can make it call
foo_operation().get() in the callback. It is easier than making the
callbacks return future<>.
2016-03-15 15:41:23 +08:00
Gleb Natapov
5076f4878b main: Defer storage proxy RPC verb registration after commitlog replay
Message-Id: <20160315071229.GM6117@scylladb.com>
2016-03-15 09:18:12 +02:00
Gleb Natapov
e228ef1bd9 messaging: enable keepalive tcp option for inter-node communication
Some network equipment that does TCP session tracking tend to drop TCP
sessions after a period of inactivity. Use keepalive mechanism to
prevent this from happening for our inter-node communication.

Message-Id: <20160314173344.GI31837@scylladb.com>
2016-03-14 19:39:39 +02:00
Avi Kivity
7ae2298081 Merge seastar upstream
* seastar 88cc232...0739576 (4):
  > rpc: allow configuring keepalive for rpc client
  > net: add keepalive configuration to socket interface
  > iotune: refuse to run if there is not enough space available
  > rpc: make client connection error more clear
2016-03-14 19:38:54 +02:00
Pekka Enberg
1429213b4c main: Defer migration manager RPC verb registration after commitlog replay
Defer registering migration manager RPC verbs after commitlog has has
been replayed so that our own schema is fully loaded before other other
nodes start querying it or sending schema updates.
Message-Id: <1457971028-7325-1-git-send-email-penberg@scylladb.com>
2016-03-14 18:03:16 +01:00
Pekka Enberg
16f947dcb3 message/messaging_service: Remove init_messaging_service() declaration
The function no longer exists so drop the function declaration.
Message-Id: <1457694134-25600-1-git-send-email-penberg@scylladb.com>
2016-03-14 13:54:53 +02:00
Vlad Zolotarov
ce47fcb1ba sstables: properly account removal requests
The same shard may create an sstables::sstable object for the same SStable
that doesn't belong to it more than once and mark it
for deletion (e.g. in a 'nodetool refresh' flow).

In that case the destructor of sstables::sstable accounted
the deletion requests from the same shard more than once since it was a simple
counter incremented each time there was a deletion request while it should
account request from the same shard as a single request. This is because
the removal logic waited for all shards to agree on a removal of a specific
SStable by comparing the counter mentioned above to the total
number of shards and once they were equal the SStable files were actually removed.

This patch fixes this by replacing the counter by an std::unordered_set<unsigned>
that will store a shard ids of the shards requesting the deletion
of the sstable object and will compare the size() of this set
to smp::count in order to decide whether to actually delete the corresponding
SStable files.

Fixes #1004

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1457886812-32345-1-git-send-email-vladz@cloudius-systems.com>
2016-03-14 11:45:08 +02:00
Raphael S. Carvalho
1ff7d32272 sstables: make write_simple() safer by using exclusive flag
We should guarantee that write_simple() will not try to overwrite
an existing file.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <194bd055f1f2dc1bb9766a67225ec38c88e7b005.1457818073.git.raphaelsc@scylladb.com>
2016-03-14 11:45:00 +02:00
Raphael S. Carvalho
0af786f3ea sstables: fix race condition when writing to the same sstable in parallel
When we are about to write a new sstable, we check if the sstable exists
by checking if respective TOC exists. That check was added to handle a
possible attempt to write a new sstable with a generation being used.
Gleb was worried that a TOC could appear after the check, and that's indeed
possible if there is an ongoing sstable write that uses the same generation
(running in parallel).
If TOC appear after the check, we would again crap an existing sstable with
a temporary, and user wouldn't be to boot scylla anymore without manual
intervention.

Then Nadav proposed the following solution:
"We could do this by the following variant of Raphael's idea:

   1. create .txt.tmp unconditionally, as before the commit 031bf57c1
(if we can't create it, fail).
   2. Now confirm that .txt does not exist. If it does, delete the .txt.tmp
we just created and fail.
   3. continue as usual
   4. and at the end, as before, rename .txt.tmp to .txt.

The key to solving the race is step 1: Since we created .txt.tmp in step 1
and know this creation succeeded, we know that we cannot be running in
parallel with another writer - because such a writer too would have tried to
create the same file, and kept it existing until the very last step of its
work (step 4)."

This patch implements the solution described above.
Let me also say that the race is theoretical and scylla wasn't affected by
it so far.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <ef630f5ac1bd0d11632c343d9f77a5f6810d18c1.1457818331.git.raphaelsc@scylladb.com>
2016-03-14 11:44:51 +02:00
Avi Kivity
7278d0343b Merge seastar upstream
* seastar 906b562...88cc232 (2):
  > reactor: fix work item leak in syscall work queue
  > rpc_test: add missing header
2016-03-14 11:15:42 +02:00
Asias He
9f64c36a08 storage_service: Fix pending_range_calculator_service
Since calculate_pending_ranges will modify token_metadata, we need to
replicate to other shards. With this patch, when we call
calculate_pending_ranges, token_metadata will be replciated to other
non-zero shards.

In addition, it is not useful as a standalone class. We can merge it
into the storage_service. Kill one singleton class.

Fixes #1033
Refs #962
Message-Id: <fb5b26311cafa4d315eb9e72d823c5ade2ab4bda.1457943074.git.asias@scylladb.com>
2016-03-14 10:14:22 +02:00
Pekka Enberg
d4b4baad98 Merge "Add more information to query result digest" from Paweł
"This series adds more information (i.e. keys and tombstones) to the
query result digest in order to ensure correctness and increase the
chances of early detection of disagreement between replicas.

The digest is no longer computed by hashing query::result but build
using the query result builder. That is necessary since the query
result itself doesn't contain all information required to compute
the digest. Another consequence of this is that now replicas asked
for a result need to send both the result and the digest to
the coordinator as it won't be able to compute the digest itself.

Unfortunately, these patches change our on wire communication:
 1) hash computation is different
 2) format of query::result is changed (and it is made non-final)

Fixes #182."
2016-03-14 08:22:05 +02:00
Paweł Dziepak
72970c9c90 query: add query::result::_digest to pretty printer
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-11 18:27:17 +00:00
Paweł Dziepak
82d2a2dccb specify whether query::result, result_digest or both are needed
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-11 18:27:13 +00:00
Paweł Dziepak
21e2ebcf8c query: build only result, only digest or both
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-11 18:27:13 +00:00
Paweł Dziepak
46079f763b query: add keys and tombstones to result digest
Query result digest is used to verify that all replicas have the same
data. Therefore, it needs to contain more information than the query
result itself in order to ensure proper detection of disagreements.

Generally, adding clustering keys to the digest regardless of whether
the client asked for them will guarantee correctness. However, adding
tombstones as well improves the chances of early detection of nodes
containing stale data.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-11 18:27:13 +00:00
Paweł Dziepak
15fd3e96ff md5_hasher: add finalize_array()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-11 18:27:13 +00:00
Paweł Dziepak
3efb10bd08 result.idl: keep digest together with result
Result digest is going to be computed in query result builder and
require information not available in the query resylt. That's why the
digest now needs to be sent to the other nodes together with the result
as they won't be able compute it on their own.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-11 18:27:13 +00:00
Paweł Dziepak
86ba96622e atomic_cell: do not require type to hash collection cell
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-11 18:27:13 +00:00
Paweł Dziepak
23ee493d91 types: make collection_type_impl::deserialize_mutation_form static
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-11 18:27:13 +00:00
Paweł Dziepak
c1f7f11d54 mutation_partition: do not add ck to result when not asked to
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-11 18:27:13 +00:00
Paweł Dziepak
77dbe3c12f storage_proxy: fix reconciliation with limits
Currently, if there is a disagreement between replicas we get mutations
from all of them, merge this mutations and send the result to the
client, difference between the result and the mutation sent by a
particular replica is sent back to repair it.
Unfortunately, that may not suffice to provide user with correct results
in case of disagreements.

Consider the following scenario:

create table cf(p int, c int, r int, primary key(p, c));

node1:
p=0, c=1, r=1 (timestamp = 1)
p=0, c=2, r=2 (timestamp = 2)

node2:
p=0, c=1, r=tombstone (timestamp = 2)
p=0, c=2, r=1 (timestamp = 1)

query:
select r from cf limit 1;

Let's assume there are no row markers. node1 will send only outdated
cell (p=0, c=1, r=1) while node2 will send both tombstone for c=1 and
outdated cell (p=0, c=2, r=1). A disagreement will be detected, the
replies will be merged and the coordinator will respond to the client
with result r=1, while the correct answer is r=2.

The solution proposed in this patch is to attempt to detect cases when
the problem may occur and retry queries with larger limit which result
in replicas providing more information.

The detection logic is simple: the partition key and clustering key of
the last row in the reconciled result are compared with the partition
keys and clustering keys of the last rows of replies from replicas
(except short reads). If the (pk, ck) of the replica last row is smaller
than the (pk, ck) of the reconciled result the query is retried.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-11 18:26:33 +00:00
Asias He
f747df2aff streaming: Fix rethrow in stream_transfer_task
Fix bootstrap_test.py:TestBootstrap.failed_bootstap_wiped_node_can_join_test

Logs on node 1:
 INFO  2016-03-11 15:53:43,287 [shard 0] gossip - FatClient 127.0.0.2 has been silent for 30000ms, removing from gossip
 INFO  2016-03-11 15:53:43,287 [shard 0] stream_session - stream_manager: Close all stream_session with peer = 127.0.0.2 in on_remove
 WARN  2016-03-11 15:53:43,498 [shard 0] stream_session - [Stream #4e411ba0-e75e-11e5-81f8-000000000000] stream_transfer_task: Fail to send STREAM_MUTATION_DONE to 127.0.0.2:0: std::runtime_error ([Stream #4e411ba0-e75e-11e5-81f8-000000000000] GOT STREAM_ MUTATION_DONE 127.0.0.1: Can not find stream_manager)

 terminate called without an active exception

Backtrace on node 1:

 #0  0x00007fb74723da98 in raise () from /lib64/libc.so.6
 #1  0x00007fb74723f69a in abort () from /lib64/libc.so.6
 #2  0x00007fb74ab84aed in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
 #3  0x00007fb74ab82936 in ?? () from /lib64/libstdc++.so.6
 #4  0x00007fb74ab82981 in std::terminate() () from /lib64/libstdc++.so.6
 #5  0x00007fb74ab82be9 in __cxa_rethrow () from /lib64/libstdc++.so.6
 #6  0x0000000000f3521e in streaming::stream_transfer_task::<lambda()>::<lambda(auto:44)>::operator()<std::__exception_ptr::exception_ptr> (ep=..., __closure=0x7ffce74d8630) at streaming/stream_transfer_task.cc:169
 #7  do_void_futurize_apply<const streaming::stream_transfer_task::start()::<lambda()>::<lambda(auto:44)>&, std::__exception_ptr::exception_ptr> (func=...) at /home/asias/src/cloudius-systems/scylla/seastar/core/future.hh:1142
 #8  futurize<void>::apply<const streaming::stream_transfer_task::start()::<lambda()>::<lambda(auto:44)>&, std::__exception_ptr::exception_ptr> (func=...) at /home/asias/src/cloudius-systems/scylla/seastar/core/future.hh:1190
 #9  future<>::<lambda(auto:7&&)>::operator()<future<> > ( fut=fut@entry=<unknown type in /home/asias/src/cloudius-systems/scylla/build/release/scylla, CU 0xec84d00, DIE 0xee2561d>, __closure=__closure@entry=0x7ffce74d8630) at /home/asias/src/cloudius-systems/scylla/seastar/core/future.hh:1014

Message-Id: <1457684884-4776-2-git-send-email-asias@scylladb.com>
2016-03-11 11:14:05 +02:00
Asias He
bcdd3dbb3e messaging_service: Add missed throw
It is missed somehow.
Message-Id: <1457684884-4776-1-git-send-email-asias@scylladb.com>
2016-03-11 11:01:24 +02:00
Raphael S. Carvalho
031bf57c19 sstables: bail out if toc exists for generation used by write_components
Currently, if sstable::write_components() is called to write a new sstable
using the same generation of a sstable that exists, a temporary TOC will
be unconditionally created. Afterwards, the same sstable::write_components()
will fail when it reaches sstable::create_data(). The reason is obvious
because data component exists for that generation (in this scenario).
After that, user will not be able to boot scylla anymore because there is
a generation with both a TOC and a temporary TOC. We cannot simply remove a
generation with TOC and temporary TOC because user data will be lost (again,
in this scenario). After all, the temporary TOC was only created because
sstable::write_components() was wrongly called with the generation of a
sstable that exists.

Solution proposed by this patch is to trigger exception if a TOC file
exists for the generation used.

Some SSTable unit tests were also changed to guarantee that we don't try
to overwrite components of an existing sstable.

Refs #1014.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <caffc4e19cdcf25e4c6b9dd277d115422f8246c4.1457643565.git.raphaelsc@scylladb.com>
2016-03-11 09:22:51 +02:00
Nadav Har'El
1b4f8842ee sstable: fix compressed data file overread
Since commit 2f56577 ("sstables: more efficient read of compressed data
file"), the compressed_file_input_stream uses a file_input_stream to
efficiently read the compressed data at chunks some desired size (128 KB
is our default) instead of at smaller compressed chunks.

However, I had a bug where I mis-calculated the desired length of the
read (giving the *end byte* instead of the length!) and as a result
file_input_stream did not know where the read was supposed to stop, and
always read 128 KB buffers. The results were not incorrect, because the
sstable reader stops when it needs to, even if given too much data. But
it was inefficient because too much data was read in the last buffer.

With this patch, the length is correctly given to the input stream, and
it can read a much smaller buffer at the end of the read, not the full
128 KB. I tested that this actually happens.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1457633616-15193-1-git-send-email-nyh@scylladb.com>
2016-03-11 09:17:50 +02:00
Pekka Enberg
987e8579d7 Merge "general robustness improvements for SSTables code" from Glauber
"As described in issue #1014, we have found ourselves in a situation where
 SSTables can be written too early, and that causes problems for the existing
 SSTables.  While this shouldn't happen - and Pekka's recent patch to move
 populate() a lot earlier in initialization should fix that, when that did
 happen what we had was not enough to prevent it from overwriting existing
 tables.

 We should do a lot better job protecting against that.

 Also, some of the exceptions that are generated at totally inconclusive. This
 series also aims at making some of the exceptions more descriptive."
2016-03-11 09:03:05 +02:00
Glauber Costa
a339296385 database: turn sstable generation number into an optional
This patch makes sure that every time we need to create a new generation number -
the very first step in the creation of a new SSTable, the respective CF is already
initialized and populated. Failure to do so can lead to data being overwritten.
Extensive details about why this is important can be found
in Scylla's Github Issue #1014

Nothing should be writing to SSTables before we have the chance to populate the
existing SSTables and calculate what should the next generation number be.

However, if that happens, we want to protect against it in a way that does not
involve overwriting existing tables. This is one of the ways to do it: every
column family starts in an unwriteable state, and when it can finally be written
to, we mark it as writeable.

Note that this *cannot* be a part of add_column_family. That adds a column family
to a db in memory only, and if anybody is about to write to a CF, that was most
likely already called. We need to call this explicitly when we are sure we're ready
to issue disk operations safely.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-10 21:06:05 -05:00
Glauber Costa
f2a8bcabc2 sstables: improve error messages
The standard C++ exception messages that will be thrown if there is anything
wrong writing the file, are suboptimal: they barely tell us the name of the failing
file.

Use a specialized create function so that we can capture that better.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-10 21:06:05 -05:00
Glauber Costa
6c4e31bbdb main: when scanning SSTables, run shard 0 first
Deletion of previous stale, temporary SSTables is done by Shard0. Therefore,
let's run Shard0 first. Technically, we could just have all shards agree on the
deletion and just delete it later, but that is prone to races.

Those races are not supposed to happen during normal operation, but if we have
bugs, they can. Scylla's Github Issue #1014 is an example of a situation where
that can happen, making existing problems worse. So running a single shard
first and getting making sure that all temporary tables are deleted provides
extra protection against such situations.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-10 21:06:05 -05:00
Glauber Costa
8eb4e69053 database: remove unused parameter
We are no longer using the in_flight_seals gate, but forgot to remove it.
To guarantee that all seal operations will have finished when we're done,
we are using the memtable_flush_queue, which also guarantees order. But
that gate was never removed.

The FIXME code should also be removed, since such interface does exist now.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-10 21:05:54 -05:00
Glauber Costa
94e90d4a17 column_family: do not open code generation calculation
We already have a function that wraps this, re-use it.  This FIXME is still
relevant, so just move it there. Let's not lose it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-10 21:05:47 -05:00
Glauber Costa
46fdeec60a colum_family: remove mutation_count
We use memory usage as a threshold these days, and nowhere is _mutation_count
checked. Get rid of it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-10 21:05:47 -05:00
Gleb Natapov
16135c2084 make initialization run in a thread
While looking at initialization code I felt like my head is going to
explode. Moving initialization into a thread makes things a little bit
better. Only lightly tested.

Message-Id: <20160310163142.GE28529@scylladb.com>
2016-03-10 17:42:05 +01:00
Gleb Natapov
176aa25d35 fix developer-mode parameter application on SMP
I am almost sure we want to apply it once on each shard, and not multiple
times on a single shard.

Message-Id: <20160310155804.GB28529@scylladb.com>
2016-03-10 17:17:48 +01:00
Pekka Enberg
97bef4fb7c build: Fix http/http_response_parser.hh dependency
Make sure http_response.hh that is pulled by locator/ec2_snitch.hh is
built. The commit is similar to what commit 6ccf8f8 ("build: make sure
to ask seastar to build http/request_parser.hh, and depend on it") did
for request_parser.hh.

Fixes the following build error on CentOS:

  In file included from ./locator/ec2_multi_region_snitch.hh:41:0,
                   from locator/ec2_multi_region_snitch.cc:39:
  ./locator/ec2_snitch.hh:24:40: fatal error: http/http_response_parser.hh: No such file or directory

Spotted by Shlomi.
Message-Id: <1457612266-315-1-git-send-email-penberg@scylladb.com>
2016-03-10 14:46:41 +01:00
Gleb Natapov
51ca3122cf cleanup forward declaration for key types
Message-Id: <20160310075138.GC6117@scylladb.com>
2016-03-10 10:52:19 +01:00
Pekka Enberg
5dd1fda6cf main: Initialize system keyspace earlier
We start services like gossiper before system keyspace is initialized
which means we can start writing too early. Shuffle code so that system
keyspace is initialized earlier.

Refs #1014
Message-Id: <1457593758-9444-1-git-send-email-penberg@scylladb.com>
2016-03-10 10:39:27 +01:00
Pekka Enberg
f2f35a2f50 Merge "fix shutdown and improve logging" from Asias
"Fixes #1005 and probably fixes #1013."
2016-03-10 08:21:48 +02:00
Asias He
a9ec752939 streaming: Reduce STREAM_MUTATION error logging
There might be larger number of STREAM_MUTATION inflight. Log one error
per column_family per range to avoid spam the log.
2016-03-10 10:56:48 +08:00
Asias He
134b814cde gossip: Log status info when stopping gossip 2016-03-10 10:56:48 +08:00
Asias He
7c4c99d7c7 streaming: Fix a log level in get_column_family_stores
It is supposed to be debug level instead of info level.
2016-03-10 10:56:48 +08:00
Asias He
cb90ff2709 storage_service: Make decommission log info instead of debug level
The log is just a few lines. It is very useful to tell which step fails
in case of error when we do decommission.
2016-03-10 10:56:48 +08:00
Asias He
ed723665df gossip: Do not stop gossip more than once
If we do
   - Decommission a node
   - Stop a node
we will shutdown gossip more than once in:
   - storage_service::decommission
   - storage_service::drain_on_shutdown

Fix by checking if it is already stopped and back off if so.
2016-03-10 10:56:48 +08:00
Asias He
138c5f5834 storage_service: Do not stop messaging_service more than once
If we do
   - Decommission a node
   - Stop a node
we will shutdown messaging_service more than once in:
   - storage_service::decommission
   - storage_service::drain_on_shutdown

Fixes #1005
Refs  #1013

This fix a dtest failure in debug build.

update_cluster_layout_tests.TestUpdateClusterLayout.simple_decommission_node_1_test/

/data/jenkins/workspace/urchin-dtest/label/monster/mode/debug/scylla/seastar/core/future.hh:802:35:
runtime error: member call on null pointer of type 'struct
future_state'
core/future.hh:334:49: runtime error: member access within null
pointer of type 'const struct future_state'
ASAN:SIGSEGV
=================================================================
==4557==ERROR: AddressSanitizer: SEGV on unknown address
0x000000000000 (pc 0x00000065923e bp 0x7fbf6ffac430 sp 0x7fbf6ffac420
T0)
    #0 0x65923d in future_state<>::available() const
/data/jenkins/workspace/urchin-dtest/label/monster/mode/debug/scylla/seastar/core/future.hh:334
    #1 0x41458f1 in future<>::available()
/data/jenkins/workspace/urchin-dtest/label/monster/mode/debug/scylla/seastar/core/future.hh:802
    #2 0x41458f1 in then_wrapped<parallel_for_each(Iterator, Iterator,
Func&&)::<lambda(parallel_for_each_state&)> [with Iterator =
std::__detail::_Node_iterator<std::pair<const net::msg_addr,
net::messaging_service::shard_info>, false, true>; Func =
net::messaging_service::stop()::<lambda(auto:39&)> [with auto:39 =
std::unordered_map<net::msg_addr, net::messaging_service::shard_info,
net::msg_addr::hash>]::<lambda(std::pair<const net::msg_addr,
net::messaging_service::shard_info>&)>]::<lambda(future<>)>, future<>
> /data/jenkins/workspace/urchin-dtest/label/monster/mode/debug/scylla/seastar/core/future.hh:878
2016-03-10 10:56:48 +08:00
Tomasz Grabiec
838a038cbd log: Fix operator<<(std::ostream&, const std::exception_ptr&)
Attempt to print std::nested_exception currently results in exception
to leak outside the printer. Fix by capturing all exception in the
final catch block.

For nested exception, the logger will print now just
"std::nested_exception".  For nested exceptions specifically we should
log more, but that is a separate problem to solve.
Message-Id: <1457532215-7498-1-git-send-email-tgrabiec@scylladb.com>
2016-03-09 16:05:03 +02:00
Pekka Enberg
2566f8dc18 configure: Remove 'scylla_libs' variable
It's not actually used by anyone so drop it.
Message-Id: <1457531753-27891-2-git-send-email-penberg@scylladb.com>
2016-03-09 14:56:54 +01:00
Pekka Enberg
9bfb6a0c5b configure: Add boost date_time library as a dependency
It's needed to fix the debug build.
Message-Id: <1457531753-27891-1-git-send-email-penberg@scylladb.com>
2016-03-09 14:56:51 +01:00
Takuya ASADA
0ab3d0fd52 dist: use SEASTAR_IO instead of SCYLLA_IO
sync with iotune, fixes #1010

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1457530910-1273-1-git-send-email-syuu@scylladb.com>
2016-03-09 15:45:34 +02:00
Gleb Natapov
f242c6395c storage_proxy: add counter for retries reads
Message-Id: <20160309130453.GF2253@scylladb.com>
2016-03-09 14:09:42 +01:00
Pekka Enberg
ab502bcfa8 types: Implement to_string for timestamps and dates
The to_string() function is used for logging purpose so use boost
to_iso_extended_string() to format both timestamps and dates.

Fixes #968 (showstopper)
Message-Id: <1457528755-6164-1-git-send-email-penberg@scylladb.com>
2016-03-09 14:08:33 +01:00
Pekka Enberg
8eedaca948 Merge "streaming: handle cf is deleted" from Asias
"Fixes #979
 Fixes #976"
2016-03-09 14:52:27 +02:00
Asias He
3a4ea227d8 storage_service: Fix effective_ownership
Now, get_ranges_for_endpoint will unwrap the first range. With t0 t1 t2
t3, the first range (t3,t0] will be splitted as (min,t0] and (t3,max].
Skippping the range (t3,max] we will get the correct ownership number as
if the first range were not splitted.

Fixes #928
Message-Id: <2e30ebd53f3dba3cc5e0cf36d5541c354b0e30ca.1457506704.git.asias@scylladb.com>
2016-03-09 13:26:01 +01:00
Asias He
d9ead889f3 streaming: Handle cf is deleted when sending STREAM_MUTATION_DONE
In the preparation phase of streaming, we check that remote node has all
the cf_id which are needed for the entire streaming process, including the
cf_id which local node will send to remote node and wise versa.

So, at later time, if the cf_id is missing, it must be that the cf_id is
deleted. It is fine to ingore no_such_column_family exception. In this
patch, we change the code to ignore at server side to avoid sending the
exception back, to avoid handle exception in an IDL compatiable way.

One thing we can improve is that the sender might know the cf is deleted
later than the receiver does. In this case, the sender will send some
more mutations if we send back the no_such_column_family back to the
sender. However, since we do not throw exceptions in the receiver stream
mutation handler, it will not cause a lot of overhead, the receiver will
just ignore the mutation received.

Fixes #979
2016-03-09 16:50:38 +08:00
Asias He
efa74dbae0 streaming: Do not send if the cf is deleted
It is possible that a cf is deleted after we make the cf reader. Avoid
sending them to avoid the unnecessary overhead to send them on the wire and
the peer node to drop the received mutations.
2016-03-09 16:50:38 +08:00
Asias He
4abaacfc61 db: Introduce column_family_exists
It is cheaper than throwing a no_such_column_family exception to test if
a cf is gone, e.g., deleted.
2016-03-09 16:50:38 +08:00
Asias He
dca9e594cc streaming: Remove the unused test code
It is introduced in the early development of streaming. We have dtest
for streaming now, drop it.
Message-Id: <1457499303-21163-1-git-send-email-asias@scylladb.com>
2016-03-09 10:31:42 +02:00
Pekka Enberg
4f3d6977f1 Merge "Abort stream_session if peer is removed or restarted" from Asias
"Hook streaming with gossip callback so we can abort
the stream_session in such case:

- a node is restarted
- a node is removed from the cluster

Fixes #1001."
2016-03-09 10:18:42 +02:00
Nadav Har'El
2f56577794 sstables: more efficient read of compressed data file
Before this patch, reading large ranges from a compressed data file involved
two inefficiencies:

 1.  The compressed data file was read one compressed chunk at a time.
     Such a chunk is around 30 KB in size, well below our desired sstable
     read-ahead size (sstable_buffer_size = 128 KB).

 2.  Because the compressed chunks have variable length (the uncompressed
     chunk has a fixed length) they are not aligned to disk blocks, so
     consecutive chunks have overlapping blocks which were unnecessarily
     read twice.

The fix for both issues is to build the compressed_file_input_stream on
an existing file_input_stream, instead of using direct file IO to read the
individual chunks. file_input_stream takes care of doing the appropriate
amount of read-ahead, and the compressed_file_input_stream layer does the
decompression of the data read from the underlying layer.

Fixes #992.

Historical note: Implementing compressed_file_input_stream on top of
file_input_stream was already tried in the past, and rejected. The problem
at that time was that compressed_file_input_stream's constructor did not
specify the *end* of the range to read, so that when we wanted to read
only a small range we got too much read-ahead beyond the exactly one
compressed chunk that we needed to read.  Following the fix to issue #964,
we now know on every streaming read also the intended *end* of the stream,
so we can now use this to stop reading at the end of the last required
chunk, even when we use a read-ahead buffer much larger than a chunk.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1457304335-8507-1-git-send-email-nyh@scylladb.com>
2016-03-09 10:14:15 +02:00
Glauber Costa
8260b8fc6f touch CF directories during startup
We try to be robust against files disappearing (due to any kind of corruption)
inside the data directory.

But if the data directory itself goes missing, that's a situation that we don't
handle correctly.  We will keep accepting writes normally, but when we try to
flush the memtable to disk, we'll fail with a system error.

Having the CF directory disappearing is not a common thing. But it is also one
that we can easily protect against, by touching all CF directories we know
about on startup.

Fixes #999

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <ed66373dccca11742150a6d08e21ece3980227d3.1457379853.git.glauber@scylladb.com>
2016-03-09 09:06:51 +02:00
Asias He
bf3507d093 messaging_service: Stop retrying if node is removed from gossip
- Start a node
- Inject data
- Start another node to bootstrap
- Before the second node finishes streaming, kill the second node
- After a while the node will be removed from the cluster becusue it does
  not manage to join the cluster.
- At this time, messaging_service might keep retrying the
  stream_mutations unncessarily.

To fix, check if the peer node is still a known node in the gossip.
2016-03-09 07:35:20 +08:00
Asias He
1f3928c321 streaming: Hook streaming with gossip callback
If the peer node of a stream_session is restarted or removed we should
abort the streaming. It is better to hook gossip callback in the stream
manager than in each streamm_session.
2016-03-09 07:35:20 +08:00
Glauber Costa
2cd756ae5e repair: replace a magic number with another magic number
In due time we will have to fix this, but as an interim step, let's use
a "better" magic number.

The problem with 100, is that as soon as the partitions start to go bigger,
we're using too much memory. Since this is multiplied by the number of token
ranges, and happens in every shard, the final number can become really big,
and the amount of resources we use go up proportionally.

This means that even we are mistaken about the new number (we probably are),
in this case it is better to err on the side of a more conservative resource
usage.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <97158f3db5734916cee4ccf12eaa66e7402570bb.1457448855.git.glauber@scylladb.com>
2016-03-08 17:29:00 +02:00
Nadav Har'El
b7e29691c2 sstables: avoid index and data file over-reads
When we do a streaming read that knows the expected *end* position of the
read, we can use a large read-ahead buffer, and at the same time, stop
reading at exactly the intended end (or small rounding of it to the DMA
block size) and not waste resources blindly reading a large amount of data
after the end just to fill the read-ahead buffer.

The sstable reading code, both for reading the data file and the index file,
created a file input stream without specifiying its end, thereby losing
this optimization - so when a large buffer was used, we would get a large
over-read. This patch fixes this, so sstable data file and index file are
read using a file input stream which is a ware of its end.

Fixes #964.

Note that this patch does not change the behavior when reading a
*compressed* data file. For compressed read, we did not have the problem
of over-read in the first place, because chunks are read one by one.
But we do have other sources of inefficiencies there (stemming, again,
from the fact that the compressed chunks are read one by one), and I
opened a separate issue #992 for that.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1457219304-12680-1-git-send-email-nyh@scylladb.com>
2016-03-08 17:26:10 +02:00
Calle Wilund
8575f1391f lists.cc: fix update insert of frozen list
Fixes #967

Frozen lists are just atomic cells. However, old code inserted the
frozen data directly as an atomic_cell_or_collection, which in turn
meant it lacked the header data of a cell. When in turn it was
handled by internal serialization (freeze), since the schema said
is was not a (non-frozen) collection, we tried to look at frozen
list data as cell header -> most likely considered dead.
Message-Id: <1457432538-28836-1-git-send-email-calle@scylladb.com>
2016-03-08 13:48:45 +01:00
Pekka Enberg
81af486b69 Update scylla-ami submodule
* dist/ami/files/scylla-ami d4a0e18...84bcd0d (1):
  > Add --ami parameter
2016-03-08 13:49:31 +02:00
Takuya ASADA
254b0fa676 dist: show message to use XFS for scylla data directory and also notify about developer mode, when iotune fails
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1457426286-15925-1-git-send-email-syuu@scylladb.com>
2016-03-08 12:20:33 +02:00
Pekka Enberg
83d82ea901 Merge "Fix Ubuntu package issues on AMI" from Takuya
"This fixes bugs on Ubuntu package and AMI scripts, closes #991."
2016-03-08 11:51:30 +02:00
Takuya ASADA
18a27de3c8 dist: export all entries on /etc/default/scylla-server on Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-08 18:18:30 +09:00
Gleb Natapov
ce6d1a242a storage_proxy: fix background_reads counter
background_reads collectd counter was not always properly decremented.
Fix it and streamline background read repair error handling.

Message-Id: <20160307182255.GI4849@scylladb.com>
2016-03-07 19:41:09 +01:00
Yoav Kleinberger
1cd01cd2ab tools/scyllatop: defend against curses "out of screen bounds" error
Fixes issue #945 (hopefully)
This issue was probably the result of trying to write outside the
confines of the window. The views.Base class now defends against this.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <9735806b211567f3239e187d87437c484f532291.1457265435.git.yoav@scylladb.com>
2016-03-07 18:02:26 +01:00
Raphael S. Carvalho
0f4239d63a service: improve logging of storage_service::load_new_sstables
Closes #952.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <2402f387c32d2d1221e740edb67e56c1593c1936.1457366098.git.raphaelsc@scylladb.com>
2016-03-07 18:01:52 +01:00
Raphael S. Carvalho
e850c1406e sstables: update comment
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <8abc1c6c66ed8d3bb35ecfb6d8251de3f61a97ae.1457093016.git.raphaelsc@scylladb.com>
2016-03-07 17:36:34 +01:00
Raphael S. Carvalho
822759eee0 compaction_manager: update stat pending_tasks properly
Size of both _cfs_to_cleanup and _cfs_to_compact must be added when
calculating a new value to _stats.pending_tasks.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <b601e24d0631922798575f39d00fb54fe00d4971.1457093016.git.raphaelsc@scylladb.com>
2016-03-07 17:36:03 +01:00
Gleb Natapov
2d092bbd32 storage_proxy: send read requests with timeout
No need to wait for replies long after request is timed out.
Message-Id: <1457351304-28721-2-git-send-email-gleb@scylladb.com>
2016-03-07 14:00:11 +01:00
Gleb Natapov
4122422d19 storage_proxy: always wait for digest read resolver done future
Currently it is waited upon only if background read repair check is
needed and this cause unhandled exception warning to be printed if
it enters failed state. Fix this by always waiting on it, but doing
anything beyond ignoring an exception only if check is needed.
Message-Id: <1457351304-28721-1-git-send-email-gleb@scylladb.com>
2016-03-07 14:00:09 +01:00
Gleb Natapov
626c9d046b fix EACH_QUORUM handling during bootstrapping
Currently write acknowledgements handling does not take bootstrapping
node into account for CL=EACH_QUORUM. The patch fixes it.

Fixes #994

Message-Id: <20160307121620.GR2253@scylladb.com>
2016-03-07 13:56:34 +01:00
Raphael S. Carvalho
d65642cee8 fix storage_service::load_new_sstables() to not disable write permanently
Avi says:
"If an exception happens, then enable_sstable_writes won't be called."

The problem is fixed by catching a possible exception and enabling sstable
write for the relevant column family if it wasn't enabled already.

Closes #953.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <32c1bcb2c60c7b9e5514eb0a95062f40ca92093a.1457119308.git.raphaelsc@scylladb.com>
2016-03-07 13:56:02 +01:00
Gleb Natapov
f59415b3c6 Take pending endpoints into account while checking for sufficient live nodes
During bootstrapping additional copies of data has to be made to ensure
that CL level is met (see CASSANDRA-833 for details). Our code does
that, but it does not take into account that bootstraping node can be
dead which may cause request to proceed even though there is no
enough live nodes for it to be completed. In such a case request neither
completes nor timeouts, so it appear to be stuck from CQL layer POV. The
patch fixes this by taking into account pending nodes while checking
that there are enough sufficient live nodes for operation to proceed.

Fixes #965

Message-Id: <20160303165250.GG2253@scylladb.com>
2016-03-07 13:30:13 +01:00
Gleb Natapov
8dad399256 log: add space between log level and date in the outpu
It was dropped by 6dc51027a3

Message-Id: <20160306125313.GI2253@scylladb.com>
2016-03-07 13:06:06 +01:00
Tomasz Grabiec
9deb036e4e Merge branch 'dev/issue-845-set-incremental-backup-config-v1' from seastar-dev.git
From Vlad:

This series modifies the 'database' class to use the internal
_enable_incremental_backups value (initialized with
'incremental_backups' configuration value) instead of using the
'incremental_backups' configuration value directly.

Then we update this internal value in runtime from 'nodetool
enable/disablebackup' API callback so that newly created keyspaces and
column families use the newly configured incremental backup
configuration.
2016-03-07 10:47:20 +01:00
Tomasz Grabiec
b3e56549ca Merge branch 'dev/issue-909-synchronization-part-v2' from seatar-dev.git
From Vlad:

This series fixes the first part of issue #909 (the second part has a
separate github issue #965) which is a discrepancy between a
storage_service::token_metadata and a gossiper::endpoint_state_map
contents on non-zero shards.
2016-03-07 10:20:15 +01:00
Paweł Dziepak
99b61d3944 lsa: set _active to nullptr in region destructor
In region destructor, after active segments is freed pointer to it is
left unchanged. This confuses the remaining parts of the destructor
logic (namely, removal from region group) which may rely on the
information in region_impl::_active.

In this particular case the problem was that code removing from the
region group called region_impl::occupancy() which was
dereferencing _active if not null.

Fixes #993.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1457341670-18266-1-git-send-email-pdziepak@scylladb.com>
2016-03-07 10:15:28 +01:00
Takuya ASADA
9ee14abf24 dist: export sysconfig for scylla-io-setup.service
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-07 18:13:30 +09:00
Takuya ASADA
3d9dc52f5f Revert "Revert "dist: align ami option with others (-a --> --ami)""
This reverts commit 66c5feb9e9.

Conflicts:
	dist/common/scripts/scylla_sysconfig_setup

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-07 18:13:30 +09:00
Takuya ASADA
c9882bc2c4 Revert "Revert "Revert "dist: remove AMI entry from sysconfig, since there is no script refering it"""
This reverts commit 643beefc8c.

Conflicts:
	dist/common/scripts/scylla_sysconfig_setup
	dist/common/sysconfig/scylla-server

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-07 17:15:42 +09:00
Takuya ASADA
c888eaac74 dist: add /etc/scylla.d/io.conf on Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-03-07 17:15:42 +09:00
Vlad Zolotarov
2cd836a02e api::set_storage_service(): fix the 'nodetool enablebackup' API
'nodetool enable/disablebackup' callback was modifying only the
existing keyspaces and column families configurations.
However new keyspaces/column families were using
the original 'incremental_backups' configuration value which could
be different from the value configured by 'nodetool enable/disablebackup'
user command.

This patch updates the database::_enable_incremental_backups per-shard
value in addition to updating the existing keyspaces and column families
configurations.

Fixes #845

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-03-06 17:26:31 +02:00
Vlad Zolotarov
a45ecaf336 database: store "incremental backup" configuration value in per-shard instance
Store the "incremental_backups" configuration value in the database
class (and use it when creating a keyspace::config) in order to be
able to modify it in runtime.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-03-06 17:22:48 +02:00
Vlad Zolotarov
87e6efcdab storage_service: distribute gossiper::endpoint_state_map together with token_metadata
If storage_service::token_metadata is not distributed together with
gossiper::endpoint_state_map there may be a situation when a non-zero
shard sees a new value in token_metadata (e.g. newly added node's
token ranges) while still seeing an old gossiper::endpoint_state_map
contents (e.g. a mentioned above newly added node may not be present,
thus causing gossiper::is_alive() to return FALSE for that node, while
the node is actually alive and kicking).

To avoid this discrepancy we will always update a token_metadata together
with an endpoint_state_map when we distribute new token_metadata data
among shards.

Fixes #909

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-03-06 13:15:19 +02:00
Vlad Zolotarov
3a72ef87f2 gossiper: make _shadow_endpoint_state_map public and rename
We will need to access it from a storage_service class when replicate
token_metadata.

Rename _shadow_endpoint_state_map -> shadow_endpoint_state_map
according to our coding convention.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-03-06 11:16:44 +02:00
Vlad Zolotarov
4a21d48cc5 gossiper: use a semaphore instead of a future<> for serializing a timer callback
Use a semaphore to allow serializing with a gossiper's timer callback.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-03-06 11:16:44 +02:00
Takuya ASADA
6dc51027a3 log: make log.cc able to compile with g++-4.9
std::put_time() is not implemented on g++-4.9, so replace it with strftime().
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1457024183-893-1-git-send-email-syuu@scylladb.com>
2016-03-04 12:48:43 +01:00
Avi Kivity
6c2e57b003 Merge seastar upstream
* seastar ba615c7...906b562 (1):
  > rpc: prepare some more for feature negotiation
2016-03-03 18:22:57 +02:00
Gleb Natapov
b89b6f442b storage_proxy: fix race between read cl completion and timeout in digest resolver
If timeout happens after cl promise is fulfilled, but before
continuation runs it removes all the data that cl continuation needs
to calculate result. Fix this by calculating result immediately and
returning it in cl promise instead of delaying this work until
continuation runs. This has a nice side effect of simplifying digest
mismatch handling and making it exception free.

Fixes #977.

Message-Id: <1457015870-2106-3-git-send-email-gleb@scylladb.com>
2016-03-03 16:48:28 +02:00
Gleb Natapov
e4ac5157bc storage_proxy: store only one data reply in digest resolver.
Read executor may ask for more than one data reply during digest
resolving stage, but only one result is actually needed to satisfy
a query, so no need to store all of them.

Message-Id: <1457015870-2106-2-git-send-email-gleb@scylladb.com>
2016-03-03 16:47:53 +02:00
Gleb Natapov
69b61b81ce storage_proxy: fix cl achieved condition in digest resolver timeout handler
In digest resolver for cl to be achieved it is not enough to get correct
number of replies, but also to have data reply among them. The condition
in digest timeout does not check that, fortunately we have a variable
that we set to true when cl is achieved, so use it instead.

Message-Id: <1457015870-2106-1-git-send-email-gleb@scylladb.com>
2016-03-03 16:47:11 +02:00
Tomasz Grabiec
2abd62b5cb bytes_ostream: Drop methods which serialize integers
This will make bytes_ostream completely agnostic to serialization
format, which should be determined by layer above it.

Message-Id: <1457004221-8345-2-git-send-email-tgrabiec@scylladb.com>
2016-03-03 13:27:27 +02:00
Tomasz Grabiec
aaac2a3cec serializer: Add missing include
Message-Id: <1457004221-8345-1-git-send-email-tgrabiec@scylladb.com>
2016-03-03 13:27:22 +02:00
Pekka Enberg
9c930d88a0 db/system_keyspace: Remove ifdef'd code
We have our implementations of all the three ifdef'd functions.

Message-Id: <1456926917-12594-1-git-send-email-penberg@scylladb.com>
2016-03-03 12:26:50 +02:00
Takuya ASADA
da56325f69 configure.py: add support --static-stdc++ for seastar binaries (iotune)
Ubuntu 14.04LTS package is broken now because iotune does not statically linked against libstdc++, so this patch fixed it.
Requires seastar patch to add --static-stdc++ on configure.py.

Fixes #982

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456995050-22007-1-git-send-email-syuu@scylladb.com>
2016-03-03 12:18:47 +02:00
Avi Kivity
d4c92c7e27 Merge seastar upstream
* seastar b3fc7c5...ba615c7 (1):
  > configure.py: add --static-stdc++ to link libstdc++ statically
2016-03-03 12:18:23 +02:00
Asias He
01cb6b0d42 gossip: Send syn message in parallel and do not wait for it
1) As explained in commit 697b16414a (gossip: Make gossip message
handling async), in each gossip round we can make talking to the 1-3
peer nodes in parallel to reduce latency of gossip round.

2) Gossip syn message uses one way rpc message, but now the returned
future of the one way message is ready only when message is dequeued for
some reason (sent or dropped). If we wait for the one way syn messge to
return it might block the gossip round for a unbounded time. To fix, do
not wait for it in the gossip round. The downside is there will be no
back pressure to bound the syn messages, however since the messages are
once per second, I think it is fine.
Message-Id: <ea4655f121213702b3f58185378bb8899e422dd1.1456991561.git.asias@scylladb.com>
2016-03-03 11:17:50 +02:00
Takuya ASADA
e545013e47 Revert "dist: downgrade g++ to 4.9 on Ubuntu"
This reverts commit 01bd4959ac.

Fixes #983

Conflicts:
	dist/ubuntu/build_deb.sh
	dist/ubuntu/control.in
	dist/ubuntu/rules.in

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456996244-19889-1-git-send-email-syuu@scylladb.com>
2016-03-03 11:12:18 +02:00
Tomasz Grabiec
04f2482d74 schema_tables: Log results of schema merge
Currently schema changes are only logged at coordinator node which
initiates the change. It would be helpful in post morten analysis to
also see when and how schema changes are resolved when applied on
other nodes.
Message-Id: <1456953095-1982-1-git-send-email-tgrabiec@scylladb.com>
2016-03-03 11:12:15 +02:00
Nadav Har'El
2cf09147b5 Repair: don't use freeze() to calculate mutation checksums
Use the existing "feed_hash" mechanism to find a checksum of the
content of a mutation, instead of serializing the mutation (with freeze())
and then finding the checksum of that string.

The serialized form is more prone to future changes, and not really
guaranteed to provide equal hashes for mutations which are considered
"equal".

Fixes #971

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1456958676-27121-1-git-send-email-nyh@scylladb.com>
2016-03-03 09:58:24 +01:00
Avi Kivity
bec30ccf25 build: add order-only dependency between building antlr .o and IDL headers
This ensures that if an antlr generated .cpp file depends on an
IDL-generated .hh file, then that .hh is generated before the .o is
built.
2016-03-03 09:52:25 +02:00
Tomasz Grabiec
b42d3a90b3 cql3: create_table_statement: Sort _defined_names by text
Currently they are sorted by address in memory, which breaks the
check for column name duplicates, which assumes sorting by text.

Fixes #975.

Message-Id: <1456937400-20475-1-git-send-email-tgrabiec@scylladb.com>
2016-03-02 18:53:43 +02:00
Avi Kivity
dda77d14b9 Merge seastar upstream
* seastar 9964cbf...b3fc7c5 (2):
  > Introduce util/indirect.hh
  > reactor: new counters for the io queue
2016-03-02 18:52:36 +02:00
Calle Wilund
0c3322befd commitlog: Ensure segment survives whole flush call
Must keep shared pointer alíve.
Likewise though, the shared pointer copy in cycle main continuation
is not needed.

Message-Id: <1456931988-5876-3-git-send-email-calle@scylladb.com>
2016-03-02 18:22:13 +02:00
Calle Wilund
f1c4e3eb3d commitlog: Clear reserve segments in orphan_all
Otherwise they will keep the segment_manager alive (leak).
Fixes jenkins ASan errors.

Message-Id: <1456931988-5876-2-git-send-email-calle@scylladb.com>
2016-03-02 18:22:09 +02:00
Calle Wilund
a556f665c0 commitlog: Take segment_manager locks first in write/flush
While is is formally better to take a local lock first and
then first contend for a global, in this case it is arguably
better to ensure we get a gate exception synchronously (early)
instead of potentially in a continuation. Old version might
cause us to do a gate::leave even while never entered.

And since we should really only have one active (contending)
segment per shard anyway, it should not matter.

Message-Id: <1456931988-5876-1-git-send-email-calle@scylladb.com>
2016-03-02 18:22:05 +02:00
Calle Wilund
e79ca557ed managed_bytes: Change init of small object to silence error on gcc5
Fixes #865

(Some) gcc 5 (5.3.0 for me) on ubuntu will generate errors on
compilation of this code (compiling logalloc_test). The memcpy
to inline storage seems to confuse the compiler.
Simply change to std::copy, which shuts the compiler up.
Any decent stl should convert primitive std::copy to memcpy
anyway, but since it is also the inline (small storage),
it should not matter which way.

Message-Id: <1456931988-5876-4-git-send-email-calle@scylladb.com>
2016-03-02 18:21:51 +02:00
Pekka Enberg
6d7e14a53a Merge "Implement describe_schema_versions" from Paweł
"This series implements describe_schema_versions so that we nodetool
 describecluster can return proper schema information for the whole
 cluster. It involves adding new verb SCHEMA_CHECK which is used to get
 schema version for a given node and a simple map-reduce that using that
 verb gets info from the whole cluster.

 This fixes #677, fixes #684, and fixes #472."
2016-03-02 16:02:53 +02:00
Paweł Dziepak
5396042f06 api: use proper describe_schema_versions implementation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 12:49:55 +00:00
Paweł Dziepak
723b3ae7ed storage_service: implement describe_schema_versions
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 12:49:55 +00:00
Paweł Dziepak
b5eee2e5d4 gms: add inet_address::to_sstring()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 12:49:55 +00:00
Paweł Dziepak
ca68c36c8c storage_proxy: handle SCHEMA_CHECK verb
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 12:49:54 +00:00
Paweł Dziepak
b92f8a6d2b messaging_service: add SCHEMA_CHECK verb
SCHEMA_CHECK is used to get node schema version.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 12:49:54 +00:00
Tomasz Grabiec
9a5d7c6388 log: Prepend log lines with timestamp when printed to stdout
Useful for determining order of events in logs of different nodes, or
for estimating how much time passed between two events.

Fixes #941.

Example log:

INFO  2016-03-01 18:30:37,688 [shard 0] gossip - Waiting for gossip to settle before accepting client requests...
INFO  2016-03-01 18:30:45,689 [shard 0] gossip - No gossip backlog; proceeding
INFO  2016-03-01 18:30:45,689 [shard 0] storage_service - Starting listening for CQL clients on localhost:9042...

Message-Id: <1456853532-28800-1-git-send-email-tgrabiec@scylladb.com>
2016-03-02 13:49:39 +02:00
Avi Kivity
431e1fd379 Merge "Drop db::serializer<>s" from Paweł
"This series removes old-style db::serializer<>s which were replaced by
the IDL-based serialization."
2016-03-02 13:16:36 +02:00
Asias He
a41bcad585 storage_service: Fix run with api lock
Start with coarse control:

1) converting the run_with_write_api_lock operations:

join_ring, start_gossiping, stop_gossiping, start_rpc_server,
stop_rpc_server, start_native_transport, stop_native_transport,
decommission, remove_node, drain, move, rebuild

to use run_with_api_lock which uses a flag to indicate current operation
in progress.

If one of the above operation is in progress when admin issues another
opeartion we return a "try again" exception to avoid running two
operations in parallel.

2) converting the run_with_read_api_lock to use no lock.

Fixes #850.

Message-Id: <00782b601028ed87437e5decae382f72dff634f6.1456758391.git.asias@scylladb.com>
2016-03-02 11:32:02 +02:00
Paweł Dziepak
d50594351b db: remove old-style serializers
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 09:09:30 +00:00
Paweł Dziepak
bdc23ae5b5 remove db/serializer.hh includes
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 09:07:09 +00:00
Paweł Dziepak
53858ed9cd keys: remove old-style serializers
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 09:05:25 +00:00
Paweł Dziepak
e1a4b992c5 mutation_partition_serializer: remove read() and read_as_view()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 09:04:02 +00:00
Tomasz Grabiec
4a4d288bba query_pagers: Fix dereference of potentially disengaged _last_ckey optional
Message-Id: <1456855674-1984-3-git-send-email-tgrabiec@scylladb.com>
2016-03-02 10:49:15 +02:00
Tomasz Grabiec
307c7676da to_string: Make std::experimental::optional printable
Message-Id: <1456855674-1984-2-git-send-email-tgrabiec@scylladb.com>
2016-03-02 10:49:14 +02:00
Takuya ASADA
6ae41a71c9 dist: fix initctl start scylla-server failed on Ubuntu
scylla_io_setup is executed via sudo, so we need to add it to sudoers

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456906634-14504-1-git-send-email-syuu@scylladb.com>
2016-03-02 10:36:47 +02:00
Tomasz Grabiec
4279ab40c5 cql_serialization_format: Print version as integer instead of char
Currently prints ^C instead of 3.

Message-Id: <1456856287-3681-1-git-send-email-tgrabiec@scylladb.com>
2016-03-01 20:47:48 +02:00
Tomasz Grabiec
f4a86729f9 query: Move implementaion of result_merger to .cc file
Message-Id: <1456855396-1563-1-git-send-email-tgrabiec@scylladb.com>
2016-03-01 20:06:42 +02:00
Tomasz Grabiec
d0ae2e3607 idl-compiler: Make sub-streams stored in views be properly bounded
Currently when reading a view to an object the stream stored has the
same bound as the containing stream, not the bounds of the object
itself. The serializer of the view assumes that the stream has the
bounds of the object itself.

Fixes dtest failure in
paging_test.py:TestPagingSize.test_undefined_page_size_default

Fixes #963.

Message-Id: <1456854556-32088-1-git-send-email-tgrabiec@scylladb.com>
2016-03-01 19:50:42 +02:00
Calle Wilund
e667dcc3d0 commitlog: Make segment->segment_manager relation shared pointer
The segment->segment_manager pointer has, until now, been a raw pointer,
which in a way is sensible, since making circular shared pointer
relations is in general bad. However, since the code and life cycle
of segments has evolved quite a bit since that initial relation
was defined, becoming both more and then suddenly, in a sense,
less, asynchronous over time, the usage of the relation is in fact
more consistent with a shared pointer, in that a segment needs to
access its manager to properly do things like write and flush.

These two ops in particular depend on accessing the segment manager
in a way that might be fine even using raw pointers, if it was not
again for that little annoying thing of continuation reordering.

So, lets just make the relation a shared pointer, solving the issue
of whether the manager is alive when a segment accesses it. If it
has been "released" (shut down), the existing mechanisms (gate)
will then trigger and prevent any actual _actions_ from taking
place. And we don't have to complicate anything else even more.

Only "big" change is that we need to explicitly orphan all
segments in commitlog destructor (segment_manager is essentially
a p-impl).

This fixes some spurious crashes in nightly unit tests.

Fixes #966.

Message-Id: <1456838735-17108-1-git-send-email-calle@scylladb.com>
2016-03-01 16:48:28 +02:00
Pekka Enberg
3a6d43c784 cql3: Fix duplicate column definition check
We cannot use shared_ptr *instances* for checking duplicate column
definitions because they are never equal. Store column definition name
in the unordered_map instead.

Fixes cql_additional_tests.py:TestCQL.identifier_test.

Spotted by Shlomi.

Message-Id: <1456840506-13941-1-git-send-email-penberg@scylladb.com>
2016-03-01 16:46:33 +02:00
Asias He
50bf65db8d streaming: Fix keep alive timer progress checking
When the first time the keep alive timer fires, the _last_stream_bytes
btyes will be zero since it is the first time we update it. The keep
alive timer will be rearmed and fired again. The second time, we find
there is no progress, we close the session. The total idle time will be
2 * keep alive timer.

To make the idle time to close the session be more precise, we reduce
the interval to check the progess and close the session by checking last
time the progress is made.

Message-Id: <c959cffce0cc738a3d73caaf71d2adb709d46863.1456831616.git.asias@scylladb.com>
2016-03-01 16:46:08 +02:00
Paweł Dziepak
92f9c9428e cql3: don't insert row marker if schema is_cql3_table()
Checking schema::is_dense() is not enough to know whether row marker
should be inserted or not as there may be compact storage tables that
are not considered dense (namely, a table with now clustering key).

Row marker should only be insterted if schema::is_cql3_table() is true.

Fixes #931.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1456834937-1630-1-git-send-email-pdziepak@scylladb.com>
2016-03-01 13:29:53 +01:00
Paweł Dziepak
6a6c12f8c4 tests/commitlog: use unaligned_cast instead of reinterpret_cast
corrupt_segment() is meant to write some garbage at arbitrary position
in the commitlog segment. That position is not necessairly properly
aligned for uint32_t.

Silences ubsan complaints about unaligned write.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1456827726-21288-1-git-send-email-pdziepak@scylladb.com>
2016-03-01 12:57:06 +02:00
Takuya ASADA
fd7eb7d1e5 dist: add support scylla_io_setup for Ubuntu
Unlike CentOS/Fedora, scylla_io_setup is calling from pre-start section of scylla-server upstart job, not from separated job.
This is because Upstart does not provide same behavior as After / Requires directives on systemd.

Fixes #954.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456825805-4195-1-git-send-email-syuu@scylladb.com>
2016-03-01 12:56:44 +02:00
Avi Kivity
e295b9b4e4 Merge 2016-03-01 09:52:04 +02:00
Amnon Heiman
1c7bc28d35 idl-compiler: change optional vector implementation
This patch change the way optional vector are implemented.

Now a vector of optional would be handle like any other non primitive
types, with a single method add() that would return a writer to the
optional.

The writer to the optional would have a skip and write method like
simple optional field.

For basic types the write method would get the value as a parameter, for
composite type, it would return a writer to the type.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1456796143-3366-2-git-send-email-amnon@scylladb.com>
2016-03-01 09:41:30 +02:00
Raphael S. Carvalho
34ed930aa4 sstables: fix lack of accuracy in disk usage report
To report disk usage, scylla was only taking into account size of
sstable data component. Other components such as index and filter
may be relatively big too. Therefore, 'nodetool status' would
report an innacurate disk usage. That can be fixed by taking into
account size of all sstable components.

Fixes #943.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <08453585223570006ac4d25fe5fb909ad6c140a5.1456762244.git.raphaelsc@scylladb.com>
2016-03-01 08:58:42 +02:00
Paweł Dziepak
e194835d8a tests/idl: add test for stdx::optional<> serialization
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1456761055-23916-1-git-send-email-pdziepak@scylladb.com>
2016-02-29 18:12:59 +02:00
Paweł Dziepak
dec63eac6e commitlog: add commitlog entry move constructor
Default move constructor and assignment didn't handle reference to
mutation (_mutation) properly.

Fixes #935.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1456760905-23478-1-git-send-email-pdziepak@scylladb.com>
2016-02-29 18:10:15 +02:00
Calle Wilund
0de8f6d24f cql_test_env: Shutdown auth on test stop
Ensures no spurious timer tasks tries to touch stopped distributed
objects.

Message-Id: <1456753987-6914-4-git-send-email-calle@scylladb.com>
2016-02-29 16:06:33 +02:00
Calle Wilund
fafcb8cc1e storage_service: Explicitly shutdown "auth" on system drain
I.e. cancels auth object setup tasks if not already run.

Message-Id: <1456753987-6914-3-git-send-email-calle@scylladb.com>
2016-02-29 16:06:30 +02:00
Calle Wilund
2ba738b555 auth: make scheduled tasks explicity cancellable
Adds a shutdown method. In this, explicitly cancels all waiting tasks
(all two!).

Message-Id: <1456753987-6914-2-git-send-email-calle@scylladb.com>
2016-02-29 16:06:25 +02:00
Calle Wilund
dc136a6a1c commitlog: Fix reserve counter overflow
Fixes #482

See code comment. Reserve segment allocation count sum can temporarily
overflow due to continuation delay/reordering, if we manage to reach the
on_timer code before finally clauses from previous reserve allocation
invocation has processed. However, since these are benign overflows
(just indicating even more that we don't need to do anything right now)
simply capping the count should be fine.
Avoids assert in boost irange.

Message-Id: <1456740679-4537-1-git-send-email-calle@scylladb.com>
2016-02-29 14:56:24 +02:00
Avi Kivity
5cc1b39cc9 Merge "Store gossip generation in system table" from Asias
"Kill one FIXME."
2016-02-29 14:53:06 +02:00
Avi Kivity
0bababedc3 Merge "Fix scylla-io-setup.service" from Takuya
"This patchset fixes #950, run scylla-io-setup before scylla-server on anycase, and installs example /etc/scylla.d/io.conf by default to prevent error on 'EnvironmentFile=/etc/scylla.d/*.conf'."
2016-02-29 14:14:42 +02:00
Takuya ASADA
6e55ed96d6 dist: add user-defined prefix for AMI name
With this change, you can define your own prefix of AMI name in variable.json.

example:
{
	"access_key": "xxx",
	"secret_key": "xxx",
	"subnet_id": "xxx",
	"security_group_id": "xxx",
	"region": "us-east-1",
	"associate_public_ip_address": "true",
	"instance_type": "c4.xlarge",
	"ami_prefix": "takuya-"
}

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456329247-5109-1-git-send-email-syuu@scylladb.com>
2016-02-29 13:49:11 +02:00
Takuya ASADA
48d72c01d1 dist: don't run iotune on developer mode 2016-02-29 20:13:46 +09:00
Takuya ASADA
e8a107de43 dist: install example io.conf by default
Prevent error on 'EnvironmentFile=/etc/scylla.d/*.conf'.
Parameters are commented out, and the file will replace when scylla starts, by scylla-io-setup.service.
2016-02-29 20:12:54 +09:00
Avi Kivity
69fdbf6a6e Merge "Use IDL for query results" from Tomek
"The series includes Amnon's unmerged support for optional<> in idl-compiler.

Depends on seastar patch "[PATCH seastar] simple_input_stream: Introduce begin()".

The query result footprint for cassandra-stress mutation as reported
by tests/memory-footprint increased by 18% from 285 B to 337 B.

perf_simple_query shows slight regression in throughput (-8%):

  build/release/tests/perf/perf_simple_query -c4 -m1G --partitions 100000

Before: ~433k tps
After:  ~400k tps"
2016-02-29 12:52:44 +02:00
Takuya ASADA
a281b10210 dist: run scylla-io-setup.service before scylla-server.service on anycase
We needed "systemctl enable scylla-io-setup.service; systemctl start scylla-server.service" to launch scylla-io-setup.service before scylla-server.service.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-29 18:59:10 +09:00
Takuya ASADA
11a616d4d6 dist: add scyllaio-setup.service for %systemd_post and %systemd_preun on .rpm
This is needed for initialize the service correctly

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-29 18:59:10 +09:00
Pekka Enberg
3919878a32 service/storage_service: Use logger for CQL listening report
Message-Id: <1456739417-11909-1-git-send-email-penberg@scylladb.com>
2016-02-29 11:52:06 +02:00
Avi Kivity
a1ff21f6ea main: sanity check cpu support
We require SSE 4.2 (for commitlog CRC32), verify it exists early and bail
out if it does not.

We need to check early, because the compiler may use newer instructions
in the generated code; the earlier we check, the lower the probability
we hit an undefined opcode exception.

Message-Id: <1456665401-18252-1-git-send-email-avi@scylladb.com>
2016-02-29 11:41:54 +02:00
Asias He
cc1e1a567c storage_service: Make replace-node error msg more friendly
Before:

ERROR [shard 0] storage_service - Format of host-id =
marshal_exception (marshalling error) is incorrect ???
Exiting on unhandled exception of type 'marshal_exception': marshalling error

After:

ERROR [shard 0] storage_service - Unable to parse 127.0.0.3 as host-id
Exiting on unhandled exception of type 'std::runtime_error': Unable to
parse 127.0.0.3 as host-id

Message-Id: <1456737987-32353-1-git-send-email-asias@scylladb.com>
2016-02-29 11:40:13 +02:00
Asias He
e36a99ef23 storage_service: Do not take api lock for get_load_map
It is used by

nodetool status

If an api operation inside storage_service takes a long time to finish
, which holds the lock, it will block nodetool status for a long time.

I think it is safe to get the load map even if other operations are in-flight.

Refs: #850

Message-Id: <1456737987-32353-2-git-send-email-asias@scylladb.com>
2016-02-29 11:39:09 +02:00
Takuya ASADA
84447fd7b0 dist: permission error fixed on scylla_io_setup
Can't be scylla user since we are writing to /etc

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456330209-5828-1-git-send-email-syuu@scylladb.com>
2016-02-29 11:35:38 +02:00
Asias He
1061bf0854 storage_service: Use increment_and_get_generation to get generation
The gossip generation is now stored in system.local table.

Before:
cqlsh> SELECT gossip_generation from system.local;

 gossip_generation
-------------------
              null

(1 rows)

After:
cqlsh> SELECT gossip_generation from system.local;

 gossip_generation
-------------------
        1456733559

(1 rows)
2016-02-29 16:31:42 +08:00
Asias He
abafec99a5 system_keyspace: Implement increment_and_get_generation 2016-02-29 16:31:42 +08:00
Gleb Natapov
22d2b9a2dc Yield execution in mutation_result_merger
mutation_result_merger::get can run for a long time. Make it yield
execution from time to time.

Message-Id: <1456674046-14502-1-git-send-email-gleb@scylladb.com>
2016-02-28 17:55:33 +02:00
Avi Kivity
182e6eb89b Merge seastar upstream
* seastar fbb4b01...9964cbf (4):
  > Allow map_reduce reducer to return future
  > Workaround for gcc 4.9 optional bug
  > add convert() to future<> futurizer specification
  > tests: fix rpc_test build
2016-02-28 17:55:03 +02:00
Gleb Natapov
32e9f1ecd4 Fix read_timeouts storage_proxy counter
Read timeouts are not counted now. The patch fixes it.

Message-Id: <20160228133315.GN6705@scylladb.com>
2016-02-28 15:34:42 +02:00
Avi Kivity
31b42a2574 Merge seastar upstream
* seastar 769cb8b...fbb4b01 (5):
  > simple_input_stream: Introduce begin()
  > tests: add rpc unit testing
  > tests: add loopback sockets
  > packet: introduce release() and release_into()
  > temporary_buffer: add make_copy() named constructor
2016-02-28 15:32:38 +02:00
Yoav Kleinberger
651aa06c32 tools/scyllatop: fix mistake in help message
Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <ae5f11d7df954dc7561db94cda2d73bd8233f1e5.1456510513.git.yoav@scylladb.com>
2016-02-28 12:42:12 +02:00
Avi Kivity
ed365c2779 Merge "Fix row_cache::update()" from Tomasz
"Fixes recent regression in row_cache_test.cc:test_update_failure"
2016-02-28 11:17:32 +02:00
Pekka Enberg
81bc5dab77 Merge "streaming progress info fix" from Asias
"This series:

1) Log total bytes sent/recevied when a stream plan completes.
It is useful in test code.

2) Fix http://scylla_ip:10000/stream_manager API"
2016-02-27 16:13:04 +02:00
Tomasz Grabiec
3997421b2c row_cache: Let the cleanup guard do invalidation of unmerged partitions 2016-02-26 16:57:31 +01:00
Tomasz Grabiec
aa15268249 row_cache: Delete the entry even if invalidation failed
Otherwise we will leak it, and region destructor will fail:

row_cache_test: utils/logalloc.cc:1211: virtual logalloc::region_impl::~region_impl(): Assertion `seg->is_empty()' failed.

Fixes regression in row_cache_test.
2016-02-26 16:57:31 +01:00
Tomasz Grabiec
be24816c8a row_cache: Clear partitions with region locked
Since invalidate() may allocate, we need to take the region lock to
keep m.partitions references valid around whole clear_and_dispose(),
which relies on that.
2016-02-26 16:57:31 +01:00
Tomasz Grabiec
6cec131432 query: Switch to IDL-generated views and writers
The query result footprint for cassandra-stress mutation as reported
by tests/memory-footprint increased by 18% from 285 B to 337 B.

perf_simple_query shows slight regression in throughput (-8%):

  build/release/tests/perf/perf_simple_query -c4 -m1G --partitions 100000

Before: ~433k tps
After:  ~400k tps
2016-02-26 12:26:13 +01:00
Tomasz Grabiec
ee8509cf36 idl-compiler: Introduce add(*_view) on vector 2016-02-26 12:26:13 +01:00
Tomasz Grabiec
1ecf9a7427 query: result_view: Introduce do_with()
Encapsulates linearization. Abstracts away the fact that result_view
can't work with discontiguous storage yet.
2016-02-26 12:26:13 +01:00
Tomasz Grabiec
135c1fa306 tests: memory_footprint: Report size in query results 2016-02-26 12:26:13 +01:00
Tomasz Grabiec
6c89e3d2ea serializer: Fix wrong size_type being serialized into the placeholder 2016-02-26 12:26:13 +01:00
Tomasz Grabiec
4ab0ca07f1 idl-compiler: Catch un-closed frame errors sooner
By initilizing them to 0 we can catch unclosed frames at
deserialization time. It's better than leaving frame size undefined,
which may cause errors much later in deserialization process and thus
would make it harder to identifiy the real cause.
2016-02-26 12:26:13 +01:00
Tomasz Grabiec
697d9bfa56 serializer: Introduce as_input_stream(bytes_view) 2016-02-26 12:26:13 +01:00
Tomasz Grabiec
85fb4eba32 Add missing includes 2016-02-26 12:26:13 +01:00
Tomasz Grabiec
4284715ddf Relax includes 2016-02-26 12:26:13 +01:00
Amnon Heiman
9ea3ffe527 idl-compiler: Add optional support
This patch adds optional writer support an optional field can be either
skip or set.

For vector of optional, a write_empty method will
add 1 to the vector count and mark the optional as false.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-02-26 12:25:08 +01:00
Asias He
fd5f3cff47 streaming: Fix stream_manager progress api
For each stream_session, we pretend we are sending/receiving one file,
to make it compatible with nodetool. For receiving_files, the file name
is "rxnofile". For sending_files, the file name is "txnofile".

stream_manager::update_all_progress_info is introduced to update the
progress info of all the stream_sessions in the node. We need this
because streaming mutations are received on all the cores, but the
stream_session object is only on one of the cores. It adds overhead if
we update progress info in stream_session object whenever we receive a
streaming mutation. So, what we do now is when we really need the
progress info, we update the progress info in stream_session object.

With http://127.0.0.$i:10000/stream_manager/, it looks like below when
decommission node 3 in a 3 nodes cluster.

=========== GET NODE 1
[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"receiving_files": [{"value": {"direction":
"IN", "file_name": "rxnofile", "session_index": 0, "total_bytes":
16876296, "peer": "127.0.0.3", "current_bytes": 16876296}, "key":
"rxnofile"}], "receiving_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.3", "peer": "127.0.0.3"}]}]

=========== GET NODE 2

[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"receiving_files": [{"value": {"direction":
"IN", "file_name": "rxnofile", "session_index": 0, "total_bytes":
16755552, "peer": "127.0.0.3", "current_bytes": 16755552}, "key":
"rxnofile"}], "receiving_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.3", "peer": "127.0.0.3"}]}]

=========== GET NODE 3
[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"sending_files": [{"value": {"direction":
"OUT", "file_name": "txnofile", "session_index": 0, "total_bytes":
16876296, "peer": "127.0.0.1", "current_bytes": 16876296}, "key":
"txnofile"}], "sending_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.1", "peer":
"127.0.0.1"},{"sending_files": [{"value": {"direction": "OUT",
"file_name": "txnofile", "session_index": 0, "total_bytes": 16755552,
"peer": "127.0.0.2", "current_bytes": 16755552}, "key": "txnofile"}],
"sending_summaries": [{"files": 1, "total_size": 0, "cf_id":
"869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0, "state":
"PREPARING", "connecting": "127.0.0.2", "peer": "127.0.0.2"}]}]
2016-02-26 17:38:37 +08:00
Asias He
37f52d632f streaming: Remove unused progress() function 2016-02-26 17:38:37 +08:00
Asias He
8060b97d67 streaming: Log number of bytes sent and recevied when stream_plan completes
It is useful for test code to verify number of bytes sent/received.

It looks like below in the log.

/tmp/out1:INFO  [shard 0] stream_session - \
[Stream #1f3e23f0-db9e-11e5-9cfb-000000000000] bytes_sent = 0, bytes_received = 15760704

/tmp/out2:INFO  [shard 0] stream_session - \
[Stream #1f3e23f0-db9e-11e5-9cfb-000000000000] bytes_sent = 0, bytes_received = 18203964

/tmp/out3:INFO  [shard 0] stream_session - \
[Stream #1f3e23f0-db9e-11e5-9cfb-000000000000] bytes_sent = 33964668, bytes_received = 0
2016-02-26 17:38:37 +08:00
Asias He
9dede89e07 streaming: Add get_progress_on_all_shards for plan_id
Get stream_bytes for a specific plan_id.
2016-02-26 17:38:37 +08:00
Tomasz Grabiec
97558b2cfe idl-compiler: Put serializers inside template class specializations
The problem is that a generic functions (eg. skip()) which call
deserialize() overloads based on their template parameter only see
deserilize() overloads which were declared at the time skip() was
declared and not those which are available at the time of
instantiation. This forces all serializers to be declared before
serialization_visitors.hh is first included. Serializers included
later will fail to compile. This becomes problematic to ensure when
serializers are included from headers.

Template class specialization lookup doesn't suffer from this
limitation. We can use that to solve the problem. The IDL compiler
will now generate template class specializations with read/write
static methods. In addition to that, default serializer() and
deserialize() implementations are delegating to serializer<>
specialization so that API and existing code doesn't have to change.

Message-Id: <1456423066-6979-1-git-send-email-tgrabiec@scylladb.com>
2016-02-25 20:00:49 +02:00
Takuya ASADA
aa3f6ad462 dist: add scyllatop on .rpm/.deb
Fixes #933

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456420768-15921-1-git-send-email-syuu@scylladb.com>
2016-02-25 19:24:11 +02:00
Avi Kivity
a74f68eeb2 Merge "Properly tag readers" from Glauber
"Gleb has recently noted that our query reads are not even being registered
with the I/O queue.

Investigating what is happening, I found out that while the priority that
make_reader receives was not being properly passed downwards to the SSTable
reader. The reader code is also used by compaction class, and that one is fine.
But the CQL reads are not.

On top of that, there are also some other places where the tag was not properly
propagated, and those are patched."
2016-02-25 18:35:58 +02:00
Raphael S. Carvalho
fc4cbcde72 Revert "Revert "database: Fix use and assumptions about pending compations""
This reverts commit a4d92750eb.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <8a405e7c1daf94c4d70d8084f59ce7205d56fe52.1456415398.git.raphaelsc@scylladb.com>
2016-02-25 18:02:01 +02:00
Raphael S. Carvalho
7f0371129c tests: sstable_test: submit compaction request through column family
That's needed for reverted commit 9586793c to work. It's also the
correct thing to do, i.e. column family submits itself to manager.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <2a1d141ad929c1957933f57412083dd52af0390b.1456415398.git.raphaelsc@scylladb.com>
2016-02-25 18:02:00 +02:00
Avi Kivity
c269527f42 Merge "Get rid of assert in gossip and storage_service" from Asias
"Make the error handling more robust."
2016-02-25 17:38:21 +02:00
Pekka Enberg
a4d92750eb Revert "database: Fix use and assumptions about pending compations"
This reverts commit 9586793c70. It breaks
sstable_test as follows:

  [penberg@nero scylla]$ build/release/tests/sstable_test --smp 1
  Running 81 test cases...
  INFO  [shard 0] compaction_manager - Asked to stop
  INFO  [shard 0] compaction_manager - Stopped
  sstable_test: database.cc:878: future<> column_family::run_compaction(sstables::compaction_descriptor): Assertion `_stats.pending_compactions > 0' failed.
  unknown location(0): fatal error in "compaction_manager_test": signal: SIGABRT (application abort requested)
  tests/sstable_datafile_test.cc(1023): last checkpoint
2016-02-25 15:28:06 +02:00
Asias He
32eaaecf36 gossip: Get rid of assert
Log the error and throw the exception, instead of abort the whole
process. Make the code more robust.
2016-02-25 21:19:52 +08:00
Asias He
699fd25467 storage_service: Get rid of assert
We can recover from most of the errors. Log the error and throw the
exception, instead of abort the whole process. Make the code more
robust.
2016-02-25 21:19:52 +08:00
Asias He
59564591d5 storage_service: Use get_gossip_status to get status
The help is introduced recently, use it. Avoid to open code it.
2016-02-25 21:19:52 +08:00
Pekka Enberg
8e2c924de3 cql3: Fix quadratic behavior in update_statement::parsed_insert::prepare_internal()
This fixes a quadratic search for duplicate columns in prepare_internal().

Refs #822.

Message-Id: <1456405104-16482-1-git-send-email-penberg@scylladb.com>
2016-02-25 15:06:56 +02:00
Yoav Kleinberger
872079d999 tools/scyllatop: correct mistake in help text
Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <01844d90f2d942a051d128b03ae12578ac0bb69c.1456324697.git.yoav@scylladb.com>
2016-02-25 12:49:48 +02:00
Asias He
94cb7f22d4 gossip: Make add_local_application_state safe to call on any cpu
add_local_application_state is used in various places. Before this
patch, it can only be called on cpu zero. To make it safer to use, use
invoke_on() to foward the code to run on cpu zero, so that caller can
call it on any cpu.

Refs: #795
Message-Id: <d69b81c5561622078dbe887d87209c4ea2e3bf46.1456315043.git.asias@scylladb.com>
2016-02-25 12:45:54 +02:00
Asias He
4e931c2453 gossip: Log the error when fails to add local application state
Gleb saw once:

scylla: gms/gossiper.cc:1393:
gms::gossiper::add_local_application_state(gms::application_state,
gms::versioned_value):: mutable: Assertion
`endpoint_state_map.count(ep_addr)' failed.

The assert is about we can not find the entry in endpoint_state_map of
the node itself. I can not really find any place we could call
add_local_application_state before we call gossiper::start_gossiping()
where it inserts broadcast address into endpoint_state_map.

I can not reproduce issue, let's log the error so we can narrow down
which application state triggered the assert.

Refs: #795
Message-Id: <f4433be0a0d4f23470a5e24e528afdb67b74c7ef.1456315043.git.asias@scylladb.com>
2016-02-25 12:45:17 +02:00
Takuya ASADA
b250a3b116 dist: Add collectd configuration support on .rpm/.deb
Depends on collectd, add /etc/collectd.d/scylla.conf on scylla-server package installation.
Fixes #946

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456336200-11876-1-git-send-email-syuu@scylladb.com>
2016-02-25 10:35:47 +02:00
Takuya ASADA
28dd202613 scyllatop: add --logfile argument to specify path to log file
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456333116-7389-2-git-send-email-syuu@scylladb.com>
2016-02-25 10:33:41 +02:00
Takuya ASADA
af3a8ead21 scyllatop: output error message both on log file and stdout
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456333116-7389-1-git-send-email-syuu@scylladb.com>
2016-02-25 10:33:40 +02:00
Calle Wilund
9586793c70 database: Fix use and assumptions about pending compations
Fixes #934 - faulty assert in discard_sstables

run_with_compaction_disabled clears out a CF from compaction
mananger queue. discard_sstables wants to assert on this, but looks
at the wrong counters.

pending_compactions is an indicator on how much interested parties
want a CF compacted (again and again). It should not be considered
an indicator of compactions actually being done.

This modifies the usage slightly so that:
1.) The counter is always incremented, even if compaction is disallowed.
    The counters value on end of run_with_compaction_disabled is then
    instead used as an indicator as to whether a compaction should be
    re-triggered. (If compactions finished, it will be zero)
2.) Document the use and purpose of the pending counter, and add
    method to re-add CF to compaction for r_w_c_d above.
3.) discard_sstables now asserts on the right things.

Message-Id: <1456332824-23349-1-git-send-email-calle@scylladb.com>
2016-02-25 08:57:04 +02:00
Glauber Costa
6f1d0dce00 mutation_query: attach the query priority read when reading mutations
We call a mutation source during the query path without any consideration
for attaching a priority. This is incorrect, and queries called through this
facility will end up in the default class.

Fix this by attaching the query priority class here.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-02-24 18:00:34 -05:00
Glauber Costa
336babfcb8 database: add a priority class to a few SSTable readers
Not all SSTable readers will end up getting the right tag for a priority
class. In particular, the range reader, also used for the memtables complete
ignores any priority class.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-02-24 18:00:34 -05:00
Glauber Costa
2816bc6fed database: use a reference instead of a pointer to store the priority classes
We will always initialize it, so don't use a pointer.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-02-24 18:00:34 -05:00
Glauber Costa
80ab41a715 memtable reader: also include a priority class
There are situations when a memtable is already flushed but the memtable
reader will continue to be in place, relaying reads to the underlying
table.

For that reason, the "memtables don't need a priority class" argument
gets obviously broken. We need to pass a priority class for its reader
as well.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-02-24 18:00:34 -05:00
Calle Wilund
590ec1674b truncate: Require timestamp join-function to ensure equal values
Fixes #937

In fixing #884, truncation not truncating memtables properly,
time stamping in truncate was made shard-local. This however
breaks the snapshot logic, since for all shards in a truncate,
the sstables should snapshot to the same location.

This patch adds a required function argument to truncate (and
by extension drop_column_family) that produces a time stamp in
a "join" fashion (i.e. same on all shards), and utilizes the
joinpoint type in caller to do so.

Message-Id: <1456332856-23395-2-git-send-email-calle@scylladb.com>
2016-02-24 18:59:31 +02:00
Calle Wilund
43ea1f5945 utils::jointpoint: Helper type to generate a singular value for all shards
Lets operations working on all shards "join" and acquire
the same value of something, with that value being based on
whenever all shards reach the join.

Obvious use case: time stamp after one set of per-shard ops, but
before final ones.

The generation of the value is guaranteed to happen on the shards
that created the join point.

Based on the join-ops in CF::snapshot, but abstracted and made
caller responsibility. Primary use case is to help deal with
the join-problem of truncation.

Message-Id: <1456332856-23395-1-git-send-email-calle@scylladb.com>
2016-02-24 18:59:25 +02:00
Yoav Kleinberger
c3ce9e53cb tools/scyllatop: support glob patterns to specifiy metrics
Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <42f84cdeeb75c3719230028a13a1dd8499673d4c.1456319441.git.yoav@scylladb.com>
2016-02-24 15:35:45 +02:00
Raphael S. Carvalho
bb48f1b06c sstables: use system clock's epoch for timestamp in compaction history
As pointed out by Tomek, the type of column used is timestamp, therefore
system's clock epoch (db_clock) should be used instead.

Fixes #817.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <f80f9f411d673cf2d653e193ccb8ebaa36bc891b.1456317766.git.raphaelsc@scylladb.com>
2016-02-24 14:49:21 +02:00
Pekka Enberg
dfcc48d82a transport: Add result metadata to PREPARED message
The gocql driver assumes that there's a result metadata section in the
PREPARED message. Technically, Scylla is not at fault here as the CQL
specification explicitly states in Section 4.2.5.4. ("Prepared") that the
section may be empty:

   - <result_metadata> is defined exactly as <metadata> but correspond to the
      metadata for the resultSet that execute this query will yield. Note that
      <result_metadata> may be empty (have the No_metadata flag and 0 columns, See
      section 4.2.5.2) and will be for any query that is not a Select. There is
      in fact never a guarantee that this will non-empty so client should protect
      themselves accordingly. The presence of this information is an

However, Cassandra always populates the section so lets do that as well.

Fixes #912.

Message-Id: <1456317082-31688-1-git-send-email-penberg@scylladb.com>
2016-02-24 14:43:24 +02:00
Avi Kivity
fedba9d6cd Merge "reduce gossip round latency" from Asias
"This series makes gossip message handling to be async to reduce gossip round
latency. Commit log of patch 3 explains the issue in detail.

Refs: #900"
2016-02-24 13:44:06 +02:00
Avi Kivity
b42a32efc7 Update scylla-ami submodule
* dist/ami/files/scylla-ami 398b1aa...d4a0e18 (3):
  > Sort service running order (scylla-ami-setup.service -> scylla-io-setup.service -> scylla-server.service)
  > Drop --ami and --disk-count parameters
  > dist: pass the number of disks to set io params
2016-02-24 13:38:05 +02:00
Avi Kivity
cda29c0324 Merge seastar upstream
* seastar 8c560f2...769cb8b (4):
  > temporary_buffer: make operator bool explicit (and const)
  > iotune: use SEASTAR_IO instead of SCYLLA_IO
  > iotune: add --format option, to use EnvironmentFile on systemd
  > sstring: add data() methods
2016-02-24 13:38:05 +02:00
Avi Kivity
efabb1a1d8 commitlog: fix buffer size calculation
We were adding bool(buffer), instead of buffer.size(); exposed by making
temporary_buffer::operator bool explicit.
2016-02-24 13:38:05 +02:00
Asias He
697b16414a gossip: Make gossip message handling async
In each gossip round, i.e., gossiper::run(), we do:

1) send syn message
2)                           peer node: receive syn message, send back ack message
3) process ack message in handle_ack_msg
   apply_state_locally
     mark_alive
       send_gossip_echo
     handle_major_state_change
       on_restart
       mark_alive
         send_gossip_echo
       mark_dead
         on_dead
       on_join
     apply_new_states
       do_on_change_notifications
          on_change
4) send back ack2 message
5)                            peer node: process ack2 message
   			      apply_state_locally

At the moment, syn is "wait" message, it times out in 3 seconds. In step
3, all the registered gossip callbacks are called which might take
significant amount of time to complete.

In order to reduce the gossip round latency, we make syn "no-wait" and
do not run the handle_ack_msg insdie the gossip::run(). As a result, we
will not get a ack message as the return value of a syn message any
more, so a GOSSIP_DIGEST_ACK message verb is introduced.

With this patch, the gossip message exchange is now async. It is useful
when some nodes are down in the cluster. We will not delay the gossip
round, which is supposed to run every second, 3*n seconds (n = 1-3,
since it talks to 1-3 peer nodes in each gossip round) or even
longer (considering the time to run gossip callbacks).

Later, we can make talking to the 1-3 peer nodes in parallel to reduce
latency even more.

Refs: #900
2016-02-24 19:33:39 +08:00
Asias He
63df54b368 messaging_service: Add GOSSIP_DIGEST_ACK
We will soon switch to use no-wait message for gossip. GOSSIP_DIGEST_SYN
will no longer return GOSSIP_DIGEST_ACK message. So we need a standalone
verb for GOSSIP_DIGEST_ACK.
2016-02-24 19:31:14 +08:00
Asias He
022c7e50a1 failure_detector: Fix false alarm of "Not marking nodes down due to local pause of"
The problem is we initialize _last_interpret when failure_detector
object is constructed. When interpret() runs for the first time, the
_last_interpret value is not the last time we run interpret() but the
time we initialize failure_detector object.

Fix by initializing _last_interpret inside interpret().

[Thu Feb 18 02:40:04 2016] INFO  [shard 0] storage_service - Node 127.0.0.1 state jump to normal
[Thu Feb 18 02:40:04 2016] INFO  [shard 0] storage_service - NORMAL: node is now in normal status
[Thu Feb 18 02:40:04 2016] INFO  [shard 0] gossip - Waiting for gossip to settle before accepting client requests...
[Thu Feb 18 02:40:12 2016] INFO  [shard 0] gossip - No gossip backlog; proceeding
Starting listening for CQL clients on 127.0.0.1:9042...
[Thu Feb 18 02:40:12 2016] INFO  [shard 0] gossip - Node 127.0.0.2 is now part of the cluster
[Thu Feb 18 02:40:12 2016] INFO  [shard 0] gossip - InetAddress 127.0.0.2 is now UP
[Thu Feb 18 02:40:13 2016] INFO  [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2
[Thu Feb 18 02:40:13 2016] WARN  [shard 0] failure_detector - Not marking nodes down due to local pause of 9091 > 5000 (milliseconds)
2016-02-24 19:31:14 +08:00
Avi Kivity
e993102cb5 Merge "introduce scylla-io-setup.service" from Takuya
"Add scylla-io-setup.service to configure max-io-requests and num-io-queues on first boot.
Moved SCYLLA_IO configuration code from scylla_sysconfig_setup to scylla-io-setup.service, revert commits related it.
On scylla-io-setup.service, autodetect Amazon EC2 instead of using AMI variable on sysconfig."
2016-02-24 10:13:23 +02:00
Takuya ASADA
c4035a0a13 dist: add comment about /etc/scylla.d/io.conf on sysconfig 2016-02-24 04:00:52 +09:00
Takuya ASADA
0f20abb365 Revert "dist: introduce SCYLLA_IO"
This reverts commit 5cae2560a3.

Conflicts:
	dist/common/sysconfig/scylla-server
2016-02-24 03:46:14 +09:00
Takuya ASADA
b79a1a77da Revert "dist: update SCYLLA_IO with params for AMI"
This reverts commit 5494135ddd.

Conflicts:
	dist/common/scripts/scylla_sysconfig_setup
2016-02-24 03:45:11 +09:00
Takuya ASADA
643beefc8c Revert "Revert "dist: remove AMI entry from sysconfig, since there is no script refering it""
This reverts commit 21e6720988.
2016-02-24 03:33:50 +09:00
Takuya ASADA
66c5feb9e9 Revert "dist: align ami option with others (-a --> --ami)"
This reverts commit 312f1c9d98.
2016-02-24 03:33:41 +09:00
Takuya ASADA
a9926f1cea dist: introduce scylla-io-setup.service to setup io parameters on first startup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-24 03:33:03 +09:00
Tomasz Grabiec
79bcb5a616 tests: Fix build of memory_footprint 2016-02-23 19:12:54 +01:00
Amnon Heiman
f461ebc411 idl-compiler: Add pos and rollback to serialize vector
This adds the ability to store a position of a serialized vector and to
rollback to that stored position afterwards.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1456041750-1505-3-git-send-email-amnon@scylladb.com>
2016-02-23 17:49:51 +01:00
Amnon Heiman
ea97e07ed7 serialization_visitors: Adding vector_position struct
While serialization vector it is sometimes required to rollback some of
the serialized elements.

vector_position is the equivalent to the bytes_ostream position struct.
It holds information about the current position in a serialized vector,
the position in the bufffer and the current number of elements
serialized.

It will allow to rollback to the current point.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1456041750-1505-2-git-send-email-amnon@scylladb.com>
2016-02-23 17:49:51 +01:00
Tomasz Grabiec
f72fd9eefd Merge branch 'pdziepak/canonical-mutation-idl/v1' from sesastar-dev.git 2016-02-23 17:02:43 +01:00
Tomasz Grabiec
995b638d96 mutation_partition_visitor: Fix crash for large blobs
Fixes #927.

The new visiting code builds cell instances using
atomic_cell::make_*() factory methods, which won't work in LSA context
because they depend on managed_bytes storage to be linearized. It may
not be since large blob support. This worked before because we created
cells from views before which works in all contexts.

Fix by constructing them in standard allocator context.

Message-Id: <1456234064-13608-2-git-send-email-tgrabiec@scylladb.com>
2016-02-23 16:41:39 +02:00
Tomasz Grabiec
33cf65c2aa mutation_partition_view: Fix use-after-move on visitor instance
The line:

  boost::apply_visitor(atomic_cell_or_collection_visitor(std::move(visitor), id, col), cell);

is executed in a loop, so visitor could be used after being
moved-from. This may not always be allowed for some visitors. Also,
vistors may keep state, which should be preserved for the whole
visitation.

This doesn't fix any issue right now.

Message-Id: <1456234064-13608-1-git-send-email-tgrabiec@scylladb.com>
2016-02-23 16:41:39 +02:00
Yoav Kleinberger
f822359d96 bugfix: fixed broken --print-config option
Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <57b452106cdcd9ceb09da4c63781650cefe48040.1456234464.git.yoav@scylladb.com>
2016-02-23 15:35:44 +02:00
Asias He
f7fccc6efb locator: Fix get token from a range<token>
With a range{t1, t2}, if t2 == {}, the range.end() will contain no
value. Fix getting t2 in this case.

Fixes #911.
Message-Id: <4462e499d706d275c03b116c4645e8aaee7821e1.1456128310.git.asias@scylladb.com>
2016-02-23 14:29:26 +01:00
Pekka Enberg
4a4074ad21 tools/scyllatop: Sort metrics by name
This makes the output much easier to read, especially if you have tons
of metrics specified.

Message-Id: <1456230377-3149-1-git-send-email-penberg@scylladb.com>
2016-02-23 14:35:57 +02:00
Takuya ASADA
0f87922aa6 main: notify service start completion ealier, to reduce systemd unit startup time
Fixes #910

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455830245-11782-1-git-send-email-syuu@scylladb.com>
2016-02-23 14:33:16 +02:00
Pekka Enberg
1f6cac8839 tools/scyllatop: Use 'erase' to clear the screen
The 'clear' function explicitly clears the screen and repaints it which
causes really annoying flicker. Use 'erase' to make scyllatop more
pleasant on the eyes.

Message-Id: <1456229348-2194-1-git-send-email-penberg@scylladb.com>
2016-02-23 14:12:48 +02:00
Tomasz Grabiec
2b5253927f test.py: Print output on timeout as well
It is often the case that the there is useful debugging information
printed by the test before it hangs. It is annoying to see just "TIMED
OUT" in jenkins. Print the output always when it is available.

In addition to that, we should not interpret all exceptions thrown
from communicate() as timeouts. For example, currently ^C sent to the
script misleadingly results in "TIMED OUT" to be printed.
Message-Id: <1456174992-21909-1-git-send-email-tgrabiec@scylladb.com>
2016-02-23 13:41:11 +02:00
Pekka Enberg
78c6fdf429 cql3/functions: Fix is_pure() for native scalar functions
Every native scalar function is already tagged whether they're pure or
not but because we don't implement the is_pure() function, all functions
end up being advertised as pure. This means that functions like now()
that are *not* pure, end up being evaluated only once.

Fixes #571.
Message-Id: <1456227171-461-1-git-send-email-penberg@scylladb.com>
2016-02-23 12:37:32 +01:00
Yoav Kleinberger
74fbc62129 ScyllaTop: top-like tool to see live scylla metrics
requires a local collectd configured with the unix-sock plugin,
use the --help option for more. Run it with:

        $ scyllatop.py --help

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <bd3f8c7e120996fc464f41f60130c82e3fb55ac6.1456164703.git.yoav@scylladb.com>
2016-02-23 12:32:47 +02:00
Avi Kivity
8ba474f1c9 Merge "Drop empty partitions from mutation query results" from Tomasz 2016-02-23 11:18:47 +02:00
Tomasz Grabiec
c591157755 tests: mutation_query: Add test for dropping partitions with expired tombstones 2016-02-22 20:23:29 +01:00
Tomasz Grabiec
41d475d9c0 schema_builder: Fluentize property setters 2016-02-22 20:23:29 +01:00
Tomasz Grabiec
6fdaf110d6 mutation_query: Don't include empty partitions
In same cases we may have a lot of empty partitions whose tombstones
expired, and there is no point in including them in the results.

This was found to cause performance issues for workloads using batch
updates. system.batchlog table would accumulate a lot of deletes over
time. It has gc_grace_seconds set to 0 so most of the tombstones would
be expired. mutation queries done by batchlog manager were still
returning all partitions present in memtables which caused mutation
queries result to be inflated. This in turn was causing
mutation_result_merger to take a long time to process them.
2016-02-22 20:21:23 +01:00
Pekka Enberg
4ff1692248 cql3: Make 'CREATE TYPE' error message human readable
We don't support the 'CREATE TYPE' statement for now. The user-visible
error message, however, is unreadable because our CQL parser doesn't
even recognize the statement.

  cqlsh:ks1> CREATE TYPE config (url text);
  SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] message=" : cannot match to any predicted input...

Implement just enough of 'CREATE TYPE' parsing to be able to report a
human readable error message if someone tries to execute such
statements:

  cqlsh:ks1> CREATE TYPE config (url text);
  ServerError: <ErrorMessage code=0000 [Server error] message="User-defined types are not supported yet">
Message-Id: <1456148719-9473-2-git-send-email-penberg@scylladb.com>
2016-02-22 14:50:25 +01:00
Pekka Enberg
d1bbd0271a cql3: Return const reference from ut_name::get_keyspace()
There's no need to copy the string but it does make it more difficult to
use get_keyspace() from other places that already return a const
reference.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
Message-Id: <1456148719-9473-1-git-send-email-penberg@scylladb.com>
2016-02-22 14:50:25 +01:00
Pekka Enberg
a15cbf0968 transport: Remove read_unsigned_short() variant
As explained in commit 0ff0c55 ("transport: server: 'short' should be
unsigned"), "short" type is always unsigned in the CQL binary protocol.
Therefore, drop the read_unsigned_short() variant altogether and just
use read_short() everywhere.

Message-Id: <1456133171-1433-1-git-send-email-penberg@scylladb.com>
2016-02-22 11:39:33 +02:00
Tomasz Grabiec
fb3344eba1 sstables: Do not write corrupted sstables when column names are too large
This may result in errors during reading like the following one:

  runtime error: Unexpected marker. Found k, expected \x01\n)'

The error above happened when executing limits.py:max_key_length_test dtest.

After change the exception will happen during writing and will be clearer.

Refs #807.

This patch doesn't deal with the problem of ensuring that we will
never hit those errors, which is very desirable. We shouldn't ack a
write if we can't persist it to sstables.

Message-Id: <1456130045-2364-1-git-send-email-tgrabiec@scylladb.com>
2016-02-22 11:03:16 +02:00
Vlad Zolotarov
f2c6f16a50 locator: everywhere_replication_strategy: change the class_registrator name to "EverywhereStrategy"
Change the name used with class_registrator from "EverywhereReplicationStrategy"
(used in the initial patch from CASSANDRA-826 JIRA) to "EverywhereStrategy"
as it is in the current DCE code.

With this change one will be able to create an instance of
everywhere_replication_strategy class by giving either
an "org.apache.cassandra.locator.EverywhereStrategy" (full name) or
an "EverywhereStrategy" (short name) as a replication strategy name.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1456081258-937-1-git-send-email-vladz@cloudius-systems.com>
2016-02-22 09:18:47 +02:00
Vlad Zolotarov
cc30956c56 locator: added EverywhereReplicationStrategy
This strategy would ignore an RF configuration and would
always try to replicate on all cluster nodes.

This means that its get_replication_factor()  would return a
number of currently "known" nodes in the cluster and
if a cluster is currently bootstrapping this value obviously may
change in time for the same key. Therefore using this strategy
should be done with caution.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1456074333-15014-3-git-send-email-vladz@cloudius-systems.com>
2016-02-21 19:29:29 +02:00
Vlad Zolotarov
ec14fb2a70 locator: token_metadata: add get_all_endpoints_count()
Return a number of currently known endpoints when
it's needed in a fast path flow.

Calling a get_all_endpoints().size() for that matter
would not be fast enough because of the unordered_set->vector transformation
we don't need.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1456074333-15014-2-git-send-email-vladz@cloudius-systems.com>
2016-02-21 19:29:28 +02:00
Avi Kivity
63841b425d Merge seastar upstream
* seastar c829b69...8c560f2 (2):
  > iotune: add missing static variable definitions
  > prevent futures ignored by parallel_for_each from generating warnings
2016-02-21 18:39:28 +02:00
Shlomi Livne
312f1c9d98 dist: align ami option with others (-a --> --ami)
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <c159aac7f0478aba34d4398a2eb8ea71285ede21.1456052976.git.shlomi@scylladb.com>
2016-02-21 15:06:20 +02:00
Shlomi Livne
21e6720988 Revert "dist: remove AMI entry from sysconfig, since there is no script refering it"
This reverts commit 54f9e59006.

AMI is needed to setting up io params

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <4154a000f019059f740319cfa2fbf875568770b7.1456052976.git.shlomi@scylladb.com>
2016-02-21 15:06:20 +02:00
Tomasz Grabiec
d376167bd4 cql: create_table_statement: Optimize duplicate column names detection
Current algorithm is O(N^2) where N is the column count. This causes
limits.py:TestLimits.max_columns_and_query_parameters_test to timeout
because CREATE TABLE statement takes too long.

This change replaces it with an algorithm of O(N)
complexity. _defined_names are already sorted so if any duplicates
exist, they must be next to each other.

Message-Id: <1456058447-5080-1-git-send-email-tgrabiec@scylladb.com>
2016-02-21 14:55:03 +02:00
Tomasz Grabiec
d3b7e143dc db: Fix error handling in populate_keyspace()
When find_uuid() fails Scylla would terminate with:

  Exiting on unhandled exception of type 'std::out_of_range': _Map_base::at

But we are supposed to ignore directories for unknown column
families. The try {} catch block is doing just that when
no_such_column_family is thrown from the find_column_family() call
which follows find_uuid(). Fix by converting std::out_of_range to
no_such_column_family.

Message-Id: <1456056280-3933-1-git-send-email-tgrabiec@scylladb.com>
2016-02-21 14:19:31 +02:00
Tomasz Grabiec
0c8db777b1 bytes_ostream: Avoid recursion when freeing chunks
When there is a lot of chunks we may get stack overflow.

This seems to fix issue #906, a memory corruption during schema
merge. I suspect that what causes corruption there is overflowing of
the stack allocated for the seastar thread. Those stacks don't have
red zones which would catch overflow.

Message-Id: <1456056288-3983-1-git-send-email-tgrabiec@scylladb.com>
2016-02-21 14:18:49 +02:00
Raphael S. Carvalho
b1cc0490f5 sstables: make compaction manager shutdown less verbose
before:

^CINFO  [shard 0] compaction_manager - Asked to stop
INFO  [shard 0] compaction_manager - compaction task handler stopped due to shutdown
INFO  [shard 0] compaction_manager - compaction task handler stopped due to shutdown
INFO  [shard 1] compaction_manager - Asked to stop
INFO  [shard 2] compaction_manager - Asked to stop
INFO  [shard 1] compaction_manager - compaction task handler stopped due to shutdown
INFO  [shard 2] compaction_manager - compaction task handler stopped due to shutdown
INFO  [shard 3] compaction_manager - Asked to stop
INFO  [shard 1] compaction_manager - compaction task handler stopped due to shutdown
INFO  [shard 2] compaction_manager - compaction task handler stopped due to shutdown
INFO  [shard 3] compaction_manager - compaction task handler stopped due to shutdown
INFO  [shard 3] compaction_manager - compaction task handler stopped due to shutdown

after:

^CINFO  [shard 0] compaction_manager - Asked to stop
INFO  [shard 0] compaction_manager - Stopped
INFO  [shard 1] compaction_manager - Asked to stop
INFO  [shard 2] compaction_manager - Asked to stop
INFO  [shard 3] compaction_manager - Asked to stop
INFO  [shard 1] compaction_manager - Stopped
INFO  [shard 2] compaction_manager - Stopped
INFO  [shard 3] compaction_manager - Stopped

`compaction_manager - compaction task handler stopped due to shutdown` is still printed
in debug level

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <535d5ad40102571a3d5d36257342827989e8f0f4.1455835407.git.raphaelsc@scylladb.com>
2016-02-21 11:55:17 +02:00
Raphael S. Carvalho
55be1830ff database: make column_family::rebuild_sstable_list safer
If any of the allocation in rebuild_sstable_list fail, the system
may be left with an incorrect set of sstables.
It's probably safer to assign the new set of sstables as a last
step.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <52b188262dcc06730dc9220b54ff6810d7dca1ae.1455835030.git.raphaelsc@scylladb.com>
2016-02-21 11:55:15 +02:00
Raphael S. Carvalho
9cb8a43684 start using notation ks.cf everywhere
Some places were using the notation ks/cf to represent a keyspace
and column family pair. ks.cf is the notation used by C*, so we
should use it everywhere.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <939449af92565b79d1823890784dc4d1dc3cdc84.1455830989.git.raphaelsc@scylladb.com>
2016-02-21 11:15:09 +02:00
Avi Kivity
69ac1a3229 Merge seastar upstream
* seastar cf1716f...c829b69 (1):
  > iotune: limit generate() concurrency to 128

Fixes #922.
2016-02-21 11:12:10 +02:00
Takuya ASADA
5a213341a4 dist: restart scylla on abnormal termination (Ubuntu)
Fixes #907

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455833659-12652-1-git-send-email-syuu@scylladb.com>
2016-02-21 10:33:28 +02:00
Takuya ASADA
7a6f9c9bb4 dist: use /usr/bin/python3.4 for idl-compiler.py on CentOS
Fixes #923

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455826201-11333-1-git-send-email-syuu@scylladb.com>
2016-02-20 19:15:21 +02:00
Avi Kivity
bba2034957 Merge "IDL-ize mutations" from Paweł
"This series switches mutation_partition_serializer, mutation_partition_view
and frozen_mutation to the IDL-based serialization format.
canonical_mutations and frozen_schemas are still not converted.

Quick test with 4 node ccm cluster and cassandra-stress doesn't show any
problem, unsurprisingly, as frozen_mutation_test obviously still passes."
2016-02-20 18:49:23 +02:00
Avi Kivity
889bf1aef2 dist: build only scylla and iotune binaries, not all the tests 2016-02-20 18:45:42 +02:00
Takuya ASADA
ca9f7bf72e dist: add iotune on .rpm/.deb packages
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455929805-28987-1-git-send-email-syuu@scylladb.com>
2016-02-20 18:44:47 +02:00
Paweł Dziepak
8e7a2fa557 schema_mutations: drop old serializers
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:12:00 +00:00
Paweł Dziepak
061dd111b5 canonical_mutation: drop old serializers
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:12:00 +00:00
Paweł Dziepak
351c69b476 frozen_schema: use IDL-based serialization
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:12:00 +00:00
Paweł Dziepak
81f42415d4 schema_mutations: prepare for auto-generated serializers
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:12:00 +00:00
Paweł Dziepak
1b52264dfd batchlog_manager: use new canonical_mutation serializers
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:12:00 +00:00
Paweł Dziepak
6c8b298ccd canonical_mutation: prepare for auto-generated serializers
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:12:00 +00:00
Paweł Dziepak
7dda3977c6 column_mapping: drop old-style serializers
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:11:59 +00:00
Paweł Dziepak
89b75a02d4 commitlog: use IDL-based serialization for entries
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:11:59 +00:00
Paweł Dziepak
d5c794d5e4 data_output: add reserve()
Allows mixing data_output with other output stream like
seastar::simple_output_stream which is useful when switching to the new
IDL-based serializers.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:11:59 +00:00
Paweł Dziepak
f548c75200 commitlog: move implementation to *.cc file
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:11:59 +00:00
Paweł Dziepak
5a353486c6 canonical_mutation: switch to IDL-based serialization
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:11:31 +00:00
Paweł Dziepak
c55fa9e4c2 schema: make column_mapping serializer-friendly
- unnested column_mapping::column
- more accessors

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:11:16 +00:00
Paweł Dziepak
4f3ee7abbc frozen_mutation: use IDL-based serialization
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 21:51:17 +00:00
Paweł Dziepak
28fa2a6493 idl-compiler: add serialization callback interface
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 21:50:29 +00:00
Paweł Dziepak
186061adef mutation_partition: switch serialization to IDL-based one
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 21:49:08 +00:00
Paweł Dziepak
ccd29bf7a7 frozen_mutation: switch to bytes_ostream
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 21:47:54 +00:00
Paweł Dziepak
5127321866 column_mapping: add column_at()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 21:47:54 +00:00
Paweł Dziepak
e332f95960 types: make serialize_mutation_form() static
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 21:47:42 +00:00
Paweł Dziepak
7586e47004 idl-compiler: avoid copy of basic types
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 21:47:42 +00:00
Paweł Dziepak
2ea735f5ed idl-compiler: accept both bytes and bytes_view
bytes can always be trivially converted to bytes_view. Conversion in the
other direction requires a copy.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 21:47:42 +00:00
Paweł Dziepak
18b1c66287 idl-compiler: allow auto-generated serializers in writers
This patch allows using both auto-generated serializers or writer-based
serialization for non-stub [[writable]] types.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 21:47:21 +00:00
Paweł Dziepak
af2241686f idl-compiler: add reindent() function
This helps having C++ code properly indented both in the compiler source
code and in the auto-generated files.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 21:20:09 +00:00
Paweł Dziepak
597ed15dfd tests: add idl unit test
Test auto-generated and writer-based serialization as well as
deserialization of simple compound type, vectors and variants.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 21:19:30 +00:00
Paweł Dziepak
7d1a66d3a0 idl-compiler: move writers and view to *.impl.hh
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 21:17:00 +00:00
Paweł Dziepak
f1f14631f4 add set_size() overload for bytes_ostream()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 21:16:55 +00:00
Paweł Dziepak
340d0cccbc serializer: fix duration deserializer
Deserializer is supposed to update input stream.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 21:11:57 +00:00
Avi Kivity
6330743775 Merge seastar upstream
* seastar 8679033...cf1716f (1):
  > iotune: qualify filesystems for aio
2016-02-18 17:16:46 +02:00
Nadav Har'El
f9ee74f56f repair: options for repairing only a subrange
To implement nodetool's "--start-token"/"--end-token" feature, we need
to be able to repair only *part* of the ranges held by this node.
Our REST API already had a "ranges" option where the tool can list the
specific ranges to repair, but using this interface in the JMX
implementation is inconvenient, because it requires the *Java* code
to be able to intersect the given start/end token range with the actual
ranges held by the repaired node.

A more reasonable approach, which this patch uses, is to add new
"startToken"/"endToken" options to the repair's REST API. What these
options do is is to find the node's token ranges as usual, and only
then *intersect* them with the user-specified token range. The JMX
implementation becomes much simpler (in a separate patch for scylla-jmx)
and the real work is done in the C++ code, where it belongs, not in
Java code.

With the additional scylla-jmx patch to use the new REST API options
provided here, this fixes #917.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1455807739-25581-1-git-send-email-nyh@scylladb.com>
2016-02-18 17:13:56 +02:00
Raphael S. Carvalho
a53cfc8127 compaction manager: add support to wait for termination of cleanup
'nodetool cleanup' must wait for termination of cleanup, however,
cleanup is handled asynchronously. To solve that, a mechanism is
added here to wait for termination of a cleanup. This mechanism is
about using promise to notificate waiter of cleanup completion.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <6dc0a39170f3f51487fb8858eb443573548d8bce.1455655016.git.raphaelsc@scylladb.com>
2016-02-18 17:01:18 +02:00
Paweł Dziepak
763f6e1dc0 storage_service: don't drain twice
If drain was explicitly requested by user there is no need to do that
again during shutdown.

Fixes segmentation fault when shutting down already drained Scylla.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1455794465-16670-1-git-send-email-pdziepak@scylladb.com>
2016-02-18 14:58:02 +02:00
Avi Kivity
1b49c0ce19 dist: Restart scylla on abnormal termination
Restarting the service can recover from a transient failure.

Fixes #904 (on systemd systems only)
Message-Id: <1455441103-11963-1-git-send-email-avi@scylladb.com>
2016-02-17 21:30:40 +01:00
Avi Kivity
62e96de48a Merge "Adding writers and visitor to IDL" from Amnon
"writers are used to stream objects programmatically and not from objects.
visitors views are used to retrieve information from serialized objects without
deserialize them entirely, but to skip to the position in the buffer with the
relevant information and deserialize only it."
2016-02-17 22:06:46 +02:00
Amnon Heiman
fbc6770837 idl-compiler: Verify member type
This patch adds static assert to the generated code that verify that a
declare type in the idl matches the parameter type.

Accepted type ignores reference and const when comparison.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-02-17 21:44:53 +02:00
Amnon Heiman
38cd55e9cf Adding the mutation idl
This adds the mutation definition idl and add it to the compilation.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-02-17 18:42:09 +02:00
Amnon Heiman
714d4927d6 idl-compiler: add writers and view to classes.
This patch adds a writer object to classes in the idl.

It adds attribute support to classes. Writer will be created for classes
that are marked as writable.

For the writers, the code generator creates two kind of struct.
states, that holds the write state (manly the place holders for all
current objects, and vectors) and nodes that represent the current
position in the writing state machine.

To write an object create a writer:
For example creating a writer for mutation, if out is a bytes_ostream

writer_of_mutation w(out);

Views are used to read from buffer without deserialize an entire
object.
This patch adds view creation to the idl-compiler. For each view a
read_size function is created that will be used when skipping through
buffers.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-02-17 18:42:09 +02:00
Amnon Heiman
33d5c95b90 serialization_visitors: Add skip template
The skip template function is used when skipping data types.
By default is uses a deserializer to calculate the size.

A specific implementation save unneeded deserialization. For fix sized
object the skip function would become an expression allowing the
compiler to drop the function altogether.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-02-17 18:42:09 +02:00
Amnon Heiman
64c097422d Adding the serialization_visitors.hh file
The serialization_visitors.hh contains helper classes for the readers and writers
visitors class.

place_holder is a wrapper around bytes_stream place holder.
frame is used to store size in bytes.
empty_frame is used with final object (that do not store their size)
from the code that uses it, it looks the same, but in practice it does
not store any data.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-02-17 18:42:08 +02:00
Amnon Heiman
ca72d637f9 bytes_ostream: Allow place holder return a stream
Reader and writer can use the bytes_ostream as a raw bytes stream,
handling the bytes encoding and streaming on their own.

To fully support this functionality, place holder should support it as
well.

This patch adds a get_stream method that return a simple_output_stream
writer can use it using their own serialization function.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-02-17 18:42:04 +02:00
Tomasz Grabiec
eedc1548e7 transport: server: Fix typo
Spotted-by: Vitaly Davidovich <vitalyd@gmail.com>
Signed-off-by: Tomasz Grabiec <tgrabiec@scylladb.com>
2016-02-17 15:55:10 +01:00
Avi Kivity
0363a0efe6 Merge "Fix limits handling in CQL server" from Tomasz
"Fixes the following issues:
 #807 Wrong maximum key length
 #809 Scylla assert on returning result when max column size is overflow"
2016-02-17 15:06:51 +02:00
Tomasz Grabiec
6e7bac14b3 transport: server: Throw instead of abort on bounds check failures
Instead of crashing the server we will respond with a "Server Error"
to the requestor.

Fixes #809.
2016-02-17 13:12:11 +01:00
Tomasz Grabiec
0ff0c5555a transport: server: 'short' should be unsigned
According to CQL binary protocol v3 [1], "short" fields are unsigned:

   [short]        A 2 bytes unsigned integer

[1] https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=doc/native_protocol_v3.spec

C* code agrees as well.

Fixes #807.
2016-02-17 13:12:11 +01:00
Tomasz Grabiec
9375b7df7b transport: server: Size check should allow max value 2016-02-17 13:12:11 +01:00
Tomasz Grabiec
48e9d67525 compound: Extract size_type alias 2016-02-17 13:12:11 +01:00
Amnon Heiman
719b8e1e4d serializer: Add boost::variant, chrono::time_point and unknown variant
This patch adds a stub support for boost::variant. Currently variant are
not serialized, this is added just so non stub classes will be able to
compile.

deserialize for chrono::time_point and deserializer for chrono::duration

unknown variant:
Planning for situations where variant could be expanded, there may be
situation that a variant will return an unknown value.

In those cases the data and index will be paseed to the reader, that
can decide what to do with it.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-02-17 11:43:50 +02:00
Avi Kivity
69183be6f0 Update scylla-ami submodule
* dist/ami/files/scylla-ami b3b85be...398b1aa (3):
  > Import AMI initialization code from scylla-server repo
  > Use long options on scylla_raid_setup and scylla_sysconfig_setup
  > Wait more longer to finishing AMI setup
2016-02-17 10:45:50 +02:00
Takuya ASADA
e640b42081 dist: On Ubuntu, log coredump before creating the core
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455664279-32157-1-git-send-email-syuu@scylladb.com>
2016-02-17 10:43:56 +02:00
Avi Kivity
00f40881d6 Merge "Interactive scylla_setup and refactoring setup scripts" from Takuya 2016-02-17 10:42:26 +02:00
Takuya ASADA
553c2ca523 dist: call scylla_install_ami directly from scylla.json
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:34:08 +09:00
Takuya ASADA
2d429e602a dist: interactive scylla_setup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:34:08 +09:00
Takuya ASADA
b9ae4ff272 dist: delete build_ami_local.sh, merge it to build_ami.sh
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:34:08 +09:00
Takuya ASADA
6b93505952 dist: long options for build_rpm.sh
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:34:08 +09:00
Takuya ASADA
d754bbe122 dist: add --unstable on build_ami.sh
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:34:08 +09:00
Takuya ASADA
70f397911b dist: long options for build_ami.sh
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:34:08 +09:00
Takuya ASADA
a860e4baae dist: remove AMI initialization code from scylla_setup, move to scylla-ami
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:34:08 +09:00
Takuya ASADA
08038e3f42 dist: long options for scylla_install_pkg
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:34:08 +09:00
Takuya ASADA
d7a03676e3 dist: don't ignore posix_net_conf.sh error
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:34:08 +09:00
Takuya ASADA
684447d3ab dist: long options for scylla_sysconfig_setup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:34:08 +09:00
Takuya ASADA
7861871c1b dist: long options for scylla_raid_setup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:34:08 +09:00
Takuya ASADA
889c4706fc dist: long options for scylla_ntp_setup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:34:08 +09:00
Takuya ASADA
2804e394a6 dist: long options for scylla_coredump_setup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:34:08 +09:00
Takuya ASADA
c12d95afe0 dist: add options to skip running setups on scylla_setup, support long options
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:34:05 +09:00
Takuya ASADA
dc9012d5a4 dist: move selinux setup code to scylla_selinux_setup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:33:01 +09:00
Takuya ASADA
54f9e59006 dist: remove AMI entry from sysconfig, since there is no script refering it
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:33:01 +09:00
Takuya ASADA
9b8f45d5b7 dist: don't use -a option for scylla_bootparam_setup since it was removed
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:33:01 +09:00
Takuya ASADA
5b742ff447 dist: generalize scylla_ntp_setup, drop '-a' option (means AMI) from it
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-02-17 07:28:38 +09:00
Takuya ASADA
51c497527c dist: support unstable repository on scylla_install_pkg 2016-02-17 07:28:36 +09:00
Amnon Heiman
1e4d227b20 managed_bytes: don't return auto from non-member function
gcc 4.9 does not allow non-static data member declared auto.

This patch replace the auto decleration with std::result_of_t

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1455652166-16860-1-git-send-email-amnon@scylladb.com>
2016-02-16 21:50:55 +02:00
Tomasz Grabiec
7af65e45b2 compound: Throw exception when key is too large rather than abort
Abort is too big of a hammer.

Refs #809.

Message-Id: <1455650129-9202-1-git-send-email-tgrabiec@scylladb.com>
2016-02-16 21:36:25 +02:00
Avi Kivity
bd3a08fd19 Merge seastar upstream
* seastar 1bbb02f...8679033 (1):
  > net: fix compilation problem introduced after e5cbee3
2016-02-16 19:46:32 +02:00
Avi Kivity
e1828b82b5 Merge seastar upstream
* seastar b25a958...1bbb02f (6):
  > native-stack: fix arp request missing under loopback connection
  > apps: iotune: fix compilation with g++ 4.9
  > simple-stream: Add copy constructor
  > tcp: don't need to choose another core since only one core
  > Merge "Fix undefined behaviors related to reactor shutdown" from Tomasz
  > rpc: do not wait for data to be send before reporting timeout
2016-02-16 18:06:50 +02:00
Tomasz Grabiec
a921479e71 Merge tag '807-v3' from https://github.com/avikivity/scylla
From Avi:

This patchset introduces a linearization context for managed_bytes objects.

Within this context, any scattered managed_bytes (found only in lsa regions,
so limited to memtable and cache) are auto-linearized for the lifetime of
the context.   This ensures that key and value lookups can use fast
contiguous iterators instead of using slow discontiguous iterators (or
crashing, as is the case now).
2016-02-16 14:29:48 +01:00
Avi Kivity
13144ea9eb managed_bytes: get rid of explicit linearize/scatter
Now that everything is in a linarization context, we don't need to explicitly
gather data.
2016-02-16 14:37:46 +02:00
Avi Kivity
d415167496 memtable: use managed_bytes linearization context when applying mutations
Ensures that we don't access scattered keys when looking up stuff.
2016-02-16 14:37:46 +02:00
Avi Kivity
fbe6961827 row_cache: run partiton-touching operations of row_cache::update in a linearization context
To avoid scattered keys (and values, though those are already protected)
from being accessed, run the update procedure in a managed_bytes linearization
context.

Fixes #807.
2016-02-16 14:37:44 +02:00
Avi Kivity
47ea1237ed build: build seastar's iotune
Target name is build/{mode}/iotune.
2016-02-16 12:13:29 +02:00
Avi Kivity
84ede4c14c Merge seastar upstream
* seastar 0f759f0...b25a958 (1):
  > Merge "IOTune: a tool to tune Seastar's I/O parameters" from Glauber
2016-02-16 12:12:34 +02:00
Asias He
d146045bc5 Revert "Revert "streaming: Send mutations on all shards""
This brings back streaming on all shards. The bug in
locator/abstract_replication_strategy is now fixed.

This reverts commit 9f3061ade8.

Message-Id: <a79ce9cdd6f4af1c6088b89e1911b4b2ed1c10ae.1455589460.git.asias@scylladb.com>
2016-02-16 11:16:51 +02:00
Avi Kivity
ce74718950 Merge "Preparation for specifying query result format in IDL" from Tomasz 2016-02-15 19:41:18 +02:00
Raphael S. Carvalho
59bbe98c21 sstables: keep track of compacting sstables in compacton manager itself
Avi says:
"Something like unordered_set<unsigned long> is error prone, because ints
tend to mix up (also, need to use a sized type, unsigned long varies among
machines)."

With that in mind, it's better if we keep track of compacting sstables in
a unordered_set<shared_sstable>.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <249f0fd4cfcf786cf3c37a79978f7743d07f48ad.1455120811.git.raphaelsc@scylladb.com>
2016-02-15 18:35:43 +02:00
Nadav Har'El
3a2885e1e3 repair: use seastar::gate
Switch to use seastar::gate (and its new gate::check() method) instead
of a similar implementation in repair.cc.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1455553063-13488-1-git-send-email-nyh@scylladb.com>
2016-02-15 18:22:36 +02:00
Avi Kivity
54f145b666 Merge seastar upstream
* seastar 353b1a1...0f759f0 (11):
  > tutorial: add a link to future API documentation
  > sleep: document
  > tutorial: fix typos
  > gate: add check() method
  > tutorial: introduce seastar::gate
  > doc: explain how to test the native stack without dpdk
  > doc: separate the mini-tutorial into its own file
  > doc: move DPDK build instructions to its own file
  > doc: split building instructions into separate files
  > doc: fix/modernize git commands in contributing.md
  > doc: how-to on contributing & guidelines
2016-02-15 18:22:02 +02:00
Tomasz Grabiec
09dc79f245 cql3: select_statement: Set desired serialization format 2016-02-15 17:05:55 +01:00
Tomasz Grabiec
63006e5dd2 query: Serialize collection cells using CQL format
We want the format of query results to be eventually defined in the
IDL and be independent of the format we use in memory to represent
collections. This change is a step in this direction.

The change decouples format of collection cells in query results from
our in-memory representation. We currently use collection_mutation_view,
after the change we will use CQL binary protocol format. We use that because
it requires less transformations on the coordinator side.

One complication is that some list operations need to retrieve keys
used in list cells, not only values. To satisfy this need, new query
option was added called "collections_as_maps" which will cause lists
and sets to be reinterpreted as maps matching their underlying
representation. This allows the coordinator to generate mutations
referencing existing items in lists.
2016-02-15 17:05:55 +01:00
Tomasz Grabiec
383296c05b cql3: Fix handling of lists with static columns
List operations and prefetching were not handling static columns
correctly. One issue was that prefetching was attaching static column
data to row data using ids which might overlap with clustered columns.

Another problem was that list operations were always constructing
clustering key even if they worked on a static column. For static
columns the key would be always empty and lookup would fail.

The effect was that list operations which depend on curent state had
no effect. Similar problem could be observed on C* 2.1.9, but not on 2.2.3.

Fixes #903.
2016-02-15 17:05:55 +01:00
Tomasz Grabiec
e65fddc14b types: Introduce data_value::serialize() 2016-02-15 17:05:55 +01:00
Tomasz Grabiec
5f756fcbe5 query: Add cql_format property to partition_slice
It will specify in which format CQL values should be serialized. Will
allow for rolling out new CQL binary protocol versions without
stalling reads.
2016-02-15 17:05:55 +01:00
Tomasz Grabiec
6709c0ac15 cql_serialization_format: Make it CQL protocol version aware
We want to serialize it as a single number, the CQL binary protocol
version to which it corresponds, so it needs to be aware of the
version number.
2016-02-15 17:05:55 +01:00
Tomasz Grabiec
81fdd12f07 cql_serialization_version: Abstract away collection format changes
This puts knowledge about which cql_serialization_formats have the
same collection format into one place,
cql_serialization_format::collection_format_unchanged().
2016-02-15 17:03:53 +01:00
Tomasz Grabiec
9d11968ad8 Rename serialization_format to cql_serialization_format 2016-02-15 16:53:56 +01:00
Tomasz Grabiec
916a91c913 query: Split send_timestamp_and_expiry into two separate options
It's cleaner that way. They don't need to come together.
2016-02-15 16:53:56 +01:00
Tomasz Grabiec
100b540a53 validation: Fix validation of empty partition key
The validation was wrongly assuming that empty thrift key, for which
the original C* code guards against, can only correspond to empty
representation of our partition_key. This no longer holds after:

   commit 095efd01d6
   "keys: Make from_exploded() and components() work without schema"

This was responsible for dtest failure:
cql_additional_tests.TestCQL:column_name_validation_test
2016-02-15 16:53:56 +01:00
Tomasz Grabiec
f4e3bd0c00 keys: Introduce partition_key::validate()
So that user doesn't have to play with low-level representations.
2016-02-15 16:53:56 +01:00
Tomasz Grabiec
df5f8e4bfc keys: Avoid unnecessary construction of temporary 'bytes' object
We're now using managed_bytes as main storage, so conversion from
bytes_view to bytes is redundant, we need to convert to managed_bytes
eventualy.
2016-02-15 16:53:56 +01:00
Tomasz Grabiec
6d00e473ac keys: Make constructor from bytes private 2016-02-15 16:53:55 +01:00
Tomasz Grabiec
e061eb02df cql3: Avoid using partition_key::from_bytes()
serialize() and from_bytes() is a low level interface, which in this
case can be replaced with a partition_key static factory method
resulting in cleaner code.
2016-02-15 16:53:55 +01:00
Paweł Dziepak
dbb878d16e Revert "do not use boost::multiprecision::msb()"
This reverts commit dadd097f9c.

That commit caused serialized forms of varint and decimal to have some
excess leading zeros. They didn't affect deserialization in any way but
caused computed tokens to differ from the Cassandra ones.

Fixes #898.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1455537278-20106-1-git-send-email-pdziepak@scylladb.com>
2016-02-15 14:24:37 +02:00
Avi Kivity
1f752446d2 Merge "Truncation format & fixes" from Calle
"Fixes #884
Fixes #895

Also at seastar-dev: calle/truncate_more

1.) Change truncation records to be stored with IDL serialization
2.) Fix db::serializers encoding of replay_position
3.) Detect attempted reading of Origin truncation records, and instead
    of crashing, ignore and warn.
4.) Change truncation time stamps to be generated per-shard, _after_
    CF flush is done, otherwise data in memtables at flush would be
    retained/replayed on next start. Retain the highest time stamp
    generated.

Note for (3): This patch set does _not_ clear out origin records
automatically. This because I feel that is a somewhat drastic and
irreversible thing to do. If we want to avail the user of a means
to get rid of the (3) warning, we should probably tell him to either
use cqlsh, or add an API call for this, so he can do it explicitly.
"
2016-02-15 11:39:56 +02:00
Takuya ASADA
fb3f4cc148 dist: add posix_net_conf.sh on Ubuntu package
Fixes #881

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455522990-32044-1-git-send-email-syuu@scylladb.com>
2016-02-15 11:37:30 +02:00
Nadav Har'El
7dc843fc1c repair: stop ongoing repairs during shutdown
When shutting down a node gracefully, this patch asks all ongoing repairs
started on this node to stop as soon as possible (without completing
their work), and then waits for these repairs to finish (with failure,
usually, because they didn't complete).

We need to do this, because if the repair loop continues to run while we
start destructing the various services it relies on, it can crash (as
reported in #699, although the specific crash reported there no longer
occurs after some changes in the streaming code). Additionally, it is
important that to stop the ongoing repair, and not wait for it to complete
its normal operation, because that can take a very long time, and shutdown
is supposed to not take more than a few seconds.

Fixes #699.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1455218873-6201-1-git-send-email-nyh@scylladb.com>
2016-02-14 16:52:41 +02:00
Raphael S. Carvalho
a487ef1ff3 sstables: improve log message when a sstable is sealed
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <e391243212d83347b1b50c728bee24f6a2ecc950.1455230788.git.raphaelsc@scylladb.com>
2016-02-14 12:05:16 +02:00
Tomasz Grabiec
456275e06a storage_proxy: Simplify condition
Message-Id: <1455288472-30538-1-git-send-email-tgrabiec@scylladb.com>
2016-02-14 11:22:15 +02:00
Tomasz Grabiec
321287dd7c cql3: Fix crash when parsing collection condition
Happened when parsing a statement like this:

 DELETE FROM tmap WHERE k=0 IF m[null] = 'foo'

Message-Id: <1455294896-15184-1-git-send-email-tgrabiec@scylladb.com>
2016-02-14 11:21:10 +02:00
Takuya ASADA
3697cee76d dist: switch AMI base image to 'CentOS7-Base2', uses CentOS official kernel
On previous CentOS base image, it accsidently uses non-standard kernel from elrepo.
This replaces base image to new one, contains CentOS default kernel.

Fixes #890

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455398903-2865-1-git-send-email-syuu@scylladb.com>
2016-02-14 10:15:27 +02:00
Tomasz Grabiec
efdbc3d6d7 abstract_replication_strategy: Fix generation of token ranges
We can't move-from in the loop because the subject will be empty in
all but the first iteration.

Fixes crash during node stratup:

  "Exiting on unhandled exception of type 'runtime_exception': runtime error: Invalid token. Should have size 8, has size 0"

Fixes update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_node_1_test (and probably others)

Signed-off-by: Tomasz Grabiec <tgrabiec@scylladb.com>
2016-02-12 19:38:36 +01:00
Shlomi Livne
f938e1d303 dist: start scylla with SCYLLA_IO
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <d93a7b41a285fcde796c5681479a328f1efac0c3.1455188901.git.shlomi@scylladb.com>
2016-02-11 17:01:03 +02:00
Shlomi Livne
5494135ddd dist: update SCYLLA_IO with params for AMI
Add setting of --num-io-queues, --max-io-requests for AMI

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <b94a63154a91c8568e194d7221b9ffc7d7813ebc.1455188901.git.shlomi@scylladb.com>
2016-02-11 17:01:02 +02:00
Shlomi Livne
5cae2560a3 dist: introduce SCYLLA_IO
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <6490d049fd23a335bb0a95cac3e8a4c08c61166e.1455188901.git.shlomi@scylladb.com>
2016-02-11 17:01:02 +02:00
Shlomi Livne
d8cdf76e70 dist: change setting of scylla home from "-d" to "-r"
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <53dcd9d1daa0194de3f889b67788d9c21d1e474d.1455188901.git.shlomi@scylladb.com>
2016-02-11 17:00:37 +02:00
Avi Kivity
3c4f67f3e6 build: require boost > 1.55
See #898.

Add checks both for boost being installed, and for the correct version.
Message-Id: <1455193574-24959-1-git-send-email-avi@scylladb.com>
2016-02-11 15:15:49 +02:00
Avi Kivity
9249d45ae1 Update scylla-ami submodule
* dist/ami/files/scylla-ami b2724be...b3b85be (1):
  > adding --stop-services
2016-02-11 12:24:17 +02:00
Avi Kivity
5834815ed9 Merge seastar upstream
* seastar 14c9991...353b1a1 (2):
  > scripts: posix_net_conf.sh: Change the way we learn NIC's IRQ numbers
  > gate: protect against calling close() more than once
2016-02-11 12:23:51 +02:00
Takuya ASADA
09b1ec6103 dist: attach ephemeral disks on AMI by default
To attach maximum number of ephemeral disks available on the instance, specify 8.
On AMI creation, it will be reduce to available number.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1454439628-2882-1-git-send-email-syuu@scylladb.com>
2016-02-11 12:21:09 +02:00
Takuya ASADA
16e6db42e1 dist: abandon to start scylla-server when it's disabled from AMI userdata
Support AMi's --stop-services, prevent startup scylla-server (and scylla-jmx, since it's dependent on scylla-server)

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1454492729-11876-1-git-send-email-syuu@scylladb.com>
2016-02-11 12:21:08 +02:00
Takuya ASADA
f227b3faac dist: On AMI, mark root disk with delete_on_termination
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1454513308-12384-1-git-send-email-syuu@scylladb.com>
2016-02-11 12:19:28 +02:00
Takuya ASADA
33309f667e dist: enable enhanced networking on AMI
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1454971289-21369-1-git-send-email-syuu@scylladb.com>
2016-02-11 12:18:48 +02:00
Raphael S. Carvalho
ed61fe5831 sstables: make compaction stop report user-friendly
When scylla stopped an ongoing compaction, the event was reported
as an error. This patch introduces a specialized exception for
compaction stop so that the event can be handled appropriately.

Before:
ERROR [shard 0] compaction_manager - compaction failed: read exception:
std::runtime_error (Compaction for keyspace1/standard1 was deliberately
stopped.)

After:
INFO  [shard 0] compaction_manager - compaction info: Compaction for
keyspace1/standard1 was stopped due to shutdown.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1f85d4e5c24d23a1b4e7e0370a2cffc97cbc6d44.1455034236.git.raphaelsc@scylladb.com>
2016-02-11 12:16:53 +02:00
Takuya ASADA
8d8130f9c9 dist: fix typo on build_ami.sh
We should always run scylla_setup, not just for locally built rpm

Fixes #897

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455103519-13780-1-git-send-email-syuu@scylladb.com>
2016-02-11 11:56:11 +02:00
Shlomi Livne
64f8d5a50e dist: update packer location
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <3c33ea073f702e00b789930fce9befef03ad9e88.1455178900.git.shlomi@scylladb.com>
2016-02-11 11:52:56 +02:00
Avi Kivity
bfbf89ee31 Merge "Serialize keys in a form independent of in-memory representation" from Tomasz
"This series changes the on-wire definitions of keys to be of the following form:

  class partition_key {
     std::vector<bytes> exploded();
  };

Keys are therefore collections of components. The components are serialized according
to the format specified in the CQL binary protocol. No bit depends now on how we store keys in memory.

Constructing keys from components currently requires a schema reference,
which makes it not possible to deserialize or serialize the keys automatically
by RPC. To avoid those complications, compound_type was changed so that
it can be constructed and components can be iterated over without schema.
Because of this, partition_key size increased by 2 bytes."
2016-02-10 17:54:42 +02:00
Tomasz Grabiec
b74301302c tests: Add test for key serialization 2016-02-10 15:22:56 +01:00
Tomasz Grabiec
3e2c1840d8 idl: Make key definitions independent of in-memory representation 2016-02-10 15:22:56 +01:00
Tomasz Grabiec
428fce3828 compound: Optimize serialize_single() 2016-02-10 15:22:56 +01:00
Tomasz Grabiec
0cc2832a76 keys: Allow constructing from a range 2016-02-10 15:22:56 +01:00
Tomasz Grabiec
3ffcb998fb keys: Enable serialization from a range not just a vector 2016-02-10 14:35:14 +01:00
Tomasz Grabiec
095efd01d6 keys: Make from_exploded() and components() work without schema
For simplicity, we want to have keys serializable and deserializable
without schema for now. We will serialize keys in a generic form of a
vector of components where the format of components is specified by
CQL binary protocol. So conversion between keys and vector of
components needs to be possible to do without schema.

We may want to make keys schema-dependent back in the future to apply
space optimizations specific to column types. Existing code should
still pass schema& to construct and access the key when possible.

One optimization had to be reverted in this change - avoidance of
storing key length (2 bytes) for single-component partition keys. One
consequence of this, in addition to a bit larger keys, is that we can
no longer avoid copy when constructing single-component partition keys
from a ready "bytes" object.

I haven't noticed any significant performance difference in:

  tests/perf/perf_simple_query -c1 --write

It does ~130K tps on my machine.
2016-02-10 14:35:13 +01:00
Tomasz Grabiec
31312722d1 compound: Reduce duplication 2016-02-10 14:35:13 +01:00
Tomasz Grabiec
085d148d6f compound: Remove unused methods 2016-02-10 14:35:13 +01:00
Tomasz Grabiec
b777cc9565 tests: Fix tests to not rely on key representation 2016-02-10 14:35:13 +01:00
Asias He
6d0407503b locator: Do not generate wrap-around ranges
Like we did in commit d54c77d5d0,
make the remaining functions in abstract_replication_strategy return
non-wrap-around ranges.

This fixes:

ERROR [shard 0] stream_session - [Stream #f0b7fda0-cf3e-11e5-b6c4-000000000000]
stream_transfer_task: Fail to send to 127.0.0.4:0: std::runtime_error (Not implemented: WRAP_AROUND)

in streaming.
Message-Id: <514d2a9a1d3b868d213464c8858ac5162c0338d8.1455093643.git.asias@scylladb.com>
2016-02-10 10:03:31 +01:00
Avi Kivity
fc6159e2b9 key: tighten partition_key::representation() to return a const managed_bytes&
The conversion to bytes_view can fail if the key is scattered; so defer that
conversion until later.  In a later patch we will intervene before the
conversion to ensure the data is linearized.
2016-02-09 19:55:13 +02:00
Avi Kivity
3c60310e38 key: relax some APIs to accept partition_key_view instead of const partition_key&
Using a partition_key_view can save an allocation in some cases.  We will
make use of it when we linearize a partition_key; during the process we
are given a simple byte pointer, and constructing a partition_key from that
requires an allocation.
2016-02-09 19:55:13 +02:00
Avi Kivity
af8ef54d5a managed_bytes: introduce with_linearized_managed_bytes()
A large managed_bytes blob can be scattered in lsa memory.  Usually this is
fine, but someone we want to examine it in place without copying it out, but
using contiguous iterators for efficiency.

For this use case, introduce with_linearized_managed_bytes(Func),
which runs a function in a "linearization context".  Within the linearization
context, reads of managed_bytes object will see temporarily linearized copies
instead of scattered data.
2016-02-09 19:55:13 +02:00
Avi Kivity
9f3061ade8 Revert "streaming: Send mutations on all shards"
This reverts commit 31d439213c.

Fixes #894.

Conflicts:
    streaming/stream_manager.cc

(may have undone part of 63a5aa6122)
2016-02-09 18:26:14 +02:00
Calle Wilund
18203a4244 database::truncate/drop: Move time stamp generation to shard
Fixes #884

Time stamps for truncation must be generated after flush, either by
splitting the truncate into two (or more) for-each-shard operations,
or simply by doing time stamping per shard (this solution).

We generate TS on each shard after flushing, and then rely on the
actual stored value to be the highest time point generated.

This should however, from batch replay point of view, be functionally
equivalent. And not a problem.
2016-02-09 15:45:37 +00:00
Calle Wilund
ce66acc771 system_keyspace: Always retain highest truncation time stamp
Since the table is written from all shards, and we possibly might
have conflicting time stamps, we define the trucated_at time
as the highest time point. I.e. conservative.
2016-02-09 15:45:37 +00:00
Calle Wilund
22a38f0025 db/serializer: Fix db::serializer<replay_position> format
Should match struct/"official" serial format. (64+32)
This serializer is however not really used any more and could
be removed.
2016-02-09 15:45:37 +00:00
Calle Wilund
1c213e1f38 system_keyspace: Use IDL types + better verification of truncation record
Truncation records are not portable between us and Origin.
We need to detect and ensure we neither try to use, and more to the
point, don't crash because of data format error when loading, origin
records from a migrated system.

This problem was seen by Tzach when doing a migration from an origin
setup.

Updated record storage to use IDL-serialized types + added versioning
and magic marking + odd-size-checking to ensure we load only correct
data. The code will also deal with records from an older version of
scylla.
2016-02-09 15:45:37 +00:00
Calle Wilund
4d7289b275 serializer_impl: Add convinience wrapper for one-obj deserialization
Akin to serizalize_to_buffer
2016-02-09 13:55:33 +00:00
Calle Wilund
dff89fffcd IDL: Add idl definitions for replay_position and truncation_record 2016-02-09 13:55:33 +00:00
Calle Wilund
873f87430d database: Check sstable dir name UUID part when populating CF
Fixes #870
Only load sstables from CF directories that match the current
CF uuid.
Message-Id: <1454938450-4338-1-git-send-email-calle@scylladb.com>
2016-02-08 14:48:19 +01:00
Avi Kivity
e5b72aedf1 managed_bytes: don't copy data during hashing 2016-02-08 12:43:05 +02:00
Avi Kivity
5d958db869 managed_bytes: fix operator== for fragmented blobs
Must compare fragment by fragment.
2016-02-08 12:43:05 +02:00
Calle Wilund
2ffd7d7b99 stream_manager: Change construction to make gcc 4.9 happy
gcc 4.9 complains about the type{ val, val } construction of
type with implicit default constructor, i.e. member = initial
declarations. gcc 5 does not (and possibly rightly so).
However, we still (implicitly) claim to support gcc 4.9 so
why not just change this particular instance.

Message-Id: <1454921328-1106-1-git-send-email-calle@scylladb.com>
2016-02-08 10:54:48 +02:00
Paweł Dziepak
c90ec731c8 transport: do not close gate at connection shutdown
connection::_pending_requests_gate is responsible for keeping connection
objects alive as long as there are outstanding requests and is closed
in connection::proccess() when needed. Closing it in connection::shutdown()
as well may cause the gate to be closed twice what is a bug.

Fixes #690.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1454596390-23239-1-git-send-email-pdziepak@scylladb.com>
2016-02-07 20:07:23 +02:00
Avi Kivity
8b0a26f06d build: support for alternative versions of libsystemd pkgconfig
While pkgconfig is supposed to be a distribution and version neutral way
of detecting packages, it doesn't always work this way.  The sd_notify()
manual page documents that sd_notify is available via the libsystemd
package, but on centos 7.0 it is only available via the libsystemd-daemon
package (on centos 7.1+ it works as expected).

Fix by allowing for alternate version of package names, testing each one
until a match is found.

Fixes #879.

Message-Id: <1454858862-5239-1-git-send-email-avi@scylladb.com>
2016-02-07 17:36:57 +02:00
Avi Kivity
ad58663c96 row_cache: reindent 2016-02-07 13:25:29 +02:00
Asias He
31d439213c streaming: Send mutations on all shards
Currently, only the shard where the stream_plan is created on will send
streaing mutations. To utilize all the available cores, we can make each
shard send mutations which it is responsbile for. On the receiver side,
we do not forward the mutations to the shard where the stream_session is
created, so that we can avoid unnecessary forwarding.

Note: the downside is that it is now harder to:

1) to track number of bytes sent and received
2) to update the keep alive timer upon receive of the STREAM_MUTATION

To fix, we now store the sent/recieved bytes info on all shards. When
the keep alive timer expires, we check if any progress has been made.

Hopefully, this patch will make the streaming much faster and in turn
make the repair/decommission/adding a node faster.

Refs: https://github.com/scylladb/scylla/issues/849

Tested with decommission/repair dtest.

Message-Id: <96b419ab11b736a297edd54a0b455ffdc2511ac5.1454645370.git.asias@scylladb.com>
2016-02-07 10:57:51 +02:00
Gleb Natapov
63a5aa6122 prevent superfluous frozen_mutation copying
Sometimes frozen_mutation is copied while it can be moved instead. Fix
those cases.

Message-Id: <20160204165708.GI6705@scylladb.com>
2016-02-07 10:54:16 +02:00
Erich Keane
4197ceeedb raw_statement::is_reversed rewrite to avoid VLA
The is_reversed function uses a variable length array, which isn't
spec-abiding C++.  Additionally, the Clang compiler doesn't allow them
with non-POD types, so this function wouldn't compile.

After reading through the function it seems that the array wasn't
necessary as the check could be calculated inline rather than
separately.  This version should be more performant (since it no longer
requires the VLA lookup performance hit) while taking up less memory in
all but the smallest of edge-cases (when the clustering_key_size *
sizeof(optional<bool>) < sizeof(size_type) - sizeof(uint32_t) +
sizeof(bool).

This patch uses  relation_order_unsupported it assure that the exception
order is consistent with the preivous version.  The throw would
otherwise be moved into the initial for-loop.

There are two derrivations in behavior:
The first is the initial assert.  It however should not change the apparent
behavior besides causing orderings() to be looked up 2x in debug
situations.

The second is the conversion of is_reversed_ from an optional to a bool.
The result is that the final return value is now well-defined to be
false in the release-condition where orderings().size() == 0, rather
than be the ill-defined *is_reversed_ that was there previously.

Signed-off-by: Erich Keane <erich.keane@verizon.net>
Message-Id: <1454546285-16076-4-git-send-email-erich.keane@verizon.net>
2016-02-07 10:38:17 +02:00
Erich Keane
49842aacd9 managed_vector: maybe_constructed ctor to non-constexpr
Clang enforces that a union's constexpr CTOR must initialize
one of the members.  The spec is seemingly silent as to what
the rule on this is, however, making this non-constexpr results in clang
accepting the constructor.

Signed-off-by: Erich Keane <erich.keane@verizon.net>
Message-Id: <1454604300-1673-1-git-send-email-erich.keane@verizon.net>
2016-02-07 10:30:45 +02:00
Erich Keane
e87019843f Fix PHI_FACTOR definition to be spec compliant
PHI_FACTOR is a constexpr variable that is defined using std::log.
Though G++ has a constexpr version of std::log, this itself is not spec
complaint (in fact, Clang enforces this).  See C++ Spec 26.8 for the
definition of std::log and 17.6.5.6 for the rule regarding adding
constexpr where it isn't specified.

This patch replaces the std::log statement with a version from math.h
that contains the exact value (M_LOG10El).

Signed-off-by: Erich Keane <erich.keane@verizon.net>
Message-Id: <1454603285-32677-1-git-send-email-erich.keane@verizon.net>
2016-02-04 18:33:44 +02:00
Avi Kivity
c85f6c4df1 Merge seastar upstream
* seastar 661ccd9...14c9991 (1):
  > reactor: use correct open_flags when opening a file without DMA support

Fixes #871.
2016-02-04 18:17:04 +02:00
Gleb Natapov
77d47c0c4b optimize serialization of array/vector of integral types
Array of integral types on little endian machine can be memcpyed into/out
of a buffer instead of serialized/deserialized element by element.

Message-Id: <20160204155425.GC6705@scylladb.com>
2016-02-04 18:01:14 +02:00
Avi Kivity
91fbb81477 Merge seastar upstream
* seastar f8beab9...661ccd9 (1):
  > Merge "Use swapcontext() with AddressSanitizer" from Paweł
2016-02-04 17:30:15 +02:00
Paweł Dziepak
ababdfc9e2 tests/batchlog: use proper batchlog version
Since 42e3999a00 "Check batchlog version
before replaying" there is a version check in batchlog replay.
However, the test wasn't updated and still used some arbitrary version
number which caused it to fail.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1454595368-21670-1-git-send-email-pdziepak@scylladb.com>
2016-02-04 16:50:45 +02:00
Gleb Natapov
049ae37d08 storage_proxy: change collectd to show foreground mutation instead of overall mutation count
It is much easier to see what is going on this way otherwise graphs for
bg mutations and overall mutations are very close with usual scaling for
many workloads.

Message-Id: <20160204083452.GH6705@scylladb.com>
2016-02-04 14:58:56 +02:00
Gleb Natapov
a9e4afd8d2 Drop query-result.hh from database.hh
It is not needed there but causes a lot of recompilation when changed.

Message-Id: <1454496142-14537-3-git-send-email-gleb@scylladb.com>
2016-02-04 13:22:27 +02:00
Gleb Natapov
2ae1ae2d18 Cleanup messaging_service.hh includes a bit.
Forward declare some classes instead.

Message-Id: <1454496142-14537-2-git-send-email-gleb@scylladb.com>
2016-02-04 13:22:24 +02:00
Avi Kivity
f3ca597a01 Merge "Sstable cleanup fixes" from Tomasz
"  - Added waiting for async cleanup on clean shutdown

  - Crash in the middle of sstable removal doesn't leave system in a non-bootable state"
2016-02-04 12:36:13 +02:00
Tomasz Grabiec
c7ef3703cc sstable: Make sstable deletion never leave sstable set in a non-bootable state
Refs #860
Refs #802

An sstable file set with any component missing is interpreted as a
critical error during boot. Currently sstable removal procedure could
leave the files in a non-bootable state if the process crashed after
TOC was removed but before all components were removed as well.

To solve this problem, start the removal by renaming the TOC file to a
so called "temporary TOC". Upon boot such kind of TOC file is
interpreted as an sstable which is safe to remove. This kind of TOC
was added before to deal with a similar scenario but in the opposite
direction - when writing a new sstable.
2016-02-03 17:36:17 +01:00
Tomasz Grabiec
c8a98b487c sstables: Remove coupling-hiding duplication 2016-02-03 17:36:17 +01:00
Tomasz Grabiec
355874281a sstables: Do not register exit hooks from static initializer
Fixes #868.

Registerring exit hooks while reactor is already iterating over exit
hooks is not allowed and currently leads to undefined behavior
observed in #868. While we should make the failure more user friendly,
registering exit hooks concurrently with shutdown will not be allowed.

We don't expect exit hooks to be registered after exit starts because
this would violate the guarantee which says that exit hooks are
executed in reverse order of registration. Starting exit sequence in
the middle of initialization sequence would result in use after free
errors. Btw, I'm not sure if currently there's anything which prevents
this

To solve this problem, move the exit hook to initilization
sequence. In case of tests, the cleanup has to be called explicitly.
2016-02-03 17:35:50 +01:00
Tomasz Grabiec
136c9d9247 sstables: Improve error message in case of generation duplication
Refs #870.
2016-02-03 17:35:50 +01:00
Calle Wilund
a00ff015f4 transport::server: read cqlv2 batch options correctly
Fixes #563.
Refs #584

CQLv2 encodes batch query_options in v1 format, not v2+.
CQLv1 otoh has no batch support at all.
Make read_options use explicit version format if needed.

v2: Ensure we preserve cql protocol version in query_opts
Message-Id: <1454514510-21706-1-git-send-email-calle@scylladb.com>
2016-02-03 16:55:07 +01:00
Gleb Natapov
b4b560e0fc change result_digest to hold std::array instead of a std::vector
Digest size if fixed, so no need to use std::vector to hold it.

Message-Id: <20160203102530.GU6705@scylladb.com>
2016-02-03 12:27:39 +02:00
Raphael S. Carvalho
4041f8cffc compaction: stop all ongoing compaction during shutdown
Currently, we wait for ongoing compaction during shutdown, but
that may take 'forever' if compacting huge sstables with a slow
disk. Compaction of huge sstables will take a considerable amount
of time even with fast disks. Therefore, all ongoing compaction
should be stopped during shutdown.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <3370f17ce4274df417ea60651f33fc5d4de91199.1454441286.git.raphaelsc@scylladb.com>
2016-02-03 10:18:51 +02:00
Raphael S. Carvalho
cf22c827f9 compaction_manager: fix assertion when stopping task
Task is stopped by closing gate and forcing it to exit via gate
exception. The problem is that task->compacting_cf may be set to
the column family being compacted, and compaction_manager::remove
would see it and try to stop the same task again, which would
lead to problems. The fix is to clean task->compacting_cf when
stopping task.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <3473e93c1a107a619322769d65fa020529b5501b.1454441286.git.raphaelsc@scylladb.com>
2016-02-03 10:18:15 +02:00
Asias He
c67538009c streaming: Fix assert in update_progress
The problem is that on the follower side, we set up _session_info too
late, after received PREPARE_DONE_MESSAGE message. The initiator can
send STREAM_MUTATION before sending PREPARE_DONE_MESSAGE message.

To fix, we set up _session_info after we received the prepare_message on
both initiator and follower.

Fixes #869

scylla: streaming/session_info.cc:44: void
streaming::session_info::update_progress(streaming::progress_info):
Assertion `peer == new_progress.peer' failed.
Message-Id: <6d945ba1e8c4fc0949c3f0a72800c9448ba27761.1454476876.git.asias@scylladb.com>
2016-02-03 10:15:45 +02:00
Asias He
46c392eb17 messaging_service: Stop retrying if messaging_service is being shutdown
If we are shutting down the messaging_service, we should not retry the
message again.

Refs #862

Message-Id: <7c3afb646ba8254eca69096d80dd5ea007e416a7.1454418053.git.asias@scylladb.com>
2016-02-02 19:50:54 +02:00
Gleb Natapov
c509e48674 Parallelize batchlog replay
Current code is serialized by get_truncated_at(). Use map_reduce to make
it run in parallel.
Message-Id: <1454421603-13080-4-git-send-email-gleb@scylladb.com>
2016-02-02 17:08:54 +01:00
Gleb Natapov
42e3999a00 Check batchlog version before replaying
In case batchlog serialization format changes check it before trying
to interpret raw data.
Message-Id: <1454421603-13080-3-git-send-email-gleb@scylladb.com>
2016-02-02 17:08:54 +01:00
Gleb Natapov
116ad5a603 Use net::messaging_service::current_version for serialization format versioning
Message-Id: <1454421603-13080-2-git-send-email-gleb@scylladb.com>
2016-02-02 17:08:53 +01:00
Avi Kivity
b14d39bfb1 Merge "Move last bits to IDL serializer and get rid of old one" from Gleb 2016-02-02 12:33:18 +02:00
Gleb Natapov
19067db642 remove old serializer 2016-02-02 12:15:50 +02:00
Gleb Natapov
4e440ebf8e Remove old inet_address and uuid serializers 2016-02-02 12:15:50 +02:00
Gleb Natapov
31bb194c21 Remove old result_digest serializer 2016-02-02 12:15:50 +02:00
Gleb Natapov
10cd4d948c Move result_digest to idl 2016-02-02 12:15:50 +02:00
Gleb Natapov
775cc93880 remove unused range and token serializers 2016-02-02 12:15:49 +02:00
Gleb Natapov
e3a40254e6 Remove old partition_checksum serializer 2016-02-02 12:15:49 +02:00
Gleb Natapov
e6f7b12b51 Move partition_checksum to use idl 2016-02-02 12:15:49 +02:00
Gleb Natapov
8cc1d1a445 Add std:array serializer 2016-02-02 12:15:49 +02:00
Gleb Natapov
a8902ccb4a Remove old frozen_schema serializer 2016-02-02 12:15:49 +02:00
Gleb Natapov
60e3637efc Move frozen_schema to idl 2016-02-02 12:15:49 +02:00
Nadav Har'El
b95c15f040 repair: change checksum structure to be better suited for serializer
Change the partition_checksum structure to be better suited for the
new serializers:

 1. Use std::array<> instead of a C array, as the latters are not
    supported by the new serializers.

 2. Use an array of 32 bytes, instead of 4 8-byte integers. This will
    guarantee that no byte-swapping monkey-business will be done on
    these checksums.
    The checksum XOR and equality-checking methods still temporarily
    cast the bytes to 8-byte chunks, for (hopefully) better performance.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1454364900-3076-1-git-send-email-nyh@scylladb.com>
2016-02-02 11:58:25 +02:00
Calle Wilund
c67e7e4ce4 cql3::sets: Make insert/update frozen set handle null/empty correctly
Fixes #578

Message-Id: <1454345878-1977-1-git-send-email-calle@scylladb.com>
2016-02-01 19:15:28 +02:00
Takuya ASADA
5fe82ce555 dist: fix build error on Ubuntu 15.10
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1454345982-5899-1-git-send-email-syuu@scylladb.com>
2016-02-01 19:14:49 +02:00
Avi Kivity
1f245e3bcb mutation_partition: fix use of boost::intrusive::set<>::comp()
Seems like boost::intrusive::set<>::comp() is not accessible on some
versions of boost.  Replace by the equivalent
boost::intrusive::set<>::key_comp().

Fixes #858.
Message-Id: <1454326483-29780-1-git-send-email-avi@scylladb.com>
2016-02-01 13:54:52 +01:00
Calle Wilund
159dbe3a64 sstable_datafile_tests: Replace '---' with auto
Fixes compilation issues on some g++.
Message-Id: <1454323749-21933-1-git-send-email-calle@scylladb.com>
2016-02-01 12:58:33 +02:00
Avi Kivity
2b84bd3b75 Merge "standalone tcp connection for streaming" from Asias
"Make the streaming use standalone tcp connection and send more mutations in
parallel.

It is supposed to help: "Decommission not fully utilizing hardware #849""
2016-02-01 09:54:11 +02:00
Asias He
c618c699b3 streaming: Increase mutation_send_limiter
The idea behind the current 10 stream_mutations per core limitation is
to avoid streaming overwhelms the TCP connection and starves normal cql
verbs if the streaming mutations are big and takes long time to
complete.

Now that we use a standalone connection for streaming verbs, we can
increase the limitation.

Hopefully, this will fix #849.
2016-02-01 11:01:56 +08:00
Asias He
fbf796b812 messaging_service: Use standalone connection for stream verbs
In streaming, the amount of data needs to be streamed to peer nodes
might be large.

In order to avoid the streaming overwhelms the TCP connection used by
user CQL verbs and starves the user CQL queries, we use a standalone TCP
connection for streaming verbs.
2016-02-01 11:01:56 +08:00
Avi Kivity
1146e3796d Merge "streaming refactor" from Asias
"- Wire up session progress
- Refactor stream_coordinator::host_streaming_data
- Introduce get_session helper to simplfy verb handling
- Remove unused code

Tested with streaming in update_cluster_layout_tests.py"
2016-01-31 20:17:53 +02:00
Tomasz Grabiec
945ae5d1ea Move std::hash<range<T>> definition to range.hh
Message-Id: <1454008052-5152-1-git-send-email-tgrabiec@scylladb.com>
2016-01-31 20:11:30 +02:00
Avi Kivity
f6e7dbf080 Merge seastar upstream
* seastar 6623379...f8beab9 (2):
  > json_base_element: do not assign the element name
  > io_queue: change visibility of internal function
2016-01-31 16:34:39 +02:00
Raphael S. Carvalho
a46aa47ab1 make sstables::compact_sstables return list of created sstables
Now, sstables::compact_sstables() receives as input a list of sstables
to be compacted, and outputs a list of sstables generated by compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0d8397f0395ce560a7c83cccf6e897a7f464d030.1454110234.git.raphaelsc@scylladb.com>
2016-01-31 12:39:20 +02:00
Raphael S. Carvalho
ee84f310d9 move deletion of sstables generated by interrupted compaction
This deletion should be handled by sstables::compact_sstables, which
is the responsible for creation of new sstables.
It also simplifies the code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <541206be2e910ab4edb1500b098eb5ebf29c6509.1454110234.git.raphaelsc@scylladb.com>
2016-01-31 12:39:20 +02:00
Glauber Costa
7214649b8a sstables: const where const is due
Some SSTable methods are not marked as const. But they should be.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <72cd3ef0157eb38e7fd48d0c989f2342cbc42f3c.1454103008.git.glauber@scylladb.com>
2016-01-31 12:36:36 +02:00
Avi Kivity
3434b8e7c6 Merge seastar upstream
* seastar fbd9b30...6623379 (1):
  > fstream: improve make_file_input_stream() for a subrange of a file
2016-01-31 12:01:46 +02:00
Avi Kivity
a3fa123070 Update scylla-ami submodule
* dist/ami/files/scylla-ami e284bcd...b2724be (2):
  > Revert "Run scylla.yaml construction only once"
  > Move AMI dependent part of scylla_prepare to scylla-ami-setup.service
2016-01-31 12:01:15 +02:00
Avi Kivity
f08f5858a8 Merge "Introduce scylla-ami-setup.service, fix bugs" from Takuya
"This moves AMI dependent part of scylla_prepare to scylla-ami repo, make it scylla-ami-setup.service which is independent systemd unit.
Also, it stopped calling scylla_sysconfig_setup on scylla_setup (called on AMI creation time), call it from scylla-ami-setup instead."
2016-01-31 12:00:32 +02:00
Takuya ASADA
111dc19942 dist: construct scylla.yaml on first startup of AMI instance, not AMI image creation time
Install scylla-ami-setup.service, stop calling scylla_sysconfig_setup on AMI.
scylla-ami-setup.service will call it instead.
Only works with scylla-ami fix.
Fixes #857

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-30 15:48:45 -05:00
Takuya ASADA
71a26e1412 dist: don't need AMI_KEEP_VERSION anymore, since we fixed the issue that 'yum update' mistakenly replaces scylla development version with release version
It actually doesn't called unconditionally now, (since $LOCAL_PKG is always empty) so we can safely remove this.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-30 15:47:05 -05:00
Takuya ASADA
f9d32346ef dist: scylla_sysconfig_setup uses current sysconfig values as default value
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-30 15:46:21 -05:00
Takuya ASADA
4d5baef3e3 dist: keep original SCYLLA_ARGS when updating sysconfig
Since we dropped scylla_run, now default SCYLLA_ARGS parameter is not empty.
So we need to support it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-30 15:44:03 -05:00
Asias He
f07cd30c81 streaming: Remove unused create_message_for_retry 2016-01-29 16:31:07 +08:00
Asias He
cb92fe75e6 streaming: Introduce get_session helper
To simplify streaming verb handler.

- Use get_session instead of open coded logic to get get_coordinator and
  stream_session in all the verb handlers

- Use throw instead of assert for error handling

- init_receiving_side now returns a shared_ptr<stream_result_future>
2016-01-29 16:31:07 +08:00
Asias He
360df6089c streaming: Remove unused stream_session::retry 2016-01-29 16:31:07 +08:00
Asias He
2f48d402e2 streaming: Remove unused commented code 2016-01-29 16:31:07 +08:00
Asias He
ed3da7b04c streaming: Drop flush_tables option for add_transfer_ranges
We do not stream sstable files. No need to flush it.
2016-01-29 16:31:07 +08:00
Asias He
aa69d5ffb2 streaming: Drop update_progress in stream_coordinator
Since we have session_info inside stream_session now, we can call
update_progress directly in stream_session.
2016-01-29 16:31:07 +08:00
Asias He
30c745f11a streaming: Get rid of stream_coordinator::host_streaming_data
Now host_streaming_data only holds shared_ptr<stream_session>, we can
get rid of it and put shared_ptr<stream_session> inside _peer_sessions.
2016-01-29 16:31:07 +08:00
Asias He
46bec5980b streaming: Put session_info inside stream_session
It is 1:1 mapping between session_info and stream_session. Putting
session_info inside stream_session, we can get rid of the
stream_coordinator::host_streaming_data class.
2016-01-29 16:31:07 +08:00
Asias He
91e245edac streaming: Initialize total_size in stream_transfer_task
Also rename the private member to _total_size and _files
2016-01-29 16:31:07 +08:00
Asias He
c4bdb6f782 streaming: Wire up session progress
The progress info is needed by JMX api.
2016-01-29 16:31:07 +08:00
Avi Kivity
3e4ce609ee Merge seastar upstream
* seastar ec468ba...fbd9b30 (9):
  > Add implementation of count_leading_zeros<LL>
  > Fix htonl usage for clang
  > Fix gnutls_error_category ctor for clang
  > Add header files required for libc++
  > Add clang warning suppressions
  > Switch to correct usage of std::abs
  > Fix the do_marshall(sic) function to init_list
  > Corrected sockaddr_in initialization
  > Remove unused const char misc_strings
2016-01-28 18:24:46 +02:00
Raphael S. Carvalho
3b7970baff compaction: delete generated sstables in event of an interrupt
Generated sstables may imply either fully or partially written.
Compaction is interrupted if it was deriberately asked to stop (stop API)
or it was forced to do so in event of a failure, ex: out of disk space.
There is a need to explicitly delete sstables generated by a compaction
that was interrupted. Otherwise, such sstables will waste disk space and
even worsen read performance, which degrades as number of generations
to look at increases.

Fixes #852.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <49212dbf485598ae839c8e174e28299f7127f63e.1453912119.git.raphaelsc@scylladb.com>
2016-01-28 14:05:57 +02:00
Pekka Enberg
3c3c819280 Merge "api: Fix stream_manager" from Asias
"Fix the metrics for bytes sent and received"
2016-01-28 13:57:59 +02:00
Raphael S. Carvalho
ba4260ea8f api: print proper compaction type
There are several compaction types, and we should print the correct
one when listing ongoing compaction. Currently, we only support
compaction types: COMPACTION and CLEANUP.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <c96b1508a8216bf5405b1a0b0f8489d5cc4be844.1453851299.git.raphaelsc@scylladb.com>
2016-01-28 13:47:00 +02:00
Tomasz Grabiec
9fa62af96b database: Move implementation to .cc
Message-Id: <1453980679-27226-1-git-send-email-tgrabiec@scylladb.com>
2016-01-28 13:35:33 +02:00
Tomasz Grabiec
ca6bafbb56 canonical_mutation: Remove commented out junk 2016-01-28 12:29:20 +01:00
Tomasz Grabiec
41dc98bb79 Merge branch 'cleanup_improvements' from git@github.com:raphaelsc/scylla.git
Compaction cleanup improvements from Raphael.
2016-01-27 18:30:46 +01:00
Avi Kivity
873deb5808 Merge "move paging_state to use idl" from Gleb 2016-01-27 19:06:04 +02:00
Takuya ASADA
03caacaad0 dist: enable collectd client by default
Fixes #838

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453910311-24928-2-git-send-email-syuu@scylladb.com>
2016-01-27 18:45:45 +02:00
Takuya ASADA
f33656ef03 dist: eliminate startup script
Fixes #373

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453910311-24928-1-git-send-email-syuu@scylladb.com>
2016-01-27 18:45:35 +02:00
Gleb Natapov
b065e2003f Move paging_state to use idl 2016-01-27 18:39:43 +02:00
Raphael S. Carvalho
45c446d6eb compaction: pass dht::token by reference
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-27 13:25:41 -02:00
Raphael S. Carvalho
fc541e2f08 compaction: remove code to sort local ranges
storage_service::get_local_ranges returns sorted ranges, which are
not overlapping nor wrap-around. As a result, there is no need for
the consumer to do anything.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-27 13:15:36 -02:00
Avi Kivity
c75d1c4eeb Merge "Ubuntu 'expect stop' related fixes" from Takuya 2016-01-27 17:00:23 +02:00
Gleb Natapov
65bd429a0b Add serialization helper to use outside of rpc. 2016-01-27 16:43:06 +02:00
Takuya ASADA
4162fb158c main: raise SIGSTOP only when scylla become ready
supervisor_notify() calls periodically, to log message on systemd.
So raise(SIGSTOP) will called multiple times, upstart doesn't expected that.
We need to call it just one time.

Fixes #846

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:30:26 +09:00
Takuya ASADA
851951d32d dist: run upstart job as 'scylla' user
Don't use sudo when launching scylla, run directly from upstart.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:30:02 +09:00
Takuya ASADA
89f0fc89b4 dist: set ulimit in upstart job
Upstart job able to specify ulimit like systemd, so drop ubuntu's scylla_run and merge with redhat one.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:29:52 +09:00
Takuya ASADA
b4accd8904 main: autodetect systemd/upstart
We can autodetect systemd/upstart by environment variables, don't need program argument.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:29:32 +09:00
Takuya ASADA
559f913494 dist: use nightly for prebuilt 3rdparty packages (CentOS)
Developers probably wants to use latest dependency packages, so switch to nightly.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453904521-2716-1-git-send-email-syuu@scylladb.com>
2016-01-27 16:24:49 +02:00
Gleb Natapov
19c55693fd idl: add missing header to serializer.hh 2016-01-27 15:49:29 +02:00
Amnon Heiman
7b53b99968 idl-compiler: split the idl list
Not all the idls are used by the messaging service, this patch removes
the auto-generated single include file that holds all the files and
replaes it with individual include of the generated fiels.
The patch does the following:
* It removes from the auto-generated inc file and clean the configure.py
  from it.
* It places an explicit include for each generated file in
  messaging_serivce.
* It add dependency of the generated code in the idl-compiler, so a
change in the compiler will trigger recreation of the generated files.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453900241-13053-1-git-send-email-amnon@scylladb.com>
2016-01-27 15:23:00 +02:00
Pekka Enberg
86173fb8cc db/commitlog: Fix debug log format string in commitlog_replayer::recover()
I saw the following Boost format string related warning during commitlog
replay:

  INFO  [shard 0] commitlog_replayer - Replaying node3/commitlog/CommitLog-1-72057594289748293.log, node3/commitlog/CommitLog-1-90071992799230277.log, node3/commitlog/CommitLog-1-108086391308712261.log, node3/commitlog/CommitLog-1-251820357.log, node3/commitlog/CommitLog-1-54043195780266309.log, node3/commitlog/CommitLog-1-36028797270784325.log, node3/commitlog/CommitLog-1-126100789818194245.log, node3/commitlog/CommitLog-1-18014398761302341.log, node3/commitlog/CommitLog-1-126100789818194246.log, node3/commitlog/CommitLog-1-251820358.log, node3/commitlog/CommitLog-1-18014398761302342.log, node3/commitlog/CommitLog-1-36028797270784326.log, node3/commitlog/CommitLog-1-54043195780266310.log, node3/commitlog/CommitLog-1-72057594289748294.log, node3/commitlog/CommitLog-1-90071992799230278.log, node3/commitlog/CommitLog-1-108086391308712262.log
  WARN  [shard 0] commitlog_replayer - error replaying: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::io::too_many_args> > (boost::too_many_args: format-string referred to less arguments than were passed)

While inspecting the code, I noticed that one of the error loggers is
missing an argument. As I don't know how the original failure triggered,
I wasn't able to verify that that was the only one, though.

Message-Id: <1453893301-23128-1-git-send-email-penberg@scylladb.com>
2016-01-27 13:40:19 +02:00
Pekka Enberg
c4dafe24f5 Update scylla-ami submodule
* dist/ami/files/scylla-ami 77cde04...e284bcd (2):
  > Run scylla.yaml construction only once
  > Revert "Run scylla.yaml construction only once"
2016-01-27 13:30:04 +02:00
Asias He
9fee1cc43a api: Use get_bytes_{received,sent} in stream_manager
The data in session_info is not correctly updated.

Tested while decommission a node:

$ curl -X GET  --silent --header "Accept: application/json"
"http://127.0.0.$i:10000/stream_manager/metrics/incoming";echo

$ curl -X GET --silent --header "Accept: application/json"
"http://127.0.0.$i:10000/stream_manager/metrics/outgoing";echo
2016-01-27 18:17:36 +08:00
Asias He
03aced39c4 streaming: Account number of bytes sent and received per session
The API will consume it soon.
2016-01-27 18:16:58 +08:00
Asias He
36829c4c87 api: Fix stream_manager total_incoming/outgoing bytes
Any stream, no matter initialized by us or initialized by a peer node,
can send and receive data. We should audit incoming/outgoing bytes in
the all streams.
2016-01-27 18:15:09 +08:00
Asias He
08f703ddf6 streaming: Add get_all_streams in stream_manager
Get all streams both initialized by us or initialized by peer node.
2016-01-27 18:15:09 +08:00
Tomasz Grabiec
c971544e83 bytes_ostream: Adapt to Output concept used in serializer.hh
Message-Id: <1453888242-2086-1-git-send-email-tgrabiec@scylladb.com>
2016-01-27 12:13:34 +02:00
Gleb Natapov
6a581bb8b6 messaging_service: replace rpc::type with boost::type
RPC moved to boost::type to make serializers less rpc centric. Scylla
should follow.

Message-Id: <20160126164450.GA11706@scylladb.com>
2016-01-27 11:57:45 +02:00
Gleb Natapov
6f6b231839 Make serializer use new simple stream location
Message-Id: <20160127093045.GG9236@scylladb.com>
2016-01-27 11:37:37 +02:00
Raphael S. Carvalho
d54c77d5d0 change abstract_replication_strategy::get_ranges to not return wrap-arounds
The main motivation behind this change is to make get_ranges() easier for
consumers to work with the returned ranges, e.g. binary search to find a
range in which a token is contained. In addition, a wrap-around range
introduces corner cases, so we should avoid it altogether.

Suppose that a node owns three tokens: -5, 6, 8

get_ranges() would return the following ranges:
(8, -5], (-5, 6], (6, 8]
get_ranges() will now return the following ranges:
(-inf, -5], (-5, 6], (6, 8], (8, +inf)

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <4bda1428d1ebbe7c8af25aa65119edc5b97bc2eb.1453827605.git.raphaelsc@scylladb.com>
2016-01-27 09:48:31 +01:00
Avi Kivity
b9ab28a0e6 Merge "storage_service: add drain on shutdown logic & fix" from Asias
"Fixes:
- storage_service::handle_state_removing() doesn't call drain() #825
https://github.com/scylladb/scylla/issues/825

- nodetool gossipinfo is out of sync #790
https://github.com/scylladb/scylla/issues/790"
2016-01-27 10:38:56 +02:00
Avi Kivity
1d7144ac14 Merge seastar upstream
* seastar bdb273a...ec468ba (1):
  > Move simple streams used for serialization into separate header
2016-01-27 10:38:09 +02:00
Amnon Heiman
fd94009d0e Fix API init process
The last patch of the API init process had a bug, that the wrong init
function was called.

This solve the issue.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453879042-26926-1-git-send-email-amnon@scylladb.com>
2016-01-27 10:03:24 +02:00
Asias He
8b4275126d storage_service: Shutdown messaging_service in decommission
It is commented out.
2016-01-27 11:48:49 +08:00
Asias He
b2f2c1c28c storage_service: Add drain on shutdown logic
We register engine().at_exit() callbacks when we initialize the services. We
do not really call the callbacks at the moment due to #293.

It is pretty hard to see the whole picture in which order the services
are shutdown. Instead of for each services to register a at_exit()
callbacks, I proposal to have a single at_exit() callback which do the
shutdown for all the services. In cassandra, the shutdown work is done
in storage_service::drain_on_shutdown callbacks.

In this patch, the drain_on_shutdown is executed during shutdown.

As a result, the proper gossip shutdown is executed and fixes #790.

With this patch, when Ctrl-C on a node, it looks like:

INFO  [shard 0] storage_service - Drain on shutdown: starts
INFO  [shard 0] gossip - Announcing shutdown
INFO  [shard 0] storage_service - Node 127.0.0.1 state jump to normal
INFO  [shard 0] storage_service - Drain on shutdown: stop_gossiping done
INFO  [shard 0] storage_service - CQL server stopped
INFO  [shard 0] storage_service - Drain on shutdown: shutdown rpc and cql server done
INFO  [shard 0] storage_service - Drain on shutdown: shutdown messaging_service done
INFO  [shard 0] storage_service - Drain on shutdown: flush column_families done
INFO  [shard 0] storage_service - Drain on shutdown: shutdown commitlog done
INFO  [shard 0] storage_service - Drain on shutdown: done
2016-01-27 11:45:52 +08:00
Asias He
e733930dff storage_service: Call drain inside handle_state_removing
Now that drain is implemented, call it.

Fixes #825
2016-01-27 11:45:52 +08:00
Asias He
5003c6e78b config: Introduce shutdown_announce_in_ms option
Time a node waits after sending gossip shutdown message in milliseconds.

Reduces ./cql_query_test execution time

from
   real    2m24.272s
   user    0m8.339s
   sys     0m10.556s

to
   real    1m17.765s
   user    0m3.698s
   sys     0m11.578
2016-01-27 11:19:38 +08:00
Paweł Dziepak
490201fd1c row_cache: protect against stale entries
row_cache::update() does not explicitly invalidate the entries it failed
to update in case of a failure. This could lead to inconsistency between
row cache and sstables.

In paractice that's not a problem because before row_cache::update()
fails it will cause all entries in the cache to be invalidated during
memory reclaim, but it's better to be safe and explicitly remove entries
that should be updated but it was not possible to do so.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1453829681-29239-1-git-send-email-pdziepak@scylladb.com>
2016-01-26 20:34:41 +01:00
Takuya ASADA
9b66d00115 dist: fix scylla_bootparam_setup for Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453836012-6436-1-git-send-email-syuu@scylladb.com>
2016-01-26 21:24:20 +02:00
Erich Keane
c836c88850 Replace deprecated BOOST_MESSAGE with BOOST_TEST_MESSAGE
BOOST Unit test deprecated BOOST_MESSAGE as early as 1.34 and had it
been perminently removed.  This patch replaces all uses of BOOST_MESSAGE
with BOOST_TEST_MESSAGE.

Signed-off-by: Erich Keane <erich.keane@verizon.net>
Message-Id: <1453783854-4274-1-git-send-email-erich.keane@verizon.net>
2016-01-26 19:01:40 +02:00
Amnon Heiman
b1845cddec Breaking the API initialization into stages
The API needs to be available at an early stage of the initialization,
on the other hand not all the specific APIs are available at that time.

This patch breaks the API initialization into stages, in each stage
additional commands will be available.

While setting that the api header files was broken into api_init.hh that
is relevent to the main and to api.hh which holds the different
api helper functions.

Fixes #754

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453822331-16729-2-git-send-email-amnon@scylladb.com>
2016-01-26 17:41:31 +02:00
Calle Wilund
e6b792b2ff commitlog bugfix: Fix batch mode
Last series accidently broke batch mode.
With new, fancy, potentitally blocking ways, we need to treat
batch mode differently, since in this case, sync should always
come _after_ alloc-write.
Previous patch caused infinite loop. Broke jenkins.

Message-Id: <1453821077-2385-1-git-send-email-calle@scylladb.com>
2016-01-26 17:13:14 +02:00
Glauber Costa
3f94070d4e use auto&& instead of auto& for priority classes.
By Avi's request, who reminds us that auto& is more suited for situations
in which we are assigning to the variable in question.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <87c76520f4df8b8c152e60cac3b5fba5034f0b50.1453820373.git.glauber@scylladb.com>
2016-01-26 17:00:20 +02:00
Avi Kivity
71eb79aedd main: exit with code 0 on shutdown
To avoid confusing systemd.

Fixes #823.

Message-Id: <1453220473-28712-1-git-send-email-avi@scylladb.com>
2016-01-26 16:26:53 +02:00
Calle Wilund
89dc0f7be3 commitlog: wait for writes (if needed) on new segment as well
Also check closed status in allocate, since alloc queue waiting could
lead to us re-allocating in a segment that gets closed in between
queue enter and us running the continuation.

Message-Id: <1453811471-1858-1-git-send-email-calle@scylladb.com>
2016-01-26 15:05:12 +02:00
Shlomi Livne
0a553dae1f Fix test.py invocation of sstable_test
invocation of sstable_test "./test.py  --name sstable_test --mode
release --jenkins a"
ran ... --log_sink=a.release.sstable_test -c1.boost.xml" which caused
the test to fail "with error code -11" fix that.

In addition boost test printout was bad fix that as well

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <3af8c4b55beae673270f5302822d7b9dbba18c0f.1453809032.git.shlomi@scylladb.com>
2016-01-26 12:56:26 +01:00
Avi Kivity
fbf56b3d98 Merge "Commit log threshold / back pressure" from Calle
"Adds flush + write thresholds/limits that, when reached, causes
operations to wait before being issued.
Write ops waiting also causes further allocations to queue up,
i.e. limiting throughput.

Adds getters for some useful "backlog" measurements:

* Pending (ongoing) writes/flush
* Pending (queued, wating) allocations
* Num times write/flush threshold has been exceeded (i.e. waits occured)
* Finished, dirty segments
* Unused (preallocated) segments"
2016-01-26 13:19:58 +02:00
Avi Kivity
a53788d61d Merge "More streaming cleanup and fix" from Asias
"- Drop compression_info/stream_message
- Cleanup outgoing_file_message/prepare_message
- Fix stream manager API (more to come)"
2016-01-26 13:17:58 +02:00
Avi Kivity
486d937111 Merge seastar upstream
* seastar 97f418a...bdb273a (6):
  > rpc: alias rpc::type to boost::type
  > Fix warning_supported to properly work with Clang
  > rpc: change 'overflow' to 'underflow' in input stream processing
  > rpc: log an error that caused connection to be closed.
  > rpc: clarify deserialization error message
  > rpc: do not append new line in a logger
2016-01-26 12:58:32 +02:00
Avi Kivity
5ad4b59f99 Update scylla-ami submodule
* dist/ami/files/scylla-ami 188781c...77cde04 (1):
  > Run scylla.yaml construction only once
2016-01-26 12:58:11 +02:00
Takuya ASADA
12748cf1b9 dist: support CentOS AMI on scylla_ntp_setup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453801319-26072-1-git-send-email-syuu@scylladb.com>
2016-01-26 12:55:08 +02:00
Calle Wilund
f2c5315d33 commitlog: Add write/flush limits
Configured on start (for now - and dummy values at that). 
When shard write/flush count reaches limit, and incoming ops will queue
until previous ones finish. 

Consequently, if an allocation op forces a write, which blocks, any 
other incoming allocations will also queue up to provide back pressure.
2016-01-26 10:19:24 +00:00
Calle Wilund
7628a4dfe0 commitlog: Add some feedback/measurement methods
Suitable to derive "back pressure" from.
2016-01-26 09:47:14 +00:00
Calle Wilund
4f5bd4b64b commitlog: split write/flush counters 2016-01-26 09:47:14 +00:00
Calle Wilund
215c8b60bf commitlog: minor cleanup - remove red squiggles in eclipse 2016-01-26 09:42:26 +00:00
Calle Wilund
61c7235c11 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-26 09:42:08 +00:00
Avi Kivity
0de7d1fc1b Merge "Add priority classes to our I/O path" from Glauber
"After the patch, all of our relevant I/O is placed on a specific priority class.
The ones which are not are left into the Seastar's default priority, which will
effectively work as an idle class.

Examples of such I/O are commitlog replay and initial SSTable loading. Since they
will happen during initialization, they will run uncontended, and do not justify
having a class on their own."
2016-01-26 10:46:13 +02:00
Asias He
750573ca0c configure: Fix idl indentation 2016-01-26 15:04:45 +08:00
Asias He
cc6d928193 api: Fix peer -> streaming_plan id in stream_manager
It is wrong to get a stream plan id like below:

   utils::UUID plan_id = gms::get_local_gossiper().get_host_id(ep);

We should look at all stream_sessions with the peer in question.
2016-01-26 15:00:44 +08:00
Asias He
384e81b48a streaming: Add get_peer_session_info
Like get_all_session_info, but only gets the session_info for a specific
peer.
2016-01-26 14:52:40 +08:00
Asias He
c7b156ed65 api: Fix get_{all}total_outgoing_byte in stream_manager
We should call get_total_size_sent instead of get_total_size_received
for outgoing byes.
2016-01-26 14:22:43 +08:00
Asias He
2e69d50c0c streaming: Cleanup prepare_message
- Drop empty prepare_message.cc
- Drop #if 0'ed code
2016-01-26 13:14:04 +08:00
Asias He
bbf025968b streaming: Cleanup outgoing_file_message
- Drop the unused headers
- Drop the outgoing_file_message.cc file which is empty
2016-01-26 13:12:01 +08:00
Asias He
e8b8b454df streaming: Flatten streaming messages class namespace
There are only two messages: prepare_message and outgoing_file_message.
Actually only the prepare_message is the message we send on wire.
Flatten the namespace.
2016-01-26 13:04:29 +08:00
Asias He
cab36a450b streaming: Remove stream_message
It is not useful to make stream_message as the base class for stream
messages. Scylla uses RPC verbs to distinguish different messages types.
2016-01-26 12:32:17 +08:00
Asias He
6a067bcc23 streaming: Drop unused compression_info 2016-01-26 11:55:36 +08:00
Glauber Costa
b63611e148 mark I/O operations with priority classes
After this patch, our I/O operations will be tagged into a specific priority class.

The available classes are 5, and were defined in the previous patch:

 1) memtable flush
 2) commitlog writes
 3) streaming mutation
 4) SSTable compaction
 5) CQL query

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
261c272178 introduce a priority manager
After the introduction of the Fair I/O Queueing mechanism in Seastar,
it is possible to add requests to a specific priority class, that will
end up being serviced fairly.

This patch introduces a Priority Manager service, that manages the priority
each class of request will get. At this moment, having a class for that may
sound like an overkill. However, the most interesting feature of the Fair I/O
queue comes from being able to adjust the priorities dynamically as workloads
changes: so we will benefit from having them all in the same place.

This is designed to behave like one of our services, with the exception that
it won't use the distributed interface. This is mainly because there is no
reason to introduce that complexity at this point - since we can do thread local
registration as we have been doing in Seastar, and because that would require us
to change most of our tests to start a new service.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
f6cfb04d61 add a priority class to mutation readers
SSTables already have a priority argument wired to their read path. However,
most of our reads do not call that interface directly, but employ the services
of a mutation reader instead.

Some of those readers will be used to read through a mutation_source, and those
have to patched as well.

Right now, whenever we need to pass a class, we pass Seastar's default priority
class.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
8e4bf025ae sstables: wire priority for read path
All the SSTable read path can now take an io_priority. The public functions will
take a default parameter which is Seastar's default priority.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
56c11a8109 sstables: wire priority for write path
All variants of write_component now take an io_priority. The public
interfaces are by default set to Seastar's default priority.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
03d5a89b90 sstables: mandate a buffer size parameter for data_stream_at
The only user for the default size is data_read, sitting at row.cc.
That reader wants to read and process a chunk all at once. So there's
really no reason to use the default buffer size - except that this code
is old.

We should do as we do in other single-key / single-range readers and
try to read all at once if possible, by looking at the size we received
as a parameter. Cleaning up the data_stream_at interface then comes as
a nice side effect.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
15336e7eb7 key_source: turn it into a class
Its definition as a lambda function is inconvenient, because it does not allow
us to use default values for parameters.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
58fdae33bd mutation_source: turn it into a class
Its definition as a lambda function is inconvenient, because it does not allow
us to use default values for parameters.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Gleb Natapov
c9bd069815 messaging_service: log rpc errors
Message-Id: <20160125155005.GC23862@scylladb.com>
2016-01-25 17:59:26 +02:00
Avi Kivity
91b57c7e20 Merge "Move streaming to use IDL" from Asias 2016-01-25 17:10:22 +02:00
Asias He
f027a9babe streaming: Drop unused serialization code 2016-01-25 22:39:13 +08:00
Asias He
ad80916905 messaging_service: Add streaming implementation for idl
- stream_request
- stream_summary
- prepare_message

 Please enter the commit message for your changes. Lines starting
2016-01-25 22:36:58 +08:00
Asias He
b299cc3bee idl: Add streaming.idl.hh
- stream_request
- stream_summary
- prepare_message
2016-01-25 22:29:25 +08:00
Asias He
5e100b3426 streaming: Drop unused repaired_at in stream_request 2016-01-25 22:28:48 +08:00
Avi Kivity
6fade0501b Update test/message.cc for MESSAGE verb rename 2016-01-25 14:47:55 +02:00
Nadav Har'El
db19a43d98 repair: try harder to repair, even when some nodes are unreachable
In the existing code, when we fail to reach one of the replicas of some
range being repaired, we would give up, and not continue to repair the
living replicas of this range. The thinking behind this was since the
repair should be considered failed anyway, there's no point in trying
to do a half-job better.

However, in a discussion I had with Shlomi, he raised the following
alternative thinking, which convinced me: In a large cluster, having
one node or another temporarily dead has a high probability. In that
case, even if the if the repair is doomed to be considered "failed",
we want it at least to do as much as it possibly can to repair the
data on the living part of the cluster. This is what this patch does:
If we can only reach some of the replicas of a given range, the repair
will be considered failed (as before), but we will still repair the
reachable replicas of this range, if they have different checksums.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1453724443-29320-1-git-send-email-nyh@scylladb.com>
2016-01-25 14:37:39 +02:00
Amnon Heiman
039e627b32 idl-compiler: Fix an issue with default values
This patch fix an issue when a parameter with a version attribute had a
default value.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453723251-9797-1-git-send-email-amnon@scylladb.com>
2016-01-25 14:32:00 +02:00
Takuya ASADA
e9fdb426b6 dist: add pyparsing on CentOS build time dependency CentOS
Porting pyparsing from Fedora23, build it for python34 which provided by epel.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453720780-21105-1-git-send-email-syuu@scylladb.com>
2016-01-25 13:26:58 +02:00
Takuya ASADA
b8b0ff0482 dist: add pyparsing on Ubuntu build time dependency
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453720081-15979-1-git-send-email-syuu@scylladb.com>
2016-01-25 13:08:48 +02:00
Avi Kivity
5c5207f122 Merge "Another round of streaming cleanup" from Asias
"- Merge stream_init_message and stream_parepare_message
- Drop  session_index / keep_ss_table_level / file_message_header"
2016-01-25 12:54:30 +02:00
Asias He
77684a5d4c messaging_service: Drop STREAM_INIT_MESSAGE
The verb is not used anymore.
Message-Id: <1453719054-29584-1-git-send-email-asias@scylladb.com>
2016-01-25 12:53:08 +02:00
Asias He
53c6cd7808 gossip: Rename echo verb to gossip_echo
It is used by gossip only. I really could not allow myself to get along
this inconsistence. Change before we still can.
Message-Id: <1453719054-29584-2-git-send-email-asias@scylladb.com>
2016-01-25 12:53:07 +02:00
Takuya ASADA
67d2aa677e dist: add pyparsing on Fedora build time dependency
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453715594-32675-1-git-send-email-syuu@scylladb.com>
2016-01-25 11:59:32 +02:00
Takuya ASADA
78d107ccaa dist: add missing dependencies for scylla-gdb
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453715339-32296-1-git-send-email-syuu@scylladb.com>
2016-01-25 11:59:31 +02:00
Asias He
51fa717b8e streaming: Get rid of file_message_header
Again, we do not send sstable files, thus neither header info for
sstables files.

TODO: Estimate mutation size we sent.
2016-01-25 17:56:43 +08:00
Asias He
eba9820b22 streaming: Remove stream_session::file_sent
It is the callback after sending file_message_header. In scylla, we do
not sent the file_message_header. Drop it.
2016-01-25 17:25:34 +08:00
Asias He
592683650a streaming: Remove unused serialization code for file_message_header 2016-01-25 17:16:57 +08:00
Calle Wilund
2d1e332fba Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-25 09:11:12 +00:00
Asias He
fa4e94aa27 streaming: Get rid of keep_ss_table_level
We stream mutation instead of files, so keep_ss_table_level is not
relevant for us.
2016-01-25 16:58:57 +08:00
Asias He
2cc31ac977 streaming: Get rid of the stream_index
It is always zero.
2016-01-25 16:58:57 +08:00
Asias He
6b30f08a38 streaming: Always return zero for session_index in api/stream_manager
We will remove session_index soon. It will always be zero. Do not drop
it in api so that the api will be compatible with c*.
2016-01-25 16:58:51 +08:00
Asias He
ad4a096b80 streaming: Get rid of stream_init_message
Unlike streaming in c*, scylla does not need to open tcp connections in
streaming service for both incoming and outgoing messages, seastar::rpc
does the work. There is no need for a standalone stream_init_message
message in the streaming negotiation stage, we can merge the
stream_init_message into stream_prepare_message.
2016-01-25 16:24:16 +08:00
Asias He
048965ea02 streaming: Do not print session_index in handle_session_prepared
session_index is always 0. It will be removed soon.
2016-01-25 16:24:16 +08:00
Avi Kivity
449b81f5d3 Merge "streaming cleanup" from Asias
"No mercy to the unused parameters and messages.
This will help the upcoming IDL serialize/deserialize work."
2016-01-25 10:21:16 +02:00
Avi Kivity
9ebd3f8098 Merge "Move gossip to use IDL" from Asias
"This changes gossip to use IDL based serialization code."
2016-01-25 10:18:34 +02:00
Asias He
20496ed9a8 tests: Stop gossip during shutdown in cql_test_env
Fixes the heap-use-after-free error in build/debug/tests/auth_test

==1415==ERROR: AddressSanitizer: heap-use-after-free on address
0x62200032cfa8 at pc 0x00000350701d bp 0x7fec96df8d40 sp
0x7fec96df8d30
READ of size 8 at 0x62200032cfa8 thread T1
    #0 0x350701c in
_ZZN3gms8gossiper3runEvENKUlOT_E0_clI6futureIJEEEEDaS2_
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x350701c)
    #1 0x35795b1 in apply<gms::gossiper::run()::<lambda(auto:40&&)>,
future<> > /home/penberg/scylla/seastar/core/future.hh:1203
    #2 0x369103d in
_ZZN6futureIJEE12then_wrappedIZN3gms8gossiper3runEvEUlOT_E0_S0_EET0_S5_ENUlS5_E_clI12future_stateIJEEEEDaS5_
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x369103d)
    #3 0x369182a in run /home/penberg/scylla/seastar/core/future.hh:399
    #4 0x435f24 in
reactor::run_tasks(circular_buffer<std::unique_ptr<task,
std::default_delete<task> >, std::allocator<std::unique_ptr<task,
std::default_delete<task> > > >&) core/reactor.cc:1368
    #5 0x43a44f in reactor::run() core/reactor.cc:1672
    #6 0x952e4b in app_template::run_deprecated(int, char**,
std::function<void ()>&&) core/app-template.cc:123
    #7 0x58dc79d in test_runner::start(int,
char**)::{lambda()#1}::operator()()
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x58dc79d)
    #8 0x58e6cd6 in _M_invoke /usr/include/c++/5.3.1/functional:1871
    #9 0x688639 in std::function<void ()>::operator()() const
/usr/include/c++/5.3.1/functional:2271
    #10 0x8d939c in posix_thread::start_routine(void*) core/posix.cc:51
    #11 0x7feca02a4609 in start_thread (/lib64/libpthread.so.0+0x7609)
    #12 0x7fec9ffdea4c in clone (/lib64/libc.so.6+0x102a4c)

0x62200032cfa8 is located 5800 bytes inside of 5808-byte region
[0x62200032b900,0x62200032cfb0)
freed by thread T1 here:
    #0 0x7feca4f76472 in operator delete(void*, unsigned long)
(/lib64/libasan.so.2+0x9a472)
    #1 0x3740772 in gms::gossiper::~gossiper()
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x3740772)
    #2 0x2588ba1 in shared_ptr<gms::gossiper>::~shared_ptr()
seastar/core/shared_ptr.hh:389
    #3 0x4fc908c in
seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}::~stop()
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x4fc908c)
    #4 0x4ff722a in future<>
future<>::then<seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}, future<>
>(seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}&&)::{lambda(seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1})#1}::~then()
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x4ff722a)
    #5 0x509a28c in continuation<future<>
future<>::then<seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}, future<>
>(seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}&&)::{lambda(seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1})#1}>::~continuation()
seastar/core/future.hh:395
    #6 0x509a40d in continuation<future<>
Message-Id: <f8f1c92c1eb88687ab0534f5e7874d53050a5b93.1453446350.git.asias@scylladb.com>
2016-01-25 08:19:18 +02:00
Asias He
bc4ac5004e streaming: Kill stream_result_future::create_and_register
The helper is used only once in init_sending_side and in
init_receiving_side we do not use create_and_register to create
stream_result_future. Kill the trivial helper to make the code more
consistent.

In addition, rename variables "future" and "f" to sr (streaming_result).
2016-01-25 11:38:13 +08:00
Asias He
face74a8f2 streaming: Rename stream_result_future::init to ::init_sending_side
So we have:

- init_sending_side
  called when the node initiates a stream_session

- init_receiving_side
  called when the node is a receiver of a stream_session initiated by a peer
2016-01-25 11:38:13 +08:00
Asias He
dc94c5e42e streaming: Rename get_or_create_next_session to get_or_create_session
There is only one session for each peer in stream_coordinator.
2016-01-25 11:38:13 +08:00
Asias He
e46d4166f2 streaming: Refactor host_streaming_data
In scylla, in each stream_coordinator, there will be only one
stream_session for each remote peer. Drop the code supporting multiple
stream_sessions in host_streaming_data.

We now have

   shared_ptr<stream_session> _stream_session

instead of

   std::map<int, shared_ptr<stream_session>> _stream_sessions
2016-01-25 11:38:13 +08:00
Asias He
8a4b563729 streaming: Drop the get_or_create_session_by_id interafce
The session index will always be 0 in stream_coordinator. Drop the api for it.
2016-01-25 11:38:13 +08:00
Asias He
9a346d56b9 streaming: Drop unnecessary parameters in stream_init_message
- from
  We can get it form the rpc::client_info

- session_index
  There will always be one session in stream_coordinator::host_streaming_data with a peer.

- is_for_outgoing
  In cassandra, it initiates two tcp connections, one for incoming stream and one for outgoing stream.
  logger.debug("[Stream #{}] Sending stream init for incoming stream", session.planId());
  logger.debug("[Stream #{}] Sending stream init for outgoing stream", session.planId());
  In scylla, it only initiates one "connection" for sending, the peer initiates another "connection" for receiving.
  So, is_for_outgoing will also be true in scylla, we can drop it.

- keep_ss_table_level
  In scylla, again, we stream mutations instead of sstable file. It is
  not relevant to us.
2016-01-25 11:38:13 +08:00
Asias He
1bc5cd1b22 streaming: Drop streaming/messages/session_failed_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
2a04e8d70e streaming: Drop streaming/messages/incoming_file_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
26ba21949e streaming: Drop streaming/messages/retry_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
4b4363b62d streaming: Drop streaming/messages/received_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
b3e00472ed streaming: Drop streaming/streaming.cc
It is used in the early stage of development to make sure things compile.
2016-01-25 11:38:13 +08:00
Asias He
5a0bf10a0b streaming: Drop streaming/messages/complete_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
bdd6a69af7 streaming: Drop unused parameters
- int connections_per_host

Scylla does not create connections per stream_session, instead it uses
rpc, thus connections_per_host is not relevant to scylla.

- bool keep_ss_table_level
- int repaired_at

Scylla does not stream sstable files. They are not relevant to scylla.
2016-01-25 11:38:13 +08:00
Asias He
7b633ad127 gossip: Drop unused serialization code
- heart_beat_state
2016-01-25 11:28:29 +08:00
Asias He
4ce08ff251 messaging_service: Add heart_beat_state implementation 2016-01-25 11:28:29 +08:00
Asias He
d7c7994f37 gossip: Drop unused serialization code
- versioned_value
2016-01-25 11:28:29 +08:00
Asias He
8098ba10b7 gossip: Drop unused serialization code
- endpoint_state
2016-01-25 11:28:29 +08:00
Asias He
ecca969adf messaging_service: Add gossip::endpoint_state implementation 2016-01-25 11:28:29 +08:00
Asias He
2a0b6589dd messaging_service: Add versioned_value implementation 2016-01-25 11:28:29 +08:00
Asias He
6660658742 gossip: Drop unused serialization code
- gossip_digest_serialization_helper
- gossip_digest
2016-01-25 11:28:29 +08:00
Asias He
15f2b353b9 messaging_service: Add gossip_digest implementation 2016-01-25 11:28:29 +08:00
Asias He
736d21a912 gossip: Drop unused serialization code
- gossip_digest_syn
- gossip_digest_ack
- gossip_digest_ack2
2016-01-25 11:28:29 +08:00
Asias He
d81fc12af3 messaging_service: Add gossip_digest_ack2 implementation 2016-01-25 11:28:29 +08:00
Asias He
e67cecaee1 messaging_service: Add gossip_digest_syn implementation 2016-01-25 11:28:29 +08:00
Asias He
d94b7e49d2 idl: Add gossip_digest_syn
Added get_partioner and get_cluster_id
2016-01-25 11:28:28 +08:00
Asias He
60f5891c3f idl: Add gossip_digest_ack2 2016-01-25 11:28:26 +08:00
Takuya ASADA
0f0d1c7aed dist: don't depends on libvirtd, since we are not using it
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453470970-31036-1-git-send-email-syuu@scylladb.com>
2016-01-24 17:15:13 +02:00
Avi Kivity
65a140481c Merge " streaming COMPLETE_MESSAGE failure and message retry logic fix" from Asias
"This series:

- Add more debug info to stream session
- Fail session if we fail to send COMPLETE_MESSAGE
- Handle message retry logic for verbs used by streaming

See commit log for details."
2016-01-24 16:41:06 +02:00
Avi Kivity
6135e0ae78 Merge "Move read/write mutation path to use IDL" from Gleb 2016-01-24 13:35:04 +02:00
Avi Kivity
b415f87324 Merge "Serializer Deserializer code generation" from Amnon
"The series do the following:
It adds the code generation
Perform the needed changes in the current classes so each would have getter for
each of its serializable value and a constructor from the serialized values.
It adds a schema definition that cover gossip_diget_ack
It changes the messaging_service to use the generated code.

An overall explanation of the solution with a description of the schema IDL can
be found on the wiki page:

https://github.com/scylladb/scylla/wiki/Serializer-Deserializer-Code-generation
"
2016-01-24 12:56:42 +02:00
Gleb Natapov
b9b6f703c3 Remove old serializer for frozen_mutation and reconcilable_result 2016-01-24 12:45:41 +02:00
Gleb Natapov
067bdb23cd Move reconcilable_result and frozen_mutation to idl 2016-01-24 12:45:41 +02:00
Gleb Natapov
18dff5ebc8 Move smart pointer serialization helpers to .cc file.
They are not used outside of the .cc file, so should not be in the
header.
2016-01-24 12:45:41 +02:00
Gleb Natapov
93da9b2725 Remove redundant vector serialization code.
IDL serializer has the code to serialize vectors, so use it instead.
2016-01-24 12:45:41 +02:00
Gleb Natapov
ab6703f9bc Remove old query::result serializer 2016-01-24 12:45:41 +02:00
Gleb Natapov
afc407c6e5 Move query::result to use idl. 2016-01-24 12:45:41 +02:00
Gleb Natapov
be4e68adbf Add bytes_ostream serializer. 2016-01-24 12:45:41 +02:00
Gleb Natapov
043d132ba9 Remove no longer used serializers. 2016-01-24 12:45:41 +02:00
Gleb Natapov
4ae906b204 Add serializer overload for query::partition_range.
From now on query::partition_range will use generated code.
2016-01-24 12:45:41 +02:00
Gleb Natapov
2d1b2765e6 Add serializer overload for query::read_command.
From now on query::read_command will use generated code.
2016-01-24 12:45:41 +02:00
Gleb Natapov
49ce2b83df Add ring_position constructor needed by serializer. 2016-01-24 12:45:41 +02:00
Gleb Natapov
6cc5b15a9c Fix read_command constructor to not copy parameters. 2016-01-24 12:45:41 +02:00
Gleb Natapov
4384c7fe85 un-nest range::bound class.
Serializer does not support nested classes yet, so move bound outside.
2016-01-24 12:45:41 +02:00
Gleb Natapov
7357b1ddfe Move specific_ranges to .hh and un-nest it.
Serializer requires class to be defined, so it has to be in .h file. It
also does not support nested types yet, so move it outside of containing
class.
2016-01-24 12:45:41 +02:00
Gleb Natapov
9ae7dc70da Prepare partition_slice to be used by serializer.
Add missing _specific_ranges getter and setter.
2016-01-24 12:45:41 +02:00
Gleb Natapov
48ab0bd613 Make constructor from bytes for partition_key and clustering_key_prefix public
Make constructor from bytes public since serializer will use it.
2016-01-24 12:45:41 +02:00
Gleb Natapov
8deb5e424c Add idl files for more types.
Add idl for uuid/range/read_command/token/ring_position/clustering_key_prefix/partition_key.
2016-01-24 12:45:41 +02:00
Gleb Natapov
11299aa3db Add serializers for more basic types.
We will need them in following patches.
2016-01-24 12:45:41 +02:00
Gleb Natapov
a643f3d61f Reorder bool and uint8_t serializers
bool serializer uses uint8_t one so should be defined after it.
2016-01-24 12:45:41 +02:00
Gleb Natapov
cba31eb4f8 cleanup gossip_digest.idl
Remove uuid class, nonexistent application states and add ';'.
2016-01-24 12:45:37 +02:00
Avi Kivity
a3efecb8fe Merge seastar upstream
* seastar 5c2660b...97f418a (4):
  > io_queues: register individual classes with collectd
  > reactor: destroy the I/O queues explicitly
  > rpc_impl: add pragma once
  > rpc: add skip to simple_input_stream
2016-01-24 12:35:24 +02:00
Takuya ASADA
b5029dae7e dist: remove abrt from AMI, since it's not able to work with Scylla
New CentOS Base Image contains abrt by default, so remove it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453416479-28553-2-git-send-email-syuu@scylladb.com>
2016-01-24 12:31:50 +02:00
Amnon Heiman
0006f236a6 Add an IDL definition file
This adds the IDL definition file.
It is also cover in the wiki page:
https://github.com/scylladb/scylla/wiki/Serializer-Deserializer-Code-generation

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:21 +02:00
Amnon Heiman
f266c2ed42 README.md: Add dependency for pyparsing python3
python3 needs to install pyparsing excplicitely. This adds the
installation of python3-pyparsing to the require dependencies in the
README.md

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:21 +02:00
Amnon Heiman
577ce0d231 Adding a sepcific template initialization in messaging_service to use
the serializer

This patch adds a specific template initialization so that the rpc would
use the serializer and deserializer that are auto-generated.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:21 +02:00
Amnon Heiman
b625363072 Adding the serializer decleration and implementation files.
This patch adds the serializer and serializer_imple files. They holds
the functions that are not auto generated: primitives and templates (map
and vector) It also holds the include to the auto-generated code.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:20 +02:00
Amnon Heiman
451cf2692c configure.py Add serializer code generation from schema
This patch adds rules and the idl schema to configure, which will call
the code generation to create the serialization and deserialization
functions.

There is also a rule to create the header file that include the auto
generated header files.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:20 +02:00
Amnon Heiman
0715dcd6ba A schema definition for gossip_digest_ack
This is a definition example for gossip_digest_ack with all its sub
classes.

It can be used by the code generator to create the serializer and
deserializer functions.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:14 +02:00
Amnon Heiman
d27734b9be Add a constructor to inet_address from uint32_t
inet_address uses uint32_t to store the ip address, but its constructor
is int32_t.
So this patch adds such a constructor.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:13:01 +02:00
Amnon Heiman
8a4d211a99 Changes the versioned_value to make serializeble
This patch contains two changes, it make the constructor with parameters
public. And it removes the dependency in messaging_service.hh from the
header file by moving some of the code to the .cc file.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:13:01 +02:00
Amnon Heiman
ddc3fe1328 endpoint_state adds a constructor for all serialized parameters
An external deserialize function needs a constructor with all the
serialized parameters.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:13:01 +02:00
Amnon Heiman
4a34ed82a3 Add code generation for serializer and deserializer
The code generation takes a schema file and create two files from it,
one with a dist.hh extension containing the forward declarations and a second
with dist.impl.hh with the actual implementation.

Because the rpc uses templating for the input and output streams. The
generated functions are templates.

For each class, struct or enum, two functions are created:
serialize - that gets the output buffer as template parameter and
serialize the object to it. There must be a public way to get to each of
the parameters in the class (either a getter or the parameter should be
public)

deserialize - that gets an input buffer, and return the deserialize
object (and by reference the number of char it read).
To create the return object, the class must have a public constructor
with all of its parameters.

The solution description can be found here:
https://github.com/scylladb/scylla/wiki/Serializer-Deserializer-Code-generation

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:12:51 +02:00
Takuya ASADA
aef1e67a9b dist: remove mdadm,xfsprogs from dependencies, install it when constructing RAID by scylla_raid_setup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453422886-26297-2-git-send-email-syuu@scylladb.com>
2016-01-24 12:10:41 +02:00
Takuya ASADA
b92a075a34 main: support supervisor_notify() on Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453422886-26297-1-git-send-email-syuu@scylladb.com>
2016-01-24 12:10:41 +02:00
Asias He
7ac3e835a6 messaging_service: Fix send_message_timeout_and_retry
When a verb timeout, if we resend the message again, the peer could receive
the message more than once. This would confuse the receiver. Currently, only
the streaming code use the retry logic.

- In case of rpc:timeout_error:

Instead of doing timeout in a relatively short time and resending a few
times, we make the timeout big enough and let the tcp to do the resend.
Thus, we can avoid resending the message more than once, of course, the
receiver would not receive the message more than once.

- In case of rpc::closed_error:

There are two cases:
1) Failing to establish a connection.

For instance, the peer is down. It is safe to resend since we know for
sure the receiver hasn't received the message yet.

2) The connection is established.

We can not figure out if the remote peer have received the message
already or not upon receiving the rpc::closed_error exception.

Currently, we still sleep & resend the message again, so the receiver
might receive the message more than once. We do not have better choice
in this case, if we want the resend to recover the sending error due to
temporary network issue, since failing the whole stream_session due to
failing to send a single message is not wise.

NOTE: If the duplicated message is received when the stream_session is done,
it will be ignored since it can not find the stream_manager anymore.
For message like, STREAM_MUTATION, it is ok to receive twice (we apply the
mutation twice).

TODO: For other messages which uses the retry logic, we need
to make sure it is ok to receive more than once.
2016-01-22 08:20:48 +08:00
Asias He
864c7f636c streaming: Fail the session if fails to send COMPLETE_MESSAGE
We will retry sending COMPLETE_MESSAGE, if it fails even with the
retry, there must be something wrong. Abort the stream_session in this
case.
2016-01-22 07:44:21 +08:00
Asias He
9be671e7f5 streaming: Simplify send_complete_message
The send once logic is open coded. Moved it into
send_complete_message(), so we can simplify the caller.
2016-01-22 07:43:39 +08:00
Asias He
88e99e89d6 streaming: Add more debug info
- Add debug for the peer address info
- Add debug in stream_transfer_task and stream_receive_task
- Add debug when cancel the keep_alive timer
- Add debug for has_active_sessions in stream_result_future::maybe_complete
2016-01-22 07:43:16 +08:00
Pekka Enberg
81996bd10b Merge "Improvements to compaction manager" from Raphael 2016-01-21 20:54:49 +02:00
Raphael S. Carvalho
bb909798bc compaction_manager: introduce can_submit
Purpose is to reuse code and also make it easier to read.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:42:23 -02:00
Raphael S. Carvalho
653a07d75d compaction_manager: introduce signal_less_busy_task
Purpose is to reuse code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:31:44 -02:00
Raphael S. Carvalho
2164aa8d5b move compaction manager from /utils to /sstables
Compaction manager was initially created at utils because it was
more generic, and wasn't only intended for compaction.
It was more like a task handler based on futures, but now it's
only intended to manage compaction tasks, and thus should be
moved elsewhere. /sstables is where compaction code is located.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:23:05 -02:00
Pekka Enberg
b5833e8002 Merge "Enable incremental backups option" from Vlad
"This series moves the "backup" logic into the sstable::write_components()
methods, adds a support for enabling backup for sstables flushed in the
compaction flow (in addition to a regular flushing flow which had this support
already) and enables the "incremental_backups" configuration option."

I fixed up a merge conflict with commit 5e953b5 ("Merge "Add support to
stop ongoing compaction" from Raphael").
2016-01-21 18:52:07 +02:00
Pekka Enberg
5e953b5e47 Merge "Add support to stop ongoing compaction" from Raphael
"stop compaction is about temporarily interrupting all ongoing compaction
 of a given type.
 That will also be needed for 'nodetool stop <compaction_type>'.

 The test was about starting scylla, stressing it, stopping compaction using
 the API and checking that scylla was able to recover.

 Scylla will print a message as follow for each compaction that was stopped:
 ERROR [shard 0] compaction_manager - compaction failed: read exception:
 std::runtime_error (Compaction for keyspace1/standard1 was deliberately stopped.)
 INFO  [shard 0] compaction_manager - compaction task handler sleeping for 20 seconds"
2016-01-21 18:34:10 +02:00
Takuya ASADA
fae47ee4a8 dist: fetch CentOS dependencies from koji, update them to latest version
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453378765-19596-1-git-send-email-syuu@scylladb.com>
2016-01-21 15:04:45 +02:00
Asias He
755d792c78 gossip: Wait for gossip timer callback to finish in do_stop_gossiping
Also do not rearm the timer if we stopped the gossip.

Message-Id: <73765857b554d9914e87b24d287ff35ab0af6fce.1453378191.git.asias@scylladb.com>
2016-01-21 14:15:57 +02:00
Vlad Zolotarov
e3d7db5e57 ec2_snitch: complete the EC2Snitch -> Ec2Snitch renaming
The rename started in 72b27a91fe
was not complete. This patch fixes the places that were missed
in the above patch.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1453375025-7512-3-git-send-email-vladz@cloudius-systems.com>
2016-01-21 13:35:30 +02:00
Vlad Zolotarov
9951edde1a locator::ec2_multi_region_snitch: add a get_name() implementation
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1453375025-7512-2-git-send-email-vladz@cloudius-systems.com>
2016-01-21 13:35:29 +02:00
Avi Kivity
43c81db74e Update ami submodule
* dist/ami/files/scylla-ami eb1fdd4...188781c (1):
  > Switch SimpleSnitch to Ec2Snitch
2016-01-21 13:13:23 +02:00
Vlad Zolotarov
de3bb01582 config: allow enabling the incremental backup via .yaml
Enable the incremental_backups/--incremental-backups option.
When enabled there will be a hard link created in the
<column family directory>/backup directory for every flushed
sstable.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:13:24 +02:00
Vlad Zolotarov
c2ab54e9c7 sstables flushing: enable incremental backup (if requested)
Enable incremental backup when sstables are flushed if
incremental backup has been requested.

It has been enabled in the regular flushing flow before but
wasn't in the compaction flow.

This patch enables it in both places and does it using a
backup capability of sstable::write_components() method(s).

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:13:20 +02:00
Vlad Zolotarov
cb5c66f264 sstable::write_components(): add a 'backup' parameter
When 'backup' parameter is TRUE - create backup hard
links for a newly written sstables in <sstable dir>/backups/
subdirectory.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:04:45 +02:00
Amnon Heiman
e33710d2ca API: storage_service get_logging_level
This patch adds the get_loggin_level command that returns a map between
the log name and its level.
To test the API do:
curl -X GET "http://localhost:10000/storage_service/logging_level"

this would enable the `nodetool getlogginglevels` command.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453365106-27294-3-git-send-email-amnon@scylladb.com>
2016-01-21 11:58:54 +02:00
Amnon Heiman
ba80121e49 migration_task: rename logger name
Logger name should not contain a space, it causes issues when trying to
modify their level from the nodetool.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453365106-27294-2-git-send-email-amnon@scylladb.com>
2016-01-21 11:58:42 +02:00
Calle Wilund
980681d28e auth: Add a simplistic "schedule" for auth db setup
Only difference from previous sleep is that we will
explicitly delete the objects if the process terminates
before tasks are run. I.e. make ASas happier.

Message-Id: <1453295521-29580-1-git-send-email-calle@scylladb.com>
2016-01-20 19:31:14 +02:00
Raphael S. Carvalho
f001bb0f53 sstables: fix make_checksummed_file_output_stream
Arguments buffer_size and true were accidently inverted.
GCC wasn't complaning because implicit conversion of bool to
int, and vice-versa, is valid.
However, this conversion is not very safe because we could
accidentaly invert parameters.

This should fix the last problem with sstable_test.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <9478cd266006fdf8a7bd806f1c612ec9d1297c1f.1453301866.git.raphaelsc@scylladb.com>
2016-01-20 16:01:38 +01:00
Calle Wilund
07f992e42a Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-20 13:31:33 +00:00
Calle Wilund
63b17be4f0 auth_test: Modify yet another case to use "normal" continuation.
test_cassandra_hash also sort of expects exceptions. ASas causes false
positives here as well with seastar::thread, so do it with normal cont.
Message-Id: <1453295521-29580-2-git-send-email-calle@scylladb.com>
2016-01-20 15:15:45 +02:00
Takuya ASADA
2eb12681b0 dist: add 'scylla-gdb' package for CentOS
Fixes #831

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453291407-12232-1-git-send-email-syuu@scylladb.com>
2016-01-20 15:12:33 +02:00
Avi Kivity
7bc3e6ffd0 Merge seastar upstream
* seastar 0516ed0...5c2660b (4):
  > reactor: block all signals early
  > reactor: replace sigprocmask() with pthread_sigmask()
  > fstream: remove unused interface
  > foreign_ptr: remove make_local_and_release()

Fixes #601.
2016-01-20 14:59:52 +02:00
Asias He
1c2d95f2b0 streaming: Remove unused verb handlers
They are never used in scylla.
Message-Id: <1453283955-23691-2-git-send-email-asias@scylladb.com>
2016-01-20 13:58:59 +02:00
Asias He
767e25a686 streaming: Remove the _handlers helper
It is introduced to help to run the invoke_on_all, we can reuse the
distributed<database> db for it.
Message-Id: <1453283955-23691-1-git-send-email-asias@scylladb.com>
2016-01-20 13:58:44 +02:00
Paweł Dziepak
33892943d9 sstables: do not drop row marker when reading mutation
Since 581271a243 "sstables: ignore data
belonging to dropped columns" we silently drop cells if there is no
column in the current schema that they belong to or their timestamp is
older than the column dropped_at value. Originally this check was
applied to row markers as well which caused them to be always dropped
since there is no column in the schema representing these markers.
This patch makes sure that the check whether colum is alive is performed
only if the cell is not a row marker.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1453289300-28607-1-git-send-email-pdziepak@scylladb.com>
2016-01-20 12:35:41 +01:00
Calle Wilund
9197a886f8 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-20 09:44:38 +00:00
Takuya ASADA
79b218eb1c dist: use our own CentOS7 Base image
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453241256-23338-4-git-send-email-syuu@scylladb.com>
2016-01-20 09:40:56 +02:00
Takuya ASADA
b9cb91e934 dist: stop ntpd before running ntpdate
New CentOS Base Image runs ntpd by default, so shutdown it before running ntpdate.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453241256-23338-3-git-send-email-syuu@scylladb.com>
2016-01-20 09:40:35 +02:00
Takuya ASADA
98e61a93ef dist: disable SELinux only when it enabled
New CentOS7 Base Image disabled SELinux by default, and running 'setenforce 0' on the image causes error, we won't able to build AMI.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453241256-23338-2-git-send-email-syuu@scylladb.com>
2016-01-20 09:40:01 +02:00
Raphael S. Carvalho
c318f3baa3 sstables: fix sstable::data_stream_at
After 63967db8, offset is ignored when creating a input stream.
Found the problem after sstable_test failed recently.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <56ece21ff6e043e224eb2a6e76cdd422b94821b0.1453232689.git.raphaelsc@scylladb.com>
2016-01-20 09:35:57 +02:00
Raphael S. Carvalho
ff9b1694fe api: implement stop_compaction
stop_compaction is implemented by calling stop_compaction() of
compaction manager for each database.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Raphael S. Carvalho
5cceb7d249 api: fix paramType of parameter of stop_compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Raphael S. Carvalho
3bd240d9e8 compaction: add ability to stop an ongoing compaction
That's needed for nodetool stop, which is called to stop all ongoing
compaction. The implementation is about informing an ongoing compaction
that it was asked to stop, so the compaction itself will trigger an
exception. Compaction manager will catch this exception and re-schedule
the compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Raphael S. Carvalho
ec4c73d451 compaction: rename compaction_stats to compaction_info
compaction_info makes more sense because this structure doesn't
only store stats about ongoing compaction. Soon, we will add
information to it about whether or not an user asked to stop the
respective ongoing compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Tomasz Grabiec
bd34adcf22 tests: memory_footprint: Show canonnical_mutation size
Message-Id: <1453227147-21918-1-git-send-email-tgrabiec@scylladb.com>
2016-01-19 20:22:59 +02:00
Tomasz Grabiec
b8c3fa4d46 cql3: Print only column name in error message
Printing column_definition prints all fields of the struct, we want
only name here.
Message-Id: <1453207531-16589-1-git-send-email-tgrabiec@scylladb.com>
2016-01-19 20:22:37 +02:00
Tomasz Grabiec
0596455dc2 Merge branch 'pdziepak/date-timestamp-fixes/v2'
From Paweł:

These patches contain fixes for date and timestamp types:
 - date and timestamp are considered compatible types
 - date type is added to abstract_type::parse_type()
2016-01-19 18:35:09 +01:00
Glauber Costa
63967db8bf sstables: always use a file_*_stream_options in our readers and writes
Instead of using the APIs that explicitly pass things like buffer_size,
always use the options instance instead.

This will make it easier to pass extra options in the future.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <5b04e60ab469c319a17a522694e5bedf806702fe.1453219530.git.glauber@scylladb.com>
2016-01-19 18:26:37 +02:00
Glauber Costa
c3ac5257b5 sstables: don't repeat file_writer creation all the time
When this code was originally written, we used to operate on a generic
output_stream. We created a file output stream, and then moved it into
the generic object.

Many patches and reworks later, we now have a file_writer object, but
that pattern was never reworked.

So in a couple of places we have something like this:

    f = file_object acquired by open_file_dma
    auto out = file_writer(std::move(f), 4096);
    auto w = make_shared<file_writer>(std::move(out));

The last statement is just totally redundant. make_shared can create
an object from its parameters without trouble, so we can just pass
the parameter list directly to it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <c01801a1fdf37f8ea9a3e5c52cd424e35ba0a80d.1453219530.git.glauber@scylladb.com>
2016-01-19 18:26:36 +02:00
Calle Wilund
59bf54d59a commitlog_replayer: Modify logging to more match origin
* Match origin log messages
  - Demote per-file printouts to "debug" level.
* Print an all-files stat summary for whole replay (begin/summary)
  - At info level, like origin

Prompted by dtest that expects origin log output.

Message-Id: <1453216558-18359-1-git-send-email-calle@scylladb.com>
2016-01-19 17:19:52 +02:00
Avi Kivity
07e0f0a31f Merge "Support schema changes in batchlog manager" from Tomasz
"We need to be able to replay mutations created using older versions of
the table's schema. frozen_mutation can be only read using the version
it was serialized with, and there is no guarantee that the node will
know this version at the time of replay. Currently versions are kept
in-memory so a node forgets all past versions when it restarts. This
was not implemented yet, replay would fail with exception if the
version is unknown."
2016-01-19 17:17:47 +02:00
Calle Wilund
3f4c8d9eea commitlog_replayer: Modify logging to more match origin
* Match origin log messages
  - Demote per-file printouts to "debug" level.
* Print an all-files stat summary for whole replay (begin/summary)
  - At info level, like origin

Prompted by dtest that expects origin log output.

v2:
* Fixed broken + operator
* Use map_reduce instead of easily readable code
2016-01-19 15:14:21 +00:00
Paweł Dziepak
db30ac8d2d tests/types: add test for timestamp and date compatibility
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 15:34:45 +01:00
Avi Kivity
a2953833dc Merge seastar upstream
* seastar e93cd9d...0516ed0 (9):
  > http: use default file input stream options in file_handler
  > linecount: use default file input stream options
  > fstream: do not pass offset as part options member
  > net: move posix network stack registration to reactor.cc
  > net: throw a human-readable error if using an unregistered network stack
  > io_queue: remove pending_io counter
  > Revert "Merge "Improve rpc server-side statistics""
  > tests: corrections regarding Boost.Test 1.59 compilation failures
  > Merge "Improve rpc server-side statistics"
2016-01-19 16:33:35 +02:00
Calle Wilund
1b4b7aeb66 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-19 13:51:00 +00:00
Paweł Dziepak
900f5338e7 types: make timestamp_type and date_type compatible
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 14:03:15 +01:00
Tomasz Grabiec
ec12b75426 batchlog_manager: Store canonical_mutations
We need to be able to replay mutations created using older versions of
the table's schema. frozen_mutation can be only read using the version
it was serialized with, and there is no guarantee that the node will
know this version at the time of replay. Currently versions kept
in-memory so a node forgets all past versions when it restarts.

To solve this, let's store canonical_mutations which, like data in
sstables, can be read using any later schema version of given table.
2016-01-19 13:46:28 +01:00
Tomasz Grabiec
e21049328f batchlog_manager: Add more debug logging 2016-01-19 13:46:28 +01:00
Tomasz Grabiec
608b606434 canonical_mutation: Introduce column_family_id() getter 2016-01-19 13:46:28 +01:00
Tomasz Grabiec
06d1f4b584 database: Print table name when printing mutation 2016-01-19 13:46:28 +01:00
Tomasz Grabiec
52073d619c database: Add trace-level logging of applied mutations 2016-01-19 13:46:28 +01:00
Paweł Dziepak
a6171d3e99 types: add date type to parse_type()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 13:43:36 +01:00
Paweł Dziepak
f77ab67809 types: use correct name for date_type
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 13:42:53 +01:00
Tomasz Grabiec
d7cb88e0af Merge branch 'pdziepak/fixes-for-alter-table/v1'
From Paweł:

"This series contains some more fixes for issues related to alter table,
namely: incorrect parsing of collection information in comparator, missing
schema::_raw._collections in equality check, missing compatibility
information for utf8->blob, ascii->blob and ascii->utf8 casts."
2016-01-19 13:22:10 +01:00
Calle Wilund
de9f9308a5 auth_test: workaround ASan false error
test_password_authenticator_operations causes ASan failures, in a way
that I am 99% sure is fully false positive, caused by a combo of
seastar threads, exception throwing and externals.

In lieu of actually identifying what ASan flaw causes this and
potentially cure it, for now, lets just re-write the test in question
to not use seastar::async, but normal continuation. Less easy to read,
but passes ASan.
Message-Id: <1453205136-10308-1-git-send-email-calle@scylladb.com>
2016-01-19 13:11:20 +01:00
Calle Wilund
79a5f7b19d auth_test: workaround ASan false error
test_password_authenticator_operations causes ASan failures, in a way
that I am 99% sure is fully false positive, caused by a combo of
seastar threads, exception throwing and externals.

In lieu of actually identifying what ASan flaw causes this and
potentially cure it, for now, lets just re-write the test in question
to not use seastar::async, but normal continuation. Less easy to read,
but passes ASan.
2016-01-19 12:02:50 +00:00
Raphael S. Carvalho
0c67b1d22b compaction: filter out mutation that doesn't belong to shard
When compacting sstable, mutation that doesn't belong to current shard
should be filtered out. Otherwise, mutation would be duplicated in
all shards that share the sstable being compacted.
sstable_test will now run with -c1 because arbitrary keys are chosen
for sstables to be compacted, so test could fail because of mutations
being filtered out.

fixes #527.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1acc2e8b9c66fb9c0c601b05e3ae4353e514ead5.1453140657.git.raphaelsc@scylladb.com>
2016-01-19 10:16:41 +01:00
Vlad Zolotarov
922eb218b1 locator::reconnectable_snitch_helper: don't check messaging_service version
Don't demand the messaging_service version to be the same on both
sides of the connection in order to use internal addresses.

Upstream has a similar change for CASSANDRA-6702 in commit a7cae32 ("Fix
ReconnectableSnitch reconnecting to peers during upgrade").

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1452686729-32629-1-git-send-email-vladz@cloudius-systems.com>
2016-01-19 11:04:37 +02:00
Paweł Dziepak
7c9708953e tests/cql3: add tests for ALTER TABLE with multiple collections
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:24 +01:00
Paweł Dziepak
e249d4eab5 tests/type: add test for simple type compatibility
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:20 +01:00
Paweł Dziepak
440b6d058e types: fix compatibility for text types
bytes_type is_compatible_with utf8_type and ascii_type
utf8_type is_compatible_with ascii_type

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:16 +01:00
Paweł Dziepak
17ca7e06f3 schema: print collection info
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:12 +01:00
Paweł Dziepak
2e2de35dfb schema: add _raw._collections check to operator==()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:08 +01:00
Paweł Dziepak
92dc95b73b schema: fix comparator parsing
The correct format of collection information in comparator is:

o.a.c.db.m.ColumnToCollection(<name1>:<type1>, <name2>:<type2>, ...)

not:

o.a.c.db.m.ColumnToCollection(<name1>:<type1>),
o.a.c.db.m.ColumnToCollection(<name2>:<type2>) ...

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:05 +01:00
Amnon Heiman
9be42bfd7b API: Add version to application state in failure_detection
The upstream of origin adds the version to the application_state in the
get_endpoints in the failure detector.

In our implementation we return an object to the jmx proxy and the proxy
do the string formatting.

This patch adds the version to the return object which is both useful as
an API and will allow the jmx proxy to add it to its output when we move
forward with the jmx version.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1448962889-19611-1-git-send-email-amnon@scylladb.com>
2016-01-19 10:23:56 +02:00
Tomasz Grabiec
5a1587353f tests: Don't depend on partition_key representation
Representation format is an implementation detail of
partition_key. Code which compares a value to representation makes
assumptions about key's representation. Compare keys to keys instead.
Message-Id: <1453136316-18125-1-git-send-email-tgrabiec@scylladb.com>
2016-01-18 19:01:56 +02:00
Pekka Enberg
2ca8606b4e streaming/stream_session: Don't stop stream manager
We cannot stop the stream manager because it's accessible via the API
server during shutdown, for example, which can cause a SIGSEGV.

Spotted by ASan.
Message-Id: <1453130811-22540-1-git-send-email-penberg@scylladb.com>
2016-01-18 16:34:19 +01:00
Pekka Enberg
422cff5e00 api/messaging_service: Fix heap-buffer-overflows in set_messaging_service()
Fix various issues in set_messaging_service() that caused
heap-buffer-overflows when JMX proxy connects to Scylla API:

  - Off-by-one error in 'num_verb' definition

  - Call to initializer list std::vector constructor variant that caused
    the vector to be two elements long.

  - Missing verb definitions from the Swagger definition that caused
    response vector to be too small.

Spotted by ASan.
Message-Id: <1453125439-16703-1-git-send-email-penberg@scylladb.com>
2016-01-18 15:43:29 +01:00
Pekka Enberg
3723beb302 service/storage_service: Fix typos in logger messages
Message-Id: <1453128076-18613-1-git-send-email-penberg@scylladb.com>
2016-01-18 15:43:04 +01:00
Takuya ASADA
d5d5857b62 dist: extend coredump size limit
16GB is not enough for some larger machines, so extend it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453115792-21989-2-git-send-email-syuu@scylladb.com>
2016-01-18 13:38:43 +02:00
Takuya ASADA
023c6dc620 dist: preserve environment variable when running scylla_prepare on sudo
sysconfig parameters are passed via environment variables, but sudo resets it by default.
Need to keep them beyond sudo.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453115792-21989-1-git-send-email-syuu@scylladb.com>
2016-01-18 13:22:14 +02:00
Gleb Natapov
dde2e80a20 storage_proxy: remove batchlog synchronously
Wait for batchlog removal before completing a query otherwise batchlog
removal queries may accumulate. Still ignore an error if it happens
since it is not critical, but log it.

Message-Id: <20160118095642.GB6705@scylladb.com>
2016-01-18 12:38:12 +02:00
Avi Kivity
221ef4536c messaging service: limit rpc server resources
Otherwise, a slow node can be overwhelmed by other nodes and run out of
memory.

Fixes #596.
Message-Id: <1452776394-13682-1-git-send-email-avi@scylladb.com>
2016-01-18 11:16:45 +02:00
Avi Kivity
a881e596fa Merge "Ubuntu dependency packages fix" from Takuya 2016-01-18 11:13:18 +02:00
Gleb Natapov
f97eed0c94 fix batch size checking
warn_threshold is in kbytes v.size is in bytes size is in kbytes.

Message-Id: <20160118090620.GZ6705@scylladb.com>
2016-01-18 11:08:13 +02:00
Avi Kivity
5313a28044 Merge "Fix re-addinig collections" from Paweł
"This series makes sure that Scylla rejects adding a collections if
its column name is the same as a collection that existed before and
their types are incompatible.

Fixes #782"
2016-01-18 10:58:40 +02:00
Tomasz Grabiec
237819c31f logalloc: Excluded zones' free segments in lsa/byres-non_lsa_used_space
Historically the purpose of the metric is to show how much memory is
in standard allocations. After zones were introduced, this would also
include free space in lsa zones, which is almost all memory, and thus
the metric lost its original meaning. This change brings it back to
its original meaning.

Message-Id: <1452865125-4033-1-git-send-email-tgrabiec@scylladb.com>
2016-01-18 10:48:14 +02:00
Takuya ASADA
5270da1eef dist: prevent use abrt with scylla
scylla should use with systemd-coredump, not abrt.
Fixes #762

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452713081-32492-1-git-send-email-syuu@scylladb.com>
2016-01-18 10:47:11 +02:00
Pekka Enberg
7d3a3bd201 Merge "column family cleanup support" from Raphael
"This patch is intended to add support to column family cleanup, which will
 make 'nodetool cleanup' possible.

 Why is this feature needed? Remove irrelevant data from a node that loses part
 of its token range to a newly added node."
2016-01-18 10:15:05 +02:00
Pekka Enberg
6cc02242f6 Merge "Multi schema support in commit log" from Paweł
"This series adds support for multiple schema versions to the commit log.
 All segments contain column mappings of all schema versions used by the
 mutations contained in the segment, which are necessary in order to be
 able to read frozen mutations and upgrade them to the current schema
 version."
2016-01-18 10:11:26 +02:00
Avi Kivity
d5050e4c6a storage_proxy: make MUTATION and MUTATION_DONE verbs sychronous at the server side
While MUTATION and MUTATION_DONE are asynchronous by nature (when a MUTATION
completes, it sends a MUTATION_DONE message instead of responding
synchronously), we still want them to be synchronous at the server side
wrt. the RPC server itself.  This is because RPC accounts for resources
consumed by the handler only while the handler is executing; if we return
immediately, and let the code execute asynchronously, RPC believes no
resources are consumed and can instantiate more handlers than the shard
has resources for.

Fix by changing the return type of the handlers to future<no_wait_type>
(from a plain no_wait_type), and making that future complete when local
processing is over.

Ref #596.
Message-Id: <1453048967-5286-1-git-send-email-avi@scylladb.com>
2016-01-18 09:59:34 +02:00
Nadav Har'El
d97cbbbe43 repair: forbid repair with "-dc" not including the current host
Theoretically, one could want to repair a single host *and* all the hosts
in one or more other data centers which don't include this host. However,
Cassandra's "nodetool repair" explicitly does not allow this, and fails if
given a list of data centers (via the "-dc" option) which doesn't include
the host starting the repair. So we need to behave like "nodetool repair"
and fail in this case too.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1453037016-25775-1-git-send-email-nyh@scylladb.com>
2016-01-18 09:54:16 +02:00
Paweł Dziepak
fa7bef72d4 tests/cql3: add tests for ALTER TABLE validation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:50 +01:00
Paweł Dziepak
b7e58db7ec tests: allow any future in assert_that_failed()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:44 +01:00
Paweł Dziepak
00f7a873a5 cql3: forbid re-adding collection with incompatible type
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:38 +01:00
Paweł Dziepak
4927ff95da schema: read collections from comparator
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:33 +01:00
Paweł Dziepak
725129deb7 type_parser: accept sstring_view
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:27 +01:00
Paweł Dziepak
6372a22064 schema: use _raw._collections to generate comparator name
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:03 +01:00
Paweł Dziepak
84840c1c98 schema: keep track of removed collections
Cassandra disallows adding a column with the same name as a collection
that existed in the past in that table if the types aren't compatible.
To enforce that Scylla needs to keep track of all collections that ever
existed in the column family.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:34:29 +01:00
Avi Kivity
c4cf4e0bcd Merge seastar upstream
* seastar a8183c1...e93cd9d (2):
  > rpc: make sure we serialize on _resources_available sempahore
  > rpc: fix support for handlers returning future<no_wait_type>
2016-01-17 18:36:22 +02:00
Avi Kivity
249dbc1d8e Merge seastar upstream
* seastar 6f9453d...a8183c1 (2):
  > rpc: fix server losing handler
  > Merge "Fair I/O Queue" from Glauber
2016-01-17 14:21:53 +02:00
Takuya ASADA
01309c0dd8 dist: add missing dependency (xfslibs-dev) for Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-16 15:39:09 +09:00
Takuya ASADA
9ad3365353 dist: use gdebi to resolve install-time dependencies
Since we switched to use mk-build-deps, it only resolves build-time dependencies.
We also need to install install-time dependencies.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-16 15:39:09 +09:00
Takuya ASADA
705285cf27 dist: resolve build time dependency by mk-build-deps command, do not install them manually
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-16 15:39:09 +09:00
Takuya ASADA
90be81f9ba dist: add missing build time dependency for thrift package on Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-16 15:39:09 +09:00
Tomasz Grabiec
d332fcaefc row_cache: Restore indentation 2016-01-15 15:33:17 +01:00
Tomasz Grabiec
6b3cd35109 Merge branch 'pdziepak/multi-schema-sstables/v1'
From Paweł:

This series add support for reading sstables using different schema than
the one that was used to write them.
2016-01-15 14:23:18 +01:00
Paweł Dziepak
dbf23fdff5 tests/sstable: add test for multi schema
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-15 13:12:40 +01:00
Paweł Dziepak
cfc0a132a9 sstable: handle multi-cell vs atomic incompatibilities
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-15 13:12:40 +01:00
Paweł Dziepak
581271a243 sstables: ignore data belonging to dropped columns
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-15 13:12:40 +01:00
Asias He
e10580f474 cql_server: Fix connection shutdown
_fd is of type connected_socket. shutdown_input() and shutdown_output()
return future<>. Do not ignore the future.

Message-Id: <786eee890541a18d3501ecd52415f2900c545157.1452835922.git.asias@scylladb.com>
2016-01-15 11:37:30 +02:00
Tomasz Grabiec
ccd609185f sstables: Add ability to wait for async sstable cleanup tasks
This patch adds a function which waits for the background cleanup work
which is started from sstable destructors.

We wait for those cleanups on reactor exit so that unit tests don't
leak. This fixes erratic ASAN complaint about memory leak when running
schema_change_test in debug mode:

    Indirect leak of 64 byte(s) in 1 object(s) allocated from:
         0x7fab24413912 in operator new(unsigned long) (/lib64/libasan.so.2+0x99912)
         0x1776aeb in make_unique<continuation<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> >, future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /usr/include/c++/5.1.1/bits/unique_ptr.h:765
         0x1752b69 in schedule<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:513
        0x1711365 in schedule<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:690
        0x16d0474 in then_wrapped<future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>, future<> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:880
        0x1696e9c in handle_exception<sstables::sstable::~sstable()::<lambda(auto:52)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:1012
        0x1638ba8 in sstables::sstable::~sstable() sstables/sstables.cc:1619

The leak is about allocations related to close() syscall tasks invoked
from sstable destructor, which were not waited for.

Message-Id: <1452783887-25244-1-git-send-email-tgrabiec@scylladb.com>
2016-01-15 11:32:15 +02:00
Calle Wilund
e935c9cd34 select_statement: Make sure all aggregate queries use paging
Mainly to make sure we respect row limits. Since normal result
generation does not for aggregates.

Fixes #752 

Message-Id: <1452681048-30171-2-git-send-email-calle@scylladb.com>
2016-01-14 19:03:37 +02:00
Calle Wilund
1dc5937f40 query_pagers: fix log message in requires_paging
message would state that all queries required paging, even when
returning the opposite to caller

Message-Id: <1452681048-30171-1-git-send-email-calle@scylladb.com>
2016-01-14 19:03:16 +02:00
Asias He
cc3073b42d gossip: cleanup application_state
Drop the unused one.

Message-Id: <4cc45164d55742951b618d2c7b1e8bdb997f005a.1452771260.git.asias@scylladb.com>
2016-01-14 19:01:51 +02:00
Avi Kivity
d47a58cc32 README: add libxml2 and libpciaccess packages to list or required packages
Needed for link stage.
2016-01-14 17:47:48 +02:00
Avi Kivity
cf7e6cede2 README: add hwloc and numactl to install recommendations 2016-01-14 17:30:43 +02:00
Tomasz Grabiec
b7976f3b82 config: Set default logging level to info
Commit d7b403db1f changed the default in
logging::logger. It affected tests but not scylla binary, where it's
being overwritten in main.cc.
Message-Id: <1452777008-21708-1-git-send-email-tgrabiec@scylladb.com>
2016-01-14 15:11:58 +02:00
Pekka Enberg
9306f4eb22 Merge "Disable ALTER TABLE statement unless --experimental=on" from Tomek 2016-01-14 14:30:20 +02:00
Avi Kivity
cf8ab65fbc Merge seastar upstream
* seastar 43e64c2...6f9453d (2):
  > Merge "rpc resource accounting"
  > core: Introduce smp::invoke_on_all()
2016-01-14 14:28:27 +02:00
Asias He
826b6ed877 gossip: Print node status in handle_major_state_change
Message-Id: <1452768680-32355-1-git-send-email-asias@scylladb.com>
2016-01-14 14:22:37 +02:00
Asias He
e7a899f5f3 gossip: Enable debug msg for convcit
Kill one FIXME in convict

Message-Id: <1452768680-32355-2-git-send-email-asias@scylladb.com>
2016-01-14 14:22:36 +02:00
Tomasz Grabiec
054f1df0a5 cql3: Disable ALTER TABLE unless experimental features are on 2016-01-14 13:21:13 +01:00
Tomasz Grabiec
1fd03ea1d2 tests: cql_test_env: Enable experimental features 2016-01-14 13:21:13 +01:00
Tomasz Grabiec
a13aaa62df config: Add 'experimental' switch 2016-01-14 13:21:13 +01:00
Gleb Natapov
647a09cd7b storage_proxy: improve mutation timeout logging
Message-Id: <20160114105359.GY6705@scylladb.com>
2016-01-14 12:00:35 +01:00
Pekka Enberg
733584c44d main: Start the API service as the last step
This reverts commit f0d68e4 ("main: start the http server in the first
step"). The service layer is not ready to serve clients before it's
fully up and running which causes early startup crashes everywhere.
Message-Id: <1452768015-22763-1-git-send-email-penberg@scylladb.com>
2016-01-14 12:55:50 +02:00
Tomasz Grabiec
1daaf909d7 Merge branch 'tgrabiec/row_cache_invalidate_fix'
Fixes for wrap-around range handling in row_cache.
2016-01-14 11:38:26 +01:00
Takuya ASADA
7479cde28b dist: extend root disk size to 10GB
Since default root disk size is too small for our purpose, it's better to extend.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452762325-5620-1-git-send-email-syuu@scylladb.com>
2016-01-14 11:29:00 +02:00
Pekka Enberg
90123197e1 service/client_state: Use anonymous user when authentication is disabled
If authentication is disabled, nobody calls login() to set the current
user. There's untranslated code in client_state constructor to do just
that.

Fixes "You have not logged in" errors when USE statement is executed
with authentication disabled.
Message-Id: <1452759946-13998-1-git-send-email-penberg@scylladb.com>
2016-01-14 09:29:33 +01:00
Avi Kivity
4143cf6385 Merge "Initial authenticator support" from Calle
"Add implementation of cassandra password authenticator, and user
password checking to CQL connections.

User/pwd are stored in system_auth table. Passwords are hashed
using glibc 'crypt_r'.

The latter is worth noting, as this is a difference compared to origin;
Origin uses Java bcrypt library for salt/hash, i.e. blowfish hashing.
Most glibc variants do _not_ have support for blowfish. To be 100%
compatible with imported origin tables we might need to add
bcrypt/blowfish sources into scylla (no packaged libs available afaict)

The code currently first attempts to use blowfish, if we happen to run
centos or Openwall, which has it compiled in. Otherwise we will fall
back to sha512, sha256 or even md5 depending on lib support.

To use:
* scylla.conf: authenticator=PasswordAuthenticator
* cqlsh -u cassandra -p cassandra

Not implemented (yet):
* "Authorizer", thus no KS/CF access checking
* CQL create/alter/delete user (create_user_statement etc). I.e. there is
  only a single user name; default "cassandra:cassandra" user/pwd combo"
2016-01-13 19:13:05 +02:00
Takuya ASADA
0511b02f90 dist: run scylla_prepare, scylla_stop on sudo
Since we changed uid on scylla-server.service to scylla, we need sudo for these scripts.

Fixes #783

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452704598-5292-1-git-send-email-syuu@scylladb.com>
2016-01-13 19:06:33 +02:00
Tomasz Grabiec
6b059fd828 row_cache: Guard against wrap-around range in make_reader() 2016-01-13 17:50:55 +01:00
Tomasz Grabiec
7fb0bc4e15 row_cache: Take the reclaim lock in invalidate()
It's needed to keep the iterators valid in case eviciton is triggered
somehwere in between. It probably isn't because destructors should not
allocate, but better be safe.
2016-01-13 17:50:55 +01:00
Tomasz Grabiec
5e05f63ee7 tests: Add more tests for row_cache::invalidate()
Regs #785.
2016-01-13 17:50:55 +01:00
Tomasz Grabiec
50cc0c162e row_cache: Make invalidate() handle wrap-around ranges
Currently for wrap around the "begin" iterator would not meet with the
"end" iterator, invoking undefined behavior in erase_and_dispose()
which results in a crash.

Fixes #785
2016-01-13 17:50:55 +01:00
Calle Wilund
8192384338 auth_test: Unit tests for auth objects 2016-01-13 15:37:39 +00:00
Calle Wilund
9e3295bc69 cql_test_env: Allow specifying db::config for the env 2016-01-13 15:35:37 +00:00
Calle Wilund
9ef05993ff config: Mark "authenticator" used + update description 2016-01-13 15:35:36 +00:00
Calle Wilund
1d811f1e8f transport::server: Add authentication support
If system autheticator object requires authentication, issue
a challenge to client, and process response.
2016-01-13 15:35:36 +00:00
Calle Wilund
1c30d37285 client_state: Add user object + login
Note: all actual authorization methods are still unimplemented.
2016-01-13 15:35:36 +00:00
Calle Wilund
4692f46b8d storage_service: Initialize auth system on start 2016-01-13 15:35:36 +00:00
Calle Wilund
9a4d45e19d auth::auth/authenticator: user storage and authentication
User db storage + login/pwd db using system tables.

Authenticator object is a global shard-shared singleton, assumed
to be completely immutable, thus safe.
Actual login authentication is done via locally created stateful object
(sasl challenge), that queries db.

Uses "crypt_r" for password hashing, vs. origins use of bcrypt.
Main reason is that bcrypt does not exist as any consistent package
that can be consumed, so to guarantee full compatibility we'd have
to include the source. Not hard, but at least initially more work than
worth.
2016-01-13 15:35:35 +00:00
Calle Wilund
00de63c920 cql3::query_processor: Add processing helpers for internal usage
syntactical sugar + "process" for internal, similar to 
execute_internal, but allowing querying the whole cluster, and optional
statement caching.
2016-01-13 15:35:21 +00:00
Calle Wilund
6a5f075107 batch_statement: Modify verify_batch_size to match current origin
Fixes #614

* Use warning threshold from config
* Don't throw exceptions. We're only supposed to warn.
* Try to actually estimate mutation data payload size, not
  number of mutations.
Message-Id: <1452615759-23213-1-git-send-email-calle@scylladb.com>
2016-01-13 12:26:49 +01:00
Paweł Dziepak
218898b297 commitlog: upgrade mutations during commitlog replay
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:50:26 +01:00
Paweł Dziepak
661849dbc3 commitlog: learn about schema versions during replay
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:50:23 +01:00
Paweł Dziepak
55d342181a commitlog: do not skip entries inside a chunk
All entries inside a chunk needs to be read since any of them may
contain column mapping.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:23:00 +01:00
Paweł Dziepak
18d0a57bf4 commitlog: use commitlog entry writer and reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:20:06 +01:00
Paweł Dziepak
a877905bd4 commitlog: allow adding entries using commitlog_entry_writer
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:17:45 +01:00
Paweł Dziepak
0254c3e30b commitlog: add commitlog entry writer and reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:49 +01:00
Paweł Dziepak
434c02cdfa commitlog: keep track of schema versions
Each segment chunk should contain column mappings for all schema
versions used by the mutations it contains. In order to avoid
duplication db::commitlog::segment remembers all schema versions already
written in current chunk.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:41 +01:00
Paweł Dziepak
9d74268234 commitlog: introduce entry_writer
Current commitlog interface requires writers to specify the size of a
new entry which cannot depend on the segment to which the entry is
written.
If column mappings are going to be stored in the commitlog that's not
enough since we don't know whether column mapping needs to be written
until we known in which segment the entry is going to be stored.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:26 +01:00
Calle Wilund
32e480025f cql3::query_options: Add constructors for internal processing 2016-01-13 08:49:01 +00:00
Calle Wilund
2e9ab3aff1 types.hh: Add data_type_for<bool> 2016-01-13 08:49:01 +00:00
Calle Wilund
40efd231b1 auth::authenticated_user: Object representing a named or anon user 2016-01-13 08:49:01 +00:00
Calle Wilund
51af2bcafd auth::permission: permissions for authorization
Not actually used yet. But some day...
2016-01-13 08:49:01 +00:00
Calle Wilund
6f708eae1c auth::data_resource: resource identifier for auth permissions 2016-01-13 08:49:01 +00:00
Calle Wilund
9c1d088718 exceptions: add authorization exceptions 2016-01-13 08:49:01 +00:00
Calle Wilund
cd4ae7a81e Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-13 08:48:43 +00:00
Tomasz Grabiec
e88f41fb3f messaging_service: Move REPAIR_CHECKSUM_RANGE verb out of the streaming verbs group
Message-Id: <1452620321-17223-1-git-send-email-tgrabiec@scylladb.com>
2016-01-12 20:17:08 +02:00
Calle Wilund
8de95cdee8 paging bugfix: Allow reset/removal of "specific ck range"
Refs #752

Paged aggregate queries will re-use the partition_slice object,
thus when setting a specific ck range for "last pk", we will hit
an exception case.
Allow removing entries (actually only the one), and overwriting
(using schema equality for keys), so we maintain the interface
while allowing the pager code to re-set the ck range for previous
page pass.

[tgrabiec: commit log cleanup, fixed issue ref]

Message-Id: <1452616259-23751-1-git-send-email-calle@scylladb.com>
2016-01-12 17:45:57 +01:00
Calle Wilund
7d7d592665 batch_statement: Modify verify_batch_size to match current origin
Fixes #614

* Use warning threshold from config
* Don't throw exceptions. We're only supposed to warn.
* Try to actually estimate mutation data payload size, not
  number of mutations.
2016-01-12 16:30:31 +00:00
Calle Wilund
81e9dc0c2a paging bugfix: Ensure limit for single page is min(page size, limit left)
Fixes #752

We set row limit for query to be min of page size/remaining in limit,
but if we have a multinode query we might end up with more rows than asked
for, so must do this again in post-processing.
2016-01-12 16:30:30 +00:00
Calle Wilund
ea92d7d4fd paging bugfix: Allow reset/removal of "specific ck range"
Refs #792

Paged aggregate queries will re-use the partition_slice object,
thus when setting a specific ck range for "last pk", we will hit
an exception case.
Allow removing entries (actually only the one), and overwriting
(using schema equality for keys), so we maintain the interface
while allowing the pager code to re-set the ck range for previous
page pass. 

v2: 
* Changed to schema-equality checks so we sort of maintain a 
  sane api and behaviour, even with the 1-entry map
 
v3: 
* Renamed remove "contains" in specific_ranges, and made the calling
  code use more map-like logic, again to keep things cleaner
2016-01-12 16:30:30 +00:00
Calle Wilund
e50d8b6895 paging bugfix: Ensure limit for single page is min(page size, limit left)
Fixes #752

We set row limit for query to be min of page size/remaining in limit,
but if we have a multinode query we might end up with more rows than asked
for, so must do this again in post-processing.

Message-Id: <1452606935-12899-2-git-send-email-calle@scylladb.com>
2016-01-12 17:23:04 +02:00
Vlad Zolotarov
9232ad927f messaging_service::get_rpc_client(): fix the encryption logic
According to specification
(here https://wiki.apache.org/cassandra/InternodeEncryption)
when the internode encryption is set to `dc` the data passed between
DCs should be encrypted and similarly, when it's set to `rack`
the inter-rack traffic should encrypted.

Currently Scylla would encrypt the traffic inside a local DC in the
first case and inside the local RACK in the later one.

This patch fixes the encryption logic to follow the specification
above.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1452501794-23232-1-git-send-email-vladz@cloudius-systems.com>
2016-01-12 16:22:26 +02:00
Avi Kivity
4693197e37 Merge seastar upstream
* seastar fe7a49c...43e64c2 (1):
  > resource: fix failures on low-memory machines

Fixes #734.
2016-01-12 14:45:43 +02:00
Calle Wilund
5b9f196115 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-12 11:46:40 +00:00
Avi Kivity
39f81b95d6 main: make --developer-mode relax dma requirements
With Docker we might be running on a filesystem that does not support DMA
(aufs; or tmpfs on boot2docker), so let --developer-mode allow running
on those file systems.
Message-Id: <1452593083-25601-1-git-send-email-avi@scylladb.com>
2016-01-12 13:34:46 +02:00
Avi Kivity
d68026716e Merge seastar upstream
* seastar ad3577b...fe7a49c (2):
  > reactor: workaround tmpfs O_DIRECT vs O_EXCL bug
  > rpc: fix reordering between sending client's negotiation frame and user's data
2016-01-12 13:27:16 +02:00
Takuya ASADA
a1d1d0bd06 Revert "dist: prevent 'local rpm' AMI image update to older version of scylla package by yum update"
This reverts commit b28b8147a0.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452592877-29721-2-git-send-email-syuu@scylladb.com>
2016-01-12 12:26:09 +02:00
Takuya ASADA
5459df1e9e dist: renumber development version as 666.development
yum command think "development-xxxx.xxxx" is older than "0.x", so nightly package mistakenly update with release version.
To prevent this problem, we should add greater number prior to "development".
Also same on Ubuntu package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452592877-29721-1-git-send-email-syuu@scylladb.com>
2016-01-12 12:26:08 +02:00
Calle Wilund
fdda880920 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-12 10:17:22 +00:00
Avi Kivity
5809ed476f Merge "Orderly service startup for systemd"
Use systemd Type=notify to tell systemd about startup progress.

We can now use 'systemctl status scylla-server' to see where we are
in service startup, and 'systemctl start scylla-server' will wait until
either startup is complete, or we fail to start up.
2016-01-12 12:01:32 +02:00
Avi Kivity
3d5f6de683 main: notify systemd of startup progress
Send current startup stage via sd_notify STATUS variable; let it know that
startup is complete via READY=1.

Fixes #760.
2016-01-12 11:58:24 +02:00
Calle Wilund
1b54b9c2d8 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-12 09:02:05 +00:00
Calle Wilund
7f4985a017 commit log reader bugfix: Fix tried to read entries across chunk bounds
read_entry did not verify that current chunk has enough data left
for a minimal entry. Thus we could try to read an entry from the slack
left in a chunk, and get lost in the file (pos > next, skip very much
-> eof). And also give false errors about corruption.
Message-Id: <1452517700-599-1-git-send-email-calle@scylladb.com>
2016-01-12 10:29:07 +02:00
Tzach Livyatan
c5b332716c Fix AMI prompt from "nodetool --help" to "nodetool help"
Fixes #775

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
Message-Id: <1452586945-28738-1-git-send-email-tzach@scylladb.com>
2016-01-12 10:27:05 +02:00
Takuya ASADA
fc13b9eb66 dist: yum install epel-release before installing CentOS dependencies
Fixes #779

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452586442-19777-1-git-send-email-syuu@scylladb.com>
2016-01-12 10:24:56 +02:00
Raphael S. Carvalho
fc6a1934b0 api: implement force_keyspace_cleanup
This will add support for an user to clean up an entire keyspace
or some of its column families.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 03:53:22 -02:00
Raphael S. Carvalho
a5c90194f5 db: add support to clean up a column family
Cleanup is a procedure that will discard irrelevant keys from
all sstables of a column family, thus saving disk space.
Scylla will clean up a sstable by using compaction code, in
which this sstable will be the only input used.
Compaction manager was changed to become aware of cleanup, such
that it will be able to schedule cleanup requests and also know
how to handle them properly.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 03:53:04 -02:00
Raphael S. Carvalho
d44a5d1e94 compaction: filter out compacting sstables
The implementation is about storing generation of compacting sstables
in an unordered set per column family, so before strategy is called,
compaction manager will filter out compacting sstables.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 01:18:29 -02:00
Raphael S. Carvalho
9c13c1c738 compaction: move compaction execution from strategy to manager
Currently, compaction strategy is the responsible for both getting the
sstables selected for compaction and running compaction.
Moving the code that runs compaction from strategy to manager is a big
improvement, which will also make possible for the compaction manager
to keep track of which sstables are being compacted at a moment.
This change will also be needed for cleanup and concurrent compaction
on the same column family.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 00:04:27 -02:00
Raphael S. Carvalho
68619211f5 tests: add test to a sstable rewrite
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 21:43:41 -02:00
Raphael S. Carvalho
ed80ed82ef sstables: prepare compact_sstables to work with cleanup
Cleanup is about rewriting a sstable discarding any keys that
are irrelevant, i.e. keys that don't belong to current node.
Parameter cleanup was added to compact_sstables.
If set to true, irrelevant code such as the one that updates
compaction history will be skipped. Logic was also added to
discard irrelevant keys.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 21:43:40 -02:00
Raphael S. Carvalho
5c674091dc db: move code that rebuilds sstable list to a function
That code will be used by column family cleanup, so let's put
that code into a function. This change also improves the code
readability.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 19:51:04 -02:00
Raphael S. Carvalho
58189dd489 db: move generation calculation code to a function
Code that calculates generation should be put in a function.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 19:51:02 -02:00
Avi Kivity
678bdd5c79 Merge "Change AMI base image to CentOS7, use systemd-coredump for Fedora/CentOS, make AMI rootfs as XFS" from Takuya 2016-01-11 18:43:57 +02:00
Tzach Livyatan
8a4f7e211b Add REST API server ip:port parameters to scylla.yaml
api_port and api_address are already valid configuration options.
Adding them to scylla.yaml, let user know they exists.

solve issue #704

Signed-off-by: Tzach Livyatan <tzach@cloudius-systems.com>
Message-Id: <1452527028-13724-1-git-send-email-tzach@cloudius-systems.com>
2016-01-11 18:00:48 +02:00
Avi Kivity
f917f73616 Merge "Handling of schema changes" from Tomasz
"Our domain objects have schema version dependent format, for efficiency
reasons. The data structures which map between columns and values rely on
column ids, which are consecutive integers. For example, we store cells in a
vector where index into the vector is an implicit column id identifying table
column of the cell. When columns are added or removed the column ids may
shift. So, to access mutations or query results one needs to know the version
of the schema corresponding to it.

In case of query results, the schema version to which it conforms will always
be the version which was used to construct the query request. So there's no
change in the way query result consumers operate to handle schema changes. The
interfaces for querying needed to be extended to accept schema version and do
the conversions if necessary.

Shard-local interfaces work with a full definition of schema version,
represented by the schema type (usually passed as schema_ptr). Schema versions
are identified across shards and nodes with a UUID (table_schema_version
type). We maintain schema version registry (schema_registry) to avoid fetching
definitions we already know about. When we get a request using unknown schema,
we need to fetch the definition from the source, which must know it, to obtain
a shard-local schema_ptr for it.

Because mutation representation is schema version dependent, mutations of
different versions don't necessarily commute. When a column is dropped from
schema, the dropped column is no longer representable in the new schema. It is
generally fine to not hold data for dropped columns, the intent behind
dropping a column is to lose the data in that column. However, when merging an
incoming mutation with an existing mutation both of which have different
schema versions, we'd have to choose which schema should be considered
"latest" in order not to loose data. Schema changes can be made concurrently
in the cluster and initiated on different nodes so there is not always a
single notion of latest schema. However, schema changes are commutative and by
merging changes nodes eventually agree on the version.  For example adding
column A (version X) on one node and adding column B (version Y) on another
eventually results in a schema version with both A and B (version Z). We
cannot tell which version among X and Y is newer, but we can tell that version
Z is newer than both X and Y. So the solution to the problem of merging
conflicting mutations could be to ensure that such merge is performed using
the schema which is superior to schemas of both mutations.

The approach taken in the series for ensuring this is as follows. When a node
receives a mutation of an unknown schema version it first performs a schema
merge with the source of that mutation. Schema merge makes sure that current
node's version is superior to the schema of incoming mutation. Once the
version is synced with, it is remembered as such and won't be synced with on
later mutations. Because of this bookkeeping, schema versions must be
monotonic; we don't want table altering to result in any earlier version
because that would cause nodes to avoid syncing with them. The version is a
cryptographically-secure hash of schema mutations, which should fulfill this
purpose in practice.

TODO: It's possible that the node is already performing a sync triggered by
broadcasted schema mutations. To avoid triggering a second sync needlessly, the
schema merging should mark incoming versions as being synced with.

Each table shard keeps track of its current schema version, which is
considered to be superior to all versions which are going to be applied to it.
All data sources for given column family within a shard have the same notion
of current schema version. Individual entries in cache and memtables may be at
earlier versions but this is hidden behind the interface. The entries are
upgraded to current version lazily on access. Sstables are immutable, so they
don't need to track current version. Like any other data source, they can be
queried with any schema version.

Note, the series triggered a bug in demangler:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68700"
2016-01-11 17:59:14 +02:00
Avi Kivity
3092c1ebb5 Update scylla-ami submodule
* ami/files/scylla-ami 07b7118...eb1fdd4 (2):
  > move log file to /var/lib/scylla
  > move config file to /etc/scylla
2016-01-11 17:58:47 +02:00
Avi Kivity
9182ce1f61 Merge seastar upstream
* seastar d0bf6f8...ad3577b (9):
  > httpd: close connection before deleting it
  > reactor: support for non-O_DIRECT capable filesystems
  > tests: modernize linecount
  > IO queues: destruct within reactor's destructor
  > tests: Use dnsdomainname in mkcert.gmk
  > tests: memcached: workaround a possible race between flush_all and read
  > apps: memcached: reduce the error during the expiration time translation
  > timer: add missing #include
  > core: do not call open_file_dma directly

Fixes #757.
2016-01-11 17:41:39 +02:00
Takuya ASADA
6a457da969 dist: add ignore files for AMI 2016-01-11 14:22:20 +00:00
Takuya ASADA
b28b8147a0 dist: prevent 'local rpm' AMI image update to older version of scylla package by yum update
Since yum command think development version is older than release version, we need it.
2016-01-11 14:22:13 +00:00
Takuya ASADA
dd9894a7b6 dist: cleanup build directory before creating rpms for AMI
To prevent AMI build fail, cleanup build directory first.
2016-01-11 14:21:02 +00:00
Takuya ASADA
8886fe7393 dist: use systemd-coredump on Fedora/CentOS, create symlink /var/lib/scylla/coredump -> /var/lib/systemd/coredump when we mounted RAID
Use systemd-coredump for coredump if distribution is CentOS/RHEL/Fedora, and make symlink from RAID to /var/lib/systemd/coredump if RAID is mounted.
2016-01-11 14:20:50 +00:00
Takuya ASADA
927957d3b9 dist: since AMI uses XFS rootfs, we don't need to warn extra disks not attached to the AMI instance
Even extra disks are not supplied, it's stil valid since we have XFS rootfs now.
2016-01-11 14:19:35 +00:00
Takuya ASADA
47be3fd866 dist: split scylla_install script to two parts, scylla_install_pkg is for installing .rpm/.deb packages, scylla_setup is for setup environment after package installed
This enables to setup RAID/NTP/NIC after .rpm/.deb package installed.
2016-01-11 14:19:29 +00:00
Takuya ASADA
76f0191382 dist: remove scylla_local.json, merge it to scylla.json
We can share one packer config file for both build settings.
2016-01-11 14:18:55 +00:00
Takuya ASADA
8721e27978 dist: fetch CentOS dependencies from our yum repository by default
Only rebuild dependencies when passing -R option to build_rpm.sh
2016-01-11 14:18:49 +00:00
Takuya ASADA
b3c85aea89 dist: switch AMI base image from Fedora to CentOS
Move AMI to CentOS, use XFS for rootfs
2016-01-11 14:18:30 +00:00
Takuya ASADA
202389b2ec dist: don't need yum install and mv scylla-ami before scylla_install
This fixes 'amazon-ebs: mv: cannot stat ‘/home/fedora/scylla-ami’: No such file or directory' on build_ami_local.sh
2016-01-11 14:18:08 +00:00
Takuya ASADA
f3c32645d3 dist: add build time dependency to scylla-libstdc++-static for CentOS
This fixes link error on CentOS

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-11 14:17:52 +00:00
Takuya ASADA
780d9a26b2 configure.py: add --python option to specify python3 command path, for CentOS
Since python3 path is /usr/bin/python3.4 on CentOS, we need modify it's path
2016-01-11 14:17:27 +00:00
Takuya ASADA
b0980ef0c4 dist: use scylla-boost instead of boost to fix compile error on CentOS
boost package doesn't usable on CentOS, use scylla-boost instead.
2016-01-11 14:17:02 +00:00
Lucas Meneghel Rodrigues
94c3c5c1e9 dist/ami: Print newline at the end of MOTD banner
The MOTD banner now printed upon .bash_profile execution,
if scylla is running, ends with a 'tput sgr0'. That command
appends an extra '[m' at the beginning of the output of any
following command. The automation scripts don't like this.

So let's add an 'echo' at the end of that path to add a newline,
avoiding the condition described above, and another one at the
'ScyllaDB is not started' path, for symmetry. I'm doing this
as it seems easier than having to develop heuristics to know
whether to remove or not that character.

CC: Shlomi Livne <slivne@scylladb.com>
Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
Message-Id: <1452216044-28374-1-git-send-email-lmr@scylladb.com>
2016-01-11 15:40:43 +02:00
Calle Wilund
244cd62edb commit log reader bugfix: Fix tried to read entries across chunk bounds
read_entry did not verify that current chunk has enough data left
for a minimal entry. Thus we could try to read an entry from the slack
left in a chunk, and get lost in the file (pos > next, skip very much
-> eof). And also give false errors about corruption.
2016-01-11 13:07:26 +00:00
Vlad Zolotarov
0ed210e117 storage_proxy::query(): intercept exceptions coming from trace()
Exceptions originated by an unimplemented to_string() methods
may interrupt the query() flow if not intercepted. Don't let it
happen.

Fixes issue #768

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-11 12:29:50 +01:00
Tomasz Grabiec
e62857da48 schema_tables: Wait for make_directory_for_column_family() to finish in merge_tables() 2016-01-11 10:34:55 +01:00
Tomasz Grabiec
71bbbceced schema_tables: Notify about table creation after it is fully inited
I'm not aware of any issues it could cause, but it makes more sense
that way.
2016-01-11 10:34:55 +01:00
Tomasz Grabiec
b6c6ee5360 tests: Add test for statement invalidation 2016-01-11 10:34:55 +01:00
Tomasz Grabiec
036eec295f query_processor: Invalidate statements synchronously
We want the statements to be removed before we ack the schema change,
otherwise it will race with all future operations.

Since the subscriber will be invoked on each shard, there is no need
to broadcast to all shards, we can just handle current shard.
2016-01-11 10:34:55 +01:00
Tomasz Grabiec
8deb3f18d3 query_processor: Invalidate prepared statements when columns change
Replicates https://issues.apache.org/jira/browse/CASSANDRA-7910 :

"Prepare a statement with a wildcard in the select clause.
2. Alter the table - add a column
3. execute the prepared statement
Expected result - get all the columns including the new column
Actual result - get the columns except the new column"
2016-01-11 10:34:55 +01:00
Tomasz Grabiec
facc549510 schema: Introduce equal_columns() 2016-01-11 10:34:55 +01:00
Tomasz Grabiec
0ea045b654 tests: Add notification test to schema_change_test 2016-01-11 10:34:54 +01:00
Tomasz Grabiec
d80ffc580f schema_tables: Notify about table schema update 2016-01-11 10:34:54 +01:00
Tomasz Grabiec
40858612e5 db: Make column_family::schema() return const& to avoid copy 2016-01-11 10:34:54 +01:00
Tomasz Grabiec
8817e9613d migration_manager: Simplify notifications
Currently the notify_*() method family broadcasts to all shards, so
schema merging code invokes them only on shard 0, to avoid doubling
notifications. We can simplify this by making the notify_*() methods
per-instance and thus shard-local.
2016-01-11 10:34:54 +01:00
Tomasz Grabiec
5d38614f51 tests: Add test for column drop 2016-01-11 10:34:54 +01:00
Tomasz Grabiec
5689a1b08b tests: Add test for column drop 2016-01-11 10:34:54 +01:00
Paweł Dziepak
21bbc65f3f tests/cql: add tests for ALTER TABLE
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
0276919819 cql3: complete translation of alter table statement
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
f24f677dde db/schema_tables: simplify column difference computation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
ae3acd0f9c system_tables: store sechma::dropped_columns in system tables
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
b5bee9c36a schema_builder: force column id recomputation in build()
If the schema_builder is constructed from an existing schema we need to
make sure that the original column ids of regular and static columns are
*not* used since they may become invalid if columns are added or
removed.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
da0f999123 schema_builder: add with_altered_column_type()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
9807ddd158 schema_builder: add with_column_rename()
Columns that are part of the primary key can be renamed.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
9bf13ed09b mutation_partition: drop cells from dropped_columns at upgrade
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
[tgrabiec: Merged the changes into converting_mutation_partition_applied]
2016-01-11 10:34:53 +01:00
Paweł Dziepak
3cbfa0e52f schema: add column_definition::_dropped_at
When a column is dropped its name and deletion timestamp are added
to schema::_raw._dropped_columns to prevent data resurrection in case a
column with the same name is added. To reduce the number of lookups in
_dropped_columns this patch makes each instance of column_definition
to caches this information (i.e. timestamp of the latest removal of a
column with the same name).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:53 +01:00
Paweł Dziepak
42dc4ce715 schema: keep track of dropped columns
Knowing which columns were dropped (and when) is important to prevent
the data from the dropped ones reappearing if a new column is added with
the same name.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
a81fa1727b tests: Add schema_change_test 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
d8ff9ee441 schema_tables: Make merge_tables() compare by mutations
Schema version is calculated from mutations, so merge_schema should
also look at mutation changes to detect schema changes whenever
version changes.
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
5707c5e7ca schema_tables: Simplify merge_tables() and merge_keyspaces()
read_schema_for_keyspaces() drops empty results so the emptiness
checks are always false and we can remove some redundancy.
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
bfefe5a546 schema_tables: Calculate digest from mutations
We want the node's schema version to change whenever
table_schema_version of any table changes. The latter is calculated by
hashing mutations so we should also use mutation hash when calculating
schema digest.
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
b91c92401f migration_manager: Implement migration_manager::announce_column_family_update 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
c6a52bed73 db: Fail when attempting to mutate using not synced schema 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
a2cdbff965 storage_proxy: Log failures of definitions update handler
Fixes #769.
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
e1e8858ed1 service: Fetch and sync schema 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
cdca20775f messaging_service: Introduce get_source() 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
f0d886893d db: Mark new schemas as synced 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
fb5658ede1 schema_registry: Track synced state of schema
We need to track which schema version were synced with on current node
to avoid triggering the sync on every mutation. We need to sync before
mutating to be able to apply the incoming mutation using current
node's schema, possibly applying irreverdible transformations to it to
make it conform.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
311e3733e0 service: migration_task: Implement using migration_manager::merge_schema_from()
To avoid duplication.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
dee0bbf3f3 migration_manager: Introduce merge_schema_from() 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
be2bdb779a tests: Introduce canonical_mutation_test 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
a63971ee4c tests: memtable_test: Add test for concurrent reading and schema changes 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
8164902c84 schema_tables: Change column_family schema on schema sync
Notifications are not implemented yet.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
d81a46d7b5 column_family: Add schema setters
There is one current schema for given column_family. Entries in
memtables and cache can be at any of the previous schemas, but they're
always upgraded to current schema on access.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
da3a453003 service: Add GET_SCHEMA_VERSION remote call
The verb belongs to a seaprate client to avoid potential deadlocks
should the throttling on connection level be introduced in the
future. Another reason is to reduce latency for version requests as it
can potentially block many requests.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
a9c00cbc11 batchlog_manager: Use requested schema version 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
4e5a52d6fa db: Make read interface schema version aware
The intent is to make data returned by queries always conform to a
single schema version, which is requested by the client. For CQL
queries, for example, we want to use the same schema which was used to
compile the query. The other node expects to receive data conforming
to the requested schema.

Interface on shard level accepts schema_ptr, across nodes we use
table_schema_version UUID. To transfer schema_ptr across shards, we
use global_schema_ptr.

Because schema is identified with UUID across nodes, requestors must
be prepared for being queried for the definition of the schema. They
must hold a live schema_ptr around the request. This guarantees that
schema_registry will always know about the requested version. This is
not an issue because for queries the requestor needs to hold on to the
schema anyway to be able to interpret the results. But care must be
taken to always use the same schema version for making the request and
parsing the results.

Schema requesting across nodes is currently stubbed (throws runtime
exception).
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
036974e19b Make mutation interfaces support multiple versions
Schema is tracked in memtable and cache per-entry. Entries are
upgraded lazily on access. Incoming mutations are upgraded to table's
current schema on given shard.

Mutating nodes need to keep schema_ptr alive in case schema version is
requested by target node.
2016-01-11 10:34:51 +01:00
Tomasz Grabiec
9eef4d1651 db: Learn schema versions when adding tables 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
175be4c2aa cql_query_test: Disable test_user_type 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
04eb58159a query: Add schema_version field to read_command 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
f9ae1ed1c6 frozen_mutation: Add schema_version field 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
8c6480fc46 Introduce global_schema_ptr 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
f25487bc1e Introduce schema_registry 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
533aec84b3 schema: Enable shared_from_this() 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
8a05b61d68 memtable: Read under _read_section 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
5184381a0b memtable: Deconstify memtable in readers
We want to upgrade entries on read and for that we need mutating
permission.
2016-01-11 10:34:51 +01:00
Tomasz Grabiec
0a9436fc1a schema: Introduce frozen_schema
For passing schema across shards/nodes. Also, for keeping in
schema_registry when there's no live schema_ptr.
2016-01-11 10:34:51 +01:00
Tomasz Grabiec
060f93477b Make schema_mutations serializable
We must use canonical_mutation form to allow for changes in the schema
of schema tables. The node which deserializes schema mutations may not
have the same version of the schema tables so we cannot use
frozen_mutation, which is a schema dependent form.
2016-01-11 10:34:50 +01:00
Tomasz Grabiec
e84f3717b5 Introduce canonical_mutation
frozen_schema will transfer schema definition across nodes with schema
mutations. Because different nodes may have different versions of
schema tables, we cannot use frozen_mutations to transfer these
because frozen_mutation can only be read using the same version of the
schema it was frozen with. To solve this problem, new from of mutation
is introduced called canonical_mutation, which can be read using any
version of the schema.
2016-01-11 10:34:50 +01:00
Tomasz Grabiec
3e447e4ad1 tests: mutation_test: Add tests for equality and hashing 2016-01-11 10:34:50 +01:00
Tomasz Grabiec
48f1db5ffa mutation_assertions: Add is_not_equal_to() 2016-01-11 10:34:50 +01:00
Tomasz Grabiec
88a6a17f72 tests: Use mutation generators in frozen_mutation_test 2016-01-11 10:34:50 +01:00
Avi Kivity
6185744312 dist: redhat: drop 'sudo' in scylla_run
Systemd will change the user for us, and the extra process created by
'sudo' confuses sd_notify().
2016-01-10 18:46:43 +02:00
Avi Kivity
dd271b77b0 build: add support for optional pkg-config managed packages 2016-01-10 18:24:12 +02:00
Vlad Zolotarov
19e275be1f tests: gossip_test: initialize a broadcast address and a snitch
This patch fixes a regression introduced by
a commit ca935bf "tests: Fix gossip_test".

database service initializes a replication_strategy
object and a replication_strategy requires a snitch
service to be initialized.

A snitch service requires a broadcast address to be
set.

If any of the above is not initialized we are going
to hit the corresponding assert().

Set a snitch to a SimpleSnitch and a broadcast
address to 127.0.0.1.

Fixes issue #770

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1452421748-9605-1-git-send-email-vladz@cloudius-systems.com>
2016-01-10 13:13:37 +02:00
Tomasz Grabiec
d7b403db1f log: Change default level from warn to info
Logging at 'warn' level leaves us with too silent logs, not as helpful
as they could be in case of failure.

Message-Id: <1452283669-11675-1-git-send-email-tgrabiec@scylladb.com>
2016-01-09 09:24:22 +02:00
Tomasz Grabiec
9b2cc557c5 mutation_source_test: Add mutation generators
The goal is to provide various test cases with a way of iterating over
many combinations of mutaitons. It's good to have this in one place to
avoid duplication and increased coverage.
2016-01-08 21:10:27 +01:00
Tomasz Grabiec
4b92ef01fc test: Add tests for mutation upgrade 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
f59ec59abc mutation: Implement upgrade()
Converts mutation to a new schema.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
0edfe138f8 mutation_partition_view: Make visitable also with column_mapping 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
2cfdfe261d Introduce converting_mutation_partition_applier 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
b17cbc23ab schema: Introduce column_mapping
Encapsulates information needed to convert mutation representations
between schema versions.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
9a3db10b85 db/serializer: Implement skip() for bytes and sstring 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
13974234a4 db/serializer: Spread serializers to relax header dependencies 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
d13c6d7008 types: Introduce is_atomic()
Matches column_definition::is_atomic()
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
f3556ebfc2 schema: Introduce column_count_type
Right now in some places we use column_id, and in some places
size_t. Solve it by using column_count_type whose meaning is "an
integer sufficiently large for indexing columns". Note that we cannot
use column_id because it has more meaning to it than that.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
f58c2dec1e schema: Make schema objects versioned
The version needs to change value not only on structural changes but
also temporal. This is needed for nodes to detect if the version they
see was already synchronized with or not even if it has the same
structure as the past versions. We also need to end up with the same
version on all nodes when schema changes are commuted.

For regular mutable schemas version will be calculated from underlying
mutations when schema is announced. For static schemas of system
keyspace it is calculated by hashing scylla version and column id,
because we don't have mutations at the time of building the schema.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
13295563e0 schema_builder: Move compact_storage setting outside build()
Properties of the schema are set using methods of schema_builder and
different variants of build() are for different forms of the final
schema object.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
dbb7b7ebe3 db: Move system keyspace initialization to init_system_keyspace() 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
fdb9e01eb4 schema_tables: Use schema_mutations for schema_ptr translations
We will be able to reuse the code in frozen_schema. We need to read
data in mutation form so that we can construct the correct
schema_table_version, and attach the mutations to schema_ptr.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
d07e32bc32 schema_tables: Simplify schema building invocation chain 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
3c3ea20640 schema_tables: Drop pkey parameter from add_table_to_schema_mutation()
It simplifies add_table_to_schema_mutation() interface.

The current code is also a bit confusing, partition_key is created
with the keyspaces() schema and used in mutations destined for the
columnfamilies() schema. It works, the types are the same, but looks a
bit scary.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
22254e94cc query::result_set: Add constructor from mutation 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
a861b74b7e Introduce schema_mutations 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
a6084ee007 mutation: Make hashable
The computed hash is independent of any internal representation thus
can be used as a digest across nodes and versions.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
c009fe5991 keys: Add missing clustering_key_prefix_view::get_compound_type() 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
ade5cf1b4b mutation_partition: Make visitable with mutation_partition_visitor 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
bc9ee083dd db: Move atomic_cell_or_collection to separate header
To break future cyclic dependency:

  atomic_cell.hh -> schema.hh (new) -> types.hh -> atomic_cell.hh
2016-01-08 21:10:25 +01:00
Tomasz Grabiec
6f955e1290 mutation_partition: Make equal() work with different schemas 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
75caba5b8a schema: Guarantee that column id order matches name order
For static and regular (row) columns it is very convenient in some
cases to utilize the fact that columns ordered by ids are also ordered
by name. It currently holds, so make schema export this guarantee and
enable consumers to rely on.

The static schema::row_column_ids_are_ordered_by_name field is about
allowing code external to schema to make it very explicit (via
static_assert) that it relies on this guarantee, and be easily
discoverable in case we would have to relax this.
2016-01-08 21:10:25 +01:00
Tomasz Grabiec
14d0482efa Introduce md5_hasher 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
eb1b21eb4b Introduce hashing helpers 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
ff3a2e1239 mutation_partition: Drop row tombstones in do_compact() 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
eb9b383531 service: migration_manager: Fix announce order to match C*
Current logic differs from C*, we first push to other nodes and then
initiate the the sync locally, while C* does the opposite.
2016-01-08 21:10:25 +01:00
Tomasz Grabiec
0768deba74 query_processor: Add trace-level logging of processed statements 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
dae531554a create_index_statement: Use textual column name in all messages
As pointed out by Pawel, we can rely on operator<<()
Message-Id: <1452243656-3376-1-git-send-email-tgrabiec@scylladb.com>
2016-01-08 11:06:09 +02:00
Tomasz Grabiec
5d6d039297 create_index_statement: Use textual representation of column name
Before:

  InvalidRequest: code=2200 [Invalid query] message="No column definition found for column 736368656d615f76657273696f6e"

After:

  InvalidRequest: code=2200 [Invalid query] message="No column definition found for column schema_version"
Message-Id: <1452243156-2923-1-git-send-email-tgrabiec@scylladb.com>
2016-01-08 10:53:37 +02:00
Avi Kivity
0c755d2c94 db: reduce log spam when ignoring an sstable
With 10 sstables/shard and 50 shards, we get ~10*50*50 messages = 25,000
log messages about sstables being ignored.  This is not reasonable.

Reduce the log level to debug, and move the message to database.cc,
because at its original location, the containing function has nothing to
do with the message itself.

Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Message-Id: <1452181687-7665-1-git-send-email-avi@scylladb.com>
2016-01-07 19:23:25 +02:00
Avi Kivity
3377739fa3 main: wait for API http server to start
Wait for the future returned by the http server start process to resolve,
so we know it is started.  If it doesn't, we'll hit the or_terminate()
further down the line and exit with an error code.
Message-Id: <1452092806-11508-3-git-send-email-avi@scylladb.com>
2016-01-07 16:44:07 +02:00
Avi Kivity
fbe3283816 snitch: intentionally leak snitch singleton
Because our shutdown process is crippled (refs #293), we won't shutdown the
snitch correctly, and the sharded<> instance can assert during shutdown.
This interferes with the next patch, which adds orderly shutdown if the http
server fails to start.

Leak it intentionally to work around the problem.
Message-Id: <1452092806-11508-2-git-send-email-avi@scylladb.com>
2016-01-07 16:43:37 +02:00
Pekka Enberg
973c62a486 gms/gossiper: Fix compilation error
Commit 02b04e5 ("gossip: Add is_safe_for_bootstrap") needs on extra
curly bracket to compile.
Message-Id: <1452177529-13555-1-git-send-email-penberg@scylladb.com>
2016-01-07 16:42:55 +02:00
Vlad Zolotarov
07f8549683 database: filter out a manifest.json files
Filter out manifest.json files when reading sstables during
bootup and when loading new sstables ('nodetool refresh').

Fixes issue #529

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1451911734-26511-3-git-send-email-vladz@cloudius-systems.com>
2016-01-07 15:56:02 +02:00
Vlad Zolotarov
c5aa2d6f1a database: lister: add a filtering option
Add a possibility to pass a filter functor receiving a full path
to a directory entry and returning a boolean value: TRUE if an
entry should be enumerated and FALSE - if it should be filtered out.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1451911734-26511-2-git-send-email-vladz@cloudius-systems.com>
2016-01-07 15:56:01 +02:00
Asias He
02b04e5907 gossip: Add is_safe_for_bootstrap
Make the following tests pass:

bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test
bootstrap_test.py:TestBootstrap.killed_wiped_node_cannot_join_test

    1) start node2
    2) wait for cql connection with node2 is ready
    3) stop node2
    4) delete data and commitlog directory for node2
    5) start node2

In step 5), node2 will do the bootstrap process since its data,
including the system table is wiped. It will think itself is a completly
new node and can possiblly stream from wrong node and violate
consistency.

To fix, we reject the boot if we found the node was in SHUTDOWN or
STATUS_NORMAL.

CASSANDRA-9765
Message-Id: <47bc23f4ce1487a60c5b4fbe5bfe9514337480a8.1452158975.git.asias@scylladb.com>
2016-01-07 15:55:01 +02:00
Asias He
933614bdf9 main: Change API server starting message
It comes from the Seastar HTTP server and is inaccurate.

Message-Id: <6a634437d2bd4368400010e25969e215894c2df9.1452162686.git.asias@scylladb.com>
2016-01-07 15:53:28 +02:00
Asias He
6439f4d808 storage_service: Fix load_broadcaster in get_load_map
If get_load_map is called from the api while load_broadcaster is not
set yet, we dereference nullptr.

Fixes #763.
Message-Id: <6f8d554f4976aea85d5cec5a76a3848234138b0a.1452152148.git.asias@scylladb.com>
2016-01-07 10:36:36 +02:00
Asias He
2345cda42f messaging_service: Rename shard_id to msg_addr
Use shard_id as the destination of the messaging_service is confusing,
since shard_id is used in the context of cpu id.
Message-Id: <8c9ef193dc000ef06f8879e6a01df65cf24635d8.1452155241.git.asias@scylladb.com>
2016-01-07 10:36:35 +02:00
Asias He
8c909122a6 gossip: Add wait_for_gossip_to_settle
Implement the wait for gossip to settle logic in the bootup process.

CASSANDRA-4288

Fixes:
bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test

1) start node2
2) wait for cql connection with node2 is ready
3) stop node2
4) delete data and commitlog directory for node2
5) start node2

In step 5, sometimes I saw in shadow round of node2, it gets node2's
status as BOOT from other nodes in the cluster instead of NORMAL. The
problem is we do not wait for gossip to settle before we start cql server,
as a result, when we stop node2 in step 3), other nodes in the cluster
have not got node2's status update to NORMAL.
2016-01-07 10:09:25 +02:00
Benoît Canet
8f725256e1 config: Mark ssl_storage_port as Used
Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1452082041-6117-1-git-send-email-benoit@scylladb.com>
2016-01-06 17:34:53 +02:00
Benoît Canet
e80c8b6130 config: Mark previously unused SSL client/server options as used
The previous SSL enablement patches do make uses of these
options but they are still marked as Unused.
Change this and also update the db/config.hh documentation
accordingly.

Syntax is now:

client_encryption_options:
   enabled: true
   certificate: <path-to-PEM-x509-cert> (default conf/scylla.crt)
   keyfile: <path-to-PEM-x509-key> (default conf/scylla.key)

Fixes: #756.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1452032073-6933-1-git-send-email-benoit@scylladb.com>
2016-01-06 10:32:53 +02:00
Tomasz Grabiec
9d71e4a7eb Merge branch 'fix_to_issue_676_v4' from git@github.com:raphaelsc/scylla.git
Compaction fixes from Raphael:

There were two problems causing issue 676:
1) max_purgeable was being miscalculated (fixed by b7d36af).
2) empty row not being removed by mutation_partition::do_compact
Testcase is added to make sure that a tombstone will be purged under
certain conditions.
2016-01-05 15:19:22 +01:00
Raphael S. Carvalho
a81b660c0d tests: check that tombstone is purged under certain conditions
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-05 15:19:21 +01:00
Raphael S. Carvalho
03eee06784 remove empty rows in mutation_partition::do_compact
do_compact() wasn't removing an empty row that is covered by a
tombstone. As a result, an empty partition could be written to a
sstable. To solve this problem, let's make trim_rows remove a
row that is considered to be empty. A row is empty if it has no
tombstone, no marker and no cells.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-05 15:19:21 +01:00
Pekka Enberg
800ed6376a Merge "Repair overhaul" from Nadav
"This is another version of the repair overhaul, to avoid streaming *all* the
 data between nodes by sending checksums of token ranges and only streaming
 ranges which contain differing data."
2016-01-05 16:05:44 +02:00
Pekka Enberg
f4bdec4d09 Merge "Support for deleting all snapshots" from Vlad
"Add support for deleting all snapshots of all keyspaces."

Fixes #639.
2016-01-05 15:42:44 +02:00
Nadav Har'El
f90e1c1548 repair: support "hosts" and "dataCenters" parameters
Support the "hosts" and "dataCenters" parameters of repair. The first
specifies the known good hosts to repair this host from (plus this host),
and the second asks to restrict the repair to the local data center (you
must issue the repair to a node in the data center you want to repair -
issuing the command to a data center other than the named one returns
an error).

For example these options are used by nodetool commands like:
nodetool repair -hosts 127.0.0.1,127.0.0.2 keyspace
nodetool repair -dc datacenter1

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00
Nadav Har'El
ac4e86d861 repair: use repair_checksum_range
The existing repair code always streamed the entire content of the
database. In this overhaul, we send "repair_checksum_range" messages to
the other nodes to verify whether they have exactly the same data as
this node, and if they do, we avoid streaming the identical code.

We make an attempt to split the token ranges up to contain an estimated
100 keys each, and send these ranges' checksums. Future versions of this
code will need to improve this estimation (and make this "100" a parameter)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00
Nadav Har'El
9e65ecf983 repair: convenience function for syncing a range
This patch adds a function sync_range() for synchronizing all partitions
in a given token range between a set of replicas (this node and a list of
neighbors).
Repair will call this function once it has decided that the data the
replicas hold in this range is not identical.

The implementation streams all the data in the given range, from each of
the neighbors to this node - so now this node contains the most up-to-date
data. It then streams the resulting data back to all the neighbors.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00
Nadav Har'El
f5b2135a80 repair: repair_checksum_range message
This patch adds a new type of message, "REPAIR_CHECKSUM_RANGE" to scylla's
"messaging_service" RPC mechanism, for the use of repair:

With this message the repair's master host tells a slave host to calculate
the checksum of a column-family's partitions in a given token range, and
return that checksum.

The implementation of this message uses the checksum_range() function
defined in the previous patch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00
Nadav Har'El
e9d266a189 repair: checksum of partitions in range
This patch adds functions for calculating the checksum of all the
partitions in a given token range in the given column-family - either
in the current shard, or across all shards in this node.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00
Nadav Har'El
0591fa7089 repair: partition-set checksum
This patch adds a mechanism for calculating a checksum for a set of
partitions. The repair process will use these checksums to compare the
data held by different replicas.

We use a strong checksum (SHA-256) for each individual partition in the set,
and then a simple XOR of those checksums to produce a checksum for the
entire set. XOR is good enough for merging strong checksums, and allows us
to independently calculate the checksums of different subsets of the
original sets - e.g., each shard can calculate its own checksum and we
can XOR the resulting checksums to get the final checksum.

Apache Cassandra uses a very similar checksum scheme, also using SHA-256
and XOR. One small difference in the implementation is that we include the
partition key in its checksum, while Cassandra don't, which I believe to
have no real justification (although it is very unlikely to cause problems
in practice). See further discussion on this in
https://issues.apache.org/jira/browse/CASSANDRA-10728.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00
Nadav Har'El
faa87b31a8 fix to_partition_range() inclusiveness
A cut-and-paste accident in query::to_partition_range caused the wrong
end's inclusiveness to be tested.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00
Shlomi Livne
846bf9644e dist/redhat: Increase scylla-server service start timeout to 15 min
Fixes #749

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2016-01-05 15:30:22 +02:00
Pekka Enberg
67ccd05bbe api/storage_service: Wire up 'compaction_throughput_mb_per_sec'
The API is needed by nodetool compactionstats command.
2016-01-05 13:01:05 +02:00
Pekka Enberg
5db82aa815 Merge "Fix frozen collections" from Paweł
"This series prevents frozen collections from appearing in the schema
comparator.

Fixes #579."
2016-01-05 12:43:06 +02:00
Paweł Dziepak
284162c41b test/cql3: add test for frozen collections
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 11:13:53 +01:00
Paweł Dziepak
a5a744655e schema: do not add frozen collections to compound name
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 10:49:32 +01:00
Paweł Dziepak
ed7d9d4996 schema: change has_collections() to has_multi_column_collections()
All users of schema::has_collections() aren't really interested in
frozen ones.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 10:46:42 +01:00
Tomasz Grabiec
fecb1c92e7 Merge branch 'pdziepak/prepare-for-alter-table/v1' from seastar-dev.git 2016-01-05 10:30:46 +01:00
Paweł Dziepak
70f5ed6c64 cql3: enable ALTER TABLE in cql3 grammar definition
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 09:49:04 +01:00
Paweł Dziepak
3ca4e27dba cql3: convert alter_table_statement to c++
Everything except alter_table_statement::announce_migration() is
translated. announce_migration() has to wait for multi schema support to
be merged.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 09:49:04 +01:00
Paweł Dziepak
35edda76c4 cql3: import AlterTableStatement.java
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 09:49:04 +01:00
Paweł Dziepak
b615bfa47e cql3: add cf_prop_defs::get_default_time_to_live()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 09:49:04 +01:00
Paweł Dziepak
18747b21f2 map_difference: accept std::unordered_map
map_difference implementation doesn't need elements in the container to
be sorted.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 09:49:04 +01:00
Tomasz Grabiec
71d6e73ae3 map_difference: Define default value for key_comp 2016-01-05 09:49:04 +01:00
Paweł Dziepak
3693e77eec schema: add is_cql3_table()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 09:49:04 +01:00
Paweł Dziepak
4bd031d885 schema: add column_definition::is_part_of_cell_name()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 09:49:04 +01:00
Paweł Dziepak
f39f21ce02 schema: add column_definition::is_indexed()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 09:49:04 +01:00
Glauber Costa
74fbd8fac0 do not call open_file_dma directly
We have an API that wraps open_file_dma which we use in some places, but in
many other places we call the reactor version directly.

This patch changes the latter to match the former. It will have the added benefit
of allowing us to make easier changes to these interfaces if needed.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <29296e4ec6f5e84361992028fe3f27adc569f139.1451950408.git.glauber@scylladb.com>
2016-01-05 10:37:57 +02:00
Avi Kivity
e9400dfa96 Revert "sstable: Initialize super class of malformed_sstable_exception"
This reverts commit d69dc32c92d63057edf9f84aa57ca53b2a6e37e4; it does nothing
and does not address issue #669.
2016-01-05 10:21:00 +02:00
Benoît Canet
d69dc32c92 sstable: Initialize super class of malformed_sstable_exception
This exception was not caught properly as a std::exception
by report_failed_future call to report_exception because the
superclass std::exception was not initialized.

Fixes #669.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
2016-01-05 09:54:36 +02:00
Lucas Meneghel Rodrigues
8ef9a60c09 dist/common/scripts/scylla_prepare: Change message to error
Recently, Scylla was changed to mandate the use of XFS
for its data directories, unless the flag --developer-mode true
is provided. So during the AMI setup stage, if the user
did not provide extra disks for the setup scripts to prepare,
the scylla service will refuse to start. Therefore, the
message in scylla_prepare has to be changed to an actual
error message, and the file name, to be changed to
something that reflects the event that happened.

Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
2016-01-05 09:23:57 +02:00
Tomasz Grabiec
cc6c35d45c Move seastar submodule head
Avi Kivity (1):
      Merge "rpc negotiation fixes" from Gleb

Gleb Natapov (3):
      rpc: fix peer address printing during logging
      rpc: make server send negotiation frame back before closing connection on error
      rpc: fix documentation for negotiation procedure.

Nadav Har'El (1):
      fix operator<<() for std::vector<T>

Tomasz Grabiec (1):
      core/byteorder: Add missing include
2016-01-04 15:07:21 +01:00
Shlomi Livne
f26a75f48b Fixing missing items in move from scylla-ami.sh to scylla_install
scylla-ami.sh moved some ami specific files. This parts have been
dropped when converging scylla-ami into scylla_install. Fixing that.

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2016-01-04 15:23:14 +02:00
Shlomi Livne
cec6e6bc20 Invoke scylla_bootparam_setup with/without ami flag
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2016-01-04 15:23:08 +02:00
Shlomi Livne
aebeb95342 Fix error: no integer expression expected in AMI creation
The script imports the /etc/sysconfig/scylla-server for configuration
settings (NR_PAGES). The /etc/sysconfig/scylla-server iincludes an AMI
param which is of string value and called as a last step in
scylla_install (after scylla_bootparam_setup has been initated).

The AMI variable is setup in scylla_install and is used in multiple
scripts. To resolve the conflict moving the import of
/etc/sysconfig/scylla-server after the AMI variable has been compared.

Fixes: #744

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2016-01-04 15:23:01 +02:00
Pekka Enberg
24809de44f dist/docker: Switch to CentOS 7 as base image
Switch to CentOS 7 as the Docker base image. It's more stable and
updated less frequently than Fedora. As a bonus, it's Thrift package
doesn't pull the world as a dependency which reduces image size from 700
MB to 380 MB.

Suggested by Avi.
Message-Id: <1451911969-26647-1-git-send-email-penberg@scylladb.com>
2016-01-04 14:53:53 +02:00
Tomasz Grabiec
5a9d45935a Merge tag 'asias/fix_cql_query_test/v1' from seastar-dev.git
Fixes for cql_query_test and gossip_test from Asias.
2016-01-04 12:28:49 +01:00
Avi Kivity
c559008915 transport: protect against excessive memory consumption
If requests are delayed downstream from the cql server, and the client is
able to generate unrelated requests without limit, then the transient memory
consumed by the requests will overflow the shard's capacity.

Fix by adding a semaphore to cap the amount of transient memory occupied by
requests.

Fixes #674.
2016-01-04 12:11:00 +01:00
Avi Kivity
78429ad818 types: implement collection compatibility checks
compatible: can be cast, keeps sort order
value-compatible: can be cast, may change sort order

frozen: values participate in sort order
unfrozen: only sort keys participate in sort order

Fixes #740.
2016-01-04 11:02:21 +01:00
Avi Kivity
33fd044609 Merge "Event notification exception safety" from Pekka
"Fix both migration manager and storage service to catch and log
exceptions for listeners to ensure all listeners are notified.

Spotted by Avi."
2016-01-04 11:03:01 +02:00
Pekka Enberg
f646241f1c service/storage_service: Make event notification exception safe
If one of the listeners throws an exception, we must ensure that other
listeners are still notified.

Spotted by Avi.
2016-01-04 10:40:02 +02:00
Pekka Enberg
1e29b07e40 service/migration_manager: Make event notification exception safe
If one of the listeners throws an exception, we must ensure that other
listeners are still notified.

Spotted by Avi.
2016-01-04 10:39:49 +02:00
Asias He
7c9f9f068f cql_server: Do not ignore future in stop
Now connection.shutdown() returns future, we can not ignore it.
Plus, add shutdown process info for debug.

Message-Id: <b2d100bf9c817d7a230c6cd720944ba4fae416e2.1451894645.git.asias@scylladb.com>
2016-01-04 10:17:44 +02:00
Amnon Heiman
6942b41693 API: rename the map of string, double to map_string_double
This replaces the confusing name to a more meaningful name.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1451466952-405-1-git-send-email-amnon@scylladb.com>
2016-01-03 19:10:49 +02:00
Avi Kivity
0d93aa8797 Merge "revert redundant use of gate in messaging_service" from Gleb
"Messaging service inherits from seastar::async_sharded_service which
guaranties that sharded<>::stop() will not complete until all references
to the service will not go away. It was done specifically to avoid using
more verbose gate interface in sharded<> services since it turned out
that almost all of them need one eventually. Unfortunately patches to
add redundant gate to messaging_service sneaked pass my review. The series
reverts them."
2016-01-03 16:48:40 +02:00
Gleb Natapov
fae98f5d67 Revert "messaging_service: wait for outstanding requests"
This reverts commit 9661d8936b.
Message-Id: <1450690729-22551-3-git-send-email-gleb@scylladb.com>
2016-01-03 16:06:39 +02:00
Gleb Natapov
de0771f1d1 Revert "messaging_service: restore indentation"
This reverts commit dcbba2303e.
Message-Id: <1450690729-22551-2-git-send-email-gleb@scylladb.com>
2016-01-03 16:06:38 +02:00
Vlad Zolotarov
7bb2b2408b database::clear_snapshot(): added support for deleting all snapshots
When 'nodetool clearsnapshot' is given no parameters it should
remove all existing snapshots.

Fixes issue #639

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-03 14:22:25 +02:00
Vlad Zolotarov
d5920705b8 service::storage_service: move clear_snapshot() code to 'database' class
service::storage_service::clear_snapshot() was built around _db.local()
calls so it makes more sense to move its code into the 'database' class
instead of calling _db.local().bla_bla() all the time.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-03 14:22:17 +02:00
Vlad Zolotarov
d0cbcce4a2 service::storage_service: remove not used variable ('deleted_keyspaces')
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-03 13:30:46 +02:00
Takuya ASADA
6e8dd00535 dist: apply limits settings correctly on Ubuntu
Fixes #738

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-02 12:20:18 +02:00
Asias He
d6e352706a storage_service: Drop duplicated print
We have done that in the logger.
2016-01-01 10:15:17 +08:00
Asias He
4952042fbf tests: Fix cql_test_env.cc
Current service initialization is a total mess in cql_test_env. Start
the service the same order as in main.cc.

Fixes #715, #716

'./test.py --mode release' passes.
2016-01-01 10:15:17 +08:00
Asias He
ca935bf602 tests: Fix gossip_test
Gossip depends on get_ring_delay from storage_service. storage_service
depends on database. Start them.
2016-01-01 10:15:17 +08:00
Benoît Canet
de508ba6cc README: Add missing build dependencies
It's just a waste of time to find them manually
when compiling ScyllaDB on a fresh install: add them.

Also fix the ninja-build name.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
2015-12-31 13:34:48 +02:00
Shlomi Livne
6316fedc0a Make sure the directory we are writting coredumps to exists
After upgrading an AMI and trying to stop and start a machine the
/var/lib/scylla/coredump is not created. Create the directory if it does
not exist prior to generating core

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2015-12-31 13:21:39 +02:00
Pekka Enberg
32d22a1544 cql3: Fix relation grammar rule
There's two untranslated grammar subrules in the 'relation' rule. We
have all the necessary AST classes translated, so translate the
remaining subrules.

Refs #534.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-12-31 11:44:15 +01:00
Tomasz Grabiec
e23adb71cf Merge tag 'asias/streaming/cleanup_verb/v2' from seastar-dev.git
messaging_service cleanups and improvements from Asias
2015-12-31 11:25:09 +01:00
Asias He
1b3d2dee8f streaming: Drop src_cpu_id parameter
Now that we can get the src_cpu_id from rpc::client_info.
No need to pass it as verb parameter.
2015-12-31 11:25:09 +01:00
Asias He
3ae21e06b5 messaging_service: Add src_cpu_id to CLIENT_ID verb
It is useful to figure out which shard to send messages back to the
sender.
2015-12-31 11:25:09 +01:00
Asias He
22d0525bc0 streaming: Get rid of the _from_ parameter
Get this from cinfo.retrieve_auxiliary inside the rpc handler.
2015-12-31 11:25:08 +01:00
Asias He
89b79d44de streaming: Get rid of the _connecting_ parameter
messaging_service will use private ip address automatically to connect a
peer node if possible. There is no need for the upper level like
streaming to worry about it. Drop it simplifies things a bit.
2015-12-31 11:25:08 +01:00
Amnon Heiman
8d31c27f7b api: stream_state should include the stream_info
The stream info was creted but was left out of the stream state. This
patch adds the created stream_info to the stream state vector.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-31 12:01:03 +02:00
Gleb Natapov
2bcfe02ee6 messaging: remove unused verbs 2015-12-30 15:06:35 +01:00
Gleb Natapov
f0e8b8805c messaging: constify some handlers 2015-12-30 15:06:35 +01:00
Avi Kivity
a318b02335 Merge seastar upstream
* seastar 8b2171e...de112e2 (4):
  > build: disable -fsanitize=vptr harder
  > Merge "Make RPC more robust against protocol changes" from Gleb
  > new when_all implementation
  > apps: memcached: Don't fiddle with the expiration time value in a usable range
2015-12-30 14:56:09 +02:00
Pekka Enberg
500f5b3f27 dist: Increase NOFILE rlimit to 200k
Commit 2ba4910 ("main: verify that the NOFILE rlimit is sufficient")
added a recommendation to set NOFILE rlimit to 200k. Update our release
binaries to do the same.
2015-12-30 11:18:23 +01:00
Avi Kivity
2ba4910385 main: verify that the NOFILE rlimit is sufficient
Require 10k files, recommend 200k.

Allow bypassing via --developer-mode.

Fixes #692.
2015-12-30 11:02:08 +02:00
Avi Kivity
c26689f325 init: bail out if running not on an XFS filesystem
Allow an override via '--developer-mode true', and use it in
the docker setup, since that cannot be expected to use XFS.

Fixes #658.
2015-12-30 10:56:21 +02:00
Pekka Enberg
0aa105c9cf Merge "load report a negative value" from Amnon
"This series solve an issue with the load broadcaster that reports negative
 values due to an integer wrap around.  While fixing this issue an additional
 change was made so that the load_map would return doubles and not formatted
 string.  This is a better API, safer and better documented."
2015-12-30 10:21:55 +02:00
Nadav Har'El
f0b27671a2 murmur3 partitioner: remove outdated comment, and code
Since commit 16596385ee, long_token() is already checking
t.is_minimum(), so the comment which explains why it does not (for
performance) is no longer relevant. And we no longer need to check
t._kind before calling long_token (the check we do here is the same
as is_minimum).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-30 10:01:29 +02:00
Nadav Har'El
de5a3e5c5a repair: check columnFamilies list
Check the list of column families passed as an option to repair, to
provide the user with a more meaningful exception when a non-existant
column family is passed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-30 09:59:54 +02:00
Nadav Har'El
3ae29216c8 repair: add missing ampersand
This was a plain bug - ranges_opt is supposed to parse the option into
the vector "var", but took the vector by value.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-30 09:46:13 +02:00
Nadav Har'El
a0a649c1be repair: support "columnFamilies" parameter
Support the "columnFamilies" parameter of repair, allowing to repair
only some of the column families of a keyspace, instead of all of them.
For example, using a command like "nodetool repaire keyspace cf1 cf2".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-30 09:45:28 +02:00
Lucas Meneghel Rodrigues
43d39d8b03 scylla_coredump_setup: Don't call yum on scylla server spec file
The script scylla_coredump_setup was introduced in
9b4d0592, and added to the scylla rpm spec file, as a
post script. However, calling yum when there's one
yum instance installng scylla server will cause a deadlock,
since yum waits for the yum lock to be released, and the
original yum process waits for the script to end.

So let's remove this from the script. Debian shouldn't be
affected, since it was never added to the debian build
rules (to the best of my knowlege, after analyzing 9b4d0592),
hence I did not remove it. It should cause the same problem
with apt-get in case it was used.

CC: Takuya ASADA <syuu@scylladb.com>
[ penberg: Rebase and drop statement about 'abrt' package not in Fedora. ]
Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
2015-12-30 09:38:36 +02:00
Nadav Har'El
ebebaa525d repair: fix missing default values
A default value was not set for the "incremental" and "parallelism"
repair parameters, so Scylla can wrongly decide that they have an
unsupported value.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-29 15:39:47 +02:00
Amnon Heiman
ec379649ea API: repair to use documented params
The repair API use to have an undocumented parameter list similiar to
origin.

This patch changes the way repair is getting its parameters.
Instead of a one undocumented string it now lists all the different
optional parameters in the swagger file and accept them explicitely.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-29 15:38:44 +02:00
Amnon Heiman
f0d68e4161 main: start the http server in the first step
This change set the http server to start as the first step in the boot
order.

It is helpfull if some other step takes a long time or stuck.

Fixes #725

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-29 14:20:57 +02:00
Avi Kivity
c8b09a69a9 lsa: disable constant_time_size in binomial_heap implementation
Corrupts heap on boost < 1.60, and not needed.

Fixes #698.
2015-12-29 12:59:00 +01:00
Vlad Zolotarov
756de38a9d database: actually check that a snapshot directory exists
Actually check that a snapshot directory with a given tag
exists instead of just checking that a 'snapshot' directory
exists.

Fixes issue #689

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-12-29 12:59:00 +01:00
Amnon Heiman
71905081b1 API: report the load map as an unformatted double
In origin the storage_serivce report the load map as a formatted string.
As an API a better option is to report the load map as double and let
the JMX proxy do the formatting.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-29 11:55:34 +02:00
Amnon Heiman
06e1facc34 load_broadcaster report negative size
The map_reduce0 convert the result value to the init value type. In
load_bradcaster 0 is of type int.
This result with an int wrap around and negative results.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-29 11:55:34 +02:00
Avi Kivity
41bd266ddd db: provide more information on "Unrecognized error" while loading sstables
This information can be used to understand the root cause of the failure.

Refs #692.
2015-12-29 10:23:32 +02:00
Nadav Har'El
7247f055df repair: partial support for some options
Add partial support for the "incremental" option (only support the
"false" setting, i.e., not incremental repair) and the "parallelism"
option (the choice of sequential or parallel repair is ignored - we
always use our own technique).

This is needed because scylla-jmx passes these options by default
(e.g., "incremental=false" is passed to say this is *not* incremental
repair, and we just need to allow this and ignore it).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-29 09:38:09 +02:00
Nadav Har'El
3cfa39e1f0 repair: log repair options
When throwing an "unsupported repair options" exception to the caller
(such as "nodetool repair"), also list which options were not recognized.
Additionally, list the options when logging the repair operation.

This patch includes an operator<< implementation for pretty-printing an
std::unordered_map. We may want to move it later to a more central
location - even Seastar (like we have a pretty-printer for std::vector
in core/sstring.hh).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-29 09:37:30 +02:00
Raphael S. Carvalho
b7d36af26f compaction: fix max_purgeable calculation
max_purgeable was being incorrectly calculated because the code
that creates vector of uncompacted sstables was wrong.
This value is used to determine whether or not a tombstone can
be purged.
Operand < is supposed to be used instead in the callback passed
as third parameter to boost::set_difference.
This fix is a step towards closing the issue #676.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-29 09:30:08 +02:00
Takuya ASADA
46767fcacf dist: fix .rpm build error (File not found: scylla_extlinux_setup)
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-29 09:26:58 +02:00
Pekka Enberg
ca1f9f1c9a main: Fix implicitly disabled client encryption options
The start_native_transport() function in storage_service expects the
'enabled' option to be defined. If the option is not defined, it means
that encryption is implicitly disabled.

Fixes #718.
2015-12-28 16:24:49 +02:00
Pekka Enberg
a76b3a009b Merge "use steady_clock where monotonic clock is required" from Vlad
"The first patch in this series fixes the issue #638 in scylla.
 The second one fixes the tests to use the appropriate clock."
2015-12-28 13:35:50 +02:00
Avi Kivity
561bb79d22 Merge "CQL server SSL" from Calle
"* Update scylla.conf section
 * Add SSL capability to cql server
 * Use conf and initiate optional SSL cql server in
   main/storage_service"
2015-12-28 12:55:25 +02:00
Avi Kivity
72cb8d4461 Merge "Messaging service TLS" from Calle
"Adds support for TLS/SSL encrypted (and cert verified)
connections for message service

* Modify config option to match "native" style cerificate management
* Add SSL options to messaging service and generate SSL server/client
  endpoints when required
* Add config option handling to init/main"
2015-12-28 12:54:28 +02:00
Calle Wilund
fae3bb7a24 storage_service: Set up CQL server as SSL if specified
* Massage user options in main
* Use them in storage_service, and if needed, load certificates etc
  and pass to transport/cql server.

Conflicts:
	service/storage_service.cc
2015-12-28 10:13:48 +00:00
Calle Wilund
51d3990261 cql_server: Allow using SSL socket
Optional credentials argument determine if SSL or normal
server socket is created.

Note: This does not follow the pattern of "socket as argument", simply
because this is a distributed object, so only trivial or immutable
objects should be passed to it.
2015-12-28 10:13:48 +00:00
Calle Wilund
d8b2581a07 scylla.conf: Update client_encryption_options with scylla syntax
Using certificate+key directly
2015-12-28 10:13:48 +00:00
Calle Wilund
5f003f9284 scylla.conf: Modify server_encryption_options section
Describe scylla version of option.

Note, for test usage, the below should be workable:

server_encryption_options:
    internode_encryption: all
    certificate: seastar/tests/test.crt
    truststore: seastar/tests/catest.pem
    keyfile: seastar/tests/test.key

Since the seastar test suite contains a snakeoil cert + trust
combo
2015-12-28 10:10:35 +00:00
Calle Wilund
70f293d82e main/init: Use server_encryption_options
* Reads server_encryption_options
* Interpret the above, and load and initialize credentials
  and use with messaging service init if required
2015-12-28 10:10:35 +00:00
Calle Wilund
d1badfa108 messaging_service: Optionally create SSL endpoints
* Accept port + credentials + option for what to encrypt
* If set, enable a SSL listener at ssl_port
* Check outgoing connections by IP to determine if
  they should go to SSL/normal endpoint

Requires seastar RPC patch

Note: currently, the connections created by messaging service
does _not_ do certificate name verification. While DNS lookup
is probably not that expensive here, I am not 100% sure it is
the desired behaviour.
Normal trust is however verified.
2015-12-28 10:10:35 +00:00
Calle Wilund
1a9fb4ed7f config: Modify/use server_encryption_options
* Mark option used
* Make sub-options adapted to seastar-tls useable values (i.e. x509)

Syntax is now:

server_encryption_options:
	internode_encryption: <none, all, dc, rack>
	certificate: <path-to-PEM-x509-cert> (default conf/scylla.crt)
	keyfile: <path-to-PEM-x509-key> (default conf/scylla.key)
	truststore: <path-to-PEM-trust-store-file> (default empty,
                                                    use system trust)
2015-12-28 10:10:35 +00:00
Calle Wilund
b7baa4d1f5 config: clean up some style + move method to cc file 2015-12-28 10:10:35 +00:00
Takuya ASADA
fc29a341d2 dist: show usage and scylla-server status when login to AMI instance
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-28 11:40:34 +02:00
Avi Kivity
827a4d0010 Merge "streaming: Invalidate cache upon receiving of stream" from Asias
"When a node gain or regain responsibility for certain token ranges, streaming
will be performed, upon receiving of the stream data, the row cache
is invalidated for that range.

Refs #484."
2015-12-28 10:24:46 +02:00
Amnon Heiman
2c79fe1488 storage_service: describe_ring return full data
The describe_ring method in storage_service did not report the start and
end tokens.

Also for rpc addresses that are not the local address, it returned the
value representation (including the version) and not just the adress.

Fixes #695

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-28 09:56:12 +02:00
Takuya ASADA
0abcf5b3f3 dist: use readable time format on coredump file, instead of unix time
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-28 09:55:05 +02:00
Takuya ASADA
940c34b896 dist: don't abort scylla_coredump_setup when 'yum remove abrt' failed
It always fail when abrt is not installed.
This also fixes build_ami.sh failing because of this error.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-28 09:40:57 +02:00
Vlad Zolotarov
0f8090d6c7 tests: use steady_clock where monotinic clock is required
Use steady_clock instead of high_resolution_clock where monotonic
clock is required. high_resolution_clock is essentially a
system_clock (Wall Clock) therefore may not to be assumed monotonic
since Wall Clock may move backwards due to time/date adjustments.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-12-27 18:08:15 +02:00
Vlad Zolotarov
33552829b2 core: use steady_clock where monotinic clock is required
Use steady_clock instead of high_resolution_clock where monotonic
clock is required. high_resolution_clock is essentially a
system_clock (Wall Clock) therefore may not to be assumed monotonic
since Wall Clock may move backwards due to time/date adjustments.

Fixes issue #638

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-12-27 18:07:53 +02:00
Takuya ASADA
7f4a1567c6 dist: support non-ami boot parameter setup, add parameters for preallocate hugepages on boot-time
Fixes #172

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-27 17:56:49 +02:00
Takuya ASADA
6bf602e435 dist: setup ntpd on AMI
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-27 17:54:32 +02:00
Avi Kivity
2b22772e3c Merge "Introduce keep alive timer for stream_session" from Asias
"Fixes stream_session hangs:

1) if the sending node is gone, the receiving peer will wait forever
2) if the node which should send COMPLETE_MESSAGE to the peer node is gone,
   the peer node will wait forever"
2015-12-27 16:56:32 +02:00
Avi Kivity
f3980f1fad Merge seastar upstream
* seastar 51154f7...8b2171e (9):
  > memcached: avoid a collision of an expiration with time_point(-1).
  > tutorial: minor spelling corrections etc.
  > tutorial: expand semaphores section
  > Merge "Use steady_clock where monotonic clock is required" from Vlad
  > Merge "TLS fixes + RPC adaption" from Calle
  > do_with() optimization
  > tutorial: explain limiting parallelism using semaphores
  > submit_io: change pending flushes criteria
  > apps: remove defunct apps/seastar

Adjust code to use steady_clock instead of high_resolution_clock.
2015-12-27 14:40:20 +02:00
Avi Kivity
0687d7401d Merge "storage_service updates" from Asias
"
- Fix erase of new_replica_endpoints in get_changed_ranges_for_leaving
- Introduce ntroduce ring_delay_ms option
"
2015-12-27 12:46:37 +02:00
Nadav Har'El
06f8dd4eb2 repair: job id must start at 1
This patch fixes a bug where the *first* run of "nodetool repair" always
returned immediately, instead of waiting for the repair to complete.

Repair operations are asynchronous: Starting a repair returns a numeric
id, which can then be used to query for the repair's completion, and this
is what "nodetool repair" does (through our JMX layer). We started with
the repair ID "0", the next one is "1", and so on.

The problem is that "nodetool repair", when it sees 0 being returned,
treats it not as a regular repair ID, but rather as an answer that
there is nothing to repair - printing a message to that effect and *not*
waiting for the repair (which was correctly started) to complete.

The trivial fix is to start our repair IDs at 1, instead of 0.
We currently do not return 0 in any case (we don't know there is nothing
to repair before we actually start the work, and parameter errors
cause an exception, not a return of 0).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-27 12:42:26 +02:00
Avi Kivity
93aeedf403 Merge "Fixes for CentOS/RHEL support" from Takuya
"Recent changes on scripts causes error on CentOS/RHEL, this patchset fixes it."
2015-12-27 12:21:29 +02:00
Glauber Costa
e299127e81 main: check if options file can be read.
If we can't open the file, we will fail with a misterious error. It is a costumary
scenario, though, since people who are unaware or have just forgotten about seastar's
restriction of direct io access may put those files in tmpfs and other mount points.

We have a direct_io check that is designed exactly for this purpose, so as to give
the user a better error message. This patch makes use of it.

Fixes #644

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2015-12-27 12:20:40 +02:00
Asias He
f57ba6902b storage_service: Introduce ring_delay_ms option
It is hard-coded as 30 seconds at the moment.

Usage:
$ scylla --ring-delay-ms 5000

Time a node waits to hear from other nodes before joining the ring in
milliseconds.

Same as -Dcassandra.ring_delay_ms in cassandra.
2015-12-25 15:08:22 +08:00
Asias He
9c07ed8db6 storage_service: Fix erase new_replica_endpoints in get_changed_ranges_for_leaving
We need to calculate begin() and end() in the loop since elements in
new_replica_endpoints might be removed.

Refs #700
2015-12-25 15:08:22 +08:00
Asias He
88846bc816 storage_service: Add more debug info in decommission
It is useful to debug decommission issue.
2015-12-25 15:08:22 +08:00
Asias He
19f1875682 gossip: Print endpoint_state_map debug info in trace level
This generates too many logs with debug level. Make it trace level.
2015-12-25 15:08:22 +08:00
Nadav Har'El
06ab43a7ee murmur3 partitioner: fix midpoint() algorithm
The midpoint() algorithm to find a token between two tokens doesn't
work correctly in case of wraparound. The code tried to handle this
case, but did it wrong. So this patch fixes the midpoint() algorithm,
and adds clearer comments about why the fixed algorithm is correct.

This patch also modifies two midpoint() tests in partitioner_test,
which were incorrect - they verified that midpoint() returns some expected
values, but expected values were wrong!

We also add to the test a more fundemental test of midpoint() correctness,
which doesn't check the midpoint against a known value (which is easy to
get wrong, like indeed happened); Rather we simply check that the midpoint
is really inside the range (according to the token ordering operator).
This simple test failed with the old implementation of midpoint() and
passes with the new one.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-24 17:19:49 +02:00
Avi Kivity
3392f02b54 Merge "Make date parser more liberal" from Paweł
"This series makes date and time parsing more liberal so that Scylla
accepts the same date formats the origin does.

Fixes #521."
2015-12-24 17:18:04 +02:00
Asias He
20c258f202 streaming: Fix session hang with maybe_completed: WAIT_COMPLETE -> WAIT_COMPLETE
The problem is that we set the session state to WAIT_COMPLETE in
send_complete_message's continuation, the peer node might send
COMPLETE_MESSAGE before we run the continuation, thus we set the wrong
status in COMPLETE_MESSAGE's handler and will not close the session.

Before:

   GOT STREAM_MUTATION_DONE
   receive  task_completed
   SEND COMPLETE_MESSAGE to 127.0.0.2:0
   GOT COMPLETE_MESSAGE, from=127.0.0.2, connecting=127.0.0.3, dst_cpu_id=0
   complete: PREPARING -> WAIT_COMPLETE
   GOT COMPLETE_MESSAGE Reply
   maybe_completed: WAIT_COMPLETE -> WAIT_COMPLETE

After:

   GOT STREAM_MUTATION_DONE
   receive  task_completed
   maybe_completed: PREPARING -> WAIT_COMPLETE
   SEND COMPLETE_MESSAGE to 127.0.0.2:0
   GOT COMPLETE_MESSAGE, from=127.0.0.2, connecting=127.0.0.3, dst_cpu_id=0
   complete: WAIT_COMPLETE -> COMPLETE
   Session with 127.0.0.2 is complete
2015-12-24 20:34:44 +08:00
Asias He
c971fad618 streaming: Introduce keep alive timer for each stream_session
If the session is idle for 10 minutes, close the session. This can
detect the following hangs:

1) if the sending node is gone, the receiving peer will wait forever
2) if the node which should send COMPLETE_MESSAGE to the peer node is
gone, the peer node will wait forever

Fixes simple_kill_streaming_node_while_bootstrapping_test.
2015-12-24 20:34:44 +08:00
Asias He
f527e07be6 streaming: Get stream_session in STREAM_MUTATION handler
Get from address from cinfo. It is needed to figure out which stream
session this mutation is belonged to, since we need to update the keep
alive timer for this stream session.
2015-12-24 20:34:44 +08:00
Asias He
d7a8c655a6 streaming: Print All sessions completed after state change message
close_session will print "All sessions completed" message, print the
state change message before that.
2015-12-24 20:34:44 +08:00
Asias He
bd276fd087 streaming: Increase retry timeout
Currently, if the node is actually down, although the streaming_timeout
is 10 seconds, the sending of the verb will return rpc_closed error
immediately, so we give up in 20 * 5 = 100 seconds. After this change,
we give up in 10 * 30 = 300 seconds at least, and 10 * (30 + 30) = 600
seconds at most.
2015-12-24 20:34:44 +08:00
Asias He
eaea09ee71 streaming: Retransmit COMPLETE_MESSAGE message
It is oneway message at the moment. If a COMPLETE_MESSAGE is lost, no
one will close the session. The first step to fix the issue is to try to
retransmit the message.
2015-12-24 20:34:44 +08:00
Asias He
d1d6395978 streaming: Print old state before setting the new state 2015-12-24 20:34:44 +08:00
Takuya ASADA
bf9547b1c4 dist: support RHEL on scylla_install
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-24 18:48:30 +09:00
Takuya ASADA
bb0880f024 dist: use /etc/os-release instead of /etc/redhat-release
Since other scripts using /etc/os-release, it is better to use same one here.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-24 18:48:30 +09:00
Takuya ASADA
b6df28f3d5 dist: use $ID instead of $NAME to detect type of distribution
$NAME is full name of distribution, for script it is too long.
$ID is shortened one, which is more useful.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-24 18:48:30 +09:00
Takuya ASADA
0a4b68d35e dist: support CentOS yum repository
Fixes #671

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-24 18:48:30 +09:00
Takuya ASADA
8f4e90b87a dist: use tsc clocksource on AMI
Stop using xen clocksource, use tsc clocksource instead.
Fixes #462

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-22 22:29:32 +02:00
Amnon Heiman
b0856f7acf API: Init value for cf_map reduce should be of type int64_t
The helper function for summing statistic over the column family are
template function that infer the return type acording to the type of the
Init param.

In the API the return value should be int64_t, passing an integer would
cause a number wrap around.

A partial output from the nodetool cfstats after the fix

nodetool cfstats keyspace1
Keyspace: keyspace1
	Read Count: 0
	Read Latency: NaN ms.
	Write Count: 4050000
	Write Latency: 0.009178098765432099 ms.
	Pending Flushes: 0
		Table: standard1
		SSTable count: 12
		Space used (live): 1118617445
		Space used (total): 23336562465

Fixes #682

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-22 17:33:13 +02:00
Tomasz Grabiec
88f5da5d1d Merge branch 'calle/paging_fixes' from seastar-dev.git
From Calle:

Fixes #589
Query should not return dangling static row in partition without any
regular/ck columns if a CK restriction is applied.

Refs #650
Fixes bug in CK range code for paging, and removes CK use for tables with not
clustering -> way simpler code. Also removed lots of workaround code no longer
required.

Note that this patch set does not fully fix #650/paging since bug #663 causes
duplicate rows. Still almost there though.
2015-12-22 11:22:42 +01:00
Avi Kivity
926d340661 logger: be robust when exceptions are thrown while stringifying args
Instead of propagating the exception, swallow it and print it out in
the log message.

Fixes #672.
2015-12-21 19:58:08 +01:00
Paweł Dziepak
cf949e98cb tests/types: add more tests for date and time parsing
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 15:34:17 +01:00
Paweł Dziepak
633a13f7b3 types: timestamp_from_string: accept more date formats
Boost::date_time doesn't accept some of the date and time formats that
the origin do (e.g. 2013-9-22 or 2013-009-22).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 15:30:35 +01:00
Calle Wilund
f118222b2d query_pagers: Remove unneeded clustering + remove static workaround
Refs #640

* Remove use of cluster key range for tables without CK
  Checking CK existance once and use the info allows us to remove some
  stupid complexity in checking for "last key" match
* With fix for #589 we can also remove some superfluous code to
  compensate for that issue, and make "partition end" simper
* Remove extra row in CK case. Not needed anymore

End result is that pager now more or less only relies on adapted query
ranges.
2015-12-21 14:19:45 +00:00
Calle Wilund
72a079d196 paging_state: Make clustering key optional 2015-12-21 14:19:45 +00:00
Calle Wilund
c868d22d0c db/serializer: Add support for optional<T> to be serialized
template spacialization.

Simply just wraps underlying type serialization and adds a "bool"
check mark first in stream.
2015-12-21 14:19:45 +00:00
Calle Wilund
803b58620f data_output: specialize serialized_size for bool to ensure sync with write 2015-12-21 14:19:45 +00:00
Paweł Dziepak
d41807cb66 types: timestamp_from_string(): restore indentation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 15:17:50 +01:00
Paweł Dziepak
873ed78358 types: catch parsing errors in timestamp_from_string()
timestamp_from_string() is used by both timestamp and date types, so it
is better to move the try { } catch { } to the functions itself instead
of expecting its callers to catch exceptions.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 15:14:36 +01:00
Takuya ASADA
0d1ef007d3 dist: skip mounting RAID if it's already mounted
On AMI, scylla-server fails to systemctl restart because scylla_prepare tries to mount /var/lib/scylla even it's already mounted.
This patch fixes the issue.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-21 15:50:09 +02:00
Avi Kivity
c3d0ae822d Merge seastar upstream
* seastar b44d729...51154f7 (6):
  > semaphore: add with_semaphore()
  > scripts: posix_net_conf.sh: don't transform wide CPU mask
  > resource: fix build for systems without HWLOC
  > build: link libasan before all other libraries
  > Use sys_membarrier() when available
  > build: add missing library (boost_filesystem)
2015-12-21 14:45:57 +02:00
Calle Wilund
8c17e9e26c mutation_partition: Do not return static row if CK range does not match
Fixes #589

If we got no rows, but have live static columns, we should only
give them back IFF we did not have any CK restrictions.
If ck:s exist, and we have a restriction on them, we either have maching
rows, or return nothing, since cql does not allow "is null".
2015-12-21 10:38:48 +00:00
Pekka Enberg
98454b13b9 cql3: Remove some ifdef'd code 2015-12-21 10:38:48 +00:00
Pekka Enberg
c6541b4cc2 cql3: Remove untranslated IMeasurableMemory code from column_identifier
We will not be using it so just remove the untranslated code.
2015-12-21 10:38:48 +00:00
Pekka Enberg
81d72afd85 cql3: Move delete_statement implementation to source file 2015-12-21 10:38:48 +00:00
Pekka Enberg
cd58ea3b96 cql3: Move modification_statement implementation to source file 2015-12-21 10:38:48 +00:00
Pekka Enberg
bcd602d3f8 cql3: Move parsed_statement implementation to source file 2015-12-21 10:38:48 +00:00
Pekka Enberg
44ba4857eb cql3: Move property_definitions implementation to source file 2015-12-21 10:38:48 +00:00
Pekka Enberg
2759473c7a cql3: Move select_statement implementation to source file 2015-12-21 10:38:48 +00:00
Pekka Enberg
7a5d6818a3 cql3: Move update_statement implementation to source file 2015-12-21 10:38:48 +00:00
Paweł Dziepak
9aa24860d7 test/sstables: add more key_reader tests
This patch introduces a test for reading keys from a single sstable with
the range begining and end being the keys present in the index summary.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 10:38:48 +00:00
Paweł Dziepak
2fd7caafa0 sstables: respect range inclusiveness in key_reader
When choosing a relevant range of buckets it wasn't taken into account
whether the range bounds are inclusive or not. That may have resulted in
more buckets being read than necessary which was a condition not
expected by the code responsible from looking for a relevant keys inside
the buckets.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 10:38:48 +00:00
Raphael S. Carvalho
d8e810686a sstables: remove outdated comment
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-21 10:38:48 +00:00
Raphael S. Carvalho
99710ae0e6 db: fix indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-21 10:38:48 +00:00
Raphael S. Carvalho
e1edc2111c sstables: fix comment describing sstable::mark_for_deletion
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-21 10:38:48 +00:00
Raphael S. Carvalho
22ac260059 db: add missing sstable::mark_for_deletion call
If a sstable doesn't belong to current shard, mark_for_deletion
should be called for the deletion manager to still work.
It doesn't mean that the sstable will be deleted, but that the
sstable is not relevant to the current shard, thus it can be
deleted by the deletion manager in the future.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-21 10:38:48 +00:00
Asias He
2d32195c32 streaming: Invalidate cache upon receiving of stream
When a node gain or regain responsibility for certain token ranges,
streaming will be performed, upon receiving of the stream data, the
row cache is invalidated for that range.

Refs #484.
2015-12-21 14:44:13 +08:00
Asias He
517fd9edd4 streaming: Add helper to get distributed<database> db 2015-12-21 14:42:47 +08:00
Asias He
d51227ad9c streaming: Remove transfer_files
It is never used.
2015-12-21 14:42:47 +08:00
Asias He
c25393a3f6 database: Add non-const version of get_row_cache
We need this to invalidate row cache of a column family.
2015-12-21 14:42:47 +08:00
Tomasz Grabiec
324ad43be1 Merge branch 'penberg/cql-cleanups/v1' from seastar-dev.git
Another round of cleanups to the CQL code from Pekka.
2015-12-18 17:36:45 +01:00
Tomasz Grabiec
0862d2f531 Merge branch 'pdziepak/fix-sstables-key_reader-663/v2'
From Paweł:

"This series fixes sstables::key_reader not respecting range inclusiveness
if the bounds were the keys that were present in the index summary.

Fixes #663."
2015-12-18 17:35:09 +01:00
Paweł Dziepak
b39d1fb1fc test/sstables: add more key_reader tests
This patch introduces a test for reading keys from a single sstable with
the range begining and end being the keys present in the index summary.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-18 17:24:29 +01:00
Paweł Dziepak
18b8d7cccc sstables: respect range inclusiveness in key_reader
When choosing a relevant range of buckets it wasn't taken into account
whether the range bounds are inclusive or not. That may have resulted in
more buckets being read than necessary which was a condition not
expected by the code responsible from looking for a relevant keys inside
the buckets.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-18 17:24:26 +01:00
Pekka Enberg
eeadf601e6 Merge "cleanups and improvements" from Raphael 2015-12-18 13:45:11 +02:00
Pekka Enberg
9521ef6402 cql3: Remove some ifdef'd code 2015-12-18 13:29:58 +02:00
Pekka Enberg
f5597968ac cql3: Remove untranslated IMeasurableMemory code from column_identifier
We will not be using it so just remove the untranslated code.
2015-12-18 13:29:58 +02:00
Pekka Enberg
b754de8f4a cql3: Move delete_statement implementation to source file 2015-12-18 13:29:58 +02:00
Pekka Enberg
227e517852 cql3: Move modification_statement implementation to source file 2015-12-18 13:29:58 +02:00
Pekka Enberg
ca963d470e cql3: Move parsed_statement implementation to source file 2015-12-18 13:07:55 +02:00
Pekka Enberg
ff994cfd39 cql3: Move property_definitions implementation to source file 2015-12-18 13:04:32 +02:00
Pekka Enberg
d7db5e91b6 cql3: Move select_statement implementation to source file 2015-12-18 12:59:22 +02:00
Pekka Enberg
8b780e3958 cql3: Move update_statement implementation to source file 2015-12-18 12:54:19 +02:00
Takuya ASADA
0f46d10011 dist: add execute permission to build_ami_local.sh
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-18 11:56:44 +02:00
Pekka Enberg
e56bf8933f Improve not implemented errors
Print out the function name where we're throwing the exception from to
make it easier to debug such exceptions.
2015-12-18 10:51:37 +01:00
Paweł Dziepak
73f9850e1c tests/key_reader: make sure that the reader lives long enough
Fixes test failure in debug mode.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-18 10:32:37 +01:00
Pekka Enberg
39af3ec190 Merge "Implement nodetool drain" from Paweł
"This series adds support for nodetool command 'drain'. The general idea
 of this command is to close all connection (both with clienst and other
 nodes) and flush all memtables to disk.

 Fixes #662."
2015-12-18 11:16:32 +02:00
Takuya ASADA
ae10d86ba4 dist: add missing building time dependencies for Ubuntu
This is necessary to make --cpuset parameter work correctly.

Fixes #554

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-18 11:16:02 +02:00
Takuya ASADA
01bd4959ac dist: downgrade g++ to 4.9 on Ubuntu
Since Ubuntu package fails to build with g++-5, we need to downgrade it.

Fixes #665

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-18 11:15:22 +02:00
Takuya ASADA
aad9c9741a dist: add hwloc as a dependency
It is required for posix_net_conf.sh

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-18 11:14:07 +02:00
Paweł Dziepak
ae3e1374b4 test.py: add missing tests
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 19:08:21 +01:00
Pekka Enberg
89dcc5dfb3 Merge "dist: provide generic Scylla setup script" from Takuya
"Merge AMI scripts to dist/common/scripts, make it usable on non-AMI
 environments. Provides a script to do all settings automatically, which
 able to run as one-liner like this:

   curl http://url_to_scylla_install | sudo bash -s -- -d /dev/xvdb,/dev/xvdc -n eth0 -l ./

 Also enables coredump, save it to /var/lib/scylla/coredump"
2015-12-17 16:01:49 +02:00
Paweł Dziepak
39a65e6294 api: enable storage_service::drain()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
9c0b7f9bbe storage_service: implement drain()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
dcbba2303e messaging_service: restore indentation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
9661d8936b messaging_service: wait for outstanding requests
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
442bc90505 compaction_manager: check whether the manager is already stopped
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
25d255390e database: add non-const getter for compaction_manager
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
31672906d3 transport: wait for outstanding requests to end during shutdown
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
8ee1a44720 storage_service: implement get_drain_progress()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:40 +01:00
Paweł Dziepak
28e6edf927 transport: ignore future when stopping the server
When the server is shutting down a flag _stopping is set and listeners
are aborted using abort_accept(), which causes accept() calls to return
failed futures. However, accept handler just checks that the flag
_stopping is set and returns which causes a failed future to be
destroyed and a warning is printed.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:40 +01:00
Takuya ASADA
f7796ef7b3 dist: host gcc-5.1.1-4.fc22.src.rpm on our S3 account, since Fedora mirror deleted it
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 12:53:32 +02:00
Takuya ASADA
9b4d0592fa dist: enable coredump, save it to /var/lib/scylla/coredump
Enables coredump, save it to /var/lib/scylla/coredump

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 18:20:27 +09:00
Takuya ASADA
d0e5f8083f dist: provide generic scylla setup script
Merge AMI scripts to dist/common/scripts, make it usable on non-AMI environments.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 18:20:03 +09:00
Takuya ASADA
768ad7c4b8 dist: add SET_NIC entry on sysconfig
Add SET_NIC parameter which is already used in scylla_prepare

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 18:19:46 +09:00
Takuya ASADA
de1277de29 dist: specify NIC ifname on sysconfig, pass it to posix_net_conf.sh
Support to specify IFNAME for posix_net_conf.sh

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 18:19:23 +09:00
Takuya ASADA
04d9a2a210 dist: add mdadm, xfsprogs on package dependencies
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 16:59:07 +09:00
Pekka Enberg
9604d55a44 Merge "Add unit test for get_restricted_ranges()" from Tomek 2015-12-17 09:14:30 +02:00
Avi Kivity
b34a1f6a84 Merge "Preliminary changes for handling of schema changes" from Tomasz
"I extracted some less controversial changes on which the schema changes series will depend
 o somewhat reduce the noise in the main series."
2015-12-16 19:08:22 +02:00
Tomasz Grabiec
e2037ebc62 schema: Fix operator==() to include missing fields 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
5a4d47aa1b schema: Remove dead code 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
7a3bae0322 schema: Add equality operators 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
f9d6c7b026 compress: Add equality operators 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
adb93ef31f types: Make name() return const& 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
f28e5f0517 tests: mutation_assertions: Make is_equal_to() check symmetricity 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
3324cf0b8c tests: mutation_reader_assertions: Introduce next_mutation() 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
ad99f89228 tests: mutation_assertion: Introduce has_schema() 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
7451ab4356 tests: mutation_assertion: Allow chaining of assertions 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
efe08a0512 tests: mutation_assertions: Own the mutation which is checked
Easier for users because they don't have to ensure liveness.
2015-12-16 18:06:55 +01:00
Tomasz Grabiec
0cdee6d1c3 tests: row_cache: Fix test_update()
The underlying data source for cache should not be the same memtable
which is later used to update the cache from. This fixes the following
assertion failure:

row_cache_test_g: utils/logalloc.hh:289: decltype(auto) logalloc::allocating_section::operator()(logalloc::region&, Func&&) [with Func = memtable::make_reader(schema_ptr, const partition_range&)::<lambda()>]: Assertion `r.reclaiming_enabled()' failed.

The problem is that when memtable is merged into cache their regions
are also merged, so locking cache's region locks the memtable region
as well.
2015-12-16 18:06:55 +01:00
Tomasz Grabiec
09188bccde mutation_query: Make reconcilable_result printable 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
dd51ff0410 query: Make query::result movable 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
872bfadb3d messaging_service: Remove unused parameters from send_migration_request() 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
157af1036b data_output: Introduce write_view() which matches data_input::read_view() 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
054187acf2 db/serializer: Introduce to_bytes/from_bytes helpers 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
e8d49a106c query_processor: Add trace-level logging of queries 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
de09c86681 data_value: Make printable 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
2ee60d8496 tests: sstable_test: Avoid throwing during expected conditions
Makes debugging easier by making 'catch throw' not stop on expected
conditions.
2015-12-16 18:06:54 +01:00
Tomasz Grabiec
ef49c95015 tests: cql_query_env: Avoid exceptions during normal execution 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
50984ad8d4 scylla-gdb.py: Allow the script to be sourced multiple times
Currently sourcing for the second time causes an exception from
pretty printer registration:

Traceback (most recent call last):
  File "./scylla-gdb.py", line 41, in <module>
    gdb.printing.register_pretty_printer(gdb.current_objfile(), build_pretty_printer())
  File "/usr/share/gdb/python/gdb/printing.py", line 152, in register_pretty_printer
    printer.name)
RuntimeError: pretty-printer already registered: scylla
2015-12-16 18:06:51 +01:00
Avi Kivity
e27a5d97f6 Merge "background mutation throttling" from Gleb
Fixes the case where background activity needed to complete CL=ONE writes
is queued up in the storage proxy, and the client adds new work faster
than it can be cleared.
2015-12-16 18:08:12 +02:00
Raphael S. Carvalho
41be378ff1 db: fix build of sstable list in column_family::compact_sstables
The last two loops were incorrectly inside the first one. That's a
bug because a new sstable may be emplaced more than once in the
sstable list, which can cause several problems. mark_for_deletion
may also be called more than once for compacted sstables, however,
it is idempotent.
Found this issue while auditing the code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-16 17:46:17 +02:00
Avi Kivity
4c84d23f3b Merge seastar upstream
"* seastar 294ea30...b44d729 (5):
  > Merge "Properly distribute IO queues" from Glauber
  > reactor: allow more poll time in virtualized environments
  > reactor: fix idle-poll limit
  > reactor: use a vector of unique_ptr for the IO queues
  > io queues: make the queues really part of the reactor"
2015-12-16 17:42:30 +02:00
Tomasz Grabiec
0d5166dcd8 tests: Add test for get_restricted_ranges() 2015-12-16 13:09:01 +01:00
Tomasz Grabiec
e445e4785c storage_proxy: Extract get_restricted_ranges() as a free function
To make it directly testable.
2015-12-16 13:09:01 +01:00
Tomasz Grabiec
756624ef18 Remove dead code 2015-12-16 13:09:01 +01:00
Tomasz Grabiec
eb27fb1f6b range: Introduce equal() 2015-12-16 13:09:01 +01:00
Calle Wilund
43929d0ec1 commitlog: Add some comments about the IO flow
Documentation.
2015-12-16 13:13:31 +02:00
Gleb Natapov
de63b3a824 storage_proxy: provide timeout for send_mutation verb
Providing timeout for send_mutation verb allows rpc to drop packets that
sit in outgoing queue for to long.
2015-12-16 10:13:46 +02:00
Gleb Natapov
fe4bc741f4 storage_proxy: throttle mutations based on ongoing background activity
With consistency level less then ALL mutation processing can move to
background (meaning client was answered, but there is still work to
do on behalf of the request). If background request rate completion
is lower than incoming request rate background request will accumulate
and eventually will exhaust all memory resources. This patch's aim is
to prevent this situation by monitoring how much memory all current
background request take and when some threshold is passed stop moving
request to background (by not replying to a client until either memory
consumptions moves below the threshold or request is fully completed).

There are two main point where each background mutation consumes memory:
holding frozen mutation until operation is complete in order to hint it
if it does not) and on rpc queue to each replica where it sits until it's
sent out on the wire. The patch accounts for both of those separately
and limits the former to be 10% of total memory and the later to be 6M.
Why 6M? The best answer I can give is why not :) But on a more serious
note the number should be small enough so that all the data can be
sent out in a reasonable amount of time and one shard is not capable to
achieve even close to a full bandwidth, so empirical evidence shows 6M
to be a good number.
2015-12-16 10:13:46 +02:00
Pekka Enberg
40e8a9c99c sstables/compaction: Fix compilation error with GCC 4.9.2
I am sure it's a compiler issue but I am not ready to give up and
upgrade just yet:

  sstables/compaction.cc:307:55: error: converting to ‘std::unordered_map<int, long int>’ from initializer list would use explicit constructor ‘std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::unordered_map(std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::size_type, const hasher&, const key_equal&, const allocator_type&) [with _Key = int; _Tp = long int; _Hash = std::hash<int>; _Pred = std::equal_to<int>; _Alloc = std::allocator<std::pair<const int, long int> >; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::size_type = long unsigned int; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::hasher = std::hash<int>; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::key_equal = std::equal_to<int>; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::allocator_type = std::allocator<std::pair<const int, long int> >]’
                 stats->start_size, stats->end_size, {});
2015-12-16 10:03:14 +02:00
Raphael S. Carvalho
36d31a5dab fix cql_query_test
Test was failing because _qp (distributed<cql3::query_processor>) was stopped
before _db (distributed<database>).
Compaction manager is member of database, and when database is stopped,
compaction manager is also stopped. After a2fb0ec9a, compaction updates the
system table compaction history, and that requires a working query context.
We cannot simply move _qp->stop() to after _db->stop() because the former
relies on migration_manager and storage_proxy. So the most obvious fix is to
clean the global variable that stores query context after _qp was stopped.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-16 09:58:46 +02:00
Nadav Har'El
63c0906b16 messaging_service: drop unnecessary explicit templates
The previous patch added message_service read()/write() support for all
types which know how to serialize themselves through our "old" serialization
API (serialize()/deserialize()/serialized_size()).

So we no longer need the almost 200 lines of repetitive code in
messaging_service.{cc,hh} which defined these read/write templates
separately for a dozen different types using their *serialize() methods.
We also no longer need the helper functions read_gms()/write_gms(), which
are basically the same code as that in the template functions added in the
previous patch.

Compilation is not significantly slowed down by this patch, because it
merely replaces a dozen templates by one template that covers them all -
it does not add new template complexity, and these templates are anyway
instantiated only in messaging_service.cc (other code only calls specific
functions defined in messaging_service.cc, and does not use these templates).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-15 19:07:05 +02:00
Nadav Har'El
438f6b79f7 messaging_service: allow any self-serializing type
Currently, messaging_service only supports sending types for which a read/
write function has been explicitly implemented in messageing_service.hh/cc.

Some types already have serialization/deserialization methods inside them,
and those could have been used for the serialization without having to write
new functions for each of these types. Many of these types were already
supported explicitly in messaging_service.{cc,hh}, but some were forgot -
for example, dht::token.

So this patch adds a default implemention of messaging_service write()/read()
which will work for any type which has these serialization methods.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-15 19:07:05 +02:00
Tomasz Grabiec
a78f4656e8 Introduce ring_position_less_comparator 2015-12-15 18:00:55 +01:00
Avi Kivity
8fc7583224 Merge seastar upstream
* seastar 5b9e3da...294ea30 (9):
  > Merge "IO queues" from Glauber
  > reactor: increment check_direct_io_support to also deal with files
  > Merge "SSL/TLS initial certificate validation" from Calle
  > tutorial.md: remove inaccurate statements about x86
  > build: verify that the installed compiler is up to date
  > build: complain if fossil version of gnutls is installed
  > build: fix debian naming of gnutls-devel package
  > build: add configure-time check for gnutls-devel
  > tutorial.md: introduction to asynchrnous programming
2015-12-15 16:50:16 +02:00
Gleb Natapov
e43ae7521f storage_proxy: unfuturize send_to_live_endpoints()
send_to_live_endpoints() is never waited upon, it does its job in the
background. This patch formalize that by changing return value to void
and also refactoring code so that frozen_mutation shared pointer is not
held more that it should: currently it is held until send_mutation()
completes, but since send_mutation() does not use frozen_mutation
asynchronously this is not necessary.
2015-12-15 15:40:36 +02:00
Tomasz Grabiec
305c2b0880 frozen_mutation: Introduce decorated_key() helper
Requested by Asias for use in streaming code.
2015-12-15 15:16:04 +02:00
Tomasz Grabiec
179b587d62 Abstract timestamp creation behind new_timestamp()
Replace db_clock::now_in_usec() and db_clock::now() * 1000 accesses
where the intent is to create a new auto-generate cell timestamp with
a call to new_timestamp(). Now the knowledge of how to create timestamps
is in a single place.
2015-12-15 15:16:04 +02:00
Avi Kivity
8abd013601 Merge 2015-12-15 15:00:49 +02:00
Avi Kivity
fb8a4f6c1b Merge " implement get_compactions API" from Raphael
"get_compactions returns progress information for each compaction
running in the system. It can be accessed using swagger UI.
'nodetool compactionstats' is not working yet because of some
pending work in the nodetool side."
2015-12-15 14:59:49 +02:00
Paweł Dziepak
71f92c4d14 mutation_partition: do not move rows_entry::_link
Apparently, link hook copy constructor is a no-op and move contructor
doesn't exist so the code is correct, but that explicit move makes code
needlessly confusing.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-15 13:22:23 +01:00
Paweł Dziepak
59245e7913 row_cache: add functions for invalidating entries in cache
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-15 13:21:11 +01:00
Raphael S. Carvalho
833a78e9f7 api: implement get_compactions
get_compactions returns progress information about each
ongoing compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:36 -02:00
Raphael S. Carvalho
193ede68f3 compaction: register and deregister compaction_stats
That's important for compaction stats API that will need stats
data of each ongoing compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:32 -02:00
Raphael S. Carvalho
e74dcc86bd compaction_manager: introduce list of compaction_stats
This list will store compaction_stats for each ongoing compaction.
That's why register and deregister methods are provided.
This change is important for compaction stats API that needs data
of each ongoing compaction, such as progress, ks, cf, etc.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:28 -02:00
Raphael S. Carvalho
a26fb15d1a db: add method to get compaction manager from cf
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:20 -02:00
Raphael S. Carvalho
1fba394dd0 sstables: store keyspace and cf in compaction_stats
The reason behind this change is that we will need ks and cf
for the compaction stats API.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:02 -02:00
Raphael S. Carvalho
ac1a67c8bc sstables: move compaction_stats to header file
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:49:45 -02:00
Avi Kivity
a2eac711cf Merge "compaction history support" from Raphael
"This patchset will make Scylla update the system table
COMPACTION_HISTORY whenever a compaction job finishes.
Functions were added to both update and retrieve the
content of this system table. Compaction history API
is also enabled in this series."
2015-12-15 13:22:14 +02:00
Raphael S. Carvalho
87fbe29cf9 api: add support to compaction history
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:00:21 -02:00
Takuya ASADA
9c5afb8e58 dist: add scylla-gdb.py on scylla-server-debuginfo rpm package
It will place at /usr/src/debug/scylla-server-development/scylla-gdb.py
Fixes #604

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-15 12:13:17 +02:00
Raphael S. Carvalho
a2fb0ec9a3 sstables: update compaction history at the end of compaction
When compaction job finishes, call function to update the system
table COMPACTION_HISTORY. That's also needed for the compaction
history API.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 14:20:03 -02:00
Raphael S. Carvalho
433ed60ca3 db: add method to get compaction history
This method is intended to return content of the system table
COMPACTION_HISTORY as a vector of compaction_history_entry.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 14:19:04 -02:00
Raphael S. Carvalho
f3beacac28 db: add method to update the system table COMPACTION_HISTORY
It's supposed to be called at the end of compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 13:47:10 -02:00
Raphael S. Carvalho
0fa194c844 sstables: remove outdated comment
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 12:43:53 -02:00
Raphael S. Carvalho
6142efaedb db: fix indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 12:43:34 -02:00
Raphael S. Carvalho
81f5b1716e sstables: fix comment describing sstable::mark_for_deletion
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 12:43:11 -02:00
Raphael S. Carvalho
7bbc1b49b6 db: add missing sstable::mark_for_deletion call
If a sstable doesn't belong to current shard, mark_for_deletion
should be called for the deletion manager to still work.
It doesn't mean that the sstable will be deleted, but that the
sstable is not relevant to the current shard, thus it can be
deleted by the deletion manager in the future.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 12:42:26 -02:00
Tomasz Grabiec
0865ecde17 storage_proxy: Fix range splitting
There is a check whose intent was to detect wrap around during walk of
the ring tokens by comparing the split point with minimum token, which
is supposed to be inserted by the ring iterator. It assumed that when
we encounter it, the range is a wrap around. It doesn't hold when
minimum token is part of the token metadata or set of tokens is empty.

In such case, a full range would be split into 3 overlapping full
ranges. The fix is to drop the assumption and instead ensure that
ranges do not wrap around by unwrapping them if necessary.

Fixes #655.
2015-12-14 16:05:54 +02:00
Takuya ASADA
3b7693feda dist: add package dependency to gnutls library
Now Seastar depends to gnutls, we need to add it on .rpm/.deb package dependency.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-14 13:28:28 +02:00
Pekka Enberg
ba09c545fc dist/docker: Enable SMP support
Now that Scylla has a sleep mode, we can enable SMP support again.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-12-14 13:23:30 +02:00
Avi Kivity
fd14cb3743 mutation_partition: fix leak in move assignment operator
The default move assignment operator calls boost::intrusive::set's move
assignment operator, which leaks, because it does not believe it owns
the data.

Fix by providing a custom implementation.
2015-12-14 10:33:19 +01:00
Asias He
9781e0d34d storage_service: Make bootstrapping/leaving/moving log more consistent
It is useful for test code to grep the log.
2015-12-11 13:57:40 +02:00
Tomasz Grabiec
1991fd5ca2 Merge branch 'pdziepak/fix-clustering-key-comparison/v2' fom seastar-dev.git
From Paweł:

This series fixes comparison of byte order comparable clustering keys.

Fixes #645.
2015-12-11 12:51:02 +01:00
Paweł Dziepak
3a73496817 tests/cql: add test for ordering clustering keys
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-11 12:05:25 +01:00
Paweł Dziepak
8cab343895 compound: fix compare() of prefixable types
All components of prefixable compound type are preceeded by their
length what makes them not byte order comparable.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-11 12:04:31 +01:00
Paweł Dziepak
8fd4b9f911 schema: remove _clustering_key_prefix_type
All clustering keys are now prefixable.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-11 10:47:24 +01:00
Paweł Dziepak
bb9a71f70c thrift: let class_from_compound_type() accept prefixable types
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-11 10:45:56 +01:00
Pekka Enberg
0d8a02453e types: Fix frozen collection type names
Frozen collection type names must be wrapped in FrozenType so that we
are able to store the types correctly in system tables.

This fixes #646 and fixes #580.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-12-11 10:41:11 +01:00
Pekka Enberg
63bdeb65f2 cql3: Implement maps::literal::test_assignment() function
The test_assignement() function is invoked via the Cassandra unit tests
so we might as well implement it.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-12-11 09:35:13 +01:00
Asias He
57ee9676c2 storage_service: Fix default ring_delay time
It is 30 seconds instead of 5 seconds by default. To align with c*.

Pleas note, after this a node will takes at least 30 seconds to complete
a bootstrap.
2015-12-11 09:05:19 +02:00
Avi Kivity
b3cd672d97 Merge seastar upstream
* seastar ad07a2e...5b9e3da (2):
  > Merge "rpc cleanups and improvements" from Gleb
  > shared_future: Add missing include
2015-12-10 18:11:59 +02:00
Paweł Dziepak
9d482532f4 tests/lsa: reduce the size of large allocation
Originally, large allocation test case attempted to allocate an object
as big as halft of the space used by the lsa. That failed when the test
was executed with lower amount of memory available mainly due to the
memory fragmentation caused by previous test cases.

This patches reduces the size of the large allocation to 3/8 of the
total space used by the lsa which is still a lot but seems to make the
test pass even with as little memory as 64MB per shard.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 13:16:43 +01:00
Avi Kivity
d425aacaeb release: copy version string into heap
If we get a core dump from a user, it is important to be able to
identify its version.  Copy the release string into the heap (which is
copied into the code dump), so we can search for it using the "strings"
or "ident" commands.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
2015-12-10 13:12:40 +02:00
Lucas Meneghel Rodrigues
2167173251 utils/logalloc.cc - Declare member minimum_size from segment_zone struct
This fixes compile error:

In function `logalloc::segment_zone::segment_zone()':
/home/lmr/Code/scylla/utils/logalloc.cc:412: undefined reference to `logalloc::segment_zone::minimum_size'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
2015-12-10 12:54:34 +02:00
Asias He
b7d10b710e streaming: Propagate fail to send PREPARE_DONE_MESSAGE exception
Otherwise the stream_plan will not be marked as failed state.
2015-12-10 12:38:00 +02:00
Paweł Dziepak
ec453c5037 managed_bytes: fix potentially unaligned accesses
blob_storage defined with attribute packed which makes its alignment
requirement equal 1. This means that its members may be unaligned.
GCC is obviously aware of that and will generate appropriate code
(and not generate ubsan checks). However, there are few places where
members of blob_storage are accessed via pointers, these have to be
wrapped by unaligned_cast<> to let the compiler know that the location
pointed to may be not aligned properly.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 11:59:54 +02:00
Tomasz Grabiec
43498b3158 Merge branch 'pdziepak/fix-partial-clustering-keys/v1' from seastar-dev.git
Form Paweł:

This series fixes support for clustering keys which trailing components
are null. The solution is to use clustering_key_prefix instead of
clustering_key everywhere.

Fixes #515.
2015-12-10 10:43:12 +01:00
Paweł Dziepak
66ff1421f0 tests/cql: add test for clustering keys with empty components
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:47:07 +01:00
Paweł Dziepak
64f50a4f40 db: make clustering_key a prefix
Schemas using compact storage can have clustering keys with the trailing
components not set and effectively being a clustering key prefixes
instead of full clustering keys.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:47 +01:00
Paweł Dziepak
77c7ed6cc5 keys: add prefix_equality_less_compare for prefixes
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Paweł Dziepak
220a3b23c0 keys: allow creating partial views of prefixes
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Paweł Dziepak
3c16ab080a sstables: do not assume clustering_key has the proper format
In case of non-compound dense tables the column name is just the value
of the clustering key (which has only one component). Current code just
casts clustering_key to bytes_view which works because there is no
additional metadata in single element clustering keys.
However, that may change when the internal representation of clustering
key is changed so explicitly extract the proper component.

This change will become necessary when clustering_key is replaced by
clustering_key_prefix.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Paweł Dziepak
5f1e9fd88f mutation_partition: remove unused find_entry()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Paweł Dziepak
3287022000 cql3: do not assume that clustering key is full
In case of schemas that use compact storage it is possible that trailing
components of clustering keys are not set.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Avi Kivity
167addbfe1 main: remove issue #417 (poll mode) warning
Fixed.
2015-12-09 19:00:32 +02:00
Avi Kivity
a352d63bf9 Merge seastar upstream
* seastar c5e595b...ad07a2e (1):
  > reactor: add command line option to disable sleep mode

Fixes #417
2015-12-09 19:00:20 +02:00
Glauber Costa
3c988e8240 perf_sstable: use current scylla default directory
When this tool was written, we were still using /var/lib/cassandra as a default
location. We should update it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2015-12-09 17:46:31 +02:00
Avi Kivity
01c3670def Merge seastar upstream
* seastar 5dc22fa...c5e595b (3):
  > memory: be less strict about NUMA bindings
  > reactor: let the resource code specify the default memory reserve
  > resource: reserve even more memory when hwloc is compiled in

Fixes #642
2015-12-09 16:47:47 +02:00
Asias He
66938ac129 streaming: Add retransmit logic for streaming verbs
Retransmit streaming related verbs and give up in 5 minutes.

Tested with:

  lein test :only cassandra.batch-test/batch-halves-decommission

Fixes #568.
2015-12-09 15:12:36 +02:00
Avi Kivity
14794af260 Merge seastar upstream
* seastar 9f9182e...5dc22fa (1):
  > future: add repeat_until_value(): repeat an action until it returns a value
2015-12-09 15:11:59 +02:00
Avi Kivity
213700e42f Merge seastar upstream
* seastar d40453b...9f9182e (5):
  > Merge "Sleep mode support"
  > future: add futurize<T>::from_tuple(tuple<T>)
  > tls: Add missing destructor for dh_params::impl, fixes ASAN error
  > tls/socket fix: Add missing noexcept to constructor/move
  > Merge "Initial SSL/TLS socket support" from Calle
2015-12-09 11:01:13 +02:00
Avi Kivity
204610ac61 Merge "Make LSA more large-allocation-friendly" from Paweł
"This series attempts to make LSA more friendly for large (i.e. bigger
than LSA segment) allocations. It is achieved by introducing segment
zones – large, contiguous areas of segments and using them to allocate
segments instead of calling malloc() directly.
Zones can be shrunk when needed to reclaim memory and segments can be
migrated either to reduce number of zone or to defragment one in order
to be able to shrink it. LSA tries to keep all segments at the lower
addresses and reclaims memory starting from the zones in the highest
parts of the address space."
2015-12-09 10:49:23 +02:00
Avi Kivity
883074e936 Merge "Fix replace_node support" from Asias
Also:

[PATCH scylla v1 0/7] gossip mark node down fix + cleanup
[PATCH scylla v1 0/2] Refuse decommissioned node to rejoin
[PATCH scylla] storage_service: Fix added node not showing up in nodetool in status joining
2015-12-09 10:42:52 +02:00
Paweł Dziepak
8ba66bb75d managed_bytes: fix copy size in move constructor
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-09 10:38:28 +02:00
Asias He
b63d49c773 storage_service: Log removing replaced endpoint from system.peers
This info is important when replacing a node. Useful for debugging.
2015-12-09 12:30:52 +08:00
Asias He
d26c7e671d storage_service: Enable commented out code in handle_state_normal
Add current_owner to endpoints_to_remove if endpoint and current_owner
have the same token and endpoint is newer than current_owner.
2015-12-09 12:30:52 +08:00
Asias He
3793bb7be1 token_metadata: Add get_endpoint_to_token_map_for_reading 2015-12-09 12:30:52 +08:00
Asias He
1cc7887ffb token_metadata: Do nothing if tokens is empty.
When replacing a node, we might ignore the tokens so that the tokens is
empty. In this case, we will have

   std::unordered_map<inet_address, std::unordered_set<token>> = {ip, {}}

passed to token_metadata::update_normal_tokens(std::unordered_map<inet_address,
std::unordered_set<token>>& endpoint_tokens)

and hit the assert

   assert(!tokens.empty());
2015-12-09 12:30:52 +08:00
Asias He
e79c85964f system_keyspace: Flush system.peers in remove_endpoint
1) Start node 1, node 2, node 3
2) Stop  node 3
3) Start node 4 to replace node 3
4) Kill  node 4 (removal of node 3 in system.peers is not flushed to disk)
5) Start node 4 (will load node 3's token and host_id info in bootup)

This makes

   "Token .* changing ownership from 127.0.0.3 to 127.0.0.4"

messages printed again in step 5) which are not expected, which fails the dtest

   FAIL: replace_first_boot_test (replace_address_test.TestReplaceAddress)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "scylla-dtest/replace_address_test.py",
   line 220, in replace_first_boot_test
       self.assertEqual(len(movedTokensList), numNodes)
   AssertionError: 512 != 256
2015-12-09 12:30:52 +08:00
Asias He
110a18987e token_metadata: Print Token changing ownership from
Needed by test.
2015-12-09 12:30:52 +08:00
Asias He
906f670a86 gossip: Print node status in handle_major_state_change
It is useful to know the STATUS value when debugging.
2015-12-09 12:29:15 +08:00
Asias He
a0325a5528 gossip: Simplify is_shutdown and friends.
Use the newly added helper get_gossip_status.
2015-12-09 12:29:15 +08:00
Asias He
9d4382c626 gossip: Introduce get_gossip_status
Get value of application_state::STATUS.
2015-12-09 12:29:15 +08:00
Asias He
5a65d8bcdd gossip: Fix endless marking a node down
In commit 56df32ba56 (gossip: Mark node as
dead even if already left). A node liveness check is missed.

Fix it up.

Before: (mark a node down multiple times)

[Tue Dec  8 12:16:33 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:16:33 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead
[Tue Dec  8 12:16:34 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:16:34 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead
[Tue Dec  8 12:16:35 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:16:35 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead
[Tue Dec  8 12:16:36 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:16:36 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead

After: (mark a node down only one time)

[Tue Dec  8 12:28:36 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:28:36 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead
2015-12-09 12:29:15 +08:00
Asias He
fa3c84db10 gossip: Kill default constructor for versioned_value
The only reason we needed it is to make
   _application_state[key] = value
work.

With the current default constructor, we increase the version number
needlessly. To fix and to be safe, remove the default constructor
completely.
2015-12-09 12:29:15 +08:00
Asias He
52a5e954f9 gossip: Pass const ref for versioned_value in on_change and before_change 2015-12-09 12:29:15 +08:00
Asias He
3308430343 storage_service: Make before_change and on_change log print more informative
- Make before_change and on_change print the versioned_value
- Print endpoint address first in handle_state_* and
  on_change and friends.
2015-12-09 12:29:15 +08:00
Asias He
ccbd801f40 storage_service: Fix decommissioned nodes are willing to rejoin the cluster if restarted
Backport: CASSANDRA-8801

a53a6ce Decommissioned nodes will not rejoin the cluster.

Tested with:
topology_test.py:TestTopology.decommissioned_node_cant_rejoin_test
2015-12-09 10:43:51 +08:00
Asias He
b3dd2d976a storage_service: Simplify prepare_to_join with seastar thread 2015-12-09 10:43:51 +08:00
Asias He
e9a4d93d1b storage_service: Fix added node not showing up in nodetool in status joining
The get_token_endpoint API should return a map of tokens to endpoints,
including the bootstrapping ones.

Use get_local_storage_service().get_token_to_endpoint_map() for it.

$ nodetool -p 7100 status

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns    Host ID Rack
UN  127.0.0.1  12645      256     ?  eac5b6cf-5fda-4447-8104-a7bf3b773aba  rack1
UN  127.0.0.2  12635      256     ?  2ad1b7df-c8ad-4cbc-b1f1-059121d2f0c7  rack1
UN  127.0.0.3  12624      256     ?  61f82ea7-637d-4083-acc9-567e0c01b490  rack1
UJ  127.0.0.4  ?          256     ?  ced2725e-a5a4-4ac3-86de-e1c66cecfb8d  rack1

Fixes #617
2015-12-09 10:43:51 +08:00
Paweł Dziepak
63bdf52803 tests/lsa: add large allocations test
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 23:56:46 +01:00
Tomasz Grabiec
d68a8b5349 Merge branch 'dev/amnon/index_summary_size_v2' from seastar-dev.git
API for getting sstable index summary memory footprint from Amnon
2015-12-08 20:03:39 +01:00
Paweł Dziepak
73a1213160 scylla-gdb.py: print lsa zones
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
0d66300d43 lsa: add more counters
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
83b004b2fb lsa: avoid fragmenting memory
Originally, lsa allocated each segment independently what could result
in high memory fragmentation. As a result many compaction and eviction
passes may be needed to release a sufficiently big contiguous memory
block.

These problems are solved by introduction of segment zones, contiguous
groups of segments. All segments are allocated from zones and the
algorithm tries to keep the number of zones to a minimum. Moreover,
segments can be migrated between zones or inside a zone in order to deal
with fragmentation inside zone.

Segment zones can be shrunk but cannot grow. Segment pool keeps a tree
containing all zones ordered by their base addresses. This tree is used
only by the memory reclamer. There is also a list of zones that have
at least one free segments that is used during allocation.

Segment allocation doesn't have any preferences which segment (and zone)
to choose. Each zone contains a free list of unused segments. If there
are no zones with free segments a new one is created.

Segment reclamation migrates segments from the zones higher in memory
to the ones at lower addresses. The remaining zones are shrunk until the
requested number of segments is reclaimed.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
6c4a54fb0b tests: add tests for utils::dynamic_bitset
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
2fb14a10b6 utils: add dynamic_bitset
A dynamic bitset implementation that provides functions to search for
both set and cleared bits in both directions.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
40dda261f2 lsa: maintain segment to region mapping
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
c4e71bac7f tests/row_cache_alloc_stress: make sure that allocation fails
Currently test case "Testing reading when memory can't be reclaimed."
assumes that the allocation section used by row cache upon entering
will require more free memory than there is available (inc. evictable).
However, the reserves used by allocation section are adjusted
dynamically and depend solely on previous events. In other words there
is no guarantee that the reserve would be increased so much that the
allocation will fail.

The problem is solved by adding another allocation that is guaranteed
to be bigger than all evictable and free memory.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
2e94086a2c lsa: use bi::list to implement segment_stack
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Tomasz Grabiec
6ead7a0ec5 Merge tag 'large-blobs/v3' from git@github.com:avikivity/scylla.git
Scattering of blobs from Avi:

This patchset converts the stack to scatter managed_bytes in lsa memory,
allowing large blobs (and collections) to be stored in memtable and cache.
Outside memtable/cache, they are still stored sequentially, but it is assumed
that the number of transient objects is bounded.

The approach taken here is to scatter managed_bytes data in multiple
blob_storage objects, but to linearize them back when accessing (for
example, to merge cells).  This allows simple access through the normal
bytes_view.  It causes an extra two copies, but copying a megabyte twice
is cheap compared to accessing a megabyte's worth of small cells, so
per-byte throughput is increased.

Testing show that lsa large object space is kept at zero, but throughput
is bad because Scylla easily overwhelms the disk with large blobs; we'll
need Glauber's throttling patches or a really fast disk to see good
throughput with this.
2015-12-08 19:15:13 +01:00
Avi Kivity
5c5331d910 tests: test large blobs in memtables 2015-12-08 15:17:09 +02:00
Avi Kivity
0c2fba7e0b lsa: advertize our preferred maximum allocation size
Let managed_bytes know that allocating below a tenth of the segment size is
the right thing to do.
2015-12-08 15:17:09 +02:00
Avi Kivity
f9e2a9a086 mutation_partition: work on linearized atomic_cell_or_mutation objects
Ensure that when we examine atomic_cell_or_mutation objects for merging,
that they are contiguous in memory.  When we are done we scatter them again.
2015-12-08 15:17:09 +02:00
Avi Kivity
ad975ad629 atomic_cell_or_collection: linearize(), unlinearize()
Add linearize() and unlinearize() methods that allow making an
atomic_cell_or_collection object temporarily contiguous, so we can examine
it as a bytes_view.
2015-12-08 15:17:09 +02:00
Avi Kivity
13324607e6 managed_bytes: conform to allocation_strategy's max_preferred_allocation_size
Instead of allocating a single blob_storage, chain multiple blob_storage
objects in a list, each limited not to exceed the allocation_strategy's
max_preferred_allocation_size.  This allows lsa to allocate each blob_storage
object as an lsa managed object that can be migrated in memory.

Also provide linearize()/scatter() methods that can be used to temporarily
consolidate the storage into a single blob_storage.  This makes the data
contiguous, so we can use a regular bytes_view to examine it.
2015-12-08 15:17:08 +02:00
Takuya ASADA
8c98e239d0 dist: use /etc/scylla as SCYLLA_CONF directory on AMI
We don't need copy /var/lib/scylla/conf on RAID anymore, it moved to /etc/scylla.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-08 11:09:12 +02:00
Avi Kivity
098136f4ab Merge "Convert serialization of query::result to use db::serializer<>" from Tomasz
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
2015-12-07 16:53:34 +02:00
Amnon Heiman
3ce7fa181c API: Add the implementation for index_summary_off_heap_memory
This adds the implementation for the index_summary_off_heap_memory for a
single column family and for all of them.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-07 15:15:39 +02:00
Amnon Heiman
e786f1d02f sstable: Add get_summary function
The get_summary method returns a const reference to the summary object.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-07 14:52:18 +02:00
Amnon Heiman
bae286a5b4 Add memory_footprint method to summary_ka
Similiar to origin, off heap memory, memory_footprint is the size of
queus multiply by the structure size.

memory_footprint is used by the API to report the memory that is taken
by the summary.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-07 14:52:18 +02:00
Amnon Heiman
2086c651ba column_family: get_snapshot_details should return empty map for no snapshots
If there is no snapshot directory for the specific column family,
get_snapshot_details should return an empty map.

This patch check that a directory exists before trying to iterate over
it.

Fixes #619

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-07 12:51:04 +01:00
Tomasz Grabiec
b43b5af894 Merge tag 'tgrabiec/make-future-values-nothrow-move-constructible-v3' from seastar-dev.git
Seastar's future<> now requires types to be nothrow move
constructible. This series makes Scylla code comply.
2015-12-07 10:43:18 +01:00
Tomasz Grabiec
95f515a6bd Move seastar submodule head
Scylla changes:
  sstable.cc: Remove file_exists() function which conflicts with seastar's

Amnon Heiman (2):
      reactor: Add file_exists method
      Add a wrapper for file_exists

Avi Kivity (2):
      Merge "Introduce shared_future" from Tomasz
      Merge ""scripts: a few fixes in posix_net_conf.sh" from Vlad

Gleb Natapov (3):
      rpc: not stop client in error state
      avoid allocation in parallel_for_each is there is nothing to do
      memory: fix size_to_idx calculation

Nadav Har'El (1):
      test: fix use-after-free in timertest

Pawe�� Dziepak (1):
      memory: use size instead of old_size to shrink memory block

Tomasz Grabiec (7):
      file: Mark move constructor as noexcept
      core: future: Add static asserts about type's noexcept guarantees
      core: future: Drop now redundant move_noexcept flag
      core: future_state: Make state getters non-destructive for non-rvalue-refs
      core: future: Make get_available_state() noexcept
      core: Introduce shared_future
      Make json_return_type movable

Vlad Zolotarov (8):
      scripts: posix_net_conf.sh: ban NIC IRQs from being moved by irqbalance
      scripts: posix_net_conf.sh: exclude CPU0 siblings from RPS
      scripts: posix_net_conf.sh: Configure XPS
      scripts: posix_net_conf.sh: Add a new mode for MQ NICs
      scripts: posix_net_conf.sh: increase some backlog sizes
      core: to_sstring(): cleanup
      core: to_sstring_strintf(): always use %g(or %lg) format for floating point values
      core: prevent explicit calls for to_sstring_sprintf()
2015-12-07 10:41:39 +01:00
Glauber Costa
79e70568d7 scylla-setup: do not add discard to the command line
In a recent discussion with the XFS developers, Dave Chinner recommended
us *not* to use discard, but rather issue fstrims explicitly. In machines
like Amazon's c3-class, the situation is made worse by the fact that discard
is not supported by the disk. Contrary to my intuition, adding the discard
mount option in such situation is *not* a nop and will just create load
for no reason.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-12-07 11:22:27 +02:00
Tomasz Grabiec
934d3f06d1 api: Make histogram reduction work on domain value instead of json objects
Objects extending json_base are not movable, so we won't be able to
pass them via future<>, which will assert that types are nothrow move
constructible.

This problem only affects httpd::utils_json::histogram, which is used
in map-reduce. This patch changes the aggregation to work on domain
value (utils::ihistrogram) instead of json objects.
2015-12-07 09:50:28 +01:00
Tomasz Grabiec
c0ac7b3a73 commitlog: Wrap subscription in a unique_ptr<> to make it nothrow movable
future<> will require nothrow move constructible types.
2015-12-07 09:50:28 +01:00
Tomasz Grabiec
657841922a Mark move constructors noexcept when possible 2015-12-07 09:50:27 +01:00
Tomasz Grabiec
fdc28a73f8 thrift: Make with_cob() handle not nothrow move constructible types 2015-12-07 09:50:27 +01:00
Tomasz Grabiec
538de7222a Introduce noexcept_traits 2015-12-07 09:50:27 +01:00
Tomasz Grabiec
bc23ebcbc3 schema_tables: Replace schema_result::value_type with equivalent movable type
future<> requires and will assert nothrow move constructible types.
2015-12-07 09:50:27 +01:00
Avi Kivity
91c2af2803 Merge "nodetool removenode fix + cleanup" from Asias 2015-12-07 10:41:51 +02:00
Takuya ASADA
2891291ad1 dist: add swagger-ui and api-doc on ubuntu package
Fixes .deb part of #520

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-07 10:39:59 +02:00
Takuya ASADA
3f0ca277e5 dist: add swagger-ui and api-doc on rpm package
Fixes .rpm part of #520

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-07 10:39:59 +02:00
Avi Kivity
2437fc956c allocation_strategy: expose preferred allocation size limit
Our premier allocation_strategy, lsa, prefers to limit allocations below
a tenth of the segment size so they can be moved around; larger allocations
are pinned and can cause memory fragmentation.

Provide an API so that objects can query for this preferred size limit.

For now, lsa is not updated to expose its own limit; this will be done
after the full stack is updated to make use of the limit, or intermediate
steps will not work correctly.
2015-12-06 16:23:42 +02:00
Vlad Zolotarov
564cb2bcd1 gms::versioned_value: don't use to_sstring_sprintf() directly
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-12-06 12:24:54 +02:00
Raphael S. Carvalho
d435ca7da6 enable more logging for leveled compaction strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-06 11:36:50 +02:00
Pekka Enberg
a95a7294ef types: Fix 'varint' type value compatibility check
Fixes #575.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-12-04 13:25:34 +01:00
Glauber Costa
5e8249f062 commitlog: fix but preventing flushing with default max_size value
The config file expresses this number in MB, while total_memory() gives us
a quantity in bytes. This causes the commitlog not to flush until we reach
really skyhigh numbers.

While we need this fix for the short term before we cook another release,
I will note that for the mid/long term, it would be really helpful to stop
representing memory amounts as integers, and use an explicit C++ type for
those. That would have prevented this bug.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-12-04 09:29:19 +02:00
Vlad Zolotarov
cd215fc552 types: map::to_string() - non-empty implementation
Print a map in the form of [(]{ key0 : value0 }[, { keyN : valueN }]*[)]
The map is printed inside () brackets if it's frozen.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-12-03 18:46:12 +01:00
Amnon Heiman
54b4f26cb0 API: Change the compaction summary to use an object
In origin, there are two APIs to get the information about the current
running compactions. Both APIs do the string formatting.

This patch changes the API to have a single API get_compaction that
would return a list of summary object.

The jmx would do the string formatting for the two APIs.

This change gives a better API experience is it's better documented and
would make it easier to support future format changes in origin.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-03 11:57:37 +02:00
Asias He
dcb9b441ab storage_service: Fix debug build
Start non-seed with debug build I saw:

==9844==WARNING: ASan is ignoring requested __asan_handle_no_return:
stack top: 0x7ffdabd73000; bottom 0x7fe309218000; size: 0x001aa2b5b000 (114398965760)
False positive error reports may follow
For details see http://code.google.com/p/address-sanitizer/issues/detail?id=189
DEBUG [shard 0] storage_service - Starting shadow gossip round to check for endpoint collision
=================================================================
==9844==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe309219ad0 at pc 0x00000495a88e bp 0x7fe309219960 sp 0x7fe309219950
WRITE of size 8 at 0x7fe309219ad0 thread T0
    #0 0x495a88d in _Head_base<seastar::async(Func&&, Args&& ...)
       [with Func = service::storage_service::check_for_endpoint_collision()::<lambda()>; Args = {};
       futurize_t<typename std::result_of<typename std::decay<_Tp>::type(std::decay_t<Args>...)>::type> = future<>]::work*>
       /usr/include/c++/5.1.1/tuple:115
    #1 0x495a993 in _Tuple_impl<seastar::async(Func&&, Args&& ...)
       [with Func = service::storage_service::check_for_endpoint_collision()::<lambda()>; Args = {};
       futurize_t<typename std::result_of<typename std::decay<_Tp>::type(std::decay_t<Args>...)>::type> = future<>]::work*,
       std::default_delete<seastar::async(Func&&, Args&& ...) [with Func = service::storage_service::check_for_endpoint_collision()::<lambda()>;
       Args = {}; futurize_t<typename std::result_of<typename std::decay<_Tp>::type(std::decay_t<Args>...)>::type> = future<>]::work>, void>
       /usr/include/c++/5.1.1/tuple:213
    #2 0x495aa73 in tuple<seastar::async(Func&&, Args&& ...)
       [with Func = service::storage_service::check_for_endpoint_collision()::<lambda()>; Args = {};
       futurize_t<typename std::result_of<typename std::decay<_Tp>::type(std::decay_t<Args>...)>::type> = future<>]::work*,
       std::default_delete<seastar::async(Func&&, Args&& ...)
       [with Func = service::storage_service::check_for_endpoint_collision()::<lambda()>; Args = {};
       futurize_t<typename std::result_of<typename std::decay<_Tp>::type(std::decay_t<Args>...)>::type> = future<>]::work>, void>
       /usr/include/c++/5.1.1/tuple:613
    #3 0x495ab82 in unique_ptr /usr/include/c++/5.1.1/bits/unique_ptr.h:206
    ...
    #16 0x4d44c8e in _M_invoke /usr/include/c++/5.1.1/functional:1871
    #17 0x5d2fb7 in std::function<void ()>::operator()() const /usr/include/c++/5.1.1/functional:2271
    #18 0x8a1e70 in seastar::thread_context::main() core/thread.cc:139
    #19 0x8a1d89 in seastar::thread_context::s_main(unsigned int, unsigned int) core/thread.cc:130
    #20 0x7fe311b6cf0f  (/lib64/libc.so.6+0x48f0f)

I'm not sure why this patch helps. Perhaps the exception makes ASAN unhappy.

Anyway, this patch makes the debug build work again.

Fixes #613.
2015-12-03 10:42:11 +02:00
Tomasz Grabiec
d64db98943 query: Convert serialization of query::result to use db::serializer<>
That's what we're trying to standardize on.

This patch also fixes an issue with current query::result::serialize()
not being const-qualified, because it modifies the
buffer. messaging_service did a const cast to work this around, which
is not safe.
2015-12-03 09:19:11 +01:00
Tomasz Grabiec
d4d3a5b620 bytes_ostream: Make size_type and value_type public 2015-12-03 09:19:11 +01:00
Tomasz Grabiec
96d215168e Merge tag 'asias/gossip_start_stop/fix/v1' from seastar-dev.git
Fixes for issues in tests from Asias.
2015-12-03 09:10:55 +01:00
Tomasz Grabiec
f0cfa61968 Relax header dependencies 2015-12-03 09:10:02 +01:00
Tomasz Grabiec
9e0c498425 Merge branch 'dev/amnon/latency_clock_v2'
From Amnon:

After this series an example run of cfhistograms report a maximal 0.5s latency
as it should
2015-12-02 19:58:43 +01:00
Amnon Heiman
1812fe9e70 API: Add the get_version to messaging_service swagger definition file
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-02 14:45:44 +02:00
Amnon Heiman
ae53604ed7 API: Add the get_version implementation to messaging service
This patch adds the implementation to the get_version.
After this patch the following url will be available:
messaging_service/version?addr=127.0.0.1

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-02 13:29:40 +02:00
Avi Kivity
53e3e79349 Merge "API: Stubing the compaction manager" from Amnon
"This series allows the compaction manager to be used by the nodetool as a stub implementation.

It has two changes:
* Add to the compaction manager API a method that returns a compaction info
object

* Stub all the compaction method so that it will create an unimplemented
warning but will not fail, the API implementation will be reverted when the
work on compaction will be completed."
2015-12-02 13:28:34 +02:00
Takuya ASADA
871bfb1c94 dist: generate correct distribution codename on debian/changelog
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-02 12:38:52 +02:00
Takuya ASADA
b61ea247d2 dist: check supported Ubuntu release
Warn if unsupported release.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-02 12:38:52 +02:00
Takuya ASADA
0c66c25250 dist: fix typo on scylla_prepare
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-02 11:30:15 +02:00
Asias He
3004866f59 gossip: Rename start to start_gossiping
So that we have a more consistent name start_gossiping() and
stop_gossiping() and it will not confuse with get_gossiper.start().
2015-12-02 16:50:34 +08:00
Asias He
5c3951b28a gossip: Get rid of the handler helper 2015-12-02 16:50:34 +08:00
Asias He
7a6ad7aec2 gossip: Fix Assertion `local_is_initialized()' failed
This patch fixes the following cql_query_test failure.

   cql_query_test: scylla/seastar/core/sharded.hh:439:
   Service& seastar::sharded<Service>::local() [with Service =
   gms::gossiper]: Assertion `local_is_initialized()' failed.

The problem is in gossiper::stop() we call gossip::add_local_application_state()
which will in turn call gms::get_local_gossiper(). In seastar::sharded::stop

 _instances[engine().cpu_id()].service = nullptr;
 return inst->stop().then([this, inst] {
     return _instances[engine().cpu_id()].freed.get_future();
 });

We set the _instances to nullptr before we call the stop method, so
local_is_initialized asserts when we try to access get_local_gossiper
again.

To fix, we make the stopping of gossiper explicit. In the shutdown
procedure, we call stop_gossiping() explicitly.

This has two more advantages:

1) The api to stop gossip is now calling the stop_gossiping() instead of
sharing the seastar::sharded's stop method.

2) We can now get rid of the _handler seastar::sharded helper.
2015-12-02 16:50:34 +08:00
Asias He
e22972009b gossip: Make log message in mark_dead stick to cassandra 2015-12-02 14:21:26 +08:00
Asias He
ad30cf0faf failure_detector: Use a standalone logger name
Do not share logger with gossip. Sometimes, it is useful to only see one
of them.
2015-12-02 14:21:26 +08:00
Asias He
eb05dc680d storage_service: Warn lost of data when remove a node
If RF = 1 and one node is down, it is possible that data is lost. Warn
this in the logger.
2015-12-02 14:21:26 +08:00
Asias He
0fe14e2b4b storage_service: Do not ignore future in remove_node 2015-12-02 14:21:26 +08:00
Asias He
5a7f15ba49 storage_service: Run confirm_replication on cpu zero
storage_service::_replicating_nodes is valid on cpu zero only.
2015-12-02 14:21:26 +08:00
Amnon Heiman
7e79d35f85 Estimated histogram: Clean the add interface
The add interface of the estimated histogram is confusing as it is not
clear what units are used.

This patch removes the general add method and replace it with a add_nano
that adds nanoseconds or add that gets duration.

To be compatible with origin, nanoseconds vales are translated to
microseconds.
2015-12-01 15:28:06 +02:00
Amnon Heiman
61abc85eb3 histogram: Add started counter
This patch adds a started counter, that is used to mark the number of
operation that were started.

This counter serves two purposes, it is a better indication for when to
sample the data and it is used to indicate how many pending operations
are.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-01 15:28:06 +02:00
Amnon Heiman
88dcf2e935 latency: Switch to steady_clock
The system clock is less suitable for for time difference than
steady_clock.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-01 15:28:06 +02:00
Avi Kivity
f667e05e08 Merge "backport gosisp and storage_service fix" from Asias
"This contains most bug fixes from imported version commit
38847a6bd967e4f41bc7b1fc83629161a2c214dc to c* 2.1.11 for gossip and
storage_service."
2015-12-01 14:42:41 +02:00
Asias He
dc6f2157e7 Update ORIGIN for gossip and storage_service 2015-12-01 19:45:04 +08:00
Amnon Heiman
3674ee2fc1 API: get snapshot size
This patch adds the column family API that return the snapshot size.
The changes in the swagger definition file follo origin so the same API will be used for the metric and the
column_family.

The implementation is based on the get_snapshot_details in the
column_family.

This fix:
425

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-01 11:41:52 +02:00
Asias He
56df32ba56 gossip: Mark node as dead even if already left
Backport: CASSANDRA-10205

484e645 Mark node as dead even if already left
2015-12-01 17:29:25 +08:00
Asias He
59694a8e43 failure_detector: Print versions for gossip states in gossipinfo
Backport: CASSANDRA-10330

ae4cd69 Print versions for gossip states in gossipinfo

For instance, the version for each state, which can be useful for
diagnosing the reason for any missing states. Also instead of just
omitting the TOKENS state, let's indicate whether the state was actually
present or not.
2015-12-01 17:29:25 +08:00
Asias He
af91a8f31b storage_service: Fix transition from write survey to normal mode
Backport: CASSANDRA-9740

52dbc3f Can't transition from write survey to normal mode
2015-12-01 17:29:25 +08:00
Asias He
2f071d9648 storage_service: Refuse to decommission if not in state NORMAL
Backport: CASSANDRA-8741

5bc56c3 refuse to decomission if not in state NORMAL
2015-12-01 17:29:25 +08:00
Asias He
224db2ba37 failure_detector: Don't mark nodes down before the max local pause interval once paused
Backport: CASSANDRA-9446

7fba3d2 Don't mark nodes down before the max local pause interval once paused
2015-12-01 17:29:25 +08:00
Asias He
51fcc48700 failure_detector: Failure detector detects and ignores local pauses
Backport: CASSANDRA-9183

4012134 Failure detector detects and ignores local pauses
2015-12-01 17:29:25 +08:00
Asias He
1b9e350614 gossip: Do not print node is now part of the cluster during gossip shadow round
With

   Node 1 (Seed node, Port 7000 is opened, 10.184.9.144)
   Node 2 (Port 7000 is opened, 10.184.9.145)
   Node 3 (Port 7000 is blocked by firewall)

On Node 3, we saw the following error which was very confusing: Node 3
saw Node 1 and Node 3 but it complained it can not contact any seeds.

The message "Node 10.184.9.144 is now part of the cluster" and friends
are actually messages printed during the gossip shadow round where Node
3 connects to Node 1's port 7000 and Node 1 returns all info it knows to
Node 3, so that Node 3 knows Node 1 and Node 2 and we see the "Node
10.184.9.144/145 is now part of the cluster" message.

However, during the normal gossip round, Node 3 will not mark Node 1 and
Node 2 UP until the Seed node initiates a gossip round to Node 3, (note
port 7000 on node 3 is blocked in this case). So Node 3 will not mark
Node 1 and Node 2 UP and we see the "Unable to contact any seeds" error.

[shard 0] storage_service - Loading persisted ring state
[shard 0] gossip - Node 10.184.9.144 is now part of the cluster
[shard 0] gossip - inet_address 10.184.9.144 is now UP
[shard 0] gossip - Node 10.184.9.145 is now part of the cluster
[shard 0] gossip - inet_address 10.184.9.145 is now UP
[shard 0] storage_service - Starting up server gossip
scylla_run[12479]: Start gossiper service ...
[shard 0] storage_service - JOINING: waiting for ring information
[shard 0] storage_service - JOINING: schema complete, ready to bootstrap
[shard 0] storage_service - JOINING: waiting for pending range calculation
[shard 0] storage_service - JOINING: calculation complete, ready to bootstrap
[shard 0] storage_service - JOINING: getting bootstrap token
[shard 0] storage_service - JOINING: sleeping 5000 ms for pending range setup
scylla_run[12479]: Exiting on unhandled exception of type 'std::runtime_error': Unable to contact any seeds!
2015-12-01 17:29:25 +08:00
Asias He
f62a6f234b gossip: Add shutdown gossip state
Backported: CASSANDRA-8336 and CASSANDRA-9871

84b2846 remove redundant state
b2c62bb Add shutdown gossip state to prevent timeouts during rolling restarts
8f9ca07 Cannot replace token does not exist - DN node removed as Fat Client

Fixes:

When X is shutdown, X sends SHUTDOWN message to both Y and Z, but for
some reason, only Y receives the message and Z does not receive the
message. If Z has a higher gossip version for X than Y has for
X, Z will initiate a gossip with Y and Y will mark X alive again.

X ------> Y
 \      /
  \    /
    Z
2015-12-01 17:29:25 +08:00
Gleb Natapov
8c02ad0e9e messaging: log connection dropping event 2015-11-30 19:42:04 +02:00
Avi Kivity
b85f3ad130 Merge "Commit log replay - handle corrupted data silently, as non-fatal"
Fixes: #593

"Changes the parser/replayer to treat data corruption as non-fatal,
skipping as little as possible to get the most data out of a segment,
but keeping track of, and reporting, the amount corrupted.

Replayer handles this and reports any non-fatal errors on replay finish.
Also added tests for corruption cases.

This patch series contains a cleanup-patch for commitlog_tests that was
previously submitted, but got lost."
2015-11-30 19:13:31 +02:00
Gleb Natapov
5b9f3bff7d storage_proxy: simplify error handling by using this/handle_exception
It is cleaner to use this/handle_exception instead of then_wrapped if
normal and error flow do not share any state.
2015-11-30 17:41:32 +02:00
Gleb Natapov
5484f25091 storage_proxy: remove unneeded continuation
make_ready_future() around when_all() is not longer needed. It was
added to catch mutate_locally() exceptions, but now it is handled in
lower level.
2015-11-30 17:41:28 +02:00
Gleb Natapov
cf95c3f681 storage_proxy: introduce unique_response_handler object to prevent write request leaks
If something bad happens between write request handler creation and
request execution the request handler have to be destroyed. Currently
code tries to do that explicitly in all places where request may be
abandoned, but it misses some (at least one). This patch replaces this
by introducing unique_response_handler object that will remove the handler
automatically if request is not executed for some reason.
2015-11-30 17:41:27 +02:00
Gleb Natapov
d8afc6014e storage_proxy: catch exception thrown by mutate_locally in mutate verb handler
Also simplify error logging.
2015-11-30 17:41:25 +02:00
Avi Kivity
3c9ded27cc Update scylla-ami submodule
* ami/files/scylla-ami 3f37184...07b7118 (1):
  > Use /etc/scylla as SCYLLA_CONF directory
2015-11-30 16:39:49 +02:00
Takuya ASADA
616903de12 dist: use distribution version of antlr3, on Ubuntu 15.10
Rename antlr3-tool to antlr3 (same as distribution package), and use distribution version if it's available

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-30 16:37:36 +02:00
Pekka Enberg
0e8f80b5ee Merge "Relax bootstrapping/leaving/moving nodes check" from Asias 2015-11-30 11:53:07 +02:00
Asias He
2022117234 failure_detector: Enable phi_convict_threshold option
Adjusts the sensitivity of the failure detector on an exponential scale.

Use as:

$ scylla --phi-convict-threshold 9

Default to 8.
2015-11-30 11:09:36 +02:00
Asias He
db70643fe3 failure_detector: Print application_state properly 2015-11-30 11:08:40 +02:00
Asias He
aaca88a1e7 token_metadata: Add print_pending_ranges for debug print
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-11-30 11:07:42 +02:00
Avi Kivity
2c59e2f81f Merge "Fix race between population and update in row cache" from Tomasz
"Before this change, populations could race with update from flushed
memtable, which might result in cache being populated with older
data. Populations started before the flush are not considering the
memtable nor its sstable.

The fix employed here is to make update wait for populations which
were started before the flushed memtable's sstable was added to the
undrelying data source. All populatinos started after that are
guaranteed to see the new data. The update() call will wait only for
current populating reads to complete, it will not wait for readers to
get advanced by the consumer for instance."
2015-11-30 11:06:23 +02:00
Tomasz Grabiec
8d88ece896 schema_tables: Fix "comment" property not being loaded from storage 2015-11-30 10:57:36 +02:00
Pekka Enberg
a64fa3db03 Merge "range_streamer fix and cleanup" from Asias
Do not use hard-coded value for is_replacing and rangemovement.
2015-11-30 10:47:06 +02:00
Asias He
879a4ad4d3 storage_service: Update pending ranges immediately after update of normal tokens
To avoid a race where natural endpoint was updated to contain node A,
but A was not yet removed from pending endpoints.

This fixes the root cause of commit d9d8f87c1 (storage_proxy: filter out
natural endpoints from pending endpoint). This patch alone fixes #539,
but we still want commit d9d8f87c1 to be safe.
2015-11-30 10:20:59 +02:00
Asias He
0af7fb5509 range_streamer: Kill FIXME in use_strict_consistency for consistent_rangemovement 2015-11-30 09:15:42 +08:00
Asias He
f80e3d7859 range_streamer: Simplify multiple_map to map conversion in add_ranges 2015-11-30 09:15:42 +08:00
Asias He
21882f5122 range_streamer: Kill one leftover comment 2015-11-30 09:15:42 +08:00
Asias He
6b258f1247 range_streamer: Kill FIXME for is_replacing 2015-11-30 09:15:42 +08:00
Asias He
aa2b11f21b database: Move is_replacing and get_replace_address to database class
So they can be used outside storage_service.
2015-11-30 09:15:42 +08:00
Asias He
80d1d4d161 storage_service: Relax bootstrapping/leaving/moving nodes check in check_for_endpoint_collision
When other bootstrapping/leaving/moving nodes are found during
bootstrap, instead of throwing immediately, sleep and try again for one
minute, hoping other nodes will finish the operation soon.

Since we are retrying using shadow gossip round more than once, we need
to put the gossip state back to shadow round after each shadow round, to
make shadow round works correctly.

This is useful when starting an empty cluster for testing. E.g,

   $ scylla --listen-address 127.0.0.1
   $ sleep 3
   $ scylla --listen-address 127.0.0.2
   $ sleep 3
   $ scylla --listen-address 127.0.0.3

Without this patch, node 3 will hit the check.

   TIME  STATUS
   -----------------------
   Node  1:
   32:00 Starts
   32:00 In NORMAL status

   Node  2:
   32:03 Starts
   32:04 In BOOT status
   32:10 In NORMAL status

   Node  3:
   32:06 Starts
   32:06 Found node 2 in BOOT status, hit the check, sleep and try again
   32:11 Found node 2 in NORMAL status, can keep going now
   32:12 In BOOT status
   32:18 In NORMAL status
2015-11-30 09:07:57 +08:00
Asias He
8b19373536 storage_service: Relax bootstrapping/leaving/moving nodes check in join_token_ring
When other bootstrapping/leaving/moving nodes are found during
bootstrap, instead of throwing immediately, sleep and try again for one
minute, hoping other nodes will finish the operation soon.

This is useful when starting an empty cluster for testing. E.g,

   $ scylla --listen-address 127.0.0.1
   $ scylla --listen-address 127.0.0.2
   $ scylla --listen-address 127.0.0.3

Without this patch, node 3 will hit the check.

   TIME  STATUS
   -----------------------
   Node  1:
   25:19 Starts
   25:20 In NORMAL status

   Node  2:
   25:19 Starts
   25:23 In BOOT status
   25:28 In NORMAL status

   Node  3:
   25:19 Starts
   25:24 Found node 2 in BOOT status, hit the check, sleep and try again
   25:29 Found node 2 in NORMAL status, can keep going now
   25:29 In BOOT status
   25:34 In NORMAL status
2015-11-30 09:07:57 +08:00
Tomasz Grabiec
df46542832 tests: Add test for populate and update race 2015-11-29 16:25:22 +01:00
Tomasz Grabiec
6f69d4b700 tests: Avoid potential use after free on partition range 2015-11-29 16:25:21 +01:00
Tomasz Grabiec
de75f3fa69 row_cache: Add default value for partition range in make_reader() 2015-11-29 16:25:21 +01:00
Tomasz Grabiec
ab328ead3d mutation: Introduce ring_position() 2015-11-29 16:25:21 +01:00
Tomasz Grabiec
32ac2ccc4a memtable: Introduce apply(memtable&) 2015-11-29 16:25:21 +01:00
Tomasz Grabiec
7c3e6c306b row_cache: Wait for in-flight populations on update
Before this change, populations could race with update from flushed
memtable, which might result in cache being populated with older
data. Populations started before the flush are not considering the
memtable nor its sstable.

The fix employed here is to make update wait for populations which
were started before the flushed memtable's sstable was added to the
undrelying data source. All populatinos started after that are
guaranteed to see the new data.
2015-11-29 16:25:21 +01:00
Tomasz Grabiec
a3e3add28a utils: Introduce phased_barrier
Utility for waiting on a group of async actions started before certain
point in time.
2015-11-29 16:25:21 +01:00
Pekka Enberg
a26ffefd53 transport/server: Remove CQL text type from encoding
The text data type is no longer present in CQL binary protocol v3 and
later. We don't need it for encoding earlier versions either because
it's an alias for varchar which is present in all CQL binary protocol
versions.

Fixes #526.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-11-27 09:13:56 +01:00
Pekka Enberg
2599b78583 Merge "CQL notification + storage_service fix" from Asias
"pushed_notifications_test.py dtest is now passing"
2015-11-27 10:09:56 +02:00
Asias He
da0e80a286 storage_service: Fix failed bootstrap/replace attempts being persisted in system.peers
Backported from:

ac46747 Fix failed bootstrap/replace attempts being persisted in system.peers (CASSANDRA-9180)
2015-11-27 15:31:56 +08:00
Asias He
36b2de10ed failure_detector: Improve FD logging when the arrival time is ignored
Backport from:

eb9c5bb Improve FD logging when the arrival time is ignored.
2015-11-27 15:31:56 +08:00
Asias He
ed9cd23a2d transport: Fix duplicate up/down messages sent to native clients
This patch plus pekka's previous commit 3c72ea9f96

   "gms: Fix gossiper::handle_major_state_change() restart logic"

fix CASSANDRA-7816.

Backported from:

   def4835 Add missing follow on fix for 7816 only applied to
           cassandra-2.1 branch in 763130bdbde2f4cec2e8973bcd5203caf51cc89f
   763130b Followup commit for 7816
   2199a87 Fix duplicate up/down messages sent to native clients

Tested by:
   pushed_notifications_test.py:TestPushedNotifications.restart_node_test
2015-11-27 15:31:56 +08:00
Asias He
25bb889c2a transport: Fix wrong message for UP and DOWN event 2015-11-27 15:31:56 +08:00
Asias He
ca8c4f3e77 storage_service: Fix MOVED_NODE client event (CASSANDRA-8516)
Backport from:

b296c55f956c6ef07c8330dc28ef8c351e5bcfe2 (Fix MOVED_NODE client event)

Fixes:

DISABLE_VNODES=true nosetests
pushed_notifications_test.py:TestPushedNotifications.move_single_node_test
2015-11-27 15:31:56 +08:00
Gleb Natapov
ad358300a9 cql server: remove connection from notifiers earlier
Remove connection from notifiers lists just before closing it to prevent
attempts to send notification on already closed connection.
2015-11-26 18:50:08 +02:00
Pekka Enberg
569d288891 cql3: Add TRUNCATE TABLE alias for TRUNCATE
CQL 3.2.1 introduces a "TRUNCATE TABLE X" alias for "TRUNCATE X":

  4e3555c1d9

Fix our CQL grammar to also support that.

Please note that we don't bump up advertised CQL version yet because our
cqlsh clients won't be able to connect by default until we upgrade them
to C* 2.1.10 or later.

Fixes #576

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-11-26 18:45:50 +02:00
Gleb Natapov
96f40d535e cql server: add missing gate during connection access
cql connection access is protected by a gate, but event notifiers have
omitted taking it. Fix it.
2015-11-26 13:05:59 +02:00
Tomasz Grabiec
a7c11d1e30 db: Fix handling of missing column family
The FIXMEs are no longer valid, we load schema on bootstrap and don't
support hot-plugging of column families via file system (nor does
Cassandra).

Handling of missing tables matches Cassandra 2.1, applies log
it and continue, queries propagate the error.
2015-11-25 16:59:15 +02:00
Tomasz Grabiec
3a402db1be storage_proxy: Remove dead signature 2015-11-25 16:57:03 +02:00
Asias He
d03b452322 storage_service: Remove RPC client in on_dead
When gossip mark a node down, we should close all the RPC connections to
that node.
2015-11-25 16:30:14 +02:00
Gleb Natapov
d9d8f87c1b storage_proxy: filter out natural endpoints from pending endpoint
If request comes after natural endpoint was updated to contain node A,
but A was not yet removed from pending endpoints it will be in both and
write request logic cannot handle this properly. Filter nodes which are
already in natural endpoint from pending endpoint to fix this.

Fixes #539.
2015-11-25 16:28:55 +02:00
Pekka Enberg
cf7541020f Merge "Enable more config options" from Asias 2015-11-25 16:09:22 +02:00
Tomasz Grabiec
c3f03d5c96 Merge branch 'pdziepak/random-lsa-patches/v3' from seastar-dev.git
LSA fixes from Paweł.
2015-11-25 10:26:23 +01:00
Paweł Dziepak
89f7f746cb lsa: fix printing object_descriptor::_alignment
object_descriptor::_alignment is of type uint8_t which is actually an
unsigned char.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 20:13:29 +01:00
Paweł Dziepak
65875124b7 lsa: guarantee that segment_heap doesn't throw
boost::heap::binomial_heap allocates helper object in push() and,
therefore, may throw an exception. This shouldn't happen during
compaction.

The solution is to reserve space for this helper object in
segment_descriptor and use a custom allocator with
boost::heap::binomial_heap.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 19:51:22 +01:00
Paweł Dziepak
273b8daeeb lsa: add no-op default constructor for segment
Zero initialization of segment::data when segment is value initialized
is undesirable.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 16:37:37 +01:00
Paweł Dziepak
e6cf3e915f lsa: add counters for memory used by large objects
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 16:36:27 +01:00
Paweł Dziepak
9396956955 scylla-gdb.py: show lsa statistics and regions
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 16:36:20 +01:00
Paweł Dziepak
aaecf5424c scylla-gdb.py: show free, used and total memory
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 16:36:16 +01:00
Paweł Dziepak
6b113a9a7a lsa: fix eviction of large blobs
LSA memory reclaimer logic assumes that the amount of memory used by LSA
equals: segments_in_use * segment_size. However, LSA is also responsible
for eviction of large objects which do not affect the used segmentcount,
e.g. region with no used segments may still use a lot of memory for
large objects. The solution is to switch from measuring memory in used
segments to used bytes count that includes also large objects.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 16:29:09 +01:00
Takuya ASADA
4a8c79ca0e dist: re-initialize RAID on ephemeral disk when stop/restart AMI instance
Since this won't check disk types, may re-initialize RAID on EBS when first block was lost.
But in such condition, probably re-initialize RAID is the only choice we can take, so this is fine.
Fixes #364.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-24 10:46:10 +02:00
Asias He
3a9200db03 config: Add more document for options
For consistent_rangemovement, join_ring, load_ring_state, etc.
2015-11-24 10:07:31 +08:00
Asias He
7ddf8963f5 config: Enable broadcast_rpc_address option
With this patch, start two nodes

node 1:
scylla --rpc-address 127.0.0.1 --broadcast-rpc-address 127.0.0.11

node 2:
scylla --rpc-address 127.0.0.2 --broadcast-rpc-address 127.0.0.12

On node 1:
cqlsh> SELECT rpc_address from system.peers;

 rpc_address
-------------
  127.0.0.12

which means client should use this address to connect node 2 for cql and
thrift protocol.
2015-11-24 10:07:31 +08:00
Asias He
33ef58c5c9 utils: Add get_broadcast_rpc_address and set_broadcast_rpc_address helper 2015-11-24 10:07:31 +08:00
Asias He
1e55aa38c1 storage_service: Implement is_replacing 2015-11-24 10:07:29 +08:00
Asias He
644c226d58 config: Enable replace_address and replace_address_first_boot option
It is same as

   -Dcassandra.replace_address
   -Dcassandra.replace_address_first_boot

in cassandra.
2015-11-24 10:07:24 +08:00
Asias He
bfe26ea208 config: Enable replace_token option
It is same as

   -Dcassandra.replace_token

in cassandra.

Use it as:

   $ scylla --replace-token $token1,$token2,$token3
2015-11-24 10:07:20 +08:00
Asias He
730abbc421 config: Enable replace_node option
It is same as

   -Dcassandra.replace_node

in cassandra.

Use it as:

   $ scylla --replace-node $node_uuid
2015-11-24 10:07:16 +08:00
Asias He
2513d6ddbe config: Enable load_ring_state option
It is same as

   -Dcassandra.load_ring_state

in cassandra.

Use it as:

   $ scylla --load-ring-state 0

or

   $ scylla --load-ring-state 1
2015-11-24 10:07:12 +08:00
Asias He
6e72e78e0d config: Enable join_ring option
It is same as

   -Dcassandra.join_ring

in cassandra.

Use it as:

   $ scylla --join-ring 0

or

   $ scylla --join-ring 1
2015-11-24 10:07:07 +08:00
Asias He
505b3e4936 config: Enable consistent_rangemovement option
It is same as

  -Dcassandra.consistent.rangemovement

in cassandra.

Use it as:

  $ scylla --consistent-rangemovement 0

or

  $ scylla --consistent-rangemovement 1
2015-11-24 10:06:54 +08:00
Gleb Natapov
33e5097090 messaging: do not kill live connection needlessly
Messaging service closes connection in rpc call continuation on
closed_error, but the code runs for each outstanding rpc call on the
connection, so first continuation may destroy genuinely closed connection,
then connection is reopened and next continuation that handless previous
error kills now perfectly healthy connection. Fix this by closing
connection only in error state.
2015-11-23 20:16:28 +02:00
Tomasz Grabiec
cb0b56f75f Merge tag 'empty/v3' from https://github.com/avikivity/scylla
From Avi:

Origin supports a notion of empty values for non-container types; these
are serialized as zero-length blobs.  They are mostly useless and only
retained for compatibility.

The implementation here introduces a wrapper maybe_empty<T>, similar to
optional<T> but oriented towards usually-nonempty usage with implicit
conversion.

There is more work needed for full empty support: fixing up deserializers to
create empty values instead of nulls, and splitting up data_value into
data_value and a data_value_nonnull for the cases that require it.

(I chose maybe_empty<> rather than using optional<data_value> for nullable
data_value both because it requires fewer changes, and because
optional<data_value> introduces a lot of control flow when moving or copying,
which would be mostly useless in most cases).
2015-11-23 16:12:06 +01:00
Calle Wilund
b1a0c4b451 commitlog_tests: Add segment corruption tests
Test entry and chunk corruption.
2015-11-23 15:43:33 +01:00
Calle Wilund
d65adef10c commitlog_tests: test cleanup
This cleanup patch got lost in git-space some time ago. It is however sorely
needed...

* Use cleaner wrapper for creating temp dir + commit log, avoiding
  having to clear and clean in every test, etc.
* Remove assertions based on file system checks, since these are not
  valid due to both the async nature of the CL, and more to the point,
  because of pre-allocation of files and file blocks. Use CL
  counters/methods instead
* Fix some race conditions to ensure tests are safe(r)
* Speed up some tests
2015-11-23 15:42:45 +01:00
Calle Wilund
262f44948d commitlog: Add get_flush_count method (for testing) 2015-11-23 15:42:45 +01:00
Calle Wilund
76b43fbf74 commitlog_replayer: Handle replay data errors as non-fatal
Discern fatal and non-fatal excceptions, and handle data corruption 
by adding to stats, resporting it, but continue processing.

Note that "invalid_arguement", i.e. attempting to replay origin/old
segments are still considered fatal, as it is probably better to 
signal this strongly to user/admin
2015-11-23 15:42:45 +01:00
Calle Wilund
2fe2320490 commitlog: Make reading segments with crc/data errors non-fatal
Parser object now attempts to skip past/terminate parsing on corrupted
entries/chunks (as detected by invalid sizes/crc:s). The amount of data
skipped is kept track of (as well as we can estimate - pre-allocation
makes it tricky), and at the end of parsing/reporting, IFF errors 
occurred, and exception detailing the failures is thrown (since 
subsciption has little mechanism to deal with this otherwise). 

Thus a caller can decide how to deal with data corruption, but will be
given as many entries as possible.
2015-11-23 15:42:45 +01:00
Avi Kivity
23895ac7f5 types: fix up confusion around empty serialized representation
An empty serialized representation means an empty value, not NULL.

Fix up the confusion by converting incorrect make_null() calls to a new
make_empty(), and removing make_null() in empty-capable types like
bytes_type.

Collections don't support empty serialized representations, so remove
the call there.
2015-11-22 12:20:24 +02:00
Tomasz Grabiec
ae9e0c3d41 storage_proxy: Avoid potential use after move on schema_ptr
Paramter evaluation order is unspecified, so it's possible that the
move of 'schema' into lambda captures would happen before construction of
mutation.
2015-11-22 12:15:04 +02:00
Avi Kivity
0799251a9f Merge "optimize the sstable loading step of boot" from Raphael
"To speed up boot, parallelism was introduced to our code that loads
sstables from a column family, a function was implemented to read
the minimum from a sstable to determine whether it belongs to the
current shard, and buffer size in read simple is dynamically chosen
based on the size of the file and dma alignment.
The latter is important because filter file can be considerably
large when the respective sstable (data file) is very large.
Before this patchset, scylla took about 5 minutes to boot with a
data directory of 660GB. After this patchset, scylla took about 20
seconds to boot with the same data directory."
2015-11-22 11:27:34 +02:00
Asias He
23723991ed gossip: Fix STATUS field in nodetool gossipinfo
Before:
   === with c* cluster ===
   $ nodetool -p 7100 gossipinfo

   STATUS:NORMAL,-1139428872328849340

   === with scylla ===
   $ nodetool -p 7100 gossipinfo

   0:NORMAL,8251763528961471825;-9147358554612963965;5334343410266177046

After:
   === with scylla ===
   $ nodetool -p 7100 gossipinfo

   0:NORMAL,8251763528961471825

To align with c*, print one token in the STATUS field.

Refs #508.
2015-11-20 10:57:49 +02:00
Raphael S. Carvalho
a5842642fa sstables: change buf size in read_simple to 128k
Avi says:
"A small buffer size will hurt if we read a large file, but
a large buffer size won't hurt if we read a small file, since
we close it immediately."

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:35:25 -02:00
Raphael S. Carvalho
0f3ccc1143 db: optimize the sstable loading process
Currently, we only determine if a sstable belongs to current shard
after loading some of its components into memory. For example,
filter may be considerably big and its content is irrelevant to
decide if a sstable should be included to a given shard.
Start using the functions previously introduced to optimize the
sstable loading process. add_sstable no longer checks if a sstable
is relevant to the current shard.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:34:25 -02:00
Raphael S. Carvalho
0053394ec0 sstables: introduce mark_sstable_for_deletion
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:34:24 -02:00
Raphael S. Carvalho
0ce2b7bc8d db: introduce belongs_to_current_shard
Returns true if key range belongs to current shard.
False otherwise.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:34:21 -02:00
Raphael S. Carvalho
f06b72eb18 sstables: introduce function to return sstable key range
Provides a function that will return sstable key range reading only
the summary component.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:34:19 -02:00
Raphael S. Carvalho
966e8c7144 db: introduce parallelism to sstable loading
Boot may be slow because the function that loads sstables do so
serially instead of in parallel. In the callback supplied to
lister::scan_dir, let's push the future returned by probe_file
(function that loads sstable) into a vector of future and wait
for all of them at the end.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:34:11 -02:00
Takuya ASADA
83c8b3e433 dist: support Ubuntu 15.10
We cannot share some dependency package names between 14.04 and 15.10, so need to add ifdefs.
Not tested on other version of Ubuntu.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-19 10:57:25 +02:00
Tomasz Grabiec
53e842aaf7 scylla-gdb.py: Fix scylla column_families command 2015-11-19 10:44:00 +02:00
Avi Kivity
0b91b643ba types: empty value support for non-container types
Origin supports (https://issues.apache.org/jira/browse/CASSANDRA-5648) "empty"
values even for non-container types such as int.  Use maybe_empty<> to
encapsulate abstract_type::native_type, adding an empty flag if needed.
2015-11-18 18:38:38 +02:00
Avi Kivity
7257f72fbf types: introduce maybe_empty<T> type alias
- T for container types (that can naturally be empty)
 - emptyable<T> otherwise (adding that property artificially)
2015-11-18 15:25:24 +02:00
Avi Kivity
58d3a3e138 types: introduce emptyable<> template
Similar to optional<>, with the following differences:
 - decays back to the encapsulated type, with an emptiness check;
   this reflects the expectation that the value will rarely be empty
 - avoids conditionals during copy/move (and requires a default constructor),
   again with the same expectation.
2015-11-18 15:25:22 +02:00
Gleb Natapov
0870caaea1 cql transport: catch all exceptions
Not all exceptions are inherited from std::exception
(std::nested_exception) for instance, so catch and log all of them.
2015-11-18 15:17:43 +02:00
Asias He
242e5ea291 streaming: Ignore remote no_such_column_family for stream_transfer_task
When we start to sending mutations for cf_id to remote node, remote node
might do not have the cf_id anymore due to dropping of the cf for
instance.

We should not fail the streaming if this happens, since the cf does not
exist anymore there is no point streaming it.

Fixes #566
2015-11-18 15:12:23 +02:00
Asias He
3816e35d11 storage_service: Detect other bootstrapping/leaving/moving nodes during bootstrap 2015-11-18 15:11:56 +02:00
Avi Kivity
6390bc3121 README: instructions for contributing 2015-11-18 15:10:37 +02:00
Asias He
3b52033371 gossip: Favor newly added node in do_gossip_to_live_member
When a new node joins a cluster, it will starts a gossip round with seed
node. However, within this round, the seed node will not tell the new
node anything it knows about other nodes in the cluster, because the
digest in the gossip SYN message contains only the new node itself and
no other nodes. The seed node will pick randomly from the live nodes,
including the newly added node in do_gossip_to_live_member to start a
gossip round. If the new node is "lucky", seed node will talk to it very
soon and tells all the information it knows about the cluster, thus the
new node will mark the seed node alive and think it has seen the seed
node. If there considerably large number of live nodes, it might take a
long time before the seed node pick the new node and talk to it.

In bootstrap code, storage_service::bootstrap checks if we see any nodes
after sleep of RING_DELAY milliseconds and throw "Unable to contact any
seeds!" if not, thus the node will fail to bootstrap.

To help the seed node talk to new node faster, we favor new node in
do_gossip_to_live_member.
2015-11-18 15:00:37 +02:00
Amnon Heiman
374414ffd0 API: failure_detector modify the get_all_endpoint_states
In origin, get_all_endpoint_states perform all the information
formatting and returns a string.

This is not a good API approach, this patch replaces the implementation
so the API will return an array of values and the JMX will do the
formatting.

This is a better API and would make it simpler in the future to stay in
sync with origin output.

This patch is part of #508

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-18 14:59:09 +02:00
Avi Kivity
17f6dc3671 Merge seastar upstream
* seastar 95ddb8e...84cb6df (2):
  > rpc: do not convert EOF into exception
  > reactor: remove debug output in command line option validation
2015-11-18 11:20:27 +02:00
Takuya ASADA
16cd5892f7 dist: setup rps on scylla_prepare, not on scylla_run
All preparation of running scylla should be done in scylla_prepare

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-18 11:20:17 +02:00
Asias He
269ea7f81b storage_service: Enable is_ready_for_bootstrap in join_token_ring
The goal is to make sure our schema matches with other nodes in the
cluster.
2015-11-18 10:46:40 +02:00
Asias He
bb1470f0d4 migration_manager: Introduce is_ready_for_bootstrap
This compares local schema version with other nodes in the cluster.
Return true if all of them match with each other.
2015-11-18 10:46:06 +02:00
Avi Kivity
ba859acb3b big_decimal: add default constructor
Arithmetic types should have a default constructor, and anyway the
following patch wants it.
2015-11-18 10:36:03 +02:00
Takuya ASADA
f0a6c33b6d dist: use /var/lib/scylla instead of /data on ami
Fixes #551.
Change mountpoint to /var/lib/scylla, copy conf/ on it.
Note: need to replace conf/ with symlink to /etc/scylla when new rpm uploaded on yum repository.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Signed-off-by: Pekka Enberg <penberg@iki.fi>
2015-11-18 10:10:48 +02:00
Calle Wilund
6d25b8d496 query_pagers: Handle partition with no data but statics
If we get a partition with no row data, but statics, we should treat this as
a row (include in count), but also make sure we skip to next partition
if our page ends here.

The "end partition" with zero rows but static data can also happen if we
happen to resume paging by giving a column range exluding all data. In this
case we should _not_ include it, since we have already provided the
data in question in previous page.

Fixes #556
2015-11-17 17:49:14 +02:00
Calle Wilund
5368f1a998 query_pagers: Fix aggregate query bug
1.) Should not reset to input query state if run repeatedly
2.) And if run repeatedly without input state, likewise keep
    the internal one active

Fixes #560
2015-11-17 17:48:58 +02:00
Avi Kivity
7d8c72d24f Merge "move scylla.yaml to /etc/scylla" from Takuya
"To keep compatibility with scylla-tools-java, it links /etc/scylla to /var/lib/scylla/conf.

Problem on this patchset is, I added SCYLLA_HOME and SCYLLA_CONF on /etc/sysconfig/scylla-server.
However, the file is marked as config file, it won't be automatically upgrade.
If user doesn't upgrade the file manually, scylla-server still able to run with /var/lib/scylla/conf because we have symlink, but never switches to /etc/scylla."
2015-11-17 17:02:14 +02:00
Takuya ASADA
60efcd791a dist: don't specify scylla.yaml path directly, use SCYLLA_CONF instead 2015-11-17 02:37:37 +09:00
Takuya ASADA
6972d78e93 dist: specify SCYLLA_CONF as /etc/scylla
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-17 02:37:37 +09:00
Takuya ASADA
aeb076a6c0 dist: change scylla.yaml install path to /etc/scylla (Ubuntu)
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-17 02:37:37 +09:00
Takuya ASADA
b5bd34c0c2 dist: change scylla.yaml install path to /etc/scylla (Fedora/CentOS)
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-17 02:37:37 +09:00
Takuya ASADA
51813dc8ce dist: merge redhat/limits.d and debian/limits.d to common/limits.d
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-17 02:37:37 +09:00
Takuya ASADA
863eea2bb3 dist: do not overwrite config files on 'yum update'
Fixes a problem on AMI which recently reported by Glauber

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-17 02:37:37 +09:00
Asias He
6ac54a27dc streaming: Skip non-exist cf for stream_transfer_task
Skip sending the mutation if the cf is dropped after we call
make_local_reader in stream_session::add_transfer_ranges().

Fix #550.
2015-11-16 16:48:35 +01:00
Paweł Dziepak
c37afcfdee lsa: account for size of objects too big for LSA
While the objects above max_manage_object_size aren't stored in the
LSA segments they are still considered to be belonging to the LSA
region and are evictable using that region evictor.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-16 12:22:12 +01:00
Avi Kivity
e04a6ea80a Merge "Add the natural_endpoints API" from Amnon
"This series adds the natural_endpoints API. It adds the implementation to the storage_service and to the storage_service API.

After this series the noodtool command getendpoints should work.

example:
$ bin/nodetool getendpoints keyspace1 standard1 0x5032394c323239385030127.0.0.2
127.0.0.2"
2015-11-16 13:16:42 +02:00
Amnon Heiman
c2051c200e API: messaging service add timeout and dropped support
This patch adds the API for timeout messages and dropped messages.

For dropped messages, origin has two APIs one for messages and one for
command.

droped messages return the number of messages per ver, so our API was
rename to reflect that.

For dropped messages (command) we currently do not have this logic of
throwing messages before sending, so the API will always return 0.

The total timeout API was removed and will be done on the jmx proxy
level.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-16 13:15:59 +02:00
Gleb Natapov
79ab5e68f4 enable broadcast_address configuration option
Reviewed-by: Asias He <asias@scylladb.com>
2015-11-16 13:14:07 +02:00
Asias He
358648a8ee init: Check seeds address if listen_address != broadcast_address
If listen_address is different than broadcast_address, we should use
broadcast_address for the seeds list. Check and ask user to fix the
configuration, e.g.,

$ scylla --rpc-address 127.0.0.1 --listen-address 127.0.0.1 --broadcast-address 192.168.1.100 --seed-provider-parameters seeds=127.0.0.1
Use broadcast_address instead of listen_address for seeds list: seeds={127.0.0.1}, listen_address=127.0.0.1, broadcast_address=192.168.1.100
Exiting on unhandled exception of type 'std::runtime_error': Use broadcast_address for seeds list
2015-11-16 13:11:44 +02:00
Asias He
c973d2ea65 to_string: Support std::set and std::unordered_set for to_string 2015-11-16 13:11:43 +02:00
Avi Kivity
9575a7b2ea Merge "Gossiper const correctness" from Pekka
"Fix various const correctness issues in gossiper code found while
working on inter-node protocol."
2015-11-16 13:08:53 +02:00
Asias He
b5cc3ac81c transport: Fix cql server stop
Abort accept and kill existing connections and wait for them.

Fix tests in debug mode:

==29352==ERROR: AddressSanitizer: heap-use-after-free on address 0x60c000012758
at pc 0x00000211e371 bp 0x7ffe27369e10 sp 0x7ffe27369e00
READ of size 8 at 0x60c000012758 thread T0
    #0 0x211e370 in transport::cql_server::do_accepts(int)::{
    lambda(connected_socket, socket_address)#1}::operator()(connected_socket,
    socket_address)::{lambda(future<>)#1}::operator()(future) const
    (/home/asias/.dtest/dtest-AOBJua/test/node2/bin/scylla+0x211e370)
    #1 0x21a8090 in do_void_futurize_apply<transport::cql_server::do_accepts(int)::<
    lambda(connected_socket, socket_address)> mutable::<lambda(future<>)>, future<> >
    /home/asias/src/cloudius-systems/scylla/seastar/core/future.hh:1078
    #2 0x217e861 in apply<transport::cql_server::do_accepts(int)::<
    lambda(connected_socket, socket_address)> mutable::<lambda(future<>)>, future<> >
    /home/asias/src/cloudius-systems/scylla/seastar/core/future.hh:1126
    #3 0x223deb9 in _ZZN6futureIJEE12then_wrappedIZZN9transport10cql_server10do_accepts
    EiENUl16connected_socket14socket_addressE_clES4_S5_EUlS0_E_S0_EET0_OT_ENUlSA_E_
    clI12future_stateIJEEEEDaSA_
    (/home/asias/.dtest/dtest-AOBJua/test/node2/bin/scylla+0x223deb9)
2015-11-16 13:06:20 +02:00
Gleb Natapov
eb220507ce storage_proxy: use correct endpoint address for mutation acks processing
Write handler keeps track of all endpoints that not yet acked mutation
verb. It uses broadcast address as an enpoint id, but if local address
is different from broadcast address for local enpoints acknowledgements
will come from different address, so socket address cannot be used as
an acknowledgement source. Origin solves this by sending "from" in each
message, it looks like an overhead, solve this by providing endpoint's
broadcast address in rpc client_info and use that instead.
2015-11-16 10:29:47 +01:00
Tomasz Grabiec
a6e73d40d7 Move seastar.git HEAD
Gleb Natapov (2):
      rpc: allow non const client_info to be passed to rpc handler
      rpc: allow auxiliary data to be stored in client_info
2015-11-16 10:29:47 +01:00
Pekka Enberg
3c72ea9f96 gms: Fix gossiper::handle_major_state_change() restart logic
The restart logic is wrong because C* had a bug in
bf599fb5b062cbcc652da78b7d699e7a01b949ad and they fixed later and we
translated the broken version. We must check if there is an existing
endpoint state and call on_restart() hooks on that, not the newly
available endpoint state.

Spotted while inspecting the code.

Acked-by: Asias He <asias@scylladb.com>
2015-11-16 11:25:48 +02:00
Tomasz Grabiec
7e0f99cc3b Merge tag 'native-preparatory/v1' from https://github.com/avikivity/scylla.git
Assorted patches that pave the way for native storage (while not
committing us in any way).
2015-11-16 10:01:38 +01:00
Tomasz Grabiec
b6e7312c07 Merge 'memtable-allocating_section/v2' from https://github.com/avikivity/scylla.git
From Avi:

Memtables do not use an allocating_section to guard against allocation
failure, and hence can fail an allocation.  Reproducible by changing
perf_mutation to use an allocating type (bytes_type with a nontrivial
size) and making the loop longer.

Fix by using an allocating_section.
2015-11-16 10:01:05 +01:00
Avi Kivity
a40a62d840 memtable: use allocating_section to guard allocations
Without this, an allocation can fail, and we may not be able to reclaim
memory.
2015-11-16 10:56:06 +02:00
Pekka Enberg
5f48f4aa7c gms/heart_beat_state: Make get_heart_beat_version() const
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-11-16 08:12:25 +02:00
Pekka Enberg
dc6eaa3a44 gms/gossip_digest: Make some members const
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-11-16 08:12:16 +02:00
Pekka Enberg
5815bde695 gms/gossip_digest_ack: Make some members const
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-11-16 08:12:16 +02:00
Avi Kivity
1c425d6b50 logalloc: allow allocating_section code blocks to return references 2015-11-15 19:10:24 +02:00
Tomasz Grabiec
6626d1a100 storage_proxy: Stop latency_counter before throwing 2015-11-15 10:34:55 +02:00
Glauber Costa
0989f80c97 provide cf_stats where one is needed
Recently, I have introduced cf_stats into the database, propagating all the way
back to the column family. The problem, however, is that some tests create a
column family config themselves instead of going through make_column_family.

That is ultimately ok if those tests are not expected to flush memtables. But
if they are, the cf_stats pointer will be null and we will crash. Although
there are many solutions to this, the one that is in tune with our current
practices is to have the test that requires it provide an empty cf_stats storage
area that can be written to. That's already how we handle the disk directory and
other things like compaction properties.

With this patch, test.py passes again.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-11-15 10:31:32 +02:00
Glauber Costa
fe2b928a3f change defaults for commitlog maximum size
arbitrary 8G -> all_memory.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-11-15 10:29:23 +02:00
Glauber Costa
00c12319f1 config: change type for commitlog maximum size config option
This patch substitutes uint64_t for uint32_t as the type for
commitlog_total_space_in_mb.  Moving to 64 is not strictly needed, since even a
signed 32-bit type would allow us to easily handle 2TB. But since we store that
in the commitlog as a 64-bit value, let's match it.

Moving from unsigned to signed, however, allow us to represent negative
numbers.  With that in place, we can change the semantics of the value
slightly, so to allow a negative number to mean "all memory".

The reason behind this, is that the default value "8GB", is an artifact of the
JVM.  We don't need that, and in many-shards configuration, each shard flushes
the commitlog way too often, since 8GB / many_shards = small_number.

8GB also happens to be a popular heap size for C* in the JVM. For us, we would
like to equate that (at least) with the amount of memory. The problem is how to
do that without introducing new options or changing the semantics of existing
options too radically.

The proposed solution will allow us to still parse C* yaml files, since those
will always have positive numbers, while introducing our own defaults.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-11-15 10:29:23 +02:00
Avi Kivity
baf3b2ff4f Merge "Eliminate errors/warnings on building ubuntu package" from Takuya
"Fixes part of #493 (few warnings still remains)"
2015-11-15 10:26:19 +02:00
Takuya ASADA
df9e7d542f dist: ignore bogus 'dist/ubuntu/debian/source/lintian-overrides' on ubuntu package building 2015-11-15 15:03:02 +09:00
Takuya ASADA
d40ce51a01 dist: fix 'init.d-script-missing-dependency-on-remote_fs' message on building ubuntu package 2015-11-15 15:03:02 +09:00
Takuya ASADA
87ed97c387 dist: fix 'init.d-script-not-included-in-package' message on building ubuntu package 2015-11-15 15:03:02 +09:00
Takuya ASADA
100c22cfa5 dist: place 'default' file by debian way 2015-11-15 15:03:02 +09:00
Takuya ASADA
7c4c182268 dist: move redhat/sysconfig to common/sysconfig
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-15 15:03:02 +09:00
Takuya ASADA
0c9f837d04 dist: fix 'source-is-missing' message on building ubuntu package 2015-11-15 15:03:02 +09:00
Takuya ASADA
580ca26092 dist: fix 'extended-description-line-too-long' message on building ubuntu package 2015-11-15 15:03:02 +09:00
Takuya ASADA
ee45201b5a dist: make ubuntu package as 'debian non-native package'
Debian package system has two types of package, 'native' and 'non-native'.
'native' is the package just for Debian, it contains debian/ directory source tar.gz, doesn't have debian.tar.gz.
'non-native' has orig.tar.gz which is upstream source code tar ball, then it has debian.tar.gz which contains debian/ directory.
Scylla is 'native' now but should be 'non-native' since this is not just for Debian, so move debian/ to dist/ubuntu/, make orig.tar.gz using git-archive-all, copy dist/ubuntu/debian/ to debian/ then generate debian.tar.gz.
2015-11-15 15:03:02 +09:00
Takuya ASADA
c9c9a86195 dist: fix 'maintainer-script-needs-depends-on-adduser postinst' message on building ubuntu package
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-15 15:03:02 +09:00
Takuya ASADA
922f747122 dist: ignore some files under debian/ which are created on building time
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-15 15:03:02 +09:00
Takuya ASADA
10a7bd2d4a dist: fix 'wrong-section-according-to-package-name' and 'debug-package-should-be-priority-extra' message on building ubuntu package
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-15 15:03:02 +09:00
Takuya ASADA
63e262e507 correct permission of cassandra-rackdc.properties
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-15 15:03:02 +09:00
Takuya ASADA
91bd7cf71a dist: fix 'maintainer-script-lacks-debhelper-token' message on building ubuntu package
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-15 15:03:02 +09:00
Takuya ASADA
89fe283506 dist: fix 'ancient-standards-version 3.9.2' message on building ubuntu package
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-15 15:03:02 +09:00
Takuya ASADA
c09226fda8 dist: fix 'missing-license-paragraph-in-dep5-copyright' message on building ubuntu package
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-15 15:03:02 +09:00
Avi Kivity
36994a5d08 managed_bytes: add a constructor from std::initializer_list<>
Not actually used in the patchset now, but nice.
2015-11-13 17:13:07 +02:00
Avi Kivity
f3afe3e876 allocation_strategy: constify migrate_fn
Since abstract_type will be providing our migrate_fn, they must be const,
and indeed a migration does not change the migration function.
2015-11-13 17:13:07 +02:00
Avi Kivity
a4a776e66c cql3: add operation::make_cell() helpers
atomic_cell will soon become type-aware, so add helpers to class operation
that can supply the type, as it is available in operation::column.type.

(the type will be used in following patches)
2015-11-13 17:13:07 +02:00
Avi Kivity
79f7431a03 db: change collection_mutation::{one,view} not to use nested classes
Nested classes cannot be forward-declared, so change the naming
not to use them.  Follows atomic_cell{,_view}.
2015-11-13 17:13:07 +02:00
Avi Kivity
3fcb7add2e types: fix concrete_type::native_type_move()
The source is modified during a move, and so must not be const.
2015-11-13 17:13:07 +02:00
Avi Kivity
68a902ad0c data_value: add constructor from bool
schema_tables manages some boolean columns stored in system tables; it
dynamically creates them from C++ values.  But as we lacked bool->data_value
conversion, the C++ value was converted to a int32_type.  Somehow this didn't
cause any problems, but with some pending patches I have, it does.

Add a bool->data_value converting constructor to fix this.
2015-11-13 17:13:07 +02:00
Avi Kivity
47499dcf18 data_value: make conversion from bytes explicit
Since bytes is a very generic value that is returned from many calls,
it is easy to pass it by mistake to a function expecting a data_value,
and to get a wrong result.  It is impossible for the data_value constructor
to know if the argument is a genuine bytes variable, a data_value of another
type, but serialized, or some other serialized data type.

To prevent misuse, make the data_value(bytes) constructor
(and complementary data_value(optional<bytes>) explicit.
2015-11-13 17:12:29 +02:00
Asias He
2ebd18a248 streaming: Add two more virtual destructors
For stream_event_handler and stream_task.
2015-11-13 09:55:19 +02:00
Asias He
25a03013bf storage_service: Fix cql and thrift server stop in decommission
When do_stop_native_transport exits, cserver is destroyed which can
happen before cserver->stop(). Fix by capturing cserver in
cserver->stop()'s continuation to extend its lifetime. The same for
thrift server.

scylla: scylla/seastar/core/sharded.hh:327: seastar::sharded<Service>::~sharded()
[with Service = transport::cql_server]: Assertion `_instances.empty()' failed.
2015-11-13 09:40:07 +02:00
Tomasz Grabiec
b883ed3abf paging_state: Move instead of copy when possible 2015-11-12 20:19:00 +02:00
Glauber Costa
fa1ae45218 database: export collectd metrics about the state of memtable flushing
When analyzing a recent performance issue, I found helpful to keep track of
the amount of memtables that are currently in flight, as well as how much memory
they are consuming in the system.

Although those are memtable statistics, I am grouping them under the "cf_stats"
structure: being the column family a central piece of the puzzle, it is reasonable
to assume that a lot of metrics about it would be potentially welcome in the future.

Note that we don't want to reuse the "stats" structure in the column family: for once,
the fields not always map precisely (pending flushes, for instance, only tracks explicit
flushes), and also the stats structure is a lot more complex than we need.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-11-12 20:17:22 +02:00
Amnon Heiman
cda79c9a31 API: add get_natural_endpoints to storage_service
This adds the get_natural_endpoints implementation to the
storage_service API.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-12 14:55:19 +02:00
Amnon Heiman
3efa52c345 storage_service: add get_natural_endpoints implementation
This adds the get_natural_endpoints implementation similiar to origin.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-12 14:53:06 +02:00
Tomasz Grabiec
f3f2bf0b44 schema: Move definitions to source file 2015-11-12 13:50:01 +02:00
Pekka Enberg
551f5585bf gms/gossip_digest_syn: Make get_gossip_digests() const
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-11-12 11:51:28 +02:00
Avi Kivity
6a9ed4a4eb Merge seastar upstream
* seastar 5c10d3e...20bf03b (5):
  > do not re-throw exception to get to an exception pointer
  > Adding timeout counter to the rpc
  > configure.py: support for pkg-config before release 0.28
  > future: don't forget to warn about ignored exception
  > tutorial: continue network API section
2015-11-12 11:19:52 +02:00
Asias He
6aa5bfe59f range_streamer: Add virtual destructor to i_source_filter
Found by debug build

==10190==ERROR: AddressSanitizer: new-delete-type-mismatch on 0x602000084430 in thread T0:
  object passed to delete has wrong type:
  size of the allocated type:   16 bytes;
  size of the deallocated type: 8 bytes.
    #0 0x7fe244add512 in operator delete(void*, unsigned long) (/lib64/libasan.so.2+0x9a512)
    #1 0x3c674fe in std::default_delete<dht::range_streamer::i_source_filter>::operator()(dht::range_streamer::i_source_filter*)
       const /usr/include/c++/5.1.1/bits/unique_ptr.h:76
    #2 0x3c60584 in std::unique_ptr<dht::range_streamer::i_source_filter, std::default_delete<dht::range_streamer::i_source_filter> >::~unique_ptr()
       /usr/include/c++/5.1.1/bits/unique_ptr.h:236
    #3 0x3c7ac22 in void __gnu_cxx::new_allocator<std::unique_ptr<dht::range_streamer::i_source_filter,
       std::default_delete<dht::range_streamer::i_source_filter> > >::destroy<std::unique_ptr<dht::range_streamer::i_source_filter,
       std::default_delete<dht::range_streamer::i_source_filter> > >(std::unique_ptr<dht::range_streamer::i_source_filter,
       std::default_delete<dht::range_streamer::i_source_filter> >*) /usr/include/c++/5.1.1/ext/new_allocator.h:124
...
2015-11-12 11:19:22 +02:00
Avi Kivity
47c7dd96c5 Merge "Aggregate paging support" from Calle
"This adds repeated paged querying to do aggregate queries (similar to
origin). Uses "batched" paging."

Fixes #549
2015-11-11 20:47:14 +02:00
Calle Wilund
fdc549cd47 select_statement: Handle aggregate queries
Fixes #549.

Being clinically absent-minded, aggregate query support (i.e. count(...))
was left out of the "paging" change set.

This adds repeated paged querying to do aggregate queries (similar to
origin). Uses "batched" paging.
2015-11-11 18:41:47 +01:00
Calle Wilund
cc88763961 query_pager: Add method for repeated queries
fetch_page method that instead of returning a full result set, adds row
to a pre-existing one. For "batching".
2015-11-11 18:40:14 +01:00
Amnon Heiman
27737d702b API: Stubing the compaction manager - workaround
Until the compaction manager api would be ready, its failing command
causes problem with nodetool related tests.
Ths patch stub the compaction manager logic so it will not fail.

It will be replaced by an actuall implementation when the equivelent
code in compaction will be ready.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-11 16:58:13 +02:00
Amnon Heiman
66e428799f API: add compaction info object
This patch adds a compaction info object and an API that returns it.
It will be mapped to the JMX getCompactions that returns a map.

The use of an object is more RESTFull and will be better documented in
the swagger definition file.
2015-11-11 16:57:44 +02:00
Amnon Heiman
1b369be663 compaction_strategy should accept both class name and full class name
For compatibility reasons, compaction_strategy should accept both class
name strategy and the full class name that includes the package name.

In origin the result name depends on the configuration, we cannot mimic
that as we are using enum for the type.

So currently the return class name remains the class itself, we can
consider changing it in the future.

If the name is org.apache.cassandra.db.compaction.Name the it will be
compare as Name

The error message was modified to report the name it was given.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Fixes #545
2015-11-11 15:31:39 +02:00
Avi Kivity
b95ff5c09e Merge "Commitlog format change: add scylla "magic"" from Calle
"Slight file format change for commitlog segments, now incluing
a scylla "marker". Allows for fast-fail if trying to load an
Origin segment.

WARNING: This changes the file format, and there is no good way for me to
check if a CL is "old" scylla, or Origin (since "version" is the same). So
either "old" scylla files also fail, or we never fail (until later, and
worse). Thus, if upgrading from older to this patch ensure to
have cleaned out all commit logs first."
2015-11-11 11:42:47 +02:00
Avi Kivity
dcc7302312 Merge "Paging support" from Calle
Fixes #355

"Implements query paging similar to origin. If driver sets a "page size" in
a query, and we cannot know that we will not exceed this limit in a single
query, the query is performed using a "pager" object, which, using modified
partition ranges and query limits, keeps track of returned rows to "page"
through the results.

Implementation structure sort of mimics the origin design, even though it
is maybe a little bit overkill for us (currently). On the other hand, it
does not really hurt.

This implementation is tested using the "paging_test" subset in dtest.
It passes all test except:

* test_paging_using_secondary_indexes
* test_paging_using_secondary_indexes_with_static_cols
* test_failure_threshold_deletions

The two first because we don't have secondary indexes yet, the latter
because the test depends on "tombstone_failure_threshold" in origin.

Potential todo: Currently the pager object does not shortcut result
building fully when page limit is exceeded. Could save a little work
here, but probably not very significant."
2015-11-11 10:45:41 +02:00
Avi Kivity
5aecf210e2 Merge "gossip shutdown fix + streaming fix" from Asias
Fixes: #540 #542
2015-11-11 10:27:43 +02:00
Asias He
efda753c0c token_metadata: Implement pending_endpoints_for
It is used in storage_proxy::create_write_response_handler. The second
argument should be keyspace name instead of the keyspace class.

Refs: #539
2015-11-11 09:41:21 +02:00
Calle Wilund
43712a583d commitlog_replayer: Special case exception from "old/origin file"
And write some nice informative stuff.
2015-11-10 17:14:22 +01:00
Calle Wilund
85b8d65374 commitlog: Change file format to include magic marker
Allows us fail fast if someone tries to replay an Origin commit log.

WARNING: This changes the file format, and there is no good way for me to
check if a CL is "old" scylla, or Origin (since "version" is the same). So
either "old" scylla files also fail, or we never fail (until later, and
worse). Thus, if upgrading from older to this patch, likewise, ensure to
have cleaned out all commit logs first.
2015-11-10 17:11:06 +01:00
Glauber Costa
72573d0b46 storage proxy: be more vocal about timeouts
If a timeout happens, we should log it. Trace level is really
not adequate for this.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-11-10 16:41:22 +02:00
Calle Wilund
ecd7674867 select_statement: Paging
Check if paging might be needed, and if so, use a paging object.
Similar to origin, but without all the filters.
2015-11-10 13:16:06 +01:00
Calle Wilund
13189c3176 query_pagers: Paging implementation
* Static query method to determine if paging might be required
(very conservative - almost all querys will be paged me thinks).
* Static factory method for pager
* Actual pager implementation

Pager object uses three variables to keep track of paging state:
1.) Last partition key - partition key of last partion processed 
	-> next partition to start process
2.) Last clustering key, i.e. row offset within last key partition, 
	i.e. how far we got last time
3.) Max remaining - max rows to process further, i.e. initial limit - 
    processed so far
    
Partition ranges are modified/removed so that we begin with "Last key", 
if present. (Or end with, in the case of reversed processing)

A counting visitor then keeps count of rows to include in processing.
2015-11-10 13:16:05 +01:00
Calle Wilund
bef3604f5a query_pager: Pure virtual interface
Basic interface for paging control objects.

We probably do not need virtual behaviour for paging, but on the other
hand it does not really cost much, and it keeps a nice symmetry with
origin.
2015-11-10 13:12:34 +01:00
Calle Wilund
284b10cabe Make partition_slice::row_ranges mulitplex on partition
Allows for having more than one clustering row range set, depending on
PK queried (although right now limited to one - which happens to be exactly
the number of mutiplexing paging needs... What a coincidence...)

Encapsulates the row_ranges member in a query function, and if needed holds
ranges outside the default one in an extra object.

Query result::builder::add_partition now fetches the correct row range for
the partition, and this is the range used in subsequent iteration.
2015-11-10 13:12:33 +01:00
Calle Wilund
d8cafa8dec cql_server::connection::read_options: Deserialize paging state 2015-11-10 13:12:33 +01:00
Calle Wilund
545d3151e2 paging_state implementation
Note: serial format blob is different compared to origin, due to scyllas
different internal architecture. I.e. we query actual rows.
But drivers etc ignore the content of the blob, it is opaque.
2015-11-10 13:12:33 +01:00
Calle Wilund
7e2569c680 result_set: Make "paging_state" a pointer to const member
Paging state is immutable. Constness is nice.
2015-11-10 13:12:33 +01:00
Calle Wilund
4a1a17defc cql3::selection: Move result set building visitor to result_set_builder
Allows its use (and partial override - hint hint) in more place than
one.
2015-11-10 13:12:33 +01:00
Calle Wilund
820ba3540b result_view: Add static helper method to do the dispatch from select
Again, makes it easier to use the same optimization/technique at more than
one location.
2015-11-10 13:12:33 +01:00
Calle Wilund
23b6240dad cql3::selection: Fix some constness correctness 2015-11-10 13:12:33 +01:00
Calle Wilund
369e09459c cql3::result_set: Add non-const metadata getter. 2015-11-10 13:12:33 +01:00
Calle Wilund
0fa543800a data_output: Template "blob" writers (bytes*) to allow for varying "size" type 2015-11-10 13:12:33 +01:00
Calle Wilund
9ee8204993 data_input: Fix missing bounds check 2015-11-10 13:12:33 +01:00
Asias He
19a6dfcfd0 streaming: stream_session print stream_session_state properly 2015-11-10 15:39:34 +08:00
Asias He
7506d57dec streaming: Add operator<< for stream_session_state 2015-11-10 15:39:34 +08:00
Asias He
860c7aff37 streaming: Print plan_id in logger 2015-11-10 15:39:34 +08:00
Asias He
d2e5d13e69 streaming: Set state to STREAMING only if we really have data to sent 2015-11-10 15:39:34 +08:00
Asias He
fcf7486d4c streaming: Improve state transition log for maybe_completed and complete 2015-11-10 15:39:34 +08:00
Asias He
72a7a6bd9b streaming: session close
Currently, there are multiple places we can close a session, this makes
the close code path hard to follow. Remove the call to maybe_completed
in follower_start_sent to simplify closing a bit.

- stream_session::follower_start_sent -> maybe_completed()
- stream_session::receive_task_completed -> maybe_completed()
- stream_session::transfer_task_completed -> maybe_completed()
- on receive of the COMPLETE_MESSAGE -> complete()
2015-11-10 15:39:34 +08:00
Asias He
cadf8b1484 streaming: Handle stream_plan with no range added
If no ranges for neither sending nor receiving are added for the stream
plan, the stream plan is empty. Return a ready future immediately.
2015-11-10 15:39:34 +08:00
Asias He
13934140f6 streaming: Remove bogus file size info for prepare completed
Scylla does not streaming sstable files directly for streaming. The file
size info is incorrect, let's get rid of it.
2015-11-10 15:39:34 +08:00
Asias He
ac1977486d gossip: Fix shutdown
nodetool decommission node 127.0.0.2, on node 127.0.0.1, I saw:

DEBUG [shard 0] gossip - failure_detector: Forcing conviction of 127.0.0.1
TRACE [shard 0] gossip - convict ep=127.0.0.1, phi=8, is_alive=1, is_dead_state=0
TRACE [shard 0] gossip - marking as down 127.0.0.1
INFO [shard 0] gossip - inet_address 127.0.0.1 is now DOWN
DEBUG [shard 0] storage_service - on_dead endpoint=127.0.0.1

This is wrong since the argument for send_gossip_shutdown should be the
node being shutdown instead of the live node.
2015-11-10 15:39:34 +08:00
Avi Kivity
fee9688ae8 Merge "Fixes for collections of collections" from Paweł
"These are some fixes for removing items from collections of frozen
collections."
2015-11-10 09:38:13 +02:00
Paweł Dziepak
e494b2c1a0 tests/cql: add tests for collections of collections
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-09 16:37:50 +01:00
Paweł Dziepak
ee182f39a5 cql3/sets: simplify sets::discarder
Since the introduction of sets::element_discarder sets::discarder is
always given a set, never a single value.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-09 15:24:25 +01:00
Paweł Dziepak
0b0cef2457 cql3/maps: do not assume that the term type is constants::value
Fixes #516.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-09 15:18:07 +01:00
Paweł Dziepak
101ee1affd cql3/sets: add element_discarder
Currently sets::discarder is used by both set difference and removal of
a single element operations. To distinguish between them the discarder
checks whether the provided value is a set or something else, this won't
work however if a set of frozen sets is created.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-09 14:47:09 +01:00
Tomasz Grabiec
488528d1cd range: Fix range::subtract() for some cases
Was not handling wrap-around cases like this properly:

 (8, 3) - (2, 1)
2015-11-09 14:20:31 +02:00
Glauber Costa
8e0ad183b9 dist: AMI: add discard mount option
Crucial on SSDs, inocuous (hopefully) on HDDs. Let's add it to make peace
with our inner selves.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-11-09 13:31:01 +02:00
Asias He
d622fe867e gossip: Pass const ref if possible
It is clear that we will not change the parameter.
2015-11-09 13:01:37 +02:00
Asias He
615e1362da gossip: Add const version for get_heart_beat_state and friends
get_endpoint_state_map
get_application_state_map
2015-11-09 13:01:36 +02:00
Gleb Natapov
d77a2a0f03 do not try to write same memtable to sstable twice if moving it to a cache failed.
Error handling in column_family::try_flush_memtable_to_sstable() is
misplaced. It happens after update_cache(), so writing sstable may
have succeeded, but moving memtable into the cache may have failed.
update_cache() destroys memtable even if it fails, but error handler
is not aware of it (it does not even distinguish whether error happened
during sstable creation or moving into cache) and when it tells caller
to retry it retries with already destroyed memtable. Fix it by ignoring
moving to cache errors.
2015-11-09 11:27:37 +01:00
Avi Kivity
cb93af2ad7 Revert "do not try to write same memtable to sstable twice if moving it to a cache failed."
This reverts commit fff37d15cd.

Says Tomek (and the comment in the code):

"update_cache() must be called before unlinking the memtable because cache + memtable at any time is supposed to be authoritative source of data for contained partitions. If there is a cache hit in cache, sstables won't be checked. If we unlink the memtable before cache is updated, it's possible that a query will miss data which was in that unlinked memtable, if it hits in the cache (with an old value)."
2015-11-09 11:22:12 +02:00
Avi Kivity
975a3eb49f Merge "move node support + decommission fix" from Asias
"Usage example:

$ nodetool -p 7200 move -- "-9223372036854775000"

or

$ curl -X POST --header "Content-Type: application/json" --header "Accept: application/json"
  "http://127.0.0.2:10000/storage_service/move?new_token=8000000"

Note: Tomek's range subtract and cql_test_env fix for system_keyspace seriers are included."
2015-11-09 11:08:51 +02:00
Tzach Livyatan
0e9b3e4d37 add build and run from source instructions in README.md
fix #451

Signed-off-by: Tzach Livyatan <tzach@cloudius-systems.com>
2015-11-09 10:05:05 +02:00
Gleb Natapov
fff37d15cd do not try to write same memtable to sstable twice if moving it to a cache failed.
Error handling in column_family::try_flush_memtable_to_sstable() is
misplaced. It happens after update_cache(), so writing sstable may
have succeeded, but moving memtable into the cache may have failed.
update_cache() destroys memtable even if it fails, but error handler
is not aware of it (it does not even distinguish whether error happened
during sstable creation or moving into cache) and when it tells caller
to retry it retries with already destroyed memtable. Fix it by ignoring
moving to cache errors.
2015-11-09 09:56:45 +02:00
Asias He
85b42e362b storage_service: Fix hang of decommission
nodetool decommission hangs forever due to a recursive lock.

decommission()
  with api lock
  shutdown_client_servers()
     with api lock
       stop_rpc_server()
     with api lock
       stop_native_transport()

Fix it by calling helpers for stop_rpc_server and stop_native_transport
without the lock.
2015-11-09 08:43:04 +08:00
Asias He
cb8b0eedfc token_metadata: Fix set_difference in calculate_pending_ranges
std::set_difference requires the container to be sorted.
2015-11-09 08:43:04 +08:00
Asias He
87292d6a16 range_streamer: Simplify unordered_multimap_to_unordered_map
operator[] is own friend, it creates map[x] if x is not in the map.
2015-11-09 08:43:04 +08:00
Asias He
a54989cd65 range_streamer: Fix get_all_ranges_with_strict_sources_for
std::set_difference requires the container to be sorted which is not
true here, use remove_if.

Do not use assert, use throw instead so that we can recover from this
error.
2015-11-09 08:43:04 +08:00
Asias He
145c122cb7 storage_service: Enable commented out in on_restart
on_dead is now available.
2015-11-09 08:43:04 +08:00
Asias He
1ab1a1d8d9 storage_service: Fix bogus log in on_dead 2015-11-09 08:43:04 +08:00
Asias He
9f69b4c969 storage_service: Enable add_moving_endpoint in handle_state_moving 2015-11-09 08:43:04 +08:00
Asias He
c90e9c97f5 token_metadata: Add add_moving_endpoint 2015-11-09 08:43:04 +08:00
Asias He
0fc6e76d1e storage_service: Make fd a reference in remove_node 2015-11-09 08:43:04 +08:00
Asias He
2d55e5e8ff api: Implement storage_service::move
Start node 1 and node 2:

node 1:
$ scylla --initial-token 0

node 2:
$ scylla --initial-token 2000000

To move node 2 to a new token 8000000

$ curl -X POST --header "Content-Type: application/json" --header "Accept: application/json"
"http://127.0.0.2:10000/storage_service/move?new_token=8000000"
2015-11-09 08:43:04 +08:00
Asias He
d124df17b4 storage_service: Use get_local_tokens in set_tokens
We are using _bootstrap_tokens since get_local_tokens was not available.
2015-11-09 08:43:04 +08:00
Asias He
e637faefbe storage_service: Implement move
Moves the node on the token ring to a new token.

$ nodetool move <new token>
2015-11-09 08:43:04 +08:00
Asias He
b12d95d955 storage_service: Implement calculate_to_from_streams 2015-11-09 08:43:04 +08:00
Asias He
d166b0f3fa range_streamer: Add get_work_map 2015-11-09 08:43:04 +08:00
Asias He
ada2466e18 token_metadata: Add clone_after_all_settled
Needed by storage_service::range_relocator::calculate_to_from_streams.
2015-11-09 08:43:04 +08:00
Asias He
47cf4dfbc5 storage_service: Implement range_relocator class 2015-11-09 08:43:04 +08:00
Asias He
53d30ed985 storage_service: Implement calculate_stream_and_fetch_ranges
This is needed by calculate_to_from_streams.
2015-11-09 08:42:56 +08:00
Tomasz Grabiec
3c4c83c66f cql_test_env: Initialize system keyspace 2015-11-09 08:42:53 +08:00
Tomasz Grabiec
1c9d8571a8 cql_test_env: Flatten stop() 2015-11-09 08:42:51 +08:00
Tomasz Grabiec
1c1a4c9656 tests: Add test for range::subtract() 2015-11-09 08:42:47 +08:00
Tomasz Grabiec
0f2727b183 range: Implement subtract() 2015-11-09 08:42:45 +08:00
Tomasz Grabiec
48c7f51966 range: Extract bound comparators 2015-11-09 08:42:42 +08:00
Tomasz Grabiec
f37512d1f7 range: Make unwrap() return ranges in-order 2015-11-09 08:42:39 +08:00
Gleb Natapov
9abd724b30 storage_proxy: fix mutation verb error reporting
Currently error code is attached to a future returned by when_all() which
is never is exceptional one, but it may hold exceptional future as a
first element. Move error handling close to where error it tries to
catch is generated instead.
2015-11-08 16:35:41 +02:00
Raphael S. Carvalho
4eb94fbc35 sstables: improve exception handling in compact_sstables
Let's move the code that prints that a compaction succeeded only
after the code that catches exception on either read or write
fibers. Let's also get rid of done and use repeat instead in
the read fiber.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-08 11:24:55 +02:00
Takuya ASADA
368ee17d53 dist: prevent error building ubuntu package on EC2
fixes #491

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-06 22:06:55 +02:00
Gleb Natapov
120ae08082 storage_proxy: fix race during write completion
If write timeout and last acknowledgement needed for CL happen simultaneously
_ready can be sent to be exceptional by the timeout handler, but since
removal of the response handler happens in continuation it may be
reordered with last ack processing and there _ready will be set again
which will trigger assert. Fix it by removing the handler immediately,
no need to wait for continuation. It makes code simpler too.
2015-11-05 19:35:18 +02:00
Paweł Dziepak
64f1c2866c lsa: free segment in trim_emergency_reserve_to_max()
_emergency_reserve is an intrusive containers and it doesn't care about
segment lifetime.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-05 18:04:38 +02:00
Amnon Heiman
5f1cae67ff API: compaction_manager should catch the field reference by value
The get_cm_stats gets a pointer to a field in the stats object. It
should capture it by value or segmentation falut may occure when the
caller gets out of scope.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-05 14:17:12 +02:00
Avi Kivity
c9f02ce42f Merge seastar upstream
* seastar 05b41b8...5c10d3e (3):
  > reactor: more conservative signal handling
  > output_stream: add write(temporary_buffer) variant
  > input_stream: add read() method
2015-11-05 13:20:26 +02:00
Takuya ASADA
b50a6a1521 dist: add missing dist/ubuntu/changelog.in
Missing file for d2dd6b90a9

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-05 11:09:32 +02:00
Glauber Costa
958abc17a0 sstables: be more descriptive with filename parsing errors
Currently, we don't let the user know even what is the filename that failed.
That information should be included in the message.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-11-05 10:45:59 +02:00
Raphael S. Carvalho
279a8f13ee sstables: compaction: handle failure on read fiber
This assert (in write fiber) would fail if read fiber failed
because the variable done will not be set to true.
The use of assert is very bad, because it prevents scylla
from proceeding, which is possible.
To solve it, let's trigger an exception if done is not true.
We do have code that will wait for both read and write fibers,
and catch exceptions, if any.

Closes #523.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-04 16:48:01 +01:00
Tomasz Grabiec
598f4104fd query_processor: Use the correct client_state
Since 4641dfff24, query_state keeps a
copy of client_state, not a reference. Therefore _cl is no longer
updated by queries using _qp. Fix by using the client_state from _qp.

Fixes #525.
2015-11-04 12:43:26 +02:00
Takuya ASADA
ec06fd77a8 dist: split debug symbol to sepalate package on ubuntu
Fixes #524

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-04 12:35:35 +02:00
Asias He
0304567c09 storage_service: Implement get_new_source_ranges
It is needed by restore_replica_count.
2015-11-04 12:21:00 +02:00
Takuya ASADA
d2dd6b90a9 dist: support ./SCYLLA-VERSION-GEN on ubuntu package
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-04 11:50:38 +02:00
Avi Kivity
a41ffb3f4b Merge "support initial_token config option" from Asias
"This is needed by some tests to specify specific tokens."
2015-11-04 11:42:39 +02:00
Paweł Dziepak
a553c9236e transport: do not send response with unknown protocol version
All responses sent from the server have protocol version set to
connection::_version which is set to the version used by the client
in its first message. However, if the protocol version used by the
client is unuspported or invalide the server should use the latest
version it recognizes.

This solves problem with version negotiation with Java driver. The
driver first sends a request in the latest version it recognizes, if
that fails it retries with the version that server has used in the error
message. If that fails as well it gives up. However, since Scylla always
responds with the same version that the client has used the negotiation
always fails if the client supports more protocol version than the
server.

Refs #317.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-04 08:59:23 +02:00
Asias He
ed313160c2 storage_service: Add initial_token config option support 2015-11-04 10:42:17 +08:00
Asias He
20ecb0bede database: Introduce get_initial_tokens
Get initial tokens specified by the initial_token in scylla.conf.

E.g.,

   --initial-token "-1112521204969569328,1117992399013959838"
   --initial-token "1117992399013959838"

It can be multiple tokens split by comma.
2015-11-04 10:40:12 +08:00
Asias He
16596385ee token: Handle minimum token correctly in long_token
Fixes:

Exiting on unhandled exception of type 'runtime_exception': runtime
error: Invalid token. Should have size 8, has size 0
2015-11-04 09:01:06 +08:00
Avi Kivity
a060610347 Merge seastar upstream
* seastar 258daf9...05b41b8 (5):
  > resource: fix system memory reservation
  > Add .gitattributes to improve 'git diff' output
  > README: remove irrelevant OSv instructions
  > Add invocation of ninja in Ubuntu section of README
  > collectd: remove incorrect license
2015-11-04 00:18:06 +02:00
Avi Kivity
012f13eb53 Merge "Adding describering API" from Amnon
"This series adds the missing functionality that the nodetool describering would work.

It import the missing functionality from origin.
After this patch the API:
GET /storage_service/describe_ring/{keyspace}

will be available"
2015-11-03 16:54:13 +02:00
Amnon Heiman
ef3c6b2647 API: Add describe ring API implementation
This patch chanages the API to support describe ring instead of describe
ring jmx that will be implemented in the jmx server.

The API will return a list of objects instead of string.

An additional api was added as the equivelent to the jmx call with an
empty param.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-03 16:33:39 +02:00
Amnon Heiman
3809565dbd storage_service: Add method implementation from origin
This patch adds the following methods implementation:
getRpcaddress
getRangeToAddressMap
getRangeToAddressMapInLocalDC
describeRing
getAllRanges

Those methods are used as part of the describe_ring method
implementation.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-03 16:33:20 +02:00
Gleb Natapov
28bb6a3efe messaging_service: fix hanging reference access
Do not pass reference to an on-stack objects to a function that uses
its parameters asynchronously.
2015-11-03 12:10:38 +01:00
Tomasz Grabiec
7a2ac628a0 Fix compilation of types_test 2015-11-03 11:02:35 +01:00
Avi Kivity
74faaa4698 types: implement date_type::from_string()
Luckily all the hard work was already done for timestamp_type.

Fixes #522.
2015-11-03 10:33:20 +01:00
Amnon Heiman
b77ec2bd6a Importing token_range and endpoint_details from origin
The storage server uses the token_range in origin to return inforamtion
about the ring.

This import the structures. The functionality in origin is redundant in
this case and was not imported.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-03 10:17:05 +02:00
Asias He
dafdac6731 ami: Improve scylla raid setup
Use all the disks except the one for rootfs for RAID0 which stores
scylla data. If only one disk is available warn the user since currently
our AMI's rootfs is not XFS.

[fedora@ip-172-31-39-189 ~]$ cat WARN.TXT
WARN: Scylla is not using XFS to store data. Performance will suffer.

Tested on AWS with 1 disk, 2 disks, 7 disk case.

(cherry picked from commit 49d6cba471)
2015-11-03 08:16:09 +08:00
Avi Kivity
389e035d19 Update scylla-ami submodule
* dist/ami/files/scylla-ami c6ddbea...3f37184 (1):
  > Update reflector URL

(cherry picked from commit 440b403089)
2015-11-03 08:12:51 +08:00
Takuya ASADA
a0feaf801c dist: add scylla.repo to fetch scylla rpms on ami
Mistakenly didn't included on yum repository for AMI patchset, but it's needed

Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
(cherry picked from commit 8587c4d6b3)
2015-11-03 08:01:09 +08:00
Takuya ASADA
a5a69279ef dist: update packages on ec2 instance first bootup
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
(cherry picked from commit aaeccdee60)
2015-11-03 07:50:43 +08:00
Takuya ASADA
db634dfe01 dist: use scylla repo on ami, instead of locally built rpms
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
(cherry picked from commit 8b98fe5a1c)
2015-11-03 07:50:33 +08:00
Takuya ASADA
63d16f5816 dist: move inline script to setup-ami.sh
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
(cherry picked from commit 2863b8098f)
2015-11-03 07:50:12 +08:00
Amnon Heiman
e562736e82 API: A workaround for nodetool cleanup
The nodetool cleanup command is used in many of the tests, because the
API call is not implemented it causes the tests to fail.

This is a workaround until the cleanup will be implemented, the method
return successfuly.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-02 17:38:48 +02:00
Amnon Heiman
80e5289359 Adding API as an unimplemented cause
Normally an API call that is not implemented should fail, there are
cases that as a workaround an API call is stub, in those cases a warning
is added to indicate that the API is not implemented.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-02 17:38:48 +02:00
Avi Kivity
c9b8c368bd Merge "Adding missing statistics to enable nodetool netstats" from Amnon
"The series adds read repair and storage_service server statistics."
2015-11-02 17:10:43 +02:00
Amnon Heiman
fc7ea6207c API: messaging service, Add getter for the pending and completed server
This patch adds getter for the messeging service server pending and sent
messages.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-02 16:17:47 +02:00
Amnon Heiman
55d7ddafe5 API: Add repond complete to the messaging service swagger
This patch do the following:
It adds a getter for the completed respond messages (i.e. the total
messages that were sent by the server)

It replaces the return mapping for the statistics to use the key, value
notation that is used in the jmx side.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-02 16:17:36 +02:00
Amnon Heiman
afb921a643 API: add read repair stats to the storage_proxy
This adds the API to the storage_proxy read repair stats that replaces
the stub implementation.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-02 16:17:22 +02:00
Amnon Heiman
b6034572dc storage_proxy: Add read repair statistics
This adds the read repair statistics to he storage_proxy stats and adds
to its implementation incrementing the counters value.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-02 16:16:40 +02:00
Amnon Heiman
d5d0653210 messaging_service: Add a function that goes over all the server stats
The API needs to get the stats from the rpc server, that is hidden from the
messaging service API.

This patch adds a foreach function that goes over all the server stats
without exposing the server implementation.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-02 16:15:52 +02:00
Avi Kivity
9c09f43311 Merge "add [background] read/write statistics" from Gleb
"The main objective of the series is to introduce statistics about ongoing
read/writes and especially those that are done in the background (acknowledged,
but uncompleted), but it contains some cleanups as well."
2015-11-02 15:46:07 +02:00
Avi Kivity
e2b661bb78 sstables: fix compile error 2015-11-02 15:32:44 +02:00
Gleb Natapov
9af55b3a7b storage_proxy: add ongoing read statistics
Add statistics for ongoing reads and ongoing background reads. Read is
a background one if it was acknowledged, but there still work to do to
complete it.
2015-11-02 15:02:13 +02:00
Gleb Natapov
463d330a24 storage_proxy: do not capture 'this' needlessly. 2015-11-02 15:01:16 +02:00
Gleb Natapov
5067d027ba storage_proxy: export statistics via collectd 2015-11-02 15:01:16 +02:00
Gleb Natapov
287cf894a0 storage_proxy: count background mutation writes
Count how many writes are running in the background. Write is a
background write if it was acknowledged, but there still work to do to
complete it.
2015-11-02 15:01:12 +02:00
Gleb Natapov
afa61cb87c storage_proxy: separate methods from members with access specifier 2015-11-02 15:00:47 +02:00
Gleb Natapov
62e5ad27d8 storage_proxy: devirtualize total_block_for()
total_block_for() is defined only in base class, so make it non virtual.
2015-11-02 15:00:47 +02:00
Gleb Natapov
59d8f9f392 storage_proxy: move proxy pointer to write response handler
We will need it later there.
2015-11-02 15:00:47 +02:00
Gleb Natapov
ca0177f5cc storage_proxy: simplify timeout handling
Use _cl_achieved instead of calculating if there is enough acks.
2015-11-02 15:00:47 +02:00
Gleb Natapov
4838de008f storage_proxy: get rid of semaphore in mutation write response handler
Acks are counted in _cl_acks, so there is no need to count them once
more in semaphore, just make promise available when there is enough of
them.
2015-11-02 15:00:47 +02:00
Gleb Natapov
9381ad0741 storage_proxy: initialize _stats 2015-11-02 15:00:47 +02:00
Avi Kivity
b1d13636bb Merge "locator: use the correct snitch instance in gossiping_property_file_snitch::reload_configuration()" from Vlad
"Commit 4cd9c4c0c5441cf55e280c6f2f2e5529426b9c98 introduced a minor
issue: a wrong snitch instance may be used when updating a Gossiper state
(if I/O CPU is different from CPU0).

In order to fix this issue a local snitch instance on CPU0 should be used,
just like a Gossiper local instance.

We have to move some interfaces to i_endpoint_snitch
from being private in a gossiping_property_file_snitch in order to be
able to access it using snitch_ptr handle."
2015-11-02 14:09:27 +02:00
Vlad Zolotarov
b654d942b4 locator::gossiping_property_file_snitch: don't ignore a returned future
Don't ignore yet another returned future in reload_configuration().

Since commit 5e8037b50a
storage_service::gossip_snitch_info() returns a future.

This patch takes this into an account.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-11-02 13:44:53 +02:00
Vlad Zolotarov
689d5fb000 locator::gossiping_property_file_snitch: fix in reload_configuration()
When we access a gossiper instance we use a _gossip_started
state of a snitch, which is set in a gossiper_starting() method.

gossiper_starting() method however is invoked by a gossiper on CPU0
only therefore the _gossip_started snitch state will be set for an
instance on CPU0 only.

Therefore instead of synchronizing the _gossip_started state between
all shards we just have to make sure we check it on the right CPU,
which is CPU0.

This patch fixes this issue.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-11-02 13:44:53 +02:00
Vlad Zolotarov
5da4e62a59 locator::i_endpoint_snitch: align the _prefer_local parameter with _my_dc and _my_rack
Adjust the interface and distribution of prefer_local parameter read
from a snitch property file with the rest of similar parameters (e.g. dc and rack):
they are read and their values are distributed (copied) across all shards'
instances.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-11-02 13:44:53 +02:00
Vlad Zolotarov
5042f3c952 locator::i_endpoint_snitch_base: make reload_gossiper_state() a virtual function
Make reload_gossiper_state() be a virtual method
of a base class in order to allow calling it using a snitch_ptr
handle.

A base class already has a ton of virtual methods so no harm is
done performance-wise. Using virtual methods instead of doing
dynamic_cast results in a much cleaner code however.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-11-02 13:44:53 +02:00
Vlad Zolotarov
926ce145db locator::i_endpoint_snitch_base: move _gossip_started to the base class
Move the member and add an access method.
This is needed in order to be able to access this state using
snitch_ptr handle.

This also allows to get rid of ec2_multi_region_snitch::_helper_added
member since it duplicates _gossip_started semantics.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-11-02 13:44:31 +02:00
Takuya ASADA
919d94a52b dist: update Ubuntu's g++ to 5.x
Now ubuntu-toolchain-r repo has g++-5 package for 14.04, so switch to it

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-02 13:31:19 +02:00
Avi Kivity
4e3d281402 Merge 2015-11-02 13:31:02 +02:00
Avi Kivity
d436617adc Merge seastar upstream
* seastar 9ae6407...258daf9 (6):
  > rpc server: Add pending and sent messages to server
  > scripts: posix_net_conf.sh: Use a generic logic for RPS configuring
  > scripts: posix_net_conf.sh: allow passing a NIC name as a parameter
  > doc: link to the tutorial
  > tutorial: begin documenting the network API
  > slab: remove bogus uintptr_t definition
2015-11-02 13:30:26 +02:00
Avi Kivity
c6968b2940 Make abstract_type::serialize(data_value) a member of data_value
Reduces duplication.  Fixes #517.
2015-11-02 12:25:35 +01:00
Asias He
66a576c0c0 tests: Add documents to tests/gossip.cc
- How to run the test
- What to expect
2015-11-02 11:29:43 +02:00
Avi Kivity
4b1fffe54c Merge "storage_service: serialize more management operations" from Asias
- Add run_with_no_api_lock
- Do not hold the api lock while do the streaming for rebuild
- Include "config: Enable storage_port option" in this series
2015-11-02 10:55:57 +02:00
Avi Kivity
107877f33a Merge "Futurize all callers of add_local_application_state" from Asias
"In 5e8037b50a (gossip: Futurize
add_local_application_state()) , we futurized add_local_application_state.
However, not all of the callers are futurized. Fix it up."
2015-11-02 10:22:01 +02:00
Asias He
f9ff98f4c4 tests: Fix ignored future for add_local_application_state 2015-11-02 09:34:17 +08:00
Asias He
6748bc686f load_broadcaster: Fix ignored future in start_broadcasting 2015-11-02 09:22:05 +08:00
Asias He
a8724a046e migration_manager: Fix ignored future in passive_announce
add_local_application_state returns future. Do not ignore it.
2015-11-02 09:21:55 +08:00
Asias He
e3c5a31e85 gossip: Futurize gossiper_starting
gossiper_starting calls gossiper::add_local_application_state which
returns a future, so futurize gossiper_starting as well.
2015-11-02 09:10:48 +08:00
Avi Kivity
6470cfc63e Merge "Updates for EC2Snitchs"
"- Fix snitch names from EC2XXX to Ec2XXX to align with configuration.
- Copy cassandra-rackdc.properties file to /var/lib/scylla/conf
- Set SCYLLA_HOME before booting process"
2015-11-01 18:52:23 +02:00
Shlomi Livne
67899b7b51 dist: set SCYLLA_HOME used to find the configuration and property files
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2015-11-01 17:31:47 +02:00
Shlomi Livne
f0cd241478 dist: fedora/rhel/centos copy cassandra-rackdc.properties
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2015-11-01 15:21:54 +02:00
Shlomi Livne
b35f56cf8d dist: ubuntu copy cassandra-rackdc.properties
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2015-11-01 15:21:34 +02:00
Shlomi Livne
50cdcbd255 Update snitch registration EC2MultiRegionSnitch --> Ec2MultiRegionSnitch
Update snitch EC2MultiRegionSnitch to Ec2MultiRegionSnitch,
org.apache.cassandra.locator.EC2MultiRegionSnitch to
org.apache.cassandra.locator.Ec2MultiRegionSnitch

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2015-11-01 15:21:26 +02:00
Shlomi Livne
72b27a91fe Update snitch registration EC2Snitch --> Ec2Snitch
Update EC2Snitch to Ec2Snitch, org.apache.cassandra.locator.EC2Snitch to
org.apache.cassandra.locator.Ec2Snitch

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2015-11-01 15:20:28 +02:00
Lucas Meneghel Rodrigues
6a2f5409c9 test.py: Use shlex to split the patch
Let's use shlex to do the path splitting instead of the
simple .split() call.

Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
2015-11-01 14:33:32 +02:00
Avi Kivity
212fcdc159 Remove kvm/ subdirectory
It's a seastar thing, not a Scylla thing.
2015-11-01 12:48:32 +02:00
Lucas Meneghel Rodrigues
c1dde5dc4d service/storage_service.cc: Fix control reaches end of non-void function error
During testing build, the debugging statement at the end
of the function body (after return statements) causes compilation to
fail due to the flag -Werror=return-type:

service/storage_service.cc: In member function ‘future<> service::storage_service::clear_snapshot(sstring, std::vector<basic_sstring<char, unsigned int, 15u> >)’:
service/storage_service.cc:1358:1: error: control reaches end of non-void function [-Werror=return-type]

Which traces back to 21f84d77. Let's attach a then_wrapped()
clause to parallel_for_each() adding the debug message as
suggested by Avi.

CC: Glauber Costa <glommer@scylladb.com>
Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
2015-11-01 11:59:44 +02:00
Asias He
5e8037b50a gossip: Futurize add_local_application_state()
We are ignoring the future returned by seastar::async. Futurize it so
caller can wait for the application state to be actually applied.

In addition, dropping the unused add_local_application_states function.
2015-11-01 11:20:52 +02:00
Avi Kivity
2c3591cbd9 data_value de-any-fication
We use boost::any to convert to and from database values (stored in
serlialized form) and native C++ values.  boost::any captures information
about the data type (how to copy/move/delete etc.) and stores it inside
the boost::any instance.  We later retrieve the real value using
boost::any_cast.

However, data_value (which has a boost::any member) already has type
information as a data_type instance.  By teaching data_type intances about
the corresponding native type, we can elimiante the use of boost::any.

While boost::any is evil and eliminating it improves efficiency somewhat,
the real goal is growing native type support in data_type.  We will use that
later to store native types in the cache, enabling O(log n) access to
collections, O(1) access to tuples, and more efficient large blob support.
2015-10-30 17:38:51 +01:00
Pekka Enberg
b478d9d8a6 Merge "Wire reconnectable_snitch_helper to gossiping_property_file_snitch" from Vlad
"gossiping_property_file_snitch checks its property
file (cassandra-rackdc.properties) for changes every minute and
if there were changes it re-registers the helper and initiates
re-read of the new DC and Rack values in the corresponding places.

Therefore we need the ability to unregister/register the corresponding subscriber
at the same time when a subscriber list is possibly iterated by
some other asynchronous context on the current CPU.

The current gossiper implementation assumes that subscribers list may not be
changed from the context different from the one that iterates on their list.

So, this had to be fixed.

There was also missing an update_endpoint(ep) interface in the locator::topology
class and the corresponding token_metadata::update_topology(ep) wrapper.

Also there were some bugs in the gossiping_property_file::reload_configuration()
method."
2015-10-30 15:48:42 +02:00
Pekka Enberg
a9a8d1e8a2 Merge "Fixes for dependency packages, fix upstart script" from Takuya
"Fixes #510, (part of) #493."
2015-10-30 15:41:40 +02:00
Lucas Meneghel Rodrigues
a3d26f699e test.py: Only print test stdout if non None, non empty
On hindsight, it doesn't make much sense to print an
empty string, so let's only print stdout if it's non
None, non empty.

Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
2015-10-30 09:11:07 +02:00
Avi Kivity
91a8cfc029 cql3: don't declare counter-related functions
This works now, but fails when an in-progress patch I'm working on calls
one of the type's methods.
2015-10-30 09:06:48 +02:00
Vlad Zolotarov
6b4b983f9d locator::gossiping_property_file_snitch: implement gossiper_starting() and reload_gossiper_state()
This functions were empty and now they have the intended code:
   - Register the reconnectable_snitch_helper if "prefer_local"
     parameter was given the TRUE value.
   - Set the application INTERNAL_IP state to listen_address().

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-30 00:16:53 +02:00
Vlad Zolotarov
17294a3bc7 locator::gossiping_property_file_snitch: fix some issues in reload_configuration()
- Invoke reload_gossiper_state() and gossip_snitch_info() on CPU0 since
     gossiper is effectively running on CPU0 therefore all methods
     modifying its state should be invoked on CPU0 as well.
   - Don't invoke any method on external "distributed" objects unless their
     corresponding per-shard service object have already been initialized.
   - Update a local Node info in a storage_service::token_metadata::topology
     when reloading snitch configuration when DC and/or Rack info has changed.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-30 00:16:53 +02:00
Vlad Zolotarov
b3504f9b1f locator::token_metadata: added topology::update_endpoint(ep) and update_topology(ep)
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-30 00:16:53 +02:00
Vlad Zolotarov
a3d55ba882 locator::reconnectable_snitch_helper: remove unused constructor parameter
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-30 00:16:53 +02:00
Vlad Zolotarov
33b195760b gms::gossiper: allow the modification of _subscribers while it's being iterated
Introduce a subscribers_list class that exposes 3 methods:
  - push_back(s) - adds a new element s to the back of the list
  - remove(s) - removes an element s from the list
  - for_each(f) - invoke f on each element of the list
  - make a subscriber_list store shared_ptr to a subscriber
    to allow removing (currently it stores a naked pointer to the object).

subscribers_list allows push_back() and remove() to be called while
another thread (e.g. seastar::async()) is in the middle of for_each().

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - Simplify subscribers_list::remove() method.
   - load_broadcaster: inherit from enable_shared_from_this instead
     of async_sharded_service.
2015-10-30 00:16:16 +02:00
Pekka Enberg
763b718418 Update 'docker run' instructions in README
Explicitly mention how to enable port forwarding so that external
applications are able to connect.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-29 20:07:04 +02:00
Takuya ASADA
3bf69db90c dist: handle SIGKILL correctly on upstart, also do not respawn process
Fixes #510
2015-10-30 00:55:47 +09:00
Takuya ASADA
6aac944588 dist: fix warning when building scylla-server ubuntu package 2015-10-30 00:53:20 +09:00
Takuya ASADA
37e00d3527 dist: fix warning when building thrift ubuntu package 2015-10-30 00:52:11 +09:00
Takuya ASADA
a165d6aa01 dist: fix warning when building antlr3-c++-dev ubuntu package 2015-10-30 00:52:00 +09:00
Takuya ASADA
9eb8aa7ea3 dist: fix warning when building antlr3-tool ubuntu package 2015-10-30 00:51:42 +09:00
Avi Kivity
f7f996d473 Merge seastar upstream
* seastar 9d8913a...9ae6407 (2):
  > core/memory.cc: Declare member min_free_pages from cpu_pages struct
  > http: All http replies should have version set
2015-10-29 16:49:10 +02:00
Asias He
6259dd73b6 token: Print token using the partitioner defined method
Make nodetool ring output align with c* output.
2015-10-29 15:53:47 +02:00
Pekka Enberg
f4849000c8 cql3: Fix exception message in lists::setter_by_index::execute()
Spotted by C* unit tests.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-29 15:36:56 +02:00
Pekka Enberg
6634f37464 cql3: Fix exception message in lists::discarder_by_index::execute()
Spotted by C* unit tests.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-29 15:36:51 +02:00
Pekka Enberg
2344bd806d cql3: Fix grammar 'WITH WITH' bug that causes a SIGSEGV
Fix an issue where 'WITH WITH' in CREATE and ALTER KEYSPACE would bring
down the whole server.

  https://issues.apache.org/jira/browse/CASSANDRA-9565

  c939422637

Spotted by C* unit tests.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-29 10:59:57 +01:00
Raphael S. Carvalho
6bea503f9a db: fallback to sizetiered if compaction strategy isn't supported
It may happen that the user will migrate a table to Scylla which
compaction strategy isn't supported yet, such as Data tiered.
Let's handle that by falling back to size-tiered compaction
strategy and printing a warning message.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-10-29 09:33:28 +02:00
Asias He
c030e6713b storage_service: Serialize rebuild
Do not hold the api lock while streaming the data since it might take a
long time, so we need to reconcile other operations while we are in the
middle of rebuild.
2015-10-29 09:51:52 +08:00
Asias He
cc6d4e0fc9 storage_service: Introduce run_with_no_api_lock
Run on cpu zero without take the api lock.
2015-10-29 09:50:03 +08:00
Asias He
2c8867c348 config: Enable storage_port option 2015-10-29 08:58:41 +08:00
Asias He
97d2bdf663 storage_service: Serialize gossiping operations 2015-10-29 08:58:41 +08:00
Asias He
f2e5e318d7 storage_service: Serialize effective_ownershipp with read api lock 2015-10-29 08:58:41 +08:00
Asias He
94a49121c2 storage_service: Serialize get_ownership with read api lock 2015-10-29 08:58:41 +08:00
Asias He
d41cd48a5f storage_service: Serialize is_joined with read api lock 2015-10-29 08:58:41 +08:00
Asias He
24a452133a storage_service: Serialize join_ring with write api lock 2015-10-29 08:58:41 +08:00
Asias He
08337b0df1 storage_service: Serialize more node management operations with read api lock
- get_operation_mode
- is_starting
- is_rpc_server_running
- is_native_transport_running
- get_load_map
- is_initialized
2015-10-29 08:58:41 +08:00
Asias He
c6bf157610 storage_service: Introduce run_with_read_api_lock
Similar to run_with_write_api_lock, but the operations only query the
cluster.
2015-10-29 08:58:41 +08:00
Asias He
93154e0184 storage_service: Serialize more node management operations with write api lock
- start_rpc_server
- stop_rpc_server
- start_native_transport
- stop_native_transport
- remove_node
2015-10-29 08:57:45 +08:00
Raphael S. Carvalho
c2a98807c7 compaction_manager: fix remove
remove() is the function used to remove every reference to a cf from
the compaction manager. This function works by removing cf from the
queue, and waiting for possible ongoing compaction on cf.
However, a cf may be re-queued by compaction manager task if there
is pending compaction by the end of compaction.

If cf is still referenced by the time remove() returns, we could end
up with an use-after-free. To fix that, a task shouldn't re-queue a
cf if it was asked to stop. The stat pending_tasks was also not
being updated when a cf was removed from the task queue.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-10-28 17:35:26 +02:00
Gleb Natapov
ac5f92db70 storage_proxy: clean up local_dc checking
The only place local_dc is checked during mutation sending is in
send_to_live_endpoints(), but current code pass it there throw several
function call layers. Simplify the code by getting local_dc when it is
used directly.
2015-10-28 16:10:18 +02:00
Paweł Dziepak
6e5916161c test/cql_test_env: merge client state after executing query
Since 4641dfff24 "service: Copy client
state to query state" after executing a query client state needs to be
merged back. If that's not done client_state::_last_timestamp_micros
won't be advanced properly and mutations originating from the same
source may have exactly the same timestamp.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-28 14:31:11 +02:00
Avi Kivity
4d23712c9c Merge "Fix gossip test" from Asias 2015-10-28 12:59:19 +02:00
Asias He
25c898fe9c gossip: Enable too more log prints for debug 2015-10-28 16:13:57 +08:00
Asias He
e0e8e9a1ed tests: Remove redundant debug info for gossip
The debug info is printed in logger already. Avoid to print it twice.
2015-10-28 16:13:57 +08:00
Asias He
01ee5d002a failure_detector: Remove debug print in operator<< 2015-10-28 16:13:57 +08:00
Asias He
f205ae30c3 tests: Fix gossip
- scylla/seastar/core/sharded.hh:439: Service& seastar::sharded<Service>::local()
  [with Service = locator::snitch_ptr]: Assertion `local_is_initialized()' failed.
- ./utils/fb_utilities.hh:74: static const inet_address utils::fb_utilities::get_broadcast_address():
  Assertion `broadcast_address()' failed.
2015-10-28 16:13:57 +08:00
Takuya ASADA
5aab69e72f dist: do not remove build/ dir when scylla-server ubuntu package building
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-10-27 18:23:09 +02:00
Avi Kivity
7f88db8625 Merge "storage_service and gossip update" from Asias 2015-10-27 18:14:43 +02:00
Lucas Meneghel Rodrigues
a9a33d5a99 test.py: PEP8 Fixes
Fix some PEP8 problems found in the tester code:

 * Wrong spacing around operators
 * Lines between class and function definitions
 * Fixed some of the larger than 80 column statements
 * Removed an unused import

Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
2015-10-27 17:54:54 +02:00
Tomasz Grabiec
491eae58a4 Merge branch 'gleb/read-repair' from seastar-dev.git
From Gleb:

Now that we can calculate mutation diffs do so on digest mismatch and send
them out.
2015-10-27 16:36:05 +01:00
Asias He
7e6e90dc52 storage_service: Serialize decommission
Prevent two operations happen simultaneously.
2015-10-27 21:48:38 +08:00
Asias He
4b75815306 storage_service: Introduce run_with_write_api_lock helper
It is useful to run code on cpu zero with the api lock.
2015-10-27 21:48:38 +08:00
Asias He
62228726a3 storage_service: Introduce a rwlock to serialize management operations 2015-10-27 21:48:38 +08:00
Asias He
1bbc1920d2 range_streamer: Start to use get_preferred_ip
It is available now.
2015-10-27 21:48:37 +08:00
Asias He
306bab9ead storage_service: Use get_preferred_ip
Now that it is available, use it.
2015-10-27 21:48:37 +08:00
Asias He
55b76a8963 api: No cpu zero trick for remove_node
storage_service::remove_node is guaranteed to run on cpu zero only.
2015-10-27 21:48:37 +08:00
Asias He
b3c7305d25 storage_service: Make remove_node runs on cpu 0 only
We need to serialize nodetool operations to avoid two operations
happening simultaneously. Running on cpu 0 is one step toward this
goal.
2015-10-27 21:48:37 +08:00
Asias He
6c6b1c4ba7 storage_service: Make decommission runs on cpu 0 only
We need to serialize nodetool operations to avoid two operations
happening simultaneously. Running on cpu 0 is one step toward this goal.
2015-10-27 21:48:37 +08:00
Asias He
49daba2599 storage_service: Do not ignore future in decommission
gossiper::stop returns a future which we can not ignore.
2015-10-27 21:48:37 +08:00
Asias He
00311817bd storage_service: Implement shutdown_client_servers 2015-10-27 21:48:37 +08:00
Asias He
83eb36796f storage_service: Kill FIXME for LoadBroadcaster.BROADCAST_INTERVAL
It is available now.
2015-10-27 21:48:37 +08:00
Asias He
1469cec5bf gossiper: Kill free function helper to get heart version and generation number
They can only be executed on cpu 0. Make the gossiper member
functions for them to do so.
2015-10-27 21:48:37 +08:00
Asias He
f573059698 gossiper: Kill free function helper for {unsafe_,}assassinate_endpoint
They can only be executed on cpu 0. Make the gossiper member functions
for them to do so.
2015-10-27 21:48:37 +08:00
Asias He
c5f377eb8b gossip: Simplify get_endpoint_downtime
_unreachable_endpoints is replicated to call cores. No need to query
on core 0.
2015-10-27 21:48:37 +08:00
Asias He
6f1db4fb72 gossip: Simplify get_unreachable_members
_unreachable_endpoints is replicated to call cores. No need to query on
core 0.

This also fixes a bug in storage_proxy::truncate_blocking
which might access _unreachable_endpoints on non-zero cores.
2015-10-27 21:48:37 +08:00
Asias He
a9f96d1f5a gossip: Replicate _unreachable_endpoints to all cores 2015-10-27 21:48:37 +08:00
Asias He
2439a2a982 gossip: Simplify get_live_members
_live_endpoints is replicated to call cores. No need to query on core 0.
2015-10-27 21:48:37 +08:00
Asias He
a28ba9cde8 api: Simplify get_tokens get_node_tokens and get_token_endpoint
token_metadata is replicated to all cores. No need to query on core 0.
2015-10-27 21:48:37 +08:00
Asias He
4e886f0399 storage_service: Implement is_rpc_server_running 2015-10-27 21:48:37 +08:00
Asias He
7f0634b429 storage_service: Implement stop_rpc_server 2015-10-27 21:48:37 +08:00
Asias He
ca66bea619 storage_service: Implement is_native_transport_running 2015-10-27 21:48:37 +08:00
Asias He
7bc49a1efe storage_service: Implement stop_native_transport 2015-10-27 21:48:37 +08:00
Asias He
8218ab7922 storage_service: Implement start_native_transport and start_rpc_server
They are used for APIs. Share the code in main.cc as well.
2015-10-27 21:48:37 +08:00
Lucas Meneghel Rodrigues
42c0acfc44 test.py: Return test output only if subprocess succeeded
The current code will try to print the output of a
subprocess.Popen().communicate() call even if that
call raised an exception and that output is None.

Let's fix this problem by only printing the output
if it's not None.

Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
2015-10-27 15:17:05 +02:00
Asias He
3c47844e8c storage_service: Complete check_for_endpoint_collision
Part of it was stubbed.
2015-10-27 21:17:02 +08:00
Asias He
77a87cb2b6 storage_service: Implement prepare_replacement_info
Needed by replace node operation.
2015-10-27 21:17:02 +08:00
Vlad Zolotarov
5cdbc3701a tests: set broadcast address
Since commit 5613979a85
broadcast address has to be set before it's used for the first
time.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-27 15:16:13 +02:00
Avi Kivity
345f739f28 Merge seastar upstream
* seastar 501e4cb...9d8913a (3):
  > Add mutable to with_lock and do_with
  > app-template: disable collectd by default
  > reactor: use fdatasync() instead of fsync()
2015-10-27 15:13:36 +02:00
Avi Kivity
42a5c8b92e Merge "CQL request load balancing" from Pekka
"Currently, CQL requests are processed on the same CPU core where the
connection lives in. This series adds infrastructure for migrating CQL
processing to other cores and implements a round-robin load balancing
algorithm that can be enabled with the "--load-balance=round-robin"
command line option. Load balancing is not enabled by default because we
need to first run performance tests to determine if the simple
round-robin algorithm is sufficient, or wheter we need to implement more
sophisticated dynamic load balancing."
2015-10-27 15:04:44 +02:00
Gleb Natapov
58154333e8 storage_proxy: send out mutation diffs to each destination 2015-10-27 14:58:35 +02:00
Pekka Enberg
a772938e73 transport/server: Round-robin CQL request load balancing
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-27 13:24:58 +02:00
Pekka Enberg
ed0607c10e transport/server: Merge client state changes in process_request()
In preparation for processing queries on shards other than where the
connection lives in, merge client state changes in process_request().

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-27 13:24:58 +02:00
Pekka Enberg
4641dfff24 service: Copy client state to query state
In preparation for processing CQL requests on different core than where
the connection lives in, copy client state to query state for
processing and merge back the results after we're done.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-27 13:24:58 +02:00
Pekka Enberg
27c678b2a5 transport/server: Use bytes_view instead of moving temporary buffer around
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-27 13:13:25 +02:00
Pekka Enberg
4482406aee transport/server: Write response from process_request()
In preparation for spreading request processing to multiple cores, make
sure CQL response is written out on the connection shard.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-27 13:13:25 +02:00
Pekka Enberg
3b6eba1344 transport/server: Remove _query_states from connection
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-27 13:13:25 +02:00
Pekka Enberg
3884f1c8b6 transport/server.cc: Fix formatting glitch
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-27 13:13:25 +02:00
Gleb Natapov
122f03e689 storage_proxy: calculate mutation diffs during reconcile. 2015-10-27 13:02:15 +02:00
Gleb Natapov
8bbe4bdbd4 storage_proxy: rename reconciliate to reconcile
'reconciliate' does not seams to be a world.
2015-10-27 12:57:02 +02:00
Vlad Zolotarov
3a3867588a tests::cql_test_env: set broadcast address
cql_query_test hasn't configured Broadcast address before
it was used for the first time.

Broadcast address is an essential Node's configuration.

There is an assert in utils::fb_utils::get_broadcast_address()
that ensures that broadcast address has been properly configured
before it's used for the first time and it is triggered without
this patch.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-27 12:38:56 +02:00
Avi Kivity
0754e9a34d Merge "Even more commitlog fixes" from Calle
"Fixes for commitlog (debug) test failures related to shutdowns.
Note that most the fixes here are only really related to the tests
failing, not really real scylla runs. However, at some point we'll
have real shutdown in scylla as well (not just hard exit), at which
point this becomes more relevant there as well.

Main issue was post-flush continuation chains for stats update
remaining unexecuted, due to task reordering, once the commitlog
object itself had been destroyed. This could have been handled by just
making the stats object a shared pointer, but in general it seems more
prudent to enforce having all tasks completed after shutdown.

* Change commitlog shutdown to use gate+wait for all outstanding ops
  (flush, write, timer). Thus we can ensure everything is finished
  when returning from "shutdown".
* Fix bug with "commitlog::clear" (test method) not doing the intended deed
* Most importantly, fix the tests themselves, cleaning up old crud, and
  fixing invalid assumptions (CL behaviour changed quite a bit since tests
  were created), and remove races.

Disclaimer: I've _never_ managed to reproduce the debug tests failing
like in jenkins locally (though I managed to provoke other failures),
but at least jenkins runs with this series have been clean. Knock knock."
2015-10-27 12:16:20 +02:00
Paweł Dziepak
f46cba7bc8 sstable: simplify key reader
Now that #475 is solved an read_indexes() guarantees to return disjont
sets of keys sstable key reader can be simplified, namely, only two key
lookups are needed (the first and the last one) and there is no need for
range splitting.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-27 10:33:02 +02:00
Avi Kivity
fa0f00e9d2 Merge "Add EC2MultiRegionSnitch and Co." from Vlad
"This series add the mighty EC2MultiRegionSnitch and some missing
multi-DC related functionality:
   - Use the proper Broadcast Address: either the one from the
    .yaml configuration (if present) or the one configured by some
    scylla component (e.g. snitch).
   - Introduce the ability to switch to internal IPs when connecting
     to Nodes in the same data center.
   - Store the known internal IPs in system.peers table and
     load then immediately during boot.

This series also contains some related fixes done on the way."
2015-10-26 18:38:57 +02:00
Gleb Natapov
5bc532261f load_broadcaster: add missing header file protector 2015-10-26 18:38:01 +02:00
Avi Kivity
91bde5e2e8 Merge "sstable improvements" from Raphael 2015-10-26 17:25:58 +02:00
Asias He
6f08c4facb dns: Move gethostbyname to source file
Fix multiple definition of gethostbyname.
2015-10-26 15:59:58 +02:00
Calle Wilund
6dc6b40495 commitlog_test: test fix 2015-10-26 14:57:15 +01:00
Calle Wilund
ceb9f4d647 database: Just do commitlog::shutdown on shutdown. It will do flushes. 2015-10-26 14:56:24 +01:00
Calle Wilund
5299cece4c commitlog: Make "shutdown" do flushing + hard sync of pending ops
* Do close + fsync on all segments
* Make sure all pending cycle/sync ops are guarded with a gate, and
  explicitly wait for this gate on shutdown to make sure we don't
  leave hanging flushes in the task queue.
* Fix bug where "commitlog::clear" did not in fact shut down the CL,
  due to "_shutdown" being already set.

Note: This is (at least currently) not an issue for anything else than tests,
since we don't shutdown the normal server "properly", i.e. the CL itself
will not go away, and hanging tasks are ok, as long as the sync-all is done
(which it was previously). But, to make tests predictable, and future-proof
the CL, this is better.
2015-10-26 14:50:54 +01:00
Raphael S. Carvalho
25a4d6686a sstables: update comment
sstable level is set to zero by default, but it may be set to
a different value if a new sstable is the result of leveled
compaction. This is done outside write_components.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-10-26 10:39:22 -02:00
Raphael S. Carvalho
028283869f sstables: reuse BASE_SAMPLING_LEVEL constant from downsampling class
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-10-26 10:35:46 -02:00
Raphael S. Carvalho
b908d3a5e9 sstables: fix prepare_summary
We were incorrectly setting s.header.min_index_interval to
BASE_SAMPLING_LEVEL, which luckily is the default value to
min index interval. BASE_SAMPLING_LEVEL was also used as
the min index interval when checking if the estimated
number of summary entries is greater than the limit.
To fix problems, get min index interval from schema and
use this value to check the limit.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-10-26 10:27:03 -02:00
Avi Kivity
38df381206 Merge "remove node support" from Asias 2015-10-26 14:23:47 +02:00
Vlad Zolotarov
f70aab2fbb locator: added ec2_multi_region_snitch
This snitch in addition to what EC2Snitch does registers
a reconnectable_snitch_helper that will make messenger_service
connect to internal IPs when it connects to the nodes in the same
data center with the current Node.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v4:
   - Added dual license in newly added files.

New in v3:
   - Returned the Apache license.

New in v2:
   - Update the license to the latest version. ;)
2015-10-26 14:10:47 +02:00
Vlad Zolotarov
5613979a85 utils::fb_utilities: add the ability to set a broadcast address
Add utils::fb_utilities::set_broadcast_address().
Set it to either broadcast_address or listen_address configuration value
if appropriate values are set. If none of the two values above
are set - abort the application.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - Simplify the utils::fb_utilities::get_broadcast() logic.
2015-10-26 14:10:39 +02:00
Vlad Zolotarov
a182a33d4d locator::snitch_base: added i_endpoint_snitch::set_local_private_addr()
Sets the value of the local private IP address.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-26 14:10:39 +02:00
Vlad Zolotarov
1bb91399cd locator: added reconnectable_snitch_helper
reconnectable_snitch_helper implements i_endpoint_state_change_subscriber
and triggers reconnect using the internal IP to the nodes in the
same data center when one of the following events happen:

   - on_join()
   - on_change() - when INTERNAL_IP state is changed
   - on_alive()

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v4:
   - Added dual license for newly added files.

New in v3:
   - Fix reconnect() logic.
   - Returned the Apache license.
   - Check if the new local address is not already stored in the cache.
   - Get rid of get_ep_addr().

New in v2:
   - Update the license to the latest version. ;)
2015-10-26 14:09:48 +02:00
Vlad Zolotarov
d683d9cfb0 locator::ec2_snitch: moved info parsing and distribution into the separate function
Added load_config() function that reads AWS info and property file
and distributes the read values on all shards.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-26 14:09:26 +02:00
Vlad Zolotarov
703807b67d locator::ec2_snitch: rename ZONE_QUERY_SERVER_ADDR -> AWS_QUERY_SERVER_ADDR
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-26 14:09:26 +02:00
Vlad Zolotarov
d8de1099eb message::messaging_service: introduce _preferred_ip_cache
This map will contain the (internal) IPs corresponding to specific Nodes.
The mapping is also stored in the system.peers table.

So, instead of always connecting to external IP messaging_service::get_rpc_client()
will query _preferred_ip_cache and only if there is no entry for a given
Node will connect to the external IP.

We will call for init_local_preferred_ip_cache() at the end of system table init.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - Improved the _preferred_ip_cache description.
   - Code styling issues.

New in v3:
   - Make get_internal_ip() public.
   - get_rpc_client(): return a get_preferred_ip() usage dropped
     in v2 by mistake during rebase.
2015-10-26 14:09:26 +02:00
Vlad Zolotarov
f896f9a908 message::messaging_service: added remove_rpc_client(shard_id)
This function erases shard_info objects from all _clients maps.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - Use remove_rpc_client_one() instead of direct map::erase().
2015-10-26 14:09:26 +02:00
Vlad Zolotarov
e9789dd68c message::messaging_service: fixes in rpc_protocol_client_wrapper shut down
- Ensure messaging_service::stop() blocks until all rpc_protocol::client::stop()
     are over.
   - Remove the async code from rpc_protocol_client_wrapper destructor - call
     for stop() everywhere it's needed instead. Ensure that
     rpc_protocol_client_wrapper is always "stopped" when its destructor is called.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v3:
   - Code style fixes.
   - Killed rpc_protocol_client_wrapper::_stopped.
   - Killed rpc_protocol_client_wrapper::~rpc_protocol_client_wrapper().
   - Use std::move() for saving shared pointer before
     erasing the entry from _clients in
     remove_rpc_client_one() in
     order to avoid extra ref count bumping.
2015-10-26 14:09:26 +02:00
Vlad Zolotarov
842b13325d message::messaging_service: make _clients to be std::array
This makes code cleaner. Also it would allow less changes
if we decide to increase _clients size in the future.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-26 14:09:26 +02:00
Vlad Zolotarov
fd811dd707 db::system_keyspace: added get_preferred_ips()
get_preferred_ips() returns all preferred_ip's stored in system.peers
table.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - Get rid of extra std::move().
2015-10-26 14:09:26 +02:00
Vlad Zolotarov
f2e1be0fc1 db::system_keyspace::update_preferred_ip(): use net::ipv4_address as a preferred_ip value
Fixes issue #481

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-26 14:09:26 +02:00
Vlad Zolotarov
4f603d285e gms::gossiper: call for i_endpoint_snitch::gossiper_starting()
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-26 14:09:26 +02:00
Takuya ASADA
17f2038949 dist: fix upstart script to make it able to start/stop correctly
Scylla is not "daemon" (witch forks twice), but it can be "fork" (forks once) when we don't use "exec" to call startup scripts.
Fixes #495

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-10-26 13:00:52 +02:00
Takuya ASADA
25398d5a1d dist: statically link to libstdc++ on Ubuntu
Fixes #494

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-10-26 12:59:28 +02:00
Asias He
201d439440 storage_service: Enable remove_node support 2015-10-26 10:33:19 +02:00
Asias He
06b19867d8 api: Fix storage_service remove_node parameter
nodetool removenode takes the parameter of host_id instead of token.

Refs: #496
2015-10-26 10:32:47 +02:00
Asias He
9912a68364 storage_service: Enable remove_node support 2015-10-26 15:29:13 +08:00
Asias He
5f2d97c0b8 api: Fix storage_service remove_node parameter
nodetool removenode takes the parameter of host_id instead of token.

Refs: #496
2015-10-26 15:26:59 +08:00
Avi Kivity
7254c87776 Merge seastar upstream
* seastar 6af5a0d...501e4cb (2):
  > configure: don't fail when sanitizer is unavailable
  > memcached: close socket at the end of request handling
2015-10-25 19:59:45 +02:00
Shlomi Livne
d493b2a7f7 dist: ubuntu update build script file permissions
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-10-25 12:44:50 +02:00
Avi Kivity
e2e869f150 Merge "API: storage proxy metrics" from Amnon
"This series adds two types of functionality to the storage_proxy, it adds the
API that returns the timeout constants from the config and it aligned the
metrics of the read, write and range to origin StorageProxy metrics."
2015-10-25 11:40:46 +02:00
Avi Kivity
01e301fc9a Merge "fix read_indexes" from Raphael
Fixes issue #474.
2015-10-25 11:25:42 +02:00
Avi Kivity
ec4d1dd7fd Merge "CQL code cleanups" from Pekka
"Yet another round of code cleanups to cql3/*."
2015-10-25 10:35:19 +02:00
Tomasz Grabiec
a3e84c1553 tests: Add test for make_local_reader() and min/max tokens 2015-10-25 10:27:59 +02:00
Tomasz Grabiec
7b1e78ffcd storage_proxy: Fix make_local_reader() for ranges with min/max tokens
shard_of() was undefined for before_all/after_all tokens. Fix by
adding handling for these.
2015-10-25 10:27:58 +02:00
Raphael S. Carvalho
5cdc886f84 tests: add test to read_indexes
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-10-23 16:57:41 -02:00
Raphael S. Carvalho
af88a6864a sstables: fix read_indexes
read_indexes() will not work for a column family that minimum
index interval is different than sampling level or that sampling
level is lower than BASE_SAMPLING_LEVEl.
That's because the function was using sampling level to determine
the interval between indexes that are stored by index summary.
Instead, method from downsampling will be used to calculate the
effective interval based on both minimum_index_interval and
sampling_level parameters.

Fixes issue #474.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-10-23 16:57:41 -02:00
Raphael S. Carvalho
e18cf96b01 sstables: convert Downsampling to C++
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-10-23 16:57:39 -02:00
Pekka Enberg
e48f8d25e2 cql3: Move cf_statement class implementation to source file
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-23 16:17:17 +03:00
Pekka Enberg
d79d8b00f3 cql3: Move create_keyspace_statement class implementation to source file
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-23 16:17:17 +03:00
Pekka Enberg
baea79d17d cql3: Move schema_altering_statement class implementation to source file
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-23 16:17:17 +03:00
Pekka Enberg
6dbee22d2e cql3: Move use_statement class implementation to source file
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-23 16:17:17 +03:00
Pekka Enberg
ff93a840ae cql3: Remove ifdef'd code from use_statement.hh
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-23 15:52:02 +03:00
Pekka Enberg
83e6c428be transport: Add messages_fwd.hh and use it
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-23 15:52:02 +03:00
Tomasz Grabiec
7eb0da3ed8 Merge tag 'asias/range_streamer_use_after_free_gossip_print/v1' from seastar-dev.git
range_streamer use after free fix and gossip print cleanups from
Asias.
2015-10-23 14:43:20 +02:00
Asias He
a63871f024 gossip: Print application_state name instead of number
Plus print cleanup.
2015-10-23 17:03:18 +08:00
Asias He
687d88dd0c gossip: Add operator<< operator for application_state 2015-10-23 16:32:36 +08:00
Asias He
80f1b9781a gossip: Simplify make_token_string with boost::adaptors::transformed 2015-10-23 16:13:31 +08:00
Asias He
1ba20f4efd storage_service: Enable single_datacenter_filter in rebuild 2015-10-23 16:13:30 +08:00
Asias He
b172146223 range_streamer: Introduce single_datacenter_filter 2015-10-23 16:13:30 +08:00
Asias He
a934e31379 storage_service: Fix use after free in rebuild
streamer is a stack variable, it is gone when the function returns.
Fix it using a shared pointer.
2015-10-23 16:13:30 +08:00
Asias He
98b34ecc67 boot_strapper: Fix use after free
streamer is a stack variable, it is gone when the function returns.
Fix it using a shared pointer.

Fixes #489
2015-10-23 16:13:30 +08:00
Asias He
e2391b02da storage_service: Add debug info for get_load_map 2015-10-23 16:13:30 +08:00
Avi Kivity
2c16d1f980 Merge "fix nodetool gossipinfo and status" from Asias
"- Implement get_load_map using load_broadcaster
- Fix token in nodetool gossipinfo"
2015-10-23 10:34:13 +03:00
Asias He
bc8d3f0d24 storage_service: Implement get_load_map using load_broadcaster
$nodetool -p 7199 status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns    Host ID Rack
UN  127.0.0.1  60711      3       ?  78ce8d76-69f8-44fc-a5ae-b932eb4f641b  rack1
UN  127.0.0.2  66237      3       ?  b1eaaee6-c334-4a65-bc0c-95af49bca0b3  rack1
2015-10-23 11:53:46 +08:00
Asias He
a137006d27 main: Set load_broadcaster for storage_service during startup
Note only storage_service on shard 0 will access load_broadcaster
2015-10-23 11:28:06 +08:00
Asias He
992390feda storage_service: Store load_broadcaster into storage_service
It is needed by get_load_map.
2015-10-23 11:25:51 +08:00
Asias He
270e713fea Revert "API: Workaround for load_map"
This reverts commit 497e403387.
2015-10-23 10:45:54 +08:00
Asias He
6b7ca7e334 gossip: Cleanup versioned_value class 2015-10-23 10:06:19 +08:00
Asias He
5556861ac0 gossip: Fix nodetool gossipinfo
Use dht::global_partitioner().{to_sstring and from_sstring) to handle
tokens.

Before:
$ nodetool -p 7199  gossipinfo
127.0.0.1
  generation:1445528714
  heartbeat:105
  0:NORMAL,TOKENS
  2:c1e6c9ef-22c8-3f93-bca1-ea5810bafd36
  3:datacenter1
  4:rack1
  5:2.1.8
  8:127.0.0.1
  11:0
  12:d24aef9c-fc4e-4290-92f4-1fd317b8883a

After:
$ nodetool -p 7199  gossipinfo
127.0.0.1
  generation:1445528714
  heartbeat:105
  0:NORMAL,-3524784140453853209;2276970246802708341;-4108982606669659076
  2:c1e6c9ef-22c8-3f93-bca1-ea5810bafd36
  3:datacenter1
  4:rack1
  5:2.1.8
  8:127.0.0.1
  11:0
  12:d24aef9c-fc4e-4290-92f4-1fd317b8883a
2015-10-23 09:40:59 +08:00
Raphael S. Carvalho
19f2dc9ef9 import Downsampling.java
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-10-22 14:44:40 -02:00
Gleb Natapov
5b97604735 load_broadcaster: fix linkage error in debug mode
Also move it to service namespace.
2015-10-22 18:18:05 +02:00
Amnon Heiman
14482d4496 API: Add read, write, range estimated histogram implementation for storage_proxy
This patch adds the implmentation for the read, write and range
estimated histogram and total latency.

After this patch the following url will be available:
/storage_proxy/metrics/read/estimated_histogram/
/storage_proxy/metrics/read
/storage_proxy/metrics/write/estimated_histogram/
/storage_proxy/metrics/write
/storage_proxy/metrics/range/estimated_histogram/
/storage_proxy/metrics/range

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-22 18:54:45 +03:00
Amnon Heiman
e1168c0a34 API: Add estimated histogram to storage_proxy
This patch close the gap between the storage_proxy read, write and range
metrics and the API.

For each of the metrics there will be a histogram, estimated histogram
and total.

The patch contains the definitions for the following:
get_read_estimated_histogram
get_read_latency
get_write_estimated_histogram
get_write_latency
get_range_estimated_histogram
get_range_latency

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>

need mrege storage_proxy
2015-10-22 18:54:45 +03:00
Amnon Heiman
7b8c557f30 storage_service: Add estimated histogram for read, write and range
This patch adds an estimated histogram for read, write and range to the
proxy_service stats.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-22 18:54:45 +03:00
Amnon Heiman
a071cdce66 API: Add getters to storage_proxy timers timeout
This patch expose the configuration timeout values of the timers.
The timers will return their values in seconds, the swagger definition
file was modified to reflect the change.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-22 18:54:11 +03:00
Tomasz Grabiec
5bbc902eec mutation_partition: Drop now unnecessary unconst() usage
This change was actually promised by
f74c665671.
2015-10-22 17:12:03 +02:00
Tomasz Grabiec
c7be350961 mutation_partition: Rename reversion_traits to reversal_traits
As pointed out by Nadav, 'reversion' is from 'revert', 'reversal' is
from 'reverse'.
2015-10-22 18:09:07 +03:00
Tomasz Grabiec
f74c665671 mutation_partition: Add non-const-qualified version of range() and use it 2015-10-22 18:09:07 +03:00
Asias He
6c48ce065e storage_service: Make comment clearer in get_changed_ranges_for_leaving
We are not removing the range. Current node and new node will be
responsible for the range are calculated. We only need to stream data to
node = new node - current node. E.g,

Assume we have node 1 and node 2 in the cluster, RF=2. If we remove node2:

Range (3c 25 fa 7e d2 2a 26 b4 , 81 2a a7 32 29 e5 3a 7c ],
current_replica_endpoints={127.0.0.1, 127.0.0.2} new_replica_endpoints={127.0.0.1}

Range (3c 25 fa 7e d2 2a 26 b4 , 81 2a a7 32 29 e5 3a 7c ] already in all replicas

no data will be streamed to node 1 since it already has it.
2015-10-22 18:08:03 +03:00
Avi Kivity
a699bc20bc Merge "Adding the stream metrics API" from Amnon
"This series adds the stream metrics API, the swagger definition are based on
the StreamMetrics class in origin."
2015-10-22 17:18:01 +03:00
Amnon Heiman
5323b29699 API: Add compaction history to the API
This patch adds a definition and a stub for the compaction history. The
implementation should read fromt the compaction history table and return
an array of results.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-22 17:16:58 +03:00
Avi Kivity
5d37ee29f8 Merge "Adding log level support to the API" from Amnon
"This series adds an API to get and set the log level.
After this series it will be possible to use the folloing url:
GET/POST:
/system/logger
/system/logger/{name}"
2015-10-22 17:15:41 +03:00
Avi Kivity
b4e5b9dcf1 Merge "Add support to nodetool describecluster"
"This series adds the functionality that is required for nodetool
describecluster

It uses the gossiper for get cluster name and get partitioner.  The
describe_schema_versions functionality is missing and a workaround is used so
the command would work.

After this series an example for nodetool describecluster:
./bin/nodetool describecluster
Cluster Information:
	Name: Test Cluster
	Snitch: org.apache.cassandra.locator.SimpleSnitch
	Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
	Schema versions:
		127.0.0.1: [48c4e6c8-5d6a-3800-9a3a-517d3f7b2f26]"
2015-10-22 17:11:30 +03:00
Gleb Natapov
ece6c68288 convert loadBroadcaster 2015-10-22 17:10:20 +03:00
Avi Kivity
8362622eb8 Merge "Support Ubuntu 14.04LTS" from Takuya 2015-10-22 17:03:04 +03:00
Avi Kivity
0129e42b06 Merge "Mutation diff" from Paweł
"This series add code for computing mutation_partition difference.
For mutations A and B:

diffA = A.difference(B);
diffB = B.difference(A);
AB = A.apply(B);

diffA is the minimal mutation that when applied to B makes it equal
to AB and diffB is the minimal mutation that applied to A results in AB.

Fixes #430."
2015-10-22 16:38:25 +03:00
Avi Kivity
f7087da054 Merge "GET methods for snapshots" from Glauber
"The snapshots API need to expose GET methods so people can
query information on them. Now that taking snapshots is supported,
this relatively simple series implement get_snapshot_details, a
column family method, and wire that up through the storage_service."
2015-10-22 15:23:45 +03:00
Calle Wilund
05de462fa9 commitlog: Make flush/segment delete slightly mode defensive + test tolerant
Fix for (mainly) test failures (use-after free)
I.e. test case test_commitlog_delete_when_over_disk_limit causes
use-after free because test shuts down before a pending flush is done,
and the segment manager is actually gone -> crash writing stats.
Now, we could make the stats a shared pointer, but we should never
allow an operation to outlive the segment_manager.
In normal op, we _almost_ guarantee this with the shutdown() call,
but technically, we could have a flush continuation trailing somewhere.

* Make sure we never delete segments from segment_manager until they are
  fully flushed
* Make test disposal method "clear" be more defensive in flushing and
  clearing out segments
2015-10-22 15:19:24 +03:00
Avi Kivity
9dc8d98146 Merge "Make mutation queries respect reversed order" from Tomasz
"Affects CQL statements like the following one:

   select * from table order by ck desc;

Fixes #480."
2015-10-22 15:08:19 +03:00
Takuya ASADA
a03056a915 dist: add dependency packages and build script for ubuntu 2015-10-22 11:55:50 +00:00
Takuya ASADA
e0604b066a dist: add debian/ directory to build .deb package for Ubuntu 2015-10-22 11:55:50 +00:00
Takuya ASADA
592b64e478 dist: mount hugetlbfs on ubuntu 2015-10-22 11:55:50 +00:00
Takuya ASADA
830041d6be dist: share scripts both on redhat and ubuntu 2015-10-22 11:55:50 +00:00
Avi Kivity
a2577fefb9 Merge "Enable decommission support" from Asias
"Tested with:

- start node 1
- insert value
- start node 2
- insert value
- decommission node2

I can see from the log that data range belongs to node2 is streamed to node1
and cqlsh query node1 returns all the data, and node2 is not in the live node
list from node1's view."
2015-10-22 14:44:24 +03:00
Avi Kivity
5f3a46eabb Merge "load_new_sstables" from Glauber
"This patchset implements load_new_sstables, allowing one to move tables inside the
data directory of a CF, and then call "nodetool refresh" to start using them.

Keep in mind that for Cassandra, this is deemed an unsafe operation:
https://issues.apache.org/jira/browse/CASSANDRA-6245

It is still for us something we should not recommend - unless the CF is totally
empty and not yet used, but we can do a much better job in the safety front.

To guarantee that, the process works in four steps:

1) All writes to this specific column family are disabled. This is a horrible thing to
   do, because dirty memory can grow much more than desired during this. Throughout out
   this implementation, we will try to keep the time during which the writes are disabled
   to its bare minimum.

   While disabling the writes, each shard will tell us about the highest generation number
   it has seen.

2) We will scan all tables that we haven't seen before. Those are any tables found in the
   CF datadir, that are higher than the highest generation number seen so far. We will link
   them to new generation numbers that are sequential to the ones we have so far, and end up
   with a new generation number that is returned to the next step

3) The generation number computed in the previous step is now propagated to all CFs, which
   guarantees that all further writes will pick generation numbers that won't conflict with
   the existing tables. Right after doing that, the writes are resumed.

4) The tables we found in step 2 are passed on to each of the CFs. They can now load those
   tables while operations to the CF proceed normally."
2015-10-22 13:42:24 +03:00
Avi Kivity
2b0a504cbc Merge "Adding row chache statistic to column family" from Amnon
"This series adds row cache statistic to the column family that will be expose
via the API."
2015-10-22 13:38:01 +03:00
Paweł Dziepak
740e2166c5 tests/mutation: add test for mutation diff
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-22 12:08:53 +02:00
Paweł Dziepak
f78a80dfa3 mutation_partition: add method for computing difference
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-22 12:08:53 +02:00
Paweł Dziepak
85edc3de07 mutation_partition: compute row difference
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-22 12:08:53 +02:00
Paweł Dziepak
75df23dd3c types: add collection_type_impl::difference()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-22 12:08:53 +02:00
Paweł Dziepak
4440f9b85b mutation_partition: add row_marker::is_live()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-22 12:08:53 +02:00
Paweł Dziepak
2aa96eb00f mutation_partition: add insert_row()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-22 12:08:53 +02:00
Paweł Dziepak
a064181d7c mutation_partition: add row::with_both_ranges()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-22 12:08:53 +02:00
Paweł Dziepak
1c05d7b927 mutation_partition: fix row_marker::apply() for equal timestamps
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-22 12:08:53 +02:00
Paweł Dziepak
7fab0ee867 mutation_partition: add compare_row_marker_for_merge()
A compare_atomic_cell_for_merge() equivalent intended to be used
with row markers.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-22 12:08:53 +02:00
Asias He
04291dec28 storage_service: Enable call to excise 2015-10-22 17:41:02 +08:00
Asias He
ee551d070f storage_service: Enable add_expire_time_if_found in excise and handle_state_removing 2015-10-22 17:41:02 +08:00
Asias He
2137ab5522 storage_service: Implement add_expire_time_if_found 2015-10-22 17:41:02 +08:00
Asias He
e2fbd146d7 storage_service: Print keyspace info unbootstrap in debug 2015-10-22 17:41:02 +08:00
Asias He
affab296b0 storage_service: Fix ranges in stream_hints
Use range = make_open_ended_both_sides to present the entire ring.
2015-10-22 17:41:02 +08:00
Asias He
0d2fb9c99d storage_service: Add extract_expire_time 2015-10-22 17:41:02 +08:00
Asias He
58225216b3 storage_service: Fix immediate return for get_changed_ranges_for_leaving
It is a leftover when get_changed_ranges_for_leaving is get stubbed.
2015-10-22 17:41:02 +08:00
Asias He
fb27d682ad storage_service: Fix use after free for stream_plan
sp is a stack variable, it is gone when the function returns.
Fix it using a shared pointer.
2015-10-22 17:41:02 +08:00
Asias He
69b7028f84 storage_service: Fix token contains in handle_state_leaving
std::includes requires sorted container. get_tokens_for returns
std::unordered_set. Fix by put tokens into std::set.
2015-10-22 17:41:02 +08:00
Asias He
4785798904 storage_service: Kill unimplemented in decommission 2015-10-22 17:41:02 +08:00
Asias He
ce6dd0f8f8 storage_service: Implement start_leaving 2015-10-22 17:41:02 +08:00
Paweł Dziepak
513ab87b47 row_cache: update hit and miss stats in scanning reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-22 12:25:02 +03:00
Paweł Dziepak
b1b830bcbb row_cache: merge cache_entry::compare and ring_position_compare
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-22 12:25:02 +03:00
Tomasz Grabiec
f1306d3771 tests: Add tests for reversed mutation queries 2015-10-22 10:32:08 +02:00
Tomasz Grabiec
cc5cc7117d mutation_query: Respect 'reversed' partition_slice option
Fixes #480
2015-10-22 10:32:08 +02:00
Tomasz Grabiec
9dbd5a92d0 partition_slice_builder: Introduce reversed() 2015-10-22 10:32:08 +02:00
Amnon Heiman
c130381284 Adding live_scanned and tombstone scaned histogram to column family
This series adds a histogrm to the column family for live scanned and
tombstone scaned.

It expose those histogram via the API instead of the stub implmentation,
currently exist.

The implementation update of the histogram will be added in a different
series.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-22 11:13:28 +03:00
Amnon Heiman
378a97b66b API: Add row cahe hits and miss per column family
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-22 11:12:14 +03:00
Glauber Costa
fd8e5c7e4c api: load new sstables
Just a wrapper into the storage_service's homonymous call.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:30:04 +02:00
Glauber Costa
673788ed46 storage_service load_new_sstables.
This is the storage_service implementation of load_new_sstables, and this is
where most of the complication lives.

Keep in mind that for Cassandra, this is deemed an unsafe operation:
https://issues.apache.org/jira/browse/CASSANDRA-6245

It is still for us something we should not recommend - unless the CF is
totally empty and not yet used, but we can do a much better job in the safety front.

To guarantee that, the process works in four steps:

1) All writes to this specific column family are disabled. This is a horrible thing to
   do, because dirty memory can grow much more than desired during this. Throughout out
   this implementation, we will try to keep the time during which the writes are disabled
   to its bare minimum.

   While disabling the writes, each shard will tell us about the highest generation number
   it has seen.

2) We will scan all tables that we haven't seen before. Those are any tables found in the
   CF datadir, that are higher than the highest generation number seen so far. We will link
   them to new generation numbers that are sequential to the ones we have so far, and end up
   with a new generation number that is returned to the next step

3) The generation number computed in the previous step is now propagated to all CFs, which
   guarantees that all further writes will pick generation numbers that won't conflict with
   the existing tables. Right after doing that, the writes are resumed.

4) The tables we found in step 2 are passed on to each of the CFs. They can now load those
   tables while operations to the CF proceed normally.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:06:22 +02:00
Glauber Costa
36cea4313e column family: load new sstables
CF-level code to load new SSTables. There isn't really a lot of complication
here. We don't even need to repopulate the entire SSTable directory: by
requiring that the external service who is coordinating this tell us explicitly
about the new SSTables found in the scan process, we can just load them
specifically and add them to the SSTable map.

All new tables will start their lifes as shared tables, and will be unshared
if it is possible to do so: this all happens inside add_sstable and there isn't
really anything special in this front.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:06:22 +02:00
Glauber Costa
54aaa58899 sstable_tests: test reshuffle operation
Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:06:22 +02:00
Glauber Costa
a8db2b28c7 sstable tests: test set_generation
No code works until it's been tested.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:06:22 +02:00
Glauber Costa
c5950c7bf7 sstable_test: get rid of frees
They exist. They shouldn't.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:06:22 +02:00
Glauber Costa
f60021f87f sstable_tests: commonize code to compare two components.
The current codes assumes a particular dir/generation pair. We
will use it for a more generic case. This code could really use some
clean up, by the way. We should do it later.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:06:22 +02:00
Glauber Costa
61be9fb02d reshuffle tables: mechanism to adjust new sstables' generation number
Before loading new SSTables into the node, we need to make sure that their
generation numbers are sequential (at least if we want to follow Cassandra's
footsteps here).

Note that this is unsafe by design. More information can be found at:
https://issues.apache.org/jira/browse/CASSANDRA-6245

However, we can already to slightly better in two ways:

Unlike Cassandra, this method takes as a parameter a generation number. We
will not touch tables that are before that number at all. That number must be
calculated from all shards as the highest generation number they have seen themselves.
Calling load_new_sstables in the absence of new tables will therefore do nothing,
and will be completely safe.

It will also return the highest generation number found after the reshuffling
process.  New writers should start writing after that. Therefore, new tables
that are created will have a generation number that is higher than any of this,
and will therefore be safe.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:06:22 +02:00
Glauber Costa
1351c1cc13 database: mechanism to stop writing sstables
During certain operations we need to stop writing SSTables. This is needed when
we want to load new SSTables into the system. They will have to be scanned by all
shards, agreed upon, and in most cases even renamed. Letting SSTables be written
at that point makes it inherently racy - specially with the rename.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:06:22 +02:00
Glauber Costa
29e2ad7fd8 column family: commonize code to calculate the desired SSTable generation
We will reuse this for load_new_sstables.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:02:43 +02:00
Glauber Costa
3f6d47f1f2 sstables: change the current level of an sstable
This will be used, for instance, when importing an SSTable.
We would like to force all new SSTables to sit at level 0 for
compaction purposes.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:02:42 +02:00
Glauber Costa
e11b828b6f sstables: allow an sstable to set its generation number
In some situations (restoring a backup from load_new_sstables), we want to
change the SSTable generation number. This patch provides a procedure to
achieve that.

It does so by linking the old files to new ones, and then removing the old
ones.

The reason we link instead of removing, is that we want to make sure that in
case there is a crash in the middle, the old data is still accessible.

If the crash happens after the link is done but before we start removing the
old files, that is fine: we will end up with duplicated data that will
disappear after the next compaction.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:02:42 +02:00
Glauber Costa
a4d8e99f1c sstables: fix create_links so a TemporaryTOC is generated
That is the way to generate groups of files for the SSTables, so we must do it.
Because the links were mostly used by processes like snapshots and backups
where and external tool would (hopefully) verify the results, it was not that
serious.

But we now plan to use links to bring things into the main directory. It must
absolutely be done right.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:02:42 +02:00
Glauber Costa
876e770df6 sstables: allow create_links to work with an arbitrary generation
During some situations (restoring a snapshot for instance) we may want a file
to get a different generation. This patch changes the code in create_links
slightly, so that it is able to link not only to a different location, but to
files with a different name, possibly in the same location - that is equivalent
to a generation change.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:02:42 +02:00
Glauber Costa
27efe8bde9 sstables: make read_toc public
This is done on behalf of load_new_sstables: we would like to know which
components are present in the file, but without triggering the read for the
rest of the metadata.

As noted by Avi, using this directly can leave the SSTable in an inconsistent
state. We will have to fix is later since this is not the first offender.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:02:42 +02:00
Glauber Costa
fcebf6f72d sstable tests: don't use set_generation method
There is no reason aside from testing for a table to just change its generation
number.

There will be, however, when we support loading new sstables. The method
however needs to be completely rewritten, so let's make sure the tests are not
using that.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:02:42 +02:00
Glauber Costa
f3bad2032d database: fix type for sstable generation.
Avoid using long for it, and let's use a fixed size instead.  Let's do signed
instead of unsigned to avoid upsetting any code that we may have converted.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:01:20 +02:00
Calle Wilund
412c2a1e5b storage_proxy: mutate_atomically fix for consistency and BL removal
The change to use consistency_level::ONE in send_batchlog_mutation
sort of fixes #478, but is not 100% correct.
When doing async_remove_from_batchlog, the CL is actually supposed to
be ANY.

Also, we should _not_ remove the batch log mutation from any nodes
if the mutate fails, since having it there in case of failure is sort of
the whole point of it. I.e. async_remove_from_batchlog should not be
called from a "finally", but from a "then".

Refs #478
2015-10-21 16:48:27 +02:00
Tomasz Grabiec
764d913d84 Merge branch 'pdziepak/row-cache-range-query/v4' from seastar-dev.git
From Pawel:

This series enables row cache to serve range queries. In order to achieve
that row cache needs to know whether there are some other partitions in
the specified range that are not cached and need to be read from the sstables.
That information is provied by key_readers, which work very similarly to
mutation_readers, but return only the decorated key of partitions in
range. In case of sstables key_readers is implemented to use partition
index.

Approach like this has the disadvantage of needing to access the disk
even if all partitions in the range are cached. There are (at least) two
solutions ways of dealing with that problem:
 - cache partition index - that will also help in all other places where it
   is neededed
 - add a flag to cache_entry which, when set, indicates that the immediate
   successor of the partition is also in the cache. Such flag would be set
   by mutation reader and cleared during eviction. It will also allow
   newly created mutations from memtable to be moved to cache provided that
   both their successors and predecessors are already there.

The key_reader part of this patchsets adds a lot of new code that probably
won't be used in any other place, but the alternative would be to always
interleave reads from cache with reads from sstables and that would be
more heavy on partition index, which isn't cached.

Fixes #185.
2015-10-21 15:26:45 +02:00
Gleb Natapov
6a2a0d628b storage_proxy: use CL=ONE to write logged batch
This is a regression created by logged batch code rework.

Fixes #478.
2015-10-21 15:29:49 +03:00
Avi Kivity
c69c02c162 Merge 2015-10-21 15:17:32 +03:00
Avi Kivity
c49dd5c576 Merge "move dependencies to /opt/scylladb" from Takuya 2015-10-21 15:17:04 +03:00
Glauber Costa
71c1b2fe69 api: get true snapshot size
Thin wrapper around storage service's facility.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 13:48:44 +02:00
Glauber Costa
c0630bedc2 api: get_snapshot_details
That's basically conversion work between what the storage_service returns
and the json types.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 13:48:44 +02:00
Glauber Costa
cf96d68478 storage_service: true snapshot size
For CFStats, one of the things needed is the size used by the snapshots. Since
the bulk of the work is map-reducing it and adding them together, we will just
call get_snapshot_details for the column family, and just selectively add just
what we need. No need for a separate method here.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 13:48:44 +02:00
Glauber Costa
718fea7048 storage_service: get_snapshot_details
The column family object can, for each column family, provide us with a map between
each snapshots it knows about, and two sizes: the total size, and the "real" (or live)
size, which is how much extra space the snapshot is costing us.

This patch map-reduces all CFs to accumulate that system-wide, and then formats that
into an a map of "snapshot_details". That is a more convenient format to be consumed
by our json generator.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 13:48:44 +02:00
Glauber Costa
77513a40db database: get_snapshot_details
For each of the snapshots available, the api may query for some information:
the total size on disk, and the "real" size. As far as I could understand, the
real size is the size that is used by the SSTables themselves, while the total
size includes also the metadata about the snapshot - like the manifest.json
file.

Details follow:

In the original Cassandra code, total size is:

    long sizeOnDisk = FileUtils.folderSize(snapshot);

folderSize recurses on directories, and adds file.length() on files. Again, my
understanding is that file_size() would give us the same as the length() method
for Java.

The other value, real (or true) size is:

    long trueSize = getTrueAllocatedSizeIn(snapshot);

getTrueAllocatedSizeIn seems to be a tree walker, whose visitor is an instance
of TrueFilesSizeVisitor. What that visitor does, is add up the size of the files
within the tree who are "acceptable".

An acceptable file is a file which:

starts with the same prefix as we want (IOW, belongs to the same SSTable, we
will just test that directly), and is not "alive". The alive list is just the
list of all SSTables in the system that are used by the CFs.

What this tries to do, is to make sure that the trueSnapshotSize is just the
extra space on disk used by the snapshot. Since the snapshots are links, then
if a table goes away, it adds to this size. If it would be there anyway, it does
not.

We can do that in a lot simpler fashion: for each file, we will just look at
the original CF directory, and see if we can find the file there. If we can't,
then it counts towards the trueSize. Even for files that are deleted after
compaction, that "eventually" works, and that simplifies the code tremendously
given that we don't have to neither list all files in the system - as Cassandra
does - or go check other shards for liveness information - as we would have to
do.

The scheme I am proposing may need some tweaks when we support multiple data
directories, as the SSTables may not be directly below the snapshot level.
Still, it would be trivial to inform the CF about their possible locations.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 13:48:44 +02:00
Avi Kivity
16006949d0 logalloc: make migrator an object, not a function pointer
The migrator tells lsa how to move an object when it is compacted.
Currently it is a function pointer, which means we must know how to move
the object at compile time.  Making it an object allows us to build the
migration function at runtime, making it suitable for runtime-defined types
(such as tuples and user-defined types).

In the future, we may also store the size there for fixed-size types,
reducing lsa overhead.

C++ variable templates would have made this patch smaller, but unfortunately
they are only supported on gcc 5+.
2015-10-21 11:24:56 +02:00
Avi Kivity
e2cd40e3bc Merge "remove and decommission node support part 2" from Asias
"More preparatory patches for remove and decommission node support:

- stream hints and reanges
- unbootstrap
- replication finished notification"
2015-10-21 12:24:14 +03:00
Takuya ASADA
1bf18679bb dist: add more build dependency for binutils
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-21 09:02:40 +00:00
Takuya ASADA
613c7e8226 dist: use scylla-* dependency packages on scylla-server.spec
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-21 08:49:34 +00:00
Takuya ASADA
29847edd4e dist: move antlr3-C++-devel package to /opt/scylladb
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-21 08:49:34 +00:00
Takuya ASADA
ad80845a1d dist: move antlr3-tool package to /opt/scylladb
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-21 08:49:34 +00:00
Takuya ASADA
cb18a305db dist: move ragel package to /opt/scylladb
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-21 08:49:34 +00:00
Takuya ASADA
b4daf31fcf dist: move ninja-build package to /opt/scylladb, remove unused dependency for re2c
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-21 08:49:34 +00:00
Takuya ASADA
91005ba366 dist: move boost package to /opt/scylladb
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-21 08:49:34 +00:00
Takuya ASADA
2e379179d3 dist: move gcc package to /opt/scylladb
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-21 08:49:34 +00:00
Takuya ASADA
09099081e1 dist: move isl package to /opt/scylladb
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-21 08:49:34 +00:00
Takuya ASADA
18a6c25434 dist: move binutils package to /opt/scylladb
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-21 08:49:34 +00:00
Takuya ASADA
90666cf558 dist: add scylla-env package for CentOS, to use /opt/scylladb as prefix
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-21 08:49:34 +00:00
Takuya ASADA
659624a2ee dist: dependency fix for ninja-build
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-21 08:49:34 +00:00
Takuya ASADA
7340a5a83a dist: fix srpm not found error
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
2015-10-21 08:49:34 +00:00
Asias He
1271ad6894 storage_service: Implement send_replication_notification 2015-10-21 16:11:33 +08:00
Asias He
56e55cd272 storage_proxy: Register replication_finished verb handler 2015-10-21 16:11:33 +08:00
Asias He
ffce7a7af8 storage_service: Implement confirm_replication 2015-10-21 16:11:33 +08:00
Asias He
1965e8751b messaging_service: Add REPLICATION_FINISHED verb
It is used to send replication finished message by storage_service when
removing a node from a cluster.
2015-10-21 16:11:33 +08:00
Asias He
d903bd0dba storage_service: Implement on_leave_cluster in excise 2015-10-21 14:55:01 +08:00
Asias He
4f57f0cdae storage_service: Enable restore_replica_count in remove_node 2015-10-21 14:55:01 +08:00
Asias He
5b170d1ffe storage_service: Complete handle_state_removing
Implement #if 0'ed code.
2015-10-21 14:55:01 +08:00
Asias He
7d656fe127 storage_service: Enable add_leaving_endpoint in handle_state_leaving 2015-10-21 14:55:01 +08:00
Asias He
d3120b3c2f storage_service: Complete is_replacing logic in handle_state_normal 2015-10-21 14:55:01 +08:00
Asias He
434a9e211e storage_service: Kill one FIXME in handle_state_bootstrap
It is already fixed.
2015-10-21 14:55:01 +08:00
Asias He
6f8f4816a5 storage_service: Implement unbootstrap
All the missing functions for unbootstrap are ready, we can implement
unbootstrap now.
2015-10-21 14:55:01 +08:00
Asias He
8a9374b331 storage_service: Implement stream_hints
Needed by unbootstrap.
2015-10-21 14:55:01 +08:00
Asias He
3f5e9baa17 storage_service: Implement stream_ranges
Needed by unbootstrap.
2015-10-21 14:55:01 +08:00
Asias He
d1eaccd234 storage_service: Implement leave_ring
Needed by unbootstrap.
2015-10-21 14:55:01 +08:00
Asias He
2f86feb581 storage_service: Move send_replication_notification to source file 2015-10-21 14:55:01 +08:00
Avi Kivity
5453cfbab7 Merge "snapshots: take + clear" from Glauber
"This is the code for taking a snapshot, and clearing a snapshot."
2015-10-21 08:59:42 +03:00
Paweł Dziepak
2876c9ebc7 unimplemented: drop RANGE_QUERIES
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:27:53 +02:00
Paweł Dziepak
c2b53c5282 row_cache: add scanning_and_populating_reader
This reader enables range queries on row cache. An underlying key_reader
is used to obtain information about partitions that belong to the
specified range and if any of them isn't in the cache an underlying
mutation reader is used to read the missing data.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:27:53 +02:00
Paweł Dziepak
9abefbfa28 row_cache: add just_cache_scanning_reader
This mutation reader returns mutations from cache that are in a given
range. There may be other mutations in the system (e.g. in sstables)
that won't be returned, so this reader on its own cannot really satisfy
any query.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:27:53 +02:00
Paweł Dziepak
c765b38599 row_cache: add modification counter
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:27:53 +02:00
Paweł Dziepak
c1e95dd893 row_cache: pass underlying key_source to row_cache
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:27:53 +02:00
Paweł Dziepak
6b73d6bb36 row_cache: add cache_entry::ring_position_compare
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:27:53 +02:00
Paweł Dziepak
96a42a9c69 column_family: add sstables_as_key_source()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:27:53 +02:00
Paweł Dziepak
b0edaa5bb7 memtable: add as_key_source()
Needed only for tests.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:27:53 +02:00
Paweł Dziepak
68e1b1f613 tests/sstables: test sstable key_reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:27:47 +02:00
Paweł Dziepak
7fe6d2dc07 sstables: provide key_reader
In order to get just partition keys stored in the sstable its summary
and index are used.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:27:47 +02:00
Paweł Dziepak
ca278e6b52 tests/key_reader: add tests for key_readers
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:27:47 +02:00
Paweł Dziepak
7c480f29c3 key_reader: add filtering key reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:27:47 +02:00
Paweł Dziepak
e683d57768 key_reader: add key_from_mutation_reader
This is a wrapper that allows using mutation_reader as key_reader.
Useful mostly in tests.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:27:47 +02:00
Paweł Dziepak
5604830389 key_reader: add combined_reader
Combined key reader, just like its mutation equivalents, combines
output from multiple key_readers and provides a single sorted stream
of decorated keys.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:27:32 +02:00
Paweł Dziepak
1f0cb9066b add key_reader interface
key_readers provide an interface analogous to mutation_readers, but the
only data they return are decorated keys.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:26:54 +02:00
Paweł Dziepak
31341787e4 mutation: optimize mutation_opt
Since mutation stores all its data externally and the object itself is
basically just a std::unique_ptr<> there is no need for stdx::optional.
Smart pointer set to nullptr represents a disengaged mutation_opt.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:24:11 +02:00
Paweł Dziepak
aed403efc2 mutation_reader: move move_and_disengage to a separate header
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:24:11 +02:00
Avi Kivity
cf734132e7 Merge "Flusing of CF:s without replay positions" from Calle
"Fixes: #469

We occasionally generate memtables that are not empty, yet have no
high replay_position set. (Typical case is CL replay, but apparently
there are others).

Moreover, we can do this repeatedly, and thus get caught in the flush
queue ordering restrictions.

Solve this by treating a flush without replay_position as a flush at the
highest running position, i.e. "last" in queue. Note that this will not
affect the actual flush operation, nor CL callbacks, only anyone waiting
for the operation(s) to complete.

To do this, the flush_queue had its restrictions eased, and some introspection
methods added."
2015-10-20 17:36:57 +03:00
Raphael S. Carvalho
93c1035f6e range: add comment explaining !start and !end in lambda
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-10-20 17:23:35 +03:00
Raphael S. Carvalho
c3a9d342f4 range: rename overlap to overlaps
overlaps() is more grammatical.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-10-20 17:23:35 +03:00
Avi Kivity
f4bd089b83 Merge "remove and decommission node" from Asias
"Preparatory patch for remove and decommission node support"
2015-10-20 17:00:04 +03:00
Glauber Costa
3a95a9cbe6 api: implement clear snapshot
Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-20 15:58:31 +02:00
Glauber Costa
19bb50f450 api: implement take_snapshot
Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-20 15:58:31 +02:00
Glauber Costa
21f84d77fc storage_service: delete a snapshot
This patch provides an storage service api to delete an snapshot.  Because all
keyspaces and CFs are visible in all shards. This will allow us to fetch the
list of keyspaces in the present shard and issue the filesystem operations in
that same shard.

That simplifies the code tremendously, and because there are not any operations
we need to do previous to the fs ones (like in the case of create snapshot), we
need no synchronization. Even easier.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-20 15:58:31 +02:00
Glauber Costa
2f2a4e83e0 storage_service: take a snapshot of a particular column family
Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-20 15:58:30 +02:00
Glauber Costa
fe3164714f storage_service: take a snapshot of a group of keyspaces
Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-20 15:58:30 +02:00
Glauber Costa
d236b01b48 snapshots: check existence of snapshots
We go to the filesystem to check if the snapshot exists. This should make us
robust against deletions of existing snapshots from the filesystem.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-20 15:58:26 +02:00
Glauber Costa
d3aef2c1a5 database: support clear snapshot
This allows for us to delete an existing snapshot. It works at the column
family level, and removing it from the list of keyspace snapshots needs to
happen only when all CFs are processed. Therefore, that is provided as a
separate operation.

The filesystem code is a bit ugly: it can be made better by making our file
lister more generic. First step would be to call it walker, not lister...

For now, we'll use the fact that there are mostly two levels in the snapshot
hierarchy to our advantage, and avoid a full recursion - using the same lambda
for all calls would require us to provide a separate class to handle the state,
that's part of making this generic.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-20 15:38:14 +02:00
Glauber Costa
500ee99c93 file lister: allow for more than one directory type
There are situations in which we would like to match more than one directory
type.  One example of that, would be a recursive delete operation: we need to
delete the files inside directories and the directories themselves, but we
still don't want a "delete all" since finding anything other than a directory
or a file is an error, and we should treat it as such.

Since there aren't that many times, it should be ok performance wise to just
use a list. I am using an unordered_set here just because it is easy enough,
but we could actually relax it later if needed. In any case, users of the
interface should not worry about that, and that decision is abstracted away
into lister::dir_entry_types.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-20 15:38:14 +02:00
Asias He
9e5ee17f4a storage_service: Implement rebuild 2015-10-20 21:32:30 +08:00
Asias He
31b50a83d1 storage_service: Implement excise 2015-10-20 21:32:30 +08:00
Asias He
0cf112501e storage_service: Implement restore_replica_count 2015-10-20 21:32:30 +08:00
Asias He
0ebcb1ddef storage_service: Stub send_replication_notification
Needed by restore_replica_count.
2015-10-20 21:32:30 +08:00
Asias He
1c480554eb storage_service: Stub get_new_source_ranges
Needed by restore_replica_count.
2015-10-20 21:32:30 +08:00
Asias He
955e766a49 storage_service: Partially implement decommission 2015-10-20 21:32:30 +08:00
Asias He
893849f8af storage_service: Stub unbootstrap 2015-10-20 21:32:30 +08:00
Asias He
9ebae12614 storage_service: Partially implement remove_node
restoreReplicaCount and excise are missing.
2015-10-20 21:32:30 +08:00
Asias He
142f29483a token_metadata: Implement add_leaving_endpoint 2015-10-20 21:32:30 +08:00
Asias He
f1bc882b90 storage_service: Implement get_changed_ranges_for_leaving 2015-10-20 21:32:30 +08:00
Asias He
937474bf14 abstract_replication_strategy: Make calculate_natural_endpoints public
It is used by storage_service.
2015-10-20 21:32:30 +08:00
Asias He
c5e35ac57e storage_service: Enable _replicating_nodes and _removing_node members 2015-10-20 21:32:30 +08:00
Asias He
f30fbd53ff storage_service: Start to use pending_range_calculator_service 2015-10-20 21:32:30 +08:00
Asias He
934c963d85 init: Init pending_range_calculator_service 2015-10-20 21:32:29 +08:00
Asias He
8d6200c036 service: Convert PendingRangeCalculatorService.java to C++ 2015-10-20 21:32:29 +08:00
Asias He
a5d91519f2 service: Import PendingRangeCalculatorService.java 2015-10-20 21:32:29 +08:00
Asias He
c96bc8bbd2 token_metadata: Implement calculate_pending_ranges 2015-10-20 21:32:14 +08:00
Asias He
a6065397d9 token_metadata: Implement clone_after_all_left 2015-10-20 20:38:43 +08:00
Avi Kivity
5abdc4323a Merge seastar upstream
* seastar 46fd389...6af5a0d (2):
  > sharded: do not try to call nonexistent delete callback
  > use real argv0 in dpdk initialization
2015-10-20 15:19:35 +03:00
Tomasz Grabiec
67d0f9c7df lsa: Restore heap invariant before calling _segments.erase()
This is certainly the right thing to do and seems to fix #403. However
I didn't manage to convince myself that this would cause problems for
binomial_heap, given that binomial_heap::erase() calls siftup()
anyway:

    void erase(handle_type handle)
    {
        node_pointer n = handle.node_;
        siftup(n, force_inf());
        top_element = n;
        pop();
    }

    void increase (handle_type handle)
    {
        node_pointer n = handle.node_;
        siftup(n, *this);

        update_top_element();
        sanity_check();
    }
2015-10-20 15:18:05 +03:00
Asias He
08b762111e storage_service: Fix ignored future in on_dead
All the on_xxx interface in i_endpoint_state_change_subscriber are
called within a seastar::async context.
2015-10-20 15:16:03 +03:00
Amnon Heiman
1e8752d55e API: Fix a confusion in the storage service snapshot details
There was a confusion between the snapshot key and the keyspace in the
snapshot details, this fixes it.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-20 13:53:09 +03:00
Avi Kivity
2575a4602e Merge "Fix for snapshots/create_links and shared SSTables" from Glauber
"Those are fixes needed for the snapshotting process itself. I have bundled this
in the create_snapshot series before to avoid a rebase, but since I will have to
rewrite that to get rid of the snapshot manager (and go to the filesystem),
I am sending those out on their own."
2015-10-20 13:49:17 +03:00
Pekka Enberg
3a3c7f7e79 configure.py: Propagate CFLAGS to seastar config
We need to propagate CFLAGS to seastar config for things like
DEBUG_SHARED_PTR to work.
2015-10-20 10:57:26 +03:00
Amnon Heiman
91d396760e API: add a workaround for the get schema_versions
This adds a workaround for the get schema_version, it will return only a
shcema version of the local node, this is a temporary workaround until
describe_schema_versions will be implemented.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-20 10:56:21 +03:00
Amnon Heiman
383d7ccf4d Add the get cluster and partitioner names to the API
This adds the implementation for the get cluster name and get
partitioner name to the storage_service API.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-20 10:54:55 +03:00
Amnon Heiman
77b4fc74cd gossiper need to set the cluster name on all shareds
The API can call any of the gossiper shareds to get the cluster name, so
the initilization needs to set it in all of them.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-20 10:52:14 +03:00
Amnon Heiman
ff67285091 gossiper: make the get cluster name and partitioner public
The API needs the cluster and the partitioner names, so the methods are
now public.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-20 10:21:19 +03:00
Calle Wilund
786d66cacf commitlog: Fix use-after-free
Remove "finally". Just use a then_wrapped. Which it was originally, before
"handle_exception" was introduced to seastar. Oh, the irony...
2015-10-20 09:56:40 +03:00
Calle Wilund
02732f19f2 database: Handle CF flush with no high replay_position
We occasionally generate memtables that are not empty, yet have no
high replay_position set. (Typical case is CL replay, but apparently
there are others).

Moreover, we can do this repeatedly, and thus get caught in the flush
queue ordering restrictions.

Solve this by treating a flush without replay_position as a flush at the
highest running position, i.e. "last" in queue. Note that this will not
affect the actual flush operation, nor CL callbacks, only anyone waiting
for the operation(s) to complete.
2015-10-20 08:24:04 +02:00
Calle Wilund
31fca82213 flush_queue_test: Add test for multiple ops per key 2015-10-20 08:24:04 +02:00
Calle Wilund
62c0be376c flush_queue: Ease key restriction and allow multiple calls on each key
As long as we guarantee that the execution order for the post ops are
upheld, we can allow insertion of multiple ops on the same key.

Implemented by adding a ref count to each position.

The restriction then becomes that an added key must either be larger
than any already existing key, _OR_ already exist. In the latter case,
we still know that we have not finished this position and signaled
"upwards".
2015-10-20 08:24:04 +02:00
Calle Wilund
ca451acb41 commitlog: Fix use-after-free
Remove "finally". Just use a then_wrapped. Which it was originally, before
"handle_exception" was introduced to seastar. Oh, the irony...
2015-10-20 08:02:46 +02:00
Avi Kivity
2ccb5feabd Merge "Support nodetool cfhistogram"
"This series adds the missing estimated histogram to the column family and to
the API so the nodetool cfhistogram would work."
2015-10-19 17:11:46 +03:00
Raphael S. Carvalho
28ef8feffa tests: fix test for leveled compaction
test_setup::do_with_test_directory is missing. For some reason,
the test wasn't failing without it until now. Adding it is the
correct thing to do anyway.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-10-19 17:07:08 +03:00
Avi Kivity
f4706c7050 Merge "initial support to leveled compaction" from Raphael
"This patchset introduces leveled compaction to Scylla.
We don't handle all corner cases yet, but we already have the strategy
and compaction working as expected. Test cases were written and I also
tested the stability with a load of cassandra-stress.

Leveled compaction may output more than one sstable because there is
a limit on the size of sstables. 160M by default.
Related to handling of partial compaction, it's still something to be
worked on.

Anyway, it will not be a big problem. Why? Suppose that a leveled
compaction will generate 2 sstables, and scylla is interrupted after
the first sstable is completely written but before the second one is
completely written. The next boot will delete the second sstable,
because it was partially written, but will not do anything with the
first one as it was completely written.
As a result, we will have two sstables with redundant data."
2015-10-19 16:17:45 +03:00
Amnon Heiman
5998ee718a API: Add logger API implementation to the system API
This patch adds the ability to set one or all log levels get a log level
and get all logs name.

After this patch the following url will be available:
GET/POST
/system/logger
/system/logger/{name}

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-19 15:05:46 +03:00
Amnon Heiman
ba1e6adf2a Adding the system swagger definition
The system api will include system related command, currently it holds
the logger related API. It holds definition for the following commands:
get_all_logger_names
set_all_logger_level
get_logger_level
set_logger_level

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-19 14:59:54 +03:00
Amnon Heiman
521d9b62dd Add a level_name function to logger
This is a helper function that returns a log level name. It will be used
by the API to report the log levels.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-19 14:59:54 +03:00
Glauber Costa
218fdebbeb snapshot: do not allow exceptions in snapshot creation hang us
With the distribute-and-sync method we are using, if an exception happens in
the snapshot creation for any reason (think file permissions, etc), that will
just hang the server since our shard won't do the necessary work to
synchronize and note that we done our part (or tried to) in snapshot creation.

Make the then clause a finally, so that the sync part is always executed.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-19 13:37:02 +02:00
Glauber Costa
9083a0e5a7 snapshots: fix generation of snapshots with shared sstables
create_links will fail in one of the shards if one of the SSTables happen to be
shared. It should be fine if the link already exists, so let's just ignore that case.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-19 13:37:01 +02:00
Gleb Natapov
156f760663 storage_proxy: drop Origin's truncate code from comment
Drop already translated code.
2015-10-19 14:06:06 +03:00
Gleb Natapov
ae42ec7832 storage_proxy: actually sort endpoints in get_live_sorted_endpoints() 2015-10-19 13:38:37 +03:00
Avi Kivity
75ab27f683 Revert "Merge "move dependencies to /opt/scylladb" from Takuya"
This reverts commit c96b0453f2, reversing
changes made to b862165279e70300d29a075dc913523d56a616d2; there were some
extra commits on that branch.
2015-10-19 13:31:51 +03:00
Avi Kivity
c96b0453f2 Merge "move dependencies to /opt/scylladb" from Takuya
"This moves all dependency packages to /opt/scylladb."
2015-10-19 13:28:39 +03:00
Takuya ASADA
14fab1e36d dist: use scylla-* dependency packages on scylla-server.spec 2015-10-19 09:11:22 +00:00
Takuya ASADA
39e17e04bc dist: move antlr3-C++-devel package to /opt/scylladb 2015-10-19 09:11:22 +00:00
Takuya ASADA
71182a0a95 dist: move antlr3-tool package to /opt/scylladb 2015-10-19 09:11:22 +00:00
Takuya ASADA
1c7d8bc0e9 dist: move ragel package to /opt/scylladb 2015-10-19 09:11:22 +00:00
Takuya ASADA
95c67a3e2e dist: move ninja-build package to /opt/scylladb, remove unused dependency for re2c 2015-10-19 09:11:22 +00:00
Takuya ASADA
ab8fe49b4f dist: move boost package to /opt/scylladb 2015-10-19 09:11:22 +00:00
Takuya ASADA
2c7861b7ed dist: move gcc package to /opt/scylladb 2015-10-19 09:11:22 +00:00
Takuya ASADA
23b69d610c dist: move isl package to /opt/scylladb 2015-10-19 09:11:22 +00:00
Takuya ASADA
6db2ddc53b dist: move binutils package to /opt/scylladb 2015-10-19 09:11:22 +00:00
Takuya ASADA
f06e50a569 dist: add scylla-env package for CentOS, to use /opt/scylladb as prefix 2015-10-19 09:11:22 +00:00
Calle Wilund
b862165279 commitlog_test: Modify test_allocation_failure to not require huge segment
Explicitly use up all the memory in the system as best as we can instead
Still not super reliable, but should have less side effects. And work better
with pre-allocated segment files
2015-10-19 11:32:35 +03:00
Calle Wilund
681718f738 commitlog_test: Modify test_allocation_failure to not require huge segment
Explicitly use up all the memory in the system as best as we can instead
Still not super reliable, but should have less side effects. And work better
with pre-allocated segment files
2015-10-19 10:30:57 +02:00
Avi Kivity
eed43b276c tests: use workspace-local directory for commitlog
Can't rely on default /var/lib/scylla to be there or accessible.
2015-10-19 10:21:16 +02:00
Tomasz Grabiec
19d7d30e67 Replace references to 'urchin' with 'scylla' 2015-10-19 11:08:05 +03:00
Takuya ASADA
ca43a5f2ca dist: dependency fix for ninja-build 2015-10-19 06:59:15 +00:00
Takuya ASADA
721d0f9f64 dist: fix srpm not found error 2015-10-19 06:52:48 +00:00
Takuya ASADA
78813d9617 dist: create rpmbuild dirs 2015-10-19 06:51:03 +00:00
Vlad Zolotarov
7492b662c8 locator::gossiping_property_file_snitch: do_withicate file descriptor in property_file_was_modified()
file::stat() requires "this" to be valid will the end of the call.

Fixes issue #464

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-19 06:51:03 +00:00
Vlad Zolotarov
8c4ce0efee locator::production_snitch_base: fix production_snitch_base signature
Make production_snitch_base constructor signature consistent with
the rest of production snitches.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-19 06:51:03 +00:00
Vlad Zolotarov
4b8b3e7841 locator::production_snitch_base: cleanup
Don't keep the file descriptor beyond load_property_file() scope.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-19 06:51:03 +00:00
Vlad Zolotarov
9dfc08cf9f locator::gossiping_property_file_snitch: use empty string as a default config file name
A non-empty default value of a configuration file name was preventing
the db::config::get_conf_dir() to kick in when a default snitch constructor
was used (this is the way it's always used from scylla).

Fixes issue #459

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-19 06:51:03 +00:00
Avi Kivity
ce046d07da tests: create directories for user column families as well 2015-10-19 06:51:03 +00:00
Avi Kivity
f63002a75e tests: initialize system keyspace (and its directories) for cql_test_env
Now that columnb familiy directories are not created on demand, the tests
started to fail.
2015-10-19 06:51:03 +00:00
Raphael S. Carvalho
dfc95587d6 db: do not ignore compaction strategy class
When building the in-memory schema for a column family, we were
ignoring compaction strategy class because of a bug in the
existing code. Example: suppose that you create a column family
with leveled compaction strategy. This option would be ignored
and the default strategy (size-tiered) would be used instead.
Found this problem while working on leveled compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-19 06:51:03 +00:00
Glauber Costa
c8a6883987 sstables: remove outdated comment
We already write all components. The FIXME is fixed.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-19 06:51:03 +00:00
Avi Kivity
dd2bf81131 tests: disable collectd
No need to blast collectd metrics during test runs.
2015-10-19 09:05:48 +03:00
Avi Kivity
268f7acd11 Merge snitch fixes from Vlad
"This series cleans up a few places in the snitches
code that has been noticed during the work on issues #464
and #459.

The last patch actually fixes the issue #464"
2015-10-18 17:21:30 +03:00
Vlad Zolotarov
fe5e983152 locator::gossiping_property_file_snitch: do_withicate file descriptor in property_file_was_modified()
file::stat() requires "this" to be valid will the end of the call.

Fixes issue #464

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-18 16:58:27 +03:00
Vlad Zolotarov
15606fc8d3 locator::production_snitch_base: fix production_snitch_base signature
Make production_snitch_base constructor signature consistent with
the rest of production snitches.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-18 16:58:27 +03:00
Vlad Zolotarov
5712aa4d48 locator::production_snitch_base: cleanup
Don't keep the file descriptor beyond load_property_file() scope.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-18 16:58:27 +03:00
Vlad Zolotarov
f113a57ba1 locator::gossiping_property_file_snitch: use empty string as a default config file name
A non-empty default value of a configuration file name was preventing
the db::config::get_conf_dir() to kick in when a default snitch constructor
was used (this is the way it's always used from scylla).

Fixes issue #459

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-18 16:56:12 +03:00
Avi Kivity
bb67258c68 tests: create directories for user column families as well 2015-10-18 14:45:42 +03:00
Avi Kivity
9fd73417ae tests: initialize system keyspace (and its directories) for cql_test_env
Now that columnb familiy directories are not created on demand, the tests
started to fail.
2015-10-18 12:25:36 +03:00
Raphael S. Carvalho
a21af32eed db: do not ignore compaction strategy class
When building the in-memory schema for a column family, we were
ignoring compaction strategy class because of a bug in the
existing code. Example: suppose that you create a column family
with leveled compaction strategy. This option would be ignored
and the default strategy (size-tiered) would be used instead.
Found this problem while working on leveled compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-18 11:06:37 +03:00
Glauber Costa
f1c681de60 sstables: remove outdated comment
We already write all components. The FIXME is fixed.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-17 15:28:55 +03:00
Avi Kivity
82f3f76e74 Merge "remove on-demand creation of SSTable directory" from Glauber
"Let's do it better and create the directories when the tables are first
created."
2015-10-17 15:23:56 +03:00
Glauber Costa
c814be253a sstables: no longer touch directory when writing SSTables
After the previous two patches, the CF directory where the SSTable will live is
guaranteed to always exist: system CFs' are touched at boot while newly created
tables' are touched when the creation mutations are announced.

With that in place, there is no more need for the recursive touch in the
SSTables path.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-17 13:08:07 +02:00
Glauber Costa
c3e68ad5a4 guarantee that CF directories exist for system tables
This is done in a single shard, so it lives in main.cc

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-17 13:08:07 +02:00
Glauber Costa
e99e418238 schema_tables: make sure CF directory exists upon creation
In Cassandra, when you create a new column family, a directory for it
immediately appears under the KS directory.

In the past, we have made a decision to delay that creation until the first
SSTable is created, which works well in general.

There is a problem, however, for backup restoration: the standard procedure to
call loadNewSSTables is to do that in an empty directory. But the directory
simply won't be there until we create the first SSTable: bummer!

In the current incarnation of the code in schema_tables.cc, there is already
some code that runs on CPU0 only. That is a perfect place for the directory
creation. So let's do it.

After this patch, a directory for the CF appears right after the CF creation.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-17 13:08:07 +02:00
Glauber Costa
df857eb8c6 database: touch directories for the column family
Current code calls make_directory, which will fail if the directory already exists.
We didn't use this code path much before, but once we start creating CF directories
on CF creation - and not on SSTable creation, that will become our default method.

Use touch_directory instead

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-17 13:08:07 +02:00
Raphael S. Carvalho
e555ad6370 tests: add tests for leveled compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-16 01:57:08 -03:00
Raphael S. Carvalho
06e9836a66 wire up support for leveled compaction strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-16 01:57:04 -03:00
Raphael S. Carvalho
35b75e9b67 adapt compaction procedure to support leveled strategy
Adapt our compaction code to start writing a new sstable if the
one being written reached its maximum size. Leveled strategy works
with that concept. If a strategy other than leveled is being used,
everything will work as before.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-16 01:54:52 -03:00
Raphael S. Carvalho
fbec0d0254 convert LeveledManifest to C++
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-16 01:46:15 -03:00
Raphael S. Carvalho
683e29e0bd tests: add test for range overlap
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-15 19:34:58 -03:00
Raphael S. Carvalho
936575efd2 dht: introduce comparator for token
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-15 19:34:50 -03:00
Raphael S. Carvalho
24f9c2f85e range: add method to check if two ranges overlap
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-15 19:29:54 -03:00
Raphael S. Carvalho
cc2e0dc5d8 sstables: implement function to get sstable level
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-15 19:29:54 -03:00
Raphael S. Carvalho
b8e392e6fe sstables: add function to compare sstable by age
Leveled strategy prefers choosing older level-0 sstables as
candidates.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-15 19:29:50 -03:00
Raphael S. Carvalho
1d043936af sstables: add function to compare sstables by first key
Used by leveled strategy to sort a list of sstables with the
same level.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-15 19:25:08 -03:00
Raphael S. Carvalho
d4eb09704f sstables: add method to get first and last as decorated keys
Useful for leveled strategy which looks for overlapping sstables
by checking if token range overlaps.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-15 19:25:04 -03:00
Raphael S. Carvalho
0d8b4af9a0 import LeveledManifest.java
LeveledManifest implements the Leveled compaction strategy.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-15 19:24:58 -03:00
Avi Kivity
c8cc8a0b0f Merge seastar upstream
* seastar 8207f2c...46fd389 (1):
  > rwlock: drop extranous '}'
2015-10-15 17:19:28 +03:00
Gleb Natapov
f37979eeff Make exception string generation code less verbose 2015-10-15 17:14:51 +03:00
Gleb Natapov
510ed4a1a0 Fix operator<< for std::exception_ptr to receive reference to an object 2015-10-15 17:14:50 +03:00
Avi Kivity
bd5ca7b998 Merge seastar upstream
* seastar a2523ae...8207f2c (3):
  > rwlock: provide lock / unlock semantics
  > with_lock: run a function under a lock
  > rwlock: add documentation to the rwlock module
2015-10-15 17:02:50 +03:00
Calle Wilund
7d93861b28 commitlog_test: test fix
Fixes spurious failures in test_commitlog_discard_completed_segments
* Do explicit sync on all segments to prevent async flushed from keeping
  segements alive.
* Use counter instead of actual file counting to avoid racing with
  pre-allocation of segments
2015-10-15 17:01:11 +03:00
Amnon Heiman
1acd47ddce API: Adding implementation for the stream manager metrics
This adds an implementation for the stream manager metrics.

And the following url will be available:
/stream_manager/metrics/outbound
/stream_manager/metrics/incoming/{peer}
/stream_manager/metrics/incoming
/stream_manager/metrics/outgoing/{peer}
/stream_manager/metrics/outgoing

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-15 16:04:36 +03:00
Amnon Heiman
164ccd0525 API: Adding the stream manager Swagger definition
This adds the swagger definition file for the stream manager. The API is
based on the StreamManagerMBean and the StreamMetrics.
The following commands where added:
get_current_streams
get_current_streams_state
get_all_active_streams_outbound
get_total_incoming_bytes
get_all_total_incoming_bytes
get_total_outgoing_bytes
get_all_total_outgoing_bytes
2015-10-15 16:04:11 +03:00
Pekka Enberg
87fae37c9c dist/docker: Add missing "hostname" package
The Fedora base image has changed so we need to add "hostname" that's
used by the Docker-specific launch script to our image.

Fixes Scylla startup.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-15 13:46:53 +03:00
Avi Kivity
a611a235ee Merge "Flush queue bugfixes" from Calle
"Fixes the crashes in debug mode with the flush queue test, and
simplifies and cleans up the queue itself.

Aforementioned crashes happened due to reordering with the signalling
loop in previous version. A task completing could race with a reordered
loop continuation in who would get to signal and remove an item.

Rewritten to use much simpler promise chaining instead (which also allows
return value to propagate from Pre- to post op), ensuring only one actor
modifies the queue entry."
2015-10-15 13:04:40 +03:00
Avi Kivity
67eb176029 Merge "Invalidate CQL prepared statements on schema changes" from Pekka
"This series converts MigrationSubscriber to C++ and wires it up to
invalidate CQL prepared statements on schema changes that affect them."
2015-10-15 11:44:13 +03:00
Pekka Enberg
e77fbfe709 cql3/query_processor: Wire up migration subscriber
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-15 09:18:53 +03:00
Pekka Enberg
f659e7bc10 cql3: Convert MigrationSubscriber to C++
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-15 09:18:52 +03:00
Pekka Enberg
3a30486d30 cql3/query_processor: Implement invalidate_prepared_statement()
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-15 09:18:52 +03:00
Pekka Enberg
c80deeb6fe cql3/query_processor: Add get_query_processor() helper
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-15 09:18:52 +03:00
Pekka Enberg
1890d276b9 cql3: Add depends_on_{keyspace|column_family} helper to cql_statement
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-15 09:18:52 +03:00
Calle Wilund
ea0f3504c2 flush_queue_test: Clean up and enhance the test a little
Now also tests:

* Value propagation
* Queue wait
2015-10-15 02:15:24 +02:00
Calle Wilund
cd9f5e38f7 flush_queue: Fix task reordering bug, simplify code, allow value propagation
Previous version dit looping on post execution and signaling of waiters.
This could "race" with an op just finishing if task reordering happened.

This version simplifies the code significantly (and raises the question why
it was not written like this in the first place... Shame on me) by simpy
building a promise-dependency chain between _previous_ queue items and next
instead.

Also, the code now handles propagation of return value from the "Func" pre-op
to the "Post" op, with exceptions automatically handled.
2015-10-15 02:10:26 +02:00
Avi Kivity
849464670c commitlog: make new segments more xfs-friendly
xfs doesn't like writes beyond eof (exactly at eof is fine), and due
to continuation reordering, we sometimes do that.

Fix by pre-truncating the segment to its maximum size.
2015-10-14 17:32:59 +03:00
Calle Wilund
206acd8b5b commitlog: Make reader handle pre-allocated files
Silently ignore, and assume eof if reading zeroed file or chunk header data
Reading entries already deal with this.
2015-10-14 17:32:23 +03:00
Calle Wilund
2729d5dd71 commitlog: ensure file size remains <= max_size
Re-check file size overflow after each cycle() call (new buffer),
otherwise we could write more, in the case we are storing a mutation
larger than current buffer size (current pos + sizeof(mut) < max_size, but
after cycle required by sizeof(mut) > buf_remain, the former might not be
true anymore.
2015-10-14 17:32:22 +03:00
Avi Kivity
6a31daffd6 Merge "Unify logged and unlogged batch write code paths" from Gleb
"Logic duplication bothered me for a long time. This series consolidates both
cases as much as possible."
2015-10-14 17:26:40 +03:00
Gleb Natapov
19770268be storage_proxy: use common write code to write batch log mutations.
Reworks write code further so it can be used to write batch log
mutations.
2015-10-14 17:12:57 +03:00
Gleb Natapov
db49a196da storage_proxy: remove code duplication between logged and unlogged batches
Currently logged batch has most of the logic on unlogged batch
duplicated. This patch rework unlogged batch code in such a way that
it can be reused.
2015-10-14 17:12:57 +03:00
Gleb Natapov
c6644d5721 storage_proxy: remove outdated comment 2015-10-14 17:12:57 +03:00
Gleb Natapov
77ceeee2a8 storage_proxy: move array instead of copy it. 2015-10-14 17:12:57 +03:00
Avi Kivity
77daa7a59f Merge "Flush queue ordering" from Calle
"Adds a small utility queue and through this enforces memtable flush ordering
such that a flush may _run_ unchecked, however the "post" operation may
execute once all "lower numbered" (i.e. lower replay position) post ops
has finished.

This means that:
a.) Callbacks to commitlog are now guaranteed to fulfill ordering criteria
b.) Calling column_family::flush() and waiting for the result will also
     wait for any previously initiated flushes to finish. But not those
     initiated _after_."
2015-10-14 15:13:24 +03:00
Calle Wilund
012ab24469 column_family: Add flush queue object to act as ordering guarantee 2015-10-14 14:07:40 +02:00
Calle Wilund
d8658d4536 flush_queue_test
Small test to verify at least some integrity of the util in question
2015-10-14 14:07:39 +02:00
Calle Wilund
4540036a01 Add "flush_queue" helper structure
Small utility to order operation->post operation
so that the "post" step is guaranteed to only be run
when all "post"-ops for lower valued keys (T) have been completed

This is a generalized utility mainly to be testable.
2015-10-14 14:07:38 +02:00
Avi Kivity
ed2481db7f Merge branch-0.10 2015-10-14 15:03:19 +03:00
Asias He
c58ae5432c storage_service: Fix nodetool info return wrong gossiper status
Before:
$ nodetool info
ID                     : a5adfbbf-cfd8-4c88-ab6b-6a34ccc2857c
Gossip active          : false

After:
$ nodetool info
ID                     : a5adfbbf-cfd8-4c88-ab6b-6a34ccc2857c
Gossip active          : true

Fix #354.
2015-10-14 12:37:51 +03:00
Avi Kivity
1a439f2259 Merge seastar upstream
* seastar 78e3924...a2523ae (7):
  > core: fix pipe unread
  > Merge 'xfs-extents'
  > Merge "separate-dma-alignment"
  > output_stream: wait for stream to be taken out of poller in case final flush returns exception.
  > reactor: Use more widely compatible xfs include
  > readme: Add xfslibs-dev to Ubuntu deps
  > pipe: add unread() operation
2015-10-14 11:49:11 +03:00
Pekka Enberg
921dc9ab9f cql3: Add keyspace() and column_family() to select_statement
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-14 09:12:40 +03:00
Avi Kivity
cd6054253c Merge "fix consumer parser" from Raphael
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
2015-10-13 18:53:27 +03:00
Raphael S. Carvalho
d05a5fbeb4 tests: add testcase for bug on consumer parser
problem described by commit:
3926748594

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-13 11:49:57 -03:00
Raphael S. Carvalho
3926748594 sstable: fix consumer parser
The first problem is the while loop around the code that processes prestate.
That's wrong because there may be a need to read more data before continuing
to process a prestate.
The second problem is the code assumption that a prestate will be processed
at once, and then unconditionally process the current state.
Both problems are likely to happen when reading a large buffer because more
than one read may be required.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-13 11:45:10 -03:00
Avi Kivity
c1cfec3e5a dht: mark boot_strapper logger static
Or it violates the ODR and causes link errors.
2015-10-13 16:42:42 +03:00
Tomasz Grabiec
11eb9a6260 range: Fix range::contains()
A range open-ended on both sides should contain all wrapping ranges.

Spotted by Avi.
2015-10-13 12:14:54 +03:00
Nadav Har'El
39f70a043d main: don't warn twice about the same directory
I was mildly annoyed by seeing two warnings about the same directory not
being XFS, when the sstable directory and the commitlog directory are the
same one (I don't know if this is typical, but this is what I do in all
my tests...). So I wrote this trivial patch to make sure not to test the
same directory twice.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-10-13 11:23:55 +03:00
Avi Kivity
eed11583cb Merge "Add node support" from Asias
"With this, a new node can stream data from existing nodes when joins the cluster.

I tested with the following:

  1) stat a node 1

  2) insert data into node 1

  3) start node 2

I can see from the logger that data is streamed correctly from node 1
to node 2."
2015-10-13 11:06:11 +03:00
Asias He
cf9d9e2ced boot_strapper: Enable range_streamer code in bootstrap
Add code to actually stream data from other nodes during bootstrap.

I tested with the following:

   1) stat a node 1

   2) insert data into node 1

   3) start node 2

I can see from the logger that data is streamed correctly from node 1
to node 2.
2015-10-13 15:54:18 +08:00
Asias He
0d1e5c3961 boot_strapper: Add debug info for get_bootstrap_tokens
Print the token generated for the node who is bootstrapping.
2015-10-13 15:48:53 +08:00
Asias He
e92a759dad locator: Add debug info for abstract_replication_strategy::get_address_ranges
It is useful to check relations between token and its primary range.
2015-10-13 15:46:36 +08:00
Asias He
dc02d76aee range_streamer: Implement fetch_async
It is used by boot_strapper::bootstrap() in bootstrap process to start
the streaming.
2015-10-13 15:45:56 +08:00
Asias He
887c0a36ec range_streamer: Implement add_ranges
It is used by boot_strapper::bootstrap() in bootstrap process.
2015-10-13 15:45:56 +08:00
Asias He
1c1f9bed09 range_streamer: Implement use_strict_sources_for_ranges 2015-10-13 15:45:55 +08:00
Asias He
43d4e62b5a range_streamer: Implement add_source_filter 2015-10-13 15:45:55 +08:00
Asias He
d47ea88aa8 range_streamer: Implement get_all_ranges_with_strict_sources_for 2015-10-13 15:45:55 +08:00
Asias He
84de936e43 range_streamer: Implement get_all_ranges_with_sources_for 2015-10-13 15:45:55 +08:00
Asias He
944e28cd6c range_streamer: Implement get_range_fetch_map 2015-10-13 15:45:55 +08:00
Asias He
d986a4d875 range_streamer: Add constructor 2015-10-13 15:45:55 +08:00
Asias He
1d6c081766 range_streamer: Add i_source_filter and failure_detector_source_filter
It is used to filter out unwanted node.
2015-10-13 15:45:55 +08:00
Asias He
c8b9a6fa06 dht: Convert RangeStreamer to C++ 2015-10-13 15:45:55 +08:00
Asias He
b95521194e dht: Import dht/RangeStreamer 2015-10-13 15:45:55 +08:00
Asias He
b1c92f377d token_metadata: Add get_pending_ranges
One version returns only the ranges
   std::vector<range<token>>

Another version returns a map
   std::unordered_map<range<token>, std::unordered_set<inet_address>>
which is converted from
   std::unordered_multimap<range<token>, inet_address>

They are needed by token_metadata::pending_endpoints_for,
storage_service::get_all_ranges_with_strict_sources_for and
storage_service::decommission.
2015-10-13 15:45:55 +08:00
Asias He
b860f6a393 token_metadata: Add get_pending_ranges_mm
Helper for get_pending_ranges.
2015-10-13 15:45:55 +08:00
Asias He
c96d826fe0 token_metadata: Introduce _pending_ranges member 2015-10-13 15:45:55 +08:00
Asias He
da072b8814 token_metadata: Remove duplicated sortedTokens
It is implemented already.
2015-10-13 15:45:55 +08:00
Asias He
d820c83141 locator: Add abstract_replication_strategy::get_pending_address_ranges
Given the current token_metadata and the new token which will be
inserted into the ring after bootstrap, calculate the ranges this new
node will be responsible for.

This is needed by boot_strapper::bootstrap().
2015-10-13 15:45:55 +08:00
Asias He
1adb27e283 token_metadata: Add clone_only_token_map
Needed by get_pending_address_ranges.
2015-10-13 15:45:55 +08:00
Asias He
527edd69ae locator: Add abstract_replication_strategy::get_range_addresses
Needed by range_streamer::get_all_ranges_with_sources_for.
2015-10-13 15:45:55 +08:00
Asias He
044dcf43de locator: Add abstract_replication_strategy::get_address_ranges
Needed by get_pending_address_ranges.
2015-10-13 15:45:55 +08:00
Asias He
3d0d02816d token_metadata: Add get_primary_ranges_for and get_primary_range_for
Given tokens, return ranges the tokens present. For example, with t1 and
t2, it returns ranges:

(token before t1, t1]
(token before t2, t2]
2015-10-13 15:45:55 +08:00
Asias He
542b1394d7 token_metadata: Add get_predecessor
It is used to get the previous token of this token in the ring.
2015-10-13 15:45:55 +08:00
Asias He
ddfd417c13 locator: Make calculate_natural_endpoints take extra token_metadata parameter
When adding/removing a node, we need to use a temporary token_metadata
with pending tokens.
2015-10-13 15:45:55 +08:00
Asias He
7959c12073 stream_session: Support column_families is empty case
An empty column_families means to get all the column families.
2015-10-13 15:44:59 +08:00
Tomasz Grabiec
a383f91b68 range: Implement range::contains() which takes another range 2015-10-13 15:44:36 +08:00
Avi Kivity
0498cebc58 Merge seastar upstream
* seastar c2e86d5...78e3924 (2):
  > fix output stream batching
  > rpc: server connection shutdown fix

Adjust transport/server.cc for the demise of output_stream::batch_flush()
2015-10-12 14:00:40 +03:00
Avi Kivity
b8c8473505 Merge seastar upstream
* seastar 1995676...c2e86d5 (3):
  > doc: add Seastar tutorial
  > resource: increase default reserve memory
  > http client: moved http_response_parser.rl from apps/seawreck into http directory
2015-10-12 10:29:10 +03:00
Amnon Heiman
6fd3c81db5 keyspace clean up should be a POST not a GET 2015-10-11 15:51:56 +03:00
Avi Kivity
e252475e67 Merge "locator: Adding EC2Snitch" from Vlad
"This series adds EC2Snich.

Since both GossipingPropertyFileSnitch and EC2SnitchXXX snitches family
are using the same property file it was logical to share the corresponding
code. Most of this series does just that... "
2015-10-11 14:55:26 +03:00
Glauber Costa
f03480c054 avoid exception when processing caching_options
While trying to debug an unrelated bug, I was annoyed by the fact that parsing
caching options keep throwing exceptions all the time. Those exceptions have no
reason to happen: we try to convert the value to a number, and if we fail we
fall back to one of the two blessed strings.

We could just as easily just test for those strings beforehand and avoid all of
that.

While we're on it, the exception message should show the value of "r", not "k".

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-11 14:53:55 +03:00
Avi Kivity
2b87c8c372 build: allow defaulting test binaries not to strip debug information
Useful for continuous integration, which has enough disk space.
2015-10-11 12:25:20 +03:00
Glauber Costa
b2fef14ada do not calculate truncation time independently
Currently, we are calculating truncated_at during truncate() independently for
each shard. It will work if we're lucky, but it is fairly easy to trigger cases
in which each shard will end up with a slightly different time.

The main problem here, is that this time is used as the snapshot name when auto
snapshots are enabled. Previous to my last fixes, this would just generate two
separate directories in this case, which is wrong but not severe.

But after the fix, this means that both shards will wait for one another to
synchronize and this will hang the database.

Fix this by making sure that the truncation time is calculated before
invoke_on_all in all needed places.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-09 17:17:11 +03:00
Vlad Zolotarov
c30c1bb1ec tests: added ec2_snitch_test
Checks the following:
   - That EC2Snich is able to receive the availability zone from EC2.
   - That the resulting DC and RACK values are distributed among all
     shards.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-08 20:57:20 +03:00
Vlad Zolotarov
38f77bbfe5 locator: add ec2_snitch
This snitch will read the EC2 availability zone and set the DC
and RACK as follows:

If availability zone is "us-east-1d", then
DC="us-east" and  RACK="1d".

If cassandra-rackdc.properties contains "dc_suffix" field then
DC will be appended with its value.

For instance if dc_suffix=_1_cassandra, then in the example above

DC=us-east_1_cassandra

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-08 20:57:20 +03:00
Vlad Zolotarov
1d63ec4143 conf: added cassandra-rackdc.properties
This is a configuration file used by GossipingPropertyFileSnitch and
EC2SnitchXXX snitches family.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-08 20:57:20 +03:00
Vlad Zolotarov
afd44a6e08 locator::gossiping_property_file_snitch: initialize i_endpoint_snitch::io_cpu_id() in the constructor
This is just cleaner.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-08 20:57:19 +03:00
Vlad Zolotarov
2febae90c9 locator::production_snitch_base: unify property file parsing facilities
- Move property file parsing code into production_snitch_base class.
   - Make a parsing code more general:
      - Save the parsed keys in the hash table.
      - Check only two types of errors:
         - Repeating keys.
         - Add a set of all supported keys and add the check for a key
           being supported.
   - Added production_snitch_base.cc file.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-08 20:57:19 +03:00
Vlad Zolotarov
2db219f074 locator::gossiping_property_file_snitch: remove extra "namespace locator"
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-08 20:57:19 +03:00
Vlad Zolotarov
0cfdca55f3 locator::gossiping_property_file_snitch: make get_name() public as it should be
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-08 20:57:19 +03:00
Vlad Zolotarov
ba68436f2f locator::gossiping_property_file_snitch: get rid of warn() and err() wrappers
Use logger() accessor instead for a better resemblance with the Origin.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-08 20:57:19 +03:00
Vlad Zolotarov
d196b034e2 locator::snitch_base: Add a default snitch_base::stop() method
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-10-08 20:57:19 +03:00
Vlad Zolotarov
de6cf8db51 db::config: add get_conf_dir()
This function returns the directory containing the configuration
files. It takes into an account the evironment variables as follows:
   - If SCYLLA_CONF is defines - this is the directory
   - else if SCYLLA_HOME is defines, then $SCYLLA_HOME/conf is the directory
   - else "conf" is a directory, namely the configuration files should be
     looked at ./conf

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - Updated get_conf_dir() description.
2015-10-08 20:57:11 +03:00
Avi Kivity
0c8f906b2d Merge "Fixes for snapshots" from Glauber 2015-10-08 18:21:54 +03:00
Glauber Costa
1549a43823 snapshots: fix json type
We are generating a general object ({}), whereas Cassandra 2.1.x generates an
array ([]). Let's do that as well to avoid surprising parsers.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-08 16:54:51 +02:00
Glauber Costa
cc343eb928 snapshots: handle jsondir creation for empty files case
We still need to write a manifest when there are no files in the snapshot.
But because we have never reached the touch_directory part in the sstables
loop for that case, nobody would have created jsondir in that case.

Since now all the file handling is done in the seal_snapshot phase, we should
just make sure the directory exists before initiating any other disk activity.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-08 16:54:51 +02:00
Glauber Costa
efdfc78c0c snapshots: get rid of empty tables optimization
We currently have one optimization that returns early when there are no tables
to be snapshotted.

However, because of the way we are writing the manifest now, this will cause
the shard that happens to have tables to be waiting forever. So we should get
rid of it. All shards need to pass through the synchronization point.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-08 16:54:51 +02:00
Glauber Costa
0776ca1c52 snapshots: don't hash pending snapshots by snapshot name
If we are hashing more than one CF, the snapshot themselves will all have the same name.
This will cause the files from one of them to spill towards the other when writing the manifest.

The proper hash is the jsondir: that one is unique per manifest file.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-08 16:54:51 +02:00
Avi Kivity
e7f58491c3 version: mark master branch as development version 2015-10-08 15:31:50 +03:00
Amnon Heiman
0ec0a5703b API: column family estimated histograms
This patch fix an issue with the read latency estimated historam
implementation and add a call to the estimated number of sstable
histogram.

The later is not yet implemented on the datbase side.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-08 14:59:17 +03:00
Amnon Heiman
6d90eebfb9 column family: Add estimated histogram impl
This patch adds the read and write latency estimated histogram support
and add an estimatd histogram to the number of sstable that were used in
a read.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-08 14:59:17 +03:00
Amnon Heiman
9c16b56b65 estimated_histogram to support sampling
Taking time messurment of an operation can cause peformence degredation,
this patch adds sampling support for the estimated histogram.

It allows to add a sample with a counter that holds what is the actual
total number so far. So the samplling will be counted as multiple
entries in the estimated histogram.

The totatl count of the entries in the histogram will equal the _count
parameter.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-08 14:59:17 +03:00
892 changed files with 54680 additions and 18086 deletions

9
.github/ISSUE_TEMPLATE.md vendored Normal file
View File

@@ -0,0 +1,9 @@
## Installation details
Scylla version (or git commit hash):
Cluster size:
OS (RHEL/CentOS/Ubuntu/AWS AMI):
## Hardware details (for performance issues)
Platform (physical/VM/cloud instance type/docker):
Hardware: sockets= cores= hyperthreading= memory=
Disks: (SSD/HDD, count)

5
.gitignore vendored
View File

@@ -4,3 +4,8 @@
build
build.ninja
cscope.*
/debian/
dist/ami/files/*.rpm
dist/ami/variables.json
dist/ami/scylla_deploy.sh
*.pyc

2
.gitmodules vendored
View File

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui

103
IDL.md Normal file
View File

@@ -0,0 +1,103 @@
#IDL definition
The schema we use similar to c++ schema.
Use class or struct similar to the object you need the serializer for.
Use namespace when applicable.
##keywords
* class/struct - a class or a struct like C++
class/struct can have final or stub marker
* namespace - has the same C++ meaning
* enum class - has the same C++ meaning
* final modifier for class - when a class mark as final it will not contain a size parameter. Note that final class cannot be extended by future version, so use with care
* stub class - when a class is mark as stub, it means that no code will be generated for this class and it is only there as a documentation.
* version attributes - mark with [[version id ]] mark that a field is available from a specific version
* template - A template class definition like C++
##Syntax
###Namespace
```
namespace ns_name { namespace-body }
```
* ns_name: either a previously unused identifier, in which case this is original-namespace-definition or the name of a namespace, in which case this is extension-namespace-definition
* namespace-body: possibly empty sequence of declarations of any kind (including class and struct definitions as well as nested namespaces)
###class/struct
`
class-key class-name final(optional) stub(optional) { member-specification } ;(optional)
`
* class-key: one of class or struct.
* class-name: the name of the class that's being defined. optionally followed by keyword final, optionally followed by keyword stub
* final: when a class mark as final, it means it can not be extended and there is no need to serialize its size, use with care.
* stub: when a class is mark as stub, it means no code will generate for it and it is added for documentation only.
* member-specification: list of access specifiers, and public member accessor see class member below.
* to be compatible with C++ a class definition can be followed by a semicolon.
###enum
`enum-key identifier enum-base { enumerator-list(optional) }`
* enum-key: only enum class is supported
* identifier: the name of the enumeration that's being declared.
* enum-base: colon (:), followed by a type-specifier-seq that names an integral type (see the C++ standard for the full list of all possible integral types).
* enumerator-list: comma-separated list of enumerator definitions, each of which is either simply an identifier, which becomes the name of the enumerator, or an identifier with an initializer: identifier = integral value.
Note that though C++ allows constexpr as an initialize value, it makes the documentation less readable, hence is not permitted.
###class member
`type member-access attributes(optional) default-value(optional);`
* type: Any valid C++ type, following the C++ notation. note that there should be a serializer for the type, but deceleration order is not mandatory
* member-access: is the way the member can be access. If the member is public it can be the name itself. if not it could be a getter function that should be followed by braces. Note that getter can (and probably should) be const methods.
* attributes: Attributes define by square brackets. Currently are use to mark a version in which a specific member was added [ [ version version-number] ] would mark that the specific member was added in the given version number.
###template
`template < parameter-list > class-declaration`
* parameter-list - a non-empty comma-separated list of the template parameters.
* class-decleration - (See class section) The class name declared become a template name.
##IDL example
Forward slashes comments are ignored until the end of the line.
```
namespace utils {
// An example of a stub class
class UUID stub {
int64_t most_sig_bits;
int64_t least_sig_bits;
}
}
namespace gms {
//an enum example
enum class application_state:int {STATUS = 0,
LOAD,
SCHEMA,
DC};
// example of final class
class versioned_value final {
// getter and setter as public member
int version;
sstring value;
}
class heart_beat_state {
//getter as function
int32_t get_generation();
//default value example
int32_t get_heart_beat_version() = 1;
}
class endpoint_state {
heart_beat_state get_heart_beat_state();
std::map<application_state, versioned_value> get_application_state_map();
}
class gossip_digest {
inet_address get_endpoint();
int32_t get_generation();
//mark that a field was added on a specific version
int32_t get_max_version() [ [version 0.14.2] ];
}
class gossip_digest_ack {
std::vector<gossip_digest> digests();
std::map<inet_address, gms::endpoint_state> get_endpoint_state_map();
}
}
```

76
ORIGIN
View File

@@ -1 +1,77 @@
http://git-wip-us.apache.org/repos/asf/cassandra.git trunk (bf599fb5b062cbcc652da78b7d699e7a01b949ad)
import = bf599fb5b062cbcc652da78b7d699e7a01b949ad
Y = Already in scylla
$ git log --oneline import..cassandra-2.1.11 -- gms/
Y 484e645 Mark node as dead even if already left
d0c166f Add trampled commit back
ba5837e Merge branch 'cassandra-2.0' into cassandra-2.1
718e47f Forgot a damn c/r
a7282e4 Merge branch 'cassandra-2.0' into cassandra-2.1
Y ae4cd69 Print versions for gossip states in gossipinfo.
Y 7fba3d2 Don't mark nodes down before the max local pause interval once paused.
c2142e6 Merge branch 'cassandra-2.0' into cassandra-2.1
ba9a69e checkForEndpointCollision fails for legitimate collisions, finalized list of statuses and nits, CASSANDRA-9765
54470a2 checkForEndpointCollision fails for legitimate collisions, improved version after CR, CASSANDRA-9765
2c9b490 checkForEndpointCollision fails for legitimate collisions, CASSANDRA-9765
4c15970 Merge branch 'cassandra-2.0' into cassandra-2.1
ad8047a ArrivalWindow should use primitives
Y 4012134 Failure detector detects and ignores local pauses
9bcdd0f Merge branch 'cassandra-2.0' into cassandra-2.1
cefaa4e Close incoming connections when MessagingService is stopped
ea1beda Merge branch 'cassandra-2.0' into cassandra-2.1
08dbbd6 Ignore gossip SYNs after shutdown
3c17ac6 Merge branch 'cassandra-2.0' into cassandra-2.1
a64bc43 lists work better when you initialize them
543a899 change list to arraylist
730d4d4 Merge branch 'cassandra-2.0' into cassandra-2.1
e3e2de0 change list to arraylist
f7884c5 Merge branch 'cassandra-2.0' into cassandra-2.1
Y 84b2846 remove redundant state
4f2c372 Merge branch 'cassandra-2.0' into cassandra-2.1
Y b2c62bb Add shutdown gossip state to prevent timeouts during rolling restarts
Y def4835 Add missing follow on fix for 7816 only applied to cassandra-2.1 branch in 763130bdbde2f4cec2e8973bcd5203caf51cc89f
Y 763130b Followup commit for 7816
1376b8e Merge branch 'cassandra-2.0' into cassandra-2.1
Y 2199a87 Fix duplicate up/down messages sent to native clients
136042e Merge branch 'cassandra-2.0' into cassandra-2.1
Y eb9c5bb Improve FD logging when the arrival time is ignored.
$ git log --oneline import..cassandra-2.1.11 -- service/StorageService.java
92c5787 Keep StorageServiceMBean interface stable
6039d0e Fix DC and Rack in nodetool info
a2f0da0 Merge branch 'cassandra-2.0' into cassandra-2.1
c4de752 Follow-up to CASSANDRA-10238
e889ee4 2i key cache load fails
4b1d59e Merge branch 'cassandra-2.0' into cassandra-2.1
257cdaa Fix consolidating racks violating the RF contract
Y 27754c0 refuse to decomission if not in state NORMAL patch by Jan Karlsson and Stefania for CASSANDRA-8741
Y 5bc56c3 refuse to decomission if not in state NORMAL patch by Jan Karlsson and Stefania for CASSANDRA-8741
Y 8f9ca07 Cannot replace token does not exist - DN node removed as Fat Client
c2142e6 Merge branch 'cassandra-2.0' into cassandra-2.1
54470a2 checkForEndpointCollision fails for legitimate collisions, improved version after CR, CASSANDRA-9765
1eccced Handle corrupt files on startup
2c9b490 checkForEndpointCollision fails for legitimate collisions, CASSANDRA-9765
c4b5260 Merge branch 'cassandra-2.0' into cassandra-2.1
Y 52dbc3f Can't transition from write survey to normal mode
9966419 Make rebuild only run one at a time
d693ca1 Merge branch 'cassandra-2.0' into cassandra-2.1
be9eff5 Add option to not validate atoms during scrub
2a4daaf followup fix for 8564
93478ab Wait for anticompaction to finish
9e9846e Fix for harmless exceptions being logged as ERROR
6d06f32 Fix anticompaction blocking ANTI_ENTROPY stage
4f2c372 Merge branch 'cassandra-2.0' into cassandra-2.1
Y b2c62bb Add shutdown gossip state to prevent timeouts during rolling restarts
Y cba1b68 Fix failed bootstrap/replace attempts being persisted in system.peers
f59df28 Allow takeColumnFamilySnapshot to take a list of tables patch by Sachin Jarin; reviewed by Nick Bailey for CASSANDRA-8348
Y ac46747 Fix failed bootstrap/replace attempts being persisted in system.peers
5abab57 Merge branch 'cassandra-2.0' into cassandra-2.1
0ff9c3c Allow reusing snapshot tags across different column families.
f9c57a5 Merge branch 'cassandra-2.0' into cassandra-2.1
Y b296c55 Fix MOVED_NODE client event
bbb3fc7 Merge branch 'cassandra-2.0' into cassandra-2.1
37eb2a0 Fix NPE in nodetool getendpoints with bad ks/cf
f8b43d4 Merge branch 'cassandra-2.0' into cassandra-2.1
e20810c Remove C* specific class from JMX API

View File

@@ -11,11 +11,35 @@ git submodule init
git submodule update --recursive
```
### Building scylla on Fedora
Installing required packages:
### Building and Running Scylla on Fedora
* Installing required packages:
```
sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan
sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel numactl-devel hwloc-devel libpciaccess-devel libxml2-devel python3-pyparsing
```
* Build Scylla
```
./configure.py --mode=release --with=scylla --disable-xen
ninja-build build/release/scylla -j2 # you can use more cpus if you have tons of RAM
```
* Run Scylla
```
./build/release/scylla
```
* run Scylla with one CPU and ./tmp as data directory
```
./build/release/scylla --datadir tmp --commitlog-directory tmp --smp 1
```
* For more run options:
```
./build/release/scylla --help
```
## Building Fedora RPM
@@ -56,5 +80,17 @@ docker build -t <image-name> .
Run the image with:
```
docker run -i -t <image name>
docker run -p $(hostname -i):9042:9042 -i -t <image name>
```
## Contributing to Scylla
Do not send pull requests.
Send patches to the mailing list address scylladb-dev@googlegroups.com.
Be sure to subscribe.
In order for your patches to be merged, you must sign the Contributor's
License Agreement, protecting your rights and ours. See
http://www.scylladb.com/opensource/cla/.

View File

@@ -1,6 +1,6 @@
#!/bin/sh
VERSION=0.10
VERSION=1.1.4
if test -f version
then

View File

@@ -579,30 +579,6 @@
}
]
},
{
"path":"/column_family/sstables/snapshots_size/{name}",
"operations":[
{
"method":"GET",
"summary":"the size of SSTables in 'snapshots' subdirectory which aren't live anymore",
"type":"double",
"nickname":"true_snapshots_size",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
{
"path":"/column_family/metrics/memtable_columns_count/{name}",
"operations":[
@@ -2041,7 +2017,7 @@
]
},
{
"path":"/column_family/metrics/true_snapshots_size/{name}",
"path":"/column_family/metrics/snapshots_size/{name}",
"operations":[
{
"method":"GET",

View File

@@ -15,7 +15,7 @@
"summary":"get List of running compactions",
"type":"array",
"items":{
"type":"jsonmap"
"type":"summary"
},
"nickname":"get_compactions",
"produces":[
@@ -27,16 +27,35 @@
]
},
{
"path":"/compaction_manager/compaction_summary",
"path":"/compaction_manager/compaction_history",
"operations":[
{
"method":"GET",
"summary":"get compaction summary",
"summary":"get List of the compaction history",
"type":"array",
"items":{
"type":"string"
"type":"history"
},
"nickname":"get_compaction_summary",
"nickname":"get_compaction_history",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/compaction_manager/compaction_info",
"operations":[
{
"method":"GET",
"summary":"get a list of all active compaction info",
"type":"array",
"items":{
"type":"compaction_info"
},
"nickname":"get_compaction_info",
"produces":[
"application/json"
],
@@ -87,7 +106,7 @@
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"string"
"paramType":"query"
}
]
}
@@ -155,32 +174,112 @@
}
],
"models":{
"mapper":{
"id":"mapper",
"description":"A key value mapping",
"row_merged":{
"id":"row_merged",
"description":"A row merged information",
"properties":{
"key":{
"type":"string",
"description":"The key"
"type":"int",
"description":"The number of sstable"
},
"value":{
"type":"string",
"description":"The value"
"type":"long",
"description":"The number or row compacted"
}
}
},
"jsonmap":{
"id":"jsonmap",
"description":"A json representation of a map as a list of key value",
"compaction_info" :{
"id": "compaction_info",
"description":"A key value mapping",
"properties":{
"operation_type":{
"type":"string",
"description":"The operation type"
},
"completed":{
"type":"long",
"description":"The current completed"
},
"total":{
"type":"long",
"description":"The total to compact"
},
"unit":{
"type":"string",
"description":"The compacted unit"
}
}
},
"summary":{
"id":"summary",
"description":"A compaction summary object",
"properties":{
"value":{
"type":"array",
"items":{
"type":"mapper"
},
"description":"A list of key, value mapping"
"id":{
"type":"string",
"description":"The UUID"
},
"ks":{
"type":"string",
"description":"The keyspace name"
},
"cf":{
"type":"string",
"description":"The column family name"
},
"completed":{
"type":"long",
"description":"The number of units completed"
},
"total":{
"type":"long",
"description":"The total number of units"
},
"task_type":{
"type":"string",
"description":"The task compaction type"
},
"unit":{
"type":"string",
"description":"The units being used"
}
}
},
"history": {
"id":"history",
"description":"Compaction history information",
"properties":{
"id":{
"type":"string",
"description":"The UUID"
},
"cf":{
"type":"string",
"description":"The column family name"
},
"ks":{
"type":"string",
"description":"The keyspace name"
},
"compacted_at":{
"type":"long",
"description":"The time of compaction"
},
"bytes_in":{
"type":"long",
"description":"Bytes in"
},
"bytes_out":{
"type":"long",
"description":"Bytes out"
},
"rows_merged":{
"type":"array",
"items":{
"type":"row_merged"
},
"description":"The merged rows"
}
}
}
}
}
}

View File

@@ -48,7 +48,10 @@
{
"method":"GET",
"summary":"Get all endpoint states",
"type":"string",
"type":"array",
"items":{
"type":"endpoint_state"
},
"nickname":"get_all_endpoint_states",
"produces":[
"application/json"
@@ -148,6 +151,57 @@
"description": "The value"
}
}
},
"endpoint_state": {
"id": "states",
"description": "Holds an endpoint state",
"properties": {
"addrs": {
"type": "string",
"description": "The endpoint address"
},
"generation": {
"type": "int",
"description": "The heart beat generation"
},
"version": {
"type": "int",
"description": "The heart beat version"
},
"update_time": {
"type": "long",
"description": "The update timestamp"
},
"is_alive": {
"type": "boolean",
"description": "Is the endpoint alive"
},
"application_state" : {
"type":"array",
"items":{
"type":"version_value"
},
"description": "Is the endpoint alive"
}
}
},
"version_value": {
"id": "version_value",
"description": "Holds a version value for an application state",
"properties": {
"application_state": {
"type": "int",
"description": "The application state enum index"
},
"value": {
"type": "string",
"description": "The version value"
},
"version": {
"type": "int",
"description": "The application state version"
}
}
}
}
}

View File

@@ -8,13 +8,16 @@
],
"apis":[
{
"path":"/messaging_service/totaltimeouts",
"path":"/messaging_service/messages/timeout",
"operations":[
{
"method":"GET",
"summary":"Total number of timeouts happened on this node",
"type":"long",
"nickname":"get_totaltimeouts",
"summary":"Get the number of timeout messages",
"type":"array",
"items":{
"type":"message_counter"
},
"nickname":"get_timeout_messages",
"produces":[
"application/json"
],
@@ -25,7 +28,7 @@
]
},
{
"path":"/messaging_service/messages/dropped",
"path":"/messaging_service/messages/dropped_by_ver",
"operations":[
{
"method":"GET",
@@ -34,6 +37,25 @@
"items":{
"type":"verb_counter"
},
"nickname":"get_dropped_messages_by_ver",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/messaging_service/messages/dropped",
"operations":[
{
"method":"GET",
"summary":"Get the number of messages that were dropped before sending",
"type":"array",
"items":{
"type":"message_counter"
},
"nickname":"get_dropped_messages",
"produces":[
"application/json"
@@ -143,6 +165,49 @@
]
}
]
},
{
"path":"/messaging_service/messages/respond_completed",
"operations":[
{
"method":"GET",
"summary":"Get the number of completed respond messages",
"type":"array",
"items":{
"type":"message_counter"
},
"nickname":"get_respond_completed_messages",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/messaging_service/version",
"operations":[
{
"method":"GET",
"summary":"Get the version number",
"type":"int",
"nickname":"get_version",
"produces":[
"application/json"
],
"parameters":[
{
"name":"addr",
"description":"Address",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
}
],
"models":{
@@ -150,10 +215,10 @@
"id":"message_counter",
"description":"Holds command counters",
"properties":{
"count":{
"value":{
"type":"long"
},
"ip":{
"key":{
"type":"string"
}
}
@@ -168,46 +233,27 @@
"verb":{
"type":"string",
"enum":[
"MUTATION",
"BINARY",
"READ_REPAIR",
"READ",
"REQUEST_RESPONSE",
"STREAM_INITIATE",
"STREAM_INITIATE_DONE",
"STREAM_REPLY",
"STREAM_REQUEST",
"RANGE_SLICE",
"BOOTSTRAP_TOKEN",
"TREE_REQUEST",
"TREE_RESPONSE",
"JOIN",
"GOSSIP_DIGEST_SYN",
"GOSSIP_DIGEST_ACK",
"GOSSIP_DIGEST_ACK2",
"DEFINITIONS_ANNOUNCE",
"DEFINITIONS_UPDATE",
"TRUNCATE",
"SCHEMA_CHECK",
"INDEX_SCAN",
"REPLICATION_FINISHED",
"INTERNAL_RESPONSE",
"COUNTER_MUTATION",
"STREAMING_REPAIR_REQUEST",
"STREAMING_REPAIR_RESPONSE",
"SNAPSHOT",
"MIGRATION_REQUEST",
"GOSSIP_SHUTDOWN",
"_TRACE",
"ECHO",
"REPAIR_MESSAGE",
"PAXOS_PREPARE",
"PAXOS_PROPOSE",
"PAXOS_COMMIT",
"PAGED_RANGE",
"UNUSED_1",
"UNUSED_2",
"UNUSED_3"
"CLIENT_ID",
"MUTATION",
"MUTATION_DONE",
"READ_DATA",
"READ_MUTATION_DATA",
"READ_DIGEST",
"GOSSIP_ECHO",
"GOSSIP_DIGEST_SYN",
"GOSSIP_DIGEST_ACK2",
"GOSSIP_SHUTDOWN",
"DEFINITIONS_UPDATE",
"TRUNCATE",
"REPLICATION_FINISHED",
"MIGRATION_REQUEST",
"PREPARE_MESSAGE",
"PREPARE_DONE_MESSAGE",
"STREAM_MUTATION",
"STREAM_MUTATION_DONE",
"COMPLETE_MESSAGE",
"REPAIR_CHECKSUM_RANGE",
"GET_SCHEMA_VERSION"
]
}
}

View File

@@ -193,8 +193,8 @@
"operations":[
{
"method":"GET",
"summary":"Get the RPC timeout",
"type":"long",
"summary":"Get the RPC timeout in seconds",
"type":"double",
"nickname":"get_rpc_timeout",
"produces":[
"application/json"
@@ -214,10 +214,10 @@
"parameters":[
{
"name":"timeout",
"description":"Timeout in millis",
"description":"Timeout in seconds",
"required":true,
"allowMultiple":false,
"type":"long",
"type":"double",
"paramType":"query"
}
]
@@ -229,8 +229,8 @@
"operations":[
{
"method":"GET",
"summary":"Get the read RPC timeout",
"type":"long",
"summary":"Get the read RPC timeout in seconds",
"type":"double",
"nickname":"get_read_rpc_timeout",
"produces":[
"application/json"
@@ -250,10 +250,10 @@
"parameters":[
{
"name":"timeout",
"description":"timeout_in_millis",
"description":"The timeout in second",
"required":true,
"allowMultiple":false,
"type":"long",
"type":"double",
"paramType":"query"
}
]
@@ -265,8 +265,8 @@
"operations":[
{
"method":"GET",
"summary":"Get the write RPC timeout",
"type":"long",
"summary":"Get the write RPC timeout in seconds",
"type":"double",
"nickname":"get_write_rpc_timeout",
"produces":[
"application/json"
@@ -286,10 +286,10 @@
"parameters":[
{
"name":"timeout",
"description":"timeout in millisecond",
"description":"timeout in seconds",
"required":true,
"allowMultiple":false,
"type":"long",
"type":"double",
"paramType":"query"
}
]
@@ -301,8 +301,8 @@
"operations":[
{
"method":"GET",
"summary":"Get counter write rpc timeout",
"type":"long",
"summary":"Get counter write rpc timeout in seconds",
"type":"double",
"nickname":"get_counter_write_rpc_timeout",
"produces":[
"application/json"
@@ -322,10 +322,10 @@
"parameters":[
{
"name":"timeout",
"description":"timeout in millisecond",
"description":"timeout in seconds",
"required":true,
"allowMultiple":false,
"type":"long",
"type":"double",
"paramType":"query"
}
]
@@ -337,8 +337,8 @@
"operations":[
{
"method":"GET",
"summary":"Get CAS contention timeout",
"type":"long",
"summary":"Get CAS contention timeout in seconds",
"type":"double",
"nickname":"get_cas_contention_timeout",
"produces":[
"application/json"
@@ -358,10 +358,10 @@
"parameters":[
{
"name":"timeout",
"description":"timeout in millisecond",
"description":"timeout in second",
"required":true,
"allowMultiple":false,
"type":"long",
"type":"double",
"paramType":"query"
}
]
@@ -373,8 +373,8 @@
"operations":[
{
"method":"GET",
"summary":"Get range rpc timeout",
"type":"long",
"summary":"Get range rpc timeout in seconds",
"type":"double",
"nickname":"get_range_rpc_timeout",
"produces":[
"application/json"
@@ -394,10 +394,10 @@
"parameters":[
{
"name":"timeout",
"description":"timeout in millisecond",
"description":"timeout in second",
"required":true,
"allowMultiple":false,
"type":"long",
"type":"double",
"paramType":"query"
}
]
@@ -409,8 +409,8 @@
"operations":[
{
"method":"GET",
"summary":"Get truncate rpc timeout",
"type":"long",
"summary":"Get truncate rpc timeout in seconds",
"type":"double",
"nickname":"get_truncate_rpc_timeout",
"produces":[
"application/json"
@@ -430,10 +430,10 @@
"parameters":[
{
"name":"timeout",
"description":"timeout in millisecond",
"description":"timeout in second",
"required":true,
"allowMultiple":false,
"type":"long",
"type":"double",
"paramType":"query"
}
]
@@ -717,7 +717,7 @@
]
},
{
"path": "/storage_proxy/metrics/read/latency/histogram",
"path": "/storage_proxy/metrics/read/histogram",
"operations": [
{
"method": "GET",
@@ -732,7 +732,7 @@
]
},
{
"path": "/storage_proxy/metrics/range/latency/histogram",
"path": "/storage_proxy/metrics/range/histogram",
"operations": [
{
"method": "GET",
@@ -807,7 +807,7 @@
]
},
{
"path": "/storage_proxy/metrics/write/latency/histogram",
"path": "/storage_proxy/metrics/write/histogram",
"operations": [
{
"method": "GET",
@@ -820,7 +820,103 @@
"parameters": []
}
]
}
},
{
"path":"/storage_proxy/metrics/read/estimated_histogram/",
"operations":[
{
"method":"GET",
"summary":"Get read estimated latency",
"$ref":"#/utils/estimated_histogram",
"nickname":"get_read_estimated_histogram",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_proxy/metrics/read",
"operations":[
{
"method":"GET",
"summary":"Get read latency",
"type":"int",
"nickname":"get_read_latency",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_proxy/metrics/write/estimated_histogram/",
"operations":[
{
"method":"GET",
"summary":"Get write estimated latency",
"$ref":"#/utils/estimated_histogram",
"nickname":"get_write_estimated_histogram",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_proxy/metrics/write",
"operations":[
{
"method":"GET",
"summary":"Get write latency",
"type":"int",
"nickname":"get_write_latency",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_proxy/metrics/range/estimated_histogram/",
"operations":[
{
"method":"GET",
"summary":"Get range estimated latency",
"$ref":"#/utils/estimated_histogram",
"nickname":"get_range_estimated_histogram",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_proxy/metrics/range",
"operations":[
{
"method":"GET",
"summary":"Get range latency",
"type":"int",
"nickname":"get_range_latency",
"produces":[
"application/json"
],
"parameters":[
]
}
]
}
],
"models":{
"mapper_list":{

View File

@@ -290,6 +290,25 @@
}
]
},
{
"path":"/storage_service/describe_ring/",
"operations":[
{
"method":"GET",
"summary":"The TokenRange for a any keyspace",
"type":"array",
"items":{
"type":"token_range"
},
"nickname":"describe_any_ring",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_service/describe_ring/{keyspace}",
"operations":[
@@ -298,9 +317,9 @@
"summary":"The TokenRange for a given keyspace",
"type":"array",
"items":{
"type":"string"
"type":"token_range"
},
"nickname":"describe_ring_jmx",
"nickname":"describe_ring",
"produces":[
"application/json"
],
@@ -311,7 +330,7 @@
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
"paramType":"path"
}
]
}
@@ -406,7 +425,7 @@
"summary":"load value. Keys are IP addresses",
"type":"array",
"items":{
"type":"mapper"
"type":"map_string_double"
},
"nickname":"get_load_map",
"produces":[
@@ -609,7 +628,7 @@
"path":"/storage_service/keyspace_cleanup/{keyspace}",
"operations":[
{
"method":"GET",
"method":"POST",
"summary":"Trigger a cleanup of keys on a single keyspace",
"type":"int",
"nickname":"force_keyspace_cleanup",
@@ -778,8 +797,88 @@
"paramType":"path"
},
{
"name":"options",
"description":"Options for the repair",
"name":"primaryRange",
"description":"If the value is the string 'true' with any capitalization, repair only the first range returned by the partitioner.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"parallelism",
"description":"Repair parallelism, can be 0 (sequential), 1 (parallel) or 2 (datacenter-aware).",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"incremental",
"description":"If the value is the string 'true' with any capitalization, perform incremental repair.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"jobThreads",
"description":"An integer specifying the parallelism on each node.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"ranges",
"description":"An explicit list of ranges to repair, overriding the default choice. Each range is expressed as token1:token2, and multiple ranges can be given as a comma separated list.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"startToken",
"description":"Token on which to begin repair",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"endToken",
"description":"Token on which to end repair",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"columnFamilies",
"description":"Which column families to repair in the given keyspace. Multiple columns families can be named separated by commas. If this option is missing, all column families in the keyspace are repaired.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"dataCenters",
"description":"Which data centers are to participate in this repair. Multiple data centers can be listed separated by commas.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"hosts",
"description":"Which hosts are to participate in this repair. Multiple hosts can be listed separated by commas.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"trace",
"description":"If the value is the string 'true' with any capitalization, enable tracing of the repair.",
"required":false,
"allowMultiple":false,
"type":"string",
@@ -890,8 +989,8 @@
],
"parameters":[
{
"name":"token",
"description":"The token to remove",
"name":"host_id",
"description":"Remove the node with host_id from the cluster",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -1945,6 +2044,20 @@
}
}
},
"map_string_double":{
"id":"map_string_double",
"description":"A key value mapping between a string and a double",
"properties":{
"key":{
"type":"string",
"description":"The key"
},
"value":{
"type":"double",
"description":"The value"
}
}
},
"maplist_mapper":{
"id":"maplist_mapper",
"description":"A key value mapping, where key and value are list",
@@ -1969,9 +2082,9 @@
"id":"snapshot",
"description":"Snapshot detail",
"properties":{
"key":{
"ks":{
"type":"string",
"description":"The key snapshot key"
"description":"The key space snapshot key"
},
"cf":{
"type":"string",
@@ -1993,7 +2106,7 @@
"properties":{
"key":{
"type":"string",
"description":"The keyspace"
"description":"The snapshot key"
},
"value":{
"type":"array",
@@ -2003,6 +2116,59 @@
"description":"The column family"
}
}
},
"endpoint_detail":{
"id":"endpoint_detail",
"description":"Endpoint detail",
"properties":{
"host":{
"type":"string",
"description":"The endpoint host"
},
"datacenter":{
"type":"string",
"description":"The endpoint datacenter"
},
"rack":{
"type":"string",
"description":"The endpoint rack"
}
}
},
"token_range":{
"id":"token_range",
"description":"Endpoint range information",
"properties":{
"start_token":{
"type":"string",
"description":"The range start token"
},
"end_token":{
"type":"string",
"description":"The range start token"
},
"endpoints":{
"type":"array",
"items":{
"type":"string"
},
"description":"The endpoints"
},
"rpc_endpoints":{
"type":"array",
"items":{
"type":"string"
},
"description":"The rpc endpoints"
},
"endpoint_details":{
"type":"array",
"items":{
"type":"endpoint_detail"
},
"description":"The endpoint details"
}
}
}
}
}

View File

@@ -25,6 +25,102 @@
]
}
]
},
{
"path":"/stream_manager/metrics/outbound",
"operations":[
{
"method":"GET",
"summary":"Get number of active outbound streams",
"type":"int",
"nickname":"get_all_active_streams_outbound",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/stream_manager/metrics/incoming/{peer}",
"operations":[
{
"method":"GET",
"summary":"Get total incoming bytes",
"type":"int",
"nickname":"get_total_incoming_bytes",
"produces":[
"application/json"
],
"parameters":[
{
"name":"peer",
"description":"The stream peer",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
{
"path":"/stream_manager/metrics/incoming",
"operations":[
{
"method":"GET",
"summary":"Get all total incoming bytes",
"type":"int",
"nickname":"get_all_total_incoming_bytes",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/stream_manager/metrics/outgoing/{peer}",
"operations":[
{
"method":"GET",
"summary":"Get total outgoing bytes",
"type":"int",
"nickname":"get_total_outgoing_bytes",
"produces":[
"application/json"
],
"parameters":[
{
"name":"peer",
"description":"The stream peer",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
{
"path":"/stream_manager/metrics/outgoing",
"operations":[
{
"method":"GET",
"summary":"Get all total outgoing bytes",
"type":"int",
"nickname":"get_all_total_outgoing_bytes",
"produces":[
"application/json"
],
"parameters":[
]
}
]
}
],
"models":{

114
api/api-doc/system.json Normal file
View File

@@ -0,0 +1,114 @@
{
"apiVersion":"0.0.1",
"swaggerVersion":"1.2",
"basePath":"{{Protocol}}://{{Host}}",
"resourcePath":"/system",
"produces":[
"application/json"
],
"apis":[
{
"path":"/system/logger",
"operations":[
{
"method":"GET",
"summary":"Get all logger names",
"type":"array",
"items":{
"type":"string"
},
"nickname":"get_all_logger_names",
"produces":[
"application/json"
],
"parameters":[
]
},
{
"method":"POST",
"summary":"Set all logger level",
"type":"void",
"nickname":"set_all_logger_level",
"produces":[
"application/json"
],
"parameters":[
{
"name":"level",
"description":"The new log level",
"required":true,
"allowMultiple":false,
"type":"string",
"enum":[
"error",
"warn",
"info",
"debug",
"trace"
],
"paramType":"query"
}
]
}
]
},
{
"path":"/system/logger/{name}",
"operations":[
{
"method":"GET",
"summary":"Get logger level",
"type":"string",
"nickname":"get_logger_level",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The logger to query about",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
},
{
"method":"POST",
"summary":"Set logger level",
"type":"void",
"nickname":"set_logger_level",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The logger to query about",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"level",
"description":"The new log level",
"required":true,
"allowMultiple":false,
"type":"string",
"enum":[
"error",
"warn",
"info",
"debug",
"trace"
],
"paramType":"query"
}
]
}
]
}
]
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright 2015 ScyllaDB
*/
/*
@@ -38,6 +38,7 @@
#include "hinted_handoff.hh"
#include "http/exception.hh"
#include "stream_manager.hh"
#include "system.hh"
namespace api {
@@ -51,63 +52,98 @@ static std::unique_ptr<reply> exception_reply(std::exception_ptr eptr) {
return std::make_unique<reply>();
}
future<> set_server(http_context& ctx) {
future<> set_server_init(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx](routes& r) {
r.register_exeption_handler(exception_reply);
httpd::directory_handler* dir = new httpd::directory_handler(ctx.api_dir,
new content_replace("html"));
r.put(GET, "/ui", new httpd::file_handler(ctx.api_dir + "/index.html",
new content_replace("html")));
r.add(GET, url("/ui").remainder("path"), dir);
r.add(GET, url("/ui").remainder("path"), new httpd::directory_handler(ctx.api_dir,
new content_replace("html")));
rb->set_api_doc(r);
rb->register_function(r, "storage_service",
"The storage service API");
set_storage_service(ctx,r);
rb->register_function(r, "commitlog",
"The commit log API");
set_commitlog(ctx,r);
rb->register_function(r, "gossiper",
"The gossiper API");
set_gossiper(ctx,r);
rb->register_function(r, "column_family",
"The column family API");
set_column_family(ctx, r);
rb->register_function(r, "system",
"The system related API");
set_system(ctx, r);
});
}
static future<> register_api(http_context& ctx, const sstring& api_name,
const sstring api_desc,
std::function<void(http_context& ctx, routes& r)> f) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx, api_name, api_desc, f](routes& r) {
rb->register_function(r, api_name, api_desc);
f(ctx,r);
});
}
future<> set_server_storage_service(http_context& ctx) {
return register_api(ctx, "storage_service", "The storage service API", set_storage_service);
}
future<> set_server_gossip(http_context& ctx) {
return register_api(ctx, "gossiper",
"The gossiper API", set_gossiper);
}
future<> set_server_load_sstable(http_context& ctx) {
return register_api(ctx, "column_family",
"The column family API", set_column_family);
}
future<> set_server_messaging_service(http_context& ctx) {
return register_api(ctx, "messaging_service",
"The messaging service API", set_messaging_service);
}
future<> set_server_storage_proxy(http_context& ctx) {
return register_api(ctx, "storage_proxy",
"The storage proxy API", set_storage_proxy);
}
future<> set_server_stream_manager(http_context& ctx) {
return register_api(ctx, "stream_manager",
"The stream manager API", set_stream_manager);
}
future<> set_server_gossip_settle(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx](routes& r) {
rb->register_function(r, "failure_detector",
"The failure detector API");
set_failure_detector(ctx,r);
rb->register_function(r, "cache_service",
"The cache service API");
set_cache_service(ctx,r);
rb->register_function(r, "endpoint_snitch_info",
"The endpoint snitch info API");
set_endpoint_snitch(ctx, r);
});
}
future<> set_server_done(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx](routes& r) {
rb->register_function(r, "compaction_manager",
"The Compaction manager API");
set_compaction_manager(ctx, r);
rb->register_function(r, "lsa", "Log-structured allocator API");
set_lsa(ctx, r);
rb->register_function(r, "failure_detector",
"The failure detector API");
set_failure_detector(ctx,r);
rb->register_function(r, "messaging_service",
"The messaging service API");
set_messaging_service(ctx, r);
rb->register_function(r, "storage_proxy",
"The storage proxy API");
set_storage_proxy(ctx, r);
rb->register_function(r, "cache_service",
"The cache service API");
set_cache_service(ctx,r);
rb->register_function(r, "commitlog",
"The commit log API");
set_commitlog(ctx,r);
rb->register_function(r, "hinted_handoff",
"The hinted handoff API");
set_hinted_handoff(ctx, r);
rb->register_function(r, "collectd",
"The collectd API");
set_collectd(ctx, r);
rb->register_function(r, "endpoint_snitch_info",
"The endpoint snitch info API");
set_endpoint_snitch(ctx, r);
rb->register_function(r, "compaction_manager",
"The Compaction manager API");
set_compaction_manager(ctx, r);
rb->register_function(r, "hinted_handoff",
"The hinted handoff API");
set_hinted_handoff(ctx, r);
rb->register_function(r, "stream_manager",
"The stream manager API");
set_stream_manager(ctx, r);
});
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright 2015 ScyllaDB
*/
/*
@@ -21,31 +21,17 @@
#pragma once
#include "http/httpd.hh"
#include "json/json_elements.hh"
#include "database.hh"
#include "service/storage_proxy.hh"
#include <boost/lexical_cast.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
#include "api/api-doc/utils.json.hh"
#include "utils/histogram.hh"
#include "http/exception.hh"
#include "api_init.hh"
namespace api {
struct http_context {
sstring api_dir;
sstring api_doc;
httpd::http_server_control http_server;
distributed<database>& db;
distributed<service::storage_proxy>& sp;
http_context(distributed<database>& _db, distributed<service::storage_proxy>&
_sp) : db(_db), sp(_sp) {}
};
future<> set_server(http_context& ctx);
template<class T>
std::vector<sstring> container_to_vec(const T& container) {
std::vector<sstring> res;
@@ -128,47 +114,54 @@ inline double pow2(double a) {
return a * a;
}
inline httpd::utils_json::histogram add_histogram(httpd::utils_json::histogram res,
// FIXME: Move to utils::ihistogram::operator+=()
inline utils::ihistogram add_histogram(utils::ihistogram res,
const utils::ihistogram& val) {
if (!res.count._set) {
res = val;
return res;
if (res.count == 0) {
return val;
}
if (val.count == 0) {
return res;
return std::move(res);
}
if (res.min() > val.min) {
if (res.min > val.min) {
res.min = val.min;
}
if (res.max() < val.max) {
if (res.max < val.max) {
res.max = val.max;
}
double ncount = res.count() + val.count;
double ncount = res.count + val.count;
// To get an estimated sum we take the estimated mean
// and multiply it by the true count
res.sum = res.sum() + val.mean * val.count;
double a = res.count()/ncount;
res.sum = res.sum + val.mean * val.count;
double a = res.count/ncount;
double b = val.count/ncount;
double mean = a * res.mean() + b * val.mean;
double mean = a * res.mean + b * val.mean;
res.variance = (res.variance() + pow2(res.mean() - mean) )* a +
res.variance = (res.variance + pow2(res.mean - mean) )* a +
(val.variance + pow2(val.mean -mean))* b;
res.mean = mean;
res.count = res.count() + val.count;
res.count = res.count + val.count;
for (auto i : val.sample) {
res.sample.push(i);
res.sample.push_back(i);
}
return res;
}
inline
httpd::utils_json::histogram to_json(const utils::ihistogram& val) {
httpd::utils_json::histogram h;
h = val;
return h;
}
template<class T, class F>
future<json::json_return_type> sum_histogram_stats(distributed<T>& d, utils::ihistogram F::*f) {
return d.map_reduce0([f](const T& p) {return p.get_stats().*f;}, httpd::utils_json::histogram(),
add_histogram).then([](const httpd::utils_json::histogram& val) {
return make_ready_future<json::json_return_type>(val);
return d.map_reduce0([f](const T& p) {return p.get_stats().*f;}, utils::ihistogram(),
add_histogram).then([](const utils::ihistogram& val) {
return make_ready_future<json::json_return_type>(to_json(val));
});
}

51
api/api_init.hh Normal file
View File

@@ -0,0 +1,51 @@
/*
* Copyright 2016 ScylaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "database.hh"
#include "service/storage_proxy.hh"
#include "http/httpd.hh"
namespace api {
struct http_context {
sstring api_dir;
sstring api_doc;
httpd::http_server_control http_server;
distributed<database>& db;
distributed<service::storage_proxy>& sp;
http_context(distributed<database>& _db,
distributed<service::storage_proxy>& _sp)
: db(_db), sp(_sp) {
}
};
future<> set_server_init(http_context& ctx);
future<> set_server_storage_service(http_context& ctx);
future<> set_server_gossip(http_context& ctx);
future<> set_server_load_sstable(http_context& ctx);
future<> set_server_messaging_service(http_context& ctx);
future<> set_server_storage_proxy(http_context& ctx);
future<> set_server_stream_manager(http_context& ctx);
future<> set_server_gossip_settle(http_context& ctx);
future<> set_server_done(http_context& ctx);
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -40,7 +40,7 @@ const utils::UUID& get_uuid(const sstring& name, const database& db) {
if (pos == sstring::npos) {
pos = name.find(":");
if (pos == sstring::npos) {
throw bad_param_exception("Column family name should be in keyspace::column_family format");
throw bad_param_exception("Column family name should be in keyspace:column_family format");
}
end = pos + 1;
} else {
@@ -64,21 +64,21 @@ future<> foreach_column_family(http_context& ctx, const sstring& name, function<
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
int64_t column_family::stats::*f) {
return map_reduce_cf(ctx, name, 0, [f](const column_family& cf) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return cf.get_stats().*f;
}, std::plus<int64_t>());
}
future<json::json_return_type> get_cf_stats(http_context& ctx,
int64_t column_family::stats::*f) {
return map_reduce_cf(ctx, 0, [f](const column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return cf.get_stats().*f;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_stats_count(http_context& ctx, const sstring& name,
utils::ihistogram column_family::stats::*f) {
return map_reduce_cf(ctx, name, 0, [f](const column_family& cf) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).count;
}, std::plus<int64_t>());
}
@@ -101,7 +101,7 @@ static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const
static future<json::json_return_type> get_cf_stats_count(http_context& ctx,
utils::ihistogram column_family::stats::*f) {
return map_reduce_cf(ctx, 0, [f](const column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).count;
}, std::plus<int64_t>());
}
@@ -110,28 +110,30 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, const
utils::ihistogram column_family::stats::*f) {
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const database& p) {return p.find_column_family(uuid).get_stats().*f;},
httpd::utils_json::histogram(),
utils::ihistogram(),
add_histogram)
.then([](const httpd::utils_json::histogram& val) {
return make_ready_future<json::json_return_type>(val);
.then([](const utils::ihistogram& val) {
return make_ready_future<json::json_return_type>(to_json(val));
});
}
static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::ihistogram column_family::stats::*f) {
std::function<httpd::utils_json::histogram(const database&)> fun = [f] (const database& db) {
httpd::utils_json::histogram res;
std::function<utils::ihistogram(const database&)> fun = [f] (const database& db) {
utils::ihistogram res;
for (auto i : db.get_column_families()) {
res = add_histogram(res, i.second->get_stats().*f);
}
return res;
};
return ctx.db.map(fun).then([](const std::vector<httpd::utils_json::histogram> &res) {
return make_ready_future<json::json_return_type>(res);
return ctx.db.map(fun).then([](const std::vector<utils::ihistogram> &res) {
std::vector<httpd::utils_json::histogram> r;
boost::copy(res | boost::adaptors::transformed(to_json), std::back_inserter(r));
return make_ready_future<json::json_return_type>(r);
});
}
static future<json::json_return_type> get_cf_unleveled_sstables(http_context& ctx, const sstring& name) {
return map_reduce_cf(ctx, name, 0, [](const column_family& cf) {
return map_reduce_cf(ctx, name, int64_t(0), [](const column_family& cf) {
return cf.get_unleveled_sstables();
}, std::plus<int64_t>());
}
@@ -171,6 +173,35 @@ static ratio_holder mean_row_size(column_family& cf) {
return res;
}
template <typename T>
class sum_ratio {
uint64_t _n = 0;
T _total = 0;
public:
future<> operator()(T value) {
if (value > 0) {
_total += value;
_n++;
}
return make_ready_future<>();
}
// Returns average value of all registered ratios.
T get() && {
return _n ? (_total / _n) : T(0);
}
};
static double get_compression_ratio(column_family& cf) {
sum_ratio<double> result;
for (auto i : *cf.get_sstables()) {
auto compression_ratio = i.second->get_compression_ratio();
if (compression_ratio != sstables::metadata_collector::NO_COMPRESSION_RATIO) {
result(compression_ratio);
}
}
return std::move(result).get();
}
void set_column_family(http_context& ctx, routes& r) {
cf::get_column_family_name.set(r, [&ctx] (const_req req){
vector<sstring> res;
@@ -221,25 +252,25 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
return cf.active_memtable().region().occupancy().total_space();
}, std::plus<int64_t>());
});
cf::get_all_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [](column_family& cf) {
return cf.active_memtable().region().occupancy().total_space();
}, std::plus<int64_t>());
});
cf::get_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
return cf.active_memtable().region().occupancy().used_space();
}, std::plus<int64_t>());
});
cf::get_all_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [](column_family& cf) {
return cf.active_memtable().region().occupancy().used_space();
}, std::plus<int64_t>());
});
@@ -254,7 +285,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_cf_all_memtables_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {
warn(unimplemented::cause::INDEXES);
return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
return cf.occupancy().total_space();
}, std::plus<int64_t>());
});
@@ -263,21 +294,21 @@ void set_column_family(http_context& ctx, routes& r) {
warn(unimplemented::cause::INDEXES);
return ctx.db.map_reduce0([](const database& db){
return db.dirty_memory_region_group().memory_used();
}, 0, std::plus<int64_t>()).then([](int res) {
}, int64_t(0), std::plus<int64_t>()).then([](int res) {
return make_ready_future<json::json_return_type>(res);
});
});
cf::get_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {
warn(unimplemented::cause::INDEXES);
return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
return cf.occupancy().used_space();
}, std::plus<int64_t>());
});
cf::get_all_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {
warn(unimplemented::cause::INDEXES);
return map_reduce_cf(ctx, 0, [](column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [](column_family& cf) {
return cf.active_memtable().region().occupancy().used_space();
}, std::plus<int64_t>());
});
@@ -302,7 +333,7 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_estimated_row_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
uint64_t res = 0;
for (auto i: *cf.get_sstables() ) {
res += i.second->get_stats_metadata().estimated_row_size.count();
@@ -370,7 +401,7 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::reads);
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::writes);
});
cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -422,11 +453,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], 0, max_row_size, max_int64);
return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_row_size, max_int64);
});
cf::get_all_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, max_row_size, max_int64);
return map_reduce_cf(ctx, int64_t(0), max_row_size, max_int64);
});
cf::get_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -537,20 +568,20 @@ void set_column_family(http_context& ctx, routes& r) {
}, std::plus<uint64_t>());
});
cf::get_index_summary_off_heap_memory_used.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME
// We are missing the off heap memory calculation
// Return 0 is the wrong value. It's a work around
// until the memory calculation will be available
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
cf::get_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst.second->get_summary().memory_footprint();
});
}, std::plus<uint64_t>());
});
cf::get_all_index_summary_off_heap_memory_used.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
cf::get_all_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst.second->get_summary().memory_footprint();
});
}, std::plus<uint64_t>());
});
cf::get_compression_metadata_off_heap_memory_used.set(r, [] (std::unique_ptr<request> req) {
@@ -589,11 +620,16 @@ void set_column_family(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cf::get_true_snapshots_size.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
cf::get_true_snapshots_size.set(r, [&ctx] (std::unique_ptr<request> req) {
auto uuid = get_uuid(req->param["name"], ctx.db.local());
return ctx.db.local().find_column_family(uuid).get_snapshot_details().then([](
const std::unordered_map<sstring, column_family::snapshot_details>& sd) {
int64_t res = 0;
for (auto i : sd) {
res += i.second.total;
}
return make_ready_future<json::json_return_type>(res);
});
});
cf::get_all_true_snapshots_size.set(r, [] (std::unique_ptr<request> req) {
@@ -615,30 +651,29 @@ void set_column_family(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cf::get_row_cache_hit.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
cf::get_row_cache_hit.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().hits;
}, std::plus<int64_t>());
});
cf::get_all_row_cache_hit.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
cf::get_all_row_cache_hit.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().hits;
}, std::plus<int64_t>());
});
cf::get_row_cache_miss.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
cf::get_row_cache_miss.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().misses;
}, std::plus<int64_t>());
});
cf::get_all_row_cache_miss.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
cf::get_all_row_cache_miss.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().misses;
}, std::plus<int64_t>());
});
cf::get_cas_prepare.set(r, [] (std::unique_ptr<request> req) {
@@ -662,41 +697,19 @@ void set_column_family(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cf::get_sstables_per_read_histogram.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
std::vector<double> res;
return make_ready_future<json::json_return_type>(res);
cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], sstables::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_sstable_per_read;
},
sstables::merge, utils_json::estimated_histogram());
});
cf::get_tombstone_scanned_histogram.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME
//auto id = get_uuid(req->param["name"], ctx.db.local());
httpd::utils_json::histogram res;
res.count = 0;
res.mean = 0;
res.max = 0;
res.min = 0;
res.sum = 0;
res.variance = 0;
return make_ready_future<json::json_return_type>(res);
cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::tombstone_scanned);
});
cf::get_live_scanned_histogram.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME
//auto id = get_uuid(req->param["name"], ctx.db.local());
//std::vector<double> res;
httpd::utils_json::histogram res;
res.count = 0;
res.mean = 0;
res.max = 0;
res.min = 0;
res.sum = 0;
res.variance = 0;
return make_ready_future<json::json_return_type>(res);
cf::get_live_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::live_scanned);
});
cf::get_col_update_time_delta_histogram.set(r, [] (std::unique_ptr<request> req) {
@@ -735,11 +748,15 @@ void set_column_family(http_context& ctx, routes& r) {
return std::vector<sstring>();
});
cf::get_compression_ratio.set(r, [](const_req) {
// FIXME
// Currently there are no compression information
// so we return 0 as the ratio
return 0;
cf::get_compression_ratio.set(r, [&ctx](std::unique_ptr<request> req) {
auto uuid = get_uuid(req->param["name"], ctx.db.local());
return ctx.db.map_reduce(sum_ratio<double>(), [uuid](database& db) {
column_family& cf = db.find_column_family(uuid);
return make_ready_future<double>(get_compression_ratio(cf));
}).then([] (const double& result) {
return make_ready_future<json::json_return_type>(result);
});
});
cf::get_read_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -21,16 +21,17 @@
#include "compaction_manager.hh"
#include "api/api-doc/compaction_manager.json.hh"
#include "db/system_keyspace.hh"
namespace api {
using namespace scollectd;
namespace cm = httpd::compaction_manager_json;
using namespace json;
static future<json::json_return_type> get_cm_stats(http_context& ctx,
int64_t compaction_manager::stats::*f) {
return ctx.db.map_reduce0([&](database& db) {
return ctx.db.map_reduce0([f](database& db) {
return db.get_compaction_manager().get_stats().*f;
}, int64_t(0), std::plus<int64_t>()).then([](const int64_t& res) {
return make_ready_future<json::json_return_type>(res);
@@ -38,30 +39,42 @@ static future<json::json_return_type> get_cm_stats(http_context& ctx,
}
void set_compaction_manager(http_context& ctx, routes& r) {
cm::get_compactions.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
std::vector<cm::jsonmap> map;
return make_ready_future<json::json_return_type>(map);
});
cm::get_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.db.map_reduce0([](database& db) {
std::vector<cm::summary> summaries;
const compaction_manager& cm = db.get_compaction_manager();
cm::get_compaction_summary.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
std::vector<sstring> res;
return make_ready_future<json::json_return_type>(res);
for (const auto& c : cm.get_compactions()) {
cm::summary s;
s.ks = c->ks;
s.cf = c->cf;
s.unit = "keys";
s.task_type = sstables::compaction_name(c->type);
s.completed = c->total_keys_written;
s.total = c->total_partitions;
summaries.push_back(std::move(s));
}
return summaries;
}, std::vector<cm::summary>(), concat<cm::summary>).then([](const std::vector<cm::summary>& res) {
return make_ready_future<json::json_return_type>(res);
});
});
cm::force_user_defined_compaction.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>("");
// FIXME
warn(unimplemented::cause::API);
return make_ready_future<json::json_return_type>(json_void());
});
cm::stop_compaction.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>("");
cm::stop_compaction.set(r, [&ctx] (std::unique_ptr<request> req) {
auto type = req->get_query_param("type");
return ctx.db.invoke_on_all([type] (database& db) {
auto& cm = db.get_compaction_manager();
cm.stop_compaction(type);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
cm::get_pending_tasks.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -81,11 +94,44 @@ void set_compaction_manager(http_context& ctx, routes& r) {
cm::get_bytes_compacted.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
// FIXME
warn(unimplemented::cause::API);
return make_ready_future<json::json_return_type>(0);
});
cm::get_compaction_history.set(r, [] (std::unique_ptr<request> req) {
return db::system_keyspace::get_compaction_history().then([] (std::vector<db::system_keyspace::compaction_history_entry> history) {
std::vector<cm::history> res;
res.reserve(history.size());
for (auto& entry : history) {
cm::history h;
h.id = entry.id.to_sstring();
h.ks = std::move(entry.ks);
h.cf = std::move(entry.cf);
h.compacted_at = entry.compacted_at;
h.bytes_in = entry.bytes_in;
h.bytes_out = entry.bytes_out;
for (auto it : entry.rows_merged) {
httpd::compaction_manager_json::row_merged e;
e.key = it.first;
e.value = it.second;
h.rows_merged.push(std::move(e));
}
res.push_back(std::move(h));
}
return make_ready_future<json::json_return_type>(res);
});
});
cm::get_compaction_info.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME
warn(unimplemented::cause::API);
std::vector<cm::compaction_info> res;
return make_ready_future<json::json_return_type>(res);
});
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -22,15 +22,34 @@
#include "failure_detector.hh"
#include "api/api-doc/failure_detector.json.hh"
#include "gms/failure_detector.hh"
#include "gms/application_state.hh"
#include "gms/gossiper.hh"
namespace api {
namespace fd = httpd::failure_detector_json;
void set_failure_detector(http_context& ctx, routes& r) {
fd::get_all_endpoint_states.set(r, [](std::unique_ptr<request> req) {
return gms::get_all_endpoint_states().then([](const sstring& str) {
return make_ready_future<json::json_return_type>(str);
});
std::vector<fd::endpoint_state> res;
for (auto i : gms::get_local_gossiper().endpoint_state_map) {
fd::endpoint_state val;
val.addrs = boost::lexical_cast<std::string>(i.first);
val.is_alive = i.second.is_alive();
val.generation = i.second.get_heart_beat_state().get_generation();
val.version = i.second.get_heart_beat_state().get_heart_beat_version();
val.update_time = i.second.get_update_timestamp().time_since_epoch().count();
for (auto a : i.second.get_application_state_map()) {
fd::version_value version_val;
// We return the enum index and not it's name to stay compatible to origin
// method that the state index are static but the name can be changed.
version_val.application_state = static_cast<std::underlying_type<gms::application_state>::type>(a.first);
version_val.value = a.second.value;
version_val.version = a.second.version;
val.application_state.push(version_val);
}
res.push_back(val);
}
return make_ready_future<json::json_return_type>(res);
});
fd::get_up_endpoint_count.set(r, [](std::unique_ptr<request> req) {

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -27,47 +27,43 @@ namespace api {
using namespace json;
void set_gossiper(http_context& ctx, routes& r) {
httpd::gossiper_json::get_down_endpoint.set(r, [](std::unique_ptr<request> req) {
return gms::get_unreachable_members().then([](std::set<gms::inet_address> res) {
return make_ready_future<json::json_return_type>(container_to_vec(res));
});
httpd::gossiper_json::get_down_endpoint.set(r, [] (const_req req) {
auto res = gms::get_local_gossiper().get_unreachable_members();
return container_to_vec(res);
});
httpd::gossiper_json::get_live_endpoint.set(r, [](std::unique_ptr<request> req) {
return gms::get_live_members().then([](std::set<gms::inet_address> res) {
return make_ready_future<json::json_return_type>(container_to_vec(res));
});
httpd::gossiper_json::get_live_endpoint.set(r, [] (const_req req) {
auto res = gms::get_local_gossiper().get_live_members();
return container_to_vec(res);
});
httpd::gossiper_json::get_endpoint_downtime.set(r, [](std::unique_ptr<request> req) {
httpd::gossiper_json::get_endpoint_downtime.set(r, [] (const_req req) {
gms::inet_address ep(req.param["addr"]);
return gms::get_local_gossiper().get_endpoint_downtime(ep);
});
httpd::gossiper_json::get_current_generation_number.set(r, [] (std::unique_ptr<request> req) {
gms::inet_address ep(req->param["addr"]);
return gms::get_endpoint_downtime(ep).then([](int64_t res) {
return gms::get_local_gossiper().get_current_generation_number(ep).then([] (int res) {
return make_ready_future<json::json_return_type>(res);
});
});
httpd::gossiper_json::get_current_generation_number.set(r, [](std::unique_ptr<request> req) {
httpd::gossiper_json::get_current_heart_beat_version.set(r, [] (std::unique_ptr<request> req) {
gms::inet_address ep(req->param["addr"]);
return gms::get_current_generation_number(ep).then([](int res) {
return make_ready_future<json::json_return_type>(res);
});
});
httpd::gossiper_json::get_current_heart_beat_version.set(r, [](std::unique_ptr<request> req) {
gms::inet_address ep(req->param["addr"]);
return gms::get_current_heart_beat_version(ep).then([](int res) {
return gms::get_local_gossiper().get_current_heart_beat_version(ep).then([] (int res) {
return make_ready_future<json::json_return_type>(res);
});
});
httpd::gossiper_json::assassinate_endpoint.set(r, [](std::unique_ptr<request> req) {
if (req->get_query_param("unsafe") != "True") {
return gms::assassinate_endpoint(req->param["addr"]).then([] {
return make_ready_future<json::json_return_type>(json_void());
return gms::get_local_gossiper().assassinate_endpoint(req->param["addr"]).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
}
return gms::unsafe_assassinate_endpoint(req->param["addr"]).then([] {
return make_ready_future<json::json_return_type>(json_void());
return gms::get_local_gossiper().unsafe_assassinate_endpoint(req->param["addr"]).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -32,17 +32,17 @@ using namespace net;
namespace api {
using shard_info = messaging_service::shard_info;
using shard_id = messaging_service::shard_id;
using msg_addr = messaging_service::msg_addr;
static const int32_t num_verb = static_cast<int32_t>(messaging_verb::UNUSED_3) + 1;
static const int32_t num_verb = static_cast<int32_t>(messaging_verb::LAST);
std::vector<message_counter> map_to_message_counters(
const std::unordered_map<gms::inet_address, unsigned long>& map) {
std::vector<message_counter> res;
for (auto i : map) {
res.push_back(message_counter());
res.back().ip = boost::lexical_cast<sstring>(i.first);
res.back().count = i.second;
res.back().key = boost::lexical_cast<sstring>(i.first);
res.back().value = i.second;
}
return res;
}
@@ -58,7 +58,7 @@ future_json_function get_client_getter(std::function<uint64_t(const shard_info&)
using map_type = std::unordered_map<gms::inet_address, uint64_t>;
auto get_shard_map = [f](messaging_service& ms) {
std::unordered_map<gms::inet_address, unsigned long> map;
ms.foreach_client([&map, f] (const shard_id& id, const shard_info& info) {
ms.foreach_client([&map, f] (const msg_addr& id, const shard_info& info) {
map[id.addr] = f(info);
});
return map;
@@ -70,12 +70,39 @@ future_json_function get_client_getter(std::function<uint64_t(const shard_info&)
};
}
future_json_function get_server_getter(std::function<uint64_t(const rpc::stats&)> f) {
return [f](std::unique_ptr<request> req) {
using map_type = std::unordered_map<gms::inet_address, uint64_t>;
auto get_shard_map = [f](messaging_service& ms) {
std::unordered_map<gms::inet_address, unsigned long> map;
ms.foreach_server_connection_stats([&map, f] (const rpc::client_info& info, const rpc::stats& stats) mutable {
map[gms::inet_address(net::ipv4_address(info.addr))] = f(stats);
});
return map;
};
return get_messaging_service().map_reduce0(get_shard_map, map_type(), map_sum<map_type>).
then([](map_type&& map) {
return make_ready_future<json::json_return_type>(map_to_message_counters(map));
});
};
}
void set_messaging_service(http_context& ctx, routes& r) {
get_timeout_messages.set(r, get_client_getter([](const shard_info& c) {
return c.get_stats().timeout;
}));
get_sent_messages.set(r, get_client_getter([](const shard_info& c) {
return c.get_stats().sent_messages;
}));
get_dropped_messages.set(r, get_client_getter([](const shard_info& c) {
// We don't have the same drop message mechanism
// as origin has.
// hence we can always return 0
return 0;
}));
get_exception_messages.set(r, get_client_getter([](const shard_info& c) {
return c.get_stats().exception_received;
}));
@@ -84,12 +111,20 @@ void set_messaging_service(http_context& ctx, routes& r) {
return c.get_stats().pending;
}));
get_respond_pending_messages.set(r, get_client_getter([](const shard_info& c) {
return c.get_stats().wait_reply;
get_respond_pending_messages.set(r, get_server_getter([](const rpc::stats& c) {
return c.pending;
}));
get_dropped_messages.set(r, [](std::unique_ptr<request> req) {
shared_ptr<std::vector<uint64_t>> map = make_shared<std::vector<uint64_t>>(num_verb, 0);
get_respond_completed_messages.set(r, get_server_getter([](const rpc::stats& c) {
return c.sent_messages;
}));
get_version.set(r, [](const_req req) {
return net::get_local_messaging_service().get_raw_version(req.get_query_param("addr"));
});
get_dropped_messages_by_ver.set(r, [](std::unique_ptr<request> req) {
shared_ptr<std::vector<uint64_t>> map = make_shared<std::vector<uint64_t>>(num_verb);
return net::get_messaging_service().map_reduce([map](const uint64_t* local_map) mutable {
for (auto i = 0; i < num_verb; i++) {
@@ -102,8 +137,12 @@ void set_messaging_service(http_context& ctx, routes& r) {
for (auto i : verb_counter::verb_wrapper::all_items()) {
verb_counter c;
messaging_verb v = i; // for type safety we use messaging_verb values
if ((*map)[static_cast<int32_t>(v)] > 0) {
c.count = (*map)[static_cast<int32_t>(v)];
auto idx = static_cast<uint32_t>(v);
if (idx >= map->size()) {
throw std::runtime_error(sprint("verb index out of bounds: %lu, map size: %lu", idx, map->size()));
}
if ((*map)[idx] > 0) {
c.count = (*map)[idx];
c.verb = i;
res.push_back(c);
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -23,6 +23,9 @@
#include "service/storage_proxy.hh"
#include "api/api-doc/storage_proxy.json.hh"
#include "api/api-doc/utils.json.hh"
#include "service/storage_service.hh"
#include "db/config.hh"
#include "utils/histogram.hh"
namespace api {
@@ -30,6 +33,23 @@ namespace sp = httpd::storage_proxy_json;
using proxy = service::storage_proxy;
using namespace json;
static future<json::json_return_type> sum_estimated_histogram(http_context& ctx, sstables::estimated_histogram proxy::stats::*f) {
return ctx.sp.map_reduce0([f](const proxy& p) {return p.get_stats().*f;}, sstables::estimated_histogram(),
sstables::merge).then([](const sstables::estimated_histogram& val) {
utils_json::estimated_histogram res;
res = val;
return make_ready_future<json::json_return_type>(res);
});
}
static future<json::json_return_type> total_latency(http_context& ctx, utils::ihistogram proxy::stats::*f) {
return ctx.sp.map_reduce0([f](const proxy& p) {return (p.get_stats().*f).mean * (p.get_stats().*f).count;}, 0.0,
std::plus<double>()).then([](double val) {
int64_t res = val;
return make_ready_future<json::json_return_type>(res);
});
}
void set_storage_proxy(http_context& ctx, routes& r) {
sp::get_total_hints.set(r, [](std::unique_ptr<request> req) {
//TBD
@@ -39,7 +59,9 @@ void set_storage_proxy(http_context& ctx, routes& r) {
sp::get_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
// FIXME
// hinted handoff is not supported currently,
// so we should return false
return make_ready_future<json::json_return_type>(false);
});
@@ -96,10 +118,8 @@ void set_storage_proxy(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
sp::get_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(1);
sp::get_rpc_timeout.set(r, [&ctx](const_req req) {
return ctx.db.local().get_config().request_timeout_in_ms()/1000.0;
});
sp::set_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
@@ -109,10 +129,8 @@ void set_storage_proxy(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
sp::get_read_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_read_rpc_timeout.set(r, [&ctx](const_req req) {
return ctx.db.local().get_config().read_request_timeout_in_ms()/1000.0;
});
sp::set_read_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
@@ -122,10 +140,8 @@ void set_storage_proxy(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
sp::get_write_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_write_rpc_timeout.set(r, [&ctx](const_req req) {
return ctx.db.local().get_config().write_request_timeout_in_ms()/1000.0;
});
sp::set_write_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
@@ -135,11 +151,10 @@ void set_storage_proxy(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
sp::get_counter_write_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_counter_write_rpc_timeout.set(r, [&ctx](const_req req) {
return ctx.db.local().get_config().counter_write_request_timeout_in_ms()/1000.0;
});
sp::set_counter_write_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
@@ -147,10 +162,8 @@ void set_storage_proxy(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
sp::get_cas_contention_timeout.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_cas_contention_timeout.set(r, [&ctx](const_req req) {
return ctx.db.local().get_config().cas_contention_timeout_in_ms()/1000.0;
});
sp::set_cas_contention_timeout.set(r, [](std::unique_ptr<request> req) {
@@ -160,10 +173,8 @@ void set_storage_proxy(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
sp::get_range_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_range_rpc_timeout.set(r, [&ctx](const_req req) {
return ctx.db.local().get_config().range_request_timeout_in_ms()/1000.0;
});
sp::set_range_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
@@ -173,10 +184,8 @@ void set_storage_proxy(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
sp::get_truncate_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_truncate_rpc_timeout.set(r, [&ctx](const_req req) {
return ctx.db.local().get_config().truncate_request_timeout_in_ms()/1000.0;
});
sp::set_truncate_rpc_timeout.set(r, [](std::unique_ptr<request> req) {
@@ -192,29 +201,29 @@ void set_storage_proxy(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
sp::get_read_repair_attempted.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_read_repair_attempted.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::read_repair_attempts);
});
sp::get_read_repair_repaired_blocking.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_read_repair_repaired_blocking.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::read_repair_repaired_blocking);
});
sp::get_read_repair_repaired_background.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
sp::get_read_repair_repaired_background.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::read_repair_repaired_background);
});
sp::get_schema_versions.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
std::vector<sp::mapper_list> res;
return make_ready_future<json::json_return_type>(res);
return service::get_local_storage_service().describe_schema_versions().then([] (auto result) {
std::vector<sp::mapper_list> res;
for (auto e : result) {
sp::mapper_list entry;
entry.key = std::move(e.first);
entry.value = std::move(e.second);
res.emplace_back(std::move(entry));
}
return make_ready_future<json::json_return_type>(std::move(res));
});
});
sp::get_cas_read_timeouts.set(r, [](std::unique_ptr<request> req) {
@@ -316,6 +325,29 @@ void set_storage_proxy(http_context& ctx, routes& r) {
sp::get_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_histogram_stats(ctx.sp, &proxy::stats::read);
});
sp::get_read_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &proxy::stats::estimated_read);
});
sp::get_read_latency.set(r, [&ctx](std::unique_ptr<request> req) {
return total_latency(ctx, &proxy::stats::read);
});
sp::get_write_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &proxy::stats::estimated_write);
});
sp::get_write_latency.set(r, [&ctx](std::unique_ptr<request> req) {
return total_latency(ctx, &proxy::stats::write);
});
sp::get_range_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_histogram_stats(ctx.sp, &proxy::stats::read);
});
sp::get_range_latency.set(r, [&ctx](std::unique_ptr<request> req) {
return total_latency(ctx, &proxy::stats::range);
});
}
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -30,8 +30,7 @@
#include "repair/repair.hh"
#include "locator/snitch_base.hh"
#include "column_family.hh"
#include <unordered_map>
#include "utils/fb_utilities.hh"
#include "log.hh"
namespace api {
@@ -45,6 +44,29 @@ static sstring validate_keyspace(http_context& ctx, const parameters& param) {
throw bad_param_exception("Keyspace " + param["keyspace"] + " Does not exist");
}
static std::vector<ss::token_range> describe_ring(const sstring& keyspace) {
std::vector<ss::token_range> res;
for (auto d : service::get_local_storage_service().describe_ring(keyspace)) {
ss::token_range r;
r.start_token = d._start_token;
r.end_token = d._end_token;
r.endpoints = d._endpoints;
r.rpc_endpoints = d._rpc_endpoints;
for (auto det : d._endpoint_details) {
ss::endpoint_detail ed;
ed.host = det._host;
ed.datacenter = det._datacenter;
if (det._rack != "") {
ed.rack = det._rack;
}
r.endpoint_details.push(ed);
}
res.push_back(r);
}
return res;
}
void set_storage_service(http_context& ctx, routes& r) {
ss::local_hostid.set(r, [](std::unique_ptr<request> req) {
return db::system_keyspace::get_local_host_id().then([](const utils::UUID& id) {
@@ -52,28 +74,25 @@ void set_storage_service(http_context& ctx, routes& r) {
});
});
ss::get_tokens.set(r, [](std::unique_ptr<request> req) {
return service::sorted_tokens().then([](const std::vector<dht::token>& tokens) {
return make_ready_future<json::json_return_type>(container_to_vec(tokens));
});
ss::get_tokens.set(r, [] (const_req req) {
auto tokens = service::get_local_storage_service().get_token_metadata().sorted_tokens();
return container_to_vec(tokens);
});
ss::get_node_tokens.set(r, [](std::unique_ptr<request> req) {
gms::inet_address addr(req->param["endpoint"]);
return service::get_tokens(addr).then([](const std::vector<dht::token>& tokens) {
return make_ready_future<json::json_return_type>(container_to_vec(tokens));
});
ss::get_node_tokens.set(r, [] (const_req req) {
gms::inet_address addr(req.param["endpoint"]);
auto tokens = service::get_local_storage_service().get_token_metadata().get_tokens(addr);
return container_to_vec(tokens);
});
ss::get_commitlog.set(r, [&ctx](const_req req) {
return ctx.db.local().commitlog()->active_config().commit_log_location;
});
ss::get_token_endpoint.set(r, [](std::unique_ptr<request> req) {
return service::get_token_to_endpoint().then([] (const std::map<dht::token, gms::inet_address>& tokens){
std::vector<storage_service_json::mapper> res;
return make_ready_future<json::json_return_type>(map_to_key_value(tokens, res));
});
ss::get_token_endpoint.set(r, [] (const_req req) {
auto token_to_ep = service::get_local_storage_service().get_token_to_endpoint_map();
std::vector<storage_service_json::mapper> res;
return map_to_key_value(token_to_ep, res);
});
ss::get_leaving_nodes.set(r, [](const_req req) {
@@ -130,12 +149,13 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(res);
});
ss::describe_ring_jmx.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto keyspace = validate_keyspace(ctx, req->param);
std::vector<sstring> res;
return make_ready_future<json::json_return_type>(res);
ss::describe_any_ring.set(r, [&ctx](const_req req) {
return describe_ring("");
});
ss::describe_ring.set(r, [&ctx](const_req req) {
auto keyspace = validate_keyspace(ctx, req.param);
return describe_ring(keyspace);
});
ss::get_host_id_map.set(r, [](const_req req) {
@@ -148,74 +168,88 @@ void set_storage_service(http_context& ctx, routes& r) {
return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);
});
ss::get_load_map.set(r, [&ctx](std::unique_ptr<request> req) {
// FIXME
// The function should return a mapping between inet address
// and the load (disk space used)
// in origin the implementation is based on the load broadcast
// we do not currently support.
// As a workaround, the local load is calculated (this part is similar
// to origin) and a map with a single entry is return.
return ctx.db.map_reduce0([](database& db) {
int64_t res = 0;
for (auto i : db.get_column_families()) {
res += i.second->get_stats().live_disk_space_used;
ss::get_load_map.set(r, [] (std::unique_ptr<request> req) {
return service::get_local_storage_service().get_load_map().then([] (auto&& load_map) {
std::vector<ss::map_string_double> res;
for (auto i : load_map) {
ss::map_string_double val;
val.key = i.first;
val.value = i.second;
res.push_back(val);
}
return res;
}, 0, std::plus<int64_t>()).then([](int64_t size) {
std::vector<ss::mapper> res;
std::unordered_map<gms::inet_address, double> load_map;
load_map[utils::fb_utilities::get_broadcast_address()] = size;
return make_ready_future<json::json_return_type>(map_to_key_value(load_map, res));
return make_ready_future<json::json_return_type>(res);
});
});
ss::get_current_generation_number.set(r, [](std::unique_ptr<request> req) {
gms::inet_address ep(utils::fb_utilities::get_broadcast_address());
return gms::get_current_generation_number(ep).then([](int res) {
return gms::get_local_gossiper().get_current_generation_number(ep).then([](int res) {
return make_ready_future<json::json_return_type>(res);
});
});
ss::get_natural_endpoints.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto keyspace = validate_keyspace(ctx, req->param);
auto column_family = req->get_query_param("cf");
auto key = req->get_query_param("key");
std::vector<sstring> res;
return make_ready_future<json::json_return_type>(res);
ss::get_natural_endpoints.set(r, [&ctx](const_req req) {
auto keyspace = validate_keyspace(ctx, req.param);
return container_to_vec(service::get_local_storage_service().get_natural_endpoints(keyspace, req.get_query_param("cf"),
req.get_query_param("key")));
});
ss::get_snapshot_details.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
std::vector<ss::snapshots> res;
return make_ready_future<json::json_return_type>(res);
return service::get_local_storage_service().get_snapshot_details().then([] (auto result) {
std::vector<ss::snapshots> res;
for (auto& map: result) {
ss::snapshots all_snapshots;
all_snapshots.key = map.first;
std::vector<ss::snapshot> snapshot;
for (auto& cf: map.second) {
ss::snapshot s;
s.ks = cf.ks;
s.cf = cf.cf;
s.live = cf.live;
s.total = cf.total;
snapshot.push_back(std::move(s));
}
all_snapshots.value = std::move(snapshot);
res.push_back(std::move(all_snapshots));
}
return make_ready_future<json::json_return_type>(std::move(res));
});
});
ss::take_snapshot.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto tag = req->get_query_param("tag");
auto keyname = req->get_query_param("kn");
auto column_family = req->get_query_param("cf");
return make_ready_future<json::json_return_type>(json_void());
std::vector<sstring> keynames = split(req->get_query_param("kn"), ",");
auto resp = make_ready_future<>();
if (column_family.empty()) {
resp = service::get_local_storage_service().take_snapshot(tag, keynames);
} else {
if (keynames.size() > 1) {
throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");
}
resp = service::get_local_storage_service().take_column_family_snapshot(keynames[0], column_family, tag);
}
return resp.then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::del_snapshot.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto tag = req->get_query_param("tag");
auto keyname = req->get_query_param("kn");
return make_ready_future<json::json_return_type>(json_void());
std::vector<sstring> keynames = split(req->get_query_param("kn"), ",");
return service::get_local_storage_service().clear_snapshot(tag, keynames).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::true_snapshots_size.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
return service::get_local_storage_service().true_snapshots_size().then([] (int64_t size) {
return make_ready_future<json::json_return_type>(size);
});
});
ss::force_keyspace_compaction.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -238,11 +272,23 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::force_keyspace_cleanup.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto keyspace = validate_keyspace(ctx, req->param);
auto column_family = req->get_query_param("cf");
return make_ready_future<json::json_return_type>(json_void());
auto column_families = split_cf(req->get_query_param("cf"));
if (column_families.empty()) {
column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
std::vector<column_family*> column_families_vec;
auto& cm = db.get_compaction_manager();
for (auto cf : column_families) {
column_families_vec.push_back(&db.find_column_family(keyspace, cf));
}
return parallel_for_each(column_families_vec, [&cm] (column_family* cf) {
return cm.perform_cleanup(cf);
});
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
});
ss::scrub.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -281,18 +327,15 @@ void set_storage_service(http_context& ctx, routes& r) {
ss::repair_async.set(r, [&ctx](std::unique_ptr<request> req) {
// Currently, we get all the repair options encoded in a single
// "options" option, and split it to a map using the "," and ":"
// delimiters. TODO: consider if it doesn't make more sense to just
// take all the query parameters as this map and pass it to the repair
// function.
static std::vector<sstring> options = {"primaryRange", "parallelism", "incremental",
"jobThreads", "ranges", "columnFamilies", "dataCenters", "hosts", "trace",
"startToken", "endToken" };
std::unordered_map<sstring, sstring> options_map;
for (auto s : split(req->get_query_param("options"), ",")) {
auto kv = split(s, ":");
if (kv.size() != 2) {
throw httpd::bad_param_exception("malformed async repair options");
for (auto o : options) {
auto s = req->get_query_param(o);
if (s != "") {
options_map[o] = s;
}
options_map.emplace(std::move(kv[0]), std::move(kv[1]));
}
// The repair process is asynchronous: repair_start only starts it and
@@ -330,31 +373,30 @@ void set_storage_service(http_context& ctx, routes& r) {
});
});
ss::move.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
ss::move.set(r, [] (std::unique_ptr<request> req) {
auto new_token = req->get_query_param("new_token");
return make_ready_future<json::json_return_type>(json_void());
return service::get_local_storage_service().move(new_token).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::remove_node.set(r, [](std::unique_ptr<request> req) {
// FIXME: This api is incorrect. remove_node takes a host id string parameter instead of token.
auto host_id = req->get_query_param("token");
return service::get_local_storage_service().remove_node(std::move(host_id)).then([] {
auto host_id = req->get_query_param("host_id");
return service::get_local_storage_service().removenode(host_id).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::get_removal_status.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>("");
return service::get_local_storage_service().get_removal_status().then([] (auto status) {
return make_ready_future<json::json_return_type>(status);
});
});
ss::force_remove_completion.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
return service::get_local_storage_service().force_remove_completion().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::set_logging_level.set(r, [](std::unique_ptr<request> req) {
@@ -366,9 +408,13 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_logging_levels.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
std::vector<ss::mapper> res;
for (auto i : logging::logger_registry().get_all_logger_names()) {
ss::mapper log;
log.key = i;
log.value = logging::level_name(logging::logger_registry().get_logger_level(i));
res.push_back(log);
}
return make_ready_future<json::json_return_type>(res);
});
@@ -385,15 +431,18 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_drain_progress.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>("");
return service::get_storage_service().map_reduce(adder<service::storage_service::drain_progress>(), [] (auto& ss) {
return ss.get_drain_progress();
}).then([] (auto&& progress) {
auto progress_str = sprint("Drained %s/%s ColumnFamilies", progress.remaining_cfs, progress.total_cfs);
return make_ready_future<json::json_return_type>(std::move(progress_str));
});
});
ss::drain.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
return service::get_local_storage_service().drain().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::truncate.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
@@ -440,8 +489,10 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
ss::is_initialized.set(r, [](const_req req) {
return service::get_local_storage_service().is_initialized();
ss::is_initialized.set(r, [](std::unique_ptr<request> req) {
return service::get_local_storage_service().is_initialized().then([] (bool initialized) {
return make_ready_future<json::json_return_type>(initialized);
});
});
ss::stop_rpc_server.set(r, [](std::unique_ptr<request> req) {
@@ -456,8 +507,10 @@ void set_storage_service(http_context& ctx, routes& r) {
});
});
ss::is_rpc_server_running.set(r, [](const_req req) {
return service::get_local_storage_service().is_rpc_server_running();
ss::is_rpc_server_running.set(r, [] (std::unique_ptr<request> req) {
return service::get_local_storage_service().is_rpc_server_running().then([] (bool running) {
return make_ready_future<json::json_return_type>(running);
});
});
ss::start_native_transport.set(r, [](std::unique_ptr<request> req) {
@@ -472,8 +525,10 @@ void set_storage_service(http_context& ctx, routes& r) {
});
});
ss::is_native_transport_running.set(r, [](const_req req) {
return service::get_local_storage_service().is_native_transport_running();
ss::is_native_transport_running.set(r, [] (std::unique_ptr<request> req) {
return service::get_local_storage_service().is_native_transport_running().then([] (bool running) {
return make_ready_future<json::json_return_type>(running);
});
});
ss::join_ring.set(r, [](std::unique_ptr<request> req) {
@@ -482,8 +537,10 @@ void set_storage_service(http_context& ctx, routes& r) {
});
});
ss::is_joined.set(r, [](const_req req) {
return service::get_local_storage_service().is_joined();
ss::is_joined.set(r, [] (std::unique_ptr<request> req) {
return service::get_local_storage_service().is_joined().then([] (bool is_joined) {
return make_ready_future<json::json_return_type>(is_joined);
});
});
ss::set_stream_throughput_mb_per_sec.set(r, [](std::unique_ptr<request> req) {
@@ -499,10 +556,9 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
ss::get_compaction_throughput_mb_per_sec.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
ss::get_compaction_throughput_mb_per_sec.set(r, [&ctx](std::unique_ptr<request> req) {
int value = ctx.db.local().get_config().compaction_throughput_mb_per_sec();
return make_ready_future<json::json_return_type>(value);
});
ss::set_compaction_throughput_mb_per_sec.set(r, [](std::unique_ptr<request> req) {
@@ -532,6 +588,8 @@ void set_storage_service(http_context& ctx, routes& r) {
auto val_str = req->get_query_param("value");
bool value = (val_str == "True") || (val_str == "true") || (val_str == "1");
return service::get_local_storage_service().db().invoke_on_all([value] (database& db) {
db.set_enable_incremental_backups(value);
// Change both KS and CF, so they are in sync
for (auto& pair: db.get_keyspaces()) {
auto& ks = pair.second;
@@ -575,11 +633,16 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::load_new_ss_tables.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto keyspace = validate_keyspace(ctx, req->param);
auto column_family = req->get_query_param("cf");
return make_ready_future<json::json_return_type>(json_void());
auto ks = validate_keyspace(ctx, req->param);
auto cf = req->get_query_param("cf");
// No need to add the keyspace, since all we want is to avoid always sending this to the same
// CPU. Even then I am being overzealous here. This is not something that happens all the time.
auto coordinator = std::hash<sstring>()(cf) % smp::count;
return service::get_storage_service().invoke_on(coordinator, [ks = std::move(ks), cf = std::move(cf)] (service::storage_service& s) {
return s.load_new_sstables(ks, cf);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::sample_key_range.set(r, [](std::unique_ptr<request> req) {
@@ -631,16 +694,12 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
ss::get_cluster_name.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
ss::get_cluster_name.set(r, [](const_req req) {
return gms::get_local_gossiper().get_cluster_name();
});
ss::get_partitioner_name.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
ss::get_partitioner_name.set(r, [](const_req req) {
return gms::get_local_gossiper().get_partitioner_name();
});
ss::get_tombstone_warn_threshold.set(r, [](std::unique_ptr<request> req) {
@@ -711,17 +770,19 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
ss::get_ownership.set(r, [](const_req req) {
auto tokens = service::get_local_storage_service().get_ownership();
std::vector<storage_service_json::mapper> res;
return map_to_key_value(tokens, res);
ss::get_ownership.set(r, [] (std::unique_ptr<request> req) {
return service::get_local_storage_service().get_ownership().then([] (auto&& ownership) {
std::vector<storage_service_json::mapper> res;
return make_ready_future<json::json_return_type>(map_to_key_value(ownership, res));
});
});
ss::get_effective_ownership.set(r, [&ctx](const_req req) {
auto tokens = service::get_local_storage_service().effective_ownership(
(req.param["keyspace"] == "null")? "" : validate_keyspace(ctx, req.param));
std::vector<storage_service_json::mapper> res;
return map_to_key_value(tokens, res);
ss::get_effective_ownership.set(r, [&ctx] (std::unique_ptr<request> req) {
auto keyspace_name = req->param["keyspace"] == "null" ? "" : validate_keyspace(ctx, req->param);
return service::get_local_storage_service().effective_ownership(keyspace_name).then([] (auto&& ownership) {
std::vector<storage_service_json::mapper> res;
return make_ready_future<json::json_return_type>(map_to_key_value(ownership, res));
});
});
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -24,6 +24,7 @@
#include "streaming/stream_result_future.hh"
#include "api/api-doc/stream_manager.json.hh"
#include <vector>
#include "gms/gossiper.hh"
namespace api {
@@ -31,11 +32,16 @@ namespace hs = httpd::stream_manager_json;
static void set_summaries(const std::vector<streaming::stream_summary>& from,
json::json_list<hs::stream_summary>& to) {
for (auto sum : from) {
if (!from.empty()) {
hs::stream_summary res;
res.cf_id = boost::lexical_cast<std::string>(sum.cf_id);
res.files = sum.files;
res.total_size = sum.total_size;
res.cf_id = boost::lexical_cast<std::string>(from.front().cf_id);
// For each stream_session, we pretend we are sending/receiving one
// file, to make it compatible with nodetool.
res.files = 1;
// We can not estimate total number of bytes the stream_session will
// send or recvieve since we don't know the size of the frozen_mutation
// until we read it.
res.total_size = 0;
to.push(res);
}
}
@@ -46,7 +52,7 @@ static hs::progress_info get_progress_info(const streaming::progress_info& info)
res.direction = info.dir;
res.file_name = info.file_name;
res.peer = boost::lexical_cast<std::string>(info.peer);
res.session_index = info.session_index;
res.session_index = 0;
res.total_bytes = info.total_bytes;
return res;
}
@@ -69,13 +75,14 @@ static hs::stream_state get_state(
for (auto info : result_future.get_coordinator().get()->get_all_session_info()) {
hs::stream_info si;
si.peer = boost::lexical_cast<std::string>(info.peer);
si.session_index = info.session_index;
si.session_index = 0;
si.state = info.state;
si.connecting = boost::lexical_cast<std::string>(info.connecting);
si.connecting = si.peer;
set_summaries(info.receiving_summaries, si.receiving_summaries);
set_summaries(info.sending_summaries, si.sending_summaries);
set_files(info.receiving_files, si.receiving_files);
set_files(info.sending_files, si.sending_files);
state.sessions.push(si);
}
return state;
}
@@ -83,20 +90,74 @@ static hs::stream_state get_state(
void set_stream_manager(http_context& ctx, routes& r) {
hs::get_current_streams.set(r,
[] (std::unique_ptr<request> req) {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& stream) {
std::vector<hs::stream_state> res;
for (auto i : stream.get_initiated_streams()) {
res.push_back(get_state(*i.second.get()));
}
for (auto i : stream.get_receiving_streams()) {
res.push_back(get_state(*i.second.get()));
}
return res;
}, std::vector<hs::stream_state>(),concat<hs::stream_state>).
then([](const std::vector<hs::stream_state>& res) {
return make_ready_future<json::json_return_type>(res);
return streaming::get_stream_manager().invoke_on_all([] (auto& sm) {
return sm.update_all_progress_info();
}).then([] {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& stream) {
std::vector<hs::stream_state> res;
for (auto i : stream.get_initiated_streams()) {
res.push_back(get_state(*i.second.get()));
}
for (auto i : stream.get_receiving_streams()) {
res.push_back(get_state(*i.second.get()));
}
return res;
}, std::vector<hs::stream_state>(),concat<hs::stream_state>).
then([](const std::vector<hs::stream_state>& res) {
return make_ready_future<json::json_return_type>(res);
});
});
});
hs::get_all_active_streams_outbound.set(r, [](std::unique_ptr<request> req) {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& stream) {
return stream.get_initiated_streams().size();
}, 0, std::plus<int64_t>()).then([](int64_t res) {
return make_ready_future<json::json_return_type>(res);
});
});
hs::get_total_incoming_bytes.set(r, [](std::unique_ptr<request> req) {
gms::inet_address peer(req->param["peer"]);
return streaming::get_stream_manager().map_reduce0([peer](streaming::stream_manager& sm) {
return sm.get_progress_on_all_shards(peer).then([] (auto sbytes) {
return sbytes.bytes_received;
});
}, 0, std::plus<int64_t>()).then([](int64_t res) {
return make_ready_future<json::json_return_type>(res);
});
});
hs::get_all_total_incoming_bytes.set(r, [](std::unique_ptr<request> req) {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& sm) {
return sm.get_progress_on_all_shards().then([] (auto sbytes) {
return sbytes.bytes_received;
});
}, 0, std::plus<int64_t>()).then([](int64_t res) {
return make_ready_future<json::json_return_type>(res);
});
});
hs::get_total_outgoing_bytes.set(r, [](std::unique_ptr<request> req) {
gms::inet_address peer(req->param["peer"]);
return streaming::get_stream_manager().map_reduce0([peer] (streaming::stream_manager& sm) {
return sm.get_progress_on_all_shards(peer).then([] (auto sbytes) {
return sbytes.bytes_sent;
});
}, 0, std::plus<int64_t>()).then([](int64_t res) {
return make_ready_future<json::json_return_type>(res);
});
});
hs::get_all_total_outgoing_bytes.set(r, [](std::unique_ptr<request> req) {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& sm) {
return sm.get_progress_on_all_shards().then([] (auto sbytes) {
return sbytes.bytes_sent;
});
}, 0, std::plus<int64_t>()).then([](int64_t res) {
return make_ready_future<json::json_return_type>(res);
});
});
}
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

70
api/system.cc Normal file
View File

@@ -0,0 +1,70 @@
/*
* Copyright (C) 2015 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "api/api-doc/system.json.hh"
#include "api/api.hh"
#include "http/exception.hh"
#include "log.hh"
namespace api {
namespace hs = httpd::system_json;
void set_system(http_context& ctx, routes& r) {
hs::get_all_logger_names.set(r, [](const_req req) {
return logging::logger_registry().get_all_logger_names();
});
hs::set_all_logger_level.set(r, [](const_req req) {
try {
logging::log_level level = boost::lexical_cast<logging::log_level>(std::string(req.get_query_param("level")));
logging::logger_registry().set_all_loggers_level(level);
} catch (boost::bad_lexical_cast& e) {
throw bad_param_exception("Unknown logging level " + req.get_query_param("level"));
}
return json::json_void();
});
hs::get_logger_level.set(r, [](const_req req) {
try {
return logging::level_name(logging::logger_registry().get_logger_level(req.param["name"]));
} catch (std::out_of_range& e) {
throw bad_param_exception("Unknown logger name " + req.param["name"]);
}
// just to keep the compiler happy
return sstring();
});
hs::set_logger_level.set(r, [](const_req req) {
try {
logging::log_level level = boost::lexical_cast<logging::log_level>(std::string(req.get_query_param("level")));
logging::logger_registry().set_logger_level(req.param["name"], level);
} catch (std::out_of_range& e) {
throw bad_param_exception("Unknown logger name " + req.param["name"]);
} catch (boost::bad_lexical_cast& e) {
throw bad_param_exception("Unknown logging level " + req.get_query_param("level"));
}
return json::json_void();
});
}
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright (C) 2014 Cloudius Systems, Ltd.
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -19,11 +19,12 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "streaming/messages/outgoing_file_message.hh"
#pragma once
namespace streaming {
namespace messages {
#include "api.hh"
namespace api {
} // namespace messages
} // namespace streaming
void set_system(http_context& ctx, routes& r);
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -54,9 +54,9 @@ class atomic_cell_or_collection;
*/
class atomic_cell_type final {
private:
static constexpr int8_t DEAD_FLAGS = 0;
static constexpr int8_t LIVE_FLAG = 0x01;
static constexpr int8_t EXPIRY_FLAG = 0x02; // When present, expiry field is present. Set only for live cells
static constexpr int8_t REVERT_FLAG = 0x04; // transient flag used to efficiently implement ReversiblyMergeable for atomic cells.
static constexpr unsigned flags_size = 1;
static constexpr unsigned timestamp_offset = flags_size;
static constexpr unsigned timestamp_size = 8;
@@ -67,14 +67,21 @@ private:
static constexpr unsigned ttl_offset = expiry_offset + expiry_size;
static constexpr unsigned ttl_size = 4;
private:
static bool is_revert_set(bytes_view cell) {
return cell[0] & REVERT_FLAG;
}
template<typename BytesContainer>
static void set_revert(BytesContainer& cell, bool revert) {
cell[0] = (cell[0] & ~REVERT_FLAG) | (revert * REVERT_FLAG);
}
static bool is_live(const bytes_view& cell) {
return cell[0] != DEAD_FLAGS;
return cell[0] & LIVE_FLAG;
}
static bool is_live_and_has_ttl(const bytes_view& cell) {
return cell[0] & EXPIRY_FLAG;
}
static bool is_dead(const bytes_view& cell) {
return cell[0] == DEAD_FLAGS;
return !is_live(cell);
}
// Can be called on live and dead cells
static api::timestamp_type timestamp(const bytes_view& cell) {
@@ -106,7 +113,7 @@ private:
}
static managed_bytes make_dead(api::timestamp_type timestamp, gc_clock::time_point deletion_time) {
managed_bytes b(managed_bytes::initialized_later(), flags_size + timestamp_size + deletion_time_size);
b[0] = DEAD_FLAGS;
b[0] = 0;
set_field(b, timestamp_offset, timestamp);
set_field(b, deletion_time_offset, deletion_time.time_since_epoch().count());
return b;
@@ -140,8 +147,11 @@ protected:
ByteContainer _data;
protected:
atomic_cell_base(ByteContainer&& data) : _data(std::forward<ByteContainer>(data)) { }
atomic_cell_base(const ByteContainer& data) : _data(data) { }
friend class atomic_cell_or_collection;
public:
bool is_revert_set() const {
return atomic_cell_type::is_revert_set(_data);
}
bool is_live() const {
return atomic_cell_type::is_live(_data);
}
@@ -187,10 +197,13 @@ public:
bytes_view serialize() const {
return _data;
}
void set_revert(bool revert) {
atomic_cell_type::set_revert(_data, revert);
}
};
class atomic_cell_view final : public atomic_cell_base<bytes_view> {
atomic_cell_view(bytes_view data) : atomic_cell_base(data) {}
atomic_cell_view(bytes_view data) : atomic_cell_base(std::move(data)) {}
public:
static atomic_cell_view from_bytes(bytes_view data) { return atomic_cell_view(data); }
@@ -198,6 +211,11 @@ public:
friend std::ostream& operator<<(std::ostream& os, const atomic_cell_view& acv);
};
class atomic_cell_ref final : public atomic_cell_base<managed_bytes&> {
public:
atomic_cell_ref(managed_bytes& buf) : atomic_cell_base(buf) {}
};
class atomic_cell final : public atomic_cell_base<managed_bytes> {
atomic_cell(managed_bytes b) : atomic_cell_base(std::move(b)) {}
public:
@@ -234,6 +252,8 @@ public:
friend std::ostream& operator<<(std::ostream& os, const atomic_cell& ac);
};
class collection_mutation_view;
// Represents a mutation of a collection. Actual format is determined by collection type,
// and is:
// set: list of atomic_cell
@@ -241,58 +261,35 @@ public:
// list: tbd, probably ugly
class collection_mutation {
public:
struct view {
bytes_view data;
bytes_view serialize() const { return data; }
static view from_bytes(bytes_view v) { return { v }; }
};
struct one {
managed_bytes data;
one() {}
one(managed_bytes b) : data(std::move(b)) {}
one(view v) : data(v.data) {}
operator view() const { return { data }; }
};
managed_bytes data;
collection_mutation() {}
collection_mutation(managed_bytes b) : data(std::move(b)) {}
collection_mutation(collection_mutation_view v);
operator collection_mutation_view() const;
};
class collection_mutation_view {
public:
bytes_view data;
bytes_view serialize() const { return data; }
static collection_mutation_view from_bytes(bytes_view v) { return { v }; }
};
inline
collection_mutation::collection_mutation(collection_mutation_view v)
: data(v.data) {
}
inline
collection_mutation::operator collection_mutation_view() const {
return { data };
}
namespace db {
template<typename T>
class serializer;
}
// A variant type that can hold either an atomic_cell, or a serialized collection.
// Which type is stored is determined by the schema.
class atomic_cell_or_collection final {
managed_bytes _data;
template<typename T>
friend class db::serializer;
private:
atomic_cell_or_collection(managed_bytes&& data) : _data(std::move(data)) {}
public:
atomic_cell_or_collection() = default;
atomic_cell_or_collection(atomic_cell ac) : _data(std::move(ac._data)) {}
static atomic_cell_or_collection from_atomic_cell(atomic_cell data) { return { std::move(data._data) }; }
atomic_cell_view as_atomic_cell() const { return atomic_cell_view::from_bytes(_data); }
atomic_cell_or_collection(collection_mutation::one cm) : _data(std::move(cm.data)) {}
explicit operator bool() const {
return !_data.empty();
}
static atomic_cell_or_collection from_collection_mutation(collection_mutation::one data) {
return std::move(data.data);
}
collection_mutation::view as_collection_mutation() const {
return collection_mutation::view{_data};
}
bytes_view serialize() const {
return _data;
}
bool operator==(const atomic_cell_or_collection& other) const {
return _data == other._data;
}
friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);
};
class column_definition;
int compare_atomic_cell_for_merge(atomic_cell_view left, atomic_cell_view right);

75
atomic_cell_hash.hh Normal file
View File

@@ -0,0 +1,75 @@
/*
* Copyright (C) 2015 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
// Not part of atomic_cell.hh to avoid cyclic dependency between types.hh and atomic_cell.hh
#include "types.hh"
#include "atomic_cell.hh"
#include "hashing.hh"
template<>
struct appending_hash<collection_mutation_view> {
template<typename Hasher>
void operator()(Hasher& h, collection_mutation_view cell) const {
auto m_view = collection_type_impl::deserialize_mutation_form(cell);
::feed_hash(h, m_view.tomb);
for (auto&& key_and_value : m_view.cells) {
::feed_hash(h, key_and_value.first);
::feed_hash(h, key_and_value.second);
}
}
};
template<>
struct appending_hash<atomic_cell_view> {
template<typename Hasher>
void operator()(Hasher& h, atomic_cell_view cell) const {
feed_hash(h, cell.is_live());
feed_hash(h, cell.timestamp());
if (cell.is_live()) {
if (cell.is_live_and_has_ttl()) {
feed_hash(h, cell.expiry());
feed_hash(h, cell.ttl());
}
feed_hash(h, cell.value());
} else {
feed_hash(h, cell.deletion_time());
}
}
};
template<>
struct appending_hash<atomic_cell> {
template<typename Hasher>
void operator()(Hasher& h, const atomic_cell& cell) const {
feed_hash(h, static_cast<atomic_cell_view>(cell));
}
};
template<>
struct appending_hash<collection_mutation> {
template<typename Hasher>
void operator()(Hasher& h, const collection_mutation& cm) const {
feed_hash(h, static_cast<collection_mutation_view>(cm));
}
};

View File

@@ -0,0 +1,67 @@
/*
* Copyright (C) 2015 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "atomic_cell.hh"
#include "schema.hh"
#include "hashing.hh"
// A variant type that can hold either an atomic_cell, or a serialized collection.
// Which type is stored is determined by the schema.
// Has an "empty" state.
// Objects moved-from are left in an empty state.
class atomic_cell_or_collection final {
managed_bytes _data;
private:
atomic_cell_or_collection(managed_bytes&& data) : _data(std::move(data)) {}
public:
atomic_cell_or_collection() = default;
atomic_cell_or_collection(atomic_cell ac) : _data(std::move(ac._data)) {}
static atomic_cell_or_collection from_atomic_cell(atomic_cell data) { return { std::move(data._data) }; }
atomic_cell_view as_atomic_cell() const { return atomic_cell_view::from_bytes(_data); }
atomic_cell_ref as_atomic_cell_ref() { return { _data }; }
atomic_cell_or_collection(collection_mutation cm) : _data(std::move(cm.data)) {}
explicit operator bool() const {
return !_data.empty();
}
static atomic_cell_or_collection from_collection_mutation(collection_mutation data) {
return std::move(data.data);
}
collection_mutation_view as_collection_mutation() const {
return collection_mutation_view{_data};
}
bytes_view serialize() const {
return _data;
}
bool operator==(const atomic_cell_or_collection& other) const {
return _data == other._data;
}
template<typename Hasher>
void feed_hash(Hasher& h, const column_definition& def) const {
if (def.is_atomic()) {
::feed_hash(h, as_atomic_cell());
} else {
::feed_hash(as_collection_mutation(), h, def.type);
}
}
friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);
};

383
auth/auth.cc Normal file
View File

@@ -0,0 +1,383 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <seastar/core/sleep.hh>
#include <seastar/core/distributed.hh>
#include "auth.hh"
#include "authenticator.hh"
#include "authorizer.hh"
#include "database.hh"
#include "cql3/query_processor.hh"
#include "cql3/statements/cf_statement.hh"
#include "cql3/statements/create_table_statement.hh"
#include "db/config.hh"
#include "service/migration_manager.hh"
#include "utils/loading_cache.hh"
#include "utils/hash.hh"
const sstring auth::auth::DEFAULT_SUPERUSER_NAME("cassandra");
const sstring auth::auth::AUTH_KS("system_auth");
const sstring auth::auth::USERS_CF("users");
static const sstring USER_NAME("name");
static const sstring SUPER("super");
static logging::logger logger("auth");
// TODO: configurable
using namespace std::chrono_literals;
const std::chrono::milliseconds auth::auth::SUPERUSER_SETUP_DELAY = 10000ms;
class auth_migration_listener : public service::migration_listener {
void on_create_keyspace(const sstring& ks_name) override {}
void on_create_column_family(const sstring& ks_name, const sstring& cf_name) override {}
void on_create_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_create_function(const sstring& ks_name, const sstring& function_name) override {}
void on_create_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_update_keyspace(const sstring& ks_name) override {}
void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool) override {}
void on_update_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_update_function(const sstring& ks_name, const sstring& function_name) override {}
void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_drop_keyspace(const sstring& ks_name) override {
auth::authorizer::get().revoke_all(auth::data_resource(ks_name));
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
auth::authorizer::get().revoke_all(auth::data_resource(ks_name, cf_name));
}
void on_drop_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_drop_function(const sstring& ks_name, const sstring& function_name) override {}
void on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
};
static auth_migration_listener auth_migration;
namespace std {
template <>
struct hash<auth::data_resource> {
size_t operator()(const auth::data_resource & v) const {
return v.hash_value();
}
};
template <>
struct hash<auth::authenticated_user> {
size_t operator()(const auth::authenticated_user & v) const {
return utils::tuple_hash()(v.name(), v.is_anonymous());
}
};
}
class auth::auth::permissions_cache {
public:
typedef utils::loading_cache<std::pair<authenticated_user, data_resource>, permission_set, utils::tuple_hash> cache_type;
typedef typename cache_type::key_type key_type;
permissions_cache()
: permissions_cache(
cql3::get_local_query_processor().db().local().get_config()) {
}
permissions_cache(const db::config& cfg)
: _cache(cfg.permissions_cache_max_entries(), expiry(cfg),
std::chrono::milliseconds(
cfg.permissions_validity_in_ms()),
[](const key_type& k) {
logger.debug("Refreshing permissions for {}", k.first.name());
return authorizer::get().authorize(::make_shared<authenticated_user>(k.first), k.second);
}) {
}
static std::chrono::milliseconds expiry(const db::config& cfg) {
auto exp = cfg.permissions_update_interval_in_ms();
if (exp == 0 || exp == std::numeric_limits<uint32_t>::max()) {
exp = cfg.permissions_validity_in_ms();
}
return std::chrono::milliseconds(exp);
}
future<> stop() {
return make_ready_future<>();
}
future<permission_set> get(::shared_ptr<authenticated_user> user, data_resource resource) {
return _cache.get(key_type(*user, std::move(resource)));
}
private:
cache_type _cache;
};
static distributed<auth::auth::permissions_cache> perm_cache;
/**
* Poor mans job schedule. For maximum 2 jobs. Sic.
* Still does nothing more clever than waiting 10 seconds
* like origin, then runs the submitted tasks.
*
* Only difference compared to sleep (from which this
* borrows _heavily_) is that if tasks have not run by the time
* we exit (and do static clean up) we delete the promise + cont
*
* Should be abstracted to some sort of global server function
* probably.
*/
struct waiter {
promise<> done;
timer<> tmr;
waiter() : tmr([this] {done.set_value();})
{
tmr.arm(auth::auth::SUPERUSER_SETUP_DELAY);
}
~waiter() {
if (tmr.armed()) {
tmr.cancel();
done.set_exception(std::runtime_error("shutting down"));
}
logger.trace("Deleting scheduled task");
}
void kill() {
}
};
typedef std::unique_ptr<waiter> waiter_ptr;
static std::vector<waiter_ptr> & thread_waiters() {
static thread_local std::vector<waiter_ptr> the_waiters;
return the_waiters;
}
void auth::auth::schedule_when_up(scheduled_func f) {
logger.trace("Adding scheduled task");
auto & waiters = thread_waiters();
waiters.emplace_back(std::make_unique<waiter>());
auto* w = waiters.back().get();
w->done.get_future().finally([w] {
auto & waiters = thread_waiters();
auto i = std::find_if(waiters.begin(), waiters.end(), [w](const waiter_ptr& p) {
return p.get() == w;
});
if (i != waiters.end()) {
waiters.erase(i);
}
}).then([f = std::move(f)] {
logger.trace("Running scheduled task");
return f();
}).handle_exception([](auto ep) {
return make_ready_future();
});
}
bool auth::auth::is_class_type(const sstring& type, const sstring& classname) {
if (type == classname) {
return true;
}
auto i = classname.find_last_of('.');
return classname.compare(i + 1, sstring::npos, type) == 0;
}
future<> auth::auth::setup() {
auto& db = cql3::get_local_query_processor().db().local();
auto& cfg = db.get_config();
future<> f = perm_cache.start();
if (is_class_type(cfg.authenticator(),
authenticator::ALLOW_ALL_AUTHENTICATOR_NAME)
&& is_class_type(cfg.authorizer(),
authorizer::ALLOW_ALL_AUTHORIZER_NAME)
) {
// just create the objects
return f.then([&cfg] {
return authenticator::setup(cfg.authenticator());
}).then([&cfg] {
return authorizer::setup(cfg.authorizer());
});
}
if (!db.has_keyspace(AUTH_KS)) {
std::map<sstring, sstring> opts;
opts["replication_factor"] = "1";
auto ksm = keyspace_metadata::new_keyspace(AUTH_KS, "org.apache.cassandra.locator.SimpleStrategy", opts, true);
f = service::get_local_migration_manager().announce_new_keyspace(ksm, false);
}
return f.then([] {
return setup_table(USERS_CF, sprint("CREATE TABLE %s.%s (%s text, %s boolean, PRIMARY KEY(%s)) WITH gc_grace_seconds=%d",
AUTH_KS, USERS_CF, USER_NAME, SUPER, USER_NAME,
90 * 24 * 60 * 60)); // 3 months.
}).then([&cfg] {
return authenticator::setup(cfg.authenticator());
}).then([&cfg] {
return authorizer::setup(cfg.authorizer());
}).then([] {
service::get_local_migration_manager().register_listener(&auth_migration); // again, only one shard...
// instead of once-timer, just schedule this later
schedule_when_up([] {
// setup default super user
return has_existing_users(USERS_CF, DEFAULT_SUPERUSER_NAME, USER_NAME).then([](bool exists) {
if (!exists) {
auto query = sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
AUTH_KS, USERS_CF, USER_NAME, SUPER);
cql3::get_local_query_processor().process(query, db::consistency_level::ONE, {DEFAULT_SUPERUSER_NAME, true}).then([](auto) {
logger.info("Created default superuser '{}'", DEFAULT_SUPERUSER_NAME);
}).handle_exception([](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception&) {
logger.warn("Skipped default superuser setup: some nodes were not ready");
}
});
}
});
});
});
}
future<> auth::auth::shutdown() {
// just make sure we don't have pending tasks.
// this is mostly relevant for test cases where
// db-env-shutdown != process shutdown
return smp::invoke_on_all([] {
thread_waiters().clear();
}).then([] {
return perm_cache.stop();
});
}
future<auth::permission_set> auth::auth::get_permissions(::shared_ptr<authenticated_user> user, data_resource resource) {
return perm_cache.local().get(std::move(user), std::move(resource));
}
static db::consistency_level consistency_for_user(const sstring& username) {
if (username == auth::auth::DEFAULT_SUPERUSER_NAME) {
return db::consistency_level::QUORUM;
}
return db::consistency_level::LOCAL_ONE;
}
static future<::shared_ptr<cql3::untyped_result_set>> select_user(const sstring& username) {
// Here was a thread local, explicit cache of prepared statement. In normal execution this is
// fine, but since we in testing set up and tear down system over and over, we'd start using
// obsolete prepared statements pretty quickly.
// Rely on query processing caching statements instead, and lets assume
// that a map lookup string->statement is not gonna kill us much.
return cql3::get_local_query_processor().process(
sprint("SELECT * FROM %s.%s WHERE %s = ?",
auth::auth::AUTH_KS, auth::auth::USERS_CF,
USER_NAME), consistency_for_user(username),
{ username }, true);
}
future<bool> auth::auth::is_existing_user(const sstring& username) {
return select_user(username).then(
[](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty());
});
}
future<bool> auth::auth::is_super_user(const sstring& username) {
return select_user(username).then(
[](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty() && res->one().get_as<bool>(SUPER));
});
}
future<> auth::auth::insert_user(const sstring& username, bool is_super)
throw (exceptions::request_execution_exception) {
return cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
AUTH_KS, USERS_CF, USER_NAME, SUPER),
consistency_for_user(username), { username, is_super }).discard_result();
}
future<> auth::auth::delete_user(const sstring& username) throw(exceptions::request_execution_exception) {
return cql3::get_local_query_processor().process(sprint("DELETE FROM %s.%s WHERE %s = ?",
AUTH_KS, USERS_CF, USER_NAME),
consistency_for_user(username), { username }).discard_result();
}
future<> auth::auth::setup_table(const sstring& name, const sstring& cql) {
auto& qp = cql3::get_local_query_processor();
auto& db = qp.db().local();
if (db.has_schema(AUTH_KS, name)) {
return make_ready_future();
}
::shared_ptr<cql3::statements::cf_statement> parsed = static_pointer_cast<
cql3::statements::cf_statement>(cql3::query_processor::parse_statement(cql));
parsed->prepare_keyspace(AUTH_KS);
::shared_ptr<cql3::statements::create_table_statement> statement =
static_pointer_cast<cql3::statements::create_table_statement>(
parsed->prepare(db)->statement);
auto schema = statement->get_cf_meta_data();
auto uuid = generate_legacy_id(schema->ks_name(), schema->cf_name());
schema_builder b(schema);
b.set_uuid(uuid);
return service::get_local_migration_manager().announce_new_column_family(b.build(), false);
}
future<bool> auth::auth::has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column) {
auto default_user_query = sprint("SELECT * FROM %s.%s WHERE %s = ?", AUTH_KS, cfname, name_column);
auto all_users_query = sprint("SELECT * FROM %s.%s LIMIT 1", AUTH_KS, cfname);
return cql3::get_local_query_processor().process(default_user_query, db::consistency_level::ONE, { def_user_name }).then([=](::shared_ptr<cql3::untyped_result_set> res) {
if (!res->empty()) {
return make_ready_future<bool>(true);
}
return cql3::get_local_query_processor().process(default_user_query, db::consistency_level::QUORUM, { def_user_name }).then([all_users_query](::shared_ptr<cql3::untyped_result_set> res) {
if (!res->empty()) {
return make_ready_future<bool>(true);
}
return cql3::get_local_query_processor().process(all_users_query, db::consistency_level::QUORUM).then([](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty());
});
});
});
}

124
auth/auth.hh Normal file
View File

@@ -0,0 +1,124 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <chrono>
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include "exceptions/exceptions.hh"
#include "permission.hh"
#include "data_resource.hh"
namespace auth {
class authenticated_user;
class auth {
public:
class permissions_cache;
static const sstring DEFAULT_SUPERUSER_NAME;
static const sstring AUTH_KS;
static const sstring USERS_CF;
static const std::chrono::milliseconds SUPERUSER_SETUP_DELAY;
static bool is_class_type(const sstring& type, const sstring& classname);
static future<permission_set> get_permissions(::shared_ptr<authenticated_user>, data_resource);
/**
* Checks if the username is stored in AUTH_KS.USERS_CF.
*
* @param username Username to query.
* @return whether or not Cassandra knows about the user.
*/
static future<bool> is_existing_user(const sstring& username);
/**
* Checks if the user is a known superuser.
*
* @param username Username to query.
* @return true is the user is a superuser, false if they aren't or don't exist at all.
*/
static future<bool> is_super_user(const sstring& username);
/**
* Inserts the user into AUTH_KS.USERS_CF (or overwrites their superuser status as a result of an ALTER USER query).
*
* @param username Username to insert.
* @param isSuper User's new status.
* @throws RequestExecutionException
*/
static future<> insert_user(const sstring& username, bool is_super) throw(exceptions::request_execution_exception);
/**
* Deletes the user from AUTH_KS.USERS_CF.
*
* @param username Username to delete.
* @throws RequestExecutionException
*/
static future<> delete_user(const sstring& username) throw(exceptions::request_execution_exception);
/**
* Sets up Authenticator and Authorizer.
*/
static future<> setup();
static future<> shutdown();
/**
* Set up table from given CREATE TABLE statement under system_auth keyspace, if not already done so.
*
* @param name name of the table
* @param cql CREATE TABLE statement
*/
static future<> setup_table(const sstring& name, const sstring& cql);
static future<bool> has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column_name);
// For internal use. Run function "when system is up".
typedef std::function<future<>()> scheduled_func;
static void schedule_when_up(scheduled_func);
};
}

View File

@@ -0,0 +1,72 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "authenticated_user.hh"
#include "auth.hh"
const sstring auth::authenticated_user::ANONYMOUS_USERNAME("anonymous");
auth::authenticated_user::authenticated_user()
: _anon(true)
{}
auth::authenticated_user::authenticated_user(sstring name)
: _name(name), _anon(false)
{}
auth::authenticated_user::authenticated_user(authenticated_user&&) = default;
auth::authenticated_user::authenticated_user(const authenticated_user&) = default;
const sstring& auth::authenticated_user::name() const {
return _anon ? ANONYMOUS_USERNAME : _name;
}
future<bool> auth::authenticated_user::is_super() const {
if (is_anonymous()) {
return make_ready_future<bool>(false);
}
return auth::auth::is_super_user(_name);
}
bool auth::authenticated_user::operator==(const authenticated_user& v) const {
return _anon ? v._anon : _name == v._name;
}

View File

@@ -14,9 +14,12 @@
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by Cloudius Systems.
* Copyright 2015 Cloudius Systems.
* Modified by ScyllaDB
*/
/*
@@ -38,38 +41,42 @@
#pragma once
#include "utils/UUID.hh"
#include "streaming/messages/stream_message.hh"
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
namespace streaming {
namespace messages {
namespace auth {
class received_message : public stream_message {
class authenticated_user {
public:
using UUID = utils::UUID;
UUID cf_id;
int sequence_number;
static const sstring ANONYMOUS_USERNAME;
received_message() = default;
received_message(UUID cf_id_, int sequence_number_)
: stream_message(stream_message::Type::RECEIVED)
, cf_id (cf_id_)
, sequence_number(sequence_number_) {
authenticated_user();
authenticated_user(sstring name);
authenticated_user(authenticated_user&&);
authenticated_user(const authenticated_user&);
const sstring& name() const;
/**
* Checks the user's superuser status.
* Only a superuser is allowed to perform CREATE USER and DROP USER queries.
* Im most cased, though not necessarily, a superuser will have Permission.ALL on every resource
* (depends on IAuthorizer implementation).
*/
future<bool> is_super() const;
/**
* If IAuthenticator doesn't require authentication, this method may return true.
*/
bool is_anonymous() const {
return _anon;
}
#if 0
@Override
public String toString()
{
final StringBuilder sb = new StringBuilder("Received (");
sb.append(cfId).append(", #").append(sequenceNumber).append(')');
return sb.toString();
}
#endif
public:
void serialize(bytes::iterator& out) const;
static received_message deserialize(bytes_view& v);
size_t serialized_size() const;
bool operator==(const authenticated_user&) const;
private:
sstring _name;
bool _anon;
};
} // namespace messages
} // namespace streaming
}

127
auth/authenticator.cc Normal file
View File

@@ -0,0 +1,127 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "authenticator.hh"
#include "authenticated_user.hh"
#include "password_authenticator.hh"
#include "auth.hh"
#include "db/config.hh"
const sstring auth::authenticator::USERNAME_KEY("username");
const sstring auth::authenticator::PASSWORD_KEY("password");
const sstring auth::authenticator::ALLOW_ALL_AUTHENTICATOR_NAME("org.apache.cassandra.auth.AllowAllAuthenticator");
auth::authenticator::option auth::authenticator::string_to_option(const sstring& name) {
if (strcasecmp(name.c_str(), "password") == 0) {
return option::PASSWORD;
}
throw std::invalid_argument(name);
}
sstring auth::authenticator::option_to_string(option opt) {
switch (opt) {
case option::PASSWORD:
return "PASSWORD";
default:
throw std::invalid_argument(sprint("Unknown option {}", opt));
}
}
/**
* Authenticator is assumed to be a fully state-less immutable object (note all the const).
* We thus store a single instance globally, since it should be safe/ok.
*/
static std::unique_ptr<auth::authenticator> global_authenticator;
future<>
auth::authenticator::setup(const sstring& type) throw (exceptions::configuration_exception) {
if (auth::auth::is_class_type(type, ALLOW_ALL_AUTHENTICATOR_NAME)) {
class allow_all_authenticator : public authenticator {
public:
const sstring& class_name() const override {
return ALLOW_ALL_AUTHENTICATOR_NAME;
}
bool require_authentication() const override {
return false;
}
option_set supported_options() const override {
return option_set();
}
option_set alterable_options() const override {
return option_set();
}
future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) override {
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>());
}
future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
const resource_ids& protected_resources() const override {
static const resource_ids ids;
return ids;
}
::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
throw std::runtime_error("Should not reach");
}
};
global_authenticator = std::make_unique<allow_all_authenticator>();
} else if (auth::auth::is_class_type(type, password_authenticator::PASSWORD_AUTHENTICATOR_NAME)) {
auto pwa = std::make_unique<password_authenticator>();
auto f = pwa->init();
return f.then([pwa = std::move(pwa)]() mutable {
global_authenticator = std::move(pwa);
});
} else {
throw exceptions::configuration_exception("Invalid authenticator type: " + type);
}
return make_ready_future();
}
auth::authenticator& auth::authenticator::get() {
assert(global_authenticator);
return *global_authenticator;
}

200
auth/authenticator.hh Normal file
View File

@@ -0,0 +1,200 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <memory>
#include <unordered_map>
#include <set>
#include <stdexcept>
#include <boost/any.hpp>
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include <seastar/core/enum.hh>
#include "bytes.hh"
#include "data_resource.hh"
#include "enum_set.hh"
#include "exceptions/exceptions.hh"
namespace db {
class config;
}
namespace auth {
class authenticated_user;
class authenticator {
public:
static const sstring USERNAME_KEY;
static const sstring PASSWORD_KEY;
static const sstring ALLOW_ALL_AUTHENTICATOR_NAME;
/**
* Supported CREATE USER/ALTER USER options.
* Currently only PASSWORD is available.
*/
enum class option {
PASSWORD
};
static option string_to_option(const sstring&);
static sstring option_to_string(option);
using option_set = enum_set<super_enum<option, option::PASSWORD>>;
using option_map = std::unordered_map<option, boost::any, enum_hash<option>>;
using credentials_map = std::unordered_map<sstring, sstring>;
/**
* Setup is called once upon system startup to initialize the IAuthenticator.
*
* For example, use this method to create any required keyspaces/column families.
* Note: Only call from main thread.
*/
static future<> setup(const sstring& type) throw(exceptions::configuration_exception);
/**
* Returns the system authenticator. Must have called setup before calling this.
*/
static authenticator& get();
virtual ~authenticator()
{}
virtual const sstring& class_name() const = 0;
/**
* Whether or not the authenticator requires explicit login.
* If false will instantiate user with AuthenticatedUser.ANONYMOUS_USER.
*/
virtual bool require_authentication() const = 0;
/**
* Set of options supported by CREATE USER and ALTER USER queries.
* Should never return null - always return an empty set instead.
*/
virtual option_set supported_options() const = 0;
/**
* Subset of supportedOptions that users are allowed to alter when performing ALTER USER [themselves].
* Should never return null - always return an empty set instead.
*/
virtual option_set alterable_options() const = 0;
/**
* Authenticates a user given a Map<String, String> of credentials.
* Should never return null - always throw AuthenticationException instead.
* Returning AuthenticatedUser.ANONYMOUS_USER is an option as well if authentication is not required.
*
* @throws authentication_exception if credentials don't match any known user.
*/
virtual future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) = 0;
/**
* Called during execution of CREATE USER query (also may be called on startup, see seedSuperuserOptions method).
* If authenticator is static then the body of the method should be left blank, but don't throw an exception.
* options are guaranteed to be a subset of supportedOptions().
*
* @param username Username of the user to create.
* @param options Options the user will be created with.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Called during execution of ALTER USER query.
* options are always guaranteed to be a subset of supportedOptions(). Furthermore, if the user performing the query
* is not a superuser and is altering himself, then options are guaranteed to be a subset of alterableOptions().
* Keep the body of the method blank if your implementation doesn't support any options.
*
* @param username Username of the user that will be altered.
* @param options Options to alter.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Called during execution of DROP USER query.
*
* @param username Username of the user that will be dropped.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Set of resources that should be made inaccessible to users and only accessible internally.
*
* @return Keyspaces, column families that will be unmodifiable by users; other resources.
* @see resource_ids
*/
virtual const resource_ids& protected_resources() const = 0;
class sasl_challenge {
public:
virtual ~sasl_challenge() {}
virtual bytes evaluate_response(bytes_view client_response) throw(exceptions::authentication_exception) = 0;
virtual bool is_complete() const = 0;
virtual future<::shared_ptr<authenticated_user>> get_authenticated_user() const throw(exceptions::authentication_exception) = 0;
};
/**
* Provide a sasl_challenge to be used by the CQL binary protocol server. If
* the configured authenticator requires authentication but does not implement this
* interface we refuse to start the binary protocol server as it will have no way
* of authenticating clients.
* @return sasl_challenge implementation
*/
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const = 0;
};
inline std::ostream& operator<<(std::ostream& os, authenticator::option opt) {
return os << authenticator::option_to_string(opt);
}
}

104
auth/authorizer.cc Normal file
View File

@@ -0,0 +1,104 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "authorizer.hh"
#include "authenticated_user.hh"
#include "default_authorizer.hh"
#include "auth.hh"
#include "db/config.hh"
const sstring auth::authorizer::ALLOW_ALL_AUTHORIZER_NAME("org.apache.cassandra.auth.AllowAllAuthorizer");
/**
* Authenticator is assumed to be a fully state-less immutable object (note all the const).
* We thus store a single instance globally, since it should be safe/ok.
*/
static std::unique_ptr<auth::authorizer> global_authorizer;
future<>
auth::authorizer::setup(const sstring& type) {
if (auth::auth::is_class_type(type, ALLOW_ALL_AUTHORIZER_NAME)) {
class allow_all_authorizer : public authorizer {
public:
future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const override {
return make_ready_future<permission_set>(permissions::ALL);
}
future<> grant(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override {
throw exceptions::invalid_request_exception("GRANT operation is not supported by AllowAllAuthorizer");
}
future<> revoke(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override {
throw exceptions::invalid_request_exception("REVOKE operation is not supported by AllowAllAuthorizer");
}
future<std::vector<permission_details>> list(::shared_ptr<authenticated_user> performer, permission_set, optional<data_resource>, optional<sstring>) const override {
throw exceptions::invalid_request_exception("LIST PERMISSIONS operation is not supported by AllowAllAuthorizer");
}
future<> revoke_all(sstring dropped_user) override {
return make_ready_future();
}
future<> revoke_all(data_resource) override {
return make_ready_future();
}
const resource_ids& protected_resources() override {
static const resource_ids ids;
return ids;
}
future<> validate_configuration() const override {
return make_ready_future();
}
};
global_authorizer = std::make_unique<allow_all_authorizer>();
} else if (auth::auth::is_class_type(type, default_authorizer::DEFAULT_AUTHORIZER_NAME)) {
auto da = std::make_unique<default_authorizer>();
auto f = da->init();
return f.then([da = std::move(da)]() mutable {
global_authorizer = std::move(da);
});
} else {
throw exceptions::configuration_exception("Invalid authorizer type: " + type);
}
return make_ready_future();
}
auth::authorizer& auth::authorizer::get() {
assert(global_authorizer);
return *global_authorizer;
}

171
auth/authorizer.hh Normal file
View File

@@ -0,0 +1,171 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <vector>
#include <tuple>
#include <experimental/optional>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include "permission.hh"
#include "data_resource.hh"
namespace auth {
class authenticated_user;
struct permission_details {
sstring user;
data_resource resource;
permission_set permissions;
bool operator<(const permission_details& v) const {
return std::tie(user, resource, permissions) < std::tie(v.user, v.resource, v.permissions);
}
};
using std::experimental::optional;
class authorizer {
public:
static const sstring ALLOW_ALL_AUTHORIZER_NAME;
virtual ~authorizer() {}
/**
* The primary Authorizer method. Returns a set of permissions of a user on a resource.
*
* @param user Authenticated user requesting authorization.
* @param resource Resource for which the authorization is being requested. @see DataResource.
* @return Set of permissions of the user on the resource. Should never return empty. Use permission.NONE instead.
*/
virtual future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const = 0;
/**
* Grants a set of permissions on a resource to a user.
* The opposite of revoke().
*
* @param performer User who grants the permissions.
* @param permissions Set of permissions to grant.
* @param to Grantee of the permissions.
* @param resource Resource on which to grant the permissions.
*
* @throws RequestValidationException
* @throws RequestExecutionException
*/
virtual future<> grant(::shared_ptr<authenticated_user> performer, permission_set, data_resource, sstring to) = 0;
/**
* Revokes a set of permissions on a resource from a user.
* The opposite of grant().
*
* @param performer User who revokes the permissions.
* @param permissions Set of permissions to revoke.
* @param from Revokee of the permissions.
* @param resource Resource on which to revoke the permissions.
*
* @throws RequestValidationException
* @throws RequestExecutionException
*/
virtual future<> revoke(::shared_ptr<authenticated_user> performer, permission_set, data_resource, sstring from) = 0;
/**
* Returns a list of permissions on a resource of a user.
*
* @param performer User who wants to see the permissions.
* @param permissions Set of Permission values the user is interested in. The result should only include the matching ones.
* @param resource The resource on which permissions are requested. Can be null, in which case permissions on all resources
* should be returned.
* @param of The user whose permissions are requested. Can be null, in which case permissions of every user should be returned.
*
* @return All of the matching permission that the requesting user is authorized to know about.
*
* @throws RequestValidationException
* @throws RequestExecutionException
*/
virtual future<std::vector<permission_details>> list(::shared_ptr<authenticated_user> performer, permission_set, optional<data_resource>, optional<sstring>) const = 0;
/**
* This method is called before deleting a user with DROP USER query so that a new user with the same
* name wouldn't inherit permissions of the deleted user in the future.
*
* @param droppedUser The user to revoke all permissions from.
*/
virtual future<> revoke_all(sstring dropped_user) = 0;
/**
* This method is called after a resource is removed (i.e. keyspace or a table is dropped).
*
* @param droppedResource The resource to revoke all permissions on.
*/
virtual future<> revoke_all(data_resource) = 0;
/**
* Set of resources that should be made inaccessible to users and only accessible internally.
*
* @return Keyspaces, column families that will be unmodifiable by users; other resources.
*/
virtual const resource_ids& protected_resources() = 0;
/**
* Validates configuration of IAuthorizer implementation (if configurable).
*
* @throws ConfigurationException when there is a configuration error.
*/
virtual future<> validate_configuration() const = 0;
/**
* Setup is called once upon system startup to initialize the IAuthorizer.
*
* For example, use this method to create any required keyspaces/column families.
*/
static future<> setup(const sstring& type);
/**
* Returns the system authorizer. Must have called setup before calling this.
*/
static authorizer& get();
};
}

183
auth/data_resource.cc Normal file
View File

@@ -0,0 +1,183 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "data_resource.hh"
#include <regex>
#include "service/storage_proxy.hh"
const sstring auth::data_resource::ROOT_NAME("data");
auth::data_resource::data_resource(level l, const sstring& ks, const sstring& cf)
: _ks(ks), _cf(cf)
{
if (l != get_level()) {
throw std::invalid_argument("level/keyspace/column mismatch");
}
}
auth::data_resource::data_resource()
: data_resource(level::ROOT)
{}
auth::data_resource::data_resource(const sstring& ks)
: data_resource(level::KEYSPACE, ks)
{}
auth::data_resource::data_resource(const sstring& ks, const sstring& cf)
: data_resource(level::COLUMN_FAMILY, ks, cf)
{}
auth::data_resource::level auth::data_resource::get_level() const {
if (!_cf.empty()) {
assert(!_ks.empty());
return level::COLUMN_FAMILY;
}
if (!_ks.empty()) {
return level::KEYSPACE;
}
return level::ROOT;
}
auth::data_resource auth::data_resource::from_name(
const sstring& s) {
static std::regex slash_regex("/");
auto i = std::regex_token_iterator<sstring::const_iterator>(s.begin(),
s.end(), slash_regex, -1);
auto e = std::regex_token_iterator<sstring::const_iterator>();
auto n = std::distance(i, e);
if (n > 3 || ROOT_NAME != sstring(*i++)) {
throw std::invalid_argument(sprint("%s is not a valid data resource name", s));
}
if (n == 1) {
return data_resource();
}
auto ks = *i++;
if (n == 2) {
return data_resource(ks.str());
}
auto cf = *i++;
return data_resource(ks.str(), cf.str());
}
sstring auth::data_resource::name() const {
switch (get_level()) {
case level::ROOT:
return ROOT_NAME;
case level::KEYSPACE:
return sprint("%s/%s", ROOT_NAME, _ks);
case level::COLUMN_FAMILY:
default:
return sprint("%s/%s/%s", ROOT_NAME, _ks, _cf);
}
}
auth::data_resource auth::data_resource::get_parent() const {
switch (get_level()) {
case level::KEYSPACE:
return data_resource();
case level::COLUMN_FAMILY:
return data_resource(_ks);
default:
throw std::invalid_argument("Root-level resource can't have a parent");
}
}
const sstring& auth::data_resource::keyspace() const
throw (std::invalid_argument) {
if (is_root_level()) {
throw std::invalid_argument("ROOT data resource has no keyspace");
}
return _ks;
}
const sstring& auth::data_resource::column_family() const
throw (std::invalid_argument) {
if (!is_column_family_level()) {
throw std::invalid_argument(sprint("%s data resource has no column family", name()));
}
return _cf;
}
bool auth::data_resource::has_parent() const {
return !is_root_level();
}
bool auth::data_resource::exists() const {
switch (get_level()) {
case level::ROOT:
return true;
case level::KEYSPACE:
return service::get_local_storage_proxy().get_db().local().has_keyspace(_ks);
case level::COLUMN_FAMILY:
default:
return service::get_local_storage_proxy().get_db().local().has_schema(_ks, _cf);
}
}
sstring auth::data_resource::to_string() const {
switch (get_level()) {
case level::ROOT:
return "<all keyspaces>";
case level::KEYSPACE:
return sprint("<keyspace %s>", _ks);
case level::COLUMN_FAMILY:
default:
return sprint("<table %s.%s>", _ks, _cf);
}
}
bool auth::data_resource::operator==(const data_resource& v) const {
return _ks == v._ks && _cf == v._cf;
}
bool auth::data_resource::operator<(const data_resource& v) const {
return _ks < v._ks ? true : (v._ks < _ks ? false : _cf < v._cf);
}
std::ostream& auth::operator<<(std::ostream& os, const data_resource& r) {
return os << r.to_string();
}

157
auth/data_resource.hh Normal file
View File

@@ -0,0 +1,157 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "utils/hash.hh"
#include <iosfwd>
#include <set>
#include <seastar/core/sstring.hh>
namespace auth {
class data_resource {
private:
enum class level {
ROOT, KEYSPACE, COLUMN_FAMILY
};
static const sstring ROOT_NAME;
sstring _ks;
sstring _cf;
data_resource(level, const sstring& ks = {}, const sstring& cf = {});
level get_level() const;
public:
/**
* Creates a DataResource representing the root-level resource.
* @return the root-level resource.
*/
data_resource();
/**
* Creates a DataResource representing a keyspace.
*
* @param keyspace Name of the keyspace.
*/
data_resource(const sstring& ks);
/**
* Creates a DataResource instance representing a column family.
*
* @param keyspace Name of the keyspace.
* @param columnFamily Name of the column family.
*/
data_resource(const sstring& ks, const sstring& cf);
/**
* Parses a data resource name into a DataResource instance.
*
* @param name Name of the data resource.
* @return DataResource instance matching the name.
*/
static data_resource from_name(const sstring&);
/**
* @return Printable name of the resource.
*/
sstring name() const;
/**
* @return Parent of the resource, if any. Throws IllegalStateException if it's the root-level resource.
*/
data_resource get_parent() const;
bool is_root_level() const {
return get_level() == level::ROOT;
}
bool is_keyspace_level() const {
return get_level() == level::KEYSPACE;
}
bool is_column_family_level() const {
return get_level() == level::COLUMN_FAMILY;
}
/**
* @return keyspace of the resource.
* @throws std::invalid_argument if it's the root-level resource.
*/
const sstring& keyspace() const throw(std::invalid_argument);
/**
* @return column family of the resource.
* @throws std::invalid_argument if it's not a cf-level resource.
*/
const sstring& column_family() const throw(std::invalid_argument);
/**
* @return Whether or not the resource has a parent in the hierarchy.
*/
bool has_parent() const;
/**
* @return Whether or not the resource exists in scylla.
*/
bool exists() const;
sstring to_string() const;
bool operator==(const data_resource&) const;
bool operator<(const data_resource&) const;
size_t hash_value() const {
return utils::tuple_hash()(_ks, _cf);
}
};
/**
* Resource id mappings, i.e. keyspace and/or column families.
*/
using resource_ids = std::set<data_resource>;
std::ostream& operator<<(std::ostream&, const data_resource&);
}

240
auth/default_authorizer.cc Normal file
View File

@@ -0,0 +1,240 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <unistd.h>
#include <crypt.h>
#include <random>
#include <chrono>
#include <seastar/core/reactor.hh>
#include "auth.hh"
#include "default_authorizer.hh"
#include "authenticated_user.hh"
#include "permission.hh"
#include "cql3/query_processor.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
const sstring auth::default_authorizer::DEFAULT_AUTHORIZER_NAME(
"org.apache.cassandra.auth.CassandraAuthorizer");
static const sstring USER_NAME = "username";
static const sstring RESOURCE_NAME = "resource";
static const sstring PERMISSIONS_NAME = "permissions";
static const sstring PERMISSIONS_CF = "permissions";
static logging::logger logger("default_authorizer");
auth::default_authorizer::default_authorizer() {
}
auth::default_authorizer::~default_authorizer() {
}
future<> auth::default_authorizer::init() {
sstring create_table = sprint("CREATE TABLE %s.%s ("
"%s text,"
"%s text,"
"%s set<text>,"
"PRIMARY KEY(%s, %s)"
") WITH gc_grace_seconds=%d", auth::auth::AUTH_KS,
PERMISSIONS_CF, USER_NAME, RESOURCE_NAME, PERMISSIONS_NAME,
USER_NAME, RESOURCE_NAME, 90 * 24 * 60 * 60); // 3 months.
return auth::setup_table(PERMISSIONS_CF, create_table);
}
future<auth::permission_set> auth::default_authorizer::authorize(
::shared_ptr<authenticated_user> user, data_resource resource) const {
return user->is_super().then([this, user, resource = std::move(resource)](bool is_super) {
if (is_super) {
return make_ready_future<permission_set>(permissions::ALL);
}
/**
* TOOD: could create actual data type for permission (translating string<->perm),
* but this seems overkill right now. We still must store strings so...
*/
auto& qp = cql3::get_local_query_processor();
auto query = sprint("SELECT %s FROM %s.%s WHERE %s = ? AND %s = ?"
, PERMISSIONS_NAME, auth::AUTH_KS, PERMISSIONS_CF, USER_NAME, RESOURCE_NAME);
return qp.process(query, db::consistency_level::LOCAL_ONE, {user->name(), resource.name() })
.then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
if (res->empty() || !res->one().has(PERMISSIONS_NAME)) {
return make_ready_future<permission_set>(permissions::NONE);
}
return make_ready_future<permission_set>(permissions::from_strings(res->one().get_set<sstring>(PERMISSIONS_NAME)));
} catch (exceptions::request_execution_exception& e) {
logger.warn("CassandraAuthorizer failed to authorize {} for {}", user->name(), resource);
return make_ready_future<permission_set>(permissions::NONE);
}
});
});
}
#include <boost/range.hpp>
future<> auth::default_authorizer::modify(
::shared_ptr<authenticated_user> performer, permission_set set,
data_resource resource, sstring user, sstring op) {
// TODO: why does this not check super user?
auto& qp = cql3::get_local_query_processor();
auto query = sprint("UPDATE %s.%s SET %s = %s %s ? WHERE %s = ? AND %s = ?",
auth::AUTH_KS, PERMISSIONS_CF, PERMISSIONS_NAME,
PERMISSIONS_NAME, op, USER_NAME, RESOURCE_NAME);
return qp.process(query, db::consistency_level::ONE, {
permissions::to_strings(set), user, resource.name() }).discard_result();
}
future<> auth::default_authorizer::grant(
::shared_ptr<authenticated_user> performer, permission_set set,
data_resource resource, sstring to) {
return modify(std::move(performer), std::move(set), std::move(resource), std::move(to), "+");
}
future<> auth::default_authorizer::revoke(
::shared_ptr<authenticated_user> performer, permission_set set,
data_resource resource, sstring from) {
return modify(std::move(performer), std::move(set), std::move(resource), std::move(from), "-");
}
future<std::vector<auth::permission_details>> auth::default_authorizer::list(
::shared_ptr<authenticated_user> performer, permission_set set,
optional<data_resource> resource, optional<sstring> user) const {
return performer->is_super().then([this, performer, set = std::move(set), resource = std::move(resource), user = std::move(user)](bool is_super) {
if (!is_super && (!user || performer->name() != *user)) {
throw exceptions::unauthorized_exception(sprint("You are not authorized to view %s's permissions", user ? *user : "everyone"));
}
auto query = sprint("SELECT %s, %s, %s FROM %s.%s", USER_NAME, RESOURCE_NAME, PERMISSIONS_NAME, auth::AUTH_KS, PERMISSIONS_CF);
auto& qp = cql3::get_local_query_processor();
// Oh, look, it is a case where it does not pay off to have
// parameters to process in an initializer list.
future<::shared_ptr<cql3::untyped_result_set>> f = make_ready_future<::shared_ptr<cql3::untyped_result_set>>();
if (resource && user) {
query += sprint(" WHERE %s = ? AND %s = ?", USER_NAME, RESOURCE_NAME);
f = qp.process(query, db::consistency_level::ONE, {*user, resource->name()});
} else if (resource) {
query += sprint(" WHERE %s = ? ALLOW FILTERING", RESOURCE_NAME);
f = qp.process(query, db::consistency_level::ONE, {resource->name()});
} else if (user) {
query += sprint(" WHERE %s = ?", USER_NAME);
f = qp.process(query, db::consistency_level::ONE, {*user});
} else {
f = qp.process(query, db::consistency_level::ONE, {});
}
return f.then([set](::shared_ptr<cql3::untyped_result_set> res) {
std::vector<permission_details> result;
for (auto& row : *res) {
if (row.has(PERMISSIONS_NAME)) {
auto username = row.get_as<sstring>(USER_NAME);
auto resource = data_resource::from_name(row.get_as<sstring>(RESOURCE_NAME));
auto ps = permissions::from_strings(row.get_set<sstring>(PERMISSIONS_NAME));
ps = permission_set::from_mask(ps.mask() & set.mask());
result.emplace_back(permission_details {username, resource, ps});
}
}
return make_ready_future<std::vector<permission_details>>(std::move(result));
});
});
}
future<> auth::default_authorizer::revoke_all(sstring dropped_user) {
auto& qp = cql3::get_local_query_processor();
auto query = sprint("DELETE FROM %s.%s WHERE %s = ?", auth::AUTH_KS,
PERMISSIONS_CF, USER_NAME);
return qp.process(query, db::consistency_level::ONE, { dropped_user }).discard_result().handle_exception(
[dropped_user](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception& e) {
logger.warn("CassandraAuthorizer failed to revoke all permissions of {}: {}", dropped_user, e);
}
});
}
future<> auth::default_authorizer::revoke_all(data_resource resource) {
auto& qp = cql3::get_local_query_processor();
auto query = sprint("SELECT %s FROM %s.%s WHERE %s = ? ALLOW FILTERING",
USER_NAME, auth::AUTH_KS, PERMISSIONS_CF, RESOURCE_NAME);
return qp.process(query, db::consistency_level::LOCAL_ONE, { resource.name() })
.then_wrapped([resource, &qp](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
return parallel_for_each(res->begin(), res->end(), [&qp, res, resource](const cql3::untyped_result_set::row& r) {
auto query = sprint("DELETE FROM %s.%s WHERE %s = ? AND %s = ?"
, auth::AUTH_KS, PERMISSIONS_CF, USER_NAME, RESOURCE_NAME);
return qp.process(query, db::consistency_level::LOCAL_ONE, { r.get_as<sstring>(USER_NAME), resource.name() })
.discard_result().handle_exception([resource](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception& e) {
logger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", resource, e);
}
});
});
} catch (exceptions::request_execution_exception& e) {
logger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", resource, e);
return make_ready_future();
}
});
}
const auth::resource_ids& auth::default_authorizer::protected_resources() {
static const resource_ids ids({ data_resource(auth::AUTH_KS, PERMISSIONS_CF) });
return ids;
}
future<> auth::default_authorizer::validate_configuration() const {
return make_ready_future();
}

View File

@@ -0,0 +1,78 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "authorizer.hh"
namespace auth {
class default_authorizer : public authorizer {
public:
static const sstring DEFAULT_AUTHORIZER_NAME;
default_authorizer();
~default_authorizer();
future<> init();
future<permission_set> authorize(::shared_ptr<authenticated_user>, data_resource) const override;
future<> grant(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override;
future<> revoke(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring) override;
future<std::vector<permission_details>> list(::shared_ptr<authenticated_user>, permission_set, optional<data_resource>, optional<sstring>) const override;
future<> revoke_all(sstring) override;
future<> revoke_all(data_resource) override;
const resource_ids& protected_resources() override;
future<> validate_configuration() const override;
private:
future<> modify(::shared_ptr<authenticated_user>, permission_set, data_resource, sstring, sstring);
};
} /* namespace auth */

View File

@@ -0,0 +1,358 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <unistd.h>
#include <crypt.h>
#include <random>
#include <chrono>
#include <seastar/core/reactor.hh>
#include "auth.hh"
#include "password_authenticator.hh"
#include "authenticated_user.hh"
#include "cql3/query_processor.hh"
#include "log.hh"
const sstring auth::password_authenticator::PASSWORD_AUTHENTICATOR_NAME("org.apache.cassandra.auth.PasswordAuthenticator");
// name of the hash column.
static const sstring SALTED_HASH = "salted_hash";
static const sstring USER_NAME = "username";
static const sstring DEFAULT_USER_NAME = auth::auth::DEFAULT_SUPERUSER_NAME;
static const sstring DEFAULT_USER_PASSWORD = auth::auth::DEFAULT_SUPERUSER_NAME;
static const sstring CREDENTIALS_CF = "credentials";
static logging::logger logger("password_authenticator");
auth::password_authenticator::~password_authenticator()
{}
auth::password_authenticator::password_authenticator()
{}
// TODO: blowfish
// Origin uses Java bcrypt library, i.e. blowfish salt
// generation and hashing, which is arguably a "better"
// password hash than sha/md5 versions usually available in
// crypt_r. Otoh, glibc 2.7+ uses a modified sha512 algo
// which should be the same order of safe, so the only
// real issue should be salted hash compatibility with
// origin if importing system tables from there.
//
// Since bcrypt/blowfish is _not_ (afaict) not available
// as a dev package/lib on most linux distros, we'd have to
// copy and compile for example OWL crypto
// (http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/glibc/crypt_blowfish/)
// to be fully bit-compatible.
//
// Until we decide this is needed, let's just use crypt_r,
// and some old-fashioned random salt generation.
static constexpr size_t rand_bytes = 16;
static sstring hashpw(const sstring& pass, const sstring& salt) {
// crypt_data is huge. should this be a thread_local static?
auto tmp = std::make_unique<crypt_data>();
tmp->initialized = 0;
auto res = crypt_r(pass.c_str(), salt.c_str(), tmp.get());
if (res == nullptr) {
throw std::system_error(errno, std::system_category());
}
return res;
}
static bool checkpw(const sstring& pass, const sstring& salted_hash) {
auto tmp = hashpw(pass, salted_hash);
return tmp == salted_hash;
}
static sstring gensalt() {
static sstring prefix;
std::random_device rd;
std::default_random_engine e1(rd());
std::uniform_int_distribution<char> dist;
sstring valid_salt = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./";
sstring input(rand_bytes, 0);
for (char&c : input) {
c = valid_salt[dist(e1) % valid_salt.size()];
}
sstring salt;
if (!prefix.empty()) {
return prefix + salt;
}
auto tmp = std::make_unique<crypt_data>();
tmp->initialized = 0;
// Try in order:
// blowfish 2011 fix, blowfish, sha512, sha256, md5
for (sstring pfx : { "$2y$", "$2a$", "$6$", "$5$", "$1$" }) {
salt = pfx + input;
if (crypt_r("fisk", salt.c_str(), tmp.get())) {
prefix = pfx;
return salt;
}
}
throw std::runtime_error("Could not initialize hashing algorithm");
}
static sstring hashpw(const sstring& pass) {
return hashpw(pass, gensalt());
}
future<> auth::password_authenticator::init() {
gensalt(); // do this once to determine usable hashing
sstring create_table = sprint(
"CREATE TABLE %s.%s ("
"%s text,"
"%s text," // salt + hash + number of rounds
"options map<text,text>,"// for future extensions
"PRIMARY KEY(%s)"
") WITH gc_grace_seconds=%d",
auth::auth::AUTH_KS,
CREDENTIALS_CF, USER_NAME, SALTED_HASH, USER_NAME,
90 * 24 * 60 * 60); // 3 months.
return auth::setup_table(CREDENTIALS_CF, create_table).then([this] {
// instead of once-timer, just schedule this later
auth::schedule_when_up([] {
return auth::has_existing_users(CREDENTIALS_CF, DEFAULT_USER_NAME, USER_NAME).then([](bool exists) {
if (!exists) {
cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
auth::AUTH_KS,
CREDENTIALS_CF,
USER_NAME, SALTED_HASH
),
db::consistency_level::ONE, {DEFAULT_USER_NAME, hashpw(DEFAULT_USER_PASSWORD)}).then([](auto) {
logger.info("Created default user '{}'", DEFAULT_USER_NAME);
});
}
});
});
});
}
db::consistency_level auth::password_authenticator::consistency_for_user(const sstring& username) {
if (username == DEFAULT_USER_NAME) {
return db::consistency_level::QUORUM;
}
return db::consistency_level::LOCAL_ONE;
}
const sstring& auth::password_authenticator::class_name() const {
return PASSWORD_AUTHENTICATOR_NAME;
}
bool auth::password_authenticator::require_authentication() const {
return true;
}
auth::authenticator::option_set auth::password_authenticator::supported_options() const {
return option_set::of<option::PASSWORD>();
}
auth::authenticator::option_set auth::password_authenticator::alterable_options() const {
return option_set::of<option::PASSWORD>();
}
future<::shared_ptr<auth::authenticated_user> > auth::password_authenticator::authenticate(
const credentials_map& credentials) const
throw (exceptions::authentication_exception) {
if (!credentials.count(USERNAME_KEY)) {
throw exceptions::authentication_exception(sprint("Required key '%s' is missing", USERNAME_KEY));
}
if (!credentials.count(PASSWORD_KEY)) {
throw exceptions::authentication_exception(sprint("Required key '%s' is missing", PASSWORD_KEY));
}
auto& username = credentials.at(USERNAME_KEY);
auto& password = credentials.at(PASSWORD_KEY);
// Here was a thread local, explicit cache of prepared statement. In normal execution this is
// fine, but since we in testing set up and tear down system over and over, we'd start using
// obsolete prepared statements pretty quickly.
// Rely on query processing caching statements instead, and lets assume
// that a map lookup string->statement is not gonna kill us much.
auto& qp = cql3::get_local_query_processor();
return qp.process(
sprint("SELECT %s FROM %s.%s WHERE %s = ?", SALTED_HASH,
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME),
consistency_for_user(username), { username }, true).then_wrapped(
[=](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
if (res->empty() || !checkpw(password, res->one().get_as<sstring>(SALTED_HASH))) {
throw exceptions::authentication_exception("Username and/or password are incorrect");
}
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>(username));
} catch (std::system_error &) {
std::throw_with_nested(exceptions::authentication_exception("Could not verify password"));
} catch (exceptions::request_execution_exception& e) {
std::throw_with_nested(exceptions::authentication_exception(e.what()));
}
});
}
future<> auth::password_authenticator::create(sstring username,
const option_map& options)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto password = boost::any_cast<sstring>(options.at(option::PASSWORD));
auto query = sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME, SALTED_HASH);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { username, hashpw(password) }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
}
future<> auth::password_authenticator::alter(sstring username,
const option_map& options)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto password = boost::any_cast<sstring>(options.at(option::PASSWORD));
auto query = sprint("UPDATE %s.%s SET %s = ? WHERE %s = ?",
auth::AUTH_KS, CREDENTIALS_CF, SALTED_HASH, USER_NAME);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { hashpw(password), username }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
}
future<> auth::password_authenticator::drop(sstring username)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto query = sprint("DELETE FROM %s.%s WHERE %s = ?",
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { username }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
}
const auth::resource_ids& auth::password_authenticator::protected_resources() const {
static const resource_ids ids({ data_resource(auth::AUTH_KS, CREDENTIALS_CF) });
return ids;
}
::shared_ptr<auth::authenticator::sasl_challenge> auth::password_authenticator::new_sasl_challenge() const {
class plain_text_password_challenge: public sasl_challenge {
public:
plain_text_password_challenge(const password_authenticator& a)
: _authenticator(a)
{}
/**
* SASL PLAIN mechanism specifies that credentials are encoded in a
* sequence of UTF-8 bytes, delimited by 0 (US-ASCII NUL).
* The form is : {code}authzId<NUL>authnId<NUL>password<NUL>{code}
* authzId is optional, and in fact we don't care about it here as we'll
* set the authzId to match the authnId (that is, there is no concept of
* a user being authorized to act on behalf of another).
*
* @param bytes encoded credentials string sent by the client
* @return map containing the username/password pairs in the form an IAuthenticator
* would expect
* @throws javax.security.sasl.SaslException
*/
bytes evaluate_response(bytes_view client_response)
throw (exceptions::authentication_exception) override {
logger.debug("Decoding credentials from client token");
sstring username, password;
auto b = client_response.crbegin();
auto e = client_response.crend();
auto i = b;
while (i != e) {
if (*i == 0) {
sstring tmp(i.base(), b.base());
if (password.empty()) {
password = std::move(tmp);
} else if (username.empty()) {
username = std::move(tmp);
}
b = ++i;
continue;
}
++i;
}
if (username.empty()) {
throw exceptions::authentication_exception("Authentication ID must not be null");
}
if (password.empty()) {
throw exceptions::authentication_exception("Password must not be null");
}
_credentials[USERNAME_KEY] = std::move(username);
_credentials[PASSWORD_KEY] = std::move(password);
_complete = true;
return {};
}
bool is_complete() const override {
return _complete;
}
future<::shared_ptr<authenticated_user>> get_authenticated_user() const
throw (exceptions::authentication_exception) override {
return _authenticator.authenticate(_credentials);
}
private:
const password_authenticator& _authenticator;
credentials_map _credentials;
bool _complete = false;
};
return ::make_shared<plain_text_password_challenge>(*this);
}

View File

@@ -0,0 +1,73 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "authenticator.hh"
namespace auth {
class password_authenticator : public authenticator {
public:
static const sstring PASSWORD_AUTHENTICATOR_NAME;
password_authenticator();
~password_authenticator();
future<> init();
const sstring& class_name() const override;
bool require_authentication() const override;
option_set supported_options() const override;
option_set alterable_options() const override;
future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) override;
future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
const resource_ids& protected_resources() const override;
::shared_ptr<sasl_challenge> new_sasl_challenge() const override;
static db::consistency_level consistency_for_user(const sstring& username);
};
}

101
auth/permission.cc Normal file
View File

@@ -0,0 +1,101 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <unordered_map>
#include "permission.hh"
const auth::permission_set auth::permissions::ALL_DATA =
auth::permission_set::of<auth::permission::CREATE,
auth::permission::ALTER, auth::permission::DROP,
auth::permission::SELECT,
auth::permission::MODIFY,
auth::permission::AUTHORIZE>();
const auth::permission_set auth::permissions::ALL = auth::permissions::ALL_DATA;
const auth::permission_set auth::permissions::NONE;
const auth::permission_set auth::permissions::ALTERATIONS =
auth::permission_set::of<auth::permission::CREATE,
auth::permission::ALTER, auth::permission::DROP>();
static const std::unordered_map<sstring, auth::permission> permission_names({
{ "READ", auth::permission::READ },
{ "WRITE", auth::permission::WRITE },
{ "CREATE", auth::permission::CREATE },
{ "ALTER", auth::permission::ALTER },
{ "DROP", auth::permission::DROP },
{ "SELECT", auth::permission::SELECT },
{ "MODIFY", auth::permission::MODIFY },
{ "AUTHORIZE", auth::permission::AUTHORIZE },
});
const sstring& auth::permissions::to_string(permission p) {
for (auto& v : permission_names) {
if (v.second == p) {
return v.first;
}
}
throw std::out_of_range("unknown permission");
}
auth::permission auth::permissions::from_string(const sstring& s) {
return permission_names.at(s);
}
std::unordered_set<sstring> auth::permissions::to_strings(const permission_set& set) {
std::unordered_set<sstring> res;
for (auto& v : permission_names) {
if (set.contains(v.second)) {
res.emplace(v.first);
}
}
return res;
}
auth::permission_set auth::permissions::from_strings(const std::unordered_set<sstring>& set) {
permission_set res = auth::permissions::NONE;
for (auto& s : set) {
res.set(from_string(s));
}
return res;
}
bool auth::operator<(const permission_set& p1, const permission_set& p2) {
return p1.mask() < p2.mask();
}

98
auth/permission.hh Normal file
View File

@@ -0,0 +1,98 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <unordered_set>
#include <seastar/core/sstring.hh>
#include "enum_set.hh"
namespace auth {
enum class permission {
//Deprecated
READ,
//Deprecated
WRITE,
// schema management
CREATE, // required for CREATE KEYSPACE and CREATE TABLE.
ALTER, // required for ALTER KEYSPACE, ALTER TABLE, CREATE INDEX, DROP INDEX.
DROP, // required for DROP KEYSPACE and DROP TABLE.
// data access
SELECT, // required for SELECT.
MODIFY, // required for INSERT, UPDATE, DELETE, TRUNCATE.
// permission management
AUTHORIZE, // required for GRANT and REVOKE.
};
typedef enum_set<super_enum<permission,
permission::READ,
permission::WRITE,
permission::CREATE,
permission::ALTER,
permission::DROP,
permission::SELECT,
permission::MODIFY,
permission::AUTHORIZE>> permission_set;
bool operator<(const permission_set&, const permission_set&);
namespace permissions {
extern const permission_set ALL_DATA;
extern const permission_set ALL;
extern const permission_set NONE;
extern const permission_set ALTERATIONS;
const sstring& to_string(permission);
permission from_string(const sstring&);
std::unordered_set<sstring> to_strings(const permission_set&);
permission_set from_strings(const std::unordered_set<sstring>&);
}
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright (C) 2014 Cloudius Systems, Ltd.
* Copyright (C) 2014 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -22,6 +22,7 @@
#pragma once
#include "core/sstring.hh"
#include "hashing.hh"
#include <experimental/optional>
#include <iosfwd>
#include <functional>
@@ -57,3 +58,20 @@ std::ostream& operator<<(std::ostream& os, const bytes_view& b);
}
template<>
struct appending_hash<bytes> {
template<typename Hasher>
void operator()(Hasher& h, const bytes& v) const {
feed_hash(h, v.size());
h.update(reinterpret_cast<const char*>(v.cbegin()), v.size() * sizeof(bytes::value_type));
}
};
template<>
struct appending_hash<bytes_view> {
template<typename Hasher>
void operator()(Hasher& h, bytes_view v) const {
feed_hash(h, v.size());
h.update(reinterpret_cast<const char*>(v.begin()), v.size() * sizeof(bytes_view::value_type));
}
};

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -21,10 +21,12 @@
#pragma once
#include "types.hh"
#include "net/byteorder.hh"
#include "core/unaligned.hh"
#include <boost/range/iterator_range.hpp>
#include "bytes.hh"
#include "core/unaligned.hh"
#include "hashing.hh"
#include "seastar/core/simple-stream.hh"
/**
* Utility for writing data into a buffer when its final size is not known up front.
*
@@ -33,12 +35,22 @@
*
*/
class bytes_ostream {
public:
using size_type = bytes::size_type;
using value_type = bytes::value_type;
private:
static_assert(sizeof(value_type) == 1, "value_type is assumed to be one byte long");
struct chunk {
// FIXME: group fragment pointers to reduce pointer chasing when packetizing
std::unique_ptr<chunk> next;
~chunk() {
auto p = std::move(next);
while (p) {
// Avoid recursion when freeing chunks
auto p_next = std::move(p->next);
p = std::move(p_next);
}
}
size_type offset; // Also means "size" after chunk is closed
size_type size;
value_type data[0];
@@ -117,13 +129,13 @@ private:
};
}
public:
bytes_ostream()
bytes_ostream() noexcept
: _begin()
, _current(nullptr)
, _size(0)
{ }
bytes_ostream(bytes_ostream&& o)
bytes_ostream(bytes_ostream&& o) noexcept
: _begin(std::move(o._begin))
, _current(o._current)
, _size(o._size)
@@ -148,7 +160,7 @@ public:
return *this;
}
bytes_ostream& operator=(bytes_ostream&& o) {
bytes_ostream& operator=(bytes_ostream&& o) noexcept {
_size = o._size;
_begin = std::move(o._begin);
_current = o._current;
@@ -160,16 +172,12 @@ public:
template <typename T>
struct place_holder {
value_type* ptr;
// makes the place_holder looks like a stream
seastar::simple_output_stream get_stream() {
return seastar::simple_output_stream{reinterpret_cast<char*>(ptr)};
}
};
// Writes given values in big-endian format
template <typename T>
inline
std::enable_if_t<std::is_fundamental<T>::value, void>
write(T val) {
*reinterpret_cast<unaligned<T>*>(alloc(sizeof(T))) = net::hton(val);
}
// Returns a place holder for a value to be written later.
template <typename T>
inline
@@ -203,17 +211,8 @@ public:
}
}
// Writes given sequence of bytes with a preceding length component encoded in big-endian format
inline void write_blob(bytes_view v) {
assert((size_type)v.size() == v.size());
write<size_type>(v.size());
write(v);
}
// Writes given value into the place holder in big-endian format
template <typename T>
inline void set(place_holder<T> ph, T val) {
*reinterpret_cast<unaligned<T>*>(ph.ptr) = net::hton(val);
void write(const char* ptr, size_t size) {
write(bytes_view(reinterpret_cast<const signed char*>(ptr), size));
}
bool is_linearized() const {
@@ -330,3 +329,13 @@ public:
_current->offset = pos._offset;
}
};
template<>
struct appending_hash<bytes_ostream> {
template<typename Hasher>
void operator()(Hasher& h, const bytes_ostream& b) const {
for (auto&& frag : b.fragments()) {
feed_hash(h, frag);
}
}
};

View File

@@ -1,5 +1,5 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -43,11 +43,13 @@ class caching_options {
throw exceptions::configuration_exception("Invalid key value: " + k);
}
try {
boost::lexical_cast<unsigned long>(r);
} catch (boost::bad_lexical_cast& e) {
if ((r != "ALL") && (r != "NONE")) {
throw exceptions::configuration_exception("Invalid key value: " + k);
if ((r == "ALL") || (r == "NONE")) {
return;
} else {
try {
boost::lexical_cast<unsigned long>(r);
} catch (boost::bad_lexical_cast& e) {
throw exceptions::configuration_exception("Invalid key value: " + r);
}
}
}
@@ -80,6 +82,12 @@ public:
}
return caching_options(k, r);
}
bool operator==(const caching_options& other) const {
return _key_cache == other._key_cache && _row_cache == other._row_cache;
}
bool operator!=(const caching_options& other) const {
return !(*this == other);
}
};

89
canonical_mutation.cc Normal file
View File

@@ -0,0 +1,89 @@
/*
* Copyright (C) 2015 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "canonical_mutation.hh"
#include "mutation.hh"
#include "mutation_partition_serializer.hh"
#include "converting_mutation_partition_applier.hh"
#include "hashing_partition_visitor.hh"
#include "utils/UUID.hh"
#include "serializer.hh"
#include "idl/uuid.dist.hh"
#include "idl/keys.dist.hh"
#include "idl/mutation.dist.hh"
#include "serializer_impl.hh"
#include "serialization_visitors.hh"
#include "idl/uuid.dist.impl.hh"
#include "idl/keys.dist.impl.hh"
#include "idl/mutation.dist.impl.hh"
canonical_mutation::canonical_mutation(bytes data)
: _data(std::move(data))
{ }
canonical_mutation::canonical_mutation(const mutation& m)
{
mutation_partition_serializer part_ser(*m.schema(), m.partition());
bytes_ostream out;
ser::writer_of_canonical_mutation wr(out);
std::move(wr).write_table_id(m.schema()->id())
.write_schema_version(m.schema()->version())
.write_key(m.key())
.write_mapping(m.schema()->get_column_mapping())
.partition([&] (auto wr) {
part_ser.write(std::move(wr));
}).end_canonical_mutation();
_data = to_bytes(out.linearize());
}
utils::UUID canonical_mutation::column_family_id() const {
auto in = ser::as_input_stream(_data);
auto mv = ser::deserialize(in, boost::type<ser::canonical_mutation_view>());
return mv.table_id();
}
mutation canonical_mutation::to_mutation(schema_ptr s) const {
auto in = ser::as_input_stream(_data);
auto mv = ser::deserialize(in, boost::type<ser::canonical_mutation_view>());
auto cf_id = mv.table_id();
if (s->id() != cf_id) {
throw std::runtime_error(sprint("Attempted to deserialize canonical_mutation of table %s with schema of table %s (%s.%s)",
cf_id, s->id(), s->ks_name(), s->cf_name()));
}
auto version = mv.schema_version();
auto pk = mv.key();
mutation m(std::move(pk), std::move(s));
if (version == m.schema()->version()) {
auto partition_view = mutation_partition_view::from_view(mv.partition());
m.partition().apply(*m.schema(), partition_view, *m.schema());
} else {
column_mapping cm = mv.mapping();
converting_mutation_partition_applier v(cm, *m.schema(), m.partition());
auto partition_view = mutation_partition_view::from_view(mv.partition());
partition_view.accept(cm, v);
}
return m;
}

55
canonical_mutation.hh Normal file
View File

@@ -0,0 +1,55 @@
/*
* Copyright (C) 2015 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "bytes.hh"
#include "schema.hh"
#include "database_fwd.hh"
#include "mutation_partition_visitor.hh"
#include "mutation_partition_serializer.hh"
// Immutable mutation form which can be read using any schema version of the same table.
// Safe to access from other shards via const&.
// Safe to pass serialized across nodes.
class canonical_mutation {
bytes _data;
public:
explicit canonical_mutation(bytes);
explicit canonical_mutation(const mutation&);
canonical_mutation(canonical_mutation&&) = default;
canonical_mutation(const canonical_mutation&) = default;
canonical_mutation& operator=(const canonical_mutation&) = default;
canonical_mutation& operator=(canonical_mutation&&) = default;
// Create a mutation object interpreting this canonical mutation using
// given schema.
//
// Data which is not representable in the target schema is dropped. If this
// is not intended, user should sync the schema first.
mutation to_mutation(schema_ptr) const;
utils::UUID column_family_id() const;
const bytes& representation() const { return _data; }
};

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
*/

147
checked-file-impl.hh Normal file
View File

@@ -0,0 +1,147 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "seastar/core/file.hh"
#include "disk-error-handler.hh"
class checked_file_impl : public file_impl {
public:
checked_file_impl(disk_error_signal_type& s, file f)
: _signal(s) , _file(f) {}
virtual future<size_t> write_dma(uint64_t pos, const void* buffer, size_t len, const io_priority_class& pc) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->write_dma(pos, buffer, len, pc);
});
}
virtual future<size_t> write_dma(uint64_t pos, std::vector<iovec> iov, const io_priority_class& pc) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->write_dma(pos, iov, pc);
});
}
virtual future<size_t> read_dma(uint64_t pos, void* buffer, size_t len, const io_priority_class& pc) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->read_dma(pos, buffer, len, pc);
});
}
virtual future<size_t> read_dma(uint64_t pos, std::vector<iovec> iov, const io_priority_class& pc) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->read_dma(pos, iov, pc);
});
}
virtual future<> flush(void) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->flush();
});
}
virtual future<struct stat> stat(void) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->stat();
});
}
virtual future<> truncate(uint64_t length) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->truncate(length);
});
}
virtual future<> discard(uint64_t offset, uint64_t length) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->discard(offset, length);
});
}
virtual future<> allocate(uint64_t position, uint64_t length) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->allocate(position, length);
});
}
virtual future<uint64_t> size(void) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->size();
});
}
virtual future<> close() override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->close();
});
}
virtual subscription<directory_entry> list_directory(std::function<future<> (directory_entry de)> next) override {
return do_io_check(_signal, [&] {
return get_file_impl(_file)->list_directory(next);
});
}
private:
disk_error_signal_type &_signal;
file _file;
};
inline file make_checked_file(disk_error_signal_type& signal, file& f)
{
return file(::make_shared<checked_file_impl>(signal, f));
}
future<file>
inline open_checked_file_dma(disk_error_signal_type& signal,
sstring name, open_flags flags,
file_open_options options)
{
return do_io_check(signal, [&] {
return open_file_dma(name, flags, options).then([&] (file f) {
return make_ready_future<file>(make_checked_file(signal, f));
});
});
}
future<file>
inline open_checked_file_dma(disk_error_signal_type& signal,
sstring name, open_flags flags)
{
return do_io_check(signal, [&] {
return open_file_dma(name, flags).then([&] (file f) {
return make_ready_future<file>(make_checked_file(signal, f));
});
});
}
future<file>
inline open_checked_directory(disk_error_signal_type& signal,
sstring name)
{
return do_io_check(signal, [&] {
return engine().open_directory(name).then([&] (file f) {
return make_ready_future<file>(make_checked_file(signal, f));
});
});
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -29,10 +29,13 @@ enum class compaction_strategy_type {
null,
major,
size_tiered,
// FIXME: Add support to LevelTiered, and DateTiered.
leveled,
// FIXME: Add support to DateTiered.
};
class compaction_strategy_impl;
class sstable;
struct compaction_descriptor;
class compaction_strategy {
::shared_ptr<compaction_strategy_impl> _compaction_strategy_impl;
@@ -45,7 +48,9 @@ public:
compaction_strategy(compaction_strategy&&);
compaction_strategy& operator=(compaction_strategy&&);
future<> compact(column_family& cfs);
// Return a list of sstables to be compacted after applying the strategy.
compaction_descriptor get_sstables_for_compaction(column_family& cfs, std::vector<lw_shared_ptr<sstable>> candidates);
static sstring name(compaction_strategy_type type) {
switch (type) {
case compaction_strategy_type::null:
@@ -54,20 +59,26 @@ public:
return "MajorCompactionStrategy";
case compaction_strategy_type::size_tiered:
return "SizeTieredCompactionStrategy";
case compaction_strategy_type::leveled:
return "LeveledCompactionStrategy";
default:
throw std::runtime_error("Invalid Compaction Strategy");
}
}
static compaction_strategy_type type(const sstring& name) {
if (name == "NullCompactionStrategy") {
auto pos = name.find("org.apache.cassandra.db.compaction.");
sstring short_name = (pos == sstring::npos) ? name : name.substr(pos + 35);
if (short_name == "NullCompactionStrategy") {
return compaction_strategy_type::null;
} else if (name == "MajorCompactionStrategy") {
} else if (short_name == "MajorCompactionStrategy") {
return compaction_strategy_type::major;
} else if (name == "SizeTieredCompactionStrategy") {
} else if (short_name == "SizeTieredCompactionStrategy") {
return compaction_strategy_type::size_tiered;
} else if (short_name == "LeveledCompactionStrategy") {
return compaction_strategy_type::leveled;
} else {
throw exceptions::configuration_exception(sprint("Unable to find compaction strategy class 'org.apache.cassandra.db.compaction.%s", name));
throw exceptions::configuration_exception(sprint("Unable to find compaction strategy class '%s'", name));
}
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -26,29 +26,10 @@
#include <algorithm>
#include <vector>
#include <boost/range/iterator_range.hpp>
#include <boost/range/adaptor/transformed.hpp>
#include "utils/serialization.hh"
#include "unimplemented.hh"
// value_traits is meant to abstract away whether we are working on 'bytes'
// elements or 'bytes_opt' elements. We don't support optional values, but
// there are some generic layers which use this code which provide us with
// data in that format. In order to avoid allocation and rewriting that data
// into a new vector just to throw it away soon after that, we accept that
// format too.
template <typename T>
struct value_traits {
static const T& unwrap(const T& t) { return t; }
};
template<>
struct value_traits<bytes_opt> {
static const bytes& unwrap(const bytes_opt& t) {
assert(t);
return *t;
}
};
enum class allow_prefixes { no, yes };
template<allow_prefixes AllowPrefixes = allow_prefixes::no>
@@ -62,13 +43,14 @@ public:
static constexpr bool is_prefixable = AllowPrefixes == allow_prefixes::yes;
using prefix_type = compound_type<allow_prefixes::yes>;
using value_type = std::vector<bytes>;
using size_type = uint16_t;
compound_type(std::vector<data_type> types)
: _types(std::move(types))
, _byte_order_equal(std::all_of(_types.begin(), _types.end(), [] (auto t) {
return t->is_byte_order_equal();
}))
, _byte_order_comparable(_types.size() == 1 && _types[0]->is_byte_order_comparable())
, _byte_order_comparable(false)
, _is_reversed(_types.size() == 1 && _types[0]->is_reversed())
{ }
@@ -85,81 +67,56 @@ public:
prefix_type as_prefix() {
return prefix_type(_types);
}
private:
/*
* Format:
* <len(value1)><value1><len(value2)><value2>...<len(value_n-1)><value_n-1>(len(value_n))?<value_n>
* <len(value1)><value1><len(value2)><value2>...<len(value_n)><value_n>
*
* For non-prefixable compounds, the value corresponding to the last component of types doesn't
* have its length encoded, its length is deduced from the input range.
*
* serialize_value() and serialize_optionals() for single element rely on the fact that for a single-element
* compounds their serialized form is equal to the serialized form of the component.
*/
template<typename Wrapped>
void serialize_value(const std::vector<Wrapped>& values, bytes::iterator& out) {
if (AllowPrefixes == allow_prefixes::yes) {
assert(values.size() <= _types.size());
} else {
assert(values.size() == _types.size());
}
size_t n_left = _types.size();
for (auto&& wrapped : values) {
auto&& val = value_traits<Wrapped>::unwrap(wrapped);
assert(val.size() <= std::numeric_limits<uint16_t>::max());
if (--n_left || AllowPrefixes == allow_prefixes::yes) {
write<uint16_t>(out, uint16_t(val.size()));
}
template<typename RangeOfSerializedComponents>
static void serialize_value(RangeOfSerializedComponents&& values, bytes::iterator& out) {
for (auto&& val : values) {
assert(val.size() <= std::numeric_limits<size_type>::max());
write<size_type>(out, size_type(val.size()));
out = std::copy(val.begin(), val.end(), out);
}
}
template <typename Wrapped>
size_t serialized_size(const std::vector<Wrapped>& values) {
template <typename RangeOfSerializedComponents>
static size_t serialized_size(RangeOfSerializedComponents&& values) {
size_t len = 0;
size_t n_left = _types.size();
for (auto&& wrapped : values) {
auto&& val = value_traits<Wrapped>::unwrap(wrapped);
assert(val.size() <= std::numeric_limits<uint16_t>::max());
if (--n_left || AllowPrefixes == allow_prefixes::yes) {
len += sizeof(uint16_t);
}
len += val.size();
for (auto&& val : values) {
len += sizeof(size_type) + val.size();
}
return len;
}
public:
bytes serialize_single(bytes&& v) {
if (AllowPrefixes == allow_prefixes::no) {
assert(_types.size() == 1);
return std::move(v);
} else {
// FIXME: Optimize
std::vector<bytes> vec;
vec.reserve(1);
vec.emplace_back(std::move(v));
return ::serialize_value(*this, vec);
}
return serialize_value({std::move(v)});
}
bytes serialize_value(const std::vector<bytes>& values) {
return ::serialize_value(*this, values);
}
bytes serialize_value(std::vector<bytes>&& values) {
if (AllowPrefixes == allow_prefixes::no && _types.size() == 1 && values.size() == 1) {
return std::move(values[0]);
template<typename RangeOfSerializedComponents>
static bytes serialize_value(RangeOfSerializedComponents&& values) {
auto size = serialized_size(values);
if (size > std::numeric_limits<size_type>::max()) {
throw std::runtime_error(sprint("Key size too large: %d > %d", size, std::numeric_limits<size_type>::max()));
}
return ::serialize_value(*this, values);
bytes b(bytes::initialized_later(), size);
auto i = b.begin();
serialize_value(values, i);
return b;
}
template<typename T>
static bytes serialize_value(std::initializer_list<T> values) {
return serialize_value(boost::make_iterator_range(values.begin(), values.end()));
}
bytes serialize_optionals(const std::vector<bytes_opt>& values) {
return ::serialize_value(*this, values);
return serialize_value(values | boost::adaptors::transformed([] (const bytes_opt& bo) -> bytes_view {
if (!bo) {
throw std::logic_error("attempted to create key component from empty optional");
}
return *bo;
}));
}
bytes serialize_optionals(std::vector<bytes_opt>&& values) {
if (AllowPrefixes == allow_prefixes::no && _types.size() == 1 && values.size() == 1) {
assert(values[0]);
return std::move(*values[0]);
}
return ::serialize_value(*this, values);
}
bytes serialize_value_deep(const std::vector<boost::any>& values) {
bytes serialize_value_deep(const std::vector<data_value>& values) {
// TODO: Optimize
std::vector<bytes> partial;
partial.reserve(values.size());
@@ -171,37 +128,21 @@ public:
return serialize_value(partial);
}
bytes decompose_value(const value_type& values) {
return ::serialize_value(*this, values);
return serialize_value(values);
}
class iterator : public std::iterator<std::input_iterator_tag, bytes_view> {
private:
ssize_t _types_left;
bytes_view _v;
value_type _current;
private:
void read_current() {
if (_types_left == 0) {
if (!_v.empty()) {
throw marshal_exception();
}
_v = bytes_view(nullptr, 0);
return;
}
--_types_left;
uint16_t len;
if (_types_left == 0 && AllowPrefixes == allow_prefixes::no) {
len = _v.size();
} else {
size_type len;
{
if (_v.empty()) {
if (AllowPrefixes == allow_prefixes::yes) {
_types_left = 0;
_v = bytes_view(nullptr, 0);
return;
} else {
throw marshal_exception();
}
_v = bytes_view(nullptr, 0);
return;
}
len = read_simple<uint16_t>(_v);
len = read_simple<size_type>(_v);
if (_v.size() < len) {
throw marshal_exception();
}
@@ -211,10 +152,10 @@ public:
}
public:
struct end_iterator_tag {};
iterator(const compound_type& t, const bytes_view& v) : _types_left(t._types.size()), _v(v) {
iterator(const bytes_view& v) : _v(v) {
read_current();
}
iterator(end_iterator_tag, const bytes_view& v) : _types_left(0), _v(nullptr, 0) {}
iterator(end_iterator_tag, const bytes_view& v) : _v(nullptr, 0) {}
iterator& operator++() {
read_current();
return *this;
@@ -226,21 +167,18 @@ public:
}
const value_type& operator*() const { return _current; }
const value_type* operator->() const { return &_current; }
bool operator!=(const iterator& i) const { return _v.begin() != i._v.begin() || _types_left != i._types_left; }
bool operator==(const iterator& i) const { return _v.begin() == i._v.begin() && _types_left == i._types_left; }
bool operator!=(const iterator& i) const { return _v.begin() != i._v.begin(); }
bool operator==(const iterator& i) const { return _v.begin() == i._v.begin(); }
};
iterator begin(const bytes_view& v) const {
return iterator(*this, v);
static iterator begin(const bytes_view& v) {
return iterator(v);
}
iterator end(const bytes_view& v) const {
static iterator end(const bytes_view& v) {
return iterator(typename iterator::end_iterator_tag(), v);
}
boost::iterator_range<iterator> components(const bytes_view& v) const {
static boost::iterator_range<iterator> components(const bytes_view& v) {
return { begin(v), end(v) };
}
auto iter_items(const bytes_view& v) {
return boost::iterator_range<iterator>(begin(v), end(v));
}
value_type deserialize_value(bytes_view v) {
std::vector<bytes> result;
result.reserve(_types.size());
@@ -258,7 +196,7 @@ public:
}
auto t = _types.begin();
size_t h = 0;
for (auto&& value : iter_items(v)) {
for (auto&& value : components(v)) {
h ^= (*t)->hash(value);
++t;
}
@@ -277,12 +215,6 @@ public:
return type->compare(v1, v2);
});
}
bytes from_string(sstring_view s) {
throw std::runtime_error("not implemented");
}
sstring to_string(const bytes& b) {
throw std::runtime_error("not implemented");
}
// Retruns true iff given prefix has no missing components
bool is_full(bytes_view v) const {
assert(AllowPrefixes == allow_prefixes::yes);

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*

View File

@@ -1,5 +1,5 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -114,6 +114,14 @@ public:
}
return opts;
}
bool operator==(const compression_parameters& other) const {
return _compressor == other._compressor
&& _chunk_length == other._chunk_length
&& _crc_check_chance == other._crc_check_chance;
}
bool operator!=(const compression_parameters& other) const {
return !(*this == other);
}
private:
void validate_options(const std::map<sstring, sstring>& options) {
// currently, there are no options specific to a particular compressor

View File

@@ -0,0 +1,15 @@
#
# cassandra-rackdc.properties
#
# The lines may include white spaces at the beginning and the end.
# The rack and data center names may also include white spaces.
# All trailing and leading white spaces will be trimmed.
#
# dc=my_data_center
# rack=my_rack
# prefer_local=<false | true>
# dc_suffix=<Data Center name suffix, used by EC2SnitchXXX snitches>
#

View File

@@ -106,6 +106,19 @@ write_request_timeout_in_ms: 2000
# most users should never need to adjust this.
# phi_convict_threshold: 8
# IEndpointSnitch. The snitch has two functions:
# - it teaches Scylla enough about your network topology to route
# requests efficiently
# - it allows Scylla to spread replicas around your cluster to avoid
# correlated failures. It does this by grouping machines into
# "datacenters" and "racks." Scylla will do its best not to have
# more than one replica on the same "rack" (which may not actually
# be a physical location)
#
# IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER,
# YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS
# ARE PLACED.
#
# Out of the box, Scylla provides
# - SimpleSnitch:
# Treats Strategy order as proximity. This can improve cache
@@ -169,6 +182,35 @@ rpc_address: localhost
# port for Thrift to listen for clients on
rpc_port: 9160
# port for REST API server
api_port: 10000
# IP for the REST API server
api_address: 127.0.0.1
# Log WARN on any batch size exceeding this value. 5kb per batch by default.
# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
batch_size_warn_threshold_in_kb: 5
# Authentication backend, identifying users
# Out of the box, Scylla provides org.apache.cassandra.auth.{AllowAllAuthenticator,
# PasswordAuthenticator}.
#
# - AllowAllAuthenticator performs no checks - set it to disable authentication.
# - PasswordAuthenticator relies on username/password pairs to authenticate
# users. It keeps usernames and hashed passwords in system_auth.credentials table.
# Please increase system_auth keyspace replication factor if you use this authenticator.
# authenticator: AllowAllAuthenticator
# Authorization backend, implementing IAuthorizer; used to limit access/provide permissions
# Out of the box, Scylla provides org.apache.cassandra.auth.{AllowAllAuthorizer,
# CassandraAuthorizer}.
#
# - AllowAllAuthorizer allows any action to any user - set it to disable authorization.
# - CassandraAuthorizer stores permissions in system_auth.permissions table. Please
# increase system_auth keyspace replication factor if you use this authorizer.
# authorizer: AllowAllAuthorizer
###################################################
## Not currently supported, reserved for future use
###################################################
@@ -205,25 +247,6 @@ rpc_port: 9160
# reduced proportionally to the number of nodes in the cluster.
# batchlog_replay_throttle_in_kb: 1024
# Authentication backend, implementing IAuthenticator; used to identify users
# Out of the box, Scylla provides org.apache.cassandra.auth.{AllowAllAuthenticator,
# PasswordAuthenticator}.
#
# - AllowAllAuthenticator performs no checks - set it to disable authentication.
# - PasswordAuthenticator relies on username/password pairs to authenticate
# users. It keeps usernames and hashed passwords in system_auth.credentials table.
# Please increase system_auth keyspace replication factor if you use this authenticator.
# authenticator: AllowAllAuthenticator
# Authorization backend, implementing IAuthorizer; used to limit access/provide permissions
# Out of the box, Scylla provides org.apache.cassandra.auth.{AllowAllAuthorizer,
# CassandraAuthorizer}.
#
# - AllowAllAuthorizer allows any action to any user - set it to disable authorization.
# - CassandraAuthorizer stores permissions in system_auth.permissions table. Please
# increase system_auth keyspace replication factor if you use this authorizer.
# authorizer: AllowAllAuthorizer
# Validity period for permissions cache (fetching permissions can be an
# expensive operation depending on the authorizer, CassandraAuthorizer is
# one example). Defaults to 2000, set to 0 to disable.
@@ -409,15 +432,16 @@ partitioner: org.apache.cassandra.dht.Murmur3Partitioner
# offheap_objects: native memory, eliminating nio buffer heap overhead
# memtable_allocation_type: heap_buffers
# Total space to use for commitlogs. Since commitlog segments are
# mmapped, and hence use up address space, the default size is 32
# on 32-bit JVMs, and 8192 on 64-bit JVMs.
# Total space to use for commitlogs.
#
# If space gets above this value (it will round up to the next nearest
# segment multiple), Scylla will flush every dirty CF in the oldest
# segment and remove it. So a small total commitlog space will tend
# to cause more flush activity on less-active columnfamilies.
commitlog_total_space_in_mb: 8192
#
# A value of -1 (default) will automatically equate it to the total amount of memory
# available for Scylla.
commitlog_total_space_in_mb: -1
# This sets the amount of memtable flush writer threads. These will
# be blocked by disk io, and each one will hold a memtable in memory
@@ -598,10 +622,6 @@ commitlog_total_space_in_mb: 8192
# column_index_size_in_kb: 64
# Log WARN on any batch size exceeding this value. 5kb per batch by default.
# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
# batch_size_warn_threshold_in_kb: 5
# Number of simultaneous compactions to allow, NOT including
# validation "compactions" for anti-entropy repair. Simultaneous
# compactions can help preserve read performance in a mixed read/write
@@ -672,58 +692,6 @@ commitlog_total_space_in_mb: 8192
# Default value is 0, which never timeout streams.
# streaming_socket_timeout_in_ms: 0
# endpoint_snitch -- Set this to a class that implements
# IEndpointSnitch. The snitch has two functions:
# - it teaches Scylla enough about your network topology to route
# requests efficiently
# - it allows Scylla to spread replicas around your cluster to avoid
# correlated failures. It does this by grouping machines into
# "datacenters" and "racks." Scylla will do its best not to have
# more than one replica on the same "rack" (which may not actually
# be a physical location)
#
# IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER,
# YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS
# ARE PLACED.
#
# Out of the box, Scylla provides
# - SimpleSnitch:
# Treats Strategy order as proximity. This can improve cache
# locality when disabling read repair. Only appropriate for
# single-datacenter deployments.
# - GossipingPropertyFileSnitch
# This should be your go-to snitch for production use. The rack
# and datacenter for the local node are defined in
# cassandra-rackdc.properties and propagated to other nodes via
# gossip. If cassandra-topology.properties exists, it is used as a
# fallback, allowing migration from the PropertyFileSnitch.
# - PropertyFileSnitch:
# Proximity is determined by rack and data center, which are
# explicitly configured in cassandra-topology.properties.
# - Ec2Snitch:
# Appropriate for EC2 deployments in a single Region. Loads Region
# and Availability Zone information from the EC2 API. The Region is
# treated as the datacenter, and the Availability Zone as the rack.
# Only private IPs are used, so this will not work across multiple
# Regions.
# - Ec2MultiRegionSnitch:
# Uses public IPs as broadcast_address to allow cross-region
# connectivity. (Thus, you should set seed addresses to the public
# IP as well.) You will need to open the storage_port or
# ssl_storage_port on the public IP firewall. (For intra-Region
# traffic, Scylla will switch to the private IP after
# establishing a connection.)
# - RackInferringSnitch:
# Proximity is determined by rack and data center, which are
# assumed to correspond to the 3rd and 2nd octet of each node's IP
# address, respectively. Unless this happens to match your
# deployment conventions, this is best used as an example of
# writing a custom Snitch class and is provided in that spirit.
#
# You can use a custom Snitch by setting this to the full class name
# of the snitch, which will be assumed to be on your classpath.
# controls how often to perform the more expensive part of host score
# calculation
# dynamic_snitch_update_interval_in_ms: 100
@@ -781,40 +749,25 @@ commitlog_total_space_in_mb: 8192
# the request scheduling. Currently the only valid option is keyspace.
# request_scheduler_id: keyspace
# Enable or disable inter-node encryption
# Default settings are TLS v1, RSA 1024-bit keys (it is imperative that
# users generate their own keys) TLS_RSA_WITH_AES_128_CBC_SHA as the cipher
# suite for authentication, key exchange and encryption of the actual data transfers.
# Use the DHE/ECDHE ciphers if running in FIPS 140 compliant mode.
# NOTE: No custom encryption options are enabled at the moment
# Enable or disable inter-node encryption.
# You must also generate keys and provide the appropriate key and trust store locations and passwords.
# No custom encryption options are currently enabled. The available options are:
#
# The available internode options are : all, none, dc, rack
#
# If set to dc cassandra will encrypt the traffic between the DCs
# If set to rack cassandra will encrypt the traffic between the racks
#
# The passwords used in these options must match the passwords used when generating
# the keystore and truststore. For instructions on generating these files, see:
# http://download.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html#CreateKeystore
# If set to dc scylla will encrypt the traffic between the DCs
# If set to rack scylla will encrypt the traffic between the racks
#
# server_encryption_options:
# internode_encryption: none
# keystore: conf/.keystore
# keystore_password: cassandra
# truststore: conf/.truststore
# truststore_password: cassandra
# More advanced defaults below:
# protocol: TLS
# algorithm: SunX509
# store_type: JKS
# cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
# require_client_auth: false
# certificate: conf/scylla.crt
# keyfile: conf/scylla.key
# truststore: <none, use system trust>
# enable or disable client/server encryption.
# client_encryption_options:
# enabled: false
# keystore: conf/.keystore
# keystore_password: cassandra
# certificate: conf/scylla.crt
# keyfile: conf/scylla.key
# require_client_auth: false
# Set trustore and truststore_password if require_client_auth is true
@@ -838,3 +791,17 @@ commitlog_total_space_in_mb: 8192
# reducing overhead from the TCP protocol itself, at the cost of increasing
# latency if you block for cross-datacenter responses.
# inter_dc_tcp_nodelay: false
# Relaxation of environment checks.
#
# Scylla places certain requirements on its environment. If these requirements are
# not met, performance and reliability can be degraded.
#
# These requirements include:
# - A filesystem with good support for aysnchronous I/O (AIO). Currently,
# this means XFS.
#
# false: strict environment checks are in place; do not start if they are not met.
# true: relaxed environment checks; performance and reliability may degraade.
#
# developer_mode: false

View File

@@ -1,6 +1,6 @@
#!/usr/bin/python3
#
# Copyright 2015 Cloudius Systems
# Copyright (C) 2015 ScyllaDB
#
#
@@ -25,6 +25,31 @@ from distutils.spawn import find_executable
configure_args = str.join(' ', [shlex.quote(x) for x in sys.argv[1:]])
for line in open('/etc/os-release'):
key, _, value = line.partition('=')
value = value.strip().strip('"')
if key == 'ID':
os_ids = [value]
if key == 'ID_LIKE':
os_ids += value.split(' ')
# distribution "internationalization", converting package names.
# Fedora name is key, values is distro -> package name dict.
i18n_xlat = {
'boost-devel': {
'debian': 'libboost-dev',
'ubuntu': 'libboost-dev (libboost1.55-dev on 14.04)',
},
}
def pkgname(name):
if name in i18n_xlat:
dict = i18n_xlat[name]
for id in os_ids:
if id in dict:
return dict[id]
return name
def get_flags():
with open('/proc/cpuinfo') as f:
for line in f:
@@ -50,6 +75,9 @@ def apply_tristate(var, test, note, missing):
return False
return False
def have_pkg(package):
return subprocess.call(['pkg-config', package]) == 0
def pkg_config(option, package):
output = subprocess.check_output(['pkg-config', option, package])
return output.decode('utf-8').strip()
@@ -132,8 +160,10 @@ modes = {
},
}
urchin_tests = [
scylla_tests = [
'tests/mutation_test',
'tests/schema_registry_test',
'tests/canonical_mutation_test',
'tests/range_test',
'tests/types_test',
'tests/keys_test',
@@ -151,7 +181,9 @@ urchin_tests = [
'tests/perf/perf_sstable',
'tests/cql_query_test',
'tests/storage_proxy_test',
'tests/schema_change_test',
'tests/mutation_reader_test',
'tests/key_reader_test',
'tests/mutation_query_test',
'tests/row_cache_test',
'tests/test-serialization',
@@ -161,7 +193,6 @@ urchin_tests = [
'tests/commitlog_test',
'tests/cartesian_product_test',
'tests/hash_test',
'tests/serializer_test',
'tests/map_difference_test',
'tests/message',
'tests/gossip',
@@ -169,6 +200,7 @@ urchin_tests = [
'tests/compound_test',
'tests/config_test',
'tests/gossiping_property_file_snitch_test',
'tests/ec2_snitch_test',
'tests/snitch_reset_test',
'tests/network_topology_strategy_test',
'tests/query_processor_test',
@@ -180,15 +212,23 @@ urchin_tests = [
'tests/logalloc_test',
'tests/managed_vector_test',
'tests/crc_test',
'tests/flush_queue_test',
'tests/dynamic_bitset_test',
'tests/auth_test',
'tests/idl_test',
]
apps = [
'scylla',
]
tests = urchin_tests
tests = scylla_tests
all_artifacts = apps + tests
other = [
'iotune',
]
all_artifacts = apps + tests + other
arg_parser = argparse.ArgumentParser('Configure scylla')
arg_parser.add_argument('--static', dest = 'static', action = 'store_const', default = '',
@@ -216,24 +256,33 @@ arg_parser.add_argument('--debuginfo', action = 'store', dest = 'debuginfo', typ
help = 'Enable(1)/disable(0)compiler debug information generation')
arg_parser.add_argument('--static-stdc++', dest = 'staticcxx', action = 'store_true',
help = 'Link libgcc and libstdc++ statically')
arg_parser.add_argument('--static-thrift', dest = 'staticthrift', action = 'store_true',
help = 'Link libthrift statically')
arg_parser.add_argument('--tests-debuginfo', action = 'store', dest = 'tests_debuginfo', type = int, default = 0,
help = 'Enable(1)/disable(0)compiler debug information generation for tests')
arg_parser.add_argument('--python', action = 'store', dest = 'python', default = 'python3',
help = 'Python3 path')
add_tristate(arg_parser, name = 'hwloc', dest = 'hwloc', help = 'hwloc support')
add_tristate(arg_parser, name = 'xen', dest = 'xen', help = 'Xen support')
args = arg_parser.parse_args()
defines = []
urchin_libs = '-llz4 -lsnappy -lz -lboost_thread -lcryptopp -lrt -lyaml-cpp -lboost_date_time'
extra_cxxflags = {}
cassandra_interface = Thrift(source = 'interface/cassandra.thrift', service = 'Cassandra')
urchin_core = (['database.cc',
scylla_core = (['database.cc',
'schema.cc',
'frozen_schema.cc',
'schema_registry.cc',
'bytes.cc',
'mutation.cc',
'row_cache.cc',
'canonical_mutation.cc',
'frozen_mutation.cc',
'memtable.cc',
'schema_mutations.cc',
'release.cc',
'utils/logalloc.cc',
'utils/large_bitset.cc',
@@ -242,6 +291,7 @@ urchin_core = (['database.cc',
'mutation_partition_serializer.cc',
'mutation_reader.cc',
'mutation_query.cc',
'key_reader.cc',
'keys.cc',
'sstables/sstables.cc',
'sstables/compress.cc',
@@ -250,6 +300,8 @@ urchin_core = (['database.cc',
'sstables/partition.cc',
'sstables/filter.cc',
'sstables/compaction.cc',
'sstables/compaction_strategy.cc',
'sstables/compaction_manager.cc',
'log.cc',
'transport/event.cc',
'transport/event_notifier.cc',
@@ -266,12 +318,20 @@ urchin_core = (['database.cc',
'cql3/maps.cc',
'cql3/functions/functions.cc',
'cql3/statements/cf_prop_defs.cc',
'cql3/statements/cf_statement.cc',
'cql3/statements/authentication_statement.cc',
'cql3/statements/create_keyspace_statement.cc',
'cql3/statements/create_table_statement.cc',
'cql3/statements/create_type_statement.cc',
'cql3/statements/create_user_statement.cc',
'cql3/statements/drop_keyspace_statement.cc',
'cql3/statements/drop_table_statement.cc',
'cql3/statements/drop_type_statement.cc',
'cql3/statements/schema_altering_statement.cc',
'cql3/statements/ks_prop_defs.cc',
'cql3/statements/modification_statement.cc',
'cql3/statements/parsed_statement.cc',
'cql3/statements/property_definitions.cc',
'cql3/statements/update_statement.cc',
'cql3/statements/delete_statement.cc',
'cql3/statements/batch_statement.cc',
@@ -281,8 +341,19 @@ urchin_core = (['database.cc',
'cql3/statements/index_target.cc',
'cql3/statements/create_index_statement.cc',
'cql3/statements/truncate_statement.cc',
'cql3/statements/alter_table_statement.cc',
'cql3/statements/alter_user_statement.cc',
'cql3/statements/drop_user_statement.cc',
'cql3/statements/list_users_statement.cc',
'cql3/statements/authorization_statement.cc',
'cql3/statements/permission_altering_statement.cc',
'cql3/statements/list_permissions_statement.cc',
'cql3/statements/grant_statement.cc',
'cql3/statements/revoke_statement.cc',
'cql3/statements/alter_type_statement.cc',
'cql3/update_parameters.cc',
'cql3/ut_name.cc',
'cql3/user_options.cc',
'thrift/handler.cc',
'thrift/server.cc',
'thrift/thrift_validation.cc',
@@ -292,6 +363,7 @@ urchin_core = (['database.cc',
'utils/big_decimal.cc',
'types.cc',
'validation.cc',
'service/priority_manager.cc',
'service/migration_manager.cc',
'service/storage_proxy.cc',
'cql3/operator.cc',
@@ -317,7 +389,7 @@ urchin_core = (['database.cc',
'db/schema_tables.cc',
'db/commitlog/commitlog.cc',
'db/commitlog/commitlog_replayer.cc',
'db/serializer.cc',
'db/commitlog/commitlog_entry.cc',
'db/config.cc',
'db/index/secondary_index.cc',
'db/marshal/type_parser.cc',
@@ -329,8 +401,10 @@ urchin_core = (['database.cc',
'utils/bloom_filter.cc',
'utils/bloom_calculations.cc',
'utils/rate_limiter.cc',
'utils/compaction_manager.cc',
'utils/file_lock.cc',
'utils/dynamic_bitset.cc',
'utils/managed_bytes.cc',
'utils/exceptions.cc',
'gms/version_generator.cc',
'gms/versioned_value.cc',
'gms/gossiper.cc',
@@ -339,10 +413,12 @@ urchin_core = (['database.cc',
'gms/gossip_digest_ack.cc',
'gms/gossip_digest_ack2.cc',
'gms/endpoint_state.cc',
'gms/application_state.cc',
'dht/i_partitioner.cc',
'dht/murmur3_partitioner.cc',
'dht/byte_ordered_partitioner.cc',
'dht/boot_strapper.cc',
'dht/range_streamer.cc',
'unimplemented.cc',
'query.cc',
'query-result-set.cc',
@@ -350,16 +426,23 @@ urchin_core = (['database.cc',
'locator/simple_strategy.cc',
'locator/local_strategy.cc',
'locator/network_topology_strategy.cc',
'locator/everywhere_replication_strategy.cc',
'locator/token_metadata.cc',
'locator/locator.cc',
'locator/snitch_base.cc',
'locator/simple_snitch.cc',
'locator/rack_inferring_snitch.cc',
'locator/gossiping_property_file_snitch.cc',
'locator/production_snitch_base.cc',
'locator/ec2_snitch.cc',
'locator/ec2_multi_region_snitch.cc',
'message/messaging_service.cc',
'service/client_state.cc',
'service/migration_task.cc',
'service/storage_service.cc',
'streaming/streaming.cc',
'service/load_broadcaster.cc',
'service/pager/paging_state.cc',
'service/pager/query_pagers.cc',
'streaming/stream_task.cc',
'streaming/stream_session.cc',
'streaming/stream_request.cc',
@@ -372,18 +455,21 @@ urchin_core = (['database.cc',
'streaming/stream_coordinator.cc',
'streaming/stream_manager.cc',
'streaming/stream_result_future.cc',
'streaming/messages/stream_init_message.cc',
'streaming/messages/retry_message.cc',
'streaming/messages/received_message.cc',
'streaming/messages/prepare_message.cc',
'streaming/messages/file_message_header.cc',
'streaming/messages/outgoing_file_message.cc',
'streaming/messages/incoming_file_message.cc',
'streaming/stream_session_state.cc',
'gc_clock.cc',
'partition_slice_builder.cc',
'init.cc',
'repair/repair.cc',
'exceptions/exceptions.cc',
'dns.cc',
'auth/auth.cc',
'auth/authenticated_user.cc',
'auth/authenticator.cc',
'auth/authorizer.cc',
'auth/default_authorizer.cc',
'auth/data_resource.cc',
'auth/password_authenticator.cc',
'auth/permission.cc',
]
+ [Antlr3Grammar('cql3/Cql.g')]
+ [Thrift('interface/cassandra.thrift', 'Cassandra')]
@@ -419,30 +505,54 @@ api = ['api/api.cc',
'api/lsa.cc',
'api/api-doc/stream_manager.json',
'api/stream_manager.cc',
'api/api-doc/system.json',
'api/system.cc'
]
urchin_tests_dependencies = urchin_core + [
idls = ['idl/gossip_digest.idl.hh',
'idl/uuid.idl.hh',
'idl/range.idl.hh',
'idl/keys.idl.hh',
'idl/read_command.idl.hh',
'idl/token.idl.hh',
'idl/ring_position.idl.hh',
'idl/result.idl.hh',
'idl/frozen_mutation.idl.hh',
'idl/reconcilable_result.idl.hh',
'idl/streaming.idl.hh',
'idl/paging_state.idl.hh',
'idl/frozen_schema.idl.hh',
'idl/partition_checksum.idl.hh',
'idl/replay_position.idl.hh',
'idl/truncation_record.idl.hh',
'idl/mutation.idl.hh',
'idl/query.idl.hh',
'idl/idl_test.idl.hh',
'idl/commitlog.idl.hh',
]
scylla_tests_dependencies = scylla_core + api + idls + [
'tests/cql_test_env.cc',
'tests/cql_assertions.cc',
'tests/result_set_assertions.cc',
'tests/mutation_source_test.cc',
]
urchin_tests_seastar_deps = [
scylla_tests_seastar_deps = [
'seastar/tests/test-utils.cc',
'seastar/tests/test_runner.cc',
]
deps = {
'scylla': ['main.cc'] + urchin_core + api,
'scylla': idls + ['main.cc'] + scylla_core + api,
}
tests_not_using_seastar_test_framework = set([
'tests/types_test',
'tests/keys_test',
'tests/partitioner_test',
'tests/map_difference_test',
'tests/frozen_mutation_test',
'tests/canonical_mutation_test',
'tests/perf/perf_mutation',
'tests/lsa_async_eviction_test',
'tests/lsa_sync_eviction_test',
@@ -461,23 +571,25 @@ tests_not_using_seastar_test_framework = set([
'tests/crc_test',
'tests/perf/perf_sstable',
'tests/managed_vector_test',
'tests/dynamic_bitset_test',
'tests/idl_test',
])
for t in tests_not_using_seastar_test_framework:
if not t in urchin_tests:
raise Exception("Test %s not found in urchin_tests" % (t))
if not t in scylla_tests:
raise Exception("Test %s not found in scylla_tests" % (t))
for t in urchin_tests:
deps[t] = urchin_tests_dependencies + [t + '.cc']
for t in scylla_tests:
deps[t] = scylla_tests_dependencies + [t + '.cc']
if t not in tests_not_using_seastar_test_framework:
deps[t] += urchin_tests_seastar_deps
deps[t] += scylla_tests_seastar_deps
deps['tests/sstable_test'] += ['tests/sstable_datafile_test.cc']
deps['tests/bytes_ostream_test'] = ['tests/bytes_ostream_test.cc']
deps['tests/UUID_test'] = ['utils/UUID_gen.cc', 'tests/UUID_test.cc']
deps['tests/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'tests/murmur_hash_test.cc']
deps['tests/allocation_strategy_test'] = ['tests/allocation_strategy_test.cc', 'utils/logalloc.cc', 'log.cc']
deps['tests/allocation_strategy_test'] = ['tests/allocation_strategy_test.cc', 'utils/logalloc.cc', 'log.cc', 'utils/dynamic_bitset.cc']
warnings = [
'-Wno-mismatched-tags', # clang-only
@@ -491,6 +603,7 @@ warnings = [w
warnings = ' '.join(warnings)
dbgflag = debug_flag(args.cxx) if args.debuginfo else ''
tests_link_rule = 'link' if args.tests_debuginfo else 'link_stripped'
if args.so:
args.pie = '-shared'
@@ -502,6 +615,45 @@ else:
args.pie = ''
args.fpie = ''
# a list element means a list of alternative packages to consider
# the first element becomes the HAVE_pkg define
# a string element is a package name with no alternatives
optional_packages = [['libsystemd', 'libsystemd-daemon']]
pkgs = []
def setup_first_pkg_of_list(pkglist):
# The HAVE_pkg symbol is taken from the first alternative
upkg = pkglist[0].upper().replace('-', '_')
for pkg in pkglist:
if have_pkg(pkg):
pkgs.append(pkg)
defines.append('HAVE_{}=1'.format(upkg))
return True
return False
for pkglist in optional_packages:
if isinstance(pkglist, str):
pkglist = [pkglist]
if not setup_first_pkg_of_list(pkglist):
if len(pkglist) == 1:
print('Missing optional package {pkglist[0]}'.format(**locals()))
else:
alternatives = ':'.join(pkglist[1:])
print('Missing optional package {pkglist[0]} (or alteratives {alternatives})'.format(**locals()))
if not try_compile(compiler=args.cxx, source='#include <boost/version.hpp>'):
print('Boost not installed. Please install {}.'.format(pkgname("boost-devel")))
sys.exit(1)
if not try_compile(compiler=args.cxx, source='''\
#include <boost/version.hpp>
#if BOOST_VERSION < 105500
#error Boost version too low
#endif
'''):
print('Installed boost version too old. Please update {}.'.format(pkgname("boost-devel")))
sys.exit(1)
defines = ' '.join(['-D' + d for d in defines])
globals().update(vars(args))
@@ -530,10 +682,13 @@ if args.dpdk:
seastar_flags += ['--enable-dpdk']
elif args.dpdk_target:
seastar_flags += ['--dpdk-target', args.dpdk_target]
if args.staticcxx:
seastar_flags += ['--static-stdc++']
seastar_flags += ['--compiler', args.cxx, '--cflags=-march=nehalem']
seastar_cflags = args.user_cflags + " -march=nehalem"
seastar_flags += ['--compiler', args.cxx, '--cflags=%s' % (seastar_cflags)]
status = subprocess.call(['./configure.py'] + seastar_flags, cwd = 'seastar')
status = subprocess.call([python, './configure.py'] + seastar_flags, cwd = 'seastar')
if status != 0:
print('Seastar configuration failed')
@@ -562,11 +717,18 @@ for mode in build_modes:
seastar_deps = 'practically_anything_can_change_so_lets_run_it_every_time_and_restat.'
args.user_cflags += " " + pkg_config("--cflags", "jsoncpp")
libs = "-lyaml-cpp -llz4 -lz -lsnappy " + pkg_config("--libs", "jsoncpp") + ' -lboost_filesystem'
libs = "-lyaml-cpp -llz4 -lz -lsnappy " + pkg_config("--libs", "jsoncpp") + ' -lboost_filesystem' + ' -lcrypt' + ' -lboost_date_time'
for pkg in pkgs:
args.user_cflags += ' ' + pkg_config('--cflags', pkg)
libs += ' ' + pkg_config('--libs', pkg)
user_cflags = args.user_cflags
user_ldflags = args.user_ldflags
if args.staticcxx:
user_ldflags += " -static-libgcc -static-libstdc++"
if args.staticthrift:
thrift_libs = "-Wl,-Bstatic -lthrift -Wl,-Bdynamic"
else:
thrift_libs = "-lthrift"
outdir = 'build'
buildfile = 'build.ninja'
@@ -594,10 +756,16 @@ with open(buildfile, 'w') as f:
rule swagger
command = seastar/json/json2code.py -f $in -o $out
description = SWAGGER $out
rule serializer
command = {python} ./idl-compiler.py --ns ser -f $in -o $out
description = IDL compiler $out
rule ninja
command = {ninja} -C $subdir $target
restat = 1
description = NINJA $out
rule copy
command = cp $in $out
description = COPY $out
''').format(**globals()))
for mode in build_modes:
modeval = modes[mode]
@@ -630,9 +798,12 @@ with open(buildfile, 'w') as f:
compiles = {}
ragels = {}
swaggers = {}
serializers = {}
thrifts = set()
antlr3_grammars = set()
for binary in build_artifacts:
if binary in other:
continue
srcs = deps[binary]
objs = ['$builddir/' + mode + '/' + src.replace('.cc', '.o')
for src in srcs
@@ -665,17 +836,17 @@ with open(buildfile, 'w') as f:
# So we strip the tests by default; The user can very
# quickly re-link the test unstripped by adding a "_g"
# to the test name, e.g., "ninja build/release/testname_g"
f.write('build $builddir/{}/{}: link_stripped.{} {} {}\n'.format(mode, binary, mode, str.join(' ', objs),
f.write('build $builddir/{}/{}: {}.{} {} {}\n'.format(mode, binary, tests_link_rule, mode, str.join(' ', objs),
'seastar/build/{}/libseastar.a'.format(mode)))
if has_thrift:
f.write(' libs = -lthrift -lboost_system $libs\n')
f.write(' libs = {} -lboost_system $libs\n'.format(thrift_libs))
f.write('build $builddir/{}/{}_g: link.{} {} {}\n'.format(mode, binary, mode, str.join(' ', objs),
'seastar/build/{}/libseastar.a'.format(mode)))
else:
f.write('build $builddir/{}/{}: link.{} {} {}\n'.format(mode, binary, mode, str.join(' ', objs),
'seastar/build/{}/libseastar.a'.format(mode)))
if has_thrift:
f.write(' libs = -lthrift -lboost_system $libs\n')
f.write(' libs = {} -lboost_system $libs\n'.format(thrift_libs))
for src in srcs:
if src.endswith('.cc'):
obj = '$builddir/' + mode + '/' + src.replace('.cc', '.o')
@@ -683,6 +854,9 @@ with open(buildfile, 'w') as f:
elif src.endswith('.rl'):
hh = '$builddir/' + mode + '/gen/' + src.replace('.rl', '.hh')
ragels[hh] = src
elif src.endswith('.idl.hh'):
hh = '$builddir/' + mode + '/gen/' + src.replace('.idl.hh', '.dist.hh')
serializers[hh] = src
elif src.endswith('.json'):
hh = '$builddir/' + mode + '/gen/' + src + '.hh'
swaggers[hh] = src
@@ -695,12 +869,14 @@ with open(buildfile, 'w') as f:
for obj in compiles:
src = compiles[obj]
gen_headers = list(ragels.keys())
gen_headers += ['seastar/build/{}/http/request_parser.hh'.format(mode)]
gen_headers += ['seastar/build/{}/gen/http/request_parser.hh'.format(mode)]
gen_headers += ['seastar/build/{}/gen/http/http_response_parser.hh'.format(mode)]
for th in thrifts:
gen_headers += th.headers('$builddir/{}/gen'.format(mode))
for g in antlr3_grammars:
gen_headers += g.headers('$builddir/{}/gen'.format(mode))
gen_headers += list(swaggers.keys())
gen_headers += list(serializers.keys())
f.write('build {}: cxx.{} {} || {} \n'.format(obj, mode, src, ' '.join(gen_headers)))
if src in extra_cxxflags:
f.write(' cxxflags = {seastar_cflags} $cxxflags $cxxflags_{mode} {extra_cxxflags}\n'.format(mode = mode, extra_cxxflags = extra_cxxflags[src], **modeval))
@@ -710,6 +886,9 @@ with open(buildfile, 'w') as f:
for hh in swaggers:
src = swaggers[hh]
f.write('build {}: swagger {}\n'.format(hh,src))
for hh in serializers:
src = serializers[hh]
f.write('build {}: serializer {} | idl-compiler.py\n'.format(hh,src))
for thrift in thrifts:
outs = ' '.join(thrift.generated('$builddir/{}/gen'.format(mode)))
f.write('build {}: thrift.{} {}\n'.format(outs, mode, thrift.source))
@@ -722,24 +901,24 @@ with open(buildfile, 'w') as f:
grammar.source.rsplit('.', 1)[0]))
for cc in grammar.sources('$builddir/{}/gen'.format(mode)):
obj = cc.replace('.cpp', '.o')
f.write('build {}: cxx.{} {}\n'.format(obj, mode, cc))
f.write('build seastar/build/{}/libseastar.a: ninja {}\n'.format(mode, seastar_deps))
f.write('build {}: cxx.{} {} || {}\n'.format(obj, mode, cc, ' '.join(serializers)))
f.write('build seastar/build/{mode}/libseastar.a seastar/build/{mode}/apps/iotune/iotune seastar/build/{mode}/gen/http/request_parser.hh seastar/build/{mode}/gen/http/http_response_parser.hh: ninja {seastar_deps}\n'
.format(**locals()))
f.write(' subdir = seastar\n')
f.write(' target = build/{}/libseastar.a\n'.format(mode))
f.write(' target = build/{mode}/libseastar.a build/{mode}/apps/iotune/iotune build/{mode}/gen/http/request_parser.hh build/{mode}/gen/http/http_response_parser.hh\n'.format(**locals()))
f.write(textwrap.dedent('''\
build build/{mode}/iotune: copy seastar/build/{mode}/apps/iotune/iotune
''').format(**locals()))
f.write('build {}: phony\n'.format(seastar_deps))
f.write(textwrap.dedent('''\
rule configure
command = python3 configure.py $configure_args
command = {python} configure.py $configure_args
generator = 1
build build.ninja: configure | configure.py
rule cscope
command = find -name '*.[chS]' -o -name "*.cc" -o -name "*.hh" | cscope -bq -i-
description = CSCOPE
build cscope: cscope
rule request_parser_hh
command = {ninja} -C seastar build/release/gen/http/request_parser.hh build/debug/gen/http/request_parser.hh
description = GEN seastar/http/request_parser.hh
build seastar/build/release/http/request_parser.hh seastar/build/debug/http/request_parser.hh: request_parser_hh
rule clean
command = rm -rf build
description = CLEAN

View File

@@ -0,0 +1,119 @@
/*
* Copyright (C) 2015 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "mutation_partition_view.hh"
#include "schema.hh"
// Mutation partition visitor which applies visited data into
// existing mutation_partition. The visited data may be of a different schema.
// Data which is not representable in the new schema is dropped.
// Weak exception guarantees.
class converting_mutation_partition_applier : public mutation_partition_visitor {
const schema& _p_schema;
mutation_partition& _p;
const column_mapping& _visited_column_mapping;
deletable_row* _current_row;
private:
static bool is_compatible(const column_definition& new_def, const data_type& old_type, column_kind kind) {
return new_def.kind == kind && new_def.type->is_value_compatible_with(*old_type);
}
void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, atomic_cell_view cell) {
if (is_compatible(new_def, old_type, kind) && cell.timestamp() > new_def.dropped_at()) {
dst.apply(new_def, atomic_cell_or_collection(cell));
}
}
void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, collection_mutation_view cell) {
if (!is_compatible(new_def, old_type, kind)) {
return;
}
auto&& ctype = static_pointer_cast<const collection_type_impl>(old_type);
auto old_view = ctype->deserialize_mutation_form(cell);
collection_type_impl::mutation_view new_view;
if (old_view.tomb.timestamp > new_def.dropped_at()) {
new_view.tomb = old_view.tomb;
}
for (auto& c : old_view.cells) {
if (c.second.timestamp() > new_def.dropped_at()) {
new_view.cells.emplace_back(std::move(c));
}
}
dst.apply(new_def, ctype->serialize_mutation_form(std::move(new_view)));
}
public:
converting_mutation_partition_applier(
const column_mapping& visited_column_mapping,
const schema& target_schema,
mutation_partition& target)
: _p_schema(target_schema)
, _p(target)
, _visited_column_mapping(visited_column_mapping)
{ }
virtual void accept_partition_tombstone(tombstone t) override {
_p.apply(t);
}
virtual void accept_static_cell(column_id id, atomic_cell_view cell) override {
const column_mapping_entry& col = _visited_column_mapping.static_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_p._static_row, column_kind::static_column, *def, col.type(), cell);
}
}
virtual void accept_static_cell(column_id id, collection_mutation_view collection) override {
const column_mapping_entry& col = _visited_column_mapping.static_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_p._static_row, column_kind::static_column, *def, col.type(), collection);
}
}
virtual void accept_row_tombstone(clustering_key_prefix_view prefix, tombstone t) override {
_p.apply_row_tombstone(_p_schema, prefix, t);
}
virtual void accept_row(clustering_key_view key, tombstone deleted_at, const row_marker& rm) override {
deletable_row& r = _p.clustered_row(_p_schema, key);
r.apply(rm);
r.apply(deleted_at);
_current_row = &r;
}
virtual void accept_row_cell(column_id id, atomic_cell_view cell) override {
const column_mapping_entry& col = _visited_column_mapping.regular_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_current_row->cells(), column_kind::regular_column, *def, col.type(), cell);
}
}
virtual void accept_row_cell(column_id id, collection_mutation_view collection) override {
const column_mapping_entry& col = _visited_column_mapping.regular_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_current_row->cells(), column_kind::regular_column, *def, col.type(), collection);
}
}
};

View File

@@ -26,15 +26,20 @@ options {
@parser::namespace{cql3_parser}
@lexer::includes {
#include "cql3/error_collector.hh"
#include "cql3/error_listener.hh"
}
@parser::includes {
#include "cql3/selection/writetime_or_ttl.hh"
#include "cql3/statements/alter_table_statement.hh"
#include "cql3/statements/create_keyspace_statement.hh"
#include "cql3/statements/drop_keyspace_statement.hh"
#include "cql3/statements/create_index_statement.hh"
#include "cql3/statements/create_table_statement.hh"
#include "cql3/statements/create_type_statement.hh"
#include "cql3/statements/drop_type_statement.hh"
#include "cql3/statements/alter_type_statement.hh"
#include "cql3/statements/property_definitions.hh"
#include "cql3/statements/drop_table_statement.hh"
#include "cql3/statements/truncate_statement.hh"
@@ -44,6 +49,13 @@ options {
#include "cql3/statements/index_prop_defs.hh"
#include "cql3/statements/use_statement.hh"
#include "cql3/statements/batch_statement.hh"
#include "cql3/statements/create_user_statement.hh"
#include "cql3/statements/alter_user_statement.hh"
#include "cql3/statements/drop_user_statement.hh"
#include "cql3/statements/list_users_statement.hh"
#include "cql3/statements/grant_statement.hh"
#include "cql3/statements/revoke_statement.hh"
#include "cql3/statements/list_permissions_statement.hh"
#include "cql3/statements/index_target.hh"
#include "cql3/statements/ks_prop_defs.hh"
#include "cql3/selection/raw_selector.hh"
@@ -106,10 +118,13 @@ struct uninitialized {
}
@context {
using listener_type = cql3::error_listener<RecognizerType>;
using collector_type = cql3::error_collector<ComponentType, ExceptionBaseType::TokenType, ExceptionBaseType>;
using listener_type = cql3::error_listener<ComponentType, ExceptionBaseType>;
listener_type* listener;
std::vector<::shared_ptr<cql3::column_identifier>> _bind_variables;
std::vector<std::unique_ptr<TokenType>> _missing_tokens;
// Can't use static variable, since it needs to be defined out-of-line
static const std::unordered_set<sstring>& _reserved_type_names() {
@@ -159,15 +174,26 @@ struct uninitialized {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex)
{
std::stringstream msg;
ex->displayRecognitionError(token_names, msg);
listener->syntax_error(*this, msg.str());
listener->syntax_error(*this, token_names, ex);
}
void add_recognition_error(const sstring& msg) {
listener->syntax_error(*this, msg);
}
bool is_eof_token(CommonTokenType token) const
{
return token == CommonTokenType::TOKEN_EOF;
}
std::string token_text(const TokenType* token)
{
if (!token) {
return "";
}
return token->getText();
}
std::map<sstring, sstring> convert_property_map(shared_ptr<cql3::maps::literal> map) {
if (!map || map->entries.empty()) {
return std::map<sstring, sstring>{};
@@ -214,6 +240,13 @@ struct uninitialized {
}
operations.emplace_back(std::move(key), std::move(update));
}
TokenType* getMissingSymbol(IntStreamType* istream, ExceptionBaseType* e,
ANTLR_UINT32 expectedTokenType, BitsetListType* follow) {
auto token = BaseType::getMissingSymbol(istream, e, expectedTokenType, follow);
_missing_tokens.emplace_back(token);
return token;
}
}
@lexer::namespace{cql3_parser}
@@ -231,7 +264,8 @@ struct uninitialized {
}
@lexer::context {
using listener_type = cql3::error_listener<RecognizerType>;
using collector_type = cql3::error_collector<ComponentType, ExceptionBaseType::TokenType, ExceptionBaseType>;
using listener_type = cql3::error_listener<ComponentType, ExceptionBaseType>;
listener_type* listener;
@@ -241,9 +275,20 @@ struct uninitialized {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex)
{
std::stringstream msg;
ex->displayRecognitionError(token_names, msg);
listener->syntax_error(*this, msg.str());
listener->syntax_error(*this, token_names, ex);
}
bool is_eof_token(CommonTokenType token) const
{
return token == CommonTokenType::TOKEN_EOF;
}
std::string token_text(const TokenType* token) const
{
if (!token) {
return "";
}
return std::to_string(int(*token));
}
}
@@ -269,8 +314,11 @@ cqlStatement returns [shared_ptr<parsed_statement> stmt]
| st12=dropTableStatement { $stmt = st12; }
#if 0
| st13=dropIndexStatement { $stmt = st13; }
#endif
| st14=alterTableStatement { $stmt = st14; }
#if 0
| st15=alterKeyspaceStatement { $stmt = st15; }
#endif
| st16=grantStatement { $stmt = st16; }
| st17=revokeStatement { $stmt = st17; }
| st18=listPermissionsStatement { $stmt = st18; }
@@ -278,11 +326,14 @@ cqlStatement returns [shared_ptr<parsed_statement> stmt]
| st20=alterUserStatement { $stmt = st20; }
| st21=dropUserStatement { $stmt = st21; }
| st22=listUsersStatement { $stmt = st22; }
#if 0
| st23=createTriggerStatement { $stmt = st23; }
| st24=dropTriggerStatement { $stmt = st24; }
#endif
| st25=createTypeStatement { $stmt = st25; }
| st26=alterTypeStatement { $stmt = st26; }
| st27=dropTypeStatement { $stmt = st27; }
#if 0
| st28=createFunctionStatement { $stmt = st28; }
| st29=dropFunctionStatement { $stmt = st29; }
| st30=createAggregateStatement { $stmt = st30; }
@@ -692,7 +743,6 @@ cfamOrdering[shared_ptr<cql3::statements::create_table_statement::raw_statement>
;
#if 0
/**
* CREATE TYPE foo (
* <name1> <type1>,
@@ -700,17 +750,16 @@ cfamOrdering[shared_ptr<cql3::statements::create_table_statement::raw_statement>
* ....
* )
*/
createTypeStatement returns [CreateTypeStatement expr]
@init { boolean ifNotExists = false; }
: K_CREATE K_TYPE (K_IF K_NOT K_EXISTS { ifNotExists = true; } )?
tn=userTypeName { $expr = new CreateTypeStatement(tn, ifNotExists); }
createTypeStatement returns [::shared_ptr<create_type_statement> expr]
@init { bool if_not_exists = false; }
: K_CREATE K_TYPE (K_IF K_NOT K_EXISTS { if_not_exists = true; } )?
tn=userTypeName { $expr = ::make_shared<create_type_statement>(tn, if_not_exists); }
'(' typeColumns[expr] ( ',' typeColumns[expr]? )* ')'
;
typeColumns[CreateTypeStatement expr]
: k=ident v=comparatorType { $expr.addDefinition(k, v); }
typeColumns[::shared_ptr<create_type_statement> expr]
: k=ident v=comparatorType { $expr->add_definition(k, v); }
;
#endif
/**
@@ -768,7 +817,7 @@ alterKeyspaceStatement returns [AlterKeyspaceStatement expr]
: K_ALTER K_KEYSPACE ks=keyspaceName
K_WITH properties[attrs] { $expr = new AlterKeyspaceStatement(ks, attrs); }
;
#endif
/**
* ALTER COLUMN FAMILY <CF> ALTER <column> TYPE <newtype>;
@@ -777,24 +826,25 @@ alterKeyspaceStatement returns [AlterKeyspaceStatement expr]
* ALTER COLUMN FAMILY <CF> WITH <property> = <value>;
* ALTER COLUMN FAMILY <CF> RENAME <column> TO <column>;
*/
alterTableStatement returns [AlterTableStatement expr]
alterTableStatement returns [shared_ptr<alter_table_statement> expr]
@init {
AlterTableStatement.Type type = null;
CFPropDefs props = new CFPropDefs();
Map<ColumnIdentifier.Raw, ColumnIdentifier.Raw> renames = new HashMap<ColumnIdentifier.Raw, ColumnIdentifier.Raw>();
boolean isStatic = false;
alter_table_statement::type type;
auto props = make_shared<cql3::statements::cf_prop_defs>();;
std::vector<std::pair<shared_ptr<cql3::column_identifier::raw>, shared_ptr<cql3::column_identifier::raw>>> renames;
bool is_static = false;
}
: K_ALTER K_COLUMNFAMILY cf=columnFamilyName
( K_ALTER id=cident K_TYPE v=comparatorType { type = AlterTableStatement.Type.ALTER; }
| K_ADD id=cident v=comparatorType ({ isStatic=true; } K_STATIC)? { type = AlterTableStatement.Type.ADD; }
| K_DROP id=cident { type = AlterTableStatement.Type.DROP; }
| K_WITH properties[props] { type = AlterTableStatement.Type.OPTS; }
| K_RENAME { type = AlterTableStatement.Type.RENAME; }
id1=cident K_TO toId1=cident { renames.put(id1, toId1); }
( K_AND idn=cident K_TO toIdn=cident { renames.put(idn, toIdn); } )*
( K_ALTER id=cident K_TYPE v=comparatorType { type = alter_table_statement::type::alter; }
| K_ADD id=cident v=comparatorType ({ is_static=true; } K_STATIC)? { type = alter_table_statement::type::add; }
| K_DROP id=cident { type = alter_table_statement::type::drop; }
| K_WITH properties[props] { type = alter_table_statement::type::opts; }
| K_RENAME { type = alter_table_statement::type::rename; }
id1=cident K_TO toId1=cident { renames.emplace_back(id1, toId1); }
( K_AND idn=cident K_TO toIdn=cident { renames.emplace_back(idn, toIdn); } )*
)
{
$expr = new AlterTableStatement(cf, type, id, v, props, renames, isStatic);
$expr = ::make_shared<alter_table_statement>(std::move(cf), type, std::move(id),
std::move(v), std::move(props), std::move(renames), is_static);
}
;
@@ -803,20 +853,22 @@ alterTableStatement returns [AlterTableStatement expr]
* ALTER TYPE <name> ADD <field> <newtype>;
* ALTER TYPE <name> RENAME <field> TO <newtype> AND ...;
*/
alterTypeStatement returns [AlterTypeStatement expr]
alterTypeStatement returns [::shared_ptr<alter_type_statement> expr]
: K_ALTER K_TYPE name=userTypeName
( K_ALTER f=ident K_TYPE v=comparatorType { $expr = AlterTypeStatement.alter(name, f, v); }
| K_ADD f=ident v=comparatorType { $expr = AlterTypeStatement.addition(name, f, v); }
( K_ALTER f=ident K_TYPE v=comparatorType { $expr = ::make_shared<alter_type_statement::add_or_alter>(name, false, f, v); }
| K_ADD f=ident v=comparatorType { $expr = ::make_shared<alter_type_statement::add_or_alter>(name, true, f, v); }
| K_RENAME
{ Map<ColumnIdentifier, ColumnIdentifier> renames = new HashMap<ColumnIdentifier, ColumnIdentifier>(); }
id1=ident K_TO toId1=ident { renames.put(id1, toId1); }
( K_AND idn=ident K_TO toIdn=ident { renames.put(idn, toIdn); } )*
{ $expr = AlterTypeStatement.renames(name, renames); }
{ $expr = ::make_shared<alter_type_statement::renames>(name); }
renames[{ static_pointer_cast<alter_type_statement::renames>($expr) }]
)
;
#endif
renames[::shared_ptr<alter_type_statement::renames> expr]
: fromId=ident K_TO toId=ident { $expr->add_rename(fromId, toId); }
( K_AND renames[$expr] )?
;
/**
* DROP KEYSPACE [IF EXISTS] <KSP>;
*/
@@ -833,15 +885,15 @@ dropTableStatement returns [::shared_ptr<drop_table_statement> stmt]
: K_DROP K_COLUMNFAMILY (K_IF K_EXISTS { if_exists = true; } )? cf=columnFamilyName { $stmt = ::make_shared<drop_table_statement>(cf, if_exists); }
;
#if 0
/**
* DROP TYPE <name>;
*/
dropTypeStatement returns [DropTypeStatement stmt]
@init { boolean ifExists = false; }
: K_DROP K_TYPE (K_IF K_EXISTS { ifExists = true; } )? name=userTypeName { $stmt = new DropTypeStatement(name, ifExists); }
dropTypeStatement returns [::shared_ptr<drop_type_statement> stmt]
@init { bool if_exists = false; }
: K_DROP K_TYPE (K_IF K_EXISTS { if_exists = true; } )? name=userTypeName { $stmt = ::make_shared<drop_type_statement>(name, if_exists); }
;
#if 0
/**
* DROP INDEX [IF EXISTS] <INDEX_NAME>
*/
@@ -856,123 +908,121 @@ dropIndexStatement returns [DropIndexStatement expr]
* TRUNCATE <CF>;
*/
truncateStatement returns [::shared_ptr<truncate_statement> stmt]
: K_TRUNCATE cf=columnFamilyName { $stmt = ::make_shared<truncate_statement>(cf); }
: K_TRUNCATE (K_COLUMNFAMILY)? cf=columnFamilyName { $stmt = ::make_shared<truncate_statement>(cf); }
;
#if 0
/**
* GRANT <permission> ON <resource> TO <username>
*/
grantStatement returns [GrantStatement stmt]
grantStatement returns [::shared_ptr<grant_statement> stmt]
: K_GRANT
permissionOrAll
K_ON
resource
K_TO
username
{ $stmt = new GrantStatement($permissionOrAll.perms, $resource.res, $username.text); }
{ $stmt = ::make_shared<grant_statement>($permissionOrAll.perms, $resource.res, $username.text); }
;
/**
* REVOKE <permission> ON <resource> FROM <username>
*/
revokeStatement returns [RevokeStatement stmt]
revokeStatement returns [::shared_ptr<revoke_statement> stmt]
: K_REVOKE
permissionOrAll
K_ON
resource
K_FROM
username
{ $stmt = new RevokeStatement($permissionOrAll.perms, $resource.res, $username.text); }
{ $stmt = ::make_shared<revoke_statement>($permissionOrAll.perms, $resource.res, $username.text); }
;
listPermissionsStatement returns [ListPermissionsStatement stmt]
listPermissionsStatement returns [::shared_ptr<list_permissions_statement> stmt]
@init {
IResource resource = null;
String username = null;
boolean recursive = true;
std::experimental::optional<auth::data_resource> r;
std::experimental::optional<sstring> u;
bool recursive = true;
}
: K_LIST
permissionOrAll
( K_ON resource { resource = $resource.res; } )?
( K_OF username { username = $username.text; } )?
( K_ON resource { r = $resource.res; } )?
( K_OF username { u = sstring($username.text); } )?
( K_NORECURSIVE { recursive = false; } )?
{ $stmt = new ListPermissionsStatement($permissionOrAll.perms, resource, username, recursive); }
{ $stmt = ::make_shared<list_permissions_statement>($permissionOrAll.perms, std::move(r), std::move(u), recursive); }
;
permission returns [Permission perm]
permission returns [auth::permission perm]
: p=(K_CREATE | K_ALTER | K_DROP | K_SELECT | K_MODIFY | K_AUTHORIZE)
{ $perm = Permission.valueOf($p.text.toUpperCase()); }
{ $perm = auth::permissions::from_string($p.text); }
;
permissionOrAll returns [Set<Permission> perms]
: K_ALL ( K_PERMISSIONS )? { $perms = Permission.ALL_DATA; }
| p=permission ( K_PERMISSION )? { $perms = EnumSet.of($p.perm); }
permissionOrAll returns [auth::permission_set perms]
: K_ALL ( K_PERMISSIONS )? { $perms = auth::permissions::ALL_DATA; }
| p=permission ( K_PERMISSION )? { $perms = auth::permission_set::from_mask(auth::permission_set::mask_for($p.perm)); }
;
resource returns [IResource res]
resource returns [auth::data_resource res]
: r=dataResource { $res = $r.res; }
;
dataResource returns [DataResource res]
: K_ALL K_KEYSPACES { $res = DataResource.root(); }
| K_KEYSPACE ks = keyspaceName { $res = DataResource.keyspace($ks.id); }
dataResource returns [auth::data_resource res]
: K_ALL K_KEYSPACES { $res = auth::data_resource(); }
| K_KEYSPACE ks = keyspaceName { $res = auth::data_resource($ks.id); }
| ( K_COLUMNFAMILY )? cf = columnFamilyName
{ $res = DataResource.columnFamily($cf.name.getKeyspace(), $cf.name.getColumnFamily()); }
{ $res = auth::data_resource($cf.name->get_keyspace(), $cf.name->get_column_family()); }
;
/**
* CREATE USER [IF NOT EXISTS] <username> [WITH PASSWORD <password>] [SUPERUSER|NOSUPERUSER]
*/
createUserStatement returns [CreateUserStatement stmt]
createUserStatement returns [::shared_ptr<create_user_statement> stmt]
@init {
UserOptions opts = new UserOptions();
boolean superuser = false;
boolean ifNotExists = false;
auto opts = ::make_shared<cql3::user_options>();
bool superuser = false;
bool ifNotExists = false;
}
: K_CREATE K_USER (K_IF K_NOT K_EXISTS { ifNotExists = true; })? username
( K_WITH userOptions[opts] )?
( K_SUPERUSER { superuser = true; } | K_NOSUPERUSER { superuser = false; } )?
{ $stmt = new CreateUserStatement($username.text, opts, superuser, ifNotExists); }
{ $stmt = ::make_shared<create_user_statement>($username.text, std::move(opts), superuser, ifNotExists); }
;
/**
* ALTER USER <username> [WITH PASSWORD <password>] [SUPERUSER|NOSUPERUSER]
*/
alterUserStatement returns [AlterUserStatement stmt]
alterUserStatement returns [::shared_ptr<alter_user_statement> stmt]
@init {
UserOptions opts = new UserOptions();
Boolean superuser = null;
auto opts = ::make_shared<cql3::user_options>();
std::experimental::optional<bool> superuser;
}
: K_ALTER K_USER username
( K_WITH userOptions[opts] )?
( K_SUPERUSER { superuser = true; } | K_NOSUPERUSER { superuser = false; } )?
{ $stmt = new AlterUserStatement($username.text, opts, superuser); }
{ $stmt = ::make_shared<alter_user_statement>($username.text, std::move(opts), std::move(superuser)); }
;
/**
* DROP USER [IF EXISTS] <username>
*/
dropUserStatement returns [DropUserStatement stmt]
@init { boolean ifExists = false; }
: K_DROP K_USER (K_IF K_EXISTS { ifExists = true; })? username { $stmt = new DropUserStatement($username.text, ifExists); }
dropUserStatement returns [::shared_ptr<drop_user_statement> stmt]
@init { bool ifExists = false; }
: K_DROP K_USER (K_IF K_EXISTS { ifExists = true; })? username { $stmt = ::make_shared<drop_user_statement>($username.text, ifExists); }
;
/**
* LIST USERS
*/
listUsersStatement returns [ListUsersStatement stmt]
: K_LIST K_USERS { $stmt = new ListUsersStatement(); }
listUsersStatement returns [::shared_ptr<list_users_statement> stmt]
: K_LIST K_USERS { $stmt = ::make_shared<list_users_statement>(); }
;
userOptions[UserOptions opts]
userOptions[::shared_ptr<cql3::user_options> opts]
: userOption[opts]
;
userOption[UserOptions opts]
: k=K_PASSWORD v=STRING_LITERAL { opts.put($k.text, $v.text); }
userOption[::shared_ptr<cql3::user_options> opts]
: k=K_PASSWORD v=STRING_LITERAL { opts->put($k.text, $v.text); }
;
#endif
/** DEFINITIONS **/
@@ -1151,7 +1201,8 @@ columnOperation[operations_type& operations]
columnOperationDifferentiator[operations_type& operations, ::shared_ptr<cql3::column_identifier::raw> key]
: '=' normalColumnOperation[operations, key]
| '[' k=term ']' specializedColumnOperation[operations, key, k]
| '[' k=term ']' specializedColumnOperation[operations, key, k, false]
| '[' K_SCYLLA_TIMEUUID_LIST_INDEX '(' k=term ')' ']' specializedColumnOperation[operations, key, k, true]
;
normalColumnOperation[operations_type& operations, ::shared_ptr<cql3::column_identifier::raw> key]
@@ -1193,11 +1244,12 @@ normalColumnOperation[operations_type& operations, ::shared_ptr<cql3::column_ide
specializedColumnOperation[std::vector<std::pair<shared_ptr<cql3::column_identifier::raw>,
shared_ptr<cql3::operation::raw_update>>>& operations,
shared_ptr<cql3::column_identifier::raw> key,
shared_ptr<cql3::term::raw> k]
shared_ptr<cql3::term::raw> k,
bool by_uuid]
: '=' t=term
{
add_raw_update(operations, key, make_shared<cql3::operation::set_element>(k, t));
add_raw_update(operations, key, make_shared<cql3::operation::set_element>(k, t, by_uuid));
}
;
@@ -1224,8 +1276,8 @@ properties[::shared_ptr<cql3::statements::property_definitions> props]
;
property[::shared_ptr<cql3::statements::property_definitions> props]
: k=ident '=' (simple=propertyValue { try { $props->add_property(k->to_string(), simple); } catch (exceptions::syntax_exception e) { add_recognition_error(e.what()); } }
| map=mapLiteral { try { $props->add_property(k->to_string(), convert_property_map(map)); } catch (exceptions::syntax_exception e) { add_recognition_error(e.what()); } })
: k=ident '=' simple=propertyValue { try { $props->add_property(k->to_string(), simple); } catch (exceptions::syntax_exception e) { add_recognition_error(e.what()); } }
| k=ident '=' map=mapLiteral { try { $props->add_property(k->to_string(), convert_property_map(map)); } catch (exceptions::syntax_exception e) { add_recognition_error(e.what()); } }
;
propertyValue returns [sstring str]
@@ -1243,6 +1295,7 @@ relationType returns [const cql3::operator_type* op = nullptr]
;
relation[std::vector<cql3::relation_ptr>& clauses]
@init{ const cql3::operator_type* rt = nullptr; }
: name=cident type=relationType t=term { $clauses.emplace_back(::make_shared<cql3::single_column_relation>(std::move(name), *type, std::move(t))); }
| K_TOKEN l=tupleOfIdentifiers type=relationType t=term
@@ -1252,11 +1305,9 @@ relation[std::vector<cql3::relation_ptr>& clauses]
{ $clauses.emplace_back(make_shared<cql3::single_column_relation>(std::move(name), cql3::operator_type::IN, std::move(marker))); }
| name=cident K_IN in_values=singleColumnInValues
{ $clauses.emplace_back(cql3::single_column_relation::create_in_relation(std::move(name), std::move(in_values))); }
#if 0
| name=cident K_CONTAINS { Operator rt = Operator.CONTAINS; } (K_KEY { rt = Operator.CONTAINS_KEY; })?
t=term { $clauses.add(new SingleColumnRelation(name, rt, t)); }
| name=cident '[' key=term ']' type=relationType t=term { $clauses.add(new SingleColumnRelation(name, key, type, t)); }
#endif
| name=cident K_CONTAINS { rt = &cql3::operator_type::CONTAINS; } (K_KEY { rt = &cql3::operator_type::CONTAINS_KEY; })?
t=term { $clauses.emplace_back(make_shared<cql3::single_column_relation>(std::move(name), *rt, std::move(t))); }
| name=cident '[' key=term ']' type=relationType t=term { $clauses.emplace_back(make_shared<cql3::single_column_relation>(std::move(name), std::move(key), *type, std::move(t))); }
| ids=tupleOfIdentifiers
( K_IN
( '(' ')'
@@ -1378,12 +1429,10 @@ tuple_type returns [shared_ptr<cql3::cql3_type::raw> t]
'>' { $t = cql3::cql3_type::raw::tuple(std::move(types)); }
;
#if 0
username
: IDENT
| STRING_LITERAL
;
#endif
// Basically the same as cident, but we need to exlude existing CQL3 types
// (which for some reason are not reserved otherwise)
@@ -1562,6 +1611,8 @@ K_OR: O R;
K_REPLACE: R E P L A C E;
K_DETERMINISTIC: D E T E R M I N I S T I C;
K_SCYLLA_TIMEUUID_LIST_INDEX: S C Y L L A '_' T I M E U U I D '_' L I S T '_' I N D E X;
// Case-insensitive alpha characters
fragment A: ('a'|'A');
fragment B: ('b'|'B');
@@ -1607,20 +1658,17 @@ STRING_LITERAL
setText(txt);
}
:
// FIXME:
#if 0
/* pg-style string literal */
(
'\$' '\$'
( /* collect all input until '$$' is reached again */
{ (input.size() - input.index() > 1)
&& !"$$".equals(input.substring(input.index(), input.index() + 1)) }?
=> c=. { txt.appendCodePoint(c); }
'$' '$'
(
(c=~('$') { txt.push_back(c); })
|
('$' (c=~('$') { txt.push_back('$'); txt.push_back(c); }))
)*
'\$' '\$'
'$' '$'
)
|
#endif
/* conventional quoted string literal */
(
'\'' (c=~('\'') { txt.push_back(c);} | '\'' '\'' { txt.push_back('\''); })* '\''

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2014 Cloudius Systems
* Copyright (C) 2014 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -80,7 +80,7 @@ int64_t attributes::get_timestamp(int64_t now, const query_options& options) {
} catch (marshal_exception e) {
throw exceptions::invalid_request_exception("Invalid timestamp value");
}
return boost::any_cast<int64_t>(data_type_for<int64_t>()->deserialize(*tval));
return value_cast<int64_t>(data_type_for<int64_t>()->deserialize(*tval));
}
int32_t attributes::get_time_to_live(const query_options& options) {
@@ -99,7 +99,7 @@ int32_t attributes::get_time_to_live(const query_options& options) {
throw exceptions::invalid_request_exception("Invalid TTL value");
}
auto ttl = boost::any_cast<int32_t>(data_type_for<int32_t>()->deserialize(*tval));
auto ttl = value_cast<int32_t>(data_type_for<int32_t>()->deserialize(*tval));
if (ttl < 0) {
throw exceptions::invalid_request_exception("A TTL must be greater or equal to 0");
}

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -737,7 +737,7 @@ public:
/** A condition on a collection element. For example: "IF col['key'] = 'foo'" */
static ::shared_ptr<raw> collection_condition(::shared_ptr<term::raw> value, ::shared_ptr<term::raw> collection_element,
const operator_type& op) {
return ::make_shared<raw>(std::move(value), std::vector<::shared_ptr<term::raw>>{}, ::shared_ptr<abstract_marker::in_raw>{}, std::move(collection_element), operator_type::IN);
return ::make_shared<raw>(std::move(value), std::vector<::shared_ptr<term::raw>>{}, ::shared_ptr<abstract_marker::in_raw>{}, std::move(collection_element), op);
}
/** An IN condition on a collection element. For example: "IF col['key'] IN ('foo', 'bar', ...)" */

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*/
/*
@@ -121,3 +121,7 @@ column_identifier::new_selector_factory(database& db, schema_ptr schema, std::ve
}
}
bool cql3::column_identifier::text_comparator::operator()(const cql3::column_identifier& c1, const cql3::column_identifier& c2) const {
return c1.text() < c2.text();
}

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -55,15 +55,17 @@ namespace cql3 {
* Represents an identifer for a CQL column definition.
* TODO : should support light-weight mode without text representation for when not interned
*/
class column_identifier final : public selection::selectable /* implements IMeasurableMemory*/ {
class column_identifier final : public selection::selectable {
public:
bytes bytes_;
private:
sstring _text;
#if 0
private static final long EMPTY_SIZE = ObjectSizes.measure(new ColumnIdentifier("", true));
#endif
public:
// less comparator sorting by text
struct text_comparator {
bool operator()(const column_identifier& c1, const column_identifier& c2) const;
};
column_identifier(sstring raw_text, bool keep_case);
column_identifier(bytes bytes_, data_type type);
@@ -83,20 +85,6 @@ public:
}
#if 0
public long unsharedHeapSize()
{
return EMPTY_SIZE
+ ObjectSizes.sizeOnHeapOf(bytes)
+ ObjectSizes.sizeOf(text);
}
public long unsharedHeapSizeExcludingData()
{
return EMPTY_SIZE
+ ObjectSizes.sizeOnHeapExcludingData(bytes)
+ ObjectSizes.sizeOf(text);
}
public ColumnIdentifier clone(AbstractAllocator allocator)
{
return new ColumnIdentifier(allocator.clone(bytes), text);

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -160,7 +160,7 @@ void constants::deleter::execute(mutation& m, const exploded_clustering_prefix&
auto ctype = static_pointer_cast<const collection_type_impl>(column.type);
m.set_cell(prefix, column, atomic_cell_or_collection::from_collection_mutation(ctype->serialize_mutation_form(coll_m)));
} else {
m.set_cell(prefix, column, params.make_dead_cell());
m.set_cell(prefix, column, make_dead_cell(params));
}
}

View File

@@ -17,9 +17,9 @@
*/
/*
* Copyright 2015 Cloudius Systems
* Copyright (C) 2015 ScyllaDB
*
* Modified by Cloudius Systems
* Modified by ScyllaDB
*/
/*
@@ -197,7 +197,7 @@ public:
virtual void execute(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) override {
auto value = _t->bind_and_get(params._options);
auto cell = value ? params.make_cell(*value) : params.make_dead_cell();
auto cell = value ? make_cell(*value, params) : make_dead_cell(params);
m.set_cell(prefix, column, std::move(cell));
}
};

View File

@@ -1,5 +1,5 @@
/*
* Copyright (C) 2014 Cloudius Systems, Ltd.
* Copyright (C) 2014 ScyllaDB
*/
/*
@@ -148,7 +148,7 @@ public:
try {
auto&& ks = db.find_keyspace(_name.get_keyspace());
try {
auto&& type = ks._user_types.get_type(_name.get_user_type_name());
auto&& type = ks.metadata()->user_types()->get_type(_name.get_user_type_name());
if (!_frozen) {
throw exceptions::invalid_request_exception("Non-frozen User-Defined types are not supported, please use frozen<>");
}

Some files were not shown because too many files have changed in this diff Show More