Compare commits


452 Commits

Author SHA1 Message Date
Nadav Har'El
3d2e6b9d0d sstable: fix use-after-free of temporary ioclass copy
Commit 6a3872b355 fixed some use-after-free
bugs but introduced a new one because of a typo:

Instead of capturing a reference to the long-living io-class object, as
all the code does, one place in the code accidentally captured a *copy*
of this object. This copy had a very temporary life, and when a reference
to that *copy* was passed to sstable reading code which assumed that it
lives at least as long as the read call, a use-after-free resulted.

Fixes #1072

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1458595629-9314-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit 2eb0627665)
2016-03-22 08:09:25 +02:00
Pekka Enberg
827b87b7e2 main: Defer API server hooks until commitlog replay
Defer registering services to the API server until commitlog has been
replayed to ensure that nobody is able to trigger sstable operations via
'nodetool' before we are ready for them.
Message-Id: <1458116227-4671-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit 972fc6e014)
2016-03-18 09:20:40 +02:00
Paweł Dziepak
c4b24e4a0b lsa: update _closed_occupancy after freeing all segments
_closed_occupancy will be used when a region is removed from its region
group, so make sure that it is accurate.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
(cherry picked from commit 338fd34770)
2016-03-18 08:11:19 +02:00
Glauber Costa
2f91375d36 sstables: do not assume mutation_reader will be kept alive
Our sstables::mutation_reader has a specialization in which start and end
ranges are passed as futures. That is needed because we may have to read the
index file for those.

This works well under the assumption that every mutation_reader that is
created will be used, since whoever is using it will surely keep the state
of the reader alive.

However, that assumption has not held for a while. We use a reader
interface for reading everything from mutations and sstables to cache entries,
and when we create an sstable mutation_reader, that does not mean we'll use it.
In fact we won't, if the read can be serviced first by a higher-level entity.

If that happens to be the case, the reader will be destructed. However, since
it may take more time than that for the start and end futures to resolve, by
the time they are resolved the state of the mutation reader will no longer be
valid.

The proposed fix for that is to only resolve the future inside
mutation_reader's read() function. If that function is called,  we can have a
reasonable expectation that the caller object is being kept alive.

A second way to fix this would be to force the mutation reader to be kept alive
by transforming it into a shared pointer and acquiring a reference to itself.
However, because the reader may turn out not to be used, the delayed read
actually has the advantage of not even reading anything from the disk if there
is no need for it.

Also, because sstables can be compacted, we can't guarantee that the sst object
itself, used in the resolution of start and end, stays alive, so it has the
same problem. Delaying those calls solves that as well. We assume here that
the outer reader is keeping the SSTable object alive.

I must note that I have not reproduced this problem. The above is the
result of the analysis we made in #1036. That being the case, a thorough
review is appreciated.

Fixes #1036

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <a7e4e722f76774d0b1f263d86c973061fb7fe2f2.1458135770.git.glauber@scylladb.com>
(cherry picked from commit 6a3872b355)
2016-03-18 07:56:28 +02:00
Asias He
38470ca6e8 main: Defer initialization of streaming
Streaming is used by bootstrap and repair. Streaming uses the storage_proxy
class to apply the frozen_mutation and the db/column_family class to
invalidate the row cache. Defer the initialization until just before repair
and bootstrap init.
Message-Id: <8e99cf443239dd8e17e6b6284dab171f7a12365c.1458034320.git.asias@scylladb.com>

(cherry picked from commit d79dbfd4e8)
2016-03-15 11:59:16 +02:00
Pekka Enberg
5bb25954b4 main: Defer REPAIR_CHECKSUM_RANGE RPC verb registration after commitlog replay
Register the REPAIR_CHECKSUM_RANGE messaging service verb handler after
we have replayed the commitlog to avoid responding with bogus checksums.
Message-Id: <1458027934-8546-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit eb13f65949)
2016-03-15 11:59:10 +02:00
Gleb Natapov
1d9ca3ef1f main: Defer storage proxy RPC verb registration after commitlog replay
Message-Id: <20160315071229.GM6117@scylladb.com>
(cherry picked from commit 5076f4878b)
2016-03-15 09:41:21 +02:00
Gleb Natapov
cb97e5dfe8 messaging: enable keepalive tcp option for inter-node communication
Some network equipment that does TCP session tracking tends to drop TCP
sessions after a period of inactivity. Use the keepalive mechanism to
prevent this from happening to our inter-node communication.

Message-Id: <20160314173344.GI31837@scylladb.com>
(cherry picked from commit e228ef1bd9)
2016-03-14 20:33:12 +02:00
Pekka Enberg
831b5af999 Merge scylla-seastar branch-0.18
* seastar 60643a0...e039c46 (2):
  > rpc: allow configuring keepalive for rpc client
  > net: add keepalive configuration to socket interface
2016-03-14 20:32:52 +02:00
Pekka Enberg
7f1048efb4 main: Defer migration manager RPC verb registration after commitlog replay
Defer registering migration manager RPC verbs until commitlog has
been replayed so that our own schema is fully loaded before other
nodes start querying it or sending schema updates.
Message-Id: <1457971028-7325-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit 1429213b4c)
2016-03-14 20:11:04 +02:00
Glauber Costa
510b1a3afc main: when scanning SSTables, run shard 0 first
Deletion of previous stale, temporary SSTables is done by Shard0. Therefore,
let's run Shard0 first. Technically, we could just have all shards agree on
the deletion and delete it later, but that is prone to races.

Those races are not supposed to happen during normal operation, but if we have
bugs, they can. Scylla's GitHub Issue #1014 is an example of a situation where
that can happen, making existing problems worse. So running a single shard
first and making sure that all temporary sstables are deleted provides
extra protection against such situations.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit 6c4e31bbdb)
2016-03-14 20:10:56 +02:00
Gleb Natapov
dd831f6463 make initialization run in a thread
While looking at the initialization code I felt like my head was going to
explode. Moving initialization into a thread makes things a little bit
better. Only lightly tested.

Message-Id: <20160310163142.GE28529@scylladb.com>
(cherry picked from commit 16135c2084)
2016-03-14 20:10:48 +02:00
Gleb Natapov
8bf59afb42 fix developer-mode parameter application on SMP
I am almost sure we want to apply it once on each shard, and not multiple
times on a single shard.

Message-Id: <20160310155804.GB28529@scylladb.com>
(cherry picked from commit 176aa25d35)
2016-03-14 20:10:37 +02:00
Avi Kivity
f29bc8918b main: sanity check cpu support
We require SSE 4.2 (for commitlog CRC32), verify it exists early and bail
out if it does not.

We need to check early, because the compiler may use newer instructions
in the generated code; the earlier we check, the lower the probability
we hit an undefined opcode exception.

Message-Id: <1456665401-18252-1-git-send-email-avi@scylladb.com>
(cherry picked from commit a1ff21f6ea)
2016-03-14 20:10:29 +02:00
Takuya ASADA
4c6d655e99 main: notify service start completion earlier, to reduce systemd unit startup time
Fixes #910

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455830245-11782-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 0f87922aa6)
2016-03-14 20:10:19 +02:00
Nadav Har'El
fafe166d2c repair: stop ongoing repairs during shutdown
When shutting down a node gracefully, this patch asks all ongoing repairs
started on this node to stop as soon as possible (without completing
their work), and then waits for these repairs to finish (with failure,
usually, because they didn't complete).

We need to do this, because if the repair loop continues to run while we
start destructing the various services it relies on, it can crash (as
reported in #699, although the specific crash reported there no longer
occurs after some changes in the streaming code). Additionally, it is
important to stop the ongoing repair, and not wait for it to complete
its normal operation, because that can take a very long time, and shutdown
is not supposed to take more than a few seconds.

Fixes #699.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1455218873-6201-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit 7dc843fc1c)
2016-03-14 20:10:13 +02:00
Avi Kivity
3380340750 Merge scylla-seastar branch-0.18
* seastar 353b1a1...60643a0 (2):
  > rpc: make client connection error more clear
  > reactor: fix work item leak in syscall work queue
2016-03-14 20:04:03 +02:00
Avi Kivity
4d3dac7f98 gitmodules: point seastar submodule at scylla-seastar repository
Prepare for branch-0.18 specific seastar commits.
2016-03-14 20:02:46 +02:00
Pekka Enberg
7f6891341e release: prepare for 0.18.2 2016-03-14 16:02:25 +02:00
Glauber Costa
ece77cce90 database: turn sstable generation number into an optional
This patch makes sure that every time we need to create a new generation number
(the very first step in the creation of a new SSTable), the respective CF is
already initialized and populated. Failure to do so can lead to data being
overwritten. Extensive details about why this is important can be found
in Scylla's GitHub Issue #1014.

Nothing should be writing to SSTables before we have had the chance to populate
the existing SSTables and calculate what the next generation number should be.

However, if that happens, we want to protect against it in a way that does not
involve overwriting existing tables. This is one of the ways to do it: every
column family starts in an unwriteable state, and when it can finally be written
to, we mark it as writeable.

Note that this *cannot* be a part of add_column_family. That adds a column family
to a db in memory only, and if anybody is about to write to a CF, that was most
likely already called. We need to call this explicitly when we are sure we're ready
to issue disk operations safely.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit a339296385)
2016-03-14 15:52:52 +02:00
Glauber Costa
d4a10a0a3c database: remove unused parameter
We are no longer using the in_flight_seals gate, but forgot to remove it.
To guarantee that all seal operations have finished when we're done,
we use the memtable_flush_queue, which also guarantees ordering; the old
gate was never removed.

The FIXME code should also be removed, since such an interface does exist now.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit 8eb4e69053)
2016-03-14 15:51:14 +02:00
Glauber Costa
e885eacbe4 column_family: do not open code generation calculation
We already have a function that wraps this, re-use it.  This FIXME is still
relevant, so just move it there. Let's not lose it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit 94e90d4a17)
2016-03-14 15:51:06 +02:00
Glauber Costa
3f67277804 column_family: remove mutation_count
We use memory usage as a threshold these days, and nowhere is _mutation_count
checked. Get rid of it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit 46fdeec60a)
2016-03-14 15:50:57 +02:00
Asias He
05aea2b65a storage_service: Fix pending_range_calculator_service
Since calculate_pending_ranges will modify token_metadata, we need to
replicate it to other shards. With this patch, when we call
calculate_pending_ranges, token_metadata will be replicated to the other,
non-zero shards.

In addition, it is not useful as a standalone class. We can merge it
into the storage_service. Kill one singleton class.

Fixes #1033
Refs #962
Message-Id: <fb5b26311cafa4d315eb9e72d823c5ade2ab4bda.1457943074.git.asias@scylladb.com>

(cherry picked from commit 9f64c36a08)
2016-03-14 14:39:39 +02:00
Vlad Zolotarov
a2751a9592 sstables: properly account removal requests
The same shard may create an sstables::sstable object, for the same SStable
that doesn't belong to it, more than once, and mark it
for deletion (e.g. in a 'nodetool refresh' flow).

In that case the destructor of sstables::sstable counted
deletion requests from the same shard more than once, since it used a simple
counter incremented on each deletion request, while it should have counted
requests from the same shard as a single request. This matters because
the removal logic waited for all shards to agree on the removal of a specific
SStable by comparing the counter mentioned above to the total
number of shards; once they were equal, the SStable files were actually removed.

This patch fixes the problem by replacing the counter with a
std::unordered_set<unsigned> that stores the shard ids of the shards
requesting the deletion of the sstable object, and compares the size() of
this set to smp::count in order to decide whether to actually delete the
corresponding SStable files.

Fixes #1004

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1457886812-32345-1-git-send-email-vladz@cloudius-systems.com>
(cherry picked from commit ce47fcb1ba)
2016-03-14 14:38:17 +02:00
Raphael S. Carvalho
eda8732b8e sstables: make write_simple() safer by using exclusive flag
We should guarantee that write_simple() will not try to overwrite
an existing file.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <194bd055f1f2dc1bb9766a67225ec38c88e7b005.1457818073.git.raphaelsc@scylladb.com>
(cherry picked from commit 1ff7d32272)
2016-03-14 14:38:07 +02:00
Raphael S. Carvalho
b24f5ece1f sstables: fix race condition when writing to the same sstable in parallel
When we are about to write a new sstable, we check if the sstable exists
by checking if respective TOC exists. That check was added to handle a
possible attempt to write a new sstable with a generation being used.
Gleb was worried that a TOC could appear after the check, and that's indeed
possible if there is an ongoing sstable write that uses the same generation
(running in parallel).
If a TOC appears after the check, we would again clobber an existing sstable
with a temporary one, and the user wouldn't be able to boot scylla anymore
without manual intervention.

Then Nadav proposed the following solution:
"We could do this by the following variant of Raphael's idea:

   1. create .txt.tmp unconditionally, as before the commit 031bf57c1
(if we can't create it, fail).
   2. Now confirm that .txt does not exist. If it does, delete the .txt.tmp
we just created and fail.
   3. continue as usual
   4. and at the end, as before, rename .txt.tmp to .txt.

The key to solving the race is step 1: Since we created .txt.tmp in step 1
and know this creation succeeded, we know that we cannot be running in
parallel with another writer - because such a writer too would have tried to
create the same file, and kept it existing until the very last step of its
work (step 4)."

This patch implements the solution described above.
Let me also note that the race is theoretical, and scylla has not been
affected by it so far.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <ef630f5ac1bd0d11632c343d9f77a5f6810d18c1.1457818331.git.raphaelsc@scylladb.com>
(cherry picked from commit 0af786f3ea)
2016-03-14 14:37:58 +02:00
Raphael S. Carvalho
1322ec6d6b sstables: bail out if toc exists for generation used by write_components
Currently, if sstable::write_components() is called to write a new sstable
using the same generation as an sstable that exists, a temporary TOC will
be unconditionally created. Afterwards, the same sstable::write_components()
will fail when it reaches sstable::create_data(), for the obvious reason
that a data component already exists for that generation (in this scenario).
After that, the user will not be able to boot scylla anymore, because there is
a generation with both a TOC and a temporary TOC. We cannot simply remove a
generation with a TOC and a temporary TOC, because user data would be lost
(again, in this scenario). After all, the temporary TOC was only created
because sstable::write_components() was wrongly called with the generation of
an sstable that exists.

Solution proposed by this patch is to trigger exception if a TOC file
exists for the generation used.

Some SSTable unit tests were also changed to guarantee that we don't try
to overwrite components of an existing sstable.

Refs #1014.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <caffc4e19cdcf25e4c6b9dd277d115422f8246c4.1457643565.git.raphaelsc@scylladb.com>
(cherry picked from commit 031bf57c19)
2016-03-14 14:37:50 +02:00
Glauber Costa
efbf51c00b sstables: improve error messages
The standard C++ exception messages thrown when anything goes wrong while
writing the file are suboptimal: they barely tell us the name of the
failing file.

Use a specialized create function so that we can capture that better.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit f2a8bcabc2)
2016-03-14 14:37:41 +02:00
Pekka Enberg
5d901b19c4 main: Initialize system keyspace earlier
We start services like gossiper before system keyspace is initialized
which means we can start writing too early. Shuffle code so that system
keyspace is initialized earlier.

Refs #1014
Message-Id: <1457593758-9444-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit 5dd1fda6cf)
2016-03-14 13:47:18 +02:00
Tomasz Grabiec
7085fc95d1 log: Fix operator<<(std::ostream&, const std::exception_ptr&)
An attempt to print std::nested_exception currently results in an exception
leaking outside the printer. Fix by catching all exceptions in the
final catch block.

For a nested exception, the logger will now print just
"std::nested_exception". For nested exceptions specifically we should
log more, but that is a separate problem to solve.
Message-Id: <1457532215-7498-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit 838a038cbd)
2016-03-09 16:11:14 +02:00
Pekka Enberg
776908fbf6 types: Implement to_string for timestamps and dates
The to_string() function is used for logging purposes, so use boost's
to_iso_extended_string() to format both timestamps and dates.

Fixes #968 (showstopper)
Message-Id: <1457528755-6164-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit ab502bcfa8)
2016-03-09 16:10:02 +02:00
Gleb Natapov
5f7f276ef6 fix EACH_QUORUM handling during bootstrapping
Currently write acknowledgements handling does not take bootstrapping
node into account for CL=EACH_QUORUM. The patch fixes it.

Fixes #994

Message-Id: <20160307121620.GR2253@scylladb.com>
(cherry picked from commit 626c9d046b)
2016-03-08 13:35:10 +02:00
Paweł Dziepak
5a38f3cbfd lsa: set _active to nullptr in region destructor
In the region destructor, after the active segment is freed, the pointer to
it is left unchanged. This confuses the remaining parts of the destructor
logic (namely, removal from the region group), which may rely on the
information in region_impl::_active.

In this particular case the problem was that code removing from the
region group called region_impl::occupancy() which was
dereferencing _active if not null.

Fixes #993.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1457341670-18266-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 99b61d3944)
2016-03-08 13:32:30 +02:00
Tomasz Grabiec
2d4309a926 validation: Fix validation of empty partition key
The validation was wrongly assuming that an empty thrift key, which
the original C* code guards against, can only correspond to an empty
representation of our partition_key. This no longer holds after:

   commit 095efd01d6
   "keys: Make from_exploded() and components() work without schema"

This was responsible for dtest failure:
cql_additional_tests.TestCQL:column_name_validation_test

(cherry picked from commit 100b540a53)
2016-03-08 11:42:14 +02:00
Tomasz Grabiec
988d6cd153 cql3: Fix handling of lists with static columns
List operations and prefetching were not handling static columns
correctly. One issue was that prefetching was attaching static column
data to row data using ids which might overlap with clustered columns.

Another problem was that list operations were always constructing
clustering key even if they worked on a static column. For static
columns the key would be always empty and lookup would fail.

The effect was that list operations which depend on the current state had
no effect. A similar problem could be observed on C* 2.1.9, but not on 2.2.3.

Fixes #903.

(cherry picked from commit 383296c05b)
2016-03-06 11:06:03 +02:00
Pekka Enberg
bf71575fd7 release: prepare for 0.18.1 2016-03-05 08:53:07 +02:00
Gleb Natapov
cd75075214 storage_proxy: fix race between read cl completion and timeout in digest resolver
If timeout happens after cl promise is fulfilled, but before
continuation runs it removes all the data that cl continuation needs
to calculate result. Fix this by calculating result immediately and
returning it in cl promise instead of delaying this work until
continuation runs. This has a nice side effect of simplifying digest
mismatch handling and making it exception free.

Fixes #977.

Message-Id: <1457015870-2106-3-git-send-email-gleb@scylladb.com>
(cherry picked from commit b89b6f442b)
2016-03-03 17:10:38 +02:00
Gleb Natapov
e85f11566b storage_proxy: store only one data reply in digest resolver.
The read executor may ask for more than one data reply during the digest
resolving stage, but only one result is actually needed to satisfy
a query, so there is no need to store all of them.

Message-Id: <1457015870-2106-2-git-send-email-gleb@scylladb.com>
(cherry picked from commit e4ac5157bc)
2016-03-03 17:10:32 +02:00
Gleb Natapov
8f682f018e storage_proxy: fix cl achieved condition in digest resolver timeout handler
In the digest resolver, for cl to be achieved it is not enough to get the
correct number of replies; a data reply must also be among them. The condition
in the digest timeout does not check that; fortunately, we have a variable
that we set to true when cl is achieved, so use it instead.

Message-Id: <1457015870-2106-1-git-send-email-gleb@scylladb.com>
(cherry picked from commit 69b61b81ce)
2016-03-03 17:10:26 +02:00
Tomasz Grabiec
dba2b617e7 db: Fix error handling in populate_keyspace()
When find_uuid() fails Scylla would terminate with:

  Exiting on unhandled exception of type 'std::out_of_range': _Map_base::at

But we are supposed to ignore directories for unknown column
families. The try {} catch block is doing just that when
no_such_column_family is thrown from the find_column_family() call
which follows find_uuid(). Fix by converting std::out_of_range to
no_such_column_family.

Message-Id: <1456056280-3933-1-git-send-email-tgrabiec@scylladb.com>
2016-03-03 11:37:26 +02:00
Paweł Dziepak
f4e11007cf Revert "do not use boost::multiprecision::msb()"
This reverts commit dadd097f9c.

That commit caused serialized forms of varint and decimal to have some
excess leading zeros. They didn't affect deserialization in any way but
caused computed tokens to differ from the Cassandra ones.

Fixes #898.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1455537278-20106-1-git-send-email-pdziepak@scylladb.com>
2016-03-03 10:54:19 +02:00
Asias He
fdfa1df395 locator: Fix get token from a range<token>
With a range{t1, t2}, if t2 == {}, the range.end() will contain no
value. Fix getting t2 in this case.

Fixes #911.
Message-Id: <4462e499d706d275c03b116c4645e8aaee7821e1.1456128310.git.asias@scylladb.com>
2016-03-03 10:53:21 +02:00
Tomasz Grabiec
116055cc6f bytes_ostream: Avoid recursion when freeing chunks
When there are a lot of chunks we may get a stack overflow.

This seems to fix issue #906, a memory corruption during schema
merge. I suspect that what causes corruption there is overflowing of
the stack allocated for the seastar thread. Those stacks don't have
red zones which would catch overflow.

Message-Id: <1456056288-3983-1-git-send-email-tgrabiec@scylladb.com>
2016-03-03 10:53:01 +02:00
Calle Wilund
04c19344de database: Fix use and assumptions about pending compactions
Fixes #934 - faulty assert in discard_sstables

run_with_compaction_disabled clears out a CF from the compaction
manager queue. discard_sstables wants to assert on this, but looks
at the wrong counters.

pending_compactions is an indicator on how much interested parties
want a CF compacted (again and again). It should not be considered
an indicator of compactions actually being done.

This modifies the usage slightly so that:
1.) The counter is always incremented, even if compaction is disallowed.
    The counter's value at the end of run_with_compaction_disabled is then
    instead used as an indicator as to whether a compaction should be
    re-triggered. (If compactions finished, it will be zero.)
2.) Document the use and purpose of the pending counter, and add
    method to re-add CF to compaction for r_w_c_d above.
3.) discard_sstables now asserts on the right things.

Message-Id: <1456332824-23349-1-git-send-email-calle@scylladb.com>
2016-03-03 10:51:27 +02:00
Raphael S. Carvalho
df19e546f9 tests: sstable_test: submit compaction request through column family
That's needed for the reverted commit 9586793c to work. It's also the
correct thing to do, i.e. the column family submits itself to the manager.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <2a1d141ad929c1957933f57412083dd52af0390b.1456415398.git.raphaelsc@scylladb.com>
2016-03-03 10:51:23 +02:00
Takuya ASADA
b532919c55 dist: add posix_net_conf.sh on Ubuntu package
Fixes #881

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455522990-32044-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit fb3f4cc148)
2016-02-15 17:03:10 +02:00
Takuya ASADA
6ae6dcc2fc dist: switch AMI base image to 'CentOS7-Base2', uses CentOS official kernel
The previous CentOS base image accidentally used a non-standard kernel from
elrepo. This replaces the base image with a new one that contains the CentOS
default kernel.

Fixes #890

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455398903-2865-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 3697cee76d)
2016-02-15 15:59:04 +02:00
Tomasz Grabiec
5716140a14 abstract_replication_strategy: Fix generation of token ranges
We can't move-from in the loop because the subject will be empty in
all but the first iteration.

Fixes crash during node startup:

  "Exiting on unhandled exception of type 'runtime_exception': runtime error: Invalid token. Should have size 8, has size 0"

Fixes update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_node_1_test (and probably others)

Signed-off-by: Tomasz Grabiec <tgrabiec@scylladb.com>
(cherry picked from commit efdbc3d6d7)
2016-02-14 14:39:31 +02:00
Avi Kivity
91cb9bae2e release: prepare for 0.18 2016-02-11 17:55:20 +02:00
Shlomi Livne
f938e1d303 dist: start scylla with SCYLLA_IO
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <d93a7b41a285fcde796c5681479a328f1efac0c3.1455188901.git.shlomi@scylladb.com>
2016-02-11 17:01:03 +02:00
Shlomi Livne
5494135ddd dist: update SCYLLA_IO with params for AMI
Add setting of --num-io-queues, --max-io-requests for AMI

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <b94a63154a91c8568e194d7221b9ffc7d7813ebc.1455188901.git.shlomi@scylladb.com>
2016-02-11 17:01:02 +02:00
Shlomi Livne
5cae2560a3 dist: introduce SCYLLA_IO
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <6490d049fd23a335bb0a95cac3e8a4c08c61166e.1455188901.git.shlomi@scylladb.com>
2016-02-11 17:01:02 +02:00
Shlomi Livne
d8cdf76e70 dist: change setting of scylla home from "-d" to "-r"
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <53dcd9d1daa0194de3f889b67788d9c21d1e474d.1455188901.git.shlomi@scylladb.com>
2016-02-11 17:00:37 +02:00
Avi Kivity
3c4f67f3e6 build: require boost > 1.55
See #898.

Add checks both for boost being installed, and for the correct version.
Message-Id: <1455193574-24959-1-git-send-email-avi@scylladb.com>
2016-02-11 15:15:49 +02:00
Avi Kivity
9249d45ae1 Update scylla-ami submodule
* dist/ami/files/scylla-ami b2724be...b3b85be (1):
  > adding --stop-services
2016-02-11 12:24:17 +02:00
Avi Kivity
5834815ed9 Merge seastar upstream
* seastar 14c9991...353b1a1 (2):
  > scripts: posix_net_conf.sh: Change the way we learn NIC's IRQ numbers
  > gate: protect against calling close() more than once
2016-02-11 12:23:51 +02:00
Takuya ASADA
09b1ec6103 dist: attach ephemeral disks on AMI by default
To attach the maximum number of ephemeral disks available on the instance,
specify 8. On AMI creation, it will be reduced to the available number.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1454439628-2882-1-git-send-email-syuu@scylladb.com>
2016-02-11 12:21:09 +02:00
Takuya ASADA
16e6db42e1 dist: do not start scylla-server when it's disabled from AMI userdata
Support the AMI's --stop-services; prevent startup of scylla-server (and scylla-jmx, since it depends on scylla-server)

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1454492729-11876-1-git-send-email-syuu@scylladb.com>
2016-02-11 12:21:08 +02:00
Takuya ASADA
f227b3faac dist: On AMI, mark root disk with delete_on_termination
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1454513308-12384-1-git-send-email-syuu@scylladb.com>
2016-02-11 12:19:28 +02:00
Takuya ASADA
33309f667e dist: enable enhanced networking on AMI
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1454971289-21369-1-git-send-email-syuu@scylladb.com>
2016-02-11 12:18:48 +02:00
Raphael S. Carvalho
ed61fe5831 sstables: make compaction stop report user-friendly
When scylla stopped an ongoing compaction, the event was reported
as an error. This patch introduces a specialized exception for
compaction stop so that the event can be handled appropriately.

Before:
ERROR [shard 0] compaction_manager - compaction failed: read exception:
std::runtime_error (Compaction for keyspace1/standard1 was deliberately
stopped.)

After:
INFO  [shard 0] compaction_manager - compaction info: Compaction for
keyspace1/standard1 was stopped due to shutdown.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1f85d4e5c24d23a1b4e7e0370a2cffc97cbc6d44.1455034236.git.raphaelsc@scylladb.com>
2016-02-11 12:16:53 +02:00
Takuya ASADA
8d8130f9c9 dist: fix typo on build_ami.sh
We should always run scylla_setup, not just for locally built rpm

Fixes #897

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455103519-13780-1-git-send-email-syuu@scylladb.com>
2016-02-11 11:56:11 +02:00
Shlomi Livne
64f8d5a50e dist: update packer location
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <3c33ea073f702e00b789930fce9befef03ad9e88.1455178900.git.shlomi@scylladb.com>
2016-02-11 11:52:56 +02:00
Avi Kivity
bfbf89ee31 Merge "Serialize keys in a form independent of in-memory representation" from Tomasz
"This series changes the on-wire definitions of keys to be of the following form:

  class partition_key {
     std::vector<bytes> exploded();
  };

Keys are therefore collections of components. The components are serialized according
to the format specified in the CQL binary protocol. No bit now depends on how we store keys in memory.

Constructing keys from components currently requires a schema reference,
which makes it not possible to deserialize or serialize the keys automatically
by RPC. To avoid those complications, compound_type was changed so that
it can be constructed and components can be iterated over without schema.
Because of this, partition_key size increased by 2 bytes."
2016-02-10 17:54:42 +02:00
Tomasz Grabiec
b74301302c tests: Add test for key serialization 2016-02-10 15:22:56 +01:00
Tomasz Grabiec
3e2c1840d8 idl: Make key definitions independent of in-memory representation 2016-02-10 15:22:56 +01:00
Tomasz Grabiec
428fce3828 compound: Optimize serialize_single() 2016-02-10 15:22:56 +01:00
Tomasz Grabiec
0cc2832a76 keys: Allow constructing from a range 2016-02-10 15:22:56 +01:00
Tomasz Grabiec
3ffcb998fb keys: Enable serialization from a range not just a vector 2016-02-10 14:35:14 +01:00
Tomasz Grabiec
095efd01d6 keys: Make from_exploded() and components() work without schema
For simplicity, we want to have keys serializable and deserializable
without schema for now. We will serialize keys in a generic form of a
vector of components where the format of components is specified by
CQL binary protocol. So conversion between keys and vector of
components needs to be possible to do without schema.

We may want to make keys schema-dependent again in the future to apply
space optimizations specific to column types. Existing code should
still pass schema& to construct and access the key when possible.

One optimization had to be reverted in this change - avoidance of
storing the key length (2 bytes) for single-component partition keys. One
consequence of this, in addition to slightly larger keys, is that we can
no longer avoid a copy when constructing single-component partition keys
from a ready "bytes" object.

I haven't noticed any significant performance difference in:

  tests/perf/perf_simple_query -c1 --write

It does ~130K tps on my machine.
2016-02-10 14:35:13 +01:00
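The generic key format described above can be sketched as follows. This is a minimal illustration, not Scylla's actual implementation: `bytes` is stood in by `std::string` and the helper name is hypothetical. Each component is written as a 2-byte big-endian length prefix followed by its raw bytes, as in the CQL binary protocol, so no schema is needed:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using bytes = std::string; // stand-in for Scylla's bytes type

// Serialize an "exploded" key: each component gets a 2-byte big-endian
// length prefix followed by its raw bytes (CQL binary protocol style).
bytes serialize_components(const std::vector<bytes>& components) {
    bytes out;
    for (const auto& c : components) {
        uint16_t len = static_cast<uint16_t>(c.size());
        out.push_back(static_cast<char>(len >> 8));   // high byte of length
        out.push_back(static_cast<char>(len & 0xff)); // low byte of length
        out += c;                                     // raw component bytes
    }
    return out;
}
```

This also illustrates why single-component partition keys grew by 2 bytes in this series: the length prefix can no longer be elided.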
Tomasz Grabiec
31312722d1 compound: Reduce duplication 2016-02-10 14:35:13 +01:00
Tomasz Grabiec
085d148d6f compound: Remove unused methods 2016-02-10 14:35:13 +01:00
Tomasz Grabiec
b777cc9565 tests: Fix tests to not rely on key representation 2016-02-10 14:35:13 +01:00
Asias He
6d0407503b locator: Do not generate wrap-around ranges
Like we did in commit d54c77d5d0,
make the remaining functions in abstract_replication_strategy return
non-wrap-around ranges.

This fixes:

ERROR [shard 0] stream_session - [Stream #f0b7fda0-cf3e-11e5-b6c4-000000000000]
stream_transfer_task: Fail to send to 127.0.0.4:0: std::runtime_error (Not implemented: WRAP_AROUND)

in streaming.
Message-Id: <514d2a9a1d3b868d213464c8858ac5162c0338d8.1455093643.git.asias@scylladb.com>
2016-02-10 10:03:31 +01:00
Avi Kivity
9f3061ade8 Revert "streaming: Send mutations on all shards"
This reverts commit 31d439213c.

Fixes #894.

Conflicts:
    streaming/stream_manager.cc

(may have undone part of 63a5aa6122)
2016-02-09 18:26:14 +02:00
Calle Wilund
873f87430d database: Check sstable dir name UUID part when populating CF
Fixes #870
Only load sstables from CF directories that match the current
CF uuid.
Message-Id: <1454938450-4338-1-git-send-email-calle@scylladb.com>
2016-02-08 14:48:19 +01:00
Calle Wilund
2ffd7d7b99 stream_manager: Change construction to make gcc 4.9 happy
gcc 4.9 complains about the type{ val, val } construction of a
type with an implicit default constructor, i.e. member = initializer
declarations. gcc 5 does not (and possibly rightly so).
However, we still (implicitly) claim to support gcc 4.9, so
why not just change this particular instance.

Message-Id: <1454921328-1106-1-git-send-email-calle@scylladb.com>
2016-02-08 10:54:48 +02:00
Paweł Dziepak
c90ec731c8 transport: do not close gate at connection shutdown
connection::_pending_requests_gate is responsible for keeping connection
objects alive as long as there are outstanding requests and is closed
in connection::process() when needed. Closing it in connection::shutdown()
as well may cause the gate to be closed twice, which is a bug.

Fixes #690.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1454596390-23239-1-git-send-email-pdziepak@scylladb.com>
2016-02-07 20:07:23 +02:00
Avi Kivity
8b0a26f06d build: support for alternative versions of libsystemd pkgconfig
While pkgconfig is supposed to be a distribution and version neutral way
of detecting packages, it doesn't always work this way.  The sd_notify()
manual page documents that sd_notify is available via the libsystemd
package, but on centos 7.0 it is only available via the libsystemd-daemon
package (on centos 7.1+ it works as expected).

Fix by allowing for alternate versions of package names, testing each one
until a match is found.

Fixes #879.

Message-Id: <1454858862-5239-1-git-send-email-avi@scylladb.com>
2016-02-07 17:36:57 +02:00
Avi Kivity
ad58663c96 row_cache: reindent 2016-02-07 13:25:29 +02:00
Asias He
31d439213c streaming: Send mutations on all shards
Currently, only the shard where the stream_plan is created will send
streaming mutations. To utilize all the available cores, we can make each
shard send the mutations it is responsible for. On the receiver side,
we do not forward the mutations to the shard where the stream_session is
created, so that we can avoid unnecessary forwarding.

Note: the downside is that it is now harder to:

1) track the number of bytes sent and received
2) update the keep-alive timer upon receipt of a STREAM_MUTATION

To fix, we now store the sent/received bytes info on all shards. When
the keep-alive timer expires, we check if any progress has been made.

Hopefully, this patch will make the streaming much faster and in turn
make the repair/decommission/adding a node faster.

Refs: https://github.com/scylladb/scylla/issues/849

Tested with decommission/repair dtest.

Message-Id: <96b419ab11b736a297edd54a0b455ffdc2511ac5.1454645370.git.asias@scylladb.com>
2016-02-07 10:57:51 +02:00
Gleb Natapov
63a5aa6122 prevent superfluous frozen_mutation copying
Sometimes frozen_mutation is copied while it can be moved instead. Fix
those cases.

Message-Id: <20160204165708.GI6705@scylladb.com>
2016-02-07 10:54:16 +02:00
Erich Keane
4197ceeedb raw_statement::is_reversed rewrite to avoid VLA
The is_reversed function uses a variable length array, which isn't
standard-conforming C++.  Additionally, the Clang compiler doesn't allow them
with non-POD types, so this function wouldn't compile.

After reading through the function it seems that the array wasn't
necessary, as the check could be calculated inline rather than
separately.  This version should be more performant (since it no longer
incurs the VLA lookup performance hit) while taking up less memory in
all but the smallest of edge cases (when clustering_key_size *
sizeof(optional<bool>) < sizeof(size_type) - sizeof(uint32_t) +
sizeof(bool)).

This patch uses relation_order_unsupported to ensure that the exception
order is consistent with the previous version.  The throw would
otherwise have to be moved into the initial for-loop.

There are two deviations in behavior:
The first is the initial assert.  It should not change the apparent
behavior besides causing orderings() to be looked up twice in debug
builds.

The second is the conversion of is_reversed_ from an optional to a bool.
The result is that the final return value is now well-defined to be
false in the release-build case where orderings().size() == 0, rather
than the ill-defined *is_reversed_ that was there previously.

Signed-off-by: Erich Keane <erich.keane@verizon.net>
Message-Id: <1454546285-16076-4-git-send-email-erich.keane@verizon.net>
2016-02-07 10:38:17 +02:00
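A hedged sketch of the inline rewrite this message describes. Names and the exception type are hypothetical (the real function inspects CQL ordering clauses against the schema): instead of buffering a per-column result in a variable-length array and scanning it afterwards, the check is folded into one pass, and the return value is well-defined to be false when there are no orderings:

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical illustration: each element says whether that ordering
// column is reversed. Mixed directions are unsupported and throw, which
// stands in for the relation_order_unsupported exception in the patch.
bool is_reversed(const std::vector<bool>& orderings_reversed) {
    bool is_reversed_ = false; // well-defined default for the empty case
    bool first = true;
    for (bool r : orderings_reversed) {
        if (first) {
            is_reversed_ = r;
            first = false;
        } else if (r != is_reversed_) {
            // all ordering columns must agree on direction
            throw std::runtime_error("unsupported order");
        }
    }
    return is_reversed_;
}
```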
Erich Keane
49842aacd9 managed_vector: maybe_constructed ctor to non-constexpr
Clang enforces that a union's constexpr CTOR must initialize
one of the members.  The spec is seemingly silent as to what
the rule on this is; however, making this non-constexpr results in clang
accepting the constructor.

Signed-off-by: Erich Keane <erich.keane@verizon.net>
Message-Id: <1454604300-1673-1-git-send-email-erich.keane@verizon.net>
2016-02-07 10:30:45 +02:00
Erich Keane
e87019843f Fix PHI_FACTOR definition to be spec compliant
PHI_FACTOR is a constexpr variable that is defined using std::log.
Though G++ has a constexpr version of std::log, this itself is not spec
complaint (in fact, Clang enforces this).  See C++ Spec 26.8 for the
definition of std::log and 17.6.5.6 for the rule regarding adding
constexpr where it isn't specified.

This patch replaces the std::log statement with a version from math.h
that contains the exact value (M_LOG10El).

Signed-off-by: Erich Keane <erich.keane@verizon.net>
Message-Id: <1454603285-32677-1-git-send-email-erich.keane@verizon.net>
2016-02-04 18:33:44 +02:00
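A minimal sketch of the fix, assuming PHI_FACTOR is the failure-detector constant 1 / ln(10) — which is exactly the value M_LOG10E holds (the original patch uses the long double variant M_LOG10El):

```cpp
#include <cassert>
#include <cmath>

// Fallback for strict-ANSI environments where math.h omits the M_* macros.
#ifndef M_LOG10E
#define M_LOG10E 0.434294481903251827651128918916605082
#endif

// Not portable: std::log is not required to be constexpr by the standard,
// and Clang rejects it in constant expressions.
// constexpr double PHI_FACTOR = 1.0 / std::log(10.0);

// Spec-compliant: take the exact value from the math.h constant,
// since log10(e) == 1 / ln(10).
constexpr double PHI_FACTOR = M_LOG10E;
```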
Avi Kivity
c85f6c4df1 Merge seastar upstream
* seastar 661ccd9...14c9991 (1):
  > reactor: use correct open_flags when opening a file without DMA support

Fixes #871.
2016-02-04 18:17:04 +02:00
Gleb Natapov
77d47c0c4b optimize serialization of array/vector of integral types
Array of integral types on little endian machine can be memcpyed into/out
of a buffer instead of serialized/deserialized element by element.

Message-Id: <20160204155425.GC6705@scylladb.com>
2016-02-04 18:01:14 +02:00
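The optimization can be sketched like this (a minimal illustration with hypothetical names, not the actual serializer): on a little-endian machine the in-memory layout of an array of integral values already matches the little-endian wire format, so the whole array is appended with one memcpy instead of element by element:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Append the raw bytes of an integral array to an output buffer with a
// single memcpy. Valid as a wire format only where host byte order
// matches the wire byte order (little endian here).
template <typename T>
void serialize_array(std::vector<uint8_t>& buf, const std::vector<T>& v) {
    static_assert(std::is_integral_v<T>, "integral element types only");
    const size_t old_size = buf.size();
    buf.resize(old_size + v.size() * sizeof(T));
    std::memcpy(buf.data() + old_size, v.data(), v.size() * sizeof(T));
}
```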
Avi Kivity
91fbb81477 Merge seastar upstream
* seastar f8beab9...661ccd9 (1):
  > Merge "Use swapcontext() with AddressSanitizer" from Paweł
2016-02-04 17:30:15 +02:00
Paweł Dziepak
ababdfc9e2 tests/batchlog: use proper batchlog version
Since 42e3999a00 "Check batchlog version
before replaying" there is a version check in batchlog replay.
However, the test wasn't updated and still used some arbitrary version
number which caused it to fail.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1454595368-21670-1-git-send-email-pdziepak@scylladb.com>
2016-02-04 16:50:45 +02:00
Gleb Natapov
049ae37d08 storage_proxy: change collectd to show foreground mutation instead of overall mutation count
It is much easier to see what is going on this way; otherwise, graphs for
bg mutations and overall mutations are very close with the usual scaling
for many workloads.

Message-Id: <20160204083452.GH6705@scylladb.com>
2016-02-04 14:58:56 +02:00
Gleb Natapov
a9e4afd8d2 Drop query-result.hh from database.hh
It is not needed there but causes a lot of recompilation when changed.

Message-Id: <1454496142-14537-3-git-send-email-gleb@scylladb.com>
2016-02-04 13:22:27 +02:00
Gleb Natapov
2ae1ae2d18 Cleanup messaging_service.hh includes a bit.
Forward declare some classes instead.

Message-Id: <1454496142-14537-2-git-send-email-gleb@scylladb.com>
2016-02-04 13:22:24 +02:00
Avi Kivity
f3ca597a01 Merge "Sstable cleanup fixes" from Tomasz
"  - Added waiting for async cleanup on clean shutdown

  - Crash in the middle of sstable removal doesn't leave system in a non-bootable state"
2016-02-04 12:36:13 +02:00
Tomasz Grabiec
c7ef3703cc sstable: Make sstable deletion never leave sstable set in a non-bootable state
Refs #860
Refs #802

An sstable file set with any component missing is interpreted as a
critical error during boot. Currently sstable removal procedure could
leave the files in a non-bootable state if the process crashed after
TOC was removed but before all components were removed as well.

To solve this problem, start the removal by renaming the TOC file to a
so-called "temporary TOC". Upon boot, such a TOC file is
interpreted as marking an sstable which is safe to remove. This kind of TOC
was added before to deal with a similar scenario but in the opposite
direction - when writing a new sstable.
2016-02-03 17:36:17 +01:00
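The removal order described above can be sketched as follows (paths, the "-TOC.txt" naming, and helper names are hypothetical, not Scylla's actual component layout): the initial rename is the single atomic commit point, so a crash at any later step leaves either a complete sstable (TOC intact) or a temporary TOC marking a safe-to-remove set, never a component-less mystery:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// tiny helper used when exercising the sketch
static void touch(const fs::path& p) { std::ofstream(p).put('x'); }

// Crash-safe sstable removal sketch: mark first, then delete.
void remove_sstable(const fs::path& dir, const std::string& base,
                    const std::vector<std::string>& components) {
    // 1. Atomic marker: rename TOC to a "temporary TOC".
    fs::rename(dir / (base + "-TOC.txt"), dir / (base + "-TOC.txt.tmp"));
    // 2. Safe now: the set is already marked removable on next boot.
    for (const auto& c : components) {
        fs::remove(dir / (base + "-" + c));
    }
    // 3. Finally drop the marker itself.
    fs::remove(dir / (base + "-TOC.txt.tmp"));
}
```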
Tomasz Grabiec
c8a98b487c sstables: Remove coupling-hiding duplication 2016-02-03 17:36:17 +01:00
Tomasz Grabiec
355874281a sstables: Do not register exit hooks from static initializer
Fixes #868.

Registering exit hooks while the reactor is already iterating over exit
hooks is not allowed and currently leads to the undefined behavior
observed in #868. While we should make the failure more user friendly,
registering exit hooks concurrently with shutdown will not be allowed.

We don't expect exit hooks to be registered after exit starts because
this would violate the guarantee which says that exit hooks are
executed in reverse order of registration. Starting the exit sequence in
the middle of the initialization sequence would result in use-after-free
errors. Btw, I'm not sure if currently there's anything which prevents
this.

To solve this problem, move the exit hook to the initialization
sequence. In the case of tests, the cleanup has to be called explicitly.
2016-02-03 17:35:50 +01:00
Tomasz Grabiec
136c9d9247 sstables: Improve error message in case of generation duplication
Refs #870.
2016-02-03 17:35:50 +01:00
Calle Wilund
a00ff015f4 transport::server: read cqlv2 batch options correctly
Fixes #563.
Refs #584

CQLv2 encodes batch query_options in the v1 format, not v2+.
CQLv1, on the other hand, has no batch support at all.
Make read_options use an explicit version format when needed.

v2: Ensure we preserve cql protocol version in query_opts
Message-Id: <1454514510-21706-1-git-send-email-calle@scylladb.com>
2016-02-03 16:55:07 +01:00
Gleb Natapov
b4b560e0fc change result_digest to hold std::array instead of a std::vector
Digest size is fixed, so there is no need to use std::vector to hold it.

Message-Id: <20160203102530.GU6705@scylladb.com>
2016-02-03 12:27:39 +02:00
Raphael S. Carvalho
4041f8cffc compaction: stop all ongoing compaction during shutdown
Currently, we wait for ongoing compaction during shutdown, but
that may take 'forever' if compacting huge sstables with a slow
disk. Compaction of huge sstables will take a considerable amount
of time even with fast disks. Therefore, all ongoing compaction
should be stopped during shutdown.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <3370f17ce4274df417ea60651f33fc5d4de91199.1454441286.git.raphaelsc@scylladb.com>
2016-02-03 10:18:51 +02:00
Raphael S. Carvalho
cf22c827f9 compaction_manager: fix assertion when stopping task
Task is stopped by closing gate and forcing it to exit via gate
exception. The problem is that task->compacting_cf may be set to
the column family being compacted, and compaction_manager::remove
would see it and try to stop the same task again, which would
lead to problems. The fix is to clear task->compacting_cf when
stopping the task.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <3473e93c1a107a619322769d65fa020529b5501b.1454441286.git.raphaelsc@scylladb.com>
2016-02-03 10:18:15 +02:00
Asias He
c67538009c streaming: Fix assert in update_progress
The problem is that on the follower side, we set up _session_info too
late, after receiving the PREPARE_DONE_MESSAGE message. The initiator can
send STREAM_MUTATION before sending the PREPARE_DONE_MESSAGE message.

To fix, we set up _session_info after receiving the prepare_message on
both the initiator and the follower.

Fixes #869

scylla: streaming/session_info.cc:44: void
streaming::session_info::update_progress(streaming::progress_info):
Assertion `peer == new_progress.peer' failed.
Message-Id: <6d945ba1e8c4fc0949c3f0a72800c9448ba27761.1454476876.git.asias@scylladb.com>
2016-02-03 10:15:45 +02:00
Asias He
46c392eb17 messaging_service: Stop retrying if messaging_service is being shutdown
If we are shutting down the messaging_service, we should not retry the
message again.

Refs #862

Message-Id: <7c3afb646ba8254eca69096d80dd5ea007e416a7.1454418053.git.asias@scylladb.com>
2016-02-02 19:50:54 +02:00
Gleb Natapov
c509e48674 Parallelize batchlog replay
Current code is serialized by get_truncated_at(). Use map_reduce to make
it run in parallel.
Message-Id: <1454421603-13080-4-git-send-email-gleb@scylladb.com>
2016-02-02 17:08:54 +01:00
Gleb Natapov
42e3999a00 Check batchlog version before replaying
In case batchlog serialization format changes check it before trying
to interpret raw data.
Message-Id: <1454421603-13080-3-git-send-email-gleb@scylladb.com>
2016-02-02 17:08:54 +01:00
Gleb Natapov
116ad5a603 Use net::messaging_service::current_version for serialization format versioning
Message-Id: <1454421603-13080-2-git-send-email-gleb@scylladb.com>
2016-02-02 17:08:53 +01:00
Avi Kivity
b14d39bfb1 Merge "Move last bits to IDL serializer and get rid of old one" from Gleb 2016-02-02 12:33:18 +02:00
Gleb Natapov
19067db642 remove old serializer 2016-02-02 12:15:50 +02:00
Gleb Natapov
4e440ebf8e Remove old inet_address and uuid serializers 2016-02-02 12:15:50 +02:00
Gleb Natapov
31bb194c21 Remove old result_digest serializer 2016-02-02 12:15:50 +02:00
Gleb Natapov
10cd4d948c Move result_digest to idl 2016-02-02 12:15:50 +02:00
Gleb Natapov
775cc93880 remove unused range and token serializers 2016-02-02 12:15:49 +02:00
Gleb Natapov
e3a40254e6 Remove old partition_checksum serializer 2016-02-02 12:15:49 +02:00
Gleb Natapov
e6f7b12b51 Move partition_checksum to use idl 2016-02-02 12:15:49 +02:00
Gleb Natapov
8cc1d1a445 Add std:array serializer 2016-02-02 12:15:49 +02:00
Gleb Natapov
a8902ccb4a Remove old frozen_schema serializer 2016-02-02 12:15:49 +02:00
Gleb Natapov
60e3637efc Move frozen_schema to idl 2016-02-02 12:15:49 +02:00
Nadav Har'El
b95c15f040 repair: change checksum structure to be better suited for serializer
Change the partition_checksum structure to be better suited for the
new serializers:

 1. Use std::array<> instead of a C array, as the latter is not
    supported by the new serializers.

 2. Use an array of 32 bytes, instead of 4 8-byte integers. This will
    guarantee that no byte-swapping monkey-business will be done on
    these checksums.
    The checksum XOR and equality-checking methods still temporarily
    cast the bytes to 8-byte chunks, for (hopefully) better performance.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1454364900-3076-1-git-send-email-nyh@scylladb.com>
2016-02-02 11:58:25 +02:00
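A sketch of the resulting structure (method names are illustrative; the real class lives in Scylla's repair code): raw bytes on the outside, so serialization applies no byte-swapping, with the XOR and comparison temporarily working over 8-byte chunks:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// 32 raw checksum bytes; no integer representation leaks onto the wire.
struct partition_checksum {
    std::array<uint8_t, 32> digest{};

    // XOR another checksum into this one, 8 bytes at a time.
    // memcpy through local uint64_t chunks keeps this alignment-safe.
    void mix(const partition_checksum& other) {
        uint64_t a[4], b[4];
        std::memcpy(a, digest.data(), 32);
        std::memcpy(b, other.digest.data(), 32);
        for (int i = 0; i < 4; ++i) {
            a[i] ^= b[i];
        }
        std::memcpy(digest.data(), a, 32);
    }

    bool operator==(const partition_checksum& other) const {
        return std::memcmp(digest.data(), other.digest.data(), 32) == 0;
    }
};
```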
Calle Wilund
c67e7e4ce4 cql3::sets: Make insert/update frozen set handle null/empty correctly
Fixes #578

Message-Id: <1454345878-1977-1-git-send-email-calle@scylladb.com>
2016-02-01 19:15:28 +02:00
Takuya ASADA
5fe82ce555 dist: fix build error on Ubuntu 15.10
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1454345982-5899-1-git-send-email-syuu@scylladb.com>
2016-02-01 19:14:49 +02:00
Avi Kivity
1f245e3bcb mutation_partition: fix use of boost::intrusive::set<>::comp()
Seems like boost::intrusive::set<>::comp() is not accessible on some
versions of boost.  Replace it with the equivalent
boost::intrusive::set<>::key_comp().

Fixes #858.
Message-Id: <1454326483-29780-1-git-send-email-avi@scylladb.com>
2016-02-01 13:54:52 +01:00
Calle Wilund
159dbe3a64 sstable_datafile_tests: Replace '---' with auto
Fixes compilation issues on some g++.
Message-Id: <1454323749-21933-1-git-send-email-calle@scylladb.com>
2016-02-01 12:58:33 +02:00
Avi Kivity
2b84bd3b75 Merge "standalone tcp connection for streaming" from Asias
"Make the streaming use standalone tcp connection and send more mutations in
parallel.

It is supposed to help: "Decommission not fully utilizing hardware #849""
2016-02-01 09:54:11 +02:00
Asias He
c618c699b3 streaming: Increase mutation_send_limiter
The idea behind the current limit of 10 stream_mutations per core is
to prevent streaming from overwhelming the TCP connection and starving
normal CQL verbs if the streaming mutations are big and take a long
time to complete.

Now that we use a standalone connection for streaming verbs, we can
increase the limit.

Hopefully, this will fix #849.
2016-02-01 11:01:56 +08:00
Asias He
fbf796b812 messaging_service: Use standalone connection for stream verbs
In streaming, the amount of data that needs to be streamed to peer nodes
might be large.

In order to avoid streaming overwhelming the TCP connection used by
user CQL verbs and starving the user CQL queries, we use a standalone TCP
connection for streaming verbs.
2016-02-01 11:01:56 +08:00
Avi Kivity
1146e3796d Merge "streaming refactor" from Asias
"- Wire up session progress
- Refactor stream_coordinator::host_streaming_data
- Introduce get_session helper to simplify verb handling
- Remove unused code

Tested with streaming in update_cluster_layout_tests.py"
2016-01-31 20:17:53 +02:00
Tomasz Grabiec
945ae5d1ea Move std::hash<range<T>> definition to range.hh
Message-Id: <1454008052-5152-1-git-send-email-tgrabiec@scylladb.com>
2016-01-31 20:11:30 +02:00
Avi Kivity
f6e7dbf080 Merge seastar upstream
* seastar 6623379...f8beab9 (2):
  > json_base_element: do not assign the element name
  > io_queue: change visibility of internal function
2016-01-31 16:34:39 +02:00
Raphael S. Carvalho
a46aa47ab1 make sstables::compact_sstables return list of created sstables
Now, sstables::compact_sstables() receives as input a list of sstables
to be compacted, and outputs a list of sstables generated by compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0d8397f0395ce560a7c83cccf6e897a7f464d030.1454110234.git.raphaelsc@scylladb.com>
2016-01-31 12:39:20 +02:00
Raphael S. Carvalho
ee84f310d9 move deletion of sstables generated by interrupted compaction
This deletion should be handled by sstables::compact_sstables, which
is responsible for the creation of new sstables.
It also simplifies the code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <541206be2e910ab4edb1500b098eb5ebf29c6509.1454110234.git.raphaelsc@scylladb.com>
2016-01-31 12:39:20 +02:00
Glauber Costa
7214649b8a sstables: const where const is due
Some SSTable methods are not marked as const. But they should be.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <72cd3ef0157eb38e7fd48d0c989f2342cbc42f3c.1454103008.git.glauber@scylladb.com>
2016-01-31 12:36:36 +02:00
Avi Kivity
3434b8e7c6 Merge seastar upstream
* seastar fbd9b30...6623379 (1):
  > fstream: improve make_file_input_stream() for a subrange of a file
2016-01-31 12:01:46 +02:00
Avi Kivity
a3fa123070 Update scylla-ami submodule
* dist/ami/files/scylla-ami e284bcd...b2724be (2):
  > Revert "Run scylla.yaml construction only once"
  > Move AMI dependent part of scylla_prepare to scylla-ami-setup.service
2016-01-31 12:01:15 +02:00
Avi Kivity
f08f5858a8 Merge "Introduce scylla-ami-setup.service, fix bugs" from Takuya
"This moves the AMI-dependent part of scylla_prepare to the scylla-ami repo, making it scylla-ami-setup.service, an independent systemd unit.
Also, it stops calling scylla_sysconfig_setup from scylla_setup (called at AMI creation time) and calls it from scylla-ami-setup instead."
2016-01-31 12:00:32 +02:00
Takuya ASADA
111dc19942 dist: construct scylla.yaml on first startup of AMI instance, not AMI image creation time
Install scylla-ami-setup.service, stop calling scylla_sysconfig_setup on AMI.
scylla-ami-setup.service will call it instead.
Only works with scylla-ami fix.
Fixes #857

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-30 15:48:45 -05:00
Takuya ASADA
71a26e1412 dist: don't need AMI_KEEP_VERSION anymore, since we fixed the issue that 'yum update' mistakenly replaces scylla development version with release version
It actually isn't called now (since $LOCAL_PKG is always empty), so we can safely remove this.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-30 15:47:05 -05:00
Takuya ASADA
f9d32346ef dist: scylla_sysconfig_setup uses current sysconfig values as default value
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-30 15:46:21 -05:00
Takuya ASADA
4d5baef3e3 dist: keep original SCYLLA_ARGS when updating sysconfig
Since we dropped scylla_run, the default SCYLLA_ARGS parameter is now
not empty, so we need to support it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-30 15:44:03 -05:00
Asias He
f07cd30c81 streaming: Remove unused create_message_for_retry 2016-01-29 16:31:07 +08:00
Asias He
cb92fe75e6 streaming: Introduce get_session helper
To simplify streaming verb handler.

- Use get_session instead of open coded logic to get get_coordinator and
  stream_session in all the verb handlers

- Use throw instead of assert for error handling

- init_receiving_side now returns a shared_ptr<stream_result_future>
2016-01-29 16:31:07 +08:00
Asias He
360df6089c streaming: Remove unused stream_session::retry 2016-01-29 16:31:07 +08:00
Asias He
2f48d402e2 streaming: Remove unused commented code 2016-01-29 16:31:07 +08:00
Asias He
ed3da7b04c streaming: Drop flush_tables option for add_transfer_ranges
We do not stream sstable files, so there is no need to flush them.
2016-01-29 16:31:07 +08:00
Asias He
aa69d5ffb2 streaming: Drop update_progress in stream_coordinator
Since we have session_info inside stream_session now, we can call
update_progress directly in stream_session.
2016-01-29 16:31:07 +08:00
Asias He
30c745f11a streaming: Get rid of stream_coordinator::host_streaming_data
Now that host_streaming_data only holds shared_ptr<stream_session>, we can
get rid of it and put shared_ptr<stream_session> inside _peer_sessions.
2016-01-29 16:31:07 +08:00
Asias He
46bec5980b streaming: Put session_info inside stream_session
There is a 1:1 mapping between session_info and stream_session. By putting
session_info inside stream_session, we can get rid of the
stream_coordinator::host_streaming_data class.
2016-01-29 16:31:07 +08:00
Asias He
91e245edac streaming: Initialize total_size in stream_transfer_task
Also rename the private members to _total_size and _files
2016-01-29 16:31:07 +08:00
Asias He
c4bdb6f782 streaming: Wire up session progress
The progress info is needed by JMX api.
2016-01-29 16:31:07 +08:00
Avi Kivity
3e4ce609ee Merge seastar upstream
* seastar ec468ba...fbd9b30 (9):
  > Add implementation of count_leading_zeros<LL>
  > Fix htonl usage for clang
  > Fix gnutls_error_category ctor for clang
  > Add header files required for libc++
  > Add clang warning suppressions
  > Switch to correct usage of std::abs
  > Fix the do_marshall(sic) function to init_list
  > Corrected sockaddr_in initialization
  > Remove unused const char misc_strings
2016-01-28 18:24:46 +02:00
Raphael S. Carvalho
3b7970baff compaction: delete generated sstables in event of an interrupt
Generated sstables may be either fully or partially written.
Compaction is interrupted if it was deliberately asked to stop (stop API)
or it was forced to do so in the event of a failure, e.g. out of disk space.
There is a need to explicitly delete sstables generated by a compaction
that was interrupted. Otherwise, such sstables will waste disk space and
even worsen read performance, which degrades as the number of generations
to look at increases.

Fixes #852.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <49212dbf485598ae839c8e174e28299f7127f63e.1453912119.git.raphaelsc@scylladb.com>
2016-01-28 14:05:57 +02:00
Pekka Enberg
3c3c819280 Merge "api: Fix stream_manager" from Asias
"Fix the metrics for bytes sent and received"
2016-01-28 13:57:59 +02:00
Raphael S. Carvalho
ba4260ea8f api: print proper compaction type
There are several compaction types, and we should print the correct
one when listing ongoing compactions. Currently, we only support the
compaction types COMPACTION and CLEANUP.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <c96b1508a8216bf5405b1a0b0f8489d5cc4be844.1453851299.git.raphaelsc@scylladb.com>
2016-01-28 13:47:00 +02:00
Tomasz Grabiec
9fa62af96b database: Move implementation to .cc
Message-Id: <1453980679-27226-1-git-send-email-tgrabiec@scylladb.com>
2016-01-28 13:35:33 +02:00
Tomasz Grabiec
ca6bafbb56 canonical_mutation: Remove commented out junk 2016-01-28 12:29:20 +01:00
Tomasz Grabiec
41dc98bb79 Merge branch 'cleanup_improvements' from git@github.com:raphaelsc/scylla.git
Compaction cleanup improvements from Raphael.
2016-01-27 18:30:46 +01:00
Avi Kivity
873deb5808 Merge "move paging_state to use idl" from Gleb 2016-01-27 19:06:04 +02:00
Takuya ASADA
03caacaad0 dist: enable collectd client by default
Fixes #838

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453910311-24928-2-git-send-email-syuu@scylladb.com>
2016-01-27 18:45:45 +02:00
Takuya ASADA
f33656ef03 dist: eliminate startup script
Fixes #373

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453910311-24928-1-git-send-email-syuu@scylladb.com>
2016-01-27 18:45:35 +02:00
Gleb Natapov
b065e2003f Move paging_state to use idl 2016-01-27 18:39:43 +02:00
Raphael S. Carvalho
45c446d6eb compaction: pass dht::token by reference
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-27 13:25:41 -02:00
Raphael S. Carvalho
fc541e2f08 compaction: remove code to sort local ranges
storage_service::get_local_ranges returns sorted ranges, which are
neither overlapping nor wrap-around. As a result, there is no need for
the consumer to do anything.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-27 13:15:36 -02:00
Avi Kivity
c75d1c4eeb Merge "Ubuntu 'expect stop' related fixes" from Takuya 2016-01-27 17:00:23 +02:00
Gleb Natapov
65bd429a0b Add serialization helper to use outside of rpc. 2016-01-27 16:43:06 +02:00
Takuya ASADA
4162fb158c main: raise SIGSTOP only when scylla become ready
supervisor_notify() is called periodically to log a message to systemd,
so raise(SIGSTOP) would be called multiple times, which upstart doesn't
expect. We need to call it just once.

Fixes #846

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:30:26 +09:00
Takuya ASADA
851951d32d dist: run upstart job as 'scylla' user
Don't use sudo when launching scylla, run directly from upstart.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:30:02 +09:00
Takuya ASADA
89f0fc89b4 dist: set ulimit in upstart job
An upstart job is able to specify ulimits like systemd, so drop Ubuntu's scylla_run and merge it with the Red Hat one.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:29:52 +09:00
Takuya ASADA
b4accd8904 main: autodetect systemd/upstart
We can autodetect systemd/upstart from environment variables, so we don't need a program argument.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:29:32 +09:00
Takuya ASADA
559f913494 dist: use nightly for prebuilt 3rdparty packages (CentOS)
Developers probably want to use the latest dependency packages, so switch to nightly.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453904521-2716-1-git-send-email-syuu@scylladb.com>
2016-01-27 16:24:49 +02:00
Gleb Natapov
19c55693fd idl: add missing header to serializer.hh 2016-01-27 15:49:29 +02:00
Amnon Heiman
7b53b99968 idl-compiler: split the idl list
Not all the idls are used by the messaging service. This patch removes
the auto-generated single include file that holds all the files and
replaces it with individual includes of the generated files.
The patch does the following:
* It removes the auto-generated inc file and cleans configure.py
  of it.
* It places an explicit include for each generated file in
  messaging_service.
* It adds a dependency of the generated code on the idl-compiler, so a
change in the compiler will trigger regeneration of the generated files.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453900241-13053-1-git-send-email-amnon@scylladb.com>
2016-01-27 15:23:00 +02:00
Pekka Enberg
86173fb8cc db/commitlog: Fix debug log format string in commitlog_replayer::recover()
I saw the following Boost format string related warning during commitlog
replay:

  INFO  [shard 0] commitlog_replayer - Replaying node3/commitlog/CommitLog-1-72057594289748293.log, node3/commitlog/CommitLog-1-90071992799230277.log, node3/commitlog/CommitLog-1-108086391308712261.log, node3/commitlog/CommitLog-1-251820357.log, node3/commitlog/CommitLog-1-54043195780266309.log, node3/commitlog/CommitLog-1-36028797270784325.log, node3/commitlog/CommitLog-1-126100789818194245.log, node3/commitlog/CommitLog-1-18014398761302341.log, node3/commitlog/CommitLog-1-126100789818194246.log, node3/commitlog/CommitLog-1-251820358.log, node3/commitlog/CommitLog-1-18014398761302342.log, node3/commitlog/CommitLog-1-36028797270784326.log, node3/commitlog/CommitLog-1-54043195780266310.log, node3/commitlog/CommitLog-1-72057594289748294.log, node3/commitlog/CommitLog-1-90071992799230278.log, node3/commitlog/CommitLog-1-108086391308712262.log
  WARN  [shard 0] commitlog_replayer - error replaying: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::io::too_many_args> > (boost::too_many_args: format-string referred to less arguments than were passed)

While inspecting the code, I noticed that one of the error loggers is
missing an argument. As I don't know how the original failure was
triggered, I wasn't able to verify that it was the only one, though.

Message-Id: <1453893301-23128-1-git-send-email-penberg@scylladb.com>
2016-01-27 13:40:19 +02:00
Pekka Enberg
c4dafe24f5 Update scylla-ami submodule
* dist/ami/files/scylla-ami 77cde04...e284bcd (2):
  > Run scylla.yaml construction only once
  > Revert "Run scylla.yaml construction only once"
2016-01-27 13:30:04 +02:00
Asias He
9fee1cc43a api: Use get_bytes_{received,sent} in stream_manager
The data in session_info is not correctly updated.

Tested while decommissioning a node:

$ curl -X GET  --silent --header "Accept: application/json"
"http://127.0.0.$i:10000/stream_manager/metrics/incoming";echo

$ curl -X GET --silent --header "Accept: application/json"
"http://127.0.0.$i:10000/stream_manager/metrics/outgoing";echo
2016-01-27 18:17:36 +08:00
Asias He
03aced39c4 streaming: Account number of bytes sent and received per session
The API will consume it soon.
2016-01-27 18:16:58 +08:00
Asias He
36829c4c87 api: Fix stream_manager total_incoming/outgoing bytes
Any stream, whether initiated by us or by a peer node, can send and
receive data. We should account incoming/outgoing bytes in all streams.
2016-01-27 18:15:09 +08:00
Asias He
08f703ddf6 streaming: Add get_all_streams in stream_manager
Get all streams, both those initiated by us and those initiated by a peer node.
2016-01-27 18:15:09 +08:00
Tomasz Grabiec
c971544e83 bytes_ostream: Adapt to Output concept used in serializer.hh
Message-Id: <1453888242-2086-1-git-send-email-tgrabiec@scylladb.com>
2016-01-27 12:13:34 +02:00
Gleb Natapov
6a581bb8b6 messaging_service: replace rpc::type with boost::type
RPC moved to boost::type to make serializers less rpc centric. Scylla
should follow.

Message-Id: <20160126164450.GA11706@scylladb.com>
2016-01-27 11:57:45 +02:00
Gleb Natapov
6f6b231839 Make serializer use new simple stream location
Message-Id: <20160127093045.GG9236@scylladb.com>
2016-01-27 11:37:37 +02:00
Raphael S. Carvalho
d54c77d5d0 change abstract_replication_strategy::get_ranges to not return wrap-arounds
The main motivation behind this change is to make get_ranges() easier for
consumers to work with the returned ranges, e.g. binary search to find a
range in which a token is contained. In addition, a wrap-around range
introduces corner cases, so we should avoid it altogether.

Suppose that a node owns three tokens: -5, 6, 8

get_ranges() would return the following ranges:
(8, -5], (-5, 6], (6, 8]
get_ranges() will now return the following ranges:
(-inf, -5], (-5, 6], (6, 8], (8, +inf)

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <4bda1428d1ebbe7c8af25aa65119edc5b97bc2eb.1453827605.git.raphaelsc@scylladb.com>
2016-01-27 09:48:31 +01:00
Avi Kivity
b9ab28a0e6 Merge "storage_service: add drain on shutdown logic & fix" from Asias
"Fixes:
- storage_service::handle_state_removing() doesn't call drain() #825
https://github.com/scylladb/scylla/issues/825

- nodetool gossipinfo is out of sync #790
https://github.com/scylladb/scylla/issues/790"
2016-01-27 10:38:56 +02:00
Avi Kivity
1d7144ac14 Merge seastar upstream
* seastar bdb273a...ec468ba (1):
  > Move simple streams used for serialization into separate header
2016-01-27 10:38:09 +02:00
Amnon Heiman
fd94009d0e Fix API init process
The last patch to the API init process had a bug: the wrong init
function was called.

This solves the issue.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453879042-26926-1-git-send-email-amnon@scylladb.com>
2016-01-27 10:03:24 +02:00
Asias He
8b4275126d storage_service: Shutdown messaging_service in decommission
It is commented out.
2016-01-27 11:48:49 +08:00
Asias He
b2f2c1c28c storage_service: Add drain on shutdown logic
We register engine().at_exit() callbacks when we initialize the services. We
do not really call the callbacks at the moment due to #293.

It is pretty hard to see the whole picture of the order in which the
services are shut down. Instead of having each service register its own
at_exit() callback, I propose a single at_exit() callback which does the
shutdown for all the services. In cassandra, the shutdown work is done
in the storage_service::drain_on_shutdown callback.

In this patch, the drain_on_shutdown is executed during shutdown.

As a result, the proper gossip shutdown is executed and fixes #790.

With this patch, when Ctrl-C on a node, it looks like:

INFO  [shard 0] storage_service - Drain on shutdown: starts
INFO  [shard 0] gossip - Announcing shutdown
INFO  [shard 0] storage_service - Node 127.0.0.1 state jump to normal
INFO  [shard 0] storage_service - Drain on shutdown: stop_gossiping done
INFO  [shard 0] storage_service - CQL server stopped
INFO  [shard 0] storage_service - Drain on shutdown: shutdown rpc and cql server done
INFO  [shard 0] storage_service - Drain on shutdown: shutdown messaging_service done
INFO  [shard 0] storage_service - Drain on shutdown: flush column_families done
INFO  [shard 0] storage_service - Drain on shutdown: shutdown commitlog done
INFO  [shard 0] storage_service - Drain on shutdown: done
2016-01-27 11:45:52 +08:00
Asias He
e733930dff storage_service: Call drain inside handle_state_removing
Now that drain is implemented, call it.

Fixes #825
2016-01-27 11:45:52 +08:00
Asias He
5003c6e78b config: Introduce shutdown_announce_in_ms option
Time a node waits after sending gossip shutdown message in milliseconds.

Reduces ./cql_query_test execution time

from
   real    2m24.272s
   user    0m8.339s
   sys     0m10.556s

to
   real    1m17.765s
   user    0m3.698s
   sys     0m11.578
2016-01-27 11:19:38 +08:00
Paweł Dziepak
490201fd1c row_cache: protect against stale entries
row_cache::update() does not explicitly invalidate the entries it failed
to update in case of a failure. This could lead to inconsistency between
row cache and sstables.

In practice that's not a problem, because before row_cache::update()
fails it will cause all entries in the cache to be invalidated during
memory reclaim, but it's better to be safe and explicitly remove entries
that should have been updated but could not be.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1453829681-29239-1-git-send-email-pdziepak@scylladb.com>
2016-01-26 20:34:41 +01:00
Takuya ASADA
9b66d00115 dist: fix scylla_bootparam_setup for Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453836012-6436-1-git-send-email-syuu@scylladb.com>
2016-01-26 21:24:20 +02:00
Erich Keane
c836c88850 Replace deprecated BOOST_MESSAGE with BOOST_TEST_MESSAGE
Boost unit test deprecated BOOST_MESSAGE as early as 1.34 and has since
permanently removed it. This patch replaces all uses of BOOST_MESSAGE
with BOOST_TEST_MESSAGE.

Signed-off-by: Erich Keane <erich.keane@verizon.net>
Message-Id: <1453783854-4274-1-git-send-email-erich.keane@verizon.net>
2016-01-26 19:01:40 +02:00
Amnon Heiman
b1845cddec Breaking the API initialization into stages
The API needs to be available at an early stage of the initialization,
on the other hand not all the specific APIs are available at that time.

This patch breaks the API initialization into stages, in each stage
additional commands will be available.

While doing that, the API header file was split into api_init.hh, which
is relevant to main, and api.hh, which holds the various API helper
functions.

Fixes #754

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453822331-16729-2-git-send-email-amnon@scylladb.com>
2016-01-26 17:41:31 +02:00
Calle Wilund
e6b792b2ff commitlog bugfix: Fix batch mode
The last series accidentally broke batch mode.
With the new, fancy, potentially blocking ways, we need to treat
batch mode differently, since in this case sync should always
come _after_ alloc-write.
The previous patch caused an infinite loop and broke Jenkins.

Message-Id: <1453821077-2385-1-git-send-email-calle@scylladb.com>
2016-01-26 17:13:14 +02:00
Glauber Costa
3f94070d4e use auto&& instead of auto& for priority classes.
Per Avi's request; he reminds us that auto& is better suited for situations
in which we are assigning to the variable in question.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <87c76520f4df8b8c152e60cac3b5fba5034f0b50.1453820373.git.glauber@scylladb.com>
2016-01-26 17:00:20 +02:00
Avi Kivity
71eb79aedd main: exit with code 0 on shutdown
To avoid confusing systemd.

Fixes #823.

Message-Id: <1453220473-28712-1-git-send-email-avi@scylladb.com>
2016-01-26 16:26:53 +02:00
Calle Wilund
89dc0f7be3 commitlog: wait for writes (if needed) on new segment as well
Also check closed status in allocate, since alloc queue waiting could
lead to us re-allocating in a segment that gets closed in between
queue enter and us running the continuation.

Message-Id: <1453811471-1858-1-git-send-email-calle@scylladb.com>
2016-01-26 15:05:12 +02:00
Shlomi Livne
0a553dae1f Fix test.py invocation of sstable_test
Invocation of sstable_test via "./test.py  --name sstable_test --mode
release --jenkins a"
ran "... --log_sink=a.release.sstable_test -c1.boost.xml", which caused
the test to fail with error code -11; fix that.

In addition, the boost test printout was bad; fix that as well.

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <3af8c4b55beae673270f5302822d7b9dbba18c0f.1453809032.git.shlomi@scylladb.com>
2016-01-26 12:56:26 +01:00
Avi Kivity
fbf56b3d98 Merge "Commit log threshold / back pressure" from Calle
"Adds flush + write thresholds/limits that, when reached, causes
operations to wait before being issued.
Write ops waiting also causes further allocations to queue up,
i.e. limiting throughput.

Adds getters for some useful "backlog" measurements:

* Pending (ongoing) writes/flush
* Pending (queued, waiting) allocations
* Num times the write/flush threshold has been exceeded (i.e. waits occurred)
* Finished, dirty segments
* Unused (preallocated) segments"
2016-01-26 13:19:58 +02:00
Avi Kivity
a53788d61d Merge "More streaming cleanup and fix" from Asias
"- Drop compression_info/stream_message
- Cleanup outgoing_file_message/prepare_message
- Fix stream manager API (more to come)"
2016-01-26 13:17:58 +02:00
Avi Kivity
486d937111 Merge seastar upstream
* seastar 97f418a...bdb273a (6):
  > rpc: alias rpc::type to boost::type
  > Fix warning_supported to properly work with Clang
  > rpc: change 'overflow' to 'underflow' in input stream processing
  > rpc: log an error that caused connection to be closed.
  > rpc: clarify deserialization error message
  > rpc: do not append new line in a logger
2016-01-26 12:58:32 +02:00
Avi Kivity
5ad4b59f99 Update scylla-ami submodule
* dist/ami/files/scylla-ami 188781c...77cde04 (1):
  > Run scylla.yaml construction only once
2016-01-26 12:58:11 +02:00
Takuya ASADA
12748cf1b9 dist: support CentOS AMI on scylla_ntp_setup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453801319-26072-1-git-send-email-syuu@scylladb.com>
2016-01-26 12:55:08 +02:00
Calle Wilund
f2c5315d33 commitlog: Add write/flush limits
Configured on start (for now, and with dummy values at that).
When the shard write/flush count reaches the limit, incoming ops will queue
until previous ones finish.

Consequently, if an allocation op forces a write, which blocks, any 
other incoming allocations will also queue up to provide back pressure.
2016-01-26 10:19:24 +00:00
Calle Wilund
7628a4dfe0 commitlog: Add some feedback/measurement methods
Suitable to derive "back pressure" from.
2016-01-26 09:47:14 +00:00
Calle Wilund
4f5bd4b64b commitlog: split write/flush counters 2016-01-26 09:47:14 +00:00
Calle Wilund
215c8b60bf commitlog: minor cleanup - remove red squiggles in eclipse 2016-01-26 09:42:26 +00:00
Calle Wilund
61c7235c11 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-26 09:42:08 +00:00
Avi Kivity
0de7d1fc1b Merge "Add priority classes to our I/O path" from Glauber
"After the patch, all of our relevant I/O is placed on a specific priority class.
The ones which are not are left into the Seastar's default priority, which will
effectively work as an idle class.

Examples of such I/O are commitlog replay and initial SSTable loading. Since they
will happen during initialization, they will run uncontended, and do not justify
having a class on their own."
2016-01-26 10:46:13 +02:00
Asias He
750573ca0c configure: Fix idl indentation 2016-01-26 15:04:45 +08:00
Asias He
cc6d928193 api: Fix peer -> streaming_plan id in stream_manager
It is wrong to get a stream plan id like below:

   utils::UUID plan_id = gms::get_local_gossiper().get_host_id(ep);

We should look at all stream_sessions with the peer in question.
2016-01-26 15:00:44 +08:00
Asias He
384e81b48a streaming: Add get_peer_session_info
Like get_all_session_info, but only gets the session_info for a specific
peer.
2016-01-26 14:52:40 +08:00
Asias He
c7b156ed65 api: Fix get_{all}total_outgoing_byte in stream_manager
We should call get_total_size_sent instead of get_total_size_received
for outgoing bytes.
2016-01-26 14:22:43 +08:00
Asias He
2e69d50c0c streaming: Cleanup prepare_message
- Drop empty prepare_message.cc
- Drop #if 0'ed code
2016-01-26 13:14:04 +08:00
Asias He
bbf025968b streaming: Cleanup outgoing_file_message
- Drop the unused headers
- Drop the outgoing_file_message.cc file which is empty
2016-01-26 13:12:01 +08:00
Asias He
e8b8b454df streaming: Flatten streaming messages class namespace
There are only two messages: prepare_message and outgoing_file_message.
Actually, prepare_message is the only message we send on the wire.
Flatten the namespace.
2016-01-26 13:04:29 +08:00
Asias He
cab36a450b streaming: Remove stream_message
It is not useful to make stream_message the base class for stream
messages. Scylla uses RPC verbs to distinguish different message types.
2016-01-26 12:32:17 +08:00
Asias He
6a067bcc23 streaming: Drop unused compression_info 2016-01-26 11:55:36 +08:00
Glauber Costa
b63611e148 mark I/O operations with priority classes
After this patch, our I/O operations will be tagged into a specific priority class.

There are five available classes, defined in the previous patch:

 1) memtable flush
 2) commitlog writes
 3) streaming mutation
 4) SSTable compaction
 5) CQL query

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
261c272178 introduce a priority manager
After the introduction of the Fair I/O Queueing mechanism in Seastar,
it is possible to add requests to a specific priority class, that will
end up being serviced fairly.

This patch introduces a Priority Manager service that manages the priority
each class of request will get. At this moment, having a class for that may
sound like overkill. However, the most interesting feature of the Fair I/O
queue comes from being able to adjust the priorities dynamically as workloads
change: so we will benefit from having them all in the same place.

This is designed to behave like one of our services, with the exception that
it won't use the distributed interface. This is mainly because there is no
reason to introduce that complexity at this point - since we can do thread local
registration as we have been doing in Seastar, and because that would require us
to change most of our tests to start a new service.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
f6cfb04d61 add a priority class to mutation readers
SSTables already have a priority argument wired to their read path. However,
most of our reads do not call that interface directly, but employ the services
of a mutation reader instead.

Some of those readers will be used to read through a mutation_source, and those
have to be patched as well.

Right now, whenever we need to pass a class, we pass Seastar's default priority
class.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
8e4bf025ae sstables: wire priority for read path
All the SSTable read path can now take an io_priority. The public functions will
take a default parameter which is Seastar's default priority.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
56c11a8109 sstables: wire priority for write path
All variants of write_component now take an io_priority. The public
interfaces are by default set to Seastar's default priority.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
03d5a89b90 sstables: mandate a buffer size parameter for data_stream_at
The only user for the default size is data_read, sitting at row.cc.
That reader wants to read and process a chunk all at once. So there's
really no reason to use the default buffer size - except that this code
is old.

We should do as we do in other single-key / single-range readers and
try to read all at once if possible, by looking at the size we received
as a parameter. Cleaning up the data_stream_at interface then comes as
a nice side effect.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
15336e7eb7 key_source: turn it into a class
Its definition as a lambda function is inconvenient, because it does not allow
us to use default values for parameters.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
58fdae33bd mutation_source: turn it into a class
Its definition as a lambda function is inconvenient, because it does not allow
us to use default values for parameters.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Gleb Natapov
c9bd069815 messaging_service: log rpc errors
Message-Id: <20160125155005.GC23862@scylladb.com>
2016-01-25 17:59:26 +02:00
Avi Kivity
91b57c7e20 Merge "Move streaming to use IDL" from Asias 2016-01-25 17:10:22 +02:00
Asias He
f027a9babe streaming: Drop unused serialization code 2016-01-25 22:39:13 +08:00
Asias He
ad80916905 messaging_service: Add streaming implementation for idl
- stream_request
- stream_summary
- prepare_message

2016-01-25 22:36:58 +08:00
Asias He
b299cc3bee idl: Add streaming.idl.hh
- stream_request
- stream_summary
- prepare_message
2016-01-25 22:29:25 +08:00
Asias He
5e100b3426 streaming: Drop unused repaired_at in stream_request 2016-01-25 22:28:48 +08:00
Avi Kivity
6fade0501b Update test/message.cc for MESSAGE verb rename 2016-01-25 14:47:55 +02:00
Nadav Har'El
db19a43d98 repair: try harder to repair, even when some nodes are unreachable
In the existing code, when we fail to reach one of the replicas of some
range being repaired, we would give up, and not continue to repair the
living replicas of this range. The thinking behind this was that since
the repair should be considered failed anyway, there's no point in
trying to do a half-job better.

However, in a discussion I had with Shlomi, he raised the following
alternative thinking, which convinced me: In a large cluster, having
one node or another temporarily dead has a high probability. In that
case, even if the repair is doomed to be considered "failed",
we want it at least to do as much as it possibly can to repair the
data on the living part of the cluster. This is what this patch does:
If we can only reach some of the replicas of a given range, the repair
will be considered failed (as before), but we will still repair the
reachable replicas of this range, if they have different checksums.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1453724443-29320-1-git-send-email-nyh@scylladb.com>
2016-01-25 14:37:39 +02:00
Amnon Heiman
039e627b32 idl-compiler: Fix an issue with default values
This patch fixes an issue where a parameter with a version attribute had
a default value.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453723251-9797-1-git-send-email-amnon@scylladb.com>
2016-01-25 14:32:00 +02:00
Takuya ASADA
e9fdb426b6 dist: add pyparsing as CentOS build time dependency
Porting pyparsing from Fedora 23; build it for python34, which is provided by epel.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453720780-21105-1-git-send-email-syuu@scylladb.com>
2016-01-25 13:26:58 +02:00
Takuya ASADA
b8b0ff0482 dist: add pyparsing on Ubuntu build time dependency
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453720081-15979-1-git-send-email-syuu@scylladb.com>
2016-01-25 13:08:48 +02:00
Avi Kivity
5c5207f122 Merge "Another round of streaming cleanup" from Asias
"- Merge stream_init_message and stream_prepare_message
- Drop session_index / keep_ss_table_level / file_message_header"
2016-01-25 12:54:30 +02:00
Asias He
77684a5d4c messaging_service: Drop STREAM_INIT_MESSAGE
The verb is not used anymore.
Message-Id: <1453719054-29584-1-git-send-email-asias@scylladb.com>
2016-01-25 12:53:08 +02:00
Asias He
53c6cd7808 gossip: Rename echo verb to gossip_echo
It is used by gossip only. I really could not let this inconsistency
stand. Change it while we still can.
Message-Id: <1453719054-29584-2-git-send-email-asias@scylladb.com>
2016-01-25 12:53:07 +02:00
Takuya ASADA
67d2aa677e dist: add pyparsing on Fedora build time dependency
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453715594-32675-1-git-send-email-syuu@scylladb.com>
2016-01-25 11:59:32 +02:00
Takuya ASADA
78d107ccaa dist: add missing dependencies for scylla-gdb
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453715339-32296-1-git-send-email-syuu@scylladb.com>
2016-01-25 11:59:31 +02:00
Asias He
51fa717b8e streaming: Get rid of file_message_header
Again, we do not send sstable files, thus no header info for sstable
files either.

TODO: Estimate mutation size we sent.
2016-01-25 17:56:43 +08:00
Asias He
eba9820b22 streaming: Remove stream_session::file_sent
It is the callback after sending file_message_header. In scylla, we do
not send the file_message_header. Drop it.
2016-01-25 17:25:34 +08:00
Asias He
592683650a streaming: Remove unused serialization code for file_message_header 2016-01-25 17:16:57 +08:00
Calle Wilund
2d1e332fba Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-25 09:11:12 +00:00
Asias He
fa4e94aa27 streaming: Get rid of keep_ss_table_level
We stream mutation instead of files, so keep_ss_table_level is not
relevant for us.
2016-01-25 16:58:57 +08:00
Asias He
2cc31ac977 streaming: Get rid of the stream_index
It is always zero.
2016-01-25 16:58:57 +08:00
Asias He
6b30f08a38 streaming: Always return zero for session_index in api/stream_manager
We will remove session_index soon. It will always be zero. Do not drop
it in api so that the api will be compatible with c*.
2016-01-25 16:58:51 +08:00
Asias He
ad4a096b80 streaming: Get rid of stream_init_message
Unlike streaming in c*, scylla does not need to open tcp connections in
streaming service for both incoming and outgoing messages, seastar::rpc
does the work. There is no need for a standalone stream_init_message
message in the streaming negotiation stage, we can merge the
stream_init_message into stream_prepare_message.
2016-01-25 16:24:16 +08:00
Asias He
048965ea02 streaming: Do not print session_index in handle_session_prepared
session_index is always 0. It will be removed soon.
2016-01-25 16:24:16 +08:00
Avi Kivity
449b81f5d3 Merge "streaming cleanup" from Asias
"No mercy to the unused parameters and messages.
This will help the upcoming IDL serialize/deserialize work."
2016-01-25 10:21:16 +02:00
Avi Kivity
9ebd3f8098 Merge "Move gossip to use IDL" from Asias
"This changes gossip to use IDL based serialization code."
2016-01-25 10:18:34 +02:00
Asias He
20496ed9a8 tests: Stop gossip during shutdown in cql_test_env
Fixes the heap-use-after-free error in build/debug/tests/auth_test

==1415==ERROR: AddressSanitizer: heap-use-after-free on address
0x62200032cfa8 at pc 0x00000350701d bp 0x7fec96df8d40 sp
0x7fec96df8d30
READ of size 8 at 0x62200032cfa8 thread T1
    #0 0x350701c in
_ZZN3gms8gossiper3runEvENKUlOT_E0_clI6futureIJEEEEDaS2_
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x350701c)
    #1 0x35795b1 in apply<gms::gossiper::run()::<lambda(auto:40&&)>,
future<> > /home/penberg/scylla/seastar/core/future.hh:1203
    #2 0x369103d in
_ZZN6futureIJEE12then_wrappedIZN3gms8gossiper3runEvEUlOT_E0_S0_EET0_S5_ENUlS5_E_clI12future_stateIJEEEEDaS5_
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x369103d)
    #3 0x369182a in run /home/penberg/scylla/seastar/core/future.hh:399
    #4 0x435f24 in
reactor::run_tasks(circular_buffer<std::unique_ptr<task,
std::default_delete<task> >, std::allocator<std::unique_ptr<task,
std::default_delete<task> > > >&) core/reactor.cc:1368
    #5 0x43a44f in reactor::run() core/reactor.cc:1672
    #6 0x952e4b in app_template::run_deprecated(int, char**,
std::function<void ()>&&) core/app-template.cc:123
    #7 0x58dc79d in test_runner::start(int,
char**)::{lambda()#1}::operator()()
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x58dc79d)
    #8 0x58e6cd6 in _M_invoke /usr/include/c++/5.3.1/functional:1871
    #9 0x688639 in std::function<void ()>::operator()() const
/usr/include/c++/5.3.1/functional:2271
    #10 0x8d939c in posix_thread::start_routine(void*) core/posix.cc:51
    #11 0x7feca02a4609 in start_thread (/lib64/libpthread.so.0+0x7609)
    #12 0x7fec9ffdea4c in clone (/lib64/libc.so.6+0x102a4c)

0x62200032cfa8 is located 5800 bytes inside of 5808-byte region
[0x62200032b900,0x62200032cfb0)
freed by thread T1 here:
    #0 0x7feca4f76472 in operator delete(void*, unsigned long)
(/lib64/libasan.so.2+0x9a472)
    #1 0x3740772 in gms::gossiper::~gossiper()
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x3740772)
    #2 0x2588ba1 in shared_ptr<gms::gossiper>::~shared_ptr()
seastar/core/shared_ptr.hh:389
    #3 0x4fc908c in
seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}::~stop()
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x4fc908c)
    #4 0x4ff722a in future<>
future<>::then<seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}, future<>
>(seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}&&)::{lambda(seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1})#1}::~then()
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x4ff722a)
    #5 0x509a28c in continuation<future<>
future<>::then<seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}, future<>
>(seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}&&)::{lambda(seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1})#1}>::~continuation()
seastar/core/future.hh:395
    #6 0x509a40d in continuation<future<>
Message-Id: <f8f1c92c1eb88687ab0534f5e7874d53050a5b93.1453446350.git.asias@scylladb.com>
2016-01-25 08:19:18 +02:00
Asias He
bc4ac5004e streaming: Kill stream_result_future::create_and_register
The helper is used only once, in init_sending_side; in
init_receiving_side we do not use create_and_register to create
stream_result_future. Kill the trivial helper to make the code more
consistent.

In addition, rename variables "future" and "f" to sr (streaming_result).
2016-01-25 11:38:13 +08:00
Asias He
face74a8f2 streaming: Rename stream_result_future::init to ::init_sending_side
So we have:

- init_sending_side
  called when the node initiates a stream_session

- init_receiving_side
  called when the node is a receiver of a stream_session initiated by a peer
2016-01-25 11:38:13 +08:00
Asias He
dc94c5e42e streaming: Rename get_or_create_next_session to get_or_create_session
There is only one session for each peer in stream_coordinator.
2016-01-25 11:38:13 +08:00
Asias He
e46d4166f2 streaming: Refactor host_streaming_data
In scylla, in each stream_coordinator, there will be only one
stream_session for each remote peer. Drop the code supporting multiple
stream_sessions in host_streaming_data.

We now have

   shared_ptr<stream_session> _stream_session

instead of

   std::map<int, shared_ptr<stream_session>> _stream_sessions
2016-01-25 11:38:13 +08:00
Asias He
8a4b563729 streaming: Drop the get_or_create_session_by_id interface
The session index will always be 0 in stream_coordinator. Drop the api for it.
2016-01-25 11:38:13 +08:00
Asias He
9a346d56b9 streaming: Drop unnecessary parameters in stream_init_message
- from
  We can get it from the rpc::client_info

- session_index
  There will always be one session in stream_coordinator::host_streaming_data with a peer.

- is_for_outgoing
  In cassandra, it initiates two tcp connections, one for incoming stream and one for outgoing stream.
  logger.debug("[Stream #{}] Sending stream init for incoming stream", session.planId());
  logger.debug("[Stream #{}] Sending stream init for outgoing stream", session.planId());
  In scylla, it only initiates one "connection" for sending, the peer initiates another "connection" for receiving.
  So, is_for_outgoing will always be true in scylla; we can drop it.

- keep_ss_table_level
  In scylla, again, we stream mutations instead of sstable file. It is
  not relevant to us.
2016-01-25 11:38:13 +08:00
Asias He
1bc5cd1b22 streaming: Drop streaming/messages/session_failed_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
2a04e8d70e streaming: Drop streaming/messages/incoming_file_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
26ba21949e streaming: Drop streaming/messages/retry_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
4b4363b62d streaming: Drop streaming/messages/received_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
b3e00472ed streaming: Drop streaming/streaming.cc
It is used in the early stage of development to make sure things compile.
2016-01-25 11:38:13 +08:00
Asias He
5a0bf10a0b streaming: Drop streaming/messages/complete_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
bdd6a69af7 streaming: Drop unused parameters
- int connections_per_host

Scylla does not create connections per stream_session, instead it uses
rpc, thus connections_per_host is not relevant to scylla.

- bool keep_ss_table_level
- int repaired_at

Scylla does not stream sstable files. They are not relevant to scylla.
2016-01-25 11:38:13 +08:00
Asias He
7b633ad127 gossip: Drop unused serialization code
- heart_beat_state
2016-01-25 11:28:29 +08:00
Asias He
4ce08ff251 messaging_service: Add heart_beat_state implementation 2016-01-25 11:28:29 +08:00
Asias He
d7c7994f37 gossip: Drop unused serialization code
- versioned_value
2016-01-25 11:28:29 +08:00
Asias He
8098ba10b7 gossip: Drop unused serialization code
- endpoint_state
2016-01-25 11:28:29 +08:00
Asias He
ecca969adf messaging_service: Add gossip::endpoint_state implementation 2016-01-25 11:28:29 +08:00
Asias He
2a0b6589dd messaging_service: Add versioned_value implementation 2016-01-25 11:28:29 +08:00
Asias He
6660658742 gossip: Drop unused serialization code
- gossip_digest_serialization_helper
- gossip_digest
2016-01-25 11:28:29 +08:00
Asias He
15f2b353b9 messaging_service: Add gossip_digest implementation 2016-01-25 11:28:29 +08:00
Asias He
736d21a912 gossip: Drop unused serialization code
- gossip_digest_syn
- gossip_digest_ack
- gossip_digest_ack2
2016-01-25 11:28:29 +08:00
Asias He
d81fc12af3 messaging_service: Add gossip_digest_ack2 implementation 2016-01-25 11:28:29 +08:00
Asias He
e67cecaee1 messaging_service: Add gossip_digest_syn implementation 2016-01-25 11:28:29 +08:00
Asias He
d94b7e49d2 idl: Add gossip_digest_syn
Added get_partioner and get_cluster_id
2016-01-25 11:28:28 +08:00
Asias He
60f5891c3f idl: Add gossip_digest_ack2 2016-01-25 11:28:26 +08:00
Takuya ASADA
0f0d1c7aed dist: don't depend on libvirtd, since we are not using it
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453470970-31036-1-git-send-email-syuu@scylladb.com>
2016-01-24 17:15:13 +02:00
Avi Kivity
65a140481c Merge "streaming COMPLETE_MESSAGE failure and message retry logic fix" from Asias
"This series:

- Add more debug info to stream session
- Fail session if we fail to send COMPLETE_MESSAGE
- Handle message retry logic for verbs used by streaming

See commit log for details."
2016-01-24 16:41:06 +02:00
Avi Kivity
6135e0ae78 Merge "Move read/write mutation path to use IDL" from Gleb 2016-01-24 13:35:04 +02:00
Avi Kivity
b415f87324 Merge "Serializer Deserializer code generation" from Amnon
"This series does the following:
It adds the code generation.
It performs the needed changes in the current classes so each has a getter
for each of its serializable values and a constructor from the serialized values.
It adds a schema definition that covers gossip_digest_ack.
It changes the messaging_service to use the generated code.

An overall explanation of the solution with a description of the schema IDL can
be found on the wiki page:

https://github.com/scylladb/scylla/wiki/Serializer-Deserializer-Code-generation
"
2016-01-24 12:56:42 +02:00
Gleb Natapov
b9b6f703c3 Remove old serializer for frozen_mutation and reconcilable_result 2016-01-24 12:45:41 +02:00
Gleb Natapov
067bdb23cd Move reconcilable_result and frozen_mutation to idl 2016-01-24 12:45:41 +02:00
Gleb Natapov
18dff5ebc8 Move smart pointer serialization helpers to .cc file.
They are not used outside of the .cc file, so should not be in the
header.
2016-01-24 12:45:41 +02:00
Gleb Natapov
93da9b2725 Remove redundant vector serialization code.
IDL serializer has the code to serialize vectors, so use it instead.
2016-01-24 12:45:41 +02:00
Gleb Natapov
ab6703f9bc Remove old query::result serializer 2016-01-24 12:45:41 +02:00
Gleb Natapov
afc407c6e5 Move query::result to use idl. 2016-01-24 12:45:41 +02:00
Gleb Natapov
be4e68adbf Add bytes_ostream serializer. 2016-01-24 12:45:41 +02:00
Gleb Natapov
043d132ba9 Remove no longer used serializers. 2016-01-24 12:45:41 +02:00
Gleb Natapov
4ae906b204 Add serializer overload for query::partition_range.
From now on query::partition_range will use generated code.
2016-01-24 12:45:41 +02:00
Gleb Natapov
2d1b2765e6 Add serializer overload for query::read_command.
From now on query::read_command will use generated code.
2016-01-24 12:45:41 +02:00
Gleb Natapov
49ce2b83df Add ring_position constructor needed by serializer. 2016-01-24 12:45:41 +02:00
Gleb Natapov
6cc5b15a9c Fix read_command constructor to not copy parameters. 2016-01-24 12:45:41 +02:00
Gleb Natapov
4384c7fe85 un-nest range::bound class.
Serializer does not support nested classes yet, so move bound outside.
2016-01-24 12:45:41 +02:00
Gleb Natapov
7357b1ddfe Move specific_ranges to .hh and un-nest it.
The serializer requires the class to be defined, so it has to be in a .hh
file. It also does not support nested types yet, so move it outside of the
containing class.
2016-01-24 12:45:41 +02:00
Gleb Natapov
9ae7dc70da Prepare partition_slice to be used by serializer.
Add missing _specific_ranges getter and setter.
2016-01-24 12:45:41 +02:00
Gleb Natapov
48ab0bd613 Make constructor from bytes for partition_key and clustering_key_prefix public
Make constructor from bytes public since serializer will use it.
2016-01-24 12:45:41 +02:00
Gleb Natapov
8deb5e424c Add idl files for more types.
Add idl for uuid/range/read_command/token/ring_position/clustering_key_prefix/partition_key.
2016-01-24 12:45:41 +02:00
Gleb Natapov
11299aa3db Add serializers for more basic types.
We will need them in following patches.
2016-01-24 12:45:41 +02:00
Gleb Natapov
a643f3d61f Reorder bool and uint8_t serializers
The bool serializer uses the uint8_t one, so it should be defined after it.
2016-01-24 12:45:41 +02:00
Gleb Natapov
cba31eb4f8 cleanup gossip_digest.idl
Remove uuid class, nonexistent application states and add ';'.
2016-01-24 12:45:37 +02:00
Avi Kivity
a3efecb8fe Merge seastar upstream
* seastar 5c2660b...97f418a (4):
  > io_queues: register individual classes with collectd
  > reactor: destroy the I/O queues explicitly
  > rpc_impl: add pragma once
  > rpc: add skip to simple_input_stream
2016-01-24 12:35:24 +02:00
Takuya ASADA
b5029dae7e dist: remove abrt from AMI, since it's not able to work with Scylla
New CentOS Base Image contains abrt by default, so remove it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453416479-28553-2-git-send-email-syuu@scylladb.com>
2016-01-24 12:31:50 +02:00
Amnon Heiman
0006f236a6 Add an IDL definition file
This adds the IDL definition file.
It is also covered in the wiki page:
https://github.com/scylladb/scylla/wiki/Serializer-Deserializer-Code-generation

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:21 +02:00
Amnon Heiman
f266c2ed42 README.md: Add dependency for pyparsing python3
python3 needs pyparsing installed explicitly. This adds
python3-pyparsing to the required dependencies in the README.md.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:21 +02:00
Amnon Heiman
577ce0d231 Adding a specific template specialization in messaging_service to use
the serializer

This patch adds a specific template specialization so that the rpc uses
the auto-generated serializer and deserializer.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:21 +02:00
Amnon Heiman
b625363072 Adding the serializer declaration and implementation files.
This patch adds the serializer and serializer_impl files. They hold
the functions that are not auto-generated: primitives and templates (map
and vector). They also hold the includes for the auto-generated code.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:20 +02:00
Amnon Heiman
451cf2692c configure.py: Add serializer code generation from schema
This patch adds rules and the idl schema to configure, which will call
the code generation to create the serialization and deserialization
functions.

There is also a rule to create the header file that includes the
auto-generated header files.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:20 +02:00
Amnon Heiman
0715dcd6ba A schema definition for gossip_digest_ack
This is a definition example for gossip_digest_ack with all of its
subclasses.

It can be used by the code generator to create the serializer and
deserializer functions.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:14 +02:00
Amnon Heiman
d27734b9be Add a constructor to inet_address from uint32_t
inet_address uses uint32_t to store the IP address, but its constructor
takes int32_t, so this patch adds a uint32_t constructor.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:13:01 +02:00
Amnon Heiman
8a4d211a99 Change versioned_value to make it serializable
This patch contains two changes: it makes the constructor with parameters
public, and it removes the dependency on messaging_service.hh from the
header file by moving some of the code to the .cc file.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:13:01 +02:00
Amnon Heiman
ddc3fe1328 endpoint_state adds a constructor for all serialized parameters
An external deserialize function needs a constructor with all the
serialized parameters.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:13:01 +02:00
Amnon Heiman
4a34ed82a3 Add code generation for serializer and deserializer
The code generation takes a schema file and creates two files from it:
one with a dist.hh extension containing the forward declarations, and a
second with a dist.impl.hh extension containing the actual implementation.

Because the rpc uses templating for the input and output streams, the
generated functions are templates.

For each class, struct or enum, two functions are created:

serialize - gets the output buffer as a template parameter and
serializes the object to it. There must be a public way to get to each
of the parameters in the class (either a getter, or the parameter should
be public).

deserialize - gets an input buffer and returns the deserialized object
(and, by reference, the number of characters it read).
To create the returned object, the class must have a public constructor
taking all of its parameters.

The solution description can be found here:
https://github.com/scylladb/scylla/wiki/Serializer-Deserializer-Code-generation

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:12:51 +02:00
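The generated serialize/deserialize pair described above can be sketched roughly as follows. Everything here is illustrative: `my_type`, `mem_stream`, and the `write`/`read` calls are hypothetical stand-ins, not the actual generated code or the rpc stream API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical serializable class: public getters plus a public
// constructor taking all serialized fields, as the generator requires.
class my_type {
    int32_t _a;
    int32_t _b;
public:
    my_type(int32_t a, int32_t b) : _a(a), _b(b) {}
    int32_t a() const { return _a; }
    int32_t b() const { return _b; }
};

// Minimal in-memory stream used only for this sketch.
struct mem_stream {
    std::vector<int32_t> buf;
    std::size_t pos = 0;
    void write(int32_t v) { buf.push_back(v); }
    int32_t read() { return buf[pos++]; }
};

// serialize: templated on the output stream, writes each field via its getter.
template <typename Output>
void serialize(Output& out, const my_type& v) {
    out.write(v.a());
    out.write(v.b());
}

// deserialize: templated on the input stream, rebuilds the object
// through its all-fields constructor.
template <typename Input>
my_type deserialize_my_type(Input& in) {
    int32_t a = in.read();
    int32_t b = in.read();
    return my_type(a, b);
}
```

The key constraints from the commit message are visible in the sketch: every serialized field is reachable through a public getter, and the class exposes a constructor taking all of them.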
Takuya ASADA
aef1e67a9b dist: remove mdadm,xfsprogs from dependencies, install them when constructing RAID with scylla_raid_setup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453422886-26297-2-git-send-email-syuu@scylladb.com>
2016-01-24 12:10:41 +02:00
Takuya ASADA
b92a075a34 main: support supervisor_notify() on Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453422886-26297-1-git-send-email-syuu@scylladb.com>
2016-01-24 12:10:41 +02:00
Asias He
7ac3e835a6 messaging_service: Fix send_message_timeout_and_retry
When a verb times out and we resend the message, the peer could receive
the message more than once, which would confuse the receiver. Currently,
only the streaming code uses the retry logic.

- In case of rpc:timeout_error:

Instead of timing out after a relatively short time and resending a few
times, we make the timeout big enough and let TCP do the resend. Thus,
we avoid sending the message more than once, and the receiver will not
receive it more than once.

- In case of rpc::closed_error:

There are two cases:
1) Failing to establish a connection.

For instance, the peer is down. It is safe to resend since we know for
sure the receiver hasn't received the message yet.

2) The connection is established.

We cannot tell whether the remote peer has already received the message
when we get the rpc::closed_error exception.

Currently, we still sleep and resend the message, so the receiver might
receive it more than once. We have no better choice in this case, if we
want the resend to recover from a send error caused by a temporary
network issue, since failing the whole stream_session because a single
message failed to send is not wise.

NOTE: If the duplicated message is received when the stream_session is
done, it will be ignored since it cannot find the stream_manager anymore.
For messages like STREAM_MUTATION, it is ok to receive them twice (we
simply apply the mutation twice).

TODO: For the other messages which use the retry logic, we need to make
sure it is ok to receive them more than once.
2016-01-22 08:20:48 +08:00
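The policy in this commit can be condensed into a small sketch. The exception types and the `send` callable below are hypothetical stand-ins for `rpc::timeout_error`, `rpc::closed_error`, and the messaging call; this is not the actual Scylla code.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-ins for rpc::timeout_error / rpc::closed_error.
struct timeout_error : std::runtime_error {
    timeout_error() : std::runtime_error("timeout") {}
};
struct closed_error : std::runtime_error {
    closed_error() : std::runtime_error("closed") {}
};

// Sketch of the policy: on timeout, fail without resending (a large
// timeout plus TCP retransmission already cover transient loss, and the
// receiver may have the message); on a closed connection, retry up to a
// limit, accepting that the receiver might see a duplicate.
bool send_with_retry(const std::function<void()>& send, int max_retries) {
    for (int attempt = 0; attempt <= max_retries; ++attempt) {
        try {
            send();
            return true;
        } catch (const timeout_error&) {
            return false;   // do not resend: the receiver may already have it
        } catch (const closed_error&) {
            // Connection dropped; resending may duplicate the message,
            // which the receiver must tolerate (e.g. STREAM_MUTATION).
            continue;
        }
    }
    return false;
}
```

The asymmetry between the two catch blocks is the whole point of the commit: a timeout gives no duplicate-free way to retry, while a failed connection attempt is known to be safe to retry.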
Asias He
864c7f636c streaming: Fail the session if sending COMPLETE_MESSAGE fails
We retry sending COMPLETE_MESSAGE; if it fails even with the retry,
there must be something wrong. Abort the stream_session in this case.
2016-01-22 07:44:21 +08:00
Asias He
9be671e7f5 streaming: Simplify send_complete_message
The send-once logic was open-coded. Move it into
send_complete_message() so we can simplify the caller.
2016-01-22 07:43:39 +08:00
Asias He
88e99e89d6 streaming: Add more debug info
- Add debug for the peer address info
- Add debug in stream_transfer_task and stream_receive_task
- Add debug when cancelling the keep_alive timer
- Add debug for has_active_sessions in stream_result_future::maybe_complete
2016-01-22 07:43:16 +08:00
Pekka Enberg
81996bd10b Merge "Improvements to compaction manager" from Raphael 2016-01-21 20:54:49 +02:00
Raphael S. Carvalho
bb909798bc compaction_manager: introduce can_submit
Purpose is to reuse code and also make it easier to read.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:42:23 -02:00
Raphael S. Carvalho
653a07d75d compaction_manager: introduce signal_less_busy_task
Purpose is to reuse code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:31:44 -02:00
Raphael S. Carvalho
2164aa8d5b move compaction manager from /utils to /sstables
Compaction manager was initially created at utils because it was
more generic, and wasn't only intended for compaction.
It was more like a task handler based on futures, but now it's
only intended to manage compaction tasks, and thus should be
moved elsewhere. /sstables is where compaction code is located.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:23:05 -02:00
Pekka Enberg
b5833e8002 Merge "Enable incremental backups option" from Vlad
"This series moves the "backup" logic into the sstable::write_components()
methods, adds support for enabling backup for sstables flushed in the
compaction flow (in addition to a regular flushing flow which had this support
already) and enables the "incremental_backups" configuration option."

I fixed up a merge conflict with commit 5e953b5 ("Merge "Add support to
stop ongoing compaction" from Raphael").
2016-01-21 18:52:07 +02:00
Pekka Enberg
5e953b5e47 Merge "Add support to stop ongoing compaction" from Raphael
"stop compaction is about temporarily interrupting all ongoing compaction
 of a given type.
 That will also be needed for 'nodetool stop <compaction_type>'.

 The test was about starting scylla, stressing it, stopping compaction using
 the API and checking that scylla was able to recover.

 Scylla will print a message as follows for each compaction that was stopped:
 ERROR [shard 0] compaction_manager - compaction failed: read exception:
 std::runtime_error (Compaction for keyspace1/standard1 was deliberately stopped.)
 INFO  [shard 0] compaction_manager - compaction task handler sleeping for 20 seconds"
2016-01-21 18:34:10 +02:00
Takuya ASADA
fae47ee4a8 dist: fetch CentOS dependencies from koji, update them to latest version
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453378765-19596-1-git-send-email-syuu@scylladb.com>
2016-01-21 15:04:45 +02:00
Asias He
755d792c78 gossip: Wait for gossip timer callback to finish in do_stop_gossiping
Also, do not rearm the timer if gossip has been stopped.

Message-Id: <73765857b554d9914e87b24d287ff35ab0af6fce.1453378191.git.asias@scylladb.com>
2016-01-21 14:15:57 +02:00
Vlad Zolotarov
e3d7db5e57 ec2_snitch: complete the EC2Snitch -> Ec2Snitch renaming
The rename started in 72b27a91fe
was not complete. This patch fixes the places that were missed
in the above patch.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1453375025-7512-3-git-send-email-vladz@cloudius-systems.com>
2016-01-21 13:35:30 +02:00
Vlad Zolotarov
9951edde1a locator::ec2_multi_region_snitch: add a get_name() implementation
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1453375025-7512-2-git-send-email-vladz@cloudius-systems.com>
2016-01-21 13:35:29 +02:00
Avi Kivity
43c81db74e Update ami submodule
* dist/ami/files/scylla-ami eb1fdd4...188781c (1):
  > Switch SimpleSnitch to Ec2Snitch
2016-01-21 13:13:23 +02:00
Vlad Zolotarov
de3bb01582 config: allow enabling the incremental backup via .yaml
Enable the incremental_backups/--incremental-backups option.
When enabled, a hard link will be created in the
<column family directory>/backup directory for every flushed
sstable.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:13:24 +02:00
Vlad Zolotarov
c2ab54e9c7 sstables flushing: enable incremental backup (if requested)
Enable incremental backup when sstables are flushed, if incremental
backup has been requested.

It was already enabled in the regular flushing flow, but wasn't in the
compaction flow.

This patch enables it in both places, using the backup capability of the
sstable::write_components() method(s).

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:13:20 +02:00
Vlad Zolotarov
cb5c66f264 sstable::write_components(): add a 'backup' parameter
When the 'backup' parameter is true, create backup hard links for
newly written sstables in the <sstable dir>/backups/ subdirectory.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:04:45 +02:00
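The backup step amounts to hard-linking each newly written component into a `backups/` subdirectory next to it. A minimal sketch using std::filesystem follows; the function name and exact layout are illustrative, since the real work happens inside sstable::write_components().

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Sketch of the backup step: after an sstable component is written,
// hard-link it into a backups/ subdirectory beside it. The 'backup'
// flag mirrors the parameter described in the commit message.
void link_into_backups(const fs::path& component, bool backup) {
    if (!backup) {
        return;
    }
    fs::path backup_dir = component.parent_path() / "backups";
    fs::create_directories(backup_dir);
    fs::create_hard_link(component, backup_dir / component.filename());
}
```

A hard link shares the data blocks with the original file, so the backup costs no extra disk space until the original sstable is deleted.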
Amnon Heiman
e33710d2ca API: storage_service get_logging_level
This patch adds the get_logging_level command that returns a map from
logger name to its level.
To test the API do:
curl -X GET "http://localhost:10000/storage_service/logging_level"

This enables the `nodetool getlogginglevels` command.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453365106-27294-3-git-send-email-amnon@scylladb.com>
2016-01-21 11:58:54 +02:00
Amnon Heiman
ba80121e49 migration_task: rename logger name
Logger names should not contain spaces; spaces cause issues when trying
to modify logger levels from nodetool.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453365106-27294-2-git-send-email-amnon@scylladb.com>
2016-01-21 11:58:42 +02:00
Calle Wilund
980681d28e auth: Add a simplistic "schedule" for auth db setup
The only difference from the previous sleep is that we will explicitly
delete the objects if the process terminates before the tasks are run,
i.e. it makes ASan happier.

Message-Id: <1453295521-29580-1-git-send-email-calle@scylladb.com>
2016-01-20 19:31:14 +02:00
Raphael S. Carvalho
f001bb0f53 sstables: fix make_checksummed_file_output_stream
Arguments buffer_size and true were accidentally inverted.
GCC wasn't complaining because implicit conversion of bool to int, and
vice versa, is valid. However, this conversion is not very safe, because
parameters can be accidentally inverted, as happened here.

This should fix the last problem with sstable_test.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <9478cd266006fdf8a7bd806f1c612ec9d1297c1f.1453301866.git.raphaelsc@scylladb.com>
2016-01-20 16:01:38 +01:00
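The failure mode is easy to reproduce: with a signature like the hypothetical one below (not the actual make_checksummed_file_output_stream signature), swapping an int and a bool still compiles, because each converts implicitly to the other.

```cpp
#include <cassert>

// Hypothetical signature resembling the one in the commit: a buffer
// size and a flag sitting next to each other in the parameter list.
struct stream_opts {
    int buffer_size;
    bool checksum;
};

stream_opts make_stream(int buffer_size, bool checksum) {
    return {buffer_size, checksum};
}
```

Calling `make_stream(true, 4096)` compiles without warning: `true` converts to the int 1, and 4096 converts to the bool `true`, silently producing a 1-byte buffer. Wrapping such parameters in a strong type or an options struct (as the neighboring `file_*_stream_options` commit does) removes the trap.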
Calle Wilund
07f992e42a Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-20 13:31:33 +00:00
Calle Wilund
63b17be4f0 auth_test: Modify yet another case to use "normal" continuation.
test_cassandra_hash also sort of expects exceptions. ASan causes false
positives here as well with seastar::thread, so do it with a normal
continuation.
2016-01-20 15:15:45 +02:00
Takuya ASADA
2eb12681b0 dist: add 'scylla-gdb' package for CentOS
Fixes #831

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453291407-12232-1-git-send-email-syuu@scylladb.com>
2016-01-20 15:12:33 +02:00
Avi Kivity
7bc3e6ffd0 Merge seastar upstream
* seastar 0516ed0...5c2660b (4):
  > reactor: block all signals early
  > reactor: replace sigprocmask() with pthread_sigmask()
  > fstream: remove unused interface
  > foreign_ptr: remove make_local_and_release()

Fixes #601.
2016-01-20 14:59:52 +02:00
Asias He
1c2d95f2b0 streaming: Remove unused verb handlers
They are never used in Scylla.
Message-Id: <1453283955-23691-2-git-send-email-asias@scylladb.com>
2016-01-20 13:58:59 +02:00
Asias He
767e25a686 streaming: Remove the _handlers helper
It is introduced to help to run the invoke_on_all, we can reuse the
distributed<database> db for it.
Message-Id: <1453283955-23691-1-git-send-email-asias@scylladb.com>
2016-01-20 13:58:44 +02:00
Paweł Dziepak
33892943d9 sstables: do not drop row marker when reading mutation
Since 581271a243 "sstables: ignore data
belonging to dropped columns" we silently drop cells if there is no
column in the current schema that they belong to, or their timestamp is
older than the column's dropped_at value. Originally this check was
applied to row markers as well, which caused them to always be dropped,
since there is no column in the schema representing these markers.
This patch makes sure that the check whether a column is alive is
performed only if the cell is not a row marker.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1453289300-28607-1-git-send-email-pdziepak@scylladb.com>
2016-01-20 12:35:41 +01:00
Calle Wilund
9197a886f8 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-20 09:44:38 +00:00
Takuya ASADA
79b218eb1c dist: use our own CentOS7 Base image
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453241256-23338-4-git-send-email-syuu@scylladb.com>
2016-01-20 09:40:56 +02:00
Takuya ASADA
b9cb91e934 dist: stop ntpd before running ntpdate
The new CentOS Base Image runs ntpd by default, so shut it down before running ntpdate.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453241256-23338-3-git-send-email-syuu@scylladb.com>
2016-01-20 09:40:35 +02:00
Takuya ASADA
98e61a93ef dist: disable SELinux only when it is enabled
The new CentOS7 Base Image disables SELinux by default, and running 'setenforce 0' on the image causes an error, so we won't be able to build the AMI.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453241256-23338-2-git-send-email-syuu@scylladb.com>
2016-01-20 09:40:01 +02:00
Raphael S. Carvalho
c318f3baa3 sstables: fix sstable::data_stream_at
After 63967db8, the offset is ignored when creating an input stream.
The problem was found after sstable_test failed recently.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <56ece21ff6e043e224eb2a6e76cdd422b94821b0.1453232689.git.raphaelsc@scylladb.com>
2016-01-20 09:35:57 +02:00
Raphael S. Carvalho
ff9b1694fe api: implement stop_compaction
stop_compaction is implemented by calling stop_compaction() of
compaction manager for each database.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Raphael S. Carvalho
5cceb7d249 api: fix paramType of parameter of stop_compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Raphael S. Carvalho
3bd240d9e8 compaction: add ability to stop an ongoing compaction
That's needed for nodetool stop, which is called to stop all ongoing
compaction. The implementation is about informing an ongoing compaction
that it was asked to stop, so the compaction itself will trigger an
exception. Compaction manager will catch this exception and re-schedule
the compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Raphael S. Carvalho
ec4c73d451 compaction: rename compaction_stats to compaction_info
compaction_info makes more sense because this structure doesn't
only store stats about an ongoing compaction. Soon, we will add
information to it about whether or not a user asked to stop the
respective ongoing compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Tomasz Grabiec
bd34adcf22 tests: memory_footprint: Show canonical_mutation size
Message-Id: <1453227147-21918-1-git-send-email-tgrabiec@scylladb.com>
2016-01-19 20:22:59 +02:00
Tomasz Grabiec
b8c3fa4d46 cql3: Print only column name in error message
Printing a column_definition prints all fields of the struct; we want
only the name here.
Message-Id: <1453207531-16589-1-git-send-email-tgrabiec@scylladb.com>
2016-01-19 20:22:37 +02:00
Tomasz Grabiec
0596455dc2 Merge branch 'pdziepak/date-timestamp-fixes/v2'
From Paweł:

These patches contain fixes for date and timestamp types:
 - date and timestamp are considered compatible types
 - date type is added to abstract_type::parse_type()
2016-01-19 18:35:09 +01:00
Glauber Costa
63967db8bf sstables: always use a file_*_stream_options in our readers and writers
Instead of using the APIs that explicitly pass things like buffer_size,
always use the options instance instead.

This will make it easier to pass extra options in the future.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <5b04e60ab469c319a17a522694e5bedf806702fe.1453219530.git.glauber@scylladb.com>
2016-01-19 18:26:37 +02:00
Glauber Costa
c3ac5257b5 sstables: don't repeat file_writer creation all the time
When this code was originally written, we used to operate on a generic
output_stream. We created a file output stream, and then moved it into
the generic object.

Many patches and reworks later, we now have a file_writer object, but
that pattern was never reworked.

So in a couple of places we have something like this:

    f = file_object acquired by open_file_dma
    auto out = file_writer(std::move(f), 4096);
    auto w = make_shared<file_writer>(std::move(out));

The last statement is just totally redundant. make_shared can create
an object from its parameters without trouble, so we can just pass
the parameter list directly to it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <c01801a1fdf37f8ea9a3e5c52cd424e35ba0a80d.1453219530.git.glauber@scylladb.com>
2016-01-19 18:26:36 +02:00
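The simplification amounts to constructing the file_writer in place with make_shared rather than moving a temporary into it. The `file_writer` stand-in below is illustrative, not Scylla's real class.

```cpp
#include <cassert>
#include <memory>

// Stand-in for the file_writer described in the commit (illustrative):
// takes a file handle and a buffer size.
struct file_writer {
    int buffer_size;
    file_writer(int fd, int bs) : buffer_size(bs) { (void)fd; }
};

std::shared_ptr<file_writer> make_writer(int fd) {
    // Redundant form from the commit message:
    //   auto out = file_writer(std::move(f), 4096);
    //   auto w = make_shared<file_writer>(std::move(out));
    // make_shared constructs in place from the parameter list, so the
    // intermediate object (and its move) can be dropped entirely:
    return std::make_shared<file_writer>(fd, 4096);
}
```

Besides being shorter, the direct form avoids one move construction and keeps a single allocation for both the control block and the object.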
Calle Wilund
59bf54d59a commitlog_replayer: Modify logging to better match origin
* Match origin log messages
  - Demote per-file printouts to "debug" level.
* Print an all-files stat summary for whole replay (begin/summary)
  - At info level, like origin

Prompted by dtest that expects origin log output.

Message-Id: <1453216558-18359-1-git-send-email-calle@scylladb.com>
2016-01-19 17:19:52 +02:00
Avi Kivity
07e0f0a31f Merge "Support schema changes in batchlog manager" from Tomasz
"We need to be able to replay mutations created using older versions of
the table's schema. frozen_mutation can be only read using the version
it was serialized with, and there is no guarantee that the node will
know this version at the time of replay. Currently versions are kept
in-memory, so a node forgets all past versions when it restarts. This
case was not handled yet; replay would fail with an exception if the
version is unknown."
2016-01-19 17:17:47 +02:00
Calle Wilund
3f4c8d9eea commitlog_replayer: Modify logging to better match origin
* Match origin log messages
  - Demote per-file printouts to "debug" level.
* Print an all-files stat summary for whole replay (begin/summary)
  - At info level, like origin

Prompted by dtest that expects origin log output.

v2:
* Fixed broken + operator
* Use map_reduce instead of easily readable code
2016-01-19 15:14:21 +00:00
Paweł Dziepak
db30ac8d2d tests/types: add test for timestamp and date compatibility
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 15:34:45 +01:00
Avi Kivity
a2953833dc Merge seastar upstream
* seastar e93cd9d...0516ed0 (9):
  > http: use default file input stream options in file_handler
  > linecount: use default file input stream options
  > fstream: do not pass offset as part options member
  > net: move posix network stack registration to reactor.cc
  > net: throw a human-readable error if using an unregistered network stack
  > io_queue: remove pending_io counter
  > Revert "Merge "Improve rpc server-side statistics""
  > tests: corrections regarding Boost.Test 1.59 compilation failures
  > Merge "Improve rpc server-side statistics"
2016-01-19 16:33:35 +02:00
Calle Wilund
1b4b7aeb66 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-19 13:51:00 +00:00
Paweł Dziepak
900f5338e7 types: make timestamp_type and date_type compatible
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 14:03:15 +01:00
Tomasz Grabiec
ec12b75426 batchlog_manager: Store canonical_mutations
We need to be able to replay mutations created using older versions of
the table's schema. frozen_mutation can be only read using the version
it was serialized with, and there is no guarantee that the node will
know this version at the time of replay. Currently, versions are kept
in-memory, so a node forgets all past versions when it restarts.

To solve this, let's store canonical_mutations which, like data in
sstables, can be read using any later schema version of given table.
2016-01-19 13:46:28 +01:00
Tomasz Grabiec
e21049328f batchlog_manager: Add more debug logging 2016-01-19 13:46:28 +01:00
Tomasz Grabiec
608b606434 canonical_mutation: Introduce column_family_id() getter 2016-01-19 13:46:28 +01:00
Tomasz Grabiec
06d1f4b584 database: Print table name when printing mutation 2016-01-19 13:46:28 +01:00
Tomasz Grabiec
52073d619c database: Add trace-level logging of applied mutations 2016-01-19 13:46:28 +01:00
Paweł Dziepak
a6171d3e99 types: add date type to parse_type()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 13:43:36 +01:00
Paweł Dziepak
f77ab67809 types: use correct name for date_type
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 13:42:53 +01:00
Tomasz Grabiec
d7cb88e0af Merge branch 'pdziepak/fixes-for-alter-table/v1'
From Paweł:

"This series contains some more fixes for issues related to alter table,
namely: incorrect parsing of collection information in comparator, missing
schema::_raw._collections in equality check, missing compatibility
information for utf8->blob, ascii->blob and ascii->utf8 casts."
2016-01-19 13:22:10 +01:00
Calle Wilund
de9f9308a5 auth_test: workaround ASan false error
test_password_authenticator_operations causes ASan failures, in a way
that I am 99% sure is fully false positive, caused by a combo of
seastar threads, exception throwing and externals.

In lieu of actually identifying what ASan flaw causes this and
potentially curing it, for now let's just rewrite the test in question
to not use seastar::async, but a normal continuation. Less easy to read,
but it passes ASan.
Message-Id: <1453205136-10308-1-git-send-email-calle@scylladb.com>
2016-01-19 13:11:20 +01:00
Calle Wilund
79a5f7b19d auth_test: workaround ASan false error
test_password_authenticator_operations causes ASan failures, in a way
that I am 99% sure is fully false positive, caused by a combo of
seastar threads, exception throwing and externals.

In lieu of actually identifying what ASan flaw causes this and
potentially curing it, for now let's just rewrite the test in question
to not use seastar::async, but a normal continuation. Less easy to read,
but it passes ASan.
2016-01-19 12:02:50 +00:00
Raphael S. Carvalho
0c67b1d22b compaction: filter out mutation that doesn't belong to shard
When compacting an sstable, mutations that don't belong to the current
shard should be filtered out. Otherwise, mutations would be duplicated
in all shards that share the sstable being compacted.
sstable_test will now run with -c1 because arbitrary keys are chosen
for the sstables to be compacted, so the test could fail because of
mutations being filtered out.

fixes #527.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1acc2e8b9c66fb9c0c601b05e3ae4353e514ead5.1453140657.git.raphaelsc@scylladb.com>
2016-01-19 10:16:41 +01:00
Vlad Zolotarov
922eb218b1 locator::reconnectable_snitch_helper: don't check messaging_service version
Don't demand the messaging_service version to be the same on both
sides of the connection in order to use internal addresses.

Upstream has a similar change for CASSANDRA-6702 in commit a7cae32 ("Fix
ReconnectableSnitch reconnecting to peers during upgrade").

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1452686729-32629-1-git-send-email-vladz@cloudius-systems.com>
2016-01-19 11:04:37 +02:00
Paweł Dziepak
7c9708953e tests/cql3: add tests for ALTER TABLE with multiple collections
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:24 +01:00
Paweł Dziepak
e249d4eab5 tests/type: add test for simple type compatibility
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:20 +01:00
Paweł Dziepak
440b6d058e types: fix compatibility for text types
bytes_type is_compatible_with utf8_type and ascii_type
utf8_type is_compatible_with ascii_type

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:16 +01:00
Paweł Dziepak
17ca7e06f3 schema: print collection info
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:12 +01:00
Paweł Dziepak
2e2de35dfb schema: add _raw._collections check to operator==()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:08 +01:00
Paweł Dziepak
92dc95b73b schema: fix comparator parsing
The correct format of collection information in comparator is:

o.a.c.db.m.ColumnToCollection(<name1>:<type1>, <name2>:<type2>, ...)

not:

o.a.c.db.m.ColumnToCollection(<name1>:<type1>),
o.a.c.db.m.ColumnToCollection(<name2>:<type2>) ...

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:05 +01:00
Amnon Heiman
9be42bfd7b API: Add version to application state in failure_detection
The upstream of origin adds the version to the application_state in the
get_endpoints in the failure detector.

In our implementation we return an object to the jmx proxy and the proxy
does the string formatting.

This patch adds the version to the return object which is both useful as
an API and will allow the jmx proxy to add it to its output when we move
forward with the jmx version.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1448962889-19611-1-git-send-email-amnon@scylladb.com>
2016-01-19 10:23:56 +02:00
Tomasz Grabiec
5a1587353f tests: Don't depend on partition_key representation
Representation format is an implementation detail of
partition_key. Code which compares a value to representation makes
assumptions about key's representation. Compare keys to keys instead.
Message-Id: <1453136316-18125-1-git-send-email-tgrabiec@scylladb.com>
2016-01-18 19:01:56 +02:00
Pekka Enberg
2ca8606b4e streaming/stream_session: Don't stop stream manager
We cannot stop the stream manager because it's accessible via the API
server during shutdown, for example, which can cause a SIGSEGV.

Spotted by ASan.
Message-Id: <1453130811-22540-1-git-send-email-penberg@scylladb.com>
2016-01-18 16:34:19 +01:00
Pekka Enberg
422cff5e00 api/messaging_service: Fix heap-buffer-overflows in set_messaging_service()
Fix various issues in set_messaging_service() that caused
heap-buffer-overflows when JMX proxy connects to Scylla API:

  - Off-by-one error in 'num_verb' definition

  - Call to initializer list std::vector constructor variant that caused
    the vector to be two elements long.

  - Missing verb definitions from the Swagger definition that caused
    response vector to be too small.

Spotted by ASan.
Message-Id: <1453125439-16703-1-git-send-email-penberg@scylladb.com>
2016-01-18 15:43:29 +01:00
Pekka Enberg
3723beb302 service/storage_service: Fix typos in logger messages
Message-Id: <1453128076-18613-1-git-send-email-penberg@scylladb.com>
2016-01-18 15:43:04 +01:00
Takuya ASADA
d5d5857b62 dist: extend coredump size limit
16GB is not enough for some larger machines, so extend it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453115792-21989-2-git-send-email-syuu@scylladb.com>
2016-01-18 13:38:43 +02:00
Takuya ASADA
023c6dc620 dist: preserve environment variable when running scylla_prepare on sudo
sysconfig parameters are passed via environment variables, but sudo resets them by default.
We need to preserve them across sudo.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453115792-21989-1-git-send-email-syuu@scylladb.com>
2016-01-18 13:22:14 +02:00
Gleb Natapov
dde2e80a20 storage_proxy: remove batchlog synchronously
Wait for batchlog removal before completing a query otherwise batchlog
removal queries may accumulate. Still ignore an error if it happens
since it is not critical, but log it.

Message-Id: <20160118095642.GB6705@scylladb.com>
2016-01-18 12:38:12 +02:00
Avi Kivity
221ef4536c messaging service: limit rpc server resources
Otherwise, a slow node can be overwhelmed by other nodes and run out of
memory.

Fixes #596.
Message-Id: <1452776394-13682-1-git-send-email-avi@scylladb.com>
2016-01-18 11:16:45 +02:00
Avi Kivity
a881e596fa Merge "Ubuntu dependency packages fix" from Takuya 2016-01-18 11:13:18 +02:00
Gleb Natapov
f97eed0c94 fix batch size checking
warn_threshold is in kbytes, v.size() is in bytes, and size is in kbytes.

Message-Id: <20160118090620.GZ6705@scylladb.com>
2016-01-18 11:08:13 +02:00
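The bug above is a unit mismatch between the threshold and the measured size. A minimal sketch of a unit-consistent check (names are illustrative, not the actual Scylla code): normalize both sides to bytes before comparing.

```cpp
#include <cstdint>

// Illustrative sketch, not the actual Scylla code: compare the batch size
// against the warning threshold in a single unit (bytes).
constexpr uint64_t kib = 1024;

inline bool batch_exceeds_warn_threshold(uint64_t batch_size_bytes,
                                         uint64_t warn_threshold_kib) {
    // Convert the threshold from kbytes to bytes, then compare bytes to bytes.
    return batch_size_bytes > warn_threshold_kib * kib;
}
```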
Avi Kivity
5313a28044 Merge "Fix re-adding collections" from Paweł
"This series makes sure that Scylla rejects adding a collection if
its column name is the same as that of a collection that existed before and
their types are incompatible.

Fixes #782"
2016-01-18 10:58:40 +02:00
Tomasz Grabiec
237819c31f logalloc: Exclude zones' free segments from lsa/bytes-non_lsa_used_space
Historically, the purpose of this metric has been to show how much memory
is in standard allocations. After zones were introduced, it would also
include free space in LSA zones, which is almost all memory, and thus
the metric lost its original meaning. This change brings it back to
its original meaning.

Message-Id: <1452865125-4033-1-git-send-email-tgrabiec@scylladb.com>
2016-01-18 10:48:14 +02:00
Takuya ASADA
5270da1eef dist: prevent abrt from being used with scylla
scylla should be used with systemd-coredump, not abrt.
Fixes #762

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452713081-32492-1-git-send-email-syuu@scylladb.com>
2016-01-18 10:47:11 +02:00
Pekka Enberg
7d3a3bd201 Merge "column family cleanup support" from Raphael
"This patch is intended to add support for column family cleanup, which will
 make 'nodetool cleanup' possible.

 Why is this feature needed? To remove irrelevant data from a node that loses part
 of its token range to a newly added node."
2016-01-18 10:15:05 +02:00
Pekka Enberg
6cc02242f6 Merge "Multi schema support in commit log" from Paweł
"This series adds support for multiple schema versions to the commit log.
 All segments contain column mappings of all schema versions used by the
 mutations contained in the segment, which are necessary in order to be
 able to read frozen mutations and upgrade them to the current schema
 version."
2016-01-18 10:11:26 +02:00
Avi Kivity
d5050e4c6a storage_proxy: make MUTATION and MUTATION_DONE verbs synchronous at the server side
While MUTATION and MUTATION_DONE are asynchronous by nature (when a MUTATION
completes, it sends a MUTATION_DONE message instead of responding
synchronously), we still want them to be synchronous at the server side
wrt. the RPC server itself.  This is because RPC accounts for resources
consumed by the handler only while the handler is executing; if we return
immediately, and let the code execute asynchronously, RPC believes no
resources are consumed and can instantiate more handlers than the shard
has resources for.

Fix by changing the return type of the handlers to future<no_wait_type>
(from a plain no_wait_type), and making that future complete when local
processing is over.

Ref #596.
Message-Id: <1453048967-5286-1-git-send-email-avi@scylladb.com>
2016-01-18 09:59:34 +02:00
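The accounting problem described above can be modeled without seastar at all. This toy sketch (not the seastar RPC API; all names are illustrative) keeps one resource unit per in-flight handler and releases it only when the handler's completion fires, mirroring a handler returning a future that resolves when local processing is over:

```cpp
#include <functional>
#include <queue>

// Toy model of RPC resource accounting: one unit is held per in-flight
// handler and released only when the handler signals completion. A handler
// that "returned immediately" would instead free the unit while its real
// work was still pending, letting too many handlers be instantiated.
struct toy_rpc_server {
    int in_flight = 0;
    std::queue<std::function<void()>> deferred;  // completions to run later

    // The handler receives a completion hook; the unit is released there.
    void invoke(const std::function<void(std::function<void()>)>& handler) {
        ++in_flight;
        handler([this] { --in_flight; });
    }

    void drain() {  // run all deferred completions
        while (!deferred.empty()) {
            deferred.front()();
            deferred.pop();
        }
    }
};
```

With this model, handlers that defer their completion keep the server's in-flight count accurate until the deferred work finishes.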
Nadav Har'El
d97cbbbe43 repair: forbid repair with "-dc" not including the current host
Theoretically, one could want to repair a single host *and* all the hosts
in one or more other data centers which don't include this host. However,
Cassandra's "nodetool repair" explicitly does not allow this, and fails if
given a list of data centers (via the "-dc" option) which doesn't include
the host starting the repair. So we need to behave like "nodetool repair"
and fail in this case too.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1453037016-25775-1-git-send-email-nyh@scylladb.com>
2016-01-18 09:54:16 +02:00
Paweł Dziepak
fa7bef72d4 tests/cql3: add tests for ALTER TABLE validation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:50 +01:00
Paweł Dziepak
b7e58db7ec tests: allow any future in assert_that_failed()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:44 +01:00
Paweł Dziepak
00f7a873a5 cql3: forbid re-adding collection with incompatible type
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:38 +01:00
Paweł Dziepak
4927ff95da schema: read collections from comparator
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:33 +01:00
Paweł Dziepak
725129deb7 type_parser: accept sstring_view
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:27 +01:00
Paweł Dziepak
6372a22064 schema: use _raw._collections to generate comparator name
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:03 +01:00
Paweł Dziepak
84840c1c98 schema: keep track of removed collections
Cassandra disallows adding a column with the same name as a collection
that existed in the past in that table if the types aren't compatible.
To enforce that Scylla needs to keep track of all collections that ever
existed in the column family.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:34:29 +01:00
Avi Kivity
c4cf4e0bcd Merge seastar upstream
* seastar a8183c1...e93cd9d (2):
  > rpc: make sure we serialize on _resources_available semaphore
  > rpc: fix support for handlers returning future<no_wait_type>
2016-01-17 18:36:22 +02:00
Avi Kivity
249dbc1d8e Merge seastar upstream
* seastar 6f9453d...a8183c1 (2):
  > rpc: fix server losing handler
  > Merge "Fair I/O Queue" from Glauber
2016-01-17 14:21:53 +02:00
Takuya ASADA
01309c0dd8 dist: add missing dependency (xfslibs-dev) for Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-16 15:39:09 +09:00
Takuya ASADA
9ad3365353 dist: use gdebi to resolve install-time dependencies
Since we switched to using mk-build-deps, it only resolves build-time dependencies.
We also need to install install-time dependencies.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-16 15:39:09 +09:00
Takuya ASADA
705285cf27 dist: resolve build time dependency by mk-build-deps command, do not install them manually
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-16 15:39:09 +09:00
Takuya ASADA
90be81f9ba dist: add missing build time dependency for thrift package on Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-16 15:39:09 +09:00
Tomasz Grabiec
d332fcaefc row_cache: Restore indentation 2016-01-15 15:33:17 +01:00
Tomasz Grabiec
6b3cd35109 Merge branch 'pdziepak/multi-schema-sstables/v1'
From Paweł:

This series add support for reading sstables using different schema than
the one that was used to write them.
2016-01-15 14:23:18 +01:00
Paweł Dziepak
dbf23fdff5 tests/sstable: add test for multi schema
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-15 13:12:40 +01:00
Paweł Dziepak
cfc0a132a9 sstable: handle multi-cell vs atomic incompatibilities
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-15 13:12:40 +01:00
Paweł Dziepak
581271a243 sstables: ignore data belonging to dropped columns
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-15 13:12:40 +01:00
Asias He
e10580f474 cql_server: Fix connection shutdown
_fd is of type connected_socket. shutdown_input() and shutdown_output()
return future<>. Do not ignore the future.

Message-Id: <786eee890541a18d3501ecd52415f2900c545157.1452835922.git.asias@scylladb.com>
2016-01-15 11:37:30 +02:00
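The principle behind this fix can be illustrated with plain std::future (this is not the seastar connected_socket API; names are illustrative): a shutdown step that returns a future must be waited on, otherwise its failure is silently dropped.

```cpp
#include <future>
#include <stdexcept>

// A shutdown step that reports its outcome through a future. With deferred
// launch, nothing runs until someone waits on the future.
inline std::future<void> shutdown_input_step(bool fail) {
    return std::async(std::launch::deferred, [fail] {
        if (fail) {
            throw std::runtime_error("shutdown failed");
        }
    });
}

// Waiting on the returned future surfaces any error instead of ignoring it.
inline bool shutdown_and_report(bool fail) {
    auto f = shutdown_input_step(fail);
    try {
        f.get();
        return true;
    } catch (const std::exception&) {
        return false;
    }
}
```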
Tomasz Grabiec
ccd609185f sstables: Add ability to wait for async sstable cleanup tasks
This patch adds a function which waits for the background cleanup work
which is started from sstable destructors.

We wait for those cleanups on reactor exit so that unit tests don't
leak. This fixes erratic ASAN complaint about memory leak when running
schema_change_test in debug mode:

    Indirect leak of 64 byte(s) in 1 object(s) allocated from:
         0x7fab24413912 in operator new(unsigned long) (/lib64/libasan.so.2+0x99912)
         0x1776aeb in make_unique<continuation<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> >, future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /usr/include/c++/5.1.1/bits/unique_ptr.h:765
         0x1752b69 in schedule<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:513
        0x1711365 in schedule<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:690
        0x16d0474 in then_wrapped<future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>, future<> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:880
        0x1696e9c in handle_exception<sstables::sstable::~sstable()::<lambda(auto:52)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:1012
        0x1638ba8 in sstables::sstable::~sstable() sstables/sstables.cc:1619

The leak is about allocations related to close() syscall tasks invoked
from sstable destructor, which were not waited for.

Message-Id: <1452783887-25244-1-git-send-email-tgrabiec@scylladb.com>
2016-01-15 11:32:15 +02:00
Calle Wilund
e935c9cd34 select_statement: Make sure all aggregate queries use paging
Mainly to make sure we respect row limits, since normal result
generation does not respect them for aggregates.

Fixes #752 

Message-Id: <1452681048-30171-2-git-send-email-calle@scylladb.com>
2016-01-14 19:03:37 +02:00
Calle Wilund
1dc5937f40 query_pagers: fix log message in requires_paging
The message would state that the query required paging even when
the opposite was returned to the caller.

Message-Id: <1452681048-30171-1-git-send-email-calle@scylladb.com>
2016-01-14 19:03:16 +02:00
Asias He
cc3073b42d gossip: cleanup application_state
Drop the unused one.

Message-Id: <4cc45164d55742951b618d2c7b1e8bdb997f005a.1452771260.git.asias@scylladb.com>
2016-01-14 19:01:51 +02:00
Avi Kivity
d47a58cc32 README: add libxml2 and libpciaccess packages to list of required packages
Needed for link stage.
2016-01-14 17:47:48 +02:00
Avi Kivity
cf7e6cede2 README: add hwloc and numactl to install recommendations 2016-01-14 17:30:43 +02:00
Tomasz Grabiec
b7976f3b82 config: Set default logging level to info
Commit d7b403db1f changed the default in
logging::logger. It affected tests but not the scylla binary, where
the level is overwritten in main.cc.
Message-Id: <1452777008-21708-1-git-send-email-tgrabiec@scylladb.com>
2016-01-14 15:11:58 +02:00
Pekka Enberg
9306f4eb22 Merge "Disable ALTER TABLE statement unless --experimental=on" from Tomek 2016-01-14 14:30:20 +02:00
Avi Kivity
cf8ab65fbc Merge seastar upstream
* seastar 43e64c2...6f9453d (2):
  > Merge "rpc resource accounting"
  > core: Introduce smp::invoke_on_all()
2016-01-14 14:28:27 +02:00
Asias He
826b6ed877 gossip: Print node status in handle_major_state_change
Message-Id: <1452768680-32355-1-git-send-email-asias@scylladb.com>
2016-01-14 14:22:37 +02:00
Asias He
e7a899f5f3 gossip: Enable debug msg for convict
Kill one FIXME in convict

Message-Id: <1452768680-32355-2-git-send-email-asias@scylladb.com>
2016-01-14 14:22:36 +02:00
Tomasz Grabiec
054f1df0a5 cql3: Disable ALTER TABLE unless experimental features are on 2016-01-14 13:21:13 +01:00
Tomasz Grabiec
1fd03ea1d2 tests: cql_test_env: Enable experimental features 2016-01-14 13:21:13 +01:00
Tomasz Grabiec
a13aaa62df config: Add 'experimental' switch 2016-01-14 13:21:13 +01:00
Paweł Dziepak
218898b297 commitlog: upgrade mutations during commitlog replay
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:50:26 +01:00
Paweł Dziepak
661849dbc3 commitlog: learn about schema versions during replay
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:50:23 +01:00
Paweł Dziepak
55d342181a commitlog: do not skip entries inside a chunk
All entries inside a chunk need to be read since any of them may
contain a column mapping.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:23:00 +01:00
Paweł Dziepak
18d0a57bf4 commitlog: use commitlog entry writer and reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:20:06 +01:00
Paweł Dziepak
a877905bd4 commitlog: allow adding entries using commitlog_entry_writer
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:17:45 +01:00
Paweł Dziepak
0254c3e30b commitlog: add commitlog entry writer and reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:49 +01:00
Paweł Dziepak
434c02cdfa commitlog: keep track of schema versions
Each segment chunk should contain column mappings for all schema
versions used by the mutations it contains. In order to avoid
duplication, db::commitlog::segment remembers all schema versions already
written in the current chunk.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:41 +01:00
Paweł Dziepak
9d74268234 commitlog: introduce entry_writer
The current commitlog interface requires writers to specify the size of a
new entry, which therefore cannot depend on the segment to which the entry
is written.
If column mappings are going to be stored in the commitlog, that is not
enough, since we don't know whether a column mapping needs to be written
until we know in which segment the entry is going to be stored.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:26 +01:00
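The idea behind this change can be sketched as follows (hypothetical names, not the real entry_writer interface): the entry's size can only be computed once the target segment is known, because the column mapping is written only if that segment has not yet recorded the mutation's schema version.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Per-segment state: schema versions whose column mappings were already
// written into this segment's current chunk.
struct segment_state {
    std::unordered_set<std::string> known_schema_versions;
};

// Sketch of a writer whose serialized size depends on the target segment.
struct entry_writer_sketch {
    std::string schema_version;
    size_t mutation_size;
    size_t mapping_size;

    // The size includes the column mapping only if this segment has not
    // seen the schema version yet.
    size_t size(const segment_state& seg) const {
        bool needs_mapping =
            seg.known_schema_versions.count(schema_version) == 0;
        return mutation_size + (needs_mapping ? mapping_size : 0);
    }
};
```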
Raphael S. Carvalho
fc6a1934b0 api: implement force_keyspace_cleanup
This will add support for a user to clean up an entire keyspace
or some of its column families.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 03:53:22 -02:00
Raphael S. Carvalho
a5c90194f5 db: add support to clean up a column family
Cleanup is a procedure that discards irrelevant keys from
all sstables of a column family, thus saving disk space.
Scylla cleans up an sstable by using the compaction code, with
this sstable as the only input.
The compaction manager was changed to become aware of cleanup, so
that it is able to schedule cleanup requests and also knows
how to handle them properly.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 03:53:04 -02:00
Raphael S. Carvalho
d44a5d1e94 compaction: filter out compacting sstables
The implementation stores the generations of compacting sstables
in an unordered set per column family, so before the strategy is called,
the compaction manager will filter out compacting sstables.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 01:18:29 -02:00
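The filtering step described above can be sketched like this (names are illustrative, not the actual Scylla code): candidates whose generation appears in the per-column-family set of compacting generations are dropped before the strategy sees them.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Drop sstables whose generation is currently being compacted before
// handing the candidate list to the compaction strategy.
inline std::vector<int64_t> filter_out_compacting(
        const std::vector<int64_t>& candidate_generations,
        const std::unordered_set<int64_t>& compacting_generations) {
    std::vector<int64_t> out;
    for (int64_t gen : candidate_generations) {
        if (compacting_generations.count(gen) == 0) {
            out.push_back(gen);
        }
    }
    return out;
}
```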
Raphael S. Carvalho
9c13c1c738 compaction: move compaction execution from strategy to manager
Currently, the compaction strategy is responsible both for selecting the
sstables for compaction and for running compaction.
Moving the code that runs compaction from the strategy to the manager is a big
improvement, which will also make it possible for the compaction manager
to keep track of which sstables are being compacted at any moment.
This change will also be needed for cleanup and for concurrent compaction
on the same column family.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 00:04:27 -02:00
Raphael S. Carvalho
68619211f5 tests: add test for sstable rewrite
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 21:43:41 -02:00
Raphael S. Carvalho
ed80ed82ef sstables: prepare compact_sstables to work with cleanup
Cleanup is about rewriting an sstable while discarding any keys that
are irrelevant, i.e. keys that don't belong to the current node.
A cleanup parameter was added to compact_sstables.
If set to true, code irrelevant to cleanup, such as the code that updates
compaction history, will be skipped. Logic was also added to
discard irrelevant keys.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 21:43:40 -02:00
Raphael S. Carvalho
5c674091dc db: move code that rebuilds sstable list to a function
That code will be used by column family cleanup, so let's move
it into a function. This change also improves code
readability.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 19:51:04 -02:00
Raphael S. Carvalho
58189dd489 db: move generation calculation code to a function
Code that calculates the generation should be put in a function.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 19:51:02 -02:00
254 changed files with 6557 additions and 6114 deletions

.gitmodules vendored

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui

IDL.md Normal file

@@ -0,0 +1,103 @@
#IDL definition
The schema we use is similar to a C++ schema.
Use a class or struct similar to the object you need the serializer for.
Use a namespace when applicable.
##keywords
* class/struct - a class or a struct, as in C++;
a class/struct can have a final or stub marker
* namespace - has the same meaning as in C++
* enum class - has the same meaning as in C++
* final modifier for class - when a class is marked as final it will not contain a size parameter. Note that a final class cannot be extended by a future version, so use with care
* stub class - when a class is marked as stub, no code will be generated for it and it is only there as documentation.
* version attributes - marking a field with [[version id]] means that the field is available from that specific version on
* template - a template class definition, as in C++
##Syntax
###Namespace
```
namespace ns_name { namespace-body }
```
* ns_name: either a previously unused identifier, in which case this is an original-namespace-definition, or the name of an existing namespace, in which case this is an extension-namespace-definition
* namespace-body: a possibly empty sequence of declarations of any kind (including class and struct definitions as well as nested namespaces)
###class/struct
`
class-key class-name final(optional) stub(optional) { member-specification } ;(optional)
`
* class-key: one of class or struct.
* class-name: the name of the class that's being defined, optionally followed by the keyword final, optionally followed by the keyword stub
* final: when a class is marked as final, it cannot be extended and there is no need to serialize its size; use with care.
* stub: when a class is marked as stub, no code will be generated for it; it is added for documentation only.
* member-specification: a list of access specifiers and public member accessors; see class member below.
* to be compatible with C++, a class definition can be followed by a semicolon.
###enum
`enum-key identifier enum-base { enumerator-list(optional) }`
* enum-key: only enum class is supported
* identifier: the name of the enumeration that's being declared.
* enum-base: a colon (:), followed by a type-specifier-seq that names an integral type (see the C++ standard for the full list of all possible integral types).
* enumerator-list: a comma-separated list of enumerator definitions, each of which is either simply an identifier, which becomes the name of the enumerator, or an identifier with an initializer: identifier = integral value.
Note that though C++ allows a constexpr as an initializer value, it makes the documentation less readable, hence it is not permitted.
###class member
`type member-access attributes(optional) default-value(optional);`
* type: any valid C++ type, following the C++ notation. Note that there must be a serializer for the type, but declaration order is not mandatory
* member-access: the way the member can be accessed. If the member is public, it can be the name itself; if not, it can be a getter function, which should be followed by parentheses. Note that getters can (and probably should) be const methods.
* attributes: attributes are defined with square brackets. Currently they are used to mark the version in which a specific member was added: [ [ version version-number] ] marks that the specific member was added in the given version number.
###template
`template < parameter-list > class-declaration`
* parameter-list - a non-empty comma-separated list of the template parameters.
* class-declaration - (see the class section) the declared class name becomes a template name.
##IDL example
Double-slash comments are ignored until the end of the line.
```
namespace utils {
// An example of a stub class
class UUID stub {
int64_t most_sig_bits;
int64_t least_sig_bits;
}
}
namespace gms {
//an enum example
enum class application_state:int {STATUS = 0,
LOAD,
SCHEMA,
DC};
// example of final class
class versioned_value final {
// public members instead of getters and setters
int version;
sstring value;
}
class heart_beat_state {
//getter as function
int32_t get_generation();
//default value example
int32_t get_heart_beat_version() = 1;
}
class endpoint_state {
heart_beat_state get_heart_beat_state();
std::map<application_state, versioned_value> get_application_state_map();
}
class gossip_digest {
inet_address get_endpoint();
int32_t get_generation();
//mark that a field was added on a specific version
int32_t get_max_version() [ [version 0.14.2] ];
}
class gossip_digest_ack {
std::vector<gossip_digest> digests();
std::map<inet_address, gms::endpoint_state> get_endpoint_state_map();
}
}
```


@@ -15,7 +15,7 @@ git submodule update --recursive
* Installing required packages:
```
sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel
sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel numactl-devel hwloc-devel libpciaccess-devel libxml2-devel python3-pyparsing
```
* Build Scylla


@@ -1,6 +1,6 @@
#!/bin/sh
VERSION=666.development
VERSION=0.18.2
if test -f version
then


@@ -106,7 +106,7 @@
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"string"
"paramType":"query"
}
]
}


@@ -196,6 +196,10 @@
"value": {
"type": "string",
"description": "The version value"
},
"version": {
"type": "int",
"description": "The application state version"
}
}
}


@@ -234,12 +234,12 @@
"type":"string",
"enum":[
"CLIENT_ID",
"ECHO",
"MUTATION",
"MUTATION_DONE",
"READ_DATA",
"READ_MUTATION_DATA",
"READ_DIGEST",
"GOSSIP_ECHO",
"GOSSIP_DIGEST_SYN",
"GOSSIP_DIGEST_ACK2",
"GOSSIP_SHUTDOWN",
@@ -247,13 +247,13 @@
"TRUNCATE",
"REPLICATION_FINISHED",
"MIGRATION_REQUEST",
"STREAM_INIT_MESSAGE",
"PREPARE_MESSAGE",
"PREPARE_DONE_MESSAGE",
"STREAM_MUTATION",
"STREAM_MUTATION_DONE",
"COMPLETE_MESSAGE",
"LAST"
"REPAIR_CHECKSUM_RANGE",
"GET_SCHEMA_VERSION"
]
}
}


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright 2015 ScyllaDB
*/
/*
@@ -52,67 +52,98 @@ static std::unique_ptr<reply> exception_reply(std::exception_ptr eptr) {
return std::make_unique<reply>();
}
future<> set_server(http_context& ctx) {
future<> set_server_init(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx](routes& r) {
r.register_exeption_handler(exception_reply);
httpd::directory_handler* dir = new httpd::directory_handler(ctx.api_dir,
new content_replace("html"));
r.put(GET, "/ui", new httpd::file_handler(ctx.api_dir + "/index.html",
new content_replace("html")));
r.add(GET, url("/ui").remainder("path"), dir);
rb->set_api_doc(r);
rb->register_function(r, "storage_service",
"The storage service API");
set_storage_service(ctx,r);
rb->register_function(r, "commitlog",
"The commit log API");
set_commitlog(ctx,r);
rb->register_function(r, "gossiper",
"The gossiper API");
set_gossiper(ctx,r);
rb->register_function(r, "column_family",
"The column family API");
set_column_family(ctx, r);
rb->register_function(r, "lsa", "Log-structured allocator API");
set_lsa(ctx, r);
rb->register_function(r, "failure_detector",
"The failure detector API");
set_failure_detector(ctx,r);
rb->register_function(r, "messaging_service",
"The messaging service API");
set_messaging_service(ctx, r);
rb->register_function(r, "storage_proxy",
"The storage proxy API");
set_storage_proxy(ctx, r);
rb->register_function(r, "cache_service",
"The cache service API");
set_cache_service(ctx,r);
rb->register_function(r, "collectd",
"The collectd API");
set_collectd(ctx, r);
rb->register_function(r, "endpoint_snitch_info",
"The endpoint snitch info API");
set_endpoint_snitch(ctx, r);
rb->register_function(r, "compaction_manager",
"The Compaction manager API");
set_compaction_manager(ctx, r);
rb->register_function(r, "hinted_handoff",
"The hinted handoff API");
set_hinted_handoff(ctx, r);
rb->register_function(r, "stream_manager",
"The stream manager API");
set_stream_manager(ctx, r);
r.add(GET, url("/ui").remainder("path"), new httpd::directory_handler(ctx.api_dir,
new content_replace("html")));
rb->register_function(r, "system",
"The system related API");
set_system(ctx, r);
rb->set_api_doc(r);
});
}
static future<> register_api(http_context& ctx, const sstring& api_name,
const sstring api_desc,
std::function<void(http_context& ctx, routes& r)> f) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx, api_name, api_desc, f](routes& r) {
rb->register_function(r, api_name, api_desc);
f(ctx,r);
});
}
future<> set_server_storage_service(http_context& ctx) {
return register_api(ctx, "storage_service", "The storage service API", set_storage_service);
}
future<> set_server_gossip(http_context& ctx) {
return register_api(ctx, "gossiper",
"The gossiper API", set_gossiper);
}
future<> set_server_load_sstable(http_context& ctx) {
return register_api(ctx, "column_family",
"The column family API", set_column_family);
}
future<> set_server_messaging_service(http_context& ctx) {
return register_api(ctx, "messaging_service",
"The messaging service API", set_messaging_service);
}
future<> set_server_storage_proxy(http_context& ctx) {
return register_api(ctx, "storage_proxy",
"The storage proxy API", set_storage_proxy);
}
future<> set_server_stream_manager(http_context& ctx) {
return register_api(ctx, "stream_manager",
"The stream manager API", set_stream_manager);
}
future<> set_server_gossip_settle(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx](routes& r) {
rb->register_function(r, "failure_detector",
"The failure detector API");
set_failure_detector(ctx,r);
rb->register_function(r, "cache_service",
"The cache service API");
set_cache_service(ctx,r);
rb->register_function(r, "endpoint_snitch_info",
"The endpoint snitch info API");
set_endpoint_snitch(ctx, r);
});
}
future<> set_server_done(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx](routes& r) {
rb->register_function(r, "compaction_manager",
"The Compaction manager API");
set_compaction_manager(ctx, r);
rb->register_function(r, "lsa", "Log-structured allocator API");
set_lsa(ctx, r);
rb->register_function(r, "commitlog",
"The commit log API");
set_commitlog(ctx,r);
rb->register_function(r, "hinted_handoff",
"The hinted handoff API");
set_hinted_handoff(ctx, r);
rb->register_function(r, "collectd",
"The collectd API");
set_collectd(ctx, r);
});
}


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright 2015 ScyllaDB
*/
/*
@@ -21,31 +21,17 @@
#pragma once
#include "http/httpd.hh"
#include "json/json_elements.hh"
#include "database.hh"
#include "service/storage_proxy.hh"
#include <boost/lexical_cast.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
#include "api/api-doc/utils.json.hh"
#include "utils/histogram.hh"
#include "http/exception.hh"
#include "api_init.hh"
namespace api {
struct http_context {
sstring api_dir;
sstring api_doc;
httpd::http_server_control http_server;
distributed<database>& db;
distributed<service::storage_proxy>& sp;
http_context(distributed<database>& _db, distributed<service::storage_proxy>&
_sp) : db(_db), sp(_sp) {}
};
future<> set_server(http_context& ctx);
template<class T>
std::vector<sstring> container_to_vec(const T& container) {
std::vector<sstring> res;

51
api/api_init.hh Normal file

@@ -0,0 +1,51 @@
/*
* Copyright 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "database.hh"
#include "service/storage_proxy.hh"
#include "http/httpd.hh"
namespace api {
struct http_context {
sstring api_dir;
sstring api_doc;
httpd::http_server_control http_server;
distributed<database>& db;
distributed<service::storage_proxy>& sp;
http_context(distributed<database>& _db,
distributed<service::storage_proxy>& _sp)
: db(_db), sp(_sp) {
}
};
future<> set_server_init(http_context& ctx);
future<> set_server_storage_service(http_context& ctx);
future<> set_server_gossip(http_context& ctx);
future<> set_server_load_sstable(http_context& ctx);
future<> set_server_messaging_service(http_context& ctx);
future<> set_server_storage_proxy(http_context& ctx);
future<> set_server_stream_manager(http_context& ctx);
future<> set_server_gossip_settle(http_context& ctx);
future<> set_server_done(http_context& ctx);
}


@@ -49,7 +49,7 @@ void set_compaction_manager(http_context& ctx, routes& r) {
s.ks = c->ks;
s.cf = c->cf;
s.unit = "keys";
s.task_type = "compaction";
s.task_type = sstables::compaction_name(c->type);
s.completed = c->total_keys_written;
s.total = c->total_partitions;
summaries.push_back(std::move(s));
@@ -67,11 +67,14 @@ void set_compaction_manager(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
cm::stop_compaction.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME
warn(unimplemented::cause::API);
return make_ready_future<json::json_return_type>("");
cm::stop_compaction.set(r, [&ctx] (std::unique_ptr<request> req) {
auto type = req->get_query_param("type");
return ctx.db.invoke_on_all([type] (database& db) {
auto& cm = db.get_compaction_manager();
cm.stop_compaction(type);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
cm::get_pending_tasks.set(r, [&ctx] (std::unique_ptr<request> req) {


@@ -44,6 +44,7 @@ void set_failure_detector(http_context& ctx, routes& r) {
// method that the state indexes are static but the names can be changed.
version_val.application_state = static_cast<std::underlying_type<gms::application_state>::type>(a.first);
version_val.value = a.second.value;
version_val.version = a.second.version;
val.application_state.push(version_val);
}
res.push_back(val);


@@ -34,7 +34,7 @@ namespace api {
using shard_info = messaging_service::shard_info;
using msg_addr = messaging_service::msg_addr;
static const int32_t num_verb = static_cast<int32_t>(messaging_verb::LAST) + 1;
static const int32_t num_verb = static_cast<int32_t>(messaging_verb::LAST);
std::vector<message_counter> map_to_message_counters(
const std::unordered_map<gms::inet_address, unsigned long>& map) {
@@ -124,7 +124,7 @@ void set_messaging_service(http_context& ctx, routes& r) {
});
get_dropped_messages_by_ver.set(r, [](std::unique_ptr<request> req) {
shared_ptr<std::vector<uint64_t>> map = make_shared<std::vector<uint64_t>>(num_verb, 0);
shared_ptr<std::vector<uint64_t>> map = make_shared<std::vector<uint64_t>>(num_verb);
return net::get_messaging_service().map_reduce([map](const uint64_t* local_map) mutable {
for (auto i = 0; i < num_verb; i++) {
@@ -137,8 +137,12 @@ void set_messaging_service(http_context& ctx, routes& r) {
for (auto i : verb_counter::verb_wrapper::all_items()) {
verb_counter c;
messaging_verb v = i; // for type safety we use messaging_verb values
if ((*map)[static_cast<int32_t>(v)] > 0) {
c.count = (*map)[static_cast<int32_t>(v)];
auto idx = static_cast<uint32_t>(v);
if (idx >= map->size()) {
throw std::runtime_error(sprint("verb index out of bounds: %lu, map size: %lu", idx, map->size()));
}
if ((*map)[idx] > 0) {
c.count = (*map)[idx];
c.verb = i;
res.push_back(c);
}


@@ -30,6 +30,7 @@
#include "repair/repair.hh"
#include "locator/snitch_base.hh"
#include "column_family.hh"
#include "log.hh"
namespace api {
@@ -271,15 +272,21 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::force_keyspace_cleanup.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
// FIXME
// the nodetool cleanup is used in many tests
// this workaround will let it work until
// a cleanup is implemented
warn(unimplemented::cause::API);
auto keyspace = validate_keyspace(ctx, req->param);
auto column_family = req->get_query_param("cf");
return make_ready_future<json::json_return_type>(0);
auto column_families = split_cf(req->get_query_param("cf"));
if (column_families.empty()) {
column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
std::vector<column_family*> column_families_vec;
auto& cm = db.get_compaction_manager();
for (auto entry : column_families) {
column_family* cf = &db.find_column_family(keyspace, entry);
cm.submit_cleanup_job(cf);
}
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
});
ss::scrub.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -398,9 +405,13 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_logging_levels.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
std::vector<ss::mapper> res;
for (auto i : logging::logger_registry().get_all_logger_names()) {
ss::mapper log;
log.key = i;
log.value = logging::level_name(logging::logger_registry().get_logger_level(i));
res.push_back(log);
}
return make_ready_future<json::json_return_type>(res);
});


@@ -47,7 +47,7 @@ static hs::progress_info get_progress_info(const streaming::progress_info& info)
res.direction = info.dir;
res.file_name = info.file_name;
res.peer = boost::lexical_cast<std::string>(info.peer);
res.session_index = info.session_index;
res.session_index = 0;
res.total_bytes = info.total_bytes;
return res;
}
@@ -70,7 +70,7 @@ static hs::stream_state get_state(
for (auto info : result_future.get_coordinator().get()->get_all_session_info()) {
hs::stream_info si;
si.peer = boost::lexical_cast<std::string>(info.peer);
si.session_index = info.session_index;
si.session_index = 0;
si.state = info.state;
si.connecting = si.peer;
set_summaries(info.receiving_summaries, si.receiving_summaries);
@@ -109,14 +109,16 @@ void set_stream_manager(http_context& ctx, routes& r) {
});
hs::get_total_incoming_bytes.set(r, [](std::unique_ptr<request> req) {
gms::inet_address ep(req->param["peer"]);
utils::UUID plan_id = gms::get_local_gossiper().get_host_id(ep);
return streaming::get_stream_manager().map_reduce0([plan_id](streaming::stream_manager& stream) {
gms::inet_address peer(req->param["peer"]);
return streaming::get_stream_manager().map_reduce0([peer](streaming::stream_manager& sm) {
int64_t res = 0;
streaming::stream_result_future* s = stream.get_receiving_stream(plan_id).get();
if (s != nullptr) {
for (auto si: s->get_coordinator()->get_all_session_info()) {
res += si.get_total_size_received();
for (auto sr : sm.get_all_streams()) {
if (sr) {
for (auto session : sr->get_coordinator()->get_all_stream_sessions()) {
if (session->peer == peer) {
res += session->get_bytes_received();
}
}
}
}
return res;
@@ -126,12 +128,12 @@ void set_stream_manager(http_context& ctx, routes& r) {
});
hs::get_all_total_incoming_bytes.set(r, [](std::unique_ptr<request> req) {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& stream) {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& sm) {
int64_t res = 0;
for (auto s : stream.get_receiving_streams()) {
if (s.second.get() != nullptr) {
for (auto si: s.second.get()->get_coordinator()->get_all_session_info()) {
res += si.get_total_size_received();
for (auto sr : sm.get_all_streams()) {
if (sr) {
for (auto session : sr->get_coordinator()->get_all_stream_sessions()) {
res += session->get_bytes_received();
}
}
}
@@ -142,14 +144,16 @@ void set_stream_manager(http_context& ctx, routes& r) {
});
hs::get_total_outgoing_bytes.set(r, [](std::unique_ptr<request> req) {
gms::inet_address ep(req->param["peer"]);
utils::UUID plan_id = gms::get_local_gossiper().get_host_id(ep);
return streaming::get_stream_manager().map_reduce0([plan_id](streaming::stream_manager& stream) {
gms::inet_address peer(req->param["peer"]);
return streaming::get_stream_manager().map_reduce0([peer](streaming::stream_manager& sm) {
int64_t res = 0;
streaming::stream_result_future* s = stream.get_sending_stream(plan_id).get();
if (s != nullptr) {
for (auto si: s->get_coordinator()->get_all_session_info()) {
res += si.get_total_size_received();
for (auto sr : sm.get_all_streams()) {
if (sr) {
for (auto session : sr->get_coordinator()->get_all_stream_sessions()) {
if (session->peer == peer) {
res += session->get_bytes_sent();
}
}
}
}
return res;
@@ -159,12 +163,12 @@ void set_stream_manager(http_context& ctx, routes& r) {
});
hs::get_all_total_outgoing_bytes.set(r, [](std::unique_ptr<request> req) {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& stream) {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& sm) {
int64_t res = 0;
for (auto s : stream.get_initiated_streams()) {
if (s.second.get() != nullptr) {
for (auto si: s.second.get()->get_coordinator()->get_all_session_info()) {
res += si.get_total_size_received();
for (auto sr : sm.get_all_streams()) {
if (sr) {
for (auto session : sr->get_coordinator()->get_all_stream_sessions()) {
res += session->get_bytes_sent();
}
}
}


@@ -91,6 +91,62 @@ class auth_migration_listener : public service::migration_listener {
static auth_migration_listener auth_migration;
/**
* Poor man's job schedule. For a maximum of 2 jobs. Sic.
* Still does nothing more clever than waiting 10 seconds
* like origin, then runs the submitted tasks.
*
* Only difference compared to sleep (from which this
* borrows _heavily_) is that if tasks have not run by the time
* we exit (and do static clean up) we delete the promise + cont
*
* Should be abstracted to some sort of global server function
* probably.
*/
void auth::auth::schedule_when_up(scheduled_func f) {
struct waiter {
promise<> done;
timer<> tmr;
waiter() : tmr([this] {done.set_value();})
{
tmr.arm(SUPERUSER_SETUP_DELAY);
}
~waiter() {
if (tmr.armed()) {
tmr.cancel();
done.set_exception(std::runtime_error("shutting down"));
}
logger.trace("Deleting scheduled task");
}
void kill() {
}
};
typedef std::unique_ptr<waiter> waiter_ptr;
static thread_local std::vector<waiter_ptr> waiters;
logger.trace("Adding scheduled task");
waiters.emplace_back(std::make_unique<waiter>());
auto* w = waiters.back().get();
w->done.get_future().finally([w] {
auto i = std::find_if(waiters.begin(), waiters.end(), [w](const waiter_ptr& p) {
return p.get() == w;
});
if (i != waiters.end()) {
waiters.erase(i);
}
}).then([f = std::move(f)] {
logger.trace("Running scheduled task");
return f();
}).handle_exception([](auto ep) {
return make_ready_future();
});
}
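The waiter mechanism above can be summarized outside seastar: each scheduled task owns an entry in a registry, the entry is dropped once the task has run, and whatever remains at shutdown is simply abandoned. A hedged sketch, with seastar's `timer<>` and `promise<>` replaced by a plain callback registry and a manual `fire_all()` standing in for the timer firing (all names here are illustrative, not Scylla's):

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Illustrative stand-in for auth::schedule_when_up(): record a task in a
// registry; when the "timer" fires, run every pending task and drop its
// registry entry so the vector does not grow without bound.
struct waiter {
    std::function<void()> task;
};

std::vector<std::unique_ptr<waiter>> waiters;

void schedule(std::function<void()> f) {
    auto w = std::make_unique<waiter>();
    w->task = std::move(f);
    waiters.push_back(std::move(w));
}

// Stand-in for the timer callback firing after SUPERUSER_SETUP_DELAY.
void fire_all() {
    auto pending = std::move(waiters);
    waiters.clear();
    for (auto& w : pending) {
        w->task();
    }
}
```

The self-removal is the point of the patch: unlike a bare `sleep().then(...)`, tasks that never ran can still be cleaned up at static-destruction time instead of leaking the promise and continuation.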
bool auth::auth::is_class_type(const sstring& type, const sstring& classname) {
if (type == classname) {
return true;
@@ -128,7 +184,7 @@ future<> auth::auth::setup() {
}).then([] {
service::get_local_migration_manager().register_listener(&auth_migration); // again, only one shard...
// instead of once-timer, just schedule this later
sleep(SUPERUSER_SETUP_DELAY).then([] {
schedule_when_up([] {
// setup default super user
return has_existing_users(USERS_CF, DEFAULT_SUPERUSER_NAME, USER_NAME).then([](bool exists) {
if (!exists) {


@@ -112,5 +112,9 @@ public:
static future<> setup_table(const sstring& name, const sstring& cql);
static future<bool> has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column_name);
// For internal use. Run function "when system is up".
typedef std::function<future<>()> scheduled_func;
static void schedule_when_up(scheduled_func);
};
}


@@ -160,8 +160,8 @@ future<> auth::password_authenticator::init() {
return auth::setup_table(CREDENTIALS_CF, create_table).then([this] {
// instead of once-timer, just schedule this later
sleep(auth::SUPERUSER_SETUP_DELAY).then([] {
auth::has_existing_users(CREDENTIALS_CF, DEFAULT_USER_NAME, USER_NAME).then([](bool exists) {
auth::schedule_when_up([] {
return auth::has_existing_users(CREDENTIALS_CF, DEFAULT_USER_NAME, USER_NAME).then([](bool exists) {
if (!exists) {
cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
auth::AUTH_KS,


@@ -42,6 +42,14 @@ private:
struct chunk {
// FIXME: group fragment pointers to reduce pointer chasing when packetizing
std::unique_ptr<chunk> next;
~chunk() {
auto p = std::move(next);
while (p) {
// Avoid recursion when freeing chunks
auto p_next = std::move(p->next);
p = std::move(p_next);
}
}
size_type offset; // Also means "size" after chunk is closed
size_type size;
value_type data[0];
@@ -206,6 +214,10 @@ public:
}
}
void write(const char* ptr, size_t size) {
write(bytes_view(reinterpret_cast<const signed char*>(ptr), size));
}
// Writes given sequence of bytes with a preceding length component encoded in big-endian format
inline void write_blob(bytes_view v) {
assert((size_type)v.size() == v.size());
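The chunk destructor added above avoids recursing once per link when a long chain of chunks is freed. The same pattern, reduced to a minimal sketch (this is not Scylla's actual chunk type): detach the tail in the destructor and free it one link at a time in a loop.

```cpp
#include <cassert>
#include <memory>

// A singly-linked node whose destructor frees successors iteratively,
// so destroying a very long chain does not overflow the stack.
struct node {
    std::unique_ptr<node> next;
    ~node() {
        auto p = std::move(next);           // detach the tail from this node
        while (p) {
            auto rest = std::move(p->next); // keep the remainder alive
            p = std::move(rest);            // the detached node is freed here
        }
    }
};

// Builds a chain of n nodes and returns its head.
std::unique_ptr<node> make_chain(int n) {
    std::unique_ptr<node> head;
    for (int i = 0; i < n; ++i) {
        auto fresh = std::make_unique<node>();
        fresh->next = std::move(head);
        head = std::move(fresh);
    }
    return head;
}
```

With the naive compiler-generated destructor, `~node()` would call `~unique_ptr<node>` which calls `~node()` again, nesting once per link; the loop flattens that into constant stack depth.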


@@ -53,6 +53,11 @@ canonical_mutation::canonical_mutation(const mutation& m)
}())
{ }
utils::UUID canonical_mutation::column_family_id() const {
data_input in(_data);
return db::serializer<utils::UUID>::read(in);
}
mutation canonical_mutation::to_mutation(schema_ptr s) const {
data_input in(_data);


@@ -49,16 +49,10 @@ public:
// is not intended, user should sync the schema first.
mutation to_mutation(schema_ptr) const;
utils::UUID column_family_id() const;
friend class db::serializer<canonical_mutation>;
};
//
//template<>
//struct hash<canonical_mutation> {
// template<typename Hasher>
// void operator()(Hasher& h, const canonical_mutation& m) const {
// m.feed_hash(h);
// }
//};
namespace db {


@@ -34,6 +34,8 @@ enum class compaction_strategy_type {
};
class compaction_strategy_impl;
class sstable;
struct compaction_descriptor;
class compaction_strategy {
::shared_ptr<compaction_strategy_impl> _compaction_strategy_impl;
@@ -46,7 +48,9 @@ public:
compaction_strategy(compaction_strategy&&);
compaction_strategy& operator=(compaction_strategy&&);
future<> compact(column_family& cfs);
// Return a list of sstables to be compacted after applying the strategy.
compaction_descriptor get_sstables_for_compaction(column_family& cfs, std::vector<lw_shared_ptr<sstable>> candidates);
static sstring name(compaction_strategy_type type) {
switch (type) {
case compaction_strategy_type::null:


@@ -26,29 +26,10 @@
#include <algorithm>
#include <vector>
#include <boost/range/iterator_range.hpp>
#include <boost/range/adaptor/transformed.hpp>
#include "utils/serialization.hh"
#include "unimplemented.hh"
// value_traits is meant to abstract away whether we are working on 'bytes'
// elements or 'bytes_opt' elements. We don't support optional values, but
// there are some generic layers which use this code which provide us with
// data in that format. In order to avoid allocation and rewriting that data
// into a new vector just to throw it away soon after that, we accept that
// format too.
template <typename T>
struct value_traits {
static const T& unwrap(const T& t) { return t; }
};
template<>
struct value_traits<bytes_opt> {
static const bytes& unwrap(const bytes_opt& t) {
assert(t);
return *t;
}
};
enum class allow_prefixes { no, yes };
template<allow_prefixes AllowPrefixes = allow_prefixes::no>
@@ -68,7 +49,7 @@ public:
, _byte_order_equal(std::all_of(_types.begin(), _types.end(), [] (auto t) {
return t->is_byte_order_equal();
}))
, _byte_order_comparable(!is_prefixable && _types.size() == 1 && _types[0]->is_byte_order_comparable())
, _byte_order_comparable(false)
, _is_reversed(_types.size() == 1 && _types[0]->is_reversed())
{ }
@@ -88,76 +69,47 @@ public:
/*
* Format:
* <len(value1)><value1><len(value2)><value2>...<len(value_n-1)><value_n-1>(len(value_n))?<value_n>
* <len(value1)><value1><len(value2)><value2>...<len(value_n)><value_n>
*
* For non-prefixable compounds, the value corresponding to the last component of types doesn't
* have its length encoded, its length is deduced from the input range.
*
* serialize_value() and serialize_optionals() for single element rely on the fact that for a single-element
* compounds their serialized form is equal to the serialized form of the component.
*/
template<typename Wrapped>
void serialize_value(const std::vector<Wrapped>& values, bytes::iterator& out) {
if (AllowPrefixes == allow_prefixes::yes) {
assert(values.size() <= _types.size());
} else {
assert(values.size() == _types.size());
}
size_t n_left = _types.size();
for (auto&& wrapped : values) {
auto&& val = value_traits<Wrapped>::unwrap(wrapped);
template<typename RangeOfSerializedComponents>
static void serialize_value(RangeOfSerializedComponents&& values, bytes::iterator& out) {
for (auto&& val : values) {
assert(val.size() <= std::numeric_limits<uint16_t>::max());
if (--n_left || AllowPrefixes == allow_prefixes::yes) {
write<uint16_t>(out, uint16_t(val.size()));
}
write<uint16_t>(out, uint16_t(val.size()));
out = std::copy(val.begin(), val.end(), out);
}
}
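The simplified format described in the comment above — `<len(value1)><value1>...<len(value_n)><value_n>`, with every component length-prefixed, including the last — can be sketched in a few lines. This is a minimal illustration, not the real `compound_type`; it assumes big-endian `uint16_t` lengths as in the format line:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Encode each component as a 2-byte big-endian length followed by its
// bytes. No component is special-cased, matching the post-patch format.
std::string serialize_components(const std::vector<std::string>& values) {
    std::string out;
    for (const auto& v : values) {
        assert(v.size() <= UINT16_MAX);
        out.push_back(static_cast<char>((v.size() >> 8) & 0xff)); // length, high byte
        out.push_back(static_cast<char>(v.size() & 0xff));        // length, low byte
        out.append(v);
    }
    return out;
}
```

Dropping the "last component has no length" special case is what lets `serialize_value`, `serialized_size`, and the iterator become static and type-agnostic in this patch.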
template <typename Wrapped>
size_t serialized_size(const std::vector<Wrapped>& values) {
template <typename RangeOfSerializedComponents>
static size_t serialized_size(RangeOfSerializedComponents&& values) {
size_t len = 0;
size_t n_left = _types.size();
for (auto&& wrapped : values) {
auto&& val = value_traits<Wrapped>::unwrap(wrapped);
for (auto&& val : values) {
assert(val.size() <= std::numeric_limits<uint16_t>::max());
if (--n_left || AllowPrefixes == allow_prefixes::yes) {
len += sizeof(uint16_t);
}
len += val.size();
len += sizeof(uint16_t) + val.size();
}
return len;
}
bytes serialize_single(bytes&& v) {
if (AllowPrefixes == allow_prefixes::no) {
assert(_types.size() == 1);
return std::move(v);
} else {
// FIXME: Optimize
std::vector<bytes> vec;
vec.reserve(1);
vec.emplace_back(std::move(v));
return ::serialize_value(*this, vec);
}
return serialize_value({std::move(v)});
}
bytes serialize_value(const std::vector<bytes>& values) {
return ::serialize_value(*this, values);
template<typename RangeOfSerializedComponents>
static bytes serialize_value(RangeOfSerializedComponents&& values) {
bytes b(bytes::initialized_later(), serialized_size(values));
auto i = b.begin();
serialize_value(values, i);
return b;
}
bytes serialize_value(std::vector<bytes>&& values) {
if (AllowPrefixes == allow_prefixes::no && _types.size() == 1 && values.size() == 1) {
return std::move(values[0]);
}
return ::serialize_value(*this, values);
template<typename T>
static bytes serialize_value(std::initializer_list<T> values) {
return serialize_value(boost::make_iterator_range(values.begin(), values.end()));
}
bytes serialize_optionals(const std::vector<bytes_opt>& values) {
return ::serialize_value(*this, values);
}
bytes serialize_optionals(std::vector<bytes_opt>&& values) {
if (AllowPrefixes == allow_prefixes::no && _types.size() == 1 && values.size() == 1) {
assert(values[0]);
return std::move(*values[0]);
}
return ::serialize_value(*this, values);
return serialize_value(values | boost::adaptors::transformed([] (const bytes_opt& bo) -> bytes_view {
if (!bo) {
throw std::logic_error("attempted to create key component from empty optional");
}
return *bo;
}));
}
bytes serialize_value_deep(const std::vector<data_value>& values) {
// TODO: Optimize
@@ -171,35 +123,19 @@ public:
return serialize_value(partial);
}
bytes decompose_value(const value_type& values) {
return ::serialize_value(*this, values);
return serialize_value(values);
}
class iterator : public std::iterator<std::input_iterator_tag, bytes_view> {
private:
ssize_t _types_left;
bytes_view _v;
value_type _current;
private:
void read_current() {
if (_types_left == 0) {
if (!_v.empty()) {
throw marshal_exception();
}
_v = bytes_view(nullptr, 0);
return;
}
--_types_left;
uint16_t len;
if (_types_left == 0 && AllowPrefixes == allow_prefixes::no) {
len = _v.size();
} else {
{
if (_v.empty()) {
if (AllowPrefixes == allow_prefixes::yes) {
_types_left = 0;
_v = bytes_view(nullptr, 0);
return;
} else {
throw marshal_exception();
}
_v = bytes_view(nullptr, 0);
return;
}
len = read_simple<uint16_t>(_v);
if (_v.size() < len) {
@@ -211,10 +147,10 @@ public:
}
public:
struct end_iterator_tag {};
iterator(const compound_type& t, const bytes_view& v) : _types_left(t._types.size()), _v(v) {
iterator(const bytes_view& v) : _v(v) {
read_current();
}
iterator(end_iterator_tag, const bytes_view& v) : _types_left(0), _v(nullptr, 0) {}
iterator(end_iterator_tag, const bytes_view& v) : _v(nullptr, 0) {}
iterator& operator++() {
read_current();
return *this;
@@ -226,21 +162,18 @@ public:
}
const value_type& operator*() const { return _current; }
const value_type* operator->() const { return &_current; }
bool operator!=(const iterator& i) const { return _v.begin() != i._v.begin() || _types_left != i._types_left; }
bool operator==(const iterator& i) const { return _v.begin() == i._v.begin() && _types_left == i._types_left; }
bool operator!=(const iterator& i) const { return _v.begin() != i._v.begin(); }
bool operator==(const iterator& i) const { return _v.begin() == i._v.begin(); }
};
iterator begin(const bytes_view& v) const {
return iterator(*this, v);
static iterator begin(const bytes_view& v) {
return iterator(v);
}
iterator end(const bytes_view& v) const {
static iterator end(const bytes_view& v) {
return iterator(typename iterator::end_iterator_tag(), v);
}
boost::iterator_range<iterator> components(const bytes_view& v) const {
static boost::iterator_range<iterator> components(const bytes_view& v) {
return { begin(v), end(v) };
}
auto iter_items(const bytes_view& v) {
return boost::iterator_range<iterator>(begin(v), end(v));
}
value_type deserialize_value(bytes_view v) {
std::vector<bytes> result;
result.reserve(_types.size());
@@ -258,7 +191,7 @@ public:
}
auto t = _types.begin();
size_t h = 0;
for (auto&& value : iter_items(v)) {
for (auto&& value : components(v)) {
h ^= (*t)->hash(value);
++t;
}
@@ -277,12 +210,6 @@ public:
return type->compare(v1, v2);
});
}
bytes from_string(sstring_view s) {
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
sstring to_string(const bytes& b) {
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
// Returns true iff the given prefix has no missing components
bool is_full(bytes_view v) const {
assert(AllowPrefixes == allow_prefixes::yes);


@@ -25,6 +25,31 @@ from distutils.spawn import find_executable
configure_args = str.join(' ', [shlex.quote(x) for x in sys.argv[1:]])
for line in open('/etc/os-release'):
key, _, value = line.partition('=')
value = value.strip().strip('"')
if key == 'ID':
os_ids = [value]
if key == 'ID_LIKE':
os_ids += value.split(' ')
# distribution "internationalization", converting package names.
# Fedora name is the key; the value is a distro -> package name dict.
i18n_xlat = {
'boost-devel': {
'debian': 'libboost-dev',
'ubuntu': 'libboost-dev (libboost1.55-dev on 14.04)',
},
}
def pkgname(name):
if name in i18n_xlat:
dict = i18n_xlat[name]
for id in os_ids:
if id in dict:
return dict[id]
return name
def get_flags():
with open('/proc/cpuinfo') as f:
for line in f:
@@ -269,6 +294,7 @@ scylla_core = (['database.cc',
'sstables/partition.cc',
'sstables/filter.cc',
'sstables/compaction.cc',
'sstables/compaction_manager.cc',
'log.cc',
'transport/event.cc',
'transport/event_notifier.cc',
@@ -316,6 +342,7 @@ scylla_core = (['database.cc',
'utils/big_decimal.cc',
'types.cc',
'validation.cc',
'service/priority_manager.cc',
'service/migration_manager.cc',
'service/storage_proxy.cc',
'cql3/operator.cc',
@@ -353,7 +380,6 @@ scylla_core = (['database.cc',
'utils/bloom_filter.cc',
'utils/bloom_calculations.cc',
'utils/rate_limiter.cc',
'utils/compaction_manager.cc',
'utils/file_lock.cc',
'utils/dynamic_bitset.cc',
'gms/version_generator.cc',
@@ -390,11 +416,9 @@ scylla_core = (['database.cc',
'service/client_state.cc',
'service/migration_task.cc',
'service/storage_service.cc',
'service/pending_range_calculator_service.cc',
'service/load_broadcaster.cc',
'service/pager/paging_state.cc',
'service/pager/query_pagers.cc',
'streaming/streaming.cc',
'streaming/stream_task.cc',
'streaming/stream_session.cc',
'streaming/stream_request.cc',
@@ -407,13 +431,6 @@ scylla_core = (['database.cc',
'streaming/stream_coordinator.cc',
'streaming/stream_manager.cc',
'streaming/stream_result_future.cc',
'streaming/messages/stream_init_message.cc',
'streaming/messages/retry_message.cc',
'streaming/messages/received_message.cc',
'streaming/messages/prepare_message.cc',
'streaming/messages/file_message_header.cc',
'streaming/messages/outgoing_file_message.cc',
'streaming/messages/incoming_file_message.cc',
'streaming/stream_session_state.cc',
'gc_clock.cc',
'partition_slice_builder.cc',
@@ -466,7 +483,23 @@ api = ['api/api.cc',
'api/system.cc'
]
scylla_tests_dependencies = scylla_core + [
idls = ['idl/gossip_digest.idl.hh',
'idl/uuid.idl.hh',
'idl/range.idl.hh',
'idl/keys.idl.hh',
'idl/read_command.idl.hh',
'idl/token.idl.hh',
'idl/ring_position.idl.hh',
'idl/result.idl.hh',
'idl/frozen_mutation.idl.hh',
'idl/reconcilable_result.idl.hh',
'idl/streaming.idl.hh',
'idl/paging_state.idl.hh',
'idl/frozen_schema.idl.hh',
'idl/partition_checksum.idl.hh',
]
scylla_tests_dependencies = scylla_core + api + idls + [
'tests/cql_test_env.cc',
'tests/cql_assertions.cc',
'tests/result_set_assertions.cc',
@@ -479,11 +512,10 @@ scylla_tests_seastar_deps = [
]
deps = {
'scylla': ['main.cc'] + scylla_core + api,
'scylla': idls + ['main.cc'] + scylla_core + api,
}
tests_not_using_seastar_test_framework = set([
'tests/types_test',
'tests/keys_test',
'tests/partitioner_test',
'tests/map_difference_test',
@@ -550,16 +582,44 @@ else:
args.pie = ''
args.fpie = ''
optional_packages = ['libsystemd']
# a list element means a list of alternative packages to consider
# the first element becomes the HAVE_pkg define
# a string element is a package name with no alternatives
optional_packages = [['libsystemd', 'libsystemd-daemon']]
pkgs = []
for pkg in optional_packages:
if have_pkg(pkg):
pkgs.append(pkg)
upkg = pkg.upper().replace('-', '_')
defines.append('HAVE_{}=1'.format(upkg))
else:
print('Missing optional package {pkg}'.format(**locals()))
def setup_first_pkg_of_list(pkglist):
# The HAVE_pkg symbol is taken from the first alternative
upkg = pkglist[0].upper().replace('-', '_')
for pkg in pkglist:
if have_pkg(pkg):
pkgs.append(pkg)
defines.append('HAVE_{}=1'.format(upkg))
return True
return False
for pkglist in optional_packages:
if isinstance(pkglist, str):
pkglist = [pkglist]
if not setup_first_pkg_of_list(pkglist):
if len(pkglist) == 1:
print('Missing optional package {pkglist[0]}'.format(**locals()))
else:
alternatives = ':'.join(pkglist[1:])
print('Missing optional package {pkglist[0]} (or alternatives {alternatives})'.format(**locals()))
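The alternatives logic above probes each package in order, takes the first one present, and derives the `HAVE_` define from the *first* alternative regardless of which one actually matched. Restated as a small C++ sketch with a hypothetical `have_pkg` probe supplied by the caller (the names here are illustrative, not configure.py's API):

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <functional>
#include <string>
#include <vector>

struct probe_result {
    bool found = false;
    std::string pkg;     // the alternative that was actually found
    std::string define;  // e.g. HAVE_LIBSYSTEMD=1
};

// Probe each alternative in order; the HAVE_ symbol is always named
// after the first alternative, as in configure.py.
probe_result setup_first_pkg_of_list(
        const std::vector<std::string>& pkglist,
        const std::function<bool(const std::string&)>& have_pkg) {
    std::string upkg = pkglist.front();
    std::transform(upkg.begin(), upkg.end(), upkg.begin(),
                   [](unsigned char c) { return std::toupper(c); });
    std::replace(upkg.begin(), upkg.end(), '-', '_');
    for (const auto& pkg : pkglist) {
        if (have_pkg(pkg)) {
            return {true, pkg, "HAVE_" + upkg + "=1"};
        }
    }
    return {};
}
```

So a system that only ships `libsystemd-daemon` still gets `HAVE_LIBSYSTEMD=1`, which is exactly why the first list element doubles as the define name.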
if not try_compile(compiler=args.cxx, source='#include <boost/version.hpp>'):
print('Boost not installed. Please install {}.'.format(pkgname("boost-devel")))
sys.exit(1)
if not try_compile(compiler=args.cxx, source='''\
#include <boost/version.hpp>
#if BOOST_VERSION < 105500
#error Boost version too low
#endif
'''):
print('Installed boost version too old. Please update {}.'.format(pkgname("boost-devel")))
sys.exit(1)
defines = ' '.join(['-D' + d for d in defines])
@@ -657,6 +717,9 @@ with open(buildfile, 'w') as f:
rule swagger
command = seastar/json/json2code.py -f $in -o $out
description = SWAGGER $out
rule serializer
command = ./idl-compiler.py --ns ser -f $in -o $out
description = IDL compiler $out
rule ninja
command = {ninja} -C $subdir $target
restat = 1
@@ -693,6 +756,7 @@ with open(buildfile, 'w') as f:
compiles = {}
ragels = {}
swaggers = {}
serializers = {}
thrifts = set()
antlr3_grammars = set()
for binary in build_artifacts:
@@ -746,6 +810,9 @@ with open(buildfile, 'w') as f:
elif src.endswith('.rl'):
hh = '$builddir/' + mode + '/gen/' + src.replace('.rl', '.hh')
ragels[hh] = src
elif src.endswith('.idl.hh'):
hh = '$builddir/' + mode + '/gen/' + src.replace('.idl.hh', '.dist.hh')
serializers[hh] = src
elif src.endswith('.json'):
hh = '$builddir/' + mode + '/gen/' + src + '.hh'
swaggers[hh] = src
@@ -764,6 +831,7 @@ with open(buildfile, 'w') as f:
for g in antlr3_grammars:
gen_headers += g.headers('$builddir/{}/gen'.format(mode))
gen_headers += list(swaggers.keys())
gen_headers += list(serializers.keys())
f.write('build {}: cxx.{} {} || {} \n'.format(obj, mode, src, ' '.join(gen_headers)))
if src in extra_cxxflags:
f.write(' cxxflags = {seastar_cflags} $cxxflags $cxxflags_{mode} {extra_cxxflags}\n'.format(mode = mode, extra_cxxflags = extra_cxxflags[src], **modeval))
@@ -773,6 +841,9 @@ with open(buildfile, 'w') as f:
for hh in swaggers:
src = swaggers[hh]
f.write('build {}: swagger {}\n'.format(hh,src))
for hh in serializers:
src = serializers[hh]
f.write('build {}: serializer {} | idl-compiler.py\n'.format(hh,src))
for thrift in thrifts:
outs = ' '.join(thrift.generated('$builddir/{}/gen'.format(mode)))
f.write('build {}: thrift.{} {}\n'.format(outs, mode, thrift.source))


@@ -259,7 +259,10 @@ lists::setter_by_index::execute(mutation& m, const exploded_clustering_prefix& p
// we should not get here for frozen lists
assert(column.type->is_multi_cell()); // "Attempted to set an individual element on a frozen list";
auto row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
std::experimental::optional<clustering_key> row_key;
if (!column.is_static()) {
row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
}
auto index = _idx->bind_and_get(params._options);
auto value = _t->bind_and_get(params._options);
@@ -269,8 +272,7 @@ lists::setter_by_index::execute(mutation& m, const exploded_clustering_prefix& p
}
auto idx = net::ntoh(int32_t(*unaligned_cast<int32_t>(index->begin())));
auto existing_list_opt = params.get_prefetched_list(m.key(), row_key, column);
auto&& existing_list_opt = params.get_prefetched_list(m.key(), std::move(row_key), column);
if (!existing_list_opt) {
throw exceptions::invalid_request_exception("Attempted to set an element on a list which is null");
}
@@ -383,8 +385,13 @@ lists::discarder::requires_read() {
void
lists::discarder::execute(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) {
assert(column.type->is_multi_cell()); // "Attempted to delete from a frozen list";
auto&& row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
auto&& existing_list = params.get_prefetched_list(m.key(), row_key, column);
std::experimental::optional<clustering_key> row_key;
if (!column.is_static()) {
row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
}
auto&& existing_list = params.get_prefetched_list(m.key(), std::move(row_key), column);
// We want to call bind before possibly returning to reject queries where the value provided is not a list.
auto&& value = _t->bind(params._options);
@@ -444,8 +451,11 @@ lists::discarder_by_index::execute(mutation& m, const exploded_clustering_prefix
auto cvalue = dynamic_pointer_cast<constants::value>(index);
assert(cvalue);
auto row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
auto&& existing_list = params.get_prefetched_list(m.key(), row_key, column);
std::experimental::optional<clustering_key> row_key;
if (!column.is_static()) {
row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
}
auto&& existing_list = params.get_prefetched_list(m.key(), std::move(row_key), column);
int32_t idx = read_simple_exactly<int32_t>(*cvalue->_bytes);
if (!existing_list) {
throw exceptions::invalid_request_exception("Attempted to delete an element from a list which is null");
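The setter_by_index hunk above reads the bound list index with `net::ntoh` from the serialized (big-endian) bound value. A minimal sketch of that decode step, with a plain byte array standing in for the bytes view (`read_be_int32` is a hypothetical helper, not part of the Scylla codebase):

```cpp
#include <cassert>
#include <cstdint>

// Decode a big-endian (network-order) int32 from four raw bytes,
// accumulating in uint32_t to avoid signed-shift overflow.
int32_t read_be_int32(const unsigned char* p) {
    uint32_t v = (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16)
               | (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
    return int32_t(v);
}
```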


@@ -258,16 +258,14 @@ sets::adder::do_add(mutation& m, const exploded_clustering_prefix& row_key, cons
auto smut = set_type->serialize_mutation_form(mut);
m.set_cell(row_key, column, std::move(smut));
} else {
} else if (set_value != nullptr) {
// for frozen sets, we're overwriting the whole cell
auto v = set_type->serialize_partially_deserialized_form(
{set_value->_elements.begin(), set_value->_elements.end()},
serialization_format::internal());
if (set_value->_elements.empty()) {
m.set_cell(row_key, column, params.make_dead_cell());
} else {
m.set_cell(row_key, column, params.make_cell(std::move(v)));
}
m.set_cell(row_key, column, params.make_cell(std::move(v)));
} else {
m.set_cell(row_key, column, params.make_dead_cell());
}
}
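The hunk above reworks `sets::adder::do_add()` for frozen sets: a non-null bound value is now always written as a live cell (even when the set is empty), and only a null value tombstones the cell. A sketch of that branch, with a hypothetical `cell_kind` standing in for the real mutation API:

```cpp
#include <vector>

enum class cell_kind { live, dead };

// Mirror of the frozen-set branch: non-null value => live cell,
// null value => dead cell (delete the whole cell).
cell_kind frozen_set_cell(const std::vector<int>* set_value) {
    if (set_value != nullptr) {
        // Serialize and store: an empty frozen set is still a live cell.
        return cell_kind::live;
    }
    return cell_kind::dead;  // null value deletes the whole cell
}
```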


@@ -42,6 +42,7 @@
#include "cql3/statements/alter_table_statement.hh"
#include "service/migration_manager.hh"
#include "validation.hh"
#include "db/config.hh"
namespace cql3 {
@@ -77,9 +78,13 @@ void alter_table_statement::validate(distributed<service::storage_proxy>& proxy,
// validated in announce_migration()
}
static const sstring ALTER_TABLE_FEATURE = "ALTER TABLE";
future<bool> alter_table_statement::announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only)
{
auto& db = proxy.local().get_db().local();
db.get_config().check_experimental(ALTER_TABLE_FEATURE);
auto schema = validation::validate_column_family(db, keyspace(), column_family());
auto cfm = schema_builder(schema);
@@ -132,6 +137,12 @@ future<bool> alter_table_statement::announce_migration(distributed<service::stor
if (schema->is_super()) {
throw exceptions::invalid_request_exception("Cannot use non-frozen collections with super column families");
}
auto it = schema->collections().find(column_name->name());
if (it != schema->collections().end() && !type->is_compatible_with(*it->second)) {
throw exceptions::invalid_request_exception(sprint("Cannot add a collection with the name %s "
"because a collection with the same name and a different type has already been used in the past", column_name));
}
}
cfm.with_column(column_name->name(), type, _is_static ? column_kind::static_column : column_kind::regular_column);


@@ -88,7 +88,7 @@ void batch_statement::verify_batch_size(const std::vector<mutation>& mutations)
auto size = v.size / 1024;
if (v.size > warn_threshold) {
if (size > warn_threshold) {
std::unordered_set<sstring> ks_cf_pairs;
for (auto&& m : mutations) {
ks_cf_pairs.insert(m.schema()->ks_name() + "." + m.schema()->cf_name());


@@ -186,11 +186,23 @@ modification_statement::make_update_parameters(
class prefetch_data_builder {
update_parameters::prefetch_data& _data;
const query::partition_slice& _ps;
schema_ptr _schema;
std::experimental::optional<partition_key> _pkey;
private:
void add_cell(update_parameters::prefetch_data::row& cells, const column_definition& def, const std::experimental::optional<collection_mutation_view>& cell) {
if (cell) {
auto ctype = static_pointer_cast<const collection_type_impl>(def.type);
if (!ctype->is_multi_cell()) {
throw std::logic_error(sprint("cannot prefetch frozen collection: %s", def.name_as_text()));
}
cells.emplace(def.id, collection_mutation{*cell});
}
};
public:
prefetch_data_builder(update_parameters::prefetch_data& data, const query::partition_slice& ps)
prefetch_data_builder(schema_ptr s, update_parameters::prefetch_data& data, const query::partition_slice& ps)
: _data(data)
, _ps(ps)
, _schema(std::move(s))
{ }
void accept_new_partition(const partition_key& key, uint32_t row_count) {
@@ -205,20 +217,9 @@ public:
const query::result_row_view& row) {
update_parameters::prefetch_data::row cells;
auto add_cell = [&cells] (column_id id, std::experimental::optional<collection_mutation_view>&& cell) {
if (cell) {
cells.emplace(id, collection_mutation{to_bytes(cell->data)});
}
};
auto static_row_iterator = static_row.iterator();
for (auto&& id : _ps.static_columns) {
add_cell(id, static_row_iterator.next_collection_cell());
}
auto row_iterator = row.iterator();
for (auto&& id : _ps.regular_columns) {
add_cell(id, row_iterator.next_collection_cell());
add_cell(cells, _schema->regular_column_at(id), row_iterator.next_collection_cell());
}
_data.rows.emplace(std::make_pair(*_pkey, key), std::move(cells));
@@ -228,7 +229,16 @@ public:
assert(0);
}
void accept_partition_end(const query::result_row_view& static_row) {}
void accept_partition_end(const query::result_row_view& static_row) {
update_parameters::prefetch_data::row cells;
auto static_row_iterator = static_row.iterator();
for (auto&& id : _ps.static_columns) {
add_cell(cells, _schema->static_column_at(id), static_row_iterator.next_collection_cell());
}
_data.rows.emplace(std::make_pair(*_pkey, std::experimental::nullopt), std::move(cells));
}
};
future<update_parameters::prefetched_rows_type>
@@ -278,7 +288,7 @@ modification_statement::read_required_rows(
bytes_ostream buf(result->buf());
query::result_view v(buf.linearize());
auto prefetched_rows = update_parameters::prefetched_rows_type({update_parameters::prefetch_data(s)});
v.consume(ps, prefetch_data_builder(prefetched_rows.value(), ps));
v.consume(ps, prefetch_data_builder(s, prefetched_rows.value(), ps));
return prefetched_rows;
});
}


@@ -226,15 +226,16 @@ select_statement::execute(distributed<service::storage_proxy>& proxy, service::q
// An aggregation query will never be paged for the user, but we always page it internally to avoid OOM.
// If the user provided a page_size, we'll use it to page internally (because why not); otherwise we use our default
// Note that if there are some nodes in the cluster with a version less than 2.0, we can't use paging (CASSANDRA-6707).
if (_selection->is_aggregate() && page_size <= 0) {
auto aggregate = _selection->is_aggregate();
if (aggregate && page_size <= 0) {
page_size = DEFAULT_COUNT_PAGE_SIZE;
}
auto key_ranges = _restrictions->get_partition_key_ranges(options);
if (page_size <= 0
if (!aggregate && (page_size <= 0
|| !service::pager::query_pagers::may_need_paging(page_size,
*command, key_ranges)) {
*command, key_ranges))) {
return execute(proxy, command, std::move(key_ranges), state, options,
now);
}
@@ -242,7 +243,7 @@ select_statement::execute(distributed<service::storage_proxy>& proxy, service::q
auto p = service::pager::query_pagers::pager(_schema, _selection,
state, options, command, std::move(key_ranges));
if (_selection->is_aggregate()) {
if (aggregate) {
return do_with(
cql3::selection::result_set_builder(*_selection, now,
options.get_serialization_format()),
@@ -528,9 +529,12 @@ select_statement::raw_statement::get_ordering_comparator(schema_ptr schema,
}
bool select_statement::raw_statement::is_reversed(schema_ptr schema) {
std::experimental::optional<bool> reversed_map[schema->clustering_key_size()];
uint32_t i = 0;
assert(_parameters->orderings().size() > 0);
parameters::orderings_type::size_type i = 0;
bool is_reversed_ = false;
bool relation_order_unsupported = false;
for (auto&& e : _parameters->orderings()) {
::shared_ptr<column_identifier> column = e.first->prepare_column_identifier(schema);
bool reversed = e.second;
@@ -550,32 +554,23 @@ bool select_statement::raw_statement::is_reversed(schema_ptr schema) {
"Order by currently only support the ordering of columns following their declared order in the PRIMARY KEY");
}
reversed_map[i] = std::experimental::make_optional(reversed != def->type->is_reversed());
bool current_reverse_status = (reversed != def->type->is_reversed());
if (i == 0) {
is_reversed_ = current_reverse_status;
}
if (is_reversed_ != current_reverse_status) {
relation_order_unsupported = true;
}
++i;
}
// GCC incorrectly complains about "*is_reversed_" below
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
// Check that all bool in reversedMap, if set, agrees
std::experimental::optional<bool> is_reversed_{};
for (auto&& b : reversed_map) {
if (b) {
if (!is_reversed_) {
is_reversed_ = b;
} else {
if ((*is_reversed_) != *b) {
throw exceptions::invalid_request_exception("Unsupported order by relation");
}
}
}
if (relation_order_unsupported) {
throw exceptions::invalid_request_exception("Unsupported order by relation");
}
assert(is_reversed_);
return *is_reversed_;
#pragma GCC diagnostic pop
return is_reversed_;
}
/** If ALLOW FILTERING was not specified, this verifies that it is not needed */
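The `is_reversed()` rewrite above replaces the per-column `optional<bool>` map (and the GCC pragma workaround) with a single pass that remembers the first column's reversal status and flags any mismatch. A self-contained sketch of that logic, where each ordering is a hypothetical pair (column declared reversed?, user asked DESC?):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Single-pass agreement check: all ORDER BY columns must agree on
// whether the scan runs forward or reversed.
bool is_reversed(const std::vector<std::pair<bool, bool>>& orderings) {
    assert(!orderings.empty());
    bool is_reversed_ = false;
    bool relation_order_unsupported = false;
    std::size_t i = 0;
    for (auto&& e : orderings) {
        // Reversed scan iff the requested direction differs from the
        // column's declared (possibly reversed) native order.
        bool current = (e.second != e.first);
        if (i == 0) {
            is_reversed_ = current;
        } else if (is_reversed_ != current) {
            relation_order_unsupported = true;
        }
        ++i;
    }
    if (relation_order_unsupported) {
        throw std::invalid_argument("Unsupported order by relation");
    }
    return is_reversed_;
}
```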


@@ -59,7 +59,7 @@ bool update_statement::require_full_clustering_key() const {
void update_statement::add_update_for_key(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) {
if (s->is_dense()) {
if (!prefix || (prefix.size() == 1 && prefix.components().front().empty())) {
throw exceptions::invalid_request_exception(sprint("Missing PRIMARY KEY part %s", *s->clustering_key_columns().begin()));
throw exceptions::invalid_request_exception(sprint("Missing PRIMARY KEY part %s", s->clustering_key_columns().begin()->name_as_text()));
}
// An empty name for the compact value is what we use to recognize the case where there is no column


@@ -45,15 +45,15 @@ namespace cql3 {
std::experimental::optional<collection_mutation_view>
update_parameters::get_prefetched_list(
const partition_key& pkey,
const clustering_key& row_key,
partition_key pkey,
std::experimental::optional<clustering_key> ckey,
const column_definition& column) const
{
if (!_prefetched) {
return {};
}
auto i = _prefetched->rows.find(std::make_pair(pkey, row_key));
auto i = _prefetched->rows.find(std::make_pair(std::move(pkey), std::move(ckey)));
if (i == _prefetched->rows.end()) {
return {};
}


@@ -58,8 +58,9 @@ namespace cql3 {
*/
class update_parameters final {
public:
// Holder for data needed by CQL list updates which depend on current state of the list.
struct prefetch_data {
using key = std::pair<partition_key, clustering_key>;
using key = std::pair<partition_key, std::experimental::optional<clustering_key>>;
struct key_hashing {
partition_key::hashing pk_hash;
clustering_key::hashing ck_hash;
@@ -70,7 +71,7 @@ public:
{ }
size_t operator()(const key& k) const {
return pk_hash(k.first) ^ ck_hash(k.second);
return pk_hash(k.first) ^ (k.second ? ck_hash(*k.second) : 0);
}
};
struct key_equality {
@@ -83,7 +84,8 @@ public:
{ }
bool operator()(const key& k1, const key& k2) const {
return pk_eq(k1.first, k2.first) && ck_eq(k1.second, k2.second);
return pk_eq(k1.first, k2.first)
&& bool(k1.second) == bool(k2.second) && (!k1.second || ck_eq(*k1.second, *k2.second));
}
};
using row = std::unordered_map<column_id, collection_mutation>;
@@ -183,8 +185,11 @@ public:
return _timestamp;
}
std::experimental::optional<collection_mutation_view> get_prefetched_list(
const partition_key& pkey, const clustering_key& row_key, const column_definition& column) const;
std::experimental::optional<collection_mutation_view>
get_prefetched_list(
partition_key pkey,
std::experimental::optional<clustering_key> ckey,
const column_definition& column) const;
};
}
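The `update_parameters.hh` hunks above widen the prefetch key to `pair<partition_key, optional<clustering_key>>` so static rows (which have no clustering key) can be prefetched too; the hash treats an absent clustering key as contributing 0, and equality requires the optionals to agree on presence. A minimal sketch of that scheme, using `std::optional` and `std::string` stand-ins for the real key types:

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-ins for partition_key / clustering_key: a row key
// is (partition key, optional clustering key); static rows carry nullopt.
using key = std::pair<std::string, std::optional<std::string>>;

struct key_hashing {
    std::size_t operator()(const key& k) const {
        std::hash<std::string> h;
        // An absent clustering key contributes 0, mirroring the patch.
        return h(k.first) ^ (k.second ? h(*k.second) : 0);
    }
};

struct key_equality {
    bool operator()(const key& a, const key& b) const {
        return a.first == b.first
            && bool(a.second) == bool(b.second)
            && (!a.second || *a.second == *b.second);
    }
};
```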


@@ -23,6 +23,7 @@
#include "database.hh"
#include "unimplemented.hh"
#include "core/future-util.hh"
#include "db/commitlog/commitlog_entry.hh"
#include "db/system_keyspace.hh"
#include "db/consistency_level.hh"
#include "db/serializer.hh"
@@ -58,6 +59,7 @@
#include "utils/latency.hh"
#include "utils/flush_queue.hh"
#include "schema_registry.hh"
#include "service/priority_manager.hh"
using namespace std::chrono_literals;
@@ -127,9 +129,9 @@ column_family::make_partition_presence_checker(lw_shared_ptr<sstable_list> old_s
mutation_source
column_family::sstables_as_mutation_source() {
return [this] (schema_ptr s, const query::partition_range& r) {
return make_sstable_reader(std::move(s), r);
};
return mutation_source([this] (schema_ptr s, const query::partition_range& r, const io_priority_class& pc) {
return make_sstable_reader(std::move(s), r, pc);
});
}
// define in .cc, since sstable is forward-declared in .hh
@@ -154,10 +156,14 @@ class range_sstable_reader final : public mutation_reader::impl {
const query::partition_range& _pr;
lw_shared_ptr<sstable_list> _sstables;
mutation_reader _reader;
// Use a pointer instead of copying, so we don't need to regenerate the reader if
// the priority changes.
const io_priority_class* _pc;
public:
range_sstable_reader(schema_ptr s, lw_shared_ptr<sstable_list> sstables, const query::partition_range& pr)
range_sstable_reader(schema_ptr s, lw_shared_ptr<sstable_list> sstables, const query::partition_range& pr, const io_priority_class& pc)
: _pr(pr)
, _sstables(std::move(sstables))
, _pc(&pc)
{
std::vector<mutation_reader> readers;
for (const lw_shared_ptr<sstables::sstable>& sst : *_sstables | boost::adaptors::map_values) {
@@ -184,11 +190,15 @@ class single_key_sstable_reader final : public mutation_reader::impl {
mutation_opt _m;
bool _done = false;
lw_shared_ptr<sstable_list> _sstables;
// Use a pointer instead of copying, so we don't need to regenerate the reader if
// the priority changes.
const io_priority_class* _pc;
public:
single_key_sstable_reader(schema_ptr schema, lw_shared_ptr<sstable_list> sstables, const partition_key& key)
single_key_sstable_reader(schema_ptr schema, lw_shared_ptr<sstable_list> sstables, const partition_key& key, const io_priority_class& pc)
: _schema(std::move(schema))
, _key(sstables::key::from_partition_key(*_schema, key))
, _sstables(std::move(sstables))
, _pc(&pc)
{ }
virtual future<mutation_opt> operator()() override {
@@ -207,26 +217,26 @@ public:
};
mutation_reader
column_family::make_sstable_reader(schema_ptr s, const query::partition_range& pr) const {
column_family::make_sstable_reader(schema_ptr s, const query::partition_range& pr, const io_priority_class& pc) const {
if (pr.is_singular() && pr.start()->value().has_key()) {
const dht::ring_position& pos = pr.start()->value();
if (dht::shard_of(pos.token()) != engine().cpu_id()) {
return make_empty_reader(); // range doesn't belong to this shard
}
return make_mutation_reader<single_key_sstable_reader>(std::move(s), _sstables, *pos.key());
return make_mutation_reader<single_key_sstable_reader>(std::move(s), _sstables, *pos.key(), pc);
} else {
// range_sstable_reader is not movable so we need to wrap it
return make_mutation_reader<range_sstable_reader>(std::move(s), _sstables, pr);
return make_mutation_reader<range_sstable_reader>(std::move(s), _sstables, pr, pc);
}
}
key_source column_family::sstables_as_key_source() const {
return [this] (const query::partition_range& range) {
return key_source([this] (const query::partition_range& range, const io_priority_class& pc) {
std::vector<key_reader> readers;
readers.reserve(_sstables->size());
std::transform(_sstables->begin(), _sstables->end(), std::back_inserter(readers), [&] (auto&& entry) {
auto& sst = entry.second;
auto rd = sstables::make_key_reader(_schema, sst, range);
auto rd = sstables::make_key_reader(_schema, sst, range, pc);
if (sst->is_shared()) {
rd = make_filtering_reader(std::move(rd), [] (const dht::decorated_key& dk) {
return dht::shard_of(dk.token()) == engine().cpu_id();
@@ -235,7 +245,7 @@ key_source column_family::sstables_as_key_source() const {
return rd;
});
return make_combined_reader(_schema, std::move(readers));
};
});
}
// Exposed for testing, not performance critical.
@@ -275,7 +285,7 @@ column_family::find_row(schema_ptr s, const dht::decorated_key& partition_key, c
}
mutation_reader
column_family::make_reader(schema_ptr s, const query::partition_range& range) const {
column_family::make_reader(schema_ptr s, const query::partition_range& range, const io_priority_class& pc) const {
if (query::is_wrap_around(range, *s)) {
// make_combined_reader() can't handle streams that wrap around yet.
fail(unimplemented::cause::WRAP_AROUND);
@@ -309,14 +319,15 @@ column_family::make_reader(schema_ptr s, const query::partition_range& range) co
}
if (_config.enable_cache) {
readers.emplace_back(_cache.make_reader(s, range));
readers.emplace_back(_cache.make_reader(s, range, pc));
} else {
readers.emplace_back(make_sstable_reader(s, range));
readers.emplace_back(make_sstable_reader(s, range, pc));
}
return make_combined_reader(std::move(readers));
}
// Not performance critical. Currently used for testing only.
template <typename Func>
future<bool>
column_family::for_all_partitions(schema_ptr s, Func&& func) const {
@@ -463,7 +474,15 @@ future<sstables::entry_descriptor> column_family::probe_file(sstring sstdir, sst
}
update_sstables_known_generation(comps.generation);
assert(_sstables->count(comps.generation) == 0);
{
auto i = _sstables->find(comps.generation);
if (i != _sstables->end()) {
auto new_toc = sstdir + "/" + fname;
throw std::runtime_error(sprint("Attempted to add sstable generation %d twice: new=%s existing=%s",
comps.generation, new_toc, i->second->toc_filename()));
}
}
auto fut = sstable::get_sstable_key_range(*_schema, _schema->ks_name(), _schema->cf_name(), sstdir, comps.generation, comps.version, comps.format);
return std::move(fut).then([this, sstdir = std::move(sstdir), comps] (range<partition_key> r) {
@@ -570,9 +589,7 @@ column_family::seal_active_memtable() {
future<stop_iteration>
column_family::try_flush_memtable_to_sstable(lw_shared_ptr<memtable> old) {
// FIXME: better way of ensuring we don't attempt to
// overwrite an existing table.
auto gen = _sstable_generation++ * smp::count + engine().cpu_id();
auto gen = calculate_generation_for_new_table();
auto newtab = make_lw_shared<sstables::sstable>(_schema->ks_name(), _schema->cf_name(),
_config.datadir, gen,
@@ -585,27 +602,20 @@ column_family::try_flush_memtable_to_sstable(lw_shared_ptr<memtable> old) {
_config.cf_stats->pending_memtables_flushes_bytes += memtable_size;
newtab->set_unshared();
dblog.debug("Flushing to {}", newtab->get_filename());
return newtab->write_components(*old).then([this, newtab, old] {
return newtab->open_data().then([this, newtab] {
// Note that due to our sharded architecture, it is possible that
// in the face of a value change some shards will backup sstables
// while others won't.
//
// This is, in theory, possible to mitigate through a rwlock.
// However, this doesn't differ from the situation where all tables
// are coming from a single shard and the toggle happens in the
// middle of them.
//
// The code as is guarantees that we'll never partially backup a
// single sstable, so that is enough of a guarantee.
if (!incremental_backups_enabled()) {
return make_ready_future<>();
}
auto dir = newtab->get_dir() + "/backups/";
return touch_directory(dir).then([dir, newtab] {
return newtab->create_links(dir);
});
});
// Note that due to our sharded architecture, it is possible that
// in the face of a value change some shards will backup sstables
// while others won't.
//
// This is, in theory, possible to mitigate through a rwlock.
// However, this doesn't differ from the situation where all tables
// are coming from a single shard and the toggle happens in the
// middle of them.
//
// The code as is guarantees that we'll never partially backup a
// single sstable, so that is enough of a guarantee.
auto&& priority = service::get_local_memtable_flush_priority();
return newtab->write_components(*old, incremental_backups_enabled(), priority).then([this, newtab, old] {
return newtab->open_data();
}).then_wrapped([this, old, newtab, memtable_size] (future<> ret) {
_config.cf_stats->pending_memtables_flushes_count--;
_config.cf_stats->pending_memtables_flushes_bytes -= memtable_size;
@@ -710,68 +720,104 @@ column_family::reshuffle_sstables(int64_t start) {
});
}
void
column_family::rebuild_sstable_list(const std::vector<sstables::shared_sstable>& new_sstables,
const std::vector<sstables::shared_sstable>& sstables_to_remove) {
// Build a new list of _sstables: We remove from the existing list the
// tables we compacted (by now, there might be more sstables flushed
// later), and we add the new tables generated by the compaction.
// We create a new list rather than modifying it in-place, so that
// on-going reads can continue to use the old list.
auto current_sstables = _sstables;
_sstables = make_lw_shared<sstable_list>();
// zeroing live_disk_space_used and live_sstable_count because the
// sstable list is re-created below.
_stats.live_disk_space_used = 0;
_stats.live_sstable_count = 0;
std::unordered_set<sstables::shared_sstable> s(
sstables_to_remove.begin(), sstables_to_remove.end());
for (const auto& oldtab : *current_sstables) {
// Checks if oldtab is a sstable not being compacted.
if (!s.count(oldtab.second)) {
update_stats_for_new_sstable(oldtab.second->data_size());
_sstables->emplace(oldtab.first, oldtab.second);
}
}
for (const auto& newtab : new_sstables) {
// FIXME: rename the new sstable(s). Verify a rename doesn't cause
// problems for the sstable object.
update_stats_for_new_sstable(newtab->data_size());
_sstables->emplace(newtab->generation(), newtab);
}
for (const auto& oldtab : sstables_to_remove) {
oldtab->mark_for_deletion();
}
}
future<>
column_family::compact_sstables(sstables::compaction_descriptor descriptor) {
column_family::compact_sstables(sstables::compaction_descriptor descriptor, bool cleanup) {
if (!descriptor.sstables.size()) {
// if there is nothing to compact, just return.
return make_ready_future<>();
}
return with_lock(_sstables_lock.for_read(), [this, descriptor = std::move(descriptor)] {
return with_lock(_sstables_lock.for_read(), [this, descriptor = std::move(descriptor), cleanup] {
auto sstables_to_compact = make_lw_shared<std::vector<sstables::shared_sstable>>(std::move(descriptor.sstables));
auto new_tables = make_lw_shared<std::vector<
std::pair<unsigned, sstables::shared_sstable>>>();
auto create_sstable = [this, new_tables] {
// FIXME: this generation calculation should be in a function.
auto gen = _sstable_generation++ * smp::count + engine().cpu_id();
auto create_sstable = [this] {
auto gen = this->calculate_generation_for_new_table();
// FIXME: use "tmp" marker in names of incomplete sstable
auto sst = make_lw_shared<sstables::sstable>(_schema->ks_name(), _schema->cf_name(), _config.datadir, gen,
sstables::sstable::version_types::ka,
sstables::sstable::format_types::big);
sst->set_unshared();
new_tables->emplace_back(gen, sst);
return sst;
};
return sstables::compact_sstables(*sstables_to_compact, *this,
create_sstable, descriptor.max_sstable_bytes, descriptor.level).then([this, new_tables, sstables_to_compact] {
// Build a new list of _sstables: We remove from the existing list the
// tables we compacted (by now, there might be more sstables flushed
// later), and we add the new tables generated by the compaction.
// We create a new list rather than modifying it in-place, so that
// on-going reads can continue to use the old list.
auto current_sstables = _sstables;
_sstables = make_lw_shared<sstable_list>();
// zeroing live_disk_space_used and live_sstable_count because the
// sstable list is re-created below.
_stats.live_disk_space_used = 0;
_stats.live_sstable_count = 0;
std::unordered_set<sstables::shared_sstable> s(
sstables_to_compact->begin(), sstables_to_compact->end());
for (const auto& oldtab : *current_sstables) {
// Checks if oldtab is a sstable not being compacted.
if (!s.count(oldtab.second)) {
update_stats_for_new_sstable(oldtab.second->data_size());
_sstables->emplace(oldtab.first, oldtab.second);
}
}
for (const auto& newtab : *new_tables) {
// FIXME: rename the new sstable(s). Verify a rename doesn't cause
// problems for the sstable object.
update_stats_for_new_sstable(newtab.second->data_size());
_sstables->emplace(newtab.first, newtab.second);
}
for (const auto& oldtab : *sstables_to_compact) {
oldtab->mark_for_deletion();
}
return sstables::compact_sstables(*sstables_to_compact, *this, create_sstable, descriptor.max_sstable_bytes, descriptor.level,
cleanup).then([this, sstables_to_compact] (auto new_sstables) {
this->rebuild_sstable_list(new_sstables, *sstables_to_compact);
});
});
}
static bool needs_cleanup(const lw_shared_ptr<sstables::sstable>& sst,
const lw_shared_ptr<std::vector<range<dht::token>>>& owned_ranges,
schema_ptr s) {
auto first = sst->get_first_partition_key(*s);
auto last = sst->get_last_partition_key(*s);
auto first_token = dht::global_partitioner().get_token(*s, first);
auto last_token = dht::global_partitioner().get_token(*s, last);
range<dht::token> sst_token_range = range<dht::token>::make(first_token, last_token);
// return true iff sst partition range isn't fully contained in any of the owned ranges.
for (auto& r : *owned_ranges) {
if (r.contains(sst_token_range, dht::token_comparator())) {
return false;
}
}
return true;
}
future<> column_family::cleanup_sstables(sstables::compaction_descriptor descriptor) {
std::vector<range<dht::token>> r = service::get_local_storage_service().get_local_ranges(_schema->ks_name());
auto owned_ranges = make_lw_shared<std::vector<range<dht::token>>>(std::move(r));
auto sstables_to_cleanup = make_lw_shared<std::vector<sstables::shared_sstable>>(std::move(descriptor.sstables));
return parallel_for_each(*sstables_to_cleanup, [this, owned_ranges = std::move(owned_ranges), sstables_to_cleanup] (auto& sst) {
if (!owned_ranges->empty() && !needs_cleanup(sst, owned_ranges, _schema)) {
return make_ready_future<>();
}
std::vector<sstables::shared_sstable> sstable_to_compact({ sst });
return this->compact_sstables(sstables::compaction_descriptor(std::move(sstable_to_compact)), true);
});
}
future<>
column_family::load_new_sstables(std::vector<sstables::entry_descriptor> new_tables) {
return parallel_for_each(new_tables, [this] (auto comps) {
@@ -811,18 +857,26 @@ void column_family::start_compaction() {
void column_family::trigger_compaction() {
// Submitting compaction job to compaction manager.
// #934 - always inc the pending counter, to help
// indicate the want for compaction.
_stats.pending_compactions++;
do_trigger_compaction(); // see below
}
void column_family::do_trigger_compaction() {
// But only submit if we're not locked out
if (!_compaction_disabled) {
_stats.pending_compactions++;
_compaction_manager.submit(this);
}
}
future<> column_family::run_compaction() {
sstables::compaction_strategy strategy = _compaction_strategy;
return do_with(std::move(strategy), [this] (sstables::compaction_strategy& cs) {
return cs.compact(*this).then([this] {
_stats.pending_compactions--;
});
future<> column_family::run_compaction(sstables::compaction_descriptor descriptor) {
assert(_stats.pending_compactions > 0);
return compact_sstables(std::move(descriptor)).then([this] {
// only do this on success. (no exceptions)
// in that case, we rely on it being still set
// for requeuing
_stats.pending_compactions--;
});
}
@@ -960,6 +1014,9 @@ future<> column_family::populate(sstring sstdir) {
return make_ready_future<>();
});
});
}).then([this] {
// Make sure this is called even if CF is empty
mark_ready_for_writes();
});
}
@@ -1032,19 +1089,37 @@ future<> database::populate_keyspace(sstring datadir, sstring ks_name) {
dblog.error("Keyspace {}: Skipping malformed CF {} ", ksdir, de.name);
return make_ready_future<>();
}
sstring cfname = comps[0];
auto sstdir = ksdir + "/" + de.name;
sstring cfname = comps[0];
sstring uuidst = comps[1];
try {
auto& cf = find_column_family(ks_name, cfname);
dblog.info("Keyspace {}: Reading CF {} ", ksdir, cfname);
// FIXME: Increase parallelism.
return cf.populate(sstdir);
auto&& uuid = [&] {
try {
return find_uuid(ks_name, cfname);
} catch (const std::out_of_range& e) {
std::throw_with_nested(no_such_column_family(ks_name, cfname));
}
}();
auto& cf = find_column_family(uuid);
// #870: Check that the directory name matches
// the current, expected UUID of the CF.
if (utils::UUID(uuidst) == uuid) {
// FIXME: Increase parallelism.
auto sstdir = ksdir + "/" + de.name;
dblog.info("Keyspace {}: Reading CF {} ", ksdir, cfname);
return cf.populate(sstdir);
}
// Nope. Warn and ignore.
dblog.info("Keyspace {}: Skipping obsolete version of CF {} ({})", ksdir, cfname, uuidst);
} catch (marshal_exception&) {
// Bogus UUID part of directory name
dblog.warn("{}, CF {}: malformed UUID: {}. Ignoring", ksdir, comps[0], uuidst);
} catch (no_such_column_family&) {
dblog.warn("{}, CF {}: schema not loaded!", ksdir, comps[0]);
return make_ready_future<>();
}
return make_ready_future<>();
});
}
return make_ready_future<>();
@@ -1126,6 +1201,14 @@ database::init_system_keyspace() {
return populate_keyspace(_cfg->data_file_directories()[0], db::system_keyspace::NAME).then([this]() {
return init_commitlog();
});
}).then([this] {
auto& ks = find_keyspace(db::system_keyspace::NAME);
return parallel_for_each(ks.metadata()->cf_meta_data(), [this] (auto& pair) {
auto cfm = pair.second;
auto& cf = this->find_column_family(cfm);
cf.mark_ready_for_writes();
return make_ready_future<>();
});
});
}
@@ -1506,7 +1589,7 @@ column_family::query(schema_ptr s, const query::read_command& cmd, const std::ve
return do_with(query_state(std::move(s), cmd, partition_ranges), [this] (query_state& qs) {
return do_until(std::bind(&query_state::done, &qs), [this, &qs] {
auto&& range = *qs.current_partition_range++;
qs.reader = make_reader(qs.schema, range);
qs.reader = make_reader(qs.schema, range, service::get_local_sstable_query_read_priority());
qs.range_empty = false;
return do_until([&qs] { return !qs.limit || qs.range_empty; }, [&qs] {
return qs.reader().then([&qs](mutation_opt mo) {
@@ -1535,9 +1618,9 @@ column_family::query(schema_ptr s, const query::read_command& cmd, const std::ve
mutation_source
column_family::as_mutation_source() const {
return [this] (schema_ptr s, const query::partition_range& range) {
return this->make_reader(std::move(s), range);
};
return mutation_source([this] (schema_ptr s, const query::partition_range& range, const io_priority_class& pc) {
return this->make_reader(std::move(s), range, pc);
});
}
future<lw_shared_ptr<query::result>>
@@ -1594,7 +1677,8 @@ std::ostream& operator<<(std::ostream& out, const atomic_cell_or_collection& c)
}
std::ostream& operator<<(std::ostream& os, const mutation& m) {
fprint(os, "{mutation: schema %p key %s data ", m.schema().get(), m.decorated_key());
const ::schema& s = *m.schema();
fprint(os, "{%s.%s key %s data ", s.ks_name(), s.cf_name(), m.decorated_key());
os << m.partition() << "}";
return os;
}
@@ -1613,6 +1697,47 @@ std::ostream& operator<<(std::ostream& out, const database& db) {
return out;
}
void
column_family::apply(const mutation& m, const db::replay_position& rp) {
utils::latency_counter lc;
_stats.writes.set_latency(lc);
active_memtable().apply(m, rp);
seal_on_overflow();
_stats.writes.mark(lc);
if (lc.is_start()) {
_stats.estimated_write.add(lc.latency(), _stats.writes.count);
}
}
void
column_family::apply(const frozen_mutation& m, const schema_ptr& m_schema, const db::replay_position& rp) {
utils::latency_counter lc;
_stats.writes.set_latency(lc);
check_valid_rp(rp);
active_memtable().apply(m, m_schema, rp);
seal_on_overflow();
_stats.writes.mark(lc);
if (lc.is_start()) {
_stats.estimated_write.add(lc.latency(), _stats.writes.count);
}
}
void
column_family::seal_on_overflow() {
if (active_memtable().occupancy().total_space() >= _config.max_memtable_size) {
// FIXME: if sparse, do some in-memory compaction first
// FIXME: maybe merge with other in-memory memtables
seal_active_memtable();
}
}
void
column_family::check_valid_rp(const db::replay_position& rp) const {
if (rp < _highest_flushed_rp) {
throw replay_position_reordered_exception();
}
}
future<> database::apply_in_memory(const frozen_mutation& m, const schema_ptr& m_schema, const db::replay_position& rp) {
try {
auto& cf = find_column_family(m.column_family_id());
@@ -1634,9 +1759,8 @@ future<> database::do_apply(schema_ptr s, const frozen_mutation& m) {
s->ks_name(), s->cf_name(), s->version()));
}
if (cf.commitlog() != nullptr) {
bytes_view repr = m.representation();
auto write_repr = [repr] (data_output& out) { out.write(repr.begin(), repr.end()); };
return cf.commitlog()->add_mutation(uuid, repr.size(), write_repr).then([&m, this, s](auto rp) {
commitlog_entry_writer cew(s, m);
return cf.commitlog()->add_entry(uuid, cew).then([&m, this, s](auto rp) {
try {
return this->apply_in_memory(m, s, rp);
} catch (replay_position_reordered_exception&) {
@@ -1685,6 +1809,9 @@ void database::unthrottle() {
}
future<> database::apply(schema_ptr s, const frozen_mutation& m) {
if (dblog.is_enabled(logging::log_level::trace)) {
dblog.trace("apply {}", m.pretty_printer(s));
}
return throttle().then([this, &m, s = std::move(s)] {
return do_apply(std::move(s), m);
});
@@ -2169,7 +2296,8 @@ void column_family::clear() {
// NOTE: does not need to be futurized, but might eventually, depending on
// if we implement notifications, whatnot.
future<db::replay_position> column_family::discard_sstables(db_clock::time_point truncated_at) {
assert(_stats.pending_compactions == 0);
assert(_compaction_disabled > 0);
assert(!compaction_manager_queued());
return with_lock(_sstables_lock.for_read(), [this, truncated_at] {
db::replay_position rp;


@@ -56,7 +56,6 @@
#include "tombstone.hh"
#include "atomic_cell.hh"
#include "query-request.hh"
#include "query-result.hh"
#include "keys.hh"
#include "mutation.hh"
#include "memtable.hh"
@@ -64,7 +63,7 @@
#include "mutation_reader.hh"
#include "row_cache.hh"
#include "compaction_strategy.hh"
#include "utils/compaction_manager.hh"
#include "sstables/compaction_manager.hh"
#include "utils/exponential_backoff_retry.hh"
#include "utils/histogram.hh"
#include "sstables/estimated_histogram.hh"
@@ -160,8 +159,8 @@ private:
// the read lock, and the ones that wish to stop that process will take the write lock.
rwlock _sstables_lock;
mutable row_cache _cache; // Cache covers only sstables.
int64_t _sstable_generation = 1;
unsigned _mutation_count = 0;
std::experimental::optional<int64_t> _sstable_generation = {};
db::replay_position _highest_flushed_rp;
// Provided by the database that owns this commitlog
db::commitlog* _commitlog;
@@ -172,6 +171,9 @@ private:
int _compaction_disabled = 0;
class memtable_flush_queue;
std::unique_ptr<memtable_flush_queue> _flush_queue;
// Store generation of sstables being compacted at the moment. That's needed to prevent a
// sstable from being compacted twice.
std::unordered_set<unsigned long> _compacting_generations;
private:
void update_stats_for_new_sstable(uint64_t new_sstable_data_size);
void add_sstable(sstables::sstable&& sstable);
@@ -183,26 +185,66 @@ private:
// update the sstable generation, making sure that new sstables don't overwrite this one.
void update_sstables_known_generation(unsigned generation) {
_sstable_generation = std::max<uint64_t>(_sstable_generation, generation / smp::count + 1);
if (!_sstable_generation) {
_sstable_generation = 1;
}
_sstable_generation = std::max<uint64_t>(*_sstable_generation, generation / smp::count + 1);
}
uint64_t calculate_generation_for_new_table() {
assert(_sstable_generation);
// FIXME: better way of ensuring we don't attempt to
// overwrite an existing table.
return (*_sstable_generation)++ * smp::count + engine().cpu_id();
}
// Rebuild existing _sstables with new_sstables added to it and sstables_to_remove removed from it.
void rebuild_sstable_list(const std::vector<sstables::shared_sstable>& new_sstables,
const std::vector<sstables::shared_sstable>& sstables_to_remove);
private:
// Creates a mutation reader which covers sstables.
// Caller needs to ensure that column_family remains live (FIXME: relax this).
// The 'range' parameter must be live as long as the reader is used.
// Mutations returned by the reader will all have given schema.
mutation_reader make_sstable_reader(schema_ptr schema, const query::partition_range& range) const;
mutation_reader make_sstable_reader(schema_ptr schema, const query::partition_range& range, const io_priority_class& pc) const;
mutation_source sstables_as_mutation_source();
key_source sstables_as_key_source() const;
partition_presence_checker make_partition_presence_checker(lw_shared_ptr<sstable_list> old_sstables);
std::chrono::steady_clock::time_point _sstable_writes_disabled_at;
void do_trigger_compaction();
public:
// This function should be called when this column family is ready for writes, IOW,
// to produce SSTables. Extensive details about why this is important can be found
// in Scylla's GitHub issue #1014
//
// Nothing should be writing to SSTables before we have the chance to populate the
// existing SSTables and calculate what the next generation number should be.
//
// However, if that happens, we want to protect against it in a way that does not
// involve overwriting existing tables. This is one of the ways to do it: every
// column family starts in an unwriteable state, and when it can finally be written
// to, we mark it as writeable.
//
// Note that this *cannot* be a part of add_column_family. That adds a column family
// to a db in memory only, and if anybody is about to write to a CF, that was most
// likely already called. We need to call this explicitly when we are sure we're ready
// to issue disk operations safely.
void mark_ready_for_writes() {
update_sstables_known_generation(0);
}
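The per-shard generation scheme that mark_ready_for_writes() protects can be sketched outside Seastar: each shard owns the residue class (generation % shard_count == cpu_id), so two shards can never hand out the same number concurrently. A minimal standalone sketch, with illustrative names and plain integers instead of Scylla's actual types:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Each call stripes the per-shard counter across shards, so generations
// produced on different CPUs land in disjoint residue classes.
uint64_t next_generation(uint64_t& counter, unsigned shard_count, unsigned cpu_id) {
    return counter++ * shard_count + cpu_id;
}

// On seeing an existing generation on disk, bump the counter past it,
// mirroring update_sstables_known_generation() above.
void observe_generation(uint64_t& counter, unsigned shard_count, uint64_t seen) {
    counter = std::max(counter, seen / shard_count + 1);
}
```

After observing generation 100 with 4 shards, the counter jumps to 26, so the next table on shard 2 gets 26 * 4 + 2 = 106, safely past anything already on disk.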
// Creates a mutation reader which covers all data sources for this column family.
// Caller needs to ensure that column_family remains live (FIXME: relax this).
// Note: for data queries use query() instead.
// The 'range' parameter must be live as long as the reader is used.
// Mutations returned by the reader will all have given schema.
mutation_reader make_reader(schema_ptr schema, const query::partition_range& range = query::full_partition_range) const;
// If I/O needs to be issued to read anything in the specified range, the operations
// will be scheduled under the priority class given by pc.
mutation_reader make_reader(schema_ptr schema,
const query::partition_range& range = query::full_partition_range,
const io_priority_class& pc = default_priority_class()) const;
mutation_source as_mutation_source() const;
@@ -290,7 +332,15 @@ public:
// not a real compaction policy.
future<> compact_all_sstables();
// Compact all sstables provided in the vector.
future<> compact_sstables(sstables::compaction_descriptor descriptor);
// If cleanup is set to true, compaction_sstables will run on behalf of a cleanup job,
// meaning that irrelevant keys will be discarded.
future<> compact_sstables(sstables::compaction_descriptor descriptor, bool cleanup = false);
// Performs a cleanup on each sstable of this column family, excluding
// those ones that are irrelevant to this node or being compacted.
// Cleanup is about discarding keys that are no longer relevant for a
// given sstable, e.g. after node loses part of its token range because
// of a newly added node.
future<> cleanup_sstables(sstables::compaction_descriptor descriptor);
future<bool> snapshot_exists(sstring name);
@@ -313,7 +363,7 @@ public:
void start_compaction();
void trigger_compaction();
future<> run_compaction();
future<> run_compaction(sstables::compaction_descriptor descriptor);
void set_compaction_strategy(sstables::compaction_strategy_type strategy);
const sstables::compaction_strategy& get_compaction_strategy() const {
return _compaction_strategy;
@@ -339,11 +389,19 @@ public:
Result run_with_compaction_disabled(Func && func) {
++_compaction_disabled;
return _compaction_manager.remove(this).then(std::forward<Func>(func)).finally([this] {
if (--_compaction_disabled == 0) {
trigger_compaction();
// #934. The pending counter is actually a great indicator into whether we
// actually need to trigger a compaction again.
if (--_compaction_disabled == 0 && _stats.pending_compactions > 0) {
// we're turning it on again, use a function that does not increment
// the counter further.
do_trigger_compaction();
}
});
}
std::unordered_set<unsigned long>& compacting_generations() {
return _compacting_generations;
}
private:
// One does not need to wait on this future if all we are interested in, is
// initiating the write. The writes initiated here will eventually
@@ -353,16 +411,11 @@ private:
// But it is possible to synchronously wait for the seal to complete by
// waiting on this future. This is useful in situations where we want to
// synchronously flush data to disk.
//
// FIXME: A better interface would guarantee that all writes before this
// one are also complete
future<> seal_active_memtable();
// filter manifest.json files out
static bool manifest_json_filter(const sstring& fname);
seastar::gate _in_flight_seals;
// Iterate over all partitions. Protocol is the same as std::all_of(),
// so that iteration can be stopped by returning false.
// Func signature: bool (const decorated_key& dk, const mutation_partition& mp)
@@ -675,53 +728,6 @@ public:
// FIXME: stub
class secondary_index_manager {};
inline
void
column_family::apply(const mutation& m, const db::replay_position& rp) {
utils::latency_counter lc;
_stats.writes.set_latency(lc);
active_memtable().apply(m, rp);
seal_on_overflow();
_stats.writes.mark(lc);
if (lc.is_start()) {
_stats.estimated_write.add(lc.latency(), _stats.writes.count);
}
}
inline
void
column_family::seal_on_overflow() {
++_mutation_count;
if (active_memtable().occupancy().total_space() >= _config.max_memtable_size) {
// FIXME: if sparse, do some in-memory compaction first
// FIXME: maybe merge with other in-memory memtables
_mutation_count = 0;
seal_active_memtable();
}
}
inline
void
column_family::check_valid_rp(const db::replay_position& rp) const {
if (rp < _highest_flushed_rp) {
throw replay_position_reordered_exception();
}
}
inline
void
column_family::apply(const frozen_mutation& m, const schema_ptr& m_schema, const db::replay_position& rp) {
utils::latency_counter lc;
_stats.writes.set_latency(lc);
check_valid_rp(rp);
active_memtable().apply(m, m_schema, rp);
seal_on_overflow();
_stats.writes.mark(lc);
if (lc.is_start()) {
_stats.estimated_write.add(lc.latency(), _stats.writes.count);
}
}
future<> update_schema_version_and_announce(distributed<service::storage_proxy>& proxy);
#endif /* DATABASE_HH_ */


@@ -45,6 +45,7 @@
#include <boost/range/adaptor/sliced.hpp>
#include "batchlog_manager.hh"
#include "canonical_mutation.hh"
#include "service/storage_service.hh"
#include "service/storage_proxy.hh"
#include "system_keyspace.hh"
@@ -117,14 +118,14 @@ mutation db::batchlog_manager::get_batch_log_mutation_for(const std::vector<muta
auto key = partition_key::from_singular(*schema, id);
auto timestamp = api::new_timestamp();
auto data = [this, &mutations] {
std::vector<frozen_mutation> fm(mutations.begin(), mutations.end());
std::vector<canonical_mutation> fm(mutations.begin(), mutations.end());
const auto size = std::accumulate(fm.begin(), fm.end(), size_t(0), [](size_t s, auto& m) {
return s + serializer<frozen_mutation>{m}.size();
return s + serializer<canonical_mutation>{m}.size();
});
bytes buf(bytes::initialized_later(), size);
data_output out(buf);
for (auto& m : fm) {
serializer<frozen_mutation>{m}(out);
serializer<canonical_mutation>{m}(out);
}
return buf;
}();
@@ -152,49 +153,60 @@ future<> db::batchlog_manager::replay_all_failed_batches() {
auto batch = [this, limiter](const cql3::untyped_result_set::row& row) {
auto written_at = row.get_as<db_clock::time_point>("written_at");
// enough time for the actual write + batchlog entry mutation delivery (two separate requests).
auto id = row.get_as<utils::UUID>("id");
// enough time for the actual write + batchlog entry mutation delivery (two separate requests).
auto timeout = get_batch_log_timeout();
if (db_clock::now() < written_at + timeout) {
logger.debug("Skipping replay of {}, too fresh", id);
return make_ready_future<>();
}
// not used currently. ever?
//auto version = row.has("version") ? row.get_as<uint32_t>("version") : /*MessagingService.VERSION_12*/6u;
auto id = row.get_as<utils::UUID>("id");
// check version of serialization format
if (!row.has("version")) {
logger.warn("Skipping logged batch because of unknown version");
return make_ready_future<>();
}
auto version = row.get_as<int32_t>("version");
if (version != net::messaging_service::current_version) {
logger.warn("Skipping logged batch because of incorrect version");
return make_ready_future<>();
}
auto data = row.get_blob("data");
logger.debug("Replaying batch {}", id);
auto fms = make_lw_shared<std::deque<frozen_mutation>>();
auto fms = make_lw_shared<std::deque<canonical_mutation>>();
data_input in(data);
while (in.has_next()) {
fms->emplace_back(serializer<frozen_mutation>::read(in));
fms->emplace_back(serializer<canonical_mutation>::read(in));
}
auto mutations = make_lw_shared<std::vector<mutation>>();
auto size = data.size();
return repeat([this, fms = std::move(fms), written_at, mutations]() mutable {
if (fms->empty()) {
return make_ready_future<stop_iteration>(stop_iteration::yes);
}
auto& fm = fms->front();
auto mid = fm.column_family_id();
return system_keyspace::get_truncated_at(mid).then([this, &fm, written_at, mutations](db_clock::time_point t) {
warn(unimplemented::cause::SCHEMA_CHANGE);
auto schema = local_schema_registry().get(fm.schema_version());
return map_reduce(*fms, [this, written_at] (canonical_mutation& fm) {
return system_keyspace::get_truncated_at(fm.column_family_id()).then([written_at, &fm] (db_clock::time_point t) ->
std::experimental::optional<std::reference_wrapper<canonical_mutation>> {
if (written_at > t) {
mutations->emplace_back(fm.unfreeze(schema));
return { std::ref(fm) };
} else {
return {};
}
}).then([fms] {
fms->pop_front();
return make_ready_future<stop_iteration>(stop_iteration::no);
});
}).then([this, id, mutations, limiter, written_at, size] {
if (mutations->empty()) {
},
std::vector<mutation>(),
[this] (std::vector<mutation> mutations, std::experimental::optional<std::reference_wrapper<canonical_mutation>> fm) {
if (fm) {
schema_ptr s = _qp.db().local().find_schema(fm.value().get().column_family_id());
mutations.emplace_back(fm.value().get().to_mutation(s));
}
return mutations;
}).then([this, id, limiter, written_at, size, fms] (std::vector<mutation> mutations) {
if (mutations.empty()) {
return make_ready_future<>();
}
const auto ttl = [this, mutations, written_at]() -> clock_type {
const auto ttl = [this, &mutations, written_at]() -> clock_type {
/*
* Calculate ttl for the mutations' hints (and reduce ttl by the time the mutations spent in the batchlog).
* This ensures that deletes aren't "undone" by an old batch replay.
@@ -216,8 +228,8 @@ future<> db::batchlog_manager::replay_all_failed_batches() {
// Our normal write path does not add much redundancy to the dispatch, and rate is handled after send
// in both cases.
// FIXME: verify that the above is reasonably true.
return limiter->reserve(size).then([this, mutations, id] {
return _qp.proxy().local().mutate(std::move(*mutations), db::consistency_level::ANY);
return limiter->reserve(size).then([this, mutations = std::move(mutations), id] {
return _qp.proxy().local().mutate(mutations, db::consistency_level::ANY);
});
}).then([this, id] {
// delete batch

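The ttl lambda in the hunk above reduces to one line of arithmetic: the hint's time-to-live is the configured window minus the time the batch already spent in the batchlog, clamped at zero so a stale batch produces no hint at all. A standalone sketch of that arithmetic (values in seconds; the base window is illustrative, not Scylla's actual hint ttl):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Subtract the batch's age from the base ttl so a replayed batch can
// never outlive (and thereby "undo") a delete issued after it was written.
int64_t replay_hint_ttl(int64_t base_ttl, int64_t written_at, int64_t now) {
    return std::max<int64_t>(0, base_ttl - (now - written_at));
}
```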

@@ -64,6 +64,8 @@
#include "utils/crc.hh"
#include "utils/runtime.hh"
#include "log.hh"
#include "commitlog_entry.hh"
#include "service/priority_manager.hh"
static logging::logger logger("commitlog");
@@ -155,6 +157,9 @@ public:
bool _shutdown = false;
semaphore _new_segment_semaphore;
semaphore _write_semaphore;
semaphore _flush_semaphore;
scollectd::registrations _regs;
// TODO: verify that we're ok with not-so-great granularity
@@ -170,7 +175,11 @@ public:
uint64_t bytes_slack = 0;
uint64_t segments_created = 0;
uint64_t segments_destroyed = 0;
uint64_t pending_operations = 0;
uint64_t pending_writes = 0;
uint64_t pending_flushes = 0;
uint64_t pending_allocations = 0;
uint64_t write_limit_exceeded = 0;
uint64_t flush_limit_exceeded = 0;
uint64_t total_size = 0;
uint64_t buffer_list_bytes = 0;
uint64_t total_size_on_disk = 0;
@@ -178,33 +187,73 @@ public:
stats totals;
void begin_op() {
future<> begin_write() {
_gate.enter();
++totals.pending_operations;
++totals.pending_writes; // redundant given the semaphore, but easier to read
if (totals.pending_writes >= cfg.max_active_writes) {
++totals.write_limit_exceeded;
logger.trace("Write ops overflow: {}. Will block.", totals.pending_writes);
}
return _write_semaphore.wait();
}
void end_op() {
--totals.pending_operations;
void end_write() {
_write_semaphore.signal();
--totals.pending_writes;
_gate.leave();
}
future<> begin_flush() {
_gate.enter();
++totals.pending_flushes;
if (totals.pending_flushes >= cfg.max_active_flushes) {
++totals.flush_limit_exceeded;
logger.trace("Flush ops overflow: {}. Will block.", totals.pending_flushes);
}
return _flush_semaphore.wait();
}
void end_flush() {
_flush_semaphore.signal();
--totals.pending_flushes;
_gate.leave();
}
bool should_wait_for_write() const {
return _write_semaphore.waiters() > 0 || _flush_semaphore.waiters() > 0;
}
segment_manager(config c)
: cfg(c), max_size(
std::min<size_t>(std::numeric_limits<position_type>::max(),
std::max<size_t>(cfg.commitlog_segment_size_in_mb,
1) * 1024 * 1024)), max_mutation_size(
max_size >> 1), max_disk_size(
size_t(
std::ceil(
cfg.commitlog_total_space_in_mb
/ double(smp::count))) * 1024 * 1024)
: cfg([&c] {
config cfg(c);
if (cfg.commit_log_location.empty()) {
cfg.commit_log_location = "/var/lib/scylla/commitlog";
}
if (cfg.max_active_writes == 0) {
cfg.max_active_writes = // TODO: call someone to get an idea...
25 * smp::count;
}
cfg.max_active_writes = std::max(uint64_t(1), cfg.max_active_writes / smp::count);
if (cfg.max_active_flushes == 0) {
cfg.max_active_flushes = // TODO: call someone to get an idea...
5 * smp::count;
}
cfg.max_active_flushes = std::max(uint64_t(1), cfg.max_active_flushes / smp::count);
return cfg;
}())
, max_size(std::min<size_t>(std::numeric_limits<position_type>::max(), std::max<size_t>(cfg.commitlog_segment_size_in_mb, 1) * 1024 * 1024))
, max_mutation_size(max_size >> 1)
, max_disk_size(size_t(std::ceil(cfg.commitlog_total_space_in_mb / double(smp::count))) * 1024 * 1024)
, _write_semaphore(cfg.max_active_writes)
, _flush_semaphore(cfg.max_active_flushes)
{
assert(max_size > 0);
if (cfg.commit_log_location.empty()) {
cfg.commit_log_location = "/var/lib/scylla/commitlog";
}
logger.trace("Commitlog {} maximum disk size: {} MB / cpu ({} cpus)",
cfg.commit_log_location, max_disk_size / (1024 * 1024),
smp::count);
_regs = create_counters();
}
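The config fixup in the constructor above follows one pattern for both limits: zero means "derive a node-wide default from the core count", and the stored value is that total divided back down to this shard's share, floored at one so no shard is fully starved. A minimal sketch of that derivation (defaults of 25 writes and 5 flushes per core are the ones used above):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Model of segment_manager's per-shard limit derivation: a configured
// value of 0 selects default_per_core * shards, and the result is the
// per-shard share, never below 1.
uint64_t per_shard_limit(uint64_t configured, uint64_t default_per_core, unsigned shards) {
    if (configured == 0) {
        configured = default_per_core * shards;
    }
    return std::max<uint64_t>(1, configured / shards);
}
```

With the defaults this comes out to exactly 25 writes and 5 flushes per shard; an explicitly configured total smaller than the shard count still leaves each shard one slot.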
~segment_manager() {
@@ -238,6 +287,8 @@ public:
}
std::vector<sstring> get_active_names() const;
uint64_t get_num_dirty_segments() const;
uint64_t get_num_active_segments() const;
using buffer_type = temporary_buffer<char>;
@@ -341,9 +392,39 @@ class db::commitlog::segment: public enable_lw_shared_from_this<segment> {
std::unordered_map<cf_id_type, position_type> _cf_dirty;
time_point _sync_time;
seastar::gate _gate;
uint64_t _write_waiters = 0;
semaphore _queue;
std::unordered_set<table_schema_version> _known_schema_versions;
friend std::ostream& operator<<(std::ostream&, const segment&);
friend class segment_manager;
future<> begin_flush() {
// This is maintaining the semantics of only using the write-lock
// as a gate for flushing, i.e. once we've begun a flush for position X
// we are ok with writes to positions > X
return _dwrite.write_lock().then(std::bind(&segment_manager::begin_flush, _segment_manager)).finally([this] {
_dwrite.write_unlock();
});
}
void end_flush() {
_segment_manager->end_flush();
}
future<> begin_write() {
// This is maintaining the semantics of only using the write-lock
// as a gate for flushing, i.e. once we've begun a flush for position X
// we are ok with writes to positions > X
return _dwrite.read_lock().then(std::bind(&segment_manager::begin_write, _segment_manager));
}
void end_write() {
_segment_manager->end_write();
_dwrite.read_unlock();
}
public:
struct cf_mark {
const segment& s;
@@ -365,7 +446,7 @@ public:
segment(segment_manager* m, const descriptor& d, file && f, bool active)
: _segment_manager(m), _desc(std::move(d)), _file(std::move(f)), _sync_time(
clock_type::now())
clock_type::now()), _queue(0)
{
++_segment_manager->totals.segments_created;
logger.debug("Created new {} segment {}", active ? "active" : "reserve", *this);
@@ -383,9 +464,19 @@ public:
}
}
bool is_schema_version_known(schema_ptr s) {
return _known_schema_versions.count(s->version());
}
void add_schema_version(schema_ptr s) {
_known_schema_versions.emplace(s->version());
}
void forget_schema_versions() {
_known_schema_versions.clear();
}
bool must_sync() {
if (_segment_manager->cfg.mode == sync_mode::BATCH) {
return true;
return false;
}
auto now = clock_type::now();
auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
@@ -401,8 +492,9 @@ public:
*/
future<sseg_ptr> finish_and_get_new() {
_closed = true;
sync();
return _segment_manager->active_segment();
return maybe_wait_for_write(sync()).then([](sseg_ptr s) {
return s->_segment_manager->active_segment();
});
}
void reset_sync_time() {
_sync_time = clock_type::now();
@@ -417,7 +509,7 @@ public:
logger.trace("Sync not needed {}: ({} / {})", *this, position(), _flush_pos);
return make_ready_future<sseg_ptr>(shared_from_this());
}
return cycle().then([](auto seg) {
return cycle().then([](sseg_ptr seg) {
return seg->flush();
});
}
@@ -440,16 +532,14 @@ public:
// This is not 100% necessary, we really only need the ones below our flush pos,
// but since we pretty much assume that task ordering will make this the case anyway...
return _dwrite.write_lock().then(
return begin_flush().then(
[this, me, pos]() mutable {
_dwrite.write_unlock(); // release it already.
pos = std::max(pos, _file_pos);
if (pos <= _flush_pos) {
logger.trace("{} already synced! ({} < {})", *this, pos, _flush_pos);
return make_ready_future<sseg_ptr>(std::move(me));
}
_segment_manager->begin_op();
return _file.flush().then_wrapped([this, pos, me](auto f) {
return _file.flush().then_wrapped([this, pos, me](future<> f) {
try {
f.get();
// TODO: retry/ignore/fail/stop - optional behaviour in origin.
@@ -462,16 +552,50 @@ public:
logger.error("Failed to flush commits to disk: {}", std::current_exception());
throw;
}
}).finally([this, me] {
_segment_manager->end_op();
});
});
}).finally([this] {
end_flush();
});
}
/**
* Allocate a new buffer
*/
void new_buffer(size_t s) {
assert(_buffer.empty());
auto overhead = segment_overhead_size;
if (_file_pos == 0) {
overhead += descriptor_header_size;
}
auto a = align_up(s + overhead, alignment);
auto k = std::max(a, default_size);
for (;;) {
try {
_buffer = _segment_manager->acquire_buffer(k);
break;
} catch (std::bad_alloc&) {
logger.warn("Could not allocate {} k bytes output buffer ({} k required)", k / 1024, a / 1024);
if (k > a) {
k = std::max(a, k / 2);
logger.debug("Trying reduced size: {} k", k / 1024);
continue;
}
throw;
}
}
_buf_pos = overhead;
auto * p = reinterpret_cast<uint32_t *>(_buffer.get_write());
std::fill(p, p + overhead, 0);
_segment_manager->totals.total_size += k;
}
/**
* Send any buffer contents to disk and get a new tmp buffer
*/
// See class comment for info
future<sseg_ptr> cycle(size_t s = 0) {
future<sseg_ptr> cycle() {
auto size = clear_buffer_slack();
auto buf = std::move(_buffer);
auto off = _file_pos;
@@ -479,36 +603,6 @@ public:
_file_pos += size;
_buf_pos = 0;
// if we need new buffer, get one.
// TODO: keep a queue of available buffers?
if (s > 0) {
auto overhead = segment_overhead_size;
if (_file_pos == 0) {
overhead += descriptor_header_size;
}
auto a = align_up(s + overhead, alignment);
auto k = std::max(a, default_size);
for (;;) {
try {
_buffer = _segment_manager->acquire_buffer(k);
break;
} catch (std::bad_alloc&) {
logger.warn("Could not allocate {} k bytes output buffer ({} k required)", k / 1024, a / 1024);
if (k > a) {
k = std::max(a, k / 2);
logger.debug("Trying reduced size: {} k", k / 1024);
continue;
}
throw;
}
}
_buf_pos = overhead;
auto * p = reinterpret_cast<uint32_t *>(_buffer.get_write());
std::fill(p, p + overhead, 0);
_segment_manager->totals.total_size += k;
}
auto me = shared_from_this();
assert(!me.owned());
@@ -545,13 +639,15 @@ public:
out.write(uint32_t(_file_pos));
out.write(crc.checksum());
forget_schema_versions();
// acquire read lock
return _dwrite.read_lock().then([this, size, off, buf = std::move(buf), me]() mutable {
return begin_write().then([this, size, off, buf = std::move(buf), me]() mutable {
auto written = make_lw_shared<size_t>(0);
auto p = buf.get();
_segment_manager->begin_op();
return repeat([this, size, off, written, p]() mutable {
return _file.dma_write(off + *written, p + *written, size - *written).then_wrapped([this, size, written](auto&& f) {
auto&& priority_class = service::get_local_commitlog_priority();
return _file.dma_write(off + *written, p + *written, size - *written, priority_class).then_wrapped([this, size, written](future<size_t>&& f) {
try {
auto bytes = std::get<0>(f.get());
*written += bytes;
@@ -575,20 +671,59 @@ public:
});
}).finally([this, buf = std::move(buf)]() mutable {
_segment_manager->release_buffer(std::move(buf));
_segment_manager->end_op();
});
}).then([me] {
return make_ready_future<sseg_ptr>(std::move(me));
}).finally([me, this]() {
_dwrite.read_unlock(); // release
end_write(); // release
});
}
future<sseg_ptr> maybe_wait_for_write(future<sseg_ptr> f) {
if (_segment_manager->should_wait_for_write()) {
++_write_waiters;
logger.trace("Too many pending writes. Must wait.");
return f.finally([this] {
if (--_write_waiters == 0) {
_queue.signal(_queue.waiters());
}
});
}
return make_ready_future<sseg_ptr>(shared_from_this());
}
/**
* If an allocation causes a write, and the write causes a block,
* any allocations after that need to wait for it to finish;
* otherwise we would just keep building up the write queue
* (and lose more ordering).
*
* Some caution here: since maybe_wait_for_write actually
* releases _all_ queued-up ops when finishing, we could get
* "bursts" of alloc->write, causing build-ups anyway.
* This should be measured properly. For now I am hoping this
* will work out as these should "block as a group". However,
* buffer memory usage might grow...
*/
bool must_wait_for_alloc() {
return _write_waiters > 0;
}
future<sseg_ptr> wait_for_alloc() {
auto me = shared_from_this();
++_segment_manager->totals.pending_allocations;
logger.trace("Previous allocation is blocking. Must wait.");
return _queue.wait().then([me] { // TODO: do we need a finally?
--me->_segment_manager->totals.pending_allocations;
return make_ready_future<sseg_ptr>(me);
});
}
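The waiter bookkeeping in maybe_wait_for_write() and wait_for_alloc() can be modeled without Seastar: count blocked writes, park later allocations, and release the whole queue when the last blocked write completes. A minimal synchronous sketch of that policy (names and the callback queue are illustrative):

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Synchronous model of the segment's alloc gating: allocations that
// arrive while any write is blocked are parked in `queued`, and the last
// finishing write releases them all as a group (the "burst" behaviour the
// comment above says should be measured).
struct alloc_gate {
    unsigned write_waiters = 0;
    std::deque<std::function<void()>> queued;

    bool must_wait() const { return write_waiters > 0; }
    void write_blocked() { ++write_waiters; }
    void write_done() {
        if (--write_waiters == 0) {
            for (auto& resume : queued) {
                resume();   // release queued allocations in one burst
            }
            queued.clear();
        }
    }
};
```

Note that the first write_done() releases nothing while another write is still blocked; only the last one drains the queue, which is why build-ups "block as a group".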
/**
* Add a "mutation" to the segment.
*/
future<replay_position> allocate(const cf_id_type& id, size_t size,
serializer_func func) {
future<replay_position> allocate(const cf_id_type& id, shared_ptr<entry_writer> writer) {
const auto size = writer->size(*this);
const auto s = size + entry_overhead_size; // total size
if (s > _segment_manager->max_mutation_size) {
return make_exception_future<replay_position>(
@@ -597,23 +732,26 @@ public:
+ " bytes is too large for the maximum size of "
+ std::to_string(_segment_manager->max_mutation_size)));
}
// would we make the file too big?
for (;;) {
if (position() + s > _segment_manager->max_size) {
// do this in next segment instead.
return finish_and_get_new().then(
[id, size, func = std::move(func)](auto new_seg) {
return new_seg->allocate(id, size, func);
});
}
// enough data?
if (s > (_buffer.size() - _buf_pos)) {
// TODO: if we have too many writes running, maybe we should
// wait for this?
cycle(s);
continue; // re-check file size overflow
}
break;
std::experimental::optional<future<sseg_ptr>> op;
if (must_sync()) {
op = sync();
} else if (must_wait_for_alloc()) {
op = wait_for_alloc();
} else if (!is_still_allocating() || position() + s > _segment_manager->max_size) { // would we make the file too big?
// do this in next segment instead.
op = finish_and_get_new();
} else if (_buffer.empty()) {
new_buffer(s);
} else if (s > (_buffer.size() - _buf_pos)) { // enough data?
op = maybe_wait_for_write(cycle());
}
if (op) {
return op->then([id, writer = std::move(writer)] (sseg_ptr new_seg) mutable {
return new_seg->allocate(id, std::move(writer));
});
}
_gate.enter(); // this might throw. I guess we accept this?
@@ -634,7 +772,7 @@ public:
out.write(crc.checksum());
// actual data
func(out);
writer->write(*this, out);
crc.process_bytes(p + 2 * sizeof(uint32_t), size);
@@ -645,9 +783,8 @@ public:
_gate.leave();
// finally, check if we're required to sync.
if (must_sync()) {
return sync().then([rp](auto seg) {
if (_segment_manager->cfg.mode == sync_mode::BATCH) {
return sync().then([rp](sseg_ptr) {
return make_ready_future<replay_position>(rp);
});
}
@@ -736,7 +873,7 @@ db::commitlog::segment_manager::list_descriptors(sstring dirname) {
}
return make_ready_future<std::experimental::optional<directory_entry_type>>(de.type);
};
return entry_type(de).then([this, de](auto type) {
return entry_type(de).then([this, de](std::experimental::optional<directory_entry_type> type) {
if (type == directory_entry_type::regular && de.name[0] != '.') {
try {
_result.emplace_back(de.name);
@@ -753,7 +890,7 @@ db::commitlog::segment_manager::list_descriptors(sstring dirname) {
}
};
return engine().open_directory(dirname).then([this, dirname](auto dir) {
return engine().open_directory(dirname).then([this, dirname](file dir) {
auto h = make_lw_shared<helper>(std::move(dirname), std::move(dir));
return h->done().then([h]() {
return make_ready_future<std::vector<db::commitlog::descriptor>>(std::move(h->_result));
@@ -762,7 +899,7 @@ db::commitlog::segment_manager::list_descriptors(sstring dirname) {
}
future<> db::commitlog::segment_manager::init() {
return list_descriptors(cfg.commit_log_location).then([this](auto descs) {
return list_descriptors(cfg.commit_log_location).then([this](std::vector<descriptor> descs) {
segment_id_type id = std::chrono::duration_cast<std::chrono::milliseconds>(runtime::get_boot_time().time_since_epoch()).count() + 1;
for (auto& d : descs) {
id = std::max(id, replay_position(d.id).base_id());
@@ -832,9 +969,23 @@ scollectd::registrations db::commitlog::segment_manager::create_counters() {
),
add_polled_metric(type_instance_id("commitlog"
, per_cpu_plugin_instance, "queue_length", "pending_operations")
, make_typed(data_type::GAUGE, totals.pending_operations)
, per_cpu_plugin_instance, "queue_length", "pending_writes")
, make_typed(data_type::GAUGE, totals.pending_writes)
),
add_polled_metric(type_instance_id("commitlog"
, per_cpu_plugin_instance, "queue_length", "pending_flushes")
, make_typed(data_type::GAUGE, totals.pending_flushes)
),
add_polled_metric(type_instance_id("commitlog"
, per_cpu_plugin_instance, "total_operations", "write_limit_exceeded")
, make_typed(data_type::DERIVE, totals.write_limit_exceeded)
),
add_polled_metric(type_instance_id("commitlog"
, per_cpu_plugin_instance, "total_operations", "flush_limit_exceeded")
, make_typed(data_type::DERIVE, totals.flush_limit_exceeded)
),
add_polled_metric(type_instance_id("commitlog"
, per_cpu_plugin_instance, "memory", "total_size")
, make_typed(data_type::GAUGE, totals.total_size)
@@ -963,7 +1114,7 @@ std::ostream& db::operator<<(std::ostream& out, const db::replay_position& p) {
void db::commitlog::segment_manager::discard_unused_segments() {
logger.trace("Checking for unused segments ({} active)", _segments.size());
auto i = std::remove_if(_segments.begin(), _segments.end(), [=](auto s) {
auto i = std::remove_if(_segments.begin(), _segments.end(), [=](sseg_ptr s) {
if (s->can_delete()) {
logger.debug("Segment {} is unused", *s);
return true;
@@ -1057,7 +1208,7 @@ void db::commitlog::segment_manager::on_timer() {
return this->allocate_segment(false).then([this](sseg_ptr s) {
if (!_shutdown) {
// insertion sort.
auto i = std::upper_bound(_reserve_segments.begin(), _reserve_segments.end(), s, [](sseg_ptr s1, sseg_ptr s2) {
const descriptor& d1 = s1->_desc;
const descriptor& d2 = s2->_desc;
return d1.id < d2.id;
@@ -1069,7 +1220,7 @@ void db::commitlog::segment_manager::on_timer() {
--_reserve_allocating;
});
});
}).handle_exception([](std::exception_ptr ep) {
logger.warn("Exception in segment reservation: {}", ep);
});
arm();
@@ -1086,6 +1237,19 @@ std::vector<sstring> db::commitlog::segment_manager::get_active_names() const {
return res;
}
uint64_t db::commitlog::segment_manager::get_num_dirty_segments() const {
return std::count_if(_segments.begin(), _segments.end(), [](sseg_ptr s) {
return !s->is_still_allocating() && !s->is_clean();
});
}
uint64_t db::commitlog::segment_manager::get_num_active_segments() const {
return std::count_if(_segments.begin(), _segments.end(), [](sseg_ptr s) {
return s->is_still_allocating();
});
}
db::commitlog::segment_manager::buffer_type db::commitlog::segment_manager::acquire_buffer(size_t s) {
auto i = _temp_buffers.begin();
auto e = _temp_buffers.end();
@@ -1128,8 +1292,44 @@ void db::commitlog::segment_manager::release_buffer(buffer_type&& b) {
*/
future<db::replay_position> db::commitlog::add(const cf_id_type& id,
size_t size, serializer_func func) {
class serializer_func_entry_writer final : public entry_writer {
serializer_func _func;
size_t _size;
public:
serializer_func_entry_writer(size_t sz, serializer_func func)
: _func(std::move(func)), _size(sz)
{ }
virtual size_t size(segment&) override { return _size; }
virtual void write(segment&, output& out) override {
_func(out);
}
};
auto writer = ::make_shared<serializer_func_entry_writer>(size, std::move(func));
return _segment_manager->active_segment().then([id, writer] (auto s) {
return s->allocate(id, writer);
});
}
future<db::replay_position> db::commitlog::add_entry(const cf_id_type& id, const commitlog_entry_writer& cew)
{
class cl_entry_writer final : public entry_writer {
commitlog_entry_writer _writer;
public:
cl_entry_writer(const commitlog_entry_writer& wr) : _writer(wr) { }
virtual size_t size(segment& seg) override {
_writer.set_with_schema(!seg.is_schema_version_known(_writer.schema()));
return _writer.size();
}
virtual void write(segment& seg, output& out) override {
if (_writer.with_schema()) {
seg.add_schema_version(_writer.schema());
}
_writer.write(out);
}
};
auto writer = ::make_shared<cl_entry_writer>(cew);
return _segment_manager->active_segment().then([id, writer] (auto s) {
return s->allocate(id, writer);
});
}
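Both writers above follow the same two-phase contract: the segment calls size() first (which may flip internal state, as cl_entry_writer does via set_with_schema), then write() must emit exactly that many bytes into the reserved space. A minimal stand-alone sketch of that contract, with hypothetical output_buf/string_entry_writer stand-ins rather than the actual Scylla segment/output types:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the segment's output sink.
struct output_buf {
    std::vector<char> bytes;
    void write(const char* p, size_t n) { bytes.insert(bytes.end(), p, p + n); }
};

struct entry_writer {
    virtual size_t size() = 0;               // phase 1: how many bytes to reserve
    virtual void write(output_buf& out) = 0; // phase 2: emit exactly size() bytes
    virtual ~entry_writer() = default;
};

struct string_entry_writer final : entry_writer {
    std::string payload;
    explicit string_entry_writer(std::string p) : payload(std::move(p)) {}
    size_t size() override { return payload.size(); }
    void write(output_buf& out) override { out.write(payload.data(), payload.size()); }
};

size_t append_entry(entry_writer& w, output_buf& out) {
    size_t reserved = w.size();       // the segment would allocate this much space
    size_t before = out.bytes.size();
    w.write(out);
    assert(out.bytes.size() - before == reserved); // contract: exact fill
    return reserved;
}
```

Because size() may mutate the writer (the with-schema decision), it must be asked once per allocation attempt, before any bytes are written.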
@@ -1200,11 +1400,18 @@ future<> db::commitlog::shutdown() {
return _segment_manager->shutdown();
}
size_t db::commitlog::max_record_size() const {
return _segment_manager->max_mutation_size - segment::entry_overhead_size;
}
uint64_t db::commitlog::max_active_writes() const {
return _segment_manager->cfg.max_active_writes;
}
uint64_t db::commitlog::max_active_flushes() const {
return _segment_manager->cfg.max_active_flushes;
}
future<> db::commitlog::clear() {
return _segment_manager->clear();
}
@@ -1386,10 +1593,6 @@ db::commitlog::read_log_file(file f, commit_load_reader_func next, position_type
return skip(slack);
}
if (start_off > pos) {
return skip(size - entry_header_size);
}
return fin.read_exactly(size - entry_header_size).then([this, size, crc = std::move(crc), rp](temporary_buffer<char> buf) mutable {
advance(buf);
@@ -1459,7 +1662,28 @@ uint64_t db::commitlog::get_flush_count() const {
}
uint64_t db::commitlog::get_pending_tasks() const {
return _segment_manager->totals.pending_writes
+ _segment_manager->totals.pending_flushes;
}
uint64_t db::commitlog::get_pending_writes() const {
return _segment_manager->totals.pending_writes;
}
uint64_t db::commitlog::get_pending_flushes() const {
return _segment_manager->totals.pending_flushes;
}
uint64_t db::commitlog::get_pending_allocations() const {
return _segment_manager->totals.pending_allocations;
}
uint64_t db::commitlog::get_write_limit_exceeded_count() const {
return _segment_manager->totals.write_limit_exceeded;
}
uint64_t db::commitlog::get_flush_limit_exceeded_count() const {
return _segment_manager->totals.flush_limit_exceeded;
}
uint64_t db::commitlog::get_num_segments_created() const {
@@ -1470,6 +1694,14 @@ uint64_t db::commitlog::get_num_segments_destroyed() const {
return _segment_manager->totals.segments_destroyed;
}
uint64_t db::commitlog::get_num_dirty_segments() const {
return _segment_manager->get_num_dirty_segments();
}
uint64_t db::commitlog::get_num_active_segments() const {
return _segment_manager->get_num_active_segments();
}
future<std::vector<db::commitlog::descriptor>> db::commitlog::list_existing_descriptors() const {
return list_existing_descriptors(active_config().commit_log_location);
}


@@ -48,6 +48,7 @@
#include "core/stream.hh"
#include "utils/UUID.hh"
#include "replay_position.hh"
#include "commitlog_entry.hh"
class file;
@@ -114,6 +115,10 @@ public:
// Max number of segments to keep in pre-alloc reserve.
// Not (yet) configurable from scylla.conf.
uint64_t max_reserve_segments = 12;
// Max active writes/flushes. Default value
// zero means try to figure it out ourselves
uint64_t max_active_writes = 0;
uint64_t max_active_flushes = 0;
sync_mode mode = sync_mode::PERIODIC;
};
@@ -181,6 +186,13 @@ public:
});
}
/**
* Add an entry to the commit log.
*
* @param entry_writer a writer responsible for writing the entry
*/
future<replay_position> add_entry(const cf_id_type& id, const commitlog_entry_writer& entry_writer);
/**
* Modifies the per-CF dirty cursors of any commit log segments for the column family according to the position
* given. Discards any commit log segments that are no longer used.
@@ -233,14 +245,37 @@ public:
uint64_t get_completed_tasks() const;
uint64_t get_flush_count() const;
uint64_t get_pending_tasks() const;
uint64_t get_pending_writes() const;
uint64_t get_pending_flushes() const;
uint64_t get_pending_allocations() const;
uint64_t get_write_limit_exceeded_count() const;
uint64_t get_flush_limit_exceeded_count() const;
uint64_t get_num_segments_created() const;
uint64_t get_num_segments_destroyed() const;
/**
* Get number of inactive (finished) segments lingering
* due to still being dirty
*/
uint64_t get_num_dirty_segments() const;
/**
* Get number of active segments, i.e. still being allocated to
*/
uint64_t get_num_active_segments() const;
/**
* Returns the largest amount of data that can be written in a single "mutation".
*/
size_t max_record_size() const;
/**
* Return max allowed pending writes (per this shard)
*/
uint64_t max_active_writes() const;
/**
* Return max allowed pending flushes (per this shard)
*/
uint64_t max_active_flushes() const;
future<> clear();
const config& active_config() const;
@@ -283,6 +318,11 @@ public:
const sstring&, commit_load_reader_func, position_type = 0);
private:
commitlog(config);
struct entry_writer {
virtual size_t size(segment&) = 0;
virtual void write(segment&, output&) = 0;
};
};
}


@@ -0,0 +1,88 @@
/*
* Copyright 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <experimental/optional>
#include "frozen_mutation.hh"
#include "schema.hh"
namespace stdx = std::experimental;
class commitlog_entry_writer {
schema_ptr _schema;
db::serializer<column_mapping> _column_mapping_serializer;
const frozen_mutation& _mutation;
bool _with_schema = true;
public:
commitlog_entry_writer(schema_ptr s, const frozen_mutation& fm)
: _schema(std::move(s)), _column_mapping_serializer(_schema->get_column_mapping()), _mutation(fm)
{ }
void set_with_schema(bool value) {
_with_schema = value;
}
bool with_schema() {
return _with_schema;
}
schema_ptr schema() const {
return _schema;
}
size_t size() const {
size_t size = data_output::serialized_size<bool>();
if (_with_schema) {
size += _column_mapping_serializer.size();
}
size += _mutation.representation().size();
return size;
}
void write(data_output& out) const {
out.write(_with_schema);
if (_with_schema) {
_column_mapping_serializer.write(out);
}
auto bv = _mutation.representation();
out.write(bv.begin(), bv.end());
}
};
class commitlog_entry_reader {
frozen_mutation _mutation;
stdx::optional<column_mapping> _column_mapping;
public:
commitlog_entry_reader(const temporary_buffer<char>& buffer)
: _mutation(bytes())
{
data_input in(buffer);
bool has_column_mapping = in.read<bool>();
if (has_column_mapping) {
_column_mapping = db::serializer<::column_mapping>::read(in);
}
auto bv = in.read_view(in.avail());
_mutation = frozen_mutation(bytes(bv.begin(), bv.end()));
}
const stdx::optional<column_mapping>& get_column_mapping() const { return _column_mapping; }
const frozen_mutation& mutation() const { return _mutation; }
};
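commitlog_entry_writer and commitlog_entry_reader frame each entry as a boolean flag, an optional column-mapping blob, then the raw mutation bytes. A simplified round-trip sketch of that framing, using plain byte vectors and a demo 1-byte length prefix instead of the db::serializer machinery:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Simplified frame: [has_schema: 1 byte][schema bytes, if any][payload bytes].
std::vector<uint8_t> encode(const std::optional<std::string>& schema,
                            const std::string& payload) {
    std::vector<uint8_t> out;
    out.push_back(schema ? 1 : 0);
    if (schema) {
        out.push_back(static_cast<uint8_t>(schema->size())); // demo-only 1-byte length
        out.insert(out.end(), schema->begin(), schema->end());
    }
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

std::pair<std::optional<std::string>, std::string>
decode(const std::vector<uint8_t>& in) {
    size_t pos = 0;
    std::optional<std::string> schema;
    if (in.at(pos++) == 1) {
        size_t len = in.at(pos++);
        schema = std::string(in.begin() + pos, in.begin() + pos + len);
        pos += len;
    }
    return {schema, std::string(in.begin() + pos, in.end())};
}
```

The point of the flag is the same as in the real writer: the (comparatively large) schema description is written only the first time a segment sees a given schema version; later entries carry just the payload.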


@@ -56,10 +56,14 @@
#include "db/serializer.hh"
#include "cql3/query_processor.hh"
#include "log.hh"
#include "converting_mutation_partition_applier.hh"
#include "schema_registry.hh"
#include "commitlog_entry.hh"
static logging::logger logger("commitlog_replayer");
class db::commitlog_replayer::impl {
std::unordered_map<table_schema_version, column_mapping> _column_mappings;
public:
impl(seastar::sharded<cql3::query_processor>& db);
@@ -70,6 +74,19 @@ public:
uint64_t skipped_mutations = 0;
uint64_t applied_mutations = 0;
uint64_t corrupt_bytes = 0;
stats& operator+=(const stats& s) {
invalid_mutations += s.invalid_mutations;
skipped_mutations += s.skipped_mutations;
applied_mutations += s.applied_mutations;
corrupt_bytes += s.corrupt_bytes;
return *this;
}
stats operator+(const stats& s) const {
stats tmp = *this;
tmp += s;
return tmp;
}
};
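Giving stats an operator+= and operator+ turns it into a monoid, which is what lets the replayer fold per-file results with map_reduce(files, ..., impl::stats(), std::plus<impl::stats>()) further down. A reduced, future-free model of that fold:

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

struct stats {
    uint64_t invalid_mutations = 0;
    uint64_t skipped_mutations = 0;
    uint64_t applied_mutations = 0;
    stats& operator+=(const stats& s) {
        invalid_mutations += s.invalid_mutations;
        skipped_mutations += s.skipped_mutations;
        applied_mutations += s.applied_mutations;
        return *this;
    }
    stats operator+(const stats& s) const { stats t = *this; t += s; return t; }
};

// What the map_reduce over files boils down to once every per-file
// future has resolved: a plain left fold with stats{} as identity.
stats fold(const std::vector<stats>& per_file) {
    return std::accumulate(per_file.begin(), per_file.end(), stats{});
}
```

Since addition here is associative and stats{} is the identity, the per-file results can arrive and be combined in any order, which is exactly what an asynchronous map_reduce needs.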
future<> process(stats*, temporary_buffer<char> buf, replay_position rp);
@@ -148,8 +165,6 @@ future<> db::commitlog_replayer::impl::init() {
future<db::commitlog_replayer::impl::stats>
db::commitlog_replayer::impl::recover(sstring file) {
replay_position rp{commitlog::descriptor(file)};
auto gp = _min_pos[rp.shard_id()];
@@ -182,19 +197,29 @@ db::commitlog_replayer::impl::recover(sstring file) {
}
future<> db::commitlog_replayer::impl::process(stats* s, temporary_buffer<char> buf, replay_position rp) {
try {
commitlog_entry_reader cer(buf);
auto& fm = cer.mutation();
auto cm_it = _column_mappings.find(fm.schema_version());
if (cm_it == _column_mappings.end()) {
if (!cer.get_column_mapping()) {
throw std::runtime_error(sprint("unknown schema version {}", fm.schema_version()));
}
logger.debug("new schema version {} in entry {}", fm.schema_version(), rp);
cm_it = _column_mappings.emplace(fm.schema_version(), *cer.get_column_mapping()).first;
}
auto shard_id = rp.shard_id();
if (rp < _min_pos[shard_id]) {
logger.trace("entry {} is less than global min position. skipping", rp);
s->skipped_mutations++;
return make_ready_future<>();
}
auto uuid = fm.column_family_id();
auto& map = _rpm[shard_id];
auto i = map.find(uuid);
if (i != map.end() && rp <= i->second) {
logger.trace("entry {} at {} is younger than recorded replay position {}. skipping", fm.column_family_id(), rp, i->second);
@@ -203,7 +228,8 @@ future<> db::commitlog_replayer::impl::process(stats* s, temporary_buffer<char>
}
auto shard = _qp.local().db().local().shard_of(fm);
return _qp.local().db().invoke_on(shard, [this, cer = std::move(cer), cm_it, rp, shard, s] (database& db) -> future<> {
auto& fm = cer.mutation();
// TODO: might need better verification that the deserialized mutation
// is schema compatible. My guess is that just applying the mutation
// will not do this.
@@ -219,8 +245,11 @@ future<> db::commitlog_replayer::impl::process(stats* s, temporary_buffer<char>
// their "replay_position" attribute will be empty, which is
// lower than anything the new session will produce.
if (cf.schema()->version() != fm.schema_version()) {
const column_mapping& cm = cm_it->second;
mutation m(fm.decorated_key(*cf.schema()), cf.schema());
converting_mutation_partition_applier v(cm, *cf.schema(), m.partition());
fm.partition().accept(cm, v);
cf.apply(std::move(m));
} else {
cf.apply(fm, cf.schema());
}
@@ -263,32 +292,41 @@ future<db::commitlog_replayer> db::commitlog_replayer::create_replayer(seastar::
}
future<> db::commitlog_replayer::recover(std::vector<sstring> files) {
logger.info("Replaying {}", join(", ", files));
return map_reduce(files, [this](auto f) {
logger.debug("Replaying {}", f);
return _impl->recover(f).then([f](impl::stats stats) {
if (stats.corrupt_bytes != 0) {
logger.warn("Corrupted file: {}. {} bytes skipped.", f, stats.corrupt_bytes);
}
logger.debug("Log replay of {} complete, {} replayed mutations ({} invalid, {} skipped)"
, f
, stats.applied_mutations
, stats.invalid_mutations
, stats.skipped_mutations
);
return make_ready_future<impl::stats>(stats);
}).handle_exception([f](auto ep) -> future<impl::stats> {
logger.error("Error recovering {}: {}", f, ep);
try {
std::rethrow_exception(ep);
} catch (std::invalid_argument&) {
logger.error("Scylla cannot process {}. Make sure to fully flush all Cassandra commit log files to sstable before migrating.", f);
throw;
} catch (...) {
throw;
}
});
}, impl::stats(), std::plus<impl::stats>()).then([](impl::stats totals) {
logger.info("Log replay complete, {} replayed mutations ({} invalid, {} skipped)"
, totals.applied_mutations
, totals.invalid_mutations
, totals.skipped_mutations
);
});
}
future<> db::commitlog_replayer::recover(sstring f) {
return recover(std::vector<sstring>{ f });
}


@@ -30,6 +30,7 @@
#include "core/shared_ptr.hh"
#include "core/fstream.hh"
#include "core/do_with.hh"
#include "core/print.hh"
#include "log.hh"
#include <boost/any.hpp>
@@ -432,3 +433,9 @@ boost::filesystem::path db::config::get_conf_dir() {
return confdir;
}
void db::config::check_experimental(const sstring& what) const {
if (!experimental()) {
throw std::runtime_error(sprint("%s is currently disabled. Start Scylla with --experimental=on to enable.", what));
}
}


@@ -102,6 +102,9 @@ public:
config();
// Throws exception if experimental feature is disabled.
void check_experimental(const sstring& what) const;
boost::program_options::options_description
get_options_description();
@@ -265,7 +268,7 @@ public:
"Counter writes read the current values before incrementing and writing them back. The recommended value is (16 × number_of_drives)." \
) \
/* Common automatic backup settings */ \
val(incremental_backups, bool, false, Used, \
"Backs up data updated since the last snapshot was taken. When enabled, Cassandra creates a hard link to each SSTable flushed or streamed locally in a backups/ subdirectory of the keyspace data. Removing these links is the operator's responsibility.\n" \
"Related information: Enabling incremental backups" \
) \
@@ -690,7 +693,7 @@ public:
val(ssl_storage_port, uint32_t, 7001, Used, \
"The SSL port for encrypted communication. Unused unless enabled in encryption_options." \
) \
val(default_log_level, sstring, "info", Used, \
"Default log level for log messages. Valid values are trace, debug, info, warn, error.") \
val(logger_log_level, string_map, /* none */, Used,\
"map of logger name to log level. Valid values are trace, debug, info, warn, error. " \
@@ -715,8 +718,10 @@ public:
val(replace_address_first_boot, sstring, "", Used, "Like replace_address option, but if the node has been bootstrapped successfully it will be ignored. Same as -Dcassandra.replace_address_first_boot.") \
val(override_decommission, bool, false, Used, "Set true to force a decommissioned node to join the cluster") \
val(ring_delay_ms, uint32_t, 30 * 1000, Used, "Time a node waits to hear from other nodes before joining the ring in milliseconds. Same as -Dcassandra.ring_delay_ms in cassandra.") \
val(shutdown_announce_in_ms, uint32_t, 2 * 1000, Used, "Time a node waits after sending gossip shutdown message in milliseconds. Same as -Dcassandra.shutdown_announce_in_ms in cassandra.") \
val(developer_mode, bool, false, Used, "Relax environment checks. Setting to true can reduce performance and reliability significantly.") \
val(skip_wait_for_gossip_to_settle, int32_t, -1, Used, "An integer to configure the wait for gossip to settle. -1: wait normally, 0: do not wait at all, n: wait for at most n polls. Same as -Dcassandra.skip_wait_for_gossip_to_settle in cassandra.") \
val(experimental, bool, false, Used, "Set to true to unlock experimental features.") \
/* done! */
#define _make_value_member(name, type, deflt, status, desc, ...) \
@@ -733,5 +738,4 @@ private:
int _dummy;
};
}


@@ -50,16 +50,20 @@ namespace db {
namespace marshal {
type_parser::type_parser(sstring_view str, size_t idx)
: _str{str.begin(), str.end()}
, _idx{idx}
{ }
type_parser::type_parser(sstring_view str)
: type_parser{str, 0}
{ }
data_type type_parser::parse(sstring_view str) {
return type_parser(str).parse();
}


@@ -62,14 +62,15 @@ class type_parser {
public static final TypeParser EMPTY_PARSER = new TypeParser("", 0);
#endif
type_parser(sstring_view str, size_t idx);
public:
explicit type_parser(sstring_view str);
/**
* Parse a string containing an type definition.
*/
static data_type parse(sstring_view str);
#if 0
public static AbstractType<?> parse(CharSequence compareWith) throws SyntaxException, ConfigurationException


@@ -415,16 +415,16 @@ future<std::vector<frozen_mutation>> convert_schema_to_mutations(distributed<ser
if (partition_key == system_keyspace::NAME) {
continue;
}
results.emplace_back(std::move(p.mut()));
}
return results;
});
};
auto reduce = [] (auto&& result, auto&& mutations) {
std::move(mutations.begin(), mutations.end(), std::back_inserter(result));
return std::move(result);
};
return map_reduce(ALL.begin(), ALL.end(), map, std::vector<frozen_mutation>{}, reduce);
}
future<schema_result>
@@ -703,6 +703,8 @@ static void merge_tables(distributed<service::storage_proxy>& proxy,
auto& ks = db.find_keyspace(s->ks_name());
auto cfg = ks.make_column_family_config(*s);
db.add_column_family(s, cfg);
auto& cf = db.find_column_family(s);
cf.mark_ready_for_writes();
ks.make_directory_for_column_family(s->cf_name(), s->id()).get();
service::get_local_migration_manager().notify_create_column_family(s);
}
@@ -1327,8 +1329,10 @@ schema_ptr create_table_from_mutations(schema_mutations sm, std::experimental::o
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
auto comparator = table_row.get_nonnull<sstring>("comparator");
bool is_compound = cell_comparator::check_compound(comparator);
builder.set_is_compound(is_compound);
cell_comparator::read_collections(builder, comparator);
#if 0
CellNameType comparator = CellNames.fromAbstractType(fullRawComparator, isDense);


@@ -251,39 +251,6 @@ std::ostream& operator<<(std::ostream& out, const ring_position& pos) {
return out << "}";
}
unsigned shard_of(const token& t) {
return global_partitioner().shard_of(t);
}
@@ -296,29 +263,6 @@ int token_comparator::operator()(const token& t1, const token& t2) const {
return tri_compare(t1, t2);
}
bool ring_position::equal(const schema& s, const ring_position& other) const {
return tri_compare(s, other) == 0;
}


@@ -97,11 +97,6 @@ public:
bool is_maximum() const {
return _kind == kind::after_all_keys;
}
};
token midpoint_unsigned(const token& t1, const token& t2);
@@ -338,6 +333,12 @@ public:
, _key(std::experimental::make_optional(std::move(key)))
{ }
ring_position(dht::token token, token_bound bound, std::experimental::optional<partition_key> key)
: _token(std::move(token))
, _token_bound(bound)
, _key(std::move(key))
{ }
ring_position(const dht::decorated_key& dk)
: _token(dk._token)
, _key(std::experimental::make_optional(dk._key))
@@ -379,10 +380,6 @@ public:
// "less" comparator corresponding to tri_compare()
bool less_compare(const schema&, const ring_position&) const;
friend std::ostream& operator<<(std::ostream&, const ring_position&);
};


@@ -107,7 +107,7 @@ public:
, _tokens(std::move(tokens))
, _address(address)
, _description(std::move(description))
, _stream_plan(_description) {
}
range_streamer(distributed<database>& db, token_metadata& tm, inet_address address, sstring description)


@@ -30,19 +30,20 @@ if [ ! -f variables.json ]; then
fi
if [ ! -d packer ]; then
wget https://releases.hashicorp.com/packer/0.8.6/packer_0.8.6_linux_amd64.zip
mkdir packer
cd packer
unzip -x ../packer_0.8.6_linux_amd64.zip
cd -
fi
echo "sudo yum remove -y abrt" > scylla_deploy.sh
if [ $LOCALRPM = 0 ]; then
echo "sudo sh -x -e /home/centos/scylla_install_pkg" >> scylla_deploy.sh
else
echo "sudo sh -x -e /home/centos/scylla_install_pkg -l /home/centos" >> scylla_deploy.sh
fi
echo "sudo sh -x -e /usr/lib/scylla/scylla_setup -a" >> scylla_deploy.sh
chmod a+rx scylla_deploy.sh
packer/packer build -var-file=variables.json scylla.json

dist/ami/scylla.json vendored

@@ -8,16 +8,52 @@
"security_group_id": "{{user `security_group_id`}}",
"region": "{{user `region`}}",
"associate_public_ip_address": "{{user `associate_public_ip_address`}}",
"source_ami": "ami-f3102499",
"user_data_file": "user_data.txt",
"instance_type": "{{user `instance_type`}}",
"ssh_username": "centos",
"ssh_timeout": "5m",
"ami_name": "scylla_{{isotime | clean_ami_name}}",
"enhanced_networking": true,
"launch_block_device_mappings": [
{
"device_name": "/dev/sda1",
"volume_size": 10,
"delete_on_termination": true
}
],
"ami_block_device_mappings": [
{
"device_name": "/dev/sdb",
"virtual_name": "ephemeral0"
},
{
"device_name": "/dev/sdc",
"virtual_name": "ephemeral1"
},
{
"device_name": "/dev/sdd",
"virtual_name": "ephemeral2"
},
{
"device_name": "/dev/sde",
"virtual_name": "ephemeral3"
},
{
"device_name": "/dev/sdf",
"virtual_name": "ephemeral4"
},
{
"device_name": "/dev/sdg",
"virtual_name": "ephemeral5"
},
{
"device_name": "/dev/sdh",
"virtual_name": "ephemeral6"
},
{
"device_name": "/dev/sdi",
"virtual_name": "ephemeral7"
}
]
}


@@ -4,7 +4,6 @@
. /etc/os-release
. /etc/sysconfig/scylla-server
if [ ! -f /etc/default/grub ]; then
echo "Unsupported bootloader"
exit 1
@@ -18,7 +17,7 @@ fi
sed -e "s#^GRUB_CMDLINE_LINUX=\"#GRUB_CMDLINE_LINUX=\"hugepagesz=2M hugepages=$NR_HUGEPAGES #" /etc/default/grub > /tmp/grub
mv /tmp/grub /etc/default/grub
if [ "$ID" = "ubuntu" ]; then
grub-mkconfig -o /boot/grub/grub.cfg
else
grub2-mkconfig -o /boot/grub2/grub.cfg
fi


@@ -31,8 +31,8 @@ else
[Coredump]
Storage=external
Compress=yes
ProcessSizeMax=1024G
ExternalSizeMax=1024G
EOS
if [ $SYMLINK = 1 ]; then
rm -rf /var/lib/systemd/coredump


@@ -29,10 +29,13 @@ if [ "$NAME" = "Ubuntu" ]; then
else
yum install -y ntp ntpdate || true
if [ $AMI -eq 1 ]; then
sed -e s#centos.pool.ntp.org#amazon.pool.ntp.org# /etc/ntp.conf > /tmp/ntp.conf
mv /tmp/ntp.conf /etc/ntp.conf
fi
if [ "`systemctl is-active ntpd`" = "active" ]; then
systemctl stop ntpd.service
fi
ntpdate `cat /etc/ntp.conf |grep "^server"|head -n1|awk '{print $2}'`
systemctl enable ntpd.service
systemctl start ntpd.service
fi


@@ -1,39 +1,8 @@
#!/bin/sh -e
if [ "$AMI" = "yes" ] && [ -f /etc/scylla/ami_disabled ]; then
rm /etc/scylla/ami_disabled
exit 1
fi
if [ "$NETWORK_MODE" = "virtio" ]; then


@@ -43,6 +43,13 @@ if [ "`mount|grep /var/lib/scylla`" != "" ]; then
echo "/var/lib/scylla is already mounted"
exit 1
fi
. /etc/os-release
if [ "$NAME" = "Ubuntu" ]; then
apt-get -y install mdadm xfsprogs
else
yum -y install mdadm xfsprogs
fi
mdadm --create --verbose --force --run $RAID --level=0 -c256 --raid-devices=$NR_DISK $DISKS
blockdev --setra 65536 $RAID
mkfs.xfs $RAID -f


@@ -35,24 +35,22 @@ while getopts d:n:al:h OPT; do
esac
done
SYSCONFIG_SETUP_ARGS="-n $NIC"
. /etc/os-release
if [ "$ID" != "ubuntu" ]; then
if [ "`sestatus | awk '{print $3}'`" != "disabled" ]; then
setenforce 0
sed -e "s/enforcing/disabled/" /etc/sysconfig/selinux > /tmp/selinux
mv /tmp/selinux /etc/sysconfig/
fi
if [ $AMI -eq 1 ]; then
SYSCONFIG_SETUP_ARGS="$SYSCONFIG_SETUP_ARGS -N -a"
if [ "$LOCAL_PKG" = "" ]; then
yum update -y
else
SYSCONFIG_SETUP_ARGS="$SYSCONFIG_SETUP_ARGS -k"
fi
grep -v ' - mounts' /etc/cloud/cloud.cfg > /tmp/cloud.cfg
mv /tmp/cloud.cfg /etc/cloud/cloud.cfg
mv /home/centos/scylla-ami/scylla-ami-setup.service /usr/lib/systemd/system/
mv /home/centos/scylla-ami /usr/lib/scylla/scylla-ami
chmod a+rx /usr/lib/scylla/scylla-ami/ds2_configure.py
systemctl daemon-reload
systemctl enable scylla-ami-setup.service
fi
systemctl enable scylla-server.service
systemctl enable scylla-jmx.service
@@ -70,5 +68,5 @@ else
/usr/lib/scylla/scylla_coredump_setup -s
/usr/lib/scylla/scylla_ntp_setup -a
/usr/lib/scylla/scylla_bootparam_setup -a
fi
/usr/lib/scylla/scylla_sysconfig_setup $SYSCONFIG_SETUP_ARGS


@@ -3,17 +3,17 @@
# Copyright (C) 2015 ScyllaDB
print_usage() {
echo "scylla-sysconfig-setup -n eth0 -m posix -p 64 -u scylla -g scylla -r /var/lib/scylla -c /etc/scylla -N -a -k"
echo " -n specify NIC"
echo " -m network mode (posix, dpdk)"
echo " -p number of hugepages"
echo " -u user (dpdk requires root)"
echo " -g group (dpdk requires root)"
echo " -r scylla home directory"
echo " -c scylla config directory"
echo " -N setup NIC's interrupts, RPS, XPS"
echo " -a AMI instance mode"
echo " -d disk count"
exit 1
}
@@ -23,19 +23,9 @@ if [ "$ID" = "ubuntu" ]; then
else
SYSCONFIG=/etc/sysconfig
fi
. $SYSCONFIG/scylla-server
SET_NIC="no"
AMI=no
SCYLLA_ARGS=
DISK_COUNT=0
while getopts n:m:p:u:g:r:c:Nad:h OPT; do
case "$OPT" in
"n")
@@ -53,7 +43,7 @@ while getopts n:m:p:u:g:d:c:Nakh OPT; do
"g")
GROUP=$OPTARG
;;
"d")
"r")
SCYLLA_HOME=$OPTARG
;;
"c")
@@ -65,8 +55,8 @@ while getopts n:m:p:u:g:d:c:Nakh OPT; do
"a")
AMI=yes
;;
"k")
AMI_KEEP_VERSION=yes
"d")
DISK_COUNT=$OPTARG
;;
"h")
print_usage
@@ -79,11 +69,29 @@ echo Setting parameters on $SYSCONFIG/scylla-server
ETHDRV=`/usr/lib/scylla/dpdk_nic_bind.py --status | grep if=$NIC | sed -e "s/^.*drv=//" -e "s/ .*$//"`
ETHPCIID=`/usr/lib/scylla/dpdk_nic_bind.py --status | grep if=$NIC | awk '{print $1}'`
NR_CPU=`cat /proc/cpuinfo |grep processor|wc -l`
if [ $NR_CPU -ge 8 ]; then
NR=$((NR_CPU - 1))
NR_SHARDS=$NR_CPU
if [ $NR_CPU -ge 8 ] && [ "$SET_NIC" = "no" ]; then
NR_SHARDS=$((NR_CPU - 1))
SET_NIC="yes"
SCYLLA_ARGS="--cpuset 1-$NR --smp $NR"
SCYLLA_ARGS="$SCYLLA_ARGS --cpuset 1-$NR_SHARDS --smp $NR_SHARDS"
fi
if [ "$AMI" = "yes" ] && [ $DISK_COUNT -gt 0 ]; then
NR_DISKS=$DISK_COUNT
if [ $NR_DISKS -lt 2 ]; then NR_DISKS=2; fi
NR_REQS=$((32 * $NR_DISKS / 2))
NR_IO_QUEUES=$NR_SHARDS
if [ $(($NR_REQS/$NR_IO_QUEUES)) -lt 4 ]; then
NR_IO_QUEUES=$(($NR_REQS / 4))
fi
NR_REQS=$(($(($NR_REQS / $NR_IO_QUEUES)) * $NR_IO_QUEUES))
SCYLLA_IO="$SCYLLA_IO --num-io-queues $NR_IO_QUEUES --max-io-requests $NR_REQS"
fi
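
The hunk above sizes Scylla's io queues from the disk count and the shard count: budget 16 requests per disk (never fewer than 2 disks), use one queue per shard unless that would leave fewer than 4 requests per queue, then round the request budget down to a multiple of the queue count. Extracted as a standalone function (the name `compute_io_settings` is illustrative, not from the script), the arithmetic is:

```shell
# Sketch of the AMI io-queue sizing logic, under the assumptions above.
compute_io_settings() {
    local nr_shards=$1 disk_count=$2
    local nr_disks=$disk_count
    # Treat a single disk as two so the request budget never drops below 32.
    if [ "$nr_disks" -lt 2 ]; then nr_disks=2; fi
    # 32 * disks / 2 = 16 in-flight requests per disk.
    local nr_reqs=$((32 * nr_disks / 2))
    # One io queue per shard, capped so each queue keeps at least 4 requests.
    local nr_io_queues=$nr_shards
    if [ $((nr_reqs / nr_io_queues)) -lt 4 ]; then
        nr_io_queues=$((nr_reqs / 4))
    fi
    # Round the request budget down to a multiple of the queue count.
    nr_reqs=$(( (nr_reqs / nr_io_queues) * nr_io_queues ))
    echo "--num-io-queues $nr_io_queues --max-io-requests $nr_reqs"
}

compute_io_settings 8 1   # prints: --num-io-queues 8 --max-io-requests 32
compute_io_settings 15 4  # prints: --num-io-queues 15 --max-io-requests 60
```

The rounding step keeps `--max-io-requests` evenly divisible by `--num-io-queues`, so every queue gets the same share of the request budget.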
sed -e s#^NETWORK_MODE=.*#NETWORK_MODE=$NETWORK_MODE# \
-e s#^ETHDRV=.*#ETHDRV=$ETHDRV# \
-e s#^ETHPCIID=.*#ETHPCIID=$ETHPCIID# \
@@ -93,8 +101,8 @@ sed -e s#^NETWORK_MODE=.*#NETWORK_MODE=$NETWORK_MODE# \
-e s#^SCYLLA_HOME=.*#SCYLLA_HOME=$SCYLLA_HOME# \
-e s#^SCYLLA_CONF=.*#SCYLLA_CONF=$SCYLLA_CONF# \
-e s#^SET_NIC=.*#SET_NIC=$SET_NIC# \
-e s#^SCYLLA_ARGS=.*#SCYLLA_ARGS="$SCYLLA_ARGS"# \
-e s#^AMI=.*#AMI="$AMI"# \
-e s#^AMI_KEEP_VERSION=.*#AMI_KEEP_VERSION="$AMI_KEEP_VERSION"# \
-e "s#^SCYLLA_ARGS=.*#SCYLLA_ARGS=\"$SCYLLA_ARGS\"#" \
-e "s#^SCYLLA_IO=.*#SCYLLA_IO=\"$SCYLLA_IO\"#" \
-e s#^AMI=.*#AMI=$AMI# \
$SYSCONFIG/scylla-server > /tmp/scylla-server
mv /tmp/scylla-server $SYSCONFIG/scylla-server
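
The quoting change in the sed lines above matters because `$SCYLLA_ARGS` contains spaces: the whole `s#...#...#` expression must reach sed as a single word, with literal `\"` written into the file so the sysconfig value stays a quoted string. A minimal sketch of the same rewrite-into-temp-then-move pattern (the file is created with `mktemp` here purely for illustration):

```shell
# Build a throwaway sysconfig-style file.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
NETWORK_MODE=posix
SCYLLA_ARGS=""
EOF

SCYLLA_ARGS="--smp 7 --cpuset 1-7"
# Double-quote the sed expression so the space-containing value survives
# word splitting; escape the inner quotes so they land in the file.
sed -e "s#^SCYLLA_ARGS=.*#SCYLLA_ARGS=\"$SCYLLA_ARGS\"#" "$cfg" > "$cfg.new"
mv "$cfg.new" "$cfg"

grep '^SCYLLA_ARGS=' "$cfg"   # prints: SCYLLA_ARGS="--smp 7 --cpuset 1-7"
```

Note that the script's `mv /tmp/scylla-server $SYSCONFIG/scylla-server` crosses filesystems when `/tmp` is a tmpfs, so the replacement is a copy rather than an atomic rename; writing the temp file next to the target would make the final `mv` atomic.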


@@ -1 +1 @@
scylla ALL=(ALL) NOPASSWD: /usr/lib/scylla/scylla_prepare,/usr/lib/scylla/scylla_stop
scylla ALL=(ALL) NOPASSWD:SETENV: /usr/lib/scylla/scylla_prepare,/usr/lib/scylla/scylla_stop


@@ -34,11 +34,14 @@ SCYLLA_HOME=/var/lib/scylla
# scylla config dir
SCYLLA_CONF=/etc/scylla
# additional arguments
SCYLLA_ARGS=""
# scylla arguments (for posix mode)
SCYLLA_ARGS="--log-to-syslog 1 --log-to-stdout 0 --default-log-level info --collectd-address=127.0.0.1:25826 --collectd=1 --collectd-poll-period 3000 --network-stack posix"
## scylla arguments (for dpdk mode)
#SCYLLA_ARGS="--log-to-syslog 1 --log-to-stdout 0 --default-log-level info --collectd-address=127.0.0.1:25826 --collectd=1 --collectd-poll-period 3000 --network-stack native --dpdk-pmd"
# scylla io
SCYLLA_IO=
# setup as AMI instance
AMI=no
# do not upgrade Scylla packages on AMI startup
AMI_KEEP_VERSION=no


@@ -43,7 +43,7 @@ if [ "$ID" = "centos" ]; then
if [ $REBUILD = 1 ]; then
./dist/redhat/centos_dep/build_dependency.sh
else
sudo curl https://s3.amazonaws.com/downloads.scylladb.com/rpm/centos/scylla.repo -o /etc/yum.repos.d/scylla.repo
sudo curl https://s3.amazonaws.com/downloads.scylladb.com/rpm/unstable/centos/master/latest/scylla.repo -o /etc/yum.repos.d/scylla.repo
fi
fi
VERSION=$(./SCYLLA-VERSION-GEN)


@@ -1,5 +1,5 @@
--- binutils.spec 2015-10-19 05:45:55.106745163 +0000
+++ binutils.spec.1 2015-10-19 05:45:55.807742899 +0000
--- binutils.spec.orig 2015-09-30 14:48:25.000000000 +0000
+++ binutils.spec 2016-01-20 14:42:17.856037134 +0000
@@ -17,7 +17,7 @@
%define enable_deterministic_archives 1
@@ -7,7 +7,7 @@
-Name: %{?cross}binutils%{?_with_debug:-debug}
+Name: scylla-%{?cross}binutils%{?_with_debug:-debug}
Version: 2.25
Release: 5%{?dist}
Release: 15%{?dist}
License: GPLv3+
@@ -29,6 +29,7 @@
# instead.
@@ -17,7 +17,7 @@
Source2: binutils-2.19.50.0.1-output-format.sed
Patch01: binutils-2.20.51.0.2-libtool-lib64.patch
@@ -82,6 +83,9 @@
@@ -89,6 +90,9 @@
BuildRequires: texinfo >= 4.0, gettext, flex, bison, zlib-devel
# BZ 920545: We need pod2man in order to build the manual pages.
BuildRequires: /usr/bin/pod2man
@@ -27,7 +27,7 @@
# Required for: ld-bootstrap/bootstrap.exp bootstrap with --static
# It should not be required for: ld-elf/elf.exp static {preinit,init,fini} array
%if %{run_testsuite}
@@ -105,8 +109,8 @@
@@ -112,8 +116,8 @@
%if "%{build_gold}" == "both"
Requires(post): coreutils
@@ -38,7 +38,7 @@
%endif
# On ARM EABI systems, we do want -gnueabi to be part of the
@@ -131,11 +135,12 @@
@@ -138,11 +142,12 @@
%package devel
Summary: BFD and opcodes static and dynamic libraries and header files
Group: System Environment/Libraries
@@ -50,10 +50,10 @@
Requires: zlib-devel
-Requires: binutils = %{version}-%{release}
+Requires: scylla-binutils = %{version}-%{release}
# BZ 1215242: We need touch...
Requires: coreutils
%description devel
This package contains BFD and opcodes static and dynamic libraries.
@@ -411,11 +416,11 @@
@@ -426,11 +431,11 @@
%post
%if "%{build_gold}" == "both"
%__rm -f %{_bindir}/%{?cross}ld
@@ -68,7 +68,7 @@
%endif
%if %{isnative}
/sbin/ldconfig
@@ -433,8 +438,8 @@
@@ -448,8 +453,8 @@
%preun
%if "%{build_gold}" == "both"
if [ $1 = 0 ]; then


@@ -1,5 +1,5 @@
--- boost.spec 2015-05-03 17:32:13.000000000 +0000
+++ boost.spec.1 2015-10-19 06:03:12.670534256 +0000
--- boost.spec.orig 2016-01-15 18:41:47.000000000 +0000
+++ boost.spec 2016-01-20 14:46:47.397663246 +0000
@@ -6,6 +6,11 @@
# We should be able to install directly.
%define boost_docdir __tmp_docdir
@@ -20,9 +20,9 @@
+Name: scylla-boost
+%define orig_name boost
Summary: The free peer-reviewed portable C++ source libraries
Version: 1.57.0
%define version_enc 1_57_0
Release: 6%{?dist}
Version: 1.58.0
%define version_enc 1_58_0
Release: 11%{?dist}
License: Boost and MIT and Python
-%define toplev_dirname %{name}_%{version_enc}
@@ -93,8 +93,8 @@
+Requires: scylla-boost-wave%{?_isa} = %{version}-%{release}
BuildRequires: m4
BuildRequires: libstdc++-devel%{?_isa}
@@ -151,6 +159,7 @@
BuildRequires: libstdc++-devel
@@ -156,6 +164,7 @@
%package atomic
Summary: Run-Time component of boost atomic library
Group: System Environment/Libraries
@@ -102,7 +102,7 @@
%description atomic
@@ -162,7 +171,8 @@
@@ -167,7 +176,8 @@
%package chrono
Summary: Run-Time component of boost chrono library
Group: System Environment/Libraries
@@ -112,7 +112,7 @@
%description chrono
@@ -171,6 +181,7 @@
@@ -176,6 +186,7 @@
%package container
Summary: Run-Time component of boost container library
Group: System Environment/Libraries
@@ -120,7 +120,7 @@
%description container
@@ -183,6 +194,7 @@
@@ -188,6 +199,7 @@
%package context
Summary: Run-Time component of boost context switching library
Group: System Environment/Libraries
@@ -128,7 +128,7 @@
%description context
@@ -192,6 +204,7 @@
@@ -197,6 +209,7 @@
%package coroutine
Summary: Run-Time component of boost coroutine library
Group: System Environment/Libraries
@@ -136,7 +136,7 @@
%description coroutine
Run-Time support for Boost.Coroutine, a library that provides
@@ -203,6 +216,7 @@
@@ -208,6 +221,7 @@
%package date-time
Summary: Run-Time component of boost date-time library
Group: System Environment/Libraries
@@ -144,7 +144,7 @@
%description date-time
@@ -212,7 +226,8 @@
@@ -217,7 +231,8 @@
%package filesystem
Summary: Run-Time component of boost filesystem library
Group: System Environment/Libraries
@@ -154,7 +154,7 @@
%description filesystem
@@ -223,7 +238,8 @@
@@ -228,7 +243,8 @@
%package graph
Summary: Run-Time component of boost graph library
Group: System Environment/Libraries
@@ -164,7 +164,7 @@
%description graph
@@ -243,9 +259,10 @@
@@ -248,9 +264,10 @@
%package locale
Summary: Run-Time component of boost locale library
Group: System Environment/Libraries
@@ -178,7 +178,7 @@
%description locale
@@ -255,6 +272,7 @@
@@ -260,6 +277,7 @@
%package log
Summary: Run-Time component of boost logging library
Group: System Environment/Libraries
@@ -186,7 +186,7 @@
%description log
@@ -265,6 +283,7 @@
@@ -270,6 +288,7 @@
%package math
Summary: Math functions for boost TR1 library
Group: System Environment/Libraries
@@ -194,7 +194,7 @@
%description math
@@ -274,6 +293,7 @@
@@ -279,6 +298,7 @@
%package program-options
Summary: Run-Time component of boost program_options library
Group: System Environment/Libraries
@@ -202,7 +202,7 @@
%description program-options
@@ -284,6 +304,7 @@
@@ -289,6 +309,7 @@
%package python
Summary: Run-Time component of boost python library
Group: System Environment/Libraries
@@ -210,7 +210,7 @@
%description python
@@ -298,6 +319,7 @@
@@ -303,6 +324,7 @@
%package python3
Summary: Run-Time component of boost python library for Python 3
Group: System Environment/Libraries
@@ -218,7 +218,7 @@
%description python3
@@ -310,8 +332,9 @@
@@ -315,8 +337,9 @@
%package python3-devel
Summary: Shared object symbolic links for Boost.Python 3
Group: System Environment/Libraries
@@ -230,7 +230,7 @@
%description python3-devel
@@ -322,6 +345,7 @@
@@ -327,6 +350,7 @@
%package random
Summary: Run-Time component of boost random library
Group: System Environment/Libraries
@@ -238,7 +238,7 @@
%description random
@@ -330,6 +354,7 @@
@@ -335,6 +359,7 @@
%package regex
Summary: Run-Time component of boost regular expression library
Group: System Environment/Libraries
@@ -246,7 +246,7 @@
%description regex
@@ -338,6 +363,7 @@
@@ -343,6 +368,7 @@
%package serialization
Summary: Run-Time component of boost serialization library
Group: System Environment/Libraries
@@ -254,7 +254,7 @@
%description serialization
@@ -346,6 +372,7 @@
@@ -351,6 +377,7 @@
%package signals
Summary: Run-Time component of boost signals and slots library
Group: System Environment/Libraries
@@ -262,7 +262,7 @@
%description signals
@@ -354,6 +381,7 @@
@@ -359,6 +386,7 @@
%package system
Summary: Run-Time component of boost system support library
Group: System Environment/Libraries
@@ -270,7 +270,7 @@
%description system
@@ -364,6 +392,7 @@
@@ -369,6 +397,7 @@
%package test
Summary: Run-Time component of boost test library
Group: System Environment/Libraries
@@ -278,7 +278,7 @@
%description test
@@ -373,7 +402,8 @@
@@ -378,7 +407,8 @@
%package thread
Summary: Run-Time component of boost thread library
Group: System Environment/Libraries
@@ -288,7 +288,7 @@
%description thread
@@ -385,8 +415,9 @@
@@ -390,8 +420,9 @@
%package timer
Summary: Run-Time component of boost timer library
Group: System Environment/Libraries
@@ -300,7 +300,7 @@
%description timer
@@ -397,11 +428,12 @@
@@ -402,11 +433,12 @@
%package wave
Summary: Run-Time component of boost C99/C++ pre-processing library
Group: System Environment/Libraries
@@ -318,7 +318,7 @@
%description wave
@@ -412,27 +444,20 @@
@@ -417,27 +449,20 @@
%package devel
Summary: The Boost C++ headers and shared development libraries
Group: Development/Libraries
@@ -352,7 +352,7 @@
%description static
Static Boost C++ libraries.
@@ -443,11 +468,7 @@
@@ -448,11 +473,7 @@
%if 0%{?rhel} >= 6
BuildArch: noarch
%endif
@@ -365,7 +365,7 @@
%description doc
This package contains the documentation in the HTML format of the Boost C++
@@ -460,7 +481,7 @@
@@ -465,7 +486,7 @@
%if 0%{?rhel} >= 6
BuildArch: noarch
%endif
@@ -374,19 +374,18 @@
%description examples
This package contains example source files distributed with boost.
@@ -471,9 +492,10 @@
@@ -476,8 +497,9 @@
%package openmpi
Summary: Run-Time component of Boost.MPI library
Group: System Environment/Libraries
+Requires: scylla-env
Requires: openmpi%{?_isa}
BuildRequires: openmpi-devel
-Requires: boost-serialization%{?_isa} = %{version}-%{release}
+Requires: scylla-boost-serialization%{?_isa} = %{version}-%{release}
%description openmpi
@@ -483,10 +505,11 @@
@@ -487,10 +509,11 @@
%package openmpi-devel
Summary: Shared library symbolic links for Boost.MPI
Group: System Environment/Libraries
@@ -402,7 +401,7 @@
%description openmpi-devel
@@ -496,9 +519,10 @@
@@ -500,9 +523,10 @@
%package openmpi-python
Summary: Python run-time component of Boost.MPI library
Group: System Environment/Libraries
@@ -416,7 +415,7 @@
%description openmpi-python
@@ -508,8 +532,9 @@
@@ -512,8 +536,9 @@
%package graph-openmpi
Summary: Run-Time component of parallel boost graph library
Group: System Environment/Libraries
@@ -428,12 +427,11 @@
%description graph-openmpi
@@ -526,11 +551,11 @@
@@ -530,10 +555,10 @@
%package mpich
Summary: Run-Time component of Boost.MPI library
Group: System Environment/Libraries
+Requires: scylla-env
Requires: mpich%{?_isa}
BuildRequires: mpich-devel
-Requires: boost-serialization%{?_isa} = %{version}-%{release}
-Provides: boost-mpich2 = %{version}-%{release}
@@ -443,7 +441,7 @@
%description mpich
@@ -540,12 +565,12 @@
@@ -543,12 +568,12 @@
%package mpich-devel
Summary: Shared library symbolic links for Boost.MPI
Group: System Environment/Libraries
@@ -462,7 +460,7 @@
%description mpich-devel
@@ -555,11 +580,11 @@
@@ -558,11 +583,11 @@
%package mpich-python
Summary: Python run-time component of Boost.MPI library
Group: System Environment/Libraries
@@ -479,7 +477,7 @@
%description mpich-python
@@ -569,10 +594,10 @@
@@ -572,10 +597,10 @@
%package graph-mpich
Summary: Run-Time component of parallel boost graph library
Group: System Environment/Libraries
@@ -494,7 +492,7 @@
%description graph-mpich
@@ -586,7 +611,8 @@
@@ -589,7 +614,8 @@
%package build
Summary: Cross platform build system for C++ projects
Group: Development/Tools
@@ -504,7 +502,7 @@
BuildArch: noarch
%description build
@@ -600,6 +626,7 @@
@@ -613,6 +639,7 @@
%package jam
Summary: A low-level build tool
Group: Development/Tools
@@ -512,7 +510,7 @@
%description jam
Boost.Jam (BJam) is the low-level build engine tool for Boost.Build.
@@ -1134,7 +1161,7 @@
@@ -1186,7 +1213,7 @@
%files devel
%defattr(-, root, root, -)
%doc LICENSE_1_0.txt


@@ -12,28 +12,36 @@ sudo yum install -y wget yum-utils rpm-build rpmdevtools gcc gcc-c++ make patch
mkdir -p build/srpms
cd build/srpms
if [ ! -f binutils-2.25-5.fc22.src.rpm ]; then
wget http://download.fedoraproject.org/pub/fedora/linux/releases/22/Everything/source/SRPMS/b/binutils-2.25-5.fc22.src.rpm
if [ ! -f binutils-2.25-15.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/binutils/2.25/15.fc23/src/binutils-2.25-15.fc23.src.rpm
fi
if [ ! -f isl-0.14-3.fc22.src.rpm ]; then
wget http://download.fedoraproject.org/pub/fedora/linux/releases/22/Everything/source/SRPMS/i/isl-0.14-3.fc22.src.rpm
if [ ! -f isl-0.14-4.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/isl/0.14/4.fc23/src/isl-0.14-4.fc23.src.rpm
fi
if [ ! -f gcc-5.1.1-4.fc22.src.rpm ]; then
wget https://s3.amazonaws.com/scylla-centos-dep/gcc-5.1.1-4.fc22.src.rpm
if [ ! -f gcc-5.3.1-2.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/gcc/5.3.1/2.fc23/src/gcc-5.3.1-2.fc23.src.rpm
fi
if [ ! -f boost-1.57.0-6.fc22.src.rpm ]; then
wget http://download.fedoraproject.org/pub/fedora/linux/releases/22/Everything/source/SRPMS/b/boost-1.57.0-6.fc22.src.rpm
if [ ! -f boost-1.58.0-11.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/boost/1.58.0/11.fc23/src/boost-1.58.0-11.fc23.src.rpm
fi
if [ ! -f ninja-build-1.5.3-2.fc22.src.rpm ]; then
wget http://download.fedoraproject.org/pub/fedora/linux/releases/22/Everything/source/SRPMS/n/ninja-build-1.5.3-2.fc22.src.rpm
if [ ! -f ninja-build-1.6.0-2.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/ninja-build/1.6.0/2.fc23/src/ninja-build-1.6.0-2.fc23.src.rpm
fi
if [ ! -f ragel-6.8-3.fc22.src.rpm ]; then
wget http://download.fedoraproject.org/pub/fedora/linux/releases/22/Everything/source/SRPMS/r/ragel-6.8-3.fc22.src.rpm
if [ ! -f ragel-6.8-5.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/ragel/6.8/5.fc23/src/ragel-6.8-5.fc23.src.rpm
fi
if [ ! -f gdb-7.10.1-30.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/gdb/7.10.1/30.fc23/src/gdb-7.10.1-30.fc23.src.rpm
fi
if [ ! -f pyparsing-2.0.3-2.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/pyparsing/2.0.3/2.fc23/src/pyparsing-2.0.3-2.fc23.src.rpm
fi
cd -
@@ -46,6 +54,8 @@ sudo yum install -y flex bison dejagnu zlib-static glibc-static sharutils bc lib
sudo yum install -y gcc-objc
sudo yum install -y asciidoc
sudo yum install -y gettext
sudo yum install -y rpm-devel python34-devel guile-devel readline-devel ncurses-devel expat-devel texlive-collection-latexrecommended xz-devel libselinux-devel
sudo yum install -y dos2unix
if [ ! -f $RPMBUILD/RPMS/noarch/scylla-env-1.0-1.el7.centos.noarch.rpm ]; then
cd dist/redhat/centos_dep
@@ -55,48 +65,62 @@ if [ ! -f $RPMBUILD/RPMS/noarch/scylla-env-1.0-1.el7.centos.noarch.rpm ]; then
fi
do_install scylla-env-1.0-1.el7.centos.noarch.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-binutils-2.25-5.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/binutils-2.25-5.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-binutils-2.25-15.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/binutils-2.25-15.fc23.src.rpm
patch $RPMBUILD/SPECS/binutils.spec < dist/redhat/centos_dep/binutils.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/binutils.spec
fi
do_install scylla-binutils-2.25-5.el7.centos.x86_64.rpm
do_install scylla-binutils-2.25-15.el7.centos.x86_64.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-isl-0.14-3.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/isl-0.14-3.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-isl-0.14-4.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/isl-0.14-4.fc23.src.rpm
patch $RPMBUILD/SPECS/isl.spec < dist/redhat/centos_dep/isl.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/isl.spec
fi
do_install scylla-isl-0.14-3.el7.centos.x86_64.rpm
do_install scylla-isl-devel-0.14-3.el7.centos.x86_64.rpm
do_install scylla-isl-0.14-4.el7.centos.x86_64.rpm
do_install scylla-isl-devel-0.14-4.el7.centos.x86_64.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-gcc-5.1.1-4.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/gcc-5.1.1-4.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-gcc-5.3.1-2.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/gcc-5.3.1-2.fc23.src.rpm
patch $RPMBUILD/SPECS/gcc.spec < dist/redhat/centos_dep/gcc.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/gcc.spec
fi
do_install scylla-*5.1.1-4*
do_install scylla-*5.3.1-2*
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-boost-1.57.0-6.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/boost-1.57.0-6.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-boost-1.58.0-11.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/boost-1.58.0-11.fc23.src.rpm
patch $RPMBUILD/SPECS/boost.spec < dist/redhat/centos_dep/boost.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/boost.spec
fi
do_install scylla-boost*
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-ninja-build-1.5.3-2.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/ninja-build-1.5.3-2.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-ninja-build-1.6.0-2.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/ninja-build-1.6.0-2.fc23.src.rpm
patch $RPMBUILD/SPECS/ninja-build.spec < dist/redhat/centos_dep/ninja-build.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/ninja-build.spec
fi
do_install scylla-ninja-build-1.5.3-2.el7.centos.x86_64.rpm
do_install scylla-ninja-build-1.6.0-2.el7.centos.x86_64.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-ragel-6.8-3.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/ragel-6.8-3.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-ragel-6.8-5.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/ragel-6.8-5.fc23.src.rpm
patch $RPMBUILD/SPECS/ragel.spec < dist/redhat/centos_dep/ragel.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/ragel.spec
fi
do_install scylla-ragel-6.8-3.el7.centos.x86_64.rpm
do_install scylla-ragel-6.8-5.el7.centos.x86_64.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-gdb-7.10.1-30.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/gdb-7.10.1-30.fc23.src.rpm
patch $RPMBUILD/SPECS/gdb.spec < dist/redhat/centos_dep/gdb.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/gdb.spec
fi
do_install scylla-gdb-7.10.1-30.el7.centos.x86_64.rpm
if [ ! -f $RPMBUILD/RPMS/noarch/python34-pyparsing-2.0.3-2.el7.centos.noarch.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/pyparsing-2.0.3-2.fc23.src.rpm
patch $RPMBUILD/SPECS/pyparsing.spec < dist/redhat/centos_dep/pyparsing.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/pyparsing.spec
fi
do_install python34-pyparsing-2.0.3-2.el7.centos.noarch.rpm
if [ ! -f $RPMBUILD/RPMS/noarch/scylla-antlr3-tool-3.5.2-1.el7.centos.noarch.rpm ]; then
mkdir build/scylla-antlr3-tool-3.5.2

View File

@@ -1,30 +1,14 @@
--- gcc.spec 2015-10-19 06:31:44.889189647 +0000
+++ gcc.spec.1 2015-10-19 07:56:17.445991665 +0000
@@ -1,22 +1,15 @@
%global DATE 20150618
%global SVNREV 224595
%global gcc_version 5.1.1
--- gcc.spec.orig 2015-12-08 16:03:46.000000000 +0000
+++ gcc.spec 2016-01-21 08:47:49.160667342 +0000
@@ -1,6 +1,7 @@
%global DATE 20151207
%global SVNREV 231358
%global gcc_version 5.3.1
+%define _prefix /opt/scylladb
# Note, gcc_release must be integer, if you want to add suffixes to
# %{release}, append them after %{gcc_release} on Release: line.
%global gcc_release 4
%global _unpackaged_files_terminate_build 0
%global _performance_build 1
%global multilib_64_archs sparc64 ppc64 ppc64p7 s390x x86_64
-%ifarch %{ix86} x86_64 ia64 ppc ppc64 ppc64p7 alpha %{arm} aarch64
-%global build_ada 1
-%else
%global build_ada 0
-%endif
-%ifarch %{ix86} x86_64 ppc ppc64 ppc64le ppc64p7 s390 s390x %{arm} aarch64
-%global build_go 1
-%else
%global build_go 0
-%endif
%ifarch %{ix86} x86_64 ia64
%global build_libquadmath 1
%else
@@ -82,7 +75,8 @@
%global gcc_release 2
@@ -84,7 +85,8 @@
%global multilib_32_arch i686
%endif
Summary: Various compilers (C, C++, Objective-C, Java, ...)
@@ -34,7 +18,7 @@
Version: %{gcc_version}
Release: %{gcc_release}%{?dist}
# libgcc, libgfortran, libgomp, libstdc++ and crtstuff have
@@ -97,6 +91,7 @@
@@ -99,6 +101,7 @@
%global isl_version 0.14
URL: http://gcc.gnu.org
BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n)
@@ -42,7 +26,7 @@
# Need binutils with -pie support >= 2.14.90.0.4-4
# Need binutils which can omit dot symbols and overlap .opd on ppc64 >= 2.15.91.0.2-4
# Need binutils which handle -msecure-plt on ppc >= 2.16.91.0.2-2
@@ -108,7 +103,7 @@
@@ -110,7 +113,7 @@
# Need binutils which support .cfi_sections >= 2.19.51.0.14-33
# Need binutils which support --no-add-needed >= 2.20.51.0.2-12
# Need binutils which support -plugin
@@ -51,7 +35,7 @@
# While gcc doesn't include statically linked binaries, during testing
# -static is used several times.
BuildRequires: glibc-static
@@ -143,15 +138,15 @@
@@ -145,15 +148,15 @@
BuildRequires: libunwind >= 0.98
%endif
%if %{build_isl}
@@ -71,7 +55,7 @@
# Need .eh_frame ld optimizations
# Need proper visibility support
# Need -pie support
@@ -166,7 +161,7 @@
@@ -168,7 +171,7 @@
# Need binutils that support .cfi_sections
# Need binutils that support --no-add-needed
# Need binutils that support -plugin
@@ -80,7 +64,7 @@
# Make sure gdb will understand DW_FORM_strp
Conflicts: gdb < 5.1-2
Requires: glibc-devel >= 2.2.90-12
@@ -174,17 +169,15 @@
@@ -176,17 +179,15 @@
# Make sure glibc supports TFmode long double
Requires: glibc >= 2.3.90-35
%endif
@@ -102,7 +86,7 @@
Requires(post): /sbin/install-info
Requires(preun): /sbin/install-info
AutoReq: true
@@ -226,12 +219,12 @@
@@ -228,12 +229,12 @@
The gcc package contains the GNU Compiler Collection version 5.
You'll need this package in order to compile C code.
@@ -117,7 +101,7 @@
%endif
Obsoletes: libmudflap
Obsoletes: libmudflap-devel
@@ -239,17 +232,19 @@
@@ -241,17 +242,19 @@
Obsoletes: libgcj < %{version}-%{release}
Obsoletes: libgcj-devel < %{version}-%{release}
Obsoletes: libgcj-src < %{version}-%{release}
@@ -141,7 +125,7 @@
Autoreq: true
%description c++
@@ -257,50 +252,55 @@
@@ -259,50 +262,55 @@
It includes support for most of the current C++ specification,
including templates and exception handling.
@@ -209,7 +193,7 @@
Autoreq: true
%description objc
@@ -311,29 +311,32 @@
@@ -313,29 +321,32 @@
%package objc++
Summary: Objective-C++ support for GCC
Group: Development/Languages
@@ -249,7 +233,7 @@
%endif
Requires(post): /sbin/install-info
Requires(preun): /sbin/install-info
@@ -343,260 +346,286 @@
@@ -345,260 +356,286 @@
The gcc-gfortran package provides support for compiling Fortran
programs with the GNU Compiler Collection.
@@ -608,7 +592,7 @@
Cpp is the GNU C-Compatible Compiler Preprocessor.
Cpp is a macro processor which is used automatically
by the C compiler to transform your program before actual
@@ -621,8 +650,9 @@
@@ -623,8 +660,9 @@
%package gnat
Summary: Ada 83, 95, 2005 and 2012 support for GCC
Group: Development/Languages
@@ -620,7 +604,7 @@
Requires(post): /sbin/install-info
Requires(preun): /sbin/install-info
Autoreq: true
@@ -631,40 +661,44 @@
@@ -633,82 +671,90 @@
GNAT is a GNU Ada 83, 95, 2005 and 2012 front-end to GCC. This package includes
development tools, the documents and Ada compiler.
@@ -674,8 +658,13 @@
+Requires: scylla-libgo-devel = %{version}-%{release}
Requires(post): /sbin/install-info
Requires(preun): /sbin/install-info
Requires(post): %{_sbindir}/update-alternatives
@@ -675,38 +709,42 @@
-Requires(post): %{_sbindir}/update-alternatives
-Requires(postun): %{_sbindir}/update-alternatives
+Requires(post): /sbin/update-alternatives
+Requires(postun): /sbin/update-alternatives
Autoreq: true
%description go
The gcc-go package provides support for compiling Go programs
with the GNU Compiler Collection.
@@ -728,7 +717,7 @@
Requires: gmp-devel >= 4.1.2-8, mpfr-devel >= 2.2.1, libmpc-devel >= 0.8.1
%description plugin-devel
@@ -726,7 +764,8 @@
@@ -728,7 +774,8 @@
Summary: Debug information for package %{name}
Group: Development/Debug
AutoReqProv: 0
@@ -738,21 +727,21 @@
%description debuginfo
This package provides debug information for package %{name}.
@@ -961,11 +1000,10 @@
@@ -958,11 +1005,11 @@
--enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu \
--enable-plugin --enable-initfini-array \
--disable-libgcj \
-%if 0%{fedora} >= 21 && 0%{fedora} <= 22
--with-default-libstdcxx-abi=c++98 \
--with-default-libstdcxx-abi=gcc4-compatible \
-%endif
%if %{build_isl}
- --with-isl \
--with-isl \
+ --with-isl-include=/opt/scylladb/include/ \
+ --with-isl-lib=/opt/scylladb/lib64/ \
%else
--without-isl \
%endif
@@ -974,11 +1012,9 @@
@@ -971,11 +1018,9 @@
%else
--disable-libmpx \
%endif
@@ -764,7 +753,7 @@
%ifarch %{arm}
--disable-sjlj-exceptions \
%endif
@@ -1009,9 +1045,6 @@
@@ -1006,9 +1051,6 @@
%if 0%{?rhel} >= 7
--with-cpu-32=power8 --with-tune-32=power8 --with-cpu-64=power8 --with-tune-64=power8 \
%endif
@@ -774,7 +763,7 @@
%endif
%ifarch ppc
--build=%{gcc_target_platform} --target=%{gcc_target_platform} --with-cpu=default32
@@ -1273,16 +1306,15 @@
@@ -1270,16 +1312,15 @@
mv %{buildroot}%{_prefix}/%{_lib}/libmpx.spec $FULLPATH/
%endif
@@ -797,7 +786,7 @@
%endif
%ifarch ppc
rm -f $FULLPATH/libgcc_s.so
@@ -1816,7 +1848,7 @@
@@ -1819,7 +1860,7 @@
chmod 755 %{buildroot}%{_prefix}/bin/c?9
cd ..
@@ -806,7 +795,7 @@
%find_lang cpplib
# Remove binaries we will not be including, so that they don't end up in
@@ -1866,11 +1898,7 @@
@@ -1869,11 +1910,7 @@
# run the tests.
make %{?_smp_mflags} -k check ALT_CC_UNDER_TEST=gcc ALT_CXX_UNDER_TEST=g++ \
@@ -818,7 +807,7 @@
echo ====================TESTING=========================
( LC_ALL=C ../contrib/test_summary || : ) 2>&1 | sed -n '/^cat.*EOF/,/^EOF/{/^cat.*EOF/d;/^EOF/d;/^LAST_UPDATED:/d;p;}'
echo ====================TESTING END=====================
@@ -1897,13 +1925,13 @@
@@ -1900,13 +1937,13 @@
--info-dir=%{_infodir} %{_infodir}/gcc.info.gz || :
fi
@@ -834,7 +823,21 @@
if [ $1 = 0 -a -f %{_infodir}/cpp.info.gz ]; then
/sbin/install-info --delete \
--info-dir=%{_infodir} %{_infodir}/cpp.info.gz || :
@@ -1954,7 +1982,7 @@
@@ -1945,19 +1982,19 @@
fi
%post go
-%{_sbindir}/update-alternatives --install \
+/sbin/update-alternatives --install \
%{_prefix}/bin/go go %{_prefix}/bin/go.gcc 92 \
--slave %{_prefix}/bin/gofmt gofmt %{_prefix}/bin/gofmt.gcc
%preun go
if [ $1 = 0 ]; then
- %{_sbindir}/update-alternatives --remove go %{_prefix}/bin/go.gcc
+ /sbin/update-alternatives --remove go %{_prefix}/bin/go.gcc
fi
# Because glibc Prereq's libgcc and /sbin/ldconfig
# comes from glibc, it might not exist yet when
# libgcc is installed
@@ -843,7 +846,7 @@
if posix.access ("/sbin/ldconfig", "x") then
local pid = posix.fork ()
if pid == 0 then
@@ -1964,7 +1992,7 @@
@@ -1967,7 +2004,7 @@
end
end
@@ -852,7 +855,7 @@
if posix.access ("/sbin/ldconfig", "x") then
local pid = posix.fork ()
if pid == 0 then
@@ -1974,120 +2002,120 @@
@@ -1977,120 +2014,120 @@
end
end
@@ -1011,7 +1014,7 @@
%defattr(-,root,root,-)
%{_prefix}/bin/cc
%{_prefix}/bin/c89
@@ -2409,7 +2437,7 @@
@@ -2414,7 +2451,7 @@
%{!?_licensedir:%global license %%doc}
%license gcc/COPYING* COPYING.RUNTIME
@@ -1020,7 +1023,7 @@
%defattr(-,root,root,-)
%{_prefix}/lib/cpp
%{_prefix}/bin/cpp
@@ -2420,10 +2448,10 @@
@@ -2425,10 +2462,10 @@
%dir %{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}
%{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}/cc1
@@ -1034,7 +1037,7 @@
%{!?_licensedir:%global license %%doc}
%license gcc/COPYING* COPYING.RUNTIME
@@ -2461,7 +2489,7 @@
@@ -2469,7 +2506,7 @@
%endif
%doc rpm.doc/changelogs/gcc/cp/ChangeLog*
@@ -1043,7 +1046,7 @@
%defattr(-,root,root,-)
%{_prefix}/%{_lib}/libstdc++.so.6*
%dir %{_datadir}/gdb
@@ -2473,7 +2501,7 @@
@@ -2481,7 +2518,7 @@
%dir %{_prefix}/share/gcc-%{gcc_version}/python
%{_prefix}/share/gcc-%{gcc_version}/python/libstdcxx
@@ -1052,7 +1055,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/include/c++
%dir %{_prefix}/include/c++/%{gcc_version}
@@ -2488,7 +2516,7 @@
@@ -2507,7 +2544,7 @@
%endif
%doc rpm.doc/changelogs/libstdc++-v3/ChangeLog* libstdc++-v3/README*
@@ -1061,7 +1064,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2509,7 +2537,7 @@
@@ -2528,7 +2565,7 @@
%endif
%if %{build_libstdcxx_docs}
@@ -1070,7 +1073,7 @@
%defattr(-,root,root)
%{_mandir}/man3/*
%doc rpm.doc/libstdc++-v3/html
@@ -2548,7 +2576,7 @@
@@ -2567,7 +2604,7 @@
%dir %{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}
%{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}/cc1objplus
@@ -1079,7 +1082,7 @@
%defattr(-,root,root,-)
%{_prefix}/%{_lib}/libobjc.so.4*
@@ -2602,11 +2630,11 @@
@@ -2621,11 +2658,11 @@
%endif
%doc rpm.doc/gfortran/*
@@ -1093,7 +1096,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2652,12 +2680,12 @@
@@ -2671,12 +2708,12 @@
%{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}/gnat1
%doc rpm.doc/changelogs/gcc/ada/ChangeLog*
@@ -1108,7 +1111,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2683,7 +2711,7 @@
@@ -2702,7 +2739,7 @@
%exclude %{_prefix}/lib/gcc/%{gcc_target_platform}/%{gcc_version}/adalib/libgnarl.a
%endif
@@ -1117,7 +1120,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2707,7 +2735,7 @@
@@ -2726,7 +2763,7 @@
%endif
%endif
@@ -1126,7 +1129,7 @@
%defattr(-,root,root,-)
%{_prefix}/%{_lib}/libgomp.so.1*
%{_prefix}/%{_lib}/libgomp-plugin-host_nonshm.so.1*
@@ -2715,14 +2743,14 @@
@@ -2734,14 +2771,14 @@
%doc rpm.doc/changelogs/libgomp/ChangeLog*
%if %{build_libquadmath}
@@ -1143,7 +1146,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2735,7 +2763,7 @@
@@ -2754,7 +2791,7 @@
%endif
%doc rpm.doc/libquadmath/ChangeLog*
@@ -1152,7 +1155,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2754,12 +2782,12 @@
@@ -2773,12 +2810,12 @@
%endif
%if %{build_libitm}
@@ -1167,7 +1170,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2772,7 +2800,7 @@
@@ -2791,7 +2828,7 @@
%endif
%doc rpm.doc/libitm/ChangeLog*
@@ -1176,7 +1179,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2791,11 +2819,11 @@
@@ -2810,11 +2847,11 @@
%endif
%if %{build_libatomic}
@@ -1190,7 +1193,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2815,11 +2843,11 @@
@@ -2834,11 +2871,11 @@
%endif
%if %{build_libasan}
@@ -1204,7 +1207,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2841,11 +2869,11 @@
@@ -2860,11 +2897,11 @@
%endif
%if %{build_libubsan}
@@ -1218,7 +1221,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2867,11 +2895,11 @@
@@ -2886,11 +2923,11 @@
%endif
%if %{build_libtsan}
@@ -1232,7 +1235,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2883,11 +2911,11 @@
@@ -2902,11 +2939,11 @@
%endif
%if %{build_liblsan}
@@ -1246,7 +1249,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2899,11 +2927,11 @@
@@ -2918,11 +2955,11 @@
%endif
%if %{build_libcilkrts}
@@ -1260,7 +1263,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2923,12 +2951,12 @@
@@ -2942,12 +2979,12 @@
%endif
%if %{build_libmpx}
@@ -1275,7 +1278,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2990,12 +3018,12 @@
@@ -3009,12 +3046,12 @@
%endif
%doc rpm.doc/go/*
@@ -1290,7 +1293,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -3023,7 +3051,7 @@
@@ -3042,7 +3079,7 @@
%{_prefix}/lib/gcc/%{gcc_target_platform}/%{gcc_version}/libgo.so
%endif
@@ -1299,7 +1302,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -3041,12 +3069,12 @@
@@ -3060,12 +3097,12 @@
%endif
%endif

dist/redhat/centos_dep/gdb.diff

@@ -0,0 +1,29 @@
--- gdb.spec.orig 2015-12-06 04:10:30.000000000 +0000
+++ gdb.spec 2016-01-20 14:49:12.745843903 +0000
@@ -16,7 +16,10 @@
}
Summary: A GNU source-level debugger for C, C++, Fortran, Go and other languages
-Name: %{?scl_prefix}gdb
+Name: %{?scl_prefix}scylla-gdb
+%define orig_name gdb
+Requires: scylla-env
+%define _prefix /opt/scylladb
# Freeze it when GDB gets branched
%global snapsrc 20150706
@@ -572,12 +575,8 @@
BuildRequires: rpm-devel%{buildisa}
BuildRequires: zlib-devel%{buildisa} libselinux-devel%{buildisa}
%if 0%{!?_without_python:1}
-%if 0%{?rhel:1} && 0%{?rhel} <= 7
-BuildRequires: python-devel%{buildisa}
-%else
-%global __python %{__python3}
-BuildRequires: python3-devel%{buildisa}
-%endif
+BuildRequires: python34-devel%{?_isa}
+%global __python /usr/bin/python3.4
%if 0%{?rhel:1} && 0%{?rhel} <= 7
# Temporarily before python files get moved to libstdc++.rpm
# libstdc++%{bits_other} is not present in Koji, the .spec script generating


@@ -1,5 +1,5 @@
--- isl.spec 2015-01-06 16:24:49.000000000 +0000
+++ isl.spec.1 2015-10-18 12:12:38.000000000 +0000
--- isl.spec.orig 2016-01-20 14:41:16.891802146 +0000
+++ isl.spec 2016-01-20 14:43:13.838336396 +0000
@@ -1,5 +1,5 @@
Summary: Integer point manipulation library
-Name: isl


@@ -1,34 +1,56 @@
1c1
< Name: ninja-build
---
> Name: scylla-ninja-build
8d7
< Source1: ninja.vim
10a10
> Requires: scylla-env
14,16c14,15
< BuildRequires: re2c >= 0.11.3
< Requires: emacs-filesystem
< Requires: vim-filesystem
---
> #BuildRequires: scylla-re2c >= 0.11.3
> %define _prefix /opt/scylladb
35,37c34
< # TODO: Install ninja_syntax.py?
< mkdir -p %{buildroot}/{%{_bindir},%{_datadir}/bash-completion/completions,%{_datadir}/emacs/site-lisp,%{_datadir}/vim/vimfiles/syntax,%{_datadir}/vim/vimfiles/ftdetect,%{_datadir}/zsh/site-functions}
<
---
> mkdir -p %{buildroot}/opt/scylladb/bin
39,43d35
< install -pm644 misc/bash-completion %{buildroot}%{_datadir}/bash-completion/completions/ninja-bash-completion
< install -pm644 misc/ninja-mode.el %{buildroot}%{_datadir}/emacs/site-lisp/ninja-mode.el
< install -pm644 misc/ninja.vim %{buildroot}%{_datadir}/vim/vimfiles/syntax/ninja.vim
< install -pm644 %{SOURCE1} %{buildroot}%{_datadir}/vim/vimfiles/ftdetect/ninja.vim
< install -pm644 misc/zsh-completion %{buildroot}%{_datadir}/zsh/site-functions/_ninja
53,58d44
< %{_datadir}/bash-completion/completions/ninja-bash-completion
< %{_datadir}/emacs/site-lisp/ninja-mode.el
< %{_datadir}/vim/vimfiles/syntax/ninja.vim
< %{_datadir}/vim/vimfiles/ftdetect/ninja.vim
< # zsh does not have a -filesystem package
< %{_datadir}/zsh/
--- ninja-build.spec.orig 2016-01-20 14:41:16.892802134 +0000
+++ ninja-build.spec 2016-01-20 14:44:42.453227192 +0000
@@ -1,19 +1,18 @@
-Name: ninja-build
+Name: scylla-ninja-build
Version: 1.6.0
Release: 2%{?dist}
Summary: A small build system with a focus on speed
License: ASL 2.0
URL: http://martine.github.com/ninja/
Source0: https://github.com/martine/ninja/archive/v%{version}.tar.gz#/ninja-%{version}.tar.gz
-Source1: ninja.vim
# Rename mentions of the executable name to be ninja-build.
Patch1000: ninja-1.6.0-binary-rename.patch
+Requires: scylla-env
BuildRequires: asciidoc
BuildRequires: gtest-devel
BuildRequires: python2-devel
-BuildRequires: re2c >= 0.11.3
-Requires: emacs-filesystem
-Requires: vim-filesystem
+#BuildRequires: scylla-re2c >= 0.11.3
+%define _prefix /opt/scylladb
%description
Ninja is a small build system with a focus on speed. It differs from other
@@ -32,15 +31,8 @@
./ninja -v ninja_test
%install
-# TODO: Install ninja_syntax.py?
-mkdir -p %{buildroot}/{%{_bindir},%{_datadir}/bash-completion/completions,%{_datadir}/emacs/site-lisp,%{_datadir}/vim/vimfiles/syntax,%{_datadir}/vim/vimfiles/ftdetect,%{_datadir}/zsh/site-functions}
-
+mkdir -p %{buildroot}/opt/scylladb/bin
install -pm755 ninja %{buildroot}%{_bindir}/ninja-build
-install -pm644 misc/bash-completion %{buildroot}%{_datadir}/bash-completion/completions/ninja-bash-completion
-install -pm644 misc/ninja-mode.el %{buildroot}%{_datadir}/emacs/site-lisp/ninja-mode.el
-install -pm644 misc/ninja.vim %{buildroot}%{_datadir}/vim/vimfiles/syntax/ninja.vim
-install -pm644 %{SOURCE1} %{buildroot}%{_datadir}/vim/vimfiles/ftdetect/ninja.vim
-install -pm644 misc/zsh-completion %{buildroot}%{_datadir}/zsh/site-functions/_ninja
%check
# workaround possible too low default limits
@@ -50,12 +42,6 @@
%files
%doc COPYING HACKING.md README doc/manual.html
%{_bindir}/ninja-build
-%{_datadir}/bash-completion/completions/ninja-bash-completion
-%{_datadir}/emacs/site-lisp/ninja-mode.el
-%{_datadir}/vim/vimfiles/syntax/ninja.vim
-%{_datadir}/vim/vimfiles/ftdetect/ninja.vim
-# zsh does not have a -filesystem package
-%{_datadir}/zsh/
%changelog
* Mon Nov 16 2015 Ben Boeckel <mathstuf@gmail.com> - 1.6.0-2

dist/redhat/centos_dep/pyparsing.diff

@@ -0,0 +1,40 @@
--- pyparsing.spec.orig 2016-01-25 19:11:14.663651658 +0900
+++ pyparsing.spec 2016-01-25 19:12:49.853875369 +0900
@@ -1,4 +1,4 @@
-%if 0%{?fedora}
+%if 0%{?centos}
%global with_python3 1
%endif
@@ -15,7 +15,7 @@
BuildRequires: dos2unix
BuildRequires: glibc-common
%if 0%{?with_python3}
-BuildRequires: python3-devel
+BuildRequires: python34-devel
%endif # if with_python3
%description
@@ -30,11 +30,11 @@
The package contains documentation for pyparsing.
%if 0%{?with_python3}
-%package -n python3-pyparsing
+%package -n python34-pyparsing
Summary: An object-oriented approach to text processing (Python 3 version)
Group: Development/Libraries
-%description -n python3-pyparsing
+%description -n python34-pyparsing
pyparsing is a module that can be used to easily and directly configure syntax
definitions for any number of text parsing applications.
@@ -90,7 +90,7 @@
%{python_sitelib}/pyparsing.py*
%if 0%{?with_python3}
-%files -n python3-pyparsing
+%files -n python34-pyparsing
%doc CHANGES README LICENSE
%{python3_sitelib}/pyparsing*egg-info
%{python3_sitelib}/pyparsing.py*


@@ -1,11 +1,11 @@
--- ragel.spec 2014-08-18 11:55:49.000000000 +0000
+++ ragel.spec.1 2015-10-18 12:18:23.000000000 +0000
--- ragel.spec.orig 2015-06-18 22:12:28.000000000 +0000
+++ ragel.spec 2016-01-20 14:49:53.980327766 +0000
@@ -1,17 +1,20 @@
-Name: ragel
+Name: scylla-ragel
+%define orig_name ragel
Version: 6.8
Release: 3%{?dist}
Release: 5%{?dist}
Summary: Finite state machine compiler
Group: Development/Tools


@@ -1,14 +0,0 @@
#!/bin/sh -e
args="--log-to-syslog 1 --log-to-stdout 0 --default-log-level info $SCYLLA_ARGS"
if [ "$NETWORK_MODE" = "posix" ]; then
args="$args --network-stack posix"
elif [ "$NETWORK_MODE" = "virtio" ]; then
args="$args --network-stack native"
elif [ "$NETWORK_MODE" = "dpdk" ]; then
args="$args --network-stack native --dpdk-pmd"
fi
export HOME=/var/lib/scylla
exec /usr/bin/scylla $args


@@ -9,9 +9,10 @@ URL: http://www.scylladb.com/
Source0: %{name}-@@VERSION@@-@@RELEASE@@.tar
BuildRequires: libaio-devel libstdc++-devel cryptopp-devel hwloc-devel numactl-devel libpciaccess-devel libxml2-devel zlib-devel thrift-devel yaml-cpp-devel lz4-devel snappy-devel jsoncpp-devel systemd-devel xz-devel openssl-devel libcap-devel libselinux-devel libgcrypt-devel libgpg-error-devel elfutils-devel krb5-devel libcom_err-devel libattr-devel pcre-devel elfutils-libelf-devel bzip2-devel keyutils-libs-devel xfsprogs-devel make gnutls-devel systemd-devel
%{?fedora:BuildRequires: boost-devel ninja-build ragel antlr3-tool antlr3-C++-devel python3 gcc-c++ libasan libubsan}
%{?rhel:BuildRequires: scylla-libstdc++-static scylla-boost-devel scylla-ninja-build scylla-ragel scylla-antlr3-tool scylla-antlr3-C++-devel python34 scylla-gcc-c++ >= 5.1.1}
Requires: systemd-libs xfsprogs mdadm hwloc
%{?fedora:BuildRequires: boost-devel ninja-build ragel antlr3-tool antlr3-C++-devel python3 gcc-c++ libasan libubsan python3-pyparsing}
%{?rhel:BuildRequires: scylla-libstdc++-static scylla-boost-devel scylla-ninja-build scylla-ragel scylla-antlr3-tool scylla-antlr3-C++-devel python34 scylla-gcc-c++ >= 5.1.1, python34-pyparsing}
Requires: systemd-libs hwloc
Conflicts: abrt
%description
@@ -51,7 +52,6 @@ install -m644 conf/scylla.yaml $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
install -m644 conf/cassandra-rackdc.properties $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
install -m644 dist/redhat/systemd/scylla-server.service $RPM_BUILD_ROOT%{_unitdir}/
install -m755 dist/common/scripts/* $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 dist/redhat/scripts/* $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 seastar/scripts/posix_net_conf.sh $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 seastar/dpdk/tools/dpdk_nic_bind.py $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 build/release/scylla $RPM_BUILD_ROOT%{_bindir}
@@ -140,7 +140,6 @@ rm -rf $RPM_BUILD_ROOT
%{_unitdir}/scylla-server.service
%{_bindir}/scylla
%{_prefix}/lib/scylla/scylla_prepare
%{_prefix}/lib/scylla/scylla_run
%{_prefix}/lib/scylla/scylla_stop
%{_prefix}/lib/scylla/scylla_setup
%{_prefix}/lib/scylla/scylla_coredump_setup


@@ -1,6 +1,6 @@
[Unit]
Description=Scylla Server
After=network.target libvirtd.service
After=network.target
[Service]
Type=notify
@@ -8,10 +8,12 @@ LimitMEMLOCK=infinity
LimitNOFILE=200000
LimitAS=infinity
LimitNPROC=8096
WorkingDirectory=/var/lib/scylla
Environment="HOME=/var/lib/scylla"
EnvironmentFile=/etc/sysconfig/scylla-server
ExecStartPre=/usr/bin/sudo /usr/lib/scylla/scylla_prepare
ExecStart=/usr/lib/scylla/scylla_run
ExecStopPost=/usr/bin/sudo /usr/lib/scylla/scylla_stop
ExecStartPre=/usr/bin/sudo -E /usr/lib/scylla/scylla_prepare
ExecStart=/usr/bin/scylla $SCYLLA_ARGS $SCYLLA_IO
ExecStopPost=/usr/bin/sudo -E /usr/lib/scylla/scylla_stop
TimeoutStartSec=900
KillMode=process
Restart=no


@@ -9,6 +9,19 @@ if [ -e debian ] || [ -e build/release ]; then
rm -rf debian build
mkdir build
fi
sudo apt-get -y update
if [ ! -f /usr/bin/git ]; then
sudo apt-get -y install git
fi
if [ ! -f /usr/bin/mk-build-deps ]; then
sudo apt-get -y install devscripts
fi
if [ ! -f /usr/bin/equivs-build ]; then
sudo apt-get -y install equivs
fi
if [ ! -f /usr/bin/add-apt-repository ]; then
sudo apt-get -y install software-properties-common
fi
RELEASE=`lsb_release -r|awk '{print $2}'`
CODENAME=`lsb_release -c|awk '{print $2}'`
@@ -30,28 +43,24 @@ cp dist/ubuntu/changelog.in debian/changelog
sed -i -e "s/@@VERSION@@/$SCYLLA_VERSION/g" debian/changelog
sed -i -e "s/@@RELEASE@@/$SCYLLA_RELEASE/g" debian/changelog
sed -i -e "s/@@CODENAME@@/$CODENAME/g" debian/changelog
cp dist/ubuntu/rules.in debian/rules
cp dist/ubuntu/control.in debian/control
if [ "$RELEASE" = "15.10" ]; then
sed -i -e "s/@@COMPILER@@/g++/g" debian/rules
sed -i -e "s/@@COMPILER@@/g++/g" debian/control
else
sed -i -e "s/@@COMPILER@@/g++-4.9/g" debian/rules
sed -i -e "s/@@COMPILER@@/g++-4.9/g" debian/control
fi
sudo apt-get -y update
./dist/ubuntu/dep/build_dependency.sh
DEP="libyaml-cpp-dev liblz4-dev libsnappy-dev libcrypto++-dev libjsoncpp-dev libaio-dev ragel ninja-build git liblz4-1 libaio1 hugepages software-properties-common libgnutls28-dev libhwloc-dev libnuma-dev libpciaccess-dev"
if [ "$RELEASE" = "14.04" ]; then
DEP="$DEP libboost1.55-dev libboost-program-options1.55.0 libboost-program-options1.55-dev libboost-system1.55.0 libboost-system1.55-dev libboost-thread1.55.0 libboost-thread1.55-dev libboost-test1.55.0 libboost-test1.55-dev libboost-filesystem1.55-dev libboost-filesystem1.55.0 libsnappy1"
else
DEP="$DEP libboost-dev libboost-program-options-dev libboost-system-dev libboost-thread-dev libboost-test-dev libboost-filesystem-dev libboost-filesystem-dev libsnappy1v5"
fi
if [ "$RELEASE" = "15.10" ]; then
DEP="$DEP libjsoncpp0v5 libcrypto++9v5 libyaml-cpp0.5v5 antlr3"
else
DEP="$DEP libjsoncpp0 libcrypto++9 libyaml-cpp0.5"
fi
sudo apt-get -y install $DEP
if [ "$RELEASE" != "15.10" ]; then
sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
sudo apt-get -y update
sudo apt-get -y install g++-4.9
fi
sudo apt-get -y install g++-4.9
echo Y | sudo mk-build-deps -i -r
debuild -r fakeroot -us -uc


@@ -4,11 +4,11 @@ Homepage: http://scylladb.com
Section: database
Priority: optional
Standards-Version: 3.9.5
Build-Depends: debhelper (>= 9), libyaml-cpp-dev, liblz4-dev, libsnappy-dev, libcrypto++-dev, libjsoncpp-dev, libaio-dev, libthrift-dev, thrift-compiler, antlr3, antlr3-c++-dev, ragel, g++-4.9, ninja-build, git, libboost-program-options1.55-dev | libboost-program-options-dev, libboost-filesystem1.55-dev | libboost-filesystem-dev, libboost-system1.55-dev | libboost-system-dev, libboost-thread1.55-dev | libboost-thread-dev, libboost-test1.55-dev | libboost-test-dev, libgnutls28-dev, libhwloc-dev, libnuma-dev, libpciaccess-dev
Build-Depends: debhelper (>= 9), libyaml-cpp-dev, liblz4-dev, libsnappy-dev, libcrypto++-dev, libjsoncpp-dev, libaio-dev, libthrift-dev, thrift-compiler, antlr3, antlr3-c++-dev, ragel, ninja-build, git, libboost-program-options1.55-dev | libboost-program-options-dev, libboost-filesystem1.55-dev | libboost-filesystem-dev, libboost-system1.55-dev | libboost-system-dev, libboost-thread1.55-dev | libboost-thread-dev, libboost-test1.55-dev | libboost-test-dev, libgnutls28-dev, libhwloc-dev, libnuma-dev, libpciaccess-dev, xfslibs-dev, python3-pyparsing, libxml2-dev, @@COMPILER@@
Package: scylla-server
Architecture: amd64
Depends: ${shlibs:Depends}, ${misc:Depends}, hugepages, adduser, mdadm, xfsprogs, hwloc-nox
Depends: ${shlibs:Depends}, ${misc:Depends}, hugepages, adduser, hwloc-nox
Description: Scylla database server binaries
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.


@@ -11,23 +11,31 @@ umask 022
console log
expect stop
setuid scylla
setgid scylla
limit core unlimited unlimited
limit memlock unlimited unlimited
limit nofile 200000 200000
limit as unlimited unlimited
limit nproc 8096 8096
chdir /var/lib/scylla
env HOME=/var/lib/scylla
pre-start script
cd /var/lib/scylla
. /etc/default/scylla-server
export NETWORK_MODE TAP BRIDGE ETHDRV ETHPCIID NR_HUGEPAGES USER GROUP SCYLLA_HOME SCYLLA_CONF SCYLLA_ARGS
/usr/lib/scylla/scylla_prepare
export NETWORK_MODE TAP BRIDGE ETHDRV ETHPCIID NR_HUGEPAGES USER GROUP SCYLLA_HOME SCYLLA_CONF SCYLLA_ARGS SCYLLA_IO
sudo /usr/lib/scylla/scylla_prepare
end script
script
cd /var/lib/scylla
. /etc/default/scylla-server
export NETWORK_MODE TAP BRIDGE ETHDRV ETHPCIID NR_HUGEPAGES USER GROUP SCYLLA_HOME SCYLLA_CONF SCYLLA_ARGS
exec /usr/lib/scylla/scylla_run
export NETWORK_MODE TAP BRIDGE ETHDRV ETHPCIID NR_HUGEPAGES USER GROUP SCYLLA_HOME SCYLLA_CONF SCYLLA_ARGS SCYLLA_IO
exec /usr/bin/scylla $SCYLLA_ARGS $SCYLLA_IO
end script
post-stop script
cd /var/lib/scylla
. /etc/default/scylla-server
export NETWORK_MODE TAP BRIDGE ETHDRV ETHPCIID NR_HUGEPAGES USER GROUP SCYLLA_HOME SCYLLA_CONF SCYLLA_ARGS
/usr/lib/scylla/scylla_stop
export NETWORK_MODE TAP BRIDGE ETHDRV ETHPCIID NR_HUGEPAGES USER GROUP SCYLLA_HOME SCYLLA_CONF SCYLLA_ARGS SCYLLA_IO
sudo /usr/lib/scylla/scylla_stop
end script


@@ -1,15 +1,8 @@
#!/bin/sh -e
RELEASE=`lsb_release -r|awk '{print $2}'`
DEP="build-essential debhelper openjdk-7-jre-headless build-essential autoconf automake pkg-config libtool bison flex libevent-dev libglib2.0-dev libqt4-dev python-dev python-dbg php5-dev devscripts python-support xfslibs-dev"
if [ "$RELEASE" = "14.04" ]; then
DEP="$DEP libboost1.55-dev libboost-test1.55-dev"
else
DEP="$DEP libboost-dev libboost-test-dev"
fi
sudo apt-get -y install $DEP
sudo apt-get install -y gdebi-core
if [ "$RELEASE" = "14.04" ]; then
if [ ! -f build/antlr3_3.5.2-1_all.deb ]; then
rm -rf build/antlr3-3.5.2
@@ -17,6 +10,7 @@ if [ "$RELEASE" = "14.04" ]; then
cp -a dist/ubuntu/dep/antlr3-3.5.2/* build/antlr3-3.5.2
cd build/antlr3-3.5.2
wget http://www.antlr3.org/download/antlr-3.5.2-complete-no-st3.jar
echo Y | sudo mk-build-deps -i -r
debuild -r fakeroot --no-tgz-check -us -uc
cd -
fi
@@ -33,6 +27,7 @@ if [ ! -f build/antlr3-c++-dev_3.5.2-1_all.deb ]; then
cd -
cp -a dist/ubuntu/dep/antlr3-c++-dev-3.5.2/debian build/antlr3-c++-dev-3.5.2
cd build/antlr3-c++-dev-3.5.2
echo Y | sudo mk-build-deps -i -r
debuild -r fakeroot --no-tgz-check -us -uc
cd -
fi
@@ -46,8 +41,15 @@ if [ ! -f build/libthrift0_1.0.0-dev_amd64.deb ]; then
tar xpf thrift-0.9.1.tar.gz
cd thrift-0.9.1
patch -p0 < ../../dist/ubuntu/dep/thrift.diff
echo Y | sudo mk-build-deps -i -r
debuild -r fakeroot --no-tgz-check -us -uc
cd ../..
fi
sudo dpkg -i build/*.deb
if [ "$RELEASE" = "14.04" ]; then
sudo gdebi -n build/antlr3_*.deb
fi
sudo gdebi -n build/antlr3-c++-dev_*.deb
sudo gdebi -n build/libthrift0_*.deb
sudo gdebi -n build/libthrift-dev_*.deb
sudo gdebi -n build/thrift-compiler_*.deb


@@ -1,6 +1,5 @@
diff -Nur ./debian/changelog ../thrift-0.9.1/debian/changelog
--- ./debian/changelog 2013-08-15 23:04:29.000000000 +0900
+++ ../thrift-0.9.1/debian/changelog 2015-10-29 23:03:25.797937232 +0900
--- debian/changelog 2013-08-15 23:04:29.000000000 +0900
+++ ../thrift-0.9.1-ubuntu/debian/changelog 2016-01-15 23:22:11.189982999 +0900
@@ -1,65 +1,4 @@
-thrift (1.0.0-dev) stable; urgency=low
- * update version
@@ -70,9 +69,8 @@ diff -Nur ./debian/changelog ../thrift-0.9.1/debian/changelog
-
- -- Esteve Fernandez <esteve@fluidinfo.com> Thu, 15 Jan 2009 11:34:24 +0100
+ -- Takuya ASADA <syuu@scylladb.com> Wed, 28 Oct 2015 05:11:38 +0900
diff -Nur ./debian/control ../thrift-0.9.1/debian/control
--- ./debian/control 2013-08-18 23:58:22.000000000 +0900
+++ ../thrift-0.9.1/debian/control 2015-10-28 00:54:05.950464999 +0900
--- debian/control 2013-08-18 23:58:22.000000000 +0900
+++ ../thrift-0.9.1-ubuntu/debian/control 2016-01-15 23:32:47.373982999 +0900
@@ -1,12 +1,10 @@
Source: thrift
Section: devel
@@ -86,7 +84,7 @@ diff -Nur ./debian/control ../thrift-0.9.1/debian/control
+Build-Depends: debhelper (>= 5), build-essential, autoconf,
+ automake, pkg-config, libtool, bison, flex, libboost-dev | libboost1.55-dev,
+ libboost-test-dev | libboost-test1.55-dev, libevent-dev,
+ libglib2.0-dev, libqt4-dev
+ libglib2.0-dev, libqt4-dev, libssl-dev, python-support
Maintainer: Thrift Developer's <dev@thrift.apache.org>
Homepage: http://thrift.apache.org/
Vcs-Git: https://git-wip-us.apache.org/repos/asf/thrift.git
@@ -205,9 +203,8 @@ diff -Nur ./debian/control ../thrift-0.9.1/debian/control
- build services that work efficiently and seamlessly.
- .
- This package contains the PHP bindings for Thrift.
diff -Nur ./debian/rules ../thrift-0.9.1/debian/rules
--- ./debian/rules 2013-08-15 23:04:29.000000000 +0900
+++ ../thrift-0.9.1/debian/rules 2015-10-28 00:54:05.950464999 +0900
--- debian/rules 2013-08-15 23:04:29.000000000 +0900
+++ ../thrift-0.9.1-ubuntu/debian/rules 2016-01-15 23:22:11.189982999 +0900
@@ -45,18 +45,6 @@
# Compile C (glib) library
$(MAKE) -C $(CURDIR)/lib/c_glib


@@ -5,12 +5,13 @@ SCRIPTS = $(CURDIR)/debian/scylla-server/usr/lib/scylla
SWAGGER = $(SCRIPTS)/swagger-ui
API = $(SCRIPTS)/api
SYSCTL = $(CURDIR)/debian/scylla-server/etc/sysctl.d
SUDOERS = $(CURDIR)/debian/scylla-server/etc/sudoers.d
LIMITS= $(CURDIR)/debian/scylla-server/etc/security/limits.d
LIBS = $(CURDIR)/debian/scylla-server/usr/lib
CONF = $(CURDIR)/debian/scylla-server/etc/scylla
override_dh_auto_build:
./configure.py --disable-xen --enable-dpdk --mode=release --static-stdc++ --compiler=g++-4.9
./configure.py --disable-xen --enable-dpdk --mode=release --static-stdc++ --compiler=@@COMPILER@@
ninja
override_dh_auto_clean:
@@ -25,6 +26,9 @@ override_dh_auto_install:
mkdir -p $(SYSCTL) && \
cp $(CURDIR)/dist/ubuntu/sysctl.d/99-scylla.conf $(SYSCTL)
mkdir -p $(SUDOERS) && \
cp $(CURDIR)/dist/common/sudoers.d/scylla $(SUDOERS)
mkdir -p $(CONF) && \
cp $(CURDIR)/conf/scylla.yaml $(CONF)
cp $(CURDIR)/conf/cassandra-rackdc.properties $(CONF)
@@ -36,7 +40,8 @@ override_dh_auto_install:
cp -r $(CURDIR)/licenses $(DOC)
mkdir -p $(SCRIPTS) && \
cp $(CURDIR)/seastar/dpdk/tools/dpdk_nic_bind.py $(SCRIPTS)
cp $(CURDIR)/seastar/scripts/dpdk_nic_bind.py $(SCRIPTS)
cp $(CURDIR)/seastar/scripts/posix_net_conf.sh $(SCRIPTS)
cp $(CURDIR)/dist/common/scripts/* $(SCRIPTS)
cp $(CURDIR)/dist/ubuntu/scripts/* $(SCRIPTS)


@@ -1,19 +0,0 @@
#!/bin/bash -e
args="--log-to-syslog 1 --log-to-stdout 0 --default-log-level info $SCYLLA_ARGS"
if [ "$NETWORK_MODE" = "posix" ]; then
args="$args --network-stack posix"
elif [ "$NETWORK_MODE" = "virtio" ]; then
args="$args --network-stack native"
elif [ "$NETWORK_MODE" = "dpdk" ]; then
args="$args --network-stack native --dpdk-pmd"
fi
export HOME=/var/lib/scylla
ulimit -c unlimited
ulimit -l unlimited
ulimit -n 200000
ulimit -m unlimited
ulimit -u 8096
exec sudo -E -u $USER /usr/bin/scylla $args


@@ -118,26 +118,3 @@ std::ostream& operator<<(std::ostream& out, const frozen_mutation::printer& pr)
frozen_mutation::printer frozen_mutation::pretty_printer(schema_ptr s) const {
return { *this, std::move(s) };
}
template class db::serializer<frozen_mutation>;
template<>
db::serializer<frozen_mutation>::serializer(const frozen_mutation& mutation)
: _item(mutation), _size(sizeof(uint32_t) /* size */ + mutation.representation().size()) {
}
template<>
void db::serializer<frozen_mutation>::write(output& out, const frozen_mutation& mutation) {
bytes_view v = mutation.representation();
out.write(v);
}
template<>
void db::serializer<frozen_mutation>::read(frozen_mutation& m, input& in) {
m = read(in);
}
template<>
frozen_mutation db::serializer<frozen_mutation>::read(input& in) {
return frozen_mutation(bytes_serializer::read(in));
}


@@ -67,14 +67,3 @@ public:
};
frozen_mutation freeze(const mutation& m);
namespace db {
typedef serializer<frozen_mutation> frozen_mutation_serializer;
template<> serializer<frozen_mutation>::serializer(const frozen_mutation &);
template<> void serializer<frozen_mutation>::write(output&, const type&);
template<> void serializer<frozen_mutation>::read(frozen_mutation&, input&);
template<> frozen_mutation serializer<frozen_mutation>::read(input&);
}


@@ -23,8 +23,6 @@
#include "db/schema_tables.hh"
#include "schema_mutations.hh"
template class db::serializer<frozen_schema>;
frozen_schema::frozen_schema(const schema_ptr& s)
: _data([&s] {
bytes_ostream out;
@@ -46,19 +44,7 @@ frozen_schema::frozen_schema(bytes b)
: _data(std::move(b))
{ }
template<>
db::serializer<frozen_schema>::serializer(const frozen_schema& v)
: _item(v)
, _size(db::serializer<bytes>(v._data).size())
{ }
template<>
void
db::serializer<frozen_schema>::write(output& out, const frozen_schema& v) {
db::serializer<bytes>(v._data).write(out);
}
template<>
frozen_schema db::serializer<frozen_schema>::read(input& in) {
return frozen_schema(db::serializer<bytes>::read(in));
bytes_view frozen_schema::representation() const
{
return _data;
}


@@ -30,9 +30,8 @@
// It's safe to access from another shard by const&.
class frozen_schema {
bytes _data;
private:
frozen_schema(bytes);
public:
explicit frozen_schema(bytes);
frozen_schema(const schema_ptr&);
frozen_schema(frozen_schema&&) = default;
frozen_schema(const frozen_schema&) = default;
@@ -40,14 +39,5 @@ public:
frozen_schema& operator=(frozen_schema&&) = default;
schema_ptr unfreeze() const;
friend class db::serializer<frozen_schema>;
bytes_view representation() const;
};
namespace db {
template<> serializer<frozen_schema>::serializer(const frozen_schema&);
template<> void serializer<frozen_schema>::write(output&, const frozen_schema&);
template<> frozen_schema serializer<frozen_schema>::read(input&);
extern template class serializer<frozen_schema>;
}


@@ -55,21 +55,10 @@ static const std::map<application_state, sstring> application_state_names = {
{application_state::REMOVAL_COORDINATOR, "REMOVAL_COORDINATOR"},
{application_state::INTERNAL_IP, "INTERNAL_IP"},
{application_state::RPC_ADDRESS, "RPC_ADDRESS"},
{application_state::X_11_PADDING, "X_11_PADDING"},
{application_state::SEVERITY, "SEVERITY"},
{application_state::NET_VERSION, "NET_VERSION"},
{application_state::HOST_ID, "HOST_ID"},
{application_state::TOKENS, "TOKENS"},
{application_state::X1, "X1"},
{application_state::X2, "X2"},
{application_state::X3, "X3"},
{application_state::X4, "X4"},
{application_state::X5, "X5"},
{application_state::X6, "X6"},
{application_state::X7, "X7"},
{application_state::X8, "X8"},
{application_state::X9, "X9"},
{application_state::X10, "X10"},
};
std::ostream& operator<<(std::ostream& os, const application_state& m) {


@@ -61,42 +61,4 @@ std::ostream& operator<<(std::ostream& os, const endpoint_state& x) {
return os;
}
void endpoint_state::serialize(bytes::iterator& out) const {
/* serialize the HeartBeatState */
_heart_beat_state.serialize(out);
/* serialize the map of ApplicationState objects */
int32_t app_state_size = _application_state.size();
serialize_int32(out, app_state_size);
for (auto& entry : _application_state) {
const application_state& state = entry.first;
const versioned_value& value = entry.second;
serialize_int32(out, int32_t(state));
value.serialize(out);
}
}
endpoint_state endpoint_state::deserialize(bytes_view& v) {
heart_beat_state hbs = heart_beat_state::deserialize(v);
endpoint_state es = endpoint_state(hbs);
int32_t app_state_size = read_simple<int32_t>(v);
for (int32_t i = 0; i < app_state_size; ++i) {
auto state = static_cast<application_state>(read_simple<int32_t>(v));
auto value = versioned_value::deserialize(v);
es.add_application_state(state, value);
}
return es;
}
size_t endpoint_state::serialized_size() const {
long size = _heart_beat_state.serialized_size();
size += serialize_int32_size;
for (auto& entry : _application_state) {
const versioned_value& value = entry.second;
size += serialize_int32_size;
size += value.serialized_size();
}
return size;
}
}


@@ -81,6 +81,14 @@ public:
, _is_alive(true) {
}
endpoint_state(heart_beat_state&& initial_hb_state,
const std::map<application_state, versioned_value>& application_state)
: _heart_beat_state(std::move(initial_hb_state))
,_application_state(application_state)
, _update_timestamp(clk::now())
, _is_alive(true) {
}
heart_beat_state& get_heart_beat_state() {
return _heart_beat_state;
}
@@ -141,13 +149,6 @@ public:
}
friend std::ostream& operator<<(std::ostream& os, const endpoint_state& x);
// The following replaces EndpointStateSerializer from the Java code
void serialize(bytes::iterator& out) const;
static endpoint_state deserialize(bytes_view& v);
size_t serialized_size() const;
};
} // gms


@@ -65,7 +65,7 @@ private:
// because everyone seems pretty accustomed to the default of 8, and users who have
// already tuned their phi_convict_threshold for their own environments won't need to
// change.
static constexpr double PHI_FACTOR{1.0 / std::log(10.0)};
static constexpr double PHI_FACTOR{M_LOG10El};
public:
arrival_window(int size)
@@ -102,7 +102,8 @@ private:
// because everyone seems pretty accustomed to the default of 8, and users who have
// already tuned their phi_convict_threshold for their own environments won't need to
// change.
static constexpr double PHI_FACTOR{1.0 / std::log(10.0)}; // 0.434...
static constexpr double PHI_FACTOR{M_LOG10El};
std::map<inet_address, arrival_window> _arrival_samples;
std::list<i_failure_detection_event_listener*> _fd_evnt_listeners;
double _phi = 8;


@@ -97,50 +97,6 @@ public:
friend inline std::ostream& operator<<(std::ostream& os, const gossip_digest& d) {
return os << d._endpoint << ":" << d._generation << ":" << d._max_version;
}
// The following replaces GossipDigestSerializer from the Java code
void serialize(bytes::iterator& out) const {
_endpoint.serialize(out);
serialize_int32(out, _generation);
serialize_int32(out, _max_version);
}
static gossip_digest deserialize(bytes_view& v) {
auto endpoint = inet_address::deserialize(v);
auto generation = read_simple<int32_t>(v);
auto max_version = read_simple<int32_t>(v);
return gossip_digest(endpoint, generation, max_version);
}
size_t serialized_size() const {
return _endpoint.serialized_size() + serialize_int32_size + serialize_int32_size;
}
}; // class gossip_digest
// serialization helper for std::vector<gossip_digest>
class gossip_digest_serialization_helper {
public:
static void serialize(bytes::iterator& out, const std::vector<gossip_digest>& digests) {
serialize_int32(out, int32_t(digests.size()));
for (auto& digest : digests) {
digest.serialize(out);
}
}
static std::vector<gossip_digest> deserialize(bytes_view& v) {
int32_t size = read_simple<int32_t>(v);
std::vector<gossip_digest> digests;
for (int32_t i = 0; i < size; ++i)
digests.push_back(gossip_digest::deserialize(v));
return digests;
}
static size_t serialized_size(const std::vector<gossip_digest>& digests) {
size_t size = serialize_int32_size;
for (auto& digest : digests)
size += digest.serialized_size();
return size;
}
};
} // namespace gms


@@ -54,44 +54,4 @@ std::ostream& operator<<(std::ostream& os, const gossip_digest_ack& ack) {
return os << "}";
}
void gossip_digest_ack::serialize(bytes::iterator& out) const {
// 1) Digest
gossip_digest_serialization_helper::serialize(out, _digests);
// 2) Map size
serialize_int32(out, int32_t(_map.size()));
// 3) Map contents
for (auto& entry : _map) {
const inet_address& ep = entry.first;
const endpoint_state& st = entry.second;
ep.serialize(out);
st.serialize(out);
}
}
gossip_digest_ack gossip_digest_ack::deserialize(bytes_view& v) {
// 1) Digest
std::vector<gossip_digest> _digests = gossip_digest_serialization_helper::deserialize(v);
// 2) Map size
int32_t map_size = read_simple<int32_t>(v);
// 3) Map contents
std::map<inet_address, endpoint_state> _map;
for (int32_t i = 0; i < map_size; ++i) {
inet_address ep = inet_address::deserialize(v);
endpoint_state st = endpoint_state::deserialize(v);
_map.emplace(std::move(ep), std::move(st));
}
return gossip_digest_ack(std::move(_digests), std::move(_map));
}
size_t gossip_digest_ack::serialized_size() const {
size_t size = gossip_digest_serialization_helper::serialized_size(_digests);
size += serialize_int32_size;
for (auto& entry : _map) {
const inet_address& ep = entry.first;
const endpoint_state& st = entry.second;
size += ep.serialized_size() + st.serialized_size();
}
return size;
}
} // namespace gms


@@ -72,13 +72,6 @@ public:
return _map;
}
// The following replaces GossipDigestAckSerializer from the Java code
void serialize(bytes::iterator& out) const;
static gossip_digest_ack deserialize(bytes_view& v);
size_t serialized_size() const;
friend std::ostream& operator<<(std::ostream& os, const gossip_digest_ack& ack);
};


@@ -49,39 +49,4 @@ std::ostream& operator<<(std::ostream& os, const gossip_digest_ack2& ack2) {
return os << "}";
}
void gossip_digest_ack2::serialize(bytes::iterator& out) const {
// 1) Map size
serialize_int32(out, int32_t(_map.size()));
// 2) Map contents
for (auto& entry : _map) {
const inet_address& ep = entry.first;
const endpoint_state& st = entry.second;
ep.serialize(out);
st.serialize(out);
}
}
gossip_digest_ack2 gossip_digest_ack2::deserialize(bytes_view& v) {
// 1) Map size
int32_t map_size = read_simple<int32_t>(v);
// 2) Map contents
std::map<inet_address, endpoint_state> _map;
for (int32_t i = 0; i < map_size; ++i) {
inet_address ep = inet_address::deserialize(v);
endpoint_state st = endpoint_state::deserialize(v);
_map.emplace(std::move(ep), std::move(st));
}
return gossip_digest_ack2(std::move(_map));
}
size_t gossip_digest_ack2::serialized_size() const {
size_t size = serialize_int32_size;
for (auto& entry : _map) {
const inet_address& ep = entry.first;
const endpoint_state& st = entry.second;
size += ep.serialized_size() + st.serialized_size();
}
return size;
}
} // namespace gms


@@ -69,13 +69,6 @@ public:
return _map;
}
// The following replaces GossipDigestAck2Serializer from the Java code
void serialize(bytes::iterator& out) const;
static gossip_digest_ack2 deserialize(bytes_view& v);
size_t serialized_size() const;
friend std::ostream& operator<<(std::ostream& os, const gossip_digest_ack2& ack2);
};


@@ -50,22 +50,4 @@ std::ostream& operator<<(std::ostream& os, const gossip_digest_syn& syn) {
return os << "}";
}
void gossip_digest_syn::serialize(bytes::iterator& out) const {
serialize_string(out, _cluster_id);
serialize_string(out, _partioner);
gossip_digest_serialization_helper::serialize(out, _digests);
}
gossip_digest_syn gossip_digest_syn::deserialize(bytes_view& v) {
sstring cluster_id = read_simple_short_string(v);
sstring partioner = read_simple_short_string(v);
std::vector<gossip_digest> digests = gossip_digest_serialization_helper::deserialize(v);
return gossip_digest_syn(cluster_id, partioner, std::move(digests));
}
size_t gossip_digest_syn::serialized_size() const {
return serialize_string_size(_cluster_id) + serialize_string_size(_partioner) +
gossip_digest_serialization_helper::serialized_size(_digests);
}
} // namespace gms


@@ -72,17 +72,18 @@ public:
return _partioner;
}
sstring get_cluster_id() const {
return cluster_id();
}
sstring get_partioner() const {
return partioner();
}
std::vector<gossip_digest> get_gossip_digests() const {
return _digests;
}
// The following replaces GossipDigestSynSerializer from the Java code
void serialize(bytes::iterator& out) const;
static gossip_digest_syn deserialize(bytes_view& v);
size_t serialized_size() const;
friend std::ostream& operator<<(std::ostream& os, const gossip_digest_syn& syn);
};


@@ -233,7 +233,7 @@ future<> gossiper::handle_ack_msg(msg_addr id, gossip_digest_ack ack_msg) {
}
void gossiper::init_messaging_service_handler() {
- ms().register_echo([] {
+ ms().register_gossip_echo([] {
return smp::submit_to(0, [] {
auto& gossiper = gms::get_local_gossiper();
gossiper.set_last_processed_message_at();
@@ -279,7 +279,7 @@ void gossiper::init_messaging_service_handler() {
void gossiper::uninit_messaging_service_handler() {
auto& ms = net::get_local_messaging_service();
- ms.unregister_echo();
+ ms.unregister_gossip_echo();
ms.unregister_gossip_shutdown();
ms.unregister_gossip_digest_syn();
ms.unregister_gossip_digest_ack2();
@@ -409,8 +409,14 @@ future<> gossiper::apply_state_locally(const std::map<inet_address, endpoint_sta
// Runs inside seastar::async context
void gossiper::remove_endpoint(inet_address endpoint) {
// do subscribers first so anything in the subscriber that depends on gossiper state won't get confused
- _subscribers.for_each([endpoint] (auto& subscriber) {
-     subscriber->on_remove(endpoint);
+ // We can not run on_remove callbacks here because on_remove in
+ // storage_service might take the gossiper::timer_callback_lock
+ seastar::async([this, endpoint] {
+     _subscribers.for_each([endpoint] (auto& subscriber) {
+         subscriber->on_remove(endpoint);
+     });
+ }).handle_exception([] (auto ep) {
+     logger.warn("Fail to call on_remove callback: {}", ep);
});
if(_seeds.count(endpoint)) {
@@ -478,7 +484,7 @@ void gossiper::do_status_check() {
}
void gossiper::run() {
- seastar::async([this, g = this->shared_from_this()] {
+ _callback_running = seastar::async([this, g = this->shared_from_this()] {
logger.trace("=== Gossip round START");
//wait on messaging service to start listening
@@ -588,7 +594,10 @@ void gossiper::run() {
logger.trace("ep={}, eps={}", x.first, x.second);
}
}
- _scheduled_gossip_task.arm(INTERVAL);
+ if (_enabled) {
+     _scheduled_gossip_task.arm(INTERVAL);
+ }
return make_ready_future<>();
});
}
@@ -662,8 +671,7 @@ void gossiper::convict(inet_address endpoint, double phi) {
return;
}
auto& state = it->second;
- // FIXME: Add getGossipStatus
- // logger.debug("Convicting {} with status {} - alive {}", endpoint, getGossipStatus(epState), state.is_alive());
+ logger.debug("Convicting {} with status {} - alive {}", endpoint, get_gossip_status(state), state.is_alive());
if (!state.is_alive()) {
return;
}
@@ -1049,7 +1057,7 @@ void gossiper::mark_alive(inet_address addr, endpoint_state& local_state) {
msg_addr id = get_msg_addr(addr);
logger.trace("Sending a EchoMessage to {}", id);
auto ok = make_shared<bool>(false);
- ms().send_echo(id).then_wrapped([this, id, ok] (auto&& f) mutable {
+ ms().send_gossip_echo(id).then_wrapped([this, id, ok] (auto&& f) mutable {
try {
f.get();
logger.trace("Got EchoMessage Reply");
@@ -1113,7 +1121,7 @@ void gossiper::handle_major_state_change(inet_address ep, const endpoint_state&
logger.info("Node {} is now part of the cluster", ep);
}
}
- logger.trace("Adding endpoint state for {}", ep);
+ logger.trace("Adding endpoint state for {}, status = {}", ep, get_gossip_status(eps));
endpoint_state_map[ep] = eps;
auto& ep_state = endpoint_state_map.at(ep);
@@ -1430,12 +1438,13 @@ future<> gossiper::do_stop_gossiping() {
return make_ready_future<>();
}).get();
}
- // FIXME: Integer.getInteger("cassandra.shutdown_announce_in_ms", 2000)
- sleep(INTERVAL * 2).get();
+ auto& cfg = service::get_local_storage_service().db().local().get_config();
+ sleep(std::chrono::milliseconds(cfg.shutdown_announce_in_ms())).get();
} else {
logger.warn("No local state or state is in silent shutdown, not announcing shutdown");
}
_scheduled_gossip_task.cancel();
_callback_running.get();
get_gossiper().invoke_on_all([] (gossiper& g) {
if (engine().cpu_id() == 0) {
get_local_failure_detector().unregister_failure_detection_event_listener(&g);


@@ -99,6 +99,7 @@ private:
bool _enabled = false;
std::set<inet_address> _seeds_from_config;
sstring _cluster_name;
future<> _callback_running = make_ready_future<>();
public:
sstring get_cluster_name();
sstring get_partitioner_name();


@@ -90,22 +90,6 @@ public:
friend inline std::ostream& operator<<(std::ostream& os, const heart_beat_state& h) {
return os << "{ generation = " << h._generation << ", version = " << h._version << " }";
}
// The following replaces HeartBeatStateSerializer from the Java code
void serialize(bytes::iterator& out) const {
serialize_int32(out, _generation);
serialize_int32(out, _version);
}
static heart_beat_state deserialize(bytes_view& v) {
auto generation = read_simple<int32_t>(v);
auto version = read_simple<int32_t>(v);
return heart_beat_state(generation, version);
}
size_t serialized_size() const {
return serialize_int32_size + serialize_int32_size;
}
};
} // gms


@@ -37,6 +37,9 @@ public:
inet_address(int32_t ip)
: _addr(uint32_t(ip)) {
}
explicit inet_address(uint32_t ip)
: _addr(ip) {
}
inet_address(net::ipv4_address&& addr) : _addr(std::move(addr)) {}
const net::ipv4_address& addr() const {
@@ -57,19 +60,6 @@ public:
bool is_broadcast_address() {
return _addr == net::ipv4::broadcast_address();
}
void serialize(bytes::iterator& out) const {
int8_t inet_address_size = sizeof(inet_address);
serialize_int8(out, inet_address_size);
serialize_int32(out, _addr.ip);
}
static inet_address deserialize(bytes_view& v) {
int8_t inet_address_size = read_simple<int8_t>(v);
assert(inet_address_size == sizeof(inet_address));
return inet_address(read_simple<int32_t>(v));
}
size_t serialized_size() const {
return serialize_int8_size + serialize_int32_size;
}
friend inline bool operator==(const inet_address& x, const inet_address& y) {
return x._addr == y._addr;
}


@@ -36,6 +36,7 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "gms/versioned_value.hh"
#include "message/messaging_service.hh"
namespace gms {
@@ -52,19 +53,8 @@ constexpr const char* versioned_value::HIBERNATE;
constexpr const char* versioned_value::SHUTDOWN;
constexpr const char* versioned_value::REMOVAL_COORDINATOR;
- void versioned_value::serialize(bytes::iterator& out) const {
-     serialize_string(out, value);
-     serialize_int32(out, version);
- }
- versioned_value versioned_value::deserialize(bytes_view& v) {
-     auto value = read_simple_short_string(v);
-     auto version = read_simple<int32_t>(v);
-     return versioned_value(std::move(value), version);
- }
- size_t versioned_value::serialized_size() const {
-     return serialize_string_size(value) + serialize_int32_size;
- }
+ versioned_value versioned_value::factory::network_version() {
+     return versioned_value(sprint("%s", net::messaging_service::current_version));
+ }
}

Some files were not shown because too many files have changed in this diff.