8493 Commits

Author SHA1 Message Date
Glauber Costa
e885eacbe4 column_family: do not open code generation calculation
We already have a function that wraps this, re-use it.  This FIXME is still
relevant, so just move it there. Let's not lose it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit 94e90d4a17)
2016-03-14 15:51:06 +02:00
Glauber Costa
3f67277804 column_family: remove mutation_count
We use memory usage as a threshold these days, and nowhere is _mutation_count
checked. Get rid of it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit 46fdeec60a)
2016-03-14 15:50:57 +02:00
Asias He
05aea2b65a storage_service: Fix pending_range_calculator_service
Since calculate_pending_ranges will modify token_metadata, we need to
replicate to other shards. With this patch, when we call
calculate_pending_ranges, token_metadata will be replicated to other
non-zero shards.

In addition, it is not useful as a standalone class. We can merge it
into the storage_service. Kill one singleton class.

Fixes #1033
Refs #962
Message-Id: <fb5b26311cafa4d315eb9e72d823c5ade2ab4bda.1457943074.git.asias@scylladb.com>

(cherry picked from commit 9f64c36a08)
2016-03-14 14:39:39 +02:00
Vlad Zolotarov
a2751a9592 sstables: properly account removal requests
The same shard may create an sstables::sstable object for the same SStable
that doesn't belong to it more than once and mark it
for deletion (e.g. in a 'nodetool refresh' flow).

In that case the destructor of sstables::sstable counted the deletion
requests from the same shard more than once, because it used a simple
counter that was incremented on every deletion request, whereas requests
from the same shard should count as a single request. This matters because
the removal logic waits for all shards to agree on the removal of a specific
SStable: it compares the counter mentioned above to the total number of
shards, and once they are equal the SStable files are actually removed.

This patch fixes this by replacing the counter with a std::unordered_set<unsigned>
that stores the ids of the shards requesting the deletion
of the sstable object, and by comparing the size() of this set
to smp::count in order to decide whether to actually delete the corresponding
SStable files.

Fixes #1004

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1457886812-32345-1-git-send-email-vladz@cloudius-systems.com>
(cherry picked from commit ce47fcb1ba)
2016-03-14 14:38:17 +02:00
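
A minimal sketch of the set-based accounting described in the commit above. The class and member names are hypothetical and only illustrate the idea; shard_count stands in for smp::count, and this is not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical tracker (illustrative names, not Scylla's implementation).
class deletion_tracker {
    std::unordered_set<unsigned> _shards;  // ids of shards that asked for deletion
    std::size_t _shard_count;              // stands in for smp::count
public:
    explicit deletion_tracker(std::size_t shard_count) : _shard_count(shard_count) {
        assert(shard_count > 0);
    }

    // Records a deletion request from `shard`. Repeated requests from the
    // same shard are stored once, so they cannot inflate the total.
    // Returns true once every shard has asked for the deletion.
    bool request_deletion(unsigned shard) {
        assert(shard < _shard_count);
        _shards.insert(shard);
        return _shards.size() == _shard_count;
    }

    std::size_t distinct_requests() const { return _shards.size(); }
};
```

With a plain counter, two requests from shard 0 on a two-shard node would already look like agreement; with the set, the second request is a no-op.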
Raphael S. Carvalho
eda8732b8e sstables: make write_simple() safer by using exclusive flag
We should guarantee that write_simple() will not try to overwrite
an existing file.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <194bd055f1f2dc1bb9766a67225ec38c88e7b005.1457818073.git.raphaelsc@scylladb.com>
(cherry picked from commit 1ff7d32272)
2016-03-14 14:38:07 +02:00
Raphael S. Carvalho
b24f5ece1f sstables: fix race condition when writing to the same sstable in parallel
When we are about to write a new sstable, we check if the sstable exists
by checking if respective TOC exists. That check was added to handle a
possible attempt to write a new sstable with a generation being used.
Gleb was worried that a TOC could appear after the check, and that's indeed
possible if there is an ongoing sstable write that uses the same generation
(running in parallel).
If a TOC appears after the check, we would again clobber an existing sstable with
a temporary TOC, and the user wouldn't be able to boot scylla anymore without manual
intervention.

Then Nadav proposed the following solution:
"We could do this by the following variant of Raphael's idea:

   1. create .txt.tmp unconditionally, as before the commit 031bf57c1
(if we can't create it, fail).
   2. Now confirm that .txt does not exist. If it does, delete the .txt.tmp
we just created and fail.
   3. continue as usual
   4. and at the end, as before, rename .txt.tmp to .txt.

The key to solving the race is step 1: Since we created .txt.tmp in step 1
and know this creation succeeded, we know that we cannot be running in
parallel with another writer - because such a writer too would have tried to
create the same file, and kept it existing until the very last step of its
work (step 4)."

This patch implements the solution described above.
Let me also say that the race is theoretical and scylla wasn't affected by
it so far.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <ef630f5ac1bd0d11632c343d9f77a5f6810d18c1.1457818331.git.raphaelsc@scylladb.com>
(cherry picked from commit 0af786f3ea)
2016-03-14 14:37:58 +02:00
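
The four steps quoted in the commit above can be sketched with plain POSIX calls. This is an illustration under assumptions: write_new_file is a hypothetical helper, it uses blocking I/O rather than Seastar's file API, and it omits fsync and directory syncing.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>    // std::rename
#include <cstring>   // std::strerror
#include <stdexcept>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper, not Scylla's code. Writes `contents` to `final_path`
// through `final_path + ".tmp"`, following steps 1-4.
void write_new_file(const std::string& final_path, const std::string& contents) {
    const std::string tmp_path = final_path + ".tmp";

    // Step 1: create the temporary file. O_EXCL makes open() fail with
    // EEXIST if another writer has already created it.
    int fd = ::open(tmp_path.c_str(), O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) {
        const int err = errno;
        throw std::runtime_error("cannot create " + tmp_path + ": " + std::strerror(err));
    }

    // Step 2: the final file must not exist; otherwise undo step 1 and fail.
    if (::access(final_path.c_str(), F_OK) == 0) {
        ::close(fd);
        ::unlink(tmp_path.c_str());
        throw std::runtime_error(final_path + " already exists");
    }

    // Step 3: continue as usual, here by writing the payload.
    const ssize_t written = ::write(fd, contents.data(), contents.size());
    ::close(fd);
    if (written != static_cast<ssize_t>(contents.size())) {
        ::unlink(tmp_path.c_str());
        throw std::runtime_error("short write to " + tmp_path);
    }

    // Step 4: publish the file by renaming the temporary to its final name.
    if (std::rename(tmp_path.c_str(), final_path.c_str()) != 0) {
        ::unlink(tmp_path.c_str());
        throw std::runtime_error("cannot rename " + tmp_path);
    }
}
```

The exclusive create in step 1 is what serializes writers: a second writer for the same name fails at open() and never reaches the check in step 2.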
Raphael S. Carvalho
1322ec6d6b sstables: bail out if toc exists for generation used by write_components
Currently, if sstable::write_components() is called to write a new sstable
using the generation of an sstable that already exists, a temporary TOC is
created unconditionally. Afterwards, the same sstable::write_components()
call fails when it reaches sstable::create_data(), because the data
component already exists for that generation (in this scenario).
After that, the user cannot boot scylla anymore because there is
a generation with both a TOC and a temporary TOC. We cannot simply remove a
generation with a TOC and a temporary TOC because user data would be lost (again,
in this scenario). After all, the temporary TOC was only created because
sstable::write_components() was wrongly called with the generation of an
sstable that exists.

The solution proposed by this patch is to throw an exception if a TOC file
exists for the generation used.

Some SSTable unit tests were also changed to guarantee that we don't try
to overwrite components of an existing sstable.

Refs #1014.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <caffc4e19cdcf25e4c6b9dd277d115422f8246c4.1457643565.git.raphaelsc@scylladb.com>
(cherry picked from commit 031bf57c19)
2016-03-14 14:37:50 +02:00
Glauber Costa
efbf51c00b sstables: improve error messages
The standard C++ exceptions thrown when anything goes wrong while writing the
file carry suboptimal messages: they barely tell us the name of the failing
file.

Use a specialized create function so that we can capture that better.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit f2a8bcabc2)
2016-03-14 14:37:41 +02:00
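
A sketch of what a create function that names the failing file might look like. create_file_or_throw is a hypothetical name, and the sketch uses POSIX open() and std::system_error rather than whatever Scylla's specialized function does internally.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <system_error>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical wrapper: creates `path` for writing and returns the descriptor.
// On failure, the thrown std::system_error carries both the errno value and
// the name of the file that could not be created.
int create_file_or_throw(const std::string& path) {
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) {
        const int err = errno;  // save before any call that might change it
        throw std::system_error(err, std::generic_category(),
                                "failed to create " + path);
    }
    return fd;
}
```

The what() string of the exception then starts with "failed to create" followed by the path, so a log line identifies the file without extra context from the caller.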
Pekka Enberg
5d901b19c4 main: Initialize system keyspace earlier
We start services like gossiper before system keyspace is initialized
which means we can start writing too early. Shuffle code so that system
keyspace is initialized earlier.

Refs #1014
Message-Id: <1457593758-9444-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit 5dd1fda6cf)
2016-03-14 13:47:18 +02:00
Tomasz Grabiec
7085fc95d1 log: Fix operator<<(std::ostream&, const std::exception_ptr&)
Attempting to print a std::nested_exception currently causes an exception
to leak outside the printer. Fix by catching all exceptions in the
final catch block.

For a nested exception, the logger will now print just
"std::nested_exception".  For nested exceptions specifically we should
log more, but that is a separate problem to solve.
Message-Id: <1457532215-7498-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit 838a038cbd)
2016-03-09 16:11:14 +02:00
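
A minimal sketch of a printer with a final catch-all, as the commit above describes. The function name describe and the placeholder strings are assumptions; the real logger prints the exception's type name, which this sketch does not attempt.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical formatter: turns an exception_ptr into text and never lets
// the rethrown exception escape.
std::string describe(const std::exception_ptr& eptr) {
    if (!eptr) {
        return "<no exception>";
    }
    try {
        std::rethrow_exception(eptr);
    } catch (const std::exception& e) {
        return std::string("exception: ") + e.what();
    } catch (...) {
        // Anything not derived from std::exception ends up here, for
        // example a bare std::nested_exception.
        return "<unknown exception>";
    }
}
```

Without the catch (...) clause, rethrowing a type that does not derive from std::exception would propagate out of the formatter.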
Pekka Enberg
776908fbf6 types: Implement to_string for timestamps and dates
The to_string() function is used for logging purposes, so use Boost's
to_iso_extended_string() to format both timestamps and dates.

Fixes #968 (showstopper)
Message-Id: <1457528755-6164-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit ab502bcfa8)
2016-03-09 16:10:02 +02:00
Gleb Natapov
5f7f276ef6 fix EACH_QUORUM handling during bootstrapping
Currently, write acknowledgement handling does not take the bootstrapping
node into account for CL=EACH_QUORUM. The patch fixes it.

Fixes #994

Message-Id: <20160307121620.GR2253@scylladb.com>
(cherry picked from commit 626c9d046b)
2016-03-08 13:35:10 +02:00
Paweł Dziepak
5a38f3cbfd lsa: set _active to nullptr in region destructor
In the region destructor, after the active segment is freed, the pointer to
it is left unchanged. This confuses the remaining parts of the destructor
logic (namely, removal from the region group) which may rely on the
information in region_impl::_active.

In this particular case the problem was that the code removing the region
from the region group called region_impl::occupancy(), which
dereferences _active if it is not null.

Fixes #993.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1457341670-18266-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 99b61d3944)
2016-03-08 13:32:30 +02:00
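
The hazard in the commit above can be shown with a tiny stand-in class. region_sketch and segment are hypothetical simplifications; only the _active and occupancy() names come from the commit message.

```cpp
#include <cassert>
#include <cstddef>

struct segment {
    std::size_t used;
    explicit segment(std::size_t u) : used(u) {}
};

// Hypothetical stand-in for a region that owns one active segment.
class region_sketch {
    segment* _active;
public:
    region_sketch() : _active(new segment(128)) {}
    region_sketch(const region_sketch&) = delete;
    region_sketch& operator=(const region_sketch&) = delete;

    // Dereferences _active only when it is not null.
    std::size_t occupancy() const { return _active ? _active->used : 0; }

    void free_active() {
        delete _active;
        _active = nullptr;  // without this, occupancy() would read freed memory
    }

    ~region_sketch() {
        free_active();
        // Later teardown steps may still query occupancy(); it is safe now.
        assert(occupancy() == 0);
    }
};
```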
Tomasz Grabiec
2d4309a926 validation: Fix validation of empty partition key
The validation wrongly assumed that an empty thrift key, which
the original C* code guards against, can only correspond to an empty
representation of our partition_key. This no longer holds after:

   commit 095efd01d6
   "keys: Make from_exploded() and components() work without schema"

This was responsible for dtest failure:
cql_additional_tests.TestCQL:column_name_validation_test

(cherry picked from commit 100b540a53)
2016-03-08 11:42:14 +02:00
Tomasz Grabiec
988d6cd153 cql3: Fix handling of lists with static columns
List operations and prefetching were not handling static columns
correctly. One issue was that prefetching was attaching static column
data to row data using ids which might overlap with clustered columns.

Another problem was that list operations were always constructing
clustering key even if they worked on a static column. For static
columns the key would be always empty and lookup would fail.

The effect was that list operations which depend on current state had
no effect. Similar problem could be observed on C* 2.1.9, but not on 2.2.3.

Fixes #903.

(cherry picked from commit 383296c05b)
2016-03-06 11:06:03 +02:00
Pekka Enberg
bf71575fd7 release: prepare for 0.18.1 scylla-0.18.1 2016-03-05 08:53:07 +02:00
Gleb Natapov
cd75075214 storage_proxy: fix race between read cl completion and timeout in digest resolver
If a timeout happens after the cl promise is fulfilled but before the
continuation runs, it removes all the data that the cl continuation needs
to calculate the result. Fix this by calculating the result immediately and
returning it in the cl promise instead of delaying this work until the
continuation runs. This has a nice side effect of simplifying digest
mismatch handling and making it exception free.

Fixes #977.

Message-Id: <1457015870-2106-3-git-send-email-gleb@scylladb.com>
(cherry picked from commit b89b6f442b)
2016-03-03 17:10:38 +02:00
Gleb Natapov
e85f11566b storage_proxy: store only one data reply in digest resolver.
Read executor may ask for more than one data reply during digest
resolving stage, but only one result is actually needed to satisfy
a query, so no need to store all of them.

Message-Id: <1457015870-2106-2-git-send-email-gleb@scylladb.com>
(cherry picked from commit e4ac5157bc)
2016-03-03 17:10:32 +02:00
Gleb Natapov
8f682f018e storage_proxy: fix cl achieved condition in digest resolver timeout handler
In the digest resolver, for cl to be achieved it is not enough to get the
correct number of replies; a data reply must also be among them. The condition
in the digest timeout handler does not check that. Fortunately, we have a
variable that we set to true when cl is achieved, so use it instead.

Message-Id: <1457015870-2106-1-git-send-email-gleb@scylladb.com>
(cherry picked from commit 69b61b81ce)
2016-03-03 17:10:26 +02:00
Tomasz Grabiec
dba2b617e7 db: Fix error handling in populate_keyspace()
When find_uuid() fails Scylla would terminate with:

  Exiting on unhandled exception of type 'std::out_of_range': _Map_base::at

But we are supposed to ignore directories for unknown column
families. The try {} catch block is doing just that when
no_such_column_family is thrown from the find_column_family() call
which follows find_uuid(). Fix by converting std::out_of_range to
no_such_column_family.

Message-Id: <1456056280-3933-1-git-send-email-tgrabiec@scylladb.com>
2016-03-03 11:37:26 +02:00
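
A sketch of the exception conversion described in the commit above. catalog_sketch, find_id and try_populate are hypothetical; only the idea of mapping std::out_of_range from a map lookup to a domain exception that the caller already handles is taken from the commit message.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for the domain exception named in the commit message.
struct no_such_column_family : std::runtime_error {
    explicit no_such_column_family(const std::string& name)
        : std::runtime_error("no such column family: " + name) {}
};

class catalog_sketch {
    std::map<std::string, int> _ids;  // column family name -> id
public:
    void add(std::string name, int id) { _ids[std::move(name)] = id; }

    // std::map::at() throws std::out_of_range for a missing key; convert it
    // so callers only have to handle one exception type.
    int find_id(const std::string& name) const {
        try {
            return _ids.at(name);
        } catch (const std::out_of_range&) {
            throw no_such_column_family(name);
        }
    }
};

// Caller that ignores directories of unknown column families.
bool try_populate(const catalog_sketch& c, const std::string& dir_name) {
    try {
        c.find_id(dir_name);
        return true;
    } catch (const no_such_column_family&) {
        return false;  // unknown column family: skip instead of terminating
    }
}
```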
Paweł Dziepak
f4e11007cf Revert "do not use boost::multiprecision::msb()"
This reverts commit dadd097f9c.

That commit caused serialized forms of varint and decimal to have some
excess leading zeros. They didn't affect deserialization in any way but
caused computed tokens to differ from the Cassandra ones.

Fixes #898.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1455537278-20106-1-git-send-email-pdziepak@scylladb.com>
2016-03-03 10:54:19 +02:00
Asias He
fdfa1df395 locator: Fix get token from a range<token>
With a range{t1, t2}, if t2 == {}, the range.end() will contain no
value. Fix getting t2 in this case.

Fixes #911.
Message-Id: <4462e499d706d275c03b116c4645e8aaee7821e1.1456128310.git.asias@scylladb.com>
2016-03-03 10:53:21 +02:00
Tomasz Grabiec
116055cc6f bytes_ostream: Avoid recursion when freeing chunks
When there are a lot of chunks, we may get a stack overflow.

This seems to fix issue #906, a memory corruption during schema
merge. I suspect that what causes corruption there is overflowing of
the stack allocated for the seastar thread. Those stacks don't have
red zones which would catch overflow.

Message-Id: <1456056288-3983-1-git-send-email-tgrabiec@scylladb.com>
2016-03-03 10:53:01 +02:00
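
The recursion mentioned in the commit above is easy to reproduce with any singly linked chain of owning pointers: destroying the head destroys its next, which destroys its next, and so on, one stack frame per chunk. Below is a minimal sketch of an iterative alternative; chunk and chunk_list are hypothetical types, not the bytes_ostream implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

struct chunk {
    std::unique_ptr<chunk> next;
    char data[16];
};

// Hypothetical list of chunks that frees them in a loop.
class chunk_list {
    std::unique_ptr<chunk> _head;
    std::size_t _size = 0;
public:
    chunk_list() = default;
    chunk_list(const chunk_list&) = delete;
    chunk_list& operator=(const chunk_list&) = delete;

    void push_front() {
        auto c = std::make_unique<chunk>();
        c->next = std::move(_head);
        _head = std::move(c);
        ++_size;
    }

    std::size_t size() const { return _size; }

    // Frees chunks one at a time and returns how many were freed. Each chunk
    // is detached from its successor before it is destroyed, so its
    // destructor never recurses into the rest of the chain.
    std::size_t clear() {
        std::size_t freed = 0;
        while (_head) {
            std::unique_ptr<chunk> next = std::move(_head->next);
            _head = std::move(next);  // old head destroyed here, its next is null
            ++freed;
        }
        _size = 0;
        return freed;
    }

    ~chunk_list() { clear(); }
};
```

Stack usage stays constant regardless of the number of chunks, which matters on small stacks such as the seastar thread stacks the commit refers to.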
Calle Wilund
04c19344de database: Fix use and assumptions about pending compactions
Fixes #934 - faulty assert in discard_sstables

run_with_compaction_disabled clears out a CF from the compaction
manager queue. discard_sstables wants to assert on this, but looks
at the wrong counters.

pending_compactions is an indicator on how much interested parties
want a CF compacted (again and again). It should not be considered
an indicator of compactions actually being done.

This modifies the usage slightly so that:
1.) The counter is always incremented, even if compaction is disallowed.
    The counter's value at the end of run_with_compaction_disabled is then
    instead used as an indicator as to whether a compaction should be
    re-triggered. (If compactions finished, it will be zero)
2.) Document the use and purpose of the pending counter, and add
    a method to re-add the CF to compaction for r_w_c_d above.
3.) discard_sstables now asserts on the right things.

Message-Id: <1456332824-23349-1-git-send-email-calle@scylladb.com>
2016-03-03 10:51:27 +02:00
Raphael S. Carvalho
df19e546f9 tests: sstable_test: submit compaction request through column family
That's needed for reverted commit 9586793c to work. It's also the
correct thing to do, i.e. column family submits itself to manager.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <2a1d141ad929c1957933f57412083dd52af0390b.1456415398.git.raphaelsc@scylladb.com>
2016-03-03 10:51:23 +02:00
Takuya ASADA
b532919c55 dist: add posix_net_conf.sh on Ubuntu package
Fixes #881

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455522990-32044-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit fb3f4cc148)
scylla-0.18
2016-02-15 17:03:10 +02:00
Takuya ASADA
6ae6dcc2fc dist: switch AMI base image to 'CentOS7-Base2', uses CentOS official kernel
The previous CentOS base image accidentally used a non-standard kernel from elrepo.
This replaces the base image with a new one that contains the CentOS default kernel.

Fixes #890

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455398903-2865-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 3697cee76d)
2016-02-15 15:59:04 +02:00
Tomasz Grabiec
5716140a14 abstract_replication_strategy: Fix generation of token ranges
We can't move-from in the loop because the subject will be empty in
all but the first iteration.

Fixes crash during node startup:

  "Exiting on unhandled exception of type 'runtime_exception': runtime error: Invalid token. Should have size 8, has size 0"

Fixes update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_node_1_test (and probably others)

Signed-off-by: Tomasz Grabiec <tgrabiec@scylladb.com>
(cherry picked from commit efdbc3d6d7)
2016-02-14 14:39:31 +02:00
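
The bug class in the commit above, moving from the same object on every loop iteration, can be illustrated as follows. The token alias and assign_token function are hypothetical stand-ins and do not reproduce the replication strategy code.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using token = std::string;  // stand-in type for illustration only

// Pairs the same token with every endpoint.
std::vector<std::pair<token, std::string>>
assign_token(token t, const std::vector<std::string>& endpoints) {
    std::vector<std::pair<token, std::string>> out;
    out.reserve(endpoints.size());
    for (const auto& ep : endpoints) {
        // Copy `t` on every iteration. Writing std::move(t) here would leave
        // `t` in a moved-from state after the first pass (typically empty
        // for std::string), so later entries would get an invalid token.
        out.emplace_back(t, ep);
    }
    return out;
}
```

A moved-from, empty value is consistent with the "Should have size 8, has size 0" error quoted in the commit message.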
Avi Kivity
91cb9bae2e release: prepare for 0.18 2016-02-11 17:55:20 +02:00
Shlomi Livne
f938e1d303 dist: start scylla with SCYLLA_IO
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <d93a7b41a285fcde796c5681479a328f1efac0c3.1455188901.git.shlomi@scylladb.com>
2016-02-11 17:01:03 +02:00
Shlomi Livne
5494135ddd dist: update SCYLLA_IO with params for AMI
Add setting of --num-io-queues, --max-io-requests for AMI

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <b94a63154a91c8568e194d7221b9ffc7d7813ebc.1455188901.git.shlomi@scylladb.com>
2016-02-11 17:01:02 +02:00
Shlomi Livne
5cae2560a3 dist: introduce SCYLLA_IO
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <6490d049fd23a335bb0a95cac3e8a4c08c61166e.1455188901.git.shlomi@scylladb.com>
2016-02-11 17:01:02 +02:00
Shlomi Livne
d8cdf76e70 dist: change setting of scylla home from "-d" to "-r"
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <53dcd9d1daa0194de3f889b67788d9c21d1e474d.1455188901.git.shlomi@scylladb.com>
2016-02-11 17:00:37 +02:00
Avi Kivity
3c4f67f3e6 build: require boost > 1.55
See #898.

Add checks both for boost being installed, and for the correct version.
Message-Id: <1455193574-24959-1-git-send-email-avi@scylladb.com>
2016-02-11 15:15:49 +02:00
Avi Kivity
9249d45ae1 Update scylla-ami submodule
* dist/ami/files/scylla-ami b2724be...b3b85be (1):
  > adding --stop-services
2016-02-11 12:24:17 +02:00
Avi Kivity
5834815ed9 Merge seastar upstream
* seastar 14c9991...353b1a1 (2):
  > scripts: posix_net_conf.sh: Change the way we learn NIC's IRQ numbers
  > gate: protect against calling close() more than once
2016-02-11 12:23:51 +02:00
Takuya ASADA
09b1ec6103 dist: attach ephemeral disks on AMI by default
To attach the maximum number of ephemeral disks available on the instance, specify 8.
On AMI creation, it will be reduced to the available number.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1454439628-2882-1-git-send-email-syuu@scylladb.com>
2016-02-11 12:21:09 +02:00
Takuya ASADA
16e6db42e1 dist: abandon to start scylla-server when it's disabled from AMI userdata
Support the AMI's --stop-services option: prevent scylla-server from starting (and scylla-jmx, since it depends on scylla-server)

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1454492729-11876-1-git-send-email-syuu@scylladb.com>
2016-02-11 12:21:08 +02:00
Takuya ASADA
f227b3faac dist: On AMI, mark root disk with delete_on_termination
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1454513308-12384-1-git-send-email-syuu@scylladb.com>
2016-02-11 12:19:28 +02:00
Takuya ASADA
33309f667e dist: enable enhanced networking on AMI
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1454971289-21369-1-git-send-email-syuu@scylladb.com>
2016-02-11 12:18:48 +02:00
Raphael S. Carvalho
ed61fe5831 sstables: make compaction stop report user-friendly
When scylla stopped an ongoing compaction, the event was reported
as an error. This patch introduces a specialized exception for
compaction stop so that the event can be handled appropriately.

Before:
ERROR [shard 0] compaction_manager - compaction failed: read exception:
std::runtime_error (Compaction for keyspace1/standard1 was deliberately
stopped.)

After:
INFO  [shard 0] compaction_manager - compaction info: Compaction for
keyspace1/standard1 was stopped due to shutdown.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1f85d4e5c24d23a1b4e7e0370a2cffc97cbc6d44.1455034236.git.raphaelsc@scylladb.com>
2016-02-11 12:16:53 +02:00
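
A sketch of how a dedicated exception type lets the caller choose the log level, as in the commit above. compaction_stopped_sketch and run_and_report are hypothetical names, and the returned strings only mimic the shape of the log lines quoted in the commit.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception type that marks a deliberate stop.
class compaction_stopped_sketch : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Runs `job` and returns the line that would be logged for its outcome.
template <typename Job>
std::string run_and_report(Job job) {
    try {
        job();
        return "INFO compaction finished";
    } catch (const compaction_stopped_sketch& e) {
        // Deliberate stop: informational, not an error. This handler must
        // come before the std::exception one, since the type derives from it.
        return std::string("INFO compaction info: ") + e.what();
    } catch (const std::exception& e) {
        return std::string("ERROR compaction failed: ") + e.what();
    }
}
```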
Takuya ASADA
8d8130f9c9 dist: fix typo on build_ami.sh
We should always run scylla_setup, not just for locally built rpm

Fixes #897

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455103519-13780-1-git-send-email-syuu@scylladb.com>
2016-02-11 11:56:11 +02:00
Shlomi Livne
64f8d5a50e dist: update packer location
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <3c33ea073f702e00b789930fce9befef03ad9e88.1455178900.git.shlomi@scylladb.com>
2016-02-11 11:52:56 +02:00
Avi Kivity
bfbf89ee31 Merge "Serialize keys in a form independent of in-memory representation" from Tomasz
"This series changes the on-wire definitions of keys to be of the following form:

  class partition_key {
     std::vector<bytes> exploded();
  };

Keys are therefore collections of components. The components are serialized according
to the format specified in the CQL binary protocol. No bit depends now on how we store keys in memory.

Constructing keys from components currently requires a schema reference,
which makes it impossible to deserialize or serialize the keys automatically
by RPC. To avoid those complications, compound_type was changed so that
it can be constructed and components can be iterated over without schema.
Because of this, partition_key size increased by 2 bytes."
2016-02-10 17:54:42 +02:00
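
To illustrate a key as a schema-independent collection of components, here is a round-trip sketch. The framing is an assumption: each component is written with a 16-bit big-endian length prefix, which the series description does not spell out. The bytes alias and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

using bytes = std::string;  // stand-in for Scylla's bytes type

// Assumed framing: [len_hi][len_lo][component bytes] repeated per component.
bytes serialize_components(const std::vector<bytes>& components) {
    bytes out;
    for (const bytes& c : components) {
        if (c.size() > 0xFFFF) {
            throw std::length_error("component does not fit a 16-bit length");
        }
        out.push_back(static_cast<char>((c.size() >> 8) & 0xFF));
        out.push_back(static_cast<char>(c.size() & 0xFF));
        out.append(c);
    }
    return out;
}

// Inverse of serialize_components(); needs no schema to split the buffer.
std::vector<bytes> deserialize_components(const bytes& in) {
    std::vector<bytes> out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in.size() - pos < 2) {
            throw std::runtime_error("truncated length prefix");
        }
        const std::size_t hi = static_cast<unsigned char>(in[pos]);
        const std::size_t lo = static_cast<unsigned char>(in[pos + 1]);
        const std::size_t len = (hi << 8) | lo;
        pos += 2;
        if (in.size() - pos < len) {
            throw std::runtime_error("truncated component");
        }
        out.emplace_back(in, pos, len);  // copies `len` bytes starting at `pos`
        pos += len;
    }
    return out;
}
```

Under this assumed framing a single-component key costs two extra bytes, which is consistent with the 2-byte size increase mentioned in the series description.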
Tomasz Grabiec
b74301302c tests: Add test for key serialization 2016-02-10 15:22:56 +01:00
Tomasz Grabiec
3e2c1840d8 idl: Make key definitions independent of in-memory representation 2016-02-10 15:22:56 +01:00
Tomasz Grabiec
428fce3828 compound: Optimize serialize_single() 2016-02-10 15:22:56 +01:00
Tomasz Grabiec
0cc2832a76 keys: Allow constructing from a range 2016-02-10 15:22:56 +01:00
Tomasz Grabiec
3ffcb998fb keys: Enable serialization from a range not just a vector 2016-02-10 14:35:14 +01:00
Tomasz Grabiec
095efd01d6 keys: Make from_exploded() and components() work without schema
For simplicity, we want to have keys serializable and deserializable
without schema for now. We will serialize keys in a generic form of a
vector of components where the format of components is specified by
CQL binary protocol. So conversion between keys and vector of
components needs to be possible to do without schema.

We may want to make keys schema-dependent again in the future to apply
space optimizations specific to column types. Existing code should
still pass schema& to construct and access the key when possible.

One optimization had to be reverted in this change - avoidance of
storing key length (2 bytes) for single-component partition keys. One
consequence of this, in addition to a bit larger keys, is that we can
no longer avoid copy when constructing single-component partition keys
from a ready "bytes" object.

I haven't noticed any significant performance difference in:

  tests/perf/perf_simple_query -c1 --write

It does ~130K tps on my machine.
2016-02-10 14:35:13 +01:00