8773 Commits

Author SHA1 Message Date
Raphael S. Carvalho
dcd2b85e02 sstables: fix race condition when writing to the same sstable in parallel
When we are about to write a new sstable, we check whether the sstable exists
by checking whether the respective TOC exists. That check was added to handle a
possible attempt to write a new sstable with a generation already in use.
Gleb was worried that a TOC could appear after the check, and that's indeed
possible if there is an ongoing sstable write that uses the same generation
(running in parallel).
If a TOC appears after the check, we would again clobber an existing sstable with
a temporary TOC, and the user wouldn't be able to boot scylla anymore without manual
intervention.

Then Nadav proposed the following solution:
"We could do this by the following variant of Raphael's idea:

   1. create .txt.tmp unconditionally, as before the commit 031bf57c1
(if we can't create it, fail).
   2. Now confirm that .txt does not exist. If it does, delete the .txt.tmp
we just created and fail.
   3. continue as usual
   4. and at the end, as before, rename .txt.tmp to .txt.

The key to solving the race is step 1: Since we created .txt.tmp in step 1
and know this creation succeeded, we know that we cannot be running in
parallel with another writer - because such a writer too would have tried to
create the same file, and kept it existing until the very last step of its
work (step 4)."

This patch implements the solution described above.
Let me also say that the race is theoretical and scylla wasn't affected by
it so far.
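
The four steps quoted above can be sketched roughly as follows. This is only an illustration, not the patch itself: write_toc_atomically is a hypothetical helper, and the use of O_EXCL for step 1 is an assumption about how "if we can't create it, fail" makes two parallel writers exclude each other.

```cpp
#include <cassert>
#include <filesystem>
#include <stdexcept>
#include <string>
#include <fcntl.h>
#include <unistd.h>

namespace fs = std::filesystem;

// Hypothetical helper (not Scylla's API): publishes `contents` as `toc`
// by following the four steps from the commit message.
void write_toc_atomically(const fs::path& toc, const std::string& contents) {
    fs::path tmp = toc;
    tmp += ".tmp";

    // Step 1: create the temporary TOC. O_EXCL (an assumption of this
    // sketch) makes creation fail if another writer already created it.
    int fd = ::open(tmp.c_str(), O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) {
        throw std::runtime_error("cannot create " + tmp.string());
    }

    // Step 2: the final TOC must not exist. If it does, delete the
    // temporary file we just created and fail.
    if (fs::exists(toc)) {
        ::close(fd);
        fs::remove(tmp);
        throw std::runtime_error(toc.string() + " already exists");
    }

    // Step 3: continue as usual (here: just write the contents).
    ssize_t written = ::write(fd, contents.data(), contents.size());
    ::close(fd);
    if (written != static_cast<ssize_t>(contents.size())) {
        fs::remove(tmp);
        throw std::runtime_error("short write to " + tmp.string());
    }

    // Step 4: rename the temporary TOC to its final name.
    fs::rename(tmp, toc);
}
```

A second call for the same path fails in step 2 and leaves no temporary file behind, which is the property the commit relies on.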

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <ef630f5ac1bd0d11632c343d9f77a5f6810d18c1.1457818331.git.raphaelsc@scylladb.com>
(cherry picked from commit 0af786f3ea)
2016-03-14 11:59:55 +02:00
Raphael S. Carvalho
1d1416f841 sstables: bail out if toc exists for generation used by write_components
Currently, if sstable::write_components() is called to write a new sstable
using the generation of an sstable that already exists, a temporary TOC will
be unconditionally created. Afterwards, the same sstable::write_components()
call will fail when it reaches sstable::create_data(), because the data
component already exists for that generation (in this scenario).
After that, the user will not be able to boot scylla anymore because there is
a generation with both a TOC and a temporary TOC. We cannot simply remove a
generation with a TOC and a temporary TOC because user data would be lost (again,
in this scenario). After all, the temporary TOC was only created because
sstable::write_components() was wrongly called with the generation of an
sstable that exists.

The solution proposed by this patch is to throw an exception if a TOC file
exists for the generation used.

Some SSTable unit tests were also changed to guarantee that we don't try
to overwrite components of an existing sstable.

Refs #1014.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <caffc4e19cdcf25e4c6b9dd277d115422f8246c4.1457643565.git.raphaelsc@scylladb.com>
(cherry picked from commit 031bf57c19)
2016-03-14 11:59:46 +02:00
Glauber Costa
be552139ce sstables: improve error messages
The standard C++ exception messages thrown when something goes wrong while
writing the file are suboptimal: they barely tell us the name of the failing
file.

Use a specialized create function so that we can capture that better.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit f2a8bcabc2)
2016-03-14 11:59:39 +02:00
Avi Kivity
1b45b5d649 Merge seastar upstream
* seastar 906b562...88cc232 (2):
  > reactor: fix work item leak in syscall work queue
  > rpc_test: add missing header
2016-03-14 11:16:22 +02:00
Tomasz Grabiec
7c1268765c log: Fix operator<<(std::ostream&, const std::exception_ptr&)
An attempt to print a std::nested_exception currently results in an exception
leaking outside the printer. Fix by capturing all exceptions in the
final catch block.

For a nested exception, the logger will now print just
"std::nested_exception".  For nested exceptions specifically we should
log more, but that is a separate problem to solve.
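
A minimal sketch of a printer with a final catch-all block is shown below. It is not the code from log.cc: print_exception and describe are hypothetical names, and the placeholder strings are assumptions. The point is only that no exception type can escape the printer.

```cpp
#include <cassert>
#include <exception>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical printer: rethrows the stored exception and describes it.
std::ostream& print_exception(std::ostream& out, const std::exception_ptr& eptr) {
    if (!eptr) {
        return out << "<no exception>";
    }
    try {
        std::rethrow_exception(eptr);
    } catch (const std::exception& e) {
        out << e.what();
    } catch (...) {
        // Final catch-all: anything not handled above ends up here
        // instead of propagating out of the printer.
        out << "<unknown exception>";
    }
    return out;
}

// Convenience wrapper for this sketch.
std::string describe(const std::exception_ptr& eptr) {
    std::ostringstream os;
    print_exception(os, eptr);
    return os.str();
}
```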
Message-Id: <1457532215-7498-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit 838a038cbd)
2016-03-09 16:10:27 +02:00
Pekka Enberg
9ef84d1f01 types: Implement to_string for timestamps and dates
The to_string() function is used for logging purposes, so use boost's
to_iso_extended_string() to format both timestamps and dates.

Fixes #968 (showstopper)
Message-Id: <1457528755-6164-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit ab502bcfa8)
2016-03-09 16:10:18 +02:00
Calle Wilund
4e3b98f281 lists.cc: fix update insert of frozen list
Fixes #967

Frozen lists are just atomic cells. However, old code inserted the
frozen data directly as an atomic_cell_or_collection, which in turn
meant it lacked the header data of a cell. When in turn it was
handled by internal serialization (freeze), since the schema said
it was not a (non-frozen) collection, we tried to look at frozen
list data as cell header -> most likely considered dead.
Message-Id: <1457432538-28836-1-git-send-email-calle@scylladb.com>

(cherry picked from commit 8575f1391f)
scylla-0.19
2016-03-08 15:36:29 +02:00
Pekka Enberg
124489e8d8 Update scylla-ami submodule
* dist/ami/files/scylla-ami d4a0e18...84bcd0d (1):
  > Add --ami parameter

(cherry picked from commit 81af486b69)
2016-03-08 14:10:53 +02:00
Takuya ASADA
7a2c57d6bd dist: export all entries on /etc/default/scylla-server on Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
(cherry picked from commit 18a27de3c8)
2016-03-08 14:10:46 +02:00
Takuya ASADA
10543bf81e dist: export sysconfig for scylla-io-setup.service
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
(cherry picked from commit 9ee14abf24)
2016-03-08 14:10:22 +02:00
Takuya ASADA
579a220162 Revert "Revert "dist: align ami option with others (-a --> --ami)""
This reverts commit 66c5feb9e9.

Conflicts:
	dist/common/scripts/scylla_sysconfig_setup

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
(cherry picked from commit 3d9dc52f5f)
2016-03-08 14:10:14 +02:00
Takuya ASADA
8c5ffb84ce Revert "Revert "Revert "dist: remove AMI entry from sysconfig, since there is no script refering it"""
This reverts commit 643beefc8c.

Conflicts:
	dist/common/scripts/scylla_sysconfig_setup
	dist/common/sysconfig/scylla-server

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
(cherry picked from commit c9882bc2c4)
2016-03-08 14:10:05 +02:00
Takuya ASADA
d05cdb0f6e dist: add /etc/scylla.d/io.conf on Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
(cherry picked from commit c888eaac74)
2016-03-08 14:09:58 +02:00
Gleb Natapov
df02fb7a3e fix EACH_QUORUM handling during bootstrapping
Currently write acknowledgements handling does not take bootstrapping
node into account for CL=EACH_QUORUM. The patch fixes it.

Fixes #994

Message-Id: <20160307121620.GR2253@scylladb.com>
(cherry picked from commit 626c9d046b)
2016-03-08 13:35:58 +02:00
Gleb Natapov
559a8b41f2 log: add space between log level and date in the output
It was dropped by 6dc51027a3

Message-Id: <20160306125313.GI2253@scylladb.com>
(cherry picked from commit 8dad399256)
2016-03-08 13:31:07 +02:00
Paweł Dziepak
8b1f18ee1a lsa: set _active to nullptr in region destructor
In the region destructor, after the active segment is freed, the pointer to it
is left unchanged. This confuses the remaining parts of the destructor
logic (namely, removal from region group) which may rely on the
information in region_impl::_active.

In this particular case the problem was that code removing from the
region group called region_impl::occupancy() which was
dereferencing _active if not null.

Fixes #993.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1457341670-18266-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 99b61d3944)
2016-03-08 13:30:37 +02:00
Takuya ASADA
cbbd18a249 dist: show message to use XFS for scylla data directory and also notify about developer mode, when iotune fails
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1457426286-15925-1-git-send-email-syuu@scylladb.com>
2016-03-08 12:21:02 +02:00
Pekka Enberg
4db985e505 release: prepare for 0.19
2016-03-06 13:26:44 +02:00
Takuya ASADA
6dc51027a3 log: make log.cc able to compile with g++-4.9
std::put_time() is not implemented on g++-4.9, so replace it with strftime().
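
As an illustration of the replacement, a timestamp can be formatted with strftime() into a fixed buffer. This is a sketch only: format_timestamp is a hypothetical helper and the format string is an assumption, not necessarily the one used in log.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper: formats a broken-down time without std::put_time(),
// so it also builds with standard libraries that lack it.
std::string format_timestamp(const std::tm& tm) {
    char buf[32];
    // strftime() returns the number of characters written, excluding the
    // terminating NUL, or 0 if the buffer is too small.
    std::size_t n = std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", &tm);
    return std::string(buf, n);
}
```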
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1457024183-893-1-git-send-email-syuu@scylladb.com>
2016-03-04 12:48:43 +01:00
Avi Kivity
6c2e57b003 Merge seastar upstream
* seastar ba615c7...906b562 (1):
  > rpc: prepare some more for feature negotiation
2016-03-03 18:22:57 +02:00
Gleb Natapov
b89b6f442b storage_proxy: fix race between read cl completion and timeout in digest resolver
If the timeout happens after the cl promise is fulfilled but before the
continuation runs, it removes all the data that the cl continuation needs
to calculate the result. Fix this by calculating the result immediately and
returning it in cl promise instead of delaying this work until
continuation runs. This has a nice side effect of simplifying digest
mismatch handling and making it exception free.

Fixes #977.

Message-Id: <1457015870-2106-3-git-send-email-gleb@scylladb.com>
2016-03-03 16:48:28 +02:00
Gleb Natapov
e4ac5157bc storage_proxy: store only one data reply in digest resolver.
Read executor may ask for more than one data reply during digest
resolving stage, but only one result is actually needed to satisfy
a query, so no need to store all of them.

Message-Id: <1457015870-2106-2-git-send-email-gleb@scylladb.com>
2016-03-03 16:47:53 +02:00
Gleb Natapov
69b61b81ce storage_proxy: fix cl achieved condition in digest resolver timeout handler
In the digest resolver, for cl to be achieved it is not enough to get the
correct number of replies; there must also be a data reply among them. The
condition in the digest timeout handler does not check that. Fortunately, we
have a variable that we set to true when cl is achieved, so use it instead.
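
The condition described above can be written as a small predicate. The struct and function names here are hypothetical and do not come from storage_proxy; the actual patch reuses an existing flag rather than recomputing the condition.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical view of what the digest resolver has received so far.
struct digest_replies {
    std::size_t reply_count = 0;  // data and digest replies together
    bool has_data_reply = false;  // at least one reply carried full data
};

// cl is achieved only if enough replies arrived AND one of them is a
// data reply; counting replies alone is not sufficient.
bool cl_achieved(const digest_replies& r, std::size_t required) {
    return r.reply_count >= required && r.has_data_reply;
}
```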

Message-Id: <1457015870-2106-1-git-send-email-gleb@scylladb.com>
2016-03-03 16:47:11 +02:00
Tomasz Grabiec
2abd62b5cb bytes_ostream: Drop methods which serialize integers
This will make bytes_ostream completely agnostic to serialization
format, which should be determined by layer above it.

Message-Id: <1457004221-8345-2-git-send-email-tgrabiec@scylladb.com>
2016-03-03 13:27:27 +02:00
Tomasz Grabiec
aaac2a3cec serializer: Add missing include
Message-Id: <1457004221-8345-1-git-send-email-tgrabiec@scylladb.com>
2016-03-03 13:27:22 +02:00
Pekka Enberg
9c930d88a0 db/system_keyspace: Remove ifdef'd code
We have our implementations of all the three ifdef'd functions.

Message-Id: <1456926917-12594-1-git-send-email-penberg@scylladb.com>
2016-03-03 12:26:50 +02:00
Takuya ASADA
da56325f69 configure.py: add support --static-stdc++ for seastar binaries (iotune)
The Ubuntu 14.04 LTS package is currently broken because iotune is not statically linked against libstdc++; this patch fixes that.
Requires seastar patch to add --static-stdc++ on configure.py.

Fixes #982

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456995050-22007-1-git-send-email-syuu@scylladb.com>
2016-03-03 12:18:47 +02:00
Avi Kivity
d4c92c7e27 Merge seastar upstream
* seastar b3fc7c5...ba615c7 (1):
  > configure.py: add --static-stdc++ to link libstdc++ statically
2016-03-03 12:18:23 +02:00
Asias He
01cb6b0d42 gossip: Send syn message in parallel and do not wait for it
1) As explained in commit 697b16414a (gossip: Make gossip message
handling async), in each gossip round we can talk to the 1-3
peer nodes in parallel to reduce the latency of the gossip round.

2) Gossip syn message uses one way rpc message, but now the returned
future of the one way message is ready only when message is dequeued for
some reason (sent or dropped). If we wait for the one way syn message to
return it might block the gossip round for an unbounded time. To fix, do
not wait for it in the gossip round. The downside is there will be no
back pressure to bound the syn messages, however since the messages are
once per second, I think it is fine.
Message-Id: <ea4655f121213702b3f58185378bb8899e422dd1.1456991561.git.asias@scylladb.com>
2016-03-03 11:17:50 +02:00
Takuya ASADA
e545013e47 Revert "dist: downgrade g++ to 4.9 on Ubuntu"
This reverts commit 01bd4959ac.

Fixes #983

Conflicts:
	dist/ubuntu/build_deb.sh
	dist/ubuntu/control.in
	dist/ubuntu/rules.in

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456996244-19889-1-git-send-email-syuu@scylladb.com>
2016-03-03 11:12:18 +02:00
Tomasz Grabiec
04f2482d74 schema_tables: Log results of schema merge
Currently schema changes are only logged at coordinator node which
initiates the change. It would be helpful in post-mortem analysis to
also see when and how schema changes are resolved when applied on
other nodes.
Message-Id: <1456953095-1982-1-git-send-email-tgrabiec@scylladb.com>
2016-03-03 11:12:15 +02:00
Nadav Har'El
2cf09147b5 Repair: don't use freeze() to calculate mutation checksums
Use the existing "feed_hash" mechanism to find a checksum of the
content of a mutation, instead of serializing the mutation (with freeze())
and then finding the checksum of that string.

The serialized form is more prone to future changes, and not really
guaranteed to provide equal hashes for mutations which are considered
"equal".

Fixes #971

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1456958676-27121-1-git-send-email-nyh@scylladb.com>
2016-03-03 09:58:24 +01:00
Avi Kivity
bec30ccf25 build: add order-only dependency between building antlr .o and IDL headers
This ensures that if an antlr generated .cpp file depends on an
IDL-generated .hh file, then that .hh is generated before the .o is
built.
2016-03-03 09:52:25 +02:00
Tomasz Grabiec
b42d3a90b3 cql3: create_table_statement: Sort _defined_names by text
Currently they are sorted by address in memory, which breaks the
check for column name duplicates, which assumes sorting by text.
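
To see why the sort order matters, here is a sketch of a duplicate check that relies on equal names being adjacent after sorting. The types are stand-ins (shared pointers to strings), not the cql3 identifier classes; sorting the pointers by address would not bring equal texts together, so the comparator dereferences them.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

using name_ptr = std::shared_ptr<std::string>;

// Hypothetical stand-in for the duplicate check: sort by text, then look
// for two neighbours with the same text.
bool has_duplicate_names(std::vector<name_ptr> names) {
    std::sort(names.begin(), names.end(),
              [](const name_ptr& a, const name_ptr& b) { return *a < *b; });
    auto it = std::adjacent_find(
        names.begin(), names.end(),
        [](const name_ptr& a, const name_ptr& b) { return *a == *b; });
    return it != names.end();
}
```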

Fixes #975.

Message-Id: <1456937400-20475-1-git-send-email-tgrabiec@scylladb.com>
2016-03-02 18:53:43 +02:00
Avi Kivity
dda77d14b9 Merge seastar upstream
* seastar 9964cbf...b3fc7c5 (2):
  > Introduce util/indirect.hh
  > reactor: new counters for the io queue
2016-03-02 18:52:36 +02:00
Calle Wilund
0c3322befd commitlog: Ensure segment survives whole flush call
Must keep shared pointer alive.
Likewise though, the shared pointer copy in cycle main continuation
is not needed.

Message-Id: <1456931988-5876-3-git-send-email-calle@scylladb.com>
2016-03-02 18:22:13 +02:00
Calle Wilund
f1c4e3eb3d commitlog: Clear reserve segments in orphan_all
Otherwise they will keep the segment_manager alive (leak).
Fixes jenkins ASan errors.

Message-Id: <1456931988-5876-2-git-send-email-calle@scylladb.com>
2016-03-02 18:22:09 +02:00
Calle Wilund
a556f665c0 commitlog: Take segment_manager locks first in write/flush
While it is formally better to take a local lock first and
then contend for a global one, in this case it is arguably
better to ensure we get a gate exception synchronously (early)
instead of potentially in a continuation. The old version might
cause us to do a gate::leave even though the gate was never entered.

And since we should really only have one active (contending)
segment per shard anyway, it should not matter.

Message-Id: <1456931988-5876-1-git-send-email-calle@scylladb.com>
2016-03-02 18:22:05 +02:00
Calle Wilund
e79ca557ed managed_bytes: Change init of small object to silence error on gcc5
Fixes #865

(Some) gcc 5 (5.3.0 for me) on ubuntu will generate errors on
compilation of this code (compiling logalloc_test). The memcpy
to inline storage seems to confuse the compiler.
Simply change to std::copy, which shuts the compiler up.
Any decent stl should convert primitive std::copy to memcpy
anyway, but since it is also the inline (small storage),
it should not matter which way.
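
The kind of change described can be sketched like this. small_blob is a hypothetical type with inline storage, not the real managed_bytes layout; it only shows std::copy used where memcpy would otherwise fill the inline buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical small-object holder with inline storage.
struct small_blob {
    static constexpr std::size_t max_inline = 15;
    std::int8_t data[max_inline] = {};
    std::int8_t size = 0;

    small_blob(const std::int8_t* src, std::size_t n) {
        if (n > max_inline) {
            throw std::length_error("too large for inline storage");
        }
        // std::copy instead of memcpy: same effect for trivially copyable
        // bytes, expressed in a form the compiler does not complain about.
        std::copy(src, src + n, data);
        size = static_cast<std::int8_t>(n);
    }
};
```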

Message-Id: <1456931988-5876-4-git-send-email-calle@scylladb.com>
2016-03-02 18:21:51 +02:00
Pekka Enberg
6d7e14a53a Merge "Implement describe_schema_versions" from Paweł
"This series implements describe_schema_versions so that nodetool
 describecluster can return proper schema information for the whole
 cluster. It involves adding a new verb, SCHEMA_CHECK, which is used to get
 the schema version for a given node, and a simple map-reduce that uses that
 verb to get info from the whole cluster.

 This fixes #677, fixes #684, and fixes #472."
2016-03-02 16:02:53 +02:00
Paweł Dziepak
5396042f06 api: use proper describe_schema_versions implementation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 12:49:55 +00:00
Paweł Dziepak
723b3ae7ed storage_service: implement describe_schema_versions
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 12:49:55 +00:00
Paweł Dziepak
b5eee2e5d4 gms: add inet_address::to_sstring()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 12:49:55 +00:00
Paweł Dziepak
ca68c36c8c storage_proxy: handle SCHEMA_CHECK verb
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 12:49:54 +00:00
Paweł Dziepak
b92f8a6d2b messaging_service: add SCHEMA_CHECK verb
SCHEMA_CHECK is used to get node schema version.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 12:49:54 +00:00
Tomasz Grabiec
9a5d7c6388 log: Prepend log lines with timestamp when printed to stdout
Useful for determining order of events in logs of different nodes, or
for estimating how much time passed between two events.

Fixes #941.

Example log:

INFO  2016-03-01 18:30:37,688 [shard 0] gossip - Waiting for gossip to settle before accepting client requests...
INFO  2016-03-01 18:30:45,689 [shard 0] gossip - No gossip backlog; proceeding
INFO  2016-03-01 18:30:45,689 [shard 0] storage_service - Starting listening for CQL clients on localhost:9042...

Message-Id: <1456853532-28800-1-git-send-email-tgrabiec@scylladb.com>
2016-03-02 13:49:39 +02:00
Avi Kivity
431e1fd379 Merge "Drop db::serializer<>s" from Paweł
"This series removes old-style db::serializer<>s which were replaced by
the IDL-based serialization."
2016-03-02 13:16:36 +02:00
Asias He
a41bcad585 storage_service: Fix run with api lock
Start with coarse control:

1) converting the run_with_write_api_lock operations:

join_ring, start_gossiping, stop_gossiping, start_rpc_server,
stop_rpc_server, start_native_transport, stop_native_transport,
decommission, remove_node, drain, move, rebuild

to use run_with_api_lock which uses a flag to indicate current operation
in progress.

If one of the above operations is in progress when the admin issues another
operation, we return a "try again" exception to avoid running two
operations in parallel.

2) converting the run_with_read_api_lock to use no lock.
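
The flag-based scheme in point 1 might look roughly like the sketch below. It is synchronous and single-threaded for brevity, whereas the real operations are asynchronous; api_lock and its members are hypothetical names, and the exception type and message are assumptions.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical guard: a flag marks an operation in progress, and a second
// operation is rejected instead of being queued behind the first.
class api_lock {
    bool _busy = false;
    std::string _current;
public:
    template <typename Func>
    auto run(const std::string& operation, Func&& func) {
        if (_busy) {
            throw std::runtime_error("try again: " + _current + " is in progress");
        }
        _busy = true;
        _current = operation;
        // Clears the flag when func returns or throws.
        struct reset {
            bool& busy;
            ~reset() { busy = false; }
        } guard{_busy};
        return std::forward<Func>(func)();
    }
};
```

An operation issued while another one is running gets the "try again" error immediately, and the flag is released once the first operation finishes.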

Fixes #850.

Message-Id: <00782b601028ed87437e5decae382f72dff634f6.1456758391.git.asias@scylladb.com>
2016-03-02 11:32:02 +02:00
Paweł Dziepak
d50594351b db: remove old-style serializers
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 09:09:30 +00:00
Paweł Dziepak
bdc23ae5b5 remove db/serializer.hh includes
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 09:07:09 +00:00