Commit Graph

177 Commits

Author SHA1 Message Date
Duarte Nunes
9ffdf4a5cd db: Implement size_estimates_recorder
This patch implements the size_estimates_recorder, which periodically
writes estimations for all the non-system column families in the
size_estimates system table. The size_estimates_recorder class
corresponds to the one in Cassandra's SizeEstimatesRecorder.java.

Estimation is carried out by shard 0. Since we're estimating based on
data in shared sstables, having multiple shards doing this would skew
the results.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-19 09:44:58 +00:00
Paweł Dziepak
7e06499458 repair: convert hashing to streamed_mutations
This patch makes hashing for repair calculate checksums in a way that
doesn't require rebuilding whole mutation.
Unfortunately, such checksums are incompatible with the old ones so the
old way for computing checksums is preserved for compatibility reasons.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00
Gleb Natapov
726b79ea91 messaging_service: enable internode_compression option
Use LZ4 for internode compression if enabled.

Message-Id: <20160711141734.GZ18455@scylladb.com>
2016-07-11 18:30:21 +03:00
Raphael S. Carvalho
85cb2a6d35 database: trigger compaction on boot
At the moment, we only trigger compaction after creating a new
sstable as a result of memtable flush, or some other event such
as changing compaction strategy of a column family.
However, it's important to trigger compaction on boot too.
That will happen after loading all column families.

Fixes #1404.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <54d38a418157454eec97aaba6b8a6b6e51484db4.1467135349.git.raphaelsc@scylladb.com>
2016-06-29 13:47:42 +03:00
Avi Kivity
5b81448ed6 main: add scylla --version option
Fixes #1384.
Message-Id: <1466691517-29964-1-git-send-email-avi@scylladb.com>
2016-06-23 16:24:03 +02:00
Avi Kivity
5af22f6cb1 main: handle exceptions during startup
If we don't, std::terminate() causes a core dump, even though an
exception is sort-of-expected here and can be handled.

Add an exception handler to fix.

Fixes #1379.
Message-Id: <1466595221-20358-1-git-send-email-avi@scylladb.com>
2016-06-23 09:25:33 +03:00
Nadav Har'El
3372052d48 Rewriting shared sstables only after all shards loaded sstables
After commit faa4581, each shard only starts splitting its shared sstables
after opening all sstables. This was important because compaction needs to
be aware of all sstables.

However, another bug remained: If one shard finishes loading its sstables
and starts the splitting compactions, and in parallel a different shard is
still opening sstables - the second shard might find a half-written sstable
being written by the first shard, and abort on a malformed sstable.

So in this patch we start the shared sstable rewrites - on all shards -
only after all shards finished loading their sstables. Doing this is easy,
because main.cc already contains a list of sequential steps where each
uses invoke_on_all() to make sure the step completes on all shards before
continuing to the next step.

Fixes #1371

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1466426641-3972-1-git-send-email-nyh@scylladb.com>
2016-06-20 16:25:24 +03:00
Avi Kivity
85bb5ea064 Merge "Reduce LSA reclaim latency" from Tomasz
"Reclaiming many segments was observed to cause up to multi-ms
latency. With the new setting, the latency of reclamation cycle with
full segments (worst case mode) is below 1ms.

I saw no difference in throughput in a CQL write micro benchmark
in neither of these workloads:
 - full segments, reclaim by random eviction
 - sparse segments (3% occupancy), reclaim by compaction and no eviction

Fixes #1274."
2016-06-16 10:47:57 +03:00
Tomasz Grabiec
75f899cc93 lsa: Make reclamation step configurable via config 2016-06-14 15:13:15 +02:00
Vlad Zolotarov
d3960f0bbb tracing: rearrange shut down
tracing::tracing local instance is dereferenced from a
cql_server::connection::process_request(), therefore tracing::tracing
service may be stop()ed only after a CQL server service is down.
On the other hand it may not be stopped before RPC service is down
because a remote side may request a tracing for a specific command too.

This patch splits the tracing::tracing stop() into two phases:
   1) Flush all pending tracing records and stop the backend.
   2) Stop the service.

The first phase is called after CQL server is down and before RPC is down.
The second phase is called after RPC is down.

Fixes #1339

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1465840496-19990-1-git-send-email-vladz@cloudius-systems.com>
2016-06-14 07:58:04 +03:00
Asias He
e6f63a50e1 main: Delay the messaging_service api registration
Since messaging_service is fully initialized in
storage_service::init_server which calls
messaging_service::start_listen, we need to delay
the messaging_service api registration after it.
2016-06-08 11:13:35 +08:00
Vlad Zolotarov
4b43b08ffc main: start a tracing service
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:13:53 +03:00
Pekka Enberg
0255318bf3 Revert "Revert "main: change order between storage service and drain execution during exit""
This reverts commit b3ed55be1d.

The issue is in the failing dtest, not this commit. Gleb writes:

  "The bug is in the test, not the patch. Test waits for repair session
   to end one way or the other when node is killed, but for nodetool to
   know if repair is completed it needs to poll for it.  If node dies
   before nodetool managed to see repair completion it will stuck
   forever since jmx is alive, but does not provide answers any more.
   The patch changes timing, repair is completed much close to exit now,
   so problem appears, but it may happen even without the patch.

   The fix is for dtest to kill jmx as part of killing a node
   operation."

Now that Lucas fixed the problem in scylla-ccm, revert the revert.
2016-06-01 08:48:50 +03:00
Pekka Enberg
b3ed55be1d Revert "main: change order between storage service and drain execution during exit"
This reverts commit 0ebd8b18b7.

The change breaks repair_additional_test.py:RepairAdditionalTest.repair_kill_1_test
2016-05-30 12:48:09 +03:00
Avi Kivity
b50cb3eca8 config: rename compact_on_idle
compact_on_idle will lead users to thinking we're talking about sstable
compaction, not log-structured-allocator compaction.

Rename the variable to reduce the probability of confusion.
Message-Id: <1464261650-14136-1-git-send-email-avi@scylladb.com>
2016-05-30 08:39:13 +03:00
Gleb Natapov
0ebd8b18b7 main: change order between storage service and drain execution during exit
Even the comment says drain_on_shutdown should be called first, but for
that in has to be registered last.

Fixes #862

Message-Id: <1463579574-15789-2-git-send-email-gleb@scylladb.com>
2016-05-29 11:39:24 +03:00
Piotr Jastrzebski
136b8148d2 Use idle CPU to compact LSA memory
Register an idle CPU handler that compacts a single segment
every time there's nothing better to execute on CPU.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <c26aa608a1e0752fb9e6db1833ef3ba1de95f161.1464169748.git.piotr@scylladb.com>
2016-05-26 12:43:53 +03:00
Pekka Enberg
94e7e61cd0 api: Register snitch API earlier
Currently, we register snitch API in set_server_gossip_settle() which
waits until a node has joined the cluster. This makes 'nodetool status'
not properly show the status of a joining node. Fix the issue by
registering snitch API earlier.

Fixes #1269.
Message-Id: <1463576381-15484-1-git-send-email-penberg@scylladb.com>
2016-05-20 14:24:14 +03:00
Raphael S. Carvalho
bf18025937 main: stop compaction manager earlier
Avi says:
"During shutdown, we prevent new compactions, but perhaps too late.
Memtables are flushed and these can trigger compaction."

To solve that, let's stop compaction manager at a very early step
of shutdown. We will still try to stop compaction manager in
database::stop() because user may ask for a shutdown before scylla
was fully started. It's fine to stop compaction manager twice.
Only the first call will actually stop the manager.

Fixes #1238.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <c64ab11f3c91129c424259d317e48abc5bde6ff3.1462496694.git.raphaelsc@scylladb.com>
2016-05-06 07:41:29 +03:00
Takuya ASADA
2bfc8e8c12 main: add tcp_syncookies sanity check
Check net.ipv4.tcp_syncookies, show error message when it set to 0.
Fixes #1118

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1460738415-3798-1-git-send-email-syuu@scylladb.com>
2016-04-21 14:55:26 +03:00
Avi Kivity
e43dbac836 main: cancel pending atomic deletions on shutdown
A shared sstable must be compacted by all shards before it can be deleted.
Since we're stoping, that's not going to happen.  Cancel those pending
deletions to let anyone waiting on them to continue.
2016-04-14 17:14:26 +03:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Glauber Costa
e750a94300 sanity check Seastar's I/O queue configuration
While Seastar in general can accept any parameter for its I/O queues, Scylla
in particular shouldn't run with them disabled. Such will be the status when
the max-io-requests parameter is not enabled.

On top of that, we would like to have enough depth per I/O queue not to allow
for shard-local parallelism. Therefore, we will require a minimum per-queue
capacity of 4. In machines where the disk iodepth is not enough to allow for 4
concurrent requests per shard, one should reduce the number of I/O queues.

For --max-io-requests, we will check the parameter itself. However, the
--num-io-queues parameter is not mandatory, and given enough concurrent
requests, Seastar's default configuration can very well just be doing the right
thing. So for that, we will check the final result of each I/O queue.

As it is the case with other checks of the sorts, this can be overridden by
the --developer-mode switch.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <63bf7e91ac10c95810351815bb8f5e94d75592a5.1458836000.git.glauber@scylladb.com>
2016-03-25 11:33:57 +03:00
Gleb Natapov
48c83163b9 init: make more initialization threaded
Since initialization now runs in a thread storage, messaging and
gossiper services initialization code may take advantage of it too.

Message-Id: <20160323094732.GF2282@scylladb.com>
2016-03-23 11:53:11 +02:00
Gleb Natapov
ea92064d38 avoid invoke_on_all during developer-mode application if possible
Message-Id: <20160315145327.GW6117@scylladb.com>
2016-03-22 10:40:30 +02:00
Benoît Canet
3b1d3d977d exceptions: Shutdown communications on non file I/O errors
Apply the same treatment to non file filesystem I/O errors.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458154098-9977-2-git-send-email-benoit@scylladb.com>
2016-03-17 15:02:54 +02:00
Benoît Canet
1fb9a48ac5 exception: Optionally shutdown communication on I/O errors.
I/O errors cannot be fixed by Scylla the only solution
is to shutdown the database communications.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>
2016-03-17 15:02:52 +02:00
Pekka Enberg
69dacf9063 main: Fix broadcast_address and listen_address validation errors
Fix the validation error message to look like this:

  Scylla version 666.development-20160316.49af399 starting ...
  WARN  2016-03-17 12:24:15,137 [shard 0] config - Option partitioner is not (yet) used.
  WARN  2016-03-17 12:24:15,138 [shard 0] init - NOFILE rlimit too low (recommended setting 200000, minimum setting 10000; you may run out of file descriptors.
  ERROR 2016-03-17 12:24:15,138 [shard 0] init - Bad configuration: invalid 'listen_address': eth0: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> > (Invalid argument)
  Exiting on unhandled exception of type 'bad_configuration_error': std::exception

Instead of:

  Exiting on unhandled exception of type 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >': Invalid argument

Fixes #1051.

Message-Id: <1458210329-4488-1-git-send-email-penberg@scylladb.com>
2016-03-17 14:59:00 +02:00
Pekka Enberg
972fc6e014 main: Defer API server hooks until commitlog replay
Defer registering services to the API server until commitlog has been
replayed to ensure that nobody is able to trigger sstable operations via
'nodetool' before we are ready for them.
Message-Id: <1458116227-4671-1-git-send-email-penberg@scylladb.com>
2016-03-17 10:04:35 +02:00
Asias He
d79dbfd4e8 main: Defer initalization of streaming
Streaming is used by bootstrap and repair. Streaming uses storage_proxy
class to apply the frozen_mutation and db/column_family class to
invalidate row cache. Defer the initalization just before repair and
bootstrap init.
Message-Id: <8e99cf443239dd8e17e6b6284dab171f7a12365c.1458034320.git.asias@scylladb.com>
2016-03-15 11:56:34 +02:00
Pekka Enberg
eb13f65949 main: Defer REPAIR_CHECKSUM_RANGE RPC verb registration after commitlog replay
Register the REPAIR_CHECKSUM_RANGE messaging service verb handler after
we have replayed the commitlog to avoid responding with bogus checksums.
Message-Id: <1458027934-8546-1-git-send-email-penberg@scylladb.com>
2016-03-15 11:56:18 +02:00
Gleb Natapov
5076f4878b main: Defer storage proxy RPC verb registration after commitlog replay
Message-Id: <20160315071229.GM6117@scylladb.com>
2016-03-15 09:18:12 +02:00
Pekka Enberg
1429213b4c main: Defer migration manager RPC verb registration after commitlog replay
Defer registering migration manager RPC verbs after commitlog has has
been replayed so that our own schema is fully loaded before other other
nodes start querying it or sending schema updates.
Message-Id: <1457971028-7325-1-git-send-email-penberg@scylladb.com>
2016-03-14 18:03:16 +01:00
Glauber Costa
6c4e31bbdb main: when scanning SSTables, run shard 0 first
Deletion of previous stale, temporary SSTables is done by Shard0. Therefore,
let's run Shard0 first. Technically, we could just have all shards agree on the
deletion and just delete it later, but that is prone to races.

Those races are not supposed to happen during normal operation, but if we have
bugs, they can. Scylla's Github Issue #1014 is an example of a situation where
that can happen, making existing problems worse. So running a single shard
first and getting making sure that all temporary tables are deleted provides
extra protection against such situations.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-10 21:06:05 -05:00
Gleb Natapov
16135c2084 make initialization run in a thread
While looking at initialization code I felt like my head is going to
explode. Moving initialization into a thread makes things a little bit
better. Only lightly tested.

Message-Id: <20160310163142.GE28529@scylladb.com>
2016-03-10 17:42:05 +01:00
Gleb Natapov
176aa25d35 fix developer-mode parameter application on SMP
I am almost sure we want to apply it once on each shard, and not multiple
times on a single shard.

Message-Id: <20160310155804.GB28529@scylladb.com>
2016-03-10 17:17:48 +01:00
Pekka Enberg
5dd1fda6cf main: Initialize system keyspace earlier
We start services like gossiper before system keyspace is initialized
which means we can start writing too early. Shuffle code so that system
keyspace is initialized earlier.

Refs #1014
Message-Id: <1457593758-9444-1-git-send-email-penberg@scylladb.com>
2016-03-10 10:39:27 +01:00
Avi Kivity
a1ff21f6ea main: sanity check cpu support
We require SSE 4.2 (for commitlog CRC32), verify it exists early and bail
out if it does not.

We need to check early, because the compiler may use newer instructions
in the generated code; the earlier we check, the lower the probability
we hit an undefined opcode exception.

Message-Id: <1456665401-18252-1-git-send-email-avi@scylladb.com>
2016-02-29 11:41:54 +02:00
Takuya ASADA
0f87922aa6 main: notify service start completion ealier, to reduce systemd unit startup time
Fixes #910

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1455830245-11782-1-git-send-email-syuu@scylladb.com>
2016-02-23 14:33:16 +02:00
Nadav Har'El
7dc843fc1c repair: stop ongoing repairs during shutdown
When shutting down a node gracefully, this patch asks all ongoing repairs
started on this node to stop as soon as possible (without completing
their work), and then waits for these repairs to finish (with failure,
usually, because they didn't complete).

We need to do this, because if the repair loop continues to run while we
start destructing the various services it relies on, it can crash (as
reported in #699, although the specific crash reported there no longer
occurs after some changes in the streaming code). Additionally, it is
important that to stop the ongoing repair, and not wait for it to complete
its normal operation, because that can take a very long time, and shutdown
is supposed to not take more than a few seconds.

Fixes #699.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1455218873-6201-1-git-send-email-nyh@scylladb.com>
2016-02-14 16:52:41 +02:00
Gleb Natapov
2ae1ae2d18 Cleanup messaging_service.hh includes a bit.
Forward declare some classes instead.

Message-Id: <1454496142-14537-2-git-send-email-gleb@scylladb.com>
2016-02-04 13:22:24 +02:00
Tomasz Grabiec
355874281a sstables: Do not register exit hooks from static initializer
Fixes #868.

Registerring exit hooks while reactor is already iterating over exit
hooks is not allowed and currently leads to undefined behavior
observed in #868. While we should make the failure more user friendly,
registering exit hooks concurrently with shutdown will not be allowed.

We don't expect exit hooks to be registered after exit starts because
this would violate the guarantee which says that exit hooks are
executed in reverse order of registration. Starting exit sequence in
the middle of initialization sequence would result in use after free
errors. Btw, I'm not sure if currently there's anything which prevents
this

To solve this problem, move the exit hook to initilization
sequence. In case of tests, the cleanup has to be called explicitly.
2016-02-03 17:35:50 +01:00
Takuya ASADA
4162fb158c main: raise SIGSTOP only when scylla become ready
supervisor_notify() calls periodically, to log message on systemd.
So raise(SIGSTOP) will called multiple times, upstart doesn't expected that.
We need to call it just one time.

Fixes #846

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:30:26 +09:00
Takuya ASADA
b4accd8904 main: autodetect systemd/upstart
We can autodetect systemd/upstart by environment variables, don't need program argument.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:29:32 +09:00
Asias He
b2f2c1c28c storage_service: Add drain on shutdown logic
We register engine().at_exit() callbacks when we initialize the services. We
do not really call the callbacks at the moment due to #293.

It is pretty hard to see the whole picture in which order the services
are shutdown. Instead of for each services to register a at_exit()
callbacks, I proposal to have a single at_exit() callback which do the
shutdown for all the services. In cassandra, the shutdown work is done
in storage_service::drain_on_shutdown callbacks.

In this patch, the drain_on_shutdown is executed during shutdown.

As a result, the proper gossip shutdown is executed and fixes #790.

With this patch, when Ctrl-C on a node, it looks like:

INFO  [shard 0] storage_service - Drain on shutdown: starts
INFO  [shard 0] gossip - Announcing shutdown
INFO  [shard 0] storage_service - Node 127.0.0.1 state jump to normal
INFO  [shard 0] storage_service - Drain on shutdown: stop_gossiping done
INFO  [shard 0] storage_service - CQL server stopped
INFO  [shard 0] storage_service - Drain on shutdown: shutdown rpc and cql server done
INFO  [shard 0] storage_service - Drain on shutdown: shutdown messaging_service done
INFO  [shard 0] storage_service - Drain on shutdown: flush column_families done
INFO  [shard 0] storage_service - Drain on shutdown: shutdown commitlog done
INFO  [shard 0] storage_service - Drain on shutdown: done
2016-01-27 11:45:52 +08:00
Amnon Heiman
b1845cddec Breaking the API initialization into stages
The API needs to be available at an early stage of the initialization,
on the other hand not all the specific APIs are available at that time.

This patch breaks the API initialization into stages, in each stage
additional commands will be available.

While setting that the api header files was broken into api_init.hh that
is relevent to the main and to api.hh which holds the different
api helper functions.

Fixes #754

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453822331-16729-2-git-send-email-amnon@scylladb.com>
2016-01-26 17:41:31 +02:00
Avi Kivity
71eb79aedd main: exit with code 0 on shutdown
To avoid confusing systemd.

Fixes #823.

Message-Id: <1453220473-28712-1-git-send-email-avi@scylladb.com>
2016-01-26 16:26:53 +02:00
Takuya ASADA
b92a075a34 main: support supervisor_notify() on Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453422886-26297-1-git-send-email-syuu@scylladb.com>
2016-01-24 12:10:41 +02:00
Pekka Enberg
733584c44d main: Start the API service as the last step
This reverts commit f0d68e4 ("main: start the http server in the first
step"). The service layer is not ready to serve clients before it's
fully up and running which causes early startup crashes everywhere.
Message-Id: <1452768015-22763-1-git-send-email-penberg@scylladb.com>
2016-01-14 12:55:50 +02:00
Avi Kivity
39f81b95d6 main: make --developer-mode relax dma requirements
With Docker we might be running on a filesystem that does not support DMA
(aufs; or tmpfs on boot2docker), so let --developer-mode allow running
on those file systems.
Message-Id: <1452593083-25601-1-git-send-email-avi@scylladb.com>
2016-01-12 13:34:46 +02:00