scylladb

Author	SHA1	Message	Date
Duarte Nunes	9ffdf4a5cd	db: Implement size_estimates_recorder This patch implements the size_estimates_recorder, which periodically writes estimations for all the non-system column families in the size_estimates system table. The size_estimates_recorder class corresponds to the one in Cassandra's SizeEstimatesRecorder.java. Estimation is carried out by shard 0. Since we're estimating based on data in shared sstables, having multiple shards doing this would skew the results. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-19 09:44:58 +00:00
Paweł Dziepak	7e06499458	repair: convert hashing to streamed_mutations This patch makes hashing for repair calculate checksums in a way that doesn't require rebuilding whole mutation. Unfortunately, such checksums are incompatible with the old ones so the old way for computing checksums is preserved for compatibility reasons. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:23 +01:00
Gleb Natapov	726b79ea91	messaging_service: enable internode_compression option Use LZ4 for internode compression if enabled. Message-Id: <20160711141734.GZ18455@scylladb.com>	2016-07-11 18:30:21 +03:00
Raphael S. Carvalho	85cb2a6d35	database: trigger compaction on boot At the moment, we only trigger compaction after creating a new sstable as a result of memtable flush, or some other event such as changing compaction strategy of a column family. However, it's important to trigger compaction on boot too. That will happen after loading all column families. Fixes #1404. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <54d38a418157454eec97aaba6b8a6b6e51484db4.1467135349.git.raphaelsc@scylladb.com>	2016-06-29 13:47:42 +03:00
Avi Kivity	5b81448ed6	main: add scylla --version option Fixes #1384. Message-Id: <1466691517-29964-1-git-send-email-avi@scylladb.com>	2016-06-23 16:24:03 +02:00
Avi Kivity	5af22f6cb1	main: handle exceptions during startup If we don't, std::terminate() causes a core dump, even though an exception is sort-of-expected here and can be handled. Add an exception handler to fix. Fixes #1379. Message-Id: <1466595221-20358-1-git-send-email-avi@scylladb.com>	2016-06-23 09:25:33 +03:00
Nadav Har'El	3372052d48	Rewriting shared sstables only after all shards loaded sstables After commit `faa4581`, each shard only starts splitting its shared sstables after opening all sstables. This was important because compaction needs to be aware of all sstables. However, another bug remained: If one shard finishes loading its sstables and starts the splitting compactions, and in parallel a different shard is still opening sstables - the second shard might find a half-written sstable being written by the first shard, and abort on a malformed sstable. So in this patch we start the shared sstable rewrites - on all shards - only after all shards finished loading their sstables. Doing this is easy, because main.cc already contains a list of sequential steps where each uses invoke_on_all() to make sure the step completes on all shards before continuing to the next step. Fixes #1371 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1466426641-3972-1-git-send-email-nyh@scylladb.com>	2016-06-20 16:25:24 +03:00
Avi Kivity	85bb5ea064	Merge "Reduce LSA reclaim latency" from Tomasz "Reclaiming many segments was observed to cause up to multi-ms latency. With the new setting, the latency of reclamation cycle with full segments (worst case mode) is below 1ms. I saw no difference in throughput in a CQL write micro benchmark in neither of these workloads: - full segments, reclaim by random eviction - sparse segments (3% occupancy), reclaim by compaction and no eviction Fixes #1274."	2016-06-16 10:47:57 +03:00
Tomasz Grabiec	75f899cc93	lsa: Make reclamation step configurable via config	2016-06-14 15:13:15 +02:00
Vlad Zolotarov	d3960f0bbb	tracing: rearrange shut down tracing::tracing local instance is dereferenced from a cql_server::connection::process_request(), therefore tracing::tracing service may be stop()ed only after a CQL server service is down. On the other hand it may not be stopped before RPC service is down because a remote side may request a tracing for a specific command too. This patch splits the tracing::tracing stop() into two phases: 1) Flush all pending tracing records and stop the backend. 2) Stop the service. The first phase is called after CQL server is down and before RPC is down. The second phase is called after RPC is down. Fixes #1339 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1465840496-19990-1-git-send-email-vladz@cloudius-systems.com>	2016-06-14 07:58:04 +03:00
Asias He	e6f63a50e1	main: Delay the messaging_service api registration Since messaging_service is fully initialized in storage_service::init_server which calls messaging_service::start_listen, we need to delay the messaging_service api registration after it.	2016-06-08 11:13:35 +08:00
Vlad Zolotarov	4b43b08ffc	main: start a tracing service Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:53 +03:00
Pekka Enberg	0255318bf3	Revert "Revert "main: change order between storage service and drain execution during exit"" This reverts commit `b3ed55be1d`. The issue is in the failing dtest, not this commit. Gleb writes: "The bug is in the test, not the patch. Test waits for repair session to end one way or the other when node is killed, but for nodetool to know if repair is completed it needs to poll for it. If node dies before nodetool managed to see repair completion it will stuck forever since jmx is alive, but does not provide answers any more. The patch changes timing, repair is completed much close to exit now, so problem appears, but it may happen even without the patch. The fix is for dtest to kill jmx as part of killing a node operation." Now that Lucas fixed the problem in scylla-ccm, revert the revert.	2016-06-01 08:48:50 +03:00
Pekka Enberg	b3ed55be1d	Revert "main: change order between storage service and drain execution during exit" This reverts commit `0ebd8b18b7`. The change breaks repair_additional_test.py:RepairAdditionalTest.repair_kill_1_test	2016-05-30 12:48:09 +03:00
Avi Kivity	b50cb3eca8	config: rename compact_on_idle compact_on_idle will lead users to thinking we're talking about sstable compaction, not log-structured-allocator compaction. Rename the variable to reduce the probability of confusion. Message-Id: <1464261650-14136-1-git-send-email-avi@scylladb.com>	2016-05-30 08:39:13 +03:00
Gleb Natapov	0ebd8b18b7	main: change order between storage service and drain execution during exit Even the comment says drain_on_shutdown should be called first, but for that it has to be registered last. Fixes #862 Message-Id: <1463579574-15789-2-git-send-email-gleb@scylladb.com>	2016-05-29 11:39:24 +03:00
Piotr Jastrzebski	136b8148d2	Use idle CPU to compact LSA memory Register an idle CPU handler that compacts a single segment every time there's nothing better to execute on CPU. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <c26aa608a1e0752fb9e6db1833ef3ba1de95f161.1464169748.git.piotr@scylladb.com>	2016-05-26 12:43:53 +03:00
Pekka Enberg	94e7e61cd0	api: Register snitch API earlier Currently, we register snitch API in set_server_gossip_settle() which waits until a node has joined the cluster. This makes 'nodetool status' not properly show the status of a joining node. Fix the issue by registering snitch API earlier. Fixes #1269. Message-Id: <1463576381-15484-1-git-send-email-penberg@scylladb.com>	2016-05-20 14:24:14 +03:00
Raphael S. Carvalho	bf18025937	main: stop compaction manager earlier Avi says: "During shutdown, we prevent new compactions, but perhaps too late. Memtables are flushed and these can trigger compaction." To solve that, let's stop compaction manager at a very early step of shutdown. We will still try to stop compaction manager in database::stop() because user may ask for a shutdown before scylla was fully started. It's fine to stop compaction manager twice. Only the first call will actually stop the manager. Fixes #1238. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <c64ab11f3c91129c424259d317e48abc5bde6ff3.1462496694.git.raphaelsc@scylladb.com>	2016-05-06 07:41:29 +03:00
Takuya ASADA	2bfc8e8c12	main: add tcp_syncookies sanity check Check net.ipv4.tcp_syncookies, show error message when it is set to 0. Fixes #1118 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1460738415-3798-1-git-send-email-syuu@scylladb.com>	2016-04-21 14:55:26 +03:00
Avi Kivity	e43dbac836	main: cancel pending atomic deletions on shutdown A shared sstable must be compacted by all shards before it can be deleted. Since we're stopping, that's not going to happen. Cancel those pending deletions to let anyone waiting on them continue.	2016-04-14 17:14:26 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Glauber Costa	e750a94300	sanity check Seastar's I/O queue configuration While Seastar in general can accept any parameter for its I/O queues, Scylla in particular shouldn't run with them disabled. Such will be the status when the max-io-requests parameter is not enabled. On top of that, we would like to have enough depth per I/O queue not to allow for shard-local parallelism. Therefore, we will require a minimum per-queue capacity of 4. In machines where the disk iodepth is not enough to allow for 4 concurrent requests per shard, one should reduce the number of I/O queues. For --max-io-requests, we will check the parameter itself. However, the --num-io-queues parameter is not mandatory, and given enough concurrent requests, Seastar's default configuration can very well just be doing the right thing. So for that, we will check the final result of each I/O queue. As it is the case with other checks of the sorts, this can be overridden by the --developer-mode switch. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <63bf7e91ac10c95810351815bb8f5e94d75592a5.1458836000.git.glauber@scylladb.com>	2016-03-25 11:33:57 +03:00
Gleb Natapov	48c83163b9	init: make more initialization threaded Since initialization now runs in a thread storage, messaging and gossiper services initialization code may take advantage of it too. Message-Id: <20160323094732.GF2282@scylladb.com>	2016-03-23 11:53:11 +02:00
Gleb Natapov	ea92064d38	avoid invoke_on_all during developer-mode application if possible Message-Id: <20160315145327.GW6117@scylladb.com>	2016-03-22 10:40:30 +02:00
Benoît Canet	3b1d3d977d	exceptions: Shutdown communications on non file I/O errors Apply the same treatment to non file filesystem I/O errors. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-2-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:54 +02:00
Benoît Canet	1fb9a48ac5	exception: Optionally shutdown communication on I/O errors. I/O errors cannot be fixed by Scylla the only solution is to shutdown the database communications. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:52 +02:00
Pekka Enberg	69dacf9063	main: Fix broadcast_address and listen_address validation errors Fix the validation error message to look like this: Scylla version 666.development-20160316.49af399 starting ... WARN 2016-03-17 12:24:15,137 [shard 0] config - Option partitioner is not (yet) used. WARN 2016-03-17 12:24:15,138 [shard 0] init - NOFILE rlimit too low (recommended setting 200000, minimum setting 10000; you may run out of file descriptors. ERROR 2016-03-17 12:24:15,138 [shard 0] init - Bad configuration: invalid 'listen_address': eth0: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> > (Invalid argument) Exiting on unhandled exception of type 'bad_configuration_error': std::exception Instead of: Exiting on unhandled exception of type 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >': Invalid argument Fixes #1051. Message-Id: <1458210329-4488-1-git-send-email-penberg@scylladb.com>	2016-03-17 14:59:00 +02:00
Pekka Enberg	972fc6e014	main: Defer API server hooks until commitlog replay Defer registering services to the API server until commitlog has been replayed to ensure that nobody is able to trigger sstable operations via 'nodetool' before we are ready for them. Message-Id: <1458116227-4671-1-git-send-email-penberg@scylladb.com>	2016-03-17 10:04:35 +02:00
Asias He	d79dbfd4e8	main: Defer initialization of streaming Streaming is used by bootstrap and repair. Streaming uses storage_proxy class to apply the frozen_mutation and db/column_family class to invalidate row cache. Defer the initialization until just before repair and bootstrap init. Message-Id: <8e99cf443239dd8e17e6b6284dab171f7a12365c.1458034320.git.asias@scylladb.com>	2016-03-15 11:56:34 +02:00
Pekka Enberg	eb13f65949	main: Defer REPAIR_CHECKSUM_RANGE RPC verb registration after commitlog replay Register the REPAIR_CHECKSUM_RANGE messaging service verb handler after we have replayed the commitlog to avoid responding with bogus checksums. Message-Id: <1458027934-8546-1-git-send-email-penberg@scylladb.com>	2016-03-15 11:56:18 +02:00
Gleb Natapov	5076f4878b	main: Defer storage proxy RPC verb registration after commitlog replay Message-Id: <20160315071229.GM6117@scylladb.com>	2016-03-15 09:18:12 +02:00
Pekka Enberg	1429213b4c	main: Defer migration manager RPC verb registration after commitlog replay Defer registering migration manager RPC verbs until after commitlog has been replayed so that our own schema is fully loaded before other nodes start querying it or sending schema updates. Message-Id: <1457971028-7325-1-git-send-email-penberg@scylladb.com>	2016-03-14 18:03:16 +01:00
Glauber Costa	6c4e31bbdb	main: when scanning SSTables, run shard 0 first Deletion of previous stale, temporary SSTables is done by Shard0. Therefore, let's run Shard0 first. Technically, we could just have all shards agree on the deletion and just delete it later, but that is prone to races. Those races are not supposed to happen during normal operation, but if we have bugs, they can. Scylla's Github Issue #1014 is an example of a situation where that can happen, making existing problems worse. So running a single shard first and making sure that all temporary tables are deleted provides extra protection against such situations. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-10 21:06:05 -05:00
Gleb Natapov	16135c2084	make initialization run in a thread While looking at initialization code I felt like my head is going to explode. Moving initialization into a thread makes things a little bit better. Only lightly tested. Message-Id: <20160310163142.GE28529@scylladb.com>	2016-03-10 17:42:05 +01:00
Gleb Natapov	176aa25d35	fix developer-mode parameter application on SMP I am almost sure we want to apply it once on each shard, and not multiple times on a single shard. Message-Id: <20160310155804.GB28529@scylladb.com>	2016-03-10 17:17:48 +01:00
Pekka Enberg	5dd1fda6cf	main: Initialize system keyspace earlier We start services like gossiper before system keyspace is initialized which means we can start writing too early. Shuffle code so that system keyspace is initialized earlier. Refs #1014 Message-Id: <1457593758-9444-1-git-send-email-penberg@scylladb.com>	2016-03-10 10:39:27 +01:00
Avi Kivity	a1ff21f6ea	main: sanity check cpu support We require SSE 4.2 (for commitlog CRC32), verify it exists early and bail out if it does not. We need to check early, because the compiler may use newer instructions in the generated code; the earlier we check, the lower the probability we hit an undefined opcode exception. Message-Id: <1456665401-18252-1-git-send-email-avi@scylladb.com>	2016-02-29 11:41:54 +02:00
Takuya ASADA	0f87922aa6	main: notify service start completion earlier, to reduce systemd unit startup time Fixes #910 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1455830245-11782-1-git-send-email-syuu@scylladb.com>	2016-02-23 14:33:16 +02:00
Nadav Har'El	7dc843fc1c	repair: stop ongoing repairs during shutdown When shutting down a node gracefully, this patch asks all ongoing repairs started on this node to stop as soon as possible (without completing their work), and then waits for these repairs to finish (with failure, usually, because they didn't complete). We need to do this, because if the repair loop continues to run while we start destructing the various services it relies on, it can crash (as reported in #699, although the specific crash reported there no longer occurs after some changes in the streaming code). Additionally, it is important that to stop the ongoing repair, and not wait for it to complete its normal operation, because that can take a very long time, and shutdown is supposed to not take more than a few seconds. Fixes #699. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1455218873-6201-1-git-send-email-nyh@scylladb.com>	2016-02-14 16:52:41 +02:00
Gleb Natapov	2ae1ae2d18	Cleanup messaging_service.hh includes a bit. Forward declare some classes instead. Message-Id: <1454496142-14537-2-git-send-email-gleb@scylladb.com>	2016-02-04 13:22:24 +02:00
Tomasz Grabiec	355874281a	sstables: Do not register exit hooks from static initializer Fixes #868. Registering exit hooks while reactor is already iterating over exit hooks is not allowed and currently leads to undefined behavior observed in #868. While we should make the failure more user friendly, registering exit hooks concurrently with shutdown will not be allowed. We don't expect exit hooks to be registered after exit starts because this would violate the guarantee which says that exit hooks are executed in reverse order of registration. Starting exit sequence in the middle of initialization sequence would result in use after free errors. Btw, I'm not sure if currently there's anything which prevents this. To solve this problem, move the exit hook to the initialization sequence. In case of tests, the cleanup has to be called explicitly.	2016-02-03 17:35:50 +01:00
Takuya ASADA	4162fb158c	main: raise SIGSTOP only when scylla becomes ready supervisor_notify() is called periodically, to log messages on systemd. So raise(SIGSTOP) will be called multiple times, which upstart doesn't expect. We need to call it just one time. Fixes #846 Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2016-01-27 23:30:26 +09:00
Takuya ASADA	b4accd8904	main: autodetect systemd/upstart We can autodetect systemd/upstart by environment variables, don't need program argument. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2016-01-27 23:29:32 +09:00
Asias He	b2f2c1c28c	storage_service: Add drain on shutdown logic We register engine().at_exit() callbacks when we initialize the services. We do not really call the callbacks at the moment due to #293. It is pretty hard to see the whole picture of the order in which the services are shut down. Instead of having each service register its own at_exit() callback, I propose to have a single at_exit() callback which does the shutdown for all the services. In cassandra, the shutdown work is done in storage_service::drain_on_shutdown callbacks. In this patch, the drain_on_shutdown is executed during shutdown. As a result, the proper gossip shutdown is executed and fixes #790. With this patch, when Ctrl-C on a node, it looks like: INFO [shard 0] storage_service - Drain on shutdown: starts INFO [shard 0] gossip - Announcing shutdown INFO [shard 0] storage_service - Node 127.0.0.1 state jump to normal INFO [shard 0] storage_service - Drain on shutdown: stop_gossiping done INFO [shard 0] storage_service - CQL server stopped INFO [shard 0] storage_service - Drain on shutdown: shutdown rpc and cql server done INFO [shard 0] storage_service - Drain on shutdown: shutdown messaging_service done INFO [shard 0] storage_service - Drain on shutdown: flush column_families done INFO [shard 0] storage_service - Drain on shutdown: shutdown commitlog done INFO [shard 0] storage_service - Drain on shutdown: done	2016-01-27 11:45:52 +08:00
Amnon Heiman	b1845cddec	Breaking the API initialization into stages The API needs to be available at an early stage of the initialization, on the other hand not all the specific APIs are available at that time. This patch breaks the API initialization into stages, in each stage additional commands will be available. While doing that, the api header files were split into api_init.hh, which is relevant to main, and api.hh, which holds the different api helper functions. Fixes #754 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1453822331-16729-2-git-send-email-amnon@scylladb.com>	2016-01-26 17:41:31 +02:00
Avi Kivity	71eb79aedd	main: exit with code 0 on shutdown To avoid confusing systemd. Fixes #823. Message-Id: <1453220473-28712-1-git-send-email-avi@scylladb.com>	2016-01-26 16:26:53 +02:00
Takuya ASADA	b92a075a34	main: support supervisor_notify() on Ubuntu Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1453422886-26297-1-git-send-email-syuu@scylladb.com>	2016-01-24 12:10:41 +02:00
Pekka Enberg	733584c44d	main: Start the API service as the last step This reverts commit `f0d68e4` ("main: start the http server in the first step"). The service layer is not ready to serve clients before it's fully up and running which causes early startup crashes everywhere. Message-Id: <1452768015-22763-1-git-send-email-penberg@scylladb.com>	2016-01-14 12:55:50 +02:00
Avi Kivity	39f81b95d6	main: make --developer-mode relax dma requirements With Docker we might be running on a filesystem that does not support DMA (aufs; or tmpfs on boot2docker), so let --developer-mode allow running on those file systems. Message-Id: <1452593083-25601-1-git-send-email-avi@scylladb.com>	2016-01-12 13:34:46 +02:00
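
Several of the commits above rely on two shutdown rules: exit hooks run in reverse order of registration and must not be registered once shutdown has started (355874281a), and stopping a service twice is harmless because only the first call does any work (bf18025937). The following is a minimal C++ sketch of those two rules only. The names `exit_hooks`, `stoppable_service`, and `demo_shutdown_order` are hypothetical and are not Seastar or Scylla APIs; the real code is asynchronous and per-shard, which this sketch ignores.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reactor's exit-hook list (not Seastar's API).
class exit_hooks {
    std::vector<std::function<void()>> _hooks;
    bool _running = false;
public:
    // Registers a hook; refuses (returns false) once shutdown has started.
    bool at_exit(std::function<void()> hook) {
        if (_running) {
            return false;
        }
        _hooks.push_back(std::move(hook));
        return true;
    }
    // Runs the hooks in reverse order of registration.
    void run() {
        _running = true;
        for (auto it = _hooks.rbegin(); it != _hooks.rend(); ++it) {
            (*it)();
        }
        _hooks.clear();
    }
};

// Hypothetical service whose stop() may be called more than once.
class stoppable_service {
    bool _stopped = false;
    int _stop_work = 0;  // counts how often real stop work was performed
public:
    void stop() {
        if (_stopped) {
            return;  // second and later calls do nothing
        }
        _stopped = true;
        ++_stop_work;
    }
    int stop_work() const { return _stop_work; }
};

// Registers three illustrative hooks and returns the order they ran in.
std::vector<std::string> demo_shutdown_order() {
    exit_hooks hooks;
    std::vector<std::string> order;
    hooks.at_exit([&order] { order.push_back("database"); });
    hooks.at_exit([&order] { order.push_back("messaging"); });
    hooks.at_exit([&order, &hooks] {
        order.push_back("api");
        // Late registration during shutdown is rejected, not executed.
        hooks.at_exit([&order] { order.push_back("late"); });
    });
    hooks.run();
    return order;
}
```

Calling `demo_shutdown_order()` yields the hooks in the order api, messaging, database, the reverse of how they were registered, and the hook registered during shutdown never runs.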


177 Commits